I don't think your question is posed quite right: Ngrams are a tool, not a problem to be solved, so there is no "state of the art" in ngrams. As @Hooked pointed out, an ngram is a kind of auto-correlation function (or "autoregressive function"). So what you really want to know is if there are any problems for which the state of the art solutions involve long ngrams.
For numerical applications such as fitting financial or weather models, or speech recognition, you'd definitely use vectors of dimension > 3. For example, autoregressive Hidden Markov Models fit a piecewise function of the last n measurements, where n can be moderately large if past states are relevant to predicting the future.
But all your examples concern word ngrams, and I can't think of any work that found n > 3 to be useful in that domain. I don't think it's a question of computational cost or finding enough training data: Superficial auto-correlation in language seems to peter out after 3 words or so. Random example: this article tries to reinterpret Zipf's law in terms of ngram-based information content. They consider n up to 4, but get the highest overall correlations for the trigram counts.
I don't mean to say that n > 3 is not useful; but your observation that it doesn't seem to come up much is well founded.
But note that the complexity of counting ngrams in a text is not an issue: If you have a tokenized corpus of length L, you could collect all its ngrams of length n like this:
    from collections import Counter

    ngrams = Counter()                         # missing keys default to 0
    for i in range(L - n + 1):                 # one start position per ngram
        ngrams[tuple(corpus[i:i + n])] += 1    # lists aren't hashable; tuples are
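For example, with corpus = "to be or not to be".split() and n = 2, the counter ends up mapping ('to', 'be') to 2 and each of the other bigrams to 1.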
As you can see, this requires only O(L) steps, i.e., it is linear in the size of the corpus and essentially independent of n. So collecting ngrams of any length is a non-issue. But the number of possible ngrams quickly mushrooms. To illustrate: if you distinguish 32 letter tokens (letters and a few punctuation classes), there are 1,024 possible letter bigrams but 1,048,576 possible tetragrams. To find enough of them to populate your frequency tables, you need exponentially more text.
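If you want to see how fast that blows up, a quick back-of-the-envelope check (assuming nothing beyond the 32-token inventory above):

    # The number of possible ngrams over a 32-token inventory grows as 32**n.
    for n in (2, 3, 4, 5):
        print(n, 32 ** n)    # prints: 2 1024, 3 32768, 4 1048576, 5 33554432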
For word ngrams the sparsity problem is even worse: not only do you have far more than 32 distinct word tokens, but the vocabulary keeps growing (slowly) with corpus size, the famous "long tail" property. So your data will be sparse (even for small n) no matter how large a corpus you collect. You'll then need to fit complicated statistical models, whose computational cost depends on the number of distinct ngrams.
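You can gauge the sparsity yourself with a rough measure like the fraction of distinct ngrams that occur exactly once. The helper below is just an illustrative sketch (the name sparsity and the singleton-fraction measure are my own choice, not a standard metric):

    from collections import Counter

    def sparsity(corpus, n):
        # Fraction of distinct ngrams that occur exactly once in the corpus.
        counts = Counter(tuple(corpus[i:i + n]) for i in range(len(corpus) - n + 1))
        singletons = sum(1 for c in counts.values() if c == 1)
        return singletons / len(counts)

On any realistic word corpus this fraction climbs steeply as n grows, which is the sparsity problem in a nutshell.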
As a result, sparsity is always an issue in word ngram applications (hence "smoothing" is usually necessary). If you google "ngram sparsity" you'll find a ton of references.
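To give a concrete taste of what "smoothing" means, here is a minimal sketch of add-one (Laplace) smoothing, about the simplest scheme there is; the function and argument names are my own, and real systems use more sophisticated methods (Good-Turing, Kneser-Ney, etc.):

    def laplace_bigram_prob(w1, w2, bigram_counts, unigram_counts, vocab_size):
        # bigram_counts and unigram_counts should be collections.Counter objects,
        # so unseen ngrams count as 0 rather than raising a KeyError.
        # Every bigram gets one phantom observation, so unseen pairs receive a
        # small nonzero probability instead of zero.
        return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + vocab_size)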