I would outright discard byte-level n-grams for text-related tasks, because bytes carry no linguistic meaning on their own (in UTF-8, a single character can even be split across several bytes).
Of the two remaining levels, character-level n-grams need much less storage space and, consequently, hold much less information. They are usually used for tasks such as language identification, writer identification (i.e. fingerprinting), and anomaly detection.
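To make that concrete, here is a minimal Python sketch (the function name and sample text are purely illustrative) of extracting character n-gram frequencies; language-identification and fingerprinting methods typically compare such frequency profiles across texts:

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

# A character 3-gram profile acts as a compact "fingerprint" of the text.
profile = char_ngrams("the quick brown fox", n=3)
print(profile.most_common(5))
```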
As for word-level n-grams, they can serve the same purposes and many more, but they need much more storage. For instance, you may need up to several gigabytes to hold a useful subset of English word 3-grams in memory (for general-purpose tasks). That said, if you only have a limited set of texts to work with, word-level n-grams may not require that much storage.
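For comparison, a similar sketch at the word level (again, names and the toy sentence are just illustrative); the number of distinct word n-grams grows much faster than the number of distinct character n-grams, which is exactly where the storage cost comes from:

```python
from collections import Counter

def word_ngrams(text, n=3):
    """Count word n-grams in a whitespace-tokenized string."""
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

counts = word_ngrams("the quick brown fox jumps over the lazy dog", n=3)
print(counts.most_common(3))
```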
As for the issue of errors, a sufficiently large word n-gram corpus will include and represent them as well. Besides, there are various smoothing methods to deal with sparsity.
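As an illustration, here is a sketch of the simplest such method, add-one (Laplace) smoothing for bigram probabilities (the toy corpus and names are only for demonstration); more refined schemes such as Kneser-Ney follow the same idea of reserving probability mass for unseen n-grams:

```python
from collections import Counter

def laplace_bigram_prob(bigrams, unigrams, vocab_size, w1, w2):
    """P(w2 | w1) with add-one smoothing: unseen bigrams get a small, non-zero probability."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)

words = "the cat sat on the mat".split()
unigrams = Counter(words)
bigrams = Counter(zip(words, words[1:]))

print(laplace_bigram_prob(bigrams, unigrams, len(unigrams), "the", "cat"))  # seen bigram
print(laplace_bigram_prob(bigrams, unigrams, len(unigrams), "the", "sat"))  # unseen, yet non-zero
```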
The other issue with n-grams is that they will almost never be able to capture the whole context that is needed, so they can only approximate it.
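Concretely, a trigram model makes the Markov approximation

$$P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-2}, w_{i-1}),$$

so any dependency reaching further back than two words is simply not modeled.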
You can read more about n-grams in the classic Foundations of Statistical Natural Language Processing by Manning and Schütze.