I'll actually post a full answer to this, since I think it's worth making it obvious that you can use n-gram models as classifiers (in much the same way as you can use any probability model of your features as one).
Generative classifiers approximate the posterior of interest, p(class | test doc), as:
p(c|t) ∝ p(c) p(t|c)
where p(c) is the prior probability of c and p(t|c) is the likelihood. Classification picks the arg-max over all c. An n-gram language model, just like Naive Bayes or LDA or whatever generative model you like, can be construed as a probability model p(t|c) if you estimate a separate model for each class. As such, it can provide all the information required to do classification.
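To make that concrete, here's a minimal sketch in Python of a per-class bigram model used this way, with add-one smoothing. The class and method names are mine, invented for illustration, not any standard library's API:

```python
import math
from collections import Counter, defaultdict

class BigramClassifier:
    """Per-class bigram language model used as a generative classifier.

    A sketch of the idea above, not a reference implementation:
    estimate one bigram model per class, then classify by
    arg-max over c of log p(c) + log p(t|c).
    """

    def __init__(self):
        self.class_counts = Counter()         # documents per class, for the prior p(c)
        self.bigrams = defaultdict(Counter)   # class -> counts of (w1, w2) pairs
        self.contexts = defaultdict(Counter)  # class -> counts of contexts w1
        self.vocab = set()

    def train(self, docs):
        """docs: iterable of (token_list, class_label) pairs."""
        for tokens, c in docs:
            self.class_counts[c] += 1
            padded = ["<s>"] + tokens
            self.vocab.update(padded)
            for w1, w2 in zip(padded, padded[1:]):
                self.bigrams[c][(w1, w2)] += 1
                self.contexts[c][w1] += 1

    def log_likelihood(self, tokens, c):
        """log p(t|c) under class c's bigram model, add-one smoothed."""
        V = len(self.vocab)
        padded = ["<s>"] + tokens
        return sum(
            math.log((self.bigrams[c][(w1, w2)] + 1) / (self.contexts[c][w1] + V))
            for w1, w2 in zip(padded, padded[1:])
        )

    def classify(self, tokens):
        """Pick the arg-max of log p(c) + log p(t|c) over all classes."""
        n = sum(self.class_counts.values())
        return max(
            self.class_counts,
            key=lambda c: math.log(self.class_counts[c] / n)
                          + self.log_likelihood(tokens, c),
        )
```

Used on a toy corpus:

```python
clf = BigramClassifier()
clf.train([
    ("the cat sat on the mat".split(), "pets"),
    ("buy cheap pills online now".split(), "spam"),
])
print(clf.classify("the cat slept".split()))  # -> "pets"
```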
The question is whether the model is any use, of course. The major issue is that n-gram models tend to be built over billions of words of text, whereas classifiers are often trained on a few thousand. You can do complicated stuff like putting joint priors on the parameters of all the classes' models, or clamping hyperparameters to be equal (what these parameters are depends on how you do smoothing)... but it's still tricky.
An alternative is to build an n-gram model of characters (including spaces/punctuation if they turn out to be useful). This can be estimated much more reliably (26^3 parameters for a tri-gram model instead of ~20000^3), and can be very useful for author identification/genre classification/other forms of classification that have stylistic elements.
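Again as a hedged sketch (function names are mine), the character-level version just counts overlapping character trigrams per class, smoothing over the observed alphabet:

```python
import math
from collections import Counter, defaultdict

def train_char_trigram_models(docs):
    """docs: iterable of (text, class_label) pairs. Returns per-class
    trigram counts, bigram-context counts, and the observed alphabet
    (spaces/punctuation kept, as suggested above)."""
    tri = defaultdict(Counter)  # class -> counts of 3-character strings
    bi = defaultdict(Counter)   # class -> counts of 2-character contexts
    alphabet = set()
    for text, c in docs:
        alphabet.update(text)
        for i in range(len(text) - 2):
            tri[c][text[i:i + 3]] += 1
            bi[c][text[i:i + 2]] += 1
    return tri, bi, alphabet

def char_log_likelihood(text, c, tri, bi, alphabet):
    """log p(text | c) under class c's character trigram model,
    add-one smoothed over the observed alphabet."""
    A = len(alphabet)
    return sum(
        math.log((tri[c][text[i:i + 3]] + 1) / (bi[c][text[i:i + 2]] + A))
        for i in range(len(text) - 2)
    )
```

Classification then works exactly as before: add a log prior per class and take the arg-max. With only |alphabet|^3 parameters per class, the counts are dense enough to estimate from classifier-sized training sets.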