I've read a paper that uses n-gram counts as features for a classifier, and I was wondering what exactly this means.
Example text: "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam"
I can create unigrams, bigrams, trigrams, etc. from this text, but first I have to define the "level" at which to create these n-grams. The level can be character, syllable, word, ...
So would creating unigrams from the sentence above simply produce a list of all the words?
And would creating bigrams result in pairs of adjacent words, i.e. each word paired with the one that follows it?
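Here is a minimal sketch of what I think this means at the word level (the tokenization is just a naive `str.split()`, which is an assumption on my part):

```python
# Assumption: naive whitespace tokenization, ignoring punctuation handling
text = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam"
tokens = text.split()

# Unigrams: each token on its own
unigrams = [(t,) for t in tokens]

# Bigrams: pairs of adjacent tokens
bigrams = list(zip(tokens, tokens[1:]))

print(unigrams[:3])  # [('Lorem',), ('ipsum',), ('dolor',)]
print(bigrams[:3])   # [('Lorem', 'ipsum'), ('ipsum', 'dolor'), ('dolor', 'sit')]
```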
So when the paper talks about n-gram counts, does it simply mean creating unigrams, bigrams, trigrams, etc. from the text and counting how often each n-gram occurs?
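If that's right, I imagine the counting would look something like this (again just my own sketch using `collections.Counter`, not necessarily what the paper does):

```python
from collections import Counter

text = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam"
tokens = text.split()

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Count how often each unigram, bigram, and trigram occurs
counts = Counter()
for n in (1, 2, 3):
    counts.update(ngrams(tokens, n))

print(counts[('Lorem', 'ipsum')])  # 1
```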
Is there an existing method for this in Python's nltk package, or do I have to implement my own version?
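From skimming the docs, it looks like `nltk.ngrams` and `nltk.FreqDist` might already cover this. This is what I've tried so far, though I'm not sure it's the idiomatic way (note that `word_tokenize` needs the "punkt" tokenizer data downloaded first):

```python
import nltk
# nltk.download('punkt')  # needed once for word_tokenize

text = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam"
tokens = nltk.word_tokenize(text)

# nltk.ngrams yields n-gram tuples over the token sequence
bigrams = list(nltk.ngrams(tokens, 2))

# FreqDist counts how often each item occurs
fdist = nltk.FreqDist(nltk.ngrams(tokens, 2))
print(fdist.most_common(3))
```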