What algorithm is used for finding ngrams?
Supposing my input data is an array of words plus the size of the n-grams I want to find, what algorithm should I use?
I'm asking for code, preferably in R. The data is stored in a database, so a PL/pgSQL function would work too. Java is a language I know better, so I can "translate" it to another language.
I'm not lazy; I'm only asking for code because I don't want to reinvent the wheel by writing an algorithm that already exists.
Edit: it's important to know how many times each n-gram appears.
Edit 2: is there an R package for n-grams?
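To make the requirement concrete, here is a minimal sketch in Java of what I mean: slide a window of n words over the input and count each distinct n-gram. The class name `NGramCounter` and the choice to join words with a single space are my own assumptions for illustration, not taken from any library.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NGramCounter {

    // Hypothetical helper: returns each n-gram (words joined by a space)
    // mapped to the number of times it occurs, in order of first appearance.
    public static Map<String, Integer> count(List<String> words, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        // The window starts at i and covers words i .. i+n-1.
        for (int i = 0; i + n <= words.size(); i++) {
            String gram = String.join(" ", words.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        // prints {to be=2, be or=1, or not=1, not to=1}
        System.out.println(count(words, 2));
    }
}
```

This runs in time proportional to the number of words (times n for building each key). If the input has fewer than n words, the result is simply an empty map.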
tm and textcat package ... library("sos"); findFn("n-gram")
– Scarcely