term-document-matrix

7

I have been using the tm package to run some text analysis. My problem is with creating a list with words and their frequencies associated with the same library(tm) library(RWeka) txt <- read....

r text-mining word-frequency term-document-matrix

Elboa asked 7/8, 2013 at 10:30

2

Creating N-Grams with tm & RWeka - works with VCorpus but not Corpus

Following the many guides to creating biGrams using the 'tm' and 'RWeka' packages, I was getting frustrated that only 1-Grams were being returned in the tdm. Through much trial and error I discover...

r tm n-gram term-document-matrix rweka

Unicuspid asked 13/3, 2017 at 5:33

4

Solved

More efficient means of creating a corpus and DTM with 4M rows

My file has over 4M rows and I need a more efficient way of converting my data to a corpus and document term matrix such that I can pass it to a bayesian classifier. Consider the following code: ...

r data.table corpus term-document-matrix qdap

Whitmer asked 15/8, 2014 at 16:57

3

Solved

How can I tell Solr to return the hit search terms per document?

I have a question about queries in Solr. When I perform a query with multiple search terms that are all logically linked by OR (e.g. q=content:(foo OR bar OR foobar)) then Solr returns a list of do...

solr term-document-matrix

Lignify asked 30/7, 2014 at 13:27

1

How to efficiently compute similarity between documents in a stream of documents

I gather Text documents (in Node.js) where one document i is represented as a list of words. What is an efficient way to compute the similarity between these documents, taking into account that new...

node.js stream nlp cosine-similarity term-document-matrix

Superdreadnought asked 21/12, 2012 at 8:17

3

Solved

TermDocumentMatrix sometimes throwing error

I am creating a Word Cloud based on Tweets from various different sports teams. This code executes successfully about 1 in 10 times: handle <- 'arsenal' txt <- searchTwitter(handle,n=1000,la...

r word-cloud term-document-matrix

Sydney asked 6/9, 2014 at 10:31

3

Solved

efficient Term Document Matrix with NLTK

I am trying to create a term document matrix with NLTK and pandas. I wrote the following function: def fnDTM_Corpus(xCorpus): import pandas as pd '''to create a Term Document Matrix from a NLTK...

python pandas nltk term-document-matrix

Mcchesney asked 9/4, 2013 at 10:46

3

Solved

TermDocumentMatrix errors in R

I have been working through numerous online examples of the {tm} package in R, attempting to create a TermDocumentMatrix. Creating and cleaning a corpus has been pretty straightforward, but I consi...

r text-mining tm corpus term-document-matrix

Unwary asked 28/8, 2014 at 14:36

4

Solved

Error converting text to lowercase with tm_map(..., tolower)

I tried using the tm_map. It gave the following error. How can I get around this? require(tm) byword<-tm_map(byword, tolower) Error in UseMethod("tm_map", x) : no applicable method for 'tm...

r tm lowercase term-document-matrix

Ozan asked 30/11, 2012 at 6:35

2

Solved

How to build a Term-Document-Matrix from a set of texts and a specific set of terms (tags)?

I have two sets of data: a set of tags (single words like php, html, etc.) and a set of texts. I now wish to build a Term-Document-Matrix representing the number of occurrences of the tags element in th...

r term-document-matrix

Brookner asked 31/10, 2013 at 11:56

3

Solved

R - slowly working lapply with sort on ordered factor

Based on the question More efficient means of creating a corpus and DTM, I've prepared my own method for building a Term Document Matrix from a large corpus which (I hope) does not require Terms x Doc...

r text-mining lapply corpus term-document-matrix

Peroxidase asked 5/4, 2015 at 23:37

1

Solved

Big Text Corpus breaks tm_map

I have been breaking my head over this one over the last few days. I searched all the SO archives and tried the suggested solutions but just can't seem to get this to work. I have sets of txt docum...

r text-mining tm text-analysis term-document-matrix

Mclain asked 9/11, 2014 at 23:30

1

Solved

findAssocs for multiple terms in R

In R I used the tm package for building a term-document matrix from a corpus of documents. My goal is to extract word-associations from all bigrams in the term document matrix and return for...

r text-mining term-document-matrix

Calendula asked 30/5, 2013 at 12:21

1

Solved

R: Finding the top 10 terms associated with the term 'fraud' across documents in a Document Term Matrix in R

I have a corpus of 39 text files named by the year - 1945.txt, 1978.txt.... 2013.txt. I've imported them into R and created a Document Term Matrix using TM package. I'm trying to investigate how w...

r word-frequency term-document-matrix

Spaceless asked 22/5, 2013 at 15:31

1

Solved

R tm package create matrix of N most frequent terms

I have a termDocumentMatrix created using the tm package in R. I'm trying to create a matrix/dataframe that has the 50 most frequently occurring terms. When I try to convert to a matrix I get thi...

r text-mining tm term-document-matrix

Elke asked 16/7, 2012 at 16:42
