term-document-matrix Questions

7

I have been using the tm package to run some text analysis. My problem is with creating a list of words and their associated frequencies. library(tm) library(RWeka) txt <- read....
Elboa asked 7/8, 2013 at 10:30

2

Following the many guides to creating biGrams using the 'tm' and 'RWeka' packages, I was getting frustrated that only 1-Grams were being returned in the tdm. Through much trial and error I discover...
Unicuspid asked 13/3, 2017 at 5:33

4

Solved

My file has over 4M rows and I need a more efficient way of converting my data to a corpus and document term matrix so that I can pass it to a Bayesian classifier. Consider the following code: ...
Whitmer asked 15/8, 2014 at 16:57

3

Solved

I have a question about queries in Solr. When I perform a query with multiple search terms that are all logically linked by OR (e.g. q=content:(foo OR bar OR foobar)), then Solr returns a list of do...
Lignify asked 30/7, 2014 at 13:27

1

I gather text documents (in Node.js), where each document i is represented as a list of words. What is an efficient way to compute the similarity between these documents, taking into account that new...
Superdreadnought asked 21/12, 2012 at 8:17

3

Solved

I am creating a word cloud based on tweets from various sports teams. This code executes successfully about 1 in 10 times: handle <- 'arsenal' txt <- searchTwitter(handle,n=1000,la...
Sydney asked 6/9, 2014 at 10:31

3

Solved

I am trying to create a term document matrix with NLTK and pandas. I wrote the following function: def fnDTM_Corpus(xCorpus): import pandas as pd '''to create a Term Document Matrix from a NLTK...
Mcchesney asked 9/4, 2013 at 10:46

3

Solved

I have been working through numerous online examples of the {tm} package in R, attempting to create a TermDocumentMatrix. Creating and cleaning a corpus has been pretty straightforward, but I consi...
Unwary asked 28/8, 2014 at 14:36

4

Solved

I tried using tm_map, but it gave the following error. How can I get around this? require(tm) byword<-tm_map(byword, tolower) Error in UseMethod("tm_map", x) : no applicable method for 'tm...
Ozan asked 30/11, 2012 at 6:35

2

Solved

I have two sets of data: a set of tags (single words like php, html, etc.) and a set of texts. I now wish to build a term-document matrix representing the number of occurrences of the tag elements in th...
Brookner asked 31/10, 2013 at 11:56
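
The excerpt above does not say which language the asker is working in, so the following is only a minimal sketch in Python of the general idea: count how often each tag occurs in each text and arrange the counts as a tag-by-text matrix. The function name `tag_document_matrix`, the tokenizing pattern, and the sample data are all hypothetical and are not taken from the question.

```python
import re
from collections import Counter

def tag_document_matrix(tags, texts):
    """Return {tag: [count in text 0, count in text 1, ...]} (hypothetical helper)."""
    # Lower-case the tags and drop duplicates while keeping their order.
    tags_lower = list(dict.fromkeys(t.lower() for t in tags))
    matrix = {tag: [] for tag in tags_lower}
    for text in texts:
        # Assumption: a token is a run of letters, digits, '+' or '#',
        # so tags such as "c++" or "c#" are kept whole.
        counts = Counter(re.findall(r"[a-z0-9+#]+", text.lower()))
        for tag in tags_lower:
            # Counter returns 0 for tokens that never occurred.
            matrix[tag].append(counts[tag])
    return matrix

# Made-up sample data for illustration only.
tags = ["php", "html"]
texts = ["PHP renders HTML; php is everywhere.", "Plain HTML only."]
tdm = tag_document_matrix(tags, texts)
```

Each row of `tdm` is one tag and each column is one text. Only the tags are kept as rows, so the result stays small even when the texts contain many other words.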

3

Solved

Based on the question "More efficient means of creating a corpus and DTM", I've prepared my own method for building a Term Document Matrix from a large corpus which (I hope) does not require Terms x Doc...
Peroxidase asked 5/4, 2015 at 23:37

1

Solved

I have been racking my brain over this one for the last few days. I searched all the SO archives and tried the suggested solutions but just can't seem to get this to work. I have sets of txt docum...
Mclain asked 9/11, 2014 at 23:30

1

Solved

In R I used the tm package for building a term-document matrix from a corpus of documents. My goal is to extract word-associations from all bigrams in the term document matrix and return for...
Calendula asked 30/5, 2013 at 12:21

1

Solved

I have a corpus of 39 text files named by year: 1945.txt, 1978.txt, ..., 2013.txt. I've imported them into R and created a Document Term Matrix using the tm package. I'm trying to investigate how w...
Spaceless asked 22/5, 2013 at 15:31

1

Solved

I have a TermDocumentMatrix created using the tm package in R. I'm trying to create a matrix or data frame that has the 50 most frequently occurring terms. When I try to convert to a matrix I get thi...
Elke asked 16/7, 2012 at 16:42
