R: sparse matrix multiplication with data.table and quanteda package?

I am trying to do a matrix multiplication on a sparse matrix created with the quanteda package, using the data.table package (related to this thread). So far I have:

require(quanteda) 

mytext <- c("Let the big dogs hunt", "No holds barred", "My child is an honor student")     
myMatrix <-dfm(mytext, ignoredFeatures = stopwords("english"), stem = TRUE) #a data.table
as.matrix(myMatrix) %*% transpose(as.matrix(myMatrix))

How can I get the matrix multiplication working here with the quanteda package and sparse matrices?

Algor answered 9/1, 2017 at 15:22 Comment(1)
@Roland quanteda returns the data.table object. – Algor

This works just fine:

mytext <- c("Let the big dogs hunt", 
            "No holds barred", 
            "My child is an honor student")     
myMatrix <- dfm(mytext)

myMatrix %*% t(myMatrix)
## 3 x 3 sparse Matrix of class "dgCMatrix"
##       text1 text2 text3
## text1     5     .     .
## text2     .     3     .
## text3     .     .     6

No need to coerce to a dense matrix using as.matrix(). Note that it is no longer a "dfmSparse" object because it's no longer a matrix of documents by features.
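
As a minimal sketch of the same idea without quanteda, the block below builds a small sparse matrix by hand with the Matrix package and multiplies it by its transpose. The matrix `m` is a hypothetical stand-in for the dfm above (3 documents, 14 features, each feature occurring once in one document); it is an assumption made for illustration, not the object that `dfm()` returns.

```r
library(Matrix)

# Hypothetical stand-in for the dfm: 3 documents (rows) by 14 features
# (columns), every feature occurring once in exactly one document.
m <- sparseMatrix(
  i = rep(1:3, times = c(5, 3, 6)),
  j = 1:14,
  x = rep(1, 14),
  dims = c(3, 14),
  dimnames = list(paste0("text", 1:3), paste0("feat", 1:14))
)

# Document-by-document product, computed without converting to a dense matrix
sim <- m %*% t(m)

# Check that the result is still a sparse Matrix
still_sparse <- is(sim, "sparseMatrix")

# Densify only this small 3 x 3 result to read off its diagonal
doc_norms <- diag(as.matrix(sim))
```

With these made-up counts the diagonal is 5, 3, 6, the same pattern as in the output above, and the off-diagonal entries are zero because no two documents share a feature.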

Chosen answered 9/1, 2017 at 19:0 Comment(0)

Use t() rather than transpose() (the data.table function, which is meant for lists and data frames) to transpose the matrix before multiplying:

as.matrix(myMatrix) %*% t(as.matrix(myMatrix))

Also, as noted in the comments, as.matrix() produces a dense matrix, while Matrix::Matrix() can produce a sparse one. Neither conversion is needed here, so this is better:

myMatrix %*% t(myMatrix)

and potentially even better (note that tcrossprod(x) computes x %*% t(x), whereas crossprod(x) computes t(x) %*% x):

crossprod(myMatrix) 
tcrossprod(myMatrix) 

However, with the example in the question both calls fail with an error about requiring numeric/complex matrix/vector arguments:

require(quanteda)  
mytext <- c("Let the big dogs hunt", "No holds barred", "My child is an honor student")      
myMatrix <-dfm(mytext, ignoredFeatures = stopwords("english"), stem = TRUE) 
crossprod(myMatrix) 
tcrossprod(myMatrix)
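
For comparison, here is a hedged sketch of how the two functions behave on a plain sparse matrix from the Matrix package. The matrix `m` is made up for illustration, and the calls are namespace-qualified as `Matrix::tcrossprod()` and `Matrix::crossprod()` so that the Matrix methods are used regardless of what is on the search path. Whether a missing method of this kind is what causes the error above is an assumption; the thread does not confirm it.

```r
library(Matrix)

# Hypothetical 3 x 3 sparse matrix, not taken from the question
m <- sparseMatrix(
  i = c(1, 1, 2, 3),
  j = c(1, 2, 2, 3),
  x = c(1, 2, 1, 4),
  dims = c(3, 3)
)

by_product <- m %*% t(m)            # explicit transpose and multiply
by_tcross  <- Matrix::tcrossprod(m) # same product as m %*% t(m)
by_cross   <- Matrix::crossprod(m)  # t(m) %*% m, a different product in general

# Compare the values after densifying these small results
tcross_matches <- all(as.matrix(by_product) == as.matrix(by_tcross))
cross_matches  <- all(as.matrix(t(m) %*% m) == as.matrix(by_cross))
cross_differs  <- any(as.matrix(by_cross) != as.matrix(by_tcross))
```

For a document-by-feature matrix, `tcrossprod()` therefore gives the document-by-document product asked about in the question, while `crossprod()` gives the feature-by-feature one.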
Algor answered 9/1, 2017 at 15:41 Comment(5)
Also, as.matrix will not create a sparse matrix. Use Matrix::Matrix instead. – Bellows
@Bellows super important point, thank you! +1 How is Matrix::Matrix different from Matrix::sparseMatrix? – Algor
I'm not familiar with quanteda, but I just installed it, and it seems that 'dfm' already returns a sparse matrix of class dfm-class. In that case all you need is myMatrix %*% t(myMatrix). Are you using an old version of quanteda, given that you get a data.table returned? Also, in my version the ignoredFeatures argument is ignored. – Bellows
I always thought this is what we have tcrossprod for. – Jochebed
@DavidArenburg can you clarify? crossprod/tcrossprod throws an error about requiring numeric/complex matrix/vector arguments. – Algor
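
On the question raised in the comments about Matrix::Matrix versus Matrix::sparseMatrix, the following is a small sketch under the assumption that the values below are arbitrary illustrative data. `Matrix()` converts an existing dense matrix (and can be asked for a sparse result with `sparse = TRUE`), whereas `sparseMatrix()` builds the matrix directly from row indices, column indices, and values, so the dense form never has to exist.

```r
library(Matrix)

# Route 1: start from a dense base-R matrix and convert it
dense <- matrix(c(1, 0, 0,
                  0, 2, 0), nrow = 2, byrow = TRUE)
from_dense <- Matrix(dense, sparse = TRUE)

# Route 2: describe only the nonzero entries as (row, column, value) triplets
from_triplets <- sparseMatrix(
  i = c(1, 2),
  j = c(1, 2),
  x = c(1, 2),
  dims = c(2, 3)
)

# Both routes hold the same values in sparse storage
both_sparse <- is(from_dense, "sparseMatrix") && is(from_triplets, "sparseMatrix")
same_values <- all(as.matrix(from_dense) == as.matrix(from_triplets))
```

For large, mostly empty data the triplet route is usually the practical one, since building the dense matrix first may not fit in memory.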