Transporting Sparse Matrix from Python to R

I am doing some text analysis work in Python. Unfortunately, I need to switch to R in order to use a particular package that cannot easily be replicated in Python.

Currently the text is parsed into bigram counts, reduced to a vocabulary of about 11,000 bigrams, and then stored as a dictionary:

{id1: {'bigrams': [(bigram1, count), (bigram2, count), ...]},
 id2: {'bigrams': ...}}

I need to get this into a dgCMatrix in R, where the rows are id1, id2, ... and the columns are the different bigrams, such that each cell holds the 'count' for that id-bigram pair.

Any suggestions? I thought about expanding it just to a massive CSV, but that seems super inefficient plus probably infeasible due to memory constraints.

Suppletory answered 5/6, 2015 at 21:15 Comment(1)
An example with actual values and in greater numbers might be more useful. As it is you are expecting us to do quite a bit of work before even attempting to code. Maybe you fancy Python coders grasp this layout better than this feeble R-coder, but can you please provide more substance? - Perry

Could you write out the matrix in MatrixMarket format using scipy's mmwrite and then read it into R using readMM from the Matrix package?
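
As a rough sketch of the Python side, assuming the dictionary layout from the question: the sample data, the `build_count_matrix` helper, and the output file name below are all made up for illustration. It builds a scipy sparse matrix from (row, column, count) triplets and writes it with `scipy.io.mmwrite`.

```python
import os
import tempfile

from scipy.io import mmread, mmwrite
from scipy.sparse import csr_matrix

# Hypothetical sample data in the layout described in the question.
docs = {
    "id1": {"bigrams": [("new york", 3), ("york city", 1)]},
    "id2": {"bigrams": [("york city", 2), ("city hall", 5)]},
}


def build_count_matrix(docs):
    """Return (sparse matrix, row ids, column bigrams) for the nested dict."""
    row_ids = sorted(docs)
    vocab = sorted({b for d in docs.values() for b, _ in d["bigrams"]})
    col_index = {bigram: j for j, bigram in enumerate(vocab)}

    rows, cols, vals = [], [], []
    for i, doc_id in enumerate(row_ids):
        for bigram, count in docs[doc_id]["bigrams"]:
            rows.append(i)
            cols.append(col_index[bigram])
            vals.append(count)

    # Only the non-zero cells are stored, so memory scales with the
    # number of (id, bigram) pairs rather than rows * columns.
    matrix = csr_matrix((vals, (rows, cols)), shape=(len(row_ids), len(vocab)))
    return matrix, row_ids, vocab


matrix, row_ids, vocab = build_count_matrix(docs)

# Write the matrix in MatrixMarket coordinate format (assumed file name).
out_path = os.path.join(tempfile.mkdtemp(), "bigram_counts.mtx")
mmwrite(out_path, matrix)
```

The MatrixMarket file carries no row or column labels, so `row_ids` and `vocab` would need to be saved separately (for example one per line in a text file) and attached as dimnames in R. On the R side, readMM typically returns a triplet-format sparse matrix, which can be coerced to column-compressed form with something like as(m, "CsparseMatrix").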

Uchish answered 5/6, 2015 at 21:36 Comment(2)
This worked! It isn't a super memory-efficient way of doing it (as far as I can tell), but I managed to get it to run on my computer just fine. - Suppletory
Hopefully it's pretty time efficient! LOL! :) Glad I could help. - Uchish
