I'm working turning a list of records with two columns (A and B) into a matrix representation. I have been using the pivot function within pandas, but the result ends up being fairly large. Does pandas support pivoting into a sparse format? I know I can pivot it and then turn it into some kind of sparse representation, but isn't as elegant as I would like. My end goal is to use it as the input for a predictive model.
Alternatively, is there some kind of sparse pivot capability outside of pandas?
edit: here is an example of a non-sparse pivot
import pandas as pd
frame=pd.DataFrame()
frame['person']=['me','you','him','you','him','me']
frame['thing']=['a','a','b','c','d','d']
frame['count']=[1,1,1,1,1,1]
frame
person thing count
0 me a 1
1 you a 1
2 him b 1
3 you c 1
4 him d 1
5 me d 1
frame.pivot('person','thing')
count
thing a b c d
person
him NaN 1 NaN 1
me 1 NaN NaN 1
you 1 NaN 1 NaN
This creates a matrix that could contain all possible combinations of persons and things, but it is not sparse.
http://docs.scipy.org/doc/scipy/reference/sparse.html
Sparse matrices take up less space because they can imply things like NaN or 0. If I have a very large data set, this pivoting function can generate a matrix that should be sparse due to the large number of NaNs or 0s. I was hoping that I could save a lot of space/memory by generating something that was sparse right off the bat rather than creating a dense matrix and then converting it to sparse.
person
andthing
to integers) – Samples