I would like to specify the dtypes returned when doing pandas.read_sql. In particular, I am interested in saving memory by having float values returned as np.float32 instead of np.float64. I know that I can convert afterwards with astype(np.float32), but that doesn't solve the problem of the large memory requirement of the initial query: in my actual code I will be pulling 84 million rows, not the 5 shown here.

pandas.read_csv allows dtypes to be specified as a dict, but I see no way to do the same with read_sql.
I am using MySQLdb and Python 2.7.
As an aside, read_sql seems to use far more memory while running (about 2x) than it needs for the final DataFrame storage.
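For comparison, this is the kind of interface I mean; read_csv takes a dtype dict up front (the file name here is just for illustration):

import numpy as np
import pandas as pd

# read_csv can be told the column dtypes before anything is allocated.
df = pd.read_csv('train.csv', dtype={'ARP': np.float32, 'ACP': np.float32})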
In [70]: df = pd.read_sql('select ARP, ACP from train where seq < 5', connection)

In [71]: df
Out[71]:
       ARP      ACP
0  1.17915  1.42595
1  1.10578  1.21369
2  1.35629  1.12693
3  1.56740  1.61847
4  1.28060  1.05935

In [72]: df.dtypes
Out[72]:
ARP    float64
ACP    float64
dtype: object
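The only partial workaround I can think of is to pull the data in pieces and downcast each piece as it arrives, which caps the float64 overhead at one chunk's worth instead of all 84 million rows. This is just a sketch and assumes my pandas version's read_sql accepts the chunksize argument; each chunk is still materialized as float64 before it is downcast:

import numpy as np
import pandas as pd

chunks = []
for chunk in pd.read_sql('select ARP, ACP from train', connection,
                         chunksize=1000000):
    # Keep only the float32 copy; the float64 chunk is released when the
    # loop variable is rebound on the next iteration.
    chunks.append(chunk.astype(np.float32))

df = pd.concat(chunks, ignore_index=True)

Ideally I would like to skip this entirely and have read_sql hand back float32 columns directly, the way read_csv does with its dtype dict.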