How to change the dtype of certain columns of a numpy recarray?

Suppose I have a recarray such as the following:

import numpy as np

# example data from @unutbu's answer
recs = [('Bill', '31', 260.0), ('Fred', 15, '145.0')]
r = np.rec.fromrecords(recs, formats = 'S30,i2,f4', names = 'name, age, weight')

print(r)
# [('Bill', 31, 260.0) ('Fred', 15, 145.0)]

Say I want to convert certain columns to floats. How do I do this? Should I convert to an ndarray and then back to a recarray?

Lati asked 30/3, 2012 at 19:38

Here is an example using astype to perform the conversion:

import numpy as np
recs = [('Bill', '31', 260.0), ('Fred', 15, '145.0')]
r = np.rec.fromrecords(recs, formats = 'S30,i2,f4', names = 'name, age, weight')
print(r)
# [('Bill', 31, 260.0) ('Fred', 15, 145.0)]

The age field has dtype <i2:

print(r.dtype)
# [('name', '|S30'), ('age', '<i2'), ('weight', '<f4')]

We can change that to <f4 using astype:

r = r.astype([('name', '|S30'), ('age', '<f4'), ('weight', '<f4')])
print(r)
# [('Bill', 31.0, 260.0) ('Fred', 15.0, 145.0)]
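Because astype returns a new array rather than modifying r in place, the result has to be assigned back, as in r = r.astype(...) above. The sketch below assumes NumPy on Python 3 (so the name field holds bytes) and uses consistent input types; the name r_float is only for illustration. It checks that the original array is left untouched.

```python
import numpy as np

recs = [('Bill', 31, 260.0), ('Fred', 15, 145.0)]
r = np.rec.fromrecords(recs, formats='S30,i2,f4', names='name,age,weight')

# astype builds a new array with the requested dtype; r itself is not modified
r_float = r.astype([('name', 'S30'), ('age', 'f4'), ('weight', 'f4')])

print(r.dtype['age'])                # int16: the original field is unchanged
print(r_float.dtype['age'])          # float32
print(np.shares_memory(r, r_float))  # False: the result is a separate copy
```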
Arsenopyrite answered 30/3, 2012 at 19:45
Thanks! astype is slightly more compact than recreating a new array. I'm assuming it amounts to the same thing in terms of efficiency. I'm posting my solution below as well, since it shows how to modify an existing dtype. – Lati

There are basically two steps: build the new dtype, then cast the data to it. My stumbling block was figuring out how to modify an existing dtype. This is how I did it:

import numpy

# `data` is an existing record array whose first column should become float64
# change dtype by making a whole new array
dt = data.dtype
dt = dt.descr  # this is now a modifiable list; a numpy.dtype can't be modified
# change the type of the first column:
dt[0] = (dt[0][0], 'float64')
dt = numpy.dtype(dt)
# data = numpy.array(data, dtype=dt)  # option 1
data = data.astype(dt)                # option 2
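A self-contained sketch of the same descr-editing idea, applied to the question's recarray. Here the column is looked up by name rather than by position, which avoids accidentally trying to convert the string column. It assumes the fields are plain scalars (no subarray shapes or padding entries in descr).

```python
import numpy as np

recs = [('Bill', 31, 260.0), ('Fred', 15, 145.0)]
data = np.rec.fromrecords(recs, formats='S30,i2,f4', names='name,age,weight')

# dtype objects are immutable, so work on the list of (name, format) pairs
descr = data.dtype.descr
idx = data.dtype.names.index('age')  # find the column by name, not position
descr[idx] = (descr[idx][0], 'float64')

# build a new dtype from the edited list and cast a copy of the data to it
data = data.astype(np.dtype(descr))
print(data.dtype['age'])  # float64
```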
Lati answered 5/4, 2012 at 14:42

Here is a minor refinement of the existing answers, plus an extension to situations where you want to make a change based on the dtype rather than column name (e.g. change all floats to integers).

First, you can make this more concise and readable with a list comprehension:

col       = 'age'
new_dtype = 'float64'

r.astype( [ (col, new_dtype) if d[0] == col else d for d in r.dtype.descr ] )

# rec.array([(b'Bill', 31.0, 260.0), (b'Fred', 15.0, 145.0)], 
#           dtype=[('name', 'S30'), ('age', '<f8'), ('weight', '<f4')])
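To convert several columns at once, the same comprehension can be wrapped in a small helper that takes a mapping from field name to new dtype. The function name with_field_dtypes is hypothetical, not a NumPy API. As a design choice it reads each field's current dtype from arr.dtype.fields instead of descr, so fields that are not being changed are carried over as-is.

```python
import numpy as np

def with_field_dtypes(arr, changes):
    """Return a copy of the structured/record array `arr` with selected fields
    cast to new dtypes. `changes` maps field name -> new dtype; every other
    field keeps its current dtype. (Hypothetical helper, not a NumPy API.)"""
    unknown = set(changes) - set(arr.dtype.names)
    if unknown:
        raise KeyError(f'no such field(s): {sorted(unknown)}')
    fields = arr.dtype.fields  # name -> (dtype, offset[, title])
    new_dtype = [(name, changes.get(name, fields[name][0])) for name in arr.dtype.names]
    return arr.astype(new_dtype)

recs = [('Bill', 31, 260.0), ('Fred', 15, 145.0)]
r = np.rec.fromrecords(recs, formats='S30,i2,f4', names='name,age,weight')

r2 = with_field_dtypes(r, {'age': 'float64', 'weight': 'float64'})
print(r2.dtype['age'], r2.dtype['weight'])  # float64 float64
```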

Second, you can extend this syntax to handle cases where you want to change all floats to integers (or vice versa). For example, to change any 32- or 64-bit float into a 64-bit integer, you could do something like:

old_dtype = ['<f4', '<f8']
new_dtype = 'int64'

r.astype( [ (d[0], new_dtype) if d[1] in old_dtype else d for d in r.dtype.descr ] )

# rec.array([(b'Bill', 31, 260), (b'Fred', 15, 145)], 
#           dtype=[('name', 'S30'), ('age', '<i2'), ('weight', '<i8')])

Note that astype has an optional casting argument that defaults to 'unsafe', so you may want to specify casting='safe' to avoid accidentally losing precision when casting floats to integers; with 'safe', numpy raises an error instead of performing a lossy cast:

r.astype( [ (d[0], new_dtype) if d[1] in old_dtype else d for d in r.dtype.descr ],
          casting='safe' )
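A minimal sketch of the difference, using hypothetical data in which one weight has a fractional part. The default cast silently truncates it, whereas casting='safe' makes NumPy refuse the float-to-integer conversion with a TypeError while still allowing widening casts such as int16 to float64.

```python
import numpy as np

# Hypothetical data: one weight has a fractional part
recs = [('Bill', 31, 260.5), ('Fred', 15, 145.0)]
r = np.rec.fromrecords(recs, formats='S30,i2,f4', names='name,age,weight')

to_int = [('name', 'S30'), ('age', 'i2'), ('weight', 'i8')]
to_float = [('name', 'S30'), ('age', 'f8'), ('weight', 'f8')]

# Default casting='unsafe': the .5 is silently truncated
print(r.astype(to_int)['weight'])  # [260 145]

# casting='safe': float32 -> int64 can lose information, so NumPy refuses
try:
    r.astype(to_int, casting='safe')
except TypeError as exc:
    print('refused:', exc)

# Widening casts (int16 -> float64, float32 -> float64) are safe and succeed
widened = r.astype(to_float, casting='safe')
print(widened['age'])
```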

Refer to the numpy documentation on astype for more on casting and other options.

Also note that, for general cases of changing floats to integers or vice versa, you might prefer to check the general number type with np.issubdtype (e.g. against np.floating or np.integer) rather than checking against multiple specific dtypes.
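One way to do that, sketched under the assumption that you want every floating-point field (of any precision) converted: np.issubdtype(field_dtype, np.floating) replaces the explicit ['<f4', '<f8'] list, so it also matches float16 and big-endian variants. The helper name retype_by_kind is hypothetical.

```python
import numpy as np

def retype_by_kind(arr, kind, new_dtype):
    """Return a copy of `arr` in which every field whose dtype is a subtype of
    the abstract type `kind` (e.g. np.floating, np.integer) is cast to
    `new_dtype`; other fields keep their dtype. (Hypothetical helper.)"""
    fields = arr.dtype.fields
    new_descr = []
    for name in arr.dtype.names:
        old = fields[name][0]
        new_descr.append((name, new_dtype if np.issubdtype(old, kind) else old))
    return arr.astype(new_descr)

recs = [('Bill', 31, 260.0), ('Fred', 15, 145.0)]
r = np.rec.fromrecords(recs, formats='S30,i2,f4', names='name,age,weight')

# all floating-point fields, whatever their precision -> int64
r_int = retype_by_kind(r, np.floating, 'int64')
# and the reverse: all integer fields -> float64
r_flt = retype_by_kind(r, np.integer, 'float64')
```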

Coahuila answered 20/9, 2018 at 11:40

© 2022 - 2024 — McMap. All rights reserved.