How do I add an extra column to a NumPy array?
A

17

449

Given the following 2D array:

a = np.array([
    [1, 2, 3],
    [2, 3, 4],
])

I want to add a column of zeros along the second axis to get:

b = np.array([
    [1, 2, 3, 0],
    [2, 3, 4, 0],
])
Arni answered 13/12, 2011 at 8:36 Comment(0)
B
234

I think a more straightforward solution and faster to boot is to do the following:

import numpy as np
N = 10
a = np.random.rand(N,N)
b = np.zeros((N,N+1))
b[:,:-1] = a

And timings:

In [23]: N = 10

In [24]: a = np.random.rand(N,N)

In [25]: %timeit b = np.hstack((a,np.zeros((a.shape[0],1))))
10000 loops, best of 3: 19.6 us per loop

In [27]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
100000 loops, best of 3: 5.62 us per loop
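Applied to the array from the question, the preallocate-and-assign approach might look like the sketch below. Passing dtype=a.dtype is an addition that is not in the answer above; it is assumed here so that the result stays integer instead of becoming float.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])

# Allocate the result with one extra column, matching a's dtype
# (np.zeros would default to float64 otherwise).
b = np.zeros((a.shape[0], a.shape[1] + 1), dtype=a.dtype)

# Copy a into every column except the last; the last column stays zero.
b[:, :-1] = a

print(b)
```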
Bulge answered 13/12, 2011 at 12:47 Comment(7)
I want to append a (985,1) shape np array to a (985,2) np array to make it a (985,3) np array, but it's not working. I am getting "could not broadcast input array from shape (985) into shape (985,1)" error. What is wrong with my code? Code: np.hstack(data, data1)Revisory
@Outlier you should post a new question rather than ask one in the comments of this one.Bulge
@JoshAdel: I tried your code on ipython, and I think there's a syntax error. You might want to try changing a = np.random.rand((N,N)) to a = np.random.rand(N,N)Scenario
I guess this is an overkill for what OP asked for. Op's answer is apt!Florid
This is just a trick on performing append, or insert, or stack. and should not be accepted as answers. Engineers should consider using the answers below.Eastereasterday
@Outlier what was your solution? i'm getting the same errorRevue
Would be faster if np.empty is used instead of np.zeros.Gladiate
T
496

np.r_[...] and np.c_[...] are useful alternatives to np.vstack and np.hstack. Note that they use square brackets [] instead of parentheses ().

Some examples:

: import numpy as np
: N = 3
: A = np.eye(N)

: np.c_[ A, np.ones(N) ]              # add a column
array([[ 1.,  0.,  0.,  1.],
       [ 0.,  1.,  0.,  1.],
       [ 0.,  0.,  1.,  1.]])

: np.c_[ np.ones(N), A, np.ones(N) ]  # or two
array([[ 1.,  1.,  0.,  0.,  1.],
       [ 1.,  0.,  1.,  0.,  1.],
       [ 1.,  0.,  0.,  1.,  1.]])

: np.r_[ A, [A[1]] ]              # add a row
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  1.,  0.]])
: # not np.r_[ A, A[1] ]

: np.r_[ A[0], 1, 2, 3, A[1] ]    # mix vecs and scalars
  array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])

: np.r_[ A[0], [1, 2, 3], A[1] ]  # lists
  array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])

: np.r_[ A[0], (1, 2, 3), A[1] ]  # tuples
  array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])

: np.r_[ A[0], 1:4, A[1] ]        # same, 1:4 == arange(1,4) == 1,2,3
  array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])

Square brackets are used instead of parentheses because Python only allows slice syntax such as 1:4 inside square brackets, where it is converted to a slice object.
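For the array in the question, a sketch along these lines should work. The names zeros, appended and prepended are illustrative only, and dtype=a.dtype is an assumption made to keep the result integer.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])

# A 1-D array of length a.shape[0] is treated as a single column by np.c_.
zeros = np.zeros(a.shape[0], dtype=a.dtype)

appended = np.c_[a, zeros]    # zeros as the last column
prepended = np.c_[zeros, a]   # zeros as the first column

print(appended)
print(prepended)
```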

Towline answered 14/12, 2011 at 13:56 Comment(6)
just was looking for information about this, and definitively this is a better answer than the accepted one, because it covers adding an extra column at the beginning and at the end, not just at the end as the other answersBugg
@Ay0 Exactly, I was looking for a way to add a bias unit to my artificial neuronal network in batch on all layers at once, and this is the perfect answer.Inelegant
And what if you want to add n columns in a time?Disc
@Riley, can you give an example please ? Python 3 has "iterable unpacking", e.g. np.c_[ * iterable ]; see expression-lists .Towline
What does "Python expands e.g. 1:4 in square -- the wonders of overloading." mean?Benefic
@Alex, operator overloading means that you can define [] -- an operator, thing[index] -- to do just about anything you want. In this case, np.r_[ 1:4 ] works because r_ handles slices (np.array( 1:4 ) is a syntax error). TL;DR: SO questions/tagged/python+operator-overloadingTowline
S
209

Use numpy.append:

>>> a = np.array([[1,2,3],[2,3,4]])
>>> a
array([[1, 2, 3],
       [2, 3, 4]])

>>> z = np.zeros((2,1), dtype=np.int64)
>>> z
array([[0],
       [0]])

>>> np.append(a, z, axis=1)
array([[1, 2, 3, 0],
       [2, 3, 4, 0]])
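The axis argument is doing real work in that call. As a small sketch of the difference: when axis is omitted, np.append flattens both inputs before joining them, so the 2D shape is lost. The variable names with_axis and flattened are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])
z = np.zeros((2, 1), dtype=np.int64)

# With axis=1 the column is attached to the right-hand side.
with_axis = np.append(a, z, axis=1)

# Without axis, both arrays are flattened first and the result is 1-D.
flattened = np.append(a, z)

print(with_axis.shape)   # (2, 4)
print(flattened.shape)   # (8,)
```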
Selmore answered 19/12, 2013 at 18:23 Comment(3)
This is nice when inserting more complicated columns.Wideangle
This is more straightforward than the answer by @JoshAdel, but when dealing with large data sets, it is slower. I'd pick between the two depending on the importance of readability.Panoply
append actually just calls concatenateCorcyra
A
78

One way, using hstack, is:

b = np.hstack((a, np.zeros((a.shape[0], 1), dtype=a.dtype)))
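To illustrate what the dtype argument to np.zeros changes, the sketch below compares the two calls on the integer array from the question. The names float_result and int_result are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

# np.zeros defaults to float64, so stacking upcasts the whole result.
float_result = np.hstack((a, np.zeros((a.shape[0], 1))))

# Matching the dtype keeps the result in a's integer type.
int_result = np.hstack((a, np.zeros((a.shape[0], 1), dtype=a.dtype)))

print(float_result.dtype)  # float64
print(int_result.dtype)    # same as a.dtype
```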
Arni answered 13/12, 2011 at 8:42 Comment(4)
I think this is the most elegant solution.Tonedeaf
+1 - this is how I would do it - you beat me to posting it as an answer :).Revise
Remove the dtype parameter, it is not needed and even not allowed. While your solution is elegant enough, pay attention not to use it if you need to "append" frequently to an array. If you cannot create the whole array at once and fill it later, create a list of arrays and hstack it all at once.Garonne
@Garonne I'm not sure how I managed to get the dtype at the wrong location, but the np.zeros needs a dtype to avoid everything becoming float (while a is int)Arni
R
74

I was also interested in this question and compared the speed of

numpy.c_[a, a]
numpy.stack([a, a]).T
numpy.vstack([a, a]).T
numpy.ascontiguousarray(numpy.stack([a, a]).T)               
numpy.ascontiguousarray(numpy.vstack([a, a]).T)
numpy.column_stack([a, a])
numpy.concatenate([a[:,None], a[:,None]], axis=1)
numpy.concatenate([a[None], a[None]], axis=0).T

which all do the same thing for any input vector a. Timings for growing a:

[Figure: runtime of each variant plotted against len(a)]

Note that all non-contiguous variants (in particular stack/vstack) are eventually faster than all contiguous variants. column_stack (for its clarity and speed) appears to be a good option if you require contiguity.
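Contiguity can be checked directly through the flags attribute. The following is a minimal sketch with a short vector; the names transposed, stacked and copied are illustrative.

```python
import numpy as np

a = np.random.rand(5)

# Transposing a stacked array gives a (5, 2) view that is not C-contiguous.
transposed = np.stack([a, a]).T

# column_stack builds a new (5, 2) array in C order.
stacked = np.column_stack([a, a])

print(transposed.flags["C_CONTIGUOUS"])  # False
print(stacked.flags["C_CONTIGUOUS"])     # True

# ascontiguousarray copies the view into C order when needed.
copied = np.ascontiguousarray(transposed)
print(copied.flags["C_CONTIGUOUS"])      # True
```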


Code to reproduce the plot:

import numpy as np
import perfplot

b = perfplot.bench(
    setup=np.random.rand,
    kernels=[
        lambda a: np.c_[a, a],
        lambda a: np.ascontiguousarray(np.stack([a, a]).T),
        lambda a: np.ascontiguousarray(np.vstack([a, a]).T),
        lambda a: np.column_stack([a, a]),
        lambda a: np.concatenate([a[:, None], a[:, None]], axis=1),
        lambda a: np.ascontiguousarray(np.concatenate([a[None], a[None]], axis=0).T),
        lambda a: np.stack([a, a]).T,
        lambda a: np.vstack([a, a]).T,
        lambda a: np.concatenate([a[None], a[None]], axis=0).T,
    ],
    labels=[
        "c_",
        "ascont(stack)",
        "ascont(vstack)",
        "column_stack",
        "concat",
        "ascont(concat)",
        "stack (non-cont)",
        "vstack (non-cont)",
        "concat (non-cont)",
    ],
    n_range=[2 ** k for k in range(23)],
    xlabel="len(a)",
)
b.save("out.png")
Rand answered 24/10, 2016 at 12:21 Comment(7)
Nice graph! Just thought you'd like to know that under the hood, stack, hstack, vstack, column_stack, dstack are all helper functions built on top of np.concatenate. By tracing through the definition of stack I found that np.stack([a,a]) is calling np.concatenate([a[None], a[None]], axis=0). It might be nice to add np.concatenate([a[None], a[None]], axis=0).T to the perfplot to show that np.concatenate can always be at least as fast as its helper functions.Palladio
@Palladio Added that.Nahshunn
Nice library, never heard of it! Interesting enough that I got just the same plots except that stack and concat have changed places (in both ascont and non-cont variants). Plus concat-column and column_stack swapped as well.Oread
Wow, love these plots !Palanquin
It seems that for a recursive operation of appending a column to an array, e.g. b = [b, a], some of the command do not work (an error about unequal dimensions is raised). The only two that seem to work with arrays of unequal size (i.e. when one is a matrix and another one is a 1d vector) are c_ and column_stackFortuna
All these stack variants end up using concatenate. They just tweak the dimensions first.Litigate
Note that the image is generated in the same folder as the code file. The image doesn't show up automatically.Villain
D
57

I find the following most elegant:

b = np.insert(a, 3, values=0, axis=1) # Insert values before column 3

An advantage of insert is that it also allows you to insert columns (or rows) at other places inside the array. Also instead of inserting a single value you can easily insert a whole vector, for instance duplicate the last column:

b = np.insert(a, 3, values=a[:,2], axis=1)

Which leads to:

array([[1, 2, 3, 3],
       [2, 3, 4, 4]])

In these timings, insert is slower than both hstack and JoshAdel's solution:

In [1]: N = 10

In [2]: a = np.random.rand(N,N)

In [3]: %timeit b = np.hstack((a, np.zeros((a.shape[0], 1))))
100000 loops, best of 3: 7.5 µs per loop

In [4]: %timeit b = np.zeros((a.shape[0], a.shape[1]+1)); b[:,:-1] = a
100000 loops, best of 3: 2.17 µs per loop

In [5]: %timeit b = np.insert(a, 3, values=0, axis=1)
100000 loops, best of 3: 10.2 µs per loop
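As a sketch of the positional flexibility described above, the same call can place the zero column at the front, in the middle, or at the end of the question's array. np.insert returns a new array and leaves a unchanged. The names front, middle and end are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

front = np.insert(a, 0, values=0, axis=1)           # before column 0
middle = np.insert(a, 1, values=0, axis=1)          # before column 1
end = np.insert(a, a.shape[1], values=0, axis=1)    # after the last column

print(front)
print(middle)
print(end)
print(a.shape)  # (2, 3): the input is not modified
```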
Depreciation answered 13/8, 2013 at 11:10 Comment(2)
This is pretty neat. Too bad I can't do insert(a, -1, ...) to append the column. Guess I'll just prepend it instead.Wideangle
@ThomasAhle You can append a row or column by getting the size in that axis using a.shape[axis]. I. e. for appending a row, you do np.insert(a, a.shape[0], 999, axis=0) and for a column, you do np.insert(a, a.shape[1], 999, axis=1).Scriptorium
V
35

I think:

np.column_stack((a, np.zeros(a.shape[0])))

is more elegant.
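One thing that sets column_stack apart is that it accepts a 1-D array as a column, whereas hstack rejects the same pair of inputs. A small sketch follows; col and hstack_failed are illustrative names, and dtype=a.dtype is an assumption to keep the result integer.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])
col = np.zeros(a.shape[0], dtype=a.dtype)   # shape (2,), 1-D

# column_stack turns the 1-D array into a column before joining.
b = np.column_stack((a, col))
print(b)

# hstack does not reshape, so mixing 2-D and 1-D inputs raises an error.
hstack_failed = False
try:
    np.hstack((a, col))
except ValueError as exc:
    hstack_failed = True
    print("hstack rejected the 1-D column:", exc)
```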

Vertebra answered 26/9, 2013 at 17:9 Comment(0)
W
15

Assuming M is a (100,3) ndarray and y is a (100,) ndarray, append can be used as follows:

M=numpy.append(M,y[:,None],1)

The trick is to use

y[:, None]

This converts y to a (100, 1) 2D array.

M.shape

now gives

(100, 4)
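The reshaping step can be checked on its own. The sketch below uses placeholder data for M and y (zeros and ones, chosen only for illustration) and shows two other spellings that produce the same (100, 1) shape.

```python
import numpy as np

M = np.zeros((100, 3))   # placeholder data
y = np.ones(100)         # placeholder data

print(y.shape)                  # (100,)
print(y[:, None].shape)         # (100, 1)
print(y[:, np.newaxis].shape)   # (100, 1), np.newaxis is an alias for None
print(y.reshape(-1, 1).shape)   # (100, 1)

M2 = np.append(M, y[:, None], axis=1)
print(M2.shape)                 # (100, 4)
```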
Wheezy answered 13/3, 2017 at 14:25 Comment(1)
You are a hero you know that?! That's precisely what I's pulling my hair for the past 1 hour! Ty!Rompers
G
14

Add an extra column to a numpy array:

NumPy's np.append function takes three arguments: the first two are the arrays to join (here both 2D), and the third, axis, specifies the axis along which to append:

import numpy as np  
x = np.array([[1,2,3], [4,5,6]]) 
print("Original x:") 
print(x) 

y = np.array([[1], [1]]) 
print("Original y:") 
print(y) 

print("y appended to x on axis of 1:") 
print(np.append(x, y, axis=1)) 

Prints:

Original x:
[[1 2 3]
 [4 5 6]]
Original y:
[[1]
 [1]]
y appended to x on axis of 1:
[[1 2 3 1]
 [4 5 6 1]]
Girlie answered 2/7, 2019 at 15:40 Comment(2)
Note you are appending y to x here rather than appending x to y - that is why the column vector of y is to the right of the columns of x in the result.Aideaidedecamp
I updated the answer to reflect Brian's coment. "x appended to y" → "y appended to x"Bonine
S
13

np.concatenate also works

>>> a = np.array([[1,2,3],[2,3,4]])
>>> a
array([[1, 2, 3],
       [2, 3, 4]])
>>> z = np.zeros((2,1))
>>> z
array([[ 0.],
       [ 0.]])
>>> np.concatenate((a, z), axis=1)
array([[ 1.,  2.,  3.,  0.],
       [ 2.,  3.,  4.,  0.]])
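The result above is float because z is float. One possible way to keep the integer dtype, sketched here as a substitution and not as part of the answer, is to build the column with np.zeros_like on a one-column slice of a.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

# a[:, :1] keeps two dimensions, so zeros_like gives a (2, 1) column
# with the same dtype as a.
z = np.zeros_like(a[:, :1])

b = np.concatenate((a, z), axis=1)
print(b)
print(b.dtype == a.dtype)  # True
```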
Sculpsit answered 27/1, 2016 at 0:54 Comment(1)
np.concatenate seems to be 3 times faster than np.hstack for 2x1, 2x2 and 2x3 matrices. np.concatenate was also very slightly faster than copying the matrices manually into an empty matrix in my experiments. That's consistent with Nico Schlömer's answer below.Pastor
S
9

np.insert also serves the purpose.

matA = np.array([[1,2,3], 
                 [2,3,4]])
idx = 3
new_col = np.array([0, 0])
np.insert(matA, idx, new_col, axis=1)

array([[1, 2, 3, 0],
       [2, 3, 4, 0]])

It inserts values (here new_col) before a given index (here idx) along one axis. In other words, the inserted values occupy column idx, and the columns originally at and after idx shift one position to the right.

Shipyard answered 3/1, 2018 at 16:53 Comment(1)
Note that insert is not in place as one could assume given the name of the function (see docs linked in the answer).Malissa
B
8

I like JoshAdel's answer because of the focus on performance. A minor improvement is to avoid initializing the whole array with zeros, only for most of it to be overwritten. The difference is measurable when N is large: use empty instead of zeros, and write the column of zeros as a separate step:

In [1]: import numpy as np

In [2]: N = 10000

In [3]: a = np.ones((N,N))

In [4]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
1 loops, best of 3: 492 ms per loop

In [5]: %timeit b = np.empty((a.shape[0],a.shape[1]+1)); b[:,:-1] = a; b[:,-1] = np.zeros((a.shape[0],))
1 loops, best of 3: 407 ms per loop
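A compact form of the same idea is sketched below on a small array. Because np.empty leaves its contents uninitialized, every element has to be written before the result is used. Assigning the scalar 0 to the last column relies on broadcasting instead of building a separate zeros vector.

```python
import numpy as np

a = np.ones((4, 3))

# Uninitialized buffer with one extra column.
b = np.empty((a.shape[0], a.shape[1] + 1), dtype=a.dtype)

b[:, :-1] = a   # fill the original columns
b[:, -1] = 0    # the scalar broadcasts down the new column

print(b)
```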
Burletta answered 28/12, 2013 at 19:35 Comment(1)
You can use broadcasting to fill the last column with zeros (or any other value), which might be more readable: b[:,-1] = 0. Also, with very large arrays, the performance difference to np.insert() becomes negligible, which might make np.insert() more desirable due to its succinctness.Scriptorium
A
5

For me, the following approach looks pretty intuitive and simple.

zeros = np.zeros((2,1)) #2 is a number of rows in your array.   
b = np.hstack((a, zeros))
Allyce answered 23/6, 2019 at 18:29 Comment(0)
B
4

A bit late to the party, but nobody posted this answer yet, so for the sake of completeness: you can do this with a list comprehension, on a plain Python list:

source = a.tolist()
result = [row + [0] for row in source]
b = np.array(result)
Burke answered 11/12, 2014 at 17:11 Comment(0)
L
3

In my case, I had to add a column of ones to a NumPy array

X = array([ 6.1101, 5.5277, ... ])
X.shape => (97,)
m = X.shape[0]  # number of rows, 97 here
X = np.concatenate((np.ones((m,1), dtype=int), X.reshape(m,1)), axis=1)

After X.shape => (97, 2)

array([[ 1. , 6.1101],
       [ 1. , 5.5277],
...
Lurette answered 15/9, 2017 at 8:4 Comment(0)
C
2

There is a function specifically for this. It is called numpy.pad

a = np.array([[1,2,3], [2,3,4]])
b = np.pad(a, ((0, 0), (0, 1)), mode='constant', constant_values=0)
print(b)
[[1 2 3 0]
 [2 3 4 0]]
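The pad_width argument is read as ((rows_before, rows_after), (cols_before, cols_after)), so the same call can also add a column on the left or on both sides. A brief sketch, with illustrative variable names:

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

# No rows added; one column after, before, or on both sides.
right = np.pad(a, ((0, 0), (0, 1)), mode="constant", constant_values=0)
left = np.pad(a, ((0, 0), (1, 0)), mode="constant", constant_values=0)
both = np.pad(a, ((0, 0), (1, 1)), mode="constant", constant_values=-1)

print(right)
print(left)
print(both)
```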

Here is what it says in the docstring:

Pads an array.

Parameters
----------
array : array_like of rank N
    Input array
pad_width : {sequence, array_like, int}
    Number of values padded to the edges of each axis.
    ((before_1, after_1), ... (before_N, after_N)) unique pad widths
    for each axis.
    ((before, after),) yields same before and after pad for each axis.
    (pad,) or int is a shortcut for before = after = pad width for all
    axes.
mode : str or function
    One of the following string values or a user supplied function.

    'constant'
        Pads with a constant value.
    'edge'
        Pads with the edge values of array.
    'linear_ramp'
        Pads with the linear ramp between end_value and the
        array edge value.
    'maximum'
        Pads with the maximum value of all or part of the
        vector along each axis.
    'mean'
        Pads with the mean value of all or part of the
        vector along each axis.
    'median'
        Pads with the median value of all or part of the
        vector along each axis.
    'minimum'
        Pads with the minimum value of all or part of the
        vector along each axis.
    'reflect'
        Pads with the reflection of the vector mirrored on
        the first and last values of the vector along each
        axis.
    'symmetric'
        Pads with the reflection of the vector mirrored
        along the edge of the array.
    'wrap'
        Pads with the wrap of the vector along the axis.
        The first values are used to pad the end and the
        end values are used to pad the beginning.
    <function>
        Padding function, see Notes.
stat_length : sequence or int, optional
    Used in 'maximum', 'mean', 'median', and 'minimum'.  Number of
    values at edge of each axis used to calculate the statistic value.

    ((before_1, after_1), ... (before_N, after_N)) unique statistic
    lengths for each axis.

    ((before, after),) yields same before and after statistic lengths
    for each axis.

    (stat_length,) or int is a shortcut for before = after = statistic
    length for all axes.

    Default is ``None``, to use the entire axis.
constant_values : sequence or int, optional
    Used in 'constant'.  The values to set the padded values for each
    axis.

    ((before_1, after_1), ... (before_N, after_N)) unique pad constants
    for each axis.

    ((before, after),) yields same before and after constants for each
    axis.

    (constant,) or int is a shortcut for before = after = constant for
    all axes.

    Default is 0.
end_values : sequence or int, optional
    Used in 'linear_ramp'.  The values used for the ending value of the
    linear_ramp and that will form the edge of the padded array.

    ((before_1, after_1), ... (before_N, after_N)) unique end values
    for each axis.

    ((before, after),) yields same before and after end values for each
    axis.

    (constant,) or int is a shortcut for before = after = end value for
    all axes.

    Default is 0.
reflect_type : {'even', 'odd'}, optional
    Used in 'reflect', and 'symmetric'.  The 'even' style is the
    default with an unaltered reflection around the edge value.  For
    the 'odd' style, the extended part of the array is created by
    subtracting the reflected values from two times the edge value.

Returns
-------
pad : ndarray
    Padded array of rank equal to `array` with shape increased
    according to `pad_width`.

Notes
-----
.. versionadded:: 1.7.0

For an array with rank greater than 1, some of the padding of later
axes is calculated from padding of previous axes.  This is easiest to
think about with a rank 2 array where the corners of the padded array
are calculated by using padded values from the first axis.

The padding function, if used, should return a rank 1 array equal in
length to the vector argument with padded values replaced. It has the
following signature::

    padding_func(vector, iaxis_pad_width, iaxis, kwargs)

where

    vector : ndarray
        A rank 1 array already padded with zeros.  Padded values are
        vector[:pad_tuple[0]] and vector[-pad_tuple[1]:].
    iaxis_pad_width : tuple
        A 2-tuple of ints, iaxis_pad_width[0] represents the number of
        values padded at the beginning of vector where
        iaxis_pad_width[1] represents the number of values padded at
        the end of vector.
    iaxis : int
        The axis currently being calculated.
    kwargs : dict
        Any keyword arguments the function requires.

Examples
--------
>>> a = [1, 2, 3, 4, 5]
>>> np.pad(a, (2,3), 'constant', constant_values=(4, 6))
array([4, 4, 1, 2, 3, 4, 5, 6, 6, 6])

>>> np.pad(a, (2, 3), 'edge')
array([1, 1, 1, 2, 3, 4, 5, 5, 5, 5])

>>> np.pad(a, (2, 3), 'linear_ramp', end_values=(5, -4))
array([ 5,  3,  1,  2,  3,  4,  5,  2, -1, -4])

>>> np.pad(a, (2,), 'maximum')
array([5, 5, 1, 2, 3, 4, 5, 5, 5])

>>> np.pad(a, (2,), 'mean')
array([3, 3, 1, 2, 3, 4, 5, 3, 3])

>>> np.pad(a, (2,), 'median')
array([3, 3, 1, 2, 3, 4, 5, 3, 3])

>>> a = [[1, 2], [3, 4]]
>>> np.pad(a, ((3, 2), (2, 3)), 'minimum')
array([[1, 1, 1, 2, 1, 1, 1],
       [1, 1, 1, 2, 1, 1, 1],
       [1, 1, 1, 2, 1, 1, 1],
       [1, 1, 1, 2, 1, 1, 1],
       [3, 3, 3, 4, 3, 3, 3],
       [1, 1, 1, 2, 1, 1, 1],
       [1, 1, 1, 2, 1, 1, 1]])

>>> a = [1, 2, 3, 4, 5]
>>> np.pad(a, (2, 3), 'reflect')
array([3, 2, 1, 2, 3, 4, 5, 4, 3, 2])

>>> np.pad(a, (2, 3), 'reflect', reflect_type='odd')
array([-1,  0,  1,  2,  3,  4,  5,  6,  7,  8])

>>> np.pad(a, (2, 3), 'symmetric')
array([2, 1, 1, 2, 3, 4, 5, 5, 4, 3])

>>> np.pad(a, (2, 3), 'symmetric', reflect_type='odd')
array([0, 1, 1, 2, 3, 4, 5, 5, 6, 7])

>>> np.pad(a, (2, 3), 'wrap')
array([4, 5, 1, 2, 3, 4, 5, 1, 2, 3])

>>> def pad_with(vector, pad_width, iaxis, kwargs):
...     pad_value = kwargs.get('padder', 10)
...     vector[:pad_width[0]] = pad_value
...     vector[-pad_width[1]:] = pad_value
...     return vector
>>> a = np.arange(6)
>>> a = a.reshape((2, 3))
>>> np.pad(a, 2, pad_with)
array([[10, 10, 10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10, 10, 10],
       [10, 10,  0,  1,  2, 10, 10],
       [10, 10,  3,  4,  5, 10, 10],
       [10, 10, 10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10, 10, 10]])
>>> np.pad(a, 2, pad_with, padder=100)
array([[100, 100, 100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100, 100, 100],
       [100, 100,   0,   1,   2, 100, 100],
       [100, 100,   3,   4,   5, 100, 100],
       [100, 100, 100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100, 100, 100]])
Celery answered 19/3, 2018 at 7:31 Comment(1)
Is np.pad a new function? I'm surprised this hasn't been upvoted more.Sesqui
M
1

I liked this:

new_column = np.zeros((len(a), 1))
b = np.block([a, new_column])
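For reference, a sketch of how np.block arranges its inputs: blocks in a flat list are joined along the last axis (columns, for 2D arrays), while nested lists stack the inner results along the second-to-last axis (rows). The dtype=a.dtype argument and the names new_row and c are assumptions added for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])
new_column = np.zeros((len(a), 1), dtype=a.dtype)
new_row = np.zeros((1, a.shape[1]), dtype=a.dtype)

# Flat list: blocks are placed side by side.
b = np.block([a, new_column])

# Nested lists: each inner list is one block row.
c = np.block([[a], [new_row]])

print(b)
print(c)
```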
Moneymaking answered 10/10, 2020 at 18:24 Comment(0)
