Creating a new column from two columns with apply()
I want to create a column s['C'] using apply() on a pandas DataFrame.

My dataset is similar to this:

[In]:

s=pd.DataFrame({'A':['hello', 'good', 'my', 'pandas','wrong'], 
                'B':[['all', 'say', 'hello'],
                     ['good', 'for', 'you'], 
                     ['so','hard'], 
                     ['pandas'],
                     []]})
[Out]: 
    A       B
0   hello   [all, say, hello]
1   good    [good, for, you]
2   my      [so, hard]
3   pandas  [pandas]
4   wrong   []

I need to create a column s['C'] in which each row holds a list of ones and zeros: one entry per element of the list in column B, with 1 where that element equals the word in column A and 0 otherwise. My output should look like this:

[Out]: 
    A       B                   C
0   hello   [all, say, hello]   [0, 0, 1]
1   good    [good, for, you]    [1, 0, 0]
2   my      [so, hard]          [0, 0]
3   pandas  [pandas]            [1]
4   wrong   []                  [0]

I've been trying with a function and apply, but I still haven't figured out where the error is.

[In]:
def func(valueA,listB):
  new_list=[]
  for i in listB:
    if listB[i] == valueA:
      new_list.append(1)
    else:
      new_list.append(0)
  return new_list

s['C']=s.apply( lambda x: func(x.loc[:,'A'], x.loc[:,'B']))

The error is: Too many indexers

And I also tried with:

[In]:
list=[]
listC=[]
for i in s['A']:
  for j in s['B'][i]:
     if s['A'][i] == s['B'][i][j]:
        list.append(1)
     else:
        list.append(0)
  listC.append(list)

s['C']=listC

The error is: KeyError: 'hello'

Any suggestions?

Constrictor answered 11/5, 2020 at 15:51 Comment(2)
Are the lists necessary? You can organize this with a MultiIndex where the first level is your original index and the second level is your list index. Then all of these manipulations become far more efficient. - Welford
@ALollz, what you said is interesting. Have you got an example? My GitHub username is Ignacio-Ibarra, thanks. - Constrictor
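For reference, the MultiIndex layout suggested in the comment could look roughly like the sketch below. It is only one possible reading of that suggestion: the names long_df, pos and per_row are made up for illustration, explode needs pandas 0.25+, and rows whose list is empty simply disappear from the long format.

```python
import pandas as pd

s = pd.DataFrame({'A': ['hello', 'good', 'my', 'pandas', 'wrong'],
                  'B': [['all', 'say', 'hello'],
                        ['good', 'for', 'you'],
                        ['so', 'hard'],
                        ['pandas'],
                        []]})

# One row per list element; the empty list becomes NaN and is dropped.
long_df = s.explode('B').dropna(subset=['B'])

# Second index level = position of the element inside its original list.
long_df = (
    long_df
    .assign(pos=long_df.groupby(level=0).cumcount())
    .set_index('pos', append=True)
)

# The comparison is now a plain vectorized column operation.
long_df['C'] = long_df['A'].eq(long_df['B']).astype(int)

# If the list form is needed again, collect the flags per original row.
per_row = long_df.groupby(level=0)['C'].agg(list)
```

In this layout, long_df.loc[(0, 2), 'C'] addresses the flag for the third word of the first row directly, without unpacking any Python lists.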
E
9

If you are working with pandas 0.25+, explode is an option:

(s.explode('B')
  .assign(C=lambda x: x['A'].eq(x['B']).astype(int))
  .groupby(level=0).agg({'A':'first','B':list,'C':list})
)

Output:

        A                  B          C
0   hello  [all, say, hello]  [0, 0, 1]
1    good   [good, for, you]  [1, 0, 0]
2      my         [so, hard]     [0, 0]
3  pandas           [pandas]        [1]
4   wrong              [nan]        [0]

Option 2: Based on your logic, you can do a list comprehension. This should work with any version of pandas:

s['C'] = [[x==a for x in b] if b else [0] for a,b in zip(s['A'],s['B'])]

Output:

        A                  B                     C
0   hello  [all, say, hello]  [False, False, True]
1    good   [good, for, you]  [True, False, False]
2      my         [so, hard]        [False, False]
3  pandas           [pandas]                [True]
4   wrong                 []                   [0]
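
The comprehension above stores booleans. If the 0/1 integers from the question are required, wrapping the comparison in int() should be enough. This is a small variation on Option 2, not a different technique:

```python
import pandas as pd

s = pd.DataFrame({'A': ['hello', 'good', 'my', 'pandas', 'wrong'],
                  'B': [['all', 'say', 'hello'],
                        ['good', 'for', 'you'],
                        ['so', 'hard'],
                        ['pandas'],
                        []]})

# int(True) == 1 and int(False) == 0; an empty list in B falls back to [0].
s['C'] = [[int(x == a) for x in b] if b else [0]
          for a, b in zip(s['A'], s['B'])]
```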
Endospore answered 11/5, 2020 at 15:55 Comment(0)
D
5

With apply, it would be:

s['c'] = s.apply(lambda x: [int(x.A == i) for i in x.B], axis=1)
s
        A                  B          c
0   hello  [all, say, hello]  [0, 0, 1]
1    good   [good, for, you]  [1, 0, 0]
2      my         [so, hard]     [0, 0]
3  pandas           [pandas]        [1]
4   wrong                 []         []
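
Note that the last row comes out as [] rather than the [0] shown in the question. If that matters, one possible tweak (an addition to the answer as posted) is to fall back to [0] whenever the comprehension produces an empty list:

```python
import pandas as pd

s = pd.DataFrame({'A': ['hello', 'good', 'my', 'pandas', 'wrong'],
                  'B': [['all', 'say', 'hello'],
                        ['good', 'for', 'you'],
                        ['so', 'hard'],
                        ['pandas'],
                        []]})

# An empty list is falsy, so "or [0]" only replaces the result for empty B.
s['C'] = s.apply(
    lambda row: [int(row['A'] == item) for item in row['B']] or [0],
    axis=1,
)
```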
Disject answered 11/5, 2020 at 16:28 Comment(0)
K
2

Another approach that requires numpy for easy indexing:

import numpy as np

def create_vector(word, vector):

    out = np.zeros(len(vector))
    indices = [i for i, x in enumerate(vector) if x == word]
    out[indices] = 1

    return out.astype(int)


s['C'] = s.apply(lambda x: create_vector(x.A, x.B), axis=1)

# Output
#      A        B                   C
# 0    hello    [all, say, hello]   [0, 0, 1]
# 1    good     [good, for, you]    [1, 0, 0]
# 2    my       [so, hard]          [0, 0]
# 3    pandas   [pandas]            [1]
# 4    wrong    []                  []
Kalevala answered 11/5, 2020 at 16:6 Comment(0)
E
2

I could get your function to work with some minor changes:

def func(valueA, listB):
    new_list = []
    for i in range(len(listB)):  # iterate over the indices of listB, not over its elements
        if listB[i] == valueA:
            new_list.append(1)
        else:
            new_list.append(0)
    return new_list

and by passing axis=1 to apply, so that the function receives one row at a time:

s['C'] = s.apply(lambda x: func(x.A, x.B), axis=1)
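
To see why axis=1 matters, here is a small sketch (the variable names are illustrative). By default, apply hands the function one column at a time, so x is a one-dimensional Series and x.loc[:, 'A'] asks for two indexers, which is most likely where "Too many indexers" comes from. The second attempt fails for a related reason: for i in s['A'] yields the words themselves, so s['B'][i] looks up the label 'hello'.

```python
import pandas as pd

s = pd.DataFrame({'A': ['hello', 'good', 'my', 'pandas', 'wrong'],
                  'B': [['all', 'say', 'hello'],
                        ['good', 'for', 'you'],
                        ['so', 'hard'],
                        ['pandas'],
                        []]})

# Default axis=0: the function sees each column, identified by its name.
labels_axis0 = s.apply(lambda col: col.name)

# axis=1: the function sees each row, identified by its index label.
labels_axis1 = s.apply(lambda row: row.name, axis=1)

# Same logic as func, iterating over the elements directly.
def func(valueA, listB):
    return [1 if item == valueA else 0 for item in listB]

s['C'] = s.apply(lambda row: func(row['A'], row['B']), axis=1)

# The explicit loop, repaired: pair the values with zip and start a
# fresh list for every row instead of sharing one list across rows.
list_c = []
for value_a, list_b in zip(s['A'], s['B']):
    row_flags = []
    for item in list_b:
        row_flags.append(1 if item == value_a else 0)
    list_c.append(row_flags)
```

Both versions leave the last row as an empty list, the same as the apply call above.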
Eolande answered 11/5, 2020 at 16:15 Comment(0)
