How do I resolve a positional index error in Python while applying a condition?

I have the following data:

Name    Sensex_index    Start_Date       End_Date
AAA        0.5           20/08/2016    25/09/2016 
AAA        0.8           26/08/2016    29/08/2016 
AAA        0.4           30/08/2016    31/08/2016
AAA        0.9           01/09/2016    05/09/2016
AAA        0.5           12/09/2016    22/09/2016
AAA        0.3           24/09/2016    29/09/2016
ABC        0.9           01/01/2017    15/01/2017
ABC        0.5           23/01/2017    30/01/2017
ABC        0.7           02/02/2017    15/03/2017

Within each name, whenever the Sensex index falls and then rises again (high to low, then low to high), the period terminates at the low row: that row's End_Date becomes the Termination_Date for every row in the period, and the next row starts a new period. The last row of each name also ends a period. I want to find the actual start and termination dates from the data above, so I am looking for the following output:

Name   Sensex_index  Actual_Start      Termination_Date 
AAA        0.5        20/08/2016          31/08/2016
AAA        0.8        20/08/2016          31/08/2016
AAA        0.4        20/08/2016          31/08/2016 [high to low; low to high,terminate]
AAA        0.9        01/09/2016          29/09/2016
AAA        0.5        01/09/2016          29/09/2016      
AAA        0.3        01/09/2016          29/09/2016 [end of AAA]
ABC        0.9        01/01/2017          30/01/2017  
ABC        0.5        01/01/2017          30/01/2017 [high to low; low to high,terminate]
ABC        0.7        02/02/2017          15/03/2017 [end of ABC]

I use the following code, which worked before, but now I get an index error:

#Find the rows where price change from high to low and then to high
df['change'] = df.groupby('Name')['Sensex_index'].apply(lambda x: x.rolling(3,center=True).apply(lambda y: True if (y[1]<y[0] and y[1]<y[2]) else False))
#Find the last row for each name
df.iloc[df.groupby('Name')['change'].tail(1).index, -1] = 1.0        
#Set End_Date as Termination_Date for those changing points
df['Termination_Date'] = df.apply(lambda x: x.End_Date if x.change>0 else np.nan, axis=1)
#Set Actual_Start
df['Actual_Start'] = df.apply(lambda x: x.Start_Date if (x.name == 0
                                                      or x.Name != df.iloc[x.name-1]['Name']
                                                      or df.iloc[x.name-1]['change'] > 0)
                                                 else np.nan, axis=1)
#back fill the Termination_Date for other rows.
df.Termination_Date.fillna(method='bfill', inplace=True)
#forward fill the Actual_Start for other rows.
df.Actual_Start.fillna(method='ffill', inplace=True)
print(df)

I get the following error:

 File "/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.py", line 1554, in _is_valid_list_like
raise IndexError("positional indexers are out-of-bounds")

Index Error!

IndexError: positional indexers are out-of-bounds
Arianaariane answered 7/7, 2017 at 9:6 Comment(0)

You have probably overwritten your df somewhere. I rebuilt it from your sample data:

import io

import numpy as np  # needed by the code from the question
import pandas as pd

tsv = """Name    Sensex_index    Start_Date       End_Date
AAA        0.5           20/08/2016    25/09/2016 
AAA        0.8           26/08/2016    29/08/2016 
AAA        0.4           30/08/2016    31/08/2016
AAA        0.9           01/09/2016    05/09/2016
AAA        0.5           12/09/2016    22/09/2016
AAA        0.3           24/09/2016    29/09/2016
ABC        0.9           01/01/2017    15/01/2017
ABC        0.5           23/01/2017    30/01/2017
ABC        0.7           02/02/2017    15/03/2017
"""

df = pd.read_table(io.StringIO(tsv), sep=r"\s+")

Then I copy-pasted your code and got no error, just this df:

  Name  Sensex_index  Start_Date    End_Date  change Termination_Date  \
0  AAA           0.5  20/08/2016  25/09/2016     NaN       31/08/2016   
1  AAA           0.8  26/08/2016  29/08/2016     0.0       31/08/2016   
2  AAA           0.4  30/08/2016  31/08/2016     1.0       31/08/2016   
3  AAA           0.9  01/09/2016  05/09/2016     0.0       29/09/2016   
4  AAA           0.5  12/09/2016  22/09/2016     0.0       29/09/2016   
5  AAA           0.3  24/09/2016  29/09/2016     1.0       29/09/2016   
6  ABC           0.9  01/01/2017  15/01/2017     NaN       30/01/2017   
7  ABC           0.5  23/01/2017  30/01/2017     1.0       30/01/2017   
8  ABC           0.7  02/02/2017  15/03/2017     1.0       15/03/2017   

  Actual_Start  
0   20/08/2016  
1   20/08/2016  
2   20/08/2016  
3   01/09/2016  
4   01/09/2016  
5   01/09/2016  
6   01/01/2017  
7   01/01/2017  
8   02/02/2017

Just recreate your dataframe and you should be good.
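
One way the index could end up wrong, offered as a guess rather than a diagnosis: the line `df.iloc[df.groupby('Name')['change'].tail(1).index, -1] = 1.0` passes index labels to `iloc`, which expects positions. That only works while the index runs 0..n-1. The sketch below uses a made-up three-row frame with a shifted index (the values and the offset of 100 are assumptions for illustration) to trigger an IndexError, then shows `reset_index(drop=True)` making labels and positions agree again.

```python
import pandas as pd

# Hypothetical toy frame whose index starts at 100 instead of 0,
# as it might after filtering, slicing, or reading a later chunk.
df = pd.DataFrame(
    {"Name": ["AAA", "AAA", "ABC"], "change": [0.0, 0.0, 0.0]},
    index=[100, 101, 102],
)

# Index labels of the last row of each Name group: 101 and 102.
last_labels = df.groupby("Name")["change"].tail(1).index

raised = False
try:
    # iloc treats 101 and 102 as positions, but the frame has only 3 rows.
    df.iloc[last_labels, -1] = 1.0
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

# After resetting the index, labels and positions coincide again.
df = df.reset_index(drop=True)
last_labels = df.groupby("Name")["change"].tail(1).index
df.iloc[last_labels, -1] = 1.0
print(df)
```

If this is what happened, calling `df = df.reset_index(drop=True)` right before the code from the question should have the same effect as recreating the dataframe.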

Chacon answered 7/7, 2017 at 15:35 Comment(3)
Hello! What do you mean by an overwritten df? I have 5 GB of data and I am trying to use chunksize=100000 and pass in the values, and it still shows the index error. - Arianaariane
I meant that maybe somewhere between the place where you read your data into the df variable and the place where you run the code above, you modified df in a way that left its index incorrect. - Chacon
But chunksize is additional information you didn't give in your question: when you use chunksize, df is no longer a DataFrame object but an iterator over DataFrames, and that might give you the IndexError exception. Assign the output of read_csv(..., chunksize=100000) to a variable chunks and then take a chunk with df = next(chunks), or loop over it with for df in chunks - see the docs here. - Chacon
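
A minimal sketch of the chunked pattern described in the comment above, under a few assumptions: the data is read from an in-memory string rather than the real 5 GB file, `chunksize=4` stands in for 100000, and the per-chunk step is only a placeholder. With `chunksize` set, `read_csv` returns an iterator of DataFrames, and each chunk keeps the row labels it had in the full file, so resetting the index per chunk keeps label-as-position code such as the `iloc` line in the question within bounds.

```python
import io

import pandas as pd

# Stand-in for the large file: the first rows of the question's data.
data = """Name Sensex_index Start_Date End_Date
AAA 0.5 20/08/2016 25/09/2016
AAA 0.8 26/08/2016 29/08/2016
AAA 0.4 30/08/2016 31/08/2016
AAA 0.9 01/09/2016 05/09/2016
AAA 0.5 12/09/2016 22/09/2016
AAA 0.3 24/09/2016 29/09/2016
"""

# With chunksize set, read_csv returns an iterator of DataFrames.
chunks = pd.read_csv(io.StringIO(data), sep=r"\s+", chunksize=4)

first_labels = []
row_counts = []
for chunk in chunks:
    # Later chunks keep the row labels from the full file (4, 5, ...).
    first_labels.append(chunk.index[0])
    # Reset so that labels equal positions before any label-as-position code.
    df = chunk.reset_index(drop=True)
    row_counts.append(len(df))
    # ... run the per-chunk processing on df here (placeholder) ...

print(first_labels, row_counts)
```

A group such as AAA can be split across two chunks, so per-chunk results may need to be stitched together at the chunk boundaries.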
