Converting a list of tuples to a Pandas series
A

7

16

I have a list of tuples which I want to convert to a Series.

return array2

[(0, 0.07142857142857142),
  (0, 0.07142857142857142),
  (1, 0.08333333333333333),
  (1, 0.3333333333333333),
  (1, 0.3333333333333333),
  (1, 0.08333333333333333),
  (3, 0.058823529411764705),
  (3, 0.058823529411764705)]

I attempt to do this by converting the list to a dictionary and then to a Series:

 a = pd.Series(dict(array2))

The resulting Series, however, doesn't behave as I need it to. It seems to drop key:value pairs (possibly arbitrarily?).

E.g.

return a

 0    0.071429
 1    0.083333
 3    0.058824

How would I obtain a series without dropping any key value pairs?

Alton answered 18/11, 2018 at 17:34 Comment(0)
C
8

Use the DataFrame constructor, set the first column as the index with set_index, then select the second column to get a Series:

a = pd.DataFrame(array2).set_index(0)[1]
print (a)
0
0    0.071429
0    0.071429
1    0.083333
1    0.333333
1    0.333333
1    0.083333
3    0.058824
3    0.058824
Name: 1, dtype: float64

Or create two lists and pass them to the Series constructor:

idx = [x[0] for x in array2]
vals = [x[1] for x in array2]

a = pd.Series(vals, index=idx)
print (a)
0    0.071429
0    0.071429
1    0.083333
1    0.333333
1    0.333333
1    0.083333
3    0.058824
3    0.058824
dtype: float64
Constitute answered 18/11, 2018 at 17:37 Comment(1)
HI, if I want a column of this tuples only... How to do that @ConstituteBuchanan
C
26

Using zip and sequence unpacking:

idx, values = zip(*array2)

a = pd.Series(values, idx)

With duplicate indices, as in your data, dict will not help as duplicate dictionary keys are not permitted: dict will only take the last value for every key supplied.
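
To illustrate both points, here is a minimal sketch using a shortened sample in the same shape as the question's data (the name `pairs` is made up for this example). The `*` unpacks the list so that each tuple is passed to zip as a separate argument; zip then groups the first elements together and the second elements together.

```python
import pandas as pd

# Shortened sample in the same shape as the question's data (hypothetical name).
pairs = [(0, 0.1), (0, 0.2), (1, 0.3), (1, 0.4)]

# dict keeps only the last value seen for each key, so rows are lost.
as_dict = dict(pairs)
print(as_dict)  # {0: 0.2, 1: 0.4}

# zip(*pairs) is the same as zip((0, 0.1), (0, 0.2), (1, 0.3), (1, 0.4)):
# it yields one tuple of all first elements and one of all second elements.
idx, values = zip(*pairs)
print(idx)     # (0, 0, 1, 1)
print(values)  # (0.1, 0.2, 0.3, 0.4)

# Without the *, zip would receive a single argument (the list itself)
# and yield four 1-tuples, so unpacking into two names would fail.
a = pd.Series(values, index=idx)
print(len(a))  # 4
```

All four rows survive because the keys are passed as an index rather than as dictionary keys.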

Curlpaper answered 18/11, 2018 at 18:0 Comment(1)
Can you elaborate on why the * is needed, it appears the * is required @CurlpaperCamacho
B
4

The problem is that when you convert a list of tuples to a dictionary, Python drops all duplicate keys and only uses the last value for each key. This is necessary since each key can only appear once in a dictionary. So you need to use a method that preserves all the records. This will do that:

df = pd.DataFrame.from_records(array2, columns=['key', 'val'])
df = df.set_index('key')
a = df['val']

Example:

import pandas as pd
array2 = [
    (0, 0.07142857142857142),
    (0, 0.07142857142857142),
    (1, 0.08333333333333333),
    (1, 0.3333333333333333),
    (1, 0.3333333333333333),
    (1, 0.08333333333333333),
    (3, 0.058823529411764705),
    (3, 0.058823529411764705)
]

df = pd.DataFrame.from_records(array2, columns=['key', 'val'])
df = df.set_index('key')
a = df['val']
print(a)
# key
# 0    0.071429
# 0    0.071429
# 1    0.083333
# 1    0.333333
# 1    0.333333
# 1    0.083333
# 3    0.058824
# 3    0.058824
# Name: val, dtype: float64
Backed answered 18/11, 2018 at 17:42 Comment(0)
E
4

You can use np.transpose to unpack the columns, then make a pd.Series:

import numpy as np
import pandas as pd

x, y = np.transpose(array2)
pd.Series(y, x)
Ephesian answered 23/3, 2021 at 22:37 Comment(3)
Is this faster than pd.Series.T (transpose)?Cranford
Or, you could use x, y = zip(*array2) to save having to load Numpy. Also, this preserves the type of the index (int).Trajan
Oh. That has already been suggested by @jpp. Sorry.Trajan
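
The index dtype difference mentioned in the comments can be checked with a short sketch. The sample data and variable names below are assumptions made for illustration, not taken from the answer.

```python
import numpy as np
import pandas as pd

# Small sample in the same shape as the question's data (hypothetical name).
pairs = [(0, 0.1), (0, 0.2), (1, 0.3)]

# np.transpose first builds one array from the mixed int/float tuples,
# which upcasts everything to float, including the keys.
x, y = np.transpose(pairs)
via_numpy = pd.Series(y, x)

# zip keeps the keys as Python ints, so pandas builds an integer index.
idx, values = zip(*pairs)
via_zip = pd.Series(values, index=idx)

print(via_numpy.index.dtype)  # float64
print(via_zip.index.dtype)    # int64
```

If the integer keys matter later (for example when looking up by label), the zip version avoids a conversion step.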
N
1

Using MultiIndex

pd.MultiIndex.from_tuples(array2).to_frame()[1].reset_index(level=1, drop=True)
Out[79]: 
0    0.071429
0    0.071429
1    0.083333
1    0.333333
1    0.333333
1    0.083333
3    0.058824
3    0.058824
Name: 1, dtype: float64
Noletta answered 18/11, 2018 at 18:22 Comment(1)
out of box solution :)Constitute
I
0

Assuming your list of tuples is

tuples = [(0, 0.07142857142857142),
  (0, 0.07142857142857142),
  (1, 0.08333333333333333),
  (1, 0.3333333333333333),
  (1, 0.3333333333333333),
  (1, 0.08333333333333333),
  (3, 0.058823529411764705),
  (3, 0.058823529411764705)]

I would use (explicit is better than implicit):

pd.Series([value for _, value in tuples], index=[index for index, _ in tuples])

However, I would also reconsider whether a Series is an appropriate and meaningful format for this data: an index is meant to work like the keys of a dict, mapping each unique label to one value.
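
As a rough illustration of what duplicate labels mean in practice, the sketch below uses a shortened, made-up sample (`pairs`) rather than the full data. pandas does allow repeated labels, but label-based lookups then behave differently depending on how often a label occurs. The aggregation at the end is only one possible way to get unique keys and is an assumption about what the data represents.

```python
import pandas as pd

# Shortened sample with a repeated key (hypothetical data).
pairs = [(0, 0.1), (0, 0.2), (1, 0.3)]
s = pd.Series([value for _, value in pairs], index=[key for key, _ in pairs])

print(s.index.is_unique)  # False

# A label that occurs more than once selects several rows, not a scalar.
print(s.loc[0].tolist())  # [0.1, 0.2]

# A label that occurs exactly once returns a scalar.
print(s.loc[1])  # 0.3

# One option if unique keys are wanted: aggregate per key (here, the mean).
per_key = s.groupby(level=0).mean()
print(per_key.index.is_unique)  # True
```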

Inadmissible answered 4/11, 2021 at 11:35 Comment(0)
B
0

While this is not a direct answer, it is sometimes easier to create a pd.DataFrame directly from the list of tuples instead of creating a pd.Series. This is especially true if you need to work with multiple series later.

If you skip creating a Series, both the left and right values of each tuple become DataFrame columns, rather than one becoming the index and the other a column. This avoids some problems, such as duplicate index keys, Nones, or NaNs in the tuple data.

Also, the code for creating DataFrame is shorter and easier to read.

Here is an example of how to go directly to DataFrame:

# Prepare data as list of (value, timestamp) tuples
# Keep the order in mind
tvl_data = [(ps.total_equity, ps.calculated_at) for ps in state.stats.portfolio]
volume_data = [(t.get_volume(), t.executed_at) for t in state.portfolio.get_all_trades()]

# Convert to DataFrames without index
tvl = pd.DataFrame(tvl_data, columns=["tvl", "timestamp"])
volume = pd.DataFrame(volume_data, columns=["volume", "timestamp"])

# Merge DataFrames, index to a common index
df = pd.concat([tvl, volume]).set_index("timestamp")
display(df)


If you do not do this and your data is not clean, you are likely to encounter:

ValueError: cannot reindex on an axis with duplicate labels
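
One way to trigger an error of this kind is to reindex a Series whose index contains repeated labels. The sketch below uses made-up data; the exact message wording varies between pandas versions, so the code only checks that a ValueError is raised.

```python
import pandas as pd

# A Series with a repeated index label (hypothetical data).
s = pd.Series([0.1, 0.2, 0.3], index=[0, 0, 1])

raised = False
try:
    # Reindexing to a different set of labels is ambiguous when a label repeats.
    s.reindex([0, 1, 2])
except ValueError as exc:
    raised = True
    print(exc)

# Keeping the keys as an ordinary column leaves the default unique index.
df = pd.DataFrame({"key": [0, 0, 1], "val": [0.1, 0.2, 0.3]})
print(df.index.is_unique)  # True
```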
Bailey answered 13/3 at 18:34 Comment(0)
