Converting a list of tuples to a Pandas series

Asked 18/11, 2018 at 17:34 Answered 13/3, 2024 at 18:34

Solved python pandas dictionary tuples series

I have a list of tuples which I want to convert to a Series.

return array2

[(0, 0.07142857142857142),
  (0, 0.07142857142857142),
  (1, 0.08333333333333333),
  (1, 0.3333333333333333),
  (1, 0.3333333333333333),
  (1, 0.08333333333333333),
  (3, 0.058823529411764705),
  (3, 0.058823529411764705)]

I attempt to do this by converting the list to a dictionary and then to a Series:

 a = pd.Series(dict(array2))

The resulting Series, however, doesn't behave as I need it to: it seems to drop key:value pairs (possibly arbitrarily?).

E.g.

return a

 0    0.071429
 1    0.083333
 3    0.058824

How would I obtain a series without dropping any key value pairs?

Alton answered 18/11, 2018 at 17:34 Comment(0)

Use DataFrame constructor with set_index by first column, then select second column for Series:

a = pd.DataFrame(array2).set_index(0)[1]
print (a)
0
0    0.071429
0    0.071429
1    0.083333
1    0.333333
1    0.333333
1    0.083333
3    0.058824
3    0.058824
Name: 1, dtype: float64

Or create two lists and pass them to the Series constructor:

idx = [x[0] for x in array2]
vals = [x[1] for x in array2]

a = pd.Series(vals, index=idx)
print (a)
0    0.071429
0    0.071429
1    0.083333
1    0.333333
1    0.333333
1    0.083333
3    0.058824
3    0.058824
dtype: float64

Constitute answered 18/11, 2018 at 17:37 Comment(1)

HI, if I want a column of this tuples only... How to do that @Constitute – Buchanan 24/3, 2021 at 14:6
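One possible reading of the comment above is that each tuple should stay intact as a single cell. A minimal sketch under that assumption (the sample data and the column name `pair` are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data; `pair` is an illustrative column name.
array2 = [(0, 0.1), (0, 0.2), (1, 0.3)]

# Passing the list directly gives an object-dtype Series whose
# elements are the original tuples, on a default 0..n-1 index.
tuple_series = pd.Series(array2)

# The same idea as a single DataFrame column.
tuple_frame = pd.DataFrame({"pair": array2})

print(tuple_series)
```

Because the tuples are stored as Python objects, vectorised numeric operations will not apply to such a column.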

Using zip and sequence unpacking:

# array2 is the list of tuples from the question
idx, values = zip(*array2)

a = pd.Series(values, index=idx)

With duplicate indices, as in your data, dict will not help as duplicate dictionary keys are not permitted: dict will only take the last value for every key supplied.
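A small sketch of both points, using made-up sample data (`pairs` is a hypothetical name): `dict` silently keeps the last value per key, while `zip(*pairs)` passes each tuple to `zip` as a separate argument, so the first elements end up grouped together and the second elements grouped together.

```python
import pandas as pd

# Hypothetical sample data with a duplicated key, mirroring the question.
pairs = [(0, 0.1), (0, 0.2), (1, 0.3)]

# dict() keeps only the last value seen for each key.
as_dict = dict(pairs)        # {0: 0.2, 1: 0.3}

# The * unpacks the list, so this is zip((0, 0.1), (0, 0.2), (1, 0.3)).
# Without the *, zip would receive one argument (the whole list)
# and simply wrap each tuple in another tuple.
idx, values = zip(*pairs)    # idx == (0, 0, 1); values == (0.1, 0.2, 0.3)

s = pd.Series(values, index=idx)
print(len(as_dict), len(s))  # 2 3
```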

Curlpaper answered 18/11, 2018 at 18:0 Comment(1)

Can you elaborate on why the * is needed, it appears the * is required @Curlpaper – Camacho 29/9, 2022 at 22:48

The problem is that when you convert a list of tuples to a dictionary, Python drops all duplicate keys and only uses the last value for each key. This is necessary since each key can only appear once in a dictionary. So you need to use a method that preserves all the records. This will do that:

df = pd.DataFrame.from_records(array2, columns=['key', 'val'])
df = df.set_index('key')
a = df['val']

Example:

import pandas as pd
array2 = [
    (0, 0.07142857142857142),
    (0, 0.07142857142857142),
    (1, 0.08333333333333333),
    (1, 0.3333333333333333),
    (1, 0.3333333333333333),
    (1, 0.08333333333333333),
    (3, 0.058823529411764705),
    (3, 0.058823529411764705)
]

df = pd.DataFrame.from_records(array2, columns=['key', 'val'])
df = df.set_index('key')
a = df['val']
print(a)
# key
# 0    0.071429
# 0    0.071429
# 1    0.083333
# 1    0.333333
# 1    0.333333
# 1    0.083333
# 3    0.058824
# 3    0.058824
# Name: val, dtype: float64

Backed answered 18/11, 2018 at 17:42 Comment(0)

You can use np.transpose to unpack the columns, then make a pd.Series:

import numpy as np
import pandas as pd

x, y = np.transpose(array2)
pd.Series(y, x)
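One caveat that may be worth checking: np.transpose first builds a single NumPy array, so integer keys mixed with float values are upcast to float. A short sketch with made-up data (`pairs` is a hypothetical name) comparing the resulting index dtype with the zip approach:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: integer keys, float values.
pairs = [(0, 0.1), (0, 0.2), (1, 0.3)]

# np.transpose converts the list to one float array before splitting it,
# so the keys come back as floats (0.0, 0.0, 1.0).
x, y = np.transpose(pairs)
via_numpy = pd.Series(y, index=x)

# zip keeps the keys as Python ints.
keys, vals = zip(*pairs)
via_zip = pd.Series(vals, index=keys)

print(via_numpy.index.dtype, via_zip.index.dtype)
```

If the float index is unwanted, something like `via_numpy.index.astype(int)` should restore integer labels, assuming every key is a whole number.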

Ephesian answered 23/3, 2021 at 22:37 Comment(3)

Is this faster than pd.Series.T (transpose)? – Cranford 24/3, 2021 at 2:8

Or, you could use x, y = zip(*array2) to save having to load Numpy. Also, this preserves the type of the index (int). – Trajan 24/3, 2021 at 2:51

Oh. That has already been suggested by @jpp. Sorry. – Trajan 24/3, 2021 at 4:14

Using MultiIndex

# array2 is the list of tuples from the question
pd.MultiIndex.from_tuples(array2).to_frame()[1].reset_index(level=1, drop=True)
Out[79]:
0    0.071429
0    0.071429
1    0.083333
1    0.333333
1    0.333333
1    0.083333
3    0.058824
3    0.058824
Name: 1, dtype: float64
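The one-liner can be easier to follow when split into steps. This is only a sketch of how the chain appears to work, with made-up data and hypothetical intermediate names:

```python
import pandas as pd

# Hypothetical sample data with a duplicated key.
pairs = [(0, 0.1), (0, 0.2), (1, 0.3)]

# Step 1: each tuple becomes one entry of a two-level index.
mi = pd.MultiIndex.from_tuples(pairs)

# Step 2: to_frame() copies the levels into columns 0 and 1
# while keeping the MultiIndex as the frame's index.
frame = mi.to_frame()

# Step 3: take column 1 (the values), then drop index level 1
# so that only the keys remain as the index.
result = frame[1].reset_index(level=1, drop=True)

print(result)
```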

Noletta answered 18/11, 2018 at 18:22 Comment(1)

out of box solution :) – Constitute 18/11, 2018 at 18:23

Assuming your list of tuples is

tuples = [(0, 0.07142857142857142),
  (0, 0.07142857142857142),
  (1, 0.08333333333333333),
  (1, 0.3333333333333333),
  (1, 0.3333333333333333),
  (1, 0.08333333333333333),
  (3, 0.058823529411764705),
  (3, 0.058823529411764705)]

I would use (explicit is better than implicit):

pd.Series([value for _, value in tuples], index=[index for index, _ in tuples])

However, I would also reconsider whether a Series is an appropriate and meaningful format for this data: an index is meant to behave like a dict, mapping each unique label to a single value.
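A quick way to see what duplicate labels imply, sketched with made-up data (`pairs` is a hypothetical name): pandas accepts the duplicates, but a label lookup then returns several rows, and aggregating per key is one way to get back to a unique index.

```python
import pandas as pd

# Hypothetical sample data with a duplicated key.
pairs = [(0, 0.1), (0, 0.2), (1, 0.3)]
s = pd.Series([value for _, value in pairs],
              index=[index for index, _ in pairs])

# pandas allows duplicate labels, but reports the index as non-unique.
print(s.index.is_unique)  # False

# Looking up a duplicated label returns every matching row.
dup = s.loc[0]

# Aggregating per key (here: a sum, chosen arbitrarily) yields a unique index.
totals = s.groupby(level=0).sum()
```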

Inadmissible answered 4/11, 2021 at 11:35 Comment(0)

While this is not a direct answer, it is sometimes easier to create a pd.DataFrame directly from the list of tuples instead of creating a pd.Series. This is especially true if you need to work with multiple series later.

If you skip creating the Series, both the left and right tuple values become DataFrame columns, instead of an index and a column. This avoids problems such as duplicate index keys, Nones, and NaNs in the tuple data.

Also, the code for creating DataFrame is shorter and easier to read.

Here is an example of how to go directly to DataFrame:

# Prepare data as list of (value, timestamp) tuples
# Keep the order in mind
tvl_data = [(ps.total_equity, ps.calculated_at) for ps in state.stats.portfolio]
volume_data = [(t.get_volume(), t.executed_at) for t in state.portfolio.get_all_trades()]

# Convert to DataFrames without index
tvl = pd.DataFrame(tvl_data, columns=["tvl", "timestamp"])
volume = pd.DataFrame(volume_data, columns=["volume", "timestamp"])

# Merge DataFrames, index to a common index
df = pd.concat([tvl, volume]).set_index("timestamp")
display(df)

If you do not do this and your data is not clean, you are likely to encounter:

ValueError: cannot reindex on an axis with duplicate labels
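A minimal sketch of one way this kind of error can arise, using made-up data (the names `pairs`, `key`, and `val` are illustrative): reindexing a Series whose index has duplicate labels raises a ValueError, while keeping the keys as an ordinary column leaves the default index unique. The exact message text may differ between pandas versions.

```python
import pandas as pd

# Hypothetical sample data with a duplicated key.
pairs = [(0, 0.1), (0, 0.2), (1, 0.3)]
keys, vals = zip(*pairs)
s = pd.Series(vals, index=keys)

# Reindexing is ambiguous when a label occurs more than once.
try:
    s.reindex([0, 1])
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Keeping the keys as a regular column sidesteps the problem:
# the frame gets a default, unique RangeIndex.
df = pd.DataFrame(pairs, columns=["key", "val"])
print(df.index.is_unique)  # True
```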

Bailey answered 13/3, 2024 at 18:34 Comment(0)
