Floor or ceiling of a pandas Series in Python?

7

119

I have a pandas Series, series. To get the element-wise floor or ceiling, is there a built-in method, or do I have to write a function and use apply? I ask because the data is big, so efficiency matters. Also, this question has not been asked with respect to the pandas package.

Ellington answered 21/12, 2014 at 18:32 Comment(0)
154

You can use NumPy's built-in functions for this: np.ceil(series) or np.floor(series).

Both return a Series object (not an array) so the index information is preserved.
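As a minimal sketch of the point above, the following uses a small made-up series (the values and the string index are assumptions for illustration) to check that both functions return a Series and keep the index:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a non-default index
series = pd.Series([1.2, -1.5, 3.7], index=["a", "b", "c"])

floored = np.floor(series)  # element-wise floor
ceiled = np.ceil(series)    # element-wise ceiling

print(type(floored).__name__)  # Series
print(floored.tolist())        # [1.0, -2.0, 3.0]
print(ceiled.tolist())         # [2.0, -1.0, 4.0]
print(list(floored.index))     # ['a', 'b', 'c']
```

Note that the result keeps a float dtype; add .astype(int) afterwards if integers are needed and there are no missing values.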

Cavan answered 21/12, 2014 at 18:37 Comment(3)
How can I chain this, like using round() on a pandas Series? - Sevier
This works: dataframe["new_series"] = np.ceil(dataframe[series].round(0)) - Gadolinium
@iamyojimbo, if you want to chain the method, use pipe(): df['column'].pipe(np.ceil) - Telophase
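A short sketch of the pipe() suggestion from the comments, assuming a made-up DataFrame with a column named "column" (the data is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"column": [1.24, 2.51, 3.94]})

# Round to one decimal first, then take the ceiling, in a single chain
result = df["column"].round(1).pipe(np.ceil)

print(result.tolist())  # [2.0, 3.0, 4.0]
```

pipe(np.ceil) simply calls np.ceil on the Series, so the result is the same as wrapping the whole expression in np.ceil(...).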
38

I am the OP, but I tried this and it worked:

np.floor(series)
Ellington answered 21/12, 2014 at 18:37 Comment(0)
23

UPDATE: THIS ANSWER IS WRONG, DO NOT DO THIS

Explanation: using Series.apply() with a natively vectorized NumPy function makes no sense in most cases, as it will run the NumPy function in a Python loop, leading to much worse performance. You'd be much better off using np.floor(series) directly, as suggested by several other answers.

You could do something like this using NumPy's floor, for instance, with a dataframe:

floored_data = data.apply(np.floor)

I can't test it right now, but a working solution should not be far from this.

Cephalalgia answered 21/12, 2014 at 18:36 Comment(2)
If you use the argument raw=True then the df.apply function will achieve much better performance. - Omphale
The apply function is not a vectorized implementation, so it will be very slow. You can call np.floor directly on the dataframe. - Detruncate
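For reference, here is a small sketch (with a made-up DataFrame) checking that the three spellings discussed in this answer and its comments produce the same values. It makes no claim about timings, which depend on the pandas version and the data, so measure on your own data if performance matters:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
data = pd.DataFrame({"a": [1.5, -2.5], "b": [0.1, 3.9]})

direct = np.floor(data)                    # NumPy function called on the whole frame
via_apply = data.apply(np.floor)           # the approach from this answer
via_raw = data.apply(np.floor, raw=True)   # the raw=True variant from the comments

print(direct.equals(via_apply))  # True
print(direct.equals(via_raw))    # True
```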
19

With pd.Series.clip, you can set a floor via clip(lower=x) or ceiling via clip(upper=x):

import pandas as pd

s = pd.Series([-1, 0, -5, 3])

print(s.clip(lower=0))
# 0    0
# 1    0
# 2    0
# 3    3
# dtype: int64
    
print(s.clip(upper=0))
# 0   -1
# 1    0
# 2   -5
# 3    0
# dtype: int64

pd.Series.clip also allows more general use, e.g. applying a floor and a ceiling simultaneously with s.clip(-1, 1).

NOTE: Answer originally referred to clip_lower / clip_upper which were removed in pandas 1.0.0.

Sade answered 14/1, 2019 at 0:5 Comment(4)
I can see that it's worth mentioning clip et al. as somewhat related functions, but clipping a value is a very different operation to finding the floor/ceiling of that value... - Cavan
@AlexRiley, Point taken; the reason I mention it is there are situations where the upper and lower bounds are variables from an external input, and you may want (for example) to use s.clip(-1, np.inf) or s.clip(-np.inf, 1) to handle those situations seamlessly. - Sade
It was more that I was puzzled by what the OP wants to do given the wording of their question (and indeed the OP's own answer). Unless I'm missing something, if you have s = pd.Series([3.1, 2.2, 5.6]) there does not seem to be a way to use clip alone to compute the same result as np.floor(s) for example. (Not to detract from your answer which is well-written and useful information - I was just confused when I reread the question.) - Cavan
@AlexRiley, Ah, now rereading the question is confusing me also. I came by the question when googling "how to floor a Pandas series". So I'll leave this answer for others who reach the somewhat ambiguous question via the same route! - Sade
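To make the distinction from the comments concrete, this sketch reuses the series s = pd.Series([3.1, 2.2, 5.6]) mentioned there and compares rounding down with clipping (the bound of 3 is an arbitrary choice for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([3.1, 2.2, 5.6])

# Mathematical floor: round each value down to a whole number
print(np.floor(s).tolist())      # [3.0, 2.0, 5.0]

# Clipping: enforce a lower or upper bound, leaving other values unchanged
print(s.clip(lower=3).tolist())  # [3.1, 3.0, 5.6]
print(s.clip(upper=3).tolist())  # [3.0, 2.2, 3.0]
```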
10

The pinned answer is already the fastest. Here I provide some alternatives for ceiling and floor using pure pandas, and compare them with the NumPy approach.

import numpy as np
import pandas as pd

series = pd.Series(np.random.normal(100, 20, 1000000))

Floor

%timeit np.floor(series) # 1.65 ms ± 18.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit series.astype(int) # 2.2 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit (series-0.5).round(0) # 3.1 ms ± 47 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit round(series-0.5,0) # 2.83 ms ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

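Why does astype(int) work here? Converting a float to an integer truncates toward zero, which equals the floor for non-negative values. The sample data is centered at 100, so negative values are extremely rare; for a negative value, astype(int) would round up instead.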
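A small check of the astype(int) shortcut on mixed-sign data (hypothetical values, not taken from the benchmark above) suggests that it only agrees with np.floor for non-negative inputs, because the cast truncates toward zero:

```python
import numpy as np
import pandas as pd

# Hypothetical values including negatives
s = pd.Series([2.7, -2.7, 0.5, -0.5])

print(np.floor(s).tolist())              # [2.0, -3.0, 0.0, -1.0]
print(s.astype(int).tolist())            # [2, -2, 0, 0]   (truncation, not floor)
print(np.floor(s).astype(int).tolist())  # [2, -3, 0, -1]  (floor, then integer dtype)
```

If integer output is wanted, flooring first and casting afterwards gives the floor for both signs.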

Ceil

%timeit np.ceil(series) # 1.67 ms ± 21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit (series+0.5).round(0) # 3.15 ms ± 46.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit round(series+0.5,0) # 2.99 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

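So yeah, just use the NumPy functions. They are the fastest, and the rounding variants above are not exact for values that are already whole numbers.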

Courtenay answered 24/9, 2021 at 2:19 Comment(1)
(series+0.5).round(0) is not ceil: for a whole number such as 1, round(1 + 0.5) == 2 - Diffident
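The objection in the comment can be checked with a few whole numbers (made-up values). Series.round follows NumPy's round-half-to-even rule, so the shifted-rounding tricks disagree with np.ceil and np.floor on some inputs that are already integers:

```python
import numpy as np
import pandas as pd

# Hypothetical whole-number inputs, where ceil and floor should change nothing
whole = pd.Series([1.0, 2.0, 3.0, 4.0])

print(np.ceil(whole).tolist())          # [1.0, 2.0, 3.0, 4.0]
print((whole + 0.5).round(0).tolist())  # [2.0, 2.0, 4.0, 4.0]

print(np.floor(whole).tolist())         # [1.0, 2.0, 3.0, 4.0]
print((whole - 0.5).round(0).tolist())  # [0.0, 2.0, 2.0, 4.0]
```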
1

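For non-negative values, you can calculate the floor without using numpy by calling:

series.astype(int)

This conversion to int truncates toward zero, so it matches the floor only when no value is negative. It also changes the dtype to int and fails if the series contains NaN.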

Octuple answered 5/1, 2024 at 15:21 Comment(0)
0

The existing answers are limited: when input_series contains missing values such as pd.NA, they either raise an error or handle them incorrectly.

You can correctly handle these cases with

# setup
import numpy as np
import pandas as pd

input_series = pd.Series([pd.NA, pd.NA, 3, 4, 5.4, pd.NA, 5.3, 7])

# floor all non-nans in input
mask_nan = input_series.isna()
input_series.where(mask_nan, np.floor(input_series[~mask_nan]))

# gives [<NA>, <NA>, 3, 4, 5, <NA>, 5, 7]

Important:

  • use pandas .isna() to stay future-proof against the ongoing pandas NA dtype changes
  • use pandas where, not np.where, so we can operate on a subset for the replacement series
  • np.floor() for speed
Kus answered 12/1, 2024 at 19:51 Comment(0)
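For comparison, here is a sketch of the simpler case where the series has a plain float64 dtype and missing values are stored as NaN rather than pd.NA (the data is made up). In that case np.floor can be called directly and the NaN entries stay NaN; the masking approach above is aimed at data that contains pd.NA:

```python
import numpy as np
import pandas as pd

# Hypothetical float64 series with NaN as the missing-value marker
float_series = pd.Series([np.nan, 3.0, 5.4, np.nan, 7.0])

floored = np.floor(float_series)

print(floored.isna().tolist())    # [True, False, False, True, False]
print(floored.dropna().tolist())  # [3.0, 5.0, 7.0]
```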
