To extend the accepted answer: apply calls can be particularly expensive, and the same task can be accomplished without them by constructing a numpy array of empty lists from scratch.
import numpy as np
import pandas as pd

isna = df['x'].isna()
df.loc[isna, 'x'] = pd.Series([[]] * isna.sum()).values
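For instance, on a toy frame (the data here is illustrative, not from the question), something like this shows the fill in action. One caveat worth a comment: [[]] * n repeats the same list object n times, so the filled cells all share one list; if they need to be independent, use a comprehension like [[] for _ in range(n)] instead.

df = pd.DataFrame({'x': [1.0, np.nan, 2.0, np.nan]})
isna = df['x'].isna()
df.loc[isna, 'x'] = pd.Series([[]] * isna.sum()).values

print(df['x'].tolist())  # [1.0, [], 2.0, []]
print(df['x'].dtype)     # object: the list elements force an upcast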
A quick timing comparison:
def empty_assign_1(s):
    s[s.isna()].apply(lambda x: [])

def empty_assign_2(s):
    [[]] * s.isna().sum()
series = pd.Series(np.random.choice([1, 2, np.nan], 1000000))
%timeit empty_assign_1(series)
>>> 61 ms ± 964 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit empty_assign_2(series)
>>> 2.17 ms ± 70.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Nearly 30 times faster!
EDIT: Fixed a bug pointed out by @valentin.
You have to be somewhat careful with data types when performing assignment in this case. In the example above, the test series is float; adding [] elements, however, coerces the entire series to object. Pandas will handle that for you if you do something like
idx = series.isna()
series[idx] = series[idx].apply(lambda x: [])
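To see the coercion concretely, here's a minimal sketch with toy data (note that recent pandas versions deprecate this kind of in-place upcast with a FutureWarning, so this reflects the versions this answer was written against):

series = pd.Series([1.0, np.nan, 2.0])
print(series.dtype)  # float64

idx = series.isna()
series[idx] = series[idx].apply(lambda x: [])

print(series.dtype)     # object: the list elements force the upcast
print(series.tolist())  # [1.0, [], 2.0]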
This works because the output of apply is itself a series, so pandas performs the coercion for you. You can test live performance with assignment overhead like so (I've added a string value so the series will be object; you could instead use a number as the replacement value rather than an empty list to avoid coercion entirely, as sketched at the end of this answer):
def empty_assign_1(s):
    idx = s.isna()
    s[idx] = s[idx].apply(lambda x: [])

def empty_assign_2(s):
    idx = s.isna()
    s.loc[idx] = [[]] * idx.sum()
series = pd.Series(np.random.choice(np.array([1, 2, np.nan, '2'], dtype=object), 1000000))
%timeit empty_assign_1(series.copy())
>>> 45.1 ms ± 386 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit empty_assign_2(series.copy())
>>> 24 ms ± 393 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
About 4 ms of each run is the cost of the copy itself. Including assignment overhead brings the speedup down from nearly 30x to about 2x, which is still pretty great.
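For completeness, here is the coercion-free variant mentioned above as a minimal sketch (the fill value 0.0 is illustrative):

series = pd.Series([1.0, np.nan, 2.0, np.nan])

idx = series.isna()
series.loc[idx] = 0.0  # a numeric fill keeps the float64 dtype

print(series.dtype)  # float64, no object coercion

For a plain numeric fill like this, series.fillna(0.0) is the idiomatic shortcut.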