One alternative solution would be to use the workflow tool dask. Though it's not as syntactically fun as...
var
| do this
| then do that
...it still allows your variable to flow down the chain, and using dask gives the added benefit of parallelization where possible.
Here's how I use dask to accomplish a pipe-chain pattern:
import dask

def a(foo):
    return foo + 1

def b(foo):
    return foo / 2

def c(foo, bar):
    return foo + bar

# pattern: 'task_name': (function_to_call, arg, ...) where an arg may be another task's name
workflow = {'a_task': (a, 1),
            'b_task': (b, 'a_task'),
            'c_task': (c, 99, 'b_task')}

# dask.visualize(workflow)  # visualization available
dask.get(workflow, 'c_task')
# returns 100.0 (b uses Python 3's true division)
After having worked with Elixir I wanted to use the piping pattern in Python. This isn't exactly the same pattern, but it's similar, and like I said it comes with the added benefit of parallelization: if tasks in your workflow don't depend on one another, dask will run them in parallel.
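To make that parallelism claim concrete, here's a minimal sketch (the function and task names are my own, purely illustrative): 'left' and 'right' share no dependencies, so the threaded scheduler is free to execute them concurrently before 'combined' joins the results.

import dask.threaded

def left_work(x):
    return x * 2

def right_work(x):
    return x + 10

def combine(a, b):
    return a + b

# 'left' and 'right' are independent leaves of the graph,
# so they can run at the same time
parallel_workflow = {'left': (left_work, 5),
                     'right': (right_work, 5),
                     'combined': (combine, 'left', 'right')}

dask.threaded.get(parallel_workflow, 'combined')
# returns 25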
If you wanted easier syntax, you could wrap it in something that takes care of naming the tasks for you. Of course, in this situation you'd need all functions to take the pipe as their first argument, and you'd lose any benefit of parallelization. But if you're OK with that, you could do something like this:
def dask_pipe(initial_var, functions_args):
    '''
    Call dask_pipe with an initial_var and either a list of functions
    or a dict mapping each function to its extra arguments:
        workflow, last_task = dask_pipe(initial_var, [function_1, function_2])
        workflow, last_task = dask_pipe(initial_var, {function_1: [], function_2: [arg1, arg2]})
        dask.get(workflow, last_task)
    '''
    workflow = {}
    if isinstance(functions_args, list):
        for ix, function in enumerate(functions_args):
            if ix == 0:
                workflow['task_' + str(ix)] = (function, initial_var)
            else:
                workflow['task_' + str(ix)] = (function, 'task_' + str(ix - 1))
        return workflow, 'task_' + str(ix)
    elif isinstance(functions_args, dict):
        for ix, (function, args) in enumerate(functions_args.items()):
            if ix == 0:
                # pass any extra args along for the first function too
                workflow['task_' + str(ix)] = (function, initial_var, *args)
            else:
                workflow['task_' + str(ix)] = (function, 'task_' + str(ix - 1), *args)
        return workflow, 'task_' + str(ix)
# piped functions
def foo(df):
    return df[['a', 'b']]

def bar(df, s1, s2):
    return df.columns.tolist() + [s1, s2]

def baz(df):
    return df.columns.tolist()

# setup
import dask
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3], 'c': [1, 2, 3]})
Now, with this wrapper, you can make a pipe following either of these syntactical patterns:
# wf, lt = dask_pipe(initial_var, [function_1, function_2])
# wf, lt = dask_pipe(initial_var, {function_1:[], function_2:[arg1, arg2]})
like this:
# test 1 - lists for functions only:
workflow, last_task = dask_pipe(df, [foo, baz])
print(dask.get(workflow, last_task)) # returns ['a','b']
# test 2 - dictionary for args:
workflow, last_task = dask_pipe(df, {foo:[], bar:['string1', 'string2']})
print(dask.get(workflow, last_task)) # returns ['a','b','string1','string2']
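One caveat with the dict form: the chaining relies on the dict preserving insertion order, which is only guaranteed in Python 3.7+; on older interpreters you'd want to pass a collections.OrderedDict instead.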
crime_by_state %>% filter(State=="New York", Year==2005) ... from the end of How dplyr replaced my most common R idioms. – Grate
%>% is syntactic sugar. I don't think there's an equivalent shortcut in Python, which instead prefers nested function calls. – Shannonshanny
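For comparison, the nested-call equivalent of test 2 above (using the foo and bar defined earlier) reads inside-out rather than left-to-right:

# nested calls: innermost runs first
bar(foo(df), 'string1', 'string2')
# returns ['a', 'b', 'string1', 'string2']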