Functional pipes in python like %>% from R's magrittr

In R (thanks to magrittr) you can now perform operations with a more functional piping syntax via %>%. This means that instead of coding this:

> as.Date("2014-01-01")
> as.character(sqrt(12)^2)

You could also do this:

> "2014-01-01" %>% as.Date 
> 12 %>% sqrt %>% .^2 %>% as.character

To me this is more readable, and it extends to use cases beyond the dataframe. Does the Python language have support for something similar?

Inwardly answered 31/1, 2015 at 14:15 Comment(8)
Great question. I am especially interested in case, where functions have more arguments. As in crime_by_state %>% filter(State=="New York", Year==2005) ... from the end of How dplyr replaced my most common R idioms.Grate
Of course, one can do it with a lot of lambdas, maps and reduces (and it is straightforward to do so), but brevity and readability are the main points.Grate
The package in question is magrittr.Huck
@Huck very true! the reason for mentioning dplyr is because it is more known. magrittr is indeed the origin.Inwardly
Yes, for the same reason every R package ever written was authored by Hadley. He is more known. (poorly disguised envy alert here)Huck
See answers to #33658855 that are solving this problem.Grate
In short, not in Python. The best approach that I know is in Julia, where you have it in core and if you wish more (use placeholders for values, pipe multiple functions at once, etc) you can use the Pipe.jl packageStarcrossed
%>% is syntactic sugar. I don't think there's an equivalent shortcut in Python, preferring instead to use nested function calls.Shannonshanny
E
59

Pipes are a new feature in Pandas 0.16.2.

Example:

import pandas as pd
from sklearn.datasets import load_iris

x = load_iris()
x = pd.DataFrame(x.data, columns=x.feature_names)

def remove_units(df):
    df.columns = pd.Index(map(lambda x: x.replace(" (cm)", ""), df.columns))
    return df

def length_times_width(df):
    df['sepal length*width'] = df['sepal length'] * df['sepal width']
    df['petal length*width'] = df['petal length'] * df['petal width']
    
x.pipe(remove_units).pipe(length_times_width)
x

NB: The Pandas version retains Python's reference semantics. That's why length_times_width doesn't need a return value; it modifies x in place. Because it returns None, it has to be the last step in the chain.

Epenthesis answered 24/6, 2015 at 22:1 Comment(1)
Unfortunately this only works for dataframes, therefore I cannot mark this as the correct answer. But it is good to mention here, as the main use case I had in mind was applying this to dataframes.Inwardly
O
50

One possible way of doing this is by using a module called macropy. Macropy allows you to apply transformations to the code that you have written. Thus a | b can be transformed to b(a). This has a number of advantages and disadvantages.

In comparison to the solution mentioned by Sylvain Leroux, the main advantage is that you do not need to create infix objects for the functions you are interested in using; you just mark the areas of code where you intend to use the transformation. Secondly, since the transformation is applied at compile time rather than at runtime, the transformed code suffers no runtime overhead: all the work is done when the bytecode is first produced from the source code.

The main disadvantage is that macropy must be activated in a particular way for it to work (described below). In exchange for the faster runtime, parsing the source code is more computationally expensive, so the program takes longer to start. Finally, it adds a syntactic style that programmers who are not familiar with macropy may find harder to understand.

Example Code:

run.py

import macropy.activate 
# Activates macropy, modules using macropy cannot be imported before this statement
# in the program.
import target
# import the module using macropy

target.py

from fpipe import macros, fpipe
from macropy.quick_lambda import macros, f
# The `from module import macros, ...` must be used for macropy to know which 
# macros it should apply to your code.
# Here two macros have been imported `fpipe`, which does what you want
# and `f` which provides a quicker way to write lambdas.

from math import sqrt

# Using the fpipe macro in a single expression.
# The code between the square brackets is interpreted as - str(sqrt(12))
print fpipe[12 | sqrt | str] # prints 3.46410161514

# using a decorator
# All code within the function is examined for `x | y` constructs.
x = 1 # global variable
@fpipe
def sum_range_then_square():
    "expected value (1 + 2 + 3)**2 -> 36"
    y = 4 # local variable
    return range(x, y) | sum | f[_**2]
    # `f[_**2]` is macropy syntax for -- `lambda x: x**2`, which would also work here

print sum_range_then_square() # prints 36

# using a with block.
# same as a decorator, but for limited blocks.
with fpipe:
    print range(4) | sum # prints 6
    print 'a b c' | f[_.split()] # prints ['a', 'b', 'c']

And finally the module that does the hard work. I've called it fpipe for functional pipe, as it's emulating shell syntax for passing output from one process to another.

fpipe.py

from macropy.core.macros import *
from macropy.core.quotes import macros, q, ast

macros = Macros()

@macros.decorator
@macros.block
@macros.expr
def fpipe(tree, **kw):

    @Walker
    def pipe_search(tree, stop, **kw):
        """Search code for bitwise or operators and transform `a | b` to `b(a)`."""
        if isinstance(tree, BinOp) and isinstance(tree.op, BitOr):
            operand = tree.left
            function = tree.right
            newtree = q[ast[function](ast[operand])]
            return newtree

    return pipe_search.recurse(tree)
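
For readers on Python 3, the same `a | b` to `b(a)` rewrite can be sketched with only the standard library's ast module. This is not macropy and not the fpipe module above; `PipeRewriter` and `pipe_eval` are hypothetical names, and the sketch only handles a single expression passed as a string.

```python
import ast
from math import sqrt


class PipeRewriter(ast.NodeTransformer):
    """Rewrite every `a | b` expression into the call `b(a)`."""

    def visit_BinOp(self, node):
        # Visit children first so that chained pipes nest left to right.
        self.generic_visit(node)
        if isinstance(node.op, ast.BitOr):
            call = ast.Call(func=node.right, args=[node.left], keywords=[])
            return ast.copy_location(call, node)
        return node


def pipe_eval(source, namespace=None):
    """Hypothetical helper: evaluate one expression with `|` acting as a pipe."""
    tree = ast.parse(source, mode="eval")
    tree = ast.fix_missing_locations(PipeRewriter().visit(tree))
    return eval(compile(tree, "<pipe>", "eval"), dict(namespace or {}))


result = pipe_eval("12 | sqrt | str", {"sqrt": sqrt})
print(result)  # same value as str(sqrt(12))
```

As with the macro version, every `|` inside the rewritten expression becomes a function call, so a genuine bitwise or cannot be used there.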
Olden answered 31/1, 2015 at 22:8 Comment(3)
Sounds great, but as I see it only works on Python 2.7 (and not Python 3.4).Grate
I've created a smaller library with no dependencies that does the same thing as the @fpipe decorator but redefining right shift (>>) instead of or (|): pypi.org/project/pipeopNirvana
downvoted as requiring 3rd party libraries with the use of multiple decorators is a very complex solution for a fairly simple problem. Plus it is a python 2 only solution. Pretty sure the vanilla python solution is gonna be quicker as well.Commodus
M
36

PyToolz [doc] allows arbitrarily composable pipes; they just aren't defined with that pipe-operator syntax.

Follow the above link for the quickstart. And here's a video tutorial: http://pyvideo.org/video/2858/functional-programming-in-python-with-pytoolz

In [1]: from toolz import pipe

In [2]: from math import sqrt

In [3]: pipe(12, sqrt, str)
Out[3]: '3.4641016151377544'
Mobley answered 5/2, 2015 at 10:48 Comment(5)
PyToolz is a great pointer. Having said that one link is dead and the other one is dying soonCatholicize
His base URLs seem to be: http://matthewrocklin.com/blog and PyToolz toolz.readthedocs.io/en/latest . Ah, the ephemerality of the internetz...Mobley
What sucks about this is you can't do multi-argument functionsVinculum
@Frank: hey man this open-source, the authors don't get paid by you or me, so instead of saying 'package X sucks', just say 'package X is limited to use-case Y', and/or suggest a better alternative package, or contribute that feature to package X, or write it yourself.Mobley
sspipe, mentioned below, worked really well. Also, I didn't say the package sucks, I said the lack of some features sucks.Vinculum
E
26

If you just want this for personal scripting, you might want to consider using Coconut instead of Python.

Coconut is a superset of Python. You could therefore use Coconut's pipe operator |>, while completely ignoring the rest of the Coconut language.

For example:

def addone(x):
    return x + 1

3 |> addone

compiles to

# lots of auto-generated header junk

# Compiled Coconut: -----------------------------------------------------------

def addone(x):
    return x + 1

(addone)(3)
Epenthesis answered 22/10, 2017 at 7:50 Comment(10)
print(1 |> isinstance(int))... TypeError: isinstance expected 2 arguments, got 1Frothy
@jimbo1qaz If you still have this problem, try print(1 |> isinstance$(int)), or preferably, 1 |> isinstance$(int) |> print.Lynching
@Solomon Ucko your answer is wrong. 1 |> print$(2) calls print(2, 1) since $ maps to Python partials. but I want print(1, 2) which matches UFCS and magrittr. Motivation: 1 |> add(2) |> divide(6) should be 0.5, and I should not need parentheses.Frothy
@jimbo1qaz Yeah, it looks like my previous comment is wrong. You would actually need 1 |> isinstance$(?, int) |> print. For your other examples: 1 |> print$(?, 2), 1 |> (+)$(?, 2) |> (/)$(?, 6). I don't think you can avoid parentheses for partial application.Lynching
Looking at how ugly both |> and (+)$(?, 2) is, I've come to the conclusion that the programming-language and math establishment does not want me to use this type of syntax, and makes it even uglier than resorting to a set of parentheses. I would use it if it had better syntax (eg. Dlang has UFCS but IDK about arithmetic functions, or if Python had a .. pipe operator).Frothy
@jimbo1qaz it would be easier if there were a reliable, cross-platform way to enter UnicodeEpenthesis
Out of interest, why do you say 'just for personal scripting'? I have a production use case where the pattern matching of coconut would be very useful... Is there something unsafe about it?Shuffle
@TomGreenwood 1) other developers won't be familiar with it, 2) it doesn't have a lot of library support & it's not clear what kind of long-term support the language itself has. If neither of those are a problem for you, then go for it. Take a look at the FAQ: coconut.readthedocs.io/en/master/FAQ.htmlEpenthesis
No debugger and limited IDE support (e.g. for refactoring) : showstoppers . I'll stick with stuff like fluentpyCarlisle
@Carlisle for what it's worth, Coconut just compiles to Python, so you can use the usual Python debuggers like PDB, PDB++, PuDB, etc.Epenthesis
V
24

Does the python language have support for something similar?

"more functional piping syntax": is this really a more "functional" syntax? I would say it adds an "infix" syntax to R instead.

That being said, Python's grammar does not have direct support for infix notation beyond the standard operators.

If you really need something like that, you could take this code from Tomer Filiba as a starting point to implement your own infix notation:

Code sample and comments by Tomer Filiba (http://tomerfiliba.com/blog/Infix-Operators/) :

from functools import partial

class Infix(object):
    def __init__(self, func):
        self.func = func
    def __or__(self, other):
        return self.func(other)
    def __ror__(self, other):
        return Infix(partial(self.func, other))
    def __call__(self, v1, v2):
        return self.func(v1, v2)

Using instances of this peculiar class, we can now use a new "syntax" for calling functions as infix operators:

>>> @Infix
... def add(x, y):
...     return x + y
...
>>> 5 |add| 6
11
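
To make the mechanism less mysterious, here is a small sketch that splits `5 |add| 6` into its two evaluation steps. It reuses the Infix class from above (without `__call__`, which is not needed here); the names `half_applied` and `total` are only illustrative.

```python
from functools import partial


class Infix(object):
    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        return self.func(other)

    def __ror__(self, other):
        return Infix(partial(self.func, other))


@Infix
def add(x, y):
    return x + y


# Step 1: `5 | add`. int does not know how to `|` with an Infix, so Python
# falls back to add.__ror__(5), which returns a new Infix wrapping partial(add, 5).
half_applied = 5 | add

# Step 2: `half_applied | 6` calls Infix.__or__(6), which finally runs add(5, 6).
total = half_applied | 6
print(total)  # 11
```

Because `|` is left-associative, `5 |add| 6` is evaluated in exactly this order.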
Valdivia answered 31/1, 2015 at 14:33 Comment(0)
B
19

You can use the sspipe library. It exposes two objects, p and px. Similar to x %>% f(y,z), you can write x | p(f, y, z), and similar to x %>% .^2 you can write x | px**2.

from sspipe import p, px
from math import sqrt

12 | p(sqrt) | px ** 2 | p(str)
Bumkin answered 20/7, 2018 at 19:43 Comment(3)
This looks good but can you pipe to the second variable 3 | p( f( x, . ) ) ? In R this would be : 3 %>% f(x, .)Vinculum
@Vinculum Yes. That would be possible with px, which is similar to . in R. Though, you should be careful that px should be passed to p(), not to f(). Example: 3 | p(f, x, px)Bumkin
In application, this answer seems the closest to the dplyr %>%Arguseyed
N
19

There is the dfply module. You can find more information at

https://github.com/kieferk/dfply

Some examples are:

from dfply import *
diamonds >> group_by('cut') >> row_slice(5)
diamonds >> distinct(X.color)
diamonds >> filter_by(X.cut == 'Ideal', X.color == 'E', X.table < 55, X.price < 500)
diamonds >> mutate(x_plus_y=X.x + X.y, y_div_z=(X.y / X.z)) >> select(columns_from('x')) >> head(3)
Navada answered 1/4, 2019 at 15:1 Comment(6)
This should be marked as the correct answer, in my opinion. Also, it appears that both dfply and dplython are the same packages. Is there any difference between them? @NavadaUse
dfply, dplython, plydata packages are python ports of the dplyr package so they are going to be pretty similar in syntax.Navada
dfply is the only one even remotely recently touched: and even that one has zero closed issues or commits for 18 months as of March 2021. I ping'ed the project to see if they have any plans to "wake up"Carlisle
FWIW I maintain another port called siuba. It has the added advantage of being able to generate SQL code, and speed up grouped operations! github.com/machow/siubaShiah
No way not correct answer. 3 >> np.sqrt gives an error, but in R it is 3 %>% sqrtVinculum
@Vinculum it's impossible to do the %>% in python so this is likely going to be the closest python equivalentCarlisle
C
14

There is no need for 3rd party libraries or confusing operator trickery to implement a pipe function - you can get the basics going quite easily yourself.

Let's start by defining what a pipe function actually is. At its heart, it is just a way to express a series of function calls in logical order, rather than the standard 'inside out' order.

For example, let's look at these functions:

def one(value):
  return value

def two(value):
  return 2*value

def three(value):
  return 3*value

Not very interesting, but assume interesting things are happening to value. We want to call them in order, passing the output of each to the next. In vanilla python that would be:

result = three(two(one(1)))

It is not incredibly readable, and for more complex pipelines it is going to get worse. So, here is a simple pipe function which takes an initial argument and the series of functions to apply to it:

def pipe(first, *args):
  for fn in args:
    first = fn(first)
  return first

Let's call it:

result = pipe(1, one, two, three)

That looks like very readable 'pipe' syntax to me :). I don't see how it is any less readable than overloading operators or anything like that. In fact, I would argue that it is more readable Python code.

Here is the humble pipe solving the OP's examples:

from math import sqrt
from datetime import datetime

def as_date(s):
  return datetime.strptime(s, '%Y-%m-%d')

def as_character(value):
  # Closest Python analogue of R's as.character: convert to a string
  return str(value)

pipe("2014-01-01", as_date)
pipe(12, sqrt, lambda x: x**2, as_character)
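
Functions that take more than one argument can be handled without lambdas by pre-filling the extra arguments with functools.partial. The sketch below is one possible way to do it; `scale` is a hypothetical helper invented for the example, and pipe is the same function as above, written with reduce instead of a loop.

```python
from functools import partial, reduce
from math import sqrt


def pipe(first, *funcs):
    """Same behaviour as the loop-based pipe above, written with reduce."""
    return reduce(lambda value, fn: fn(value), funcs, first)


def scale(value, factor):
    # Hypothetical two-argument step used only for illustration.
    return value * factor


result = pipe(
    12,
    sqrt,
    partial(scale, factor=10),  # the piped value fills the remaining `value` slot
    partial(round, ndigits=2),  # built-in round accepts ndigits by keyword
    str,
)
print(result)  # '34.64'
```

Passing the fixed arguments by keyword keeps the piped value in the first position, which matches how %>% behaves in R.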
Commodus answered 10/3, 2020 at 16:7 Comment(4)
I liked this solution a lot because the syntax is simple and easy to read. It is something one could type constantly. My only question is if the for loop would affect the performance of the composition of functions.Aseptic
Python3 requires adding list(list(list(... ))) to the code . Nearly Impossible to read. Backwards reading is this also. Try fluentpy or infixpyCarlisle
@StephenBoesch....wrong on both counts. I don't see why you think multiple calls to list is necessary? the solution here does not return a list at all. Likewise, the pipe function does not read backwards - it is left to right. If you want right to left, you are looking for a compose function. Equally easy to do with the same principlesCommodus
This is certainly not "wrong on both counts". python3 defaults to iterators so list is needed to materialize the results of a collection. Multiple map, filter etc's require one list per each thus the pollution of the code . The pipe is a third party library so your comment does not apply to itCarlisle
N
12

I missed the |> pipe operator from Elixir so I created a simple function decorator (~ 50 lines of code) that reinterprets the >> Python right shift operator as a very Elixir-like pipe at compile time using the ast library and compile/exec:

from pipeop import pipes

def add3(a, b, c):
    return a + b + c

def times(a, b):
    return a * b

@pipes
def calc():
    print(1 >> add3(2, 3) >> times(4))  # prints 24

All it's doing is rewriting a >> b(...) as b(a, ...).
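
The rewrite rule itself can be sketched for a single expression with the standard library's ast module. This is not the pipeop source code; `ShiftPipeRewriter` and `shift_pipe_eval` are hypothetical names, and the real decorator works on whole function bodies rather than on strings.

```python
import ast


class ShiftPipeRewriter(ast.NodeTransformer):
    """Rewrite `a >> f(x, ...)` into `f(a, x, ...)` and `a >> f` into `f(a)`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite the earlier pipe stages first
        if not isinstance(node.op, ast.RShift):
            return node
        if isinstance(node.right, ast.Call):
            # The piped value becomes the first positional argument.
            node.right.args.insert(0, node.left)
            return node.right
        call = ast.Call(func=node.right, args=[node.left], keywords=[])
        return ast.copy_location(call, node)


def shift_pipe_eval(source, namespace):
    """Hypothetical helper: evaluate one expression with `>>` acting as a pipe."""
    tree = ast.parse(source, mode="eval")
    tree = ast.fix_missing_locations(ShiftPipeRewriter().visit(tree))
    return eval(compile(tree, "<pipe>", "eval"), dict(namespace))


def add3(a, b, c):
    return a + b + c


def times(a, b):
    return a * b


result = shift_pipe_eval(
    "1 >> add3(2, 3) >> times(4)", {"add3": add3, "times": times}
)
print(result)  # 24
```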

https://pypi.org/project/pipeop/

https://github.com/robinhilliard/pipes

Nirvana answered 9/4, 2018 at 13:8 Comment(0)
R
9

Building pipe with Infix

As hinted at by Sylvain Leroux, we can use the Infix operator to construct an infix pipe. Let's see how this is accomplished.

First, here is the code from Tomer Filiba

Code sample and comments by Tomer Filiba (http://tomerfiliba.com/blog/Infix-Operators/) :

from functools import partial

class Infix(object):
    def __init__(self, func):
        self.func = func
    def __or__(self, other):
        return self.func(other)
    def __ror__(self, other):
        return Infix(partial(self.func, other))
    def __call__(self, v1, v2):
        return self.func(v1, v2)

Using instances of this peculiar class, we can now use a new "syntax" for calling functions as infix operators:

>>> @Infix
... def add(x, y):
...     return x + y
...
>>> 5 |add| 6
11

The pipe operator passes the preceding object as an argument to the object that follows the pipe, so x %>% f can be transformed into f(x). Consequently, the pipe operator can be defined using Infix as follows:

In [1]: @Infix
   ...: def pipe(x, f):
   ...:     return f(x)
   ...:
   ...:

In [2]: from math import sqrt

In [3]: 12 |pipe| sqrt |pipe| str
Out[3]: '3.4641016151377544'

A note on partial application

The %>% operator from dplyr pushes arguments through the first argument in a function, so

df %>% 
filter(x >= 2) %>%
mutate(y = 2*x)

corresponds to

df1 <- filter(df, x >= 2)
df2 <- mutate(df1, y = 2*x)

The easiest way to achieve something similar in Python is to use currying. The toolz library provides a curry decorator function that makes constructing curried functions easy.

In [2]: from toolz import curry

In [3]: from datetime import datetime

In [4]: @curry
   ...: def asDate(format, date_string):
   ...:     return datetime.strptime(date_string, format)
   ...:
   ...:

In [5]: "2014-01-01" |pipe| asDate("%Y-%m-%d")
Out[5]: datetime.datetime(2014, 1, 1, 0, 0)

Notice that |pipe| pushes the arguments into the last argument position, that is

x |pipe| f(2)

corresponds to

f(2, x)

When designing curried functions, static arguments (i.e. arguments that might be used for many examples) should be placed earlier in the parameter list.

Note that toolz includes many pre-curried functions, including various functions from the operator module.

In [11]: from toolz.curried import map

In [12]: from toolz.curried.operator import add

In [13]: range(5) |pipe| map(add(2)) |pipe| list
Out[13]: [2, 3, 4, 5, 6]

which roughly corresponds to the following in R

> library(dplyr)
> add2 <- function(x) {x + 2}
> 0:4 %>% sapply(add2)
[1] 2 3 4 5 6

Using other infix delimiters

You can change the symbols that surround the Infix invocation by overriding other Python operator methods. For example, switching __or__ and __ror__ to __mod__ and __rmod__ will change the | operator to the mod operator.

In [5]: 12 %pipe% sqrt %pipe% str
Out[5]: '3.4641016151377544'
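
A minimal sketch of that switch, assuming nothing beyond the Infix class shown earlier: `ModInfix` is a hypothetical name for the renamed variant, with `__mod__` and `__rmod__` taking the place of `__or__` and `__ror__`.

```python
from functools import partial
from math import sqrt


class ModInfix(object):
    """Variant of Infix that reacts to `%` instead of `|`."""

    def __init__(self, func):
        self.func = func

    def __mod__(self, other):
        return self.func(other)

    def __rmod__(self, other):
        return ModInfix(partial(self.func, other))


@ModInfix
def pipe(x, f):
    return f(x)


result = 12 %pipe% sqrt %pipe% str
print(result)  # same value as str(sqrt(12))
```

One caveat with `%`: when the left operand is a string, Python treats `%` as string formatting and never reaches `__rmod__`, so something like "2014-01-01" %pipe% f raises a TypeError. The `|` version does not have this problem for strings.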
Ruble answered 21/8, 2017 at 12:56 Comment(0)
M
6

Adding my 2c. I personally use package fn for functional style programming. Your example translates into

from fn import F, _
from math import sqrt

(F(sqrt) >> _**2 >> str)(12)

F is a wrapper class with functional-style syntactic sugar for partial application and composition. _ is a Scala-style constructor for anonymous functions (similar to Python's lambda); it represents a variable, hence you can combine several _ objects in one expression to get a function with more arguments (e.g. _ + _ is equivalent to lambda a, b: a + b). F(sqrt) >> _**2 >> str results in a Callable object that can be used as many times as you want.
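
To illustrate the composition idea without installing anything, here is a rough standard-library sketch. `Compose` is a hypothetical stand-in written for this example, not the fn package's F class, and a plain lambda replaces the `_` placeholder, which is not reimplemented here.

```python
from math import sqrt


class Compose(object):
    """Hypothetical stand-in for the idea behind F: `f >> g` builds x -> g(f(x))."""

    def __init__(self, func):
        self.func = func

    def __rshift__(self, other):
        # Return a new wrapper that feeds this function's result into `other`.
        return Compose(lambda *args, **kwargs: other(self.func(*args, **kwargs)))

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)


to_text = Compose(sqrt) >> (lambda x: x ** 2) >> str
result = to_text(12)
print(result)
print(to_text(4))  # '4.0'
```

Note that this builds a reusable function first and applies it to the data afterwards, which is composition rather than piping a value through directly.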

Messieurs answered 8/12, 2017 at 21:42 Comment(2)
Just what i'm looking for - even mentioned scala as an illustration. Trying it out nowCarlisle
@javadba I'm glad you've found this useful. Take note that _ is not 100% flexible: it doesn't support all Python operators. Additionally, if you plan on using _ in an interactive session, you should import it under another name (e.g. from fn import _ as var), because most (if not all) interactive Python shells use _ to represent the last unassigned returned value, thus shadowing the imported object.Messieurs
S
5

There is a very nice pipe module at https://pypi.org/project/pipe/ that overloads the | operator and provides a lot of pipe functions like add, first, where, tail, etc.

>>> [1, 2, 3, 4] | where(lambda x: x % 2 == 0) | add
6

>>> sum([1, [2, 3], 4] | traverse)
10

Plus it's very easy to write your own pipe functions:

from math import sqrt
from pipe import Pipe

@Pipe
def p_sqrt(x):
    return sqrt(x)

@Pipe
def p_pr(x):
    print(x)

9 | p_sqrt | p_pr
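
For the curious, the core trick behind such a decorator can be sketched in a few lines. This is only an illustration of the idea, not the pipe library's actual implementation; the `Pipe` class, `where` and `total` below are written from scratch for the example.

```python
class Pipe(object):
    """Sketch of the idea only; the real library's class has more features."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, other):
        # `value | pipe_obj` ends up here because the left operand's type
        # does not know how to `|` with a Pipe.
        return self.func(other)

    def __call__(self, *args, **kwargs):
        # Calling a pipe with extra arguments returns a new, pre-configured pipe.
        return Pipe(lambda value: self.func(value, *args, **kwargs))


@Pipe
def where(iterable, predicate):
    return [item for item in iterable if predicate(item)]


@Pipe
def total(iterable):
    return sum(iterable)


result = [1, 2, 3, 4] | where(lambda x: x % 2 == 0) | total
print(result)  # 6
```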
Sizing answered 27/10, 2019 at 22:21 Comment(0)
H
4

One alternative solution would be to use the workflow tool dask. Though it's not as syntactically fun as...

var
| do this
| then do that

...it still allows your variable to flow down the chain and using dask gives the added benefit of parallelization where possible.

Here's how I use dask to accomplish a pipe-chain pattern:

import dask

def a(foo):
    return foo + 1
def b(foo):
    return foo / 2
def c(foo,bar):
    return foo + bar

# pattern = 'name_of_behavior': (method_to_call, variables_to_pass_in, variables_can_be_task_names)
workflow = {'a_task':(a,1),
            'b_task':(b,'a_task',),
            'c_task':(c,99,'b_task'),}

#dask.visualize(workflow) #visualization available. 

dask.get(workflow,'c_task')

# returns 100.0 on Python 3 (100 on Python 2, where / is integer division)

After having worked with Elixir I wanted to use the piping pattern in Python. This isn't exactly the same pattern, but it's similar and, like I said, comes with the added benefit of parallelization: if you ask dask for a task whose dependencies do not depend on one another, a parallel scheduler (for example dask.threaded.get; as far as I know, plain dask.get is the synchronous one) can run them in parallel.

If you wanted easier syntax you could wrap it in something that would take care of the naming of the tasks for you. Of course in this situation you'd need all functions to take the piped value as the first argument, and you'd lose any benefit of parallelization. But if you're OK with that you could do something like this:

def dask_pipe(initial_var, functions_args):
    '''
    call the dask_pipe with an init_var, and a list of functions
    workflow, last_task = dask_pipe(initial_var, {function_1:[], function_2:[arg1, arg2]})
    workflow, last_task = dask_pipe(initial_var, [function_1, function_2])
    dask.get(workflow, last_task)
    '''
    workflow = {}
    if isinstance(functions_args, list):
        for ix, function in enumerate(functions_args):
            if ix == 0:
                workflow['task_' + str(ix)] = (function, initial_var)
            else:
                workflow['task_' + str(ix)] = (function, 'task_' + str(ix - 1))
        return workflow, 'task_' + str(ix)
    elif isinstance(functions_args, dict):
        for ix, (function, args) in enumerate(functions_args.items()):
            if ix == 0:
                workflow['task_' + str(ix)] = (function, initial_var, *args)
            else:
                workflow['task_' + str(ix)] = (function, 'task_' + str(ix - 1), *args )
        return workflow, 'task_' + str(ix)

# piped functions
def foo(df):
    return df[['a','b']]
def bar(df, s1, s2):
    return df.columns.tolist() + [s1, s2]
def baz(df):
    return df.columns.tolist()

# setup 
import dask
import pandas as pd
df = pd.DataFrame({'a':[1,2,3],'b':[1,2,3],'c':[1,2,3]})

Now, with this wrapper, you can make a pipe following either of these syntactical patterns:

# wf, lt = dask_pipe(initial_var, [function_1, function_2])
# wf, lt = dask_pipe(initial_var, {function_1:[], function_2:[arg1, arg2]})

like this:

# test 1 - lists for functions only:
workflow, last_task =  dask_pipe(df, [foo, baz])
print(dask.get(workflow, last_task)) # returns ['a','b']

# test 2 - dictionary for args:
workflow, last_task = dask_pipe(df, {foo:[], bar:['string1', 'string2']})
print(dask.get(workflow, last_task)) # returns ['a','b','string1','string2']
Heptastich answered 2/7, 2018 at 15:49 Comment(1)
one problem with this is that you can't pass functions in as arguments :(Heptastich
O
3

The pipe functionality can be achieved by composing pandas methods with the dot. Here is an example below.

Load a sample data frame:

import seaborn    
iris = seaborn.load_dataset("iris")
type(iris)
# <class 'pandas.core.frame.DataFrame'>

Illustrate the composition of pandas methods with the dot:

(iris.query("species == 'setosa'")
     .sort_values("petal_width")
     .head())

You can add new methods to pandas data frames if needed (as done here for example):

pandas.DataFrame.new_method  = new_method
Onanism answered 13/11, 2020 at 17:39 Comment(0)
E
1

Just use cool.

First, run python -m pip install cool. Then, run python.

from cool import F

range(10) | F(filter, lambda x: x % 2) | F(sum) == 25

See https://github.com/abersheeran/cool for more usage examples.

Elene answered 7/4, 2021 at 8:33 Comment(0)
E
0

My two cents inspired by http://tomerfiliba.com/blog/Infix-Operators/

class FuncPipe:
  class Arg:
    def __init__(self, arg):
      self.arg = arg
    def __or__(self, func):
      return func(self.arg)

  def __ror__(self, arg):
    return self.Arg(arg)
pipe = FuncPipe()

Then

1 |pipe| \
  (lambda x: x+1) |pipe| \
  (lambda x: 2*x)

returns

4 
Erikaerikson answered 11/1, 2021 at 1:30 Comment(0)
