Python equivalent of find2perl

Perl has a lovely little utility called find2perl that will translate (quite faithfully) a command line for the Unix find utility into a Perl script to do the same.

If you have a find command like this:

find /usr -xdev -type d -name '*share'

                        ^^^^^^^^^^^^^^ => -name: match names against the shell pattern '*share'
                ^^^^^^^ => -type d: directories only (not files)
          ^^^^^ => -xdev: do not cross into other file systems
     ^^^^ => /usr: the starting directory (there could be several)

It finds all directories below /usr whose names end in share.

Now run find2perl /usr -xdev -type d -name '*share' and it will emit a Perl script to do the same. You can then modify the script to your use.

Python has os.walk(), which certainly provides the needed recursive directory listing, but its behavior differs from find in important ways.

Take the simple case of find . -type f -print to find and print all files under the current directory. A naïve implementation using os.walk() would be:

import os

root = '.'
for path, dirs, files in os.walk(root):
    for file in files:
        print(os.path.join(path, file))

However, this will produce different results than typing find . -type f -print in the shell.

I have also been testing various os.walk() loops against:

import subprocess

root = '.'
# Run find with root as its starting point and capture stdout as text.
# Passing an argument list avoids quoting problems if root contains spaces.
args = ['find', root, '-type', 'f']
p = subprocess.Popen(args, stdout=subprocess.PIPE, universal_newlines=True)
out, err = p.communicate()
for line in out.splitlines():   # splitlines() drops the trailing \n
    print(line)

The difference is that os.walk() lists symbolic links to files among the files, whereas find -type f skips symbolic links.
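The mismatch is easy to reproduce. Here is a minimal sketch, assuming a POSIX system where os.symlink works without special privileges; the helper names walk_files and find_type_f are made up for illustration. It builds a throwaway directory holding one regular file and one symlink to it, then compares the two approaches:

```python
import os
import tempfile

def walk_files(root):
    """Every name os.walk() reports under `files`, as the naive loop prints."""
    return [os.path.join(path, name)
            for path, dirs, files in os.walk(root)
            for name in files]

def find_type_f(root):
    """Only regular files, approximating `find root -type f`."""
    return [p for p in walk_files(root)
            if os.path.isfile(p) and not os.path.islink(p)]

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "real.txt")
    with open(target, "w") as fh:
        fh.write("data\n")
    # A symlink pointing at the regular file (POSIX assumption).
    os.symlink(target, os.path.join(root, "link.txt"))

    all_names = sorted(os.path.basename(p) for p in walk_files(root))
    regular_names = sorted(os.path.basename(p) for p in find_type_f(root))

print(all_names)      # ['link.txt', 'real.txt']
print(regular_names)  # ['real.txt']
```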

So a correct implementation that matches find . -type f -print becomes:

import os

root = '.'
for path, dirs, files in os.walk(root):
    for file in files:
        p = os.path.join(path, file)
        # isfile() follows symlinks, so links must be excluded explicitly
        if os.path.isfile(p) and not os.path.islink(p):
            print(p)

Since there are hundreds of permutations of find primaries, each with its own side effects, testing every variant becomes time-consuming. And since find is the gold standard in the POSIX world for how to count files in a tree, doing it the same way in Python is important to me.

So is there an equivalent of find2perl that can be used for Python? So far I have just been using find2perl and then manually translating the Perl code. This is hard because the Perl file test operators do not always behave the same way as the Python file tests in os.path.

Horseshoe answered 24/9, 2011 at 20:56 Comment(1)
I'd suggest part of the answer could be found here: https://mcmap.net/q/825995/-os-walk-with-regex Sorry, I don't know find/find2perl well enough to help more. (maybe also #5141937) Crysta

There are a couple of observations and several pieces of code to help you on your way.

First, Python can execute code in this form just like Perl:

 cat code.py | python | the rest of the pipe story...

find2perl is essentially a clever code template: it emits a Perl function built from a template of find. Therefore, if you replicate this template, you will not face the "hundreds of permutations" that you are anticipating.

Second, the results from find2perl are not perfect, just as there are potential differences between versions of find, such as GNU and BSD.

Third, watch the traversal order. Both os.walk and find are top down by default, but os.walk(topdown=False) and find -depth switch to bottom up. The order makes for different results if your underlying directory tree is changing while you recurse it.
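Traversal order is easy to check empirically. Here is a small sketch using a throwaway temporary directory; visit_order and walk_pruned are illustrative names, and the comparison with -prune is only approximate. It records the order in which os.walk yields directories for each setting of topdown, and shows that editing dirs in place during a top-down walk stops the descent:

```python
import os
import tempfile

def visit_order(root, topdown):
    """Directories in the order os.walk() yields them, relative to root."""
    return [os.path.relpath(path, root)
            for path, dirs, files in os.walk(root, topdown=topdown)]

def walk_pruned(root, skip_name):
    """Top-down walk that never descends into directories named skip_name,
    roughly like `find root -name skip_name -prune -o -type d -print`."""
    seen = []
    for path, dirs, files in os.walk(root):  # topdown=True is the default
        # In-place edits to dirs control where os.walk goes next.
        dirs[:] = sorted(d for d in dirs if d != skip_name)
        seen.append(os.path.relpath(path, root))
    return seen

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    top_down = visit_order(root, topdown=True)    # parents before children
    bottom_up = visit_order(root, topdown=False)  # children before parents

    os.makedirs(os.path.join(root, "cache", "deep"))
    pruned = walk_pruned(root, "cache")           # "cache" is never entered

print(top_down)
print(bottom_up)
print(pruned)
```

Pruning only works in top-down mode; with topdown=False the subdirectories have already been visited by the time dirs is handed to the loop.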

There are two projects in Python that may help you: twander and dupfinder. Each strives to be OS-independent, and each recurses the file system like find.

If you template a general find-like function in Python, make sure os.walk recurses top down, use glob to replicate shell expansion, and use some of the code that you find in those two projects, you can replicate find2perl without too much difficulty.
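As a starting point, such a template might look like the sketch below. It is not a find2perl equivalent: it covers only three primaries (-name, -type and -xdev), the function name find_like and its keyword arguments are invented for illustration, and it uses fnmatch rather than glob because -name matches a pattern against each basename instead of expanding paths.

```python
import fnmatch
import os
import stat
import tempfile

# Hypothetical mapping from find's -type letters to stat mode tests.
TYPE_TESTS = {"f": stat.S_ISREG, "d": stat.S_ISDIR, "l": stat.S_ISLNK}

def find_like(root, name=None, ftype=None, xdev=False):
    """Yield paths under root that pass the requested tests.

    name  ~ -name PATTERN  (case-sensitive match on the basename)
    ftype ~ -type f|d|l    (judged with lstat, so links are not followed)
    xdev  ~ -xdev          (do not descend into other file systems)
    """
    root_dev = os.lstat(root).st_dev

    def matches(path):
        mode = os.lstat(path).st_mode
        if ftype is not None and not TYPE_TESTS[ftype](mode):
            return False
        base = os.path.basename(os.path.normpath(path))
        return name is None or fnmatch.fnmatchcase(base, name)

    if matches(root):  # find tests the starting point too
        yield root
    for path, dirs, files in os.walk(root):  # top down by default
        entries = [os.path.join(path, e) for e in dirs + files]
        if xdev:
            # Prune in place so os.walk skips directories on other devices;
            # the directories themselves are still tested below.
            dirs[:] = [d for d in dirs
                       if os.lstat(os.path.join(path, d)).st_dev == root_dev]
        for full in entries:
            if matches(full):
                yield full

# Small self-check on a throwaway tree instead of a real /usr.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "root")
    os.makedirs(os.path.join(root, "local", "share"))
    open(os.path.join(root, "local", "share.txt"), "w").close()
    demo = [os.path.relpath(p, root)
            for p in find_like(root, name="*share", ftype="d", xdev=True)]

print(demo)
```

With this sketch, the command from the question would be written as find_like('/usr', name='*share', ftype='d', xdev=True). The output order differs from find: all entries of a directory are reported before any subdirectory is entered, so sort the results before comparing them against real find output.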

Sorry I could not point to something ready to go for your needs...

Rainband answered 2/10, 2011 at 22:42 Comment(0)

If you're trying to reimplement all of find, then yes, your code is going to get hairy. find is pretty hairy all by itself.

In most cases, though, you're not trying to replicate the complete behavior of find; you're performing a much simpler task (e.g., "find all files that end in .txt"). If you really need all of find, just run find and read the output. As you say, it's the gold standard; you might as well just use it.

I often write code that reads paths on stdin just so I can do this:

find ...a bunch of filters... | my_python_code.py
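The Python side of that pipe can stay very small. A minimal sketch, assuming one path per line as find -print emits (so file names containing newlines would break it); read_paths and count_by_extension are hypothetical helpers, and the counting step just stands in for whatever the script really does with each path:

```python
import io
import os

def read_paths(stream):
    """Yield one path per input line, as printed by `find ... -print`."""
    for line in stream:
        path = line.rstrip("\n")
        if path:
            yield path

def count_by_extension(paths):
    """Placeholder processing step: tally paths by file extension."""
    counts = {}
    for path in paths:
        ext = os.path.splitext(path)[1] or "(none)"
        counts[ext] = counts.get(ext, 0) + 1
    return counts

# Simulated find output so the sketch runs without a pipe. In the real
# script, pass sys.stdin to read_paths() instead.
fake_find_output = io.StringIO("./a.txt\n./docs/b.txt\n./Makefile\n")
counts = count_by_extension(read_paths(fake_find_output))
print(counts)  # {'.txt': 2, '(none)': 1}
```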
Fig answered 25/9, 2011 at 1:14 Comment(1)
This only works if your target environment is Unix, though. The beauty of find2perl is that you can write something on Unix and run it anywhere that Perl runs, on Windows for example. Rainband

I think glob could help in your implementation of this.

Thuggee answered 30/9, 2011 at 0:30 Comment(0)

I wrote a Python script to use os.walk() to search-and-replace; it might be a useful thing to look at before writing something like this.

Replace strings in files by Python

And any Python replacement for find(1) is going to rely heavily on os.stat() to check various properties of the file. For example, there are flags to find(1) that check the size of the file or the last modified timestamp.
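For instance, two of those tests might be sketched like this. The function names are invented, os.lstat is used so that symlinks are judged as links rather than followed, and the mapping to find's rounding rules is only approximate:

```python
import os
import tempfile
import time

SECONDS_PER_DAY = 86400

def larger_than(path, nbytes):
    """Rough analogue of `-size +<nbytes>c`: strictly more than nbytes."""
    return os.lstat(path).st_size > nbytes

def modified_within(path, days, now=None):
    """Rough analogue of `-mtime -<days>`: modified less than `days`
    24-hour periods ago."""
    if now is None:
        now = time.time()
    return (now - os.lstat(path).st_mtime) < days * SECONDS_PER_DAY

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as fh:
        fh.write(b"x" * 10)                  # a 10-byte file
    three_days_ago = time.time() - 3 * SECONDS_PER_DAY
    os.utime(path, (three_days_ago, three_days_ago))  # backdate atime/mtime

    checks = (larger_than(path, 5), larger_than(path, 10),
              modified_within(path, 2), modified_within(path, 7))

print(checks)  # (True, False, False, True)
```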

Eradicate answered 30/9, 2011 at 1:40 Comment(0)
