Execute "git submodule foreach" in parallel

Asked 24/4, 2018 at 10:27 Answered 20/12, 2021 at 6:25

Solved git parallel-processing git-submodules

Is there any way to execute a git submodule foreach command in parallel, similar to how the --jobs 8 parameter works with git submodule update?

For example, one of the projects we work on involves almost 200 sub-components (submodules) and we heavily use the foreach command to operate on them. I'd like to speed them up.

PS: In case the solution involves a script, I work on Windows, most of the time using git-bash.

Middling answered 24/4, 2018 at 10:27 Comment(2)

There is no built-in way; you have to use external tools like foreach_submodule.js or git-deep. PS. I haven't tried them, so I have no idea whether they work at all. – Pauiie 24/4, 2018 at 15:14

@Pauiie such a pity that a built-in way is not included. I suppose it is because of the complexity of guaranteeing mutual exclusion between operations, so it is safer not to offer it at all. I'll take a look at those packages, thanks! – Middling 4/5, 2018 at 15:52

I propose a solution based on a cross-platform interpreted language such as Python.

Process Launcher

First of all, you need to define a class that manages the process used to launch the command.

import os
import subprocess


class PFSProcess(object):
    def __init__(self, submodule, path, cmd):
        self.__submodule = submodule
        self.__path = path
        self.__cmd = cmd
        self.__output = None
        self.__p = None

    def run(self):
        # Start the output with the submodule name so each block is identifiable.
        self.__output = "\n\n" + self.__submodule + "\n"
        self.__p = subprocess.Popen(self.__cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True,
                                    cwd=os.path.join(self.__path, self.__submodule))
        # communicate() waits for the process to exit; call it once and keep both streams.
        stdout, stderr = self.__p.communicate()
        self.__output += stdout.decode('utf-8')
        if stderr:
            self.__output += stderr.decode('utf-8')
        # Emit the collected output in a single print call.
        print(self.__output)
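
For a quick experiment outside the class, the same idea (run a shell command in a given directory and collect what it prints) can be sketched with subprocess.run. This is only an illustration: the function name run_in_dir and the echo command are made up here, and stderr is folded into stdout, which the class above does not do.

```python
import subprocess


def run_in_dir(command, directory):
    """Run a shell command in `directory`; return (exit_code, combined_output)."""
    completed = subprocess.run(
        command,
        shell=True,                # the command is a single shell string
        cwd=directory,             # run it inside the given directory
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout
    )
    return completed.returncode, completed.stdout.decode("utf-8", errors="replace")


# Hypothetical usage: run a harmless command in the current directory.
code, output = run_in_dir("echo hello", ".")
print(code, output.strip())
```

Returning the exit code alongside the output makes it possible to report which submodules failed once all of them have finished.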

Multithreading

The next step is to set up multithreaded execution. Python's standard library includes a powerful module for working with threads. You can use it by importing the following module:

import threading

Before creating the threads, you need to create a worker, a function that each thread will call:

def worker(submodule_list, path, command):
    for submodule in submodule_list:
        PFSProcess(submodule, path, command).run()

As you can see, the worker receives a submodule list. For clarity, and because it is outside our scope, I recommend you take a look at .gitmodules, from which you can generate the list of your submodules by reading the file.

💡 < Tip >

As a basic orientation, each submodule entry contains a line like the following:

path = relative_path/project

For that purpose you can use this regular expression:

'path ?= ?([A-Za-z0-9_-]+)(\/[A-Za-z0-9_-]+)*([A-Za-z0-9_-])'

If the regular expression matches, you can extract the relative path by applying the following one to the same line:

' ([A-Za-z0-9_-]+)(\/[A-Za-z0-9_-]+)*([A-Za-z0-9_-])'

Note that the last regular expression returns the relative path with a leading space character.

💡 < / Tip>
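
The tip above can be sketched in Python as follows. This is a minimal sketch, not the project's actual parser: it swaps in a looser pattern than the ones in the tip (it accepts any characters in the path and strips the surrounding whitespace), and the function name read_submodule_paths, the sample text, and the example.invalid URLs are made up for illustration.

```python
import re

# Matches a line such as "path = relative_path/project" and captures the value.
PATH_LINE = re.compile(r"^\s*path\s*=\s*(.+?)\s*$")


def read_submodule_paths(gitmodules_text):
    """Return the value of every `path = ...` line, in file order."""
    paths = []
    for line in gitmodules_text.splitlines():
        match = PATH_LINE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


# Hypothetical .gitmodules content used only for this example.
sample = """[submodule "libs/core"]
\tpath = libs/core
\turl = https://example.invalid/core.git
[submodule "tools"]
\tpath = tools
\turl = https://example.invalid/tools.git
"""

print(read_submodule_paths(sample))  # ['libs/core', 'tools']
```

In a real repository the text would come from reading the .gitmodules file at the top of the working tree.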

Then split the submodule list into as many chunks as the number of jobs you want:

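num_jobs = 8

# One empty list per job; submodules are assigned round-robin.
list_submodule_list = [[] for _ in range(num_jobs)]

i = 0
for submodule in submodules:
    list_submodule_list[i % num_jobs].append(submodule)
    i += 1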
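
To see what the round-robin split produces, here is a small self-contained sketch of the same idea. The function name split_round_robin and the sample names are hypothetical and only serve the illustration.

```python
def split_round_robin(items, num_jobs):
    """Distribute items over num_jobs lists, one item at a time in turn."""
    chunks = [[] for _ in range(num_jobs)]
    for index, item in enumerate(items):
        chunks[index % num_jobs].append(item)
    return chunks


# Five made-up submodule names spread over two jobs.
print(split_round_robin(["a", "b", "c", "d", "e"], 2))  # [['a', 'c', 'e'], ['b', 'd']]
```

With more jobs than submodules, some chunks stay empty, which is harmless because a worker given an empty list simply returns.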

Finally, dispatch each chunk (job) to its own thread and wait until all threads finish:

for i in range(num_jobs):
    t = threading.Thread(target=worker, args=(list_submodule_list[i], self.args.path, self.args.command,))
    self.__threads.append(t)
    t.start()

for i in range(num_jobs):
    self.__threads[i].join()
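
The start-then-join pattern above can also be tried in isolation. The following is a minimal sketch under simplifying assumptions: the worker calls a callback instead of launching a command, and run_chunks, record, and the sample chunks are hypothetical names that do not come from the project.

```python
import threading


def worker(chunk, handle):
    # Each thread processes its own chunk sequentially.
    for item in chunk:
        handle(item)


def run_chunks(chunks, handle):
    """Start one thread per chunk, then wait for all of them to finish."""
    threads = [threading.Thread(target=worker, args=(chunk, handle)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


seen = []
lock = threading.Lock()


def record(name):
    # Guard the shared list explicitly, since several threads append to it.
    with lock:
        seen.append(name)


run_chunks([["a", "c"], ["b"]], record)
print(sorted(seen))  # ['a', 'b', 'c']
```

The order in which items are handled across threads is not deterministic, which is why the result is sorted before printing.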

Obviously I have only covered the basic concepts, but you can find the full implementation in the parallel_foreach_submodule (PFS) project on GitHub.

Degrading answered 4/5, 2018 at 15:43 Comment(1)

Thanks a lot! already using it! BTW, shouldn't the i += 1 be inside the for submodule in submodules loop? – Middling 7/5, 2018 at 6:49

A simple, bash-only solution is to do this (replace <command> with your command):

IFS=$'\n'
for DIR in $(git submodule foreach -q sh -c pwd); do
    cd "$DIR" && <command> &
done
wait

As a generic command (create a file called "git-foreach-parallel"):

#!/bin/bash

if [ -z "$1" ]; then
    echo "Missing Command" >&2
    exit 1
fi

IFS=$'\n'
for DIR in $(git submodule foreach -q sh -c pwd); do
    cd "$DIR" && "$@" &
done
wait

Smarmy answered 11/10, 2021 at 12:46 Comment(3)

Maybe I'm not reading it correctly, but I cannot see the parallelization at any point. – Middling 14/10, 2021 at 10:25

@Middling Notice that there is ` &` at the end of the command. It makes the commands run in parallel – Dhoti 20/12, 2021 at 5:56

Oh! Thanks a lot, I need new glasses 😅 – Middling 20/12, 2021 at 6:3

If somebody is looking for a pure-bash way to do it (to avoid installing Python in a Docker container, for example), this is what helped me:

Usage examples

bash git-submodule-foreach-parallel.sh "git fetch && git checkout master"

bash git-submodule-foreach-parallel.sh "git fetch && git pull"

bash git-submodule-foreach-parallel.sh "git fetch && git push"

COMMAND="git clean -dfx -e \"**/.idea\""
# Running command in parent repository
eval "$COMMAND"
# Running command in submodules
bash git-submodule-foreach-parallel.sh "$COMMAND"

git-submodule-foreach-parallel.sh (the script run by the usage examples)

#!/bin/bash

if [ -z "$1" ]; then
    echo "Missing Command" >&2
    exit 1
fi

# Join all arguments into one command string.
COMMAND="$*"

IFS=$'\n'
for DIR in $(git submodule foreach --recursive -q sh -c pwd); do
    # Pass the values as printf arguments so a "%" in the command is printed literally.
    printf '\nStarted running command "%s" in directory "%s"\n' "$COMMAND" "$DIR" \
    && \
    cd "$DIR" \
    && \
    eval "$COMMAND" \
    && \
    printf 'Finished running command "%s" in directory "%s"\n' "$COMMAND" "$DIR" \
    &
done
wait

Dhoti answered 20/12, 2021 at 6:25 Comment(0)
