Package set-up not propagating to workers with Distributed

Info:

$ julia --version
julia version 1.6.0
$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           158
Model name:                      Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
...

Say I want the following package structure, and want to use ReTest's parallel testing (my issue appears to be with how code-loading works in Distributed, so this isn't really a ReTest-specific issue).

| root/
    | MyPackage/
        | Project.toml
        | Manifest.toml
        | src/
            | MyPackage.jl
        | test/
            | runtests.jl
            | MyPackageTests.jl

I initialised this package in the following way:

$ cd root && julia
(...) pkg> generate MyPackage;
$ cd MyPackage && julia
(...) pkg> activate .
(...) pkg> instantiate
(...) pkg> add ReTest InlineTest Distributed;
...

Fill in MyPackage.jl, runtests.jl, and MyPackageTests.jl with some Julia code. The exact code is not important, although I am following the ReTest guide.

Then to set up:

$ julia
(...) pkg> activate .
(...) pkg> instantiate
(MyPackage) pkg> st
     Project MyPackage v0.1.0
      Status `~/root/MyPackage/Project.toml`
  [bd334432] InlineTest v0.2.0
  [e0db7c4e] ReTest v0.3.2
  [8ba89e20] Distributed
julia> LOAD_PATH
3-element Vector{String}:
 "@"        # Should be current active environment for MyPackage
 "@v#.#"    # Should be @v1.6 on my system
 "@stdlib"  # Should be absolute path of current Julia installation's stdlib
julia> # Should this code be in .jl files? Don't think that should matter.
julia> using Distributed
julia> addprocs(2)
julia> @everywhere include("test/MyPackageTests.jl")
ERROR: On worker 2:
LoadError: ArgumentError: Package MyPackage not found in current path:
- Run `import Pkg; Pkg.add("MyPackage")` to install the MyPackage package.

Stacktrace:
 [1] require
   @ ./loading.jl:871
 [2] include
   @ ./client.jl:444
 [3] top-level scope
   @ none:1
 [4] eval
   @ ./boot.jl:360
 [5] #103
   @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/process_messages.jl:274
 [6] run_work_thunk
   @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/process_messages.jl:63
 [7] run_work_thunk
   @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/process_messages.jl:72
 [8] #96
   @ ./task.jl:406
in expression starting at /path/to/root/MyPackage/test/MyPackageTests.jl:1

...and 2 more exceptions.

Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base ./task.jl:364
 [2] macro expansion
   @ ./task.jl:383 [inlined]
 [3] remotecall_eval(m::Module, procs::Vector{Int64}, ex::Expr)
   @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/macros.jl:223
 [4] top-level scope
   @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/macros.jl:207

I'd like to understand why this happens. On GitHub I see this issue, which was supposedly fixed. When I run the example from that issue, I get exactly the same error as above with MyPackage whenever the using statement refers to the package that the active environment belongs to.

Before filing a bug or opening an issue there, I'd like to check here in case this is a process problem on my part. If not, there's evidently something wrong with Distributed/ReTest and I'll open tickets for those. Any help much appreciated.

Jollity answered 20/1, 2022 at 19:6 Comment(1)
The active Julia environment is not propagated to the workers by default. See github.com/JuliaLang/julia/issues/28781 (Lucienlucienne)

Following on from what @carstenbauer said, the active Julia environment is not propagated to worker processes by default. The workaround is to pass the environment to the workers through the exeflags argument of addprocs, like so:

julia> using Distributed
julia> addprocs(2, exeflags="--project=$(Base.active_project())")
julia> @everywhere include("test/MyPackageTests.jl")
julia> MyPackageTests.runtests()  # runs to completion

I can confirm that this works with both the MyPackage example as well as the one shown in the JuliaLang issue. Thanks to those that contributed towards this answer.
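
To check that the flag took effect, one option is to ask each worker which project it considers active and compare that with the master. This is only a minimal sketch: the names project, new_workers and worker_projects are made up for illustration, and it assumes local workers started from the same machine.

```julia
using Distributed

# Project file of the environment active on the master process.
project = Base.active_project()

# Start two local workers with the same project, as in the answer above.
new_workers = addprocs(2; exeflags="--project=$(project)")

# Ask each new worker which project it considers active.
worker_projects = Dict(w => remotecall_fetch(Base.active_project, w) for w in new_workers)

for (w, p) in worker_projects
    println("worker ", w, " => ", p)
end
```

If every printed path matches the master's project file, using MyPackage should resolve on the workers the same way it does on the master.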

Jollity answered 28/1, 2022 at 15:10 Comment(0)

Try the following code:

using Distributed
addprocs(4)  # or however many you need; alternatively start Julia with the -p option
@everywhere using Pkg  # pkg"..." must be in scope on every process, not only on the master
pkg"activate ."
pkg"instantiate"  # run this when needed
using MyPackage  # load the package on the master process first

@everywhere pkg"activate ."
@everywhere using MyPackage

Explanation: each Julia process is completely separate, so it has its own package state, variables, memory, etc.

Please note that you will usually prefer to load the package on the master process first, as some packages perform extra actions when they are loaded for the first time.
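
The separation between processes can be seen with an ordinary global variable. The following is a small sketch rather than anything specific to MyPackage; the names w, counter, seen_on_worker and worker_value are invented for the illustration.

```julia
using Distributed

w = first(addprocs(1))   # one fresh worker process

counter = 42             # defined on the master only

# The worker's Main module has never seen `counter`.
seen_on_worker = remotecall_fetch(() -> isdefined(Main, :counter), w)

# After an explicit @everywhere definition, each process holds its own copy.
@everywhere counter = 0
counter = 1              # changes the master's copy only
worker_value = remotecall_fetch(() -> getfield(Main, :counter), w)

println("defined on worker before @everywhere: ", seen_on_worker)
println("master: ", counter, ", worker: ", worker_value)
```

The active environment behaves in the same way: activating a project on the master does not change what a worker that is already running can load.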

Kirksey answered 20/1, 2022 at 22:52 Comment(0)

I ran into this same problem, but strangely I could only overcome it by activating my environment in one @everywhere block and importing the required packages in a separate one.

e.g. This worked:

@everywhere begin
    using Pkg
    Pkg.activate(@__DIR__)
end
@everywhere begin
    using YAML
end

But this didn't work:

@everywhere begin
    using Pkg
    Pkg.activate(@__DIR__)
    using YAML
end
> ERROR: LoadError: ArgumentError: Package YAML not found in current path.
- Run `import Pkg; Pkg.add("YAML")` to install the YAML package.
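
A possible explanation, offered as a sketch rather than a confirmed diagnosis: @everywhere appears to collect the using/import statements from its argument and run them on the calling process before the rest of the block is sent out. In the single-block version, using YAML would then run before Pkg.activate has taken effect. The macro expansion can be inspected without any workers; SomePackage below is a hypothetical placeholder name, and nothing is loaded or activated by this snippet.

```julia
using Distributed

# Expand the macro without running it; no workers or packages are needed.
# SomePackage is a hypothetical name standing in for YAML.
expanded = @macroexpand @everywhere begin
    using Pkg
    Pkg.activate(".")
    using SomePackage
end

println(expanded)
```

If the printed expression shows the using statements once on their own near the top and again inside the quoted block passed to remotecall_eval, that would be consistent with the two-block version working: the activation has already completed everywhere by the time the second block's imports are run.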
Overstreet answered 13/4, 2023 at 5:29 Comment(0)
