Parallel for loop in Julia

I am aware there are many questions about running parallel for loops in Julia using @threads, @distributed, and other methods. I have tried to implement those solutions with no luck. The structure of what I'd like to do is as follows.

for index in list_of_indices
    data = h5read("data_set_$index.h5", "data")  # read one independent file
    result = perform_function(data)              # independent computation
    save(result)                                 # write the result out
end

The data sets are independent, and no iteration of the loop depends on any other, so it seems this should be parallelizable.

I have tried, e.g.:

@threads for index in list_of_indices ..., which gives a segmentation fault.

@distributed for index in list_of_indices ..., which runs but never actually applies the function to my data.

I assume I'm missing something about how parallel processes work, and any insight would be appreciated.

Here is an MWE:

Assume we have files data_1.h5, data_2.h5, and data_3.h5 in our working directory. (I don't know how to make this more self-contained, because I think the problem arises from asking multiple threads to read files.)

using HDF5  # h5read comes from HDF5; Distributed is not needed for the threaded version

list = [1, 2, 3]

# Each thread reads one file concurrently; this is where the crash occurs.
Threads.@threads for index in list
    data = h5read("data_$index.h5", "data")
    println(data)
end

The error I get is

signal (11): Segmentation fault
signal (6): Aborted
Allocations: 1587194 (Pool: 1586780; Big: 414); GC: 1
Segmentation fault (core dumped)
Evaporation answered 10/12, 2022 at 19:1

As other people have noted, there are not enough details to diagnose this fully. However, given the current state of information, the safest code with the highest chance of working is:

using Distributed
addprocs(4)              # start 4 worker processes
@everywhere using HDF5   # load HDF5 on every worker, not only on the master

list = [1, 2, 3]

# @distributed splits the iterations across the workers;
# @sync blocks until all of them have finished.
@sync @distributed for index in list
    data = h5read("data_$index.h5", "data")
    println(data)
end

The distributed approach separates the processes completely, so there is much less chance of doing something wrong (e.g., using a library that relies on a shared resource).
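
If you also want the results collected back on the master process, pmap from Distributed is a common alternative to the @distributed loop. Here is a minimal sketch under the same assumptions as above (the data_N.h5 files and the "data" dataset name are carried over from the question):

using Distributed
addprocs(4)
@everywhere using HDF5

list = [1, 2, 3]

# pmap sends each index to a worker process and gathers the return values.
results = pmap(list) do index
    h5read("data_$index.h5", "data")
end
println(results)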

Huihuie answered 11/12, 2022 at 0:3
Thank you! In case others arrive here, another approach is the following. The issue with multithreading seemed to be reading all of the files at the same time. If you instead load the data into, e.g., a dictionary first and then run the for loop with Threads.@threads, the code runs without error (a sketch follows these comments). (Evaporation)
Reading all the files at the same time (I/O) should not in itself cause a segmentation fault. However, something in the h5read implementation (e.g., non-thread-safe shared global state) could have caused the problem. (Huihuie)
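
For reference, a minimal sketch of the read-first-then-thread approach described in the first comment, again assuming the same data_N.h5 files; sum here is only a hypothetical stand-in for the asker's perform_function:

using HDF5

list = [1, 2, 3]

# Read the files serially so that h5read is never called from multiple threads.
data_dict = Dict(index => h5read("data_$index.h5", "data") for index in list)

# The threaded loop now only touches in-memory data.
Threads.@threads for index in list
    result = sum(data_dict[index])  # hypothetical stand-in for perform_function
    println(result)
end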
