How do you select a subset of an array based on a condition in Julia?

Asked 4/6, 2016 at 3:13 Answered 7/6, 2022 at 19:42

How do you simply select a subset of an array based on a condition? I know Julia doesn't use vectorization, but there must be a simple way of doing the following without an ugly-looking multi-line for loop:

julia> map([1,2,3,4]) do x
       return (x%2==0)?x:nothing
       end
4-element Array{Any,1}:
  nothing
 2
  nothing
 4

Desired output:

[2, 4]

Observed output:

[nothing, 2, nothing, 4]

Jephthah answered 4/6, 2016 at 3:13 Comment(3)

What do you mean, "julia doesn't use vectorization"? That's just not true. You often get the fastest performance without using vectorization, but it's up to you to decide whether performance or brevity matters most. – Outbalance 4/6, 2016 at 20:3

+1 to what @Outbalance says. Vectorization improves readability but sometimes yields suboptimal performance. I remember answering a related question regarding find from the performance perspective. Depending on your design considerations, you could code your own efficient routine for x % 2 == 0 or just use Julia's eminently readable filter or find functions as people have described below. – Jacky 5/6, 2016 at 18:38

I meant "vectorization" more from a syntactic point of view than optimization point of view. Though @daycaster points out that I can actually index into arrays in the same way as matlab allows, I hadn't realized that, I had only tried == to index this way, not .==. I'm not sure which answer to pick, they're all right in different ways and very useful. – Jephthah 5/6, 2016 at 19:19

You are looking for filter. For example, filter(x->x%2==0,[1,2,3,5]) returns [2].
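As a minimal sketch of the same idea applied to the array from the question (assuming a Julia 1.x session; the variable names are only for illustration), the predicate can be an anonymous function or the built-in iseven:

```julia
# Keep only the elements for which the predicate returns true.
xs = [1, 2, 3, 4]

evens = filter(x -> x % 2 == 0, xs)   # anonymous-function predicate
evens_builtin = filter(iseven, xs)    # same result using the built-in iseven

println(evens)          # [2, 4]
println(evens_builtin)  # [2, 4]
```

filter returns a new array and leaves xs untouched; filter! is the in-place variant.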

Homesick answered 4/6, 2016 at 3:49 Comment(0)

There are element-wise operators (beginning with a "."):

julia> [1,2,3,4] % 2 .== 0
4-element BitArray{1}:
 false
  true
 false
  true

julia> x = [1,2,3,4]
4-element Array{Int64,1}:
 1
 2
 3
 4

julia> x % 2 .== 0
4-element BitArray{1}:
 false
  true
 false
  true

julia> x[x % 2 .== 0]
2-element Array{Int64,1}:
 2
 4

julia> x .% 2
4-element Array{Int64,1}:
 1
 0
 1
 0
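
The session above is from a 2016 release. On Julia 1.0 and later, x % 2 on an array raises a MethodError, so the remainder needs a dot as well. A rough equivalent for current versions might look like this (the names mask and evens are only illustrative):

```julia
x = [1, 2, 3, 4]

# Broadcast both the remainder and the comparison to get a Boolean mask.
mask = x .% 2 .== 0      # BitVector: false, true, false, true

# Indexing with the mask keeps the elements where the mask is true.
evens = x[mask]

println(evens)           # [2, 4]
```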

Rocker answered 4/6, 2016 at 7:18 Comment(0)

You can use the find() function (or the .== syntax) to accomplish this. E.g.:

julia> x = collect(1:4)
4-element Array{Int64,1}:
 1
 2
 3
 4

julia> y = x[find(x%2.==0)]
2-element Array{Int64,1}:
 2
 4

julia> y = x[x%2.==0]  ## more concise and slightly quicker
2-element Array{Int64,1}:
 2
 4

Note the .== syntax for the element-wise operation. Also, note that find() returns the indices that match the criteria. In this case, the indices matching the criteria are the same as the array elements that match the criteria. For the more general case though, we want to put the find() function in brackets to denote that we are using it to select indices from the original array x.
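
To make the difference between indices and elements visible, here is a small sketch with an array whose values do not coincide with their positions. It assumes Julia 1.0 or later, where find() was renamed to findall(); the sample data is made up for illustration.

```julia
x = [10, 15, 20, 25]

# findall returns the indices where the predicate holds, not the values.
idx = findall(iseven, x)

# Indexing the original array with those indices recovers the matching elements.
y = x[idx]

println(idx)   # [1, 3]
println(y)     # [10, 20]
```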

Update: Good point, @Lutfullah Tomak, about the filter() function. I believe, though, that find() can be quicker and more memory efficient (although I understand that anonymous functions are supposed to get faster in version 0.5, so this might change). At least in my trial, I got:

x = collect(1:100000000);
@time y1 = filter(x->x%2==0,x);
# 9.526485 seconds (100.00 M allocations: 1.554 GB, 2.76% gc time)

@time y2 = x[find(x%2.==0)];
# 3.187476 seconds (48.85 k allocations: 1.504 GB, 4.89% gc time)

@time y3 = x[x%2.==0];
# 2.570451 seconds (57.98 k allocations: 1.131 GB, 4.17% gc time)

Update2: Good points in comments to this post that x[x%2.==0] is faster than x[find(x%2.==0)].
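
For anyone who wants to repeat the comparison on a current release, a sketch along these lines may help. It assumes Julia 1.x (findall instead of find, dotted operators), uses a much smaller array than the original trial, and the helper names by_filter, by_findall and by_mask are hypothetical. No timings are quoted here because they depend on the machine and the Julia version.

```julia
x = collect(1:1_000_000)   # smaller than the original 100 million, for a quick run

# Wrap each approach in a function so global-scope overhead does not dominate.
by_filter(v)  = filter(iseven, v)
by_findall(v) = v[findall(iseven, v)]
by_mask(v)    = v[v .% 2 .== 0]

# Call each once first so compilation time is excluded from the measurement.
by_filter(x); by_findall(x); by_mask(x)

t_filter  = @elapsed by_filter(x)
t_findall = @elapsed by_findall(x)
t_mask    = @elapsed by_mask(x)

println("filter:  ", t_filter, " s")
println("findall: ", t_findall, " s")
println("mask:    ", t_mask, " s")
```

A single @elapsed call is noisy, so repeating the measurement a few times gives a fairer picture.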

Noe answered 4/6, 2016 at 3:26 Comment(6)

why not just x[x%2.==0]? The find here is unnecessary and slow. – Lillalillard 4/6, 2016 at 4:56

I timed these in Julia 0.5: filter is the fastest, and something seems off with the third one because it takes a very long time to compile. – Homesick 4/6, 2016 at 6:8

Anonymous functions will be as fast as normal functions in v0.5. The relevant github issue page is here. So filter will almost certainly be the way to go in v0.5+. – Trypanosome 4/6, 2016 at 11:48

Good to know! Is there an ETA for v0.5? – Noe 4/6, 2016 at 13:1

@aireties - @time y2 … is only faster because it's not doing what you think it's doing. y2 is empty! You're missing the . in .==. Fix that, and you'll find that calling find() is indeed slower. – Illboding 4/6, 2016 at 21:25

@MattB. You're right! Thanks! I've changed it now. – Noe 4/6, 2016 at 22:6

Another updated version:

v[v .% 2 .== 0]

In newer versions of Julia (1.0 and later), you need a broadcasting dot before both % and ==, because % is no longer defined for an array and a scalar without it.
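
A short sketch of equivalent spellings, assuming Julia 1.x (the variable names are just for illustration): the @. macro adds the dots for you, and a named predicate such as iseven can be broadcast directly.

```julia
v = [1, 2, 3, 4]

a = v[v .% 2 .== 0]      # explicit dots on both operators

mask = @. v % 2 == 0     # @. turns every operator and call in the expression into its dotted form
b = v[mask]

c = v[iseven.(v)]        # broadcasting a named predicate

println(a)   # [2, 4]
println(b)   # [2, 4]
println(c)   # [2, 4]
```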

Tolidine answered 7/6, 2022 at 19:42 Comment(0)
