How do you select a subset of an array based on a condition in Julia
J

4

5

How do you simply select a subset of an array based on a condition? I know Julia doesn't use vectorization, but there must be a simple way of doing the following without an ugly-looking multi-line for loop:

julia> map([1,2,3,4]) do x
       return (x % 2 == 0) ? x : nothing
       end
4-element Array{Any,1}:
  nothing
 2
  nothing
 4

Desired output:

[2, 4]

Observed output:

[nothing, 2, nothing, 4]
Jephthah answered 4/6, 2016 at 3:13 Comment(3)
What do you mean, "julia doesn't use vectorization"? That's just not true. You often get the fastest performance without using vectorization, but it's up to you to decide whether performance or brevity matters most.Outbalance
+1 to what @Outbalance says. Vectorization improves readability but sometimes yields suboptimal performance. I remember answering a related question regarding find from the performance perspective. Depending on your design considerations, you could code your own efficient routine for x % 2 == 0 or just use Julia's eminently readable filter or find functions as people have described below.Jacky
I meant "vectorization" more from a syntactic point of view than from an optimization point of view. @daycaster points out that I can actually index into arrays the same way MATLAB allows. I hadn't realized that, because I had only tried == to index this way, not .==. I'm not sure which answer to pick; they're all right in different ways and very useful.Jephthah
H
14

You are looking for filter. For example, filter(x->x%2==0,[1,2,3,5]) returns [2].
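Applied to the array from the question, a minimal sketch could look like the following. It assumes Julia 1.x; the variable names are only illustrative, and the comprehension form is an alternative not mentioned in the answer itself.

```julia
# Julia 1.x sketch; `v` and the result names are illustrative, not from the question.
v = [1, 2, 3, 4]

# filter keeps the elements for which the predicate returns true.
evens = filter(x -> x % 2 == 0, v)

# iseven is a predicate from Base that expresses the same test.
evens_builtin = filter(iseven, v)

# A comprehension with an `if` clause is another one-line alternative.
evens_comp = [x for x in v if x % 2 == 0]
```

All three forms build a new array and leave `v` unchanged.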

Homesick answered 4/6, 2016 at 3:49 Comment(0)
R
9

There are element-wise operators (beginning with a "."):

julia> [1,2,3,4] % 2 .== 0
4-element BitArray{1}:
 false
  true
 false
  true

julia> x = [1,2,3,4]
4-element Array{Int64,1}:
 1
 2
 3
 4

julia> x % 2 .== 0
4-element BitArray{1}:
 false
  true
 false
  true

julia> x[x % 2 .== 0]
2-element Array{Int64,1}:
 2
 4

julia> x .% 2
4-element Array{Int64,1}:
 1
 0
 1
 0
Rocker answered 4/6, 2016 at 7:18 Comment(0)
N
8

You can use the find() function (or the .== syntax) to accomplish this. E.g.:

julia> x = collect(1:4)
4-element Array{Int64,1}:
 1
 2
 3
 4    

julia> y = x[find(x%2.==0)]
2-element Array{Int64,1}:
 2
 4

julia> y = x[x%2.==0]  ## more concise and slightly quicker
2-element Array{Int64,1}:
 2
 4

Note the .== syntax for the element-wise operation. Also, note that find() returns the indices that match the criteria, not the elements. In this case, because x holds 1:4, the matching indices happen to equal the matching elements. In the general case, though, the result of find() has to go inside the brackets so that it selects those indices from the original array x.
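To make the index-versus-element distinction visible, here is a small sketch with an array whose values differ from their positions. It assumes Julia 0.7 or later, where find was renamed to findall; the array contents and variable names are made up for illustration.

```julia
# Julia 0.7+ sketch: `find` is called `findall` in these versions.
# The array and the names below are illustrative only.
x = [10, 15, 20, 25]

# findall returns the positions at which the predicate holds.
idx = findall(iseven, x)

# Indexing with those positions recovers the matching elements.
y = x[idx]

# Indexing with a Boolean mask gives the same elements without building idx.
y_mask = x[iseven.(x)]
```

Here `idx` holds positions 1 and 3, while `y` holds the values 10 and 20.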

Update: Good point, @Lutfullah Tomak, about the filter() function. I believe, though, that find() can be quicker and more memory efficient (although I understand that anonymous functions are supposed to get faster in version 0.5, so this might change). At least in my trial, I got:

x = collect(1:100000000);
@time y1 = filter(x->x%2==0,x);  
# 9.526485 seconds (100.00 M allocations: 1.554 GB, 2.76% gc time)    

@time y2 = x[find(x%2.==0)]; 
# 3.187476 seconds (48.85 k allocations: 1.504 GB, 4.89% gc time)

@time y3 = x[x%2.==0];
# 2.570451 seconds (57.98 k allocations: 1.131 GB, 4.17% gc time)    

Update2: Good points in comments to this post that x[x%2.==0] is faster than x[find(x%2.==0)].
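For anyone who wants to rerun the comparison on a current Julia version, a rough sketch follows. It assumes Julia 1.x syntax (findall instead of find, dotted operators), uses a much smaller array than the original test, and the helper function names are hypothetical. No timings are quoted because they depend on the machine and the Julia version.

```julia
# Hypothetical helper names; Julia 1.x syntax.
select_filter(v)  = filter(iseven, v)
select_findall(v) = v[findall(iseven, v)]
select_mask(v)    = v[v .% 2 .== 0]

# Smaller than the original 100 million elements, to keep the run short.
x = collect(1:1_000_000)

# Call each function once first so compilation time is not part of the measurement.
for f in (select_filter, select_findall, select_mask)
    f(x)
end

@time y1 = select_filter(x)
@time y2 = select_findall(x)
@time y3 = select_mask(x)

# All three approaches should return the same elements.
@assert y1 == y2 == y3
```

Wrapping each approach in a function and warming it up keeps compilation out of the numbers that @time reports.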

Noe answered 4/6, 2016 at 3:26 Comment(6)
why not just x[x%2.==0]? The find here is unnecessary and slow.Lillalillard
I timed this in Julia 0.5: filter is the fastest, and something is off with the third one because it takes a very long time to compile.Homesick
Anonymous functions will be as fast as normal functions in v0.5. The relevant github issue page is here. So filter will almost certainly be the way to go in v0.5+.Trypanosome
Good to know! Is there an ETA for v0.5?Noe
@aireties - @time y2 … is only faster because it's not doing what you think it's doing. y2 is empty! You're missing the . in .==. Fix that, and you'll find that calling find() is indeed slower.Illboding
@MattB. You're right! Thanks! I've changed it now.Noe
T
0

Another updated version:

v[v .% 2 .== 0]

In newer versions of Julia (1.0 and later), the broadcasting dot is needed before both % and ==, because % is no longer defined between an array and a scalar.
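A short sketch of the dotted form, assuming Julia 1.x. The variable names are illustrative, and the @. macro variant is an extra that is not part of the answer above.

```julia
# Julia 1.x sketch; names are illustrative.
v = [1, 2, 3, 4]

# Both operators carry a dot, so each one is applied element by element.
mask = v .% 2 .== 0

# Indexing with the Boolean mask keeps the elements where the mask is true.
evens = v[mask]

# The @. macro adds the dots to every operator in the expression.
mask_macro = @. v % 2 == 0
evens_macro = v[mask_macro]
```

The mask has one Boolean entry per element of `v`, so it must have the same length as the array it indexes.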

Tolidine answered 7/6, 2022 at 19:42 Comment(0)
