How to select elements from array in Julia matching predicate?

Julia appears to have a lot of Matlab-like features. I'd like to select elements from an array using a predicate. In Matlab I can do this like:

>> a = 2:7 ;
>> a > 4

ans =

     0     0     0     1     1     1

>> a(a>4)

ans =

     5     6     7

I found a somewhat clunky way to do part of this in Julia:

julia> a = 2:7
2:7

julia> [int(x > 3) for x in a]
6-element Array{Any,1}:
 0
 0
 1
 1
 1
 1

(This uses what Wikipedia calls a list comprehension.) I haven't figured out how to use a mask like this to select elements in Julia, but I may be barking up the wrong tree. How would one do a predicate selection from an array in Julia?

Hand asked 11/1, 2015 at 6:10

You can use very Matlab-like syntax if you use a dot (.) for elementwise comparison:

julia> a = 2:7
2:7

julia> a .> 4
6-element BitArray{1}:
 false
 false
 false
  true
  true
  true

julia> a[a .> 4]
3-element Array{Int32,1}:
 5
 6
 7
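If you need the positions of the matching elements rather than the elements themselves (the analogue of Matlab's find), findall accepts either the Boolean mask or the predicate directly. A minimal sketch:

```julia
a = 2:7

mask = a .> 4                           # BitVector: false false false true true true
idx_from_mask = findall(mask)           # positions where the mask is true: [4, 5, 6]
idx_from_pred = findall(x -> x > 4, a)  # same positions, without an intermediate mask

selected = a[idx_from_mask]             # indexing with the positions: [5, 6, 7]
```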

Alternatively, you can call filter if you want a more functional predicate approach:

julia> filter(x -> x > 4, a)
3-element Array{Int32,1}:
 5
 6
 7
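Two variants of filter may also be useful: Iterators.filter returns a lazy iterator (handy when the result is consumed only once, for example summed), and filter! removes non-matching elements in place. A small sketch; note that filter! needs a mutable Vector, so the range is collected first:

```julia
a = 2:7

# Lazy: no intermediate array is built; elements are produced while summing
lazy = Iterators.filter(x -> x > 4, a)
s = sum(lazy)             # 5 + 6 + 7

# In place: filter! mutates its argument, which must be a mutable Vector,
# not an immutable range like 2:7
v = collect(a)
filter!(x -> x > 4, v)    # v now holds only the elements greater than 4
```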
Haunt answered 11/1, 2015 at 6:27
Hand: For the record, it looks like the elementwise method is about twice as fast as calling filter.

Array comprehension in Julia is somewhat more primitive than list comprehension in Haskell or Python. There are two solutions: you can either use a higher-order filtering function or use broadcasting operations.

Higher-order filtering

filter(x -> x > 4, a)

This calls the filter function with the predicate x -> x > 4 (see Anonymous functions in the Julia manual).
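The predicate does not have to be anonymous. A sketch showing a named function (isbig is a hypothetical name chosen for illustration) and the partially applied form >(4), which, as far as I know, is available from Julia 1.2 onward and behaves like x -> x > 4:

```julia
a = 2:7

# A named predicate works exactly like the anonymous one
isbig(x) = x > 4              # hypothetical helper, for illustration only
r1 = filter(isbig, a)

# Partially applied comparison: >(4) is a callable equivalent to x -> x > 4
r2 = filter(>(4), a)
```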

Broadcasting and indexing

a[Bool[a[i] > 4 for i = 1:length(a)]]

This performs an elementwise comparison between the elements of a and 4, then uses the resulting array of Booleans to index a. It can be written more compactly using a broadcasting operator:

a[a .> 4]
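In current Julia versions, comprehensions also accept an if guard, which brings them closer to the Python or Haskell list comprehensions mentioned above. A sketch:

```julia
a = 2:7

# Keep x only when the guard condition holds
big = [x for x in a if x > 4]

# The guard can be combined with a transformation of the kept elements
doubled_big = [2x for x in a if x > 4]
```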
Caroche answered 11/1, 2015 at 6:36
Cypsela: I have an array and would like to filter it on more than one condition, for example a > 3 && a < 5. However, when I try this I get the error "non-boolean (BitArray{1}) used in boolean context". What is the problem here?
Emikoemil: This works fine with filter: filter(x -> (x > 3 && x < 5), a) docs.julialang.org/en/latest/stdlib/collections/…
Emikoemil: For arrays, you have to use bitwise operators: a .> 3 & a .< 5. Notice the single & for bitwise AND.
Johathan: Note that a=[1; 2; 3]; a[a.>0 & a.<1] will not yield the expected result! It gives 1 2 3, because & has higher precedence than the comparison operators. A correct solution is a[(a.>0) & (a.<1)].
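In Julia 1.x the elementwise AND is written .& (as the Julia 1.3.1 answer below also shows), and the parentheses are still required because .& binds more tightly than .> and .<. A sketch of three equivalent ways to keep the values strictly between 3 and 5, using a small example vector:

```julia
a = [1, 2, 3, 4, 5, 6]

# Parenthesize each comparison: .& has higher precedence than .> and .<
r1 = a[(a .> 3) .& (a .< 5)]

# Chained dotted comparison, equivalent to the line above
r2 = a[3 .< a .< 5]

# filter sees one scalar at a time, so a plain chained comparison (or &&) works
r3 = filter(x -> 3 < x < 5, a)
```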

I'm currently using Julia 1.3.1 and some syntax has changed compared to earlier answers. To filter an array on multiple conditions I had to do:

x = range(0,1,length=100)
x[(x .> 0.4) .& (x .< 0.51)] 

Note the '.&', which is needed for an elementwise AND.
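The same dotted pattern covers the other logical operations: .| for an elementwise OR and .! to negate a whole mask. A sketch on a small example grid (the bounds are chosen away from the grid points to avoid floating-point edge cases):

```julia
x = collect(0.0:0.1:1.0)                 # 11 values: 0.0, 0.1, ..., 1.0

inside  = (x .> 0.45) .& (x .< 0.55)     # AND: both conditions hold (only 0.5)
outside = .!inside                       # NOT: flip every element of the mask
ends    = (x .< 0.05) .| (x .> 0.95)     # OR: at least one condition holds (0.0 and 1.0)

middle = x[inside]
```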

Corrinacorrine answered 14/2, 2020 at 7:43

I would like to add an aspect that has not been covered by the previous answers. If you want to filter the array by index (as opposed to by the values it contains), you can do so with a[1:end ...], where instead of the dots you apply a broadcast operator to the index values. For example, to remove the third element you would write

a[1:end .!= 3].
Quadrivium answered 2/2, 2021 at 22:56
Miles: This instance also allows a[1:end .!= [2,3]]. But be careful, because a[1:end .!= []] throws an exception.
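Broadcasting .!= against a vector compares elements position by position rather than testing membership, so it only works when the shapes are compatible. To drop several indices regardless of length, a membership test or setdiff may be more robust. A sketch using an example vector and index list:

```julia
a = collect(10:10:60)        # [10, 20, 30, 40, 50, 60]
drop = [2, 3]                # indices to remove

# Membership test; Ref(drop) makes broadcasting treat drop as one collection
r1 = a[(1:end) .∉ Ref(drop)]

# Same result by taking the set difference of the index range
r2 = a[setdiff(1:end, drop)]

# An empty index list is fine here: nothing is removed
r3 = a[(1:end) .∉ Ref(Int[])]
```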

To filter the keys in a dictionary, this worked for me:

mydict = Dict("key1" => 1.0, "key2" => 2.0, "a big string with a part of a string" => 3.0)
filter(x -> occursin("part of a string", string(x)), keys(mydict))

Here is what the output looks like in the Julia 1.0 REPL:

julia> mydict = Dict("key1" => 1.0, "key2" => 2.0, "a big string with a part of a string" => 3.0)
Dict{String,Float64} with 3 entries:
  "key2"                                 => 2.0
  "key1"                                 => 1.0
  "a big string with a part of a string" => 3.0

julia> filter(x -> occursin("part of a string", string(x)), keys(mydict))
Set(["a big string with a part of a string"])

The same approach works in general for filtering an array of strings.
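For a plain array of strings the call is the same and returns an array. filter can also be applied to the Dict itself, in which case the predicate receives each key => value Pair and a new Dict is returned. A sketch with example data:

```julia
words = ["apple pie", "banana", "cherry pie", "date"]

# Same predicate style on a Vector{String}; the result is a Vector{String}
pies = filter(s -> occursin("pie", s), words)

mydict = Dict("key1" => 1.0, "key2" => 2.0, "a big string with a part of a string" => 3.0)

# Filtering the Dict passes each Pair to the predicate; p.second is the value
bigvals = filter(p -> p.second > 1.5, mydict)
```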

Hope that helps.

Iatrochemistry answered 20/2, 2019 at 14:19

Here is a benchmark of the two approaches.

With a random Array{Float64} of length 1,000, filter(x->x>0.5,arr) seems faster than arr[arr.>0.5]:

This is for arr[arr.>0.5]:

BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):   6.080 μs … 819.740 μs  ┊ GC (min … max):  0.00% … 97.77%
 Time  (median):      7.440 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   11.572 μs ±  25.739 μs  ┊ GC (mean ± σ):  11.68% ±  5.28%

   ▅█▆▄▂          ▁▁▂▃▆▅▄▃▂▁                                   ▁
  ▆██████▇▆▆▆▅▅▄▄▇██████████▇▆▆▆▆▆▅▄▅▅▆▅▅▄▄▄▄▄▄▃▄▃▄▅▄▃▅▅▅▅▄▅▅▆ █
  6.08 μs       Histogram: log(frequency) by time      30.7 μs <

 Memory estimate: 44.00 KiB, allocs estimate: 6.

And this is for filter(x->x>0.5,arr):

BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  5.317 μs … 474.000 μs  ┊ GC (min … max):  0.00% … 97.66%
 Time  (median):     6.767 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   8.992 μs ±  15.098 μs  ┊ GC (mean ± σ):  11.70% ±  6.97%

  ▁▄▅▆▇█▆▅▂   ▁          ▁▁▃▃▂▃▃▃▂▂▁                          ▂
  ██████████████▇▇▅▇▄▄▄▆█████████████▇▆▆▅▅▆▆▅▆▄▅▅▅▅▅▄▅▃▄▃▃▂▃▂ █
  5.32 μs      Histogram: log(frequency) by time        21 μs <

 Memory estimate: 78.17 KiB, allocs estimate: 3.

I'm new to Julia. Could anyone tell me why this would happen?

Gigi answered 7/7 at 11:51
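One plausible reason for the difference in the memory estimates, assuming Base's filter on a Vector works roughly like the sketch below: it allocates an output as long as the input up front, fills it in a single pass, and shrinks it at the end, whereas arr[arr .> 0.5] first builds a BitVector mask and then copies only the selected elements. filter_onepass and select_with_mask are hypothetical names, and the real Base implementation may differ between Julia versions.

```julia
# Hypothetical single-pass filter, similar in spirit to Base.filter on arrays
function filter_onepass(pred, v::AbstractVector{T}) where {T}
    out = Vector{T}(undef, length(v))   # worst case: every element is kept
    n = 0
    for x in v
        if pred(x)
            n += 1
            out[n] = x                  # write the kept element into the next free slot
        end
    end
    return resize!(out, n)              # shrink to the number of kept elements
end

# Mask-based selection for comparison: allocates a BitVector mask and then
# a result sized to the number of true entries in the mask
select_with_mask(v, t) = v[v .> t]

v = [0.1, 0.7, 0.4, 0.9, 0.6]
kept = filter_onepass(x -> x > 0.5, v)
```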

11.572 μs ± 25.739 μs and 8.992 μs ± 15.098 μs are not very different, even after 10,000 samples. As a quick estimate, this means that the first one (arr[arr.>0.5]) would be faster about 47% of the time, which is a pretty good synonym for "about equal".
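The 47% figure can be reproduced if the two timing distributions are assumed to be independent and approximately normal, with the reported means and standard deviations (only a rough approximation, since the histograms above are clearly skewed). Here Φ is the standard normal CDF:

```latex
\begin{aligned}
D &= T_{\text{mask}} - T_{\text{filter}}, \qquad
\mu_D = 11.572 - 8.992 = 2.580\ \mu\text{s}, \\
\sigma_D &= \sqrt{25.739^2 + 15.098^2} \approx \sqrt{890.45} \approx 29.84\ \mu\text{s}, \\
P(T_{\text{mask}} < T_{\text{filter}}) &= P(D < 0)
  = \Phi\!\left(-\frac{2.580}{29.84}\right) \approx \Phi(-0.086) \approx 0.47 .
\end{aligned}
```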

Facsimile answered 22/7 at 0:34

© 2022 - 2024 — McMap. All rights reserved.