How to write "good" Julia code when dealing with multiple types and arrays (multiple dispatch)

OP UPDATE: Note that as of Julia v0.5, the idiomatic approach to answering this question is to just define mysquare(x::Number) = x^2. The vectorised case is covered by automatic broadcasting with dot syntax, i.e. x = randn(5); mysquare.(x). See also the newer answer explaining dot syntax in more detail.

I am new to Julia, and given my Matlab origins, I am having some difficulty determining how to write "good" Julia code that takes advantage of multiple dispatch and Julia's type system.

Consider the case where I have a function that provides the square of a Float64. I might write this as:

function mysquare(x::Float64)
    return(x^2);
end

Sometimes, I want to square all the Float64s in a one-dimensional array, but don't want to write out a loop over mysquare every time, so I use multiple dispatch and add the following:

function mysquare(x::Array{Float64, 1})
    y = Array(Float64, length(x));
    for k = 1:length(x)
        y[k] = x[k]^2;
    end
    return(y);
end

But now I am sometimes working with Int64, so I write out two more functions that take advantage of multiple dispatch:

function mysquare(x::Int64)
    return(x^2);
end
function mysquare(x::Array{Int64, 1})
    y = Array(Float64, length(x));
    for k = 1:length(x)
        y[k] = x[k]^2;
    end
    return(y);
end

Is this right? Or is there a more idiomatic way to deal with this situation? Should I use type parameters like this?

function mysquare{T<:Number}(x::T)
    return(x^2);
end
function mysquare{T<:Number}(x::Array{T, 1})
    y = Array(Float64, length(x));
    for k = 1:length(x)
        y[k] = x[k]^2;
    end
    return(y);
end

This feels sensible, but will my code run as quickly as the case where I avoid parametric types?

In summary, there are two parts to my question:

  1. If fast code is important to me, should I use parametric types as described above, or should I write out multiple versions for different concrete types? Or should I do something else entirely?

  2. When I want a function that operates on arrays as well as scalars, is it good practice to write two versions of the function, one for the scalar, and one for the array? Or should I be doing something else entirely?

Finally, please point out any other issues you can think of in the code above as my ultimate goal here is to write good Julia code.

Cortege answered 29/7, 2014 at 6:2

Julia compiles a specialized version of your function for each combination of concrete argument types, the first time it is called with that combination. Thus, to answer part 1, there is no performance difference: the parametric way is the way to go.
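On Julia 1.x the function f{T}(...) form used in this thread no longer parses, so here is a minimal sketch of the same parametric scalar method in where syntax (assuming you are on Julia 1.x). The point it illustrates is the one above: each call with a new concrete argument type gets its own compiled, type-specific method instance, so the Int call and the Float64 call below run separate specialized code.

```julia
# One parametric method covers every subtype of Number (Julia 1.x `where` syntax).
function mysquare(x::T) where {T<:Number}
    return x^2
end

# The first call with an Int compiles an Int-specialized instance,
# the first call with a Float64 compiles a Float64-specialized one.
a = mysquare(3)      # Int in, Int out
b = mysquare(1.5)    # Float64 in, Float64 out
println((a, b))
```

Writing mysquare(x::Number) = x^2 instead produces the same specializations; the type parameter only matters when you need to refer to T inside the body.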

As for part 2, it might be a good idea in some cases to write a separate version (sometimes for performance reasons, e.g., to avoid a copy). In your case, however, you can use the built-in macro @vectorize_1arg (available in Julia versions of that time; it was later deprecated in favor of dot syntax) to automatically generate the array version, e.g.:

function mysquare{T<:Number}(x::T)
    return(x^2)
end
@vectorize_1arg Number mysquare
println(mysquare([1,2,3]))

As for general style, don't use semicolons, and mysquare(x::Number) = x^2 is a lot shorter.
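A short sketch of that style in current Julia (assuming Julia 0.6 or later, where @vectorize_1arg is gone): the one-line scalar definition is all you write, and a dot call applies it to each element of an array, with no separate array method.

```julia
# Short-form scalar definition, no semicolons.
mysquare(x::Number) = x^2

# The dot call maps the scalar method over the array elementwise,
# which is what @vectorize_1arg used to generate for you.
v = mysquare.([1, 2, 3])
println(v)
```

The element type of the result follows from what mysquare returns, so an Int input gives an Int array here.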

As for your vectorized mysquare, consider the case where T is a BigFloat: your output array is still Float64, so the extra precision is lost. One way to handle this would be to change it to

function mysquare{T<:Number}(x::Array{T,1})
    n = length(x)
    y = Array(T, n)
    for k = 1:n
        @inbounds y[k] = x[k]^2
    end
    return y
end

where I've added the @inbounds macro to boost speed: we know every index is within range, so there is no need to check for bounds violations on every access. This function could still have issues in the event that the type of x[k]^2 isn't T. An even more defensive version would perhaps be
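For Julia 1.x, where Array(T, n) no longer exists, a minimal sketch of the same idea might look like this. Swapping in similar(x) for Array(T, n) and accepting any AbstractVector are my choices, not the answer's: similar(x) allocates an array with the same element type T and the same indices as x, and eachindex(y, x) checks that the two index ranges agree, which is what makes skipping bounds checks safe.

```julia
# Output element type matches the input's T, so BigFloat stays BigFloat.
function mysquare(x::AbstractVector{T}) where {T<:Number}
    y = similar(x)                     # same element type T, same indices as x
    @inbounds for k in eachindex(y, x) # indices valid for both arrays
        y[k] = x[k]^2
    end
    return y
end

big_result = mysquare(BigFloat[1.5, 2])
println(eltype(big_result))
```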

function mysquare{T<:Number}(x::Array{T,1})
    n = length(x)
    y = Array(typeof(one(T)^2), n)
    for k = 1:n
        @inbounds y[k] = x[k]^2
    end
    return y
end

where one(T) gives 1 if T is an Int, 1.0 if T is a Float64, and so on, so typeof(one(T)^2) is the type that squaring a T produces. These considerations only matter if you want to write hyper-robust library code. If you will really only be dealing with Float64s, or things that can be promoted to Float64s, then it isn't an issue. It seems like hard work, but the power is amazing. You can always just settle for Python-like performance and disregard all type information.
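The one(T) trick carries over to Julia 1.x unchanged. A minimal sketch, assuming Julia 1.x and using similar(x, S) (my substitution for Array(S, n)) to allocate an array shaped like x with element type S:

```julia
function mysquare(x::AbstractVector{T}) where {T<:Number}
    # one(T)^2 has whatever type squaring a T produces:
    # Int for Int, Float64 for Float64, Rational{Int} for Rational{Int}, ...
    S = typeof(one(T)^2)
    y = similar(x, S)                  # like x, but with element type S
    @inbounds for k in eachindex(y, x)
        y[k] = x[k]^2
    end
    return y
end

r = mysquare([1//2, 3//1])
println(r)
```

This still assumes every element squares to the same type as one(T), which holds for the standard numeric types.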

Robinet answered 29/7, 2014 at 18:53
Why mysquare{T<:Number}(x::T) instead of mysquare(x::Number)? – She
No reason, it was probably just to make it visually consistent with the vectorized versions. Generally I wouldn't write it like that. – Robinet
You might want to update this for the vectorization changes. This pops up in the sidebar because it's highly upvoted, so people searching might find @vectorize_1arg when it has since been replaced by the . notation. – Coontie

As of Julia 0.6 (c. June 2017), the "dot syntax" provides an easy and idiomatic way to apply a function to a scalar or an array.

You only need to provide the scalar version of the function, written in the normal way.

function mysquare(x::Number)
    return(x^2)
end

Append a . to the function name (or prepend it to the operator) to call it on every element of an array:

x = [1 2 3 4]
x2 = mysquare(2)     # 4
xs = mysquare.(x)    # [1 4 9 16]
xs = mysquare.(x'*x) # [1 4 9 16; 4 16 36 64; 9 36 81 144; 16 64 144 256]
y  = x .+ 1          # [2 3 4 5]

Note that dot calls also handle broadcasting, as in the last example, where the scalar 1 is added to every element of x.
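Broadcasting also works between arrays of different shapes: dimensions of length 1 are stretched to match. A small sketch, reusing the 1×4 row x from the example above; the column vector col is a hypothetical extra input added for illustration:

```julia
mysquare(x::Number) = x^2

x   = [1 2 3 4]     # 1×4 row matrix
col = [10, 20]      # hypothetical 2-element column vector

# The row is repeated down and the column across: the result is 2×4.
z = col .+ x

# Dot calls on the scalar method broadcast in the same way.
w = mysquare.(col .+ x)
println(size(w))
```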

If you have multiple dot calls in the same expression, they will be fused, so that y = sqrt.(sin.(x)) makes a single pass and a single allocation, instead of creating a temporary array holding sin.(x) and then passing it to sqrt. (This is different from Matlab/NumPy/Octave/Python/R, which don't make such a guarantee.)
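A rough way to picture the fusion: the nested dot calls behave like one broadcast of the composed function over x. The sketch below checks that equivalence and also shows the in-place form, which writes into an existing array with no temporary for sin.(x). The sample values are chosen (my assumption) so that sin(x) is positive and sqrt is defined.

```julia
x = [0.5, 1.0, 1.5, 2.0]    # sin is positive on these values

# Nested dot calls are fused into a single loop.
fused = sqrt.(sin.(x))

# Conceptually the same as one broadcast of an anonymous function.
onepass = broadcast(v -> sqrt(sin(v)), x)

# In-place version: results go straight into the preallocated y.
y = similar(x)
y .= sqrt.(sin.(x))
```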

The macro @. vectorizes everything on a line, so @. y = sqrt(sin(x)) is the same as y .= sqrt.(sin.(x)); the assignment also becomes an in-place .=, so y must already exist. This is particularly handy with polynomials, where the repeated dots can be confusing.
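For instance, with a polynomial (the coefficients here are arbitrary, picked only for illustration), @. saves writing a dot on every operator. Because @. turns = into .=, the output array is allocated first:

```julia
x = [0.0, 1.0, 2.0]
y = similar(x)             # @. turns `=` into `.=`, so y must exist first

# Equivalent to: y .= 3 .* x.^2 .+ 2 .* x .+ 1
@. y = 3x^2 + 2x + 1
println(y)
```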

Manslayer answered 22/2, 2020 at 2:3

© 2022 - 2024 — McMap. All rights reserved.