Where does the magic of singleton arrays come from?
Asked Answered
P

1

7

The following difference between Vector{Missing} and Vector{Int} surprised me (in a positive way):

julia> @btime fill(20, 10^7);
  15.980 ms (2 allocations: 76.29 MiB)

julia> @btime fill(missing, 10^7);
  20.603 ns (1 allocation: 80 bytes)

julia> Base.summarysize(fill(20, 10^7))
80000040

julia> Base.summarysize(fill(missing, 10^7))
40

julia> typeof(fill(20, 10^7))
Vector{Int64} (alias for Array{Int64, 1})

julia> typeof(fill(missing, 10^7))
Vector{Missing} (alias for Array{Missing, 1})

Given that fill(missing, n) does not result in some optimized structure like FillArray, how is this optimization on singletons implemented? I guess it falls out in some way automagically from the fact that singletons have zero size, but how?

I tried to read array.c and julia.h, but can't really follow the details enough. Maybe the real question is instead how are singletons handled by the runtime system...?

Phytobiology answered 14/1, 2022 at 15:2 Comment(0)
U
7

The basic answer is that for an a = Array(T) Julia always allocates sizeof(T)*length(a)+40 bytes. Since sizeof(Missing) == 0, this means it doesn't allocate any bytes related to the length.

Indexing occurs on line 569 of array.c, and the relevant line is

jl_value_t *r = undefref_check((jl_datatype_t*)eltype, jl_new_bits(eltype, &((char*)a->data)[i * a->elsize]))

When the size of the array is zero, a->data[i * a->elsize] refers to the beginning of a->data. The content of this is, however, completely irrelevant: when eltype has zero jl_datatype_size, jl_new_bits ignores the data pointer and instead returns the instance field from the eltype, which is the singleton object.

Ustkamenogorsk answered 14/1, 2022 at 15:6 Comment(3)
I thought as much, but how does indexing work with this? Is there a magic value at the beginning of the allocated block (after the length zero data part) to tell the call "I'm empty, but look up the singleton here"?Phytobiology
Ah, this is the special place: github.com/JuliaLang/julia/blob/…. jl_new_bits ignores the null pointer data and returns the instance stored in the jl_datatype_t of the singleton.Phytobiology
You could probably either add another answer with that info or suggest an edit to Oscar's answer to include that since it's part of what you wanted to know. (Comments on SO are somewhat ephemeral.)Sheerlegs

© 2022 - 2024 — McMap. All rights reserved.