What is a "symbol" in Julia?

I am trying to use Julia's DataFrames package, specifically the readtable() function with the names option, but that requires a vector of symbols.

  • what is a symbol?
  • why would they choose that over a vector of strings?

So far I have found only a handful of references to the word symbol in the Julia language. It seems that symbols are represented by ":var", but it is far from clear to me what they are.

Aside: I can run

df = readtable( "table.txt", names = [symbol("var1"), symbol("var2")] )

My two bulleted questions still stand.

Shy answered 5/5, 2014 at 19:58 Comment(1)
Some conversation on this topic can be found here: groups.google.com/d/msg/julia-users/MS7KW8IU-0o/cQ-yDOs_CQEJ – Contemporaneous

Symbols in Julia are the same as in Lisp, Scheme or Ruby. However, the answers to those related questions are not really satisfactory, in my opinion. If you read those answers, it seems that the reason a symbol is different from a string is that strings are mutable while symbols are immutable, and symbols are also "interned" – whatever that means. Strings do happen to be mutable in Ruby and Lisp, but they aren't in Julia, and that difference is actually a red herring. The fact that symbols are interned – i.e. hashed by the language implementation for fast equality comparisons – is also an irrelevant implementation detail. You could have an implementation that doesn't intern symbols and the language would be exactly the same.

So what is a symbol, really? The answer lies in something that Julia and Lisp have in common – the ability to represent the language's code as a data structure in the language itself. Some people call this "homoiconicity" (Wikipedia), but others don't seem to think that alone is sufficient for a language to be homoiconic. But the terminology doesn't really matter. The point is that when a language can represent its own code, it needs a way to represent things like assignments, function calls, things that can be written as literal values, etc. It also needs a way to represent its own variables. I.e., you need a way to represent – as data – the foo on the left-hand side of this:

foo == "foo"

Now we're getting to the heart of the matter: the difference between a symbol and a string is the difference between foo on the left-hand side of that comparison and "foo" on the right-hand side. On the left, foo is an identifier that evaluates to the value bound to the variable foo in the current scope. On the right, "foo" is a string literal and it evaluates to the string value "foo". A symbol in both Lisp and Julia is how you represent a variable as data. A string represents itself. You can see the difference by applying eval to them:

julia> eval(:foo)
ERROR: foo not defined

julia> foo = "hello"
"hello"

julia> eval(:foo)
"hello"

julia> eval("foo")
"foo"

What the symbol :foo evaluates to depends on what – if anything – the variable foo is bound to, whereas "foo" always just evaluates to "foo". If you want to construct expressions in Julia that use variables, then you're using symbols (whether you know it or not). For example:

julia> ex = :(foo = "bar")
:(foo = "bar")

julia> dump(ex)
Expr
  head: Symbol =
  args: Array{Any}((2,))
    1: Symbol foo
    2: String "bar"
  typ: Any
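
The same structure can also be read off programmatically instead of through dump. A minimal sketch, assuming a Julia 1.x session; the names head, lhs and rhs are only illustrative:

```julia
# Quote an assignment and pull the resulting Expr apart by hand.
ex = :(foo = "bar")

head = ex.head      # the kind of expression; for an assignment, the symbol named "="
lhs  = ex.args[1]   # left-hand side: the variable foo, represented as a Symbol
rhs  = ex.args[2]   # right-hand side: an ordinary String value

println(typeof(ex))             # the quoted code is an Expr
println(head === Symbol("="))   # the head is itself a Symbol
println(lhs === :foo)           # the variable is stored as the Symbol :foo
println(rhs == "bar")           # the literal is stored as the String "bar"
```

The variable and the string literal end up as values of two different types inside the same expression object.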

What that dumped-out stuff shows, among other things, is that there's a :foo symbol object inside of the expression object you get by quoting the code foo = "bar". Here's another example, constructing an expression with the symbol :foo stored in the variable sym:

julia> sym = :foo
:foo

julia> eval(sym)
"hello"

julia> ex = :($sym = "bar"; 1 + 2)
:(begin
        foo = "bar"
        1 + 2
    end)

julia> eval(ex)
3

julia> foo
"bar"

If you try to do this when sym is bound to the string "foo", it won't work:

julia> sym = "foo"
"foo"

julia> ex = :($sym = "bar"; 1 + 2)
:(begin
        "foo" = "bar"
        1 + 2
    end)

julia> eval(ex)
ERROR: syntax: invalid assignment location ""foo""

It's pretty easy to see why this won't work – if you tried to assign "foo" = "bar" by hand, it wouldn't work either.
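
If the variable name only exists as a string, one way around this is to convert it to a symbol before interpolating. A minimal sketch, assuming Julia 1.x, where the constructor is spelled Symbol (capitalized); the variable name `name` is hypothetical:

```julia
name = "foo"          # a variable name that happens to be held as a String
sym  = Symbol(name)   # convert it to a Symbol before building code with it

# Interpolating the Symbol produces a valid assignment target.
ex = :($sym = "bar"; 1 + 2)

result = eval(ex)     # evaluates the block in the current module's global scope

println(result)       # value of the last expression in the block
println(foo)          # the global variable foo now exists and holds "bar"
```

This has to run at top level (REPL or script), since eval always works in global scope.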

This is the essence of a symbol: a symbol is used to represent a variable in metaprogramming. Once you have symbols as a data type, it becomes tempting to use them for other things, like hash keys. But that's an incidental, opportunistic usage of a data type that has another primary purpose.
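
As an illustration of that secondary use, here is a small sketch of symbols as dictionary keys in Julia 1.x. The names and numbers are made up for the example:

```julia
# Symbols used as lightweight dictionary keys (an incidental use, as noted above).
ages = Dict(:alice => 31, :bob => 27)
ages[:carol] = 45                      # add another entry keyed by a Symbol

println(ages[:alice])                  # look up by Symbol
println(haskey(ages, :bob))            # membership test
println(Symbol("alice") === :alice)    # a Symbol built from a string is identical to the literal
```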

Note that I stopped talking about Ruby a while back. That's because Ruby isn't homoiconic: Ruby doesn't represent its expressions as Ruby objects. So Ruby's symbol type is kind of a vestigial organ – a leftover adaptation, inherited from Lisp, but no longer used for its original purpose. Ruby symbols have been co-opted for other purposes – as hash keys, to pull methods out of method tables – but symbols in Ruby are not used to represent variables.

As to why symbols are used in DataFrames rather than strings, it's because you typically bind column values to variables inside of user-provided expressions. So it's natural for column names to be symbols, since symbols are exactly what you use to represent variables as data. Currently, you have to write df[:foo] to access the foo column, but in the future, you may be able to access it as df.foo instead. When that becomes possible, only columns whose names are valid identifiers will be accessible with this convenient syntax.
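
The link between dot syntax and symbols can be seen in plain Julia 1.x, where x.foo is another way of writing getproperty(x, :foo). The sketch below uses a NamedTuple as a stand-in for a table; this is an assumption made to keep the example free of packages, and it is not DataFrames code:

```julia
# A NamedTuple standing in for a table with two named columns (illustrative data).
table = (foo = [1, 2, 3], bar = ["a", "b", "c"])

col_dot  = table.foo                   # dot syntax with an identifier
col_prop = getproperty(table, :foo)    # the same access, with the name passed as a Symbol

println(col_dot === col_prop)          # both refer to the very same array
println(keys(table))                   # the field names are stored as Symbols
```

Only names that are valid identifiers can be written after the dot, which is the restriction mentioned above.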

See also:

Marginate answered 5/5, 2014 at 21:30 Comment(7)
Interning: In computer science, string interning is a method of storing only one copy of each distinct string value, which must be immutable. Interning strings makes some string processing tasks more time- or space-efficient at the cost of requiring more time when the string is created or interned. en.wikipedia.org/wiki/String_interning – Subtangent
At one point you write eval(:foo) and at another eval(sym). Is there a meaningful difference between eval(:foo) and eval(foo)? – Aesthete
Very much so: eval(:foo) gives the value that the variable foo is bound to, whereas eval(foo) calls eval on that value. Writing eval(:foo) is equivalent to just foo (in global scope), so eval(foo) is like eval(eval(:foo)). – Marginate
One thing to note is that :(foo) == :foo, i.e., the symbol :foo is the expression :(foo) -- there is no difference whatsoever, and in fact despite the :() syntax and the fact that eval(:(foo)) evaluates to foo, typeof(:(foo)) == Symbol. So a Symbol is really just an expression-like object (not of type Expr; expressions are made of Symbols) that would evaluate to the value of the variable the symbol is named after. It's arguably only a side effect that such expression-like objects can be used as "atomic strings" in non-expression contexts such as table columns. – Gyroplane
> but in the future, you may be able to access it as df.foo instead.
This is now possible. – Fiction
The future is now. – Marginate
docs.julialang.org/en/v1/manual/metaprogramming is not available at the url above (and the edit queue is full, so I can't edit the answer) – Deciduous

As for the original question: as of the 0.21 release (and going forward), DataFrames.jl allows both Symbols and strings to be used as column names. Supporting both is not a problem, and depending on the situation a user might prefer one or the other.

Here is an example:

julia> using DataFrames

julia> df = DataFrame(:a => 1:2, :b => 3:4)
2×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 3     │
│ 2   │ 2     │ 4     │

julia> DataFrame("a" => 1:2, "b" => 3:4) # this is the same
2×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 3     │
│ 2   │ 2     │ 4     │

julia> df[:, :a]
2-element Array{Int64,1}:
 1
 2

julia> df[:, "a"] # this is the same
2-element Array{Int64,1}:
 1
 2
Garth answered 18/10, 2020 at 18:39 Comment(0)