Difference between an identifier and symbol in scheme?

I am trying to understand how the Scheme meta-circular evaluator handles quoted expressions differently than symbolic data.

The accepted answer to the Stack Overflow question "What exactly is a symbol in lisp/scheme?" defines the "symbol" data object in Scheme:

In Scheme and Racket, a symbol is like an immutable string that happens to be interned

The accepted answer writes that in Scheme, there is a built-in correspondence between identifiers and symbols:

To call a method, you look up the symbol that corresponds to the method name. Lisp/Scheme/Racket makes that really easy, because the language already has a built-in correspondence between identifiers (part of the language's syntax) and symbols (values in the language).

To understand the correspondence, I read the page "A Note on Identifiers" in An Introduction to Scheme and Its Implementation, which says:

Scheme identifiers (variable names and special form names and keywords) have almost the same restrictions as Scheme symbol object character sequences, and it's no coincidence. Most implementations of Scheme happen to be written in Scheme, and symbol objects are used in the interpreter or compiler to represent variable names.

Based on the above, I'm wondering if my understanding of what is happening in the following session is correct:

user@host:/home/user $ scheme
MIT/GNU Scheme running under GNU/Linux
Type `^C' (control-C) followed by `H' to obtain information about interrupts.

Copyright (C) 2011 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO warranty; not even for
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Image saved on Sunday February 7, 2016 at 10:35:34 AM
  Release 9.1.1 || Microcode 15.3 || Runtime 15.7 || SF 4.41 || LIAR/x86-64 4.118 || Edwin 3.116

1 ]=> (define a (lambda (i) (+ i 1)))

;Value: a

1 ]=> a

;Value 13: #[compound-procedure 13 a]

1 ]=> (quote a)

;Value: a

1 ]=> (eval a (the-environment))

;Value 13: #[compound-procedure 13 a]

1 ]=> (eval (quote a) (the-environment))

;Value 13: #[compound-procedure 13 a]

1 ]=>
  1. The first define statement is a special form captured by the evaluator, which creates a binding for the symbol a to a compound procedure object in the global environment.

  2. Writing a in the top-level causes the evaluator to receive the symbol object 'a, which evaluates to the compound-procedure object that 'a points to in the global environment.

  3. Writing (quote a) in the top-level causes the evaluator to receive a list of symbols ('quote 'a); this expression is a special form captured by the evaluator, which evaluates to the quoted expression, namely the symbol object 'a.

  4. Writing (eval a (the-environment)) causes the evaluator to receive a list of symbols ('eval 'a ...) (ignoring the environment). The evaluator performs a lookup for 'eval, which yields the eval compiled procedure object, and a lookup for 'a, which yields the compound-procedure. Finally, the top-level evaluator applies the eval procedure to its arguments. Since a compound-procedure is self-evaluating (not true in Scheme48), the final value of the expression is the compound-procedure itself.

  5. Writing (eval (quote a) (the-environment)) causes the evaluator to receive a list of symbols ('eval ('quote 'a) ...). The evaluator performs a lookup for 'eval, which yields the eval compiled procedure object. It evaluates the expression ('quote 'a) which yields the symbol object 'a. Finally, the top-level evaluator applies the eval procedure to 'a, which is a symbol object and therefore invokes an environment lookup that yields the compound procedure.
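
The data structures described in points 2 through 5 can be inspected directly as quoted data. This is only a sketch of what a read-style front end might hand to the evaluator: the names expr-3 and expr-5 are made up for illustration, and the symbol env merely stands in for (the-environment).

```scheme
;; Hypothetical names; each test expression below should evaluate to #t.
(define expr-3 '(quote a))               ; datum for point 3
(define expr-5 '(eval (quote a) env))    ; datum for point 5, env is a placeholder symbol

(symbol? 'a)                   ; a lone identifier reads as a symbol object
(pair? expr-3)                 ; (quote a) reads as a list
(eq? (car expr-3) 'quote)      ; whose head is the symbol quote
(symbol? (cadr expr-3))        ; and whose operand is the symbol a
(equal? (cadr expr-5) expr-3)  ; the nested (quote a) is itself such a list
```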

Does this explanation correctly describe (at a high level) how a Scheme interpreter might differentiate between symbol objects and identifiers in the language? Are there fundamental misunderstandings in these descriptions?

Palaeography answered 23/1, 2018 at 1:54 Comment(4)
> Does this explanation correctly describe (at a high level) how a Scheme interpreter might differentiate between symbol objects and identifiers in the language? It can't do any such thing since none of its points 1 through 5 even mention the word "identifier". - Gripper
@Gripper fair enough, I'm assuming that "identifiers" are the tokens written into the prompt, after 1 ]=>. Perhaps I should say, "how a Scheme interpreter might treat identifiers at the top-level, using symbols internally?" - Palaeography
@jesterll A Scheme interpreter written in Scheme wouldn't see the lexical identifiers; it would rely on some read function to scan the text and produce a data structure, and then interpret purely that data structure. Only if you write a Scheme implementation from scratch do you deal with that yourself; but even then, you separate the reading from the interpretation in the same way. - Gripper
Identifiers are names that refer to bindings in our programs. Symbols are like named pointers that point to themselves. - Ascocarp

The R6RS Scheme report, in 4.2 Lexical Syntax, uses the term identifier to refer to the character-level syntax. That is to say, roughly, identifier means something like the lexical token from which a symbol is constructed when the expression becomes an object. However, elsewhere in the text, identifier seems to be freely used as a synonym for symbol. E.g. "Scheme allows identifiers to stand for locations containing values. These identifiers are called variables." (1.3 Variables and Binding). Basically, the spec seems to be loose with regard to this terminology. Depending on context, an identifier is either the same thing as a symbol (an object), or else <identifier>: the grammar category from the lexical syntax.

In a sentence saying that a certain character may or may not appear in an identifier, the context is clearly lexical syntax, because a symbol object is an atom and not a character string; it doesn't contain anything. But when we talk about an identifier denoting a memory location (being a variable), that's the symbol; we're past the issue of what kinds of tokens can produce the symbol in the textual source code.
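
A small sketch of that distinction, using only the standard procedures string->symbol and symbol->string (the variable name s is just for illustration): the characters belong to the token and to the symbol's name, while the symbol object itself is an atom compared by identity.

```scheme
;; Each test expression's expected value is given in the trailing comment.
(define s (string->symbol "foo"))    ; build a symbol from a character sequence

(symbol? s)                          ; #t
(string? s)                          ; #f, a symbol is not a string
(eq? s 'foo)                         ; #t, same name means same object
(string=? (symbol->string s) "foo")  ; #t, the name is recoverable as a string
```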

The tutorial linked to in the question, An Introduction to Scheme and Its Implementation, uses its own peculiar definition of identifier, which is at odds with the Scheme language. It implies that identifiers are "variable names, and special form names and keywords" (so that symbols which are not variable names are not identifiers, which is not supported by the specification).

Gripper answered 23/1, 2018 at 2:15 Comment(1)
I would suspect that this is an attempt to make a difference between an identifier and a symbol, when for example a program is parsed by a certain compiler and no symbols for identifiers are constructed, for example by not using READ to read source as s-expressions. Thus you can have/use a syntax of a Scheme variant, independent of s-expressions. - Shaff

ObPreface: Apologies in advance for telling you things you already know!

Your very first sentence is raising big XY question issues for me. You write "I am trying to understand how the Scheme meta-circular evaluator handles quoted expressions differently than symbolic data." What do you mean by "the Scheme meta-circular evaluator"? Also, what do you mean by "symbolic data"? Both of these terms suggest to me that you want to ask some more high-level questions.

Regardless, your title suggests a question about the difference between identifiers and symbols. The difference is this:

"Identifiers" are a syntactic category. That is, suppose we take a text file and break it up into tokens. Some of those tokens will be left-parens. Some will be right-parens. Some will be numbers. Some will be identifiers. Every language has its own set of syntactic categories, but many of them use the name "identifier" for "word-like thing that can usually be a function name or a variable name or whatever."

"Symbols", on the other hand, are a particular kind of value in Scheme and Lisp systems. Scheme has lots of different kinds of values: Numbers, Booleans, Strings, Pairs, Symbols, and others.

In Scheme, when developing a parser/interpreter/compiler/whatever, it turns out to be very convenient to use symbols (the values) to represent identifiers (the syntactic entities). Specifically, "quote" has a special ability to turn certain host language token sequences into lists of symbols, numbers, strings, and booleans. You don't need to take advantage of this, but it eliminates a lot of code.
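
To make the convenience concrete, here is a toy evaluator in which quoted data serves as the program and symbols serve as variable names. It is a minimal sketch, not how any real implementation works: lookup, mini-eval, and toy-env are hypothetical names, the environment is a plain association list, and only variable references, quote, and procedure application are handled.

```scheme
;; Environments are association lists mapping symbols to values.
(define (lookup sym env)
  (let ((binding (assq sym env)))
    (if binding
        (cdr binding)
        (error "Unbound variable:" sym))))

(define (mini-eval expr env)
  (cond ((symbol? expr) (lookup expr env))      ; variable reference
        ((and (pair? expr) (eq? (car expr) 'quote))
         (cadr expr))                           ; (quote x) returns x unevaluated
        ((pair? expr)                           ; procedure application
         (apply (mini-eval (car expr) env)
                (map (lambda (e) (mini-eval e env)) (cdr expr))))
        (else expr)))                           ; numbers, strings, booleans

(define toy-env
  (list (cons 'a (lambda (i) (+ i 1)))
        (cons '+ +)))

(mini-eval '(a 41) toy-env)       ; 42
(mini-eval '(quote a) toy-env)    ; the symbol a, with no lookup performed
(mini-eval '(+ 1 (a 1)) toy-env)  ; 3
```

No tokenizer appears anywhere: quote (or read) has already turned the identifiers in the source text into symbol objects, so the evaluator only ever dispatches on symbol? and pair?.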

Winshell answered 23/1, 2018 at 18:59 Comment(0)
