Correct me if I'm wrong, but there is nothing like gensym in Java, C, C++, Python, Javascript, or any of the other languages I've used, and I've never seemed to need it. Why is it necessary in Lisp and not in other langauges? For clarification, I'm learning Common Lisp.
Common Lisp has a powerful macro system. You can make new syntax patterns that behave exactly the way you want them to behave. It's even expressed in its own language, making everything in the language available to transform the code from what you want to write to something that CL actually understands. All languages with powerful macro systems provide gensym
or do it implicitly in their macro implementation.
In Common Lisp you use gensym
when you want to make code where the symbol shouldn't match elements used any other places in the result. Without it there is no guarantee that a user uses a symbol that the macro implementer also use and they start to interfere and the result is something different than the intended behavior. It makes sure nested expansions of the same macro don't interfere with previous expansions. With the Common Lisp macro system it's possible to make more restrictive macro systems similar to Scheme syntax-rules
and syntax-case
.
In Scheme there are several macro systems. One with pattern matching where new introduced symbols act automatically as if they are made with gensym
. syntax-case
will also by default make new symbols as if they were made with gensym
and there is also a way to reduce hygiene. You can make CL defmacro
with syntax-case
but since Scheme doesn't have gensym
you wouldn't be able to make hygienic macros with it.
Java, C, C++, Python, Javascript are all Algol dialects and none of them have other than simple template based macros. Thus they don't have gensym
because they don't need it. Since the only way to introduce new syntax in these languages is to wish next version of it will provide it.
There are two Algol dialects with powerful macros that come to mind. Nemerle and Perl6. Both of them have hygienic approach, meaning variables introduced behave as if they are made with gensym
.
In CL, Scheme, Nemerle, Perl6 you don't need to wait for language features. You can make them yourself! The news in both Java and PHP are easily implemented with macros in any of them should it not already be available.
cond
so this works cond (expression) {...}(expression2) {...} else {...}
–
Italian gensym
guarantees a unique symbol that you can use to ensure the hygiene of the macro. –
Italian defmacro
with syntax-case
every binding in syntax made with it is unhygienic since you cannot introduce unqiue identities with gensym
. –
Italian Can't say which languages have an equivalent of GENSYM
. Many languages don't have a first-class symbol data type (with interned and uninterned symbols) and many are not providing similar code generation (macros, ...) facilities.
An interned symbol is registered in a package. An uninterned is not. If the reader (the reader is the Lisp subsystem which takes textual s-expressions as input and returns data) sees two interned symbols in the same package and with the same name, it assumes that it is the same symbol:
CL-USER 35 > (eq 'cl:list 'cl:list)
T
If the reader sees an uninterned symbol, it creates a new one:
CL-USER 36 > (eq '#:list '#:list)
NIL
Uninterned symbols are written with #:
in front of the name.
GENSYM
is used in Lisp to create numbered uninterned symbols, because it is sometimes useful in code generation and then debugging this code. Note that the symbols are always new and not eq
to anything else. But the symbol name could be the same as the name of another symbol. The number gives a clue to the human reader about the identity.
An example using MAKE-SYMBOL
make-symbol
creates a new uninterned symbol using a string argument as its name.
Let's see this function generating some code:
CL-USER 31 > (defun make-tagbody (exp test)
(let ((start-symbol (make-symbol "start"))
(exit-symbol (make-symbol "exit")))
`(tagbody ,start-symbol
,exp
(if ,test
(go ,start-symbol)
(go ,exit-symbol))
,exit-symbol)))
MAKE-TAGBODY
CL-USER 32 > (pprint (make-tagbody '(incf i) '(< i 10)))
(TAGBODY
#:|start| (INCF I)
(IF (< I 10) (GO #:|start|) (GO #:|exit|))
#:|exit|)
Above generated code uses uninterned symbols. Both #:|start|
are actually the same symbol. We would see this if we would have *print-circle*
to T
, since the printer then would clearly label identical objects. But here we don't get this added information. Now if you nest this code, then you would see more than the one start
and one exit
symbol, each which was used in two places.
An example using GENSYM
Now let's use gensym
. Gensym also creates an uninterned symbol. Optionally this symbol is named by a string. A number (see the variable CL:*GENSYM-COUNTER*
) is added.
CL-USER 33 > (defun make-tagbody (exp test)
(let ((start-symbol (gensym "start"))
(exit-symbol (gensym "exit")))
`(tagbody ,start-symbol
,exp
(if ,test
(go ,start-symbol)
(go ,exit-symbol))
,exit-symbol)))
MAKE-TAGBODY
CL-USER 34 > (pprint (make-tagbody '(incf i) '(< i 10)))
(TAGBODY
#:|start213051| (INCF I)
(IF (< I 10) (GO #:|start213051|) (GO #:|exit213052|))
#:|exit213052|)
Now the number is an indicator that the two uninterned #:|start213051|
symbols are actually the same. When the code would be nested, the new version of the start symbol would have a different number:
CL-USER 7 > (pprint (make-tagbody `(progn
(incf i)
(setf j 0)
,(make-tagbody '(incf ij) '(< j 10)))
'(< i 10)))
(TAGBODY
#:|start2756| (PROGN
(INCF I)
(SETF J 0)
(TAGBODY
#:|start2754| (INCF IJ)
(IF (< J 10)
(GO #:|start2754|)
(GO #:|exit2755|))
#:|exit2755|))
(IF (< I 10) (GO #:|start2756|) (GO #:|exit2757|))
#:|exit2757|)
Thus it helps understanding generated code, without the need to turn *print-circle*
on, which would label the identical objects:
CL-USER 8 > (let ((*print-circle* t))
(pprint (make-tagbody `(progn
(incf i)
(setf j 0)
,(make-tagbody '(incf ij) '(< j 10)))
'(< i 10))))
(TAGBODY
#3=#:|start1303| (PROGN
(INCF I)
(SETF J 0)
(TAGBODY
#1=#:|start1301| (INCF IJ)
(IF (< J 10) (GO #1#) (GO #2=#:|exit1302|))
#2#))
(IF (< I 10) (GO #3#) (GO #4=#:|exit1304|))
#4#)
Above is readable for the Lisp reader (the subsystem which reads s-expressions for textual representations), but a bit less for the human reader.
I believe that symbols (in the Lisp sense) are mostly useful in homoiconic languages (those in which the syntax of the language is representable as a data of that language).
Java, C, C++, Python, Javascript are not homoiconic.
Once you have symbols, you want some way to dynamically create them. gensym
is a possibility, but you can also intern them.
BTW, MELT is a lisp-like dialect, it does not create symbols with gensym
or by interning strings but with clone_symbol. (actually MELT symbols are instances of predefined CLASS_SYMBOL
, ...).
gensym
is available as a predicate in most of Prolog interpreters. You can find it in the eponym library.
© 2022 - 2024 — McMap. All rights reserved.
new Symbol("foo");
. The interning of symbols is what's actually more interesting (but not at all unique). Intern is like a factory method for symbols, where you can get the same one out if you created a similar one earlier; gensym is like a constructor where you always get something new. – Internationale__COUNTER__
used in cases like Linux Kernel's__UNIQUE_ID
, for example. However, even sanergensym
is not likely to be a good idea in general, because the approach relying on interaction with the process of symbol resolution is too intrusive to the underlying semantics of the language before the language rules gets formally unreasonable. Hygiene will be also difficult to maintain, when needed. There are alternatives like Kernel's keyed static variables. – Severin