Why does Lisp use gensym when other languages don't?

Correct me if I'm wrong, but there is nothing like gensym in Java, C, C++, Python, JavaScript, or any of the other languages I've used, and I've never seemed to need it. Why is it necessary in Lisp and not in other languages? For clarification, I'm learning Common Lisp.

Ultun answered 7/5, 2015 at 17:39 Comment(2)
Gensym isn't really any different from what you'd do in Java with new Symbol("foo"). The interning of symbols is what's actually more interesting (but not at all unique). Intern is like a factory method for symbols, where you can get the same one out if you created a similar one earlier; gensym is like a constructor where you always get something new. - Internationale
Actually there are broken ones. See GCC's __COUNTER__ used in cases like the Linux kernel's __UNIQUE_ID, for example. However, even a saner gensym is not likely to be a good idea in general, because an approach that relies on interaction with the process of symbol resolution is too intrusive to the underlying semantics of the language before the language rules get formally unreasonable. Hygiene will also be difficult to maintain when needed. There are alternatives like Kernel's keyed static variables. - Severin

Common Lisp has a powerful macro system. You can make new syntax patterns that behave exactly the way you want them to behave. Macros are even written in Lisp itself, so everything in the language is available for transforming the code you want to write into something that CL actually understands. All languages with powerful macro systems either provide gensym or do the equivalent implicitly in their macro implementation.

In Common Lisp you use gensym when you want to generate code in which a symbol must not match anything used elsewhere in the result. Without it there is no guarantee that a user won't use a symbol that the macro implementer also uses; the two then interfere and the result differs from the intended behavior. Gensym also makes sure that nested expansions of the same macro don't interfere with one another. With the Common Lisp macro system it's possible to build more restrictive macro systems similar to Scheme's syntax-rules and syntax-case.
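To make the interference concrete, here is a minimal sketch. The macro names swap-capturing and swap-safe are made up for this illustration (the standard operator for exchanging two places is rotatef), and the sketch ignores other macro pitfalls such as evaluating the places more than once.

```lisp
;; Hypothetical macros, for illustration only.

;; Broken: the expansion binds the interned symbol TMP, which the caller
;; may also be using as a variable name.
(defmacro swap-capturing (a b)
  `(let ((tmp ,a))
     (setf ,a ,b
           ,b tmp)))

;; Fixed: GENSYM makes a fresh uninterned symbol at expansion time, so the
;; temporary cannot collide with any name written by the caller.
(defmacro swap-safe (a b)
  (let ((tmp (gensym "TMP")))
    `(let ((,tmp ,a))
       (setf ,a ,b
             ,b ,tmp))))

;; The caller happens to have a variable called TMP.
(let ((tmp 1) (other 2))
  (swap-capturing tmp other)
  (list tmp other))          ; => (1 2), the swap silently did nothing

(let ((tmp 1) (other 2))
  (swap-safe tmp other)
  (list tmp other))          ; => (2 1)
```

In the first call the macro's own TMP shadows the caller's TMP inside the expansion, so the assignments hit the wrong variable. In the second the temporary is a symbol nobody else can name.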

In Scheme there are several macro systems. One uses pattern matching (syntax-rules), where newly introduced symbols automatically act as if they were made with gensym. syntax-case will also by default make new symbols as if they were made with gensym, and it offers a way to reduce hygiene. You can implement CL's defmacro with syntax-case, but since Scheme doesn't have gensym you wouldn't be able to make hygienic macros with it.

Java, C, C++, Python, and JavaScript are all Algol dialects, and none of them has anything beyond simple template-based macros. Thus they don't have gensym because they don't need it: the only way to get new syntax in these languages is to hope that the next version will provide it.

There are two Algol dialects with powerful macros that come to mind: Nemerle and Perl6. Both take a hygienic approach, meaning that variables introduced by a macro behave as if they were made with gensym.

In CL, Scheme, Nemerle, and Perl6 you don't need to wait for language features. You can make them yourself! The new features in both Java and PHP are easily implemented with macros in any of them, should they not already be available.

Italian answered 7/5, 2015 at 23:45 Comment(10)
In JavaScript, you use "var" to define a new variable and limit it to the most restrictive lexical scope. Is gensym used because there isn't something like "var" in CL? - Ultun
@Ultun JavaScript doesn't have powerful macros. E.g., try to implement cond so that this works: cond (expression) {...}(expression2) {...} else {...} - Italian
I think I understand this now. I didn't understand the real difference between macros and functions and why variables in a macro can't be as carefully scoped as in a function. In a function, you are free to name lexical variables whatever you want because you are just dealing with a reference to a value, and changing its name doesn't change its value. A macro's arguments aren't variables containing values; they contain code. The names of the expressions within that code are not arbitrary: they come from the environment of the macro and determine its production. - Ultun
@Ultun Yes, and since CL doesn't have hygiene automatically, every symbol you introduce, e.g. to make a temporary variable, has the potential to shadow or interfere with the lexical scope of the place where the macro gets used. gensym guarantees a unique symbol that you can use to ensure the hygiene of the macro. - Italian
Could you edit this question in some neutral way? I accidentally downvoted it - sigh - I can only reverse it after an edit... - Hartebeest
@RainerJoswig I've added some links. Usually my spelling is terrible, but I'm pretty satisfied with this one :) - Italian
Yeah, thanks! Touched the downward button while scrolling. Fixed! - Hartebeest
"... Scheme doesn't have gensym you wouldn't be able to make hygienic macros with it." Did you mean "unhygienic" rather than "hygienic"? - Expiation
@Expiation I meant hygienic. If you do create defmacro with syntax-case, every binding in syntax made with it is unhygienic, since you cannot introduce unique identifiers with gensym. - Italian
Note that macro assemblers have a simple way to circumvent this problem. A parameter is called $1 .. $9 and cannot be mistaken for a symbol. There are more ways to skin a cat. - Remind

Can't say which languages have an equivalent of GENSYM. Many languages don't have a first-class symbol data type (with interned and uninterned symbols), and many don't provide similar code generation facilities (macros, ...).

An interned symbol is registered in a package. An uninterned symbol is not. If the reader (the reader is the Lisp subsystem which takes textual s-expressions as input and returns data) sees two interned symbols in the same package and with the same name, it assumes that they are the same symbol:

CL-USER 35 > (eq 'cl:list 'cl:list)
T

If the reader sees an uninterned symbol, it creates a new one:

CL-USER 36 > (eq '#:list '#:list)
NIL

Uninterned symbols are written with #: in front of the name.

GENSYM is used in Lisp to create numbered uninterned symbols, because that is sometimes useful when generating code and then debugging it. Note that the symbols are always new and not eq to anything else. But the symbol name could be the same as the name of another symbol. The number gives the human reader a clue about the identity.
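A small sketch of the point about names versus identity. Rebinding CL:*GENSYM-COUNTER* to the same value twice is done here only to force two symbols with the same name; the prefix "start" and the number 100 are arbitrary choices.

```lisp
;; Force the same counter value for two separate GENSYM calls, so both
;; symbols end up with the same name but remain distinct objects.
(let* ((a (let ((*gensym-counter* 100)) (gensym "start")))
       (b (let ((*gensym-counter* 100)) (gensym "start"))))
  (list (symbol-name a)
        (symbol-name b)
        (string= (symbol-name a) (symbol-name b))   ; names match
        (eq a b)))                                  ; symbols do not
;; => ("start100" "start100" T NIL)
```

So the number is a hint for the person reading the printed code, not what makes the symbol unique. Uniqueness comes from the symbol being a freshly allocated, uninterned object.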

An example using MAKE-SYMBOL

make-symbol creates a new uninterned symbol using a string argument as its name.

Let's see this function generating some code:

CL-USER 31 > (defun make-tagbody (exp test)
               (let ((start-symbol (make-symbol "start"))
                     (exit-symbol  (make-symbol "exit")))
                 `(tagbody ,start-symbol
                           ,exp
                           (if ,test
                               (go ,start-symbol)
                             (go ,exit-symbol))
                           ,exit-symbol)))
MAKE-TAGBODY

CL-USER 32 > (pprint (make-tagbody '(incf i) '(< i 10)))

(TAGBODY
 #:|start| (INCF I)
         (IF (< I 10) (GO #:|start|) (GO #:|exit|))
 #:|exit|)

The generated code above uses uninterned symbols. Both occurrences of #:|start| are actually the same symbol. We would see this if we had set *print-circle* to T, since the printer would then clearly label identical objects. But here we don't get this added information. Now if you nest this code, you would see more than one start symbol and more than one exit symbol, each of which is used in two places.
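One way to check the identity claim without relying on the printer is to pull the two occurrences out of the generated list and compare them with eq. The helper make-loop-form below is a hypothetical, cut-down variant of make-tagbody, written only so that this sketch is self-contained.

```lisp
;; Hypothetical helper: like make-tagbody above, but with a single tag.
(defun make-loop-form (exp test)
  (let ((start (make-symbol "start")))
    `(tagbody ,start
              ,exp
              (if ,test (go ,start)))))

(let* ((form   (make-loop-form '(incf i) '(< i 10)))
       (label  (second form))                     ; the tag right after TAGBODY
       (target (second (third (fourth form)))))   ; the tag inside (GO ...)
  (list (eq label target)                         ; one object, used twice
        (eq label (make-symbol "start"))          ; same name, different symbol
        (symbol-name label)))
;; => (T NIL "start")
```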

An example using GENSYM

Now let's use gensym. Gensym also creates an uninterned symbol. Optionally a string can be given as a prefix for the symbol's name. A number (see the variable CL:*GENSYM-COUNTER*) is appended.

CL-USER 33 > (defun make-tagbody (exp test)
               (let ((start-symbol (gensym "start"))
                     (exit-symbol  (gensym "exit")))
                 `(tagbody ,start-symbol
                           ,exp
                           (if ,test
                               (go ,start-symbol)
                             (go ,exit-symbol))
                           ,exit-symbol)))
MAKE-TAGBODY

CL-USER 34 > (pprint (make-tagbody '(incf i) '(< i 10)))

(TAGBODY
 #:|start213051| (INCF I)
         (IF (< I 10) (GO #:|start213051|) (GO #:|exit213052|))
 #:|exit213052|)

Now the number is an indicator that the two uninterned #:|start213051| symbols are actually the same. When the code is nested, the new version of the start symbol has a different number:

CL-USER 7 > (pprint (make-tagbody `(progn
                                     (incf i)
                                     (setf j 0)
                                     ,(make-tagbody '(incf j) '(< j 10)))
                                  '(< i 10)))

(TAGBODY
 #:|start2756| (PROGN
                 (INCF I)
                 (SETF J 0)
                 (TAGBODY
                  #:|start2754| (INCF J)
                          (IF (< J 10)
                              (GO #:|start2754|)
                            (GO #:|exit2755|))
                  #:|exit2755|))
         (IF (< I 10) (GO #:|start2756|) (GO #:|exit2757|))
 #:|exit2757|)
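For completeness, here is a sketch of the same generator wrapped in a macro so that the nested form can actually be run. The name do-while is hypothetical and not part of Common Lisp; the body is wrapped in progn so that a bare symbol or integer in the body is not mistaken for a tagbody tag.

```lisp
;; Hypothetical macro: run BODY, then repeat while TEST is true.
;; Each expansion gets its own pair of fresh, uninterned tags.
(defmacro do-while (test &body body)
  (let ((start (gensym "start"))
        (exit  (gensym "exit")))
    `(tagbody
        ,start
        (progn ,@body)
        (if ,test
            (go ,start)
            (go ,exit))
        ,exit)))

;; Nested use: the outer body runs 3 times, the inner body twice per pass.
(let ((count 0) (i 0))
  (do-while (< i 3)
    (incf i)
    (let ((j 0))
      (do-while (< j 2)
        (incf j)
        (incf count))))
  count)
;; => 6
```

Calling (macroexpand-1 '(do-while (< i 3) (incf i))) at the REPL should show tags with numbers, much like the printed output above. The exact numbers depend on the current value of *gensym-counter*.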

Thus it helps understanding generated code, without the need to turn *print-circle* on, which would label the identical objects:

CL-USER 8 > (let ((*print-circle* t))
              (pprint (make-tagbody `(progn
                                       (incf i)
                                       (setf j 0)
                                       ,(make-tagbody '(incf j) '(< j 10)))
                                    '(< i 10))))

(TAGBODY
 #3=#:|start1303| (PROGN
                    (INCF I)
                    (SETF J 0)
                    (TAGBODY
                     #1=#:|start1301| (INCF J)
                             (IF (< J 10) (GO #1#) (GO #2=#:|exit1302|))
                     #2#))
         (IF (< I 10) (GO #3#) (GO #4=#:|exit1304|))
 #4#)

The output above is readable by the Lisp reader (the subsystem which reads s-expressions from textual representations), but a bit less so by a human reader.
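A short sketch of what "readable by the Lisp reader" buys you: printing a form that contains an uninterned symbol and reading it back only preserves the sharing when *print-circle* is on. The tiny form used here is made up for the illustration.

```lisp
;; Print a form containing one uninterned symbol in two places, with and
;; without *print-circle*, then read both strings back and compare tags.
(let* ((label (make-symbol "start"))
       (form  `(tagbody ,label (go ,label)))
       (plain (let ((*print-circle* nil) (*print-gensym* t))
                (prin1-to-string form)))
       (circ  (let ((*print-circle* t) (*print-gensym* t))
                (prin1-to-string form)))
       (plain-back (read-from-string plain))
       (circ-back  (read-from-string circ)))
  (list (eq (second plain-back) (second (third plain-back)))
        (eq (second circ-back)  (second (third circ-back)))))
;; => (NIL T)
```

Without the #n= and #n# labels the reader creates a new symbol for every #: it sees, so the two tags come back as different objects. With the labels the identity survives the round trip.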

Hartebeest answered 7/5, 2015 at 18:24 Comment(0)

I believe that symbols (in the Lisp sense) are mostly useful in homoiconic languages (those in which the syntax of the language is representable as data of that language).

Java, C, C++, Python, and JavaScript are not homoiconic.

Once you have symbols, you want some way to dynamically create them. gensym is a possibility, but you can also intern them.
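In Common Lisp terms, the two ways of creating a symbol at run time differ roughly as in this sketch. The name "DYNAMIC-NAME" is an arbitrary example string.

```lisp
;; INTERN looks the name up in the current package and creates the symbol
;; only if it is not there yet, so the same name yields the same symbol.
;; GENSYM always allocates a new symbol that belongs to no package.
(list (eq (intern "DYNAMIC-NAME") (intern "DYNAMIC-NAME"))
      (eq (gensym) (gensym))
      (symbol-package (gensym))
      (not (null (symbol-package (intern "DYNAMIC-NAME")))))
;; => (T NIL NIL T)
```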

BTW, MELT is a Lisp-like dialect; it creates symbols not with gensym or by interning strings but with clone_symbol. (Actually, MELT symbols are instances of the predefined CLASS_SYMBOL, ...).

Marketable answered 7/5, 2015 at 18:35 Comment(0)

gensym is available as a predicate in most Prolog interpreters. You can find it in the library of the same name.

Buford answered 11/6, 2016 at 10:21 Comment(0)
