I now attempt to provide a broad overview that answers my questions and may point others towards constructively investigating similar directions.
- Can I actually do this?
Yes. Though depending on the specifics of how and why, you may have a choice of options available to you, and they may have variable levels of portability between Common Lisp implementations.
- What are the actual mechanics of doing so and generating other user based transforms/extensions?
I answer this with respect to two possible methods that the programmer may choose to get started, and which seem most applicable.
For both examples, I reiterate that with limited reflection on the topic, i think it bad form to transform relationships between the input/output mappings of a function. I do so here for demonstration purposes only, to verify that the transformations I'm implementing are actually taking place.
I actually had quite a difficult time testing my transformations were actually happening: SBCL especially seems quite happy to optimise certain expressions and forms, there are additional pieces of information you can make available to the compiler not covered here. Additionally, there may be other transformations available, and so just because your transform isn't used, doesn't necessarily mean it isn't "working".
Environments and Define-Compiler-Macro Extensions using Common Lisp the Language 2
I was previously under the impression that DEFINE-COMPILER-MACRO
was relatively limited in its abilities, working only on types connected with literal values, but this is not necessarily the case.
To demonstrate this, i use three user-defined functions and a compiler macro.
First: We will begin with a general addition function gen+
that decides at run-time to either add two numbers together, or concatenate two strings:
(defun gen+ (x y)
(if (and (numberp x)
(numberp y))
(+ x y)
(concatenate 'string x y)))
But say we know at compile time that in certain instances, only strings will be fed to this function. Let's define our specialised string addition function, and to prove its actually being used, we'll do a very bad thing stylistically and additionally concatenate the string "kapow"
as well:
(defun string+ (x y)
(declare (optimize (speed 3) (safety 0))
(string x y))
(concatenate 'string x y "kapow"))
The following function is a very simple convenience function that checks an environment to establish whether the declared type of the variable bound in that environment is eq
to STRING
. We're using a NON-ANSI function here from Common Lisp the Language 2. In sbcl, the function VARIABLE-INFORMATION
, and other cltl2 functions are available in the sb-ctlt2
package.
(defun env-stringp (symbol environment)
(eq 'string
(cdr (assoc 'type
(nth-value 2 (sb-cltl2:variable-information symbol environment))))))
Lastly, we use DEFINE-COMPILER-MACRO
to generate the transformation. I've tried to name things in this code differently from other examples I've seen so that people can follow along and not get mixed up with what variable/symbol is in which scope/context. A couple of things I didn't know previously about DEFINE-COMPILER-MACRO
.
- The variable that immediately follows the
&whole
parameter is a variable which represents the form of the initial call. In our example it will be bound to the list (GEN+ A B)
- arg1 is bound to the symbol
A
- arg2 is bound to the symbol
B
- The
&environment
parameter says that within this macro, the symbol ENV
will be bound to the environment in which the macro is being evaluated. This is what lets us "kind of step back out of the macro" and check the surrounding code for declarations regarding the type of the variables represented by the symbols bound to 'ARG1' and 'ARG2'
In this definition, we tell the compiler macro that if the user has declared the parameters of GEN+
to be strings, then replace the call to (GEN+ ARG1 ARG2)
with a call to (STRING+ ARG1 ARG2)
.
Note that because the condition of this transformation is the result of a user-defined operation on the environment, if the parameters to GEN+
are literal strings, the transformation will not be triggered, because the environment does not see that the variables have been declared strings. To do that, you would have to add another option and transformation to explicitly check the types of the values in ARG1 and ARG2 as per a traditional use of DEFINE-COMPILER-MACRO
. This can be left as an exercise for the reader. But beware about the utility of doing so, because SBCL, for instance, might constant-fold your expression rather than use your transformation anyway.
(define-compiler-macro gen+ (&whole form arg1 arg2 &environment env)
(cond ((and (env-stringp arg1 env)
(env-stringp arg2 env))
`(string+ ,arg1 ,arg2))
(t form)))
Now we can test it with a simple call with type declarations:
(let ((a "bob")
(b "dole"))
(declare (string a b))
(gen+ a b))
This should return the string "bobdolekapow"
as the call to GEN+
was transformed into a call to STRING+
based on the declared types of the variables A
and B
, not just literal types.
Using Basic (defknown)/(deftransform) Combinations with the SBCL Implementation Compiler
The previous technique is indeed potentially useful, more powerful and flexible than transforming on the types of literals, and while not standard ANSI Common Lisp, is more portable/adaptable to other implementations than the technique that follows.
A reason you might forego the former technique in preference of the one that follows, is that the former doesn't get you everything. You still had to declare the types of the variables a
and b
and write the user-defined function to extract the declared type information from the environment.
If you can interact directly with the SBCL compiler however, with the cost of potentially some brittle-ness and extreme non-portability, you now gain the ability to hack into the compiler itself and gain the benefits of things like type propagation: you might not need to explicitly inform the compiler of the types of A
and B
for it to implement your transformation.
For our example, we will implement a very basic transformation on the functions wat
and string-wat
, which are identical in form to our previous functions gen+
and string+
.
Understand there are many more pieces of information and optimisation you can feed the SBCL compiler not covered here. And if anyone more experienced with SBCL internals wants to correct/extent anything regarding my impressions here, please comment and i'll be happy to update my answer:
First we tell the compiler about the existence and type signature of wat
. We do this by calling defknown
in the sb-c
package and inform it that wat
takes two parameters of any type: (T T)
and that it returns a single value of any type: *
(sb-c:defknown wat (T T) *)
Then we define a simple transform using sb-c:deftransform
, essentially saying when the two parameters fed to wat
are strings, we transform the code into a call to string-wat
.
(sb-c:deftransform wat ((x y) (string string) *)
`(string-wat x y))
The forms of wat
and string-wat
for completeness:
(defun wat (x y)
(if (and (numberp x)
(numberp y))
(+ x y)
(concatenate 'string x y)))
(defun string-wat (x y)
(declare (optimize (speed 3) (safety 0))
(string x y))
(concatenate 'string x y "watpow"))
And this time a demonstration in SBCL using bound variables but no explicit type declarations:
(let ((a (concatenate 'string "bo" "b"))
(b (concatenate 'string "dole")))
(wat a b))
And the returned string should be "bobdolewatpow"
.
- Any reading or sources apart from manually reading through the source to discover more info regarding this?
I haven't been able to find anything much about this out there, and would say that to get much deeper, you're going to have to start trawling through some source code.
SBCL github mirror is currently available here.
User @PuercoPop has suggested background reading of Starting to Hack on SBCL and The Python Compiler for CMU Common Lisp, albeit I am including a link to a .pdf version rather than a .ps version commonly linked to.