Yes, it's possible and not even very hard :)
We'll need to discuss a few things:
- What are syntax and semantics.
- How are programming languages parsed? What is a syntax tree?
- Extending the language syntax.
- Extending the language semantics.
- How do I add an operator to the JavaScript language.
If you're lazy and just want to see it in action - I put the working code on GitHub
1. What are syntax and semantics?
Very generally - a language is composed of two things.
Syntax - these are the symbols in the language, like the unary operator ++, as well as Expressions like a FunctionExpression, which represents an "inline" function. The syntax covers just the symbols used and not their meaning. In short, the syntax is just the drawings of letters and symbols - it holds no inherent meaning.
Semantics ties meaning to these symbols. Semantics is what says ++ means "increment by one"; in fact, here is the exact definition. It ties meaning to our syntax, and without it the syntax is just a list of symbols with an order.
2. How are programming languages parsed? What is a syntax tree?
At some point, when something executes your code in JavaScript or any other programming language - it needs to understand that code. A part of this is called lexing (or tokenizing, let's not go into subtle differences here), which means breaking up code like:
function foo(){ return 5;}
Into its meaningful parts - that is, saying that there is a function keyword here, followed by an identifier, an empty arguments list, then a block opening { containing a return keyword with the literal 5, then a semicolon, then an end block }.
This part is entirely in the syntax; all it does is break the code up into parts like function,foo,(,),{,return,5,;,}. It still has no understanding of the code.
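To make the lexing step concrete, here is a minimal sketch of a toy tokenizer. It is not how Esprima or any real engine does it; the tokenize helper and its regular expression are made up for illustration and only handle the handful of characters in this one example.

```javascript
// Hypothetical helper, not part of Esprima: a toy tokenizer for illustration only.
function tokenize(source) {
    var tokens = [];
    // Match identifiers/keywords, integer literals, or one punctuation character.
    var pattern = /[A-Za-z_$][A-Za-z0-9_$]*|\d+|[(){};]/g;
    var match;
    while ((match = pattern.exec(source)) !== null) {
        tokens.push(match[0]);
    }
    return tokens;
}

var tokens = tokenize("function foo(){ return 5;}");
console.log(tokens.join(","));
// function,foo,(,),{,return,5,;,}
```

Note that the result is a flat list: nothing in it says that foo is a function or that 5 is what gets returned.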
After that - a Syntax Tree is built. A syntax tree is more aware of the grammar but is still entirely syntactic. For example, a syntax tree would see the tokens of:
function foo(){ return 5;}
And figure out "Hey! There is a function declaration here!".
It's called a tree because it's just that - trees allow nesting.
For example, the code above can produce something like:
Program
  FunctionDeclaration (identifier = 'foo')
    BlockStatement
      ReturnStatement
        Literal (5)
This is rather simple. Just to show you it isn't always so linear, let's check 5 + 5:
Program
  ExpressionStatement
    BinaryExpression (operator +)
      Literal (5)    Literal (5) // notice the split here
Such splits can occur.
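As a rough sketch, the tree for 5 + 5 can be written out as a plain JavaScript object. The shape below (type, body, expression, left, right) follows what Esprima-style parsers produce, but the object is typed by hand and the describe helper is hypothetical, added only to print the nesting.

```javascript
// Hand-written tree for `5 + 5`, in the shape Esprima-style parsers produce.
var tree = {
    type: "Program",
    body: [{
        type: "ExpressionStatement",
        expression: {
            type: "BinaryExpression",
            operator: "+",
            left: { type: "Literal", value: 5 },
            right: { type: "Literal", value: 5 }
        }
    }]
};

// Hypothetical helper: collect one indented line per node that has a type.
function describe(node, depth, lines) {
    if (Array.isArray(node)) {
        node.forEach(function (child) { describe(child, depth, lines); });
    } else if (node !== null && typeof node === "object") {
        if (typeof node.type === "string") {
            lines.push(new Array(depth + 1).join("  ") + node.type);
            depth += 1;
        }
        Object.keys(node).forEach(function (key) {
            describe(node[key], depth, lines);
        });
    }
    return lines;
}

var outline = describe(tree, 0, []);
console.log(outline.join("\n"));
// Program
//   ExpressionStatement
//     BinaryExpression
//       Literal
//       Literal
```

The two Literal lines sit at the same depth under BinaryExpression, which is the split mentioned above.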
Basically, a syntax tree allows us to express the syntax.
This is where x ∘ y fails - the parser sees ∘ and doesn't understand the syntax.
3. Extending the language syntax.
This just requires a project that parses the syntax. What we'll do here is read the syntax of "our" language, which is not the same as JavaScript (and does not comply with the specification), and replace our operator with something the JavaScript syntax is OK with.
What we'll be making is not JavaScript. It does not follow the JavaScript specification, and a standards-compliant JS parser will throw an exception on it.
4. Extending the language semantics
This we do all the time anyway :) All we'll do here is just define a function to call when the operator is called.
5. How do I add an operator to the JavaScript language.
Let me just start by saying, after this preface, that we'll not be adding an operator to JS here. Rather, we're defining our own language - let's call it "CakeLanguage" or something - and adding the operator to it. This is because ∘ is not a part of the JS grammar, and the JS grammar does not allow arbitrary operators like some other languages do.
We'll use two open source projects for this:
- esprima which takes JS code and generates the syntax tree for it.
- escodegen which does the other direction, generating JS code from the syntax tree esprima spits out.
If you paid close attention you'd know we can't use esprima directly, since we'll be giving it grammar it does not understand.
We'll add a # operator that does x # y === 2x + y for the fun of it. We'll give it the precedence of multiplication (because operators have operator precedence).
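To see why the precedence choice matters, here is a small sketch that writes the groupings out by hand with an ordinary function standing in for # (it uses the operator_sharp name the answer defines at the end). The two variable names are made up for the comparison.

```javascript
// Plain-function stand-in for the # operator: x # y === 2x + y.
function operator_sharp(x, y) {
    return 2 * x + y;
}

// With the precedence of multiplication, `1 + 2 # 3` groups as `1 + (2 # 3)`.
var tighterThanPlus = 1 + operator_sharp(2, 3);   // 1 + (2*2 + 3) = 8

// If # had the precedence of addition instead, the same source would
// group left to right as `(1 + 2) # 3`.
var sameAsPlus = operator_sharp(1 + 2, 3);        // 2*3 + 3 = 9

console.log(tighterThanPlus, sameAsPlus);
// 8 9
```

Same tokens, different tree, different answer, so the precedence has to be decided when the syntax is extended.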
So, after you get your copy of Esprima.js - we'll need to change the following:
To FnExprTokens (that is, expressions) we'll need to add # so it'd recognize it. Afterwards, it'd look as such:
FnExprTokens = ['(', '{', '[', 'in', 'typeof', 'instanceof', 'new',
                'return', 'case', 'delete', 'throw', 'void',
                // assignment operators
                '=', '+=', '-=', '*=', '/=', '%=', '<<=', '>>=', '>>>=',
                '&=', '|=', '^=', ',',
                // binary/unary operators
                '+', '-', '*', '/', '%', '#', '++', '--', '<<', '>>', '>>>', '&',
                '|', '^', '!', '~', '&&', '||', '?', ':', '===', '==', '>=',
                '<=', '<', '>', '!=', '!=='];
To scanPunctuator we'll add it and its char code as a possible case:
case 0x23: // #
And then to the test, so it looks like:
if ('<>=!+-*#%&|^/'.indexOf(ch1) >= 0) {
Instead of:
if ('<>=!+-*%&|^/'.indexOf(ch1) >= 0) {
And then to binaryPrecedence - let's give it the same precedence as multiplication:
case '*':
case '/':
case '#': // put it elsewhere if you want to give it another precedence
case '%':
    prec = 11;
    break;
That's it! We've just extended our language syntax to support the # operator.
We're not done yet, we need to convert it back to JS.
Let's first define a short visitor function for our tree that recursively visits all its nodes.
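function visitor(tree, visit) {
    for (var i in tree) {
        // only objects can be nodes; skipping null and primitives also keeps
        // the callback from reading el.type on a null property
        if (typeof tree[i] === "object" && tree[i] !== null) {
            visit(tree[i]);
            visitor(tree[i], visit);
        }
    }
}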
This just goes through the Esprima generated tree and visits it. We pass it a function and it runs that on every node.
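Here is a small sketch of what the traversal looks like in practice. Since stock Esprima rejects #, the tree below is typed by hand in the shape the patched parser is assumed to produce, and the block carries its own copy of the visitor (written so the callback only ever receives objects) so it can run on its own. The seen array is just for the demo.

```javascript
// Self-contained copy of the visitor; the callback only ever receives objects.
function visitor(tree, visit) {
    for (var i in tree) {
        if (typeof tree[i] === "object" && tree[i] !== null) {
            visit(tree[i]);
            visitor(tree[i], visit);
        }
    }
}

// Hand-written tree for `5 # 5` (assumed shape; stock Esprima rejects `#`).
var syntax = {
    type: "Program",
    body: [{
        type: "ExpressionStatement",
        expression: {
            type: "BinaryExpression",
            operator: "#",
            left: { type: "Literal", value: 5 },
            right: { type: "Literal", value: 5 }
        }
    }]
};

var seen = [];
visitor(syntax, function (el) {
    if (el.type) {          // arrays such as `body` are visited too, but have no type
        seen.push(el.type);
    }
});
console.log(seen.join(" > "));
// ExpressionStatement > BinaryExpression > Literal > Literal
```

The root Program node itself is never passed to the callback, only what hangs off it, which is fine here because the root is never a BinaryExpression.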
Now, let's treat our special new operator:
visitor(syntax, function (el) { // for every node in the syntax
    if (el.type === "BinaryExpression") { // if it's a binary expression
        if (el.operator === "#") { // with the operator #
            el.type = "CallExpression"; // it is now a call expression
            el.callee = {name: "operator_sharp", type: "Identifier"}; // for the function operator_sharp
            el.arguments = [el.left, el.right]; // with the left and right side as arguments
            delete el.operator; // remove BinaryExpression properties
            delete el.left;
            delete el.right;
        }
    }
});
So in short:
var syntax = esprima.parse("5 # 5"); // esprima here is our modified copy
visitor(syntax, function (el) { // for every node in the syntax
    if (el.type === "BinaryExpression") { // if it's a binary expression
        if (el.operator === "#") { // with the operator #
            el.type = "CallExpression"; // it is now a call expression
            el.callee = {name: "operator_sharp", type: "Identifier"}; // for the function operator_sharp
            el.arguments = [el.left, el.right]; // with the left and right side as arguments
            delete el.operator; // remove BinaryExpression properties
            delete el.left;
            delete el.right;
        }
    }
});
var asJS = escodegen.generate(syntax); // produces operator_sharp(5, 5);
The last thing we need to do is define the function itself:
function operator_sharp(x,y){
return 2*x + y;
}
And include that above our code.
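Put together, the final program is just the function followed by the generated calls. A minimal sketch of what ends up running, with the variable names made up and the chained case resting on the assumption that # groups left to right like the other binary operators at its precedence level:

```javascript
// The function from the answer, included above the generated code.
function operator_sharp(x, y) {
    return 2 * x + y;
}

// What `5 # 5` turns into after the rewrite.
var simple = operator_sharp(5, 5);                      // 2*5 + 5 = 15

// Assumption: `1 # 2 # 3` groups left to right, like other binary operators
// of equal precedence, so it would become a nested call.
var chained = operator_sharp(operator_sharp(1, 2), 3);  // 2*(2*1 + 2) + 3 = 11

console.log(simple, chained);
// 15 11
```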
That's all there is to it! If you've read this far - you deserve a cookie :)
Here is the code on GitHub so you can play with it.