I am studying the CPython source code and decided to experiment with some changes to the grammar, so I downloaded the source of version 3.7.
I am following the guidelines of PEP 0306:
https://www.python.org/dev/peps/pep-0306/
and this Hackernoon example:
https://hackernoon.com/modifying-the-python-language-in-7-minutes-b94b0a99ce14
The idea is an experimental extension of decorator syntax (keep in mind this is just a study exercise; I already know there are other ways to achieve the same result):
@test
def mydef(self):
pass
This works perfectly well, thanks to this line in the Grammar/Grammar file:
decorated: decorators (classdef | funcdef | async_funcdef)
Now the goal is to change decorators so they also accept annotated assignments, starting with this example:
@test
id: int = 1
Analyzing the grammar, I found the annassign rule:
annassign: ':' test ['=' test]
# or even use small_stmt
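For context, annassign does not stand alone in Grammar/Grammar; it only appears as a suffix of expr_stmt, which is what supplies the assignment target (quoting the 3.7 grammar, if I am reading it correctly):

```
expr_stmt: testlist_star_expr (annassign | augassign (yield_expr|testlist) |
                     ('=' (yield_expr|testlist_star_expr))*)
annassign: ':' test ['=' test]
```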
Since that rule seemed to cover id: int = 1, I changed the decorated rule to:
decorated: decorators (classdef | funcdef | async_funcdef | annassign)
With that done (still following PEP 0306), I went to Python/ast.c and located the ast_for_decorated() function, which contains this piece of code:
[...]
assert(TYPE(CHILD(n, 1)) == funcdef ||
       TYPE(CHILD(n, 1)) == async_funcdef ||
       TYPE(CHILD(n, 1)) == classdef);

if (TYPE(CHILD(n, 1)) == funcdef) {
    thing = ast_for_funcdef(c, CHILD(n, 1), decorator_seq);
} else if (TYPE(CHILD(n, 1)) == classdef) {
    thing = ast_for_classdef(c, CHILD(n, 1), decorator_seq);
} else if (TYPE(CHILD(n, 1)) == async_funcdef) {
    thing = ast_for_async_funcdef(c, CHILD(n, 1), decorator_seq);
}
[...]
You can see that it validates the type of the next node (funcdef, classdef, or async_funcdef) and then calls the corresponding ast_for_* function. So I changed it as follows:
[...]
assert(TYPE(CHILD(n, 1)) == funcdef ||
       TYPE(CHILD(n, 1)) == async_funcdef ||
       TYPE(CHILD(n, 1)) == annassign ||
       TYPE(CHILD(n, 1)) == classdef);

if (TYPE(CHILD(n, 1)) == funcdef) {
    thing = ast_for_funcdef(c, CHILD(n, 1), decorator_seq);
} else if (TYPE(CHILD(n, 1)) == annassign) {
    thing = ast_for_annassign(c, CHILD(n, 1));
} else if (TYPE(CHILD(n, 1)) == classdef) {
    thing = ast_for_classdef(c, CHILD(n, 1), decorator_seq);
} else if (TYPE(CHILD(n, 1)) == async_funcdef) {
    thing = ast_for_async_funcdef(c, CHILD(n, 1), decorator_seq);
}
[...]
Note that I created an ast_for_annassign() function, which contains the same validation code that ast_for_expr_stmt() uses for annassign:
static stmt_ty
ast_for_annassign(struct compiling *c, const node *n)
{
    REQ(n, expr_stmt);
    expr_ty expr1, expr2, expr3;
    node *ch = CHILD(n, 0);
    node *deep, *ann = CHILD(n, 1);
    int simple = 1;

    /* we keep track of parens to qualify (x) as expression not name */
    deep = ch;
    while (NCH(deep) == 1) {
        deep = CHILD(deep, 0);
    }
    if (NCH(deep) > 0 && TYPE(CHILD(deep, 0)) == LPAR) {
        simple = 0;
    }

    expr1 = ast_for_testlist(c, ch);
    if (!expr1) {
        return NULL;
    }

    switch (expr1->kind) {
        case Name_kind:
            if (forbidden_name(c, expr1->v.Name.id, n, 0)) {
                return NULL;
            }
            expr1->v.Name.ctx = Store;
            break;
        case Attribute_kind:
            if (forbidden_name(c, expr1->v.Attribute.attr, n, 1)) {
                return NULL;
            }
            expr1->v.Attribute.ctx = Store;
            break;
        case Subscript_kind:
            expr1->v.Subscript.ctx = Store;
            break;
        case List_kind:
            ast_error(c, ch,
                      "only single target (not list) can be annotated");
            return NULL;
        case Tuple_kind:
            ast_error(c, ch,
                      "only single target (not tuple) can be annotated");
            return NULL;
        default:
            ast_error(c, ch,
                      "illegal target for annotation");
            return NULL;
    }

    if (expr1->kind != Name_kind) {
        simple = 0;
    }

    ch = CHILD(ann, 1);
    expr2 = ast_for_expr(c, ch);
    if (!expr2) {
        return NULL;
    }
    if (NCH(ann) == 2) {
        return AnnAssign(expr1, expr2, NULL, simple,
                         LINENO(n), n->n_col_offset, c->c_arena);
    }
    else {
        ch = CHILD(ann, 3);
        expr3 = ast_for_expr(c, ch);
        if (!expr3) {
            return NULL;
        }
        return AnnAssign(expr1, expr2, expr3, simple,
                         LINENO(n), n->n_col_offset, c->c_arena);
    }
}
Then it was time to rebuild and test (./configure, make -j, make install), run python3.7, and:
File "__init__.py", line 13
id: int = 1
^
SyntaxError: invalid syntax
With the grammar changed and the parser regenerated, shouldn't the compiler now accept this syntax as valid? Where am I going wrong?