I want to understand how a transpiler works. The best way to do this is, of course, to write one.
I've been looking into a few resources to understand how this works in theory, and from what I understand I basically need to write two classes:
- A Lexical Analyzer
- A Parser
Lexical Analyzer
The Lexical Analyzer takes the source code of a file as input (the input stream). For example, the following code:
if (someVar == 20) {
	MessageBox("Hello World!");
}
The Lexical Analyzer then creates chunks of data out of this:
[if]
[ ]
[(]
[someVar]
[ ]
[==]
[ ]
[20]
[)]
[ ]
[{]
[\n]
[\t]
[MessageBox]
[(]
["]
[Hello World!]
["]
[)]
[;]
[\n]
[\t]
[}]
This will then be sent to the Parser class.
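A minimal sketch of this chunking step in Python might look like the following. It assumes a regular-expression approach; the chunk patterns and the name chunk are my own choices, not taken from any library. One deliberate difference from the list above: the whole string literal, quotes included, comes out as a single chunk, which is how lexers usually treat strings.

```python
import re

# One alternative per kind of chunk; the regex engine tries them left to
# right, so "==" must come before the single-character "=".
CHUNK_RE = re.compile(r'''
      "[^"]*"        # a whole string literal, quotes included
    | \d+            # integer literal
    | [A-Za-z_]\w*   # a word: keyword or identifier
    | ==             # two-character operator
    | [=(){};,]      # single-character operators and punctuation
    | \s             # one whitespace character per chunk
''', re.VERBOSE)


def chunk(source):
    """Split source text into a list of chunks (plain strings)."""
    chunks = []
    pos = 0
    while pos < len(source):
        m = CHUNK_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        chunks.append(m.group())
        pos = m.end()
    return chunks


source = 'if (someVar == 20) {\n\tMessageBox("Hello World!");\n}'
print(chunk(source))
```

Because whitespace is kept as chunks too, joining the chunks back together reproduces the input exactly.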
The Parser
The Parser class will then read all the chunks (tokens?) and determine what each token(?) means, assigning a certain type to each one. So the above string would be identified as something like:
[if] // Keyword
[ ] // Whitespace
[(] // L_Parenthesis
[someVar] // Identifier
[ ] // Whitespace
[==] // Operator
[ ] // Whitespace
[20] // Number (or Integer)
[)] // R_Parenthesis
[ ] // Whitespace
[{] // L_Bracket
[\n] // Whitespace
[\t] // Whitespace
[MessageBox] // Keyword
[(] // L_Parenthesis
["] // Not yet sure where this would go
[Hello World!] // Same..
["] // Same...
[)] // R_Parenthesis
[;] // Semicolon
[\n] // Whitespace
[\t] // Whitespace
[}] // R_Bracket
As you can see, I haven't fully sorted out which types go where, but this should be the basic idea.
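The typing step can be sketched the same way. In many compilers the lexer already attaches a type like this to every chunk (typed chunks are what are usually called tokens), but here it is kept separate to follow the two steps described above. The keyword set is an assumption about the custom language, and the names KEYWORDS, FIXED_TYPES and classify are hypothetical.

```python
import re

# Hypothetical reserved words of the custom language (an assumption).
KEYWORDS = {"if", "def", "public", "function"}

# Chunks whose type is fixed, using the type names from the list above.
FIXED_TYPES = {
    "(": "L_Parenthesis", ")": "R_Parenthesis",
    "{": "L_Bracket", "}": "R_Bracket",
    ";": "Semicolon", "==": "Operator", "=": "Operator",
}


def classify(chunk):
    """Return a type name for a single chunk of source text."""
    if chunk.isspace():
        return "Whitespace"
    if chunk in FIXED_TYPES:
        return FIXED_TYPES[chunk]
    if chunk in KEYWORDS:
        return "Keyword"
    if chunk.isdigit():
        return "Number"
    if len(chunk) >= 2 and chunk[0] == chunk[-1] == '"':
        return "String"
    if re.fullmatch(r"[A-Za-z_]\w*", chunk):
        return "Identifier"
    raise ValueError(f"cannot classify {chunk!r}")


chunks = ["if", " ", "(", "someVar", " ", "==", " ", "20", ")", " ", "{", "\n", "\t",
          "MessageBox", "(", '"Hello World!"', ")", ";", "\n", "}"]
for c in chunks:
    print(repr(c), classify(c))
```

Under these assumptions MessageBox is classified as an Identifier rather than a Keyword, since it names a function instead of being a reserved word, and the quoted text gets its own String type.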
The next thing I'd like to do is convert that source code into source code in another language, i.e. transpile it. But how does that work? I can't find any direct tutorials or explanations about that.
Suppose I have the following custom code:
def myVar = true;
public function myFunc ( def arg1 )
{
	if ( arg1 == true ) {
		MessageBox("My Message");
	}
}
The lexical process will then parse this code. How do I then convert that to something like JavaScript?
var myVar = true;
myFunc = function ( arg1 )
{
	if ( arg1 == true ) {
		alert("My Message");
	}
}
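For the parts that map one-to-one, such as def to var and MessageBox to alert, a plain token-by-token substitution is already enough. Here is a small sketch under that assumption (WORD_MAP and substitute are hypothetical names). The last call shows where the approach stops working: the function header cannot be produced by swapping words one at a time.

```python
import re

# Assumed one-to-one word mapping from the custom language to JavaScript.
WORD_MAP = {"def": "var", "MessageBox": "alert"}

# A string literal, a word, or any other single character (newlines included).
TOKEN_RE = re.compile(r'"[^"]*"|[A-Za-z_]\w*|.', re.DOTALL)


def substitute(source):
    """Copy source token by token, swapping mapped words and leaving the rest."""
    return "".join(WORD_MAP.get(t, t) for t in TOKEN_RE.findall(source))


print(substitute('def myVar = true;'))                     # var myVar = true;
print(substitute('MessageBox("My Message");'))             # alert("My Message");
print(substitute('public function myFunc ( def arg1 )'))   # public function myFunc ( var arg1 )
```

Treating the string literal as one token is what keeps a word like def inside quotes from being rewritten.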
How does the mapping work, going from my custom code to code like JavaScript? Take the function declaration, for example. My lexical parser produces the following: public, function, myFunc. How can it know that it should map that to myFunc = function...?
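A common way to handle the function declaration is to put a real parser between the tokens and the output: it groups the tokens into a tree (an abstract syntax tree), and a separate code generator walks that tree and prints the target language. Once public function myFunc ( def arg1 ) has been recognized as a function declaration with a name and a parameter list, the generator can print those parts in whatever order JavaScript needs. Below is a minimal recursive-descent parser plus generator in Python, covering only the constructs in the example above; the grammar, the node names (Binary, VarDecl, FuncDecl, If, Call) and the MessageBox to alert mapping are all assumptions.

```python
import re
from collections import namedtuple

# Lexer: findall keeps these token shapes and silently skips the rest (whitespace etc.).
TOKEN_RE = re.compile(r'"[^"]*"|[A-Za-z_]\w*|\d+|==|[=(){};,]')
CALL_MAP = {"MessageBox": "alert"}  # assumed mapping of built-in calls to JS

# AST node types (hypothetical names chosen for this sketch).
Binary = namedtuple("Binary", "left op right")
VarDecl = namedtuple("VarDecl", "name value")
FuncDecl = namedtuple("FuncDecl", "name params body")
If = namedtuple("If", "cond body")
Call = namedtuple("Call", "func args")

class Parser:  # recursive descent: roughly one method per grammar rule
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self, expected=None):
        tok = self.peek()
        if tok is None or expected not in (None, tok):
            raise SyntaxError(f"expected {expected or 'more input'}, got {tok!r}")
        self.pos += 1
        return tok

    def statement(self):
        tok = self.next()
        if tok == "def":  # def NAME = EXPR ;
            name = self.next()
            self.next("=")
            value = self.expr()
            self.next(";")
            return VarDecl(name, value)
        if tok == "public":  # modifier is dropped: no JS equivalent here
            tok = self.next("function")
        if tok == "function":  # function NAME ( def P, ... ) BLOCK
            name = self.next()
            params = self.comma_list(lambda: self.next("def") and self.next())  # "def P" -> P
            return FuncDecl(name, params, self.block())
        if tok == "if":  # if ( EXPR ) BLOCK
            self.next("(")
            cond = self.expr()
            self.next(")")
            return If(cond, self.block())
        args = self.comma_list(self.expr)  # otherwise: NAME ( EXPR, ... ) ;
        self.next(";")
        return Call(tok, args)

    def comma_list(self, item):  # ( ITEM , ITEM , ... )
        self.next("(")
        items = []
        while self.peek() != ")":
            items.append(item())
            if self.peek() != ")":
                self.next(",")
        self.next(")")
        return items

    def block(self):  # { STATEMENT ... }
        self.next("{")
        body = []
        while self.peek() != "}":
            body.append(self.statement())
        self.next("}")
        return body

    def expr(self):  # ATOM [ == ATOM ]
        left = self.next()
        if self.peek() == "==":
            return Binary(left, self.next(), self.next())
        return left

def gen(node, depth=0):
    pad = "    " * depth  # four spaces per nesting level
    if isinstance(node, str):  # names and literals print as-is
        return node
    if isinstance(node, Binary):
        return f"{gen(node.left)} {node.op} {gen(node.right)}"
    if isinstance(node, VarDecl):
        return f"{pad}var {node.name} = {gen(node.value)};"
    if isinstance(node, Call):
        args = ", ".join(gen(a) for a in node.args)
        return f"{pad}{CALL_MAP.get(node.func, node.func)}({args});"
    if isinstance(node, FuncDecl):  # the reordering happens here
        head = f"{node.name} = function ({', '.join(node.params)})"
    else:  # If
        head = f"if ({gen(node.cond)})"
    body = "\n".join(gen(s, depth + 1) for s in node.body)
    return f"{pad}{head} {{\n{body}\n{pad}}}"

def transpile(source):
    parser, out = Parser(TOKEN_RE.findall(source)), []
    while parser.peek() is not None:
        out.append(gen(parser.statement()))
    return "\n".join(out)
```

Called on the custom code above (as a Python string), transpile returns var myVar = true; followed by myFunc = function (arg1) { ... } with the if statement and alert("My Message"); nested inside. Splitting the work this way is what answers the mapping question: the parser only needs to know the custom language, and the generator only needs to know JavaScript.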
Does anyone have good, practical information on how this should be done in a transpiler? Or am I going the wrong way by writing a lexical analyzer for this job?
Edit
So obviously my idea of how the lexer/parser works isn't exactly right. Any "pseudo" information on how this process works (with pseudocode examples) is more than welcome.