We've seen an pure standard library approach.
This execute
d the instructions immediately.
Let's create a parser that build an AST (Abstract Syntax Tree). In the case of our simple stack machine it's just a list of instructions. Let's call it a Tape
.
Using Boost Spirit
I'd still advise against a lexer. Lexers are supported in Spirit v2 (not in X3 - yet?). But in practice they complicate matters, and Spirit knows how to backtrack the input in case of mismatch. So you can just tentatively match productions and try the next if it wasn't the right "token".
Here's what using a Spirit grammar should look like:
Tape program;
boost::spirit::istream_iterator f(std::cin >> std::noskipws), l; // Note the noskipws!
if (parse(f, l, Parser::program, program)) {
std::cout << "Parsed " << program.size() << " instructions\n";
} else {
std::cout << "Parse failed\n";
}
Now, the AST types are:
struct Add {};
struct Sub {};
struct Mul {};
struct Div {};
struct Pop {};
using Value = int;
using Instr = boost::variant<Add, Sub, Mul, Div, Pop, Value>;
using Tape = std::vector<Instr>;
Simple, right.
The grammar
In X3, making the grammar is pretty lightweight. Top-down:
auto instr = opcode_ | int_;
auto program = skip(space) [*instr];
Now, all we have to do is teach it to recognize opcodes. A start would be:
struct opcodes : symbols<Instr> {
opcodes() {
this->add("add", Add{})("sub", Sub{})("mul", Mul{})("div", Div{})("pop", Pop{});
}
} opcode_;
The seasoned Spirit guru will spot a problem here: opcode_
is not a lexeme, and neither does it guarantee "distinct identifier" parsing. E.g. "a dd"
would match Add
. And "additional"
would also match.
Luckily, X3 makes it really easy to compose directives on the fly:
auto opcode_ = [] {
struct opcodes : symbols<Instr> {
opcodes() { this->add("add", Add{})("sub", Sub{})("mul", Mul{})("div", Div{})("pop", Pop{}); }
} codes_;
return lexeme[codes_ >> !graph];
}();
So, now both holes are fixed.
Full Demo
Live On Coliru
#include <iostream>
#include <deque>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
struct Add {};
struct Sub {};
struct Mul {};
struct Div {};
struct Pop {};
using Value = int;
using Instr = boost::variant<Add, Sub, Mul, Div, Pop, Value>;
struct Machine {
using result_type = void;
std::deque<Value> stack_;
void operator()(Instr instr) {
boost::apply_visitor(*this, instr);
}
void operator()(Add) {
assert(stack_.size()>=2);
auto op2 = stack_.back(); stack_.pop_back();
auto op1 = stack_.back(); stack_.pop_back();
stack_.push_back(op1 + op2);
}
void operator()(Sub) {
assert(stack_.size()>=2);
auto op2 = stack_.back(); stack_.pop_back();
auto op1 = stack_.back(); stack_.pop_back();
stack_.push_back(op1 - op2);
}
void operator()(Mul) {
assert(stack_.size()>=2);
auto op2 = stack_.back(); stack_.pop_back();
auto op1 = stack_.back(); stack_.pop_back();
stack_.push_back(op1 * op2);
}
void operator()(Div) {
assert(stack_.size()>=2);
auto op2 = stack_.back(); stack_.pop_back();
auto op1 = stack_.back(); stack_.pop_back();
assert(op2 != 0);
stack_.push_back(op1 / op2);
}
void operator()(Value v) {
stack_.push_back(v);
}
void operator()(Pop) {
assert(stack_.size()>=1);
stack_.pop_back();
}
void trace() const {
using namespace std;
// debug trace
copy(stack_.begin(), stack_.end(), ostream_iterator<Value>(cout, " "));
cout << "\n";
}
};
using Tape = std::vector<Instr>;
namespace Parser {
using namespace boost::spirit::x3;
auto opcode_ = [] {
struct opcodes : symbols<Instr> {
opcodes() { this->add("add", Add{})("sub", Sub{})("mul", Mul{})("div", Div{})("pop", Pop{}); }
} codes_;
return lexeme[codes_ >> !graph];
}();
auto instr = opcode_ | int_; // TODO
auto program = skip(space) [*instr];
}
int main() {
Tape program;
boost::spirit::istream_iterator f(std::cin >> std::noskipws), l; // Note the noskipws!
if (parse(f, l, Parser::program, program)) {
std::cout << "Parsed " << program.size() << " instructions\n";
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Unparsed program data: '" << std::string(f,l) << "'\n";
Machine machine;
for (auto instr : program)
{
machine(instr);
machine.trace();
}
}
Prints:
Parsed 7 instructions
1
1 2
3
3 3
3 3 1
3 2
6
Summary
The main gain here is:
- we define our grammar declaratively. For a more involved grammar, this could be a huge advantage
we get backtracking for free - so no need to separate tokens ahead of time
Note: This is a PEG grammar. Backtracking only amounts to trying the next sibling alternative or failing the current rule (so the parent rule could try the next sibling alternative).
This is significantly different from backtracking as in Regular Expression. You'll note the difference with Kleene-* en other repetitive parser expressions. In PEG grammars, these are always greedy, and never backtrack only a single element (similar to "Maximum Munch" rules).
We don't have any messy switch
anymore. In fact it's hidden inside the Variant Visitor (see apply_visitor
).
The instruction execution is virtually unmodified, but we renamed execute
to operator()
so it models the Visitor concept.