The problem is as follows: a text file contains millions of lines of arithmetic expressions that need to be evaluated quickly.
I have been exploring my options for this problem and have written a small program using the nice exprtk C++ library.
The code works and evaluates the expressions, but it is slower than I expected. The lines of arithmetic can get quite long, which might be compounding the issue. Out of interest, I compared the evaluation times with those of Python's built-in eval() function and was surprised to find that eval() was 3-4 times faster than exprtk!
Here is the C++ code:
#include <iostream>
#include <fstream>
#include <cstdio>
#include <sstream>
#include <string>
#include "exprtk.hpp"

int main()
{
    typedef exprtk::symbol_table<double> symbol_table_t;
    typedef exprtk::expression<double> expression_t;
    typedef exprtk::parser<double> parser_t;
    typedef exprtk::parser_error::type error_t;

    // Variables referenced by name in the expression strings
    double A = 1.1;
    double B = 2.2;
    double C = 3.3;
    double m3 = 1.0;
    double z3 = 2.0;

    symbol_table_t symbol_table;
    symbol_table.add_constants();
    symbol_table.add_variable("A", A);
    symbol_table.add_variable("B", B);
    symbol_table.add_variable("C", C);
    symbol_table.add_variable("m3", m3);
    symbol_table.add_variable("z3", z3);

    expression_t expression;
    expression.register_symbol_table(symbol_table);
    parser_t parser;

    // Load the text file and loop over the lines
    std::ifstream fin("test.txt");
    if (!fin) {
        printf("Error: could not open test.txt\n");
        return 1;
    }

    std::string file_line;
    while (std::getline(fin, file_line)) {
        // file_line holds the current line of text, without the trailing '\n'.
        // Each line is parsed and compiled from scratch, then evaluated once.
        std::string expression_str = file_line;
        if (!parser.compile(expression_str, expression)) {
            printf("Error: %s\tExpression: %s\n", parser.error().c_str(), expression_str.c_str());
            for (std::size_t i = 0; i < parser.error_count(); ++i) {
                const error_t error = parser.get_error(i);
                printf("Error: %02d Position: %02d Type: [%s] Msg: %s Expr: %s\n",
                       static_cast<int>(i),
                       static_cast<int>(error.token.position),
                       exprtk::parser_error::to_str(error.mode).c_str(),
                       error.diagnostic.c_str(),
                       expression_str.c_str());
            }
            return 1;
        }
        double result = expression.value();
        printf("%10.16f\n", result);
    }
    return 0;
}
Here is the Python code:
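# Assumed: same variable values as in the C++ symbol table
A, B, C, m3, z3 = 1.1, 2.2, 3.3, 1.0, 2.0
with open("test.txt", "r") as h:
    linesHH = h.readlines()
# Apply list comprehension; ^ is XOR in Python, so rewrite it as ** (power)
matt = [eval(x.replace("^", "**")) for x in linesHH]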
The text file has a very simple format: one line of arithmetic after another. Here is the first line:
-10./(A+B)^4-2.500000000000000*A^3/C^3/(A+B)^4-9.250000000000000*A/C/(A+B)^4-2.500000000000000*B^3/C^3/(A+B)^4-9.250000000000000*B/C/(A+B)^4-8.*A^2/C^3/(A+B)^4-8.*B^2/C^3/(A+B)^4-1.750000000000000*A/B/(A+B)^4+2.250000000000000*B^2/C^2/(A+B)^4-1.750000000000000/A*B/(A+B)^4-1./A^2*B^2/(A+B)^4-.2500000000000000/A^3*B^3/(A+B)^4-13.*A/C^2/(A+B)^4-13.*B/C^2/(A+B)^4-.2500000000000000*A^3/B^3/(A+B)^4+2.250000000000000*A^2/C^2/(A+B)^4-1.*A^2/B^2/(A+B)^4+62./C/(A+B)^4*z3-11./C/(A+B)^4-13.*A^2*B/C^3/(A+B)^4+3.500000000000000*A^2/B/C/(A+B)^4-13.*A*B^2/C^3/(A+B)^4+3.500000000000000/A*B^2/C/(A+B)^4-14.*A*B/C^3/(A+B)^4-.5000000000000000/A*B^4/C^3/(A+B)^4-1./A*B^3/C^2/(A+B)^4-.2500000000000000/A^2*B^4/C^2/(A+B)^4-2./A^2*B^3/C/(A+B)^4-.2500000000000000/A^3*B^4/C/(A+B)^4-1.*A^3/B/C^3/(A+B)^4-.5000000000000000*A^3/B^2/C^2/(A+B)^4-2.500000000000000*A^2/B/C^2/(A+B)^4-.5000000000000000*A^2/B^2/C/(A+B)^4-2.*A/B/C/(A+B)^4-1./A*B^3/C^3/(A+B)^4-2.500000000000000/A*B^2/C^2/(A+B)^4-2./A*B/C/(A+B)^4-.5000000000000000/A^2*B^3/C^2/(A+B)^4-.5000000000000000/A^2*B^2/C/(A+B)^4-.5000000000000000*A^4/B/C^3/(A+B)^4-.2500000000000000*A^4/B^2/C^2/(A+B)^4-.2500000000000000*A^4/B^3/C/(A+B)^4-1.*A^3/B/C^2/(A+B)^4-2.*A^3/B^2/C/(A+B)^4-18.*A*B/C^2/(A+B)^4+26.*A/C^2/(A+B)^4*z3+26.*B/C^2/(A+B)^4*z3+11.*A/B/C/(A+B)^4*z3+5./A*B^2/C^2/(A+B)^4*z3+11./A*B/C/(A+B)^4*z3+1/A^2*B^3/C^2/(A+B)^4*z3+5./A^2*B^2/C/(A+B)^4*z3+1/A^3*B^3/C/(A+B)^4*z3+A^3/B^2/C^2/(A+B)^4*z3+A^3/B^3/C/(A+B)^4*z3+5.*A^2/B/C^2/(A+B)^4*z3+5.*A^2/B^2/C/(A+B)^4*z3
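The expressions use ^ for exponentiation, which is what ^ means in exprtk. In Python, however, ^ is bitwise XOR and is not defined for floats. Taken literally, eval() on a line like the one above raises TypeError when A, B and C are numbers, because XOR has lower precedence than + and - and ends up being applied to floats such as -10./(A+B). A minimal check, assuming A = 1.1 and B = 2.2 as in the C++ code:

```python
A, B = 1.1, 2.2  # assumed: same values as in the C++ symbol table

# In Python, ^ is bitwise XOR: fine for ints, undefined for floats.
print(6 ^ 3)  # prints 5 (0b110 XOR 0b011 == 0b101)

try:
    (A + B) ^ 4
    xor_failed = False
except TypeError as exc:
    xor_failed = True
    print("TypeError:", exc)

# Rewriting ^ as ** gives exponentiation, which is what ^ means in exprtk.
line = "-10./(A+B)^4"
print(eval(line.replace("^", "**")))
```

A plain textual replace is enough for this file's format, where ^ only ever raises a variable or a parenthesised sum to an integer constant. For more general input, the precedence and associativity of Python's ** versus exprtk's ^ would need checking.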
Should I be surprised by this?
I was surprised because I have read documentation stating that eval() is slow and should generally be avoided (mainly because of its inherent security issues), yet in this particular example it seems to outperform my code.
I do not believe exprtk is thread-safe, so I doubt multithreading is much of an option.
For a speed-up, the shunting-yard algorithm and reverse Polish notation could be employed, but from this comparison alone I was surprised at the speed difference between C++ and Python. Is there an obvious reason for this difference in speed?
Is there a lot more overhead in exprtk, or is my code just complete rubbish? Probably the latter...
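The shunting-yard idea mentioned above can be sketched in Python: to_rpn converts one line to reverse Polish notation and eval_rpn evaluates it with a value stack. This illustrates the technique only, under stated assumptions: the function names are hypothetical, only + - * / ^ and parentheses are supported (no functions, no scientific notation), ^ is treated as right-associative exponentiation, and unary minus binds looser than ^, so -A^2 means -(A^2). Converting to RPN pays off mainly when one expression is evaluated many times; with millions of lines each evaluated once, parsing still dominates, and a pure-Python version like this will almost certainly be slower than eval().

```python
import operator
import re

# Binary operators: symbol -> (precedence, right_associative, function).
BINARY = {
    "+": (1, False, operator.add),
    "-": (1, False, operator.sub),
    "*": (2, False, operator.mul),
    "/": (2, False, operator.truediv),
    "^": (4, True, operator.pow),  # power, assumed right-associative
}
NEG = "u-"    # unary minus token; cannot clash with an identifier
NEG_PREC = 3  # binds tighter than * and /, looser than ^

# Numbers like 10., .25, 2.5; identifiers; any other single non-space char.
TOKEN_RE = re.compile(r"\d+\.?\d*|\.\d+|[A-Za-z_]\w*|\S")


def to_rpn(expr):
    """Shunting-yard: convert an infix string to a list of RPN tokens."""
    output, ops = [], []
    prev_is_operand = False  # tells unary minus apart from binary minus
    for tok in TOKEN_RE.findall(expr):
        if tok[0].isdigit() or tok[0] == ".":
            output.append(float(tok))
            prev_is_operand = True
        elif tok[0].isalpha() or tok[0] == "_":
            output.append(tok)  # variable name, looked up at evaluation time
            prev_is_operand = True
        elif tok == "(":
            ops.append(tok)
            prev_is_operand = False
        elif tok == ")":
            while ops and ops[-1] != "(":
                output.append(ops.pop())
            if not ops:
                raise ValueError("mismatched parentheses")
            ops.pop()  # discard the matching "("
            prev_is_operand = True
        elif tok == "-" and not prev_is_operand:
            ops.append(NEG)  # prefix operator: push without popping anything
        elif tok == "+" and not prev_is_operand:
            pass  # unary plus is a no-op
        elif tok in BINARY:
            prec, right_assoc, _ = BINARY[tok]
            while ops and ops[-1] != "(":
                top = ops[-1]
                top_prec = NEG_PREC if top == NEG else BINARY[top][0]
                if top_prec > prec or (top_prec == prec and not right_assoc):
                    output.append(ops.pop())
                else:
                    break
            ops.append(tok)
            prev_is_operand = False
        else:
            raise ValueError(f"unexpected token {tok!r}")
    while ops:
        if ops[-1] == "(":
            raise ValueError("mismatched parentheses")
        output.append(ops.pop())
    return output


def eval_rpn(rpn, env):
    """Evaluate RPN tokens with a stack; env maps variable names to floats."""
    stack = []
    for tok in rpn:
        if isinstance(tok, float):
            stack.append(tok)
        elif tok == NEG:
            stack.append(-stack.pop())
        elif tok in BINARY:
            b = stack.pop()
            a = stack.pop()
            stack.append(BINARY[tok][2](a, b))
        else:
            stack.append(env[tok])
    (result,) = stack  # exactly one value should remain
    return result


# Usage: parse once, then evaluate (possibly many times with different env).
env = {"A": 1.1, "B": 2.2, "C": 3.3, "m3": 1.0, "z3": 2.0}
rpn = to_rpn("-10./(A+B)^4+62./C/(A+B)^4*z3")
value = eval_rpn(rpn, env)
```

Keeping the RPN list separate from the values in env is the design choice that lets a parsed expression be re-evaluated cheaply after the variables change, which is the situation where this approach can help.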
Edit
The reason I went with the exprtk library is that I had read this math parser benchmarking investigation.
^ in Python is XOR. Also, your timings are likely to be more about disk access than expression parsing. – Tollhouse
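Following up on the comment, a minimal sketch that times the three stages of the Python version separately: reading the file, parsing each line with compile(), and evaluating the compiled code with eval(). So that it runs on its own, it writes a hypothetical sample file (sample_test.txt, a made-up name, containing one shortened expression with ^ already written as **) into a temporary directory; swap in the real test.txt to get meaningful numbers.

```python
import os
import tempfile
import time

# Assumed: same variable values as in the C++ symbol table.
env = {"A": 1.1, "B": 2.2, "C": 3.3, "m3": 1.0, "z3": 2.0}

# Hypothetical stand-in line in Python syntax (** for power).
sample = "-10./(A+B)**4-2.5*A**3/C**3/(A+B)**4+62./C/(A+B)**4*z3\n"
N_LINES = 10_000

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_test.txt")
    with open(path, "w") as f:
        f.write(sample * N_LINES)

    t0 = time.perf_counter()
    with open(path) as f:
        lines = f.readlines()                                 # 1. file I/O
    t1 = time.perf_counter()
    codes = [compile(x, "<expr>", "eval") for x in lines]    # 2. parse + compile
    t2 = time.perf_counter()
    values = [eval(c, env) for c in codes]                   # 3. evaluate bytecode
    t3 = time.perf_counter()

print(f"read: {t1 - t0:.3f}s  compile: {t2 - t1:.3f}s  eval: {t3 - t2:.3f}s")
```

Because the file has just been written, it is probably still in the operating system's page cache, so the "read" figure will understate cold-disk access. Both the C++ program and the original Python snippet parse each line and then evaluate it only once, so a breakdown like this shows how much of the total is parsing rather than arithmetic.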