Problems with a shunting yard algorithm

Asked 8/3, 2011 at 23:41 Answered 12/1, 2022 at 2:48

I have successfully implemented a shunting yard algorithm in java. The algorithm itself was simple however I am having trouble with the tokenizer. Currently the algorithm works with everything I want excluding one thing. How can I tell the difference between subtraction(-) and negative (-)

such as 4-3 is subtraction but -4+3 is negative

I now know how to find out when it should be a negative and when it should be a minus, but where in the algorithm should it be placed because if you use it like a function it wont always work for example

3 + 4 * 2 / -( 1 − 5 ) ^ 2 ^ 3

when 1-5 becomes -4 it will become 4 before it gets squared and cubed

just like 3 + 4 * 2 / cos( 1 − 5 ) ^ 2 ^ 3 , you would take the cosine before squaring and cubing

but in real math you wouldn’t with a - because what your really saying is 3 + 4 * 2 / -(( 1 − 5 ) ^ 2 ^ 3) in order to have the right value

Cleasta answered 8/3, 2011 at 23:41 Comment(1)

I added the 'java' tag, I thought it might get your question more views. – Marcelenemarcelia 9/3, 2011 at 0:3

It sounds like you're doing a lex-then-parse style parser, where you're going to need a simple state machine in the lexer in order to get separate tokens for unary and binary minus. (In a PEG parser, this isn't something you have to worry about.)

In JavaCC, you would have a DEFAULT state, where you would consider the - character to be UNARY_MINUS. When you tokenized the end of a primary expression (either a closing paren, or an integer, based on the examples you gave), then you would switch to the INFIX state where - would be considered to be INFIX_MINUS. Once you encountered any infix operator, you would return to the DEFAULT state.

If you're rolling your own, it might be a bit simpler than that. Look at this Python code for a clever way of doing it. Basically, when you encounter a -, you just check to see if the previous token was an infix operator. That example uses the string "-u" to represent the unary minus token, which is convenient for an informal tokenization. Best I can tell, the Python example does fail to handle case where a - follows an open paren, or comes at the beginning of the input. Those should be considered unary as well.

In order for unary minus to be handled correctly in the shunting-yard algorithm itself, it needs to have higher precedence than any of the infix operators, and it needs to marked as right-associative. (Make sure you handle right-associativity. You may have left it out since the rest of your operators are left-associative.) This is clear enough in the Python code (although I would use some kind of struct rather than two separate maps).

When it comes time to evaluate, you will need to handle unary operators a little differently, since you only need to pop one number off the stack, rather than two. Depending on what your implementation looks like, it may be easier to just go through the list and replace every occurrence of "-u" with [-1, "*"].

If you can follow Python at all, you should be able to see everything I'm talking about in the example I linked to. I find the code to be a bit easier to read than the C version that someone else mentioned. Also, if you're curious, I did a little write-up a while back about using shunting-yard in Ruby, but I handled unary operators as a separate nonterminal, so they are not shown.

Chronograph answered 9/3, 2011 at 2:54 Comment(2)

That Python code you linked to is now only available in the Web Archive snapshot: web.archive.org/web/20130702040830/http://… – Biotope 13/5, 2022 at 17:0

In another SO thread someone else recommended this cheaper implementation to a unary minus: [0, "-"] (zero minus the value to be negated). – Biotope 13/5, 2022 at 17:3

The answers to this question might be helpful.

In particular, one of those answers references a solution in C that handles unary minus.

Basically, you have to recognize a unary minus based on the appearance of the minus sign in positions where a binary operator can't be, and make a different token for it, as it has different precedence.

Dijkstra's original paper doesn't too clearly explain how he dealt with this, but the unary minus was listed as a separate operator.

Decimeter answered 9/3, 2011 at 1:10 Comment(1)

the standard shunting yard algorithm deosn't support them, im trying to modify it to support them. Wolfram alpha, texas instruments, wolfram mathematica, microsoft math ect.. support them though and all of those use a version of the shunting yard algorithm – Cleasta 9/3, 2011 at 1:24

This isn't in Java, but here is a library I wrote to specifically solve this problem after searching and not finding any clear answers. This does all you want and more:

https://marginalhacks.com/Hacks/libExpr.rb/

It is a ruby library (as well as a testbench to check it) that runs a modified shunting yard algorithm that also supports unary ('-a') and ternary ('a?b:c') ops. It also does RPN, Prefix and AST (abstract syntax trees) - your choice, and can evaluate the expression, including the ability to yield to a block (a lambda of sorts) that can handle any variable evaluation. Only AST does the full set of operations, including the ability to handle short-circuit operations (such as '||' and '?:' and so on), but RPN does support unary. It also has a flexible precedence model that includes presets for precedence as done by C expressions or by Ruby expressions (not the same). The testbench itself is interesting as it can create random expressions which it can then eval() and also run through libExpr to compare results.

It's fairly documented/commented, so it shouldn't be too hard to convert the ideas to Java or some other language.

The basic idea as far as unary operators is that you can recognize them based on the previous token. If the previous token is either an operator or a left-paren, then the "unary-possible" operators (+ and -) are just unary and can be pushed with only one operand. It's important that your RPN stack distinguishes between the unary operator and the binary operator so it knows what to do on evaluation.

Boyer answered 12/1, 2022 at 2:48 Comment(7)

Adding a new answer to a 12-year old question, one which ignores the requested language in the question, doesn't attempt to actually answer the question posed, but happens to match a keyword for your newly published tool is simply spamming. – Super 12/1, 2022 at 3:50

The answer here gives information about how the solution can be derived. The age of the question is irrelevant, since there still aren't any solutions generally available. – Boyer 12/1, 2022 at 4:32

(to be more clear, I wrote this specifically because of this stackoverflow question and the fact that I can't find a general solution anywhere) – Boyer 12/1, 2022 at 4:49

That helps. Part of my problem is the "Here is a library that helps" without any disclaimer that it's your own work. It felt as though you wrote a library and then scanned SO for places to promote it. I'm glad that's not the case. If you had stated in your question that the library was written in response to the question, the age of the question would definitely be irrelevant. Note that it's relatively easy to find JS answers to this problem. I'll try to look at your code soon, although my Ruby is rusty. – Super 12/1, 2022 at 16:55

I've made it clear that I wrote it and it was in response to needing to solve this exact problem. Incidentally, your link "easy to find" doesn't seem to help me actually find any, but I'll believe you. – Boyer 13/1, 2022 at 1:17

Thank you. This is much improved. Upvoted. – Super 13/1, 2022 at 3:5

Glad we could figure out how to make it a better answer. :) – Boyer 14/1, 2022 at 18:56

In your lexer, you can implement this pseudo-logic:

if (symbol == '-') {
    if (previousToken is a number 
     OR previousToken is an identifier 
     OR previousToken is a function) {
        currentToken = SUBTRACT;
    } else {
        currentToken = NEGATION;
    }
}

You can set up negation to have a precedence higher than multiply and divide, but lower than exponentiation. You can also set it up to be right associative (just like '^'). Then you just need to integrate the precedence and associativity into the algorithm as described on Wikipedia's page.

If the token is an operator, o1, then: while there is an operator token, o2, at the top of the stack, and either o1 is left-associative and its precedence is less than or equal to that of o2, or o1 has precedence less than that of o2, pop o2 off the stack, onto the output queue; push o1 onto the stack.

I ended up implementing this corresponding code:

} else if (nextToken instanceof Operator) {
    final Operator o1 = (Operator) nextToken;

    while (!stack.isEmpty() && stack.peek() instanceof Operator) {
        final Operator o2 = (Operator) stack.peek();

        if ((o1.associativity == Associativity.LEFT && o1.precedence <= o2.precedence)
         || (o1.associativity == Associativity.RIGHT && o1.precedence < o2.precedence)) {
            popStackTopToOutput();
        } else {
            break;
        }
    }

    stack.push(nextToken);
}

Austin Taylor is quite right that you only need to pop off one number for a unary operator:

if (token is operator negate) {
    operand = pop;
    push operand * -1;
}

Example project:

https://github.com/Digipom/Calculator-for-Android/

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags