How to have a C++ stack with more than one data type?
Asked Answered
D

4

9

Here's the problem:

I am currently trying to create a simple stack-based programming language (Reverse Polish Notation, FORTH style) as a component of a larger project. I have hit a snag, though.

There is no problem with creating a stack in C++ (by using std::vector<>) that would contain one type of element (I could use the syntax std::vector<double> Stack, for instance).

However, a programming language needs to be able to hold multiple data types, such as ints, doubles, strings, and 3D vectors (as in physics vectors with X, Y, and Z components), just to name some simple things.

So, is there a construct in C++ that I could use as a stack that would be able to store more than one kind of primitive type/object/struct?

Dallas answered 16/2, 2014 at 0:16 Comment(0)
U
5

Sure, one way is to use a tagged union:

enum Type { INTEGER, DOUBLE, /* ... */ };

union Data {
    uint64_t as_integer;
    double as_double;
    // ...
};

struct Value {
    Type type;
    Data data;
};

The storage for as_integer, as_double, etc. will be overlapped, so a Value structure will take up two words of storage, and your stack will have type std::vector<Value>. Then you access members of data according to the value of type:

void sub(std::vector<Value>& stack) {
    // In reality you would probably factor this pattern into a function.
    auto b = stack.back();
    stack.pop_back();
    assert(b.type == INTEGER);

    auto a = stack.back();
    stack.pop_back();
    assert(a.type == INTEGER);

    Value result;
    result.type = INTEGER;
    result.data.as_integer = a.data.as_integer - b.data.as_integer;
    stack.push_back(result);
}

Of course, Forths are usually untyped, meaning that the stack consists of words only (std::vector<uint64_t>) and the interpretation of a data value is up to the word operating on it. In that case, you would pun via a union or reinterpret_cast to the appropriate type in the implementation of each word:

void subDouble(std::vector<Data>& stack) {
    // Note that this has no type safety guarantees anymore.
    double b = stack.back().as_double;
    stack.pop_back();

    double a = stack.back().as_double;
    stack.pop_back();

    Data result;
    result.as_double = a - b;
    stack.push_back(result);
}

void subDouble(std::vector<uint64_t>& stack) {
    double b = reinterpret_cast<double&>(stack.back());
    stack.pop_back();

    double a = reinterpret_cast<double&>(stack.back());
    stack.pop_back();

    double result = a - b;
    stack.push_back(reinterpret_cast<uint64_t&>(result));
}

Alternatively, you can store not values but pointers to instances of a class Value from which other value types such as Integer or Double would derive:

struct Value {};
struct Integer : Value { uint64_t value; };
struct Double : Value { double value; };
// ...

Your stack would have type std::vector<unique_ptr<Value>> or std::vector<Value*>. Then you needn’t worry about different value sizes, at the cost of making wrapper structures and allocating instances of them at runtime.

Unregenerate answered 16/2, 2014 at 0:21 Comment(8)
Good suggestion to go. Would you really need union in this case?Wheedle
@macroland: I would probably go for the tagged union; you do fewer allocations than with the inheritance-based approach, and you can still do runtime type checking for safety. And to operate on structures larger than a word, your union can include pointers such as Vector3* as_vector.Unregenerate
okay. If I understand this properly, there are 3 options: 1. Unions 2. Untyped words 3. A mess of structs. The third one is by far the least efficient, so we can forget about it. The second seems to be asking for runtime errors, so we are left with unions. I can kind of see how unions work, but are there any good tutorials on how to use them?Dallas
I did a little research of my own and I think I might get it better now. After looking, though, I have to ask about sanity-checking my code. EDIT: I think I get it now that I see your changed example.Dallas
So much undefined behavior in the "just-store-int64" version.Gossipry
@BenVoigt: Yep. I mention it for the sake of completeness, because it is the traditional way that Forth works. And you can rule out runtime errors by typechecking the input program, assuming the C++ compiler doesn’t optimise the stack-language compiler in such a way that it crashes. :PUnregenerate
The right way to "cast" those variants would be to use memcpy. No undefined behavior then, no overeager optimization.Gossipry
You could use std::variant (C++17) or boost::variant which like the tagged union but safer (but has, of course, the drawback of requiring boost or C++17 compliant compiler).Papoose
A
0

I would suggest to use the inheritance. Make common base class for the objects you need to store, and make a vector of base types. The store all the inheriting objects in this vector.

Antiphonary answered 16/2, 2014 at 0:21 Comment(2)
How does this work if you are trying to store primitive types? To my knowledge there is no superclass of the primitives.Dallas
You would have to create at least structures for that values. Look at the answer of Jon Purdy, especially its second part.Antiphonary
G
0

Since c++ is an object-orientated language, you might just use inheritance. Here is a quick example taken from http://www.cplusplus.com/forum/general/17754/ and extended:

#include <iostream>
#include <vector>
using namespace std;

// abstract base class
class Animal
{
public:
    // pure virtual method
    virtual void speak() = 0;
    // virtual destructor
    virtual ~Animal() {}
};

// derived class 1
class Dog : public Animal
{
public:
    // polymorphic implementation of speak
    virtual void speak() { cout << "Ruff!"; }
};

// derived class 2
class Cat : public Animal
{
public:
    // polymorphic implementation of speak
    virtual void speak() { cout << "Meow!"; }
};

int main( int argc, char* args[] )

    // container of base class pointers
    vector<Animal*> barn;

    // dynamically allocate an Animal instance and add it to the container
    barn.push_back( new Dog() );
    barn.push_back( new Cat() );

    // invoke the speak method of the first Animal in the container
    barn.front()->speak();

    // invoke all speak methods and free the allocated memory
    for( vector<Animal*>::iterator i = barn.begin(); i != barn.end(); ++i )
    {
        i->speak();
        delete *i;
    }
    // empty the container
    barn.clear();

    return 0;
}
Gawky answered 16/2, 2014 at 0:22 Comment(3)
If float, double, and string were all custom classes, maybe. How does this work with int, double, string, and other classes that I have no control over?Dallas
You can build wrapper classes for primitives and put them inside the vector. I don't know what exactly you want to do with the items of the stack, but this might be helpful for other parts of your program as well.Gawky
It should be added just for those looking through - @StefanNeubert used vector<Animal*> instead of vector<Animal> to prevent something called object slicing. en.wikipedia.org/wiki/Object_slicingAuriculate
P
0

The solution for storing different types is a tagged union

enum Type { INT, STRING, DOUBLE, POINT2D, VECTOR, OBJECT... };

union Data {
    int int_val;
    double double_val;
    struct point2D { int x, int y };
    struct { int v3, int v2, int v1, int v0 }; // you can even use unnamed structs
    // ...
};

struct StackElem {
    Type type;
    Data data;
};

In C++ it's even better to use std::variant (or boost::variant in older C++ standards), which might use a tagged union under the hood

However there's no need to use a single stack for all when using the reverse Polish notation. You can use a value stack and a separate operator stack. For every operator on the operator stack you pop the corresponding number of parameters from the value stack. That'll make things easier and save memory since you can use a small char array for operators (unless you need more than 255 operators), and no memory wasted for saving the type as well as the bigger-than-needed data field in the struct like above. That means you don't need a type OPERATOR in the Type enum

You can use a double type stack for all numeric types because a double can contain all int type's range without loss of precision. That's what implemented in Javascript and Lua. If the operator needs more than 1 parameter then just push/pop all of them just like what a compiler does when evaluating a function. You don't need to worry about int operations anymore, just do everything in double, unless there are specific int operators. But you may need different operators for different types, for example + for double addition, p or something like that for vector addition. However if you need 64-bit int then a separate integer type is needed

For example if you need to add 2 3D vectors, push 3 dimensions of the first vector, then the other. When you pop out a vector operator from the operator stack, pop 3 dimensions of the 2 vectors from value stack. After doing the math on it, push resulting 3 dimensions to stack. No need for a vector type.

If you don't want to store int as double then you can use NaN-boxing (or nunboxing/punboxing) like Firefox's JS engine, in which if the value is int then the upper 16 of 64 bits are 1s, otherwise it's double (or pointer, which you probable wouldn't use). Another way is type tag in 3 lower bits in old FFJS engines. In this case it's a little bit complicated but you can use the same operator for every type. For more information about this read Using the extra 16 bits in 64-bit pointers

You can even use a byte array for storing all data types and read the correct number of bytes specified by the operator. For example if the operator indicated that the next operand must be an int, just read 4 bytes. If it's a string, read the 4 bytes of string length first then the string content from the stack. If it's a 2D point of int read 4 bytes of x and 4 bytes of y. If it's a double read 8 bytes, etc. This is the most space efficient way, but obviously it must be traded by speed

Paramatta answered 17/2, 2014 at 3:51 Comment(8)
However, the problem comes when you already have more than one stack to begin with, and now every stack is actually a stack of stacks. In short, it seems a little more complex than necessary.Dallas
I don't understand what you mean. Why should you have more than one stack (actually in the above answer it's 2) to begin? And why is it a stack of stacks?Paramatta
well, if each stack has only one type of element, then I would need multiple stacks. These stacks have to be organized somehow, so they will end up in an array or other organizer of some sort. Now, if you are simulating more than one "computer" per program, then you now would store those "computers" in a stack, along with their memory. So, you end up with a stack of arrays of stacks, or a stack of stacks of objects that contain stacks (which sounds absurd when you first hear it, but it makes sense when you think about it enough). Unless shunting yard means something else and I don't get it.Dallas
No, you still only need 1 stack for data, see my edit. If you use tagged unions then it's nothing different than the way on my answer, except that it requires more memory. You still have to decide what type is it and then do the operations with it. Using the first way on my edit you don't even need to differentiate the typesParamatta
After some reading, researching, and thinking, I think that I understand what you are proposing. However, I do not see where the savings come in over a 16-byte system like this: float[4], int[4], double[2], char[16], bool[128]. I don't see how you could handle strings in your example, and this makes things standard and nice. Instructions are never on the stack to begin with, so I don't see any savings from putting them there.Dallas
for any other things you can use it to store a pointer to the real data, just like FF use a double or google chrome use a 32-bit int to store all javascript's datatype. The savings are from when you need to store only a double or int and won't have to waste more 8 bytes (plus a few bytes for the type tag)Paramatta
I probably wouldn't use a type tag. So if more than 1/3 of the stack would be pointers, my system is more efficient. You use 8 bytes for the pointer and then possibly another 16 for the data afterwards. so after a point, there are no savings.Dallas
Also, more importantly, unions seem simpler and more easy to understand. pointers seem less elegant when half of your data needs to be stored in one. Also, unions give me freedom to put whatever I want into 16 bytes, but it is also limiting in a way.Dallas

© 2022 - 2024 — McMap. All rights reserved.