Interpret a std::string as a std::vector of char_type?

I have a template<typename T> function that takes a const vector<T>&. In that function, I use the vector's cbegin(), cend(), size(), and operator[]. As far as I understand it, both string and vector use contiguous storage, so I was wondering whether I could reuse the function for both types in an elegant manner.

Can a std::string be reinterpreted as a std::vector of (the appropriate) char_type? If so, what would the limitations be?

Supervise answered 8/10, 2015 at 21:38 Comment(3)
This is what iterators were created for. - Dromous
Have you thought about making the function a template? - Sapir
Relying on the under-the-hood behavior of a class instead of its published interface can be dangerous. What is the need for this? - Vassalize

If you make your template take a const T& and use only the member functions that vector and string share (begin(), end(), size(), operator[], etc.), your code will work with both types.
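A minimal sketch of that idea. The function names `is_palindrome` and `count_value` are hypothetical stand-ins for the OP's function; the point is only that the container type is a template parameter instead of `std::vector<T>`, and the body sticks to members both classes provide.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>   // the two containers this is meant to serve
#include <vector>

// Uses only size() and operator[], which std::vector and std::string share.
template <typename Container>
bool is_palindrome(const Container& c)
{
    const std::size_t n = c.size();
    for (std::size_t i = 0; i < n / 2; ++i) {
        if (c[i] != c[n - 1 - i]) {
            return false;
        }
    }
    return true;
}

// Uses only cbegin()/cend(), which both classes also provide.
template <typename Container, typename Value>
std::size_t count_value(const Container& c, const Value& value)
{
    return static_cast<std::size_t>(std::count(c.cbegin(), c.cend(), value));
}
```

Calling these with a std::string and with a std::vector<char> instantiates a separate copy of each template per type, which is what the comments below are discussing.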

Unconnected answered 8/10, 2015 at 21:41 Comment(4)
Is there any implementation in which the generated code is shared (and not just the source code)? - Padraic
@6502: The generated code will not be shared unless the standard library authors went to some extreme trouble to make it happen. But really, why do you care? When optimized, the iterator and operator[] operations compile to just a few machine instructions each. It isn't a big deal. - Unconnected
@ZanLynx: the whole function will be duplicated, not just the [] access code. It could be a big function... - Padraic
@Padraic a quality linker will eliminate identical functions - Oil

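Go the STL way and use iterators: accept an iterator to the beginning and an iterator to the end. This works with every standard container, and even with non-containers such as streams (through stream iterators), provided the function only needs the operations those iterators support.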
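A rough sketch of the iterator-pair style, assuming a hypothetical `count_in_range` function. Because it only uses `!=`, `++` and `*`, it also accepts single-pass iterators such as `std::istreambuf_iterator`; a function that needs `operator[]` or `size()` would instead have to require random-access iterators, which streams cannot supply.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>  // streams, strings and vectors this is typically called with
#include <string>
#include <vector>

// Counts occurrences of `value` in [first, last) using only !=, ++ and *.
template <typename InputIt, typename T>
std::size_t count_in_range(InputIt first, InputIt last, const T& value)
{
    std::size_t n = 0;
    for (; first != last; ++first) {
        if (*first == value) {
            ++n;
        }
    }
    return n;
}

// Hypothetical wrapper: counts a character read straight from a stream,
// without copying the stream contents into a container first.
inline std::size_t count_in_stream(std::istream& in, char value)
{
    return count_in_range(std::istreambuf_iterator<char>(in),
                          std::istreambuf_iterator<char>(), value);
}
```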

Unlawful answered 8/10, 2015 at 21:44 Comment(0)

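There is no guarantee that the layouts of string and vector are the same. They theoretically could be, but they probably aren't in any common implementation, so you can't do this safely. (Even if the layouts did match, accessing a vector through a string* obtained with reinterpret_cast is undefined behavior.) See Zan's answer for a better solution.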

Let me explain: if I am a standard library implementer and decide to implement std::string like so:

template <typename CharT>
class basic_string {
public:
    // ... public interface ...
private:
    CharT* mData;
    std::size_t mSize;
};

and std::vector like so:

template <typename T>
class vector {
public:
    // ... public interface ...
private:
    T* mEnd;
    T* mBegin;
};

When you reinterpret_cast<string*>(&myVector), you end up interpreting the pointer to the end of your data as the pointer to the start of your data, and the pointer to the start of your data as the size of your data. If the padding between members differs, or there are extra members, it could break in even weirder ways.

So yes, for this to have any chance of working, both need to store contiguous data, but quite a bit else would also have to be identical between the two implementations.
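If a std::string is really needed from a std::vector<char> (or the reverse), the well-defined route is to copy through the public interface instead of casting. A minimal sketch; the helper names are hypothetical:

```cpp
#include <string>
#include <vector>

// Copies the characters via the iterator-range constructor; no cast between
// the two class types is involved, so neither layout matters.
inline std::string to_string_copy(const std::vector<char>& v)
{
    return std::string(v.begin(), v.end());
}

inline std::vector<char> to_vector_copy(const std::string& s)
{
    return std::vector<char>(s.begin(), s.end());
}
```

This costs a copy of the characters, which is the price of not depending on either class's internals.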

Tomahawk answered 8/10, 2015 at 21:42 Comment(6)
Both string and vector use a contiguous array for storing data. This means a non-templated function taking just memory addresses can be used. - Especial
@AndreyNasonov You're wrong. I updated the answer to explain it. Please don't downvote before you understand :( - Tomahawk
I'm not talking about field layout and particular implementation. I'm talking about data representation. Both string and vector provide a data() function pointing to the first element. It is guaranteed that it is a contiguous piece of memory. - Especial
@AndreyNasonov Are you saying that calling certain functions on reinterpret_cast<string*>(&myVector) will work? I can't quite tell what it is you're saying, but that's not right. Yes, they both point to contiguous data... that doesn't change anything I've said... - Tomahawk
What's the problem with making the function signature (const T *begin, const T *end) and calling f(v.data(), v.data() + v.size())? I'm talking ONLY about what the data() method returns. It returns the same for vector and string. - Especial
@AndreyNasonov nothing is wrong with that, but I was answering the question of whether you can reinterpret_cast a vector to a string safely... I think that's clear, no? - Tomahawk

std::experimental::array_view<const char> (see N4512) represents a contiguous buffer of chars.

Writing your own is not hard, and it solves this problem and (in my experience) many more.

Both string and vector are compatible with an array view.

This lets you move your implementation into a .cpp file (and not expose it), gives you the same performance as taking a std::vector<T> const& (and probably the same implementation), avoids duplicating code, and uses lightweight contiguous-buffer type erasure (which is full of tasty keywords).
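A minimal hand-rolled view in that spirit, assuming C++11. `contiguous_view` and `count_char` are illustrative names, not the N4512 interface. For what it's worth, the standard library later gained std::string_view (C++17) and std::span (C++20), which serve the same purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Non-owning (pointer, length) view over contiguous elements.
template <typename T>
class contiguous_view {
public:
    contiguous_view(const T* data, std::size_t size) : data_(data), size_(size) {}
    // Deliberately implicit, so callers can pass a vector or string directly.
    contiguous_view(const std::vector<T>& v) : data_(v.data()), size_(v.size()) {}
    contiguous_view(const std::basic_string<T>& s) : data_(s.data()), size_(s.size()) {}

    const T* begin() const { return data_; }
    const T* end() const { return data_ + size_; }
    std::size_t size() const { return size_; }
    const T& operator[](std::size_t i) const { assert(i < size_); return data_[i]; }

private:
    const T* data_;
    std::size_t size_;
};

// Not a template: its definition could live in a .cpp file, yet it accepts a
// std::string, a std::vector<char>, or an explicit pointer + length.
inline std::size_t count_char(contiguous_view<char> buf, char c)
{
    std::size_t n = 0;
    for (char ch : buf) {
        if (ch == c) {
            ++n;
        }
    }
    return n;
}
```

The view does not own the characters, so it must not outlive the string or vector it was built from; passing it by value into a function call, as above, is the safe common case.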

Oil answered 8/10, 2015 at 22:50 Comment(0)

If the key point is that you want to access a contiguous area of memory where instances of a specific char type are stored, you could define your function as

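// CType stands for the character type you work with (e.g. char)
void myfunc(const CType *p, int size) {
     // ... process p[0] through p[size - 1] ...
}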

to make it clear that you assume they must be adjacent in memory.

Then for example to pass the content of a vector the code is simply

myfunc(myvect.data(), myvect.size());

and for a string

myfunc(mystr.data(), mystr.size());

or

myfunc(buffer, n);

for an array.
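A runnable version of the pointer-plus-size idea, assuming `CType` is `char`; `count_upper` is a hypothetical stand-in for whatever `myfunc` does. It takes the length as `std::size_t` only because that is what `size()` returns (the comments below argue about int vs size_t). Getting the pointer from `data()` rather than `&myvect[0]` keeps the call valid for an empty vector, where indexing element 0 is undefined.

```cpp
#include <cstddef>
#include <string>   // containers this is typically called with
#include <vector>

using CType = char;  // assumption: plain char data

// Counts ASCII upper-case letters in [p, p + size).
inline std::size_t count_upper(const CType* p, std::size_t size)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (p[i] >= 'A' && p[i] <= 'Z') {
            ++n;
        }
    }
    return n;
}
```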

Padraic answered 8/10, 2015 at 21:48 Comment(13)
This approach has an advantage: it does not use templates. But please change int to size_t. - Especial
@SergeyA: Writing a template (in the implementations I know) will share the source code, but will generate distinct code for distinct types. - Padraic
@AndreyNasonov, why would this be an advantage? In my view, it's a drawback. - Unlawful
@6502, you are referring to code bloat? It is overhyped, especially in the world of inlining. - Unlawful
@AndreyNasonov: int is IMO a better type for sizes. size_t is a historical accident dating back to the 16-bit era. Just because the standard library is doomed to it for backward-compatibility reasons doesn't mean I'm going to punish myself by repeating the same mistake in my code. - Padraic
@6502, that's just plain wrong. I mean, for sizes. int is usually a 32-bit type, while size_t is 32 or 64 bits depending on pointer size. Code that uses ints for sizes is error-prone. - Unlawful
@6502, agreed, I hate size_t too because it is unsigned. - Especial
@SergeyA: don't be fooled by the name. unsigned doesn't mean "non-negative" but instead "member of the Z_{2^n} modulo ring". Do you really think it makes sense to say that the size of a vector is a member of a modulo ring? Actually, using unsigned types for sizes is the source of many bugs. - Padraic
@SergeyA, because I do not want to think about mixing signed and unsigned types. I want to use only signed types. - Especial
@Padraic mixing unsigned and signed may be an issue, but since a vector could be larger than what an int can hold, you have now introduced another bug. - Quentinquercetin
@NathanOliver: Like I said, 16-bit machines were the reason we have unsigned size_t values. In my opinion it was the wrong choice even back then (if 15 bits are not enough now, 16 won't be enough either damn soon). Making that choice now would be just inexcusable (quantities are still unsigned only because of backward compatibility). The problem with unsigned is that while ints have problematic behavior around huge numbers/quantities that programs rarely use, unsigned has problematic behavior around 0, which is an incredibly common value. - Padraic
If you want signed, use std::ptrdiff_t. However, int for a size is just silly, at least on a 64-bit platform. One of my (small) company's servers has 512GiB RAM. There are freaking telephones today with more than 2GiB of memory. So unless you're writing for "lesser" embedded systems, or writing small utilities, avoid using int for sizes. - Kaufman
@ArneVogel: the problem is not using a specific type for sizes, the problem is using an unsigned type, because the C++ semantics of unsigned are special and just plain wrong for the size of a container (especially the implicit promotion rules). Using a long long, however, makes perfect sense. - Padraic
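A small illustration of the "problematic behavior around 0" being argued about: `size()` returns an unsigned `std::size_t`, so subtracting past zero wraps around, while converting to `std::ptrdiff_t` first (as suggested above) gives -1. The function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Unsigned arithmetic: for an empty vector this wraps to the largest
// std::size_t value instead of producing -1.
inline std::size_t last_index_unsigned(const std::vector<int>& v)
{
    return v.size() - 1;
}

// Signed arithmetic: converting first yields -1 for an empty vector.
inline std::ptrdiff_t last_index_signed(const std::vector<int>& v)
{
    return static_cast<std::ptrdiff_t>(v.size()) - 1;
}
```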

You can't directly cast a std::vector to a std::string or vice versa. But the iterators that STL containers provide let you iterate over a vector and a string in the same way. And if your function requires random access to the container in question, either would work.

#include <iostream>
#include <string>
#include <vector>

// Works with any iterator pair, e.g. from a vector<char> or a string
template<typename Iterator>
void iterator_function(Iterator begin, Iterator end)
{
  for(Iterator it = begin; it != end; ++it)
  {
    std::cout << *it << std::endl;
  }
}

int main()
{
  std::vector<char> str1 {'a', 'b', 'c'};
  std::string str2 = "abc";

  iterator_function(str1.begin(), str1.end());
  iterator_function(str2.begin(), str2.end());
}

Both of those last two function calls would print the same thing.

Now, if you want a non-template version that handles only characters stored in a string or a vector, you can write a function that iterates over the underlying array.

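#include <cstddef>
#include <iostream>

// Takes a pointer to the first char and the number of chars;
// std::size_t matches what size() returns
void array_function(const char * array, std::size_t length)
{
  for(std::size_t i = 0; i < length; ++i)
  {
    std::cout << array[i] << std::endl;
  }
}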

Both functions would do the same thing in the following scenarios.

int main()
{
  std::vector<char> str1 {'a', 'b', 'c'};
  std::string str2 = "abc";

  // All four calls print a, b and c, one per line
  iterator_function(str1.begin(), str1.end());
  iterator_function(str2.begin(), str2.end());
  array_function(str1.data(), str1.size());
  array_function(str2.data(), str2.size());
}

There are always multiple ways to solve a problem. Depending on what you have available any number of solutions might work. Try both and see which works better for your application. If you don't know the iterator type then the char typed array iteration is useful. If you know you will always have the template type to pass in then the template iterator method might be more useful.

Raffle answered 8/10, 2015 at 21:46 Comment(0)

The way your question is put at the moment is a bit confusing. If you mean to be asking "is it safe to cast a std::vector type to a std::string type or vice versa if the vector happens to contain char values of the appropriate type?", the answer is: no way, don't even think about it! If you're asking: "can I access the contiguous memory of non-empty sequences of char type if they're of the type std::vector or std::string?" then the answer is, yes you can (with the data() member function).

Compressibility answered 9/10, 2015 at 7:57 Comment(0)
