Understanding the efficiency of an std::string
P

4

7

I'm trying to learn a little bit more about C++ strings.

Consider

const char* cstring = "hello";
std::string string(cstring);

and

std::string string("hello");

Am I correct in assuming that both store "hello" in the .data section of an application and the bytes are then copied to another area on the heap where the pointer managed by the std::string can access them?

How could I efficiently store a really, really long string? I'm thinking of an application that reads data from a socket stream. I fear concatenating many times. I could imagine storing the pieces in a linked list and traversing that list.

Strings have intimidated me for far too long!

Any links, tips, explanations, further details, would be extremely helpful.

Periapt answered 23/4, 2011 at 21:54 Comment(8)
How long is really really long? 10 MB? 20 perhaps? Or did you mean something like 10 GB? Computers have gigabytes of RAM these days. - Calliecalligraphy
I was thinking more like 10-100 MB. This was intended to be an open-ended question though, so feel free to explain either =). I'm pretty sure due to OS memory management you can't allocate 10 GB of contiguous memory. I assume a list would be required for such a size. - Periapt
@kisplit - On my 64-bit machine I can allocate memory until the swap file fills the hard disk. Tens of GBs is no problem! - Slim
@kisplit - Also, your assumption that "Hello" will be stored on the heap is incorrect for many implementations. Small strings are stored inside the string object. See here https://mcmap.net/q/386796/-why-is-sizeof-string-32. - Slim
@Bo Persson - Thanks for that link. I didn't know std::string had a buffer for smaller strings. I'm still fairly certain that the literal will be stored in the .data section. I know I can allocate many chunks of memory until I run out of swap; I meant a large contiguous amount. - Periapt
@kisplit: That assumption is most likely correct. - Hectic
@kisplit - My point about virtual memory was that allocating large blocks of memory is a problem only on a 32-bit system, not in a 64-bit address space where you have terabytes of contiguous virtual memory. - Slim
@Bo Persson - I just tested this on a 64-bit Linux VM with 20 GB of swap. You are indeed correct. Thank you for helping me erase this fallacy :) - Periapt
H
3

I have stored strings in the tens or hundreds of MB range without issue. Naturally, the size is primarily limited by your available (contiguous) memory / address space.

If you are going to be appending / concatenating, there are a few things that may help efficiency-wise: If possible, use the reserve() member function to pre-allocate space. Even if you only have a rough idea of how big the final size might be, reserving it up front saves unnecessary reallocations as the string grows.
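As a minimal sketch of the reserve() idea, assuming the pieces are already in memory so the total size is known: join_chunks is a hypothetical helper name, not something from the standard library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins chunks (e.g. buffers read from a socket)
// into one std::string, reserving the total size before appending.
std::string join_chunks(const std::vector<std::string>& chunks) {
    std::size_t total = 0;
    for (const std::string& c : chunks) total += c.size();

    std::string result;
    result.reserve(total);  // request the full capacity once, up front
    for (const std::string& c : chunks) result.append(c);
    return result;
}
```

If the final size is not known in advance, reserving a rough estimate should still cut down on reallocations; the string simply grows further if the estimate turns out too small.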

Additionally, many string implementations use "exponential growth", meaning that they grow capacity by some percentage rather than by a fixed number of bytes. For example, an implementation might simply double the capacity any time additional space is needed. Because capacity increases exponentially, the number of reallocations stays small even for lots of concatenations. (The exact details depend on your standard library implementation.)
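One way to see the growth policy on your own implementation is to count how often capacity() changes while appending. This is only a sketch; count_reallocations is a made-up name, and the exact count it returns will differ between standard libraries.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends n characters one at a time and counts
// how often capacity() changes, i.e. how often the string reallocated.
std::size_t count_reallocations(std::size_t n) {
    std::string s;
    std::size_t reallocations = 0;
    std::size_t last_capacity = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');
        if (s.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = s.capacity();
        }
    }
    return reallocations;
}
```

With a geometric growth policy, a million single-character appends should trigger only a few dozen reallocations rather than anything close to a million.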

Finally, another option (if your library supports it) is to use the rope<> template, an SGI STL extension that is not part of the C++ standard. Ropes are similar to strings, except that they are much more efficient when performing operations on very large strings. In particular, "ropes are allocated in small chunks, significantly reducing memory fragmentation problems introduced by large blocks". Additional details are in SGI's STL guide.

Hiram answered 23/4, 2011 at 22:6 Comment(0)
G
2

Since you're reading the string from a socket, you can reuse the same packet buffers and chain them together to represent the huge string. This will avoid any needless copying and is probably the most efficient solution possible. I seem to remember that the ACE library provides such a mechanism. I'll try to find it.

EDIT: ACE has ACE_Message_Block that allows you to store large messages in a linked-list fashion. You almost need to read the C++ Network Programming books to make sense of this colossal library. The free tutorials on the ACE website really suck.
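To illustrate the chaining idea without ACE, here is a rough sketch using only the standard library. ChunkedText is a hypothetical class invented for this example; it is not how ACE_Message_Block is implemented, and a real network buffer chain would need more care (ownership, partial reads, and so on).

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class: keeps received buffers as a chain instead of
// copying them into one contiguous string.
class ChunkedText {
public:
    // Takes ownership of a buffer; moving it in avoids copying its bytes.
    void append(std::vector<char> buffer) {
        size_ += buffer.size();
        chunks_.push_back(std::move(buffer));
    }

    std::size_t size() const { return size_; }
    std::size_t chunk_count() const { return chunks_.size(); }

    // Flattens into one contiguous string only when that is really needed.
    std::string to_string() const {
        std::string out;
        out.reserve(size_);
        for (const std::vector<char>& c : chunks_) {
            out.append(c.begin(), c.end());
        }
        return out;
    }

private:
    std::deque<std::vector<char>> chunks_;
    std::size_t size_ = 0;
};
```

Each packet buffer is moved into the chain as it arrives, so no bytes are copied until (and unless) to_string() is called.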

I bet Boost.Asio must be capable of doing the same thing as ACE's message blocks. Boost.Asio now seems to have a larger mindshare than ACE, so I suggest looking for a solution within Boost.Asio first. If anyone can enlighten us about a Boost.Asio solution, that would be great!


It's about time I try writing a simple client-server app using Boost.Asio to see what all the fuss is about.

Grison answered 24/4, 2011 at 0:49 Comment(0)
I
1

I don't think efficiency should be the issue. Both will perform well enough.

The deciding factor here is encapsulation. std::string is a far better abstraction than char * could ever be. Encapsulating pointer arithmetic is a good thing.

A lot of people thought long and hard to come up with std::string. I think failing to use it for unfounded efficiency reasons is foolish. Stick to the better abstraction and encapsulation.

Infantilism answered 23/4, 2011 at 22:7 Comment(4)
I don't think it's unreasonable to worry about efficiency with the kind of data size we're talking about here; repeated strcat leads to a guaranteed O(n^2) solution, which might be show-stoppingly slow. There's a good chance string.append can be O(n); the standard might even guarantee it. - Loreleilorelie
The fact that a lot of people thought long and hard about std::string unfortunately does not speak to its advantage... It is one of the most criticized parts of the STL, and for good reason, I think. Other languages do better for strings. - Hectic
@kotlinski: I'd be curious to read some criticisms of std::string. Have any links? - Hiram
@Eric: You can search the web... but basically it comes down to interface and performance. 1) A lot of basic stuff is just very complicated to do. For example, trimming a string, or making it upper/lowercase... why? If you have used strings in some other language, you know what I mean. 2) Performance is just not very good in many cases - std::string is outperformed by C#/Java strings, and stringstreams are a LOT slower than sprintf (if you want to concat a float or int). 3) Too many member functions - the std::string class is really big for no good reason. Many of these could be free functions instead. - Hectic
T
0

As you probably know, an std::string is really just another name for basic_string<char>.

That said, it is a sequence container and its memory is allocated contiguously. It's possible to get an exception from a std::string if you try to make one bigger than the available contiguous memory that you can allocate. This threshold is typically considerably less than the total available memory due to memory fragmentation.
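A small sketch of handling those exceptions, assuming you would rather get a bool back than let the exception propagate. try_reserve is a hypothetical helper. Note that on systems that overcommit memory, a very large request may appear to succeed and only fail later, so std::bad_alloc is not guaranteed to show up here.

```cpp
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical helper: tries to reserve n characters and reports
// whether the request succeeded instead of letting the exception escape.
bool try_reserve(std::string& s, std::string::size_type n) {
    try {
        s.reserve(n);
        return true;
    } catch (const std::length_error&) {
        return false;  // n exceeds s.max_size()
    } catch (const std::bad_alloc&) {
        return false;  // the allocator could not provide the memory
    }
}
```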

I've seen problems allocating contiguous memory when trying to allocate, for instance, large contiguous 3D buffers for images. But in my experience these issues don't start happening until sizes on the order of 100 MB or so, at least on Windows XP Pro (for instance).

Are your strings this big?

Tobit answered 23/4, 2011 at 22:17 Comment(0)
