Move the string out of a std::ostringstream
P

6

35

If I construct a string made of a list of space separated floating point values using std::ostringstream:

std::ostringstream ss;
unsigned int s = floatData.size();
for(unsigned int i=0;i<s;i++)
{
    ss << floatData[i] << " ";
}

Then I get the result in a std::string:

std::string textValues(ss.str());

However, this will cause an unnecessary deep copy of the string contents, as ss will not be used anymore.

Is there any way to construct the string without copying the entire content?

Plop answered 8/10, 2014 at 21:11 Comment(20)
Are you sure that is copying? It's a perfectly reasonable case for applying RVO, I think. Inspect your assembly to see what your compiler is doing.Bearskin
@Bearskin RVO applies to a return value. There is no return here.Dochandorrach
Standard says about str(): "returns a string object with a copy of the current contents of the stream." So yes it copiesPlop
@Plop you don't construct an istringstream anywhere here.Dochandorrach
@walter : sorry, I was doing too many things at the same time. CorrectedPlop
As QoI, an implementation could do something nice with move(ss).str(), but I don't know if any does right now.Moxie
@MarcGlisse it can't, because it doesn't know from inside str() if there will be more writes or not.Flabbergast
I don't really know if this is exactly what you want, but you could use ss.rdbuf() which is supposed not to create the intermediate string.Mcguire
@Dochandorrach I mean the possible RVO from .str(). About the Standard quote, it's a "copy" in an abstract sense, since a string is a different medium than a stream. But the implementation could do whatever it likes. Being practical, who cares how that implementation works, whether the data of the stream is buffered and can easily be moved into the string instead of copied, etc...Bearskin
@MarcGlisse : can you write a member function prototype knowing that "*this" is a rvalue?Plop
@Plop Yes, you can, though I think most compilers don't support that well.Dochandorrach
@Plop yes akrzemi1.wordpress.com/2014/06/02/ref-qualifiersBearskin
Nice to know, I currently explicitly use "move_" prefixed non-const functions to "pop" derived values for doing thisPlop
I am pretty sure RVO is applied on the str function. Why don't you step in your debugger to find out?Gremial
You might take a look at #1494682. Pity that there isn't a constructor taking a string & that would just use this as the underlying buffer.Dodecanese
May be useful - set your ostringstream to write to an external buffer that you have full control overSqualor
@MattMcNabb : I'd like to, but ostringstream does not allow this. It will not write to an external buffer.Plop
std::ostrstream will write to your buffer...Ascertain
@Ascertain : that's deprecated!!! Moreover, it will write to a char *, not to a string. This means it will not expand it when needed leading to buffer overrun. This might be the reason why it is deprecated.Plop
@Plop it will expand automatically unless you request a fixed-size output. Why do you think they can't remove it from the standard even though it was "deprecated" already in 1998? Of course it's not too hard to write your own streambuf with similar properties.Ascertain
A
14

std::ostringstream offers no public interface to access its in-memory buffer unless it non-portably supports pubsetbuf (but even then your buffer is fixed-size, see cppreference example)

If you want to torture some string streams, you could access the buffer using the protected interface:

#include <iostream>
#include <sstream>
#include <vector>

struct my_stringbuf : std::stringbuf {
    const char* my_str() const { return pbase(); } // pptr might be useful too
};

int main()
{
    std::vector<float> v = {1.1, -3.4, 1/7.0};
    my_stringbuf buf;
    std::ostream ss(&buf);
    for(unsigned int i=0; i < v.size(); ++i)
        ss << v[i] << ' ';
    ss << std::ends;
    std::cout << buf.my_str() << '\n';
}
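
As a variation on the example above, the length of the written range can be read from the same protected interface, which avoids the need to append std::ends. This is only a sketch: the type name peek_stringbuf and its members are made up for illustration, and it assumes the stream was written sequentially with no seeking.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper type: exposes the written range of a std::stringbuf
// without copying it.
struct peek_stringbuf : std::stringbuf {
    // Start of the put area (may be null if nothing has been written yet).
    const char* data() const { return pbase(); }

    // Number of characters written so far, assuming purely sequential
    // writes (no seekp), so that pptr() marks the end of the output.
    std::size_t size() const
    {
        return static_cast<std::size_t>(pptr() - pbase());
    }
};
```

Used with std::ostream os(&buf); and os << 1.5f << ' ';, the pair buf.data() and buf.size() then describes the characters without a terminating null. The pointer is only valid until the next write to the stream.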

The standard C++ way of directly accessing an auto-resizing output stream buffer is offered by std::ostrstream, which has been deprecated since C++98 but remained in the standard through C++23 (it was removed in C++26).

#include <iostream>
#include <strstream>
#include <vector>

int main()
{
    std::vector<float> v = {1.1, -3.4, 1/7.0};
    std::ostrstream ss;
    for(unsigned int i=0; i < v.size(); ++i)
        ss << v[i] << ' ';
    ss << std::ends;
    const char* buffer = ss.str(); // direct access!
    std::cout << buffer << '\n';
    ss.freeze(false); // abomination
}

However, I think the cleanest (and the fastest) solution is Boost.Karma:

#include <iostream>
#include <string>
#include <vector>
#include <boost/spirit/include/karma.hpp>
namespace karma = boost::spirit::karma;
int main()
{
    std::vector<float> v = {1.1, -3.4, 1/7.0};
    std::string s;
    karma::generate(back_inserter(s), karma::double_ % ' ', v);
    std::cout << s << '\n'; // here's your string
}
Ascertain answered 10/10, 2014 at 3:42 Comment(3)
+1 for the Karma approach of course. However, when Boost is in the picture, why not simply use Boost Iostreams and have ostream write to a container or array transparently :)Shameful
@Shameful thanks, and yes, boost::iostreams::array_sink is certainly worth mentioning (after all, cppreference's page on std::ostrstream mentions it)Ascertain
A much simpler approach is now possible in C++20, as detailed in a sibling answer.Doble
D
20

This is now possible with C++20, with syntax like:

const std::string s = std::move(ss).str();
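
Applied to the loop from the question, this might look like the sketch below. The function name join_floats is made up for illustration. Note that the expression also compiles before C++20, where it quietly selects the copying overload, so the benefit depends on building as C++20 with a standard library that implements the new overload.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: writes each value followed by a single space.
std::string join_floats(const std::vector<float>& values)
{
    std::ostringstream ss;
    for (float v : values)
        ss << v << ' ';

    // In C++20 this selects the rvalue-qualified str() and moves the
    // buffer out of the stream. Before C++20 it still compiles, but it
    // falls back to the const overload and copies.
    return std::move(ss).str();
}
```

After the call, the stream should be treated as emptied; it is a local here, so nothing can observe it afterwards.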

This is possible because the std::ostringstream class now has a str() overload that is rvalue-ref qualified:

basic_string<charT, traits, Allocator> str() &&;  // since C++20

This was added in P0408, revision 7, which was adopted into C++20.
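
How ref-qualified overloads choose between copying and moving can be illustrated with a small stand-alone class. This is a sketch of the language mechanism only: text_builder is a hypothetical type and not how any standard library implements its string streams.

```cpp
#include <string>
#include <utility>

// Hypothetical class, for illustration only.
class text_builder {
    std::string buffer_;

public:
    void append(const std::string& piece) { buffer_ += piece; }

    // Selected for lvalues: the builder stays usable, so the text is copied.
    std::string str() const & { return buffer_; }

    // Selected for rvalues, e.g. std::move(builder).str(): the builder is
    // expiring, so its text can be moved out instead of copied.
    std::string str() && { return std::move(buffer_); }
};
```

Calling builder.str() leaves the builder intact, while std::move(builder).str() hands over the internal buffer and leaves the builder in a valid but unspecified state.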

This is the exact approach suggested by @MarcGlisse in a prescient comment from October 2014.

Doble answered 16/3, 2021 at 19:52 Comment(0)
S
5

+1 for the Boost Karma by @Cubbi and the suggestion to "create your own streambuf-derived type that does not make a copy, and give that to the constructor of a basic_istream<>.".

A more generic answer, though, is missing, and sits between these two. It uses Boost Iostreams:

using string_buf = bio::stream_buffer<bio::back_insert_device<std::string> >;

Here's a demo program:

Live On Coliru

#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/stream_buffer.hpp>

namespace bio = boost::iostreams;

using string_buf = bio::stream_buffer<bio::back_insert_device<std::string> >;

// any code that uses ostream
void foo(std::ostream& os) {
    os << "Hello world " 
       << std::hex << std::showbase << 42
       << " " << std::boolalpha << (1==1) << "\n";
}

#include <iostream>

int main() {
    std::string output;
    output.reserve(100); // optional optimization if you know roughly how large the output is going to be, or what minimal size it will require

    {
        string_buf buf(output);
        std::ostream os(&buf);
        foo(os);
    }

    std::cout << "Output contains: " << output;
}

Note that you can trivially replace the std::string with std::wstring, or std::vector<char> etc.

Even better, you can use it with the array_sink device and have a fixed-size buffer. That way you can avoid any buffer allocation whatsoever with your Iostreams code!

Live On Coliru

#include <boost/iostreams/device/array.hpp>

using array_buf = bio::stream_buffer<bio::basic_array<char>>;

// ...

int main() {
    char output[100] = {0};

    {
        array_buf buf(output);
        std::ostream os(&buf);
        foo(os);
    }

    std::cout << "Output contains: " << output;
}

Both programs print:

Output contains: Hello world 0x2a true
Shameful answered 8/5, 2017 at 20:9 Comment(5)
Added a fixed-array buffer example that works with anything that accepts std::istream or std::ostreamShameful
Can the string output be cleared at will? Or will this break the stream?Divisionism
@BoundaryImposition Interesting question. If back_insert_device does do what the name suggests, that should be fine. I don't think I'd want to rely on that, since instantiating a new stream_buffer should not be expensive.Shameful
Is the reserve(100) important, or just speed optimization when output size can be determined?Leatherneck
@Leatherneck it's only allocation optimization, as the documentation of reserve will confirmShameful
S
4

I implemented an "outstringstream" class, which I believe does exactly what you need (see the take_str() method). I partially used code from: What is wrong with my implementation of overflow()?

#include <cstddef>  // std::size_t, std::ptrdiff_t
#include <cstring>  // memcpy
#include <ostream>
#include <string>

template <typename char_type>
class basic_outstringstream : private std::basic_streambuf<char_type, std::char_traits<char_type>>,
                              public std::basic_ostream<char_type, std::char_traits<char_type>>
{
    using traits_type = std::char_traits<char_type>;
    using base_buf_type = std::basic_streambuf<char_type, traits_type>;
    using base_stream_type = std::basic_ostream<char_type, traits_type>;
    using int_type = typename base_buf_type::int_type;

    std::basic_string<char_type> m_str;

    int_type overflow(int_type ch) override
    {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);

        if (m_str.empty())
            m_str.resize(1);
        else
            m_str.resize(m_str.size() * 2);

        const std::ptrdiff_t diff = this->pptr() - this->pbase();
        this->setp(&m_str.front(), &m_str.back());

        this->pbump(diff);
        *this->pptr() = traits_type::to_char_type(ch);
        this->pbump(1);

        return traits_type::not_eof(traits_type::to_int_type(*this->pptr()));
    }

    void init()
    {
        this->setp(&m_str.front(), &m_str.back());

        const std::size_t size = m_str.size();
        if (size)
        {
            memcpy(this->pptr(), &m_str.front(), size);
            this->pbump(size);
        }
    }

public:

    explicit basic_outstringstream(std::size_t reserveSize = 8)
        : base_stream_type(this)
    {
        m_str.reserve(reserveSize);
        init();
    }

    explicit basic_outstringstream(std::basic_string<char_type>&& str)
        : base_stream_type(this), m_str(std::move(str))
    {
        init();
    }

    explicit basic_outstringstream(const std::basic_string<char_type>& str)
        : base_stream_type(this), m_str(str)
    {
        init();
    }

    const std::basic_string<char_type>& str() const
    {
        return m_str;
    }

    std::basic_string<char_type>&& take_str()
    {
        return std::move(m_str);
    }

    void clear()
    {
        m_str.clear();
        init();
    }
};

using outstringstream = basic_outstringstream<char>;
using woutstringstream = basic_outstringstream<wchar_t>;
Synonymy answered 10/8, 2016 at 0:30 Comment(9)
This is a good start, but shouldn't return a reference from str() and probably needs xsputn() and/or sync() overrides. I'm still working on it.Divisionism
Ok - no need for xsputn() or sync(), but your use of &m_str.front() and &m_str.back() in init() is broken; this has UB when the string is empty. With GCC 4.8.5, &m_str.front() is one after &m_str.back() in this case!! Then streamsize in xsputn() is -1 (rather than 0) and all hell breaks loose. &m_str[0] and &m_str[m_str.size()] should work (even when the latter is one-past-the-end; an impl kinda has to work that way in C++11).Divisionism
Frankly a vector<char> would be much safer all around (especially when you risk COW being in play, cough GCC), but it's not as useful to whoever's calling take_str().Divisionism
I reckon a call to setp in str() (between string copy and return) should finish the job. Here's my current implementation, in case you're interested and/or want to incorporate my changes: pastebin.com/jLZ3TF3bDivisionism
Eesh, remove the silly (and broken) xsputn() I left in there by mistake ;)Divisionism
@LightnessRacesinOrbit returning a reference from str() seems to me a good idea, and I would rather move to a C++11 version that guarantees non-COW stringsTomkin
@LightnessRacesinOrbit the code is already C++11 dependent, so I don't think your version of str() taking care of COW in pastebin is really needed, unless you are using non compliant C++11 compiler. I agree that the use of front() and back() looks a bit fishy, but for that I would rather use basic_string::data() and pointer arithmeticTomkin
@Tomkin libstdc++ even in C++11 mode had COW strings for several years (and, yes, this was non-compliant). It's fine since GCC 5, though. In reality I'd be tempted to use a compile-time check to build out that added hack for compliant toolchains.Divisionism
@LightnessRacesinOrbit Ok, returning a reference to the string was also not good because the internal string doesn't have the correct buffer content size. This was also not correct in your code in pastebin. Since Kuba is unresponsive, I posted a new answer with this fix and other fixes and improvements. We can continue the discussion there if you have other contributions.Tomkin
E
1

Update: In the face of people's continued dislike of this answer, I thought I'd make an edit and explain.

  1. No, there is no way to avoid a string copy (stringbuf has the same interface)

  2. It will never matter. It's actually more efficient that way. (I will try to explain this)

Imagine writing a version of stringbuf that keeps a perfect, moveable std::string available at all times. (I have actually tried this).

Adding characters is easy - we simply use push_back on the underlying string.

OK, but what about removing characters (reading from the buffer)? We'll have to move some pointer to account for the characters we've removed, all well and good.

However, we have a problem - the contract we're keeping that says we'll always have a std::string available.

So whenever we remove characters from the stream, we'll need to erase them from the underlying string. That means shuffling all the remaining characters down (memmove/memcpy). Because this contract must be kept every time the flow of control leaves our private implementation, this in practice means having to erase characters from the string every time sbumpc or sgetn is called on the string buffer. This translates to a call to erase on every >> operation on the stream.

Then of course there's the problem of implementing the pushback buffer. If you pushback characters into the underlying string, you've got to insert them at position 0 - shuffling the entire buffer up.

The long and short of it is that you can write an ostream-only stream buffer purely for building a std::string. You'll still need to deal with all the reallocations as the underlying buffer grows, so in the end you get to save exactly one string copy. So perhaps we go from 4 string copies (and calls to malloc/free) to 3, or 3 to 2.

You'll also need to deal with the problem that the streambuf interface is not split into istreambuf and ostreambuf. This means you still have to offer the input interface and either throw exceptions or assert if someone uses it. This amounts to lying to users - we've failed to implement an expected interface.

For this tiny improvement in performance, we must pay the cost of:

  1. developing a (quite complex, when you factor in locale management) software component.

  2. suffering the loss of flexibility of having a streambuf which only supports output operations.

  3. laying landmines for future developers to step on.
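
For reference, the output-only stream buffer described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a drop-in replacement for std::stringbuf: the names string_sink_buf and take() are made up, it does not support seeking, and the input side is left at the base-class defaults, so any attempt to read simply reports end-of-file.

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical output-only buffer that appends straight into a std::string.
class string_sink_buf : public std::streambuf {
    std::string text_;

protected:
    // No put area is ever set, so every single-character write lands here.
    int_type overflow(int_type ch) override
    {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            text_.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }

    // Bulk writes append the whole block at once.
    std::streamsize xsputn(const char* s, std::streamsize n) override
    {
        text_.append(s, static_cast<std::size_t>(n));
        return n;
    }

public:
    // Moves the accumulated text out and leaves the buffer empty and reusable.
    std::string take()
    {
        std::string out = std::move(text_);
        text_.clear();
        return out;
    }
};
```

A std::ostream constructed with a pointer to such a buffer can be written to as usual, and take() then returns the text without a final copy. The string's own growth reallocations still happen, which is the cost the paragraphs above refer to.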

Earthnut answered 8/10, 2014 at 21:41 Comment(18)
"String copies on a modern cpu are extremely cheap" Are they? What if my program needs to parse a few gigabytes of text data? (Sometimes it does)Gremial
The parsing will take longer than the copying, by a huge factor.Earthnut
Consider the following test code: pastebin.com/YYtT6VwH In release they are the same speed actually, but in debug mode (which I need to use too) f1 is nearly twice as fast.Gremial
@Neil : yes, I'm working on somewhat large lists (several tens of megabytes), file I/O takes about 30s and profiling shows I'm constructing strings all the time.Plop
@Plop If speed is very important, unfortunately, you have to use C parsing. It's not a popular fact, but it is faster.Gremial
Same for parsing floats out of a large text file... It seems that c++ does not allow doing this without having two copies of the text data in memory at the same time at one point of the codePlop
what about double d; stream >> d; ?Earthnut
@Richard : Sorry I meant parsing floats out of a large string (not file). You cannot make a stream for parsing the string without duplicating the whole string in memoryPlop
@Plop You can create your own streambuf-derived type that does not make a copy, and give that to the constructor of a basic_istream<>. This gives you utility with non-copying efficiency (if that is truly important). Again though, I would reiterate that parsing ASCII characters to doubles is a lot more expensive than merely copying the string containing the ASCII characters. If you want efficiency, you might want to avoid the need for string-double conversions until the point of absolute necessity, i.e. the point at which the strings enter/leave the library/program.Earthnut
@NeilKirk If your program really is copying gigabytes of string data then there are a number of techniques for iterating over the data without reading it all into memory. Memory-mapped files, converting directly from the input stream, batch processing, not converting (store in binary format) etc. Efficiency is almost always a problem of choosing the correct algorithm, not optimising an existing algorithm.Earthnut
I know it was just an example. The point is minimizing string copies is a good idea. It might not always matter, but sometimes it does. Also I can't be bothered making my own streams and allocators for something that should be provided by the language automatically. If speed is critical, it's back to C parsing for me, unfortunately.Gremial
fscanf is still faster than stream >> d; for huge data.Gremial
That's probably true, but what you gain in speed you pay in safety. let the buyer beware :-)Earthnut
@NeilKirk C library's parsers/formatters are very slow too (relative to special-purpose libraries that don't have to honor locales).Ascertain
@Ascertain Yep we have our own parser for hex-only data.Gremial
@RichardHodges I like the remarks about fallacies lurking there. However, output-only streams into a pre-allocated buffer are obviously very useful. How do you like the ~4-line approach¹ for that in my answer (¹ using Boost)?Shameful
@Shameful exactly what I would do. boost iostreams is awesome (but the documentation sucks!)Earthnut
@RichardHodges Not sure about "awesome" (I think it has the most crippling design warts of all boost libraries) but these building blocks are pretty functional indeed!Shameful
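
The non-copying input buffer suggested in the comments above (a streambuf-derived type handed to a basic_istream) could look roughly like this. It is a sketch with made-up names (borrowed_buf, parse_floats) and it assumes the source string outlives the stream and is not modified while the stream is in use.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical read-only buffer over characters owned by someone else.
class borrowed_buf : public std::streambuf {
public:
    borrowed_buf(const char* data, std::size_t size)
    {
        // setg() takes non-const pointers, hence the cast. Nothing in the
        // base class writes through the get area, so the data is not changed.
        char* begin = const_cast<char*>(data);
        setg(begin, begin, begin + size);
    }
};

// Parses whitespace-separated floats directly out of `text`, without first
// copying it into a std::istringstream.
std::vector<float> parse_floats(const std::string& text)
{
    borrowed_buf buf(text.data(), text.size());
    std::istream in(&buf);

    std::vector<float> values;
    for (float v; in >> v; )
        values.push_back(v);
    return values;
}
```

The conversion itself still goes through the locale-aware operator>>, so this removes only the extra copy of the text, not the parsing cost discussed above.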
T
0

I adapted the very good @Kuba answer to fix some issues (unfortunately he's currently unresponsive). In particular:

  • added a safe_pbump to handle 64 bit offsets;
  • return a string_view instead of string (internal string doesn't have the right size of the buffer);
  • resize the string to current buffer size on the move semantics take_str method;
  • fixed take_str method move semantics with init before return;
  • removed a useless memcpy on init method;
  • renamed the template parameter char_type to CharT to avoid ambiguity with basic_streambuf::char_type;
  • used string::data() and pointer arithmetic instead of possible undefined behavior using string::front() and string::back(), as pointed out by @LightnessRacesinOrbit;
  • Implementation with streambuf composition.
#pragma once

#include <cstdlib>
#include <limits>
#include <ostream>
#include <string>
#if __cplusplus >= 201703L
#include <string_view>
#endif

namespace usr
{
    template <typename CharT>
    class basic_outstringstream : public std::basic_ostream<CharT, std::char_traits<CharT>>
    {
        using traits_type = std::char_traits<CharT>;
        using base_stream_type = std::basic_ostream<CharT, traits_type>;

        class buffer : public std::basic_streambuf<CharT, std::char_traits<CharT>>
        {
            using base_buf_type = std::basic_streambuf<CharT, traits_type>;
            using int_type = typename base_buf_type::int_type;

        private:
            void safe_pbump(std::streamsize off)
            {
                // pbump doesn't support 64 bit offsets
                // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47921
                int maxbump;
                if (off > 0)
                    maxbump = std::numeric_limits<int>::max();
                else if (off < 0)
                    maxbump = std::numeric_limits<int>::min();
                else // == 0
                    return;

                while (std::abs(off) > std::numeric_limits<int>::max())
                {
                    this->pbump(maxbump);
                    off -= maxbump;
                }

                this->pbump((int)off);
            }

            void init()
            {
                this->setp(const_cast<CharT *>(m_str.data()),
                    const_cast<CharT *>(m_str.data()) + m_str.size());
                this->safe_pbump((std::streamsize)m_str.size());
            }

        protected:
            int_type overflow(int_type ch) override
            {
                if (traits_type::eq_int_type(ch, traits_type::eof()))
                    return traits_type::not_eof(ch);

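                // Record how much has been written before resizing, because
                // resize() may reallocate and invalidate pptr()/pbase()
                const size_t size = this->size();

                if (m_str.empty())
                    m_str.resize(1);
                else
                    m_str.resize(m_str.size() * 2);
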
                this->setp(const_cast<CharT *>(m_str.data()),
                    const_cast<CharT *>(m_str.data()) + m_str.size());
                this->safe_pbump((std::streamsize)size);
                *this->pptr() = traits_type::to_char_type(ch);
                this->pbump(1);

                return ch;
            }

        public:
            buffer(std::size_t reserveSize)
            {
                m_str.reserve(reserveSize);
                init();
            }

            buffer(std::basic_string<CharT>&& str)
                : m_str(std::move(str))
            {
                init();
            }

            buffer(const std::basic_string<CharT>& str)
                : m_str(str)
            {
                init();
            }

        public:
            size_t size() const
            {
                return (size_t)(this->pptr() - this->pbase());
            }

#if __cplusplus >= 201703L
            std::basic_string_view<CharT> str() const
            {
                return std::basic_string_view<CharT>(m_str.data(), size());
            }
#endif
            std::basic_string<CharT> take_str()
            {
                // Resize the string to actual used buffer size
                m_str.resize(size());
                // Use the templated string type so that wchar_t also compiles
                std::basic_string<CharT> ret = std::move(m_str);
                init();
                return ret;
            }

            void clear()
            {
                m_str.clear();
                init();
            }

            const CharT * data() const
            {
                return m_str.data();
            }

        private:
            std::basic_string<CharT> m_str;
        };

    public:
        explicit basic_outstringstream(std::size_t reserveSize = 8)
            : base_stream_type(nullptr), m_buffer(reserveSize)
        {
            this->rdbuf(&m_buffer);
        }

        explicit basic_outstringstream(std::basic_string<CharT>&& str)
            : base_stream_type(nullptr), m_buffer(std::move(str))
        {
            this->rdbuf(&m_buffer);
        }

        explicit basic_outstringstream(const std::basic_string<CharT>& str)
            : base_stream_type(nullptr), m_buffer(str)
        {
            this->rdbuf(&m_buffer);
        }

#if __cplusplus >= 201703L
        std::basic_string_view<CharT> str() const
        {
            return m_buffer.str();
        }
#endif
        std::basic_string<CharT> take_str()
        {
            return m_buffer.take_str();
        }

        const CharT * data() const
        {
            return m_buffer.data();
        }

        size_t size() const
        {
            return m_buffer.size();
        }

        void clear()
        {
            m_buffer.clear();
        }

    private:
        buffer m_buffer;
    };

    using outstringstream = basic_outstringstream<char>;
    using woutstringstream = basic_outstringstream<wchar_t>;
}
Tomkin answered 10/7, 2019 at 20:15 Comment(0)
