Based on the comments and answers, there seem to be three approaches:

- Write a custom version of `getline()`, possibly using the `std::istream::getline()` member internally to get the actual characters.
- Use a filtering stream buffer to limit the amount of data potentially received.
- Instead of reading into a `std::string` directly, use a string instantiation with a custom allocator limiting the amount of memory stored in the string.
Not all of the suggestions came with code. This answer provides code and a bit of discussion for all three approaches. Before going into implementation details, it is worth pointing out that there are multiple choices of what should happen when an excessively long input is received:
- Reading an overlong line could result in a successful read of a partial line, i.e., the resulting string contains the content read so far and the stream doesn't have any error flags set. Doing so means, however, that it isn't possible to distinguish between a line hitting the limit exactly and a line being too long. Since the limit is somewhat arbitrary anyway, it probably doesn't really matter.
- Reading an overlong line could be considered a failure (i.e., setting `std::ios_base::failbit` and/or `std::ios_base::badbit`) and, since the read failed, yield an empty string. Yielding an empty string, obviously, prevents looking at the content read so far to see what is going on.
- Reading an overlong line could provide the partial line read and also set error flags on the stream. This seems the most reasonable behavior: it both signals that something is up and provides the input for potential inspection.
Although there are multiple code examples implementing a limited version of `getline()` already, here is another one! I think it is simpler (albeit possibly slower; performance can be dealt with when necessary) and it also retains `std::getline()`'s interface: it uses the stream's `width()` to communicate a limit (maybe taking `width()` into account would be a reasonable extension to `std::getline()`):
#include <istream>
#include <iterator>
#include <limits>
#include <string>

template <typename cT, typename Traits, typename Alloc>
std::basic_istream<cT, Traits>&
safe_getline(std::basic_istream<cT, Traits>& in,
             std::basic_string<cT, Traits, Alloc>& value,
             cT delim)
{
    typedef std::basic_string<cT, Traits, Alloc> string_type;
    typedef typename string_type::size_type      size_type;
    typename std::basic_istream<cT, Traits>::sentry cerberos(in);
    if (cerberos) {
        value.clear();
        // width() communicates the limit; 0 means "no limit"
        size_type width(static_cast<size_type>(in.width(0)));
        if (width == 0) {
            width = std::numeric_limits<size_type>::max();
        }
        std::istreambuf_iterator<cT, Traits> it(in), end;
        for (; value.size() != width && it != end; ++it) {
            if (!Traits::eq(delim, *it)) {
                value.push_back(*it);
            }
            else {
                ++it; // consume the delimiter but don't store it
                break;
            }
        }
        if (value.size() == width) {
            in.setstate(std::ios_base::failbit);
        }
    }
    return in;
}

// overload defaulting the delimiter to a newline, like std::getline()
template <typename cT, typename Traits, typename Alloc>
std::basic_istream<cT, Traits>&
safe_getline(std::basic_istream<cT, Traits>& in,
             std::basic_string<cT, Traits, Alloc>& value)
{
    return safe_getline(in, value, in.widen('\n'));
}
This version of `getline()` is used just like `std::getline()`, but when it seems reasonable to limit the amount of data read, the `width()` is set, e.g.:
std::string line;
if (safe_getline(in >> std::setw(max_characters), line)) {
// do something with the input
}
Another approach is to use a filtering stream buffer to limit the amount of input: the filter counts the number of characters processed and limits the amount to a suitable number. This approach is easier to apply to an entire stream than to an individual line: when processing just one line, the filter can't obtain buffers full of characters from the underlying stream because there is no reliable way to put the characters back. Implementing an unbuffered version is still simple, but probably not particularly efficient:
#include <ios>
#include <istream>
#include <streambuf>

template <typename cT, typename Traits = std::char_traits<cT> >
class basic_limitbuf
    : public std::basic_streambuf<cT, Traits> {
public:
    typedef Traits                    traits_type;
    typedef typename Traits::int_type int_type;
private:
    std::streamsize                   size;
    std::streamsize                   max;
    std::basic_istream<cT, Traits>*   stream;
    std::basic_streambuf<cT, Traits>* sbuf;

    int_type underflow() {
        if (this->size < this->max) {
            return this->sbuf->sgetc();
        }
        else {
            this->stream->setstate(std::ios_base::failbit);
            return traits_type::eof();
        }
    }
    int_type uflow() {
        if (this->size < this->max) {
            ++this->size;
            return this->sbuf->sbumpc();
        }
        else {
            this->stream->setstate(std::ios_base::failbit);
            return traits_type::eof();
        }
    }
public:
    basic_limitbuf(std::streamsize max,
                   std::basic_istream<cT, Traits>& stream)
        : size()
        , max(max)
        , stream(&stream)
        , sbuf(this->stream->rdbuf(this)) { // install this filter
    }
    ~basic_limitbuf() {
        // restore the original buffer, preserving any error state set
        std::ios_base::iostate state = this->stream->rdstate();
        this->stream->rdbuf(this->sbuf);
        this->stream->setstate(state);
    }
};
This stream buffer is already set up to insert itself upon construction and remove itself upon destruction. That is, it can be used simply like this:
std::string line;
basic_limitbuf<char> sbuf(max_characters, in);
if (std::getline(in, line)) {
// do something with the input
}
It would be easy to add a manipulator setting up the limit, too. One advantage of this approach is that none of the reading code needs to be touched if the total size of the stream can be limited: the filter can be set up right after creating the stream. When there is no need to back the filter out, it could also use a buffer, which would greatly improve performance.
The third suggested approach is to use a `std::basic_string` with a custom allocator. Two aspects of the allocator approach are a bit awkward:

- The string being read has a type which isn't immediately convertible to `std::string` (although the conversion also isn't hard to do).
- The maximum array size can easily be limited, but the string will end up with some more or less arbitrary size smaller than that: when the allocation fails an exception is thrown, and there is no attempt to grow the string by a smaller increment.
Here is the necessary code for an allocator limiting the allocated size:
#include <cstddef>
#include <new>

template <typename T>
struct limit_alloc
{
private:
    std::size_t max_;
public:
    typedef T value_type;
    limit_alloc(std::size_t max): max_(max) {}
    template <typename S>
    limit_alloc(limit_alloc<S> const& other): max_(other.max()) {}
    std::size_t max() const { return this->max_; }
    T* allocate(std::size_t size) {
        // refuse any allocation beyond the configured limit
        return size <= max_
            ? static_cast<T*>(operator new(size * sizeof(T)))
            : throw std::bad_alloc();
    }
    void deallocate(T* ptr, std::size_t) {
        operator delete(ptr);
    }
};
template <typename T0, typename T1>
bool operator== (limit_alloc<T0> const& a0, limit_alloc<T1> const& a1) {
    return a0.max() == a1.max();
}
template <typename T0, typename T1>
bool operator!= (limit_alloc<T0> const& a0, limit_alloc<T1> const& a1) {
    return !(a0 == a1);
}
The allocator would be used something like this (the code compiles OK with a recent version of clang but not with gcc):
std::basic_string<char, std::char_traits<char>, limit_alloc<char> >
    tmp(limit_alloc<char>(max_chars));
if (std::getline(in, tmp)) {
    std::string line(tmp.begin(), tmp.end()); // convert for further use
    // do something with the input
}
In summary, there are multiple approaches, each with its own small drawback, but each reasonably viable for the stated goal of limiting denial-of-service attacks based on overlong lines:

- Using a custom version of `getline()` means the reading code needs to be changed.
- Using a custom stream buffer is slow unless the entire stream's size can be limited.
- Using a custom allocator gives less control and requires some changes to the reading code.