remove whitespace in std::string [duplicate]
N

6

22

In C++, what's an easy way to turn this std::string:

\t\tHELLO WORLD\r\nHELLO\t\nWORLD     \t

into:

HELLOWORLDHELLOWORLD
Nidus answered 9/1, 2013 at 10:24 Comment(4)
@tomislav-maric I don't think it's a duplicate of that post; the OP there was working with a cin stream, and thus using iostream functions. - Nidus
Similar but not an exact duplicate, so not voting to close. - Cabernet
@Cabernet I checked it again. You are right, sorry about that. - Poohpooh
See also "Remove spaces from std::string in C++". - Cassation
C
35

Use a simple combination of std::remove_if and std::string::erase.

A not entirely safe version (passing a negative char value to ::isspace is undefined behaviour):

s.erase( std::remove_if( s.begin(), s.end(), ::isspace ), s.end() );

For a safer version, replace ::isspace with

std::bind( std::isspace<char>, std::placeholders::_1, std::locale::classic() )

(Include the relevant headers: <algorithm>, <functional>, <locale>, and <string>.)

For a version that works with other character types, replace <char> with <ElementType>, or whatever your templated character type is. You can of course also use a different locale. If you do, take care not to look up the locale facet repeatedly, which is inefficient.

In C++11 you can make the safer version into a lambda with:

[]( char ch ) { return std::isspace<char>( ch, std::locale::classic() ); }
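As a rough sketch of the advice above about not looking the facet up repeatedly: the std::ctype<char> facet can be fetched once and reused inside the lambda. The function name strip_whitespace is hypothetical and not part of the original answer, and the code assumes C++11.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper (not from the original answer): removes every character
// that the given locale classifies as whitespace.
std::string strip_whitespace(std::string s,
                             const std::locale& loc = std::locale::classic()) {
    // Look the facet up once instead of once per character.
    const std::ctype<char>& facet = std::use_facet<std::ctype<char> >(loc);
    s.erase(std::remove_if(s.begin(), s.end(),
                           [&facet](char ch) {
                               return facet.is(std::ctype_base::space, ch);
                           }),
            s.end());
    return s;
}
```

Called on the string from the question, strip_whitespace should return HELLOWORLDHELLOWORLD.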
Cabernet answered 9/1, 2013 at 10:26 Comment(18)
@chris ::isspace includes the new line as well: cplusplus.com/reference/cctype/isspace - Accusatorial
It will. isspace will return true for newlines. - Cabernet
isspace has UB for all characters except those in the basic something something. C99 §7.4/1. - Malcah
How did you perform your output? Are you sure you didn't stick one in, e.g. (std::cout << s << std::endl)? - Cabernet
Never mind, it was me being completely stupid and not passing the second argument to erase (I typed one up before the answer). - Rubble
@R.MartinhoFernandes Does the C99 standard apply to C++? C++ has its own standard. - Cabernet
C++98 delegates the behaviour of the C standard library to C89, and C++11 delegates the behaviour of the C standard library to C99. - Malcah
@Rubble Yes, as std::remove_if returns an iterator and erase has an overload for a single iterator, it will indeed compile and not give you the result you want if you forget the second s.end(). - Cabernet
@CashCow, I know, it's completely irritating when you forget it. In my case, I never saw the second argument when reading it however many times before I finally used it, so it's still wired in my brain that it only takes one. - Rubble
Presumably the -1 from Mr Fernandes is for use of ::isspace. Perhaps he will enlighten us as to the special locale-based / character-set-based issue? You know, for perfect UTF-8 it is not necessarily even a one-to-one character-to-char relationship, so no functor / lambda will work here officially. The only thing that will work for perfect UTF-8 iteration that might be multi-character is a custom iterator. - Cabernet
FWIW, all the whitespace characters in the example are encoded as single byte sequences in UTF-8, so yes, a simple lambda works for UTF-8. - Malcah
You are saying that what looks like a whitespace will never appear as part of a multibyte character? I don't know the UTF-8 standard. The only thing I see as "undefined" are things like &nbsp; (non-breaking space) which is commonly ASCII 160 (or 0xA0) but might vary in other character sets. - Cabernet
My apologies. I got slightly confused about the true nature of the problem :) I knew using isspace was wrong, but I got confused as to the why. The why is related to isspace taking an int and to char being signed. Here is a small program that explains the issue: stacked-crooked.com/view?id=817f92f4a2482e5da0b7533285e53edb. - Malcah
(And as a side note, NBSP is not in ASCII. ASCII has only 128 values). - Malcah
(And note how this is not about multibyte encodings; any byte with a value higher than 0x7F in the source, regardless of encoding, will trigger this issue; even single byte encodings like Latin-1 or Windows-1252 will cause it. Only 7-bit encodings like ASCII work fine.) - Malcah
OK, I have given the alternative answer that uses std::isspace with a locale. - Cabernet
Doesn't the lambda version require a "return" statement? - Consubstantial
For C++ newbies like me: _1 is from std::placeholders and represents a future argument. - Maidstone
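The signed-char problem discussed in the comments above can also be avoided while still using the <cctype> function, by converting each char to unsigned char before the call. This is a minimal sketch, assuming C++11; the name strip_whitespace_c is hypothetical and does not come from the thread.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: uses the single-argument isspace from <cctype>.
// The cast matters because isspace expects a value representable as
// unsigned char (or EOF); a negative char would be undefined behaviour.
std::string strip_whitespace_c(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](char ch) {
                               return std::isspace(static_cast<unsigned char>(ch)) != 0;
                           }),
            s.end());
    return s;
}
```

Bytes above 0x7F, such as 0xE9 in a Latin-1 string, are then passed in as positive values rather than negative ones.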
D
13

If you are limited to C++03, use a functor:

struct RemoveDelimiter
{
  bool operator()(char c) const
  {
    return (c =='\r' || c =='\t' || c == ' ' || c == '\n');
  }
};

std::string s("\t\tHELLO WORLD\r\nHELLO\t\nWORLD     \t");
s.erase( std::remove_if( s.begin(), s.end(), RemoveDelimiter()), s.end());

Or use a C++11 lambda:

s.erase(std::remove_if( s.begin(), s.end(), 
     [](char c){ return (c =='\r' || c =='\t' || c == ' ' || c == '\n');}), s.end() );

PS. This uses the erase-remove idiom.
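To make the two halves of the idiom visible, here is a small sketch that separates them. std::remove_if only moves the kept characters to the front and returns the new logical end; the string keeps its old size until erase is called. The helper names is_delimiter, compact_only and compact_and_erase are hypothetical and used only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Same delimiter set as the functor and lambda in this answer.
bool is_delimiter(char c) {
    return c == '\r' || c == '\t' || c == ' ' || c == '\n';
}

// Step 1 only: move the kept characters to the front and return how many
// there are. s.size() is unchanged; the characters past the returned count
// are left in an unspecified state.
std::size_t compact_only(std::string& s) {
    std::string::iterator new_end =
        std::remove_if(s.begin(), s.end(), is_delimiter);
    return static_cast<std::size_t>(new_end - s.begin());
}

// Steps 1 and 2 together: compact, then erase the leftover tail.
void compact_and_erase(std::string& s) {
    s.erase(std::remove_if(s.begin(), s.end(), is_delimiter), s.end());
}
```

This is also why forgetting the second argument to erase compiles but gives the wrong result: the single-iterator overload erases just one character.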

Dearden answered 9/1, 2013 at 10:34 Comment(0)
F
4

In C++11, using std::regex (include <regex>, <string>, and <iostream>):

std::string input = "\t\tHELLO WORLD\r\nHELLO\t\nWORLD     \t";

auto rs = std::regex_replace(input,std::regex("\\s+"), "");

std::cout << rs << std::endl;

$ ./play

HELLOWORLDHELLOWORLD
Flawy answered 10/1, 2013 at 23:11 Comment(0)
I
4

In C++11 you can use a lambda rather than using std::bind:

str.erase(
    std::remove_if(str.begin(), str.end(), 
        [](char c) -> bool
        { 
            return std::isspace<char>(c, std::locale::classic()); 
        }), 
    str.end());
Introrse answered 10/2, 2014 at 20:6 Comment(0)
B
3

You could use Boost.Algorithm's erase_all. Note that the call below removes only literal space characters, not tabs or newlines:

#include <boost/algorithm/string/erase.hpp>
#include <iostream>
#include <string>

int main()
{
    std::string s = "Hello World!";
    // or the more expensive one-liner in case your string is const
    // std::cout << boost::algorithm::erase_all_copy(s, " ") << "\n";
    boost::algorithm::erase_all(s, " "); 
    std::cout << s << "\n";
}

NOTE: as mentioned in the comments, trim_copy (and its cousins trim_left_copy and trim_right_copy) only removes whitespace from the beginning and end of a string.

Bookseller answered 9/1, 2013 at 10:32 Comment(1)
I saw some solutions that used Boost, but I'm not after a trim function. Trimming, I believe, does something like XX___XX_ -> XX_XX, whereas I want the final result to be XXXX. - Nidus
G
2

Stepping through it character by character and using string::erase() should work fine.

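void removeWhitespace(std::string& str) {
    std::size_t i = 0;
    while (i < str.length()) {
        if (str[i] == ' ' || str[i] == '\n' || str[i] == '\t' || str[i] == '\r') {
            // After erasing, the next character shifts down to index i,
            // so do not advance.
            str.erase(i, 1);
        } else {
            ++i;
        }
    }
}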
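Calling erase once per whitespace character shifts the rest of the string each time, so the loop can take quadratic time on long inputs. A possible single-pass alternative, sketched here with a hypothetical function name, copies the kept characters forward with separate read and write positions and truncates once at the end. It is essentially what the erase-remove idiom does internally.

```cpp
#include <cstddef>
#include <string>

// Hypothetical single-pass variant: each character is visited once.
void removeWhitespaceSinglePass(std::string& str) {
    std::size_t write = 0;  // next position to fill with a kept character
    for (std::size_t read = 0; read < str.length(); ++read) {
        const char c = str[read];
        if (c != ' ' && c != '\n' && c != '\t' && c != '\r') {
            str[write++] = c;  // keep this character
        }
    }
    str.resize(write);  // drop the leftover tail in one step
}
```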
Girardo answered 9/1, 2013 at 10:36 Comment(2)
Doesn't work when there are adjacent space characters. The first one is erased, moving the second one down to position i. Then you go around the loop, increment i, and never check the second one. - Crissycrist
You're right. Fixed it. - Girardo
