How do you reverse a string in C or C++ without requiring a separate buffer to hold the reversed string?
The standard algorithm is to use pointers to the start / end, and walk them inward until they meet or cross in the middle. Swap as you go.
Reverse ASCII string, i.e. a 0-terminated array where every character fits in 1 char
. (Or other non-multibyte character sets).
void strrev(char *head)
{
if (!head) return;
char *tail = head;
while(*tail) ++tail; // find the 0 terminator, like head+strlen
--tail; // tail points to the last real char
// head still points to the first
for( ; head < tail; ++head, --tail) {
// walk pointers inwards until they meet or cross in the middle
char h = *head, t = *tail;
*head = t; // swapping as we go
*tail = h;
}
}
// test program that reverses its args
#include <stdio.h>
int main(int argc, char **argv)
{
do {
printf("%s ", argv[argc-1]);
strrev(argv[argc-1]);
printf("%s\n", argv[argc-1]);
} while(--argc);
return 0;
}
The same algorithm works for integer arrays with known length, just use tail = start + length - 1
instead of the end-finding loop.
(Editor's note: this answer originally used XOR-swap for this simple version, too. Fixed for the benefit of future readers of this popular question. XOR-swap is highly not recommended; hard to read and making your code compile less efficiently. You can see on the Godbolt compiler explorer how much more complicated the asm loop body is when xor-swap is compiled for x86-64 with gcc -O3.)
Ok, fine, let's fix the UTF-8 chars...
(This is XOR-swap thing. Take care to note that you must avoid swapping with self, because if *p
and *q
are the same location you'll zero it with a^a==0. XOR-swap depends on having two distinct locations, using them each as temporary storage.)
Editor's note: you can replace SWP with a safe inline function using a tmp variable.
#include <bits/types.h>
#include <stdio.h>
#define SWP(x,y) (x^=y, y^=x, x^=y)
void strrev(char *p)
{
char *q = p;
while(q && *q) ++q; /* find eos */
for(--q; p < q; ++p, --q) SWP(*p, *q);
}
void strrev_utf8(char *p)
{
char *q = p;
strrev(p); /* call base case */
/* Ok, now fix bass-ackwards UTF chars. */
while(q && *q) ++q; /* find eos */
while(p < --q)
switch( (*q & 0xF0) >> 4 ) {
case 0xF: /* U+010000-U+10FFFF: four bytes. */
SWP(*(q-0), *(q-3));
SWP(*(q-1), *(q-2));
q -= 3;
break;
case 0xE: /* U+000800-U+00FFFF: three bytes. */
SWP(*(q-0), *(q-2));
q -= 2;
break;
case 0xC: /* fall-through */
case 0xD: /* U+000080-U+0007FF: two bytes. */
SWP(*(q-0), *(q-1));
q--;
break;
}
}
int main(int argc, char **argv)
{
do {
printf("%s ", argv[argc-1]);
strrev_utf8(argv[argc-1]);
printf("%s\n", argv[argc-1]);
} while(--argc);
return 0;
}
- Why, yes, if the input is borked, this will cheerfully swap outside the place.
- Useful link when vandalising in the UNICODE: http://www.macchiato.com/unicode/chart/
- Also, UTF-8 over 0x10000 is untested (as I don't seem to have any font for it, nor the patience to use a hexeditor)
Examples:
$ ./strrev Räksmörgås ░▒▓○◔◑◕●
░▒▓○◔◑◕● ●◕◑◔○▓▒░
Räksmörgås sågrömskäR
./strrev verrts/.
log n
was applied there in a little bit different context. –
Tara while(q && *q) ++q;
to find the eos? Isn't it the same as while(*q) ++q;
, since q is never going to be 0? How can you be sure that q or *q are going to be 0? –
Knitted a^a==0
" - wrong. a ^ a == 0
, but that's not a problem, because then you will do a ^ (a ^ a)
which is a ^ 0
which is a
. So the XOR swap works even if the two swappees (is there such a word) are equal. –
Felicafelicdad char *q = p; while(q && *q) ++q; for(--q; p < q; ++p, --q)
to do NULL
detection is problematic. For the code progress to --q
when q
is NULL and at best that is the largest legal pointer, at worst it may be UB. Then the for()
loop would iterate for a long time. Suggest a simple if (q != NULL) { do the rest of code }
. –
Dorweiler while(q && *q) ++q;
why not just use strchr(q, '\0');
? Probably doesn't matter but it seems more natural. Of course there are other ways too. Maybe you just didn't want to include string.h even. –
Dilley #include <algorithm>
std::reverse(str.begin(), str.end());
This is the simplest way in C++.
Read Kernighan and Ritchie
#include <string.h>
void reverse(char s[])
{
int length = strlen(s) ;
int c, i, j;
for (i = 0, j = length - 1; i < j; i++, j--)
{
c = s[i];
s[i] = s[j];
s[j] = c;
}
}
man ascii
–
Soso s
must be declared in an array form. In other words, char s[] = "this is ok"
rather than char *s="cannot do this"
because the latter results in a string constant which cannot be modified –
Soso length, i, j
should be size_t
, but then would have trouble when length == 0
. –
Dorweiler The standard algorithm is to use pointers to the start / end, and walk them inward until they meet or cross in the middle. Swap as you go.
Reverse ASCII string, i.e. a 0-terminated array where every character fits in 1 char
. (Or other non-multibyte character sets).
void strrev(char *head)
{
if (!head) return;
char *tail = head;
while(*tail) ++tail; // find the 0 terminator, like head+strlen
--tail; // tail points to the last real char
// head still points to the first
for( ; head < tail; ++head, --tail) {
// walk pointers inwards until they meet or cross in the middle
char h = *head, t = *tail;
*head = t; // swapping as we go
*tail = h;
}
}
// test program that reverses its args
#include <stdio.h>
int main(int argc, char **argv)
{
do {
printf("%s ", argv[argc-1]);
strrev(argv[argc-1]);
printf("%s\n", argv[argc-1]);
} while(--argc);
return 0;
}
The same algorithm works for integer arrays with known length, just use tail = start + length - 1
instead of the end-finding loop.
(Editor's note: this answer originally used XOR-swap for this simple version, too. Fixed for the benefit of future readers of this popular question. XOR-swap is highly not recommended; hard to read and making your code compile less efficiently. You can see on the Godbolt compiler explorer how much more complicated the asm loop body is when xor-swap is compiled for x86-64 with gcc -O3.)
Ok, fine, let's fix the UTF-8 chars...
(This is XOR-swap thing. Take care to note that you must avoid swapping with self, because if *p
and *q
are the same location you'll zero it with a^a==0. XOR-swap depends on having two distinct locations, using them each as temporary storage.)
Editor's note: you can replace SWP with a safe inline function using a tmp variable.
#include <bits/types.h>
#include <stdio.h>
#define SWP(x,y) (x^=y, y^=x, x^=y)
void strrev(char *p)
{
char *q = p;
while(q && *q) ++q; /* find eos */
for(--q; p < q; ++p, --q) SWP(*p, *q);
}
void strrev_utf8(char *p)
{
char *q = p;
strrev(p); /* call base case */
/* Ok, now fix bass-ackwards UTF chars. */
while(q && *q) ++q; /* find eos */
while(p < --q)
switch( (*q & 0xF0) >> 4 ) {
case 0xF: /* U+010000-U+10FFFF: four bytes. */
SWP(*(q-0), *(q-3));
SWP(*(q-1), *(q-2));
q -= 3;
break;
case 0xE: /* U+000800-U+00FFFF: three bytes. */
SWP(*(q-0), *(q-2));
q -= 2;
break;
case 0xC: /* fall-through */
case 0xD: /* U+000080-U+0007FF: two bytes. */
SWP(*(q-0), *(q-1));
q--;
break;
}
}
int main(int argc, char **argv)
{
do {
printf("%s ", argv[argc-1]);
strrev_utf8(argv[argc-1]);
printf("%s\n", argv[argc-1]);
} while(--argc);
return 0;
}
- Why, yes, if the input is borked, this will cheerfully swap outside the place.
- Useful link when vandalising in the UNICODE: http://www.macchiato.com/unicode/chart/
- Also, UTF-8 over 0x10000 is untested (as I don't seem to have any font for it, nor the patience to use a hexeditor)
Examples:
$ ./strrev Räksmörgås ░▒▓○◔◑◕●
░▒▓○◔◑◕● ●◕◑◔○▓▒░
Räksmörgås sågrömskäR
./strrev verrts/.
log n
was applied there in a little bit different context. –
Tara while(q && *q) ++q;
to find the eos? Isn't it the same as while(*q) ++q;
, since q is never going to be 0? How can you be sure that q or *q are going to be 0? –
Knitted a^a==0
" - wrong. a ^ a == 0
, but that's not a problem, because then you will do a ^ (a ^ a)
which is a ^ 0
which is a
. So the XOR swap works even if the two swappees (is there such a word) are equal. –
Felicafelicdad char *q = p; while(q && *q) ++q; for(--q; p < q; ++p, --q)
to do NULL
detection is problematic. For the code progress to --q
when q
is NULL and at best that is the largest legal pointer, at worst it may be UB. Then the for()
loop would iterate for a long time. Suggest a simple if (q != NULL) { do the rest of code }
. –
Dorweiler while(q && *q) ++q;
why not just use strchr(q, '\0');
? Probably doesn't matter but it seems more natural. Of course there are other ways too. Maybe you just didn't want to include string.h even. –
Dilley Non-evil C, assuming the common case where the string is a null-terminated char
array:
#include <stddef.h>
#include <string.h>
/* PRE: str must be either NULL or a pointer to a
* (possibly empty) null-terminated string. */
void strrev(char *str) {
char temp, *end_ptr;
/* If str is NULL or empty, do nothing */
if( str == NULL || !(*str) )
return;
end_ptr = str + strlen(str) - 1;
/* Swap the chars */
while( end_ptr > str ) {
temp = *str;
*str = *end_ptr;
*end_ptr = temp;
str++;
end_ptr--;
}
}
int temp
, this solution looks best. +1 –
Dorweiler !(*str)
though. –
Dilley strchr()
searching for '\0'
. I thought of both. Not need for a loop. –
Dilley It's been a while and I don't remember which book taught me this algorithm, but I thought it was quite ingenious and simple to understand:
char input[] = "moc.wolfrevokcats";
int length = strlen(input);
int last_pos = length-1;
for(int i = 0; i < length/2; i++)
{
char tmp = input[i];
input[i] = input[last_pos - i];
input[last_pos - i] = tmp;
}
printf("%s\n", input);
A visualization of this algorithm, courtesy of slashdottir:
size_t
though. –
Dilley Note that the beauty of std::reverse is that it works with char *
strings and std::wstring
s just as well as std::string
s
void strrev(char *str)
{
if (str == NULL)
return;
std::reverse(str, str + strlen(str));
}
In the interest of completeness, it should be pointed out that there are representations of strings on various platforms in which the number of bytes per character varies depending on the character. Old-school programmers would refer to this as DBCS (Double Byte Character Set). Modern programmers more commonly encounter this in UTF-8 (as well as UTF-16 and others). There are other such encodings as well.
In any of these variable-width encoding schemes, the simple algorithms posted here (evil, non-evil or otherwise) would not work correctly at all! In fact, they could even cause the string to become illegible or even an illegal string in that encoding scheme. See Juan Pablo Califano's answer for some good examples.
std::reverse() potentially would still work in this case, as long as your platform's implementation of the Standard C++ Library (in particular, string iterators) properly took this into account.
If you're looking for reversing NULL terminated buffers, most solutions posted here are OK. But, as Tim Farley already pointed out, these algorithms will work only if it's valid to assume that a string is semantically an array of bytes (i.e. single-byte strings), which is a wrong assumption, I think.
Take for example, the string "año" (year in Spanish).
The Unicode code points are 0x61, 0xf1, 0x6f.
Consider some of the most used encodings:
Latin1 / iso-8859-1 (single byte encoding, 1 character is 1 byte and vice versa):
Original:
0x61, 0xf1, 0x6f, 0x00
Reverse:
0x6f, 0xf1, 0x61, 0x00
The result is OK
UTF-8:
Original:
0x61, 0xc3, 0xb1, 0x6f, 0x00
Reverse:
0x6f, 0xb1, 0xc3, 0x61, 0x00
The result is gibberish and an illegal UTF-8 sequence
UTF-16 Big Endian:
Original:
0x00, 0x61, 0x00, 0xf1, 0x00, 0x6f, 0x00, 0x00
The first byte will be treated as a NUL-terminator. No reversing will take place.
UTF-16 Little Endian:
Original:
0x61, 0x00, 0xf1, 0x00, 0x6f, 0x00, 0x00, 0x00
The second byte will be treated as a NUL-terminator. The result will be 0x61, 0x00, a string containing the 'a' character.
Another C++ way (though I would probably use std::reverse() myself :) as being more expressive and faster)
str = std::string(str.rbegin(), str.rend());
The C way (more or less :) ) and please, be careful about XOR trick for swapping, compilers sometimes cannot optimize that.
In such case it is usually much slower.
char* reverse(char* s)
{
char* beg = s, *end = s, tmp;
while (*end) end++;
while (end-- > beg)
{
tmp = *beg;
*beg++ = *end;
*end = tmp;
}
return s;
} // fixed: check history for details, as those are interesting ones
strlen
to find the end of the string, if it's potentially long. A good library implementation will use SIMD vectors to search faster than 1 byte per iteration. But for very short strings, while (*++end);
will be done before a library function call gets started searching. –
Carine _mm_shuffle_epi8
(PSHUFB) can reverse the order of a 16B vector, given the right shuffle control vector. It can probably run at nearly memcpy speed with some careful optimization, especially with AVX2. –
Carine char* beg = s-1
has undefined behavior (at least if s
points to the first element of an array, which is the most common case). while (*++end);
has undefined behavior if s
is an empty string. –
Rhigolene s-1
has defined behavior even if s
points to the first element of an array, so it's you who should be able to cite the standard in support. –
Rhigolene --
operator. I never said anything about types. Pointer arithmetic in general is only valid if it moves the pointer within the same array (or one element past the end). See also c-faq.com/aryptr/non0based.html, which explicitly says what you're doing is invalid. –
Rhigolene #include <cstdio>
#include <cstdlib>
#include <string>
void strrev(char *str)
{
if( str == NULL )
return;
char *end_ptr = &str[strlen(str) - 1];
char temp;
while( end_ptr > str )
{
temp = *str;
*str++ = *end_ptr;
*end_ptr-- = temp;
}
}
int main(int argc, char *argv[])
{
char buffer[32];
strcpy(buffer, "testing");
strrev(buffer);
printf("%s\n", buffer);
strcpy(buffer, "a");
strrev(buffer);
printf("%s\n", buffer);
strcpy(buffer, "abc");
strrev(buffer);
printf("%s\n", buffer);
strcpy(buffer, "");
strrev(buffer);
printf("%s\n", buffer);
strrev(NULL);
return 0;
}
This code produces this output:
gnitset
a
cba
strncpy()
less safe. You don't know a better way than to recommend against this? Why not recommend that they actually check for errors etc. instead? That's far more valuable than blacklisting something because some people use it incorrectly. –
Dilley strcpy
as you say isn't the same thing as your earlier Don't use strcpy. Ever.. Big difference. Huge difference. Your earlier comment is black and white. Being safe is not the same thing. Black and white is rarely a good idea. Life isn't that simple. strncpy
can also create unterminated strings. Unsafe. –
Dilley In case you are using GLib, it has two functions for that, g_strreverse() and g_utf8_strreverse()
I like Evgeny's K&R answer. However, it is nice to see a version using pointers. Otherwise, it's essentially the same:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *reverse(char *str) {
if( str == NULL || !(*str) ) return NULL;
int i, j = strlen(str)-1;
char *sallocd;
sallocd = malloc(sizeof(char) * (j+1));
for(i=0; j>=0; i++, j--) {
*(sallocd+i) = *(str+j);
}
return sallocd;
}
int main(void) {
char *s = "a man a plan a canal panama";
char *sret = reverse(s);
printf("%s\n", reverse(sret));
free(sret);
return 0;
}
Recursive function to reverse a string in place (no extra buffer, malloc).
Short, sexy code. Bad, bad stack usage.
#include <stdio.h>
/* Store the each value and move to next char going down
* the stack. Assign value to start ptr and increment
* when coming back up the stack (return).
* Neat code, horrible stack usage.
*
* val - value of current pointer.
* s - start pointer
* n - next char pointer in string.
*/
char *reverse_r(char val, char *s, char *n)
{
if (*n)
s = reverse_r(*n, s, n+1);
*s = val;
return s+1;
}
/*
* expect the string to be passed as argv[1]
*/
int main(int argc, char *argv[])
{
char *aString;
if (argc < 2)
{
printf("Usage: RSIP <string>\n");
return 0;
}
aString = argv[1];
printf("String to reverse: %s\n", aString );
reverse_r(*aString, aString, aString+1);
printf("Reversed String: %s\n", aString );
return 0;
}
char
onto the stack does not count as "in place". Especially not when you're actually pushing 4 * 8B per character (on a 64-bit machine: 3 args + a return address). –
Carine If you are using ATL/MFC CString
, simply call CString::MakeReverse()
.
Yet another:
#include <stdio.h>
#include <strings.h>
int main(int argc, char **argv) {
char *reverse = argv[argc-1];
char *left = reverse;
int length = strlen(reverse);
char *right = reverse+length-1;
char temp;
while(right-left>=1){
temp=*left;
*left=*right;
*right=temp;
++left;
--right;
}
printf("%s\n", reverse);
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
unsigned char * utf8_reverse(const unsigned char *, int);
void assert_true(bool);
int main(void)
{
unsigned char str[] = "mañana mañana";
unsigned char *ret = utf8_reverse(str, strlen((const char *) str) + 1);
printf("%s\n", ret);
assert_true(0 == strncmp((const char *) ret, "anãnam anañam", strlen("anãnam anañam") + 1));
free(ret);
return EXIT_SUCCESS;
}
unsigned char * utf8_reverse(const unsigned char *str, int size)
{
unsigned char *ret = calloc(size, sizeof(unsigned char*));
int ret_size = 0;
int pos = size - 2;
int char_size = 0;
if (str == NULL) {
fprintf(stderr, "failed to allocate memory.\n");
exit(EXIT_FAILURE);
}
while (pos > -1) {
if (str[pos] < 0x80) {
char_size = 1;
} else if (pos > 0 && str[pos - 1] > 0xC1 && str[pos - 1] < 0xE0) {
char_size = 2;
} else if (pos > 1 && str[pos - 2] > 0xDF && str[pos - 2] < 0xF0) {
char_size = 3;
} else if (pos > 2 && str[pos - 3] > 0xEF && str[pos - 3] < 0xF5) {
char_size = 4;
} else {
char_size = 1;
}
pos -= char_size;
memcpy(ret + ret_size, str + pos + 1, char_size);
ret_size += char_size;
}
ret[ret_size] = '\0';
return ret;
}
void assert_true(bool boolean)
{
puts(boolean == true ? "true" : "false");
}
C++ multi-byte UTF-8 reverser
My thought is that you can never just swap ends, you must always move from beginning-to-end, move through the string and look for "how many bytes will this character require?" I attach the character starting at the original end position, and remove the character from the front of the string.
void StringReverser(std::string *original)
{
int eos = original->length() - 1;
while (eos > 0) {
char c = (*original)[0];
int characterBytes;
switch( (c & 0xF0) >> 4 ) {
case 0xC:
case 0xD: /* U+000080-U+0007FF: two bytes. */
characterBytes = 2;
break;
case 0xE: /* U+000800-U+00FFFF: three bytes. */
characterBytes = 3;
break;
case 0xF: /* U+010000-U+10FFFF: four bytes. */
characterBytes = 4;
break;
default:
characterBytes = 1;
break;
}
for (int i = 0; i < characterBytes; i++) {
original->insert(eos+i, 1, (*original)[i]);
}
original->erase(0, characterBytes);
eos -= characterBytes;
}
}
void reverseString(vector<char>& s) {
int l = s.size();
char ch ;
int i = 0 ;
int j = l-1;
while(i < j){
s[i] = s[i]^s[j];
s[j] = s[i]^s[j];
s[i] = s[i]^s[j];
i++;
j--;
}
for(char c : s)
cout <<c ;
cout<< endl;
}
input string, return string, No other library required
std::string reverse_string(std::string &str)
{
const char*buf = str.c_str();
char *start = const_cast<char*>(buf);
char *end = start + strlen(buf) - 1;
char t;
while(start < end)
{
t = *start;
*start = *end;
*end = t;
start ++;
end --;
}
str = buf;
return str;
}
std::string md1 = "abcdefghijklmnopqrstuvwxyz0123456789";
std::cout << reverse_string(md1) << std::endl;
//9876543210zyxwvutsrqponmlkjihgfedcba
If you don't need to store it, you can reduce the time spent like this:
void showReverse(char s[], int length)
{
printf("Reversed String without storing is ");
//could use another variable to test for length, keeping length whole.
//assumes contiguous memory
for (; length > 0; length--)
{
printf("%c", *(s+ length-1) );
}
printf("\n");
}
In C++ the reverse can be done in a function:
#include <algorithm>
#include <string>
void backwards(vector<string> &inputs_ref) {
for (auto i = inputs_ref.begin(); i != inputs_ref.end(); ++i) {
reverse(i->begin(), i->end());
}
}
© 2022 - 2024 — McMap. All rights reserved.