P

8

30

How can I add spaces between every character or symbol within a UTF-8 document? E.g. 123hello! becomes 1 2 3 h e l l o !.

I have BASH, OpenOffice.org, and gedit, if any of those can do that.
I don't care if it sometimes leaves extra spaces in places (e.g. 2 or 3 spaces in a single place is no problem).

Pinnatifid answered 2/1, 2012 at 1:26 Comment(0)

D

20

sed(1) can do this:

$ sed -e 's/\(.\)/\1 /g' < /etc/passwd
r o o t : x : 0 : 0 : r o o t : / r o o t : / b i n / b a s h 
d a e m o n : x : 1 : 1 : d a e m o n : / u s r / s b i n : / b i n / s h

It works well on e.g. UTF-8 encoded Japanese content:

$ file japanese 
japanese: UTF-8 Unicode text
$ sed -e 's/\(.\)/\1 /g' < japanese 
E X I F 中 の 画 像 回 転 情 報 対 応 に よ り 、 一 部 画 像 （ 特 に 『 
$

Deliverance answered 2/1, 2012 at 1:30 Comment(6)

Isn't this simpler sed 's/./& /g' infile? – Windsail 2/1, 2012 at 1:48

@JaypalSingh lol, jinx. I think I submitted my sed version 10 seconds before your comment – Vibrissa 2/1, 2012 at 1:49

@Jaypal: ah, yes, it is; my fault for not knowing which specific tools support & and which ones require the more general \n matching syntax. – Deliverance 2/1, 2012 at 1:52

Is there any way to not have a trailing space? – Tjaden 3/11, 2019 at 21:48

@JerryJeremiah, the easiest way I know is to use another sed: sed -e 's/\(.\)/\1 /g' < /etc/passwd | sed -e 's/ $//' -- probably this could be done another way in a single pass, but this works okay. – Deliverance 4/11, 2019 at 18:42

Or both in the same sed: sed -e 's/\(.\)/\1 /g;s/ $//' < /etc/passwd – Tjaden 5/11, 2019 at 0:33
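
The single-pass variant from the last comment can be wrapped in a small function. This is only a minimal sketch: space_out is a hypothetical name, and the second substitution removes just the one space appended after the final character of each line.

```shell
# space_out: hypothetical helper; reads stdin, writes one spaced line per input line
space_out() {
  # 1st expression: follow every character with a space
  # 2nd expression: drop the single space left at the end of the line
  sed -e 's/\(.\)/\1 /g' -e 's/ $//'
}

printf '%s\n' '123hello!' | space_out
# prints: 1 2 3 h e l l o !
```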

V

46

Shortest sed version

sed 's/./& /g'

Output

$ echo '123hello!' |  sed 's/./& /g'
1 2 3 h e l l o !

Obligatory awk version

awk '$1=$1' FS= OFS=" "

Output

$ echo '123hello!' |  awk '$1=$1' FS= OFS=" "
1 2 3 h e l l o !
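
One caveat that may be worth a sketch: $1=$1 is used above as a pattern, so a line is printed only when the assigned value counts as true, and a line whose first character is 0 would be dropped. Moving the assignment into an action avoids that. This assumes an awk that splits the record into single characters when FS is empty (gawk and mawk do; POSIX leaves it unspecified), and space_awk is a hypothetical name.

```shell
# space_awk: hypothetical helper; assumes awk splits on an empty FS per character
space_awk() {
  # Assign in an action and print unconditionally, so the result
  # does not depend on the truth value of the first character.
  awk '{ $1 = $1; print }' FS= OFS=' '
}

printf '%s\n' '0abc' | space_awk
# prints: 0 a b c
```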

Vibrissa answered 2/1, 2012 at 1:41 Comment(4)

LOL .. I thought may be this might not work on UTF8 or other encoding but it appears to work just fine. +1 :) – Windsail 2/1, 2012 at 1:53

Is there a way to also make the string uppercase in the same awk command? – Mvd 22/8, 2018 at 17:14

@EricWolf I would suggest you use sed 's/./\U& /g' for that – Vibrissa 23/8, 2018 at 21:18

@Vibrissa I actually figured it out since asking: echo $MESSAGE | awk '$1=$1 {print toupper($0)}' FS= OFS=" " – Mvd 24/8, 2018 at 3:51
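
Note that \U in the replacement is a GNU sed extension. A portable sketch for ASCII text could uppercase with tr first and then space out the result; upper_spaced is a hypothetical name, and a tr that works byte by byte will leave non-ASCII letters unchanged.

```shell
# upper_spaced: hypothetical helper; uppercases ASCII letters, then spaces them out
upper_spaced() {
  tr '[:lower:]' '[:upper:]' | sed 's/./& /g; s/ $//'
}

printf '%s\n' 'hello' | upper_spaced
# prints: H E L L O
```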

L

8

sed is OK, but this is pure bash:

string=hello
for ((i=0; i<${#string}; i++)); do
    string_new+="${string:$i:1} "
done

Loella answered 13/9, 2015 at 0:25 Comment(1)

This generates a trailing space. – Broody 23/11, 2016 at 22:22
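
One way to drop that trailing space is a suffix-removing parameter expansion after the loop. A minimal sketch, starting from the value the loop leaves in string_new for string=hello:

```shell
# Value the loop above produces for string=hello (note the trailing space)
string_new='h e l l o '

# ${var% } removes one trailing space, if there is one
string_new=${string_new% }

printf '%s\n' "$string_new"
# prints: h e l l o
```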

E

7

Since you have bash, I will assume that you have access to sed. The following command line will do what you wish.

$ sed -e 's:\(.\):\1 :g' < input.txt > output.txt

Edouard answered 2/1, 2012 at 1:31 Comment(0)

P

4

This might work for you:

echo '1 23h ello  !   ' |  sed 's/\s*/ /g;s/^\s*\(.*\S\)\s*$/\1/;l'
1 2 3 h e l l o !$
1 2 3 h e l l o !

In retrospect a far better solution:

sed 's/\B/ /g' file

This inserts a space at every position that is not a word boundary (\B), such as between two adjacent letters or digits.

Pastiche answered 2/1, 2012 at 5:24 Comment(4)

+1 especially for the completely different interpretation of the problem than the rest of us. :) – Deliverance 4/1, 2012 at 0:33

echo "hello" | sed 's/\B/ /g' outputs h el lo, not h e l l o. Using sed from BusyBox v1.36.1. – Collencollenchyma 9/5 at 23:56

@GrantGryczan that sounds like a bug in BusyBox v1.36.1. However I always use GNU sed. – Pastiche 10/5 at 6:24

Yeah, I'm sure it is; I thought this was a great idea and was very surprised when it didn't work. But still worth noting for all the Alpine Linux minimalists out there like me. – Collencollenchyma 10/5 at 6:27

W

4

I like these solutions because they do not have a trailing space like the rest here.

GNU awk:

echo 123hello! | awk NF=NF FS=

GNU awk:

echo 123hello! | awk NF=NF FPAT=.

POSIX awk:

echo 123hello! | awk '{while(a=substr($0,++b,1))printf b-1?FS a:a}'
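
The POSIX one-liner is dense, so here is a more readable sketch of the same idea. It differs from the one-liner in that it ends each line with a newline and restarts the index for every input line; space_chars is a hypothetical name, and whether substr counts characters or bytes depends on the awk implementation and locale.

```shell
# space_chars: hypothetical helper; joins the characters of each line with spaces
space_chars() {
  awk '{
    out = ""
    n = length($0)
    for (i = 1; i <= n; i++) {
      # put a separator before every character except the first
      out = out (i > 1 ? " " : "") substr($0, i, 1)
    }
    print out
  }'
}

printf '%s\n' '123hello!' | space_chars
# prints: 1 2 3 h e l l o !
```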

Wolfie answered 3/1, 2016 at 18:39 Comment(0)

A

0

string='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
echo ${string} | sed -r 's/(.{1})/\1 /g'

Anchylose answered 17/1, 2022 at 18:53 Comment(0)

C

0

Pure POSIX Shell version:

addspace() {
  __addspace_str="$1"
  while [ -n "${__addspace_str#?}" ]; do
    printf '%c ' "$__addspace_str"
    __addspace_str="${__addspace_str#?}"
  done
  printf '%c' "$__addspace_str"
}

Or if you need to put it in a variable:

addspace_var() {
  addspace_result=""
  __addspace_str="$1"
  while [ -n "${__addspace_str#?}" ]; do
    # Inner expansion is quoted so characters such as * or ? in the rest are taken literally
    addspace_result="$addspace_result${__addspace_str%"${__addspace_str#?}"} "
    __addspace_str="${__addspace_str#?}"
  done
  addspace_result="$addspace_result$__addspace_str"
}

addspace_var abc
echo "$addspace_result"

Tested with dash, ksh, zsh, bash (+ bash --posix), and busybox ash.

Explanation

${x#?}

This parameter expansion removes the first character of x. ${x#...} in general removes a prefix given by a pattern, and ? matches any single character.

printf '%c ' "$str"

The %c format specifier prints only the first character of its string argument, so the full format string '%c ' prints the first character of the string followed by a space. An empty string would cause issues here, but the loop condition has already ruled that out, so it's fine. To print the first character safely in any situation we can use '%.1s', but I like living dangerously :3
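
A small sketch of the safer '%.1s' form mentioned above. first_char is a hypothetical name; be aware that some shells count bytes rather than characters for the precision, so a multibyte UTF-8 character may be cut in the middle.

```shell
# first_char: hypothetical helper; prints at most the first character of $1
first_char() {
  printf '%.1s' "$1"
}

first_char 'hello'   # prints: h
first_char ''        # prints nothing, and does not fail on the empty string
```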

${x%${x#?}}

This is an alternate way to get the first character of the string. We already know that ${x#?} is all but the first character. Well, ${x%...} removes ... from the end of x, so ${x%${x#?}} removes all but the first character from the end of x, leaving only the first one.
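
The two expansions can be tried in isolation. In this sketch the inner value is quoted when it is used as the pattern after %, which is an addition on my part: unquoted, characters such as * or ? in the remainder would be treated as pattern characters and the wrong suffix could be removed.

```shell
x='a*b'

# Everything except the first character
rest=${x#?}

# Remove that remainder from the end; the quotes make it a literal string
first=${x%"$rest"}

printf 'first=%s rest=%s\n' "$first" "$rest"
# prints: first=a rest=*b
```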

__prefixed_variable_names

POSIX doesn't define local, so to avoid variable conflicts it's safer to create unique names that are unlikely to clobber each other. I am starting to experiment using M4 to generate unique names while not having to destroy my code every time but it's probably overkill for people who don't use shell as much as me.

[ -n "${str#?}" ]

Why not just [ -n "$str" ]? It's to avoid the dreaded trailing space, which is also why there is one extra statement at the bottom, outside the loop. The loop runs until the string is one character long; then we finish outside of it so we can append this last character without adding a space.

When should I use this?

This is good for small inputs in long-running loops, since it avoids the overhead of calling an external process, but for larger inputs it falls behind fast, especially the var version. (I blame the ${x%${x#?}} trick.)

Benchmark Commands

# addspace
time dash -c ". ./addspace.sh; for x in $(seq -s ' ' 1 10000); do addspace \"$input\" >/dev/null; done"
# addspace_var
time dash -c ". ./addspace.sh; for x in $(seq -s ' ' 1 10000); do addspace_var \"$input\" >/dev/null; done"
# sed for comparison
time dash -c ". ./addspace.sh; for x in $(seq -s ' ' 1 10000); do echo \"$input\" |  sed 's/./& /g' >/dev/null; done"
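
The commands above expand $input in the calling shell, but how it was set is not shown. A sketch, assuming any ASCII string of the given length will do; make_input is a hypothetical helper:

```shell
# make_input: hypothetical helper; prints a string of N 'x' characters
make_input() {
  # pad an empty string to width N with spaces, then turn the spaces into x
  printf "%${1}s" '' | tr ' ' 'x'
}

input=$(make_input 200)
printf '%s\n' "${#input}"
# prints: 200
```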

Input Length = 3

      addspace     addspace_var  sed  

real  0m0,106s     0m0,106s      0m10,651s
user  0m0,077s     0m0,075s      0m9,349s
sys   0m0,029s     0m0,031s      0m3,030s

Input Length = 200

      addspace     addspace_var  sed  

real  0m6,050s     0m47,115s     0m11,049s
user  0m5,557s     0m46,919s     0m9,727s
sys   0m0,488s     0m0,068s      0m3,085s

Input Length = 1000

      addspace     addspace_var  sed  

real  0m55,989s    TBD           0m11,534s           
user  0m53,560s    TBD           0m10,214s
sys   0m2,428s     TBD           0m2,975s

(Yeah, I was waiting a bit for that last var one.)

In situations like this you can simply check the length of the input and call the appropriate function for maximum performance.

addspace() {
  if [ ${#1} -lt 100 ]; then
    addspace_builtins "$1"
  else
    addspace_process "$1"
  fi
}
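
The two helpers called by this dispatcher are not shown. One possible set of definitions, as a sketch only: both helper names are hypothetical (the process-based one is spelled addspace_process here), the builtin version uses the '%.1s' form, and the sed version strips the final newline so both print the same thing.

```shell
# Hypothetical helper: the builtin-only loop, for short inputs
addspace_builtins() {
  __addspace_str="$1"
  while [ -n "${__addspace_str#?}" ]; do
    printf '%.1s ' "$__addspace_str"
    __addspace_str="${__addspace_str#?}"
  done
  printf '%.1s' "$__addspace_str"
}

# Hypothetical helper: delegate to sed, for long inputs
addspace_process() {
  # command substitution drops the trailing newline, matching the builtin version
  printf '%s' "$(printf '%s\n' "$1" | sed 's/./& /g; s/ $//')"
}

addspace() {
  if [ "${#1}" -lt 100 ]; then
    addspace_builtins "$1"
  else
    addspace_process "$1"
  fi
}

addspace 'abc'; echo
# prints: a b c
```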

Cockspur answered 5/8, 2022 at 23:52 Comment(0)
