unix tr find and replace

4

16

This is the command I'm using on a standard web page I wget from a web site.

tr '<' '\n<' < index.html

However, it is giving me newlines but not adding the left angle bracket back in, e.g.

echo "<hello><world>" | tr '<' '\n<' | cat -e

returns

$
hello>$
world>$

instead of

$
<hello>$
<world>$

What's wrong?

Sibeal answered 1/12, 2011 at 23:19 Comment(0)
34

That's because tr only does character-for-character substitution (or deletion).
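To make the character-for-character point concrete, here is a small sketch (the function name `question_cmd` is made up for illustration). tr pairs the two sets by position, so a one-character first set yields exactly one mapping, and the extra `<` in the second set is never used.

```shell
# tr pairs the sets by position: 1st char of set 1 -> 1st char of set 2, etc.
printf 'abc\n' | tr 'ab' 'xy'     # a becomes x, b becomes y, c is untouched

# Hypothetical wrapper around the command from the question.
# The first set has one character, so there is only one mapping: '<' -> newline.
# The trailing '<' in the second set has no partner and is not used.
question_cmd() {
  tr '<' '\n<'
}

# Reproduces the output reported in the question: the '<' characters are gone.
printf '<hello><world>\n' | question_cmd | cat -e
```

Because tr can never turn one character into two, inserting a newline before each `<` needs a tool that does string substitution.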

Try sed instead.

echo '<hello><world>' | sed -e 's/</\n&/g'

Or awk.

echo '<hello><world>' | awk '{gsub(/</,"\n<",$0)}1'

Or perl.

echo '<hello><world>' | perl -pe 's/</\n</g'

Or ruby.

echo '<hello><world>' | ruby -pe '$_.gsub!(/</,"\n<")'

Or python.

echo '<hello><world>' \
| python3 -c 'import sys; sys.stdout.write(sys.stdin.read().replace("<", "\n<"))'
Heartily answered 1/12, 2011 at 23:23 Comment(7)
Sibeal: I tried that but I get n<hello>n<world>. I don't know what the sed newline character is.
Revisory: @Sibeal This works for me, but try: echo -e '<hello><world>' | sed -e 's/</\n&/g'
Heartily: @Sibeal \n is a GNU sed extension. What system are you on?
Sibeal: @Heartily SunOS (AFS system on my campus).
Chilpancingo: On SunOS you will have to insert the newline manually. In the replacement field, press Enter and continue with the rest of your replacement. For a tab you will have to type spaces manually (8 times, or whatever the default tab width is on your machine).
Nickeliferous: @Jaypal A string of 8 spaces does not equal a tab; you need a literal tab character. The 8-space thing is about tab stops, not tabs.
Underprop: Use perl when you are on an unspecified Unix machine. Using sed or tr on those machines can reveal that they don't support the features you expect.
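The manual-newline approach from the comments can be sketched as follows. It assumes a POSIX-style sed, where a backslash followed by a literal newline in the replacement inserts a newline; the function name `split_tags` is made up for illustration.

```shell
# Portable alternative to the GNU-only \n escape: end the line with a
# backslash inside the replacement and continue on the next line.
# The continuation must start in column 0, because any leading whitespace
# there would become part of the replacement text.
split_tags() {
  sed 's/</\
</g'
}

printf '<hello><world>\n' | split_tags
```

This form should also work with GNU sed, so it is a reasonable default when the target system is unknown.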
3

If you have GNU grep, this may work for you:

grep -Po '<.*?>[^<]*' index.html

which should pass through all of the HTML, but each tag should start at the beginning of the line with possible non-tag text following on the same line.

If you want nothing but tags:

grep -Po '<.*?>' index.html

You should know, however, that it's not a good idea to parse HTML with regexes.
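If your grep lacks -P, a similar effect may be possible with plain -o and a negated bracket expression in place of the lazy quantifier. This is a swapped-in variant, not the command from the answer; it assumes a grep that supports -o, and the function names are made up for illustration.

```shell
# Variant of the two commands above without PCRE:
# '<[^>]*>' matches one tag, '[^<]*' matches any text up to the next tag.
tags_with_text() {
  grep -o '<[^>]*>[^<]*'
}

# Tags only, one per line.
tags_only() {
  grep -o '<[^>]*>'
}

printf '<p>Hello <b>world</b></p>\n' | tags_with_text
printf '<p>Hello <b>world</b></p>\n' | tags_only
```

Like the -P version, this is line-based, so a tag that spans several lines will not be matched.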

Nonexistence answered 4/12, 2011 at 6:30 Comment(0)
3

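The order of the characters in the second set matters, because tr pairs each character of the first set with the character at the same position in the second. Also, you can escape the "<".

tr '<' '<\n' < index.html

keeps the "<", because "<" is mapped to itself, but it does not add a newline.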

Rubenstein answered 3/10, 2013 at 21:27 Comment(0)
2

Does this work for you?

awk -F"><" -v OFS=">\n<" '{print $1,$2}'

[jaypal:~/Temp] echo "<hello><world>" | awk -F"><" -v OFS=">\n<" '{$1=$1}1';
<hello>
<world>

You can put a regex pattern (/.../) in front of the awk {} action to restrict it to the lines you want this to happen for.
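As a sketch of that pattern-plus-action form, the example below rebuilds only lines that start with `<` and passes every other line through unchanged. The pattern `/^</` and the function name `split_adjacent_tags` are illustrative choices, not part of the original answer.

```shell
# -F'><' splits each line on the two-character sequence "><".
# -v OFS='>\n<' joins the fields back with a newline between > and <
# (awk interprets the \n escape in -v assignments).
# /^</ {$1=$1} rebuilds the record only on lines that begin with '<';
# the trailing 1 prints every line, rebuilt or not.
split_adjacent_tags() {
  awk -F'><' -v OFS='>\n<' '/^</ {$1=$1} 1'
}

printf '<hello><world>\nplain><text\n' | split_adjacent_tags
```

Only `><` pairs are split, so a `<` that follows ordinary text stays on the same line.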

Chilpancingo answered 1/12, 2011 at 23:38 Comment(2)
Heartily: '{$1=$1}1' is shorter and will work if there is more than one >< on a line.
Nickeliferous: This would replace fewer of the < characters than the command in the question.
