I have a file with a lot of text, and I want to remove all alphanumeric words from it.
Examples of words to be removed:
gr8
2006
sdlfj435ljsa
232asa
asld213
ladj2343asda
asd!32
What is the best way to do this?
If you want to remove all words that mix letters and digits, leaving only words that consist entirely of digits or entirely of letters:
sed 's/\<\([[:alpha:]]\+[[:digit:]]\+[[:alnum:]]*\|[[:digit:]]\+[[:alpha:]]\+[[:alnum:]]*\) \?//g' inputfile
Example:
$ echo 'abc def ghi 111 222 ab3 a34 43a a34a 4ab3' | sed 's/\<\([[:alpha:]]\+[[:digit:]]\+[[:alnum:]]*\|[[:digit:]]\+[[:alpha:]]\+[[:alnum:]]*\) \?//g'
abc def ghi 111 222
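The same substitution can also be written with extended regular expressions, which removes most of the backslashes. This is only a sketch, assuming GNU sed (the `\<` word-start anchor is a GNU extension); `strip_mixed` is a hypothetical helper name, not something from the answer above.

```shell
# strip_mixed: hypothetical helper that removes words mixing letters and digits.
# Assumes GNU sed: -E enables extended regexes, \< anchors at the start of a word.
strip_mixed() {
  sed -E 's/\<([[:alpha:]]+[[:digit:]]+[[:alnum:]]*|[[:digit:]]+[[:alpha:]]+[[:alnum:]]*) ?//g' "$@"
}

# Same sample line as in the example above.
echo 'abc def ghi 111 222 ab3 a34 43a a34a 4ab3' | strip_mixed
```

Note that a space is left behind at the end of the line when the last word is removed, because the pattern only consumes the space that follows a word.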
Assuming the only output you wanted from your sample text is 2006 and you have one word per line:
sed '/[[:alpha:]]\+/{/[[:digit:]]\+/d}' /path/to/alnum/file
$ cat alnum
gr8
2006
sdlFj435ljsa
232asa
asld213
ladj2343asda
asd!32
alpha
$ sed '/[[:alpha:]]\+/{/[[:digit:]]\+/d}' ./alnum
2006
alpha
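For comparison, the same line-level filter can be sketched with grep instead of nested sed addresses. This assumes, as above, one word per line; `drop_mixed_lines` is a hypothetical helper name.

```shell
# drop_mixed_lines: hypothetical helper that deletes every line containing
# at least one letter and at least one digit, in either order.
drop_mixed_lines() {
  grep -Ev '[[:alpha:]].*[[:digit:]]|[[:digit:]].*[[:alpha:]]' "$@"
}

# Same sample data as the alnum file above.
printf '%s\n' gr8 2006 sdlFj435ljsa 232asa asld213 ladj2343asda 'asd!32' alpha | drop_mixed_lines
```

Like the sed command, this keeps only 2006 and alpha for the sample data. One difference to be aware of: grep exits with a non-zero status when no line survives the filter.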
If the goal is actually to remove all alphanumeric words (strings consisting entirely of letters and digits), then this sed command will work. It replaces all alphanumeric strings with nothing.
sed 's/[[:alnum:]]*//g' < inputfile
Note that other character classes besides alnum are also available (see man 7 regex).
For your given example data, this leaves only 6 blank lines and a single ! (since that is the only non-alphanumeric character in the example data). Is this actually what you're trying to do?
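To see that effect before running the command on a real file, a quick check along these lines may help. It is a sketch only; `strip_alnum` is a hypothetical wrapper around the command above.

```shell
# strip_alnum: hypothetical wrapper around the sed command from this answer.
strip_alnum() {
  sed 's/[[:alnum:]]*//g' "$@"
}

# Feed the sample words from the question, one per line.
sample() {
  printf '%s\n' gr8 2006 sdlfj435ljsa 232asa asld213 ladj2343asda 'asd!32'
}

sample | strip_alnum | grep -c '^$'   # counts the blank lines that are left
sample | strip_alnum | grep -v '^$'   # shows whatever non-blank text remains
```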
AWK solution:
BEGIN { # Statement that will be executed once at the beginning.
    FS="[ \t]" # Set space and tab characters to be treated as word separator.
}
# Code below will execute for each line in file.
{
    x=1 # Set initial word index to 1 (0 is the original string in array)
    fw=1 # Indicate that future matched word is a first word. This is needed to put newline and spaces correctly.
    while ( x<=NF )
    {
        gsub(/[ \t]*/,"",$x) # Strip word. Remove any leading and trailing white-spaces.
        if (!match($x,"^[A-Za-z0-9]*$")) # Print word only if it does not match pure alphanumeric set of characters.
        {
            if (fw == 0)
            {
                printf (" %s", $x) # Print the word offsetting it with space in case if this is not a first match.
            }
            else
            {
                printf ("%s", $x) # Print word as is...
                fw=0 # ...and indicate that future matches are not first occurrences
            }
        }
        x++ # Increase word index number.
    }
    if (fw == 0) # Print newline only if we had matched some words and printed something.
    {
        printf ("\n")
    }
}
Assuming you have this script in script.awk and data in data.txt, invoke awk like this:
awk -f ./script.awk ./data.txt
For your file it will produce:
asd!32
For more complex cases like this:
gr8
2006
sdlfj435ljsa
232asa he!he lol
asld213 f
ladj2343asda
asd!32 ab acd!s
... it will produce this:
he!he
asd!32 acd!s
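If a separate script file is inconvenient, the same logic can be condensed into a shell function. This is a sketch rather than a drop-in replacement: `keep_non_alnum` is a hypothetical name, and it relies on awk's default field splitting (runs of spaces and tabs) instead of setting FS explicitly.

```shell
# keep_non_alnum: hypothetical helper, a condensed variant of the script above.
# Uses the default field separator, so repeated spaces or tabs count as one.
keep_non_alnum() {
  awk '{
    out = ""
    for (i = 1; i <= NF; i++) {
      if ($i !~ /^[A-Za-z0-9]*$/) {      # keep words with a non-alphanumeric character
        if (out == "") { out = $i } else { out = out " " $i }
      }
    }
    if (out != "") print out             # skip lines where nothing was kept
  }' "$@"
}

# Same sample data as the more complex case above.
printf '%s\n' gr8 2006 sdlfj435ljsa '232asa he!he lol' 'asld213 f' ladj2343asda 'asd!32 ab acd!s' | keep_non_alnum
```

For the sample data this prints the same two lines as the full script.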
Hope it helps. Good luck!
;/^$/d' command would clean up the output. For example, sed '/[[:alpha:]]\+/{/[[:digit:]]\+/s/.*//g}' alnum would return 2006 and alpha on single lines. – Linkboy