In a bash script, how do I sanitize user input?

5

54

I'm looking for the best way to take a simple input:

echo -n "Enter a string here: "
read -e STRING

and clean it up by removing non-alphanumeric characters, converting it to lowercase, and replacing spaces with underscores.

Does order matter? Is tr the best / only way to go about this?

Rale answered 18/9, 2008 at 2:56 Comment(0)
57

As dj_segfault points out, the shell can do most of this for you. It looks like you'll have to fall back on something external for lowercasing the string, though (at least in Bash versions before 4). For this you have many options, like the perl one-liners in the other answers, but I think tr is probably the simplest.

# first, strip underscores
CLEAN=${STRING//_/}
# next, replace spaces with underscores
CLEAN=${CLEAN// /_}
# now, clean out anything that's not alphanumeric or an underscore
CLEAN=${CLEAN//[^a-zA-Z0-9_]/}
# finally, lowercase with tr (printf and quoting avoid echo's option parsing and word splitting)
CLEAN=$(printf '%s' "$CLEAN" | tr 'A-Z' 'a-z')

The order here is somewhat important. We want to get rid of underscores, plus replace spaces with underscores, so we have to be sure to strip underscores first. By waiting to pass things to tr until the end, we know we have only alphanumeric and underscores, and we can be sure we have no spaces, so we don't have to worry about special characters being interpreted by the shell.
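To see why the order matters, the steps above can be wrapped in a function and tried on a sample string. This is only a minimal sketch: the function name sanitize and the sample input are made up for illustration, and LC_ALL=C is an added assumption to keep the tr ranges limited to plain ASCII.

```bash
#!/usr/bin/env bash
# Hypothetical helper; the name "sanitize" is not from the answer above.
sanitize() {
    local s=$1
    s=${s//_/}                # 1. strip existing underscores
    s=${s// /_}               # 2. spaces become underscores
    s=${s//[^a-zA-Z0-9_]/}    # 3. drop anything that is not alphanumeric or "_"
    # 4. lowercase; LC_ALL=C keeps A-Z and a-z limited to plain ASCII
    printf '%s\n' "$s" | LC_ALL=C tr 'A-Z' 'a-z'
}

sanitize 'Hello, World_Wide Web!'    # prints hello_worldwide_web
```

Swapping steps 1 and 2 would delete the underscores that had just replaced the spaces, which is why the underscores are stripped first.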

Oversight answered 18/9, 2008 at 17:4 Comment(5)
Note to reader: If you are having trouble making this work, check your shebang to see whether you're calling bash or sh, and how your system interprets 'sh'. - Threewheeler
As of Bash 4, it can do case modification as well: lowercase=${CLEAN,,}. The Bash Hackers Wiki explains parameter expansions in a more human-readable way than the man pages. - Publishing
Nice work. I wasn't previously aware of these shell features. Thanks! I just discovered that zsh allows you to actually nest all of these, so you can do it in one line: echo -n ${${${str//_/}// /_}//[^a-zA-Z0-9_]/} | tr A-Z a-z ..not that I would recommend putting something that incomprehensible in a script. :) (edit: formatting) - Dallasdalli
Very nice. It may also need LC_ALL=C before all the a-z and A-Z invocations to be sure it doesn't leave anything unexpected behind (depending on your locale, or someone else's, a-z, A-Z, and maybe even 0-9 can mean many different things). - Aromatize
You can also always declare -l foo, and anything assigned to it will be lowercased, just as declare -u forces it to uppercase. - Springtime
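The Bash 4 features mentioned in these comments can be tried out with a short sketch like the following. The variable values are invented for illustration, and everything here assumes Bash 4.0 or later.

```bash
#!/usr/bin/env bash
# Requires Bash >= 4.0.
CLEAN='Hello_World'
lowercase=${CLEAN,,}      # case-modifying parameter expansion
echo "$lowercase"         # prints hello_world

declare -l foo            # anything assigned to foo is lowercased
foo='MiXeD Case'
echo "$foo"               # prints mixed case

declare -u bar            # anything assigned to bar is uppercased
bar='shout'
echo "$bar"               # prints SHOUT
```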
46

Bash can do this all on its own, thank you very much. If you look at the section of the man page on Parameter Expansion, you'll see that bash has built-in substitutions, substring, trim, rtrim, etc.

To eliminate all non-alphanumeric characters, do

CLEANSTRING=${STRING//[^a-zA-Z0-9]/}

That's Occam's razor. No need to launch another process.
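A short sketch of that expansion, alongside the substring and trim forms the man page describes. The sample value and the extra variable names are made up for illustration.

```bash
#!/usr/bin/env bash
STRING='Foo Bar_baz-42!'                 # made-up sample input

CLEANSTRING=${STRING//[^a-zA-Z0-9]/}     # global substitution
echo "$CLEANSTRING"                      # prints FooBarbaz42

FIRST3=${STRING:0:3}                     # substring: offset 0, length 3
echo "$FIRST3"                           # prints Foo

AFTER_SPACE=${STRING#* }                 # trim shortest prefix matching "* "
echo "$AFTER_SPACE"                      # prints Bar_baz-42!

BEFORE_DASH=${STRING%-*}                 # trim shortest suffix matching "-*"
echo "$BEFORE_DASH"                      # prints Foo Bar_baz
```

Note that this substitution alone neither lowercases the result nor turns spaces into underscores.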

Unlikelihood answered 18/9, 2008 at 4:18 Comment(2)
Well put, great answer. I was using parameter expansion without even realizing it. - Rale
It is a good answer for a subset of the specifications, but it doesn't change spaces to underscores. - Chapin
4

For Bash >= 4.0:

CLEAN="${STRING//_/}" && \
CLEAN="${CLEAN// /_}" && \
CLEAN="${CLEAN//[^a-zA-Z0-9_]/}" && \
CLEAN="${CLEAN,,}"

This is especially useful for creating container names programmatically using docker/podman. However, in this case you'll also want to remove the underscores:

# Sanitize $STRING for a container name
CLEAN="${STRING//[^a-zA-Z0-9]/}" && \
CLEAN="${CLEAN,,}"
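
The container-name variant could be wrapped in a small function, sketched below. The name container_name and the empty-result check are assumptions added for illustration, not part of the answer, and the code needs Bash 4.0 or later.

```bash
#!/usr/bin/env bash
# Hypothetical helper; requires Bash >= 4.0 for ${var,,}.
container_name() {
    local clean=${1//[^a-zA-Z0-9]/}   # keep ASCII letters and digits only
    clean=${clean,,}                  # lowercase
    if [[ -z $clean ]]; then
        echo "container_name: nothing left after sanitizing" >&2
        return 1
    fi
    printf '%s\n' "$clean"
}

container_name 'My App_v2 (test)'     # prints myappv2test
```

Failing on an empty result is a design choice here: an input made up entirely of punctuation would otherwise produce an empty name.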
Tetracaine answered 26/1, 2020 at 14:43 Comment(0)
0

After a bit of looking around it seems tr is indeed the simplest way:

export CLEANSTRING="$(printf '%s' "${STRING}" | tr -cd '[:alnum:][:space:]' | tr '[:space:]' '_' | tr '[:upper:]' '[:lower:]')"

Occam's razor, I suppose.

Rale answered 18/9, 2008 at 2:56 Comment(1)
If you set STRING=$(rm /tmp/*) and you echo the $STRING before cleaning, it will execute the sub-shell and remove your /tmp/ content... so you need to sanitize it BEFORE any echo is done. - Anabasis
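This concern is easy to check directly. As far as Bash's expansion rules go, text that arrives through read is stored literally, and expanding it with echo "$STRING" does not run it. A command substitution only runs when it is part of the script source itself or when the value is handed to something like eval. The sketch below simulates the user typing such a string; the marker path is a made-up name used only to detect whether the command ran.

```bash
#!/usr/bin/env bash
marker="${TMPDIR:-/tmp}/sanitize_demo_$$"   # hypothetical marker path
rm -f "$marker"

# Simulate a user typing a command substitution at the prompt.
read -r STRING <<< "\$(touch $marker)"

echo "$STRING"      # prints the literal text; nothing is run
if [[ -e $marker ]]; then
    echo "command was executed"
else
    echo "command was NOT executed"
fi
```

Quoting the variable is still worthwhile, since an unquoted $STRING is subject to word splitting and filename globbing.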
-1

You could run it through perl.

export CLEANSTRING=$(perl -e 'print join( q//, map { s/\s+/_/g; lc } split /[^\s\w]+/, $ENV{STRING} )')

I'm using ksh-style command substitution, $(...), here; bash supports it as well. Note that STRING has to be exported for perl to see it in %ENV.

The nice thing about shell is that you can use perl, awk, sed, grep....

Chapin answered 18/9, 2008 at 3:36 Comment(0)
