Create URL friendly slug with pure bash?

I am after a pure bash way to "slugify" a variable that is less ugly than mine.

Slugify here means: lowercased, shortened to 63 bytes, and with every run of characters other than 0-9 and a-z replaced with -, with no leading or trailing -. The result should be a string suitable for use in URL hostnames and domain names. The input is most likely a series of words with undesired characters throughout, such as:

'Effrafax_mUKwT'uP7(Garkbit<\1}@NJ"RJ"Hactar*S;-H%x.?oLazlarl(=Zss@c9?qick.:?BZarquonelW{x>g@'k'

Of which a slug would look like: 'effrafax-mukwt-up7-garkbit-1-njrjhactar-s-h-x-olazlarl-zss-c9-q'

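slugify () {
  shopt -s extglob                    # the +(...) pattern below needs extglob
  local next=${1//+([^A-Za-z0-9])/-}  # each run of unwanted characters becomes one -
  next=${next:0:63}                   # keep at most 63 characters
  next=${next,,}                      # lowercase (bash 4+)
  next=${next#-}                      # drop one leading -
  next=${next%-}                      # drop one trailing -
  echo "$next"
}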

Also why doesn't ${next//^-|-$} strip the prefix and suffix '-'? Other suggestions?

Gramps answered 1/11, 2017 at 8:6 Comment(2)
Can you give one such sample URL to test on? - Donndonna
Posting the input entries and expected output will increase your chances of getting quick help. - Terina
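
On the `${next//^-|-$}` part of the question: bash parameter expansion matches glob patterns, not regular expressions, so `^`, `|` and `$` are taken literally there (`|` only means alternation inside an extglob group such as `@(a|b)`). Anchoring is done with `#` and `%` instead, for example `${next/#-}` and `${next/%-}`. The following is a minimal pure bash sketch along those lines, not a definitive implementation. The name `slugify_pure` is made up for illustration, it assumes bash 4+ with `extglob` enabled, and it lowercases first so that the `a-z` range does not depend on how the locale sorts uppercase letters.

```shell
#!/usr/bin/env bash
shopt -s extglob                 # enables +(...) in patterns

# Hypothetical helper name; a sketch, assuming bash 4+ for ${var,,}.
slugify_pure() {
  local s=${1,,}                 # lowercase the input
  s=${s//+([^a-z0-9])/-}         # each run of other characters -> a single -
  s=${s:0:63}                    # keep at most 63 characters
  s=${s##+(-)}                   # strip any leading -
  s=${s%%+(-)}                   # strip any trailing - (truncation can leave one)
  printf '%s\n' "$s"
}
```

Trimming after the truncation matters: cutting at 63 characters can end the string on a `-`, which would otherwise survive.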
G
10

I'm using this function in my bash profile:

slugify () {
    echo "$1" | iconv -t ascii//TRANSLIT | sed -r 's/[~^]+//g' | sed -r 's/[^a-zA-Z0-9]+/-/g' | sed -r 's/^-+|-+$//g' | tr A-Z a-z
}

Based on: https://gist.github.com/oneohthree/f528c7ae1e701ad990e6

Goodly answered 28/2, 2018 at 18:4 Comment(1)
Works great! Add -c to iconv to silently discard characters that cannot be converted. ss64.com/bash/iconv.html - Liver
D
11

An OS X and Linux compatible variant of the answer above. It uses sed -E, which both BSD and GNU sed accept, instead of the GNU-only -r:

slugify () {
    echo "$1" | iconv -c -t ascii//TRANSLIT | sed -E 's/[~^]+//g' | sed -E 's/[^a-zA-Z0-9]+/-/g' | sed -E 's/^-+|-+$//g' | tr A-Z a-z
}
Damage answered 6/8, 2020 at 14:44 Comment(2)
Remember that you can combine multiple sed statements, like this: - Concern
Exchanging "$1" with "$@" or "$*" will allow you to turn $ slugify test test into test-test without the need for quotes ($ slugify "test test"). - Peavey
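
The two comments above can be sketched together as follows. This is one possible reading, not the commenters' own code: the three sed invocations are merged into a single call with repeated `-e` options, and `"$*"` joins all arguments with spaces so that unquoted words still end up in one slug. It assumes `iconv` and a sed that accepts `-E` are available.

```shell
#!/usr/bin/env bash

# Sketch of the answer's pipeline with one sed call and "$*" instead of "$1".
slugify() {
  printf '%s\n' "$*" |
    iconv -c -t ascii//TRANSLIT |   # transliterate to ASCII, drop what cannot be converted
    sed -E -e 's/[~^]+//g' \
           -e 's/[^a-zA-Z0-9]+/-/g' \
           -e 's/^-+|-+$//g' |      # remove ~ and ^, collapse other runs to -, trim the ends
    tr 'A-Z' 'a-z'                  # lowercase
}

slugify test test        # prints: test-test
```

Unlike the function in the question, this variant does not shorten the result to 63 characters.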