How to replace whitespaces with underscore?

Asked 17/6, 2009 at 14:41 Answered 28/11, 2021 at 6:25

312

I want to replace whitespace with underscore in a string to create nice URLs. So that for example:

"This should be connected"

Should become

"This_should_be_connected"

I am using Python with Django. Can this be solved using regular expressions?

Iterate answered 17/6, 2009 at 14:41 Comment(1)

How can this this be achieved in django template. Is there any way to remove white spaces. Is there any built in tag/filter to do this? Note: slugify doesn't gives the desired output. – Gitagitel 12/3, 2012 at 17:52

526

You don't need regular expressions. Python has a built-in string method that does what you need:

mystring.replace(" ", "_")

Epigraph answered 17/6, 2009 at 14:44 Comment(8)

This doesn't work with other whitespace characters, such as \t or a non-breaking space. – Impala 17/6, 2009 at 15:49

Yes you are correct, but for the purpose of the question asked, it doesn't seem necessary to take those other spaces into account. – Epigraph 17/6, 2009 at 17:39

do I need to import anything for this to work? I get the following error: AttributeError: 'builtin_function_or_method' object has no attribute 'replace' – Scatology 31/10, 2012 at 2:23

Probably the variable that you called replace on, was not a string type. – Stithy 10/8, 2015 at 9:10

This answer could be confusing, better write it as mystring = mystring.replace(" ", "_") as it doesn't directly alter the string but rather returns a changed version. – Neely 2/11, 2018 at 9:37

Thanks, This is working. Would be better if written like: mystring = mystring.replace(" ", "_") – Tennilletennis 9/10, 2019 at 11:15

does not work with non-breaking spaces, use re.sub(r"\s+", '', content) instead – Winny 9/11, 2019 at 16:44

Whitespace >>>> literal space character. The question title explicitly states "whitespace"; the question content then reiterates that. As @Winny suggests, re.sub(r'\s+', '_', content) is the canonical solution. And yet again the best answer is a comment with two upvotes. </facepalm> – Voiceless 31/3, 2022 at 5:24

113

Replacing spaces is fine, but I might suggest going a little further to handle other URL-hostile characters like question marks, apostrophes, exclamation points, etc.

Also note that the general consensus among SEO experts is that dashes are preferred to underscores in URLs.

import re

def urlify(s):

    # Remove all non-word characters (everything except numbers and letters)
    s = re.sub(r"[^\w\s]", '', s)

    # Replace all runs of whitespace with a single dash
    s = re.sub(r"\s+", '-', s)

    return s

# Prints: I-cant-get-no-satisfaction"
print(urlify("I can't get no satisfaction!"))

Pagoda answered 17/6, 2009 at 15:3 Comment(7)

This is interesting. I will definitely use this advice. – Iterate 17/6, 2009 at 15:8

Remember to urllib.quote() the output of your urlify() - what if s contains something non-ascii? – Eisk 19/6, 2009 at 7:17

This is nice - but the first RE with \W will also remove whitespace with the result that the subsequent RE has nothing to replace... If you want to replace your other characters with '-' between tokens have the first RE replace with a single space as indicated - i.e. s = re.sub(r"\W", '&nbsp', s) (this may be a shonky formatting issue on StackOverflow: meta.stackexchange.com/questions/105507/…) – Browse 12/6, 2012 at 15:45

@TimTheEnchanter - good catch. Fixed. What is the airspeed velocity of an unladen swallow? – Pagoda 12/6, 2012 at 17:0

@Triptych What do you mean? African or European swallow? – Browse 13/6, 2012 at 10:57

Another slight problem with this is you remove any preexisting hyphens in the url, so that if the user had attempted to clean the url string before uploading to be this-is-clean, it would be stripped to thisisclean. So s = re.sub(r'[^\w\s-]', '', s). Can go one step further and remove leading and trailing whitespace so that the filename doesn't end or start with a hyphen with s = re.sub(r'[^\w\s-]', '', s).strip() – Cartesian 17/7, 2012 at 5:12

re.sub('[%s\s]+' % '-', '-', s) this works for utf-8 chars – Uvular 29/5, 2019 at 9:58

This takes into account blank characters other than space and I think it's faster than using re module:

url = "_".join( title.split() )

Nicobarese answered 29/4, 2012 at 23:18 Comment(6)

More importantly it will work for any whitespace character or group of whitespace characters. – Frayne 8/5, 2013 at 15:17

This solution does not handle all whitespace characters. (e.g. \x8f) – Prophylactic 5/12, 2016 at 14:43

Good catch, @Lokal_Profil! The documentation doesn't specify which whitespace chars are taken into account. – Nicobarese 6/12, 2016 at 7:39

This solution will also not preserve repeat delimiters, as split() does not return empty items when using the default "split on whitespace" behavior. That is, if the input is "hello,(6 spaces here)world", this will result in "hello,_world" as output, rather than "hello,______world". – Cambric 4/12, 2018 at 14:29

regexp > split/join > replace – Overrule 17/3, 2022 at 12:23

This is especially helpful, if you want to replace any number of whitespace characters with 1 character. Like in "reducing all whitespace to 1 space" szenarios. Very handy to remove line breaks, tabs and multi-` ` etc. from a string. – Deuteronomy 3/5, 2023 at 13:28

Django has a 'slugify' function which does this, as well as other URL-friendly optimisations. It's hidden away in the defaultfilters module.

>>> from django.template.defaultfilters import slugify
>>> slugify("This should be connected")

this-should-be-connected

This isn't exactly the output you asked for, but IMO it's better for use in URLs.

Halyard answered 17/6, 2009 at 15:15 Comment(6)

That is an interesting option, but is this a matter of taste or what are the benefits of using hyphens instead of underscores. I just noticed that Stackoverflow uses hyphens like you suggest. But digg.com for example uses underscores. – Iterate 17/6, 2009 at 15:23

This happens to be the preferred option (AFAIK). Take your string, slugify it, store it in a SlugField, and make use of it in your model's get_absolute_url(). You can find examples on the net easily. – Byelorussian 17/6, 2009 at 16:13

@Lulu people use dashes because, for a long time, search engines treated dashes as word separators and so you'd get an easier time coming up in multi-word searches. – Theodore 19/6, 2009 at 20:8

@Daniel Roseman can I use this with dynamically variable. as I am getting dynamic websites as string in a veriable – Killer 22/8, 2017 at 7:30

This is the right answer. You need to sanitize your URLs. – Bartram 6/5, 2018 at 16:40

This doesn't work with utf-8 characters, I tested it with Arabic and it returned "" empty string – Uvular 29/5, 2019 at 9:51

Using the re module:

import re
re.sub('\s+', '_', "This should be connected") # This_should_be_connected
re.sub('\s+', '_', 'And     so\tshould this')  # And_so_should_this

Unless you have multiple spaces or other whitespace possibilities as above, you may just wish to use string.replace as others have suggested.

Joceline answered 17/6, 2009 at 14:45 Comment(2)

Thank you, this was exactly what I was asking for. But I agree, the "string.replace" seems more suitable for my task. – Iterate 17/6, 2009 at 14:59

PEP8 replace this '\s+' with this r'\s+'. Info: flake8rules.com/rules/W605.html – Jugum 18/10, 2021 at 16:58

use string's replace method:

"this should be connected".replace(" ", "_")

"this_should_be_disconnected".replace("_", " ")

Trstram answered 17/6, 2009 at 14:45 Comment(0)

You can try this instead:

mystring.replace(r' ','-')

Defy answered 6/5, 2018 at 15:28 Comment(2)

this should have been the correct answer not rogeriopvl response...how could he/she get 489 up vote when it does not even work ! – Sorghum 23/9, 2022 at 0:20

except it's not using underscore – Roybal 18/5, 2023 at 7:46

Python has a built in method on strings called replace which is used as so:

string.replace(old, new)

So you would use:

string.replace(" ", "_")

I had this problem a while ago and I wrote code to replace characters in a string. I have to start remembering to check the python documentation because they've got built in functions for everything.

Protomartyr answered 18/6, 2009 at 12:34 Comment(0)

Surprisingly this library not mentioned yet

python package named python-slugify, which does a pretty good job of slugifying:

pip install python-slugify

Works like this:

from slugify import slugify

txt = "This is a test ---"
r = slugify(txt)
self.assertEquals(r, "this-is-a-test")

txt = "This -- is a ## test ---"
r = slugify(txt)
self.assertEquals(r, "this-is-a-test")

txt = 'C\'est déjà l\'été.'
r = slugify(txt)
self.assertEquals(r, "cest-deja-lete")

txt = 'Nín hǎo. Wǒ shì zhōng guó rén'
r = slugify(txt)
self.assertEquals(r, "nin-hao-wo-shi-zhong-guo-ren")

txt = 'Компьютер'
r = slugify(txt)
self.assertEquals(r, "kompiuter")

txt = 'jaja---lol-méméméoo--a'
r = slugify(txt)
self.assertEquals(r, "jaja-lol-mememeoo-a")

Untouched answered 28/9, 2015 at 16:1 Comment(0)

I'm using the following piece of code for my friendly urls:

from unicodedata import normalize
from re import sub

def slugify(title):
    name = normalize('NFKD', title).encode('ascii', 'ignore').replace(' ', '-').lower()
    #remove `other` characters
    name = sub('[^a-zA-Z0-9_-]', '', name)
    #nomalize dashes
    name = sub('-+', '-', name)

    return name

It works fine with unicode characters as well.

Decrescendo answered 17/6, 2009 at 15:36 Comment(1)

Could you explain where this differs from the built-in Django slugify function? – Oakley 18/6, 2009 at 9:38

mystring.replace (" ", "_")

if you assign this value to any variable, it will work

s = mystring.replace (" ", "_")

by default mystring wont have this

Irvine answered 31/7, 2016 at 16:51 Comment(0)

OP is using python, but in javascript (something to be careful of since the syntaxes are similar.

// only replaces the first instance of ' ' with '_'
"one two three".replace(' ', '_'); 
=> "one_two three"

// replaces all instances of ' ' with '_'
"one two three".replace(/\s/g, '_');
=> "one_two_three"

Mostly answered 4/7, 2010 at 23:34 Comment(0)

x = re.sub("\s", "_", txt)

Lotze answered 28/11, 2021 at 6:25 Comment(4)

There are already 2 answers suggesting the match pattern \s+ for the replacement. Can you explain what are the differences/benefits of using your pattern (without the +)? – Johnsiejohnson 28/11, 2021 at 8:33

When you use \s,it will matches only for a single whitespace character.But when you use \s+,it will match one or more occurrence of white spaces. Example: text = "hi all" re.sub("\s", "_", text) re.sub("\s+", "_", text) Output for \s : hi__all Output for \s+ : hi_all With this example \s will mach only for one space.So when two white spaces is given consecutive on input,it will replace _ for each of them.On other hand \s+ will match for one space.So it will replace one _ for each consecutive white spaces. – Lotze 28/11, 2021 at 12:29

@AllenJose, all of that belongs in your answer, not in a comment. Comments can be deleted at any time for any reason. Please read How to Answer. – Crosslet 28/11, 2021 at 13:25

@AllenJose My comment was more to encourage you to edit your answer with that information. I personally know that difference but actually am not sure what is the benefit of your way. It seems to make more sense to replace multiple consecutive spaces with a single underscore. Again, this should be explained in your answer to help other readers make the distinction according to their needs – Johnsiejohnson 28/11, 2021 at 14:49

-4

perl -e 'map { $on=$_; s/ /_/; rename($on, $_) or warn $!; } <*>;'

Match et replace space > underscore of all files in current directory

Textbook answered 19/6, 2009 at 7:30 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags