Optional Whitespace Regex

I'm having a problem trying to ignore whitespace in between certain characters. I've been Googling around for a few days and can't seem to find the right solution.

Here's my code:

// Get Image data
preg_match('#<a href="(.*?)" title="(.*?)"><img alt="(.*?)" src="(.*?)"[\s*]width="150"[\s*]height="(.*?)"></a>#', $data, $imagematch);
$image = $imagematch[4];

Basically these are some of the scenarios I have:

 <a href="/wiki/File:Sky1.png" title="File:Sky1.png"><img alt="Sky1.png" src="http://media-mcw.cursecdn.com/thumb/5/56/Sky1.png/150px-Sky1.png"width="150" height="84"></a>

(Notice the lack of a space between width="" and src="".)

And

<a href="/wiki/File:TallGrass.gif" title="File:TallGrass.gif"><img alt="TallGrass.gif" src="http://media-mcw.cursecdn.com/3/34/TallGrass.gif" width="150"height="150"></a>

(Notice the lack of a space in between width="" and height="".)

Is there any way to ignore the whitespace between those characters? I am not a regex expert.

Protasis answered 12/1, 2013 at 11:46

Add \s? wherever a single space is allowed but not required.

\s matches one whitespace character (a space, tab, newline, etc.).

? makes the preceding token optional: it may occur once or not at all.

If the whitespace is optional and more than one space may appear, use \s* instead.

* means the preceding token can occur zero or more times.
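A minimal sketch contrasting the two quantifiers on the width/height gap from the question; the sample strings are made up for illustration. \s? accepts zero or one whitespace character, so it rejects the three-space case, while \s* accepts any amount.

```php
<?php
// \s? : zero or one whitespace character; \s* : zero or more.
$optionalOne  = '#width="150"\s?height#';
$optionalMany = '#width="150"\s*height#';

$samples = [
    'width="150"height',    // no space
    'width="150" height',   // one space
    'width="150"   height', // three spaces
];

foreach ($samples as $s) {
    // preg_match returns 1 on a match, 0 otherwise
    printf("%-24s \\s?: %d  \\s*: %d\n", $s, preg_match($optionalOne, $s), preg_match($optionalMany, $s));
}
```

Only the last sample separates them: \s? gives 0 there and \s* gives 1.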

'#<a href\s?="(.*?)" title\s?="(.*?)"><img alt\s?="(.*?)" src\s?="(.*?)"\s*width\s?="150"\s*height\s?="(.*?)"></a>#'

This allows an optional space between each attribute name and the =.
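A sketch that checks this kind of pattern against the two scenarios from the question. It assumes each bracketed [\s*] is written as \s* (a bracketed [\s*] matches exactly one character, so it cannot match a missing space), and reads the image URL from capture group 4, as the question's code does.

```php
<?php
// Optional space before each '=', and zero or more whitespace between attributes.
$pattern = '#<a href\s?="(.*?)" title\s?="(.*?)"><img alt\s?="(.*?)" src\s?="(.*?)"\s*width\s?="150"\s*height\s?="(.*?)"></a>#';

// The two scenarios from the question.
$scenarios = [
    '<a href="/wiki/File:Sky1.png" title="File:Sky1.png"><img alt="Sky1.png" src="http://media-mcw.cursecdn.com/thumb/5/56/Sky1.png/150px-Sky1.png"width="150" height="84"></a>',
    '<a href="/wiki/File:TallGrass.gif" title="File:TallGrass.gif"><img alt="TallGrass.gif" src="http://media-mcw.cursecdn.com/3/34/TallGrass.gif" width="150"height="150"></a>',
];

foreach ($scenarios as $data) {
    if (preg_match($pattern, $data, $imagematch)) {
        $image = $imagematch[4]; // group 4 is the src attribute
        echo $image, "\n";
    } else {
        echo "no match\n";
    }
}
```

Both scenarios match, so the missing space on either side of width="150" no longer breaks the pattern.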

If you also want to allow an optional space after the =, add \s? after it as well.
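A short sketch of \s? on both sides of the =, using the alt attribute from the question as sample input. Each side tolerates at most one space; two spaces on either side would need \s* instead.

```php
<?php
// \s?=\s? : at most one optional space before and after '='.
$pattern = '#alt\s?=\s?"(.*?)"#';

foreach (['alt="Sky1.png"', 'alt ="Sky1.png"', 'alt= "Sky1.png"', 'alt = "Sky1.png"'] as $s) {
    // Print the captured value, or "no match"
    echo preg_match($pattern, $s, $m) ? $m[1] : 'no match', "\n";
}
```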

Likewise, wherever a character is optional, follow it with ? if it can occur at most once, or with * if it can occur any number of times.
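The same rule works for characters other than whitespace. As a hypothetical illustration (the question's markup always closes the tag with a plain >), this sketch accepts both > and a self-closing />: the / gets ? because it appears at most once, while the whitespace before it gets * because any amount is allowed.

```php
<?php
// '\s*' : any amount of whitespace; '/?' : an optional single slash.
// With '#' as the delimiter, '/' needs no escaping.
$pattern = '#<img src="(.*?)"\s*/?>#';

foreach (['<img src="a.png">', '<img src="a.png"/>', '<img src="a.png" />'] as $s) {
    echo preg_match($pattern, $s, $m) ? $m[1] : 'no match', "\n";
}
```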

Your actual problem was [\s*]. Characters enclosed in [ and ] form a character class, which matches exactly one of its members, so [\s*] matches a single whitespace character or a literal *. It can never match zero characters, which is why the pattern failed whenever the space was missing. Remove the * from inside the brackets; if you need a quantifier (?, +, *, etc.), put it after the ], where it applies to the class as a whole.
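A minimal sketch of the difference, using made-up fragments of the question's markup. The bracketed version matches exactly one character (whitespace or *), so it fails when the space is missing and happily accepts a literal *; the quantified \s* does the opposite.

```php
<?php
// [\s*] is a character class: exactly ONE character, whitespace or a literal '*'.
$classPattern = '#"150"[\s*]height#';
// \s* outside brackets: zero or more whitespace characters.
$quantPattern = '#"150"\s*height#';

$inputs = ['"150" height', '"150"height', '"150"*height', '"150"  height'];

foreach ($inputs as $s) {
    printf("%-16s [\\s*]: %d   \\s*: %d\n", $s, preg_match($classPattern, $s), preg_match($quantPattern, $s));
}
```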

Enlargement answered 12/1, 2013 at 11:49 Comment(3)
Thanks! I changed [\s*] to \s? and it works now! :) Thank you! - Protasis
@Protasis \s? means 0 or 1 whitespace characters. However, what if there is more than one whitespace character? Then you want \s*, which matches 0 or more. Btw, you do not want to use regex to parse HTML; you want to use one of these methods. - Discreditable
@naveed-s I'm having an issue with a trailing space in a named capturing group and couldn't get it working. Can you please tell me what I'm missing? Link to RegExp. The word "contact" must be included in the match for searchTerm; that's what I'm trying to achieve. - Longinus
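Discreditable's advice to avoid regex for HTML can be sketched with PHP's built-in DOMDocument class. This is a swapped-in technique, not necessarily one of the methods the comment linked to: a minimal sketch, assuming the dom extension is enabled, that parses both scenario strings from the question and collects the src of every img whose width is 150. Because the HTML parser tokenizes attributes itself, the missing spaces stop mattering.

```php
<?php
// Both scenarios from the question, concatenated into one HTML fragment.
$data = '<a href="/wiki/File:Sky1.png" title="File:Sky1.png"><img alt="Sky1.png" src="http://media-mcw.cursecdn.com/thumb/5/56/Sky1.png/150px-Sky1.png"width="150" height="84"></a>'
      . '<a href="/wiki/File:TallGrass.gif" title="File:TallGrass.gif"><img alt="TallGrass.gif" src="http://media-mcw.cursecdn.com/3/34/TallGrass.gif" width="150"height="150"></a>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($data);
libxml_clear_errors();

$images = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    // Keep only the 150px-wide thumbnails, like the original pattern does
    if ($img->getAttribute('width') === '150') {
        $images[] = $img->getAttribute('src');
    }
}
print_r($images);
```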
