Extract all hex colors from a multiline CSS string
D

5

40

I'm trying to write regex that extracts all hex colors from CSS code.

This is what I have now:

Code:

$css = <<<CSS

/* Do not match me: #abcdefgh; I am longer than needed. */

.foo
{
    color: #cccaaa; background-color:#ababab;
}

#bar
{
    background-color:#123456
}
CSS;

preg_match_all('/#(?:[0-9a-fA-F]{6})/', $css, $matches);

Output:

Array
(
    [0] => Array
        (
            [0] => #abcdef
            [1] => #cccaaa
            [2] => #ababab
            [3] => #123456
        )

)

I don't know how to specify that a color should only be matched if it is followed by punctuation, whitespace or a newline.

Dialectical answered 11/10, 2012 at 10:54 Comment(2)
Don't bother with regex. See @modu's answer. if(ctype_xdigit($color) && strlen($color)==6).Fillet
Make sure not to have the hash # inside the string $color, otherwise ctype_xdigit($color) returns false.Faythe
C
63

Since a hex color code may also consist of 3 characters, you can define a mandatory group and an optional group of letters and digits, so the long and elaborate notation would be:

/#([a-f]|[A-F]|[0-9]){3}(([a-f]|[A-F]|[0-9]){3})?\b/

Or if you want a nice and short version, you can say that you want either 1 or 2 groups of 3 alphanumeric characters, and that they should be matched case insensitively (/i).

/#([a-f0-9]{3}){1,2}\b/i

Instead of [a-f0-9] you can also write [[:xdigit:]], if the regex engine supports this POSIX character class. In this case you can skip the /i at the end, and the whole pattern is only three characters longer, but arguably more descriptive.

/#([[:xdigit:]]{3}){1,2}\b/
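As a quick check, here is a minimal sketch that runs the short pattern against the sample CSS from the question. It assumes PHP with the default PCRE extension; the variable names are only illustrative.

```php
<?php
// Sample input taken from the question.
$css = <<<CSS
/* Do not match me: #abcdefgh; I am longer than needed. */
.foo
{
    color: #cccaaa; background-color:#ababab;
}
#bar
{
    background-color:#123456
}
CSS;

// '#', then one or two groups of three hex digits, then a word boundary.
// The i flag makes the character class case-insensitive.
preg_match_all('/#([a-f0-9]{3}){1,2}\b/i', $css, $matches);

// $matches[0] holds the full matches.
print_r($matches[0]); // #cccaaa, #ababab, #123456
```

The eight-character #abcdefgh is skipped because there is no word boundary between f and g, and the selector #bar is skipped because r is not a hex digit.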
Clichy answered 11/10, 2012 at 10:58 Comment(8)
Thanks, \b is what was needed. Not sure why there is "?" though. Anyway, this works as needed: /#(?:[0-9a-fA-F]{6})\b/ Forgot to mention that 3-char codes are not needed.Dialectical
A question mark matches zero or one occurrence of the preceding element, making the second captured group optional.Sharp
Those alternations are pointless. Here's a simpler version: /#([a-fA-F0-9]){3}(([a-fA-F0-9]){3})?\b/Staal
@HamZa I reverted your change. The extra explanation was nice, but you also changed the regex itself to a completely different one. If you want to make big changes like that, it's better to supply a separate answer than to completely rebuild the accepted answer (or any other answer, for that matter).Clichy
@Clichy I just stumbled on this Q&A when I was searching for a duplicate. This one stood out; I was a bit scared when I saw this regex. I'm sure your regex skills have improved a lot in 2 years, but there were a lot of redundant things, so I decided to give it a polish. I know that it's a bit rude on my part to make this edit, but with more than 2.5K views I really thought the accepted answer should look a bit more elegant. As a quick googler, I tend to scroll to the accepted answer first. Note that I don't make such big changes that often. It's quite rare. Sorry for the interruption.Evelineevelinn
@Evelineevelinn Thanks, no problem. Actually my regex skills haven't improved that much, since I use them sparingly. One reason for that is the poor readability. I'm happy with the one I wrote, because it is very readable even if it is a bit redundant. Just as with 'normal' code, I think that shorter isn't necessarily better. I would have left your version though, if it had been an addition rather than a complete replacement of my answer.Clichy
On the updated short example, why does it break if I remove the #. It will start matching four and five digits - "FFFa" "000aF", etc. It matches the three characters in the middle, where with the # it must be three or six. BTW, this reg-exp matches "#FFF#FF" which it probably shouldn'tStratum
@Stratum It matches color codes in a string. It will match the #FFF in #FFF#FF, but not the last part. If you want to match the exact string, you could add string boundary 'anchors' to the regex, making it something like ^#([a-f0-9]{3}){1,2}\b$. See https://regex101.com/r/LZJr63/1 for a breakdown.Clichy
D
44

The accepted answer shows you how to extract hexcodes with regex, because that is the asked question.

If you, instead, want to validate whether an isolated string (without a preceding #) IS a hex code, you can use two non-regex functions.

if (ctype_xdigit($color) && strlen($color)==6) {
    // yay, it's a hex color!
}
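The same check can be wrapped in a small function. This is only a sketch: isHexColor is a hypothetical name, not something from the answer, and it assumes the ctype extension is enabled (it is by default).

```php
<?php
// Hypothetical helper: true when $color is exactly six hex digits
// with no leading '#'.
function isHexColor(string $color): bool
{
    return strlen($color) === 6 && ctype_xdigit($color);
}

var_dump(isHexColor('ababab'));  // bool(true)
var_dump(isHexColor('#ababab')); // bool(false): length is 7 and '#' is not a hex digit
var_dump(isHexColor('abcdeg'));  // bool(false): 'g' is not a hex digit
```

Checking the length first is a cheap way to reject most invalid input before ctype_xdigit runs.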
Doi answered 10/11, 2016 at 11:14 Comment(10)
Who's going to call this function 100,000 times?Nonlinearity
Sorry, but these kind of things are crazy. That function will be called what, 5 times at the most in any given PHP file? So we're talking about a fraction of a millisecond?Nonlinearity
but I want to check with # would you please help meFrance
@SachinSarola if that's the case, it's easier to use the regex solution. This is how you can do it without regex: if(ctype_xdigit(substr($color,1)) && strlen(ltrim($color,"#"))==6){ }Draff
@modu thanks for the reply, but if we just put the length condition first we get a faster result, if I'm not wrongFrance
@SachinSarola that is right, if the input is not a valid hex code. If it is, then both will give you similar performance scores. But still, the regex solution would be faster than both if you want to check for the #. You can also ltrim the hashtag first, and add a condition(to the one in my answer) for checking difference between strlen is equal to 1. It will perform better than the one at my answer. But regex is still the way to go, unless you will use the trimmed version later in your code(storing etc).Draff
Despite my answer I'm reluctant to use regex normally. But the question was to extract the color codes from a string. That's quite simple with regex, but would require a parser that takes at least a couple of lines if written out in code. This function checks if a string is an exact color code, which can be very useful, but does not answer this question.Clichy
Shouldn't it be 7 characters?Bullivant
Performance comparisons? First, the scripts need to do the same thing! This script does not extract qualifying substrings from a text. This script validates if a string is a hex color code. Completely different tasks. This answer does not satisfy the asked question.Edmonds
@Edmonds Yes, the original title was "Regex for matching CSS hex colors", and as I stated in my answer, this wasn't meant to 100% satisfy the original question, but to show another way of matching CSS hex colors. You are right, and that's the reason this is not an accepted solution. But it helps other people with similar questions who end up on this page quite often, so I keep it here.Draff
I
21

Shorter version of GolezTrol's answer that avoids writing the character set twice:

/#([a-fA-F0-9]{3}){1,2}\b/
Inmost answered 7/4, 2015 at 4:4 Comment(1)
You can make it even shorter by using the i case-insensitive match flag. /#([a-f0-9]{3}){1,2}\b/iCockswain
R
2

Despite this question's age I'd like to add the following:

^#([[:xdigit:]]{3}){1,2}$, where [[:xdigit:]] is a shorthand for [a-fA-F0-9].

So:
<?php preg_match_all("/^#(?>[[:xdigit:]]{3}){1,2}$/", $css, $matches) ?>

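Also noteworthy here is the use of the group (?>...). Strictly speaking this is an atomic group rather than the plain non-capturing group (?:...), but like (?:...) it captures nothing, so we don't store data in memory that we never wanted to store in the first place.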
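Note that without the m flag, the ^ and $ anchors make this pattern check a whole string rather than find colors inside a larger CSS text. A minimal sketch of that validation use follows. It swaps in the plain non-capturing group (?:...), which should behave the same as the atomic group here, and the sample inputs are made up for illustration.

```php
<?php
// Anchored pattern: the whole string must be '#' plus three or six hex digits.
$pattern = '/^#(?:[[:xdigit:]]{3}){1,2}$/';

var_dump(preg_match($pattern, '#FFF'));         // int(1)
var_dump(preg_match($pattern, '#ababab'));      // int(1)
var_dump(preg_match($pattern, '#FFFF'));        // int(0): four digits
var_dump(preg_match($pattern, 'color: #FFF;')); // int(0): extra text around the color
```

To extract colors from a multiline string instead, the unanchored patterns with \b from the other answers are the better fit.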


Revert answered 3/8, 2017 at 7:56 Comment(0)
D
1

I'm not entirely sure if I got this right, but if you only want to match hex colors at the end of a CSS line:

preg_match_all('/#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})[\s;]*\n/',$css,$matches);

should work. All I did was add the optional [\s;]* character class (any number of semicolons and whitespace characters) followed by a mandatory line-break character, and it seemed to work.
And as @GolezTrol pointed out #FFF; is valid, too.

When tested on this:

$css = '/* Do not match me: #abcdefgh; I am longer than needed. */
.foo
{
    color: #CAB;
    background-color:#ababab;
}';
preg_match_all('/#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})[\s;]*\n/',$css,$matches);
var_dump($matches);

The output was (simplified; each full match also includes the line break consumed by the trailing \n):

array (array('#CAB;','#ababab;'))
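If the trailing semicolon and line break should not be part of the match, one possible variation (not from the answer above) is to move the end-of-line check into a lookahead and use the m flag. A rough sketch, using the same test string:

```php
<?php
$css = '/* Do not match me: #abcdefgh; I am longer than needed. */
.foo
{
    color: #CAB;
    background-color:#ababab;
}';

// The lookahead (?=[\s;]*$) requires only whitespace or semicolons up to the
// end of the line, without consuming them. The m flag lets $ match at each
// line break; the i flag ignores case.
preg_match_all('/#(?:[0-9a-f]{6}|[0-9a-f]{3})(?=[\s;]*$)/im', $css, $matches);

print_r($matches[0]); // #CAB and #ababab, without ';' or line breaks
```

Because the lookahead does not consume anything, the matched strings are the bare color codes and can be used directly.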
Depew answered 11/10, 2012 at 11:0 Comment(0)
