Using preg_replace() to convert alphanumeric strings from camelCase to kebab-case

I have a method now that will convert my camelCase strings to kebab-case, but it's broken into three calls of preg_replace():

public function camelToKebab($string, $us = "-")
{
    // insert hyphen between any letter and the beginning of a numeric chain
    $string = preg_replace('/([a-z]+)([0-9]+)/i', '$1'.$us.'$2', $string);
    // insert hyphen between any lower-to-upper-case letter chain
    $string = preg_replace('/([a-z]+)([A-Z]+)/', '$1'.$us.'$2', $string);
    // insert hyphen between the end of a numeric chain and the beginning of an alpha chain
    $string = preg_replace('/([0-9]+)([a-z]+)/i', '$1' . $us . '$2', $string);

    // Lowercase
    $string = strtolower($string);

    return $string;
}

I wrote tests to verify its accuracy, and it works properly with the following array of inputs (array('input' => 'output')):

$test_values = [
    'foo'       => 'foo',
    'fooBar'    => 'foo-bar',
    'foo123'    => 'foo-123',
    '123Foo'    => '123-foo',
    'fooBar123' => 'foo-bar-123',
    'foo123Bar' => 'foo-123-bar',
    '123FooBar' => '123-foo-bar',
];
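A test loop like the one described could be as simple as comparing each actual result with the expected value in the table. This is a minimal sketch: `camelToKebab()` below is the OP's method lifted out of its class so it runs standalone, and `failingCases()` is a hypothetical helper name, not part of the original code.

```php
<?php

// The OP's three-pass method, lifted out of its class so it can run standalone.
function camelToKebab(string $string, string $us = '-'): string
{
    // letter chain followed by a digit chain
    $string = preg_replace('/([a-z]+)([0-9]+)/i', '$1' . $us . '$2', $string);
    // lowercase chain followed by an uppercase chain
    $string = preg_replace('/([a-z]+)([A-Z]+)/', '$1' . $us . '$2', $string);
    // digit chain followed by a letter chain
    $string = preg_replace('/([0-9]+)([a-z]+)/i', '$1' . $us . '$2', $string);

    return strtolower($string);
}

// Hypothetical helper: returns input => actual output for every case that fails.
function failingCases(callable $convert, array $testValues): array
{
    $failures = [];
    foreach ($testValues as $input => $expected) {
        $actual = $convert((string) $input);
        if ($actual !== $expected) {
            $failures[$input] = $actual;
        }
    }
    return $failures;
}

$test_values = [
    'foo'       => 'foo',
    'fooBar'    => 'foo-bar',
    'foo123'    => 'foo-123',
    '123Foo'    => '123-foo',
    'fooBar123' => 'foo-bar-123',
    'foo123Bar' => 'foo-123-bar',
    '123FooBar' => '123-foo-bar',
];

// Prints an empty array when every case passes.
print_r(failingCases('camelToKebab', $test_values));
```

Passing the converter as a callable lets the same loop check any of the single-regex versions proposed in the answers.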

I'm wondering if there's a way to reduce my preg_replace() calls to a single line which will give me the same result. Any ideas?

NOTE: In my research I found this post, whose preg_replace() regex gets almost the result I want, except that it doesn't convert foo123 to foo-123.

Latinist answered 9/11, 2016 at 18:54 Comment(5)
@AdrienLeber read the bottom line of my question. It is not a duplicate. I read that post, and it did not help with my question. – Latinist
Sorry, I removed the duplicate flag and posted a new answer based on what was shared in the post you refer to in your question. – Indiscipline
@pete pay attention – Latinist
@Latinist Yup, a little too quick on the trigger there, my apologies. – Dawn
It is more common for researchers to be searching for "snake_case", but your minimal reproducible example is seeking "kebab-case"... the strings are being "skewered". – Discrimination

You can use lookarounds to do all this in a single regex:

function camelToUnderscore($string, $us = "-") {
    return strtolower(preg_replace(
        '/(?<=\d)(?=[A-Za-z])|(?<=[A-Za-z])(?=\d)|(?<=[a-z])(?=[A-Z])/', $us, $string));
}

RegEx Demo

Code Demo

RegEx Description:

(?<=\d)(?=[A-Za-z])  # if previous position has a digit and next has a letter
|                    # OR
(?<=[A-Za-z])(?=\d)  # if previous position has a letter and next has a digit
|                    # OR
(?<=[a-z])(?=[A-Z])  # if previous position has a lowercase letter and next has an uppercase letter
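Because every alternative in this pattern is built only from lookarounds, each match is an empty string sitting at a boundary, and preg_replace() simply inserts the separator there. One way to see those boundaries is to feed the same pattern to preg_split(). This is an illustrative sketch; the variable names are arbitrary.

```php
<?php

// anubhava's pattern: three zero-width alternatives, each marking a boundary.
$boundary = '/(?<=\d)(?=[A-Za-z])|(?<=[A-Za-z])(?=\d)|(?<=[a-z])(?=[A-Z])/';

// Splitting on zero-width matches cuts the string exactly at those boundaries.
$parts = preg_split($boundary, 'foo123Bar');
print_r($parts); // pieces: 'foo', '123', 'Bar'

// Joining the pieces with the separator gives the same string as replacing the boundaries.
$joined   = implode('-', $parts);
$replaced = preg_replace($boundary, '-', 'foo123Bar');
var_dump($joined === $replaced); // bool(true)
```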
Endocardium answered 9/11, 2016 at 19:10 Comment(1)
Great solution, but be aware of code injection when using preg_replace. The above solution will introduce a code injection vulnerability. – Outofdoor

Here are my two cents, based on the duplicate post I flagged earlier. The accepted solution here is awesome; I just wanted to try solving it with what was shared there:

function camelToUnderscore($string, $us = "-") {
    return strtolower(preg_replace('/(?<!^)[A-Z]+|(?<!^|\d)[\d]+/', $us.'$0', $string));
}

Example:

$arr = [
    'foo',
    'fooBar',
    'foo123',
    '123Foo',
    'fooBar123',
    'foo123Bar',
    '123FooBar',
];

foreach ($arr as $item) {
    echo camelToUnderscore($item);
    echo "\r\n";
}

Output:

foo
foo-bar
foo-123
123-foo
foo-bar-123
foo-123-bar
123-foo-bar

Explanation:

(?<!^)[A-Z]+      // Match one or more capital letters not at the start of the string
|                 // OR
(?<!^|\d)[\d]+    // Match one or more digits not at the start of the string and not preceded by a digit

$us.'$0'          // Prefix each match with the separator

online regex

The question is already solved, so I won't say I hope it helps, but maybe someone will find this useful.


EDIT

This regex has limitations:

foo123bar => foo-123bar
fooBARFoo => foo-barfoo

Thanks to @urban for pointing it out. Here is his link with tests of the three solutions posted on this question:

three solutions demo
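The behaviour described in this edit can also be checked with a small side-by-side run. This is a sketch, not @urban's linked demo: `opThreePass()`, `lookaroundBoundaries()` and `prefixMatches()` are hypothetical labels for the OP's three patterns, anubhava's lookaround regex and the regex in this answer.

```php
<?php

// OP's approach: three patterns, applied one after another.
function opThreePass(string $s): string
{
    return strtolower(preg_replace(
        ['/([a-z]+)([0-9]+)/i', '/([a-z]+)([A-Z]+)/', '/([0-9]+)([a-z]+)/i'],
        '$1-$2',
        $s
    ));
}

// anubhava's approach: insert the separator at zero-width boundaries.
function lookaroundBoundaries(string $s): string
{
    return strtolower(preg_replace(
        '/(?<=\d)(?=[A-Za-z])|(?<=[A-Za-z])(?=\d)|(?<=[a-z])(?=[A-Z])/',
        '-',
        $s
    ));
}

// This answer's approach: prefix uppercase runs and digit runs with the separator.
function prefixMatches(string $s): string
{
    return strtolower(preg_replace('/(?<!^)[A-Z]+|(?<!^|\d)[\d]+/', '-$0', $s));
}

foreach (['foo123bar', 'fooBARFoo'] as $input) {
    printf(
        "%-10s %-12s %-12s %s\n",
        $input,
        opThreePass($input),
        lookaroundBoundaries($input),
        prefixMatches($input)
    );
}
```

On these two inputs, only foo123bar separates the approaches (foo-123-bar for the first two, foo-123bar for this answer). All three turn fooBARFoo into foo-barfoo, since none of them inserts a separator inside an uppercase run.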

Indiscipline answered 9/11, 2016 at 20:22 Comment(2)
Your solution differs from the OP's solution: it doesn't take into account the case foo123bar... See the code demo for the difference between the OP's solution, anubhava's solution and yours. – Option
@urban foo123bar is not camelCase. But you're right, there are limits to this regex and it's not the best solution... Something like fooBARFoo will produce foo-barfoo. Anyway, this will work for basic camelCase. I've edited the answer. Thanks for your feedback! – Indiscipline

From a colleague:

$string = preg_replace(array($pattern1, $pattern2), $us.'$1', $string); might work

My solution:

public function camelToUnderscore($string, $us = "-")
{
    $patterns = [
        '/([a-z]+)([0-9]+)/i',
        '/([a-z]+)([A-Z]+)/',
        '/([0-9]+)([a-z]+)/i'
    ];
    $string = preg_replace($patterns, '$1'.$us.'$2', $string);

    // Lowercase
    $string = strtolower($string);

    return $string;
}
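Passing an array of patterns to preg_replace() does not merge them into a single pass: PHP applies each pattern, in order, to the result of the previous one, using the same replacement string for each. So this version still scans the string three times. A minimal sketch confirming it behaves like chained calls (the constant and function names are hypothetical):

```php
<?php

// The three patterns from the answer above, applied in this order.
const KEBAB_PATTERNS = [
    '/([a-z]+)([0-9]+)/i',
    '/([a-z]+)([A-Z]+)/',
    '/([0-9]+)([a-z]+)/i',
];

// One call with an array of patterns...
function withArray(string $s, string $us = '-'): string
{
    return strtolower(preg_replace(KEBAB_PATTERNS, '$1' . $us . '$2', $s));
}

// ...behaves like one preg_replace() call per pattern, chained.
function withLoop(string $s, string $us = '-'): string
{
    foreach (KEBAB_PATTERNS as $pattern) {
        $s = preg_replace($pattern, '$1' . $us . '$2', $s);
    }
    return strtolower($s);
}

foreach (['fooBar123', 'foo123Bar', '123FooBar', 'foo123bar'] as $input) {
    echo $input, ' => ', withArray($input), PHP_EOL;
}
```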
Latinist answered 9/11, 2016 at 19:08 Comment(0)

You don't need to suffer the inefficiency of loads of lookarounds or multiple sets of patterns to target the positions between words or consecutive numbers.

Use greedy matching to find the desired sequences, reset the full-string match with \K, then check that the position is not the end of the string. Every position that qualifies receives the delimiting character. The speed of this greedy pattern comes from consuming whole runs of letters or digits in one go and never looking back.

I'll omit the strtolower() call from my answer because it is merely noise for the challenge.

Code: (Demo)

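$tests = ['foo', 'fooBar', 'foo123', '123Foo', 'fooBar123', 'foo123Bar', '123FooBar'];

var_export(preg_replace(
    '/(?:\d++|[A-Za-z]?[a-z]++)\K(?!$)/',
    '-',
    $tests
));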
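\K tells the engine to report the match as starting where \K appears, so although each match consumes a whole word or number, the text that preg_replace() actually replaces is empty and sits right after that run. A small sketch showing this with preg_match_all() and PREG_OFFSET_CAPTURE (variable names are illustrative):

```php
<?php

$pattern = '/(?:\d++|[A-Za-z]?[a-z]++)\K(?!$)/';

// Each reported match is an empty string; its offset is the boundary that receives the '-'.
preg_match_all($pattern, 'fooBar123', $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as [$text, $offset]) {
    printf("empty=%s offset=%d\n", $text === '' ? 'yes' : 'no', $offset);
}
// empty=yes offset=3
// empty=yes offset=6
```

The final run (123) is consumed too, but (?!$) fails at the end of the string, so no separator is added there.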

Processing between words/numbers:

User         Steps  Pattern                                                          Replacement
Anubhava     660    /(?<=\d)(?=[A-Za-z])|(?<=[A-Za-z])(?=\d)|(?<=[a-z])(?=[A-Z])/    '-'
mickmackusa  337    /(?:\d++|[A-Za-z]?[a-z]++)\K(?!$)/                               '-'

Strict camelCase processing:

User         Steps  Pattern                              Replacement
JazZ         321    /(?<!^)[A-Z]+|(?<!^|\d)[\d]+/        '-$0'
mickmackusa  250    /(?>\d+|[A-Z][a-z]*|[a-z]+)(?!$)/    '$0-'
mickmackusa  244    /(?:\d++|[a-z]++)\K(?!$)/            '-'
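The sketch below does not measure steps; it only checks that the five pattern/replacement pairs from the tables produce identical output for the OP's test strings, before any strtolower() call. The `$rows` array is just a convenient layout for this check, not something from the original answers.

```php
<?php

$tests = ['foo', 'fooBar', 'foo123', '123Foo', 'fooBar123', 'foo123Bar', '123FooBar'];

// pattern => replacement, copied from the two tables
$rows = [
    '/(?<=\d)(?=[A-Za-z])|(?<=[A-Za-z])(?=\d)|(?<=[a-z])(?=[A-Z])/' => '-',
    '/(?:\d++|[A-Za-z]?[a-z]++)\K(?!$)/'                             => '-',
    '/(?<!^)[A-Z]+|(?<!^|\d)[\d]+/'                                  => '-$0',
    '/(?>\d+|[A-Z][a-z]*|[a-z]+)(?!$)/'                              => '$0-',
    '/(?:\d++|[a-z]++)\K(?!$)/'                                      => '-',
];

$results = [];
foreach ($rows as $pattern => $replacement) {
    // preg_replace() accepts an array subject and returns an array of results.
    $results[$pattern] = preg_replace($pattern, $replacement, $tests);
}

// true when every pattern produced the same array for these inputs
var_dump(count(array_unique(array_map('serialize', $results))) === 1);
print_r(reset($results));
```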

I have discounted @Matt's answer because it makes three whole passes over each string; it isn't even in the same ballpark in terms of efficiency.

Discrimination answered 9/3, 2023 at 16:20 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.