Regular Expression - Match all but first letter in each word in sentence
C

4

7

I've almost got the answer here, but I'm missing something and I hope someone here can help me out.

I need a regular expression that will match all but the first letter in each word in a sentence. Then I need to replace the matched letters with the correct number of asterisks. For example, if I have the following sentence:

There is an enormous apple tree in my backyard.

I need to get this result:

T**** i* a* e******* a**** t*** i* m* b*******.

I have managed to come up with an expression that almost does that:

(?<=(\b[A-Za-z]))([a-z]+)

Using the example sentence above, that expression gives me:

T* i* a* e* a* t* i* m* b*.

How do I get the right number of asterisks?

Thank you.

Chausses answered 25/1, 2011 at 5:40 Comment(1)
Do you need to use regular expressions for any particular reason? Depending on the programming language you're writing in, you can use a substring with replacement to get the same effect. - Uird
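For what it's worth, here is a rough sketch of the non-regex route this comment hints at, written in JavaScript. The function name maskWithoutRegex is made up for illustration, and it assumes a "word" is a run of unaccented ASCII letters, which is the same assumption the question's pattern makes.

```javascript
// Hypothetical helper: masks every ASCII letter that directly follows
// another ASCII letter, without using a regular expression.
function maskWithoutRegex(sentence) {
  const isLetter = (ch) =>
    (ch >= "a" && ch <= "z") || (ch >= "A" && ch <= "Z");

  let out = "";
  let prevWasLetter = false;
  for (const ch of sentence) {
    const letter = isLetter(ch);
    // Keep the first letter of each run, replace the rest with "*".
    out += letter && prevWasLetter ? "*" : ch;
    prevWasLetter = letter;
  }
  return out;
}

console.log(maskWithoutRegex("There is an enormous apple tree in my backyard."));
// T**** i* a* e******* a**** t*** i* m* b*******.
```

Unlike the patterns in the answers, this also masks upper case letters inside a word, since it only checks whether the previous character was a letter.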
B
17

Try this:

\B[a-z]

\B is the opposite of \b: it matches at any position that is not a word boundary. Here that means a letter that comes directly after another word character.

Your regex replaces the whole tail of the word, [a-z]+, with a single asterisk. You should replace the letters one by one instead. To make your approach work, match a single letter and check that it has a word behind it (which is a little pointless, since you might as well check for a single letter: (?<=[A-Za-z])[a-z]):

(?<=\b[A-Za-z]+)[a-z]

(Note that the last regex has a variable-length lookbehind, which isn't implemented in most regex flavors.)
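To make the difference concrete, here is a small JavaScript sketch comparing the question's "whole tail" pattern with the two per-letter patterns above. The function names are invented for this example, and the lookbehind versions assume an engine that supports lookbehind (ES2018 or later).

```javascript
const sampleSentence = "There is an enormous apple tree in my backyard.";

// The question's approach: the whole tail is ONE match, so it gets ONE asterisk.
function maskTail(text) {
  return text.replace(/(?<=\b[A-Za-z])[a-z]+/g, "*");
}

// One match per letter: each letter that is not at a word boundary.
function maskPerLetter(text) {
  return text.replace(/\B[a-z]/g, "*");
}

// Same idea with a fixed-length lookbehind instead of \B.
function maskPerLetterLookbehind(text) {
  return text.replace(/(?<=[A-Za-z])[a-z]/g, "*");
}

console.log(maskTail(sampleSentence));
// T* i* a* e* a* t* i* m* b*.
console.log(maskPerLetter(sampleSentence));
// T**** i* a* e******* a**** t*** i* m* b*******.
console.log(maskPerLetterLookbehind(sampleSentence));
// T**** i* a* e******* a**** t*** i* m* b*******.
```

The replacement string is the same in all three cases; only the number of matches changes, which is why matching one letter at a time gives the right number of asterisks.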

Bashful answered 25/1, 2011 at 5:56 Comment(6)
The shortest regex here is probably \B\w, but \w adds upper case letters and underscores. - Bashful
(?<=\b[A-Za-z]+) won't work in any flavor but .NET and JGSoft. You had it right the first time. - Scudo
@Alan - good point. I've added a warning on that. Either way, I did say it was rather pointless :) - Bashful
And (depending on the regex flavor and whether you're planning on matching other characters than unaccented letters between a and z) you might want to use \p{L} or [^\W\d_] instead of [a-z]. - Percentage
How would that work when you had something like "Jack's toothbrush"? - Dibucaine
@NathanArthur - That is a good question... The underlying question is even more difficult: What is a word? The pattern above assumes a word is made of alphanumeric characters, which is wrong. In fact, I do not believe I can solve that problem reliably with a simple pattern - there are just too many edge cases. Still, as for your question: In .NET, you can add an apostrophe to the pattern above: (?<=\b[A-Za-z']+)[a-z]. On other flavors I think (?<=\B|\b')[a-z] can work. Either way, it requires some thinking. - Bashful
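A hedged JavaScript sketch of the two ideas raised in these comments: the apostrophe pattern (?<=\B|\b')[a-z] and a \p{L} variant for accented letters. Both function names are hypothetical. The Unicode version uses a lookbehind rather than \B, on the assumption that \b and \B in JavaScript are defined in terms of ASCII \w and so do not treat accented letters as word characters. Both need lookbehind support (ES2018 or later).

```javascript
// Apostrophe-aware variant, using the pattern suggested in the comment above.
function maskWithApostrophes(text) {
  return text.replace(/(?<=\B|\b')[a-z]/g, "*");
}

// Unicode-aware variant: masks any letter that directly follows another letter.
// The "u" flag is required for \p{L} in JavaScript.
function maskUnicodeLetters(text) {
  return text.replace(/(?<=\p{L})\p{L}/gu, "*");
}

console.log("Jack's toothbrush".replace(/\B[a-z]/g, "*"));
// J***'s t*********   (the "s" after the apostrophe is left alone)
console.log(maskWithApostrophes("Jack's toothbrush"));
// J***'* t*********
console.log(maskUnicodeLetters("\u00C9mile \u00E9tait l\u00E0."));
// É**** é**** l*.
```

Neither version settles what counts as a word (hyphens, digits, and combining accents are still edge cases), so treat them as starting points.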
C
3

You can try this:

\B\w

This matches every character except the first one in each word, so replacing each match with an asterisk turns ==Hello==World== into ==H****==W****==.
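One thing to keep in mind: \w also matches upper case letters, digits, and underscores, so \B\w masks more than \B[a-z] does. A quick JavaScript sketch of the difference (the function names and sample input are made up for illustration):

```javascript
// \w covers [A-Za-z0-9_], so digits, capitals and underscores are masked too.
function maskWordChars(text) {
  return text.replace(/\B\w/g, "*");
}

// [a-z] only masks lower case ASCII letters.
function maskLowercaseOnly(text) {
  return text.replace(/\B[a-z]/g, "*");
}

const mixedInput = "NASA user_name id42";

console.log(maskWordChars(mixedInput));
// N*** u******** i***
console.log(maskLowercaseOnly(mixedInput));
// NASA u***_**** i*42
console.log(maskWordChars("==Hello==World=="));
// ==H****==W****==
```

Which one is right depends on what should count as part of a word in your input.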

Curule answered 26/2, 2022 at 11:22 Comment(0)
U
0

Try this possibly:

(\w{1})\w*
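On its own this pattern matches each whole word, so a plain string replacement cannot know how many asterisks to emit. One way to make it work, sketched here in JavaScript, is to capture the tail in a second group and build the asterisks in a replacer function. The second capture group and the function name maskWithCallback are additions for this example, not part of the pattern above.

```javascript
// Variation on (\w{1})\w*: capture the first character and the rest separately,
// then rebuild each word with one asterisk per remaining character.
function maskWithCallback(text) {
  return text.replace(/(\w)(\w*)/g, (match, first, rest) =>
    first + "*".repeat(rest.length)
  );
}

console.log(maskWithCallback("There is an enormous apple tree in my backyard."));
// T**** i* a* e******* a**** t*** i* m* b*******.
```

This does one match per word instead of one per letter, at the cost of needing a callback, which not every replace API offers.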
Uird answered 25/1, 2011 at 5:50 Comment(0)
M
0

This is an old question. I'm adding an answer since the others don't seem to solve this problem completely or clearly. The simplest regular expression that handles this is /\B[a-z]/g. The g flag makes the search global, so the single-character match is repeated throughout the string.

const string = "There is an enormous apple tree in my backyard.";
const answer = string.replace(/\B[a-z]/g, "*");

const string = "There is an enormous apple tree in my backyard.";
$("#stringDiv").text(string);

const answer = string.replace(/\B[a-z]/g, "*");
$("#answerDiv").text(answer);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="stringDiv"></div>
<div id="answerDiv"></div>
Mien answered 27/8, 2016 at 16:47 Comment(0)
