Regular expression for anchor tag with all attributes
I'm trying to write a regular expression that replaces every link in a text string with the text of that link.

A link may look like these:

<a href="http://whatever" id="an_id" rel="a_rel">the link</a>
<a href="/absolute_url/whatever" id="an_id" rel="a_rel">the link</a>

I want a regular expression that gives me just the link text: the link

Mina answered 6/2, 2012 at 9:56 Comment(1)
Related: #239391 - Zamboanga
R
36
/<a[^>]*>([^<]+)<\/a>/g

It's far from perfect, but you need to provide more examples of what should and shouldn't match (e.g. what about whitespace?).
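
As a minimal sketch of the replacement the question asks for, assuming JavaScript (the `/.../g` literal suggests it, but the question does not say). The helper name `stripLinks` is hypothetical, and the `i` flag is added following the case-insensitivity comment on this answer:

```javascript
// Pattern from the answer above, plus the "i" flag so <A ...> also matches.
const LINK_PATTERN = /<a[^>]*>([^<]+)<\/a>/gi;

// Hypothetical helper: replace each matched link with its captured text.
function stripLinks(html) {
  return html.replace(LINK_PATTERN, "$1");
}

console.log(
  stripLinks('See <a href="http://whatever" id="an_id" rel="a_rel">the link</a> here.')
);
// prints "See the link here."
```

Because the capture group is `[^<]+`, links with empty text or with nested tags are left untouched by this sketch.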

Rampant answered 6/2, 2012 at 10:1 Comment(2)
Hi Florian, other examples: <a href="/absolute_url/whatever" id="an_id" rel="a_rel"></a>, <a href="/absolute_url/whatever">a link</a>, and <a href="domain.com">a link</a>. - Mina
Note: This would not work for nested elements. The regex should also be case insensitive, as both <a> and <A> are valid. - Joan
H
24
/<a[\s]+([^>]+)>((?:.(?!\<\/a\>))*.)<\/a>/g

This one matches any <a ...>...</a> element that has at least one attribute, including elements whose content contains a bare < or complete nested tags, such as:

blah blah <a href="test.html">This line contains an HTML opening < bracket.</a> blah blah
blah blah <a href="test.html">This line contains <strong>bold</strong> text.</a> blah blah

Would capture:

<a href="test.html">This line contains an HTML opening < bracket.</a>
  • with capture groups:
    • href="test.html"
    • This line contains an HTML opening < bracket.

and

<a href="test.html">This line contains <strong>bold</strong> text.</a>
  • with capture groups:
    • href="test.html"
    • This line contains <strong>bold</strong> text.

It also includes capturing groups for the tag attributes (like class="", href="", etc.) and for the content (what is between the tags); these can be removed if you do not need them.

If you want to capture across multiple lines add an "s" before or after the "g" flag at the end. Note that the "s" flag may not work in all flavors of regular expression.

Capture example (not using the "s" flag - not supported by regexr yet): http://regexr.com/39rsv
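
A rough sketch of how the two capture groups could be read in JavaScript, assuming an engine with the `s` (dotAll) flag and `String.prototype.matchAll`. The helper name `extractAnchors` is made up for illustration, and the pattern drops the `\<` and `\>` escapes as suggested in the comments:

```javascript
// Pattern from the answer, without the unneeded \< and \> escapes,
// with "g" (all matches) and "s" (dot also matches newlines).
const ANCHOR_PATTERN = /<a[\s]+([^>]+)>((?:.(?!<\/a>))*.)<\/a>/gs;

// Hypothetical helper: list the attributes and inner content of each anchor.
function extractAnchors(html) {
  return Array.from(html.matchAll(ANCHOR_PATTERN), (m) => ({
    attributes: m[1], // group 1: everything between "<a " and ">"
    content: m[2],    // group 2: everything up to the closing </a>
  }));
}

const sample =
  'blah <a href="test.html">This line contains\n<strong>bold</strong> text.</a> blah';
console.log(extractAnchors(sample));
```

To replace each anchor with its content instead of listing it, `html.replace(ANCHOR_PATTERN, "$2")` should do under the same assumptions.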

Hepsibah answered 5/11, 2014 at 18:38 Comment(4)
You have an unescaped forward slash near the end. - Philbo
How would you modify this to cover bla bla <a href="test.html" data-annoying=">" >yikes</a>? That's the one killing me right now. - Polyzoarium
Good question, @Jerry. I don't really know how to answer it (and this post is over a year too late), but I would think that any HTML attribute values containing XML special characters like that should have those characters encoded somehow. - Hydrocephalus
The < and > are escaped where they shouldn't be. The correct version is <a[\s]+([^>]+)>((?:.(?!<\/a>))*.)<\/a> - Padraig
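
For the `data-annoying=">"` case raised in the comments, one possible approach (not from the original answer, only a sketch) is to let the attribute part consume whole quoted strings, so that a `>` inside quotes does not end the tag. The helper name `extractQuoteAware` is hypothetical:

```javascript
// Attribute part: any run of double-quoted strings, single-quoted strings,
// or single characters that are neither a quote nor ">".
const QUOTE_AWARE_PATTERN =
  /<a\s+((?:"[^"]*"|'[^']*'|[^'">])*)>([\s\S]*?)<\/a>/gi;

// Hypothetical helper: attributes and content of each anchor.
function extractQuoteAware(html) {
  return Array.from(html.matchAll(QUOTE_AWARE_PATTERN), (m) => ({
    attributes: m[1].trim(),
    content: m[2],
  }));
}

console.log(
  extractQuoteAware('bla bla <a href="test.html" data-annoying=">" >yikes</a>')
);
```

This still assumes anchors are not nested and that attribute values are properly quoted.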
J
2

A small correction to the accepted answer. The correct regex is /<a[^>]*>([^<]+)<\/a>/g. The forward slash (/) in the closing anchor tag </a> was not escaped, so no match would be made.

Jidda answered 5/7, 2016 at 18:9 Comment(0)
S
1

I just added explicitly named groups:

<a.*href\s?=['"]*(?<href>[^'"]*)[^>]*>((?<text>(.(?!\<\/a\>))*.))<\/a>

https://regex101.com/r/sbtcYr/1
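
A short sketch of reading those named groups from JavaScript, assuming an engine that supports `(?<name>...)` groups. The helper name `parseLink` is hypothetical, and the `\<` and `\>` escapes are dropped since they are not needed:

```javascript
// Pattern from this answer, with the unneeded \< and \> escapes dropped.
const NAMED_PATTERN =
  /<a.*href\s?=['"]*(?<href>[^'"]*)[^>]*>((?<text>(.(?!<\/a>))*.))<\/a>/;

// Hypothetical helper: return { href, text } for a link, or null if none.
function parseLink(html) {
  const m = html.match(NAMED_PATTERN);
  return m ? { href: m.groups.href, text: m.groups.text } : null;
}

console.log(
  parseLink('<a href="/absolute_url/whatever" id="an_id" rel="a_rel">the link</a>')
);
```

Note that the leading `.*` is greedy, so with several links on one line this sketch is best applied to one link at a time.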

Sheldon answered 12/2, 2022 at 1:43 Comment(0)
H
0

I was not able to get any of the answers listed here to work; I'm not sure they read your question correctly.

As I read your post, you're looking for the text between the tags in <a href="abcdefg">example tag</a>

(i.e. extract "example tag")

I managed to come up with this solution. Unfortunately, it doesn't appear to work in all browsers (e.g. Edge and IE; I haven't tried Firefox).

This link shows it working: https://regexr.com/5dd0m

(?<=<a.*>).+(?=<\/a>)
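
The browser issue is most likely missing lookbehind support: engines without it throw a SyntaxError when the pattern is compiled. A hedged sketch of feature detection with a capture-group fallback follows; both helper names are hypothetical, and the fallback pattern is an assumption, not part of this answer:

```javascript
// Returns true if this engine can compile a lookbehind assertion.
function supportsLookbehind() {
  try {
    new RegExp("(?<=a)b");
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical helper: text between <a ...> and </a>, or null if no link.
function linkText(html) {
  if (supportsLookbehind()) {
    // Built from a string so older engines can still parse this file.
    const m = html.match(new RegExp("(?<=<a.*>).+(?=</a>)"));
    return m ? m[0] : null;
  }
  // Fallback (assumption): same idea using a capture group instead.
  const m = html.match(/<a[^>]*>(.+)<\/a>/);
  return m ? m[1] : null;
}

console.log(linkText('<a href="abcdefg">example tag</a>'));
// prints "example tag"
```
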
Holotype answered 5/10, 2020 at 20:43 Comment(0)
S
-1

Try this, it works 100%:

(?i)<a(.*)(")>

Spinode answered 30/10, 2016 at 18:52 Comment(0)
J
-2

Something like this should be enough

<a.*?>(.*)?</a>
Jock answered 6/2, 2012 at 10:5 Comment(0)
