preg_match_all - regex to find full urls in string

I have spent over 4 hours trying to find a regex pattern for my PHP code, without luck.

I have a string containing HTML code. It has URLs in many formats, such as:

example.com
http://example.com
http://www.example.com
http://example.com/some.php
http://example.com/some.php?var1=1
http://example.com/some.php?var1=1&var2=2
etc.

I have the following PHP code, which partly works:

preg_match_all('/\b(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[-A-Z0-9+&@#\/%=~_|$?!:,.]*[A-Z0-9+&@#\/%=~_|$]/i', $content, $result, PREG_PATTERN_ORDER);

The only thing I still need is to ALSO capture URLs whose query strings have multiple parameters joined with "&". I do get them, but not in full; I only receive things like:

http://example.com/asdad.php?var1=1&

The rest is lost.
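One likely cause, assuming the string is raw HTML in which the ampersands inside links are written as `&amp;` (as they normally are in markup): `;` appears in neither character class of the pattern above, so the match stops at `&amp`, which a browser will typically display as `...var1=1&`. A minimal sketch reproducing this with a hypothetical sample string:

```php
<?php
// Pattern from the question, unchanged.
$pattern = '/\b(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[-A-Z0-9+&@#\/%=~_|$?!:,.]*[A-Z0-9+&@#\/%=~_|$]/i';

// Hypothetical sample: in HTML source, "&" inside an href is usually encoded as "&amp;".
$content = '<a href="http://example.com/some.php?var1=1&amp;var2=2">link</a>';

preg_match_all($pattern, $content, $result, PREG_PATTERN_ORDER);

// ";" is in neither character class, so the match ends right before it.
echo $result[0][0], "\n"; // http://example.com/some.php?var1=1&amp
```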

Can someone help me add the missing part to the pattern?

Thanks so much in advance.

Traject answered 5/3, 2014 at 16:1

Well, I finally got it.

The final regex is:

$regex = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";

It works.
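Most likely this works because `;` was added to the middle character class, so an encoded `&amp;` no longer ends the match. The sketch below runs the final regex over a hypothetical HTML snippet and then, as an optional extra step that is not part of the answer, turns `&amp;` back into `&` with PHP's built-in `html_entity_decode()`:

```php
<?php
// Final regex from the answer: ";" is now allowed in the middle character class.
$regex = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";

// Hypothetical HTML input with an encoded "&amp;" and a bare www. link.
$content = '<p>See <a href="http://example.com/some.php?var1=1&amp;var2=2">this</a> or www.example.com.</p>';

preg_match_all($regex, $content, $result, PREG_PATTERN_ORDER);

// Optional extra step (not in the answer): decode "&amp;" back to "&".
$decoded = array_map('html_entity_decode', $result[0]);

foreach ($decoded as $url) {
    echo $url, "\n";
}
// http://example.com/some.php?var1=1&var2=2
// www.example.com
```

The last character class leaves out `.`, `,`, `;`, `?`, `!` and `:`, which is why the sentence-ending period after `www.example.com` is not captured.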

Traject answered 5/3, 2014 at 16:18

Check this pattern, which covers most common URL forms:

$regex = "((https?|ftp)\:\/\/)?"; // Checking scheme 
$regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Checking host name and/or IP
$regex .= "(\:[0-9]{2,5})?"; // Check whether it has a port number
$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // The real path
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // Check the query string params
$regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Check anchors, if used

You can drop any section you don't need; as you can see, the pieces are simply concatenated.
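The fragments only become a usable pattern once they are joined and wrapped in delimiters. A minimal sketch, assuming the goal is to validate a single string rather than search through HTML (hence the `^`/`$` anchors); `isValidUrl()` is a hypothetical helper name, and the `i` flag is an added assumption so that upper-case input is accepted:

```php
<?php
// Pieces from the answer above.
$regex  = "((https?|ftp)\:\/\/)?";                      // scheme (optional)
$regex .= "([a-z0-9-.]*)\.([a-z]{2,3})";                // host name + 2-3 letter TLD
$regex .= "(\:[0-9]{2,5})?";                            // port (optional)
$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?";                // path
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?";   // query string
$regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?";                // anchor (fragment)

// Hypothetical helper: the anchors force the WHOLE string to be a URL.
// "/" works as the delimiter because every "/" inside the pieces is escaped;
// the "i" flag is an added assumption to accept upper-case input.
function isValidUrl(string $url, string $regex): bool
{
    return preg_match('/^' . $regex . '$/i', $url) === 1;
}

$samples = [
    'example.com',
    'http://www.example.com',
    'http://example.com/some.php?var1=1&var2=2',
    'http://example.info',   // rejected: the TLD part allows only 2-3 letters
    'not a url',
];

foreach ($samples as $url) {
    printf("%-45s %s\n", $url, isValidUrl($url, $regex) ? 'valid' : 'invalid');
}
```

Note that the `[a-z]{2,3}` host part rejects longer TLDs such as `.info`, and a bare IP address such as `192.168.0.1` does not match either, despite the comment on that line.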

Moffett answered 5/3, 2014 at 16:11

© 2022 - 2024 — McMap. All rights reserved.