PHP filter_var() - FILTER_VALIDATE_URL

The FILTER_VALIDATE_URL filter seems to have some trouble validating non-ASCII URLs:

var_dump(filter_var('http://pt.wikipedia.org/wiki/', FILTER_VALIDATE_URL)); // http://pt.wikipedia.org/wiki/
var_dump(filter_var('http://pt.wikipedia.org/wiki/Guimarães', FILTER_VALIDATE_URL)); // false

Why isn't the last URL correctly validated? And what are the possible workarounds? Running PHP 5.3.0.
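One quick way to see which characters PHP's URL filters treat as illegal is FILTER_SANITIZE_URL. The manual describes it as removing all characters except ASCII letters, digits and $-_.+!*'(),{}|\^~[]`<>#%";/?:@&=. The sketch below only compares the input with the sanitized output. It is a diagnostic aid, not a statement about how FILTER_VALIDATE_URL is implemented internally.

```php
<?php
// FILTER_SANITIZE_URL strips every character outside the ASCII set allowed in URLs,
// so comparing the input with the output shows which characters were dropped.
$url = 'http://pt.wikipedia.org/wiki/Guimarães';
$sanitized = filter_var($url, FILTER_SANITIZE_URL);

echo $sanitized, "\n";          // http://pt.wikipedia.org/wiki/Guimares
var_dump($sanitized === $url);  // bool(false): the "ã" did not survive
```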

I'd also like to know where I can find the source code of the FILTER_VALIDATE_URL validation filter.

Adust answered 26/1, 2010 at 1:51 Comment(1)
You can find the source code along with the rest of PHP's source; it's freely available in the Downloads section of their website. As for your question, that sounds like a bug, and you should report it. The only workaround I could suggest is to use some other logic (perhaps craft a function to use with FILTER_CALLBACK in the meantime). - Ponderous
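A minimal sketch of the FILTER_CALLBACK idea. The callback name lenient_validate_url() is hypothetical, and percent-encoding every non-ASCII byte before delegating to FILTER_VALIDATE_URL is one possible choice of "other logic", not the only one. The approach targets paths and query strings. Internationalized host names are a separate problem that needs IDN (punycode) conversion.

```php
<?php
// Hypothetical FILTER_CALLBACK callback: percent-encode every non-ASCII byte,
// then hand the result to the stock FILTER_VALIDATE_URL filter.
function lenient_validate_url(string $url)
{
    // Without the /u modifier the pattern matches byte by byte, so each byte of a
    // UTF-8 sequence (0x80-0xFF) becomes its %XX form, e.g. "ã" -> "%C3%A3".
    $encoded = preg_replace_callback(
        '/[\x80-\xFF]/',
        function (array $m) {
            return rawurlencode($m[0]);
        },
        $url
    );

    // Returns the encoded URL on success, false on failure.
    return filter_var($encoded, FILTER_VALIDATE_URL);
}

$result = filter_var(
    'http://pt.wikipedia.org/wiki/Guimarães',
    FILTER_CALLBACK,
    ['options' => 'lenient_validate_url']
);
var_dump($result); // string(43) "http://pt.wikipedia.org/wiki/Guimar%C3%A3es"
```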

The parsing starts here:
http://svn.php.net/viewvc/php/php-src/trunk/ext/filter/logical_filters.c?view=markup

and is actually done in /trunk/ext/standard/url.c

At first glance I can't see anything that deliberately rejects non-ASCII characters, so it's probably just a lack of Unicode support. PHP is not good at handling non-ASCII characters anywhere. :(
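To illustrate the point about url.c: parse_url() is implemented in ext/standard/url.c, and it splits the question's URL without complaint. The sketch below only shows the parsing step; the non-ASCII bytes simply stay in the path component. That the rejection must therefore come from the filter layer is an inference, not something this snippet proves.

```php
<?php
// parse_url() breaks the URL into components; it does not reject the "ã".
$parts = parse_url('http://pt.wikipedia.org/wiki/Guimarães');

echo $parts['host'], "\n"; // pt.wikipedia.org
echo $parts['path'], "\n"; // /wiki/Guimarães
```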

Wirehaired answered 26/1, 2010 at 2:36 Comment(2)
Hmm... if (!isalnum((int)*(unsigned char *)s) && *s != '_' && *s != '.') must be the cause. Any workarounds you can think of? - Adust
@Alix - As zneak said, you can use FILTER_CALLBACK to write your own filter functions. It should work to just copy the C function into a PHP script and replace isalnum with a more permissive function. (It'll need some adjustment for the pointers, but not much, I guess.) - Wirehaired
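A rough sketch of porting the quoted C condition to PHP. permissive_isalnum() and permissive_chars_ok() are hypothetical names. Treating every byte >= 0x80 as acceptable is an assumption made for illustration, not what PHP itself does. The sketch checks characters only; it does not parse or validate a whole URL.

```php
<?php
// Hypothetical PHP port of the C condition quoted above:
//   if (!isalnum(c) && c != '_' && c != '.') -> reject
// with isalnum() swapped for a more permissive test that also accepts
// bytes >= 0x80 (the bytes that make up UTF-8 multibyte characters).
function permissive_isalnum(string $ch): bool
{
    return preg_match('/^[A-Za-z0-9]$/', $ch) === 1 || ord($ch) >= 0x80;
}

function permissive_chars_ok(string $s): bool
{
    $len = strlen($s);
    // Walk the string byte by byte, like the pointer loop in the C code.
    for ($i = 0; $i < $len; $i++) {
        $ch = $s[$i];
        if (!permissive_isalnum($ch) && $ch !== '_' && $ch !== '.') {
            return false;
        }
    }
    return true;
}

// The same function can be plugged into filter_var() through FILTER_CALLBACK.
var_dump(filter_var('Guimarães', FILTER_CALLBACK, ['options' => 'permissive_chars_ok'])); // bool(true)
var_dump(permissive_chars_ok('Guimar es')); // bool(false): the space is rejected
```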

Technically that is not a valid URL according to section 5 of RFC 1738. Browsers automatically encode the ã character as %C3%A3 before sending the request to the server, so the technically valid full URL here is http://pt.wikipedia.org/wiki/Guimar%C3%A3es. Pass that to the FILTER_VALIDATE_URL filter and it will work fine. The filter only validates according to the spec; it doesn't try to fix or encode characters for you.
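A minimal sketch of what this answer describes: percent-encode the offending path segment yourself, then validate. rawurlencode() escapes the UTF-8 bytes of "ã" as %C3%A3, producing the same URL the browser would send.

```php
<?php
// Encode only the path segment that contains non-ASCII characters.
$segment = rawurlencode('Guimarães'); // "Guimar%C3%A3es"
$encoded = 'http://pt.wikipedia.org/wiki/' . $segment;

var_dump(filter_var('http://pt.wikipedia.org/wiki/Guimarães', FILTER_VALIDATE_URL)); // bool(false)
var_dump(filter_var($encoded, FILTER_VALIDATE_URL)); // string(43) "http://pt.wikipedia.org/wiki/Guimar%C3%A3es"
```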

Algebra answered 31/3, 2010 at 23:50 Comment(0)

The following code uses filter_var, but first percent-encodes non-ASCII characters in the URL's path. Hope this helps someone.

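<?php

function validate_url($url) {
    $path = parse_url($url, PHP_URL_PATH);

    // Only rewrite the URL when parse_url() found a non-empty path.
    if (is_string($path) && $path !== '') {
        // rawurlencode() escapes each segment as %XX (spaces become %20, not '+').
        // Already-escaped %XX sequences get re-escaped, which doesn't change the result.
        $encoded_path = implode('/', array_map('rawurlencode', explode('/', $path)));

        // Replace the path where it starts (after "scheme://authority"),
        // not every occurrence of that substring anywhere in the URL.
        $authority = strpos($url, '//');
        $start = strpos($url, $path, $authority === false ? 0 : $authority + 2);
        if ($start !== false) {
            $url = substr_replace($url, $encoded_path, $start, strlen($path));
        }
    }

    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

// example
if (!validate_url("http://somedomain.com/some/path/file1.jpg")) {
    echo "NOT A URL";
} else {
    echo "IS A URL";
}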
Primeval answered 2/5, 2017 at 19:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.