Convert QUrl with percent encoding into string

I use a URL entered by the user as text to initialize a QUrl object. Later I want to convert the QUrl back into a string, both to display it and to match it against a regular expression. This works fine as long as the user does not enter a percent-encoded URL.

Why doesn't the following example code work?

qDebug() << QUrl("http://test.com/query?q=%2B%2Be%3Axyz%2Fen").toDisplayString(QUrl::FullyDecoded); 

It simply doesn't decode any of the percent-encoded characters. It should print "http://test.com/query?q=++e:xyz/en" but it actually prints "http://test.com/query?q=%2B%2Be%3Axyz%2Fen".

I also tried many other methods, such as fromUserInput(), but I could not make the code work correctly in Qt 5.3.

Can someone explain to me how to do this, and why the above code doesn't show the decoded URL even when using QUrl::FullyDecoded?

UPDATE

After getting the fromPercentEncoding() hint, I tried the following code:

QUrl UrlFromUserInput(const QString& input)
{
   QByteArray latin = input.toLatin1();
   QByteArray utf8 = input.toUtf8();
   if (latin != utf8)
   {
      // URL string containing unicode characters (no percent encoding expected)
      return QUrl::fromUserInput(input);
   }
   else
   {
      // URL string containing ASCII characters only (assume possible %-encoding)
      return QUrl::fromUserInput(QUrl::fromPercentEncoding(input.toLatin1()));
   }
}

This lets the user enter both Unicode URLs and percent-encoded URLs, and both kinds can be decoded for display and matching. However, the percent-encoded URLs then stopped working in QWebView: the web server responded differently (it returned a different page). So QUrl::fromPercentEncoding() is evidently not a clean solution, since it effectively changes the URL. I could create two QUrl objects in the above function, one constructed directly from the input and one constructed via fromPercentEncoding(), using the first for QWebView and the second for display/matching only, but this seems absurd.
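To illustrate why decoding "effectively changes the URL": percent-decoding is lossy, because an escaped and an unescaped spelling of the same character collapse into one string and cannot be told apart afterwards. The sketch below uses a hypothetical percentDecode helper written with only the C++ standard library. It is an assumption made for illustration, not a Qt API and not how QUrl is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt): decodes %XX escapes byte by byte.
// Malformed escapes such as "%_%" are copied through unchanged.
std::string percentDecode(const std::string& in)
{
    auto hexValue = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1; // not a hexadecimal digit
    };

    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        // A valid escape needs two more characters after the '%'.
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2; // skip the two hex digits
                continue;
            }
        }
        out.push_back(in[i]);
    }
    assert(out.size() <= in.size()); // decoding never lengthens the text
    return out;
}
```

With this helper, percentDecode("a%2Fb") and percentDecode("a/b") return the same string, so once the text has been decoded there is no way to recover which of the two URLs the user originally typed.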

Sarthe answered 21/6, 2014 at 16:27 Comment(3)
What do you mean, "why doesn't it work"? What are you expecting it to print? - Drillmaster
Similar question - #4815918 - Improvisation
If you can't find a solution here, just post an email on the interest @ qt-project.org mailing list. QUrl maintainers are extremely active there. - Drillmaster

#Conclusion

I've done some research; the conclusion so far is: absurd.

QUrl::fromPercentEncoding() is the way to go, and what the OP has done in the UPDATE section should have been the accepted answer to the question in the title.

I think Qt's documentation of QUrl::toDisplayString is a little misleading:

"Returns a human-displayable string representation of the URL. The output can be customized by passing flags with options. The option RemovePassword is always enabled, since passwords should never be shown back to users."

Actually, it doesn't claim any decoding ability; the documentation is unclear about its behavior. But at least the part about the password is true. I've found some clues on Gitorious:

"Add QUrl::toDisplayString(), which is toString() without password. And fix documentation of toString() which said this was the method to use for displaying to humans, while this has never been true."


#Test Code

The following code compares the decoding ability of the different functions. It has been tested on Qt 5.2.1 (not tested on Qt 5.3 yet!).

QString target(/*path*/); // the string under test; see the cases below

QUrl url_path(target);
qDebug() << "[Original String]:" << target;
qDebug() << "--------------------------------------------------------------------";
qDebug() << "(QUrl::toEncoded)          :" << url_path.toEncoded(QUrl::FullyEncoded);
qDebug() << "(QUrl::url)                :" << url_path.url();
qDebug() << "(QUrl::toString)           :" << url_path.toString();
qDebug() << "(QUrl::toDisplayString)    :" << url_path.toDisplayString(QUrl::FullyDecoded);
// fromPercentEncoding() is a static function, so it is called on the class
// and operates on the raw input string rather than on url_path.
qDebug() << "(QUrl::fromPercentEncoding):" << QUrl::fromPercentEncoding(target.toUtf8());

P.S. QUrl::url is just a synonym for QUrl::toString.


#Output

[Case 1]: When target path = "%_%" (test the functionality of encoding):

[Original String]: "%_%" 
-------------------------------------------------------------------- 
(QUrl::toEncoded)          : "%25_%25" 
(QUrl::url)                : "%25_%25" 
(QUrl::toString)           : "%25_%25" 
(QUrl::toDisplayString)    : "%25_%25" 
(QUrl::fromPercentEncoding): "%_%" 

[Case 2]: When target path = "Meow !" (test the functionality of encoding):

[Original String]: "Meow !" 
-------------------------------------------------------------------- 
(QUrl::toEncoded)          : "Meow%20!" 
(QUrl::url)                : "Meow !" 
(QUrl::toString)           : "Meow !" 
(QUrl::toDisplayString)    : "Meow%20!" // "Meow !" when using QUrl::PrettyDecoded mode
(QUrl::fromPercentEncoding): "Meow !" 

[Case 3]: When target path = "Meow|!" (test the functionality of encoding):

[Original String]: "Meow|!" 
-------------------------------------------------------------------- 
(QUrl::toEncoded)          : "Meow%7C!" 
(QUrl::url)                : "Meow%7C!" 
(QUrl::toString)           : "Meow%7C!" 
(QUrl::toDisplayString)    : "Meow|!" // "Meow%7C!" when using QUrl::PrettyDecoded mode
(QUrl::fromPercentEncoding): "Meow|!" 

[Case 4]: When target path = "http://test.com/query?q=++e:xyz/en" (none % encoded):

[Original String]: "http://test.com/query?q=++e:xyz/en" 
-------------------------------------------------------------------- 
(QUrl::toEncoded)          : "http://test.com/query?q=++e:xyz/en" 
(QUrl::url)                : "http://test.com/query?q=++e:xyz/en" 
(QUrl::toString)           : "http://test.com/query?q=++e:xyz/en" 
(QUrl::toDisplayString)    : "http://test.com/query?q=++e:xyz/en" 
(QUrl::fromPercentEncoding): "http://test.com/query?q=++e:xyz/en" 

[Case 5]: When target path = "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" (% encoded):

[Original String]: "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" 
-------------------------------------------------------------------- 
(QUrl::toEncoded)          : "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" 
(QUrl::url)                : "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" 
(QUrl::toString)           : "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" 
(QUrl::toDisplayString)    : "http://test.com/query?q=%2B%2Be%3Axyz%2Fen" 
(QUrl::fromPercentEncoding): "http://test.com/query?q=++e:xyz/en" 

P.S. I also encountered the bug that Ilya mentioned in the comments: Percent Encoding doesn't seem to be working for '+' in QUrl


#Summary

The result of QUrl::toDisplayString is ambiguous. As the documentation says, the QUrl::FullyDecoded mode must be used with care. Whatever kind of URL you have, encode it with QUrl::toEncoded and, when a decoded form is needed for display, pass the result to QUrl::fromPercentEncoding.

As for the malfunction of percent-encoded URLs in QWebView mentioned by the OP, more details are needed to debug it. Using a different function or a different decoding mode could be the reason.
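One plausible cause, stated here as an assumption because the server in question is unknown: many server-side frameworks decode query strings using the application/x-www-form-urlencoded convention, in which a literal '+' means a space while %2B means a plus sign. Under that convention the decoded URL from Case 4 and the encoded URL from Case 5 carry different query values. The sketch below uses a hypothetical formDecodeValue helper (plain standard C++, not a Qt API) to show the difference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical server-side decoder for a single query value (not a Qt API).
// Assumes the application/x-www-form-urlencoded convention:
// '+' stands for a space and %XX stands for the literal byte 0xXX.
std::string formDecodeValue(const std::string& in)
{
    auto hexValue = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1; // not a hexadecimal digit
    };

    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '+') {
            out.push_back(' '); // unescaped plus is read as a space
        } else if (in[i] == '%' && i + 2 < in.size()
                   && hexValue(in[i + 1]) >= 0 && hexValue(in[i + 2]) >= 0) {
            out.push_back(static_cast<char>(hexValue(in[i + 1]) * 16
                                            + hexValue(in[i + 2])));
            i += 2; // skip the two hex digits
        } else {
            out.push_back(in[i]);
        }
    }
    assert(out.size() <= in.size()); // decoding never lengthens the text
    return out;
}
```

Here formDecodeValue("%2B%2Be%3Axyz%2Fen") yields "++e:xyz/en", whereas formDecodeValue("++e:xyz/en") yields "  e:xyz/en" with two leading spaces. A server that follows this convention would therefore see two different search terms, which would be consistent with it returning a different page.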


#Helpful Resources

  1. RFC 3986 (to which QUrl conforms)
  2. Encode table
  3. Source of qurl.cpp on Gitorious
Bors answered 24/6, 2014 at 19:5 Comment(4)
Thanks for your elaborate work. I agree that applying fromPercentEncoding() after constructing the QUrl from the original string is the right idea. Regarding the open question about how web servers handle encoded URLs, I could not find much on the web. With your approach this should be no problem. But it is obvious that percent encoding is not totally transparent, since the server I used for tests definitely returns different pages. I suppose this depends entirely on how the server-side scripts are programmed. - Sarthe
I wonder if there is any sensible report/suggestion we could create for the Qt issue tracker. - Sarthe
Very good issue actually, though I think it's understandable that QUrl is not as omnipotent as a modern browser's decoder. In your case, can't you just use the QString (from the user) for displaying/matching and load it into a QUrl only when QWebView needs it? - Bors
Well done! I had the same issue with file paths, and QUrl::fromPercentEncoding() does indeed work. - Agreeable

You can use QUrlQuery::toString(QUrl::FullyEncoded) or QUrl::fromPercentEncoding() for this conversion.

Talanta answered 21/6, 2014 at 16:37 Comment(5)
QUrlQuery::toString() does not help since I operate on a complete URL, not only the query. QUrl::fromPercentEncoding() actually seems to work. But I need to convert the user input with toLatin1 first (to get a QByteArray), which unfortunately kills any Unicode user input. Maybe a good trade-off, but still not a clean solution, is it? - Sarthe
Yes, I see. Which version of Qt do you use? It looks like there was a problem with decoding in 5.0.2: Percent Encoding doesn't seem to be working for '+' in QUrl - Talanta
OK, now I see that my answer is useless. And I don't understand why you need to process '%' in a Unicode string. I was sure that we shouldn't mix the two: in 1-byte strings we need '%' characters, in 2-byte strings we don't. - Talanta
You are right. But please understand that I do not know what the user enters in the QLineEdit. The input could be a %-encoded URL but could also be a Unicode URL, both in a QString. I think that's a common problem. So I expect Qt to provide a function that accepts both kinds of URLs and decodes them if necessary. - Sarthe
I think it is a good idea to analyze and copy the logic of Google Chrome's address bar. Since its source code is available, you can see how it works inside (it will not be a Qt implementation with QUrl, but you'll see the proper conversion logic). - Talanta

I am not sure why toDisplayString(QUrl::FullyDecoded) does not work.

After trying several variants I have found that query(QUrl::FullyDecoded) does decode the query part. Based on an example in the documentation, the following code does return the decoded URL:

QUrl url("http://test.com/query?q=%2B%2Be%3Axyz%2Fen");
url.setQuery(url.query(QUrl::FullyDecoded), QUrl::DecodedMode);
qDebug() << url.toString();

Solving the problem this way is not optimal because the query part is copied unnecessarily.

Juridical answered 24/6, 2014 at 8:25 Comment(0)
