How can I make the Wikipedia API normalize and redirect without knowing the exact case of all characters?

If I try to get the language links for a page on Wikipedia via their API like this:

http://en.wikipedia.org/w/api.php?action=query&prop=langlinks&format=json&lllimit=10&llurl=&titles=wreck-it%20Ralph&redirects=

I get a list of results.

But if I down-case the R in Ralph like this:

http://en.wikipedia.org/w/api.php?action=query&prop=langlinks&format=json&lllimit=10&llurl=&titles=wreck-it%20ralph&redirects=

I get no results.
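The two requests differ only in the `titles` value. The same query can be built with Python's standard library so the title is always URL-encoded the same way. This is a sketch: the helper name `langlinks_url` is hypothetical, and the parameters are copied from the URLs above. `llurl` and `redirects` are flag parameters, so they are sent with an empty value.

```python
from urllib.parse import quote, urlencode

API = "https://en.wikipedia.org/w/api.php"


def langlinks_url(title: str, follow_redirects: bool = True) -> str:
    """Build the langlinks query from the question for an arbitrary title (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "format": "json",
        "lllimit": 10,
        "llurl": "",  # flag: include the URL of each language link
        "titles": title,
    }
    if follow_redirects:
        params["redirects"] = ""  # flag: resolve redirects before querying
    # quote (not the default quote_plus) encodes spaces as %20, matching the URLs above.
    return API + "?" + urlencode(params, quote_via=quote)
```

`langlinks_url("wreck-it ralph")` reproduces the second URL above, apart from using https.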

The returned JSON shows that Wikipedia normalizes "wreck-it Ralph" in the first example to "Wreck-it Ralph", which redirects to "Wreck-It Ralph".

In the second example, "wreck-it ralph" is normalized to "Wreck-it ralph" which doesn't redirect anywhere, apparently.

Searching for "wreck-it ralph" on http://wikipedia.org works, of course:

http://www.wikipedia.org/search-redirect.php?family=wikipedia&search=wreck-it+ralph&language=en

Can I make the langlinks query work the same way, helping me when I don't know the exact case of all the characters of the search term?

Update: From the answer by Sorawee, I managed to find out how to do a case-insensitive search: https://en.wikipedia.org/w/api.php?action=query&generator=search&format=json&gsrsearch=wreck-it%20ralph&gsrlimit=1&prop=info
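The same idea can be extended to return the top search hit and its language links in one request. This sketch assumes that `prop=langlinks` can be fed by the search generator just as `prop=info` is in the query above. That is how generators normally work in the MediaWiki API, but this exact combination is an assumption. The helper name is hypothetical, and the top hit is not guaranteed to be the page you meant.

```python
from urllib.parse import quote, urlencode

API = "https://en.wikipedia.org/w/api.php"


def search_langlinks_url(term: str, limit: int = 1) -> str:
    """Hypothetical helper: search for `term`, then fetch langlinks for the hit(s)."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "search",  # full-text search instead of an exact title lookup
        "gsrsearch": term,
        "gsrlimit": limit,      # keep only the top hit, as in the update above
        "prop": "langlinks",    # assumed to work with the generator like prop=info
        "lllimit": 10,
        "llurl": "",            # flag: include the URL of each language link
    }
    return API + "?" + urlencode(params, quote_via=quote)
```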

Bove asked on 18/1, 2014 at 23:07

In MediaWiki, titles are capitalized automatically. Therefore, "wreck-it Ralph" and "Wreck-it Ralph" are the same page. Similarly, "wreck-it ralph" and "Wreck-it ralph" are the same page. Note that this capitalization applies only to the very first letter. The rest of the title remains case-sensitive.
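The first-letter rule can be sketched as a small Python function. This is a simplification, not MediaWiki's actual implementation. It treats underscores as spaces, collapses whitespace and upper-cases the first character, and it ignores namespaces and Unicode special cases. The `capital_links` parameter (a hypothetical name) stands in for the per-wiki setting `$wgCapitalLinks`, which controls whether the first letter is capitalized at all.

```python
def normalize_title(title: str, capital_links: bool = True) -> str:
    """Rough sketch of MediaWiki title normalization for main-namespace titles."""
    # Underscores and spaces are equivalent; collapse runs and trim the ends.
    cleaned = " ".join(title.replace("_", " ").split())
    if capital_links and cleaned:
        # Only the very first character is upper-cased; the rest is left as-is.
        cleaned = cleaned[0].upper() + cleaned[1:]
    return cleaned
```

With this sketch, `normalize_title("wreck-it ralph")` gives "Wreck-it ralph". That is exactly the non-existent title from the question, because the lower-case "r" in "ralph" is never touched.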

MediaWiki also has "redirect pages", which send the reader from one title to another, possibly completely different, page. For example, https://en.wikipedia.org/wiki/Template:cn redirects to https://en.wikipedia.org/wiki/Template:Citation_needed. Redirect pages are created by users, not by the software.

The situation you describe looks like this:

"wreck-it Ralph" =normalized=> "Wreck-it Ralph" =redirected=> "Wreck-It Ralph" (found)

"wreck-it ralph" =normalized=> "Wreck-it ralph" (does not exist)
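In the JSON returned by `action=query`, both steps of this diagram appear as lists of `from`/`to` pairs under `query.normalized` and `query.redirects`. Below is a minimal sketch that follows them. It assumes the legacy `format=json` response shape, and the function name is hypothetical.

```python
def resolve_title(title: str, response: dict) -> str:
    """Return the title the API ended up looking up for `title` (hypothetical helper)."""
    query = response.get("query", {})
    # Step 1: normalization (e.g. first-letter capitalization).
    normalized = {e["from"]: e["to"] for e in query.get("normalized", [])}
    title = normalized.get(title, title)
    # Step 2: redirects (only reported when the request included &redirects=).
    redirects = {e["from"]: e["to"] for e in query.get("redirects", [])}
    seen = set()
    while title in redirects and title not in seen:  # guard against redirect loops
        seen.add(title)
        title = redirects[title]
    return title
```

Take the hand-written response `{"query": {"normalized": [{"from": "wreck-it Ralph", "to": "Wreck-it Ralph"}], "redirects": [{"from": "Wreck-it Ralph", "to": "Wreck-It Ralph"}]}}`. For this input, `resolve_title("wreck-it Ralph", response)` returns "Wreck-It Ralph". For the lower-case query there is no redirect entry, so the chain stops at "Wreck-it ralph".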

So you can't query the page "wreck-it ralph", because it doesn't exist.

However, if you query "wreck-it Ralph", whether you get the langlinks of "Wreck-It Ralph" depends on the "&redirects=" parameter. Without it, the API returns no langlinks, because "wreck-it Ralph" is itself a redirect page with no langlinks of its own. With "&redirects=", the API follows the redirect (if one exists) and returns the langlinks of the target page, which are the ones you want. You can compare the results of the same query with and without "&redirects=".
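The difference shows up in the returned data. The sketch below collects the language links from a response and assumes the legacy `format=json` layout. In that layout, `query.pages` is keyed by page ID, a missing page carries a `missing` key, and each link's title sits under `"*"`. The fallback to a list of pages with a `title` key (the `formatversion=2` layout) is an assumption. Without `&redirects=`, the page returned is the redirect itself, so the result is empty.

```python
def extract_langlinks(response: dict) -> dict:
    """Return {language code: linked title} for all existing pages in the response."""
    links = {}
    pages = response.get("query", {}).get("pages", {})
    # Legacy format=json keys pages by ID; formatversion=2 (assumed) uses a list.
    page_list = pages.values() if isinstance(pages, dict) else pages
    for page in page_list:
        if "missing" in page:  # e.g. "Wreck-it ralph": nothing to report
            continue
        for link in page.get("langlinks", []):
            # Legacy format stores the title under "*"; formatversion=2 under "title".
            links[link["lang"]] = link.get("*", link.get("title"))
    return links
```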

As for why http://www.wikipedia.org/search-redirect.php?family=wikipedia&search=wreck-it+ralph&language=en works: search-redirect.php is not part of the API. It runs a search and returns the nearest match, whereas the API query we are discussing returns only an exact match.
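To get similar behaviour from your own code, one option is a two-step strategy. Try the exact title query first, and fall back to the search generator only when the title does not resolve to an existing page. The sketch below is hypothetical. `langlinks_with_fallback` and the injected `fetch` callable are illustrative names; `fetch` is anything that sends the parameters to api.php and returns the parsed JSON. The fallback also assumes that `generator=search` combined with `prop=langlinks` behaves like the `prop=info` query in the question's update.

```python
from typing import Callable

# fetch: sends the given parameters to api.php and returns the parsed JSON.
Fetch = Callable[[dict], dict]


def langlinks_with_fallback(term: str, fetch: Fetch) -> dict:
    """Exact title lookup first; full-text search only if that finds nothing."""
    base = {"action": "query", "format": "json", "prop": "langlinks", "lllimit": 10}

    # 1. Exact lookup: normalization plus redirects, as in the question.
    exact = fetch({**base, "titles": term, "redirects": ""})
    pages = exact.get("query", {}).get("pages", {})
    if pages and not all("missing" in page for page in pages.values()):
        return exact

    # 2. Nothing usable: let the search generator pick the nearest match.
    return fetch({**base, "generator": "search", "gsrsearch": term, "gsrlimit": 1})
```

With the third-party requests library, `fetch` could be `lambda p: requests.get("https://en.wikipedia.org/w/api.php", params=p).json()`. Passing it in as a parameter keeps the decision logic testable without network access.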

Implacable answered 12/3, 2014 at 0:13. Comments:
Thanks, this is great information, but it does not answer my question about how I can use the API to redirect automatically. Can I use the API to first redirect to the correct page, and then run the langlinks query? Or can I do it in one step by including some kind of parameter? – Bove
Edited. I think that it answers your question now. – Implacable
I should be happy that comments cannot receive down-votes :) You did answer my question from the start, and the answer was "no". Thanks! But you led me onto the right track with "nearest match". Look at what I dug up: en.wikipedia.org/w/… – Bove
It can be added that MediaWiki wikis can be configured either to capitalize the first letter or not to. Most, if not all, Wikipedia editions capitalize the first letter of the page title automatically. Most, if not all, Wiktionary editions do not. – Wangle
