How can I use the Wiktionary API for getting pronunciation data? [closed]
T

3

10

I was looking for a way to get the pronunciation of any given word by querying an API of some sort. Since Wiktionary is handy for looking up pronunciations, I tried to use its API, but how do I get the pronunciation of a specific word?

It seems the API only returns the entire wiki article.

Tootsie answered 21/3, 2011 at 13:19 Comment(1)
My API also supports this: github.com/Vuizur/ultimate-dictionary-api - Sherd
B
7

Wiktionary doesn't have an API of its own. MediaWiki, the software Wiktionary runs on, does have an API, but it is completely unaware of the structure and content of Wiktionary.

The best you can do is use the MediaWiki API to find the wiki page for the word you want, then look at its table of contents. If the table of contents has a section for the language you want, and within that there is a Pronunciation section, use another API call to get the wikitext of that section, which you will have to parse yourself. Different entries may use different templates, since Wiktionary is constantly evolving.
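
As a rough illustration of those two calls, here is a minimal Python sketch. It assumes the English Wiktionary endpoint and the standard MediaWiki `action=parse` module with `prop=sections` and `prop=wikitext`. The helper names are made up for this example, and the assumption that pronunciations sit in `{{IPA|...}}` templates will not hold for every entry.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumption: the English Wiktionary's MediaWiki endpoint.
API = "https://en.wiktionary.org/w/api.php"


def api_url(**params):
    """Build a MediaWiki API URL that asks for JSON output."""
    return API + "?" + urllib.parse.urlencode({"format": "json", **params})


def find_pronunciation_section(sections, language="English"):
    """Return the index of the Pronunciation section under `language`, or None.

    `sections` is the list returned by action=parse&prop=sections.
    """
    inside = False
    for s in sections:
        if int(s["toclevel"]) == 1:
            # Top-level headings on Wiktionary are language names.
            inside = s["line"] == language
        elif inside and s["line"].startswith("Pronunciation"):
            return s["index"]
    return None


def extract_ipa(wikitext):
    """Collect the /phonemic/ and [phonetic] arguments of {{IPA|...}} templates."""
    found = []
    for body in re.findall(r"\{\{IPA\|([^{}]*)\}\}", wikitext):
        found += [arg for arg in body.split("|") if arg[:1] in ("/", "[")]
    return found


def fetch_json(url):
    # The User-Agent value is a placeholder; replace it with your own contact info.
    req = urllib.request.Request(url, headers={"User-Agent": "pron-sketch/0.1 (example)"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def pronunciations(word, language="English"):
    """Two API calls: the table of contents first, then one section's wikitext."""
    toc = fetch_json(api_url(action="parse", page=word, prop="sections"))
    index = find_pronunciation_section(toc["parse"]["sections"], language)
    if index is None:
        return []
    sec = fetch_json(api_url(action="parse", page=word, prop="wikitext", section=index))
    return extract_ipa(sec["parse"]["wikitext"]["*"])
```

Calling `pronunciations("work")` performs the network requests. The parsing helpers work on plain data, so they can be tried offline against a saved response.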

There are also mailing lists for Wiktionary and for MediaWiki API.

Breger answered 19/4, 2011 at 0:48 Comment(6)
Thanks for that API; I'd been XML-parsing the dictionary entry pages in my application. - Watchmaker
@Tortoise: You're welcome. It would probably be easier these days if there were a way to do jQuery-style selectors on the HTML. You can get the HTML of the whole page or a single section minus most of the boilerplate either with some URL parameters or via the API. - Breger
The "jQuery-style" was just to mess with me, right? ;) - Watchmaker
@Tortoise: Not really. I know there are implementations of the DOM API in languages other than JavaScript and I know jQuery's selection stuff is from a separate project called Sizzle. So without knowing much more I'm just not ruling out the possibility that somebody may have ported some subset of this stuff to PHP, or made something different that functions in a similar way. Another possibility is if there is some interface in existence between PHP and node.js ... - Breger
You mean, like, SimpleXML? I don't follow. - Watchmaker
@Tortoise: I don't know of anything specific; I only know that there are many things out there that I haven't heard about that you might be able to find with some searching. I'll keep a lookout too, though... Check out this old question which mentions a "phpQuery": Is there a JQuery DOM manipulator/CSS selector equivalent class in PHP? - Breger
B
5

You could build on Wiktionary DBpedia and send a SPARQL query like the following one to its SPARQL endpoint:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wt:<http://wiktionary.dbpedia.org/terms/>

SELECT DISTINCT ?spell ?pronounce
WHERE { 
  ?spell rdfs:label "work"@en ;
            wt:hasLangUsage ?use .

  ?use dc:language wt:English ;
          wt:hasPronunciation ?pronounce .
}

In this case "work" is the word for which you want to look up the pronunciation.
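
If you want to send that query from a program, a minimal Python sketch could look like the following. It assumes the endpoint follows the standard SPARQL protocol (a `query` parameter over HTTP GET, with results in the W3C SPARQL JSON results format). The endpoint URL is left as a parameter because it depends on the service, and the helper names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# The query from above, with a slot for the quoted word.
QUERY_TEMPLATE = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wt: <http://wiktionary.dbpedia.org/terms/>

SELECT DISTINCT ?spell ?pronounce
WHERE {
  ?spell rdfs:label %s ;
         wt:hasLangUsage ?use .
  ?use dc:language wt:English ;
       wt:hasPronunciation ?pronounce .
}
"""


def sparql_literal(text, lang="en"):
    """Quote `text` as a language-tagged SPARQL string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def build_query(word):
    return QUERY_TEMPLATE % sparql_literal(word)


def request_for(endpoint, query):
    """Prepare a GET request that asks for JSON-formatted results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def values(results, var):
    """Pull the plain values bound to `var` out of a SPARQL JSON result."""
    return [b[var]["value"] for b in results["results"]["bindings"] if var in b]


def lookup(endpoint, word):
    # `endpoint` is whatever SPARQL URL the service publishes (not hard-coded here).
    with urllib.request.urlopen(request_for(endpoint, build_query(word))) as resp:
        return values(json.load(resp), "pronounce")
```

Only `lookup` touches the network. The other helpers can be checked offline, which is handy when an endpoint is temporarily unreachable.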

EDIT:

A similar project is dbnary, which is more active and delivers more reliable results. You can use the SPARQL endpoint with the following query:

SELECT DISTINCT ?pronun
WHERE {
  ?form lemon:writtenRep "work"@en ;
        lexinfo:pronunciation ?pronun .
}
Beamy answered 27/9, 2012 at 20:17 Comment(6)
That SPARQL endpoint is currently a broken link. Do you know if it's just temporary or do you have an alternate link? I tried this query elsewhere with no results. I'm a fan of DBpedia but not very knowledgeable. - Breger
@hippietrail: The endpoint works fine for me. - Beamy
OK I've moved from Seoul to Sydney and either it got fixed in that time or my location made a difference for some reason. I have noticed that the first letter of the first pronunciation is consistently missing though: "work" -> "ɜː(r)k"@en; "pork" -> "ɔː(r)k"@en - Breger
@hippietrail: I get four results for work. Three are the different pronunciations and one of them is the "Rhymes" entry from wiktionary, which is "missing" the first letter for obvious reasons. I don't know if listing the rhyme as hasPronunciation is a bug or the dbpedia people really consider it a pronunciation, but I expect the former. - Beamy
Is there documentation somewhere for this API? For example I want to locate audio files with pronunciation. - Tapeworm
@MateuszKonieczny: I don't see links to the audio files in either project. However, it should be possible to add them to dbnary without too much work. I assume the project would be happy about such a contribution. I would also like to have this information for my project WikDict. - Beamy
E
2

Here is what I did for a similar situation.

  1. Visit Scraping Links With PHP. It will teach you how to scrape links using PHP. Please do not copy and paste but try to learn it.
  2. Now that we have our links, we need to separate the audio (*.ogg) ones from the normal links. For that we can use the pathinfo function in PHP. The official documentation for pathinfo should be a good start.
  3. Create an XML document out of the result.
  4. Deliver the content using Ajax or any other preferred way.
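
For readers not using PHP, steps 1 to 3 can be sketched in Python with the standard library only. This is an illustration under assumptions, not the code I used: the class, function, and XML element names are invented for the example, and `posixpath.splitext` stands in for PHP's pathinfo.

```python
import posixpath
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Step 1: collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def audio_links(html, extension=".ogg"):
    """Step 2: keep only links whose path ends in the audio extension."""
    parser = LinkCollector()
    parser.feed(html)
    return [
        link for link in parser.links
        # Look only at the path so query strings do not hide the extension.
        if posixpath.splitext(urlparse(link).path)[1].lower() == extension
    ]


def to_xml(word, links):
    """Step 3: wrap the result in a small XML document (element names are made up)."""
    root = ET.Element("pronunciations", word=word)
    for link in links:
        ET.SubElement(root, "audio").text = link
    return ET.tostring(root, encoding="unicode")
```

The HTML passed to `audio_links` would be the downloaded entry page. Fetching it is left out here so the sketch stays independent of any particular HTTP client.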

Or you can give "http://api.forvo.com/demo" a try. It looks promising.

I will not give you the full answer, because then it would not be fun any more. I hope it helps.

Emmie answered 21/3, 2011 at 13:38 Comment(1)
Your solution doesn't use an API; it does manual scraping. - Lapstrake
