Thesaurus class or API for PHP [edited]

TL;DR Summary: I need a single command-line application that I can use to get synonyms and other related words. It needs to be multilingual and work cross-platform. Can anyone suggest a suitable program, or help me with the ones I've already found? Thanks.


Longer version: I've been tasked with writing a system in PHP that can come up with alternative suggestions for words entered by the user. I need to find a thesaurus application / API or similar which I can use to generate these suggestions.

Importantly, it needs to be multilingual (English, Danish, French and German). This rules out most of the software that I managed to find using Google. It also needs to be cross-platform (it needs to work on Linux and Windows).

My research has led me to two promising candidates: WordNet and StarDict.

I've been focusing on WordNet so far, calling it from PHP using the shell_exec() function, and I've managed to create a very promising prototype PHP page with it, but so far in English only. I'm struggling with how to make it multilingual.
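
For illustration, the kind of call described above could look roughly like the sketch below. It is written in Python for brevity; a PHP version would have the same shape, using shell_exec() together with escapeshellarg(). It assumes the WordNet `wn` binary is on the PATH, that `-synsn` (noun synonyms) is the search wanted, and that each "Sense N" header in the output is followed by a comma-separated synonym line. The function names are hypothetical.

```python
import re
import subprocess
from typing import List


def parse_wn_synonyms(output: str) -> List[str]:
    """Collect the comma-separated synonym line that follows each 'Sense N' header."""
    synonyms: List[str] = []
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if re.fullmatch(r"Sense \d+", line.strip()) and i + 1 < len(lines):
            for term in lines[i + 1].split(","):
                term = term.strip()
                if term and term not in synonyms:
                    synonyms.append(term)
    return synonyms


def wordnet_synonyms(word: str) -> List[str]:
    """Run `wn` for one word and return its synonyms, excluding the word itself."""
    # Passing an argument list (no shell) avoids quoting problems with user input.
    # The exit status is deliberately not checked here.
    result = subprocess.run(["wn", word, "-synsn"], capture_output=True, text=True)
    return [s for s in parse_wn_synonyms(result.stdout) if s.lower() != word.lower()]
```

Keeping the parser separate from the process call means the same parsing step could be reused if another language's data were exposed through a similar text output.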

The WordNet site has external links to WordNet projects in other languages (e.g. DanNet for Danish), but although they're often called WordNet, they seem to use a variety of database formats and software, which makes them unsuitable for me. I need a consistent interface that I can call from my PHP program.

StarDict looked more promising from that perspective: it provides dictionaries in many languages in a standard DB format for a single application.

But the downside of StarDict is that it's primarily a GUI app. Calling it from the command line launches the GUI. There is apparently a command-line version (SDCV), but it seems quite out of date (last update 2006) and is only for Linux.

Can anyone help me with my problems with either of these programs? Or else, can anyone suggest any other alternative software or API that I could use?

Many thanks.

Capitalize asked 28/4, 2011 at 11:09 Comment(0)
B
7

You could try to leverage PostgreSQL's full text search functionality:

http://www.postgresql.org/docs/9.0/static/textsearch.html

You can configure it with any of the available languages and all sorts of collations to fit your needs. PostgreSQL 9.1 adds some extra collation functionality that you may want to look into if the approach seems reasonable.

The basic steps would be (for each language):

  1. Create the needed table (collated appropriately). For our purposes, a single column is enough, e.g.:

    create table dict_en (
      word text check (word = lower(word)) primary key
    );
    
  2. Fetch the needed dictionary/thesaurus files (those from aspell/OpenOffice should work).

  3. Configure text search (see link above, namely section 12.6) using the relevant files.

  4. Insert the whole dictionary into the table. (Surely there's a CSV file somewhere...)

  5. And finally index the vector, e.g.:

    create index on dict_en using gin (to_tsvector('english', word));
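
Step 4 is glossed over above, so here is a minimal sketch of one way to prepare a word list for loading. It assumes the source is a Hunspell-style .dic file as shipped with OpenOffice dictionaries (an entry count on the first line, then one `word/FLAGS` entry per line); the function names and the output file name are hypothetical, and escaped slashes inside words are not handled.

```python
from typing import Iterable, List


def normalize_words(lines: Iterable[str]) -> List[str]:
    """Turn Hunspell-style .dic lines into unique lowercase words, in input order."""
    seen = set()
    words: List[str] = []
    for raw in lines:
        entry = raw.strip()
        # Skip blank lines and purely numeric lines (the leading entry count).
        if not entry or entry.isdigit():
            continue
        # Drop affix flags ("word/FLAGS") and any tab-separated extra fields.
        word = entry.split("/", 1)[0].split("\t", 1)[0].strip().lower()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


def write_word_file(words: Iterable[str], path: str) -> None:
    """Write one word per line, UTF-8 encoded, ready for a bulk load."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for word in words:
            handle.write(word + "\n")
```

Lowercasing and de-duplicating up front keeps the rows compatible with the check constraint and primary key on dict_en. The resulting file could then be loaded with psql's \copy, for example `\copy dict_en (word) from 'words_en.txt'`.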
    

You can now run queries that use this index:

    -- Find words related to `:word`
    select word
    from dict_en
    where to_tsvector('english', word) @@ plainto_tsquery('english', :word)
    and word <> :word;

You might need to create a separate database or schema for each language, and add an additional tsvector column if Postgres refuses to index the expression because of the language parameter. (I read the full-text docs a long time ago.) The details would be in section 12.2, and I'm sure you'll know how to adjust the above if that is the case.
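
One way to picture the per-language variant is to generate the same statements for each language, with a stored tsvector column in place of the expression index. This is only a sketch under stated assumptions: the table names (dict_da and so on), the tsv column and the ddl_for helper are hypothetical, while english, danish, french and german are text search configurations that ship with PostgreSQL. The code only builds SQL strings; executing them is left to whatever database client is in use.

```python
from typing import Dict, List

# Language code -> built-in PostgreSQL text search configuration.
LANGUAGES: Dict[str, str] = {
    "en": "english",
    "da": "danish",
    "fr": "french",
    "de": "german",
}


def ddl_for(code: str, config: str) -> List[str]:
    """Build the statements for one language (hypothetical naming scheme)."""
    table = f"dict_{code}"
    return [
        # Same table as in step 1, plus a column that stores the parsed vector.
        f"create table {table} (word text primary key check (word = lower(word)), tsv tsvector);",
        # Meant to run after the dictionary has been loaded (step 4).
        f"update {table} set tsv = to_tsvector('{config}', word);",
        # Index the stored column rather than an expression.
        f"create index {table}_tsv_idx on {table} using gin (tsv);",
    ]


if __name__ == "__main__":
    for code, config in LANGUAGES.items():
        print("\n".join(ddl_for(code, config)))
```

With this layout the lookup for Danish would filter on `tsv @@ plainto_tsquery('danish', :word)` against dict_da, and similarly for the other languages.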

Whatever the implementation details, though, I believe the approach should work.

Backstay answered 15/5, 2011 at 6:26 Comment(3)
+1 because there's some useful stuff here. We have a MySQL db, but I believe it has similar search functionality. You glossed over the bit about 'fetch the needed dictionary/thesaurus files' and 'insert into the table' -- I've been looking for suitable files and haven't found anything that I'm happy with yet. Any pointers would be welcome. Thanks. - Capitalize
Haven't done this myself, but if memory serves the Postgres docs mention aspell and OpenOffice source files at some point. I take it this means the dictionary files for the latter two are available somewhere. - Backstay
Thanks. I'll keep digging. Bounty for you since you're the best answer, but I don't think I've really solved the problem yet. - Capitalize
D
7

There is a PHP example for a thesaurus API usage here...

http://thesaurus.altervista.org/testphp

Available for Italian, English, French, German, Spanish and Portuguese.

Delgadillo answered 28/4, 2011 at 11:18 Comment(4)
Thanks for that. It looks good. The downsides: 1) I'd prefer to keep the dictionaries locally rather than relying on a third party, and 2) more importantly for me, it doesn't support Danish, which is a key requirement for this project. That's a shame, because if it did, I could have been tempted to accept the fact that it is a remote service. - Capitalize
I note that it uses the OpenOffice thesaurus. I wonder if that might be an option for me to use locally? Is there an API that I can use for it? - Capitalize
Hmmm - not sure on that - see this thread: user.services.openoffice.org/en/forum/… - Delgadillo
I was also going to suggest OpenOffice (or LibreOffice) - we've recently used it from the command line to perform various document conversions. Given that the code is open source, even if there isn't an existing API for it, you could always have a poke around, see its usage inside OpenOffice itself and try to reverse-engineer something. I suppose it depends on time/budget constraints! :-) - Brittain
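
On the question in the comments about using the OpenOffice thesaurus locally: the thesaurus data files are plain text, so they can probably be read without OpenOffice itself. The sketch below assumes the MyThes .dat layout as I understand it (the first line names the encoding, then each entry is a `word|N` line followed by N lines of the form `(pos)|synonym|synonym|...`). The parse_mythes name is hypothetical, and the layout should be checked against the actual file before relying on it.

```python
from typing import Dict, Iterable, List


def parse_mythes(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each headword to its synonyms, merged across all listed meanings."""
    it = iter(lines)
    next(it, None)  # Skip the first line, which names the file encoding.
    thesaurus: Dict[str, List[str]] = {}
    for header in it:
        header = header.rstrip("\r\n")
        if "|" not in header:
            continue
        word, _, count = header.rpartition("|")
        try:
            meanings = int(count)
        except ValueError:
            continue  # Not an entry header; ignore the line.
        synonyms = thesaurus.setdefault(word, [])
        for _ in range(meanings):
            meaning = next(it, "").rstrip("\r\n")
            # The first field is a part-of-speech tag such as "(noun)".
            for synonym in meaning.split("|")[1:]:
                synonym = synonym.strip()
                if synonym and synonym not in synonyms:
                    synonyms.append(synonym)
    return thesaurus
```

A file would be opened with the encoding named on its first line and passed in line by line. The resulting dictionary could be queried directly or written into a database table, one file per language.
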
S
0

This seems to be an option, though I'm not sure whether it's multilingual: http://developer.dictionary.com/products/synonyms

I also found the following site, which does something similar to your end goal. Maybe you could try contacting the owner to ask how he did it: http://www.synonymlab.com/

Severally answered 16/5, 2011 at 17:22 Comment(0)
