Wordnet SQL Explanation
Asked Answered
C

4

30

I'm trying to get a simple synonym database up and running, so I can find synonyms of words the user entered (nothing else!). For this I grabbed a copy of the Wordnet sql thesarus (http://wnsql.sourceforge.net/), but now I'm presented with all these tables, and I can't find any simple explanation for their content anywhere:

adjpositions
adjpositiontypes
casedwords
lexdomains
lexlinks
linktypes
morphmaps
morphs
postypes
samples
semlinks
senses
synsets
vframemaps
vframes
vframesentencemaps
vframesentences
words

Could someone tell me what these tables contain and which I need, since I cant decipher their content based on their data.

Christoforo answered 16/8, 2013 at 16:38 Comment(1)
user15602949 posted an Answer saying "https://web.archive.org/web/20150923131021/http://wnsqlbuilder.sourceforge.net/wnsql.pdf this pretty much outlines the entire data structure, just thought i would leave this here for future reference"Maelstrom
F
46

WordNet is a super cool word database. I have been researching it myself. I'll list my findings below - and hopefully it will help you to understand the tables better.

The Synset Table The synsets table is one of the most important tables in the database. It is responsible for housing all the definitions within WordNet. Each row in the synset table has a synsetid, a definition, a pos (parts of speech field) and a lexdomainid (which links to the lexdomain table) There are 117373 synsets in the WordNet Database.

The Words Table WordNet also has a “words” table, that only has two fields: a wordid, and a “lemma”. The words table is responsible for housing all the lemmas (base words) within the Wordnet Database. There are 146625 entries in this table

So.. how are these two tables linked? The answer? The sense table!

The Sense Table The sense table is responsible for linking together words (in the words table), with definitions (in the synset table). The entries in the sense table are referred as “word-sense pairs” - because each pairing of a wordid with a synset is one complete meaning of a word - a “sense of the word”.
There are a total of 206,354 word senses in the WordNet database.

The Lexdomains table The Lexdomains table is referenced by the sense table, and is used to define what lexical domain a word-sense pair belongs to. There are 45 lexical domains in the lexdomains table. The lexdomain table therefore, is WordNet’s way of “tagging” a word-sense pair. However, it is quite limited, because a word-sense pair can only belong to ONE lexical domain.

The 45 lexical domains include:

Adjectives: all, pert

Adverbs all

Nouns tops, act, animal, artifact, attribute,body, cognition, communication, event, feeling, food, group, location,motive,object, person, phenomenon, plant, possession, process, quantity,linkdef, shape, state, substance, time,

Verbs body, change, cognition,communication, competition, consumption, contact, creation, emotion, motion, perception, possession, social, stative, weather, ppl

The casedwords table Some words within the words table naturally have the first letter capitalized ie: “A-team”. Since the words table stores all words as lowercase, WordNet uses this table to specify the uppercase version of the word. There are 40313 entries in this table.

There are many other tables in the WordNet DB, once I have them researched, I'll post again.

Finding yer synonyms To answer your question regarding synonyms - You need to do the following.

Let's say you want to find the synonyms for the word "Carry". In order to do so, you would first search the words table for a lemma matching the word "carry". This would yield the wordid 21253. You would then search the senses table, to find all word-sense pairs for the word carry. This yields 41 results - each result lists the wordid 21253, and a senseid (which is the index of the word-sense pair) and a synsetid.

Now, you would then need to query the synset table for each of the synsetid's returned so you can access the associated definition field in the synset table.

Lastly to find the synonyms for each of the synsets listed, you'd simply need to search the sense table for other word-sense pairs that shared the same synset.

Example: One of the 41 word-sense pairs for the word "carry" is listed below: wordsense example If we lookup the definition for this synsetid 202083512, you will find “transmit or serve as the medium for transmission”

To find all the synonyms for this definition, you would then search the sense table for the same synsetid 202083512. This yields synonyms: channel, conduct, convey, impart, and transmit (note: you will need to left join the words table to get the actual lemmas)

I hope this helps demystify WordNet for you.. I'm finding it to be quite cool...

Featherstone answered 7/11, 2013 at 12:22 Comment(8)
I want to know a little more: Is it all contained in one db file or part files? What is the exact procedure for creating db files. Because when I run batch files in the given package of this SQLBuilder I got sql command error.Melanymelaphyre
I downloaded the database as mysql tables, as that is what I am used to working with. You can download them here: wnsql.sourceforge.net I hope this helpsFeatherstone
Thanks Paul for quick reply. So according to you I will have to use many .sql files to work with?Melanymelaphyre
Sorry for the very late reply, the wordnet you download will have one sql file you can use to import - which will create all the needed tables...Featherstone
in the unzipped folder, read the "howto.html" file. There it is explained how to populate your database (make sure you have created your database first)Cashier
Hey paul I have the following query "select words.lemma AS word,synsets.definition AS Definition From senses LEFT JOIN synsets ON senses.synsetid=synsets.synsetid LEFT JOIN words ON senses.wordid = words.wordid where words.lemma='"+word+"'"; but it is not returning many common words(participles for example). How can I search for participles and all the words this query is not returning?Hermeneutics
Thanks so much Paul. Here is the SQL I came up with to find all of the synset synonyms to a word ('carry'). I was able to download and run this against SQLite, pretty easily. select words.lemma, synsets.definition from synsets join (select synsets.synsetid, synsets.definition from words join senses on words.wordid=senses.wordid join synsets on synsets.synsetid=senses.synsetid where words.lemma='carry' ) as syns on syns.synsetid=synsets.synsetid join senses on synsets.synsetid=senses.synsetid join words on senses.wordid=words.wordid order by words.lemma;Hartmann
Any idea how to get example sentences for each word?Giaour
A
7

Paul Preibisch explained several core tables, here are short explanations for the rest of them:

adjpositiontypes - defines three positions that adjectives can take in English language, predicate, attributive and immediatelly postnominal.

adjpositions - links concrete words (adjectives) with their allowed position types in adjpositiontypes table.

linktypes - defines all relation (link) types used in wordnet, about two dozen of them. Both lexlinks and semlinks tables use this table to define the type of each link. Some link types are marked as recursive, meaning that if "furniture" is, for example, a hypernim to a "chair", then a "chair" is a hyponym to "furniture".

lexlinks - lexical links, i.e., relations between words. Example:
sad - saddness (derivation)

semlinks - semantic links, i.e. relations between synsets. Example:
chair - furniture (hypernym)

morphs - connected to "words" table, contains irregular word forms. One word can have multiple morphs, and one morph can be an irregular form for multiple words, so you additionally have the morphmaps table. Examples:
abacus (word) - abaci (morph)
abhor (word) - abhorred, abhorring (morphs)

postypes - defines "parts of speech". Contains only following values:
n – noun, v –verb, a – adjective, r – adverb, s – adjective satellite.

samples - sample sentences for synsets. One synset can have multiple samples.

vframemaps & vframes - vframes define a kind of standard "verb templates". Vframemaps links words (verbs) with corresponding vframes in which they can appear.

vframesentencemaps & vframesentences - similar to previous two tables, just here you have entire sentences as verb templates.

Astringent answered 18/5, 2015 at 16:2 Comment(0)
F
3

To properly understand the meaning of the various terms in Wordnet, you should read the extensive documentation. For synonyms, you'll primarily need the synsets table. The actual database tables in the project you've downloaded are described on the project's schema page.

Fennell answered 17/8, 2013 at 10:11 Comment(2)
Any idea how to get example sentence for each sense for a given word?Giaour
this schema page link is dead :( There's a waybackmachine link to the schema diagram here: web.archive.org/web/20130604065331/http://wnsql.sourceforge.net/…Orlon
W
2

I think this figure will help you to demystify WordnetDB.

this figure I found it in /mysql-3.0.0-31-wn-31/doc/images. For clearer picture, you can choose tables-wordnet.png in that folder instead.

Whitfield answered 14/12, 2017 at 1:58 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.