.dic line format definition

I am currently investigating the most appropriate dictionary to use in an application I am building.

Inspecting the dictionaries which are bundled with Sublime Text 2, the file format is as you would expect: a list of alphabetically ordered words. However, a lot of those words have additional information appended to them. Take this snippet as an example:

abaft
abbreviation/M
abdicate/DNGSn
Abelard/M
abider/M
Abidjan
ablaze
abloom
aboveground
abrader/M
Abram/M
abreaction/MS
abrogator/MS
abscond/DRSG
absinthe/MS
absoluteness/S
absorbency/SM
abstract/ShTVDPiGY
absurdness/S

A fruitless Google search has not shed any light on what the letters after the slash (/) mean.

Maybe they hint at the sex of the word, but that is only a guess and I'd prefer to read a formal explanation of their meaning.

Has anybody come across these?

Hydrokinetics answered 17/9, 2013 at 14:7 Comment(0)

The letters following the slash are affix flags. Each flag stands for a prefix or suffix rule that may be applied to the root word.

See this blog post for a nice explanation and examples of what these affixes can be used for.

Another place to look is the aspell manual.

Vomitory answered 18/9, 2013 at 14:17 Comment(0)

TLDR: each letter in the .dic file following the slash is a name of a rule in the .aff file.
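
As a rough illustration of that split, here is a minimal Python sketch that separates a .dic entry into its root word and its rule letters. The function name `split_dic_entry` is made up for this example, and it assumes single-character flags and no other fields on the line, which matches the snippet in the question but may not hold for every dictionary.

```python
def split_dic_entry(line):
    """Split a .dic line such as 'abdicate/DNGSn' into (word, flags).

    Assumption: every flag is a single character and the line carries
    nothing besides the word and its optional flags.
    """
    word, _, flags = line.strip().partition("/")
    return word, list(flags)


print(split_dic_entry("abdicate/DNGSn"))  # ('abdicate', ['D', 'N', 'G', 'S', 'n'])
print(split_dic_entry("ablaze"))          # ('ablaze', [])
```

Each letter in the returned list would then be looked up in the .aff file.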

https://superuser.com/a/633869/367530

Each rule is in the .aff file for that language. The rules come in two flavors: SFX for suffixes, and PFX for prefixes. Each line begins with PFX or SFX, followed by the rule letter identifier (the one that follows the word in the dictionary file):

PFX [rule_letter_identifier] [combinable_flag] [number_of_rule_lines_that_follow]

You can normally ignore the combinable flag; it is Y or N depending on whether the rule can be combined with other rules. Then there are some number of lines (indicated by [number_of_rule_lines_that_follow]) that list different possibilities for how this rule applies in different situations. Each of them looks like this:

PFX [rule_letter_identifier] [letters_to_delete] [what_to_add] [when_to_add_it]

For example:

  • SFX B Y 3
  • SFX B 0 able [^aeiou]
  • SFX B 0 able ee
  • SFX B e able [^aeiou]e

If B is one of the letters following a word, i.e. someword/B, then this is one of the rules that can apply. There are three possibilities that can happen (because there are three lines). Only one will apply:

  • able is added to the end when the word ends in a letter that is not (indicated by ^) one of the letters in the set (indicated by [ ]) a, e, i, o, and u. Nothing is stripped (the 0 in the column before able). For example, question → questionable.
  • able is added to the end when the word ends in ee. For example, agree → agreeable.
  • able is added to the end when the word ends in a non-vowel ([^aeiou]) followed by an e. The final e is stripped first (the column before able). For example, excite → excitable.
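
The three cases can be tried out with a short Python sketch. This is only an approximation under stated assumptions: `apply_suffix` is a made-up helper, the conditions are treated as ordinary regular expressions anchored at the end of the word, and the first matching line wins. A real spell checker may handle conditions differently.

```python
import re

# (letters_to_delete, what_to_add, when_to_add_it) copied from the SFX B lines;
# "" stands in for the 0 that means "delete nothing".
SFX_B = [
    ("", "able", "[^aeiou]"),
    ("", "able", "ee"),
    ("e", "able", "[^aeiou]e"),
]


def apply_suffix(word, rules):
    """Return the suffixed form from the first matching rule, or None."""
    for strip, add, condition in rules:
        # Assumption: the condition behaves like a regex matched at the word end.
        if re.search("(?:" + condition + ")$", word) and word.endswith(strip):
            stem = word[: len(word) - len(strip)]
            return stem + add
    return None


print(apply_suffix("question", SFX_B))  # questionable
print(apply_suffix("agree", SFX_B))     # agreeable
print(apply_suffix("excite", SFX_B))    # excitable
print(apply_suffix("radio", SFX_B))     # None
```

The last call shows a word for which none of the three lines applies, so the rule produces nothing.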

PFX rules work the same way, but they apply at the beginning of the word instead of the end.
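
For symmetry, here is a small Python sketch of the prefix case. The rule used (delete nothing, add re, apply to any word) is a hypothetical example chosen for illustration and is not taken from the dictionary in the question. `apply_prefix` is likewise a made-up helper, and the condition is again assumed to behave like a regular expression, this time matched at the start of the word.

```python
import re


def apply_prefix(word, strip, add, condition):
    """Apply one PFX-style rule line to a word, or return None if it does not fit."""
    # Assumption: the condition is matched against the beginning of the word.
    if re.match("(?:" + condition + ")", word) and word.startswith(strip):
        return add + word[len(strip):]
    return None


# Hypothetical rule line: PFX A 0 re .
print(apply_prefix("build", "", "re", "."))  # rebuild
```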

Slavey answered 27/10, 2016 at 21:17 Comment(0)
