VIM thesaurus file

I have been looking for a good thesaurus solution for Vim. The capability is built in, but the file everyone seems to use is mthesaur.txt. While it 'works' in the sense that the insert-mode commands bring up a list, the results seem to me programmatically correct but not very useful. The vim online thesaurus plugin works very well, but the network latency and the need for a split to show the returned buffer are less than ideal. Does anyone have an opinion about this?

Wilsey answered 31/10, 2015 at 15:47 Comment(0)

I have written a plugin that can address the two issues you raised here.

Multi-language Thesaurus Query plugin for Vim

It improves the user experience in two respects: a more sensible mechanism for choosing synonyms, and better, more flexible synonym sources.


By default, the plugin displays candidates in Vim's message box, with each synonym labeled by a number, and lets the user pick one to replace the word under the cursor by typing its number. It works much like Vim's default spelling-correction prompt and drastically reduces the time needed to choose a suitable synonym from a long list of candidates.

To improve the quality of synonym candidates, multiple query backends are used. For English users, two are noteworthy.

  • thesaurus_com: backend using Thesaurus.com as the synonym source
  • mthesaur_txt: backend using mthesaur.txt as the synonym source

The thesaurus_com backend works straight away. For the local mthesaur_txt backend to work, you need to download mthesaur.txt and tell the plugin where it is, either by setting thesaurus or by specifying the variable g:tq_mthesaur_file. Otherwise only the online backend will be functional.
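
Either way of pointing the plugin at mthesaur.txt can go in your vimrc. A minimal sketch, assuming the file was saved under ~/.vim/thesaurus/ (a hypothetical location, not something the plugin prescribes; substitute your own path):

```vim
" Pick ONE of the two settings below.
" The directory ~/.vim/thesaurus/ is an assumption made for this example.

" Option 1: reuse Vim's built-in 'thesaurus' option.
set thesaurus+=~/.vim/thesaurus/mthesaur.txt

" Option 2: set the plugin-specific variable named in this answer.
let g:tq_mthesaur_file = expand('~/.vim/thesaurus/mthesaur.txt')
```

Option 1 also makes the same file available to Vim's own Ctrl-X Ctrl-T thesaurus completion, which may or may not be what you want.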

By default, the online backend is queried first. But if the internet is unavailable or too slow, later queries in the current Vim session are handled by the local backend first to reduce latency. The priority of the two backends can also be changed manually (see the documentation).

To address the latency issue (which usually stands out when the word is not found), I have introduced a timeout mechanism. You may set

let g:tq_online_backends_timeout = 0.6

if your internet connection is reasonably fast, so that the latency is kept under 0.6 seconds.

The plugin is written in Python, though, so you will want to use it with a Vim compiled with Python and/or Python3 support.

Fustian answered 28/2, 2016 at 8:47 Comment(4)
@ThadBrown Give it a go~ Feedback is welcome. :D - Fustian
The tricky thing to me (both in terms of the latency and the overall plugin) is the lack of a really good thesaurus file. For example, if you want autocompletion for a huge database, you can just query for the column/table names, do another query for stored procedures, and so on, and cat the results into a dictionary file. That can be a basic autocompletion source. Is there a thesaurus file out there that is licensed in such a way that it can be redistributed and modified? I could scrape a website for myself, but that wouldn't help anyone else if the result couldn't be redistributed. Thoughts? - Wilsey
I will move this over to GitHub. See you there. Great work, btw. - Wilsey
@ThadBrown I think mthesaur.txt is fine as a basic synonym list. My main beef with other plugins that use it is the choosing mechanism. Classifying synonyms by definition, the way Thesaurus.com does, is certainly better, but what lies behind that seems to be a very complicated database, and so far I don't know of one for offline use. And agreed, we should move to GitHub. Thanks for your interest~ - Fustian

If your system is Unix-like and you have awk installed, I have a simple solution that gives you access to thesauri in multiple languages without an internet connection and without a split window.

First download the LibreOffice thesauri, for example from:

https://cgit.freedesktop.org/libreoffice/dictionaries/tree/

(Look for the th_*.dat files; these are the ones you need, not the .aff and .dic files, which only work for spellchecking with Hunspell.) Download the *.dat thesauri you want and copy them to a subdirectory of the folder where you will put your plugin; this subdirectory should be called "thes".

Now create a new file in your plugin folder (the folder where you should have the "thes" subdirectory with the *.dat thesauri inside) and put the following in this file:

" offer choice among installed thesauri
" ==================================================
let s:thesaurusPath = expand("<sfile>:p:h") . "/thes"

function! s:PickThesaurus()
    " 1, 1: do not apply 'suffixes' and 'wildignore'; return a List
    let thesaurusList = glob(s:thesaurusPath . "/*", 1, 1)
    if len(thesaurusList) == 0
        echo "Nothing found in " . s:thesaurusPath
        return
    endif
    let index = 0
    let optionList = []
    for name in thesaurusList
        let index = index + 1
        let shortName = fnamemodify(name, ":t:r")
        let optionList += [index . ". " . shortName]
    endfor
    let choice = inputlist(["Select thesaurus:"] + optionList)
    let indexFromZero = choice - 1
    if (indexFromZero >= 0) && (indexFromZero < len(thesaurusList))
        let b:thesaurus = thesaurusList[indexFromZero]
    endif
endfunction

command! Thesaurus call s:PickThesaurus()

This will allow you to pick the thesaurus of your choice by typing :Thesaurus in Vim's command-line mode.

(Actually, if you plan to use only one thesaurus, you don't need any of this; just assign the full path of your thesaurus file to the buffer-local variable b:thesaurus.)
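
For the single-thesaurus case, the assignment could be done automatically with an autocommand. This is only a sketch: the augroup name, the file types, and the file name th_en_US_v2.dat are assumptions chosen for illustration, so adjust them to whatever you downloaded and wherever you put it.

```vim
" Hypothetical setup: always use one English thesaurus for prose buffers.
" The path and file name below are assumptions, not part of the answer.
augroup ThesaurusDefault
    autocmd!
    autocmd FileType text,markdown
        \ let b:thesaurus = expand('~/.vim/plugin/thes/th_en_US_v2.dat')
augroup END
```

Because b:thesaurus is buffer-local, it has to be set for each buffer, which is why an autocommand is used here rather than a single let in the vimrc.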

Finally, add the following to your plugin file:

" run awk on external thesaurus to find synonyms
" ==================================================
function! OmniComplete(findstart, base)
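    if ! exists("b:thesaurus")
        " no thesaurus chosen: cancel silently on the first call, no matches on the second
        return a:findstart ? -3 : []
    endif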
    if a:findstart
        " first, must find word
        let line = getline('.')
        let wordStart = col('.') - 1
        " check backward, accepting only non-white space
        while wordStart > 0 && line[wordStart - 1] =~ '\S'
            let wordStart -= 1
        endwhile
        return wordStart
    else
        " a word with single quotes would produce a shell error
        if match(a:base, "'") >= 0
            return []
        endif
        let searchPattern = '/^' . tolower(a:base) . '\|/'
        " search pattern is single-quoted
        let thesaurusMatch = system('awk'
            \ . " '" . searchPattern . ' {printf "%s", NR ":" $0}' . "'"
            \ . " '" . b:thesaurus . "'"
        \)
        " no entry for this word: return an empty list of matches
        if thesaurusMatch == ''
            return []
        endif
        " line info was returned by awk
        let matchingLine = substitute(thesaurusMatch, ':.*$', '', '')
        " entry count was in the thesaurus itself, right of |
        let entryCount = substitute(thesaurusMatch, '^.*|', '', '')
        let firstEntry = matchingLine + 1
        let lastEntry = matchingLine + entryCount
        let rawOutput = system('awk'
            \ . " '" . ' NR == ' . firstEntry . ', NR == ' . lastEntry
            \ . ' {printf "%s", $0}' . "'"
            \ . " '" . b:thesaurus . "'"
        \)
        " remove dash tags if any
        let rawOutput = substitute(rawOutput, '^-|', '', '')
        let rawOutput = substitute(rawOutput, '-|', '|', 'g')
        " remove grammatical tags if any
        let rawOutput = substitute(rawOutput, '(.\{-})', '', 'g')
        " clean spaces left by tag removal
        let rawOutput = substitute(rawOutput, '^ *|', '', '')
        let rawOutput = substitute(rawOutput, '| *|', '|', 'g')
        let listing = split(rawOutput, '|')
        return listing
    endif
endfunction

" configure completion
" ==================================================
set omnifunc=OmniComplete
set completeopt=menuone

This will allow you to get the synonyms of any word you type in insert mode. While still in insert mode, press Ctrl-X Ctrl-O (or whatever key combination you have mapped to omni completion) and a popup menu will show up with the synonym list.
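
If Ctrl-X Ctrl-O feels awkward, an insert-mode mapping can trigger the same completion. A minimal sketch; the choice of F8 is arbitrary and purely an assumption, so pick any key you do not already use:

```vim
" Hypothetical mapping: press F8 in insert mode to open the synonym menu.
inoremap <F8> <C-x><C-o>
```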

This solution is very crude compared with Chong's powerful plugin (see above), but it is lightweight and works well enough for me. I use it with thesauri in four different languages.

Similarity answered 20/1, 2017 at 1:4 Comment(2)
Interesting. I wonder what stage LibreOffice's thesaurus function is at now. Official releases of the package contain no thesaurus source, and the source you pointed to looks quite similar to OpenOffice's thesaurus source, except that it doesn't have idx files, which are actually quite useful for quickly locating a word in these huge dat files... - Fustian
These are recent development versions, and I think the Portuguese thesaurus (for example) is more complete than the one I downloaded from the official LibreOffice extension page. - Lefthanded

Script for ~/.vimrc. It needs the file thesaurii.txt (merged dictionaries from https://github.com/moshahmed/vim/blob/master/thesaurus/thesaurii.txt) and perl.exe in the path to search for synonyms. The script was tested on Windows 7 with Cygwin perl.

If no synonyms are found, it calls aspell for spelling correction. See https://mcmap.net/q/672328/-how-to-use-ctags-for-autocomplete-in-vim for how to call this function by pressing [tab].

set thesaurus=thesaurii.txt
let s:thesaurus_pat = "thesaurii.txt"

set completeopt+=menuone
set omnifunc=MoshThesaurusOmniCompleter
function!    MoshThesaurusOmniCompleter(findstart, base)
    " == First call: find-space-backwards, see :help omnifunc
    if a:findstart
        let s:line = getline('.')
        let s:wordStart = col('.') - 1
        " Check backward, accepting only non-white space
        while s:wordStart > 0 && s:line[s:wordStart - 1] =~ '\S'
            let s:wordStart -= 1
        endwhile
        return s:wordStart

    else
        " == Second call: perl grep thesaurus for word_before_cursor, output: comma separated wordlist
        " == Test: so % and altitude[press <C-x><C-o>]
        " a: is reserved for function arguments, so keep the word in a script-local variable
        let s:word_before_cursor = substitute(a:base,'\W','.','g')
        let s:cmd='perl -ne ''chomp; '
                    \.'next if m/^[;#]/;'
                    \.'print qq/$_,/ if '
                      \.'/\b'.s:word_before_cursor.'\b/io; '' '
                    \.s:thesaurus_pat
        " == To: Debug perl grep cmd, redir to file and echom till redir END.
        " redir >> c:/tmp/vim.log
        " echom s:cmd
        let   s:rawOutput = substitute(system(s:cmd), '\n\+$', '', '')
        " echom s:rawOutput
        let   s:listing = split(s:rawOutput, ',')
        " echom join(s:listing,',')
        " redir END
        if len(s:listing) > 0
          return s:listing
        endif

        " Try spell correction with aspell: echo mispeltword | aspell -a
        let s:cmd2 ='echo '.s:word_before_cursor
            \.'|aspell -a'
            \.'|perl -lne ''chomp; next unless s/^[&]\s.*?:\s*//;  print '' '
        let   s:rawOutput2 = substitute(system(s:cmd2), '\n\+$', '', '')
        let   s:listing2 = split(s:rawOutput2, ',\s*')
        if len(s:listing2) > 0
          return s:listing2
        endif

        " Search dictionary without word delimiters.
        let s:cmd3='perl -ne ''chomp; '
                    \.'next if m/^[;#]/;'
                    \.'print qq/$_,/ if '
                      \.'/'.s:word_before_cursor.'/io; '' '
                    \.&dictionary
        let   s:rawOutput3 = substitute(system(s:cmd3), '\n\+$', '', '')
        let   s:listing3 = split(s:rawOutput3, ',\s*')
        if len(s:listing3) > 0
          return s:listing3
        endif

        " Don't return empty list
        return [s:word_before_cursor, '(no synonyms or spell correction)']

    endif
endfunction  
Monia answered 31/10, 2015 at 15:47 Comment(2)
Please don't just post some tool or library as an answer. At least demonstrate how it solves the problem in the answer itself. - Elizaelizabet
@BaummitAugen, OK, I have included the script itself. I had provided only the link because the dictionary thesaurii.txt is huge (12M); it was the most difficult part of this answer. - Monia
