Estimate Phonemic Similarity Between Two Words

I am working on detecting rhymes in Python using the Carnegie Mellon University dictionary of pronunciation, and would like to know: How can I estimate the phonemic similarity between two words? In other words, is there an algorithm that can identify the fact that "hands" and "plans" are closer to rhyming than are "hands" and "fries"?

Some context: At first, I was willing to say that two words rhyme if their primary stressed syllable and all subsequent syllables are identical (using the c06d file if you want to replicate this in Python):

def create_cmu_sound_dict():
    """Map each word to its phonemes from the primary stressed syllable onward."""

    final_sound_dict = {}

    with open('resources/c06d/c06d') as cmu_dict:
        cmu_dict = cmu_dict.read().split("\n")
        for i in cmu_dict:
            i_s = i.split()
            if len(i_s) > 1:
                word = i_s[0]
                syllables = i_s[1:]  # really individual ARPAbet phonemes

                final_sound = ""
                final_sound_switch = 0

                # Start collecting at the phoneme carrying primary stress ("1")
                # and keep everything that follows it.
                for j in syllables:
                    if "1" in j:
                        final_sound_switch = 1
                        final_sound += j
                    elif final_sound_switch == 1:
                        final_sound += j

                final_sound_dict[word.lower()] = final_sound

    return final_sound_dict

If I then run

cmu_final_sound_dict = create_cmu_sound_dict()
print(cmu_final_sound_dict["hands"])
print(cmu_final_sound_dict["plans"])
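(Given the CMU entries HH AE1 N D Z and P L AE1 N Z used in the answers below, these should print AE1NDZ and AE1NZ.)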

I can see that hands and plans sound very similar. I could work towards an estimate of this similarity on my own, but I thought I should ask: Are there sophisticated algorithms that can tie a mathematical value to this degree of sonic (or auditory) similarity? That is, what algorithms or packages can one use to quantify the degree of phonemic similarity between two words? I realize this is a broad question, but I would be most grateful for any advice others can offer.

Feuillant asked 20/10, 2014 at 21:2 Comment(8)
Are you looking for something like the Soundex algorithm (en.wikipedia.org/wiki/Soundex)?Welborn
I can't speak for the downvoter, but the reason given for the close vote is that your question looks like it's asking for recommendations. You may want to rephrase it to more clearly ask "How can I do X?" rather than "Which tool should I use to do X?"Hypaethral
I take the questions to be synonymous (to do some thing implies/necessitates a method with which to do that thing) but I will be happy to rephrase if it will help...Feuillant
@Welborn Soundex looks interesting, but it seems more like a hashing algorithm of sorts than a method that can estimate degrees of phonemic similarity between two words.Feuillant
Yes, I think you're right. Unfortunately, I don't know of any other phonetic algorithms. The Levenshtein distance will tell you how similar two words are in terms of writing but not based on how they sound.Welborn
This question is either algorithm-shopping or library-shopping. Either way, there's no way someone can write the right answer, because there are (hopefully) many possible right answers, and the choice of which one is best will be completely subjective. That doesn't mean it's a bad question; there are other StackExchange sites (and of course mailing lists and forums and so on) where this would be a great question. It just means it's not a fit for StackOverflow.Leialeibman
I'm not asking which algorithm is best; I'm simply asking which can be used to pursue the task. Any algorithm that estimates phonemic similarity will be "right", in the way that any answer that resolves a Traceback (and introduces no others) can be "right" within the scope of a debugging question. Certain answers will be preferable in both circumstances, but what the OP wants in both cases is a solution to the question at hand, with preference naturally given to the more elegant solutions. SO is a great resource for this kind of collective thinking; it'd be a shame to close the question...Feuillant
There's no one algorithm that can detect all rhymes, but phonetic algorithms can detect some types. Both metaphone and soundex (for english) can be used for that purpose.Millburn
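As the last comment suggests, a generic phonetic algorithm can give a rough similarity signal even without the CMU dictionary. A minimal sketch, assuming the jellyfish library (not mentioned in the thread) is available: encode each word with Metaphone and compare the codes with an edit distance, so a smaller number means the words sound more alike.

# pip install jellyfish   (assumed dependency; any Metaphone/Soundex implementation would do)
import jellyfish

def rough_phonetic_distance(word_a, word_b):
    # Encode each word with Metaphone, then compare the codes character by character.
    code_a = jellyfish.metaphone(word_a)
    code_b = jellyfish.metaphone(word_b)
    return jellyfish.levenshtein_distance(code_a, code_b)

print(rough_phonetic_distance("hands", "plans"))  # smaller = closer to rhyming
print(rough_phonetic_distance("hands", "fries"))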

1) Get TTS audio for all the words through a web API or the local SAPI (the Windows Speech API).

2) Extract speech features if you can, or at least get the power (per-frame energy) of the speech data.

3) Depending on the features you have, here are some approaches.

If you can get the power of each sample (frame) of the speech data (a one-dimensional feature), one easy way is to compute the correlation between the two sets of features.

If you have other types of features, which will most likely have more dimensions, you can treat them as images and look into 2D convolution or dynamic time warping (see the sketch after the pyphonetics example below).

4) If you have no speech-processing background for steps 1-3, check out pyphonetics:

#pip install pyphonetics
>>> from pyphonetics import RefinedSoundex
>>> rs = RefinedSoundex()
>>> rs.distance('Rupert', 'Robert')
0
>>> rs.distance('assign', 'assist', metric='hamming')
2
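For step 3, here is a minimal sketch of dynamic time warping over two 1-D power sequences, using only numpy. The two arrays below are placeholders standing in for the per-frame energies you would extract in step 2, not real speech data.

# Toy DTW sketch; assumes you already have per-frame power values from step 2.
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping over two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Placeholder frame energies; in practice these come from your TTS audio.
power_hands = np.array([0.1, 0.8, 0.9, 0.4, 0.2])
power_plans = np.array([0.2, 0.7, 0.9, 0.5, 0.1])
print(dtw_distance(power_hands, power_plans))  # lower = more similar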
Fawnia answered 28/4, 2016 at 5:44 Comment(0)

Cheat.

#!/usr/bin/env python

from Levenshtein import distance

if __name__ == '__main__':
    # CMU pronunciations: (hands, plans) and (hands, fries)
    s1 = ['HH AE1 N D Z', 'P L AE1 N Z']
    s2 = ['HH AE1 N D Z', 'F R AY1 Z']
    # The same strings with spaces stripped, so the separators don't inflate the distance
    s1nospaces = [x.replace(' ', '') for x in s1]
    s2nospaces = [x.replace(' ', '') for x in s2]
    for seq in [s1, s2, s1nospaces, s2nospaces]:
        print(seq, distance(*seq))

Output:

['HH AE1 N D Z', 'P L AE1 N Z'] 5
['HH AE1 N D Z', 'F R AY1 Z'] 8
['HHAE1NDZ', 'PLAE1NZ'] 3
['HHAE1NDZ', 'FRAY1Z'] 5

Library: https://pypi.python.org/pypi/python-Levenshtein/0.11.2

Seriously, however, since you only have text as input and pretty much only the text-based CMU dict, you're limited to some sort of manipulation of the text input; but the way I see it, there's only a limited number of phonemes available, so you could take the most important ones and assign "phonemic weights" to them. There are only 74 of them in the CMU dictionary you pointed to:

 % cat cmudict.06.txt | grep -v '#' | cut -f 2- -d ' ' | tr ' ' '\n' | sort | uniq | wc -l
 75

(75, minus one for the empty line)

You'd probably get better results if you did something more advanced in step 2: assign weights to particular phoneme combinations. Then you could modify some Levenshtein-type distance metric, e.g. the one in the library above, to come up with a reasonably well-performing "phonemic distance" metric that works on text inputs.

Not much work for step 3: profit.
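A minimal sketch of that step-2 idea, with one loud assumption: the weights below (cheaper substitution within the vowel class or within the consonant class) are purely illustrative, not tuned. It runs a phoneme-level rather than character-level edit distance, which also avoids counting a two-letter phoneme like HH as two edits.

# Sketch of a weighted "phonemic distance": phoneme-level edit distance where
# substituting two vowels or two consonants is cheaper than crossing classes.
VOWELS = {'AA', 'AE', 'AH', 'AO', 'AW', 'AY', 'EH', 'ER',
          'EY', 'IH', 'IY', 'OW', 'OY', 'UH', 'UW'}

def strip_stress(phone):
    # Drop the ARPAbet stress digit, e.g. "AE1" -> "AE".
    return phone.rstrip('012')

def sub_cost(p, q):
    p, q = strip_stress(p), strip_stress(q)
    if p == q:
        return 0.0
    same_class = (p in VOWELS) == (q in VOWELS)
    return 0.5 if same_class else 1.0   # illustrative weights, not tuned

def phonemic_distance(phones_a, phones_b):
    # Standard Levenshtein dynamic program with a weighted substitution cost.
    n, m = len(phones_a), len(phones_b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1,                      # deletion
                          d[i][j - 1] + 1,                      # insertion
                          d[i - 1][j - 1] + sub_cost(phones_a[i - 1],
                                                     phones_b[j - 1]))
    return d[n][m]

print(phonemic_distance('HH AE1 N D Z'.split(), 'P L AE1 N Z'.split()))  # lower = more rhyme-like
print(phonemic_distance('HH AE1 N D Z'.split(), 'F R AY1 Z'.split()))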

Emasculate answered 24/10, 2014 at 10:38 Comment(1)
This completely ignores the phonemic features which make "nd" tend to assimilate towards "n", whereas e.g. "nk" does not (or tends towards "ngk", or indeed is regularly realized as "ngk").Neuberger
