How to get phrase tables from word alignments?

The output of my word alignment file looks like this:

I wish to say with regard to the initiative of the Portuguese Presidency that we support the spirit and the political intention behind it . In bezug auf die Initiative der portugiesischen Präsidentschaft möchte ich zum Ausdruck bringen , daß wir den Geist und die politische Absicht , die dahinter stehen , unterstützen .   0-0 5-1 5-2 2-3 8-4 7-5 11-6 12-7 1-8 0-9 9-10 3-11 10-12 13-13 13-14 14-15 16-16 17-17 18-18 16-19 20-20 21-21 19-22 19-23 22-24 22-25 23-26 15-27 24-28
It may not be an ideal initiative in terms of its structure but we accept Mr President-in-Office , that it is rooted in idealism and for that reason we are inclined to support it .    Von der Struktur her ist es vielleicht keine ideale Initiative , aber , Herr amtierender Ratspräsident , wir akzeptieren , daß sie auf Idealismus fußt , und sind deshalb geneigt , sie mitzutragen .   0-0 11-2 8-3 0-4 3-5 1-6 2-7 5-8 6-9 12-11 17-12 15-13 16-14 16-15 17-16 13-17 14-18 17-19 18-20 19-21 21-22 23-23 21-24 26-25 24-26 29-27 27-28 30-29 31-30 33-31 32-32 34-33

How can I produce the phrase tables used by Moses from this output?

These slides explain consistent phrase extraction (slides 16-21): http://www.inf.ed.ac.uk/teaching/courses/mt/lectures/phrase-model.pdf. But what is the actual algorithm for extracting the phrases?

Sarinasarine answered 26/7, 2014 at 11:15 Comment(8)
I've tried iterating over all possible cell sizes in all possible combinations, but that gives me n! * m! * n * m cells to check for every sentence, where n and m are the lengths of the source and target sentences. – Sarinasarine
I don't understand your question. Are you trying to get the alignment itself? How does your alignment work? – Meadors
@Daniel, word alignment != phrase table. I've found the algorithm, but somehow it's not working, #25109501 – Sarinasarine
What do you mean by "not working somehow"? You implemented the algorithm from the answer below, and it is giving wrong answers? – Meadors
Yes, it's not giving the right output... – Sarinasarine
Well, it seems like the alignment below is just an approximation and is not guaranteed to give consistent results. – Meadors
Is this a standard input format? It looks pretty ad hoc and hard to use. – Coextensive
Yes, it's the Pharaoh output format. One could also use the GIZA output format, though, e.g. rali.iro.umontreal.ca/rali/?q=en/node/1325#ali. – Sarinasarine
S
3

To get a phrase table, first extract the phrase pairs with the following algorithm from Philipp Koehn's Statistical Machine Translation book, p. 133:

[Image: pseudocode of the phrase extraction algorithm]
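Since the pseudocode is only available as an image, here is a minimal Python sketch of consistent phrase extraction as I understand it. It is not a transcription of the book's listing: the function names `parse_alignment` and `extract_phrases`, the 0-based indices, and the `max_len` cap are my own assumptions, and I assume each alignment point is written `source-target` as in the question's data.

```python
from __future__ import annotations


def parse_alignment(text: str) -> set[tuple[int, int]]:
    """Parse a Pharaoh-style string like '0-0 1-2' into (src, tgt) index pairs."""
    return {tuple(int(i) for i in pair.split("-")) for pair in text.split()}


def extract_phrases(src: list[str], tgt: list[str],
                    alignment: set[tuple[int, int]],
                    max_len: int = 7) -> set[tuple[str, str]]:
    """Return all phrase pairs consistent with the word alignment.

    max_len is an assumed cap on phrase length (in words) on either side.
    """
    tgt_aligned = {t for _, t in alignment}
    phrases = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Minimal target span covering every point linked to the source span.
            points = [t for s, t in alignment if s_start <= s <= s_end]
            if not points:
                continue  # a phrase pair needs at least one alignment point
            t_min, t_max = min(points), max(points)
            # Consistency: no target word inside the span may be aligned
            # to a source word outside the source span.
            if any(t_min <= t <= t_max and not s_start <= s <= s_end
                   for s, t in alignment):
                continue
            # Also emit variants that absorb adjacent unaligned target words.
            t_start = t_min
            while True:
                t_end = t_max
                while True:
                    if t_end - t_start < max_len:
                        phrases.add((" ".join(src[s_start:s_end + 1]),
                                     " ".join(tgt[t_start:t_end + 1])))
                    t_end += 1
                    if t_end >= len(tgt) or t_end in tgt_aligned:
                        break
                t_start -= 1
                if t_start < 0 or t_start in tgt_aligned:
                    break
    return phrases


pairs = extract_phrases("the house".split(), "das haus".split(),
                        parse_alignment("0-0 1-1"))
print(sorted(pairs))
# [('house', 'haus'), ('the', 'das'), ('the house', 'das haus')]
```

This loops only over source spans and derives the target span from the alignment points, so it avoids enumerating every possible pair of spans.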

Then estimate the phrase translation probabilities from the relative frequencies of the extracted phrase pairs, i.e.:

[Image: relative-frequency formula for the phrase translation probabilities]
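As a rough illustration of relative-frequency estimation, the sketch below divides the count of each phrase pair by the total count of its source phrase. The function name `score_phrases` and the input shape (a list of `(source, target)` pairs collected over the whole corpus, duplicates included) are assumptions for this example. It computes only one translation direction, which is probably less than what a full Moses phrase table contains.

```python
from __future__ import annotations

from collections import Counter


def score_phrases(extracted: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Estimate p(tgt | src) = count(src, tgt) / sum over tgt' of count(src, tgt')."""
    extracted = list(extracted)  # allow any iterable; it is traversed twice
    pair_counts = Counter(extracted)
    src_counts = Counter(src for src, _ in extracted)
    return {(src, tgt): count / src_counts[src]
            for (src, tgt), count in pair_counts.items()}


# Hypothetical toy counts: 'haus' was extracted with 'house' twice, with 'home' once.
table = score_phrases([("haus", "house"), ("haus", "house"), ("haus", "home")])
for (src, tgt), prob in sorted(table.items()):
    print(f"{src} ||| {tgt} ||| {prob:.3f}")
```

The pairs must be collected per sentence pair and concatenated, not merged into one set, because the counts are what carry the frequency information.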

Note that the original printed version of the book contains an error on line 4 of the extract() function; it is corrected in the errata.

Also see the question "Phrase extraction algorithm for statistical machine translation" for the details.

Sarinasarine answered 3/8, 2014 at 21:10 Comment(0)
