TeX hyphenation patterns: what do they represent?

If you scroll down this page a bit, you'd see UK English hyphenation patterns like:

\patterns{ % just type <return> if you're not using INITEX
.ab4i
.ab3ol
.ace4
.acet3
.ach4
.ac5tiva

What do patterns like .ab4i mean?

Gadabout asked 19/12, 2009 at 22:58
Just guessing here, but I imagine the numbers are a priority system for choosing the best places to break a word based on the surrounding letters. The only question that remains: does the system prefer high or low values? - Creamcolored

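There are three kinds of characters in a TeX hyphenation pattern. The dot (.) anchors the pattern to a word boundary, that is, the start or the end of the word. A letter stands for itself, that is, a letter in the word to be hyphenated. A number marks a potential hyphenation point between two letters, and its value is the hyphenation level at that point. TeX allows a single digit from 0 to 9 at each position, and pattern sets like this one typically use levels 1 to 5.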
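To make the three kinds of characters concrete, here is a minimal Python sketch (the function name parse_pattern is made up for illustration, not part of any TeX tool) that splits a pattern into its literal characters and the level at each gap between them, treating a gap without a digit as level 0:

```python
def parse_pattern(pattern: str) -> tuple:
    """Split a TeX-style pattern such as ".ab4i" into (letters, values).

    values[i] is the level in the gap just before letters[i]; the final
    entry is the level after the last letter. A gap without a digit is 0.
    """
    letters = []
    values = [0]  # level in front of the first character
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)  # a digit sets the level of the current gap
        else:
            letters.append(ch)    # '.' and letters are both matched literally
            values.append(0)      # open a new gap after this character
    return "".join(letters), values


print(parse_pattern(".ab4i"))     # ('.abi', [0, 0, 0, 4, 0])
print(parse_pattern(".ac5tiva"))  # ('.activa', [0, 0, 0, 5, 0, 0, 0, 0])
```

In the first result, values[3] is the gap in front of the i, i.e. between the b and the i, which is where the 4 sits.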

The basic idea of the algorithm is that the word is matched against all the patterns, and each matching pattern contributes its levels at the corresponding positions between letters. If two patterns put different levels at the same point, the higher one wins. Of the final values, only odd levels indicate allowed hyphenation points. This lets the patterns specify both places where a hyphen may be inserted and places where it must not be. For example, if a specific spot in a word matches two patterns that have a 1 and a 2 at that spot, hyphenation there is not allowed: the 2 overrides the 1, and only an odd value permits a hyphen.
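The matching-and-maximum procedure can be sketched in Python as below. This is a naive illustration, not TeX's actual implementation: TeX stores its patterns in a packed trie, whereas this sketch simply looks up every substring of the dot-wrapped word in a dictionary. The helper names are hypothetical, and the left_min=2 / right_min=3 defaults are an assumption meant to mimic plain TeX's \lefthyphenmin and \righthyphenmin, which keep hyphens away from the ends of a word.

```python
def parse_pattern(pattern):
    """Split e.g. ".ab4i" into (".abi", [0, 0, 0, 4, 0])."""
    letters, values = [], [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the pieces of `word` between permitted hyphenation points."""
    table = dict(parse_pattern(p) for p in patterns)
    w = "." + word.lower() + "."    # dots mark the word boundaries
    levels = [0] * (len(w) + 1)     # levels[i] is the gap before w[i]
    # Every substring of w that is a pattern contributes its levels;
    # at each gap the highest level seen wins.
    for start in range(len(w)):
        for end in range(start + 1, len(w) + 1):
            values = table.get(w[start:end])
            if values is None:
                continue
            for k, v in enumerate(values):
                levels[start + k] = max(levels[start + k], v)
    # The gap between word[i-1] and word[i] is the gap before w[i + 1].
    # Only an odd final level permits a hyphen there.
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if levels[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


if __name__ == "__main__":
    uk = [".ab4i", ".ab3ol", ".ace4", ".acet3", ".ach4", ".ac5tiva"]
    print(hyphenate("activate", uk))            # ['ac', 'tivate']
    print(hyphenate("abolish", uk))             # ['ab', 'olish']
    # Hypothetical toy patterns: a 1 and a 2 land on the same gap.
    print(hyphenate("abide", ["b1i"]))          # ['ab', 'ide']
    print(hyphenate("abide", ["b1i", "ab2i"]))  # ['abide']
```

The two toy patterns b1i and ab2i are made up for illustration. With b1i alone, the odd level 1 allows ab-ide; adding ab2i raises that same gap to the even level 2, and the break disappears. This is the 1-versus-2 case from the explanation.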

Looking at your examples, .ab4i means that a word starting with abi will rarely be hyphenated between the b and the i: level 4, being even, inhibits hyphenation there unless another pattern puts a 5 at the same spot. On the other hand, a word beginning with activa can always be hyphenated between the c and the t, because (assuming no pattern in the set uses a level above 5) the 5 overrides any other value and, being odd, permits hyphenation.

Snoddy answered 19/12, 2009 at 23:15
@jk: Thanks for the wonderful explanation. But I didn't get this part: "Of the final values, only odd levels indicate allowed hyphenation points." - Gadabout
@understack: Odd values indicate places where hyphens are encouraged, and even values places where they are discouraged. Because the algorithm might find that several patterns apply at the same spot, it has a system of precedence to say which hyphenation rule counts more, indicated by the higher number. - Phallus

These patterns are created using a tool called patgen2. There's TeX source for a tutorial about this tool at patgen2.tutorial, and the Ph.D. thesis on this topic (Frank Liang's "Word Hy-phen-a-tion by Com-put-er", Stanford, 1983) is available through tug.org.

Vibrant answered 19/12, 2009 at 23:06
