Algorithm to Convert text from Left Align to Justify

Asked 6/8, 2013 at 20:43 Answered 21/10, 2020 at 19:39

Recently I was asked in an Interview to design an algorithm to convert an input string which is Left Aligned (with spaces at the end of each line) into Justify (with no white space at the end of a complete line), similar to that in MS Word. I proposed him some basic solution which involved counting number of words and number of spaces of each line and then distributing them equally among all spaces (he asked me to assume that fractional space can be distributed between words). But later he asked me to consider the whole paragraph and then modify the text so that the beauty of the text is not lost when unequal distribution of spaces between words is inevitable.

I was unable to think of any proper solution for this at that moment. Later on he told me that this is done by Dynamic Programming. I am not sure if there is already some standard algorithm for this. If yes, please share some useful link.

PS: The solution I proposed was very abstract idea, hence I don't have any code to show what all I have already tried. Justification : http://en.wikipedia.org/wiki/Justification_(typesetting)

Galahad answered 6/8, 2013 at 20:43 Comment(5)

How is unequal space distribution inevitable if fractional space can be distributed? They would differ by a pixel at most, hardly noticeable to the naked eye. – Kucik 6/8, 2013 at 20:45

Remember that we need to make sure that last word in each line touches the right boundary! Except the last line of course. – Galahad 6/8, 2013 at 20:48

Yes. Let d be the distance of the last word to the right boundary. Let x be the number of spaces between words (words - 1). Add d / x to each space, and the last word will touch the right edge. If d is in pixels, you will have to add the remaining d % x pixels to the first d % x spaces or randomly. Either should look fine. – Kucik 6/8, 2013 at 20:50

It might happen that the next line has fewer or more spaces at the end of the line. In that case sometimes we might have to consider either bringing one word from next line to the previous line or even sometimes taking into consideration the number of spaces at end of next line while distributing spaces in current line. Our aim is not just to make the text Justify but also make it as beautiful as possible. – Galahad 6/8, 2013 at 20:55

In that case, what you would probably do by hand in a word processor is, while in left align mode, bring words from the next line to the current line while they fit, then justify align like I said. Not sure if this is the most beautiful you can get or not. – Kucik 6/8, 2013 at 21:0

The standard algorithm for breaking paragraphs into lines is probably still the algorithm of Knuth & Plass, used by Knuth's typesetting systemTeX. The algorithm, which 'avoids backtracing by a judicious use of the techniques of dynamic programming ' is described in

Donald E. Knuth and Michael F. Plass, Software - Practice and Experience 11 (1981) 1119-1184 DOI: 10.1002/spe.4380111102, also available in Digital Typography, Ch. 3, pp. 67–155.

The algorithm is based on considering each possible line break, starting from the beginning of the paragraph, and for each one finding the sequence of preceeding line breaks that gives the best result that far. As the entire sequence is determined by the last line break in the sequence, only the potential starting points for the current line has to be considered when a new potential break point is to be added, leading to a efficient algorithm.

A simplified version of the algorithm (without e.g. hyphenation), can be described like this:

Add start of paragraph to list of active breakpoints
For each possible breakpoint (space) B_n, starting from the beginning:
   For each breakpoint in active list as B_a:
      If B_a is too far away from B_n:
          Delete B_a from active list
      else
          Calculate badness of line from B_a to B_n
          Add B_n to active list
          If using B_a minimizes cumulative badness from start to B_n:
             Record B_a and cumulative badness as best path to B_n

The result is a linked list of breakpoints to use.

The badness of lines under consideration can be calculated like this:

Each space is assigned a nominal width, a strechability, and a shrinkability.
The badness is then calculated as the ratio of stretching or shrinking used,
relative to what is allowed, raised e.g. to the third power (in order to
ensure that several slightly bad lines are prefered over one really bad one)

An illustrated description can be found at http://defoe.sourceforge.net/folio/knuth-plass.html

Implementations in various languages are available on the web, e.g Bram Stein's implementation in Javascript: http://www.bramstein.com/projects/typeset/

Overcritical answered 6/8, 2013 at 22:22 Comment(3)

Answers should be self-contained, i.e. not dependent on reading up on some external resource (to which a link wasn't even provided, so we have to go look for it ourselves). So you should at least give a high-level overview of the algorithm. – Encaenia 7/8, 2013 at 8:37

@Dukeling As the algorithm is described in a 66 page scientific paper best available via the publisher, and not widely available elsewhere on the web, I thought that the citation was a sufficient link. However I have now added a simplified description of the algorithm, together with a link to a web page containing a better description. – Overcritical 7/8, 2013 at 21:31

Could you please give an example in c#? – Sassoon 7/10, 2013 at 5:11

This might be an old thread.

But wanted to share the solution anyway in case it helps.

Text Justification Algorithm

Hanaper answered 15/8, 2018 at 20:21 Comment(0)

I made a Space-Inserter function :)

But just insert a space until the line width is less than the desired width.

    public static List<string> GetText(string text, int width)
    {
        string[] palabras = text.Split(' ');
        StringBuilder sb1 = new StringBuilder();
        StringBuilder sb2 = new StringBuilder();
        int length = palabras.Length;
        List<string> resultado = new List<string>();
        for (int i = 0; i < length; i++)
        {
            sb1.AppendFormat("{0} ", palabras[i]);
            if (sb1.ToString().Length > width)
            {
                resultado.Add(sb2.ToString());
                sb1 = new StringBuilder();
                sb2 = new StringBuilder();
                sb1.AppendFormat("{0} ", palabras[i]);
            }
            else
            {
                sb2.AppendFormat("{0} ", palabras[i]);
            }
        }
        resultado.Add(sb2.ToString());

        List<string> resultado2 = new List<string>();
        string temp;

        int index1, index2, salto;
        string target;
        int limite = resultado.Count;
        foreach (var item in resultado)
        {
            target = " ";
            temp = item.ToString().Trim();
            index1 = 0; index2 = 0; salto = 2;

            if (limite <= 1)
            {
                resultado2.Add(temp);
                break;
            }
            while (temp.Length <= width)
            {
                if (temp.IndexOf(target, index2) < 0)
                {
                    index1 = 0; index2 = 0;
                    target = target + " ";
                    salto++;
                }
                index1 = temp.IndexOf(target, index2);
                temp = temp.Insert(temp.IndexOf(target, index2), " ");
                index2 = index1 + salto;

            }
            limite--;
            resultado2.Add(temp);
        }
        return resultado2;
    }

Hope it helps!

Fabrication answered 7/7, 2016 at 16:12 Comment(0)

I would suggest whoever wants to go into details and understand ins and outs of this problem, to watch MIT 6.006 course - Lecture No.20

Here is the link to it.

https://www.youtube.com/watch?v=ENyox7kNKeY

Woodrow answered 21/10, 2020 at 19:39 Comment(0)

Recommended topics

Hot tags