Untying Knuth's knots: how to restructure spaghetti code?
Asked Answered
H

5

10

This question was inspired by How to transform a flow chart into an implementation? which asks about ways of algorithmically eliminating the goto statement from code. The answer to the general problem is described in this scientific paper.

I have implemented some code following the high-level sketch of Algorithm X from Knuth's The art of computer programming describing the generation of Lexicographic permutations with restricted prefixes (see p. 16 of this draft).

This is the corresponding flow chart of the above algorithm.

This might be a very clever and very efficient algorithm, but the structure of the code seems to be challenging to follow. I ended up using the good old goto-style implementation:

//Algorithm X;
1:
initialize();
2:
enter_level(k);
3:
set(a[k],q);
if(test() == ok) {
  if (k == n) {
    visit();
    goto 6;
  }
  goto 4;
}
goto 5;
4:
increase(k);
goto 2;
5:
increasev2(a[k]);
if (q != 0) {
  goto 3;
}
6:
decrease(k);
if (k==0) {
  goto 7;
}
set(p,u_k);
goto 5;
7:
return;

The question is: how can this code be refactored to eliminate all the goto calls?

One (bogus) answer is a suggestion to "look up the cited scientific paper, and follow it line by line" - indeed, that is certainly a possibility. But this question is about what experienced programmers see instantly once they give a glance at this spaghetti code.

I am interested in the how of refactoring, step by step, more than just the code.


Note:

  1. It is straightforward to actually implement Algorithm X based on its high-level specification and goto jumps. Implementing the black-box functions initialize() etc. would take only a few extra instructions, but those are irrelevant regarding the structure of the code. It is not important what is happening during the function calls, as the focus is now on the flow of the program.
  2. The usual debate of "is GOTO still considered harmful?" is absolutely irrelevant regarding this question, and should not be addressed at all in the answers and in the comments.

Related: how to work with or complete the spaghetti code?

Hagiographa answered 6/5, 2016 at 18:39 Comment(9)
You might want to see about rewording this question. I've read it several times, and I think you're saying that you have some code that's algorithmically generated that you want to have an algorithm to refactor the resulting gotos... right? Or just how to do the first algorithm without gotos?Chondrule
I'm not sure I understand your comment. The title is how do you restructure this spaghetti code? and then the running text elaborates on it: "The question is how to refactor this code, by means of eliminating all the goto calls from it". I am not sure how to make it any clearer.Hagiographa
This question doesn't contain enough information to restructure the code. There are at least six variables. Which ones are global and/or const? Are the functions self contained or do any of the functions change the state of the program?Yorgo
@Hagiographa You just seem to have a whole lot of stuff describing irrelevant details. At first it seemed somewhat of a rhetorical question (or maybe you planned to self answer?). And it seemed like it might be off topic... Ironically, I think that your question needs to be refactored not to be spaghetti. If you want, I can see about moving the details around to make the post more clearer.Chondrule
An experienced programmer would probably say to themselves "Ugly as it is, this code is working, but after rewriting without gotos, there's an extremely good chance it will no longer work, in some subtle, hard to find way. Leave as-is (unless it's later discovered to not actually work, and rewrite then)."Tadpole
@flatmouse: feel free to replace the statement if (k==n) by if (an alien race attacks Earth) or with whatever else you want. Such a statement will then be transformed in the refactored code to a while(!an alien race attacks Earth). This goes to all other variables.Hagiographa
@Hagiographa Can you clarify what sort of an answer you're looking for? You're looking for more than just an example of what this code would look like without gotos, right?Chondrule
@Laurel: Indeed: I am more interested in understanding how one creates such a restructuring in the first place. Not the end result is what matters, but the process of obtaining that result is what I am intersted in. The question is "How do you do something like this?"Hagiographa
@Hagiographa I just went ahead and edited the question. I moved stuff around, removed some irrelevant details, and I hope you're ok with it. Otherwise, just rollback...Chondrule
S
2

I sketched an algorithm for OP earlier at https://mcmap.net/q/1165835/-how-to-transform-a-flow-chart-into-an-implementation-closed

Found a better paper that discusses how to generate structured code while preserving the original control flow graph exactly:

W.D Maurer, "Generalized structured programs and loop trees", Science of Computer Programming, 2007

I followed that procedure (on paper, hope I did it right, looks OK at 2:40 am). His basic trick is to find strongly connected regions (cycles in the code); these will become loops; He then breaks that cycle by deleting an edge; this eventually becomes a loop backlink (restored when he is complete). The process is repeated until no more cycles can be found; what is left is essentially a structured program with identified loops. It is tricky to do this right; you really need an automated procedure. Your bit of code, although small, is still pretty nasty :-}

I cheated in one place. Maurer insists that forward gotos are OK, even into the middle of loop. If you buy that, then you can preserve the CFG exactly. If not, you have to handle the case where a loop has two or more entry points; your algorithm has such a loop. I solved the problem by coding the loop, and coding a loop-tail-end-fragment equivalent that acts like the first iteration of jump-into-the-middle, which is followed by the loop itself.

My notation is a little funny: most languages don't have "block{...}" constructs. [The one I code in (see bio) does]. Think of this as being an "execute one iteration" loop :-} I assume that blocks/loops have loop exits and loop continues. If you dont have these, you can simulate them with sufficient quantity of just block{ ... } and exit_block@N.

EDIT after accepted: In the light of day, I didn't do it right, I left out the of the while loop@3. I've patched that; the need for the block construct now goes away because I can exit the while loop@3 to achieve the same affect. Actually, the code reads a little better.

I left your numeric labels in, even where they aren't needed, for easier reference.

//Algorithm X;
1:
initialize();
2:
while (true) {
   enter_level(k);
   3: 
   while (true) {
      set(a[k],q);
      if (test() == ok) {
         if (k != n) exit_while@3;
         visit();
         decrease(k); // replicate logic at 6 to avoid jumping into middle of 5 loop
         if (k==0) return;
         set(p,u_k);
      }
      5:
      while (true) {
         increasev2(a[k]);
         if (q != 0) continue_while@3;
         6:
         decrease(k);
         if (k==0) return;
         set(p,u_k);
      } // while(true)@5
  } // while(true)@3
  4:
  increase(k);
} // while(true)@2

Unlike most other answers I've seen so far, this runs at the same speed as the original (no extra flags or flag checks).

@hatchet's answer is interesting; a) it is equally fast, b) he chose to handle the two entry loop by the same technique, but he chose the "other entry" as the loop top. He did something similar with the "enter_level(k)" operation at label 2.

It is interesting that all this structuring doesn't seem to help readability of this code one bit. Makes one wonder about the whole point of "structured programs". Maybe well-designed spaghetti isn't so bad :-}

Sillimanite answered 7/5, 2016 at 7:46 Comment(3)
It is hard to imagine to accept any other answer than this one, since this approach points towards science, backed up by solid reasoning. The cited paper is open access, and on top of p. 224 -- I quote here -- "There is no such thing as spaghetti code" nicely summarizes what one should expect from going through it. It is also somewhat notable, that this is a fairly recent paper. I thank you for your answer.Hagiographa
Thank you for a great answer. I'm trying to follow along. How could if (q != 0) continue_while@3; be implemented in c++? I suppose a regular 'continue' would jump to the wrong place?Yorgo
C and C++'s break and continue statements are poorly designed. What you want in a language are statements that can exit a named block/loop, or continue a named loop, regardless of nesting.. I think you have to use "goto 3". I still consider a program containing gotos to structured places (loop start/ends) to be structured.Sillimanite
T
3

Without too much effort (and not a lot of risk) you can make some quick reductions in the amount of gotos and labels.

1) remove labels not referenced anywhere (this would be label 1:)

2) look for blocks of code that can't be entered except by goto, that are called in few places. These can often be factored out simply. 4: can be dealt with by moving the code to where it's called, and done safely since its only exit is a goto. This also allows us to remove the goto 5 above it, since that code will simply fall through to 5:. 7: can be handled by modifying the if statement. At this point we have

initialize();
2:
enter_level(k);
3:
set(a[k],q);
if(test() == ok) {
  if (k == n) {
    visit();
    goto 6;
  }
  increase(k);
  goto 2;
}
5:
increasev2(a[k]);
if (q != 0) {
  goto 3;
}
6:
decrease(k);
if (k!=0) {
  set(p,u_k);
  goto 5;
}
return;

I'd be inclined to stop here. But if you continue, it becomes a matter of identifying loops and replacing gotos with loop constructs. However, because of the way that code is structured, the risk in making those changes seems much greater. Also, you likely end up with breaks and continues which are gotos of a sort anyway. What I ended up with is this (and I wouldn't vouch for its correctness without some very rigorous testing):

initialize();
enter_level(k);
while (true) {
  set(a[k],q);
  if(test() == ok) {
    if (k == n) {
      visit();
    } else {
      increase(k);
      enter_level(k);
      continue;
    }
  } else {
    increasev2(a[k]);
    if (q != 0) {
      continue; 
    }
  }
  while (true) {
    decrease(k);
    if (k!=0) {
      set(p,u_k);
      increasev2(a[k]);
      if (q != 0) {
        break; 
      }
    } else {
      return;
    }
  }
}

I made 3: a loop, and 6: an inner loop. I got rid of the goto 5 by copying the 5: code in place of the goto, and replacing the goto 3 with a break. That makes it a little easier to make cleaner loops. The goto 6 is fixed through the use of an else. The goto 3's become continues.

After this (if you have the energy left) you can try to change the loops from while(true) with continues into whiles with actual conditions.

It's a good idea to develop your tests first, and then make one or two changes and test. Make another change, then test again. If you don't do that, it's easy to make a structural mistake early on that then invalidates the subsequent steps and forces you to start all over again.

Tadpole answered 6/5, 2016 at 20:53 Comment(2)
I like very much the exposition and the semi-algorithmic way how you obtained this. It is interesting to see that your code differs in many ways from @old_mountain's answer. Yours contain a certain degree of "code repetition" as the same function is called in multiple lines (e.g. increasev2(a[k])). I also observed this kind of phenomenon when I dealt with nasty nested loops with multiple entry and exit points.Hagiographa
I like this one best for using break and continue as it does.Yorgo
Y
2

In c++ the algorithm could be written as:

void initialize() {}
void enter_level(int k) {}
void set(int x,int y) {}
bool test() { return true; }
void visit() {}
void increase(int k) {}
void increasev2(int k) {}
void decrease(int k) {}

void algorithm_x()
{
    int k{0};
    int a[] ={1,2,3,4,5};
    int q{0};
    bool ok{true};
    int n{0};
    int p{0};
    int u_k{0};

        //Algorithm X;
    lbl1:
        initialize();
    lbl2:
        enter_level(k);
    lbl3:
        set(a[k],q);
        if (test() == ok) {
            if (k == n) {
                visit();
                goto lbl6;
            }
            goto lbl4;
        }
        goto lbl5;
    lbl4:
        increase(k);
        goto lbl2;
    lbl5:
        increasev2(a[k]);
        if (q != 0) {
            goto lbl3;
        }
    lbl6:
        decrease(k);
        if (k==0) {
            goto lbl7;
        }
        set(p,u_k);
        goto lbl5;
    lbl7:
        return;

}

int main()
{
    algorithm_x();
    return 0;
}

Assuming we don't use break statements the program could then be:

void initialize() {}
void enter_level(int k) {}
void set(int x,int y) {}
bool test() { return true; }
void visit() {}
void increase(int k) {}
void increasev2(int k) {}
void decrease(int k) {}

void algorithm_x()
{
    int k{0};
    int a[] ={1,2,3,4,5};
    int q{0};
    bool ok{true};
    int n{0};
    int p{0};
    int u_k{0};

    bool skiptail{false};

    //Algorithm X;
    initialize();
    enter_level(k);
    while (true) {

        skiptail = false;
        set(a[k],q);
        if (test() == ok) {
            if (k == n) {
                visit();
                decrease(k);
                if (k==0) {
                    return;
                }
                set(p,u_k);
                while (true) {
                    increasev2(a[k]);
                    if (q != 0) {
                        //goto lbl3;
                        skiptail = true;
                    }
                    if (!skiptail) decrease(k);
                    if (!skiptail) if (k==0) {
                        return;
                    }
                    if (!skiptail) set(p,u_k);
                }
            }
            if (!skiptail) increase(k);
            if (!skiptail) enter_level(k);
            //goto lbl3;
            skiptail = true;
        }
        if (!skiptail) while (true) {
            increasev2(a[k]);
            if (q != 0) {
                //goto lbl3;
                skiptail = true;
            }
            if (!skiptail) decrease(k);
            if (!skiptail) if (k==0) {
                return;
            }
            if (!skiptail) set(p,u_k);
        }
        if (!skiptail) increase(k);
        if (!skiptail) enter_level(k);
        //goto lbl3;
        skiptail = true;
        if (!skiptail) while (true) {
            increasev2(a[k]);
            if (q != 0) {
                //goto lbl3;
                skiptail = true;
            }
            if (!skiptail) decrease(k);
            if (!skiptail) if (k==0) {
                return;
            }
            if (!skiptail) set(p,u_k);
        }
    }

}

int main()
{
    algorithm_x();
    return 0;
}

The change used the following algorithm:

  1. Get rid of unused labels. Remove lbl1

  2. If a label ends with a goto, then replace that block wherever it is used. Remove lbl4, lbl6, lbl7

  3. if a label returns to itself then place block in while (true). Remove bottom lbl5 (lbl5 is now self contained and can be replaced where used)

  4. if a block is self contained, then replace wherever it is used. Remove lbl5

  5. If one label follows another then place a goto next label at the end of the block so that it can be replaced as per rule 2. Remove lbl2 (which can goto lbl3)

  6. now we're left with the last label's goto's throughout the code. Replace goto lbl3 with skiptail=true,place the remaining block in a while (true) block, and set remaining statements to check if skiptail=false. Remove lbl3 and replace with skiptail = false.

Yorgo answered 6/5, 2016 at 21:22 Comment(3)
It's very impressive that you threw down your gauntlets, and despite your earlier concerns, you managed to refactor this beast. You see now that the variables and function calls are indeed beside the point. While your code in this answer is certainly nearly unreadable, it was generated -- to the best of my understanding -- entirely by the six rules you mentioned. This is exactly what I was asking about. I will need a day or two to deeply understand in general what you proposed here. But it certainly looks like an algorithm :-).Hagiographa
One teeny-tiny detail is the purpose of skiptail. Why is it there, and where does it come from?Hagiographa
Once we get to step 6. there is no way to continue with the initial approach of replacing labels as we would now have recursion. So the alternative idea is to preserve the program behavior by introducing a single variable that will 'remember' that we would have jumped to label 3 by now.Yorgo
C
2

I've never used goto, but this seemed like a fun challenge, so I tried my own hand at refactoring.

First of all, go through the code and see how many statements are gotoing each label; this will be important to keep in mind to avoid mistakes. In your example, nothing is leading to 1, so we can ignore it.

Sometimes, I found it useful to add gotos when they were implied by the control flow. It helped to keep track of the order when I moved code between things.

The best way to refactor gotos is working upwards from the inside or bottom up.

  • The last instruction is 7:return;, which can simply be moved to wherever goto 7 is called. That's easy.

  • Next, I try to see which labels end with a goto (unconditionally) and come directly after a different goto. That would be 4 in this case; it can be moved in front of 2, inside an if controlled by a sentinel (in preparation for a loop). (goto my first point to see that 2 can be removed now.)

  • The next thing I did was put 5 and 6 into a loop. If I was wrong, I could just backtrack anyway.

  • At this point, I see that 6 will be executed after either 3 or 5. I also see that 5 can execute 3, so I decide to move 3 after 5. I add a variable so that I can skip 5 the first time. I set it to true at the end of 6.

  • In order to ensure 5 can go directly to 6 when it needs to, I can wrap 3 in an if statement with the opposite condition of 5 executing. When I do need to go from 5 to 3, I can change the condition while I'm inside 5 so that 3 is executed directly afterwards.

  • At this point I only have one goto, which goes from 3 to 4. If I change that to a break I can exit the one loop and I arrive at the end. To get to 4, I just wrap everything (except 1) in a loop.

You might be able use this trick to break out of nested loops without using goto, if you have any of that going on, but that wasn't necessary in this case.

In the end, I got this code (labels included only for clarity):


1: initialize();
reached4=false;
do5 = false;
while(true){
    if (reached4){
      4: increase(k);
    }
    2: enter_level(k);
    while(true){
      if(do5){
        5:
        increasev2(a[k]);
        if (q != 0) {
          do5 = false;//goto 3
        }
      }
      if(!do5){
        3:
        set(a[k],q);
        if(test() == ok) {
          if (k == n) {
            visit();//goto 6;
          }else{
            reached4 = true;
            break;//goto 4
          }
        }
      }
      6:
      decrease(k);
      if (k==0) {
        7: return;
      }
      set(p,u_k);
      do5 = true;
    }
}
Chondrule answered 6/5, 2016 at 21:57 Comment(2)
This is very similar to @old_mountain's answer, but here you have some break statements. I copy-pasted your code to code2flow.com but the flow chart was pretty scary. It really seems to be the case that getting rid of the gotos do more harm than good. And good point about the lambdas. Another thing I should look into.Hagiographa
@Hagiographa What's interesting is that the two programs look VERY different when you put them in that tool. After comparing the two flowcharts, I think I like mine better because it's long and slender (which I think makes it easy to follow).Chondrule
S
2

I sketched an algorithm for OP earlier at https://mcmap.net/q/1165835/-how-to-transform-a-flow-chart-into-an-implementation-closed

Found a better paper that discusses how to generate structured code while preserving the original control flow graph exactly:

W.D Maurer, "Generalized structured programs and loop trees", Science of Computer Programming, 2007

I followed that procedure (on paper, hope I did it right, looks OK at 2:40 am). His basic trick is to find strongly connected regions (cycles in the code); these will become loops; He then breaks that cycle by deleting an edge; this eventually becomes a loop backlink (restored when he is complete). The process is repeated until no more cycles can be found; what is left is essentially a structured program with identified loops. It is tricky to do this right; you really need an automated procedure. Your bit of code, although small, is still pretty nasty :-}

I cheated in one place. Maurer insists that forward gotos are OK, even into the middle of loop. If you buy that, then you can preserve the CFG exactly. If not, you have to handle the case where a loop has two or more entry points; your algorithm has such a loop. I solved the problem by coding the loop, and coding a loop-tail-end-fragment equivalent that acts like the first iteration of jump-into-the-middle, which is followed by the loop itself.

My notation is a little funny: most languages don't have "block{...}" constructs. [The one I code in (see bio) does]. Think of this as being an "execute one iteration" loop :-} I assume that blocks/loops have loop exits and loop continues. If you dont have these, you can simulate them with sufficient quantity of just block{ ... } and exit_block@N.

EDIT after accepted: In the light of day, I didn't do it right, I left out the of the while loop@3. I've patched that; the need for the block construct now goes away because I can exit the while loop@3 to achieve the same affect. Actually, the code reads a little better.

I left your numeric labels in, even where they aren't needed, for easier reference.

//Algorithm X;
1:
initialize();
2:
while (true) {
   enter_level(k);
   3: 
   while (true) {
      set(a[k],q);
      if (test() == ok) {
         if (k != n) exit_while@3;
         visit();
         decrease(k); // replicate logic at 6 to avoid jumping into middle of 5 loop
         if (k==0) return;
         set(p,u_k);
      }
      5:
      while (true) {
         increasev2(a[k]);
         if (q != 0) continue_while@3;
         6:
         decrease(k);
         if (k==0) return;
         set(p,u_k);
      } // while(true)@5
  } // while(true)@3
  4:
  increase(k);
} // while(true)@2

Unlike most other answers I've seen so far, this runs at the same speed as the original (no extra flags or flag checks).

@hatchet's answer is interesting; a) it is equally fast, b) he chose to handle the two entry loop by the same technique, but he chose the "other entry" as the loop top. He did something similar with the "enter_level(k)" operation at label 2.

It is interesting that all this structuring doesn't seem to help readability of this code one bit. Makes one wonder about the whole point of "structured programs". Maybe well-designed spaghetti isn't so bad :-}

Sillimanite answered 7/5, 2016 at 7:46 Comment(3)
It is hard to imagine to accept any other answer than this one, since this approach points towards science, backed up by solid reasoning. The cited paper is open access, and on top of p. 224 -- I quote here -- "There is no such thing as spaghetti code" nicely summarizes what one should expect from going through it. It is also somewhat notable, that this is a fairly recent paper. I thank you for your answer.Hagiographa
Thank you for a great answer. I'm trying to follow along. How could if (q != 0) continue_while@3; be implemented in c++? I suppose a regular 'continue' would jump to the wrong place?Yorgo
C and C++'s break and continue statements are poorly designed. What you want in a language are statements that can exit a named block/loop, or continue a named loop, regardless of nesting.. I think you have to use "goto 3". I still consider a program containing gotos to structured places (loop start/ends) to be structured.Sillimanite
C
1

You could use a lot of variables to simulate the flow of the gotos to work with if's and while's

initialize();

enterLevel = true;
executeWhile = true;

do 
{

    if (enterLevel)
    {
        enter_level(k);
    }

    enterLevel = false;

    goto4 = false;
    goto5 = false;
    goto6 = false;

    set(a[k],q);
    if(test() == ok) 
    {
        if (k == n) 
        {
            visit();
            goto6 = true;
        }
        else
        {
            goto4 = true;
        }
    }
    else
    {
        goto5 = true;
    }

    if (goto4) 
    {
        increase(k);
        enterLevel = true;
    }
    else
    {
        do
        {
            if(goto5)
            {
                increasev2(a[k]);
                goto6 = goto5 = !(q != 0); // if (q != 0) { goto6 = goto5 = false; } else { goto6 = goto5 = true; }
            }
            if(goto6)
            {
                decrease(k);
                executeWhile = !(k==0); // if (k == 0) { executeWhile = false; } else { executeWhile = true; }
                set(p,u_k);
                goto5 = true;
            }
        } while (goto5 && executeWhile);
    }
} while (executeWhile);

Whether this version is better than the one with goto's I can't say though.


First, I completely separate all the labels.

Then I identified that are 2 loops going on here:

1 - 
    * label 4 -> goto 2
    * label 5 -> goto 3. 

Both go to top of the code, but one executes enter_level(k) and the other doesn't. That's why the enterLevel var.

2 - 
    * label 6 -> goto 5. This goes up a little in the code, and then executes again. 

Inside this loop, there are 2 situations where it goes out:

    * label 5 -> goto 3. The same as before, but now inside a nested loop
    * label 6 -> goto 7. The way out of the outer loop.

The other variables and if's are just to mantain the control flow.

Yes, I could have used some breaks (and the code could have become shorter), but as the question is about goto's, I personally prefered to not use them.

Circumscissile answered 6/5, 2016 at 19:49 Comment(4)
I trust you that this is a correct implementation, or if not, then any minor details can be fixed later. I see that the flow of the program is more-or-less the same as the goto approach, except for some carefully positioned do-while statements. What is absolutely astonishing -- to me -- that you managed to do all of this without calling a break (although some control variables might just serve the same purpose). Can you elaborate on how did you come up with this implementation? You looked at the spaghetti... and then... did what, and in which order, and why? Care to elaborate?Hagiographa
@Hagiographa added an explanation.Circumscissile
So instead of having a "goto X", you have "gotoX=true" and then a conditional "if (gotoX) { X: ... }". Yes, that obviously works. But it strikes me as failing OP's interest in improving the code. You haven't gotten rid of any gotos; you've simply simulated them one for one where each one occured. (And the code runs more slowly, too boot).Sillimanite
@IraBaxter Yes, I agree with you. As I said in the answer, I can't say this is better then the original one. It's just the same without goto'sCircumscissile

© 2022 - 2024 — McMap. All rights reserved.