When is a function too long? [closed]
Asked Answered
G

24

150

35 lines, 55 lines, 100 lines, 300 lines? When you should start to break it apart? I'm asking because I have a function with 60 lines (including comments) and was thinking about breaking it apart.

long_function(){ ... }

into:

small_function_1(){...}
small_function_2(){...}
small_function_3(){...}

The functions are not going to be used outside of the long_function, making smaller functions means more function calls, etc.

When would you break apart a function into smaller ones? Why?

  1. Methods should do only one logical thing (think about functionality)
  2. You should be able to explain the method in a single sentence
  3. It should fit into the height of your display
  4. Avoid unnecessary overhead (comments that point out the obvious...)
  5. Unit testing is easier for small logical functions
  6. Check if part of the function can be reused by other classes or methods
  7. Avoid excessive inter-class coupling
  8. Avoid deeply nested control structures

Thanks everyone for the answers, edit the list and vote for the correct answer I'll choose that one ;)

I am refactoring now with those ideas in mind :)

Giovannagiovanni answered 24/1, 2009 at 7:13 Comment(4)
You're misstating the question by posing it in terms of lines of code. The determinant factors aren't measured in lines of code.Ripply
this question can become complicated depending on the code and the language. maybe you can post it.Impulsion
If it is complied with single responsibility principle - just do it. I'm usually feel a need to make a header or for every 20 lines of code, which is flagging me to abstract it out and name this fragment a function with meaningful name instead of making a chapter header.Gridley
What is the ideal length of a method for you?Marchioness
E
83

There aren't any real hard or fast rule for it. I generally like my methods to just "do one thing". So if it's grabbing data, then doing something with that data, then writing it to disk then I'd split out the grabbing and writing into separate methods so my "main" method just contains that "doing something".

That "doing something" could still be quite a few lines though, so I'm not sure if the "number of lines" metric is the right one to use :)

Edit: This is a single line of code I mailed around work last week (to prove a point.. it's definitely not something I make a habit of :)) - I certainly wouldn't want 50-60 of these bad boys in my method :D

return level4 != null ? GetResources().Where(r => (r.Level2 == (int)level2) && (r.Level3 == (int)level3) && (r.Level4 == (int)level4)).ToList() : level3 != null ? GetResources().Where(r => (r.Level2 == (int)level2) && (r.Level3 == (int)level3)).ToList() : level2 != null ? GetResources().Where(r => (r.Level2 == (int)level2)).ToList() : GetAllResourceList();
Elwell answered 24/1, 2009 at 7:13 Comment(7)
LOL Well I could remove all the whitespace in my method and it would only be one very long line and not a long function. Doing one thing, that probably is the answer than, thanksWaugh
@Movaxes That code snippet I posted is a single statement though, not just lots of lines on one line.. there's no semi-colons in there :) I could have expanded GetResources() out each time to make it even more evil :PElwell
Yeah that makes sense. Why not just take your entire source file and put it in one line. I mean then you really get to be a Web 2.0 "ninja" :)Faydra
I remember in magazines of old (I'm talking BBC Micro old) they used to have "10 line programs" that just had several statements on each line, up to the max length the BBC could handle.. they were always a right pain to type in :DElwell
I like the concept of the function doing just one thing, ....but. If you have a function that does 10 things, and you move 9 of those things into separate functions, that are still called by the remaining function isn't that remaining function still in essence doing 10 things! I do think breaking the function up like this makes it much easier to test.Buff
@Buff - we have this debate at work often as well. The conclusion we eventually agree on though is that the "parent" function actually DOES do just one thing - that one thing is the algorithm. We largely came to this conclusion after listening to a talk by Roy Osherove, and practising his TDD Kata for a few weeks.Lacroix
Plus that one "smart" line is probably buggy (unless we can be sure earlier levels being null makes later ones null too). It gets resources matching certain parameters based on whether they're non-null. But for example, if level 3 is null but level 4 isn't, we're in trouble, because only the last one has been checked.Hinkel
H
245

Here is a list of red-flags (in no particular order) that could indicate that a function is too long:

  1. Deeply nested control structures: e.g. for-loops 3 levels deep or even just 2 levels deep with nested if-statements that have complex conditions.

  2. Too many state-defining parameters: By state-defining parameter, I mean a function parameter that guarantees a particular execution path through the function. Get too many of these type of parameters and you have a combinatorial explosion of execution paths (this usually happens in tandem with #1).

  3. Logic that is duplicated in other methods: poor code re-use is a huge contributor to monolithic procedural code. A lot of such logic duplication can be very subtle, but once re-factored, the end result can be a far more elegant design.

  4. Excessive inter-class coupling: this lack of proper encapsulation results in functions being concerned with intimate characteristics of other classes, hence lengthening them.

  5. Unnecessary overhead: Comments that point out the obvious, deeply nested classes, superfluous getters and setters for private nested class variables, and unusually long function/variable names can all create syntactic noise within related functions that will ultimately increase their length.

  6. Your massive developer-grade display isn't big enough to display it: Actually, displays of today are big enough that a function that is anywhere close to its height is probably way too long. But, if it is larger, this is a smoking gun that something is wrong.

  7. You can't immediately determine the function's purpose: Furthermore, once you actually do determine its purpose, if you can't summarize this purpose in a single sentence or happen to have a tremendous headache, this should be a clue.

In conclusion, monolithic functions can have far-reaching consequences and are often a symptom of major design deficiencies. Whenever I encounter code that is an absolute joy to read, it's elegance is immediately apparent. And guess what: the functions are often very short in length.

Hyperkinesia answered 24/1, 2009 at 7:13 Comment(11)
Good post! IMHO, the right size is below 80x25 with 8-space tabs. Another criterion (8): there are repeating patterns in the code. This may be reduced to (3).Almaalmaata
Jetxee: it is funny you mention that as I almost did. But I dropped it since it is a bit redundant with the other items and probably extends a bit too much into the realm of general architectural design. Then again, this is another reason that long functions is often linked to poor architecture.Hyperkinesia
About 6, just found a 164 lines long method in the Java standard library. You really think something is wrong with it? how will splitting it to multiple methods will advantage, except extending the development time? ohh wait, that's a disadvantage :OLunge
So, if i have a very large function (300+ lines) that does order of operations, I should break it apart into smaller functions that handle each operation right? But should I put these functions in their own class in a separate file? I feel like the class would only have those, and would have no other purpose.Hilleary
Wouldn't the number of state defining parameters go up as you decompose your methods? After all, you have to either pass data between the caller/callees via argument and return, or by storing state in privates.Backstretch
I love 6, and I feel that Zippoxer's commend is senseless... Why should one assume that all the standard API is elegantly programmed? I am curious about which method, though.Edaedacious
@PedroMorteRolo Exactly. The standard API isn't always a model of elegance. Furthermore, much of the Java API was developed with an intimate knowledge of the Java Compiler and JVM, hence you have performance considerations that may explain it. I concede that critical sections of code that can not waste a single millisecond may have to break some of these rules, but that should always be considered a special-case. Spending extra development time up-front is an initial investment that can avoid future (potentially crippling) tech-debt.Hyperkinesia
By the way.. I am for the opinion that the long-methods-are-bad-heuristic also applies to classes. IMHO, long classes are bad, because they tend to violate the single responsability principle. It would be fun to have compilers emiting warnings for both long classes and methods....Edaedacious
@PedroMorteRolo I definitely agree on this one. Furthermore, large classes are likely to have more mutable-state: which leads to code that is very difficult to maintain.Hyperkinesia
@RyanDelucchi - sir i really wanted to know more about point #1. What if the method or the situation expects all conditions to be covered, and that the conditions involve every element in the list (and hence a for loop), would breaking the method make sense? And how? And would such kind of a logic sound dumb if there is no other alternative really?Limewater
Best answer. Another good clue is: what do the comments in the code look like? The number of times I've stumbled across someone's code with a line like: // fetch Foo's credentials where Bar is "uncomplete". That's almost certainly a function name right there and should be decoupled. Probably wants to be refactored into something like: Foo.fetchCredentialWhereBarUncomplete()Larisalarissa
E
83

There aren't any real hard or fast rule for it. I generally like my methods to just "do one thing". So if it's grabbing data, then doing something with that data, then writing it to disk then I'd split out the grabbing and writing into separate methods so my "main" method just contains that "doing something".

That "doing something" could still be quite a few lines though, so I'm not sure if the "number of lines" metric is the right one to use :)

Edit: This is a single line of code I mailed around work last week (to prove a point.. it's definitely not something I make a habit of :)) - I certainly wouldn't want 50-60 of these bad boys in my method :D

return level4 != null ? GetResources().Where(r => (r.Level2 == (int)level2) && (r.Level3 == (int)level3) && (r.Level4 == (int)level4)).ToList() : level3 != null ? GetResources().Where(r => (r.Level2 == (int)level2) && (r.Level3 == (int)level3)).ToList() : level2 != null ? GetResources().Where(r => (r.Level2 == (int)level2)).ToList() : GetAllResourceList();
Elwell answered 24/1, 2009 at 7:13 Comment(7)
LOL Well I could remove all the whitespace in my method and it would only be one very long line and not a long function. Doing one thing, that probably is the answer than, thanksWaugh
@Movaxes That code snippet I posted is a single statement though, not just lots of lines on one line.. there's no semi-colons in there :) I could have expanded GetResources() out each time to make it even more evil :PElwell
Yeah that makes sense. Why not just take your entire source file and put it in one line. I mean then you really get to be a Web 2.0 "ninja" :)Faydra
I remember in magazines of old (I'm talking BBC Micro old) they used to have "10 line programs" that just had several statements on each line, up to the max length the BBC could handle.. they were always a right pain to type in :DElwell
I like the concept of the function doing just one thing, ....but. If you have a function that does 10 things, and you move 9 of those things into separate functions, that are still called by the remaining function isn't that remaining function still in essence doing 10 things! I do think breaking the function up like this makes it much easier to test.Buff
@Buff - we have this debate at work often as well. The conclusion we eventually agree on though is that the "parent" function actually DOES do just one thing - that one thing is the algorithm. We largely came to this conclusion after listening to a talk by Roy Osherove, and practising his TDD Kata for a few weeks.Lacroix
Plus that one "smart" line is probably buggy (unless we can be sure earlier levels being null makes later ones null too). It gets resources matching certain parameters based on whether they're non-null. But for example, if level 3 is null but level 4 isn't, we're in trouble, because only the last one has been checked.Hinkel
C
41

I think there is a huge caveat to the "do only one thing" mantra on this page. Sometimes doing one thing juggles lots of variables. Don't break up a long function into a bunch of smaller functions if the smaller functions end up having long parameter lists. Doing that just turns a single function into a set of highly coupled functions with no real individual value.

Cholecystitis answered 24/1, 2009 at 7:13 Comment(0)
C
23

I agree a function should only do one thing, but at what level is that one thing.

If your 60 lines is accomplishing one thing (from your programs perspective) and the pieces that make up those 60 lines aren't going to be used by anything else then 60 lines is fine.

There is no real benefit to breaking it up, unless you can break it up into concrete pieces that stand on their own. The metric to use is functionality and not lines of code.

I have worked on many programs where the authors took the only one thing to an extreme level and all it ended up doing was to make it look like someone took a grenade to a function/method and blew it up into dozens of unconnected pieces that are hard to follow.

When pulling out pieces of that function you also need to consider if you will be adding any unnecessary overhead and avoid passing large amounts of data.

I believe the key point is to look for reuseability in that long function and pull those parts out. What you are left with is the function, whether it is 10, 20, or 60 lines long.

Charlatanism answered 24/1, 2009 at 7:13 Comment(2)
Another important metric is the number of levels of block nesting. Keep to a minimum. Breaking a function into smaller parts often helps. Other things can help too, such as multiple returns.Hygrometric
Readability is one big benefit.Gherardo
M
19

A function should do only one thing. If you are doing many small things in a function, make each small thing a function and call those functions from the long function.

What you really don't want to do is copy and paste every 10 lines of your long function into short functions (as your example suggests).

Manufactory answered 24/1, 2009 at 7:13 Comment(2)
Yeah make a lot of small functions with the copy&paste pattern is not a great idea, I agree a function should try always to do only one thingWaugh
"do one thing" may or may not be correct, depending on granularity. If a function multiplies a matrix, that's fine. If a function builds a virtual car -- that is "one thing" but it's also a very big thing. Multiple functions can be used to build a car, component by component.Lewandowski
S
14

60 lines is large but not too long for a function. If it fits on one screen in an editor you can see it all at once. It really depends on what the functions is doing.

Why I may break up a function:

  • It is too long
  • It makes the code more maintainable by breaking it up and using meaningful names for the new function
  • The function is not cohesive
  • Parts of the function are useful in themselves.
  • When it is difficult to come up with a meaningful name for the function (It is probably doing too much)
Sufferable answered 24/1, 2009 at 7:13 Comment(2)
You are just way out of order with this mate. 60 lines will always be too much. I'd say that if you are closing in on 10 lines you are probably close to the limit.Schauer
But another function is still calling these functions and is essentially the very same DoThisAndThisAndAlsoThis function but with a lot of abstraction that you still have to name somehowPiderit
J
8

Bear in mind that you can end up re-factoring just for re-factoring's sake, potentially making the code more unreadable than it was in the first place.

A former colleague of mine had a bizarre rule that a function/method must only contain 4 lines of code! He tried to stick to this so rigidly that his method names often became repetitive & meaningless plus the calls became deeply nested & confusing.

So my own mantra has become: if you can't think of a decent function/method name for the chunk of code you are re-factoring, don't bother.

Jd answered 24/1, 2009 at 7:13 Comment(0)
T
8

My personal heuristic is that it's too long if I can't see the whole thing without scrolling.

Twodimensional answered 24/1, 2009 at 7:13 Comment(1)
... while have set the font size to 5?Finder
T
6

Take a peek at McCabe's cyclomatic, in which he breaks up his code into a graph where, "Each node in the graph corresponds to a block of code in the program where the flow is sequential and the arcs correspond to branches taken in the program."

Now imagine your code has no functions/methods; its just one huge sprawl of code in the form of a graph.

You want to break this sprawl into methods. Consider that, when you do, there will be a certain number of blocks in each method. Only one block of each method will be visible to all other methods: the first block (we're presuming that you will be able to jump into a method at only one point: the first block). All the other blocks in each method will be information hidden within that method, but each block within a method may potentially jump to any other block within that method.

To determine what size your methods should be in terms of number of blocks per method, one question you might ask yourself is: how many methods should I have to minimise the maximum potential number of dependencies (MPE) between all blocks?

That answer is given by an equation. If r is the number of methods that minimises the MPE of the system, and n is the number of blocks in the system, then the equation is: r = sqrt(n)

And it can be shown that this gives the number of blocks per method to be, also, sqrt(n).

Tridimensional answered 24/1, 2009 at 7:13 Comment(0)
F
5

Rule of thumb: If a function contains code blocks that do something, that is somewhat detached from the rest of code, put it in a seperate function. Example:

function build_address_list_for_zip($zip) {

    $query = "SELECT * FROM ADDRESS WHERE zip = $zip";
    $results = perform_query($query);
    $addresses = array();
    while ($address = fetch_query_result($results)) {
        $addresses[] = $address;
    }

    // now create a nice looking list of
    // addresses for the user
    return $html_content;
}

much nicer:

function fetch_addresses_for_zip($zip) {
    $query = "SELECT * FROM ADDRESS WHERE zip = $zip";
    $results = perform_query($query);
    $addresses = array();
    while ($address = fetch_query_result($results)) {
        $addresses[] = $address;
    }
    return $addresses;
}

function build_address_list_for_zip($zip) {

    $addresses = fetch_addresses_for_zip($zip);

    // now create a nice looking list of
    // addresses for the user
    return $html_content;
}

This approach has two advantages:

  1. Whenever you need to fetch addresses for a certain zip code you can use the readily available function.

  2. When you ever need to read the function build_address_list_for_zip() again you know what the first code block is going to do (it fetches addresses for a certain zip code, at least thats what you can derive from the function name). If you would have left the query code inline, you would first need to analyze that code.

[On the other hand (I will deny I told you this, even under torture): If you read a lot about PHP optimization, you could get the idea to keep the number of functions as small a possible, because function call are very, very expensive in PHP. I don't know about that since I never did any benchmarks. If thats case you would probably be better of not following any of the answers to your question if you application is very "performance sensitive" ;-) ]

Finder answered 24/1, 2009 at 7:13 Comment(0)
C
5

Size approx you screen size (so go get a big pivot widescreen and turn it)... :-)

Joke aside, one logical thing per function.

And the positive thing is that unit testing is really much easier to do with small logical functions that do 1 thing. Big functions that do many things are harder to verify!

/Johan

Crypto answered 24/1, 2009 at 7:13 Comment(0)
F
3

I usually break functions up by the need to place comments describing the next code block. What previously went into the comments now goes into the new function name. This is no hard rule, but (for me) a nice rule of thumb. I like code speaking for itself better than one that needs comments (as I've learned that comments usually lie)

Foldboat answered 24/1, 2009 at 7:13 Comment(3)
I like to comment my code, mostly not for me but for others, that eliminates lot of questions on where $variable was defined, but I also like the code to be self explanatory. Do comments lie?Waugh
yes, because more ofthen than not they are not maintained. At the time of writing they might be correct, but once a bugfix or new feature is introduced, nobody forces comments to be changed according to the new situation. Method names tend to lie far less often than comments IMHOFoldboat
I just came across this answer: #407260 stating that "Most comments in code are in fact a pernicious form of code duplication". Also - Long line of comments there.Foldboat
D
3

The main reason I usually break a function up is either because bits and pieces of it are also ingredients in another nearby function I'm writing, so the common parts get factored out. Also, if it's using a lot of fields or properties out of some other class, there's a good chance that the relevant chunk can be lifted out wholesale and if possible moved into the other class.

If you have a block of code with a comment at the top, consider pulling it out into a function, with the function and argument names illustrating its purpose, and reserving the comment for the code's rationale.

Are you sure there are no pieces in there that would be useful elsewhere? What sort of function is it?

Duax answered 24/1, 2009 at 7:13 Comment(1)
The function makes a cache file from a template, based on the url, like post_2009_01_01.html from the url /post/2009/01/01 thanks for your answerWaugh
P
2

In my opinion the answer is: when it does too much things. Your function should perform only the actions you expect from the name of the function itself. Another thing to consider is if you want to reuse some parts of your functions in others; in this case it could be useful to split it.

Proclaim answered 24/1, 2009 at 7:13 Comment(0)
P
1

If it has more than three branches, generally this means that a function or method should be broken apart, to encapsulate the branching logic in different methods.

Each for loop, if statement, etc is then not seen as a branch in the calling method.

Cobertura for Java code (and I'm sure there are other tools for other languages) calculates the number of if, etc. in a function for each function and sums it for the "average cyclomatic complexity".

If a function/method only has three branches, it will get a three on that metric, which is very good.

Sometimes it is difficult to follow this guideline, namely for validating user input. Nevertheless putting branches in different methods aids not only development and maintenance but testing as well, since the inputs to the methods that perform the branching can be analyzed easily to see what inputs need to be added to the test cases in order to cover the branches that were not covered.

If all the branches were inside a single method, the inputs would have to be tracked since the start of the method, which hinders testability.

Putup answered 24/1, 2009 at 7:13 Comment(0)
A
1

Extending the spirit of a tweet from Uncle Bob a while back, you know a function is getting too long when you feel the need to put a blank line between two lines of code. The idea being that if you need a blank line to separate the code, its responsibility and scope are separating at that point.

Antheridium answered 24/1, 2009 at 7:13 Comment(0)
N
1

I normally uses a test driven approach to writing code. In this approach the function size is often related to the granularity of your tests.

If your test is sufficiently focused then it will lead you to write a small focused function to make the test pass.

This also works in the other direction. Functions need to be small enough to test effectively. So when working with legacy code I often find that I break down larger functions in-order to test the different parts of them.

I usually ask myself "what is the responsibility of this function" and if I can't state the responsibility in a clear concise sentence, and then translate that into a small focused test, I wonder if the function is too big.

Neri answered 24/1, 2009 at 7:13 Comment(0)
F
1

I have written 500 line functions before, however these were just big switch statements for decoding and responding to messages. When the code for a single message got more complex than a single if-then-else, I extracted it out.

In essence, although the function was 500 lines, the independently maintained regions averaged 5 lines.

Ferraro answered 24/1, 2009 at 7:13 Comment(2)
Switch statements are so commonly misused that I'm sure this is one of those cases. Let me guess: you were switching on an enum? The problem here is an attempt to subvert OOP with enums, and in turn that led to an absurdly long function, which will never stop growing.Haplo
@RyanNaccarato: Here's a nonremovable case. tronche.com/gui/x/xlib/events/structures.html The enum is on the int type; member; which cannot be converted to virtual dispatch because the calling code is in a different address space.Ferraro
Y
1

There has been some thorough research done on this very topic, if you want the fewest bugs, your code shouldn't be too long. But it also shouldn't be too short.

I disagree that a method should fit on your display in one, but if you are scrolling down by more than a page then the method is too long.

See The Optimal Class Size for Object-Oriented Software for further discussion.

Yand answered 24/1, 2009 at 7:13 Comment(0)
A
1

This is partly a matter of taste, but how I determine this is I try to keep my functions roughly only as long as will fit on my screen at one time (at a maximum). The reason being that it's easier to understand what's happening if you can see the whole thing at once.

When I code, it's a mix of writing long functions, then refactoring to pull out bits that could be reused by other functions -and- writing small functions that do discrete tasks as I go.

I don't know that there is any right or wrong answer to this (e.g., you may settle on 67 lines as your max, but there may be times when it makes sense to add a few more).

Ame answered 24/1, 2009 at 7:13 Comment(1)
Well I also like to see my complete function in the screen :) sometimes that means a Monospace 9 font and a big resolution in a black background, I agree is easier to understand it that way.Waugh
S
0

My idea is that if I have to ask myself if it is too long, it is probably too long. It helps making smaller functions, in this area, because it could help later in the application's life cycle.

Snowdrift answered 24/1, 2009 at 7:13 Comment(0)
P
0

One thing (and that thing should be obvious from the function name), but no more than a screenful of code, regardless. And feel free to increase your font size. And if in doubt, refactor it into two or more functions.

Precipitancy answered 24/1, 2009 at 7:13 Comment(0)
I
0

assuming that you are doing one thing, the length will depend on:

  • what you are doing
  • what language you are using
  • how many levels of abstraction you need to deal with in the code

60 lines might be too long or it might be just right. i suspect that it may be too long though.

Impulsion answered 24/1, 2009 at 7:13 Comment(0)
E
0

I suspect you'll find a lot of answers on this.

I would probably break it up based on the logical tasks that were being performed within the function. If it looks to you that your short story is turning into a novel, I would suggest find and extract distinct steps.

For example, if you have a function that handles some kind of string input and returns a string result, you might break the function up based on the logic to split your string into parts, the logic to add extra characters and the logic to put it all back together again as a formatted result.

In short, whatever makes your code clean and easy to read (whether that's by simply ensuring your function has good commenting or breaking it up) is the best approach.

Equisetum answered 24/1, 2009 at 7:13 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.