When, if ever, is "number of lines of code" a useful metric? [closed]

Some people claim that code's worst enemy is its size, and I tend to agree. Yet every day you keep hearing things like:

I write blah lines of code in a day.
I own x lines of code.
Windows is x million lines of code.

Question: When is "#lines of code" useful?

P.S. Note that when such statements are made, the tone is "more is better".

Potence answered 8/10, 2008 at 18:13

It was useful 20 years ago when this was written. I bet it impressed the viewers. – Cavalier 8/9, 2013 at 10:53

Just wanted to add this classic story about the misuse of this metric. folklore.org/StoryView.py?story=Negative_2000_Lines_Of_Code.txt – Durstin 19/10, 2016 at 14:46

I'd say it's when you're removing code to make the project run better.

Saying you removed "X number of lines" is impressive, and far more helpful than saying you added lines of code.

Abdias answered 8/10, 2008 at 18:13

int i = 0; and double d = 0; on two separate lines, changed to int i = 0; double d = 0; on a single line. There, reduced by 1 LOC. Means nothing, though, does it? ;) – Overbite 8/10, 2008 at 18:34

This. For example, refactoring some legacy code for a client, I was able to cut the number of lines in their app's main form in half, and that includes adding comment blocks to the refactored methods. I guess that also counts as bragging. – Ginetteginevra 8/10, 2008 at 18:35

I agree with @[Rob Allen]; the first comment is argumentative, though the poster could have made the answer clearer by mentioning the word "refactor". – Denunciate 8/10, 2008 at 18:42

You should always strive towards having a less-than-zero code ratio, although this may not always be possible. – Inspired 9/10, 2008 at 6:33

This answer is obviously better than the sum of all the other answers on this page. Thanks for using boldface. – Botheration 9/10, 2008 at 19:24

Number of lines removed seems only marginally more useful than lines added. It is still trivial to manipulate and therefore is not a very useful metric. – Kajdan 4/12, 2008 at 13:39

I have to agree with Eli. If you go strictly by removing lines, you end up with code that looks like a Perl guru wrote it. It may even make the project run better, but you have just bought a bit of speed at the cost of a LOT of later development time and headaches if something needs changing. – Glauce 5/8, 2009 at 3:13

But saying "I deleted all my code this morning" isn't quite as impressive. Sometimes you don't need to remove code to make it more impressive; sometimes you just need to make it more readable. – Highway 9/10, 2009 at 15:41

Plus, I'd rather own fewer lines of good-quality code than more lines of awfully written spaghetti. Who wants to maintain 1,000,000 lines of code? I certainly don't. – Highway 9/10, 2009 at 15:43

I once removed 2100 lines from a 2200 line module without any functionality change. (tidying up after some really sloppy copy+paste programmer... grrrrr) – Pitanga 23/12, 2010 at 15:56

I'm surprised nobody has mentioned Dijkstra's famous quote yet, so here goes:

My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.

The quote is from an article called "On the cruelty of really teaching computing science".

Inspired answered 8/10, 2008 at 18:13

It's a terrible metric, but as other people have noted, it gives you a (very) rough idea of the overall complexity of a system. If you're comparing two projects, A and B, and A is 10,000 lines of code, and B is 20,000, that doesn't tell you much - project B could be excessively verbose, or A could be super-compressed.

On the other hand, if one project is 10,000 lines of code, and the other is 1,000,000 lines, the second project is significantly more complex, in general.

The problems with this metric come in when it's used to evaluate productivity or level of contribution to some project. If programmer "X" writes twice as many lines as programmer "Y", he might or might not be contributing more; maybe "Y" is working on a harder problem...

Jumpy answered 8/10, 2008 at 18:13

Even more than a harder problem, "Y" might be writing better code for the SAME problem that is a lot more DRY and maintainable. – Impedance 30/10, 2009 at 13:22

When bragging to friends.

Winnifredwinning answered 8/10, 2008 at 18:13

If you're bragging about lines of code with your friends, you need to get out more. There are far more amusing things to brag about than your code base, haha :D – Highway 9/10, 2009 at 15:46

At least, not for progress:

“Measuring programming progress by lines of code is like measuring aircraft building progress by weight.” --Bill Gates

Vomer answered 8/10, 2008 at 18:13

There is one particular case when I find it invaluable. When you are in an interview and they tell you that part of your job will be to maintain an existing C++/Perl/Java/etc. legacy project. Asking the interviewer how many KLOC (approx.) are involved in the legacy project will give you a better idea as to whether you want their job or not.

Investigate answered 8/10, 2008 at 18:13

In some interviews the answer was "I don't know." – Milestone 16/3, 2014 at 16:22

It's useful when loading up your line printer, so that you know how many pages the code listing you're about to print will consume. ;)

Ellinger answered 8/10, 2008 at 18:13

Reminds me of this:

The present letter is a very long one, simply because I had no leisure to make it shorter.
--Blaise Pascal.

Undistinguished answered 8/10, 2008 at 18:13

To paraphrase a quote I read about 25 years ago,

"The problem with using lines of code as a metric is it measures the complexity of the solution, not the complexity of the problem".

I believe the quote is from David Parnas in an article in the Journal of the ACM.

Whippet answered 8/10, 2008 at 18:13

Like most metrics, it means very little without context. So the short answer is: never (except for the line printer, that's funny! Who prints out programs these days?)

An example:

Imagine that you're unit-testing and refactoring legacy code. It starts out with 50,000 lines of code (50 KLOC) and 1,000 demonstrable bugs (failed unit tests). The ratio is 1K/50KLOC = 1 bug per 50 lines of code. Clearly this is terrible code!

Now, several iterations later, you have reduced the known bugs by half (and the unknown bugs by more than that most likely) and the code base by a factor of five through exemplary refactoring. The ratio is now 500/10000 = 1 bug per 20 lines of code. Which is apparently even worse!

Depending on what impression you want to make, this can be presented as one or more of the following:

50% fewer bugs
five times less code
80% less code
150% worsening of the bugs-to-code ratio (equivalently, 60% fewer lines of code per bug)

All of these are true (assuming I didn't screw up the math), and they all suck at summarizing the vast improvement that such a refactoring effort must have achieved.
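To double-check figures like these, here is a minimal Python sketch that recomputes them from the before/after bug and line counts in the example. The function name and dictionary keys are illustrative only. It also shows why the ratio item is easy to get wrong: bug density rises from 0.02 to 0.05 bugs per line (a 150% increase), while lines of code per bug fall from 50 to 20 (a 60% drop).

```python
def refactoring_summary(bugs_before, loc_before, bugs_after, loc_after):
    """Recompute the headline percentages for a before/after refactoring."""
    density_before = bugs_before / loc_before  # bugs per line, before
    density_after = bugs_after / loc_after     # bugs per line, after
    lines_per_bug_before = loc_before / bugs_before
    lines_per_bug_after = loc_after / bugs_after
    return {
        "fewer_bugs_pct": round(100 * (1 - bugs_after / bugs_before), 1),
        "code_shrink_factor": round(loc_before / loc_after, 1),
        "less_code_pct": round(100 * (1 - loc_after / loc_before), 1),
        "density_increase_pct": round(100 * (density_after / density_before - 1), 1),
        "lines_per_bug_drop_pct": round(100 * (1 - lines_per_bug_after / lines_per_bug_before), 1),
    }


if __name__ == "__main__":
    # 1,000 bugs in 50 KLOC before; 500 bugs in 10 KLOC after.
    for key, value in refactoring_summary(1000, 50_000, 500, 10_000).items():
        print(f"{key}: {value}")
```

Every number is "true", which is the answer's point: which one you headline is a presentation choice, not a measurement.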

Barbiturism answered 8/10, 2008 at 18:13

And yet no features have been added across several iterations, while competitor's products are rapidly evolving, leaving your company with little hope of enticing further investment. – Maxia 2/7, 2014 at 20:26

Features are added as other microservices. Any change in requirements means it's another project and should be treated as such. – Excerpta 17/2, 2023 at 8:55

There are a lot of different Software Metrics. Lines of code is the most used and is the easiest to understand.

I am surprised how often the lines of code metric correlates with the other metrics. Instead of buying a tool that can calculate cyclomatic complexity to discover code smells, I just look for the methods with many lines, and they tend to have high complexity as well.
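The "just look for the methods with many lines" shortcut is easy to automate. Below is a minimal sketch for Python source only, using the standard-library ast module; it is not a cyclomatic-complexity tool, just a ranking by line span, and the function name is hypothetical.

```python
import ast


def longest_functions(source: str, top: int = 5):
    """Return (name, line_count) pairs for the longest functions, longest first."""
    tree = ast.parse(source)
    sizes = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno exists on AST nodes from Python 3.8 onward; the span
            # includes any blank lines and comments inside the function body.
            sizes.append((node.name, node.end_lineno - node.lineno + 1))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]


SAMPLE = '''
def short():
    return 1

def longer(x):
    if x > 0:
        x += 1
    else:
        x -= 1
    return x
'''

if __name__ == "__main__":
    print(longest_functions(SAMPLE))  # → [('longer', 6), ('short', 2)]
```

The longest entries are then the first candidates to inspect by hand, or to run through a proper complexity tool if one is available.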

A good example of the use of lines of code is the metric bugs per lines of code. It can give you a gut feeling for how many bugs you should expect to find in your project. In my organization we usually find around 20 bugs per 1000 lines of code. This means that if we are ready to ship a product that has 100,000 lines of code, and our bug database shows that we have found only 50 bugs, then we should probably do some more testing. If we have found around 20 bugs per 1000 lines of code, then we are probably approaching the quality that we usually achieve.

A bad example of use is to measure developer productivity. If you measure developer productivity by lines of code, then people tend to use more lines to deliver less.

Snakebird answered 8/10, 2008 at 18:13

Answer: when you can talk about negative lines of code. As in: "I removed 40 extraneous lines of code today, and the program is still functioning as well as before."

Pikeman answered 8/10, 2008 at 18:13

The Software Engineering Institute's Process Maturity Profile of the Software Community: 1998 Year End Update (which I could not find a link to, unfortunately) discusses a survey of around 800 software development teams (or perhaps it was shops). The average defect density was 12 defects per 1000 LOC.

If you had an application with 0 defects (it doesn't exist in reality, but let's suppose) and wrote 1000 LOC, you could assume that, on average, you had just introduced 12 defects into the system. If QA finds 1 or 2 defects and that's it, then they need to do more testing, as there are probably 10+ more defects.

Nobukonoby answered 8/10, 2008 at 18:13

It seems to me that there's a finite limit to how many lines of code I can refer to off the top of my head from any given project. The limit is probably very similar for the average programmer. Therefore, if you know your project has 2 million lines of code, and your programmers can be expected to be able to understand whether or not a bug is related to the 5K lines of code they know well, then you know you need to hire 400 programmers for your code base to be well covered from someone's memory.

This will also make you think twice about growing your code base too fast and might get you thinking about refactoring it to make it more understandable.

Note I made up these numbers.

Botheration answered 8/10, 2008 at 18:13

It is useful in many ways.

I don't remember the exact numbers, but Microsoft had a webcast that said that for every X lines of code there are, on average, Y bugs. You can take that statement and use it as a baseline for several things:

How well a code reviewer is doing their job.
Judging the skill level of two employees by comparing their bug ratios over several projects.

Another thing we look at is: why is it so many lines? Oftentimes, when a new programmer is put in a jam, they will just copy and paste chunks of code instead of creating functions and encapsulating.
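A crude way to spot the copy-and-paste pattern described here is to look for identical runs of consecutive lines. The Python sketch below is only an illustration, not a tool the answer mentions: it ignores blank lines and leading/trailing whitespace, and the window size is an arbitrary parameter. Real clone detectors go much further (token normalization, renamed variables).

```python
from collections import defaultdict


def duplicated_blocks(source: str, window: int = 4):
    """Map each run of `window` non-blank lines that occurs more than once
    to the 1-based line numbers where its copies start."""
    lines = [(number, line.strip())
             for number, line in enumerate(source.splitlines(), start=1)
             if line.strip()]
    starts_by_chunk = defaultdict(list)
    for i in range(len(lines) - window + 1):
        chunk = tuple(text for _, text in lines[i:i + window])
        starts_by_chunk[chunk].append(lines[i][0])
    return {chunk: starts for chunk, starts in starts_by_chunk.items() if len(starts) > 1}


SAMPLE = """a = load()
b = clean(a)
save(b)
print("done")
a = load()
b = clean(a)
save(b)
"""

if __name__ == "__main__":
    for chunk, starts in duplicated_blocks(SAMPLE, window=3).items():
        print(starts, chunk)
```

A long pasted block shows up as several overlapping windows, so the output is a pointer to where to look rather than a precise clone report.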

I think that "I wrote x lines of code in a day" is a terrible measure. It takes no account of the difficulty of the problem, the language you're writing in, and so on.

Sacrilegious answered 8/10, 2008 at 18:13

The statistic was published in the Software Engineering Institute's Process Maturity Profile of the Software Community: 1998 Year End Update. A survey of about 800 software development teams (or shops, I don't remember) led to a finding that there are, on average, 12 defects per 1000 lines of code. – Nobukonoby 9/10, 2009 at 15:45

I'd agree that taking the total number of lines of code in a project is one way to measure complexity.

It's certainly not the only measure of complexity. For example, debugging a 100-line obfuscated Perl script is much different from debugging a 5,000-line Java project with comment templates.

But without looking at the source, you'd usually think more lines of code means more complexity, just as you might think a 10 MB source tarball is more complex than a 15 KB source tarball.

Melonie answered 8/10, 2008 at 18:13

Always. Bunch o'rookies on this question. Masters write code prolifically and densely. Good grads write lots of lines but too much fluff. Crappers copy lines of code. So, first do a Tiles analysis or gate, of course.

LoC must be used if your org doesn't do any complexity points, feature points/function points, commits, or other analysis.

Any developer who tells you not to measure him or her by LoC is shite. Any master cranks out code like you would not believe. I've worked with a handful who are 20x to 200x as productive as the average programmer. And their code is very, very, very compact and efficient. Yes, like Dijkstra, they have enormous mental models.

Finally, in any undertaking, most people are not good at it and most doing it are not great. Programming is no different.

Yes, do a hit analysis on any large project and you'll find that 20% or more is dead code. Again, master programmers regularly annihilate dead code and crapcode.

Younts answered 8/10, 2008 at 18:13

As most people have already stated, it can be an ambiguous metric, especially if you are comparing people coding in different languages.

5,000 lines of Lisp != 5,000 lines of C

Feingold answered 8/10, 2008 at 18:13

I wrote two blog posts detailing the pros and cons of counting Lines of Code (LoC):

"How do you count your number of Lines Of Code (LOC)?": the idea is to explain that you need to count the logical number of lines of code instead of the physical count. To do so, you can use tools like NDepend, for example.

"Why is it useful to count the number of Lines Of Code (LOC)?": the idea is that LoC should never be used to measure productivity, but rather for test coverage estimation and software deadline estimation.
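To make the physical-versus-logical distinction concrete, here is a small Python approximation. It is not how NDepend counts (NDepend targets .NET code); it simply treats non-blank, non-comment lines as "physical" and statements in the syntax tree as "logical". The function names are hypothetical, and the comment check does not understand multi-line strings.

```python
import ast


def physical_loc(source: str) -> int:
    """Count non-blank lines that are not comment-only lines."""
    return sum(1 for line in source.splitlines()
               if line.strip() and not line.strip().startswith("#"))


def logical_loc(source: str) -> int:
    """Approximate logical lines as the number of statements in the syntax tree."""
    return sum(isinstance(node, ast.stmt) for node in ast.walk(ast.parse(source)))


ONE_PER_LINE = "i = 0\nd = 0.0\n"
SQUEEZED = "i = 0; d = 0.0\n"

if __name__ == "__main__":
    for name, src in [("one per line", ONE_PER_LINE), ("squeezed", SQUEEZED)]:
        print(name, physical_loc(src), logical_loc(src))
```

Squeezing two statements onto one line drops the physical count from 2 to 1 but leaves the logical count at 2, which is exactly the kind of gaming a logical count resists.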

Swoosh answered 8/10, 2008 at 18:13

When you have to budget for the number of punch cards you need to order.

Subtropical answered 8/10, 2008 at 18:13

Lines of code are useful to know when you're wondering if a code file is getting too large. Hmmm...This file is now 5000 lines of code. Maybe I should refactor this.
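A minimal sketch of that check in Python: walk a source tree and list files at or above a line threshold. The 5,000-line default echoes the answer; restricting it to .py files and the function name are assumptions made only for this example.

```python
import tempfile
from pathlib import Path


def oversized_files(root, threshold=5000, suffixes=(".py",)):
    """Return (path, line_count) for source files at or above `threshold` lines, largest first."""
    results = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            with path.open(encoding="utf-8", errors="replace") as handle:
                line_count = sum(1 for _ in handle)
            if line_count >= threshold:
                results.append((str(path), line_count))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Throwaway demo tree; point `root` at a real checkout instead.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "big.py").write_text("x = 1\n" * 12)
        Path(tmp, "small.py").write_text("y = 2\n")
        print(oversized_files(tmp, threshold=10))
```

This counts raw physical lines, blanks and comments included, which is usually good enough for a "this file is getting too big" nudge.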

Turboelectric answered 8/10, 2008 at 18:13

It's a great metric for scaring/impressing people. That's about it, and definitely the context I'm seeing in all three of those examples.

Compensate answered 8/10, 2008 at 18:13

It's a metric of productivity, as well as complexity. Like all metrics, it needs to be evaluated with care. A single metric usually is not sufficient for a complete answer.

I.e., a 500-line program is not nearly as complex as a 5,000-line one. Now you have to ask other questions to get a better view of the program... but now you have a metric.

Stinkpot answered 8/10, 2008 at 18:13

I would call this into question. There are plenty of ways to code, for example, in Python, where you can fit at least five different lines of code into one line of code. There are also differences between whether you need to build your own function or use pre-existing stuff. It really is subjective. – Compensate 8/10, 2008 at 18:17

I agree with Robert. A 5000 line program may just be a very badly written 500 line program. I've seen plenty of examples of this. – Rainy 8/10, 2008 at 18:25

Of course it's subjective, but it is a metric, which, by their nature, are a 1-dimensional representation. – Stinkpot 8/10, 2008 at 18:40

When determining level of effort (LOE). If you are putting together a proposal and you will have roughly the SAME engineers working on the new project, then you might be able to determine how many engineers are needed for how long.

Canaliculus answered 8/10, 2008 at 18:13

If the project is substantially the same, one would expect it to take less time, as much of the code could be reused. If the project is substantially different, then it is an apples-to-oranges comparison. The idea that programmer X churns out Y lines of code per unit of time is simply false. There is a lot more to development than coding. – Milestone 16/3, 2014 at 16:21

It is a very useful idea when it is associated with the number of defects. "Defects" gives you a measure of code quality: the fewer defects, the better the software. It is nearly impossible to remove all defects, and on many occasions a single defect can be harmful or even fatal.

However, it does not seem that defect-free software exists.

Flawy answered 8/10, 2008 at 18:13

They can be helpful to indicate the magnitude of an application, though they say nothing about quality! My point here is just that if you indicate you worked on an application with 1,000 lines and they have an application that is roughly 500k lines, a potential employer can tell whether you have large-system experience or only small utility programming.

I fully agree with Warren that the number of lines of code you remove from a system is more useful than the number you add.

Ingunna answered 8/10, 2008 at 18:13

How do you count the lines in the age of microservices? Do you count all the microservices the company has, or only the ones you touched? If the latter, why do you count all the lines in a big project, rather than just the ones (in the files) you touched? – Excerpta 17/2, 2023 at 8:57

Check out Wikipedia's definition: http://en.wikipedia.org/wiki/Source_lines_of_code

SLOC = 'source lines of code'

There is actually quite a bit of time put into these metrics where I work. There are also different ways to count SLOC.

From the Wikipedia article:

There are two major types of SLOC measures: physical SLOC and logical SLOC.

Another good resource: http://www.dwheeler.com/sloc/

Ablation answered 8/10, 2008 at 18:13

When the coder doesn't know you are counting lines of code, and so has no reason to deliberately add redundant code to game the system. And when everyone in the team has a similar coding style (so there is a known average "value" per line.) And only if you don't have a better measure available.

Bezoar answered 8/10, 2008 at 18:13

In competitions.

Botnick answered 8/10, 2008 at 18:13

ozoneasylum.com/… – Nerve 18/12, 2008 at 16:53

When pointing out why the change is going to take so long.

"Windows is 7 million lines of code and it takes a while to test out all the dependencies..."

Supportable answered 8/10, 2008 at 18:13

Windows was 7 million lines maybe 15 years ago. Now it's most likely 10 times more. – Amerind 19/1, 2010 at 7:33

Lines of code isn't really that useful, and if it is used as a metric by management, it leads to programmers doing a lot of refactoring to boost their scores. In addition, poor algorithms aren't replaced by neat, short algorithms, because that leads to a negative LOC count, which counts against you. To be honest, just don't work for a company that uses LOC/day as a productivity metric, because the management clearly doesn't have any clue about software development, and you'll always be on the back foot from day one.

Efferent answered 8/10, 2008 at 18:13

When you are refactoring a code base and can show that you removed lines of code, and all the regression tests still passed.

Basalt answered 8/10, 2008 at 18:13

It can be useful when comparing languages. I once wrote a small module in both Groovy and Clojure. The Clojure program had about 250 LOC and the Groovy one about 1000 LOC. Interestingly, when I looked at one complex function and wrote it in a similar manner in both, it was exactly the same number of lines. This was some indication that the Groovy code was filled up with boilerplate, and it gave me some additional reasons to start using Clojure :)

As some other people have said, it's also good when looking at commits. If you have introduced more lines of code than you have removed, then you need to be aware that you have increased the complexity of the solution. This may make you rethink your solution if the problem itself does not call for more complexity. It can also be a good deal to make with yourself to encourage refactoring: if you add more lines of code, then you should spend some time refactoring.

Finally, although you could write something that is difficult to read by trying too hard to reduce the loc, a solution with fewer loc is almost always easier to read as there is simply less to read.

Locksmith answered 8/10, 2008 at 18:13

> a solution with fewer loc is almost always easier to read as there is simply less to read. < That is absolutely not true. the natural conclusion is code golf... I regularly expand single complex lines into two or three lines with clearly named variables, to make it obvious to people after me what is going on. And usually fix bugs in the process. – Ambrose 8/12, 2017 at 15:36

In the small (in one function or similar) I guess it depends on programming style and the team, but in the large, IME, it is almost always true. By this I mean that if a change has reduced the lines of code drastically and over more than one area, then it has almost always made the code easier to read. – Locksmith 11/12, 2017 at 11:52

Lines of code is not a useful metric for comparing different projects.

However, it can be useful within a project as a moving figure, for watching how the size of the code base changes over time. If you generate a graph as part of your CI process showing the lines of code at each build, it will help you to visualise how the project is evolving.

Even in this context, I would argue that the exact "Lines of code" figure itself is unimportant; what's useful is the visualisation of the trend - the steady climb upward as more features are added; the jumps where big projects are completed; the dips where a bit of redundant code was removed.
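A minimal sketch of such a CI step in Python, under stated assumptions: the build ID is whatever your CI server provides, history is kept in a CSV file, only non-blank lines in .py files are counted, and all function names are hypothetical. Plotting is left to whatever charting the CI tool offers; the loc_deltas helper surfaces the build-to-build changes the answer says actually matter.

```python
import csv
import datetime
import tempfile
from pathlib import Path


def count_source_lines(root, suffixes=(".py",)):
    """Non-blank lines across all matching files under `root`."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            with path.open(encoding="utf-8", errors="replace") as handle:
                total += sum(1 for line in handle if line.strip())
    return total


def record_build(history_csv, build_id, root):
    """Append one row (timestamp, build, loc) to the history file; return the count."""
    loc = count_source_lines(root)
    is_new = not Path(history_csv).exists()
    with open(history_csv, "a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["timestamp", "build", "loc"])
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
        writer.writerow([stamp, build_id, loc])
    return loc


def loc_deltas(history_csv):
    """Change in line count between consecutive builds: the trend, not the absolute figure."""
    with open(history_csv, newline="") as handle:
        rows = list(csv.DictReader(handle))
    return [(cur["build"], int(cur["loc"]) - int(prev["loc"]))
            for prev, cur in zip(rows, rows[1:])]


if __name__ == "__main__":
    # Throwaway demo: two "builds" of a tiny source tree.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "src")
        src.mkdir()
        history = Path(tmp, "loc_history.csv")
        Path(src, "a.py").write_text("x = 1\n\ny = 2\n")
        record_build(history, "101", src)
        Path(src, "b.py").write_text("z = 3\n")
        record_build(history, "102", src)
        print(loc_deltas(history))
```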

Pitanga answered 8/10, 2008 at 18:13

The number of LOC is useful when calculating the defect rate (bugs per 1,000 LOC, etc.)

Sapajou answered 8/10, 2008 at 18:13

This is used so often during sales presentations. For instance, KLoC (Kilo Lines of Code) or LoC is used to demonstrate the kind of competence the vendor organization has with large/complex systems. This is especially true when the vendor is attempting to showcase their ability to MAINTAIN complex legacy systems. As part of negotiation, sometimes the customer organization provides a representative chunk of code to execute a Proof of Concept with the vendor to test the vendor's capability. This representative code will have enough complexity for the vendor company to handle, and its sales pitch about "maintaining systems with several million LoC" can come under scrutiny.

So, yes, Lines of Code is used and abused during sales presentations and hence a useful metric in sales.

Vitovitoria answered 8/10, 2008 at 18:13

The lines of code is dependent upon the language.

For example, 1 line of C code is worth an average of x lines of ASM code, and likewise from C++ to C, etc.

Java and C# encapsulate a lot of code thanks to the background support from the VM.

Shulman answered 8/10, 2008 at 18:13

Functionally never, aside from the previously-mentioned "bragging" purpose.

Lines != effectiveness. In my experience the relationship is often inverse (though not strictly, especially at the extremes, for obvious reasons).

Glauce answered 8/10, 2008 at 18:13

Lines of code counts are useful when pitching the extensiveness of your comprehensive product to a customer who considers lines of code to be a general indicator of product size. For example, when you're trying to convince someone your product handles many corner cases, or when you're trying to get into a beta for a development tool where the tool vendor wants to get maximum code coverage for testing purposes.

Dorcas answered 8/10, 2008 at 18:13

It can be a very good measure of complexity for the purposes of risk assessment - the more lines changed the greater the chance of a bug being introduced.

Nemesis answered 8/10, 2008 at 18:13

I heard that Microsoft used to fire 5% of people every 6 months; I always imagined it would be based on lines of code written, which is why Windows is so bulky, slow and inefficient ;). Lines of code is a useful metric for measuring the complexity of an application in terms of rough order of magnitude, i.e. a beginner's program in BASIC might be 10 lines of code, 100 lines of code is a toy application, 50,000 lines is a reasonably sized application, and 10 million lines of code is a monstrosity called Windows.

Lines of code is not a very useful metric, though. I used to write games in assembly language (mainly 68000); they would measure in at around 50k lines of code, but I kept the number of lines of code down by not pushing registers to the stack and instead keeping track of what was contained in the registers (other programmers I knew did a push multiple of d0-d7,a0-a6 to the stack, which obviously slows down the code, but simplifies keeping track of what is affected).

Bustamante answered 8/10, 2008 at 18:13

This is mostly an addition to the already voluminous commentary... But basically, lines of code (or perhaps totalCharacterCount/60) indicates the size of the monster. As a few people have said, that gives a clue to a codebase's complexity, and its level of complexity has a lot of impact, partly on how difficult it is to comprehend the system and make a change.

That's why people want fewer lines of code. In theory, fewer lines of code means less complexity and less room for error. I'm not sure that knowing that up front is terribly useful for anything other than estimation and planning.

For example: suppose I have a project, and on cursory examination I realize that it will involve modifying as many as 1000 lines of code within an application that has 10,000 lines. I know that this project is likely to take longer to implement, be less stable, and take longer to debug and test.

It's also extremely useful for understanding the scope of change between two builds. I wrote a little program that will analyze the scope of change between any two SVN revisions. It will look at a unified diff, and from it, figure out how many lines were added, removed, or changed. This helps me know what to expect in the testing and QA that follows a new build. Basically, bigger numbers of change mean that we need to watch that build more closely, put it through full regression testing, etc.
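The answer's own program is not shown; the following is an independent Python sketch of the same idea, assuming the input is ordinary unified-diff text such as svn diff or git diff produce. Unified diffs have no explicit "changed" marker, so this sketch uses a common heuristic: within each run of consecutive removals followed by additions, paired lines count as changed and the surplus counts as pure additions or removals. The function name is hypothetical.

```python
def diff_scope(diff_text: str) -> dict:
    """Count added, removed and changed lines in unified-diff text.

    Heuristic (an assumption, not part of the unified-diff format): within a
    run of '-' lines followed by '+' lines, each removed/added pair counts as
    one changed line; the leftovers count as pure removals or additions.
    """
    totals = {"added": 0, "removed": 0, "changed": 0}
    run_removed = run_added = 0

    def flush():
        nonlocal run_removed, run_added
        paired = min(run_removed, run_added)
        totals["changed"] += paired
        totals["removed"] += run_removed - paired
        totals["added"] += run_added - paired
        run_removed = run_added = 0

    for line in diff_text.splitlines():
        if line.startswith(("+++ ", "--- ")):
            # File headers. Simplification: a removed line whose content starts
            # with "-- " (or an added one starting "++ ") is misread here.
            flush()
        elif line.startswith("+"):
            run_added += 1
        elif line.startswith("-"):
            run_removed += 1
        else:
            # Context lines, "@@" hunk headers and other metadata end a run.
            flush()
    flush()
    return totals


SAMPLE_DIFF = """--- a/app.py
+++ b/app.py
@@ -1,4 +1,4 @@
 import os
-x = 1
-y = 2
+x = 10
 print(x)
+print(os.name)
"""

if __name__ == "__main__":
    print(diff_scope(SAMPLE_DIFF))  # → {'added': 1, 'removed': 1, 'changed': 1}
```

Feeding it the diff between two revisions gives a quick size-of-change figure to decide how much regression testing a build deserves.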

Ellinger answered 8/10, 2008 at 18:13

The number of lines of code added for a given task largely depends on who is writing the code. It shouldn't be used as a measure of productivity. A given individual can produce 1000 lines of redundant and convoluted crap, while the same problem could be solved by another individual in 10 concise lines of code. When trying to use LOC added as a metric, the "who" factor should also be taken into account.

An actually useful metric would be "the number of defects found against number of lines added". That would give you an indication of the coding and test coverage capabilities of a given team or individual.

As others have also pointed out, LOC removed has better bragging rights than LOC added :)

Nerve answered 8/10, 2008 at 18:13

First of all, I would exclude generated code and add the code of the generator input and the generator itself.

I would then say (with some irony), that every line of code may contain a bug and needs to be maintained. To maintain more code you need more developers. In that sense more code generates more employment.

I would like to exclude unit tests from the statement above, as fewer unit tests do not generally improve maintainability :)

Hardtop answered 8/10, 2008 at 18:13

I have found it useful under two conditions:

Gauging my own productivity on my own new project when it's heads down coding time.
When working with a large company and speaking with a manager that really only understands widgets per day.

Blooded answered 8/10, 2008 at 18:13