Basis for claim that the number of bugs per line of code is constant regardless of the language used

Asked 24/5, 2010 at 16:40 Answered 8/5, 2019 at 15:3

Solved language-agnostic code-metrics lines-of-code

I've heard people say (although I can't recall who in particular) that the number of bugs per line of code is roughly constant regardless of what language is used. What is the research that backs this up?

Edited to add: I don't have access to it, but apparently the authors of this paper "asked the question whether the number of bugs per lines of code (LOC) is the same for programs written in different programming languages or not."

Fernery answered 24/5, 2010 at 16:40 Comment(5)

Near duplicate of #862777, closed as not a real question. – Downtown 24/5, 2010 at 16:45

"What is the research that backs this up?" I think this is definitely a real question. There is no opinion involved in listing credible sources. – Entail 24/5, 2010 at 16:54

@Robert -- I hoped it was clear, but I am not asking for "the Industry Standard for bugs per 1000 Lines of Code", or anything like that. I am asking for references to research backing up this specific claim about bug density and language independence. I'm afraid I struggle to see how this would be considered a duplicate. – Fernery 24/5, 2010 at 17:4

I guess my point is that the other question was not taken seriously. The metric is essentially meaningless, in the same way that Lines of Code Per Day is meaningless. Just more fodder for the pointy-haired bosses to talk about. I didn't vote to close, however. – Downtown 24/5, 2010 at 17:17

"The metric is essentially meaningless" -- so it may be, but I'd be interested to see what was originally behind this claim all the same. – Fernery 24/5, 2010 at 18:50

In his book Code Complete (quoting from the 2nd Edition), in the chapter "Developer Testing," Steve McConnell cites a handful of studies across a variety of languages:

Industry average experience is about 1-25 errors per 1000 lines of code for delivered software. The software has usually been developed using a hodgepodge of techniques (Boehm 1981, Gremillion 1984, Yourdon 1989a, Jones 1998, Jones 2000, Weber 2003). Cases that have one-tenth as many errors as this are rare; cases that have 10 times more tend not to be reported. (They probably aren't ever completed!)

The Applications Division at Microsoft experiences about 10–20 defects per 1000 lines of code during in-house testing and 0.5 defects per 1000 lines of code in released product (Moore 1992). The technique used to achieve this level is a combination of the code-reading techniques described in Other Kinds of Collaborative Development Practices, and independent testing.

Harlan Mills pioneered "cleanroom development," a technique that has been able to achieve rates as low as 3 defects per 1000 lines of code during in-house testing and 0.1 defects per 1000 lines of code in released product (Cobb and Mills 1990).

These studies ranged from high-level languages like Java, down to C++ and C, all the way down to assembly. Considering the massive impact of Code Complete on software engineering as a discipline, I suspect it is responsible for popularizing this idea.
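For anyone comparing figures like these, the "defects per 1000 lines of code" metric is just a ratio. Here is a minimal sketch of the arithmetic; the function name and the sample numbers are made up for illustration and do not come from any of the cited studies:

```python
def defects_per_kloc(defects: int, lines_of_code: int) -> float:
    """Return defect density as defects per 1000 lines of code (KLOC).

    Hypothetical helper for illustration only; how "defect" and
    "line of code" are counted varies from study to study.
    """
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


# Made-up example: 50 defects found in a 100,000-line code base.
density = defects_per_kloc(50, 100_000)
print(f"{density} defects per KLOC")
```

Note that the result depends heavily on how lines are counted (blank lines, comments, generated code), which is one reason figures from different studies are hard to compare directly.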

Tantrum answered 8/5, 2019 at 15:3 Comment(2)

That's all not very current data. With millions of open source projects and issue trackers out there, one would assume it should be possible to trace "defects" better. – Ewall 20/7, 2022 at 9:38

@WolfgangFahl My answer is not an endorsement of the claim—just an attempt to trace how it entered the zeitgeist. 😄 – Tantrum 20/7, 2022 at 18:30

One possible source would be Les Hatton's 1995 paper "Computer programming languages and safety-related systems", in which he concludes that language choice is at least close to irrelevant and other factors (chiefly fluency in the chosen language) are the controlling factors.

About all I could add to that would be to summarize various other papers, in which defect rates for individual projects (and such) are given. I've done a bit of looking, and never found a correlation between language and defect rate, but that's not really the same as saying the defect rate is constant across languages (i.e., they may be different, but they vary so widely within each language that I've never been able to prove a difference).

Samira answered 24/5, 2010 at 19:27 Comment(0)
