Embedded Software Defect Rate

7

What defect rate can I expect in a C++ codebase written for an embedded processor (DSP), given that there have been no unit tests, no code reviews, and no static code analysis, and that compiling the project generates about 1500 warnings? Is 5 defects/100 lines of code a reasonable estimate?

Simoniac answered 29/10, 2010 at 7:43 Comment(5)

Why do you want to know? – Correggio 29/10, 2010 at 8:12

It's really hard to say how relevant the number of warnings is. Most warnings originating from a header will be reported in all translation units that include that header. – Statolith 29/10, 2010 at 8:19

@Jan: to show the management that there are probably lots of bugs in the codebase, and that we should start doing something about it right now. – Simoniac 29/10, 2010 at 8:25

I've been down that road, and it didn't work. I would recommend looking for a bug that blows up in the customer's face. If you start 'fixing' things in such a codebase, and you're not familiar with the code, expect new bugs and old bugs waking up from hibernation. – Correggio 29/10, 2010 at 8:40

Why is defect rate (i.e. a number) even important? One could be benign, while another could kill someone, or kill your business. You might tolerate many of the first, but just one of the second would be critical. – Gamaliel 30/10, 2010 at 7:19

4

Despite my scepticism of the validity of any estimate in this case, I have found some statistics that may be relevant.

In this article, the author cites figures from "a large body of empirical studies", published in Software Assessments, Benchmarks, and Best Practices (Jones, 2000). At SEI CMM Level 1, which sounds like the level of this code, one can expect a defect rate of 0.75 per function point. I'll leave it to you to determine how function points and LOC relate in your code; you'll probably need a metrics tool to perform that analysis.

Steve McConnell in Code Complete cites a study of 11 projects developed by the same team, 5 without code reviews and 6 with code reviews. The defect rate for the non-reviewed code was 4.5 per 100 LOC, and for the reviewed code it was 0.82. So on that basis, your estimate seems fair in the absence of any other information. However, I have to assume a level of professionalism amongst this team (if only because they felt the need to perform the study), and that they would at least have attended to the warnings; your defect rate could be much higher.
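To see how the two published rates translate into absolute numbers, here is a minimal C++ sketch. The codebase size and the LOC-per-function-point ratio used below are hypothetical inputs, not figures from either study; the ratio in particular varies by language and coding style and should be measured with a metrics tool.

```cpp
// Rough defect-count estimates from the published rates quoted above.
// ASSUMPTION: loc_per_fp is a hypothetical input you must measure yourself.

// Jones (2000), as cited: about 0.75 defects per function point at CMM Level 1.
constexpr double kDefectsPerFunctionPoint = 0.75;
// McConnell, Code Complete: about 4.5 defects per 100 LOC without reviews.
constexpr double kDefectsPer100LocUnreviewed = 4.5;

// Convert LOC to function points, then apply the per-FP defect rate.
constexpr double estimate_from_function_points(double loc, double loc_per_fp) {
    return (loc / loc_per_fp) * kDefectsPerFunctionPoint;
}

// Apply a per-100-LOC defect density directly.
constexpr double estimate_from_density(double loc, double defects_per_100_loc) {
    return loc * defects_per_100_loc / 100.0;
}
```

For a hypothetical 20,000-line codebase at an assumed 50 LOC per function point, `estimate_from_function_points(20000.0, 50.0)` gives 300 defects, while `estimate_from_density(20000.0, kDefectsPer100LocUnreviewed)` gives 900. The gap between the two is a reminder of how sensitive such estimates are to their inputs.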

The point about warnings is that some are benign and some are errors (i.e. they will result in undesired behaviour of the software). If you ignore them on the assumption that they are all benign, you will introduce errors. Moreover, some will become errors under maintenance when other conditions change, and if you have already chosen to accept a warning, you have no defence against the introduction of such errors.
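As an illustration of a warning that is really an error, consider a signed/unsigned comparison, which GCC and Clang report under `-Wsign-compare`. The function names and the threshold scenario below are hypothetical; the point is that the implicit conversion silently changes the result for negative inputs.

```cpp
// Hypothetical DSP check: is a signed sample below an unsigned threshold?
// The compiler warns here because `sample` is converted to unsigned before
// the comparison, so -5 becomes a huge positive value and the test fails.
constexpr bool below_threshold_buggy(int sample, unsigned threshold) {
    return sample < threshold;  // warning: comparison of different signedness
}

// One possible fix: handle negative samples explicitly, then compare as unsigned.
constexpr bool below_threshold_fixed(int sample, unsigned threshold) {
    return sample < 0 || static_cast<unsigned>(sample) < threshold;
}
```

Here `below_threshold_buggy(-5, 10u)` returns `false` even though -5 is mathematically below 10, while `below_threshold_fixed(-5, 10u)` returns `true`.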

Gamaliel answered 30/10, 2010 at 7:50 Comment(1)

Thanks a lot for the well-researched answer. I must admit that I had never heard of function points before. – Simoniac 30/10, 2010 at 9:14

10

Your question is "Is 5 defects/100 lines of code a reasonable estimate?" That question is extremely difficult to answer, and it's highly dependent on the codebase & code complexity.

You also mentioned in a comment "to show the management that there are probably lots of bugs in the codebase" -- that's great, kudos, right on.

In order to open management's figurative eyes, I'd suggest at least a 3-pronged approach:

1. Take specific compiler warnings and show how some of them can cause undefined or disastrous behavior. Not all warnings will be as weighty. For example, if you have someone using an uninitialized pointer, that's pure gold. If you have someone stuffing an unsigned 16-bit value into an unsigned 8-bit value, and it can be shown that the 16-bit value will always be <= 255, that one isn't going to help make your case as strongly.
2. Run a static analysis tool. PC-Lint (or FlexeLint) is cheap and provides good "bang for the buck". It will almost certainly catch things the compiler won't, and it can also run across translation units (lint everything together, even with two or more passes) and find more subtle bugs. Again, use some of these findings as evidence.
3. Run a tool that gives other metrics on code complexity, another source of bugs. I'd recommend M Squared's Resource Standard Metrics (RSM), which will give you more information and metrics (including code complexity) than you could hope for. When you tell management that a complexity score over 50 is "basically untestable" and you have a score of 200 in one routine, that should open some eyes.
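The 16-bit-into-8-bit case from the first point can be made concrete with a small sketch. The helper names are hypothetical; the behavior shown (keeping only the low 8 bits, i.e. reducing modulo 256) is what C++ guarantees for conversion to an unsigned type.

```cpp
#include <cstdint>

// Hypothetical helper: what an implicit uint16_t -> uint8_t conversion does.
// Only the low 8 bits survive, so values above 255 are silently corrupted.
constexpr std::uint8_t store_as_u8(std::uint16_t value) {
    return static_cast<std::uint8_t>(value);  // same result as the implicit conversion
}

// The warning is only harmless if the value provably fits; a check like this
// documents (and can be used to enforce) that assumption.
constexpr bool fits_in_u8(std::uint16_t value) {
    return value <= 0xFFu;
}
```

For example, `store_as_u8(200)` yields 200, but `store_as_u8(300)` yields 44 (300 - 256) with no run-time error. Only when every caller satisfies `fits_in_u8` is the warning as harmless as the answer describes.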

One other point: I require clean compiles in my groups, and clean Lint output too. Usually this can be accomplished solely by writing good code, but occasionally the compiler/lint warnings need to be tweaked to quiet the tool for things that aren't problems (use this judiciously).
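One way to "tweak" a warning judiciously is to suppress it only around the code that has been reviewed, rather than disabling it project-wide. The sketch below uses GCC/Clang diagnostic pragmas (other compilers, including many DSP toolchains, have their own mechanisms); the function and the sentinel convention are hypothetical.

```cpp
// Narrowly scoped suppression: the warning stays active everywhere else.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wfloat-equal"

// Exact floating-point comparison is intended here (hypothetical convention):
// 0.0 is only ever assigned as a literal sentinel meaning "gain not yet
// calibrated", and is never the result of arithmetic.
constexpr bool gain_uncalibrated(double gain) {
    return gain == 0.0;
}

#pragma GCC diagnostic pop
```

The comment next to the suppression matters as much as the pragma: it records why the warning was judged harmless, so a later maintainer can re-check that reasoning if the conditions change.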

But the important point I want to make is this: be very careful when going in & fixing compiler & lint warnings. It's an admirable goal, but you can also inadvertently break working code, and/or uncover undefined behavior that accidentally worked in the "broken" code. Yes, this really does happen. So tread carefully.
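Here is a hypothetical example of how "fixing" a warning can break working code. Legacy code indexes a 256-entry ring buffer with a `uint8_t`; a tool such as GCC's `-Wconversion` may flag the narrowing increment, yet that narrowing is exactly what wraps the index from 255 back to 0.

```cpp
#include <cstdint>

// Hypothetical legacy code: index into a 256-entry ring buffer.
// `index + 1` is computed as int and narrowed back to uint8_t; the wrap from
// 255 to 0 is intended (but undocumented) and keeps the index in bounds.
constexpr std::uint8_t next_index_original(std::uint8_t index) {
    return index + 1;
}

// A well-meant "fix" that widens the type makes the warning go away but
// removes the wrap: the index can now reach 256, past the end of the buffer.
constexpr unsigned next_index_widened(unsigned index) {
    return index + 1;
}
```

`next_index_original(255)` returns 0, while `next_index_widened(255u)` returns 256. Without tests, nothing would catch the change until the buffer overrun shows up in the field.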

Lastly, if you have a solid set of tests already in place, that will help you determine if you accidentally break something while refactoring.
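When no tests exist yet, one option is to write characterization tests (the term comes from Michael Feathers' Working Effectively with Legacy Code): tests that record what the code currently does, surprises included, before you touch it. The routine below is a hypothetical stand-in for your own legacy code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical legacy routine whose exact behavior nobody has documented.
inline std::int16_t legacy_scale(std::int16_t sample) {
    return static_cast<std::int16_t>((sample * 3) / 4);
}

// Characterization test: pins down today's behavior, including edge cases,
// so that a refactoring which changes any of it is caught immediately.
inline void characterize_legacy_scale() {
    assert(legacy_scale(0) == 0);
    assert(legacy_scale(100) == 75);
    assert(legacy_scale(-1) == 0);   // -3 / 4 truncates toward zero
    assert(legacy_scale(-5) == -3);  // -15 / 4 == -3, not -4
}
```

The expected values are whatever the current code produces, not what it "should" produce; deciding which of those behaviors are actually bugs is a separate step.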

Good luck!

Shipment answered 29/10, 2010 at 14:4 Comment(3)

+1 for careful advice. See also this book for help making the transition from non-tested code to refactor-friendly code: amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/… – Bryson 29/10, 2010 at 20:2

@matthieu: that's a great book. I also highly recommend it. – Simoniac 29/10, 2010 at 20:43

Thanks for the great advice, and for pointing out the risk of introducing new bugs when fixing seemingly harmless warnings, especially when there are no tests. – Simoniac 29/10, 2010 at 21:9

4

That also depends on who wrote the code (level of experience), and how big the code base is.

I would treat all warnings as errors.

How many errors do you get when you run a static analysis tool on the code?

EDIT

Run cccc and check McCabe's cyclomatic complexity. It should tell you how complex the code is.

http://sourceforge.net/projects/cccc/
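To show what the McCabe number measures, here is a hand-annotated sketch. The routine `clamp_gain` is hypothetical; tools such as cccc compute the metric automatically, and their exact counting rules (for example for `&&` and `||`) may differ from this manual count.

```cpp
// Cyclomatic complexity from the control-flow graph: M = E - N + 2P,
// where E = edges, N = nodes, P = connected components (1 per routine).
constexpr int cyclomatic_complexity(int edges, int nodes, int components) {
    return edges - nodes + 2 * components;
}

// Hypothetical routine with three binary decisions.
// CFG: 3 decision nodes, 4 return nodes, 1 exit node => N = 8;
// 2 edges out of each decision + 1 edge from each return to exit => E = 10.
// M = 10 - 8 + 2 = 4, which also equals (decisions + 1).
constexpr int clamp_gain(int gain, bool muted) {
    if (muted) {       // decision 1
        return 0;
    }
    if (gain < 0) {    // decision 2
        return 0;
    }
    if (gain > 255) {  // decision 3
        return 255;
    }
    return gain;
}
```

A complexity of 4 means four linearly independent paths, and the four calls `clamp_gain(100, true)`, `clamp_gain(-3, false)`, `clamp_gain(300, false)` and `clamp_gain(100, false)` exercise one each. A routine scoring 200 would need on the order of 200 such cases.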

Run other static analysis tools.

Talavera answered 29/10, 2010 at 7:53 Comment(6)

I haven't even tried to run a static analysis tool on the codebase. I think we should first try to get the warning count to zero. Does this seem reasonable? – Simoniac 29/10, 2010 at 8:29

@Simoniac Yes, of course. In my opinion, the code should compile without warnings, as most warnings are valid. – Roentgen 29/10, 2010 at 8:47

I think that you have to assume that anyone who wrote that much code and never once thought to fix the warnings is very inexperienced! How did it get that bad, I wonder? – Gamaliel 29/10, 2010 at 20:30

@Gamaliel Doesn't have to be. It might be the attitude. Some people do not care to write unit tests, others ignore warnings, etc. – Roentgen 29/10, 2010 at 21:1

@VJo: You don't become experienced by writing flaky code, you become unemployed. My point was that experienced professional developers understand the importance of such things. – Gamaliel 30/10, 2010 at 7:15

@Clifford: there are different reasons why the code base evolved like this, but I think it's mostly a question of attitude towards code. Quick hacks that help meeting deadlines are considered heroic. People don't realize that working this way without a safety net is just accumulating a technical debt that will have to be paid down someday. – Simoniac 30/10, 2010 at 9:38

4

Take a look at the code quality. It would quickly give you an indication of the number of problems hiding in the source. If the source is ugly and takes a long time to understand, there will be a lot of bugs in the code.

Well-structured code with a consistent style that is easy to understand is going to contain fewer problems. Code shows how much effort and thought went into it.

My guess is that if the source produces that many warnings, there are going to be a lot of bugs hiding in the code.

Twobit answered 29/10, 2010 at 9:11 Comment(2)

Code quality is a relative thing, and very hard to measure. Just looking at the code will not give you much information. – Roentgen 29/10, 2010 at 19:54

True, but it gives you an idea of what is going on in the software. – Twobit 1/11, 2010 at 7:19

3

If you want to get an estimate of the number of defects, the usual approach in statistical estimation is to subsample the data. I would pick three medium-sized subroutines at random and check them carefully for bugs (eliminate compiler warnings, run a static analysis tool, etc.). If you find three bugs in 100 total lines of code selected at random, it seems reasonable to assume a similar density of bugs in the rest of the code.
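The subsampling procedure can be sketched in C++ as follows. The `Routine` type and function names are hypothetical; `std::sample` (C++17) draws routines uniformly without replacement, and the estimate pools all defects found over all lines reviewed, in line with the comment below that a subsample should consist of more than one sample.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <random>
#include <string>
#include <vector>

// Hypothetical description of one routine in the codebase.
struct Routine {
    std::string name;
    std::size_t loc;
};

// Draw `count` routines uniformly at random, without replacement.
inline std::vector<Routine> pick_sample(const std::vector<Routine>& all,
                                        std::size_t count, std::uint32_t seed) {
    std::vector<Routine> sample;
    std::mt19937 rng(seed);
    std::sample(all.begin(), all.end(), std::back_inserter(sample), count, rng);
    return sample;
}

// Pooled estimate: (defects found / LOC reviewed) scaled to the whole codebase.
inline double estimate_total_defects(std::size_t defects_found,
                                     std::size_t sampled_loc,
                                     std::size_t total_loc) {
    assert(sampled_loc > 0 && "review at least one line before extrapolating");
    return static_cast<double>(defects_found) * static_cast<double>(total_loc) /
           static_cast<double>(sampled_loc);
}
```

With the numbers from the answer, 3 bugs in 100 sampled lines of a hypothetical 50,000-line codebase gives `estimate_total_defects(3, 100, 50000)` = 1500. With so few bugs found, the uncertainty is large, so reviewing more randomly chosen routines tightens the estimate.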

The problem mentioned here of introducing new bugs is an important issue, but you don't need to check the modified code back into the production branch to run this test. Before modifying any subroutines, I would suggest writing a thorough set of unit tests, and after cleaning up the code, doing very thorough system testing before releasing it to production.

Hodge answered 29/10, 2010 at 16:42 Comment(2)

@VJo: Subsample does not mean one sample! That's statistics for you. – Gamaliel 30/10, 2010 at 8:34

@Gamaliel Statistics is fine, but Murphy's law is above statistics ;) – Roentgen 30/10, 2010 at 10:44

2

If you want to demonstrate the benefits of unit tests, code reviews, static analysis tools, I suggest doing a pilot study.

Do some unit tests, code reviews, and run static analysis tools on a portion of the code. Show management how many bugs you find using those methods. Hopefully, the results speak for themselves.

Teniers answered 1/11, 2010 at 11:18 Comment(0)

1

The following article has some numbers based on real-life projects to which static analysis has been applied: http://www.stsc.hill.af.mil/crosstalk/2003/11/0311German.html

Of course the criteria by which an anomaly is counted can affect the results dramatically, leading to the large variation in the figures shown in Table 1. In this table, the number of anomalies per thousand lines of code for C ranges from 500 (!) to about 10 (auto generated).

Lusatia answered 1/11, 2010 at 12:17 Comment(0)
