What's the best way to become familiar with a large codebase? [closed]

19

81

Joining an existing team with a large codebase already in place can be daunting. What's the best approach?

  • Broad; try to get a general overview of how everything links together, from the code
  • Narrow; focus on small sections of code at a time, understanding how they work fully
  • Pick a feature to develop and learn as you go along
  • Try to gain insight from class diagrams and UML, if available (and up to date)
  • Something else entirely?

I'm working on what is currently an approx 20k line C++ app & library (Edit: small in the grand scheme of things!). In industry I imagine you'd get an introduction by an experienced programmer. However if this is not the case, what can you do to start adding value as quickly as possible?

--
Summary of answers:

  • Step through code in debug mode to see how it works
  • Pair up with someone more familiar with the code base than you, taking turns to be the person coding and the person watching/discussing. Rotate partners amongst team members so knowledge gets spread around.
  • Write unit tests. Start with an assertion of how you think the code will work. If it turns out as you expected, you've probably understood the code. If not, you've got a puzzle to solve and/or an enquiry to make. (Thanks Donal, this is a great answer)
  • Go through existing unit tests for functional code, in a similar fashion to above
  • Read UML, Doxygen generated class diagrams and other documentation to get a broad feel of the code.
  • Make small edits or bug fixes, then gradually build up
  • Keep notes, and don't jump in and start developing; it's more valuable to spend time understanding than to generate messy or inappropriate code.

This post is a partial duplicate of the-best-way-to-familiarize-yourself-with-an-inherited-codebase

Enrage answered 18/10, 2008 at 14:4 Comment(5)
20K lines is not a very large code base. When it's only 20K lines, I would read it. One of the things I did not learn at university is working with large code bases.Stanton
Indeed. 20k does not seem like much. We have C++ files with more than 10k lines in each. I know, it is bad, but we don't have the time for cleanup right now. (just imagine me rolling my eyes just thinking about it) Much of the bloat is from comments though.Pedicure
Heh, indeed! I did not mean to imply 20k was a huge code base (I never said it was), was just looking for general, scalable, advice. Great answers so far; lots to think about.Enrage
20k is .. what, one file? ;-)Contraindicate
One place I consulted at had a 40k line file of deeply-nested if/then statements that implemented some sort of business rule thing. It was awful.Rotator
29

Start with some small task if possible, debug the code around your problem. Stepping through code in debug mode is the easiest way to learn how something works.

Morula answered 18/10, 2008 at 14:7 Comment(3)
It pays to think of what you expect a variable to be when debugging, and if the debugger shows something different, find out why. I personally don't like debuggers but prefer print statements, which force you to think in advance.Eburnation
@Eburnation in addition to forcing you to think in advance, print statements make it easier to see a large amount of data. So for instance if a variable is usually 2, and becomes 10 once in 100 loops, you'll be able to spot it with a printf statement, but it is harder to trace that with a debugger.Gloam
Starting with small tasks is not always good, because it may not give an understanding of the system as a whole. IMHO a much better way is to ignore the existing tests and try to write your own, and to move on to small tasks after at least a few days.Amabil
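
The print-statement idea from the comments above can be sketched roughly as follows. This is only an illustration: `ValueTally`, `compute_step` and `trace_loop` are made-up names, and `compute_step` stands in for whatever code you are actually studying. Instead of printing every value, the sketch tallies them, so a value that shows up once in 100 iterations stands out in the summary.

```cpp
#include <cassert>
#include <cstdio>
#include <map>

// Hypothetical helper: tallies every value a variable takes so that rare
// outliers stand out in the printed summary.
class ValueTally {
public:
    void record(int value) { ++counts_[value]; }

    // Number of times `value` was recorded (0 if it was never seen).
    int count(int value) const {
        auto it = counts_.find(value);
        return it == counts_.end() ? 0 : it->second;
    }

    // Prints one line per distinct value, in ascending order of value.
    void print(const char* name) const {
        for (const auto& entry : counts_) {
            std::printf("%s=%d seen %d time(s)\n", name, entry.first, entry.second);
        }
    }

private:
    std::map<int, int> counts_;
};

// Stand-in for the code under study: returns 2, except on iteration 57.
int compute_step(int iteration) { return iteration == 57 ? 10 : 2; }

// Runs the loop under study and tallies what it produces.
ValueTally trace_loop(int iterations) {
    assert(iterations >= 0);
    ValueTally tally;
    for (int i = 0; i < iterations; ++i) {
        tally.record(compute_step(i));
    }
    return tally;
}
```

Calling `trace_loop(100).print("value")` prints two lines, one for the value 2 (seen 99 times) and one for the value 10 (seen once), which is the kind of thing that is easy to miss when single-stepping.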
18

Another option is to write tests for the features you're interested in. Setting up the test harness is a good way of establishing what dependencies the system has and where its state resides. Each test starts with an assertion about the way you think the system should work. If it turns out to work that way, you've achieved something and you've got some working sample code to reproduce it. If it doesn't work that way, you've got a puzzle to solve and a line of enquiry to follow.
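
As a rough illustration of what such a test can look like in C++, here is a minimal sketch using plain `assert`. The function `trim_trailing_slashes` is a made-up stand-in for something found in the inherited code; in a real project you would include the existing header instead of defining it yourself, and you would probably use whatever test framework the team already has.

```cpp
#include <cassert>
#include <string>

// Stand-in for a function found in the inherited code base (hypothetical).
// In practice you would #include the real header instead of defining it here.
std::string trim_trailing_slashes(std::string path) {
    while (path.size() > 1 && path.back() == '/') {
        path.pop_back();
    }
    return path;
}

// Each assertion records one belief about how the existing code behaves.
void test_trim_trailing_slashes() {
    assert(trim_trailing_slashes("/usr/lib/") == "/usr/lib");
    assert(trim_trailing_slashes("/usr/lib///") == "/usr/lib");
    // A first guess might be that "/" becomes ""; the code keeps the root intact.
    assert(trim_trailing_slashes("/") == "/");
    assert(trim_trailing_slashes("") == "");
}
```

If one of the assertions fires, that is the puzzle to solve: either the belief was wrong or the code has a bug, and both are worth knowing.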

Gallinacean answered 18/10, 2008 at 14:15 Comment(1)
I've always thought this was the best way to get familiar with someone else's code.Basir
15

One thing that I usually suggest to people that has not yet been mentioned is that it is important to become a competent user of the existing code base before you can be a developer. When new developers come into our large software project, I suggest that they spend time becoming expert users before diving in trying to work on the code.

Maybe that's obvious, but I have seen a lot of people try to jump into the code too quickly because they are eager to start making progress.

Aberrant answered 18/10, 2008 at 15:12 Comment(1)
It's not very obvious, and what a great suggestion! It is critical to understand how the app works from a user's perspective, or you will be starting from a flawed perspective. If you don't understand the flow of the code from the user's POV, you will make simple logic mistakes that can cost time and money.Geary
9

This is quite dependent on what sort of learner and what sort of programmer you are, but:

  • Broad first - you need an idea of scope and size. This might include skimming docs/UML if they're good. If it's a long term project and you're going to need a full understanding of everything, I might actually read the docs properly. Again, if they're good.
  • Narrow - pick something manageable and try to understand it. Get a "taste" for the code.
  • Pick a feature - possibly a different one to the one you just looked at if you're feeling confident, and start making some small changes.
  • Iterate - assess how well things have gone and see if you could benefit from repeating an early step in more depth.
Microsurgery answered 18/10, 2008 at 14:9 Comment(0)
7

I would suggest running Doxygen on it to get an up-to-date class diagram, then going broad-in for a while. This gives you a quickie big picture that you can use as you get up close and dirty with the code.

Mallarme answered 18/10, 2008 at 14:14 Comment(0)
7

Pairing with strict rotation.

If possible, while going through the documentation/codebase, try to employ pairing with strict rotation. That is, two of you sit together for a fixed period of time (say, a 2 hour session), then you switch pairs: one person continues working on the task while the other moves to another task with another partner.

In pairs you'll both pick up a piece of knowledge, which can then be fed to other members of the team when the rotation occurs. Another benefit is that when a new pair is brought together, the one who worked on the task (in this case, investigating the code) can summarise and explain the concepts in a more easily understood way. As time progresses everyone should reach a similar level of understanding, and hopefully you avoid the "Oh, only John knows that bit of the code" syndrome.

From what I can tell about your scenario, you have a good number for this (3 pairs), however, if you're distributed, or not working to the same timescale, it's unlikely to be possible.

Ovolo answered 18/10, 2008 at 15:5 Comment(0)
5

I agree that it depends entirely on what type of learner you are. Having said that, I've been at two companies which had very large code-bases to begin with. Typically, I work like this:

If possible, before looking at any of the functional code, I go through unit tests that are already written. These can generally help out quite a lot. If they aren't available, then I do the following.

First, I largely ignore implementation and look only at header files, or just the class interfaces. I try to get an idea of what the purpose of each class is. Second, I go one level deep into the implementation starting with what seems to be the area of most importance. This is hard to gauge, so occasionally I just start at the top and work my way down in the file list. I call this breadth-first learning. After this initial step, I generally go depth-wise through the rest of the code. The initial breadth-first look helps to solidify/fix any ideas I got from the interface level, and then the depth-wise look shows me the patterns that have been used to implement the system, as well as the different design ideas. By depth-first, I mean you basically step through the program using the debugger, stepping into each function to see how it works, and so on. This obviously isn't possible with really large systems, but 20k LOC is not that many. :)

Insolvency answered 18/10, 2008 at 14:20 Comment(0)
3

Work with another programmer who is more familiar with the system to develop a new feature or to fix a bug. This is the method that I've seen work out the best.

Acute answered 18/10, 2008 at 14:18 Comment(0)
3

I had a similar situation. I'd say you go like this:

  • If it's a database-driven application, start from the database and try to make sense of each table, its fields, and its relations to the other tables.
  • Once you're fine with the underlying store, move up to the ORM layer. Those tables must have some kind of representation in code.
  • Once done with that, move on to how and where these objects come from. An interface? Which interface? Any validations? What preprocessing takes place on them before they go to the datastore?

This would familiarize you better with the system. Remember that writing or understanding unit tests is only possible when you know very well what is being tested and why it needs to be tested in exactly that way.

And in the case of a large application that is not database-driven, I'd recommend another approach:

  • What is the main goal of the system?
  • What, then, are the major components the system uses to solve this problem?
  • What interactions does each component have with the others? Make a graph that depicts component dependencies. Ask someone already working on it. These components must be exchanging something with each other, so try to figure that out as well (for example, IO might be returning a File object to the GUI, and the like).
  • Once comfortable with this, dive into the component that is least dependent on the others. Now study how that component is further divided into classes and how they interact with each other. This way you get the hang of a single component in total.
  • Move to the next least dependent component.
  • At the very end, move to the core component, which typically has dependencies on many of the other components you've already tackled.
  • While looking at the core component, you might be referring back to the components you examined earlier, so don't worry, keep working hard!
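
The "least dependent component first, core last" order in the list above is essentially a topological sort of the dependency graph. Here is a minimal sketch of that idea using Kahn's algorithm; it is one possible way to do it, not something taken from a real project. The component names (IO, Page, UI, Core) and their dependencies are invented for the example, and `study_order` is a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each component to the components it depends on.
using DependencyMap = std::map<std::string, std::set<std::string>>;

// Returns the components ordered so that each one appears after everything
// it depends on (Kahn's algorithm). Ties are broken alphabetically.
// Components caught in a dependency cycle are left out of the result.
std::vector<std::string> study_order(const DependencyMap& deps) {
    std::map<std::string, int> unmet;                       // dependencies not yet studied
    std::map<std::string, std::vector<std::string>> users;  // reverse edges
    for (const auto& entry : deps) {
        unmet[entry.first] += static_cast<int>(entry.second.size());
        for (const auto& dep : entry.second) {
            unmet.insert({dep, 0});  // register components that only appear as dependencies
            users[dep].push_back(entry.first);
        }
    }

    std::set<std::string> ready;  // components whose dependencies are all covered
    for (const auto& entry : unmet) {
        if (entry.second == 0) ready.insert(entry.first);
    }

    std::vector<std::string> order;
    while (!ready.empty()) {
        const std::string next = *ready.begin();
        ready.erase(ready.begin());
        order.push_back(next);
        for (const auto& user : users[next]) {
            if (--unmet[user] == 0) ready.insert(user);
        }
    }
    return order;
}

// Example with invented components: Core depends on everything else.
void example_study_order() {
    DependencyMap deps;
    deps["Core"] = {"IO", "Page", "UI"};
    deps["UI"] = {"Page"};
    deps["Page"] = {"IO"};
    deps["IO"] = std::set<std::string>();
    const std::vector<std::string> expected = {"IO", "Page", "UI", "Core"};
    assert(study_order(deps) == expected);
}
```

For a handful of components you can do this by eye on a whiteboard; the sketch mainly shows that the reading order falls out of the dependency graph once you have drawn it.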

For the first strategy, take this Stack Overflow site as an example. Examine the datastore: what is being stored, how it is stored, what representations those items have in the code, and how and where they are presented in the UI. Where do they come from, and what processing takes place on them before they go back to the datastore?

For the second one, take a word processor as an example. What components are there? IO, UI, Page and the like. How do they interact with each other? Move along as you learn further.

Be relaxed. Written code is someone's mindset, frozen logic and thinking style, and it takes time to read that mind.

Quiteris answered 25/4, 2009 at 5:25 Comment(0)
2

I think you need to tie this to a particular task. When you have time on your hands, go for whichever approach you are in the mood for.

When you have something that needs to get done, give yourself a narrow focus and get it done.

Sheepfold answered 18/10, 2008 at 14:9 Comment(0)
2

First, if you have team members available who have experience with the code you should arrange for them to do an overview of the code with you. Each team member should provide you with information on their area of expertise. It is usually valuable to get multiple people explaining things, because some will be better at explaining than others and some will have a better understanding than others.

Then, you need to start reading the code for a while without any pressure (a couple of days or a week if your boss will provide that). It often helps to compile/build the project yourself and be able to run the project in debug mode so you can step through the code. Then, start getting your feet wet, fixing small bugs and making small enhancements. You will hopefully soon be ready for a medium-sized project, and later, a big project. Continue to lean on your team-mates as you go - often you can find one in particular who is willing to mentor you.

Don't be too hard on yourself if you struggle - that's normal. It can take a long time, maybe years, to understand a large code base. Actually, it's often the case that even after years there are still some parts of the code that are still a bit scary and opaque. When you get downtime between projects you can dig in to those areas and you'll often find that after a few tries you can figure even those parts out.

Good luck!

Succoth answered 18/10, 2008 at 14:24 Comment(0)
2

Get the team to put you on bug fixing for two weeks (if you have two weeks). They'll be happy to get someone to take responsibility for that, and by the end of the period you will have spent so much time problem-solving with the library that you'll probably know it pretty well.

Rese answered 18/10, 2008 at 14:54 Comment(1)
This is how I tend to do things. There is no substitute for doing things. Simply reading code/documentation/tests never really cuts it.Bradfordbradlee
2

If it has unit tests (I'm betting it doesn't), start small and make sure the unit tests don't fail. If you stare at the entire codebase at once your eyes will glaze over and you will feel overwhelmed.

If there are no unit tests, you need to focus on the feature that you want. Run the app and look at the results of things that your feature should affect. Then start looking through the code trying to figure out how the app creates the things you want to change. Finally change it and check that the results come out the way you want.

You mentioned it is an app and a library. First change the app and stick to using the library as a user. Then after you learn the library it will be easier to change.

From a top-down perspective, the app probably has a main loop or a main GUI that controls all the action. It is worth reading the code to get a broad overview of this main control flow. If it is a GUI app, sketch on paper which screens there are and how to get from one screen to another. If it is a command line app, work out how the processing is done.

Even in companies it is not unusual to take this approach. Often no one fully understands how an application works, and people don't have time to show you around. They prefer specific questions about specific things, so you have to dig in and experiment on your own. Then, once you have a specific question, you can try to find the person who knows that piece of the application and ask them.

Orleans answered 18/10, 2008 at 15:8 Comment(0)
2

You may want to consider looking at source code reverse engineering tools. There are two tools that I know of:

Both tools offer similar feature sets that include static analysis that produces graphs of the relations between modules in the software.

This mostly consists of call graphs and type/class dependencies. Viewing this information should give you a good picture of how the parts of the code relate to one another. Using this information, you can dig into the actual source for the parts that you are most interested in and that you need to understand/modify first.
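
If no such tool is at hand, a much cruder stand-in for a dependency graph in a C++ code base is to look at which files include which headers. The sketch below is my own illustration, not what the tools above do: `local_includes` is a hypothetical helper that pulls the targets of `#include "..."` lines out of a piece of source text, and it knowingly ignores comments, `<...>` includes and conditional compilation.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Collects the targets of #include "..." directives in a piece of source text.
// Deliberately naive: it ignores <...> includes, comments and #if blocks.
std::vector<std::string> local_includes(const std::string& source) {
    std::vector<std::string> result;
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line)) {
        // A directive starts with '#', optionally preceded by whitespace.
        std::size_t pos = line.find_first_not_of(" \t");
        if (pos == std::string::npos || line[pos] != '#') continue;
        // Whitespace is also allowed between '#' and "include".
        pos = line.find_first_not_of(" \t", pos + 1);
        if (pos == std::string::npos || line.compare(pos, 7, "include") != 0) continue;
        // Only quoted includes are treated as project-local headers.
        pos = line.find_first_not_of(" \t", pos + 7);
        if (pos == std::string::npos || line[pos] != '"') continue;
        const std::size_t close = line.find('"', pos + 1);
        if (close == std::string::npos) continue;
        result.push_back(line.substr(pos + 1, close - pos - 1));
    }
    return result;
}

// Example on a small, made-up source file.
void example_local_includes() {
    const std::string source =
        "#include <vector>\n"
        "#include \"parser.h\"\n"
        "  #  include \"io/file.h\"\n"
        "int main() {}\n";
    const std::vector<std::string> expected = {"parser.h", "io/file.h"};
    assert(local_includes(source) == expected);
}
```

Running this over every file gives file-level edges (file A includes header B) that you can feed into a graph drawing tool or just read as a list.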

Noni answered 18/10, 2008 at 15:17 Comment(0)
2

Start by understanding the 'problem domain' (is it a payroll system? inventory? real time control or whatever). If you don't understand the jargon the users use, you'll never understand the code.

Then look at the object model; there might already be a diagram or you might have to reverse engineer one (either manually or using a tool as suggested by Doug). At this stage you could also investigate the database (if any); it should follow the object model but it may not, and that's important to know.

Have a look at the change history or bug database; if there's an area that comes up a lot, look into that bit first. This doesn't mean that it's badly written, but that it's the bit everyone uses.
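
One way to find the areas that come up a lot is to count how often each file appears in the change history. The sketch below assumes you can get the history as plain text with one changed file path per line; if the project happens to use git (an assumption, the question does not say), `git log --name-only --pretty=format:` produces that kind of output. `change_hotspots` is a hypothetical helper and the file names are invented.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using FileCount = std::pair<std::string, int>;

// Reads one file path per line (blank lines are skipped) and returns the
// paths sorted by how often they appear, most frequently changed first.
// Files with equal counts stay in alphabetical order.
std::vector<FileCount> change_hotspots(std::istream& log) {
    std::map<std::string, int> counts;
    std::string line;
    while (std::getline(log, line)) {
        if (!line.empty()) ++counts[line];
    }
    std::vector<FileCount> ranked(counts.begin(), counts.end());
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const FileCount& a, const FileCount& b) {
                         return a.second > b.second;
                     });
    return ranked;
}

// Example with an invented history of three commits.
void example_change_hotspots() {
    std::istringstream log(
        "src/parser.cpp\nsrc/io.cpp\n\n"
        "src/parser.cpp\nsrc/ui.cpp\n\n"
        "src/parser.cpp\nsrc/io.cpp\n");
    const std::vector<FileCount> ranked = change_hotspots(log);
    assert(ranked.size() == 3);
    assert(ranked[0].first == "src/parser.cpp" && ranked[0].second == 3);
    assert(ranked[1].first == "src/io.cpp" && ranked[1].second == 2);
    assert(ranked[2].first == "src/ui.cpp" && ranked[2].second == 1);
}
```

The files at the top of the ranking are good candidates for the bit to read first.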

Lastly, keep some notes (I prefer a wiki).

  • The existing guys can use it to sanity check your assumptions and help you out.
  • You will need to refer back to it later.
  • The next new guy on the team will really thank you.
Endblown answered 18/10, 2008 at 20:47 Comment(0)
1

I find that just jumping in to code can be a bit overwhelming. Try to read as much documentation on the design as possible. This will hopefully explain the purpose and structure of each component. It's best if an existing developer can take you through it, but that isn't always possible.

Once you are comfortable with the high level structure of the code, try to fix a bug or two. This will help you get to grips with the actual code.

Monicamonie answered 18/10, 2008 at 14:28 Comment(0)
1

I like all the answers that say you should use a tool like Doxygen to get a class diagram, and first try to understand the big picture. I totally agree with this.

That said, this largely depends on how well factored the code is to begin with. If it's a gigantic mess, it's going to be hard to learn. If it's clean and organized properly, it shouldn't be that bad.

Matherly answered 18/10, 2008 at 21:4 Comment(0)
1

See this answer on how to use test coverage tools to locate the code for a feature of interest, without knowing anything about where that feature is, or how it is spread across many modules.
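
Since the linked answer is not reproduced here, the following is only my guess at the general idea, not a summary of it. One common way to locate a feature with coverage data is to record which functions run when you exercise the feature, record which run when you do everything except the feature, and look at the difference. The sketch assumes you have already extracted the two sets of covered function names from your coverage tool; `feature_candidates` and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Returns the functions that were covered when the feature was exercised
// but not in the baseline run, in alphabetical order.
std::vector<std::string> feature_candidates(const std::set<std::string>& with_feature,
                                            const std::set<std::string>& baseline) {
    std::vector<std::string> only_with_feature;
    std::set_difference(with_feature.begin(), with_feature.end(),
                        baseline.begin(), baseline.end(),
                        std::back_inserter(only_with_feature));
    return only_with_feature;
}

// Example with invented coverage data for a "PDF export" feature.
void example_feature_candidates() {
    const std::set<std::string> baseline = {"load_config", "main", "render"};
    const std::set<std::string> with_feature = {"export_pdf", "load_config", "main",
                                                "render", "write_stream"};
    const std::vector<std::string> expected = {"export_pdf", "write_stream"};
    assert(feature_candidates(with_feature, baseline) == expected);
}
```

The result is a list of places to start reading, not proof that the feature lives only there; shared helpers that both runs touch will not show up in the difference.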

Vanhorn answered 11/7, 2010 at 22:33 Comment(1)
The link is broken.Linger
0

(shameless marketing ahead)

You should check out nWire. It is an Eclipse plugin for navigating and visualizing large codebases. Many of our customers use it to break in new developers by printing out visualizations of the major flows.

Appoggiatura answered 19/4, 2009 at 4:53 Comment(0)
