Learning a new codebase

Hi! My name is Hristo Deshev, and I am the new kid on the block here at Wildbit. Well, “kid” is a gross exaggeration with me being almost 30, but I sometimes like to think of myself as just a kid — the only difference from me being 10 is that I have different toys now. :-)

I am a programming language, development tool, and platform junkie with many interests, but at the moment I am serving as a .NET developer for Newsberry. Newsberry is an old project with a huge codebase. And guess what? I have to learn it. I’ve been doing that for some time now, and I daresay I have been making good progress. Here I’d like to share some tips and techniques, on both the technological and the philosophical level, that have proven themselves useful.

Keep your thoughts organized and don’t forget the big picture

I know many people just get the source code, compile it, and start running it in the debugger, trying to get an idea of what is going on. This may be a good strategy for smaller projects and one-off scripts that you find on the web, but it does not scale well. At best, you will gain good knowledge of one part of the system (the portion of code you have read through) while having no idea about the rest. And there is a good chance that you will not really understand even that part, because the code can only answer the “How is this application built?” question. You will not get the “Why” part, and that is often crucial in order to understand the “How” bit.

My first steps were to get an account over at http://www.newsberry.com/ and start playing with the system. I had to first learn how it works as a user, so that I know what problems it solves. Once I felt confident I knew it at a reasonable level, I got to the source code part. I had to learn a lot about the problem domain — I was an email marketing newbie, and the body of knowledge on the subject is huge.

After you know enough about the business domain, you can start playing with the source code. To me, the most important part of a system is its deployment story. You have to know the various components in the system: what jobs they perform, how they interact, etc. Newsberry is pretty complex: it has several web sites, Windows services, mail servers, and a couple of custom tools. Getting to know those components is essential. Exploring their interactions will quickly expose holes in your understanding of the business domain, and you have the duty and the opportunity to fill them. There are two tools that helped me here:

Mind mapping

I have been using mind maps for years, and I am used to drawing them on paper or with a computer program. They are a great tool to organize your thoughts when learning something. My guess is that they boost your understanding by stimulating the creative parts of your brain. They trigger that by making you produce something instead of passively reading up on a subject. Anyway, my theory really doesn’t matter; all I care about is that mind maps work for me, and I recommend everyone try them too. So, how do I use mind maps when learning Newsberry? Using my favorite mapping program, FreeMind, I started creating maps for the email marketing process, the Newsberry system deployment, and data flows throughout the system such as email generation and transformation. As an example, here is a screenshot of my deployment map:

[Screenshot: Newsberry deployment mind map]

You can use maps to store all kinds of information: from the high-level overviews down to the nitty-gritty. Again, the most important thing is not the map itself, but the process of creating the map. Try it. It works!

Virtualization


Now that you are pretty comfortable with the deployment architecture, you want to get your hands dirty. Of course, you can’t do that on the production server. You can’t do it on the staging server either. I didn’t want to do something stupid and kill the staging environment or destroy the CI process just because I didn’t know the system yet. You can’t really test on your machine either, because... well, because you have to be able to deploy your program everywhere, and it is too easy to hardcode settings that work on your machine only. That is why, apart from running the system on my development machine, I decided to recreate the production environment in a virtual machine.

My virtualization solution of choice is VirtualBox. It is fast, easy to use, and offers tons of advanced features too. I quickly got Windows Server 2003 installed on a VM and started playing. I configured my virtual network cards with the same IP addresses that the server had. I edited my hosts file to redirect domains to the local machine. I installed SQL Server and imported a test DB full of usable data. I got an SMTP server running that was able to deliver mail to several fake domains. I could then fetch that mail using POP3 and inspect it for errors or write automated tests. I even went ahead and duplicated the CruiseControl.NET setup so that I could test CI builds as if they were running on the real build machine.

This looks like a lot of work, and it really is. I am glad I did it because it helped me learn how the system operates. I went through the horrors of administering an SMTP server, configuring IIS, and fighting SQL Server, and in the end I came out stronger. I really recommend using virtualization to safely play with a new, unknown system. One VirtualBox tip: use the snapshots feature as often as possible. If you manage to destroy your system, just revert to the previous snapshot, and you’ll be running again in seconds.
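To give an idea of the “fetch that mail using POP3 and inspect it” step, here is a minimal sketch in Python. It is not the code I used and not the language Newsberry is written in; it only relies on the standard `poplib` and `email` modules. The function names, the list of required headers, and the `%%` marker for an unreplaced merge field are all assumptions made up for illustration.

```python
import poplib
from email import message_from_bytes
from email.message import Message


def fetch_messages(host, user, password, port=110):
    """Download every message from a POP3 mailbox without deleting anything."""
    conn = poplib.POP3(host, port, timeout=10)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()
        messages = []
        for number in range(1, count + 1):
            _resp, lines, _octets = conn.retr(number)
            messages.append(message_from_bytes(b"\r\n".join(lines)))
        return messages
    finally:
        conn.quit()


def text_of(msg: Message) -> str:
    """Concatenate the decoded text parts of a (possibly multipart) message."""
    parts = []
    for part in msg.walk():
        if part.is_multipart() or part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        parts.append(payload.decode(charset, errors="replace"))
    return "\n".join(parts)


def find_problems(msg: Message, required_headers=("From", "To", "Subject")):
    """Return a list of human-readable problems found in one message."""
    problems = []
    for name in required_headers:
        if not msg.get(name):
            problems.append(f"missing header: {name}")
    body = text_of(msg)
    if not body.strip():
        problems.append("empty body")
    # Hypothetical convention: "%%" marks a merge field that was never filled in.
    if "%%" in body:
        problems.append("unreplaced placeholder in body")
    return problems
```

Pointed at the mail server inside the VM, a test could call `fetch_messages` with one of the fake mailboxes and assert that `find_problems` returns an empty list for every message. Keeping the checks in a function that takes an already parsed message means they can also be tested without any server running.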

Use code exploration tools

One word here: ReSharper. Many think it is a refactoring tool, but it is really much more. Load all your code in a Visual Studio solution, and start poking around. I use ReSharper’s “Find Usages” feature to find who calls a given method and what code reads a property or sets a new value. The tool is indispensable.
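If you don’t have such a tool at hand, a plain text search gives a rough first approximation of the same question. The sketch below is a Python script of my own invention, not anything ReSharper does: it only matches whole words in `*.cs` files, so unlike a real “Find Usages” it knows nothing about types, overloads, or namespaces, and it will happily report matches inside comments and strings. The function name and the default file pattern are assumptions.

```python
import re
from pathlib import Path


def find_text_usages(root, identifier, pattern="*.cs"):
    """Return (path, line_number, line) for each line mentioning identifier.

    This is a whole-word text match, not a semantic search.
    """
    word = re.compile(rf"\b{re.escape(identifier)}\b")
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if word.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_text_usages("path/to/source", "SomeMethod")` returns a list you can skim or print. It is good enough to get your bearings, but for anything serious the semantic search in a proper tool is far more reliable.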

Another tool that deserves mention here is Reflector. It does not work with source, but decompiles assemblies instead. It has great code analysis features, and you can quickly understand how a method is used, who instantiates a given class, etc. Not needing the source code makes Reflector the perfect detective tool that allows you to unobtrusively poke even at your production code.

The ultimate method

Well, really, there is no such thing as the one and only codebase learning technique. You can’t use a purely technical approach or a purely theoretical one. Creating software is a complex collaborative process, and you can’t understand how a large system works using only one of the two approaches.

I am sure there are a lot of useful techniques one can use to learn a new codebase quickly that I have not covered here. What are your tools and methods of choice? Share them in the comments!

