DNDJ Feature — Clean and Protect a Large .NET Code Base with Coding Standards and Unit Testing
How to avoid runtime exceptions, instability, crashes, degraded performance and insecure backdoors

Developer testing done early in the software's lifecycle is known to have a high positive impact on application quality, since this is the phase where finding and fixing bugs is cheapest, easiest, and fastest. Ideally, coding standard checking and unit testing would be done on every piece of code before it was added to a team's code base. However, this is not always practical. Many organizations don't give developers the time and resources needed for this testing. Moreover, most organizations don't develop applications "from scratch" by writing new code for all required functionality. Rather, they typically make incremental enhancements to a large amount of functioning legacy code, or add their own code to extend third-party or Open Source packages. The resulting code bases could include legacy code written by the organization, code obtained via a merger/acquisition, code obtained from an outsourcer, or code that was developed by the Open Source community and downloaded from the Internet.

Consequently, most teams accumulate large and complex code bases in which at least some code has never been subject to coding standard analysis or unit testing. This exposes the team to several critical risks:

  • When the application is used in a way that development and QA didn't anticipate (and didn't test), the code might throw unexpected runtime exceptions that cause the application to become unstable, produce unexpected results, or even crash.
  • The code might open the only door that an attacker needs to manipulate the system and/or access privileged information.
  • Small coding mistakes could lead to significant performance or functionality problems.
  • The code's functionality might be broken as the application evolves over the course of its lifecycle.

If your team already has a large and complex code base (hundreds of thousands or even millions of lines), it's not too late to benefit from coding standard analysis and unit testing. As long as these practices are automated and applied properly, they can still identify critical problems before release or deployment, and they can satisfy any contractual obligations to perform unit testing or comply with a designated set of standards.

This article explains a simple strategy that has proven to deliver fast and significant improvements to large and complex .NET code bases:

  1. Use coding standard analysis to identify bugs and bug-prone code.
  2. Use unit-level regression testing to ensure that the functionality is intact, and use unit-level reliability testing (exercising each function/method as thoroughly as possible and checking for unexpected exceptions) to ensure that all code base changes are reliable and secure.

Both steps can be automated to promote consistent implementation and allow your team to reap the potential benefits without disrupting your development efforts or adding overhead to your already hectic schedule. Moreover, automating these practices lets you concentrate on deeper design/logic issues during code review.

1. Use coding standard analysis to identify bugs and bug-prone code
WHY IS IT IMPORTANT?
Complying with coding standard rules is a proven way to achieve the following key benefits:

  1. Detect bugs or potential bugs that impact reliability, security, and performance.
  2. Enforce organizational design guidelines and specifications (application-specific, use-specific, or platform-specific) and error-prevention guidelines abstracted from specific known bugs.
  3. Improve code maintainability by improving class design and code organization.
  4. Enhance code readability by applying common formatting, naming, and other stylistic conventions.

Rules that provide benefit number 1 are referred to below as Group 1 rules, rules that provide benefit number 2 as Group 2 rules, and so on.

For an example of why it's important to check coding standards even after the code is written, assume that coding standard analysis reveals that code from a frequently used module violates the "Avoid static collections" rule. This rule is important because it identifies code that could cause memory leaks. Static collection objects (e.g., ArrayList) can hold a large number of objects, making them candidates for memory leaks. How can .NET have memory leaks? If you put a short-lived object into a static collection and forget to remove it when you're done with it, the collection will keep referencing that object for the life of the program. If you've already removed all other references to the object, it can be difficult to see that it's still referenced.

Any memory leaks that resulted from this coding issue might be uncovered through profiling or load testing. However, this would require considerable time and effort - to design and implement the tests, then track the problem back to a specific line of code. With an automated code analysis tool, code that may cause memory leaks now or in the future can be detected in seconds - without requiring team members to write any tests or manually track down the root cause of reported memory leaks.
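
The leak pattern described above can be sketched in a few lines. The example is written in Java as a stand-in, since the article itself shows no code; a .NET version with a static ArrayList would look much the same. The class and method names (SessionRegistry, open, close) are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: these names do not come from any real code base.
class SessionRegistry {
    // A static collection lives as long as its class is loaded, so every
    // object it references stays reachable for the life of the program.
    private static final List<byte[]> OPEN_SESSIONS = new ArrayList<>();

    // Registers a short-lived object in the static collection.
    static byte[] open(int sizeInBytes) {
        byte[] session = new byte[sizeInBytes];
        OPEN_SESSIONS.add(session);
        return session;
    }

    // The call that is easy to forget: without it the object is never released.
    static void close(byte[] session) {
        OPEN_SESSIONS.remove(session);
    }

    // Number of objects the static collection still references.
    static int retainedCount() {
        return OPEN_SESSIONS.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            byte[] session = open(1024);
            // Bug: close(session) is never called, so the local variable goes
            // out of scope while the static list keeps the object alive.
        }
        System.out.println("Sessions still referenced: " + retainedCount());
    }
}
```

Nothing in the loop body looks wrong on its own, which is why this kind of defect is easier to catch with a rule that flags the static collection than by reading the calling code.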

WHAT'S REQUIRED TO DO IT?
a. Decide which coding standard rules to check.
First, review industry-standard .NET coding standard rules and decide which ones are most applicable to your project and will prevent the most common or serious defects. The rules defined by Microsoft's .NET Framework Design Guidelines and the rules implemented by automated .NET static analysis tools offer a convenient place to start. If needed, you can supplement these rules with the ones listed in books and articles by .NET experts. Note that while many tools focus on rules that examine the IL code, it is also helpful to check rules that examine the source code; this enables you to check for many code issues that cannot be identified by IL-level analysis (for example, formatting issues, empty blocks, etc.).
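
As a rough illustration of a source-level rule, the sketch below flags empty catch blocks with a regular expression. It is written in Java for illustration only, and the class name, method name, and pattern are assumptions made for this example. Real analysis tools parse the code rather than pattern-match it, so treat this as a toy, not as how any particular product works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical source-level rule: report catch blocks that contain nothing.
class EmptyCatchRule {
    // Matches "catch (...) { }" where the block holds only whitespace.
    private static final Pattern EMPTY_CATCH =
        Pattern.compile("catch\\s*\\([^)]*\\)\\s*\\{\\s*\\}");

    // Returns the 1-based line number on which each empty catch block starts.
    static List<Integer> findViolations(String source) {
        List<Integer> lines = new ArrayList<>();
        Matcher m = EMPTY_CATCH.matcher(source);
        while (m.find()) {
            int line = 1;
            for (int i = 0; i < m.start(); i++) {
                if (source.charAt(i) == '\n') line++;
            }
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample input resembling C# source; only the second line violates the rule.
        String source =
            "try { Save(); }\n" +
            "catch (IOException e) { }\n" +
            "try { Load(); }\n" +
            "catch (IOException e) { Log(e); }\n";
        System.out.println("Empty catch blocks at lines: " + findViolations(source));
    }
}
```

An empty block leaves no instructions to inspect in the compiled output, which is one reason a check like this has to work on the source text.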

Also, consider rules that are unique to your organization, team, and project. Do your most experienced developers have an informal list of lessons learned from past experiences? Have you encountered a specific bug that can be abstracted into a rule so that the bug never occurs in your code stream again? Are there explicit rules for formatting or naming conventions that your team is required to comply with?

Because legacy code bases are typically very large, checking a legacy code base requires a special strategy. It's important to recognize that legacy code's compliance with design and development rules won't be consistent because different parts of the code base probably originated from different sources. Applying rules from Groups 3 and 4 to the entire code base is likely to result in an impractically large number of rule violations that might be more overwhelming than helpful at this stage of the project. We strongly recommend that legacy code checking focus initially on rules from Groups 1-2, which will identify significant problems that should be corrected before the release/deployment.

b. Automatically check the code base and respond to findings.
Manually checking whether a large and complex code base follows coding standard rules would be incredibly slow, resource-intensive, and error-prone. Even if you had the vast resources required to review the code base manually, some rule violations would inevitably be overlooked, and just one overlooked rule violation could cause serious problems.

A more practical, thorough, and accurate way to check whether a large code base complies with coding standard rules is to use an automated coding standard analysis tool to check the entire code base at a scheduled time each night.

The two complementary strategies below are well suited to the nature and size of legacy code:

  • Smoke alarm mode: Run a smaller rule set (including only Group 1 and Group 2 rules) on the entire code base to check whether the code has critical problems. If violations are found, treat them as bugs and fix them immediately.
  • Gradual "fix it" mode: Select a code module, run a full rule set on it, then fix/refactor the code as needed. This mode is used to improve general compliance. Be sure to use this mode to check all new and modified code. If possible, check that code is compliant immediately after it's written and before it's committed to source control.

It's also possible that different modules in the legacy code base call for different rules, especially from Group 2. Some code analysis tools let users apply a filter to enable or disable a specific rule or a group of rules for a given set of files, which allows the rules to be tailored to the nature and origin of the code. This can be thought of as "file-based" or "directory-based" application of specific rules.
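
The rule selection described above can be sketched as follows. This is a minimal illustration in Java, not the configuration format of any real tool. Apart from "AvoidStaticCollections", which echoes the rule discussed earlier, the rule names and directory prefixes are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RuleSelection {
    // A rule has an identifier and the benefit group (1 to 4) it belongs to.
    static final class Rule {
        final String id;
        final int group;
        Rule(String id, int group) { this.id = id; this.group = group; }
    }

    // Smoke alarm mode: keep only Group 1 and Group 2 rules.
    static List<String> smokeAlarmRules(List<Rule> all) {
        List<String> ids = new ArrayList<>();
        for (Rule rule : all) {
            if (rule.group <= 2) ids.add(rule.id);
        }
        return ids;
    }

    // Directory-based filtering: drop the rules disabled for the file's path prefix.
    static List<String> rulesForFile(String path, List<String> ruleIds,
                                     Map<String, Set<String>> disabledByPrefix) {
        List<String> active = new ArrayList<>(ruleIds);
        for (Map.Entry<String, Set<String>> entry : disabledByPrefix.entrySet()) {
            if (path.startsWith(entry.getKey())) active.removeAll(entry.getValue());
        }
        return active;
    }

    public static void main(String[] args) {
        List<Rule> all = Arrays.asList(
            new Rule("AvoidStaticCollections", 1),
            new Rule("TeamLoggingPolicy", 2),   // hypothetical Group 2 rule
            new Rule("LimitClassSize", 3),      // hypothetical Group 3 rule
            new Rule("NamingConventions", 4));  // hypothetical Group 4 rule
        List<String> smoke = smokeAlarmRules(all);

        // Third-party code is not expected to follow the team's own guidelines.
        Map<String, Set<String>> disabled = new HashMap<>();
        disabled.put("thirdparty/", new HashSet<>(Arrays.asList("TeamLoggingPolicy")));

        System.out.println(rulesForFile("src/Billing.cs", smoke, disabled));
        System.out.println(rulesForFile("thirdparty/Parser.cs", smoke, disabled));
    }
}
```

Keeping the group number on each rule means the nightly smoke alarm run and the per-module "fix it" run can share one rule catalog and differ only in the filter applied.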

c. Hold weekly reviews for bug root cause analysis and prevention.
Hold weekly meetings to analyze the root cause of the various bugs (defects that were reported by testers, customers, etc. - not violations of development rules) that were fixed during that week. The best time to do root cause analysis on a bug is when it's still fresh in your mind. After root cause analysis, try to identify a set of rules that will prevent the same bugs from recurring, then add these rules to the set of development rules you check.

2. Use unit-level regression testing to ensure that the functionality is intact and use unit-level reliability testing to ensure that new code is reliable and secure
The next step toward reliable and secure code is to perform unit-level regression testing on all existing code, and then perform unit-level reliability testing (also known as white-box testing or construction testing) on any code that's added or modified. Regression tests capture existing functionality and don't report any errors until a code modification changes that functionality. Reliability tests use unexpected stimuli and report any errors immediately. In .NET development, this involves exercising each method as thoroughly as possible for both categories of tests and checking for unexpected exceptions in reliability tests.
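
The difference between the two kinds of test can be sketched as follows, again in Java for illustration. The method under test (initials) and the recorded baseline are hypothetical; in practice an automated tool would generate the inputs and capture the baselines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UnitLevelChecks {
    // Hypothetical legacy method under test: returns the initials of a full name.
    static String initials(String fullName) {
        StringBuilder sb = new StringBuilder();
        for (String part : fullName.trim().split("\\s+")) {
            sb.append(Character.toUpperCase(part.charAt(0)));
        }
        return sb.toString();
    }

    // Regression test: compares current behavior with a previously captured
    // result, so it stays silent until a code change alters that behavior.
    static boolean regressionPasses() {
        String captured = "AL"; // baseline recorded from an earlier run
        return captured.equals(initials("ada lovelace"));
    }

    // Reliability test: feeds unexpected inputs and reports every exception thrown.
    static List<String> reliabilityFindings() {
        List<String> findings = new ArrayList<>();
        for (String input : Arrays.asList(null, "", "   ", "x")) {
            try {
                initials(input);
            } catch (RuntimeException e) {
                String shown = (input == null) ? "null" : "\"" + input + "\"";
                findings.add("input " + shown + " threw " + e.getClass().getSimpleName());
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        System.out.println("Regression test passes: " + regressionPasses());
        for (String finding : reliabilityFindings()) {
            System.out.println("Reliability finding: " + finding);
        }
    }
}
```

The regression test passes on the unmodified method, while the reliability test immediately reports that null, empty, and whitespace-only names all raise exceptions that the method's author evidently did not anticipate.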

WHY IS IT IMPORTANT?
A large base of legacy code is a huge investment of time and resources. Its functionality has to be protected from undesired changes if some of that code is modified. After obtaining a certain level of acceptance, it's critical not to go backwards by introducing bugs in functionality during the maintenance of legacy code.

However, if your testing only checks expected functionality, you can't predict what could happen when untested paths are taken by well-meaning users exercising the application in unanticipated ways - or taken by attackers trying to gain control of your application or access to privileged data. It's hardly practical to try to identify and verify every possible user path and input or analyze every possible exception from legacy code. It is important, however, to identify the possible paths and inputs that could cause unexpected exceptions in new and security-sensitive code because:

  • Unexpected exceptions can cause application crashes and other serious runtime problems. If unexpected exceptions surface in the field, they could cause instability, unexpected results, or crashes. Many development teams have had trouble with applications crashing for unknown reasons. Once these teams started identifying and correcting the unexpected exceptions that they previously overlooked, their applications stopped crashing.
  • Unexpected exceptions can open the door to security attacks.

About Hari Hampapuram
Hari Hampapuram is currently Parasoft's Director of Development. Hampapuram has extensive experience in building software development tools at Microsoft Research, Intrinsa, Philips Semiconductors, and AT&T Bell Laboratories. Hari has a PhD in computer science from Rutgers University.

About Matt Love
Matt Love is a software development manager with Parasoft Corporation. He has been involved in the development of Jtest, Parasoft's automated code analysis and unit testing tool, since 2001. Love has been developing since 1997 and earned his BS in Computer Engineering from the University of California, San Diego.
