Today’s enterprise applications are massive, complex and integral to the operation of critical business functions. Many things can and will go wrong during the creation of these software products, and because of the weight that enterprise applications carry in a deployed environment, it’s crucial that the testing and diagnostics cycles are accurate and timely and that the final product is free of all major bugs and glitches. Unfortunately, that last part is impossible. There is an almost infinite number of permutations of hardware, operating system releases, patches and configurations. Even the largest firm couldn’t test them all. Plus, it is rare that an application in testing will be exposed to the variety of situations or scenarios that it will experience in a full production environment.
Consequently, developers attempting to reproduce and resolve production problems will address what they believe to be the problem but may actually be testing something different. As a result, they may focus on the symptom and resolve the immediate issue without solving the underlying problem.
The application life cycle is fraught with problems, so it is imperative that developers equip themselves with processes and tools to troubleshoot issues when they arise. It’s important to remember that the movement of a new release from development to production is not a one-way trip. Invariably, the release will have an issue and require an intervention from its developer.
Lack of visibility to determine root cause
User Acceptance Testing is a critical step in reducing the number of faults an application will have in production: teams of subject matter experts run real-world usage scenarios during pre-production. The goal of this process is to confirm that a system meets requirements from performance, results and user experience perspectives. Because it is one of the final stages of a development project, developers need to be able to quickly pinpoint and resolve problems as they arise.
While basic information about a message or transaction can be obtained from the underlying middleware message layer, without additional tools it is impossible to determine root cause when a transaction takes too long or a message is routed to the wrong location – two commonplace events.
The lack of visibility, combined with an almost infinite number of variables, makes it impossible to reproduce and truly resolve a production problem. Lack of visibility also forces developers to contact the middleware administrator in shared services manually and request information about message contents. This interrupts the middleware administrator and is a very inefficient, costly and error-prone process.
As more firms move to a DevOps culture, cooperation in the use of tools across development and production is important. At the very least it gets the two teams speaking the same language, which means problems are specified adequately and less time is spent trying to reproduce them. At best it can help joint teams rapidly identify a problem, reproduce it in the test cycle and then develop a resolution. In many cases, collaborative work combined with the help of a monitoring tool can speed the discovery and remediation processes.
Best practices for resolving issues during production
The following “no fail” best practices are designed for Independent Software Vendors to enforce consistent guidelines among collaboration teams for application, middleware and transaction diagnostics in order to rapidly identify, trace, replicate and resolve issues that occur during production.
- Visibility – Ensure that you have the most detailed visibility possible into the performance of your applications. Synthetic transactions are not enough. Detailed diagnostics down to the message contents or method level are essential – you need to see more than just what is being passed into and out of an application as if it were a “black box.” This is especially true where there are multiple “hops” within a server. Ensure you have full visibility of each message and transaction at every hop. Use diagnostics at each juncture to proactively provide sufficiently detailed information when an application’s behavior veers from the expected, so that it can be repaired.
- Traceability – Knowing when a metric has been breached is an important first step in optimizing application performance during test cycles. Knowing exactly what caused the problem is more challenging. Traditional testing methodologies treat the symptoms looking outside-in – not the root cause, which often requires an inside-out viewpoint. Make certain that you can trace the message path in its entirety to uncover the precise moment and environment when the problem occurred.
- Reproducibility – The key to any successful testing program is the ability to reproduce an error. Reproducing the error, and then showing that it no longer occurs after a fix, confirms that the problem is solved and helps ensure that the same problem will not need to be resolved twice.
- Actionability – Once a problem and its trigger have been identified, and after it has been successfully isolated through replication, developers must have all of the tools they need to confidently act on the information and permanently resolve application performance problems. This means they need the tools to create new messages, re-route them and test their problem resolution on their own.
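To make the traceability and reproducibility practices concrete, here is a minimal sketch in Python. It is not any particular monitoring product's API: the `Trace` and `Hop` records, the node names, the timings and the two routing functions are all hypothetical, invented for illustration. It assumes each message carries a correlation ID and that every hop records when the message entered and left it.

```python
from dataclasses import dataclass, field


@dataclass
class Hop:
    """One stop on a message's path (hypothetical record layout)."""
    node: str       # queue, broker or service that handled the message
    entered: float  # seconds since the message was created
    left: float


@dataclass
class Trace:
    """Everything captured about one message, keyed by a correlation ID."""
    correlation_id: str
    payload: dict
    expected_destination: str
    hops: list = field(default_factory=list)

    def record(self, node, entered, left):
        self.hops.append(Hop(node, entered, left))

    def slowest_hop(self):
        # Traceability: find where the time actually went.
        return max(self.hops, key=lambda h: h.left - h.entered)

    def misrouted(self):
        # The last hop should be the intended destination.
        return bool(self.hops) and self.hops[-1].node != self.expected_destination


def replay(trace, route):
    """Reproducibility: feed a copy of the captured payload to a routing function."""
    return route(dict(trace.payload))


def buggy_route(message):
    # Hypothetical routing rule with a defect: it ignores the region field.
    return "orders.us"


def fixed_route(message):
    # Hypothetical corrected rule: route by region.
    return "orders." + message["region"]


# A captured production message (invented sample data).
trace = Trace("txn-0001", {"order": 17, "region": "eu"},
              expected_destination="orders.eu")
trace.record("gateway", 0.00, 0.02)
trace.record("transform", 0.02, 1.45)
trace.record("orders.us", 1.45, 1.50)

print(trace.slowest_hop().node)     # the hop that consumed the most time
print(trace.misrouted())            # did the message end up in the wrong place?
print(replay(trace, buggy_route))   # reproduce the fault with the real payload
print(replay(trace, fixed_route))   # confirm the fix against the same payload
```

Because the captured payload is replayed rather than approximated, the developer tests the actual failing case instead of a guess at it, and can do so without asking the middleware administrator for the message contents.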
Beyond User Acceptance Testing, the ability to identify problems sooner in the application life cycle will yield better results when the need to remediate issues arises. This can only happen when development and production are working together as a team, utilizing a common tool set, and when development is enabled with full visibility. This approach will save time and money as well as help organizations meet SLAs and drive ROI from these applications.
Charley Rich, Vice President of Product Management and Marketing at Nastel, is a software product management professional who brings over 20 years of experience working with large-scale customers to meet their application and systems management requirements. Earlier in his career he held positions in Worldwide Product Management at IBM, as Director of Product Management at EMC/SMARTS, and as Vice President of Field Marketing for eCommerce firm InterWorld. Charley is a sought-after speaker and a published author with a patent in the application management field.