Troubleshooting the Everwell Hub

What do we do when something goes wrong?

At Everwell, our focus is to deliver robust, high-quality solutions that cater to a wide range of scenarios for building a better healthcare system and making an impact at the grassroots level. To ensure that quality, every solution passes through several stages in its life cycle, with our engineers working diligently at each step so that it meets the highest possible standard before it is delivered.

However, as Stephen Hawking once said, “Nothing is perfect in this universe”, and this applies to every aspect of our lives, including everything that is built in this world. In software development it is therefore common practice to prepare contingency plans and mechanisms for unforeseen problems, commonly called “bugs”, that are encountered after a solution has been delivered to users.

How do we come to know of a bug?

The gateway to this framework is our designated support team, which listens to our users round the clock for any concerns or issues they face while using our applications and services. For any anomalous behavior observed, the support team logs a ticket in our framework that gives a detailed overview of the challenge or inconvenience faced by our users, along with its impact.

However, not all bugs are visible to users, and there is always a chance of hidden bugs that affect the functioning of the application as a whole. To identify such bugs early, we use Sentry and Google Firebase, which are monitoring and error-tracking platforms for our Web and Android applications respectively. The exceptions and error events logged in Sentry and Firebase are converted into tickets that carry all the monitoring details captured by these tools, including the error stack trace, to help engineers handle the bug effectively.
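To make the event-to-ticket conversion concrete, here is a minimal Python sketch. It is not our actual integration: the Ticket class, the event_to_ticket function and the event keys (error_type, message, stack_trace) are hypothetical names chosen for illustration, and the real payloads from Sentry and Firebase are shaped differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are assumptions."""
    title: str
    source: str
    stack_trace: str
    labels: List[str] = field(default_factory=list)


def event_to_ticket(event: dict, source: str) -> Ticket:
    """Build a ticket from a monitoring event with an assumed, simplified shape."""
    error_type = event.get("error_type", "UnknownError")
    message = event.get("message", "")
    # Only append the message when the event actually carries one.
    title = f"[{source}] {error_type}" + (f": {message}" if message else "")
    return Ticket(
        title=title,
        source=source,
        stack_trace=event.get("stack_trace", ""),
        # Label auto-created tickets so they can be told apart from support tickets.
        labels=["auto-generated", source.lower()],
    )
```

Keeping the stack trace on the ticket itself means the engineer who picks it up does not have to go back to the monitoring tool to start the analysis.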

How do we handle a bug?

The core of this framework is a team of our Engineers and Product Managers who track all open tickets daily and collaborate to resolve them effectively.

Our team follows several steps to resolve a bug as quickly and efficiently as possible:

First, as soon as we receive a new bug, the team carries out a preliminary analysis to determine the nature and scope of the bug and the affected areas of the system. As part of this analysis, we tag the bug with appropriate labels and search our directory of open and resolved bugs to check whether it is related to any previously logged bug.
After the preliminary analysis, the bug is classified as either technical or functional. Technical bugs can be caused by multiple factors, including but not limited to erroneous or incomplete code, unhandled scenarios, or a missed implementation of a code specification. Functional bugs, by contrast, relate to functionality misses or gaps that were not covered as part of the solution implementation.
After classification, the bug is assigned a severity that reflects the urgency and importance of resolving it. We refer to a severity matrix, which is regularly updated, as the guideline for determining the severity of any bug based on factors such as user impact, monetary impact and business impact.
After assessing the severity, we determine whether there is enough data and clarity to resolve the bug. If more information is required, the Product, Program and Engineering teams hold priority discussions to identify the correct approach for resolving it.
Once the correct approach is determined, the bug is assigned to the respective point of contact, who applies the proposed resolution.
After the resolution is implemented, the bug goes through a fast-paced iteration of the regular solution development cycle, with the requisite stages of multi-level testing to ensure the new code works as expected.
Once the resolution is cleared as effective, it is applied to the live environment and users are notified, if needed.
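The severity step above can be pictured with a small Python sketch. The three factors (user, monetary and business impact) come from the steps above, but the three-level rating scale, the thresholds and the severity labels are assumptions made for illustration and do not reproduce our actual matrix.

```python
from enum import IntEnum


class Impact(IntEnum):
    """Assumed three-level rating for each impact factor."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def assign_severity(user: Impact, monetary: Impact, business: Impact) -> str:
    """Map three impact ratings to a severity label (illustrative thresholds)."""
    worst = max(user, monetary, business)
    total = user + monetary + business
    # A high rating on one factor plus elevated ratings elsewhere is the worst case.
    if worst == Impact.HIGH and total >= 7:
        return "critical"
    if worst == Impact.HIGH:
        return "high"
    if total >= 5:
        return "medium"
    return "low"
```

For example, under these assumed thresholds a bug rated HIGH for users and HIGH for business but LOW for monetary impact comes out as "critical", while a bug rated LOW on all three comes out as "low". Writing the rules down like this keeps severity decisions consistent from one ticket to the next.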

When is a bug not a bug?

A one-size-fits-all approach reduces the effectiveness of problem handling, as not all bugs are of the same nature. To keep our framework effective, it is important to determine when a reported issue should not be handled by it. In some scenarios, the challenge faced by the user relates to a use case that our solution was never intended or designed to handle. Such reports are treated as enhancement requests over existing functionality, and the proposed enhancements are processed as part of regular development iterations.

How do we reduce the occurrence of bugs?

Apart from prompt handling of bugs, another way to keep our solutions and bug handling framework effective is to make development comprehensive enough to anticipate and preemptively address the areas of concern that could result in a future bug.

For the Everwell Hub, we have adopted a regimented approach to our SDLC (Software Development Lifecycle) to minimize the probability of future bugs. This approach includes:

A Technical Specification Document (TSD) that details the proposed solution and identifies any challenges or limitations that can be foreseen. Development of a solution starts only after the TSD has been reviewed and approved by the relevant points of contact.
Every solution being developed goes through a rigorous cycle of code reviews and testing to ensure we cover as many scenarios and corner cases as we can think of.
The testing cycle covers different aspects of the solution through Unit Testing, Integration Testing, Functional Testing, Performance Testing, Regression Testing and User Acceptance Testing. These phases involve a range of stakeholders including Engineering, Product, QE, DevOps and Development, so that detailed test plans leave as little chance of a bug being encountered as possible.
As part of determining the resolution for a bug, a Root Cause Analysis (RCA) is carried out and the ticket is updated with the origin of the bug and what led to its occurrence. The RCA of every bug enhances our knowledge base and prepares us to design incrementally more effective solutions.

How do we evolve?

At Everwell, our focus has always been to find innovative ways to stay ahead of the curve and resolve the software-related issues that come up. We manage this through the following activities:

Rotation of the team members working on production bugs over a pre-specified duration, which ensures that all our team members are equally aware of, and equipped with the mindset to handle, the nuances presented by a bug.
Periodic retrospective sessions where the whole team analyzes the progress and effectiveness of our bug handling framework, as well as the RCA of resolved bugs, with a focus on identifying areas of improvement and brainstorming ways to address them.

Blog Credits:

Ashish Shrivastava, Senior Software Engineer II, Everwell Health Solutions
