Aftermath of a Production Bug

Yes, the aftermath. That’s where the game resumes.
The update is released. A few days pass by fine, and then we suddenly start getting reports of a few things not working as we promised. “Production issue”, they call it.

And even before we know the root cause, the famous ritual of ‘Why did the Tester not find it’ starts.

Well, the production bug could be because we missed:

Some configs
The redirect URLs
Cache clearing after logout
Time-zone considerations
Performance handling
A code change made to fix a regression issue
And many more things unrelated to the business functionality itself.

When it comes to business functionality not working as expected, the question “Why didn’t the testers find these issues?” leads to time spent going back and forth over when the functionality was groomed, debating what was groomed, what was designed, and what was executed and where. A lot of time and money spent, really. Sometimes, unnecessarily.

Before we start blaming the testers, did we check whether Testers are really the bottleneck they are being portrayed as?

Do we have a process in place for the Quality check by Developers & Product Owners?

Did we try checking the kind of first build Testers are receiving? How much time does it take to test the acceptance criteria? Are all your testers focused only on finding acceptance criteria issues? If so, it is a big concern that no one other than the testers is performing a quality check. Quality is everyone’s responsibility. The Devs, the POs, the Testers. All stakeholders should be accountable for the quality of whatever they deliver. No one should wait for the Testers to find the bug and later blame the tester when a defect is spotted in the Production environment. Read this section again.

Are we asking for just Acceptance Criteria (AC) testing? Are we asking testers to do only the AC testing as mentioned in the story? Are we then asking them to “cover everything else in the regression”!? Is this even a proper test?

Are we conducting an impact analysis of the features? We need to remember that the requirement stated in the story is not a standalone action. It is surrounded by a product that has multiple other features and requirements. Did we get a chance to think all of that through and list it in the acceptance criteria?

Are we planning the regression so that those features get extra time to be tested thoroughly, with the whole product in mind?

When regression testing starts, some of it manual and some automated, do we design the tests accordingly? Did we give enough time to test the new features in the context of the whole product? I believe that conducting only acceptance testing within the sprint and leaving everything else to the regression phase leaves a major chance for the customer to face issues in the live environment.

An important reason is the time lapse. Time has passed between when you designed the tests and when you run them, and your memory and context have changed with it. During the regression testing just before a release, you may not have the same focus or context you had while the specific feature was tested standalone.

Are we taking full responsibility for rejected bugs? When bugs are rejected or deferred to the next sprint, have we conducted a risk assessment of how the bug will impact the current users in the live environment? Deliberately continuing with a bug results in KNOWN ISSUES, and their side effects are then often ignored in production.

Are we asking testers to “not go overboard with testing stories” because of the time constraints?

An estimate is not a deadline. Let me make that clear. If the developer takes extra time, we are comfortable, thinking there must have been some complexity. How do you react when the tester says they need one more day, or a few more hours? Some points for organisations to think about: Why do we always question Testers’ ability and competence, when they are actually trying to review and test some uncovered scenarios? Why do we keep showing mistrust and questioning every other move a tester makes?

This builds a poor culture for testers, one that they do not want to work in. If this is the case in your team, maybe just change the Testers in your team and get a few you trust. This way the environment can be a little friendlier and more motivating for everyone to carry on. A friendly environment is the beginning, but the various questions still need to be answered. History does have a tendency to repeat itself, after all.

Are the stories too big to estimate, and do you not allow QA to estimate higher than the developers? Having worked both on projects we started from scratch and on projects that were already underway when I joined, I have never understood why Testers’ estimates are expected or even required (!) to be less than Dev estimates. For example, a Dev may just have to switch off one main switch, so their effort is just 1 (i.e. low effort).

But when it comes to Testing, that one switch may affect 10 rooms. The Tester will then have to go to each room to check whether the power is off or not. The effort has to be more. The Tester will also try switching the buttons on and off, and will not just stop after having a look at whether the power is off. They will also see if there are appliances that should not be switched off, and alert the team that special provisions may need to be made. And so on. Such reviews help identify missed requirements or help one understand the domain better.

As you can see, this testing effort is far more than “just wiring a switch” and can lead to a better understanding of use cases, to identifying new scenarios, and to refining existing ones. Points indicate an estimate of effort and must not be capped based on which role is estimating.
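To make the switch analogy concrete, here is a minimal sketch in Python. Everything in it (the `Room` and `House` classes, the `check_after_switch_off` helper, the freezer) is hypothetical and invented purely for illustration. The developer's change is a single call, while the tester's check has to visit every room.

```python
from dataclasses import dataclass, field


@dataclass
class Room:
    # Hypothetical room; powered is True until the main switch is turned off.
    name: str
    powered: bool = True
    critical_appliances: list = field(default_factory=list)


@dataclass
class House:
    rooms: list

    def set_main_switch(self, on: bool) -> None:
        # The developer's change: one action that affects every room.
        for room in self.rooms:
            room.powered = on


def check_after_switch_off(house: House) -> list:
    """The tester's work: switch off, then visit every room and record findings."""
    house.set_main_switch(False)
    findings = []
    for room in house.rooms:
        if room.powered:
            findings.append(f"{room.name}: power still on")
        # Appliances that should never lose power point to a missed requirement.
        for appliance in room.critical_appliances:
            findings.append(
                f"{room.name}: {appliance} lost power, needs special provision"
            )
    return findings


# Nine ordinary rooms plus a kitchen with a freezer: ten rooms behind one switch.
rooms = [Room(f"Room {i}") for i in range(1, 10)]
rooms.append(Room("Kitchen", critical_appliances=["freezer"]))
house = House(rooms=rooms)

findings = check_after_switch_off(house)
for finding in findings:
    print(finding)
```

One call on the developer's side, ten rooms to check on the tester's side, and one finding that nobody wrote into the story.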

How testers can help themselves to avoid these pains and pitfalls:

Help your lead understand what you are testing.
Make mind maps to capture the known and unknown in refinement, design, and testing sessions.
Build tests that uncover how the user will use the behaviour.
In addition to the Acceptance Criteria written in the story, build tests that consider the product surrounding the feature. Explain that, after all, a feature does not work in isolation.
Automate all the knowns as early as possible. This will free up your time to explore the unknown.
Check with the design team just after the first build whether the UI & UX as built match their expectations.
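As an illustration of automating a known, here is a minimal sketch in Python that guards one of the misses listed earlier: cache clearing after logout. The `Session` class is a hypothetical stand-in, not taken from any real framework; in your product the test would call whatever your application actually exposes.

```python
class Session:
    # Hypothetical session object, invented for this example.
    def __init__(self):
        self.user = None
        self.cache = {}

    def login(self, user):
        self.user = user
        self.cache["profile"] = f"profile of {user}"

    def logout(self):
        self.user = None
        self.cache.clear()  # the "known" behaviour the test below guards


def test_cache_is_cleared_after_logout():
    session = Session()
    session.login("alice")
    assert session.cache  # cache is populated while logged in
    session.logout()
    assert session.cache == {}  # nothing from the previous user is left behind
    assert session.user is None


test_cache_is_cleared_after_logout()
```

Once a check like this runs on every build, nobody has to remember it during the regression phase.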

Well, there are times when testers might have missed it, but then so have the Developers and the Product Owner.

The questions should focus on why the team missed it instead of asking why just the Tester missed it. This will help us conduct better Root Cause Analysis and improve our processes, leading to fewer production issues and fewer issues reported by customers in the live environment.

This would in turn save money and, more importantly, protect and improve the reputation of the product and of the organisation!