Confidence and Uncertainty

Tester 1: “Are you certain?”

Tester 2: “I’m almost certain”

Tester 1: “Then you’re uncertain?”

Tester 2: “No, I, uh…I’m fairly certain”

Tester 1: “So you’re not certain?”

Tester 2: “Dammit, yes I’m certain”

What is the opposite of “certain”? You might think the answer is “uncertain”, but the English language is a tricky beast.

Perhaps adding a little definition would help. There are two forms of certainty:

  • Epistemic certainty, which relates to your state of knowledge: whether you hold a belief for which there are no possible grounds for doubt.
  • Psychological certainty, which relates to your state of mind. Call this your degree of confidence.

Epistemic certainty is an absolute[1] and not a matter of degree, whereas uncertainty is scalar; they cannot be opposites. Imagine a continuum ranging from absolute uncertainty to absolute (epistemic) certainty. Any point on that scale represents a degree of uncertainty. No degree of uncertainty can sensibly be said to be the opposite of epistemic certainty, any more than it is sensible to say that any rational number is the opposite of infinity. The relationship is the same: on this continuum, epistemic certainty is an unattainable construct, much as one cannot count to infinity.

What does this have to do with testing? I’ve written several times on this blog about uncertainty (here, here and here). I’ve also written a little about confidence. Having just read Duncan Nisbet’s post on Michael Bolton’s Let’s Test keynote, I think it’s time to link the two.

When we first approach an item of software, we start from a position of little knowledge. This brings with it a great deal of uncertainty. We read, we ask questions, we test. We build better models in our minds and test those models, constantly refining and enriching them. Our role as testers is to do just this: to learn, and to articulate what we have learned. A natural result of this growth of knowledge is the reduction of uncertainty. This does not mean that we increase epistemic certainty, or that we get closer to it. Moving from ten to one million does not increase one’s proximity to infinity; otherwise infinity would not, by definition, be infinite.
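One way to put the analogy in symbols (a sketch, reading “proximity” as distance in the extended real numbers):

    \[
      \infty - x = \infty \quad \text{for every finite } x \in \mathbb{R}
    \]

The distance from 10 to \(\infty\) is exactly the distance from 1,000,000 to \(\infty\). In the same way, no accumulation of knowledge closes the gap to epistemic certainty; it merely reduces uncertainty.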

Is our role as testers to reduce uncertainty? Ultimately yes, I believe that it is, in the epistemic sense at least. What is the value of any item of information that does not reduce uncertainty? If we provide information that has no effect on uncertainty, then we have most likely not provided any new information at all[2]. We might add value by providing information that increases uncertainty, by identifying an unknown that was not previously known to be unknown[3] or that was previously thought to be better known than it is. However, in this sense we are not changing the balance of epistemic uncertainty, but have strayed into the realm of psychological certainty.

Psychological certainty, in contrast to epistemic certainty, is scalar in nature: one can suspect, one can be fairly confident or one can be utterly convinced. In the psychological sense, certain and uncertain are indeed opposites, and an increase in one reduces the other. So when Michael says “A key part of our service is to reduce unwarranted and potentially damaging certainty about the product”, I believe he is talking about psychological certainty[4], and I’d be inclined to agree. How do we do so? By doing what we do: investigating, uncovering and revealing information that runs counter to the unwarranted certainty; in other words, by reducing epistemic uncertainty.

In testing, the danger arises when we blur the distinction between epistemic and psychological certainty. “99% of the tests pass”: does this provide a case for increasing our confidence? No. “We’ve found and fixed 1000 bugs”? No. A warrant might justify a belief, but we should be wary of casting ourselves as providers of warrants that increase psychological certainty. We should certainly not engage in managing confidence. You may be told that one of the purposes of testing is to build confidence and that your practices need to be supportive. If you agree, then you are agreeing to a scam. The most we can do is create an environment in which the confidence of our customers will live or die based on relevant information being made available to the right people when they need it. Their confidence is their business.
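As a small, hypothetical illustration (the check names and results below are invented, and far fewer than any real suite would contain), consider how little a bare pass rate carries:

    results = {
        "login_with_valid_credentials": True,
        "view_account_summary": True,
        "update_contact_details": True,
        "transfer_funds_between_accounts": False,  # the failure that matters most here
    }

    # The headline number treats every check as equal and says nothing about
    # what the passing checks cover, or how shallow they are.
    pass_rate = sum(results.values()) / len(results)
    print(f"{pass_rate:.0%} of checks passed")  # prints "75% of checks passed"

Whether the figure is 75% or 99%, it is a count of confirmations, not a warrant for anyone’s confidence.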

Notes:

  • [1] You might ask if I’m certain about this: my answer is no. It is entirely possible that one day some bright spark will solve the problems that have been plaguing philosophers for thousands of years, therefore I have reason to doubt this belief, and therefore I am not certain – in the epistemic sense. I might concede to being certain in the psychological sense, but that’s my problem.
  • [2] Think repetitive regression testing.
  • [3] A Rumsfeld.
  • [4] It makes as much sense to talk about reducing epistemic certainty as it does to talk about – you guessed it – reducing infinity.

User Acceptance Tricks?

Some time ago, I was an SAP consultant. Between projects I configured a variety of demonstration systems for use in presales. These were pretty rough and ready, but they did the job. The trick (and I mean that literally) was to carefully define detailed step-by-step scripts, test them, and make sure that the demonstrator followed them to the letter. These provided safe pathways: routes through the application that were free of problems. A demonstrator would stray from the path at their peril; the demo would quickly fall apart if they did.

This is analogous to some User Acceptance Testing practices that I’ve observed. Do you recognize this?

The blinkered scripting scam: Acceptance tests will be scripted in advance. They will be traced to requirements. The scripts will be reviewed and pretested by the project. If all the tests pass when executed by the acceptance team, then the software will be accepted.

From a project management perspective, this would seem to make sense:

  • It gives the project an opportunity to check that the acceptance team is only testing behavior that is considered to be within the project’s scope.
  • It gives the project an opportunity to make sure that acceptance tests will pass before they are formally executed.
  • It helps to ensure that the acceptance team can begin execution just as soon as the software is ready for them.

This is not testing, nor is it even meaningful checking: pretesting ensures that acceptance test execution will not reveal any new information. This is demonstration, and nothing more. It has consequences:

  • Execution is often the first opportunity that acceptance testers have to get their hands on the software. With little or no opportunity to interact with the software in advance, just how insightful will their preplanned tests be?
  • Bugs don’t neatly line up along the course charted by tests. Nor do they conveniently congregate around requirements or other abstractions of system behavior just waiting to be found. Confirmatory requirements based testing will miss all manner of problems.
  • Pretesting creates safe pathways through the software. If acceptance testing is confined to these tests it can result in an acceptance decision regardless of the hazards that may lurk beyond these paths.
  • Acceptance testers, be they customers, users or their representatives, have the potential to bring important insights to testing. They have a different perspective, one that is often centered on the value that the software could bring. This opportunity is wasted if they are made to follow a process that blinds them.

Acceptance testing is often differentiated from other forms of testing in terms of its purpose: whilst earlier tests are focused on finding problems, acceptance testing is sometimes positioned as a confidence building exercise. The risk is that acceptance testing becomes a confidence trick.

The good news is that this risk can be mitigated, even whilst checking many of the boxes that will satisfy project management. A couple of years ago I found myself in an unusual position: working for a vendor yet managing all testing, including acceptance by the customer. This presented a potential conflict of interest that I was determined to avoid. The contract was fixed price and payment was tied to a specific delivery date, so the project manager wanted to adopt practices similar to those described above. Fortunately, he also accepted that doing so risked imposing constraints on the quality of acceptance and was willing to entertain alternatives. We agreed on the following:

  • Domain and application experts would provide training to the acceptance team prior to testing, and would be on hand to provide coaching throughout.
  • User Acceptance Demonstrations would serve to provide basic verification of requirements.
  • This would be supplemented by exploratory testing, which would allow the acceptance testers to kick the tires and bring their own perspectives to bear in a more meaningful way than the scripts alone would allow.
  • A transparent and visibly fair triage process would be implemented, which would allow the customer to put forth their own prioritization of bugs whilst allowing the project management to intervene should bugs be reported that were beyond the scope of the project.

Project management had the control they needed over scope. The customer was able to get a good feel for the software and the value it would provide. We were able to identify a number of important bugs that would otherwise have escaped us and become warranty issues. With a little bit of thought, we managed to put the testing back into acceptance testing. Which are you, tester or grifter?

Devil in the Detail II

Previously…

Continuing with the theme of review and approval as arguments for detailed test scripts…

Argument 2: “We need scripts so that they can be approved prior to execution”.

Like the previous argument, this expresses both a desire and an assumption:

  • DESIRE: We want tests to be approved before they are executed.
  • ASSUMPTION: We can only do that if there are scripts.

Again, we should question whether the assumption is valid. It’s much easier for an approver to work through a high-level outline of tests than a thousand pages of scripts. I’ve seen approvers asked to do the latter on a number of occasions; I’ve seldom seen an approval result from it.

We should also question the value of such approval. How valid is the approval of tests before they are executed? The tests will change; they must change, otherwise many of the insights gained through testing go to waste. As we test, we learn about the software and how it will be used; we learn things that were never considered when the requirements were formulated; we learn implications of the design that were not – and could not have been – foreseen before the design was materialized. Further, our learning and observations cause change: as we identify issues with the requirements and design, we drive change in the purpose and implementation of the software. If the tests aren’t adapted to match the evolution of the software, how useful will they be? Continuing to follow a plan once it has been established that the plan no longer reflects the situation on the ground is an exercise in futility. The tests must evolve with the software and with our understanding of it. And does every such change require another approval? This creates change inertia, which slows down testing, lengthens the feedback loop and drags down the value of testing.

Where does this desire originate? In many cases, though it demonstrates a lack of understanding of the nature of discovery, it can be perfectly innocent. However, it can also be a warning signal for something far more sinister:

  • Perhaps the stakeholders of testing have little confidence in the testers: “We want to approve your tests because we don’t believe you are capable of making the right decisions as to how or what to test”. Now, it may be that in such situations the stakeholders have a crystal ball and can predict which tests are required to find all the important bugs, but it is rather more likely that this is evidence of perceived commoditization and the death of trust. If you don’t trust your testers, then replace them – or do a better job of articulating what’s important to you.
  • Perhaps the testers themselves are seeking approval as a means to protect themselves from a witch hunt: “We fear that you will haul us over the coals if we miss a bug – so we want you to approve our tests”. Developers can make bugs, but woe betide the tester who misses one? It’s no secret that it is impossible to test everything: missing bugs is inevitable. If this is typical of your work environment, take it as indicative of the state of relationships in your workplace. Put some serious thought into whether this can be fixed, or whether you should be considering other options.

Finally, it’s worth mentioning that in some cases (for example, outsourced testing) both scripts and approvals are a contractual requirement. In these cases it’s worth running through the contract to determine exactly what’s required, and giving thought to how to prevent this from dumbing down the testing. And have a conversation with whoever writes the contracts!

More to follow…