Tag Archives: Confidence

L’Aquila

On April 6th 2009, an earthquake devastated L’Aquila, a town in central Italy. It killed more than three hundred people, injured thousands and displaced tens of thousands. This tragedy continues to have repercussions: last month, six scientists and a government official were convicted of manslaughter on the grounds that they had failed to properly represent the risks of such an event.

These convictions have drawn considerable media attention, much of it contradictory, some of it misleading. One might be forgiven for believing the absurd: that these men have been found guilty of failing to predict an earthquake. The reality seems to be more complex. However, my intention with this post is simply to draw attention to lessons that this case may hold for the tester. This is neither a detailed account of the facts of the case, nor is it an opinion piece concerning the rights and wrongs of the Italian legal system. If you are seeking such things, then please Google your heart away: you’ll find plenty of accounts and opinion online.

The first thing to strike me about this case is the differing opinions as to the type of information scientists should provide about earthquake risks. In the week prior to the earthquake, these scientists met to assess the probability of an earthquake taking place, given an unusually high level of seismic activity in the preceding weeks. The scientists may have believed that providing such an assessment was the extent of their role: according to an official press release from Italy’s Department of Civil Protection, the purpose of this meeting was to provide local citizens “with all the information available to the scientific community about the seismic activity of recent weeks”. However, a key argument in the prosecution’s case was that the scientists had a legal obligation to provide a risk assessment that took into consideration factors such as population density and the age and structure of local housing. There is a world of difference between assessing the probability of an event and conducting a risk assessment.
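
To make that difference concrete, here is one common framing of the distinction (my gloss, not language used in the case): a probability assessment speaks only to the likelihood of the hazard, whereas a risk assessment also weighs its consequences, roughly

\[
\text{Risk} \;\approx\; P(\text{event}) \times \text{Consequence}(\text{event}),
\]

where the consequence term is driven by exposure (for example, population density) and vulnerability (for example, the age and construction of the local housing). On this framing, the scientists addressed only the first factor; the prosecution argued that they were obliged to address the product.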

Does this sound familiar? Have YOU ever run into a situation where testers and their customers have conflicting views as to the role of the tester? I see these things playing out pretty much every day. Testers providing “sign off” vs. “assuring quality” vs. “providing information”: these are well-known and well-publicized debates that are not going to go away any time soon. To ignore these issues and allow such expectation gaps to persist is to court disaster: such gaps erode and destroy relationships, and it is the tester who loses when unable to live up to the expectations of those they serve. Nor is toeing the line and trying to keep everyone happy a solution. Often testers must deliver difficult messages, and it is impossible to do so whilst playing popularity games. Imagine being the tester who has been cowed into “signing off” on a product that ends up killing someone or causing a financial loss. If you do this, then you deserve what is coming to you.

Now, whilst I strongly subscribe to the view that our role is that of information providers, I have noticed something disturbing lately: testers who seem to feel that their responsibility ends with adding smiling or frowning faces to a dashboard or filing bug reports (“we reported the bug, so now it’s not our problem”). This is wholly inadequate. If our role is to provide information, then handing over a report is not enough. Providing information implies not only acquiring information, but communicating it effectively. A test plan is not communication. A bug report is not communication. Even a conversation is not communication. Only when the information borne by such artifacts is absorbed and understood has communication taken place. This is neurons, not paperwork. I recently had a conversation with a tester who objected to articulating the need to address automation-related technical debt in a way that would be understood by project executives. Perhaps he thought this was simply self-evident. Perhaps the requests of testers should simply be accepted? Perhaps the language of testers is easily understood by executives? I disagree on all counts: testers need to be master communicators. We need to learn to adapt our information to different media and, most importantly, to tailor our message to our many different audiences. Facts rarely speak for themselves; we need to give them a voice.

Another aspect of this case that I find interesting is that Bernardo De Bernardinis, the government official who was convicted, may have had a different agenda from the scientists: to reassure the public. He had motivation to do so: not only had a local resident been making widespread, unofficial earthquake predictions, but in 1985 Giuseppe Zamberletti, a previous head of the Department of Civil Protection, had been investigated for causing panic after ordering several evacuations in Tuscany that proved, in hindsight, unnecessary. Before the meeting described above took place, De Bernardinis told journalists that everything was “normal” and that there was “no danger”. This advice had fatal results: many of the local inhabitants abandoned their traditional earthquake precautions in the belief that they were safe.

This is the kind of reassurance that science cannot, that scientists should not, give. It is the same with testing. Have you ever worked for a project manager, product owner or customer who simply wanted to know that everything would be okay? Of course you have; this is human nature: we crave certainty. Unfortunately, certainty is not a service we can provide. Not if we want to avoid misleading our customers. Not if we value integrity. We are not in the confidence business: we are no more Quality Reassurance than we are Quality Assurance.

Science and testing are often misunderstood, and the customers of both have needs and expectations that cannot be fulfilled by either. Scientists need to do a better job of communicating the limits of what they can do. Testers need to do the same. In the L’Aquila case, the prosecution stated that the accused provided “incomplete, imprecise, and contradictory information”. Often information IS incomplete, imprecise and contradictory. Scientists and testers alike would be well advised not to hide that fact, but to draw attention to it frequently.

Confidence and Uncertainty

Tester 1: “Are you certain?”

Tester 2: “I’m almost certain”

Tester 1: “Then you’re uncertain?”

Tester 2: “No, I, uh…I’m fairly certain”

Tester 1: “So you’re not certain?”

Tester 2: “Dammit, yes I’m certain”

What is the opposite of “certain”? You might think the answer is “uncertain”, but the English language is a tricky beast.

Perhaps adding a little definition would help. There are two forms of certainty:

  • Epistemic certainty, which relates to your state of knowledge: whether you hold a belief for which there are no possible grounds for doubt.
  • Psychological certainty, which relates to your state of mind. Call this your degree of confidence.

Epistemic certainty is an absolute [1] and not a matter of degree, whereas uncertainty is scalar. They cannot be opposites. Imagine a continuum ranging from absolute uncertainty to absolute (epistemic) certainty. Any point on that scale represents a degree of uncertainty. No degree of uncertainty can sensibly be said to be the opposite of epistemic certainty, any more than it is sensible to say that any rational number is the opposite of infinity. The relationship is the same: on this continuum, epistemic certainty is an unattainable construct, much as one cannot count to infinity.

What does this have to do with testing? I’ve written several times on this blog about uncertainty (here, here and here). I’ve also written a little about confidence. Having just read Duncan Nisbet’s post on Michael Bolton’s Let’s Test keynote, I think it’s time to link the two.

When we first approach an item of software, we start from a position of little knowledge. This brings with it a great deal of uncertainty. We read, we ask questions, we test. We build better models in our minds and test those models, constantly refining and enriching them. Our role as testers is to do just this: to learn, and to articulate what we have learned. A natural result of this growth of knowledge is the reduction of uncertainty. This does not mean that we increase epistemic certainty, or that we get closer to it. Moving from ten to one million does not increase one’s proximity to infinity; otherwise infinity would not, by definition, be infinite.
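
A loose way to formalize the analogy, using the extended real number line (my illustration, not part of the original argument): subtracting any finite quantity from infinity leaves infinity unchanged, so the “distance remaining” to epistemic certainty is the same however much we learn:

\[
\infty - 10 \;=\; \infty - 10^{6} \;=\; \infty .
\]

Accumulating knowledge reduces uncertainty in real and useful ways, but it never brings us any closer to certainty in the epistemic sense.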

Is our role as testers to reduce uncertainty? Ultimately yes, I believe that it is, in the epistemic sense at least. What is the value of any item of information that does not reduce uncertainty? If we provide information that has no effect on uncertainty, then we have most likely not provided any new information at all [2]. We might add value by providing information that increases uncertainty, by identifying an unknown that was not previously known to be unknown [3] or that was previously thought to be better known than it is. However, in this sense we are not changing the balance of epistemic uncertainty, but have strayed into the realm of psychological certainty.

Psychological certainty, in contrast to epistemic certainty, is scalar in nature: one can suspect, one can be fairly confident or one can be utterly convinced. In the psychological sense, certain and uncertain are indeed opposites, and an increase in one reduces the other. So when Michael says “A key part of our service is to reduce unwarranted and potentially damaging certainty about the product”, I believe he is talking about psychological certainty [4], and I’d be inclined to agree. How do we do so? By doing what we do: investigating, uncovering and revealing information that runs counter to the unwarranted certainty; in other words, by reducing epistemic uncertainty.

In testing, the danger comes when we blur the distinction between epistemic and psychological certainty. “99% of the tests pass”: does this provide a case to increase our confidence? No. “We’ve found and fixed 1000 bugs”? No. A warrant might justify a belief, but we should be wary of seeing ourselves as providers of warrants that increase psychological certainty. We should certainly not engage in managing confidence. You may be told that one of the purposes of testing is to build confidence and that your practices need to be supportive. If you agree, then you are agreeing to a scam. The most we can do is create an environment in which the confidence of our customers will live or die based on relevant information being made available to the right people when they need it. Their confidence is their business.

Notes:

  • [1] You might ask if I’m certain about this: my answer is no. It is entirely possible that one day some bright spark will solve the problems that have been plaguing philosophers for thousands of years, therefore I have reason to doubt this belief, and therefore I am not certain – in the epistemic sense. I might concede to being certain in the psychological sense, but that’s my problem.
  • [2] Think repetitive regression testing.
  • [3] A Rumsfeld.
  • [4] It makes as much sense to talk about reducing epistemic certainty as it does to talk about – you guessed it – reducing infinity.

User Acceptance Tricks?

Some time ago, I was an SAP consultant. Between projects I configured a variety of demonstration systems for use in presales. These were pretty rough and ready, but they did the job. The trick (and I mean that literally) was to carefully define detailed, step-by-step scripts, test them, and make sure that the demonstrator followed them to the letter. These provided safe pathways: routes through the application that were free of problems. A demonstrator would stray from the path at their peril; the demo would quickly fall apart if they did.

This is analogous to some User Acceptance Testing practices that I’ve observed. Do you recognize this?

The blinkered scripting scam: Acceptance tests will be scripted in advance. They will be traced to requirements. The scripts will be reviewed and pretested by the project. If all the tests pass when executed by the acceptance team, then the software will be accepted.

From a project management perspective this would seem to make sense:

  • It gives the project an opportunity to check that the acceptance team is only testing behavior that is considered to be within the project’s scope.
  • It gives the project an opportunity to make sure that acceptance tests will pass before they are formally executed.
  • It helps to ensure that the acceptance team can begin execution just as soon as the software is ready for them.

This is not testing, nor is it even meaningful checking: pretesting ensures that acceptance test execution will not reveal any new information. This is demonstration, and nothing more. It has consequences:

  • Execution is often the first opportunity that acceptance testers have to get their hands on the software. With little or no opportunity to interact with the software in advance, just how insightful will their preplanned tests be?
  • Bugs don’t neatly line up along the course charted by tests. Nor do they conveniently congregate around requirements or other abstractions of system behavior just waiting to be found. Confirmatory requirements based testing will miss all manner of problems.
  • Pretesting creates safe pathways through the software. If acceptance testing is confined to these tests it can result in an acceptance decision regardless of the hazards that may lurk beyond these paths.
  • Acceptance testers, be they customers, users or their representatives, have the potential to bring important insights to testing. They have a different perspective, one that is often centered on the value that the software could bring. This opportunity is wasted if they are made to follow a process that blinds them.

Acceptance testing is often differentiated from other forms of testing in terms of its purpose: whilst earlier tests are focused on finding problems, acceptance testing is sometimes positioned as a confidence-building exercise. The risk is that acceptance testing becomes a confidence trick.

The good news is that this risk can be mitigated, even whilst checking many of the boxes that will satisfy project management. A couple of years ago I found myself in an unusual position: working for a vendor yet managing all testing, including acceptance by the customer. This presented a potential conflict of interest that I was determined to avoid. The contract was fixed price, and payment was tied to a specific delivery date, so the project manager wanted to adopt practices similar to those described above. Fortunately, he also accepted that doing so risked imposing constraints on the quality of acceptance and was willing to entertain alternatives. We agreed on the following:

  • Domain and application experts would provide training to the acceptance team prior to testing, and would be on hand to provide coaching throughout.
  • User Acceptance Demonstrations would serve to provide basic verification of requirements.
  • This would be supplemented by exploratory testing, which would allow the acceptance testers to kick the tires and bring their own perspectives to bear in a more meaningful way than the scripts alone would allow.
  • A transparent and visibly fair triage process would be implemented, allowing the customer to put forward their own prioritization of bugs, with project management able to intervene should bugs be reported that fell beyond the scope of the project.

Project management had the control they needed over scope. The customer was able to get a good feel for the software and the value it would provide. We were able to identify a number of important bugs that would otherwise have escaped us and become warranty issues. With a little bit of thought, we managed to put the testing back into acceptance testing. Which are you, tester or grifter?