
The Validity of the Testing Environment

From art experts recognizing a statue as a fake (Gladwell, 2005) to nurses recognizing obscure signs of infection in infants (Klein, 1998), the human mind is capable of performing amazing feats of intuition. As magical as it might appear, intuition is little more than recognition of a situation (Simon, 1992). As expertise develops, the decision maker is able to recognize an increasing array of contexts and match an appropriate response from a repertoire of possible actions. Intuition is also subject to bias: systematic failures of judgment that the decision maker can walk into blindly. Intuition, it seems, is not accompanied by any subjective signal that marks it as valid, and confidence is not a reliable indicator of its validity (Kahneman & Klein, 2009).

Daniel Kahneman and Gary Klein point out that professional environments differ in terms of whether they are conducive to the development of intuitive expertise (Kahneman & Klein, 2009). Whilst firefighters and nurses (amongst others) can develop such expertise, stock-pickers and political pundits do not seem to be able to do so. In the latter case, we would be better served to place our trust in algorithms and other such rational methods than to rely on the advice of experts. What about testers? Can we develop intuitive expertise? Or would we be better off relying on process, decision criteria and other mechanisms? Given the controversy over best practices, and the importance that context-driven testers place on judgment and skill, this would seem to be a fundamental question.

Kahneman and Klein identify two factors that are necessary conditions for the development of expertise: that the environment provides valid contextual cues, and that it provides adequate opportunities to learn through timely and relevant feedback. So how do the environments that we test within stack up against these criteria?

Firstly, let’s look at contextual cues. Does testing provide the kind of cues that facilitate learning? Given that every context is different, that we are not repeatedly producing the same software, one might be forgiven for believing that any such cues provide only weak statistical signals. A comparison might be helpful. Consider chess: both chess and testing are intractable problems. It is widely acknowledged that it is impossible to test everything; similarly, chess has on the order of 10^120 possible games. Whilst this would be infeasible to brute force, grandmasters are able to recognize somewhere between fifty thousand and a hundred thousand patterns, and can select a strong move within a matter of seconds. In short, chess is a paragon of expertise: it provides a valid environment despite the fact that individual cues might be presented only infrequently. We should not mistake complexity for invalidity: the validity of an environment is not solely determined by the frequency with which individual cues occur, but also by the relevance of those cues. For the tester who is willing to listen, there is an abundance of such cues: the values of customers and stakeholders, the interplay between project goals and constraints, the makeup of the technical solution. I’ll discuss contextual cues again in the near future, both at CAST 2012 and in an accompanying post.
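
(As a rough aside, and my own back-of-the-envelope sketch rather than anything from Kahneman and Klein: Shannon’s classic estimate assumes around $10^3$ possibilities for each pair of moves, one by white and one by black, and a typical game of about 40 such pairs, giving a game tree of roughly

\[ (10^3)^{40} = 10^{120} \]

possible games. Hence brute force is hopeless, and pattern recognition wins.)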

Expertise is not experience: without the opportunity to learn, all the experience in the world will not lead to expertise. So let’s turn to learning. The learning opportunities present in any given environment are determined by the quality and speed of the feedback that it provides. In testing, this varies. In some cases we can readily gain feedback, such as our stakeholders’ reactions to our approach, our strategies and the bugs that we find. In other cases it can be difficult to get rapid and relevant feedback: on the bugs that we miss, for example. Sometimes these stay missed for a long time, whilst in some contexts we get feedback of the wrong kind and risk learning the wrong lessons. If feedback takes the form of a witch hunt, an attempt to allocate blame, just what does that teach testers? Even where this is avoided, we often see an emphasis on processes and methods and how they might be improved, rather than a focus on what individual testers might learn from their mistakes. Perhaps an environment in which human judgment has been surrendered to process is one in which the conditions for the development of expertise have been removed. Not only are factory approaches blinkered and dehumanizing, but they might well rob testers of their means of escape: the opportunity to develop expertise. There are, however, some intriguing possibilities. Could we train ourselves to become better observers, so as to be more receptive to the feedback that our environments supply? Is it possible to reconfigure our environments such that they provide better feedback? Dedicated testers can improve their chances of developing expertise by creating and nurturing feedback loops.

Perhaps asking whether the testing environment is conducive to developing expertise is too simplistic a question. Kahneman and Klein identify some professions as fractionated: expertise is displayed in some activities but not in others. Given the uneven nature of feedback, it may well be that testing is one such profession: there may be some activities where it is more appropriate to draw on algorithmic methods than on expertise. Of course, the trick is recognizing when to do so, recognizing the limits of our expertise. And that requires judgment: as James Bach tweeted at the weekend, “In any matter that requires sapience, even algorithmic methods can only be applied heuristically.”


References

  • Gladwell, M. (2005). Blink.
  • Kahneman, D. and Klein, G. (2009). Conditions for Intuitive Expertise: A Failure to Disagree.
  • Klein, G. (1998). Sources of Power: How People Make Decisions.
  • Simon, H. (1992). What is an Explanation of Behavior?

Should I Start Testing?

I recently participated in a Software Testing Club discussion about Quality Gates. This led me to reflect a little more on the subject.

My first test management gig was running system integration testing for an enterprise project. The phase was planned to last for five weeks. System testing for a number of component systems was running late, and that was making the project look bad. The project manager impressed on me the political importance of starting system integration on time.

“System test should be done in another week or so” she said.

“So you should be able to catch that up in the last month of integration”. I reluctantly agreed.

Twenty-five weeks of hell later, I submitted the exit report. Nothing had worked out as expected. Delays in completing some systems had continued. Others that were supposedly complete either had gaping holes where functionality should have been, or turned into a bugfest. Attempting to test and raise system integration-level bugs simply poured fuel on the fire, adding to the confusion. I swore that from that point on I’d only test software that was ready to be tested: that I’d implement strict quality gates and hard entry criteria.

I can understand why Quality Gates have become an article of faith to many testers; the above experience led me in the same direction. Fortunately, experience has also given me a reality check.

Let’s roll the clock forward a few years and consider another project: one on which time to market was the most critical factor. The end date was not going to change: bugs or no bugs. Development was running late, and enforcing entry criteria would almost have guaranteed that no testing would be performed by my team, no feedback would be provided to development, and no bugs would be found or fixed. I abandoned my gates, and helped to identify some major issues that we were able to iron out before we shipped.

In the first example, quality gates might have helped me. In the second, they were utterly inappropriate. The problem with quality gates goes deeper than their relevance to different contexts, however: they are a trivial solution to a complex problem. They seek to reduce the process of answering the question “should I start testing?” to simple box checking. Perhaps the most insidious problem with gates is that they tend to emphasize reasons NOT to test. Thus the tester who has come to believe that testing is like a sewer – what you get out of it depends on what you put into it* – can use them as an excuse not to test. Unit testing incomplete? Don’t test. Major bugs outstanding? Don’t test. Documentation outstanding? Don’t test. All of these miss a vital point: reasons TO test.

Even when test entry criteria have not been fulfilled, there are many good reasons to test:

  • Perhaps you can provide early feedback on items that aren’t quite ready for the big time
  • Perhaps you can learn something new and think of new tests
  • Perhaps you can become more familiar with the software
  • Perhaps you can test your own models and assumptions about the software
  • Perhaps you can test whether your test strategy is workable.

The decision to start testing demands more thought than simple box checking can provide. Instead, questions like these can serve as a guide:

  • If I were to start testing now, what kind of constraints would my testing be under?
  • Given those constraints, what testing missions could I accomplish?
  • For those missions, what kind of value could I provide to the project?
  • What is the value of tasks that I might forego in order to test now?
  • If I were to test now, what other costs might this impose on the project?

Next time you decide whether or not to test, use your brain, not a checklist.

*A nod to Tom Lehrer.


Should I Keep Testing?

Every so often, one of my testers will ask me this question.

It normally goes like this:

  • I’m getting a lot of bugs, should I keep testing?
  • A lot of tests are failing, should I keep testing?
  • A lot of tests are blocked, should I keep testing?
  • The build seems hosed, should I keep testing?
  • Nothing seems to make sense, should I keep testing?

Now, in many of these cases there may be a valid reason to stop testing, but these statements in themselves are not sufficient to make that call.

My response usually goes something like this:

  • Does that prevent you from exploring other parts of the software?
  • Does that prevent you from learning anything new?
  • Does that prevent you from thinking of new ways to test it?
  • Does that prevent you from finding new bugs?
  • Does that prevent you from better understanding the bugs you’ve already found?

If the answer is no, then perhaps it’s not quite time to stop testing.

Selling Tobacco

I recently watched a presentation that Lee Copeland gave in 2007: The Nine Forgettings, which touches on a number of things that he feels testers often forget.

One thing in particular jumped out at me: “forgetting the boundaries”. In this section, Copeland discusses the problems that arise when testers consistently compensate for unacceptable behavior by other project members – such as BAs writing poor requirements, developers handing over code that isn’t unit tested, and PMs calling for insane hours.

I can relate to this, having frequently witnessed the kind of codependent behaviour that Copeland is talking about: testers who shrug and say “that’s just the way it is” are testers who have given up thinking about how things could be better for their customer, the project. Perhaps there are some lines that testers need to draw, some things that we need to push back on.

This left me trying to square a circle.

I also support the context-driven view that as testers we provide a service to the project, that we need to adapt our testing to suit the context within which we operate, and that we should do the best testing that we can with what we are given.

So how do I reconcile these seemingly conflicting views?

Here’s a few heuristics that help me:

Selling tobacco: Sometimes other members of the project will ask us to do something that we disagree with, that we believe will harm the effectiveness of our testing; our customers are asking us to sell them something that we don’t feel is in their interests. However, our customers are responsible adults, and are entitled to make their own decisions. Like selling tobacco, it is appropriate to give a health warning, then make the sale.

Selling crack: Sometimes (and hopefully rarely) we are asked to do something that is simply unethical – such as suppressing information or providing dishonest reports. Just say “no” to drugs.

Selling miracle cures: Last, but by no means least, sometimes we are asked to do the impossible – “ten days’ testing by this afternoon?”. Agreeing to unrealistic expectations is a recipe for disappointment. A grown-up conversation about alternatives is called for.

So, what have you been asked to sell today?

Update: since writing this, I’ve been rereading The Seven Basic Principles of the Context-Driven School. The heuristics above map well to part of Kaner and Bach’s commentary:

Context-driven testing has no room for this advocacy. Testers get what they get, and skilled context-driven testers must know how to cope with what comes their way. Of course, we can and should explain tradeoffs to people, make it clear what makes us more efficient and more effective, but ultimately, we see testing as a service to stakeholders who make the broader project management decisions.

  • Yes, of course, some demands are unreasonable and we should refuse them, such as demands that the tester falsify records, make false claims about the product or the testing, or work unreasonable hours. But this doesn’t mean that every stakeholder request is unreasonable, even some that we don’t like.
  • And yes, of course, some demands are absurd because they call for the impossible, such as assessing conformance of a product with contractually-specified characteristics without access to the contract or its specifications. But this doesn’t mean that every stakeholder request that we don’t like is absurd, or impossible.
  • And yes, of course, if our task is to assess conformance of the product with its specification, we need a specification. But that doesn’t mean we always need specifications or that it is always appropriate (or even usually appropriate) for us to insist on receiving them.

There are always constraints. Some of them are practical, others ethical. But within those constraints, we start from the project’s needs, not from our process preferences.