The Validity of the Testing Environment

From art experts recognizing a statue as a fake (Gladwell, 2005) to nurses recognizing obscure signs of infection in infants (Klein, 1998), the human mind is capable of performing amazing feats of intuition. As magical as it might appear, intuition is little more than recognition of a situation (Simon, 1992). As expertise develops, the decision maker is able to recognize an increasing array of contexts and match an appropriate response from a repertoire of possible actions. It is also subject to bias, systematic failures of judgment that the decision maker can walk blindly into. Intuition, it seems, is not accompanied by any subjective signal that marks it as valid, and confidence is not a reliable indicator as to the validity of intuition (Kahneman & Klein, 2009).

Daniel Kahneman and Gary Klein point out that professional environments differ in terms of whether they are conducive to the development of intuitive expertise (Kahneman & Klein, 2009). Whilst firefighters and nurses (amongst others) can develop such expertise, stock-pickers and political pundits do not seem to be able to do so. In the latter case, we would be better served to place our trust in algorithms and other such rational methods than to rely on the advice of experts. What about testers? Can we develop intuitive expertise? Or would we be better off relying on process, decision criteria and other mechanisms?  Given the controversy over best practices, and the importance that context driven testers place on judgment and skill, this would seem to be a fundamental question.

Kahneman and Klein identify two factors that are necessary conditions for the development of expertise: that the environment provides valid contextual cues, and that it provides adequate opportunities to learn through timely and relevant feedback. So how do the environments that we test within stack up against these criteria?

Firstly, let’s look at contextual cues. Does testing provide the kind of cues that facilitate learning? Given that every context is different, that we are not repeatedly producing the same software, one might be forgiven for believing that any such cues provide weak statistical signals. A comparison might be helpful. Consider chess: both chess and testing are intractable problems. It is widely acknowledged that it is impossible to test everything; similarly, the average chess game has in the order of 10120 possible positions. Whilst this would be unfeasible to brute force, grand masters are able to immediately recognize anywhere between fifty and a hundred thousand patterns and select a strong move within a matter of seconds. In short, chess is a paragon for expertise: it provides a valid environment despite the fact that individual cues might be presented only infrequently. We should not mistake complexity for invalidity: the validity of an environment is not solely determined by the frequency with which individual cues occur, but also by the relevance of those cues. For the tester who is willing to listen, there is an abundance of such cues: the values of customer and stakeholder, the interplay between project goals and constraints, the makeup of technical solution. I’ll discuss contextual cues again in the near future, both at CAST 2012 and in an accompanying post.

Expertise is not experience. Without the opportunity to learn, all the experience in the world will not lead to expertise. Let’s turn to that. The learning opportunities present in any given environment are determined by the quality and speed of the feedback that it provides. In testing, this varies. In some cases we are often able to gain feedback, such as our stakeholder’s reactions to our approach, strategies and the bugs that we find. In other cases it can be difficult to get rapid and relevant feedback, for example: on the bugs that we miss. Sometimes these stay missed for a long time, whilst in some contexts we get feedback of the wrong kind and risk learning the wrong lessons. For example if feedback takes the form of a witch hunt, an attempt to allocate blame, just what does that teach testers? Even where this is avoided, we often see an emphasis on processes and methods and how they might be improved, rather than a focus on what individual testers might learn from their mistakes. Perhaps an environment in which human judgment has been surrendered to process is one in which the conditions for the development of expertise have been removed. Not only are factory approaches blinkered and dehumanizing, but they might well rob testers of their means of escape: an opportunity to develop expertise. There are however some intriguing possibilities. Could we train ourselves to become better observers so as to be more receptive to the feedback that our environments supply? Is it possible to reconfigure our environments such that they provide better feedback? Dedicated testers can improve their chances of developing expertise by creating and nurturing feedback loops.

Perhaps asking whether the testing environment is conducive to developing expertise is too simplistic a question. Kahneman and Klein identify some professions as fractionated: where expertise is displayed in some activities but not in others. Given the uneven nature of feedback, it may well be that testing is one such profession: there may be some activities in where it is more appropriate to draw on algorithmic methods than on expertise. Of course, the trick is recognizing when to do so, recognizing the limits of our expertise. And that requires judgment: as James Bach tweeted at the weekend “In any matter that requires sapience, even algorithmic methods can only be applied heuristically.



  • Gladwell, M. (2005). Blink.
  • Kahneman, D. and Klein, G. (2009). Conditions for Intuitive Expertise: A Failure to Disagree.
  • Klein, G. (1999). Sources of Power: How People Make Decisions.
  • Simon, H. (1992). What is an Explanation of Behavior?

2 thoughts on “The Validity of the Testing Environment”

  1. Exceptional post. I agree with you that the validity of an environment is not solely determined by the frequency with which individual cues occur, but also by the relevance of those cues. One of the best post that I ever came across, anyone with software testing background will love to read this.

Leave a Reply

Your email address will not be published. Required fields are marked *