
Doctor, Doctor

No best practice, no silver bullet, no one-size-fits-all solution. How convenient would it be were that not the case? I could simply hand out copies of the relevant textbook and let things take their course. Context dispels that illusion. In a context driven world there is no place for rote answers motivated by comfortable familiarity: context driven testers put context first, then test appropriately. Last year I read a paper (Kaner, 2003) which outlines a schema for evaluating different test design techniques. This planted questions in my mind that I’ve been unable to shake since: How do testers make decisions? How do we select practices in context?

So how do testers make decisions?

Decision-making approaches can be categorized into two groups which align closely with the dual-process characterization of reasoning that has gained popularity over the last twenty years: analytical reasoning and intuitive reasoning. Amongst the analytical group are hypothetico-deductive reasoning, algorithmic approaches and arborization. Whilst many of these may have a place within testing, they make poor tools for the selection of testing practices: we have remarkably little empirical data relating to the effectiveness of different practices in different contexts. The analytical grouping also includes a variety of “rational choice models” for decision making. Bazerman (2005) provides an example (a toy sketch of the final step follows the list):

  1. Define the problem.
  2. Identify the criteria.
  3. Weight the criteria.
  4. Generate alternatives.
  5. Rate each alternative on each criterion.
  6. Compute the optimal decision.
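
To make step six concrete, here is a minimal sketch in Python of the weighted-sum arithmetic that such models prescribe. The criteria, weights and ratings are entirely invented for illustration; real decision support tools dress this arithmetic up considerably.

```python
# Toy illustration of steps 3-6 of a rational choice model. The criteria,
# weights and ratings below are invented purely for illustration.
criteria_weights = {"cost": 0.5, "coverage": 0.3, "speed": 0.2}   # step 3

# Step 5: rate each alternative (1-10) against each criterion.
alternatives = {
    "exploratory testing":  {"cost": 8, "coverage": 6, "speed": 9},
    "scripted regression":  {"cost": 4, "coverage": 7, "speed": 5},
    "automated comparison": {"cost": 5, "coverage": 9, "speed": 7},
}

def weighted_score(ratings):
    """Step 6: collapse the ratings into a single figure using the weights."""
    return sum(criteria_weights[criterion] * rating
               for criterion, rating in ratings.items())

scores = {name: weighted_score(ratings) for name, ratings in alternatives.items()}
print(scores)
print("'Optimal' decision:", max(scores, key=scores.get))
```

Notice how easily the “optimal” answer moves when the weights are nudged; that pliability is at the heart of the gaming I describe below.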

Sound familiar? This kind of model sits at the heart of many decision support systems. I’m sure both the models and the associated software sell well: there’s a general bias towards this kind of thinking in the world of business. We all want to be seen as rational, we all want to be able to justify our decisions and to show that we’ve carefully considered our options before acting. The fact that many such models rely on a healthy dose of process and mathematics lends them an air of credibility. But is this justified? I’m not entirely sure.

My greatest reservation about these models is that they are prescriptive: they describe someone else’s idea of how you should think, as opposed to how you actually do think. These models don’t answer the question “how do testers think?”, and if they don’t do that, can we really rewire our brains to match them? Perhaps to a degree. We can certainly follow the process, but does going through the motions necessarily lead us to “rational” decisions? I think not. I remember one such model being described to me by one of my earliest managers. He talked me through the mechanics, and then coached me on actually using it. When presenting options, he said, always include three: a “do nothing” option, an option that no one in their right mind will ever agree to, and the preferred option. This way, he assured me, my preferred option would always be selected!

Some years later, whilst studying for an MBA, I encountered another such model: I was given a decision support tool and asked to use it to analyze a decision. I chose to analyze my choice of business school, and set about identifying and weighting criteria, selecting candidate schools and scoring them against each criterion. Surprise, surprise, my actual school didn’t come out as the first choice, so I spent hours tweaking the weightings and scores until the tool clearly showed that I had made a rational choice. Rational? No. More like gaming and justification.

You may have experienced a similar phenomenon with decisions made on a coin toss. Say that you want to choose between two options, A and B, and decide to settle it with a “best of three”. A comes out ahead, but you’re not too happy about that…so you call it best of five, then best of seven, and so on until B wins: the choice you really wanted to make all along. Be it bias or intuition, we often have a feel for the choices that we want to make: models be damned. Perhaps there are ways to leverage rational models such that these effects are minimized, but that is a line of enquiry I leave for another day.

Intuitive approaches offer a marked contrast to analytical models. For example, the Recognition Primed Decision model is a descriptive model based on studies of decision making in natural settings. Klein (1999) describes three variations of this model:

  1. On experiencing a situation, a decision maker recognizes it and simultaneously recognizes a course of action that is likely to succeed.
  2. On experiencing a situation, a decision maker recognizes aspects of a situation and employs feature matching until an analogous situation and corresponding action is found.
  3. On experiencing a situation, a decision maker recognizes a situation (either as prototypical or analogous), and then loops through a series of possible actions mentally simulating each one. The decision maker may modify or discard each action, and continues until a workable solution is found. At no point does the decision maker compare options: this evaluation is performed sequentially.

Unlike rational choice models, this model makes no particular claim to seek optimal solutions. Rather, it provides a satisficing (Simon, 1956) strategy. Given the intractable nature of the testing challenge, this could well be appropriate. Is it a reasonable model of how testers think? There is much about it that rings true. How often in the last week have you sat down and consciously made a decision based on comparing options? And how many times this week have you been presented with a situation to which you immediately knew how to react? Or weren’t even aware that you were making a decision at all? I suspect that the conscious comparisons of options will be significantly outnumbered. This model would certainly explain why we often seem to act without conscious thought or find it difficult to explain why we decided on a particular course of action.
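
For contrast with the rational choice sketch above, here is a rough illustration of the third variation in code. Everything in it is invented: the situations, the repertoire and the crude stand-in for mental simulation. The point is only the shape of the process: recognition retrieves candidate actions, which are simulated one at a time until something workable turns up, with no side-by-side comparison.

```python
# Rough sketch of the third RPD variation. The situations, repertoire and the
# stand-in for "mental simulation" are all invented for illustration.

# Expertise as a mapping from recognized situations to a repertoire of
# plausible actions, ordered by typicality.
repertoire = {
    "unstable build":     ["smoke test the build", "pair with the developer",
                           "defer testing until the build settles"],
    "vague requirements": ["interview the stakeholders",
                           "test against a comparable product"],
}

def mentally_simulate(action, constraints):
    """Crude stand-in for mental simulation: an action 'works' unless it
    collides with a known constraint."""
    return not any(constraint in action for constraint in constraints)

def recognition_primed_decision(situation, constraints):
    actions = repertoire.get(situation)
    if actions is None:
        return None                      # no recognition, no intuitive answer
    for action in actions:               # evaluated one at a time, in order
        if mentally_simulate(action, constraints):
            return action                # first workable action wins: satisficing
    return None

# The developer is unavailable, so pairing is off the table.
print(recognition_primed_decision("unstable build", constraints=["pair"]))
# -> 'smoke test the build'; no comparison of alternatives ever takes place
```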

An important aspect of this model is the role of expertise:

  • The decision maker must be able to recognize relevant contextual cues. Without knowing what information to take into consideration, understanding of context will be compromised.
  • The decision maker needs to be able to build a mental model of the context such that they have expectancies as to how it should operate. This can alert the decision maker to anomalies – the presence or absence of cues – that suggest that there is something different about the context.
  • A decision maker’s expertise provides a repertoire of possible actions for direct application or modification.

Of course, in situations where a decision maker lacks the appropriate expertise, this model cannot easily be applied. Confront a decision maker with a wholly unfamiliar situation and, with no recognition, there is little prospect of action. Ever seen someone freeze like a deer in the headlights when facing something new? Perhaps more worrying are situations where decision makers believe that they have appropriate expertise but do not: incorrect recognition and inappropriate responses to context are the likely result. Further, some environments do not lend themselves to the development of expertise (Kahneman and Klein, 2009) because they provide neither reliable cues nor the feedback required to learn them.

Perhaps there is a role for a blended model which allows for the application of either analytical or intuitive approaches. One such model is proposed by Croskerry (2009). Like the Recognition Primed Decision model, it starts with recognition: if a situation is recognized then an intuitive pathway is triggered, otherwise analytical reasoning is invoked. Klein (1999) describes similar behavior in firefighters who encounter a situation that they do not recognize. An important implication of Croskerry’s model is the suggestion that repeated use of the analytical pathway can, in some situations, facilitate the development of recognition and intuition.

By now, unless you attended the CAST 2012 presentation of the same name, you may be wondering what any of this has to do with the title of the post. Last year I was travelling to work on the bus. The passenger in front of me appeared to be a medical student: he was studying presentations on his MacBook that described the indications and contraindications of a variety of treatments. These describe factors that suggest whether a particular diagnostic test, drug or procedure should or should not be applied. Some contraindications are absolute: in their presence there are no reasonable circumstances under which a particular test or treatment is recommended. In contrast, all indications, and many contraindications, are relative. In these cases they serve as heuristics: a medical professional is expected to apply judgment in their application. Traditionally, doctors have been trained in the use of indications and contraindications through rote memorization. More recently there has been a trend towards incorporating these into decision support software and encouraging doctors to “look it up”.

That day, sitting on the bus, I was struck by the potential of this. It seems to me that the vast majority of the time we describe practices in terms of their pros and cons, their relative merits, their benefits and costs: all schemas that lend themselves to comparisons using a rational choice model. Yet here was a different schema, one that seemed to better fit naturalistic decision making, one that provides a direct link between contextual cues and action. Could this be something that we might make use of in testing? Might we use something akin to indications and contraindications to encode the relationship between context and practice? Might this facilitate decision making and the development of expert intuition in testers?

To do so, we need to satisfy two requirements. First, we need to find some way of explicating the cues that correspond to a given course of action. This presents some difficulties, but is not impossible. Let’s take a look at a medical example:

Klein (1999) describes a study conducted by Beth Crandall, one of his colleagues, with a group of neonatal nurses who displayed a remarkable ability to identify the signs of potentially fatal infections in premature babies and provide the appropriate antibiotics. In many cases they were doing so well before their judgments were confirmed by test results. When Crandall asked the nurses how these diagnostic decisions were being made, the nurses were at a loss: “It’s intuition,” she was told. Through a series of interviews that focused on specific cases, Crandall was able to identify a series of cues that indicated infection. Several of these cues had not yet received widespread coverage in the medical literature, and some directly contradicted cues that are relevant in adults, yet a specialist neonatologist later confirmed their relevance.

Making such knowledge visible is only half the battle. Testing decisions, like many medical decisions, are often made under time constraints. There are occasions when we do not have the luxury of referring to encoded knowledge, nor, if we are honest, are many of us inclined to do so. Our second requirement is to sensitize people to contextual cues in such a way that this knowledge can feed their intuitions. Let’s turn once more to a medical example:

Gigerenzer (2008) describes how a group of doctors in a Michigan ER were struggling with the accuracy with which possible heart attack patients were assigned to intensive care. Initially the hospital implemented a computer system which calculated the probability of heart attack based on a small set of the most critical symptoms. This resulted in an immediate improvement in accuracy. The doctors, who would have made great testers, were not content to stop there. Seeking further evidence that the improvements were due to the new approach, they withdrew the system. The result? Accuracy was maintained at the new levels. What had happened? The diagnostic tool the doctors had been given had sensitized them to the most important cues that indicated for intensive care. Cues can be trained.

So how might this work in practice? To date, I’ve attempted to put this to use on one project. This was by no means a scientific study: it was just an attempt to frame decision making along these lines to see if the idea had legs. It is also worth noting that I have deviated from the medical use of indications and contraindications: in the medical context they are used for the selection of actions and not for diagnosis, whereas in the testing context I have used them both for diagnosis (recognition) of context and for the selection of practice.

The project in question involved testing the integration of a third party off-the-shelf application with existing systems via ETL (extract, transform, and load) processes. I have some experience of testing ETL, so was interested in whether I could explicate cues that indicated for or against particular approaches. In this case I recognized the context as a reasonable analogue for one in which I had tested previously. I noted a number of cues:

  • Data integrity would be critical due to the regulatory nature of the solution.
  • Time and quality were the principal project constraints.
  • Many of the transformations between source and destination systems were complex, but were generally well specified.
  • There were a variety of different types of load with subtle variations in logic (e.g. full vs. delta, monthly vs. daily).
  • Project stakeholders had a very strong preference towards testing large volumes of conditioned production data rather than smaller sets of synthetic data.

Recognition of the situation was accompanied by recognition of a course of action: the use of an automated oracle and comparator. This approach involves large-scale mechanized checking. The automated oracle is a parallel implementation that generates expected results for any given set of source data. The comparator then performs a reconciliation of the expected and actual results, with mismatches being potential problems to be investigated further. Some indications and contraindications for this practice are as follows:

Practice: Automated Oracle & Comparator
Indications:
  • Data integrity is critical
  • Testing multiple loads with different data sets
  • Testing large data volumes
Contraindications:
  • Little specification of transformations
Now, just because I was familiar with similar contexts and a course of action that had been successful in the past doesn’t mean that I was necessarily right. I had a reasonable degree of confidence in my intuitive choice, but as Kahneman and Klein (2009) point out, subjective confidence is “an unreliable indication of the validity of intuitive judgments”. Rather than simply leaping in with both feet, I consulted widely, searching for any cues that might suggest that this approach would not work. I asked my team to create a proof of concept. We looked at alternative options and rejected several based on their contraindications:

Practice: Tool Supported Data Inspection
Contraindications:
  • Time or cost is critical

Practice: Data Profiling
Contraindications:
  • Data integrity is critical
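
To picture how such tables might be put to work, here is a hypothetical encoding of these practices as cue sets, screened against the cues observed on the project. The cue wording and the screening rule (any observed contraindication vetoes a practice) are my own invention for illustration, not a proposal for a standard.

```python
# Hypothetical encoding of the practices above as cue sets. The cue wording
# and the screening rule are invented for illustration only.
practices = {
    "automated oracle & comparator": {
        "indications": {"data integrity critical", "multiple load types",
                        "large data volumes"},
        "contraindications": {"little specification of transformations"},
    },
    "tool supported data inspection": {
        "indications": set(),
        "contraindications": {"time or cost critical"},
    },
    "data profiling": {
        "indications": set(),
        "contraindications": {"data integrity critical"},
    },
}

# Cues recognized on this project (see the earlier list).
observed_cues = {"data integrity critical", "multiple load types",
                 "large data volumes", "time or cost critical"}

for name, profile in practices.items():
    vetoes = profile["contraindications"] & observed_cues
    if vetoes:
        print(f"{name}: contraindicated ({', '.join(sorted(vetoes))})")
    else:
        matched = profile["indications"] & observed_cues
        print(f"{name}: candidate ({len(matched)} indication(s) observed)")
```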

Ultimately we proceeded with the automated oracle and comparator, but we had a surprise waiting for us. There are a number of ways one can perform such reconciliation. Our approach was to use key matching to isolate the relative complements (giving us record-level mismatches) and the intersection (which was then subject to further comparison to isolate field-level mismatches). Shortly after we started execution, we ran into a problem: some of the tables used keys that we could not reliably predict during result generation, meaning that key matching was not going to work in these cases. At this more granular level of detail we had missed an important cue.
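
For the curious, here is a stripped-down sketch of the reconciliation we were attempting. The records and fields are invented, and the real implementation worked at database scale rather than on in-memory dictionaries, but the set logic is the same: key matching splits the expected and actual data sets into record-level mismatches (the relative complements) and candidate matches (the intersection), which are then compared field by field.

```python
# Stripped-down reconciliation sketch. Records and fields are invented, and the
# real implementation worked at database scale, not on in-memory dictionaries.
expected = {   # generated by the automated oracle (the parallel implementation)
    "CUST-1": {"name": "Ada",    "balance": 100},
    "CUST-2": {"name": "Grace",  "balance": 250},
    "CUST-4": {"name": "Edsger", "balance": 75},
}
actual = {     # loaded into the target system by the ETL under test
    "CUST-1": {"name": "Ada",    "balance": 100},
    "CUST-2": {"name": "Grace",  "balance": 999},   # field-level mismatch
    "CUST-3": {"name": "Alan",   "balance": 50},    # record-level mismatch
}

missing    = expected.keys() - actual.keys()    # relative complement: never loaded
unexpected = actual.keys() - expected.keys()    # relative complement: never predicted
common     = expected.keys() & actual.keys()    # intersection: compare field by field

field_mismatches = [(key, field, expected[key][field], actual[key][field])
                    for key in common
                    for field in expected[key]
                    if expected[key][field] != actual[key][field]]

print("missing records:   ", sorted(missing))
print("unexpected records:", sorted(unexpected))
print("field mismatches:  ", field_mismatches)
```

The keys on the left-hand side are where we came unstuck: for the problem tables they could not be predicted at result generation time, so neither the complements nor the intersection could be trusted.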

It was time to find an alternative, and I convened the team for a workshop. This time I wanted to see whether the approach could be used in an analytical mode. We identified and evaluated options, but rather than performing a comparative evaluation we again focused on contraindications:

Practice: Proxy key (identify a unique field in the source and match on that)
Contraindications:
  • Relationship between source and target is not 1:1

Practice: Composite key (identify a combination of fields that guarantees uniqueness and match on them)
Contraindications:
  • No combination of fields guarantees uniqueness
  • Conditional logic in the transformations of the fields used in the composite key
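
In code terms, the difference between these options is simply the function that derives the match key for the reconciliation; something like the hypothetical sketch below, in which the field names are invented.

```python
# Hypothetical key derivation functions for the reconciliation sketch above;
# the field names are invented.
def proxy_key(row):
    """A single source field that is unique and survives the transformation."""
    return row["policy_number"]

def composite_key(row, fields=("customer_id", "product_code", "effective_date")):
    """A combination of fields that together guarantee uniqueness."""
    return tuple(row[f] for f in fields)

row = {"policy_number": "P-42", "customer_id": "C-7",
       "product_code": "TERM", "effective_date": "2012-07-01"}
print(proxy_key(row))      # P-42
print(composite_key(row))  # ('C-7', 'TERM', '2012-07-01')
```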

Understanding the contraindications of these options enabled us to quickly determine which practice should be applied to which tables. Then an additional benefit took me by surprise: when asked to give an account of our workaround, test framing was easier than expected. Compare the following attempts at framing:

  • Using contraindications: “For tables 1 and 2, we couldn’t match on primary keys because they were unpredictable. For table 1 we used a proxy key; however, for table 2 there is no one-to-one relationship between the source and target, so we chose to use a composite key.”
  • Using comparative evaluation: “Key matching is the preferred option because we have already developed the code. In contrast, additional effort will be required to use a proxy key, and even more in the case of a composite key. Key matching  also provides a more reliable match than a composite key would, because if there are bugs in the logic of any of the fields used as a composite key we risk experiencing false positives, and in some cases – where the logic for a field is inverted (think not equals instead of equals) – false negatives. Um, er…where was I? Oh yes, we’re using a proxy key for table 1 and a composite key for table 2.”

The latter is rather overwhelming, whereas the former is more digestible and more conducive to communication. I found that it feels more natural to have a conversation about the situations in which one would, or would not, apply a set of practices than to talk about their relative merits: it connects immediately to context.

In summary, this approach seems to have had some benefits: it helped to rationalize and articulate the situations in which different ETL testing approaches would be appropriate, it helped to streamline the comparative analysis of different options when searching for workable solutions to a problem, and it helped to enhance the clarity of test framing. However, it is too early to determine whether it will also help to sensitize people to relevant contextual cues. Whilst I suspect that it will pay dividends in terms of how my team looks at similar problems in the future, and may be useful in training, I have yet to find an opportunity to put that to the test.

We should exercise a degree of caution, though. I’ve borrowed indications and contraindications, as well as some interesting examples, from medicine, but we should not simply accept these because medicine has an air of authority. Doctors are people too; they are subject to the same biases that we are, and they make mistakes: expert intuition is not infallible. Groopman (2007) describes a number of examples, and points to a 1995 study indicating that approximately 15% of all medical cases in the U.S. are misdiagnosed. We should also be wary of claiming the relevance of indications and contraindications across contexts. Two contexts might be similar in all but a handful of ways, but if those differences are relevant to a given practice in ways that are not understood then we risk missing important cues. The last thing we want to do is open a back door to best practice. Even in medicine, where there are bodies that maintain “authoritative” lists of indications and contraindications, there is a trend towards making decisions that blend such factors with the individual context of the patient.

For now, I have more questions than I started with:

  • Does the Recognition Primed Decision model really provide a reasonable description of how testers think?
  • Which aspects of testing lend themselves to analytical approaches? To intuitive approaches?
  • Can a hybrid decision making model facilitate the training of intuition?
  • Is there value in exploring the indications and contraindications used by expert testers?
  • Can indications and contraindications add value in the training of testers?

What do you think? Does this warrant further study?

 

References

  • Bazerman, M.H. (2005). Judgment in Managerial Decision Making.
  • Croskerry, P. (2009). A Universal Model for Diagnostic Reasoning.
  • Gigerenzer, G. (2008). Gut Feelings.
  • Groopman, J. (2007). How Doctors Think.
  • Kahneman, D. and Klein, G. (2009). Conditions for Intuitive Expertise: A Failure to Disagree.
  • Kaner, C. (2003). What is a good test case?
  • Klein, G. (1999). Sources of Power: How People Make Decisions.
  • Simon, H. (1956). Rational Choice and the Structure of the Environment. 

The Validity of the Testing Environment

From art experts recognizing a statue as a fake (Gladwell, 2005) to nurses recognizing obscure signs of infection in infants (Klein, 1999), the human mind is capable of performing amazing feats of intuition. As magical as it might appear, intuition is little more than recognition of a situation (Simon, 1992). As expertise develops, the decision maker is able to recognize an increasing array of contexts and match an appropriate response from a repertoire of possible actions. Intuition is also subject to bias: systematic failures of judgment that the decision maker can walk blindly into. Intuition, it seems, is not accompanied by any subjective signal that marks it as valid, and confidence is not a reliable indicator of its validity (Kahneman & Klein, 2009).

Daniel Kahneman and Gary Klein point out that professional environments differ in terms of whether they are conducive to the development of intuitive expertise (Kahneman & Klein, 2009). Whilst firefighters and nurses (amongst others) can develop such expertise, stock-pickers and political pundits do not seem to be able to do so. In the latter cases, we would be better served by placing our trust in algorithms and other such rational methods than by relying on the advice of experts. What about testers? Can we develop intuitive expertise? Or would we be better off relying on process, decision criteria and other mechanisms? Given the controversy over best practices, and the importance that context driven testers place on judgment and skill, this would seem to be a fundamental question.

Kahneman and Klein identify two factors that are necessary conditions for the development of expertise: that the environment provides valid contextual cues, and that it provides adequate opportunities to learn through timely and relevant feedback. So how do the environments that we test within stack up against these criteria?

Firstly, let’s look at contextual cues. Does testing provide the kind of cues that facilitate learning? Given that every context is different, that we are not repeatedly producing the same software, one might be forgiven for believing that any such cues provide only weak statistical signals. A comparison might be helpful. Consider chess: both chess and testing are intractable problems. It is widely acknowledged that it is impossible to test everything; similarly, there are on the order of 10^120 possible ways a game of chess can play out. Whilst this would be unfeasible to brute force, grandmasters are able to immediately recognize anywhere between fifty and a hundred thousand patterns and select a strong move within a matter of seconds. In short, chess is a paragon of expertise: it provides a valid environment despite the fact that individual cues might be presented only infrequently. We should not mistake complexity for invalidity: the validity of an environment is not determined solely by the frequency with which individual cues occur, but also by the relevance of those cues. For the tester who is willing to listen, there is an abundance of such cues: the values of customers and stakeholders, the interplay between project goals and constraints, the makeup of the technical solution. I’ll discuss contextual cues again in the near future, both at CAST 2012 and in an accompanying post.

Expertise is not experience. Without the opportunity to learn, all the experience in the world will not lead to expertise. Let’s turn to that. The learning opportunities present in any given environment are determined by the quality and speed of the feedback that it provides. In testing, this varies. In some cases we are able to gain rapid feedback, such as our stakeholders’ reactions to our approach, our strategies and the bugs that we find. In other cases it can be difficult to get rapid and relevant feedback, for example on the bugs that we miss. Sometimes these stay missed for a long time, whilst in some contexts we get feedback of the wrong kind and risk learning the wrong lessons. For example, if feedback takes the form of a witch hunt, an attempt to allocate blame, just what does that teach testers? Even where this is avoided, we often see an emphasis on processes and methods and how they might be improved, rather than a focus on what individual testers might learn from their mistakes. Perhaps an environment in which human judgment has been surrendered to process is one in which the conditions for the development of expertise have been removed. Not only are factory approaches blinkered and dehumanizing, but they might well rob testers of their means of escape: an opportunity to develop expertise.

There are, however, some intriguing possibilities. Could we train ourselves to become better observers so as to be more receptive to the feedback that our environments supply? Is it possible to reconfigure our environments such that they provide better feedback? Dedicated testers can improve their chances of developing expertise by creating and nurturing feedback loops.

Perhaps asking whether the testing environment is conducive to developing expertise is too simplistic a question. Kahneman and Klein identify some professions as fractionated: expertise is displayed in some of their activities but not in others. Given the uneven nature of feedback, it may well be that testing is one such profession: there may be some activities in which it is more appropriate to draw on algorithmic methods than on expertise. Of course, the trick is recognizing when to do so, recognizing the limits of our expertise. And that requires judgment: as James Bach tweeted at the weekend, “In any matter that requires sapience, even algorithmic methods can only be applied heuristically.”

 

References

  • Gladwell, M. (2005). Blink.
  • Kahneman, D. and Klein, G. (2009). Conditions for Intuitive Expertise: A Failure to Disagree.
  • Klein, G. (1999). Sources of Power: How People Make Decisions.
  • Simon, H. (1992). What is an Explanation of Behavior?