No best practice, no silver bullet, no one-size-fits-all solution. How convenient would it be were that not the case? I could simply hand out copies of the relevant textbook and let things take their course. Context dispels that illusion. In a context-driven world there is no place for rote answers motivated by comfortable familiarity: context-driven testers put context first, then test appropriately. Last year I read a paper (Kaner, 2003) that outlines a schema for evaluating different test design techniques. It planted questions in my mind that I’ve been unable to shake since: How do testers make decisions? How do we select practices in context?
So how do testers make decisions?
Decision making approaches can be categorized into two groups which align closely with the dual process characterization of reasoning that has gained popularity over the last twenty years: analytical reasoning and intuitive reasoning. Amongst the analytical group are hypothetico-deductive reasoning, algorithmic approaches and arborization. Whilst many of these may have a place within testing, they make poor tools for selecting testing practices: we have remarkably little empirical data relating to the effectiveness of different practices in different contexts. The analytical grouping also includes a variety of “rational choice models” for decision making. Bazerman (2005) provides an example:
- Define the problem.
- Identify the criteria.
- Weight the criteria.
- Generate alternatives.
- Rate each alternative on each criterion.
- Compute the optimal decision.
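The steps above amount to a weighted scoring computation, and can be sketched in a few lines. The criteria, weights and ratings below are invented purely for illustration:

```python
# A sketch of a rational choice model as weighted scoring.
# All criteria, weights and ratings are invented for illustration.

def optimal_choice(weights, ratings):
    """Return the alternative with the highest weighted score."""
    def score(alternative):
        return sum(weights[c] * r for c, r in ratings[alternative].items())
    return max(ratings, key=score)

# Steps 2-3: identify and weight the criteria (weights sum to 1 here).
weights = {"coverage": 0.5, "cost": 0.2, "speed": 0.3}

# Steps 4-5: generate alternatives and rate each on each criterion (1-10).
ratings = {
    "exploratory": {"coverage": 7, "cost": 8, "speed": 9},
    "scripted":    {"coverage": 8, "cost": 4, "speed": 3},
}

# Step 6: compute the "optimal" decision.
print(optimal_choice(weights, ratings))  # → exploratory
```

Note how easily this is gamed: nudge the weights or ratings and any preferred alternative can be made to come out on top, exactly as in the anecdotes that follow.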
Sound familiar? This kind of model sits at the heart of many decision support systems. I’m sure both the models and the associated software sell well: there’s a general bias towards this kind of thinking in the world of business. We all want to be seen as rational, to be able to justify our decisions and to show that we’ve carefully considered our options before acting. The fact that such models often rely on a healthy dose of process and mathematics lends them an air of credibility. But is this justified? I’m not entirely sure. My greatest reservation about these models is that they are prescriptive: they describe someone else’s idea of how you should think, as opposed to how you actually do think. They don’t answer the question “how do testers think?”, and if they don’t do that, can we rewire our brains to match them? Perhaps to a degree. We can certainly follow the process, but does going through the motions necessarily lead us to “rational” decisions? I think not.

I remember one such model being described to me by one of my earliest managers. He talked me through the mechanics, then coached me on actually using it. When presenting options, he said, always include three: a “do nothing” option, an option that no-one in their right mind will ever agree to, and the preferred option. This way, he assured me, my preferred option would always be selected!

Some years later, whilst studying for an MBA, I encountered another such model: I was given a decision support tool and asked to use it to analyze a decision. I chose to analyze my choice of business school, and set about identifying and weighting criteria, selecting candidate schools and scoring them against each criterion. Surprise, surprise, my actual school didn’t come out as the first choice, so I spent hours tweaking the weightings and scores until the tool clearly showed that I had made a rational choice. Rational? No. More like gaming and justification.
You may have experienced a similar phenomenon with decisions made on a coin toss. Say that you want to choose between two options, A and B. You decide to make a “best of three” decision. A comes out ahead, but you’re not too happy about that…so you call it best of five, then best of seven, and so on until B wins: the choice you really wanted to make all along. Be it bias or intuition, we often have a feel for the choices that we want to make: models be damned. Perhaps there are ways to leverage the rational models such that these effects can be minimized, but that is a line of enquiry that I leave for another day.
Intuitive approaches offer a marked contrast to analytical models. For example, the Recognition Primed Decision model is a descriptive model based on studies of decision making in natural settings. Klein (1999) describes three variations of this model:
- On experiencing a situation, a decision maker recognizes it and simultaneously recognizes a course of action that is likely to succeed.
- On experiencing a situation, a decision maker recognizes aspects of a situation and employs feature matching until an analogous situation and corresponding action is found.
- On experiencing a situation, a decision maker recognizes a situation (either as prototypical or analogous), and then loops through a series of possible actions mentally simulating each one. The decision maker may modify or discard each action, and continues until a workable solution is found. At no point does the decision maker compare options: this evaluation is performed sequentially.
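The third variation can be caricatured in a few lines of code: candidate actions are mentally simulated one at a time, and the first workable one wins, with no side-by-side comparison. The situation, repertoire and simulation here are toy inventions for illustration only:

```python
# A toy sketch of the third Recognition Primed Decision variation:
# sequential, satisficing evaluation of actions from a repertoire.
# All names and data are invented for illustration.

def rpd_decide(situation, repertoire, simulate):
    """Evaluate candidate actions one at a time; return the first that works."""
    for action in repertoire:           # ordered by recognition, not ranked
        if simulate(situation, action) == "workable":
            return action               # satisfice: stop at the first success
        # otherwise modify or discard the action and move on
    return None                         # no recognition, no action

# Toy mental simulation: only actions the situation "affords" are workable.
simulate = lambda s, a: "workable" if a in s["affords"] else "fails"
situation = {"affords": {"regression suite", "pair testing"}}

choice = rpd_decide(situation,
                    ["ad hoc tour", "pair testing", "regression suite"],
                    simulate)
print(choice)  # the first workable action encountered, not the "best" one
```

The key point the sketch makes: “regression suite” is never even considered, because “pair testing” was simulated first and found workable.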
Unlike rational choice models, this model makes no particular claim to seek optimal solutions. Rather, it provides a satisficing (Simon, 1956) strategy. Given the intractable nature of the testing challenge, this could well be appropriate. Is it a reasonable model of how testers think? There is much about it that rings true. How often in the last week have you sat down and consciously made a decision based on comparing options? And how many times this week have you been presented with a situation to which you immediately knew how to react? Or weren’t even aware that you were making a decision at all? I suspect that the conscious comparisons of options will be significantly outnumbered. This model would certainly explain why we often seem to act without conscious thought or find it difficult to explain why we decided on a particular course of action.
An important aspect of this model is the role of expertise:
- The decision maker must be able to recognize relevant contextual cues. Without knowing what information to take into consideration, understanding of context will be compromised.
- The decision maker needs to be able to build a mental model of the context such that they have expectancies as to how it should operate. This can alert the decision maker to anomalies – the presence or absence of cues – that suggest that there is something different about the context.
- A decision maker’s expertise provides a repertoire of possible actions for direct application or modification.
Of course, in situations where a decision maker lacks the appropriate expertise, this model cannot easily be applied. Apply it to a wholly unfamiliar situation: with no recognition there is little prospect for action. Ever seen someone freeze like a deer in the headlights when facing something new? Perhaps more worrying are situations where decision makers believe that they have appropriate expertise but do not: incorrect recognition and inappropriate responses to context are the likely result. Further, some environments do not lend themselves to the development of expertise (Kahneman and Klein, 2009) because they provide neither reliable cues nor the feedback required to learn them.
Perhaps there is a role for a blended model which allows for the application of either analytical or intuitive approaches. One such model is proposed by Croskerry (2009). Like the Recognition Primed Decision model, it starts with recognition. If a situation is recognized then an intuitive pathway is triggered, otherwise analytical reasoning is invoked. Klein (1999) describes similar behavior in firefighters who encounter a situation that they do not recognize. An important implication of Croskerry’s model is the suggestion that repeated use of the analytical pathway can, in some situations, facilitate the development of recognition and intuition.
By now, unless you attended the CAST 2012 presentation of the same name, you may be wondering what any of this has to do with the title of the post. Last year I was travelling to work on the bus. The passenger in front of me appeared to be a medical student: he was studying presentations on his MacBook that described the indications and contraindications of a variety of treatments. These describe factors that suggest whether a particular diagnostic test, drug or procedure should or should not be applied. Some contraindications are absolute: in their presence there are no reasonable circumstances under which a particular test or treatment is recommended. In contrast, all indications, and many contraindications, are relative. In these cases they serve as heuristics: a medical professional is expected to apply judgment in their application. Traditionally, doctors have been trained in the use of indications and contraindications through rote memorization. More recently there have been trends to incorporate these into decision support software and to encourage doctors to “look it up”.
That day, sitting on the bus, I was struck by the potential of this. It seems to me that the vast majority of the time we describe practices in terms of their pros and cons, their relative merits, their benefits and costs: all schemas that lend themselves to comparisons using a rational choice model. Yet here was a different schema, one that seemed to better fit naturalistic decision making, one that provides a direct link between contextual cues and action. Could this be something that we might make use of in testing? Might we use something akin to indications and contraindications to encode the relationship between context and practice? Might this facilitate decision making and the development of expert intuition in testers?
To do so, we need to satisfy two requirements. First we need to find some way of explicating the cues that correspond with a given course of action. This presents some difficulties, but is not impossible. Let’s take a look at a medical example:
Klein (1999) describes a study conducted by Beth Crandall, one of his colleagues. This study was with a group of neonatal nurses who displayed a remarkable ability to identify the signs of potentially fatal infections in premature babies and provide the appropriate antibiotics. In many cases they were doing so well before their judgments were confirmed by test results. When Crandall asked the nurses how these diagnostic decisions were being made, the nurses were at a loss: “It’s intuition” she was told. Through a series of interviews that focused on specific cases, Crandall was able to identify a series of cues that indicated infection. Several of these cues had not yet received widespread coverage in medical literature, some were in direct contradiction to cues that were relevant in adults, yet a specialist neonatologist later confirmed their relevance.
Making such knowledge visible is only half the battle. Often testing decisions, like many medical decisions, are made under time constraints. There are occasions when we do not have the luxury of referring to encoded knowledge, nor, if we are honest, are many of us inclined to do so. Our second requirement is to sensitize people to contextual cues in such a way that they feed their intuitions. Let’s turn once more to a medical example:
Gigerenzer (2008) describes how a group of doctors in a Michigan ER were struggling with the accuracy with which possible heart attack patients were assigned to intensive care. Initially the hospital implemented a computer system which calculated the probability of heart attack based on a small set of the most critical symptoms. This resulted in an immediate improvement in accuracy. The doctors, who would have made great testers, were not content to stop there. Seeking further evidence that the improvements were due to the new approach they withdrew the system. The result? Accuracy was maintained at the new levels. What had happened here? The diagnostic tool which the doctors had been provided with had sensitized them to the most important cues that indicated for intensive care. Cues can be trained.
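Gigerenzer describes tools of this kind as “fast and frugal”: a handful of critical cues checked in order, each capable of settling the decision on its own. A hypothetical sketch of the shape follows; the cues and their ordering are invented for illustration and are not the actual Michigan protocol:

```python
# A hypothetical "fast and frugal" decision tree of the kind Gigerenzer
# describes: each cue, checked in order, can settle the decision alone.
# The cues below are invented; this is NOT the real triage protocol.

def assign_care(patient):
    if patient["st_segment_change"]:              # most critical cue decides alone
        return "intensive care"
    if patient["chief_complaint"] != "chest pain":
        return "regular bed"
    if patient["other_risk_factor"]:              # any remaining critical cue
        return "intensive care"
    return "regular bed"

patient = {"st_segment_change": False,
           "chief_complaint": "chest pain",
           "other_risk_factor": True}
print(assign_care(patient))  # → intensive care
```

The structure is the point: a short, ordered list of cues is easy to internalize, which is plausibly why the doctors’ accuracy survived the withdrawal of the system.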
So how might this work in practice? To date, I’ve attempted to put this to use on one project. This was by no means a scientific study: it was just an attempt to frame decision making along these lines to see if the idea had legs. It is also worth noting that I have deviated from the medical use of indications and contraindications: in the medical context they are used for the selection of actions and not for diagnosis, whereas in the testing context I have used them both for diagnosis (recognition) of context and for the selection of practice.
The project in question involved testing the integration of a third party off-the-shelf application with existing systems via ETL (extract, transform, and load) processes. I have some experience of testing ETL, so was interested in whether I could explicate cues that indicated for or against particular approaches. In this case I recognized the context as a reasonable analogue for one in which I had tested previously. I noted a number of cues:
- Data integrity would be critical due to the regulatory nature of the solution.
- Time and quality were the principal project constraints.
- Many of the transformations between source and destination systems were complex, but were generally well specified.
- There were a variety of different types of load with subtle variations in logic (e.g. full vs. delta, monthly vs. daily).
- Project stakeholders had a very strong preference towards testing large volumes of conditioned production data rather than smaller sets of synthetic data.
Recognition of the situation was accompanied by recognition of a course of action: the use of an automated oracle and comparator. This approach involves large scale mechanized checking. The automated oracle is a parallel implementation that generates expected results for any given set of source data. The comparator then performs a reconciliation of the expected and actual results, with mismatches being potential problems to be investigated further. Some indications and contraindications for this practice are as follows:
*[Table: Automated Oracle & Comparator]*
Now, just because I was familiar with similar contexts and with a course of action that had been successful in the past doesn’t mean that I was necessarily right. I had a reasonable degree of confidence in my intuitive choice, but as Kahneman and Klein (2009) point out, subjective confidence is “an unreliable indication of the validity of intuitive judgments”. Rather than simply leaping in with both feet, I consulted widely, searching for any cues that might suggest that this approach would not work. I asked my team to create a proof of concept. We looked at alternative options and rejected several based on their contraindications:
*[Table: Tool Supported Data Inspection]*
Ultimately we proceeded with the automated oracle and comparator, but we had a surprise waiting for us. There are a number of ways one can perform such reconciliation. Our approach was to use key matching to isolate the relative complements (giving us record level mismatches) and the intersection (which was then subject to further comparison to isolate field level mismatches). Shortly after we started execution, we ran into a problem: some of the tables used keys that we could not reliably predict during result generation, meaning that key matching was not going to work in these cases. At this more granular level of detail we had missed an important cue.
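The reconciliation described above can be sketched in a few lines. The data shapes are invented for illustration; a real comparator would operate on database tables rather than in-memory dicts:

```python
# A minimal sketch of key-matching reconciliation: the relative
# complements give record-level mismatches, and the intersection is
# compared field by field. The sample data is invented for illustration.

def reconcile(expected, actual):
    """Compare two {key: row} datasets; return record- and field-level mismatches."""
    missing = set(expected) - set(actual)      # relative complement: expected only
    unexpected = set(actual) - set(expected)   # relative complement: actual only
    # Intersection: compare matched records field by field.
    diffs = {}
    for key in set(expected) & set(actual):
        changed = {f: (expected[key][f], actual[key][f])
                   for f in expected[key]
                   if expected[key][f] != actual[key][f]}
        if changed:
            diffs[key] = changed
    return missing, unexpected, diffs

expected = {1: {"name": "Ann", "dob": "1980-01-01"},
            2: {"name": "Bob", "dob": "1975-06-30"}}
actual   = {2: {"name": "Bob", "dob": "1975-07-30"},
            3: {"name": "Cat", "dob": "1990-12-12"}}

missing, unexpected, diffs = reconcile(expected, actual)
print(missing, unexpected, diffs)
```

The sketch also makes the failure mode concrete: everything hinges on the keys of `expected` being predictable, which is exactly the cue we missed.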
It was time to find an alternative, and I convened the team for a workshop. This time I wanted to see whether the approach could be used in an analytical mode. We identified and evaluated options, but rather than doing a comparative evaluation we again focused on contraindications:
- *[Table: Proxy key: identify a unique field in the source and match on that]*
- *[Table: Composite key: identify a combination of fields that guarantee uniqueness and match on them]*
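A minimal sketch of the composite key option, with invented field names: rows are indexed on a tuple of fields, and uniqueness of the combination is checked as the index is built:

```python
# A sketch of composite-key matching: when the primary key is
# unpredictable, index on a tuple of fields that together should be
# unique. Field names and data are invented for illustration.

def composite_index(rows, key_fields):
    """Index rows by a tuple of field values; fail if the combination isn't unique."""
    index = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key in index:
            raise ValueError(f"composite key not unique: {key}")
        index[key] = row
    return index

rows = [{"account": "A1", "date": "2012-07-01", "amount": 100},
        {"account": "A1", "date": "2012-07-02", "amount": 150}]

idx = composite_index(rows, ["account", "date"])
print(idx[("A1", "2012-07-02")]["amount"])  # → 150
```

The contraindication is visible in the code: a bug in the logic of any field used in the key produces a tuple that matches the wrong record, or none at all, hence the false positives and negatives discussed below.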
Understanding the contraindications of these options enabled us to quickly determine which practice should be applied to which tables. Then an additional benefit took me by surprise: when asked to give an account of our workaround, test framing was easier than expected. Compare the following attempts at framing:
- Using contraindications: “For tables 1 and 2, we couldn’t match on primary keys because they were unpredictable. For table 1 we used a proxy key, however for table 2 there is no one to one relationship between the source and target. Therefore we chose to use a composite key.”
- Using comparative evaluation: “Key matching is the preferred option because we have already developed the code. In contrast, additional effort will be required to use a proxy key, and even more in the case of a composite key. Key matching also provides a more reliable match than a composite key would, because if there are bugs in the logic of any of the fields used as a composite key we risk experiencing false positives, and in some cases – where the logic for a field is inverted (think not equals instead of equals) – false negatives. Um, er…where was I? Oh yes, we’re using a proxy key for table 1 and a composite key for table 2.”
The latter is rather overwhelming, whereas the former is more digestible and more conducive to communication. I found that it feels more natural to have a conversation about situations in which one would, or would not, apply a set of practices than talk about their relative merits: it connects immediately to context.
In summary, this approach seems to have had some benefits: it helped to rationalize and articulate situations in which different ETL testing approaches would be appropriate, it helped to streamline the comparative analysis of different options when searching for workable solutions to a problem, and it helped to enhance the clarity of test framing. However, it is too early to determine whether it will also help to sensitize people to relevant contextual cues. Whilst I suspect that it will pay dividends in terms of how my team looks at similar problems in the future, and may be useful in training, I have yet to find an opportunity to put that to the test.

We should exercise a degree of caution though. I’ve borrowed indications and contraindications, as well as some interesting examples, from medicine, but we should not simply accept these because medicine has an air of authority. Doctors are people too: they are subject to the same biases that we are, and they make mistakes. Expert intuition is not infallible. Groopman (2007) describes a number of examples, and points to a 1995 study indicating that approximately 15% of all medical cases in the U.S. are misdiagnosed.

We should also be wary of claiming the relevance of indications and contraindications across contexts. Two contexts might be similar in all but a handful of ways, but if those differences are relevant to a given practice in ways that are not understood then we risk missing important cues. The last thing we want to do is open a back door to best practice. Even in medicine, where there are bodies that maintain “authoritative” lists of indications and contraindications, there is a trend towards blending such factors with the individual context of the patient.
For now, I have more questions than I started with:
- Does the Recognition Primed Decision model really provide a reasonable description of how testers think?
- Which aspects of testing lend themselves to analytical approaches? To intuitive approaches?
- Can a hybrid decision making model facilitate the training of intuition?
- Is there value in exploring the indications and contraindications used by expert testers?
- Can indications and contraindications add value in the training of testers?
What do you think: does this warrant further study?
- Bazerman, M.H. (2005). Judgment in Managerial Decision Making.
- Croskerry, P. (2009). A Universal Model for Diagnostic Reasoning.
- Gigerenzer, G. (2008). Gut Feelings.
- Groopman, J. (2007). How Doctors Think.
- Kahneman, D. and Klein, G. (2009). Conditions for Intuitive Expertise: A Failure to Disagree.
- Kaner, C. (2003). What is a good test case?
- Klein, G. (1999). Sources of Power: How People Make Decisions.
- Simon, H. (1956). Rational Choice and the Structure of the Environment.