
Dear Paul

Paul,

I’d like to thank you for your kind words regarding my recent post. I agree with your assertion that there are a number of factors at work that will influence whether a tester will notice more than a machine, and I’d love to know more about your case study.

I suspect that we are closely aligned when it comes to machine checking. One of the main benefits of making the checking/testing distinction is that it serves to highlight what is lost when one emphasizes checking at the expense of testing, or when one substitutes mechanized checks for human ones. I happened to glance at my dog-eared copy of the Test Heuristics Cheat Sheet today, and one item leapt out at me: “The narrower the view, the wider the ignorance”. Human checks have a narrower view than testing, and mechanized checks are narrower still. We need to acknowledge these tradeoffs, and manage them accordingly.

I think we need to be careful about the meanings that we give to the word “check”. You say the usage that you have observed the most is when “talking about unmotivated, disinterested manual testers with little domain knowledge” or when “talking about machine checking”. Checking, in and of itself, is not a bad thing: rather, checks are essential tools. Further, checking, or more accurately the testing activities that necessarily surround checking, is neither unskilled nor unintelligent. Not all checks are created equal: the invention, implementation and interpretation of some checks can require great skill. It is, in my opinion, an error to conflate checking and the work of “unmotivated, disinterested manual testers with little domain knowledge”. Testers who are making heavy use of checking are not necessarily neglecting their testing.

More generally, I worry about the tendency – conscious or otherwise – to use terms such as “checking testers” (not your words) as a pejorative and to connect checking with bad testing. I would agree that the inflexible, unthinking use of checks is bad. And I agree that many instances of bad testing are check-heavy and thought-light. But rather than labeling those who act in this way as “bad testers”, and stopping at the label, perhaps we should go deeper in our analysis. We do, after all, belong to a community that prides itself on doing just that. I like that, in your post, you do so by exploring some traits that might influence the degree to which testers will go beyond checking.

There are a multitude of reasons why testers might stop short of testing, and it seems to me that many of them are systemic. Here are a few to consider, inspired in part by Ben Kelly’s series The Testing Dead (though far less stylish). The list is neither exhaustive nor are its categories mutually exclusive:

  • The bad. Some people may actually be bad testers. It happens; I’ve met a few.
  • The uninformed. The testers who don’t know any better than to iterate through checks. It’s how they were raised as testers. Checking is all they’ve been taught: how to design checks, how to monitor the progress of checks, how to manage any mismatches that the checks might identify.
  • The oppressed. The testers who are incentivized solely on the progress of checks, or who are punished if they fail to hit their daily checking quotas. Testing is trivial after all: any idiot can do it. If you can’t hit your test case target you must be lazy.
  • The disenfranchised. Ah, the independent test group! The somebody-else’s-problem group! Lock them in the lab, or better yet in a lab thousands of miles away, where they can’t bother the developers. If fed on a diet of low-bandwidth artifacts and divorced from the life and culture of the project, is it any wonder that their testing emphasizes the explicit and that their capacity to connect observations to value is compromised?
  • The demotivated. The testers who don’t care about their work. Perhaps they are simply uninformed nine-to-fivers, perhaps not. Perhaps they know that things can be different, cared once, but have given up: that’s one way to deaden the pain of hope unrealized. Many of the oppressed and disenfranchised might find themselves in this group one day.

Do you notice something? In many cases we can help! Perhaps we can encourage the bad to seek alternate careers. Perhaps we can help the uninformed by showing them a different way (and as you are an RST instructor, I know you are doing just that!). Perhaps we can even free the oppressed and the disenfranchised by influencing the customers of testing, the decision makers who insist on practices that run counter to their own best interests. That might take care of some of the demotivated too.

I like to think there is hope. Don’t you?

Kind regards,

Iain

Human and Machine Checking

In a post late last night, James Bach and Michael Bolton refined their definitions of testing and checking, and introduced a new distinction between human and machine checking.  What follows is an exploration of that latter distinction. These views are only partially baked, and I welcome your comments.

Recently Service Nova Scotia, which operates the registry of motor vehicles in the province of Nova Scotia, was featured on CBC radio. A number of customers had complained about the vetting of slogans used on personalized licence plates. According to local regulations it is unacceptable to display a plate that features obscene, violent, sexual or alcoholic references. As part of their vetting process, Service Nova Scotia uses software: a requested slogan is entered into a computer, which then executes a check by comparing the slogan with a blacklist. If a match is found, the slogan is rejected. This is an example of machine checking: it is a process whereby a tool collects observations (reading the database field that holds the slogan), subjects those observations to an algorithmic decision rule (is the slogan in the blacklist?) and returns an evaluation (true or false).

This is not the end of the story. Service Nova Scotia accepts that no machine can detect every case where a slogan may be inappropriate: a blacklist cannot possibly contain every combination of letters and numbers that might be considered by someone to be offensive, so slogans are also “checked” by a human. Of course, problems occasionally sneak through: there is a “TIMEUP” driving around Nova Scotia. Look at that closely; there are a couple of ways one could read it. One suggests a ticking clock; the other an invitation involving whips and chains.
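For the sake of illustration, the mechanical part of that process (observe, apply the rule, report) might be sketched in code as follows. The blacklist entries and the normalization step are my own assumptions; I have no knowledge of how Service Nova Scotia’s system is actually implemented:

    # A minimal sketch of a machine check: collect an observation, apply an
    # algorithmic decision rule, and return an evaluation.
    BLACKLIST = {"BOOZE", "KILLER"}  # hypothetical entries; the real list is not public

    def slogan_is_rejected(slogan: str) -> bool:
        """Return True if the requested slogan matches the blacklist."""
        normalized = slogan.strip().upper()
        return normalized in BLACKLIST

    slogan_is_rejected("BOOZE")   # True  - rejected by the machine check
    slogan_is_rejected("TIMEUP")  # False - sails straight past the rule

The machine can apply this rule quickly and consistently, but it can only ever answer the question the rule encodes. The human review of the slogan is a different kind of activity.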

This is a check in the natural sense, in the sense that the term is used in everyday English: “Please check this slogan”, “I checked the slogan, and it was fine”. But is it a human check per Bolton and Bach’s definition? Not necessarily: the definition turns on whether humans are attempting to apply algorithmic rules or whether they are doing something else1.

Let’s explore that distinction by way of another example. Imagine a computer program that produces a list of people’s names, where initials are allowed. An excerpt from such output might look something like this:

  • S.P. Maxwell
  • Judith Irving
  • Roderick Judge
  • 1-555-123-4567
  • Sally-Ann Hastings

I’d be surprised if you didn’t spot a bug! 1-555-123-4567 looks a lot like a (fake) telephone number rather than a name. In fact, whilst looking at this list you might be alert to a number of items that are clearly not names:

  • 37 Main St.
  • *%^&$
  • 

These are examples of items that could be detected by checking, in that one can form a rule to highlight them:

  • Names should only contain letters, periods or hyphens2.

Please go back and evaluate the list of names using that rule. Congratulations! You just performed a human check. Now try to evaluate this list using the same rule:

  • Rufus Mulholland the 3rd
  • F.U. Buddy
  • Tasty Tester
  • Metallica Jones

In this case you have again performed a human check, in that you attempted to apply the rule to your observations. What differentiates this from a machine check is that you may have done something that a machine, applying the rule, could not have:

  • You may have recognized that the suffix “the 3rd” is perfectly acceptable in a name, even though it violates the rule.
  • You may have recognized that “F.U. Buddy” is potentially offensive, and that “Tasty Tester” and “Metallica Jones”3 are unlikely names, even though they do not violate the rule.
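To make the contrast concrete, here is a sketch of a machine dutifully applying that rule. The regular expression is my own rendering of the rule (with spaces allowed, as an assumption, so that multi-word names are not rejected outright); treat it as an illustration rather than a definitive implementation:

    import re

    # "Names should only contain letters, periods or hyphens" - spaces allowed
    # here as an assumption so that multi-word names are not rejected outright.
    NAME_RULE = re.compile(r"^[A-Za-z.\- ]+$")

    def check_name(name: str) -> bool:
        """Return True if the observation satisfies the rule."""
        return bool(NAME_RULE.match(name))

    check_name("1-555-123-4567")            # False - the rule catches this
    check_name("Rufus Mulholland the 3rd")  # False - rejected, yet a perfectly acceptable name
    check_name("F.U. Buddy")                # True  - passes, yet potentially offensive
    check_name("Metallica Jones")           # True  - passes, yet an unlikely name

The machine faithfully reports each verdict and nothing more; it has no way of noticing that two of its passes are exactly the items a human would query.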

So what happened here? As you observed the list of names, you glossed over the first item, then on the second, third and fourth items you experienced a flash of recognition as your expectations were violated: you have brought your tacit knowledge to bear. Specifically, you have drawn on collective tacit knowledge4, knowledge that is rooted in society. It is entirely reasonable to assume that some readers of this blog, who are not native English speakers or who are unfamiliar with Western popular music, would have reacted differently to this list.

What does this have to do with attempting to check? The distinction relates to the difference between machine and human evaluation. A machine can evaluate observations against a rule in the same sense that a conditional expression evaluates to true or false. And of course, a human can do this too. What a machine cannot do, and a human will struggle not to do, is to connect observations to value. When a human is engaged in checking, this connection might be mediated through a decision rule: is the output of the check a good result or a bad one? In this case we might say that the human’s attempt to check has succeeded, but that at the point of evaluation the tester has stepped out from checking and is now testing. Alternatively, a human might connect observations to value in such a way that the checking rule is bypassed. As intuition kicks in and the tester experiences a revelation (“That’s not right!”), the attempt to check has failed in that the rule has not been applied, but never mind: the tester has found something interesting. Again, the tester has stepped out from checking and into testing. This is the critical distinction between human and machine checking: a human – even when attempting to apply a rule – has the capacity5 to connect observations to value within the frame of a project, a business, or society in general. A human, on being exposed to a threat to value, can abort a check and revert to testing. In contrast, all a machine check can do is report the result of observations being subjected to a rule.

This has important implications. Computers are wondrous things; they can reliably execute tasks with speed, precision and accuracy that are unthinkable in a human. But when it comes to checking, they can only answer questions that we have thought to program them to ask. When we attempt to substitute a machine check for a human check, we are throwing away the opportunity to discover information that only a human could uncover.

Notes

1 In case you are wondering, humans at Service Nova Scotia are “doing something else”. They do not attempt to apply an explicit decision rule; they “eyeball” the slogan and reject it if anything strikes them as being inappropriate. No rule is involved, no attempt to check is made.

2 I’m sure you can come up with lots of ways in which this rule is inadequate.

3 We need to be careful with this one: some fans are obsessive; in the UK there is at least one who, after changing his name by deed poll, is called “Status Quo”.

4 Tacit and Explicit Knowledge, Harry Collins (2010).

5 I describe this as a capacity: it is not a guarantee that a human will recognize all threats to value. Recall that such recognition is tied to collective tacit knowledge, knowledge that is rooted in society. Your exposure to different projects, different environments, different cultures, has a bearing on the problems that you will recognize. For example: I used to work for a telecommunications operator in the UK. On one occasion a disgruntled employee reputedly changed the account details of one of our major customers, a police force, such that bills would be addressed to “P.C. Plod”. British readers are likely to recognize this as a potentially embarrassing problem. For those of you who haven’t spent any time in the UK there is a good chance that this would be invisible to you: “Plod” is a mildly offensive term for a police officer and suggests a slowness of wit.

Not Checking

Michael Bolton’s distinction between testing and checking seems to have had some unintended consequences. Don’t get me wrong: I have found immense value in this distinction. It’s just that there is some confusion that needs clearing up.

The first confusion relates to value. Some of us (myself included) have fallen into the trap of undervaluing checking. I wrote about this in Checking is Not Evil, and I’m not going to explore that any further today.

The second confusion relates to the relationship between testing and checking. I’ve seen some of this locally, and it is also apparent in some of the comments to Checking is Not Evil. It’s easy to read blog post titles like “Testing vs. Checking” (and possibly read no further) whilst imagining testing and checking as a Venn diagram that looks like this:

[Venn diagram: testing and checking drawn as two separate, non-overlapping sets]

That would be a mistake. As Michael points out in that self-same post, testing involves checking. In other words: checking is something that we can do as part of testing. It is a subset:

[Venn diagram: checking drawn as a subset of testing]

The third confusion relates to the content of the-bit-of-testing-that-is-not-checking. One response I’ve run into involves some vague hand waving and allusions to mystical powers that I know about but am not going to share with you. Another is something along the lines of “well obviously that’d be exploratory testing”. There is some truth in this, in that many of the things that make for good ET will fall under this category, but more generally this answer is not a good fit: ET is an approach, a style of testing. Its definition is too narrow to encompass all of testing less checking.

So what IS the-bit-of-testing-that-is-not-checking? It is what you – as a human being – bring to testing: any skill, any mindset, and any insight that serves your testing mission. You might have skills at identifying checks or automating them, you might have a nose for trouble or a mind made for modeling, you might be a master relationship builder. That-which-is-testing-but-is-not-checking is anything and everything that you can do that a tool cannot, anything and everything that you can bend to the purpose of investigating software so as to provide information about it.

Checking Is Not Evil

A few days ago, over at http://context-driven-testing.com, Cem Kaner expressed the view that manual exploratory testing had become the new Best Practice. Cem’s comments resonate strongly with me, as I have recently recognized, and sought to correct, that tendency in my own thinking.

Whilst on BBST Test Design last year, Cem called me out on one question where I’d answered from a position of context-driven dogma rather than any kind of well-thought-out position. This was an invaluable lesson that made me begin to question my prejudices when selecting a test approach.

I recognized that I had begun to show a strong preference for manual exploration over automated testing, and for disconfirmatory testing over confirmatory checking. I resolved to challenge myself, and for every approach I considered, to think of at least two others so as to compare and contrast their relative merits.
 
Then came the moment of truth: a new project. And as luck would have it, one that would go to the very heart of this issue.
 
The project in question is all about data. There are many thousands of conditions that are specified in detail, and the data combinations expand this out to potentially billions of individual checks. I would be ignoring a significant part of my testing mission were I to disregard either checking or automation in favor of manual ET. In short, the context demands that large-scale mechanized checking form a big part of the test approach.
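To give a feel for that expansion, here is a small sketch. The field names and value counts are invented for the sake of the arithmetic; they are not drawn from the actual project:

    from math import prod

    # Hypothetical data fields and the number of distinct values each can take.
    field_sizes = {
        "customer_type": 10,
        "product_code": 500,
        "region": 50,
        "tax_status": 10,
        "billing_cycle": 12,
        "currency": 40,
    }

    total_combinations = prod(field_sizes.values())
    print(f"{total_combinations:,}")  # 1,200,000,000 - far beyond what any human could work through

No amount of manual effort will cover a space like that; tooling has to carry the bulk of it.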
 
Does this mean that we will rely solely on checking and disregard exploration?
Does a check-heavy strategy make this any less context driven?
 
No, and no. As I wrote here, checking may be necessary but it is not sufficient. This project is not unique: there is plenty of opportunity for unfulfilled needs and for undesired or unexpected behaviour. We will test the specifications before a single line of executable code is available. We will challenge our own understanding of the design, and seek to align the perspectives and understanding of others. We will use risk-based test design and seek to reveal failures. We will accept that the specification is not perfect, and that we will need to adjust our thinking as the project evolves. We may check, and we may not test manually, but the essence of what we do will remain both exploratory and context driven.
 
Checking is not evil, nor (as far as I am aware) did anyone say that it is. 

Spec Checking and Bug Blindness

Testing is often reduced to the act of modelling specified behaviours as expected results and checking actual software behaviours against the model.

This reduction trivializes the complexity of the testing problem, and reliance on such an approach is both flawed and dangerous.

In “Software Testing: A Craftsman’s Approach”, Paul Jorgensen discusses some questions about testing through the use of Venn diagrams. This post will use a modified version of those diagrams to explore the kinds of issues that we miss if we rely solely on checking against specifications.

Rather than depicting, as Jorgensen does, the relationship between specified, implemented and tested behaviours, the diagrams used in this post recognize that there is a distinction between desired behaviours and those that are specified. Given a universe of all possible system behaviours, and three overlapping sets containing those behaviours that are needed, specified or implemented, we might conceive of the testing problem like this:

[Venn diagram: three overlapping sets of needed, specified and implemented behaviours, with the regions numbered 1 to 7]

Those behaviours that lie within the intersection of all three sets (region 5 on the above diagram) represent those behaviours that are needed, specified and implemented.

From a bug detection viewpoint, the remaining regions of these sets (i.e. 1 to 4, 6 and 7) are more interesting in that they can be grouped in such a way as to describe 4 possible classes of bug. Let’s take a look at each class in turn:

Unimplemented Specifications

The highlighted section of this diagram equates to behaviours that were specified but not implemented: either the required features were not implemented or they were implemented incorrectly such that the intended behaviour does not occur.

It is this kind of bug that specification-based checking is geared towards.

Unfulfilled Needs

In this case, the highlighted section of this diagram equates to behaviours that are needed but have not been implemented. Note that these bug classes are not mutually exclusive: a bug in region 2 can be categorized as both an unfulfilled need and an unimplemented specification bug.

This kind of bug is far more insidious than those belonging to the previous class: whilst we may catch some such bugs (those in region 2) by checking against specifications, we will miss those that relate to needs that were either not articulated or not captured when the software was specified. Finding this kind of bug requires testing that is sensitive not just to specified behaviour, but to the underlying needs of the customer.

Important: whilst some will argue that such behaviours are “out of scope” for a given project by virtue of not having been specified, building software that does not fulfill the needs of its customer is a fast route to failure.

Unexpected Behaviour

With this class, the tester’s life starts to get interesting. The highlighted section of this diagram equates to behaviours that have been implemented but were not specified: this is the realm of the unexpected bug. Occasionally, unexpected behaviour may turn into an unexpected boon (region 4): behaviour that was not specified but is actually desired (perhaps the developer had an insight into real needs and implemented something without it being in the spec).

Other than through the intervention of dumb luck, specification-based checking will miss many bugs in this class. Some will be apparent where an unspecified behaviour is substituted for a specified one; however, this class also includes pretty much anything that could fail. Testing for this kind of bug requires the creativity to imagine possible failures and the skill to craft tests that will determine whether or not they can occur.

Undesired Behaviour

The highlighted section of this diagram equates to behaviours that have been implemented but were neither needed nor desired: for example, “gold-plated” features or behaviours that were specified incorrectly.

Much like the previous class, specification-based checking will miss many of these bugs. It will also be completely blind to behaviours that were specified but are not desired. Like the previous class, testing for this kind of bug requires imagination and skill. It also requires an understanding of customer needs that is sufficient to identify potential issues, whether specified or not.
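These four classes can be expressed as simple set operations. The behaviour labels in the sketch below are invented purely to make the relationships concrete; they are not drawn from Jorgensen’s book:

    # Hypothetical behaviour labels for a small system.
    needed      = {"login", "logout", "export_csv", "audit_trail"}
    specified   = {"login", "logout", "export_csv", "export_xml"}
    implemented = {"login", "export_csv", "export_xml", "easter_egg"}

    unimplemented_specs = specified - implemented   # {'logout'}
    unfulfilled_needs   = needed - implemented      # {'logout', 'audit_trail'}
    unexpected          = implemented - specified   # {'easter_egg'}
    undesired           = implemented - needed      # {'export_xml', 'easter_egg'}

    # Specification-based checking targets only the first of these four sets;
    # the other three call for testing that looks beyond the specification.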

Conclusions

Specification-based checking is a good fit for only one of the four classes of bug discussed here. In the other three cases, the power of such an approach is seriously limited. Whilst such an approach may be necessary, it is insufficient if the testing mission is the discovery of bugs: an excessive reliance on it will inevitably result in important bugs being missed.

Testing for many types of bugs requires a more investigative approach: an approach that brings the skill, creativity and knowledge of the tester into play.