Human and Machine Checking

In a post late last night, James Bach and Michael Bolton refined their definitions of testing and checking, and introduced a new distinction between human and machine checking.  What follows is an exploration of that latter distinction. These views are only partially baked, and I welcome your comments.

Recently Service Nova Scotia, which operates the registry of motor vehicles in the province of Nova Scotia, was featured on CBC radio. A number of customers had complained about the vetting of slogans used on personalized licence plates. According to local regulations it is unacceptable to display a plate that features obscene, violent, sexual or alcoholic references. As part of their vetting process, Service Nova Scotia uses software: a requested slogan is entered into a computer which then executes a check by comparing the slogan with a blacklist. If a match is found, the slogan is rejected. This is an example of machine checking: it is a process whereby a tool collects observations (reading the database field that holds the slogan), subjects those observations to an algorithmic decision rule (is the slogan in the blacklist?) and returns an evaluation (true or false). This is not the end of the story. Service Nova Scotia accepts that no machine can detect every case where a slogan may be inappropriate: a blacklist cannot possibly contain every combination of letters and numbers that someone might consider offensive. Slogans are therefore also “checked” by a human. Of course, problems occasionally sneak through: there is a “TIMEUP” driving around Nova Scotia. Look at that closely; there are a couple of ways one could read that. One suggests a ticking clock; the other an invitation involving whips and chains.
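As a rough sketch of the machine check in that story (not Service Nova Scotia's actual software), the three steps of observation, decision rule and evaluation might look like the following. The function name, the blacklist entries and the normalization to upper case are all assumptions made for illustration.

```python
# Hypothetical blacklist; the real list is not described in this post.
BLACKLIST = frozenset({"BOOZE4U", "BADWORD"})


def machine_check(slogan, blacklist):
    """Return True if the slogan matches the blacklist (i.e. is rejected)."""
    # Observation: read the requested slogan (normalizing case is an assumption).
    observed = slogan.strip().upper()
    # Decision rule and evaluation: is the slogan in the blacklist?
    return observed in blacklist


print(machine_check("booze4u", BLACKLIST))  # matches an entry, so rejected
print(machine_check("TIMEUP", BLACKLIST))   # no entry matches, so it passes
```

A rule like this can only flag TIMEUP if someone has already thought to put it on the list; otherwise the slogan is left for the human reviewer to look at.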

That human review is a check in the natural sense, the sense in which the term is used in everyday English: “Please check this slogan”, “I checked the slogan, and it was fine”. But is it a human check per Bolton and Bach’s definition? Not necessarily: the definition turns on whether humans are attempting to apply algorithmic rules or whether they are doing something else1.

Let’s explore that distinction by way of another example. Imagine a computer program that produces a list of people’s names, where initials are allowed. An excerpt from such output might look something like this:

  • S.P. Maxwell
  • Judith Irving
  • Roderick Judge
  • 1-555-123-4567
  • Sally-Ann Hastings

I’d be surprised if you didn’t spot a bug! 1-555-123-4567 looks a lot like a (fake) telephone number rather than a name. In fact, whilst looking at this list you might be alert to a number of items that are clearly not names:

  • 37 Main St.
  • *%^&$
  • 

These are examples of items that could be detected by checking, in that one can form a rule to highlight them:

  • Names should only contain letters, periods or hyphens2.

Please go back and evaluate the list of names using that rule. Congratulations! You just performed a human check. Now try to evaluate this list using the same rule:

  • Rufus Mulholland the 3rd
  • F.U. Buddy
  • Tasty Tester
  • Metallica Jones

In this case you have again performed a human check, in that you attempted to apply the rule to your observations. What differentiates this from a machine check is that you may have done something that a machine, applying the rule, could not have:

  • You may have recognized that the suffix “the 3rd” is perfectly acceptable in a name, even though it violates the rule.
  • You may have recognized that “F.U. Buddy” is potentially offensive, and that “Tasty Tester” and “Metallica Jones”3 are unlikely names, even though they do not violate the rule.
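For contrast, here is a minimal sketch of what a machine applying the same rule would report for this list. The pattern and the name `name_check` are illustrative assumptions, and the sketch also assumes that spaces are permitted, which the rule as written does not say.

```python
import re

# Letters, periods, hyphens, plus spaces (the space is an added assumption).
NAME_RULE = re.compile(r"[A-Za-z.\- ]+")


def name_check(name):
    """Return True if the whole name consists only of permitted characters."""
    return NAME_RULE.fullmatch(name) is not None


names = [
    "Rufus Mulholland the 3rd",
    "F.U. Buddy",
    "Tasty Tester",
    "Metallica Jones",
]
for name in names:
    print(name, "->", "pass" if name_check(name) else "fail")
```

The machine fails “Rufus Mulholland the 3rd” because of the digit and passes the other three, which is close to the reverse of the judgement a human reader is likely to reach.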

So what happened here? As you observed the list of names, you glossed over the first item, then on the second, third and fourth items you experienced a flash of recognition as your expectations were violated: you have brought your tacit knowledge to bear. Specifically, you have drawn on collective tacit knowledge4, knowledge that is rooted in society. It is entirely reasonable to assume that some readers of this blog, who are not native English speakers or who are unfamiliar with Western popular music, would have reacted differently to this list.

What does this have to do with attempting to check? The distinction relates to the difference between machine and human evaluation. A machine can evaluate observations against a rule in the same sense that a conditional expression evaluates to true or false. And of course, a human can do this too. What a machine cannot do, and a human will struggle not to do, is to connect observations to value. When a human is engaged in checking, this connection might be mediated through a decision rule: is this output of the check a good result or a bad one? In this case we might say that the human’s attempt to check has succeeded but that at the point of evaluation the tester has stepped out from checking and is now testing. Alternatively, a human might connect observations to value in a way such that the checking rule is bypassed. As intuition kicks in and the tester experiences a revelation (“That’s not right!”) the attempt to check has failed in that the rule has not been applied, but never mind: the tester has found something interesting. Again, the tester has stepped out from checking and into testing. This is the critical distinction between human and machine checking: that a human – even when attempting to apply a rule – has the capacity5 to connect observations to value within the frame of a project, a business, or society in general. A human, on being exposed to a threat to value, can abort a check and revert to testing. In contrast, all a machine check can do is report the result of observations being subjected to a rule.

This has important implications. Computers are wondrous things; they can reliably execute tasks with speed, precision and accuracy that are unthinkable in a human. But when it comes to checking, they can only answer questions that we have thought to program them to ask. When we attempt to substitute a machine check for a human check, we are throwing away the opportunity to discover information that only a human could uncover.

Notes

1 In case you are wondering, humans at Service Nova Scotia are “doing something else”. They do not attempt to apply an explicit decision rule; they “eyeball” the slogan and reject it if anything strikes them as being inappropriate. No rule is involved, no attempt to check is made.

2 I’m sure you can come up with lots of ways in which this rule is inadequate.

3 We need to be careful with this one, as some fans are obsessive: in the UK there is at least one who, after changing his name by deed poll, is called “Status Quo”.

4 Tacit and Explicit Knowledge, Harry Collins (2010).

5 I describe this as a capacity: it is not a guarantee that a human will recognize all threats to value. Recall that such recognition is tied to collective tacit knowledge, knowledge that is rooted in society. Your exposure to different projects, different environments, different cultures, has a bearing on the problems that you will recognize. For example: I used to work for a telecommunications operator in the UK. On one occasion a disgruntled employee reputedly changed the account details of one of our major customers, a police force, such that bills would be addressed to “P.C. Plod”. British readers are likely to recognize this as a potentially embarrassing problem. For those of you who haven’t spent any time in the UK there is a good chance that this would be invisible to you: “Plod” is a mildly offensive term for a police officer and suggests a slowness of wit.

14 thoughts on “Human and Machine Checking”

  1. The subject of tacit knowledge is fascinating. One of the problems with detailed scripts is that they attempt to ignore tacit knowledge, or remove it from the process. Detailed test scripts implicitly assume that the knowledge necessary to test the application has already been gained, and that the testing can be reduced to a series of true/false checks.

    Checking these decision rules is valuable if the answers are the starting point for the important, searching follow up questions to try and understand what is going on. If the results of checking are regarded as definitive answers in their own right then you’re in trouble.

    1. James,

      I wholeheartedly agree with your final sentence: “If the results of checking are regarded as definitive answers in their own right then you’re in trouble.”

      I’d like to draw a couple of further distinctions though:

      -Whilst detailed scripts often* imply checking, checking does not necessarily imply detailed scripts.
      -Checks can be used in pursuit of learning (e.g. “I wonder: if X then Y? Check for Y when X”)
      -Individual checks may yield a true/false (I’m entertaining the idea of richer checks), but this needn’t mean pass/fail (e.g. X may be a sign of trouble worth investigating further, check for X)

      In other words, the testing vs. checking distinction is different from the scripted vs exploratory debate.

      Yours

      -Iain

      * I have from time to time employed open ended scripts that invite the tester to apply judgement. One might argue that these do not qualify as “detailed scripts”, and I’m sure it’s not the thing you had in mind.

  2. Thanks for this post, Iain. One case where human checking is currently necessary is performing functional accessibility testing. Static checking tools can be useful as an initial check for missing identifier fields or frame identification, but verifying context of screen reader outputs requires a deep tacit understanding of the application business logic. However, I have no doubt that “smart” tools will soon be able to take up more and more of this type of checking. Case in point: Since James and Michael first started making the test vs check distinction, a “smart tool” won a Jeopardy competition. I wonder if this evolution in computer checking had some contribution to this more recent distinction with human checking.

    1. Jeff,

      Thanks. I suspect you are right: that the possible scope of machine checking will expand as the capabilities of technology grow.

      This will have interesting repercussions. Checks are already heuristic in nature, but add in the heuristic processing used by Watson (and Deep Blue before it), and we have an even bigger mess on our hands. And as technology grows ever more wondrously whizbang and bamboozling, we’ll have to intensify the campaign to educate folk that machine checking != testing.

      -Iain

    1. Mario,

      An interesting reply. All rhetoric, little content, and completely unresponsive to any point that I made. Nice.

      -Iain

  3. Hi Ian,

    I’ve just been reading this post again…

    I wholeheartedly agree with your interpretation of James and Michael’s distinction between human and machine checking.

    The company I work for produces software which attempts to find the correct Postcode for a UK address. There are huge problems with searching on fragments of an address especially if you cannot easily tell whether the term being entered is a street or a town (London Road, Crewe for example) and what is right or wrong is very subjective. Very often it is ‘obvious’ to a human which address is the correct one to use for gathering the Postcode but it sure cannot be made that obvious to a computer!

    Whilst there is a certain amount of tool support that can be used we would miss so much information if we relied on tools and computer-driven checking exclusively. My checks have often led me to question the path through the algorithm that certain addresses take. To me that is taking me into the realm of proper ‘testing’.

    Sadly, a lot of people don’t see that you just cannot automate this sort of thing! If I could have, I would have because it’s headache inducing stuff sometimes!

    Regards,

    Stephen

  4. Hi Iain,

    First off: love your blog! Second: I completely disagree with Mario about testing and checking. Which is why I’m posting a comment here. I suggested to James Bach challenging Mario to test something using his (old) techniques versus James using CDT (see my comment here http://www.satisfice.com/blog/archives/893#comment-270145).

    Anyway, just letting you and others know about this so you can watch when he goes down in flames.

    -Matt
