ET: Why We Do It, an article by Petter Mattson

What follows is an article by my colleague Petter Mattson.

Petter and I recently made each other’s acquaintance after our organizations, Logica and CGI, merged. An experienced test manager and an advocate for exploratory testing, Petter wrote this article for internal publication within Logica. Unfortunately its contents diverged sufficiently from the official testing methodology that it was never published. Many of the points in this piece resonated with me, and I was determined that it see the light of day.

I’d like to thank Petter, and his management at CGI in Sweden, for allowing me to publish it on Exploring Uncertainty.


Click here for Petter’s article


Dear Paul


I’d like to thank you for your kind words regarding my recent post. I agree with your assertion that there are a number of factors at work that will influence whether a tester will notice more than a machine, and I’d love to know more about your case study.

I suspect that we are closely aligned when it comes to machine checking. One of the main benefits of making the checking/testing distinction is that it serves to highlight what is lost when one emphasizes checking at the expense of testing, or when one substitutes mechanized checks for human ones. I happened to glance at my dog-eared copy of the Test Heuristics Cheat Sheet today, and one item leapt out at me: “The narrower the view, the wider the ignorance”. Human checks have a narrower view than testing, and mechanized checks are narrower still. We need to acknowledge these tradeoffs, and manage them accordingly.

I think we need to be careful about the meanings that we give to the word “check”.  You say the usage that you have observed the most is when “talking about unmotivated, disinterested manual testers with little domain knowledge” or when “talking about machine checking”. Checking, in and of itself, is not a bad thing: rather checks are essential tools. Further, checking, or more accurately the testing activities that necessarily surround checking, are neither unskilled nor unintelligent. Not all checks are created equally: the invention, implementation and interpretation of some checks can require great skill. It is, in my opinion, an error to conflate checking and the work of “unmotivated, disinterested manual testers with little domain knowledge”. Testers who are making heavy use of checking are not necessarily neglecting their testing.

More generally, I worry about the tendency – conscious or otherwise – to use terms such as “checking testers” (not your words) as a pejorative and to connect checking with bad testing. I would agree that the inflexible, unthinking use of checks is bad. And I agree that many instances of bad testing are check-heavy and thought-light. But rather than labeling those who act in this way as “bad testers”, and stopping at the label, perhaps we should go deeper in our analysis. We do after all belong to a community that prides itself on doing just that. I like that, in your post, you do so by exploring some traits that might influence the degree to which testers will go beyond checking.

There are a multitude of reasons why testers might stop short of testing, and it seems to me that many of them are systemic. Here are a few to consider, inspired in part by Ben Kelly’s series The Testing Dead (though far less stylish). The list is neither exhaustive nor are its entries mutually exclusive:

  • The bad. Some people may actually be bad testers. It happens; I’ve met a few.
  • The uninformed. The testers who don’t know any better than to iterate through checks. It’s how they were raised as testers. Checking is all they’ve been taught: how to design checks, how to monitor the progress of checks, how to manage any mismatches that the checks might identify.
  • The oppressed. The testers who are incentivized solely on the progress of checks, or who are punished if they fail to hit their daily checking quotas. Testing is trivial after all: any idiot can do it. If you can’t hit your test case target you must be lazy.
  • The disenfranchised. Ah, the independent test group! The somebody-else’s problem group! Lock them in the lab, or better yet in a lab thousands of miles away, where they can’t bother the developers. If fed on a diet of low-bandwidth artifacts and divorced from the life, the culture of the project, is it any wonder then that their testing emphasizes the explicit and that their capacity to connect observations to value is compromised?
  • The demotivated. The testers who don’t care about their work. Perhaps they are simply uninformed nine-to-fivers, perhaps not. Perhaps they know that things can be different, cared once, but have given up: that’s one way to deaden the pain of hope unrealized. Many of the oppressed and disenfranchised might find themselves in this group one day.

Do you notice something? In many cases we can help! Perhaps we can encourage the bad to seek alternate careers. Perhaps we can help the uninformed by showing them a different way (and as you are an RST instructor, I know you are doing just that!). Perhaps we can even free the oppressed and the disenfranchised by influencing the customers of testing, the decision makers who insist on practices that run counter to their own best interests. That might take care of some of the demotivated too.

I like to think there is hope. Don’t you?

Kind regards,


Human and Machine Checking

In a post late last night, James Bach and Michael Bolton refined their definitions of testing and checking, and introduced a new distinction between human and machine checking.  What follows is an exploration of that latter distinction. These views are only partially baked, and I welcome your comments.

Recently Service Nova Scotia, which operates the registry of motor vehicles in the province of Nova Scotia, was featured on CBC radio. A number of customers had complained about the vetting of slogans used on personalized licence plates. According to local regulations it is unacceptable to display a plate that features obscene, violent, sexual or alcoholic references. As part of their vetting process, Service Nova Scotia uses software: a requested slogan is entered into a computer, which then executes a check by comparing the slogan with a blacklist. If a match is found, the slogan is rejected. This is an example of machine checking: a process whereby a tool collects observations (reading the database field that holds the slogan), subjects those observations to an algorithmic decision rule (is the slogan in the blacklist?) and returns an evaluation (true or false).

This is not the end of the story. Service Nova Scotia accepts that no machine can detect every case where a slogan may be inappropriate: a blacklist cannot possibly contain every combination of letters and numbers that might be considered by someone to be offensive. So slogans are also “checked” by a human. Of course, problems occasionally sneak through: there is a “TIMEUP” driving around Nova Scotia. Look at that closely; there are a couple of ways one could read it. One suggests a ticking clock; the other an invitation involving whips and chains.
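
To make the shape of that process concrete, here is a minimal sketch in Python. The function name and the blacklist contents are my own illustrative assumptions, not Service Nova Scotia’s actual system:

    # A minimal sketch of a machine check: collect an observation, apply an
    # algorithmic decision rule, return an evaluation.
    # The blacklist contents and function name are illustrative assumptions.

    BLACKLIST = {"BOOZE", "KILLER", "XXX"}  # hypothetical entries

    def machine_check(slogan: str) -> bool:
        """True if the requested slogan passes the check (i.e. is not blacklisted)."""
        observation = slogan.strip().upper()   # collect the observation
        return observation not in BLACKLIST    # decision rule -> evaluation (true/false)

    print(machine_check("SUNSHINE"))  # True
    print(machine_check("BOOZE"))     # False: matched the blacklist
    print(machine_check("TIMEUP"))    # True: the machine sees nothing wrong here

A check like this can only reject what somebody thought to put on the list, which is one reason a “TIMEUP” sails through.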

This is a check in the natural sense, in the sense that the term is used in everyday English: “Please check this slogan”, “I checked the slogan, and it was fine”. But is it a human check per Bolton and Bach’s definition? Not necessarily: the definition turns on whether humans are attempting to apply algorithmic rules or whether they are doing something else [1].

Let’s explore that distinction by way of another example. Imagine a computer program that produces a list of people’s names, where initials are allowed. An excerpt from such output might look something like this:

  • S.P. Maxwell
  • Judith Irving
  • Roderick Judge
  • 1-555-123-4567
  • Sally-Ann Hastings

I’d be surprised if you didn’t spot a bug! 1-555-123-4567 looks a lot like a (fake) telephone number rather than a name. In fact, whilst looking at this list you might be alert to a number of items that are clearly not names:

  • 37 Main St.
  • *%^&$
  • 

These are examples of items that could be detected by checking, in that one can form a rule to highlight them:

  • Names should only contain letters, periods or hyphens [2].

Please go back and evaluate the list of names using that rule. Congratulations! You just performed a human check. Now try to evaluate this list using the same rule:

  • Rufus Mulholland the 3rd
  • F.U. Buddy
  • Tasty Tester
  • Metallica Jones

In this case you have again performed a human check, in that you attempted to apply the rule to your observations. What differentiates this from a machine check is that you may have done something that a machine, applying the rule, could not have (a sketch of the machine’s side of this follows these points):

  • You may have recognized that the suffix “the 3rd” is perfectly acceptable in a name, even though it violates the rule.
  • You may have recognized that “F.U. Buddy” is potentially offensive, and that “Tasty Tester” and “Metallica Jones” [3] are unlikely names, even though they do not violate the rule.
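
To sharpen the contrast, here is a minimal sketch of how a machine would apply that same rule to this list. Note one assumption of mine: I have let the rule tolerate spaces, otherwise every name would fail; as footnote [2] concedes, the rule is crude in any case.

    import re

    # The name rule, applied mechanically. Spaces are tolerated here as an
    # assumption, otherwise every entry would fail the check.
    NAME_RULE = re.compile(r"^[A-Za-z.\- ]+$")

    names = ["Rufus Mulholland the 3rd", "F.U. Buddy", "Tasty Tester", "Metallica Jones"]

    for name in names:
        verdict = "pass" if NAME_RULE.match(name) else "FAIL"
        print(f"{verdict}: {name}")

    # FAIL: Rufus Mulholland the 3rd   (the digit violates the rule; a human accepts it)
    # pass: F.U. Buddy                 (satisfies the rule; a human smells trouble)
    # pass: Tasty Tester
    # pass: Metallica Jones

The machine’s verdicts are the whole story: it has no way of recovering “the 3rd” as acceptable, or of flagging “F.U. Buddy” as a problem.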

So what happened here? As you observed the list of names, you glossed over the first item, then on the second, third and fourth items you experienced a flash of recognition as your expectations were violated: you have brought your tacit knowledge to bear. Specifically, you have drawn on collective tacit knowledge [4], knowledge that is rooted in society. It is entirely reasonable to assume that some readers of this blog, who are not native English speakers or who are unfamiliar with Western popular music, would have reacted differently to this list.

What does this have to do with attempting to check? The distinction relates to the difference between machine and human evaluation. A machine can evaluate observations against a rule in the same sense that a conditional expression evaluates to true or false. And of course, a human can do this too. What a machine cannot do, and a human will struggle not to do, is to connect observations to value. When a human is engaged in checking this connection might be mediated through a decision rule: is the output of the check a good result or a bad one? In this case we might say that the human’s attempt to check has succeeded, but that at the point of evaluation the tester has stepped out from checking and is now testing. Alternatively, a human might connect observations to value in a way such that the checking rule is bypassed. As intuition kicks in and the tester experiences a revelation (“That’s not right!”) the attempt to check has failed in that the rule has not been applied, but never mind: the tester has found something interesting. Again, the tester has stepped out from checking and into testing. This is the critical distinction between human and machine checking: a human – even when attempting to apply a rule – has the capacity [5] to connect observations to value within the frame of a project, a business, or society in general. A human, on being exposed to a threat to value, can abort a check and revert to testing. In contrast, all a machine check can do is report the result of observations being subjected to a rule.

This has important implications. Computers are wondrous things; they can reliably execute tasks with speed, precision and accuracy that are unthinkable in a human. But when it comes to checking, they can only answer questions that we have thought to program them to ask. When we attempt to substitute a machine check for a human check, we are throwing away the opportunity to discover information that only a human could uncover.


[1] In case you are wondering, humans at Service Nova Scotia are “doing something else”. They do not attempt to apply an explicit decision rule; they “eyeball” the slogan and reject it if anything strikes them as being inappropriate. No rule is involved, no attempt to check is made.

[2] I’m sure you can come up with lots of ways in which this rule is inadequate.

[3] We need to be careful with this one: some fans are obsessive; in the UK there is at least one who, after changing his name by deed poll, is called “Status Quo”.

[4] Tacit and Explicit Knowledge, Harry Collins (2010).

[5] I describe this as a capacity: it is not a guarantee that a human will recognize all threats to value. Recall that such recognition is tied to collective tacit knowledge, knowledge that is rooted in society. Your exposure to different projects, different environments, different cultures, has a bearing on the problems that you will recognize. For example: I used to work for a telecommunications operator in the UK. On one occasion a disgruntled employee reputedly changed the account details of one of our major customers, a police force, such that bills would be addressed to “P.C. Plod”. British readers are likely to recognize this as a potentially embarrassing problem. For those of you who haven’t spent any time in the UK there is a good chance that this would be invisible to you: “Plod” is a mildly offensive term for a police officer and suggests a slowness of wit.


A few months ago, I met a colleague from a distant part of the organization.

“My biggest problem”, he said, “is a lack of consistency in test analysis. No two of my testers do it the same”.

“Really?” I enquired, “How do you mean?”

“Well, they document things differently, but that isn’t the problem…given the same specification they’ll come up with different tests. They approach things differently on each project. I need them to be doing the same things every time: I need consistency, I need consistent processes and methods.”

“Well, consistency can be important” I replied. “Will consistently crap work for you?”

“No, of course not. We need to be doing good testing.”

“So a consistent approach might be less important than consistent results?”

“Er, maybe. Doesn’t one imply the other?”

And therein lies the problem. With simple systems it’s reasonable to assume that you will always get the same output for a given set of inputs. Think of a domestic light: you flick the switch and the light comes on. Flick it again and the light goes off. Even adding a little complexity by means of another staircase switch doesn’t add much in the way of complication. Now we don’t know whether switch-up or switch-down equates to on or off, but flicking the switch still toggles the state of the light.

If only software development were like this! Such simplicity is beguiling, and many of our mental models are rooted in the assumption that this is how the world works. It doesn’t, of course.

Unfortunately, software projects are nothing like this. Imagine a lighting circuit with many, many switches. Imagine you don’t know how many switches there are, or indeed where they are. Imagine that you have no knowledge of the starting state of the circuit, and therefore which switches need flicking to get the lights on. Imagine that some of the switches are analogue dimmers rather than their binary cousins. Imagine that there are other actors who are also flicking switches or changing the configuration of the circuit. Imagine that those involved have different ideas as to which light should come on, or how bright it should be: the streetlight folk and the reading light people just can’t agree, and nobody’s talking to team ambience. Now imagine that you’ve been asked to assess the quality of the lighting once someone has figured out how to turn the damned thing on.

Now subject this scenario to a consistent set of inputs. Try to form some repeatable algorithm to reliably “solve” this problem. You can’t, it’s impossible, it’s an intractable problem. This feels a little more like the testing I know, though still a hopeless simplification.

“That,” I explained, “is why we need to emphasize skill over Method. Software projects are complex systems, and repeating the same inputs doesn’t work very well. Mechanical, repeatable processes are no guarantee of success. We need testers who are able to navigate that maze and figure out what needs doing, given the situation at hand.”

I made a few notes, wrote a few tweets, and got back to business. Thoughts about consistency, however, never strayed far away. Then, earlier this week, they paid me another visit.

I’ve been moonlighting as a lecturer at Dalhousie University. At the start of the course I set the students an assignment (based on one of Kaner’s from BBST Test Design) to analyze a specification and identify a list of test ideas. This week’s lecture was on test design: for the first half of the lecture we discussed some of the better known, and more mechanistic, test design techniques. Then I asked the class to break into groups, compare and contrast their test ideas, and present on any similarities and differences.

“We had a few ideas in common,” said the first student “and a load of differences”. He went on to list them.

“You came up with different ideas?” I asked, feigning horror.

“Er, yes”.

“How can that be? I’ve just spent a good part of the last 3 hours describing a set of techniques that should guarantee the same set of tests. What happened?”

“Well, um, we came up with a lot of ideas that we wouldn’t have using those techniques.”

“Good, I was hoping as much.”


“Tell me why you think there were differences. Why wouldn’t these techniques have suggested those ideas?” I asked.

“Well, I guess we used our imaginations. And we’ve all got different backgrounds, so we all thought of different things.”

“Exactly: the techniques we’ve discussed are mechanistic. They might give you consistency, but they remove imagination, and your own experiences, from the test design. Now, tell me, which would you prefer: testing based on your original list of ideas, or based on the ideas of the group as a whole? Which do you think would provide better information to the project?”

“The group’s ideas” he said without hesitation.


“Well some of us focused on specific things, the overall list is better…better-rounded. We’d miss lots of stuff with our own.”

“So this inconsistency between individuals, is it a good thing or a bad thing?”

“I think it’s good. Those techniques might be useful for some problems, but by sharing our different ideas, I think we can test better.”

“Thank you” I said, mission accomplished.

Models of Automation

Why do many people completely miss obvious opportunities to automate? This question has been bothering me for years. If you watch carefully, you’ll see this all around you: people, who by rights have deep and expansive experience of test automation, unable to see the pot of gold lying right in front of them. Instead, they charge off hither and thither, searching for nuggets or ROI-panning in the dust.

Last year I witnessed a conversation which, whilst frustrating at the time, suggested an answer to this puzzle. One of my teams had set about creating a set of tools to help us to test an ETL process. We would use these tools to mine test data and evaluate it against a coverage model (as a prelude to data conditioning), for results prediction and for reconciliation. Our client asked that we hook up with their internal test framework team in order to investigate how we might integrate our toolset with their test management tool. The resulting conversation resembled first contact, without the benefit of an interpreter:

Framework team: So how do you run the tests?

Test team: Well, first we mine the data, and then we condition it, run the parallel implementation to predict results, execute the real ETL, do a reconciliation and evaluate the resulting mismatches for possible bugs.

Framework team: No, I mean, how do you run the automated tests?

Test team: Well, there aren’t any automated tests per se, but we have tools that we’ve automated.

Framework team: What tests do those tools run?

Test team: They don’t really. We run the tests, the tools provide information…

Framework team: So…how do you choose which test to run and how do you start them?

Test team: We never run one test at a time, we run thousands of tests all at the same time, and we trigger the various stages with different tools.

Framework team: Urghh?

Human beings are inveterate modelers: we represent the world through mental constructs that organize our beliefs, knowledge and experiences. It is from the perspective of these models that we draw our understanding of the world. And when people with disjoint mental models interact, the result is often confusion.

So, how did the framework team see automation?

  • We automate whole tests.

And the test team?

  • We automate tasks. Those tasks can be parts of tests.

What we had here wasn’t just a failure to communicate, but a mismatch in the way these teams perceived the world. Quite simply, these two teams had no conceptual framework in common; they had little basis for a conversation.

Now, such models are invaluable, we could not function without them: they provide cues as to where to look, what to take note of, what to consider important. The cognitive load of processing every detail provided by our senses would paralyze us. The price is that we do not perceive the world directly; rather, we do so through the lens of our models. And it is easy to miss things that do not fit those models. This is exactly like test design: the mechanism is the same. When we rely on a single test design technique, which after all is only a model used to enumerate specific types of tests, we will only tend to find bugs of a single class: we will be totally blind to other types of bug. When we use a single model to identify opportunities to automate, we will only find opportunities of a single class, and be totally blind to opportunities that don’t fit that particular mold.

Let’s look at the model being used by the framework team again. This is the more traditional view of automation. It’s in the name: test automation. The automation of tests. If you’re not automating tests it’s not test automation, right? For it to be test automation, the entire test must be automated. This is a fundamental assumption that underpins the way many people think about automation. You can see it at work behind many common practices:

  • The measurement, and targeting, of the proportion of tests that have been automated. Replacement (and I use the term loosely) of tests performed by humans by tests performed by machines. The unit of measure is the test, the whole test and nothing but the test.
  • The selection of tests for automation by using Return on Investment analysis. Which tests, when automated, offer the greatest return on the cost of automation? Which tests should we consider? The emphasis is on evaluating tests for automation, not on evaluating what could be usefully automated.
  • Seeking to automate all tests. Not only must whole tests be automated, every last one should be.

This mental model, this belief that we automate at the test level, may have its uses. It may guide us to see cases where we might automate simple checks of results vs. expectations for given inputs. But it is blind to cases that are more sophisticated: to opportunities where parts of a test might be automated, and other parts may not be. And many tests are exactly like that! Applying a different model of what constitutes a test, and applying it in the context of testing on a specific project, can provide insight into where automation may prove useful. So, when looking for opportunities for automation, I’ve turned to this model for aid [1] (a small sketch of a couple of its items follows the list):

  • Analyze. Can we use tools to analyze the testing problem? Can they help us extract meaningful themes from specifications, to enumerate interesting or useful tests from a model, to understand coverage?
  • Install and Configure. Can we use tools to install or configure the software? To rapidly switch between states?  To generate, condition and manage data?
  • Drive. Can we use tools to stimulate the software? To simulate input or interactions with other components, systems?
  • Predict. Can we use tools to predict what the software might do under certain conditions?
  • Observe. Can we use tools to observe the behavior of the software? To capture data or messages? To monitor timing?
  • Reconcile. Can we use tools to reconcile our observations and predictions? To help draw attention to mismatches between reality and expectations?
  • Evaluate. Can we use tools to help us make sense of our observations? To aggregate, analyze or visualize results such that we might notice patterns?
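
By way of illustration, here is a minimal sketch of the Predict and Reconcile items: a crude parallel implementation predicts what a transformation should produce, and a reconciliation loop points a human at the mismatches. The field names and the transformation rule are assumptions for illustration only; the point is that tasks within the test have been automated, not the test itself.

    # Sketch: automate tasks within a test, not the whole test.
    # A simple parallel implementation ("predict") says what a transformation
    # should produce; "reconcile" draws attention to mismatches for a human to
    # evaluate. Field names and the transformation rule are illustrative assumptions.

    def predict(source_row: dict) -> dict:
        """Crude parallel implementation of the transformation under test."""
        return {
            "id": source_row["id"],
            "full_name": f"{source_row['first']} {source_row['last']}".strip(),
        }

    def reconcile(source_rows, target_rows):
        """Yield (id, expected, actual) wherever prediction and reality differ."""
        actual_by_id = {row["id"]: row for row in target_rows}
        for src in source_rows:
            expected = predict(src)
            actual = actual_by_id.get(src["id"])
            if actual != expected:
                yield src["id"], expected, actual

    source = [{"id": 1, "first": "Judith", "last": "Irving"},
              {"id": 2, "first": "S.P.", "last": "Maxwell"}]
    target = [{"id": 1, "full_name": "Judith Irving"},
              {"id": 2, "full_name": "Maxwell, S.P."}]  # a mismatch for a human to judge

    for mismatch in reconcile(source, target):
        print("mismatch:", mismatch)

Nothing here “runs a test”: a tester still decides which mismatches are bugs, which are faults in the crude oracle, and which are noise, but the drudgery of finding them has been delegated to a tool.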

Of course, this model is flawed too. Just as the whole test / replacement paradigm misses certain types of automation, this model most likely has its own blind spots. Thankfully, human beings are inveterate modelers, and there is nothing to stop you from creating your own and hopping frequently from one to another. Please do: we would all benefit from trying out a richer set of models.

[1] Derived from a model by James Bach that describes roles of test techniques (documented in BBST Foundations).


On April 6th 2009, an earthquake devastated L’Aquila, a town in central Italy. It killed more than three hundred people, injured thousands and displaced tens of thousands. This tragedy continues to have repercussions: last month, six scientists and a government official were convicted of manslaughter on the grounds that they failed to properly represent the risks of such an event.

These convictions have drawn considerable media attention, much of it contradictory, some of it misleading. One might be forgiven for believing the absurd: that these men have been found guilty of failing to predict an earthquake. The reality seems to be more complex. However, my intention with this post is simply to draw attention to lessons that this case may hold for the tester. This is neither a detailed account of the facts of the case, nor is it an opinion piece concerning the rights and wrongs of the Italian legal system. If you are seeking such things, then please Google your heart away: you’ll find plenty of accounts and opinion online.

The first thing to strike me about this case is the different opinions as to the types of information scientists should be providing about earthquake risks. In the week prior to the earthquake, these scientists met to assess the probability of an earthquake taking place, given an unusually high level of seismic activity in the preceding weeks. The scientists may have believed that providing such an assessment was the extent of their role: according to an official press release from Italy’s Department of Civil Protection, the purpose of this meeting was to provide local citizens “with all the information available to the scientific community about the seismic activity of recent weeks”. However, a key argument in the prosecution’s case was that the scientists had a legal obligation to provide a risk assessment that took into consideration factors such as population density, the age and structure of local housing, etc. There is a world of difference between assessing the probability of an event and conducting a risk assessment.

Does this sound familiar? Have YOU ever run into a situation where testers and their customers have conflicting views as to the role of the tester? I see these things playing out pretty much every day. Testers providing “sign off” vs. “assuring quality” vs. “providing information”: these are well known and well publicized debates that are not going to go away any time soon. Ignoring these issues and allowing such expectation gaps to persist is to court disaster: they erode and destroy relationships and it is the tester who will lose when unable to live up to the expectations of those they serve. Nor is toeing the line and trying to keep everyone happy a solution. Often testers must deliver difficult messages, and it is impossible to do so whilst playing popularity games. Imagine being the tester who has been cowed into “signing off” on a product that ends up killing someone or causing a financial loss. If you do this, then you deserve what is coming to you.

Now, whilst I strongly subscribe to the view that our role is as information providers, I have noticed something disturbing lately: testers who seem to feel that their responsibility ends with adding smiling/frowning faces to a dashboard or filing bug reports, “we reported the bug so now it’s not our problem”. This is wholly inadequate. If our role is to provide information, then handing over a report is not enough. Providing information implies not only acquiring information, but communicating it effectively. A test plan is not communication. A bug report is not communication. Even a conversation is not communication. Only when the information borne by such artifacts is absorbed and understood has communication taken place. This is neurons, not paperwork. I recently had a conversation with a tester who objected to articulating the need to address automation-related technical debt in a way that would be understood by project executives. Perhaps he thought this was simply self-evident. Perhaps the requests of testers should simply be accepted? Perhaps the language of testers is easily understood by executives? I disagree on all counts: testers need to be master communicators; we need to learn how to adapt our information to different mediums, but most importantly we need to learn to tailor our message to our many different audiences. Facts rarely speak for themselves; we need to give them a voice.

Another aspect of this case that I find interesting is that it seems that Bernardo De Bernardinis, the government official who has been convicted, may have had a different agenda from the scientists: to reassure the public. He had motivation to do so: not only had a local resident been making widespread, unofficial, earthquake predictions, but in 1985 Giuseppe Zamberletti, a previous head of the Department of Civil Protection, was investigated for causing panic after ordering several, and in hindsight unnecessary, evacuations in Tuscany. Before the above meeting took place he told journalists that everything was “normal” and that there was “no danger”. This advice had fatal results: many of the local inhabitants abandoned their traditional earthquake precautions in the belief that they were safe.

This is the kind of reassurance that science cannot, that scientists should not, give. It is the same with testing. Have you ever worked for a project manager, product owner or customer who simply wanted to know that everything would be okay? Of course you have, this is human nature: we crave certainty. Unfortunately, certainty is not a service we can provide. Not if we want to avoid misleading our customers. Not if we value integrity. We are not in the confidence business: we are no more Quality Reassurance than we are Quality Assurance.

Science and testing are often misunderstood, and the customers of both have needs and expectations that cannot be fulfilled by either. Scientists need to do a better job of communicating the limits of what they can do. Testers need to do the same. In the L’Aquila case, the prosecution stated that the accused provided “incomplete, imprecise, and contradictory information”. Often information IS incomplete, imprecise and contradictory. Scientists and testers alike would be well advised not to hide the fact, but to frequently draw attention to it.

Throw it Away

Test automation is software. It is often more complex than the solution it is testing. Investments in automation should be justified by an ROI analysis. Automation should be based on design patterns that ensure maintainability, and its development should be subject to the same controls as any software development project. Targets should be set for automation, and progress reported. Blah, blah, blah blah blah.

Such arguments are common, and can even make a degree of sense for certain forms of automation (perhaps: I have grave concerns about script targets, automated or otherwise, and see my comments on ROI here). They also represent a phenomenally constrained way of thinking about automation and how it can help testers: the view that automation is only really useful for regression testing.

Don’t get me wrong, I have often found regression automation useful, for example:

  • The SME shock troops were the exploratory vanguard of the testing team, ripping apart each new release, iteration after iteration. As the bugs were fixed and the features stabilized, the toolsmith would build a series of automated checks on the more complex rules, or the more business-critical transactions. Our explorers had scant need to return to old battlefields and could focus on scouting out new territory.

Sadly, for each example such as the above I’ve encountered many that are more along these lines:

  • The ROI looked great. The framework was a work of art and the automation a breeze. The UI and business rules were pretty stable, and we didn’t find many regression bugs. Then (and totally unpredicted) the business decided on a significant overhaul of the UI. The tests all broke, taking our hard work with them. The automation had worked fine in the absence of any regression risk, but when regression risk was introduced the automation was toast. What was the point exactly?
  • Management was totally bought into automation. After months of painstaking design and development a framework was established. Scripting targets were set and met. Much backslapping ensued. The testers, their numbers having been cut “because we’re doing automation now”, seemed more stressed than ever. Critical regression bugs were being missed, and strangely no one would answer my question “exactly what has been automated and why?”

Now, let’s escape the narrow view that automation is a synonym for regression testing. Thinking back on those times when I have found automation to be the most useful, it strikes me that it had very little to do with regression testing at all. For example:

  • There were too many tests to contemplate. The explorers had been fought to a standstill, and were facing the prospect of many weeks of mind-numbing grind working through a thousand combinations of data. Our risk analysis suggested a failure on any of the combinations would wind up in the newspapers. Enter the toolsmith. A day’s worth of development, a data driver and a spreadsheet later, the explorers could move on.
  • A one-off data migration, never to be used again. Any inaccuracy could cost a fortune in incorrect pricing, yet the millions of rows could not be reconciled by hand, or even using commonly available diff tools. Two days’ worth of fiddling with a consumer-grade database application, and we had a custom data reconciliation tool with which to go to town on the migration (a sketch in this spirit follows the list).
  • For no apparent reason, the enterprise web app kept falling over in production. Load? Nope. Any errors in the logs? Nope. Just a gradual degradation in performance followed by collapse. An admittedly inelegant framework cobbled together in Java, a handful of commonly occurring transactions driven by Selenium, random data and a few hours of execution soon revealed resource leaks in the session handler.
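
To give a flavour of how small and disposable such tooling can be, here is a sketch in the spirit of the migration example, written here as a short Python script rather than the database application we actually used. The file names, the key column and the CSV layout are all assumptions.

    # Throwaway reconciliation of a one-off data migration: compare two extracts
    # keyed on an id column and report rows that are missing, unexpected or different.
    # File names, the key column and the layout are assumptions for illustration.
    import csv

    def load(path, key="id"):
        with open(path, newline="") as f:
            return {row[key]: row for row in csv.DictReader(f)}

    old = load("legacy_extract.csv")
    new = load("migrated_extract.csv")

    for row_id in old.keys() - new.keys():
        print(f"missing from target: {row_id}")
    for row_id in new.keys() - old.keys():
        print(f"unexpected in target: {row_id}")
    for row_id in old.keys() & new.keys():
        if old[row_id] != new[row_id]:
            print(f"differs: {row_id}: {old[row_id]} -> {new[row_id]}")

Once the migration has been signed off, there is no reason to keep such a script: its job is done.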

What do these have in common? In each case, the tool helped us to TEST. These tools had nothing to do with hitting arbitrary targets. They were not driven by a desire for cheap testing or notions of efficiency. These tools helped us test things we couldn’t otherwise have tested, find bugs that would otherwise have remained hidden, or achieve levels of coverage that would have been inconceivable without tooling. These tools were built to solve a particular problem, and with the problem solved, the tools were disposable. With an investment of mere days, hours even, no ROI analysis was required. With no expectation of reuse, maintenance was a non-issue: automation patterns need not apply.

What conclusions can we draw from these examples?

  • Automation exists to solve a testing problem and if it doesn’t do that, then it doesn’t deserve to exist. Other goals should not be allowed to interfere.
  • Automation is not about replacing human testing effort: it is about doing more, doing faster, or doing the impossible.
  • Automation need not be big, complex, or expensive. Some of the best automation is none of these, and is pretty ugly to boot. And when a tool’s job is done, it’s okay to throw it away.