
Models of Automation

Why do many people completely miss obvious opportunities to automate? This question has been bothering me for years. If you watch carefully, you’ll see this all around you: people who, by rights, have deep and expansive experience of test automation, yet are unable to see the pot of gold lying right in front of them. Instead, they charge off hither and thither, searching for nuggets or ROI-panning in the dust.

Last year I witnessed a conversation which, whilst frustrating at the time, suggested an answer to this puzzle. One of my teams had set about creating a set of tools to help us test an ETL process. We would use these tools to mine test data and evaluate it against a coverage model (as a prelude to data conditioning), for results prediction and for reconciliation. Our client asked that we hook up with their internal test framework team to investigate how we might integrate our toolset with their test management tool. The resulting conversation resembled first contact, without the benefit of an interpreter:

Framework team: So how do you run the tests?

Test team: Well, first we mine the data, and then we condition it, run the parallel implementation to predict results, execute the real ETL, do a reconciliation and evaluate the resulting mismatches for possible bugs.

Framework team: No, I mean, how do you run the automated tests?

Test team: Well, there aren’t any automated tests per se, but we have tools that we’ve automated.

Framework team: What tests do those tools run?

Test team: They don’t really. We run the tests, the tools provide information…

Framework team: So…how do you choose which tests to run and how do you start them?

Test team: We never run one test at a time, we run thousands of tests all at the same time, and we trigger the various stages with different tools.

Framework team: Urghh?

Human beings are inveterate modelers: we represent the world through mental constructs that organize our beliefs, knowledge and experiences. It is from the perspective of these models that we draw our understanding of the world. And when people with disjoint mental models interact, the result is often confusion.

So, how did the framework team see automation?

  • We automate whole tests.

And the test team?

  • We automate tasks. Those tasks can be parts of tests.

What we had here wasn’t just a failure to communicate, but a mismatch in the way these teams perceived the world. Quite simply, these two teams had no conceptual framework in common; they had little basis for a conversation.

Now, such models are invaluable; we could not function without them: they provide cues as to where to look, what to take note of, what to consider important. The cognitive load of processing every detail provided by our senses would paralyze us. The price is that we do not perceive the world directly; rather, we do so through the lens of our models. And it is easy to miss things that do not fit those models. This is exactly like test design: the mechanism is the same. When we rely on a single test design technique, which after all is only a model used to enumerate specific types of tests, we will tend to find bugs of only a single class: we will be totally blind to other types of bug. When we use a single model to identify opportunities to automate, we will only find opportunities of a single class, and be totally blind to opportunities that don’t fit that particular mold.

Let’s look at the model being used by the framework team again. This is the more traditional view of automation. It’s in the name: test automation. The automation of tests. If you’re not automating tests it’s not test automation, right? For it to be test automation, the entire test must be automated. This is a fundamental assumption that underpins the way many people think about automation. You can see it at work behind many common practices:

  • The measurement, and targeting, of the proportion of tests that have been automated. The replacement (and I use the term loosely) of tests performed by humans with tests performed by machines. The unit of measure is the test, the whole test and nothing but the test.
  • The selection of tests for automation by using Return on Investment analysis. Which tests, when automated, offer the greatest return on the cost of automation? Which tests should we consider? The emphasis is on evaluating tests for automation, not on evaluating what could be usefully automated.
  • Seeking to automate all tests. Not only must whole tests be automated, every last one should be.

This mental model, this belief that we automate at the level of whole tests, may have its uses. It may guide us to see cases where we might automate simple checks of results vs. expectations for given inputs. But it is blind to more sophisticated cases. It is blind to opportunities where parts of a test might be automated and other parts may not be. And many tests are exactly like this! Applying a different model of what constitutes a test, and applying it in the context of testing on a specific project, can provide insight into where automation may prove useful. So, when looking for opportunities for automation, I’ve turned to this model for aid[1]:

  • Analyze. Can we use tools to analyze the testing problem? Can it help us extract meaningful themes from specifications, to enumerate interesting or useful tests from a model, to understand coverage?
  • Install and Configure. Can we use tools to install or configure the software? To rapidly switch between states?  To generate, condition and manage data?
  • Drive. Can we use tools to stimulate the software? To simulate input or interactions with other components, systems?
  • Predict. Can we use tools to predict what the software might do under certain conditions?
  • Observe. Can we use tools to observe the behavior of the software? To capture data or messages? To monitor timing?
  • Reconcile. Can we use tools to reconcile our observations and predictions? To help draw attention to mismatches between reality and expectations?
  • Evaluate. Can we use tools to help us make sense of our observations? To aggregate, analyze or visualize results such that we might notice patterns?
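
To make the Reconcile and Evaluate tasks a little more concrete, here is a minimal sketch in Java (not the toolset described above; the row structure and field names are hypothetical) of a tool that compares predicted rows against observed rows keyed by ID and tallies mismatches by field, so that a tester can look for patterns rather than wade through individual failures one at a time:

    import java.util.*;

    // Minimal sketch: reconcile predicted vs. observed rows and tally mismatches
    // by field so that patterns stand out. Rows are maps of column name -> value,
    // keyed externally by a row ID. The data shapes here are hypothetical.
    public class Reconciler {

        public static Map<String, Integer> reconcile(
                Map<String, Map<String, String>> predicted,
                Map<String, Map<String, String>> observed) {

            Map<String, Integer> mismatchesByField = new TreeMap<>();

            for (Map.Entry<String, Map<String, String>> row : predicted.entrySet()) {
                Map<String, String> actual = observed.get(row.getKey());
                if (actual == null) {
                    mismatchesByField.merge("<missing row>", 1, Integer::sum);
                    continue;
                }
                for (Map.Entry<String, String> field : row.getValue().entrySet()) {
                    if (!Objects.equals(field.getValue(), actual.get(field.getKey()))) {
                        mismatchesByField.merge(field.getKey(), 1, Integer::sum);
                    }
                }
            }
            return mismatchesByField; // e.g. {price=12034, tax_code=3}
        }
    }

A pile of mismatches concentrated in a single field usually points at a single transformation rule, which is exactly the kind of pattern the Evaluate task is meant to surface.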

Of course, this model is flawed too. Just as the whole-test/replacement paradigm misses certain types of automation, this model most likely has its own blind spots. Thankfully, human beings are inveterate modelers, and there is nothing to stop you from creating your own and hopping frequently from one to another. Please do: we would all benefit from trying out a richer set of models.

[1] Derived from a model by James Bach that describes roles of test techniques (documented in BBST Foundations).

Throw it Away

Test automation is software. It is often more complex than the solution it is testing. Investments in automation should be justified by an ROI analysis. Automation should be based on design patterns that ensure maintainability, and its development should be subject to the same controls as any software development project. Targets should be set for automation, and progress reported. Blah, blah, blah blah blah.

Such arguments are common, and can even make a degree of sense for certain forms of automation (perhaps: I have grave concerns about script targets, automated or otherwise; see my comments on ROI here). They also represent a phenomenally constrained way of thinking about automation and how it can help testers: the view that automation is only really useful for regression testing.

Don’t get me wrong, I have often found regression automation useful, for example:

  • The SMEs were the shock troops, the exploratory vanguard of the testing team, ripping apart each new release, iteration after iteration. As the bugs were fixed and the features stabilized, the toolsmith would build a series of automated checks on the more complex rules, or the more business-critical transactions. Our explorers had scant need to return to old battlefields and could focus on scouting out new territory.

Sadly, for each example such as the above I’ve encountered many that are more along these lines:

  • The ROI looked great. The framework was a work of art and the automation a breeze. The UI and business rules were pretty stable, and we didn’t find many regression bugs. Then (and totally unpredicted) the business decided on a significant overhaul of the UI. The tests all broke, taking our hard work with them. The automation had worked fine in the absence of any regression risk, but when regression risk was introduced the automation was toast. What was the point exactly?
  • Management was totally bought into automation. After months of painstaking design and development a framework was established. Scripting targets were set and met. Much backslapping ensued. The testers, their numbers having been cut “because we’re doing automation now”, seemed more stressed than ever. Critical regression bugs were being missed, and strangely no one would answer my question “exactly what has been automated and why?”

Now, let’s escape the narrow view that automation is a synonym for regression testing. Thinking back on those times when I have found automation to be the most useful, it strikes me that it had very little to do with regression testing at all. For example:

  • There were too many tests to contemplate. The explorers had been fought to a standstill, and were facing the prospect of many weeks of mind numbing grind working through a thousand combinations of data. Our risk analysis suggested a failure on any of the combinations would wind up in the newspapers. Enter the toolsmith. A day’s worth of development, a data driver and a spreadsheet later, the explorers could move on.
  • A one-off data migration, never to be used again. Any inaccuracy could cost a fortune in incorrect pricing, yet the millions of rows could not be reconciled by hand, or even using commonly available diff tools. Two days’ worth of fiddling with a consumer-grade database application, and we had a custom data reconciliation tool with which to go to town on the migration.
  • For no apparent reason, the enterprise web app kept falling over in production. Load? Nope. Any errors in the logs? Nope. Just a gradual degradation in performance followed by collapse. An admittedly inelegant framework cobbled together in Java, a handful of commonly occurring transactions driven by Selenium, random data and a few hours of execution soon revealed resource leaks in the session handler (a sketch in that spirit follows this list).
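
For flavour, something in the spirit of that last example might look like the sketch below: a crude Java loop that replays a common transaction through Selenium with random data for a few hours, logging each round trip so that gradual degradation becomes visible. The URL, element locators and the transaction itself are hypothetical placeholders, not the original framework:

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;
    import java.util.Random;

    // Crude soak test: replay a common transaction with random data for a few
    // hours and log how long each round trip takes. URL, locators and the
    // "search" transaction are hypothetical placeholders.
    public class SoakTest {
        public static void main(String[] args) throws InterruptedException {
            WebDriver driver = new ChromeDriver();
            Random random = new Random();
            long end = System.currentTimeMillis() + 4L * 60 * 60 * 1000; // ~4 hours

            try {
                while (System.currentTimeMillis() < end) {
                    long start = System.currentTimeMillis();

                    // Hypothetical "search" transaction, driven with random data.
                    driver.get("https://example.internal/app/search");
                    driver.findElement(By.name("query"))
                          .sendKeys("order-" + random.nextInt(1_000_000));
                    driver.findElement(By.name("submit")).click();

                    long elapsed = System.currentTimeMillis() - start;
                    System.out.println(System.currentTimeMillis() + "," + elapsed);

                    Thread.sleep(500); // pace the loop a little
                }
            } finally {
                driver.quit();
            }
        }
    }

Plotting the logged times over a few hours of execution is usually enough to show whether performance is flat or steadily degrading.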

What do these have in common? In each case, the tool helped us to TEST. These tools had nothing to do with hitting arbitrary targets. They were not driven by a desire for cheap testing or notions of efficiency. These tools helped us test things we couldn’t otherwise have tested, find bugs that would otherwise have remained hidden, or achieve levels of coverage that would have been inconceivable without tooling. These tools were built to solve a particular problem, and with the problem solved, the tools were disposable. With an investment of mere days, hours even, no ROI analysis was required. With no expectation of reuse, maintenance was a non-issue: automation patterns need not apply.

What conclusions can we draw from these examples?

  • Automation exists to solve a testing problem and if it doesn’t do that, then it doesn’t deserve to exist. Other goals should not be allowed to interfere.
  • Automation is not about replacing human testing effort: it is about doing more, doing faster, or doing the impossible.
  • Automation need not be big, complex, or expensive. Some of the best automation is none of these, and is pretty ugly to boot. And when a tool’s job is done, it’s okay to throw it away.

 

Devil in the Detail IV

Previously:

A continued discussion about the arguments for detailed test scripts…

Argument 4: “We need scripts so that we can automate someday”.

Not so very long ago, before the transit strike in Halifax, I used to take the bus to work. I knew that I’d be buying a car at some point, but was looking for the right deal. Perhaps, knowing that I’d soon be purchasing a car, I should have bought myself some wheels, and paid for them out of my transit budget. Maybe even a spare and some winter tires. Of course, I didn’t need wheels to ride the bus, but I could have pretended that this was a cost of using transit so as to make the car look cheaper…

Testing does not require detailed step-by-step instructions. Automation does. Why hide automation’s true cost? Why? Because often testers and managers want to automate. It is seen as a universal good, or as a way to develop marketable skills. Full disclosure as to its costs might derail the proposed automation initiative.

And in many cases, rightly so, for the argument above mistakenly assumes that automated tests are equivalent to those conducted by humans. Not so. Even some poor soul who has been enslaved to scripted checks and daily test case execution targets has some chance (remote though it might be) of doing something slightly different, noticing something a little bit odd. Automation does not.

Yet automation stands to be a powerful extension to what testers can accomplish. Such automation cannot be achieved by simply translating human actions into automated ones: rather, it is achieved through understanding what types of actions a machine is better at than us. Now, why would you think that a test script would tell you that?

R.O.Why?

I’ve been noodling with this post for a while now, never quite finishing it off, and then yesterday Dorothy Graham posted Is it dangerous to measure ROI for test automation?, spurring my reply: Yes Dot, yes it is.

All too often I’ve seen ROI used in an attempt to either sell or justify test automation. The reasoning goes like this: replace a bunch of manual tests with automated ones, and it’ll save money. This is an argument based on a great big stinking ROLie. Automated checks are never a replacement for tests conducted by people. Tests and automated checks are not substitute goods. Machines cannot observe anything other than that which they’ve been programmed to check, their judgment is limited to the algorithms with which they have been provided, they are incapable of insight or intuitive leaps, and they can never play a hunch.

Even when used honestly, as a sanity check of investment decisions, this kind of thinking is perverse. As Dot states: “If we justify automation ONLY in terms of reduced human effort, we run the risk of implying that the tools can replace the people.” In other words, we can perpetuate the myth that testing is a commodity made up of low skilled and easily replaceable parts.

I do not believe that ROI is entirely irredeemable. Such cost comparisons can make sense, but only if applied below and across tests, at the level of testing tasks. For example, earlier this year, I needed to make some tooling decisions relating to the testing of a high-volume and highly complex ETL process. Rather than evaluating which tests could be replaced with automation, I instead looked at which tasks should be performed by people, and which should be performed by a machine. Here’s a thumbnail sketch of the reasoning:

  • Data mining, repetitive application of rules: automate.
  • Data conditioning: conceptually possible to automate but insanely expensive to do so exhaustively: let’s stick with people.
  • Expected result generation, high volume repetitive application of rules: automate.
  • Result reconciliation, literally billions of checks per data load: automate.
  • Bug isolation and reporting, investigation and judgment required: need people.

Of course, other things factored into the decision-making process (I’ll discuss that a little more at CAST 2012 and in a subsequent post), but having realized that I was beginning to lean heavily towards using large-scale mechanized checking to assist with much of our testing, I wanted to carefully check my thinking. In this case, rather than seeking justification, I wanted to answer one simple question: Am I making idiotic use of my customer’s money? ROI served as a test of this aspect of my strategy.

Now, this is a narrow example. The parallel nature of ETL test execution lends itself to breaking out and batching the tasks that make up individual tests. For many types of testing this is impossible, and ROI analysis useless. We need a different paradigm: a focus on value instead of replacement. This is fairly straightforward. There are many ways in which tooling can add value: it can enable testers to obtain information that would be inconceivable without tools, it can improve the accuracy and precision of the information they can access, and it can enable them to provide information faster or more cost-effectively. The tricky part is putting a price on that value so as to determine whether a particular automation effort is a worthwhile investment. So why not simply ask? Why not discuss the likely value with your customer, and ask what price they would put on it? For example:

  • “This part of the application has a high number of permutations and we can only scratch the surface without automation. What price would you put on being able to identify some major problems in there? What price would you put on knowing that despite our best efforts we haven’t found any major problems with it?”
  • “The current regression checks take about a week to complete. What price would you put on being able to complete them within a few hours of a new build?”
  • “Using flags and stopwatches, we can only scale the performance tests to around 100 testers. What price would you put on a more realistic simulation?”

This might lack the appearance of objectivity that accompanies ROI, but let’s face it: the typical ROI analysis is so speculative and riddled with estimating error as to be laughable. What this approach provides is a quick and easy way of getting to the business of value, and of focusing on what tooling can do for our customers rather than only what it will cost them.

That Test’s Got A Bug In It

Well duh! Testers are human too: tests can have bugs in them just as easily as the software we’re testing. But are these bugs compounded if we choose to automate?

Recently, this has been the topic of a stimulating discussion with a colleague: are bugs more pervasive in automated tests than in manual tests?

This is too broad a question, so let’s narrow the focus a little: imagine that a tester analyzes a specification that defines how the software will behave under given conditions, and identifies a process that consists of a sequence of tasks. Each task is non-trivial but mechanical, i.e. interpreting the relevant rules requires concentration but no judgment. There are many variations in the different conditions that this process could be executed for, and therefore the process will be executed many times over. In short, this is a classic case where tool supported testing might have value.

In such a case, will automating the process result in something buggier than performing this process with humans?

First, let’s consider the different mechanisms by which bugs can be introduced to a test:

  • Specification: requirements, design, etc. can be wrong. A tester may not recognize that there is a bug in the specification, and may propagate it into the test. Both manual and automated tests are equally susceptible to these kinds of bugs. Often the kind of close scrutiny required to translate a specification into an automated test will reveal ambiguities and inconsistencies, but this can be equally true of performing a test manually.
  • Interpretation: the tester may interpret the specification incorrectly. Automation neither adds nor subtracts from the tester’s ability to interpret specifications: the tester is equally likely to make such a mistake regardless of whether they are automating or executing manually.
  • Implementation: the tester may make an error whilst implementing the test. This mechanism could show marked differences in the degree to which bugs are introduced, and this boils down to the skill of the tester in relation to automation vs. their skill at performing the tasks manually. A skilled user of the software who has no experience of automation might introduce fewer bugs when executing manually, whereas a skilled automator who has only infrequently used the software might introduce fewer bugs when developing automated tests.

Now let’s consider the critical difference between these types of testing, namely repetition:

  • Unfortunately, few testers are gifted with perfect memories, and there is nothing quite like repetitive tasks when it comes to sending people to sleep. It is entirely likely that a tester’s memory of their initial correct interpretation of a rule will degrade over the course of multiple iterations, leading to the onset of bugs.
  • Repeated manual execution of a test might initially reduce the number of errors a tester makes, particularly if the tester is not skilled in the use of the software to begin with. Practice makes perfect, right? Often, we can get better at simple tasks by repetition. However, over the longer term, the drudge factor of repeated execution might serve to sap the tester’s concentration and cause bugs to creep in.
  • When automating, a tester only has one round of analyzing, interpreting and implementing the specification. If the tester fails to notice a specification error, or makes an error in either interpreting or implementing the specification in code, then that error becomes a bug that could manifest every time the test executes. In other words, the effects of that bug have been magnified through repetition.
  • In contrast, when testing manually, different bugs could be introduced on different iterations of the test. A particular rule might be complex, and the tester might not recognize errors in its specification on first reading: time and familiarity might allow the tester to spot a specification bug on later iterations. The tester might have to revisit such a rule between iterations, interpret it differently on subsequent passes, and in this way introduce different bugs on different iterations. Similarly, the tester’s implementation might vary between iterations, causing different bugs in different places.

Finally, let’s consider the consequences of these bugs. Bugs in tests result in one of two things:

  • A false negative, whereby an incorrect result is not recognized as such. Often test bugs derived from specification errors have an identical twin in the software under test (i.e. the developer also propagated the specification error into code) and a false negative will result. Unless detected by some other means (for example other tests, scary explosions in live use), false negatives will go undetected.
  • A false positive, whereby a correct result is incorrectly identified as a failure. These will hopefully be detected once a bug is reported, triaged and analyzed.

If detected, test bugs can sometimes be corrected:

  • Interpretation and implementation bugs in automated tests can be fixed in the same way that any software bug can. As there is only a single implementation of the tests, resolution of these bugs will apply to all execution iterations. Of course, as with any software, change can introduce regression bugs.
  • Interpretation bugs in manual tests can be addressed through the clarification of the relevant rules and through education of the testers. However, remember that interpretation bugs in manual tests can vary between execution iterations: there is nothing to prevent different interpretation bugs creeping in later.
  • Implementation bugs in manual tests might be addressed through improving a tester’s skills in the software, but this cannot eliminate the loss of concentration that accompanies repetition. Implementation bugs will persist.

In summary: we cannot form a general answer to the question “are bugs more pervasive in automated tests than in manual tests?” The skillset of the tester, the complexity of the software and the degree to which repetition is required will all significantly influence any differences between the numbers of bugs introduced in manual vs. automated tests.

However, when an automated test is broken it is consistently broken, whereas the bugs that occur in manual tests may vary dramatically between iterations. In cases where automation can genuinely substitute for manual effort, managing bugs in automated tests may be easier than in their manual counterparts.