Forwards & Backwards

A little while ago, I wrote about “testing backwards”. In her comments, Savita pointed out that when testing backwards, I was actually doing exploratory testing. This is true; though it might be more accurate to say that I was using types of reasoning that tend to be more common in exploratory testing than in “testing forwards”.

When we test forwards we tend to rely heavily on deduction, i.e. predicting a result based on a rule. There are other forms of reasoning: induction and abduction. Each has an important part to play in testing.  When you test, you probably use all of these subconsciously: I find it useful to remain aware of these different modes of thinking, and consciously switch gears when I’m getting stuck. This post expands on these types of reasoning, and describes some of their uses in testing.

Note: whilst deduction, induction and abduction have a long history, this terminology can be terribly confusing: even now, more than twenty years after studying the subject at university, it still leaves me struggling at times. The chief difficulty is with the term abduction. This is used in different ways across disciplines and by different people (sound familiar?). Even Charles Sanders Peirce, the inventor of the term “abduction”, changed his usage during his career. Lesson 29 of Lessons Learned in Software Testing (Kaner, Bach and Pettichord) provides an excellent description of abductive inference, without which I would probably still be scratching my head.

Deduction: Reasoning to a Result

With deduction, we predict results based on a rule and a set of initial conditions. This type of reasoning is commonly associated with specification-based testing: model the software based on its requirements, derive rules from the model that define how the software should behave, feed the software a set of conditions (inputs) for each rule, and check that results match those predicted by the rule.

For example, when testing account authentication you might test that the account becomes locked after a specified number of failed attempts.  Imagine your specification states that accounts will become locked after a third unsuccessful login attempt: based on a set of conditions (three failed attempts) and a rule (three failed attempts → account locked) you predict a result (account locked). This leads you to design and execute a test that sets up the necessary conditions and allows you to observe whether the rule has been implemented correctly.
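As a minimal sketch of this deductive test, assume a hypothetical `Account` class standing in for the software under test. The class, its method names and the `LOCK_THRESHOLD` constant are illustrative assumptions, not taken from any real product. The rule comes from the imagined specification; the test sets up the conditions and checks the predicted result:

```python
LOCK_THRESHOLD = 3  # assumed value, taken from the imagined specification


class Account:
    """Hypothetical stand-in for the software under test."""

    def __init__(self, password):
        self._password = password
        self.failed_attempts = 0
        self.locked = False

    def login(self, password):
        """Return True on a successful login, False otherwise."""
        if self.locked:
            return False
        if password == self._password:
            self.failed_attempts = 0
            return True
        self.failed_attempts += 1
        if self.failed_attempts >= LOCK_THRESHOLD:
            self.locked = True
        return False


def test_account_locks_after_third_failed_attempt():
    # Conditions: three failed attempts.
    account = Account(password="correct horse")
    for _ in range(3):
        account.login("wrong password")
    # Rule: three failed attempts -> account locked.
    # Predicted result: the account is locked, so even the right password fails.
    assert account.locked
    assert account.login("correct horse") is False


test_account_locks_after_third_failed_attempt()
```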

We don’t just use deduction when testing from specifications; we use this type of reasoning in many types of test design. For example, you might understand that software often fails when subjected to large inputs and apply this to your testing: based on a condition (large input) and a rule (large input → failure) you predict a result (software failure). This leads you to test whether the software does indeed fail with a large input. In this example, you are still applying a model, but the model is not one of how the software should work (e.g. drawn from specifications) but how it could fail. In this way, many different models can be used to predict results and design tests.

Induction: Reasoning to a Rule

When we practice induction, we attempt to determine a rule based on our observations. We often do this when we use software with the goal of learning about it, i.e. when we use the evidence of our tests to infer the rules upon which it is based. This is primarily the form of reasoning that I was talking about in “Testing Backwards”:

Let’s return to the account locking example: perhaps you haven’t got a specification at all, but you’ve noticed that after three failed login attempts the application tells you that your account is now locked. From the conditions (three failed attempts) and results (account locked) you speculate that the software implements a rule (three failed attempts → account locked).

A note of caution: whilst useful, induction does not guarantee that the rule you infer will be correct. Of all the possible conditions and results you might observe, how do you know that you have observed the critical ones? Consider the above example again: what if the account appears to have become locked for some reason other than three failed login attempts? Perhaps your account was locked by another tester performing a different test? Perhaps the rule you inferred is in fact valid, but only applies to particular types of account? When you use induction to infer a rule, you are making a conjecture that can only be verified, or disproven, through further testing.
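One way to picture this is to probe the software, record conditions and results, and propose a rule that fits them. The sketch below assumes a hypothetical black-box `attempt_login` function whose hidden threshold is invented for illustration; what it produces is a conjecture, not a verified rule.

```python
def make_black_box(hidden_threshold=3):
    """Build a hypothetical system whose locking rule we cannot see."""
    state = {"failures": 0, "locked": False}

    def attempt_login(password):
        # Returns "locked", "ok" or "denied"; the internals are assumed.
        if state["locked"]:
            return "locked"
        if password == "secret":
            state["failures"] = 0
            return "ok"
        state["failures"] += 1
        if state["failures"] >= hidden_threshold:
            state["locked"] = True
            return "locked"
        return "denied"

    return attempt_login


def infer_lock_threshold(attempt_login, max_attempts=10):
    """Observe failed logins and conjecture the number that triggers a lock."""
    observations = []
    for attempt in range(1, max_attempts + 1):
        result = attempt_login("wrong")
        observations.append((attempt, result))
        if result == "locked":
            # A rule that fits the observations so far, nothing more.
            return attempt, observations
    return None, observations


conjecture, observations = infer_lock_threshold(make_black_box())
print(f"Conjecture: account locks after {conjecture} failed attempts")
print(observations)
```

A single run like this cannot rule out the alternatives raised above, such as another tester locking the account or a rule that only applies to some account types; those call for further tests.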

Abduction: Reasoning to the Best Explanation

Abduction is the process of gathering information, identifying possible explanations (rules) based on that evidence, and seeking to verify or disprove each explanation until you arrive at one which best explains the available data.

Back to the account locking example…let’s say that you determined from the specification that accounts are locked after three unsuccessful login attempts. However, on testing you determine that the account remains unlocked even after three failed attempts. Is this a bug? What could have happened here?

Based on your input, results, and some previous testing experience, you might apply a common failure model and speculate that the account locking logic has a boundary bug. Perhaps rather than implementing “lock account if failed attempts = 3” this has been implemented as “lock account if failed attempts > 3”? This might lead you to test another failed login, to see if that locks the account.

Perhaps this behavior is configurable? Perhaps a configuration file needs adjusting to set the threshold to three, or to even activate the feature. This might lead you to investigate what the specification has to say about configuration, or to nose around some configuration files.

You construct a number of possible explanations, and investigate each in turn – by reading the specs, poking around in config, attempting further tests or talking to the developers until you arrive at a reasonable explanation, or conclude that the only reasonable explanation is that you have found a bug.
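This process can be sketched as a set of candidate explanations, each predicting what an observation should show. Everything here is illustrative: the explanation names and their predictions are assumptions built around the account-locking example, not a real tool.

```python
# Each candidate explanation predicts whether the account is locked
# after a given number of consecutive failed attempts.
CANDIDATE_EXPLANATIONS = {
    "implemented as specified (lock at 3)": lambda failures: failures >= 3,
    "boundary bug (lock only when failures > 3)": lambda failures: failures > 3,
    "locking disabled in configuration": lambda failures: False,
}


def consistent_explanations(observations):
    """Return the explanations that agree with every observation so far.

    observations maps a number of failed attempts to whether the
    account was observed to be locked.
    """
    return [
        name
        for name, predicts_locked in CANDIDATE_EXPLANATIONS.items()
        if all(
            predicts_locked(failures) == locked
            for failures, locked in observations.items()
        )
    ]


# First observation: still unlocked after three failed attempts.
observed = {3: False}
print(consistent_explanations(observed))

# Follow-up test: a fourth failed attempt does lock the account.
observed[4] = True
print(consistent_explanations(observed))
```

In this sketch the follow-up test leaves only the boundary bug standing. A threshold configured to four would predict exactly the same observations, though, and telling those apart needs the other kinds of investigation described above: reading the configuration or talking to the developers.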

Conclusions

This final example illustrates the need to use more than simple deduction in testing: whether using an exploratory approach or otherwise.

In the above case, if you rely purely on deduction, you stop at the unexpected behavior: “it didn’t do what the spec says, it’s a bug”. Yet there are many other possible explanations for this behavior, some entirely benign. If you log a bug at this point, you may well be raising a false positive, damaging your credibility and wasting valuable developer time. In short, your job isn’t done until you have an explanation or can no longer justify spending any more time on this particular line of enquiry.

Classical models of testing, which overemphasize the use of deduction and simple comparisons between expected/actual results, lobotomize our testers and constrain the real value that they can bring to their projects.

Uncertainty is Good for You

Keeping our minds open to new explanations requires tolerating uncertainty, which, ironically, is precisely the mental vexation we try to relieve by thinking. Thomas Szasz, in the foreword to Levy’s Tools of Critical Thinking.

I’ve written previously about the role of testing in reducing uncertainty in software projects. You might be forgiven for thinking that uncertainty is an evil that we must drive out.

In fact, the opposite is true: uncertainty is our friend, and there is little place for certainty in a tester’s work:

  • Consider a tester who is certain that there is one correct way to test. How well do you think that tester will adapt to a changing mission or to different project constraints?
  • Consider a tester who is certain that a specification is complete and correct. How do you rate his or her chances of identifying unfulfilled needs or specified, but undesired, behaviours?
  • Consider a tester who is certain that a given test will “pass”. How motivated do you think the tester would be to run that test? How attentively do you think the tester will observe software behavior during the execution of that test?

In each of these examples, certainty is poison. In contrast:

  • Where uncertainty abounds, where there is little agreement between users, programmers, BAs, PMs and other project stakeholders, there is ample opportunity for confusion, errors and bugs. Like nature, testers abhor a vacuum: where there is an absence of certainty, there is fertile territory for our craft. Uncertainty can act as a flashing neon sign that reads “TEST ME”.
  • When we don’t understand something, we seek to do so. When we feel uncertainty about software, about how it might react, what it might do, we are experiencing the prelude to discovery, the motivation to ask “what if?”. Uncertainty is the powerhouse of testing.
  • Uncertainty drives us to question not only the software under test, but our oracles, our practices and our very mission: without such questioning, our habits and assumptions needlessly constrain us. Uncertainty is the antidote to testing chauvinism.

When we test, we seek to reduce uncertainty. Paradoxically, we must embrace uncertainty in order to do so.

Hard Sometimes

Being context driven can be hard sometimes.

It is easy to get carried away with a favoured approach, a preferred method of doing things, a practice that you’ve used successfully before.

This is perfectly normal: when we start out on a new project, we’re juggling fifty million new pieces of information, struggling to make sense of it all and figuring out where to start. The filter of our experience helps us to simplify these problems rather than having to sweat through everything from first principles. If we did the latter we’d be paralyzed, like a deer in the headlights. Sometimes this is less benign: sometimes it is easy to get carried away with a new shiny thing, context be damned. I find that the lure of clever and elegant automation solutions can be particularly hard to resist.

But we’re not testing for kicks. We’re not testing because we want to do things the way we like doing them. We’re not testing because we want to build a cool automation gizmo. We have a mission: we’re testing to provide a service to our projects by doing some violence to their software. Many of our choices demand closer attention.

So think: are you considering an approach because it is a good fit, or because it looks fun or because you’re operating on instinct?

So be honest: recognize and acknowledge your biases, admit them to others.

So be open to challenge: invite others to suggest alternatives or find reasons why your own way might not work.

So compare and contrast: explore the similarities and differences between approaches. How does each serve your mission? How do they stack up against any constraints you may be under?

So consider history: explore the similarities and differences between situations where you’ve used an approach before, and the current context. This can reveal reasons a previously successful approach might fail, or a previously unsuccessful approach might just pay off.

Being context driven can be hard sometimes. Remembering to think is a good first step.

That Test’s Got A Bug In It

Well duh! Testers are human too: tests can have bugs in them just as easily as the software we’re testing. But are these bugs compounded if we choose to automate?

Recently, this has been the topic of a stimulating discussion with a colleague: are bugs more pervasive in automated tests than in manual tests?

This is too broad a question, so let’s narrow the focus a little: imagine that a tester analyzes a specification that defines how the software will behave under given conditions, and identifies a process that consists of a sequence of tasks. Each task is non-trivial but mechanical, i.e. interpreting the relevant rules requires concentration but no judgment. There are many variations in the different conditions that this process could be executed for, and therefore the process will be executed many times over. In short, this is a classic case where tool supported testing might have value.

In such a case, will automating the process result in something buggier than performing this process with humans?

First, let’s consider the different mechanisms by which bugs can be introduced to a test:

  • Specification: requirements, design, etc. can be wrong. A tester may not recognize that there is a bug in the specification, and propagate it into the test. Both manual and automated tests are equally susceptible to these kinds of bugs. Often the kind of close scrutiny required to translate a specification into an automated test will reveal ambiguities and inconsistencies; but this can be equally true of performing a test manually.
  • Interpretation: the tester may interpret the specification incorrectly. Automation neither adds nor subtracts from the tester’s ability to interpret specifications: the tester is equally likely to make such a mistake regardless of whether they are automating or executing manually.
  • Implementation: the tester may make an error whilst implementing the test. This mechanism could show marked differences in the degree to which bugs are introduced, and this boils down to the skill of the tester in relation to automation vs. their skill at performing the tasks manually. A skilled user of the software who has no experience of automation might introduce fewer bugs when executing manually, whereas a skilled automator who has only infrequently used the software might introduce fewer bugs when developing automated tests.

Now let’s consider the critical difference between these types of testing: that of repetition:

  • Unfortunately few testers are gifted with perfect memories and there is nothing quite like repetitive tasks when it comes to sending people to sleep. It is entirely likely that a tester’s memory of their initial correct interpretation of a rule will degrade over the course of multiple iterations, leading to the onset of bugs.
  • Repeated manual execution of a test might initially reduce the number of errors a tester makes, particularly if the tester is not skilled in the use of the software to begin with. Practice makes perfect, right? Often, we can get better at simple tasks by repetition. However, over the longer term, the drudge factor of repeated execution might serve to sap the tester’s concentration and cause bugs to creep in.
  • When automating, a tester only has one round of analyzing, interpreting and implementing the specification. If the tester fails to notice a specification error, or makes an error in either interpreting or implementing the specification in code, then that error becomes a bug that could manifest every time the test executes. In other words, the effects of that bug have been magnified through repetition.
  • In contrast, when testing manually, different bugs could be introduced on different iterations of the test. A particular rule might be complex, and the tester might not recognize errors in its specification on first reading: time and familiarity might allow the tester to spot a specification bug on later iterations. The tester might have to revisit such a rule between iterations, interpret it differently on subsequent passes, and in this way introduce different bugs on different iterations. Similarly, the tester’s implementation might vary between iterations, causing different bugs in different places.

Finally, let’s consider the consequences of these bugs. Bugs in tests result in one of two things:

  • A false negative, whereby an incorrect result is not recognized as such. Often test bugs derived from specification errors have an identical twin in the software under test (i.e. the developer also propagated the specification error into code) and a false negative will result. Unless detected by some other means (for example other tests, scary explosions in live use), false negatives will go undetected.
  • A false positive, whereby a correct result is incorrectly identified as a failure. These will hopefully be detected once a bug is reported, triaged and analysed.
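A small sketch may make the two outcomes concrete. It assumes a made-up rule (orders of 100 or more earn a 10% discount) and hypothetical functions for the software and for the test's expected result; none of this comes from a real system.

```python
def correct_discount(total):
    """What the customer actually needs: 10% off orders of 100 or more."""
    return 10 if total >= 100 else 0


def discount_from_flawed_spec(total):
    """The rule as (wrongly) specified: strictly more than 100."""
    return 10 if total > 100 else 0


def run_check(software, expected, total):
    """Return 'pass' or 'fail' for a single automated check."""
    return "pass" if software(total) == expected(total) else "fail"


# False negative: the software and the test both inherited the
# specification error, so the wrong result at the boundary goes unnoticed.
print(run_check(discount_from_flawed_spec, discount_from_flawed_spec, 100))

# False positive: the software is right but the test carries the bug,
# so a correct result is reported as a failure.
print(run_check(correct_discount, discount_from_flawed_spec, 100))
```

Away from the boundary (say, a total of 150) both versions agree, which is one reason such test bugs can survive many runs before anyone notices them.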

If detected, test bugs can sometimes be corrected:

  • Interpretation and implementation bugs in automated tests can be fixed in the same way that any software bug can. As there is only a single implementation of the tests, resolution of these bugs will apply to all execution iterations. Of course, as with any software, change can introduce regression bugs.
  • Interpretation bugs in manual tests can be addressed through the clarification of the relevant rules and through education of the testers. However, remember that interpretation bugs in manual tests can vary between execution iterations: there is nothing to prevent different interpretation bugs creeping in later.
  • Implementation bugs in manual tests might be addressed through improving a tester’s skills in the software, but this cannot eliminate the loss of concentration that accompanies repetition. Implementation bugs will persist.

In summary: we cannot form a general answer to the question “are bugs more pervasive in automated tests than in manual tests?” The skillset of the tester, the complexity of the software and the degree to which repetition is required will all significantly influence any differences between the numbers of bugs introduced in manual vs. automated tests.

However, when an automated test is broken it is consistently broken, whereas the bugs that occur in manual tests may vary dramatically between iterations. In cases where automation can genuinely substitute for manual effort, managing bugs in automated tests may be easier than in their manual counterparts.

Testing Backwards

One of my favourite projects started off by testing backwards.

The project in question involved taking software used by one customer and customizing it for use by another. First we would define which of the existing features would be preserved, removed, and modified. Unfortunately, none of the original development team was available, nor were there any existing models, requirements or design documents. Our starting point: the source code and a little bit of domain knowledge. This was hardly a basis for having a meaningful conversation with the customer: we needed to reverse engineer the software before we could start to change it.

Testing proved to be a big part of the solution to this problem. As strange as it might seem, this project didn’t just end with testing, it started with testing.

When you test forwards you use a model. This might be a set of requirements, it might be a design, or it might be your expectations based on experience or conversations with stakeholders.  This model allows you to make predictions as to how the software will behave under certain conditions. You then execute a test with those conditions, and verify that it behaved as predicted.

In contrast, testing backwards is concerned with deriving such a model. You investigate how the software behaves under a range of conditions, gradually building an understanding of why it behaves the way it does. This is reverse engineering, determining rules from an existing system.

You might be forgiven for assuming that testing backwards is only concerned with determining how the software works rather than assessing it and finding bugs: after all, you need some kind of model as to how it should behave in order to determine whether it fails to do so. This is not the case: the model of the software’s behaviour is not the only model in play. When you test, you bring many models to bear:

  • Models that describe generally undesirable behaviour, for example: unmanaged exceptions shouldn’t bubble up to the UI as user-unfriendly stack traces.
  • Models based on general expectations, for example: calculations should comply with mathematical rules, things that are summed should add up.
  • Models based on domain experience, for example: an order should not be processed if payment is refused.
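Models like these can be written down as checks that need no specification of the product. The sketch below assumes a hypothetical order record (the field names are invented for illustration) and applies two such general oracles to observed behaviour:

```python
from decimal import Decimal


def check_general_oracles(order):
    """Return a list of problems found using specification-free models."""
    problems = []

    # General expectation: things that are summed should add up.
    line_total = sum(item["price"] * item["quantity"] for item in order["items"])
    if line_total != order["total"]:
        problems.append(
            f"total {order['total']} does not match line items {line_total}"
        )

    # Domain experience: an order should not be processed if payment is refused.
    if order["payment_status"] == "refused" and order["status"] == "processed":
        problems.append("order processed although payment was refused")

    return problems


# Hypothetical observation recorded while exploring the software.
observed_order = {
    "items": [
        {"price": Decimal("9.99"), "quantity": 2},
        {"price": Decimal("5.00"), "quantity": 1},
    ],
    "total": Decimal("24.89"),
    "payment_status": "refused",
    "status": "processed",
}

for problem in check_general_oracles(observed_order):
    print(problem)
```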

When I first started on this project, I imagined that by testing backwards I was actually doing something unusual, but it slowly dawned on me that I had been doing this on every project I’d ever tested on:

  • Every time that I had started a new project and played with the software to figure out what it did, I’d been testing backwards.
  • Every time I’d refined tests to account for implementation details not apparent from the specification, I’d been testing backwards.
  • Every time I’d found a bug and prodded and poked so as to better understand what the software was doing, I’d been testing backwards.

I was struck by the power of testing backwards: by seeking to understand what the software did rather than simply measuring its conformance with expected results, we are better able to learn about the software. By developing the skills required to test backwards, we are better able to investigate possible issues. By freeing ourselves of the restrictions of a single model, a blinkered view that conformance to requirements alone equates to quality, we are better able to evaluate software in terms of value.

Would testing backwards serve your mission?

Should I Start Testing?

I recently participated in a Software Testing Club discussion about Quality Gates. This led me to reflect a little more on the subject.

My first test management gig was running system integration testing for an enterprise project. The phase was planned to last for five weeks. System testing for a number of component systems was running late, and that was making the project look bad. The project manager impressed on me the political importance of starting system integration on time.

“System test should be done in another week or so” she said.

“So you should be able to catch that up in the last month of integration”. I reluctantly agreed.

Twenty-five weeks of hell later, I submitted the exit report. Nothing had worked out as expected. Delays had continued on completion of some systems. Others that were supposedly complete either had gaping holes where functionality should have been, or turned into a bugfest. Attempting to test and raise system integration level bugs simply poured fuel on the fire, adding to the confusion. I swore that from that point on I’d only test software that was ready to be tested: that I’d implement strict quality gates and hard entry criteria.

I can understand why Quality Gates have become an article of faith to many testers; the above experience led me in the same direction. Fortunately, experience has also given me a reality check.

Let’s roll the clock forward a few years and consider another project: one on which time to market was the most critical factor.  The end date was not going to change: bugs or no bugs. Development was running late, and enforcing entry criteria would almost guarantee that no testing would be performed by my team, no feedback would be provided to development, no bugs would be found or fixed. I abandoned my gates, and helped to identify some major issues that we were able to iron out before we shipped.

In the first example, quality gates might have helped me. In the second, they were utterly inappropriate. The problem with quality gates goes deeper than a discussion of relevance to different contexts, however: they are a trivial solution to a complex problem. They seek to reduce the process of answering the question “should I start testing” to simple box checking. Perhaps the most insidious problem with gates is that they tend to emphasize reasons NOT to test. Thus the tester who has come to believe that testing is like a sewer – what you get out of it depends on what you put into it* – can use them as an excuse not to test. Unit testing incomplete? Don’t test. Major bugs outstanding? Don’t test. Documentation outstanding? Don’t test. All these factors miss a vital point: reasons TO test.

Even when test entry criteria have not been fulfilled, there are many good reasons to test:

  • Perhaps you can provide early feedback on items that aren’t quite ready for the big time
  • Perhaps you can learn something new and think of new tests
  • Perhaps you can become more familiar with the software
  • Perhaps you can test your own models and assumptions about the software
  • Perhaps you can test whether your test strategy is workable

The decision to start testing demands more thought than simple box checking can provide. Instead, questions like these can serve as a guide:

  • If I were to start testing now, what kind of constraints would my testing be under?
  • Given those constraints, what testing missions could I accomplish?
  • For those missions, what kind of value could I provide to the project?
  • What is the value of tasks that I might forego in order to test now?
  • If I were to test now, what other costs might this impose on the project?

Next time you decide whether or not to test, use your brain, not a checklist.

*A nod to Tom Lehrer.

 

Counting Experience

Once upon a time, I believed that only testers should test, and testing experience counted for everything.  This was an easy trap to fall into: after all, I had been around the block. I’d learned lots about testing approaches, different strategies, methodologies and techniques. Looking back at my earlier efforts, and many mistakes, it was easy to think “if only I knew then what I know now!” and ascribe any improvement to experience.

Dumb. Arrogant. Mistake.

What changed my mind?

Let me briefly describe the project: we were customizing an enterprise solution owned by Client A for use by Client B. Due to the complexity of the product, the lack of any models or documentation, and the significance of the changes, this would be no simple matter. In addition, time scales were tight and we were iterating rapidly. I chose a testing approach that relied heavily on exploratory testing, with automation support for those areas that were considered high regression risk or eminently suitable for data driving.

The exploratory testers were to be the vanguard: first to test new code, first to test fixes, first to investigate and qualify bugs reported by others. This was to be the most critical testing role on the project. For this I was given two subject matter experts seconded from Client A: testers whose sole testing experience was a little bit of UAT.

Now, there’s a common misperception that I often hear about ET: “you need a lot of testing experience to do that”. Perhaps I could have listened to this conventional wisdom. Perhaps I could have succumbed to my prejudices. After meeting the testers in question, I played a hunch and chose not to.

We got started with some basics: discussing the impossibility of complete testing, the oracle problem, bug reporting expectations and the overall approach to testing for the project. Then we got testing. To demonstrate the mechanics of SBTM, I led the first couple of test sessions, after which I dropped back to chartering and reviewing session reports. Within a few weeks I pretty much left them to charter themselves, with a daily huddle to discuss and prioritize things to test.

In the early days I monitored progress intensively:

  • When I heard them debate whether a particular behaviour was a bug, I interrupted with a brief breakout session to discuss the relationship of quality and value, and a variety of different oracles they might consider.
  • When I heard them struggling with how to test a particular feature, I’d introduce them to a few different test design techniques and heuristics that might be relevant.
  • I’d review bug reports, and provide feedback as to follow-up testing they should consider.

Pretty soon they were a wonder to behold. They quickly assimilated every idea and concept thrown at them. The bugs they identified demonstrated significant insight into the product, its purpose, and the needs of its eventual users. It was fascinating to listen to animated discussions along these lines:

Tester 1: Is this a bug?

Tester 2: Hmm, I don’t think so. It’s consistent with both the requirements and the previous version.

Tester 1: But what if [blah, blah, blah]…doesn’t that really kill the value of this other feature?

Tester 2: Got you. That could be a problem with the requirements; we need to talk to the BA.

This pair had rapidly become the most impressive testing duo I’d ever had the pleasure of working with. How had this happened? They brought with them a blend of aptitudes and experiences that far outweighed their relative lack of testing experience:

  • They had open minds, a willingness to learn, and no preconceived notions as to “the one true way” to test.
  • They exhibited an insatiable curiosity: a desire to understand what the software was doing and why.
  • Having lived with the previous version of the product, they were dedicated to delivering quality to other users like them.
  • Their experience as users meant that they had well refined internal oracles that gave them insight into what would diminish user value.

Their experience counted for a lot, but not their testing experience: any testing know-how they needed they were able to learn along the way.

I’m not claiming that testing experience counts for nothing: I’ve also worked in situations where testers needed significant experience in testing, and in specific types of testing, or else they would have been eaten alive by a sophisticated client. Testing experience is only one piece in a complicated puzzle that also includes domain experience, technology experience, attitude and aptitude. Different situations will demand a different blend.

None of these factors are simple. Consider testing experience: this is not a straightforward, indivisible commodity. Testing is a diverse field, and highly context dependent. What works for you in your context might not work for me in mine. Often the recruitment of testers boils down to a simple count of years in testing. A tester can spend decades testing in one context, but when moved suffer from “that’s not how you test” syndrome. Such an individual is ill equipped to learn or even consider the approaches and techniques that are relevant to a new situation. Even a testing virgin could be a better choice, if accompanied by an open mind. Diversity of experience, and a willingness to learn and adapt, count for far more than years. Counting experience is for fools.

Mission Creep

“Testing was going so well” said the tester, “at least on the first release”.

“We had a clear mandate: find as many important bugs as quickly as we could, and report them to the developers. And we found lots: the PM was happy because we were giving her a good feel for quality, the developers were happy because we were giving them rapid feedback, and the test team was happy because we felt like we were doing our jobs well.”

“I suppose it was during acceptance testing that things started to change. The UAT team hadn’t had enough exposure to the app during development, and struggled to figure out how to run their tests. In the end, the PM asked us to help out. We were only too happy to: we started designing and maintaining scripted walkthroughs of most of the key features and requirements, as well as authoring the associated test data. This took a fair amount of effort, but we were up for it: it’s a team effort after all.”

“The initial release went in pretty smoothly, I mean, some bugs crept through, but we helped out there too: a lot of what support were getting hit with, we were able to find workarounds for, anything else we were at least able to repro and isolate for the devs. We still do a lot of that now: it helps to keep up a good relationship with the support team.”

“The latest release was a lot hairier; a fair few of the devs have changed. The new guys struggled to understand why a lot of the unit tests were failing, and ended up commenting them out: this meant we started seeing a lot more regression bugs. Added to that, they’re offshore: now we’ve got developers on three continents. Communications don’t seem to be hanging together and somewhere along the line config management got messed up. We ended up doing a lot more regression testing this time around.”

“Got to go, post mortem starts in five minutes.  Release 2 went in last week, and the PM is on the war path: she can’t understand how we missed so many major bugs.”

What happened here?

In the story above, the testers started out with a clear mission: find bugs.

…then they began to provide scripted demonstrations for acceptance testing.

…then they started to figure out workarounds and do isolation for the support team.

…then they added black box regression tests to mitigate regression risks.

…and then they started to fail in their initial mission.

After their initial success, they allowed their mission to expand beyond its original goals, and fell foul of mission creep.

Mission creep brings a number of risks:

  • Loss of effectiveness. There are many possible missions for testing, for example: finding bugs, investigating quality, reducing support costs, mitigating risks, reducing liability, conforming with standards or regulations1. Whilst any of these is potentially valid, some practices are more suitable for some missions than others. If changes are not recognized, and practices not changed accordingly, a test team can find itself working in a way that runs counter to its mission.
  • Loss of focus. Different goals can be blended, but this requires practices to be blended too. This adds complexity. If you try to do too many things, you may not be able to do any of them well.
  • Overextension. Like its project management cousin scope creep, increasing the number of goals often requires additional effort. Without a corresponding increase in time or resources, mission creep means that less effort can be allocated to each goal, making the success of any one goal all the less likely.

How can you address mission creep?  Here are some suggestions:

  • Make the mission explicit. Consult with your stakeholders and ensure that the mission is both understood and agreed. If appropriate (for example: if you are an external contractor or testing vendor), consider making the mission a part of your formal scope.
  • Keep an eye open for your mission changing or expanding. Don’t let change surprise you. Review regularly, and engage stakeholders in this process. Are all needs being satisfied?  Does the project need anything new out of testing?
  • Make adjustments to mission and supporting practices. Don’t let your mission and practices diverge, but consult with your stakeholders about the tradeoffs. What are the time and cost implications? Do any goals conflict? Consider whether it is possible to refocus: are any goals now redundant?

Testing missions evolve. Unrecognised, mission creep can cause significant damage to your ability to deliver what your project needs and expects of you. If you are testing the same software for any length of time, mission creep is almost inevitable: it needs to be managed.

 

1 See Lessons Learned in Software Testing (Kaner, Bach & Pettichord) for a discussion of missions in testing.

Rethinking Regression, Part 5: Your Mission, Should You Choose to Accept It

Wouldn’t it be great if projects would take a “sensible” approach to mitigating regression risks? If they applied plenty of prevention, used automated unit-level checks for confirmatory testing, and left the testers to do what they do best: find bugs?

This is not the reality on many projects. Nor is it even appropriate on every project. Not every change is sufficient to require reviews. Not every change will necessitate refactoring code. Static analysis tools can be noisy and take time to tune: not all projects will run for long enough to justify this investment. Not every project will be delivering code with a shelf-life that warrants automated unit level checks. Some projects may be having significant difficulties with their configuration management systems that require time to resolve.

“Sensible” therefore takes in a whole range of factors that a tester may not consider or even be aware of. Ultimately, it will not be the tester who determines what mitigation strategies are appropriate for the project: that is the province of the project manager.

What does this mean for the tester? In a word: mission.

It is often helpful to agree a clear testing mission with the relevant stakeholders. Doing so helps to avoid the unpleasant surprises (“You’re doing A? I thought you were doing B!”) that can result from misaligned expectations, and helps to keep the testing effort pulling in the same direction as the project.

The regression testing mission will be driven by a range of contextual factors that might include the scope, scale and nature of the changes being implemented, the stage within the project life-cycle, project constraints, and the other mitigation strategies that the project is employing. For example:

  • Project A is implementing a wide range of mitigation strategies, including configuration management and unit-level change detection. The project manager and testers agree that the testing mission should be biased towards finding bugs with only light confirmation being performed at the system level (as change detection is largely provided at the unit level).
  • Project B has effective configuration management, but no automated unit-level regression checks. The project manager and testers agree that the testing mission should strike a balance between conducting confirmation around those areas that are changing, and testing for bugs.
  • Project C has little regression mitigation: configuration management has proved highly unreliable and there are no automated unit-level regression checks. Based on the nature of the changes and the stage in the project, the project manager and testers agree that the testing mission should focus on broad confirmation of the software, with some time allocated to testing for bugs.

Explicitly discussing the regression testing mission can provide the tester with an opportunity to ensure that  the relevant project stakeholders are aware of the limitations of black box regression testing. However, if a project manager understands that black box regression testing is not the most cost-effective means of providing change detection and is seriously limited in its ability to find bugs – but decides to rely on it to mitigate regression risks – then that is his or her decision to make. In such a position, all that a tester can reasonably do is recognize that they are selling tobacco and provide a health warning so as to set expectations.

In summary, the regression problem is not a single problem; it is a range of different risks that are most effectively mitigated with a variety of different strategies. By educating their stakeholders about the limitations and tradeoffs involved with black box regression testing, testers can help them to make better risk mitigation decisions. Ultimately contextual factors will drive decisions as to which strategies are appropriate on any given project, and the regression testing mission needs to be defined accordingly.

Uncertainty Revisited

I’ve written previously about the role of the tester in reducing uncertainty on software development projects: of how we model and observe, building our knowledge and providing information.

It might be tempting to imagine the tester as a perfect observer, standing aloof, measuring and judging. Sadly, such an image is delusional: uncertainty permeates everything we do, not just the software we seek to understand. We don’t stand outside the room looking in.

What are the sources of uncertainty in testing?

1) Much uncertainty is inherent in the testing challenge: the impossibility of complete testing guarantees that we can never have full knowledge of the software we test, nor is it conceivable that any set of test techniques will ever predict with certainty where all the bugs will be found.

2) We are subject to model uncertainty.  As testers we construct models as to how the software should work, and how it could fail. These models are invariably flawed:

  • Consider oracles, our models as to how the software should behave: every oracle is heuristic, that is to say useful but imperfect. If that were not the case, if we had complete true oracles, then these oracles would be indistinguishable from the desired state of the software under test: why would we need that software?  Further, quality is subjective, relating to the needs and values of people: as such there can be no absolute and objective oracle.1
  • Consider bug hypotheses, our models that describe potential failures: if these were not flawed, then we could perfectly predict each bug without running a single test.
  • Consider tests themselves: each test is a model that describes how the software will behave under certain conditions.  Unfortunately the range of conditions is sufficiently vast that it is easy to miss conditions that prove to be critical.  The range of resulting behaviours presents a similar challenge.2
  • Even models that describe testing itself are flawed, some more than others.
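
To make the point about heuristic oracles a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: `buggy_sort` stands in for the software under test, and `is_ordered` and `same_elements` are two partial oracles. Each is useful, and each is imperfect on its own.

```python
from collections import Counter


def is_ordered(result):
    """Heuristic oracle: is every element no greater than the one after it?"""
    return all(a <= b for a, b in zip(result, result[1:]))


def same_elements(original, result):
    """A second heuristic oracle: has anything been lost or invented?"""
    return Counter(original) == Counter(result)


def buggy_sort(items):
    """Hypothetical software under test: sorts, but silently drops duplicates."""
    return sorted(set(items))


data = [3, 1, 3, 2]
output = buggy_sort(data)

ordered_ok = is_ordered(output)            # the ordering oracle sees no problem
elements_ok = same_elements(data, output)  # the second oracle exposes the loss
```

The ordering oracle passes the buggy output, and only a different model of "correct" reveals the missing element. Adding oracles narrows the gap but never closes it: an oracle that caught every possible problem here would amount to a correct sort in its own right.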

3) Our observations are subject to measurement uncertainty: our interactions with the software influence how it behaves. This is not limited to our selection of conditions, nor even to Heisenbugs and the probe effect of resource monitors: the very rate, frequency and sequence of our actions can drive different behaviours in software (consider race conditions and resource leaks).
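
As a rough illustration of how the frequency and sequence of our actions can change what we observe, consider the following Python sketch. The `HandlePool` class, the `export_report` function and the leak on its error path are all assumptions made up for this example, and the behaviour is kept deterministic for clarity; real leaks and race conditions are rarely this tidy.

```python
class HandlePool:
    """Hypothetical fixed-size pool of resource handles."""

    def __init__(self, size):
        self.available = size

    def acquire(self):
        if self.available == 0:
            raise RuntimeError("no handles left")
        self.available -= 1

    def release(self):
        self.available += 1


def export_report(pool, fail=False):
    """Hypothetical feature under test: leaks its handle on the error path."""
    pool.acquire()
    if fail:
        return "error"  # bug: the handle is never released
    pool.release()
    return "ok"


def run_session(actions, pool_size=2):
    """Replay a sequence of actions (True = trigger the error path)."""
    pool = HandlePool(pool_size)
    results = []
    for fail in actions:
        try:
            results.append(export_report(pool, fail))
        except RuntimeError:
            results.append("crash")
    return results


gentle = run_session([True, False])       # one error, then a normal export
heavy = run_session([True, True, False])  # two errors, then a normal export
```

In the gentle session the final export succeeds; in the heavy session the very same action crashes, because the tester's earlier actions exhausted the pool. The feature has not changed between the two sessions: only the way it was exercised.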

4) We are subject to human error. Testers are humans too: our perceptions are limited, we can only reliably focus on so many things at once, and we have unavoidable psychological biases that will influence our choice of tests, the behaviours we observe, and how we interpret our observations.

Much uncertainty is epistemic, that is to say that it can be reduced.  If we are to reduce the uncertainty associated with software, we would be wise to understand the role that uncertainty plays in our own work, and seek ways in which we can reduce that too.

Notes

  • 1 Michael Bolton discusses this rather eloquently in Oracles.
  • 2 Doug Hoffman provides an interesting and detailed discussion of these issues in Why Tests Don’t Pass.