Hard Sometimes

Being context-driven can be hard sometimes.

It is easy to get carried away with a favoured approach, a preferred method of doing things, a practice that you’ve used successfully before.

This is perfectly normal: when we start out on a new project, we’re juggling fifty million new pieces of information, struggling to make sense of it all and figuring out where to start. The filter of our experience helps us to simplify these problems rather than having to sweat through everything from first principles. If we did the latter we’d be paralyzed, like a deer in the headlights. Sometimes this is less benign: sometimes it is easy to get carried away with a new shiny thing, context be damned. I find that the lure of clever and elegant automation solutions can be particularly hard to resist.

But we’re not testing for kicks. We’re not testing because we want to do things the way we like doing them. We’re not testing because we want to build a cool automation gizmo. We have a mission: we’re testing to provide a service to our projects by doing some violence to their software. Many of our choices demand closer attention.

So think: are you considering an approach because it is a good fit, or because it looks fun, or because you’re operating on instinct?

So be honest: recognize and acknowledge your biases, admit them to others.

So be open to challenge: invite others to suggest alternatives or find reasons why your own way might not work.

So compare and contrast: explore the similarities and differences between approaches. How does each serve your mission? How do they stack up against any constraints you may be under?

So consider history: explore the similarities and differences between situations where you’ve used an approach before, and the current context. This can reveal reasons a previously successful approach might fail, or a previously unsuccessful approach might just pay off.

Being context-driven can be hard sometimes. Remembering to think is a good first step.

That Test’s Got A Bug In It

Well duh! Testers are human too: tests can have bugs in them just as easily as the software we’re testing. But are these bugs compounded if we choose to automate?

Recently, this has been the topic of a stimulating discussion with a colleague: are bugs more pervasive in automated tests than in manual tests?

This is too broad a question, so let’s narrow the focus a little: imagine that a tester analyzes a specification that defines how the software will behave under given conditions, and identifies a process that consists of a sequence of tasks. Each task is non-trivial but mechanical, i.e. interpreting the relevant rules requires concentration but no judgment. There are many variations in the different conditions that this process could be executed for, and therefore the process will be executed many times over. In short, this is a classic case where tool supported testing might have value.
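
As a rough sketch of what such tool support might look like, assume pytest as the runner and a hypothetical tiered-discount rule standing in for the mechanical tasks (both the rule and the numbers below are invented for illustration):

    # A minimal, hypothetical sketch: one mechanical rule, many condition variations.
    import pytest

    def apply_discount(order_total, customer_tier):
        # Stand-in for the system under test: applies a tiered discount.
        rates = {"standard": 0.00, "silver": 0.05, "gold": 0.10}
        return round(order_total * (1 - rates[customer_tier]), 2)

    # Each tuple is one variation of the conditions the process is executed for.
    CASES = [
        (100.00, "standard", 100.00),
        (100.00, "silver", 95.00),
        (100.00, "gold", 90.00),
        (19.99, "gold", 17.99),
    ]

    @pytest.mark.parametrize("total,tier,expected", CASES)
    def test_tiered_discount(total, tier, expected):
        # The tester's interpretation of the rule is encoded once,
        # then executed identically for every variation.
        assert apply_discount(total, tier) == expected

Note that the interpretation of the rule lives in exactly one place and is then replayed for every iteration; this becomes important when we consider repetition below.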

In such a case, will automating the process result in something buggier than performing this process with humans?

First, let’s consider the different mechanisms by which bugs can be introduced to a test:

  • Specification: requirements, design, etc. can be wrong. A tester may not recognize that there is a bug in the specification, and propagate it into the test. Both manual and automated tests are equally susceptible to these kinds of bugs. Often the kind of close scrutiny required to translate a specification into an automated test will reveal ambiguities and inconsistencies; but this can be equally true of performing a test manually.
  • Interpretation: the tester may interpret the specification incorrectly. Automation neither adds nor subtracts from the tester’s ability to interpret specifications: the tester is equally likely to make such a mistake regardless of whether they are automating or executing manually.
  • Implementation: the tester may make an error whilst implementing the test. This mechanism could show marked differences in the degree to which bugs are introduced, and this boils down to the skill of the tester in relation to automation vs. their skill at performing the tasks manually. A skilled user of the software who has no experience of automation might introduce fewer bugs when executing manually, whereas a skilled automator who has only infrequently used the software might introduce fewer bugs when developing automated tests.

Now let’s consider the critical difference between these two types of testing, namely repetition:

  • Unfortunately, few testers are gifted with perfect memories, and there is nothing quite like repetitive tasks when it comes to sending people to sleep. It is entirely likely that a tester’s memory of their initial correct interpretation of a rule will degrade over the course of multiple iterations, leading to the onset of bugs.
  • Repeated manual execution of a test might initially reduce the number of errors a tester makes, particularly if the tester is not skilled in the use of the software to begin with. Practice makes perfect, right? Often, we can get better at simple tasks by repetition. However, over the longer term, the drudge factor of repeated execution might serve to sap the tester’s concentration and cause bugs to creep in.
  • When automating, a tester only has one round of analyzing, interpreting and implementing the specification. If the tester fails to notice a specification error, or makes an error in either interpreting or implementing the specification in code, then that error becomes a bug that could manifest every time the test executes. In other words, the effects of that bug have been magnified through repetition.
  • In contrast, when testing manually, different bugs could be introduced on different iterations of the test. A particular rule might be complex, and the tester might not recognize errors in its specification on first reading: time and familiarity might allow the tester to spot a specification bug on later iterations. The tester might have to revisit such a rule between iterations, interpret it differently on subsequent passes, and in this way introduce different bugs on different iterations. Similarly, the tester’s implementation might vary between iterations, causing different bugs in different places.

Finally, let’s consider the consequences of these bugs. Bugs in tests result in one of two things:

  • A false negative, whereby an incorrect result is not recognized as such. Often test bugs derived from specification errors have an identical twin in the software under test (i.e. the developer also propagated the specification error into code) and a false negative will result. Unless detected by some other means (for example other tests, scary explosions in live use), false negatives will go undetected.
  • A false positive, whereby a correct result is incorrectly identified as a failure. These will hopefully be detected once a bug is reported, triaged and analysed. (Both outcomes are sketched in code after this list.)
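
As a small, hypothetical illustration of both outcomes, suppose the specification intends free shipping for orders of 50 or more, and the developer has read the boundary as “over 50”:

    # Hypothetical implementation: the developer read the rule as 'over 50'.
    def shipping_fee(order_total):
        return 0.0 if order_total > 50 else 5.0

    # False negative: the tester propagated the same boundary misreading into
    # the check, so it passes and the off-by-one bug at exactly 50 goes undetected.
    assert shipping_fee(50) == 5.0

    # False positive: a typo in the expected value would make a correct result
    # look like a failure (commented out so this sketch runs cleanly).
    # assert shipping_fee(100) == 5.0   # 0.0 is actually the correct result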

If detected, test bugs can sometimes be corrected:

  • Interpretation and implementation bugs in automated tests can be fixed in the same way that any software bug can. As there is only a single implementation of the tests, resolution of these bugs will apply to all execution iterations. Of course, as with any software, change can introduce regression bugs.
  • Interpretation bugs in manual tests can be addressed through the clarification of the relevant rules and through education of the testers. However, remember that interpretation bugs in manual tests can vary between execution iterations: there is nothing to prevent different interpretation bugs creeping in later.
  • Implementation bugs in manual tests might be addressed through improving a tester’s skills in the software, but this cannot eliminate the loss of concentration that accompanies repetition. Implementation bugs will persist.

In summary: we cannot form a general answer to the question “are bugs more pervasive in automated tests than in manual tests?” The skillset of the tester, the complexity of the software and the degree to which repetition is required will all significantly influence any differences between the numbers of bugs introduced in manual vs. automated tests.

However, when an automated test is broken it is consistently broken, whereas the bugs that occur in manual tests may vary dramatically between iterations. In cases where automation can genuinely substitute for manual effort, managing bugs in automated tests may be easier than in their manual counterparts.

Testing Backwards

One of my favourite projects started off by testing backwards.

The project in question involved taking software used by one customer and customizing it for use by another. First we would define which of the existing features would be preserved, which removed, and which modified. Unfortunately, none of the original development team was available, nor were there any existing models, requirements or design documents. Our starting point: the source code and a little bit of domain knowledge. This was hardly a basis for having a meaningful conversation with the customer: we needed to reverse engineer the software before we could start to change it.

Testing proved to be a big part of the solution to this problem. As strange as it might seem, this project didn’t just end with testing, it started with testing.

When you test forwards you use a model. This might be a set of requirements, it might be a design, or it might be your expectations based on experience or conversations with stakeholders.  This model allows you to make predictions as to how the software will behave under certain conditions. You then execute a test with those conditions, and verify that it behaved as predicted.

In contrast, testing backwards is concerned with deriving such a model. You investigate how the software behaves under a range of conditions, gradually building an understanding of why it behaves the way it does. This is reverse engineering, determining rules from an existing system.

You might be forgiven for assuming that testing backwards is only concerned with determining how the software works rather than assessing it and finding bugs; after all, you need some kind of model as to how it should behave in order to determine whether it fails to do so. This is not the case: the model of the software’s behaviour is not the only model in play. When you test, you bring many models to bear:

  • Models that describe generally undesirable behaviour, for example: unmanaged exceptions shouldn’t bubble up to the UI as user-unfriendly stack traces.
  • Models based on general expectations, for example: calculations should comply with mathematical rules, and things that are summed should add up (a couple of these generic checks are sketched in code after this list).
  • Models based on domain experience, for example: an order should not be processed if payment is refused.
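
As a rough illustration, assuming nothing more than a hypothetical page of UI text and a list of line-item amounts, the first two of these models can be expressed as simple, reusable checks:

    # A minimal sketch of the first two models above; the marker strings and
    # helper names are illustrative assumptions, not part of any real product.
    TRACE_MARKERS = ("Traceback (most recent call last)", "NullPointerException")

    def stack_trace_reached_ui(page_text):
        # Undesirable-behaviour model: raw stack traces should never reach the UI.
        return any(marker in page_text for marker in TRACE_MARKERS)

    def totals_disagree(line_items, displayed_total):
        # General-expectation model: things that are summed should add up.
        return round(sum(line_items), 2) != round(displayed_total, 2)

    # Hypothetical observations fed to the checks:
    assert not stack_trace_reached_ui("Order confirmed: #1234")
    assert not totals_disagree([19.99, 5.00], 24.99)

Checks of this kind require no product specification at all, which is exactly why testing backwards can still find bugs.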

When I first started on this project, I imagined that by testing backwards I was actually doing something unusual, but it slowly dawned on me that I had been doing this on every project I’d ever tested on:

  • Every time that I had started a new project and played with the software to figure out what it did, I’d been testing backwards.
  • Every time I’d refined tests to account for implementation details not apparent from the specification, I’d been testing backwards.
  • Every time I’d found a bug and prodded and poked so as to better understand what the software was doing, I’d been testing backwards.

I was struck by the power of testing backwards: by seeking to understand what the software did rather than simply measuring its conformance with expected results, we are better able to learn about the software. By developing the skills required to test backwards, we are better able to investigate possible issues. By freeing ourselves of the restrictions of a single model, a blinkered view that conformance to requirements alone equates to quality, we are better able to evaluate software in terms of value.

Would testing backwards serve your mission?

Should I Start Testing?

I recently participated in a Software Testing Club discussion about Quality Gates. This led me to reflect a little more on the subject.

My first test management gig was running system integration testing for an enterprise project. The phase was planned to last for five weeks. System testing for a number of component systems was running late, and that was making the project look bad. The project manager impressed on me the political importance of starting system integration on time.

“System test should be done in another week or so,” she said.

“So you should be able to catch that up in the last month of integration.” I reluctantly agreed.

Twenty-five weeks of hell later, I submitted the exit report. Nothing had worked out as expected. Delays to the completion of some systems had continued. Others that were supposedly complete either had gaping holes where functionality should have been, or turned into a bugfest. Attempting to test and raise system integration level bugs simply poured fuel on the fire, adding to the confusion. I swore that from that point on I’d only test software that was ready to be tested: that I’d implement strict quality gates and hard entry criteria.

I can understand why Quality Gates have become an article of faith to many testers; the above experience led me in the same direction. Fortunately, experience has also given me a reality check.

Let’s roll the clock forward a few years and consider another project: one on which time to market was the most critical factor.  The end date was not going to change: bugs or no bugs. Development was running late, and enforcing entry criteria would almost guarantee that no testing would be performed by my team, no feedback would be provided to development, no bugs would be found or fixed. I abandoned my gates, and helped to identify some major issues that we were able to iron out before we shipped.

In the first example, quality gates might have helped me. In the second, they were utterly inappropriate. The problem with quality gates goes deeper than a discussion of their relevance to different contexts, however: they are a trivial solution to a complex problem. They seek to reduce the process of answering the question “should I start testing?” to simple box checking. Perhaps the most insidious problem with gates is that they tend to emphasize reasons NOT to test. Thus the tester who has come to believe that testing is like a sewer – what you get out of it depends on what you put into it* – can use them as an excuse not to test. Unit testing incomplete? Don’t test. Major bugs outstanding? Don’t test. Documentation outstanding? Don’t test. All these factors miss a vital point: reasons TO test.

Even when test entry criteria have not been fulfilled, there are many good reasons to test:

  • Perhaps you can provide early feedback on items that aren’t quite ready for the big time.
  • Perhaps you can learn something new and think of new tests.
  • Perhaps you can become more familiar with the software.
  • Perhaps you can test your own models and assumptions about the software.
  • Perhaps you can test whether your test strategy is workable.

The decision to start testing demands more thought than simple box checking can provide. Instead, questions like these can serve as a guide:

  • If I were to start testing now, what kind of constraints would my testing be under?
  • Given those constraints, what testing missions could I accomplish?
  • For those missions, what kind of value could I provide to the project?
  • What is the value of tasks that I might forego in order to test now?
  • If I were to test now, what other costs might this impose on the project?

Next time you decide whether or not to test, use your brain, not a checklist.

*A nod to Tom Lehrer.

 

Counting Experience

Once upon a time, I believed that only testers should test, and that testing experience counted for everything. This was an easy trap to fall into: after all, I had been around the block. I’d learned lots about testing approaches, different strategies, methodologies and techniques. Looking back at my earlier efforts, and many mistakes, it was easy to think “if only I knew then what I know now!” and ascribe any improvement to experience.

Dumb. Arrogant. Mistake.

What changed my mind?

Let me briefly describe the project: we were customizing an enterprise solution owned by Client A for use by Client B. Due to the complexity of the product, the lack of any models or documentation, and the significance of the changes, this would be no simple matter. In addition, time scales were tight and we were iterating rapidly. I chose a testing approach that relied heavily on exploratory testing, with automation support for those areas that were considered high regression risk or eminently suitable for data driving.

The exploratory testers were to be the vanguard: first to test new code, first to test fixes, first to investigate and qualify bugs reported by others. This was to be the most critical testing role on the project. For this I was given two subject matter experts seconded from Client A: testers whose sole testing experience was a little bit of UAT.

Now, there’s a common misperception that I often hear about exploratory testing: “you need a lot of testing experience to do that”. Perhaps I could have listened to this conventional wisdom. Perhaps I could have succumbed to my prejudices. After meeting the testers in question, I played a hunch and chose not to.

We got started with some basics: discussing the impossibility of complete testing, the oracle problem, bug reporting expectations and the overall approach to testing for the project. Then we got testing. To demonstrate the mechanics of session-based test management (SBTM), I led the first couple of test sessions, after which I dropped back to chartering and reviewing session reports. Within a few weeks I pretty much left them to charter themselves, with a daily huddle to discuss and prioritize things to test.

In the early days I monitored progress intensively:

  • When I heard them debate whether a particular behaviour was a bug, I interrupted with a brief breakout session to discuss the relationship of quality and value, and a variety of different oracles they might consider.
  • When I heard them struggling with how to test a particular feature, I’d introduce them to a few different test design techniques and heuristics that might be relevant.
  • I’d review bug reports, and provide feedback as to follow-up testing they should consider.

Pretty soon they were a wonder to behold. They quickly assimilated every idea and concept thrown at them. The bugs they identified demonstrated significant insight into the product, its purpose, and the needs of its eventual users. It was fascinating to listen to animated discussions along these lines:

Tester 1: Is this a bug?

Tester 2: Hmm, I don’t think so. It’s consistent with both the requirements and the previous version.

Tester 1: But what if [blah, blah, blah]…doesn’t that really kill the value of this other feature?

Tester 2: Got you. That could be a problem with the requirements; we need to talk to the BA.

This pair had rapidly become the most impressive testing duo I’d ever had the pleasure of working with. How had this happened? They brought with them a blend of aptitudes and experiences that far outweighed their relative lack of testing experience:

  • They had open minds, a willingness to learn, and no preconceived notions as to “the one true way” to test.
  • They exhibited an insatiable curiosity: a desire to understand what the software was doing and why.
  • Having lived with the previous version of the product, they were dedicated to delivering quality to other users like them.
  • Their experience as users meant that they had well refined internal oracles that gave them insight into what would diminish user value.

Their experience counted for a lot, but not their testing experience: any testing know-how they needed they were able to learn along the way.

I’m not claiming that testing experience counts for nothing: I’ve also worked in situations where testers needed significant experience in testing, and in specific types of testing, or risk being eaten alive by a sophisticated client. Testing experience is only one piece in a complicated puzzle that also includes domain experience, technology experience, attitude and aptitude. Different situations will demand a different blend.

None of these factors is simple. Consider testing experience: this is not a straightforward, indivisible commodity. Testing is a diverse field, and highly context dependent. What works for you in your context might not work for me in mine. Often the recruitment of testers boils down to a simple count of years in testing. A tester can spend decades testing in one context, yet when moved suffer from “that’s not how you test” syndrome. Such an individual is ill-equipped to learn or even consider the approaches and techniques that are relevant to a new situation. Even a testing virgin could be a better choice, if accompanied by an open mind. Diversity of experience, and a willingness to learn and adapt, count for far more than years. Counting experience is for fools.

Mission Creep

“Testing was going so well,” said the tester, “at least on the first release.”

“We had a clear mandate: find as many important bugs as quickly as we could, and report them to the developers. And we found lots: the PM was happy because we were giving her a good feel for quality, the developers were happy because we were giving them rapid feedback, and the test team was happy because we felt like we were doing our jobs well.”

“I suppose it was during acceptance testing that things started to change. The UAT team hadn’t had enough exposure to the app during development, and struggled to figure out how to run their tests. In the end, the PM asked us to help out. We were only too happy to: we started designing and maintaining scripted walkthroughs of most of the key features and requirements, as well as authoring the associated test data. This took a fair amount of effort, but we were up for it: it’s a team effort after all.”

“The initial release went in pretty smoothly. I mean, some bugs crept through, but we helped out there too: a lot of what support were getting hit with, we were able to find workarounds for; anything else we were at least able to repro and isolate for the devs. We still do a lot of that now: it helps to keep up a good relationship with the support team.”

“The latest release was a lot hairier; a fair few of the devs have changed. The new guys struggled to understand why a lot of the unit tests were failing, and ended up commenting them out: this meant we started seeing a lot more regression bugs. Added to that, they’re offshore: now we’ve got developers on three continents. Communications don’t seem to be hanging together and somewhere along the line config management got messed up. We ended up doing a lot more regression testing this time around.”

“Got to go, post-mortem starts in five minutes. Release 2 went in last week, and the PM is on the warpath: she can’t understand how we missed so many major bugs.”

What happened here?

In the story above, the testers started out with a clear mission: find bugs.

…then they began to provide scripted demonstrations for acceptance testing.

…then they started to figure out workarounds and do isolation for the support team.

…then they added black box regression tests to mitigate regression risks.

…and then they started to fail in their initial mission.

After their initial success, they allowed their mission to expand beyond its original goals, and fell foul of mission creep.

Mission creep brings a number of risks:

  • Loss of effectiveness. There are many possible missions for testing, for example: finding bugs, investigating quality, reducing support costs, mitigating risks, reducing liability, conforming with standards or regulations*. Whilst any of these is potentially valid, some practices are more suitable for some missions than others. If changes are not recognized, and practices not changed accordingly, a test team can find itself working in a way that runs counter to its mission.
  • Loss of focus. Different goals can be blended, but this requires practices to be blended too. This adds complexity. If you try to do too many things, you may not be able to do any of them well.
  • Overextension. Like its project management cousin scope creep, increasing the number of goals often requires additional effort. Without a corresponding increase to time or resources, mission creep means that less effort can be allocated to each goal, making success of any goal all the less likely.

How can you address mission creep?  Here are some suggestions:

  • Make the mission explicit. Consult with your stakeholders and ensure that the mission is both understood and agreed. If appropriate (for example: if you are an external contractor or testing vendor), consider making the mission a part of your formal scope.
  • Keep an eye open for your mission changing or expanding. Don’t let change surprise you. Review regularly, and engage stakeholders in this process. Are all needs being satisfied?  Does the project need anything new out of testing?
  • Make adjustments to mission and supporting practices. Don’t let your mission and practices diverge, but consult with your stakeholders about the tradeoffs. What are the time and cost implications? Do any goals conflict? Consider whether it is possible to refocus: are any goals now redundant?

Testing missions evolve. Unrecognised, mission creep can cause significant damage to your ability to deliver what your project needs and expects of you. If you are testing the same software for any length of time, mission creep is almost inevitable: it needs to be managed.

 

* See Lessons Learned in Software Testing (Kaner, Bach & Pettichord) for a discussion of missions in testing.

Rethinking Regression, Part 5: Your Mission, Should You Choose to Accept It

Wouldn’t it be great if projects would take a “sensible” approach to mitigating regression risks? If projects applied plenty of prevention, used automated unit-level checks for confirmatory testing, and left the testers to do what they do best: find bugs.

This is not the reality on many projects. Nor is it even appropriate on every project. Not every change is sufficient to require reviews. Not every change will necessitate refactoring code. Static analysis tools can be noisy and take time to tune: not all projects will run for long enough to justify this investment. Not every project will be delivering code with a shelf-life that warrants automated unit level checks. Some projects may be having significant difficulties with their configuration management systems that require time to resolve.

“Sensible” therefore takes in a whole range of factors that a tester may not consider or even be aware of. Ultimately, it will not be the tester who determines what mitigation strategies are appropriate for the project: that is the province of the project manager.

What does this mean for the tester? In a word: mission.

It is often helpful to agree a clear testing mission with the relevant stakeholders. Doing so helps to avoid the unpleasant surprises (“You’re doing A? I thought you were doing B!”) that can result from misaligned expectations, and helps to keep the testing effort pulling in the same direction as the project.

The regression testing mission will be driven by a range of contextual factors that might include the scope, scale and nature of the changes being implemented, the stage within the project life-cycle, project constraints, and the other mitigation strategies that the project is employing. For example:

  • Project A is implementing a wide range of mitigation strategies, including configuration management and unit-level change detection. The project manager and testers agree that the testing mission should be biased towards finding bugs with only light confirmation being performed at the system level (as change detection is largely provided at the unit level).
  • Project B has effective configuration management, but no automated unit-level regression checks. The project manager and testers agree that the testing mission should strike a balance between conducting confirmation around those areas that are changing, and testing for bugs.
  • Project C has little regression mitigation: configuration management has proved highly unreliable and there are no automated unit-level regression checks. Based on the nature of the changes and the stage in the project, the project manager and testers agree that the testing mission should focus on broad confirmation of the software, with some time allocated to testing for bugs.

Explicitly discussing the regression testing mission can provide the tester with an opportunity to ensure that  the relevant project stakeholders are aware of the limitations of black box regression testing. However, if a project manager understands that black box regression testing is not the most cost-effective means of providing change detection and is seriously limited in its ability to find bugs – but decides to rely on it to mitigate regression risks – then that is his or her decision to make. In such a position, all that a tester can reasonably do is recognize that they are selling tobacco and provide a health warning so as to set expectations.

In summary, the regression problem is not a single problem; it is a range of different risks that are most effectively mitigated with a variety of different strategies. By educating their stakeholders about the limitations and tradeoffs involved with black box regression testing, testers can help them to make better risk mitigation decisions. Ultimately contextual factors will drive decisions as to which strategies are appropriate on any given project, and the regression testing mission needs to be defined accordingly.

Other posts in this series: