Uncertainty Revisited

I’ve written previously about the role of the tester in reducing uncertainty on software development projects: how we model and observe, building our knowledge and providing information.

It might be tempting to imagine the tester as a perfect observer, standing aloof, measuring and judging. Sadly, such an image is delusional: uncertainty permeates everything we do, not just the software we seek to understand. We don’t stand outside the room looking in.

What are the sources of uncertainty in testing?

1) Much uncertainty is inherent in the testing challenge: the impossibility of complete testing guarantees that we can never have full knowledge of the software we test, nor is it conceivable that any set of test techniques will ever predict with certainty where all the bugs will be found.

2) We are subject to model uncertainty.  As testers we construct models as to how the software should work, and how it could fail. These models are invariably flawed:

  • Consider oracles, our models as to how the software should behave: every oracle is heuristic, that is to say useful but imperfect. If that were not the case, if we had complete true oracles, then these oracles would be indistinguishable from the desired state of the software under test: why would we need that software?  Further, quality is subjective, relating to the needs and values of people: as such there can be no absolute and objective oracle. [1]
  • Consider bug hypotheses, our models that describe potential failures: if these were not flawed, then we could perfectly predict each bug without running a single test.
  • Consider tests themselves: each test is a model that describes how the software will behave under certain conditions.  Unfortunately the range of conditions is sufficiently vast that it is easy to miss conditions that prove to be critical.  The range of resulting behaviours presents a similar challenge. [2]
  • Even models that describe testing itself are flawed, some more than others.

3) Our observations are subject to measurement uncertainty: our interactions with the software influence how it behaves. This is not limited to our selection of conditions, nor even to Heisenbugs and the probe effect of resource monitors: the very rate, frequency and sequence of our actions can drive different behaviours in software (consider race conditions and resource leaks; the sketch after this list illustrates the point).

4) We are subject to human error. Testers are humans too: our perceptions are limited, we can only reliably focus on so many things at once, and we have unavoidable psychological biases that will influence our choice of tests, the behaviours we observe, and how we interpret our observations.
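Returning to the third point, here is a contrived sketch (invented for illustration, not drawn from any real product) of a race condition whose visibility depends entirely on how the test interacts with the code under test:

```python
import threading

counter = 0

def increment(times):
    # Increment a shared counter with no locking.
    global counter
    for _ in range(times):
        counter += 1  # read-modify-write: not atomic, so updates can be lost

def run(times):
    global counter
    counter = 0
    workers = [threading.Thread(target=increment, args=(times,)) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter

# A gentle test may never expose the bug; a heavier, faster one often will.
# Exact results vary with interpreter and timing, which is rather the point.
print(run(100))        # usually 200: the bug stays hidden
print(run(1_000_000))  # often less than 2,000,000: lost updates become visible
```

Whether the failure shows up at all depends on the rate and interleaving of our actions: precisely the measurement problem described above.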

Much uncertainty is epistemic, that is to say that it can be reduced.  If we are to reduce the uncertainty associated with software, we would be wise to understand the role that uncertainty plays in our own work, and seek ways in which we can reduce that too.

Notes

  • [1] Michael Bolton discusses this rather eloquently in Oracles.
  • [2] Doug Hoffman provides an interesting and detailed discussion of these issues in Why Tests Don’t Pass.

Babies 0: Bathwater 1

Michael Bolton took me a little by surprise this week when he tweeted a link to my blog with the #agile hash tag: I consider myself about as agile as a two-by-four.

That’s not to say that I am an agile virgin: I have tested on a few agile projects, and managed testing teams supporting iterative development.  I don’t however consider myself an expert, having cut my teeth on the waterfall test crunch of doom.

The best way that I can think of to characterize my experiences with agile is with one word: bipolar.

Let’s take the good first (call it Project 1), and describe it in terms of the Agile Manifesto:

  • Individuals and interactions (constant BA/dev/test interaction) over processes and tools (lots of emphasis on process and tools, constantly being tweaked and improved, more than I’ve ever witnessed elsewhere – is this some kind of agile paradox?).
  • Working software (every iteration gave us something that worked) over comprehensive documentation (minimal, but that which existed was both useful and used – i.e. no dust gatherers).
  • Customer collaboration (frequent engagement in refining requirements, defining acceptance criteria and acceptance testing) over contract negotiation (minimal).
  • Responding to change (effectively managed through backlog) over following a plan (burn down estimation).

This was a positive experience: the team gelled, the software hung together nicely, and the customer left happy.

Now let’s consider the polar opposite (Project 2):

  • Individuals and interactions (none) over processes and tools (none).
  • Working software (take a guess…) over comprehensive documentation (what documentation?).
  • Customer collaboration (who?) over contract negotiation (what?).
  • Responding to change (hourly) over following a plan (plan is a four letter word).

This project called itself agile, assumed that meant “Process – Pah! Documentation, we don’t need no stinking documentation!” and put nothing in their place.  In short, the lunatics were running the asylum – and I’m not talking developers here – this was the whole team approach to insanity.

The funny thing is, I’ve been on waterfall projects that strongly resembled Project 1: good interaction, sharing of information, useful documentation, and customer collaboration.  I’ve also been on waterfall projects that could easily have been confused with Project 2: madness personified.  The common denominator is not the label, it is the team dynamics.

Software engineering is, when all things are said and done, a social activity.  We’re not making widgets here, there is no physical product being passed from A to B: our products are information, ideas and words – and these are like the wind. Without good collaboration the moment is lost and the information along with it. Process and documentation can at least provide some base level of information sharing. Rip these out without replacing them with people talking to one another and the baby has gone out with the bathwater. Regardless of methodology and other labels: effective sharing of information helps teams to succeed.

Whatever your methodological preferences, please look after your babies.

Learning for the Long Haul, Part 2: Touring & SBTM

“Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand.” Chinese Proverb.

In the previous post I introduced the problem of building and maintaining knowledge (primarily application knowledge) within a testing team. I then described some simple artefacts that could serve as cost-effective (and opportunity-cost effective) alternatives to “idiot-scripts”. Such artefacts may prove useful as a means to introduce new testers to a given piece of software, but are a starting point at best. The bottom line: much of what we learn, we learn by doing. To effectively build knowledge we need more than passive methods (reading a document, browsing a model, watching a video), we need ways in which new testers can become actively engaged in their own learning.

Over the last few years, I have employed a handful of different approaches to doing so. These have included setting aside dedicated “play time” during which the new testers freely roam the software, and secondments where testers move into a new team and gradually build up their knowledge through testing on a live project. Whilst the results have been adequate, there is room for improvement:

  • In many cases testers were able to pick up the kind of information that was expected of them; in some cases this learning simply took too long.
  • The use of “play time” had mixed results: some testers were able to pick up significant amounts of information about the software whereas others floundered in the absence of specific learning objectives.
  • Assessment of progress was difficult, highly subjective, and not particularly granular (“Can Fred test that now?”).

As a result, I’ve started looking for ways in which this kind of learning can be enhanced such that testers:

  • Are active participants in their own learning.
  • Have specific learning objectives.
  • Are provided with a structure that supports goal setting, reflection and feedback.

In order to address these points, I am looking at a blend of tours and session based test management (SBTM).

First, let’s discuss tours. Tours are a set of simple techniques for getting familiar with a software product: they are often used within exploratory testing to emphasise the “learning” aspect of the ET triumvirate (learning, design and execution). Michael Kelly provides a good overview of a number of different types of tour in Taking a Tour Through Test Country.

Using this approach, a tester explores an item of software in order to learn about it. Some example missions are listed below:

  • Identify and list the software’s features.
  • Identify and list the software’s variables, including any thoughts as to equivalence classes and boundaries.
  • Identify and list the software’s data objects.
  • For a given data object, create a state diagram that represents its life-cycle.
  • Create a map describing how the user navigates the software.
  • Identify and list any potential or claimed benefits of the software.
  • Identify and list any decision rules implemented by the software; for complex sets of rules, represent these as decision tables or cause-effect graphs.
  • Identify and list different ways in which the software can be configured, and the consequences of each configuration.
  • Identify the ways in which the software interacts with the systems it interfaces with: draw a sequence diagram.

This is far from exhaustive, and Kelly’s article includes a number of other ideas.

In these examples, I’ve coupled learning objectives with specific deliverables such as the creation of models or inventories. I used a similar approach on a project where I was using ET with the goal of reverse engineering a product rather than testing it: in doing so I found that creating a model whilst I explored provided additional focus, helped me to keep track of my progress and helped in identifying additional avenues that might be worth investigating (at the time, I likened this to keeping a map whilst playing a 1980s text-based computer game). An added benefit in the context of learning is that such models can serve as an assessable deliverable (more on this below).
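As a concrete (and entirely invented) illustration of such a deliverable, a navigation map captured during a tour can be as simple as a dictionary of screens and the screens reachable from them. Even a crude model like this can answer useful questions:

```python
# Hypothetical screens, invented for illustration: the kind of map a tester
# might build up while touring a product's navigation.
navigation_map = {
    "Login":     ["Dashboard", "Forgot Password"],
    "Dashboard": ["Search", "Settings", "Logout"],
    "Search":    ["Results"],
    "Results":   ["Item Detail", "Search"],
    "Settings":  ["Dashboard"],
}

# Which screens have been reached but never toured as starting points?
# These are gaps in the tour, and candidates for a follow-up session.
reached = {screen for targets in navigation_map.values() for screen in targets}
untoured = sorted(reached - navigation_map.keys())
print(untoured)  # ['Forgot Password', 'Item Detail', 'Logout']
```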

Now to SBTM: Jon Bach describes Session Based Test Management as a means to organize exploratory testing without obstructing its flexibility. This seems a reasonable fit: not only is it a process with which many testers are familiar, but it also enables goal setting, reflection and feedback within a structure that is flexible enough to adapt quickly to a tester’s individual learning needs.

As I am using this to structure learning rather than testing, I’ve made a few tweaks. Here’s an overview:

The general idea is that the tester’s learning is broken into manageable sessions, each with a specific learning mission. This mission is agreed by the tester and his coach (another team member with more experience of this particular software).

With the mission established, the tester is free to tour the software or any associated material with the goal of fulfilling that mission. Whilst doing so, he constructs any agreed deliverables.

On completion of the session, the tester creates a short report that outlines what was achieved. This gives the tester an opportunity to reflect on what he has learned, how any new learning relates to his previous knowledge about the software, and what else could be investigated.

Finally, the coach and tester perform a debrief in which the session report and any deliverables are reviewed. This gives the tester an opportunity to further refine his thoughts so as to be able to articulate his learning to his coach, whilst allowing the coach to assess what the tester has learned and provide any feedback that she feels is appropriate. This debrief is also an opportunity for the coach and tester to agree potential follow-up sessions, allowing them to tailor the route through any application-specific curriculum to the needs of the individual tester.
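To make this a little more concrete, here is a rough sketch of the kind of record a learning session might produce. This is my own adaptation rather than a standard SBTM session sheet, and the field names are merely suggestions:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LearningSession:
    tester: str
    coach: str
    mission: str                  # the agreed learning objective for the session
    duration_minutes: int
    deliverables: List[str] = field(default_factory=list)  # models, inventories, etc.
    notes: str = ""               # what was learned, what surprised, open questions
    follow_ups: List[str] = field(default_factory=list)    # candidate missions for next time

# Example with invented details:
session = LearningSession(
    tester="New Tester",
    coach="Experienced Tester",
    mission="Identify and list the software's data objects",
    duration_minutes=90,
    deliverables=["Inventory of data objects", "Draft life-cycle diagram for 'Order'"],
    notes="Unclear how 'Order' relates to 'Invoice'; worth exploring further.",
    follow_ups=["Create a state diagram for the 'Invoice' life-cycle"],
)
print(session.mission)
```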

This is a work in progress, and I’ll write more on this once I’ve tested it out further. In the meantime, if you have any thoughts or ideas on the subject – or have attempted anything similar – I’d love to hear from you.

My thanks to Becky Fiedler, who helped me to coalesce some of these thoughts whilst adding to my reading list 🙂

Learning for the Long Haul, Part 1: Onboarding

Scenario 1
Project Manager: Hey, sorry about the short notice, but we’ve got a new build coming down the pipe. We really need Bob on this one next week, he’s the expert.
Test Manager: Er, sorry, Bob’s an in-demand guy…I’ve got him allocated to XYZ for the next month, no one knows it like him.
Project Manager: [disappointed silence].

Scenario 2
Tester: You know, I really like the sound of that role on the XYZ project, it’s the kind of testing we talked about when we discussed my development plan – just what I’ve been trying to get into.
Test Manager: Ugh, well…I’m going to have to think about that. It’d leave your current team short-handed, and I don’t really have anyone else with the application knowledge to backfill you right now…
Tester: [sound of resume being updated].

Scenario 3
Test Manager: Great news!  I know your team is short-handed right now, I’ve got a new tester for you.
Test Lead: Great. With our timescales, how do you think I’m going to get a newbie trained up?  That’s going to reduce our capacity not increase it.
Test Manager: [mutter].

These scenarios might sound familiar. What they have in common is that they all suggest problems in how knowledge is being managed.

Knowledge is the tester’s stock in trade. It is from a position of knowledge that we construct our models as to how software should work, and how it could fail. Testing itself can be seen as a process whereby we refine these models, reducing uncertainty as we accumulate more knowledge. Within my context, as a manager of teams who provide ongoing testing services for multiple applications, effective management of knowledge is critical. Failure to do so results in the concentration of expertise in too small a group of individuals. This can give rise to a number of risks and issues:

  • Teams lack flexibility when it comes to assigning people to different projects, the result of which is often ineffective resource allocation: some teams are left short-handed, others lack the knowledge they require.
  • Team capability is placed at risk should a critical member leave.
  • Team members may miss out on opportunities to further their careers if their knowledge locks them in to a particular role.
  • Introducing new team members into the mix can be both time consuming and distracting.

A common knee-jerk reaction to these problems is idiot-level scripting, i.e. scripting to such a level of detail that “any idiot off the street” could run the tests (just imagine how good that testing would be). Unfortunately, scripts are of limited value as a vehicle for learning, and the development and maintenance of such scripts represents a significant opportunity cost.

When it comes to providing information to new team members, there are alternatives. Here are a few that some of my teams have successfully used:

  • Build models. A picture is worth a thousand words, or in this case a thousand scripts. Models are an exceptionally powerful way to convey information, and are far more efficient at doing so than scripts.  Not only can they describe how software should work, but they can also suggest tests. Models can easily be harvested from test basis documents, or created on the fly as testers gain a better understanding of a given item of software. Examples might include feature inventories, use cases, decision tables, state diagrams, context diagrams and so on (a small example follows this list).
  • Create “how to” guides. Perhaps certain tasks will be performed frequently during testing.  Why not borrow an automation pattern and decouple the scripts from the tests? Maintaining a set of instructions in one place is far more efficient than replicating it in hundreds or thousands of places, and once testers have become familiar with a given procedure they are unlikely to rely on the instructions any more – giving them back their peripheral vision.
  • Make screen cam or video walkthroughs. A cheap alternative to the above, a quick screen cam of a given task is easy to produce, leaves less margin for error when it comes to different interpretations of instructions, and can include voice-overs that provide lots of additional information.
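By way of example, even a tiny decision table can earn its keep as a model. The discount rule below is invented, but notice how each row both documents intended behaviour and suggests an obvious test:

```python
# Invented discount rule, expressed as a decision table. Each row records a
# condition/outcome combination and doubles as a ready-made test idea.
decision_table = [
    # (customer_type, order_total, expected_discount)
    ("standard", 50,  0.00),
    ("standard", 200, 0.05),
    ("loyalty",  50,  0.05),
    ("loyalty",  200, 0.10),
]

for customer_type, order_total, expected_discount in decision_table:
    print(f"Test idea: a {customer_type} customer spending {order_total} "
          f"should receive a {expected_discount:.0%} discount")
```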

Of course, none of the above is a substitute for experience: if you’ve never ridden a bicycle before, try reading the manual first, then see how well you do.  What the above approaches do provide are ways in which learning can be facilitated without the opportunity cost of idiot-scripts. The rest is down to testing.

I’ll discuss strategies for building the knowledge of testers, and propagating knowledge throughout a test team, in later posts.

Risk Based Testing is Not Cheating

Risk based testing: a set of test management and design approaches that facilitate the prioritized selection of tests based on ideas concerning possible failures.

Over the last few years, I’ve come across some interesting attitudes towards risk based testing. Here are three:

  • Example 1. Project Manager: “No way! You’re not taking shortcuts on my project!”. This PM thought that we planned on cheating.
  • Example 2. Test Manager: “Sure! We do risk based testing: when we hit a crunch we drop the least important tests – we don’t like to do it, but sometimes you have to, right?”. This test manager thought that he was cheating.
  • Example 3. New Tester: “How do I know what’s important to test?”. Now we’re talking!

In the first two cases, risk based test management was equated with cheating. And it can seem like cheating, right? After all, you can think of a whole bunch of things to test, but many of them wind up being discarded in the interest of time or efficiency.

The reality is closer to the question asked by the naive new tester in example 3. The impossibility of complete testing means we can only ever test a sample: we’re going to test some things and not others. This makes it important to make smart choices about which tests are included in the sample, and which are not: we need to bias our sample towards those tests that are going to yield the most important information.

Risk based testing encompasses a range of approaches for doing just that. It does not hold a monopoly however: almost any testing that isn’t a simple grind through a predefined set of tests will involve exactly this kind of thought process, conscious or otherwise.
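One simple approach, and by no means the only one, is to score candidate test areas by the estimated likelihood and impact of failure and work from the top of the list. The areas and scores below are invented purely to show the shape of the idea:

```python
# Invented areas and scores: likelihood and impact on a 1-5 scale.
candidate_areas = {
    "payment processing": {"likelihood": 3, "impact": 5},
    "user registration":  {"likelihood": 4, "impact": 4},
    "report formatting":  {"likelihood": 2, "impact": 2},
    "help text":          {"likelihood": 1, "impact": 1},
}

# Rank by a simple risk score; testing effort is biased towards the top.
prioritized = sorted(
    candidate_areas.items(),
    key=lambda item: item[1]["likelihood"] * item[1]["impact"],
    reverse=True,
)

for area, scores in prioritized:
    print(f"{area}: risk score {scores['likelihood'] * scores['impact']}")
```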

Whenever a tester asks “should I test this, or that?” some form of risk based decision is being made. These kinds of decisions are normal, and they certainly aren’t cheating.

Two Tales of Learning

Riding a bicycle is a complex skill to acquire: not only must a rider stay balanced, but also steer, pedal, remain aware of his or her surroundings and even brake all at the same time.

Enter stabilisers (training wheels). By taking balance out of the equation, these reduce the complexity a little, and let the new rider focus on learning and coordinating the other skills.

When my daughter first learned to ride her bike, I assumed that once she grew confident with stabilisers, it would be easy enough for her to ride without them. Along came the big day, off came the extra set of wheels, and…disaster…over she went. A fluke accident? Once the tears had dried, she reluctantly agreed to try again…to the same result.

But why did she have this difficulty, once the stabilisers had come off? Simply put, she’d mastered the other aspects of riding, but had relied on her stabilisers, never learning to balance on her own.

More recently, I experienced something similar. Last year I bought a GPS for the car. I’ve always prided myself on both my sense of direction and my map reading skills, so had always avoided this particular gadget. However, facing the prospect of a 900 mile drive to New York with my family, I decided it would be wise not to rely solely on a map (and to this day I believe that the use of this little device saved my marriage).

Having been quite impressed with the GPS, I started using it more frequently. When I was called upon to visit our office on Prince Edward Island for the first time, out came the GPS. Over the next few months, I made several trips to the same location, each time using the GPS. Then one day…disaster…dead battery, no charger and no map in the car. It took me several wrong turns, and many miles of exploring before I finally figured out my route.

What had happened? Pre-GPS I would have used a map and a few scribbled notes on my first journey, then would easily have been able to find my way on each subsequent trip. Why was it different this time? I had been relying on the GPS, much in the same way that my daughter had relied on her stabilisers. I had relied on its detailed turn-by-turn instructions, and what’s more, had focussed on following those instructions to such a degree that I hadn’t even taken in my surroundings.

These are both true stories, but what have they got to do with testing?

In Scripting: An Industry Worst Practice?, Kaner and Bach use the navigation analogy as part of a refutation of the assumption that test scripts make effective training wheels. This, and the assertion that scripts make inexperienced testers productive whilst they are learning, are both frequently cited arguments in favour of scripts.

Can a script help a new tester learn? As in the story about my daughter, perhaps by removing some of the variables from the testing equation the tester can focus on learning some aspects of their role. This might include gaining an overview of the software’s features, its basic operation, and the common checks that are frequently performed during testing. Equally, reliance on scripts will prevent the tester from learning other things. How much does a script teach about the needs of users, the rationale for a particular piece of software, the underlying rules that it implements, or even how to test it such that important failures will be revealed? A test script abstracts these things away in the same way that stabilisers take the balance out of riding a bike. Scripts are a limited learning tool.

Can a script help a new tester test productively? As with my GPS, perhaps a script will get a tester to their destination. This might be the end of the script, or to a point that they identify that the script cannot be completed due to bugs. This might add value to the testing effort if simple checking is called for, for example when mitigating regression risks (although such an approach has issues of its own, as discussed in this series of posts). However, testing in such a manner checks only the things that the script was designed to check for: tangentially related bugs that would be obvious to anyone actually using the software are likely to be missed because the tester will be focused on following instructions and not on taking in their surroundings – in exactly the same way that I wasn’t taking in my surroundings when using my GPS. This is a dangerous approach and a good way to miss bugs.

Executing scripts can offer some limited value. However, for either learning or testing to be truly effective, a healthy dose of trial and error is required. Several weeks after the stabilisers came off, and after a few more spills, my daughter was finally riding with confidence. I no longer use my GPS for trips to Prince Edward Island, and since exploring some of the back roads of Nova Scotia, I have never lost my way again.

Spec Checking and Bug Blindness

Testing is often reduced to the act of modelling specified behaviours as expected results and checking actual software behaviours against the model.

This reduction trivializes the complexity of the testing problem, and reliance on such an approach is both flawed and dangerous.

In “Software Testing: A Craftsman’s Approach”, Paul Jorgensen discusses some questions about testing through the use of Venn diagrams. This post will use a modified version of those diagrams to explore the kinds of issues that we miss if we rely solely on checking against specifications.

Rather than depicting, as Jorgensen does, the relationship between specified, implemented and tested behaviours, the diagrams used in this post recognize that there is a distinction between desired behaviours and those that are specified.  Given a universe of all possible system behaviours, and three overlapping sets containing those behaviours that are needed, specified or implemented, we might conceive of the testing problem like this:

Those behaviours that lie within the intersection of all three sets (region 5 on the above diagram) represent those behaviours that are needed, specified and implemented.

From a bug detection viewpoint, the remaining regions of these sets (i.e. 1 to 4, 6 and 7) are more interesting in that they can be grouped in such a way as to describe 4 possible classes of bug. Let’s take a look at each class in turn:

Unimplemented Specifications

The highlighted section of this diagram equates to behaviours that were specified but not implemented: either the required features were not implemented or they were implemented incorrectly such that the intended behaviour does not occur.

It is this kind of bug that specification based checking is geared towards.

Unfulfilled Needs

In this case, the highlighted section of this diagram equates to behaviours that are needed but have not been implemented. Note that these bug classes are not mutually exclusive: a bug in region 2 can be categorized as both an unfulfilled need and an unimplemented specification bug.

This kind of bug is far more insidious than those belonging to the previous class: whilst we may catch some such bugs (those in region 2) by checking against specifications, we will miss those that relate to needs that were either not articulated or not captured when the software was specified. Finding this kind of bug requires testing that is sensitive not just to specified behaviour, but to the underlying needs of the customer.

Important: whilst some will argue that such behaviours are “out of scope” for a given project by virtue of not having been specified, building software that does not fulfill the needs of its customer is a fast route to failure.

Unexpected Behaviour

With this class, the tester’s life starts to get interesting. The highlighted section of this diagram equates to behaviours that have been implemented but were not specified: this is the realm of the unexpected bug. Occasionally, unexpected behaviour may turn into an unexpected boon (region 4): behaviour that was not specified but is actually desired (perhaps the developer had an insight into real needs and implemented something without it being in the spec).

Other than through the intervention of dumb luck, specification based checking will miss many bugs in this class. Some will be apparent where an unspecified behaviour is substituted for a specified one; however this class also includes pretty much anything that could fail. Testing for this kind of bug requires the creativity to imagine possible failures and the skill to craft tests that will determine whether or not they can occur.

Undesired Behaviour

The highlighted section of this diagram equates to behaviours that have been implemented but were neither needed nor desired: for example, “gold-plated” features or behaviours that were specified incorrectly.

Much like the previous class, specification based checking will miss many of these bugs. It will also be completely blind to behaviours that were specified but are not desired. Like the previous class, testing for this kind of bug requires imagination and skill. It also requires an understanding of customer needs that is sufficient to identify potential issues regardless of whether specified or not.
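Before drawing conclusions, the four classes can be recapped in miniature. The behaviours below are invented, with Python sets standing in for the regions of the Venn diagrams:

```python
# Invented behaviours; the sets stand in for the Venn diagram regions.
needed      = {"save order", "email receipt", "retry failed payment"}
specified   = {"save order", "email receipt", "export as XML"}
implemented = {"save order", "export as XML", "crash on empty cart"}

classes = {
    "Unimplemented specifications": specified - implemented,  # what spec checking targets
    "Unfulfilled needs":            needed - implemented,     # missed unless the need was specified
    "Unexpected behaviour":         implemented - specified,  # the spec says nothing about these
    "Undesired behaviour":          implemented - needed,     # includes faithfully built mistakes
}

for name, behaviours in classes.items():
    print(f"{name}: {sorted(behaviours)}")

# Note that "email receipt" shows up as both an unimplemented specification
# and an unfulfilled need: the classes are not mutually exclusive.
```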

Conclusions

Specification based checking is a good fit for only one of the four classes of bug discussed here.  In the other three cases, the power of such an approach is seriously limited. Whilst such an approach may be necessary, it is insufficient if the testing mission is the discovery of bugs: an excessive reliance on it will inevitably result in important bugs being missed.

Testing for many types of bugs requires a more investigative approach: an approach that brings the skill, creativity and knowledge of the tester into play.