Principles Not Rules, part 3


As much as I have an aversion to mission statements, born of years working in organisations where everybody had to have one in order to satisfy some standard or other, my team and I agreed on the following “purpose” for our testing:

“To enable informed decision making by discovering and sharing timely and relevant information about the value of solutions, or threats to that value.”

This is our purpose, this is our cause. We test to discover things, things that are useful, things that help our stakeholders make better decisions.

 Yes, it’s generic. But it’s a starting point. It’s helping us to trigger new ways of thinking through testing problems.

Perhaps more importantly, it gives us a useful lens through which to challenge ourselves. When someone suggests that we act in a particular way, we can look to our purpose and ask “does this do anything to help us achieve our purpose, does it take us in the right direction?”

Steve Jobs had something interesting to say about this: “People think focus means saying yes to the thing you’ve got to focus on. But that’s not what it means at all. It means saying no to the hundred other good ideas that there are… innovation is saying no to a thousand things”. In something like testing, which can never be complete, and in which every decision is a trade-off, that kind of focus is critical: we need to be able to say “no” to otherwise reasonable-sounding suggestions and to the sillier ones alike. In terms of examining such things, and explaining why we choose to say no, our purpose is a wonderful tool.

This kind of thinking is also making its way down to individual projects. I’ve noticed cases where testers have started to think in terms of the major “exam questions” that they need to answer, and the standard of evidence they need for their stakeholders and regulators. I’ve started hearing testers talking to other team members about what their projects are trying to achieve, and what they might need to know to help them. Test strategy is starting to look less like a bunch of logistics and more like a mandate to go discover. Gradually, setting and refining information objectives for testing seems to be becoming part of the way we work. Not everywhere, but hopefully enough to catch.

I will add a note of caution here. When setting information goals for a project, it is easy to think in confirmatory and binary terms: “does the product do X?”, “can we load the data?”. But some of the most interesting questions to ask are neither confirmatory nor binary: “what happens when?”, “how long does this take?”, “how many users can it handle before it goes BOOM?”. We should avoid closing ourselves off to such questions. Here’s a tip: if you reframe your testing objectives as questions, try to make sure they are not all closed questions. Open questions are important.


In addition to our purpose, my team and I agreed a set of eight principles: context, discovery, integration, accountability, transparency, information value, lean and learning.

I’m not going to go into the detail of these here. I suspect much of the value lies less in the concepts, or the particular wording, than in how we got there: after spending many long hours working these out, sweating the semantics, they’re highly personal to those testing with me. The process of developing and agreeing them as a group was insanely valuable: forget the shallow glossary of terms the other guys peddle, this is a real common language. We understand each other when one of us says “transparency”, or “lean”, because we’ve invested time getting to the bottom of how those labels matter to us, and we continue to invest time in sharing those meanings with those we work with.

Principles can be values, and they can be heuristics that help guide our thinking. They are not prescriptive or detailed, indeed, they can often be open to interpretation, or contradict one another (such tensions may even be a useful indicator that you’re pitching principles at the right level). This means that the user is forced to THINK when applying them, and this encourages the use of judgment.

The alternative is rules: simple mechanistic formulae or explicit instructions. Fill out this template, use this technique, check this box. Principles are different. Principles are pivotal in empowering the tester. What we’re trying to do is regulate the testing system rather than simply control it, and principles have a pedigree in regulation.

In 2007 the UK’s Financial Services Authority published a treatise on “Principles Based Regulation” that described a trend away from rules in the regulation of the financial services industry in the UK. In this, they described their rationale:

  • Large sets of detailed rules are a significant burden on industry
  • No set of rules is able to address changing circumstances. Worse, rules can delay or even prevent innovation. They tend to be retrospective, i.e. they solve yesterday’s problems rather than today’s or tomorrow’s
  • Detailed rules can divert attention towards adhering to the letter rather than the purpose of regulations, i.e. they encourage rule-following behaviour, compliance at the expense of doing what’s right.

The FSA aren’t alone. You see this in a number of domains: regulation, law, accounting and audit. In a 2006 paper called “Principles not rules: a question of judgment”*, ICAS, the Institute of Chartered Accountants of Scotland, aired views similar to those of the FSA:

  • When using rules, one’s objectives can become lost in a quest for compliance
  • In contrast to principles, rules discourage the use of judgment and deskill professionals.

So, do you want testers burdened by a large body of rules (dare I say it, a standard?) describing how they should behave? Do you want them deskilled and reduced to simply taking and obeying orders? Or do you want skilled testers who can think for themselves and apply professional judgment to choose, adapt or even innovate testing practices? If the latter, then I suggest you want to be thinking in terms of principles rather than rules.

Before moving on, it is worth mentioning that the SEC (2003) point out that principles-based regulation does not equal principles-only regulation, and indeed the FSA saw rules and principles as coexisting: the trick to a robust regulatory framework is in finding the right balance. In our framework, we place great emphasis on principles, but do maintain a handful of rules: for example, concerning the use of production data in testing. There are laws, after all.

Perhaps one of the more interesting aspects of my role this year has been finding this balance for some of the firm’s largest programmes and change initiatives. In each case we started out with long wish lists of rules, driven by a desire for consistency, yet when we considered the legitimate variation between projects we ended up agreeing principles instead, supported by a bare minimum set of rules. To avoid disempowering people, a light touch is required.


We believe that testing should be organised at the level at which delivery is performed, because the people closest to a context are those best suited to make the right decisions about what practices are needed. As a result, we do not specify any particular practices in this layer of our framework: we avoid dictating testing practices to projects, we do not push standards, we do not have a testing process document, we do not have templates. Teams, who own their own testing, are free to create, adopt or adapt such things, based on their own needs.

That is not to say that it’s a free for all. I have set a clear expectation that delivery teams are accountable for the quality of their own testing and must remain transparent in what they do. This is critical: empowerment can only thrive when there is trust.

And this is the elephant in the room. Many large enterprises are built on a foundation of mistrust: we manage projects through command and control because there is no trust; we demand suppliers comply with standards because there is no trust; we maintain elaborate sets of rules because there is no trust. To change things, we need trust; and trust is dependent on accountability and transparency.

We acknowledge that testing is a service, and that we are accountable to our stakeholders for the quality of our testing. We define the quality of our testing in terms of information value: whether the information we provide is useful, i.e. timely, relevant and consumable. We recognise that information is of no value if not shared, so we must provide transparency into what we discover and, in order to warrant those findings, into the extent, progress and limitations of our testing.

The alert amongst you may have noticed that these are three of our principles: accountability, transparency and information value. The prerequisites for trust and empowerment are firmly rooted in our framework.


Figuring out what kinds of information people need is hard. Evaluating software is hard. Sharing what you find in a way that is accessible to stakeholders is hard.

Nothing about testing is easy. It is hard enough without constraining ourselves unnecessarily with inflexible rules! But we also need to acknowledge that, when you’ve been living under a regime of command and control for a while, it can be hard to empower yourself.

This is where our testing community comes in. To break the rules-based culture, we need to create an environment where people are comfortable sharing ideas and challenging one another. We need an environment where people can ask for help and support one another. If our people are going to gain confidence and grow into the role of empowered testers, then we need to make sure that there is a support network for them. You’d be foolish to learn a trapeze act without a safety net, and I need to make sure that those testing in my corner of the organization have one. It’s early days and this is an area where I intend to make significant investment in the coming year.

Final Words

This is proving to be a fascinating journey. It’s a journey of respect, respecting people enough to give them a chance to rise to the challenge of becoming excellent testers, freeing them from the tradition of command and control that has so constrained their work. The empowerment paradox suggests that we cannot directly empower others, but by removing these obstacles, perhaps we can create the conditions for them to empower themselves.

*My thanks to James Christie for discovering and sharing this.

Principles Not Rules, part 2

My Current Challenge

I head testing within the treasury function of a bank.

Like many of our peer organizations, like many large enterprises, we have a history of having commoditized, juniorized and offshored much of our testing. Successive changes to location strategy have left our testers scattered in multiple locations, often geographically separated from their projects. In most cases testing has historically been performed by “independent” testing teams, poorly integrated into the delivery effort, managed via command and control by a handful of test managers in the centre. This model has proved expensive, slow, and not terribly insightful. It has done little to prevent a number of projects “going dark” with regards to the quality of the product being delivered – an event often followed by project failure. In some cases this model is barely – if at all – better than a placebo.

In contrast, what we want is testing that informs us about quality and helps keep projects transparent. We want effective testing that is worth what we pay for it: i.e. that is “cost effective” (the clue is in the second word!). We want testing that is integrated with delivery and that is supportive of our firm’s transition to agile.

In short, we have a big gap between reality and expectation. If this gap weren’t challenging enough, we operate in a highly regulated industry: we have a requirement to demonstrate to our regulators that we have an effective control environment. Certain programmes of work are under intensive regulatory scrutiny and this demands a high level of control and transparency.

This is my challenge: how to enable good testing – by empowering testers – yet still fulfil a seemingly contradictory need for control? This is where our “principles based” testing framework – a way of thinking about, organizing and governing our testing – comes in. It has its seeds in Simon Sinek’s golden circle.

If you haven’t seen Sinek’s TED talk – Google it, it’s worth a watch. His main argument is that most of us don’t have a purpose, cause or belief that’s worth a damn, and that when most of us communicate, we’re all about WHAT we’re doing, or HOW, but rarely WHY. In contrast, he argues, successful organisations have a clear WHY – and they start from that in everything they do.

This got me to thinking: when’s the last time I heard (outside of a conference) any testers talking about WHY they were testing? When’s the last time I saw a test strategy that gave even the slightest indication of the mission, goals or objectives of testing? So I started asking people. Why do you test? What value do you bring? I got a lot of generic and dubious answers: “improve quality”, “mitigate risk”, “provide assurance”, “because audit said we need to test”*. Unless you’ve been under a rock for the last couple of decades, you’ll know that there’s a lot of disagreement with these kinds of statements.

It occurred to me that, if so few testers have a clear sense of why they are testing, then much of their testing is in fact purposeless; and without any sense of purpose, it is easy to wind up doing a lot of things that add no value whatsoever.

I decided to start using the circles to help me address that, and to start tackling the empowerment paradox. Unfortunately, early attempts failed dismally: a lot of people got hung up on what’s a what, what’s a how and what’s a why. Much confusion! So I changed the model. Instead of Sinek’s why, how and what, I swapped in “purpose”, “principles” and “practices”. To round things off, I added “people”, giving us a model that looks like this:

[Diagram: concentric circles of purpose, principles, practices and people]

In the final post in this series, I’ll explore this model in greater depth.

*”Because audit said so” is a phrase guaranteed to drive me Gordon Ramsay. I have no problem with the auditors themselves, but rather with the use of the word “audit” as an attempt to shut down arguments, or to excuse shoddy practices. Suffice it to say that this tactic rarely works on me.

Principles Not Rules, part 1

[This week I presented at EuroSTAR 2015. My subject? How testing can be well governed without recourse to standards, and how an emphasis on principles, rather than rules, empowers the tester, freeing them to perform better testing than is likely to be achieved under a command and control regime. This series of posts is drawn from my presentation notes.]

The Empowerment Paradox

Testing, as commonly practiced, has lost its way. But I’m jumping ahead. Let me explain.

For much of my working life, I have been a consultant. One of the benefits that this affords is the opportunity to meet lots of people. And I enjoy speaking to people about their testing: how they approach it, why they think they’re doing it, what they feel they get out of it.

On one notable occasion, whilst presenting to a PMI forum – a group of project and programme managers – I played a game of word association and asked “What’s the first word that springs to mind when I say ‘Testing’?”. “Stinks” was the overwhelming response.

And to be brutally honest with you: “stinks” is the safe for work version.

This can be a hard message for a tester to hear. We often bemoan the fact that many of our colleagues don’t “get” testing, or that our stakeholders don’t seem to understand the value of what we do. Unfortunately, in my experience, when the customer of a service doesn’t see the value in it, this often means that there IS NO VALUE in it.

Now, don’t get me wrong. I’m not saying that my experience of testing has been universally stinky. I’m a context driven tester, and many of my best experiences of testing were on small projects where we were very much context driven: we sought to understand what our projects needed to know and designed testing to respond to those needs. And it worked.

Unfortunately, something seems to happen at scale. When projects are grouped and we seek “consistency”, or when we work within large organizations that promote some form of standardization, we start to take decisions away from those people most firmly rooted in project context. We take the decisions away from those people most likely to make good decisions about how to test.

I’m no exception! One of my first attempts at scaling CDT was to write a handbook that mandated practices often associated with context driven testing. The results were horrible. I had made a mistake that I see people making time and again: mistaking CDT for a bag of practices. It isn’t. It’s a bag of ANY practices, and more than that: it’s a philosophy that empowers individuals to make their own choices about testing.

Unfortunately, empowerment is hard. The very structures that put one person in a position to “empower” another will often undermine that attempt at empowerment. This paradox brings me to my current challenge…

Party like it’s 1979

We are rolling back the clock so as to prevent

you from finding better ways to test software.

Through this work we will make you value:


Management control over individual accountability

Documentation over finding out about software

Policing the lifecycle over collaboration with the team

Detailed test planning over exploration and discovery


That is, we can’t even begin to imagine

how you might have come to value the things on the right.

-The International Organization for Standardization

Stop 29119

Following an excellent presentation by James Christie at CAST2014, I participated in drafting a petition calling on the International Organization for Standardization (ISO) to withdraw those parts of the ISO 29119 standard for software testing that have been issued to date, and to suspend production of the remaining parts.

You can find the petition here, and this is why you should sign it.

A Warning From History

In 1979, with support from the Thatcher government, the British Standards Institution (BSI) first published the BS 5750 series of standards. These made provision for organizations to become “certified” and display a mark of registration, supposedly a sign of quality. By 1987, the British Government had convinced the International Organization for Standardization (ISO) to adopt BS 5750 as the basis for an international set of standards: ISO 9000.

Adoption was rapid. In the UK this was driven by Department of Trade and Industry (DTI) grants to firms who chose to register, regulation (e.g. the 1993 adoption of ISO 9001 conformance as a requirement of Oftel’s Metering and Billing scheme), and market coercion: the threat that large purchasers would only source goods and services from suppliers who were registered. Similar patterns emerged internationally, with many organizations being convinced that EU adoption of the standard meant that registration was a cost of doing business in Europe. By 2009 over a million firms were registered worldwide: over a million firms supporting an ecosystem of consultants, training providers, and assessors.

Nothing so dramatic has yet been seen in the world of software testing. When compared to the uptake of ISO 9000, IEEE 829, BS 7925 and their ilk seem to have been largely ignored. No wonder many testers seem to view the new ISO 29119 series of software testing standards with apathy.

Yet ISO 29119 could be the ISO 9000 of software testing. Whilst other testing standards were relatively limited in scope (documentation, component testing etc.) 29119’s stated aim is to “define [a] set of standards…that can be used by any organization when performing any form of software testing” (ISO/IEC/IEEE 29119-1:2013, Introduction). Any form of testing, in any organization, in any context. That means YOU.

Further, ISO 29119 is designed to support conformance and registration: conformance may be claimed to parts 2, 3 and 4; ISO 29119-2 describes how full or partial conformance with its processes might be achieved; and the website of the ISO working group responsible for the standards describes how ISO/IEC 33063 would be used to assess test processes against ISO 29119-2. ISO 29119 has all the makings of a registration regime that would extend to any testing, anywhere. The bonfire is set; all that is needed is a spark to light it.

In the late ’70s, such a spark came in the form of Thatcher’s desire to repeat the “Japanese miracle” and reinvent British management. What might ignite ISO 29119? Take banking, with its explosion in regulation following the last crash. Now imagine an event, a software failure, a Flash Crash or Knight Capital writ large, something that causes significant economic damage. Or take an airline, with hundreds of souls aboard. Or drug production. Or any of dozens of examples: software is everywhere. Now imagine the outcry: “No Senator, despite the existence of internationally agreed standards, we did not apply those standards to our testing.”1

It’s not terribly hard to imagine; indeed, it may already be happening: some vendors are calling for the adoption of standards as a result of the fiasco2. Love it or loathe it, the one thing you cannot afford to do is ignore ISO 29119.

A Flawed Approach

Standards in manufacturing make sense: the variability between two different widgets of the same type should be minimal, so acting in the same way each time a widget is produced is desirable. This does not apply to services, where demand is highly variable, or indeed to software, where every instance of demand is unique.

Attempting to act in a standardized manner in the face of variable demand is an act of insanity: it’s akin to being asked to solve a number of different problems yet merrily reciting the same answer over and over. Sometimes you’ll be right, sometimes wrong, sometimes you’ll score a partial hit. In this way, applying the processes and techniques of ISO 29119 will result in effort being expended on activities that do nothing to aid the cause of testing.

And in testing, that’s a major problem. When we test, we do so with a purpose: to discover and share information related to quality. Any activity, any effort that doesn’t contribute to doing so is waste. As “complete” testing is impossible, all testing is a sample. Any such waste results in a reduction in sample size; it equates to opportunity cost: an opportunity lost to perform certain tests. For a project constrained by quality, this translates into increased time and cost. For a project constrained by time or money, this translates into a reduction in the information available to stakeholders, and a corresponding increase in risks to quality.

The 29119 crowd might tell you that the new standard takes this into account, that it encourages you to tailor your application of the standard to each project, that it is sufficiently comprehensive that you need only select the processes and techniques that apply. This is the Swiss Army Knife fallacy: if you have one you’ll never need another tool. One of the problems with a Swiss Army knife is that it’s not much use if you need a pneumatic drill, or an ocean liner.

Training testers to use a standard in this way has a tendency to frame their thinking. Rather than trying to solve testing problems, they instead seek to choose from a set of ready-made solutions that may or may not fit. I once conducted a highly informal experiment with two groups of students. The first group was trained in a set of formal test design techniques. The second received a short briefing on some general testing principles and the use of the heuristic test strategy model3. Both were then tasked with creating a set of test ideas for the same product. The difference between the ideas generated by the two groups was stark. The first group came up with a predictable set of equivalence classes etc., whilst the second group came up with a rich and varied set of ideas. When released, and if widely adopted, part 4 (on test techniques) will give rise to a generation of testers locked firmly inside the 29119 box, without the ability or freedom to solve the problems they need to solve.

And that’s not the worst of it. For my sins, I spent a number of years as an ISO 9000 auditor. It seemed like a great idea at the time: understand the system, monitor the system, improve the system. Gradually, I realized this wasn’t the reality. People were documenting the system, documenting their reviews, documenting their responses to audit findings, and doing very little by way of improving the operation of the business. What the hell was going on? Goal displacement, that’s what. We’d created a machine geared towards complying with the standard, demonstrating conformance to the satisfaction of an assessor, and maintaining our registration once obtained. Somewhere along the line, the goal of improving our business had been forgotten. This phenomenon isn’t limited to organizations that seek compliance with external standards. Not so very long ago, I watched an organization doing much the same whilst attempting to standardize their testing processes. Significant effort was directed to completing templates for test documentation, reporting metrics and self-assessing against the internal standard – all with no regard for the relevance or value of doing so for the projects that testing was meant to be serving.

Waste, waste, and more waste.

So when standards proponents tell you that following ISO 29119 will improve the efficiency or effectiveness of your processes, call them out: far from making testing more efficient or effective, conformance will have the opposite effect.

No Consensus

The text of ISO 29119 claims that it is “an internationally-agreed set of standards for software testing”. This agreement is meant to be the product of consensus, defined by ISO as “general agreement, characterized by the absence of sustained opposition to substantial issues by any important part of the concerned interests and by a process that involves seeking to take into account the views of all parties concerned and to reconcile any conflicting arguments” (ISO/IEC Guide 2:2004).

There is no such consensus. Instead there is a small group of members of a working group who claim to represent you. Meanwhile, hundreds of testers are calling for the withdrawal of ISO 29119.

There is no consensus; there will be sustained opposition. Join the opposition.



1 For more on this theme, read “Uncle Bob” Martin’s After the Disaster

2 My thanks to James Christie for drawing attention to this during his CAST 2014 presentation



Tasks? Whither the Test?


On Friday, via Twitter, @michaelbolton asked @rbcs about the unit of measurement for test cases. To this, @rbcs replied:

[Embedded tweet: @rbcs’s reply]

A test is a task? Sounds reasonable.

But wait, a test is surely two tasks? An action and an observation? Or is it three? An action, an observation and an evaluation?

But wait! What if the test is not completely trivial to set up? What if it takes many separate tasks to configure a system in order to conduct a single test? Perhaps a test consists of many, many tasks?

Then again, Rex’s tweet suggests he is referring to tasks in a project management context. Please imagine a Gantt chart. I can’t say that I’ve ever seen project planning down to the individual test case – it is more normally the case that tests are wrapped up into a higher-order task on the plan. So perhaps a test is but a fraction of a task and not a whole one?

Also, in a project management sense, a task might be of any size, from a few hours to many days of effort and duration.

So, it would appear that a test could be an indeterminate number of tasks of indeterminate size.

Now /that/ seems like a sound basis for a unit of measurement.

It gets worse.

Ever performed a test that revealed something interesting and unexpected? Where a single test spawned many follow up tests aimed at isolating and qualifying the impact of your discovery? Tests begat tests.

Ever experienced the opposite? Where a test turned out to be irrelevant or simply not viable? Tasks may have been performed, but you are left having performed no test at all. Just as tests are spawned, so they can disappear.

Imagine that you have employed a contractor to build you a house made of bricks. Imagine that the bricks vary in size from Lego proportions to that of boulders. Imagine that, when laid, some bricks  spontaneously vanish, whilst others materialize miraculously in place. The contractor, of course, reports his progress by telling you “42.3% of bricks have been laid“. I’d be inclined not to trust the contractor.

Of course, bricks don’t behave that way: they are real, concrete phenomena. Tests are not. Tests are constructs, abstractions.

Whither the Test?

But what does this mean? What constitutes a test case? This can be particularly tricky to answer.

Let’s take the example of a project that  I participated in last year. My team were testing an ETL solution, and were focused on testing the rules by which data, extracted from a variety of source systems, was transformed in order to load it to a single target system. Testing was performed by conditioning real data (to cover various conditions evident in the transformation rules), predicting the results of transformation for every cell (table/column/row intersection) within the source data set, and reconciling ETL results against our predictions.
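As an aside, for anyone curious about the mechanics, here is a minimal sketch (in Python) of the kind of cell-level reconciliation our tooling performed. It is illustrative only: the names (predict_row, reconcile, the "key" column) are hypothetical, and the real transformation rules, data model and performance optimizations are omitted.

    # Minimal sketch of cell-level reconciliation; names are hypothetical.
    from collections import Counter

    def predict_row(source_row):
        """Apply our model of the transformation rules to one source record,
        returning expected target cells as {column: (rule_id, expected_value)}."""
        # Rule logic omitted; each prediction is tagged with the rule (outcome)
        # that produced it, so that results can be grouped later.
        raise NotImplementedError

    def reconcile(source_rows, target_rows_by_key):
        """Compare predicted cells against the cells the ETL actually loaded."""
        mismatches = []
        checks_by_rule = Counter()
        for row in source_rows:
            actual = target_rows_by_key[row["key"]]
            for column, (rule_id, expected) in predict_row(row).items():
                checks_by_rule[rule_id] += 1  # one "check" per cell
                if actual.get(column) != expected:
                    mismatches.append((row["key"], column, rule_id,
                                       expected, actual.get(column)))
        # Millions of cell checks collapse into a per-rule summary: counting
        # each distinct rule outcome once gives a figure like "two thousand".
        return mismatches, checks_by_rule

Grouping the raw checks by rule outcome in this way is what allows the same run to be described as a billion checks, a few million rows, or a couple of thousand distinct conditions, as discussed below.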

So, what is a “test case” in this example?

The tools we created for this purpose were capable of performing in excess of ten million checks per minute. Over the course of a particular test run, we were performing perhaps a billion checks. Were we executing a billion test cases?

Now, those checks were performed at a field level. In most cases, the transformation logic was tied to an individual row of data, with combinations of field values within the record contributing to the outcome of each transformation. In this way, each row might be seen as representing a particular combination of conditions. We were testing with a few million rows of data. Were we executing a few million test cases?

Of course, many of these checks were seemingly redundant. The underlying transformation rules represented in the order of two thousand different outcomes, and a given data load might result in many, many instances of each outcome. So were we only executing two thousand unique test cases?

Each test run was orchestrated over the course of about a week. Typically, each run was conducted with a new set of test data. Conditioning data took considerable time, as did analyzing results and potential anomalies. If we conceive of our tools as being scientific instruments, and the ETL implementation, in combination with any given set of data, the subject of our enquiries, then perhaps we should consider a test run to be a single experiment, a single test. Were we performing only one test, albeit a complex one, each time?

Any of these, from one to a billion, might be an appropriate answer depending on how you choose to define a test case. For our purposes, with an eye to coverage of conditions and outcomes, we chose to count this as being two thousand test cases. There was nothing inherently “correct” about this; it was simply a decision that we made on the basis that defining a test case at this level seemed useful.

Test cases are how you choose to define them.

ET: Why We Do It, an article by Petter Mattson

What follows is an article by my colleague Petter Mattson.

Petter and I recently made each other’s acquaintance after our organizations, Logica and CGI, merged. An experienced test manager and an advocate for exploratory testing, Petter wrote this article for internal publication within Logica. Unfortunately its contents were sufficiently divergent from the official testing methodology that it was never published. Many of the points in this piece resonated for me, and I was determined that it see the light of day.

I’d like to thank Petter, and his management at CGI in Sweden, for allowing me to publish it on Exploring Uncertainty.


Click here for Petter’s article