Tasks? Whither the Test?

Tasks?

On Friday, via Twitter, @michaelbolton asked @rbcs about the unit of measurement for test cases. To this, @rbcs replied:

[Embedded tweet from @rbcs]

A test is a task? Sounds reasonable.

But wait, a test is surely two tasks? An action and an observation? Or is it three? An action, an observation and an evaluation?

But wait! What if the test is not completely trivial to set up? What if it takes many separate tasks to configure a system in order to conduct a single test? Perhaps a test consists of many, many tasks?

Then again, Rex’s tweet suggests he is referring to tasks in a project management context. Please imagine a Gantt chart. I can’t say that I’ve ever seen project planning down to the individual test case – it is more normally the case that tests are wrapped up into a higher-order task on the plan. So perhaps a test is but a fraction of a task and not a whole one?

Also, in a project management sense, a task might be of any size, from a few hours to many days' effort and duration.

So, it would appear that a test could be an indeterminate number of tasks of indeterminate size.

Now /that/ seems like a sound basis for a unit of measurement.

It gets worse.

Ever performed a test that revealed something interesting and unexpected? Where a single test spawned many follow up tests aimed at isolating and qualifying the impact of your discovery? Tests begat tests.

Ever experienced the opposite? Where a test turned out to be irrelevant or simply not viable? Tasks may have been performed, but you are left having performed no test at all? Just as tests are spawned, so they can disappear.

Imagine that you have employed a contractor to build you a house made of bricks. Imagine that the bricks vary in size from Lego proportions to that of boulders. Imagine that, when laid, some bricks spontaneously vanish, whilst others materialize miraculously in place. The contractor, of course, reports his progress by telling you “42.3% of bricks have been laid”. I’d be inclined not to trust the contractor.

Of course, bricks don’t behave that way: they are real, concrete phenomena. Tests are not. Tests are constructs, abstractions.

Whither the Test?

But what does this mean? What constitutes a test case? This can be particularly tricky to answer.

Let’s take the example of a project that I participated in last year. My team were testing an ETL solution, and were focused on testing the rules by which data, extracted from a variety of source systems, was transformed in order to load it to a single target system. Testing was performed by conditioning real data (to cover various conditions evident in the transformation rules), predicting the results of transformation for every cell (table/column/row intersection) within the source data set, and reconciling ETL results against our predictions.
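The reconciliation step can be sketched roughly as follows. This is a minimal illustration, not the tooling we actually built: the names (`reconcile`, `transform_rules`, the row/column shapes) are all hypothetical, and real ETL comparison tools work at vastly greater scale and with far messier data.

```python
# Illustrative sketch of cell-level reconciliation: for every row, predict
# each target column's value from our own model of the transformation rules,
# then compare against what the ETL actually loaded. All names are invented.

def reconcile(source_rows, target_rows, transform_rules):
    """Compare predicted vs. actual values for every cell; return mismatches."""
    mismatches = []
    for row_id, (src, tgt) in enumerate(zip(source_rows, target_rows)):
        for column, rule in transform_rules.items():
            predicted = rule(src)   # our prediction of the transformation
            actual = tgt[column]    # the value the ETL actually produced
            if predicted != actual:
                mismatches.append((row_id, column, predicted, actual))
    return mismatches
```

Each comparison here is one “check” in the sense used below: one cell, one prediction, one verdict.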

So, what is a “test case” in this example?

The tools we created for this purpose were capable of performing in excess of ten million checks per minute. Over the course of a particular test run, we were performing perhaps a billion checks. Were we executing a billion test cases?

Now, those checks were performed at a field level. In most cases, the transformation logic was tied to an individual row of data, with combinations of field values within the record contributing to the outcome of each transformation. In this way, each row might be seen as representing a particular combination of conditions. We were testing with a few million rows of data. Were we executing a few million test cases?

Of course, many of these checks were seemingly redundant. The underlying transformation rules represented in the order of two thousand different outcomes, and a given data load might result in many, many instances of each outcome. So were we only executing two thousand unique test cases?
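The three counting choices above differ only in what you treat as the unit. As a toy illustration (the `checks` structure and `outcome_id` field are invented for this sketch, not from the project itself):

```python
# The same result set counted three ways. 'checks' is a hypothetical list of
# (row_id, column, outcome_id) tuples, where outcome_id identifies which
# transformation-rule outcome a given cell-level check exercised.

def count_checks(checks):
    return len(checks)                                # one "test case" per cell

def count_rows(checks):
    return len({row_id for row_id, _, _ in checks})   # one per combination of conditions

def count_outcomes(checks):
    return len({outcome for _, _, outcome in checks}) # one per unique outcome
```

Billions of checks, millions of rows, thousands of outcomes: all three counts describe the very same test run.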

Each test run was orchestrated over the course of about a week. Typically, each run was conducted with a new set of test data. Conditioning data took considerable time, as did analyzing results and potential anomalies. If we conceive of our tools as being scientific instruments, and the ETL implementation, in combination with any given set of data, the subject of our enquiries, then perhaps we should consider a test run to be a single experiment, a single test. Were we performing only one test, albeit a complex one, each time?

Any of these, from one to a billion, might be an appropriate answer depending on how you choose to define a test case. For our purposes, with an eye to coverage of conditions and outcomes, we chose to count this as being two thousand test cases. There was nothing inherently “correct” about this; it was simply a decision we made on the basis that defining a test case at this level seemed useful.

Test cases are how you choose to define them.

8 thoughts on “Tasks? Whither the Test?”

  1. “Test cases are how you choose to define them.”

…and then they aren’t. What’s a car? You could define it as a vehicle that has an engine, four wheels, and seats for at least two people, and someone can always come along and produce examples of things that don’t fit that description and that are still cars, and things that do fit that description and aren’t. Nothing wrong with doing that, until you base systems of measurement on it, try to dress it up as science or engineering, and deceive people (including yourself) maliciously or innocently.

    The idea of counting test cases is like a bogus quarter that keeps coming back into the coin return slot, only to have people keep feeding it back into the machine. From six years ago: http://www.developsense.com/articles/2007-11-WhatCounts.pdf

    —Michael B.

    1. Okay, so for any definition of test case we should be able to do this. I made this up on the spot with some stealing from Wikipedia:

      “A test case is a set of explicit conditions under which a test or check is executed, the results interpreted by a tester to attempt to determine if there is a problem with the test item”

Of course, I am just declaring a test case to be this and then putting myself ad hoc into a defensible position. Anything that doesn’t meet my definition becomes a non-test case… in which case, is there something that a reasonable person would call a test case that doesn’t fit my definition?

      Just interested to see if this works. Perhaps the problem is that our definitions of “test case” don’t allow for them to become useful metrics. We can define “car” as a vehicle that has an internal engine, some seats and space for at least 2 people, then deny that anything else is a car, then count “cars” on the motorway for an hour. If we know what we meant by “car” and what counting them achieved then we can use that as our cars-on-motorway-in-an-hour metric.

  2. Great article.
    What happens though when management take that 2000 number and run with it? They start trying to measure your current progress against this ‘target’ of 2000. And then because your tests are exposing other tests, your initial 2000 number is becoming increasingly inaccurate? How can we quickly and concisely explain this to those that are trying to measure progress? And indeed how can they measure progress against a target (final, total number of test cases) that is inexact, changing and potentially never reached?

    1. Joseph,

Thanks. Early in the project, I frequently pointed out that this number would be meaningless as a means of measuring progress. I suspect there might have been some disbelief; however, once we’d begun testing and it became clear to everyone that those “tests” could be executed within an hour or so of spooling the tool up, people started to realise it was more a measure of coverage than of progress. And yes, the “number” changed as we learned more about the solution and added/subtracted tests.

Now, I feel that it is entirely rational of our stakeholders to expect some sense of “doneness” from us. Test case counts, however, for the reasons you give, and the reasons I have blogged about, can be exceptionally misleading. I’ve come to prefer rather more qualitative dashboards.

      -Iain

  3. IMHO when you have a significant number of test cases, it doesn’t matter how big or small any particular one is. The majority of them are about equal in size (for a specific project), so progress can be measured with adequate precision.

    1. This assertion is fairly common: “the differences will net out”. It assumes that the size of test cases will be normally distributed, and that the average size is in any way meaningful or useful.

      Even were this the case, such a metric can only indicate progress through the tests that you had planned to perform. What of the dozen new tests you end up performing when isolating and qualifying a bug? Or the hundreds you might like to perform when you identify a previously unthought-of set of risks?

      This is perhaps my greatest objection to measuring test progress in this way. I have seen testers driven to hit daily test case targets so as to “stick to the plan”, and I have seen this force them to raise CRs in order to add additional tests to their scope (or, worse, to ignore potential problem areas), or to “report bugs and run” instead of doing the follow-up testing necessary to produce bug reports of any kind of professional standard.

      Dysfunction follows these metrics.

  4. Love the way you subverted the house-building analogy so (misguidedly) beloved of project management how-to writers.
