Does TDD Really Work?

Since TDD gained exposure in the industry, lots of people ask:

Does TDD really work?

This is a perfectly valid question, and many TDD-ers I know tend to avoid it, probably because they don’t know the answer or because they fear it. TDD is wonderful at a personal level; it makes you feel very good about yourself because of the continuous reward system that’s ingrained in the practice. It’s hard to let it go.

But does it really work?

I think this isn’t the correct question. I think that we should ask:

Can TDD produce more value than other techniques?

Under what circumstances will TDD fail?

Let’s take them one by one.

Can TDD produce more value than other techniques?

Note that this question is about potential, so we can choose a bunch of very good developers who know TDD and see if they produce more value compared to another team that doesn’t do TDD. Right?

Wrong.

Unfortunately, all such experiments in the software industry are disputable. We cannot design a truly scientific experiment because it cannot be repeated under the same conditions. This is valid for all software development practices, and it’s a key issue in improving the industry. Everything is disputable.

In this particular scenario, we have the following issues with the experiment:

  • Maybe the team with better results is more experienced
  • Maybe the team with better results was more relaxed or slept better
  • Maybe they already did such a project before
  • Maybe they worked together / talked / drank beers together before
  • Maybe the project was not chosen right to show the differences
  • If we repeat the experiment with the same teams and the same project, the context is different – they already know what to do

And so on.

So, what can we trust? I think that we are left with the following:

  • Personal experience
  • Personal intuition
  • Collective experience of the full industry
  • Collective experience of a sample of the industry (e.g. the best thinkers, the largest companies, the most successful companies etc.)
  • Statistical data

The strongest proof that one technique works better than another is statistical data, if it’s collected and interpreted correctly. Unfortunately, we don’t have that data, although there are some efforts to collect it. Thus, we have no scientific proof for any practice in the software industry. Really.

The only thing we have is empirical evidence of the kind:

I did that and it worked better for me. Maybe you could try it too.

Thus, can TDD produce more value than other techniques?

All we have is empirical evidence.

Experienced, smart people from the trenches say it does. I’m talking about Kent Beck, Ward Cunningham, Robert C. Martin, Alistair Cockburn, J.B. Rainsberger, Roy Osherove, Cory Foy, Corey Haines to name just a few.

Other experienced and smart people say it doesn’t. Joel Spolsky and Jeff Atwood publicly dismissed TDD in one of the Stack Overflow podcasts. I’m pretty sure we can find others who don’t think TDD is good. (Their opinions are also disputable, because Joel Spolsky is not a developer and I don’t know of Jeff Atwood ever saying he tried TDD. But let’s not dismiss them, because their critiques could be useful.)

I think that, under these circumstances, the only productive way of thinking on a personal level is:

I will try TDD with someone who knows what he’s doing and see if it works for me and in my context.

If we come back to empirical evidence, what TDD practitioners say seems logical: TDD decreases the maintenance time and increases the flexibility of the design, at the expense of an increase in the development time. There are studies showing this result, but we can’t really trust them, for the reasons discussed above.

So, the final answer we can give to our question is, for now:

Empirical evidence suggests that under good circumstances, TDD reduces the maintenance time and increases the development time, providing more value overall by reducing defects and improving flexibility.

Until we have a statistically significant collection of data to analyze, we cannot scientifically prove or disprove this empirical evidence.

Even then, the results might be incorrect because the success of any software development practice depends strongly on the context.

Is TDD good for you? You can only see if you try it in your context.

Under What Circumstances Will TDD Fail?

From the discussions about TDD that I’ve followed, we can easily gather a few reasons for TDD to fail:

Developers who are not skilled with TDD

Developers who are new to TDD tend to write complicated or long-running tests, which are at best a waste of time. Such tests not only slow down development; there comes a time when developers don’t run them anymore, so the whole effort to write them in the first place was in vain.

Areas where TDD doesn’t work

Parallel programming is inherently difficult because it’s less deterministic. In theory, one can TDD each process and use other techniques for the orchestration between them, but I don’t know anyone doing it.
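To make that split concrete, here is a minimal sketch, not something taken from a real project. The names `word_count` and `count_all` are invented for illustration, and a thread pool stands in for the separate processes mentioned above. The unit of work is a plain deterministic function that can be driven by tests, while the orchestration is kept in a thin layer of its own.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """Deterministic unit of work (hypothetical): easy to test-drive."""
    return len(text.split())


def count_all(texts, max_workers=4):
    """Orchestration layer (hypothetical): runs word_count concurrently.

    The scheduling of the workers is not deterministic, but
    Executor.map returns the results in the order of the inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(word_count, texts))


def test_word_count():
    # The kind of small, fast tests that would drive the unit of work.
    assert word_count("") == 0
    assert word_count("red green refactor") == 3
```

The test only touches `word_count`. Whether the orchestration in `count_all` is race-free is exactly the part that the tests do not drive, which is where the other techniques would have to come in.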

Combinations between hardware and software are another difficult area because hardware cannot be TDD-ed. In theory, automatic tests could be used for hardware.

Certain languages make TDD much more difficult than others. C++ is one of the notorious examples.

Algorithms are another area where TDD seems to work less efficiently or not at all. Could you discover a better sorting algorithm using TDD? It doesn’t seem feasible. However, TDD can help you discover that you need a sorting algorithm, and then it’s your choice which one to use. (I think this is because creating algorithms is a scientific practice. One can prove that an algorithm is better in a scientific manner, and that’s hard to beat.)
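As a small illustration of that last point, consider the sketch below. The function `top_scores` and its tests are hypothetical, made up for this example. The tests pin down the behavior that is needed (highest scores first, at most `limit` of them), which reveals the need for sorting, but they say nothing about which sorting algorithm should be used.

```python
def top_scores(scores, limit):
    """Return at most `limit` scores, highest first (hypothetical example)."""
    # The tests only require ordered output; here the built-in sort is used.
    return sorted(scores, reverse=True)[:limit]


def test_returns_highest_scores_first():
    assert top_scores([10, 40, 20, 30], limit=2) == [40, 30]


def test_returns_everything_when_there_are_fewer_scores_than_limit():
    assert top_scores([5], limit=3) == [5]
```

Replacing the body with `heapq.nlargest(limit, scores)` would keep both tests passing. The tests accept any correct implementation, so the choice of algorithm stays with the developer.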

My favourite limitation is that TDD doesn’t help you create a product that sells. It can help you create a product that does what it’s supposed to do by providing basic correctness, but it will not help you sell the product. So you may want to sort that out first. (Interestingly enough, customer development seems to be the best way to do this nowadays, and it’s very similar to TDD.)

Any other examples?

If you know another example of TDD not working, I would really like to learn about it. I cannot think of others.

The Answer: Does TDD really work?

On the company/project level:

If

you are in a context that allows TDD OR

if you can create a context where TDD works AND

if your business would benefit from reducing maintenance and increasing flexibility at the cost of longer development time

THEN

You should try TDD in your context.
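Written out as code, the rule above might look like the sketch below. This is one possible reading, which assumes that the OR groups the first two conditions and that the AND applies to the result. The function and parameter names are invented for illustration.

```python
def should_try_tdd(context_allows_tdd, can_create_context, business_benefits):
    """One reading of the rule: (allows OR can create) AND benefits.

    The parentheses matter: in Python `and` binds tighter than `or`,
    so without them the expression would mean something else.
    """
    return (context_allows_tdd or can_create_context) and business_benefits


# A team that can create the right context, with a business that benefits:
print(should_try_tdd(False, True, True))   # prints True
# A context that allows TDD, but a business that gains nothing from it:
print(should_try_tdd(True, False, False))  # prints False
```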

Nobody can guarantee that it will work, but we can be certain that it won’t work if you aren’t in the right context.

On the developer level:

TDD is a technique that you may encounter sooner or later. It is thus good for you to try it, to see if it works for you and to become skilled with it if it works for you. The worst thing that can happen is that you try another way of looking at development.

More complicated than you expected? Sorry, that’s not my doing, that’s how life is…



Let’s have some fun:

  • “Joel Spolsky is not a developer”

    Sure he is! Straight from the Microsoft school of software development. And we all know the quality of their software. :))

    • Alexandru Bolboaca

      Oh, right, I remember his famous discussion with Bill Gates.

      Still, I’m pretty sure that he hasn’t been writing software for a long time and has focused on business instead.

      Either way, his opinions are interesting – even if I personally disagree with him, he may be right.

  • TDD with C/C++ on embedded systems isn’t really that difficult… at least not to the extent that I would say “TDD not working in this area”

    • Alexandru Bolboaca

      Thanks for the remark, I realized how to improve the article.

      I think it would be interesting to separate the fundamental limitations of TDD from the accidental ones. Technology barriers (languages, embedded etc.) can be overcome in certain ways, so they are accidental. The fundamental limitations seem to be the creation of algorithms and parallel programming (at least on the interaction level), since it seems they cannot be avoided.