Friday, August 5, 2011

A unit test case study

I've been trying to pay more attention to my process for fixing broken unit tests and I'm trying to improve my writing skills, especially when it comes to descriptive narration, so I'm going to fix a test and write down my thoughts as I go. I'll change the business object names to Widgets and Things, but everything else is exactly what happened as it happened. Here goes.



I ran some of our unit tests and the first one to fail has this error message:

XXX.Test.YYY.Widgets.WidgetsTest:
  Two things should be available.
  Expected: True
  But was:  False
Uh, ok. No indication of which two things were expected or which two things were found. Not a very useful error message. The test name doesn't help me much since the Widgets domain encompasses so much. The Widgets code does use Things though so that part makes sense.

I'm guessing the requirements and code changed so instead of two Things available there are now more. This is what happens with at least 95% of the broken tests I look at. I've gotten all I can from the test name and assertion message so it's time to take a look at the test itself.

public void WidgetsTest()
{
    // 44 lines of code
    Assert.IsTrue(thingList.Count == 2, "Two things should be available.");
    // 17 more lines of code
}
Ugh. I'm not even going to try to wrap my head around this yet. Some people are really good at reading through long methods where state winds in and out of methods but I'm not. I'll start with a small change and use AreEqual instead of IsTrue so the failure message indicates what was expected and what was found.
public void WidgetsTest()
{
    // 44 lines of code
    Assert.AreEqual(2, thingList.Count, "Two things should be available.");
    // 17 more lines of code
}
At least now I can see how many were available. It would be better if I know which two Things were expected so I could assert that only ThingA and ThingB are available. Even better than that would be knowing why so I could add that to the message, something like "ThingA and ThingB should be available if Widget.IsAwesome()". That way someone could see it and say "yup, there's a bug" or "no that's not true anymore". At least that would work better for me but I remind myself that I can't really fault others for writing tests in the style they prefer to work with.

Now I wait nearly five minutes while the tests rebuild then wait about a minute for the test to fail again. New error message:

XXX.Test.YYY.Widgets.WidgetsTest:
  Two things should be available.
  Expected: 2
  But was:  3
That's what I expected. I'm pretty sure no one updated the test when they changed the code. Had it said 0 or 500 or something interesting like that then it might be a bug but I'd say there's now a 99% chance it's just an invalid assertion. There is a reference in the test to an issue number so I look that up....

WTF? This issue has absolutely nothing to do with with Things or Widgets! Maybe it's a typo? Maybe it's a bad April Fools joke? Maybe the test was copied and pasted and this reference was left in? I've noticed that's common; I guess people copy the entire thing and leave in a bunch of irrelevant code and assertions that are no longer needed. I search the tests for the original assertion and see that there's one other test with the same line of code. Not only is almost the entire test is the same as the one that's failing but this one is also older so it's almost certainly a copy-and-paste job. This original test does have a comment with an issue number but when I look that up I find the issue has nothing to do with Widgets or Things either. Maybe I can check source control to see when these tests were made and who made them?

The original developer of both tests is long gone so I can't ask him what this is really about. Another developer who's focus is a different domain has made changes to the failing test eight months ago. He probably won't remember but I'll ask him later if I run into a dead end.

I try looking at our continuous integration logs to see when the test started failing but no luck: our logs only go back a couple months and this has been broken the whole time. *sigh* We've always got broken tests but this one has actually been broken since the beginning of recorded history. Looks like I'll have to dive into the test code and see what it's doing.

The test starts by dropping the database then reloading a backup (I've been here 5 or so years and brought this up before and I still don't know why so many of our tests do this), creates a handful of objects while doing a ton of work with them, reading and writing to the file system and database through it all. It even clears some caches along the way; I've always wondered if so many of other peoples tests do this because of superstition or to make sure we do as much work as possible. There are several comments but they just say the same thing as the code following it with no indication of why any of it is being done. This test seems to be running through some very specific scenario involving Widgets from beginning to end. There's a lot of data and I'm not sure what's relevant and what just happened to be that way. There are 14 scattered assertions, not including the assertions buried within other methods, on seemingly random objects and values. A lot of this work is somewhat related to Widgets but this is the only line related to Things. In fact, only one of the assertions is directly related to Widgets.

Time to recap: I've got a failing assertion in a large test filled with seemingly arbitrary assertions and a lot of work. The test name says nothing about what is expected or why. This failing assertion says nothing about what is expected or why. The issue it refers to is completely unrelated to Things or Widgets and the original developer has moved on to another job. I have no idea when it started failing or why and what it's asserting on doesn't even seem to be relevant to the other work happening in the test. This is all quite normal for our tests. Luckily it's the end of the day so I send an email to the other developer who worked on this eight months ago and leave this until tomorrow.

Here's the email:
Unit test XXX.Test.YYY.Widgets.WidgetsTest has a failing assertion on line 2199. The message is "Two Things should be available.". It looks like this test is from you and Mr Doesn't Work Here Anymore. Are there two specific Things or are any two Things acceptable? Why are those two Things expected and why are three Things a problem?

Hopefully he has some answers.



When I get to work the next day there's no response from the developer I emailed. I move on to some other tasks until I can ask him or someone else.

Eventually another developer is available to help me with this one and we walk through almost the same steps that I did. It turns out that if you look at the beginning of the test a certain way and you look at the issue a certain way, then the test looks like it's related to something implied by part of the issue. The weird part is the test has only one assertion related to that and that assertion is the exact opposite of what the issue says.

After about an hour of rereading it and commenting out chunks of the test to see what happens we figure that the last half of the test is completely unnecessary and we delete it — including the failing assertion. Now the test passes and I can move on to the next test.

No comments:

Post a Comment