Engineering Improvement Runbook | Engineering Practices
Don Brown
October 21st, 2021
Code Coverage: Is 90-95% really necessary?
In the video above, Sleuth's CTO, Don Brown, discusses testing and code coverage. Below are his key points. Read on or watch the video to learn about the testing pyramid and different scenarios for code coverage.
Testing pyramid
There is a pyramid for testing. The size of each layer of the pyramid correlates to how many tests of that kind you should have and how much time you should spend on them. So at the bottom, there should be a ton of tests, and at the top, there should be very few.
At the very top of the testing pyramid are smoke tests. These are tests you run to verify that you didn't break anything massive, and there should be one to three of them at most. Ideally, you can run these tests against production or any other environment. For example, you can deploy to staging, run the smoke test to verify that staging is not broken, deploy to the next environment, and run the smoke test there.
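As a sketch of what that might look like (the /health endpoint and BASE_URL variable here are assumptions for illustration, not details from the video), a smoke test can be a single HTTP check you can point at any environment:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical smoke test: point it at any environment via an env var
// and verify the app is up at all. One assertion, no deep checks.
class SmokeTest {

    @Test
    void applicationResponds() throws Exception {
        // BASE_URL and /health are assumptions for this sketch
        String baseUrl = System.getenv().getOrDefault("BASE_URL", "https://staging.example.com");
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/health")).build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        assertEquals(200, response.statusCode(), "Environment should be up after a deploy");
    }
}
```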
At the next level down in the pyramid are end-to-end tests, which are browser-based tests. In the middle of the pyramid are integration tests, which are like unit tests, except they also hit the database. They go through key things like: can I create a page? Can I delete a page? And you're able to look at the database to see if it worked. It's called an integration test because you're not just testing the code, you're testing the data sources as well. You might even test multiple layers of code.
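Here's a rough sketch of that kind of integration test. The PageService and PageRepository names are made up for illustration; the point is that the test goes through the code and a real test database:

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertNotNull;
import static org.junit.jupiter.api.Assertions.assertNull;

// Hypothetical integration test: PageService and PageRepository are stand-ins.
// Unlike a unit test, this exercises the code *and* a real test database.
class PageIntegrationTest {

    private final PageRepository repository = PageRepository.connectToTestDatabase();
    private final PageService service = new PageService(repository);

    @Test
    void createThenDeletePage() {
        long id = service.createPage("Hello");

        // Look at the data source directly to confirm the write really happened
        assertNotNull(repository.findById(id));

        service.deletePage(id);
        assertNull(repository.findById(id));
    }
}
```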
Down at the bottom of the pyramid, where you should have the most tests, are unit tests. These should have zero dependencies on anything. If you have an add function, you pass in two plus two and assert that the response is four. You're not going out to the database. You're not going out to the internet. You're not hitting a microservice to make this happen. You're just testing that function.
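In test form, that example is about as small as a test gets:

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class MathUtilTest {

    // The function under test: no database, no network, no other services
    static int add(int a, int b) {
        return a + b;
    }

    @Test
    void twoPlusTwoIsFour() {
        assertEquals(4, add(2, 2));
    }
}
```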
Code coverage and targets
But when it comes to setting targets for code coverage, it generally depends on what you're doing and what kind of code you're working with. For example, let's say you're writing an algorithm that parses arbitrary HTML and extracts some data from it. In that case, 100% test coverage is the bare minimum. This is a very complicated piece of code that needs to handle all kinds of edge cases. Tests should be hitting each line of code 10 or 20 times, so it's 1,000% test coverage, if that's really a thing. You should just be testing the hell out of it.
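One common way to drive the same lines over and over is a parameterized test that throws many inputs at the same code. A sketch, with a hypothetical extractTitle parser standing in for the real thing:

```java
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical parser test: every case below drives the same parsing code,
// so each line of the parser gets exercised many times over.
class TitleParserTest {

    @ParameterizedTest
    @ValueSource(strings = {
        "<html><head><title>Hi</title></head></html>",
        "<HTML><TITLE>Hi</TITLE></HTML>",               // uppercase tags
        "<title>Hi</title><title>ignored</title>",      // duplicate tags
        "<title>  Hi  </title>",                        // surrounding whitespace
        "garbage before <title>Hi</title> garbage after"
    })
    void extractsTitleFromMessyHtml(String html) {
        assertEquals("Hi", TitleParser.extractTitle(html));
    }
}
```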
Or, if you're creating a protocol for how service A talks to service B, and you want to test how service B takes those messages and interprets them, you want to test that a lot, too. You want to make sure it's completely covered for all possible edge cases from the unit test standpoint, because that's core to your system.
Where people often get messed up is that they start saying, "Well, I need 100% test coverage, period." And then what you see them do is take a function that accepts three arguments and calls four different internal services to do something and return something.
So, it's a four-line function, and they say, "I need to have unit test coverage." To even call this function, they have to create a mock for service A, a mock for service B, a mock for service C, and a mock for service D. So they call the function with three arguments, mocks are driving all four lines of code, and it returns whatever the mocks returned. At that point, you're not testing anything useful. You're wasting your time.
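To make the anti-pattern concrete, here's a sketch using Mockito, with made-up service names and a made-up GlueCode class:

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

// Anti-pattern sketch: every collaborator is a mock, so the test only
// verifies that the mocks return what we told them to return.
class GlueCodeTest {

    @Test
    void testsNothingUseful() {
        ServiceA a = mock(ServiceA.class);
        ServiceB b = mock(ServiceB.class);
        ServiceC c = mock(ServiceC.class);
        ServiceD d = mock(ServiceD.class);
        when(a.fetch("x")).thenReturn("a");
        when(b.transform("a")).thenReturn("b");
        when(c.enrich("b")).thenReturn("c");
        when(d.save("c")).thenReturn("done");

        GlueCode glue = new GlueCode(a, b, c, d);

        // All four lines of the function are "covered", but the assertion
        // just echoes the last mock back at us.
        assertEquals("done", glue.run("x", 1, true));
    }
}
```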
The classic one in Java land is testing getters and setters. If I set a value, can I get it back? Well, yeah, it's a getter and a setter. That's all they do. There is no other logic happening. Why are you writing a unit test to cover that? Code in the middle like this is called glue code, because it's gluing one layer to another, and often it doesn't need to be tested either. There are a number of cases where you're not doing anything that's worth testing at that level.
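For completeness, this is the kind of test he's warning against (User is a made-up bean with a single name field):

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// The test nobody needs: there is no logic here, only field access.
class UserTest {

    @Test
    void getterReturnsWhatSetterSet() {
        User user = new User();   // hypothetical bean
        user.setName("Don");
        assertEquals("Don", user.getName());
    }
}
```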
But something like a parser or a protocol implementation should be tested massively. Take a validator, for example: a function that validates an email address. Maybe it's some regular expressions, only 10 lines of code. You should have a ton of unit tests for that, because you should be passing in all kinds of different values. You should be giving it all kinds of crazy things: null values, different character sets, different string lengths, different characters. You should throw in some 32-bit Unicode characters. You should be doing whatever you can to break it. In the process, you're doing two things: 1) you're making sure that what you wrote is bulletproof, and 2) you're documenting what that code or protocol does.
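A sketch of what that barrage of inputs can look like, assuming a hypothetical EmailValidator.isValid function:

```java
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.NullAndEmptySource;
import org.junit.jupiter.params.provider.ValueSource;
import static org.junit.jupiter.api.Assertions.assertFalse;

// Hypothetical validator test: throw nulls, empty strings, odd shapes,
// and characters outside the Basic Multilingual Plane at a made-up isValid().
class EmailValidatorTest {

    @ParameterizedTest
    @NullAndEmptySource
    @ValueSource(strings = {
        "no-at-sign",
        "@missing-local-part.com",
        "trailing-dot@example.",
        "two@@example.com",
        "emoji-\uD83D\uDE00@example.com",   // 32-bit (supplementary-plane) character
        "   spaces   @example.com"
    })
    void rejectsGarbage(String input) {
        assertFalse(EmailValidator.isValid(input));
    }
}
```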
Unit tests aren't necessarily the best documentation strategy, but they are a documentation strategy. So if a new developer comes in and a good set of unit tests exists, they can see, "Ah, it shouldn't take null values. It shouldn't take empty strings. It shouldn't take this, and here are tests to prove it." They're going to have a lot more confidence. If they look at a piece of code that looks really complicated and has no tests, they won't know what it's supposed to do and not do.
And when you don't know, your risk increases. When your risk increases and you make a change and go to production, the chances of it breaking in production are higher. Your Change Failure Rate is going to go through the roof, because you have no idea if what you're doing is going to break something. So you're trying to go faster, but now you're breaking things more, with no tests to cover it and no idea what the code is supposed to do. You're going to shoot yourself in the foot. So, you do need end-to-end tests. And sometimes you do need 90 to 95% coverage, perhaps 10,000% coverage. But sometimes you really don't.