You have to have tests for all combinations though. At least those combinations that you actually want to use. You get the same problem when your code is a big ifdef-hell.
This test gives an extreme advantage to lazy functional languages that they don't enjoy in any realistic context.
Trying to come up with a single test that is good turns out to be surprisingly hard. In modern languages I'm looking for the ability to do clean functional code, stateful object code, async programming, a solid module system, and a type system that doesn't get in my way. I really can't imagine a single 5 line program that would encapsulate all of these areas.
It seems plausible that people would write shorter, clearer functions with no possibility of being incorrect if they had no tests from which to gain confidence in their correctness.
Also, designing for testability only gives you testability. If you ditch this requirement, suddenly you're free to design directly for ease of use, not for some proxy like testability.
Testing / verification is the hardest problem in computing. I don’t think the author actually understands why that is, and has a few superficial gripes with how codebases are tested. That’s not to say that a lot of test suites are actually good (they’re not); that criticism may be fair.
Static types do not help at all with software quality, beyond catching some typos. You can never, ever write a static type that enforces anything of substance, because for that to happen you’d need to execute arbitrary code within the compiler (as happens with dependent types, and of course there is the constant fear of undecidable or infinite type checking in that system).
The classic example is that there’s an infinite number of functions with type int -> int. The logic of these functions is extremely varied, and the type doesn’t help you at all. Enum types are very useful for knowing what values you are allowed to write, but they don’t help with correctness.
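A minimal sketch of that point (the function names are invented for illustration): all three of these share the type int -> int, yet their behaviour has nothing in common, and the signature alone won't tell you which one you got.

    #include <cassert>

    int identity(int x)     { return x; }   // returns its argument unchanged
    int negate(int x)       { return -x; }  // returns the negation
    int collatz_step(int x) { return x % 2 == 0 ? x / 2 : 3 * x + 1; }  // something else entirely

    int main() {
        // All three compile anywhere an int(int) is expected; only running
        // them tells you which behaviour you actually got.
        assert(identity(4) == 4);
        assert(negate(4) == -4);
        assert(collatz_step(4) == 2);
    }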
The author brings up large input spaces, but conveniently leaves out input space partitioning, which any non-novice tester has to employ in their test case management. The category partition method is a fantastic way of covering large swaths of a seemingly infinite input space with relatively few tests. Of course there’s combinatorial explosion, but that can be mitigated with constraints on the input combinations.
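Roughly, and with shipping_fee() and its categories invented purely for illustration, the category-partition idea looks like this: pick one representative value per category of each parameter, enumerate the combinations with plain loops, and prune the combinations a constraint rules out.

    #include <cassert>
    #include <vector>

    // Invented function under test: tiered fee, doubled for express delivery.
    double shipping_fee(double weight_kg, bool express) {
        double base = weight_kg <= 1.0 ? 5.0 : (weight_kg <= 20.0 ? 9.0 : 25.0);
        return express ? base * 2.0 : base;
    }

    int main() {
        // One representative per category: light / medium / heavy parcel.
        std::vector<double> weight_reps = {0.5, 10.0, 40.0};
        std::vector<bool> express_reps = {false, true};

        for (double w : weight_reps) {
            for (bool e : express_reps) {
                if (w > 30.0 && e) continue;  // constraint: heavy parcels can't ship express
                double fee = shipping_fee(w, e);
                assert(fee >= 5.0 && fee <= 50.0);  // coarse check; real tests would pin exact values
            }
        }
    }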
I’m not saying testing is perfect - it’s not. But, the burden of proof is on the critic of testing, not testing itself. How else can you reliably change a large software product dozens of times a week, for 20 years in a row? And what’s the alternative when I see products that large teams produce break weekly?
This is not something that “ship it and iterate” has had any meaningful impact on. Eventually you get people who don’t intimately know the whole codebase, and make a breaking change unintentionally. What is the alternative to some kind of testing?
That argument applies to pretty much every single unit test ever written. A function taking a single long has 2^64 possible input values - impossible to test, by your logic. Yet such functions are tested without issues constantly.
What you do is put together a long list of sample functions and sample arguments that covers the expected edge cases and then test those for equivalence. Hardly impossible. Just takes time. Not bulletproof but better than nothing.
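Something like the following, as a rough sketch: a handful of hand-picked 64-bit edge cases checked against a straightforward reference. Both popcount functions are made-up examples, and __builtin_popcountll assumes GCC or Clang.

    #include <cassert>
    #include <cstdint>
    #include <vector>

    // Slow but obviously correct reference: count set bits one at a time.
    int popcount_reference(uint64_t x) {
        int n = 0;
        while (x) { n += x & 1; x >>= 1; }
        return n;
    }

    // "Fast" version under test; assumes a GCC/Clang builtin is available.
    int popcount_fast(uint64_t x) {
        return __builtin_popcountll(x);
    }

    int main() {
        // Hand-picked edge cases covering zero, all-ones, powers of two, and
        // alternating bit patterns.
        std::vector<uint64_t> edge_cases = {
            0, 1, 2, 3, UINT64_MAX, UINT64_MAX - 1,
            uint64_t(1) << 31, uint64_t(1) << 32, uint64_t(1) << 63,
            0x5555555555555555ULL, 0xAAAAAAAAAAAAAAAAULL
        };
        for (uint64_t x : edge_cases)
            assert(popcount_fast(x) == popcount_reference(x));
    }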
Maybe you can explain it to them this way - if they know what a function is. If you have a function foo that does something and takes one boolean parameter, that's two test cases. If you add a second boolean parameter, now you have at least four total test cases. And so on.
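A minimal sketch of that arithmetic, with foo() made up for illustration: two boolean parameters give four input combinations, and an exhaustive test just enumerates them.

    #include <cassert>

    // Invented example: the return value encodes which flags were set.
    int foo(bool a, bool b) {
        if (a && b) return 3;
        if (a)      return 2;
        if (b)      return 1;
        return 0;
    }

    int main() {
        int expected[2][2] = {{0, 1}, {2, 3}};  // expected[a][b]
        // Four combinations for two booleans; a third parameter would double it again.
        for (int a = 0; a <= 1; ++a)
            for (int b = 0; b <= 1; ++b)
                assert(foo(a, b) == expected[a][b]);
    }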
If you look at what happens with Refactoring or Unit testing, instead of just listening to what people tell you is happening, you see that a lot of what goes on is synthesizing one state from a small group of others, over and over again.
So for instance there may be five different criteria that decide whether you are qualified to receive a 10% off discount. You make a block of code that is responsible for emitting a single boolean, and the rest of the system only ever interacts with that single value.
Instead of an upper bound of 2^n you have one that is somewhere around n/3 factorial. Which is still a scary-big number, but might push out the dog leg a couple of years.
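A rough sketch of that shape of refactoring, with all five criteria invented for illustration: one block of code is responsible for emitting the single boolean, and everything downstream - including its tests - only ever sees that value.

    #include <cassert>

    struct Customer {
        bool has_loyalty_card;
        int  orders_this_year;
        bool email_verified;
        double lifetime_spend;
        bool account_in_good_standing;
    };

    // The only place the five criteria are combined.
    bool qualifies_for_discount(const Customer& c) {
        return c.account_in_good_standing &&
               c.email_verified &&
               (c.has_loyalty_card || c.orders_this_year >= 5 || c.lifetime_spend > 1000.0);
    }

    // Downstream code only ever interacts with the single boolean.
    int final_price_cents(int cents, bool qualifies) {
        return qualifies ? cents * 9 / 10 : cents;  // 10% off when qualified
    }

    int main() {
        Customer c{true, 2, true, 50.0, true};
        assert(qualifies_for_discount(c));
        assert(final_price_cents(10000, qualifies_for_discount(c)) == 9000);
    }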
For a function with 2,000 lines of code, we have to be honest with ourselves and accept that testing will never be sufficient; if there's 2,000 lines of code, you can bet good money that there's global state manipulation as well.
Assuming that anybody is capable of sufficiently testing such functions and then rewriting them will only introduce new bugs and bring back old regressions.
Seems harsh, but I've had to work with a few such monstrosities. Global state galore.
A common occurrence is something like a pool of 10 actions where a bunch of tests each do 3 to 7 of them. This is very hard to abstract with a function call.
It's also about the compiler guaranteeing to enumerate every code path in and out of every function. A human writing tests is limited by what they think will happen, but the compiler knows.
Usually it's about keeping it granular enough that various sub-activities can be tested in isolation. If you have one function with 10000 LOCs, that's not really testable beyond "something doesn't work".
Then of course you need some way to automatically run these tests, but that's usually provided by IDEs or standard libraries these days.
You can generate random test cases (and save them), given that you have no performance worries. However, introducing randomness inside a test doesn't make much sense, as one of the grandparents states, because the point of testing is predictability. When a test that does things differently every time it runs fails once, that is not totally useless information, but it is much less informative than a test that always fails.
I should also mention that there are a lot of cases where randomness wouldn't affect the result[1], but then, why introduce randomness in the first place?
If the aim is to test as many combinations of different variables as possible, a bunch of tight, nested loops would be much more reliable IMHO.
[1] for example, if a list doesn't render correctly with n elements, it very likely still wouldn't with m elements - unless you have performance problems, in which case you need to test the maximum sane values and limit the input
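For what it's worth, a tiny sketch of the nested-loop idea, with render_list() standing in for whatever is under test (invented here): the loops walk a fixed grid of sizes deterministically, so any failure reproduces on every run.

    #include <cassert>
    #include <string>
    #include <vector>

    // Invented stand-in for the code under test: joins n items into one line.
    std::string render_list(const std::vector<std::string>& items) {
        std::string out;
        for (size_t i = 0; i < items.size(); ++i) {
            if (i) out += ", ";
            out += items[i];
        }
        return out;
    }

    int main() {
        // Deterministic grid instead of random sizes: every run tests the same cases.
        for (size_t n = 0; n <= 5; ++n) {
            for (size_t width = 1; width <= 3; ++width) {
                std::vector<std::string> items(n, std::string(width, 'x'));
                std::string out = render_list(items);
                // n items of `width` chars plus (n - 1) separators of 2 chars.
                size_t expected_len = n == 0 ? 0 : n * width + (n - 1) * 2;
                assert(out.size() == expected_len);
            }
        }
    }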
Testing is important, for sure, but just because you have two parameters with n choices each, does not mean you have to test n^2 combinations. You can aim to express parameterization at a higher level than ifdefs.
For example, template parameters in C++. The STL defines map<K, V>. You don't have to test every possible type of key and value.
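For instance, very roughly (value_or() is an invented helper, not part of the STL): a template gets tested through a couple of representative instantiations rather than every conceivable key and value type.

    #include <cassert>
    #include <map>
    #include <string>

    // Returns the value for `key`, or `fallback` if the key is absent.
    template <typename K, typename V>
    V value_or(const std::map<K, V>& m, const K& key, const V& fallback) {
        auto it = m.find(key);
        return it != m.end() ? it->second : fallback;
    }

    int main() {
        // Two representative instantiations stand in for the infinite set of
        // possible K/V combinations.
        std::map<int, std::string> a{{1, "one"}};
        assert(value_or(a, 1, std::string("none")) == "one");
        assert(value_or(a, 2, std::string("none")) == "none");

        std::map<std::string, int> b{{"x", 7}};
        assert(value_or(b, std::string("x"), 0) == 7);
        assert(value_or(b, std::string("y"), 0) == 0);
    }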
You can’t test everything. To believe we can is just hubris; even if we get every function to have coverage, we won’t have coverage of the full range of inputs unless all that’s being done is some very simple programming. The combinatorial explosion is fast.
Testing definitely has value, but when testing you always have to make the assumption that you can’t test everything, and then the trick is deciding what to test versus what not to. I think people forget this and focus on having every function tested, every interface having coverage, etc.
I find a great way to understand the complexity you are coding into a method / function is to try and write a unit test for it. Short, simple methods are easy to write tests for, while with long, complex ones it becomes exponentially harder to capture all the combinatorial state possibilities (e.g.: does it work when foo is null, but bar is not and baz is out of range? what about when bar is null but foo is not and baz is in range ... and so on).
Even if you are not intending to write a unit test for a particular function, it's useful to imagine how hard it would be to do so as a thought experiment.
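As a hedged sketch of that thought experiment, with the parameters and their ranges invented here: one nullable foo, one nullable bar, and a baz with a valid range already give 2 x 2 x 2 = 8 cases to think about, and that's for a trivially small function.

    #include <cassert>
    #include <string>

    // Invented example: foo and bar may be null, baz must be in [0, 100].
    std::string describe(const int* foo, const int* bar, int baz) {
        if (baz < 0 || baz > 100) return "error: baz out of range";
        if (!foo && !bar) return "nothing to do";
        if (!foo) return "bar only";
        if (!bar) return "foo only";
        return "both";
    }

    int main() {
        int x = 1, y = 2;
        // Eight combinations of (foo null?, bar null?, baz in range?).
        assert(describe(nullptr, nullptr,  50) == "nothing to do");
        assert(describe(&x,      nullptr,  50) == "foo only");
        assert(describe(nullptr, &y,       50) == "bar only");
        assert(describe(&x,      &y,       50) == "both");
        assert(describe(nullptr, nullptr, 200) == "error: baz out of range");
        assert(describe(&x,      nullptr, 200) == "error: baz out of range");
        assert(describe(nullptr, &y,      200) == "error: baz out of range");
        assert(describe(&x,      &y,      200) == "error: baz out of range");
    }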
Enums, bools, etc.
>can result in your test taking a significant amount of time - enough that you wouldn't want to run it very often.
It is irrelevant in theoretical discussions like this.