66

Having worked on complex solutions that had unit tests and integration tests in the CI/CD pipeline, I recall having a tough time with tests that failed randomly (either due to random values being injected or because of the async nature of the process being tested, which from time to time resulted in some weird race condition). Anyway, having this random behavior in the CI pipeline was not a good experience; we could never say for sure whether the change a developer committed was really causing the build failure.

I was recently introduced to AutoFixture, which helps in the creation of tests by randomly generating values. Surprisingly, I was the only one who did not think it was a great idea to introduce it into all the tests of our CI pipeline.

I mean, I understand fuzz testing, monkey testing, etc., but I believe this should be done outside the CI/CD pipeline, which is the place where I want to ensure my business requirements are being met by having sturdy, solid, to-the-point tests. Non-deterministic tests like these (as well as load testing, black-box testing, penetration testing, etc.) should be run outside the build pipeline, or at least should not be directly tied to code changes.

If these side tests ever find an unexpected behavior, a fix should be created and a new concrete, repeatable test case should be added to prevent regressing to the previous state.

Am I missing something?

5
  • 5
    It seems AutoFixture at least supports property-based testing, which looks like what you're looking for. QuickCheck popularized this approach, which allows you to test classes of inputs rather than using random inputs. Commented Jun 23, 2021 at 0:19
  • 2
    In digital design we do constrained random verification all the time. It's unlikely that a random test is going to fail after you've run it (hundreds of) thousands of times, and it's too much effort to write good, dedicated non-random tests just for CI. If a random test does fail in CI: re-run with the same seed and see if it's a real bug. If it's not: fix the test bench or the randomness constraints.
    – Michael
    Commented Jun 23, 2021 at 5:41
  • @DocBrown, thanks! I had not found that before. I believe the core of the question is indeed the same. Although I can see value in both deterministic and non-deterministic tests, having a non-deterministic test linked to the CI pipeline is my main concern. Commented Jun 23, 2021 at 11:22
  • 2
    In tests: yes, have some random tests (that record the seed as part of the results). In unit tests: no, but any failures from the random tests should be used to construct new unit tests that exercise the discovered failure condition. Commented Jun 24, 2021 at 13:37
  • If the test failed randomly because of the random data, that means the test did have a need to discriminate on the values, so in that sense it was helpful. This can happen if it is hard to pick boundary conditions or if you want to stress-test concurrency effects. In both cases, however, it is good to have that only as additional test runs with value logging or pre-specified seeds to make them replayable.
    – eckes
    Commented Jun 24, 2021 at 22:22

5 Answers

80

Yes, I agree that randomness shouldn't be part of a testing suite. What you want is to mock any real randomness, to create deterministic tests.

Even if you genuinely need bulk random data, more than you can be bothered generating by hand, you should generate it randomly once, and then use that (now set in stone) data as the "random" input for your tests. The source of the data may have been random but because the same data is reused for each run, the test is deterministic.
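As a minimal sketch of that idea in Python (the summarize function under test and the seed value are placeholders, not anything from AutoFixture), a PRNG with a hard-coded seed gives you generated bulk data that is identical on every run:

import random
import unittest

def summarize(values):
    # Hypothetical function under test.
    return {"count": len(values), "total": sum(values)}

class SummarizeTests(unittest.TestCase):
    def test_bulk_generated_data_is_deterministic(self):
        # Fixed seed: the "random" bulk data is the same on every run,
        # so the test stays repeatable.
        rng = random.Random(20210622)
        values = [rng.randint(-1000, 1000) for _ in range(500)]
        result = summarize(values)
        self.assertEqual(result["count"], 500)
        self.assertEqual(result["total"], sum(values))

The data still looks random, but because the seed never changes, the test behaves exactly like one with hand-written inputs.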

However, what I said so far applies to running the tests you knowingly wrote and want to run. Fuzz testing, and therefore AutoFixture, has a separate value: bug discovery. Here, randomization is actually desirable because it can help you find edge cases that you hadn't even anticipated.

This does not replace deterministic testing, it adds an additional layer to your test suite.

Some bugs are discovered through serendipity rather than by intentional design. AutoFixture can help cycle through a wide range of values for a given input, in order to find edge cases that you likely wouldn't have stumbled on with a limited set of hand-crafted test data.

If and when fuzz tests discover a bug, you should use that as the inspiration to write a new deterministic test to now account for this new edge case.
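As a rough sketch of that promotion step in Python (using the Hypothesis library, which this answer does not mention, and a made-up slugify function): the randomized property test keeps exploring, while the failing input it once found is pinned and also distilled into a plain, deterministic test.

from hypothesis import example, given, strategies as st

def slugify(text):
    # Hypothetical function under test.
    return "-".join(text.lower().split())

@given(st.text())                # randomized exploration (bug discovery)
@example("Trailing space ")      # input pinned after a previous failing run
def test_slugify_contains_no_spaces(text):
    assert " " not in slugify(text)

def test_slugify_collapses_double_spaces():
    # Deterministic regression test distilled from the fuzz-found failure;
    # no randomness involved, so it fits in the normal CI suite.
    assert slugify("Hello  World") == "hello-world"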

In short, think of fuzz tests as a dedicated QA engineer who comes up with the craziest inputs to stress test your code, instead of just testing with expected or sensible data.

12
  • 11
    The secret is to control the randomness, not to be at its mercy. If you let it go wild, be sure to capture it: the test is perfectly deterministic and repeatable if you know exactly which random values caused the behavior you observed. Don't make it hard to match the record of the randomness with the behavior. Commented Jun 22, 2021 at 19:04
  • 29
    Also, rather than store all the random data, you can use a deterministic pseudo random number generator and only store the seed. But obviously in such a way that test ordering doesn't influence the values (so reset the seed every test). Commented Jun 23, 2021 at 2:22
  • 2
    @GregoryCurrie: Yes. And it's a good idea to manually change the seed and see if the tests still pass. I've seen tests which were suspiciously tailored to only accept a specific list of random values. And also check if the tests really test something... Tests don't bring much if they can never fail. Commented Jun 23, 2021 at 9:14
  • 2
    xkcd.com/221 Commented Jun 23, 2021 at 11:58
  • 3
    @PeterCordes Regardless of random testing or not, a few deterministic tests like "all 1 bits" or "all 0 bits" are obvious things to use for a big-int arithmetic package. IMO.
    – alephzero
    Commented Jun 24, 2021 at 3:56
53

I've worked on projects which use anywhere from no to extensive randomness in tests, and I'm generally in favour of it.

The most important thing to remember is that the randomness must be repeatable. In the current project we use pytest-randomly with a seed based on the pipeline run ID in CI, so it's trivial to repeat a failing run identically, even though each pipeline run is different. This may be a showstopper if you want to run tests in parallel, because I could not find a (pytest) framework which will split tests into parallel runs reproducibly.
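As an illustration of the principle (this is not pytest-randomly's own mechanism; the TEST_SEED variable name is made up, and CI_PIPELINE_ID is merely GitLab's run identifier used as an example), the seed can be derived from the pipeline run, logged, and overridden locally to replay a red run:

import os
import random

def make_rng():
    # Prefer an explicit override, then the CI run ID, then a fresh seed.
    seed = os.environ.get("TEST_SEED") or os.environ.get("CI_PIPELINE_ID")
    if seed is None:
        seed = str(random.randrange(2**32))
    print("test RNG seed:", seed)   # ends up in the CI log
    return random.Random(seed)

To reproduce a failing pipeline locally, rerun the suite with TEST_SEED set to the value printed in that pipeline's log.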

The randomness is used in two ways:

First, tests are run in random order. This virtually guarantees that any test interdependencies will eventually be discovered. It avoids those situations where a test fails when running the whole test suite but passes when run on its own. When that happens, it can take much longer to find and fix the actual issue: you're effectively debugging two things at once, you can't be sure whether the failure comes from a bad test or bad production code, and each test run to check a potential fix can take a long time.

Second, we use generator functions for any inputs which are irrelevant to the test result. Basically we have a bunch of functions like any_file_contents (returns bytes), any_past_datetime (datetime.datetime), any_batch_job_status (enum member) and any_error_message (str), and we use them to provide any required input which should not affect the test results. This can surface some interesting issues with both tests and production code, such as inputs being relevant when you thought they weren't, data not being escaped, escaped in the wrong way, or double escaped, and even ten-year-old core library bugs showing up in third party libraries. It is also a useful signal to whoever reads the test, telling them exactly which inputs are relevant for the result.
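A rough sketch of what such helpers can look like (the implementations below are illustrative guesses; only the names mirror the ones above):

import datetime
import random
import uuid

_rng = random.Random()   # assumed to be seeded elsewhere, e.g. from the CI run ID

def any_file_contents(length=64):
    return bytes(_rng.getrandbits(8) for _ in range(length))

def any_past_datetime():
    seconds_ago = _rng.randint(1, 10 * 365 * 24 * 3600)
    return datetime.datetime.now() - datetime.timedelta(seconds=seconds_ago)

def any_error_message():
    return "error-" + str(uuid.uuid4())

A test then calls any_error_message() for every argument whose exact value should not matter, which doubles as documentation of which inputs are relevant.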

While this approach is not a replacement for fuzz testing, it is a much cheaper way to achieve what I would expect are similar results. It's not going to be sufficient if your software requires much more extensive fuzzing, such as a parsing library, but it should be a simple way to improve run-of-the-mill tests.

I don't have the numbers to back this up, but I believe this is the single most reliable test suite (in terms of false positives or negatives) I've worked on.

10
  • 11
    I don't know why someone downvoted this. Thanks for the voice of experience, and good point about all randomness needing to be reproducible. It's unfortunate that so many answers, both here and at the linked question, raise a point like "you wouldn't be able to reproduce it" with no sign of having considered that reproducible pseudorandom functions exist and are in fact what most standard libraries provide. Commented Jun 23, 2021 at 7:26
  • 3
    +1 - I do the same most of the time. Inputs that are not relevant (or where a whole subset of possible values should behave the same) I fill in randomly, e.g. randomFirstname(), anyOf<ValueA, ValueB, ValueC>(), ... I have even had tests succeed when they shouldn't have because someone used the same ID for multiple mocked input objects. And with this setup I still had flaky tests - but never due to the random fill-ins.
    – marstato
    Commented Jun 23, 2021 at 7:47
  • 8
    And of course, once you have a failing test, you should create a new test with the specific data, or property of the data, that generated the failure, and make that part of the CI flow. Commented Jun 23, 2021 at 9:58
  • 2
    @PierreArlaud The point of this answer is that you should make the "random" test perfectly reproducible. That is, it's ok to use an external input like pipeline id or time-of-day or whatever as the seed for whatever randomness is in your tests, as long as you have a good way of rerunning the tests with the same seed, to get the same result. Then if a test fails, you won't have trouble reproducing it. (I will note though that your team must have a culture of taking each failure seriously and not just retrying flaky tests to pass!) Commented Jun 24, 2021 at 0:08
  • 2
    It's 2023 - I randomly generate the seed value and output it in CI before each test suite gets run - I don't tie it to any pipeline ids or other heuristic, I generate a random one for every single test suite every time a build happens, and I log it so I can see it. If a test suite fails, I can just copy paste the seed from CI into my local and reproduce the failure. In the rarest of cases it's a real failure, which is awesome. If it's a non application-breaking failure, I've still learned that I need to improve my test suite. I am a very very strong advocate of random mock data in tests Commented Jun 8, 2023 at 22:24
12

No. Random values in unit tests make them non-repeatable. As soon as a test passes on one run and fails on the next without any code change, people lose confidence in it, undermining its value. Printing a reproduction script is not enough.

That said, randomized edge-case testing and fuzz testing can provide value. They're just not unit tests at that point. And personally, I like linking them to CI even if they don't necessarily block a deployment or run on every commit.

29
  • 9
    Only non-repeatable if the unit tests are stupid. Set a hard-coded seed value at the beginning of each test and the random sequence is perfectly reproducible.
    – gnasher729
    Commented Jun 23, 2021 at 7:12
  • 5
    @gnasher729 It isn't random then is it?
    – Blaž Mrak
    Commented Jun 23, 2021 at 7:37
  • 7
    @BlažMrak: That's why they are called pseudorandom number generator. Commented Jun 23, 2021 at 9:15
  • 8
    @BlažMrak gnasher729's comment is effectively a frame challenge. I assume that none of us are thinking that anyone is using a hardware RNG (or anything that creates actual random numbers). Therefore we are (even in the premise of the question) talking about PRNGs. Gnasher's (quite valid in my opinion) point is that pseudorandom numbers do not cause unit tests to be unrepeatable unless you are using them wrong. Or, to frame it in terms of your first comment, the data was never "random" to begin with, it was merely (by design) changing between each run. Commented Jun 23, 2021 at 15:19
  • 4
    @BlažMrak And my point is that choosing PRNG data as the input is not done so that "no two runs are the same", but rather so that you can get a variety of input coverage. Following from that, you can achieve that goal (variety of input coverage) without "no two runs being the same", simply by using the PRNG correctly. Seeing this as a nitpick between random vs unpredictable is missing the point being raised in these comments. Commented Jun 23, 2021 at 19:52
7

I would recommend covering "obvious" edge cases with explicit test data inputs rather than hoping the fuzz testing will catch them. E.g. for a function that operates on arrays, handle empty arrays, single-entry arrays, and arrays with multiple (e.g. 5) items.

This way, your fuzz tests are strictly additive to a baseline level of solid test coverage.
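As a small sketch of that layering in Python (the head function and the seeded_rng fixture are hypothetical, for illustration only): the explicit edge cases stand on their own, and the randomized test only adds coverage on top.

def head(items, default=None):
    # Hypothetical function under test.
    return items[0] if items else default

def test_head_of_empty_list():
    assert head([]) is None

def test_head_of_single_item_list():
    assert head([42]) == 42

def test_head_of_multi_item_list():
    assert head([1, 2, 3, 4, 5]) == 1

def test_head_fuzz(seeded_rng):
    # Additive randomized check; seeded_rng is assumed to be a fixture
    # that provides a logged, reproducible PRNG.
    items = [seeded_rng.randint(-100, 100) for _ in range(seeded_rng.randint(1, 50))]
    assert head(items) == items[0]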

One way to help reduce the pain is to ensure that your CI logs contain enough information to fully reproduce a test case locally.

Anyway, having this random behavior in the CI pipeline was not a good experience; we could never say for sure whether the change a developer committed was really causing the build failure.

Think of the flip side: if the fuzz testing weren't there, nothing else would have caught that bug, so you'd have a false green. Sure, it won't disturb your development/shipping experience, but it will disturb production instead.

0

To quote AutoFixture:

"...designed to minimize the 'Arrange' phase of your unit tests in order to maximize maintainability. Its primary goal is to allow developers to focus on what is being tested rather than how to setup the test scenario.."

So I can see why you wouldn't want a test such as:

x = random int
actual = SquareRoot(x)
Assert(actual * actual = x)

You would want to explicitly test max int, negative numbers, etc., and be sure that the test is repeatable.

However, this isn't what AutoFixture is proposing. Its authors are more interested in tests like:

x = new Customer
x.firstname = ...
x.lastname = ..
x.middlename = ...
x.Address = new Address()
x.Address.Street = ...
....
x.Account = new Account()
...
etc 
repo.Save(x)
actual = repo.Load(x.Id)
Assert(actual = x);

Now you can see that your test is unlikely to fail due to the values you assign to the various Customer fields and sub-object fields. That's not really what you are testing.

But! It would save you a lot of typing and unimportant code if you could auto-populate all those fields.

7
  • The examples on the GitHub page are really bad then, because they only seem to show important expected values being generated. The example is basically Assert.Equal(expectedNumber, sut.Echo(expectedNumber)); where expectedNumber is generated. I figured they didn't want to show anything more complex because then they would need to reimplement the SUT's logic in the test.
    – kapex
    Commented Jun 23, 2021 at 8:54
  • Yeah, it's difficult to see how that would work
    – Ewan
    Commented Jun 23, 2021 at 9:31
  • So, from your example, if repo.Save limits the number of chars in x.Address, the test may or may not fail depending on the random value generated (i.e. if the generated value is short the test passes; if it is long the test fails). I see the value of finding such edge cases, but such tests cannot be considered regression tests, they are fuzz tests, and I struggle to believe they belong in the CI pipeline. Commented Jun 23, 2021 at 11:00
  • 1
    @ViniciusScheidegger AutoFixture seems to generate UUIDs by default for all strings, so they are all of the same length. The library doesn't seem to be about finding edge cases or fuzz testing. The purpose is rather to generate valid dummy values. For example, if the customer name can't be null but otherwise doesn't affect the test, then instead of hardcoding a dummy name, you let AutoFixture generate a value.
    – kapex
    Commented Jun 23, 2021 at 13:03
  • @ViniciusScheidegger yes, I'm using a simple example here; you would have to assume that the generated fields fall within the expected range. But note this question, which tells us that AutoFixture will use data annotations to pick the correct length. It's not trying to break your test, it's trying to save you typing: stackoverflow.com/questions/10125199/…
    – Ewan
    Commented Jun 23, 2021 at 14:43
