
This approach is neat for observability, but it's worth noticing that it essentially quantises all of your samples down to the vertical resolution of your graph. If you somehow introduced a bug that caused an error that was smaller than the step size then these tests wouldn't catch it.

(e.g. if you somehow managed to introduce a constant DC-offset of +0.05, with the shown step size of 0.2, these tests would probably never pick it up, modulo rounding.)
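To illustrate, a quick Python sketch (the step size and sample values are invented to match the numbers above):

    # Binning samples to a 0.2 step hides a +0.05 DC offset entirely.
    step = 0.2

    def to_row(sample):
        """Map a sample to the graph row it would be drawn on."""
        return round(sample / step)

    reference = [0.0, 0.2, 0.4, 0.2, 0.0, -0.2, -0.4, -0.2]
    buggy = [s + 0.05 for s in reference]  # constant DC offset

    # Both waveforms land on the same rows, so a graph-based
    # assertion sees no difference between them.
    assert [to_row(s) for s in reference] == [to_row(s) for s in buggy]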

That said, these tests are great for asserting that specific functionality does broadly what it says on the tin, and making it easy to understand why not if they fail. We'll likely start using this technique at Fourier Audio (shameless plug) as a more observable functionality smoke test to augment finer-grained analytic tests that assert properties of the output waveform samples directly.




It's true that it quantizes (aka bins) the samples, so it isn't right for tests that need to be 100% sample-perfect, at least vertically speaking. I suppose it is a compromise between a few tradeoffs - easy readability just from looking at the code itself (you could do images, but then there's a separate file you have to keep track of, or you're looking at binary data as a float[]) vs strict correctness. The evaluation of these tradeoffs would definitely depend on what you're doing, and in my case, most of the potential bugs are going to relate to horizontal time resolution, not vertical sample depth resolution.

If the precise values of these floats are important in your domain (which they very well may be), a combination of approaches would probably be good! Would love to hear how well this approach works for you guys. Keep me updated :)


I'm not sure it makes sense to separate "vertical" correctness from "horizontal" correctness when it comes to "did the feature behave" though; to extend the example in TFA, if your fade progress went from 0->0.99 but then stopped before it actually reached 1 for some reason, you might find that you still had a (small, but still present) signal on the output, which, if the peak-peak amplitude was < 0.1, the test wouldn't catch.

Obviously, any time you're working with floating-point sample data, the precise values of the floats will almost never be bit-accurate against what your model predicts (sometimes even if that model is a previous run of the same system with the same inputs, as in this case); it's about defining an acceptable deviation. I guess what I'm saying is that for audio software, a peak-peak error of 0.1 equates to a signal at -20 dBFS (ref dBFS @ 1.0), which of course is quite a large amount of error for an audio signal, so perhaps using higher-resolution graphs would be a good idea.
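For reference, the conversion is just 20 * log10 of the amplitude relative to full scale, e.g.:

    import math

    def dbfs(amplitude, full_scale=1.0):
        """Amplitude relative to full scale, in dB."""
        return 20 * math.log10(amplitude / full_scale)

    print(dbfs(0.1))  # -20.0, the figure quoted above
    print(dbfs(0.2))  # about -14.0, a full graph step at the shown resolution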

(Has anyone made a tool to diff sixels yet? /s)


Fair points here. Unfortunately adding more vertical resolution starts to get a little unwieldy to navigate through. Maybe it could start using different characters to multiply the resolution to something sufficiently less forgiving of errors. If it could choose between even 3 chars, for example, it would effectively squash 3 possible values into one line, tripling the resolution.
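A rough sketch of what that could look like (the three characters and the step size are arbitrary choices):

    import math

    STEP = 0.2
    SUBCHARS = ["_", "-", "~"]  # low / middle / high third of a row

    def cell(sample):
        """Return (row index, character) for one sample, packing three
        sub-levels into each graph row."""
        scaled = sample / STEP
        row = math.floor(scaled)
        fraction = scaled - row  # position within the row, 0.0 .. 1.0
        sub = min(int(fraction * len(SUBCHARS)), len(SUBCHARS) - 1)
        return row, SUBCHARS[sub]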


I think more resolution may give you more false positives, which might not be helpful. We've used similar tools for integration testing at work, and the smallest, usually irrelevant, change can bust the reference cases because of the level of detail in the reference, which means going through all the changed tests only to find that everything is still fine.

For this, just thinking about sound, I wonder if you could invert the reference waveform and add it to the test output to see how well it cancels? Then instead of just knowing there was a diff, you could get measurements of the degree of the diff.
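Something along these lines, maybe (just a sketch; the -60 dB threshold is a placeholder):

    import math

    def residual_dbfs(reference, output):
        """Invert the reference, add it to the output under test, and
        report the RMS of what's left in dB relative to full scale."""
        residual = [o - r for o, r in zip(output, reference)]
        rms = math.sqrt(sum(x * x for x in residual) / len(residual))
        return 20 * math.log10(max(rms, 1e-12))  # floor avoids log(0)

    # e.g. demand at least 60 dB of cancellation:
    # assert residual_dbfs(reference, output) < -60.0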


A more accurate and only slightly more complex process for this is to generate numerical text representations of the desired test waveforms and then feed them through sox to get actual wave files. The numerical text representations are likely even easier to generate programmatically than the ascii->audio transformation.


What does a "numerical text representation" of a waveform look like? (Not familiar with audio processing but interested to understand your suggestion.)


Here's a fragment of the representation of a stereo file:

       4.9600227   0.094451904297 -0.014831542969 
       4.9600454   0.089172363281 -0.0092468261719 
        4.960068   0.087493896484 -0.0065612792969 
       4.9600907   0.090179443359 -0.0028686523438 
       4.9601134   0.093963623047 0.0060729980469 
       4.9601361   0.095367431641  0.020538330078 
       4.9601587   0.094299316406  0.035186767578 
       4.9601814    0.09228515625  0.045013427734 
       4.9602041   0.089691162109  0.051422119141 
       4.9602268   0.086059570312  0.058929443359 
Columns are: [time in seconds] [left channel sample] [right channel sample]

This was generated using

      sox somefile.wav somefile.dat
You can reverse that by reversing the argument order above.
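If you want to go the other way programmatically, here's a sketch (file names and tone parameters are invented; the sox builds I've used want the two comment header lines at the top of the .dat):

    import math
    import subprocess

    RATE = 48000
    FREQ = 1000.0   # 1 kHz test tone
    SECONDS = 0.1

    with open("tone.dat", "w") as f:
        f.write(f"; Sample Rate {RATE}\n; Channels 1\n")
        for n in range(int(RATE * SECONDS)):
            t = n / RATE
            sample = 0.5 * math.sin(2 * math.pi * FREQ * t)
            f.write(f"{t:.7f} {sample:.12f}\n")

    subprocess.run(["sox", "tone.dat", "tone.wav"], check=True)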


This has some advantages-- it's numerically precise and can be more flexible, but it has some downsides over the suggested approach.

- The quantization of the graphs is a feature to add some tolerance to the tests. I admit this is a mixed blessing.

- This is a lot more opaque to someone looking at a text file of the test output than what is described in the post.


The opacity of the .dat file is real and deep. But I'd expect the opacity of the go/python/lua/whatever code that generates the .dat to be extremely low, and that's what you'd read.


I was thinking that maybe that lack of precision was a good thing. Makes your tests less fragile.

I agree, though, that you probably want to augment this with some form of assertion about the noise level, to check the smaller high-frequency components.
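Something like this, perhaps (numpy assumed; the cutoff and threshold are placeholders):

    import numpy as np

    def assert_noise_floor(output, reference, rate=48000,
                           cutoff_hz=5000.0, max_dbfs=-60.0):
        """Check that high-frequency residual energy stays below a limit."""
        residual = np.asarray(output) - np.asarray(reference)
        spectrum = np.abs(np.fft.rfft(residual)) / len(residual)
        freqs = np.fft.rfftfreq(len(residual), d=1.0 / rate)
        peak = spectrum[freqs >= cutoff_hz].max()
        peak_db = 20 * np.log10(max(peak, 1e-12))
        assert peak_db < max_dbfs, f"HF residual at {peak_db:.1f} dBFS"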


There's some French guy who made some maths that might help with this, idk ;)

--pb, CTO of Fourier Audio Ltd.


Yeah, maybe an ascii-art waterfall plot is the way to go!


I suppose you could use a patterned dither / sigma-delta to get a slightly bigger chance of finding differences.
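For example (sketch only; the pattern and step size are arbitrary):

    STEP = 0.2
    PATTERN = [0.0, 0.05, 0.1, 0.15]  # repeating sub-step offsets

    def to_row_dithered(samples):
        """Bin samples to graph rows after adding a patterned offset, so a
        sub-step error shifts at least some samples into a different row."""
        return [round((s + PATTERN[i % len(PATTERN)]) / STEP)
                for i, s in enumerate(samples)]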



