Using ASCII waveforms to test real-time audio code (goq2q.net)
121 points by jwosty on Oct 13, 2021 | 70 comments



This approach is neat for observability, but it's worth noting that it essentially quantises all of your samples down to the vertical resolution of your graph. If you somehow introduced a bug that caused an error smaller than the step size, these tests wouldn't catch it.

(e.g. if you somehow managed to introduce a constant DC-offset of +0.05, with the shown step size of 0.2, these tests would probably never pick it up, modulo rounding.)
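A minimal sketch of what I mean (illustrative Python, not from the article; the names and values are made up): quantising to the graph's step size makes a small offset invisible to a plain comparison of the rendered output.

    # Quantise samples to the plot's vertical step, as the ASCII rendering
    # effectively does; the step size here matches the article's 0.2.
    def quantise(samples, step=0.2):
        return [round(s / step) * step for s in samples]

    clean   = [0.0, 0.2, 0.4, 0.2, 0.0, -0.2, -0.4, -0.2]
    shifted = [s + 0.05 for s in clean]          # bug: constant +0.05 DC offset

    assert quantise(clean) == quantise(shifted)  # passes; the offset never shows up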

That said, these tests are great for asserting that specific functionality does broadly what it says on the tin, and making it easy to understand why not if they fail. We'll likely start using this technique at Fourier Audio (shameless plug) as a more observable functionality smoke test to augment finer-grained analytic tests that assert properties of the output waveform samples directly.


It's true that it quantizes (aka bins) the samples, so it isn't right for tests that need to be 100% sample-perfect, at least vertically speaking. I suppose it is a compromise between a few tradeoffs - easy readability just from looking at the code itself (you could do images, but then there's a separate file you have to keep track of, or you're looking at binary data as a float[]) vs. strict correctness. The evaluation of these tradeoffs would definitely depend on what you're doing, and in my case, most of the potential bugs are going to relate to horizontal time resolution, not vertical sample depth resolution.

If the precise values of these floats are important in your domain (which they very well may be), a combination of approaches would probably be good! Would love to hear how well this approach works for you guys. Keep me updated :)


I'm not sure it makes sense to separate "vertical" correctness from "horizontal" correctness when it comes to "did the feature behave" though; to extend the example in TFA, if your fade progress went from 0->0.99 but then stopped before it actually reached 1 for some reason, you might find that you still had a (small, but still present) signal on the output, which, if the peak-peak amplitude was < 0.1, the test wouldn't catch.

Obviously, any time you're working with floating-point sample data, the precise values of floats will almost never be bit-accurate against what your model predicts (sometimes even if that model is a previous run of the same system with the same inputs, as in this case); it's about defining an acceptable deviation. I guess what I'm saying is that for audio software, a peak-peak error of 0.1 equates to a signal at -20 dBFS (ref 0 dBFS = 1.0), which of course is quite a large amount of error for an audio signal, so perhaps using higher-resolution graphs would be a good idea.
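For concreteness, the arithmetic behind that figure (plain Python, values purely illustrative):

    import math

    # 0 dBFS is taken as full scale at 1.0, matching the reference above.
    def dbfs(level, full_scale=1.0):
        return 20 * math.log10(level / full_scale)

    print(dbfs(0.1))   # -20.0 dBFS for a 0.1 error level
    print(dbfs(0.05))  # about -26.0 dBFS if you treat 0.1 peak-peak as a 0.05 peak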

(Has anyone made a tool to diff sixels yet? /s)


Fair points here. Unfortunately adding more vertical resolution starts to get a little unwieldy to navigate through. Maybe it could start using different characters to multiply the resolution to something sufficiently less forgiving of errors. If it could choose between even 3 chars, for example, it would effectively squash 3 possible values into one line, tripling the resolution.
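Something like this could work (a rough Python sketch; the glyph choice and names are made up): pack three sub-levels into each text row by picking one of three characters.

    GLYPHS = "_-~"   # bottom / middle / top of a character cell (purely illustrative)

    def cell(sample, step=0.2):
        # Quantise to thirds of a step, then split into (row, glyph-within-row),
        # tripling the effective vertical resolution for the same number of rows.
        thirds = round(sample / (step / len(GLYPHS)))
        row, third = divmod(thirds, len(GLYPHS))
        return row, GLYPHS[third]

    print(cell(0.27))  # (1, '-'): roughly one third of a step above the 0.2 row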


I think more resolution may give you more spurious failures, which might not be helpful. We’ve used similar tools for integration testing at work, and the smallest, usually irrelevant, change can bust the reference cases due to the high detail in the reference, which means going through all the changed tests and then seeing that everything is still fine.

For this, just thinking about sound, I wonder if you could invert the reference waveform and add it to the test output to see how well it cancels? Then instead of just knowing there was a diff, you could get measurements of the degree of the diff.
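A rough sketch of that null-test idea in Python (the array names are made up):

    import math

    def residual_dbfs(output, reference):
        # Add the inverted reference (i.e. subtract it) and measure how deep
        # the cancellation is, in dB relative to full scale at 1.0.
        residual = [o - r for o, r in zip(output, reference)]
        rms = math.sqrt(sum(x * x for x in residual) / len(residual))
        return 20 * math.log10(rms) if rms > 0 else float("-inf")

    # e.g. assert residual_dbfs(rendered, reference) < -60, "cancellation too shallow"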


A more accurate and only slightly more complex process for this is to generate numerical text representations of the desired test waveforms and then feed them through sox to get actual wave files. The numerical text representations are likely even easier to generate programmatically than the ascii->audio transformation.


What does a "numerical text representation" of a waveform look like? (Not familiar with audio processing but interested to understand your suggestion.)


Here's a fragment of the representation of a stereo file:

       4.9600227   0.094451904297 -0.014831542969 
       4.9600454   0.089172363281 -0.0092468261719 
        4.960068   0.087493896484 -0.0065612792969 
       4.9600907   0.090179443359 -0.0028686523438 
       4.9601134   0.093963623047 0.0060729980469 
       4.9601361   0.095367431641  0.020538330078 
       4.9601587   0.094299316406  0.035186767578 
       4.9601814    0.09228515625  0.045013427734 
       4.9602041   0.089691162109  0.051422119141 
       4.9602268   0.086059570312  0.058929443359 
Columns are: [time in seconds] [left channel sample] [right channel sample]

This was generated using

      sox somefile.wav somefile.dat
You can reverse that by reversing the argument order above.
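And as mentioned above, these are easy to generate programmatically. A minimal Python sketch that writes a test tone in that textual format (the file name and tone parameters are made up, and the exact header comment lines may differ between sox versions):

    import math

    RATE = 48000  # illustrative sample rate

    with open("tone.dat", "w") as f:
        f.write(f"; Sample Rate {RATE}\n")
        f.write("; Channels 1\n")
        for n in range(RATE):  # one second of a 1 kHz tone at half full scale
            t = n / RATE
            f.write(f"{t:.9f} {0.5 * math.sin(2 * math.pi * 1000 * t):.12f}\n")

    # then: sox tone.dat tone.wav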


This has some advantages-- it's numerically precise and can be more flexible, but it has some downsides over the suggested approach.

- The quantization of the graphs is a feature to add some tolerance to the tests. I admit this is a mixed blessing.

- This is a lot more opaque to someone looking at a text file of the test output than what is described in the post.


The opacity of the .dat file is real and deep. But I'd expect the opacity of the go/python/lua/whatever code that generates the .dat to be extremely low, and that's what you'd read.


I was thinking that maybe that lack of precision was a good thing. Makes your tests less fragile.

I agree though that you probably want to augment this with some form of assertion about noise level to check the high frequency smaller components.


There's some French guy who made some maths that might help with this, idk ;)

--pb, CTO of Fourier Audio Ltd.


Yeah, maybe an ascii-art waterfall plot is the way to go!


I suppose you could use a patterned dither / sigma-delta to get a slightly bigger chance of finding differences.
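Roughly this (a Python sketch, not a tuned dither design): add reproducible noise before quantising, so sub-step errors change which bin some samples land in.

    import random

    def dithered_quantise(samples, step=0.2, seed=0):
        rng = random.Random(seed)  # fixed seed so the test is reproducible
        return [round((s + rng.uniform(-step / 2, step / 2)) / step) * step
                for s in samples]

    # With the same seed, a +0.05 offset now flips some samples into the next
    # bin, so a plain diff of the rendered output has a chance to catch it.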


Imagine if we had terminals that could handle graphical data. We wouldn't have to do weird kludges like this, we could just plot the waveforms in the output of our tools.

But it's 2021, and not only is this not possible, there is not even a path forward to a world where this would be possible. It's just not an option. Nobody is working on this, nobody is trying to make this happen. We're just sitting here with our text terminals, and we can't even for a second imagine that there could be anything else.

It's sad, is what it is.


I would point out that sixels[0] exist. There is a nice library, libsixel[1] for working with it, which includes bindings into many languages. If the author of sixel-tmux[2][3] is to be believed[4], the relative lack of adoption is a result of unwillingness on the part of maintainers of some popular open source terminal libraries to implement sixel support.

I can't comment on that directly, but I will say, it's pretty damn cool to see GnuPlot generating output right into one's terminal. lsix[5] is also pretty handy.

But yeah, I agree, I'm not a fan of all the work that has gone into "terminal graphics" based on Unicode. It's a dead-end, as was clear to DEC even back in '87 (and that's setting aside that the VT220[6] had its own drawing capabilities, though they were more limited). Maybe sixel isn't the best possible way of handling this, but it does have the benefit of 34 years of backwards-compatibility, and with the right software, you can already use it _now_.

0 - https://en.wikipedia.org/wiki/Sixel

1 - https://saitoha.github.io/libsixel/

2 - https://github.com/csdvrx/sixel-tmux

3 - https://news.ycombinator.com/item?id=28756701

4 - https://github.com/csdvrx/sixel-tmux/blob/main/RANTS.md

5 - https://github.com/hackerb9/lsix

6 - https://en.wikipedia.org/wiki/VT220


> If the author of sixel-tmux[2][3] is to be believed[4], the relative lack of adoption is a result of unwillingness on the part of maintainers of some popular open source terminal libraries to implement sixel support.

If you have any doubt, look no further than this thread: the sixel format is attacked not for any technical reasons, but for its age, RIGHT HERE ON HN:

>> "That's a protocol that's a good forty years old, and even that is not supported. And I can see why, why on earth would you want to be adding support for that in 2021? What a ridiculous state of affairs."

What's ridiculous is, with so many examples and quotes, some people still think I must be "emotional" (I had a long discussion here... https://news.ycombinator.com/item?id=28761043 ) or that a few million colors are not sufficient for the terminal (!)

There is none so blind as those who will not see...


Sixels are fun, but I was disappointed by libsixel. It’s not really a general-purpose library; most of it is there only to implement various command-line arguments of img2sixel. Most of the functions determine what to do by parsing strings taken from the command-line arguments, so reusing it is super annoying.

When implementing a program that outputs sixels, you are better off looking elsewhere. SDL1.2-SIXEL is a good choice in general, if you are writing C or don’t mind using the C bindings for your preferred language.


That’s interesting. Do you think sixels could work for the baseline tests? Would it be feasible to have them display nicely in an IDE, like VS Code or Visual Studio?


I don’t see why sixels couldn’t work. You’d probably want a tool to decode them, diff the images, and then output another sixel image. I’m admittedly not sure such a tool exists off the shelf, though.

I’m not aware of any text editors supporting sixels, which could make preparing the tests a challenge. Certainly, you could imagine a text editor supporting them; I’m just not personally aware of one that does.

I will concede that for your specific use case, an off the shelf ASCII plotting library probably involves less custom tooling.


> I don’t see why sixels couldn’t work.

Sixels will work: they are fast enough to allow YouTube video playback!!!

https://github.com/saitoha/FFmpeg-SIXEL/blob/sixel/README.md

The problem is NOT THE FORMAT, the problem is the lack of tooling: links and w3m are among the rare text browsers that can display images in the console.

It's just a matter of the browser sending the image to the terminal in some format it can understand, but if that hasn't been thought about as a possibility, it's going to be far more complicated than just adding a new format, as you will have to work on the text reflow issues (e.g. how do you select the size of the placeholder, when expressed in characters?) on top of the picture display issues.

Said differently, it would be easier to have a console IDE that supported graphics if any format whatsoever (sixel, kitty...) were already supported by one; we could then argue about the ideal format.

Arguing about the ideal format BEFORE letting the ecosystem grow using whatever solution there is only results in a negative loop.

It's as if a startup argued about the ideal technology stack before even trying to find product-market fit!!

Personally, I do not care much about sixels, kitty or iterm format - all I want is to see some kind of support for a format that's popular enough for tools using it to emerge.

Yes, it would be better if that supported format was the option that had the greatest chance of succeeding, but right now, that is a very remote concern: first we need tools, then if in the worst case they are for a "bad" format, we can write transcoders to whatever format people prefer!

Right now, there is rarely any "input" to transcode (how many console tools support say iTerm format?), so we have a much bigger bootstrapping problem.

> an off the shelf ASCII plotting library probably involves less custom tooling

With a terminal like msys2 or xterm, no custom tooling is required: just use the regular gnuplot after doing the export for the desired resolution, font, and font size.

gnuplot is far more standard than plotting libraries, which often require special Unicode fonts on top of requiring you to use their specific format.


I find kitty's graphics protocol to be a superior implementation of the idea: https://sw.kovidgoyal.net/kitty/graphics-protocol/


That's a protocol that's a good forty years old, and even that is not supported. And I can see why, why on earth would you want to be adding support for that in 2021? What a ridiculous state of affairs.


> That's a protocol that's a good forty years old

34 years old, actually. I guess we can go ahead and deprecate the x86 instruction set, TCP/IP, ASCII, C, tar, and many other tools and standards that are old.

> and even that is not supported.

xterm supports VT340 emulation. I use this semi-regularly. I believe mintty also supports sixels, plus a handful of others. The libsixel website has a full list.

> And I can see why, why on earth would you want to be adding support for that in 2021?

You might want to read your own post ( https://news.ycombinator.com/item?id=28856005 ).

What’s your great idea as opposed to sixels?


> It's sad, is what it is.

With graphics being everywhere in 2021, I wouldn't call this situation "sad"; I'd think a lot more critically about why it is the way it is.

To start with, fixed-width text is significantly easier to work with than graphics.

Nothing's stopping anyone from writing a CI tool that outputs to HTML with embedded images. The bigger question is why it's uncommon.


In truth, it's because text is quite easy to handle. It's easy to make a program that handles text, too.

And so we have a lot of text editors, diff tools, efficient compression, tools like sort and uniq: the whole unix ecosystem.

So if you transform sound to text, you can then use text tools to compare the output to catch differences. A simple serialization of numerical sample values would have caught the bug, but I agree that having a way of visualizing the output is nice.
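For instance (purely illustrative; the arrays are made up), serialising samples one per line makes standard text diff tooling applicable:

    import difflib

    reference = [0.0, 0.2, 0.4, 0.2]
    output    = [0.0, 0.2, 0.45, 0.2]          # one sample off

    def serialise(samples):
        return [f"{s:.6f}" for s in samples]   # one sample per line, fixed precision

    # Any textual diff tool works on this; difflib stands in for `diff` here.
    print("\n".join(difflib.unified_diff(serialise(reference), serialise(output), lineterm="")))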

Command line input, programming, etc. is also still mostly done with text, because it's easy to transform. Of course, you can imagine working at a higher level with objects (like powershell does IIRC), mimetypes, etc.


The venerable xterm and a lot of later physical terminals (those things with CRTs) can emulate Tektronix graphics (Tektronix, which today makes instruments, also made computer terminals with fancy storage CRTs that were kind of e-paper-like, but with green, and sometimes yellow, screens). iTerm2 and some others, as pointed out, can do Sixel graphics (a format originally designed for DEC dot-matrix printers that some DEC terminals also implement).


I mean, yes, that is how sad the current state is.


VTE and, with it, almost every Linux distro, will get Sixel support soon. I volunteered to add Tektronix graphics to it too, but this is neither a dire need, nor something I have done before, so it'll take some time.


It's forty years old. Why on earth would you be adding that in 2021?

Why are we not focusing our energy on making something that is actually up to date?


Because things that existed 40 years ago are useful, already have software written for them, are compatible in sometimes unforeseen ways (a DEC dot-matrix graph can be printed as-is on a Sixel-compatible terminal!) and have been battle-tested for ages.

There is a reason the Unix way of bytestream-based shell and pipes is still useful and present these days to the point that That Other OS is now embedding Linux in it.

Also, these ancient terminals often had some interesting typography options that are encoded in the ANSI standard but that most modern terminals don't bother to implement (line attributes that generate wider and taller cells are one such example).

These formats may be more desirable than more modern and complete ones such as PostScript for other reasons. I wouldn't advise implementing a terminal capable of rendering PostScript graphics because it's one more way to infiltrate malware in your computer by rendering untrusted inputs (There are a lot of RCE opportunities in exploiting vulnerable decoders).


In TempleOS you can mix text, images, hyperlinks, and 3d models in the terminal. This is true for the whole system: you could literally have a spinning 3d model of a tank as a comment in a source file. That's right, it took a literal schizophrenic to make an OS with a feature that should have been standard decades ago.

Nobody tries to make actually interesting new operating systems anymore. OS research today is just "let's implement unix with $security_feature", nobody is actually trying to make computers more powerful or fun to use, or design a system based off of a first-principles understanding of what a computer should be.

God I wish I was born in the lisp machine timeline


The features you describe belong to the app ecosystem, not to the OS - IMHO the OS is about hardware and drivers, and what kind of graphics is supported by your terminal and source file editor is orthogonal to the OS and could be done in any of the current OSes; but that would require a rewrite/redesign/reimagining of the whole standard application package, which seems a much larger project than "merely" an OS.


"There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy"

An OS facilitates communication between programs running on a computer. Unix lets those programs communicate by sending characters of text to each other. You could just as easily imagine an OS that lets them communicate by sending images, audio/video, 3d models, etc. An OS can be way more than what you think it is. To detox your brain from this Unix worldview, spend some time in a VM and play around with AmigaOS or Open Genera. Those were actual coherent OSes with an actual view of what a computer should be and how it should behave. Unix isn't.

> reimagining of the whole standard application package which seems a much larger project than "merely" an OS.

By OS, I don't mean kernel. I mean the base set of software that lets you interact with your computer and do interesting stuff with it.


The line between app platform and OS is a blurry one. The Amiga OS, for instance, has libraries for specific file types that expose standardized entry points. This way, if you install the library for Photoshop files, all graphics programs that adhere to that protocol will be able to read and write Photoshop PSD files. Microsoft had DDE and, later, OLE, for embedding objects from one program into data from another in a standard way all programs were supposed to share. It was a pain.

This blurry line is present in other environments as well. In the Apple Lisa, installing a program resulted in new templates in the Stationery folder. In Smalltalk, installing a program adds its class definitions to the system as independent entities you could use in your own programs.

Not all operating systems are the children of Unix and VMS.


Smalltalk's components were so tightly interdependent that their integration smoke test was 4+3, because evaluating a simple addition expression exercised like 3/4 of the entire system.


I remember once crashing Squeak by doing

`True := False`

Should crash any vintage ST/80 workstation.


The downside of rich terminal output is that media formats become the system's responsibility. Applications can't output media in formats that aren't provided by the system, because then the terminal wouldn't know how to display it, and interop with other applications (e.g. piping) wouldn't work either.


You could let a program create an API for manipulating a new type of data and inform the system about it so that other programs could use it. This is more or less what AmigaOS did; you installed a datatype for e.g. a PSD file, then all your programs that worked with images could read PSD files. I think it's a nice idea.


Notebook interfaces are basically that, e.g. Jupyter or Mathematica.


This isn't a graphical problem. All that's required is storing arrays of validation data and a diff tool to check for mismatches. Visualizing the results is useful for failure analysis but not a core requirement. That can readily be done with free tools like matplotlib. We live in that world today.


Some terminals can.

https://iterm2.com/documentation-images.html

That's iterm's own implementation. There's also sixel, as pointed out by another comment.


> Imagine if we had terminals that could handle graphical data.

We have. They are called "browsers". You might be even using one right now!


Maybe you would like to support https://ctx.graphics/


Nice! I became obsessed with rendering sparkline representations of chunks of audio for the same reason: to inspect failures when writing tests / refactoring. I wrote a JUCE module (C++) and integration with lldb to make it quick to inspect chunks of audio in the IDE: https://github.com/sudara/melatonin_audio_sparklines


Once you've got the waveforms as arrays, what do you need the ASCII rendering for?

Instead of diffing ASCII-rendered waveforms, save the arrays and diff the arrays (and then use any kind of numerical metric on the residual). Scientist programmers have all sorts of techniques for testing and debugging software that processes sampled signals.
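For instance (an illustrative sketch, not the article's approach): a tolerance-based comparison over the raw arrays, reporting the worst deviation on failure.

    import numpy as np

    def assert_close(output, reference, tol=1e-4):
        output, reference = np.asarray(output), np.asarray(reference)
        worst = float(np.max(np.abs(output - reference)))
        assert worst <= tol, f"max deviation {worst:.6g} exceeds tolerance {tol}"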


It's usually gonna be easier to tell what went wrong in an ASCII string array than in a raw float[]. It's for the human reading/fixing the test.


Well, you don't read the residual, you plot it. The ASCII plots have limited rendering resolution compared to a proper plotting system, like where you can zoom in and stuff.


If we go beyond ASCII, Unicode has specified 2x2 mosaics from the start (they were present in DEC terminals) and, since version 13, 2x3 mosaics (from Teletext and the TRS-80). Some more enlightened terminals (such as VTE) implement those symbols without the need for font support.

Or you can use Braille to get 2x4 mosaics, but they usually look terrible.


I just might have to try this next.


For audio you might also consider U+2581–U+2588, LOWER ONE EIGHTH BLOCK through FULL BLOCK. And then if you really want to go all out there are sixels, but that’s basically just an image; you probably lose the easy ability to compare them. On the other hand they’re not available in every terminal.
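A quick Python sketch of that block-character idea (mapping sample magnitudes onto the eight block heights; names are illustrative):

    BLOCKS = "▁▂▃▄▅▆▇█"  # U+2581 .. U+2588

    def sparkline(samples):
        # Map |sample| in [0, 1) onto the eight block heights, clamping at full block.
        return "".join(BLOCKS[min(int(abs(s) * len(BLOCKS)), len(BLOCKS) - 1)]
                       for s in samples)

    print(sparkline([0.1, 0.5, 0.9, 0.5, 0.1]))  # ▁▅█▅▁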


These blocks are great if you need more vertical than horizontal resolution. There are also newer symbols in the Unicode 13 spec that are just the line without the fill.


Oooh, I didn’t know about those! I’m going to go look them up.


There are more details here: https://en.wikipedia.org/wiki/Symbols_for_Legacy_Computing

The same group wants to include a couple others, from different platforms, but the Unicode Script Ad Hoc Group is concerned the new batch may not be as meaningful as the first one.


This is such a great idea! I've really struggled with how to test real-time audio code in the live looper I've been working on [0]. Most of my tests use either very small, hand-constructed arrays, or arrays generated by some function.

This is both tedious and makes it very hard to debug test failures (especially with cases like crossfades, pan laws, and looping). I love the idea of having a visual representation that lets me see what's going wrong in the test output, and I'm definitely going to try to implement some similar tests.

I'm also curious what the state-of-the-art is for these sorts of tests. Does anyone have insight into what, e.g., Ableton's test suite looks like?

[0] http://github.com/mwylde/loopers


> I'm also curious what the state-of-the-art is for these sorts of tests. Does anyone have insight into what, e.g., Ableton's test suite looks like?

I don't know, but if I were to make an educated guess, maybe rendering stuff to actual audio files is a common approach? That way when something goes wrong, they can inspect it in a standard waveform editor?


That's so cool, and reminds me of how I used Gnuplot as a makeshift oscilloscope to test and evaluate some (not real time) software synthesis I was doing.


I was inspired by your work to do a JUCE implementation: https://github.com/FigBug/Gin/commit/30aa84130f4f607bdeba538...

I think the most useful thing for me is I can call it from lldb and immediately dump buffers to my terminal while debugging.


Why would you use ascii for something like a waveform, something that's inherently a graph?

Sure, maybe you don't need that much resolution for what the use case is. But it's the equivalent of looking at a graph and squinting your eyes to blur it.


In short, because text is much easier to deal with than bitmaps, and there is much more tooling that "just works" for text than for actual graphics, like Expecto's textual diffing in assertions. @MayeulC said it well: https://news.ycombinator.com/item?id=28856884


Would love to use it as a library! Is it open source?


Not yet, but it certainly could be. Would it be useful to publish the helper classes that render the waves out to ASCII? That's really the guts of the thing. After that, you just use whatever testing framework you want to do the actual diffing (in my case Expecto for F#).


I've added an fssnip for the ASCII renderer. It uses NAudio. Should be pretty easy to use. http://www.fssnip.net/85g


Am I the only one almost offended by Braille not being ASCII?

edit: Yes. I miscalculated the dot density.

/me slaps forehead


Aren't those asterisks?


Oh... The shame...

Yes. I miscalculated the dot density. :-(


This is great. People are doing very cool things with F# these days.


Thanks, I like to think so! I didn't see other people doing much audio programming in F#, so I figured someone would be interested in seeing what it can look like.



That looks right up my alley, thanks for the link!



