Mics that record in 3D ambisonics are the next big thing (cdm.link)
186 points by glitcher on Sept 19, 2018 | 67 comments



Ambisonics is a fascinating technology. It's basically the same concept as differential stereo encoding (where you record an R+L and R-L channel and use them to derive R and L, or just play the R+L channel for mono) extended to all three axes to create surround sound (so you have a sum channel, a horizontal difference channel, a vertical difference channel, and a depth difference channel). This was all developed in the 70s (and is thus out of patent today) but was abandoned in favor of more direct means of encoding surround, since processing the signals was complex for not much gain. Of course, now with DSPs the signal processing is much easier, and with VR there's suddenly a niche for it to fill, since it fully preserves the 3D soundscape (unlike e.g. 7.1 surround, which only records 7 point sources at fixed positions).
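To make the sum/difference parallel concrete, here's a minimal sketch of first-order encoding in Python, assuming the traditional FuMa convention (where W carries the "sum" signal attenuated by 1/sqrt(2)); the function name and parameters are just illustrative:

    import numpy as np

    def encode_first_order(mono, azimuth, elevation):
        """Pan a mono signal into B-format (W, X, Y, Z). Angles in radians."""
        w = mono / np.sqrt(2.0)                          # omnidirectional "sum" channel
        x = mono * np.cos(azimuth) * np.cos(elevation)   # front-back difference
        y = mono * np.sin(azimuth) * np.cos(elevation)   # left-right difference
        z = mono * np.sin(elevation)                     # up-down difference
        return np.stack([w, x, y, z])

    # e.g. one second of a 440 Hz tone placed 90 degrees to the left, at ear level:
    sr = 48000
    tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
    bformat = encode_first_order(tone, np.pi / 2, 0.0)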

On a side note, the upcoming 1.3 release of the Opus codec is adding support for Ambisonics-encoded surround sound.


Amazed to see it's still going; I haven't heard it mentioned in decades. It was going to be the next big thing around the same time that quadraphonics was failing to take off. Obviously that didn't happen. :)

It was developed by the long-defunct UK National Research Development Corporation that also brought us carbon fibre and the hovercraft.


When I studied recording in college in the early 1990s, I got to play with a Calrec Soundfield ambisonic mic. It was astoundingly cool. We were working with all sorts of other stereo mic techniques (XY, Blumlein, etc), and it just smoked them all for "realness".


Oh Calrec <3. I used to work as an audio engineer and assisted on a lot of orchestral recordings in concert halls. Among many others, we often used a Calrec stereo mic, and it was just such a great-sounding mic.


This would only be 3-axis sound, right? I don't think you can get full 6-axis reproduction with this, can you?


What do you mean 6-axis? We're talking about directional pressure at a point, so 3 dimensions is sufficient.


6-axis as in not just directional, but positional. You can both be looking towards or away from a source, as well as looking forward but moving your listening position in such a way that the source is either on the left or right side of the listener. In full roomscale VR you'll ideally want to handle positional tracking as well.


You can figure out the shape of the wavefront if you are stubborn enough.


The Opus ambisonics stuff is cool. I think it would be interesting to use ambisonics for emitters in 3D environments, but I have no idea how computationally complex this would need to be before it produced good effects.

Inverse ambisonic recording (i.e. measuring emitted sound from outside the focal point, rather than attempting to record from the focal point) is considerably easier than real ambisonic recording.


I actually built two of my own 1st-order ambisonic mics and the decoder plugins about 6 years ago. I shelved the project since I couldn't (and still cannot) afford a playback system beyond stereo :(

It was pretty cool to rotate the soundfield through headphone playback to have the recorded audio whizzing around your head :)
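The rotation itself is tiny: yawing a first-order soundfield only mixes the two horizontal components. A sketch (sign conventions differ between toolchains, so treat this as illustrative):

    import numpy as np

    def rotate_yaw(w, x, y, z, theta):
        """Rotate the soundfield theta radians about the vertical axis."""
        x2 = x * np.cos(theta) - y * np.sin(theta)
        y2 = x * np.sin(theta) + y * np.cos(theta)
        return w, x2, y2, z   # W (pressure) and Z (height) are untouched

Sweep theta over time, or drive it from a head tracker, and recorded sources whizz around exactly as described.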

I never thought at the time that VR would become popular again and provide a good use case for this technique.



Been there, done that. Although mine is not finished yet :-) https://imgur.com/a/myBFYgU


haha, I built an almost identical tetrahedron for my first prototype with tiny capsules :)

for the larger capsules I had to use more parts for the frame (small pieces of bent brass) to correctly mount the capsules.


Audio professional here. Ambisonic recording is great, though it does have limitations because you're baking in spatial information in a way that will limit editing later - it's not a magic bullet for location audio, although it will be a valuable supplement.

If you're interested in getting into this, you can feel good about the Zoom product. They deliver outstanding value for money. I've used them on several feature films, first as backup recorders but later as the primary audio capture platform.


Audio professional here also! I too have used Zooms as backup and as a primary mixer. This comment is on point. I'd like to add that close miking and multitrack recording are never going away. Given that, it would be nice to have high-quality ambisonic microphones with inline DSPs, and post-production suites designed to handle the new spatial awareness. There needs to be a standardization of spatial audio information developed for cross-compatibility, and a new UI design for the channel strips in DAWs to ease workflow. I'm envisioning that instead of a pan knob, a trackball or trackpad would suffice. Diving deeper, there could be programmable per-channel parameters to set room size, absorption, and reflection rates.


I think this is already happening in games. With ray tracing, reflections are calculated from the player's viewpoint. The audio processor then uses this information and outputs 3D sound.

I think it will be interesting to see how this develops now that Nvidia has 'affordable' cards with dedicated ray tracing.

Keywords: spatial sound rendering


If it's not off topic, what's your take, as a professional, on the value of a single chip that can record/play back several hundred channels? Is this a readily available piece of kit, or is it not worth having?

I ask because, as part of a student project, we've demonstrated a delta-sigma DAC on an FPGA with very good dynamic range. We should be able to put a lot of these onto one FPGA. It might also be extended to an ADC.


As an audio professional, I think you'll find some interest, but it will be limited. Massive channel counts can be useful in beamforming cases (e.g. the Shure MXA910 ceiling array panel) or spatialization/impulse-representation cases (e.g. Meyer Constellation).

I do think that consumer mic-array representation could be valuable eventually, but you're going to need to beat the cost of ADCs and simple multiplexed front ends. An FPGA is an expensive way to do that at volume (think ASIC).

Consider per-channel gain stages for dynamic range enhancement...


Thanks for the response. Good point on multiplexing, as fast high resolution single/dual channel ADCs are now cheap. It will be interesting to look at how much the multiplexing costs relative to FPGA (which can be quite cheap). At this stage the circuit we have is mostly a curiosity, but it will be interesting to experiment with it.


> Ambisonic recording is great, though it does have limitations because you're baking in spatial information in a way that will limit editing later

That's not true, as long as:

1. You record with enough near-field mics

2. Your spatial resolution is fine enough to fully encompass the person's sound

As long as you account for both (or choose a hardware platform correctly), audio processing is a simple modification within that bubble.

The problem is tracking the voice as it moves around the room and correctly mutating it with the right filters, while avoiding modifying data you don't want. My guess is there will soon be an echo sound test to triangulate the geometry of the walls. With effective calibration and base audiographs, much more can be done.

Of course, then it's 30 seconds before someone feeds CMU Sphinx each voice stream and translates it into text in real time.

I've had this workflow going for about 4 years, ever since I found out the Xbox Kinect had 4 near-field mics as well as an IR depth sensor and webcam.

If you want a jumpstart on this, install ROS, install HARK (https://wp.hark.jp/faq/), and go to a used game store or pawn shop and buy an old-style Kinect for $10. You'll need the USB cable, so you might need to go to eBay as well.


That's great supplementary information. I guess I left out a step because I was worrying about the realities of shooting narrative, where people who appear to be sitting opposite (or in some other spatial relationship to) each other on screen may actually be differently oriented, or not even in the same location at the same time, thanks to the magic of editing. My early experiments with ambisonic recording indicated that if sound sources weren't positioned consistently during shooting, the editing and mixing turned out to be nightmarish. So I stuck with using surround recording to get nice room tones, and prioritized the mono recordings of the actors' performances for all but the simplest scenes.

Your workflow sounds very interesting and at this low cost I definitely want to give it a whirl, thank you!


Gladly!

One thing to keep in mind is that when you plug in the Kinect, you can get the IR and webcam data trivially. However, when you run the appropriate alsa commands to look for the audio device, it will be markedly not present. You need to load the appropriate firmware with this tool ( https://manpages.debian.org/stretch/kinect-audio-setup/kinec... ).

Once you load the firmware, you'll see the inputs and outputs as you should. I forget the exact procedure, but I know you have to download the driver and strip the firmware.


With deconvolution of an impulse-response sweep you can get the spatial data in the form of an echo. You should get better data with a sweep than with a single pulse, due to the general uncertainty principle.
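For the curious, a minimal sketch of the exponential-sine-sweep technique (after Farina); the parameters are illustrative, and 'recorded' stands in for the mic capture of the sweep played back in the room:

    import numpy as np

    sr, T, f1, f2 = 48000, 10.0, 20.0, 20000.0   # rate, sweep length, freq range
    t = np.arange(int(sr * T)) / sr
    R = np.log(f2 / f1)
    sweep = np.sin(2 * np.pi * f1 * T / R * (np.exp(t * R / T) - 1))

    # Inverse filter: time-reversed sweep with an exponential amplitude tilt,
    # so that (sweep convolved with inverse) approximates a bandlimited impulse.
    inverse = sweep[::-1] * np.exp(-t * R / T)

    # 'recorded' would be the mic capture; the impulse response falls out of
    # the linear convolution:
    # ir = np.convolve(recorded, inverse)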


Question: did you ever look into the work of Joseph Pompei over at Holosonics? His intent was to confine 3D audio into a narrow cylinder, virtually inaudible outside the cylinder. From the site:

As the ultrasonic beam travels through the air, the inherent properties of the air cause the ultrasound to change shape in a predictable way. This gives rise to frequency components in the audible band, which can be accurately predicted, and therefore precisely controlled. By generating the correct ultrasonic signal, we can create, within the air itself, any sound desired.

- presumably with any directivity desired.

https://www.holosonics.com/what-makes-a-sound-source-directi...

If practical, it could keep the noise down in gaming and home theater rooms. Plus, if the cylinder were scaled up for auditoriums, it could give everyone in the audience front-row-center seats (at least acoustically).


The sound they make is functional for advertising, deterrence, or making people think they're hearing things, but the quality is currently far too low (IMHO) for entertainment purposes.

Here's an example - https://www.youtube.com/watch?v=-p42IRDaKNc

There used to be a company selling speakers called Hypersonic or something; they were equally directional but sounded terrible.


I did and it looks really interesting, but I've never evaluated any of it myself, and I haven't worked (or bought much gear) in this space for a few years.


Agreed, and some of the new mic-modeling, er, microphones use this tech along with software models of classic/expensive/fragile/sought-after microphones, to varying degrees of success. For someone like me with a limited budget and a semi-pro project studio it's a pretty good trade-off. That said, they are right at the price point where I can get a pretty good modern mic or a 90% emulation of 20 mics.


A cheap electret microphone with a good signal chain and some DSP can easily compete with expensive condenser microphones. Condenser mics these days are used for the same reason vacuum tubes are used.


Agreed. I feel like a decent line amplifier gets you 95% of the way there. Even a consumer-grade electret mic gets you 97% there, and reasonable ADCs get you to 98%. The other 2% costs five figures plus...


Marvel's new Wolverine podcast/radio play is recorded in ambisonic sound [1] if anyone wants an easily accessible example.

Per the article:

The ambisonic mic necessitated a different method of recording the show. Instead of the standard "one-person, one-mic" studio approach, the actors recorded simultaneously, in the same room. The approach allowed for more interaction between actors, more like staging a play than recording an audiobook.

1 - https://www.theverge.com/2018/5/30/17409704/wolverine-the-lo...


It sounds great! Trailer at: https://www.wolverinepodcast.com/


I wonder how people play back those 3D recorded sounds?

If you want a 3D audio experience, it's possible to do that with just two microphones and a dummy head model [1]. Then you just play back the recording with earbuds. It's pretty fascinating that something so simple works, but you can only listen from the position where the recording was taken.

I guess if you record using one of those complicated mic arrays, it may be possible to simulate the effect of your pinnae in software, allowing you to move around in a virtual environment, and hear the 3D audio from different points?

[1]: https://en.m.wikipedia.org/wiki/Dummy_head_recording


Yeah, you have to simulate the attenuation from different directions - HRTFs (head related transfer functions)[1] are used to do this. They're already supported by some games, and you have to use them for VR audio, since the user is in control of the camera position and angle.

[1] https://en.wikipedia.org/wiki/Head-related_transfer_function


Right, binaural is very good, but it only works if you keep each ear in the same place. Ambisonics records the sound pressure from every direction, so you can turn your head and still get directional sound. At least, assuming you have several speakers to recreate the sound field reasonably well. If you only have two speakers, the approximation will be kinda rough.


You could do it with headphones if you tracked the head motion and remapped the spatial signal through the HRTF in real time.
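A rough sketch of that pipeline: decode the soundfield to a ring of head-fixed virtual speakers (counter-rotated against the tracked yaw), then convolve each feed with a measured HRIR pair. The HRIR arrays here are placeholders you'd load from a measurement set, the decode is the simplest projection form (gain normalization varies by decoder flavor), and the yaw sign depends on your convention:

    import numpy as np
    from scipy.signal import fftconvolve

    def binauralize(w, x, y, head_yaw, speaker_az, hrirs_l, hrirs_r):
        out_l, out_r = 0.0, 0.0
        for az, hl, hr in zip(speaker_az, hrirs_l, hrirs_r):
            a = az + head_yaw   # world-frame direction of this head-fixed speaker
            feed = w + x * np.cos(a) + y * np.sin(a)   # basic projection decode
            out_l = out_l + fftconvolve(feed, hl)      # left-ear HRIR
            out_r = out_r + fftconvolve(feed, hr)      # right-ear HRIR
        return out_l, out_r

Re-run this per tracker update (with crossfades) and the scene stays fixed in the world as the head turns.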


You set up a lot of speakers and a decent computer. Two decent computers if your room isn't ideal.

>Music Research Centre's Arthur Sykes Rymer Auditorium is equipped with a sixteen-speaker Ambisonic rig. This rig consists of four high speakers, eight horizontal speakers and four speakers below the audience in the air conditioning plenum duct, which was designed to allow for this. The rig is driven by a Firewire audio interface (Focusrite Saffire Pro) which can be accessed from computers positioned in the performance area via Firewire. This rig can do up to third order horizontal with first order height.

https://www.york.ac.uk/inst/mustech/3d_audio/ambisyrk.htm

Here's some more York uni ambisonics stuff - https://www.york.ac.uk/inst/mustech/3d_audio/ambis2.htm


The minimum number of speakers for a horizontal ambisonic setup is 4 but you need at least 6 to localize sound in a full sphere: https://en.wikipedia.org/wiki/Ambisonic_reproduction_systems...

I'm not sure if massive speaker-array systems like BEAST (http://www.beast.bham.ac.uk/about/) use ambisonic diffusion techniques. I think part of the appeal of ambisonics is the relatively small number of speakers you need for full-sphere spatialization.

(It's also interesting to see this subject pop up here. Maybe because Zoom is offering a consumer option now? I've heard good things about the octomic: http://www.core-sound.com/OctoMic/1.php )


Pink Floyd's (or, perhaps more correctly "Roger Waters'") "The Final Cut" made extensive use of the binaural recording technique (or, perhaps, more correctly "holophonics").


The best example on that album was the missile going through the "air" towards impact. A good demo of the technology.


"Pros And Cons Of Hitchhiking" is probably a better example, if not nearly as musically satisfying.


Ambisonic recording is cool, but audio technology in general is full of technobabble and pseudoscience, so I wonder how theoretically grounded these systems are.

Are these companies actually understanding the science and producing well-calibrated systems that can accurately record and reproduce pressure waves in 3D? Or are they just taking a bunch of microphones and gluing them in an aesthetically interesting arrangement, and then playing them back in an ad-hoc way?


It's more engineering than science (especially considering the manifold shortcomings of mic capsules and preamplification electronics), but yes, it's real. I heard early analog systems over 25 years ago, and they were shocking. They were also a bazillion dollars then, with a dedicated box to discombobulate all the phase stuff. The sort of thing that's really easy to do digitally now...


Ambisonics is a fancy name for a fairly basic concept. All the sound at a single point adds up to a scalar value over time, but that captures no sense of direction. The idea of ambisonics is to characterize the sound field over a sphere.

You can imagine that if you captured "pixels" of sound over a sphere the size of your head, then as you turned your head you'd be able to localize sound, because mid and high frequencies have wavelengths smaller than the sphere.

The idea of Ambisonics is to encode this spherical sound signal in the form of its spherical (spatial) Fourier transform. That way, you can drop almost all of the higher coefficients, as a form of lossy compression. In fact, most systems just capture the first-order spherical harmonics, which is enough to do a good job localizing a single sound. Conveniently, you can do this with no computation, using just 4 standard mics in the right configuration.
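To illustrate that last point: the four capsules of a tetrahedral mic (the so-called A-format) combine into B-format with nothing but sums and differences. A sketch; real converters also equalize for capsule spacing, which is omitted here:

    def a_to_b(flu, frd, bld, bru):
        """Capsules: front-left-up, front-right-down, back-left-down, back-right-up."""
        w = flu + frd + bld + bru   # pressure (omni)
        x = flu + frd - bld - bru   # front-back figure-8
        y = flu - frd + bld - bru   # left-right figure-8
        z = flu - frd - bld + bru   # up-down figure-8
        return w, x, y, z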

Spatial discrimination of multiple sound sources will be limited unless you capture higher order coefficients, which requires a much more sophisticated setup.

The second part about Ambisonics is that now that you have this handy spatial signal, you can map it back to an arbitrary speaker setup to approximate the original sound field over that sphere, if you know where the speakers are placed.
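As a sketch of that mapping, the simplest ("projection") decode is just a per-speaker dot product against each speaker's direction. Real decoders add shelf filters, gain normalization, and near-field compensation, all omitted here:

    import numpy as np

    def decode_basic(w, x, y, z, speaker_dirs):
        """speaker_dirs: list of (azimuth, elevation) in radians."""
        feeds = []
        for az, el in speaker_dirs:
            feeds.append(w
                         + x * np.cos(az) * np.cos(el)
                         + y * np.sin(az) * np.cos(el)
                         + z * np.sin(el))
        return feeds

    # e.g. a horizontal square of 4 speakers:
    square = [(np.radians(a), 0.0) for a in (45, 135, 225, 315)]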

In many ways, it's far more scientific than much more common forms of multichannel audio, like stereo or 5.1. But that's kind of the downfall, because it doesn't necessarily map to that many real world use cases. In the real world, we dynamically navigate fields of sound, but Ambisonics is only going to capture sound at a predetermined point in space. If you imagine the edits in a video, the visual point of view is constantly changing. I suspect doing the same with audio wouldn't be as effective, but who knows?


Back in grad school our professor Tom Holman (https://en.wikipedia.org/wiki/Tomlinson_Holman) did some amazing work in spatial sound, specifically for cinema. He chose to double the number of channels from 5.1 to 10.2 and the results were pretty amazing.

If my memory is working correctly, the channels chosen were Left, Left-Center, Center, Right-Center, Right, High-Left, High-Right, Left Surround, Right Surround, Back Surround, and two low-frequency channels. His research indicated that the human ear has better vertical localization towards the front (potentially having evolved to detect tree-dwelling predators, for example), and that experiments with dummy-head recording produced inadequate results for theatrical reproduction.

Sadly it doesn't seem to have caught on (yet), probably due to the expense of having to retrofit cinemas. Anyway, it was really cool to listen to. I imagine the use of ambisonic recording rigs will greatly benefit the 360° video playback experience. Don't know what the other use cases might be yet.


Rather than adding fixed channels, many cinemas moved to Dolby Atmos, which has 128 audio streams with spatial placement metadata. Those are rendered to whatever speaker array is available.

https://www.dolby.com/us/en/technologies/cinema/dolby-atmos....


I have been trying to get professional musicians interested in ambisonics for about 20 years, since I first learnt about it from my brother, and I haven't yet found a single one who is at all interested in experimenting with it.

I have had a few go as far as denying that ambisonics could have any kind of useful application in music at all.

The visual arts crowd seem far less dismissive, weirdly.


This doesn't surprise me. I think most music – recorded sound – generally falls into one of two categories:

• it's actually about the music, not any particular "auditory experience", so technologies like ambisonics – or heck, even stereo for that matter – are perceived as gimmicky (akin to 3D TV), or

• it's about crafting a very particular auditory experience: audio is mixed in the studio so it matches exactly the artist's vision when played back to your two eardrums.

Especially the latter case I think is not served by ambisonics, because now the artist has no control over how the product is consumed. I can only imagine the difficulty involved in preparing a musical recording that sounds like quality art when the listener can – and is expected to – completely change the dynamics merely by tilting their neck. Sure you can deliver the "concert experience" but no-one goes to a concert for the ability to hear the music with their head cocked at a weird angle… so technologies like 3DIO I think already serve this purpose well.

The analogy I would put forth is a movie where the viewer can control the camera angle. (If I recall, this is actually a technology that exists to some degree, and has found application pretty much only in pornography.) Good cinematography carefully controls the viewer's attention and focus through use of set design, camera angles and focus. Once you let the viewer just look anywhere, set design becomes immensely more complicated, and the artist loses the creative control afforded by camera positioning.

Whereas, visual arts is exactly where I would expect ambisonics to find application. Visual arts is all about setting up exploratory experiences, which is impossible with traditional stereo audio recording and fixed video recording. With AR and ambisonics, now the art can exist digitally. I think ambisonics will find application in video game design too, for the same reason: now 3D soundscapes can be recorded, rather than simply synthesized.

Of course there are crossovers… I would be surprised if the ambient music crowd (think Music for Airports) does not take to ambisonics. I'm curious whether your experience differed between musical genres?


They are not hard to DIY: 4 electret capsules like the Primo EM184. But by the time you've built the holder and preamps, you might as well spend $300 on the Zoom H3-VR.

I don't think they have much use outside VR; you basically need head tracking for headphone use (or a lot of speakers).


How different is the H3-VR from the spatial audio of the Zoom H2n?

I’ve got an H2n though mostly I’ve only been using it with line-in to record from my synth now and then, so I haven’t really used the spatial audio feature... so actually come to think of it I’m not quite sure why I am even asking :p Oh yeah, when I originally bought my H2n I did so because I wanted to get into field recording.

In the end I didn't find many interesting things to record: traffic - boring; wind in the leaves in the forest - boring; birds singing - birds don't sing much here; trains - marginally entertaining the first time I recorded a train coming to a stop.

I guess what I should be asking is: What sorts of sounds are you using the H3-VR to capture? And do you find any advantage in using the H3-VR instead of an H2n for said recordings?


Unless you're specifically recording ambisonic audio to go with 360 videos or VR content, I think you're fine with the H2n. The main differences with the H3-VR are the 6-axis gyro for aligning the soundfield correctly with the 360 video, and onboard processing for previewing ambisonic sound with headphones.


I'm just starting to learn the process of capturing and mixing 360 sound. I'm confused, though: is the "only" benefit of buying an ambisonic mic the ease of workflow? Sort of how it's best to use a matched stereo pair for stereo recordings if you want the best result with the least hassle?


What do you all think of Creative Technology's "Super X-Fi"?

http://www.mobilitytechzone.com/lte/news/2018/09/05/8812379....



I think that could be really interesting. Ears/brain are very good at localising sound, but normal headphones don't really work well for this.

Ossic raised $2.7M on Kickstarter to do something similar but failed to deliver; I'm hoping Creative have better engineering and deeper pockets.


I'm wondering if someone can clear up a related question. I've always heard that two ears are needed to identify the direction of a sound. However, I can tell the direction of a sound when one ear is covered up, without moving my head. Try it yourself: put a TV or other sound source behind yourself, positioned a little to the left or right, cover one ear, and you'll see that you can reliably tell whether it's more to the left or more to the right. So I think that something more complicated is going on within the ear, and that a single ear is not strictly non-directional.


This article might help:

https://en.wikipedia.org/wiki/Head-related_transfer_function

Short answer is that yes, the folds of cartilage (pinnae) in your outer ear are there precisely for this reason, to provide some basic directional-sensing functionality with only a single ear.


Localizing a sound isn't just a matter of hearing a phase shift between your ears; you also have some 'knowledge' about how sound moves in a room, how your ears change the sound, and which frequencies will be more prominent given which side of you the sound is made on. The human brain kills it at real-time signal processing.


I read somewhere that the human (and ape) ear has evolved its unique set of grooves and folds to help create sound reflections that allow the brain to better spatially locate sound, even without phase shifting between both ears.

This also allows us to locate sound not only spatially along the axis between the ears, but above and below as well.

I wonder if people using hearing aids, where the sound is recreated further inside the ear, are less able to locate sound sources.


I am deaf in one ear, and I do frequently struggle to determine the direction of sound. For example, if you call my name in a room full of people, or hide my phone and make it ring, I have to rely on non-directional cues to figure it out. (In these cases, recognizing your voice and seeing your face, or walking around and using the volume changes to figure out how far away I am.)


If you'd like to see a full talk about this topic, given by the renowned German microphone company Schoeps Mikrofone, watch this video: https://youtu.be/K-ktLM0dQeA

This high-end manufacturer may have a different perspective on these topics than some other companies. Still, they supply recent high-budget 3D audio productions, in film and in 3D audio sports broadcasts.


Will this be a big thing in ASMR?


Something I don't often hear mentioned is auditory surveillance. An array of highly sensitive 3D microphones with sophisticated processing seems like it could completely destroy most remaining privacy that we have.


My intuition says that this idea, miniaturized to the size of an atom, might bring back some heretofore unknown information about the atomic realm. Like an atomic, multi-directional sound (or any wave in the EM spectrum) transducer, and then bring that data to higher levels... Yes, I know I'm crazy... <g>


I wonder whether these types of microphones are suitable for field recording.


Why not? The world is your stage, now in 3D.


It would be really cool to have music composed for 3D recording.


This is not mathematically necessary at all. This just reads like astroturfing for an overpriced mic retailer.



