What strips vocals from audio when a 1/8" audio jack is partially unplugged? (electronics.stackexchange.com)
183 points by taylorbuley on Oct 3, 2012 | 44 comments



I think everyone knows that there's only one correct answer and that's phase cancellation because the dry vocal track is typically dead center in the mix.

Here it happens because when the earphones' ground connection is disconnected from the source, the earpiece drivers remain connected to the L and R hot signals while their former common grounding point floats, leaving the two drivers in series. This means that any signal present equally on L and R results in no current through the transducers. You hear only the difference signal in both ears, although to be precise, one ear gets it in opposite polarity.
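
To make the arithmetic concrete, here's a rough numpy sketch of that series connection (the two test tones and the 32 ohm driver impedance are made-up illustration values, not anything from the thread):

    import numpy as np

    fs = 44100
    t = np.arange(fs) / fs

    vocal = np.sin(2 * np.pi * 440 * t)   # panned dead center: identical in L and R
    guitar = np.sin(2 * np.pi * 330 * t)  # panned hard left

    L = vocal + guitar
    R = vocal

    Z = 32.0               # nominal driver impedance, ohms
    i = (L - R) / (2 * Z)  # current through the two series drivers once ground floats

    left_cone = i * Z      # voltage across the left driver: +(L - R) / 2
    right_cone = -i * Z    # the right driver sees the same signal, inverted

    assert np.allclose(left_cone, guitar / 2)       # only the difference survives
    assert np.allclose(left_cone + right_cone, 0)   # equal and opposite in each ear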


Why repeat exactly what's in the linked page? Probably half the people reading that Stack Exchange question from HN have EE degrees, so the answer was obvious to most, but I don't see the point of just copying the answer over to HN.


I didn't think anyone had covered the small leap from "disconnected ground" to "you get a difference signal" as presented in other comments. Glad to hear there are a lot of EEs here!


As a non-EE, something that confused me briefly is that "L-R" doesn't seem symmetric but the scenario is symmetric. But I see now that you'd hear L-R in your left ear and R-L in your right ear. The R current goes the "wrong way" through the left speaker and causes the coil to move opposite the way it does in the right speaker (and vv).


Discovering this trick as a kid was one of the many steps on the path to my fascination with all things acoustic. The widening effect of having the two earpieces out of phase was particularly intriguing to my young mind.


Is "phase cancellation" a general term for situations where two signals cancel out each other?

(I'm rusty at this, but when I read 'phase', I think of a time difference between two signals. In this case, there's no time difference.)


The answer from the link said 'in-phase cancellation', which isn't really a 'thing', nor is it accurate. He was just trying to say that when the signals are in phase you wouldn't get sound. There isn't actually any cancellation going on here: when the signal is the same in both channels there is just no voltage across the speaker, so it doesn't do anything.


Their signals are cancelling out because they are in phase; this is a thing. It describes what is happening and why. You are arguing semantics more than a technical point here, from my perspective. I agree with the physical reason; just because it is simple does not mean you cannot have a term for it in this case.


I am in a way arguing semantics, because I believe it's confusing for non-EE people reading this to use a term like 'phase cancellation' in a way that is very different from its usage in almost every other context. In general it refers to the sum of two waves equaling zero (or some reduced value), i.e. destructive interference. In this case the opposite is happening: the vocals are removed not due to a sum of the waves but rather a difference equating to zero.

Side note - welcome to Hacker News! Assuming you're the same kortuk from the EE Stack Exchange, I believe we've interacted a few times. My account there is: http://electronics.stackexchange.com/users/1438/mark


Yes, we have interacted there a few times. I would probably use the term "common-mode cancellation." But it is a very basic concept in EE and the other terminology makes sense to me, although for non-EEs I could see the issue.

You should list Hacker News as your website on EE! Your account looks somewhat bare there, add some information!


I think in this case it's phase inversion of one of the signals as opposed to a phase shift which would imply a time difference.


There is no phase inversion here. When the signals are the same there is just no voltage across the speaker. Think of it as putting 5 volts on both sides of the speaker: the voltage difference from one side of the speaker to the other is 0 volts, so no current flows and the speaker is at rest.


In audio engineering, yes. A null symbol on an analog mixer will be referred to as 'phase reversal'.


What anigbrowl refers to is the "crossed-out circle" symbol: ∅ http://en.wikipedia.org/wiki/%C3%98_(disambiguation)


> ... the dry vocal track is typically dead center in the mix.

What do you mean by 'dead center in the mix'?


Stereo audio has completely separate audio signals for the left and right channels. One of the reasons to use separate channels is to allow the creation of a "sound stage". If a track is well mastered/mixed and you have good speakers/headphones, you should be able to pick out each performer's location as if they were on a stage in front of you. For example, you should be able to discern where the lead guitarist is standing vs where the bass guitarist is standing. This is done by controlling the volume and phase of the sound in each channel for each performer.

Generally the vocalist is placed dead center of the "sound stage", and the way to achieve that effect with stereo audio is to feed the exact same signal (in both phase and volume) into both the left and right channels. The 'mix' is just a term for the way in which the various recordings of each performer are combined to create the final product.
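
To illustrate, here's a toy numpy sketch of that center placement (the constant-power pan law and the test tone are just illustrative assumptions):

    import numpy as np

    fs = 44100
    t = np.arange(fs) / fs
    vocal = np.sin(2 * np.pi * 220 * t)

    def pan(mono, position):
        # position: -1.0 = hard left, 0.0 = dead center, +1.0 = hard right
        theta = (position + 1) * np.pi / 4   # constant-power pan law
        return np.cos(theta) * mono, np.sin(theta) * mono

    L, R = pan(vocal, 0.0)    # centered: identical signal in both channels
    assert np.allclose(L, R)  # which is exactly what a difference signal removes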

There are a lot of fun things you can do acoustically with stereo sound. Some phasing effects can actually be pretty 'trippy', for lack of a better word.


Interesting. Any comment on how they get the "front of stage" and "back of stage" effect? I used to not believe this was possible, until I listened to a good recording on good speakers and could place the bass player clearly to the left and behind the singer.


Some depth can be modeled with phase control. By controlling the phase of a signal compared to another you can create a perceived time delay, which makes it appear as though one performer is behind another. I'd also add that this is much easier to do with low frequencies (bass player) due to the wavelength being so long. At 49 Hz (G1 on a bass guitar) the acoustical wavelength is ~6.9 meters, which means even small phase shifts (time delays) can create meaningful depth. This approach is pretty useless at higher frequencies, as the time delay (and depth) achievable gets very small.
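
A rough numpy sketch of that arithmetic (the 1.5 m offset is an arbitrary illustrative choice):

    import numpy as np

    c = 343.0           # speed of sound, m/s
    f = 49.0            # G1 on a bass guitar
    wavelength = c / f  # ~7 m, so a few ms of delay is only a small phase shift

    fs = 44100
    distance = 1.5                         # push the bass ~1.5 m "behind" the singer
    delay = int(round(distance / c * fs))  # ~193 samples, ~4.4 ms

    t = np.arange(fs) / fs
    bass = np.sin(2 * np.pi * f * t)

    L = bass                                              # direct channel
    R = np.concatenate([np.zeros(delay), bass[:-delay]])  # delayed copy in the other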

There are other ways to do this via more complex 3D acoustical modeling. Mostly focusing on modeling of reverberation effects but you don't see that much in music recording. It is used a lot in games though.


In addition to what mbell said, another simple approach for making one sound seem further away than another is changing the balance of the direct vs. reverberant sound (more distant == more reverberant sound, lower direct sound volume). This would happen naturally in a stereo recording with two microphones, and can also be done artificially.
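
A toy numpy sketch of that wet/dry balance, with a crude exponential-decay noise burst standing in for a real room impulse response:

    import numpy as np

    fs = 44100
    rng = np.random.default_rng(0)

    # Fake "room": a noise burst with roughly a one-second exponential decay.
    ir = rng.standard_normal(fs) * np.exp(-4.0 * np.arange(fs) / fs)
    ir /= np.abs(ir).sum()

    def place(dry, distance):
        # distance in 0..1: 0 = up close (mostly dry), 1 = far away (mostly reverb)
        wet = np.convolve(dry, ir)[: len(dry)]
        return (1.0 - distance) * dry + distance * wet

    t = np.arange(fs) / fs
    voice = np.sin(2 * np.pi * 220 * t)
    near = place(voice, 0.1)  # loud direct sound, little room: seems close
    far = place(voice, 0.7)   # mostly reflections, quieter direct sound: seems distant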


Predelay time on the input to the reverb. The longer the gap between the direct and reflected sound, the closer to you the direct sound will seem to be.



Equally loud on both the left and right tracks of a stereo recording.


Dan discussed the same topic recently in his usual whimsical and thorough fashion:

http://www.howtospotapsychopath.com/2012/05/16/the-music-goe...


It's not just the vocals you're losing; you're losing everything in the center of the mix. Typically that means kick drums, deep basses and anything else lower than about 120 Hz (these pieces are always mixed in mono), plus vocals, and a lot of times snare drums.


So-called phase cancellation: often vocals are positioned in the middle of the panorama, so there's no difference between the content of the left and right channels of the vocal tracks. That's the popular way to create instrumental and a cappella versions for mashups.


Another way is to invert the phase of an instrumental version of the track (or the actual track) and mix them together.
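
A minimal numpy sketch of that trick, assuming the two files are sample-aligned and identically mastered (which, as the replies below note, is rarely true in practice):

    import numpy as np

    def isolate_vocals(full_mix, instrumental):
        # Flip the instrumental's polarity and sum: everything identical cancels.
        n = min(len(full_mix), len(instrumental))
        return full_mix[:n] - instrumental[:n]

    # With synthetic, perfectly aligned tracks the vocal falls right out:
    fs = 44100
    t = np.arange(fs) / fs
    vocal = np.sin(2 * np.pi * 440 * t)
    band = np.sin(2 * np.pi * 110 * t)
    assert np.allclose(isolate_vocals(band + vocal, band), vocal)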


I tried doing that once; it didn't work as well as I'd hoped (ie: barely worked at all).

I'm assuming it's because the vocal track had to be removed pre-master, so the instrumental version would then be mastered differently. Also, the original track was a high-quality CD rip, whereas the instrumental was a downloaded mp3, so that would have had an adverse effect.


It depends on a few things...

-As mentioned earlier, how the mp3 was prepared, whether split or joint stereo encoding.

-The source of the instrumental version. If the source is vinyl, all bets are off. If the source is a proper "Instrumental Version", it would have been mastered from an identical mix but without vocals and vocal effects (ex. reverb), and with the exact same settings as the original master. That is how major labels typically do it: if it's apparent that the album will be popular, they'll usually have an instrumental mix (and master) created at the same time. This is done because the label wants to have instrumental versions for later potential use in films, TV shows, award ceremonies, commercials, etc.

It's much more expensive to go back and "recall" settings in order to "mix down" instrumental versions. This usually requires hiring the original mix engineer or one of their assistants and renting the studio at which it was mixed. Many pieces of "analog" mixing gear are not "stepped" with repeatable settings. Analog gear also has differences between individual pieces of equipment, even of the same model. Equipment breaks, is sold or removed, and this makes the task of a recall that much more difficult. Recalling mastering settings isn't nearly as daunting, as mastering studios typically retain all of their equipment and keep it in working order.

All of this means that good documentation is required, similar to the software development process. For many of your all-time favorite albums, there exist binders full of charts, diagrams, and notes detailing microphone types and positioning (in the room and relative to the instrument/speaker), channel assignments, mixing equipment settings, the type/speed/bias/etc. of the tape, lyrics and notation, even the types and tuning of instruments. It's a very complex process to document and recreate.


That's because most of the time mp3 encoding will introduce subtle changes and differences between the channels. It's the result of the chosen encoding method (joint, stereo, forced stereo) as well as the specifics of perceptual coding itself. The best way to create such material is to rip the necessary tracks from Dolby surround DVDs.


If you have a CD player (last-century tech) with a karaoke button, that's what happens when you press it: the two channels are merged.


Not merged, but subtracted; in effect, taking one side out of phase and adding that. If you just merge them, common components (sounds in the middle) double, rather than canceling out.


Could you use this to put an obfuscated message in a stereo recording... only audible when in this half-plugged-in difference-mode?


I'm not positive about this, but what if you take white noise (or pink noise, to keep the frequency distortion to a minimum) and then add your hidden message to the left and right channels? Split it so that alternating frequency bands go to alternating channels (L, R, L, R, etc.) to keep it roughly even, and make sure the noise is on both channels equally (centered). The noise would drown out the message generally, and the message would only "come out" when the center was subtracted.
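
A toy numpy sketch of a simplified variant of this idea: instead of band-splitting, the message here just rides anti-phase under centered masking noise, so it vanishes in a mono fold-down and reappears in the difference signal. Purely illustrative:

    import numpy as np

    fs = 44100
    rng = np.random.default_rng(42)

    t = np.arange(fs) / fs
    message = 0.05 * np.sin(2 * np.pi * 600 * t)  # quiet hidden tone
    noise = 0.5 * rng.standard_normal(fs)         # centered masker, same in L and R

    L = noise + message   # message in phase on the left...
    R = noise - message   # ...and inverted on the right

    mono = (L + R) / 2    # normal fold-down: the message vanishes
    diff = (L - R) / 2    # "half-unplugged" mode: the message comes back

    assert np.allclose(mono, noise)
    assert np.allclose(diff, message)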


Yes, but with several caveats. You could put it in a frequency band unused by the surrounding audio but masked by the center - but this would show up very distinctly on spectrograms or phase analysis. Likewise, one could modulate the panning of a particular frequency band across the left and right, but that would also be quite noticeable on visual analysis.

You could do wideband modulation that would be much harder to detect, but only if your music is repetitive...such that you subtract, say, the instrumental content of the first verse from that of the second to reveal your hidden message. That's not so difficult, given the ubiquity of loop-based and electronic music nowadays.

One other issue with all these schemes is that the more well-hidden your steganographic message is, the more likely it is to be corrupted or destroyed by conversion to lossy data formats such as mp3. If you absolutely had to hide a message in musical form, I think it would be better to do it in the manner of the purloined letter, ie hiding it in plain sight. Encode it into the rhythm of the cymbals, or the sequence of pitch differentials in the primary melody or some other foregrounded musical element. You can use binary strings or a sequence of integers as musical elements very easily, and said sequences can be pre-encrypted. Sure, the signal is even more visible in that context, but who's to say whether it's covert or merely artistic?


I don't think so; it would still be audible when the signal that L and R have in common is transmitted. Maybe if it's put at a low enough volume?

I'm just guessing here, though. Definitely not an expert.


You can get a similar effect, but in a different manner, when attempting to plug a CD or MP3 player into certain mixing desks. The obvious approach, when you have a 3.5mm (1/8") stereo jack from the device, is to plug it into one of the 1/4" line input sockets with a suitable stereo adaptor.

In most cases with older desks, this will be fine (you'll only get the left channel from the source). Some newer/smaller desks have a balanced jack input instead of or as well as a balanced XLR, and plugging in an audio source will get exactly the same effect as described in the thread, since the left and right channels will be treated as a balanced pair and subtracted.

The cheapest approach to connect such a device to a mixer is a 3.5mm to 2x phono cable and a pair of phono to 1/4" jack adaptors: connect to two channels on the mixer, and pan them fully left and right.

(Never really thought about it before, but here in the UK it seems perfectly normal to refer to 1/4" and 3.5mm plugs at the same time)


Interesting to note that mp3 compression does not keep the L/R channels perfectly independent. Since this means that on playback the L/R fidelity is not 100%, you will have better results creating a difference signal from a lossless/uncompressed file than from an mp3.


Doesn't that depend on what mode you use? Joint Stereo behaves as you describe, and is the common mode, but regular stereo mode encodes the two channels independently.


Differences in the signal between the channels will make the encoder throw away different parts of the audio in each channel. This produces data loss in both the 'mono' portion of the signal and the 'stereo' part of the signal. So the 'mono' part of the signal (the vocal) ends up being different between the tracks, since data loss is applied across the whole frequency range (or large chunks of it). Note that Joint Stereo has the same problem, because the 'joint stereo-ness' only applies to the top part of the frequency spectrum.


Interesting point. Just to make sure I understand you correctly, you're saying that even in independent stereo mode, the fact that the two channels have a limited maximum total bitrate means that the encoder can and will trade off bits between the two channels in order to account for one channel or the other needing more data at any given time?

Seems like you could easily create an encoder that didn't do that, and just e.g. encoded each channel at 64kbps when a total of 128kbps is requested, but maybe none of the real-world encoders actually do this.


No, I think it's that even if each channel gets the same bitrate, if there's more "going on" in one channel than the other, there may be fewer bits to go around and that channel may have more qualitative loss. So if they both have the vocals equally, but one also has bass drum, the one with the bass drum may lose more fidelity on the vocals than the channel with vocals alone.


Excellent point! I hadn't thought of that. It strikes me, then, that Joint Stereo would actually be better in this case, as it effectively encodes the mono signal and the L/R difference, which is what you're after anyway for this purpose.


I don't understand what you are getting at. Typically MP3s use mid/side encoding, where the difference between the channels is computed before the compression step.
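
For anyone unfamiliar, mid/side is just a sum/difference transform; a minimal numpy sketch:

    import numpy as np

    def ms_encode(L, R):
        return (L + R) / 2, (L - R) / 2  # mid (the shared center), side (the difference)

    def ms_decode(M, S):
        return M + S, M - S              # exactly recovers left and right

    L = np.array([1.0, 0.5, -0.25])
    R = np.array([1.0, 0.3, -0.25])
    M, S = ms_encode(L, R)
    L2, R2 = ms_decode(M, S)
    assert np.allclose(L, L2) and np.allclose(R, R2)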


Then I guess this won't affect The Beatles' "Eleanor Rigby" (stereo variant).



