
It would be interesting to see how far you could get using deepfakes as a method for video call compression.

Train a model locally ahead of time and upload it to a server; then, whenever you have a call scheduled, the model is downloaded in advance by the other participants.

Now, instead of having to send video data, you only have to send a representation of the facial movements so that the recipients can render it on their end. When the tech is a little further along, it should be possible to get good quality video using only a fraction of the bandwidth.
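
To get a feel for the numbers, here's a minimal sender-side sketch (assuming the mediapipe and opencv-python packages; send_to_peer is a hypothetical stand-in for whatever transport the call uses):

    import struct

    import cv2
    import mediapipe as mp

    # Extract facial landmarks per frame and transmit only those.
    face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1)
    cap = cv2.VideoCapture(0)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if not results.multi_face_landmarks:
            continue
        points = results.multi_face_landmarks[0].landmark
        # 468 landmarks x 3 floats x 4 bytes is ~5.5 kB per frame,
        # versus hundreds of kB for the raw pixels they replace.
        payload = b"".join(struct.pack("fff", p.x, p.y, p.z) for p in points)
        send_to_peer(payload)  # hypothetical network call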




This is a minor plot point in Vernor Vinge's excellent SF novel A Fire Upon the Deep.

One of the premises of the novel's universe is that computational power is generally absurdly plentiful, but communications bandwidth over interstellar distances is not. Most communications are in plain text (modeled after USENET) but in some cases, "evocations" are used to extrapolate video and audio from an ultra-compressed data stream.

The trouble, of course, is that it's not very obvious what aspects of the image you're seeing are real, and what aspects were dreamed up by the system doing the extrapolating.


This is also a main premise of the Fear the Sky trilogy, but solved a different way: machines representing the various political factions from the home planet are uploaded with AI that, for all intents and purposes, mimics them emotionally and politically. I really enjoyed this book.


Eh, I personally enjoyed the series, but I wouldn't recommend anything beyond book 1. Book 2 is OK. Book 3 really spoiled the series for me because of the inconsistent behavior of the main character. (Keeping it vague to avoid spoilers.)


Same. Notice I said "the book" while mentioning the trilogy ;-)


+1 recommendation for this trilogy


> it's not very obvious what aspects of the image you're seeing are real, and what aspects were dreamed up by the system doing the extrapolating.

It would be quite obvious, unless the raw data is destroyed before extrapolation, for which there is no reason, nor is it possible to stop others in the vicinity from receiving this raw data.


That assumes that the "raw data" is reasonably human-comprehensible (which neural network weights and activations are notoriously not) and/or that you have time to sit down and analyze the data at your leisure.

But saying more would be spoilery...


For that to be true, the compression algorithm mustn't be very efficient.


Google recently introduced something like that for audio in Duo: https://ai.googleblog.com/2020/04/improving-audio-quality-in...

> WaveNetEQ is a generative model, based on DeepMind’s WaveRNN technology, that is trained using a large corpus of speech data to realistically continue short speech segments enabling it to fully synthesize the raw waveform of missing speech.

I don't think you need to train for each person specifically, you can just train a model for all heads, then maybe transmit a few high quality pics when the call starts, and interpolate from that afterward.
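
On the receiving side, that scheme might look something like this sketch (generator is a hypothetical stand-in for a pretrained model along the lines of a first-order motion model, animating the reference photo from keypoints):

    import numpy as np

    def run_receiver(reference_frame, keypoint_stream, generator, display):
        # reference_frame: the high-quality photo sent once at call start.
        # keypoint_stream: per-frame landmark payloads, a few kB each.
        for payload in keypoint_stream:
            keypoints = np.frombuffer(payload, dtype=np.float32).reshape(-1, 3)
            frame = generator(reference_frame, keypoints)  # hypothetical model
            display(frame)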


Excellent idea, and we'll surely be seeing something like this; there are already AR apps that map facial expressions to avatars.

A downside could be some uncanny-valley effect if the models are not very high quality.

But if I had to make a prediction, I'd expect we'll get much more value from higher bandwidth, ultra high definition streaming and features like 3d cameras / virtual reality. I think we have a tendency to really underestimate how important high definition is for human communication.


> I'd expect we'll get much more value from higher bandwidth, ultra high definition streaming and features like 3d cameras / virtual reality. I think we have a tendency to really underestimate how important high definition is for human communication.

Low latency is probably more important to me.

Recently I seem to have a 3-second delay on many VC calls at work (and just for me, it seems), and I either end up interrupting people or feeling reluctant to talk at all, since it becomes impossible to time gaps in the conversation right.

Despite that, I get a crystal-clear HD picture of all participants; I'd happily sacrifice video quality (in fact, I'd accept audio-only in some cases) to get a more real-time experience (disabling video doesn't seem to have any effect).


> Despite that, I get a crystal-clear HD picture of all participants; I'd happily sacrifice video quality (in fact, I'd accept audio-only in some cases) to get a more real-time experience (disabling video doesn't seem to have any effect).

If you're really willing to sacrifice video completely, at least for Zoom, and probably for lots of other videoconferencing solutions, you can call into meetings with your phone. In fact, I think Zoom allows you to join with the computer for video and the phone for audio, which might be the best of both worlds.


Yes, Zoom supports that.

A slight issue in Toronto has been the cellular system overloading and calls not completing. But once connected, no problem.

I can’t blame the providers though. How could they have predicted that people would use the service they’re paying for?


This can be helped with hand-raising (queue style) and a dedicated facilitator for each meeting.


This is a long shot, but are you running on battery while this is happening? I had some weird issues that worked themselves out by plugging in the charger. Probably had to do with power saving and CPU throttling.


> A downside could be some uncanny-valley effect if the models are not very high quality.

That can be controlled, since these compression algorithms usually work by making a prediction and sending the difference between the prediction and the actual value.

That works both for lossless compression, where the difference is sent in full, and for lossy, where only the most important part of the difference is sent.
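
As a sketch of the idea (predict and quantize are placeholder callables; an identity quantize gives lossless coding, anything coarser gives lossy):

    import numpy as np

    def encode(frames, predict, quantize):
        reference = frames[0].astype(np.float32)
        stream = [reference]                  # first frame sent as-is
        for frame in frames[1:]:
            prediction = predict(reference)
            residual = quantize(frame - prediction)
            stream.append(residual)           # only the residual is sent
            # decode locally so encoder and decoder stay in sync
            reference = prediction + residual
        return stream

    def decode(stream, predict):
        reference = stream[0]
        frames = [reference]
        for residual in stream[1:]:
            reference = predict(reference) + residual
            frames.append(reference)
        return frames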


Even better would be for RPGs and things like Roll20. I’d love to deep fake different voices/ character faces on cue.


This is very loosely what Nvidia's DLSS game upscaling does: a generalized NN trained on super-high-resolution game engine output. You can run a game at roughly a quarter to half resolution and it upscales the rest.

https://www.nvidia.com/en-us/geforce/news/nvidia-dlss-2-0-a-...


Very cool idea. The coding used in H264 is a variant of the DCT, so moving one layer of abstraction up from there basically moves from semi-analog to fully digital. I agree that it should only require a fraction of the bandwidth because you'd only be sending parametric data rather than full video.
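
For comparison, the block-DCT step boils down to something like this toy sketch with scipy (real codecs use an integer transform and proper quantization tables):

    import numpy as np
    from scipy.fft import dctn, idctn

    block = np.random.rand(8, 8)          # stand-in for an 8x8 pixel block
    coeffs = dctn(block, norm="ortho")    # transform to frequency domain
    coeffs[4:, :] = 0                     # crude quantization: drop the
    coeffs[:, 4:] = 0                     # high-frequency coefficients
    decoded = idctn(coeffs, norm="ortho")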


I think this is largely possible, and accuracy to a human is very different from the MSE accuracy used in a traditional lossy compression algorithm.

To a human, for example, the exact pattern of every strand of hair isn't important at all -- all that matters is that the hairstyle and hair color stay the same.

The algorithm also needn't worry about encoding and reconstructing skin blemishes, since people might actually enjoy not having to put on makeup for a video call.
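
One way to capture that intuition is a feature-space distance instead of pixel-wise MSE; here's a sketch using torchvision's pretrained VGG16 (inputs assumed to be normalized (N, 3, H, W) tensors):

    import torch
    import torch.nn.functional as F
    from torchvision.models import vgg16

    # Features up to relu3_3: fine texture (individual hair strands)
    # washes out here, while hairstyle and color survive.
    features = vgg16(weights="IMAGENET1K_V1").features[:16].eval()

    def pixel_loss(a, b):
        return F.mse_loss(a, b)

    def perceptual_loss(a, b):
        with torch.no_grad():
            return F.mse_loss(features(a), features(b))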


I was thinking the same thing today. I wonder if it can be done on the spot: capture your image from the camera initially, then send the rest as data points for deepfake generation on the other side, based on your own image. That would be amazing for low/limited-bandwidth situations.


> Now, instead of having to send video data, you only have to send a representation of the facial movements so that the recipients can render it on their end.

MPEG-4 Part 2 actually had something like that, called "face and body animation" (FBA). As far as I know, there are no implementations in widespread use.


I wonder if the same kind of thing is feasible for someone's voice.


Update: I did some searching and found some interesting demos of a hybrid of neural nets and more conventional DSP called LPCNet:

https://people.xiph.org/~jm/demo/lpcnet_codec/

Sure enough, it was discussed on HN when it came out last year. I think I missed it then.

For those who didn't catch this from the URL, this is by Jean-Marc Valin, of Speex and Opus fame.


I believe that’s what the vocoder was created for.

https://en.m.wikipedia.org/wiki/Vocoder


Mentioned in Jim Akhaleli's (sp?) "Revolutions" episode about the smartphone, currently on Netflix (my lad was watching it; really good for juniors or non-technical people IMO).


Almost right. It's Jim Al-Khalili. Great show.


I second the already expressed sentiments!

An utterly brilliant idea!


This has already been done, with criminals posing as CEOs.



