What if we made video conferencing work? (alexstjohn.com)
109 points by douche on May 2, 2017 | 77 comments



Having worked in VOIP for a good part of my career, there aren't actually that many hurdles left. The biggest one is IPV6. Once that is pervasive, then you have an easy way to get end to end connectivity without having to use a central server to carry the content. This is really, really massive because no matter how big you are, the central server is just adding latency -- sometimes a huge amount.

The next bit is finding a good UX for doing audio setup. Right now automatic gain control is just broken. The vast majority of problems that people have (after having no audio at all) is that the audio is set up badly. If you want to see how good audio can be in a VOIP situation, take a few minutes setting up mumble properly and talk to someone. It's incredible.

Those 2 things will solve virtually every audio problem because most of the really pathological cases where echo cancellation breaks are initiated by those 2 things. However, for doing video conferencing where you have lots of people, you probably need a hardware solution. This hardware already exists, but it doesn't talk to any voip software that I know about (granted I've been out of the business for a decade). Anybody really serious about this market needs to build hardware (or better yet, pair with someone willing to build commodity hardware to open standards).

Video? It's really no problem. It can lag for seconds and it's totally fine. All of the serious problems are audio.


I think one of the biggest problems with video calls is the lack of eye contact. I've seen solutions [0] that try to solve it with software but it's not good enough and it's not real eye contact.

A hardware (science fiction) solution could be to have lots of cameras behind the screen. The transmitted image would be from the camera your eyes are actually looking into.

[0] http://mashable.com/2013/08/27/video-call-eye-contact-app/#4...


I feel like the real problem here is that the camera attached to your screen always seems to be centered above (or below), so you get a skewed looking-down-upon-you situation - either you're staring at a forehead, or the other person looks like they're looking downward.

A solution here would be mounting a camera on either side of the screen for a (very) wide stereoscopic view, and then averaging it in software.


  > ... there aren't actually that many hurdles left. The biggest one is IPV6. Once that is pervasive, then you have an easy way to get end to end connectivity without having to use a central server to carry the content...
I'm not really a networking pro. Could you ELI5 why IPv6 is such a magic bullet for VOIP?


IPv6 means every device can have a globally routable, unique address. Hence bypassing the need for relay or hole punching servers.

On a side note, I don't believe it to be a major problem in this context though.
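
To make the "globally routable" point concrete, here is a minimal sketch using Python's standard `ipaddress` module. The sample addresses are illustrative picks, not taken from the thread: a typical home-LAN IPv4 address, a carrier-grade NAT address, and a public IPv6 address. The helper name `needs_nat_traversal` is made up for this example. It only classifies addresses; it says nothing about whether a firewall will let the traffic through.

```python
import ipaddress

def needs_nat_traversal(addr: str) -> bool:
    """Return True if addr is not globally routable, meaning a peer on the
    public internet cannot address packets to it directly."""
    return not ipaddress.ip_address(addr).is_global

# Illustrative sample addresses (assumptions, not from the discussion).
samples = [
    "192.168.1.10",          # RFC 1918 private IPv4, sits behind NAT
    "100.64.0.1",            # carrier-grade NAT shared address space
    "2001:4860:4860::8888",  # a public IPv6 address
]

for addr in samples:
    print(addr, "needs NAT traversal:", needs_nat_traversal(addr))
```

Only the IPv6 sample comes back as directly addressable, which is the property the comment above is describing.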


I'm pretty sure I'll still have a firewall between my IPv6 devices and the internet.


Right, IPv6 gives you no guarantee that you'll be reachable. What default firewalling should look like is still an ongoing discussion.


Sure, but you won't need to NAT them.


Yeah but what advantage does it have over hole-punching? You have a direct connection in both cases.


A slight correction that hole punching only works maybe 95% of the time, leading to endless non-technical articles commenting about how none of this works never. 95% is the same as 0% in the popular press.

Almost all voip problems are lack of bandwidth and wifi lack of reliability. Enough BW and a wired connection and it's as perfect as legacy copper wire telephone service. The bean counters are of the opinion that 95% reliability is spending "enough" and the popular science article writers are of the opinion 95% reliability is the same as 0%, so this will not change until the economics and infrastructure of broadband connections dramatically change. The problem has nothing to do with app developers or library authors, and has everything to do with the network engineering team in IT fighting with the bean counters, on several levels.


Yes, this is exactly right. With IPV6 you can give explicit instructions to open up a port because you know what port to open up.

The frustrating part is that none of this should be a problem. It's just that lots and lots of routers don't follow the specs properly. Here it is 2017 and you still can't reliably punch a hole in a firewall.


That is kind of the point of a firewall, no? If some random video conferencing app can punch a hole, get a refund.


Even the best video conferencing software (codecs, etc) is no match for an unreliable network with zero latency guarantees. Otherwise we would have done this already. But you can't buffer video calls.

Who knows, maybe with net neutrality gone ISPs will provide priority traffic so companies can pay more for better video conferencing (whatever better means). And rich people can pay for better Netflix. We can dream and look on the bright side, right? I just hope you aren't a video conferencing startup, because $megacorp already has an exclusive with the ISPs for video conferencing on this priority network...

(On a serious note, I mention this because it's part of the age-old network traffic prioritisation debate, not to start a flame war about net neutrality. That includes the downside of prioritisation, too)


No, don't blame this on net neutrality. Skype used to be great, even for vidcalls - maybe not conferencing, but was tolerable - when it was pre-M$ and when it was p2p.

The thing that is destroying video calls is making them go through a centralized system; that's a delay and 2x bandwidth for everyone.

This is not a net-neutrality thing.
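
A back-of-the-envelope sketch of that claim for a two-party call. All numbers are made up for illustration, and "2x bandwidth" is read here as total traffic carried by the network rather than each caller's own uplink: a relayed stream crosses two legs (caller to server, server to callee) instead of one, and its delay is the sum of both legs.

```python
def direct_call(one_way_ms: float, stream_kbps: float) -> dict:
    """Peer-to-peer: each of the two streams crosses one path."""
    return {"latency_ms": one_way_ms, "carried_kbps": 2 * stream_kbps}

def relayed_call(a_to_relay_ms: float, relay_to_b_ms: float,
                 stream_kbps: float) -> dict:
    """Via a central server: each of the two streams crosses two legs."""
    return {
        "latency_ms": a_to_relay_ms + relay_to_b_ms,
        "carried_kbps": 2 * 2 * stream_kbps,
    }

# Hypothetical figures: 30 ms direct path, 40 ms + 45 ms via a relay,
# 1500 kbit/s of video in each direction.
p2p = direct_call(30, 1500)
relay = relayed_call(40, 45, 1500)
print("direct: ", p2p)
print("relayed:", relay)
```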


> No, don't blame this on net neutrality.

I never said that. I agree it's more fundamental than that, since TCP/IP (and UDP) make no latency guarantees. (I also happen to live in a country unaffected by the situation in the US, so it is just a gedankenexperiment for me)

> The thing that is destroying video calls is making them going through a centralized system

There are many things that destroy video calls. Inadequate bandwidth. Terrible WiFi hardware (LTE/4G has worked better for me in several cases). A centralized system can even provide a benefit. Users of Teamspeak or Ventrilo might remember how much they outperformed P2P in an age of very limited bandwidth (given the server had adequate bandwidth).
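
The Teamspeak/Ventrilo point can be sketched with simple arithmetic. In a full peer-to-peer mesh every participant uploads a copy of their stream to each other participant, while with a server they upload one copy and the server fans it out. The 64 kbit/s audio rate below is an assumed figure for illustration only.

```python
def mesh_upload_kbps(participants: int, stream_kbps: float) -> float:
    """Full mesh: each participant sends their stream to every other peer."""
    return (participants - 1) * stream_kbps

def server_upload_kbps(participants: int, stream_kbps: float) -> float:
    """Central server: each participant uploads a single copy."""
    return stream_kbps if participants > 1 else 0.0

# Assumed 64 kbit/s voice stream; compare uplink needs as the room grows.
for n in (2, 5, 10):
    print(n, "people: mesh", mesh_upload_kbps(n, 64),
          "kbit/s vs server", server_upload_kbps(n, 64), "kbit/s")
```

With two people the uplink cost is identical; the server only pays off as the group grows, and only if the server itself has the bandwidth to fan out.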


So shouldn't WebRTC be more awesome then?

http://peerjs.com/


There are still unfortunately some sizeable gaps in implementation[0].

[0]:http://caniuse.com/#feat=rtcpeerconnection


Okay but on the browsers where it's implemented well, it is awesome?

It's not like Skype is designed to run on many architectures per OS either. Just pick one and use it!

My question though -- does WebRTC have better quality videoconferencing than Skype and Hangouts right now on the best browser implementation?


> Not to start a flame war about net neutrality

Too late!

I've never been convinced that lack of prioritisation is the real obstacle to videoconferencing. It's usability. The only videocall application that's ever achieved decent popularity seems to be FaceTime.

Mind you, one-way videoconferencing is becoming very popular these days in the form of "streamers".

There are also non-technical usability questions - like phones, the social cues are subtly different to same-room conversations, and potentially frustrating when it comes to the delicate dance of getting the other person to let you talk.


The only? People still use the word "Skype" as a general term for video calling.


> Mind you, one-way videoconferencing is becoming very popular these days in the form of "streamers".

Streamers are normally not real time, there's a large delay, meaning lots of time to buffer!


FaceTime works well, but only between two parties. Skype used to work well, but has inexplicably gotten worse - anecdotally of course.

> I've never been convinced that lack of prioritisation is the real obstacle to videoconferencing.

I'm not saying the lack of prioritisation is why it fails, or that prioritisation will fix everything. But lower latency and higher bandwidth don't seem to hurt. Going from 100Mbps to 1Gbps made a huge difference. And 1Gbps isn't that much, you can now get it in some residential areas in the UK for a very reasonable £63 (about 80 USD) per month.

Again UK specific, but some ISPs have much, much better routing than others (even with "net neutrality"). Of course, in the UK ISPs have to do various government filtering, which probably adds several milliseconds...


Sounds like an idea for an app: each side of the call gets 60 seconds to speak, then is automatically muted and the other side opened. Or do it the speed chess way and have a button you press when you're done with your thought so you can "bank" time for a later point of conversation.
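
A minimal sketch of the speed-chess variant, assuming a fixed speaking order and a "done" button that reports how many seconds were used. The class and method names are hypothetical, and elapsed time is passed in rather than read from a real clock to keep the example small.

```python
class TalkClock:
    """Chess-clock style turn timer: unused seconds are banked for later."""

    def __init__(self, speakers, turn_seconds=60.0):
        self.turn_seconds = turn_seconds
        self.order = list(speakers)
        self.bank = {name: 0.0 for name in self.order}
        self.current = 0

    def allowance(self):
        """Seconds the current speaker may use: one turn plus banked time."""
        return self.turn_seconds + self.bank[self.order[self.current]]

    def end_turn(self, seconds_used):
        """Called when the speaker presses 'done' or is auto-muted.
        Banks the unused time and returns who gets unmuted next."""
        name = self.order[self.current]
        allowed = self.allowance()
        used = min(seconds_used, allowed)  # cannot overrun the allowance
        self.bank[name] = allowed - used
        self.current = (self.current + 1) % len(self.order)
        return self.order[self.current]


clock = TalkClock(["alice", "bob"])
next_up = clock.end_turn(20)   # alice finishes early, banks the rest
next_up = clock.end_turn(60)   # bob uses his full minute
print(next_up, "may now speak for", clock.allowance(), "seconds")
```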


Sounds fun. We can pretend we're in The Expanse, and we have to deal with the communication lag between Earth and Mars or the Belt.


Replace the button with voice recognition and let the people say "over" when they're done talking.


Dropped calls every time you utter the word "over" don't sound like a usability enhancement


Hmm. With decent enough speech recognition machine learning (!) you could have the system act as "chair" and decide who it thinks should be speaking, with appropriate UI cues.

Quite tricky as you have to interpret non-speech vocalisations.


so...Snapchat?


If snapchat was a continuous stream like a skype meeting, yes


> is no match for an unreliable network with zero latency guarantees

UX is a disaster. At least they could display latency / packet loss charts for both directions. Volume gauges. Warnings on audio feedback or high compression.

We spend too much time asking if volumes and quality are ok.
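
For what such a display would need to compute, here is a rough sketch. It assumes the client can see per-packet sequence numbers and send/receive timestamps (as RTP-style protocols provide); the function names are made up. The jitter estimate follows the smoothing rule from RFC 3550, J += (|D| - J) / 16, where D is the change in transit time between consecutive packets.

```python
def loss_percent(seq_numbers):
    """Estimate packet loss from gaps in received sequence numbers.
    Ignores sequence wrap-around for brevity."""
    if not seq_numbers:
        return 0.0
    expected = max(seq_numbers) - min(seq_numbers) + 1
    received = len(set(seq_numbers))  # duplicates count once
    return 100.0 * (expected - received) / expected

def interarrival_jitter(send_times_ms, recv_times_ms):
    """Smoothed interarrival jitter in the style of RFC 3550."""
    transits = [r - s for s, r in zip(send_times_ms, recv_times_ms)]
    jitter = 0.0
    for prev, cur in zip(transits, transits[1:]):
        jitter += (abs(cur - prev) - jitter) / 16.0
    return jitter

# Illustrative data: packet 3 never arrived, and the last packet was late.
print("loss %:", loss_percent([1, 2, 4, 5]))
print("jitter ms:", interarrival_jitter([0, 20, 40], [50, 70, 106]))
```

Feeding these two numbers into a small gauge for each direction would cover most of the "can you hear me?" questions raised above.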


> We spend too much time asking if volumes and quality are ok.

I agree, but that happens on teleconferences using phonelines, too. So not really a video conference only thing.

I'm not sure if all those problems are truly UX only problems though. How useful is an n-way latency graph? Volume gauges, I see that more as a hardware fault. When using a phone or an iPad, that question never comes up. Audio feedback, IMO hardware/driver issue. Again, speakerphone is okay. High compression is obvious from the blockiness.


> that happens on teleconferences using phonelines

Phonelines do not have packet loss, latency and volume auto adjust issues. And a big screen that can fit a couple of gauges.

> High compression is obvious from the blockiness.

Not when you are the sender - and not when you are using audio only.


> you can't buffer video calls

FaceTime's "pause" of the video is one option. Software on both ends should keep a few user-selectable "keyframes" and display them when they drop to only audio.


There are a lot of comparisons here which don't make any sense. It's not the same people making HoloLens work (which may be overblown, since not many people have had heads-on time with one to report back) and making Skype so awful, because Microsoft is huge and have different teams doing different things and some of those teams are clearly better than others or get their priorities set by people more in line with what the author would want than others.

Likewise at Google, Hangouts and self-driving cars won't share any people. Especially as their self-driving cars are now handled by a different company, Waymo, who have the same parent company as Google.

I get the point that it's astonishing that videoconferencing tech is still so awful. I remember interviewing at Google's London office years ago and two of the interviews were done by video links, and they worked flawlessly - but this very impressive conference system was one Google had bought in from somewhere and presumably paid ridiculous sums of money for. That experience gave me hope for consumer video tech - but it's now ten years later and where are we?

FaceTime seems to work, but it only works between Apple devices. Google Duo seems to work in my limited experimentation, but only works between phones running the Duo app, because Google have more hands that don't know what any of the other hands are doing than just about anybody else - especially in the consumer communications arena.

I wish someone would get their pants on and seriously invest in sorting out video calling.

Heck, getting audio to work reliably would be good. Best one for that currently in my experience? Slack.


I would like to add to the first part of your comment that it seems lots of people don't understand how teams in a company work as well. You can't pull people off the HoloLens team and expect them to make huge leaps on the Skype team. If someone is in love with their HoloLens work, drive that forward, they might not be in love with Skype. Thus you have situations where new breakthroughs are hard to come by. I would think working on Skype is not nearly as exciting as HoloLens. #1 because I am sure Microsoft has to be insanely protective of the platform and not make large changes all at once. Not the case with HoloLens.


What you can do however is say to the hololens team that one of their priorities is to have a killer avatar based augmented reality video-conferencing app and that it's going to be one of the core drivers of the product so they better make sure it works and works well.

It doesn't need to be built on Skype (and preferably wouldn't), but they should be looking at that as a core market for the product.

A few months back I was talking to a friend, whose company was paying him to fly to the UK to host a week of training or something similar, and we calculated that the cost of flying and accommodating him for the week would have been equivalent to purchasing PCs and VR gear for all participants. Except of course there wasn't really any software up to the task of virtual conferencing.


Though fairly decent, FaceTime gets shaky with what seems like just slightly sub-optimal network conditions.


> I wish someone would get their pants on and seriously invest in sorting out video calling.

It's not going to be sorted out...probably ever IMO.

To do teleconferencing right you need to metaphorically impedance match between two users at ever increasing bitrate demand.

So if one side has 3G and the other side is on Fiber - you have an impedance mismatch and an asymmetric experience.

Until the floor for internet is basically what users would define as delivering "excellent" video calling - it's never going to be solved. Watch out for those moving goalposts though.
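
The mismatch can be sketched as picking, for each direction, the best video bitrate that fits the weaker end of the link. The bitrate ladder, the headroom factor, and the link speeds below are all assumed values for illustration, not measurements.

```python
# Hypothetical bitrate ladder in kbit/s; the rungs are illustrative only.
LADDER_KBPS = [150, 500, 1200, 2500, 5000]

def pick_send_bitrate(sender_up_kbps, receiver_down_kbps, headroom=0.8):
    """Choose the highest rung that fits the weaker side of the link,
    leaving some headroom for audio and other traffic."""
    budget = min(sender_up_kbps, receiver_down_kbps) * headroom
    fitting = [rate for rate in LADDER_KBPS if rate <= budget]
    return fitting[-1] if fitting else LADDER_KBPS[0]

# Assumed links: fibre with 50 Mbit/s up and 100 Mbit/s down,
# 3G with 400 kbit/s up and 2 Mbit/s down.
fibre_to_3g = pick_send_bitrate(sender_up_kbps=50_000, receiver_down_kbps=2_000)
threeg_to_fibre = pick_send_bitrate(sender_up_kbps=400, receiver_down_kbps=100_000)
print("fibre user is seen at", fibre_to_3g, "kbit/s")
print("3G user is seen at", threeg_to_fibre, "kbit/s")
```

The two directions land on different rungs, which is the asymmetric experience described above: the 3G user sees a decent picture while the fibre user gets a much worse one.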


Aside:

"Can you imagine the outrage that would occur in Palo Alto California if anybody presumed to keep chickens in their back yards?"

When we lived in Palo Alto (~2008-2011), we had about twenty chickens and three goats. The neighbors all loved them. There are some ways in which Palo Alto is a pretty weird place, but almost all the people there are just like people anywhere else.


Well, the median household income is $126,771, so it's not exactly just like anywhere else.


In a lot of places I think people would not appreciate chickens.


Chickens are fine if you keep them clean. Cockerels in an urban area will get you anything from a noise abatement order to a drive-by shooting.


Chickens aren't so bad, it's the roosters that you need to kill.


There are a bunch of options for low-latency video conferencing solutions available for people who actually need it (tele-medicine/surgery, sports/media broadcasting, enterprise conferencing). The thing is they just cost a lot of money and require dedicated hardware. Here's one company I know of that does this: http://www.haivision.com/


We have a Polycom conferencing camera (CX5000), which seems to be something originally designed by Microsoft and called Roundtable. It works well, including for international calls — we have regular calls using Skype without issue to Canada and Taiwan from Europe, and irregular calls elsewhere.

It does have an essentially perfect 1Gb/s Internet connection.

This is the current version: http://www.polycom.com/products-services/products-for-micros...


*sigh* Polycom - such good hardware with such awful software.

We have Polycom VVX VOIP phones, and the call quality blows the socks off any of the competition. However, we also have to maintain a TFTP provisioning server that feels like it was last updated in the 80s. I have a feeling that Polycom have only just discovered the internet. Hopefully they can overcome this last hurdle.


We have a couple of Polycom CX5000s too - the second one we had to source from eBay! We use Skype for Business for 100% of our voice and video communication, and have virtually no trouble with it.

Any problems in my experience stem from remote users' ISP bandwidth issues, or their inability to read instructions. No complaints about SfB.

I wonder if the author is using Skype for Business or the consumer Skype.


We're using Skype for Business and it's a daily source of issues. Call quality is often terrible and - unrelated to the audio/video subject - screen sharing is so laggy that it is literally unusable most of the time ("Please click over there." [wait for visual update] "Now do...").

Another fun issue is that you cannot directly paste stuff without starting a conversation. It will block your message as "too long" most of the time, unless you write something meaningless and short first. "Hey. Skype sucks. Incoming error I need your help with:".


My experience is about the same.

Video conferencing/screen share is a huge part of how my company works - and SfB is extremely reliable for us. It 'just works', if it didn't we would switch to something else immediately.


it's the other way around, Polycom designed Roundtable for Microsoft.


Part of the problem is that the camera is positioned above the display, so the person on the other end isn't looking you in the eyes. It's a bit disconcerting.

Perhaps someone will invent a one-way transparent display where the camera will look out from behind the middle of the monitor. Alternatively, we could position multiple cameras around the edges of the display and then use real-time video processing software to combine those streams into a single synthetic video which makes it appear as if the person is looking at you.


So I'll have a post-it note in the middle of my monitor :)


I guess this is a fair time to promote the Video Conferencing startup(-ish) I'm working for: StarLeaf https://www.starleaf.com

The author never says exactly what they want fixed, but we sure know most VC systems are a horrible mess. We spend a lot of effort on UX and making good quality calls work through NATs and random packet loss.

The main products are meeting room solutions, which tend to be least well served by free offerings. Squeezing everyone in front of a laptop is never great.


Author is bashing VR but what if conferences in 3d environments are a better solution than skype? People use 3d / metaverse solutions for education for example, because it's a compelling solution. A gaming environment may in fact be more productive than looking at 100 thumbnails of participant faces.


video conferencing is appealing because we communicate as much through body language and facial cues as by voice. We evolved to communicate face-to-face and end up missing out on a lot of information if we only represent ourselves with our voices and a visual avatar.


A large part of the body language component is reproducible today with hand trackers + head orientation. In terms of facial cues there have been some interesting demos with gaze tracking being reproduced onto the visual avatar.


but have we gotten past the uncanny valley yet? I've seen this attempted previously but it always seems very "off" in a way that engenders distrust. When we could just fix video streaming, it seems redundant to try to accurately replicate body language in a 3d avatar.


If the cost and cumbersome nature of VR can be reduced to bring it in line with just putting on a set of headphones and still be aware of the outside environment, sure.


>The real coffin-nail for videophony involved the way callers' faces looked on their TP screen, during calls. Not their callers' faces, but their own, when they saw them on video.

>This sort of appearance check was no more resistible than a mirror. But the experience proved almost universally horrifying. People were horrified at how their own faces appeared on a TP screen. It wasn't just 'Anchorman's Bloat,' that well-known impression of extra weight that video inflicts on the face. It was worse. Even with high-end TPs' high-def viewer-screens, consumers perceived something essentially blurred and moist-looking about their phone-faces, a shiny pallid indefiniteness that struck them as not just unflattering but somehow evasive, furtive, untrustworthy, unlikable.

>The proposed solution to what the telecommunications industry's psychological consultants termed Video-Physiognomic Dysphoria (or VPD) was, of course, the advent of High-Definition Masking. Mask-wise, the initial option of High-Definition Photographic Imaging — i.e. taking the most flattering elements of a variety of flattering multi-angle photos of a given phone-consumer and — thanks to existing image-configuration equipment already pioneered by the cosmetics and law-enforcement industries — combining them into a wildly attractive high-def broadcastable composite of a face wearing an earnest, slightly overintense expression of complete attention.

>It turned out that consumers' instinctively skewed self-perception, plus vanity-related stress, meant that they began preferring and then outright demanding videophone masks that were really quite a lot better-looking than they themselves were in person.

>Even then, of course, the bulk of U.S. consumers remained verifiably reluctant to leave home and teleputer and to interface personally, though this phenomenon's endurance can't be attributed to the videophony-fad per se, and anyway the new panagoraphobia served to open huge new entrepreneurial teleputerized markets for home-shopping and -delivery, and didn't cause much industry concern.

--infinite jest (1996)


The question this article raises for me is - would people really object if you kept hens in a Palo Alto back garden? I keep chickens in London and I've never had complaints.


This is a great rant, but what are the best solutions to this today? Has someone, somewhere solved this, or are they at least well on their way to a solution?


Can't speak as to how it scales but I had to use both webex and Zoom in the past year. Paid corporate accounts etc.. No affiliation to either.

Webex linux support is unbelievably poor (literally, more than half a day's work to get it installed on linux). Sound quality deteriorates on international calls. Just a really crappy experience overall.

Zoom has rpms/deb packages available. Installation- 30 secs.. Sound quality was crystal clear on international calls.

Webex is one of the few product offerings that makes me truly angry.


Agreed! We use Zoom almost exclusively at Elastic. Since we're a fully distributed company, we rely on video conferencing pretty heavily. Zoom handles everything from a handful of people up to full-company meetings. It's pretty impressive to be honest, and doesn't try to melt a hole through your desk like Hangouts.

I use it actively on both Mac and Linux.

Not affiliated with Zoom at all, just a very pleased user :)


Last few months I have used Zoom, RingCentral, Skype for Business, Google Hangouts, and Gotomeeting.

Zoom was best followed by gotomeeting.


I really like https://appear.in - simple, straight-forward, and "just works". Don't know how well it works for more than two people, though. You can add screensharing with a Chrome plugin. It's all I need, really.


we use appear.in daily - it works well up to 5 people. It also has an interesting feature to allow you to "mute" certain participants, which works well in our use case where some participants can be in the same physical room. On "older" hardware (say a 2012 MacBook Pro), it easily pegs chrome at 100% CPU though

Paid Zoom works generally well, and much better on mobile than alternatives - you can also record meetings, and the recordings are of decent quality.


I'm another happy user of appear.in

There are still many improvements that are possible but at the moment, it's the best service I've used for video conferencing.


https://talky.io/ works for a reasonable amount of people, but it's not ideal for regular calls and definitely not secure enough.

There is also https://www.bluejeans.com/ which usually works fine in the company.

Skype for Business? Lol. If only it was possible to _join_ a meeting without an account, that would be great.

EDIT: someone asked if I'm seriously recommending bluejeans; I am, yes. Indeed it's hilariously CPU heavy and their linux support is a joke, but it's still working compared to other solutions. At work we have some vidconf equipment in some rooms which can dial bluejeans on their own and they usually work pretty well, so yes.


I use meet.jit.si (or a self-hosted version of the software) pretty regularly and am a big fan. It's probably broadly comparable to other things people have suggested since it actually powers a lot of other video solutions behind the scenes, but I like their little demo service because it doesn't require auth and is pretty bare bones.

Full disclosure: I was briefly on the Jitsi team.


Hi Everyone: Am a little late to the thread.

I am the founder / CEO of Highfive. One of the companies implicated in Alex's post :-)

I know some in the YC community checked out Highfive early in our maturity curve when we released our product 2 years ago. I would love for folks here to try it out again and tell us what they think.

I took a few minutes to write up a few lessons we have learned over the last 4.5 years. It's definitely been harder than anything I've ever worked on. Would love your thoughts.

https://highfive.com/10-reasons-video-conferencing-still-suc...


This has to be the best non-tech-documentation piece I've ever read on Hacker News. The author makes great points with well-placed sarcasm.

I know, I know. Corporations go where the money flows, and they don't want to change the status quo if there's no money in that revolution (case in point: not attacking the rental cars / hotels / airlines). I am well aware.

But the author is still very correct on a conceptual level.


Apart from bigbluebutton are there any other open-source video conferencing solutions out there ?

Rather than having so many different versions of a flawed product, I wonder how things would work out if a lot of creative individuals worked on solving and improving the same problem.

Also, I think another pain point is that webcam and audio setup is tricky for the average `windows` PC user. Is there a product that helps figure out the quality score of your setup and potentially helps with it?


The fix for video conferencing is AR, I have been video conferencing for 20 years, and after the couple minutes of introductions, the video feed is not needed 99% of the time.

Screen sharing on the other hand is great, just about always need that. But we really don't have to be looking at each other constantly.

So stop trying to fix something that is inherently not needed.


Those Piperchat guys are onto something!


I think there might be some legal issues with their tos...



No thanks. If video conferencing starts to work, my customers will start to ask for it. And the loss of productivity will dwarf the annual spending on business trips.

You executives can go on and fly, don't ruin work for everybody else.



