Having worked in VOIP for a good part of my career, I'd say there aren't actually that many hurdles left. The biggest one is IPV6. Once that is pervasive, then you have an easy way to get end to end connectivity without having to use a central server to carry the content. This is really, really massive because no matter how big you are, the central server is just adding latency -- sometimes a huge amount.
The next bit is finding a good UX for doing audio setup. Right now automatic gain control is just broken. The vast majority of problems that people have (after having no audio at all) come from the audio being set up badly. If you want to see how good audio can be in a VOIP situation, take a few minutes setting up Mumble properly and talk to someone. It's incredible.
Those 2 things will solve virtually every audio problem because most of the really pathological cases where echo cancellation breaks are initiated by those 2 things. However, for doing video conferencing where you have lots of people, you probably need a hardware solution. This hardware already exists, but it doesn't talk to any voip software that I know about (granted I've been out of the business for a decade). Anybody really serious about this market needs to build hardware (or better yet, pair with someone willing to build commodity hardware to open standards).
Video? It's really no problem. It can lag for seconds and it's totally fine. All of the serious problems are audio.
I think one of the biggest problems with video calls is the lack of eye contact. I've seen solutions [0] that try to solve it with software but it's not good enough and it's not real eye contact.
A hardware (science fiction) solution could be to have lots of cameras behind the screen. The transmitted image would be from the camera your eyes are actually looking into.
I feel like the real problem here is that the camera attached to your screen always seems to be centered above (or below), so you get a skewed looking-down-upon-you situation - either you're staring at a forehead, or the other person looks like they're looking downward.
A solution here would be mounting a camera on either side of the screen for a (very) wide stereoscopic view, and then averaging it in software.
> ... there aren't actually that many hurdles left. The biggest one is IPV6. Once that is pervasive, then you have an easy way to get end to end connectivity without having use a central server to carry the content...
I'm not really a networking pro. Could you ELI5 why IPv6 is such a magic bullet for VOIP?
A slight correction: hole punching only works maybe 95% of the time, leading to endless non-technical articles commenting about how none of this ever works. 95% is the same as 0% in the popular press.
Almost all voip problems are lack of bandwidth and wifi lack of reliability. Enough BW and a wired connection and it's as perfect as legacy copper wire telephone service. The bean counters are of the opinion that 95% reliability is spending "enough" and the popular science article writers are of the opinion 95% reliability is the same as 0% so this will not change until the economics and infrastructure of broadband connections dramatically change. The problem has nothing to do with app developers or library authors, and has everything to do with the network engineering team in IT fighting with the bean counters, on several levels.
Yes, this is exactly right. With IPV6 you can give explicit instructions to open up a port because you know what port to open up.
The frustrating part is that none of this should be a problem. It's just that lots and lots of routers don't follow the specs properly. Here it is 2017 and you still can't reliably punch a hole in a firewall.
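For anyone wondering what "punching a hole" actually involves, here is a minimal sketch in Python. It assumes both peers already know each other's public address and port (in practice they would learn them from some rendezvous server, which is not shown). The function names are made up for illustration, and the demo runs on loopback, where there is no NAT to get in the way. On a real network this is exactly the step that fails when a router doesn't keep the mapping open.

```python
import socket


def open_udp_socket(host="127.0.0.1"):
    """Bind a UDP socket to an OS-chosen port and return it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, 0))
    return sock


def send_punch(sock, peer_addr, attempts=3):
    """Send a few small datagrams to the peer's address.

    Behind a NAT, these outbound packets are what create the mapping
    that lets the peer's packets come back in on the same port.
    """
    for _ in range(attempts):
        sock.sendto(b"punch", peer_addr)


def wait_for_peer(sock, peer_addr, timeout=1.0):
    """Return True once a datagram from peer_addr arrives, else False."""
    sock.settimeout(timeout)
    try:
        while True:
            _, addr = sock.recvfrom(64)
            if addr == peer_addr:
                return True
    except socket.timeout:
        return False


def demo_on_loopback():
    """Both sides send first, then both sides listen (no NAT involved)."""
    a, b = open_udp_socket(), open_udp_socket()
    try:
        addr_a, addr_b = a.getsockname(), b.getsockname()
        send_punch(a, addr_b)
        send_punch(b, addr_a)
        return wait_for_peer(a, addr_b) and wait_for_peer(b, addr_a)
    finally:
        a.close()
        b.close()
```

The important design point is that both sides send before either expects to receive. Neither end can act as a plain "server" when both are behind NAT.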
Even the best video conferencing software (codecs, etc) is no match for an unreliable network with zero latency guarantees. Otherwise we would have done this already. But you can't buffer video calls.
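To make the buffering trade-off concrete, here is a rough Python sketch of a playout (jitter) buffer calculation. The delay numbers are invented and the function names are hypothetical. The point is only that hiding network jitter costs added delay, which is what a live call cannot afford.

```python
def late_fraction(network_delays_ms, buffer_ms):
    """Fraction of packets that miss their playout deadline.

    A packet counts as playable if its delay is no more than the
    smallest observed delay plus the buffer depth.
    """
    if not network_delays_ms:
        return 0.0
    deadline = min(network_delays_ms) + buffer_ms
    late = sum(1 for d in network_delays_ms if d > deadline)
    return late / len(network_delays_ms)


def smallest_buffer_for(network_delays_ms, max_late_fraction,
                        step_ms=10, limit_ms=2000):
    """Smallest buffer depth (ms) that keeps late packets under the target.

    Returns None if no depth up to limit_ms is enough.
    """
    for buffer_ms in range(0, limit_ms + step_ms, step_ms):
        if late_fraction(network_delays_ms, buffer_ms) <= max_late_fraction:
            return buffer_ms
    return None


# Invented one-way delays in milliseconds, with two bad spikes.
SAMPLE_DELAYS_MS = [40, 42, 45, 41, 120, 43, 44, 300, 41, 40]
```

With these sample delays, a 10 ms buffer still drops the two spikes, and catching every packet needs a 260 ms buffer. That much added delay is fine for a stream but painful in a conversation.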
Who knows, maybe with net neutrality gone ISPs will provide priority traffic so companies can pay more for better video conferencing (whatever better means). And rich people can pay for better Netflix. We can dream and look on the bright side, right? I just hope you aren't a video conferencing startup, because $megacorp already has an exclusive with the ISPs for video conferencing on this priority network...
(On a serious note, I mention this because it's part of the age-old network traffic prioritisation debate, not to start a flame war about net neutrality. That includes the downside of prioritisation, too)
No, don't blame this on net neutrality. Skype used to be great, even for vidcalls - maybe not conferencing, but was tolerable - when it was pre-M$ and when it was p2p.
The thing that is destroying video calls is making them go through a centralized system; that's a delay and 2x bandwidth for everyone.
I never said that. I agree it's more fundamental than that, since TCP/IP (and UDP) make no latency guarantees. (I also happen to live in a country unaffected by the situation in the US, so it is just a gedankenexperiment for me)
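A back-of-the-envelope version of that claim, as a small Python sketch. The function names and numbers are hypothetical, and it reads "2x bandwidth" as each stream crossing the network twice (once up to the server, once back down) rather than each endpoint sending twice as much.

```python
def direct_delay_ms(a_to_b_ms):
    """One-way delay when the two peers talk to each other directly."""
    return a_to_b_ms


def relayed_delay_ms(a_to_server_ms, server_to_b_ms, server_processing_ms=0.0):
    """One-way delay when media is forwarded through a central server."""
    return a_to_server_ms + server_processing_ms + server_to_b_ms


def network_load_bps(stream_bps, relayed):
    """Total traffic carried for one stream: one leg direct, two legs relayed."""
    legs = 2 if relayed else 1
    return stream_bps * legs
```

For example, two neighbours 10 ms apart whose nearest relay is 30 ms and 35 ms away, with 5 ms spent in the server, see 70 ms each way instead of 10 ms.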
> The thing that is destroying video calls is making them going through a centralized system
There are many things that destroy video calls. Inadequate bandwidth. Terrible WiFi hardware (LTE/4G has worked better for me in several cases). A centralized system can even provide a benefit. Users of Teamspeak or Ventrilo might remember how much they outperformed P2P in an age of very limited bandwidth (given the server had adequate bandwidth).
I've never been convinced that lack of prioritisation is the real obstacle to videoconferencing. It's usability. The only videocall application that's ever achieved decent popularity seems to be FaceTime.
Mind you, one-way videoconferencing is becoming very popular these days in the form of "streamers".
There are also non-technical usability questions - like phones, the social cues are subtly different to same-room conversations, and potentially frustrating when it comes to the delicate dance of getting the other person to let you talk.
FaceTime works well, but only between two parties. Skype used to work well, but has inexplicably gotten worse - anecdotally of course.
> I've never been convinced that lack of prioritisation is the real obstacle to videoconferencing.
I'm not saying the lack of prioritisation is why it fails, or that prioritisation will fix everything. But lower latency and higher bandwidth don't seem to hurt. Going from 100Mbps to 1Gbps made a huge difference. And 1Gbps isn't that much, you can now get it in some residential areas in the UK for a very reasonable £63 (about 80 USD) per month.
Again UK specific, but some ISPs have much, much better routing than others (even with "net neutrality"). Of course, in the UK ISPs have to do various government filtering, which probably adds several milliseconds...
Sounds like an idea for an app: each side of the call gets 60 seconds to speak, then is automatically muted and the other side opened. Or do it the speed chess way and have a button you press when you're done with your thought so you can "bank" time for a later point of conversation.
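A quick Python sketch of the speed-chess variant, just to show the state involved. Everything here is an assumption: the class and method names are made up, banked time is simply added to that person's next turn, and time is advanced by calling `tick()` instead of using a real timer.

```python
class TalkClock:
    """Chess-clock style turn timer: only the current speaker is unmuted."""

    def __init__(self, participants, turn_seconds=60.0):
        self.order = list(participants)
        self.turn_seconds = turn_seconds
        self.banked = {p: 0.0 for p in self.order}
        self.speaker_index = 0
        self.remaining = turn_seconds

    @property
    def speaker(self):
        return self.order[self.speaker_index]

    def is_muted(self, participant):
        return participant != self.speaker

    def tick(self, elapsed_seconds):
        """Advance the clock; hand over the floor when time runs out."""
        self.remaining -= elapsed_seconds
        if self.remaining <= 0:
            self._next_speaker()

    def done(self):
        """Speaker presses the button early and banks the unused time."""
        self.banked[self.speaker] += max(self.remaining, 0.0)
        self._next_speaker()

    def _next_speaker(self):
        self.speaker_index = (self.speaker_index + 1) % len(self.order)
        # Assumption: banked time is spent in full on the next turn.
        self.remaining = self.turn_seconds + self.banked[self.speaker]
        self.banked[self.speaker] = 0.0
```

So if Alice speaks for 20 seconds and presses the button, she banks 40 seconds and gets a 100 second turn the next time round.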
Hmm. With decent enough speech recognition machine learning (!) you could have the system act as "chair" and decide who it thinks should be speaking, with appropriate UI cues.
Quite tricky as you have to interpret non-speech vocalisations.
> is no match for an unreliable network with zero latency guarantees
UX is a disaster. At least they could display latency / packet loss charts for both directions. Volume gauges. Warnings on audio feedback or high compression.
We spend too much time asking if volumes and quality are ok.
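None of these indicators is hard to compute, which is part of the frustration. A minimal Python sketch, assuming the client can see packet sequence numbers and raw audio samples as floats in [-1.0, 1.0]. The function names and warning thresholds are invented for illustration and are not taken from any real product.

```python
import math


def packet_loss_percent(received_seq_numbers):
    """Estimate loss from gaps in sequence numbers (ignores wraparound)."""
    unique = set(received_seq_numbers)
    if not unique:
        return 0.0
    expected = max(unique) - min(unique) + 1
    return 100.0 * (expected - len(unique)) / expected


def level_dbfs(samples):
    """RMS level of the samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def volume_warning(samples, quiet_dbfs=-40.0, hot_dbfs=-3.0):
    """Turn a level into the kind of hint a call UI could show."""
    level = level_dbfs(samples)
    if level < quiet_dbfs:
        return "too quiet"
    if level > hot_dbfs:
        return "too loud"
    return "ok"
```

Showing the result of `volume_warning` next to each participant would replace most of the "can you hear me?" round at the start of a call.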
> We spend too much time asking if volumes and quality are ok.
I agree, but that happens on teleconferences using phonelines, too. So not really a video conference only thing.
I'm not sure if all those problems are truly UX only problems though. How useful is an n-way latency graph? Volume gauges, I see that more as a hardware fault. When using a phone or an iPad, that question never comes up. Audio feedback, IMO hardware/driver issue. Again, speakerphone is okay. High compression is obvious from the blockiness.
FaceTime's "pause" of the video is one option. Software on both ends should keep a few user-selectable "keyframes" and display them when they drop to only audio.
There are a lot of comparisons here which don't make any sense. It's not the same people making HoloLens work (which may be overblown, since not many people have had heads-on time with one to report back) and making Skype so awful, because Microsoft is huge and have different teams doing different things and some of those teams are clearly better than others or get their priorities set by people more in line with what the author would want than others.
Likewise at Google, Hangouts and self-driving cars won't share any people. Especially as their self-driving cars are now handled by a different company, Waymo, who have the same parent company as Google.
I get the point that it's astonishing that videoconferencing tech is still so awful. I remember interviewing at Google's London office years ago and two of the interviews were done by video links, and they worked flawlessly - but this very impressive conference system was one Google had bought in from somewhere and presumably paid ridiculous sums of money for. That experience gave me hope for consumer video tech - but it's now ten years later and where are we?
FaceTime seems to work, but it only works between Apple devices. Google Duo seems to work in my limited experimentation, but only works between phones running the Duo app, because Google have more hands that don't know what any of the other hands are doing than just about anybody else - especially in the consumer communications arena.
I wish someone would get their pants on and seriously invest in sorting out video calling.
Heck, getting audio to work reliably would be good. Best one for that currently in my experience? Slack.
I would like to add to the first part of your comment that it seems lots of people don't understand how teams in a company work as well. You can't pull people off the HoloLens team and expect them to make huge leaps on the Skype team. If someone is in love with their HoloLens work, drive that forward, they might not be in love with Skype. Thus you have situations where new breakthroughs are hard to come by. I would think working on Skype is not nearly as exciting as HoloLens. #1 because I am sure Microsoft has to be insanely protective of the platform and not make large changes all at once. Not the case with HoloLens.
What you can do however is say to the hololens team that one of their priorities is to have a killer avatar based augmented reality video-conferencing app and that it's going to be one of the core drivers of the product so they better make sure it works and works well.
It doesn't need to be built on Skype (and preferably wouldn't), but they should be looking at that as a core market for the product.
A few months back I was talking to a friend, whose company was paying him to fly to the UK to host a week of training or something similar, and we calculated that the cost of flying and accommodating him for the week would have been equivalent to purchasing PCs and VR gear for all participants. Except of course there wasn't really any software up to the task of virtual conferencing.
> I wish someone would get their pants on and seriously invest in sorting out video calling.
It's not going to be sorted out...probably ever IMO.
To do teleconferencing right you need to metaphorically impedance match between two users at ever increasing bitrate demand.
So if one side has 3G and the other side is on Fiber - you have an impedance mismatch and an asymmetric experience.
Until the floor for internet is basically what users would define as delivering "excellent" video calling - it's never going to be solved. Watch out for those moving goalposts though.
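The mismatch is easy to see with numbers. Here is a small Python sketch with invented link speeds and a hypothetical bitrate ladder. Each direction of the call is capped by the slower of the sender's uplink and the receiver's downlink.

```python
def negotiated_bitrate_kbps(sender_uplink_kbps, receiver_downlink_kbps, ladder_kbps):
    """Highest ladder rung that fits the slower end, or None if none fits."""
    ceiling = min(sender_uplink_kbps, receiver_downlink_kbps)
    fitting = [rate for rate in ladder_kbps if rate <= ceiling]
    return max(fitting) if fitting else None


def call_bitrates(a, b, ladder_kbps):
    """Per-direction bitrates for peers given as {'up': kbps, 'down': kbps}."""
    return {
        "a_to_b": negotiated_bitrate_kbps(a["up"], b["down"], ladder_kbps),
        "b_to_a": negotiated_bitrate_kbps(b["up"], a["down"], ladder_kbps),
    }


# Hypothetical numbers, for illustration only.
LADDER_KBPS = [150, 500, 1200, 2500, 5000]
PHONE_3G = {"up": 300, "down": 1500}
FIBER = {"up": 100_000, "down": 100_000}
```

With these numbers the fiber user receives a 150 kbps picture while the 3G user receives 1200 kbps, so the person with the better connection gets the worse view of the call.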
"Can you imagine the outrage that would occur in Palo Alto California if anybody presumed to keep chickens in their back yards?"
When we lived in Palo Alto (~2008-2011), we had about twenty chickens and three goats. The neighbors all loved them. There are some ways in which Palo Alto is a pretty weird place, but almost all the people there are just like people anywhere else.
There are a bunch of options for low-latency video conferencing solutions available for people who actually need it (tele-medicine/surgery, sports/media broadcasting, enterprise conferencing). The thing is they just cost a lot of money and require dedicated hardware. Here's one company I know of that does this: http://www.haivision.com/
We have a Polycom conferencing camera (CX5000), which seems to be something originally designed by Microsoft and called Roundtable. It works well, including for international calls — we have regular calls using Skype without issue to Canada and Taiwan from Europe, and irregular calls elsewhere.
It does have an essentially perfect 1Gb/s Internet connection.
sigh Polycom - such good hardware with such awful software.
We have Polycom VVX VOIP phones, and the call quality blows the socks off any of the competition. However, we also have to maintain a TFTP provisioning server that feels like it was last updated in the 80s. I have a feeling that Polycom have only just discovered the internet. Hopefully they can overcome this last hurdle.
We have a couple of Polycom CX5000s too - the second one we had to source from eBay! We use Skype for Business for 100% of our voice and video communication, and have virtually no trouble with it.
Any problems in my experience stem from remote users' ISP bandwidth issues, or their inability to read instructions. No complaints about SfB.
I wonder if the author is using Skype for Business or the consumer Skype.
We're using Skype for Business and it's a daily source of issues. Call quality is often terrible and - unrelated to the audio/video subject - screen sharing is so laggy that it is literally unusable most of the time ("Please click over there." [wait for visual update] "Now do...").
Another fun issue is that you cannot directly paste stuff without starting a conversation. It will block your message as "too long" most of the time, unless you write something meaningless and short first. "Hey. Skype sucks. Incoming error I need your help with:".
Video conferencing/screen share is a huge part of how my company works - and SfB is extremely reliable for us. It 'just works', if it didn't we would switch to something else immediately.
Part of the problem is that the camera is positioned above the display, so the person on the other end isn't looking you in the eyes. It's a bit disconcerting.
Perhaps someone will invent a one-way transparent display where the camera will look out from behind the middle of the monitor. Alternatively, we could position multiple cameras around the edges of the display and then use real-time video processing software to combine those streams into a single synthetic video which makes it appear as if the person is looking at you.
I guess this is a fair time to promote the Video Conferencing startup(-ish) I'm working for: StarLeaf https://www.starleaf.com
The author never says exactly what they want fixed, but we sure know most VC systems are a horrible mess. We spend a lot of effort on UX and making good quality calls work through NATs and random packet loss.
The main products are meeting room solutions, which tend to be least well served by free offerings. Squeezing everyone in front of a laptop is never great.
Author is bashing VR but what if conferences in 3d environments are a better solution than skype? People use 3d / metaverse solutions for education for example, because it's a compelling solution. A gaming environment may in fact be more productive than looking at 100 thumbnails of participant faces.
video conferencing is appealing because we communicate as much through body language and facial cues as by voice. We evolved to communicate face-to-face and end up missing out on a lot of information if we only represent ourselves with our voices and a visual avatar.
A large part of the body language component is reproducible today with hand trackers + head orientation. In terms of facial cues there have been some interesting demos with gaze tracking being reproduced onto the visual avatar.
but have we gotten past the uncanny valley yet? I've seen this attempted previously but it always seems very "off" in a way that engenders distrust. When we could just fix video streaming, it seems redundant to try to accurately replicate body language in a 3d avatar.
If the cost and cumbersome nature of VR can be reduced to bring it in line with just putting on a set of headphones and still be aware of the outside environment, sure.
>The real coffin-nail for videophony involved the way callers' faces looked on their TP screen, during calls. Not their callers' faces, but their own, when they saw them on video.
>This sort of appearance check was no more resistible than a mirror. But the experience proved almost universally horrifying. People were horrified at how their own faces appeared on a TP screen. It wasn't just 'Anchorman's Bloat,' that well-known impression of extra weight that video inflicts on the face. It was worse. Even with high-end TPs' high-def viewer-screens, consumers perceived something essentially blurred and moist-looking about their phone-faces, a shiny pallid indefiniteness that struck them as not just unflattering but somehow evasive, furtive, untrustworthy, unlikable.
>The proposed solution to what the telecommunications industry's psychological consultants termed Video-Physiognomic Dysphoria (or VPD) was, of course, the advent of High-Definition Masking. Mask-wise, the initial option of High-Definition Photographic Imaging — i.e. taking the most flattering elements of a variety of flattering multi-angle photos of a given phone-consumer and — thanks to existing image-configuration equipment already pioneered by the cosmetics and law-enforcement industries — combining them into a wildly attractive high-def broadcastable composite of a face wearing an earnest, slightly overintense expression of complete attention.
>It turned out that consumers' instinctively skewed self-perception, plus vanity-related stress, meant that they began preferring and then outright demanding videophone masks that were really quite a lot better-looking than they themselves were in person.
>Even then, of course, the bulk of U.S. consumers remained verifiably reluctant to leave home and teleputer and to interface personally, though this phenomenon's endurance can't be attributed to the videophony-fad per se, and anyway the new panagoraphobia served to open huge new entrepreneurial teleputerized markets for home-shopping and -delivery, and didn't cause much industry concern.
The question this article raises for me is - would people really object if you kept hens in a Palo Alto back garden? I keep chickens in London and I've never had complaints.
Can't speak as to how it scales but I had to use both webex and Zoom in the past year. Paid corporate accounts etc..
No affiliation to either.
Webex Linux support is unbelievably poor (literally more than half a day's work to get it installed on Linux).
Sound quality deteriorates on international calls.
Just really crappy experience overall.
Zoom has rpms/deb packages available. Installation- 30 secs..
Sound quality was crystal clear on international calls.
Webex is one of the few product offerings that makes me truly angry.
Agreed! We use Zoom almost exclusively at Elastic. Since we're a fully distributed company, we rely on video conferencing pretty heavily. Zoom handles everything from a handful of people up to full-company meetings. It's pretty impressive to be honest, and doesn't try to melt a hole through your desk like Hangouts.
I use it actively on both Mac and Linux.
Not affiliated with Zoom at all, just a very pleased user :)
I really like https://appear.in - simple, straight-forward, and "just works". Don't know how well it works for more than two people, though. You can add screensharing with a Chrome plugin. It's all I need, really.
we use appear.in daily - it works well up to 5 people. It also has an interesting feature to allow you to "mute" certain participants, which works well in our use case where some participants can be in the same physical room. On "older" (say 2012 macbook pro), it easily pegs chrome at 100% CPU though
Paid Zoom works generally well, and much better on mobile than alternatives - you can also record meetings, and the recordings are of decent quality.
Skype for Business? Lol. If only it was possible to _join_ a meeting without an account, that would be great.
EDIT: someone asked if I'm seriously recommending bluejeans; I am, yes. Indeed it's hilariously CPU heavy and their linux support is a joke, but it's still working compared to other solutions. At work we have some vidconf equipment in some rooms which can dial bluejeans on their own and they usually work pretty well, so yes.
I use meet.jit.si (or a self-hosted version of the software) pretty regularly and am a big fan. It's probably broadly comparable to other things people have suggested since it actually powers a lot of other video solutions behind the scenes, but I like their little demo service because it doesn't require auth and is pretty bare bones.
I am the founder / CEO of Highfive. One of the companies implicated in Alex's post :-)
I know some in the YC community checked out Highfive early in our maturity curve when we released our product 2 years ago. I would love for folks here to try it out again and tell us what they think.
I took a few minutes to write up a few lessons we have learned over the last 4.5 years. It's definitely been harder than anything I've ever worked on. Would love your thoughts.
This has to be the best non-tech-documentation piece I've ever read on Hacker News. The author makes great points with well-placed sarcasm.
I know, I know. Corporations go where the money flows, and they don't want to change the status quo if there's no money in that revolution (case in point: not attacking the rental cars / hotels / airlines). I am well aware.
But the author is still very correct on a conceptual level.
Apart from BigBlueButton, are there any other open-source video conferencing solutions out there?
Rather than so many different versions of a flawed product, I wonder how things would work out if a lot of creative individuals worked on solving the same problem.
Also, I think another pain point is that webcam and audio setup is tricky for the average Windows PC user. Is there a product that helps figure out the quality score of your setup and potentially helps with it?
The fix for video conferencing is AR. I have been video conferencing for 20 years, and after the couple of minutes of introductions, the video feed is not needed 99% of the time.
Screen sharing on the other hand is great, just about always need that. But we really don't have to be looking at each other constantly.
So stop trying to fix something that is inherently not needed.
No thanks. If video conferencing starts to work, my customers will start to ask for it. And the loss of productivity will dwarf the annual spending on business trips.
You executives can go on and fly, don't ruin work for everybody else.