I don't know about those theories. I suspect something else. Do you remember landlines, and how responsive they were? You would say things back and forth and the conversation just worked, even on three-way and conference calls. There was very little noticeable lag!
Even in the early cellphone days, I'd often prefer using my landline. There was less lag. Cellphones felt like speakerphones that try to do that cancellation stuff, and you feel like the conversation is split into pieces, like a walkie-talkie where you don't control the "over" button press.
Cellphones have gotten better, but they've never really hit the latency of landlines (or maybe they have and I didn't notice; I never actually talk to people on my phone any more). Video conferencing is like that, but worse. It's really apparent if you're on a remote team and sitting next to someone on the same call: you can hear the delay.
I think a lot of the fatigue comes from that delay. We can connect more people now, with video, over great distances, but it comes at the cost of moving from dedicated virtual circuits to packet-switched networks and transcoding on some central server.
Landlines were so fast and so "direct" in their latency (where distance correlates very directly with time, due to a lack of "hops") that local phone calls were faster than the speed of sound across a table. For a while after they came out, before people generally got used to seemingly random latency, local calls felt "intimate", as if you were talking to someone in bed with their head right next to you. I have also heard stories of negotiators who had become finely tuned to reading how long people paused to think, and who found that long-distance calls confused them and threw them off their game. But no: cell phones haven't become as fast as landlines, and likely never can, due to the fundamental contrast between compressed, packetized, routed audio and an analog signal propagating over a circuit-switched wired connection.
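To put the "faster than sound across a table" claim in numbers, here is a back-of-the-envelope sketch. The 5 km path and the 1 m table are hypothetical, 343 m/s is the speed of sound in room-temperature air, and the signal velocity of 0.1c is a deliberately pessimistic assumption (voice-band signals on twisted-pair copper travel well below the vacuum speed of light, by an amount that depends on the cable). It counts only propagation, not switching.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0
SPEED_OF_SOUND_M_S = 343.0  # dry air at about 20 °C

def one_way_delay_ms(distance_m: float, speed_m_s: float) -> float:
    """Propagation delay in milliseconds over distance_m at speed_m_s."""
    return distance_m / speed_m_s * 1000.0

# Hypothetical figures: a 5 km copper path at a pessimistic 0.1c,
# versus sound crossing a 1 m table.
wire_ms = one_way_delay_ms(5_000.0, 0.1 * SPEED_OF_LIGHT_M_S)
table_ms = one_way_delay_ms(1.0, SPEED_OF_SOUND_M_S)
print(f"wire: {wire_ms:.3f} ms, across the table: {table_ms:.3f} ms")
```

Even with the pessimistic velocity, the wire comes out more than an order of magnitude faster (about 0.17 ms versus about 2.9 ms).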
I have very clear memories of my first overseas long-distance calls in the 1980s, which were over a satellite link. The latency was so pronounced (about half a second) that carrying out a conversation took quite a bit of practice. You almost had to carry out a formal protocol of handing over control of the line, as if you were on a walkie-talkie.
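The half-second figure is roughly what geometry predicts. A minimal sketch, assuming a single geostationary hop (35,786 km altitude) with both earth stations directly beneath the satellite, which makes the result a lower bound:

```python
GEO_ALTITUDE_KM = 35_786.0  # geostationary orbit altitude above the equator
SPEED_OF_LIGHT_KM_S = 299_792.458

def geo_hop_delay_s(hops: int = 1) -> float:
    """Lower bound on one-way delay for `hops` ground->satellite->ground hops.

    Assumes the earth stations sit directly under the satellite; real slant
    paths are longer, and the equipment adds its own delay on top.
    """
    path_km = hops * 2 * GEO_ALTITUDE_KM
    return path_km / SPEED_OF_LIGHT_KM_S

one_way = geo_hop_delay_s()
round_trip = 2 * one_way  # you speak, they hear, they answer, you hear them
print(f"one way >= {one_way * 1000:.0f} ms, round trip >= {round_trip * 1000:.0f} ms")
```

That gives about 239 ms one way and about 477 ms before a reply can possibly reach you, before any equipment delay; a call routed over more than one satellite hop multiplies this.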
I have equally clear memories of my first overseas calls carried by undersea fiber because the lack of latency was so pronounced compared to what I had become accustomed to by then.
I still can barely tolerate carrying on a conversation on a cell phone though. The latency and compression artifacts are just horrible compared to a landline. VOIP has gotten pretty good though. I can't tell the difference between a good VOIP landline and a POTS line.
I'm in the military. I have found that the habit of saying "over", learned as a way to manage crowded, open radio circuits, is quite useful in teleconferences and VTCs, for similar reasons. It makes it clear that you're handing over the circuit. "Out" means you're leaving the circuit.
I think that can work well, as long as people also adopt the military/aviation habit of speaking crisply and not rambling on to hog the channel until they're absolutely sure they've emptied themselves of every possible thought they could ever have on the matter. Like I just did with every word after "rambling on". ;)
Yeah, this. The experience with overseas calls I was referring to was talking to my then-girlfriend who was spending a year studying in France. Saying "over" all the time is not very romantic.
On the amateur radio side you see both types mixing in pretty amusing ways. On the one hand there are contesters and DXers who are all about confirming contacts as fast as they can, and on the other there are more casual people who'll talk about anything and everything that comes to mind. In my experience both sides use some amount of procedural language, but at very different cadences.
In the early 1990s I was sharing a flat in Edinburgh with a guy whose dad was working in Nigeria. The delay was routinely 4-5 seconds, which almost always took us a frustrating 30 seconds or so to sort out while we went backwards and forwards: both saying "no, you start" at the same time, leading to 10 seconds of silence, then both starting again...
There was a hack I used to use to force a call over cable, back when I made a lot of international calls for BT while working on international interconnect for X.400.
You should note that most landlines have significantly more delay than they used to have 50 years ago.
All landlines now get digitised and packetized, and usually go over an IP network, frequently via the landline company's HQ hundreds of miles away, before heading back to your neighbour next door, whom you were calling...
Data is the same - If I ping my next door neighbor, from my broadband connection to his, it goes via London (my ISP's headquarters) and Manchester (his ISP's headquarters) before coming back to him, with a round trip latency of >20ms. In sound terms, that delay is like him standing in the next room over.
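The "next room over" comparison can be checked by converting a delay into the distance sound covers in that time. This sketch assumes 343 m/s for sound in room-temperature air and a symmetric path, so half of the 20 ms round trip is one way:

```python
SPEED_OF_SOUND_M_S = 343.0  # dry air at about 20 °C

def acoustic_equivalent_m(delay_ms: float) -> float:
    """Distance sound travels through air in delay_ms milliseconds."""
    return SPEED_OF_SOUND_M_S * delay_ms / 1000.0

rtt_ms = 20.0
one_way_ms = rtt_ms / 2  # assumes the path is symmetric
print(f"{one_way_ms:.0f} ms one way ~ {acoustic_equivalent_m(one_way_ms):.2f} m of air")
```

Ten milliseconds one way works out to about 3.4 m of air, roughly the distance to someone standing in the next room.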
> Data is the same - If I ping my next door neighbor, from my broadband connection to his, it goes via London (my ISP's headquarters) and Manchester (his ISP's headquarters) before coming back to him, with a round trip latency of >20ms. In sound terms, that delay is like him standing in the next room over.
I feel like that would be enough to be basically perfect, though. If only we could cut down on everything else in the chain adding its own latency.
I think the rule of thumb I used to go by when recording myself playing guitar was that 20ms was a noticeable delay, 100ms was sort of tolerable and anything more than that was enough to make my picking lose synchronization.
There's no compelling reason we couldn't have IP audio links (with an easy calling interface) that have no such compression artifacts. I would actually call it more of a noise-gate issue: perhaps trying to save bandwidth by maximizing the compressibility of quiet parts, treating them as total silence, signal off. That kills the intimacy, IMO.
We've had it for so long now with digital cell phone calls, even HD calling, certainly Zoom et al (some are particularly aggressive on the noise gate when others are speaking over you).
The process of waiting to create the first packet to send (even with no compression) will always be slower than a system where you just send the audio instantly as an analog signal. Even waiting for a single "sample" of audio (for a tiny, tiny packet) is technically slower (though of course no one does that: they tend to group at least 2 ms of audio into a packet). The only way you can go faster than a classic landline (assuming there were no signal repeaters, i.e. one of those landlines where the signal also gets quieter as you get further away) is if you can go a shorter distance, or use materials with less resistance (maybe "lower relative permittivity"? I don't remember enough physics to know which measurement applies here) to build the wiring or (probably your best bet) the switches.
Is noise gating why you can't talk over each other? I am terrible about doing this (I think of it as synthesis: two excited people grabbing the idea and running with it, then the other interrupts and takes over), and while I acknowledge it's a bad behaviour, why do video calls and most digital audio calls not let it happen? I get that there's delay, so they can't exactly blend the two audio streams together, but what is this crazy limitation that only one person can talk at a time?
Delay/latency is a totally separate issue that also has negative effects, but not in a way that I would characterize as a loss of intimacy. Full duplex vs. half duplex plays into it, as I alluded to: the symptom of half duplex is that the louder person "wins" temporary exclusivity, which causes the quieter source to drop out in a way that isn't entirely different from what a noise gate does. The difference is that dropping out due to half duplex depends on the relative levels of the two sources, while dropping out due to a noise gate depends on whether the source's absolute level clears some set noise threshold. Either way, all the dropouts where quiet becomes silent remove breath noise, saliva noise, the kind of laugh that manifests as just a strong nasal exhale, and plenty of other sounds. Think ASMR videos.
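The packetization wait is simple arithmetic: a packet cannot leave until its last sample has been captured. A minimal sketch, assuming narrowband 8 kHz telephone audio; 0.125 ms is a single sample and 20 ms is a commonly used VoIP packet interval. It counts only the wait to fill the packet; codec lookahead, jitter buffers and network transit come on top.

```python
SAMPLE_RATE_HZ = 8_000  # narrowband telephone audio

def samples_for(duration_ms: float, sample_rate_hz: int) -> int:
    """Number of samples needed to cover duration_ms of audio."""
    return round(duration_ms * sample_rate_hz / 1000.0)

def packetization_delay_ms(samples_per_packet: int, sample_rate_hz: int) -> float:
    """Time spent waiting to fill one packet before it can be sent."""
    return samples_per_packet * 1000.0 / sample_rate_hz

for frame_ms in (0.125, 2.0, 20.0):
    n = samples_for(frame_ms, SAMPLE_RATE_HZ)
    wait = packetization_delay_ms(n, SAMPLE_RATE_HZ)
    print(f"{n:4d} samples per packet -> {wait:.3f} ms before it can be sent")
```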
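The distinction between the two kinds of dropout can be made concrete. This is a minimal sketch, not any product's actual algorithm: it assumes frames of float samples, uses RMS as the level measure, and the threshold and sample values are hypothetical.

```python
from typing import List, Tuple

Frame = List[float]

def rms(frame: Frame) -> float:
    """Root-mean-square level of one frame of samples in the range -1.0..1.0."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def noise_gate(frame: Frame, threshold: float) -> Frame:
    """Absolute rule: mute the frame if its own level is below the threshold."""
    return frame if rms(frame) >= threshold else [0.0] * len(frame)

def half_duplex(a: Frame, b: Frame) -> Tuple[Frame, Frame]:
    """Relative rule: only the louder of two simultaneous sources gets through."""
    if rms(a) >= rms(b):
        return a, [0.0] * len(b)
    return [0.0] * len(a), b

breath = [0.01, -0.01, 0.01, -0.01]  # hypothetical quiet exhale
speech = [0.5, -0.4, 0.45, -0.5]     # hypothetical talker
gated = noise_gate(breath, threshold=0.02)  # muted whether or not anyone else talks
heard_breath, heard_speech = half_duplex(breath, speech)  # muted because speech is louder
```

The breath disappears in both cases, but for different reasons: the gate judges it against a fixed threshold, the half-duplex switch against the other talker.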
I never thought I'd miss working in a call center, but man do I miss my landline Jabra hardwired headset with an in-line mute button!! That place was a hellhole, but their call systems were fantastic.
I have a Logitech H390, which is better than any headset I ever used in a call center (mainly Plantronics). It has an inline mute. It's only $30-$40. Also, the call systems I used were often crap: every 30th call would just be static, or there'd be weird behavior like agents getting routed to each other.
That and you’re constantly struggling to decipher audio and make yourself understood. Much like talking in a bar that’s too loud is exhausting. A lot of the exhaustion seems like it’s the difficulty of compensating for a high latency, low-bandwidth, low quality experience. We’re just not evolved for it—reality is the opposite of all of those things.
This is it. The issue is the exhaustion of compensating for all the audio problems: dogs barking, room echoes, compression, images out of sync, the delay in reactions (constantly being unsure whether you were heard, not knowing how loud you should be speaking), being asked to repeat yourself, and the mix of your audio with everyone else's.
It's a lot to process. With 15 people on "speakerphone", the audio is IMO the biggest and most taxing problem.
It's normally participants with bad mics and headphones, AGC, and the crappy audio support in Windows.
I do a number of actual-play games, and it's the players trying to use the built-in mics/headphones who have difficulty hearing the group conversations.
I use a high-quality external sound card and don't have that problem, though Hangouts does seem to disconnect me when I don't speak for a while, as my noise floor is so low.
Siri has a huge latency problem. I say "Hey Siri" and it's precious seconds before she's listening. (Articles online say it's possible to just ask the entire question without waiting for the beep, but that doesn't work; sometimes Siri's just not ready.) Someone at Apple clearly needs to re-engineer Siri, and I say this as someone who's bought into the Apple ecosystem.
Siri is like the Apple Maps of voice assistants.
Alexa (at least Echo gen 3) doesn't have this delay. I can talk with Alexa without getting frustrated at the latency. (unfortunately Alexa doesn't handle pauses or stumbles in sentences well)
It is interesting. I don't get exhausted because I don't talk to the assistants that much, but people talk to their virtual assistants in a different cadence or register. Instead of "alexacanyoutellmethetime?" it is "ALEXA...WHAT...TIME...IS...IT."
Owning an Alphabet Corp Branded Espionage Hockey Puck (tm) myself, I must admit I'm fairly impressed at how far voice recognition and natural language processing have come, despite this. I remember when you really had to talk in that stilted, properly pronounced, methodical way, and then tentatively wait for a response for what seemed an age, and it wasn't that long ago (maybe four or five years?). Nowadays, I can vaguely mumble at mine from the other room without thinking too much about sentence structure or how I'm pronouncing words or the volume or speed of my dictation or switching off things making background noise, and it usually does the right thing, and responds about as quickly as a person. It definitely feels a lot less mentally draining!
I suspect you're actually misidentifying the culprit.
Lag is generally not noticeable on cellphones, and lag has always been a problem with landlines on long-distance (especially far-international) calls.
What you're describing "like a walkie-talkie" is actually the difference between half-duplex and full-duplex.
Landlines are full-duplex, and cell phones were notoriously bad half-duplex... until 4G which is considered full-duplex. There's also the issue of carriers muting audio completely when it's under a certain threshold, which makes them less responsive, because you have to ask "are you there?" every so often.
Videoconferencing latency is a separate issue, because there's one central server for each meeting, so if it's between the NY and SF office, the server may be in SF, so two separate NY participants get double the latency talking to each other than they do talking to SF. But, everybody hears the same audio.
You might say, well why don't they select a server in the middle of the country for that call then? But selecting an optimal server location for a videoconferencing call is difficult, even if you have access to cloud locations all over the country/world, because you usually don't know in advance who's going to join the call. So whoever hosts the call or whoever's the first participant, the server is often chosen to simply be whichever one is closest to them, which can sometimes be quite suboptimal once everyone's joined. (And I sure wouldn't want to be the one to write code for switching servers mid-call.)
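The server-placement problem can be sketched with a toy model. Everything here is an assumption for illustration: the latency figures are invented, there is one server per meeting, and the delay between two participants is simply the sum of their delays to the server.

```python
from itertools import combinations
from typing import Dict

Latencies = Dict[str, Dict[str, float]]

# Hypothetical one-way latencies (ms) from each participant to each candidate server.
LATENCY_MS: Latencies = {
    "sf_host": {"sf": 2, "chicago": 25, "ny": 35},
    "ny_1": {"sf": 35, "chicago": 10, "ny": 2},
    "ny_2": {"sf": 35, "chicago": 10, "ny": 2},
}

def worst_pair_ms(server: str, latency: Latencies) -> float:
    """Largest participant-to-participant delay when all audio goes via `server`."""
    return max(latency[a][server] + latency[b][server]
               for a, b in combinations(latency, 2))

def nearest_to(participant: str, latency: Latencies) -> str:
    """Naive choice: the server closest to whoever starts the meeting."""
    return min(latency[participant], key=latency[participant].get)

def best_server(latency: Latencies) -> str:
    """Minimize the worst pairwise delay; needs the full roster up front."""
    servers = next(iter(latency.values()))
    return min(servers, key=lambda s: worst_pair_ms(s, latency))

naive = nearest_to("sf_host", LATENCY_MS)
better = best_server(LATENCY_MS)
print(f"naive {naive}: worst {worst_pair_ms(naive, LATENCY_MS)} ms; "
      f"best {better}: worst {worst_pair_ms(better, LATENCY_MS)} ms")
```

With the host's nearest server, the two New York participants reach each other via San Francisco at 70 ms, roughly double the 37 ms either of them sees to the host; a server in the middle halves the worst case, but only if you know the roster in advance.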
Also, videoconferencing is much closer to "half-duplex" because if you were always mixing everyone's audio together, it would be a noisy mess.
But in conclusion -- latency/delay isn't a significant new problem that digital networks have introduced. It's always been there, but duplex, silence thresholds, and conference calls are the more important factors to pay attention to.
Just another example of how the transition to digital from analog wasn't a complete solve. Switching from analog cable to digital cable also introduced this latency so that button presses on the remote are not instantaneous (due to timing issues switching between transport streams). It's kind of like Coke/New Coke/Coke Classic. Eventually, those that remember will fade away, leaving only those familiar with the current situation.
The best communication system I've ever used was the Clear-Com party line intercom in my high school's auditorium. The thing was older than me by at least 10 years. Hand assembled circuit boards in steel chassis. But:
- The headsets were comfortable and practical. I have not seen anything sold for office or even gaming use begin to approach the ergonomics. Plantronics, Logitech, none of it.
- Every headset had foldback, so even the big sound-isolating cans felt natural to speak into. I've spent several hours messing with virtual sound cards trying to make this work with Zoom and never managed it.
- The signal was so clear, and the mics were so good, that as long as you enunciated you could speak just a hair above a whisper and be reliably understood. I remember analog PSTN voice, and this was 10x better. Might have to do with less aggressive EQ.
- Full duplex, no gating. Absolutely no problem with multiple people speaking at the same time. At very busy parts of the show we might intentionally have two simultaneous interactions on the same channel. It was a bit of work to unpack in your brain, but no harder than it would have been in person.
- Obviously no noticeable latency (basic fitness for purpose - these things are for calling cues).
- There was a little bit of analog hum + background noise from each station, so when someone opened their mic you would notice, but it wasn't disruptive. When the channel was busy, this sort of substituted for body language and you'd be invited to speak at the first opportunity.
- Physical switch to open and close the mic. I know Zoom has spacebar push-to-talk, but a single-press latching on/off switch is also important.
It's absolutely astounding to me just how much better these systems are than their nearest alternatives 30+ years later. If I ever control a tech office, particularly with an operations component, I'll seriously consider installing one.
> It's absolutely astounding to me just how much better these systems are than their nearest alternatives 30+ years later.
FWIW, what you're describing is achievable with a decent audio interface, something like the AT BPHS1/BPHS2, and some cough/mute switches. My travel rig for video has full direct monitoring/sidetone for up to eight microphones/headsets, including wireless transmitters if I need to go that route; it's fantastic.
The current state of affairs for normal people is as bad as it is because normal people don't care.
Having used these systems in theatres/live events while I was at Uni I fully agree. These systems are amazing.
The reason is very similar to landlines. A dedicated board in each station that shares three copper lines with every other station. In this scenario you cannot beat analogue.
The ease of setup is also fantastic. All you need is some XLR cables. At some point on the line you need a power supply, which usually acts as a signal splitter so you can have lines going in different directions. Then just plug everything together.
> - The signal was so clear, and the mics were so good, that as long as you enunciated you could speak just a hair above a whisper and be reliably understood. I remember analog PSTN voice, and this was 10x better. Might have to do with less aggressive EQ.
OTOH everyone on a Web video chat always sounds like they're YELLING. Even if the volume's turned down, the tone registers as yelling. Having those happening nearby with regularity, even behind a (thin) wall, is about as bad as the proverbial open-office-seated-near-sales situation.
I think people talk louder on cell phones than they did on old analog land lines, too.
It might have something to do with the lack of sidetone[1] (where you can hear your own voice through the earpiece), which had the effect of making you naturally lower your voice. Landlines always had sidetone, but most cell phones don't.
> Every headset had foldback, so even the big sound-isolating cans felt natural to speak into. I've spent several hours messing with virtual sound cards trying to make this work with Zoom and never managed it.
I have an external USB microphone from Shure that has an integrated 3.5mm headphone jack and mic volume dial. It mixes the sound of your own voice with the line-out from your computer, just like landline phones do. You might want to give it a try.
That's called sidetone. You really don't want your computer's sound system to generate that for you because the latency is even more distracting than not having it. It's much better to have it done in hardware, either by having the sound hardware direct mic input directly to the headset or with a headset that has it built in.
A good recording system can do it in software and be acceptable. You can even apply some filters to the output (recommended, because the right kind of feedback to singers improves their performance) and get good results. You have just over 5 ms to do all of the above, which is possible but not easy.
It doesn't, due to potential echo, but you can set this up yourself with most headsets or with a mic and mixer; you need to with headphones that shut out your voice, or you will shout to compensate.
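The 5 ms budget can be translated into audio buffer sizes. A rough sketch, assuming one input and one output buffer of equal size at 48 kHz, and ignoring converter and driver overhead (which only add to the total):

```python
BUDGET_MS = 5.0  # the rough software-sidetone budget quoted above

def monitor_latency_ms(buffer_samples: int, sample_rate_hz: int, buffers: int = 2) -> float:
    """Mic-to-headphone delay when audio passes through `buffers` equal buffers."""
    return buffers * buffer_samples * 1000.0 / sample_rate_hz

for size in (32, 64, 128, 256):
    ms = monitor_latency_ms(size, 48_000)
    verdict = "within budget" if ms <= BUDGET_MS else "over budget"
    print(f"{size:3d}-sample buffers at 48 kHz: {ms:5.2f} ms ({verdict})")
```

Under these assumptions, buffers of 64 samples or fewer fit comfortably, while 128 samples already lands just over 5 ms with no time left for filtering, which is why hardware direct monitoring is the easier route.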
I tend to put my headphones on only halfway, also to catch anything going on in the background (noise cancelling headphones + my back is to the bedroom door).
It's pretty awkward to wear only one ear of my Bose headphones, but with earbuds it works great. Apple's are the least bad solution I've ever found for voice calls.
I have open back headphones that I use at home (can't use at the office because too much sound bleed), and they are great for video calls because of this since they let in the sound of my own voice.
> Do you remember landlines, and how responsive they were?
They weren't always, except for local calls. My background: I made a lot of overseas phone calls in the 1980s to talk to my father, from North America to Japan. I also took a telecom class while going through uni.
When calling Japan, not only was it crazy expensive, but the connection was typically terrible. We considered it amazing to be calling Japan at all, so that was ok. Issues:
1. Latency. You had to just start talking when the ringing stopped; if you waited to hear the person say "hello", they'd generally hang up before they heard you. Sometimes the latency was around 5 s, and we always said this must be a "satellite" connection rather than an undersea cable.
2. Voice quality. You could hear and recognize the person on the far side, but that's about it.
3. Cost. I don't remember the cost exactly since I was young, but I remember it was between $2 and $5/min in 1980s dollars. Calls were short.
Latency was a major focus of telecom networks for a couple of reasons. One was human: obviously humans really prefer low latency in telecom networks. Good old-school POTS lines also do sidetone[1] the old-fashioned way, and if the round-trip time gets too high it's really disruptive to hear your own voice come back > 300ms later. But this causes a big echo problem, so having a shorter round-trip time makes the echo cancellation filters shorter/easier.
1) Sidetone is hearing your own voice come out of the speaker. Analog POTS lines are super cool this way: on longer connections the signal is weaker, so the sidetone is weaker, so the human speaks louder. This means their own voice is louder, so the person at the far end can hear them over the weaker signal.
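The link between echo delay and echo-canceller size can be sketched in numbers. This assumes a plain adaptive FIR echo canceller (the NLMS-style kind) running at 8 kHz; the echo path delays are illustrative. The filter needs one tap per sample of echo delay, so both its length and its per-sample cost grow linearly with the delay:

```python
NARROWBAND_HZ = 8_000  # classic telephone sampling rate

def echo_canceller_taps(echo_tail_ms: float, sample_rate_hz: int = NARROWBAND_HZ) -> int:
    """FIR taps an adaptive echo canceller needs to span a given echo path delay.

    Each tap covers one sample period, so the filter has to be at least as
    long as the time it takes for the echo to come back.
    """
    return round(echo_tail_ms * sample_rate_hz / 1000.0)

for tail_ms in (16, 64, 128):
    taps = echo_canceller_taps(tail_ms)
    # One multiply-accumulate per tap per sample just to form the echo estimate;
    # adapting the coefficients costs extra on top of this.
    macs_per_second = taps * NARROWBAND_HZ
    print(f"{tail_ms:3d} ms echo path -> {taps:4d} taps, {macs_per_second:,} MACs/s")
```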
Agreed. After reading the title, I immediately thought of the Doherty Threshold [1]. It states that if the system can respond to user actions and give the user feedback within 400ms, it increases users’ attention and productivity.
That constant delay, plus the worse ones when the connection loses strength, put a constant toll on "zoom" conversations.
I think that is the main factor. Many of the items brought up in the article also add to it, but my guess is that if there were no delay or drop in quality, it would be very manageable.
I can't believe basic things like typing out an email in Outlook visibly lag at times. I'm a decent typist, but I find it absolutely ludicrous that this quad-core monster is at times half a word behind my typing.
I can type up to 180 wpm, and I have to intentionally slow myself down to half speed to use Microsoft's Start menu. Wow. I type "cmd.exe" or "notepad" in the Start menu and see those applications briefly appear as I type, but by the time I press Return it has forgotten the match and launches a Bing search instead.
I feel like Microsoft is still clueless, looking at Bing analytics and thinking millions of people search for "notepad" every day because they really think it's a great thing to research.
I'm amused because I remember typing on an 8-bit computer over a 300 baud modem to a different 8-bit computer that was echoing back, and it was always able to keep up. Like you, I find anything that can't do as well as that setup unacceptable. Too bad I can't get IT to put some reasonable selection criteria in the system.
inadvertent: people who talk too loud, don't understand the lag, or don't notice their dropouts
should-know-better: people who use speakerphone and inflict it on those who wear headsets to give others good sound, or who carry their devices around while moving
> people who use speakerphone and inflict it on people
To be fair, this varies by speakerphone. I use a landline phone with a high quality speakerphone. As far as I've been told, the quality is as good as possible (limited by the line I'm dialing into).
I feel like you've hit the nail on the head. I remember, growing up, I'd have long conversations on the landline and it never really felt burdensome.
With Zoom, FaceTime, etc., it's hard to gauge who's talking to whom. Side conversations within groups are impossible. You also can't get a feel for another person's body language.
Latency is one issue, audio quality is, I suspect, another. Most VoIP applications have this characteristic "hyper-compressed" sound to them that I find tiresome in and of itself — sort of the same way that I find shopping mall-style lighting exhausting.
If the other person isn't close to you, no system can do it. At about 1000 miles you will be out of sync no matter what the system. Sensitive singers will run into trouble at just 500 miles. The speed of light in other media is slower than in a vacuum; radio could probably stretch past 1000 miles if you can get line of sight.
You'll be happy to know that Damon Krukowski covered this on his podcast Ways of Hearing, in episode 3, entitled "Love". It goes very in depth on how "terrible" even today's cell phones sound vs. a landline. https://www.radiotopia.fm/showcase/ways-of-hearing
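The raw propagation numbers can be sketched as follows, assuming a refractive index of about 1.47 for silica fiber (a typical value) and treating radio through air as essentially vacuum speed. These are one-way figures for a straight path; real routes are longer, and every buffer in the chain adds to them.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0
METERS_PER_MILE = 1_609.344
FIBER_INDEX = 1.47  # typical for silica fiber (assumed)
AIR_INDEX = 1.0003  # radio through air is within a hair of vacuum speed

def propagation_ms(miles: float, refractive_index: float) -> float:
    """One-way propagation delay in ms over a straight path of `miles`."""
    speed_m_s = SPEED_OF_LIGHT_M_S / refractive_index
    return miles * METERS_PER_MILE / speed_m_s * 1000.0

for miles in (500, 1000):
    fiber = propagation_ms(miles, FIBER_INDEX)
    radio = propagation_ms(miles, AIR_INDEX)
    print(f"{miles:4d} mi: fiber {fiber:.1f} ms, radio {radio:.1f} ms (one way)")
```

That is roughly 8 ms one way over 1000 miles of fiber, or about 16 ms round trip before any processing, already close to the 20 ms that the guitarist above found noticeable.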
I've worked remote for a half dozen years now, and you're absolutely right, the lag is definitely the major factor. Minor factors are just the extra little effort expended in making sure your screen is showing, making sure everyone can hear you, the screen jumping focus as it follows the speaker. Just things like that add to it.
But holy crap, does VOIP (I guess) beat Zoom for lag. I recently had to call the support line for my wife's PC, and the latency during that call was around 1 s. No clue what the cause was, but it was incredibly painful to have a conversation that way.
At least in part, it depends on what the program prioritizes. Mumble prioritizes latency over sound quality: you start to sound bad sooner, but it's snappier. For most gaming, that's the right choice to make. For normal phone conversations, you might value sound quality more.
I'm using Jamulus for making music with others. It can reach latencies of < 25 ms. Chatting over Jamulus is much more pleasant than talking over a video conferencing application.
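The latency-versus-quality trade-off shows up most directly in the jitter buffer. This is a toy model, not how Mumble or any other program works: it assumes a fixed buffer, invented per-packet delays, and treats every packet that misses its playout time as a glitch. A bigger buffer means fewer glitches but a longer delay.

```python
from typing import List

def late_fraction(network_delays_ms: List[float], buffer_ms: float) -> float:
    """Share of packets that miss playout with a fixed jitter buffer.

    Playout is scheduled buffer_ms after the fastest observed delay, so any
    packet slower than that arrives too late and has to be concealed or dropped.
    """
    base = min(network_delays_ms)
    late = sum(1 for d in network_delays_ms if d > base + buffer_ms)
    return late / len(network_delays_ms)

# Hypothetical one-way delays (ms) for ten consecutive packets on a jittery path.
delays = [40, 42, 41, 55, 43, 40, 70, 44, 41, 48]
for buffer_ms in (5, 20, 40):
    playout_ms = min(delays) + buffer_ms
    print(f"buffer {buffer_ms:2d} ms: {late_fraction(delays, buffer_ms):.0%} late, "
          f"{playout_ms} ms total delay")
```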
Lots of normal things about human interaction are broken in Zoom meetings.
* You can't have a normal conversation with that much delay. Every time you talk you're probably interrupting someone, and that starts to wear on you.
* You can't have any side chats with your neighbors before and after the meeting, making everyone seem less like a person.
* You can't tell who's looking at you, so you always feel like you have to remain presentable and vigilant.
* There's no sense of space. You can't look over at a person if whoever is talking just teleports to the center of the screen. Or if the layout is fixed, sound isn't coming from anywhere in particular, so you need to search around for who's talking. And everyone on screen is looking in a random direction.
So why the fixation on video calls? I'm probably just an old curmudgeon, but to me the benefits do not outweigh the oddities being discussed. Everyone wearing their pieces of flair with their "personalized" backgrounds is tiresome.
I can see the benefits of a screen share/presentation, but I really don't need to see the other people involved. I don't care about tone of voice. I get sarcasm. For the people that don't, then even if they can see them, the sarcasm isn't more/less because it's a video call.
Video calls to me are just another version of "Hey look at me!!!! I'm special"
I have absolutely no use for videoconferencing professionally (and I work remotely, and have for nearly 20 years), but in the era of Corona I find we're doing Facetime or Zoom with friends and family for social reasons, and enjoying it for the most part.
I still find a video call more draining than physical interaction, but in the absence of the latter I'll take this in order to visit with our friends. We have a standing Friday happy hour with 2 other couples we used to see "in real life" all the time, and we look forward to it. My cycling group has taken to setting up "training Zooms" we can join so we can all suffer together on our indoor trainers, and that's nice, too.
What's interesting to me is that, now that video calling is becoming more mainstream, I wonder how much of it will stay after Corona, and how that will affect long-distance friendships.
In part due to the latency noted elsewhere in the comments, I find myself interrupting people more when I'm remote. With video on, the visual cue of someone getting ready to start talking makes that easier to avoid.
Secondarily, I have a much better time assessing whether other attendees are following a conversation. If I'm feeling lost, but everyone else looks like they're following along, maybe it's just me — and I'll sit tight and try to catch up. If others look bewildered, I'll start asking questions.
> With video on, the visual cue of someone getting ready to start talking makes that easier to avoid.
You have the possibility of using an actual raised hand to signal that you want to interrupt a discussion that has gone astray. But that only works in settings where all participants are visible even when silent.
meet.jit.si has that feature. It seems really superior to Zoom (no account needed, links in natural language, great UI, good sound/video quality), I don't know why more people don't use it.
I agree 100%: voice is fine for communication. Video should only be deployed for presentations, graphs, and other visual media that are necessary to get your point across. FaceTime is useless unless it's with the parentals every couple of months.
At this point I'm one of the few people still putting up the video feed, and I'm feeling pretty depressed. There was at least some interaction with that dumb feed of people's heads; now I feel alone and like I'm talking to a wall. You'd be surprised how much you rely on feedback and validation from body language alone when it's abruptly eliminated from your life.
Different people think differently. Some people would be happy with written communication only. Some people like audio even though it is slower. Some people need to see other people. It is all a compromise to get different people to work together. Sometimes that means I use mediums of communication that don't work well for me.
Email is a compromise for the readers. It works out more or less. There is no good answer, but if you understand that the compromise exists, you can work with it. Sometimes that means you have to work hard in a medium you don't like; other times you can argue yours is better and win.
In our (academic) meetings, people generally default to switching off their camera when there's more than a handful of participants. I find this much less stressful. I can also do some physical activity which helps concentration if I'm just listening.
> if the layout is fixed, sound isn't coming from anywhere in particular, so you need to search around for who's talking.
I’ve wondered about this: why aren’t individual speakers (people) each placed somewhere different on the stereo plane? Is it because the video is in front of us, and having people sound like they’re coming from the sides could be distracting by drawing our attention away from the screen?
Dolby Voice [1] provides this, and BlueJeans has support [2] for it. I've only been on one call with it, but as someone who (even before current circumstances) spends 4-6 hours/day on video calls, I'm very intrigued by the idea.
If you want to google more, the term generically is 'positional audio'. Dolby calls theirs 'Voice Placement'.
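For the stereo-placement idea, one simple approach is to give each participant a fixed pan position and use a constant-power pan law, so nobody gets louder or quieter as they move off-center. This is a minimal sketch with hypothetical participant names and evenly spaced positions; it is not how Dolby Voice or any other product implements voice placement.

```python
import math
from typing import List, Tuple

def pan_positions(n_speakers: int) -> List[float]:
    """Spread speakers evenly from hard left (-1.0) to hard right (+1.0)."""
    if n_speakers == 1:
        return [0.0]
    return [-1.0 + 2.0 * i / (n_speakers - 1) for i in range(n_speakers)]

def constant_power_gains(pan: float) -> Tuple[float, float]:
    """Left/right gains for pan in [-1, 1] using the sine/cosine pan law.

    left**2 + right**2 == 1, so a voice keeps the same loudness wherever it sits.
    """
    theta = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)

for name, pan in zip(["Ana", "Ben", "Chen", "Dee"], pan_positions(4)):
    left, right = constant_power_gains(pan)
    print(f"{name:5s} pan {pan:+.2f}: L {left:.2f}  R {right:.2f}")
```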
TeamSpeak (although that's audio only) allows you to place other speakers around you in 3D; I haven't tried it much, but I imagine it could be quite useful in certain situations.
I wonder if VR meetings will take off because of this crisis. Most of the issues you mention would be solved by VR, except the audio latency, which is the biggest hurdle.
For mass adoption the headsets would have to become even more comfortable and lightweight, the displays faster and higher resolution, with lower price and easier setup.
I haven't followed the latest developments, so we might be a couple of generations away from this, assuming the industry can keep progressing during downtime.
We did some meetings in VR. It definitely has some upsides, e.g. you can stand in a virtual circle and see who people are looking at, which makes at least that aspect of meetings better than video.
The big downside is the friction of getting it set up. Inevitably someone has a dead battery or their VR setup decides it absolutely needs to update drivers for the Nth time this week, and then everyone is waiting or unsure what is going on.
Also inevitably the same 3 or 4 people join all the Zooms with their camera off, and mic off, which creates a disconnect. This is akin to showing up to an in-person meeting under a giant blanket, only the sound quality of someone talking from underneath a blanket is significantly better than Zoom.
Any other Autism Spectrum or ADHD people who feel less fatigued by video meetings compared to IRL?
The things I like:
- having the range of activity during a meeting constrained to one area in front of me so I don't have to look around the room to make sure I'm not missing something / get overstimulated by too much stuff happening.
- Boxes with borders lighting up when someone is making noise / talking so I know where to look.
- Names always associated with faces
- text chat area for sharing links and non-disruptive ways to request clarification
- No longer get lost in the hallway trying to find rooms with weird names
- Don't have to kick people out of rooms when they stay too long
- can foot tap and fidget hands off screen without people noticing
And by introvert, I mean I can very confidently conduct meetings, conversations and socialise well, but it wears me down.
I find phone calls and video meetings somewhat more exhausting minute by minute.
So much so that I tend to have much shorter phone calls and video meetings. I can't go more than about 30 minutes before I have to stop.
Some people in my office however can seemingly go for hours! Literally 2 to 3 hours of a phone/video meetings and they seem engaged and energised. They love talking!
For me, just give me the facts and let's move on to some actual work.
My problem is I can't multitask during a Zoom at all. I'm so anxious that someone will ask me a question that I'm tuned in the entire time, while everyone else is pretty obviously off working on the side. I also find that I stare at myself fidgeting. It's nowhere near as bad as in person, where I just want to not talk at all, though.
Also, if everyone is in different timezones, please don't make us turn on webcams. I'm getting a lot of 7-8am meetings lately now that my team is mostly east of me. I'm just waking up. I was in meetings from 8am this morning. I showered at 2pm when I finally got a 30-minute break.
I'm a little bit vain, a bit of a narcissist and self-conscious. I find it very distracting on Zoom to be looking at my own face on video to ensure I look like I'm properly paying attention, showing my best angle (hiding a double chin!) and not fidgeting.
Maybe I'm just too much in my own head about it, but there seems to be a bit of a performative element to Zoom meetings that makes me a lot more aware of presentation and perception than in IRL meetings.
That makes total sense; most people don't make a habit of looking at themselves unless it's for the purpose of evaluating or adjusting something about their presentation. I tend to just make sure it's set up right when it starts, then use the option to hide the self-view to avoid that distraction. I don't care if I might look silly later, since constantly readjusting is a silliness of its own. And someone will ping me if the camera messes up.
Yeah, I tend to hyperfocus so I don't miss a question too. But I figure that kinda should be the point of a meeting, so it's understandable, even if it's tiring. If everyone else is making it bad by multitasking, they need to get a handle on that. Zoom could add a bunch of features to help people run better meetings (a way to pass a ball around to select who's speaking, hand raising emoji, dynamic topic queues with votes), but it's just lousy software tbh.
>(a way to pass a ball around to select who's speaking, hand raising emoji, dynamic topic queues with votes)
I'm no Zoom apologist, but... it has a 'raise hand' emoji/action, as well as a topic queue (in Q&A format). It also has "go faster" and "go slower" feedback that attendees can display to the presenter in real time.
I like the talking stick^W ball idea, though.
Lots of this stuff is in off-by-default preferences in Zoom, FYI.
My girlfriend has ADHD and possibly autism (she's getting assessed for it soonish), she's outright refused to switch her classes to be done over video - having to strain to hear / understand (it's a language class too), background noises, visual and auditory distractions. She explicitly went and paid extra for classroom sessions just to limit the distractions from online classes.
Not saying you're wrong (of course you're not, your experiences are not mine to judge), just providing a counter-anecdote.
Personally I don't really mind either way. With IRL meetings it varies: there are ones where I'm more of an observer, where I mostly tend to fall asleep, and there are the stand-ups, which are fine apart from people not in the meeting speaking up (we've had to shush them on more than one occasion). That last happened a few years ago, though.
That's great it was an option for her but a lot of the uptick in zoom usage lately is for people who don't have the option to see each other face to face anymore due to Covid.
The alternative would be audio-only calls or text-based chat, or simply not communicating (not feasible for businesses but maybe for families/friends).
For me the main benefit is linearization of the conversation: people understand the limitations of the medium and interrupt each other less frequently, so it’s much easier to both follow and join the conversation without monitoring the nonverbal interactions in the room. That’s what makes teleconferencing with multiple people much less fatiguing to me than real meetings. Assuming known people and no video, of course — only screen share please.
I hate it. At least one person on our call has a background whine that he can't hear, but is constant. Other people have kids. At least once a call there's an echo event where someone puts their speakers too close to their mic. Trying to listen and understand what anyone is saying is difficult - I've nearly given up.
I’m surprised this didn’t mention the elephant in the room: the audio quality is terrible. I simply can’t hear half of what is said, even at full volume in a quiet room.
Either I ask for repetition all the time (and walk over whatever the next person said), or I assume it from context and let it go.
Even when I can pick up all the sounds, it’s so different from hearing voice in person that I have to concentrate just to parse the audio into meaning. It’s exhausting.
Cell phone calls are still pretty bad, too. Low bandwidth digital audio compression is just wretched.
The only thing really terrible is their (lack of) echo cancellation. Otherwise I'm about 90% sure it's just off-the-shelf Opus with next to no DSP on top. It's not that bad on its own.
What is terrible is the audio hardware on laptops. MacBooks are alright, but just $50 on a headset will go a long way toward solving your problems. Consumers haven't given a shit about audio fidelity for decades, honestly; it's just showing today, when you are listening to unprocessed audio streams that haven't been optimized for your laptop's or phone's fundamentally inferior transducers.
This is a solved problem, for what it's worth. The major audio manufacturers sell products for these use cases. It's not the software at this point, it's actually decent. It's the input and output devices. Thank god for gamers buying these things and driving economies of scale.
So the solution is "convince all of my (unemployed) friends to spend $50 on something which has more value to me than to them"?
Assuming that does solve it, I'd say a solution is known, but I wouldn't call it "solved". That's like a "nobody needs Dropbox" type of "solved problem".
Well, they have value in hearing others and being heard, I'd assume. All my friends have their own mics because we couldn't understand each other otherwise, and we don't have problems with fidelity. My coworkers are a different story, because consumers rarely listen to integrated microphones before they purchase a device (Bluetooth earbuds are absolutely terrible about this, fwiw). Who's at fault: the company for cheaping out, or the consumer for not caring enough to find out whether the company cheaped out?
The technical problem is solved. The economics problem of getting people to value the solution and marketing problem of educating about it is still open.
Regardless, this isn't a problem to be solved by software. It's a consumer electronics problem.
The problem is most likely caused by the other party not using a headset. Even the cheap ones result in much better sound quality than the laptop microphone. Another factor can be lack of bandwidth.
Audio quality in Zoom itself is superb according to my experience. Webex is applying some heavy compression. Teams is closer to Zoom, but not as good.
And I would say the root cause is people not knowing how bad they sound. Others don't complain, because they don't want to spend time debugging one participant's audio setup. It would help if all the conferencing tools provided easy-to-use tools for testing your setup (record a message, play it back). Maybe one could even offer (as an option) a forced audio check before you can join the meeting.
You'd think / hope that nowadays built-in microphones would be better; I mean, place them next to the webcam, add some wadding / filtering in front (like the round disk things in professional settings), have two of them, and do some clever AI things to filter out anything that isn't speech. And enforce push-to-talk with a clear visual indication of who wants to speak (at the moment it's too subtle in Zoom and Skype).
I thought that for a long time. Then a colleague and I both ended up with a 16" MBP, and poof: it looks like a decent-quality speaker and microphone on a properly integrated sound stack does wonders, even with standard software.
Our team is all on 15" MBPs, sound quality is quite good with built in mics and only marginally better with a headset. The worst is when someone is phoned in, the quality is poor and they have more delay.
Could it be an issue with microphones or speaker quality? My fully remote team has used zoom for all of our meetings, and unless someone is calling in from a cell phone in a busy area, I don't recall a time when I have had difficulty understanding someone.
If anything, the fact that everyone has their own microphone makes it better than in person meetings, because you never have to struggle with that one person who doesn't talk loud enough for the entire table to hear (which is sometimes me, because I have a very soft voice and struggle to speak loudly in big meetings).
Not OP. Some people do have really terrible hardware. It would be good IMO if the software had an input quality indication. Or a test mode that does a loop test, play on speaker at end A, listen on mic A, play back mic A on speaker B, listen on mic B, analyse profile and reverse the test - "you may experience poor sound as B's microphone has a bad frequency profile" or something.
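Nothing in the thread says any conferencing tool does this, so here is a hypothetical sketch of the simplest half of such a loop test: play known test tones, capture them, and measure how much of each frequency survives, using the Goertzel algorithm (a cheap single-frequency alternative to a full FFT). `record` is a made-up stand-in for "play on speaker A, listen on mic A"; a real test would go through actual audio I/O and would have to cope with room noise and echo. The tone frequencies and the -10 dB threshold are arbitrary assumptions, and `muffled_mic` just simulates a bad microphone with a one-pole low-pass filter.

```python
from __future__ import annotations

import math


def tone(freq_hz: float, sample_rate: int = 16000, seconds: float = 0.1) -> list[float]:
    """A pure sine test tone."""
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def goertzel_power(samples: list[float], sample_rate: int, freq_hz: float) -> float:
    """Signal power at a single frequency (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq_hz / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def check_frequency_profile(record, freqs=(250, 1000, 3000, 6000),
                            sample_rate: int = 16000, threshold_db: float = -10.0):
    """Send each test tone through `record` and report (freq, gain_db) for
    every tone that comes back attenuated by more than threshold_db."""
    problems = []
    for f in freqs:
        sent = tone(f, sample_rate)
        received = record(sent)
        ratio = goertzel_power(received, sample_rate, f) / goertzel_power(sent, sample_rate, f)
        gain_db = 10 * math.log10(ratio)
        if gain_db < threshold_db:
            problems.append((f, round(gain_db, 1)))
    return problems


def muffled_mic(samples: list[float], sample_rate: int = 16000, cutoff_hz: float = 500.0) -> list[float]:
    """Simulated bad microphone: a one-pole low-pass filter that loses the treble."""
    a = 1 - math.exp(-2 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


print(check_frequency_profile(lambda s: s))  # perfect loopback: []
print(check_frequency_profile(muffled_mic))  # flags the 3000 Hz and 6000 Hz tones
```

A real version would probably sweep a chirp and compare spectra, and would run the test in both directions as described above; single tones just keep the sketch dependency-free.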
The other thing that would be nice is if Win10 hadn't hidden the microphone level/gain controls.
It’s possible, but then why is it $1000 smartphones are struggling with this, and $5 landline phones from 1993 did just fine? Did Apple skimp on audio hardware that bad? That’s a little hard to believe.
My guess would be that most people calling in to a call on Zoom from their phone are leaving it on speakerphone and not holding it up to their ear/mouth like you would with an old land-line phone. Opening up an omni-directional mic that has to pick up someone from 1-2 feet away from their mouth is a lot different than an old-school phone that you pick up and hold an inch away from your lips.
You're not paying Apple $1000 for a microphone on a stick when you buy an iPhone, you're buying a computer that has a tiny microphone that works well enough for most normal phone calls to your doctor or whichever. If you care about audio quality as a foremost feature, you do what business people have been doing for years and buy a wireless headset/in-ear with a directional mic that's built for comfort and tuned to pick up audio in the exact distance range of your ear to your mouth.
Aside from the sibling comment's point, the majority of your $1000 smartphone's cost goes to the large, high-resolution touchscreen, the ever-higher-quality camera, and the ever-thinner case and the battery that powers them.
As for Apple in particular, it is worth pointing out they have a hard time getting laptop keyboards right, when keyboards in the general sense have been working fine for decades. They make tradeoffs for other values (thinness) and the product suffers as a result.
I have issues with some peers who have thick accents, whom I normally have zero problems understanding in the office. It does get tiring. It's no one's fault other than the audio tossing away information, though. I hate having to ask people to repeat themselves and saying "the audio is kind of crappy today".
The original post and the comments here are interesting and intriguing, but many points raised here (a) assume that the initial proposition is in fact true, and (b) don't ask what evidence is available in support of the proposition (or indeed whether there is evidence that might appear to contradict it) beyond our own individual experience:
'Why is video-conferencing so exhausting?'
This is not to say that Zoom (etc.) isn't tiring to use, simply that it's sensible first to ask questions such as:
- is it particularly tiring to use? How do we know?
- if Zoom etc. is particularly tiring, for what proportion of users is this the case?
- are some types of users more affected than others? Why?
- is fatigue related specifically to video (as opposed to audio) use?
- is fatigue related to the number of concurrent users in a given call?
- is all video-conferencing similarly tiring, or are some platforms more so than others, and if so, why? UI? Latency? Some other reason?
- are Zoom meetings any more tiring than traditional face-to-face meetings?
- etc.
I think these are great points. I came here to understand what people's feelings were on the matter. I don't particularly find 'Zoom' any more tiring than a regular meeting. Is the problem these days just that there are more meetings than there used to be, because people are still trying to figure out the whole remote work thing?
And in fact I kinda like them better than regular meetings, because I can walk around the house and do whatever. But I also don't have a webcam requirement either, so maybe that's the difference for me.
Just a little anecdote that’s relevant here. Around 10 years ago my girlfriend didn’t have wifi at home yet (low-tech family), so we spoke at night over the landline, as those calls were free. And it definitely did feel like what some people here are saying: you could hear everything with no delay or cuts at all, and the whole call just felt like she was right there next to me.
They eventually set up wifi in their home, so we switched to VoIP calls, mainly so we wouldn’t block our families’ landlines for a couple of hours every day. And the quality was so much worse...
The delay was sometimes so bad that we had this technique where one of us would slowly count down from 3 and then say “Now!” When the other heard “Now” they’d say “Now” too, and the first person could essentially count the seconds. This way we could “slow down” our conversation to accommodate the multi-second delay, which (thinking back) was like our own protocol for dealing with signal delay!
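What the "Now" trick measures is a round trip: the first "Now" travels out, the second person reacts, and the echoed "Now" travels back. Assuming the delay is roughly the same in both directions and writing $t_r$ for the second person's reaction time, the one-way delay works out as:

```latex
t_{\text{RTT}} = t_{\text{heard}} - t_{\text{said}} = 2\,d_{\text{one-way}} + t_r
\quad\Longrightarrow\quad
d_{\text{one-way}} \approx \frac{t_{\text{RTT}} - t_r}{2}
```

So counting 3 seconds with a quarter-second reaction time would suggest roughly 1.4 s of delay in each direction.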
I will go out on a limb and say that the issues you describe had more to do with your internet connection(s) than with VoIP.
In the mid-2000s, I used Vonage (VoIP) from Japan to the US all the time. People were shocked because it sounded like I was right next to them. My grandmother actually thought I was bamboozling her by saying I was in Japan, because she thought there was no way it could be so clear.
Granted, I was usually dialing my VoIP on a blistering fast internet connection (not in Tokyo, to be fair) to a POTS line in the US, but the connection was amazing.
So I wonder why, 10 years ago, the VoIP-to-VoIP connection was so poor. Do you think it was basic internet speed (where the US has historically lagged behind other developed countries), or do you think it was something else?
Could very well be, she had a very basic plan (around 10Mbps), while I had a fiber plan on 200Mbps so maybe that was the issue. I can't recall how it was with other people though.
I was on a partially remote team for years, with biweekly team conference calls. Webcam was possible, but none of us ever used it, not even the manager. Screenshare was used for presentations, but audio was enough. Why is everyone defaulting to video calls for things? It adds so much "noise" and cognitive load.
My personal happiness plummeted, anecdotally, when my coworkers started dropping the video feed one by one. It was surprising, but noticeable and significant.
Same for me. There's just little things about interacting with a person face to face, personality aspects like how they express themselves: posture, eye movement, facial expression even. They just add to the interestingness of the work.
We do standups on video because we have a team split between the west coast of the US and Hyderabad. We get to see teammates from the other office once every couple years maybe, so we use video to maintain a personal connection. We've found it really helps us work better as a team.
So I'd say it's really situational. I'd prefer not to try and coordinate development sprints between two sub-teams offset by 12.5 hours from each other, but if I can't avoid that, then it sure helps to see faces to keep people feeling connected.
Obviously everyone will have their own personal experience. This is what worked for us.
1. Body language- it's important to be able to see people's facial expressions etc to gauge their reactions to things you're saying
2. Avoiding interrupting one another- being able to see other people's faces means you can see when they're about to start speaking. That means you're less likely to talk over each other
3. Keeping focused- for me at least. If I'm on an audio-only call I'm tempted to zone out or do some other work when other people are speaking, and I miss stuff.
> 1. Body language- it's important to be able to see people's facial expressions etc to gauge their reactions to things you're saying
> 2. Avoiding interrupting one another- being able to see other people's faces means you can see when they're about to start speaking. That means you're less likely to talk over each other
Do you find this works in practice?
I don't think your average video call setup is good enough for it to make a difference. There's no eye contact, video quality is usually questionable, etc.
If I was a professional negotiator who knew how to read body language in an academic sense, then maybe I could get some use out of video calls. But I don't think I'm able to get the "instinctual" benefits.
First time I've heard of this and it's not something I experience at all. I do not find video conferences more exhausting than in-person meetings - if anything maybe even the opposite.
One factor that can make it more tiring is poor connectivity and audio, to the point where it becomes an effort to interpret what others are saying or to make oneself understood. Other than that, I think it's just a matter of what one is accustomed to.
I also find meetings via Zoom less draining, in general, than in-person meetings.
Maybe this is why:
> This, too, amounts to a persistent expenditure of social and cognitive labor as I inadvertently mind my image as well as the images of the other participants.
If I understand the author correctly, if they don't have a constant image of themselves presented to them, they aren't constantly processing how they look to others(!)? Well, in that case, I can see why Zoom would be more draining, but, you know, welcome to my world in any situation except when I'm alone or only with my SO.
I have more freedom when I'm online than when meeting in person.
The few times online meetings are stressful, IMO, are caused by the same things that would make me mad in person: people who keep talking without allowing interventions, people who don't know what to say and just pretend to say things, and people who have bad latency (yes, the ones who take a few seconds before speaking).
I find leaving a call to be difficult. In the real world, someone just “dropping in” to a meeting room can nod, wave, or gesture to say hi, find the right moment to enter the conversation, make a point and then leave with the same nod, wave or gesture. You can’t do that on a video call without either seeming rude (“Foo has left the call”) or breaking the main conversation flow (“Hey everyone, I’m gonna leave now”). It’s tiresome and I end up sitting on calls until everyone else is ready to exit.
I think the problem is that Zoom hasn't solved any of the hard problems with scheduling and holding meetings remotely. They got decent/reliable multimedia multicast, and I'll give them that.
Talk to anyone in process management/optimization and they'll probably rant to you about how bad most people are at holding and attending meetings. The "one page memo" at Amazon is kind of famous.
Zoom (and all its competitors) could do a lot to solve these problems and doesn't. That's the source of Zoom fatigue. People are bad at meetings, don't know they're bad at them, and hold bad meetings. It's stressful and tiring. Maybe Zoom just makes it more obvious when you can't easily joke around or grab a coffee. To me it seems that all these remote meeting software solutions are only focused on the purely technical problems, because they're really easy to specify and measure improvement. I haven't really seen anyone use technology to really go after the social problems of video conferencing yet, at least not like how Discord and Slack have done in their domains.
Seems obvious and succinct to me: on Zoom, you feel compelled to look at the screen all the time. In-person, you can respectfully look anywhere in the room because the other person shares the physical space and can intuit where your attention is.
I think Mumble.com hosts instances of Mumble.info's software (although they aren't related a la WordPress.com), so they aren't competing per se. Definitely an unfortunate situation, though.
I'll take a 1 hour Zoom call over the 1:30h commute to meeting in traffic jams, 1 hour meeting, 1:30h back in traffic jams any day.
Any 'Zoom Fatigue' I'll shrug off with a nice walk and a coffee in 30 minutes. The stress from spending 3hrs bumper to bumper ruins more than just the entire day.
I think the main cause of stress and fatigue is that you have to be presentable all the time.
I put a sticker over my camera and I bought a headset with a hardware mute switch (so that it doesn't show) and now I can phase out just like I'd do in a regular meeting :)
I recommend a camera cover slider. This one works great.
It gives me some sense of physical security that hackers can’t penetrate the camera, or corporate can’t spy on me, since they’re now installing desktop screen capture spyware on your laptops. They can probably spy on you by stealthily activating the microphone, but that’s a different problem.
On a slightly related subject, I can HIGHLY recommend Google Duo for voice calls. Very little latency, fantastic clarity, and the ability for both parties to talk and listen at the same time.
Makes it so much less tiring for long conversations.
For one-on-one videoconferencing meetings with normal latency (e.g. within 3-4 time zones), I find zero fatigue or problems like this.
The only "fatigue" I find is frustration when people are talking over each other or multiple people trying to jump in at the same time. This generally happens either when latency is bad (e.g. US-Japan), connections are bad generally (MS Office decides to auto-update in the background and saturate your download), or just when it's a particularly contentious meeting with lots of people trying to get a word in edgewise, and all the normal cues you use to figure out when to talk just aren't there. And the reasons for those all seem self-evident.
A meeting that has a component of "go around the room and get everyone's input" is harder in Zoom. In a physical room, you go clockwise or counterclockwise. But on Zoom, when someone finishes talking, who knows who's supposed to go next?
We created a pattern that when they finish the speaker nominates who goes next, cuts a bunch of awkward lag from going around the circle. Maybe give that a try.
The elephant in the room is the lack of EYE CONTACT. As a mediator now having to work over video, this is very close to a deal breaker. We don't hear much about it in tech news because no one has a mass-market solution. A Google search will turn up either a wasteland of dead SW and HW products or room-based systems so expensive that they are typically bought by global corporations for their CxOs.
The "uncanny valley" aspect is something I have experienced, but the big thing for me is the accumulation of three types of events.
(1) People trying to talk at the same time and taking several seconds to sort out who goes first.
(2) Glitches when everything freezes or even gets kicked out and brought back in, taking even longer to resolve.
(3) Wanting to say something and having to wait ages as people with less lag (I'm on the opposite side of the continent from everyone else) keep jumping in without a break.
Each instance of these raises my annoyance level. Dozens upon dozens of them even in a half-hour meeting can leave me totally exhausted from trying to tamp down the anger. By contrast, doing a 1:1 over VC isn't nearly so bad. Maybe an occasional bad moment, but not enough for fatigue to set in. I highly recommend that everyone do that comparison for themselves. The problems tend to increase more than linearly with number of participants, until there are enough that everyone accepts the need for strong moderation and two out of the three issues drop back to near zero.
Do not feel this. I have a wireless headset so I'll occasionally mute my mic and do a couple of dumb-bell curls or go get some water while thinking. Feels fine, honestly. I like being present for the social comfort I get from physical proximity to people but the Zoom thing just has that absent, so I don't get that extra bonus but I still don't feel tired.
Do you ever feel pressured to turn it on when everyone else in the call has their video on? I know I do, and it's something I need to work on, because I find myself more productive and engaged with my camera off.
Turn the camera off after introductions, then only turn it back on when you want to make an important point or the convo is about a very specific work item related to you.
If someone complains, you can say you don't want to clutter up the video feed or that you're having latency issues (also, they're a dick).
As a full time remote worker, one of my favorite things about my setup is that I have a 43" 4k monitor set about 6 feet away from my eyes. I use macOS's scaling feature to make everything bigger. This has significantly reduced my eye strain and my overall fatigue.
My setup is somewhat unique, as far as I can tell. I have the monitor on a wheeled TV stand and I use a small hand cranked standing desk to support my laptop and keyboard. The laptop is connected to a dock that sits below the monitor with one Thunderbolt 3 cable, then I have one USB cable returning from the hub to the desk to support my microphone and keyboard.
This setup gives me a ton of flexibility in how I position myself relative to the light in the room and the distance between my eyes and the monitor while maintaining a standing position.
Every meeting you have to think about audio, video, the background, how to chat, is mute really really on, what are you wearing, other people in the house, the fact your setup is being judged.
In an office you can just do it.
It's not until you have set yourself up and feel comfortable and confident that you notice all the little things missing that we know are important to humans, like touch and smell and body posture and 3D video and 3D sound, which I think are not about fatigue but something else.
It's exactly the same reason we can't drive as well while on the phone.
When you're face to face with someone, every audio frequency emitted by the other person reaches you. This means your brain doesn't have much work to do to make sense of this information, since it is 'complete'.
In a Zoom meeting, in most cases everybody is on their laptop using the built-in microphone. This means the sound will be mediocre at best, even assuming there isn't any network issue. Voices will be compressed and will lack certain frequencies that you don't consciously notice but that help you decipher the other person's speech more quickly.
That means the mental energy needed just to understand someone in a Zoom meeting for the same amount of time is greater than in a real-life meeting.
I think a lot of Zoom meetings are just too long. I recommend a maximum of 30 minutes for a meeting, almost regardless of what it is about. That's enough to come up with some action points that can be worked on in the meantime, then followed up with a new meeting, e.g. in the afternoon.
I have a simpler explanation, and my mantra for life.
Are you finding something difficult?
Is it something you've been doing frequently for a long time?
If not, then of course it's difficult.
Keep hope: We get good at the things we do in life.
I recently discovered that the best way for me to communicate over Zoom when speaking is to not even look at the screen. I'll stare off into the distance and just focus on my words and thoughts.
Video and audio lag seems to short circuit the communication interrupt window that happens normally in person to person communication, although in large enough gatherings that fails even IRL.
Another thing is that the way Zoom's gallery mode works is a bit unnatural compared to IRL. In normal person-to-person communication you're only looking at one set of eyeballs at a time. All of a sudden you can see several sets of eyeballs all at once. This makes it more like public speaking and less like a small tribe meeting.
Love how nobody checks what Zoom actually does in terms of audio.
First, they do a lot around synchronizing video and audio. On bad connections they slow down and speed up video and audio to keep the appearance of coherence. Really, really hard stuff that other video conferencing systems fail at.
Also, they have audio settings to filter background noise. The basic settings cover the mic, but the Advanced settings are where the noise cancellation options are ("Original Audio").
If you book online classes for music teachers, you'll see why they prefer Zoom and which settings they recommend.
> perceiving an image of a body in virtual space rather than perceiving a body itself in shared space may be worse than not perceiving a body at all.
So drop the video and go audio only.
Of course, I'm legally blind, so I can't really understand why perceiving facial expressions and body language is apparently so important. But one of my best friends is totally blind, and the two of us have communicated over two decades without ever being able to see each other, so it can work.
It's a familiarity thing. Sighted people are used to a manner of communication that includes visual cues, and people always prefer what they're used to. I also find that people who are used to visual in-person communication also dislike videoconferencing because they can't see things like when another person is crossing their legs or tapping their foot.
Whether any of that is actually beneficial is another question entirely. Personally, I have an easier time concentrating on the conversation when it is audio only.
I find the video component to be completely superfluous and often distracting. The combination of audio, screen-sharing, and chat are more than adequate for accomplishing the task at hand. I don't need to watch grainy, stuttering video of my co-workers in sweat pants to be productive.
To me, physical meetings are as draining as the ones on Zoom. If it's a meeting where you just sip coffee and listen to others who do all the talking, then it's easy; however, being actively involved in a convo while trying to come up with some useful stuff can be draining.
I think another issue is having to reinvent the cues for changing who is talking in the conversation. When you're talking to someone in person, you can pick up a lot of information about whether they want to keep talking or whether it's your turn. On video conferences, those cues are missing. You might detect a lull in the conversation and start talking, but at the same moment someone else does the same thing. With 100+ ms of latency, you can get into what you're saying before realizing that someone else took their turn (or you took theirs). Then collision mitigation kicks in, and it's just as bad for your mental fatigue as it is for your WiFi's latency (WiFi uses the same algorithm: listen before talk, and if you collide, wait a random amount of time and repeat). These things just don't happen when a few people are in the same room together.
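To make the WiFi analogy concrete, here is a toy version of listen-before-talk with random backoff, roughly the CSMA/CA idea 802.11 uses. It is heavily simplified (real WiFi freezes and resumes backoff counters, relies on missing ACKs to detect collisions, and has specific slot timings), and the function name, window sizes, and speaker names are all made up. The point it illustrates: nobody can see a collision coming, so the only remedy is to back off for a random time and try again, which is the awkward "no, you go ahead" dance on a laggy call.

```python
import random


def take_turn(speakers, cw_min=4, cw_max=64, rng=None, max_rounds=100):
    """Hypothetical helper: simplified listen-before-talk with random backoff.

    After a lull, every would-be speaker waits a random number of "slots"
    drawn from its contention window. The first one to start talking alone
    gets the floor. If two or more start in the same slot, they talk over
    each other (a collision); each collider doubles its window, up to cw_max
    (binary exponential backoff), and everyone draws again.
    Returns (winner, number_of_collisions).
    """
    if not speakers:
        raise ValueError("need at least one speaker")
    rng = random.Random() if rng is None else rng
    window = {s: cw_min for s in speakers}
    collisions = 0
    for _ in range(max_rounds):
        # Each person picks how long to wait before speaking.
        wait = {s: rng.randrange(window[s]) for s in speakers}
        first = min(wait.values())
        starters = [s for s in speakers if wait[s] == first]
        if len(starters) == 1:
            return starters[0], collisions
        # Talk-over: the people who collided widen their random wait.
        collisions += 1
        for s in starters:
            window[s] = min(window[s] * 2, cw_max)
    raise RuntimeError("nobody got the floor within max_rounds")


# Three people notice the same pause in a call.
winner, bumps = take_turn(["Ana", "Ben", "Chris"], rng=random.Random(42))
print(f"{winner} gets the floor after {bumps} talk-over(s)")
```

In a room, people see the other person inhale and skip the collision entirely; over a laggy link, everyone is effectively stuck in this loop.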
The result is that we try to use the video channel to pre-allocate timeslots for speaking. You make your eyes bigger, you raise a finger... but sometimes people aren't looking at the video, or don't understand what your newly invented cue actually means. This is all very tiring.
The overall quality of the call is much lower than in real life, as well. People do not own good microphones or cameras, so you can't actually hear them or see them very well. The noise gate intervenes and just cuts off audio from time to time. It is maddening how bad it all is.
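A noise gate is simple enough to sketch. This toy version works on a list of sample amplitudes; real gates track a smoothed signal envelope with attack and release times, and the threshold and hold values here are arbitrary. It shows why a quiet or distant talker gets chopped: anything below the threshold is muted once the hold period runs out, whether it's background noise or the tail end of someone's sentence.

```python
def noise_gate(samples, threshold=0.05, hold=3):
    """Hypothetical, very simplified noise gate over a list of float samples.

    A sample whose magnitude is at or above `threshold` opens the gate.
    The gate stays open for `hold` more samples after the signal drops,
    then closes and outputs silence (0.0) until something loud arrives.
    """
    out = []
    remaining = 0  # samples left before the gate closes
    for x in samples:
        if abs(x) >= threshold:
            remaining = hold
            out.append(x)
        elif remaining > 0:
            remaining -= 1
            out.append(x)
        else:
            out.append(0.0)
    return out


# A loud word, then a quiet trailing phrase (0.03), then another loud word.
print(noise_gate([0.2, 0.03, 0.03, 0.03, 0.03, 0.03, 0.2], threshold=0.05, hold=2))
# [0.2, 0.03, 0.03, 0.0, 0.0, 0.0, 0.2]
```

The quiet phrase survives only for the hold period and then vanishes, which is what it sounds like when someone with a weak mic trails off mid-sentence.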
Many years ago when I worked for Bank of America, we had these multi-hundred-thousand-dollar Cisco videoconferencing setups. They worked really well. My friends and I were at work pretty late and there was a tornado warning, so we couldn't go home (we all biked). We went to these conference rooms, set up a link between the two rooms right next to each other, and had a totally normal conference. (Things were set up so that the entire wall of the room was a video screen, and it had an array of cameras, microphones, and speakers. Everything was tuned perfectly so it looked like the people in the other room were just sitting across the table from you. There was no latency, everyone was their normal size, and the audio and video quality were perfect. Obviously with two rooms across the hall from each other, there shouldn't be any latency... but at least the system didn't add its own. It makes a big difference.)
Finally, I think another issue is that people just aren't used to getting work done with video conferences. I worked on a remote team at Google, so pretty much 100% of my meetings were video conferences: 1:1s with my manager and everything else. The system suffered from the same quality/latency issues as anything else (though we did typically have good cameras and microphone arrays in every room), but through practice, people got good at getting stuff done despite the limitations. I never really felt fatigued the way I do on calls with random people at home. (I guess my tips are: have an agenda in advance, and use your screen share to show progress through the agenda. Call on people in remote locations: "Anyone from the New York room have anything to add?".)