In the Starbucks shops where I live, you have to pay when you order, which is before they write your name on your cup. So the problem the article is talking about simply does not apply.
Starbucks in NYC will radio or call out your drink order to the barista while you're still waiting in line for the cashier, then correlate by drink type (the barista yells "iced tall 2-pump classic iced coffee!").
Side note: there is a specific order to the way they call out the drinks, too, as in the example above: iced (go to the stack of plastic cups), tall (pick the size), 2-pump classic (put the flavor/sugar syrup in the cup), iced coffee (finally put the beverage in). This helps the baristas remember your drink order by correlating the auditory with the physical, which aids short-term memory.
On the contrary, that is precisely the scenario the author is describing:
"The interaction between two parties (customer and coffee shop) consists of a short synchronous interaction (ordering and paying) and a longer, asynchronous interaction (making and receiving the drink). This type of conversation is quite common in purchasing scenarios."
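A minimal sketch of the two-part conversation in that quote, assuming Python's asyncio: `order_and_pay` is the short synchronous step that returns as soon as the order is queued, while a separate `barista` task handles the longer asynchronous step and calls each drink out by its description, so customers correlate on the drink itself, as in the NYC example. All names here (`Order`, `order_and_pay`, `barista`, `correlate`) are hypothetical illustrations, not anything taken from the article.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    customer: str
    drink: str  # the description the barista calls out, e.g. "grande latte"


async def order_and_pay(order: Order, queue: asyncio.Queue) -> str:
    """Short synchronous step: take payment, queue the order, return at once."""
    await queue.put(order)
    return f"paid for {order.drink}"


async def barista(queue: asyncio.Queue, counter: asyncio.Queue) -> None:
    """Longer asynchronous step: make drinks in queue order and call them out."""
    while True:
        order = await queue.get()
        await asyncio.sleep(0.01)  # stand-in for preparation time
        await counter.put(order.drink)  # called out by drink, not by name
        queue.task_done()


def correlate(called: list[str], waiting: list[Order]) -> dict[str, str]:
    """Hand each called-out drink to the first waiting customer who ordered it."""
    pickups: dict[str, str] = {}
    remaining = list(waiting)
    for drink in called:
        for order in remaining:
            if order.drink == drink:
                pickups[order.customer] = drink
                remaining.remove(order)
                break
    return pickups


async def main() -> dict[str, str]:
    queue: asyncio.Queue = asyncio.Queue()
    counter: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(barista(queue, counter))

    orders = [
        Order("Ann", "iced tall 2-pump classic iced coffee"),
        Order("Bob", "grande latte"),
    ]
    for order in orders:
        # The customer is done with the synchronous part right after paying.
        print(await order_and_pay(order, queue))

    await queue.join()  # wait until every queued drink has been made
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass

    called = [counter.get_nowait() for _ in range(counter.qsize())]
    return correlate(called, orders)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the match is on the drink description rather than a name, two identical drinks simply go out in order; in this sketch that is the ambiguity drink-based correlation accepts in exchange for not needing a name at all.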