In-band vs. out-of-band signalling is one of those great debates in telecoms. You can see it in RS-232 as well, with "XON/XOFF" characters versus the extra "RTS/CTS" signalling lines.
Most internet protocols are "in-band", but there are two big exceptions in FTP and SIP, which is very much an internet protocol designed by phone people.
H.323 is the internet protocol designed by phone people; SIP is a prime example of the second-system effect, caused by internet people trying to make it “more internet-ish”.
The reason H.323 is an “unspeakable horror” to internet people is mostly that it is built on top of existing telco signaling practice: global titles instead of URIs, and ASN.1 PER instead of HTTP-style text messages (which in SIP's case are arguably overly verbose and complex).
Edit: and IIRC there is a significant user-experience difference caused by that: H.323 supports the traditional telco-style Initial address/Subsequent address/Address complete dialing, while SIP works in terms of complete SIP URIs and the user terminal has to somehow divine that the dialed number is complete.
> while SIP works in terms of complete SIP URIs and the user terminal has to somehow divine that the dialed number is complete
I've never used a softphone that doesn't have an explicit Call button. You're telling me that there are terminals where people can just pound SIP URIs in one character at a time, and the call will just decide to initiate at some point?
That is how a normal POTS phone works, and there is some expectation that a physical SIP VoIP phone would work in the same manner. The usual implementation is that hardware SIP phones have a configurable regular expression that describes all possible complete numbers, plus a configurable dial timeout. Getting this to work reliably is a somewhat intractable problem, while with H.323 or Cisco SCCP phones it just works: with SCCP because the phones are more or less dumb terminals, and with H.323 because the signaling protocol is designed with this in mind.
On the other hand, desktop phones today are mostly used in office environments, where essentially everybody enters the number first and then performs some action that signifies the address is complete (picks up the handset or enables the speakerphone).
Edit: motivational example: suppose a random person picks up a random SIP phone and punches in 1-1-2 (or 9-1-1). The reasonable expectation is that the phone somehow divines that what the user really means is 112@sip.example.com (or 911@...) and immediately sends the INVITE. Getting it to work that way for arbitrary numbers is somewhat nontrivial.
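To make the "regular expression plus dial timeout" approach concrete, here is a rough Python sketch of what such a phone might do. Everything in it is an assumption for illustration: the patterns, the 4-second timeout, the `sip.example.com` domain, and the `Dialer` class are hypothetical and not taken from any real phone firmware.

```python
import re

# Hypothetical dial plan: each pattern describes one family of complete numbers.
COMPLETE_PATTERNS = [
    re.compile(r"112"),         # emergency number
    re.compile(r"911"),         # emergency number
    re.compile(r"[2-7]\d{3}"),  # assumed four-digit internal extensions
]
DIAL_TIMEOUT_S = 4.0            # assumed inter-digit timeout
SIP_DOMAIN = "sip.example.com"  # placeholder domain


def is_complete(digits):
    """True if the digits collected so far fully match a dial-plan pattern."""
    return any(p.fullmatch(digits) for p in COMPLETE_PATTERNS)


class Dialer:
    """Collects keypresses and decides when to turn them into a SIP URI."""

    def __init__(self):
        self.digits = ""
        self.last_key_time = None

    def press(self, digit, now):
        """Feed one keypress; return a SIP URI if the number is complete, else None."""
        self.digits += digit
        self.last_key_time = now
        if is_complete(self.digits):
            return self._finish()
        return None

    def tick(self, now):
        """Call periodically; dials whatever was collected once the timeout expires."""
        if self.digits and now - self.last_key_time >= DIAL_TIMEOUT_S:
            return self._finish()
        return None

    def _finish(self):
        uri = f"sip:{self.digits}@{SIP_DOMAIN}"
        self.digits = ""
        self.last_key_time = None
        return uri
```

With this plan, pressing 1, 1, 2 yields a URI for 112 on the third keypress, while anything that matches no pattern only goes out after the timeout. The fragility shows up as soon as the plan is ambiguous: if extensions like 1120 also existed, the phone would fire on 112 before the user could type the fourth digit, and the only alternative is to make emergency callers wait for the timeout.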