Does this use a thread local approach, or a Go-style ctx parameter? Since it's async I expected to see the ctx approach, unless the underlying async executor/framework also supports a thread local-ish capability?
This is a great question, and the answer is "it's complicated". The core `tracing` libraries don't use either; instead, they provide an interface for `Subscriber`s (the pluggable component that collects & records trace data, kind of like a logger but fancier) to implement whatever notion of context they care about. The typical approach is for a subscriber to track a current span per thread, but it could track something else entirely.
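To make "track a current span per thread" concrete, here's a minimal sketch of a subscriber that does that with a `thread_local!` stack. It's written against the required methods of the `tracing` 0.1 `Subscriber` trait, but the subscriber itself (`PerThreadSubscriber`, `SPAN_STACK`) is made up for illustration and elides everything a real one would do:

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicU64, Ordering};
use tracing::{span, Event, Metadata, Subscriber};

thread_local! {
    // This thread's stack of currently-entered spans; the top is "current".
    static SPAN_STACK: RefCell<Vec<span::Id>> = RefCell::new(Vec::new());
}

struct PerThreadSubscriber {
    next_id: AtomicU64,
}

impl PerThreadSubscriber {
    fn new() -> Self {
        // Span IDs must be non-zero, so start counting at 1.
        Self { next_id: AtomicU64::new(1) }
    }
}

impl Subscriber for PerThreadSubscriber {
    fn enabled(&self, _metadata: &Metadata<'_>) -> bool {
        true // record everything
    }

    fn new_span(&self, _attrs: &span::Attributes<'_>) -> span::Id {
        span::Id::from_u64(self.next_id.fetch_add(1, Ordering::Relaxed))
    }

    fn record(&self, _span: &span::Id, _values: &span::Record<'_>) {}
    fn record_follows_from(&self, _span: &span::Id, _follows: &span::Id) {}

    fn event(&self, event: &Event<'_>) {
        // An event is attributed to whatever span this thread last entered.
        let current = SPAN_STACK.with(|stack| stack.borrow().last().cloned());
        println!("event {:?} in span {:?}", event.metadata().name(), current);
    }

    fn enter(&self, span: &span::Id) {
        SPAN_STACK.with(|stack| stack.borrow_mut().push(span.clone()));
    }

    fn exit(&self, _span: &span::Id) {
        SPAN_STACK.with(|stack| {
            stack.borrow_mut().pop();
        });
    }
}
```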
`tracing` instruments futures by wrapping them with a future combinator that enters a span each time the future is polled; the `#[tracing::instrument]` attribute will do the same thing under the hood when used on an `async fn`. This is kind of analogous to the Go-style context parameter, in that the contexts are stored in structures or on the stack, except that users don't have to manually pass the context around.
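The combinator itself is small. Here's a simplified sketch of the idea (not `tracing`'s actual `Instrumented` type; I've required `Unpin` just to avoid pin projection, which the real one doesn't need):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
use tracing::Span;

/// Wraps a future so that `span` is entered on every poll.
struct EnterOnPoll<F> {
    inner: F,
    span: Span,
}

impl<F: Future + Unpin> Future for EnterOnPoll<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();
        // The span is only "current" while the inner future is actually being
        // polled; the guard exits it again as soon as poll returns.
        let _guard = this.span.enter();
        Pin::new(&mut this.inner).poll(cx)
    }
}
```

In practice you wouldn't write this yourself: `fut.instrument(some_span)` from the `tracing::Instrument` extension trait, or the attribute on an `async fn`, gives you the real version.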
The core library provides a way to set the `Subscriber` that collects trace data for the duration of a scope; this does use thread-local storage. However, the default dispatcher can also be set globally (as with the `log` crate), and the use of thread-locals is feature-flagged so it can be turned off by `no_std` users.
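Concretely, the two options look like this (using `tracing-subscriber`'s fmt subscriber just to have something to install; exact paths may vary by version):

```rust
fn main() {
    // Scoped default: only for the current thread, only while the closure
    // runs. This is the path that uses thread-local storage.
    let scoped = tracing_subscriber::fmt().finish();
    tracing::subscriber::with_default(scoped, || {
        tracing::info!("recorded by the scoped subscriber");
    });

    // Global default: set once for the whole program, like `log`'s logger.
    let global = tracing_subscriber::fmt().finish();
    tracing::subscriber::set_global_default(global)
        .expect("a global default was already set");
    tracing::info!("recorded by the global subscriber");
}
```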
Finally, I have some thoughts on an abstraction for "context-local" storage that lets the user customize the context used to shard the data. This could act like a user-space version of OS thread-locals when threads are present, but it could also be used by bare-metal code to, say, keep a context per CPU core. That would let subscriber implementations track a span per thread by default, while letting embedded or kernel-mode users override this without reimplementing the rest of the subscriber logic. This is still in the early stages, though.
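For the curious, here's roughly the shape I have in mind, purely as a sketch: none of this exists in `tracing` today, the names are made up, and the hashed thread ID is just a stand-in for whatever identifier a real context would supply (a CPU core number, an interrupt level, etc.):

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// Something that can name the "current" execution context: a thread on a
/// hosted OS, a CPU core in a kernel, and so on. (Hypothetical trait.)
trait ContextId {
    fn current(&self) -> u64;
}

/// Default context: the current OS thread, hashed to an identifier.
struct ThreadContext;

impl ContextId for ThreadContext {
    fn current(&self) -> u64 {
        let mut hasher = std::collections::hash_map::DefaultHasher::new();
        std::thread::current().id().hash(&mut hasher);
        hasher.finish()
    }
}

/// "Context-local" storage: one slot per context, chosen at runtime, so a
/// subscriber can keep (say) a current-span stack per context without
/// hard-coding OS thread-locals.
struct ContextLocal<C, T> {
    ctx: C,
    slots: Mutex<HashMap<u64, T>>,
}

impl<C: ContextId, T: Default> ContextLocal<C, T> {
    fn new(ctx: C) -> Self {
        Self { ctx, slots: Mutex::new(HashMap::new()) }
    }

    /// Run `f` with mutable access to the current context's slot.
    fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        let id = self.ctx.current();
        let mut slots = self.slots.lock().unwrap();
        f(slots.entry(id).or_default())
    }
}
```

A subscriber could then hold a `ContextLocal<ThreadContext, Vec<span::Id>>` for its span stack, and a kernel-mode user could swap in a per-core context without touching the rest of the subscriber.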
Hope that all makes sense; I'm happy to answer any further questions!