> my god are the reference implementations lacking.
Can you share some of your experience, what do you mean by that? Are there edge cases causing problems, or major missing features? Easy or difficult to use?
As an example, Exemplars are part of the metrics spec [1]. The official python library says metrics status is 'stable' [2]. But there's a roughly two-year-old issue titled 'Metrics: Add support for exemplars', where the latest update is that no work has begun [3]. Nothing at the top level of the opentelemetry-python project indicates that the project does not implement everything in the metrics spec, so if you wanted to use that capability, you are apt to discover it relatively late.
In the Elixir library, we don't even have metrics. I went down the OTel rabbit hole for two days trying to understand how it is better than Prometheus, just to learn it doesn't even do the basic thing, just traces.
I've mentally decided to just go Prometheus and ignore OpenTelemetry for the foreseeable future.
It's one of those things big players are hyping to preemptively lock you in their solution, but it's actually just alpha-quality new tech and "boring" "old" tech like Prometheus or statsd are simply more functional and better supported in the wild.
Elixir opentelemetry works quite well with Tempo. Tempo does the metric generation [1] and writes the metrics to Prometheus. Tempo also does Service Graphs, which work great with context propagation [2].
Btw, metric generation is not enabled in Tempo by default.
Metrics are implemented in the `opentelemetry_experimental` application. Last time I tried them, they were still a bit buggy but working (not complete, though).
> Can you share some of your experience, what do you mean by that? Are there edge cases causing problems, or major missing features? Easy or difficult to use?
Just the general problem you get with big, slow moving OSS projects like this. Mostly just docs not current and a massive delta between certain languages; a feature is `stable` for some languages but not others which makes it hard to push for consistent otel roll out in a mixed-language environment.
Some other "misc" points:
- Google how to do $thing and you might find the proposed spec which gives example code ... that isn't what actually got implemented. That's a different link further down on your google results.
- Python auto-instrumentation is ... fragile at best.
It's not super clear if instrumentation is supported only with well known frameworks or just ... in general.
I'd sure love some docs that explain how it works, too.
- certain things require the collector to use gRPC, others work with gRPC or HTTP... and I only found this out after googling an obscure error and reading through a _very_ long GH issue thread.
The example presented seems to also log; they just annotated the logs with span data.
> What we ended up implementing was a little tee inside the o11y library. As well as sending events to Honeycomb, we also converted them to JSON, and wrote them to stdout. That way, after sending to stdout, we then pumped off to our standard log aggregation system. This way, we've got a fallback. If Honeycomb is not working, we can just see our logs normally. We could also send these off to S3 or some other long term storage system if we wanted.
I'd like to go a step further, and say that in addition to being worried about honeycomb being down, sometimes you just want to check with kubectl to get an idea what is going on.
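For anyone wondering what such a tee might look like, here is a minimal sketch in plain Python. It is not the quoted author's code: `TeeSink`, `emit`, and `send_to_backend` are hypothetical names, the event fields are made up, and writing the local copy before calling the backend is my own assumption about ordering.

```python
import json
import sys
from typing import Callable, Optional, TextIO


class TeeSink:
    """Write each event as one JSON line to a stream and also forward it to a backend."""

    def __init__(
        self,
        send_to_backend: Callable[[dict], None],  # hypothetical exporter call
        stream: Optional[TextIO] = None,
    ) -> None:
        self._send = send_to_backend
        self._stream = stream

    def _write_line(self, payload: dict) -> None:
        # Resolve stdout at call time so redirection still works.
        out = self._stream if self._stream is not None else sys.stdout
        out.write(json.dumps(payload, sort_keys=True) + "\n")
        out.flush()

    def emit(self, event: dict) -> None:
        # Local copy first (an assumption), so it survives a backend outage.
        self._write_line(event)
        try:
            self._send(event)
        except Exception as exc:
            # A failing backend should not take the application down;
            # record the failure next to the event instead.
            self._write_line({"tee_error": str(exc)})


if __name__ == "__main__":
    def unreachable_backend(event: dict) -> None:
        # Stand-in for a vendor client that is currently down.
        raise ConnectionError("backend unreachable")

    sink = TeeSink(unreachable_backend)
    sink.emit({"name": "checkout", "trace_id": "abc123", "span_id": "def456"})
```

Because the fallback is just JSON lines on stdout, the same output is what `kubectl logs` would show for the pod, and whatever log aggregation already scrapes stdout picks it up without extra wiring.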
Our current projects are very log light because of the heavy tracing instrumentation, but it'd be nice to integrate this with the otel paradigms as they were originally intended.
Agree, there being an open standard for instrumentation is a big win. Lots of work still needs to be done on showing more examples and making it more accessible to users & implementors.
One other key area is resources that help engineers/implementors get organizational buy-in.
Meh, I work in metrics observability and there's very little support for otel. Most new open source products are still based on Prometheus, which has much better SDK support than otel.
I think it's a mistake for Otel to do its own thing instead of just building on top of Prometheus.
I don’t agree with the communication patterns of either Prometheus or OpenTelemetry, but I’ll pick Prometheus next time I have to do telemetry. Unless there’s some fork of StatsD with tags that makes a resurgence.
But OTLP is still hot garbage right now. If you send otlp to Prometheus it might get there, or it might all end up being dropped by a parsing error, because otelcollector is dumber than a box of hammers that have been through a rock tumbler.
I am glad that the observability sector has standardized on a common protocol but my god are the reference implementations lacking.