Show HN: Open-source APM with support for tracing, metrics, and logs (uptrace.dev)
112 points by vmihailenco on Sept 6, 2022 | 46 comments
Uptrace is an all-in-one tool that supports distributed tracing, metrics, and logs. It uses the OpenTelemetry observability framework to collect data and the ClickHouse database to store it.

You can ingest data using the OpenTelemetry Protocol (OTLP), Vector Logs, and the Zipkin API. You can also use the OpenTelemetry Collector to collect Prometheus metrics or receive data from Jaeger, AWS X-Ray, Apache, PostgreSQL, MySQL, and many more.

The latest Uptrace release introduces support for OpenTelemetry Metrics, which includes:

- User interface to build table-based and grid-based dashboards.

- Pre-built dashboard templates for Golang, Redis, PostgreSQL, MySQL, and host metrics.

- Metrics monitoring (a.k.a. alerting rules) inspired by Prometheus.

- Notifications via email/Slack/PagerDuty using AlertManager integration.

There are 2 quick ways to try Uptrace:

- Using the Docker container - https://github.com/uptrace/uptrace/tree/master/example/docke...

- Using the public demo - https://app.uptrace.dev/play

I will be happy to answer your questions in the comments.




Nice to see so many new projects in the area of APM in the last few months.

We recently tried SigNoz and Grafana Tempo, and while I can't say anything about Uptrace yet (will definitely try it out), I want to list some pros and cons of them.

Grafana Tempo

Pros:

- Easy and smooth integration into our existing Grafana instance, no additional frontend needed

- No new storage engine needed (No additional Clickhouse, Postgres, etc) as it saves its data to S3

- Supports OTLP

Cons:

- Search is limited by param size and unique params (as it's built to be indexed)

- Ingestion is not real-time, but configurable (time to finish a span)

SigNoz:

Pros:

- Supports OTLP

- Integrates Logs and Metrics within the same service (for Grafana you need Loki then)

- Supports real time querying

Cons:

- Uses a new storage engine (or extends the software stack) by adding ClickHouse

- Adds an additional frontend (might not be relevant for everyone)

- Doesn't provide SSO yet, so you need to manage users differently

Interesting to see that Uptrace also chose ClickHouse (btw, I love ClickHouse!).

Some questions:

- Can I easily disable certain features? (e.g. alerting)

- Is there support for SSO for self-hosted installation?

- Are there any recommendations for scaling (e.g. benchmarks) on how many spans/s are supported on what hardware?

Thanks in advance!


Thanks for the feedback!

>- Are there any recommendations for scaling (e.g. benchmarks) on how many spans/s are supported on what hardware?

With Uptrace, I was able to achieve 10k spans/second on a single core by running the binary with GOMAXPROCS=1. That is 1-3 terabytes of compressed data each month, which is more than most users need.

Practically, you are limited by the $$$ you are willing to spend on ClickHouse servers, not by Uptrace ingestion speed.

So my recommendation is to scale Uptrace vertically by throwing more cores at it. That will allow you to go very very far.
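As a rough sanity check of those numbers (the compressed bytes-per-span figure below is my assumption, not an Uptrace measurement):

```python
# Back-of-the-envelope: sustained ingest rate -> monthly compressed storage.
spans_per_sec = 10_000
seconds_per_month = 60 * 60 * 24 * 30         # 30-day month
spans_per_month = spans_per_sec * seconds_per_month  # ~26 billion spans

bytes_per_span = 100                          # assumed avg compressed span size
tb_per_month = spans_per_month * bytes_per_span / 1e12

print(f"{spans_per_month / 1e9:.1f}B spans, ~{tb_per_month:.1f} TB/month")
```

At ~100 bytes per compressed span this lands at ~2.6 TB/month, consistent with the 1-3 TB range above for 40-120 bytes/span.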

>- Is there support for SSO for self-hosted installation?

So far the only way to add new users is via the YAML config. We are considering adding a REST API or a CLI tool for the same purpose, but it is not clear how that would work with the YAML.

Regarding SSO, it would be nice if you could point us to an app that already does that so we can better estimate the complexity. But so far we don't have such plans.

>- Can I easily disable certain features? (e.g. alerting)

Yes, most YAML sections can simply be removed / commented out to disable the corresponding feature.
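For illustration, a hypothetical fragment (the section and key names here are invented; check the shipped uptrace.yml for the real ones):

```yaml
# uptrace.yml (sketch): commenting out a section disables the feature.
#
# alerting:
#   rules:
#     - name: Too many errors
#       ...
```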


Thanks for the answer.

I have no other good examples for SSO than Grafana.

But something I would love to see more of for logins is application tokens. We use Cloudflare Access for team-related logins, which sends such a token in a header; the application can use it to authorize a user and, based on the user's group, enable/disable features.

https://developers.cloudflare.com/cloudflare-one/identity/au...

This solved our need for multiple sign-in options, as everything is now managed through Cloudflare Access, but this is obviously not a solution for everyone.


We already use JWT tokens passed in an HTTP cookie, so perhaps we could document how it works and let users sign JWT tokens themselves. That way your app only needs to set a cookie and the user will be authorized.

Let's continue the discussion on GitHub https://github.com/uptrace/uptrace/issues/76
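For the curious, HS256 JWT signing is simple enough to sketch with the standard library alone (the claim names, cookie contract, and shared secret below are made up for illustration, not Uptrace's actual scheme):

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    # JWTs use unpadded URL-safe base64
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt_hs256(claims: dict, secret: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    signature = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"

# Your auth proxy signs the token; the backend verifies it from the cookie.
token = sign_jwt_hs256(
    {"sub": "alice@example.com", "exp": int(time.time()) + 3600},
    b"secret-shared-with-backend",
)
```

A real integration would also need agreement on the cookie name and the exact claims the backend expects, which is what the documentation would have to cover.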


Thanks for the mention. I am one of the maintainers at SigNoz [1].

Thanks for laying out the points in the Pros section. We also recently launched logs with v0.11.0, so you may want to give it a try again - we now have metrics, logs, and traces in a single app.

Would love to understand in more detail a few of the points you have mentioned in the Cons for SigNoz.

> - Uses new storage engines (or extends the software stack) with adding ClickHouse

Can you explain a bit more about the concern here?

> - Doesn't provide SSO yet, so you need to manage users differently

This is in our roadmap and we will be shipping it soon.

[1] https://github.com/SigNoz/signoz


Didn't know about the Grafana option, thanks. Interesting - so one can do metrics dashboards, logs, and tracing in one interface!


This looks amazing! I would definitely like to use this for log monitoring. However, I have a question: is it possible to get logs for individual Docker containers?


It is possible using Vector Logs, which Uptrace supports out of the box, for example:

- https://vector.dev/docs/reference/configuration/sources/dock...

- https://uptrace.dev/get/ingest/vector.html

If you are having trouble making it work, feel free to open an issue on GitHub and I will provide a complete example.
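A minimal Vector sketch along those lines (the Uptrace endpoint and auth header are assumptions; see the linked docs for the real values):

```yaml
# vector.yaml: collect logs from Docker containers and ship them to Uptrace
sources:
  docker:
    type: docker_logs            # one log stream per container
sinks:
  uptrace:
    type: http
    inputs: [docker]
    encoding:
      codec: json
    uri: https://uptrace.example.com/api/v1/vector/logs   # assumed endpoint
    request:
      headers:
        uptrace-dsn: "<your project DSN>"                 # assumed header
```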


I see lots of new tracing options these days, and that seems to have taken over the "APM" term.

I still have yet to see new profiling options. When I think of APM, I think of CPU profiling and automatic instrumentation of black box systems, not request tracing. I should be able to see which function calls are slow/problematic, without having to add code to the application.


I use memory profiling with Go and it is indeed very useful. I think that whatever Go already provides is enough, but I guess Uptrace could try to automate some things and/or provide some fancy UI.

But I find CPU profiling a lot less useful, because production profiles tend to be too broad and it is hard to make sense of them; for example, Uptrace's profile mostly consists of memory allocations and network calls.

So I would not say that CPU profiling is superior to, or can replace, tracing.


Tracing is a piece of the puzzle and necessary. Profiling occupies a different part of the monitoring ecosystem.


I think there is some interesting work being done with eBPF in the profiling space


Before you try, please make sure you are comfortable with their license - https://github.com/uptrace/uptrace/blob/master/LICENSE (Business Source License 1.1), which, as the license itself says, "is not an Open Source license".


It’s a shame that whenever something cool is posted on HN some long ass thread about licensing floats to the top.

Nothing wrong with the parent comment but do we really need a ‘yes it is’ / ‘no it isn’t’ back and forth that goes on seemingly forever almost EVERY TIME?

I’m gonna need a ‘license nitpick remover’ to complement my adblocker!


It says Open Source right in the HN headline. That the license contradicts that seems like important information to discuss.


Go to the comment and click next. Congrats, you've now avoided the entire discussion the comment spawned.


It is the same license used by MariaDB, Sentry, CockroachDB, Couchbase, and many others. Technically it is not open source and is instead called source-available, but you can enjoy pretty much the same benefits.

Out of curiosity, what makes you uncomfortable about the license?


Er, MariaDB is GPLv2, and will forever be, since it is derived from MySQL.

I'm guessing they aligned the license terms with those of ClickHouse, which is the underlying data store for Uptrace. From my understanding, if you use Uptrace and ClickHouse to manage your internal telemetry and don't offer it to clients, you should be fine. Still, non-standard license terms give pause, as there is always the possibility they will be restricted further in a bait-and-switch like those done by MongoDB or Elastic.


>the possibility they will be restricted further

It is true for all licenses; for example, it is possible to keep old code available under the old permissive license but release all new code under a more restrictive license.

No license is safe! :)


This is not correct. There's a difference between OSS projects with shared copyright (every individual contributor holds the copyright to their contributions) and OSS projects where the copyright is required to be transferred to some company.

In the second case, this company then holds the copyright to the entire source tree and can re-license it at will. Mongo and Elastic did this, and they are good examples of why you should not transfer copyright: it gives the company the right to re-license your contributions as it pleases. That's the reason they insist on copyright transfers. They reserve the right to change the license on future versions of the software; you can still use the old versions under the license that applied at the time.

So, OpenSearch is a fork of the last Apache 2.0-licensed version of Elasticsearch. Versions after that are licensed under a non-OSS license, whereas OpenSearch is a proper open source project where copyright belongs to individual contributors, which, ironically, is still mostly Elastic plus whatever individual OpenSearch contributors added.

Most projects don't insist on copyright transfers, however, and given enough external contributors it becomes increasingly hard for them to get permission to change the license.

Regardless of the license, there is absolutely zero chance of something like MySQL, Linux, or other long-existing OSS projects ever being re-licensed, because it would require tracking down tens of thousands of copyright holders (or their surviving relatives) to get permission, and most of them would probably not be willing to give it. This is so impractical that it will never happen.

And even if it happened, anyone could continue using and contributing under the old license. All you'd have is a fork that is cut off from those contributions (because licenses like GPL v2 don't allow mixing with proprietary code). So, given a permission you will never get, you'd have a fork that is effectively yours of an original source tree that still belongs to all the original contributors and is licensed under the original license.

OSS done properly builds communities that exist for as long as people continue to be willing to use and contribute to the software. Some OSS projects are now decades old.


I am not a lawyer, so I can't keep up the discussion at the necessary level, but I will clarify a few things.

Uptrace uses the BSL license to forbid, or rather not allow, other companies to create a cloud service using Uptrace code, because that is how we are planning to monetize. But you can self-host Uptrace and use it as you want to monitor your (production) application. I think this is fair enough.

I am sure there are many complications with re-licensing, but it happens in practice; for example, Sentry now uses the BSL license. And you can't do anything about it except fork old Sentry, but then you will have to maintain it yourself.

And I am not arguing with anything you've said, but there are not that many financially thriving truly OSS projects. That's why people like me have to complicate their lives with BSL, AGPL, and others.

Thanks for the comment!


> Technically it is not open source and instead is called source available, but you can enjoy pretty much the same benefits.

Since you are already aware of it, could you update the OP from open source to source-available, for the sake of transparency? If the edit option is not available, you can ask the mods / dang.


> Out of curiosity, what makes you uncomfortable about the license?

I can't speak for the OP, but my view is that all these semi-open (AGPL, BSL, etc.) licenses do is muddy the waters. It's essentially giving the developer's lawyers enough grey area to work with in order to find something they can pin on you.

IMHO a company's code should either be closed source or open source. Wishy-washy no-mans-land wordings in the middle don't really help anyone (except the lawyer's bank balance, of course).


Uptrace uses the BSL license to forbid, or rather not allow, other companies to create a cloud service using Uptrace code, because that is how we are planning to monetize. But you can self-host Uptrace and use it as you want to monitor your (production) application. I think this is fair enough.

If there is another license that better reflects our intentions, let us know.


Why do you consider the AGPL semi-open? The AGPL is just the GPL extended to cover web services.


I cannot run Uptrace in production:

> The Licensor hereby grants you the right to copy, modify, create derivative works, redistribute, and make non-production use of the Licensed Work.

So what is the point of even trying to install it in other environments?


You can self-host Uptrace and use it in a production environment. This is explicitly stated in the FAQ - https://github.com/uptrace/uptrace#faq . But you can't resell Uptrace to others.

I can only repeat that the same BSL license is used by MariaDB, Sentry, CockroachDB, Couchbase, and others. I am not a lawyer and thus not qualified to discuss details.


Read the next sentence of the license.


This seems like a pretty cool project!

Currently using Apache Skywalking myself, because it's reasonably simple to get up and running, as well as integrate with some of the more popular stacks: https://skywalking.apache.org/

I do wonder how ClickHouse (which Uptrace uses) compares with something like Elasticsearch (which Skywalking and some others use), and how badly/well an attempt to use something like MariaDB/MySQL/PostgreSQL for a similar workload would actually go.

I mean, something like Matomo Analytics already uses a traditional RDBMS for storing its data, albeit it might be an order of magnitude or two off from the typical APM solution.


When compared with Elasticsearch, ClickHouse can handle the same amount of data using 10x fewer resources, and that is not an exaggeration. It is even worse with MariaDB/MySQL/PostgreSQL.

I guess ElasticSearch is still relevant when it comes to searching text, but ClickHouse is much faster when it comes to filtering and analyzing the data.

Give ClickHouse a try and you won't be disappointed.

https://benchmark.clickhouse.com/


I think the log interface should be optimized for keyboard navigation and larger screens. On my 4K monitor it only takes up half the width and only shows 10 lines at a time; I'd expect closer to ~100.


Thanks for the feedback. Any projects that you could recommend that do it right?


I wonder if anyone can answer some question on distributed tracing for me.

The difference between the old days of APM vs. tracing, as I understand it, comes down to two things.

1. Originally, APM was single-process and language-aware, usually doing sampled stack traces to find where time is being taken, plus a few very well-known places instrumented for exact timing, say response time or query time.

Tracers work more by instrumenting methods of the framework/server/runtime at well-known points and getting the timing. In many ways it's a lot coarser, as it might not know of a hot loop that I have in my code. But it can trace very well with exact timing at framework boundaries like web, cache, db, etc.

2. APMs were primarily single-process and couldn't really show a different service/process, which doesn't work in a micro-service/distributed world.

The way I understand it is that Tracers would allow me to narrow down to the service/component very easily. Whether I can find out why that component is slow might not be as easy (not sure what granularity tracing happens inside a component).

I wonder if this understanding of mine is correct.

The second thing I am really unsure about is sampling and overhead. What's the usual overhead of a single trace (I know it's variable) - generally, are they more expensive at the single-request level? Also, do they usually sample, and is there a good/recommended way to sample? I forget exactly who (probably New Relic), but someone was saying they collect all traces (like, every request?) and discard those that are not anomalous (to save on storage). But does that mean taking a trace is very cheap? And is that end-of-request sampling decision something that's common, or a totally unique capability some have?


My understanding is that APM became, or always was, a marketing term which is used rather freely. For that reason I try to avoid it, but search engines love it and I don't know a better alternative.

>Whether I can find out why that component is slow might not be as easy (not sure what granularity tracing happens inside a component).

It is true that you can't always guess what operation is going to be slow and instrument it, but it is almost always a network or database call. There is still no way to tell *why* it is slow, but the more data you have, the more hints you get.

>What's the usual overhead of a single tracing

Depending on your base comparison point, the answer can be very different.

Usually, you trace or instrument network/filesystem/database calls, and in those cases the overhead is negligible (a few percent at most).

>But does that mean taking a trace is very cheap?

What you've described is tail-based sampling, and it only helps with reducing storage requirements. It does not reduce the sampling overhead. Check https://uptrace.dev/opentelemetry/sampling.html

But is taking a trace cheap? Definitely. Billions of traces? Not so much.

>request sampling decision something that's common or that's a totally unique capability some have.

It is a common practice for reducing costs when you have billions of traces, but it is an advanced feature because it requires the backend to buffer incoming spans in memory so it can decide whether the trace is anomalous or not.

Besides, you can't store only anomalous traces, because you would lose a lot of useful details, and you can't really detect an anomaly without knowing what the norm is.
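To make the buffering point concrete, here is a toy sketch of tail-based sampling (the anomaly rule - keep traces with errors or slow spans - is invented for illustration):

```python
from collections import defaultdict

class TailSampler:
    """Buffer spans by trace ID; decide keep/drop only once the trace is complete."""

    def __init__(self, latency_threshold_ms: float):
        self.buffers: dict[str, list[tuple[float, bool]]] = defaultdict(list)
        self.threshold = latency_threshold_ms

    def add_span(self, trace_id: str, duration_ms: float, is_error: bool = False):
        # Every span is buffered in memory until the trace finishes:
        # this is the cost tail-based sampling pays up front.
        self.buffers[trace_id].append((duration_ms, is_error))

    def finish_trace(self, trace_id: str) -> bool:
        """Keep the trace if any span errored or exceeded the latency threshold."""
        spans = self.buffers.pop(trace_id)
        return any(err or dur > self.threshold for dur, err in spans)

sampler = TailSampler(latency_threshold_ms=500)
sampler.add_span("t1", 20)
sampler.add_span("t1", 900)          # slow span -> trace is kept
print(sampler.finish_trace("t1"))    # True
```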

Hopefully that helps.


By traditional APM I primarily meant stack-trace-sampling-based monitoring of applications.

As for the overhead of tracing, I wanted to roughly compare (obviously it depends on the application a lot) stack-trace sampling vs. tracing. Are they usually of similar overhead, or is tracing lighter?

I was thinking tail-based sampling could be a lot more expensive, because, say, head-based sampling traces 10% of requests, whereas, regardless of how many samples are kept, a tail-based one traces 100%. So the tracing overhead would be much higher, right?

I'm not sure why head-based sampling is called accurate in your doc. Isn't it the least accurate, in the sense that it's purely statistical and a rare outlier like a latency spike or error could be missed?

And yes, obviously tail-based sampling has to be something like: trace 5% of requests at random, or 1 in every five, plus any outlier detected from the captured trace.


False advertising!

BSL-licensed software is not open source. To be fair, Uptrace's restrictions are relatively light, but it is still a source-available project, not open source.


Nice! Exactly what I've been looking for, will give it a try for sure. Sentry eats a lot of resources so I was looking for an alternative.


Thanks! Don't hesitate to send any feedback you have so we have a chance to improve :)


Looks nice... I'm a bit out of touch in this space but my last solution for similar would be Datadog. How does this compare?


DataDog has a steep learning curve and can be rather expensive if you need to monitor a lot of hosts and microservices.

Uptrace tries hard to stay simple while providing almost the same set of features. It can also be self-hosted without paying anything which can save a lot of money.

Uptrace aims to be an open source alternative to DataDog, but realistically we are not there yet.


I've been out of the loop for a while but...

> OpenTelemetry Protocol (OTLP)

> OTLP

> OLTP

I'm going back to bed.


Any way to export dashboards for public viewing, maybe even as a static image? It looks like all drawing is done client-side at present.


ECharts, which we use, supports exporting charts as images, so it probably can be added relatively easily. Embedding is another possible option.


Can you elaborate more on why ClickHouse for the backend? And what challenges, if any, are you facing with ClickHouse?


How does it compare to Opstrace? (www.opstrace.com)



