BlazingMQ: High-performance open source message queuing system

fidotron · on July 27, 2023

That fan-out functionality is particularly interesting for anyone needing to implement the upstream part of a firehose API, which of course Bloomberg do.

MathMonkeyMan · on July 27, 2023

They actually open sourced it!

I worked on this team very briefly. Great team working on some interesting tech.

theptip · on July 28, 2023

Would love to hear any technical history you can share, what did this replace, and what shortcomings led to it being built? (Seems quite similar to Kafka but perhaps more emphasis on low latency rather than raw throughput?)

callbacker · on July 28, 2023

Some of your questions might be answered in the accompanying blog - https://www.bloomberg.com/company/stories/bloomberg-publishe..., and the comparison doc https://bloomberg.github.io/blazingmq/docs/introduction/comp....

melenaboija · on July 27, 2023

I have been working on an engine to perform heavy computations that is distributed and my messaging choice has been gRPC + Flatbuffers + proxys and I was wondering if someone could name some reasons or use cases to chose message brokers vs a lower level approach like gRPC. Is it mostly because of synchronous/asynchronous needs or there is something else such as latency or even ease of implementation?

ozarker · on July 27, 2023

Depends on your use case. Queueing messages enables the competing consumer flow which implicitly allows load-leveling which is great if your traffic is spiky and you need a little time to auto-scale or other scenarios like that. If you’re just looking to blast out data to all consumers or just load-balancing is fine, approaches like yours are probably better.

stolsvik · on July 29, 2023

It is the asynchronous and decoupled nature of messaging that pulls architectures towards message brokers. In a "MOM Architecture", the producing of messages, and consuming of messages, is completely decoupled - both in "space" (HA, location transparency, replication etc) and in time (the async nature).

This leads to a whole heap of benefits. I've outlined a few of them here, in the "Rationale for Mats3", my Message-Oriented Async RPC library: https://mats3.io/background/rationale-for-mats/

You will find that the Reactive Manifesto also lays out the same type of arguments (there's some links to it from the list in the first link): https://www.reactivemanifesto.org/ and its glossary: https://www.reactivemanifesto.org/glossary

pravus · on July 27, 2023

If your workers can hop onto the next task without having to reach out for any additional state or report back to anything synchronously a distributed log/spool/queue/broker works well.

If you need light-weight reporting like a status code you can use RPC/reply mechanisms if the broker supports it (e.g. rabbitmq), build your own (custom headers), or use an API/gRPC call.

Everything else starts needing some sort of more sophisticated coordination mechanism and gRPC will look attractive for that although there are usually other options as well.

imachine1980_ · on July 27, 2023

grpc client a -> grpc server a for client a grpc client b -> grpc server a for client b

grpc client a -> message broker(implemen a) -->

grpc client b -> message broker(implemen b) -->

message broker-> your app for the broker this make sense if dont make sense manage the quque in your app if ypu want to abstract the implementations is worth if you have n to 1 or n to n and all works different

insanitybit · on July 27, 2023

> Supports a unique multi-hop network topology leading to a distribution tree for each queue, thereby leading to network bandwidth savings for certain use cases.

This is very interesting. One of the benefits of Kafka is partition routing so I'm curious to learn how this may compare.

Sparkyte · on July 27, 2023

No idea, but I am interested too. One of the reasons this was probably created was because of a network constrained environment where data is expensive in more than one way.

Cloud Providers can be expensive.

WJW · on July 27, 2023

Bloomberg (who built this) operate a system that makes money by transmitting the same data to a huge amount of clients. Having one server transmit the same piece of information to a million client is approximately 500 times slower than having one server transmit it to 1000 proxies who each pass it on to 1000 end users.

pclmulqdq · on July 27, 2023

The actual architecture of most of these systems in finance relies heavily on UDP multicast, which is a technology that big tech has basically forgotten because it can be tough to administer at large scale.

Multicast puts the entire load of distributing the data in its natural place: the network. The pub/sub queues that the rest of the world uses are more complicated and a lot worse.

This appears to be the beginning of a "cloud-native" bloomberg that can't lean on multicast.

Turskarama · on July 28, 2023

UDP Multicast relies on constantly throwing out the entire state of your system in order to be reliable, whereas with TCP you only need to throw out changes. This is fine in finance because the state is changing constantly anyway so you might as well just continually throw out the multicast traffic.

Message queuing is a more complicated system, but it allows you to push the bulk of your work out to the distributed network instead of concentrating it on the producer.

pclmulqdq · on July 28, 2023

That is definitely not true about state management - you can build very TCP-like systems on top of UDP (see QUIC), and the final adaptation you need is to figure out how to make that work with multicast (a non-trivial adaptation, but possible). Once you do, you have the foundations of a pub/sub system in the form of a one-to-many (or many-to-many if you are brave enough) network protocol, and now you basically only have to figure out how to do state recovery of new clients and clients who are behind, and that can basically just be a database. The system described in the OP is kind of a combination of a database and load balancer in order to do this over unicast connections.

You have to do this state management with a TCP-based system anyway - the use of TCP doesn't magically come with the full state of past messages.

zarkov99 · on July 28, 2023

This is not right in finance at least. Messages updates are typically incremental just like in TCP. The reliability problem is dealt with with careful network stack tuning, core affinity and out of band recovery.

buttaphingas · on July 29, 2023

Check out 0MQ (ZeroMQ), Informatica Ultra Messaging (formerly 29 West), and aeron.io They create reliable uni/multicast protocols over UDP, including strategies for NAK storms etc.

cgio · on July 28, 2023

Keep in mind Bloomberg works with a few “cloud native “ consumers. In that context it may make more sense.

IggleSniggle · on July 28, 2023

Well that certainly sent me down a rabbit hole. Seems like UDP Multicast is really IP Multicast. I'd love to play with this

pclmulqdq · on July 28, 2023

Yes, UDP on top of IP multicast - UDP is a pretty thin layer on top of IP in general, and the IP protocol is deceptively powerful. You just can't run protocols like TCP over IP multicast.

eric-hu · on July 27, 2023

> Cloud Providers can be expensive.

Do you have a story around this? If so, would love to hear more about it.

lelandbatey · on July 27, 2023

Specifically, egress from Cloud Providers (transmitting data from inside the cloud to a computer outside the cloud) is the usual culprit for being "very expensive". It's part of a pricing strategy by cloud providers that encourages folks to put all their infrastructure inside of the cloud and to have none of it outside the cloud. As one example of the ink spilled over this topic, see this discussion from 2 years ago about AWS's very high egress fees:

Article "AWS’s Egregious Egress" (2021) https://blog.cloudflare.com/aws-egregious-egress/

HN discussion: https://news.ycombinator.com/item?id=27930151

beyonddream · on July 27, 2023

Ingress can also be costly especially if there is a steady state of high load traffic sent from on-prem machines to machines inside cloud (not sure about aws but have experienced this with azure where we had to resort to buy their expressroute which was very costly and ultimately unsustainable for us)

linuxdude314 · on July 28, 2023

Ingress is free for AWS. Doesn’t make any sense to charge for ingress…

saled · on July 28, 2023

Something else I was caught out on recently is cross availability zone traffic, especially for multi AZ kubernetes where the nodes are very chatty. It's possible to get the services to consider network topology to limit cross az traffic, but it's not the default.

itscodingtime · on July 27, 2023

Can anyone give an example in which there would be less network bandwidth? This tree topology seems like it would use more bandwidth.

gavanm · on July 27, 2023

Instead of a the tree structure, think about it as Data Center -> WAN -> Site Server -> LAN -> Client

By having a proxy/repeater/site server, you are reducing the amount of WAN traffic you need to send.

manvillej · on July 28, 2023

maybe if you're hosting replicas or proxies outside of a highcost bandwith area?

low cost end replicas at the end. if the bandwidth usage is half the cost in their multihop topology, 14XXXX compared to 16XXXX? depends on the number of proxies or replication and pricing.

the-alchemist · on July 28, 2023

Bloomberg has some interesting code! https://github.com/orgs/bloomberg/repositories

Has anyone used comdb2? It seems like an awesome DB that doesn't seem to get the level of love it should.

cmacleod4 · on July 28, 2023

Comdb2 is very heavily used inside Bloomberg; outside not so much.

the-alchemist · on July 29, 2023

If it's heavily used inside Bloomberg, that's good enough for me!

BTW, the original Comdb2 paper from 2016 is good [0]. It has features that even today seem awesome. It sounds like a really nice distributed MySQL/PostgreSQL.

[0]: http://www.vldb.org/pvldb/vol9/p1377-scotti.pdf

Alifatisk · on July 29, 2023

Is it worth exploring?

1letterunixname · on July 27, 2023

Marketing is all well and good. Let's see the data of p99 latency and throughput numbers compared to rabbitmq, kafka, activemq, qpid, and hornetq.

Also, it's worth identifying and setting aside use-cases for broker-less messaging where 0mq, nanomsg, etc. could be more appropriate.

Edit: There's only published performance data of itself, which has little value: https://bloomberg.github.io/blazingmq/docs/performance/bench...

eightysixfour · on July 27, 2023

What a phenomenal landing page. Thanks to the animations I have a good idea of what this does and how it does it. You barely need to know what message queuing is to understand it.

gr33nq · on July 27, 2023

Does anyone recommend any particular piece of software for creating these types of visualizations with relative ease? I've always been a fan of animations for helping to understand some technical topics.

maxwellito · on July 27, 2023

As erwinkle mentioned, these animations use anime.js If you want to build something but don't really know how to do it, I would suggest the following:

1. Build the visual content in an SVG with Inkscape. It should be easy for you to build the layout with assets/item to animate. Don't forget to group items and set names so you can reference them later.

2. Once exported, insert the SVG in an HTML page and add anime.js. This library will let you animate the content you just created (you should be able to reference your named items from Inkscape and animate them). The learning curve might be tough depending on your experience regarding animation (or CSS in general).

Good luck :)

erwinkle · on July 27, 2023

They use anime.js for their animations https://animejs.com/

As seen in their source code here: view-source:https://bloomberg.github.io/blazingmq/assets/animations/anim...

Austizzle · on July 27, 2023

Lottie.js is a library for animations in JavaScript, and after effects can export to it! You're obviously limited in what you can export, but if you do a basic after effects animation with shapes and text, it works fine.

Pretty sure it was made by Airbnb, and used in tons of apps like Google home

Works really well

nine_k · on July 28, 2023

There are free tools to create lottie animation, Synfig in particular supports it.

whoisthemachine · on July 27, 2023

All the replies mention technologies to help you build one, but this chart looks like a modified, animated version of a Petri Net to me, which is a very useful chart for visually describing the flow of data between systems. I've built similar charts using PowerPoint and exporting the PowerPoint as a GIF... same effectiveness for internal use, but not something you're going to want to brag to your friends about.

jasonlotito · on July 27, 2023

Okay, this is going to sound weird, but I've used Keynote in the past. I'm sure PowerPoint can do this as well, but Keynote has Magic Move, and you can export this animation. It's very easy to work with, and do simple animations. I'm sure there are much better tools for this, but as someone who knew Keynote, this was always effective for me.

haswell · on July 28, 2023

Keynote is surprisingly good for this, and I’ve used it for animated data flow visualizations in the past.

While I like the idea of something like anime.js and think it raises the ceiling on what’s possible, I’d still probably start with something like keynote just to get the idea out of my head and into a format I can validate before going into code.

danielvaughn · on July 27, 2023

I've heard Framer can make some impressive animations, but I don't know much about how well they export to web. What's on display here is really cool, it's all animated SVG.

treprinum · on July 27, 2023

animatron.com for animated SVG or JS. I used it for some algorithm animations myself.

munro · on July 27, 2023

i've just used html/css/javascript then screen recorded it

ElongatedMusket · on July 27, 2023

Getting started pages are great too, with guided examples and use cases: https://bloomberg.github.io/blazingmq/getting_started

BMQ license is apache 2.0 for anyone curious

nine_zeros · on July 27, 2023

Agreed. This feels like a page created from enthusiasm, not from a place of promo-seeking performance review.

scottlamb · on July 27, 2023

The animations are very slick, but I can still grumpily pick at them: the "Clustering and Quorum-based Replication" one shows 4 total servers (suboptimal number because quorum requires 3/4 where I'd prefer to require 2/3 or 3/5) and all the followers acking before the primary proceeds (the animation could show one lagging instead).

amelius · on July 27, 2023

Also, why is the middle replica special, in that the message is passed through it in the end?

BeeOnRope · on July 27, 2023

4 might be useless compared to 3 from a "required quorum" PoV but if you want to stay up with 1 failure and the load can't be handled on 3 servers but 4 is OK, it is optimal isn't it?

I.e. selection of ideal server count is a multi-dimensional problem.

scottlamb · on July 27, 2023

Worse than useless from a quorum perspective, because you've increased the number of servers that have to participate in quorum (quickly) without increasing the number that don't. Looking at it the other way, you go from having problems when 2/3 are slow/broken to having problems when 2/4 are slow/broken. (also note slow doesn't even mean the machine is having problems; in geographically distributed setups it may just be due to physical distance from the leader.)

Yes, 4 servers might handle more load than 3 servers [1], but so would 5 servers. I'd guess that if you've got enough message queue traffic that 3 servers can't handle the load, those servers aren't a significant fraction of your total cost. You might as well go to 5. Likewise, if 5 aren't enough, I'd skip over 6 and go to 7. If 7 aren't enough...I'd probably rethink my design entirely...but I'd prefer 9 to 8.

Alternatively, quorum systems can support "zero-weight replicas" which effectively don't vote but still can be useful for eventually consistent reads. Then you have this server that helps handle the load but doesn't increase the number of servers that have to participate in a round.

[1] if the load is overwhelmingly eventually consistent read operations and their multi-hop topology with proxies doesn't sufficiently reduce the load on the replicas. It's not obvious to me if this scenario is plausible or not.

rewmie · on July 27, 2023

I agree, this is perhaps the very best landing page I ever saw. What a treat.

mkl95 · on July 27, 2023

How does it compare to NATS.io?

leetharris · on July 27, 2023

It is amazing to me that NATS.io is not more popular. It is so much better than Kafka, extremely simple, and extremely light.

klabb3 · on July 28, 2023

Same here! I don’t get it. It’s much more lightweight, fully FOSS and their custodian company (Synadia) is very friendly and open to self-hosting. Basically, they run a global super-cluster of Nats which is pretty much vanilla afaik.

My main gripe is lack of “best practices”. You’re pretty much on your own when it comes to important decisions like subject namespacing and some operational stuff can be a bit tricky, like changing options and migrating. They had “nats by example” but that seems to be quite out of date and not super helpful. They have started doing more videos which are great but not that much content overall.

Their auth story is very sophisticated but perhaps a bit over-engineered. Some use cases like having anonymous “self-signed” client keys are tricky to work with.

Other than that, it’s an amazing piece of tech. One that is scalable in a way that just makes sense and gets out of the way when multiple machines are involved.

slantedview · on July 27, 2023

Nats is simple because it doesn't bother addressing some of the scalability problems that Kafka was designed to address, like clustering with partitions:

https://docs.nats.io/legacy/stan/intro/partitioning

coder543 · on July 27, 2023

You linked to NATS Streaming, which has been deprecated for years. It’s not relevant. It can still be used for legacy code that still needs it, as the URL even shows, but it is not recommended for new applications.

The current standard is NATS JetStream, which absolutely addresses scalability, arguably to an even larger degree than Kafka, since it can be used to create a globally interconnected supercluster of clusters where messages can be configured to flow between regions as needed. I doubt anyone would run a Kafka cluster that spans multiple regions.

kitd · on July 28, 2023

I doubt anyone would run a Kafka cluster that spans multiple regions.

Stretch clusters and clusters with brokers in different AZs are absolutely a thing in Kafka. I make no comment on the ease relative to Nats, which I don't know at all.

coder543 · on July 28, 2023

I was talking about different regions altogether, not just different AZs. My understanding is that it’s generally a bad idea in Kafka due to the latency, but I could be wrong.

EDIT:

Definitely sounds like there are some tight latency requirements[0]:

> A stretched 3-data center cluster architecture involves three data centers that are connected by a low latency (sub-100ms) and stable (very tight p99s) network, usually a “dark fiber” network that is owned or leased privately by the company.

100ms would theoretically be enough to span the contiguous United States, but the references to "very tight p99s" and "dark fiber" make me wonder if 100ms is actually acceptable, or just a theoretical maximum that can only be allowed under absolutely perfect conditions. Either way, suggesting the user should have access to dark fiber between their regions does not fill me with confidence about the robustness of this solution. I'm sure it would be fine for geographically close datacenters, as AZs are designed to be.

[0]: https://docs.confluent.io/platform/current/multi-dc-deployme...

slantedview · on Aug 1, 2023

> My understanding is that it’s generally a bad idea in Kafka due to the latency

Mostly because of the speed of light, not because Kafka is for some reason unable to form a cluster across regions but Nats can.

topspin · on July 28, 2023

> How does it compare to NATS.io?

They're fundamentally different designs.

Garden variety NATS (i.e. the non-"Jetstream" case, hereafter) is "at most once" (messages will be lost.) BlazingMQ is "at least once" (>1 delivery of a message will occur.)

That difference is down persistence. NATS doesn't persist messages to stable storage. BlazingMQ must use persistent storage. From that all sorts of application design, performance and other implications are derived. And they all matter. A lot.

fadedsignal · on July 27, 2023

A yet another MQ :-) I quickly checked the performance page and it looks much better than its competitors. Documentation is better as well.

MathMonkeyMan · on July 27, 2023

Last time I checked, they weren't even building with optimization, and it was still the fastest thing in the West. Peeved me, but whatever.

"Every byte counts" was the motto of the team that designed it.

glonq · on July 27, 2023

Are they working on other API's besides Java and C++?

callbacker · on July 27, 2023

Hi, one of the authors here. We will be releasing a Python client library in a few weeks.

KrishnaAnaril · on July 28, 2023

Any plans to support .NET?

mikece · on July 28, 2023

There are a number of ways to wrap native code libraries as a managed library. I did it once (packaged an Obj-C library as a .NET DLL for a Xamarin app) back in 2018 but had lots of help from a teammate who knew that dance far better than me and I forgot all of the details. I googled and found at least four different blogs posts showing how to do it.

I know that's not an answer but if you're feeling adventurous you might be able to help the project.

glonq · on July 28, 2023

+1 vote from me for a first-class client for .Net core.

I'd tolerate wrappers for certain things, but not for this thing.

callbacker · on July 28, 2023

Hi, no plans as of yet. Please also see my response to the Golang client library request above.

no1youknowz · on July 28, 2023

Are there plans for a native Golang library? That's the only thing stopping me from looking at this. Many thanks!

callbacker · on July 28, 2023

It's not on the roadmap for the next 6 months. But priorities can change depending upon the demand. One thing to note is that BlazingMQ client libraries are very stateful, and implementing a batteries included library is a non-trivial effort.

frakkingcylons · on July 27, 2023

The python client library is planned to be published. They say they need to remove some dependencies and update the workflow to something more suitable for OSS.

stolsvik · on July 29, 2023

I'm the developer of the Message-Oriented Async RPC library Mats3: https://mats3.io/

Its current sole implementation is based on Java Messaging Service JMS API, and it is used in production for a rather large UCITS unit holder registry on Apace ActiveMQ, and all tests runs fine on Apache Artemis MQ.

Every time a new message broker comes along, I sit up in the chair and wonder a) about their performance (!), and b) whether they have a JMS client implementation, and c) whether Mats3 works with that! When I tested RabbitMQ's JMS client library, I sadly found that there was rather many differences - things I thought was screamingly obvious, was not available. E.g. as basic function as redelivery: "Normal" MQs try to redeliver N times, typically with a backoff between each attempt, and then, when all N attempts fail, it puts the message on the DLQ. Rabbit instead tight-loops the delivery attempts until either the message goes through, or the sun burns out. To be able to use Rabbit, I would have to use the native API, and implement redelivery and DLQing myself, client side. Also, transactions.

I now wonder whether I should make a lower-level abstraction, so that the JMS implementation is converted to a "base" impl, and then the JMS, Rabbit, Bloomberg, NATS, ZeroMQ, Aeron, etc implementations was extensions, or "plugins", or "drivers", to that.

the-alchemist · on July 29, 2023

Sounds great, and you have lots of nice documentation on the page, but could you provide a TLDR? There's a lot of competition in this area: GRPC, Cap'n'proto (was posted on HN a day or two ago), NATS, etc.

I'm also having trouble figuring out if Mats3 is a library (with a JMS API) over a variety of messaging systems (WebSockets, NATS, etc.)?

P.S. Some diagrams like https://bloomberg.github.io/blazingmq/ would be very helpful, especially at https://mats3.io/background/what-is-mats/. If a picture's worth a thousand words, and an animation must be worth at least 10k words. :)

stolsvik · on July 30, 2023

I am truly finding it hard to explain it. I have tried along a dozen angles. I actually think it is a completely new concept - and that might be the problem. But there is actual value here, because it "merges" the amazingness of using messaging, with the simplicity of "linear thought" that synchronous, blocking code gives (i.e. ISC using e.g. REST or gRPC). #)

There is an illustration on the front-page: https://mats3.io/

This page tries to directly explain the idea - but I guess it assumes that the reader is already extremely familiar with message queuing? https://mats3.io/docs/message-oriented-rpc/

Here's a set of small answers, from different angles, to "What is Mats?": https://github.com/centiservice/mats3/blob/main/README.md#wh...

Here's a way to code up Mats3 endpoints using JBang and a small toolkit which makes it extremely simple to explore the ideas - the article should also be skimmable even without a command line available: https://mats3.io/explore/jbang-mats/

If you read these and then get it, I would be extremely happy if you gave me a sentence or paragraph that would have led you to understanding faster!

The concept of messaging with queues and topics are essential to Mats3 - but the use of JMS is really just a transport. I could really have used anything, incl. any MQ over any protocol, or ZeroMQ, or plain TCP - or just a shared table in a database (but would then have had to implement the queue and topic semantics). As a developer, you code Mats3 Endpoints by using the Mats3 API, and you initiate Mats3 Flows using the API. You need an implementation of the MatsFactory to do this, and the sole existing is the JmsMatsFactory - which works on top of ActiveMQ and Artemis's JMS clients.

Wrt. WebSockets, that is a transport typically between a server, and a end-user client, e.g. an iOS App. Actually, there's also a "sister project", MatsSockets, that bring the utter async-ness of Mats3 all the way out to the client, e.g. a webpage or an app. https://matssocket.io/

NATS is, AFAIU, just a message broker, with some ability to orchestrate. Fundamentally, I do not like this concept of orchestration as an external service - this is one of the founding ideas of Mats3: Do the orchestration within each service, as you would do if you employed REST as the ISC mechanism. I do however assume that one could make a Mats3 implementation on top of NATS client libs.

#) Java's Project Loom is extremely interesting to me, as its argument for using threads as the basis of concurrency instead of asynchronous styles of programming, is exactly the same rationale for which I made Mats3: It is much simpler for the brain to grok a linear, sequential, "straight-down" code, than lots of pieces of code that are strung together in a callback-hell. Async/await is a way to make such callback-hell more readable, "linearaize it" (but you still have the "colored methods" problem which Loom just obliterates in a perfect explosion). One could argue that this is what I have achieved with Mats3 when using messaging.

nodesocket · on July 27, 2023

Awesome landing page and looks great. Documentation is very well done. Would love to see more clients (python) and a web u/i.

callbacker · on July 27, 2023

Hi, one of the authors here. We will be releasing a Python client library in a few weeks.

accrual · on July 27, 2023

Does anyone here do large scale interface stuff with many millions of messages? I do this in healthcare and would like to try applying my skills somewhere else...

didip · on July 28, 2023

Just about any large tech companies and unicorns have a large queue (or many of them).

matthewpersico · on July 28, 2023

https://careers.bloomberg.com/job/search :-)

renewiltord · on July 27, 2023

Very interesting. Thank you for benchmarks. Thank you for sharing. In a zero-replication no-leader-election (or all-in-one-node) scenario what is min latency observed? It appears predominant delay is in ACKing to ensure durability, right?

C++ and Java client examples are very simple which is cool.

Would appreciate if there is hook to enable `SO_TIMESTAMPNS` on socket and retrieve `cmsg` to get accurate timestamping and permit client code benchmarking.

callbacker · on July 28, 2023

Hi, one of the authors here.

> In a zero-replication no-leader-election (or all-in-one-node) scenario what is min latency observed?

We saw sub-millisecond latency in this case (typically double-digit microseconds).

renewiltord · on July 28, 2023

Splendid! Great work, guys. And thanks again for sharing. Dockerfile is good way to share build deps.

callbacker · on July 28, 2023

Thank you!

stolsvik · on July 29, 2023

I find it curious that lots of people always drag in Kafka as comparison to message brokers. Their own architecture (log of messages, vs. delivery of messages), but in particular the system architecture they lead to when used (event sourcing, vs. "react to events and commands"), are extremely dissimilar.

Kafka has its uses, in particular for massive influx of events, e.g. in a large IoT system - I'd say the perfect example would be continuous measurements of tens of thousands of gauges and sensors on an oil rig or any other large production system, or e.g. the energy meters in every home.

But I would personally never use such a system as the inter-service communication layer for a multi-service architecture. It is WAY to heavy coupling. Event sourcing looks fantastic on the surface, but is a disaster for a decades-living system with tons of developers. REST/gRPC is better, but async messaging really rocks.

marcelius · on July 29, 2023

How is Kafka causing heavy coupling? Serious question.

stolsvik · on July 30, 2023

If you use Kafka in an event sourced fashion, whereby each service emits events that it puts on a bunch of Kafka topics, and each service that needs that information consumes those events (by reading and applying those events to its local understanding of the world), I argue that you have effectively made a massive distributed and shared database - shared between all the services.

This goes smack in the face of the idea of "one service, one database" 1), where there is a distinct boundary where each service owns their own data. How a given service stores it data is of no concerns to any other service - the communication between them is done using e.g. REST or messages, with a clear contract.

Event sourcing / Kafka architectures are the exact opposite of a clear contract. You are exposing the absolute, deep-down innards of the storage solution of a service, by putting its microscopic events down for all to see, and all to consume. You may do aggregations, and emitting more coarse-grained "events" or state changes, thus kinda also exposing "views" of these inner tables, and maybe use a naming scheme to sort of which are "public" and which are not.

In the beginning, I really did find the concept of event sourcing to be extremely intriguing: Both the "you can get back to any point in history by replaying up-to", and "forget the databases, just emit your state changes as events" (I truly "hate" databases, as they are the embodiment of state, which is the single one thing that makes my field hard!), the ability to throw up a new projection for anything you'd need (a new "internal view") of the state, and that unit testing could be really smooth.

I upon deeper delving into the concepts found that the totality of such a system must quite quickly become extremely heavy: The amount of events, thus needing snapshots. Evolution of events, where you might have no idea of who consumes them (that is, the "shared database" problem). The necessary understanding, throwing a half-century or more of relational databases under the bus. The performance tweaking if things start to lag. Etc etc etc. It would become a distributed system in the worst way a distributed system could be distributed: All state laid out in minute details, little abstractions, and a massive diverse set of different implementations in the different services to get back to a relational view of the data. And this is even before the system gets a decade old, with lots of different developers coming and going, all of them needing to get up to speed, and the total system needing extremely good and hard steering to not devolve into absolute utter chaos.

That Kafka can be employed and viewed as a database has been argued hard by Confluent. Here's their former DevEx leader Tim Berglund explaining how databases are like onions, and that in the base of every database there is a commit log. And that this log is equivalent to a set of Kafka topics. 2) So why not just implement all the other layers of the database in application logic? Confluent even have a lot of libraries and whatnots to enable such architectures.

1) "Microservice Architectures", patterns: https://microservices.io/patterns/data/database-per-service....

2) JavaZone Talk from 2018 by Tim@Confluent: https://2018.javazone.no/program/3a9644e6-15b5-4c66-a28c-c35...

endisneigh · on July 27, 2023

it never ceases to amaze me how the software engineering profession has very smart folks reimplement the same conceptual primitives over and over again. I'm curious if hospitals (if that's even analogous to a company) do similar things by having different processes to solve the same problem.

I imagine this is used for Bloomberg (the terminal) and not Bloomberg (the website)?

going back to the article - fantastic animations. I'm just as curious to how that was made as the queue itself.

marginalia_nu · on July 27, 2023

I feel like on the contrary there's an enormous cost attached to using prefab components that only kinda do what you need them to do. Shops that have embrace this philosophy have been some of the most inefficient ones I've ever encountered. Screens of yaml code was written to implement tasks that could have been performed with lines of programming code; and many man-years were spent implementing functions that copied fields from one object to another nearly but not quite identical object.

It's a big driver of both complex and slow software. Often the adaptation layer necessary to get it to actually satisfy your needs rivals the size and complexity of the piece of existing solution code you're trying to leverage.

Software that is built from the ground-up to your exact needs is often both faster and slimmer and more maintainable and suffers from far less dependency churn.

ecshafer · on July 27, 2023

There is a balance to Not In House syndrome where you build everything yourself, and the opposite: everything should be vendored in. I am inclined to think that a healthy NIH syndrome is the better place to be. Companies which vendor everything move along at a snails pace, and often getting your outside product to work with internal needs is a huge years long endeavor, to the point just writing a decent version yourself that is exactly what you need is really useful.

marginalia_nu · on July 27, 2023

NIH has shifted in meaning over time. It used to literally mean Not Invented Here.

In the original sense, the NIH approach would be to invent a rivalling technique to the message broker. That is indeed a questionable idea.

ecshafer · on July 27, 2023

I think I just got it wrong. Not invented here sounds better and more right. I just typically think of only the acronym, and decided to write it out in case someone hadn’t heard it (and thus made a mistake). Good catch.

marginalia_nu · on July 27, 2023

There's a bunch of alternative meanings that have cropped up to suit the new and wider meaning (which is definitely common).

Seen Not Implemented Here too.

throwaway2037 · on July 28, 2023

In my experience, the problem with vendor: What if a feature is missing? I see this often when I am forced to use a vendor product. And, on day one, no one (hands on) usually knows anything about the vendor product, so there is a huge learning curve. If you follow NIH, then you organically build knowledge in the org. And features can be added with time+money. (Sure, some vendor products will add features, but usually for an insane price tag.) You cannot win either way: They hardest part is knowing which one to choose.

wmfiv · on July 27, 2023

Yeah. This works both ways. At small scale or early stage you should try like hell to work backwards from pre-existing components so that everything does exactly what you need them to do by definition. Eventually you outgrow that and it makes sense to build things that do exactly what you want because the incremental value is worth it.

vsareto · on July 27, 2023

>it never ceases to amaze me how the software engineering provision has very smart folks reimplement the same conceptual primitives over and over again.

Well, no one cares, but I'm not saying that to be mean to dismissive, it's just that software production as an industry (esp. at a company like Bloomberg) can basically do whatever it wants because of low capital costs to produce something. Things like writing another queue system will have zero impact on the industry unless it's adopted widely (and cargo culted) and later made worse.

Maybe I'm about to eat my shoe, but if a capable of team at Bloomberg wanted to reinvent something considered unnecessary in terms of industry need, they probably could if for no other reason that management allowing some indulgence in exchange for keeping talented people around is Worth It™.

Scubabear68 · on July 28, 2023

Bloomberg had a history of inventing their own stuff going back to the first terminals. They had their own networking protocols and everything, when I was last there you could still see remnants in how machines were named.

They do adopt a lot of industry stuff, but it is always heavily modified. And of course they love “Invented Here” as a corporate tradition going all the way back to the beginning.

It’s actually nice from a developer perspective, the R&D group is highly valued by management.

throwaway2037 · on July 28, 2023

You wrote: low capital costs to produce something

I don't understand that phrase. Probably a large team of very well paid engineers developed this product. Bloomberg is well-known in the industry for paying well. "[L]ow capital costs"? This software probably cost millions in salaries to build, and millions more to maintain over the next 10+ years.

vsareto · on July 28, 2023

>You wrote: low capital costs to produce something

>I don't understand that phrase.

You don't need a big factory with large capital costs to make or distribute software. You're only paying people, and salaries are not capital investments.

paxys · on July 27, 2023

Smart people solving "solved problems" over and over again is exactly how the tech industry progresses. Meanwhile other industries are stuck with processes from fifty years ago because "it has always been done this way", and anyone even suggesting the slightest bit of change is a heretic.

BossingAround · on July 27, 2023

> I'm curious if hospitals (if that's even analogous to a company) do similar things by having different processes to solve the same problem.

Of course they do, unless the processes are regulated by a country (such as handling highly addictive opiates in the US).

thfuran · on July 27, 2023

Yeah, pretty much any medium to large company is going to have a bunch of processes for doing things that any company that size needs to do and some more that are a bit more industry/domain specific but still basically the same as those at every other company in the space.

quercusa · on July 27, 2023

In the US, all hospitals are required to use electronic health records (EHRs) and most are implemented in Epic, the leading EHR system. But they are almost always electronic implementations of their old paper record systems, dating back to who-knows-when.

Scubabear68 · on July 27, 2023

When I left Bloomberg around 7 years ago Kafka was being aggressively rolled out and there was a lot of interest in data streaming tech in general.

I am wondering if they ran into performance or resource usage issues with Kafka at scale, and decided to roll their own for use cases that better fit their workloads.

Bloomberg does have vast backend infrastructure for moving around / computing data which the Terminal is a consumer of.

dboreham · on July 27, 2023

What you say is true. But...this is Bloomberg so I think we can give them a pass and take a deeper look. I can't think of a more relevant company within which to develop messaging middleware.

jdmichal · on July 27, 2023

Exactly. Handling message queuing and distribution is likely a core capability for Bloomberg, so it makes a lot of sense for them to develop something tailored for their needs. LinkedIn did the same with the original Kafka development, where distributed eventual state consistency was a core capability.

If anything, we should be thankful that they've decided to share it with the world.

KnobbleMcKnees · on July 27, 2023

It's cool to be cynical

jrvarela56 · on July 27, 2023

Two reasons I can think of:

1. It is easier to implement from scratch than understand and modify: we see this everyday at a way smaller scale with libraries/classes/functions. Not saying this is optimal but this is a driver for sure.

2. Recruiting: some devs love doing this. Fancy OSS project to put on your resume, some actually believe there's unfulfilled need, you get people like us discussing it (otherwise we wouldn't be thinking about Bloomberg).

erulabs · on July 27, 2023

It's a somewhat difficult argument to say that it's easier to build from scratch, but I hear what you're saying. At any rate, it is absolutely more fun to build from scratch.

I've polled developers about what part of software development they hate, and in general, it's deployment and maintenance (ie: devops). Using a off-the-shelf-component means that the developer only gets to do the un-fun part: installing, configuring, productionizing. If they code their own solution, they get to spend a few weeks coding!

Sparkyte · on July 27, 2023

Part of this evolution is from funded development to opensource development.

Then there is language variances like flavors so what ends up happening is a million different conceptual primitives that fundamentally the same.

It isn't the same because of a few things. At the end of the day too it is also about how good the support is over the rest. If a new MQ can demonstrate better tooling nothing stops a company from adopting it.

kapilvt · on July 27, 2023

mainframe emulation is a thing, aka code change (Ala swap mq libraries) at enterprise scale, and supporting production without a migration path can very much stop adoption.

Sparkyte · on July 27, 2023

That really depends on how the MQ integration is designed and whether or not the in queue data is persistent or doesn't get stale. In most cases an application doesn't like stale data so a long lived queue is rare. People who adopt something typically PoC before adoption.

jjahdhnmko · on July 28, 2023

I've had to do this mostly because of batshit crazy rules many large companies have about how all software has to be supported. The issue there being the definition of supported in the eyes of clueless auditor types.

If software is written 100% externally, but the vendor only wrote 10% of it and is transparently taking 90% from an open source project so they can sell fraudulent support (assuming they don't have people contributing to the open source project here), that counts as supported.

If it is written 100% internally, even if it is built on top of some ancient internal codebase that nobody can figure out and has been in life support mode for 20 years, then it counts as supported.

But if its written 50% internally and 50% of it came from an external open source library with a BSD license, and you are a major contributor to that project, then it is considered unsupported and you get in trouble.

This is how you end up with security issues because some sysadmin decided they had to roll-their-own crypto library or authentication system, because its better to have an unknown implementation show up on a scanner than a known one that has a list of CVEs that can be checked.

gorbachev · on July 28, 2023

I don't know for sure, but there's a fair chance BlazingMQ precdates Kafka, at least the OSS version of it, by several years.

say_it_as_it_is · on July 27, 2023

Why is Bloomberg approving the open sourcing of this system?

loire280 · on July 27, 2023

Why wouldn’t they? This technology is not a competitive advantage for their company, wouldn’t make sense for them to sell as a standalone product, and it helps their engineering department attract good talent.

sdesol · on July 27, 2023

Here's some community stats for Bloomberg open source projects

https://devboard.gitsense.com/bloomberg

For the 200 repos that they have made open source, they had over 6000 people contribute feedback and/or code. So in addition to attracting talent, they also get more testers and feedback.

Full Disclosure: This is my tool

SkipperCat · on July 27, 2023

If it gains wide adoption, then they've got a global community of people providing free software engineering for their messaging platform. Its a win-win.

gorbachev · on July 28, 2023

https://www.bloomberg.com/company/values/tech-at-bloomberg/o...

Cthulhu_ · on July 27, 2023

Because they're not making money off of their queueing system, but what they build with it; to them it's a cog in their machine, and open sourcing it means others may adopt it and help them maintain and improve it.

esafak · on July 27, 2023

They must not consider it a trade secret, so they can earn good will, generate public contributions, and advertise their brand by sharing it.

BossingAround · on July 27, 2023

Hiring strategy--there are a ton of people who'll take a slightly lower salary if they can get a public record of their code commits.

puchatek · on July 27, 2023

Same reasons any company open sources their software

fizwhiz · on July 27, 2023

> This section explains the leader election algorithm at a high level. It is by no means exhaustive and deliberately avoids any formal specification or proof. Readers looking for an exhaustive explanation should refer to the Raft paper, which acts as a strong inspiration for BlazingMQ’s leader election algorithm.

So their own homegrown leader election algorithm?

> BlazingMQ’s leader election and state machine replication differs from that of Raft in one way: in Rafts leader election, only the node having the most up-to-date log can become the leader. If a follower receives an election proposal from a node with stale view of the log, it will not support it. This ensures that the elected leader has up-to-date messages in the replicated stream, and simply needs to sync up any followers which are not up to date. A good thing about this choice is that messages always flow from leader to follower nodes.

> BlazingMQ’s elector implementation relaxes this requirement. Any node in the cluster can become a leader, irrespective of its position in the log. This adds additional complexity in that a new leader needs to synchronize its state with the followers and that a follower node may need to send messages to the new leader if the latter is not up to date. However, this deviation from Raft and the custom synchronization protocol comes in handy because it allows BlazingMQ to avoid flushing (fsync) every message to disk. Readers familiar with Apache Kafka internals will see similarities between the two systems here.

"a new leader needs to synchronize its state with the followers and that a follower node may need to send messages to the new leader if the latter is not up to date". I thought a hallmark of HA systems was fast failover? If I come to your house and knock on the door, but it takes you 10mins to get off the couch to open the door, it's perfectly acceptable for me to claim you were "unavailable". Pedants will argue the opposite.

anentropic · on July 27, 2023

"deliberately avoids any formal specification or proof"

how is this possibly a selling point?

fizwhiz · on July 27, 2023

FWIW they mention this at the bottom of their document

> Just like BlazingMQ’s other subsystems, its leader election implementation (and general replicated state machinery) is tested with unit and integration tests. In addition, we periodically run chaos testing on BlazingMQ using our Jepsen chaos testing suite, which we will be publishing soon as open source. We have also tested our implementation with a TLA+ specification for BlazingMQ’s elector state machine.

callbacker · on July 27, 2023

One of the authors here. Thanks for pointing that out. TLA+ spec can be found here -- https://github.com/bloomberg/blazingmq/tree/main/etc/tlaplus.

amelius · on July 27, 2023

How is this "deliberately avoiding any formal specification or proof"?

peheje · on July 27, 2023

I think they are referring to the specific section below the notice.

callbacker · on July 28, 2023

Correct

scottlamb · on July 27, 2023

> "deliberately avoids any formal specification or proof"

> how is this possibly a selling point?

Context.

In an overview doc? The informal version is accessible to more audiences.

As the canonical design doc for the system? It's certainly not.

anentropic · on July 28, 2023

Apologies! I was only blindly responding to the parent comment...

FWIW the phrase seems to come from here: https://bloomberg.github.io/blazingmq/docs/architecture/elec...

The full sentence reads:

"This section explains the leader election algorithm at a high level. It is by no means exhaustive and deliberately avoids any formal specification or proof."

Which makes a lot more sense.

And as noted in a sibling comment there is actually a TLA+ spec for the leader election: https://github.com/bloomberg/blazingmq/tree/main/etc/tlaplus

KRAKRISMOTT · on July 27, 2023

HA aside, is this faster than Aeron?

hkt · on July 27, 2023

It has a gorgeous landing page but I'm not really sure why this is better than any other MQ. Can someone more knowledgeable provide any insight into its comparative advantages?

Edit: I did see the comparisons page, but.. well, there's more to life than Rabbit and Kafka!

traceroute66 · on July 27, 2023

> Can someone more knowledgeable provide any insight into its comparative advantages?

Well, you know, there is the "Comparison With Alternative Systems" page[1]. :-)

Sadly it doesn't include NATS in the comparison. I would have been interested to see that. But the usual suspects (Rabbit & Kafka) are given a mention.

[1] https://bloomberg.github.io/blazingmq/docs/introduction/comp...

rlonstein · on July 27, 2023

I'd also be interested in seeing how it compares to AMQP implementations (Apache Qpid for example) and any of the proprietary message queue systems (IBM and Microsoft) they refer to but don't name-check.

Cthulhu_ · on July 27, 2023

I mean one unsubstantiated armchair take: it's newer than Rabbit / Kafka (RabbitMQ was built in 2007, Kafka in 2011), since then the learnings from distributed systems, message queues and programming languages has improved by leaps and bounds, and the existing ones will have backwards compatibility / legacy issues.

That said, there are indeed plenty of commercial and noncommercial alternatives, so now that it's been published I'm sure there will be some thorough people making comparisons soon!

MathMonkeyMan · on July 27, 2023

My guess is that at the time when Bloomberg's infrastructure was considering supporting a message queue, they surveyed existing solutions (mind you that this was a long time ago) and found them lacking for one reason or another, rightfully or not.

RabbitMQ was probably avoided because nobody wanted to learn Erlang.

Also, R&D had a lot of experience building message oriented middleware "from scratch" in a low overhead high availability way, so first instinct was probably to start hacking in C++.

Nowadays it might be the case that some teams within Bloomberg need the performance or would rather have a bespoke solution instead of spending on migrating to something else off the shelf.

Keep in mind that this is a company that has its own implementation of most of the C++ standard library.

vyskocilm · on July 27, 2023

FJI: There is a faq entry about that https://bloomberg.github.io/blazingmq/docs/introduction/comp...

epinephrinios · on July 28, 2023

I worked on this team for 3 years along the OGs behind BMQ, specifically on a closed-source middleware with very similar architecture. I am very happy that they finally open sourced one of their main projects!

They have very strict standards which are necessary given that C++ really is a minefield. We did a great job avoiding UB by following those coding standards. There is some verbosity that can be avoided but hey these are details.

Personally, I have moved to Rust since then and I am not looking back.

andrewgazelka · on July 28, 2023

Is there something like BMQ but in Rust? Seems interesting you mentioned it.

eska · on July 27, 2023

I checked the docs, but couldn’t find anything about mqtt compat?

m00x · on July 28, 2023

I don't think this is an MQTT broker.

weekendcode · on July 27, 2023

Built in C++, awesome :)

sho · on July 28, 2023

Is that really your reaction to C++? These days when selecting a critical infrastructure component I'm very inclined to prefer memory-safe by default, like rust or (somewhat less so) golang. Sure, C/C++ projects that have been around and under attack for years (redis, postgres, etc) are fine but they've had those years of battle testing. For a new project I really feel a lot safer if they're built in something more failsafe.

wiseowise · on July 28, 2023

Can you show me where modern C++ is unsafe and why it matters in this specific context?

epinephrinios · on July 28, 2023

Read this - https://docs.google.com/document/d/e/2PACX-1vRZr-HJcYmf2Y76D...

favflam · on July 28, 2023

From the "leadership election" snippet, this setup looks perilously close to the peer to peer propagation of transactions and blocks in a blockchain system such as Solana. I suppose the latency requirements are much tighter than 400ms though.

I know Solana does mutual TLS over QUIC authentication between peers and does bandwidth rate limiting via stake. Also, the topology is self organizing. Is it worth copying this tech into a message queue system?

arnold_palmur · on July 27, 2023

I wonder how this compares to something like Aeron performance-wise - though Aeron is strictly messaging (not queueing).

k2xl · on July 28, 2023

Congrats on open sourcing this, but I am curious if anyone here has ever had an actual scale issue that required changing from some "slower" queuing system (like RabbitMQ, Kestrel, Kafka, etc) where you actually needed to change queuing systems?

Maybe I'm old school but 99% of the use cases I've worked with in my 20 year career could scale with using a MySQL database table acting as a queue and some scripts querying the table and doing work. I have worked with maybe a dozen queuing technogies (including super expensive cloud Amazon SQS) and unless your app needs sub second latency on processing messages at greater than 5k per second I just don't see why using these sophisticated systems have benefits?

Hope this comment doesn't come across cynical or dismissive. I love seeing new tech like this come out and I am not saying that there aren't use cases I am sure there (maybe HFT?) but curious if anyone has case studies to share of legit queue systems that couldn't scale.

chasd00 · on July 28, 2023

You're right but "MySQL database table acting as a queue" doesn't stand out on resumes/CVs. Plus, new and shiny things attract devs like moths to a flame.

Eumenes · on July 28, 2023

As much as Bloomberg (the guy and company) creep me out, this is impressive.

midoridensha · on July 28, 2023

I have a similar feeling about Facebook, but zstd is amazing.

ensocode · on July 28, 2023

Very interesting, but looking up the supported standards it just uses TCP. There is no support for AMQP or MQTT. How is interoperability with just some client libraries for Python, Java, ...

kitd · on July 28, 2023

AMQP & MQTT are messaging protocols. You would need protocol handlers either end with this in the middle.

Having said that, you would need to match the protocol to the interaction style. Not all are appropriate.

travisgriggs · on July 27, 2023

Interesting that there's a comparison (albeit low on details) to Rabbit, but not MQTT? Anyone had experience with both MQTT and BlazingMQ able to compare/contrast?

dbrueck · on July 27, 2023

Isn't MQTT more of a standard than an implementation?

sheerun · on July 27, 2023

I think he means implementation of MQTT protocol, like https://mosquitto.org/

smarx007 · on July 27, 2023

I would be interested in the differences between BlazingMQ, AMQP, and MQTT protocols and the invariants afforded to the clients, not the implementations per se.

winrid · on July 27, 2023

Yes. They prob mean rabbitmq.

tetrep · on July 27, 2023

Is this implemented with any sort of thought to security?

I don't see anything that implies as such, all FAQ and comparisons to other message queues say nothing of security (I skimmed and ctrl+f so I could have missed it). This doesn't bode well for a network-based application written in a memory unsafe language.

edit: I'm surprised so many people are interpeting this as a weird cargo cult thing. Application security includes a lot of mundane and critical things like:

Authentication, authorization, message signing / authentication (e.g. HMAC), encryption, secret/key management, how the system handles updates, etc.

You can read a bit more about it here: https://en.m.wikipedia.org/wiki/Application_security

cogman10 · on July 27, 2023

I perused the code and it was written in a C++03 style. Pointer chasing and manual memory management in the few files I poked through. (IE, no unique_ptr, just new/delete).

This may be perfectly safe... however, I'd give it a few years of battle testing before touching it.

Here's an example:

https://github.com/bloomberg/blazingmq/blob/main/src/groups/...

HWR_14 · on July 27, 2023

Isn't this opensourced after 8 years of being battletested in a very time/error sensitive environment (feeding bloomberg terminals)?

cogman10 · on July 28, 2023

Lol. I'm sorry but my work has me directly interfacing with Bloomberg and other financial institutions. If anything, I trust the code LESS because of who wrote it.

Just because software is heavily used and deals with finance, doesn't mean it's secure or well written. Consider the Equifax breach of 2017, everyone's credit info leaked because Equifax was using a crusty old version of struts with known critical CVEs.

HWR_14 · on July 28, 2023

I thought everyone's info leaked because an Equifax office in South America had open access to the internet with default passwords.

d4damager · on July 27, 2023

I don’t see any examples of new/delete there?

Also they do make heavy use of managed pointers in the codebase.

MathMonkeyMan · on July 27, 2023

Life is harder without move semantics, but allocators are easier.

ljm · on July 27, 2023

Security purely in terms of the language used, or security in terms of cluster access, authentication and so on?

I mean, I don't often see documentation for networked services that goes in-depth on how exactly they write memory safe code in C or C++, as if that would mean the application is therefore secure; it's always much higher level.

tetrep · on July 27, 2023

> Security purely in terms of the language used, or security in terms of cluster access, authentication and so on?

I'm personally interested in both, but from a docs perspective I'd mostly expect the latter, more "security for users of this system" than anything else. I'm a little embarassed I didn't remember it until now, but I think the general term is "application security."

I'm concerned that I don't see any hints of either in the project. I'm not at a computer with access to my Github account so I can't easily search the source code for hints of obvious signs of care or neglect with respect to security.

ljm · on July 27, 2023

https://github.com/bloomberg/blazingmq/blob/93e6426d474a4293...

Not sure where they are on the roadmap but authentication in particular is 'mid-term' but lacks detail.

I expect that if you want to deploy this then you're doing it on an internal network; its deployment examples are based on Docker so I expect it's relying on what you can do with k8s.

tetrep · on July 27, 2023

Thanks! That is the sort of stuff I was asking about.

It looks like it's an after thought but at least on their mind now, which is very fair with respect to Bloomberg's wants/needs. It'd be nice if they had a bit of a warning about using this until it has some basic auth(n) and TLS since they're releasing it to the public. I think it is, relativley speaking, rude to release insecure networked software without giving users a notice as to what sorts of insecurity is at least known/expected.

femto113 · on July 27, 2023

Adding a veneer of security isn't necessarily superior to leaving it out altogether. Systems of this sort are best secured at the network level, i.e. only trusted hosts should be able to connect to it. Redis is a good example of where this has been tried: it does support password based log in, but the password is stored and transmitted in plaintext, and a redis server will happily accept thousands of auth attempts per second making brute forcing a viable attack. Rather than improve the auth system Redis has instead doubled down on encouraging appropriate network level security by defaulting to only being accessible to the local host, so admins have to go through an explicit step (with warnings) before they can just expose it to the internet.

ljm · on July 28, 2023

> I think it is, relativley speaking, rude to release insecure networked software

And if you believe that, you're wrong

rewmie · on July 27, 2023

> Is this implemented with any sort of thought to security?

There's a hefty load of cargo cult mentality revolving the topic of security. Security is not a product, but a process. This sort of talk is getting tiresome.

mrits · on July 27, 2023

Would you prefer it written in Rust with everything wrapped in unsafe block?

coder543 · on July 27, 2023

That is an obviously false dichotomy. The author of the comment never mentioned Rust, and wasn’t asking about Rust. Such a false dichotomy feels like flamebait, which is against HN guidelines.

Your false dichotomy also implies that “unsafe” blocks are as unsafe as C++, which is not true. “Unsafe” in Rust turns off very few checks[0], most of them are still active. No one would write a serious Rust program entirely in unsafe anyways.

Regardless, asking about security considerations is a valid thing to do, even if it were written in Rust. Security is not just about memory safety.

[0]: https://doc.rust-lang.org/std/keyword.unsafe.html#unsafe-abi...

touisteur · on July 27, 2023

Furthermore, choice of language can have an effect as to the actual security of the networking layer.

Having parsers (and serializers) proved absent of runtime errors (e.g. with something like SPARK) is a form of guarantee I wish I'd see as the default in any 'serious' network-facing library or component. It's not even that hard to achieve, compared to the learning curve of the borrow checker.

Once rust gets plugged into Why3 and gets some industrial-grade proof capacity, the question of 'is it written in rust?' will be automatic (as in 'why would you do it any other way?').

mrits · on July 28, 2023

You probably want to spend a bit of time in the Rust community before your speak on it.

mikepurvis · on July 27, 2023

Rust's guarantees are all about what's happening inside your own process with memory ownership. "Security" in the context of a middleware like this is almost certainly more about external validation/auth— what prevents some random node from injecting messages into the queue, or adding itself as a subscriber to the output?

Also, it's a super bad-faith argument to talk about an entire program being in an unsafe block. Rust's unsafe is about specifically and intentionally opting out of certain guardrails in a limited scope of code where it should be fairly easy for a human reviewer to understand and validate the more complex invariants that aren't enforceable by the compiler itself.

mrits · on July 28, 2023

This is absolutely not what Rust unsafe block is about. It is about getting around the limitations of the compiler to gain efficiency. And it happens a lot.

mikepurvis · on July 28, 2023

> Unsafe Rust exists because, by nature, static analysis is conservative. When the compiler tries to determine whether or not code upholds the guarantees, it’s better for it to reject some valid programs than to accept some invalid programs. Although the code might be okay, if the Rust compiler doesn’t have enough information to be confident, it will reject the code. In these cases, you can use unsafe code to tell the compiler, “Trust me, I know what I’m doing.”

https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html

mrits · on July 28, 2023

You might want to read the page you linked

"you can give up guaranteed safety in exchange for greater performance "

tetrep · on July 27, 2023

I care more about methodology than specific methods. I would like applications that handle I/O, especially network I/O, to be designed and built with security in mind. I asked my question because that doesn't appear to be the case here, and I think that is dissapointing and concerning from a security perspective.

zoobab · on July 28, 2023

Is BlazingMQ faster than ZeroMQ?

wallstprog · on July 28, 2023

Unlikely, but they seem to be different things altogether. BlazingMQ appears to be a traditional message queue (think ActiveMQ), with message peristence. ZeroMQ is more of a network middleware (think Tibco Rendezvous), and does not include persistence.

BlazingMQ also appears to be more of a "platform" or "service" that an app can use (sort of like Oracle, say) -- ZeroMQ includes libraries that one can use to build an app, service or platform, but none is provided "out of the box".

Which makes it harder to get started with ZeroMQ, since by definition every ZeroMQ app is essentially built "from scratch".

If you're interested in ZeroMQ, you may want to check out OZ (https://github.com/nyfix/OZ), which is a Rendezvous-like platform that uses the OpenMAMA API (https://github.com/finos/OpenMAMA) and ZeroMQ (https://github.com/zeromq/libzmq) transport to provide a full-featured network middleware implementation. OZ has been used in our shop since 2020 handling approx 50MM high-value messages per day on our global FIX network.

sitkack · on July 27, 2023

TLA 0.1%

Finally! Thank you!

callbacker · on July 28, 2023

Thanks! The spec can be found here: https://github.com/bloomberg/blazingmq/tree/main/etc/tlaplus

synergy20 · on July 27, 2023

how does this compare against other existing message queuing systems?

the github shows its codebase is in c++ with gcc-7.3, that means it might be using c++14 in the code, or even c++17, which is nice.

notjoemama · on July 27, 2023

It doesn't use UDP. So my tongue-in-cheek reply is, it doesn't qualify to have "blazing" in the name. Hopefully that's not just funny inside my own head. :)

https://bloomberg.github.io/blazingmq/docs/architecture/netw...

rewmie · on July 27, 2023

> It doesn't use UDP. So my tongue-in-cheek reply is, it doesn't qualify to have "blazing" in the name.

Arguably, if you start with UDP but you also need to ensure retransmission and reordering, you eventually end up reimplementing TCP but poorly.

lordnacho · on July 27, 2023

Check out Aeron, they have a way to multicast (thus UDP) reliably and ordered. Based on negative acks. Super fast.

HappySweeney · on July 27, 2023

NORM from the Naval Research Labs is another implementation of the same concept.

https://github.com/USNavalResearchLaboratory/norm

jonathanoliver · on July 27, 2023

Aeron is another awesome project from Martin Thompson (Disruptor, Simple Binary Encoding, Mechanical Sympathy, etc.).

That guy really knows his stuff regarding performance.

phishhook · on July 27, 2023

Sounds like PGM

lordnacho · on July 28, 2023

Similar ideas, yes. But somehow I found Aeron and got it to work, not so straightforward with PGM. Been a while now, so not sure about the details.

anonymoushn · on July 27, 2023

I've found that for many applications TCP doesn't really make optimal choices in these areas or expose knobs for users to make better choices.

jayd16 · on July 27, 2023

I guess you could implement QUIC instead.

amanj41 · on July 27, 2023

When I was working at BB I could have sworn they had a fanout mode using UDP. Guess not.

Thaxll · on July 27, 2023

Which version of C++ is this built on?

hallfox · on July 27, 2023

Hi, dev here. BlazingMQ is written with a baseline of C++03, though our docker tutorial builds it with C++17.