These days, all the people who prioritise career-building technical feats over solving actual business problems in pragmatic ways are hyping and blogging about AI, with similar results for the companies they (allegedly) work for: https://www.theregister.com/2024/06/12/survey_ai_projects/
A while ago I moved from microservices to a monolith because they were too complicated and had a lot of duplicated code. Without microservices there's less need for a message queue.
For async stuff, I used RabbitMQ for one project, but it just felt... old and over-architected? And a lot of the tooling around it (Celery) just wasn't as good as the modern stuff built around Redis (BullMQ).
For multi-step, DAG-style processes, I prefer to KISS and just do that all in a single, large job if I can, or break it into a small number of jobs.
If I REALLY needed a DAG, there are tools out there that are specifically built for that (Airflow). But I hear debugging issues in them is difficult, so I'd avoid them if at all possible.
I have run into scaling issues with redis, because their multi-node architectures are just ridiculously over-complicated, and so I stick with single-node. But sharding by hand is fine for me, and works well.
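Hand-sharding a single-node setup can be as simple as routing keys by hash on the client side. A sketch with hypothetical node addresses (note that plain modulo sharding reshuffles keys whenever the node count changes; consistent hashing avoids that):

```python
import hashlib

# Hypothetical single-node Redis endpoints; sharding is done client-side.
NODES = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]

def node_for(key: str) -> str:
    """Route a key to a shard by stable hash; no cluster protocol needed."""
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]
```

Every client computes the same routing, so there is nothing to coordinate as long as the node list stays fixed.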
Messaging-based architecture is very popular
Change Data Capture (CDC) has also gotten really good and mainstream. It's relatively easy to write your data to a RDBMS and then capture the change data and propagate it to other systems. This pattern means people aren't writing about Kafka, for instance, because the message queue is just the backbone that the CDC system uses to relay messages.
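A toy illustration of the shape of this pattern, with a SQLite trigger standing in for a real CDC pipeline such as Debezium tailing the database's log (table names are made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE change_log (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                         tbl TEXT, row_id INTEGER, op TEXT);
-- Stand-in for CDC: every write also lands in a change log that
-- downstream systems tail, instead of app code publishing events.
CREATE TRIGGER orders_ins AFTER INSERT ON orders
BEGIN
  INSERT INTO change_log (tbl, row_id, op) VALUES ('orders', NEW.id, 'INSERT');
END;
""")
db.execute("INSERT INTO orders (status) VALUES ('new')")
changes = db.execute("SELECT tbl, row_id, op FROM change_log").fetchall()
```

The application only writes to its own tables; propagation to other systems is the CDC layer's job, which is why the message queue underneath stops being something you think or write about.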
These architectures definitely still exist and they mostly satisfy organizational constraints - if you have a write-once, read-many queue like Kafka you're exposing an API to other parts of the organization. A lot of companies use this pattern to shuffle data between different teams.
A small team owning a lot of microservices feels like resume-driven development. But in companies with 100+ engineers it makes sense.
I work at a company that only hires the best and brightest engineers from the top 3-4 schools in North America and for almost every engineer here this is their first job.
My engineers have done crazy things like:
- Try to queue up tens of thousands of 100 MB messages in RabbitMQ instantaneously and wonder why it blows up.
- Send significantly oversized messages in RabbitMQ in general, despite all of the warnings saying not to do this.
- Start new projects in 2024 on the latest RabbitMQ version and try to use classic queues.
- Create quorum queues without replication policies or doing literally anything to make them HA.
- Expose clusters on the internet with the admin user left as guest/guest.
- The most senior architect in the org declared a new architecture pattern, held an organization-wide meeting and demo to extol the new virtues/pattern of ... sticking messages into a queue and then creating a backchannel so that a second consumer could process those queued messages on demand, out of order (and making it no longer a queue). And nobody except me said "why are you putting messages that you need to process out of order into a queue?"...and the 'pattern' caught on!
- Use Kafka as a basic message queue
- Send data from a central datacenter to globally distributed datacenters with a global lock on the object and all operations on it until each target DC confirms it has received the updated object. Insist that this process is asynchronous, because the data was sent with AJAX requests.
As it turns out, people don't really need to do all that great of a job and we still get by. So tools get misused, overused and underused.
In the places where it's being used well, you probably just don't hear about it.
Edit: I forgot to list something significant. There are over 30 microservices for every engineer in our org. Please kill me. I would literally rather Kurt Cobain myself than work at another organization that has thousands of microservices in a gigantic monorepo.
If your perception is indeed correct, I'd attribute it to your 3rd point. People usually write blog posts about new shiny stuff.
I personally use queues in my design all the time, particularly to transfer data between different systems with higher decoupling. The only pain I have ever experienced was when an upstream system backfilled 7 days of data, which clogged our queues with old requests. Running normally it would have taken over 100 hours to process all the data, while massively increasing the latency of fresh data. The solution was to manually purge the queue, and manually backfill the most recent missing data.
Even if you need to be careful around unbounded queue sizes, I still believe they are a great tool.
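One mitigation for the unbounded-queue problem is to bound the queue up front and give producers explicit backpressure. A minimal sketch with Python's stdlib queue (the size limit is illustrative):

```python
import queue

jobs = queue.Queue(maxsize=3)   # bound the queue up front

def try_enqueue(item) -> bool:
    """Producers get immediate backpressure instead of silently
    clogging the pipeline with days of backfilled requests."""
    try:
        jobs.put_nowait(item)
        return True
    except queue.Full:
        return False   # caller can drop, retry later, or shed old work

accepted = [try_enqueue(i) for i in range(5)]
```

The rejected producer now has to decide what to do with old work, which is exactly the decision a multi-day backfill otherwise makes for you, badly.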
That's good. The documentation for e.g. RabbitMQ is much better and very helpful. People use it as a workhorse just like they use Postgres/MySQL. There's not much surprising behavior to architect around, etc.
I love boring software.
Or take AWS SNS, which IMO is one level of abstraction higher than SQS. It has become so feature-rich that it can practically replace SQS.
What might have disappeared are those use cases that used queues to handle short bursts of peak traffic.
Also, streaming has become very reliable tech, so a class of use cases that used queues as a streaming pipe has migrated to streaming proper.
There are places where you need a queue just for basic synchronization, but you can use modules that are more convenient than external queues. And you can start testing your program without even doing that.
Actually, async is used a lot with Rust too, which can stretch how far an individual server scales.
Without an async runtime or similar, you have to invent an internal async runtime, or use something like queues, because otherwise you are blocked waiting for IO.
You may still eventually end up with queues down the line if you have some large number of users, but that complexity is completely unnecessary for getting a system deployed towards the beginning.
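What the async runtime buys you is overlapping the IO waits within one process instead of queueing work out to separate workers. A minimal asyncio sketch (the sleep stands in for a network call):

```python
import asyncio

async def fetch(n: int) -> int:
    await asyncio.sleep(0.01)   # stand-in for a blocking network call
    return n * 2

async def main() -> list:
    # Overlap the waits instead of handing work to a worker fleet.
    return await asyncio.gather(*(fetch(n) for n in range(5)))

results = asyncio.run(main())
```

The five waits run concurrently in a single thread, which covers a surprising amount of early-stage scale before an external queue earns its keep.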
When you enqueue something, you eventually need to dequeue and process it. A lambda just does both in a single call. It also removes the need to run or scale a worker.
I think Kafka continues to be popular because it is used as a temporary data store, and there is a large ecosystem around ingesting from streams.
I personally use queues a lot and am building an open source SQS alternative. I wonder if an open source lambda replacement would be useful too. https://github.com/poundifdef/SmoothMQ
I just added a RabbitMQ-based worker to replace some jobs that Temporal.io was bad at (previous devs threw everything at it, but it's not really suited to high throughput things like email). I'd bet that Temporal took a chunk of the new greenfield apps mindshare though.
My money is on this. I think the simple usecase of async communication, with simple pub/sub messaging, is hugely useful and not too hard to use.
We (as a Dev community) have just gotten over event sourcing, complex networks and building for unnecessary scale. I.e. we're past the hype cycle.
My team uses NATS for Async pub/sub and synchronous request/response. It's a command driven model and we have a huge log table with all the messages we have sent. Schemas and usage of these messages are internal to our team, and are discarded from NATS after consumption. We do at-least-once delivery and message handlers are expected to be idempotent.
We have had one or two issues with misconfiguration in NATS resulting in message replay or missed messages, but largely it has been very successful. And we were a 3 person dev team.
It's the same thing as Kubernetes in my mind - it works well if you keep to the bare essentials and don't try to be clever.
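The at-least-once/idempotent-handler combination mentioned above can be sketched in a few lines. The message IDs and side effect here are made up, and a real system would keep the dedupe set in durable storage with a TTL:

```python
processed: set = set()
ledger: list = []

def handle(msg_id: str, payload: str) -> None:
    """At-least-once delivery means redeliveries happen; deduping on
    the message ID makes reprocessing them a no-op."""
    if msg_id in processed:
        return                 # duplicate redelivery: do nothing
    ledger.append(payload)     # the actual side effect
    processed.add(msg_id)

# "m1" arrives twice, as an at-least-once broker is allowed to do.
for mid, body in [("m1", "create"), ("m2", "update"), ("m1", "create")]:
    handle(mid, body)
```

As long as every handler has this shape, misconfigured replays become an annoyance rather than a correctness bug.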
In large enterprises, there is usually some sort of global message bus on top of Kafka, AWS Kinesis or similar.
In smaller shops, a dedicated message bus is over-engineering and can be avoided by using the DB or something like Redis. It's still a message queue, just without a dedicated platform.
What I would need to see required before bothering with a message queue architecture:
* High concurrency, atomic transactions
* Multiple stages of processing of a message required
* Traceability of process actions required
* Event triggers that will actually be used required
* Horizontal scaling actually the right choice
* Message queues can be the core architecture and not an add on to a Frankenstein API
Probably others, and yes you can achieve all of the above without message queues as the core architecture but the above is when I would think "I wonder if this system should be based on async message queues".
Unfortunately, moving real-time messaging complexity entirely to the back end has been the norm for a very long time. My experience is that, in general, it makes the architecture way more difficult to manage. I've been promoting end-to-end pub/sub as an alternative for over a decade (see https://socketcluster.io/) but, although I've been getting great feedback, this approach has never managed to spread beyond a certain niche. I think it's partly because most devs just don't realize how much complexity is added by micromanaging message queues on the back end and figuring out which message belongs to what client socket instead of letting the clients themselves decide what channels to subscribe to directly from the front end (and the back end only focuses on access control).
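The end-to-end pub/sub model described (clients choose their channels, the back end only does access control) can be sketched as a toy in-process broker; the client names and ACL are hypothetical:

```python
from collections import defaultdict

subscribers = defaultdict(list)                      # channel -> callbacks
allowed = {("alice", "orders"), ("bob", "orders")}   # hypothetical ACL

def subscribe(client: str, channel: str, callback) -> bool:
    # The back end's only job: decide whether this client may listen.
    if (client, channel) not in allowed:
        return False
    subscribers[channel].append(callback)
    return True

def publish(channel: str, message: str) -> None:
    # No routing logic per client socket; the channel does the fan-out.
    for cb in subscribers[channel]:
        cb(message)

inbox = []
subscribe("alice", "orders", inbox.append)
publish("orders", "order-123 shipped")
```

The point of the sketch is what's absent: no server-side bookkeeping that maps messages to individual client sockets.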
I think part of the problem was the separation between front end and back end developer responsibilities. Back end developers like to ignore the front end as much as possible; when it comes to architecture, their thinking rarely extends beyond the API endpoint definition; gains which can be made from better integrating the back end with the front end is 'not their job'. From the perspective of front-end developers, they see anything performance-related or related to 'architectural simplicity' as 'not their job' either... There weren't enough full stack developers with the required insights to push for integration efficiency/simplicity.
During heavy load the queue bloats up to a few million messages, then drains off over time. Or it spawns a few hundred lambdas to chow all the messages down...depending on what we want.
Redis taking some of the duty as you mentioned and microservices/distributed systems being less fashionable likely also factor into it.
2) Serverless, e.g. AWS lambdas can be joined with step functions instead and scale without a queue.
3) People have been burned. Multiple configurations, multiple queues, multiple libraries and languages, multiple backing stores, multiple serialisation standards and bugs - it's just overly complex engineering a distributed system for small to medium business. YAGNI.
4) Simpler architectures. E.g. microservices, for better or worse, tend to be fat monoliths made more efficient; mostly, in my experience, because those in charge don't actually understand the pattern. But the side effect is fewer queues compared to a real microservices arch.
O/T: I cringe whenever I hear our tech architects discuss our arch as microservices and data mesh. The former because it's not (as above, it's multiple small services); the latter because data mesh is an antiquated pattern better served by segregated schemas on a distributed database, with scoped access per system to the data each needs, instead of adapter patterns, multiple DBs with slightly different schemas, and facades/specialised endpoints all the fucking way down.
People really abused kafka: https://www.confluent.io/en-gb/blog/publishing-apache-kafka-... like really abused it.
Kafka is hard to use, has lots of rough edges, doesn't scale all that easily, and isn't nice to use as a programmer. But you can make it do lots of stupid shit, like turn it into a database.
People tried to use message queues for synchronous stuff, or things that should be synchronous, and realised that queuing those requests is a really bad idea. I assume they went back to REST calls or something.
Databases are much, much faster now, with SSDs, better design, and fucktonnes of RAM. Postgres isn't really the bottleneck it once was.
SQS and NATS cover most of the design use cases for pure message queues (as in, no half-arsed RPC or other features tacked on) and just work.
Message queues are brilliant; I use them a lot, but only for data processing pipelines. And I only use them to pass messages, not actual data. So I might generate a million messages, but each message is <2k.
could I use a database? probably, but then I'd have to make an interface for that, and do loads of testing and junk.
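Passing references instead of payloads like this is sometimes called the claim-check pattern. A sketch with a dict standing in for the blob store (S3, shared disk, whatever):

```python
import json
import uuid

blob_store: dict = {}   # stand-in for S3 / shared storage

def enqueue_large(payload: bytes) -> str:
    """Store the data out of band; the queue only carries a small pointer."""
    key = str(uuid.uuid4())
    blob_store[key] = payload
    return json.dumps({"blob_key": key, "size": len(payload)})  # tiny message

def process(message: str) -> bytes:
    ref = json.loads(message)
    return blob_store[ref["blob_key"]]   # fetch the real data by reference

msg = enqueue_large(b"x" * 1_000_000)
```

The broker never sees the megabyte; it only moves the pointer, which keeps every message comfortably under the 2k ceiling mentioned above.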
Most of us in the desktop computing world don't actually need the distribution, reliability features, implementation-agnostic benefits of a queue. We can integrate our code very directly if we choose to. It seems to me that many of us didn't for a while because it was an exciting paradigm, but it rarely made sense in the places I encountered it.
There are certainly cases where they're extremely useful and I wouldn't want anything else, but again, this is typically in settings where I'm very constrained and need to talk to a lot of devices rather than when writing software for the web or desktop computers.
As for your last point, the Internet of Things is driven by message queues (like MQTT), so depending on the type of work you're doing, message queues are all over the place but certainly not exciting to write about. It's day-to-day stuff that isn't rapidly evolving or requiring new exciting insights. It just works.
From a technology perspective, message queuing has been commodified, I can pull an SQS off the shelf and get right to work. And so maybe the ubiquity of cloud based solutions that can be wired together has just removed the need for choice. If I need mqtt, there’s an app for that. Fanout? App for that. Needs to be up 25/7? …
1. Queues are actually used a lot, esp. at high scale, and you just don't hear about it.
2. Hardware/compute advances are outpacing user growth (e.g. 1 billion users 10 years ago was a unicorn; 1 billion users today is still a unicorn), but serving (for the sake of argument) 100 million users on a single large box is much more plausible today than 10 years ago. (These numbers are made up; keep the proportions and adjust as you see fit.)
3. Given (2), if you can get away with stuffing your queue into e.g. Redis or a RDBMS, you probably should. It simplifies deployment, architecture, centralizes queries across systems, etc. However, depending on your requirements for scale, reliability, failure (in)dependence, it may not be advisable. I think this is also correlated with a broader understanding that (1) if you can get away with out-of-order task processing, you should, (2) architectural simplicity was underrated in the 2010s industry-wide, (3) YAGNI.
On AWS for example, you use SQS and a sprinkling of SNS, or perhaps Kinesis for a few things and you're good. There isn't a lot to talk about there, so the queues no longer become the center of the design.
Message-queue based architectures are great for data processing, but not great for interactive web sites, and if most people are building interactive web sites, then the choices seem a little obvious. I still design event systems for data processing (especially with immutable business data where you have new facts but still need to know that you were "wrong" or had a different picture at some earlier time). But for most apps... you just don't need it.
A lot of the difficulty in modeling a complex system has to do with deciding what is durable state vs what is transient state.
Almost all state should be durable and/but durability is more expensive upfront. So people make tradeoffs to model a transition as transient and put in a queue. One or two or three years in, that is almost always a regretted decision.
Message queues that are not databases/systems of record wind up glossing this durable/transient state problem, and then when you have also this unique piece of infrastructure to support, this is a now you have two problems moment.
I have seen a startup where RabbitMQ was being used to hand off requests to APIs (services) that take a long time to respond. I argued for unifying queueing and data persistence on Postgres, even though I knew a simple webhook would suffice.
Given that AWS has to sell and complexity tends to make people look smart, another server was spun up for RabbitMQ :)
Many companies that run a highly distributed system have figured out what works for them. If requests are within the read and write rates that Redis or Postgres can handle, why introduce RabbitMQ or Kafka?
Always remember that the Engineer nudging you towards more complexity will not be there when the chips are down.
Message brokers need client libraries for every language and serialization support. HTTP clients and JSON Serialization have first-class support already, so many software distributors ship those APIs and clients first. Everyone got used to working with it and started writing their own APIs with it too.
* message queues solve some problems that are nowadays easily solved by cloud or k8s or other "smart" infrastructure, like service discovery, load balancer, authentication
* the tooling for HTTPS has gotten much better, so using something else seems less appealing
* it's much easier to get others to write a HTTPS service than one that listens on a RabbitMQ queue, for example
* testing components is easier without the message queue
* I agree with your point about databases
* the need for actual asynchronous and 1:n communication is much lower than we thought.
I maintain a web application with a few hundred daily users and with the following table I have never had any problems:
CREATE TABLE `jobs` ( `id` BIGINT NOT NULL AUTO_INCREMENT, `queue` VARCHAR(255) NOT NULL, `payload` JSON NOT NULL, `created_at` DATETIME NOT NULL, PRIMARY KEY (`id`), INDEX (`queue`) );
Using MySQL's LOCK TABLES and UNLOCK TABLES I can ensure the same job doesn't get picked up twice.
All in all, it's a very simple solution. And simple is better!
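The claim step for a table like that can be sketched with SQLite standing in for MySQL; on MySQL 8 a real worker might instead use SELECT ... FOR UPDATE SKIP LOCKED to avoid table locks:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE jobs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    queue TEXT NOT NULL,
    payload TEXT NOT NULL)""")
for n in (1, 2):
    db.execute("INSERT INTO jobs (queue, payload) VALUES (?, ?)",
               ("mail", json.dumps({"n": n})))

def claim(queue: str):
    """Grab the oldest job and delete it in one transaction,
    so no two workers can pick up the same row."""
    with db:   # transaction scope: commit on success, roll back on error
        row = db.execute("SELECT id, payload FROM jobs WHERE queue = ? "
                         "ORDER BY id LIMIT 1", (queue,)).fetchone()
        if row is None:
            return None
        db.execute("DELETE FROM jobs WHERE id = ?", (row[0],))
        return json.loads(row[1])

first = claim("mail")
```

Jobs come out in insertion order and disappear atomically with the claim, which is all most low-volume apps need from a queue.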
At my [place of work] we have built a simple event system on top of lambda functions, SQS, S3, EventBridge, etc. to ingest and add metadata to events before sending them on to various consumers.
We replaced an older Kafka system that did lots of transformations to the data, making it impossible to source the origin of a field at the consumer level; the newer system uses an extremely KISS approach: collate related data without transformation, add metadata and tags for consumers to use as a heads-up, and then leave it at that.
I agree that most regular stuff should just be http (or whatever) microservices as the garbage collection is free; requests, sockets, etc time out and then there's no rubbish left over. In an event based system if you have issues then suddenly you might have dozens of queues filled with garbage that requires cleanup.
There are definitely pros to event based, the whole idea of "replaying" etc is cool but like...I've never felt the need to do that...ever.
The event volume that we do process is quite low though, maybe a couple hundred k messages a day.
We just call it something different. Or use different underlying products. Nearly all web frameworks have some worker system built in. Many languages have async abilities using threads and messaging built in.
The only popular language and ecosystem I can think of that doesn't offer "worker queues" and messaging OOTB is JavaScript.
We are using it more than ever before. We just don't talk about it anymore, because it has become "boring" tech.
Given a recent opportunity to rethink messaging based architectures, I chose the simplicity and flexibility of Redis to implement stack and queue based data-structures accessible across distributed nodes.
With even a handful of nodes, it was challenging to coordinate a messaging-based system. The overhead of configuring a messaging architecture, essentially developing an ad-hoc messaging schema with each project (typically simple JSON objects), and relatively opaque infrastructure that often required advanced technical support led messaging systems to fall out of favor for me.
Kafka seems to be the current flavor of the day as far as messaging based systems, but I don't think I'll ever support a system that approaches the throughput required to even think about implementing something like Kafka in the laboratory automation space - maybe there's a use case for high-content imaging pipelines?
Right now, I'd probably choose Redis for intra-system communication if absolutely necessary, then something like hitting a Zapier webhook with content in a JSON object to route information to a different platform or software context, but I'm not in a space where I'm handling Terabytes of data or millions of requests a second.
Message queue-based architectures are the backbone of distributed, event-driven systems. Think of systems where when this particular event happens, then several downstream systems need to be aware and take action. A message queue allows these systems to be loosely coupled and supports multiple end system integration patterns.
Notice, this is systems or enterprise development, not application development. If you're using a message queue as part of your application architecture, then you may be at risk of over-engineering your solution.
This effect has also happened with microservices/monoliths, lambda/serverless, agile/scrum (still no concrete definition of these). Even cloud as a whole: there are so many articles about how companies managed to cut cloud costs to a fraction just by going bare metal.
This pattern has its uses, but if you are using it everywhere, every time you have some sort of notification, because "it's easy" or whatever, you are likely doing it wrong, and you will understand this at some point. It will not be pleasant.
Most people who deal with <40k users a day will have low server concurrency loads and can get away with database-abstracted state-machine structures.
Distributed systems are hard to get right, and there are a few key areas one needs to design right...
If I were to give some guidance, these tips should help:
1. per-user UUIDs to allow separable concurrent transactions with credential caching
2. global UTC time backed by GPS/RTC/NTP
3. client-side application-layer load balancing through time-division multiplexing (can reduce peak loads by several orders of magnitude)
4. store, filter, and forward _meaningful_ data
5. peer-to-peer AMQP with role enforcement reduces the producer->consumer design to a single library, i.e. if done right the entire infrastructure becomes ridiculously simple, but if people YOLO it, failure is certain.
6. automatic route-permission and credential management from other languages can require a bit of effort to sync up reliably. Essentially you end up writing a distributed user-account management system in whatever ecosystem you are trying to bolt on. The client user-login x509 certs can make this less laborious.
7. redacted
8. batched payloads in 128 kB AMQP messages; depending on how the consumer is implemented, this can help reduce hits to the APIs (some user UUID owns those 400 insertions, for example)
9. One might be able to just use Erlang/Elixir channels instead, and that simplifies the design further.
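Point 3 above (client-side time-division multiplexing) can be as simple as hashing each client's UUID into a deterministic send slot within a reporting window; the window length here is an assumption:

```python
import hashlib

WINDOW_SECONDS = 60   # assumed reporting window

def send_offset(client_uuid: str) -> int:
    """Deterministic per-client delay inside the window, spreading the
    fleet's traffic out instead of producing a thundering herd."""
    h = int(hashlib.sha256(client_uuid.encode()).hexdigest(), 16)
    return h % WINDOW_SECONDS

offsets = {send_offset(f"client-{i}") for i in range(1000)}
```

No coordination is needed: every client computes its own slot locally, yet the aggregate load arrives smeared across the whole window rather than at one instant.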
Have a great day, =3
These days, with AI, vector dbs are all the rage, so everyone hops onto that train.
Nice to see so many developers owning up to the "resume building" and being pragmatic about solving human/business problems versus technology for the sake of it.
- event logs (stateful multi-consumers) have taken a portion of the message-queue workload; this is more likely than moving it to Redis/a database
Message queuing works incredibly well for many problems. It's as critical to most companies' architectures as an application database.
But now that I think about it, we don't use it in the traditional sense. Most of our regular operations work well enough by just using the "async" pattern in our programming (in JS and Rust).
The only place we use Pub/Sub is for communication between our NodeJS backend server and the Rust servers that we deploy on our client's VMs. We didn't want to expose a public endpoint on the Rust server (for security). And there was no need for a response from the Rust servers when the NodeJS server told it to do anything.
We don't fully utilize the features of a messaging queue (like GCP's Pub/Sub), but there just wasn't a better way for our specific kind of communication.
That’s it. Full stop.
It's awesome!
It absorbs the peaks, smoothes them out, acts as a buffer for when the database is down for upgrades, and I think over all these years we only had one small issue with it.
10/10 would recommend.
I still think queues are great, but most of the time I can just run my queues using language constructs (like Channels) communicating between threads. If I need communication between machines, I can usually do that with Postgres or even s3. If you're writing constantly but only reading occasionally, you don't need to consume every message – you can select the last five minutes of rows from a table.
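Those in-language channels look roughly the same everywhere; here is a Python stand-in using queue.Queue between threads (a Go channel or Rust crossbeam channel would be analogous):

```python
import queue
import threading

channel = queue.Queue()   # in-process channel between threads
results = []

def worker() -> None:
    # Drain the channel until the shutdown sentinel arrives.
    while (item := channel.get()) is not None:
        results.append(item * item)

t = threading.Thread(target=worker)
t.start()
for n in range(4):
    channel.put(n)
channel.put(None)   # sentinel: tell the worker to stop
t.join()
```

Same producer/consumer shape as a broker-backed queue, with zero infrastructure, which is the whole appeal when everything fits in one process.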
I've also seen a general trend in my line of work (data engineering) to try and write workloads that only interact with external services at the beginning or end, and do the rest within a node. That makes a lot of sense when you're basically doing a data -> data transformation, and is easier to test.
There are still cases where we need the full power of Kinesis, but it's just a lot less common than we thought it would be 10 years ago.
Since then I've been reading about async and await in the newer versions of JavaScript, and it really threw me for a loop. I needed it to slow down some executing code, but as I worked through my problems I realized, "my god! this is exactly what we could have used for pub/sub at my last job".
We could have replaced a Kafka system, as well as an enterprise workflow system, with JavaScript and the async/await paradigm. Some of these systems cost millions per year to license and administer.
As for the blog posts, most were garbage (and still are), if my memory serves right. I recall reading a lot of them, and all were basically different derivatives of the same "quick-start tutorial" you would find for any decently documented piece of software. Once you get into the real trenches, the blog posts immediately show their limits and their shallowness.
That all being said, message queues are a very crucial part of most complex systems these days. Like your typical tools (containers, git, your language of choice, etc.), they have moved on to being mature and boring.
There is no hype because not much news there.
That doesn’t mean it is less used.
The fundamental issue with event driven architecture is getting out of sync with the source of truth.
Every single design doc I’ve seen in an organization pitching event driven architecture has overlooked the sync issue and ultimately been bitten by it.
But...
Isn't another reason we don't see hype around message queues and such in distributed systems that they are standard practice now? The discourse around this feels less like "message queues will make your architecture so much better, you should try it!" and more like "just use a message queue...". The hype isn't there anymore because the technology is just standard practice.
I could be wrong but whenever I come across any articles on building Distributed Systems, message queues and their variants are one of the first things mentioned as a necessity.
I've read very strong reports in its favor. For instance, it can be scaled vertically using progressively better hardware (like a better CPU or more RAM), or horizontally too. Also, with a database on the same network, there won't be much need for a cache. Presumably, the ability to throw CPU and RAM at it would lessen some need for queues too.
At the same time, I don't notice much Elixir usage in practice and it has remained a small community.
We are now based firmly in the Azure landscape, and Event Grid provides us with an effective team service boundary for other teams to consume our events, all with the appropriate RBAC. Internally, Azure Service Bus is the underlying driver for building decoupled and resilient services where we have to guarantee eventual consistency between our internal system landscape and the external SaaS services we actively leverage. At this scale it works very effectively, especially when pods can be dropped at any point within our k8s clusters.
Especially if they are services that are conceptually related, you will immediately hit consistency problems - message queues offer few guarantees about data relationships, and those relationships are key to the correctness of every system I've seen. Treating message queues as an ACID data store is a categorical mistake if you need referential integrity - ACID semantics are ultimately what most business processes need at the end of the day and MQ architectures end up accumulating cruft to compensate for this.
Basically every "cool, shiny, new" tech goes through this process. Arguably it's all related to the Gartner Hype Cycle[1], although I do think the process I'm describing exists somewhat independently of the GHC.
Queues are still very useful for queueing up asynchronous work. Most SaaS apps I've worked with use one. However, there is a difference to what kind of queue you need to queue a few thousand tasks per day, vs using the queue as the backbone of all of your inter-service communications. For the first use case, using a DB table or Redis as a queue backend is often enough.
Message queues are often the wrong tool. Often you actually want something like RPC, and message queues were wrongly used as a poor man's async DIY RPC.
Basically everything in tech has gone through a hype cycle when everyone was talking about it, when it was the shiny new hammer that needed to be applied to every problem and appear in every resume.
A bit over 20 years ago I interviewed with a company that was hiring someone to help them "move everything to XML". Over the course of the two-hour interview I tried unsuccessfully to figure out what they actually wanted to do. I don't think they actually understood what XML was, but I still wonder from time to time what they were trying to achieve and if they ever accomplished it.
It's easier and faster to make a web service request where you get an instant result you can handle directly in the source system.
Mostly the queue is implemented in the source system, where you can monitor and see the processing status in real time without delays.
Although queuing systems can be implemented on top of a database, message queues like RabbitMQ / ZeroMQ do a fine job. I use RabbitMQ all the time, precisely because I need to transfer data between systems and I have multiple workers working asynchronously on that data.
I guess these architectures might be less popular, or less talked about, because monoliths and simple webapps are more talked about than complex systems?
And for a consulting company, a solid message-based deployment is not a good business strategy. If things just work and temporary load spikes get buffered automatically, there's very little reason for clients to buy a maintenance retainer.
In the ruby world, delayed job was getting upended by sidekiq, but redis was still a relatively new tool in a lot of tool-belts, and organizations had to approach redis at that time with (appropriate) caution. Even Kafka by the mid 10s was still a bit scary to deploy and manage yourself, so it might have been the optimal solution but you potentially wanted to avoid it to save yourself headaches.
Today, there are so many robust solutions that you can choose from many options and not shoot yourself in the foot. You might end up with a slightly overcomplicated architecture or some quirky challenges, but it's just far easier to get it right.
That means fewer blog posts. Fewer people touting their strategy. Because, to be frank, it's more of a "solved" problem with lots of pre-existing art.
All that being said, I still personally find this stuff interesting. I love the stuff getting produced by Mike Perham. Kafka is a powerful platform that can sit next to Redis. Tooling getting built on top of Postgres continues to impress and shows how simple even high-scale applications can be.
But, maybe not everyone cares quite the way we do.
Along with much more mentioned in this thread, I think a lot of companies realized that they indeed are not AWS/Google/Meta/Twitter scale, won't be in the next decade, and probably never will need to be to be successful or to support their product.
The last thing you want on your LinkedIn profile is a link to a video you made in 2015 about your cool Kafka solution. The ATS would spit you out so fast...
I can’t really say I’m enjoying it. But it does help with scale.
Distributed logs such as Kafka, BookKeeper, and others stepped in to take some market share and most of the hype.
MQs retain their benefits and are still useful in some cases but the more modern alternatives can offer fewer asterisks.
Serialization and going over the network are an order of magnitude slower and more error-prone than good ol' function calls.
I've seen too many systems that spent more time marshalling to and from JSON and passing messages around than on actual processing.
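The overhead being described is easy to see directly: the same handler invoked as a plain function versus behind one JSON serialize/parse hop, before any network transport is even involved. A minimal sketch (the handler and payload are made up for illustration; actual timings vary by machine, so none are claimed here):

```python
import json
import timeit

def handle(payload):
    # Stand-in for real business logic.
    return payload["a"] + payload["b"]

payload = {"a": 1, "b": 2}

def via_json():
    # What a single message hop costs before any transport:
    # serialize to bytes, then parse back out on the "other side".
    wire = json.dumps(payload).encode("utf-8")
    return handle(json.loads(wire))

direct = timeit.timeit(lambda: handle(payload), number=100_000)
marshalled = timeit.timeit(via_json, number=100_000)

# Both paths compute the same result; only the overhead differs.
assert via_json() == handle(payload)
print(f"direct: {direct:.4f}s  json round-trip: {marshalled:.4f}s")
```

Run it and the gap is visible even with a tiny payload; with large nested documents and a network in between, the marshalling share only grows.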
One project I know of started with message queues for request response pattern. It performed poorly because Windows Service Bus writes messages to a database. That increased latency for a UI heavy application.
Second project used message queues but the front end was a HTTP API. When overloaded the API timed out at 30 seconds but the job was still in the queue and wasn’t cancelled. It led to a lot of wastage.
* PX4
* Ardupilot
* Betaflight
* DroneCAN
* Cyphal
* ROS/ROS2
* Klipper
* GNU Radio
Also would like to mention that all of the most popular RTOSes implement queues as a very common way of communicating between processes.
The new thing is "event driven architecture" (or whatever they can pass off as that hype). In a lot of cases, it's a better architecture. For the remaining batches, we run against S3 buckets, or look at NoSQL entries in a specific status in a DB. And we still use a little SQS, but not that often.
Like most here have said, it's just not a popular topic to blog about anymore, but it's still used. OTOH, logs like Kafka have become more ubiquitous. Even new and exciting systems like Apache Pulsar (a log system that can emulate a queue) have implemented the Kafka API.
I personally have built three. It's the latest thing to do.
This but also: computers got incredibly more capable. You can now have several terabytes of ram and literally hundreds of cpu cores in a single box.
Chances are you can take queuing out of your design.
There is literally nothing in common between RabbitMQ and ZeroMQ except for the two symbols 'MQ' in the name.
SQS is still very much alive. It is more than likely the first or second most deployed resource in AWS in my daily work.
AWS Lambda, Google Cloud Functions, etc. often resemble message queue architectures when used with non-HTTP triggers. An event happens and a worker spawns to handle it.
Almost every AWS architecture diagram will have a queue.
SQS is extremely stable, mature and cheap. Integrates with so many services and comes out of the box with good defaults.
If you are running a truly global (as in planetary-scale) distributed service, and have multiple teams developing independent components, then it makes sense.
If possible, boil systems down to a traditional DB replica set. That is a local minimum in complexity that will serve you for a long time.
* The Log is the superior solution
There are any number of ways to do the same thing — context matters.
Suddenly everyone could scale much, much more, but by then they were moving to the cloud, and execs don't understand two buzzwords at the same time.
I was never obsessed with "event-driven" distributed systems using message queues.
The major issue is keeping state in sync between services.
For quite a long time I got decent results with simple Go and Postgres scripts distributing work between workers on multiple bare-metal machines.
I took ideas from projects similar to MessageDB.
CREATE TABLE IF NOT EXISTS message_store.messages (
    global_position bigserial NOT NULL,
    position bigint NOT NULL,
    time TIMESTAMP WITHOUT TIME ZONE DEFAULT (now() AT TIME ZONE 'utc') NOT NULL,
    stream_name text NOT NULL,
    type text NOT NULL,
    data jsonb,
    metadata jsonb,
    id UUID NOT NULL DEFAULT gen_random_uuid()
);
along with Notify https://www.postgresql.org/docs/9.0/sql-notify.html
or polling techniques
Redis Stream here and there worked well too https://redis.io/docs/latest/develop/data-types/streams/
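The table-as-queue pattern above can be sketched end to end. This version uses the stdlib sqlite3 so it is self-contained; the table and column names loosely follow the MessageDB-style schema but are otherwise my own, and the comment marks where Postgres's FOR UPDATE SKIP LOCKED would make the claim safe with competing workers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        global_position INTEGER PRIMARY KEY AUTOINCREMENT,
        stream_name     TEXT NOT NULL,
        type            TEXT NOT NULL,
        data            TEXT,            -- jsonb in Postgres
        claimed         INTEGER NOT NULL DEFAULT 0
    )
""")
conn.execute(
    "INSERT INTO messages (stream_name, type, data) VALUES (?, ?, ?)",
    ("orders-1", "OrderPlaced", '{"total": 42}'))
conn.commit()

def claim_next(conn):
    # With multiple competing consumers on Postgres you would claim with
    #   SELECT ... FOR UPDATE SKIP LOCKED
    # so two workers never grab the same row. sqlite3 is single-writer,
    # so a plain SELECT-then-UPDATE in one connection suffices for a demo.
    row = conn.execute(
        "SELECT global_position, type, data FROM messages "
        "WHERE claimed = 0 ORDER BY global_position LIMIT 1").fetchone()
    if row is None:
        return None  # queue drained; a real worker would sleep and re-poll
    conn.execute("UPDATE messages SET claimed = 1 WHERE global_position = ?",
                 (row[0],))
    conn.commit()
    return row

msg = claim_next(conn)
print(msg)               # the claimed message
print(claim_next(conn))  # nothing left to claim
```

A LISTEN/NOTIFY wake-up just replaces the sleep in the polling loop; the claim query stays the same.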
Another possible alternative can be "durable execution" platforms like Temporal, Hatchet, Inngest mentioned on HN many times
- Temporal - https://temporal.io/ - https://github.com/temporalio - https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
- Hatchet https://news.ycombinator.com/item?id=39643136
- Inngest https://news.ycombinator.com/item?id=36403014
- Windmill https://news.ycombinator.com/item?id=35920082
Biggest issue I had with workflow platforms: they require a huge cognitive investment in understanding SDKs and best practices, and in deciding how to properly persist data. Especially if you opt to host it yourself.
Be careful not to conflate message transport with message storage, although message brokers usually do both.
SQS was slower than frozen molasses when I used it c. 2010. ZMQ, ZK, rabbit, and MQTT are oft mentioned. ESBs always come up and a large % of technical people hate them because they come with "software architect" historical baggage.
It's risky to have a SPoF one-grand-system-to-rule-them-all when you can have a standardized API for every department or m[ia]cro-service exposed as RESTful, gRPC, and/or GraphQL in a more isolated manner.
Redis isn't needed in some ecosystems like Elixir/Erlang/BEAM. Memcache is simpler if you just need a temporary cache without persistence.
It has some downsides: if something goes wrong, it's harder to debug than just using good old REST API calls.
Still, very much present/popular in the ecosystems I dabble in.
If you have a problem that logically has different async services, sure, use Redis or something. Databases also were able to handle this problem, but weren't as sexy, and they explicitly handle the problem better now. Just another tool in the toolbelt.
NoSQL was another solution in search of problems, but databases can handle this use-case better now too.
Ok, you asked.
It was early Summer 1997 and I had just graduated college with a computer information systems degree. I was employed at the time as a truck driver doing deliveries to Red Lobster, and my father was not happy; he offered me double my current pay and guaranteed 40 hours per week to return to work for him as an electrician. I returned to work with my Dad for the Summer, but after 14+ years of electrical work with him I decided I needed to get a job with computers.

Labor Day weekend of 1997 I blasted out over 40 resumes, and on Tuesday 9/9/97 I had my first interview at a small payments startup dotcom in Wilmington, Delaware, the credit card capital of the world at that time. I was hired on the spot and started that day as employee #5, the first software hire for the company, yet I had NO IDEA what I was doing. I was tasked with creating a program that could take payments from a TCP/IP WAN interface and proxy them back out over serial modems to Visa, and batch ACH files to the U.S. Fed. This started as a monolith design and we were processing automated clearing house as well as credit cards by late 1997.

I would continue in this architectural design, and in sole software developer support of this critical 100% uptime system, for many years. Somewhere around mid-1998 volume started to increase and the monolith design experienced network socket congestion; the volume from the firehose continued to increase and I was the sole guy tasked with solving it. That volume increase came from a little-known company at the time: PayPal. Since the 'mafia' was very demanding they knew I was the guy, but management isolated me, since the 'mafia' demanded extensive daily reporting that only the guy who built the platform could provide. This, however, took a backseat to the network connection issues, which were growing at an increasing rate. I was involved in a lot of technology firsts as a result, and herein starts the queue story.
After processing payments for Microsoft TechEd, too, in 1998, I was given an NT Option Pack CD. I was constantly seeking a solution to reduce the network congestion on the monolith, and within this option pack was something called "Microsoft Message Queue". I spent several months of nonstop nights and weekends redesigning the entire system from the ground up using an XML interface API, writing individual services that read from an ingress queue and output to an egress queue. This structure is now known as microservices, and this design solved all the load problems since it scaled extremely well. The redesigned system had many personal-experience enhancements added, such as globally unique identifiers, as well as the API being fully extensible, but the greatest unseen win was the ability to 100% recreate software bugs, since all message passing was recorded.

PayPal ended up leaving us in ?2003? for JPMorgan after the majority holder refused to sell to the 'mafia'. Some years later I was informed by several management executives that PayPal had also offered to exclusively hire only me, but I was of course never told of that at the time.
I have many a business horror story however in over a decade of using MSMQ at my first payments company, 1998-2010, I only had one data corruption event in the queue broker which required me to reverse engineer the MSMQ binary and thus file format to recover live payments records. This corruption was linked to bad ECC memory on one of those beige 1U Compaq servers that many here likely recall.
This story may reveal my age, but while my ride has been exciting, it isn't over yet, as so many opportunities still exist. A rolling rock gathers no moss!
Stay Healthy!
However, a few years ago, I did use MSMQ (Microsoft Message Queue) and it worked very well. At the time, though, I wanted something that didn't limit me to Windows.
In the end, I ended up learning ZeroMQ. Once I understood their "patterns" I created my own Broker software.
Originally, all requests (messages) sent to the Broker were stored in files. Eventually, I moved over to Sqlite. As the broker is designed to process one thing at a time, I was not worried about multiple requests going to Sqlite at once. So now my Broker requires few dependencies: just ZeroMQ and Sqlite.
(I did not need to worry about async processing as they get passed to the workers)
So, a client and a worker/consumer both communicate with the broker.
- client can ask for state (health of a queue, etc)
- client can send a message (to a queue, etc)
The worker communicates with the broker
- please connect me to this queue
- is there anything in this queue for me?
- here is the result of this message (success or failure)
etc.
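That client/worker protocol can be sketched as a toy in-process broker. The names and the failed-message bookkeeping here are mine, not the author's actual ZeroMQ/Sqlite implementation, but the operations mirror the list above:

```python
from collections import deque

class Broker:
    """Minimal stand-in for the broker described above:
    named queues, client-side send/state, worker-side poll/report."""

    def __init__(self):
        self.queues = {}   # queue name -> deque of pending messages
        self.failed = {}   # queue name -> list of failed messages

    # -- client side --
    def send(self, queue_name, message):
        # "client can send a message (to a queue, etc)"
        self.queues.setdefault(queue_name, deque()).append(message)

    def state(self, queue_name):
        # "client can ask for state (health of a queue, etc)"
        return {"pending": len(self.queues.get(queue_name, ())),
                "failed": len(self.failed.get(queue_name, ()))}

    # -- worker side --
    def poll(self, queue_name):
        # "is there anything in this queue for me?"
        q = self.queues.get(queue_name)
        return q.popleft() if q else None

    def report(self, queue_name, message, ok):
        # "here is the result of this message (success or failure)";
        # failures stay visible so staff can re-send them later.
        if not ok:
            self.failed.setdefault(queue_name, []).append(message)

broker = Broker()
broker.send("invoices", {"id": 1})
broker.send("invoices", {"id": 2})

msg = broker.poll("invoices")
broker.report("invoices", msg, ok=True)
bad = broker.poll("invoices")
broker.report("invoices", bad, ok=False)

print(broker.state("invoices"))  # one failure left visible for re-send
```

The real thing adds the ZeroMQ transport between the three parties and Sqlite persistence of the queue contents; the protocol itself stays this small.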
I also made use of the pub-sub pattern. I made a GUI app that subscribes to these queues and feeds updates. If there are problems (failures) you can see them here, and I leave it to staff to re-send the message. If it's already sent, maybe the subscriber missed that packet. Pub-sub is not reliable, after all... but it works 99% of the time.
Overall it has been a great system for my needs -- again, it is lightweight, fast, and hardly costs anything. I do not need a beefy machine, either.
Honestly, I swear by this application (broker) I made. Now, am I comparing it to RabbitMQ or Kafka? No! I am sure those products are very good at what they do. However, especially for the smaller companies I work for, this software has saved them a few pennies.
In all I found "Distributed computing" to be rewarding, and ZeroMQ+Sqlite have been a nice combination for my broker.
I have been experimenting with nanomsg-NG as a replacement for ZeroMQ, but I just haven't spent proper time on it due to other commitments.
That is, there was a big desire around that time period to "build it how the big successful companies built it." But since then, a lot of us have realized that complexity isn't necessary for 99% of companies. When you couple that with hardware and standard databases getting much better, there are just fewer and fewer companies who need all of these "scalability tricks".
My bar for "Is there a reason we can't just do this all in Postgres?" is much, much higher than it was a decade ago.