"It was at that moment that it became obvious they deleted millions of messages using our API, leaving only 1 message in the channel. If you have been paying attention you might remember how Cassandra handles deletes using tombstones (mentioned in Eventual Consistency). When a user loaded this channel, even though there was only 1 message, Cassandra had to effectively scan millions of message tombstones (generating garbage faster than the JVM could collect it)."
And although the blog post talks about GC tuning, there's mention here [1] that they didn't do much tuning and were actually running on an old version of Cassandra (and presumably JVM) - having just switched over from CMS (!).
0) https://discord.com/blog/how-discord-stores-billions-of-messages
1) https://news.ycombinator.com/item?id=33136453
That services layer reminds be of a big, fancy, distributed Varnish Cache... they don't mention caching and they chose the word coalesce so I assume it doesn't do much actual caching. But made me think of Varnish's "grace mode" and it's use to prevent the thundering herd problem (which is where I first heard of 'request coalescing') https://varnish-cache.org/docs/6.1/users-guide/vcl-grace.htm...
Also love to see consistent hashing come up again and again. It's a great piece of duct tape that has proven useful in many similar situations. If you know where something should be then you know where everything is gonna come look for it!
How Discord Stores Trillions of Messages - https://news.ycombinator.com/item?id=35048410 - March 2023 (10 comments)
> The last one? Our friend, cassandra-messages. [...] To start with, it’s a big cluster. With trillions of messages and nearly 200 nodes, any migration was going to be an involved effort.
To me, that's a surprisingly small amount of nodes for message storage, given the size of discord. I had honestly expected a much more intricate architecture, engineered towards quick scalability, involving a lot more moving parts. I'm sure the complexity is higher than stated in the article, but it makes me wonder, given that I've been partially responsible for more than 200 physical nodes that did less, how much of modern cloud architecture is over engineered.
I wonder how much they paid ScyllaDB to do this before even using ScyllaDB.
(Not trying to undermine the engineering efforts, or the welcoming engineering blog posts though! I really think all these is needed)