0x74696d
This architecture is roughly how HashiCorp's Nomad, Consul, and Vault are built (I'm one of the maintainers of Nomad). While it's definitely a "weird" architecture, the developer experience is really nice once you get the hang of it.

The in-memory state can be whatever you want, which means you can build up your own application-specific indexing and querying functions. You could just use sqlite with :memory: for the Raft FSM, but if you can build/find an in-memory transaction store (we use our own go-memdb), then reading from the state is just function calls. Protecting yourself from stale reads or write skew is trivial; every object you write has a Raft index so you can write APIs like "query a follower for object foo and wait till it's at least at index 123". It sweeps away a lot of "magic" that normally you'd shove into an RDBMS or other external store.
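A minimal sketch of that "wait till it's at least at index 123" read, assuming a hypothetical follower-side store. The class and method names (`FollowerState`, `apply`, `get`) are made up for illustration and are not the go-memdb or Nomad API:

```python
import threading


class FollowerState:
    """Hypothetical follower-side store; every write carries a Raft index."""

    def __init__(self):
        self._cond = threading.Condition()
        self._objects = {}       # key -> (value, index at which it was written)
        self._applied_index = 0  # highest log index applied on this node

    def apply(self, index, key, value):
        """Called as the (assumed) Raft log is applied, in index order."""
        with self._cond:
            self._objects[key] = (value, index)
            self._applied_index = index
            self._cond.notify_all()  # wake readers waiting for this index

    def get(self, key, min_index=0, timeout=None):
        """Return (value, index) once this node has applied at least min_index."""
        with self._cond:
            caught_up = self._cond.wait_for(
                lambda: self._applied_index >= min_index, timeout=timeout
            )
            if not caught_up:
                raise TimeoutError(f"still behind index {min_index}")
            return self._objects.get(key)
```

A client that just wrote foo at index 123 can then call `get("foo", min_index=123)` against any follower and will never see a copy older than its own write.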

That being said, I'd be hesitant to pick this kind of architecture for a new startup outside of the "infrastructure" space... you are effectively building your own database here though. You need to pick (or write) good primitives for things like your inter-node RPC, on-disk persistence, in-memory transactional state store, etc. Upgrades are especially challenging, because the new code can try to write entities to the Raft log that nodes still on the previous version don't understand (or worse, misunderstand because the way they're handled has changed!). There's no free lunch.
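To make the upgrade hazard concrete, here is a rough sketch of a state machine that refuses to guess when it meets a log entry type it doesn't know. Everything here (the handler table, the `ignorable` flag, the entry shapes) is a hypothetical illustration, not how Nomad actually handles it:

```python
HANDLERS = {}  # entry type -> function(state, payload)


def handles(entry_type):
    """Register a function as the handler for one log entry type."""
    def register(fn):
        HANDLERS[entry_type] = fn
        return fn
    return register


class UnknownEntryError(Exception):
    """Raised when a node sees an entry type written by newer code."""


@handles("job.upsert")
def upsert_job(state, payload):
    state.setdefault("jobs", {})[payload["id"]] = payload


def apply_entry(state, entry):
    """Apply one log entry; return True if applied, False if safely skipped."""
    handler = HANDLERS.get(entry["type"])
    if handler is None:
        # Assumption: the writer marks entries that old nodes may skip.
        if entry.get("ignorable"):
            return False
        # Fail loudly rather than let this node's state silently diverge.
        raise UnknownEntryError(entry["type"])
    handler(state, entry["payload"])
    return True
```

This only covers the "don't understand" case. The "misunderstand" case, where the type is known but its handling has changed, needs something more, such as a version field checked inside each handler.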

Zak
Decades ago, PG wrote that he didn't use a database for Viaweb, and that it seemed odd for web apps to be frontends to databases when desktop apps were not[0]. HN also doesn't use a database.

That's no longer true, with modern desktop and mobile apps often using a database (usually SQLite) because relational data storage and queries turn out to be pretty useful in a wide range of applications.

[0] https://www.paulgraham.com/vwfaq.html

theideaofcoffee
I get the desire to experiment with interesting things, but it seems like such a huge waste of time to avoid having to learn the most basic aspects of MySQL or postgres. You could "just" build on top of one of them and be done with it, especially if you're running in a public cloud provider. I don't buy the arguments about increased RTT or concurrency troubles; the latter have simple solutions through basic tuning or breaking out your noisy customers. There's another post on their blog mentioning the possibility of adding 10 million rows per day and the challenges of indexing that. That's... literally nothing and I don't think even 10x that justifies having to engineer a custom solution.

Worse is better until you absolutely need to be less worse, then you'll know for sure. At that point you'll know your pain points and can address them more wisely than building more up front.

oefrha
Seems weird to start with “not talking about using something like SQLite where your data is still serialized”, then end up with a home grown transaction log that requires serialization and needs to be replicated, which is how databases are replicated anyway.

If your load fits entirely on one server, then just run the database on that damn server and forget about “special architectures to reduce round-trips to your database”. If your data fits entirely in RAM, then use a ramdisk for the database if you want, and replicate it to permanent storage with standard tools. Now that’s actually simple.

nilirl
I'm baffled at the arguments made in this article. This is supposed to be a simpler and faster way to build stateful applications?

The premises are weak and the claims absurd. The author uses overstatement of the difficulties of serialization just to make their weak claim stronger.

mg
When I start a new project, the data structure usually is a "list of items with attributes". For example right now, I am writing a fitness app. The data consists of a list of exercises and each exercise has a title, a description, a video url and some other attributes.

I usually start by putting those items into YAML files in a "data" directory. Actually a custom YAML dialect without the quirks of the original. Each value is a string. No magic type conversions. Creating a new item is just "vim crunches.yaml" and putting the data in. Editing, deleting etc all is just wonderfully easy with this data structure.

Then when the project grows, I usually create a DB schema and move the items into MariaDB or SQLite.

This time, I think I will move the items (exercises) into a JSON column of an SQLite DB. All attributes of an item will be stored in a single JSON field. And then write a little DB explorer which lets me edit JSON fields as YAML. So I keep the convenience of editing human readable data.

Writing the DB explorer should be rather straight forward. A bit of ncurses to browse through tables, select one, browse through rows, insert and delete rows. And for editing a field, it will fire up Vim. And if the field is a JSON field, it converts it to YAML before it sends it to Vim and back to JSON when the user quits Vim.
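The JSON-column part of that plan could look roughly like the sketch below, using only the standard library. The table and column names are made up, and the `edit` callback stands in for the "convert to YAML, open Vim, convert back" round-trip, which is left out because YAML parsing needs a third-party library:

```python
import json
import sqlite3


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS exercises ("
        "id INTEGER PRIMARY KEY, attrs TEXT NOT NULL)"
    )
    return db


def add_exercise(db, attrs):
    """Store all attributes of one item in a single JSON field."""
    cur = db.execute(
        "INSERT INTO exercises (attrs) VALUES (?)", (json.dumps(attrs),)
    )
    db.commit()
    return cur.lastrowid


def edit_exercise(db, row_id, edit):
    """Load the JSON field, pass it through `edit`, and store the result."""
    (raw,) = db.execute(
        "SELECT attrs FROM exercises WHERE id = ?", (row_id,)
    ).fetchone()
    updated = edit(json.loads(raw))
    db.execute(
        "UPDATE exercises SET attrs = ? WHERE id = ?",
        (json.dumps(updated), row_id),
    )
    db.commit()
    return updated
```

In the explorer, `edit` would be the function that dumps the dict to a temp file, launches the editor, and parses the file again when the editor exits.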

nickpsecurity
What they described early on in the article was basically how NUMA machines worked (eg SGI Altix or UV). Also, their claimed benefit was being able to parallelize things with multithreading in low-latency, huge RAM. Clustering came as a low-cost alternative to $1+ million machines. There’s similarities to persistence in AS/400, too, where apps just wrote memory that gets transparently mapped to disk.

Now, with cheap hardware, they’re going back in time to the benefits of clustered, NUMA machines. They’ve improved on it along the way. I did enjoy the article.

Another trick from the past was eliminating TCP/IP stacks from within clusters to knock out their issues. Solutions like Active Messages were a thin layer on top of the hardware. There’s also designs for network routers that have strong consistency built into them. Quite a few things they could do.

If they get big, there are hardware opportunities. On the CPU side, SGI did two things. Their NUMA machines expanded the number of CPUs and RAM for one system. They also allowed FPGAs to plug directly into the memory bus to do custom accelerators. Finally, some CompSci papers modified processor ISAs, networks on a chip, etc to remove or reduce bottlenecks in multithreading. Also, chips like OpenPiton increase core counts (eg 32) with open, customizable cores.

AdieuToLogic
> Imagine all the wonderful things you could build if you never had to serialize data into SQL queries.

This exists in sufficiently mature Actor model[0] implementations, such as Akka Event Sourcing[1], which also addresses:

> But then comes the important part: how do you recover when your process crashes? It turns out that answer is easy, periodically just take a snapshot of everything in RAM.

Intrinsically and without having to create "a new architecture for web development". There are even open source efforts which explore the Raft protocol using actors here[2] and here[3].

0 - https://en.wikipedia.org/wiki/History_of_the_Actor_model

1 - https://doc.akka.io/docs/akka/current/typed/persistence.html

2 - https://github.com/Michael-Dratch/RAFT_Implementation

3 - https://github.com/invkrh/akka-raft

oconnore
My first thought was, “oh, I used to do this when I wrote Common Lisp, it’s funny someone rediscovered that technique in <rust/typescript/java/whatever>”.

But no, just more lispers.

ksec
>RAM is super cheap

I think this has to be the number one misunderstanding for developers.

Yes, SSD throughput and IOPS have gone up by 100 to 10000x. vCPU performance per dollar has gone up by 20 - 50x. We went from 45/32nm to 5nm/3nm, with much higher IPC.

But the RAM price hasn't fallen anywhere near as much as CPU or SSD prices. RAM may have gotten a lot faster, and you can fit far more of it in a machine now that chips are denser and memory channels went from dual to 8 or 12. But if you look at the DRAM spot price from 2008 to 2022, you will see the cyclical low has been the same, around $2.8/GB, on three separate occasions, with the price cycling up to $6 - $8 per GB in between. That is, had you bought DRAM at one of its cyclical lows (or at one of its highs) at any time in the past ~15 years, it would have cost roughly the same, plus or minus 10-20%, ignoring inflation.

It was not until mid-2022 that it finally broke through the $2.8/GB barrier, collapsing to close to $1/GB before settling at ~$2/GB for DDR5.

Yes, you can now get 4TB of RAM in a server. But that doesn't mean DRAM is super cheap. Developers, on average or at least those in big tech, now earn way more than they did in 2010, which makes them think RAM has gotten a lot more affordable. In reality, even at the lowest point of the past 15 years you got at best slightly more than a 2x reduction in DRAM price. And we will likely see DRAM prices shoot up again in a year or two.

wmf
This sounds a lot like Prevayler. https://prevayler.org/
donatj
I've got a handful of small Go applications where I just have a "go generate" command that generates the entire dataset as Go, so the data set ends up compiled into the binary. Works great.

https://emoji.boats/ is the most public facing of these.

I also have built a whole class of micro-services that pull their entire dataset from an API on start up, hold it resident and update on occasion. These have been amazing for speeding up certain classes of lookup for us where we don't always need entirely up to date data.
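For the "pull everything on start up, hold it resident, update on occasion" pattern, a minimal sketch might look like this. `fetch_all` is a hypothetical callable that returns the whole dataset as a dict, and the injectable `clock` is only there to make the class easy to test:

```python
import time


class ResidentDataset:
    """Holds a whole dataset in memory and refreshes it once it gets too old."""

    def __init__(self, fetch_all, max_age_seconds=300.0, clock=time.monotonic):
        self._fetch_all = fetch_all   # hypothetical call to the upstream API
        self._max_age = max_age_seconds
        self._clock = clock
        self._data = fetch_all()      # pull everything once at start-up
        self._loaded_at = clock()

    def lookup(self, key, default=None):
        if self._clock() - self._loaded_at > self._max_age:
            try:
                self._data = self._fetch_all()
                self._loaded_at = self._clock()
            except Exception:
                # Keep serving the stale copy; the next lookup retries.
                pass
        return self._data.get(key, default)
```

Lookups are plain dict reads, which is where the speed-up comes from. The cost is that readers can see data up to `max_age_seconds` old, or older if the upstream API is down.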

Sn0wCoder
Not sure I would call that setup simple, but it is interesting. I have honestly never heard of ‘Raft’ or the Raft Consensus Protocol or bknr.datastore, so always happy to learn something on a Friday night.
lpapez
I once saw a project in the wild where the "database" was implemented using filesystem directories as "tables" with JSON files inside as "rows".

When I asked people working on it if they considered Redis or Mongo or Postgres with jsonb columns, they just said they considered all of those things but decided to roll out their own db anyway because "they understood it better".

This article gives off the same energy. I really hope it works out for you, but IMO spending innovation tokens to build a database is nuts.

nephy
We didn’t want to build something complicated, so we implemented our own raft consensus layer. Have you considered just using Redis?
andrewstuart
But why, when you can build things in an ordinary way with ordinary tech like Python/Java/C#/TypeScript and Postgres. Lots of developers know it, lots of answers to your questions online, the AI knows how to write it.

Reading posts like this makes me think the founders/CTO are mixing hobby programming with professional programming.

_the_inflator
This reminds me of the heated discussions around jQuery by some so-called performance-driven devs, which culminated in this website:

https://youmightnotneedjquery.com/

The overwhelming majority underestimates the beauty and effort as well as experience that goes into abstractions. There are some true geniuses at times doing fantastic work, to deliver syntactical sugar while the critics mock the maybe somewhat larger bundle size for “a couple of lines frequently used.” That’s why.

In the end, a good framework is more than just an abstraction. It guarantees consistency and accessibility.

Try to understand the source code if possible before reinventing the wheel is my advice.

What maybe starts out to be fun quickly becomes a burden. If there weren’t any edge cases or different conditions, you wouldn’t need an abstraction. Been there, done that.

tofflos
Check out https://eclipsestore.io (previously named Microstream) if you're into Java and interested in some of the ideas presented in this article. You use regular objects, such as Records, and regular code, such as java.util.stream, for processing, and the library does snapshotting to disk.

I haven't tried it out but just thinking of how many fewer organizational hoops I would have to jump through makes me want to try it out:

- No ordering a database from database operations.

- No ordering a port opening from network operations.

- No ordering of certificates.

- The above times 3 for development, test and production.

- Not having to run database containers during development.

I think the sweet spot for me would be in services that I don't expect to grow beyond a single node and there is an acceptance for a small amount of downtime during service windows.

Tehdasi
Hmm, but the problem with having in-memory objects rather than a db is you end up having to replicate a lot of the features of a relational database to get a usable system. And adding all the extra features you want from those dbs ends up making a simple solution not very simple at all.
leokennis
I’m not from “start up world” but in the end, few things give me more comfort and lack of surprises down the line than just having a relational database with built in redundancy/transaction logs/back up/recovery. Sure there might always be edge cases (lack of money, regulations, specialist software offering) but in the vast majority of cases - just get a database.
gchamonlive
The problem is, you only know what you know.

Sure you reduce deployment complexity, but what about maintaining your algorithm that implements data persistence and replication?

To assume that will never spectacularly bite you is naive. Tests only go as far as what you know to test for, and while you don't know if your product will ever be used, you also don't know if it will explode in success and leave you hostage to your own decisions and technical debt.

These are HARD decisions. Hard decisions require solid solutions. You can surely try that with toy projects, but if I was in a position to build a software architecture for something that had a remote possibility of being used in production, I would oppose such designs adamantly.

LAC-Tech
Really got a kick out of this article. RAM is big, and cheap. And as we all know the database is the log, and everything else is just the cache. A few questions, comments!

1. I take it you've seen the LMAX talk [0], and were similarly inspired? :)

2. Are you familiar with the event sourcing approach? It's basically what you describe, except you don't flush to disk after editing every field, you batch your updates into a single "event". (you've come at it from the exact opposite end, but it looks like roughly the same thing).

[0] https://www.infoq.com/presentations/LMAX/

paxys
There is so much wrong with this I don't know where to even start. You want to "keep things simple" and not stand up a separate instance of MySQL/Postgres/Redis/MongoDB/whatever else. So, you:

1. Create your own in-memory database.

2. Make sure every transaction in this DB can be serialized and is simultaneously written to disk.

3. Use some orchestration platform to make all web servers aware of each other.

4. Synchronize transaction logs between your web servers (by implementing the Raft protocol) and update the in-memory DB.

5. Write some kind of conflict resolution algorithm, because there's no way to implement locking or enforce consistency/isolation in your DB.

6. Shard your web servers by tenant and write another load balancing layer to make sure that requests are getting to the server their data is on.

Simple indeed.

joatmon-snoo
This is cool! I’m always excited by people trying simpler things, as a big fan of using Boring Technology.

But I have some bad news: you haven’t built a system without a database, you’ve just built your own database, one without transactions and with weak durability properties.

> Hold on, what if you’ve made changes since the last snapshot? And this is the clever bit: you ensure that every time you change parts of RAM, we write a transaction to disk.

This is actually not an easy thing to do. If your shutdowns are always clean SIGTERMs, yes, you can reliably flush writes to disk. But if you get a SIGKILL at the wrong time, or don’t handle an I/O error correctly, you’re probably going to lose data. (Postgres’ 20-year fsync issue was one of these: https://archive.fosdem.org/2019/schedule/event/postgresql_fs...)
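As a rough illustration of what "write a transaction to disk" has to mean before it can be trusted, here is a sketch of a durable append. The function name and the one-JSON-object-per-line format are assumptions for the example, not what the article's datastore does:

```python
import json
import os


def append_durably(log_path, record):
    """Append one JSON line; return only after the OS reports it is on disk."""
    data = (json.dumps(record) + "\n").encode("utf-8")
    fd = os.open(log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        while data:
            written = os.write(fd, data)  # may write fewer bytes than asked
            data = data[written:]
        # Without fsync the bytes may still sit in the page cache when the
        # machine dies. An OSError here must propagate to the caller; a
        # retried fsync that succeeds does not prove the earlier data is safe.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Even this leaves gaps: a newly created file also needs its directory fsynced, and a crash mid-write can leave a torn final line that recovery has to detect and discard.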

The open secret in database land is that for all we talk about transactional guarantees and durability, the reality is that those properties only start to show up in the very, very, _very_ long tail of edge cases, many of which are easily remedied by some combination of humans getting paged and end users developing workarounds (eg double entry bookkeeping). This is why MySQL’s default isolation level can lose writes: there are usually enough safeguards in any given system that it doesn’t matter.

A lot of what you’re describing as “database issues” don’t sound to me like DB issues, so much as latency issues caused by not colocating your service with your DB. By hand-rolling a DB implementation using Raft, you’ve also colocated storage with your service.

> Screenshotbot runs on their CI, so we get API requests 100s of times for every single commit and Pull Request.

I’m sorry, but I don’t think this was as persuasive as you meant it to be. This is the type of workload that, to be snarky about it, I could run off my phone[0].

[0]: https://tailscale.com/blog/new-internet

annacappa
It’s great that people explore new ideas. However this does not seem like a good idea.

It claims to solve a bunch of problems by ignoring them. There are solid reasons why people distribute their applications across multiple machines. After reading this article I feel like we need to state a bunch of them.

Redundancy: what if one machine breaks, whether through a hardware failure, a software failure, or a network failure (a network partition where you can’t reach the machine or it can’t reach the internet)?

Scaling: what if you can’t serve all of your customers from one machine? Perhaps you have many customers and a small app, or perhaps your app uses a lot of resources (maybe it loads gigs of data).

Deployment: what happens when you want to change the code without going down? If you are running multiple copies of your app, you get this for cheap.

There are tons of smaller benefits, such as right-sizing your architecture. What if the one machine you chose is not big enough? You need to move to a new machine, whereas with multiple machines you just increase the number of machines. You also get to use a variety of machine sizes and can choose ones that fit your needs, and this flexibility lets you choose cheaper machines.

I feel like the authors don’t know why people invented the standard way of doing things.

jhardy54
> Hold on, what if you’ve made changes since the last snapshot? And this is the clever bit: you ensure that every time you change parts of RAM, we write a transaction to disk. So if you have a line like foo.setBar(2), this will first write a transaction that says we’ve changed the bar field of foo to 2, and then actually set the field to 2. An operation like new Foo() writes a transaction to disk to say that a Foo object was created, and then returns the new object.

>

> And so, if your process crashes and restarts, it first reloads the snapshot, and replays the transaction logs to fully recover the state. (Notice that index changes don’t need to be part of the transaction log. For instance if there’s an index on field bar from Foo, then setBar should just update the index, which will get updated whether it’s read from a snapshot, or from a transaction.)

That’s a database. You even linked to the specific database you’re using [0], which describes itself as:

> […] in-memory database with transactions […]

Am I misunderstanding something?

[0]: https://github.com/bknr-datastore/bknr-datastore
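The snapshot-plus-transaction-log recovery quoted above fits in a few dozen lines, which is part of why it reads as "a database". This is a toy sketch under stated assumptions (a flat dict as the state, JSON files, hypothetical names); it is not how bknr.datastore is implemented:

```python
import json
import os


class TinyStore:
    """Toy store: state lives in a dict, durability comes from snapshot + log."""

    def __init__(self, directory):
        self._snapshot_path = os.path.join(directory, "snapshot.json")
        self._log_path = os.path.join(directory, "tx.log")
        self.state = {}
        self._recover()
        self._log = open(self._log_path, "a", encoding="utf-8")

    def _recover(self):
        # Reload the last snapshot, then replay every logged transaction.
        if os.path.exists(self._snapshot_path):
            with open(self._snapshot_path, encoding="utf-8") as f:
                self.state = json.load(f)
        if os.path.exists(self._log_path):
            with open(self._log_path, encoding="utf-8") as f:
                for line in f:
                    tx = json.loads(line)
                    self.state[tx["key"]] = tx["value"]

    def set(self, key, value):
        # Log first, then mutate RAM, in the order the quoted passage gives.
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.state[key] = value

    def snapshot(self):
        tmp = self._snapshot_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self._snapshot_path)  # swap the new snapshot in
        self._log.close()
        self._log = open(self._log_path, "w", encoding="utf-8")  # truncate log

    def close(self):
        self._log.close()
```

Replaying an old log over a newer snapshot is harmless here only because every transaction is a plain overwrite. Torn log lines, concurrent writers, and multi-key atomicity are all left unhandled, and those are the parts a real database spends its effort on.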

jb3689
We wanted to simplify our architecture and not use a database, so instead we created our own version of everything databases already do for us. Super risky for a company. Hopefully you don’t spend all of your time maintaining, optimizing, and scaling this custom architecture.
samarabbas
Notice how the complexity of this grows suddenly when you start thinking about infrastructure failures and restarts due to deployments. I have seen this play out dozens of times in my professional career: these systems start out very simple but eventually become a huge maintenance burden. This is where high-level abstractions like Durable Execution are much more powerful for developers, since they can abstract away this level of detail. Basically, code up your application as if infrastructure failures do not exist and let an underlying Durable Execution platform like Temporal or something similar handle resiliency for you.
aag
After reading countless negative comments, many written based on real experience, but almost all tinged with fear, and even a few ad hominem attacks ("...not really experienced" and "...just more lispers?" Really?), I'd like to offer a word of encouragement.

I'm thrilled to see someone try something different, and grateful that he wrote about his positive experiences with it. Perhaps it will turn out to have been the wrong decision, but his writing about it is the only way we'll ever really know. It's so easy to be lulled into a sense of security by doing things the conventional way, and to miss opportunities offered by huge improvements in hardware, well-written open-source libraries, and powerful programming languages.

I have an especially hard time with the idea that SQL is where we all should end up. I've worked at Oracle, and I worked on Google AdWords when it was built on MySQL and InnoDB. I understand SQL's power, but I also understand how constraining it is, not only on data representation, but also on querying. I want to read more posts about people trying to build something without it. Redis is one way, but I'm eager to hear about others.

I wish the author good luck, and encourage him to write another post once Screenshotbot reaches the next stage.

nesarkvechnep
It’ll be interesting to do something like this in Elixir where clustering is almost a runtime primitive.
antman
As a side question is there a python library for braft or a production grade raft library for python?
apexkid
> periodically just take a snapshot of everything in RAM.

Sounds similar to `stop the world garbage collection` in Java. Does your entire processing come to a halt when you do this? How frequently do you need to take snapshots? Or do you have a way to do this without halting everything?

qprofyeh
We used Redis with persistence to build our first prototype. It performed amazingly and development speed was awesome. We were a full year beyond break-even before adding MySQL to the stack for the few times we missed the ability to run SQL queries, for finance.
MagicMoonlight
> Hold on, what if you’ve made changes since the last snapshot? And this is the clever bit: you ensure that every time you change parts of RAM, we write a transaction to disk.

Every single time… it’s always just the wheel being re-written.

hankchinaski
textbook example of overengineering for no reason
bastawhiz
Please, someone explain how building your own in-memory database and snapshotting on top of Raft is simpler than just installing Postgres or SQLite with one of the modern durability tools. Seriously, if you genuinely believe writing concurrency code with mutexes and other primitives and hoping that's all correct is easier than just writing a little SQL, you've tragically lost your way.
ibash
1. If your entire cluster goes down do you permanently lose state?

2. Are network requests / other ephemeral things also saved to the snapshot?

golergka
> Imagine all the wonderful things you could build if you never had to serialize data into SQL queries.

No transactions, no WAL, no relational schema to keep data design sane, no query planner doing all kinds of optimisations and memory layout things I don't have to think about?

You could say that transactions, for example, would be redundant if there is no external communication between app server and the database. But it is far from the only thing they're useful for. Transactions are a great way of fulfilling important invariants about the data, just like a good strict database schema. You rollback a transaction if an internal error throws. You make sure that transaction data changes get serialised to disk all at once. You remove a possibility that statements from two simultaneous transactions access the same data in a random order (at least if you pick a proper transaction isolation level, which you usually should).
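A small sketch of the "roll back if an internal error throws" point, using Python's built-in sqlite3 module. The accounts table and the `transfer` function are invented for the example:

```python
import sqlite3


def transfer(db, src, dst, amount):
    """Move `amount` between rows; both updates commit together or not at all."""
    with db:  # commits on success, rolls back if the block raises
        db.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, src),
        )
        db.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, dst),
        )
        (balance,) = db.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            # The invariant is checked after both writes; raising undoes them.
            raise ValueError("insufficient funds")


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
db.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
db.commit()
```

With plain in-memory objects, undoing the first update after the check fails is code you have to write and get right yourself.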

> You also won’t need special architectures to reduce round-trips to your database. In particular, you won’t need any of that Async-IO business, because your threads are no longer IO bound. Retrieving data is just a matter of reading RAM. Suddenly debugging code has become a lot easier too.

Database is far from the only other server I have to communicate with when I'm working on user's HTTP request. As a web developer, I don't think I've worked on a single product in the last 4 years that didn't have some kind of server-server communication for integrations with other tools and social media sites.

> You don’t need crazy concurrency protocols, because most of your concurrency requirements can be satisfied with simple in-memory mutexes and condition variables.

Ah, mutexes. Something that programmers have never shot themselves in the foot with. Also, deadlocks don't exist.

> Hold on, what if you’ve made changes since the last snapshot? And this is the clever bit: you ensure that every time you change parts of RAM, we write a transaction to disk. So if you have a line like foo.setBar(2), this will first write a transaction that says we’ve changed the bar field of foo to 2, and then actually set the field to 2. An operation like new Foo() writes a transaction to disk to say that a Foo object was created, and then returns the new object.

A disk write latency is added to every RAM write. It has no performance cost and nobody notices this.

I apologise if this comes off too snarky. Despite all of the above, I really like this idea, and I'm already thinking of implementing it in a hobby project, just to see how well it really works. I'm still not sure if it's practical, but I love the creative thinking behind this, and the fact that it actually helped them build a business.

k__
Well, at least they gave an example of what not to do.
aorloff
This is like an example case of a lambda + kinesis
WuxiFingerHold
This is not good advice. It's in parts a hyperbolic and unbalanced view:

> Imagine all the wonderful things you could build if you never had to serialize data into SQL queries.

You can do all those "wonderful things" with an RDBMS too, it's just an additional step.

> First, you don’t need multiple front-end servers talking to a single DB, just get a bigger server with more RAM and more CPU if you need it.

You don't "need" that with a single DB too, you can also get a bigger machine. Also, you can use SQLite and Litestream.

> What about indices? Well, you can use in-memory indices, effectively just hash-tables to lookup objects. You don’t need clever indices like B-tree that are optimized for disk latency.

RDBMSs provide all kinds of indices. You don't need to build them in your code or re-invent clever solutions. They're all there. Optimized and battle-tested over decades.
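For comparison, this is roughly what the quoted "in-memory indices, effectively just hash-tables" look like once you maintain one by hand. The `IndexedFoos` class and its field names are hypothetical, chosen to echo the article's Foo/bar example:

```python
from collections import defaultdict


class IndexedFoos:
    """Objects keyed by id, plus a hash index on their `bar` field."""

    def __init__(self):
        self._foos = {}                  # id -> {"bar": ...}
        self._by_bar = defaultdict(set)  # bar value -> ids with that value

    def add(self, foo_id, bar):
        self._foos[foo_id] = {"bar": bar}
        self._by_bar[bar].add(foo_id)

    def set_bar(self, foo_id, bar):
        # Every setter must keep the index in step with the object.
        old = self._foos[foo_id]["bar"]
        self._by_bar[old].discard(foo_id)
        self._foos[foo_id]["bar"] = bar
        self._by_bar[bar].add(foo_id)

    def find_by_bar(self, bar):
        return sorted(self._by_bar.get(bar, ()))
```

A hash index like this answers equality lookups only. Range queries, ordering, and multi-column lookups each need another structure and more upkeep code in every setter.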

> You also won’t need special architectures to reduce round-trips to your database.

You don't need "special architectures" at all. With the most simple setup you get thousands of requests per second and sub-5 ms latency. With SQLite even more. No need for async IO, threads scale well enough for most apps. Anyway, async is not a magical thing.

> You don’t need any services to run background jobs, because background jobs are just threads running in this large process.

How does this change when using an RDBMS?

> You don’t need crazy concurrency protocols, because most of your concurrency requirements can be satisfied with simple in-memory mutexes and condition variables.

I trust a proper proven implementation in SQLite or Postgres much more than "simple in-memory mutexes and condition variables". One reason why Rust is so popular is that it's an eye opener when the compiler shows you all your concurrency bugs you never thought you had in your code.

---------------------

RDBMSs solve / support many important things the easy way:

- normalized data modelling by refs and joins

- querying, filtering and aggregating data

- concurrency

- storage

Re-inventing those is, most of the time, much harder, more error-prone and more expensive.

---------------------

The simplest, easiest and most proven way is still to use an RDBMS. Start with SQLite and Litestream if you don't want to manage Postgres, which is a substantial effort, I admit. Cost can be a factor too, although something like Neon / Supabase / ... at small scale is much, much cheaper than the development costs for all the stuff above.

RandomWorker
Honestly SQLite is just a great option. Stored locally, so you have that fast disk access. Does great for small medium and even larger databases. And you just have a file.
cakeface
You built a database.
iammrpayments
Isn’t this like redis?
shynomezzz
looks like erlang beam
localfirst
I would use cloudflare R2 but it's not globally distributed so it's pointless using it on edge

otherwise I get the messaging: with edge, the database is the bottleneck

just need a one stop shop to do edge functions + edge db
