SQLite: 35% Faster Than the Filesystem

tgtweak

No file system attributes or metadata on records which also means no (xattrs/fattrs) being written or updated, no checks to see if it's a physical file or a pipe/symlink, no permission checks, no block size alignment mismatches, single open command.

Makes sense when you consider you're throwing out functionality and disregarding general purpose design.

If you use a fuse mapping to SQLite, mount that directory and access it, you'd probably be very similar performance (perhaps even slower) and storage use as you'd need to add additional columns in the table to track these attributes.

I have no doubt that you could create a custom tuned file system on a dedicated mount with attributes disabled, minimized file table and correct/optimized block size and get very near to this perf.

Let's not forget the simplicity of being able to use shell commands (like rsync) to browse and manipulate those files without running the application or an SQL client to debug.

Makes sense for developers to use SQLite for this use case though for an appliance-type application or for packaged static assets (this is already commonplace in game development - a cab file is essentially the same concept)

lc64

That's a very rigorously written article.

Let's also note the 4x speed increase on windows 10, once again underlining just how slow windows filesystem calls are, when compared to direct access, and other (kernel, filesystem) combinations.

robertclaus

I did some research in a database research lab, and we had a lot of colleagues working on OS research. It was always interesting to compare the constraints and assumptions across the two systems. I remember one of the big differences was the scale of individual records we expected to be working with, which in turn affected how memory and disk was managed. Most relational databases are very much optimized for small individual records and eventual consistency, which allows them to cache a lot more in memory. On the other hand, performance often drops sharply with the size of your rows.

igammarays

This is precisely why I'm considering appending to a sqlite DB in WAL2 mode instead of plain text log files. Almost no performance penalty for writes but huge advantages for reading/analysis. No more Grafana needed.

freedmand

I recently had the idea to record every note coming out of my digital piano in real-time. That way if I come up with a good idea when noodling around I don’t have to hope I can remember it later.

I was debating what storage layer to use and decided to try SQLite because of its speed claims — essentially a single table where each row is a MIDI event from the piano (note on, note off, control pedal, velocity, timestamp). No transactions, just raw inserts on every possible event. It so far has worked beautifully: it’s performant AND I can do fun analysis later on, e.g. to see what keys I hit more than others or what my average note velocity is.

leni536

> The performance difference arises (we believe) because when working from an SQLite database, the open() and close() system calls are invoked only once, whereas open() and close() are invoked once for each blob when using blobs stored in individual files. It appears that the overhead of calling open() and close() is greater than the overhead of using the database

I wonder how io_uring compares.

Upvoter33

When something built on top of the filesystem is "faster" than the filesystem, it just means "when you use the filesystem in a less-than-optimal manner, it will be slower than an app that uses it in a sophisticated manner." An interesting point, but perhaps obvious...

Kalanos

TLDR; don't do it.

I've used SQLite blob fields for storing files extensively.

Note that there is a 2GB blob maximum: https://www.sqlite.org/limits.html

To read/write blobs, you have to serialize/deserialize your objects to bytes. This process is not only tedious, but also varies for different objects and it's not a first-class citizen in other tools, so serialization kept breaking as my dependencies upgraded.

As my app matured, I found that I often wanted hierarchical folder-like functionality. Rather than recreating this mess in db relationships, it was easier to store the path and other folder-level metadata in sqlite so that I could work with it in Python. E.g. `os.listdir(my_folder)`.

Also, if you want to interact with other systems/services, then you need files. sqlite can't be read over NFS (e.g. AWS EFS) and by design it has no server for requests. so i found myself caching files to disk for export/import.

SQLite has some settings for handling parallel requests from multiple services, but when I experimented with them I always wound up with a locked db due to competing requests.

For one reason or another, you will end up with hybrid (blob/file) ways of persisting data.

throwaway211

I was looking at self hosted RSS readers recently. The application is single user. I don't expect it to be doing a lot of DB intensive stuff.

It surprised me that almost all required PostgreSQL, and most of those that didn't opted for something otherwise complex such as Mongo or MySQL.

SQLite, with no dependencies, would have simplified the process no end.

theGeatZhopa

Depends, depends.. but just of logic:

All fs/drive access is managed by the OS. No DB systems have raw access to sectors or direct raw access to files.

Having a database file on the disc, offers a "cluster" of successive blocks on the hard drive (if it's not fragmented), resulting in relatively short moving distances of the drive head to seek the necessary sectors. There will still be the same sectors occupied, even after vast insert/write/del operations. Absolutely no change of DB file's position on hard drive. It's not a problem with SSDs, though.

So, the following apply:

client -> DB -> OS -> Filesystem

I think, you already can see the DB part is an extra layer. So, if one wouldn't have this, it would be "faster" in terms of execution time. Always.

If it's slower, then you use the not-optimal settings for your use case/filesystem.

My father did this once. He took H2 and made it even more faster :) incredible fast on Windows in direct comparison of H2/h2-modificated with same data.

So having a DBMS is convenient and made in decisions to serve certain domains and their problems. Having it is convenient, but that doesn't mean it's the most optimized way of doing it.

RaiausderDose

numbers are from 2017, update would be cool

aydgn

This discussion never gets old.

* https://news.ycombinator.com/item?id=14550060 7 years ago

* https://news.ycombinator.com/item?id=20729930 5 years ago

* https://news.ycombinator.com/item?id=27137834 3 years ago

* https://news.ycombinator.com/item?id=27897427 3 years ago

* https://news.ycombinator.com/item?id=34387407 2 years ago

OttoCoddo

SQLite can be faster than FileSystem for small files. For big files, it can do more than 1 GB/s. On Pack [1], I benchmarked these speeds, and you can go very fast. It can be even 2X faster than tar [2].

In my opinion, SQLite can be faster in big reads and writes too, but the team didn't optimise it as much (like loading the whole content into memory) as maybe it was not the main use of the project. My hope is that we will see even faster speeds in the future.

[1] https://pack.ac [2] https://forum.lazarus.freepascal.org/index.php/topic,66281.m...

jstummbillig

Let's assume that filesystems are fairly optimized pieces of software. Let's assume that the people building them heard of databases and at some point along the way considered things like the costs of open/close calls.

What is SQLite not doing that filesystems are?

cedws

How much more performance could you get by bypassing the filesystem and writing directly to the block device? Of course, you'd need to effectively implement your own filesystem, but you'd be able to optimise it more for the specific workload.

throwaway211

i.e. opening and closing many files from disk is slower than opening and closing one file and using memory.

It's important. But understandable.

me551ah

Why hasn’t someone made sqlitefs yet?

alberth

Slight OT: does this apply to SQLite on OpenBSD?

Because with OpenBSD introduction of pinning all syscalls to libc, doesn’t this block SQLite from making syscall direct.

https://news.ycombinator.com/item?id=38579913

vagab0nd

Reminds me of this talk: https://www.destroyallsoftware.com/talks/the-birth-and-death...

throwaway81523

Deleting a lot of rows from an sqlite database can be awfully slow, compared with deleting a file.

efilife

> Reading is about an order of magnitude faster than writing

not a native speaker, what does it mean?

throwaway984393

[dead]

The_Colonel

* for certain operations.

Which is a bit d'oh, since being faster for some things is one of the main motivations for a database in the first place.