Let's also note the 4x speed increase on windows 10, once again underlining just how slow windows filesystem calls are, when compared to direct access, and other (kernel, filesystem) combinations.
I was debating what storage layer to use and decided to try SQLite because of its speed claims — essentially a single table where each row is a MIDI event from the piano (note on, note off, control pedal, velocity, timestamp). No transactions, just raw inserts on every possible event. It so far has worked beautifully: it’s performant AND I can do fun analysis later on, e.g. to see what keys I hit more than others or what my average note velocity is.
I wonder how io_uring compares.
I've used SQLite blob fields for storing files extensively.
Note that there is a 2GB blob maximum: https://www.sqlite.org/limits.html
To read/write blobs, you have to serialize/deserialize your objects to bytes. This process is not only tedious, but also varies for different objects and it's not a first-class citizen in other tools, so serialization kept breaking as my dependencies upgraded.
As my app matured, I found that I often wanted hierarchical folder-like functionality. Rather than recreating this mess in db relationships, it was easier to store the path and other folder-level metadata in sqlite so that I could work with it in Python. E.g. `os.listdir(my_folder)`.
Also, if you want to interact with other systems/services, then you need files. sqlite can't be read over NFS (e.g. AWS EFS) and by design it has no server for requests. so i found myself caching files to disk for export/import.
SQLite has some settings for handling parallel requests from multiple services, but when I experimented with them I always wound up with a locked db due to competing requests.
For one reason or another, you will end up with hybrid (blob/file) ways of persisting data.
It surprised me that almost all required PostgreSQL, and most of those that didn't opted for something otherwise complex such as Mongo or MySQL.
SQLite, with no dependencies, would have simplified the process no end.
All fs/drive access is managed by the OS. No DB systems have raw access to sectors or direct raw access to files.
Having a database file on the disc, offers a "cluster" of successive blocks on the hard drive (if it's not fragmented), resulting in relatively short moving distances of the drive head to seek the necessary sectors. There will still be the same sectors occupied, even after vast insert/write/del operations. Absolutely no change of DB file's position on hard drive. It's not a problem with SSDs, though.
So, the following apply:
client -> DB -> OS -> Filesystem
I think, you already can see the DB part is an extra layer. So, if one wouldn't have this, it would be "faster" in terms of execution time. Always.
If it's slower, then you use the not-optimal settings for your use case/filesystem.
My father did this once. He took H2 and made it even more faster :) incredible fast on Windows in direct comparison of H2/h2-modificated with same data.
So having a DBMS is convenient and made in decisions to serve certain domains and their problems. Having it is convenient, but that doesn't mean it's the most optimized way of doing it.
* https://news.ycombinator.com/item?id=14550060 7 years ago
* https://news.ycombinator.com/item?id=20729930 5 years ago
* https://news.ycombinator.com/item?id=27137834 3 years ago
* https://news.ycombinator.com/item?id=27897427 3 years ago
* https://news.ycombinator.com/item?id=34387407 2 years ago
In my opinion, SQLite can be faster in big reads and writes too, but the team didn't optimise it as much (like loading the whole content into memory) as maybe it was not the main use of the project. My hope is that we will see even faster speeds in the future.
[1] https://pack.ac [2] https://forum.lazarus.freepascal.org/index.php/topic,66281.m...
What is SQLite not doing that filesystems are?
It's important. But understandable.
Because with OpenBSD introduction of pinning all syscalls to libc, doesn’t this block SQLite from making syscall direct.
not a native speaker, what does it mean?
Which is a bit d'oh, since being faster for some things is one of the main motivations for a database in the first place.
Makes sense when you consider you're throwing out functionality and disregarding general purpose design.
If you use a fuse mapping to SQLite, mount that directory and access it, you'd probably be very similar performance (perhaps even slower) and storage use as you'd need to add additional columns in the table to track these attributes.
I have no doubt that you could create a custom tuned file system on a dedicated mount with attributes disabled, minimized file table and correct/optimized block size and get very near to this perf.
Let's not forget the simplicity of being able to use shell commands (like rsync) to browse and manipulate those files without running the application or an SQL client to debug.
Makes sense for developers to use SQLite for this use case though for an appliance-type application or for packaged static assets (this is already commonplace in game development - a cab file is essentially the same concept)