Search is a very different story: you wouldn't want to do a full directory scan for every text-based search, so some level of indexing would be useful for a client mail service.
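A minimal sketch of that kind of indexing, assuming one message per file: scan the files once to build an inverted index (word to set of filenames), then answer word queries from the index instead of rescanning the directory. The function names and tokenizer are hypothetical; a real client would persist the index and update it incrementally as mail arrives.

```python
import re
from collections import defaultdict
from pathlib import Path

WORD = re.compile(r"[a-z0-9]+")

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return set(WORD.findall(text.lower()))

def build_index(maildir: Path) -> dict[str, set[str]]:
    """Read every file once and map each word to the files containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for path in maildir.rglob("*"):
        if path.is_file():
            for word in tokenize(path.read_text(errors="replace")):
                index[word].add(path.name)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the files containing every word of the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))
```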
Similarly, I've thought it would be really cool if Cloudflare offered a TCP worker option; you could do a simple mail service backed by R2. The web UI/UX could be pretty awesome and geo-distributed.
Over a decade ago I developed, for our small office (small enough not to have any IT department or IT management), a bespoke extension for Outlook (yes, bad idea, I know) which exported all incoming emails and attachments into the standard file system, decanted into project folders.
It was triggered whenever an unread email was opened, and required the user to pick a project from a list and hit OK. Cancelling was an option (for personal emails).
There was a config tab where the admin defined the filename pattern by arranging elements like date/time/to/from/subject/.., and any attachments were also saved as separate files.
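For anyone wanting to replicate the filename-pattern idea, here is a rough sketch, assuming the admin's arrangement is stored as an ordered list of element names. The element names, the `_` separator and the character-replacement rule are hypothetical, not what the original add-in used.

```python
import re
from email.message import Message
from email.utils import parseaddr, parsedate_to_datetime

# Characters Windows does not allow in filenames, plus line breaks and tabs.
UNSAFE = re.compile(r'[\\/:*?"<>|\r\n\t]+')

def field_value(msg: Message, field: str) -> str:
    """Return the text for one filename element (hypothetical element names)."""
    if field in ("date", "time"):
        when = parsedate_to_datetime(msg["Date"])
        return when.strftime("%Y-%m-%d" if field == "date" else "%H%M")
    if field in ("from", "to"):
        return parseaddr(msg.get(field, ""))[1]  # bare address, no display name
    return str(msg.get(field, ""))

def make_filename(msg: Message, pattern: list[str], sep: str = "_") -> str:
    """Join the configured elements in order, dropping empty ones."""
    parts = [UNSAFE.sub("-", field_value(msg, f)).strip() for f in pattern]
    return sep.join(p for p in parts if p) + ".eml"
```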
A very imperfect approach, but under the circumstances it was a vast improvement over the prior mess of individual mailboxes bestrewn with all manner of project correspondence and files, which turned intricate queries about past dealings into frustrating, spaghettified detanglements.
And ultimately, perhaps like a good deal of IT, the root cause was uninformed management, along with the reality of ordinary users with little notion of information management.
Mail.app uses a directory structure that looks similar* to this for, say, Gmail:
{account-uuid}/[Gmail].mbox/All Mail.mbox/{mailbox-guid}/Data/Messages/{msguid}.partial.emlx
{account-uuid}/[Gmail].mbox/All Mail.mbox/{mailbox-guid}/Data/Attachments/{msguid}/{mime part #}/{mime subpart #}/filename.ext
The emlx format is a bit different from eml. It contains the number of bytes of the message at the top and an XML plist at the end with message flags, last-viewed time, Gmail labels, etc. For partial.emlx files, the base64 attachment content is removed from the message itself (the attachments live under the Attachments directory instead) and a content length is added. This format has its drawbacks, of course.
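Going by that description (byte count on the first line, then the message, then an XML plist), a minimal reader might look like this. It's a sketch under that assumption, not a complete parser: it doesn't reassemble partial.emlx files from the Attachments directory, and it doesn't interpret the flag bits in the plist.

```python
import email
import plistlib
from email.message import Message

def parse_emlx(data: bytes) -> tuple[Message, dict]:
    """Split an .emlx blob into the RFC 822 message and its trailing plist."""
    first_line, rest = data.split(b"\n", 1)
    length = int(first_line.strip())           # byte count of the message part
    message = email.message_from_bytes(rest[:length])
    trailer = rest[length:].strip()            # whatever follows is the plist
    meta = plistlib.loads(trailer) if trailer else {}
    return message, meta
```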
* Not shown is the additional hierarchy, based on the message UID, that keeps down the number of files in each Messages directory.
Of course, standard (usual, common) email is just text. Right, so pictures, to carry them as just text, are encoded as base64. Right, that's MIME (Multipurpose Internet Mail Extensions).
Soooo, okay, my ISP (Internet Service Provider) has an email service. The service is a website, and it does offer downloading the "Source", that is, the raw text of a message, all as just one file.
Now, suppose for each email message I send/receive, I keep the text in its own file, with just the text, just as I got it from, say, my ISP. I will handle the file naming, indexing, summarization, etc.
Help!!!! Is there an email program I can run that can read each of those files and display it? Sure, it should display the plain text as text, but also do the right thing for each of the other MIME parts: still images, video clips, audio, whatever. Know of such a program???? Thanks!
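Not a full mail program, but Python's standard `email` package already handles the parsing half of this: given one saved message, it picks out the text body and decodes every other MIME part (base64 and all) back to raw bytes, which a viewer could then hand to the system's image/audio/video player. A sketch; `read_eml` is a hypothetical name.

```python
from email import policy
from email.parser import BytesParser

def read_eml(raw: bytes):
    """Parse one saved message into (subject, text body, other decoded parts)."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""
    others = []
    for part in msg.walk():
        if part.is_multipart() or part is body:
            continue  # containers and the displayed body are not listed again
        # decode=True undoes the base64 / quoted-printable transfer encoding
        others.append((part.get_content_type(), part.get_filename(),
                       part.get_payload(decode=True)))
    return msg["Subject"], text, others
```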
Anyone who has tried to naively store millions of files in a single filesystem folder eventually realizes that listing them becomes horrible: no GUI tool handles it gracefully, and even on the CLI it is a very serious roadblock.
It's still fine for accessing files directly by name, and there must be ways to read each file sequentially, but the folder merely becomes an arbitrary namespace rather than something you can handle as a whole group.
The other obvious option is to shard the mail folders to ensure there are no more than X files in each folder, but that becomes pretty complex, IMHO.
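For what it's worth, the simplest sharding scheme I know of keeps the complexity fairly low: derive the subdirectories from a hash of the filename, so a file's location is computable from its name alone. A sketch, assuming two levels of 256 buckets (65,536 leaf directories); the layout is hypothetical, not any particular mail store's. Note that a hash only keeps buckets roughly even, so it bounds files per folder statistically, not strictly at X.

```python
import hashlib
from pathlib import Path

def shard_path(root: Path, filename: str, levels: int = 2) -> Path:
    """Place filename under `levels` nested two-hex-digit bucket directories."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    buckets = [digest[2 * i : 2 * i + 2] for i in range(levels)]
    return root.joinpath(*buckets, filename)
```

With 10 million messages over 65,536 leaf directories, that averages about 150 files per directory.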
At the end of the day, a database is needed somewhere down the line.
How much space is wasted due to partially filled filesystem blocks? This is less important with today's workstation drives than it was 30 years ago, but perhaps still relevant on a single-board computer with limited flash storage, for example.
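The waste is easy to estimate: every file occupies a whole number of blocks, so its slack is the rounded-up size minus the real size. A back-of-envelope sketch, assuming a 4 KiB block and ignoring inline data, tail packing and compression, which some filesystems use precisely to avoid this.

```python
def allocated(size: int, block: int = 4096) -> int:
    """Bytes reserved on disk: size rounded up to a whole number of blocks."""
    return -(-size // block) * block

def slack(sizes, block: int = 4096) -> int:
    """Total bytes wasted in the partially filled last block of each file."""
    return sum(allocated(s, block) - s for s in sizes)
```

For example, a million 6,000-byte messages each take two blocks (8,192 bytes), wasting 2,192 bytes apiece, about 2.2 GB in total; concatenated into a single file, the waste would be under one block.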
How does performance suffer from scanning a directory with millions of files, or if they're spread across multiple directories, from traversing the directories? Even if the delivery and user agents handle it well, what about the command-line tools that would make one-file-per-message appealing? What if it's a network filesystem?
Filesystems can be chosen and tuned for their expected contents, of course, as Usenet admins once did for news spools. But most users won't maintain a special filesystem just for email; they will expect it to work well on the same filesystem they use for everything else.
With those considerations in mind, I can understand the appeal of multiple messages per file, whether it's a database or just plain old mbox format with a nearby index.
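On the multiple-messages-per-file side, the "nearby index" can be as simple as a list of byte offsets where each message starts, so reading message N is a single seek. A sketch, assuming classic mbox where each message begins with a `From ` line and body lines starting with `From ` have been escaped (e.g. `>From `) by the writer; mbox variants differ on that escaping, so treat this as illustrative rather than robust. Persisting the offsets (and invalidating them when the mbox changes) is left out.

```python
from pathlib import Path

def index_mbox(path: Path) -> list[int]:
    """Scan once, recording the byte offset of every 'From ' separator line."""
    offsets, pos = [], 0
    with open(path, "rb") as f:
        for line in f:  # binary file iteration splits on b"\n" only
            if line.startswith(b"From "):
                offsets.append(pos)
            pos += len(line)
    return offsets

def read_message(path: Path, offsets: list[int], i: int) -> bytes:
    """Seek straight to message i using the saved offsets."""
    with open(path, "rb") as f:
        f.seek(offsets[i])
        if i + 1 < len(offsets):
            return f.read(offsets[i + 1] - offsets[i])
        return f.read()
```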
Neither approach seems strictly better than the other.
I also remember working with a Windows email server that saved all emails only as files, with no DB, although the directory structure was more complicated. But that was maybe 20 years ago…
(Which I discovered when I ran into this project yesterday.)
Via conversation on the mailing list, it seems there are currently two WIP Rust crates/libraries being developed to implement the spec: one by the primary spec writer and another by an "email-interested" :) third party, developed independently (AIUI), partly as an exploration of the library API design space.
How about something like this instead:
Fewer underscores also mean better readability to me.
We finally added a cache of some data in a dot-file (that'd just get blown away and recalculated if it failed a format check).
It made a very slightly enhanced POP3 server sufficient for a web frontend with good performance.
But all the changes to the Maildir were optional: any software that didn't support them could still operate on it, and the missing bits would just get recreated.
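The "blow it away and recalculate" cache pattern can be sketched like this, assuming a hypothetical `.cache` dot-file in the Maildir holding a version line followed by JSON (here just per-message sizes). Anything missing, truncated, or with the wrong version is rebuilt from the message files, so tools that ignore the cache lose nothing. Keeping the cache in sync with new deliveries is out of scope for this sketch.

```python
import json
from pathlib import Path

CACHE_VERSION = "mailcache-v1"  # hypothetical format tag

def rebuild(maildir: Path) -> dict[str, int]:
    """Recompute the cached data (here: message sizes) from the files themselves."""
    return {p.name: p.stat().st_size
            for sub in ("cur", "new") if (maildir / sub).is_dir()
            for p in (maildir / sub).iterdir() if p.is_file()}

def load_cache(maildir: Path) -> dict[str, int]:
    """Trust the cache if it passes the format check, otherwise regenerate it."""
    cache_file = maildir / ".cache"
    try:
        header, body = cache_file.read_text().split("\n", 1)
        if header == CACHE_VERSION:
            data = json.loads(body)
            if isinstance(data, dict):
                return data
    except (OSError, ValueError):
        pass  # missing, truncated or corrupt: fall through and rebuild
    data = rebuild(maildir)
    tmp = cache_file.with_name(".cache.tmp")
    tmp.write_text(CACHE_VERSION + "\n" + json.dumps(data))
    tmp.replace(cache_file)  # swap in whole so readers never see a partial file
    return data
```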
I find a files-centric (and more broadly, filesystem-centric) approach easier to grapple with than one that focuses on apps (and hides the data away). It makes it much easier to access my own data for purposes beyond what the app provides, especially when the files are plain text or otherwise human-editable. I can reuse all of the existing tools I'm familiar with to search, modify or repurpose the data.
INBOX/2023-09-04_13:[email protected],GTfrlwJfN5vyR28R
INBOX/.meta/GTfrlwJfN5vyR28R.flags
instead of INBOX/2023-09-04_13:[email protected],GTfrlwJfN5vyR28R/message
INBOX/2023-09-04_13:[email protected],GTfrlwJfN5vyR28R/.flags
The latter structure would allow creating/deleting the message and flags atomically.
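Roughly why the per-message directory makes that possible on a POSIX filesystem: build the directory (message plus flags) under a temporary name, then `rename()` it into place in one step; deletion is the same trick in reverse. A sketch, reusing the `message`/`.flags` names from the example above and assuming staging and final names are on the same filesystem (rename is not atomic across filesystems). The `.tmp-`/`.trash-` prefixes are hypothetical, and fsync calls are omitted for brevity.

```python
import os
import shutil
import uuid
from pathlib import Path

def deliver(folder: Path, name: str, message: bytes, flags: str = "") -> Path:
    """Create folder/name/{message,.flags} so readers see all of it or nothing."""
    staging = folder / f".tmp-{uuid.uuid4().hex}"
    staging.mkdir()
    (staging / "message").write_bytes(message)
    (staging / ".flags").write_text(flags)
    final = folder / name
    os.rename(staging, final)  # one atomic step on POSIX, same filesystem only
    return final

def delete(folder: Path, name: str) -> None:
    """Make message and flags vanish together, then clean up."""
    trash = folder / f".trash-{uuid.uuid4().hex}"
    os.rename(folder / name, trash)
    shutil.rmtree(trash)
```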
For example, finding all emails where I discussed the "DIV" tag with somebody:
grep --ignore-case --word-regexp DIV *.eml
Unfortunately, since most email is sent as HTML (usually full of <div> tags), this would match nearly every email.
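One workaround is to search the human-readable text instead of the raw source: prefer the text/plain part, and when only HTML exists, keep just the text nodes (skipping `<style>` and `<script>`) so tag names like `div` stop matching. A sketch using only the standard library; the function names are hypothetical, and `\b` only approximates grep's `--word-regexp`.

```python
import re
from email import policy
from email.parser import BytesParser
from html.parser import HTMLParser

class _TextOnly(HTMLParser):
    """Collect text content, discarding tags, attributes, CSS and scripts."""
    SKIP = {"style", "script"}

    def __init__(self):
        super().__init__()
        self.chunks, self._skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def visible_text(raw: bytes) -> str:
    """Return the plain-text body, or the text nodes of the HTML body."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is None:
        return ""
    content = body.get_content()
    if body.get_content_subtype() == "html":
        parser = _TextOnly()
        parser.feed(content)
        parser.close()
        content = " ".join(parser.chunks)
    return content

def mentions(raw: bytes, word: str) -> bool:
    """Case-insensitive whole-word match against the visible text only."""
    pattern = rf"\b{re.escape(word)}\b"
    return re.search(pattern, visible_text(raw), re.IGNORECASE) is not None
```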
Example: https://birdhouse.org/beos/refugee/bemail.jpg via https://birdhouse.org/beos/refugee/ (which has some other images of Tracker organization with extended attributes)