M2dir: Treating mails as files without going crazy

ksherlock

BeOS stored mail as individual files with extended attributes holding the subject, date, sender, etc. The email app (BeMail) was used to view/compose/send email but inbox management was handled by Tracker (BeOS version of Macintosh Finder or Windows Explorer). But the Tracker window was configured to display the extended attributes instead of the file names. The actual filename wasn't even displayed. Nobody went crazy.

Example: https://birdhouse.org/beos/refugee/bemail.jpg via https://birdhouse.org/beos/refugee/ (which has some other images of Tracker organization with extended attributes)

tracker1

It's a somewhat interesting idea... I've had similar ideas in the past regarding maildir replacement without resorting to a db file. I like the idea of having directories representing email dir/folders, though you generally will want some level of aggregation and/or search... I've thought about having a separate eml (header + body) along with a .meta.json file for additional tagging/details (deleted flag, tags, etc).
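A minimal Python sketch of that sidecar idea, with made-up conventions: the `1234.eml` -> `1234.meta.json` naming, the `deleted`/`tags` keys, and the helper names are all assumptions for illustration, not part of any spec.

```python
import json
from pathlib import Path


def meta_path(eml_path: Path) -> Path:
    # Hypothetical naming: "1234.eml" -> "1234.meta.json" in the same folder.
    return eml_path.with_suffix(".meta.json")


def load_meta(eml_path: Path) -> dict:
    # A missing sidecar just means "no flags, no tags yet".
    path = meta_path(eml_path)
    if not path.exists():
        return {"deleted": False, "tags": []}
    return json.loads(path.read_text(encoding="utf-8"))


def save_meta(eml_path: Path, meta: dict) -> None:
    # Write to a temp file first, then rename over the old sidecar,
    # so a reader never sees a half-written JSON file.
    path = meta_path(eml_path)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    tmp.replace(path)


def add_tag(eml_path: Path, tag: str) -> None:
    meta = load_meta(eml_path)
    if tag not in meta["tags"]:
        meta["tags"].append(tag)
    save_meta(eml_path, meta)
```

The message file itself is never rewritten; only the small sidecar changes when a tag or flag does.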

Search is a very different story, you wouldn't want to have to do a full directory scan for text based search. So some level of indexing would be useful for a client mail service.
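As a rough illustration of what "some level of indexing" could mean, here is a toy inverted index in Python. It maps each word to the set of files containing it, so a query never has to reopen every message. The function names and the in-memory dict are assumptions; a real client would persist the index and handle MIME decoding first.

```python
import re
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # docs maps a file name to its already-decoded text.
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # Return the files that contain every word of the query.
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```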

Similarly, I've thought it would be really cool if Cloudflare offered a TCP worker option, so you could do a simple mail service backed by R2. The web ui/ux could be pretty awesome and geo distributed.

crtified

Vaguely related anecdote with no punchline.

Over a decade ago I developed, for our small (small enough to not have any IT dept or IT management) office a bespoke extension for Outlook (yes, bad idea, I know) which translated all incoming emails and attachments into the standard file system, decanted into project folders.

It was triggered upon any opening of an Unread email, and required the user to pick a project from a list, and hit OK. Cancelling was an option (for personal emails).

There was a config tab for the admin to define the filename string, by arranging elements like date/time/to/from/subject/.., and any attachments were also placed as files.

A very imperfect approach, but under the circumstances it was a vast improvement over the prior mess of individual mailboxes bestrewn with all manner of project correspondence and files, which made intricate queries about past doings into frustrating spaghettified detanglements.

And ultimately - perhaps like a good deal of IT - at heart was uninformed management, and the reality of ordinary users with little notion of information management.

inopinatus

The problem with any mailbox storage standard that is explicitly labelled “do not use this for delivery”, in this case because the author explicitly rejects (and in 2023 openly mocked) the concurrency and crash-resilience demands that Maildir seeks to offer, is that someone will inevitably use it for delivery.

Aloisius

For MacOS, extracting attachments into files is useful so that Spotlight can index them for search. I believe the same is true for Windows.

Mail.app uses a directory structure that looks similar* to this for, say, gmail:

    {account-uuid}/[Gmail].mbox/All Mail.mbox/{mailbox-guid}/Data/Messages/{msguid}.partial.emlx

    {account-uuid}/[Gmail].mbox/All Mail.mbox/{mailbox-guid}/Data/Attachments/{msguid}/{mime part #}/{mime subpart #}/filename.ext

The emlx format is a bit different from eml. It contains the number of bytes for the message at the top and an xml plist at the end that has message flags, last viewed time, gmail labels, etc. For partial.emlx files, the base64 content is removed from the email itself and a content length is added.
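A rough Python sketch of reading that layout, assuming the first line holds the byte count as ASCII decimal and everything after the message bytes is the plist. The function name `parse_emlx` is mine, and this has not been checked against partial.emlx files.

```python
import plistlib
from email import message_from_bytes, policy


def parse_emlx(data: bytes):
    # Line 1: length of the message in bytes.
    first_line, _, rest = data.partition(b"\n")
    length = int(first_line.strip())
    # Next `length` bytes: the message itself.
    message = message_from_bytes(rest[:length], policy=policy.default)
    # Remainder: the XML plist holding flags and other metadata.
    meta = plistlib.loads(rest[length:].strip())
    return message, meta
```

The returned `meta` is a plain dict, so whatever keys the plist carries can be inspected directly.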

This format has its drawbacks, of course.

* Not shown is the hierarchy based on message uid used to keep the number of files in the Messages directory down.

graycat

Been thinking about this subject:

Of course, standard (usual, common) email is just text. Right, for the pictures, to have them just as text, they are encoded as base64. Right, it's MIME (Multipurpose Internet Mail Extensions).

Soooo, okay, my ISP (Internet Service Provider) has an email service. The service is a Web site, and it does offer getting the "Source", that is, the text, all as just one file.

Now, suppose for each email message I send/receive, I keep the text in its own file, with just the text, just as I got it from, say, my ISP. I will handle the file naming, indexing, summarization, etc.

Help!!!! Is there an email program that I can run that, for each of those files, can read it and display it? Sure, it should be able to display the text, as text, that is not one of the MIME extensions but also be able to do the right thing for each of the rest, still images, video clips, audio, whatever. Know of such a program???? Thanks!
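Not a full mail reader, but as a sketch of how little it takes to read one such file: Python's standard `email` package can parse the raw source, pick out the readable body, and list the MIME attachments. The `summarize` function and its output format are made up for illustration; actually rendering images or video would still need a viewer.

```python
from email import policy
from email.parser import BytesParser


def summarize(raw: bytes) -> list[str]:
    # Parse the raw message source exactly as saved from the provider.
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    lines = [f"From: {msg['From']}", f"Subject: {msg['Subject']}"]
    # Prefer the plain-text body, fall back to HTML if that is all there is.
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is not None:
        lines.append(body.get_content().strip())
    # Everything else (images, audio, ...) is reported by name and type.
    for part in msg.iter_attachments():
        lines.append(f"[attachment] {part.get_filename()} ({part.get_content_type()})")
    return lines
```

Each attachment part also offers `get_content()`, which returns the decoded bytes, so saving a picture to disk is one more line.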

makeitdouble

There seems to be nothing about performance and how to deal with file count within a directory.

Anyone who tried to naively store millions of files in a file system folder realizes at some point that listing files becomes horrible, there's no GUI tool that will handle that gracefully, and even on the CLI this is a very serious roadblock.

It's still fine for accessing files straight by name, and there must be ways to read each file sequentially, but the concept of folder merely becomes an arbitrary namespace and not something to handle a whole group.

The other obvious option is to shard the mail folders to ensure there are no more than X files in each folder, but that becomes pretty complex IMHO.
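For reference, the simplest form of sharding I can think of looks like this in Python: hash some stable identifier and use the first hex digits as nested directory names. The `shard_path` name, the choice of SHA-256, and the `.eml` suffix are assumptions for the sketch. It bounds the average directory size, not the maximum, and it gives up human-meaningful folder names.

```python
import hashlib
from pathlib import Path


def shard_path(root: Path, message_id: str, levels: int = 2) -> Path:
    # Hash the identifier so messages spread evenly over the shards.
    digest = hashlib.sha256(message_id.encode("utf-8")).hexdigest()
    # Two hex digits per level gives 256 subdirectories at each level.
    parts = [digest[i * 2 : i * 2 + 2] for i in range(levels)]
    return root.joinpath(*parts, digest + ".eml")
```

With the default two levels (256 x 256 = 65,536 leaf directories), a million messages average out to roughly 15 files per directory.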

At the end of the day, a database is needed somewhere down the line.

foresto

What always nagged at me about the one-file-per-message approach is what happens when you accumulate many messages, perhaps by being on high-volume mailing lists, or never throwing anything away, or both. In particular:

How much space is wasted due to partially filled filesystem blocks? This is less important with today's workstation drives than it was 30 years ago, but perhaps still relevant on a single-board computer with limited flash storage, for example.

How does performance suffer from scanning a directory with millions of files, or if they're spread across multiple directories, from traversing the directories? Even if the delivery and user agents handle it well, what about the command line tools that would make one-file-per-message appealing? What if it's a network filesystem?
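The first question can at least be estimated with simple arithmetic. A sketch in Python, assuming a fixed block size of 4096 bytes and ignoring inode overhead and any filesystem that packs small files or tails (both assumptions, not measurements):

```python
def wasted_bytes(file_sizes, block_size=4096):
    # Each file occupies whole blocks; the unused tail of its last block
    # is the slack. (-size) % block_size is that tail, and it is 0 when
    # the size is an exact multiple of the block size.
    return sum((-size) % block_size for size in file_sizes)
```

If message sizes fall roughly evenly across block boundaries, the slack averages about half a block per file, so a million messages at 4096-byte blocks would waste on the order of 2 GB.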

Filesystems can be chosen and tuned for their expected contents, of course, as usenet admins once did for news spools. But most users won't maintain a special filesystem just for email; they will expect it to work well on the same fs that they use for everything else.

With those considerations in mind, I can understand the appeal of multiple messages per file, whether it's a database or just plain old mbox format with a nearby index.

Neither approach seems strictly better than the other.

Gys

The mail protocol is plain text so it’s not difficult to save emails as individual files. I had such a setup some years ago for a company. Emails were stored in one folder per week, each email in its own subfolder with attachments extracted and a meta text file. References were in a database.

I also remember working with a Windows email server that saved all emails only as files, no db, although the directory structure was more complicated. But that was maybe 20 years ago…

AdieuToLogic

Whenever I see efforts to treat email as files, I fondly think of my time using nmh[0]. Until the pervasive use of multimedia email, nmh was a really nice way to communicate with email IMHO.

0 - https://www.nongnu.org/nmh/

follower

Note that it seems some details of the spec have changed since the blog post was written, so the linked blog post is slightly outdated/lagging behind the actual spec document here: https://man.sr.ht/~bitfehler/m2dir/

(Which I discovered when I ran into this project yesterday.)

Via conversation on the mailing list it seems there are currently two WIP Rust crates/libraries being developed to implement the spec--one by the primary spec writer & another by an "email-interested" :) 3rd party, developed independently (AIUI) in part as an exploration of library API design space.

jll29

One wonders why email isn't kept in a well-thought out directory structure since the beginnings of UNIX, given that almost anything is a file in UNIX, and especially given the power of UNIX text processing tools.

childintime

Wow, a spec with a filename with colons in it. Going crazy.

How about something like this instead:

[email protected]

Fewer underscores also mean better readability to me.

stephen_cagle

Are there other good alternatives for treating email as simple files that I can contrast this with? I'm quite surprised that this does not already exist? Are there hybrid approaches like FUSE filesystem to your email?

WillAdams

I've been somewhat surprised that there hasn't been an effort to re-work e-mail as a content management system --- incoming e-mails have all attachments stripped off and stored in a hierarchy based on sender/subject/recipient/date (and dupes discarded and replaced w/ a pointer) and replaced w/ links w/ the matching e-mail text stored as an editable wiki or similar marked up text, outgoing e-mails are synched up w/ the appropriate attachment and the wiki/marked up text updated based on the content.

vidarh

We did exactly this for Nameplanet in '99. Started with plain Maildir, then added more info in the filename as new mails were found, or status changed.

We finally added a cache of some data in a dot-file (that'd just get blown away and recalculated if it failed a format check).

It made a very slightly enhanced POP3 server sufficient for a web frontend with good performance.

But all the changes to the Maildir were optional - any software that didn't support them could still operate on the mailbox and the missing bits would just get recreated.

QasimK

I've been thinking about doing this myself, so it's fantastic to see a project.

I find a files-centric (and more broadly filesystem-centric) approach easier to grapple with than one that focuses on apps (and hiding away the data). It makes it much easier to access my own data for other purposes outside of what the app provides, in particular when the files are in plain-text or otherwise human-editable. I can reuse all of the existing tools that I'm familiar with to search, modify or re-purpose the data.

zokier

Is there a reason why metadata and the message are stored so separately? I.e. why

   INBOX/2023-09-04_13:[email protected],GTfrlwJfN5vyR28R
   INBOX/.meta/GTfrlwJfN5vyR28R.flags

instead of

   INBOX/2023-09-04_13:[email protected],GTfrlwJfN5vyR28R/message
   INBOX/2023-09-04_13:[email protected],GTfrlwJfN5vyR28R/.flags

The latter structure would allow creating/deleting the message and flags atomically.
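To make the atomicity point concrete, a Python sketch of the per-message-directory variant: build the directory under a temporary name, then rename it into place, so other readers see either nothing or the complete message plus flags. The `deliver` and `remove` helpers and the `.tmp-`/`.trash-` prefixes are my own assumptions, not from the spec, and there is no fsync here, so this is not crash-durable.

```python
import os
import shutil
import tempfile
from pathlib import Path


def deliver(folder: Path, name: str, message: bytes, flags: str = "") -> Path:
    folder.mkdir(parents=True, exist_ok=True)
    # Stage everything in a hidden temp directory inside the same folder,
    # so the final rename stays on one filesystem.
    staging = Path(tempfile.mkdtemp(prefix=".tmp-", dir=folder))
    (staging / "message").write_bytes(message)
    (staging / ".flags").write_text(flags, encoding="utf-8")
    final = folder / name
    # One rename makes message and flags appear together.
    os.rename(staging, final)
    return final


def remove(folder: Path, name: str) -> None:
    # One rename makes message and flags vanish together;
    # the actual cleanup can then take as long as it likes.
    doomed = folder / (".trash-" + name)
    os.rename(folder / name, doomed)
    shutil.rmtree(doomed)
```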

adius

Can somebody please define a specification on how to store emails in SQLite? Seems to be the only sensible approach if you ask me.
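Not a specification, just a sketch of the smallest thing that might qualify, using Python's built-in `sqlite3`. The table and column names are invented for illustration: one row per raw message, plus a flags table tied to it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY,
    folder     TEXT NOT NULL,
    message_id TEXT,
    raw        BLOB NOT NULL          -- the unmodified message source
);
CREATE TABLE IF NOT EXISTS flags (
    message INTEGER NOT NULL REFERENCES messages(id) ON DELETE CASCADE,
    flag    TEXT NOT NULL,
    PRIMARY KEY (message, flag)
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE
    db.executescript(SCHEMA)
    return db


def add_message(db, folder, message_id, raw, flags=()):
    # Message and flags are written in a single transaction.
    with db:
        cur = db.execute(
            "INSERT INTO messages (folder, message_id, raw) VALUES (?, ?, ?)",
            (folder, message_id, raw),
        )
        db.executemany(
            "INSERT INTO flags (message, flag) VALUES (?, ?)",
            [(cur.lastrowid, flag) for flag in flags],
        )
    return cur.lastrowid
```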

igammarays

For searching I just use DevonThink. Works with either mbox or a directory/file structure. Instant full-text search, date-based filtering, and continuous re-indexing as I archive my email there monthly with command line tools for GMail and ProtonMail's import/export tool.

robertlagrant

I was hoping the mailing list link would be to an FTP site I'd upload my email to.

amelius

The HTML in email is already incompatible with standard tools.

For example, finding all emails where I discussed the "DIV" tag with somebody:

grep --ignore-case --word-regexp DIV *.eml

Unfortunately, since most email is in HTML, this would match every email.
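What you would need instead is something that decodes the message and searches only the visible text. A sketch in Python using only the standard library; the `visible_text` and `mentions` names are made up, and the tag stripping is deliberately naive (it would also keep the contents of script and style elements).

```python
import re
from email import policy
from email.parser import BytesParser
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    # Collects text nodes and ignores the tags themselves.
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(raw: bytes) -> str:
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is None:
        return ""
    content = body.get_content()
    if body.get_content_type() == "text/html":
        parser = _TextOnly()
        parser.feed(content)
        parser.close()
        return " ".join(parser.chunks)
    return content


def mentions(raw: bytes, word: str) -> bool:
    # Whole-word, case-insensitive match, like grep -i -w.
    pattern = rf"\b{re.escape(word)}\b"
    return re.search(pattern, visible_text(raw), re.IGNORECASE) is not None
```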

MisterTea

Plan 9 has a similar concept called upasfs http://man.postnix.pw/9front/4/upasfs

clircle

What's the problem they solve? I have never put a modicum of thought into how my messages are organized on my hard disk.

chriscappuccio

As a CLI fan, I'm interested in where this could go