I am still looking for a solution that allows me to tag my mails much like Gmail does.
Having thought about this for quite a while now, I can see three solutions, implemented at different levels in the mail flow: as a filesystem, as a patch to the IMAP server (Dovecot), or as a patch to mutt. Since this post is about the filesystem, I’ll concentrate on that, but I’ll also briefly touch upon the other two solutions.
A bit of theory
First, let’s dig into the theory and requirements for a bit. Instead of sorting mails into folders (one mail can only ever live in one folder), I want to tag my mails, such that a single message can have multiple tags and hence be relevant to multiple categories. If you want to think in terms of folders, then a single message could be attached to multiple folders, e.g. via hard links. This is what Mairix approximates. The problem with this approach is that simple operations, such as deleting a message (removing it from the enclosing folder) or marking it as read (renaming the file), only affect the message in the current folder, not its duplicates/clones in the other folders.
So it’s best to stay with a single file per message and to associate tags with it. Here, two approaches seem plausible: (a) extracting tags from the individual message files, and (b) storing tags/message pairs in a separate database. I have a gut feeling that (a) is the better approach, especially since the files we’re dealing with all have the same format. And in fact, there is a semi-standard header for email messages to store such tags: X-Label.
One more paragraph of theory: instead of simple tags, like Gmail provides, I want hierarchical tags, for instance:
X-Label: Debian::pkg::mdadm
X-Label: Debian::popcon, PhD::Debian::stats
Hierarchical tags work similarly to folders in that any message tagged as Debian::pkg::mdadm also belongs to the Debian::pkg and Debian categories. No rocket science.
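The hierarchy rule can be sketched with Python’s standard email package. This assumes, as the examples above suggest, that a message may carry several X-Label headers, each holding comma-separated tags; the helper names are hypothetical.

```python
from email import message_from_string

def labels(msg):
    """Collect the tags from every X-Label header of a message."""
    tags = set()
    for header in msg.get_all("X-Label", []):
        for tag in header.split(","):
            tag = tag.strip()
            if tag:
                tags.add(tag)
    return tags

def expand(tag):
    """A tag implies all its ancestors: 'A::B::C' -> {'A', 'A::B', 'A::B::C'}."""
    parts = tag.split("::")
    return {"::".join(parts[:i]) for i in range(1, len(parts) + 1)}

def categories(msg):
    """Every category a message belongs to, ancestors included."""
    cats = set()
    for tag in labels(msg):
        cats |= expand(tag)
    return cats

raw = (
    "Subject: test\n"
    "X-Label: Debian::pkg::mdadm\n"
    "X-Label: Debian::popcon, PhD::Debian::stats\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)
print(sorted(categories(msg)))
```

So a message tagged Debian::pkg::mdadm turns up under Debian::pkg and Debian without those tags being written into the header.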
mutt
I mentioned earlier that this could be implemented in various places, and I’ll start at the highest level, with mutt, the mail reader. For quite a while now, mutt has provided search operators for the X-Label header, such that you can search or limit your mailbox e.g. like so: ~y \\<PhD::Debian\\> | ~y \\<popcon\\>. The problem with this approach is that it reuses existing functionality that I need on a regular basis: searching and limiting. Furthermore, I think this approach is hackish and brittle, and it exposes too much of the underlying workings.
Now it would probably be trivial to hack mutt and add a new layer of filters, which would be used in addition to the user-specified limit. On a high level, the user would set a tag expression, like PhD::Debian | ::popcon, and may additionally choose to limit the display of messages to those sent from the debian.org domain: ~f debian.org$. mutt would then assemble the pieces and apply the filter (~y \\<PhD::Debian\\> | ~y \\<popcon\\>) ~f debian.org$ to the current view.
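Assembling the pieces is plain string work. In this sketch, the tag-expression syntax (terms joined by |, a leading :: meaning “this name at any level”) is inferred from the example above and is not an existing mutt feature; mutt_filter and tag_term are hypothetical helpers that only build the pattern string.

```python
def tag_term(term):
    """Translate one tag term into a mutt ~y pattern."""
    name = term.strip()
    # Assumption: a leading "::" means "match this name at any level",
    # so the prefix is dropped and only the bare name is matched.
    if name.startswith("::"):
        name = name[2:]
    # \< and \> are regex word anchors. They are doubled, as in the examples
    # above, on the assumption that mutt's pattern parser strips one level
    # of backslashes before handing the regex on.
    return r"~y \\<" + name + r"\\>"

def mutt_filter(tag_expr, limit=""):
    """OR the tag terms together, then AND the result with the user's limit."""
    terms = [tag_term(t) for t in tag_expr.split("|") if t.strip()]
    pattern = "(" + " | ".join(terms) + ")"
    # In mutt patterns, juxtaposition (a space) means logical AND.
    return f"{pattern} {limit}" if limit else pattern

print(mutt_filter("PhD::Debian | ::popcon", "~f debian.org$"))
```

Keeping the tag filter separate from the limit is what lets the user keep using ordinary limits on top of it.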
With a reasonable user interface to specify or pick labels, as
well as a simple label editor, this would almost cut the mustard.
Thanks to the header_cache patch (which also indexes the X-Label header), mutt fares reasonably well on Maildirs with thousands of messages (well, not always), and binding keys like s (save) and d (delete) to actions that add and remove labels, determined from the destination or from the current tag filter, would make mutt a force to be reckoned with.
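At its core, such a label editor only rewrites the X-Label header. A minimal sketch with Python’s email package, assuming the tags are collapsed into a single comma-separated X-Label header after editing; add_tag and remove_tag are hypothetical names for what the s and d bindings might call.

```python
from email import message_from_string

def get_tags(msg):
    """All tags from every X-Label header, in order of appearance."""
    tags = []
    for value in msg.get_all("X-Label", []):
        tags.extend(t.strip() for t in value.split(",") if t.strip())
    return tags

def set_tags(msg, tags):
    """Collapse the tags into one X-Label header, or drop it if none remain."""
    del msg["X-Label"]                    # removes every occurrence
    if tags:
        msg["X-Label"] = ", ".join(tags)  # appended as a new header

def add_tag(msg, tag):
    tags = get_tags(msg)
    if tag not in tags:
        set_tags(msg, tags + [tag])

def remove_tag(msg, tag):
    set_tags(msg, [t for t in get_tags(msg) if t != tag])

msg = message_from_string("Subject: hi\nX-Label: PhD::Debian\n\nbody\n")
add_tag(msg, "Debian::popcon")   # e.g. "s" to a Debian::popcon destination
remove_tag(msg, "PhD::Debian")   # e.g. "d" while filtering on PhD::Debian
print(msg["X-Label"])            # prints Debian::popcon
```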
Dovecot
Unfortunately, implementing this in mutt would mean
that I’d be bound to the mail reader forever (and I like to pretend
that’s not the case, haha!).
An alternative would be to implement this on the IMAP server, which already provides IMAP folders that are not necessarily bound to physical filesystem folders, and which can intercept all commands that manipulate messages or their status.
But while this works fine in theory, it’ll break as soon as offline tools such as offlineimap come into play, as those (need to) instantiate every message they download, which brings us right back to the problem with duplicate/cloned files discussed before.
A filesystem idea
Trying to stay independent of the tools I use, I started playing with the idea of implementing mail message tagging at the filesystem level, using Fuse. The user would run a daemon process against a given Maildir holding all mail (the base store) and export a virtual directory hierarchy in which tags become Maildirs, backed by a cache of X-Label:filename pairs.
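Building the run-time cache of X-Label:filename pairs could look like this sketch, which parses only the headers of each file in cur/ and new/. The ::UNTAGGED key and the choice to index every ancestor of a hierarchical tag are assumptions made to match the directory layout described below; build_cache is a hypothetical name.

```python
import os
from collections import defaultdict
from email.parser import BytesHeaderParser

def ancestors(tag):
    """'A::B::C' -> ['A', 'A::B', 'A::B::C']"""
    parts = tag.split("::")
    return ["::".join(parts[:i]) for i in range(1, len(parts) + 1)]

def build_cache(base):
    """Map each tag (and each ancestor of it) to the message paths carrying it."""
    cache = defaultdict(set)
    parser = BytesHeaderParser()          # reads headers only, not bodies
    for sub in ("cur", "new"):            # tmp/ holds deliveries in progress
        with os.scandir(os.path.join(base, sub)) as entries:
            for entry in entries:
                if not entry.is_file():
                    continue
                with open(entry.path, "rb") as fp:
                    headers = parser.parse(fp)
                relpath = os.path.join(sub, entry.name)
                tags = [t.strip()
                        for value in headers.get_all("X-Label", [])
                        for t in str(value).split(",") if t.strip()]
                if not tags:
                    cache["::UNTAGGED"].add(relpath)
                for tag in tags:
                    for name in ancestors(tag):
                        cache[name].add(relpath)
    return cache
```

The cache can always be rebuilt from the base store, since the tags themselves live in the messages.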
- the root folder contains one Maildir for every tag present in any message in the base store.
- each of those Maildirs, in addition to the new/cur/tmp directories, provides a subdirectory for every tag minus the ones already in the path, recursively. Navigating to /PhD/Debian::pkg would result in a logical-AND query for all messages tagged PhD (and thus PhD::*) and Debian::pkg. I am not sure how to implement OR and NOT queries yet.
- every Maildir in the entire hierarchy contains the messages satisfying the query inherent in its path.
- the special directory /INCOMING can be used to write messages to the hierarchy in such a way that the cache gets updated. Alternatively, one could register inotify hooks with the kernel on the base store and let others write to it as usual.
- the special directory /::UNTAGGED might enumerate all messages without any tags.
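Resolving a path to a set of messages could look like the following sketch. It assumes a cache that maps every tag, including each ancestor of a hierarchical tag, to the set of message filenames carrying it, plus a special ::UNTAGGED entry; only the AND queries described above are covered, and all names are hypothetical.

```python
SPECIAL = ("new", "cur", "tmp")

def path_tags(path):
    """'/PhD/Debian::pkg/cur' -> ['PhD', 'Debian::pkg']"""
    return [p for p in path.strip("/").split("/") if p and p not in SPECIAL]

def query(cache, path):
    """Logical AND over all tags in the path; the root itself holds no messages."""
    tags = path_tags(path)
    if not tags:
        return set()
    result = set(cache.get(tags[0], set()))
    for tag in tags[1:]:
        result &= cache.get(tag, set())
    return result

def subdirs(cache, path):
    """Every tag not yet in the path; '::' specials appear only at the root."""
    used = path_tags(path)
    return sorted(t for t in cache
                  if t not in used and (not used or not t.startswith("::")))
```

Because the cache already folds PhD::* into PhD, the AND over path components gives the hierarchical behaviour for free.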
The virtual directory hierarchy should also export the familiar filesystem semantics:
- creating a file automatically adds the tags encoded in the file’s path.
- copying, linking, and symlinking a file within the hierarchy are all equivalent and cause the tag associated with the destination to be added to the file.
- copying, linking, and symlinking into the hierarchy should probably result in an error, unless the source is clearly an RFC822 message.
- mkdir creates a new tag in the cache, but without any files associated with it.
- rmdir removes tags without any associated files.
- rm on a file removes a tag from a file. rm -r on a directory removes the associated tag from all files (and the cache).
- moving files is equivalent to copy-remove, but needs to be atomic.
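The semantics above can be modelled as operations on an in-memory tag table. This is a hypothetical sketch: a real filesystem would also rewrite the X-Label headers on disk, hierarchical ancestors are ignored here, and “rm removes the innermost tag of the directory it is run in” is an assumption, since the rule above does not say which tag goes.

```python
import errno

SPECIAL = ("new", "cur", "tmp")

class TagIndex:
    """Hypothetical in-memory model of the tag filesystem's bookkeeping."""

    def __init__(self):
        self.files = {}     # message name -> set of tags
        self.tags = set()   # every known tag, including ones with no messages

    @staticmethod
    def _path_tags(dirpath):
        return [p for p in dirpath.strip("/").split("/") if p and p not in SPECIAL]

    def create(self, dirpath, name):
        # creating a file adds all tags encoded in its path
        tags = set(self._path_tags(dirpath))
        self.files[name] = tags
        self.tags |= tags

    def link(self, name, dest_dir):
        # copy, link and symlink within the hierarchy are treated the same
        tags = set(self._path_tags(dest_dir))
        self.files[name] |= tags
        self.tags |= tags

    def unlink(self, dirpath, name):
        # assumption: rm drops the innermost tag of the directory it runs in
        self.files[name].discard(self._path_tags(dirpath)[-1])

    def rename(self, name, src_dir, dest_dir):
        # move = copy + remove; a real filesystem must make this atomic
        src_tag = self._path_tags(src_dir)[-1]
        self.link(name, dest_dir)
        if src_tag not in self._path_tags(dest_dir):
            self.files[name].discard(src_tag)

    def mkdir(self, tag):
        self.tags.add(tag)

    def rmdir(self, tag):
        # only tags without any associated messages can be removed
        if any(tag in tags for tags in self.files.values()):
            raise OSError(errno.ENOTEMPTY, "tag still in use", tag)
        self.tags.discard(tag)

    def rmtree(self, tag):
        # rm -r: strip the tag from every message and forget it
        for tags in self.files.values():
            tags.discard(tag)
        self.tags.discard(tag)
```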
Unlike tagfs, which stores tags in an external SQLite database, this filesystem should keep the tags in the messages’ X-Label headers and merely maintain a run-time cache of them. That way, tags travel with the messages and can be propagated to other machines with IMAP synchronisation tools.
Initially, I thought such a filesystem could even be deployed “underneath” the IMAP server, but then we obviously run into the same problems with offline tools as before.
Potentially, such a filesystem could be incredibly powerful as
it could do its work for any user agent directly dealing with
Maildirs (such as mutt), and it could also be used as
basis for an IMAP server which is only ever accessed synchronously
(in online vs. offline mode). In the latter case, one would have to
ensure that filenames don’t change across IMAP operations, or use
the message IDs instead from the start.
But as good as it may sound, getting it to work right will require quite some attention, I think. Guaranteeing filesystem atomicity in particular, which Maildirs require, might be impossible through the Fuse layer.
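On a local disk, the usual Maildir way to update a file atomically is to write the new version into tmp/ and rename it over the old file, since rename within one filesystem is atomic on POSIX systems. The sketch below shows that pattern for rewriting a retagged message; whether a Fuse layer can pass the same guarantee through is exactly the open question. atomic_rewrite is a hypothetical helper, and its unique-name format only loosely follows the Maildir convention.

```python
import os
import socket
import time

def atomic_rewrite(maildir, relpath, data):
    """Replace maildir/relpath (e.g. 'cur/<name>') with `data` atomically.

    The new content goes to tmp/ first and is then renamed over the old
    file, so readers see either the old or the new message, never a
    half-written one. The filename (and thus its flags) stays the same.
    """
    host = socket.gethostname().replace("/", r"\057").replace(":", r"\072")
    unique = f"{time.time_ns()}.{os.getpid()}.{host}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    dest = os.path.join(maildir, relpath)
    with open(tmp_path, "wb") as fp:
        fp.write(data)
        fp.flush()
        os.fsync(fp.fileno())   # make sure the data is on disk before the rename
    os.replace(tmp_path, dest)  # atomic within a single filesystem
```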
The sad end
Unfortunately, I don’t have the time to implement any of the
above, but if I did, I would probably try to patch
mutt first, even though such a patch may not make it
into the upstream source any time soon. The filesystem idea sounds
cooler and cleaner, but it does not yet support OR/NOT queries, and
it will be quite a task to implement and debug. On the other hand,
dealing with Maildirs of several thousand messages in mutt is
a bit of a pain if
those Maildirs are being updated externally (e.g. mail is delivered
to them).
At least the idea is out now. If you are interested and want to work on this, please let me know so that I can link you up with other interested people.
And if you have any other input, please let me know. And no, I
don’t really want to install emacs to read my
mail.
NP: Dream Theater: Train of Thought
Update: I’ve created a mailing list for public discussion of this topic, so please subscribe if you’re interested.
Thomas Viemann suggested using IMAP flags for the task, as
these are standardised. However, I see two show-stoppers here:
first, I am not sure whether it would be possible to set those
flags from procmail on delivery (sieve
can supposedly do it, but I consider sieve
inadequate for my needs). And second, offline tools, such as
offlineimap, which map IMAP folders to filesystem
Maildirs, would have exactly the same problem with representing
affiliation of a single message with multiple tags. You can read up
on the discussion in the
archives of the offlineimap mailing list, and soon the new
mailtags
list.
Martin Scholl had the idea to extend dbmail with the concepts of virtual folders (or stored searches). Naturally, as a database-backed mail system, it would be trivial and fast to implement, but this approach bears two problems for me: first, it also falls short with respect to offline tools (see above), and second, it would require me to replace my entire mail infrastructure. I am not even considering the implications of using a database for mail storage.
Update: I just found pytagsfs and have suggested to the author that the tags store be abstracted, so that e.g. an sqlite database can be used instead of tags stored in files (which means a lot of redundancy when you have multiple files belonging to a collection, such as a music album).
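Abstracting the tags store could mean programming against a small interface and swapping the backend behind it. The sketch below is hypothetical and is not pytagsfs’s actual API: TagStore is an abstract base class, and SqliteTagStore keeps (item, tag) pairs in an sqlite table, so a tag could be attached once to, say, an album directory instead of being repeated in every file.

```python
import sqlite3
from abc import ABC, abstractmethod

class TagStore(ABC):
    """Hypothetical backend-independent tag store interface."""

    @abstractmethod
    def tags(self, item): ...

    @abstractmethod
    def add(self, item, tag): ...

    @abstractmethod
    def remove(self, item, tag): ...

class SqliteTagStore(TagStore):
    """Keeps (item, tag) pairs in an sqlite table instead of in the files."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS tags "
            "(item TEXT, tag TEXT, PRIMARY KEY (item, tag))"
        )

    def tags(self, item):
        rows = self.db.execute(
            "SELECT tag FROM tags WHERE item = ? ORDER BY tag", (item,)
        )
        return [row[0] for row in rows]

    def add(self, item, tag):
        with self.db:  # commits on success
            self.db.execute("INSERT OR IGNORE INTO tags VALUES (?, ?)", (item, tag))

    def remove(self, item, tag):
        with self.db:
            self.db.execute("DELETE FROM tags WHERE item = ? AND tag = ?", (item, tag))
```

A header-backed store implementing the same three methods could sit next to it, which is the point of the abstraction.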

