Cleaning HTML attachments from emails

Dear Lazyweb: do you know of a tool that can scrub text/html attachments from emails that also carry the message in a text/plain attachment? I know storage is cheap these days, but with large mail volumes, there is a noticeable difference between storing a 2k email and storing it with 8k of redundancy.

Previously, I would send auto-replies to get people to reconfigure their mailers, but that's not a battle I'll ever win, and there is no safe way to send auto-replies without annoying the living hell out of some people at some point. So I stopped and now want to switch to a local solution.

I am sure the Anomy Sanitizer or tools like MIMEDefang could be coerced into doing this, but they come with a lot of cruft that I neither need nor want.

Rather, I am looking for a lean and mean Unix-style pipe filter that will strip text/html parts if it has enough reason to believe that sibling text/plain parts carry the same information. It has to be capable of dealing with Unicode, or at least be 8bit-clean. And it has to work.
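
To make the idea concrete, here is a minimal sketch of such a filter in Python, using only the standard library's email package. It is not an existing tool, and the function names are made up for illustration. It assumes a deliberately conservative heuristic for "enough reason to believe": a text/html part is dropped only when it sits inside a multipart/alternative container next to a non-empty text/plain sibling. It does not compare the actual content of the two parts, and messages it does not change are passed through byte-for-byte.

```python
import sys
from email import message_from_bytes
from email.message import Message


def strip_html_alternatives(msg: Message) -> int:
    """Drop text/html parts from multipart/alternative containers that also
    hold a non-empty text/plain sibling. Returns the number of parts removed."""
    if not msg.is_multipart():
        return 0
    removed = 0
    children = msg.get_payload()
    if msg.get_content_type() == "multipart/alternative":
        # Heuristic (an assumption of this sketch): a non-empty text/plain
        # alternative is taken to carry the same information as the HTML.
        has_plain = any(
            c.get_content_type() == "text/plain"
            and (c.get_payload(decode=True) or b"").strip()
            for c in children
        )
        if has_plain:
            kept = [c for c in children if c.get_content_type() != "text/html"]
            removed += len(children) - len(kept)
            msg.set_payload(kept)
            children = kept
    # Recurse so that nested containers and forwarded messages are covered.
    for child in children:
        removed += strip_html_alternatives(child)
    return removed


def filter_message(raw: bytes) -> bytes:
    """Return the message with redundant HTML alternatives removed."""
    msg = message_from_bytes(raw)
    if strip_html_alternatives(msg) == 0:
        return raw  # nothing to strip: hand back the original bytes
    return msg.as_bytes()


# Act as a pipe filter; skip reading when stdin is an interactive terminal.
if __name__ == "__main__" and not sys.stdin.isatty():
    sys.stdout.buffer.write(filter_message(sys.stdin.buffer.read()))
```

Saved under a hypothetical name such as strip-html.py, it would be used as "python3 strip-html.py < message.eml > cleaned.eml". Working on bytes throughout is what keeps 8bit bodies intact. Note that a stripped container stays a multipart/alternative with a single part, which is valid but not minimal, and that modifying a signed message will break its signature.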

Any comments appreciated!

NP: Negua / A Way Out