madduck's droppings - blogs previously filed under the geek category

This page exists to ease the transition, since I migrated my blog to new software. If you are interested in the posts previously filed in the “geek” category, they are listed below.

My new blog can be found at http://madduck.net/blog. Future articles, which would have been filed as “geek”, are going to show up here as well. However, please watch this space as these transitional pages may disappear at some point.

Down and up and down and up

You are probably using DHCP on the machine currently in front of you. The “Dynamic Host Configuration Protocol” is a way for your computer to obtain an Internet address from a pool of available addresses, and to return it to the pool when you no longer need it. Basically every Internet service provider uses DHCP, or something similar.

As a network operating system, Linux has DHCP support (and has had it for ages). In true Unix fashion, Debian sports at least four DHCP clients. Debian’s default is dhcp3-client, also known as dhclient.

The theory is that the client requests an address lease from the server and periodically renews it. This process yields a number of events to which the operating system can react. For instance, the client initially issues a PREINIT event to get the interface into a state where it can talk on the network, then a BOUND event as soon as it has acquired a lease, or FAIL if it, uh, failed.

After a certain period of time, the client tries to renew the lease. If it succeeds, it issues a RENEW event; if it fails, it yields EXPIRE.
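In script form, reacting to these events amounts to dispatching on the event name, which dhclient hands to /sbin/dhclient-script in the `$reason` variable. The sketch below mirrors the theory only: `handle_reason` and the actions it prints are hypothetical stand-ins for the real configuration steps, not a copy of Debian’s script.

```shell
# Sketch of a hook that reacts to the dhclient events described above.
# The printed "actions" are hypothetical placeholders for what a real
# dhclient-script does with the interface.
handle_reason() {
    case "$1" in
        PREINIT)     echo "bring interface up without an address" ;;
        BOUND|RENEW) echo "apply lease" ;;
        EXPIRE|FAIL) echo "deconfigure interface" ;;
        *)           echo "ignore $1" ;;
    esac
}

# dhclient passes the event name to its script in the $reason variable.
if [ -n "${reason:-}" ]; then
    handle_reason "$reason"
fi
```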

So much for the theory.

It seems that dhclient is rather stupid, as I tried to document in bug #459813: it does things differently. Given a lease, after a certain period of time it simply issues an EXPIRE event, which causes the operating system to deconfigure the interface and take down connectivity. Then the client spits out a PREINIT event, followed by BOUND or FAIL, as appropriate.

I have not fully investigated what all this means, but this much is certain: periodically, your machine goes offline, only to come back online a second later. If this were Windows, one would probably knock on wood and be glad that it works at all. But we’re on Linux here, Debian even, so this cannot be.

I’d love to be proven wrong, so if you have a minute, please try to verify this. One way of doing so is to insert

echo "$(date) got $reason" >> /tmp/dhclient-script.reasons

towards the top of /sbin/dhclient-script and monitor the output file. Once your client renews, it should read:

Wed Jan  9 11:53:27 CET 2008 got RENEW

but instead you’ll see

Wed Jan  9 11:53:27 CET 2008 got EXPIRE
Wed Jan  9 11:53:28 CET 2008 got PREINIT
Wed Jan  9 11:53:33 CET 2008 got BOUND

and if you look closely enough, you will see that your interface is unconfigured during the seconds between EXPIRE and BOUND.
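Rather than eyeballing the log, a small helper can count proper RENEW events versus the EXPIRE-then-BOUND cycles described above. This is a sketch assuming the log format written by the echo line above (the event name is the last field on each line); the function name is made up for illustration.

```shell
# summarise_reasons [FILE]: count RENEW events and EXPIRE...BOUND cycles.
# Assumes each line ends with "got <REASON>", as written by the echo above.
summarise_reasons() {
    awk '
        { reason = $NF }
        reason == "RENEW"  { renews++ }
        reason == "EXPIRE" { expired = 1 }
        reason == "BOUND" && expired { flaps++; expired = 0 }
        END { printf "renew=%d expire-rebind=%d\n", renews, flaps }
    ' "${1:-/tmp/dhclient-script.reasons}"
}
```

For the three-line sample log shown above, it reports `renew=0 expire-rebind=1`.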

NP: The Flower Kings: @Live Recording, Uppsala City Theatre, Sweden, 10 February 2003

Posted Wed 09 Jan 2008 11:59:36 CET Tags: dhclient geek netconf

Surveys on the console

As part of my research, I may have to conduct a survey among Debian contributors. The word “survey” usually elicits frowns because surveys are often poorly conducted. MJ has taken the time to draft some advice for surveyors.

Problems with surveys generally fall into one of two categories: content and presentation. I’ll refrain from making statements about content (Wikipedia has some stuff on questionnaire construction) and concentrate on presentation instead.

In the digital age, surveys are commonly administered via a web page or e-mail. In my recent Ph.D. transfer report, I identified a number of shortcomings with these approaches:

Asking Debian contributors to click radio buttons on a web page is a bit like expecting a mountain biking champion to ride a tricycle across a paddock: painful, if not offensive. Furthermore, web surveys can only be taken while on-line, when most of us have better things to do.

E-mail surveys address some of these problems, but create new ones: answers cannot be constrained to a domain (think multiple-choice), character set and formatting issues make evaluation difficult, and it’s impossible to prevent users from attaching comments or modifying responses.

In thinking about the issue, I came up with a third means to administer a survey: a console tool. Think of a Debian package which provides a console application controlled by a study-specific data file. The data file specifies the questions and their answer domains, and the tool presents those to the participant. Since most of Debian happens on the console anyway, such an approach to surveys seems more appropriate.

Interaction with the survey tool would be as easy as pressing the 2 or 4 keys to select one of the multiple choices, and the tool would immediately move on to the next question (without waiting for the user to hit enter). Obviously, n and p should allow navigation back and forth across the set, and c would spawn a text editor to give the user a chance to attach a comment to his/her current response, in which s/he might criticise the question or provide additional information. Finally, the tool should be able to pick up where it left off, should the user choose to exit/suspend the survey for now. Integration with debconf or another interface abstraction is also worth considering.
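The single-keystroke interaction described above could be driven by a dispatch function like the following sketch. The function name, the action strings and the 1-based question numbering are hypothetical; an interactive loop would read each key with bash’s `read -rsn1 key`, which returns after one keypress without waiting for Enter, and then act on the printed result.

```shell
# handle_key CURRENT TOTAL NCHOICES KEY
# Prints what a (hypothetical) survey loop should do next:
#   answer:<n>  a valid choice was pressed; record it and move on
#   goto:<i>    navigate to question i (n = next, p = previous)
#   comment     spawn $EDITOR for a free-form comment
#   ignore      anything else, including digits outside the answer domain
handle_key() {
    local current=$1 total=$2 nchoices=$3 key=$4
    case "$key" in
        [1-9])
            if [ "$key" -le "$nchoices" ]; then echo "answer:$key"; else echo ignore; fi ;;
        n)
            if [ "$current" -lt "$total" ]; then echo "goto:$((current + 1))"; else echo ignore; fi ;;
        p)
            if [ "$current" -gt 1 ]; then echo "goto:$((current - 1))"; else echo ignore; fi ;;
        c)  echo comment ;;
        *)  echo ignore ;;
    esac
}
```

For example, `handle_key 3 10 4 n` prints `goto:4`, while pressing 7 on a four-choice question is ignored, which is how the answer domain stays constrained.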

There is more to it: people change their minds and should thus be able to amend responses. With their consent, it might be valuable to track such changes and inquire about their motivations. As I was thinking about how to realise this, I suddenly arrived at version control: use Git as the backend storage. The set of cool features this would enable seems endless: it works off-line and can be used to track the aforementioned changes, but it also offers the possibility to create a squashed result in case the participant prefers to submit only the final result. Furthermore, switching between anonymous submissions and submissions authenticated by a GPG signature would be trivial.
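A rough sketch of the Git idea using plain git commands: each (possibly amended) response becomes a commit, so the history records changes of mind, and an orphan branch yields a single commit holding only the final answers. The one-file-per-question layout, the function names and the placeholder identity are assumptions for illustration; `git commit -S` would be the place to add a GPG signature.

```shell
# Hypothetical layout: one file per question id, holding the chosen answer.
# A fixed placeholder identity lets commits work in a fresh repository.
survey_git() {
    git -c user.name=participant -c user.email=participant@localhost "$@"
}

record_answer() {   # record_answer REPO QUESTION-ID ANSWER
    local repo=$1 qid=$2 answer=$3
    [ -d "$repo/.git" ] || git init -q "$repo"
    printf '%s\n' "$answer" > "$repo/$qid"
    survey_git -C "$repo" add "$qid"
    # Fails with "nothing to commit" if the answer did not actually change.
    survey_git -C "$repo" commit -q -m "answer $qid: $answer"
}

squash_final() {    # squash_final REPO: one commit with only the final state
    # --orphan starts a branch without history but keeps index and files.
    survey_git -C "$1" checkout -q --orphan final-only
    survey_git -C "$1" commit -q -m "final survey responses"
}
```

The participant could then submit either the full history (changes of mind included) or just the `final-only` branch.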

In addition, the survey tool should be able to display questions according to previous responses (control flow). For instance, if the survey determines that a given user is a contributor to the bug tracking system, but not a project member, it wouldn’t make sense to ask when s/he received his/her Debian account. Furthermore, questions could be created dynamically from context, so that the survey can drill down depending on previous responses, rather than asking the same questions of all participants.
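Such control flow could be expressed as conditions in the study’s data file. The sketch below assumes a hypothetical condition syntax `qid=value[,qid=value...]` (all clauses must hold) and previous answers stored one `qid=value` per line; an empty condition means the question is always asked. Neither format comes from an existing tool.

```shell
# should_ask CONDITION ANSWERS-FILE
# Succeeds if every qid=value clause in CONDITION appears verbatim as a
# line in ANSWERS-FILE. Syntax and storage format are hypothetical.
should_ask() {
    local cond=$1 answers=$2 clause
    if [ -z "$cond" ]; then
        return 0
    fi
    local IFS=','
    for clause in $cond; do
        grep -qxF -- "$clause" "$answers" 2>/dev/null || return 1
    done
    return 0
}
```

The account-date question from the example would then carry a condition such as `bts_contributor=yes,member=yes` and simply be skipped for contributors who are not project members.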

I am currently applying for funding to outsource the development of such a tool. If you are interested in coding it up and getting paid for it, speak to me. Here are some more specifications to keep in mind before jumping in:

These are likely to be incomplete, but should convey the basic picture. Feedback is always welcome!

NP: Oceansize: Frames

Update: James Andrewartha pointed me to purity, which asks multiple-choice questions on the console. It has the kind of interface which I envision.

Also, Chris Lamb suggested this personality survey as a baseline. Well, actually he just suggested I look into it.

Posted Fri 14 Dec 2007 17:06:53 CET Tags: geek

Delaying mail delivery

My current mailfilter has two features which increase my day-to-day productivity:

  1. a “tickler,” which is a reminder system inspired by the tickler file component of David Allen’s Getting Things Done action management method: I can submit emails and notes to the tickler along with a timestamp in the future, and the tickler delivers the mail (or note) to my inbox when the timestamp has passed.

  2. the ability to delay certain types of mail (e.g. Debian mail is held until the weekend, while news items and the like are only delivered at night).

My current implementations work alright, but they’re brittle and crap. What’s worse is that the two features could be combined and handled by one and the same tool, but I implemented them differently for now:

The tickler consists of a Maildir to which I can submit mails, either by mailing a note to a specific email address (currently broken, thus not linked from here), or by adding a tickle stamp (X-Tickle header) to a message with this script, saving it to the local tickler mailbox and asking offlineimap to shove it to the server.

On this server, the tickler queue is regularly scanned by a script, which resubmits mail whose timestamp has expired to my mailfilter, where it is treated somewhat specially as a resubmitted mail.
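The periodic scan of the tickler queue might look roughly like this sketch. It assumes, hypothetically, that the X-Tickle header holds a Unix timestamp and that resubmission means piping the message into some command (a mail filter, say) passed as arguments; both are illustrative choices, not details taken from the script linked above.

```shell
# due_tickles MAILDIR [NOW]: print messages in MAILDIR/{new,cur} whose
# X-Tickle timestamp (assumed: seconds since the epoch) has passed.
due_tickles() {
    local maildir=$1 now=${2:-$(date +%s)} msg stamp
    for msg in "$maildir"/new/* "$maildir"/cur/*; do
        [ -f "$msg" ] || continue
        # Read headers only: sed stops at the first empty line.
        stamp=$(sed -n '/^$/q; s/^X-Tickle:[[:space:]]*//p' "$msg" | head -n 1)
        case "$stamp" in
            ''|*[!0-9]*) continue ;;   # missing or not a plain number
        esac
        if [ "$stamp" -le "$now" ]; then
            printf '%s\n' "$msg"
        fi
    done
}

# resubmit_due MAILDIR COMMAND [ARGS...]: feed each due message to COMMAND
# on stdin and remove the queued copy only if that succeeded.
resubmit_due() {
    local maildir=$1 msg
    shift
    due_tickles "$maildir" | while IFS= read -r msg; do
        if "$@" < "$msg"; then rm -f -- "$msg"; fi
    done
}
```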

After almost three months with this setup, I can identify the following shortcomings:

When I implemented the delay queue, I knew of these problems and went a different way: delayed mail is stored in a Maildir and a (msgid, timestamp, filename) tuple is inserted into an sqlite3 database on the mail server. This script regularly processes the queue.

If that just sent chills down your spine, at least we have the same taste. There are numerous problems with this approach, the foremost being that Maildir filenames are not guaranteed to be constant: mails move between the new and cur directories, and flags such as “seen” are encoded in the filename (thus, symlinks cannot be used either). My script now uses an ugly heuristic, which at least makes it work. I should investigate whether inodes could be used instead, as I think those wouldn’t change throughout the lifetime of a mail, at least while it’s not moved between folders.

I initially considered just dumping messages to files and encoding the timestamp in the files’ mtime, but then I would not be able to access the queue with mutt in case I needed to fetch a delayed mail prematurely, or if I wanted to synchronise the queue with offlineimap as well.

The past few days, I’ve been condensing experiences from both approaches and am working out a new technique to combine both features. In essence, I think the database/index approach is the best, if I can figure out a way to uniquely identify mail message files, ideally across folders. Assuming I can use inodes for that, delayed mail would then be stored into the delayed Maildir and an (inode, timestamp) tuple saved into the database. Tickler mail would be stored along with all other mail in my store Maildir and would get a similar entry in the database.
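Whether inodes survive Maildir renames is easy to check, and the index idea can be sketched with a flat text file instead of sqlite3 to keep it self-contained. A rename within one filesystem (new/ to cur/, or appending flags like `:2,S`) keeps the inode number, so `find -inum` can locate the message later. The function names and the `inode timestamp` index format are hypothetical; note also that inode numbers are only unique per filesystem and may be reused once a message is deleted.

```shell
# delay_add INDEX FILE WHEN: remember FILE's inode and its release time.
delay_add() {
    local inode
    inode=$(ls -i -- "$2" | awk '{ print $1 }')
    printf '%s %s\n' "$inode" "$3" >> "$1"
}

# delay_due INDEX MAILDIR [NOW]: print the current paths of messages whose
# release time has passed, whatever they have been renamed to since.
delay_due() {
    local index=$1 maildir=$2 now=${3:-$(date +%s)} inode when
    while read -r inode when; do
        if [ "$when" -le "$now" ]; then
            find "$maildir" -type f -inum "$inode"
        fi
    done < "$index"
}
```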

This approach solves some problems and leaves others. Assuming I synchronise the store Maildir remotely (which I do), I can easily imagine making modifications via IMAP that cause orphan records in the database (if only IMAP would allow me to store key=value pairs for mails…). Furthermore, I’d have to submit mails to the tickler by bouncing them to an email address and deleting the local copy, unless I want duplicates. If the mail were then somehow dropped, I’d have lost it.

Still unhappy about all of this, still searching for a better implementation, I’d appreciate any feedback!

NP: Luminous Flesh Giants: Duma I Upadek

Posted Thu 13 Dec 2007 16:40:12 CET Tags: geek

I am thinking of installing Windows

Did that title get your attention? Thought so. Now you have to read on as I explain!

I am working on my PhD thesis and despite a bunch of gimmicks in place to help me with discipline (ha!), such as a pretty sweet mail filter setup (in need of further improvement), I end up distracted too often. For me, it’s not the Internet which draws my attention, it’s rather just Debian. As a developer, whenever I encounter a suboptimality (there are no problems with Debian!), I end up fixing it, or at least filing a bug report. Or I’ll play around with my system and tweak things. I’d even argue that I am not actually playing around, for I am good at staying away from geeky fiddling and instead put proper fixes and enhancements in place. But those still consume time.

Thus the idea of putting a second system with Windows installed wandered through my head in search of something to connect with. Yes, Windows, for one simple reason: I hate using it. So the theory goes that with Windows, I might actually stick with working on the thesis, trying to avoid touching that disgusting system otherwise.

There are at least two (obvious) reasons why this isn’t going to happen, at least not any time soon: first, I am sure that the devil in me would find ways to kill time, if only by cheating to obtain access to a Real operating system. Whether this is true, only a trial would tell. The second, and more important, reason is that I have an important deadline coming up, and letting it zoom by with that lovely whooshing sound is not an option. To get Windows up and running, and to have it ready for service with LaTeX, Git and all the other tools I use properly configured, I’d have to invest 2-3 days, at least. And those I cannot afford at this stage.

But the thought is now in my head. I hope you haven’t just lost all your respect. I may actually be kidding.

NP: Red Snapper: Prince Blimey

Posted Tue 30 Oct 2007 22:14:07 CET Tags: geek

Archiving web pages

Every now and then I encounter a web page which I’d like to archive for later purposes, e.g. because I can see that it might be related to my research, or simply because I want to keep the information until eternity.

I know of three ways to achieve this, but each of them has drawbacks:

So, dear lazyweb, how do you do it? I am aware that it’d be trivial to hack up a little script which uses wget to obtain the page and its dependencies, creates a file with stamp data and produces a frameset to put this data above the actual page, but before I spend time reinventing the wheel, I’d like to make sure it doesn’t already exist.
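For reference, the first half of such a script might look like the sketch below. The wget options are real (`--page-requisites`, `--convert-links`, `--adjust-extension`, `--span-hosts`, `--directory-prefix`; older wget releases call `--adjust-extension` `--html-extension`), but the directory layout and the STAMP file format are assumptions, the frameset step is left out, and `$WGET` exists only so the download command can be swapped out.

```shell
# archive_page URL DEST: fetch URL plus the images/CSS it needs into DEST
# and write a STAMP file recording where and when it was archived.
# WGET can point at another command (e.g. for testing); defaults to wget.
archive_page() {
    local url=$1 dest=$2 status=0
    mkdir -p -- "$dest" || return
    ${WGET:-wget} --quiet --page-requisites --convert-links \
        --adjust-extension --span-hosts \
        --directory-prefix="$dest" "$url" || status=$?
    {
        printf 'URL: %s\n' "$url"
        printf 'Archived: %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
        # wget exits non-zero if, e.g., a single image failed to download.
        printf 'Wget-Exit: %s\n' "$status"
    } > "$dest/STAMP"
    return "$status"
}
```

Since the result is a plain directory, it can be checked into version control as is.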

NP: The Flower Kings: Paradox Hotel

Update: Michael Stevens pointed me to furl.net, but I need a solution that I can use offline. Part of the reason that I want to archive web pages is because I don’t want to come back in a couple of months and find a webpage removed, a server dead, or a site like furl.net discontinued.

Jörg Jaspert introduced me to Scrapbook, a Firefox extension I’ll have to check out. However, even before looking at it: I really want files or directories on the filesystem, so that I can check them into version control and have them archived and replicated. I don’t trust any data to Firefox or any of its extensions, or anything under ~/.mozilla. This is also the reason why Slogger doesn’t cut the mustard for me.

Matej Cepl wants me to use konqueror to archive them. I don’t want to install hundreds of megabytes for this feature, nor can I imagine that this feature will be much different or less cumbersome than Firefox’s ability to save pages. I hate the mouse.

Thanks for all your comments, regardless!

Update: Just found this post about archiving stuff like digg/delicious.

Update: Jörg pointed out that Scrapbook can save files outside the Firefox profile, but it’s still not what I am looking for: I want a single file, ideally, or a directory.

Marcos Dione points me to his Kreissy browser, which seems to take an interesting approach, but it also depends on KDE and as with konqueror, I don’t want to install a hundred dependencies.

Alfonso Ali mentioned Zotero, which uses an sqlite database and seems interesting, but databases are not suitable for storage in version control systems, so this one is out too.

Posted Wed 19 Sep 2007 11:16:49 CEST Tags: geek

It's all text, and bug-free!

Yesterday, I started porting some of my research notes to the Debian wiki. I wanted to use vim for the editing, and since the “Editus Externus” extension basically died, I went out on a quest and found It’s all text, which does the job quite nicely.

And it’s bug-free, at least when you configure it that way in the addon’s preferences dialog:

Checked item 'remove all bugs'

NP: The Flower Kings: Paradox Hotel

Posted Wed 19 Sep 2007 10:35:41 CEST Tags: geek

If you procmail, read this

I just had a hard time finding this excellent procmail resource on the web. I am thus blogging it for posterity, in case anyone is looking for procmail documentation, tips, tricks, a how-to, or anything else related to procmail.

And if you procmail and have not read the document, I suggest you do. It’s truly outstanding.

NP: A Silver Mt. Zion: He Has Left us Alone, but Shafts of Light Sometimes Grace the Corner of our Rooms

Posted Tue 21 Aug 2007 18:43:11 CEST Tags: geek

A new list: mailtags

I’ve received a fair number of responses to my previous blog post on labelling/tagging mail and have created a list for public discussion of the topic. Please subscribe if you’re interested in this topic.

I am also trying to get permission to repost some of the responses I got there. Furthermore an update of my blog post with some of these comments is pending.

Busy times…

NP: Dredg: Catch without Arms

Posted Thu 26 Jul 2007 09:15:38 CEST Tags: geek

A user-space filesystem for mail labeling

I am still looking for a solution that allows me to tag my mails similar to what Gmail offers.

Having thought about this for quite a bit now, I can see three solutions, implemented at different levels in the mail flow: as filesystem, patch to the IMAP server (Dovecot), or patch to mutt. Since this post is about the filesystem, I’ll concentrate on that, but I’ll also touch upon the other two solutions briefly.

A bit of theory

First, let’s dig into the theory and requirements for a bit. Instead of sorting mails into folders (one mail can only ever live in one folder), I want to tag my mails, such that a single message can have multiple tags and hence be relevant to multiple categories. If you want to think in terms of folders, then a single message could be attached to multiple folders, e.g. via hard links. This is what Mairix approximates. The problem with this approach is that simple operations, such as deleting a message (removing it from the enclosing folder) or marking it as read (renaming the file) only affect the message in the current folder, not its duplicates/clones in the other folders.
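The drift between hard-linked copies is easy to reproduce: in a Maildir, marking a message as read means renaming it (appending the S flag to the `:2,` info part), and a rename only affects the one directory entry it is applied to. A throwaway sketch with made-up folder and file names:

```shell
# Demonstrate why hard-linked "clones" drift apart in Maildir.
# Prints the file name left in each folder after marking one copy as read.
demo_hardlink_drift() {
    local base
    base=$(mktemp -d) || return
    mkdir -p "$base/phd/cur" "$base/debian/cur"
    echo 'Subject: popcon stats' > "$base/phd/cur/1234.host:2,"
    # The same message "filed" in a second folder via a hard link.
    ln "$base/phd/cur/1234.host:2," "$base/debian/cur/1234.host:2,"
    # Mark it as seen in the phd folder only, as a mail reader would.
    mv "$base/phd/cur/1234.host:2," "$base/phd/cur/1234.host:2,S"
    ls "$base/phd/cur"
    ls "$base/debian/cur"
    rm -rf -- "$base"
}
```

It prints `1234.host:2,S` for the first folder but still `1234.host:2,` for the second: the clone remains unread.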

So it’s best to stay with a single file per message and to associate tags with it. Here, two approaches seem plausible: (a) extracting tags from the individual message files, and (b) storing tags/message pairs in a separate database. I have a gut feeling that (a) is the better approach, especially since the files we’re dealing with all have the same format. And in fact, there is a semi-standard header for email messages to store such tags: X-Label.

One more paragraph of theory: instead of simple tags, like Gmail provides, I want hierarchical tags, for instance:

X-Label: Debian::pkg::mdadm
X-Label: Debian::popcon, PhD::Debian::stats

Hierarchical tags work similarly to folders in that any message tagged as Debian::pkg::mdadm also belongs to the Debian::pkg and Debian categories. No rocket science.
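Expanding a hierarchical tag into all the categories it belongs to is a simple prefix walk over the `::` separators. A bash sketch; the function names are made up, and splitting a comma-separated X-Label value is included because the second header example above uses that form.

```shell
# tag_ancestors TAG: print TAG and each of its parent categories, top-down.
tag_ancestors() {
    local rest=$1 prefix=''
    while :; do
        prefix=${prefix:+$prefix::}${rest%%::*}
        printf '%s\n' "$prefix"
        case "$rest" in
            *::*) rest=${rest#*::} ;;
            *)    break ;;
        esac
    done
}

# label_tags VALUE: split one X-Label value on commas, one tag per line.
label_tags() {
    printf '%s\n' "$1" | tr ',' '\n' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//; /^$/d'
}
```

`tag_ancestors Debian::pkg::mdadm` prints `Debian`, `Debian::pkg` and `Debian::pkg::mdadm`, one per line.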

mutt

I mentioned earlier that this could be implemented in various places, and I’ll start at the highest level, with mutt, the mail reader. For quite a while now, mutt has provided search operators for the X-Label header, so that you can search or limit your mailbox like so: ~y \\<PhD::Debian\\> | ~y \\<popcon\\>. The problem with this approach is that it reuses existing functionality which I need on a regular basis: searching and limiting. Furthermore, I think this approach is hackish and brittle and exposes too much of the underlying workings.

Now it would probably be trivial to hack mutt and add a new layer of filters, which would be used in addition to the user-specified limit. At a high level, the user would set a tag expression, like PhD::Debian | ::popcon, and may additionally choose to limit the display to messages sent from the debian.org domain: ~f debian.org$. mutt would then assemble the pieces and apply the filter (~y \\<PhD::Debian\\> | ~y \\<popcon\\>) ~f debian.org$ to the current view.
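The assembly step could be as simple as rewriting each alternative of the tag expression into a `~y` term and combining the result with the user’s limit. This sketch handles only `|`-separated alternatives (no NOT, no parentheses); a leading `::` is dropped so that `::popcon` matches the tag at any depth, and the doubled backslashes mirror the patterns quoted above. The function name is hypothetical and this is not how mutt itself works internally.

```shell
# mutt_tag_filter TAG-EXPRESSION [LIMIT]: build a mutt limit pattern.
mutt_tag_filter() {
    local expr=$1 limit=$2 term out='' alt
    local IFS='|'
    for alt in $expr; do
        # Trim whitespace and a leading "::" (meaning: at any depth).
        alt=$(printf '%s' "$alt" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//; s/^:://')
        [ -n "$alt" ] || continue
        term="~y \\\\<$alt\\\\>"           # yields: ~y \\<tag\\>
        out=${out:+$out | }$term
    done
    printf '(%s)%s\n' "$out" "${limit:+ $limit}"
}
```

Called as `mutt_tag_filter 'PhD::Debian | ::popcon' '~f debian.org$'`, it prints exactly the filter quoted above. Because `\<` and `\>` are word boundaries and `:` is not a word character, `\\<PhD::Debian\\>` also matches PhD::Debian::stats, which fits the hierarchical semantics.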

With a reasonable user interface to specify or pick labels, as well as a simple label editor, this would almost cut the mustard. Thanks to the header_cache patch (which also indexes the X-Label header), mutt fares reasonably well on Maildirs with thousands of messages (well, not always), and binding keys like s (save) and d (delete) to actions that add and remove labels, determined from the destination or the current tag filter, would make mutt a force to be reckoned with.

Dovecot

Unfortunately, implementing this in mutt would mean that I’d be bound to the mail reader forever (and I like to pretend that’s not the case, haha!).

An alternative would be to implement this on the IMAP server, which already provides IMAP folders which are not necessarily bound to physical, filesystem folders and can intercept all commands that manipulate messages or their status.

But while this works fine in theory, it’ll break when offline tools, such as offlineimap, come into play, as those (need to) instantiate every message they download, which brings us right back to the problem with duplicate/cloned files discussed before.

A filesystem idea

Trying to stay independent of the tools I use, I started playing with the idea of implementing mail message tagging at the filesystem level, using Fuse. The user would run a daemon process against a given Maildir holding all mail (the base store) and export a virtual directory hierarchy in which tags become Maildirs, backed by a cache of X-Label:filename pairs.

The virtual directory hierarchy should also export the familiar filesystem semantics:

Unlike tagfs, which keeps tags in an external SQLite database, this filesystem should not use an external database at all, but rather a run-time cache built from the X-Label headers. That way, the tags travel with the messages themselves and can be propagated to other machines with IMAP synchronisation tools.
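The run-time cache behind such a filesystem is essentially an index from each tag to the files carrying it, rebuilt by scanning headers. Below is a sketch of that index with standard tools, assuming the X-Label format shown earlier (one or more headers, comma-separated values) and ignoring folded header lines; a lookup for a category also returns messages tagged with any of its sub-tags. The function names are hypothetical, and a real Fuse daemon would keep this in memory and update it as files change.

```shell
# build_tag_cache MAILDIR: print "tag<TAB>path" for every X-Label tag
# found in the header block of each message in MAILDIR/{cur,new}.
build_tag_cache() {
    local msg tag
    for msg in "$1"/cur/* "$1"/new/*; do
        [ -f "$msg" ] || continue
        sed -n '/^$/q; s/^X-Label:[[:space:]]*//p' "$msg" |
            tr ',' '\n' |
            sed 's/^[[:space:]]*//; s/[[:space:]]*$//; /^$/d' |
            while IFS= read -r tag; do printf '%s\t%s\n' "$tag" "$msg"; done
    done
}

# tagged CACHE TAG: files tagged TAG or any TAG::sub-tag, one per line.
tagged() {
    awk -F '\t' -v t="$2" '$1 == t || index($1, t "::") == 1 { print $2 }' "$1" |
        sort -u
}
```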

Initially, I thought such a filesystem could even be deployed “underneath” the IMAP server, but then we obviously run into the same problems with offline tools as before.

Potentially, such a filesystem could be incredibly powerful as it could do its work for any user agent directly dealing with Maildirs (such as mutt), and it could also be used as basis for an IMAP server which is only ever accessed synchronously (in online vs. offline mode). In the latter case, one would have to ensure that filenames don’t change across IMAP operations, or use the message IDs instead from the start.

But as good as it may sound, getting it to work right will require quite some attention, I think. Especially guaranteeing filesystem atomicity, as required for Maildirs, might be impossible through the Fuse layer.
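The atomicity Maildir relies on comes from its delivery protocol: a message is written completely under tmp/ and only then renamed into new/, and a rename within one filesystem is atomic, so readers never see half-written files. Any Fuse layer would have to preserve exactly that. A sketch of such a delivery; the unique-name scheme (seconds, PID, `$RANDOM`, host) only approximates the usual convention, hostname sanitising is skipped, and the fsync the Maildir specification asks for before the rename is omitted.

```shell
# maildir_deliver MAILDIR < message
# Deliver stdin into MAILDIR the Maildir way and print the final path.
maildir_deliver() {
    local maildir=$1 name
    name="$(date +%s).P$$R$RANDOM.${HOSTNAME:-localhost}"
    # 1. Write the complete message under tmp/, which readers ignore.
    if ! cat > "$maildir/tmp/$name"; then
        rm -f -- "$maildir/tmp/$name"
        return 1
    fi
    # 2. A rename within one filesystem is atomic: the message appears
    #    in new/ all at once or not at all.
    mv -- "$maildir/tmp/$name" "$maildir/new/$name" &&
        printf '%s\n' "$maildir/new/$name"
}
```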

The sad end

Unfortunately, I don’t have the time to implement any of the above, but if I did, I would probably try to patch mutt first, even though such a patch may not make it into the upstream source any time soon. The filesystem idea sounds cooler and cleaner, but it does not yet support OR/NOT queries, and it will be quite a task to implement and debug. On the other hand, dealing with Maildirs of several thousand messages in mutt is a bit of a pain if those Maildirs are being updated externally (e.g. mail is delivered to them).

At least the idea is out now. If you are interested and want to work on this, please let me know so that I can link you up with other interested people.

And if you have any other input, please let me know. And no, I don’t really want to install emacs to read my mail.

NP: Dream Theater: Train of Thought

Update: I’ve created a mailing list for public discussion of this topic, so please subscribe if you’re interested.

Thomas Viemann suggested using IMAP flags for the task, as these are standardised. However, I see two show-stoppers here: first, I am not sure whether it would be possible to set those flags from procmail on delivery (sieve can supposedly do it, but I consider sieve inadequate for my needs). And second, offline tools, such as offlineimap, which map IMAP folders to filesystem Maildirs, would have exactly the same problem representing a single message’s affiliation with multiple tags. You can read up on the discussion in the archives of the offlineimap mailing list, and soon in those of the new mailtags list.

Martin Scholl had the idea to extend dbmail with the concepts of virtual folders (or stored searches). Naturally, as a database-backed mail system, dbmail would make this trivial and fast to implement, but this approach bears two problems for me: first, it also falls short with respect to offline tools (see above), and second, it would require me to replace my entire mail infrastructure. I am not even considering the implications of using a database for mail storage.

Update: I just found pytagsfs and have suggested to the author that the tag store be abstracted, so that e.g. an sqlite database can be used instead of tags stored in files (which means a lot of redundancy when you have multiple files belonging to a collection, such as a music album).

Posted Tue 24 Jul 2007 21:18:47 CEST Tags: geek

Iceweasel/Firefox brings you the Windows experience!

For a while now, Iceweasel/Firefox has come with built-in phishing protection, which is undoubtedly a good thing given the number of idiots using the Web these days.

Screenshot of iceweasel's phishing protection warning

But the implementation is crap. If you surf to a phishing site, such as this test site, the page loads as you would expect and then the first strange things happen:

I get similar performance problems on the StaTravel website, and on map.search.ch, it almost always crashes.

Thank you, Firefox developers, for bringing the joys of the Windows world to Linux!

NP: Dream Theater: Metropolis Pt 2: Scenes from a Memory

Update: several people have responded that they cannot reproduce the problem. I thus created a new profile, uninstalled all system-wide extensions, removed all plugins such that about:plugins was empty and tried again. The problems persist, although I think the lags are not quite as long as before and the memory consumption is obviously down. Maybe this is amd64-related?

Update: It’s not amd64-related, as several users have pointed out.

Update: still no luck, but James Andrewartha pointed me to this bug report, which might improve things a bit:

“The new protocol specifies a single lookup algorithm for all tables, rather than having per-table logic. This lookup logic was moved in to the db service from the javascript.

URL canonicalization was moved completely into C++ too. The DB service can now handle a query from a raw URI, which will be needed for malware blocking.”

Posted Tue 24 Jul 2007 13:16:08 CEST Tags: geek