
"mbox" is a family of several mutually incompatible mailbox formats.

You've come to this page because you've talked about the "mbox" mailbox format.

This is the Frequently Given Answer to such definite articles.

"mbox" is actually a family of several mutually incompatible mailbox formats. Different tools support different formats (often without clearly specifying which format they support), and great care must be taken when using different tools that support different formats on a single mailbox file.

With the advent and now widespread adoption of the superior Maildir format over the past several years, the entire "mbox" family of mailbox formats is gradually becoming irrelevant, and of only historical interest.

Features common to all of the "mbox" formats

All of the "mbox" formats store all of the messages in the mailbox in a single file. Delivery appends new messages to the end of the file.

Each message is preceded by a From_ line and followed by a blank line. A From_ line is a line that begins with the five characters 'F', 'r', 'o', 'm', and ' '.

Notes:

By convention, the 'From ' in a From_ line is immediately followed by:

  1. the envelope sender mailbox name, without whitespace characters

    If there was no envelope sender, by convention the mailbox name used is MAILER-DAEMON. Whitespace characters in the envelope sender mailbox name are by convention replaced by hyphens.

    Historically, the envelope sender was a UUCP "bang path".

  2. the date that the message was delivered to the mailbox, in the Standard C asctime() format (i.e. in English, with the redundant weekday, and without timezone information)

    For best results, and for the same reasons that filesystem formats store file timestamps in UTC, the delivery date should be treated as the UTC time of delivery.

    Some specifications attempt to change existing practice by fiat, redefining the date to be in the form of a date-time token from RFC 2822.

  3. other, arbitrary data
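Here is a minimal Python sketch of these conventions. It assumes the delivery time is supplied as a POSIX timestamp, and the function name make_from_line is hypothetical. Python's time.asctime() produces the same layout as the C asctime() (English names, redundant weekday, space-padded day, no timezone) except that it omits the trailing newline.

```python
import re
import time


def make_from_line(envelope_sender: str, delivered_at: float) -> str:
    """Build a conventional From_ line (hypothetical helper, not a library API)."""
    # No envelope sender: use the conventional placeholder mailbox name.
    sender = envelope_sender or "MAILER-DAEMON"
    # Replace whitespace with hyphens so the sender stays a single token.
    sender = re.sub(r"\s", "-", sender)
    # asctime()-style date, computed from the UTC time of delivery.
    date = time.asctime(time.gmtime(delivered_at))
    return f"From {sender} {date}\n"


print(make_from_line("", 0), end="")
# From MAILER-DAEMON Thu Jan  1 00:00:00 1970
```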

The "mboxo" mailbox format

The "mboxo" mailbox format is the "original" System V mailbox format.

The "mboxo" mailbox format uses irreversible "From quoting" that corrupts messages. Before a message is appended to a "mboxo" mailbox file, it is transformed. Any line of the message, in either the header or the body, that begins with the five characters 'F', 'r', 'o', 'm', and ' ' has a single '>' character prepended to it. This transformation is irreversible because it is impossible to distinguish, when reading a message, a line that began '>From ' in the original message from a line that began 'From ' in the original message and that was subsequently transformed.

The substitution command for this transformation is 1,$s/^From />&/.
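The following Python sketch shows the transformation and why it cannot be undone. The function name mboxo_quote is hypothetical. re.sub() with re.MULTILINE applies the substitution to every line, as the 1,$ address does in ed.

```python
import re


def mboxo_quote(message: str) -> str:
    """Apply "mboxo" From quoting (hypothetical helper)."""
    # Prefix '>' to every line that begins "From ", like 1,$s/^From />&/.
    return re.sub(r"^From ", ">From ", message, flags=re.MULTILINE)


# Two different original messages...
a = "Subject: x\n\nFrom here on.\n"
b = "Subject: x\n\n>From here on.\n"
# ...are stored identically, so a reader cannot tell which one was delivered.
print(mboxo_quote(a) == mboxo_quote(b))  # True
```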

To locate the start of the next message in an "mboxo" format mailbox, one scans forward for the next From_ line. There is no next message if the end of the file is reached.

When reading each message from an "mboxo" format mailbox, one strips off the trailing blank line.
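This reading procedure can be sketched in Python, assuming the whole mailbox has already been read into memory as a string with '\n' line endings. split_mbox is a hypothetical name. The quoted line in the sample stays quoted, because nothing in "mboxo" says which lines were originally 'From ' lines.

```python
import re

# A From_ line: "From " at the start of a line, through the end of that line.
FROM_LINE = re.compile(r"^From .*\n", re.MULTILINE)


def split_mbox(text: str) -> list[str]:
    """Split an "mboxo" mailbox into messages (hypothetical helper)."""
    # Text before the first From_ line is not part of any message.
    pieces = FROM_LINE.split(text)[1:]
    messages = []
    for piece in pieces:
        # Strip the blank line that follows each message.
        if piece.endswith("\n\n"):
            piece = piece[:-1]
        messages.append(piece)
    return messages


mbox = (
    "From a@example.org Thu Jan  1 00:00:00 1970\n"
    "Subject: 1\n\n>From the top\n\n"
    "From b@example.org Thu Jan  1 00:00:00 1970\n"
    "Subject: 2\n\nyo\n\n"
)
print(split_mbox(mbox))
# ['Subject: 1\n\n>From the top\n', 'Subject: 2\n\nyo\n']
```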

The "mboxrd" mailbox format

The "mboxrd" mailbox format is named after Rahul Dhesi, who was one of a number of people who invented the same idea at roughly the same time. (Tim Goodwin said on 1996-08-09, for example, that he had first implemented the same idea on 1995-04-04.) The earliest recorded version of Rahul's proposal is dated 1995-06-24.

The "mboxrd" mailbox format was designed with reversible "From quoting", to solve the message corruption problems inherent in the "mboxo" format. Rahul Dhesi said that mail software could be incrementally revised to employ "mboxrd" instead of "mboxo", and indeed it has been adopted to a certain extent: qmail switched from "mboxo" format to "mboxrd" format on 1996-03-02, for example. However, and somewhat amazingly, close to a decade later this adoption has not been as universal as was predicted. Postfix, as of its 2004-08-29 version, still uses the "mboxo" format. (This is ironic when one considers that Postfix is in fact Rahul Dhesi's MTS software of choice.)

Before a message is appended to a "mboxrd" mailbox file, it is transformed. Any line of the message, in either the header or the body, that begins with zero or more '>' characters followed by the five characters 'F', 'r', 'o', 'm', and ' ' has a single '>' character prepended to it. The substitution command for this transformation is 1,$s/^>*From />&/.
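The quoting step can be sketched in Python as follows. mboxrd_quote is a hypothetical name, and \g<0> in the replacement plays the role of & in ed, standing for the whole matched text.

```python
import re


def mboxrd_quote(message: str) -> str:
    """Apply "mboxrd" From quoting (hypothetical helper)."""
    # Prefix '>' to lines matching ^>*From , like 1,$s/^>*From />&/.
    return re.sub(r"^>*From ", r">\g<0>", message, flags=re.MULTILINE)


print(mboxrd_quote("From a\n>From b\nFromage\n"), end="")
# >From a
# >>From b
# Fromage
```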

When a message is read from a "mboxrd" mailbox file, it is transformed back. Any line of the message, in either the header or the body, that begins with one or more '>' characters followed by the five characters 'F', 'r', 'o', 'm', and ' ' has the single leading '>' character removed from it. The substitution command for this transformation is 1,$s/^>\(>*From \)/\1/.
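The reverse step, together with a check that quoting followed by unquoting restores the original, can be sketched in Python. The function names are hypothetical, and mboxrd_quote is repeated so that the sketch stands alone.

```python
import re


def mboxrd_unquote(message: str) -> str:
    """Reverse "mboxrd" From quoting (hypothetical helper)."""
    # Drop one '>' from lines matching ^>+From , like 1,$s/^>\(>*From \)/\1/.
    return re.sub(r"^>(>*From )", r"\1", message, flags=re.MULTILINE)


def mboxrd_quote(message: str) -> str:
    # Forward transformation, repeated here so the example is self-contained.
    return re.sub(r"^>*From ", r">\g<0>", message, flags=re.MULTILINE)


# Unlike "mboxo", messages that differ only in quoting stay distinguishable.
for original in ("From here\n", ">From here\n", ">>From here\n"):
    assert mboxrd_unquote(mboxrd_quote(original)) == original
print("round trip ok")
```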

To locate the start of the next message in an "mboxrd" format mailbox, one scans forward for the next From_ line. There is no next message if the end of the file is reached.

When reading each message from an "mboxrd" format mailbox, one strips off the trailing blank line.

The "mboxcl" mailbox format

The "mboxcl" mailbox format is one of the "new" System V mailbox formats. The mutt MUA attempts to convert "mboxo" and "mboxrd" mailboxes to "mboxcl" format.

The "mboxcl" mailbox format uses irreversible "From quoting" that corrupts messages. Before a message is appended to a "mboxcl" mailbox file, it is transformed. Any line of the message, in either the header or the body, that begins with the five characters 'F', 'r', 'o', 'm', and ' ' has a single '>' character prepended to it. This transformation is irreversible because it is impossible to distinguish, when reading a message, a line that began '>From ' in the original message from a line that began 'From ' in the original message and that was subsequently transformed.

The substitution command for this transformation is 1,$s/^From />&/.

The "mboxcl" mailbox format does not use From_ line scanning. Instead, each message contains a Content-Length: header that denotes the length of the message body, after transformation, in octets. This header is added to the message when it is appended to the mailbox, and is used when reading to locate the start of the next message.
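Here is a sketch of appending to an "mboxcl" mailbox. append_mboxcl is a hypothetical helper, and it makes several simplifying assumptions:

- It works on an in-memory bytearray rather than a locked file.
- It assumes '\n' line endings and a blank line separating the header from the body.
- It assumes the message does not already carry a Content-Length: header.
- It assumes the sender has already been made whitespace-free.

Working in bytes keeps the count in octets.

```python
import re
import time


def append_mboxcl(mbox: bytearray, sender: bytes, message: bytes) -> None:
    """Append one message to an in-memory "mboxcl" mailbox (hypothetical helper)."""
    # Same irreversible From quoting as "mboxo", over header and body alike.
    quoted = re.sub(rb"^From ", b">From ", message, flags=re.MULTILINE)
    # Assumes a blank line separates the header from the body.
    header, _, body = quoted.partition(b"\n\n")
    # Content-Length: counts the octets of the body after quoting.
    length = str(len(body)).encode("ascii")
    date = time.asctime(time.gmtime()).encode("ascii")
    mbox += b"From " + sender + b" " + date + b"\n"
    mbox += header + b"\nContent-Length: " + length + b"\n\n"
    # The body, then the blank line that follows every message.
    mbox += body + b"\n"


box = bytearray()
append_mboxcl(box, b"someone@example.org", b"Subject: hi\n\nFrom me\nbye\n")
```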

Notes:

When reading each message from an "mboxcl" format mailbox, one strips off the trailing blank line.

The "mboxcl2" mailbox format

The "mboxcl2" mailbox format is one of the "new" System V mailbox formats.

The "mboxcl2" mailbox format uses no "From quoting" at all. A message is delivered to a "mboxcl2" format mailbox file as it stands, without transformation. Messages are not transformed when they are read, either.

The "mboxcl2" mailbox format does not use From_ line scanning. Instead, each message contains a Content-Length: header that denotes the length of the message body in octets. This header is added to the message when it is appended to the mailbox, and is used when reading to locate the start of the next message.
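The reading side can be sketched as follows. read_mboxcl2 is a hypothetical helper. It assumes an in-memory mailbox with '\n' line endings in which every message has a Content-Length: header. Because the body is skipped by its length, a body line beginning 'From ' needs no quoting and is never mistaken for a From_ line.

```python
import re

CONTENT_LENGTH = re.compile(rb"^Content-Length:[ \t]*(\d+)", re.MULTILINE | re.IGNORECASE)


def read_mboxcl2(data: bytes) -> list[bytes]:
    """Return each message (header and body) without its From_ line."""
    messages = []
    pos = 0
    while pos < len(data):
        # Every message starts with a From_ line; skip past it.
        if not data.startswith(b"From ", pos):
            raise ValueError(f"expected a From_ line at offset {pos}")
        pos = data.index(b"\n", pos) + 1
        # The header ends at the first empty line; the body starts after it.
        body_start = data.index(b"\n\n", pos) + 2
        match = CONTENT_LENGTH.search(data, pos, body_start)
        if match is None:
            raise ValueError(f"no Content-Length: header at offset {pos}")
        body_end = body_start + int(match.group(1))
        messages.append(data[pos:body_end])
        # Skip the blank line that follows each message.
        pos = body_end + 1
    return messages


box = (
    b"From a@example.org Thu Jan  1 00:00:00 1970\n"
    b"Subject: 1\nContent-Length: 12\n\nFrom me\nbye\n\n"
    b"From b@example.org Thu Jan  1 00:00:00 1970\n"
    b"Subject: 2\nContent-Length: 3\n\nyo\n\n"
)
print(len(read_mboxcl2(box)))  # 2
```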

Notes:

When reading each message from an "mboxcl2" format mailbox, one strips off the trailing blank line.

Incompatibilities

These are some of the consequences of the incompatibilities between the various formats in the "mbox" family.


© Copyright 2004,2010 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp information is preserved.