No anti-UBM measure for SMTP-based Internet mail works.

You've come to this page because you've promoted an anti-UBM measure for SMTP-based Internet mail, claiming that it works.

This is the Frequently Given Answer to such statements.

No anti-UBM measure for SMTP-based Internet mail works.

The fundamental flaw

There are many technical measures that have been adopted over the years in attempts to stem the torrent of unsolicited bulk electronic mail sent via SMTP. These measures have all shared one fundamental flaw: their designs incorporate an element that is not directly related to the qualities of being unsolicited and bulk.

The senders of unsolicited bulk mail simply end up removing or changing this element, and the problem continues. In the long run, all that results is that SMTP becomes less usable and patchy, as all of these various blocks and the collateral damage from their false positives accumulate.

Moreover, several of these schemes have fundamental paradigm conflicts with one another, which only serve to balkanise SMTP-based Internet mail yet further.

Case studies

AOL and source routing in SMTP

One classic example of the block/adapt cycle is what happened with AOL in 1997, when it rejected messages where SMTP source routing was used in envelope recipient mailbox names (instead of, as per RFC 1123, simply ignoring the source route).

At the time, source routing was used by several (but not all) senders of unsolicited bulk mail. It was also used by a few senders of perfectly acceptable mail. Whilst AOL's rejection of such traffic was primarily aimed at encouraging the world to get rid of an archaic protocol feature (justifying the collateral damage to a certain extent), it was touted, as an incidental benefit, as a measure for stopping UBM. However, as such it failed (which can be attested to by looking around, all these years later, and noting that UBM hasn't stopped), and the only long term consequence is that that feature of SMTP has now fallen into desuetude.

This was precisely because the use of source routes isn't directly related to the qualities of being unsolicited and bulk.

Bayesian filters

Another classic example of the block/adapt cycle is that of Bayesian filters. Bayesian filters explicitly use a statistical approach. But "Matching a set of criteria statistically derived from previous messages" is not the same as "unsolicited bulk". Blocking mail with the former quality won't block mail with the latter.

The inevitable consequence of a statistical approach is that the false positive rate goes up as the false negative rate goes down, as can be seen from these EmmesTech statistics from 2005. The more successful that a Bayesian filter is at detecting "bad" messages, the more likely it is that it will mis-classify "good" messages as bad. The inverse is true, too. The fewer "good" messages that are mis-classified, the more likely it is that "bad" messages will go undetected.
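The trade-off can be illustrated with a minimal sketch. The scores below are invented for illustration; they do not come from the EmmesTech statistics or from any real filter, and the name error_counts is hypothetical. The sketch assumes that the scores of wanted and unwanted messages overlap, which is what makes the trade-off unavoidable for any single threshold.

```python
# Hypothetical filter scores in [0, 1]; higher means "looks more like bad mail".
# All numbers are invented for illustration.
good_scores = [0.05, 0.10, 0.20, 0.35, 0.55, 0.70]  # wanted mail
bad_scores = [0.30, 0.45, 0.60, 0.80, 0.90, 0.95]   # unsolicited bulk mail


def error_counts(threshold):
    """Return (false_positives, false_negatives) when scores >= threshold are blocked."""
    false_positives = sum(1 for s in good_scores if s >= threshold)
    false_negatives = sum(1 for s in bad_scores if s < threshold)
    return false_positives, false_negatives


# Sweep the threshold from lenient to strict and watch the two error counts trade places.
for threshold in (0.9, 0.7, 0.5, 0.3):
    fp, fn = error_counts(threshold)
    print(f"threshold={threshold:.1f}  false positives={fp}  false negatives={fn}")
```

With these invented scores, lowering the threshold from 0.9 to 0.3 moves the counts from 0 false positives and 4 false negatives to 3 false positives and 0 false negatives.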

Moreover there is a well-known logical next step for the senders of unsolicited bulk mail for defeating Bayesian filters. As Jeremy Bowen explained in 2002, all that the UBM senders need do is copy the Bayesian filters, by querying them with a test corpus of messages to see how they categorize messages as "bad" or "good", and then construct a message-generation tool that uses the inverse of the Bayesian filter probabilities to calculate, for any given message text, what words to incorporate into the message to make it least likely to be categorized as "bad" by the original filter.
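The mechanism described above can be sketched with a toy model. This is not M. Bowen's code or any real filter: the word probabilities, the naive-Bayes-style combination in bad_score, and the helper pad_with_neutral_words are all assumptions made for illustration. The sketch only shows why knowing a filter's per-word probabilities tells a sender which words to add.

```python
import math

# Hypothetical per-word "bad" probabilities, as might be estimated by probing
# a filter with a test corpus. All values are invented.
word_bad_prob = {
    "offer": 0.95, "free": 0.90, "winner": 0.97,
    "meeting": 0.10, "thanks": 0.08, "attached": 0.12, "tuesday": 0.15,
}


def bad_score(words):
    """Combine per-word probabilities naive-Bayes style; unknown words count as 0.5."""
    log_odds = 0.0
    for w in words:
        p = word_bad_prob.get(w, 0.5)
        log_odds += math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-log_odds))


def pad_with_neutral_words(words, threshold=0.5):
    """Append the words the filter likes best until the score drops below threshold."""
    padded = list(words)
    # Only words that the filter considers more "good" than "bad" are candidates.
    candidates = sorted(
        (w for w, p in word_bad_prob.items() if p < 0.5),
        key=word_bad_prob.get,
    )
    for w in candidates:
        if bad_score(padded) < threshold:
            break
        padded.append(w)
    return padded


original = ["free", "offer"]
padded = pad_with_neutral_words(original)
print(bad_score(original), bad_score(padded), padded)
```

The original words are still present in the padded message; only the filter's verdict has changed.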

As explained in 2003 by MCP, UBM senders have ways of remotely training their own copies of filters to match the Bayesian filters of the world, not least via "WWW bugs" in messages. But even if every MUA in the world adhered to the Good Net-Keeping Seal of Approval for MUAs and didn't enable "WWW bugs", all of the people who oh-so-proudly configure their SMTP Relay servers to reject messages during SMTP Relay transactions, based upon Bayesian filter results, are providing on a platter to the UBM senders a mechanism to replace "WWW bugs", that allows the UBM senders to detect whether messages pass or fail Bayesian filters and train their message-generation tools to match.

As M. Bowen explained, the "Bayesian arms race" is biased in favour of the UBM senders, who simply use Bayesian filtering against itself. Only early on will Bayesian filtering engender both low false negative and low false positive rates. Once the senders of unsolicited bulk mail adapt, the rates creep up again.

Moreover, the more trained the filter becomes, the larger the vocabulary it offers for the UBM sender to construct messages that will pass through it successfully. As M. Bowen pointed out, "using English" is not the same as "unsolicited bulk", and the better a filter is trained, the larger the core linguistic corpus it has of neutral and positive vocabulary that a UBM sender can deduce (by training "average" filters) and then employ in unsolicited bulk mail. Blocking mail that doesn't "stick out" from this core corpus won't block unsolicited bulk mail, and will increasingly instead block wanted mail.

Challenge-response systems

"Sent by a sender who fails to respond to a challenge" is not the same as "unsolicited bulk". Blocking mail with the former quality won't block mail with the latter.

It has also been argued that challenge-response systems actually exacerbate the problem. Because they are autoresponders, the malicious can use them to cause challenge messages to be generated that are targeted at a third party. The irony is that by most definitions of "unsolicited bulk", these challenge messages themselves qualify as UBM.
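A minimal sketch of the redirection, assuming a challenge-response system that decides purely on the envelope sender. The function challenge_recipient and the addresses are hypothetical and stand in for no particular product.

```python
def challenge_recipient(envelope_sender, known_senders):
    """Return the address a challenge would be sent to, or None if no challenge is needed."""
    if envelope_sender in known_senders:
        return None
    # The challenge goes to whoever the envelope *claims* sent the message,
    # which the system has no way to confirm.
    return envelope_sender


known = {"friend@example.org"}

# A message whose envelope sender has been forged to name an uninvolved third party.
target = challenge_recipient("victim@example.net", known)
print(target)
```

The third party never sent anything, yet receives the challenge.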

Treating other ISPs' customers as third-class citizens

Some ISPs foolishly treat the customers of other ISPs as third-class citizens, and discriminate against them by refusing to provide SMTP Relay service to them.

Such schemes force the use of third-party SMTP Relay servers. This is a fundamental paradigm conflict with anti-UBM schemes that concentrate upon preventing people from relaying mail through intermediate SMTP Relay servers. (Put another way: If relaying mail through an intermediary is part of the problem, it cannot be the solution.)

Even aside from the foolish discrimination, the same fundamental flaw exists. "Sent directly by a customer of another ISP" is not the same as "unsolicited bulk". Blocking mail with the former quality won't block mail with the latter.

Treating one's own customers as third-class citizens

ISPs generally treat their own customers as second-class or even first-class citizens. But some ISPs treat their own customers as third-class citizens as well, and discriminate against them by refusing to allow them to connect to any other SMTP Relay (and sometimes also SMTP Submission) services apart from the ISP's own. This usually involves interception proxy SMTP servers, but sometimes involves simple outright removal of SMTP connectivity.

This has fundamental paradigm conflicts

It prevents customers from observing, diagnosing, and remedying any mail system problems. It's also a step back of about two decades to a time when people generally weren't part of the directly-connected Internet.

Even aside from the foolish discrimination and the rather nasty way that this locks customers into their ISP for all mail services, the same fundamental flaw exists. "Sent directly by an ISP customer" is not the same as "unsolicited bulk". Blocking mail with the former quality won't block mail with the latter.

"web of trusted servers" authentication proposals

A complete web of trust is a fool's dream, and will not exist as long as people continue to be born and to die, and to change jobs in the intervening time. The idea is absurdly unscalable. Moreover, trust relationships are subtle, intransitive, and complex. Most "web of trust" schemes don't even account for the fact that C may not want to trust everything that B says about A.

In any case, "not signed by everyone whose hands it has passed through" is not the same as "unsolicited bulk". Blocking mail with the former quality won't block mail with the latter. Anonymity is not the problem.

Call-back verification systems

"Sent from a mailbox that can be demonstrated to be capable of receiving mail" is not the same as "unsolicited bulk". Blocking mail with the former quality won't block mail with the latter.

Anti-UBM schemes that enforce call-back verification of envelope sender mailboxes have a fundamental paradigm conflict with anti-UBM schemes that prevent UBM senders from performing "Rumpelstiltskin" attacks (a.k.a. "RCPT Harvesting"). In order to have call-back verification work, one has to undo all of the work that people did over the period of several years to ensure that UBM senders could not harvest mailbox names directly from SMTP Relay servers.
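The conflict can be sketched with a toy model of a server's reply to RCPT TO. The mailbox names, the hide_existence switch, and the three helper functions are assumptions for illustration; only the reply codes 250 (accepted) and 550 (mailbox unavailable) are taken from SMTP itself.

```python
# Hypothetical set of mailboxes that really exist on the server.
REAL_MAILBOXES = {"alice@example.org", "bob@example.org"}


def rcpt_reply(mailbox, hide_existence):
    """SMTP reply code that this toy server gives to RCPT TO for the mailbox."""
    if hide_existence:
        # Anti-harvesting posture: accept everything at RCPT time, reveal nothing.
        return 250
    return 250 if mailbox in REAL_MAILBOXES else 550


def callback_accepts(envelope_sender, hide_existence):
    """Call-back verification: probe the sender's server and trust a 250 reply."""
    return rcpt_reply(envelope_sender, hide_existence) == 250


def harvest(guesses, hide_existence):
    """RCPT harvesting: keep every guessed mailbox name that draws a 250 reply."""
    return [g for g in guesses if rcpt_reply(g, hide_existence) == 250]


guesses = ["alice@example.org", "nobody@example.org"]
print(harvest(guesses, hide_existence=False))        # the real mailbox is revealed
print(callback_accepts("nobody@example.org", True))  # a nonexistent sender "verifies"
```

In this model a server can either give the call-back a meaningful answer or deny the harvester one, but not both, because the two probes are the same SMTP exchange.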

"Designated sender" systems

"Designated sender" systems such as SPF designate what SMTP Relay clients may transmit mail messages with particular envelope sender mailboxes.

SPF is deeply flawed and requires changes to the mail architecture and the way that people interact with mail systems that are repugnant and that are also actually greater than would be involved in a switch to IM2000 Internet mail.

SPF also has a fundamental paradigm conflict with anti-UBM schemes that involve ISPs treating their own customers as third-class citizens. SPF allows ISPs to require that their customers send all mail messages, which have mailboxes that the ISPs manage as the senders, through themselves as intermediaries, locking their customers in to their own SMTP services. ISPs who treat their own customers as third-class citizens also lock their customers into their own different SMTP services, however.

Of course, "sent by an SMTP Relay client that one doesn't expect" is not the same as "unsolicited bulk". Blocking mail with the former quality won't block mail with the latter.
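A minimal sketch of a designated-sender check, to show the mismatch between the two qualities. It is not SPF record syntax or the SPF evaluation algorithm: the table designated, the function client_is_designated, and the domains are hypothetical, and the addresses come from the ranges reserved for documentation.

```python
import ipaddress

# Hypothetical table of which networks each domain designates as its senders.
designated = {
    "example.org": [ipaddress.ip_network("192.0.2.0/24")],
    # A bulk sender can publish a designation for its own domain just as easily.
    "bulk.example": [ipaddress.ip_network("203.0.113.0/24")],
}


def client_is_designated(sender_domain, client_ip):
    """True if the connecting client falls inside a network the domain designates."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in designated.get(sender_domain, []))


# Wanted mail from example.org, relayed onward by a forwarder at another address.
forwarded_ok = client_is_designated("example.org", "198.51.100.7")

# Unsolicited bulk mail sent from the bulk sender's own designated network.
bulk_ok = client_is_designated("bulk.example", "203.0.113.9")

print(forwarded_ok, bulk_ok)
```

In this sketch the wanted, forwarded message fails the check and the bulk message passes it.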

The road to take

The problem is an architectural one, and a quite simple one at that. It's cheap for senders to create and to transmit multiple copies of messages, and this puts them at an advantage with respect to recipients. SMTP-based Internet mail is confounded by this.

The architecture of SMTP-based Internet mail is based upon the architecture of physical mail, where it is not cheap for senders to create and to transmit multiple copies of messages. If it were cheap for senders to do this with physical mail, physical mail would suffer from the same problem. (Ironically, computer telecommunications can combine with physical mail to distribute the copying and transmission costs across a collection of senders and thereby reduce them to approximately the levels of SMTP-based Internet mail. This allows this architectural flaw to be demonstrated to exist in physical mail, as Alan Ralsky found out.)

The road to take is an architecture that actually takes advantage of the fact that sending is cheap.


© Copyright 2004,2010 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp information is preserved.