Some of what is said about HTML messages is wrong.

You've come to this page because you've made an erroneous claim about HTML messages, similar to one (or more) of the following précis (which are expanded in full further on):

These are the Frequently Given Answers to these claims.

These claims are highly inaccurate. Many of their falsehoods stem from improper analysis, and placing the blame for a particular problem on HTML messages when the proper place for blame lies elsewhere. (Alas! Sometimes the myth simply arises out of sheer bigotry and ignorance.)

These myths are often used to justify an Animal-Farm-like "text/plain good, text/html bad" mantra and a blanket refusal of HTML. However, a reasonable analysis shows that HTML is a useful tool that has its place in messaging alongside other tools and that should be used when it is the right tool for the job, and that the effort expended in decrying HTML would be better spent in combating the true causes of the problems for which HTML is incorrectly assigned the blame.

The myth about HTML not being text.

HTML isn't text.

In fact, HTML is text.

Even aside from the fact that the "T" in "HTML" stands for "text", the MIME type of an HTML body part is text/html. As RFC 2046 § 3 unequivocally states, a message body part with a text/* type contains "textual information".

An HTML document contains markup that provides logical information about the document structure and directions on how it is to be displayed. But the actual text content of an HTML document is not encoded, and the markup is not binary. In conformance with what RFC 2046 says about text/* types, one need not have a text/html viewer that understands HTML in order to "get the general idea of the content".
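As a rough illustration of this point, the following sketch uses Python's standard email package to build a small HTML message and then looks at its type and its body. The subject line and the body text are invented for the example; nothing here is taken from any particular MUA.

```python
from email.message import EmailMessage

# A hypothetical message, used only for illustration.
msg = EmailMessage()
msg["Subject"] = "Lunch"
msg.set_content("<p>Shall we meet at <b>noon</b>?</p>", subtype="html")

full_type = msg.get_content_type()      # type and subtype together
main_type = msg.get_content_maintype()  # the part before the slash

# The body as it is stored in the message: ordinary characters,
# with the markup sitting alongside the words.
raw_body = msg.get_payload()

print(full_type, "->", main_type)
print(raw_body)
```

The words of the message can be read directly in the printed body, even though nothing in the sketch understands HTML.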

The myth about HTML messages being a violation of Internet standards imposed upon us by Microsoft.

HTML messages are violations of Internet standards that are forced upon us by Microsoft.

This is a falsehood in every respect. Microsoft Outlook and Microsoft Exchange are some of the more well-known sources of HTML messages, but Microsoft didn't invent HTML. Nor did Microsoft invent MIME, the framework used to encapsulate body parts of various types (including HTML ones) within Internet messages. (Indeed, those aren't even the only MUAs capable of generating HTML messages. Netscape Messenger can, for example, as can pine; neither of which was written by Microsoft.)

Both MIME and HTML are Internet standards. And, moreover, it is entities other than Microsoft that are responsible for them. MIME is defined by a suite of Internet standards documents (RFCs 2045, 2046, 2047, 2048, and 2049) produced by an IETF working group, and HTML is defined by the World Wide Web Consortium.

The myth about HTML messages being dangerous.

HTML messages are dangerous. Both plain text and HTML mail may carry malicious executable attachments but with HTML there is a significantly greater risk since some malware can exploit vulnerabilities in the HTML parser to automatically execute code as soon as the message is viewed.

This is one of several examples of the perpetuators of these myths blaming completely the wrong thing. HTML mail isn't dangerous, as can be easily demonstrated by viewing an HTML message, even a "malicious" one, with a program such as more. Nothing untoward happens.

The message format is not dangerous. It is the message viewers that are dangerous in this particular regard.

But exploitable message-reading vulnerabilities that allow execution of sender-specified programs can exist in the viewers for any non-trivial data format, as anyone who remembers "ANSI bombs" will attest. HTML is not special in this regard. It is foolish both to blame the data format for a flaw in one of its viewers and to blame a particular data format for being associated with vulnerabilities when in fact this type of vulnerability is generic to viewers, for any non-trivial document format, that are intended to view data from potentially hostile sources.

The myth about HTML messages wasting bandwidth.

HTML messages waste bandwidth. Look at the source code of any HTML message and after the headers you'll see the message body is always duplicated, once in plain text and once in HTML. So HTML messages are always at least twice as big as plain text only, and they can be many times larger.

This is one of several examples of the perpetuators of these myths blaming completely the wrong thing. Moreover, it is a falsehood. If one looks at the raw data of many HTML messages one does not always (or even often) find duplicated message bodies.

The origin of this myth is the behaviour of Microsoft Exchange, which, when converting messages from its own internal message format into Internet message format for sending out over SMTP, creates a multipart/alternative body containing two renderings of the original "rich text" message, one as text/plain and the other as text/html.

Ironically, Microsoft Exchange is actually following the spirit of RFC 2046 § 5.1.4 here. That section even uses as its very example the sending of different alternative text/* bodyparts in order to allow readers to view whichever one their MUA is capable of displaying best. If sending the same thing in multiple formats as a convenience to the recipient is a bad thing to be doing, it's a bad thing that is described and promoted by an Internet standards document. (This is another fact that refutes the myth that Microsoft invented and promoted this whole notion of HTML messages.)

However, the behaviour of Microsoft Exchange is not the case for all MUAs and MTSes, especially for those that adhere to the GNKSoA:MUA requirements. Netscape Messenger, for example, presents the sender of an HTML message with the option of sending both HTML and "plain text" versions (as a multipart/alternative message), or of just sending a text/html message body without an alternative. If one looks at the raw data of an HTML message that is sent as "HTML only" by Netscape Messenger, one does not see the same content in two different formats. One sees just the one, text/html, version of the content and the argument presented by the myth thus falls apart.

Moreover, an argument equally as valid as the one propounded by the myth is that given that "HTML and 'plain text'" messages are "at least twice as big" as "HTML only" messages, it is text/plain that is the waste of bandwidth, and that one should send messages as HTML only. The knife cuts both ways.
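The difference between the two forms can be seen in a small sketch, again using Python's standard email package. The message texts and the function names html_only and plain_and_html are invented for the example, and the exact byte counts will vary with headers and boundary strings, so none are quoted here.

```python
from email.message import EmailMessage

# Hypothetical content, the same information in two renderings.
TEXT = "Meeting moved to 3pm.\n"
HTML = "<p>Meeting moved to <b>3pm</b>.</p>\n"

def html_only():
    """A message whose body is a single text/html part."""
    msg = EmailMessage()
    msg.set_content(HTML, subtype="html")
    return msg

def plain_and_html():
    """A multipart/alternative message carrying both renderings."""
    msg = EmailMessage()
    msg.set_content(TEXT)
    msg.add_alternative(HTML, subtype="html")
    return msg

single = html_only()
both = plain_and_html()

print(single.get_content_type(), len(single.as_bytes()))
print(both.get_content_type(), len(both.as_bytes()))
print([part.get_content_type() for part in both.iter_parts()])
```

Only the second message contains the content twice. The duplication comes from the multipart/alternative wrapper, and not from either of the body part types inside it.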

In fact, HTML is one of a range of options available for sending messages, and there are circumstances where it is the most appropriate tool for the job (i.e. where the message contains complex formatting codes and structure but the exact layout is not important), and circumstances where it is not.

The real criticism should not be levelled at HTML at all, but at people who don't choose the appropriate format for what they are actually sending, and at people who follow the example set by the Internet standard and send the same content in multiple formats. If anything, it is multipart/alternative that is the problem here, not text/html. As has been already said, the myth is blaming the wrong thing.

The myth about universal legibility.

HTML messages don't always work. Some popular message readers simply don't read HTML mail.

The error in this myth can be more clearly seen if it is presented in the more personal form in which it often occurs:

I cannot read HTML messages with my message reader.

The immediately obvious response to the person propounding this myth is that they should use a message reader that can. The rest of the world should not be denied the use of a perfectly legitimate tool just because they have problems. But in fact the situation is even more subtle than that.

Anyone who claims that they "cannot read HTML" using their message reader is

since RFC 2046 § 4.1.4 clearly states that a message reader that doesn't specifically understand text/html should treat it as if it were text/plain. If they comply with RFC 2046 § 4.1.4, message readers can be used to read HTML. (If they don't comply, they should be fixed. Note that message readers that don't understand MIME actually comply in this regard, since they do, effectively, treat the message body as if it were text/plain.)

RFC 2046 § 4.1 states of text/* types that

an interesting characteristic of many such representations is that they are to some extent readable even without the software that interprets them.
This is true of HTML. A text/html message viewed as if it were text/plain will have the actual text content visible. It will simply be unformatted and interspersed with the markup.
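A minimal sketch of the fallback that RFC 2046 § 4.1.4 describes might look like the following. It assumes a hypothetical viewer that only knows how to format text/plain; the names KNOWN_TEXT_SUBTYPES and render, and the sample message, are invented for the illustration.

```python
from email import message_from_string

# Subtypes that this hypothetical viewer knows how to format specially.
KNOWN_TEXT_SUBTYPES = {"plain"}

def render(part):
    """Return (subtype used for display, text) for a text/* part, else None."""
    if part.get_content_maintype() != "text":
        return None
    subtype = part.get_content_subtype()
    charset = part.get_content_charset() or "us-ascii"
    text = part.get_payload(decode=True).decode(charset, errors="replace")
    if subtype not in KNOWN_TEXT_SUBTYPES:
        # Unrecognised text subtype: show it as if it were text/plain.
        subtype = "plain"
    return subtype, text

RAW = (
    "Content-Type: text/html; charset=us-ascii\n"
    "\n"
    "<p>The build is <em>green</em> again.</p>\n"
)

treated_as, shown = render(message_from_string(RAW))
print(treated_as)
print(shown)
```

The text that reaches the reader is unformatted and interspersed with markup, but the words themselves are all there.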

Of course, it's worth pointing out that no message body part type is universally legible, not even supposed "plain text". (Consider the possibility of the sender and recipient not sharing a common character set.) So the idea that only universally legible body parts should be allowed in messages would, if put into practice, outlaw all body part types.

A good example of a lack of universal legibility not actually being the problem that this myth makes it out to be is the PNG format for images. Not all message readers can display PNG images. Yet the proponents of this particular myth don't go around asserting that PNG body parts in messages should be banned because some popular message readers cannot display them. (Indeed, experience shows that they often use the very same response as above, and assert that the recipient should obtain a message reader that can display PNG files.)

The myth about HTML messages "connecting to Internet".

HTML messages can connect to Internet by themselves. If you're off-line, opening an HTML email containing images may (by default) open a connection to Internet.

This is one of several examples of the perpetuators of these myths blaming completely the wrong thing. It's also a mis-statement. This is not a problem with HTML messages. This is a problem with the message viewers, which has to be dealt with for any compound document format, not just for HTML. Indeed, it's only a problem with some, badly designed, message viewers that don't adhere to the GNKSoA:MUA requirements.

Moreover, it is not the HTML messages that "connect to Internet by themselves". HTML messages are passive entities. It is the (badly designed) message viewers that connect to Internet.

The problem is that the (badly designed) message viewers don't properly "sandbox" message rendering, but instead will fetch data for display that weren't actually contained within the message itself. If an HTML body part contains an external reference (to an image or to a frame, say) the viewer will attempt to connect to an HTTP server and fetch the object referenced for display.

The irony is that this sort of misfeature is the price that one pays for monolithic "integrated applications", the concept that Marketing Folks would have us believe is a good thing.

For example: Netscape Messenger is a web browser, a NUA, and an MUA. It uses the same engine for displaying HTML body parts in news and mail messages as it does for displaying web pages. As a consequence, an external reference from an HTML message will cause the engine to attempt to connect to an HTTP server and fetch the object referenced.

For another example: Although not superficially integrated, Microsoft Outlook is in fact integrated with Microsoft Internet Explorer behind the scenes. It uses Internet Explorer's display engine to display HTML messages. Again, as a consequence, an external reference from an HTML message will cause the engine to attempt to connect to an HTTP server and fetch the object referenced.

However, the fault here is in badly written viewers. A well-written viewer for HTML messages will "sandbox" message rendering, and not fetch data for display from anywhere apart from the message contents themselves. Such well-written viewers do exist in NUAs and MUAs. The internal text/html viewer in pine, for example, does this sort of "sandboxing".
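One way that a viewer might go about this is sketched below. This is an illustration under stated assumptions and not a description of how pine or any other real viewer works: the list of automatically loaded attributes is deliberately partial, the names AUTOLOAD_ATTRS, ExternalReferenceFinder, and references_to_block are invented, and the sketch assumes that "cid:" URLs (RFC 2392), which point at other parts of the same message, are the only references that the viewer is willing to resolve.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# (tag, attribute) pairs that a viewer would load without being asked.
# An assumed, partial list for the purposes of the sketch.
AUTOLOAD_ATTRS = {("img", "src"), ("frame", "src"), ("iframe", "src")}

class ExternalReferenceFinder(HTMLParser):
    """Collects references that point outside the message itself."""

    def __init__(self):
        super().__init__()
        self.blocked = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in AUTOLOAD_ATTRS and value:
                # Anything that is not a "cid:" reference would need
                # data from outside the message, so it is not fetched.
                if urlsplit(value).scheme.lower() != "cid":
                    self.blocked.append(value)

def references_to_block(html):
    finder = ExternalReferenceFinder()
    finder.feed(html)
    finder.close()
    return finder.blocked

BODY = (
    '<p>Hello</p>'
    '<img src="http://tracker.example/pixel.png">'
    '<img src="cid:logo@example">'
)
print(references_to_block(BODY))
```

A viewer built along these lines would display the message using only what arrived in it, and would make no connection to any HTTP server.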

Criticising a data format for the fact that "integration" has been taken too far in message viewers is blaming the wrong thing. The right thing to do is to complain to the vendors of the particular message viewers and have the software fixed – and to return it for a refund and replace it with other, better, software if it is not.

The myth about HTML messages rendering slowly.

HTML messages render slowly. Some mail apps can slow down considerably when rendering HTML.

This is one of several examples of the perpetuators of these myths blaming completely the wrong thing. It's also an example of the false notion that one can obtain something for nothing. The fact that complex data formats take more CPU power to render than trivial ones is not confined solely to HTML. And if one needs text to be formatted in a certain way, one has to perform the calculations to do so.

(PDF files render "slowly", too, for example, as do wordprocessor documents. It's simply the nature of the beast when a display engine has to actually process formatting information and lay out text, rather than just splat out one character after another without having to deal with fonts, tables, styles, paragraphs, lists, and whatnot.)

The myth about HTML messages causing code bloat.

The need for an HTML parser has led to code-bloat in email apps generally.

This is, at best, a mischaracterization. The need for handling the recoding necessitated by the charset parameter to text/plain messages has also led to an increase in the size of message readers, as has the need for handling PNG images.

Calling this "code bloat" is a subtle propaganda technique. This isn't "code bloat". "Code bloat" is where the program is unnecessarily large because it is inefficient, poorly written, or full of unused junk.

Rather, this is merely a simple truth: If one wants a message reader to handle a particular set of data formats, one has to have code to handle displaying those data formats. The more distinct data formats there are, the more code there will have to be.

The size of message readers has grown because the different sorts of stuff that there can actually be in messages has grown. Messages can now contain text in different character sets that has to be transliterated, images, sounds, and text with formatting instructions.

The myth about HTML imposing font and colour choices.

HTML messages are not always reader-friendly. HTML allows the sender to use unreadably small or non-standard fonts, clashing colours, badly formatted images and sometimes there is no quick or easy way for the reader to adjust the appearance to their choice.

In fact, the original intention of HTML was precisely that the document format specified just logical styles and the viewer allowed the user to specify which actual fonts and colours those styles would map to. To quote Composing Good HTML:

HTML provides a device-independent way of describing information. The elements of HTML describe what your information is, not how it should be displayed.

The fact that some HTML viewers don't adhere to this principle, and don't give full control over fonts and colours to the user, is not a fault in HTML, but a deficiency in those particular viewers. The correct target for complaint should be the manufacturers of such deficient viewers.

If one wants specific fonts and colours, HTML is not (originally intended to be) the correct tool for the job, and one should employ one of the other options available for sending messages.

It's also worth noting that it's not just text/html that allows the sender to make a message difficult to read. There are lots of tricks that one can play even with text/plain if this is the goal, such as

The myth about digests

HTML messages cause problems for mailing list digests.

This myth is very vague and amounts to little more than hand-waving. No detailed explanation of it has ever turned up. HTML messages cause unspecified problems for mailing list digests somehow.

In fact, HTML messages don't cause problems for mailing list digests. The standard format for mailing list digests is the multipart/digest content type, defined in RFC 2046 § 5.1.5. This has no problem whatever with HTML messages. (It's entirely neutral with respect to the content types of the individual messages in the digest.)
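This neutrality can be illustrated with a short sketch that uses Python's standard email package to assemble a digest from one text/plain message and one text/html message. The subjects and bodies are invented, and make_message is a hypothetical helper written for the example.

```python
from email.message import EmailMessage
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart

def make_message(subject, body, subtype):
    """Build a small single-part message (hypothetical helper)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content(body, subtype=subtype)
    return msg

digest = MIMEMultipart("digest")
digest["Subject"] = "Example list digest"

# Each enclosed message is wrapped as message/rfc822; the digest
# itself never looks at the content type inside the wrapper.
digest.attach(MIMEMessage(make_message("Plain", "Hello.\n", "plain")))
digest.attach(MIMEMessage(make_message("Formatted", "<p>Hello.</p>\n", "html")))

enclosed = [
    wrapper.get_payload(0).get_content_type()
    for wrapper in digest.get_payload()
]
print(digest.get_content_type(), enclosed)
```

Both messages are carried in exactly the same way. The digest structure is the same whichever body part type each enclosed message happens to use.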

The myth about a one-to-one correlation with UBM

All HTML messages are unsolicited bulk mail.

This is, for those people who assert it, a self-fulfilling assertion. They encourage people not to send HTML messages to them, because "all HTML messages are unsolicited bulk mail", and as a consequence they receive few or no HTML messages that are not unsolicited bulk mail.

But in fact this is a classic example of an anti-UBM measure for SMTP-based Internet mail that identifies the wrong thing. "Has a body with the text/html content type" is not the same as "unsolicited bulk".

Moreover, this is not even an inherent property of HTML messages. The same could be said of, say, messages written in Japanese. "All messages in Japanese are unsolicited bulk mail. Don't send messages in Japanese to me!" cry people, and soon their assertion becomes self-fulfilling, for them.

The simple truth is that people can and do send HTML messages which are not unsolicited bulk mail. (The author regularly receives HTML messages, that are not UBM, from several correspondents.) HTML messages are one of the range of options available for sending messages.

The myth about people who filter HTML messages

People filter HTML messages. So don't send them.

People filter mail on all sorts of (often spurious) grounds. If one were to stop sending messages that contained things that people filtered, one wouldn't be able to send any type of mail at all.

The fact that recipients choose to filter HTML messages is no-one's problem but their own, and is entirely self-inflicted. Senders shouldn't adopt this problem.


© Copyright 2004-2004 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp is preserved.