Proper HTML

Clean, strict markup is the best way to go.

What is “proper” HTML? Is it strictly standards-based? Or is it “de facto” based on the most popular browser? What difference does it make? Is it worth the effort? Why’s there still so much legacy markup around, and what’s wrong with it?

A few years back, I started converting all my HTML work to XHTML-1.0 Strict and running it through the validator. It kept saying things like, “There is no such tag ‘WIDTH’” and “There is no such tag ‘FONT’” and so forth. Thus, I learned what is Strict and what is Transitional, and after a while it started to make sense. Now it makes a lot of sense.

Many good HTML coders learned HTML way back in the ’90s, when tags were UPPERCASE and unclosed, and proprietary attributes were ammunition in the browser wars of the decade. But now it’s time to catch up.

The Myths

It’s amazing how myths, traditions, and ideologies proliferate, and HTML markup is no exception. Let’s dig around in a few of them, just for fun.

  • Proper markup must separate presentation from content. Baloney. Can’t be done! CSS helps a lot, but it’ll never remove all presentation (particularly layout), unless we want the web to become completely unformatted text.
  • XHTML is better than HTML. Nonsense. XHTML doesn’t fix anything, and doesn’t add anything that can’t be done with proper, strict HTML. In fact, all XHTML adds is that goofy extra space and slash. There is no need whatsoever for HTML to be enclosed in an XML wrapper, and the Consortium agrees: they have stopped work on XHTML2 and are pursuing HTML5, and that will fix a bunch of HTML’s shortcomings.
  • Tables are not for layout. Baloney. As I demonstrate in other articles, tables may be fully CSS-controlled, with no deprecated markup, and the Consortium is by no means even considering abandoning tables. Indeed, layout tables should not be deeply nested, and care should be taken to ensure linear accessibility, but the same caveats (and more) also apply to floated div layouts.
  • Fixed-width layouts are easier and faster. Nonsense. Fluid layouts are easier to build and maintain, and clients prefer them. Most web sites are fixed-width right now simply because it’s a fad created by certain books, software, and a million copycat templates. Unfortunately, this fad leaves new devs without the superior fluid-width design techniques.
  • As long as the browser renders it, what difference does it make? Well, for starters, future browsers will gradually drop support for deprecated attributes and unclosed tags. The XHTML craze has caused a lot of devs to clean up their markup, but alas, tons of the old stuff remains on the web. The Consortium has been saying for years that all new pages should be written to HTML 4.01 Strict, and for good reason. That is still state-of-the-art markup, and pages so written will more easily transition to HTML5.
  • Strict is too unforgiving! Nonsense. Writing in Strict, with good CSS, is so much easier than Transitional with all its goofy inline attributes that there’s no comparison. Besides, Transitional (both HTML and XHTML) is no better than no doctype at all. I can’t even remember all that bgcolor cellpadding cellspacing bordercolor font align nonsense, and good riddance!
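
To make the table and fluid-layout points a bit more concrete, here is a minimal sketch of a two-column fluid page built on a CSS-controlled table in Strict markup. The class names (layout, nav, main), the percentage widths, and the link targets are placeholders picked for illustration, not taken from any real site.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
 "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
 <title>Fluid layout table (sketch)</title>
 <style type="text/css">
  /* Hypothetical class names; rename to taste */
  table.layout    { width:100%; border-collapse:collapse; }
  table.layout td { vertical-align:top; padding:0 1em; }
  td.nav          { width:20%; } /* percentages keep the layout fluid */
  td.main         { width:80%; }
 </style>
</head>
<body>
 <!-- One un-nested table, no deprecated attributes -->
 <table class="layout" summary="Page layout: menu, then main content">
  <tr>
   <td class="nav">
    <ul>
     <li><a href="/">Home</a></li>
     <li><a href="/articles/">Articles</a></li>
    </ul>
   </td>
   <td class="main">
    <h1>Page title</h1>
    <p>Read top to bottom, the source still makes sense:
       menu first, then the article.</p>
   </td>
  </tr>
 </table>
</body>
</html>
```

Resize the window and both columns stretch with it. All of the sizing lives in the style block, so changing the proportions never touches the markup.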

Well, YMMV… And it’s all what you’re used to… But if you want to go Strict, here are a few pointers.

Going Strict

  • Always use a doctype:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
     "http://www.w3.org/TR/html4/strict.dtd">
    Failing to use a doctype makes browsers render in quirks mode. Strict markup won’t work right without a Strict doctype, and various things are broken in various browsers in quirks mode (i.e., strict isn't perfect, but quirks is completely unpredictable across various browsers).
  • Don’t mess with XML prologs or XML namespaces or anything xml unless actually writing xml for a server that’ll actually serve xml to a client that’ll actually render xml. RSS is one good example—and a web page is not.
  • Don’t use UTF-8 encoding unless the editor is actually saving UTF files and the CSS is calling Unicode fonts. iso-8859-something is still fine for most purposes.
  • Lower-case all markup. If it ain’t in quotes, it’s all lower-case. Except in the DOCTYPE.
  • Close all tags that have a matching closure. That’s everything but meta, link, input, img, br, and hr. Unclosed p and li of yesteryear will work in quirks mode (no doctype), but a bunch of other stuff will break. Also, forget XHTML and the goofy space-slash closure.
  • Don’t put any extra attributes in tags except class, style, name, or id. Take everything possible to CSS. There are a few exceptions, such as colspan and rowspan, for which there are no CSS alternatives, and those are not deprecated. A very few others, like target="_blank", are easily handled with a bit of JavaScript. The Absmiddle effect is handled with a two-cell CSS-controlled table.
  • Use CSS to override the defaults for every HTML tag used. Why browsers still default to 1995-style output is a mystery, but thanks to CSS we don’t have to live with it. Doing so, we can make a lovely, custom web page with almost no extra markup. Think of CSS as a way to reprogram the browser’s rendering engine, then just feed it clean HTML.
  • Some folks want to give every element an id then write CSS for it. Inefficient! An id may be used only once per page, but a CSS class can be used as many times as desired. Indeed, a .class, not tied to a tag selector, may be used with all tags to which the rule may apply. Also, the stock HTML tags may be CSS’d as defaults, eliminating much markup in the page.
  • Learn to use the named character entities like &mdash;, and forget numeric entities.
  • Master CSS-controlled tables. They reliably solve a lot of problems and are not so messy once all those deprecated attributes are stripped out.
  • Web 2.0, AJAX, and DOM manipulation are all the rage. I say, if you really need it, use it; if you really don’t, don’t. Most web pages really don’t need to act like “live” applications. And most users are confused and annoyed by all this expanding/collapsing nonsense. Sorry, but I’d rather just roll the mouse wheel and scroll.
  • Learn to write HTML for accessibility. Images need meaningful alt text, tables need summaries, and links need titles. Hidden jump links are a good idea, too.
  • The validator isn’t perfect, but it’s a whole lot better than nothing. Learn to use it and trust it. When you finally get that green banner, you know you’ve learned something, and have the reward of knowing you’re doing pretty good work.
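
Several of the pointers above can be sketched together in one small page. Treat this as a rough illustration under a few assumptions: using rel="external" as the hook for new-window links is just one common convention, and the class name note, the link target, and the text are made up for the example.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
 "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
 <title>Strict pointers (sketch)</title>
 <style type="text/css">
  /* One class, reusable on any tag, any number of times per page */
  .note { color:#666666; font-size:0.9em; }
 </style>
 <script type="text/javascript">
  // Assumed convention: links marked rel="external" open a new window,
  // so the markup needs no target attribute (not allowed in Strict).
  window.onload = function () {
   var links = document.getElementsByTagName('a');
   for (var i = 0; i < links.length; i++) {
    if (links[i].getAttribute('rel') === 'external') {
     links[i].onclick = function () {
      window.open(this.href);
      return false; // cancel the normal same-window navigation
     };
    }
   }
  };
 </script>
</head>
<body>
 <!-- Jump link for keyboard and screen-reader users -->
 <p><a href="#content" title="Skip to the main content">Skip to content</a></p>
 <h1 id="content">Bla bla bla.</h1>
 <p class="note">A named entity &mdash; and a class instead of an id.</p>
 <ul>
  <li class="note">The same class on a different tag.</li>
 </ul>
 <p><a href="http://www.w3.org/" rel="external"
  title="World Wide Web Consortium (opens a new window)">W3C</a></p>
 <p><img src="some.gif" alt="Short description of the image"></p>
</body>
</html>
```

With scripting turned off, the external link simply opens in the same window, so nothing breaks.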

Some Examples

Let’s take some deprecated markup and fix it.

<HTML><HEAD><TITLE>Deprecated Tag Soup</TITLE></HEAD>
<BODY BGCOLOR=WHITE LINK=0000FF ALINK=FF0000 VLINK=FF00FF>
<H1>Bla bla bla.</H1>
<P>Some text.<P>Some more text.<P>Don't close me or else!
<DIV ALIGN=CENTER><CENTER>
<TABLE BGCOLOR=FFFFFF CELLSPACING=2 CELLPADDING=4 BORDERCOLOR=BLACK BORDER=1>
<TBODY>
<TR>
<TD ALIGN=LEFT VALIGN=TOP><FONT SIZE=2 FACE=ARIAL COLOR=0>Bla Bla Bla</FONT></TD>
<TD ALIGN=LEFT VALIGN=TOP><FONT SIZE=2 FACE=ARIAL COLOR=0>Bla Bla Bla</FONT></TD>
<TD ALIGN=LEFT VALIGN=TOP><FONT SIZE=2 FACE=ARIAL COLOR=0>Bla Bla Bla</FONT></TD>
</TR>
<TR>
<TD ALIGN=LEFT VALIGN=TOP><FONT SIZE=2 FACE=ARIAL COLOR=0>Bla Bla Bla</FONT></TD>
<TD ALIGN=LEFT VALIGN=TOP><FONT SIZE=2 FACE=ARIAL COLOR=0>Bla Bla Bla</FONT></TD>
<TD ALIGN=LEFT VALIGN=TOP><FONT SIZE=2 FACE=ARIAL COLOR=0>Bla Bla Bla</FONT></TD>
</TR>
<TR>
<TD ALIGN=LEFT VALIGN=TOP><FONT SIZE=2 FACE=ARIAL COLOR=0>Bla Bla Bla</FONT></TD>
<TD ALIGN=LEFT VALIGN=TOP><FONT SIZE=2 FACE=ARIAL COLOR=0>Bla Bla Bla</FONT></TD>
<TD ALIGN=LEFT VALIGN=TOP><FONT SIZE=2 FACE=ARIAL COLOR=0>Bla Bla Bla</FONT></TD>
</TR>
</TBODY>
</TABLE></CENTER></DIV>
<P><IMAGE SRC="some.gif" ALIGN=RIGHT HSPACE=10 VSPACE=10 BORDER=0 WIDTH=100 HEIGHT=100>
<UL>
<LI>Some text.
<LI>Some text.
<LI>Don't close me either!</UL></BODY></HTML>

UGH! Writing that gave me a headache! It’s no wonder some people hate HTML. Just imagine what that kind of mess looks like in a real web page with three columns and nested tables and images and embeds and content…

Now let’s see what strict markup looks like. First the CSS:

body { margin:20px; font:1.0em/1.5em Verdana,Geneva,"DejaVu Sans",sans-serif;
       background:url('paper.jpg') scroll repeat left top; }
a         { color:#3344CC; text-decoration:none; }
a:visited { color:#3344AA; }
a:hover   { color:#CC3333; background:#DDDDDD; }
p         { margin-top:0; margin-bottom:0.5em; line-height:1.5em; }
ul,ol     { margin-top:0; margin-bottom:0.5em; line-height:1.5em; }
li        { margin-top:0.5em; margin-bottom:0; }
h1,h2,h3,h4,h5 { margin:0 0 0.5em 0; color:#2C7590;
 font-family:'Century Gothic',futura,'URW Gothic L',sans-serif; }
table     {
 border-collapse:collapse; width:auto; margin:0.5em auto 1em auto;
 background:url('img/paper2.jpg') repeat scroll left top; }
th,td     {
 padding:0 3px 1px 3px; vertical-align:top;
 background:transparent; border:2px solid #FFFFFF;
 font-size:0.9em; text-align:left; }
th { color:#2C7590; text-align:left; } /* center is default for th */
td { color:#404040; }  /* left is default for td */
img { border:0; }
.fr { float:right; margin:0 0 10px 10px; }

And then the HTML:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
 "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
 <meta http-equiv="Content-Language" content="en-us">
 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
 <meta name="description" content="Some description">
 <meta name="keywords" content="some keywords">
 <title>Strict HTML</title>
 <link rel="stylesheet" href="the-above.css" type="text/css">
</head>
<body>
 <h1>Bla bla bla.</h1>
 <p>Some text.</p>
 <p>Some more text.</p>
 <p>Close my tags!</p>
 <table>
  <tr>
   <td>Bla Bla Bla</td>
   <td>Bla Bla Bla</td>
   <td>Bla Bla Bla</td>
  </tr>
  <tr>
   <td>Bla Bla Bla</td>
   <td>Bla Bla Bla</td>
   <td>Bla Bla Bla</td>
  </tr>
  <tr>
   <td>Bla Bla Bla</td>
   <td>Bla Bla Bla</td>
   <td>Bla Bla Bla</td>
  </tr>
 </table>
 <p><img class="fr" src="some.gif" alt="Some Gif"></p>
 <ul>
  <li>Some text.</li>
  <li>Some more text.</li>
  <li>Close mine, too!</li>
 </ul>
</body>
</html>

Whew! That’s better. Okay, so I built and then destroyed a straw-man—I admit it. But let’s make a few observations.

  • The first one generates multiple errors in the validator. The second one validates as HTML 4.01 Strict on the first try.
  • One is Garbage, the other Strict. But they both run okay, and render similarly.
  • One is a mess to write and maintain, the other is pure HTML simplicity.
  • Notice that I didn’t have to use any ids or div wrappers, and used just one class (.fr).
  • All that deprecated stuff is removed and represented by modifying HTML tag defaults with CSS.

It hardly seems worth the effort to learn and write all that CSS for one page, but what about 10 pages, or 100? The more pages one adds, the better that CSS looks! Add header, footer, and menu include files, and you end up with lean code indeed: as minimally marked as possible, with nothing repeated.
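
As a rough sketch of the include-file idea, here is what one page might look like using Apache-style server-side includes. This assumes the server has SSI enabled for the page, and the /inc/ paths and file names are hypothetical. Any other include mechanism gives the same result.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
 "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
 <title>Page with includes (sketch)</title>
 <link rel="stylesheet" href="the-above.css" type="text/css">
</head>
<body>
 <!-- Shared fragments, pulled in by the server before the page is sent -->
 <!--#include virtual="/inc/header.html" -->
 <!--#include virtual="/inc/menu.html" -->
 <h1>Bla bla bla.</h1>
 <p>Only this page's own content lives in this file.</p>
 <!--#include virtual="/inc/footer.html" -->
</body>
</html>
```

Each included file holds only a fragment of markup (no doctype, head, or body of its own), so the assembled page can still validate as Strict.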

Please see the other articles in this series for more in-depth examples.
