The XHTML WYSIWYG Editor For Desktop & Web Applications

WYSIWYG Editors And Bad Markup

WYSIWYG editors are a key component of content management systems. They empower non-technical users to manage rich content efficiently and intuitively. Unfortunately, WYSIWYG editors are notorious for generating "bad" markup (or "dirty code"). In the longer term, the problems that bad markup creates can outweigh the benefits that WYSIWYG editors offer.

So what is bad markup and what can you do about it? This article gives examples of bad markup created by other WYSIWYG editors and explains how XStandard makes sure that business users generate "clean", standards-compliant markup every time.

Since Web developers and designers have differing opinions on what constitutes bad markup, it's more productive to define what "clean" markup is.

  1. Clean markup is standards-compliant. Whether you are using HTML 4 or the latest version of XHTML, markup tags must be used correctly - i.e. according to W3C specifications.
  2. Clean markup also follows best practices: techniques that have proven themselves through extensive professional use and are therefore preferred over the alternatives.

Markup that is not based on standards and that does not follow best practices can therefore be considered "bad". The following are classic examples of "bad" markup practices, and how using XStandard avoids them.

Example Of Bad Markup: Incorrect Use Of <blockquote>

Of all the "bad" markup generated by using tags incorrectly, the most commonly misused tag is <blockquote>. Although <blockquote> should only be used to contain text that is a quotation, nearly all WYSIWYG editors wrongly use it for indenting, as we see in the toolbar screenshot below.

Toolbar with icons for increase and decrease indent.

There are two strong reasons for not using block quotes for indenting.

First, indenting only moves text in from the left margin, whereas browsers typically render <blockquote> with margins on both the left and the right. So <blockquote> is simply the wrong markup for the job. Instead, indents should be rendered using CSS, as in the following example:

CSS
p.indent {margin-left:40px}

Markup
<p class="indent">The quick brown fox jumped over the lazy dog.</p>

Screenshot of how blockquote will justify text on the left and the right, while the CSS approach will correctly indent text from the left.

Second, block quotes carry semantic meaning, whereas "indent" has no semantic meaning at all. "Indent" says nothing about the data that is indented. It cannot, for instance, cause indented text to be read in a male voice by a screen reader. By contrast, text surrounded by <blockquote> can "tell" an application to read the text in a male voice, or in any other way supported by the auditory user-agent.

How XStandard addresses <blockquote>

XStandard has different toolbar buttons for indenting and for block quotes. See the screenshot below:

XStandard toolbar showing 'Remove Block Quote', 'Add Block Quote' and 'Indent' buttons.

By default, the indent icon in XStandard creates the following markup but can be easily customized to suit your specific CSS:

<p class="indent">

Example Of Bad Markup: Incorrect Use Of <div>

To imitate the tighter line spacing between paragraphs that is typically found in word processors, many WYSIWYG editors use <div> tags instead of <p>. Here is an example:

Three lines of code using div tags instead of paragraph tags.

The <div> tag is semantically meaningless and should only be used for grouping, whereas the correct tag for marking up a paragraph is <p>. If a line break is needed without beginning a new paragraph, then the <br> tag should be used, not <div>. In most WYSIWYG editors, pressing Shift-Enter creates a <br> tag. Spacing between paragraphs is formatting, so tighter spacing should be achieved via CSS. For example: p {margin: .2em 0}

How XStandard Uses The <div> Tag Correctly

XStandard uses <p> for paragraph breaks and <br> for line breaks. XStandard treats <div> tags like a layer for grouping content.

Example Of Bad Markup: Illegal Characters

To allow seamless copy and paste from word processors, WYSIWYG editors accept characters that are in fact illegal for the encoding they support. The most common illegal characters are curly quotation marks (”), long dashes (—) and ellipses (…). If the encoding used for the generated markup cannot represent these characters, they should be written as named entities or as decimal character references.

How XStandard Deals With Illegal Characters

XStandard's native character encoding is Unicode, so it can use special characters without escaping them. When interacting with content management systems that do not support Unicode, XStandard can convert special characters to their decimal values.
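As a rough illustration of what converting special characters to decimal values means, here is a minimal Python sketch. It is not XStandard's own code; the function name to_decimal_references is made up for this example, and it relies only on Python's standard "xmlcharrefreplace" error handler.

```python
def to_decimal_references(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference.

    Hypothetical helper for illustration only; plain ASCII passes through unchanged.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Curly quotes (U+201C, U+201D) and an ellipsis (U+2026), written as escapes.
sample = "\u201cClean markup\u201d matters\u2026"
print(to_decimal_references(sample))
# prints: &#8220;Clean markup&#8221; matters&#8230;
```

The result contains only ASCII characters, so it survives storage in a system that cannot handle Unicode, while a browser still displays the original characters.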

Example Of Bad Markup: Bloated Markup

WYSIWYG editors are notorious for generating bloated markup, and the tag that generates most bloated markup is the <font> tag. Whether the editor inserts the <font> tag itself, or has a color-picker or font-selector that lets users do it manually, the end result is bloated markup. For example:

Markup for a 1 row, 3 column table using the same font tag in each cell.

Using CSS is far more efficient as we can see in the example below:

CSS:
table {font-family:arial;font-size:1em;color: #000000}

Markup:
Markup for a 1 row, 3 column table with no font tags.

How XStandard Creates Lean Markup Every Time

XStandard generates lean code. Formatting is done exclusively through external or embedded CSS, so the constructs responsible for bloated code (<font> tags and style attributes) are never used.

Example Of Bad Markup: Mixing Formatting Models

Combining external or embedded CSS with inline CSS, <font> tags and formatting elements is bad because it results in "spaghetti code", meaning the intent of the markup is not evident from the way it looks. The screenshot below shows one example of this:

Markup using external/embedded CSS, inline CSS, font tag and bold tag.

How XStandard Avoids Mixing Formatting Models

As recommended by the latest XHTML specification, XStandard uses only external or embedded CSS for formatting. Deprecated or outdated constructs such as the <font> tag and the style attribute are never used.

Example Of Bad Markup: Incorrect Use Of Alt (Alternate) Text

Images enhance the visual experience of those who are sighted, but for those with disabilities that limit vision, for users of small screen devices with limited display areas, or for search engine applications, alternate text becomes an important replacement for images. Alt text is therefore a crucial aspect of "clean" markup, yet most WYSIWYG editors do not encourage the use of alt text at all.

WYSIWYG editors that support file upload often insert the image file name as the alt text, but this results in meaningless alt text such as the following:

<img src="images/x123001.gif" alt="x123001.gif" />

Many WYSIWYG editors also make the mistake of considering alt text and "tooltip" to be interchangeable, which they are not. Tooltip is placed inside the title attribute while alt text is placed inside the alt attribute. For example:

<img src="tv.gif" alt="Wide-screen television." title="On Sale Now!" />

WYSIWYG editors also rarely distinguish between images that are decorative versus images that are informative, leading to distortions in the meaning of content. Informative images transmit semantic meaning to devices such as accessibility screen readers and so require alt text. By contrast, decorative images (such as spacers, bullets, borders, etc.) are merely "eye-candy", convey no semantic meaning at all and should not use alt text. To make decorative images invisible to non-visual devices, the setting should be alt="".
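The alt text problems described so far can be checked mechanically. The sketch below is one possible audit written in Python with the standard html.parser module; the class AltTextAudit and the function audit are hypothetical names, and the two rules it applies (missing alt attribute, alt text equal to the file name) are only the ones discussed above. It cannot tell whether an image is decorative or informative; that judgment still belongs to the author.

```python
import posixpath
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Collect (src, problem) pairs for <img> tags with suspicious alt text."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        alt = attributes.get("alt")
        if alt is None:
            self.problems.append((src, "missing alt attribute"))
        elif alt and alt == posixpath.basename(src):
            self.problems.append((src, "alt text repeats the file name"))
        # alt="" is accepted: it marks the image as decorative.


def audit(html: str):
    parser = AltTextAudit()
    parser.feed(html)
    parser.close()
    return parser.problems


print(audit('<img src="images/x123001.gif" alt="x123001.gif" />'))
# prints: [('images/x123001.gif', 'alt text repeats the file name')]
```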

The example below shows markup where alt text is used incorrectly for decorative images. Listen to the sound file to hear the confusion this creates when the markup is processed by an auditory user-agent such as a screen reader:

Markup showing the incorrect use of alt text for decorative images. The words 'Red ball' are used as alt text for each image in front of a list item.


How XStandard Uses Alt (Alternate) Text Correctly

When users upload images into XStandard, they are prompted to identify the image as decorative or informative. If the user identifies the image as decorative, an empty alt attribute is automatically created and the title and longdesc attributes are removed. If the image is identified as informative, the alternate text becomes required. To make sure the alt text is not confused with the tooltip, XStandard has separate fields for "Alternate Text" and "Description" (tooltip) as shown in the screenshot below.

Screenshot of XStandard image properties dialog box showing fields Decorative Image, Alternate Text, Description, Image URL, Width, Height and Long Description URL.

Example Of Bad Markup: Proprietary Tags

Business users love to copy content from Microsoft Word then paste it into WYSIWYG editors. Unfortunately, when this happens, most WYSIWYG editors retain proprietary MS Office tags, creating meaningless and non-validating code. The illustration below shows examples of proprietary markup that cannot be understood outside of Word:

Proprietary tags generated by Microsoft Word.

MS Office markup can also reference proprietary inline CSS such as seen below:

style="mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA; mso-bidi-font-family: 'Times New Roman'; mso-highlight: yellow"

MS Office markup also references proprietary CSS class names such as:

class=MsoNormal

How XStandard Neutralizes Proprietary Tags

When content is copied from Word and pasted into XStandard, the editor strips out all proprietary tags and inline formatting so that only important structural elements survive, such as tables, lists, headings, images, hyperlinks and semantic tags like <strong>, <abbr>, <code>, <cite>, <kbd>, etc. Formatting is then reapplied more easily and more effectively using XStandard's "styles" menu, which references CSS.
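To make the idea of stripping Word residue concrete, here is a small Python sketch. It is not how XStandard is implemented; the function strip_word_markup and its three patterns are assumptions made for this example, and the namespaced <o:p> tag is used as one typical piece of Word output. Regular expressions are a blunt tool for markup, so treat this as an illustration rather than a robust cleaner.

```python
import re

# Namespaced Office tags such as <o:p> and </o:p>.
OFFICE_TAG = re.compile(r"</?[a-z]+:[^>]*>", re.IGNORECASE)
# Any double-quoted style attribute; pasted inline formatting is dropped entirely.
STYLE_ATTR = re.compile(r'\s+style="[^"]*"', re.IGNORECASE)
# A class attribute whose value starts with "Mso", quoted or unquoted.
MSO_CLASS = re.compile(
    r"\s+class=(?:\"Mso[^\"]*\"|'Mso[^']*'|Mso[^\s>]*)", re.IGNORECASE
)


def strip_word_markup(html: str) -> str:
    """Remove Office tags, inline styles and Mso* class names (illustrative only)."""
    html = OFFICE_TAG.sub("", html)
    html = STYLE_ATTR.sub("", html)
    html = MSO_CLASS.sub("", html)
    return html


pasted = (
    '<p class=MsoNormal style="mso-highlight: yellow">'
    "Hello <strong>world</strong><o:p></o:p></p>"
)
print(strip_word_markup(pasted))
# prints: <p>Hello <strong>world</strong></p>
```

Note that the structural <p> tag and the semantic <strong> tag are left in place; only the proprietary parts are removed.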

Example Of Bad Markup: Empty Tags

WYSIWYG editors tend to generate empty tags. This often occurs when formatting has been applied to text and the text is later deleted. The following is an example of empty tags.

Markup of empty span and font tags.

How XStandard Avoids Empty Tags

XStandard removes inline tags that are meaningless because they are empty of content. So you never get markup that looks like this:
... <span></span> ...
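One way to picture this clean-up step is the Python sketch below. It is an assumption-laden illustration, not XStandard's code: the function name remove_empty_inline_tags and the short list of inline tags it looks for are chosen for this example only.

```python
import re

# An opening inline tag (attributes allowed), optional whitespace, then its own closing tag.
EMPTY_INLINE = re.compile(
    r"<(span|font|b|i|em|strong)\b[^>]*>\s*</\1>", re.IGNORECASE
)


def remove_empty_inline_tags(html: str) -> str:
    """Delete inline tags that contain no content (illustrative only)."""
    previous = None
    # Repeat until nothing changes, so nested empties such as
    # <b><span></span></b> are removed from the inside out.
    while previous != html:
        previous = html
        html = EMPTY_INLINE.sub("", html)
    return html


print(remove_empty_inline_tags('<p>Text<b><span class="x"></span></b></p>'))
# prints: <p>Text</p>
```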

Example Of Bad Markup: Using Formatting To Convey Meaning

Colors and fonts add no meaning (no semantics) to data. A font is a font and no more. The color red only says "red". Regardless, the practice of WYSIWYG vendors has been to encourage the use of color and font selectors to assign or suggest importance to data. This is a futile exercise since no information about the data is actually transmitted, as we see from the meaningless markup generated by old-fashioned formatting tools below:

Example of markup generated by tools like a color-picker and font-selector. The outcome is inline CSS or the use of font tags.

How XStandard Uses Meaningful Markup To Convey Meaning

XStandard has no color-pickers or font-selectors since these tools create semantically barren markup. Instead, XStandard's easy-to-use "styles" menu generates the type of meaningful markup seen in the illustration below. User-friendly style names apply semantic markup and at the same time reference CSS that offers limitless formatting options in a single mouse click. What better way to ensure a consistent look-and-feel to content?

Sample markup generated using XStandard styles drop-down list. The label in the drop-down list says 'Chapter Title' and the markup created is an h1 tag with a class value of 'title'.

Example Of Bad Markup: Incorrect Use Of Tables

Most WYSIWYG editors use tables incorrectly, whether for layout or for tabular data. Below is an example of a data table, where the data in the table can only be understood in relation to column and/or row headers.

Cups of coffee consumed by each person

Name    Cups    Type       Sugar
Wendy   10      Regular    yes
Jim     15      Decaf      no

If the markup behind this table does not associate each cell with the appropriate header, the cells will be processed like <div> tags by non-visual devices. Listen to how an auditory user-agent "reads" the table when the markup is incorrect. Now listen to the same table using correct markup.

How XStandard Uses Tables Correctly

When users of XStandard create tables, they can explicitly select the type of table required (data table or layout table), as shown in the screenshot below:

XStandard toolbar showing the 'Layout Table' and 'Data Table' buttons.

In XStandard, layout tables use only <table>, <tr> and <td> tags. Data tables use <table>, <caption>, <thead>, <tbody>, <tr>, <th> and <td> tags, together with the id attribute on header cells and the headers attribute on data cells, for example <th id="a"> and <td headers="a b">. Below is a screenshot of a data table and the correct markup created by XStandard.

A table with heading and markup used to create this table.
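The screenshot is not reproduced here. As a rough illustration of the id and headers pattern described above, the Python sketch below builds data table markup for the coffee example. The function data_table, the id scheme (c1, c2, ... for columns and r1, r2, ... for rows) and the choice to treat the first column as a row header are all assumptions made for this example; the markup XStandard actually produces may differ.

```python
from html import escape


def data_table(caption, columns, rows):
    """Build data table markup whose cells name their column and row headers."""
    col_ids = [f"c{i}" for i in range(1, len(columns) + 1)]  # hypothetical id scheme
    out = ["<table>", f"<caption>{escape(caption)}</caption>", "<thead>", "<tr>"]
    out += [f'<th id="{cid}">{escape(name)}</th>' for cid, name in zip(col_ids, columns)]
    out += ["</tr>", "</thead>", "<tbody>"]
    for n, row in enumerate(rows, start=1):
        row_id = f"r{n}"
        out.append("<tr>")
        # The first cell names the row, so it is marked up as a header cell.
        out.append(f'<th id="{row_id}" headers="{col_ids[0]}">{escape(str(row[0]))}</th>')
        for cid, value in zip(col_ids[1:], row[1:]):
            # Each data cell lists both its column header and its row header.
            out.append(f'<td headers="{cid} {row_id}">{escape(str(value))}</td>')
        out.append("</tr>")
    out += ["</tbody>", "</table>"]
    return "\n".join(out)


markup = data_table(
    "Cups of coffee consumed by each person",
    ["Name", "Cups", "Type", "Sugar"],
    [["Wendy", 10, "Regular", "yes"], ["Jim", 15, "Decaf", "no"]],
)
print(markup)
```

With markup like this, an auditory user-agent can announce "Cups, Wendy, 10" instead of reading a bare run of cell values.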