The XHTML WYSIWYG Editor For Desktop & Web Applications

Why XHTML is best for CMS (content management systems)

Web browsers are designed to render just about anything that looks like HTML. Relying on this leniency, most WYSIWYG editors generate poorly constructed markup that follows neither best practices nor current specifications. So, since browsers are so good at rendering poorly constructed HTML, do we really need to bother writing good-quality markup?

The answer is Yes!

Although good markup matters little to browsers, it is very important to content management systems (CMS). Content management systems, and the developers who build solutions around them, need to parse, analyze and modify content generated by WYSIWYG editors. The ability to do this programmatically, outside the WYSIWYG editor, is called automation. Examples of automation include:

  • Before content is saved, the CMS checks whether the content contains headings (h1 to h6) and, if none are found, informs the user.
  • Before content is saved, potentially unsafe markup, such as script elements that may have been added unintentionally during a cut-and-paste operation, is removed.
  • Before content is saved, the CMS does a word count on the textual content (not including markup) and lets the user know if the content is too short or too long.
  • Before content is saved, the CMS adds the attribute rel="nofollow" to hyperlinks that point to an external site and removes it from hyperlinks that do not.
  • Before content is saved, images that are no longer referenced by any document are removed from the CMS.
  • During a Web site redesign, developers need to migrate content from one page layout to another. In this process, long documents can be split into separate smaller documents to create pagination.
  • At run-time, temporary advertisements can be inserted into the content, for example after the first paragraph.
  • When a change to a CSS file requires modifications to thousands of documents, those modifications can be automated, provided the markup is well constructed.
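Because XHTML is well-formed XML, several of the save-time checks above can be written with an ordinary XML library. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module, not the method of any particular CMS. It assumes the editor returns an XHTML fragment without a namespace declaration; the function names and the site host www.example.com are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical host name of the CMS's own site; links elsewhere count as external.
SITE_HOST = "www.example.com"
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def parse_fragment(xhtml: str) -> ET.Element:
    """Parse an XHTML fragment (no namespace) by wrapping it in one root element."""
    return ET.fromstring("<div>" + xhtml + "</div>")


def has_headings(root: ET.Element) -> bool:
    """True if the content contains at least one h1 to h6 element."""
    return any(el.tag in HEADINGS for el in root.iter())


def remove_scripts(root: ET.Element) -> None:
    """Remove every script element, keeping the text that follows each one."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "script":
                continue
            if child.tail:
                # ElementTree stores the text after an element as its "tail",
                # so move it to the previous sibling (or the parent) first.
                index = list(parent).index(child)
                if index > 0:
                    prev = parent[index - 1]
                    prev.tail = (prev.tail or "") + child.tail
                else:
                    parent.text = (parent.text or "") + child.tail
            parent.remove(child)


def word_count(root: ET.Element) -> int:
    """Count words in the text content only, ignoring markup.

    Text nodes are joined with spaces so adjacent paragraphs do not merge;
    a word split by inline markup (e.g. <em>wo</em>rd) is counted twice.
    """
    return len(" ".join(root.itertext()).split())


def update_nofollow(root: ET.Element) -> None:
    """Add rel="nofollow" to external links and remove it from internal ones.

    Simplification: assumes rel never holds anything other than "nofollow".
    """
    for a in root.iter("a"):
        host = urlparse(a.get("href", "")).netloc
        if host and host != SITE_HOST:
            a.set("rel", "nofollow")
        elif a.get("rel") == "nofollow":
            del a.attrib["rel"]


if __name__ == "__main__":
    content = parse_fragment(
        '<p>Read <a href="https://other.example.org/">this article</a> now.</p>'
        "<script>track()</script>"
    )
    if not has_headings(content):
        print("Warning: the document has no headings.")
    remove_scripts(content)
    update_nofollow(content)
    print(word_count(content), "words")
    print(ET.tostring(content, encoding="unicode"))
```

None of these functions needs to handle unclosed tags, mixed-case element names or unquoted attributes, because an XML parser refuses such input before the checks run.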

What do we mean by "good markup" anyway?

We mean markup that is useful for a given purpose. For content management systems, the most useful markup has the following characteristics:

Markup that is authored in a consistent style
Consistent style means, for example, that element and attribute names are written in the same case (e.g. lowercase), that elements are always closed the same way, and that attribute values are always quoted the same way. Consistent style reduces the amount of work programmers need to do in order to manipulate markup. For simple operations, it can allow developers to adjust markup with a basic replace() function instead of more complex methods such as regular expressions or XSLT.
Markup that is free from syntax errors
Markup free from syntax errors reduces the amount of work programmers need to do in order to manipulate markup, because developers do not need to write code to check for certain types of errors.
Markup that is semantic
Markup that uses semantic elements (such as h2 or em) rather than formatting markup (font elements, or div and span elements with inline CSS) gives programmers what are called parsing hooks: predictable elements that code can look for. It is then much easier to write rules that target these hooks.
Markup that can be manipulated by off-the-shelf tools
To manipulate markup programmatically with ease, developers need access to a wide variety of tools that work in different development environments.
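To illustrate why consistent style matters, the sketch below renames a CSS class using nothing more than Python's built-in str.replace(). The rename_class helper and the class names are hypothetical; the point is that a plain string replace catches every occurrence only when all attributes are spelled the same way.

```python
def rename_class(markup: str, old: str, new: str) -> str:
    """Rename a CSS class using nothing more than str.replace().

    This finds every occurrence only if each class attribute is written the same
    way: lowercase name, double-quoted value, a single class per attribute.
    """
    return markup.replace(f'class="{old}"', f'class="{new}"')


# Consistent style: every occurrence has the same spelling, so all are renamed.
consistent = '<p class="note">Save often.</p><p class="note">Back up.</p>'
print(rename_class(consistent, "note", "tip"))
# prints: <p class="tip">Save often.</p><p class="tip">Back up.</p>

# Inconsistent style: the uppercase/unquoted and single-quoted variants are missed.
inconsistent = "<P CLASS=note>Save often.</P><p class='note'>Back up.</p>"
print(rename_class(inconsistent, "note", "tip"))
# prints the input unchanged
```

With inconsistent markup, the same task would need a case-insensitive regular expression that handles three quoting styles, or a full parser.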

XHTML authoring tools encourage "good markup"

Good markup can be written in HTML. Unfortunately, this rarely happens, because HTML WYSIWYG editors enforce neither the rules of HTML nor best practices. XHTML editors, on the other hand, must follow the strict rules of XML, and they often enforce additional rules and best practices as well. In this way, XHTML authoring tools better meet the needs of content management systems.

WYSIWYG editors that generate XHTML meet the characteristics of "good markup" for content management systems because:

Markup is authored in a consistent style
All elements and attributes are written in lowercase. All attribute values are quoted. All empty and non-empty elements are closed in a consistent way. Some XHTML WYSIWYG editors go even further; for example, XStandard sorts attributes alphabetically.
Markup is free from syntax and nesting errors

The rules of XML ensure that XHTML markup is free of syntax errors and of certain kinds of nesting errors. For example, markup generated by an XHTML editor will never contain errors such as these:

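Incorrectly escaped content: <p>Able & Baker Inc.</p> (in XML the ampersand must be written as &amp;)

Incorrect nesting: <span><em>Look out!</span></em> (elements must be closed in the reverse order they were opened: </em></span>)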

Markup is semantic
Although HTML is identical to XHTML when it comes to semantics, HTML WYSIWYG editors don't encourage the use of semantic markup, while XHTML WYSIWYG editors, such as XStandard, steer content authors towards using semantic markup.
Markup can be manipulated by off-the-shelf tools
Off-the-shelf XML tools, such as DOM parsers, SAX parsers and XSLT processors, can manipulate XHTML. These tools are available in practically every popular programming language, including C#, Visual Basic, VBScript, JavaScript, PHP, Java and C++. Compared with the number of tools that can manipulate XML, the number of tools that can manipulate HTML is very limited.
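Because XHTML is XML, a standard XML parser both detects the two kinds of errors listed above and lets code manipulate the content. The following is a minimal sketch using Python's built-in DOM parser, xml.dom.minidom; the helper names are hypothetical. Note that it checks only well-formedness, not full validity against the XHTML DTD.

```python
from typing import List
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML (well-formedness only)."""
    try:
        minidom.parseString(markup)
    except ExpatError:
        return False
    return True


def paragraph_texts(markup: str) -> List[str]:
    """Use the DOM to collect the direct text of every p element."""
    doc = minidom.parseString(markup)
    texts = []
    for p in doc.getElementsByTagName("p"):
        # Join the text children; the parser has already decoded &amp; to &.
        texts.append("".join(n.data for n in p.childNodes if n.nodeType == n.TEXT_NODE))
    return texts


for sample in [
    "<p>Able & Baker Inc.</p>",         # unescaped ampersand -> False
    "<p>Able &amp; Baker Inc.</p>",     # correctly escaped -> True
    "<span><em>Look out!</span></em>",  # overlapping elements -> False
    "<span><em>Look out!</em></span>",  # correctly nested -> True
]:
    print(is_well_formed(sample), sample)
```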

Conclusion

Good markup is important to content management systems because it gives developers the ability to manipulate content programmatically, outside WYSIWYG editors, through automation. Markup generated by XHTML WYSIWYG editors tends to be better for content management systems than markup generated by HTML WYSIWYG editors, thanks to XML rules and best practices that XHTML tool vendors are able to incorporate in their products.