XML and HTML entities

As everyone should know, some characters must be replaced with entities if you want to use them as text within an XML or (X)HTML document. For example, since tags are delimited by the < and > characters, you can’t write something that looks like a tag and expect it to be treated as plain text and displayed in the browser window.

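Usually, entities are used to escape characters that have a special meaning in the markup. XML has five predefined entities, which are also available in HTML (with the exception of &apos;, which HTML 4 does not define; it is available in XHTML and HTML5):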

& &amp;
< &lt;
> &gt;
" &quot;
' &apos;

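To be sure that the result is what you want, & characters must be replaced before the others; otherwise the & at the start of every entity you have already inserted would itself be turned into &amp;.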
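The ordering rule can be illustrated with a minimal Python sketch. The function names escape_entities and escape_wrong_order are hypothetical and exist only for this example; the second one deliberately replaces & last to show the double escaping that results.

```python
def escape_entities(text: str) -> str:
    """Replace the five characters that have predefined XML entities."""
    # The ampersand must come first: every later replacement introduces
    # a new '&' that must not be escaped a second time.
    replacements = [
        ("&", "&amp;"),
        ("<", "&lt;"),
        (">", "&gt;"),
        ('"', "&quot;"),
        ("'", "&apos;"),
    ]
    for char, entity in replacements:
        text = text.replace(char, entity)
    return text


def escape_wrong_order(text: str) -> str:
    """Same idea with '&' replaced last, to show the bug."""
    text = text.replace("<", "&lt;").replace(">", "&gt;")
    # Here the '&' of "&lt;" and "&gt;" gets escaped again.
    return text.replace("&", "&amp;")


print(escape_entities("a < b & c"))   # a &lt; b &amp; c
print(escape_wrong_order("a < b"))    # a &amp;lt; b
```

In real code it is usually safer to rely on a library routine than on a hand-written one; in Python, for instance, html.escape and xml.sax.saxutils.escape cover this job.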

These are the only characters that can confuse the client and invalidate a document. Within an attribute value, HTML even lets you leave < and > unchanged (XML still requires < to be escaped there). Outside attribute values, " and ' can appear unescaped.
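How the context changes what has to be escaped can be sketched with Python's standard xml.sax.saxutils module. The helpers text_node and attribute are hypothetical names chosen for this example. Note that the library takes the conservative route and always escapes &, < and >, even where the markup rules would tolerate them.

```python
from xml.sax.saxutils import escape, quoteattr


def text_node(value: str) -> str:
    """Escape text that goes between tags: quotes are left alone."""
    return escape(value)


def attribute(name: str, value: str) -> str:
    """Build name="value", escaping or switching quotes as needed."""
    # quoteattr adds the surrounding quotes itself and picks single
    # quotes when the value contains double quotes.
    return f"{name}={quoteattr(value)}"


print(text_node('say "hi" <now>'))       # say "hi" &lt;now&gt;
print(attribute("title", "Tom & Jerry"))  # title="Tom &amp; Jerry"
print(attribute("title", 'say "hi"'))     # title='say "hi"'
```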

You don’t need to replace any other “weird” characters if the document is encoded in UTF-8.
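As a rough illustration of why the encoding matters, the Python sketch below serializes the same text twice. The function name serialize is hypothetical. With UTF-8 the accented letter is written as is; with ASCII, which cannot represent it, Python's xmlcharrefreplace error handler falls back to a numeric character reference. The sketch does not escape markup characters such as & or <; that remains a separate step.

```python
def serialize(text: str, encoding: str) -> bytes:
    """Encode text, using numeric character references as a fallback."""
    # Characters the target encoding cannot represent become
    # references of the form &#NNN; instead of raising an error.
    return text.encode(encoding, errors="xmlcharrefreplace")


print(serialize("caffè", "utf-8"))  # b'caff\xc3\xa8'
print(serialize("caffè", "ascii"))  # b'caff&#232;'
```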

In XML documents, you can also avoid using the predefined entities, as long as the characters they stand for appear only inside a CDATA section. The syntax for CDATA sections is:

<![CDATA[ blah blah blah ]]>

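CDATA sections cannot contain the ]]> sequence of characters, and there is no way to escape it inside a section; text that contains it has to be split across two adjacent CDATA sections so that the sequence never appears whole within either of them.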
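A minimal Python sketch of wrapping arbitrary text in CDATA, assuming a hypothetical helper named to_cdata: wherever ]]> occurs in the text, the helper closes the current section right after ]] and opens a new one before >, so a parser reading the sections back to back sees the original text again.

```python
def to_cdata(text: str) -> str:
    """Wrap text in one or more CDATA sections."""
    # "]]>" becomes "]]" + "]]>" (end of section)
    #             + "<![CDATA[" (start of the next section) + ">"
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"


print(to_cdata("if (a < b && c)"))  # <![CDATA[if (a < b && c)]]>
print(to_cdata("x[y[0]]>z"))        # <![CDATA[x[y[0]]]]><![CDATA[>z]]>
```

One way to check the result is to place it inside an element and parse it, for example with xml.etree.ElementTree, and compare the element's text with the original string.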