XML and HTML entities


As everyone should know, some characters must be replaced with entities if you want to use them as text within an XML or (X)HTML document. For example, since tags are delimited by the < and > characters, you can't write text that looks like a tag and expect it to be treated as plain text and printed in the browser's window.

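Usually, entities are used to escape characters that have a special meaning in the markup. XML has five predefined entities, which are also available in HTML (although &apos; is defined only in XHTML and HTML5, not in HTML 4):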

& &amp;
< &lt;
> &gt;
" &quot;
' &apos;

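To be sure that the result is what you want, & characters must be replaced before the others. Otherwise, the & that begins each entity you have just inserted would itself be escaped a second time.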
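The ordering rule can be illustrated with a minimal Python sketch. The function names `escape_entities` and `escape_wrong_order` are made up for this example; only the replacement table comes from the list above.

```python
def escape_entities(text: str) -> str:
    """Replace the five special characters with the predefined entities."""
    # Ampersand first, so the "&" of the entities added below is left alone.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace('"', "&quot;")
    text = text.replace("'", "&apos;")
    return text


def escape_wrong_order(text: str) -> str:
    """Deliberately wrong order, to show the double-escaping problem."""
    text = text.replace("<", "&lt;")
    # The "&" that was just inserted is escaped again here.
    text = text.replace("&", "&amp;")
    return text


print(escape_entities('a < b & "c"'))  # a &lt; b &amp; &quot;c&quot;
print(escape_wrong_order("<"))         # &amp;lt;
```

In real code you would probably prefer a library routine (for instance `html.escape` or `xml.sax.saxutils.escape` in the Python standard library) over a hand-written chain of replacements.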

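These are the only characters that can confuse the client and invalidate a document, and not all of them need escaping everywhere. Within an attribute value, you can leave > unchanged (HTML also tolerates a literal < there, but XML does not). Outside attribute values, " and ' can appear unescaped.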
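As a rough illustration of how the context changes what has to be escaped, here is a short sketch using two helpers from Python's standard library, `xml.sax.saxutils.escape` (for element text) and `xml.sax.saxutils.quoteattr` (for attribute values). The sample strings are invented for the example.

```python
from xml.sax.saxutils import escape, quoteattr

# Element text: only &, < and > are replaced; quotes are left as they are.
text = 'He said "it\'s < 5"'
print(escape(text))      # He said "it's &lt; 5"

# Attribute value: quoteattr() also adds the surrounding quotes and picks
# a delimiter that does not clash with the quotes inside the value.
value = 'Tom & "Jerry"'
print(quoteattr(value))  # 'Tom &amp; "Jerry"'
```

Note that `quoteattr` takes the conservative route and escapes < and > in attribute values too, which is always safe.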

You do not need to replace any other "weird" characters if the document uses the UTF-8 character encoding.
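A small Python sketch of the difference, with an invented sample string: encoded as UTF-8, accented letters and symbols are stored as they are, whereas an ASCII-only document would have to fall back on numeric character references (a mechanism not covered in this post, shown here only for contrast).

```python
text = "caffè €5"

# UTF-8 can represent every character directly, so nothing is replaced.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.decode("utf-8"))  # caffè €5

# ASCII cannot, so non-ASCII characters become numeric references.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)                 # b'caff&#232; &#8364;5'
```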

In XML documents, you can also avoid using the predefined entities, as long as the characters they stand for appear only inside a CDATA section. The syntax for CDATA sections is:

<![CDATA[ blah blah blah ]]>

CDATA sections cannot contain the ]]> sequence of characters, and there is no way to escape it inside a section. The usual workaround is to close the section after ]] and open a new one just before the >.
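A minimal Python sketch of wrapping arbitrary text in CDATA. Since a single section cannot contain ]]>, it uses a commonly seen trick: each occurrence is split across two adjacent sections. The helper name `to_cdata` is hypothetical, and the parser check at the end is just one way to confirm that the text survives the round trip.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    # "]]" ends the current section and ">" starts the next one.
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"


original = "a ]]> b < c & d"
document = "<r>" + to_cdata(original) + "</r>"
print(document)  # <r><![CDATA[a ]]]]><![CDATA[> b < c & d]]></r>

# The parser merges the adjacent sections back into the original text.
print(ET.fromstring(document).text == original)  # True
```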

