Interview Questions

Which should I use, &entityname; or &#number; ?

HTML Interview Questions and Answers


(Continued from previous question...)

52. Which should I use, &entityname; or &#number; ?

In HTML, characters can be represented in three ways:

1. a properly coded character, in the encoding specified by the "charset" attribute of the "Content-type:" header;
2. a character entity (&entityname;), from the appropriate HTML specification (HTML 2.0/3.2, HTML 4, etc.);
3. a numeric character reference (&#number;) that specifies the Unicode reference of the desired character. We recommend using decimal references; hexadecimal references are less widely supported.

In theory these representations are equally valid. In practice, authoring convenience and limited support by browsers complicate the issue.

HTTP being a guaranteed "8-bit clean" protocol, you can safely send out 8-bit or multibyte coded characters, in the various codings that are supported by browsers.
A. HTML 2.0/3.2 (Latin-1)

By now there seems no convincing reason to choose &entityname; versus &#number;, so use whichever is convenient.
If you can confidently handle 8-bit-coded characters this is fine too, probably preferred for writing heavily-accented languages. Take care if authoring on non-ISO-8859-based platforms such as Mac, Psion, IBM mainframes etc., that your upload technique delivers a correctly coded document to the server. Using &-representations avoids such problems.
B. A single repertoire other than Latin-1

In such codings as ISO-8859-7 Greek, koi8-r Russian Cyrillic, and Chinese, Japanese and Korean (CJK) codings, use of coded characters is the most widely supported and used technique.

Although not covered by HTML 3.2, browsers have supported this quite widely for some time now; it is a valid option within the HTML 4 specifications--use a validator such as the WDG HTML Validator or the W3C HTML Validation Service which supports HTML 4 and understands different character encodings.

Browser support for coded characters may depend on configuration and font resources. In some cases, additional programs called "helpers" or "add-ins" supply virtual fonts to browsers.

"Add-in" programs have in the past been used to support numeric references to 15-bit or 16-bit code protocols such as Chinese Big5 or Chinese GB2312.

In theory you should be able to include not only coded characters but also Unicode numeric character references, but browser support is generally poor. Numeric references to the "charset-specified" encoding may appear to produce the desired characters on some browsers, but this is wrong behavior and should not be used. Character entities are also problematical, aside from the HTML-significant characters <, &amp; etc.

C. Internationalization per HTML 4

Recent versions of the popular browsers have support for some of these features, but at time of writing it seems unwise to rely on this when authoring for a general audience.

(Continued on next question...)

Other Interview Questions