An HTML entity is a way to write a character so the browser shows it as text instead of treating it as markup. Type a raw < in your content and the browser starts reading a tag; write `&lt;` instead and it renders a literal < on the page. That swap is the whole idea behind HTML entity encoding.
Five characters carry special meaning in HTML and are the ones you escape most often: <, >, &, ", and '. You escape them for two reasons. The first is display: you want to show code or markup as text. The second, and more important, is security: escaping untrusted input is the foundation of stopping cross-site scripting (XSS).
A character like < can be written three ways: named (`&lt;`), decimal (`&#60;`), or hexadecimal (`&#x3C;`), and all three resolve to the same character. The harder question is when to escape and with what, because the right answer depends on where the value lands: HTML text, an attribute, a script, or a URL. This guide covers the notations, the reserved set, a context decision matrix, and the pitfalls that catch people out.
What is an HTML entity? (anatomy)
An HTML entity, also called a character reference, is a short code that stands in for a single character. Every entity begins with an ampersand & and ends with a semicolon ;. What sits between them determines which character you get.
There are three shapes:
- `&name;`: a named reference, like `&lt;` or `&copy;`.
- `&#decimal;`: a decimal numeric reference, like `&#60;`.
- `&#xhex;`: a hexadecimal numeric reference, like `&#x3C;`.
The browser reads the reference, looks up the character it points to, and renders that single character. The visible result is the same character: `&lt;` displays exactly as a < would. The difference is that the entity is always treated as text, never as the start of a tag.
The three notations: named, decimal, hexadecimal
All three notations reference the same Unicode code point and differ only in spelling. A named entity is the readable form, but it exists only for characters that have a defined name. A decimal entity writes the code point in base 10. A hexadecimal entity writes the same code point in base 16, which lines up directly with the U+XXXX notation you see in the Unicode standard.
| Character | Named | Decimal | Hex |
|---|---|---|---|
| < | `&lt;` | `&#60;` | `&#x3C;` |
| & | `&amp;` | `&#38;` | `&#x26;` |
| © | `&copy;` | `&#169;` | `&#xA9;` |
| é | `&eacute;` | `&#233;` | `&#xE9;` |
Because hex mirrors U+XXXX directly (é is U+00E9, hence `&#xE9;`), many developers reach for it when they are documenting or reasoning about a specific code point. For everyday markup, named entities read best.
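To see how the two numeric notations follow from a code point, here is a minimal JavaScript sketch. `numericRefs` is a hypothetical helper written for illustration; it covers only the numeric forms, since named entities come from a fixed lookup table and cannot be computed. Hex digits are emitted in uppercase to match U+XXXX, though parsers accept either case.

```javascript
// Hypothetical helper: spell one character as a U+ code point and both numeric references.
function numericRefs(ch) {
  const cp = ch.codePointAt(0);              // full code point, so astral characters work too
  const hex = cp.toString(16).toUpperCase();
  return {
    unicode: 'U+' + hex.padStart(4, '0'),    // Unicode notation pads to at least four digits
    decimal: `&#${cp};`,
    hex: `&#x${hex};`,
  };
}

numericRefs('é'); // → { unicode: 'U+00E9', decimal: '&#233;', hex: '&#xE9;' }
numericRefs('<'); // → { unicode: 'U+003C', decimal: '&#60;', hex: '&#x3C;' }
```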
The five reserved characters you must escape
These are the HTML special characters that change how the browser parses a document. If one appears in content that should be shown rather than executed, escape it.
| Character | Named | Decimal | Hex | What breaks if you don’t escape it |
|---|---|---|---|---|
| < | `&lt;` | `&#60;` | `&#x3C;` | Starts a tag, so the browser reads the following text as markup |
| > | `&gt;` | `&#62;` | `&#x3E;` | Can end a tag early, e.g. inside an unquoted attribute value |
| & | `&amp;` | `&#38;` | `&#x26;` | Starts an entity, so the text after it can be misread as a reference |
| " | `&quot;` | `&#34;` | `&#x22;` | Ends a double-quoted attribute value early |
| ' | `&apos;` | `&#39;` | `&#x27;` | Ends a single-quoted attribute value early |
The ampersand entity, `&amp;`, underpins the rest: the & character begins every entity, so it has to be escaped first. If you escape the angle brackets before the ampersand, you re-escape the & inside the entities you just produced. More on that pitfall below.
When do you actually need to escape? (context-aware)
This is where most bugs and most vulnerabilities live. The rule is to escape at output time, matched to the context where the value lands. A value that is safe in one place is dangerous in another, so the encoding you apply has to match the destination.
HTML element content
When you drop a value between tags, inside a `<p>`, a `<div>`, or a `<td>`, escape <, >, and &. Escaping the quotes here is harmless but unnecessary. If you want to show the text `<strong>` as literal characters rather than turning the next word bold, encode it to `&lt;strong&gt;` and the browser prints the tag instead of applying it.
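The element-content rule is small enough to write out directly. `escapeText` below is a hypothetical JavaScript helper for this one context: it escapes &, <, and > and leaves quotes alone, which is enough between tags but not inside attributes.

```javascript
// Hypothetical helper for HTML element content: escapes &, < and > only.
function escapeText(str) {
  return str
    .replace(/&/g, '&amp;') // first, so the entities added below are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

const html = `<p>${escapeText('Use <strong> for bold & more')}</p>`;
// → '<p>Use &lt;strong&gt; for bold &amp; more</p>'
```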
HTML attribute values
Inside an attribute, the quote characters become critical. If a value sits in `title="…"` and contains an unescaped ", it ends the attribute early and lets an attacker append new attributes, a well-known XSS vector. Escape " (and ideally ') in attribute context. A value like `He said "hi"` must become `He said &quot;hi&quot;` to stay contained.
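Here is the attribute breakout and its fix as a JavaScript sketch. `escapeAttr` is a hypothetical helper that escapes both quote styles plus &, <, and >, so the value stays contained whether the attribute is double- or single-quoted. It assumes the attribute is quoted at all; an unquoted value can also be broken by a plain space.

```javascript
// Hypothetical helper for quoted attribute values: escapes & < > " and '.
function escapeAttr(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#x27;');
}

const value = 'x" onmouseover="alert(1)';
const unsafe = `<span title="${value}">`;
// → '<span title="x" onmouseover="alert(1)">'  (an injected event-handler attribute)
const safe = `<span title="${escapeAttr(value)}">`;
// → '<span title="x&quot; onmouseover=&quot;alert(1)">'  (one harmless title)
```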
Inside `<script>` or inline JavaScript
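HTML entities do not help here. A string built into a `<script>` block needs JavaScript or JSON string escaping, not character references: the browser does not decode entities inside `<script>`, so writing `&quot;` in a JS string literal produces the literal six characters, not a quote. Inline event handlers are subtler, because an attribute value is entity-decoded before its JavaScript runs; a value there needs JS escaping first and attribute escaping on top. For this context, reach for the JSON Escape tool, and read the complete guide to JSON string escaping for the `\uXXXX` rules that actually apply inside script.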
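For the `<script>` case, one common approach, sketched here with a hypothetical `toScriptLiteral` helper, is to serialize the value with `JSON.stringify`, which escapes quotes and backslashes, and then replace < with its `\u003c` escape. `JSON.stringify` alone does not touch <, so a value containing `</script>` could still end the block early. The sketch assumes the value is JSON-serializable (not `undefined` or a function).

```javascript
// Hypothetical helper: turn a value into a JS literal that is safe inside an inline <script> block.
function toScriptLiteral(value) {
  return JSON.stringify(value)
    .replace(/</g, '\\u003c')      // blocks "</script>" and "<!--" inside the literal
    .replace(/\u2028/g, '\\u2028') // line/paragraph separators broke string literals before ES2019
    .replace(/\u2029/g, '\\u2029');
}

const userInput = 'He said "hi" </script><script>alert(1)</script>';
const snippet = `<script>const greeting = ${toScriptLiteral(userInput)};</script>`;
// Inside the literal, </script> is now \u003c/script> and each " is a \" escape.
```

Because `\u003c` is a valid escape in both JSON and JavaScript, the script still sees the original string once it runs.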
Inside a URL
A URL has its own escaping scheme: percent-encoding. HTML entities will not make a value URL-safe. The string `a&b c` belongs in a query as `a%26b%20c`, not `a&amp;b c`. Write it as entities and the HTML parser turns `&amp;` back into & before the URL is used, so it still separates parameters, while the space still breaks the URL. Use the URL Encoder / Decoder for this, and the URL encoding and decoding guide for the full rules on reserved versus unreserved characters.
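In JavaScript, the built-in `encodeURIComponent` percent-encodes a single value, and `URLSearchParams` builds and encodes a whole query string. This is a sketch with `example.com` as a placeholder host. Note that `URLSearchParams` uses form encoding, so it writes a space as + rather than %20; form-style parsers, including `URLSearchParams` itself, read both back as a space.

```javascript
const value = 'a&b c';

// Percent-encode one value before splicing it into a query string.
const query = 'q=' + encodeURIComponent(value);
// → 'q=a%26b%20c'

// Or let URLSearchParams assemble and encode the whole query.
const params = new URLSearchParams({ q: value, page: '2' });
const url = `https://example.com/search?${params}`;
// → 'https://example.com/search?q=a%26b+c&page=2'

// Reading it back decodes the value.
new URLSearchParams(url.split('?')[1]).get('q'); // → 'a&b c'
```

If the finished URL then goes into an `href` attribute, it still gets attribute escaping on top, since the & between parameters is an HTML special character.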
The decision matrix
| Context | Escape with | Example | Wrong choice that fails |
|---|---|---|---|
| HTML element content | HTML entities (`&lt;` `&gt;` `&amp;`) | `<strong>` → `&lt;strong&gt;` | Leaving < raw injects a tag |
| HTML attribute value | HTML entities (`&quot;` `&#x27;` critical) | `"hi"` → `&quot;hi&quot;` | An unescaped " breaks out |
| `<script>` / inline JS | JS / JSON string escaping | `"` → `\"` | Entities are not decoded inside `<script>` |
| URL / query string | Percent-encoding | space → %20 | & and entities still break the URL |
Named vs numeric: which should you use?
Named entities are readable and the right default for the common reserved characters and well-known symbols such as `&lt;`, `&amp;`, `&copy;`, and `&mdash;`. They only exist, however, for characters that have a defined name. Numeric entities, decimal or hex, can encode any code point, including ones with no name, which makes them the universal fallback. When you cannot guarantee the consuming system supports a particular named entity, numeric is the safe choice.
Why the apostrophe is `&#x27;` and not `&apos;`
The named entity `&apos;` is defined in XML and HTML5 but not in HTML4, so a handful of older parsers and email clients render it as the literal text `&apos;` instead of an apostrophe. The numeric reference `&#x27;`, along with its decimal twin `&#39;`, points to the exact same character, U+0027, and is understood by every HTML and XML parser. Well-tested escaping libraries such as `he` emit `&#x27;` for the single quote for this reason, and a good encoder follows that convention so the output is safe to drop into any HTML, XML, or attribute context.
Charset vs entities: when to encode non-ASCII
A character set, like UTF-8, decides how characters are stored as bytes. An entity is a way to spell a character using only plain ASCII (&, #, ;, letters, digits). These are different layers, and conflating them leads to needless encoding.
On a UTF-8 page, which is nearly every modern page that declares `<meta charset="utf-8">`, accented letters, dashes, and emoji are valid raw characters. Leave é, —, and 😀 exactly as they are. Encoding everything into entities only matters when the text must survive a legacy single-byte charset or a system that mangles raw UTF-8; an “encode all non-ASCII” mode exists for those cases. If you are unsure how bytes, code points, and characters relate, the UTF-8, UTF-16 and Unicode encoding guide lays out the model.
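For the legacy case, an "encode all non-ASCII" pass can be sketched in a few lines of JavaScript. `encodeNonAscii` is a hypothetical helper: it leaves ASCII untouched and writes every other code point as a hex reference. Run your reserved-character escaping before it, not after, because this pass emits & characters of its own. Iterating with `for...of` walks code points rather than UTF-16 units, so an emoji becomes one reference instead of two broken surrogate halves.

```javascript
// Hypothetical helper: replace every non-ASCII code point with a hex numeric reference.
// Escape the reserved characters first; this pass emits its own & characters.
function encodeNonAscii(str) {
  let out = '';
  for (const ch of str) {                // iterates by code point, not UTF-16 unit
    const cp = ch.codePointAt(0);
    out += cp > 0x7f ? `&#x${cp.toString(16).toUpperCase()};` : ch;
  }
  return out;
}

encodeNonAscii('café 😀'); // → 'caf&#xE9; &#x1F600;'
```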
Common HTML entity pitfalls
Escaping & last causes double-escaping
Order matters. If you replace < and > before &, the entities you just created (`&lt;`, `&gt;`) get their leading & escaped too, so `&lt;` ends up as `&amp;lt;` and renders as the literal text `&lt;`. Always escape & first, then the rest. That ordering prevents one of the most common encoding bugs.
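The failure is easy to reproduce. These two hypothetical JavaScript helpers run the same three replacements in different orders; only the &-first version produces text that renders as the original markup.

```javascript
// Wrong: angle brackets first, then &, which re-escapes the entities just created.
function escapeAmpLast(str) {
  return str.replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/&/g, '&amp;');
}

// Right: & first, so the later entities are left alone.
function escapeAmpFirst(str) {
  return str.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

escapeAmpLast('<b>');  // → '&amp;lt;b&amp;gt;'  (renders as the text &lt;b&gt;)
escapeAmpFirst('<b>'); // → '&lt;b&gt;'          (renders as the text <b>)
```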
Double-encoding already-escaped text
Running already-escaped text through an encoder a second time re-encodes it: `&amp;` becomes `&amp;amp;`, and the visitor sees the text `&amp;` on the page instead of &. Escape exactly once, at output time. If a value passes through several layers, make sure only one of them escapes.
Mojibake when decoding
Going the other direction has its own traps. Decode bytes with the wrong charset and you get garbled output, the kind people call mojibake (a UTF-8 é read as Windows-1252 shows up as Ã©); decode entities twice and escaped text such as `&amp;lt;` collapses into a live <. If a page is showing a literal `&lt;` where you expected <, paste it into the HTML Entity Decoder to see exactly what the entities resolve to; it handles named, decimal, hex, and even legacy unterminated references like `&copy` without a trailing semicolon.
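To see what "decode exactly once" means, here is a deliberately small JavaScript decoder. `decodeEntities` is hypothetical and covers only numeric references and the five reserved names, not the full HTML named-entity table or unterminated legacy forms; in a browser, letting the parser do the work (for example, reading `textContent` from a `DOMParser` document) handles everything. Because `String.prototype.replace` makes a single pass, `&amp;lt;` decodes to `&lt;`, and only a second decode turns it into <.

```javascript
// Hypothetical minimal decoder: numeric references plus the five reserved named entities.
const NAMED = new Map([['lt', '<'], ['gt', '>'], ['amp', '&'], ['quot', '"'], ['apos', "'"]]);

function decodeEntities(str) {
  return str.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const cp = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return cp <= 0x10ffff ? String.fromCodePoint(cp) : match; // out of range: leave as-is
    }
    return NAMED.get(body) ?? match; // unknown names are left untouched
  });
}

decodeEntities('&lt;b&gt; &#233; &#x1F600;'); // → '<b> é 😀'
decodeEntities('&amp;lt;');                     // → '&lt;'  (one pass decodes one layer)
decodeEntities(decodeEntities('&amp;lt;'));     // → '<'     (decoding twice over-decodes)
```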
Trusting escaping as a complete XSS cure
Escaping is the first line of defense, not the only one. HTML has several contexts with different rules, and escaping for the wrong one leaves a hole: each context needs its own treatment, such as escaped quotes in attributes, JS escaping in script, and percent-encoding in URLs. Pair correct, context-aware escaping with a Content Security Policy (CSP) and your framework’s auto-escaping, so those layers catch what a single mistake in encoding misses.
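One concrete hole: entity escaping leaves a `javascript:` URL untouched, because none of its characters are reserved, yet placed in an `href` it runs script when clicked. A common extra layer is a scheme allowlist. `safeHref` below is a hypothetical JavaScript sketch that assumes links may only use http, https, or mailto, and resolves relative links against a placeholder base URL; its result still needs attribute escaping when written into markup.

```javascript
// Hypothetical helper: accept only allowlisted URL schemes; returns null otherwise.
function safeHref(value, base = 'https://example.com/') {
  let url;
  try {
    url = new URL(value, base); // the URL parser lowercases the scheme and strips leading spaces
  } catch {
    return null;                // not parseable as a URL at all
  }
  return ['http:', 'https:', 'mailto:'].includes(url.protocol) ? url.href : null;
}

safeHref('javascript:alert(1)');  // → null
safeHref(' JaVaScRiPt:alert(1)'); // → null
safeHref('/docs?page=2');         // → 'https://example.com/docs?page=2'
```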
How to encode and decode entities in practice
When you build HTML by hand, you escape it yourself. Here is a correct escapeHtml() that handles the &-first ordering, plus the better practice for real application code.
```javascript
// The five reserved characters and their safe entities:
// <  →  &lt;    >  →  &gt;    &  →  &amp;    "  →  &quot;    '  →  &#x27;
function escapeHtml(str) {
  return str
    .replace(/&/g, '&amp;')   // & FIRST, so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#x27;'); // numeric form: safe in HTML4, HTML5 and XML
}

const userInput = `<a href="x">Tom & Jerry's</a>`;
const safe = escapeHtml(userInput);
// → &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#x27;s&lt;/a&gt;

// Better in app code: let the platform escape for you.
// el.textContent = userInput;  // inserted as text, never parsed as HTML; no manual replace
// React / Vue / Angular escape interpolated text by default
// Server templates (e.g. Blade, Rails ERB, Jinja as configured by Flask) auto-escape unless you opt out
```
The hand-rolled function is useful for understanding what happens and for one-off conversions, but in production prefer the built-in path. Setting element.textContent lets the browser escape for you, and modern frameworks escape interpolated values automatically. Reserve manual escaping for the cases the platform does not cover.
For ad-hoc work, the HTML Entity Encoder escapes the reserved set (named, decimal, or hex), and the HTML Entity Decoder reverses it. The two are exact inverses for the reserved characters, so you can round-trip text through both without loss.
Frequently asked questions
What is an HTML entity?
An HTML entity is a short code, starting with & and ending with ;, that represents a single character. The browser renders the character the entity points to instead of treating it as markup. For example, `&lt;` displays a literal <, and `&amp;` displays a literal &.
Which characters do I need to escape in HTML?
The five reserved HTML special characters: <, >, &, ", and '. In element content you mainly need <, >, and &; in attribute values the quotes " and ' become critical too. Escape the & ampersand first so the other entities are not double-escaped.
Should I use named or numeric (decimal/hex) entities?
Use named entities (`&lt;`, `&copy;`) for readability with the common characters, since they are easy to recognize. Use numeric entities (decimal `&#60;` or hex `&#x3C;`) when you need to encode a character with no defined name, or when you cannot guarantee the consumer supports a given named entity. Both forms reference the same code point.
Do HTML entities protect against XSS?
They are the foundation, when applied correctly. Escaping the five reserved characters before placing untrusted input into HTML element or attribute content stops tag and script injection. But escaping is context-dependent: script blocks need JavaScript escaping and URLs need percent-encoding. Combine correct context-aware escaping with CSP and framework auto-escaping.
Why does my page show `&lt;` instead of <?
That is double-escaping. The text was encoded twice, or the & was escaped after the angle brackets, so the & in `&lt;` got turned into `&amp;`, producing `&amp;lt;`. The visitor then sees `&lt;` as literal text. Escape exactly once and always escape & first. The decoder tool can confirm what the entities resolve to.
Do I need to escape characters like é, — or emoji?
Usually no. On a page that declares `<meta charset="utf-8">`, accented letters, dashes, and emoji are valid raw characters and need no encoding, so leave them as-is. Only encode non-ASCII when the text must pass through a legacy single-byte charset or a system that corrupts raw UTF-8.