URL Encoding & Decoding: A Developer’s Guide to Percent Encoding
You’re tailing a server log and spot this in a query string: %E4%BD%A0%E5%A5%BD. Is it corrupted data? A bug? Neither — it’s the Chinese characters 你好, converted to three UTF-8 bytes each, then percent-encoded into a URL-safe format. Every web developer hits this wall eventually: something looks broken, but the URL is working exactly as designed.
URL encoding — formally called percent encoding — is the mechanism that makes special characters safe for URLs. This guide covers how it works at the byte level, when to reach for encodeURI versus encodeURIComponent, how to encode correctly in four languages, and which bugs trip up experienced developers.
Paste any URL into our URL Decoder & Encoder to see encoding and decoding in real time as you follow along.
What Is URL Encoding (Percent Encoding)?
A URL can only contain a small subset of ASCII characters. Letters, digits, and a handful of symbols travel through the internet without issue. Everything else — spaces, ampersands, Chinese text, emoji — needs to be converted into a format that URLs can carry.
Percent encoding replaces each unsafe byte with a % sign followed by two hexadecimal digits. A space becomes %20. An ampersand becomes %26. The name comes from that % prefix.
The rules live in RFC 3986, published in 2005 and still the governing standard. It replaced RFC 2396 and tightened the definition of which characters are safe, which are reserved, and how non-ASCII text should be handled.
Quick examples:
| Input | Encoded | Why |
|---|---|---|
| hello world | hello%20world | Space is not allowed in URLs |
| price=10&tax=2 | price%3D10%26tax%3D2 | = and & have structural meaning |
| 中 | %E4%B8%AD | Non-ASCII → UTF-8 bytes → percent-encoded |
| 🚀 | %F0%9F%9A%80 | Emoji → 4 UTF-8 bytes → percent-encoded |
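The rows above can be reproduced with JavaScript's built-in encodeURIComponent, which percent-encodes a single value. This is a minimal sketch; the variable names are illustrative only.

```javascript
// Inputs taken from the quick-examples table
const samples = ['hello world', 'price=10&tax=2', '中', '🚀'];

// Percent-encode each input as a standalone value
const encodedSamples = samples.map(s => encodeURIComponent(s));

for (let i = 0; i < samples.length; i++) {
  console.log(`${samples[i]} -> ${encodedSamples[i]}`);
}
// hello world -> hello%20world
// price=10&tax=2 -> price%3D10%26tax%3D2
// 中 -> %E4%B8%AD
// 🚀 -> %F0%9F%9A%80
```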
Which Characters Need Encoding?
RFC 3986 splits characters into three groups.
Unreserved Characters (Never Encoded)
These 66 characters pass through as-is in any part of a URL:
A-Z a-z 0-9 - . _ ~
Letters, digits, hyphen, period, underscore, tilde. That’s the full list.
Reserved Characters (Context-Dependent)
These characters serve as structural delimiters in URLs:
| Character | Role in URL structure |
|---|---|
| : | Separates scheme from authority (https:) |
| / | Separates path segments |
| ? | Starts query string |
| # | Starts fragment |
| & | Separates query parameters |
| = | Separates parameter key from value |
| @ | Separates userinfo from host |
| + ! $ ' ( ) * , ; [ ] | Various reserved roles |
The rule: when a reserved character serves its structural purpose, leave it alone. When it appears as data (inside a parameter value, for example), encode it.
Everything Else (Always Encoded)
Spaces, angle brackets, curly braces, pipes, backslashes, and non-ASCII characters (Chinese, Arabic, emoji) must be percent-encoded.
One wrinkle: RFC 3986 encodes spaces as %20, but HTML form submissions use +. More on this conflict later.
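The three groups can be sketched as a small classifier. The names classify, UNRESERVED, and RESERVED are hypothetical helpers, not part of any standard API; the character sets are the ones listed above.

```javascript
// Hypothetical helpers; the character sets follow the RFC 3986 groups described above
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;
const RESERVED = new Set(":/?#[]@!$&'()*+,;=");

// Classify a single character into one of the three groups
function classify(ch) {
  if (UNRESERVED.test(ch)) return 'unreserved'; // never needs encoding
  if (RESERVED.has(ch)) return 'reserved';      // encode only when used as data
  return 'other';                               // always percent-encode
}

// Count the unreserved characters in the ASCII range
let unreservedCount = 0;
for (let code = 0; code < 128; code++) {
  if (classify(String.fromCharCode(code)) === 'unreserved') unreservedCount++;
}

console.log(classify('~'), classify('&'), classify(' '), classify('中'));
// unreserved reserved other other
console.log(unreservedCount); // 66
```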
How URL Encoding Actually Works: The UTF-8 Pipeline
For ASCII characters, encoding is straightforward: look up the byte value in hex, prepend %. A space (byte value 32, hex 20) becomes %20.
For non-ASCII text, encoding has three steps:
Step 1 — Character to Unicode code point.
The character é maps to code point U+00E9. The emoji 🚀 maps to U+1F680.
Step 2 — Code point to UTF-8 bytes.
UTF-8 uses 1 to 4 bytes depending on the code point range. é (U+00E9) becomes two bytes: 0xC3 0xA9. The rocket emoji (U+1F680) becomes four bytes: 0xF0 0x9F 0x9A 0x80.
Step 3 — Each byte to %XX.
Every byte from step 2 gets its own percent-encoded triplet.
Here’s the full pipeline for several character types:
| Character | Code Point | UTF-8 Bytes | Encoded | Size multiplier |
|---|---|---|---|---|
| A | U+0041 | 41 | A (not encoded) | 1× |
| space | U+0020 | 20 | %20 | 3× |
| é | U+00E9 | C3 A9 | %C3%A9 | 6× |
| 中 | U+4E2D | E4 B8 AD | %E4%B8%AD | 9× |
| 🚀 | U+1F680 | F0 9F 9A 80 | %F0%9F%9A%80 | 12× |
You can verify this yourself in JavaScript:
const char = '中';
const encoded = encodeURIComponent(char);
console.log(encoded); // '%E4%B8%AD'
// Trace the bytes
const bytes = new TextEncoder().encode(char);
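// padStart keeps bytes below 0x10 at two hex digits (for example 0x0A becomes %0A, not %A)
console.log([...bytes].map(b => '%' + b.toString(16).toUpperCase().padStart(2, '0')).join(''));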
// '%E4%B8%AD' — matches
This expansion matters for URL length limits. A URL with 20 Chinese characters adds 180 characters of percent-encoded text.
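Step 2 of the pipeline can also be done by hand, which makes the 1-to-4-byte ranges visible. The sketch below uses hypothetical helpers utf8Bytes and percentEncodeChar. It assumes a valid code point (no lone surrogates) and, unlike encodeURIComponent, encodes every character it is given, including unreserved ones.

```javascript
// Hypothetical helper: convert one Unicode code point to its UTF-8 bytes
function utf8Bytes(codePoint) {
  if (codePoint < 0x80) {
    return [codePoint]; // 1 byte: 0xxxxxxx
  }
  if (codePoint < 0x800) {
    // 2 bytes: 110xxxxx 10xxxxxx
    return [0xC0 | (codePoint >> 6), 0x80 | (codePoint & 0x3F)];
  }
  if (codePoint < 0x10000) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return [
      0xE0 | (codePoint >> 12),
      0x80 | ((codePoint >> 6) & 0x3F),
      0x80 | (codePoint & 0x3F),
    ];
  }
  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xF0 | (codePoint >> 18),
    0x80 | ((codePoint >> 12) & 0x3F),
    0x80 | ((codePoint >> 6) & 0x3F),
    0x80 | (codePoint & 0x3F),
  ];
}

// Hypothetical helper: run steps 1 to 3 for the first character of a string
function percentEncodeChar(ch) {
  return utf8Bytes(ch.codePointAt(0))
    .map(b => '%' + b.toString(16).toUpperCase().padStart(2, '0'))
    .join('');
}

for (const ch of ['é', '中', '🚀']) {
  console.log(`${ch} -> ${percentEncodeChar(ch)}`);
}
// é -> %C3%A9
// 中 -> %E4%B8%AD
// 🚀 -> %F0%9F%9A%80
```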
encodeURI vs encodeURIComponent — Choosing the Right Function
These two JavaScript functions get confused constantly. They look similar but encode very different character sets.
| | encodeURI() | encodeURIComponent() |
|---|---|---|
| Purpose | Encode a complete URL | Encode a single component (param key or value) |
| Preserves | : / ? # & = @ + $ , ; | None of these |
| Encodes | Spaces, non-ASCII, some punctuation | Everything except A-Z a-z 0-9 - _ . ~ ! ' ( ) * |
| Use when | You have a full URL with spaces or Unicode in the path | You’re building query parameters from user input |
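The difference in the table can be checked directly. In this sketch, preservedBy is a hypothetical helper that reports which characters a given function leaves unchanged.

```javascript
// Hypothetical helper: return the characters from `chars` that `encodeFn` leaves unchanged
function preservedBy(encodeFn, chars) {
  return [...chars].filter(c => encodeFn(c) === c).join('');
}

const structural = ':/?#&=@+$,;'; // URL delimiters
const marks = "-_.~!'()*";        // punctuation neither function encodes

console.log(preservedBy(encodeURI, structural));          // :/?#&=@+$,;
console.log(preservedBy(encodeURIComponent, structural)); // (empty string)
console.log(preservedBy(encodeURIComponent, marks));      // -_.~!'()*
```

Both functions encode a space as %20; they differ only in how they treat the structural characters.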
A bug that ships to production regularly:
// ❌ BUG: encodeURI does NOT encode &
const search = 'Tom & Jerry';
const bad = `https://api.example.com/search?q=${encodeURI(search)}`;
// Result: https://api.example.com/search?q=Tom%20&%20Jerry
// The & splits the query string — server sees q=Tom%20 and a separate param %20Jerry
// ✅ FIX: encodeURIComponent encodes & as %26
const good = `https://api.example.com/search?q=${encodeURIComponent(search)}`;
// Result: https://api.example.com/search?q=Tom%20%26%20Jerry
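When in doubt, pick encodeURIComponent(). It is the right choice whenever you are encoding a single value, such as a query parameter or a path segment, and that covers most real-world URL building.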
Try both modes side by side in our URL Encoder tool.
URL Encoding in Every Language
JavaScript (Browser & Node.js)
// Encode a parameter value
const value = encodeURIComponent('price >= 100 & currency = €');
// 'price%20%3E%3D%20100%20%26%20currency%20%3D%20%E2%82%AC'
// Decode
const original = decodeURIComponent(value);
// 'price >= 100 & currency = €'
// Modern approach: URLSearchParams handles encoding automatically
const params = new URLSearchParams({ q: 'hello world', lang: '中文' });
console.log(params.toString());
// 'q=hello+world&lang=%E4%B8%AD%E6%96%87'
// Note: URLSearchParams uses + for spaces (form encoding)
Python
from urllib.parse import quote, unquote, urlencode
# Encode a path segment
quote('hello world/file name.txt', safe='/')
# 'hello%20world/file%20name.txt'
# Encode query parameters
urlencode({'q': '你好', 'page': '1'})
# 'q=%E4%BD%A0%E5%A5%BD&page=1'
# quote_plus uses + for spaces (form encoding)
from urllib.parse import quote_plus
quote_plus('hello world') # 'hello+world'
quote('hello world') # 'hello%20world'
Go
import (
    "fmt"
    "net/url"
)
// Encode a query value (uses + for spaces)
url.QueryEscape("hello world & more")
// "hello+world+%26+more"
// Encode a path segment (uses %20 for spaces)
url.PathEscape("hello world & more")
// "hello%20world%20&%20more"
// Build a URL safely with url.Values
params := url.Values{}
params.Set("q", "你好世界")
params.Set("page", "1")
fmt.Println(params.Encode())
// "page=1&q=%E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C"
Java
import java.net.URLEncoder;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
// Encode (uses + for spaces — Java follows form encoding)
String encoded = URLEncoder.encode("hello world & more", StandardCharsets.UTF_8);
// "hello+world+%26+more"
// For RFC 3986 compliance, replace + with %20
String rfc3986 = encoded.replace("+", "%20");
// "hello%20world%20%26%20more"
// Decode
String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
// "hello world & more"
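Go's url.QueryEscape and Java's URLEncoder use form encoding (spaces as +). For RFC 3986 output, post-process the result to replace + with %20; this is safe because a literal plus sign in the input has already been encoded as %2B. In Go, url.PathEscape already emits %20 for path segments.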
Five URL Encoding Bugs That Break Production
1. Double Encoding (%2520 Instead of %20)
You encode a string. A framework encodes it again. The % in %20 becomes %25, and the server sees literal %20 text instead of a space.
Symptom: URLs contain %2520, %253D, or other %25xx patterns.
Diagnosis: %25 in a URL means a % character was encoded, which usually points to double encoding.
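Fix: Encode exactly once, at the point where the URL is assembled, and pass raw (decoded) values between layers. If you receive input whose encoding state is unknown, decode it first, then encode once.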
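// Detect likely double encoding: an encoded % (%25) followed by two hex digits
function isDoubleEncoded(str) {
  return /%25[0-9A-Fa-f]{2}/.test(str);
}
// Safe encode: decode first, then encode once.
// Caveat: raw text that legitimately contains a %XX sequence (for example "%41") gets decoded too.
function safeEncode(str) {
  try {
    str = decodeURIComponent(str);
  } catch (e) {
    // Malformed or stray % (for example "100%"): treat the input as raw text
  }
  return encodeURIComponent(str);
}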
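To see how the bug arises in practice, the sketch below encodes a value twice, once by hand and once through URLSearchParams, which always encodes what it is given. The variable names are illustrative.

```javascript
const raw = 'a b';
const once = encodeURIComponent(raw);   // 'a%20b'
const twice = encodeURIComponent(once); // 'a%2520b' (the % itself became %25)

// A common cause: handing an already-encoded value to an API that encodes again
const doubled = new URLSearchParams({ q: once }).toString(); // 'q=a%2520b'

// Passing the raw value lets URLSearchParams encode exactly once
const correct = new URLSearchParams({ q: raw }).toString();  // 'q=a+b'

console.log({ once, twice, doubled, correct });
```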
2. + in Path Segments
A developer URL-encodes a filename using a library that outputs + for spaces. The file my report.pdf becomes my+report.pdf. The server treats + as a literal plus sign and returns a 404.
The rule: + means space only in query strings (after ?). In path segments, + is just +. Always use %20 for spaces in paths.
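One way to follow this rule in JavaScript is to encode each path segment with encodeURIComponent, which emits %20 for spaces and %2B for literal plus signs. This is a sketch with hypothetical segment names.

```javascript
// Hypothetical path segments: one with a space, one with a literal plus sign
const segments = ['reports', 'my report.pdf', 'a+b.txt'];

// Encode each segment separately, then join with the structural "/"
const path = '/' + segments.map(s => encodeURIComponent(s)).join('/');
console.log(path); // /reports/my%20report.pdf/a%2Bb.txt

// Generic percent-decoding leaves + alone, as a path decoder should
const decodedPlus = decodeURIComponent('my+report.pdf');
console.log(decodedPlus); // my+report.pdf
```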
3. Broken OAuth Redirect URIs
The authorization URL looks like this:
https://auth.provider.com/authorize?redirect_uri=https://myapp.com/callback?code=abc&state=xyz
The OAuth server reads redirect_uri=https://myapp.com/callback?code=abc and treats state=xyz as a separate top-level parameter. Authentication fails.
Fix: Encode the entire redirect URI value:
const redirectUri = 'https://myapp.com/callback?code=abc&state=xyz';
const authUrl = `https://auth.provider.com/authorize?redirect_uri=${encodeURIComponent(redirectUri)}`;
// redirect_uri=https%3A%2F%2Fmyapp.com%2Fcallback%3Fcode%3Dabc%26state%3Dxyz
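An alternative sketch uses the standard URL and URLSearchParams APIs, which encode each parameter value on their own so there is no template string to get wrong. The variable names are illustrative; the URLs are the ones from the example above.

```javascript
const redirectTarget = 'https://myapp.com/callback?code=abc&state=xyz';

// Let the URL API encode the parameter value
const authorizeUrl = new URL('https://auth.provider.com/authorize');
authorizeUrl.searchParams.set('redirect_uri', redirectTarget);

console.log(authorizeUrl.toString());
// https://auth.provider.com/authorize?redirect_uri=https%3A%2F%2Fmyapp.com%2Fcallback%3Fcode%3Dabc%26state%3Dxyz

// The receiving side recovers the original value intact
const received = new URL(authorizeUrl.toString()).searchParams.get('redirect_uri');
console.log(received === redirectTarget); // true
```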
4. Garbled Non-ASCII Text in Logs
Server logs show %E4%BD%A0%E5%A5%BD instead of readable Chinese characters. The URL is correctly encoded; your log viewer just isn’t decoding percent-encoded sequences.
Fix: Pipe logs through a decoder, or paste the URL into a URL Decoder to read the original text.
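A log decoder can be as small as the sketch below. decodeLogLine is a hypothetical helper meant for reading logs only: decoding sequences like %26 or %20 makes the line ambiguous, so the output should not be parsed as a URL again.

```javascript
// Hypothetical helper: decode runs of %XX in a log line, leaving malformed runs untouched
function decodeLogLine(line) {
  return line.replace(/(?:%[0-9A-Fa-f]{2})+/g, run => {
    try {
      return decodeURIComponent(run);
    } catch {
      return run; // not valid UTF-8 (for example a truncated sequence): keep as-is
    }
  });
}

console.log(decodeLogLine('GET /search?q=%E4%BD%A0%E5%A5%BD 200'));
// GET /search?q=你好 200
console.log(decodeLogLine('GET /bad?q=%E4%BD 400'));
// GET /bad?q=%E4%BD 400
```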
5. API Signing Failures
OAuth 1.0 and AWS Signature V4 require strict RFC 3986 encoding. JavaScript’s encodeURIComponent() does not encode !, ', (, ), or *. If these characters appear in your signing input, the signature won’t match.
Fix: Post-process the output:
function rfc3986Encode(str) {
  return encodeURIComponent(str).replace(/[!'()*]/g, c =>
    '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
}
%20 vs + — The Space Encoding Dilemma
Two standards disagree on how to encode one character.
| Standard | Space becomes | Where it applies |
|---|---|---|
| RFC 3986 (URI syntax) | %20 | Everywhere in a URL |
| application/x-www-form-urlencoded | + | Query strings from HTML form submissions |
The + convention is a holdover from early web browsers. When a <form> submits with method="GET", the browser encodes spaces as + in the query string. The HTML spec codifies this behavior.
The problem: + only means “space” in query strings. In path segments, a + is a literal plus sign. This is why https://example.com/my+file.pdf serves a file named my+file.pdf, not my file.pdf.
Practical guidance:
- Use %20 when building URLs manually or encoding path segments. It works everywhere.
- Accept + when parsing query strings from form submissions; your framework probably handles this already.
- Don’t mix them. Pick one convention per component and stick to it.
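The parsing side of this guidance can be sketched as follows. URLSearchParams applies form decoding (+ becomes a space), while decodeURIComponent does not. formDecode is a hypothetical helper showing the manual equivalent.

```javascript
// A form-encoded query string: + stands for a space, %2B for a literal plus
const query = 'q=hello+world&tag=c%2B%2B';

// URLSearchParams applies form decoding
const parsed = new URLSearchParams(query);
console.log(parsed.get('q'));   // hello world
console.log(parsed.get('tag')); // c++

// decodeURIComponent follows generic percent-decoding and leaves + untouched
console.log(decodeURIComponent('hello+world')); // hello+world

// Hypothetical helper: form-decode a single value by hand
function formDecode(value) {
  // Turn + into a space first; real plus signs are still %2B at this point
  return decodeURIComponent(value.replace(/\+/g, ' '));
}
console.log(formDecode('hello+world')); // hello world
console.log(formDecode('c%2B%2B'));     // c++
```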
URL Encoding and Security
URL Encoding Is NOT Encryption
Percent encoding is a fully reversible, deterministic transformation with no cryptographic properties. Anyone can decode %48%65%6C%6C%6F back to Hello in milliseconds.
Don’t use URL encoding to hide sensitive data. Use HTTPS to encrypt the entire request. URLs appear in server logs, browser history, and Referer headers, so sensitive information belongs in request bodies, not URLs.
Open Redirect Attacks
Attackers use encoded URLs to bypass naive validation. A redirect parameter containing %2F%2Fevil.com decodes to //evil.com, which browsers treat as a protocol-relative URL pointing to the attacker’s domain.
Defense: Validate the decoded URL, not the encoded form. Use allowlists for redirect domains.
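A minimal sketch of that defense resolves the already-decoded target against the application's own origin and checks the resulting host against an allowlist. ALLOWED_REDIRECT_HOSTS, APP_ORIGIN, and isSafeRedirect are hypothetical names, and myapp.com stands in for your own domain; a real policy may need more rules than this.

```javascript
// Hypothetical configuration: hosts this application is willing to redirect to
const ALLOWED_REDIRECT_HOSTS = new Set(['myapp.com']);
const APP_ORIGIN = 'https://myapp.com';

// Hypothetical helper: `target` is the decoded value of the redirect parameter
function isSafeRedirect(target) {
  let url;
  try {
    // Resolving against our origin turns "/path" into an absolute URL
    // and exposes "//evil.com" as a different host
    url = new URL(target, APP_ORIGIN);
  } catch {
    return false; // unparseable input is rejected
  }
  return url.protocol === 'https:' && ALLOWED_REDIRECT_HOSTS.has(url.hostname);
}

console.log(isSafeRedirect(decodeURIComponent('%2F%2Fevil.com'))); // false
console.log(isSafeRedirect('/dashboard'));                         // true
console.log(isSafeRedirect('https://myapp.com.evil.com/'));        // false
```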
Double Encoding Exploits
A WAF checks incoming URLs for <script> tags. An attacker sends %253Cscript%253E. The WAF sees percent-encoded text and lets it through. The application decodes once to %3Cscript%3E, then a second decode produces <script>, bypassing the filter.
Defense: Normalize all input (decode fully) before applying security checks. Don’t rely on a single decode pass.
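Full normalization can be sketched as decoding until the value stops changing, with a cap on the number of passes. decodeFully and its maxPasses parameter are hypothetical; treating input that is still changing after the cap as suspicious is one possible policy, not the only one.

```javascript
// Hypothetical helper: decode repeatedly until the string is stable.
// Returns null if the input is still changing after maxPasses decodes.
function decodeFully(input, maxPasses = 5) {
  let current = input;
  for (let i = 0; i < maxPasses; i++) {
    let next;
    try {
      next = decodeURIComponent(current);
    } catch {
      return current; // cannot be decoded further (stray or malformed %)
    }
    if (next === current) return current; // stable: nothing left to decode
    current = next;
  }
  return null; // suspiciously deep nesting
}

console.log(decodeFully('%253Cscript%253E')); // <script>
console.log(decodeFully('hello'));            // hello
```

Security checks then run against the returned string, and a null result can be rejected outright.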
For more on web security fundamentals, see our Web Security Essentials guide.
URL Length Limits and When Encoding Gets Expensive
The HTTP spec sets no maximum URL length, but every layer of the stack imposes practical limits.
| Layer | Limit |
|---|---|
| General recommendation | 2,000 characters |
| Chrome, Firefox | On the order of megabytes (Chrome caps URLs at about 2 MB), but servers reject long before this |
| Apache (default) | 8,190 bytes |
| Nginx (default) | 8,192 bytes |
| IIS (default request filtering) | 4,096 bytes (URL), 2,048 bytes (query string) |
| CDNs, proxies | Varies — often 4,096-8,192 bytes |
Percent encoding makes URLs longer. A single Chinese character goes from 1 character to 9 (%E4%B8%AD). An emoji expands to 12. Two hundred Chinese characters in a query string alone produce 1,800 characters of percent-encoded text.
When you hit the limit: Move data from query parameters to a POST request body. For search interfaces, a POST endpoint accepting JSON works well.
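The decision can be automated with a length check before sending. In this sketch, SAFE_URL_LENGTH uses the 2,000-character recommendation from the table, and chooseTransport is a hypothetical helper that falls back to a JSON POST body when the encoded URL would be too long.

```javascript
// General recommendation from the table above
const SAFE_URL_LENGTH = 2000;

// Hypothetical helper: pick GET when the encoded URL fits, POST otherwise
function chooseTransport(baseUrl, params) {
  const url = `${baseUrl}?${new URLSearchParams(params)}`;
  if (url.length <= SAFE_URL_LENGTH) {
    return { method: 'GET', url };
  }
  // Too long once encoded: send the same data as a JSON body instead
  return { method: 'POST', url: baseUrl, body: JSON.stringify(params) };
}

// 200 Chinese characters expand to 1,800 encoded characters
console.log(encodeURIComponent('中'.repeat(200)).length); // 1800

const short = chooseTransport('https://api.example.com/search', { q: '中'.repeat(200) });
const long = chooseTransport('https://api.example.com/search', { q: '中'.repeat(300) });
console.log(short.method, long.method); // GET POST
```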
FAQ
What is URL encoding and why do developers need it?
URL encoding (percent encoding) converts characters that aren’t allowed in URLs into %XX hex sequences. Only the 66 unreserved ASCII characters are safe everywhere in a URL, and reserved characters such as & and = are safe only in their structural roles. Spaces, ampersands used as data, Unicode text, and most other punctuation must be encoded or they’ll break URL structure.
What is the difference between encodeURI and encodeURIComponent?
encodeURI() encodes a full URL while preserving structural characters like ://, /, ?, and &. encodeURIComponent() encodes everything except A-Z a-z 0-9 - _ . ~ ! ' ( ) *. Use encodeURIComponent() for query parameter values. Use encodeURI() only when you have a complete URL and want to fix spaces or non-ASCII characters without breaking its structure.
Why does %20 sometimes appear as + in URLs?
Both represent a space, but they come from different standards. %20 follows RFC 3986 and works everywhere in a URL. + follows the HTML form encoding spec and only works in query strings. In path segments, + is a literal plus sign. Use %20 when in doubt.
How do I URL-encode text in Python, JavaScript, Go, and Java?
JavaScript: encodeURIComponent('hello world') → hello%20world. Python: urllib.parse.quote('hello world') → hello%20world. Go: url.QueryEscape("hello world") → hello+world. Java: URLEncoder.encode("hello world", UTF_8) → hello+world. Go's QueryEscape and Java's URLEncoder use form encoding (space as +); replace + with %20 for RFC 3986 output.
Can URL encoding be used for security or encryption?
No. URL encoding is fully reversible without any key. It provides zero confidentiality. Protect sensitive data with HTTPS, not percent encoding. URLs appear in server logs, browser history, and Referer headers, so sensitive data belongs in request bodies.
What is double encoding and how do I fix it?
Double encoding happens when an already-encoded string gets encoded again. The % in %20 is encoded as %25, producing %2520. Servers see literal %20 text instead of a space. Fix it by decoding the input first, then encoding once. The pattern %25 followed by two hex digits is the telltale sign.
What is the maximum URL length?
No official maximum exists in the HTTP spec. 2,000 characters is the safe limit for broad compatibility. Apache defaults to 8,190 bytes, Nginx to 8,192 bytes. Each non-ASCII character becomes 6 to 12 characters when percent-encoded, so internationalized URLs hit limits faster. For large payloads, switch to POST.