URL Encoding Special Characters: A Developer Reference
A comprehensive reference for URL encoding special characters including spaces, ampersands, Unicode, and more.
Special Characters in URLs
URLs have a strict syntax defined by RFC 3986. Many characters have special meanings or are not allowed in certain parts of a URL. Understanding how to properly encode these characters is essential for building reliable web applications and APIs.
The Space Character
The space is one of the most frequently encoded characters, and it can be encoded in two ways depending on the context:
- %20: RFC 3986 percent-encoding, used in URL paths and most other contexts
- +: used in the application/x-www-form-urlencoded format (HTML form submissions)
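When building URLs programmatically, prefer %20 for consistency. When constructing form data, the + notation is standard. Keep in mind that + means a space only in form-encoded data; in a URL path, a + is a literal plus sign.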
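The difference is easy to see in JavaScript. This is a minimal sketch, assuming a runtime that provides the standard encodeURIComponent, decodeURIComponent, and URLSearchParams globals (modern browsers and Node.js); the variable names are illustrative only.

```javascript
// A value containing a space, as a user might type it into a search box.
const value = "hello world";

// RFC 3986 style: encodeURIComponent percent-encodes the space as %20.
const percentEncoded = encodeURIComponent(value);

// application/x-www-form-urlencoded style: URLSearchParams serializes the space as +.
const formEncoded = new URLSearchParams({ q: value }).toString();

console.log(percentEncoded); // hello%20world
console.log(formEncoded);    // q=hello+world

// A form-data parser accepts both forms and decodes them to the same value.
console.log(new URLSearchParams("q=hello%20world").get("q")); // hello world
console.log(new URLSearchParams("q=hello+world").get("q"));   // hello world

// Plain percent-decoding does NOT turn + into a space.
console.log(decodeURIComponent("hello+world")); // hello+world
```

The last line is why a server using form decoding and one using plain percent-decoding can disagree about the same URL.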
Reserved Characters Deep Dive
The Ampersand (&)
The ampersand separates query parameters. When an ampersand appears in a parameter value, it must be encoded as %26 to prevent it from being interpreted as a separator. In HTML contexts, it must also be entity-encoded as &amp;.
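A short JavaScript sketch of what goes wrong when the ampersand is left raw, assuming the standard URLSearchParams parser. The escapeHtmlAttr helper is hypothetical, not part of any standard API, and covers only the characters needed for a double-quoted HTML attribute.

```javascript
const title = "Tom & Jerry";

// Unencoded: the & is read as a parameter separator, so the value is cut short.
const broken = new URLSearchParams("title=" + title);
console.log(JSON.stringify(broken.get("title"))); // "Tom "

// Encoded: %26 keeps the ampersand inside the value.
const query = "title=" + encodeURIComponent(title);
console.log(query);                                   // title=Tom%20%26%20Jerry
console.log(new URLSearchParams(query).get("title")); // Tom & Jerry

// Hypothetical helper for placing a URL inside an HTML attribute.
// & is replaced first so the entities added afterwards are not double-escaped.
function escapeHtmlAttr(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
console.log(escapeHtmlAttr("/search?a=1&b=2")); // /search?a=1&amp;b=2
```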
The Question Mark (?)
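The question mark separates the path from the query string. In a path segment, a literal ? must be encoded as %3F. RFC 3986 does allow a raw ? inside the query itself, since only the first one starts the query, but encoding it as %3F in parameter values is the safer choice. Note that encodeURI() does NOT encode question marks, while encodeURIComponent() does.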
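To compare the two functions directly, here is a minimal sketch using only built-in JavaScript globals; the example.com URL is a placeholder.

```javascript
// encodeURI is designed for complete URLs, so it leaves delimiters such as ? & = # / unencoded.
console.log(encodeURI("why?"));          // why?
// encodeURIComponent is designed for a single component and encodes them.
console.log(encodeURIComponent("why?")); // why%3F

// Putting a question mark (and a +) into a query value safely.
const url = "https://example.com/search?q=" + encodeURIComponent("what is 1+1?");
console.log(url); // https://example.com/search?q=what%20is%201%2B1%3F
```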
The Hash (#)
The hash symbol starts the fragment identifier. In URLs, any # in a query parameter value must be encoded as %23, or the browser will interpret everything after it as a fragment identifier and not send it to the server.
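The effect can be reproduced with the WHATWG URL class available in browsers and Node.js. This sketch assumes a placeholder endpoint on example.com; the search property shows the query string that would actually be sent to the server.

```javascript
// Unencoded: "#" starts the fragment, so the server would only see lang=C.
const raw = new URL("https://example.com/search?lang=C#");
console.log(raw.search);                   // ?lang=C
console.log(raw.searchParams.get("lang")); // C

// Encoded: %23 keeps the hash inside the query value.
const safe = new URL("https://example.com/search?lang=" + encodeURIComponent("C#"));
console.log(safe.search);                   // ?lang=C%23
console.log(safe.searchParams.get("lang")); // C#
```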
The Forward Slash (/)
Forward slashes separate path segments. When a slash appears in a path segment value (like a filename containing a slash), it must be encoded as %2F. Note that some servers may decode %2F and still treat it as a path separator.
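A sketch of the difference for a filename that contains a slash. The filename and the joinPathSegments helper are hypothetical; the helper simply applies encodeURIComponent to each segment before joining the segments with /.

```javascript
// Hypothetical filename that contains a slash.
const fileName = "Q1/Q2 report.pdf";

// encodeURI leaves "/" alone, so the name splits into two path segments.
console.log("/files/" + encodeURI(fileName));          // /files/Q1/Q2%20report.pdf
// encodeURIComponent encodes "/" as %2F, keeping the name in one segment.
console.log("/files/" + encodeURIComponent(fileName)); // /files/Q1%2FQ2%20report.pdf

// Building a path from individual segments: encode each one, then join with "/".
function joinPathSegments(segments) {
  return "/" + segments.map(encodeURIComponent).join("/");
}
console.log(joinPathSegments(["files", fileName])); // /files/Q1%2FQ2%20report.pdf
```

Even with correct encoding, the server still decides how to route %2F, so it is worth testing against the actual deployment.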
Unicode Characters
Non-ASCII characters are encoded by first converting them to their UTF-8 byte sequence, then percent-encoding each byte. Here are some examples:
- Latin e with acute (é) → UTF-8: 0xC3 0xA9 → %C3%A9
- Chinese character 中 (zhōng) → UTF-8: 0xE4 0xB8 0xAD → %E4%B8%AD
- Emoji 🌍 (globe) → UTF-8: 0xF0 0x9F 0x8C 0x8D → %F0%9F%8C%8D
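The byte-by-byte procedure can be sketched in JavaScript with TextEncoder, which always produces UTF-8, and compared against encodeURIComponent. The percentEncodeUtf8 helper is hypothetical and deliberately encodes every byte, including plain ASCII, so it is for illustration rather than production use.

```javascript
// Hypothetical helper: percent-encode every byte of a string's UTF-8 representation.
// Unlike encodeURIComponent, this also encodes ASCII letters and digits.
function percentEncodeUtf8(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  return Array.from(bytes, (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")).join("");
}

// for...of iterates by code point, so the emoji is handled as one character.
for (const ch of ["é", "中", "🌍"]) {
  console.log(ch, percentEncodeUtf8(ch), encodeURIComponent(ch));
}
// é %C3%A9 %C3%A9
// 中 %E4%B8%AD %E4%B8%AD
// 🌍 %F0%9F%8C%8D %F0%9F%8C%8D

// Decoding reverses the process: bytes are reassembled and read as UTF-8.
console.log(decodeURIComponent("%E4%B8%AD")); // 中
```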
Encoding in Different URL Components
| Component | Must Encode | Example |
|---|---|---|
| Path segment | Spaces, /, ?, #, %, and non-ASCII | /path/my%20file |
| Query key | =, &, #, +, %, spaces, and non-ASCII | my%20key=value |
| Query value | &, #, +, =, %, spaces, and non-ASCII | key=hello%20world |
| Fragment | Spaces, %, and non-ASCII | #section%20one |
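One way to apply the table is to encode each component separately and only then assemble the URL. The buildUrl helper below is a hypothetical sketch, not a library API; it uses encodeURIComponent for every component, which errs on the side of encoding more than strictly required.

```javascript
// Hypothetical helper: encode each component on its own, then assemble the URL.
function buildUrl(origin, pathSegments, params, fragment) {
  // Path: encode each segment so "/", "?", "#" and spaces inside a segment are escaped.
  let url = origin + "/" + pathSegments.map(encodeURIComponent).join("/");

  // Query: encode keys and values independently, then join with "=" and "&".
  const query = Object.entries(params)
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
  if (query) url += "?" + query;

  // Fragment: encodeURIComponent is stricter than needed here, but safe.
  if (fragment) url += "#" + encodeURIComponent(fragment);
  return url;
}

console.log(
  buildUrl("https://example.com", ["path", "my file"], { "my key": "value", q: "a&b=c#d" }, "section one")
);
// https://example.com/path/my%20file?my%20key=value&q=a%26b%3Dc%23d#section%20one
```

encodeURIComponent leaves letters, digits, and - _ . ! ~ * ' ( ) unencoded; all of these are permitted in every component listed in the table.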
Common Pitfalls
- Forgetting to encode # in query values (the rest of the URL is silently dropped from the request)
- Double-encoding already-encoded strings (leading to %2520 instead of %20)
- Using encodeURI() instead of encodeURIComponent() for parameter values
- Not encoding + in query values (it gets decoded as a space)
- Assuming all servers handle %2F the same way in paths
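Several of these pitfalls can be reproduced in a few lines of JavaScript. This minimal sketch uses only standard globals; the sample values are arbitrary.

```javascript
// Pitfall: double-encoding. The % from the first pass is itself encoded as %25.
const once = encodeURIComponent("my file"); // my%20file
const twice = encodeURIComponent(once);     // my%2520file
console.log(once, twice);

// Pitfall: an unencoded + in a query value is decoded as a space by form parsers.
console.log(JSON.stringify(new URLSearchParams("lang=C++").get("lang")));         // "C  "
console.log(new URLSearchParams("lang=" + encodeURIComponent("C++")).get("lang")); // C++

// Pitfall: encodeURI leaves & and = intact, so it cannot protect a parameter value.
console.log(encodeURI("a&b=c"));          // a&b=c
console.log(encodeURIComponent("a&b=c")); // a%26b%3Dc
```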