URL Encoding Special Characters: A Developer Reference

A reference for URL-encoding special characters, including spaces, ampersands, Unicode, and more.

Special Characters in URLs

URLs have a strict syntax defined by RFC 3986. Many characters have special meanings or are not allowed in certain parts of a URL. Understanding how to properly encode these characters is essential for building reliable web applications and APIs.

The Space Character

The space character is the most commonly encoded character. However, it can be encoded in two ways depending on the context:

  • %20 - RFC 3986 percent-encoding, used in URL paths and most contexts
  • + - Used in application/x-www-form-urlencoded format (HTML form submissions)

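When building URLs programmatically, prefer %20, which is valid in every URL component. The + notation means a space only in form-encoded data (query strings and form request bodies); in a URL path, + is a literal plus sign.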
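
A minimal JavaScript sketch of the two conventions, assuming a runtime that provides the standard encodeURIComponent and URLSearchParams globals (browsers and Node.js). The variable names and the parameter name q are illustrative:

```javascript
// Percent-encoding (RFC 3986 style): a space becomes %20.
const pathSafe = encodeURIComponent("hello world");

// Form encoding (application/x-www-form-urlencoded): a space becomes +.
const formSafe = new URLSearchParams({ q: "hello world" }).toString();

console.log(pathSafe); // hello%20world
console.log(formSafe); // q=hello+world
```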

Reserved Characters Deep Dive

The Ampersand (&)

The ampersand separates query parameters. When an ampersand appears in a parameter value, it must be encoded as %26 to prevent it from being interpreted as a separator. When a URL is embedded in HTML (for example, in an href attribute), each ampersand in the URL must also be entity-encoded as &amp;.
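
The two layers of encoding can be sketched as follows. The helper escapeHtmlAttribute, the path /search, and the parameter names are hypothetical and exist only for this illustration:

```javascript
// Hypothetical helper: escapes the characters that matter inside an HTML attribute.
function escapeHtmlAttribute(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Step 1: percent-encode the value so its & is not read as a separator.
const encodedName = encodeURIComponent("Tom & Jerry"); // Tom%20%26%20Jerry
const searchUrl = "/search?name=" + encodedName + "&page=2";

// Step 2: entity-encode the whole URL before placing it in HTML markup.
const anchorTag = '<a href="' + escapeHtmlAttribute(searchUrl) + '">Results</a>';

console.log(anchorTag);
// <a href="/search?name=Tom%20%26%20Jerry&amp;page=2">Results</a>
```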

The Question Mark (?)

The question mark separates the path from the query string. If a question mark appears in a parameter value, it must be encoded as %3F. Note that encodeURI() does NOT encode question marks, while encodeURIComponent() does.
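
A short comparison of the two functions on the same input; the sample string is illustrative:

```javascript
const question = "what is this?";

// encodeURI() assumes a whole URL, so it leaves the ? delimiter alone.
const viaEncodeURI = encodeURI(question);

// encodeURIComponent() assumes a single component, so ? becomes %3F.
const viaEncodeURIComponent = encodeURIComponent(question);

console.log(viaEncodeURI);          // what%20is%20this?
console.log(viaEncodeURIComponent); // what%20is%20this%3F
```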

The Hash (#)

The hash symbol starts the fragment identifier. In URLs, any # in a query parameter value must be encoded as %23, or the browser will interpret everything after it as a fragment identifier and not send it to the server.
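
The effect can be observed with the standard URL parser, as in this sketch. The host example.com and the parameter name tag are placeholders:

```javascript
// Unencoded: everything from # onward is parsed as the fragment.
const rawTagUrl = new URL("https://example.com/search?tag=c#sharp");
console.log(rawTagUrl.search); // ?tag=c
console.log(rawTagUrl.hash);   // #sharp

// Encoded: the # stays inside the query value.
const safeTagUrl = new URL(
  "https://example.com/search?tag=" + encodeURIComponent("c#sharp")
);
console.log(safeTagUrl.search);                  // ?tag=c%23sharp
console.log(safeTagUrl.hash === "");             // true
console.log(safeTagUrl.searchParams.get("tag")); // c#sharp
```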

The Forward Slash (/)

Forward slashes separate path segments. When a slash appears in a path segment value (like a filename containing a slash), it must be encoded as %2F. Note that some servers may decode %2F and still treat it as a path separator.
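
A sketch of encoding a slash inside a single path segment. The file name and the /files/ prefix are hypothetical, and the final step only simulates a server that decodes the path before routing:

```javascript
// Hypothetical file name that contains a slash.
const fileName = "reports/2024 summary.pdf";

// Encoding the segment keeps the slash from creating an extra path level.
const filePath = "/files/" + encodeURIComponent(fileName);
console.log(filePath); // /files/reports%2F2024%20summary.pdf

// A server that decodes before routing sees the slash as a separator again.
const decodedParts = decodeURIComponent(filePath).split("/");
console.log(decodedParts.length); // 4
```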

Unicode Characters

Non-ASCII characters are encoded by first converting them to their UTF-8 byte sequence, then percent-encoding each byte. Here are some examples:

  • Latin e with acute (é) → UTF-8: 0xC3 0xA9 → %C3%A9
  • Chinese character zhong (中) → UTF-8: 0xE4 0xB8 0xAD → %E4%B8%AD
  • Globe emoji (🌍) → UTF-8: 0xF0 0x9F 0x8C 0x8D → %F0%9F%8C%8D
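
The two-step process (convert to UTF-8 bytes, then percent-encode each byte) can be sketched as below. The helper percentEncodeUtf8 is illustrative and assumes a runtime with the standard TextEncoder global. Unlike encodeURIComponent, it encodes every byte, including ASCII bytes that would not normally need it:

```javascript
// Illustrative helper: percent-encodes every UTF-8 byte of the input.
function percentEncodeUtf8(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 byte sequence
  return Array.from(
    bytes,
    (byte) => "%" + byte.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeUtf8("\u00E9"));    // é  -> %C3%A9
console.log(percentEncodeUtf8("\u4E2D"));    // 中 -> %E4%B8%AD
console.log(percentEncodeUtf8("\u{1F30D}")); // 🌍 -> %F0%9F%8C%8D

// The built-in function gives the same result for non-ASCII characters.
console.log(encodeURIComponent("\u{1F30D}")); // %F0%9F%8C%8D
```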

Encoding in Different URL Components

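| Component    | Must Encode                 | Example           |
|--------------|-----------------------------|-------------------|
| Path segment | Spaces, ?, #, and non-ASCII | /path/my%20file   |
| Query key    | =, &, #, +, spaces          | my%20key=value    |
| Query value  | &, #, +, spaces, =          | key=hello%20world |
| Fragment     | Spaces and non-ASCII        | #section%20one    |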
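
One way to apply per-component encoding is to encode each piece separately and only then join the pieces with their delimiters. The sketch below assumes a hypothetical helper buildUrl with illustrative parameter names. It uses encodeURIComponent for every component, which encodes more than a fragment strictly requires but keeps the logic uniform:

```javascript
// Hypothetical helper: encodes each component, then adds the delimiters.
function buildUrl(origin, pathSegments, queryParams, fragment) {
  const path = pathSegments
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  const query = Object.entries(queryParams)
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");

  let url = origin + "/" + path;
  if (query) url += "?" + query;
  if (fragment) url += "#" + encodeURIComponent(fragment);
  return url;
}

const builtUrl = buildUrl(
  "https://example.com",
  ["path", "my file"],
  { "my key": "hello world" },
  "section one"
);
console.log(builtUrl);
// https://example.com/path/my%20file?my%20key=hello%20world#section%20one
```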

Common Pitfalls

  • Forgetting to encode # in query values (everything after it is treated as a fragment and never reaches the server)
  • Double-encoding already-encoded strings (leading to %2520 instead of %20)
  • Using encodeURI() instead of encodeURIComponent() for parameter values
  • Not encoding + in query values (it gets decoded as a space)
  • Assuming all servers handle %2F the same way in paths
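
Several of these pitfalls can be reproduced in a few lines. This is a sketch with illustrative variable and parameter names, assuming the standard URLSearchParams global:

```javascript
// Double-encoding: the % of %20 is itself encoded as %25.
const encodedOnce = encodeURIComponent("a b");        // a%20b
const encodedTwice = encodeURIComponent(encodedOnce); // a%2520b

// An unencoded + in a query value is read back as a space.
const lossyValue = new URLSearchParams("expr=1+1").get("expr"); // "1 1"
const intactValue = new URLSearchParams(
  "expr=" + encodeURIComponent("1+1")
).get("expr"); // "1+1"

// encodeURI() leaves & and = untouched, so one value becomes two parameters.
const leakyParams = new URLSearchParams("q=" + encodeURI("a&b=c"));
const leakyKeys = [...leakyParams.keys()]; // contains "q" and "b"

console.log(encodedTwice, lossyValue, intactValue, leakyKeys);
```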
