What is Percent Encoding? (RFC 3986 Explained Simply)

What is Percent Encoding?

Percent encoding, also called URL encoding, is the standard way to represent characters in a URI (Uniform Resource Identifier) that are not allowed there or that have special meaning. It works by replacing each byte of such a character with a percent sign (%) followed by two hexadecimal digits giving that byte's value.

For example, the space character (byte value 0x20) is encoded as %20. The at sign @ (byte value 0x40) is encoded as %40. This mechanism ensures that any data can be safely embedded in a URI without conflicting with the URI syntax.

The term "percent encoding" comes from the use of the percent character as an escape prefix. The formal specification is defined in RFC 3986, published by the Internet Engineering Task Force (IETF) in January 2005, authored by Tim Berners-Lee, Roy Fielding, and Larry Masinter.

How Percent Encoding Works

The encoding process follows these steps: first, the character is converted to its byte representation using a character encoding (almost always UTF-8 on the modern web). Then, each byte is represented as a percent sign followed by two hexadecimal digits (using uppercase A-F by convention, though decoders should accept lowercase).

// Step-by-step encoding of a space character
// Character: ' ' (space)
// ASCII/UTF-8 byte value: 0x20 (decimal 32)
// Percent-encoded: %20

// Step-by-step encoding of a multi-byte character: e with acute
// Character: é (e with acute, U+00E9)
// UTF-8 bytes: 0xC3 0xA9
// Percent-encoded: %C3%A9

// Step-by-step encoding of a 3-byte character
// Character: 中 (CJK ideograph, U+4E2D)
// UTF-8 bytes: 0xE4 0xB8 0xAD
// Percent-encoded: %E4%B8%AD
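
The same steps can be sketched in a few lines of Python. This is a minimal illustration rather than a full encoder: the helper name percent_encode_char is made up for this example, and it encodes every character it is given without checking whether that character actually needs encoding.

```python
def percent_encode_char(ch: str) -> str:
    """Percent-encode every UTF-8 byte of the given character."""
    # Step 1: convert the character to its UTF-8 bytes.
    # Step 2: write each byte as "%" plus two uppercase hex digits.
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))


print(percent_encode_char(" "))       # %20
print(percent_encode_char("\u00e9"))  # %C3%A9  (e with acute)
print(percent_encode_char("\u4e2d"))  # %E4%B8%AD  (CJK ideograph)
```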

The hexadecimal digits must always come in pairs. A lone percent sign, or a percent sign followed by anything other than two hexadecimal digits, is invalid, and a strict decoder will reject it. This is a common source of the "URI malformed" error in JavaScript.

Percent encoding is always reversible. The decoder reads the percent sign, takes the next two characters as a hexadecimal byte value, and emits that byte. For multi-byte UTF-8 characters, the consecutive decoded bytes are combined and interpreted together as one character.
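
To make the decoding rules concrete, here is a rough sketch of a strict decoder in Python. The function name percent_decode and the choice to raise ValueError on malformed input are assumptions made for this example. Real libraries differ on this point: Python's own urllib.parse.unquote, for instance, leaves an invalid escape in place instead of raising.

```python
import re

# A valid escape is "%" followed by exactly two hex digits (either case).
_VALID_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def percent_decode(text: str) -> str:
    """Strictly decode percent escapes, then interpret the bytes as UTF-8."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            if not _VALID_ESCAPE.match(text, i):
                raise ValueError(f"malformed percent escape at index {i}")
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.extend(text[i].encode("utf-8"))
            i += 1
    # Consecutive decoded bytes are combined here into characters.
    return out.decode("utf-8")


print(percent_decode("caf%C3%A9%20au%20lait"))
try:
    percent_decode("100%")
except ValueError as err:
    print("rejected:", err)
```

Collecting the bytes first and decoding them as UTF-8 only at the end is what lets multi-byte sequences such as %C3%A9 come back as a single character.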

Which Characters Must Be Encoded?

Not all characters need percent encoding. RFC 3986 defines two categories: unreserved characters that never need encoding, and reserved characters that must be encoded when they are not being used for their reserved purpose. All other characters (including spaces, non-ASCII characters, and control characters) must always be encoded.

Characters that do NOT need encoding (unreserved): uppercase letters (A-Z), lowercase letters (a-z), digits (0-9), hyphen (-), period (.), underscore (_), and tilde (~). These 66 characters can appear anywhere in a URI without encoding.

Characters that sometimes need encoding (reserved): : / ? # [ ] @ ! $ & ' ( ) * + , ; =. These characters have special syntactic meaning in URIs. They only need encoding when used as data rather than as delimiters.

Characters that always need encoding: spaces, non-ASCII characters (any character above 127), control characters (0-31 and 127), and unsafe characters like < > { } | \ ^ ` ".

Reserved vs Unreserved Characters in RFC 3986

The distinction between reserved and unreserved characters is fundamental to how URIs work. Reserved characters serve as delimiters that define the structure of a URI. For example, :// separates the scheme from the authority, / separates path segments, ? begins the query string, and # begins the fragment.

Category	Characters	When to Encode
Unreserved	`A-Z a-z 0-9 - . _ ~`	Never
General delimiters	`: / ? # [ ] @`	When used as data, not as delimiters
Sub-delimiters	`! $ & ' ( ) * + , ; =`	When used as data in components that assign meaning to them
All other characters	Spaces, non-ASCII, control chars, etc.	Always
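
The table above can be expressed as a small lookup in Python. This is only an illustrative sketch: the set names and the classify helper are invented for this example, and the function reports a character's category without deciding whether a given reserved character is acting as a delimiter or as data in a particular URI.

```python
import string

# Character sets as listed in the table above.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")


def classify(ch: str) -> str:
    """Return the table category for a single character."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in GEN_DELIMS:
        return "general delimiter"
    if ch in SUB_DELIMS:
        return "sub-delimiter"
    return "always encode"


print(len(UNRESERVED))  # 66
for ch in ["~", "/", "&", " ", "\u00e9"]:
    print(repr(ch), "->", classify(ch))
```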

A critical principle from RFC 3986 is that URIs that differ only in whether a reserved character is percent-encoded or appears literally are not equivalent. For example, /path/to and /path%2Fto are different URIs, even though %2F decodes to /. The first has two path segments; the second has a single path segment whose data contains a slash character.
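
A short Python sketch shows why the order of operations matters here. The helper path_segments is hypothetical; it assumes the path is split on literal slashes first and each segment is decoded only afterwards, which is the order that keeps the two URIs distinct.

```python
from urllib.parse import unquote, urlsplit


def path_segments(uri: str) -> list:
    """Split the raw path on literal '/', then decode each segment."""
    raw_path = urlsplit(uri).path  # urlsplit leaves percent escapes untouched
    return [unquote(segment) for segment in raw_path.split("/")[1:]]


print(path_segments("/path/to"))    # ['path', 'to']
print(path_segments("/path%2Fto"))  # ['path/to']
```

Decoding the whole path before splitting would turn both inputs into /path/to and lose the difference.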

Percent Encoding vs URL Encoding: Are They the Same?

"Percent encoding" and "URL encoding" are often used interchangeably, and in most contexts they mean the same thing. However, there is a subtle historical distinction. "URL encoding" can sometimes refer to the older application/x-www-form-urlencoded format used by HTML forms, which encodes spaces as + instead of %20.

The application/x-www-form-urlencoded format was defined in the HTML specification and predates RFC 3986. It has slightly different rules: spaces become +, and the set of characters that do not get encoded is slightly different. Modern use of the term "URL encoding" almost always refers to RFC 3986 percent encoding, where spaces are %20.

In practice, you should use RFC 3986 percent encoding for all URI contexts (path segments, query parameters in REST APIs, fragment identifiers). Use the application/x-www-form-urlencoded format only when dealing with HTML form submissions or when a specific API requires it.

// RFC 3986 percent encoding (recommended for URIs)
encodeURIComponent('hello world')  // "hello%20world"

// application/x-www-form-urlencoded (HTML forms)
new URLSearchParams({q: 'hello world'}).toString()  // "q=hello+world"

# Python equivalent
from urllib.parse import quote, quote_plus
quote('hello world', safe='')   # "hello%20world"
quote_plus('hello world')       # "hello+world"
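
The difference also matters when decoding. As a brief sketch using Python's standard urllib.parse module (the sample strings are made up for illustration), a plus sign is kept as a literal plus under RFC 3986 rules but treated as a space under form-encoding rules:

```python
from urllib.parse import parse_qs, unquote, unquote_plus

# RFC 3986 style decoding: "+" has no special meaning and is kept as-is.
print(unquote("a+b%20c"))         # a+b c

# Form-style decoding: "+" is read as an encoded space.
print(unquote_plus("a+b%20c"))    # a b c

# Query-string parsing applies the form rules to each value.
print(parse_qs("q=hello+world"))  # {'q': ['hello world']}
```

Decoding with the wrong rule set silently changes the data, so it helps to know which format produced a given string.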
