
The Complete Guide to URL Encoding

Learn everything about URL encoding: what it is, why it matters, how percent-encoding works, and best practices for handling special characters in URLs.

What is URL Encoding?

URL encoding, also known as percent-encoding, is a mechanism for encoding information in a Uniform Resource Identifier (URI). Although it is called URL encoding, it applies to URIs in general, a set that includes both Uniform Resource Locators (URLs) and Uniform Resource Names (URNs).

Encoding is needed because a URI may contain only a limited subset of US-ASCII characters (RFC 3986). Any character outside that subset, whether a non-ASCII character such as é or an ASCII character such as the space, must be converted before it can appear in a URL. Percent-encoding does this by replacing each byte of the character with a % followed by two hexadecimal digits.

How Does Percent-Encoding Work?

The percent-encoding mechanism works by converting each character that needs encoding into one or more bytes. Each byte is then represented by a percent sign followed by two hexadecimal digits. The hexadecimal digits represent the byte value of the character in the specified encoding (typically UTF-8).

For example, the space character (ASCII value 32, hex 20) becomes %20, and the ampersand & (ASCII value 38, hex 26) becomes %26. A non-ASCII character such as é (U+00E9) is first encoded to its UTF-8 byte sequence, 0xC3 0xA9, and each byte is then percent-encoded, giving %C3%A9.
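To make the byte-by-byte mechanism concrete, here is a minimal Python sketch of a percent-encoder, assuming UTF-8 and assuming that everything outside the RFC 3986 unreserved set should be encoded. The function name percent_encode is hypothetical; in real code you would use a library function such as urllib.parse.quote, which the last line uses as a cross-check.

```python
from urllib.parse import quote

# RFC 3986 unreserved characters; this sketch leaves only these unencoded.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def percent_encode(text: str) -> str:
    """Hypothetical helper: percent-encode every non-unreserved character."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # Encode the character to UTF-8, then emit %XX (uppercase hex) per byte.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)

print(percent_encode(" "))       # %20
print(percent_encode("&"))       # %26
print(percent_encode("\u00e9"))  # é -> %C3%A9
# Should agree with the standard library when no extra characters are marked safe.
print(percent_encode("caf\u00e9 & cr\u00e8me") == quote("caf\u00e9 & cr\u00e8me", safe=""))  # True
```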

Reserved vs Unreserved Characters

RFC 3986 defines two categories of characters for URIs: reserved and unreserved. Unreserved characters include uppercase and lowercase letters (A-Z, a-z), digits (0-9), hyphens (-), periods (.), underscores (_), and tildes (~). These characters can be included in a URI without encoding.

Reserved characters have special meanings in URL syntax. RFC 3986 splits them into general delimiters (: / ? # [ ] @) and sub-delimiters (! $ & ' ( ) * + , ; =). When one of these characters appears in a URI component as data rather than in its reserved role as a delimiter, it must be percent-encoded.
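The difference between the two categories can be seen with Python's urllib.parse.quote. This is a small sketch: the value "rock&roll=yes?" and the example.com URL are made up for illustration. Unreserved characters pass through untouched, while reserved characters inside a query value are encoded so they cannot be read as delimiters.

```python
from urllib.parse import quote

# Unreserved characters are never encoded.
print(quote("AZaz09-._~", safe=""))  # AZaz09-._~

# An illustrative query value that happens to contain reserved characters.
value = "rock&roll=yes?"
encoded = quote(value, safe="")      # safe="" means no reserved character is kept
print(encoded)                       # rock%26roll%3Dyes%3F

# The encoded value can now sit inside a query string without adding
# extra '&', '=' or '?' delimiters.
url = "https://example.com/search?q=" + encoded
print(url)                           # https://example.com/search?q=rock%26roll%3Dyes%3F
```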

URL Encoding in Different Languages

JavaScript

JavaScript provides two pairs of functions for URL encoding: encodeURI()/decodeURI() for encoding complete URIs, and encodeURIComponent()/decodeURIComponent() for encoding individual URI components. The key difference is that encodeURI() preserves characters that have special meaning in URIs (such as /, ?, #, & and =), while encodeURIComponent() encodes everything except letters, digits, and the characters - _ . ! ~ * ' ( ).
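To keep all examples in one language, the sketch below approximates the two JavaScript encoders in Python by giving quote() the extra characters each JavaScript function leaves unescaped. The helper names encode_uri and encode_uri_component are hypothetical, and edge cases (for example, how invalid surrogate characters are reported) differ from the real JavaScript functions; for well-formed strings the output should match.

```python
from urllib.parse import quote

# quote() never encodes letters, digits or -._~ ; these are the additional
# characters JavaScript leaves unescaped.
_COMPONENT_SAFE = "!*'()"                  # encodeURIComponent()
_URI_SAFE = _COMPONENT_SAFE + ";,/?:@&=+$#"  # encodeURI() also keeps URI delimiters

def encode_uri_component(s: str) -> str:
    """Hypothetical approximation of JavaScript's encodeURIComponent()."""
    return quote(s, safe=_COMPONENT_SAFE)

def encode_uri(s: str) -> str:
    """Hypothetical approximation of JavaScript's encodeURI()."""
    return quote(s, safe=_URI_SAFE)

# encodeURI keeps the URL's structure and only fixes illegal characters.
print(encode_uri("https://example.com/a b?q=1&r=\u00e9"))
# https://example.com/a%20b?q=1&r=%C3%A9

# encodeURIComponent also encodes '&' and '/', which is what a single value needs.
print(encode_uri_component("a b&c/d"))
# a%20b%26c%2Fd
```

This is why a query parameter value should go through encodeURIComponent(): encodeURI() would leave an & inside the value unencoded, and the server would see it as the start of a new parameter.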

Python

Python's urllib.parse module provides quote() and unquote() for basic encoding and decoding, plus urlencode() for encoding dictionaries into query strings. The quote() function takes an optional safe parameter listing characters that should not be encoded; it defaults to '/', so pass safe='' when encoding a single path segment or query value.
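A short sketch of these functions in use; the strings and the dictionary are illustrative only. Note that urlencode() encodes values with quote_plus() by default, so spaces become + rather than %20.

```python
from urllib.parse import quote, unquote, urlencode

print(quote("path/to file"))           # path/to%20file  ('/' is safe by default)
print(quote("path/to file", safe=""))  # path%2Fto%20file
print(unquote("caf%C3%A9"))            # café

# urlencode() turns a dict into a query string (spaces become '+').
print(urlencode({"q": "caf\u00e9 & bar", "page": 2}))
# q=caf%C3%A9+%26+bar&page=2
```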

PHP

PHP offers urlencode() which encodes spaces as + (following the application/x-www-form-urlencoded format), and rawurlencode() which encodes spaces as %20 (following RFC 3986). For decoding, use urldecode() and rawurldecode() respectively.
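The same two styles exist in Python, which makes the distinction easy to try out. The sketch below uses quote_plus()/unquote_plus() as a rough analogue of PHP's urlencode()/urldecode() and quote(safe="")/unquote() as a rough analogue of rawurlencode()/rawurldecode(). The mapping is approximate, not exact: PHP's urlencode(), for instance, also encodes ~, which quote_plus() leaves alone.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "hello world & more"
print(quote_plus(text))      # hello+world+%26+more        (form style)
print(quote(text, safe=""))  # hello%20world%20%26%20more  (RFC 3986 style)

# Decode with the function that matches the encoding: only the
# form-style decoder turns '+' back into a space.
print(unquote_plus("a+b%2Bc"))  # a b+c
print(unquote("a+b%2Bc"))       # a+b+c
```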

Best Practices

  • Always encode user input before including it in URLs
  • Use encodeURIComponent() for encoding query parameter values in JavaScript
  • Use encodeURI() only when encoding a complete URI
  • Be consistent with your encoding approach throughout your application
  • Decode each URL component exactly once on the server, after splitting the URL into its parts (most web frameworks already do this for you)
  • Test your encoding with special characters including Unicode
  • Do not double-encode URLs: encode each raw value exactly once, when you build the URL, rather than guessing whether a string is already encoded
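Two of these pitfalls are easy to reproduce in Python. The sketch below (with an illustrative example.com URL) shows what double-encoding does to a value, and why a query string should be split into parameters before its values are decoded rather than after.

```python
from urllib.parse import parse_qs, quote, unquote, urlsplit

# Double-encoding: the second pass encodes the '%' from the first pass.
once = quote("a b", safe="")
twice = quote(once, safe="")
print(once, twice)               # a%20b a%2520b

# Split first, then decode: parse_qs() splits on '&' and decodes each value.
url = "https://example.com/search?q=a%26b&page=1"
query = urlsplit(url).query
print(parse_qs(query))           # {'q': ['a&b'], 'page': ['1']}

# Decoding the whole query first turns %26 into a real '&' delimiter,
# so the value of q is cut short.
print(parse_qs(unquote(query)))  # {'q': ['a'], 'page': ['1']}
```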
