Complete URL Encoded Characters Table (ASCII + Unicode)
A complete reference table of URL-encoded characters. It covers all printable ASCII characters (32-126), common Unicode characters, and their percent-encoded equivalents as defined by RFC 3986.
ASCII Control and Special Characters
ASCII characters in the range 0-31 and character 127 (DEL) are control characters. They are not printable and must always be percent-encoded in URLs. The control characters most commonly encountered in URL contexts are horizontal tab (ASCII 9), line feed (ASCII 10), and carriage return (ASCII 13). The space (ASCII 32) is not a control character, but it is listed here because it must also always be encoded.
According to RFC 3986, any octet that is not an unreserved character, a reserved character used for its reserved purpose, or part of a percent-encoded triplet must be percent-encoded. In practice, this means most non-alphanumeric characters should be encoded when they appear as data inside a URI component.
| Character | ASCII Code | URL Encoded | Description |
|---|---|---|---|
| (tab) | 9 | %09 | Horizontal Tab |
| (LF) | 10 | %0A | Line Feed |
| (CR) | 13 | %0D | Carriage Return |
| (space) | 32 | %20 | Space |
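As a quick check of the table above, the following sketch passes each character to JavaScript's built-in encodeURIComponent and prints the result. The `samples` array and its labels are illustrative names chosen for this example, not part of any API.

```javascript
// Illustrative list of the characters from the table above.
const samples = [
  ['tab', '\t'],
  ['LF', '\n'],
  ['CR', '\r'],
  ['space', ' '],
];

for (const [label, ch] of samples) {
  // encodeURIComponent percent-encodes control characters and the space.
  console.log(label, encodeURIComponent(ch));
}
```

The same call should also handle the remaining control characters; for instance, DEL (ASCII 127) comes out as %7F.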
Printable ASCII Characters
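Below is a complete reference of all printable ASCII characters (codes 32-126) and their URL-encoded forms. Characters marked "Unreserved" (letters, digits, hyphen, period, underscore, and tilde) do not require encoding under RFC 3986; their encoded forms are listed for reference only. Characters marked "Reserved" must be encoded when used outside their reserved purpose. Characters marked "Unsafe" (a term from the older RFC 1738) or "Special" must always be encoded when they appear as literal data.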
| Character | ASCII | Hex | URL Encoded | Type |
|---|---|---|---|---|
| (space) | 32 | 20 | %20 | Special |
| ! | 33 | 21 | %21 | Reserved |
| " | 34 | 22 | %22 | Unsafe |
| # | 35 | 23 | %23 | Reserved |
| $ | 36 | 24 | %24 | Reserved |
| % | 37 | 25 | %25 | Special |
| & | 38 | 26 | %26 | Reserved |
| ' | 39 | 27 | %27 | Reserved |
| ( | 40 | 28 | %28 | Reserved |
| ) | 41 | 29 | %29 | Reserved |
| * | 42 | 2A | %2A | Reserved |
| + | 43 | 2B | %2B | Reserved |
| , | 44 | 2C | %2C | Reserved |
| - | 45 | 2D | %2D | Unreserved |
| . | 46 | 2E | %2E | Unreserved |
| / | 47 | 2F | %2F | Reserved |
| 0-9 | 48-57 | 30-39 | Not required | Unreserved |
| : | 58 | 3A | %3A | Reserved |
| ; | 59 | 3B | %3B | Reserved |
| < | 60 | 3C | %3C | Unsafe |
| = | 61 | 3D | %3D | Reserved |
| > | 62 | 3E | %3E | Unsafe |
| ? | 63 | 3F | %3F | Reserved |
| @ | 64 | 40 | %40 | Reserved |
| A-Z | 65-90 | 41-5A | Not required | Unreserved |
| [ | 91 | 5B | %5B | Reserved |
| \ | 92 | 5C | %5C | Unsafe |
| ] | 93 | 5D | %5D | Reserved |
| ^ | 94 | 5E | %5E | Unsafe |
| _ | 95 | 5F | %5F | Unreserved |
| ` | 96 | 60 | %60 | Unsafe |
| a-z | 97-122 | 61-7A | Not required | Unreserved |
| { | 123 | 7B | %7B | Unsafe |
| \| | 124 | 7C | %7C | Unsafe |
| } | 125 | 7D | %7D | Unsafe |
| ~ | 126 | 7E | %7E | Unreserved |
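JavaScript's built-in encodeURIComponent leaves the reserved characters ! ' ( ) * unencoded, so its output does not match every row of the table above. The following is a minimal sketch of a stricter helper that also encodes those five characters; the name `encodeRFC3986` is hypothetical and chosen only for this example.

```javascript
// Hypothetical helper: encodeURIComponent plus the five reserved
// characters (! ' ( ) *) that the built-in function leaves untouched.
function encodeRFC3986(str) {
  return encodeURIComponent(str).replace(
    /[!'()*]/g,
    // Convert each remaining reserved character to its %XX form.
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeRFC3986("it's (50% off)!*"));
console.log(encodeRFC3986('-._~')); // unreserved characters pass through
```

Whether the extra strictness is needed depends on the consumer of the URL; the built-in function alone is often sufficient for query string values.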
Common Unicode Encodings
Unicode characters are first encoded to their UTF-8 byte sequence, then each byte is individually percent-encoded. Multi-byte characters produce multiple percent-encoded triplets.
| Character | Description | UTF-8 Bytes | URL Encoded |
|---|---|---|---|
| ä | Latin Small Letter A with Diaeresis | C3 A4 | %C3%A4 |
| ñ | Latin Small Letter N with Tilde | C3 B1 | %C3%B1 |
| é | Latin Small Letter E with Acute | C3 A9 | %C3%A9 |
| ü | Latin Small Letter U with Diaeresis | C3 BC | %C3%BC |
| € | Euro Sign | E2 82 AC | %E2%82%AC |
| £ | Pound Sign | C2 A3 | %C2%A3 |
| ¥ | Yen Sign | C2 A5 | %C2%A5 |
| 中 | CJK Unified Ideograph (middle) | E4 B8 AD | %E4%B8%AD |
| Я | Cyrillic Capital Letter Ya | D0 AF | %D0%AF |
| ب | Arabic Letter Ba | D8 A8 | %D8%A8 |
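The two-step process described above (UTF-8 bytes first, then one triplet per byte) can be sketched by hand. This example assumes an environment where TextEncoder is available as a global (modern browsers and recent Node.js versions); the function name `percentEncodeUtf8` is hypothetical.

```javascript
// Hypothetical helper: percent-encode every UTF-8 byte of a string.
// Unlike encodeURIComponent, this encodes ALL bytes, including
// unreserved ASCII characters, to make the byte mapping visible.
function percentEncodeUtf8(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  return Array.from(
    bytes,
    (b) => '%' + b.toString(16).toUpperCase().padStart(2, '0')
  ).join('');
}

console.log(percentEncodeUtf8('€')); // three bytes, three triplets
console.log(percentEncodeUtf8('Я')); // two bytes, two triplets
```

For non-ASCII input the result should match what encodeURIComponent produces; the manual version is only meant to show where each triplet comes from.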
How Multi-Byte Characters Are Encoded
UTF-8 uses a variable-length encoding scheme. Characters in the ASCII range (0-127) use one byte. Characters outside this range use two to four bytes. When percent-encoding, each byte of the UTF-8 representation becomes a separate %XX triplet.
Two-byte characters (U+0080 to U+07FF): These include most Latin extended, Greek, Cyrillic, and Arabic characters. For example, the German letter sharp s (ß, U+00DF) encodes to UTF-8 bytes C3 9F, which becomes %C3%9F.
Three-byte characters (U+0800 to U+FFFF): These cover the rest of the Basic Multilingual Plane, including CJK characters and most symbols. For example, the Japanese hiragana character あ (U+3042) encodes to UTF-8 bytes E3 81 82, which becomes %E3%81%82.
Four-byte characters (U+10000 to U+10FFFF): These include emoji and other supplementary characters. For example, the grinning face emoji (😀, U+1F600) encodes to UTF-8 bytes F0 9F 98 80, which becomes %F0%9F%98%80.
// JavaScript demonstration of multi-byte encoding
console.log(encodeURIComponent('ü'));
// Two bytes: "%C3%BC"
console.log(encodeURIComponent('中'));
// Three bytes: "%E4%B8%AD"
console.log(encodeURIComponent('😀'));
// Four bytes: "%F0%9F%98%80"
// You can verify by decoding
console.log(decodeURIComponent('%C3%BC')); // "ü"
console.log(decodeURIComponent('%E4%B8%AD')); // "中"
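The code point ranges listed in this section can also be checked programmatically. The sketch below derives the expected UTF-8 byte count from a character's code point and compares it with the number of triplets that encodeURIComponent emits. The names `utf8ByteLength` and `describe` are hypothetical, and the comparison is only meaningful for non-ASCII characters, since unreserved ASCII characters are not encoded at all.

```javascript
// Hypothetical helper: UTF-8 byte count for a single code point,
// following the ranges described in this section.
function utf8ByteLength(codePoint) {
  if (codePoint <= 0x7f) return 1;
  if (codePoint <= 0x7ff) return 2;
  if (codePoint <= 0xffff) return 3;
  return 4;
}

// Hypothetical helper: summarize how one character is encoded.
function describe(ch) {
  const cp = ch.codePointAt(0); // handles characters above U+FFFF
  const encoded = encodeURIComponent(ch);
  return {
    codePoint: 'U+' + cp.toString(16).toUpperCase().padStart(4, '0'),
    bytes: utf8ByteLength(cp),
    triplets: encoded.split('%').length - 1, // count of %XX groups
    encoded,
  };
}

for (const ch of ['ß', 'あ', '😀']) {
  console.log(describe(ch));
}
```

For each of the three sample characters, the `bytes` and `triplets` fields should agree, matching the two-, three-, and four-byte examples given above.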