URL Syntax
https://admin:[email protected]:80/bio.txt;pp=1&qp=2#Three
URL Part | URL Data |
---|---|
Scheme | https |
User | admin |
Password | pass123 |
Subdomain | www |
Domain | example.com |
Port | 80 |
Path | /bio.txt |
Path Parameter | pp=1 |
Query Parameter | qp=2 |
Fragment | Three |
Safe Characters
RFC1738 section 2.2 outlines the safe characters to use in an HTTP URL Scheme:
abcdefghijklmnopqrstuvwxyz0123456789$-_.+!*'(),
Safe characters can be used in URLs without any form of encoding as they aren’t reserved for special use in the construction of the URL.
Unsafe Characters
Per RFC1738 section 2.2, the following characters are unsafe for use in an HTTP URL Scheme:
space < > " # % { } | \ ^ ~ [ ] `
RFC1738 section 2.2 also states that the following characters are reserved in an HTTP URL Scheme:
; / ? : @ = &
RFC3986 section 2.2 additionally specifies reserved characters in URI schemes:
space % : / ? # [ ] @ ! $ & ' ( ) * + , ; =
Unsafe and reserved characters are reserved for use in constructing the URL scheme. These characters must be encoded so the URL can be constructed without ambiguity. Fortunately, RFC1738 has us covered.
URL Encoding
URL, or percent, encoding substitutes the percent (%) sign and two hexadecimal characters to represent unsafe characters in a URL. Here are the encodings for unsafe and reserved characters per RFCs 1738 and 3986:
Unsafe Character | URL(Percent) Encoding |
---|---|
space | %20 |
% | %25 |
: | %3A |
/ | %2F |
? | %3F |
# | %23 |
[ | %5B |
] | %5D |
@ | %40 |
! | %21 |
$ | %24 |
& | %26 |
' | %27 |
( | %28 |
) | %29 |
* | %2A |
+ | %2B |
, | %2C |
; | %3B |
= | %3D |
< | %3C |
> | %3E |
" | %22 |
{ | %7B |
} | %7D |
pipe | %7C |
\ | %5C |
^ | %5E |
~ | %7E |
` | %60 |
URL ENCODING “GOTCHAS”
Stéphane Épardaud describes several pitfalls to URL encoding at the Lunatech Blog (go read his post!). In summary, reserved characters differ for each part of the URL:
Path
spaces must be encoded to %20
, not +
: @ - . _ ~ ! $ & grave ( ) * + , ; =
are allowed unencoded
Path Parameter
=
is allowed unencoded
Query
spaces may be encoded to +
(for backward compatibility) or %20
. +
must be
encoded to %2B
? , /
are allowed unencoded
Fragment
/ ? : @ - . _ ~ ! $ & grave ( ) * + , ; =
are allowed unencoded
For more information
RFC1738 - Uniform Resource Locators
RFC3986 - Uniform Resource Identifiers
What every web developer must know about URL encoding