URL encoding and parsing, explained
A URL looks like plain text, but not every character can safely appear anywhere inside it. Spaces, question marks, ampersands, hashes, slashes, non-English text, and punctuation can all change the meaning of a URL if they land in the wrong place.
URL encoding is the way those characters are written safely. URL parsing is the separate job of splitting a URL into its pieces: scheme, host, path, query string, fragment, and parameters.
Those two jobs are related, but they are not the same. Encoding protects characters so they can travel through a URL. Parsing tells you what each part of the URL means.
What percent-encoding is
Percent-encoding writes a byte as a percent sign followed by two hexadecimal digits.
A space can become:
%20
The character & can become:
%26
The question mark ? can become:
%3F
This matters because those characters already have jobs inside URLs. ? starts the query string. & separates query parameters. # starts a fragment. A literal space is not allowed in a raw URL.
If you mean the character as data, encode it. If you mean it as URL syntax, leave it in its syntax role.
That difference is the heart of most URL encoding mistakes.
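As a minimal sketch of the idea, here is how those three characters could be encoded with Python's standard `urllib.parse` module. The choice of Python and the variable names are illustrative assumptions, not something this guide prescribes.

```python
from urllib.parse import quote, unquote

# safe="" asks quote() to encode every reserved character, including "/".
space = quote(" ", safe="")
ampersand = quote("&", safe="")
question = quote("?", safe="")

print(space, ampersand, question)  # %20 %26 %3F

# Decoding reverses the process and returns the original character.
decoded = unquote("%26")
print(decoded)  # &
```

Most languages ship an equivalent pair of functions, so the same pattern should carry over.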
Reserved characters
Some characters are reserved because URLs use them as structure.
Common examples include:
- : after the scheme, as in https:
- / inside paths
- ? before a query string
- & between query parameters
- = between a parameter name and value
- # before a fragment
Reserved does not mean forbidden. It means the character can carry special meaning. If the character is part of the URL structure, it should stay readable. If it is part of a value, it should usually be encoded.
For example, this query string has two parameters:
?q=red%20shoes&sort=price
But if the search text itself is red shoes & socks, the ampersand must not be read as a parameter separator. The value needs encoding:
?q=red%20shoes%20%26%20socks
Now the & inside the search phrase is data, not syntax.
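The difference between data and syntax can be seen by parsing both versions. This is a rough sketch using Python's `urllib.parse`; the variable names are made up for illustration.

```python
from urllib.parse import quote, parse_qs

phrase = "red shoes & socks"

# Unencoded: the parser treats "&" as a separator, so the value of q
# stops before the ampersand and the rest is not part of it.
broken = parse_qs("q=" + phrase)

# Encoded: the ampersand travels as %26 and comes back as data.
encoded = "q=" + quote(phrase, safe="")
fixed = parse_qs(encoded)

print(encoded)  # q=red%20shoes%20%26%20socks
print(fixed)    # {'q': ['red shoes & socks']}
```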
Query strings
The query string starts after ? and usually carries key-value pairs.
Example:
https://example.com/search?q=red%20shoes&page=2
This has:
- q with value red shoes
- page with value 2
The query string is where encoding mistakes show up most often because values may contain spaces, symbols, copied text, Unicode, or another URL.
A nested URL is a classic case:
redirect=https%3A%2F%2Fexample.com%2Faccount%3Ftab%3Dsettings
The inner URL is encoded so its own :, /, ?, and = characters do not get confused with the outer URL's structure.
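A nested URL like the one above could be produced and recovered along these lines. This is a sketch in Python; the `redirect` parameter name comes from the example, while the variable names are assumptions.

```python
from urllib.parse import quote, parse_qs

inner = "https://example.com/account?tab=settings"

# Encode the whole inner URL as a single value. safe="" makes sure
# "/" is encoded too, since quote() leaves it alone by default.
param = "redirect=" + quote(inner, safe="")
print(param)
# redirect=https%3A%2F%2Fexample.com%2Faccount%3Ftab%3Dsettings

# Parsing the outer query string decodes the value exactly once.
recovered = parse_qs(param)["redirect"][0]
print(recovered == inner)  # True
```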
Plus signs and spaces
Spaces are a special source of confusion.
In many URL contexts, a space is encoded as %20. In form-style query encoding, spaces are often represented as +.
That means this:
q=red+shoes
may decode to:
red shoes
But a literal plus sign is also a real character. If you want the value C++, the plus signs need protection:
C%2B%2B
Otherwise some decoders may turn the plus signs into spaces.
This is why Base64 strings, programming language names, and math expressions can break when pasted into query strings without careful encoding.
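The two conventions can be compared side by side. In this Python sketch, `unquote` follows the general percent-decoding rules and `unquote_plus` follows the form-style rules; the `lang` parameter name is a made-up example.

```python
from urllib.parse import quote, unquote, unquote_plus, parse_qs

# Form-style decoding turns "+" into a space; plain decoding does not.
form_style = unquote_plus("red+shoes")
plain = unquote("red+shoes")
print(form_style)  # red shoes
print(plain)       # red+shoes

# parse_qs uses form-style rules, so unprotected plus signs are lost.
lost = parse_qs("lang=C++")["lang"][0]
print(repr(lost))  # 'C  '

# Encoding the plus signs as %2B keeps them intact.
protected = quote("C++", safe="")
kept = parse_qs("lang=" + protected)["lang"][0]
print(protected, kept)  # C%2B%2B C++
```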
Unicode in URLs
Modern URLs can represent Unicode text, but the underlying encoded form still matters.
Take the Arabic word:
مرحبا
Before it can safely travel in a URL path or query string, it is represented as UTF-8 bytes, and those bytes are percent-encoded. Each of its five letters takes two bytes in UTF-8, and each byte becomes a three-character sequence, so the encoded form is 30 characters long.
The important part is that encoding works at the byte level, not just at the character-you-see level. That is why non-English text can turn into a long chain of %D9 and %85 style sequences.
If the encoding and decoding sides disagree about character encoding, the result can turn into mojibake: readable text becomes broken symbols.
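The byte-level behavior can be checked directly. The following Python sketch encodes the word above and then decodes it twice: once with the matching character encoding and once with a deliberately wrong one (Latin-1, chosen here only to provoke the mismatch).

```python
from urllib.parse import quote, unquote

word = "مرحبا"

# Five visible characters become ten UTF-8 bytes.
raw = word.encode("utf-8")
print(len(word), len(raw))  # 5 10

# Each byte is then written as a percent sign plus two hex digits.
encoded = quote(word)
print(encoded)  # %D9%85%D8%B1%D8%AD%D8%A8%D8%A7

# Decoding with the same character encoding restores the word.
roundtrip = unquote(encoded)

# Decoding with a different one yields mojibake instead.
garbled = unquote(encoded, encoding="latin-1")
print(roundtrip == word, garbled == word)  # True False
```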
Parsing a URL into components
Parsing asks a different question from encoding.
Given this URL:
https://example.com:443/products/shoes?q=red%20shoes&page=2#reviews
A parser can identify:
- scheme: https
- host: example.com
- port: 443
- path: /products/shoes
- query: q=red%20shoes&page=2
- fragment: reviews
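The same breakdown can be reproduced with a real parser. This sketch uses `urlsplit` and `parse_qs` from Python's standard library as one possible choice; other languages have comparable URL classes.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://example.com:443/products/shoes?q=red%20shoes&page=2#reviews"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.hostname)  # example.com
print(parts.port)      # 443
print(parts.path)      # /products/shoes
print(parts.query)     # q=red%20shoes&page=2
print(parts.fragment)  # reviews

# The query component is still encoded; parse_qs splits it into
# parameters and decodes each value.
params = parse_qs(parts.query)
print(params)  # {'q': ['red shoes'], 'page': ['2']}
```

Note that the parser leaves the query string encoded. Decoding happens only when the individual parameters are read.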
Parsing is useful when you need to inspect a link, read parameters, debug redirects, or check whether a URL points where you think it points.
Encoding is useful when you are building or repairing the URL. Parsing is useful when you are reading it.
A worked example
Start with this intended destination:
https://example.com/search
You want to add three pieces of data:
q = red shoes & socks
city = القاهرة
return = https://example.com/account?tab=orders
Each value contains characters that need care. The search phrase has spaces and &. The city has Unicode. The return URL has its own URL syntax.
A safely encoded version could look like this:
https://example.com/search?q=red%20shoes%20%26%20socks&city=%D8%A7%D9%84%D9%82%D8%A7%D9%87%D8%B1%D8%A9&return=https%3A%2F%2Fexample.com%2Faccount%3Ftab%3Dorders
Read it from the outside in:
- the first ? starts the outer query string
- outer & characters separate the three parameters
- %26 inside the q value means a literal ampersand
- the Arabic city name is UTF-8 percent-encoded
- the return URL is encoded so its own ? does not start a new outer query
That is the practical skill: know which characters are URL structure and which characters are data.
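The worked example can be rebuilt in code rather than by hand. This is a sketch assuming Python's `urlencode`; passing `quote_via=quote` is a choice made here so that spaces come out as %20 instead of the form-style +.

```python
from urllib.parse import urlencode, quote, urlsplit, parse_qs

base = "https://example.com/search"
params = {
    "q": "red shoes & socks",
    "city": "القاهرة",
    "return": "https://example.com/account?tab=orders",
}

# urlencode encodes each name and value separately, then joins them
# with structural "=" and "&" characters.
query = urlencode(params, quote_via=quote)
url = base + "?" + query
print(url)

# Reading the URL back recovers the original three values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["return"][0])  # https://example.com/account?tab=orders
```

Building the query from a dictionary of raw values keeps structure and data apart, so each value is encoded exactly once.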
Try the browser tools
These tools cover the common URL workflow, and each one runs in your browser.
- URL Encoder - encode text so spaces, Unicode, and reserved characters can safely sit inside a URL.
- URL Decoder - turn percent-encoded text back into readable form when you need to inspect it.
- URL Parser - split a URL into components and query parameters so you can see what it really contains.
That split mirrors the real work: encode when building, decode when reading, parse when you need the structure.
Common mistakes
Encoding the whole URL when you meant to encode one value. This can turn structural slashes and question marks into data and break the link.
Forgetting to encode nested URLs. Redirect and callback parameters often contain a full URL inside another URL.
Confusing a + that stands for a space with a literal plus sign. In form-style query strings, a plus may be decoded as a space. Use %2B when the plus sign is part of the value.
Decoding twice. Double-decoding can turn protected data back into active URL syntax.
Using regex as a full URL parser. Regex can help find URL-looking text, but real parsing is safer when components and query parameters matter.
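Two of these mistakes are easy to reproduce. The Python sketch below first encodes a whole URL instead of one value, then decodes a value twice; the sample strings are invented for illustration.

```python
from urllib.parse import quote, unquote

# Mistake: encoding the whole URL turns its structure into data.
url = "https://example.com/search?q=red shoes"
whole = quote(url, safe="")
print(whole)
# https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dred%20shoes

# Encoding only the value keeps the structure intact.
per_value = "https://example.com/search?q=" + quote("red shoes", safe="")
print(per_value)  # https://example.com/search?q=red%20shoes

# Mistake: decoding twice. Suppose the data is the literal text "%26".
data = "%26"
sent = quote(data, safe="")   # the percent sign itself is encoded
once = unquote(sent)          # correct: the original text is back
twice = unquote(once)         # wrong: the text became an ampersand
print(sent, once, twice)      # %2526 %26 &
```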
FAQ
Is URL encoding a form of encryption? No. Anyone can decode it. Encoding makes characters safe for URLs; it does not hide the content.
Should a space be written as %20 or +? It depends on context. %20 is common in paths and general URL encoding. + often appears in form-style query strings.
Why does a URL contain %2520? That often means %20 was encoded again. The percent sign became %25, so %20 turned into %2520.
Can URLs contain non-English characters? They can, but the transport form may use UTF-8 percent-encoding or domain-name conversion for hostnames.
What is the difference between decoding and parsing? Decoding turns encoded characters back into text. Parsing splits the URL into structural pieces and parameters.
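The %2520 effect is simple to reproduce. In this short Python sketch, an already encoded value is passed through the encoder a second time; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

first = quote("red shoes", safe="")
second = quote(first, safe="")  # encoding an already encoded value
print(first)   # red%20shoes
print(second)  # red%2520shoes

# One round of decoding only undoes the outer layer.
undone = unquote(second)
print(undone)  # red%20shoes
```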
Related guides
- Base64 encoding, explained - another encoding used when data needs to survive a text-based channel.
- QR codes, explained - many QR codes carry URLs, so parsing the destination matters.
- Regular expressions, explained - useful for finding URL-like text, but not a replacement for a URL parser.