URI: Uniform Resource Identifier
A URI identifies a resource by a sequence of characters. A URI is not a file name, although it often looks like one. For example, because of content negotiation, the same URI can lead to different documents, such as versions in different languages.
RFC 3986 requires the URI (that is: its sequence of characters) to be chosen from a limited subset of the repertoire of US-ASCII characters.
Characters outside that subset can be encoded with percent encoding: the character is represented by a percent sign followed by two hexadecimal digits that give the character's byte value (for example, %20 for space).
Some programming languages provide special functions to decode percent-encoded characters, for example the PHP function urldecode().
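As a rough illustration of such a decoding function, here is a minimal sketch in Python rather than PHP, using the standard library's urllib.parse.unquote. The sample string is made up for this example. Note that unquote leaves a plus sign untouched, whereas PHP's urldecode() turns it into a space.

```python
from urllib.parse import unquote

# Hypothetical sample input: %20 encodes a space, %21 an exclamation mark.
encoded = "hello%20world%21"

# unquote() replaces each %XX escape with the character it stands for.
decoded = unquote(encoded)
print(decoded)
```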
A property of URIs is that different persons or organizations can independently create them and then use them to identify things.
A URI starts with a scheme name. A particular scheme's specification may restrict the syntax and semantics of identifiers in that scheme.
There are two types of URIs: URLs (Uniform Resource Locators) and URNs (Uniform Resource Names).
Components of a URI
RFC 3986 has this nice ASCII art diagram that shows the component parts of a URI:
    foo://example.com:8042/over/there?name=ferret#nose
    \_/   \______________/\_________/ \_________/ \__/
     |           |            |            |        |
  scheme     authority       path        query   fragment
     |   _____________________|__
    / \ /                        \
    urn:example:animal:ferret:nose
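The same decomposition can be tried out in code. The following is only a sketch using Python's standard urllib.parse.urlsplit, which calls the authority component "netloc"; the two URIs are the ones from the diagram, and the variable names are chosen for this example.

```python
from urllib.parse import urlsplit

# The hierarchical example URI from the RFC 3986 diagram.
parts = urlsplit("foo://example.com:8042/over/there?name=ferret#nose")
print("scheme:   ", parts.scheme)
print("authority:", parts.netloc)   # urllib's name for the authority
print("path:     ", parts.path)
print("query:    ", parts.query)
print("fragment: ", parts.fragment)

# The URN from the diagram has a scheme and a path, but no authority.
urn = urlsplit("urn:example:animal:ferret:nose")
print("scheme:   ", urn.scheme)
print("path:     ", urn.path)
```

The authority can be broken down further: parts.hostname and parts.port give the host and port separately.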
Percent encoding
Percent encoding (also known as URL encoding) escapes reserved characters (such as : and /) in a URI, and therefore also in a URL, by prefixing the two-digit hexadecimal representation of the character's ASCII value with a percent sign: %3A = :, %2F = /, %25 = %, etc.
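The rule can be spelled out in a few lines of code. This is a minimal sketch for single ASCII characters only; percent_encode_char is a hypothetical helper written for this example, and the result is compared with the standard library's urllib.parse.quote, which is what one would normally use.

```python
from urllib.parse import quote

def percent_encode_char(ch: str) -> str:
    """Return '%' followed by the two-digit uppercase hex value of an ASCII character."""
    return "%{:02X}".format(ord(ch))

# The three characters used as examples in the text.
for ch in ":/%":
    print(ch, "=", percent_encode_char(ch))

# quote() with safe="" treats no reserved character as safe,
# so it escapes all three in the same way.
encoded = quote(":/%", safe="")
print(encoded)
```

By default quote() leaves the slash unescaped (safe="/"), which is why the sketch passes an empty safe set.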
URC: Uniform Resource Citation
URCs provide a set of attribute/value pairs that describe properties of URIs, such as authorship, publisher, and copyright.