As a professional URI aficionado I deal with various levels of ignorance on URI percent-encoding (aka URI encoding, or URL escaping).
Getting into the more subtle levels of URI percent-encoding ignorance, folks try to apply their knowledge of percent-encoding to URIs as a whole producing the concepts escaped URIs and unescaped
URIs. However there are no such things - URIs themselves aren't percent-encoded or decoded but rather contain characters that are percent-encoded or decoded. Applying percent-encoding or decoding
to a URI as a whole produces a new and non-equivalent URI.
Instead of lingering on the incorrect concepts we'll just cover the correct ones: there's raw unencoded data, non-normal form URIs and normal form URIs. For example:
http://example.com/%74%68%65%3F%70%61%74%68?query
http://example.com/the%3Fpath?query
"http", "example.com", "the?path", "query"
In the above (A) is not an 'encoded URI' but rather a non-normal form URI. The characters of 'the' and 'path' are percent-encoded but as unreserved characters specific in the RFC should not be
encoded. In the normal form of the URI (B) the characters are decoded. But (B) is not a 'decoded URI' -- it still has an encoded '?' in it because that's a reserved character which by the RFC
holds different meaning when appearing decoded versus encoded. Specifically in this case, it appears encoded which means it is data -- a literal '?' that appears as part of the path segment. This
is as opposed to the decoded '?' that appears in the URI which is not part of the path but rather the delimiter to the query.
Usually when developers talk about decoding the URI what they really want is the raw data from the URI. The raw decoded data is (C) above. The only thing to note beyond what's covered already is
that to obtain the decoded data one must parse the URI before percent decoding all percent-encoded octets.
Of course the exception here is when a URI is the raw data. In this case you must percent-encode the URI to have it appear in another URI. More on percent-encoding while constructing URIs
later.
As a professional URI aficionado I deal with various levels of ignorance on URI percent-encoding (aka URI encoding, or URL escaping).
Worse than the lame blog comments hating on percent-encoding is the shipping code which can do actual damage. In one very large project I won't name, I've fixed code that decodes all
percent-encoded octets in a URI in order to get rid of pesky percents before calling ShellExecute. An unnamed developer with similar intent but clearly much craftier did the same thing in a loop
until the string's length stopped changing. As it turns out percent-encoding serves a purpose and can't just be removed arbitrarily.
Percent-encoding exists so that one can represent data in a URI that would otherwise not be allowed or would be interpretted as a delimiter instead of data. For example, the space character
(U+0020) is not allowed in a URI and so must be percent-encoded in order to appear in a URI:
http://example.com/the%20path/
http://example.com/the path/
In the above the first is a valid URI while the second is not valid since a space appears directly in the URI. Depending on the context and the code through which the wannabe URI is run one
may get unexpected failure.
For an additional example, the question mark delimits the path from the query. If one wanted the question mark to appear as part of the path rather than delimit the path from the query, it must
be percent-encoded:
http://example.com/foo%3Fbar
http://example.com/foo?bar
In the second, the question mark appears plainly and so delimits the path "/foo" from the query "bar". And in the first, the querstion mark is percent-encoded and so
the path is "/foo%3Fbar".
As a professional URI aficionado I deal with various levels of ignorance on URI percent-encoding (aka URI encoding, or URL escaping). The basest ignorance is with respect to the mere existence of
percent-encoding. Percents in URIs are special: they always represent the start of a percent-encoded octet. That is to say, a percent is always followed by two hex digits that represents a value
between 0 and 255 and doesn't show up in a URI otherwise.
The IPv6 textual syntax for scoped addresses uses the '%' to delimit the zone ID from the rest of the address. When it came time to define how
to represent scoped IPv6 addresses in URIs there were two camps: Folks who wanted to use the IPv6 format as is in the URI, and those who wanted to encode or replace the '%' with a different
character. The resulting thread was more lively than what shows up on the IETF URI discussion mailing list.
Ultimately we went with a percent-encoded '%' which means the percent maintains its special status and singular purpose.
“Serious Sam 3′s DRM is brilliantly cruel, punishing only those who pirated it. By relentlessly pursuing them with a giant invincible armoured scorpion.”
Wow, FTA: "Given all of this, reporter Charlie Savage of the NY Times filed a Freedom of Information Act request to find out the federal government's interpretation of its own law... and had it
refused. According to the federal government, its own interpretation of the law is classified."
2011 Sep 28, 10:22They've got maps from your favorite NES games as giant images. I'm using SMB3 1-1 as my desktop background. I've got a four monitor setup now and so its tough to find desktop backgrounds but Mario
levels easily cover my whole desktop.gamevideogamemapnintendones
2011 Jul 18, 2:38Neat idea: "When the user wants to visit a blacklisted site, the client establishes an encrypted HTTPS connection to a non-blacklisted web server outside the censor’s network, which could be a normal
site that the user regularly visits... The client secretly marks the connection as a Telex request by inserting a cryptographic tag into the headers. We construct this tag using a mechanism called
public-key steganography... As the connection travels over the Internet en route to the non-blacklisted site, it passes through routers at various ISPs in the core of the network. We envision that
some of these ISPs would deploy equipment we call Telex stations."internetsecuritytoolscensorshiptechnical
2011 Jul 6, 7:28"Over this past Fourth Of July weekend, we neglected to note that it was the 15th anniversary of Roland Emmerich’s 1996 blockbuster Independence Day. New York comedian Sean Kleier remembered, and
decided to make his own tribute, going to various locations around New York City—Times Square, the Brooklyn Bridge, the subway, and inside a Victoria’s Secret—reciting Bill Pullman’s rousing speech
before the movie's final battle sequence, megaphone and all." humorvideobill-pullmanindependence-daynew-york
2011 Jul 1, 10:09"I periodically get email from folks who, having read "Accelerando", assume I am some kind of fire-breathing extropian zealot who believes in the imminence of the singularity, the uploading of the
libertarians, and the rapture of the nerds. I find this mildly distressing, and so I think it's time to set the record straight and say what I really think. Short version: Santa Claus doesn't exist."scifisingularitycharles-strossfuturefiction
2011 Jun 20, 2:36A mix of 36 YouTube videos of various people playing Radiohead's Paranoid Android. It sounds good but the video too is very compelling. Also I would be psych'ed if it were my video picked to rock out
at 2:50.
2011 May 1, 7:51"The hilarious speeches by Seth Meyers and Barack Obama at the 2011 White House Correspondents’ Dinner. Seth and Obama really let Trump have it in their speechs. Trump’s reaction in the audience is
priceless."humorpoliticsbarack-obamaseth-meyersvideowhite-house-correspondents-dinner
2011 Apr 30, 4:33"The HTTP-based Memento framework bridges the present and past Web by interlinking current resources with resources that encapsulate their past. It facilitates obtaining representations of prior
states of a resource, available from archival resources in Web archives or version resources in content management systems, by leveraging the resource's URI and a preferred datetime. To this end, the
framework introduces datetime negotiation (a variation on content negotiation), and new Relation Types for the HTTP Link header aimed at interlinking resources with their archival/version resources.
It also introduces various discovery mechanisms that further support briding the present and past Web."technicalrfcreferencehttpheadertimemementoarchive
2010 Sep 27, 1:39Twitter account that retweets Kanye West's tweets with 'Liz Lemmon, ' prefixed to produce occasionally hilarious 30 Rock quote sound-a-likes.humor30-rockmemekanye-westmind-grapestwitter
2010 Jul 1, 10:51"Sometimes it’s hard to judge whether an engineering effort has been successful or not. It can take years for an idea to catch on, to go from being the butt of jokes to becoming an international
imperative (IPv6). Uniform Resource Names (URNs), which are part of the Uniform Resource Identifier (URI) family, are conceptually at least as old as IPv6. While not figuring in international
directives for deployment, they-and the technology engineered to resolve them-are still going concerns."ietfurnurihistorytechnicalinterneturl
2010 May 14, 8:52It really is an actual quote from the Sacramento Credit Union's website: "The answers to your Security Questions are case sensitive and cannot contain special characters like an apostrophe, or the
words “insert,” “delete,” “drop,” “update,” “null,” or “select.”"
Out of context that seems hilarious, but if you read the doc the next Q/A twists it like a defense in depth rather than a 'there-I-fixed-it'.technicalsecurityhumorsql