As a professional URI aficionado I deal with various levels of ignorance on URI percent-encoding (aka URI encoding, or URL escaping).
Worse than the lame blog comments hating on percent-encoding is the shipping code which can do actual damage. In one very large project I won't name, I've fixed code that decodes all percent-encoded octets in a URI in order to get rid of pesky percents before calling ShellExecute. An unnamed developer with similar intent but clearly much craftier did the same thing in a loop until the string's length stopped changing. As it turns out percent-encoding serves a purpose and can't just be removed arbitrarily.
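To make the damage concrete, here is a minimal Python sketch of both mistakes using the standard `urllib.parse` module. The URI and the `decode_until_stable` helper are hypothetical illustrations, not the code from the project mentioned above.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Hypothetical URI: the query value is the data "rock&roll".
encoded = "http://example.com/search?q=rock%26roll"

# Parsing first and decoding afterwards keeps the ampersand as data.
intact = parse_qs(urlsplit(encoded).query)

# Decoding the whole URI first turns the ampersand into a delimiter,
# so the value is cut short when the query is parsed.
blindly_decoded = unquote(encoded)
damaged = parse_qs(urlsplit(blindly_decoded).query)


def decode_until_stable(text: str) -> str:
    """Decode repeatedly until the length stops changing (the 'crafty' bug)."""
    while True:
        decoded = unquote(text)
        if len(decoded) == len(text):
            return text
        text = decoded


# "%2541" is the three characters "%41" as data; looping destroys that.
once = unquote("%2541")
looped = decode_until_stable("%2541")
```

Here `intact` maps `q` to `rock&roll` while `damaged` maps it to just `rock`, and the loop turns `%2541` into `A` even though a single decode correctly yields `%41`.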
Percent-encoding exists so that one can represent data in a URI that would otherwise not be allowed or would be interpreted as a delimiter instead of data. For example, the space character (U+0020) is not allowed in a URI and so must be percent-encoded in order to appear in a URI:
http://example.com/the%20path/
http://example.com/the path/
For an additional example, the question mark delimits the path from the query. If one wanted the question mark to appear as part of the path rather than delimit the path from the query, it must be percent-encoded:
http://example.com/foo%3Fbar
http://example.com/foo?bar
In the second, the question mark is unencoded and so delimits the path "/foo" from the query "bar". And in the first, the question mark is percent-encoded and so the path is "/foo%3Fbar".
As a professional URI aficionado I deal with various levels of ignorance on URI percent-encoding (aka URI encoding, or URL escaping). The basest ignorance is with respect to the mere existence of percent-encoding. Percents in URIs are special: they always represent the start of a percent-encoded octet. That is to say, a percent is always followed by two hex digits that represent a value between 0 and 255, and a percent doesn't show up in a URI otherwise.
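That rule is easy to check mechanically. The following is a minimal Python sketch; the names `has_stray_percent` and `octet_values` are made up for this illustration and the check covers only the percent rule, not full URI validation.

```python
import re

# A '%' that is not followed by two hex digits is not a valid escape.
_STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")
# A well-formed percent-encoded octet: '%' plus exactly two hex digits.
_ENCODED_OCTET = re.compile(r"%([0-9A-Fa-f]{2})")


def has_stray_percent(uri: str) -> bool:
    """Return True if any '%' does not start a percent-encoded octet."""
    return _STRAY_PERCENT.search(uri) is not None


def octet_values(uri: str) -> list[int]:
    """Return the value (0 to 255) of each percent-encoded octet, in order."""
    return [int(digits, 16) for digits in _ENCODED_OCTET.findall(uri)]
```

For instance, `has_stray_percent("http://example.com/100%")` is true, while `octet_values("http://example.com/the%20path/")` gives `[32]`, the space character.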
The IPv6 textual syntax for scoped addresses uses the '%' to delimit the zone ID from the rest of the address. When it came time to define how to represent scoped IPv6 addresses in URIs there were two camps: Folks who wanted to use the IPv6 format as is in the URI, and those who wanted to encode or replace the '%' with a different character. The resulting thread was more lively than what shows up on the IETF URI discussion mailing list. Ultimately we went with a percent-encoded '%' which means the percent maintains its special status and singular purpose.
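As a rough Python sketch of that outcome: the '%' that separates the zone ID becomes "%25" when the address is placed in a URI. The function name `scoped_ipv6_uri_host` and the sample address are hypothetical, and the sketch does not validate the address itself.

```python
from urllib.parse import quote


def scoped_ipv6_uri_host(address: str) -> str:
    """Wrap an IPv6 literal in brackets, percent-encoding the zone delimiter."""
    ip, delimiter, zone = address.partition("%")
    if not delimiter:
        # No zone ID, so nothing needs encoding.
        return f"[{ip}]"
    # The delimiter '%' is written as "%25"; any character in the zone ID
    # that is not unreserved is percent-encoded as well.
    return f"[{ip}%25{quote(zone, safe='')}]"


host = scoped_ipv6_uri_host("fe80::1%eth0")
uri = f"http://{host}/"
```

With this input `uri` comes out as `http://[fe80::1%25eth0]/`, so every percent in the URI still starts a percent-encoded octet.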
From the document: ‘Appendix B. Implementation Report: The encoding defined in this document currently is used for two different HTTP header fields: “Content-Disposition”, defined in [RFC6266], and “Link”, defined in [RFC5988]. As the encoding is a profile/clarification of the one defined in [RFC2231] in 1997, many user agents already supported it for use in “Content-Disposition” when [RFC5987] got published.
Since the publication of [RFC5987], two more popular desktop user agents have added support for this encoding; see http://purl.org/NET/http/content-disposition-tests#encoding-2231-char for details. At this time, only one major desktop user agent (Safari) does not support it.
Note that the implementation in Internet Explorer 9 does not support the ISO-8859-1 encoding; this document revision acknowledges that UTF-8 is sufficient for expressing all code points, and removes the requirement to support ISO-8859-1.’
Yay for UTF-8!
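For the curious, the encoding in question boils down to a charset name, two single quotes, and the percent-encoded UTF-8 octets of the value. Here is a minimal Python sketch of the UTF-8 case; the helper names are hypothetical, the language tag is left empty, and the safe set is my reading of the `attr-char` rule in RFC 5987.

```python
from urllib.parse import quote

# Punctuation allowed unencoded by attr-char, in addition to letters and
# digits (which quote() never encodes).
ATTR_CHAR_PUNCTUATION = "!#$&+-.^_`|~"


def encode_ext_value(value: str) -> str:
    """Encode a parameter value as UTF-8''<percent-encoded octets>."""
    encoded = quote(value, safe=ATTR_CHAR_PUNCTUATION, encoding="utf-8")
    return "UTF-8''" + encoded


def content_disposition(filename: str) -> str:
    """Build a Content-Disposition header value using the filename* form."""
    return f"attachment; filename*={encode_ext_value(filename)}"


header = content_disposition("€ rates")
```

Here `header` is `attachment; filename*=UTF-8''%E2%82%AC%20rates`.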
Draw ShapeCatcher a symbol and ShapeCatcher shows you the characters in Unicode that look similar. Try a smiley face. (via Eric Lawrence)
I've just got a new media center PC connected directly to my television with lots of HD space and so I'm ripping a bunch of my DVDs to the PC so I don't have to fuss with the physical media. I'm ripping with DVD Rip, viewing the results in Windows 7's Windows Media Center after turning on the WMC DVD Library, and using a PowerShell script I wrote to copy over cover art and metadata.
My PowerShell script follows. To use it you must do the following:
Download copydvdinfo.ps1
I've just updated Encode-O-Matic with a Guess Input Encoding feature. When you start Encode-O-Matic or when you use the 'Guess Input Encoding' menu item from the 'Tools' menu, Encode-O-Matic will try out various combinations of encodings and guess at which set seem to apply to your input. For instance given the following text, Encode-O-Matic will correctly guess that it is percent encoded, base64 encoded, deflate compressed text:
S%2BWqUEhLLMoFUulFpXnZQLogMa%2BkmCuPqxzILk%2FMyeHK4QIA
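Layers like these can be peeled off one at a time. The following Python sketch is not how Encode-O-Matic works internally; it assumes "deflate" means a raw DEFLATE stream with no zlib header and that the innermost text is UTF-8, and the function names are made up for illustration.

```python
import base64
import zlib
from urllib.parse import quote, unquote


def encode_layers(text: str) -> str:
    """Deflate the text, base64 encode the result, then percent-encode it."""
    compressor = zlib.compressobj(wbits=-15)  # negative wbits: raw DEFLATE
    deflated = compressor.compress(text.encode("utf-8")) + compressor.flush()
    # safe="" makes sure base64's '+' and '/' are percent-encoded too.
    return quote(base64.b64encode(deflated).decode("ascii"), safe="")


def decode_layers(encoded: str) -> str:
    """Undo the layers in reverse order: percent, base64, deflate."""
    raw = base64.b64decode(unquote(encoded))
    return zlib.decompress(raw, wbits=-15).decode("utf-8")


round_trip = decode_layers(encode_layers("hello hello hello"))
```

The order matters: the percent-encoding has to come off first because it is what protects base64's '+' and '/' characters.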
It should work fairly well for simple things, but I did pick 'Guess' for the name of the feature to intentionally lower expectations. It doesn't currently apply to character encodings but that may be something to consider in the future.
I was reading Makers, Cory Doctorow's latest novel, as it was serialized on Tor's website, but with no ability to save my place within a page I set out to find a book reading app for my G1 Android phone. I stopped looking once I found Aldiko. It's got bookmarks within chapters and configurable fonts, lets you look up words in a dictionary, and has an easy method to download public domain and Creative Commons books. I was able to take advantage of Aldiko's in-app book download system to get Makers onto my phone, so I didn't have to bother with any conversion programs or worry about spacing or layout, and the book had the correct cover art and chapter delimiters. I'm very happy with this app and finished reading Makers on it.
Makers is set in the near future and features teams of inventors, networked 3D printers, IP contention, body modifications, and Disney -- just the sort of thing you'd expect from a Cory Doctorow novel. The tale seems to be an allegory for the Internet, including displacing existing businesses and the conflict between the existing big entertainment IP owners and the plethora of fans and minor content producers. The story is engaging and the characters are filled out and believable. I recommend Makers, and as always it's Creative Commons, so go take a look right now.
I've made an OpenSearchDescriptionToHTML XSLT that given an OpenSearch description file produces HTML that describes that file, lets you install it, or search with it. For example, here's a Google OpenSearch description that uses my OpenSearchDescriptionToHTML XSLT.
I had just created an OpenSearch description for WolframAlpha at work and was going about the process of adding another install link to my search provider page so that I could install it. Thinking about it, I realized I could apply an XSLT to the OpenSearch description XML to produce the HTML automatically so I wouldn't have to modify additional documents every time I create and want to install a new OpenSearch description. While I was in there writing the XSLT I figured, why not let the user try out searching with the OpenSearch description file too. And lastly I made the XSLT apply to itself to produce HTML describing its own usage.
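The same idea can be sketched outside of XSLT. The Python below is not my stylesheet, just a rough illustration of the transformation it performs: read the description, emit some HTML about it, and fill in the search template. The sample description document and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import quote

OPENSEARCH_NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}

# Hypothetical description document, for illustration only.
EXAMPLE_DESCRIPTION = """
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example Search</ShortName>
  <Description>Search example.com</Description>
  <Url type="text/html" template="http://example.com/search?q={searchTerms}"/>
</OpenSearchDescription>
""".strip()


def html_template(root: ET.Element) -> str:
    """Return the URL template of the first text/html Url element."""
    for url in root.findall("os:Url", OPENSEARCH_NS):
        if url.get("type") == "text/html":
            return url.get("template", "")
    raise ValueError("no text/html Url element found")


def describe(xml_text: str) -> str:
    """Produce a small HTML fragment describing the search provider."""
    root = ET.fromstring(xml_text)
    name = root.findtext("os:ShortName", default="", namespaces=OPENSEARCH_NS)
    summary = root.findtext("os:Description", default="", namespaces=OPENSEARCH_NS)
    return (
        f"<h1>{escape(name)}</h1>\n"
        f"<p>{escape(summary)}</p>\n"
        f"<p>Template: <code>{escape(html_template(root))}</code></p>"
    )


def search_url(xml_text: str, terms: str) -> str:
    """Fill {searchTerms} in the template with percent-encoded terms."""
    root = ET.fromstring(xml_text)
    return html_template(root).replace("{searchTerms}", quote(terms, safe=""))
```

For example, `search_url(EXAMPLE_DESCRIPTION, "Bopomofo zh")` yields `http://example.com/search?q=Bopomofo%20zh`.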
Incidentally, I added WolframAlpha at work to replace my FileInfo search provider for the purposes of searching for information about particular Unicode characters. For instance, look at WolframAlpha's lovely output for this search for "Bopomofo zh".