character page 2 - Dave's Blog

Search
My timeline on Mastodon

URI Percent-Encoding Ignorance Level 1 - Purpose

2012 Feb 15, 4:00

As a professional URI aficionado I deal with various levels of ignorance on URI percent-encoding (aka URI encoding, or URL escaping).

Worse than the lame blog comments hating on percent-encoding is the shipping code which can do actual damage. In one very large project I won't name, I've fixed code that decodes all percent-encoded octets in a URI in order to get rid of pesky percents before calling ShellExecute. An unnamed developer with similar intent but clearly much craftier did the same thing in a loop until the string's length stopped changing. As it turns out percent-encoding serves a purpose and can't just be removed arbitrarily.

Percent-encoding exists so that one can represent data in a URI that would otherwise not be allowed or would be interpretted as a delimiter instead of data. For example, the space character (U+0020) is not allowed in a URI and so must be percent-encoded in order to appear in a URI:

  1. http://example.com/the%20path/
  2. http://example.com/the path/
In the above the first is a valid URI while the second is not valid since a space appears directly in the URI. Depending on the context and the code through which the wannabe URI is run one may get unexpected failure.

For an additional example, the question mark delimits the path from the query. If one wanted the question mark to appear as part of the path rather than delimit the path from the query, it must be percent-encoded:

  1. http://example.com/foo%3Fbar
  2. http://example.com/foo?bar
In the second, the question mark appears plainly and so delimits the path "/foo" from the query "bar". And in the first, the querstion mark is percent-encoded and so the path is "/foo%3Fbar".
PermalinkCommentsencoding uri technical ietf percent-encoding

URI Percent Encoding Ignorance Level 0 - Existence

2012 Feb 10, 4:00

As a professional URI aficionado I deal with various levels of ignorance on URI percent-encoding (aka URI encoding, or URL escaping). The basest ignorance is with respect to the mere existence of percent-encoding. Percents in URIs are special: they always represent the start of a percent-encoded octet. That is to say, a percent is always followed by two hex digits that represents a value between 0 and 255 and doesn't show up in a URI otherwise.

The IPv6 textual syntax for scoped addresses uses the '%' to delimit the zone ID from the rest of the address. When it came time to define how to represent scoped IPv6 addresses in URIs there were two camps: Folks who wanted to use the IPv6 format as is in the URI, and those who wanted to encode or replace the '%' with a different character. The resulting thread was more lively than what shows up on the IETF URI discussion mailing list. Ultimately we went with a percent-encoded '%' which means the percent maintains its special status and singular purpose.

PermalinkCommentsencoding uri technical ietf percent-encoding ipv6

Out-of-Character Stephen Colbert Interviews Neil Degrasse Tyson

2011 Nov 28, 9:24PermalinkCommentshumor science neil-degrasse-tyson stephen-colbert video

Indicating Character Encoding and Language for HTTP Header Field Parameters

2011 Nov 24, 7:45

From the document: ‘Appendix B. Implementation Report: The encoding defined in this document currently is used for two different HTTP header fields: “Content-Disposition”, defined in [RFC6266], and “Link”, defined in [RFC5988]. As the encoding is a profile/clarification of the one defined in [RFC2231] in 1997, many user agents already supported it for use in “Content-Disposition” when [RFC5987] got published.

Since the publication of [RFC5987], two more popular desktop user agents have added support for this encoding; see http://purl.org/
   NET/http/content-disposition-tests#encoding-2231-char for details. At this time, only one major desktop user agent (Safari) does not support it.

Note that the implementation in Internet Explorer 9 does not support the ISO-8859-1 encoding; this document revision acknowledges that UTF-8 is sufficient for expressing all code points, and removes the requirement to support ISO-8859-1.’

Yay for UTF-8!

PermalinkCommentstechnical http http-headers ie9 internationalization utf-8 encoding

ShapeCatcher

2011 Nov 14, 12:37

Draw ShapeCatcher a symbol and ShapeCatcher shows you the characters in Unicode that look similar.  Try a smiley face.  (via Eric Lawrence)

PermalinkCommentstechnical unicode via:ericlaw

[audio] McDonald's Drops 'Hammurderer' Character From Advertising

2010 Nov 21, 9:30PermalinkComments

DVD Ripping and Viewing in Windows Media Center

2010 Aug 17, 3:05

I've just got a new media center PC connected directly to my television with lots of HD space and so I'm ripping a bunch of my DVDs to the PC so I don't have to fuss with the physical media. I'm ripping with DVD Rip, viewing the results in Windows 7's Windows Media Center after turning on the WMC DVD Library, and using a powershell script I wrote to copy over cover art and metadata.

My powershell script follows. To use it you must do the following:

  1. Run Windows Media Center with the DVD in the drive and view the disc's metadata info.
  2. Rip each DVD to its own subdirectory of a common directory.
  3. The name of the subdirectory to which the DVD is ripped must have the same name as the DVD name in the metadata. An exception to this are characters that aren't allowed in Windows paths (e.g. <, >, ?, *, etc)
  4. Run the script and pass the path to the common directory containing the DVD rips as the first parameter.
Running WMC and viewing the DVD's metadata forces WMC to copy the metadata off the Internet and cache it locally. After playing with Fiddler and reading this blog post on WMC metadata I made the following script that copies metadata and cover art from the WMC cache to the corresponding DVD rip directory.

Download copydvdinfo.ps1

PermalinkCommentspowershell wmc technical tv dvd windows-media-center

RFC 5987 - Character Set and Language Encoding for Hypertext Transfer Protocol (HTTP) Header Field Parameters

2010 Aug 13, 11:47Other characters sets for HTTP headers: "By default, message header field parameters in Hypertext Transfer Protocol (HTTP) messages cannot carry characters outside the ISO-8859-1 character set. RFC 2231 defines an encoding mechanism for use in Multipurpose Internet Mail Extensions (MIME) headers. This document specifies an encoding suitable for use in HTTP header fields that is compatible with a profile of the encoding defined in RFC 2231."PermalinkCommentsrfc language localization charset http technical reference http-header

What every programmer needs to know about game networking « Gaffer on Games

2010 Jul 5, 8:38"This way the player appears to control their own character without any latency, and provided that the client and server character simulation code is deterministic – giving exactly the same result for the same inputs on the client and server – it is rarely corrected."PermalinkCommentsnetwork programming game technical quake history

Members Of The Supreme Court As Human Beings

2010 May 14, 9:37New York Times article from May 15th 1910 titled "MEMBERS OF THE SUPREME COURT AS HUMAN BEINGS: When Not on the Bench They Are Pretty Much Like Other People — Characteristic Stores About Them". This is the NYT 1910's version of US Weekly's current "Celebrities Are Just Like Us!" feature.PermalinkCommentshumor history article supreme-court

A quote from Sacramento Credit Union

2010 May 14, 8:52It really is an actual quote from the Sacramento Credit Union's website: "The answers to your Security Questions are case sensitive and cannot contain special characters like an apostrophe, or the words “insert,” “delete,” “drop,” “update,” “null,” or “select.”"

Out of context that seems hilarious, but if you read the doc the next Q/A twists it like a defense in depth rather than a 'there-I-fixed-it'.PermalinkCommentstechnical security humor sql

Encode-O-Matic: Guess Encoding

2010 Apr 4, 2:02

I've just updated Encode-O-Matic with a Guess Input Encoding feature. When you start Encode-O-Matic or when you use the 'Guess Input Encoding' menu item from the 'Tools' menu, Encode-O-Matic will try out various combinations of encodings and guess at which set seem to apply to your input. For instance given the following text, Encode-O-Matic will correctly guess that it is percent encoded, base64 encoded, deflate compressed text:

S%2BWqUEhLLMoFUulFpXnZQLogMa%2BkmCuPqxzILk%2FMyeHK4QIA
It should work fairly well for simple things but I did pick 'Guess' for the name of the feature to intentionally lower expectations. It doesn't currently apply to character encodings but that may be something to consider in the future.PermalinkCommentstechnical encodeomatic tool encoding

Code: Flickr Developer Blog » A Chinese puzzle: Unicode and EXIF metadata parsing

2010 Jan 8, 2:08Flickr dev talks image metadata the various forms which to prefer and how to guess at their character encodings.PermalinkCommentsunicode charset flickr photo image exif programming reference xmp technical

curlies - Project Hosting on Google Code

2009 Dec 23, 9:58Results of a set of black box tests on various characters in various parts of URLs in various popular browsers.PermalinkCommentsvia:mnot url uri iri idn dns browser web technical

Android eBook Reader And Makers

2009 Dec 13, 1:27

I was reading Makers, Cory Doctorow's latest novel, as it was serialized on Tor's website but with no ability to save my place within a page I set out to find a book reading app for my G1 Android phone. I stopped looking once I found Aldiko. Its got bookmarks within chapters, configurable fonts, you can look-up words in a dictionary, and has an easy method to download public domain and creative common books. I was able to take advantage of Aldiko's in-app book download system to get Makers onto my phone so I didn't have to bother with any conversion programs etc, and I didn't have to worry about spacing or layout, the book had the correct cover art, and chapter delimiters. I'm very happy with this app and finished reading Makers on it.

Makers is set in the near future and features teams of inventors, networked 3d printers, IP contention, body modifications, and Disney -- just the sort of thing you'd expect from a Cory Doctorow novel. The tale seems to be an allegory for the Internet including displacing existing businesses and the conflict between the existing big entertainment IP owners and the plethora of fans and minor content producers. The story is engaging and the characters filled out and believable. I recommend Makers and as always its Creative Commons so go take a look right now.

PermalinkCommentstor aldiko cory doctorow g1 makers ebook android book

Vanjamrgan's Gallery

2009 Nov 29, 1:53Fictional characters with beards. "Bearded: Brock Samson, Bearded: Robocop, Bearded: Hellboy, Bearded: Boba Fett, Bearded: Black Mage, Bearded: Wario, Bearded: Clint Eastwood, Bearded: Batman"
PermalinkCommentshumor art beard facial-hair hero brock-samson robocop batman boba-fett

Timelines: Time travel in popular film and tv | Information Is Beautiful

2009 Aug 28, 3:02Lovely visualization of the time travels taken by characters in various movies and television series and notes the places where they overlap.PermalinkCommentsvia:waxy time-travel bttf startrek tv movie information visualization

OpenSearchDescriptionToHTML Tool

2009 Jun 10, 3:36

I've made an OpenSearchDescriptionToHTML XSLT that given an OpenSearch description file produces HTML that describes that file, lets you install it, or search with it. For example, here's a Google OpenSearch description that uses my OpenSearchDescriptionToHTML XSLT.

I had just created an OpenSearch description for WolframAlpha at work and was going about the process of adding another install link to my search provider page so that I could install it. Thinking about it, I realized I could apply an XSLT to the OpenSearch description XML to produce the HTML automatically so I wouldn't have to modify additional documents everytime I create and want to install a new OpenSearch description. While I was in there writing the XSLT I figure why not let the user try out searching with the OpenSearch description file too. And lastly I made the XSLT apply to itself to produce HTML describing its own usage.

Incidentally, I added WolframAlpha at work to replace my FileInfo search provider for the purposes of searching for information about particular Unicode characters. For instance, look at WolframAlpha's lovely output for this search for "Bopomofo zh".

PermalinkCommentstechnical xml wolframalpha opensearchdescriptiontohtml xslt opensearch

Amazon.com: Shatnerquake: Jeff Burk: Books

2009 May 1, 11:25Seems like this would be a good gift for someone. "...all of the characters ever played by William Shatner are suddenly sucked into our world. Their mission: hunt down and destroy the real William Shatner. Featuring: Captain Kirk, TJ Hooker, Denny Crane, Rescue 911 Shatner, Singer Shatner, Shakespearean Shatner, Twilight Zone Shatner, Cartoon Kirk, Esperanto Shatner, Priceline Shatner, SNL Shatner, and - of course - William Shatner!"PermalinkCommentshumor book gift wishlist william-shatner shatner startrek via:boingboing

[whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009 Apr 23, 1:35"This e-mail is an attempt to give a relatively concise yet reasonably complete overview of non-Unicode character sets and encodings for 'Chinese characters', excluding those which are not supported by at least one of the four browsers IE, Safari, Firefox and Opera (henceforth 'all browsers'), and tentatively avoiding technical details which are out of scope for HTML5 unless they are important to gain a general understanding of the relevant issues."PermalinkCommentshtml html5 iso-2022 charset encoding character unicode cjk
Older EntriesNewer Entries Creative Commons License Some rights reserved.