ai page 34 - Dave's Blog

The 'Is It UTF-8?' Quick and Dirty Test

2009 Mar 6, 5:16

I've found while debugging networking in IE that it's often useful to quickly tell whether a string is encoded in UTF-8. You can check for the Byte Order Mark (EF BB BF in UTF-8), but I rarely see the BOM on UTF-8 strings. Instead I apply a quick and dirty UTF-8 test that takes advantage of the restrictions on well-formed UTF-8.

Unlike other multibyte character encoding forms (see Windows supported character sets or IANA's list of character sets) such as Big5, where sticking together any two bytes is more likely than not to give a valid byte sequence, UTF-8 is more restrictive. And unlike those encodings, a UTF-8 byte may be taken out of context and one can still tell whether it's a single-byte character, the starting byte of a three-byte sequence, and so on.

The full rules for well-formed UTF-8 are a little too complicated for me to commit to memory. Instead I've got my own simpler (this is the quick part) set of rules that will be mostly correct (this is the dirty part). For as many bytes in the string as you care to examine, check the most significant hex digit (the high four bits) of the byte:

F:
This is byte 1 of a 4 byte encoded codepoint and must be followed by 3 trail bytes.
E:
This is byte 1 of a 3 byte encoded codepoint and must be followed by 2 trail bytes.
C..D:
This is byte 1 of a 2 byte encoded codepoint and must be followed by 1 trail byte.
8..B:
This is a trail byte.
0..7:
This is a single byte encoded codepoint.
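
Here is a minimal sketch of those rules in Python. The function name `looks_like_utf8` is my own, and I'm assuming the input is a complete string, so a lead byte that is missing its trail bytes at the very end counts as a failure. Treat it as an illustration of the high-digit check, not a real validator.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Quick and dirty check that looks only at the high hex digit of each byte."""
    trail_needed = 0  # trail bytes still owed to the most recent lead byte
    for byte in data:
        high = byte >> 4  # most significant hex digit, 0x0..0xF
        if trail_needed:
            # We are inside a multibyte sequence: only 8..B is acceptable.
            if 0x8 <= high <= 0xB:
                trail_needed -= 1
            else:
                return False
        elif high <= 0x7:
            continue  # single byte encoded codepoint
        elif high in (0xC, 0xD):
            trail_needed = 1  # byte 1 of a 2 byte sequence
        elif high == 0xE:
            trail_needed = 2  # byte 1 of a 3 byte sequence
        elif high == 0xF:
            trail_needed = 3  # byte 1 of a 4 byte sequence
        else:
            return False  # 8..B: a trail byte with no lead byte before it
    # Assumption: the data is not truncated, so nothing may be left owing.
    return trail_needed == 0
```

For example, `looks_like_utf8("é€".encode("utf-8"))` passes, while the Latin-1 bytes `b"\xe9t\xe9"` fail because E9 promises two trail bytes and is followed by an ASCII `t`.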
These simpler rules can produce false positives in some cases: that is, they'll say a string is UTF-8 when in fact it might not be. For example, they accept the overlong sequence C0 80, which is not well-formed. But they won't produce false negatives. The following table from the Unicode specification describes what well-formed UTF-8 actually is.
Code Points        | 1st Byte | 2nd Byte | 3rd Byte | 4th Byte
U+0000..U+007F     | 00..7F   |          |          |
U+0080..U+07FF     | C2..DF   | 80..BF   |          |
U+0800..U+0FFF     | E0       | A0..BF   | 80..BF   |
U+1000..U+CFFF     | E1..EC   | 80..BF   | 80..BF   |
U+D000..U+D7FF     | ED       | 80..9F   | 80..BF   |
U+E000..U+FFFF     | EE..EF   | 80..BF   | 80..BF   |
U+10000..U+3FFFF   | F0       | 90..BF   | 80..BF   | 80..BF
U+40000..U+FFFFF   | F1..F3   | 80..BF   | 80..BF   | 80..BF
U+100000..U+10FFFF | F4       | 80..8F   | 80..BF   | 80..BF
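
For comparison, the table can be turned into a strict check fairly directly. This is only a sketch: `WELL_FORMED` and `is_well_formed_utf8` are names I made up, and each row is simply the table's byte ranges written as (low, high) pairs.

```python
# Each row: the allowed (low, high) range for the 1st byte, then for each trail byte.
WELL_FORMED = [
    ((0x00, 0x7F),),
    ((0xC2, 0xDF), (0x80, 0xBF)),
    ((0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)),
    ((0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)),
    ((0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF0, 0xF0), (0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF1, 0xF3), (0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF4, 0xF4), (0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)),
]


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True only if every byte sequence matches some row of the table."""
    i = 0
    while i < len(data):
        for row in WELL_FORMED:
            chunk = data[i:i + len(row)]
            # The chunk must be complete and every byte must sit in its range.
            if len(chunk) == len(row) and all(
                low <= byte <= high for byte, (low, high) in zip(chunk, row)
            ):
                i += len(row)
                break
        else:
            return False  # no row matched the bytes starting at position i
    return True
```

Sequences such as C0 80 (an overlong encoding) and ED A0 80 (a surrogate) satisfy the high-digit rules but are rejected here, which is exactly the false positive case.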

Tags: test technical unicode boring charset utf8 encoding

Back From Vegas

2009 Feb 28, 2:21

[Image: Penn and Teller Stage]

Sarah and I met up with Jon, Scott, Jesse, and Grib in Las Vegas last weekend and we had a fun time.

Tags: personal2 monorail vegas penn-and-teller

The Old New Thing : How does Raymond decide what to post on any particular day?

2009 Feb 27, 11:00

Raymond Chen has a year's worth of blog content written and scheduled! "To give you an idea of how far in advance I write my blog entries, I wrote this particular entry on February 13, 2008. ... this particular entry ended up on February 27, 2009 because that was the next available open day. ... Now, with a buffer of over a year, I do have quite a bit of leeway in choosing when any particular article is published." Humorous commenter John writes in response: "If you were to disappear off the face of the Earth, how long would it be before we knew?"

Tags: blog raymond-chen writing humor

YouTube - VMware demo showing two operating systems running on one phone

2009 Feb 27, 10:49

Finally, you can play solitaire on your phone while waiting for Android to boot with VMware's mobile phone OS: "VMware has demoed its mobile virtualisation platform, which could potentially let users simultaneously run two different operating systems."

Tags: video vmware mobile phone cellphone os android google microsoft windows windows-ce

Downloads: PlayOn Streams Netflix, Hulu, YouTube, and More to Your Xbox 360 and PS3

2009 Feb 24, 9:32

Of course Netflix is already available on the 360, but PlayOn lets you watch Hulu on the 360. So far so good with the trial software. "Windows only: Previously mentioned Windows utility PlayOn - which streams popular online video to your PS3, Xbox 360, and HP MediaSmart TV - has officially left its beta phase in the dust"

Tags: hulu video xbox xbox360 mediacenter dvr windows tv

android-vnc-viewer - Google Code

2009 Feb 23, 6:00

"A VNC viewer for Android platform. android-vnc-viewer is forked from tightVNC viewer. This project is still under development. ... When android-vnc-viewer is more stable, it will be available on Android Market. In the meantime you can install the development builds."

Tags: android vnc client viewer phone cellphone google remote g1 open-source

Some Datasets Available on the Web - Data Wrangling Blog

2009 Feb 23, 10:34

Lots of neat web APIs. Added to Delicious network. "Over the past year, I've been tagging interesting data I find on the web in del.icio.us. I wrote a quick python script to pull the relevant links from my del.icio.us export and list them at the bottom of this post. Most of these datasets are related to machine learning, but there are a lot of government, finance, and search datasets as well."

Tags: api data semanticweb information reference

Semantic Search the US Library of Congress

2009 Feb 23, 10:31

"This is an experimental service that makes the Library of Congress Subject Headings available as linked-data using the SKOS vocabulary. The goal of lcsh.info is to encourage experimentation and use of LCSH on the web with the hopes of informing a similar effort at the Library of Congress to make a continually updated version available. More information about the Linked Data effort can be found on the W3C Wiki."

Tags: library-of-congress loc semanticweb web rdf metadata library api

RemoteOpen Tool

2009 Feb 7, 10:39

On my laptop at work I often get mail with attachments that can only be opened by applications installed on my main computer. Tired of having to save the file on the laptop and then find it on the network via my other computer, I wrote remoteopen two nights ago. With this I open the file on my laptop and remoteopen sends it to be opened on my main computer. Overkill for this issue, but it felt good to write a quick tool that solves my problem.

Tags: technical boring remoteopen tool

Web Proxy Autodiscovery Protocol IETF Draft Document

2009 Feb 5, 8:39

The long-expired draft of the Web Proxy Autodiscovery Protocol (WPAD). To summarize, use DHCP and failing that DNS to find the name of a web server, and on that web server find a Proxy Auto-Config file at a well-known location.

Tags: wpad proxy internet reference browser dns dhcp
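
A rough sketch of the DNS half of WPAD, as I understand it: prepend "wpad" to progressively shorter versions of the machine's domain and look for "/wpad.dat" on each host. The function name `wpad_dns_candidates` is my own, it does no network I/O, and stopping while two labels remain is a simplification on my part rather than something taken from the draft.

```python
from typing import List


def wpad_dns_candidates(hostname: str) -> List[str]:
    """List the PAC file URLs the DNS method would try, most specific first."""
    labels = hostname.strip(".").split(".")
    urls = []
    # Drop the leftmost label each round. Assumption: stop while at least two
    # labels remain, so we never ask for a host like "wpad.com".
    for start in range(1, len(labels) - 1):
        domain = ".".join(labels[start:])
        urls.append(f"http://wpad.{domain}/wpad.dat")
    return urls
```

For a machine named `pc.dept.example.com` this yields `http://wpad.dept.example.com/wpad.dat` followed by `http://wpad.example.com/wpad.dat`.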

The WHATWG Blog - Blog Archive - This Week in HTML 5 - Episode 20

2009 Feb 3, 11:15

"r2719 specifies that browsers should not allow scripts to set document.domain to anything on the Public Suffix List, such as "com" or "co.jp". Essential background reading on why this is dangerous: Untraceable XSS Attacks. Most browsers already block this attack, e.g. Firefox since 3.0. [Background: Re: Setting document.domain]"

Tags: html5 tld publicsuffix dns security html internet web reference w3c
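
To illustrate the idea (not any browser's actual code), here is a small sketch of the check. `SAMPLE_PUBLIC_SUFFIXES` is a tiny made-up subset standing in for the real Public Suffix List, and `may_set_document_domain` is a hypothetical name.

```python
# Hypothetical stand-in for the real Public Suffix List, which is far larger.
SAMPLE_PUBLIC_SUFFIXES = {"com", "jp", "co.jp"}


def may_set_document_domain(current_host: str, new_domain: str) -> bool:
    """Allow only the current host or a parent domain that is not a public suffix."""
    current = current_host.lower().rstrip(".")
    new = new_domain.lower().rstrip(".")
    if not new or new in SAMPLE_PUBLIC_SUFFIXES:
        return False  # "com" or "co.jp" would be shared with unrelated sites
    # The new value must be the host itself or a dot-separated suffix of it.
    return current == new or current.endswith("." + new)
```

With this sketch a page on `www.example.com` may switch to `example.com` but not to `com`, and a page on `shop.example.co.jp` may not switch to `co.jp`.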

Sarah and I Are Engaged

2009 Jan 30, 5:21

[Image: Shot on the Rocks]

Over the previous weekend Sarah and I got engaged. I had a limo pick us up and take us to a park that has a beautiful view of the Seattle skyline where I proposed, then out for dinner and drinks including a bottle of wine for the ride back. What's the point of a limo ride if you don't drink while being driven around? It was a nice night and only had a hint of rain when we came home. We don't yet have a date set.

Tags: engagement personal nontechnical

Gravity Bone

2009 Jan 29, 10:22

Play this game now. It's like half of a delicious club sandwich. Love the music. "To make it in Nuevos Aires, one has to have nerves of silk and the filthiest of hands. Mix together a batch of espionage, some high-speed car chases, fire-spewing assassins, and you've got one oven that'll never bake cookies again. We provide the pliers and you bring the moxie."

Tags: game videogame quake gravity-bone humor spy espionage

GROW TOWER(GAME) (EYEZMAZE --FLASH GAME--)

2009 Jan 27, 4:00

"I'm very sorry to have kept you waiting so long. I've just finished New GROW !"

Tags: game flash cool puzzle

shazow.net - Google's Lucky is fickle, too

2009 Jan 27, 10:41

I just noticed that Google's Feeling Lucky doesn't work if your query contains a 'site:...' entry unless the HTTP request has a referer header pointing to Google. This person noticed too and wrote a Google App that acts like Feeling Lucky without this restriction. "It appears that Google has some secret threshold to decide when to get in the way of your destination like an angry ceiling cat catapulting itself onto your face."

Tags: google im-feeling-lucky search http referer http-header app

DIY Pepsi Challenge

2009 Jan 25, 5:39

[Image: Deutsches Museum]

Microsoft isn't completely shielded from our economy's issues, but I still have a job and still get free soda. While that's all still the case, I decided to test Sarah's claimed ability to differentiate between Pepsi, Coke, and their diet counterparts by taste alone. I poured the four sodas into marked cups and Sarah and I each took two runs through the cups with the following guesses.

Soda Identification Challenge Results
Drink            | Sarah, Guess 1 | Sarah, Guess 2 | Dave, Guess 1 | Dave, Guess 2
Coke             | Coke           | Coke           | Pepsi         | Diet Pepsi
Diet Coke        | Diet Coke      | Diet Pepsi     | Diet Coke     | Diet Coke
Pepsi            | Pepsi          | Pepsi          | Coke          | Coke
Diet Pepsi       | Diet Pepsi     | Diet Coke      | Diet Pepsi    | Pepsi
Total (out of 8) | Sarah: 6       |                | Dave: 3       |

As you can see from the results, Sarah's claimed ability to identify Coke and Pepsi by taste is confirmed. She got the first run completely correct, and on the second run her only mistake was swapping Diet Coke and Diet Pepsi. Her excuse for the error on the second run was a tainted palate from the first run. I, on the other hand, was mostly incorrect. Surprisingly, though, my incorrect answers were mostly consistent between runs one and two. For instance, I thought Pepsi was Coke in both runs.
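
For anyone who wants to double check the totals, here is a small tally of the guesses from the results table. The names `ACTUAL`, `GUESSES`, and `score` are just my own labels for this sketch.

```python
# The four drinks in the order they appear in the results table.
ACTUAL = ["Coke", "Diet Coke", "Pepsi", "Diet Pepsi"]

# Each person's two runs, one guess per drink, in the same order as ACTUAL.
GUESSES = {
    "Sarah": [
        ["Coke", "Diet Coke", "Pepsi", "Diet Pepsi"],
        ["Coke", "Diet Pepsi", "Pepsi", "Diet Coke"],
    ],
    "Dave": [
        ["Pepsi", "Diet Coke", "Coke", "Diet Pepsi"],
        ["Diet Pepsi", "Diet Coke", "Coke", "Pepsi"],
    ],
}


def score(runs):
    """Count correct guesses across all of one person's runs."""
    return sum(guess == drink for run in runs for guess, drink in zip(run, ACTUAL))


for name, runs in GUESSES.items():
    print(name, score(runs), "out of", len(runs) * len(ACTUAL))
```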

Tags: coke microsoft waste of soda pepsi waste of time soda

PolitiFact | The Obameter: Tracking Barack Obama's Campaign Promises

2009 Jan 24, 2:42

"PolitiFact has compiled about 500 promises that Barack Obama made during the campaign and is tracking their progress on our Obameter. We rate their status as No Action, In the Works or Stalled. Once we find action is completed, we rate them Promise Kept, Compromise or Promise Broken."

Tags: politics news government obama election president tracking

It's Me, and Here's My Proof: Why Identity and Authentication Must Remain Distinct

2009 Jan 22, 9:48

"Revocation presents another challenge. If a system relies only on a biometric for both identity and authentication, how do you revoke that factor? Forgotten passwords can be changed; lost smartcards can be revoked and replaced. How do you revoke a finger?"

Tags: article microsoft security identity authentication biometrics

kottke.org - home of fine hypertext products

2009 Jan 20, 6:16

Good links.

Tags: humor blog internet web technology geek culture design daily

Google: If You Commit a Felony, Don't Google It or You'll Go to Jail

2009 Jan 20, 11:40

"But, when police searched his computer, they found Google searches from a couple days after the accident like, "auto parts, auto dealers out-of-state; auto glass, Las Vegas; auto glass reporting requirements to law enforcement, auto theft," according to the prosecutor. The coup de grace? He searched for "hit-and-run," which he followed to a page about the hit-and-run he committed."

Tags: privacy google internet crime

Creative Commons License. Some rights reserved.