2009 Mar 6, 5:16
I've found while debugging networking in IE that it's often useful to quickly tell if a string is encoded in UTF-8. You can check for the Byte Order Mark (EF BB BF in UTF-8), but I rarely see the BOM on
UTF-8 strings. Instead I apply a quick and dirty UTF-8 test that takes advantage of the well-formed UTF-8 restrictions.
Unlike other multibyte character encodings (see Windows supported character sets or IANA's list of character sets) such as Big5, where sticking together any two bytes is more likely than not to give a valid byte sequence, UTF-8 is restrictive. And unlike
those encodings, a UTF-8 byte may be taken out of context and one can still tell whether it's a single byte character, the starting byte of a three byte sequence, etc.
The full rules for well-formed UTF-8 are a little too complicated for me to commit to memory. Instead I've got my own simpler (this is the quick part) set of rules that will be mostly correct (this
is the dirty part). For as many bytes in the string as you care to examine, check the most significant hex digit (the high nibble) of the byte:
- F: This is byte 1 of a 4 byte encoded codepoint and must be followed by 3 trail bytes.
- E: This is byte 1 of a 3 byte encoded codepoint and must be followed by 2 trail bytes.
- C..D: This is byte 1 of a 2 byte encoded codepoint and must be followed by 1 trail byte.
- 8..B: This is a trail byte.
- 0..7: This is a single byte encoded codepoint.
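The high-nibble rules above can be sketched in a few lines of Python. This is only my reading of the rules, not a reference implementation: the function name `looks_like_utf8` is made up for illustration, and I've assumed that a multibyte sequence cut off at the end of the buffer counts as a failure.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Quick and dirty UTF-8 check that looks only at each byte's high nibble."""
    trail_needed = 0  # trail bytes still owed to the most recent lead byte
    for byte in data:
        nibble = byte >> 4
        if 0x8 <= nibble <= 0xB:
            # Trail byte: only legal if a lead byte is still waiting for it.
            if trail_needed == 0:
                return False
            trail_needed -= 1
            continue
        if trail_needed:
            # A new sequence started before the previous one was complete.
            return False
        if nibble == 0xF:
            trail_needed = 3
        elif nibble == 0xE:
            trail_needed = 2
        elif nibble >= 0xC:  # C or D
            trail_needed = 1
        # Nibbles 0..7 are single byte codepoints and need no trail bytes.
    # Assumption: a sequence truncated at the end of the data is a failure.
    return trail_needed == 0
```

If you only examine a prefix of a longer string, you may prefer to ignore a truncated final sequence instead of failing on it.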
The simpler rules can produce false positives in some cases: that is, they'll say a string is UTF-8 when in fact it might not be. But they won't produce false negatives. The following table
from the Unicode spec describes what well-formed UTF-8 actually is.
Code Points | 1st Byte | 2nd Byte | 3rd Byte | 4th Byte
U+0000..U+007F | 00..7F | | |
U+0080..U+07FF | C2..DF | 80..BF | |
U+0800..U+0FFF | E0 | A0..BF | 80..BF |
U+1000..U+CFFF | E1..EC | 80..BF | 80..BF |
U+D000..U+D7FF | ED | 80..9F | 80..BF |
U+E000..U+FFFF | EE..EF | 80..BF | 80..BF |
U+10000..U+3FFFF | F0 | 90..BF | 80..BF | 80..BF
U+40000..U+FFFFF | F1..F3 | 80..BF | 80..BF | 80..BF
U+100000..U+10FFFF | F4 | 80..8F | 80..BF | 80..BF
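For comparison with the quick test, here is a sketch of a strict check driven directly by the table. The names `ROWS` and `is_well_formed_utf8` are my own for illustration. Each row records a first-byte range, the allowed range for the second byte, and the number of trail bytes. Third and fourth bytes are always 80..BF.

```python
# Each row: (first byte low, first byte high, second byte range, trail count),
# transcribed from the well-formed UTF-8 table.
ROWS = [
    (0x00, 0x7F, None, 0),
    (0xC2, 0xDF, (0x80, 0xBF), 1),
    (0xE0, 0xE0, (0xA0, 0xBF), 2),
    (0xE1, 0xEC, (0x80, 0xBF), 2),
    (0xED, 0xED, (0x80, 0x9F), 2),
    (0xEE, 0xEF, (0x80, 0xBF), 2),
    (0xF0, 0xF0, (0x90, 0xBF), 3),
    (0xF1, 0xF3, (0x80, 0xBF), 3),
    (0xF4, 0xF4, (0x80, 0x8F), 3),
]


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True only if every byte sequence matches a row of the table."""
    i = 0
    while i < len(data):
        lead = data[i]
        row = next((r for r in ROWS if r[0] <= lead <= r[1]), None)
        if row is None:
            return False  # C0, C1, F5..FF, or a trail byte with no lead byte
        _, _, second, trail = row
        if i + trail >= len(data):
            return False  # sequence runs past the end of the data
        if trail:
            if not second[0] <= data[i + 1] <= second[1]:
                return False
            # Any third and fourth bytes must be ordinary trail bytes.
            for b in data[i + 2 : i + 1 + trail]:
                if not 0x80 <= b <= 0xBF:
                    return False
        i += trail + 1
    return True
```

The overlong sequence C0 80 shows the difference: the first byte has high nibble C and the second is a trail byte, so the high-nibble rules accept it, but no row of the table has C0 as a first byte, so this check rejects it.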
test technical unicode boring charset utf8 encoding
2009 Feb 28, 2:21
Sarah and I met up with Jon, Scott, Jesse, and Grib in Las Vegas last weekend and we had a fun time.
- I got to play the Monorail Song via YouTube on my phone while on the Las Vegas Monorail rather than just chanting monorail like last
year.
- I didn't lose more gambling than I spent on food for the trip.
- Contrary to what some suggested, Sarah and I did not get married in Vegas.
- I finally saw a live Penn & Teller show and it was great!
personal2 monorail vegas penn-and-teller
2009 Feb 27, 11:00
Raymond Chen has a year's worth of blog content written and scheduled! "To give you an idea of how far in advance I write my blog entries, I wrote this particular entry on February 13, 2008. ... this
particular entry ended up on February 27, 2009 because that was the next available open day. ... Now, with a buffer of over a year, I do have quite a bit of leeway in choosing when any particular
article is published." Humorous commenter John writes in response: "If you were to disappear off the face of the Earth, how long would it be before we knew?"
blog raymond-chen writing humor
2009 Feb 27, 10:49
Finally, you can play solitaire on your phone while waiting for Android to boot with VMware's mobile phone OS: "VMware has demoed its mobile virtualisation platform, which could potentially let users
simultaneously run two different operating systems."
video vmware mobile phone cellphone os android google microsoft windows windows-ce
2009 Feb 24, 9:32
Of course Netflix is already available on the 360, but PlayOn lets you watch Hulu on the 360. So far so good with the trial software. "Windows only: Previously mentioned Windows utility PlayOn-which
streams popular online video to your PS3, Xbox 360, and HP MediaSmart TV-has officially left its beta phase in the dust"
hulu video xbox xbox360 mediacenter dvr windows tv
2009 Feb 23, 6:00
"A VNC viewer for Android platform. android-vnc-viewer is forked from tightVNC viewer. This project is still under development. ... When android-vnc-viewer is more stable, it will be available on
Android Market. In the meantime you can install the development builds."
android vnc client viewer phone cellphone google remote g1 open-source
2009 Feb 23, 10:34
Lots of neat web APIs. Added to Delicious network. "Over the past year, I've been tagging interesting data I find on the web in del.icio.us. I wrote a quick python script to pull the relevant links
from my del.icio.us export and list them at the bottom of this post. Most of these datasets are related to machine learning, but there are a lot of government, finance, and search datasets as well."
api data semanticweb information reference
2009 Feb 23, 10:31
"This is an experimental service that makes the Library of Congress Subject Headings available as linked-data using the SKOS vocabulary. The goal of lcsh.info is to encourage experimentation and use
of LCSH on the web with the hopes of informing a similar effort at the Library of Congress to make a continually updated version available. More information about the Linked Data effort can be found
on the W3C Wiki."
library-of-congress loc semanticweb web rdf metadata library api
2009 Feb 7, 10:39
On my laptop at work I often get mail with attachments that need an application I only have installed on my main computer. Tired of having to save the file on the laptop and then find it on the
network via my other computer, I wrote remoteopen two nights ago. With this I open the file on my laptop and remoteopen sends it to be opened on
my main computer. Overkill for this issue, but it felt good to write a quick tool that solves my problem.
technical boring remoteopen tool
2009 Feb 5, 8:39
The long expired draft of the Web Proxy Autodiscovery Protocol (WPAD). To summarize: use DHCP, and failing that DNS, to find the name of a web server, and on that web server find a Proxy Auto-Config
file at a well known location.
wpad proxy internet reference browser dns dhcp
2009 Feb 3, 11:15
"r2719 specifies that browsers should not allow scripts to set document.domain to anything on the Public Suffix List, such as "com" or "co.jp". Essential background reading on why this is dangerous:
Untraceable XSS Attacks. Most browsers already block this attack, e.g. Firefox since 3.0. [Background: Re: Setting document.domain]"
html5 tld publicsuffix dns security html internet web reference w3c
2009 Jan 30, 5:21
Over the previous weekend Sarah and I got engaged. I had a limo pick us up and take us to a park that has a beautiful view of the Seattle
skyline where I proposed, then out for dinner and drinks including a bottle of wine for the ride back. What's the point of a limo ride if you don't drink while being driven around? It was a nice
night and only had a hint of rain when we came home. We don't yet have a date set.
engagement personal nontechnical
2009 Jan 29, 10:22
Play this game now. It's like half of a delicious club sandwich. Love the music. "To make it in Nuevos Aires, one has to have nerves of silk and the filthiest of hands. Mix together a batch of
espionage, some high-speed car chases, fire-spewing assassins, and you've got one oven that'll never bake cookies again. We provide the pliers and you bring the moxie."
game videogame quake gravity-bone humor spy espionage
2009 Jan 27, 4:00
"I'm very sorry to have kept you waiting so long. I've just finished New GROW !"
game flash cool puzzle
2009 Jan 27, 10:41
I just noticed that Google's Feeling Lucky doesn't work if your query contains a 'site:...' entry unless the HTTP request has a referer header pointing to Google. This person noticed too and wrote a
Google App that acts like Feeling Lucky without this restriction. "It appears that Google has some secret threshold to decide when to get in the way of your destination like an angry ceiling cat
catapulting itself onto your face."
google im-feeling-lucky search http referer http-header app
2009 Jan 25, 5:39
Microsoft isn't completely shielded from our economy's issues but I still have a job and
still get free soda. While that's all still the case, I decided to test Sarah's claimed ability to differentiate between Pepsi, Coke, and their diet counterparts by taste alone. I poured the four
sodas into marked cups and Sarah and I each took two runs through the cups with the following guesses.
Soda Identification Challenge Results
Drink | Sarah, Guess 1 | Sarah, Guess 2 | Dave, Guess 1 | Dave, Guess 2
Coke | Coke | Coke | Pepsi | Diet Pepsi
Diet Coke | Diet Coke | Diet Pepsi | Diet Coke | Diet Coke
Pepsi | Pepsi | Pepsi | Coke | Coke
Diet Pepsi | Diet Pepsi | Diet Coke | Diet Pepsi | Pepsi
Total (out of 8) | Sarah: 6 | Dave: 3
As you can see from the results, Sarah's claimed ability to identify Coke and Pepsi by taste is confirmed. The first run through she got completely correct, and on the second run she only confused Diet
Pepsi and Diet Coke with each other. Her excuse for the error on the second run was a tainted palate from the first run. I, on the other hand, was mostly incorrect. Surprisingly, though, my incorrect answers were
mostly consistent between runs one and two. For instance, I thought Pepsi was Coke in both runs.
coke microsoft waste of soda pepsi waste of time soda
2009 Jan 24, 2:42
"PolitiFact has compiled about 500 promises that Barack Obama made during the campaign and is tracking their progress on our Obameter. We rate their status as No Action, In the Works or Stalled. Once
we find action is completed, we rate them Promise Kept, Compromise or Promise Broken."
politics news government obama election president tracking
2009 Jan 22, 9:48
"Revocation presents another challenge. If a system relies only on a biometric for both identity and authentication, how do you revoke that factor? Forgotten passwords can be changed; lost smartcards
can be revoked and replaced. How do you revoke a finger?"
article microsoft security identity authentication biometrics
2009 Jan 20, 11:40
"But, when police searched his computer, they found Google searches from a couple days after the accident like, "auto parts, auto dealers out-of-state; auto glass, Las Vegas; auto glass reporting
requirements to law enforcement, auto theft," according to the prosecutor. The coup de grace? He searched for "hit-and-run," which he followed to a page about the hit-and-run he committed."
privacy google internet crime