list page 7 - Dave's Blog

Search
My timeline on Mastodon

The 'Is It UTF-8?' Quick and Dirty Test

2009 Mar 6, 5:16

I've found while debugging networking in IE its often useful to quickly tell if a string is encoded in UTF-8. You can check for the Byte Order Mark (EF BB BF in UTF-8) but, I rarely see the BOM on UTF-8 strings. Instead I apply a quick and dirty UTF-8 test that takes advantage of the well-formed UTF-8 restrictions.

Unlike other multibyte character encoding forms (see Windows supported character sets or IANA's list of character sets), for example Big5, where sticking together any two bytes is more likely than not to give a valid byte sequence, UTF-8 is more restrictive. And unlike other multibyte character encodings, UTF-8 bytes may be taken out of context and one can still know that its a single byte character, the starting byte of a three byte sequence, etc.

The full rules for well-formed UTF-8 are a little too complicated for me to commit to memory. Instead I've got my own simpler (this is the quick part) set of rules that will be mostly correct (this is the dirty part). For as many bytes in the string as you care to examine, check the most significant digit of the byte:

F:
This is byte 1 of a 4 byte encoded codepoint and must be followed by 3 trail bytes.
E:
This is byte 1 of a 3 byte encoded codepoint and must be followed by 2 trail bytes.
C..D:
This is byte 1 of a 2 byte encoded codepoint and must be followed by 1 trail byte.
8..B:
This is a trail byte.
0..7:
This is a single byte encoded codepoint.
The simpler rules can produce false positives in some cases: that is, they'll say a string is UTF-8 when in fact it might not be. But it won't produce false negatives. The following is table from the Unicode spec. that actually describes well-formed UTF-8.
Code Points 1st Byte 2nd Byte 3rd Byte 4th Byte
U+0000..U+007F 00..7F
U+0080..U+07FF C2..DF 80..BF
U+0800..U+0FFF E0 A0..BF 80..BF
U+1000..U+CFFF E1..EC 80..BF 80..BF
U+D000..U+D7FF ED 80..9F 80..BF
U+E000..U+FFFF EE..EF 80..BF 80..BF
U+10000..U+3FFFF F0 90..BF 80..BF 80..BF
U+40000..U+FFFFF F1..F3 80..BF 80..BF 80..BF
U+100000..U+10FFFF F4 80..8F 80..BF 80..BF

PermalinkCommentstest technical unicode boring charset utf8 encoding

Some Datasets Available on the Web - Data Wrangling Blog

2009 Feb 23, 10:34Lots of neat web APIs. Added to Delicious network. "Over the past year, I've been tagging interesting data I find on the web in del.icio.us. I wrote a quick python script to pull the relevant links from my del.icio.us export and list them at the bottom of this post. Most of these datasets are related to machine learning, but there are a lot of government, finance, and search datasets as well."PermalinkCommentsapi data semanticweb information reference

Keepers of Lists - Top 278 Star Wars Lines Improved By Replacing A Word With Pants

2009 Feb 13, 9:29"I have altered the pants, pray that I don't alter them further."PermalinkCommentshumor pants starwars quote scifi geek

MyFonts Blog - Blog Archive - Introducing WhatTheFont for iPhone!

2009 Feb 11, 10:05"With the iPhone version of WhatTheFont you can use the phone's built-in camera to photograph the text in question (or choose an existing image from your photo albums)... After confirming which characters are used in the image, the app provides a list of possible matching fonts."PermalinkCommentsfont iphone camera typography

The WHATWG Blog - Blog Archive - This Week in HTML 5 - Episode 20

2009 Feb 3, 11:15"r2719 specifies that browsers should not allow scripts to set document.domain to anything on the Public Suffix List, such as "com" or "co.jp". Essential background reading on why this is dangerous: Untraceable XSS Attacks. Most browsers already block this attack, e.g. Firefox since 3.0. [Background: Re: Setting document.domain]"PermalinkCommentshtml5 tld publicsuffix dns security html internet web reference w3c

Etre Touchy - Welcome - Gloves for your iPhone, iPod Touch, Nintendo DS, Blackberry, PDA and more...

2009 Jan 8, 5:49Gloves with the ends of the index finger and thumb missing for using phones and the like while keeping the rest of your hands warm. Good idea!PermalinkCommentsglove design shopping wishlist phone cellphone clothing gloves

20x200: Our Story

2008 Dec 29, 2:26Some interesting work in here: "On a Sunday night back in January, Jen came up with a formula: (limited editions x low prices) + the internet = art for everyone"PermalinkCommentsart gift purchase wishlist gallery online photo shop via:thefangmonster

A Young Mad Scientist's First Alphabet Blocks | Xylocopa

2008 Dec 17, 2:27"Specifically, we have noticed that there is absolutely no training in the K-6 grades that prepares students to become mad scientists. In this competitive 21st-century world, the need for mad scientists will only increase... We are pleased to announce the release of our Young Mad Scientist's First Alphabet Blocks."PermalinkCommentshumor science geek gift via:swannman shopping wishlist toy alphabet-blocks

Tab Expansion in PowerShell

2008 Nov 18, 6:38

PowerShell gives us a real CLI for Windows based around .Net stuff. I don't like the creation of a new shell language but I suppose it makes sense given that they want something C# like but not C# exactly since that's much to verbose and strict for a CLI. One of the functions you can override is the TabExpansion function which is used when you tab complete commands. I really like this and so I've added on to the standard implementation to support replacing a variable name with its value, tab completion of available commands, previous command history, and drive names (there not restricted to just one letter in PS).

Learning the new language was a bit of a chore but MSDN helped. A couple of things to note, a statement that has a return value that you don't do anything with is implicitly the return value for the current function. That's why there's no explicit return's in my TabExpansion function. Also, if you're TabExpansion function fails or returns nothing then the builtin TabExpansion function runs which does just filenames. This is why you can see that the standard TabExpansion function doesn't handle normal filenames: it does extra stuff (like method and property completion on variables that represent .Net objects) but if there's no fancy extra stuff to be done it lets the builtin one take a crack.

Here's my TabExpansion function. Probably has bugs, so watch out!


function EscapePath([string] $path, [string] $original)
{
    if ($path.Contains(' ') -and !$original.Contains(' '))
    {
        '"'   $path   '"';
    }
    else
    {
        $path;
    }
}

function PathRelativeTo($pathDest, $pathCurrent)
{
    if ($pathDest.PSParentPath.ToString().EndsWith($pathCurrent.Path))
    {
        '.\'   $pathDest.name;
    }
    else
    {
        $pathDest.FullName;
    }
}

#  This is the default function to use for tab expansion. It handles simple
# member expansion on variables, variable name expansion and parameter completion
# on commands. It doesn't understand strings so strings containing ; | ( or { may
# cause expansion to fail.

function TabExpansion($line, $lastWord)
{
    switch -regex ($lastWord)
    {
         # Handle property and method expansion...
         '(^.*)(\$(\w|\.) )\.(\w*)$' {
             $method = [Management.Automation.PSMemberTypes] `
                 'Method,CodeMethod,ScriptMethod,ParameterizedProperty'
             $base = $matches[1]
             $expression = $matches[2]
             Invoke-Expression ('$val='   $expression)
             $pat = $matches[4]   '*'
             Get-Member -inputobject $val $pat | sort membertype,name |
                 where { $_.name -notmatch '^[gs]et_'} |
                 foreach {
                     if ($_.MemberType -band $method)
                     {
                         # Return a method...
                         $base   $expression   '.'   $_.name   '('
                     }
                     else {
                         # Return a property...
                         $base   $expression   '.'   $_.name
                     }
                 }
             break;
          }

         # Handle variable name expansion...
         '(^.*\$)([\w\:]*)$' {
             $prefix = $matches[1]
             $varName = $matches[2]
             foreach ($v in Get-Childitem ('variable:'   $varName   '*'))
             {
                 if ($v.name -eq $varName)
                 {
                     $v.value
                 }
                 else
                 {
                    $prefix   $v.name
                 }
             }
             break;
         }

         # Do completion on parameters...
         '^-([\w0-9]*)' {
             $pat = $matches[1]   '*'

             # extract the command name from the string
             # first split the string into statements and pipeline elements
             # This doesn't handle strings however.
             $cmdlet = [regex]::Split($line, '[|;]')[-1]

             #  Extract the trailing unclosed block e.g. ls | foreach { cp
             if ($cmdlet -match '\{([^\{\}]*)$')
             {
                 $cmdlet = $matches[1]
             }

             # Extract the longest unclosed parenthetical expression...
             if ($cmdlet -match '\(([^()]*)$')
             {
                 $cmdlet = $matches[1]
             }

             # take the first space separated token of the remaining string
             # as the command to look up. Trim any leading or trailing spaces
             # so you don't get leading empty elements.
             $cmdlet = $cmdlet.Trim().Split()[0]

             # now get the info object for it...
             $cmdlet = @(Get-Command -type 'cmdlet,alias' $cmdlet)[0]

             # loop resolving aliases...
             while ($cmdlet.CommandType -eq 'alias') {
                 $cmdlet = @(Get-Command -type 'cmdlet,alias' $cmdlet.Definition)[0]
             }

             # expand the parameter sets and emit the matching elements
             foreach ($n in $cmdlet.ParameterSets | Select-Object -expand parameters)
             {
                 $n = $n.name
                 if ($n -like $pat) { '-'   $n }
             }
             break;
         }

         default {
             $varNameStar = $lastWord   '*';

             foreach ($n in @(Get-Childitem $varNameStar))
             {
                 $name = PathRelativeTo ($n) ($PWD);

                 if ($n.PSIsContainer)
                 {
                     EscapePath ($name   '\') ($lastWord);
                 }
                 else
                 {
                     EscapePath ($name) ($lastWord);
                 }
             }

             if (!$varNameStar.Contains('\'))
             {
                foreach ($n in @(Get-Command $varNameStar))
                {
                    if ($n.CommandType.ToString().Equals('Application'))
                    {
                       foreach ($ext in @((cat Env:PathExt).Split(';')))
                       {
                          if ($n.Path.ToString().ToLower().EndsWith(($ext).ToString().ToLower()))
                          {
                              EscapePath($n.Path) ($lastWord);
                          }
                       }
                    }
                    else
                    {
                        EscapePath($n.Name) ($lastWord);
                    }
                }

                foreach ($n in @(Get-psdrive $varNameStar))
                {
                    EscapePath($n.name   ":") ($lastWord);
                }
             }

             foreach ($n in @(Get-History))
             {
                 if ($n.CommandLine.StartsWith($line) -and $n.CommandLine -ne $line)
                 {
                     $lastWord   $n.CommandLine.Substring($line.Length);
                 }
             }

             # Add the original string to the end of the expansion list.
             $lastWord;

             break;
         }
    }
}

PermalinkCommentscli technical tabexpansion powershell

G1 Android Phone

2008 Nov 9, 11:29

T-Mobile G1 Wallpapers by romainguy
I finally replaced my old regular cell-phone which was literally being held together by a rubber band with a fancy new G1, my first Internet accessible phone.

I had to call the T-Mobile support line to get data added to my plan and the person helping me was disconcertingly friendly. She asked about my weekend plans and so I felt compelled to ask her the same. Her plans involved replacing her video card so she could get back to World of Warcraft and do I enjoy computer gaming? I couldn't tell if she was genuine or if she was signing me up for magazines.

I was with Sarah in her new car, trying out the phone's GPS functionality via Google Maps while she drove. I switched to Street View and happened to find my car. It was a weird feeling, kind of like those Google conspiracy videos.

The phone runs Google's open source OS and I really enjoy the application API. Its all in Java and URIs and mime-types are sort of basics. Rather than invoking the builtin item picker control directly you invoke an 'intent' specifying the URI of your list of items, a mime-type describing the type of items in the list, and an action 'PICK' and whatever is registered as the picker on the system pops up and lets the user pick from that list. The same goes if you want to 'EDIT' an image, or 'VIEW' an mp3.

I wanted to replace the Google search box gadget that appears on the home screen with my own search box widget that uses OpenSearch descriptors but apparently in the current API you can't make home screen gadgets without changing parts of the OS. My other desired application is something to replace this GPS photo tracker device by recording my location to a file and an additional program on my computer to apply those locations to photos.

PermalinkCommentstmobile personal api phone technical g1 android google

Amazon.com: A Million Random Digits with 100,000 Normal Deviates: RAND Corporation: Books

2008 Oct 31, 7:10Bruce Schneier pointed out this book: "A Million Random Digits with 100,000 Normal Deviates (Paperback)". Its 600 pages of random numbers. I'd get a copy if it didn't cost $90! From the stats page Amazon lists the 100 most used words in the book: "6 8 11 19 23 28 30 32 37 38 42 47 52 54 56 60 72 77 80 84 86 92 101 102 107 108 111 115 125 126 131 143 147 148 150 157 158 163 166 167 171 179 183 188 190 197 206 207 212 215 218 220 226 228 230 234 236 242 247 249 251 253 261 265 272 292 297 304 311 323 332 336 337 338 344 345 354 356 358 359 364 371 372 374 384 389 391 409 412 413 421 433 436 443 457 481 489 516 517 642"PermalinkCommentsvia:schneier random book humor math csc

MTV Documentation Home

2008 Oct 29, 9:50MTV's new music video web service's API. The API provides feeds of music videos by artist or search term, list of artists that are 'like' other artists. Things it doesn't do: doesn't provide access to the video files instead provides URI to flash player. Also doesn't provide access to user's favorite videos or other user information.PermalinkCommentsapi video music mtv web feed rss

Investigation of a Few Application Protocols (Updated)

2008 Oct 25, 6:51

Windows allows for application protocols in which, through the registry, you specify a URL scheme and a command line to have that URL passed to your application. Its an easy way to hook a webbrowser up to your application. Anyone can read the doc above and then walk through the registry and pick out the application protocols but just from that info you can't tell what the application expects these URLs to look like. I did a bit of research on some of the application protocols I've seen which is listed below. Good places to look for information on URI schemes: Wikipedia URI scheme, and ESW Wiki UriSchemes.

Some Application Protocols and associated documentation.
Scheme Name Notes
search-ms Windows Search Protocol The search-ms application protocol is a convention for querying the Windows Search index. The protocol enables applications, like Microsoft Windows Explorer, to query the index with parameter-value arguments, including property arguments, previously saved searches, Advanced Query Syntax, Natural Query Syntax, and language code identifiers (LCIDs) for both the Indexer and the query itself. See the MSDN docs for search-ms for more info.
Example: search-ms:query=food
Explorer.AssocProtocol.search-ms
OneNote OneNote Protocol From the OneNote help: /hyperlink "pagetarget" - Starts OneNote and opens the page specified by the pagetarget parameter. To obtain the hyperlink for any page in a OneNote notebook, right-click its page tab and then click Copy Hyperlink to this Page.
Example: onenote:///\\GUMMO\Users\davris\Documents\OneNote%20Notebooks\OneNote%202007%20Guide\Getting%20Started%20with%20OneNote.one#section-id={692F45F5-A42A-415B-8C0D-39A10E88A30F}&end
callto Callto Protocol ESW Wiki Info on callto
Skype callto info
NetMeeting callto info
Example: callto://+12125551234
itpc iTunes Podcast Tells iTunes to subscribe to an indicated podcast. iTunes documentation.
C:\Program Files\iTunes\iTunes.exe /url "%1"
Example: itpc:http://www.npr.org/rss/podcast.php?id=35
iTunes.AssocProtocol.itpc
pcast
iTunes.AssocProtocol.pcast
Magnet Magnet URI Magnet URL scheme described by Wikipedia. Magnet URLs identify a resource by a hash of that resource so that when used in P2P scenarios no central authority is necessary to create URIs for a resource.
mailto Mail Protocol RFC 2368 - Mailto URL Scheme.
Mailto Syntax
Opens mail programs with new message with some parameters filled in, such as the to, from, subject, and body.
Example: mailto:?to=david.risney@gmail.com&subject=test&body=Test of mailto syntax
WindowsMail.Url.Mailto
MMS mms Protocol MSDN describes associated protocols.
Wikipedia describes MMS.
"C:\Program Files\Windows Media Player\wmplayer.exe" "%L"
Also appears to be related to MMS cellphone messages: MMS IETF Draft.
WMP11.AssocProtocol.MMS
secondlife [SecondLife] Opens SecondLife to the specified location, user, etc.
SecondLife Wiki description of the URL scheme.
"C:\Program Files\SecondLife\SecondLife.exe" -set SystemLanguage en-us -url "%1"
Example: secondlife://ahern/128/128/128
skype Skype Protocol Open Skype to call a user or phone number.
Skype's documentation
Wikipedia summary of skype URL scheme
"C:\Program Files\Skype\Phone\Skype.exe" "/uri:%l"
Example: skype:+14035551111?call
skype-plugin Skype Plugin Protocol Handler Something to do with adding plugins to skype? Maybe.
"C:\Program Files\Skype\Plugin Manager\skypePM.exe" "/uri:%1"
svn SVN Protocol Opens TortoiseSVN to browse the repository URL specified in the URL.
C:\Program Files\TortoiseSVN\bin\TortoiseProc.exe /command:repobrowser /path:"%1"
svn+ssh
tsvn
webcal Webcal Protocol Wikipedia describes webcal URL scheme.
Webcal URL scheme description.
A URL that starts with webcal:// points to an Internet location that contains a calendar in iCalendar format.
"C:\Program Files\Windows Calendar\wincal.exe" /webcal "%1"
Example: webcal://www.lightstalkers.org/LS.ics
WindowsCalendar.UrlWebcal.1
zune Zune Protocol Provides access to some Zune operations such as podcast subscription (via Zune Insider).
"c:\Program Files\Zune\Zune.exe" -link:"%1"
Example: zune://subscribe/?name=http://feeds.feedburner.com/wallstrip.
feed Outlook Add RSS Feed Identify a resource that is a feed such as Atom or RSS. Implemented by Outlook to add the indicated feed to Outlook.
Feed URI scheme pre-draft document
"C:\PROGRA~2\MICROS~1\Office12\OUTLOOK.EXE" /share "%1"
im IM Protocol RFC 3860 IM URI scheme description
Like mailto but for instant messaging clients.
Registered by Office Communicator but I was unable to get it to work as described in RFC 3860.
"C:\Program Files (x86)\Microsoft Office Communicator\Communicator.exe" "%1"
tel Tel Protocol RFC 5341 - tel URI scheme IANA assignment
RFC 3966 - tel URI scheme description
Call phone numbers via the tel URI scheme. Implemented by Office Communicator.
"C:\Program Files (x86)\Microsoft Office Communicator\Communicator.exe" "%1"
(Updated 2008-10-27: Added feed, im, and tel from Office Communicator)PermalinkCommentstechnical application protocol shell url windows

Language Log - Congress plans bailout for grammar epidemic

2008 Oct 23, 2:18I had no idea lingual prescriptivists vs descriptivists were split in a partisan manner: '... The Secretary [of the Department of Education] released a report that includes dire warnings of impending doom...The cause of this immanent catastrophe is, of course, those pesky linguists, the libertarian destroyers of good usage who claim that, well, anything goes. According to the report, "the language problem has now reached the crisis level and we are now experiencing a severe epidemic of bad grammar that will affect the very fiber of our nation." The Secretary added, "an alarming number of children are suffering from the bad advice given by those socialist, left-wing, atheistic linguists and we just gotta do something about it."'PermalinkCommentshumor language politics grammar

Failing Electronics

2008 Oct 22, 12:54

Electronic devices shouldn't fail, they should just sit wherever I place them and work forever. A while back my home web server started failing so I moved over to a real web hosting service. And this was the home web server I built from pieces Eric gave me after my previous one died during the big power failure the year before. The power socket on my old laptop has come undone from the motherboard so that it can no longer be powered. Just a week or two ago my Xbox 360 stopped displaying video. The CPU fan on my media center died. I also want to put my camera and GPS in this list, but the camera died due to accidentally turning on in my pocket and the GPS was stolen so those aren't the devices just arbitrarily failing.

PermalinkCommentsboring personal complaining nontechnical

Presidential Election 2008 FAQ

2008 Oct 13, 10:53"This is an FAQ (Frequently Asked Questions list) for the 2008 United States Presidential Election. I need to disclose up front that I am an Obama supporter. However, with the exception of the very last question, this FAQ is designed as a collection of factual information (such as the latest poll results) and of analysis that is as objective as possible."PermalinkCommentsvia:kris.kowal politics election obama mccain

List of language inventors

2008 Oct 10, 1:43A blog comment included the phrase 'hard-core conlangers' which at first glance sounds dirty, then based on the context I thought it was made up, but of course Wikipedia has the actual answer: "A conlanger ... is person who invents conlangs (constructed languages)."PermalinkCommentslanguage klingon nerd wikipedia conlang

MAKE: Blog: Metal plates send messages to airport x-ray screeners

2008 Sep 29, 3:07'These metal plates contain messages which will appear when they are X-Rayed.' What an awesome idea. Display messages to your friendly TSA x-ray security folk by cutting the messages into a plate of metal and placing it in your bag.PermalinkCommentshumor security product wishlist tsa airport x-ray

Party Movies Recommended by Netflix

2008 Sep 18, 10:31
Poster for 24 Hour Party PeoplePoster for Human TrafficPoster for The Boys and Girls Guide to Getting Down

Netflix has recommended three party movies over my time with Netflix and if you're OK with movies featuring sex, drugs, rock&roll (or techno) as almost the main character then I can recommend at least The Boys and Girls Guide to Getting Down.

24 Hour Party People is based on the true story of Tony Wilson, journalist, band manager, and club owner (not all at once) around the rise of punk and new wave in England. Like many true-story based movies it starts off strong and very interesting but gets very slow at the end like the writers got bored and just started copying the actual events. Unless you have some interest in the history of music in the 80s in Manchester I don't recommend this movie.

Human Traffic is fun and funny following a group of friends going out for a night of clubbing and partying. I had to get over seeing John Simm as not The Master from Doctor Who but rather as a partying youth. It felt like it was geared towards viewers who were on something like the totally odd techno musical interludes with the characters dancing for no apparent reason. Otherwise the movie was good.

The Boys and Girls Guide to Getting Down is done in the style of an old educational movie on the topic of clubbing and partying. It sounds like a premise that would get old but they do a good job. While demonstrating drinking and driving they have scientists push a mouse around in a toy convertible. Enough said. It was funny and I recommend it.

PermalinkCommentsparty movie netflix

Xbox Achievements for Everyday Life

2008 Sep 16, 7:54

I just upgraded to the Zune 3.0 software which includes games and purchasing music on the Zune via WiFi and once again I'm thrilled that the new firmware is available for old Zunes like mine. Rooting around looking at the new features I noticed Zune Badges for the first time. They're like Xbox Achievements, for example I have a Pixies Silver Artist Power Listener award for listening to the Pixies over 1000 times. I know its ridiculous but I like it, and now I want achievements for everything.

Achievements everywhere would require more developments in self-tracking. Self-trackers, folks who keep statistics on exactly when and what they eat, when and how much they exercise, anything one may track about one's self, were the topic of a Kevin Kelly Quantified Self blog post (also check out Cory Doctorow's SF short story The Things that Make Me Weak and Strange Get Engineered Away featuring a colony of self-trackers). For someone like me with a medium length attention span the data collection needs to be completely automatic or I will lose interest and stop collecting within a week. For instance, Nike iPod shoes that keep track of how many steps the wearer takes. I'll also need software to analyze, display, and share this data on a website like Mycrocosm. I don't want to have to spend extreme amounts of time to create something as wonderful as the Feltron Report (check out his statistic on how many daily measurements he takes for the report). Once we have the data we can give out achievements for everything!

Achievements for Everyday Life
Carnivore
Eat at least ten different kinds of animals.
Make Friends
Meet at least 10% of the residents in your home town.
Globetrotter
Visit a city in every country.
You're Old
Survive at least 80 years of life.

Of course none of the above is practical yet, but how about Delicious achievements based on the public Delicious feeds? That should be doable...

PermalinkCommentsself-tracking data achievements
Older EntriesNewer Entries Creative Commons License Some rights reserved.