val page 8 - Dave's Blog

Database - WEBAPPS

2010 Mar 5, 10:21

Document explaining the relationship between the various web storage APIs coming out of HTML 5. To summarize:
Web Storage (aka DOM Storage) - simple key/value pairs API.
WebSimple DB API - now called Indexed Database API.
Indexed Database API and Web SQL Database - competing database APIs.
Application Cache - Storage of HTTP resources for offline apps.
DataCache API - A programmatically modifiable Application Cache.

Tags: html html5 standard programming technical wiki w3c database storage web

The Boing Boing Guide to the 2010 Indie Games Festival Boing Boing

2010 Jan 6, 2:24

This time it's BoingBoing's list of 2010 indie games.

Tags: game videogame

Code: Flickr Developer Blog » Language Detection: A Witch’s Brew?

2009 Dec 4, 10:24

Flickr dev blog on the Accept-Language HTTP header: "It’s true that the Accept-Language header has a troubled history. Because of this, many developers regard it the way medieval villagers might have regarded a woman with a warty nose and a pet cat – it should be shunned, avoided and possibly burned at the stake." And this great anecdote: "In two and a half years of running as an international site, we’ve only ever had one case where it didn’t work. Helio, a cellphone company, had a browser was custom-built for them in Korea, and had its “Accept-Language” header hard-coded to always request Korean, something which led to much confusion for the Flickr users amongst their American customers."

Tags: flickr internationalization language accept-language http http-header development technical web

English Shellcode

2009 Nov 27, 6:10

"What follows is a brief description of the method we have developed for encoding arbitrary shellcode as English text. This English shellcode is completely self-contained, i.e., it does not require an external loader, and executes as valid IA32 code."

Tags: security polyglot intel paper research programming hack obfuscation english language technical system:filetype:pdf system:media:document

Official Google Blog: Cutting back on your long list of passwords

2009 Nov 23, 11:28

"Thanks to the utilization of new technology, we're now seeing large-scale success in eliminating the need for passwords while increasing the successful registration rate at websites to over 90%...In addition, after a thorough evaluation of the security and privacy of these technologies, the same techniques are being piloted by President Obama's open identity initiative to enable citizens to sign in more easily to government-operated websites."

Tags: identity openid google security authentication facebook password via:connolly technical

The Answer Factory: Fast, Disposable, and Profitable as Hell | Magazine

2009 Oct 22, 12:33

"When asked for the most valuable topic in Demand’s arsenal, he replies instantly: “‘Where can I donate a car in Dallas?’”"

Tags: via:kris.kowal wired internet video howto automation business media marketing economics advertising

ASCIIpOrtal | Cymons Games

2009 Sep 23, 3:13

ASCIIpOrtal is now released!

Tags: ascii text game videogame humor portal valve free via:waxy

Sam Ruby: First Polyglot Validator Check Deployed

2009 Sep 10, 7:22

HTML validator can validate that your document is both HTML and XHTML at the same time.

Tags: html5 xhtml html validator technical web polyglot

Time/Date Conversion Tool

2009 Aug 28, 3:39

I built timestamp.exe, a Windows command line tool that converts between computer- and human-readable date/time formats, mostly for my work on the first run wizard for IE8. We commonly write dates to the registry in binary form, so to test and debug my work it was useful to determine which date a binary FILETIME or SYSTEMTIME value corresponded to, or to produce my own binary FILETIME value and insert it into the registry.

For instance, to convert to a binary value:

[PS C:\] timestamp -inString 2009/08/28:10:18 -outHexValue -convert filetime
2009/08/28:10:18 as FILETIME: 00 7c c8 d1 c8 27 ca 01
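
The same bytes can be reproduced without the tool. What follows is a minimal Python sketch, not how timestamp.exe is implemented. It assumes the input is interpreted as UTC, which matches the output above, and the helper names `filetime_bytes` and `hex_dump` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_bytes(moment: datetime) -> bytes:
    """Return the 8-byte little-endian FILETIME for a timezone-aware datetime."""
    microseconds = (moment - FILETIME_EPOCH) // timedelta(microseconds=1)
    return (microseconds * 10).to_bytes(8, "little")

def hex_dump(raw: bytes) -> str:
    """Format bytes the way the tool prints them: two hex digits per byte."""
    return " ".join(f"{b:02x}" for b in raw)

moment = datetime(2009, 8, 28, 10, 18, tzinfo=timezone.utc)
print(hex_dump(filetime_bytes(moment)))
# prints: 00 7c c8 d1 c8 27 ca 01
```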

Converting in the other direction, if you don't know what format the bytes are in, just feed them in and timestamp will try all conversions and list only the valid ones:

[PS C:\] timestamp -inHexValue  "40 52 1c 3b"
40 52 1c 3b as FILETIME: 1601-01-01:00:01:39.171
40 52 1c 3b as Unix Time: 2001-06-05:03:30:08.000
40 52 1c 3b as DOS Time: 2009-08-28:10:18:00.000

(It also supports OLE dates and SYSTEMTIME, which aren't listed there because the hex value isn't valid for those types.) Or use the guess option to get timestamp's best guess:

[PS C:\] timestamp -inHexValue  "40 52 1c 3b" -convert guess
40 52 1c 3b as DOS Time: 2009-08-28:10:18:00.000
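
To see why one set of four bytes yields three plausible dates, here is a rough Python sketch of the three interpretations. It is an illustration under assumptions rather than the tool's actual logic: the function names are hypothetical, all values are read as little-endian, and the DOS layout (date word in the high 16 bits, time word in the low 16 bits) is inferred from the output above.

```python
from datetime import datetime, timedelta, timezone

def as_filetime(raw: bytes) -> datetime:
    ticks = int.from_bytes(raw, "little")  # 100 ns units since 1601-01-01 UTC
    return datetime(1601, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=ticks // 10)

def as_unix_time(raw: bytes) -> datetime:
    seconds = int.from_bytes(raw, "little")  # seconds since 1970-01-01 UTC
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def as_dos_time(raw: bytes) -> datetime:
    value = int.from_bytes(raw, "little")
    date, time = value >> 16, value & 0xFFFF  # assumed layout, see lead-in
    # Date word: 7 bits year since 1980, 4 bits month, 5 bits day.
    # Time word: 5 bits hour, 6 bits minute, 5 bits seconds divided by two.
    return datetime(1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F,
                    time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2)

raw = bytes.fromhex("40 52 1c 3b")
for name, decode in [("FILETIME", as_filetime),
                     ("Unix Time", as_unix_time),
                     ("DOS Time", as_dos_time)]:
    print(name, decode(raw).isoformat())
```

For these bytes the three decoders give the same dates as the listing above. For other inputs `datetime` raises `ValueError` on impossible field values (month 0, hour 25, and so on), which is one simple way to keep only the valid conversions.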

When I first wrote this, the function that parses the date-time string had a bug: it parsed 2009-07-02:10:18 just fine but could not parse 2009-09-02:10:18 correctly. This was my code:

success = swscanf_s(timeString, L"%hi%*[\\/- ,]%hi%*[\\/- ,]%hi%*[\\/- ,Tt:.]%hi%*[:.]%hi%*[:.]%hi%*[:.]%hi", 
&systemTime->wYear,
&systemTime->wMonth,
&systemTime->wDay,
&systemTime->wHour,
&systemTime->wMinute,
&systemTime->wSecond,
&systemTime->wMilliseconds) > 1;
See the problem?
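
One reading of the puzzle: the `%hi` conversion infers the number base from the prefix, so a leading zero means octal. 07 is a valid octal number but 09 is not. The Python sketch below imitates only that integer-reading rule. The helper name `scan_percent_i` is hypothetical, and it does not model the rest of swscanf_s.

```python
import re

# An optional sign, then a hex, octal, or decimal literal, as %i accepts them.
_INTEGER = re.compile(r"[+-]?(?:0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)")

def scan_percent_i(text: str):
    """Return (value, rest_of_text) the way a %i conversion would consume text."""
    match = _INTEGER.match(text)
    if match is None:
        return None, text
    token = match.group(0)
    digits = token.lstrip("+-")
    if digits[:2].lower() == "0x":
        base = 16
    elif digits.startswith("0"):
        base = 8  # a leading zero selects octal
    else:
        base = 10
    return int(token, base), text[match.end():]

print(scan_percent_i("07-02:10:18"))  # (7, '-02:10:18')
print(scan_percent_i("09-02:10:18"))  # (0, '9-02:10:18')
```

With 09 the month comes back as 0 and the leftover 9 does not match the separator set, so scanning stops after two fields, which still satisfies the `> 1` check. A decimal-only conversion such as `%hu` avoids the octal interpretation.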

To convert between these various forms yourself, read The Old New Thing date conversion article or Josh Poley's date time article. I previously wrote about date formats I like and dislike.

Tags: date date-time technical time windows tool

Creating Accelerators for Other People's Web Services

2009 Aug 18, 4:19

Before we shipped IE8 there were no Accelerators, so we had some fun making our own for our favorite web services. I've got a small set of tips for creating Accelerators for other people's web services. I was planning on writing this up as an IE blog post, but Jon wrote a post covering a similar area so rather than write a full and coherent blog post I'll just list a few points:

Tags: technical accelerator ie8 ie

Common Sense Journalism: Local news value = 0? Not exactly, but ...

2009 Jul 28, 5:07

Suggests that local news must provide the raw facts and only in particular cases do a 'story' on top of that -- not everything needs to be a story.

Tags: news via:sambrook journalism

Linus Torvalds: "Microsoft hatred is a disease" - Ars Technica

2009 Jul 28, 3:39

Linus Torvalds: "I'm a big believer in "technology over politics"...I may make jokes about Microsoft at times, but at the same time, I think the Microsoft hatred is a disease." This goes well with his previous quote calling Slashdot a "big public wanking session".

Tags: linux linus-torvalds microsoft politics technical

WGMX 4 - Zombocalypse on Vimeo

2009 Jun 30, 4:59

"Congratulations on not being devoured and purchasing the Wagglemax Zombocalypse TM Survival Kit"

Tags: humor video commercial ad apocalypse zombie horror for:hellosarah videogame

PowerShell Scanning Script

2009 Jun 27, 3:42

I've hooked up the printer/scanner to the Media Center PC, since I leave that on all the time anyway, so we can have a networked printer. I wanted to share the scanner in a similar fashion, but I didn't want to install HP's software (other than the drivers, of course). So I've written my own scanning script in PowerShell that does the following:

  1. Scans using the Windows Image Acquisition APIs via COM
  2. Runs OCR on the image using Microsoft Office Document Imaging via COM (which may already be on your PC if you have Office installed)
  3. Converts the image to JPEG using .NET Image APIs
  4. Stores the OCR text into the EXIF comment field using .NET Image APIs (which means Windows Search can index the image by the text in the image)
  5. Moves the image to the public share

Here's the actual code from my scan.ps1 file:

param([Switch] $ShowProgress, [switch] $OpenCompletedResult)

$filePathTemplate = "C:\users\public\pictures\scanned\scan {0} {1}.{2}";
$time = get-date -uformat "%Y-%m-%d";

[void]([reflection.assembly]::loadfile( "C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Drawing.dll"))

$deviceManager = new-object -ComObject WIA.DeviceManager
$device = $deviceManager.DeviceInfos.Item(1).Connect();

foreach ($item in $device.Items) {
        $fileIdx = 0;
        while (test-path ($filePathTemplate -f $time,$fileIdx,"*")) {
                [void](++$fileIdx);
        }

        if ($ShowProgress) { "Scanning..." }

        $image = $item.Transfer();
        $fileName = ($filePathTemplate -f $time,$fileIdx,$image.FileExtension);
        $image.SaveFile($fileName);
        clear-variable image

        if ($ShowProgress) { "Running OCR..." }

        $modiDocument = new-object -comobject modi.document;
        $modiDocument.Create($fileName);
        $modiDocument.OCR();
        if ($modiDocument.Images.Count -gt 0) {
                $ocrText = $modiDocument.Images.Item(0).Layout.Text.ToString().Trim();
                $modiDocument.Close();
                clear-variable modiDocument

                if (!($ocrText.Equals(""))) {
                        $fileAsImage = New-Object -TypeName system.drawing.bitmap -ArgumentList $fileName
                        if (!($fileName.EndsWith(".jpg") -or $fileName.EndsWith(".jpeg"))) {
                                if ($ShowProgress) { "Converting to JPEG..." }

                                $newFileName = ($filePathTemplate -f $time,$fileIdx,"jpg");
                                $fileAsImage.Save($newFileName, [System.Drawing.Imaging.ImageFormat]::Jpeg);
                                $fileAsImage.Dispose();
                                del $fileName;

                                $fileAsImage = New-Object -TypeName system.drawing.bitmap -ArgumentList $newFileName 
                                $fileName = $newFileName
                        }

                        if ($ShowProgress) { "Saving OCR Text..." }

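                        # Reuse an existing PropertyItem as a template, since PropertyItem has no public constructor.
                        $property = $fileAsImage.PropertyItems[0];
                        # 40092 (0x9C9C) is the XPComment tag; type 1 is a byte array, here holding UTF-16LE text.
                        $property.Id = 40092;
                        $property.Type = 1;
                        $property.Value = [system.text.encoding]::Unicode.GetBytes($ocrText);
                        $property.Len = $property.Value.Count;
                        $fileAsImage.SetPropertyItem($property);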
                        $fileAsImage.Save(($fileName + ".new"));
                        $fileAsImage.Dispose();
                        del $fileName;
                        ren ($fileName + ".new") $fileName
                }
        }
        else {
                $modiDocument.Close();
                clear-variable modiDocument
        }

        if ($ShowProgress) { "Done." }

        if ($OpenCompletedResult) {
                . $fileName;
        }
        else {
                $result = dir $fileName;
                $result | add-member -membertype noteproperty -name OCRText -value $ocrText
                $result
        }
}

I ran into a few issues:

Tags: technical scanner ocr .net modi powershell office wia

Wallace and Gromit: A Matter of Loaf and Death | 2009 Seattle International Film Festival | Nick Park | United Kingdom - Films

2009 May 25, 10:19

"Wallace and Gromit return-this time as purveyors of the Top Bun bakery, despite the fact that 12 other local bakers have disappeared in the previous year. Now it's up to Gromit to solve the mystery while Wallace woos new love interest Piella Bakewell."

Tags: humor movie wallace-and-gromit animation claymation for:hellosarah

An Extraordinary Home. This 3 ++ bedroom 2.5 bathroom Single Family located at 601 Dolores Street, Mission Dolores, San Francisco, California is presented by John L. Woodruff III & Marcus Miller, MA Realtor/Broker Associates of Hill & Co. Real Estate.

2009 May 19, 1:43

Lovely, although a bit out of my price range. "Formerly the Golden Gate Lutheran Church, this stunning Gothic Revival style building is now one of the most extraordinary and largest single family homes in San Francisco."

Tags: for:hellosarah photo house home church california san-francisco flickr slideshow via:boingboing church-home

What San Francisco/Silicon Valley can learn from the Twittering company: Zappos - Scobleizer: bleeding edge technology talk

2009 May 13, 11:21

Lots of interesting notes on the company culture of Zappos. "Yesterday I was lucky enough to visit Zappos and get a tour and talk with some of their executives, including Tony Hsieh, CEO."

Tags: twitter marketing business culture zappos shoes ecommerce

WRECK & SALVAGE

2009 May 1, 12:09

"If I'm reading the pop-up window correctly, domain registrar Godaddy recommends against purchasing .tv domain names because the island of Tuvalu, which the domain represents, is sinking."

Tags: humor dns domain godaddy tv via:boingboing

The Self-Describing Web

2009 Apr 7, 1:13

A sort of vertical cross section of an overview of what the web should look like from HTTP & URIs to GRDDL & RDF. Oh, and there's a pretty graph at the bottom. "This finding describes how document formats, markup conventions, attribute values, and other data formats can be designed to facilitate the deployment of self-describing, Web-grounded Web content."

Tags: web w3c xml html http semanticweb microformats xhtml atom grddl rdfa rdf

The 'Is It UTF-8?' Quick and Dirty Test

2009 Mar 6, 5:16

I've found while debugging networking in IE that it's often useful to quickly tell whether a string is encoded in UTF-8. You can check for the Byte Order Mark (EF BB BF in UTF-8), but I rarely see the BOM on UTF-8 strings. Instead I apply a quick and dirty UTF-8 test that takes advantage of the well-formed UTF-8 restrictions.

Unlike other multibyte character encoding forms (see Windows supported character sets or IANA's list of character sets), such as Big5, where sticking together any two bytes is more likely than not to give a valid byte sequence, UTF-8 is more restrictive. And unlike other multibyte character encodings, a UTF-8 byte may be taken out of context and one can still tell whether it's a single byte character, the starting byte of a three byte sequence, etc.

The full rules for well-formed UTF-8 are a little too complicated for me to commit to memory. Instead I've got my own simpler (this is the quick part) set of rules that will be mostly correct (this is the dirty part). For as many bytes in the string as you care to examine, check the most significant hex digit of the byte:

F:
This is byte 1 of a 4 byte encoded codepoint and must be followed by 3 trail bytes.
E:
This is byte 1 of a 3 byte encoded codepoint and must be followed by 2 trail bytes.
C..D:
This is byte 1 of a 2 byte encoded codepoint and must be followed by 1 trail byte.
8..B:
This is a trail byte.
0..7:
This is a single byte encoded codepoint.
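
The rules above can be sketched in a few lines of Python. This is an illustration only: the function name `looks_like_utf8` is hypothetical, and the sketch treats a sequence cut off at the end of the data as a failure, a choice the rules themselves leave open.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Apply the quick and dirty rules, looking only at each byte's high hex digit."""
    trail_bytes_needed = 0
    for byte in data:
        high = byte >> 4
        if trail_bytes_needed:
            if not 0x8 <= high <= 0xB:
                return False           # expected a trail byte
            trail_bytes_needed -= 1
        elif high == 0xF:
            trail_bytes_needed = 3     # byte 1 of a 4 byte sequence
        elif high == 0xE:
            trail_bytes_needed = 2     # byte 1 of a 3 byte sequence
        elif high in (0xC, 0xD):
            trail_bytes_needed = 1     # byte 1 of a 2 byte sequence
        elif high >= 0x8:
            return False               # trail byte with no lead byte
        # 0..7 is a single byte codepoint: nothing to track
    return trail_bytes_needed == 0

print(looks_like_utf8("héllo".encode("utf-8")))    # True
print(looks_like_utf8("été".encode("latin-1")))    # False
print(looks_like_utf8(b"\xc0\x80"))                # True, although C0 80 is not well-formed
```
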

The simpler rules can produce false positives in some cases: that is, they'll say a string is UTF-8 when in fact it might not be. But they won't produce false negatives. The following table from the Unicode spec describes well-formed UTF-8 precisely.

Code Points          1st Byte  2nd Byte  3rd Byte  4th Byte
U+0000..U+007F       00..7F
U+0080..U+07FF       C2..DF    80..BF
U+0800..U+0FFF       E0        A0..BF    80..BF
U+1000..U+CFFF       E1..EC    80..BF    80..BF
U+D000..U+D7FF       ED        80..9F    80..BF
U+E000..U+FFFF       EE..EF    80..BF    80..BF
U+10000..U+3FFFF     F0        90..BF    80..BF    80..BF
U+40000..U+FFFFF     F1..F3    80..BF    80..BF    80..BF
U+100000..U+10FFFF   F4        80..8F    80..BF    80..BF
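
For comparison with the quick test, the table can be turned directly into a strict check. The following Python sketch transcribes each row as a tuple of allowed byte ranges. The names `WELL_FORMED_ROWS` and `is_well_formed_utf8` are hypothetical.

```python
# Each row of the table: one (low, high) range per byte of the sequence.
WELL_FORMED_ROWS = [
    ((0x00, 0x7F),),
    ((0xC2, 0xDF), (0x80, 0xBF)),
    ((0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)),
    ((0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)),
    ((0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF0, 0xF0), (0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF1, 0xF3), (0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF4, 0xF4), (0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)),
]

def is_well_formed_utf8(data: bytes) -> bool:
    """Return True only if data is a concatenation of sequences from the table."""
    pos = 0
    while pos < len(data):
        for row in WELL_FORMED_ROWS:
            chunk = data[pos:pos + len(row)]
            if len(chunk) == len(row) and all(
                low <= byte <= high for byte, (low, high) in zip(chunk, row)
            ):
                pos += len(row)
                break
        else:
            return False  # no row matches the bytes at this position
    return True

print(is_well_formed_utf8("héllo".encode("utf-8")))  # True
print(is_well_formed_utf8(b"\xc0\x80"))              # False: C0 is never a valid first byte
print(is_well_formed_utf8(b"\xed\xa0\x80"))          # False: would encode a surrogate
```

The sequence C0 80 passes the quick test but fails here, which is the kind of false positive the simpler rules allow.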

Tags: test technical unicode boring charset utf8 encoding
Creative Commons License. Some rights reserved.