format page 4 - Dave's Blog

Search
My timeline on Mastodon

The Future of Data Tags: Bokodes | Brain Pickings

2009 Aug 5, 7:57"Ten times smaller than barcodes, Bokodes’ low-cost optical design can be read from as far as 4 meters away, much farther than barcodes, by taking an out-of-focus photo with any off-the-shelf camera." Love for stuff like this to catch on, however compared to QR codes, these are much more difficult to produce than barcodes in that you can't just print them out and they require changes to the photography technique (must be out of focus) rather than just analyzing any photograph of a barcode. They seem to be solving slightly different problems.
PermalinkCommentsqrcode qr barcode camera information design bokode augmented-reality technical

Coding Horror: The Paper Data Storage Option

2009 Aug 3, 11:06"But how efficient is the alphabet at encoding information on a page?"PermalinkCommentsvia:ericlaw humor paper storage encoding

How Different Groups Spend Their Day - Interactive Graphic - NYTimes.com

2009 Aug 3, 8:30"The American Time Use Survey asks thousands of American residents to recall every minute of a day." I enjoy the graph animation when switching between different groups.PermalinkCommentsvia:swannman graph visualization information graphic nytimes infographics demographics statistics

very small array

2009 Jul 12, 3:16Blog of various entertaining graphs and visualizations. Lovely site design too.PermalinkCommentshumor blog art visualization graph statistics information chart design

Software Sleuthing : Date/Time Formats and Conversions

2009 Jul 7, 6:02More on converting between different date/time formats and the effective range of the formats.PermalinkCommentsdatetime time date programming c++ c technical

PowerShell Scanning Script

2009 Jun 27, 3:42

I've hooked up the printer/scanner to the Media Center PC since I leave that on all the time anyway so we can have a networked printer. I wanted to hook up the scanner in a somewhat similar fashion but I didn't want to install HP's software (other than the drivers of course). So I've written my own script for scanning in PowerShell that does the following:

  1. Scans using the Windows Image Acquisition APIs via COM
  2. Runs OCR on the image using Microsoft Office Document Imaging via COM (which may already be on your PC if you have Office installed)
  3. Converts the image to JPEG using .NET Image APIs
  4. Stores the OCR text into the EXIF comment field using .NET Image APIs (which means Windows Search can index the image by the text in the image)
  5. Moves the image to the public share

Here's the actual code from my scan.ps1 file:

param([Switch] $ShowProgress, [switch] $OpenCompletedResult)

$filePathTemplate = "C:\users\public\pictures\scanned\scan {0} {1}.{2}";
$time = get-date -uformat "%Y-%m-%d";

[void]([reflection.assembly]::loadfile( "C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Drawing.dll"))

$deviceManager = new-object -ComObject WIA.DeviceManager
$device = $deviceManager.DeviceInfos.Item(1).Connect();

foreach ($item in $device.Items) {
        $fileIdx = 0;
        while (test-path ($filePathTemplate -f $time,$fileIdx,"*")) {
                [void](++$fileIdx);
        }

        if ($ShowProgress) { "Scanning..." }

        $image = $item.Transfer();
        $fileName = ($filePathTemplate -f $time,$fileIdx,$image.FileExtension);
        $image.SaveFile($fileName);
        clear-variable image

        if ($ShowProgress) { "Running OCR..." }

        $modiDocument = new-object -comobject modi.document;
        $modiDocument.Create($fileName);
        $modiDocument.OCR();
        if ($modiDocument.Images.Count -gt 0) {
                $ocrText = $modiDocument.Images.Item(0).Layout.Text.ToString().Trim();
                $modiDocument.Close();
                clear-variable modiDocument

                if (!($ocrText.Equals(""))) {
                        $fileAsImage = New-Object -TypeName system.drawing.bitmap -ArgumentList $fileName
                        if (!($fileName.EndsWith(".jpg") -or $fileName.EndsWith(".jpeg"))) {
                                if ($ShowProgress) { "Converting to JPEG..." }

                                $newFileName = ($filePathTemplate -f $time,$fileIdx,"jpg");
                                $fileAsImage.Save($newFileName, [System.Drawing.Imaging.ImageFormat]::Jpeg);
                                $fileAsImage.Dispose();
                                del $fileName;

                                $fileAsImage = New-Object -TypeName system.drawing.bitmap -ArgumentList $newFileName 
                                $fileName = $newFileName
                        }

                        if ($ShowProgress) { "Saving OCR Text..." }

                        $property = $fileAsImage.PropertyItems[0];
                        $property.Id = 40092;
                        $property.Type = 1;
                        $property.Value = [system.text.encoding]::Unicode.GetBytes($ocrText);
                        $property.Len = $property.Value.Count;
                        $fileAsImage.SetPropertyItem($property);
                        $fileAsImage.Save(($fileName + ".new"));
                        $fileAsImage.Dispose();
                        del $fileName;
                        ren ($fileName + ".new") $fileName
                }
        }
        else {
                $modiDocument.Close();
                clear-variable modiDocument
        }

        if ($ShowProgress) { "Done." }

        if ($OpenCompletedResult) {
                . $fileName;
        }
        else {
                $result = dir $fileName;
                $result | add-member -membertype noteproperty -name OCRText -value $ocrText
                $result
        }
}

I ran into a few issues:

PermalinkCommentstechnical scanner ocr .net modi powershell office wia

OpenSearchDescriptionToHTML Tool

2009 Jun 10, 3:36

I've made an OpenSearchDescriptionToHTML XSLT that given an OpenSearch description file produces HTML that describes that file, lets you install it, or search with it. For example, here's a Google OpenSearch description that uses my OpenSearchDescriptionToHTML XSLT.

I had just created an OpenSearch description for WolframAlpha at work and was going about the process of adding another install link to my search provider page so that I could install it. Thinking about it, I realized I could apply an XSLT to the OpenSearch description XML to produce the HTML automatically so I wouldn't have to modify additional documents everytime I create and want to install a new OpenSearch description. While I was in there writing the XSLT I figure why not let the user try out searching with the OpenSearch description file too. And lastly I made the XSLT apply to itself to produce HTML describing its own usage.

Incidentally, I added WolframAlpha at work to replace my FileInfo search provider for the purposes of searching for information about particular Unicode characters. For instance, look at WolframAlpha's lovely output for this search for "Bopomofo zh".

PermalinkCommentstechnical xml wolframalpha opensearchdescriptiontohtml xslt opensearch

Data.gov: Unlocking the Federal Filing Cabinets - Bits Blog - NYTimes.com

2009 May 26, 11:28"But Data.gov is different. It is primarily for machines, not people, at least as a first step. It is a catalog of various sets of data from government agencies. And the idea is to offer the data in one of several standardized formats, ranging from a simple text file that can be read by a spreadsheet program to the XML format widely used these days for the exchange of information between Web services. Other data is presented in formats that are meant to feed into mapping programs."PermalinkCommentsdata nytimes xml government

Caught with Fake Info for Albertson Grocery Card

2009 May 25, 3:02

QFC grocery card barcodeChecking out at a grocery store to which I rarely go, the cashier asks me if I want an Albertson's card. I respond sure and she hands me the form on which I give up my personal information. I ask if I need to fill this out now, and she says yeah and it will only take two minutes, which surprised me because at QFC they just hand me a new card and send me on my way. I fill in my phone number as the first ten digits of pi so I don't have to worry about getting phone calls but its something I can remember next time I'm there and don't bring the card.

I turn to leave and the cashier asks me is that a '759' or '159' in my phone number. I stop for a second because I only know the digits as a sequence from the start and pause long enough reciting it in my head that its clear its not my phone number. And she calls me out on it: "Is that your real phone number?" I sigh, "No, does it have to be? Are you going to call me?" "Yeah," she says, "I'll call you." (ha ha) "Well I'll try entering this number," she says doubting the computer will accept the fake phone number. "On the number's already registered," she says, "So you already had a card." "No," says the manager who had walked up during for this exchange, "It means someone else used that same number." So the moral of the story is, try your fake phone number before trying to use it to get a new card.

PermalinkCommentspersonal2 pi albertsons

The Sheep Market

2009 May 13, 10:35In my first linear algebra book they had examples of linear tranformations applied to an image of a cartoon sheep. The fist example was a shear mapping.PermalinkCommentssheep humor amazon mechanicalturk via:swannman

Architectural Styles and the Design of Network-based Software Architectures

2009 May 3, 10:03"Architectural Styles and the Design of Network-based Software Architectures - DISSERTATION submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in Information and Computer Science by Roy Thomas Fielding 2000"PermalinkCommentshttp rest paper web architecture development api webservices roy-fielding

stamen design | big ideas worth pursuing

2009 Apr 23, 4:46Some lovely data visualizations. Is their Crimespotting visualization supposed to look like the map interface from GTA3SA? "Since 2001, Stamen has developed a reputation for beautiful and technologically sophisticated projects in a diverse range of commercial and cultural settings."PermalinkCommentsblog web art visualization information interactive interface portfolio mashup

Flickr Visual Search in IE8

2009 Apr 10, 9:48

A while ago I promised to say how an xsltproc Meddler script would be useful and the general answer is its useful for hooking up a client application that wants data from the web in a particular XML format and the data is available on the web but in another XML format. The specific case for this post is a Flickr Search service that includes IE8 Visual Search Suggestions. IE8 wants the Visual Search Suggestions XML format and Flickr gives out search data in their Flickr web API XML format.

So I wrote an XSLT to convert from Flickr Search XML to Visual Suggestions XML and used my xsltproc Meddler script to actually apply this xslt.

After getting this all working I've placed the result in two places: (1) I've updated the xsltproc Meddler script to include this XSLT and an XML file to install it as a search provider - although you'll need to edit the XML to include your own Flickr API key. (2) I've created a service for this so you can just install the Flickr search provider if you're interested in having the functionality and don't care about the implementation. Additionally, to the search provider I've added accelerator preview support to show the Flickr slideshow which I think looks snazzy.

Doing a quick search for this it looks like there's at least one other such implementation, but mine has the distinction of being done through XSLT which I provide, updated XML namespaces to work with the released version of IE8, and I made it so you know its good.

PermalinkCommentsmeddler xml ie8 xslt flickr technical boring search suggestions

The Self-Describing Web

2009 Apr 7, 1:13A sort of vertical cross section of an overview of what the web should look like from HTTP & URIs to GRDDL & RDF. Oh, and there's a pretty graph at the bottom. "This finding describes how document formats, markup conventions, attribute values, and other data formats can be designed to facilitate the deployment of self-describing, Web-grounded Web content."PermalinkCommentsweb w3c xml html http semanticweb microformats xhtml atom grddl rdfa rdf

s i x t h s e n s e - a wearable gestural interface (MIT Media Lab)

2009 Apr 3, 11:40"'SixthSense' is a wearable gestural interface that augments the physical world around us with digital information and lets us use natural hand gestures to interact with that information." The page is a lot easier to read with styling turned off. Actually, skip the text just watch the TED video.PermalinkCommentsvisualization design research mit hci mobile interactive ted

Information Decoration

2009 Mar 25, 4:03"...we can look at the contemporary screen virus as a transitional phase - a growing pain, if you will, of the information age. Tiling our environment with screens is an extremely literal, and on top of that rather unimaginative, way to introduce virtuality into the physical world: simply piling it on where seamless integration was what was wanted."PermalinkCommentsvia:infosthetics visualization information architecture culture design art

GraphML Primer

2009 Mar 23, 6:19"GraphML Primer is a non-normative document intended to provide an easily readable description of the GraphML facilities, and is oriented towards quickly understanding how to create GraphML documents."PermalinkCommentsgraphml graph format xml howto reference w3c

FormToAccelerator Internet Explorer Extension

2009 Mar 12, 2:17

I've made an extension for Internet Explorer 8, FormToAccelerator which turns HTML forms on a web page into either an accelerator or a search provider. In the design of the accelerators format we intentionally had HTML forms in mind so that it would be easy to create accelerators for existing web services. Consequently, creating an accelerator from an HTML form is a natural concept and an extension I've been meaning to finish for many months now.

This is similar in concept to the Opera feature that lets you add a form as a search provider. The user experience is very rough and requires some knowledge of accelerator variables. If I can come up with a better interaction model I may update this in the future, but at the moment all the designs I can come up with require way too much effort. Install IE8 RC1 and then try out FormToAccelerator.

PermalinkCommentsactivity html accelerator ie8 internet-explorer activities formtoaccelerator extension

XSPF: XML Shareable Playlist Format: Home

2009 Mar 10, 11:31"XSPF is the XML format for sharing playlists." Supported by the Yahoo! Media Player.PermalinkCommentsreference internet xml playlist music opensource

Subst Allows Non-Letter Drive Letters

2009 Mar 4, 2:39

I knew that the command line tool subst would create virtual drives that map to existing directories but I didn't know that subst lets you name the virtual drives with characters that aren't US-ASCII letters. For instance you can run 'subst 4: C:\windows' and then 'more 4:\win.ini' to dump C:\windows\win.ini. This also works for non-US-ASCII characters like, "C" (aka U+FF23, Fullwidth Latin Capital Letter C), which when displayed by cmd.exe via some best fit style character conversions looks just like the regular US-ASCII 'C'. None of Explorer, IE, or the common file dialogs allow the use of these odd virtual drives -- just cmd.exe, so I'm not sure how this would ever be useful but I thought it was odd and I wanted to share.

PermalinkCommentscli technical boring subst windows
Older EntriesNewer Entries Creative Commons License Some rights reserved.