text page 5 - Dave's Blog

Search
My timeline on Mastodon

Eat Pants - Interactive Fiction Sessions from my Server Logs

2009 Jun 29, 4:19

I've looked at my web server logs previously to see if anyone had used my Web Frotz Interpreter and until recently didn't realize that awstats (the web server log report generator) was truncating the query from my URL, so I couldn't tell that anyone was actually using it. But after grepping the logs manually I've pulled out the URLs of visitor's text adventure sessions. If you'll recall, my Web Frotz Interpreter stores the game state in the URL so its easy to see user's game states in the web server logs.

I've put some of the links up on the Web Frotz Interpreter page. Some of the interesting ones:

PermalinkCommentsserver-logs technical zork frotz pants interactive-fiction uri if

PowerShell Scanning Script

2009 Jun 27, 3:42

I've hooked up the printer/scanner to the Media Center PC since I leave that on all the time anyway so we can have a networked printer. I wanted to hook up the scanner in a somewhat similar fashion but I didn't want to install HP's software (other than the drivers of course). So I've written my own script for scanning in PowerShell that does the following:

  1. Scans using the Windows Image Acquisition APIs via COM
  2. Runs OCR on the image using Microsoft Office Document Imaging via COM (which may already be on your PC if you have Office installed)
  3. Converts the image to JPEG using .NET Image APIs
  4. Stores the OCR text into the EXIF comment field using .NET Image APIs (which means Windows Search can index the image by the text in the image)
  5. Moves the image to the public share

Here's the actual code from my scan.ps1 file:

param([Switch] $ShowProgress, [switch] $OpenCompletedResult)

$filePathTemplate = "C:\users\public\pictures\scanned\scan {0} {1}.{2}";
$time = get-date -uformat "%Y-%m-%d";

[void]([reflection.assembly]::loadfile( "C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Drawing.dll"))

$deviceManager = new-object -ComObject WIA.DeviceManager
$device = $deviceManager.DeviceInfos.Item(1).Connect();

foreach ($item in $device.Items) {
        $fileIdx = 0;
        while (test-path ($filePathTemplate -f $time,$fileIdx,"*")) {
                [void](++$fileIdx);
        }

        if ($ShowProgress) { "Scanning..." }

        $image = $item.Transfer();
        $fileName = ($filePathTemplate -f $time,$fileIdx,$image.FileExtension);
        $image.SaveFile($fileName);
        clear-variable image

        if ($ShowProgress) { "Running OCR..." }

        $modiDocument = new-object -comobject modi.document;
        $modiDocument.Create($fileName);
        $modiDocument.OCR();
        if ($modiDocument.Images.Count -gt 0) {
                $ocrText = $modiDocument.Images.Item(0).Layout.Text.ToString().Trim();
                $modiDocument.Close();
                clear-variable modiDocument

                if (!($ocrText.Equals(""))) {
                        $fileAsImage = New-Object -TypeName system.drawing.bitmap -ArgumentList $fileName
                        if (!($fileName.EndsWith(".jpg") -or $fileName.EndsWith(".jpeg"))) {
                                if ($ShowProgress) { "Converting to JPEG..." }

                                $newFileName = ($filePathTemplate -f $time,$fileIdx,"jpg");
                                $fileAsImage.Save($newFileName, [System.Drawing.Imaging.ImageFormat]::Jpeg);
                                $fileAsImage.Dispose();
                                del $fileName;

                                $fileAsImage = New-Object -TypeName system.drawing.bitmap -ArgumentList $newFileName 
                                $fileName = $newFileName
                        }

                        if ($ShowProgress) { "Saving OCR Text..." }

                        $property = $fileAsImage.PropertyItems[0];
                        $property.Id = 40092;
                        $property.Type = 1;
                        $property.Value = [system.text.encoding]::Unicode.GetBytes($ocrText);
                        $property.Len = $property.Value.Count;
                        $fileAsImage.SetPropertyItem($property);
                        $fileAsImage.Save(($fileName + ".new"));
                        $fileAsImage.Dispose();
                        del $fileName;
                        ren ($fileName + ".new") $fileName
                }
        }
        else {
                $modiDocument.Close();
                clear-variable modiDocument
        }

        if ($ShowProgress) { "Done." }

        if ($OpenCompletedResult) {
                . $fileName;
        }
        else {
                $result = dir $fileName;
                $result | add-member -membertype noteproperty -name OCRText -value $ocrText
                $result
        }
}

I ran into a few issues:

PermalinkCommentstechnical scanner ocr .net modi powershell office wia

Data.gov: Unlocking the Federal Filing Cabinets - Bits Blog - NYTimes.com

2009 May 26, 11:28"But Data.gov is different. It is primarily for machines, not people, at least as a first step. It is a catalog of various sets of data from government agencies. And the idea is to offer the data in one of several standardized formats, ranging from a simple text file that can be read by a spreadsheet program to the XML format widely used these days for the exchange of information between Web services. Other data is presented in formats that are meant to feed into mapping programs."PermalinkCommentsdata nytimes xml government

Charles Stross - Fiction intro

2009 Apr 6, 10:40Charles Stross personal website that links to the text of a bunch of his short stories.PermalinkCommentsblog scifi fiction literature charles-stross

s i x t h s e n s e - a wearable gestural interface (MIT Media Lab)

2009 Apr 3, 11:40"'SixthSense' is a wearable gestural interface that augments the physical world around us with digital information and lets us use natural hand gestures to interact with that information." The page is a lot easier to read with styling turned off. Actually, skip the text just watch the TED video.PermalinkCommentsvisualization design research mit hci mobile interactive ted

Text ads in open source code will add millions in funding to OSS movement - jorgenmodin.net

2009 Apr 1, 9:26"Open source code will contain text ads in comments and variable names, and Google and other text ad companies will scan the open source code repositories and fund the projects based on what they find in the code"PermalinkCommentshumor opensource advertising code

Outline View Internet Explorer Extension

2009 Mar 23, 8:13

I've made another extension for IE8, Outline View, which gives you a side bar in IE that displays an outline of the current page and lets you make intrapage bookmarks.

The outline is generated based on the heading tags in the document (e.g. h1, h2, etc), kind of like what W3C's Semantic data extractor tool displays for an outline. So if the page doesn't use heading tags the way the HTML spec intended or just sticks img tags in them, then the outline doesn't look so hot. On a page that does use headings as intended though it looks really good. For instance a section from the HTML 4 spec shows up quite nicely and I find its actually useful to be able to jump around to the different sections. Actually, I've been surprised going to various blogs how well the outline view is actually working -- I thought a lot more webdevs would be abusing their heading tags.

I've also added intrapage bookmarks. When you make a text selection and clear it, that selected text is added as a temporary intrapage bookmark which shows up in the correct place in the outline. You can navigate to the bookmark or right click to make it permanent. Right now I'm storing the permanent intrapage bookmarks in IE8's new per-domain DOM storage because I wanted to avoid writing code to synchronize a cross process store of bookmarks, it allowed me to play with the DOM storage a bit, and the bookmarks will get cleared appropriately when the user clears their history via the control panel.

PermalinkCommentstechnical intrapage bookmark boring html ie8 ie extension

Notes on Creating Internet Explorer Extensions in C++ and COM

2009 Mar 20, 4:51

Working on Internet Explorer extensions in C++ & COM, I had to relearn or rediscover how to do several totally basic and important things. To save myself and possibly others trouble in the future, here's some pertinent links and tips.

First you must choose your IE extensibility point. Here's a very short list of the few I've used:

Once you've created your COM object that implements IObjectWithSite and whatever other interfaces your extensibility point requires as described in the above links you'll see your SetSite method get called by IE. You might want to know how to get the top level browser object from the IUnknown site object passed in via that method.

After that you may also want to listen for some events from the browser. To do this you'll need to:

  1. Implement the dispinterface that has the event you want. For instance DWebBrowserEvents2, or HTMLDocumentEvents, or HTMLWindowEvents2. You'll have to search around in that area of the documentation to find the event you're looking for.
  2. Register for events using AtlAdvise. The object you need to subscribe to depends on the events you want. For example, DWebBrowserEvents2 come from the webbrowser object, HTMLDocumentEvents come from the document object assuming its an HTML document (I obtained via get_Document method on the webbrowser), and HTMLWindowEvents2 come from the window object (which oddly I obtained via calling the get_script method on the document object). Note that depending on when your SetSite method is called the document may not exist yet. For my extension I signed up for browser events immediately and then listened for events like NavigateComplete before signing up for document and window events.
  3. Implement IDispatch. The Invoke method will get called with event notifications from the dispinterfaces you sign up for in AtlAdvise. Implementing Invoke manually is a slight pain as all the parameters come in as VARIANTs and are in reverse order. There's some ATL macros that may make this easier but I didn't bother.
  4. Call AtlUnadvise at some point -- at the latest when SetSite is called again and your site object changes.

If you want to check if an IHTMLElement is not visible on screen due how the page is scrolled, try comparing the Body or Document Element's client height and width, which appears to be the dimensions of the visible document area, to the element's bounding client rect which appears to be its position relative to the upper left corner of the visible document area. I've found this to be working for me so far, but I'm not positive that frames, iframes, zooming, editable document areas, etc won't mess this up.

Be sure to use pointers you get from the IWebBrowser/IHTMLDocument/etc. only on the thread on which you obtained the pointer or correctly marshal the pointers to other threads to avoid weird crashes and hangs.

Obtaining the HTML document of a subframe is slightly more complicated then you might hope. On the other hand this might be resolved by the new to IE8 method IHTMLFrameElement3::get_contentDocument

Check out Eric's IE blog post on IE extensibility which has some great links on this topic as well.

PermalinkCommentstechnical boring internet explorer com c++ ihtmlelement extension

LDC Catalog - Web 1T 5-gram Version 1

2009 Mar 16, 4:22"This data set, contributed by Google Inc., contains English word n-grams and their observed frequency counts. The length of the n-grams ranges from unigrams (single words) to five-grams. We expect this data will be useful for statistical language modeling, e.g., for machine translation or speech recognition, as well as for other uses." 6 DVDs for only $150 with licensing restri... ok nm.PermalinkCommentslanguage google statistics database text

The 'Is It UTF-8?' Quick and Dirty Test

2009 Mar 6, 5:16

I've found while debugging networking in IE its often useful to quickly tell if a string is encoded in UTF-8. You can check for the Byte Order Mark (EF BB BF in UTF-8) but, I rarely see the BOM on UTF-8 strings. Instead I apply a quick and dirty UTF-8 test that takes advantage of the well-formed UTF-8 restrictions.

Unlike other multibyte character encoding forms (see Windows supported character sets or IANA's list of character sets), for example Big5, where sticking together any two bytes is more likely than not to give a valid byte sequence, UTF-8 is more restrictive. And unlike other multibyte character encodings, UTF-8 bytes may be taken out of context and one can still know that its a single byte character, the starting byte of a three byte sequence, etc.

The full rules for well-formed UTF-8 are a little too complicated for me to commit to memory. Instead I've got my own simpler (this is the quick part) set of rules that will be mostly correct (this is the dirty part). For as many bytes in the string as you care to examine, check the most significant digit of the byte:

F:
This is byte 1 of a 4 byte encoded codepoint and must be followed by 3 trail bytes.
E:
This is byte 1 of a 3 byte encoded codepoint and must be followed by 2 trail bytes.
C..D:
This is byte 1 of a 2 byte encoded codepoint and must be followed by 1 trail byte.
8..B:
This is a trail byte.
0..7:
This is a single byte encoded codepoint.
The simpler rules can produce false positives in some cases: that is, they'll say a string is UTF-8 when in fact it might not be. But it won't produce false negatives. The following is table from the Unicode spec. that actually describes well-formed UTF-8.
Code Points 1st Byte 2nd Byte 3rd Byte 4th Byte
U+0000..U+007F 00..7F
U+0080..U+07FF C2..DF 80..BF
U+0800..U+0FFF E0 A0..BF 80..BF
U+1000..U+CFFF E1..EC 80..BF 80..BF
U+D000..U+D7FF ED 80..9F 80..BF
U+E000..U+FFFF EE..EF 80..BF 80..BF
U+10000..U+3FFFF F0 90..BF 80..BF 80..BF
U+40000..U+FFFFF F1..F3 80..BF 80..BF 80..BF
U+100000..U+10FFFF F4 80..8F 80..BF 80..BF

PermalinkCommentstest technical unicode boring charset utf8 encoding

MyFonts Blog - Blog Archive - Introducing WhatTheFont for iPhone!

2009 Feb 11, 10:05"With the iPhone version of WhatTheFont you can use the phone's built-in camera to photograph the text in question (or choose an existing image from your photo albums)... After confirming which characters are used in the image, the app provides a list of possible matching fonts."PermalinkCommentsfont iphone camera typography

kottke.org - home of fine hypertext products

2009 Jan 20, 6:16Good linksPermalinkCommentshumor blog internet web technology geek culture design daily

Post to Twitter using the command line - Download Squad

2009 Jan 15, 10:28Thanks to Matt, for the first time I can see myself using Twitter. Twitter app on my phone notifies me when something's posted so my build process can let me know when its done, or when sync finally finishes, etc. I'd been meaning to setup a mini-notification system with a command line tool to my phone (w/o paying per text msg) but I didn't think of Twitter.PermalinkCommentsvia:swannman api internet curl cli twitter

Text/Plain Fragment Bookmarklet

2008 Nov 19, 12:58

The text/plain fragment documented in RFC 5147 and described on Erik Wilde's blog struck my interest and, like the XML fragment, I wanted to see if I could implement this in IE. In this case there's no XSLT for me to edit so, like my plain/text word wrap bookmarklet I've implemented it as a bookmarklet. This is only a partial implementation as it doesn't implement the integrity checks.

Check out my text/plain fragment bookmarklet.

PermalinkCommentstext url boring bookmarklet uri plain-text javascript fragment

Word Wrapping IE's Plain Text

2008 Oct 28, 11:23

If you view a plain text document in Internet Explorer 8, for instance the plain text version of Cory Doctorow's book Little Brother and press F12 to bring up the developer toolbar, you can see that IE simply takes the plain text, sticks it inside a

 tag, and renders it.  This means that word wrapping isn't supplied and the only line breaks that appear are those in the document.  However, since the text document is converted to HTML it means I can implement word wrap myself using a bookmarklet:
javascript:function ww() { var preTag = document.getElementsByTagName('pre')[0]; preTag.style.fontFamily="arial"; preTag.style.wordWrap='break-word'; }; ww();
After adding a favorite and setting the favorite's URL to the previous, I can view plain text documents, and select my Word Wrap favorite to apply word wrap and non-fixed width font.
PermalinkCommentsbrowser technical ie wordwrap

List of language inventors

2008 Oct 10, 1:43A blog comment included the phrase 'hard-core conlangers' which at first glance sounds dirty, then based on the context I thought it was made up, but of course Wikipedia has the actual answer: "A conlanger ... is person who invents conlangs (constructed languages)."PermalinkCommentslanguage klingon nerd wikipedia conlang

Cadbury Bunny Sneaks Mint

2008 Oct 7, 2:49
Cadbury the bunny takes a moment from hiding under the chair to eat some mint. She comes out just to grab some mint and then goes back under the chair repeatedly for two minutes.
From: David Risney
Views: 328
1 ratings
Time: 02:01 More in Pets & Animals
PermalinkCommentsvideo

Disemvowelment and Reemvowelment Tools

2008 Oct 3, 5:29I thought the disemvowelment of trolls was a pretty funny punishment -- much better than simply removing the comment: "Disemvowelment is - obviously enough - the act of removing the vowels from a passage of text, as well as a pun on the word 'disembowelling'. A number of blogs and websites do this to offensive text which has been placed in their 'comments' section. ... This site exists because I couldn't resists the challenge of trying to re-emvowel disemvowelled text. This is a challenging task, as the disemvowelled word 'dg' may well have been 'dog', but also 'dig', 'dug', 'doge', diego' and so on. I have a first cut of this functionality at the re-emvowel link at the side of the page. A more advanced version is in progress."PermalinkCommentstool disemvowelment web comment forum troll language

Yahoo! Search Blog: Yahoo! Chats with Semantic Web Expert, Ben Adida

2008 Sep 16, 3:57Interview with Ben Adida on RDFa: "...RDFa is ready. It has just been approved by the W3C as a Candidate Recommendation, with the specific text of the specification and a brand new Primer published on June 20th. Y!: What can I do with RDFa? BA: You can tell the world what various components on your web page mean by marking up things like: The title of a photo Your name and contact information The license under which you're distributing your latest MP3 The ingredients of a cooking recipe The price of an item A gene on which you recently wrote a paper ... Anything that you want to make more machine-readable"PermalinkCommentsrdf microformats yahoo semantic interview ben-adida semanticweb via:felix42

Cadbury Hits Fireplace Shovel

2008 Sep 1, 4:45
Cadbury jumps out of her toy box hitting the fireplace shovel, and does a few other cute things.
From: David Risney
Views: 76
1 ratings
Time: 00:44 More in Pets & Animals
PermalinkCommentsvideo
Older EntriesNewer Entries Creative Commons License Some rights reserved.