2009 Oct 6, 3:24The map/reduce tutorial for Hadoop the Apache open source project. "Hadoop Map/Reduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte
data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner."
hadoop mapreduce java software programming opensource database distributed google yahoo apache technical todo 2009 Aug 31, 4:53From Ira as part of The Balloon Project "... took the lo-fi diy map making essentials (portable helium tank, party balloons, and a disposable video camera) to Paris, France, where they launched a
video camera into the sky not knowing where it would go, and created some very unique aerial cartography of the Place de la Concorde.' I'd love to see this run through photo stitching software like
Photosynth and then layered on Google Maps.
map balloon art ira-mowen france paris 2009 Jul 23, 2:59"hand-typed from original scans by the Virtual AGS project; in the comments, numero mysterioso and hope hope hope"
humor code space programming via:waxy technical 2009 Jul 7, 6:02More on converting between different date/time formats and the effective range of the formats.
datetime time date programming c++ c technical 2009 Jul 1, 2:24Stats on HTTP servers and HTTP server response headers. "Current statistics are based on a sample of 84604 probed servers, gathered in the last 386 days."
http statistics server internet http-header via:mnot technical 2009 Jun 27, 3:42
I've hooked up the printer/scanner to the Media Center PC since I leave that on all the time anyway so we can have a networked printer. I wanted to hook up the scanner in a somewhat similar fashion
but I didn't want to install HP's software (other than the drivers of course). So I've written my own script for scanning in PowerShell that does the following:
- Scans using the Windows Image Acquisition APIs via COM
- Runs OCR on the image using Microsoft Office Document Imaging via COM (which may already be on your PC if you have Office installed)
- Converts the image to JPEG using .NET Image APIs
- Stores the OCR text into the EXIF comment field using
.NET Image APIs (which means Windows Search can index the image by the text in the image)
- Moves the image to the public share
Here's the actual code from my scan.ps1 file:
param([Switch] $ShowProgress, [switch] $OpenCompletedResult)
$filePathTemplate = "C:\users\public\pictures\scanned\scan {0} {1}.{2}";
$time = get-date -uformat "%Y-%m-%d";
[void]([reflection.assembly]::loadfile( "C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Drawing.dll"))
$deviceManager = new-object -ComObject WIA.DeviceManager
$device = $deviceManager.DeviceInfos.Item(1).Connect();
foreach ($item in $device.Items) {
$fileIdx = 0;
while (test-path ($filePathTemplate -f $time,$fileIdx,"*")) {
[void](++$fileIdx);
}
if ($ShowProgress) { "Scanning..." }
$image = $item.Transfer();
$fileName = ($filePathTemplate -f $time,$fileIdx,$image.FileExtension);
$image.SaveFile($fileName);
clear-variable image
if ($ShowProgress) { "Running OCR..." }
$modiDocument = new-object -comobject modi.document;
$modiDocument.Create($fileName);
$modiDocument.OCR();
if ($modiDocument.Images.Count -gt 0) {
$ocrText = $modiDocument.Images.Item(0).Layout.Text.ToString().Trim();
$modiDocument.Close();
clear-variable modiDocument
if (!($ocrText.Equals(""))) {
$fileAsImage = New-Object -TypeName system.drawing.bitmap -ArgumentList $fileName
if (!($fileName.EndsWith(".jpg") -or $fileName.EndsWith(".jpeg"))) {
if ($ShowProgress) { "Converting to JPEG..." }
$newFileName = ($filePathTemplate -f $time,$fileIdx,"jpg");
$fileAsImage.Save($newFileName, [System.Drawing.Imaging.ImageFormat]::Jpeg);
$fileAsImage.Dispose();
del $fileName;
$fileAsImage = New-Object -TypeName system.drawing.bitmap -ArgumentList $newFileName
$fileName = $newFileName
}
if ($ShowProgress) { "Saving OCR Text..." }
$property = $fileAsImage.PropertyItems[0];
$property.Id = 40092;
$property.Type = 1;
$property.Value = [system.text.encoding]::Unicode.GetBytes($ocrText);
$property.Len = $property.Value.Count;
$fileAsImage.SetPropertyItem($property);
$fileAsImage.Save(($fileName + ".new"));
$fileAsImage.Dispose();
del $fileName;
ren ($fileName + ".new") $fileName
}
}
else {
$modiDocument.Close();
clear-variable modiDocument
}
if ($ShowProgress) { "Done." }
if ($OpenCompletedResult) {
. $fileName;
}
else {
$result = dir $fileName;
$result | add-member -membertype noteproperty -name OCRText -value $ocrText
$result
}
}
I ran into a few issues:
- MODI doesn't seem to be in the Office 2010 Technical Preview I installed first. Installing Office 2007 fixed that.
- The MODI.Document class, at least via PowerShell, can't be instantiated in a 64bit environment. To run the script on my 64bit OS I had to start powershell from the 32bit cmd.exe
(C:\windows\syswow64\cmd.exe).
- I was planning to hook up my script to the scanner's 'Scan' button, but
HP didn't get the button working for their Vista driver. Their workaround is "don't do that!".
- You must call Image.Dispose() to get .NET to release its reference to the corresponding image file.
- In trying to figure out how to store the text in the files comment, I ran into a dead-end trying to find the corresponding setter for GetDetailsOf which folks like James O'Neil use in PowerShell for interesting ends.
technical scanner ocr .net modi powershell office wia 2009 Jun 20, 9:39If you have Office installed you may have an OCR library sitting on your hard drive just waiting to be used via C#...
ocr microsoft office .net automation scanner camera windows technical 2009 Jun 12, 12:20"We have discovered remotely-exploitable vulnerabilities in Green Dam, the censorship software reportedly mandated by the Chinese government. Any web site a Green Dam user visits can take control of
the PC. According to press reports, China will soon require all PCs sold in the country to include Green Dam. This software monitors web sites visited and other activity on the computer and blocks
adult content as well as politically sensitive material."
censorship china hack security internet greendam 2009 Jun 2, 12:51"The Artvertiser is an urban, hand-held, augmented-reality project exploring on-site substitution of advertising content for the purposes of exhibiting art." There's some videos on the site of their
prototype software. I've got a similar idea I want to try with my G1.
video art design advertising aug augmented-reality 2009 May 27, 3:01"The Microsoft Connect service is a web-platform for communication between Microsoft Software Engineers and their developer community... Unfortunately the sign-up and feature request process is a
little confusing, and long-winded, so I have put together a guide to help people get to the right place."
microsoft internet ie ie8 ie9 html5 canvas humor reference screenshot 2009 May 3, 10:03"Architectural Styles and the Design of Network-based Software Architectures - DISSERTATION submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in Information
and Computer Science by Roy Thomas Fielding 2000"
http rest paper web architecture development api webservices roy-fielding 2009 Apr 15, 7:33The emulator behind those cool script based Mario hacks. "FCEUX is a cross platform, NTSC and PAL Famicom/NES emulator that ... gives the best of all worlds for the casual player, the ROM-hacking
community, Lua Scripters, and the Tool-Assisted Speedrun Community."
emulator nintendo videogame software programming game 2009 Mar 20, 6:18
IE8, the software I've been working on for some time now, has finally been released at MIX09.
As I mentioned previously, I worked on
accelerators (previously named
Activities) in IE8. Looking at the
kinds of things I blog about on the IE Blog, you might also
correctly guess that I work on the networking stack. Ask me about what else I worked on during IE8 development. The past few months were very busy for me and I'm happy this is finally out.
technical internet explorer ie8 2009 Feb 24, 9:32Of course Netflix is already available on the 360, but PlayOn lets you watch Hulu on the 360. So far so good with the trial software. "Windows only: Previously mentioned Windows utility PlayOn-which
streams popular online video to your PS3, Xbox 360, and HP MediaSmart TV-has officially left its beta phase in the dust"
hulu video xbox xbox360 mediacenter dvr windows tv 2009 Jan 16, 4:02"European regulators notified Microsoft it believes the software giant is in violation of the region's antitrust laws by bundling its Internet Explorer browser in Windows, the company said Friday."
microsoft news browser opera browser-war ie windows eu 2009 Jan 15, 8:00Cool application that turns your sketches into graphs. I wonder if this can ever come to my phone? "Sketch a rough shape with your finger and Instaviz transforms what you drew in a split second.
Sketch a link between two shapes and Instaviz quickly redraws the graph with the best layout."
graph visualization iphone graphviz phone software application development 2008 Nov 9, 11:25
I've made an XSLT Meddler script in my continued XSLT adventures. Meddler is a
simple and easy web server that runs whatever JScript.NET code you give it. I wrote a script that takes an indicated XSLT on the server, downloads an indicated XML from the Internet and returns the
result of running that XML through the XSLT. This is useful when you want to work with something like the Zune software or IE7's feed platform which only reads feeds over the HTTP protocol. I'll
give more interesting and specific examples of how this could be useful in the future.
meddler technical xml script xslt 2008 Nov 5, 3:55A graphing library which includes variaous graph visualization algorithms. GNU licensed. "igraph is a free software package for creating and manipulating undirected and directed graphs. It includes
implementations for classic graph theory problems like minimum spanning trees and network flow, and also implements algorithms for some recent network analysis methods, like community structure
search."
reference free development programming visualization graph math library opensource c++ igraph graphviz via:mattb 2008 Nov 5, 3:51This site has example implementations for feedsync: "The FeedSync Specification is available under the Creative Commons Attribution-Share Alike License and the Microsoft Open Specification Promise.
Microsoft encourages developers to create independent implementations of the FeedSync specification. See the Developer page for more information on how to write a FeedSync enabled application, and
the Implementations page to see how people are using FeedSync already."
free software development feedsync feed microsoft live windows rss sse 2008 Nov 3, 2:01Software that can produce the design for a key from a photo of a key. "Scenes from one of the proof-of-concept telephoto experiments using a new software program from UC San Diego that can perform
key duplication without having the key. Instead, the computer scientists only need a photograph of the key."
security photo software research paranoia key