PowerShell Scanning Script - Dave's Blog

Search

PowerShell Scanning Script

2009 Jun 27, 3:42

I've hooked up the printer/scanner to the Media Center PC since I leave that on all the time anyway so we can have a networked printer. I wanted to hook up the scanner in a somewhat similar fashion but I didn't want to install HP's software (other than the drivers of course). So I've written my own script for scanning in PowerShell that does the following:

  1. Scans using the Windows Image Acquisition APIs via COM
  2. Runs OCR on the image using Microsoft Office Document Imaging via COM (which may already be on your PC if you have Office installed)
  3. Converts the image to JPEG using .NET Image APIs
  4. Stores the OCR text into the EXIF comment field using .NET Image APIs (which means Windows Search can index the image by the text in the image)
  5. Moves the image to the public share

Here's the actual code from my scan.ps1 file:

param([Switch] $ShowProgress, [switch] $OpenCompletedResult)

$filePathTemplate = "C:\users\public\pictures\scanned\scan {0} {1}.{2}";
$time = get-date -uformat "%Y-%m-%d";

[void]([reflection.assembly]::loadfile( "C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Drawing.dll"))

$deviceManager = new-object -ComObject WIA.DeviceManager
$device = $deviceManager.DeviceInfos.Item(1).Connect();

foreach ($item in $device.Items) {
        $fileIdx = 0;
        while (test-path ($filePathTemplate -f $time,$fileIdx,"*")) {
                [void](++$fileIdx);
        }

        if ($ShowProgress) { "Scanning..." }

        $image = $item.Transfer();
        $fileName = ($filePathTemplate -f $time,$fileIdx,$image.FileExtension);
        $image.SaveFile($fileName);
        clear-variable image

        if ($ShowProgress) { "Running OCR..." }

        $modiDocument = new-object -comobject modi.document;
        $modiDocument.Create($fileName);
        $modiDocument.OCR();
        if ($modiDocument.Images.Count -gt 0) {
                $ocrText = $modiDocument.Images.Item(0).Layout.Text.ToString().Trim();
                $modiDocument.Close();
                clear-variable modiDocument

                if (!($ocrText.Equals(""))) {
                        $fileAsImage = New-Object -TypeName system.drawing.bitmap -ArgumentList $fileName
                        if (!($fileName.EndsWith(".jpg") -or $fileName.EndsWith(".jpeg"))) {
                                if ($ShowProgress) { "Converting to JPEG..." }

                                $newFileName = ($filePathTemplate -f $time,$fileIdx,"jpg");
                                $fileAsImage.Save($newFileName, [System.Drawing.Imaging.ImageFormat]::Jpeg);
                                $fileAsImage.Dispose();
                                del $fileName;

                                $fileAsImage = New-Object -TypeName system.drawing.bitmap -ArgumentList $newFileName 
                                $fileName = $newFileName
                        }

                        if ($ShowProgress) { "Saving OCR Text..." }

                        $property = $fileAsImage.PropertyItems[0];
                        $property.Id = 40092;
                        $property.Type = 1;
                        $property.Value = [system.text.encoding]::Unicode.GetBytes($ocrText);
                        $property.Len = $property.Value.Count;
                        $fileAsImage.SetPropertyItem($property);
                        $fileAsImage.Save(($fileName + ".new"));
                        $fileAsImage.Dispose();
                        del $fileName;
                        ren ($fileName + ".new") $fileName
                }
        }
        else {
                $modiDocument.Close();
                clear-variable modiDocument
        }

        if ($ShowProgress) { "Done." }

        if ($OpenCompletedResult) {
                . $fileName;
        }
        else {
                $result = dir $fileName;
                $result | add-member -membertype noteproperty -name OCRText -value $ocrText
                $result
        }
}

I ran into a few issues:

PermalinkCommentstechnical scanner ocr .net modi powershell office wia
Creative Commons License Some rights reserved.