coding page 3 - Dave's Blog

Search
My timeline on Mastodon

No, you can’t do that with H.264 « Digital Diary of Ben Schwartz

2010 Feb 4, 2:01On the crappy licensing of the H.264 and MPEG codecs in popular video encoding software.PermalinkCommentsvideo encoding codec patent legal law apple microsoft theora h.264 technical

Official Google Blog: Unicode nearing 50% of the web

2010 Jan 28, 4:20Graph of encodings used by documents on the web. Unicode based encodings are thankfully on the rise.PermalinkCommentsunicode encoding web internationalization localization utf8 text html technical

Coding Horror: You're Reading The World's Most Dangerous Programming Blog

2010 Jan 20, 8:28GZip vs Deflate execution speeds. Deflate found to be much faster in particular cases and about the same in the rest.PermalinkCommentsgzip deflate performance technical http compression programming development blog

Code: Flickr Developer Blog » A Chinese puzzle: Unicode and EXIF metadata parsing

2010 Jan 8, 2:08Flickr dev talks image metadata the various forms which to prefer and how to guess at their character encodings.PermalinkCommentsunicode charset flickr photo image exif programming reference xmp technical

Get cached images from your visitors | Diovo

2009 Dec 15, 2:01"Jeff Atwood (Coding Horror fame) was in for a horror when he realized that his server crashed and his data was gone and due to some reason, the backup mechanism was not working. ... So what should Jeff do now? Since Coding horror is a high traffic blog, I think there is a way to get back at least some of the images." Reconstruct the HTML from Google's cache, change the HTTP server to tell the client it has the correct cached image for all the images, add script to the HTML to grab the images and send them back. Awesome idea. Of course now I want to setup Fiddler to swap in random images...PermalinkCommentsvia:ericlaw jeff-atwood backup web http cache image javascript technical

English Shellcode

2009 Nov 27, 6:10"What follows is a brief description of the method we have developed for encoding arbitrary shellcode as English text. This English shellcode is completely self-contained, i.e., it does not require an external loader, and executes as valid IA32 code."PermalinkCommentssecurity polyglot intel paper research programming hack obfuscation english language technical system:filetype:pdf system:media:document

Stroustrup: C++ Style and Technique FAQ

2009 Sep 30, 5:12Bjarne Stroustrup answers the age old style questions like "int *p or int* p?" and "const int a or int const a?"PermalinkCommentsreference c++ faq style coding programming bjarne-stroustrup technical

RFC 1951 - DEFLATE Compressed Data Format Specification version 1.3

2009 Sep 3, 7:17"This specification defines a lossless compressed data format that compresses data using a combination of the LZ77 algorithm and Huffman coding." Also see RFC 1950 zlib, a wrapper compression format that can use deflate, and RFC 1952 gzip, a compressed file format that can use deflate.PermalinkCommentstechnical rfc ietf compression http deflate gzip zlib

Coding Horror: The Paper Data Storage Option

2009 Aug 3, 11:06"But how efficient is the alphabet at encoding information on a page?"PermalinkCommentsvia:ericlaw humor paper storage encoding

PowerShell Scanning Script

2009 Jun 27, 3:42

I've hooked up the printer/scanner to the Media Center PC since I leave that on all the time anyway so we can have a networked printer. I wanted to hook up the scanner in a somewhat similar fashion but I didn't want to install HP's software (other than the drivers of course). So I've written my own script for scanning in PowerShell that does the following:

  1. Scans using the Windows Image Acquisition APIs via COM
  2. Runs OCR on the image using Microsoft Office Document Imaging via COM (which may already be on your PC if you have Office installed)
  3. Converts the image to JPEG using .NET Image APIs
  4. Stores the OCR text into the EXIF comment field using .NET Image APIs (which means Windows Search can index the image by the text in the image)
  5. Moves the image to the public share

Here's the actual code from my scan.ps1 file:

param([Switch] $ShowProgress, [switch] $OpenCompletedResult)

$filePathTemplate = "C:\users\public\pictures\scanned\scan {0} {1}.{2}";
$time = get-date -uformat "%Y-%m-%d";

[void]([reflection.assembly]::loadfile( "C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Drawing.dll"))

$deviceManager = new-object -ComObject WIA.DeviceManager
$device = $deviceManager.DeviceInfos.Item(1).Connect();

foreach ($item in $device.Items) {
        $fileIdx = 0;
        while (test-path ($filePathTemplate -f $time,$fileIdx,"*")) {
                [void](++$fileIdx);
        }

        if ($ShowProgress) { "Scanning..." }

        $image = $item.Transfer();
        $fileName = ($filePathTemplate -f $time,$fileIdx,$image.FileExtension);
        $image.SaveFile($fileName);
        clear-variable image

        if ($ShowProgress) { "Running OCR..." }

        $modiDocument = new-object -comobject modi.document;
        $modiDocument.Create($fileName);
        $modiDocument.OCR();
        if ($modiDocument.Images.Count -gt 0) {
                $ocrText = $modiDocument.Images.Item(0).Layout.Text.ToString().Trim();
                $modiDocument.Close();
                clear-variable modiDocument

                if (!($ocrText.Equals(""))) {
                        $fileAsImage = New-Object -TypeName system.drawing.bitmap -ArgumentList $fileName
                        if (!($fileName.EndsWith(".jpg") -or $fileName.EndsWith(".jpeg"))) {
                                if ($ShowProgress) { "Converting to JPEG..." }

                                $newFileName = ($filePathTemplate -f $time,$fileIdx,"jpg");
                                $fileAsImage.Save($newFileName, [System.Drawing.Imaging.ImageFormat]::Jpeg);
                                $fileAsImage.Dispose();
                                del $fileName;

                                $fileAsImage = New-Object -TypeName system.drawing.bitmap -ArgumentList $newFileName 
                                $fileName = $newFileName
                        }

                        if ($ShowProgress) { "Saving OCR Text..." }

                        $property = $fileAsImage.PropertyItems[0];
                        $property.Id = 40092;
                        $property.Type = 1;
                        $property.Value = [system.text.encoding]::Unicode.GetBytes($ocrText);
                        $property.Len = $property.Value.Count;
                        $fileAsImage.SetPropertyItem($property);
                        $fileAsImage.Save(($fileName + ".new"));
                        $fileAsImage.Dispose();
                        del $fileName;
                        ren ($fileName + ".new") $fileName
                }
        }
        else {
                $modiDocument.Close();
                clear-variable modiDocument
        }

        if ($ShowProgress) { "Done." }

        if ($OpenCompletedResult) {
                . $fileName;
        }
        else {
                $result = dir $fileName;
                $result | add-member -membertype noteproperty -name OCRText -value $ocrText
                $result
        }
}

I ran into a few issues:

PermalinkCommentstechnical scanner ocr .net modi powershell office wia

Coding4Fun : Look at me! Windows Image Acquisition

2009 Jun 20, 9:43How to use the WIA APIs in C#. WIA is Windows API to get images from scanners and cameras. And, as I found out, if you want to use the API in PowerShell try '$deviceManager = new-object -ComObject WIA.DeviceManager'PermalinkCommentsvideo scanner api wia csharp howto programming camera image photo .net webcam technical

Infrared Paint Link Roundup

2009 May 29, 2:50

I like the idea of QR codes, encoding URLs and placing them on real world objects, but the QR codes themselves are kind of ugly. To make them less obvious I thought I could spray QR codes on to an object with an infrared reflective paint and shine infrared light on the QR codes, since most cameras, for instance the camera in my G1 phone, pick up infrared that our eyes do not.

In my search for infrared paint I've found a seller of IR ink (via programming forum) and an Infrared Paint Recipe (via IR FAQ).

In looking for this paint I've found that it comes up a lot in relation to the military for things like paint markers that are visible at night with proper equipment, and paint that absorbs IR light to make vehicles less obvious to night vision goggles. Even though the first reflects infrared light and the second absorbs it websites end up refering to both as infrared paint which made it difficult to search.

Additionally I found links to some other geeky infrared projects:

PermalinkCommentsir paint technical ir infrared qr qr code

[whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009 Apr 23, 1:35"This e-mail is an attempt to give a relatively concise yet reasonably complete overview of non-Unicode character sets and encodings for 'Chinese characters', excluding those which are not supported by at least one of the four browsers IE, Safari, Firefox and Opera (henceforth 'all browsers'), and tentatively avoiding technical details which are out of scope for HTML5 unless they are important to gain a general understanding of the relevant issues."PermalinkCommentshtml html5 iso-2022 charset encoding character unicode cjk

URLs are tough - Anne's Weblog

2009 Apr 7, 1:30I really dislike how IE deals with non-US-ASCII in URLs. I should write up a post on what exactly IE does with non-US-ASCII characters in URLs. "Just like IRIs the URL is mapped to a URI using UTF-8. Except for the query component of the URL (the bit after the question mark). Here for legacy reasons the encoding of the document is used instead. Except if the encoding of the document is UTF-16, in which case UTF-8 is used. Effectively, using non-ASCII characters in URLs in documents not encoded as UTF-8 or UTF-16 will give you surprising results, to say the least. Yay for browsers!"PermalinkCommentshttp encoding html5 url uri unicode iri

Sorting it all Out : What do you get when you combine a base character with a buttload of diacritics?

2009 Mar 6, 11:47"Anyway, I decided to take the letter a and put as many different diacritics on it as I could." Micahel Kaplan sticks like 80 diacritics on the letter 'a'. Awesome.PermalinkCommentsencoding unicode diacritic language letter michael-kaplan

The 'Is It UTF-8?' Quick and Dirty Test

2009 Mar 6, 5:16

I've found while debugging networking in IE its often useful to quickly tell if a string is encoded in UTF-8. You can check for the Byte Order Mark (EF BB BF in UTF-8) but, I rarely see the BOM on UTF-8 strings. Instead I apply a quick and dirty UTF-8 test that takes advantage of the well-formed UTF-8 restrictions.

Unlike other multibyte character encoding forms (see Windows supported character sets or IANA's list of character sets), for example Big5, where sticking together any two bytes is more likely than not to give a valid byte sequence, UTF-8 is more restrictive. And unlike other multibyte character encodings, UTF-8 bytes may be taken out of context and one can still know that its a single byte character, the starting byte of a three byte sequence, etc.

The full rules for well-formed UTF-8 are a little too complicated for me to commit to memory. Instead I've got my own simpler (this is the quick part) set of rules that will be mostly correct (this is the dirty part). For as many bytes in the string as you care to examine, check the most significant digit of the byte:

F:
This is byte 1 of a 4 byte encoded codepoint and must be followed by 3 trail bytes.
E:
This is byte 1 of a 3 byte encoded codepoint and must be followed by 2 trail bytes.
C..D:
This is byte 1 of a 2 byte encoded codepoint and must be followed by 1 trail byte.
8..B:
This is a trail byte.
0..7:
This is a single byte encoded codepoint.
The simpler rules can produce false positives in some cases: that is, they'll say a string is UTF-8 when in fact it might not be. But it won't produce false negatives. The following is table from the Unicode spec. that actually describes well-formed UTF-8.
Code Points 1st Byte 2nd Byte 3rd Byte 4th Byte
U+0000..U+007F 00..7F
U+0080..U+07FF C2..DF 80..BF
U+0800..U+0FFF E0 A0..BF 80..BF
U+1000..U+CFFF E1..EC 80..BF 80..BF
U+D000..U+D7FF ED 80..9F 80..BF
U+E000..U+FFFF EE..EF 80..BF 80..BF
U+10000..U+3FFFF F0 90..BF 80..BF 80..BF
U+40000..U+FFFFF F1..F3 80..BF 80..BF 80..BF
U+100000..U+10FFFF F4 80..8F 80..BF 80..BF

PermalinkCommentstest technical unicode boring charset utf8 encoding

GIVE [dive into mark]

2009 Jan 26, 2:12Mark Pilgrim's series of articles and slides from a corresponding talk on video encoding.PermalinkCommentsmark-pilgrim video encoding audio reference codec

Tom Ricks's Inbox - washingtonpost.com

2008 Oct 13, 2:40Watch out for too good to be true washing services (or free network traffic anonymization): "The laundry would then send out "color coded" special discount tickets, to the effect of "get two loads for the price of one," etc. The color coding was matched to specific streets and thus when someone brought in their laundry, it was easy to determine the general location from which a city map was coded. While the laundry was indeed being washed, pressed and dry cleaned, it had one additional cycle -- every garment, sheet, glove, pair of pants, was first sent through an analyzer, located in the basement, that checked for bomb-making residue." From the comment section of Schneier on Security on this topic: "Yet another example of how inexpensive, reliable home washers and dryers help terrorists. When will we learn?"PermalinkCommentssecurity history laundromat ira terrorism bomb

QuickBase Formula Pretty Printer and Syntax Highlighter

2008 Oct 5, 9:17

Sarah asked me if I knew of a syntax highlighter for the QuickBase formula language which she uses at work. I couldn't find one but thought it might be fun to make a QuickBase Formula syntax highlighter based on the QuickBase help's description of the formula syntax. Thankfully the language is relatively simple since my skills with ANTLR, the parser generator, are rusty now and I've only used it previously for personal projects (like Javaish, the ridiculous Java based shell idea I had).

With the help of some great ANTLR examples and an ANTLR cheat sheet I was able to come up with the grammar that parses the QuickBase Formula syntax and prints out the same formula marked up with HTML SPAN tags and various CSS classes. ANTLR produces the parser in Java which I wrapped up in an applet, put in a jar, and embedded in an HTML page. The script in that page runs user input through the applet's parser and sticks the output at the bottom of the page with appropriate CSS rules to highlight and print the formula in a pretty fashion.

What I learned:

PermalinkCommentsjava technical programming quickbase language antlr antlrworks

Official Google Blog: Moving to Unicode 5.1

2008 May 7, 4:24Woo Unicode! "For the first time, we found that Unicode was the most frequent encoding found on web pages, overtaking both ASCII and Western European encodings"PermalinkCommentsgoogle encoding i18n utf8 unicode ascii web
Older EntriesNewer Entries Creative Commons License Some rights reserved.