2010 Mar 5, 10:21Document explaining the relationship between the various web storage APIs coming out of HTML 5. To summarize:
Web Storage (aka DOM Storage) - simple key/value pairs API.
WebSimple DB API - now called Indexed Database API.
Indexed Database API and Web SQL Database - competing database APIs.
Application Cache - Storage of HTTP resources for offline apps.
DataCache API - A programmatically modifiable Application Cache.
html html5 standard programming technical wiki w3c database storage web 2010 Jan 6, 2:24This time its BoingBoing's list of 2010 indie games.
game videogame 2009 Dec 4, 10:24Flickr dev. blog on the accept-language HTTP header: "It’s true that the Accept-Language header has a troubled history. Because of this, many developers regard it the way medieval villagers might
have regarded a woman with a warty nose and a pet cat – it should be shunned, avoided and possibly burned at the stake." And this great anecdote: "In two and a half years of running as an
international site, we’ve only ever had one case where it didn’t work. Helio, a cellphone company, had a browser was custom-built for them in Korea, and had its “Accept-Language” header hard-coded to
always request Korean, something which led to much confusion for the Flickr users amongst their American customers."
flickr internationalization language accept-language http http-header development technical web 2009 Nov 27, 6:10"What follows is a brief description of the method we have developed for encoding arbitrary shellcode as English text. This English shellcode is completely self-contained, i.e., it does not require
an external loader, and executes as valid IA32 code."
security polyglot intel paper research programming hack obfuscation english language technical system:filetype:pdf system:media:document 2009 Nov 23, 11:28"Thanks to the utilization of new technology, we're now seeing large-scale success in eliminating the need for passwords while increasing the successful registration rate at websites to over 90%...In
addition, after a thorough evaluation of the security and privacy of these technologies, the same techniques are being piloted by President Obama's open identity initiative to enable citizens to sign
in more easily to government-operated websites."
identity openid google security authentication facebook password via:connolly technical 2009 Oct 22, 12:33"When asked for the most valuable topic in Demand’s arsenal, he replies instantly: “‘Where can I donate a car in Dallas?’"
via:kris.kowal wired internet video howto automation business media marketing economics advertising 2009 Sep 23, 3:13ASCIIpOrtal is now released!
ascii text game videogame humor portal valve free via:waxy 2009 Sep 10, 7:22HTML validator can validate that your document is both HTML and XHTML at the same time.
html5 xhtml html validator technical web polyglot 2009 Aug 28, 3:39
I built timestamp.exe, a Windows command line tool to convert between computer and human readable date/time formats
mostly for working on the first run wizard for IE8. We commonly write out our dates in binary form to the registry and in order to test and debug my work it became useful to be able to determine to
what date the binary value of a FILETIME or SYSTEMTIME corresponded or to produce my own binary value of a FILETIME and insert it into the registry.
For instance, to convert to a binary value:
[PS C:\] timestamp -inString 2009/08/28:10:18 -outHexValue -convert filetime
2009/08/28:10:18 as FILETIME: 00 7c c8 d1 c8 27 ca 01
Converting in the other direction, if you don't know what format the bytes are in, just feed them in and timestamp will try all conversions and list only the valid ones:
[PS C:\] timestamp -inHexValue "40 52 1c 3b"
40 52 1c 3b as FILETIME: 1601-01-01:00:01:39.171
40 52 1c 3b as Unix Time: 2001-06-05:03:30:08.000
40 52 1c 3b as DOS Time: 2009-08-28:10:18:00.000
(it also supports OLE Dates, and SYSTEMTIME which aren't listed there because the hex value isn't valid for those types). Or use the guess
option to get timestamp's best guess:
[PS C:\] timestamp -inHexValue "40 52 1c 3b" -convert guess
40 52 1c 3b as DOS Time: 2009-08-28:10:18:00.000
When I first wrote this I had a bug in my function that parses the date-time value string in which I could parse 2009-07-02:10:18 just fine, but I wouldn't be able to parse 2009-09-02:10:18
correctly. This was my code:
success = swscanf_s(timeString, L"%hi%*[\\/- ,]%hi%*[\\/- ,]%hi%*[\\/- ,Tt:.]%hi%*[:.]%hi%*[:.]%hi%*[:.]%hi",
&systemTime->wYear,
&systemTime->wMonth,
&systemTime->wDay,
&systemTime->wHour,
&systemTime->wMinute,
&systemTime->wSecond,
&systemTime->wMilliseconds) > 1;
See the problem?
To convert between these various forms yourself read The Old New Thing date conversion article or
Josh Poley's date time article. I previously wrote about date formats I like and dislike.
date date-time technical time windows tool 2009 Aug 18, 4:19
Before we shipped IE8 there were no Accelerators, so we had some fun making our own for our favorite web services. I've got a small set of tips for creating Accelerators for other people's web
services. I was planning on writing this up as an IE blog post, but Jon wrote a post covering a
similar area so rather than write a full and coherent blog post I'll just list a few points:
- The first thing to try is looking for developer help for the web service, specifically if there's a REST-ful URL based API. For example, Bing Maps has great URL API documentation that would
be enough to create an Accelerator.
- The Accelerator XML is very similar to HTML forms. If you can find an HTML form for the web service for which you want to create an Accelerator, you can view the HTML source and create an
Accelerator based on that.
- I created the FormToAccelerator extension based on the previous idea. You can
use the extension to create an Accelerator from an HTML form, or just use it to create the start of one and edit it manually after.
- If the page doesn't use an HTML form, you can start up an HTTP debugger like Fiddler, use the web service from the normal web
page, and then in Fiddler see if you can find a REST-ful looking URL you can use.
- When looking to create a preview for your Accelerator, see if the web page for the web service has a mobile version or a version that's intended to embed in other web pages via an iframe. On
this same line, iPhone apps make great Accelerators usually with lovely previews.
- If there's no mobile or embeddable version and the only thing wrong with the normal web page for the web service is that the useful information doesn't fit in the preview window then see if you
can find an HTML tag with a name or id near the useful information, and stick a '#' fragment pointing to that tag onto the preview URL template.
- Without a reasonable REST-ful API you can use a combination of Google's "site:" and "I'm Feeling Lucky" to find the most relevant page on a particular site.
- The value of a name and value pair need not consist of only a single Accelerator variable. You can get creative and put other text in there. For instance, I implemented a Google currency conversion by setting the query to "{selection} in US Dollars".
technical accelerator ie8 ie 2009 Jul 28, 5:07Suggests that local news must provide the raw facts and only in particular cases do a 'story' on top of that -- not everything needs to be a story.
news via:sambrook journalism 2009 Jul 28, 3:39Linus Torvalds: "I'm a big believer in "technology over politics"...I may make jokes about Microsoft at times, but at the same time, I think the Microsoft hatred is a disease." This goes well with
his previous quote calling Slashdot a "big public wanking session".
linux linus-torvalds microsoft politics technical 2009 Jun 30, 4:59"Congratulations on not being devoured and purchasing the Wagglemax Zombocalypse TM Survival Kit"
humor video commercial ad apocalypse zombie horror for:hellosarah videogame 2009 Jun 27, 3:42
I've hooked up the printer/scanner to the Media Center PC since I leave that on all the time anyway so we can have a networked printer. I wanted to hook up the scanner in a somewhat similar fashion
but I didn't want to install HP's software (other than the drivers of course). So I've written my own script for scanning in PowerShell that does the following:
- Scans using the Windows Image Acquisition APIs via COM
- Runs OCR on the image using Microsoft Office Document Imaging via COM (which may already be on your PC if you have Office installed)
- Converts the image to JPEG using .NET Image APIs
- Stores the OCR text into the EXIF comment field using
.NET Image APIs (which means Windows Search can index the image by the text in the image)
- Moves the image to the public share
Here's the actual code from my scan.ps1 file:
param([Switch] $ShowProgress, [switch] $OpenCompletedResult)
$filePathTemplate = "C:\users\public\pictures\scanned\scan {0} {1}.{2}";
$time = get-date -uformat "%Y-%m-%d";
[void]([reflection.assembly]::loadfile( "C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Drawing.dll"))
$deviceManager = new-object -ComObject WIA.DeviceManager
$device = $deviceManager.DeviceInfos.Item(1).Connect();
foreach ($item in $device.Items) {
$fileIdx = 0;
while (test-path ($filePathTemplate -f $time,$fileIdx,"*")) {
[void](++$fileIdx);
}
if ($ShowProgress) { "Scanning..." }
$image = $item.Transfer();
$fileName = ($filePathTemplate -f $time,$fileIdx,$image.FileExtension);
$image.SaveFile($fileName);
clear-variable image
if ($ShowProgress) { "Running OCR..." }
$modiDocument = new-object -comobject modi.document;
$modiDocument.Create($fileName);
$modiDocument.OCR();
if ($modiDocument.Images.Count -gt 0) {
$ocrText = $modiDocument.Images.Item(0).Layout.Text.ToString().Trim();
$modiDocument.Close();
clear-variable modiDocument
if (!($ocrText.Equals(""))) {
$fileAsImage = New-Object -TypeName system.drawing.bitmap -ArgumentList $fileName
if (!($fileName.EndsWith(".jpg") -or $fileName.EndsWith(".jpeg"))) {
if ($ShowProgress) { "Converting to JPEG..." }
$newFileName = ($filePathTemplate -f $time,$fileIdx,"jpg");
$fileAsImage.Save($newFileName, [System.Drawing.Imaging.ImageFormat]::Jpeg);
$fileAsImage.Dispose();
del $fileName;
$fileAsImage = New-Object -TypeName system.drawing.bitmap -ArgumentList $newFileName
$fileName = $newFileName
}
if ($ShowProgress) { "Saving OCR Text..." }
$property = $fileAsImage.PropertyItems[0];
$property.Id = 40092;
$property.Type = 1;
$property.Value = [system.text.encoding]::Unicode.GetBytes($ocrText);
$property.Len = $property.Value.Count;
$fileAsImage.SetPropertyItem($property);
$fileAsImage.Save(($fileName + ".new"));
$fileAsImage.Dispose();
del $fileName;
ren ($fileName + ".new") $fileName
}
}
else {
$modiDocument.Close();
clear-variable modiDocument
}
if ($ShowProgress) { "Done." }
if ($OpenCompletedResult) {
. $fileName;
}
else {
$result = dir $fileName;
$result | add-member -membertype noteproperty -name OCRText -value $ocrText
$result
}
}
I ran into a few issues:
- MODI doesn't seem to be in the Office 2010 Technical Preview I installed first. Installing Office 2007 fixed that.
- The MODI.Document class, at least via PowerShell, can't be instantiated in a 64bit environment. To run the script on my 64bit OS I had to start powershell from the 32bit cmd.exe
(C:\windows\syswow64\cmd.exe).
- I was planning to hook up my script to the scanner's 'Scan' button, but
HP didn't get the button working for their Vista driver. Their workaround is "don't do that!".
- You must call Image.Dispose() to get .NET to release its reference to the corresponding image file.
- In trying to figure out how to store the text in the files comment, I ran into a dead-end trying to find the corresponding setter for GetDetailsOf which folks like James O'Neil use in PowerShell for interesting ends.
technical scanner ocr .net modi powershell office wia 2009 May 25, 10:19"Wallace and Gromit return-this time as purveyors of the Top Bun bakery, despite the fact that 12 other local bakers have disappeared in the previous year. Now it's up to Gromit to solve the mystery
while Wallace woos new love interest Piella Bakewell."
humor movie wallace-and-gromit animation claymation for:hellosarah 2009 May 19, 1:43Lovely, although a bit out of my price range. "Formerly the Golden Gate Lutheran Church, this stunning Gothic Revival style building is now one of the most extraordinary and largest single family
homes in San Francisco."
for:hellosarah photo house home church california san-francisco flickr slideshow via:boingboing church-home 2009 May 13, 11:21Lots of interesting notes on the company culture of Zappos. "Yesterday I was lucky enough to visit Zappos and get a tour and talk with some of their executives, including Tony Hsieh, CEO."
twitter marketing business culture zappos shoes ecommerce 2009 May 1, 12:09"If I'm reading the pop-up window correctly, domain registrar Godaddy recommends against purchasing .tv domain names because the island of Tuvalu, which the domain represents, is sinking."
humor dns domain godaddy tv via:boingboing 2009 Apr 7, 1:13A sort of vertical cross section of an overview of what the web should look like from HTTP & URIs to GRDDL & RDF. Oh, and there's a pretty graph at the bottom. "This finding describes how
document formats, markup conventions, attribute values, and other data formats can be designed to facilitate the deployment of self-describing, Web-grounded Web content."
web w3c xml html http semanticweb microformats xhtml atom grddl rdfa rdf 2009 Mar 6, 5:16
I've found while debugging networking in IE its often useful to quickly tell if a string is encoded in UTF-8. You can check for the Byte Order Mark (EF BB BF in UTF-8) but, I rarely see the BOM on
UTF-8 strings. Instead I apply a quick and dirty UTF-8 test that takes advantage of the well-formed UTF-8 restrictions.
Unlike other multibyte character encoding forms (see Windows supported character sets or IANA's list of character sets), for example Big5, where sticking together any two bytes is more likely than not to give a valid byte sequence, UTF-8 is more restrictive. And unlike
other multibyte character encodings, UTF-8 bytes may be taken out of context and one can still know that its a single byte character, the starting byte of a three byte sequence, etc.
The full rules for well-formed UTF-8 are a little too complicated for me to commit to memory. Instead I've got my own simpler (this is the quick part) set of rules that will be mostly correct (this
is the dirty part). For as many bytes in the string as you care to examine, check the most significant digit of the byte:
-
F:
-
This is byte 1 of a 4 byte encoded codepoint and must be followed by 3 trail bytes.
-
E:
-
This is byte 1 of a 3 byte encoded codepoint and must be followed by 2 trail bytes.
-
C..D:
-
This is byte 1 of a 2 byte encoded codepoint and must be followed by 1 trail byte.
-
8..B:
-
This is a trail byte.
-
0..7:
-
This is a single byte encoded codepoint.
The simpler rules can produce false positives in some cases: that is, they'll say a string is UTF-8 when in fact it might not be. But it won't produce false negatives. The following is table
from the
Unicode spec. that actually describes well-formed UTF-8.
Code Points
|
1st Byte
|
2nd Byte
|
3rd Byte
|
4th Byte
|
U+0000..U+007F
|
00..7F
|
U+0080..U+07FF
|
C2..DF
|
80..BF
|
U+0800..U+0FFF
|
E0
|
A0..BF
|
80..BF
|
U+1000..U+CFFF
|
E1..EC
|
80..BF
|
80..BF
|
U+D000..U+D7FF
|
ED
|
80..9F
|
80..BF
|
U+E000..U+FFFF
|
EE..EF
|
80..BF
|
80..BF
|
U+10000..U+3FFFF
|
F0
|
90..BF
|
80..BF
|
80..BF
|
U+40000..U+FFFFF
|
F1..F3
|
80..BF
|
80..BF
|
80..BF
|
U+100000..U+10FFFF
|
F4
|
80..8F
|
80..BF
|
80..BF
|
test technical unicode boring charset utf8 encoding