Raymond Chen has some thought experiments useful for discovering various kinds of stupidity in software design:
Tim Berners-Lee's principles of Web design includes my favorite: Test of Independent Invention. This has a thought experiment containing the construction of the MMM (Multi-Media Mesh) with MRIs (Media Resource Identifiers) and MMTP (Muli-Media Transport Protocol).
The Internet design principles (RFC 1958) includes the Robustness Principle: be strict when sending and tolerant when receiving. A good one, but applied too liberally can lead to interop issues. For instance, consider web browsers. Imagine one browser becomes so popular that web devs create web pages and just test out their pages in this popular browser. They don't ensure their pages conform to standards and accidentally end up depending on the manner in which this popular browser tolerantly accepts non-standard input. This non-standard behavior ends up as de facto standard and future updates to the standard essentially has had decisions made for it.
I built timestamp.exe, a Windows command line tool to convert between computer and human readable date/time formats mostly for working on the first run wizard for IE8. We commonly write out our dates in binary form to the registry and in order to test and debug my work it became useful to be able to determine to what date the binary value of a FILETIME or SYSTEMTIME corresponded or to produce my own binary value of a FILETIME and insert it into the registry.
For instance, to convert to a binary value:
[PS C:\] timestamp -inString 2009/08/28:10:18 -outHexValue -convert filetime
2009/08/28:10:18 as FILETIME: 00 7c c8 d1 c8 27 ca 01
Converting in the other direction, if you don't know what format the bytes are in, just feed them in and timestamp will try all conversions and list only the valid ones:
[PS C:\] timestamp -inHexValue "40 52 1c 3b"
40 52 1c 3b as FILETIME: 1601-01-01:00:01:39.171
40 52 1c 3b as Unix Time: 2001-06-05:03:30:08.000
40 52 1c 3b as DOS Time: 2009-08-28:10:18:00.000
(it also supports OLE Dates, and SYSTEMTIME which aren't listed there because the hex value isn't valid for those types). Or use the guess
option to get timestamp's best guess:
[PS C:\] timestamp -inHexValue "40 52 1c 3b" -convert guess
40 52 1c 3b as DOS Time: 2009-08-28:10:18:00.000
When I first wrote this I had a bug in my function that parses the date-time value string in which I could parse 2009-07-02:10:18 just fine, but I wouldn't be able to parse 2009-09-02:10:18 correctly. This was my code:
success = swscanf_s(timeString, L"%hi%*[\\/- ,]%hi%*[\\/- ,]%hi%*[\\/- ,Tt:.]%hi%*[:.]%hi%*[:.]%hi%*[:.]%hi",
&systemTime->wYear,
&systemTime->wMonth,
&systemTime->wDay,
&systemTime->wHour,
&systemTime->wMinute,
&systemTime->wSecond,
&systemTime->wMilliseconds) > 1;
See the problem?
To convert between these various forms yourself read The Old New Thing date conversion article or Josh Poley's date time article. I previously wrote about date formats I like and dislike.
Before we shipped IE8 there were no Accelerators, so we had some fun making our own for our favorite web services. I've got a small set of tips for creating Accelerators for other people's web services. I was planning on writing this up as an IE blog post, but Jon wrote a post covering a similar area so rather than write a full and coherent blog post I'll just list a few points:
There's no easy way to use local applications on a PC as the result of an accelerator or a search provider in IE8 but there is a hack-y/obvious way, that I'll describe here. Both accelerators and search providers in IE8 fill in URL templates and navigate to the resulting URL when an accelerator or search provider is executed by the user. These URLs are limited in scheme to http and https but those pages may do anything any other webpage may do. If your local application has an ActiveX control you could use that, or (as I will provide examples for) if the local application has registered for an application protocol you can redirect to that URL. In any case, unfortunately this means that you must put a webpage on the Internet in order to get an accelerator or search provider to use a local application.
For examples of the app protocol case, I've created a callto accelerator that uses whatever application is registered for the callto scheme on your system, and a Windows Search search provider that opens Explorer's search with your search query. The callto accelerator navigates to my redirection page with 'callto:' followed by the selected text in the fragment and the redirection page redirects to that callto URL. In the Windows Search search provider case the same thing happens except the fragment contains 'search-ms:query=' followed by the selected text, which starts Windows Search on your system with the selected text as the query. I've looked into app protocols previously.
I've hooked up the printer/scanner to the Media Center PC since I leave that on all the time anyway so we can have a networked printer. I wanted to hook up the scanner in a somewhat similar fashion but I didn't want to install HP's software (other than the drivers of course). So I've written my own script for scanning in PowerShell that does the following:
Here's the actual code from my scan.ps1 file:
param([Switch] $ShowProgress, [switch] $OpenCompletedResult)
$filePathTemplate = "C:\users\public\pictures\scanned\scan {0} {1}.{2}";
$time = get-date -uformat "%Y-%m-%d";
[void]([reflection.assembly]::loadfile( "C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Drawing.dll"))
$deviceManager = new-object -ComObject WIA.DeviceManager
$device = $deviceManager.DeviceInfos.Item(1).Connect();
foreach ($item in $device.Items) {
$fileIdx = 0;
while (test-path ($filePathTemplate -f $time,$fileIdx,"*")) {
[void](++$fileIdx);
}
if ($ShowProgress) { "Scanning..." }
$image = $item.Transfer();
$fileName = ($filePathTemplate -f $time,$fileIdx,$image.FileExtension);
$image.SaveFile($fileName);
clear-variable image
if ($ShowProgress) { "Running OCR..." }
$modiDocument = new-object -comobject modi.document;
$modiDocument.Create($fileName);
$modiDocument.OCR();
if ($modiDocument.Images.Count -gt 0) {
$ocrText = $modiDocument.Images.Item(0).Layout.Text.ToString().Trim();
$modiDocument.Close();
clear-variable modiDocument
if (!($ocrText.Equals(""))) {
$fileAsImage = New-Object -TypeName system.drawing.bitmap -ArgumentList $fileName
if (!($fileName.EndsWith(".jpg") -or $fileName.EndsWith(".jpeg"))) {
if ($ShowProgress) { "Converting to JPEG..." }
$newFileName = ($filePathTemplate -f $time,$fileIdx,"jpg");
$fileAsImage.Save($newFileName, [System.Drawing.Imaging.ImageFormat]::Jpeg);
$fileAsImage.Dispose();
del $fileName;
$fileAsImage = New-Object -TypeName system.drawing.bitmap -ArgumentList $newFileName
$fileName = $newFileName
}
if ($ShowProgress) { "Saving OCR Text..." }
$property = $fileAsImage.PropertyItems[0];
$property.Id = 40092;
$property.Type = 1;
$property.Value = [system.text.encoding]::Unicode.GetBytes($ocrText);
$property.Len = $property.Value.Count;
$fileAsImage.SetPropertyItem($property);
$fileAsImage.Save(($fileName + ".new"));
$fileAsImage.Dispose();
del $fileName;
ren ($fileName + ".new") $fileName
}
}
else {
$modiDocument.Close();
clear-variable modiDocument
}
if ($ShowProgress) { "Done." }
if ($OpenCompletedResult) {
. $fileName;
}
else {
$result = dir $fileName;
$result | add-member -membertype noteproperty -name OCRText -value $ocrText
$result
}
}
I ran into a few issues:
A while ago I promised to say how an xsltproc Meddler script would be useful and the general answer is its useful for hooking up a client application that wants data from the web in a particular XML format and the data is available on the web but in another XML format. The specific case for this post is a Flickr Search service that includes IE8 Visual Search Suggestions. IE8 wants the Visual Search Suggestions XML format and Flickr gives out search data in their Flickr web API XML format.
So I wrote an XSLT to convert from Flickr Search XML to Visual Suggestions XML and used my xsltproc Meddler script to actually apply this xslt.
After getting this all working I've placed the result in two places: (1) I've updated the xsltproc Meddler script to include this XSLT and an XML file to install it as a search provider - although you'll need to edit the XML to include your own Flickr API key. (2) I've created a service for this so you can just install the Flickr search provider if you're interested in having the functionality and don't care about the implementation. Additionally, to the search provider I've added accelerator preview support to show the Flickr slideshow which I think looks snazzy.
Doing a quick search for this it looks like there's at least one other such implementation, but mine has the distinction of being done through XSLT which I provide, updated XML namespaces to work with the released version of IE8, and I made it so you know its good.
IE8, the software I've been working on for some time now, has finally been released at MIX09.
Working on Internet Explorer extensions in C++ & COM, I had to relearn or rediscover how to do several totally basic and important things. To save myself and possibly others trouble in the future, here's some pertinent links and tips.
First you must choose your IE extensibility point. Here's a very short list of the few I've used:
Once you've created your COM object that implements IObjectWithSite and whatever other interfaces your extensibility point requires as described in the above links you'll see your SetSite method get called by IE. You might want to know how to get the top level browser object from the IUnknown site object passed in via that method.
After that you may also want to listen for some events from the browser. To do this you'll need to:
If you want to check if an IHTMLElement is not visible on screen due how the page is scrolled, try comparing the Body or Document Element's client height and width, which appears to be the dimensions of the visible document area, to the element's bounding client rect which appears to be its position relative to the upper left corner of the visible document area. I've found this to be working for me so far, but I'm not positive that frames, iframes, zooming, editable document areas, etc won't mess this up.
Be sure to use pointers you get from the IWebBrowser/IHTMLDocument/etc. only on the thread on which you obtained the pointer or correctly marshal the pointers to other threads to avoid weird crashes and hangs.
Obtaining the HTML document of a subframe is slightly more complicated then you might hope. On the other hand this might be resolved by the new to IE8 method IHTMLFrameElement3::get_contentDocument
Check out Eric's IE blog post on IE extensibility which has some great links on this topic as well.
I've found while debugging networking in IE its often useful to quickly tell if a string is encoded in UTF-8. You can check for the Byte Order Mark (EF BB BF in UTF-8) but, I rarely see the BOM on UTF-8 strings. Instead I apply a quick and dirty UTF-8 test that takes advantage of the well-formed UTF-8 restrictions.
Unlike other multibyte character encoding forms (see Windows supported character sets or IANA's list of character sets), for example Big5, where sticking together any two bytes is more likely than not to give a valid byte sequence, UTF-8 is more restrictive. And unlike other multibyte character encodings, UTF-8 bytes may be taken out of context and one can still know that its a single byte character, the starting byte of a three byte sequence, etc.
The full rules for well-formed UTF-8 are a little too complicated for me to commit to memory. Instead I've got my own simpler (this is the quick part) set of rules that will be mostly correct (this is the dirty part). For as many bytes in the string as you care to examine, check the most significant digit of the byte:
Code Points | 1st Byte | 2nd Byte | 3rd Byte | 4th Byte |
---|---|---|---|---|
U+0000..U+007F | 00..7F | |||
U+0080..U+07FF | C2..DF | 80..BF | ||
U+0800..U+0FFF | E0 | A0..BF | 80..BF | |
U+1000..U+CFFF | E1..EC | 80..BF | 80..BF | |
U+D000..U+D7FF | ED | 80..9F | 80..BF | |
U+E000..U+FFFF | EE..EF | 80..BF | 80..BF | |
U+10000..U+3FFFF | F0 | 90..BF | 80..BF | 80..BF |
U+40000..U+FFFFF | F1..F3 | 80..BF | 80..BF | 80..BF |
U+100000..U+10FFFF | F4 | 80..8F | 80..BF | 80..BF |