tracker - Dave's Blog

Search
My timeline on Mastodon

URI Percent Encoding Ignorance Level 2 - There is no Unencoded URI

2012 Feb 20, 4:00

As a professional URI aficionado I deal with various levels of ignorance on URI percent-encoding (aka URI encoding, or URL escaping).

Getting into the more subtle levels of URI percent-encoding ignorance, folks try to apply their knowledge of percent-encoding to URIs as a whole producing the concepts escaped URIs and unescaped URIs. However there are no such things - URIs themselves aren't percent-encoded or decoded but rather contain characters that are percent-encoded or decoded. Applying percent-encoding or decoding to a URI as a whole produces a new and non-equivalent URI.

Instead of lingering on the incorrect concepts we'll just cover the correct ones: there's raw unencoded data, non-normal form URIs and normal form URIs. For example:

  1. http://example.com/%74%68%65%3F%70%61%74%68?query
  2. http://example.com/the%3Fpath?query
  3. "http", "example.com", "the?path", "query"

In the above (A) is not an 'encoded URI' but rather a non-normal form URI. The characters of 'the' and 'path' are percent-encoded but as unreserved characters specific in the RFC should not be encoded. In the normal form of the URI (B) the characters are decoded. But (B) is not a 'decoded URI' -- it still has an encoded '?' in it because that's a reserved character which by the RFC holds different meaning when appearing decoded versus encoded. Specifically in this case, it appears encoded which means it is data -- a literal '?' that appears as part of the path segment. This is as opposed to the decoded '?' that appears in the URI which is not part of the path but rather the delimiter to the query.

Usually when developers talk about decoding the URI what they really want is the raw data from the URI. The raw decoded data is (C) above. The only thing to note beyond what's covered already is that to obtain the decoded data one must parse the URI before percent decoding all percent-encoded octets.

Of course the exception here is when a URI is the raw data. In this case you must percent-encode the URI to have it appear in another URI. More on percent-encoding while constructing URIs later.

PermalinkCommentsurl encoding uri technical percent-encoding

Why I Like Glitch

2012 Feb 17, 4:00

Sarah and I have been enjoying Glitch for a while now. Reviews are usually positive although occasionally biting (but mostly accurate).

I enjoy Glitch as a game of exploration: exploring the game's lands with hidden and secret rooms, and exploring the games skills and game mechanics. The issue with my enjoyment coming from exploration is that after I've explored all streets and learned all skills I've got nothing left to do. But I've found that even after that I can have fun writing client side JavaScript against Glitch's web APIs making tools (I work on the Glitch Helperator) for use in Glitch. And on a semi-regular basis they add new features reviving my interest in the game itself.

PermalinkCommentsvideo-game glitch glitch-helperator me project game

URI Percent-Encoding Ignorance Level 1 - Purpose

2012 Feb 15, 4:00

As a professional URI aficionado I deal with various levels of ignorance on URI percent-encoding (aka URI encoding, or URL escaping).

Worse than the lame blog comments hating on percent-encoding is the shipping code which can do actual damage. In one very large project I won't name, I've fixed code that decodes all percent-encoded octets in a URI in order to get rid of pesky percents before calling ShellExecute. An unnamed developer with similar intent but clearly much craftier did the same thing in a loop until the string's length stopped changing. As it turns out percent-encoding serves a purpose and can't just be removed arbitrarily.

Percent-encoding exists so that one can represent data in a URI that would otherwise not be allowed or would be interpretted as a delimiter instead of data. For example, the space character (U+0020) is not allowed in a URI and so must be percent-encoded in order to appear in a URI:

  1. http://example.com/the%20path/
  2. http://example.com/the path/
In the above the first is a valid URI while the second is not valid since a space appears directly in the URI. Depending on the context and the code through which the wannabe URI is run one may get unexpected failure.

For an additional example, the question mark delimits the path from the query. If one wanted the question mark to appear as part of the path rather than delimit the path from the query, it must be percent-encoded:

  1. http://example.com/foo%3Fbar
  2. http://example.com/foo?bar
In the second, the question mark appears plainly and so delimits the path "/foo" from the query "bar". And in the first, the querstion mark is percent-encoded and so the path is "/foo%3Fbar".
PermalinkCommentsencoding uri technical ietf percent-encoding

Blackmail DRM - Stolen Thoughts

2012 Feb 13, 4:00

Most existing DRM attempts to only allow the user to access the DRM'ed content with particular applications or with particular credentials so that if the file is shared it won't be useful to others. A better solution is to encode any of the user's horrible secrets into unique versions of the DRM'ed content so that the user won't want to share it. Entangle the users and the content provider's secrets together in one document and accordingly their interests. I call this Blackmail DRM. For an implementation it is important to point out that the user's horrible secret doesn't need to be verified as accurate, but merely verified as believable.

Apparently I need to get these blog posts written faster because only recently I read about Social DRM which is a light weight version of my idea but with a misleading name. Instead of horrible secrets, they say they'll use personal information like the user's name in the DRM'ed content. More of my thoughts stolen and before I even had a chance to think of it first!

PermalinkCommentsdrm blackmail blackmail-drm technical humor social-drm

URI Percent Encoding Ignorance Level 0 - Existence

2012 Feb 10, 4:00

As a professional URI aficionado I deal with various levels of ignorance on URI percent-encoding (aka URI encoding, or URL escaping). The basest ignorance is with respect to the mere existence of percent-encoding. Percents in URIs are special: they always represent the start of a percent-encoded octet. That is to say, a percent is always followed by two hex digits that represents a value between 0 and 255 and doesn't show up in a URI otherwise.

The IPv6 textual syntax for scoped addresses uses the '%' to delimit the zone ID from the rest of the address. When it came time to define how to represent scoped IPv6 addresses in URIs there were two camps: Folks who wanted to use the IPv6 format as is in the URI, and those who wanted to encode or replace the '%' with a different character. The resulting thread was more lively than what shows up on the IETF URI discussion mailing list. Ultimately we went with a percent-encoded '%' which means the percent maintains its special status and singular purpose.

PermalinkCommentsencoding uri technical ietf percent-encoding ipv6

JavaScript Array methods in the latest browsers

2011 Dec 3, 6:46

Cool and (relatively) new methods on the JavaScript Array object are here in the most recent versions of your favorite browser! More about them on ECMAScript5, MSDN, the IE blog, or Mozilla's documentation. Here's the list that's got me excited:

some & every
Does your callback function return true for any (some) or all (every) of the array's elements?
filter
Filters out elements for which your callback function returns false (in a new copy of the Array).
map
Each element is replaced with the result of it run through your callback function (in a new copy of the Array).
reduce & reduceRight
Your callback is called on each element in the array in sequence (from start to finish in reduce and from finish to start in reduceRight) with the result of the previous callback call passed to the next. Reduce your array to a single value aggregated in any manner you like via your callback function.
forEach
Simply calls your callback passing in each element of your array in turn. I have vague performance concerns as compared to using a normal for loop.
indexOf & lastIndexOf
Finds the first or last (respectively) element in the array that matches the provided value via strict equality operator and returns the index of that element or -1 if there is no such element. Surprisingly, no custom comparison callback method mechanism is provided.
PermalinkCommentsjavascript array technical programming

Bug Spotting: Ctors with default parameters

2011 Dec 1, 4:59

The following code compiled just fine but did not at all act in the manner I expected:

BOOL CheckForThing(__in CObj *pObj, __in IFigMgr* pFigMgr, __in_opt LPCWSTR url)
{
BOOL fCheck = FALSE;
if (SubCheck(pObj))
{
...
I’m calling SubCheck which looks like:
bool SubCheck(const CObj& obj);

Did you spot the bug? As you can see I should be passing in *pObj not pObj since the method takes a const CObj& not a CObj*. But then why does it compile?

It works because CObj has a constructor with all but one param with default values and CObj is derived from IUnknown:

CObj(__in_opt IUnknown * pUnkOuter, __in_opt LPCWSTR pszUrl = NULL);
Accordingly C++ uses this constructor as an implicit conversion operator. So instead of passing in my CObj, I end up creating a new CObj on the stack passing in the CObj I wanted as the outer object which has a number of issues.

The lesson is unless you really want this behavior, don't make constructors with all but 1 or 0 default parameters. If you need to do that consider using the 'explicit' keyword on the constructor.

More info about forcing single argument constructors to be explicit is available on stack overflow.

PermalinkCommentsc++ technical bug programming

Replacing Google Reader Shared Feeds with Tumblr

2011 Nov 28, 7:36

Last time I wrote about how I switched from Delicious to Google Reader's shared links feature only to find out that week that Google was removing the Google Reader shared links feature in favor of Google Plus social features (I'll save my Google Plus rant for another day).

Forced to find something new again, I'm now very pleased with Tumblr. Google Reader has Tumblr in its preset list of Send To sites which makes it relatively easy to add articles. And Tumblr's UX for adding things lets me easily pick a photo or video to display from the article - something which I had put together with a less convenient UX on my bespoke blogging system. For adding things outside of Google Reader I made a Tumblr accelerator to hookup to the Tumblr Add UX.

Of course they have an RSS feed which I hooked up to my blog. The only issue I had there is that when you add a link (and not a video or photo) to Tumblr, the RSS feed entry title for that link is repeated in the entry description as a link followed by a colon and then the actual description entered into Tumblr. I want my title separate so I can apply my own markup so I did a bit of parsing of the description to remove the repeated title from the description.

PermalinkCommentsblog tumblr me technical google-reader

URI Empty Path Segments Matter

2011 Nov 23, 11:00

Shortly after joining the Internet Explorer team I got a bug from a PM on a popular Microsoft web server product that I'll leave unnamed (from now on UWS). The bug said that IE was handling empty path segments incorrectly by not removing them before resolving dotted path segments. For example UWS would do the following:

A.1. http://example.com/a/b//../
A.2. http://example.com/a/b/../
A.3. http://example.com/a/
In step 1 they are given a URI with dotted path segment and an empty path segment. In step 2 they remove the empty path segment, and in step 3 they resolve the dotted path segment. Whereas, given the same initial URI, IE would do the following:
B.1. http://example.com/a/b//../
B.2. http://example.com/a/b/
IE simply resolves the dotted path segment against the empty path segment and removes them both. So, how did I resolve this bug? As "By Design" of course!

The URI RFC allows path segments of zero length and does not assign them any special meaning. So generic user agents that intend to work on the web must not treat an empty path segment any different from a path segment with some text in it. In the case above IE is doing the correct thing.

That's the case for generic user agents, however servers may decide that a URI with an empty path segment returns the same resource as a the same URI without that empty path segment. Essentially they can decide to ignore empty path segments. Both IIS and Apache work this way and thus return the same resource for the following URIs:

http://exmaple.com/foo//bar///baz
http://example.com/foo/bar/baz
The issue for UWS is that it removes empty path segments before resolving dotted path segments. It must follow normal URI procedure before applying its own additional rules for empty path segments. Not doing that means they end up violating URI equivalency rules: URIs (A.1) and (B.2) are equivalent but UWS will not return the same resource for them.
PermalinkCommentsuser agent url ie uri technical web browser

Features of image type input tags in HTML

2011 Nov 21, 11:00

A bug came up the other day involving markup containing <input type="image" src="http://example.com/.... I knew that "image" was a valid input type but it wasn't until that moment that I realized I didn't know what it did. Looking it up I found that it displays the specified image and when the user clicks on the image, the form is submitted with an additional two name value pairs: the x and y positions of the point at which the user clicked the image.

Take for example the following HTML:

<form action="http://example.com/">
<input type="image" name="foo" src="http://deletethis.net/dave/images/davebefore.jpg">
</form>
If the user clicks on the image, the browser will submit the form with a URI like the following:http://example.com/?foo.x=145&foo.y=124.

This seemed like an incredibly specific feature to be built directly into the language when this could instead be done with javascript. I looked a bit further and saw that its been in HTML since at least HTML2, which of course makes much more sense. Javascript barely existed at that point and sending off the user's click location in a form may have been the only way to do something interesting with that action.

PermalinkCommentsuri technical form history html

Replacing Delicious with Google Reader

2011 Nov 17, 11:00

I had previously replaced my use of Delicious with Google Reader. Delicious had a number of issues during their switch over from Yahoo to the new owners and I was eventually fed up enough to remove it from daily use. I used Delicious to do the following things:

  • Create a list of things to read later
  • Save things to read again in the future
  • Search through things I read and enjoyed (esp via tags)
  • Annotate and share things on my blog
I realized that since I did most of my web browsing in Google Reader now anyway I may as well make use of its features. I star things to note I want to read it later or save to read again later. I can annotate with notes in Google Reader and I can share items to my web site by way of the shared items feed. Additionally for when I'm not in Google Reader there's a bookmarklet to add an arbitrary web site as a shared item in Google Reader.

Of course I wrote this and switched over about 1 week before Google removed the sharing feature from Google Reader. I'm irritated but in practice it forced me to find a different option which has worked out mostly better. New blog post coming soon about that...

PermalinkCommentsblog delicious me technical google-reader google feed

Bug Spotting: Smart pointers and parameter evaluation order

2011 Oct 19, 5:58
The following code works fine. I have a ccomptr named resolvedUri and I want to update its hostname so I do the following:
        CreateIUriBuilder(resolvedUri, 0, 0, &builder);
builder->SetHost(host);
builder->CreateUri(0xFFFFFFFF, 0, 0, &resolvedUri);


But the following similar looking code has a bug:
    ResolveHost(resolvedUri, &resolvedUri);


The issue is that doing &resolvedUri gets the address of the pointer but also clears out the pointer due to the definition of my smart pointer class:
    operator T**()  
{
T *ptrValue = mPtrValue;
mPtrValue->Release();
mPtrValue = NULL;
return &ptrValue;
}


In C++ there’s no guarantee about the order in which parameters for a function or method are evaluated. In the case above, &resolvedUri clears out the ccomptr before evaluating resolvedUri.Get() and so ResolveHostAlias gets a nullptr.

An interesting and related thread on stack overflow on undefined behavior in C++.
PermalinkCommentsc++ technical bug programming smart-pointer cpp

Haven't Been Posting Much

2011 Oct 18, 4:52
I haven't been updating my blog recently. But I have three excellent reasons:
PermalinkComments

WPAD Server Fiddler Extension Update v1.0.1

2011 Jun 12, 3:34
As it turns out the WPAD Server Fiddler Extension I made a while back actually has a non-malicious purpose. Apparently its useful for debugging HTTP on the WP7 phone (or so I'm told). Anyway I took some requests and I've fixed a few minor bugs (start button not updating correctly), changed the dialog to be a Fiddler tab so you can use it non-modally, and the WPAD server is now always off when Fiddler starts.
PermalinkCommentsextension fiddler technical update wpad

Command line for finding missing URLACTIONs

2011 May 28, 11:00

I wanted to ensure that my switch statement in my implementation of IInternetSecurityManager::ProcessURLAction had a case for every possible documented URLACTION. I wrote the following short command line sequence to see the list of all URLACTIONs in the SDK header file not found in my source file:

grep URLACTION urlmon.idl | sed 's/.*\(URLACTION[a-zA-Z0-9_]*\).*/\1/g;' | sort | uniq > allURLACTIONs.txt
grep URLACTION MySecurityManager.cpp | sed 's/.*\(URLACTION[a-zA-Z0-9_]*\).*/\1/g;' | sort | uniq > myURLACTIONs.txt
comm -23 allURLACTIONs.txt myURLACTIONs.txt
I'm not a sed expert so I had to read the sed documentation, and I heard about comm from Kris Kowal's blog which happilly was in the Win32 GNU tools pack I already run.

But in my effort to learn and use PowerShell I found the following similar command line:

diff 
(more urlmon.idl | %{ if ($_ -cmatch "URLACTION[a-zA-Z0-9_]*") { $matches[0] } } | sort -uniq)
(more MySecurityManager.cpp | %{ if ($_ -cmatch "URLACTION[a-zA-Z0-9_]*") { $matches[0] } } | sort -uniq)
In the PowerShell version I can skip the temporary files which is nice. 'diff' is mapped to 'compare-object' which seems similar to comm but with no parameters to filter out the different streams (although this could be done more verbosely with the ?{ } filter syntax). In PowerShell uniq functionality is built into sort. The builtin -cmatch operator (c is for case sensitive) to do regexp is nice plus the side effect of generating the $matches variable with the regexp results.
PermalinkCommentspowershell tool cli technical command line

clip.exe - Useful tool I didn't know shipped with Windows

2011 May 26, 11:00

When you run clip.exe, whatever comes into its standard input is put onto the clipboard. So when you need to move the result of something in your command window somewhere else you can pipe the result into clip.exe. Then you won't have to worry about the irritating way cmd.exe does block copy/pasting and you avoid having to manually fixup line breaks in wrapped lines. For instance, you can put the contents of a script into the clipboard with:

more cdo.cmd | clip

I've got a lot of stuff dumped in my bin folder that I sync across all my PCs so I didn't realize that clip.exe is a part of standard Windows installs.

Nice for avoiding the block copy in cmd.exe but I'd prefer to have the contents sort of tee'd into the clipboard and standard output. So TeeClip.ps1:

$input | tee -var teeclipout | clip;
$teeclipout;
PermalinkCommentspowershell clip tool clipboard cli technical windows tee

_opt Mnemonic

2011 May 24, 11:00

​I always have trouble remembering where the opt goes in SAL in the __deref_out case. The mnemonic is pretty simple: the _opt at the start of the SAL is for the pointer value at the start of the function. And the _opt at the end of the SAL is for the dereferenced pointer value at the end of the function.






SAL foo == nullptr allowed at function start? *foo == nullptr allowed at function end?
__deref_out void **foo No No
__deref_opt_out void **foo Yes No
__deref_out_opt void **foo No Yes
__deref_opt_out_opt void **foo Yes Yes
.
PermalinkCommentssal technical programming

PowerShell Script Batch File Wrapper

2011 May 22, 7:20

I'm trying to learn and use PowerShell more, but plenty of other folks I know don't use PowerShell. To allow them to use my scripts I use the following cmd.exe batch file to make it easy to call PowerShell scripts. To use, just name the batch file name the same as the corresponding PowerShell script filename and put it in the same directory.

@echo off
if "%1"=="/?" goto help
if "%1"=="/h" goto help
if "%1"=="-?" goto help
if "%1"=="-h" goto help

%systemroot%\system32\windowspowershell\v1.0\powershell.exe -ExecutionPolicy RemoteSigned -Command . %~dpn0.ps1 %*
goto end

:help
%systemroot%\system32\windowspowershell\v1.0\powershell.exe -ExecutionPolicy RemoteSigned -Command help %~dpn0.ps1 -full
goto end

:end

Additionally for PowerShell scripts that modify the current working directory I use the following batch file:

@echo off
if "%1"=="/?" goto help
if "%1"=="/h" goto help
if "%1"=="-?" goto help
if "%1"=="-h" goto help

%systemroot%\system32\windowspowershell\v1.0\powershell.exe -ExecutionPolicy RemoteSigned -Command . %~dpn0.ps1 %*;(pwd).Path 1> %temp%\%~n0.tmp 2> nul
set /p newdir=
PermalinkCommentspowershell technical programming batch file console

Capturing HTTPS with FiddlerCore

2011 Apr 6, 10:00

I used FiddlerCore in GeolocMock to edit HTTPS responses and ran into two stumbling blocks that I'll document here. The first is that I didn't check if the Fiddler root cert existed or was installed, which of course is necessary to edit HTTPS traffic. The following is my code where I check for the certs.

    if (!Fiddler.CertMaker.rootCertExists())
{
if (!Fiddler.CertMaker.createRootCert())
{
throw new Exception("Unable to create cert for FiddlerCore.");
}
}

if (!Fiddler.CertMaker.rootCertIsTrusted())
{
if (!Fiddler.CertMaker.trustRootCert())
{
throw new Exception("Unable to install FiddlerCore's cert.");
}
}

The second problem I had (which would have been solved had I read all the sample code first) was that my changes weren't being applied. In my app I only need the BeforeResponse but in order to modify the response I must also sign up for the BeforeRequest event and mark the bBufferResponse flag on the session before the response comes back. For example:

    Fiddler.FiddlerApplication.BeforeRequest += new SessionStateHandler(FiddlerApplication_BeforeRequest);
Fiddler.FiddlerApplication.BeforeResponse += new SessionStateHandler(FiddlerApplication_BeforeResponse);
...
private void FiddlerApplication_BeforeRequest(Session oSession)
{
if (IsInterestingSession(oSession))
{
oSession.bBufferResponse = true;
}
}
PermalinkCommentshttp fiddler technical https geolocmock programming fiddlercore

JavaScript & .NET interop via WebBrowser Control

2011 Apr 5, 10:00

For my GeolocMock weekend project I intended to use the Bing Maps API to display a map in a WebBrowser control and allow the user to interact with that to select a location to be consumed by my application. Getting my .NET code to talk to the JavaScript in the WebBrowser control was surprisingly easy.

To have .NET execute JavaScript code you can use the InvokeScript method passing the name of the JavaScript function to execute and an object array of parameters to pass:

this.webBrowser2.Document.InvokeScript("onLocationStateChanged",
new object[] {
latitudeTextBoxText,
longitudeTextBoxText,
altitudeTextBoxText,
uncertaintyTextBoxText
});

The other direction, having JavaScript call into .NET is slightly more complicated but still pretty easy as far as language interop goes. The first step is to mark your assembly as ComVisible so that it can interact with JavaScript via COM. VS had already added a ComVisible declaration to my project I just had to change the value to true.

[assembly: ComVisible(true)]

Next set ObjectForScripting attribute to the object you want to expose to JavaScript.

this.webBrowser2.ObjectForScripting = this.locationState;

Now that object is exposed as window.external in JavaScript and you can call methods on it.

window.external.Set(lat, long, alt, gUncert);

However you don't seem to be able to test for the existence of methods off of it. For example the following JavaScript generates an exception for me even though I have a Set method:

if (window.external && window.external.Set) {
PermalinkCommentsjavascript webbrowser .net technical csharp
Older Entries Creative Commons License Some rights reserved.