I keep seeing crowdsource projects with big names that I actually want to back:
By the URI RFC there is only one way to represent a particular IPv4 address in the host of a URI. This is the standard dotted decimal notation of four bytes in decimal with no leading zeroes delimited by periods. And no leading zeros are allowed which means there's only one textual representation of a particular IPv4 address.
However as discussed in the URI RFC, there are other forms of IPv4 addresses that although not officially allowed are generally accepted. Many implementations used inet_aton to parse the address from the URI which accepts more than just dotted decimal. Instead of dotted decimal, each dot delimited part can be in decimal, octal (if preceded by a '0') or hex (if preceded by '0x' or '0X'). And that's each section individually - they don't have to match. And there need not be 4 parts: there can be between 1 and 4 (inclusive). In case of less than 4, the last part in the string represents all of the left over bytes, not just one.
For example the following are all equivalent:
The bread and butter of URI related security issues is when one part of the system disagrees with another about the interpretation of the URI. So this non-standard, non-normal form syntax has been been a great source of security issues in the past. Its mostly well known now (CreateUri normalizes these non-normal forms to dotted decimal), but occasionally a good tool for bypassing naive URI blocking systems.
One of the more limiting issues of writing client side script in the browser is the same origin limitations of XMLHttpRequest. The latest version of all browsers support a subset of CORS to allow servers to opt-in particular resources for cross-domain access. Since IE8 there's XDomainRequest and in all other browsers (including IE10) there's XHR L2's cross-origin request features. But the vast majority of resources out on the web do not opt-in using CORS headers and so client side only web apps like a podcast player or a feed reader aren't doable.
One hack-y way around this I've found is to use YQL as a CORS proxy. YQL applies the CORS header to all its responses and among its features it allows a caller to request an arbitrary XML, HTML, or JSON resource. So my network helper script first attempts to access a URI directly using XDomainRequest if that exists and XMLHttpRequest otherwise. If that fails it then tries to use XDR or XHR to access the URI via YQL. I wrap my URIs in the following manner, where type is either "html", "xml", or "json":
yqlRequest = function(uri, method, type, onComplete, onError) {
var yqlUri = "http://query.yahooapis.com/v1/public/yql?q=" +
encodeURIComponent("SELECT * FROM " + type + ' where url="' + encodeURIComponent(uri) + '"');
if (type == "html") {
yqlUri += encodeURIComponent(" and xpath='/*'");
}
else if (type == "json") {
yqlUri += "&callback=&format=json";
}
...
This
also means I can get JSON data itself without having to go through JSONP.
Shortly after joining the Internet Explorer team I got a bug from a PM on a popular Microsoft web server product that I'll leave unnamed (from now on UWS). The bug said that IE was handling empty path segments incorrectly by not removing them before resolving dotted path segments. For example UWS would do the following:
A.1. http://example.com/a/b//../
A.2. http://example.com/a/b/../
A.3. http://example.com/a/
In step 1 they are given a URI with dotted path segment and an empty
path segment. In step 2 they remove the empty path segment, and in step 3 they resolve the dotted path segment. Whereas, given the same initial URI, IE would do the following:
B.1. http://example.com/a/b//../
B.2. http://example.com/a/b/
IE simply resolves the dotted path segment against the empty path segment and removes them both. So, how
did I resolve this bug? As "By Design" of course!
The URI RFC allows path segments of zero length and does not assign them any special meaning. So generic user agents that intend to work on the web must not treat an empty path segment any different from a path segment with some text in it. In the case above IE is doing the correct thing.
That's the case for generic user agents, however servers may decide that a URI with an empty path segment returns the same resource as a the same URI without that empty path segment. Essentially they can decide to ignore empty path segments. Both IIS and Apache work this way and thus return the same resource for the following URIs:
http://exmaple.com/foo//bar///baz
http://example.com/foo/bar/baz
The issue for UWS is that it removes empty path segments before resolving dotted path segments. It must
follow normal URI procedure before applying its own additional rules for empty path segments. Not doing that means they end up violating URI equivalency rules: URIs (A.1) and (B.2) are equivalent
but UWS will not return the same resource for them.
Mime-type for describing the difference between two JSON resources (in JSON using JSON paths)
I wanted to ensure that my switch statement in my implementation of IInternetSecurityManager::ProcessURLAction had a case for every possible documented URLACTION. I wrote the following short command line sequence to see the list of all URLACTIONs in the SDK header file not found in my source file:
grep URLACTION urlmon.idl | sed 's/.*\(URLACTION[a-zA-Z0-9_]*\).*/\1/g;' | sort | uniq > allURLACTIONs.txt
grep URLACTION MySecurityManager.cpp | sed 's/.*\(URLACTION[a-zA-Z0-9_]*\).*/\1/g;' | sort | uniq > myURLACTIONs.txt
comm -23 allURLACTIONs.txt myURLACTIONs.txt
I'm
not a sed expert so I had to read the sed documentation, and I heard about comm from Kris Kowal's blog which happilly was in the Win32 GNU tools pack I
already run.
But in my effort to learn and use PowerShell I found the following similar command line:
diff
(more urlmon.idl | %{ if ($_ -cmatch "URLACTION[a-zA-Z0-9_]*") { $matches[0] } } | sort -uniq)
(more MySecurityManager.cpp | %{ if ($_ -cmatch "URLACTION[a-zA-Z0-9_]*") { $matches[0] } } | sort -uniq)
In
the PowerShell version I can skip the temporary files which is nice. 'diff' is mapped to 'compare-object' which seems similar to comm but with no parameters to filter out the different streams
(although this could be done more verbosely with the ?{ } filter syntax). In PowerShell uniq functionality is built into sort. The builtin -cmatch operator (c is for case sensitive) to do regexp is
nice plus the side effect of generating the $matches variable with the regexp results.
I've made two simple command line tools related to the console window and Win7 jump lists. The source is available for both but neither is much more than the sort of samples you'd find on MSDN =).
SetAppUserModelId lets you change the Application User Model ID for the current console window. The AppUserModelId is the value Win7 uses to group together icons on the task bar and is what the task bar's jump lists are associated with. The tool lets you change that as well as the icon and name that appear in the task bar for the window, and the command to launch if the user attempts to re-launch the application from its task bar icon.
SetJumpList lets you set the jump list associated with a particular AppUserModelId. You pass the AppUserModelId as the only parameter and then in its standard input you give it lines specifying items that should appear in the jump list and what to execute when those items are picked.
I put these together to make my build environment easier to deal with at work. I have to deal with multiple enlistments in many different branches and so I wrote a simple script around these two tools to group my build windows by branch name in the task bar, and to add the history of commands I've used to launch the build environment console windows to the jump list of each.