normalize - Dave's Blog


URI functions in Windows Store Applications

2013 Jul 25, 1:00


The Modern SDK contains some URI related functionality as do libraries available in particular projection languages. Unfortunately, collectively these APIs do not cover all scenarios in all languages. Specifically, JavaScript and C++ have no URI building APIs, and C++ additionally has no percent-encoding/decoding APIs.
WinRT (JS and C++)
JS Only
C++ Only
.NET Only
Relative resolution
Encode data for including in URI property
Decode data extracted from URI property
Build Query
Parse Query
The Windows.Foudnation.Uri type is not projected into .NET modern applications. Instead those applications use System.Uri and the platform ensures that it is correctly converted back and forth between Windows.Foundation.Uri as appropriate. Accordingly the column marked WinRT above is applicable to JS and C++ modern applications but not .NET modern applications. The only entries above applicable to .NET are the .NET Only column and the WwwFormUrlDecoder in the bottom left which is available to .NET.



This functionality is provided by the WinRT API Windows.Foundation.Uri in C++ and JS, and by System.Uri in .NET.
Parsing a URI pulls it apart into its basic components without decoding or otherwise modifying the contents.
var uri = new Windows.Foundation.Uri("");
console.log(uri.path);// /path%20segment1/path%20segment2

WsDecodeUrl (C++)

WsDecodeUrl is not suitable for general purpose URI parsing.  Use Windows.Foundation.Uri instead.

Build (C#)

URI building is only available in C# via System.UriBuilder.
URI building is the inverse of URI parsing: URI building allows the developer to specify the value of basic components of a URI and the API assembles them into a URI. 
To work around the lack of a URI building API developers will likely concatenate strings to form their URIs.  This can lead to injection bugs if they don’t validate or encode their input properly, but if based on trusted or known input is unlikely to have issues.
            Uri originalUri = new Uri("");
            UriBuilder uriBuilder = new UriBuilder(originalUri);
            uriBuilder.Path = "/path2/";
            Uri newUri = uriBuilder.Uri; //

WsEncodeUrl (C++)

WsEncodeUrl, in addition to building a URI from components also does some encoding.  It encodes non-US-ASCII characters as UTF8, the percent, and a subset of gen-delims based on the URI property: all :/?#[]@ are percent-encoded except :/@ in the path and :/?@ in query and fragment.
Accordingly, WsEncodeUrl is not suitable for general purpose URI building.  It is acceptable to use in the following cases:
- You’re building a URI out of non-encoded URI properties and don’t care about the difference between encoded and decoded characters.  For instance you’re the only one consuming the URI and you uniformly decode URI properties when consuming – for instance using WsDecodeUrl to consume the URI.
- You’re building a URI with URI properties that don’t contain any of the characters that WsEncodeUrl encodes.


This functionality is provided by the WinRT API Windows.Foundation.Uri in C++ and JS and by System.Uri in .NET.  Normalization is applied during construction of the Uri object.
URI normalization is the application of URI normalization rules (including DNS normalization, IDN normalization, percent-encoding normalization, etc.) to the input URI.
        var normalizedUri = new Windows.Foundation.Uri("HTTP://EXAMPLE.COM/p%61th foo/");
        console.log(normalizedUri.absoluteUri); //
This is modulo Win8 812823 in which the Windows.Foundation.Uri.AbsoluteUri property returns a normalized IRI not a normalized URI.  This bug does not affect System.Uri.AbsoluteUri which returns a normalized URI.


This functionality is provided by the WinRT API Windows.Foundation.Uri in C++ and JS and by System.Uri in .NET. 
URI equality determines if two URIs are equal or not necessarily equal.
            var uri1 = new Windows.Foundation.Uri("HTTP://EXAMPLE.COM/p%61th foo/"),
                uri2 = new Windows.Foundation.Uri("");
            console.log(uri1.equals(uri2)); // true

Relative resolution

This functionality is provided by the WinRT API Windows.Foundation.Uri in C++ and JS and by System.Uri in .NET 
Relative resolution is a function that given an absolute URI A and a relative URI B, produces a new absolute URI C.  C is the combination of A and B in which the basic components specified in B override or combine with those in A under rules specified in RFC 3986.
        var baseUri = new Windows.Foundation.Uri(""),
            relativeUri = "/path?query#fragment",
            absoluteUri = baseUri.combineUri(relativeUri);
        console.log(baseUri.absoluteUri);       //
        console.log(absoluteUri.absoluteUri);   //

Encode data for including in URI property

This functionality is available in JavaScript via encodeURIComponent and in C# via System.Uri.EscapeDataString. Although the two methods mentioned above will suffice for this purpose, they do not perform exactly the same operation.
Additionally we now have Windows.Foundation.Uri.EscapeComponent in WinRT, which is available in JavaScript and C++ (not C# since it doesn’t have access to Windows.Foundation.Uri).  This is also slightly different from the previously mentioned mechanisms but works best for this purpose.
Encoding data for inclusion in a URI property is necessary when constructing a URI from data.  In all the above cases the developer is dealing with a URI or substrings of a URI and so the strings are all encoded as appropriate. For instance, in the parsing example the path contains “path%20segment1” and not “path segment1”.  To construct a URI one must first construct the basic components of the URI which involves encoding the data.  For example, if one wanted to include “path segment / example” in the path of a URI, one must percent-encode the ‘ ‘ since it is not allowed in a URI, as well as the ‘/’ since although it is allowed, it is a delimiter and won’t be interpreted as data unless encoded.
If a developer does not have this API provided they can write it themselves.  Percent-encoding methods appear simple to write, but the difficult part is getting the set of characters to encode correct, as well as handling non-US-ASCII characters.
        var uri = new Windows.Foundation.Uri("" +
            "/" + Windows.Foundation.Uri.escapeComponent("path segment / example") +
            "?key=" + Windows.Foundation.Uri.escapeComponent("=&?#"));
        console.log(uri.absoluteUri); //

WsEncodeUrl (C++)

In addition to building a URI from components, WsEncodeUrl also percent-encodes some characters.  However the API is not recommend for this scenario given the particular set of characters that are encoded and the convoluted nature in which a developer would have to use this API in order to use it for this purpose.
There are no general purpose scenarios for which the characters WsEncodeUrl encodes make sense: encode the %, encode a subset of gen-delims but not also encode the sub-delims.  For instance this could not replace encodeURIComponent in a C++ version of the following code snippet since if ‘value’ contained ‘&’ or ‘=’ (both sub-delims) they wouldn’t be encoded and would be confused for delimiters in the name value pairs in the query:
"" + Windows.Foundation.Uri.escapeComponent(value)
Since WsEncodeUrl produces a string URI, to obtain the property they want to encode they’d need to parse the resulting URI.  WsDecodeUrl won’t work because it decodes the property but Windows.Foundation.Uri doesn’t decode.  Accordingly the developer could run their string through WsEncodeUrl then Windows.Foundation.Uri to extract the property.

Decode data extracted from URI property

This functionality is available in JavaScript via decodeURIComponent and in C# via System.Uri.UnescapeDataString. Although the two methods mentioned above will suffice for this purpose, they do not perform exactly the same operation.
Additionally we now also have Windows.Foundation.Uri.UnescapeComponent in WinRT, which is available in JavaScript and C++ (not C# since it doesn’t have access to Windows.Foundation.Uri).  This is also slightly different from the previously mentioned mechanisms but works best for this purpose.
Decoding is necessary when extracting data from a parsed URI property.  For example, if a URI query contains a series of name and value pairs delimited by ‘=’ between names and values, and by ‘&’ between pairs, one must first parse the query into name and value entries and then decode the values.  It is necessary to make this an extra step separate from parsing the URI property so that sub-delimiters (in this case ‘&’ and ‘=’) that are encoded will be interpreted as data, and those that are decoded will be interpreted as delimiters.
If a developer does not have this API provided they can write it themselves.  Percent-decoding methods appear simple to write, but have some tricky parts including correctly handling non-US-ASCII, and remembering not to decode .
In the following example, note that if unescapeComponent were called first, the encoded ‘&’ and ‘=’ would be decoded and interfere with the parsing of the name value pairs in the query.
            var uri = new Windows.Foundation.Uri("");
                function (keyValueString) {
                    var keyValue = keyValueString.split("=");
                    console.log(Windows.Foundation.Uri.unescapeComponent(keyValue[0]) + ": " + Windows.Foundation.Uri.unescapeComponent(keyValue[1]));
                    // foo: bar
                    // array: ['','&','=','#']

WsDecodeUrl (C++)

Since WsDecodeUrl decodes all percent-encoded octets it could be used for general purpose percent-decoding but it takes a URI so would require the dev to construct a stub URI around the string they want to decode.  For example they could prefix “http:///#” to their string, run it through WsDecodeUrl and then extract the fragment property.  It is convoluted but will work correctly.

Parse Query

The query of a URI is often encoded as application/x-www-form-urlencoded which is percent-encoded name value pairs delimited by ‘&’ between pairs and ‘=’ between corresponding names and values.
In WinRT we have a class to parse this form of encoding using Windows.Foundation.WwwFormUrlDecoder.  The queryParsed property on the Windows.Foundation.Uri class is of this type and created with the query of its Uri:
    var uri = Windows.Foundation.Uri("");
        function (pair) {
            console.log("name: " + + ", value: " + pair.value);
            // name: foo, value: bar
            // name: array, value: ['','&','=','#']
    console.log(uri.queryParsed.getFirstValueByName("array")); // ['','&','=','#']
The QueryParsed property is only on Windows.Foundation.Uri and not System.Uri and accordingly is not available in .NET.  However the Windows.Foundation.WwwFormUrlDecoder class is available in C# and can be used manually:
            Uri uri = new Uri("");
            WwwFormUrlDecoder decoder = new WwwFormUrlDecoder(uri.Query);
            foreach (IWwwFormUrlDecoderEntry entry in decoder)
                System.Diagnostics.Debug.WriteLine("name: " + entry.Name + ", value: " + entry.Value);
                // name: foo, value: bar
                // name: array, value: ['','&','=','#']

Build Query

To build a query of name value pairs encoded as application/x-www-form-urlencoded there is no WinRT API to do this directly.  Instead a developer must do this manually making use of the code described in “Encode data for including in URI property”.
In terms of public releases, this property is only in the RC and later builds.
For example in JavaScript a developer may write:
            var uri = new Windows.Foundation.Uri(""),
                query = "?" + Windows.Foundation.Uri.escapeComponent("array") + "=" + Windows.Foundation.Uri.escapeComponent("['','&','=','#']");
            console.log(uri.combine(new Windows.Foundation.Uri(query)).absoluteUri); //'%E3%84%93'%2C'%26'%2C'%3D'%2C'%23'%5D
PermalinkCommentsc# c++ javascript technical uri windows windows-runtime windows-store

Shout Text Windows 8 App Development Notes

2013 Jun 27, 1:00

My first app for Windows 8 was Shout Text. You type into Shout Text, and your text is scaled up as large as possible while still fitting on the screen, as you type. It is the closest thing to a Hello World app as you'll find on the Windows Store that doesn't contain that phrase (by default) and I approached it as the simplest app I could make to learn about Windows modern app development and Windows Store app submission.

I rely on WinJS's default layout to use CSS transforms to scale up the user's text as they type. And they are typing into a simple content editable div.

The app was too simple for me to even consider using ads or charging for it which I learned more about in future apps.

The first interesting issue I ran into was that copying from and then pasting into the content editable div resulted in duplicates of the containing div with copied CSS appearing recursively inside of the content editable div. To fix this I had to catch the paste operation and remove the HTML data from the clipboard to ensure only the plain text data is pasted:

        function onPaste() {
var text;

if (window.clipboardData) {
text = window.clipboardData.getData("Text").toString();
window.clipboardData.setData("Text", util.normalizeContentEditableText(text));
shoutText.addEventListener("beforepaste", function () { return false; }, false);
shoutText.addEventListener("paste", onPaste, false);

I additionally found an issue in IE in which applying a CSS transform to a content editable div that has focus doesn't move the screen position of the user input caret - the text is scaled up or down but the caret remains the same size and in the same place on the screen. To fix this I made the following hack to reapply the current cursor position and text selection which resets the screen position of the user input caret.

        function resetCaret() {
setTimeout(function () {
var cursorPos = document.selection.createRange().duplicate();;
}, 200);

shoutText.attachEvent("onresize", function () { resetCaret(); }, true);
PermalinkCommentsdevelopment html javascript shout-text technical windows windows-store

Alternate IPv4 Forms - URI Host Syntax Notes

2012 Mar 14, 4:30

By the URI RFC there is only one way to represent a particular IPv4 address in the host of a URI. This is the standard dotted decimal notation of four bytes in decimal with no leading zeroes delimited by periods. And no leading zeros are allowed which means there's only one textual representation of a particular IPv4 address.

However as discussed in the URI RFC, there are other forms of IPv4 addresses that although not officially allowed are generally accepted. Many implementations used inet_aton to parse the address from the URI which accepts more than just dotted decimal. Instead of dotted decimal, each dot delimited part can be in decimal, octal (if preceded by a '0') or hex (if preceded by '0x' or '0X'). And that's each section individually - they don't have to match. And there need not be 4 parts: there can be between 1 and 4 (inclusive). In case of less than 4, the last part in the string represents all of the left over bytes, not just one.

For example the following are all equivalent:
Standard dotted decimal form
Fewer parts
All of the above

The bread and butter of URI related security issues is when one part of the system disagrees with another about the interpretation of the URI. So this non-standard, non-normal form syntax has been been a great source of security issues in the past. Its mostly well known now (CreateUri normalizes these non-normal forms to dotted decimal), but occasionally a good tool for bypassing naive URI blocking systems.

PermalinkCommentsurl inet_aton uri technical host programming ipv4

Canonicalization - Webmaster Tools Help

2010 Jan 25, 8:31PermalinkCommentsgoogle url normalize canonical technical web

rev=canonical: url shortening that doesn't hurt the internet

2009 Apr 7, 1:59A URL shortening service that tries to find the normal form (which hopefully translates to shorter in length) of a URL viaPermalinkCommentsvia:connolly tinyurl canonical normalize uri url

Encoding methods in C#

2008 Apr 12, 10:38

For Encode-O-Matic, my encoding tool written in C#, I had to figure out the appropriate DllImport declarations to use IDN Win32 functions which was a pain. To spare others that pain here's the two files CharacterSetEncoding.cs and NationalLanguageSupportUtilities.cs that declare the DllImports for IdnToUnicode, IdnToAscii, NormalizeString, MultiByteToWideChar, and WideCharToMultiByte.

PermalinkCommentsencodeomatic boring csharp widechartomultibyte idn tool dllimport

Extensible Markup Language (XML) 1.1 (Second Edition)

2008 Mar 18, 11:21End-of-line handling in XML. Spoiler: XML processor should normalize most newline character sequences to 0xA.PermalinkCommentsxml spec standard w3c unicode charset newline end-of-line

Canonical Node Representation - Social Graph API - Google Code

2008 Feb 8, 3:26Google's Social Graph API's normalization method. This is no way to treat a URI...PermalinkCommentsuri google api normalize social social-graph graph reference

IE7 Feed Display Update

2007 May 22, 3:22I've created an update to the IE7 feed display.

After working on my update to the XML source view I tried running my resourcelist program on other IE DLLs including ieframe. I found that one of the resources in ieframe is the XSLT used to turn an RSS feed into the IE7 feed display.

My first thought for this was that I could embed enclosures into the feed display. For instance, have controls for videos or podcast audio files directly in the feed display. However, I found that I can't use object or embed tags that rely on ActiveX controls in the page or in frames in the feed display.

With that through I decided I could at least add support for some RSS extensions. Thanks to IE7's RSS platform which provides a normalized view of RSS feeds it was really easy to do this. I went to several popular RSS feeds and RSS feeds that I like and took a look at the source to see what extensions I might want to add support for.

For I added support for their RSS extension which includes digg count, and submitter name and icon. I added the digg count in a box on the right and tried to make it fit in stylistically. For the iTunes RSS extension I add the feed icon, feed author, and descriptions. I was surprised by how much of the podcasts content was missing from the feed view. I also added support for a few other misc things: the slash RSS extension's section and department, the feed description to the top of the feed display, and the atom author icon.

I wonder what other goodies lurk in IE's resources...PermalinkCommentsfeed res slashdot digg resource itunes technical browser ie rss extension
Older Entries Creative Commons License Some rights reserved.