xpointer - Dave's Blog

Search
My timeline on Mastodon

Text/Plain Fragment Bookmarklet

2008 Nov 19, 12:58

The text/plain fragment documented in RFC 5147 and described on Erik Wilde's blog struck my interest and, like the XML fragment, I wanted to see if I could implement this in IE. In this case there's no XSLT for me to edit so, like my plain/text word wrap bookmarklet I've implemented it as a bookmarklet. This is only a partial implementation as it doesn't implement the integrity checks.

Check out my text/plain fragment bookmarklet.

PermalinkCommentstext url boring bookmarklet uri plain-text javascript fragment

URI Fragment Info Roundup

2008 Apr 21, 11:53

['Neverending story' by Alexandre Duret-Lutz. A framed photo of books with the droste effect applied. Licensed under creative commons.]Information about URI Fragments, the portion of URIs that follow the '#' at the end and that are used to navigate within a document, is scattered throughout various documents which I usually have to hunt down. Instead I'll link to them all here.

Definitions. Fragments are defined in the URI RFC which states that they're used to identify a secondary resource that is related to the primary resource identified by the URI as a subset of the primary, a view of the primary, or some other resource described by the primary. The interpretation of a fragment is based on the mime type of the primary resource. Tim Berners-Lee notes that determining fragment meaning from mime type is a problem because a single URI may contain a single fragment, however over HTTP a single URI can result in the same logical resource represented in different mime types. So there's one fragment but multiple mime types and so multiple interpretations of the one fragment. The URI RFC says that if an author has a single resource available in multiple mime types then the author must ensure that the various representations of a single resource must all resolve fragments to the same logical secondary resource. Depending on which mime types you're dealing with this is either not easy or not possible.

HTTP. In HTTP when URIs are used, the fragment is not included. The General Syntax section of the HTTP standard says it uses the definitions of 'URI-reference' (which includes the fragment), 'absoluteURI', and 'relativeURI' (which don't include the fragment) from the URI RFC. However, the 'URI-reference' term doesn't actually appear in the BNF for the protocol. Accordingly the headers like 'Request-URI', 'Content-Location', 'Location', and 'Referer' which include URIs are defined with 'absoluteURI' or 'relativeURI' and don't include the fragment. This is in keeping with the original fragment definition which says that the fragment is used as a view of the original resource and consequently only needed for resolution on the client. Additionally, the URI RFC explicitly notes that not including the fragment is a privacy feature such that page authors won't be able to stop clients from viewing whatever fragments the client chooses. This seems like an odd claim given that if the author wanted to selectively restrict access to portions of documents there are other options for them like breaking out the parts of a single resource to which the author wishes to restrict access into separate resources.

HTML. In HTML, the HTML mime type RFC defines HTML's fragment use which consists of fragments referring to elements with a corresponding 'id' attribute or one of a particular set of elements with a corresponding 'name' attribute. The HTML spec discusses fragment use additionally noting that the names and ids must be unique in the document and that they must consist of only US-ASCII characters. The ID and NAME attributes are further restricted in section 6 to only consist of alphanumerics, the hyphen, period, colon, and underscore. This is a subset of the characters allowed in the URI fragment so no encoding is discussed since technically its not needed. However, practically speaking, browsers like FireFox and Internet Explorer allow for names and ids containing characters outside of the defined set including characters that must be percent-encoded to appear in a URI fragment. The interpretation of percent-encoded characters in fragments for HTML documents is not consistent across browsers (or in some cases within the same browser) especially for the percent-encoded percent.

Text. Text/plain recently got a fragment definition that allows fragments to refer to particular lines or characters within a text document. The scheme no longer includes regular expressions, which disappointed me at first, but in retrospect is probably good idea for increasing the adoption of this fragment scheme and for avoiding the potential for ubiquitous DoS via regex. One of the authors also notes this on his blog. I look forward to the day when this scheme is widely implemented.

XML. XML has the XPointer framework to define its fragment structure as noted by the XML mime type definition. XPointer consists of a general scheme that contains subschemes that identify a subset of an XML document. Its too bad such a thing wasn't adopted for URI fragments in general to solve the problem of a single resource with multiple mime type representations. I wrote more about XPointer when I worked on hacking XPointer into IE.

SVG and MPEG. Through the Media Fragments Working Group I found a couple more fragment scheme definitions. SVG's fragment scheme is defined in the SVG documentation and looks similar to XML's. MPEG has one defined but I could only find it as an ISO document "Text of ISO/IEC FCD 21000-17 MPEG-12 FID" and not as an RFC which is a little disturbing.

AJAX. AJAX websites have used fragments as an escape hatch for two issues that I've seen. The first is getting a unique URL for versions of a page that are produced on the client by script. The fragment may be changed by script without forcing the page to reload. This goes outside the rules of the standards by using HTML fragments in a fashion not called out by the HTML spec. but it does seem to be inline with the spirit of the fragment in that it is a subview of the original resource and interpretted client side. The other hack-ier use of the fragment in AJAX is for cross domain communication. The basic idea is that different frames or windows may not communicate in normal fashions if they have different domains but they can view each other's URLs and accordingly can change their own fragments in order to send a message out to those who know where to look. IMO this is not inline with the spirit of the fragment but is rather a cool hack.

PermalinkCommentsxml text ajax technical url boring uri fragment rfc

Zune Software Update

2007 Nov 19, 3:47PermalinkCommentsmicrosoft technical music zune social

XPointer Framework - IE7 XML Source View Upgrade Part 3

2007 May 17, 5:16Previously I created some resource tools and then I used them to overwrite msxml3's XML source view. In this update I've added support for the XPointer Framework.

This time around I've started to add support for the XPointer Framework to my XML source view and I've added installation instructions. The framework consists of a series of pointer segments each of which has a scheme name followed by data in parenthesis. For example 'scheme1(data1)scheme2(data2)scheme3(data3)'. A pointer segment resolves to a portion of the XML document based on the data and the scheme name. The whole pointer resolves to the first segment that successfully resolves. That is, from the example, if scheme1 resolves to nothing and scheme2 resolves to something then that's used and scheme3 is ignored. In addition to the framework I've added support for the xmlns scheme which binds namespace prefixes to a namespace URI and the element scheme which is a simple way to resolve to particular elements in an XML. I also have limited support for the xpointer scheme the content of which is resolved as an XPath with some extra functions (which I don't support -- hence the limited). I've also thrown in schemes for the two SelectionLanguage values supported by msxml3.

Next time I might try to support the xpointer functions that aren't in xpath using msxml script. But I think I'm losing steam on this project... we'll see.PermalinkCommentsresource technical xml xpointer res xpath xslt

New XSLT - IE7 XML Source View Upgrade Part 2

2007 May 11, 8:55Last time, I had written some resource tools to allow me to view and modify Windows module resources in my ultimate and noble quest to implement the XML content-type fragment in IE7. Using the resource tools I found that MSXML3.DLL isn't signed and that I can replace the XSLT embedded resource with my own, which is great news and means I could continue in my endevour. In the following I discuss how I came up with this replacement for IE7's XML source view.

At first I thought I could just modify the existing XSLT but it turns out that it isn't exactly an XSLT, rather its an IE5 XSL. I tried using the XSL to XSLT converter linked to on MSDN, however the resulting document still requires manual modification. But I didn't want to muck about in their weird language and I figured I could write my own XSLT faster than I could figure out how theirs worked.

I began work on the new XSLT and found it relatively easy to produce. First I got indenting working with all the XML nodes represented appropriately and different CSS classes attached to them to make it easy to do syntax highlighting. Next I added in some javascript to allow for closing and opening of elements. At this point my XSLT had the same features as the original XSL.

Next was the XML mimetype fragment which uses XPointer, a framework around various different schemes for naming parts of an XML document. I focused on the XPointer scheme which is an extended version of XPath. So I named my first task as getting XPaths working. Thankfully javascript running in the HTML document produced by running my XSLT on an XML document has access to the original XML document object via the document.XMLDocument property. From this this I can execute XPaths, however there's no builtin way to map from the XML nodes selected by the XPath to the HTML elements that I produced to represent them. So I created a recursive javascript function and XSLT named-template that both produce the same unique strings based on an XML node's position in the document. For instance 'a3-e2-e' is the name produced for the 3rd attribute of the second element of the root element of the XML document. When producing the HTML for an XML node, I add an 'id' attribute to the HTML with the unique string of the XML node. Then in javascript when I execute an XPath I can discover the unique string of each node in the selected set and map each of them to their corresponding positions in the HTML.

With the hard part out of the way I changed the onload to get the fragment of the URI of the current document, interpret it as an XPath and highlight and navigate to the selected nodes. I also added an interactive floating bar from which you can enter your own XPaths and do the same. On a related note, I found that when accessing XML files via the file URI scheme the fragment is stripped off and not available to the javascript.

The next steps are of course to actually implement XPointer framework parsing as well as the limited number of schemes that the XPointer framework specifies.PermalinkCommentsxml xpointer msxml res xpath xslt resource ie7 technical browser ie xsl

XML Pointer Language (XPointer)

2007 May 10, 12:17The XPointer specification describing the fragment used with text/xml documents.PermalinkCommentsw3c xml xpath xpointer reference uri fragment

XPointer Framework

2007 Feb 22, 10:44The standard for URI fragments for identifying portions of an XML document. I've been looking for this...PermalinkCommentsxml xpointer w3c specification standards xpath uri fragment
Older Entries Creative Commons License Some rights reserved.