url page 9 - Dave's Blog

HTML 5 - 5.7.2 Custom protocol and content handlers

2009 Apr 7, 10:45

HTML 5 allows websites to register themselves as handlers of particular URI schemes and particular content-types. I think this is great, but I'm surprised it doesn't support POSTing files to allow for interactions with local content.

Tags: html5 url uri protocol reference html standard javascript webbrowser registerProtocolHandler

Thoughts on registerProtocolHandler in HTML 5

2009 Apr 7, 9:02

I'm a big fan of the concept of registerProtocolHandler in HTML 5 and in Firefox 3, but not quite the implementation. At a high level, it allows web apps to register themselves as handlers of a URL scheme; for (the canonical) example, Gmail can register for the mailto URL scheme. I like the concept:

However, the way it's currently spec'ed out I don't like the following:

Tags: url template registerprotocolhandler firefox technical url scheme protocol boring html5 uri urn
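A minimal sketch of the URL template idea behind registerProtocolHandler, assuming the handler is registered with a template containing `%s` that gets replaced by the percent-encoded target URL. The helper name `expandHandlerTemplate` and the `mail.example.com` handler address are hypothetical and only serve as illustration.

```javascript
// Sketch of the "%s" substitution performed for a registered handler.
// expandHandlerTemplate and the handler URL are hypothetical names.
function expandHandlerTemplate(template, targetUrl) {
  if (!template.includes('%s')) {
    throw new Error('handler template must contain %s');
  }
  // The whole target URL is percent-encoded so it survives as one query value.
  return template.replace('%s', encodeURIComponent(targetUrl));
}

// In a browser, a web app would register itself roughly like this:
//   navigator.registerProtocolHandler('mailto', 'https://mail.example.com/compose?to=%s');
const expanded = expandHandlerTemplate(
  'https://mail.example.com/compose?to=%s',
  'mailto:someone@example.com?subject=hi'
);
console.log(expanded);
```

Encoding the entire URL, rather than only parts of it, keeps the scheme and any query string of the original link intact for the handling web app to parse.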

Web addresses in HTML 5

2009 Mar 23, 11:06

The HTML5 spec tells us how it is in the real world for URLs: "This specification defines various algorithms for dealing with Web addresses intended for use by HTML user agents. For historical reasons, in order to be compatible with existing Web content HTML user agents need to implement a number of processes not defined by the URI and IRI specifications [RFC3986], [RFC3987]."

Tags: html html5 url uri reference w3c

Chart Types - Google Chart API - Google Code

2009 Mar 12, 12:04

Google's chart API can generate QR codes. Just specify the chart type as 'qr' and the data you want encoded in the URL, and the returned resource is a QR code image for that data. I just installed a QR code reader on my phone.

Tags: qr barcode google api chart mobile web cellphone qrcode
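A rough sketch of building such a chart URL. The endpoint host and the parameter names (`cht` for chart type, `chs` for size, `chl` for the data) are assumptions based on the historical Chart API and may no longer be served; the helper name `qrChartUrl` is hypothetical.

```javascript
// Build a chart URL that asks for a QR code image.
// ASSUMPTION: endpoint and parameter names follow the historical Chart API.
function qrChartUrl(data, size = 150) {
  const params = new URLSearchParams({
    cht: 'qr',                 // chart type
    chs: `${size}x${size}`,    // image size in pixels
    chl: data,                 // the data to encode
  });
  return `https://chart.googleapis.com/chart?${params}`;
}

console.log(qrChartUrl('http://example.com/'));
```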

Exposing RSS Comments

2009 Mar 10, 1:27

Description of the wfw:commentRss RSS extension: the content of the element is a URL to a feed of the comments for the particular RSS item. Exactly the sort of thing I was looking for a couple of years ago. At the time none of my web services used it, but now the Delicious v2 feed uses it! Maybe it's time to reexamine this sort of thing...

Tags: rss comment feed reference blog namespace xml wfw

The 'Is It UTF-8?' Quick and Dirty Test

2009 Mar 6, 5:16

I've found while debugging networking in IE it's often useful to quickly tell if a string is encoded in UTF-8. You can check for the Byte Order Mark (EF BB BF in UTF-8) but I rarely see the BOM on UTF-8 strings. Instead I apply a quick and dirty UTF-8 test that takes advantage of the well-formed UTF-8 restrictions.

Unlike other multibyte character encoding forms (see Windows supported character sets or IANA's list of character sets), for example Big5, where sticking together any two bytes is more likely than not to give a valid byte sequence, UTF-8 is more restrictive. And unlike other multibyte character encodings, UTF-8 bytes may be taken out of context and one can still know that it's a single byte character, the starting byte of a three byte sequence, etc.

The full rules for well-formed UTF-8 are a little too complicated for me to commit to memory. Instead I've got my own simpler (this is the quick part) set of rules that will be mostly correct (this is the dirty part). For as many bytes in the string as you care to examine, check the most significant hex digit of the byte:

F:
This is byte 1 of a 4 byte encoded codepoint and must be followed by 3 trail bytes.
E:
This is byte 1 of a 3 byte encoded codepoint and must be followed by 2 trail bytes.
C..D:
This is byte 1 of a 2 byte encoded codepoint and must be followed by 1 trail byte.
8..B:
This is a trail byte.
0..7:
This is a single byte encoded codepoint.

The simpler rules can produce false positives in some cases: that is, they'll say a string is UTF-8 when in fact it might not be. But they won't produce false negatives. The following table from the Unicode spec describes exactly what well-formed UTF-8 is.

Code Points 1st Byte 2nd Byte 3rd Byte 4th Byte
U+0000..U+007F 00..7F
U+0080..U+07FF C2..DF 80..BF
U+0800..U+0FFF E0 A0..BF 80..BF
U+1000..U+CFFF E1..EC 80..BF 80..BF
U+D000..U+D7FF ED 80..9F 80..BF
U+E000..U+FFFF EE..EF 80..BF 80..BF
U+10000..U+3FFFF F0 90..BF 80..BF 80..BF
U+40000..U+FFFFF F1..F3 80..BF 80..BF 80..BF
U+100000..U+10FFFF F4 80..8F 80..BF 80..BF
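
The simplified rules can be sketched as a small function. This is an illustration rather than a validator: the name `looksLikeUtf8` is hypothetical, and the sketch assumes that a sequence cut off at the end of the examined bytes counts as a failure.

```javascript
// Quick and dirty UTF-8 check following the simplified rules above.
// Returns true if every lead byte is followed by the expected number of trail bytes.
function looksLikeUtf8(bytes) {
  let i = 0;
  while (i < bytes.length) {
    const high = bytes[i] >> 4; // most significant hex digit of the byte
    let trail;
    if (high <= 0x7) trail = 0;          // 0..7: single byte codepoint
    else if (high <= 0xB) return false;  // 8..B: trail byte with no lead byte
    else if (high <= 0xD) trail = 1;     // C..D: lead byte of a 2 byte sequence
    else if (high === 0xE) trail = 2;    // E: lead byte of a 3 byte sequence
    else trail = 3;                      // F: lead byte of a 4 byte sequence
    for (let k = 1; k <= trail; k++) {
      const b = bytes[i + k];
      // Fail if the bytes ran out or the byte is not in 80..BF.
      if (b === undefined || (b >> 6) !== 0b10) return false;
    }
    i += trail + 1;
  }
  return true;
}

console.log(looksLikeUtf8([0x68, 0xC3, 0xA9])); // "h" followed by a 2 byte sequence
console.log(looksLikeUtf8([0xE9, 0x20]));       // lead byte E9 followed by a non-trail byte
console.log(looksLikeUtf8([0xC0, 0x80]));       // passes the quick test, but C0 is not in the table
```

The last call shows the dirty part: C0 80 satisfies the simplified rules even though the table only allows C2..DF as the first byte of a 2 byte sequence.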

Tags: test technical unicode boring charset utf8 encoding

Official Google Webmaster Central Blog: Specify your canonical

2009 Feb 14, 5:41

"Now, you can simply add this link tag to specify your preferred version... and Google will understand that the duplicates all refer to the canonical URL: http://www.example.com/product.php?item=swedish-fish. Additional URL properties, like PageRank and related signals, are transferred as well."

Tags: via:mattb google link html url uri canonical canonicalization web

draft-masinter-dated-uri-05 - names are readily assigned, offer the persistence of reference that is required by URNs, but do not require a stable authority to assign the name. The first namespace ("duri") is used to refer to URI-

2009 Feb 4, 4:30

New URN schemes with no central minting authority. duri allows you to name a resource that was identified by the specified URI at the specified date (e.g. the IETF's homepage as it was at the end of the year 2001). tdb allows you to name a physical object or entity that was described by a resource that was identified by a specified URI at the specified date (e.g. the IETF, the organization, as referenced by its homepage at the end of the year 2001). The date format is concise but I'd prefer RFC 3339 rather than roping in another date format.

Tags: duri tdb uri url scheme reference ietf date datetime rfc

Post to Twitter using the command line - Download Squad

2009 Jan 15, 10:28

Thanks to Matt, for the first time I can see myself using Twitter. The Twitter app on my phone notifies me when something's posted, so my build process can let me know when it's done, or when sync finally finishes, etc. I'd been meaning to set up a mini-notification system from a command line tool to my phone (w/o paying per text msg) but I didn't think of Twitter.

Tags: via:swannman api internet curl cli twitter

Broke Man Tries Paying Bill With a Picture of a Spider - Urlesque

2008 Nov 20, 10:58

I, like Matt, am a bit incredulous but this is still funny. "Check, cash or money order are acceptable forms of payment when the bill collector comes knocking (or e-mailing), not a picture you doodled of a spider."

Tags: via:swannman humor art spider money

Text/Plain Fragment Bookmarklet

2008 Nov 19, 12:58

The text/plain fragment documented in RFC 5147 and described on Erik Wilde's blog piqued my interest and, like the XML fragment, I wanted to see if I could implement it in IE. In this case there's no XSLT for me to edit so, like my plain text word wrap bookmarklet, I've implemented it as a bookmarklet. This is only a partial implementation as it doesn't implement the integrity checks.

Check out my text/plain fragment bookmarklet.
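
A minimal sketch of how the `line=` form of an RFC 5147 fragment can be resolved against a text document. This is not the bookmarklet's code: the function name `selectLines` is hypothetical, only the line scheme is handled, and any integrity checks after a `;` are ignored. Positions are counted from zero between lines, so `line=1,3` selects the second and third lines.

```javascript
// Partial sketch of RFC 5147 "line=" fragments; integrity checks are ignored.
function selectLines(text, fragment) {
  // Drop any integrity checks that follow the scheme part after a ";".
  const scheme = fragment.replace(/^#/, '').split(';')[0];
  const match = /^line=(\d*)(?:,(\d*))?$/.exec(scheme);
  if (!match) return null; // not a line fragment this sketch understands
  const lines = text.split(/\r\n|\r|\n/);
  const start = match[1] === '' ? 0 : Number(match[1]);
  let end;
  if (match[2] === undefined) end = start;          // single position: selects nothing
  else if (match[2] === '') end = lines.length;     // open range: runs to the end
  else end = Number(match[2]);
  return lines.slice(start, Math.min(end, lines.length));
}

console.log(selectLines('a\nb\nc\nd', '#line=1,3'));
```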

Tags: text url boring bookmarklet uri plain-text javascript fragment

Word Wrapping IE's Plain Text

2008 Oct 28, 11:23

If you view a plain text document in Internet Explorer 8, for instance the plain text version of Cory Doctorow's book Little Brother, and press F12 to bring up the developer toolbar, you can see that IE simply takes the plain text, sticks it inside a <pre> tag, and renders it. This means that word wrapping isn't supplied and the only line breaks that appear are those in the document. However, since the text document is converted to HTML, I can implement word wrap myself using a bookmarklet:
javascript:function ww() { var preTag = document.getElementsByTagName('pre')[0]; preTag.style.fontFamily="arial"; preTag.style.wordWrap='break-word'; }; ww();
After adding a favorite and setting the favorite's URL to the code above, I can view plain text documents and select my Word Wrap favorite to apply word wrap and a non-fixed-width font.

Tags: browser technical ie wordwrap

Investigation of a Few Application Protocols (Updated)

2008 Oct 25, 6:51

Windows allows for application protocols in which, through the registry, you specify a URL scheme and a command line to have that URL passed to your application. It's an easy way to hook a web browser up to your application. Anyone can read the doc above and then walk through the registry and pick out the application protocols, but from that info alone you can't tell what the application expects these URLs to look like. I did a bit of research on some of the application protocols I've seen, which is listed below. Good places to look for information on URI schemes: Wikipedia URI scheme, and ESW Wiki UriSchemes.

Some Application Protocols and associated documentation.

Scheme: search-ms
Name: Windows Search Protocol
Notes: The search-ms application protocol is a convention for querying the Windows Search index. The protocol enables applications, like Microsoft Windows Explorer, to query the index with parameter-value arguments, including property arguments, previously saved searches, Advanced Query Syntax, Natural Query Syntax, and language code identifiers (LCIDs) for both the Indexer and the query itself. See the MSDN docs for search-ms for more info.
Example: search-ms:query=food
Explorer.AssocProtocol.search-ms

Scheme: OneNote
Name: OneNote Protocol
Notes: From the OneNote help: /hyperlink "pagetarget" - Starts OneNote and opens the page specified by the pagetarget parameter. To obtain the hyperlink for any page in a OneNote notebook, right-click its page tab and then click Copy Hyperlink to this Page.
Example: onenote:///\\GUMMO\Users\davris\Documents\OneNote%20Notebooks\OneNote%202007%20Guide\Getting%20Started%20with%20OneNote.one#section-id={692F45F5-A42A-415B-8C0D-39A10E88A30F}&end

Scheme: callto
Name: Callto Protocol
Notes: ESW Wiki Info on callto
Skype callto info
NetMeeting callto info
Example: callto://+12125551234

Scheme: itpc
Name: iTunes Podcast
Notes: Tells iTunes to subscribe to an indicated podcast. iTunes documentation.
C:\Program Files\iTunes\iTunes.exe /url "%1"
Example: itpc:http://www.npr.org/rss/podcast.php?id=35
iTunes.AssocProtocol.itpc

Scheme: pcast
iTunes.AssocProtocol.pcast

Scheme: Magnet
Name: Magnet URI
Notes: Magnet URL scheme described by Wikipedia. Magnet URLs identify a resource by a hash of that resource so that when used in P2P scenarios no central authority is necessary to create URIs for a resource.

Scheme: mailto
Name: Mail Protocol
Notes: RFC 2368 - Mailto URL Scheme.
Mailto Syntax
Opens mail programs with new message with some parameters filled in, such as the to, from, subject, and body.
Example: mailto:?to=david.risney@gmail.com&subject=test&body=Test of mailto syntax
WindowsMail.Url.Mailto

Scheme: MMS
Name: mms Protocol
Notes: MSDN describes associated protocols.
Wikipedia describes MMS.
"C:\Program Files\Windows Media Player\wmplayer.exe" "%L"
Also appears to be related to MMS cellphone messages: MMS IETF Draft.
WMP11.AssocProtocol.MMS

Scheme: secondlife
Name: [SecondLife]
Notes: Opens SecondLife to the specified location, user, etc.
SecondLife Wiki description of the URL scheme.
"C:\Program Files\SecondLife\SecondLife.exe" -set SystemLanguage en-us -url "%1"
Example: secondlife://ahern/128/128/128

Scheme: skype
Name: Skype Protocol
Notes: Open Skype to call a user or phone number.
Skype's documentation
Wikipedia summary of skype URL scheme
"C:\Program Files\Skype\Phone\Skype.exe" "/uri:%l"
Example: skype:+14035551111?call

Scheme: skype-plugin
Name: Skype Plugin Protocol Handler
Notes: Something to do with adding plugins to skype? Maybe.
"C:\Program Files\Skype\Plugin Manager\skypePM.exe" "/uri:%1"

Scheme: svn
Name: SVN Protocol
Notes: Opens TortoiseSVN to browse the repository URL specified in the URL.
C:\Program Files\TortoiseSVN\bin\TortoiseProc.exe /command:repobrowser /path:"%1"

Scheme: svn+ssh

Scheme: tsvn

Scheme: webcal
Name: Webcal Protocol
Notes: Wikipedia describes webcal URL scheme.
Webcal URL scheme description.
A URL that starts with webcal:// points to an Internet location that contains a calendar in iCalendar format.
"C:\Program Files\Windows Calendar\wincal.exe" /webcal "%1"
Example: webcal://www.lightstalkers.org/LS.ics
WindowsCalendar.UrlWebcal.1

Scheme: zune
Name: Zune Protocol
Notes: Provides access to some Zune operations such as podcast subscription (via Zune Insider).
"c:\Program Files\Zune\Zune.exe" -link:"%1"
Example: zune://subscribe/?name=http://feeds.feedburner.com/wallstrip.

Scheme: feed
Name: Outlook Add RSS Feed
Notes: Identify a resource that is a feed such as Atom or RSS. Implemented by Outlook to add the indicated feed to Outlook.
Feed URI scheme pre-draft document
"C:\PROGRA~2\MICROS~1\Office12\OUTLOOK.EXE" /share "%1"

Scheme: im
Name: IM Protocol
Notes: RFC 3860 IM URI scheme description
Like mailto but for instant messaging clients.
Registered by Office Communicator but I was unable to get it to work as described in RFC 3860.
"C:\Program Files (x86)\Microsoft Office Communicator\Communicator.exe" "%1"

Scheme: tel
Name: Tel Protocol
Notes: RFC 5341 - tel URI scheme IANA assignment
RFC 3966 - tel URI scheme description
Call phone numbers via the tel URI scheme. Implemented by Office Communicator.
"C:\Program Files (x86)\Microsoft Office Communicator\Communicator.exe" "%1"

(Updated 2008-10-27: Added feed, im, and tel from Office Communicator)

Tags: technical application protocol shell url windows
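
The registry layout behind these application protocols (a key named after the scheme, a "URL Protocol" value, and a shell\open\command key holding the command line) can be sketched as a function that produces the text of a .reg file. This is an illustrative sketch only: the function name `regFileForScheme`, the `example` scheme, and the executable path are all hypothetical.

```javascript
// Build the text of a .reg file that registers an application protocol.
// The scheme name and executable path in the usage below are hypothetical.
function regFileForScheme(scheme, description, exePath) {
  // .reg string values escape backslashes and double quotes with a backslash.
  const esc = (s) => s.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  const command = `"${exePath}" "%1"`; // %1 is replaced with the full URL
  return [
    'Windows Registry Editor Version 5.00',
    '',
    `[HKEY_CLASSES_ROOT\\${scheme}]`,
    `@="${esc('URL:' + description)}"`,
    '"URL Protocol"=""',
    '',
    `[HKEY_CLASSES_ROOT\\${scheme}\\shell\\open\\command]`,
    `@="${esc(command)}"`,
    '',
  ].join('\r\n');
}

const reg = regFileForScheme(
  'example',
  'Example Protocol',
  'C:\\Program Files\\Example\\example.exe'
);
console.log(reg);
```

The command line is the only hint the registry gives about how an application consumes its URLs, which is why the expected URL syntax has to be researched separately for each scheme.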

THOMAS Publishes Permanent Links (Another Recommendation Realized) | The Open House Project

2008 Oct 10, 3:35

Apparently thanks to the Open House Project, US legislation can now have real and permanent links. I'm kind of surprised that legislation would exist so freely on the Internet without real links. The Open House Project is "a collaborative effort by government and legislative information experts, congressional staff, non-profit organizers and bloggers to study how the House of Representatives currently integrates the Internet into its operations, and to suggest attainable reforms to promote public access to its work and members."

Tags: internet url link uri politics

IE8 Beta2 Shipped

2008 Aug 27, 11:36

Internet Explorer 8 Beta 2 is now available! Some of the new features from this release that I really enjoy are Tab Grouping, the new address bar, and InPrivate Subscriptions.

Tab Grouping groups tabs that are opened from the same page. For example, on a Google search results page if you open the first two links the two new tabs will be grouped with the Google search results page. If you close one of the tabs in that group focus goes to another tab in that group. It's small, but I really enjoy this feature and without knowing exactly what I wanted while using IE7 and FF2 I knew I wanted something like this. Plus the colors for the tab groups are pretty!

The new address bar and search box make life much easier by searching through my browsing history for whatever I'm typing in. Other things are searched besides history but since I ignore favorites and use Delicious I mostly care about history. At any rate it's one of the things that makes it impossible for me to go back to machines running IE7.

InPrivate Subscriptions allows you to subscribe to a feed of URLs from which IE should not download content. This is intended for avoiding sites that track you across websites and could sell or share your personal information, but this feature could be used for anything where the goal is to avoid a set of URLs: for example, phishing, malware sites, ad blocking, etc. I think there are some interesting uses for this feature that we have yet to see.

Anyway, we're another release closer to the final IE8 and I can relax a little more.

Tags: microsoft browser technical ie8 ie

Tag Metadata in Feeds

2008 Aug 25, 10:13

As noted previously, my page consists of the aggregation of my various feeds and in working on that code recently it was again brought to my attention that everyone has different ways of representing tag metadata in feeds. I made up a list of how my various feed sources represent tags and list that data here so that it might help others in the future.

Tag markup from various sources
Source | Feed Type | Tag Markup Scheme | One Tag Per Element | Tag Scheme URI | Human / Machine Names | Example Markup
LiveJournal | Atom | atom:category | yes | no | no | (source)
LiveJournal | RSS 2.0 | rss2:category | yes | no | no | <category>technical</category> (source)
WordPress | RSS 2.0 | rss2:category | yes | no | no | (source)
Delicious | RSS 1.0 | dc:subject | no | no | no | <dc:subject>photosynth photos 3d tool</dc:subject> (source)
Delicious | RSS 2.0 | rss2:category | yes | yes | no | <category domain="http://delicious.com/SequelGuy/">hulu</category> (source)
Flickr | Atom | atom:category | yes | yes | no | <category term="seattle" scheme="http://www.flickr.com/photos/tags/" /> (source)
Flickr | RSS 2.0 | media:category | no | yes | no | <media:category scheme="urn:flickr:tags">seattle washington baseball mariners</media:category> (source)
YouTube | RSS 2.0 | media:category | no | no | no | <media:category label="Tags">bunny rabbit yawn cadbury</media:category> (source)
LibraryThing | RSS 2.0 | No explicit tag metadata. | no | no | no | n/a (source)

Tag markup scheme
Tag Markup Scheme | Notes | Example

Atom Category
atom:category
xmlns:atom="http://www.w3.org/2005/Atom"
category/@term: Required category name.
category/@scheme: Optional IRI id'ing the categorization scheme.
category/@label: Optional human readable category name.
Example: <category term="catName" scheme="tag:deletethis.net,2008:tagscheme" label="category name in human readable format"/>

RSS 2.0 category
rss2:category
empty namespace
category/@domain: Optional string id'ing the categorization scheme.
category/text(): Required category name. The value of the element is a forward-slash-separated string that identifies a hierarchic location in the indicated taxonomy. Processors may establish conventions for the interpretation of categories.
Example: <category domain="tag:deletethis.net,2008:tagscheme">MSFT</category>

Yahoo Media RSS Module category
media:category
xmlns:media="http://search.yahoo.com/mrss/"
category/text(): Required category name.
category/@scheme: Optional string id'ing the categorization scheme.
Example: <media:category scheme="http://dmoz.org" label="Ace Ventura - Pet Detective">Arts/Movies/Titles/A/Ace_Ventura_Series/Ace_Ventura_-_Pet_Detective</media:category>

Dublin Core subject
dc:subject
xmlns:dc="http://purl.org/dc/elements/1.1/"
subject/text(): Required category name. Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary.
Example: <dc:subject>humor</dc:subject>
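
A minimal sketch of how an aggregator might normalize these differing representations into one list of tag names. It assumes each feed element has already been parsed into a plain object; that `{ scheme, text, attrs }` shape and the function name `extractTags` are hypothetical, and the space-splitting rule reflects only the feeds surveyed in this post, not the specs in general.

```javascript
// Normalize tags from one feed element into an array of tag names.
// `element` is a hypothetical pre-parsed shape: { scheme, text, attrs }.
function extractTags(element) {
  const text = (element.text || '').trim();
  switch (element.scheme) {
    case 'atom:category':
      // Atom carries the category name in the required term attribute.
      return element.attrs && element.attrs.term ? [element.attrs.term] : [];
    case 'rss2:category':
      // One tag per element in the feeds surveyed here.
      return text ? [text] : [];
    case 'dc:subject':
    case 'media:category':
      // Delicious, Flickr, and YouTube pack several space-separated tags into one element.
      return text ? text.split(/\s+/) : [];
    default:
      return [];
  }
}

console.log(extractTags({ scheme: 'media:category', text: 'seattle washington baseball mariners' }));
console.log(extractTags({ scheme: 'atom:category', attrs: { term: 'seattle' } }));
```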

Update 2009-9-14: Added WordPress to the Tag Markup table and namespaces to the Tag Markup Scheme table.

Tags: feed media delicious technical atom youtube yahoo rss tag

URI Fragment Info Roundup

2008 Apr 21, 11:53

['Neverending story' by Alexandre Duret-Lutz. A framed photo of books with the droste effect applied. Licensed under creative commons.]

Information about URI Fragments, the portion of URIs that follow the '#' at the end and that are used to navigate within a document, is scattered throughout various documents which I usually have to hunt down. Instead I'll link to them all here.

Definitions. Fragments are defined in the URI RFC which states that they're used to identify a secondary resource that is related to the primary resource identified by the URI as a subset of the primary, a view of the primary, or some other resource described by the primary. The interpretation of a fragment is based on the mime type of the primary resource. Tim Berners-Lee notes that determining fragment meaning from mime type is a problem because a single URI may contain only a single fragment, yet over HTTP a single URI can result in the same logical resource represented in different mime types. So there's one fragment but multiple mime types and so multiple interpretations of the one fragment. The URI RFC says that if an author has a single resource available in multiple mime types then the author must ensure that the various representations all resolve fragments to the same logical secondary resource. Depending on which mime types you're dealing with this is either not easy or not possible.

HTTP. In HTTP when URIs are used, the fragment is not included. The General Syntax section of the HTTP standard says it uses the definitions of 'URI-reference' (which includes the fragment), 'absoluteURI', and 'relativeURI' (which don't include the fragment) from the URI RFC. However, the 'URI-reference' term doesn't actually appear in the BNF for the protocol. Accordingly the headers like 'Request-URI', 'Content-Location', 'Location', and 'Referer' which include URIs are defined with 'absoluteURI' or 'relativeURI' and don't include the fragment. This is in keeping with the original fragment definition which says that the fragment is used as a view of the original resource and consequently only needed for resolution on the client. Additionally, the URI RFC explicitly notes that not including the fragment is a privacy feature such that page authors won't be able to stop clients from viewing whatever fragments the client chooses. This seems like an odd claim given that if the author wanted to selectively restrict access to portions of documents there are other options for them like breaking out the parts of a single resource to which the author wishes to restrict access into separate resources.
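
A small sketch of the split described above: the part of a URI reference that goes over HTTP versus the fragment that stays on the client. The function name `splitFragment` is hypothetical.

```javascript
// Split a URI reference into the part sent over HTTP and the client-only fragment.
function splitFragment(uriReference) {
  const hashIndex = uriReference.indexOf('#');
  if (hashIndex === -1) {
    return { requestUri: uriReference, fragment: null };
  }
  return {
    requestUri: uriReference.slice(0, hashIndex), // what the server sees
    fragment: uriReference.slice(hashIndex + 1),  // resolved on the client only
  };
}

console.log(splitFragment('http://example.com/doc.html#section-2'));
```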

HTML. In HTML, the HTML mime type RFC defines HTML's fragment use which consists of fragments referring to elements with a corresponding 'id' attribute or one of a particular set of elements with a corresponding 'name' attribute. The HTML spec discusses fragment use, additionally noting that the names and ids must be unique in the document and that they must consist of only US-ASCII characters. The ID and NAME attributes are further restricted in section 6 to only consist of alphanumerics, the hyphen, period, colon, and underscore. This is a subset of the characters allowed in the URI fragment so no encoding is discussed since technically it's not needed. However, practically speaking, browsers like Firefox and Internet Explorer allow for names and ids containing characters outside of the defined set including characters that must be percent-encoded to appear in a URI fragment. The interpretation of percent-encoded characters in fragments for HTML documents is not consistent across browsers (or in some cases within the same browser) especially for the percent-encoded percent.

Text. Text/plain recently got a fragment definition that allows fragments to refer to particular lines or characters within a text document. The scheme no longer includes regular expressions, which disappointed me at first, but in retrospect is probably a good idea for increasing the adoption of this fragment scheme and for avoiding the potential for ubiquitous DoS via regex. One of the authors also notes this on his blog. I look forward to the day when this scheme is widely implemented.

XML. XML has the XPointer framework to define its fragment structure as noted by the XML mime type definition. XPointer consists of a general scheme that contains subschemes that identify a subset of an XML document. It's too bad such a thing wasn't adopted for URI fragments in general to solve the problem of a single resource with multiple mime type representations. I wrote more about XPointer when I worked on hacking XPointer into IE.

SVG and MPEG. Through the Media Fragments Working Group I found a couple more fragment scheme definitions. SVG's fragment scheme is defined in the SVG documentation and looks similar to XML's. MPEG has one defined but I could only find it as an ISO document, "Text of ISO/IEC FCD 21000-17 MPEG-21 FID", and not as an RFC, which is a little disturbing.

AJAX. AJAX websites have used fragments as an escape hatch for two issues that I've seen. The first is getting a unique URL for versions of a page that are produced on the client by script. The fragment may be changed by script without forcing the page to reload. This goes outside the rules of the standards by using HTML fragments in a fashion not called out by the HTML spec, but it does seem to be in line with the spirit of the fragment in that it is a subview of the original resource and interpreted client side. The other, hackier use of the fragment in AJAX is for cross-domain communication. The basic idea is that different frames or windows may not communicate in normal fashions if they have different domains but they can view each other's URLs and accordingly can change their own fragments in order to send a message out to those who know where to look. IMO this is not in line with the spirit of the fragment but is rather a cool hack.
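
A minimal sketch of the first AJAX use: packing client-side view state into a fragment and reading it back. The function names and the state keys (`view`, `page`) are hypothetical; in a browser the result would be assigned to `location.hash` and read back in a `hashchange` event handler, neither of which is shown running here.

```javascript
// Encode client-side view state into a fragment and read it back.
// The state keys used below ("view", "page") are hypothetical.
function stateToFragment(state) {
  return '#' + new URLSearchParams(state).toString();
}

function fragmentToState(fragment) {
  const params = new URLSearchParams(fragment.replace(/^#/, ''));
  return Object.fromEntries(params);
}

// In a browser: location.hash = stateToFragment({ view: 'inbox', page: '2' });
const fragment = stateToFragment({ view: 'inbox', page: '2' });
console.log(fragment);
console.log(fragmentToState(fragment));
```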

Tags: xml text ajax technical url boring uri fragment rfc

5 useful url rewriting examples using .htaccess

2008 Apr 10, 8:14

"In this post, I've given five useful examples of URL rewriting using .htaccess."

Tags: htaccess apache linux reference uri url example blog article

del.icio.us API Backup

2008 Apr 3, 9:22

The del.icio.us API allows you to get an XML file of all your links.

Tags: api delicious backup xml url

Gmail integration with Internet Explorer 8

2008 Apr 3, 9:00

[Internet Explorer logo; Gmail logo licensed under CC by Victor de la Fuente.]

With the new features of IE8 there are several easy ways to integrate Gmail, Google's web mail service, for mail composition, searching, and monitoring that I enjoy using.

Composition
I made a Send via Gmail activity that allows you to select some text, a document, or a link and, via the activity menu, open a new tab to compose a new message with the selection. Go to my activity page and click "Send via Gmail" (source) to install it. I found info on the Gmail composition URL in the comments of this Gmail howto article and used that in the activity. I talked about activities previously.
Search
I've made a search provider that searches your Gmail account. See my search provider page and select 'Gmail' (source) to install the Gmail search provider. Search providers aren't new to IE8 but this fits in with Gmail integration in IE. Again, in the comments of another howto I found information on a Gmail search URL.
Monitor
New to IE8 is authenticated feed support and favorites bar monitoring which combined with the Gmail inbox feed means you can see when you get new mail in your favorites bar in IE. To do this, navigate to the feed https://mail.google.com/mail/feed/atom, click 'Subscribe to this feed', then click on the Add button in the upper left (the star with plus icon) and select 'Monitor on Favorites Bar' to add this as a monitored item in the favorites bar. Next, right click on the new item in your favorites bar, open the properties dialog, and enter your Gmail username and password into the new username and password fields. Now when you get new mail the Gmail feed item will shine and bold and you'll be able to get to new messages in the dropdown. I described monitored feed items previously.

Tags: activity gmail search howto google ie feed rss opensearch

Creative Commons License. Some rights reserved.