2008 Aug 25, 10:13
As noted previously, my page is an aggregation of my various feeds, and while working on that code recently I was again reminded that everyone represents tag metadata in feeds differently. I made a list of how each of my feed sources represents tags and include it here in case it helps others in the future.
Tag markup from various sources
Source | Feed Type | Tag Markup Scheme | One Tag Per Element | Tag Scheme URI | Human / Machine Names | Example Markup
LiveJournal | Atom | atom:category | yes | no | no | (source)
LiveJournal | RSS 2.0 | rss2:category | yes | no | no | <category>technical</category> (source)
WordPress | RSS 2.0 | rss2:category | yes | no | no | (source)
Delicious | RSS 1.0 | dc:subject | no | no | no | <dc:subject>photosynth photos 3d tool</dc:subject> (source)
Delicious | RSS 2.0 | rss2:category | yes | yes | no | <category domain="http://delicious.com/SequelGuy/">hulu</category> (source)
Flickr | Atom | atom:category | yes | yes | no | <category term="seattle" scheme="http://www.flickr.com/photos/tags/" /> (source)
Flickr | RSS 2.0 | media:category | no | yes | no | <media:category scheme="urn:flickr:tags">seattle washington baseball mariners</media:category> (source)
YouTube | RSS 2.0 | media:category | no | no | no | <media:category label="Tags">bunny rabbit yawn cadbury</media:category> (source)
LibraryThing | RSS 2.0 | No explicit tag metadata. | no | no | no | n/a (source)
Tag markup scheme
Tag Markup Scheme | Notes | Example
Atom category: atom:category, xmlns:atom="http://www.w3.org/2005/Atom" | category/@term: required category name; category/@scheme: optional IRI identifying the categorization scheme; category/@label: optional human-readable category name | <atom:category term="catName" scheme="tag:deletethis.net,2008:tagscheme" label="category name in human readable format"/>
RSS 2.0 category: rss2:category, empty namespace | category/@domain: optional string identifying the categorization scheme; category/text(): required category name. The value of the element is a forward-slash-separated string that identifies a hierarchic location in the indicated taxonomy. Processors may establish conventions for the interpretation of categories. | <category domain="tag:deletethis.net,2008:tagscheme">MSFT</category>
Yahoo Media RSS Module category: media:category, xmlns:media="http://search.yahoo.com/mrss/" | category/text(): required category name; category/@scheme: optional URI identifying the categorization scheme; category/@label: optional human-readable category name | <media:category scheme="http://dmoz.org" label="Ace Ventura - Pet Detective">Arts/Movies/Titles/A/Ace_Ventura_Series/Ace_Ventura_-_Pet_Detective</media:category>
Dublin Core subject: dc:subject, xmlns:dc="http://purl.org/dc/elements/1.1/" | subject/text(): required category name. Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. | <dc:subject>humor</dc:subject>
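To make the tables concrete, here's a minimal Python sketch that pulls tag names out of a feed item in any of the four schemes. `extract_tags` is a hypothetical helper. Following the first table, it treats atom:category and rss2:category as one tag per element and splits media:category and dc:subject text on whitespace. That matches the Flickr, YouTube, and Delicious feeds above, but not the slash-separated hierarchy in the Media RSS example.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
MEDIA = "http://search.yahoo.com/mrss/"
DC = "http://purl.org/dc/elements/1.1/"

def extract_tags(item):
    """Collect tag names from an RSS item or Atom entry element."""
    tags = []
    # Atom: one tag per element, name in the required term attribute.
    for cat in item.findall(f"{{{ATOM}}}category"):
        tags.append(cat.get("term"))
    # RSS 2.0: one tag per element, name is the element text (no namespace).
    for cat in item.findall("category"):
        tags.append((cat.text or "").strip())
    # Media RSS and Dublin Core: the sources above pack several
    # whitespace-separated tags into one element, so split them.
    for cat in item.findall(f"{{{MEDIA}}}category") + item.findall(f"{{{DC}}}subject"):
        tags.extend((cat.text or "").split())
    return [t for t in tags if t]

rss_item = ET.fromstring(
    '<item xmlns:media="http://search.yahoo.com/mrss/">'
    "<category>technical</category>"
    '<media:category scheme="urn:flickr:tags">seattle washington</media:category>'
    "</item>"
)
print(extract_tags(rss_item))  # ['technical', 'seattle', 'washington']
```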
Update 2009-9-14: Added WordPress to the Tag Markup table and namespaces to the Tag Markup Scheme table.
feed media delicious technical atom youtube yahoo rss tag
2008 Jul 3, 10:12 "Fast hashing of variable-length text strings", Peter K. Pearson (Lawrence Livermore National Lab, Livermore, CA), Communications of the ACM, Volume 33, Issue 6 (June 1990), pages 677-680.
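Pearson's scheme is short enough to sketch: start at 0 and, for each input byte, XOR it into the running value and look the result up in a 256-entry permutation table. The table below is just a seeded shuffle, not the one printed in the paper, so the hash values won't match the paper's.

```python
import random

def make_table(seed=0):
    """Return a permutation of 0..255 (an arbitrary shuffle, not the paper's table)."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table

TABLE = make_table()

def pearson_hash(data: bytes, table=TABLE) -> int:
    """8-bit Pearson hash: one XOR and one table lookup per input byte."""
    h = 0
    for byte in data:
        h = table[h ^ byte]
    return h

print(pearson_hash(b"hello"), pearson_hash(b"hellp"))
```

Because the table is a permutation, two equal-length strings that differ in exactly one byte always hash to different values.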
hash programming acm reference
2008 Jun 12, 3:24 A nice in-browser regex tester: give it a sample string and a regex and the matches are highlighted.
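The core of such a tester is small. Here's a minimal Python sketch using the standard re module. The `highlight` function and the bracket markers are illustrative only and are not how the linked tool works.

```python
import re

def highlight(pattern, text, start="[", end="]"):
    """Wrap every non-overlapping match of pattern in text with markers."""
    return re.sub(pattern, lambda m: f"{start}{m.group(0)}{end}", text)

print(highlight(r"\d+", "abc 12 de 345"))  # abc [12] de [345]
```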
regex javascript programming development
2008 May 2, 1:55 Avoid sniffing using the HTTP range header: "...if we have an application...which protects against FindMimeFromData XSS attacks by searching the first 256 bytes for certain strings, then we can simply place our strings after the first 256 bytes and get Fl
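The quote is cut off, so the following is only a sketch of the general idea under stated assumptions. It models a hypothetical upload filter that inspects just the first 256 bytes, and a server that honors a simple Range header. Requesting bytes=256- returns a body that starts with content the filter never saw, which a sniffing browser may then treat as HTML. `naive_filter_allows` and `serve_range` are made-up names.

```python
SNIFF_WINDOW = 256  # bytes the hypothetical filter inspects

def naive_filter_allows(data: bytes) -> bool:
    """Hypothetical check: reject only if an HTML marker appears in the first 256 bytes."""
    return b"<html" not in data[:SNIFF_WINDOW].lower()

def serve_range(data: bytes, range_header: str) -> bytes:
    """Handle a single 'bytes=start-' or 'bytes=start-end' range (no suffix ranges)."""
    unit, _, spec = range_header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single byte range is supported in this sketch")
    first, _, last = spec.partition("-")
    start = int(first)
    end = int(last) if last else len(data) - 1
    return data[start:end + 1]

upload = b"A" * SNIFF_WINDOW + b"<html><script>alert(1)</script></html>"
print(naive_filter_allows(upload))            # True: the marker is past byte 256
print(serve_range(upload, "bytes=256-")[:6])  # b'<html>'
```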
via:swannman http http-header range xss security
2008 Apr 12, 10:38 For Encode-O-Matic, my encoding tool written in C#, I had to figure out the appropriate DllImport declarations to use the IDN Win32 functions, which was a pain. To spare others that pain, here are the two files, CharacterSetEncoding.cs and NationalLanguageSupportUtilities.cs, that declare the DllImports for IdnToUnicode, IdnToAscii, NormalizeString, MultiByteToWideChar, and WideCharToMultiByte.
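The files themselves are C#. For a quick feel for what those Win32 calls do, here's a rough Python analogue using only the standard library. The mapping is my assumption. Python's built-in idna codec follows IDNA 2003 (RFC 3490), which may differ from the Windows functions for some inputs, and cp1252 is just an example code page.

```python
import unicodedata

label = "bücher"
ascii_form = label.encode("idna")      # roughly what IdnToAscii produces
print(ascii_form)                      # b'xn--bcher-kva'
print(ascii_form.decode("idna"))       # bücher, like IdnToUnicode

# NormalizeString analogue: compose "u" + combining diaeresis into "ü".
decomposed = "bu\u0308cher"
print(unicodedata.normalize("NFC", decomposed) == label)  # True

# MultiByteToWideChar / WideCharToMultiByte analogue: code page bytes <-> text.
print("bücher".encode("cp1252"))       # b'b\xfccher'
print(b"b\xfccher".decode("cp1252"))   # bücher
```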
encodeomatic boring csharp widechartomultibyte idn tool dllimport
2008 Mar 28, 10:38 Running time of regular expression matching. Most modern regex APIs use backtracking and can have exponential running time depending on the regex.
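To see where the exponential time comes from, here's a hand-written Python sketch that mimics a backtracking engine for the single pattern (a+)+b, anchored at the start of the input. It is not any real regex library's code. It only counts how many ways the engine tries to split a run of a's into (a+) groups before giving up on input that has no b. For n a's that comes to 2**n - 1 attempts.

```python
def backtracking_attempts(n: int) -> int:
    """Count the (a+) group choices a naive backtracking matcher tries for
    (a+)+b against "a" * n, which has no 'b' and so can never match."""
    text = "a" * n
    attempts = 0

    def match_groups(pos: int) -> bool:
        nonlocal attempts
        run_end = pos
        while run_end < len(text) and text[run_end] == "a":
            run_end += 1
        # Greedy: try the longest (a+) group first, then backtrack to shorter ones.
        for end in range(run_end, pos, -1):
            attempts += 1
            if match_groups(end):
                return True
        # No further group fits here; after at least one group we need a 'b'.
        return pos > 0 and text[pos:pos + 1] == "b"

    match_groups(0)
    return attempts

for n in (5, 10, 15):
    print(n, backtracking_attempts(n))  # prints 5 31, then 10 1023, then 15 32767
```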
regex programming reference wikipedia big-oh running-time
2008 Jan 28, 10:39 Name your computer an HTML string to inject that HTML into the target wireless router's HTML configuration page.
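The underlying bug is a page that echoes a client-supplied hostname without escaping it. Here's a minimal Python sketch of a hypothetical router's DHCP client table that shows the difference escaping makes. The markup and `render_client_row` are invented for illustration.

```python
import html

def render_client_row(hostname: str, escape: bool = True) -> str:
    """Build one row of a hypothetical router's DHCP client table."""
    shown = html.escape(hostname) if escape else hostname
    return f"<tr><td>{shown}</td></tr>"

evil = '<script>alert("pwned")</script>'
print(render_client_row(evil, escape=False))  # the script tag lands in the page verbatim
print(render_client_row(evil))
# <tr><td>&lt;script&gt;alert(&quot;pwned&quot;)&lt;/script&gt;</td></tr>
```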
via:swannman security xss injection dhcp
2007 Oct 24, 3:26 Convert HTML into RSS with some predefined strings to look for.
feed rss tool free html convert
2007 Sep 27, 2:17 Starting on a new simple project, I wanted to get the history of my Delicious links. Delicious has an export tool available in the settings section, so I thought I'd try that. However, the links aren't exported in XML, not even in XHTML, but rather in HTML. Shocking. An example:
"Don't Tase Me, Bro!" (UF Student Tasered Remix)
Remix of the 'Don't tase me, bro!' guy getting tasered.
At this point I'm already not going to use this file because it's in HTML, but I'm even more disgusted by those date time values.
Raymond Chen of The Old New Thing has posted about recognizing timestamps and timestamp sentinel values. From the first of those posts, and with a calculator for base conversion, one can tell that those are UNIX-style timestamps counting the number of seconds since 1970 (UTC).
It reminds me of how much I came to hate the MIME date time format while working on my web page's server-side parsing of Atom and RSS. Atom is of course my favorite, as Atom uses the Internet date time format described in RFC 3339. Here's an example:
2007-09-27T20:50:00.000-08:00
On the other hand, the evil and villainous RSS uses the MIME date time format (RFC 822's date-time, now described in the more recent RFC 2822). Here's an example: Thu, 27 Sep 2007 20:50:00 -0800
The Internet date time format has the advantage of being easy to sort: an alphabetic sort with normal C-style collation rules of strings containing Internet date times also sorts them chronologically, as long as they all use the same time zone offset and the same number of fractional-second digits. This is not the case for the MIME date time, due to the preceding day of the week and the spelled-out month name. It also means that when producing these you have to figure out the day of the week, and when parsing them you have to match month names rather than just parsing out numbers. Anyway, nowadays when I see a date time mentioned in a newly proposed standard or spec, I'm sure to point out the numerous advantages of the Internet date time format.
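Here's a small Python sketch of these points: turning a UNIX timestamp into the Internet date time format, and how the two formats behave under a plain string sort. The timestamp is an illustrative value, not one from the Delicious export, and the example date lists are made up around the dates above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def unix_to_internet_datetime(seconds: int) -> str:
    """Seconds since 1970-01-01T00:00:00Z -> RFC 3339 style string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()

# 1190851200 is an illustrative value, not one taken from the export.
print(unix_to_internet_datetime(1190851200))  # 2007-09-27T00:00:00+00:00

internet = ["2007-09-27T20:50:00-08:00", "2007-09-26T09:15:00-08:00",
            "2007-10-01T00:00:00-08:00"]
mime = ["Thu, 27 Sep 2007 20:50:00 -0800", "Wed, 26 Sep 2007 09:15:00 -0800",
        "Mon, 01 Oct 2007 00:00:00 -0800"]

# Same offset and precision: a plain string sort is already chronological.
print(sorted(internet))
# A string sort of the MIME form orders by weekday name (Mon < Thu < Wed).
print(sorted(mime))
# Parsing first fixes that; parsedate_to_datetime reads the RFC 2822 form.
print(sorted(mime, key=parsedate_to_datetime))

# Mixed offsets break the string-sort shortcut; parse to compare instants.
mixed = ["2007-09-27T20:50:00-08:00", "2007-09-28T01:00:00+00:00"]
print(sorted(mixed, key=datetime.fromisoformat))  # the +00:00 value is earlier
```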
date xml html feed time technical date-time code atom rss
2007 Sep 27, 12:01 Another open effort to produce an XSLT library that does some standard things you might want, like string manipulation, URI combining, etc.
xsl xslt reference library xml xpath programming api
2007 Sep 26, 11:57 Free XSLT extension libraries to support things like date/time conversions, string manipulation, etc.
xslt xsl api xpath xml library extension programming free development
2007 Aug 9, 5:41 To satisfy my hands, which have already learned to type *nix commands, I like to install Win32 versions of common GNU utilities. Unfortunately, the 'which' command is a rather literal port and requires you to enter the entire name of the command you're looking for. That is, 'which which' won't find itself but 'which which.exe' will. This makes it almost useless for me, so I thought I'd write my own as a batch file. I had learned about a few goodies available in cmd.exe that I thought would make this an easy task. It turned out to be more difficult than I thought.
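@echo off
rem which.cmd - list every file matching the argument in each PATH directory,
rem trying each PATHEXT extension. The echo trick puts one PATH or PATHEXT
rem element per line, and the innermost for loop expands any wildcards.
for /F "usebackq tokens=*" %%a in ( `"echo %PATH:;=& echo %"` ) do (
  for /F "usebackq tokens=*" %%b in ( `"echo %PATHEXT:;=& echo %"` ) do (
    if exist "%%~a\%~1%%b" (
      for %%c in ( "%%~a\%~1%%b" ) do (
        echo %%~fc
      )
    )
  )
)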
The environment variables PATH and PATHEXT hold, respectively, the list of directories to search for commands and the extensions of files that should be run as commands. The 'for /F "usebackq tokens=*" %%a in (...) do (...)' construct runs the 'do' portion with %%a sequentially taking on the value of every line in the 'in' portion. That's nice, but PATH and PATHEXT don't have their elements on separate lines, and I don't know of a way to escape a newline character in a batch file. To get the elements of PATH and PATHEXT onto separate lines, I used the %ENV:a=b% syntax, which replaces occurrences of a with b in the value of ENV. I replaced the ';' delimiter with the text '& echo ', which means "echo %PATHEXT:;=& echo %" evaluates to something like "echo .COM& echo .EXE& echo .BAT& ...". I have to put the whole expression in double quotes to keep the '&' from being treated as a command separator when the batch file itself is parsed. The usebackq option and the backquotes mean that the backquoted string is replaced with the output of executing its contents. In that fashion I'm able to get each element of the environment variable onto its own line. The rest is pretty straightforward.
Also, it supports wildcards:
C:\Users\davris>which.cmd *hi*
C:\Windows\System32\GRAPHICS.COM
C:\Windows\System32\SearchIndexer.exe
D:\bin\which.exe
D:\bin\which.cmd
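For comparison, here's the same search sketched in Python. `which_all` is a hypothetical helper. Unlike shutil.which, which returns only the first hit, it lists every match in PATH order, tries each PATHEXT extension, and lets glob expand wildcards the way the inner for loop does. It assumes Windows conventions; on other systems PATHEXT is usually unset, so pass it explicitly.

```python
import glob
import os

def which_all(name, path=None, pathext=None, sep=os.pathsep):
    """List every match for name (wildcards allowed) in each PATH directory,
    trying each PATHEXT extension, in the same order as which.cmd."""
    path = os.environ.get("PATH", "") if path is None else path
    pathext = os.environ.get("PATHEXT", "") if pathext is None else pathext
    matches = []
    for directory in filter(None, path.split(sep)):
        for ext in filter(None, pathext.split(sep)):
            # glob expands wildcards like the inner "for %%c in (...)" loop
            for match in sorted(glob.glob(os.path.join(directory, name + ext))):
                matches.append(os.path.abspath(match))
    return matches
```

On a Windows box, something like `for p in which_all("*hi*"): print(p)` should produce a listing similar to the which.cmd output above.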
which cmd technical batch for
2007 Jul 22, 8:38 Animated, interactive graph of the FSM used to parse any regex and corresponding string you enter.
regex flash visualization fsm interactive howto language
2007 Jul 13, 8:30 I bought an external backup drive a few weekends ago. I'd previously set up a Subversion repository, so I decided to move everything into the repository and then back it up. So in went the contents of all of my %USERPROFILE% and ~ directories, with a bit of sorting and pruning. Not too much though, given it's much easier to dump in everything and search for what I want than to take the time to examine and grade each file. What follows are the notes I took while setting this up. It takes me a bit of time to look up the help on each command, so I figure I'll write it all down here for the benefit of myself and potentially others...
Setting Up the Backup Drive For Linux
I first changed the filesystem on the drive to ext3. I plugged it into my USB2.0 port and ran fdisk:
sudo fdisk /dev/sda
Useful commands I used, mostly in order:
- m -- help
- p -- print current partitions
- d -- delete current partition
- n -- create new partition (I used the defaults)
- w -- write changes and exit
Then I formatted the new partition as ext3:
sudo mkfs.ext3 /dev/sda1
I made it easy to mount:
sudo vim /etc/fstab
# added line to end:
/dev/sda1 /media/backup ext3 rw,user,noauto 0 0
I set up the directory structure on the disk:
mount /media/backup
sudo mkdir /media/backup/users
sudo mkdir /media/backup/users/dave
sudo chown dave:dave /media/backup/users/dave
After all that, it's easy to make a copy of the Subversion repository:
mount /media/backup
cp -Rv /home/dave/svn /media/backup/users/dave/
umount /media/backup
Next on the agenda is to add a cron job to do this regularly.
Subversion Command Reference
On a machine that has local access to the Subversion repository you can check out a specific subdirectory as follows using the file scheme:
svn co file:///home/dave/svn/trunk/web/dave%40deletethis.net/public_html
Note that because one of my directories is named 'dave@deletethis.net', Subversion requires the '@' in the URL to be percent-encoded as '%40'.
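The same encoding rule is easy to get right programmatically. Here's a minimal Python sketch using urllib.parse.quote, which percent-encodes '@' but leaves '/' alone by default. `svn_url` is a hypothetical helper. It also shows that the '@' separating a user name from the host in an svn+ssh URL stays literal. It assumes the user name itself needs no encoding.

```python
from urllib.parse import quote

def svn_url(path, host=None, user=None):
    """Build a file:// or svn+ssh:// URL, percent-encoding reserved
    characters such as '@' in the repository path."""
    encoded = quote(path)  # default safe='/' keeps the path separators
    if host is None:
        return "file://" + encoded
    userinfo = f"{user}@" if user else ""  # this '@' is a delimiter: keep it literal
    return f"svn+ssh://{userinfo}{host}{encoded}"

repo_path = "/home/dave/svn/trunk/web/dave@deletethis.net/public_html"
print(svn_url(repo_path))
# file:///home/dave/svn/trunk/web/dave%40deletethis.net/public_html
print(svn_url(repo_path, host="deletethis.net", user="dave"))
# svn+ssh://dave@deletethis.net/home/dave/svn/trunk/web/dave%40deletethis.net/public_html
```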
Other useful Subversion commands:
- svn help -- help
- svn list file:///home/dave/svn/ -- list the files in the root directory of the repository
- svn list -R file:///home/dave/svn/ -- list all files in the repository
- svn list -R file:///home/dave/svn/ | grep \/$ -- list all directories
- svn status -- list the status of all files in the working copy (modified, not in the repository, etc.)
- svn update -- bring the working copy up to date with the repository
- svn commit -- commit changes from the working copy to the repository
- svn add / move / delete -- schedule the addition, move, or deletion in the working copy; the change reaches the repository with the next svn commit (svn move and svn delete given repository URLs commit immediately)
Setting up Windows Client for Auto Auth into SVN
When using an SVN client on Windows via svn+ssh, it's useful to have Windows authenticate to the SVN server automatically. I use PuTTY on my Windows machines, so I read the directions on using public keys with PuTTY.
putty.exe dave@deletethis.net
cd .ssh
vim authorized_keys # leave the putty window open for now
puttygen.exe
Click the 'generate' button
Move the mouse around until finished
Copy text in 'Public key for pasting into OpenSSH authorized_keys file:' to putty window & save & close putty window
Enter Key passphrase & Comment in puttygen
Save the private key somewhere private
pageant.exe
'Add Key' the private key just saved.
Checking out using Tortoise SVN
On one of my Windows machines I'd already installed TortoiseSVN. Checking out from my SVN repository was really easy: I just right-clicked in a directory in Explorer and selected "SVN Checkout...".
Then in the following dialog I entered the svn URI:
svn+ssh://dave@deletethis.net/home/dave/svn/trunk/web/dave%40deletethis.net/public_html/
Note again that the '@' that is part of the directory name is percent-encoded as '%40' while the '@' in the userinfo is not.
Windows Command Line Check Out
On my media center I didn't want to install TortoiseSVN, so I used the command line tool instead. I set up Pageant as before; the only difficulty was getting the SVN command line tool to use PuTTY. With the default configuration, you can use the SVN_SSH environment variable to point at a compatible SSH command line tool, in my case PuTTY's plink.exe. The trick is that its value is interpreted as a backslash-escaped string, so I set mine thusly:
set SVN_SSH=C:\\users\\dave\\bin\\putty\\plink.exe
The escaping solved the vague error I received about not being able to create the tunnel.
backup technical personal windows svn linux subversion
2007 Jun 27, 11:27 RFC defining a registry of string sorts (collations) that future RFCs may reference.
rfc reference ietf internet protocol registry collation sort string locale
2007 Jun 4, 5:26 Music video for Deadly Deadly Bees, the lead song from the album based on the Futurama episode The Sting, featuring clips from the episode. (The YouTube version was taken down, argh!)
humor music futurama video music-video
2007 May 11, 3:48 Type in some Latin script and you'll get back a string of Unicode characters that looks like it's rotated 180 degrees. More info on exciting Unicode codepoints.
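The trick is to reverse the string and swap each letter for a look-alike codepoint. Here's a minimal Python sketch with a small, partial lookup table of my own choosing. The linked tool's table is certainly larger and may differ.

```python
# Hypothetical, partial table: each letter maps to a codepoint that
# resembles it rotated 180 degrees (e.g. U+0250 LATIN SMALL LETTER TURNED A).
FLIPPED = {
    "a": "\u0250", "e": "\u01dd", "h": "\u0265", "m": "\u026f",
    "r": "\u0279", "t": "\u0287", "y": "\u028e",
    "b": "q", "q": "b", "d": "p", "p": "d", "n": "u", "u": "n",
}

def rotate_180(text: str) -> str:
    """Reverse the (lowercased) string and substitute look-alikes; others pass through."""
    return "".join(FLIPPED.get(ch, ch) for ch in reversed(text.lower()))

print(rotate_180("hello"))  # ollǝɥ
```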
unicode javascript tool tools web language
2007 May 11, 8:55 Last time, I had written some resource tools to let me view and modify Windows module resources in my ultimate and noble quest to implement the XML content-type fragment in IE7. Using the resource tools, I found that MSXML3.DLL isn't signed and that I can replace the embedded XSLT resource with my own, which is great news and means I can continue in my endeavour. In the following I discuss how I came up with this replacement for IE7's XML source view.
At first I thought I could just modify the existing XSLT, but it turns out that it isn't exactly an XSLT; rather, it's an IE5 XSL. I tried using the XSL to XSLT converter linked to on MSDN, but the resulting document still required manual modification. I didn't want to muck about in their weird language, and I figured I could write my own XSLT faster than I could figure out how theirs worked.
I began work on the new XSLT and found it relatively easy to produce. First I got indenting working, with all the XML nodes represented appropriately and different CSS classes attached to them to make syntax highlighting easy. Next I added some JavaScript to allow for closing and opening of elements. At this point my XSLT had the same features as the original XSL.
Next was the XML mimetype fragment, which uses XPointer, a framework around various schemes for naming parts of an XML document. I focused on the xpointer() scheme, which is an extended version of XPath, so my first task was getting XPaths working.
Thankfully, JavaScript running in the HTML document produced by running my XSLT on an XML document has access to the original XML document object via the document.XMLDocument property. From this I can execute XPaths; however, there's no built-in way to map from the XML nodes selected by the XPath to the HTML elements that I produced to represent them. So I created a recursive JavaScript function and an XSLT named template that both produce the same unique string based on an XML node's position in the document. For instance, 'a3-e2-e' is the name produced for the third attribute of the second child element of the root element of the XML document. When producing the HTML for an XML node, I add an 'id' attribute to the HTML with the unique string of the XML node. Then, when I execute an XPath in JavaScript, I can discover the unique string of each node in the selected set and map each of them to its corresponding position in the HTML.
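Here's that naming scheme sketched in Python, with xml.etree.ElementTree standing in for MSXML and document.XMLDocument. The counting rules are my assumption: indexes are 1-based and count only element children, and attributes are counted in document order (which ElementTree preserves since Python 3.8). These rules reproduce the 'a3-e2-e' example, but the original XSLT and JavaScript may count text or other nodes differently. `assign_ids` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def assign_ids(root):
    """Give every element and attribute a position-based id: 'e' for the root
    element, 'e<i>-<parent id>' for the i-th child element and
    'a<j>-<element id>' for the j-th attribute (both 1-based)."""
    element_ids, attribute_ids = {}, {}

    def visit(elem, elem_id):
        element_ids[elem] = elem_id
        for j, name in enumerate(elem.attrib, start=1):
            attribute_ids[(elem, name)] = f"a{j}-{elem_id}"
        for i, child in enumerate(elem, start=1):
            visit(child, f"e{i}-{elem_id}")

    visit(root, "e")
    return element_ids, attribute_ids

doc = ET.fromstring('<root><a/><b x="1" y="2" z="3"><c/></b></root>')
element_ids, attribute_ids = assign_ids(doc)
b = doc.find("b")
print(attribute_ids[(b, "z")])                        # a3-e2-e
# Map an XPath result set to ids, the way the page finds the matching HTML.
print([element_ids[e] for e in doc.findall(".//c")])  # ['e1-e2-e']
```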
With the hard part out of the way, I changed the onload handler to get the fragment of the current document's URI, interpret it as an XPath, and highlight and navigate to the selected nodes. I also added an interactive floating bar in which you can enter your own XPaths and do the same. On a related note, I found that when accessing XML files via the file URI scheme, the fragment is stripped off and isn't available to the JavaScript.
The next steps are of course to actually implement XPointer framework parsing as well as the limited number of schemes that the XPointer framework specifies.
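As a starting point for that parsing step, here's a minimal Python sketch that splits a pointer according to the W3C XPointer Framework grammar. A pointer is either a bare shorthand name, or a sequence of scheme(data) parts in which ^ escapes ^, ( and ), and balanced parentheses may appear unescaped. The NCName/QName patterns are simplified ASCII approximations, `parse_pointer` is a hypothetical name, and evaluating the xpointer() data itself is out of scope.

```python
import re

# Simplified approximations of the XML NCName and QName productions.
NCNAME = r"[A-Za-z_][\w.\-]*"
QNAME = re.compile(rf"{NCNAME}(?::{NCNAME})?")

def parse_pointer(pointer):
    """Split an XPointer Framework pointer into (scheme, data) pairs.
    A bare NCName is a shorthand pointer, returned as ('shorthand', name)."""
    if re.fullmatch(NCNAME, pointer):
        return [("shorthand", pointer)]
    parts, i = [], 0
    while i < len(pointer):
        if pointer[i].isspace():  # whitespace may separate pointer parts
            i += 1
            continue
        m = QNAME.match(pointer, i)
        if not m or m.end() >= len(pointer) or pointer[m.end()] != "(":
            raise ValueError(f"expected a scheme name and '(' at offset {i}")
        scheme, i = m.group(0), m.end() + 1
        depth, data = 1, []
        while True:
            if i >= len(pointer):
                raise ValueError("unbalanced parentheses in scheme data")
            ch = pointer[i]
            if ch == "^":  # ^ escapes ^, ( and )
                if i + 1 >= len(pointer) or pointer[i + 1] not in "^()":
                    raise ValueError(f"invalid escape at offset {i}")
                data.append(pointer[i + 1])
                i += 2
                continue
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:  # end of this pointer part
                    i += 1
                    break
            data.append(ch)
            i += 1
        parts.append((scheme, "".join(data)))
    return parts

print(parse_pointer("xmlns(x=http://example.com/ns)xpointer(//x:item[2])"))
# [('xmlns', 'x=http://example.com/ns'), ('xpointer', '//x:item[2]')]
```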
xml xpointer msxml res xpath xslt resource ie7 technical browser ie xsl
2007 Mar 28, 12:54 Given an ABNF description of a grammar, RandomGrammar produces a random string that fits that grammar. This is a personal project I worked on previously and have just now made available again on my website.
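The core idea can be sketched in a few lines of Python: repeatedly replace each rule with a randomly chosen alternative until only literals remain. This is not RandomGrammar's code, which is Java and reads real ABNF. The grammar here is a hypothetical, already-parsed dict, and the depth cap is a crude assumption to stop runaway recursion.

```python
import random

# Hypothetical grammar, already parsed: each rule maps to a list of
# alternatives, each a sequence of rule names or literal strings.
GRAMMAR = {
    "greeting": [["salutation", " ", "name", "punct"]],
    "salutation": [["hello"], ["hi"]],
    "name": [["letter"], ["letter", "name"]],
    "letter": [["a"], ["b"], ["c"]],
    "punct": [["!"], ["."]],
}

def generate(symbol, rng, grammar=GRAMMAR, depth=0, max_depth=20):
    """Expand symbol by picking a random alternative for each rule."""
    if symbol not in grammar:
        return symbol  # terminal: emit literally
    alternatives = grammar[symbol]
    if depth >= max_depth:              # crude recursion guard: assume the
        alternatives = alternatives[:1]  # first alternative terminates
    choice = rng.choice(alternatives)
    return "".join(generate(s, rng, grammar, depth + 1, max_depth) for s in choice)

rng = random.Random(42)
print([generate("greeting", rng) for _ in range(3)])
```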
me personal projects java randomgrammar abnf
2007 Feb 14, 3:12 Another of Richard's tools, which lets you compose strings by visually picking characters from particular alphabets.
unicode tools picker encoding javascript language tool codepage i18n