string page 3 - Dave's Blog

Search
My timeline on Mastodon

Tag Metadata in Feeds

2008 Aug 25, 10:13

As noted previously, my page consists of the aggregation of my various feeds and in working on that code recently it was again brought to my attention that everyone has different ways of representing tag metadata in feeds. I made up a list of how my various feed sources represent tags and list that data here so that it might help others in the future.

Tag markup from various sources
Source Feed Type Tag Markup Scheme One Tag Per Element Tag Scheme URI Human / Machine Names Example Markup
LiveJournal Atom atom:category yes no no , (source)
LiveJournal RSS 2.0 rss2:category yes no no technical
(soure)
WordPress RSS 2.0 rss2:category yes no no , (source)
Delicious RSS 1.0 dc:subject no no no photosynth photos 3d tool
(source)
Delicious RSS 2.0 rss2:category yes yes no domain="http://delicious.com/SequelGuy/">
hulu

(source)
Flickr Atom atom:category yes yes no term="seattle"
scheme="http://www.flickr.com/photos/tags/" />

(source)
Flickr RSS 2.0 media:category no yes no scheme="urn:flickr:tags">
seattle washington baseball mariners

(source)
YouTube RSS 2.0 media:category no no no label="Tags">
bunny rabbit yawn cadbury

(source)
LibraryThing RSS 2.0 No explicit tag metadata. no no no n/a, (source)
Tag markup scheme
Tag Markup Scheme Notes Example
Atom Category
atom:category
xmlns:atom="http://www.w3.org/2005/Atom"
category/@term
Required category name.
category/@scheme
Optional IRI id'ing the categorization scheme.
category/@label
Optional human readable category name.
term="catName"
scheme="tag:deletethis.net,2008:tagscheme"
label="category name in human readable format"/>
RSS 2.0 category
rss2:category
empty namespace
category/@domain
Optional string id'ing the categorization scheme.
category/text()
Required category name. The value of the element is a forward-slash-separated string that identifies a hierarchic location in the indicated taxonomy. Processors may establish conventions for the interpretation of categories.
domain="tag:deletethis.net,2008:tagscheme">
MSFT
Yahoo Media RSS Module category
media:category
xmlns:media="http://search.yahoo.com/mrss/"
category/text()
Required category name.
category/@domain
Optional string id'ing the categorization scheme.
scheme="http://dmoz.org"
label="Ace Ventura - Pet Detective">
Arts/Movies/Titles/A/Ace_Ventura_Series/Ace_Ventura_-_Pet_Detective
Dublin Core subject
dc:subject
xmlns:dc="http://purl.org/dc/elements/1.1/"
subject/text()
Required category name. Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary.
humor

Update 2009-9-14: Added WordPress to the Tag Markup table and namespaces to the Tag Markup Scheme table.

PermalinkCommentsfeed media delicious technical atom youtube yahoo rss tag

Fast hashing of variable-length text strings

2008 Jul 3, 10:12"Fast hashing of variable-length text strings", from Source Communications of the ACM archive Volume 33 , Issue 6 (June 1990) Pages: 677 - 680, Year of Publication: 1990, Author Peter K. Pearson, Lawrence Livermore National Lab, Livermore, CAPermalinkCommentshash programming acm reference

Regex Tester - RegexPal

2008 Jun 12, 3:24A nice in browser regex tester. Give a sample string and a regex and the matches are highlighted.PermalinkCommentsregex javascript programming development

Web Security Research- Alex's Corner: HTTP Range & Request-Range Request Headers

2008 May 2, 1:55Avoid sniffing using the HTTP range header: "...if we have an application...which protects against FindMimeFromData XSS attacks by searching the first 256 bytes for certain strings, then we can simply place our strings after the first 256 bytes and get FlPermalinkCommentsvia:swannman http http-header range xss security

Encoding methods in C#

2008 Apr 12, 10:38

For Encode-O-Matic, my encoding tool written in C#, I had to figure out the appropriate DllImport declarations to use IDN Win32 functions which was a pain. To spare others that pain here's the two files CharacterSetEncoding.cs and NationalLanguageSupportUtilities.cs that declare the DllImports for IdnToUnicode, IdnToAscii, NormalizeString, MultiByteToWideChar, and WideCharToMultiByte.

PermalinkCommentsencodeomatic boring csharp widechartomultibyte idn tool dllimport

Regular expression - Wikipedia, the free encyclopedia

2008 Mar 28, 10:38Running time of regular expression matching. Most modern regex APIs do backtracking and can have exponential running time depending on the regex string.PermalinkCommentsregex programming reference wikipedia big-oh running-time

DHCP/mDNS Injection Issues | GNUCITIZEN

2008 Jan 28, 10:39Name your computer an HTML string to inject that HTML into the target wireless router's HTML configuration page.PermalinkCommentsvia:swannman security xss injection dhcp

RSSxl Beta - RSS Generator

2007 Oct 24, 3:26Convert HTML into RSS with some predefined strings to look for.PermalinkCommentsfeed rss tool free html convert

Date Time Formats

2007 Sep 27, 2:17Starting on a new simple project I wanted to get the history of my Delicious links. Delicious has an export tool available via the settings section so I thought I'd try that. However, the links aren't exported in XML not even in XHTML but rather in HTML. Shocking. An example:
"Don't Tase Me, Bro!" (UF Student Tasered Remix)
Remix of the 'Don't tase me, bro!' guy getting tasered.At this point I'm already not going to use this file because its in HTML but I'm even more disgusted by those date time values. Raymond Chen of the Old New Thing posted about recognizing timestamps and timestamp sentinel values. From the first blog post and with the use of a calculator for base conversion one can tell that those are UNIX style timestamps counting the number of seconds since 1970.

It reminds me of my hatred for the MIME date time format I developed working on my webpage's server side parsing of atom and RSS. Atom is of course my favorite as Atom uses the Internet date time format described in the following documents. Here's an example of one 2007-09-27T020:50:00.000-08:00 On the other hand the evil and villainous RSS uses the MIME date time format now described in the more recent IETF MIME standard. Here's an example Thu, 27 Sep 2007 20:50:00 -0800
The Internet date time format has the advantage of being so easy to sort. An alphabetic sort with normal C-style collation rules of strings containing Internet date times will also sort them chronologically. This is not the case for the MIME date time due to the preceding day of the week and the spelled out month name. This also means that when producing these you have to figure out the day of the week and when parsing them you have to match month names rather than just parsing out numbers. Anyway now days if I see mention of a date time in a new proposed standard or spec I be sure to point out the numerous advantages of the Internet date time format.
PermalinkCommentsdate xml html feed time technical date-time code atom rss

XSLT Standard Library

2007 Sep 27, 12:01Another open effort to produce an XSLT library that does some standard things you might want like string manipulation, URI combining, etc etcPermalinkCommentsxsl xslt reference library xml xpath proramming api

EXSLT

2007 Sep 26, 11:57Free XSLT Extension libraries to support things like date/time conversions, string manipulation, etc.PermalinkCommentsxslt xsl api xpath xml library extension programming free development

Which which - Batch File Hackiness

2007 Aug 9, 5:41To satisfy my hands which have already learned to type *nix commands I like to install Win32 versions of common GNU utilities. Unfortunately, the which command is a rather literal port and requires you to enter the entire name of the command for which you're looking. That is 'which which' won't find itself but 'which which.exe' will. This makes this almost useless for me so I thought to write my own as a batch file. I had learned about a few goodies available in cmd.exe that I thought would make this an easy task. It turned out to be more difficult than I thought.

for /F "usebackq tokens=*" %%a in ( `"echo %PATH:;=& echo %"` ) do (
    for /F "usebackq tokens=*" %%b in ( `"echo %PATHEXT:;=& echo %"` ) do (
        if exist "%%a"\%1%%b (
            for  %%c in ( "%%a"\%1%%b ) do (
                echo %%~fc
            )
        )
    )
)
The environment variables PATH and PATHEXT hold the list of paths to search through to find commands, and the extensions of files that should be run as commands respectively. The 'for /F "usebackq tokens=*" %%a in (...) do (...)' runs the 'do' portion with %%a sequentially taking on the value of every line in the 'in' portion. That's nice, but PATH and PATHEXT don't have their elements on different lines and I don't know of a way to escape a newline character to appear in a batch file. In order to get the PATH and PATHEXT's elements onto different lines I used the %ENV:a=b% syntax which replaces occurrences of a with b in the value of ENV. I replaced the ';' delimiter with the text '& echo ' which means %PATHEXT:;=& echo% evaluates to something like "echo .COM& echo .EXE& echo .BAT& ...". I have to put the whole expression in double quotes in order to escape the '&' for appearing in the batch file. The usebackq and the backwards quotes means that the backquoted string should be replaced with the output of the execution of its content. So in that fashion I'm able to get each element of the env. variable onto new lines. The rest is pretty straight forward.

Also, it supports wildcards:
C:\Users\davris>which.cmd *hi*
C:\Windows\System32\GRAPHICS.COM
C:\Windows\System32\SearchIndexer.exe
D:\bin\which.exe
D:\bin\which.cmd
PermalinkCommentswhich cmd technical batch for

reAnimator: Regular Expression FSA Visualizer

2007 Jul 22, 8:38Animated interactive graph of the FSM used to parse any regex and corresponding string you enter.PermalinkCommentsregex flash visualization fsm interactive howto language

Backup Notes

2007 Jul 13, 8:30I bought an external backup drive a few weekends ago. I've previously setup a Subversion repository so I decided to move everything into the repository and then back it up. So in went the contents of all of my %USERPROFILE% and ~ directories with a bit of sorting and pruning. Not too much though given its much easier to dump in everything and search for what I want then to take the time to examine and grade each file. What follows are the notes I took while setting this up. It takes me a bit of time to look up the help on each command so I figure I'll write it all down here for the benefit of myself and potentially others...

Setting Up the Backup Drive For Linux
I first changed the filesystem on the drive to ext3. I plugged it into my USB2.0 port and ran fdisk:

sudo fdisk /dev/sda

Useful commands I used to do this follow mostly in order:
m
help
p
print current partitions
d
delete current partition
n
create new partition (I used the defaults)
w
write changes and exit
Then I formatted for ext3.

sudo mkfs.ext3 /dev/sda1

I made it easy to mount:

sudo vim /etc/fstab
# added line to end:
/dev/sda1 /media/backup ext3 rw,user,noauto 0 0

I setup the directory structure on the disk

mount /media/backup
sudo mkdir /media/backup/users
sudo mkdir /media/backup/users/dave
sudo chown dave:dave /media/backup/users/dave


After all that its easy to make a copy of the Subversion repository:

mount /media/backup
cp -Rv /home/dave/svn /media/backup/users/dave/
umount /media/backup

Next on the agenda is to add a cron job to do this regularly.

Subversion Command Reference
On a machine that has local access to the Subversion repository you can check out a specific subdirectory as follows using the file scheme:

svn co file:///home/dave/svn/trunk/web/dave%40deletethis.net/public_html

Note also that although one of my directories is named 'dave@deletethis.net' Subversion requires the '@' to be percent-encoded.
Other useful subversion commands:
svn help
help
svn list file:///home/dave/svn/
list all files in root dir of svn depot
svn list -R file:///home/dave/svn/
list all files in svn depot
svn list -R file:///home/dave/svn/ | grep \/$
list all directories
svn status
List status of all files in the working copy directory as in - modified, not in repository, etc
svn update
Brings the working copy up to date wrt the repository
svn commit
Commit changes from the working copy to the repository
svn add / move / delete
Perform the specified action -- occurs immediately


Setting up Windows Client for Auto Auth into SVN
When using an SVN client on Windows via svn+ssh its useful to have the Windows automatically generate connections to the SVN server. I use putty on my Windows machines so I read the directions on using public keys with putty.

putty.exe dave@deletethis.net
cd .ssh
vim authorized_keys # leave the putty window open for now
puttygen.exe
Click the 'generate' button
Move the mouse around until finished
Copy text in 'Public key for pasting into OpenSSH authorized_keys file:' to putty window & save & close putty window
Enter Key passphrase & Comment in puttygen
Save the private key somewhere private
pageant.exe
'Add Key' the private key just saved.



Checking out using Tortoise SVN
On one of my Windows machines I've already installed Tortoise SVN. Checking out from my SVN repository was really easy. I just right clicked in Explorer in a directory and selected "SVN Checkout...". Then in the following dialog I entered the svn URI:

svn+ssh://dave@deletethis.net/home/dave/svn/trunk/web/dave%40deletethis.net/public_html/

Note again that the '@' that is part of the directory name is percent-encoded as '%40' while the '@' in the userinfo is not.

Windows Command Line Check Out
On my media center I didn't want to install Tortoise SVN so rather I used the command line tool. I setup pageant like before the only difficulty was getting the SVN command line tool to use putty. With the default configuration you can use the SVN_SSH environment variable to point at a compliant SSH command line tool. The trick is that its interpreted as a backslash escaped string. So I set mine thusly:

set SVN_SSH=C:\\users\\dave\\bin\\putty\\plink.exe

The escaping solved the vague error I received about not being able to create the tunnel.PermalinkCommentsbackup technical personal windows svn linux subversion

RFC 4790 Internet Application Protocol Collation Registry

2007 Jun 27, 11:27RFC defining a registry of string sorts that other future RFCs may reference.PermalinkCommentsrfc reference ietf internet protocol registry collation sort string locale

YouTube - Futurama: Deadly Deadly Bees

2007 Jun 4, 5:26Music video for Deadly Deadly Bees the lead song from the album based on the Futurama episode The String featuring clips from the episode. (youtube version take down, argh!)PermalinkCommentshumor music futurama video music-video

Reverse Latin Script

2007 May 11, 3:48Type in some latin script and you'll get back a string of Unicode characters that looks like its rotated 180 degrees. More info on exciting Unicode codepoints.PermalinkCommentsunicode javascript tool tools web language

New XSLT - IE7 XML Source View Upgrade Part 2

2007 May 11, 8:55Last time, I had written some resource tools to allow me to view and modify Windows module resources in my ultimate and noble quest to implement the XML content-type fragment in IE7. Using the resource tools I found that MSXML3.DLL isn't signed and that I can replace the XSLT embedded resource with my own, which is great news and means I could continue in my endevour. In the following I discuss how I came up with this replacement for IE7's XML source view.

At first I thought I could just modify the existing XSLT but it turns out that it isn't exactly an XSLT, rather its an IE5 XSL. I tried using the XSL to XSLT converter linked to on MSDN, however the resulting document still requires manual modification. But I didn't want to muck about in their weird language and I figured I could write my own XSLT faster than I could figure out how theirs worked.

I began work on the new XSLT and found it relatively easy to produce. First I got indenting working with all the XML nodes represented appropriately and different CSS classes attached to them to make it easy to do syntax highlighting. Next I added in some javascript to allow for closing and opening of elements. At this point my XSLT had the same features as the original XSL.

Next was the XML mimetype fragment which uses XPointer, a framework around various different schemes for naming parts of an XML document. I focused on the XPointer scheme which is an extended version of XPath. So I named my first task as getting XPaths working. Thankfully javascript running in the HTML document produced by running my XSLT on an XML document has access to the original XML document object via the document.XMLDocument property. From this this I can execute XPaths, however there's no builtin way to map from the XML nodes selected by the XPath to the HTML elements that I produced to represent them. So I created a recursive javascript function and XSLT named-template that both produce the same unique strings based on an XML node's position in the document. For instance 'a3-e2-e' is the name produced for the 3rd attribute of the second element of the root element of the XML document. When producing the HTML for an XML node, I add an 'id' attribute to the HTML with the unique string of the XML node. Then in javascript when I execute an XPath I can discover the unique string of each node in the selected set and map each of them to their corresponding positions in the HTML.

With the hard part out of the way I changed the onload to get the fragment of the URI of the current document, interpret it as an XPath and highlight and navigate to the selected nodes. I also added an interactive floating bar from which you can enter your own XPaths and do the same. On a related note, I found that when accessing XML files via the file URI scheme the fragment is stripped off and not available to the javascript.

The next steps are of course to actually implement XPointer framework parsing as well as the limited number of schemes that the XPointer framework specifies.PermalinkCommentsxml xpointer msxml res xpath xslt resource ie7 technical browser ie xsl

RandomGrammar

2007 Mar 28, 12:54Given an ABNF description of a grammar, RandomGrammar produces a random string that fits that grammar. This is a personal project I worked on previously and have just now made available again on my website.PermalinkCommentsme personal projects java randomgrammar abnf

ishida >> utilities: Unicode character pickers

2007 Feb 14, 3:12Another of Richard's tools that allows you to compose strings by visually picking characters from particular alphabets.PermalinkCommentsunicode tools picker encoding javascript language tool codepage i18n
Older EntriesNewer Entries Creative Commons License Some rights reserved.