2009 Mar 17, 6:02
"With LUA scripting included in the latest version of NES emulator FCEUX, Rusted Logic blogger Xkeeper has woven some black magic into Super Mario that gives you full keyboard/mouse control over your
surroundings."
video youtube videogames videogame hack mario 2009 Mar 16, 4:22
"This data set, contributed by Google Inc., contains English word n-grams and their observed frequency counts. The length of the n-grams ranges from unigrams (single words) to five-grams. We expect
this data will be useful for statistical language modeling, e.g., for machine translation or speech recognition, as well as for other uses." 6 DVDs for only $150 with licensing restri... ok nm.
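To make "n-grams and their observed frequency counts" concrete, here is a minimal sketch that counts word n-grams in a short string. The function name `countNgrams` and the naive whitespace tokenization are my own assumptions for illustration; this is not how the Google data set was built.

```javascript
// Count word n-grams for n = 1..maxN (unigrams up to maxN-grams).
// Tokenization is a naive whitespace split, assumed here for illustration only.
function countNgrams(text, maxN) {
  var words = text.split(/\s+/).filter(function (w) { return w.length > 0; });
  var counts = new Map();
  for (var n = 1; n <= maxN; n++) {
    for (var i = 0; i + n <= words.length; i++) {
      // Join n consecutive words into one key, e.g. "the cat"
      var gram = words.slice(i, i + n).join(' ');
      counts.set(gram, (counts.get(gram) || 0) + 1);
    }
  }
  return counts;
}
```

For example, `countNgrams('the cat saw the cat', 2)` counts "the cat" twice and "cat saw" once.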
language google statistics database text 2009 Mar 6, 5:16
I've found while debugging networking in IE that it's often useful to quickly tell whether a string is encoded in UTF-8. You can check for the Byte Order Mark (EF BB BF in UTF-8), but I rarely see the BOM on
UTF-8 strings. Instead I apply a quick and dirty UTF-8 test that takes advantage of the well-formed UTF-8 restrictions.
Unlike other multibyte character encoding forms (see Windows supported character sets or IANA's list of character sets), for example Big5, where sticking together any two bytes is more likely than not to give a valid byte sequence, UTF-8 is more restrictive. And unlike
other multibyte character encodings, UTF-8 bytes may be taken out of context and one can still know that it's a single byte character, the starting byte of a three byte sequence, etc.
The full rules for well-formed UTF-8 are a little too complicated for me to commit to memory. Instead I've got my own simpler (this is the quick part) set of rules that will be mostly correct (this
is the dirty part). For as many bytes in the string as you care to examine, check the most significant hex digit (the high four bits) of the byte:
- F: This is byte 1 of a 4 byte encoded codepoint and must be followed by 3 trail bytes.
- E: This is byte 1 of a 3 byte encoded codepoint and must be followed by 2 trail bytes.
- C..D: This is byte 1 of a 2 byte encoded codepoint and must be followed by 1 trail byte.
- 8..B: This is a trail byte.
- 0..7: This is a single byte encoded codepoint.
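The rules above can be sketched in a few lines of JavaScript. This is only an illustration: the name `looksLikeUtf8` is made up, and `bytes` is assumed to be an array (or Uint8Array) of values 0..255. It also treats a string that ends in the middle of a sequence as a failure, which you may want to relax if you only examine a prefix.

```javascript
// Quick and dirty UTF-8 check that looks only at the high hex digit of each byte.
function looksLikeUtf8(bytes) {
  var pendingTrail = 0; // trail bytes still expected for the current codepoint
  for (var i = 0; i < bytes.length; i++) {
    var high = bytes[i] >> 4; // most significant hex digit
    if (pendingTrail > 0) {
      if (high < 0x8 || high > 0xB) return false; // expected a trail byte (8..B)
      pendingTrail--;
    } else if (high <= 0x7) {
      // 0..7: single byte encoded codepoint, nothing more to expect
    } else if (high <= 0xB) {
      return false; // 8..B: trail byte without a lead byte
    } else if (high <= 0xD) {
      pendingTrail = 1; // C..D: lead byte of a 2 byte sequence
    } else if (high === 0xE) {
      pendingTrail = 2; // E: lead byte of a 3 byte sequence
    } else {
      pendingTrail = 3; // F: lead byte of a 4 byte sequence
    }
  }
  return pendingTrail === 0; // a sequence cut off at the end counts as a failure
}
```

For instance, `looksLikeUtf8([0xE2, 0x82, 0xAC])` accepts the three bytes of the euro sign, while `looksLikeUtf8([0xE9, 0x74])` rejects a Latin-1 style "é" followed by "t". The overlong sequence `[0xC0, 0x80]` is accepted even though it is not well-formed, which is the dirty part.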
The simpler rules can produce false positives in some cases: that is, they'll say a string is UTF-8 when in fact it might not be. But they won't produce false negatives. The following table from the Unicode spec actually describes well-formed UTF-8.
| Code Points | 1st Byte | 2nd Byte | 3rd Byte | 4th Byte |
| U+0000..U+007F | 00..7F | | | |
| U+0080..U+07FF | C2..DF | 80..BF | | |
| U+0800..U+0FFF | E0 | A0..BF | 80..BF | |
| U+1000..U+CFFF | E1..EC | 80..BF | 80..BF | |
| U+D000..U+D7FF | ED | 80..9F | 80..BF | |
| U+E000..U+FFFF | EE..EF | 80..BF | 80..BF | |
| U+10000..U+3FFFF | F0 | 90..BF | 80..BF | 80..BF |
| U+40000..U+FFFFF | F1..F3 | 80..BF | 80..BF | 80..BF |
| U+100000..U+10FFFF | F4 | 80..8F | 80..BF | 80..BF |
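For comparison, the table can be turned into a strict check almost mechanically. The sketch below encodes each table row as a lead byte range plus one range per trail byte; `WELL_FORMED_ROWS` and `isWellFormedUtf8` are names I made up, and `bytes` is again assumed to be an array of values 0..255.

```javascript
// One entry per row of the table: lead byte range, then a [lo, hi] range per trail byte.
var WELL_FORMED_ROWS = [
  { lead: [0x00, 0x7F], trail: [] },
  { lead: [0xC2, 0xDF], trail: [[0x80, 0xBF]] },
  { lead: [0xE0, 0xE0], trail: [[0xA0, 0xBF], [0x80, 0xBF]] },
  { lead: [0xE1, 0xEC], trail: [[0x80, 0xBF], [0x80, 0xBF]] },
  { lead: [0xED, 0xED], trail: [[0x80, 0x9F], [0x80, 0xBF]] },
  { lead: [0xEE, 0xEF], trail: [[0x80, 0xBF], [0x80, 0xBF]] },
  { lead: [0xF0, 0xF0], trail: [[0x90, 0xBF], [0x80, 0xBF], [0x80, 0xBF]] },
  { lead: [0xF1, 0xF3], trail: [[0x80, 0xBF], [0x80, 0xBF], [0x80, 0xBF]] },
  { lead: [0xF4, 0xF4], trail: [[0x80, 0x8F], [0x80, 0xBF], [0x80, 0xBF]] }
];

function isWellFormedUtf8(bytes) {
  var i = 0;
  while (i < bytes.length) {
    var lead = bytes[i];
    var row = null;
    for (var r = 0; r < WELL_FORMED_ROWS.length; r++) {
      var range = WELL_FORMED_ROWS[r].lead;
      if (lead >= range[0] && lead <= range[1]) { row = WELL_FORMED_ROWS[r]; break; }
    }
    if (row === null) return false; // 80..C1 and F5..FF never start a sequence
    for (var k = 0; k < row.trail.length; k++) {
      var b = bytes[i + 1 + k]; // undefined if the input ends mid-sequence
      if (b === undefined || b < row.trail[k][0] || b > row.trail[k][1]) return false;
    }
    i += 1 + row.trail.length; // move past the whole sequence
  }
  return true;
}
```

Unlike the quick test, this rejects overlong forms such as `[0xC0, 0x80]`, encoded surrogates such as `[0xED, 0xA0, 0x80]`, and anything above U+10FFFF such as `[0xF4, 0x90, 0x80, 0x80]`.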
test technical unicode boring charset utf8 encoding 2009 Mar 4, 2:39
I knew that the command line tool subst would create virtual drives that map to existing directories, but I didn't know that subst lets you name the virtual drives with characters that aren't
US-ASCII letters. For instance, you can run 'subst 4: C:\windows' and then 'more 4:\win.ini' to dump C:\windows\win.ini. This also works for non-US-ASCII characters like "Ｃ" (aka U+FF23, Fullwidth Latin Capital Letter C), which when displayed by cmd.exe via some best fit style character conversions looks just like the regular US-ASCII 'C'. None of Explorer, IE, or the common file
dialogs allow the use of these odd virtual drives -- just cmd.exe, so I'm not sure how this would ever be useful, but I thought it was odd and I wanted to share.
cli technical boring subst windows 2009 Feb 23, 10:31
"This is an experimental service that makes the Library of Congress Subject Headings available as linked-data using the SKOS vocabulary. The goal of lcsh.info is to encourage experimentation and use
of LCSH on the web with the hopes of informing a similar effort at the Library of Congress to make a continually updated version available. More information about the Linked Data effort can be found
on the W3C Wiki."
library-of-congress loc semanticweb web rdf metadata library api 2009 Feb 14, 5:41
"Now, you can simply add this link tag to specify your preferred version... and Google will understand that the duplicates all refer to the canonical URL:
http://www.example.com/product.php?item=swedish-fish. Additional URL properties, like PageRank and related signals, are transferred as well."
via:mattb google link html url uri canonical canonicalization web 2009 Feb 11, 10:05
"With the iPhone version of WhatTheFont you can use the phone's built-in camera to photograph the text in question (or choose an existing image from your photo albums)... After confirming which
characters are used in the image, the app provides a list of possible matching fonts."
font iphone camera typography 2008 Dec 29, 2:42
Some funny stuff in here, although I don't think anything's actually for sale.
art design via:thefangmonster humor clothing subversion cultural-disobediance 2008 Nov 7, 4:06
'A live "drawing on whiteboard" version [of pong], mixing electronics with the joy of drawing on, wiping off and repositioning your playing bat. Check it out, thrill to the high-speed action and grin
at the ultimate use of a whiteboard: so much better than the usual business drivel that gets drawn on them.'
humor video interactive whiteboard pong game 2008 Oct 28, 11:23
If you view a plain text document in Internet Explorer 8, for instance the plain text version of Cory Doctorow's book Little Brother, and press F12 to bring up the developer toolbar, you can see that IE simply takes the plain text, sticks it inside a <pre> tag, and renders it. This means that word wrapping isn't supplied and the only line breaks that appear are those in the document. However, since the text document is converted to HTML, I can implement word wrap myself using a bookmarklet:
javascript:function ww() { var preTag = document.getElementsByTagName('pre')[0]; preTag.style.fontFamily="arial"; preTag.style.wordWrap='break-word'; }; ww();
After adding a favorite and setting the favorite's URL to the bookmarklet above, I can view a plain text document and select my Word Wrap favorite to apply word wrap and a non-fixed width font.
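For readability, here is the same idea written out as an ordinary function with a guard for pages that have no pre element. This is a sketch only: the name `applyWordWrap` and the `doc` parameter are mine, and the function takes any object with a `getElementsByTagName` method so it can be tried outside a browser.

```javascript
// Expanded form of the word wrap bookmarklet; `doc` is normally the page's document.
function applyWordWrap(doc) {
  var preTag = doc.getElementsByTagName('pre')[0];
  if (!preTag) {
    return false; // no <pre> element, so this is probably not a plain text view
  }
  preTag.style.fontFamily = 'arial';     // switch away from the fixed width font
  preTag.style.wordWrap = 'break-word';  // same style change the bookmarklet makes
  return true;
}
```

In a page you would call `applyWordWrap(document)`. The return value just reports whether a pre element was found.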
browser technical ie wordwrap 2008 Oct 13, 3:26
'But now is a good time to announce that we've decided to officially call the next version of Windows, "Windows 7."' No new name for Win7. That's one less thing for me to remember.
windows windows7 blog vista microsoft 2008 Oct 10, 1:32
Xkcd providing answers to questions that I forgot I had, like what is the answer to the lawn-sprinkler question from Surely You're Joking Mr. Feynman. "Feynman used to tell a story about a simple
lawn-sprinkler physics problem. The nifty thing about the problem was that the answer was immediately obvious, but to some people it was immediately obvious one way and to some it was immediately
obvious the other. (For the record, the answer to Feynman problem, which he never tells you in his book, was that the sprinkler doesn't move at all. Moreover, he only brought it up to start an
argument to act as a diversion while he seduced your mother in the other room.)"
humor feynman comic blog xkcd physics science math 2008 Oct 7, 12:21
Last Thursday I saw a bunch of college friends that I hadn't seen in a while, despite all of us working at Microsoft, and Saul and Ciera who were visiting. We had dinner at Typhoon! which I haven't
been to in quite a while. Daniil and Val brought their cute child. I got to see Charlie and Matt, whom I'm not sure I've seen since my 25th birthday. There was much nerdiness. I need to remember to
organize such a night myself sometime in the near future so I don't have to wait another year to see them.
On the weekend Sarah and I went out to dinner at Carnegie's, a former
public library in Ballard, Seattle that's now a restaurant. I saw the restaurant's website in Matt's delicious links and thought it looked interesting. The exterior and entryway look like a public
library, but just inside it's redone as a sort of modern version of French classical with a bar and two dining rooms. No pictures since my replacement camera only arrived today, but there are
photos available. They serve French cuisine, which was good and
not as expensive as I would have expected. An interesting place, although it's a bit of a drive and I'm not sure if we'll be going back soon.
carnegies personal restaurant weekend nontechnical 2008 Oct 3, 5:29
I thought the disemvowelment of trolls was a pretty funny punishment -- much better than simply removing the comment: "Disemvowelment is - obviously enough - the act of removing the vowels from a
passage of text, as well as a pun on the word 'disembowelling'. A number of blogs and websites do this to offensive text which has been placed in their 'comments' section. ... This site exists
because I couldn't resists the challenge of trying to re-emvowel disemvowelled text. This is a challenging task, as the disemvowelled word 'dg' may well have been 'dog', but also 'dig', 'dug',
'doge', diego' and so on. I have a first cut of this functionality at the re-emvowel link at the side of the page. A more advanced version is in progress."
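Both directions are easy to sketch. The code below is my own illustration, not the linked site's implementation: `disemvowel` strips a, e, i, o and u (treating "y" as a consonant is an assumption), and `reEmvowelCandidates` looks up a skeleton in a word list you supply.

```javascript
// Remove the vowels a, e, i, o, u (either case) from a passage of text.
function disemvowel(text) {
  return text.replace(/[aeiou]/gi, '');
}

// Return every word in `dictionary` whose disemvowelled form equals `skeleton`.
// The dictionary is a hypothetical word list passed in by the caller.
function reEmvowelCandidates(skeleton, dictionary) {
  return dictionary.filter(function (word) {
    return disemvowel(word) === skeleton;
  });
}
```

With the word list `['dog', 'dig', 'dug', 'doge', 'diego', 'cat']`, the skeleton 'dg' matches every word except 'cat', which shows why re-emvowelling is ambiguous without more context.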
tool disemvowelment web comment forum troll language 2008 Sep 3, 9:49
Notes on how COM classes are registered on 64-bit versions of Windows. Whole swaths of the registry (among other things) are redirected to a subnode named Wow6432Node when you're a 32-bit process
running on 64-bit Windows.
msdn registry development microsoft 64bit 2008 Aug 22, 5:35
Photosynth now available and easy to use: "Photosynth, a technology demo from Microsoft Live Labs, has graduated from its "ooh, that's pretty" status to being a viable Web service for consumers. The
technology, which takes a grouping of photographs and stitches them into a faux 3D environment, can now be implemented with photos you've taken on your digital camera or mobile phone, and converted
right on your computer. Previously, the process of stitching these photos together took weeks of processing on specially configured server arrays. With its latest version, Microsoft has managed to
shrink that into around the time it takes to upload your photos."
via:felix42 photosynth photos photography 3d microsoft free tool 2008 Aug 14, 2:23
Lawrence Lessig's video presentation on the history of Creative Commons.
lawrence-lessig lessig video legal law cc history copyright 2008 Aug 6, 2:56
Links to online and offline YouTube video download and conversion tools.
youtube video hack lifehacker flv converter 2008 Jul 22, 5:17
Down on the Farm by Charles Stross. Short scifi story with elements of steampunk and a math/csc based version of the occult.
math scifi fiction free tor literature charles-stross 2008 Jul 9, 9:59
Lively is apparently a coming-soon Google app that's like a web page embeddable version of Second Life.
via:felix42 second-life lively google webservices web2.0