encoding - Dave's Blog

Search
My timeline on Mastodon

Right-To-Left Override Twitter Name

2020 Oct 21, 3:50

Its rare to find devs anticipating Unicode control characters showing up in user input. And the most fun when unanticipated is the Right-To-Left Override character U+202E. Unicode characters have an implicit direction so that for example by default Hebrew characters are rendered from right to left, and English characters are rendered left to right. The override characters force an explicit direction for all the text that follows.

I chose my Twitter display name to include the HTML encoding of the Right-To-Left Override character #x202E; as a sort of joke or shout out to my favorite Unicode control character. I did not anticipate that some Twitter clients in some of their UI would fail to encode it correctly. There's no way I can remove that from my display name now.


Try it on Amazon.


How about pages that want to tell you about the U+202E. 


PermalinkCommentsUnicode

JavaScript Microsoft Store app StartPage

2017 Jun 22, 8:58

JavaScript Microsoft Store apps have some details related to activation that are specific to JavaScript Store apps and that are poorly documented which I’ll describe here.

StartPage syntax

The StartPage attributes in the AppxManifest.xml (Package/Applications/Application/@StartPage, Package/Applications/Extensions/Extension/@StartPage) define the HTML page entry point for that kind of activation. That is, Application/@StartPage defines the entry point for tile activation, Extension[@Category="windows.protocol"]/@StartPage defines the entry point for URI handling activation, etc. There are two kinds of supported values in StartPage attributes: relative Windows file paths and absolute URIs. If the attribute doesn’t parse as an absolute URI then it is instead interpreted as relative Windows file path.

This implies a few things that I’ll declare explicitly here. Windows file paths, unlike URIs, don’t have a query or fragment, so if you are using a relative Windows file path for your StartPage attribute you cannot include anything like ‘?param=value’ at the end. Absolute URIs use percent-encoding for reserved characters like ‘%’ and ‘#’. If you have a ‘#’ in your HTML filename then you need to percent-encode that ‘#’ for a URI and not for a relative Windows file path.

If you specify a relative Windows file path, it is turned into an ms-appx URI by changing all backslashes to forward slashes, percent-encoding reserved characters, and combining the result with a base URI of ms-appx:///. Accordingly the relative Windows file paths are relative to the root of your package. If you are using a relative Windows file path as your StartPage and need to switch to using a URI so you can include a query or fragment, you can follow the same steps above.

StartPage validity

The validity of the StartPage is not determined before activation. If the StartPage is a relative Windows file path for a file that doesn’t exist, or an absolute URI that is not in the Application Content URI Rules, or something that doesn’t parse as a Windows file path or URI, or otherwise an absolute URI that fails to resolve (404, bad hostname, etc etc) then the JavaScript app will navigate to the app’s navigation error page (perhaps more on that in a future blog post). Just to call it out explicitly because I have personally accidentally done this: StartPage URIs are not automatically included in the Application Content URI Rules and if you forget to include your StartPage in your ACUR you will always fail to navigate to that StartPage.

StartPage navigation

When your app is activated for a particular activation kind, the StartPage value from the entry in your app’s manifest that corresponds to that activation kind is used as the navigation target. If the app is not already running, the app is activated, navigated to that StartPage value and then the Windows.UI.WebUI.WebUIApplication activated event is fired (more details on the order of various events in a moment). If, however, your app is already running and an activation occurs, we navigate or don’t navigate to the corresponding StartPage depending on the current page of the app. Take the app’s current top level document’s URI and if after removing the fragment it already matches the StartPage value then we won’t navigate and will jump straight to firing the WebUIApplication activated event.

Since navigating the top-level document means destroying the current JavaScript engine instance and losing all your state, this behavior might be a problem for you. If so, you can use the MSApp.pageHandlesAllApplicationActivations(true) API to always skip navigating to the StartPage and instead always jump straight to firing the WebUIApplication activated event. This does require of course that all of your pages all handle all activation kinds about which any part of your app cares.

PermalinkComments

Tweet from David_Risney

2015 Dec 14, 12:28
First thought was: How did a character encoding issue get that far? pic.twitter.com/TIxrbLcylO
PermalinkComments

Tweet from David_Risney

2015 Mar 26, 4:20
Anyone know why Chrome percent-encodes single quote in URI query? http://jsfiddle.net/unLrqxso/1/  Its a reserved char so encoding changes URI.
PermalinkComments

Tweet from David_Risney

2015 Feb 12, 8:35
Unicode encoding holy wars via Mark Pilgrim / Emo Philips http://web.archive.org/web/20080209154953/http://diveintomark.org/archives/2004/07/06/nfc …
PermalinkComments

David_Risney: Learn about ES6 template strings: I didn't know about tagged templates allowing for HTML encoding in template strings

2015 Jan 21, 12:15
David Risney @David_Risney :
Learn about ES6 template strings: http://updates.html5rocks.com/2015/01/ES6-Template-Strings … I didn't know about tagged templates allowing for HTML encoding in template strings
PermalinkComments

URI functions in Windows Store Applications

2013 Jul 25, 1:00PermalinkCommentsc# c++ javascript technical uri windows windows-runtime windows-store

Stripe CTF - XSS, CSRF (Levels 4 & 6)

2012 Sep 10, 4:43

Level 4 and level 6 of the Stripe CTF had solutions around XSS.

Level 4

Code

> Registered Users 

    <%@registered_users.each do |user| %>
    <%last_active = user[:last_active].strftime('%H:%M:%S UTC') %>
    <%if @trusts_me.include?(user[:username]) %>

  • <%= user[:username] %>
    (password: <%= user[:password] %>, last active <%= last_active %>)
  • Issue

    The level 4 web application lets you transfer karma to another user and in doing so you are also forced to expose your password to that user. The main user page displays a list of users who have transfered karma to you along with their password. The password is not HTML encoded so we can inject HTML into that user's browser. For instance, we could create an account with the following HTML as the password which will result in XSS with that HTML:

    
    
    This HTML runs script that uses jQuery to post to the transfer URI resulting in a transfer of karma from the attacked user to the attacker user, and also the attacked user's password.

    Notes

    Code review red flags in this case included lack of encoding when using user controlled content to create HTML content, storing passwords in plain text in the database, and displaying passwords generally. By design the web app shows users passwords which is a very bad idea.

    Level 6

    Code

    
    

    ...

    def self.safe_insert(table, key_values)
    key_values.each do |key, value|
    # Just in case people try to exfiltrate
    # level07-password-holder's password
    if value.kind_of?(String) &&
    (value.include?('"') || value.include?("'"))
    raise "Value has unsafe characters"
    end
    end

    conn[table].insert(key_values)
    end

    Issue

    This web app does a much better job than the level 4 app with HTML injection. They use encoding whenever creating HTML using user controlled data, however they don't use encoding when injecting JSON data into script (see post_data initialization above). This JSON data is the last five most recent messages sent on the app so we get to inject script directly. However, the system also ensures that no strings we write contains single or double quotes so we can't get out of the string in the JSON data directly. As it turns out, HTML lets you jump out of a script block using no matter where you are in script. For instance, in the middle of a value in some JSON data we can jump out of script. But we still want to run script, so we can jump right back in. So the frame so far for the message we're going to post is the following:

    
    
    
    
PermalinkCommentscsrf encoding html internet javascript percent-encoding script security stripe-ctf technical web xss

Stripe CTF - SQL injections (Levels 0 & 3)

2012 Sep 5, 9:10

Stripe's web security CTF's level 0 and level 3 had SQL injection solutions described below.

Level 0

Code

app.get('/*', function(req, res) {
var namespace = req.param('namespace');

if (namespace) {
var query = 'SELECT * FROM secrets WHERE key LIKE ? || ".%"';
db.all(query, namespace, function(err, secrets) {

Issue

There's no input validation on the namespace parameter and it is injected into the SQL query with no encoding applied. This means you can use the '%' character as the namespace which is the wildcard character matching all secrets.

Notes

Code review red flag was using strings to query the database. Additional levels made this harder to exploit by using an API with objects to construct a query rather than strings and by running a query that only returned a single row, only ran a single command, and didn't just dump out the results of the query to the caller.

Level 3

Code

@app.route('/login', methods=['POST'])
def login():
username = flask.request.form.get('username')
password = flask.request.form.get('password')

if not username:
return "Must provide username\n"

if not password:
return "Must provide password\n"

conn = sqlite3.connect(os.path.join(data_dir, 'users.db'))
cursor = conn.cursor()

query = """SELECT id, password_hash, salt FROM users
WHERE username = '{0}' LIMIT 1""".format(username)
cursor.execute(query)

res = cursor.fetchone()
if not res:
return "There's no such user {0}!\n".format(username)
user_id, password_hash, salt = res

calculated_hash = hashlib.sha256(password + salt)
if calculated_hash.hexdigest() != password_hash:
return "That's not the password for {0}!\n".format(username)

Issue

There's little input validation on username before it is used to constrcut a SQL query. There's no encoding applied when constructing the SQL query string which is used to, given a username, produce the hashed password and the associated salt. Accordingly one can make username a part of a SQL query command which ensures the original select returns nothing and provide a new SELECT via a UNION that returns some literal values for the hash and salt. For instance the following in blue is the query template and the red is the username injected SQL code:

SELECT id, password_hash, salt FROM users WHERE username = 'doesntexist' UNION SELECT id, ('5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8') AS password_hash, ('word') AS salt FROM users WHERE username = 'bob' LIMIT 1
In the above I've supplied my own salt and hash such that my salt (word) plus my password (pass) hashed produce the hash I provided above. Accordingly, by providing the above long and interesting looking username and password as 'pass' I can login as any user.

Notes

Code review red flag is again using strings to query the database. Although this level was made more difficult by using an API that returns only a single row and by using the execute method which only runs one command. I was forced to (as a SQL noob) learn the syntax of SELECT in order to figure out UNION and how to return my own literal values.

PermalinkCommentssecurity sql sql-injection technical web-security

Stripe Web Security CTF Summary

2012 Aug 30, 5:00

I was the 546th person to complete Stripe's web security CTF and again had a ton of fun applying my theoretical knowledge of web security issues to the (semi-)real world. As I went through the levels I thought about what red flags jumped out at me (or should have) that I could apply to future code reviews:

Level Issue Code Review Red Flags
0 Simple SQL injection No encoding when constructing SQL command strings. Constructing SQL command strings instead of SQL API
1 extract($_GET); No input validation.
2 Arbitrary PHP execution No input validation. Allow file uploads. File permissions modification.
3 Advanced SQL injection Constructing SQL command strings instead of SQL API.
4 HTML injection, XSS and CSRF No encoding when constructing HTML. No CSRF counter measures. Passwords stored in plain text. Password displayed on site.
5 Pingback server doesn't need to opt-in n/a - By design protocol issue.
6 Script injection and XSS No encoding while constructing script. Deny list (of dangerous characters). Passwords stored in plain text. Password displayed on site.
7 Length extension attack Custom crypto code. Constructing SQL command string instead of SQL API.
8 Side channel attack Password handling code. Timing attack mitigation too clever.

More about each level in the future.

PermalinkCommentscode-review coding csrf html internet programming script security sql stripe technical web xss

HTTP Compression Documentation Reference

2012 Jun 13, 3:08
There's a lot of name reuse in HTTP compression so I've made the following to help myself keep it straight.
HTTP Content Coding Token gzip deflate compress
An encoding format produced by the file compression program "gzip" (GNU zip) The "zlib" format as described in RFC 1950. The encoding format produced by the common UNIX file compression program "compress".
Data Format GZIP file format ZLIB Compressed Data Format The compress program's file format
Compression Method Deflate compression method LZW
Deflate consists of LZ77 and Huffman coding

Compress doesn't seem to be supported by popular current browsers, possibly due to its past with patents.

Deflate isn't done correctly all the time. Some servers would send the deflate data format instead of the zlib data format and at least some versions of Internet Explorer expect deflate data format instead of zlib data format.

PermalinkCommentscompress compression deflate gzip http http-header technical zlib

URI Percent Encoding Ignorance Level 2 - There is no Unencoded URI

2012 Feb 20, 4:00

As a professional URI aficionado I deal with various levels of ignorance on URI percent-encoding (aka URI encoding, or URL escaping).

Getting into the more subtle levels of URI percent-encoding ignorance, folks try to apply their knowledge of percent-encoding to URIs as a whole producing the concepts escaped URIs and unescaped URIs. However there are no such things - URIs themselves aren't percent-encoded or decoded but rather contain characters that are percent-encoded or decoded. Applying percent-encoding or decoding to a URI as a whole produces a new and non-equivalent URI.

Instead of lingering on the incorrect concepts we'll just cover the correct ones: there's raw unencoded data, non-normal form URIs and normal form URIs. For example:

  1. http://example.com/%74%68%65%3F%70%61%74%68?query
  2. http://example.com/the%3Fpath?query
  3. "http", "example.com", "the?path", "query"

In the above (A) is not an 'encoded URI' but rather a non-normal form URI. The characters of 'the' and 'path' are percent-encoded but as unreserved characters specific in the RFC should not be encoded. In the normal form of the URI (B) the characters are decoded. But (B) is not a 'decoded URI' -- it still has an encoded '?' in it because that's a reserved character which by the RFC holds different meaning when appearing decoded versus encoded. Specifically in this case, it appears encoded which means it is data -- a literal '?' that appears as part of the path segment. This is as opposed to the decoded '?' that appears in the URI which is not part of the path but rather the delimiter to the query.

Usually when developers talk about decoding the URI what they really want is the raw data from the URI. The raw decoded data is (C) above. The only thing to note beyond what's covered already is that to obtain the decoded data one must parse the URI before percent decoding all percent-encoded octets.

Of course the exception here is when a URI is the raw data. In this case you must percent-encode the URI to have it appear in another URI. More on percent-encoding while constructing URIs later.

PermalinkCommentsurl encoding uri technical percent-encoding

URI Percent-Encoding Ignorance Level 1 - Purpose

2012 Feb 15, 4:00

As a professional URI aficionado I deal with various levels of ignorance on URI percent-encoding (aka URI encoding, or URL escaping).

Worse than the lame blog comments hating on percent-encoding is the shipping code which can do actual damage. In one very large project I won't name, I've fixed code that decodes all percent-encoded octets in a URI in order to get rid of pesky percents before calling ShellExecute. An unnamed developer with similar intent but clearly much craftier did the same thing in a loop until the string's length stopped changing. As it turns out percent-encoding serves a purpose and can't just be removed arbitrarily.

Percent-encoding exists so that one can represent data in a URI that would otherwise not be allowed or would be interpretted as a delimiter instead of data. For example, the space character (U+0020) is not allowed in a URI and so must be percent-encoded in order to appear in a URI:

  1. http://example.com/the%20path/
  2. http://example.com/the path/
In the above the first is a valid URI while the second is not valid since a space appears directly in the URI. Depending on the context and the code through which the wannabe URI is run one may get unexpected failure.

For an additional example, the question mark delimits the path from the query. If one wanted the question mark to appear as part of the path rather than delimit the path from the query, it must be percent-encoded:

  1. http://example.com/foo%3Fbar
  2. http://example.com/foo?bar
In the second, the question mark appears plainly and so delimits the path "/foo" from the query "bar". And in the first, the querstion mark is percent-encoded and so the path is "/foo%3Fbar".
PermalinkCommentsencoding uri technical ietf percent-encoding

URI Percent Encoding Ignorance Level 0 - Existence

2012 Feb 10, 4:00

As a professional URI aficionado I deal with various levels of ignorance on URI percent-encoding (aka URI encoding, or URL escaping). The basest ignorance is with respect to the mere existence of percent-encoding. Percents in URIs are special: they always represent the start of a percent-encoded octet. That is to say, a percent is always followed by two hex digits that represents a value between 0 and 255 and doesn't show up in a URI otherwise.

The IPv6 textual syntax for scoped addresses uses the '%' to delimit the zone ID from the rest of the address. When it came time to define how to represent scoped IPv6 addresses in URIs there were two camps: Folks who wanted to use the IPv6 format as is in the URI, and those who wanted to encode or replace the '%' with a different character. The resulting thread was more lively than what shows up on the IETF URI discussion mailing list. Ultimately we went with a percent-encoded '%' which means the percent maintains its special status and singular purpose.

PermalinkCommentsencoding uri technical ietf percent-encoding ipv6

draft-liman-tld-names-06 - Top Level Domain Name Specification

2011 Dec 4, 3:00

“The syntax for allowed Top-Level Domain (TLD) labels in the Domain Name System (DNS) is not clearly applicable to the encoding of Internationalised Domain Names (IDNs) as TLDs. This document provides a concise specification of TLD label syntax based on existing syntax documentation, extended minimally to accommodate IDNs.” Still irritated about arbitrary TLDs.

PermalinkCommentstechnical syntax dns tld idn

Indicating Character Encoding and Language for HTTP Header Field Parameters

2011 Nov 24, 7:45

From the document: ‘Appendix B. Implementation Report: The encoding defined in this document currently is used for two different HTTP header fields: “Content-Disposition”, defined in [RFC6266], and “Link”, defined in [RFC5988]. As the encoding is a profile/clarification of the one defined in [RFC2231] in 1997, many user agents already supported it for use in “Content-Disposition” when [RFC5987] got published.

Since the publication of [RFC5987], two more popular desktop user agents have added support for this encoding; see http://purl.org/
   NET/http/content-disposition-tests#encoding-2231-char for details. At this time, only one major desktop user agent (Safari) does not support it.

Note that the implementation in Internet Explorer 9 does not support the ISO-8859-1 encoding; this document revision acknowledges that UTF-8 is sufficient for expressing all code points, and removes the requirement to support ISO-8859-1.’

Yay for UTF-8!

PermalinkCommentstechnical http http-headers ie9 internationalization utf-8 encoding

Completion of IANA Selection of IDNA Prefix

2010 Dec 8, 6:44Description of how they picked 'xn--' as the ACE prefix for IDN. Shockingly elaborate =)PermalinkCommentsidn technical ace encoding unicode rfc ietf

RFC 6068 - The 'mailto' URI Scheme

2010 Oct 5, 2:54The mailto URI scheme finally gets its own RFC.PermalinkCommentsmailto uri url mail email technical rfc reference encoding

RFC 5987 - Character Set and Language Encoding for Hypertext Transfer Protocol (HTTP) Header Field Parameters

2010 Aug 13, 11:47Other characters sets for HTTP headers: "By default, message header field parameters in Hypertext Transfer Protocol (HTTP) messages cannot carry characters outside the ISO-8859-1 character set. RFC 2231 defines an encoding mechanism for use in Multipurpose Internet Mail Extensions (MIME) headers. This document specifies an encoding suitable for use in HTTP header fields that is compatible with a profile of the encoding defined in RFC 2231."PermalinkCommentsrfc language localization charset http technical reference http-header

Riviera Blog :: QTQR

2010 Apr 12, 10:44QR code degenerator allows you to mess with some pixels of a QR code or insert pictures without messing up the encoding.
PermalinkCommentsqrcode qr technical
Older Entries Creative Commons License Some rights reserved.