12
Jun

User generated output

   Posted by: Joshbw   in Browser/Web Security

We all encounter websites that accept user generated comments and this is often a a source of cross site scripting. In the past couple of weeks I have contacted two larger websites about this. In many cases there is a simple solution- whitelist input validation (which becomes slightly more complex if supporting internationalization) and output HTML entity encoding.

In ASP.Net a simple call to System.Web.HttpUtility.HtmlEncode will handle the output encoding for you. I actually like PHP’s htmlentities function a bit more, as you have control over how quotes are encoded and can additionally choose a character set (I very rarely can praise PHP over ASP.net, but they deserve credit for that). Java apparently doesn’t have a native method for entity encoding, which is dumb, but the nice folks at OWASP have provided a method. One caveat, like much of the OWASP material it is very UTF-8 centric, so it is less than optimal for international support and alternate character sets.

The problem is that many websites like customers to be able to supply some HTML that does render. Social networking sites love all the nice HTML based output that a billion online quizzes generate, and users like to be able to have some formatting other than plaintext (wiki does this right with their custom markup). That shoots whitelist validation of characters right down, and prevents entity encoding. Instead it requires custom parsing to validate which tags are used, and further parsing to make sure they aren’t used in a dangerous way. For example <a href=”something” onmouseover=”alert(’Give me a cookie’);”>this is pertinent </a> needs to be parsed to make sure that nothing malicious was in the legitimate attack. It isn’t as simple as just blacklisting any use of on* to stop events, since many of these sites want style tags to be used as well (I can’t imagine why, someone can do serious defacement this way), and good ol’ IE has this nice Expression value that can execute javascript inside of the style (the problem with this microsoft, is that the only people who use it anymore are people misusing it).

When one includes all of the myriad of ways to use encoding to obfuscate attacks, accepting selective HTML, especially if you accept more than just UTF-8 input, results in a seriously complex validation engine. Complexity is the enemy of security. Both of the websites I contacted (having used an isolated test area impacting only me- remember kids, it isn’t ok to do web vulnerability testing where you can ever impact another user) accepted some HTML tags and had flaws in how they validated them.

~Joshbw

This entry was posted on Thursday, June 12th, 2008 at 10:05 am and is filed under Browser/Web Security. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Leave a reply

Name (*)
Mail (will not be published) (*)
URI
Comment