Learn XML DTD and DOCTYPE — DTD Syntax, Element/Attribute Declarations, Entities, Validation, Limitations

DodaTech Updated Jun 20, 2026 8 min read

Document Type Definitions (DTD) define the structure and legal elements of an XML document. While older than XML Schema, DTD is still widely used for simple validation and entity management. This guide covers DTD syntax, validation, entities, and when to use DTD vs XSD.

What You’ll Learn

You’ll write DTD declarations for elements, attributes, and entities, use internal and external DTD subsets, validate XML documents against DTDs, understand DTD limitations, and choose between DTD and XSD for validation.

Learning Path

XML Namespaces → DTD & DOCTYPE (you are here) → XSD Schema → XQuery

DOCTYPE Declaration

The DOCTYPE appears before the root element:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE bookstore SYSTEM "bookstore.dtd">
<bookstore>
    <book category="fiction">
        <title lang="en">DTD Guide</title>
        <author>Alice Smith</author>
        <price>29.99</price>
    </book>
</bookstore>

Internal vs External DTD

<?xml version="1.0"?>
<!-- Internal DTD: declarations in the same file (the XML declaration must come first) -->
<!DOCTYPE bookstore [
    <!ELEMENT bookstore (book+)>
    <!ELEMENT book (title, author, price)>
    <!ATTLIST book category CDATA #REQUIRED>
    <!ELEMENT title (#PCDATA)>
    <!ATTLIST title lang CDATA "en">
    <!ELEMENT author (#PCDATA)>
    <!ELEMENT price (#PCDATA)>
]>
<bookstore>
    <book category="fiction">
        <title>DTD Guide</title>
        <author>Alice</author>
        <price>29.99</price>
    </book>
</bookstore>

<!-- External DTD — declarations in a separate .dtd file -->
<!DOCTYPE bookstore SYSTEM "bookstore.dtd">
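
To see what a parser actually receives from a DOCTYPE line, here is a minimal Python sketch using the standard library's expat bindings. The helper name `read_doctype` is hypothetical; it only reports the declaration and makes no attempt to fetch the DTD or validate against it.

```python
from xml.parsers import expat


def read_doctype(xml_text):
    """Return (root_name, system_id, public_id, has_internal_subset) or None."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat calls this when it reaches the <!DOCTYPE ...> declaration
        found.append((name, system_id, public_id, bool(has_internal_subset)))

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found[0] if found else None


external = '<!DOCTYPE bookstore SYSTEM "bookstore.dtd"><bookstore/>'
internal = '<!DOCTYPE bookstore [<!ELEMENT bookstore EMPTY>]><bookstore/>'

print(read_doctype(external))
print(read_doctype(internal))
```

The external form carries a system identifier and no internal subset; the internal form is the reverse. A document can also combine both.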

Element Declarations

<!ELEMENT element-name content-model>

Content Models

Model	Syntax	Example
Empty	`EMPTY`	`<!ELEMENT br EMPTY>`
Any	`ANY`	`<!ELEMENT div ANY>`
Text only	`(#PCDATA)`	`<!ELEMENT title (#PCDATA)>`
Sequence	`(a, b, c)`	`<!ELEMENT book (title, author, price)>`
Choice	`(a | b)`	`<!ELEMENT contact (phone | …)>`
Mixed	`(#PCDATA | a)*`	`<!ELEMENT p (#PCDATA | …)*>`

Occurrence Indicators

Indicator	Meaning	Example
None	Exactly one	`(title)`
`?`	Zero or one	`(subtitle?)`
`*`	Zero or more	`(author*)`
`+`	One or more	`(book+)`

Complete DTD Example

<!-- bookstore.dtd -->
<!ELEMENT bookstore (book+)>
<!ELEMENT book (title, subtitle?, author+, editor*, price, chapters)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT subtitle (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT editor (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT chapters (chapter+)>
<!ELEMENT chapter (title, content)>
<!ELEMENT content (#PCDATA)>
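
A sequence model with occurrence indicators behaves much like a regular expression over child element names. The following Python sketch illustrates that idea for the `book` model above. The function names are hypothetical, and it handles only flat, comma-separated sequences, not nested groups or choices.

```python
import re


def model_to_regex(content_model):
    """Translate a flat sequence model like '(a, b?, c+)' into a regex.

    Handles only comma-separated names with optional ?, *, + suffixes;
    nested groups and choices are out of scope for this sketch.
    """
    parts = []
    for item in content_model.strip().strip('()').split(','):
        item = item.strip()
        indicator = item[-1] if item[-1] in '?*+' else ''
        name = item.rstrip('?*+')
        # Each child name is matched as a whole token followed by a space
        parts.append(f'(?:{re.escape(name)} ){indicator}')
    return re.compile(''.join(parts))


def children_match(content_model, child_names):
    """Check an ordered list of child element names against the model."""
    sequence = ''.join(name + ' ' for name in child_names)
    return model_to_regex(content_model).fullmatch(sequence) is not None


BOOK = '(title, subtitle?, author+, editor*, price, chapters)'

# Optional subtitle and editor omitted, two authors: accepted
print(children_match(BOOK, ['title', 'author', 'author', 'price', 'chapters']))
# author before title: rejected, because a sequence fixes the order
print(children_match(BOOK, ['author', 'title', 'price', 'chapters']))
# no author at all: rejected, because + requires at least one
print(children_match(BOOK, ['title', 'price', 'chapters']))
```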

Attribute Declarations

<!ATTLIST element-name
    attr-name attr-type default-decl
    attr-name attr-type default-decl
    ...>

Attribute Types

Type	Description	Example
`CDATA`	Character data	`CDATA`
`(val1|val2)`	Enumerated	`(fiction|non-fiction)`
`ID`	Unique identifier	`ID`
`IDREF`	Reference to an ID	`IDREF`
`IDREFS`	Multiple IDREFs	`IDREFS`
`NMTOKEN`	Name token	`NMTOKEN`
`ENTITY`	Name of a declared unparsed entity	`ENTITY`

Default Declarations

Declaration	Meaning
`#REQUIRED`	Attribute must be present
`#IMPLIED`	Attribute is optional
`#FIXED "value"`	Attribute has fixed value
`"default"`	Attribute has default value

Attribute Declaration Examples

<!ATTLIST book
    category (fiction|non-fiction|reference) "fiction"
    id ID #REQUIRED
    isbn CDATA #REQUIRED
    lang NMTOKEN "en"
    edition CDATA #IMPLIED
>

<!ATTLIST chapter
    id ID #REQUIRED
    parent IDREF #IMPLIED
>
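
Attribute defaults take effect even without validation, because many parsers read ATTLIST declarations from the internal subset and fill in missing attributes. The sketch below assumes Python's standard `xml.etree.ElementTree` (backed by expat) and a cut-down version of the bookstore document. It is an illustration of the default mechanism, not a validator.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE bookstore [
    <!ELEMENT bookstore (book+)>
    <!ELEMENT book (title)>
    <!ATTLIST book category (fiction|non-fiction|reference) "fiction">
    <!ELEMENT title (#PCDATA)>
    <!ATTLIST title lang NMTOKEN "en">
]>
<bookstore>
    <book>
        <title>DTD Guide</title>
    </book>
    <book category="reference">
        <title lang="de">DTD Handbuch</title>
    </book>
</bookstore>"""

root = ET.fromstring(DOC)
for book in root.findall('book'):
    title = book.find('title')
    # Attributes missing from the document are filled in from the ATTLIST
    # defaults; attributes written explicitly keep their own values
    print(book.get('category'), title.get('lang'), title.text)
```

An `#IMPLIED` attribute has no default, so `get()` would return `None` for it when the document omits it.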

Entity Declarations

Entities are shortcuts for reusable content:

General Entities (used in content)

<!-- Internal entity -->
<!ENTITY author "Alice Smith">
<!ENTITY copyright "Copyright 2026 DodaTech">

<!-- External entity -->
<!ENTITY intro SYSTEM "intro.xml">
<!ENTITY legal SYSTEM "legal.txt">

<!-- Usage in XML -->
<book>
    <author>&author;</author>
    <intro>&intro;</intro>
    <legal>&legal;</legal>
</book>
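
Internal general entities are expanded by the parser before the application sees the text. A minimal Python sketch with the standard `xml.etree.ElementTree` follows. The `notice` element is a hypothetical name used only for this illustration, and external entities are left out because this parser does not fetch them.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE book [
    <!ENTITY author "Alice Smith">
    <!ENTITY copyright "Copyright 2026 DodaTech">
]>
<book>
    <author>&author;</author>
    <notice>&copyright;</notice>
</book>"""

book = ET.fromstring(DOC)

# The tree holds the replacement text, not the &name; references
print(book.find('author').text)
print(book.find('notice').text)
```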

Parameter Entities (used in DTD)

<!-- Parameter entity declaration (no parentheses: mixed content cannot contain nested groups) -->
<!ENTITY % inline "b | i | u">
<!ENTITY % common-attrs
    "id ID #IMPLIED
     class CDATA #IMPLIED
     style CDATA #IMPLIED">

<!-- Usage in DTD (references inside declarations are allowed only in the external subset) -->
<!ELEMENT paragraph (#PCDATA | %inline;)*>
<!ATTLIST paragraph %common-attrs;>

Validation with DTD

Java Validation

import javax.xml.parsers.*;
import org.xml.sax.*;

public class DtdValidator {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);  // Enable DTD validation
        factory.setNamespaceAware(true);

        DocumentBuilder builder = factory.newDocumentBuilder();

        // Validation errors are recoverable by default: unless the handler
        // rethrows them, parse() returns normally for an invalid document
        builder.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                System.out.println("WARNING: " + e.getMessage());
            }

            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e;
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e;
            }
        });

        try {
            builder.parse("books.xml");
            System.out.println("Document is valid.");
        } catch (SAXException e) {
            System.out.println("Validation failed: " + e.getMessage());
        }
    }
}

Python Validation

from lxml import etree

def validate_with_dtd(xml_file, dtd_file=None):
    """Validate XML against an external DTD file or its own DOCTYPE."""
    try:
        if dtd_file:
            # External DTD: parse the document, then validate the tree
            dtd = etree.DTD(dtd_file)
            tree = etree.parse(xml_file)
            if dtd.validate(tree):
                print("Document is valid.")
            else:
                print(f"Validation error: {dtd.error_log.last_error}")
        else:
            # DOCTYPE in the file: lxml validates only when asked to
            parser = etree.XMLParser(dtd_validation=True)
            etree.parse(xml_file, parser)
            print("Document is valid.")
    except etree.XMLSyntaxError as e:
        print(f"Parse or validation error: {e}")

# Usage
validate_with_dtd('books.xml', 'bookstore.dtd')

DTD Limitations

Limitation	Explanation	XSD Solution
No data types	Element and attribute text is untyped character data (`#PCDATA`, `CDATA`)	Rich type system (string, int, date, etc.)
No namespace support	Prefixed names such as `bk:title` are matched as literal strings	Full namespace integration
Limited cardinality	Only ?, *, +	minOccurs/maxOccurs
No inheritance	No type extension	Complex type extension/restriction
No key/unique	Only ID/IDREF (global)	xs:key, xs:unique, xs:keyref
No regex patterns	No pattern constraints	xs:pattern facet
Fixed prefixes	A document must use exactly the prefixes the DTD was written with	Names are matched by namespace URI, so any prefix works
Not XML syntax	DTD uses non-XML syntax	XSD is XML-based

Common DTD Mistakes

1. ID/IDREF Not Being Unique

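ID values must be unique across the entire document, regardless of element type. Two elements with the same ID make the document invalid. Use a consistent prefix scheme (e.g., ch01, ch02) to keep values unique; an ID must also be a valid XML name, so it cannot start with a digit.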
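
A quick way to spot these problems before running a validator is to count ID values directly. The Python sketch below assumes the attribute names `id` and `parent` from the `chapter` declaration earlier. A real validator learns which attributes are ID or IDREF from the DTD, whereas this hypothetical `check_ids` helper simply goes by name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_ids(xml_text, id_attr='id', ref_attr='parent'):
    """Return (duplicate_ids, dangling_refs) for the given attribute names."""
    root = ET.fromstring(xml_text)
    ids = Counter(el.get(id_attr) for el in root.iter()
                  if el.get(id_attr) is not None)
    duplicates = sorted(value for value, count in ids.items() if count > 1)
    refs = {el.get(ref_attr) for el in root.iter()
            if el.get(ref_attr) is not None}
    # An IDREF must point at an ID that exists somewhere in the document
    dangling = sorted(refs - set(ids))
    return duplicates, dangling


DOC = """<chapters>
    <chapter id="ch01"/>
    <chapter id="ch02" parent="ch01"/>
    <chapter id="ch02" parent="ch09"/>
</chapters>"""

print(check_ids(DOC))  # (['ch02'], ['ch09'])
```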

2. Content Model Order Mismatch

<!ELEMENT book (title, author, price)> requires elements in exactly that order. <book><author>...</author><title>...</title></book> fails validation.

3. Entity Expansion Attacks (Billion Laughs)

<!ENTITY lol "lol">
<!ENTITY lol2 "&lol;&lol;&lol;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;">

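This is the “billion laughs” attack: each entity expands to several copies of the previous one, so a few more levels of nesting can exhaust memory. Every entity here is internal, so turning off external entity processing does not stop it. When parsing untrusted XML, reject DOCTYPE declarations or cap entity expansion.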
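
The growth is exponential in the nesting depth, which a little arithmetic makes concrete. This Python sketch computes the expanded size without performing any expansion. The function name is hypothetical, and it assumes every level repeats the previous entity the same number of times.

```python
def expanded_length(base_text, copies_per_level, levels):
    """Length of the final entity after `levels` rounds of nesting.

    Level 0 is the base entity; every further level repeats the
    previous one `copies_per_level` times.
    """
    return len(base_text) * copies_per_level ** levels


# The three declarations above: lol (level 0), lol2 (level 1), lol3 (level 2)
print(expanded_length("lol", 3, 2))   # 27

# The classic attack shape: ten copies per level, nine levels of nesting
print(expanded_length("lol", 10, 9))  # 3000000000
```

A document of well under a kilobyte can therefore demand gigabytes of memory once a reference to the last entity is expanded.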

4. Not Declaring All Elements

DTD validation fails on any undeclared element. Ensure every element used in the XML is declared in the DTD, including nested elements.

5. External Entity Security Risk

External entities (<!ENTITY data SYSTEM "file:///etc/passwd">) can read local files. Disable external entity resolution in production parsers.
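
One conservative defence, sketched here in Python with the standard expat bindings, is to refuse any document that carries a DOCTYPE at all. The names `DoctypeForbidden` and `parse_without_doctype` are hypothetical. This approach suits input that never legitimately needs a DTD; it is not a general-purpose hardening recipe.

```python
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def parse_without_doctype(xml_text):
    """Parse XML with expat, aborting as soon as a DOCTYPE is seen.

    Returns the list of element names in document order.
    """
    names = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # No DOCTYPE means no entity declarations, internal or external
        raise DoctypeForbidden(f"DOCTYPE not allowed (root {name!r})")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)
    return names


safe = '<book><title>DTD Guide</title></book>'
hostile = ('<!DOCTYPE data [<!ENTITY x SYSTEM "file:///etc/passwd">]>'
           '<data>&x;</data>')

print(parse_without_doctype(safe))
try:
    parse_without_doctype(hostile)
except DoctypeForbidden as exc:
    print("rejected:", exc)
```

Because the handler fires before the internal subset is read, neither entity expansion nor external entity resolution gets a chance to run.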

Practice Questions

1. What is the difference between #PCDATA and CDATA? #PCDATA (parsed character data) appears in element content models: it is text in which the parser still recognizes markup and entity references, and in mixed content it can be interleaved with child elements. CDATA as an attribute type means the value is an arbitrary string. #PCDATA is for element content, CDATA is for attribute values.

2. How do internal and external DTD subsets differ? Internal DTD is embedded in the XML file (immediate validation, no external files needed). External DTD is a separate .dtd file (shared across documents, easier maintenance).

3. What are parameter entities used for? Parameter entities define reusable parts within a DTD itself (not in the XML document). They’re used like macros for DTD development — <!ENTITY % common "attr1 CDATA #IMPLIED attr2 CDATA #IMPLIED">.

4. Why would you choose DTD over XSD? DTD is simpler for basic validation, has widespread parser support, supports entities (impossible in XSD), and doesn’t require namespace management. Use DTD for simple document validation. Use XSD for data-oriented validation with types.

5. Challenge: Design a DTD for a configuration file format that supports nested sections, key-value pairs, comments, and inclusion of other config files via entities. Answer: Use mixed content models for sections ((#PCDATA | section | entry)*), attribute declarations for key-value pairs, and external entity references for file inclusion. Declare all element types with appropriate content models.

Mini Project: DTD Generator

Create a tool that generates a DTD from sample XML:

from lxml import etree
from collections import defaultdict

class DtdGenerator:
    """Generate a DTD from sample XML documents"""

    def __init__(self):
        self.elements = defaultdict(lambda: {
            'children': set(),
            'attributes': {},
            'has_text': False,
            'min_occur': defaultdict(int),
            'max_occur': defaultdict(int)
        })

    def analyze(self, xml_file):
        tree = etree.parse(xml_file)
        self._analyze_element(tree.getroot())
        return self

    def _analyze_element(self, element, path=''):
        name = etree.QName(element).localname
        info = self.elements[name]

        # Check for text content
        if element.text and element.text.strip():
            info['has_text'] = True

        # Analyze attributes
        for attr_name in element.attrib:
            attr_type = 'CDATA'
            # Simple type detection
            if element.get(attr_name) in ('true', 'false'):
                attr_type = '(true|false)'
            elif element.get(attr_name).isdigit():
                attr_type = 'CDATA'  # DTD has no numeric types
            info['attributes'][attr_name] = attr_type

        # Analyze children
        child_counts = defaultdict(int)
        for child in element:
            child_name = etree.QName(child).localname
            info['children'].add(child_name)
            child_counts[child_name] += 1

        for child_name, count in child_counts.items():
            if info['min_occur'][child_name] == 0 or \
               count < info['min_occur'][child_name]:
                info['min_occur'][child_name] = count
            if count > info['max_occur'][child_name]:
                info['max_occur'][child_name] = count

        # Recurse
        for child in element:
            self._analyze_element(child)

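    def generate(self, root_name='root'):
        # Each element gets exactly one declaration below. The root is
        # named only in the header comment, since declaring an element
        # twice would make the DTD invalid.
        lines = [f'<!-- Generated DTD (root element: {root_name}) -->']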

        for elem_name, info in sorted(self.elements.items()):
            # Content model
            if info['has_text'] and info['children']:
                children = ' | '.join(
                    sorted(info['children']))
                model = f'(#PCDATA | {children})*'
            elif info['has_text']:
                model = '(#PCDATA)'
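            elif info['children']:
                # Samples do not prove order or counts, so allow the
                # observed children in any order, any number of times
                children = ' | '.join(
                    sorted(info['children']))
                model = f'({children})*'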
            else:
                model = 'EMPTY'

            lines.append(f'<!ELEMENT {elem_name} {model}>')

            # Attributes
            for attr_name, attr_type in sorted(
                info['attributes'].items()
            ):
                # Samples cannot show that an attribute is mandatory,
                # so every attribute is declared optional
                default = '#IMPLIED'
                lines.append(
                    f'<!ATTLIST {elem_name} '
                    f'{attr_name} {attr_type} {default}>')

            lines.append('')  # blank line

        return '\n'.join(lines)


# Usage
generator = DtdGenerator()
generator.analyze('sample.xml')
print(generator.generate('bookstore'))

FAQ

Is DTD still relevant today?

Yes. DTD is still used for: (1) simple document validation, (2) entity management (XSD can’t define entities), (3) legacy systems, (4) older feed formats such as RSS 0.91, (5) SGML-derived formats (DocBook, TEI).

Can DTD and XSD be used together?

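Yes, although they are separate mechanisms. The DOCTYPE drives DTD validation and entity declarations; xsi:schemaLocation points a schema-aware parser at an XSD. A common combination is a DTD internal subset that only declares entities, with the structure validated by the XSD. Whether both validations run depends on how the parser is configured.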

How do I prevent entity expansion attacks?

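Reject DOCTYPE declarations entirely, which blocks both entity expansion and external entity attacks (documents that need a DTD will then fail to parse):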

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

What’s the difference between SYSTEM and PUBLIC identifiers?

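SYSTEM gives a URI from which the DTD can be fetched. PUBLIC gives a publicly known identifier that parsers may map to a local copy, typically through a catalog. In XML a PUBLIC identifier must always be followed by a system identifier, which serves as the fallback.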

Can I have multiple DTDs for one document?

No — one document has one DOCTYPE declaration, which references at most one DTD. You can use parameter entities to include other DTD fragments.

Does DTD support namespaces?

No — DTD has no concept of namespaces. To validate namespace-qualified documents, use XSD or RelaxNG.

What’s Next

XQuery Basics

XML Digital Signatures

XML Validation

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Updated 2026-06-20.
