XML DTD and DOCTYPE — DTD Syntax, Element/Attribute Declarations, Entities, Validation, Limitations
Document Type Definitions (DTD) define the structure and legal elements of an XML document. While older than XML Schema, DTD is still widely used for simple validation and entity management. This guide covers DTD syntax, validation, entities, and when to use DTD vs XSD.
What You’ll Learn
You’ll write DTD declarations for elements, attributes, and entities, use internal and external DTD subsets, validate XML documents against DTDs, understand DTD limitations, and choose between DTD and XSD for validation.
Learning Path
flowchart LR
A[XML Namespaces] --> B[DTD & DOCTYPE<br/>You are here]
B --> C[XSD Schema]
C --> D[XQuery]
style B fill:#f90,color:#fff
DOCTYPE Declaration
The DOCTYPE appears before the root element:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE bookstore SYSTEM "bookstore.dtd">
<bookstore>
<book category="fiction">
<title lang="en">DTD Guide</title>
<author>Alice Smith</author>
<price>29.99</price>
</book>
</bookstore>
Internal vs External DTD
<?xml version="1.0"?>
<!-- Internal DTD: declarations in the same file (the XML declaration must stay on the first line) -->
<!DOCTYPE bookstore [
<!ELEMENT bookstore (book+)>
<!ELEMENT book (title, author, price)>
<!ATTLIST book category CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title lang CDATA "en">
<!ELEMENT author (#PCDATA)>
<!ELEMENT price (#PCDATA)>
]>
<bookstore>
<book category="fiction">
<title>DTD Guide</title>
<author>Alice</author>
<price>29.99</price>
</book>
</bookstore>
<!-- External DTD: declarations in a separate .dtd file -->
<!DOCTYPE bookstore SYSTEM "bookstore.dtd">
Element Declarations
<!ELEMENT element-name content-model>
Content Models
| Model | Syntax | Example |
|---|---|---|
| Empty | EMPTY | <!ELEMENT br EMPTY> |
| Any | ANY | <!ELEMENT div ANY> |
| Text only | (#PCDATA) | <!ELEMENT title (#PCDATA)> |
| Sequence | (a, b, c) | <!ELEMENT book (title, author, price)> |
| Choice | (a \| b) | <!ELEMENT contact (phone \| …)> |
| Mixed | (#PCDATA \| a)* | <!ELEMENT p (#PCDATA \| …)*> |
Occurrence Indicators
| Indicator | Meaning | Example |
|---|---|---|
| None | Exactly one | (title) |
| ? | Zero or one | (subtitle?) |
| * | Zero or more | (author*) |
| + | One or more | (book+) |
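One way to build intuition for sequences, choices, and occurrence indicators is to notice that an element-only content model behaves like a regular expression over the list of child element names. The sketch below is an illustration only, not how a validating parser is implemented; the helper names `model_to_regex` and `children_match` are hypothetical, and it assumes a model without #PCDATA, EMPTY, or ANY.

```python
import re

# Simplified pattern for XML element names (ASCII-oriented; an assumption)
NAME = re.compile(r"[A-Za-z_:][\w.:-]*")

def model_to_regex(model: str) -> re.Pattern:
    """Translate an element-only content model into a regex over child names."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group that consumes "name " from the child list.
    pattern = NAME.sub(lambda m: f"(?:{re.escape(m.group())} )", compact)
    # A comma means "followed by", which is plain concatenation in a regex.
    # The DTD operators |, ?, * and + keep their meaning unchanged.
    return re.compile(pattern.replace(",", ""))

def children_match(model: str, children: list[str]) -> bool:
    """Check whether a list of child element names satisfies the model."""
    sequence = "".join(f"{name} " for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None

book = "(title, subtitle?, author+, editor*, price, chapters)"
print(children_match(book, ["title", "author", "price", "chapters"]))   # True
print(children_match(book, ["author", "title", "price", "chapters"]))   # False: wrong order
print(children_match(book, ["title", "price", "chapters"]))             # False: author+ needs one
```

The second call fails only because the children are out of order, which is the most common surprise with sequence models.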
Complete DTD Example
<!-- bookstore.dtd -->
<!ELEMENT bookstore (book+)>
<!ELEMENT book (title, subtitle?, author+, editor*, price, chapters)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT subtitle (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT editor (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT chapters (chapter+)>
<!ELEMENT chapter (title, content)>
<!ELEMENT content (#PCDATA)>
Attribute Declarations
<!ATTLIST element-name
attr-name attr-type default-decl
attr-name attr-type default-decl
...>
Attribute Types
| Type | Description | Example |
|---|---|---|
| CDATA | Character data (any string) | CDATA |
| (val1\|val2) | Enumerated list of allowed values | (fiction\|non-fiction) |
| ID | Unique identifier | ID |
| IDREF | Reference to an ID | IDREF |
| IDREFS | Space-separated list of IDREFs | IDREFS |
| NMTOKEN | Name token | NMTOKEN |
| ENTITY | Name of a declared unparsed entity | ENTITY |
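To check which type a parser recorded for each attribute, the declarations in an internal subset can be read back with Python's standard-library Expat bindings. This is a minimal sketch: the function name `declared_attribute_types` and the sample document (including the ISBN value) are made up for illustration, and Expat does not validate, so it only reports what was declared.

```python
import xml.parsers.expat as expat

# Hypothetical sample document with an internal DTD subset
DOC = """<!DOCTYPE book [
  <!ELEMENT book (chapter*)>
  <!ELEMENT chapter EMPTY>
  <!ATTLIST book
    id   ID    #REQUIRED
    isbn CDATA #REQUIRED>
  <!ATTLIST chapter
    id     ID    #REQUIRED
    parent IDREF #IMPLIED>
]>
<book id="b1" isbn="000"><chapter id="ch01"/></book>"""

def declared_attribute_types(xml_text):
    """Return {(element, attribute): type} for every ATTLIST entry Expat sees."""
    types = {}

    def on_attlist(elname, attname, attr_type, default, required):
        # Called once per attribute definition in the DTD
        types[(elname, attname)] = attr_type

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return types

for (element, attribute), attr_type in declared_attribute_types(DOC).items():
    print(f"{element}/@{attribute}: {attr_type}")
```

Expat does not read an external DTD unless configured to, so this sketch only covers declarations in the internal subset.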
Default Declarations
| Declaration | Meaning |
|---|---|
| #REQUIRED | Attribute must be present |
| #IMPLIED | Attribute is optional |
| #FIXED "value" | Attribute has fixed value |
| "default" | Attribute has default value |
Attribute Declaration Examples
<!ATTLIST book
category (fiction|non-fiction|reference) "fiction"
id ID #REQUIRED
isbn CDATA #REQUIRED
lang NMTOKEN "en"
edition CDATA #IMPLIED
>
<!ATTLIST chapter
id ID #REQUIRED
parent IDREF #IMPLIED
>
Entity Declarations
Entities are shortcuts for reusable content:
General Entities (used in content)
<!-- Internal entity -->
<!ENTITY author "Alice Smith">
<!ENTITY copyright "Copyright 2026 DodaTech">
<!-- External entity -->
<!ENTITY intro SYSTEM "intro.xml">
<!ENTITY legal SYSTEM "legal.txt">
<!-- Usage in XML -->
<book>
<author>&author;</author>
<intro>&intro;</intro>
<legal>&legal;</legal>
</book>
Parameter Entities (used in DTD)
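<!-- Parameter entity declaration -->
<!-- No parentheses in the value: mixed content allows only a flat list of names -->
<!ENTITY % inline "b | i | u">
<!ENTITY % common-attrs
"id ID #IMPLIED
class CDATA #IMPLIED
style CDATA #IMPLIED">
<!-- Usage in DTD (references inside a declaration work only in the external subset) -->
<!ELEMENT paragraph (#PCDATA | %inline;)*>
<!ATTLIST paragraph %common-attrs;>
Validation with DTD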
Java Validation
import javax.xml.parsers.*;
import org.xml.sax.*;

public class DtdValidator {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // Enable DTD validation
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Validity errors are recoverable, so parse() returns normally unless
        // the handler rethrows them. Rethrow so invalid documents reach the catch block.
        builder.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                System.out.println("WARNING: " + e.getMessage());
            }
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // validity error, e.g. an undeclared element
            }
            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e; // well-formedness error
            }
        });
        try {
            builder.parse("books.xml"); // the argument is treated as a URI
            System.out.println("Document is valid.");
        } catch (SAXException e) {
            System.out.println("Validation failed: " + e.getMessage());
        }
    }
}
Python Validation
from lxml import etree

def validate_with_dtd(xml_file, dtd_file=None):
    """Validate XML against a DTD file, or against the DTD named in its DOCTYPE."""
    try:
        if dtd_file:
            # External DTD: parse the document, then validate the tree explicitly
            dtd = etree.DTD(dtd_file)  # accepts a filename or a file-like object
            tree = etree.parse(xml_file)
            if dtd.validate(tree):
                print("Document is valid.")
                return True
            print(f"Validation error: {dtd.error_log.last_error}")
            return False
        # DTD from the DOCTYPE: lxml does not validate by default, so request it
        # on the parser. An invalid document then raises XMLSyntaxError.
        parser = etree.XMLParser(dtd_validation=True)
        etree.parse(xml_file, parser)
        print("Document is valid (DTD from DOCTYPE).")
        return True
    except etree.XMLSyntaxError as e:
        print(f"Parse error: {e}")
        return False

# Usage
validate_with_dtd('books.xml', 'bookstore.dtd')
DTD Limitations
| Limitation | Explanation | XSD Solution |
|---|---|---|
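| No data types | Element text is untyped #PCDATA; attributes have only a few lexical types | Rich type system (string, int, date, etc.) |
| No namespace support | Names are matched literally, prefix included; namespace URIs are ignored | Full namespace integration |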
| Limited cardinality | Only ?, *, + | minOccurs/maxOccurs |
| No inheritance | No type extension | Complex type extension/restriction |
| No key/unique | Only ID/IDREF (global) | xs:key, xs:unique, xs:keyref |
| No regex patterns | No pattern constraints | xs:pattern facet |
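| Fixed prefixes | A DTD has no target namespace, so documents must use exactly the prefixes written in its declarations | Names are matched by namespace URI, so any prefix works |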
| Not XML syntax | DTD uses non-XML syntax | XSD is XML-based |
Common DTD Mistakes
1. ID/IDREF Not Being Unique
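ID values must be unique across the entire document, regardless of element type. Two elements with the same ID make the document invalid. ID values must also be XML names, so they cannot start with a digit. Use prefixed IDs (e.g., ch01, ch02) to ensure uniqueness.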
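When no validating parser is at hand, duplicate IDs and dangling references can be spotted with a few lines of standard-library Python. This is a rough sketch: the function name `check_ids` is hypothetical, and it assumes the ID attribute is called `id` and the IDREF attribute `parent`, as in the chapter example in this guide. A real validator reads those roles from the DTD instead.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def check_ids(xml_text, id_attr="id", ref_attr="parent"):
    """Return (duplicate IDs, references that point at no ID)."""
    root = ET.fromstring(xml_text)
    elements = list(root.iter())
    # Count every ID value in the whole document, whatever the element type
    ids = Counter(el.get(id_attr) for el in elements if el.get(id_attr) is not None)
    duplicates = sorted(value for value, n in ids.items() if n > 1)
    dangling = sorted({
        el.get(ref_attr) for el in elements
        if el.get(ref_attr) is not None and el.get(ref_attr) not in ids
    })
    return duplicates, dangling

sample = """<book id="b1">
  <chapter id="ch01"/>
  <chapter id="ch01" parent="ch99"/>
  <chapter id="ch02" parent="ch01"/>
</book>"""
print(check_ids(sample))  # (['ch01'], ['ch99'])
```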
2. Content Model Order Mismatch
<!ELEMENT book (title, author, price)> requires elements in exactly that order. <book><author>...</author><title>...</title></book> fails validation.
3. Entity Expansion Attacks (Billion Laughs)
<!ENTITY lol "lol">
<!ENTITY lol2 "&lol;&lol;&lol;">
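<!ENTITY lol3 "&lol2;&lol2;&lol2;">
This is the “billion laughs” attack: each level multiplies the size of the expansion, so a handful of nested declarations can exhaust memory. The entities involved are internal, so disabling external entities does not stop it. When parsing untrusted XML, reject DOCTYPE declarations or cap entity expansion.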
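The growth is easy to quantify without expanding anything. The sketch below computes the fully expanded length of an entity from its declarations; the function name `expanded_length` is hypothetical, and it assumes the declarations are not recursive (XML forbids that, and this sketch would not terminate on a cycle).

```python
import re

# Matches a general entity reference such as &lol2; (simplified name pattern)
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")

def expanded_length(name, entities, _cache=None):
    """Length of an entity's replacement text after all nested references expand."""
    cache = {} if _cache is None else _cache
    if name not in cache:
        text = entities[name]
        literal = len(ENTITY_REF.sub("", text))  # characters that are not references
        nested = sum(
            expanded_length(ref, entities, cache)
            for ref in ENTITY_REF.findall(text)
        )
        cache[name] = literal + nested
    return cache[name]

# The three declarations shown above: 3, 9, then 27 characters
small = {"lol": "lol", "lol2": "&lol;&lol;&lol;", "lol3": "&lol2;&lol2;&lol2;"}
print(expanded_length("lol3", small))  # 27

# The classic form of the attack: ten references per level, nine levels deep
classic = {"lol0": "lol"}
for level in range(1, 10):
    classic[f"lol{level}"] = f"&lol{level - 1};" * 10
print(expanded_length("lol9", classic))  # 3000000000
```

Under these assumptions, ten short declarations describe three billion characters of output.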
4. Not Declaring All Elements
DTD validation fails on any undeclared element. Ensure every element used in the XML is declared in the DTD, including nested elements.
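A quick way to find missing declarations before running a full validation is to compare the element names a document uses with the names its DTD declares. The following is a minimal sketch using Python's standard-library Expat bindings; the function name `undeclared_elements` is hypothetical, and because Expat does not read an external DTD by default, it only covers an internal subset.

```python
import xml.parsers.expat as expat

def undeclared_elements(xml_text):
    """Return element names used in the document but never declared in its DTD."""
    declared, used = set(), set()
    parser = expat.ParserCreate()
    # Fired for each <!ELEMENT ...> declaration in the internal subset
    parser.ElementDeclHandler = lambda name, model: declared.add(name)
    # Fired for each start tag in the document body
    parser.StartElementHandler = lambda name, attrs: used.add(name)
    parser.Parse(xml_text, True)
    return used - declared

sample = """<!DOCTYPE book [
  <!ELEMENT book (title, price)>
  <!ELEMENT title (#PCDATA)>
]>
<book><title>DTD Guide</title><price>29.99</price></book>"""
print(undeclared_elements(sample))  # {'price'}
```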
5. External Entity Security Risk
External entities (<!ENTITY data SYSTEM "file:///etc/passwd">) can read local files or trigger network requests, an attack known as XXE. Disable external entity resolution in production parsers.
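If untrusted input never needs a DTD, the bluntest defense against both entity attacks is to refuse any document that carries a DOCTYPE. The sketch below shows one way to do that as a pre-check with Python's standard-library Expat bindings. The names `DoctypeForbidden` and `check_no_doctype` are hypothetical, and this is an illustration of the idea rather than a complete hardening recipe.

```python
import xml.parsers.expat as expat

class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""

def check_no_doctype(data):
    """Parse `data` and raise DoctypeForbidden as soon as a DOCTYPE appears."""
    def forbid(name, system_id, public_id, has_internal_subset):
        # Raised before any entity declaration inside the DOCTYPE is processed
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed in untrusted input")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = forbid
    parser.Parse(data, True)

check_no_doctype("<book><title>ok</title></book>")  # passes silently

hostile = '<!DOCTYPE book [<!ENTITY data SYSTEM "file:///etc/passwd">]><book>&data;</book>'
try:
    check_no_doctype(hostile)
except DoctypeForbidden as exc:
    print(f"Rejected: {exc}")
```

The trade-off is that documents relying on a DTD for entities or default attributes are rejected too, so this fits inputs where a DOCTYPE is never expected.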
Practice Questions
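1. What is the difference between #PCDATA and CDATA?
#PCDATA (parsed character data) appears in element content models: the parser scans the text for markup and expands entity references, and in mixed content the text can be interleaved with child elements. CDATA as an attribute type means the attribute value is an arbitrary string. #PCDATA is for element content; CDATA is for attribute values. Neither is the same as a <![CDATA[ ]]> section, which marks literal text inside a document.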
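2. How do internal and external DTD subsets differ?
The internal subset is embedded in the document’s DOCTYPE, so the document is self-contained and needs no extra files. The external subset is a separate .dtd file that can be shared across documents and maintained in one place. A document may use both; entity and attribute declarations in the internal subset take precedence.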
3. What are parameter entities used for?
Parameter entities define reusable fragments within the DTD itself (not in the XML document). They work like macros for DTD authoring, for example <!ENTITY % common "attr1 CDATA #IMPLIED attr2 CDATA #IMPLIED">.
4. Why would you choose DTD over XSD?
DTD is simpler for basic validation, is supported by most XML parsers, can declare entities (XSD cannot), and doesn’t require namespace management. Use DTD for simple document validation. Use XSD for data-oriented validation with types.
5. Challenge: Design a DTD for a configuration file format that supports nested sections, key-value pairs, comments, and inclusion of other config files via entities.
Answer: Use mixed content models for sections ((#PCDATA | section | entry)*), attribute declarations for key-value pairs, and external entity references for file inclusion. Declare all element types with appropriate content models.
Mini Project: DTD Generator
Create a tool that generates a DTD from sample XML:
from lxml import etree

class DtdGenerator:
    """Generate a rough DTD from sample XML documents"""

    def __init__(self):
        # element name -> facts observed across every instance of that element
        self.elements = {}

    def analyze(self, xml_file):
        tree = etree.parse(xml_file)
        self._analyze_element(tree.getroot())
        return self

    def _analyze_element(self, element):
        # The namespace is dropped because DTDs are not namespace-aware
        name = etree.QName(element).localname
        info = self.elements.setdefault(name, {
            'children': {},     # child name -> [min, max] count per parent instance
            'attributes': {},   # attribute name -> set of observed values
            'has_text': False,
            'instances': 0,
        })
        # Skip comments and processing instructions, whose tag is not a string
        children = [c for c in element if isinstance(c.tag, str)]
        # Text can sit before the first child or after any child (its tail)
        texts = [element.text] + [c.tail for c in element]
        if any(t and t.strip() for t in texts):
            info['has_text'] = True
        # Analyze attributes
        for attr_name, value in element.attrib.items():
            info['attributes'].setdefault(attr_name, set()).add(value)
        # Analyze children, keeping first-seen order
        counts = {}
        for child in children:
            child_name = etree.QName(child).localname
            counts[child_name] = counts.get(child_name, 0) + 1
        known = list(info['children'])
        for child_name in known + [n for n in counts if n not in info['children']]:
            count = counts.get(child_name, 0)
            if child_name in info['children']:
                bounds = info['children'][child_name]
                bounds[0] = min(bounds[0], count)
                bounds[1] = max(bounds[1], count)
            else:
                # A child first seen in a later instance was absent earlier, so min is 0
                first = info['instances'] == 0
                info['children'][child_name] = [count if first else 0, count]
        info['instances'] += 1
        # Recurse
        for child in children:
            self._analyze_element(child)

    @staticmethod
    def _indicator(lo, hi):
        # Map observed min/max counts to a DTD occurrence indicator
        if hi > 1:
            return '+' if lo >= 1 else '*'
        return '' if lo >= 1 else '?'

    def generate(self, root_name='root'):
        # The root is named in the DOCTYPE, not declared specially in the DTD
        lines = [f'<!-- Generated DTD for root element {root_name} -->']
        for elem_name, info in sorted(self.elements.items()):
            # Content model
            if info['has_text'] and info['children']:
                model = f"(#PCDATA | {' | '.join(info['children'])})*"
            elif info['has_text']:
                model = '(#PCDATA)'
            elif info['children']:
                # Sequence in first-seen order; assumes the samples use a consistent order
                parts = [child + self._indicator(lo, hi)
                         for child, (lo, hi) in info['children'].items()]
                model = f"({', '.join(parts)})"
            else:
                model = 'EMPTY'
            lines.append(f'<!ELEMENT {elem_name} {model}>')
            # Attributes: DTD has no numeric types, so anything but booleans is CDATA
            for attr_name, values in sorted(info['attributes'].items()):
                attr_type = '(true|false)' if values <= {'true', 'false'} else 'CDATA'
                lines.append(
                    f'<!ATTLIST {elem_name} {attr_name} {attr_type} #IMPLIED>')
            lines.append('')  # blank line
        return '\n'.join(lines)

# Usage
generator = DtdGenerator()
generator.analyze('sample.xml')
print(generator.generate('bookstore'))
FAQ
What’s Next
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Updated 2026-06-20.