XPath Explained — Beginner's Guide to Querying XML
XPath (XML Path Language) is a query language for selecting nodes from an XML document, using path expressions that look similar to filesystem paths — making it possible to navigate XML trees, filter nodes by conditions, and extract specific data.
What You’ll Learn
- How XPath path expressions work (absolute vs relative)
- Using predicates to filter nodes by conditions
- XPath axes for complex navigation
- Built-in functions for string, number, and boolean operations
Why XPath Matters
XPath is the backbone of XML processing. Without it, you’d need to manually traverse every node of an XML document to find the data you need. XPath gives you a single expression that says “find all book titles where the price is under $20.” It’s used in XSLT, XQuery, web scraping, browser automation, and configuration file processing.
Doda Browser uses XPath-like expressions for DOM element selection. Durga Antivirus Pro uses XPath to parse XML-based malware signature files, quickly extracting specific threat patterns.
Learning Path
flowchart LR
A[XML Basics] --> B[XPath Queries<br/>You are here]
B --> C[XSLT Transformations]
C --> D[XML Schema XSD]
D --> E[SOAP & WSDL]
The Data We’ll Query
Throughout this tutorial, we’ll use this XML document:
<?xml version="1.0" encoding="UTF-8"?>
<library>
<book category="fiction" id="b1">
<title>The Hobbit</title>
<author>J.R.R. Tolkien</author>
<year>1937</year>
<price currency="USD">12.99</price>
</book>
<book category="non-fiction" id="b2">
<title>A Brief History of Time</title>
<author>Stephen Hawking</author>
<year>1988</year>
<price currency="GBP">9.99</price>
</book>
<book category="fiction" id="b3">
<title>1984</title>
<author>George Orwell</author>
<year>1949</year>
<price currency="USD">10.99</price>
</book>
<magazine category="tech" id="m1">
<title>Wired</title>
<issue>June 2026</issue>
<price currency="USD">5.99</price>
</magazine>
</library>
Basic Path Expressions
XPath expressions look like file paths. They navigate the XML tree from the root or from the current position.
Absolute Paths
Start with / to navigate from the root:
| Expression | Result | Explanation |
|---|---|---|
| /library | The library element | Selects the root element |
| /library/book | All 3 book elements | Selects all book children of library |
| /library/book/title | The title elements “The Hobbit”, “A Brief History of Time”, “1984” | Selects the title child of each book (the magazine title is not included) |
Relative Paths
Start with ./ or just the node name to navigate from the current context:
| Expression | Result |
|---|---|
| book/title | Title elements of the book children of the current node |
| .. | Parent of the current node |
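The path expressions above can be tried with Python's standard library. This is a minimal sketch, assuming a trimmed copy of the library document. ElementTree implements only a subset of XPath and evaluates paths relative to the element you call it on, so the absolute path /library/book/title is written as book/title against the root element.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tutorial's library document (titles and ids only).
XML = """<library>
  <book id="b1"><title>The Hobbit</title></book>
  <book id="b2"><title>A Brief History of Time</title></book>
  <book id="b3"><title>1984</title></book>
  <magazine id="m1"><title>Wired</title></magazine>
</library>"""

root = ET.fromstring(XML)  # root is the <library> element

# Counterpart of /library/book/title, written relative to <library>.
book_titles = [t.text for t in root.findall("book/title")]
print(book_titles)

# A relative path evaluated from a different context node: the first <book>.
first_book = root.find("book")
first_title = first_book.find("title").text
print(first_title)

# ".." steps up to the parent: every element that has a <title> child.
parents = [p.get("id") for p in root.findall(".//title/..")]
print(parents)
```

The same expression gives different results depending on the context node, which is the practical difference between absolute and relative paths.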
Wildcards
| Expression | Result |
|---|---|
| /library/* | All element children of library (3 books + 1 magazine) |
| /library//title | All title elements at any depth below library |
| @* | All attributes of the current node |
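As a rough illustration of the wildcard rows, here is a sketch using Python's ElementTree on a trimmed copy of the library document. ElementTree does not return attribute nodes from a path, so the attrib dictionary stands in for @* here; that substitution is an assumption of this sketch, not part of XPath itself.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tutorial's library document.
XML = """<library>
  <book category="fiction" id="b1"><title>The Hobbit</title></book>
  <book category="non-fiction" id="b2"><title>A Brief History of Time</title></book>
  <book category="fiction" id="b3"><title>1984</title></book>
  <magazine category="tech" id="m1"><title>Wired</title></magazine>
</library>"""

root = ET.fromstring(XML)

# /library/* : every element child of <library>, whatever its name.
child_tags = [child.tag for child in root.findall("*")]
print(child_tags)

# /library//title : <title> elements at any depth below <library>.
all_titles = [t.text for t in root.findall(".//title")]
print(all_titles)

# @* on the first book: ElementTree exposes attributes as a dict.
first_attrs = dict(root.find("book").attrib)
print(first_attrs)
```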
Predicates — Filtering Nodes
Predicates are the real power of XPath. They filter nodes using conditions in square brackets [].
/library/book[1] -- First book element
/library/book[last()] -- Last book element
/library/book[position() < 3] -- First two books
/library/book[@category='fiction'] -- Books with category="fiction"
Let’s run some queries on our XML:
| Expression | Result |
|---|---|
| /library/book[@category='fiction']/title | “The Hobbit”, “1984” |
| /library/*[price < 11] | The “Brief History” book (9.99), “1984” (10.99), and the magazine (5.99) |
| /library/book[@id='b3']/author | “George Orwell” |
| /library//price[@currency='USD'] | $12.99, $10.99, $5.99 |
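The predicate queries in the table can be checked with a short Python sketch. ElementTree supports attribute and positional predicates directly, but it has no numeric comparison, so the price < 11 filter is emulated in plain Python here. The helper name titles is hypothetical.

```python
import xml.etree.ElementTree as ET

# The tutorial's library document, compacted.
XML = """<library>
  <book category="fiction" id="b1">
    <title>The Hobbit</title><price currency="USD">12.99</price>
  </book>
  <book category="non-fiction" id="b2">
    <title>A Brief History of Time</title><price currency="GBP">9.99</price>
  </book>
  <book category="fiction" id="b3">
    <title>1984</title><price currency="USD">10.99</price>
  </book>
  <magazine category="tech" id="m1">
    <title>Wired</title><price currency="USD">5.99</price>
  </magazine>
</library>"""

root = ET.fromstring(XML)

def titles(elements):
    """Return the <title> text of each element (hypothetical helper)."""
    return [e.findtext("title") for e in elements]

# /library/book[@category='fiction'] : attribute predicate.
fiction = titles(root.findall("book[@category='fiction']"))

# /library/book[1] and /library/book[last()] : positions start at 1.
first = titles(root.findall("book[1]"))
last = titles(root.findall("book[last()]"))

# /library/*[price < 11] : emulated with a Python filter over the children.
under_11 = titles(c for c in root if float(c.findtext("price")) < 11)

print(fiction, first, last, under_11)
```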
Practical Examples
# Find all fiction books
/library/book[@category='fiction']
# Result: book elements for The Hobbit and 1984
# Find elements with price under $10
/library/*[price < 10]
# Result: b2 (9.99 GBP) and m1 (5.99 USD)
# Find the author of the first book
/library/book[1]/author
# Result: J.R.R. Tolkien
# Find books published before 1950
/library/book[year < 1950]/title
# Result: The Hobbit (1937), 1984 (1949)
Operators
XPath supports comparison and logical operators:
| Operator | Meaning | Example |
|---|---|---|
| = | Equal | @category='fiction' |
| != | Not equal | @category!='fiction' |
| < | Less than | price < 10 |
| > | Greater than | year > 1950 |
| and | Logical AND | @category='fiction' and year > 1940 |
| or | Logical OR | @category='fiction' or @category='tech' |
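To see what the operators in the table select, the sketch below mirrors three of them in plain Python. ElementTree has no and, or, or numeric comparison, so each XPath expression appears only as a comment and the filtering is done with ordinary Python conditions; treat it as a cross-check rather than an XPath evaluation.

```python
import xml.etree.ElementTree as ET

# The tutorial's library document, compacted.
XML = """<library>
  <book category="fiction" id="b1"><title>The Hobbit</title><year>1937</year></book>
  <book category="non-fiction" id="b2"><title>A Brief History of Time</title><year>1988</year></book>
  <book category="fiction" id="b3"><title>1984</title><year>1949</year></book>
  <magazine category="tech" id="m1"><title>Wired</title></magazine>
</library>"""

root = ET.fromstring(XML)

# /library/book[@category='fiction' and year > 1940]/title
fiction_after_1940 = [
    b.findtext("title")
    for b in root.findall("book[@category='fiction']")
    if int(b.findtext("year")) > 1940
]

# /library/*[@category='fiction' or @category='tech']/title
fiction_or_tech = [
    c.findtext("title") for c in root if c.get("category") in ("fiction", "tech")
]

# /library/book[@category!='fiction']/title
not_fiction = [
    b.findtext("title") for b in root.findall("book") if b.get("category") != "fiction"
]

print(fiction_after_1940, fiction_or_tech, not_fiction)
```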
# Fiction books published after 1940
/library/book[@category='fiction' and year > 1940]/title
# Result: 1984
# Books with USD price under $11
/library/book[price[@currency='USD'] < 11]/title
# Result: 1984
XPath Axes
Axes let you navigate relative to the current node in ways beyond simple parent/child:
| Axis | Direction | Example |
|---|---|---|
| child:: | Children (default) | child::title = title |
| parent:: | One level up | parent::* = .. |
| ancestor:: | All ancestors | ancestor::library |
| descendant:: | All descendants | descendant::price selects the same nodes as .//price |
| following-sibling:: | Siblings after | following-sibling::book |
| preceding-sibling:: | Siblings before | preceding-sibling::book |
| self:: | Current node | self::* |
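ElementTree does not implement named axes, so the following sketch only emulates three of them in Python to show what each axis means. The parent_of map and the helper functions ancestors and following_siblings are hypothetical names introduced for this illustration.

```python
import xml.etree.ElementTree as ET

# The tutorial's library document, compacted.
XML = """<library>
  <book id="b1"><title>The Hobbit</title><price>12.99</price></book>
  <book id="b2"><title>A Brief History of Time</title><price>9.99</price></book>
  <book id="b3"><title>1984</title><price>10.99</price></book>
  <magazine id="m1"><title>Wired</title><price>5.99</price></magazine>
</library>"""

root = ET.fromstring(XML)

# Elements do not store a parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(elem):
    """Rough equivalent of ancestor::* (nearest ancestor first)."""
    while elem in parent_of:
        elem = parent_of[elem]
        yield elem

def following_siblings(elem, tag):
    """Rough equivalent of following-sibling::tag."""
    siblings = list(parent_of[elem])
    later = siblings[siblings.index(elem) + 1:]
    return [s for s in later if s.tag == tag]

first_book = root.find("book")
first_title = first_book.find("title")

title_ancestors = [a.tag for a in ancestors(first_title)]
later_books = [b.get("id") for b in following_siblings(first_book, "book")]
price_count = len(list(root.iter("price")))  # descendant::price from <library>

print(title_ancestors, later_books, price_count)
```

A library with full XPath 1.0 support would evaluate the axis expressions directly and make these helpers unnecessary.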
Axis Examples
# From the title "The Hobbit", find its parent
/child::library/child::book[1]/child::title/parent::*
# Result: the first book element
# From the first book, find all following books
/library/book[1]/following-sibling::book
# Result: b2 and b3
# Find all descendant price elements
/library/descendant::price
# Result: all 4 price elements
# From price, find the parent book
/library/book[1]/price/parent::book/title
# Result: "The Hobbit"
XPath Functions
XPath includes built-in functions for text, numbers, and booleans:
String Functions
# Get text content
string(/library/book[1]/title)
# Result: "The Hobbit"
# Length of text
string-length(/library/book[1]/title)
# Result: 10
# Contains
/library/book[contains(author, 'Tolkien')]
# Result: the first book
# Starts with
/library/book[starts-with(@category, 'non')]
# Result: the second book
Number Functions
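ElementTree does not evaluate XPath functions. As a rough cross-check, the string functions above and the number functions covered in this section can be mirrored in plain Python. The sketch assumes the tutorial's library document and pairs each XPath call, shown as a comment, with an approximate Python counterpart.

```python
import xml.etree.ElementTree as ET

# The tutorial's library document, compacted.
XML = """<library>
  <book category="fiction" id="b1">
    <title>The Hobbit</title><author>J.R.R. Tolkien</author><price>12.99</price>
  </book>
  <book category="non-fiction" id="b2">
    <title>A Brief History of Time</title><author>Stephen Hawking</author><price>9.99</price>
  </book>
  <book category="fiction" id="b3">
    <title>1984</title><author>George Orwell</author><price>10.99</price>
  </book>
  <magazine category="tech" id="m1"><title>Wired</title><price>5.99</price></magazine>
</library>"""

root = ET.fromstring(XML)
books = root.findall("book")

# string-length(/library/book[1]/title)
title_length = len(root.findtext("book[1]/title"))

# /library/book[contains(author, 'Tolkien')]
by_tolkien = [b.get("id") for b in books if "Tolkien" in b.findtext("author")]

# /library/book[starts-with(@category, 'non')]
non_prefixed = [b.get("id") for b in books if b.get("category").startswith("non")]

# count(/library/book)
book_count = len(books)

# number(/library/book[1]/price)
first_price = float(root.findtext("book[1]/price"))

# sum(//price), ignoring the currency differences
total = sum(float(p.text) for p in root.iter("price"))

print(title_length, by_tolkien, non_prefixed, book_count, first_price)
```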
# Count elements
count(/library/book)
# Result: 3
# Convert the first book's price to a number
number(/library/book[1]/price)
# Result: 12.99
# Add two specific prices. The parentheses matter: (//price)[4] is the fourth
# price in the document, while //price[4] looks for a fourth price child
# inside a single parent and matches nothing here.
# Not meaningful in real use because the currencies differ, but it shows the syntax
(//price)[1] + (//price)[4]
# Result: 18.98 (12.99 + 5.99)
Real-World Use: Web Scraping
XPath is commonly used in web scraping. When you inspect an HTML page and want to extract all links:
//a/@href -- All href attributes of all links
//img/@src -- All image sources
//h1/text() -- Text of all h1 elements
//*[@class='price'] -- All elements with class="price"
Many web scraping tools and browser DevTools support XPath. In the Chrome DevTools console, you can use $x("//div[@class='product']") to evaluate XPath expressions directly.
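The four scraping expressions can be approximated with Python's standard library when the page is well-formed XHTML. The markup below is a hypothetical snippet invented for this sketch. Real-world HTML is often not well-formed XML and would need an HTML-aware parser instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML page used only for illustration.
PAGE = """<html><body>
  <h1>Shop</h1>
  <div class="product">
    <a href="/item/1">Item 1</a><span class="price">4.50</span><img src="/img/1.png"/>
  </div>
  <div class="product">
    <a href="/item/2">Item 2</a><span class="price">7.25</span><img src="/img/2.png"/>
  </div>
</body></html>"""

page = ET.fromstring(PAGE)

# //a/@href : ElementTree returns elements, so read the attribute afterwards.
links = [a.get("href") for a in page.findall(".//a[@href]")]

# //img/@src
images = [img.get("src") for img in page.findall(".//img[@src]")]

# //h1/text()
headings = [h.text for h in page.findall(".//h1")]

# //*[@class='price'] : matches only when the class attribute is exactly "price".
prices = [e.text for e in page.findall(".//*[@class='price']")]

print(links, images, headings, prices)
```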
Security Angle
XPath can be vulnerable to injection attacks if user input is directly concatenated into XPath expressions:
# VULNERABLE: user input concatenated into XPath
username = request.GET.get("username") # User input
xpath = f"//user[name='{username}']/password/text()"
# Input: admin' or '1'='1
# Result: //user[name='admin' or '1'='1']/password/text()
# This returns ALL users' passwords!
Always use parameterized XPath queries (with variables) or sanitize user input. Durga Antivirus Pro uses parameterized XPath internally for all XML signature lookups to prevent injection vulnerabilities.
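One way to avoid the problem is to keep user input out of the expression entirely. The sketch below assumes a hypothetical users document shaped like the //user[name=...]/password query above, and a hypothetical helper find_password that compares the input as data. Libraries with full XPath support usually offer variables instead; lxml, for example, accepts $name-style variables passed as keyword arguments to its xpath() method.

```python
import xml.etree.ElementTree as ET

# Hypothetical user store, matching the shape of the vulnerable query above.
USERS = """<users>
  <user><name>admin</name><password>s3cret</password></user>
  <user><name>guest</name><password>guest123</password></user>
</users>"""

root = ET.fromstring(USERS)

def find_password(root, username):
    """Return the password for an exact name match, or None.

    The user-supplied value is only ever compared as data. It is never
    spliced into an XPath string, so quotes in it cannot change the query.
    """
    for user in root.iter("user"):
        if user.findtext("name") == username:
            return user.findtext("password")
    return None

legit = find_password(root, "admin")
attack = find_password(root, "admin' or '1'='1")
print(legit, attack)
```

The injection string from the example simply fails to match any user name, because it is treated as a literal value rather than as part of the expression.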
Common Mistakes
1. Assuming indexes start at 0
Unlike most programming languages, XPath positions start at 1, not 0. /library/book[1] is the first book, not the second.
2. Forgetting the difference between / and //
/library/book selects direct children. /library//book selects all book descendants at any depth.
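3. Assuming price < 10 fails because price holds text
XPath converts the element's string value to a number before a numeric comparison, so no explicit number() call is needed:
/library/book[price < 10] -- Correct: compares the numeric value of the text
/library/book[price < 10.50] -- Also correct
4. Not accounting for namespaces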
If the XML declares a default namespace, //title won’t match. You need to bind a prefix to the namespace URI in your XPath tool and use that prefix in the expression: //ns:title.
5. Confusing text() with string value
child::text() returns the text-node children of an element. string() returns the string value of a node, which for an element is the concatenated text of all its descendants.
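Mistakes 4 and 5 are easy to reproduce in Python. This is a minimal sketch, assuming a made-up namespace URI (http://example.com/library) and an arbitrary prefix lib. ElementTree takes the prefix mapping as a dictionary passed to find and findall.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a default namespace and mixed content in <title>.
XML = """<library xmlns="http://example.com/library">
  <book><title>The <em>Hobbit</em></title></book>
</library>"""

root = ET.fromstring(XML)

# Mistake 4: an unprefixed name does not match elements in a default namespace.
unprefixed = root.findall(".//title")

# Bind a prefix of your choosing to the namespace URI and use it in the path.
ns = {"lib": "http://example.com/library"}
title = root.find(".//lib:title", ns)

# Mistake 5: .text holds only the text before the first child element,
# while itertext() collects the full string value of the element.
leading_text = title.text
string_value = "".join(title.itertext())

print(unprefixed, repr(leading_text), string_value)
```

The prefix in the expression does not have to match any prefix used in the document; only the namespace URI has to agree.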
Practice Questions
1. What is the difference between /library/book and //book? /library/book selects book elements that are direct children of library. //book selects all book elements at any depth in the document.
2. What do predicates do in XPath? They filter nodes by conditions in square brackets, like [price > 10] or [@category='fiction'].
3. In XPath, what index does the first element start at? 1 (not 0 as in most programming languages).
4. What would //title/../@category return? The category attributes of all elements that have a title child: each book and the magazine.
5. How do you select all elements with a specific attribute value? //*[@attribute='value'].
Challenge: Write an XPath expression that selects all book titles where the price is less than $15, the book was published after 1940, and the author’s name contains “Orwell”. Then write the same query using explicit axis names (such as child::) instead of the abbreviated syntax.
Try It Yourself
Use a command-line tool or Python to test XPath:
import xml.etree.ElementTree as ET
xml_data = """<?xml version="1.0" encoding="UTF-8"?>
<library>
<book category="fiction">
<title>The Hobbit</title>
<author>J.R.R. Tolkien</author>
<year>1937</year>
<price>12.99</price>
</book>
<book category="non-fiction">
<title>A Brief History of Time</title>
<author>Stephen Hawking</author>
<year>1988</year>
<price>9.99</price>
</book>
</library>"""
root = ET.fromstring(xml_data)
# Find all titles
for title in root.findall('.//title'):
    print(title.text)
# Expected output:
# The Hobbit
# A Brief History of Time
Or use xmllint on the command line, with the full document from this tutorial saved as library.xml:
# Query all book titles
xmllint --xpath '/library/book/title/text()' library.xml
Expected output:
The Hobbit
A Brief History of Time
1984
What’s Next
| Tutorial | What You’ll Learn |
|---|---|
| XSLT Explained — Transform XML | Transform XML into HTML and other formats using XPath |
| XML Basics — Complete Guide | Foundational XML concepts |
| Python XML Processing | Process XML data with Python’s ElementTree |
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Updated 2026-06-06.