XPath Explained — Beginner's Guide to Querying XML

DodaTech Updated Jun 6, 2026 8 min read

XPath (XML Path Language) is a query language for selecting nodes from an XML document. Its path expressions look similar to filesystem paths and let you navigate the XML tree, filter nodes by conditions, and extract specific data.

What You’ll Learn

  • How XPath path expressions work (absolute vs relative)
  • Using predicates to filter nodes by conditions
  • XPath axes for complex navigation
  • Built-in functions for string, number, and boolean operations

Why XPath Matters

XPath is the backbone of XML processing. Without it, you’d need to manually traverse every node of an XML document to find the data you need. XPath gives you a single expression that says “find all book titles where the price is under $20.” It’s used in XSLT, XQuery, web scraping, browser automation, and configuration file processing.

Doda Browser uses XPath-like expressions for DOM element selection. Durga Antivirus Pro uses XPath to parse XML-based malware signature files, quickly extracting specific threat patterns.

Learning Path

    flowchart LR
  A[XML Basics] --> B[XPath Queries<br/>You are here]
  B --> C[XSLT Transformations]
  C --> D[XML Schema XSD]
  D --> E[SOAP & WSDL]
  

The Data We’ll Query

Throughout this tutorial, we’ll use this XML document:

<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book category="fiction" id="b1">
        <title>The Hobbit</title>
        <author>J.R.R. Tolkien</author>
        <year>1937</year>
        <price currency="USD">12.99</price>
    </book>
    <book category="non-fiction" id="b2">
        <title>A Brief History of Time</title>
        <author>Stephen Hawking</author>
        <year>1988</year>
        <price currency="GBP">9.99</price>
    </book>
    <book category="fiction" id="b3">
        <title>1984</title>
        <author>George Orwell</author>
        <year>1949</year>
        <price currency="USD">10.99</price>
    </book>
    <magazine category="tech" id="m1">
        <title>Wired</title>
        <issue>June 2026</issue>
        <price currency="USD">5.99</price>
    </magazine>
</library>

Basic Path Expressions

XPath expressions look like file paths. They navigate the XML tree from the root or from the current position.

Absolute Paths

Start with / to navigate from the root:

| Expression | Result | Explanation |
| --- | --- | --- |
| /library | The library element | Selects the root element |
| /library/book | All 3 book elements | Selects all book children of library |
| /library/book/title | “The Hobbit”, “A Brief History of Time”, “1984” | Selects the title element of every book |
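
The absolute paths in this table can be tried with Python's built-in ElementTree. This is a minimal sketch, assuming a trimmed copy of the library document. ElementTree evaluates paths relative to the element you call it on, so the leading /library is written as . here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tutorial's library document.
XML = """<library>
  <book id="b1"><title>The Hobbit</title></book>
  <book id="b2"><title>A Brief History of Time</title></book>
  <book id="b3"><title>1984</title></book>
  <magazine id="m1"><title>Wired</title></magazine>
</library>"""

root = ET.fromstring(XML)  # root is the <library> element, i.e. /library

# /library/book: book children of the root
books = root.findall("./book")

# /library/book/title: title children of those books
titles = [t.text for t in root.findall("./book/title")]

print(root.tag)    # library
print(len(books))  # 3
print(titles)      # ['The Hobbit', 'A Brief History of Time', '1984']
```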

Relative Paths

Start with ./ or just the node name to navigate from the current context:

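| Expression | Result |
| --- | --- |
| book/title | title children of the book children of the current node |
| ./book/title | Same as book/title |
| .. | Parent of the current node |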

Wildcards

| Expression | Result |
| --- | --- |
| /library/* | All child elements of library (3 books + 1 magazine) |
| /library//title | All title elements at any depth below library |
| @* | All attributes of the current node |
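
As a rough illustration, the same three selections can be made with ElementTree. This sketch assumes a trimmed copy of the library document. ElementTree cannot return attributes from a path, so @* is approximated with the element's attrib mapping.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tutorial's library document.
XML = """<library>
  <book category="fiction" id="b1"><title>The Hobbit</title></book>
  <book category="non-fiction" id="b2"><title>A Brief History of Time</title></book>
  <book category="fiction" id="b3"><title>1984</title></book>
  <magazine category="tech" id="m1"><title>Wired</title></magazine>
</library>"""

root = ET.fromstring(XML)

# /library/*: every child element of the root, whatever its name
child_tags = [child.tag for child in root.findall("./*")]

# /library//title: title elements at any depth below the root
all_titles = [t.text for t in root.findall(".//title")]

# @* on the first book: all of its attributes
first_book_attrs = dict(root.find("./book").attrib)

print(child_tags)        # ['book', 'book', 'book', 'magazine']
print(all_titles)        # ['The Hobbit', 'A Brief History of Time', '1984', 'Wired']
print(first_book_attrs)  # {'category': 'fiction', 'id': 'b1'}
```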

Predicates — Filtering Nodes

Predicates are the real power of XPath. They filter nodes using conditions in square brackets [].

/library/book[1]                    -- First book element
/library/book[last()]               -- Last book element
/library/book[position() < 3]       -- First two books
/library/book[@category='fiction']  -- Books with category="fiction"

Let’s run some queries on our XML:

| Expression | Result |
| --- | --- |
| /library/book[@category='fiction']/title | “The Hobbit”, “1984” |
| /library/*[price < 11] | The “Brief History” book (9.99), the “1984” book (10.99), and the magazine (5.99) |
| /library/book[@id='b3']/author | “George Orwell” |
| /library//price[@currency='USD'] | The price elements 12.99, 10.99, and 5.99 |

Practical Examples

# Find all fiction books
/library/book[@category='fiction']
# Result: book elements for The Hobbit and 1984

# Find elements with price under $10
/library/*[price < 10]
# Result: b2 (9.99 GBP) and m1 (5.99 USD)

# Find the author of the first book
/library/book[1]/author
# Result: J.R.R. Tolkien

# Find books published before 1950
/library/book[year < 1950]/title
# Result: The Hobbit (1937), 1984 (1949)
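
The predicate queries above can be approximated in Python. This is a sketch under one assumption worth stating: ElementTree implements only part of XPath. Attribute and position predicates work directly, but numeric comparisons such as price < 10 do not, so that filter is applied in Python instead.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tutorial's library document.
XML = """<library>
  <book category="fiction" id="b1"><title>The Hobbit</title><author>J.R.R. Tolkien</author><year>1937</year><price>12.99</price></book>
  <book category="non-fiction" id="b2"><title>A Brief History of Time</title><author>Stephen Hawking</author><year>1988</year><price>9.99</price></book>
  <book category="fiction" id="b3"><title>1984</title><author>George Orwell</author><year>1949</year><price>10.99</price></book>
  <magazine category="tech" id="m1"><title>Wired</title><price>5.99</price></magazine>
</library>"""

root = ET.fromstring(XML)

# /library/book[@category='fiction']: attribute predicate, supported directly
fiction = [b.findtext("title") for b in root.findall("./book[@category='fiction']")]

# /library/book[1]/author and /library/book[last()]/title: position predicates
first_author = root.findtext("./book[1]/author")
last_title = root.findtext("./book[last()]/title")

# /library/*[price < 10]: numeric comparison done in Python
cheap_ids = [e.get("id") for e in root.findall("./*")
             if float(e.findtext("price")) < 10]

# /library/book[year < 1950]/title
old_titles = [b.findtext("title") for b in root.findall("./book")
              if int(b.findtext("year")) < 1950]

print(fiction)       # ['The Hobbit', '1984']
print(first_author)  # J.R.R. Tolkien
print(last_title)    # 1984
print(cheap_ids)     # ['b2', 'm1']
print(old_titles)    # ['The Hobbit', '1984']
```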

Operators

XPath supports comparison and logical operators:

| Operator | Meaning | Example |
| --- | --- | --- |
| = | Equal | @category='fiction' |
| != | Not equal | @category!='fiction' |
| < | Less than | price < 10 |
| > | Greater than | year > 1950 |
| and | Logical AND | @category='fiction' and year > 1940 |
| or | Logical OR | @category='fiction' or @category='tech' |

# Fiction books published after 1940
/library/book[@category='fiction' and year > 1940]/title
# Result: 1984

# Books with USD price under $11
/library/book[price[@currency='USD'] < 11]/title
# Result: 1984

XPath Axes

Axes let you navigate relative to the current node in ways beyond simple parent/child:

| Axis | Direction | Example |
| --- | --- | --- |
| child:: | Children (default) | child::title = title |
| parent:: | One level up | parent::* = .. |
| ancestor:: | All ancestors | ancestor::library |
| descendant:: | All descendants | descendant::price (selects the same nodes as .//price) |
| following-sibling:: | Siblings after | following-sibling::book |
| preceding-sibling:: | Siblings before | preceding-sibling::book |
| self:: | Current node | self::* |

Axis Examples

# From the title "The Hobbit", find its parent
/child::library/child::book[1]/child::title/parent::*
# Result: the first book element

# From the first book, find all following books
/library/book[1]/following-sibling::book
# Result: b2 and b3

# Find all descendant price elements
/library/descendant::price
# Result: all 4 price elements

# From price, find the parent book
/library/book[1]/price/parent::book/title
# Result: "The Hobbit"
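
ElementTree has no axis syntax, but the axis examples above can be emulated to see what each one selects. This is only a sketch, assuming a trimmed copy of the library document. The .. step is supported by ElementTree, while following-sibling is rebuilt here from the parent's child list.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tutorial's library document.
XML = """<library>
  <book id="b1"><title>The Hobbit</title><price>12.99</price></book>
  <book id="b2"><title>A Brief History of Time</title><price>9.99</price></book>
  <book id="b3"><title>1984</title><price>10.99</price></book>
  <magazine id="m1"><title>Wired</title><price>5.99</price></magazine>
</library>"""

root = ET.fromstring(XML)
first_book = root.find("./book[1]")

# /library/book[1]/title/parent::*  (".." is the abbreviated parent step)
parent_of_title = root.find("./book[1]/title/..")

# /library/book[1]/following-sibling::book
siblings = list(root)                        # children of <library>, in order
start = siblings.index(first_book) + 1       # everything after the first book
following_books = [e.get("id") for e in siblings[start:] if e.tag == "book"]

# /library/descendant::price
price_count = len(list(root.iter("price")))

print(parent_of_title is first_book)  # True
print(following_books)                # ['b2', 'b3']
print(price_count)                    # 4
```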

XPath Functions

XPath includes built-in functions for text, numbers, and booleans:

String Functions

# Get text content
string(/library/book[1]/title)
# Result: "The Hobbit"

# Length of text
string-length(/library/book[1]/title)
# Result: 10

# Contains
/library/book[contains(author, 'Tolkien')]
# Result: the first book

# Starts with
/library/book[starts-with(@category, 'non')]
# Result: the second book

Number Functions

# Count elements
count(/library/book)
# Result: 3

# Convert a node's text to a number
number(/library/book[1]/price)
# Result: 12.99

# Sum every price (the currencies differ, so this only shows the syntax)
sum(//price)
# Result: approximately 39.96

# Add the first and fourth price in document order
# The parentheses matter: //price[1] selects every price that is the first
# price child of its parent, and //price[4] selects nothing
(//price)[1] + (//price)[4]
# Result: 18.98 (12.99 + 5.99)
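
ElementTree does not implement XPath functions, so the sketch below reproduces count(), string-length(), number(), contains(), and the price arithmetic in plain Python to check the results by hand. It assumes a trimmed copy of the library document, and the totals are rounded to two decimals to hide floating-point noise.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tutorial's library document.
XML = """<library>
  <book id="b1"><title>The Hobbit</title><author>J.R.R. Tolkien</author><price>12.99</price></book>
  <book id="b2"><title>A Brief History of Time</title><author>Stephen Hawking</author><price>9.99</price></book>
  <book id="b3"><title>1984</title><author>George Orwell</author><price>10.99</price></book>
  <magazine id="m1"><title>Wired</title><price>5.99</price></magazine>
</library>"""

root = ET.fromstring(XML)

# count(/library/book)
book_count = len(root.findall("./book"))

# string-length(/library/book[1]/title)
title_length = len(root.findtext("./book[1]/title"))

# number(...) applied to each price, in document order
prices = [float(p.text) for p in root.iter("price")]

# sum(//price) and (//price)[1] + (//price)[4]
total = round(sum(prices), 2)
first_plus_fourth = round(prices[0] + prices[3], 2)

# /library/book[contains(author, 'Tolkien')]
tolkien_ids = [b.get("id") for b in root.findall("./book")
               if "Tolkien" in b.findtext("author")]

print(book_count)         # 3
print(title_length)       # 10
print(total)              # 39.96
print(first_plus_fourth)  # 18.98
print(tolkien_ids)        # ['b1']
```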

Real-World Use: Web Scraping

XPath is commonly used in web scraping. When you inspect an HTML page and want to extract all links:

//a/@href           -- All href attributes of all links
//img/@src          -- All image sources
//h1/text()         -- Text of all h1 elements
//*[@class='price'] -- All elements with class="price"

Many web scraping tools and browser DevTools support XPath. In Chrome DevTools, you can use $x("//div[@class='product']") to evaluate XPath expressions directly.
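
For a quick experiment without extra libraries, the scraping expressions above can be approximated with ElementTree on a small, well-formed snippet. The page content here is made up for illustration. Real-world HTML is often not well-formed XML, so this sketch would likely need a dedicated HTML parser in practice.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page used only for illustration.
PAGE = """<html><body>
  <h1>Shop</h1>
  <a href="/books">Books</a>
  <a href="/about">About</a>
  <span class="price">12.99</span>
</body></html>"""

page = ET.fromstring(PAGE)

# //a/@href: ElementTree returns elements, so read the attribute afterwards
hrefs = [a.get("href") for a in page.iter("a")]

# //h1/text()
headings = [h.text for h in page.iter("h1")]

# //*[@class='price']
price_texts = [e.text for e in page.findall(".//*[@class='price']")]

print(hrefs)        # ['/books', '/about']
print(headings)     # ['Shop']
print(price_texts)  # ['12.99']
```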

Security Angle

XPath can be vulnerable to injection attacks if user input is directly concatenated into XPath expressions:

# VULNERABLE: user input concatenated into XPath
username = request.GET.get("username")  # User input
xpath = f"//user[name='{username}']/password/text()"

# Input: admin' or '1'='1
# Result: //user[name='admin' or '1'='1']/password/text()
# This returns ALL users' passwords!

Always use parameterized XPath queries (with variables) or sanitize user input. Durga Antivirus Pro uses parameterized XPath internally for all XML signature lookups to prevent injection vulnerabilities.
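
One way to keep user input out of the expression entirely is sketched below. It assumes a hypothetical users document and a hypothetical find_password helper, and uses only the standard library. ElementTree has no XPath variables, so the comparison happens in Python, where the input is treated as plain data. Libraries that do support variables, such as lxml (whose xpath() method accepts keyword arguments bound to $name variables), allow the same idea inside the expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical user store, for illustration only.
USERS = """<users>
  <user><name>admin</name><password>s3cret</password></user>
  <user><name>guest</name><password>guest123</password></user>
</users>"""


def find_password(root, username):
    """Return the password of the user whose name equals username, else None.

    The username is never spliced into an XPath string, so quotes or
    "or" clauses in the input cannot change the query.
    """
    for user in root.iter("user"):
        if user.findtext("name") == username:
            return user.findtext("password")
    return None


root = ET.fromstring(USERS)
print(find_password(root, "admin"))             # s3cret
print(find_password(root, "admin' or '1'='1"))  # None
```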

Common Mistakes

1. Assuming positions start at 0

Unlike most programming languages, XPath positions start at 1, not 0. /library/book[1] is the first book, not the second.
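
The off-by-one difference is easy to see when XPath and Python indexing sit side by side. A small sketch, assuming a two-book sample document:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<library>"
    "<book><title>The Hobbit</title></book>"
    "<book><title>1984</title></book>"
    "</library>"
)

books = root.findall("./book")          # Python list: 0-based
first_by_xpath = root.find("./book[1]") # XPath position: 1-based

print(first_by_xpath is books[0])       # True: book[1] is the first book
print(root.findtext("./book[2]/title")) # 1984
```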

2. Forgetting the difference between / and //

/library/book selects direct children. /library//book selects all book descendants at any depth.

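3. Assuming price < 10 needs text() or number()

A comparison with a number converts the element's string value to a number automatically, so the plain form already works:

//book[price < 10]    -- Correct: the text of price is converted to a number
//book[price < 10.50] -- Also correct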

4. Not accounting for namespaces

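If the XML declares a default namespace, //title won’t match, because an unprefixed name in XPath 1.0 refers to elements in no namespace. Bind a prefix to the namespace URI in your XPath tool and use it in the expression: //ns:title.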
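
A short sketch of the namespace problem and its fix with ElementTree follows. The namespace URI and the lib prefix are made up for this example. Any prefix works as long as it maps to the URI the document declares.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a default namespace.
NS_XML = """<library xmlns="http://example.com/library">
  <book><title>The Hobbit</title></book>
</library>"""

root = ET.fromstring(NS_XML)

# Unprefixed name: looks for <title> in no namespace and finds nothing
unprefixed = root.findall(".//title")

# Map a prefix of our choosing to the document's namespace URI
ns = {"lib": "http://example.com/library"}
prefixed = [t.text for t in root.findall(".//lib:title", ns)]

print(unprefixed)  # []
print(prefixed)    # ['The Hobbit']
```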

5. Confusing text() with string value

child::text() returns a node's child text nodes. string() returns the node's string value: the concatenated text of the node and all of its descendants.
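
The difference shows up with mixed content. In this sketch, which uses a made-up note element, ElementTree stores the text after a child element in that child's tail, so the child text nodes are collected from text and tail, while itertext() yields the full string value.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with mixed content (text around a child element).
note = ET.fromstring("<note>Read <b>The Hobbit</b> first</note>")

# Roughly what child::text() selects: text nodes directly inside <note>
child_text_nodes = [note.text] + [child.tail for child in note]

# Roughly what string(.) returns: all descendant text, concatenated
string_value = "".join(note.itertext())

print(child_text_nodes)  # ['Read ', ' first']
print(string_value)      # Read The Hobbit first
```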

Practice Questions

  1. What is the difference between /library/book and //book? /library/book selects book elements that are direct children of library. //book selects all book elements at any depth in the document.

  2. What do predicates do in XPath? They filter nodes by conditions in square brackets, like [price > 10] or [@category='fiction'].

  3. In XPath, what index does the first element start at? 1 (not 0 as in most programming languages).

  4. What would //title/../@category return? The category attributes of all elements that have a title child: in our document, the three books and the magazine.

  5. How do you select all elements with a specific attribute value? //*[@attribute='value'].

Challenge: Write an XPath expression that selects all book titles where the price is less than $15, the book was published after 1940, and the author’s last name contains “Orwell”. Then write the same query using axes instead of predicates.

FAQ

What is XPath?
XPath (XML Path Language) is a query language for selecting nodes from XML documents using path expressions, predicates, and functions.
Is XPath only for XML?
XPath is designed for XML, but it’s also commonly used with HTML (for web scraping) and other markup languages.
What is the difference between XPath and XSLT?
XPath is a query language (find data). XSLT is a transformation language (convert data). XSLT uses XPath extensively inside its templates.
Can I use XPath with JSON?
Not directly, but there are tools like JSONPath that apply XPath-like syntax to JSON documents.
What does //* mean in XPath?
Select all elements at any depth in the document. The * is a wildcard matching any element name.
What is an XPath axis?
An axis defines a direction of navigation relative to the current node — parent, child, ancestor, following-sibling, etc.

Try It Yourself

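Use Python or a command-line tool to test XPath. Python’s built-in ElementTree supports only a subset of XPath (simple paths and basic predicates), which is enough for this example: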

import xml.etree.ElementTree as ET

xml_data = """<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book category="fiction">
        <title>The Hobbit</title>
        <author>J.R.R. Tolkien</author>
        <year>1937</year>
        <price>12.99</price>
    </book>
    <book category="non-fiction">
        <title>A Brief History of Time</title>
        <author>Stephen Hawking</author>
        <year>1988</year>
        <price>9.99</price>
    </book>
</library>"""

root = ET.fromstring(xml_data)

# Find all titles
for title in root.findall('.//title'):
    print(title.text)

# Expected output:
# The Hobbit
# A Brief History of Time

Or use xmllint on the command line, assuming the full library document from earlier is saved as library.xml:

# Query all book titles
xmllint --xpath '/library/book/title/text()' library.xml

Expected output (older libxml2 releases may print the titles on a single line without separators):

The Hobbit
A Brief History of Time
1984

What’s Next

| Tutorial | What You’ll Learn |
| --- | --- |
| XSLT Explained — Transform XML | Transform XML into HTML and other formats using XPath |
| XML Basics — Complete Guide | Foundational XML concepts |
| Python XML Processing | Process XML data with Python’s ElementTree |

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Updated 2026-06-06.
