XPath Explained — Beginner's Guide to Querying XML

DodaTech Updated Jun 6, 2026 8 min read

XPath (XML Path Language) is a query language for selecting nodes from an XML document, using path expressions that look similar to filesystem paths — making it possible to navigate XML trees, filter nodes by conditions, and extract specific data.

What You’ll Learn

How XPath path expressions work (absolute vs relative)
Using predicates to filter nodes by conditions
XPath axes for complex navigation
Built-in functions for string, number, and boolean operations

Why XPath Matters

XPath is the backbone of XML processing. Without it, you’d need to manually traverse every node of an XML document to find the data you need. XPath gives you a single expression that says “find all book titles where the price is under $20.” It’s used in XSLT, XQuery, web scraping, browser automation, and configuration file processing.

Doda Browser uses XPath-like expressions for DOM element selection. Durga Antivirus Pro uses XPath to parse XML-based malware signature files, quickly extracting specific threat patterns.

Learning Path

    flowchart LR
  A[XML Basics] --> B[XPath Queries<br/>You are here]
  B --> C[XSLT Transformations]
  C --> D[XML Schema XSD]
  D --> E[SOAP & WSDL]

The Data We’ll Query

Throughout this tutorial, we’ll use this XML document:

<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book category="fiction" id="b1">
        <title>The Hobbit</title>
        <author>J.R.R. Tolkien</author>
        <year>1937</year>
        <price currency="USD">12.99</price>
    </book>
    <book category="non-fiction" id="b2">
        <title>A Brief History of Time</title>
        <author>Stephen Hawking</author>
        <year>1988</year>
        <price currency="GBP">9.99</price>
    </book>
    <book category="fiction" id="b3">
        <title>1984</title>
        <author>George Orwell</author>
        <year>1949</year>
        <price currency="USD">10.99</price>
    </book>
    <magazine category="tech" id="m1">
        <title>Wired</title>
        <issue>June 2026</issue>
        <price currency="USD">5.99</price>
    </magazine>
</library>

Basic Path Expressions

XPath expressions look like file paths. They navigate the XML tree from the root or from the current position.

Absolute Paths

Start with / to navigate from the root:

Expression	Result	Explanation
`/library`	The library element	Selects root
`/library/book`	All 3 book elements	Selects all book children of library
`/library/book/title`	“The Hobbit”, “A Brief History of Time”, “1984”	Selects all title elements
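
The absolute paths in the table can be tried with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch under one assumption worth stating: ElementTree implements only a subset of XPath and evaluates paths relative to the element you call them on, so the leading `/library` step is dropped. The XML is a trimmed copy of the tutorial document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tutorial's library document (titles only).
XML = """<library>
  <book id="b1"><title>The Hobbit</title></book>
  <book id="b2"><title>A Brief History of Time</title></book>
  <book id="b3"><title>1984</title></book>
  <magazine id="m1"><title>Wired</title></magazine>
</library>"""

root = ET.fromstring(XML)  # root is the <library> element itself

# /library/book: paths are relative to root, so "/library" is omitted.
books = root.findall("book")

# /library/book/title
titles = [t.text for t in root.findall("book/title")]

print(root.tag)    # library
print(len(books))  # 3
print(titles)      # ['The Hobbit', 'A Brief History of Time', '1984']
```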

Relative Paths

Start with ./ or just the node name to navigate from the current context:

Expression	Result
`book/title`	Title elements of current book
`..`	Parent of current node

Wildcards

Expression	Result
`/library/*`	All children of library (books + magazine)
`/library//title`	All title elements at any depth
`@*`	All attributes of the current node
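
As a rough illustration of the wildcard rows, the sketch below uses ElementTree on a trimmed copy of the library document. ElementTree has no `@*` step, so the element's `attrib` dictionary stands in for it here; that substitution is an assumption of this example, not XPath syntax.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tutorial's library document.
XML = """<library>
  <book category="fiction" id="b1"><title>The Hobbit</title></book>
  <book category="non-fiction" id="b2"><title>A Brief History of Time</title></book>
  <book category="fiction" id="b3"><title>1984</title></book>
  <magazine category="tech" id="m1"><title>Wired</title></magazine>
</library>"""
root = ET.fromstring(XML)

# /library/*  -> every child element, whatever its name
child_tags = [child.tag for child in root.findall("*")]

# /library//title  -> title elements at any depth below library
all_titles = [t.text for t in root.findall(".//title")]

# @* on the first book -> all of its attributes, as a dict
first_book_attrs = root.find("book").attrib

print(child_tags)        # ['book', 'book', 'book', 'magazine']
print(all_titles)        # ['The Hobbit', 'A Brief History of Time', '1984', 'Wired']
print(first_book_attrs)  # {'category': 'fiction', 'id': 'b1'}
```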

Predicates — Filtering Nodes

Predicates are the real power of XPath. They filter nodes using conditions in square brackets [].

/library/book[1]                    -- First book element
/library/book[last()]               -- Last book element
/library/book[position() < 3]       -- First two books
/library/book[@category='fiction']  -- Books with category="fiction"

Let’s run some queries on our XML:

Expression	Result
`/library/book[@category='fiction']/title`	“The Hobbit”, “1984”
`/library/*[price < 11]`	The “A Brief History of Time” and “1984” books, plus the magazine
`/library/book[@id='b3']/author`	“George Orwell”
`/library//price[@currency='USD']`	$12.99, $10.99, $5.99
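
The predicate queries can be checked with a short script. This sketch assumes ElementTree, which understands attribute and position predicates but not numeric comparisons, so the `price < 11` filter is done in Python instead. Currency attributes are left out of the trimmed XML to keep it short.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tutorial's library document.
XML = """<library>
  <book category="fiction" id="b1"><title>The Hobbit</title><price>12.99</price></book>
  <book category="non-fiction" id="b2"><title>A Brief History of Time</title><price>9.99</price></book>
  <book category="fiction" id="b3"><title>1984</title><price>10.99</price></book>
  <magazine category="tech" id="m1"><title>Wired</title><price>5.99</price></magazine>
</library>"""
root = ET.fromstring(XML)

# Predicates that ElementTree evaluates directly
fiction = [t.text for t in root.findall("book[@category='fiction']/title")]
first_id = root.find("book[1]").get("id")
last_id = root.find("book[last()]").get("id")

# /library/*[price < 11]: the numeric comparison is done in Python
under_11 = [e.get("id") for e in root if float(e.findtext("price")) < 11]

print(fiction)            # ['The Hobbit', '1984']
print(first_id, last_id)  # b1 b3
print(under_11)           # ['b2', 'b3', 'm1']
```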

Practical Examples

# Find all fiction books
/library/book[@category='fiction']
# Result: book elements for The Hobbit and 1984

# Find elements with price under $10
/library/*[price < 10]
# Result: b2 (9.99 GBP) and m1 (5.99 USD)

# Find the author of the first book
/library/book[1]/author
# Result: J.R.R. Tolkien

# Find books published before 1950
/library/book[year < 1950]/title
# Result: The Hobbit (1937), 1984 (1949)

Operators

XPath supports comparison and logical operators:

Operator	Meaning	Example
`=`	Equal	`@category='fiction'`
`!=`	Not equal	`@category!='fiction'`
`<`	Less than	`price < 10`
`>`	Greater than	`year > 1950`
`and`	Logical AND	`@category='fiction' and year > 1940`
`or`	Logical OR	`@category='fiction' or @category='tech'`

# Fiction books published after 1940
/library/book[@category='fiction' and year > 1940]/title
# Result: 1984

# Books with USD price under $11
/library/book[price[@currency='USD'] < 11]/title
# Result: 1984
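
To see how `and` and `or` combine conditions, the sketch below mirrors the predicates in plain Python. It assumes ElementTree for parsing only; the helper `titles_where` is a hypothetical name introduced for this example, and the XML is a trimmed copy of the three books.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the three books from the tutorial document.
XML = """<library>
  <book category="fiction"><title>The Hobbit</title><year>1937</year><price currency="USD">12.99</price></book>
  <book category="non-fiction"><title>A Brief History of Time</title><year>1988</year><price currency="GBP">9.99</price></book>
  <book category="fiction"><title>1984</title><year>1949</year><price currency="USD">10.99</price></book>
</library>"""
root = ET.fromstring(XML)

def titles_where(condition):
    """Titles of the books for which condition(book) is true."""
    return [b.findtext("title") for b in root.findall("book") if condition(b)]

# book[@category='fiction' and year > 1940]
late_fiction = titles_where(
    lambda b: b.get("category") == "fiction" and int(b.findtext("year")) > 1940
)

# book[price[@currency='USD'] < 11]
cheap_usd = titles_where(
    lambda b: b.find("price").get("currency") == "USD"
    and float(b.findtext("price")) < 11
)

# book[@category='fiction' or @category='tech']
fiction_or_tech = titles_where(lambda b: b.get("category") in ("fiction", "tech"))

print(late_fiction)     # ['1984']
print(cheap_usd)        # ['1984']
print(fiction_or_tech)  # ['The Hobbit', '1984']
```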

XPath Axes

Axes let you navigate relative to the current node in ways beyond simple parent/child:

Axis	Direction	Example
`child::`	Children (default)	`child::title` = `title`
`parent::`	One level up	`parent::*` = `..`
`ancestor::`	All ancestors	`ancestor::library`
`descendant::`	All descendants	`descendant::price` = `.//price`
`following-sibling::`	Siblings after	`following-sibling::book`
`preceding-sibling::`	Siblings before	`preceding-sibling::book`
`self::`	Current node	`self::*`

Axis Examples

# From the title "The Hobbit", find its parent
/child::library/child::book[1]/child::title/parent::*
# Result: the first book element

# From the first book, find all following books
/library/book[1]/following-sibling::book
# Result: b2 and b3

# Find all descendant price elements
/library/descendant::price
# Result: all 4 price elements

# From price, find the parent book
/library/book[1]/price/parent::book/title
# Result: "The Hobbit"
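
ElementTree does not implement XPath axes, so the sketch below only emulates three of them in plain Python to show what each axis means. The parent lookup table and variable names are assumptions of this example; a full XPath engine would evaluate the axis expressions directly.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tutorial's library document.
XML = """<library>
  <book id="b1"><title>The Hobbit</title><price>12.99</price></book>
  <book id="b2"><title>A Brief History of Time</title><price>9.99</price></book>
  <book id="b3"><title>1984</title><price>10.99</price></book>
  <magazine id="m1"><title>Wired</title><price>5.99</price></magazine>
</library>"""
root = ET.fromstring(XML)

# Elements keep no parent pointer, so build a child -> parent lookup once.
parent_of = {child: parent for parent in root.iter() for child in parent}

first_book = root.find("book[1]")
title = first_book.find("title")

# parent::*  (from the first title)
title_parent_id = parent_of[title].get("id")

# following-sibling::book  (from the first book)
siblings = list(parent_of[first_book])
later = siblings[siblings.index(first_book) + 1:]
following_books = [s.get("id") for s in later if s.tag == "book"]

# descendant::price  (iter() also tests the start element itself, which is
# harmless here because the root is <library>, not <price>)
price_count = len(list(root.iter("price")))

print(title_parent_id)  # b1
print(following_books)  # ['b2', 'b3']
print(price_count)      # 4
```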

XPath Functions

XPath includes built-in functions for text, numbers, and booleans:

String Functions

# Get text content
string(/library/book[1]/title)
# Result: "The Hobbit"

# Length of text
string-length(/library/book[1]/title)
# Result: 10

# Contains
/library/book[contains(author, 'Tolkien')]
# Result: the first book

# Starts with
/library/book[starts-with(@category, 'non')]
# Result: the second book
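
The string functions have close Python counterparts, which makes the results easy to verify. This is a sketch, assuming ElementTree for parsing (it does not implement XPath functions itself), with Python's `len`, `in`, and `str.startswith` standing in for `string-length`, `contains`, and `starts-with`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the three books from the tutorial document.
XML = """<library>
  <book category="fiction" id="b1"><title>The Hobbit</title><author>J.R.R. Tolkien</author></book>
  <book category="non-fiction" id="b2"><title>A Brief History of Time</title><author>Stephen Hawking</author></book>
  <book category="fiction" id="b3"><title>1984</title><author>George Orwell</author></book>
</library>"""
root = ET.fromstring(XML)
books = root.findall("book")

# string(/library/book[1]/title)
title = root.findtext("book[1]/title")

# string-length(/library/book[1]/title)
title_length = len(title)

# /library/book[contains(author, 'Tolkien')]
by_tolkien = [b.get("id") for b in books if "Tolkien" in b.findtext("author")]

# /library/book[starts-with(@category, 'non')]
non_prefixed = [b.get("id") for b in books if b.get("category").startswith("non")]

print(title)         # The Hobbit
print(title_length)  # 10
print(by_tolkien)    # ['b1']
print(non_prefixed)  # ['b2']
```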

Number Functions

# Count elements
count(/library/book)
# Result: 3

# Convert the first book's price to a number
number(/library/book[1]/price)
# Result: 12.99

# Add the first and fourth prices in the document (syntax demo only:
# the currencies differ, so a real total would need conversion).
# The parentheses matter: //price[1] would select every price that is
# the first price child of its parent, and //price[4] would select nothing.
(//price)[1] + (//price)[4]
# Result: 18.98 (12.99 + 5.99)
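
The numeric results can be reproduced in Python. This sketch assumes ElementTree for parsing and does the arithmetic in Python, since ElementTree does not implement `count()`, `number()`, or `sum()`. Totals are rounded when printed because binary floating point cannot represent these prices exactly.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tutorial's library document (prices only).
XML = """<library>
  <book><price currency="USD">12.99</price></book>
  <book><price currency="GBP">9.99</price></book>
  <book><price currency="USD">10.99</price></book>
  <magazine><price currency="USD">5.99</price></magazine>
</library>"""
root = ET.fromstring(XML)

# count(/library/book)
book_count = len(root.findall("book"))

# number(/library/book[1]/price)
first_price = float(root.findtext("book[1]/price"))

# All price values in document order
prices = [float(p.text) for p in root.iter("price")]

# (//price)[1] + (//price)[4]  and  sum(//price)
first_plus_fourth = prices[0] + prices[3]
total = sum(prices)

print(book_count)                  # 3
print(first_price)                 # 12.99
print(f"{first_plus_fourth:.2f}")  # 18.98
print(f"{total:.2f}")              # 39.96
```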

Real-World Use: Web Scraping

XPath is commonly used in web scraping. When you inspect an HTML page and want to extract all links:

//a/@href           -- All href attributes of all links
//img/@src          -- All image sources
//h1/text()         -- Text of all h1 elements
//*[@class='price'] -- All elements with class="price"

Many web scraping tools and browser DevTools support XPath. In Chrome DevTools, you can use $x("//div[@class='product']") to evaluate XPath expressions directly.
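
A small sketch of the scraping queries follows, assuming a hypothetical, well-formed XHTML snippet so the standard library can parse it. Real pages are often not well-formed XML and usually need a dedicated HTML parser with full XPath support; the page content, URLs, and class names below are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment (not a real site).
PAGE = """<html><body>
  <h1>Deals</h1>
  <div class="product"><a href="/item/1">Item one</a><span class="price">4.50</span></div>
  <div class="product"><a href="/item/2">Item two</a><span class="price">7.25</span></div>
  <img src="/logo.png" alt="logo"/>
</body></html>"""
page = ET.fromstring(PAGE)

hrefs = [a.get("href") for a in page.iter("a")]            # //a/@href
sources = [img.get("src") for img in page.iter("img")]     # //img/@src
headings = [h.text for h in page.iter("h1")]               # //h1/text()
priced = [e.text for e in page.findall(".//*[@class='price']")]  # //*[@class='price']

print(hrefs)     # ['/item/1', '/item/2']
print(sources)   # ['/logo.png']
print(headings)  # ['Deals']
print(priced)    # ['4.50', '7.25']
```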

Security Angle

XPath can be vulnerable to injection attacks if user input is directly concatenated into XPath expressions:

# VULNERABLE: user input concatenated into XPath
username = request.GET.get("username")  # User input
xpath = f"//user[name='{username}']/password/text()"

# Input: admin' or '1'='1
# Result: //user[name='admin' or '1'='1']/password/text()
# This returns ALL users' passwords!

Always use parameterized XPath queries (with variables) or sanitize user input. Durga Antivirus Pro uses parameterized XPath internally for all XML signature lookups to prevent injection vulnerabilities.
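
One way to stay safe with only the standard library is to keep user input out of the expression entirely and compare values in Python, as sketched below. Libraries with full XPath support, such as lxml, can instead bind the value to an XPath variable. The `find_password` helper and the sample user data are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical user store, for illustration only.
USERS = """<users>
  <user><name>admin</name><password>s3cret</password></user>
  <user><name>guest</name><password>guest123</password></user>
</users>"""

def find_password(root, username):
    """Return the password of the user whose <name> equals username, else None.

    The username is never spliced into an XPath string, so quote
    characters in it cannot change the query's structure.
    """
    for user in root.iter("user"):
        if user.findtext("name") == username:
            return user.findtext("password")
    return None

root = ET.fromstring(USERS)
legit = find_password(root, "admin")
attack = find_password(root, "admin' or '1'='1")

print(legit)   # s3cret
print(attack)  # None
```

The injection string from the vulnerable example is treated as an ordinary (non-matching) user name here, so nothing is returned.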

Common Mistakes

1. Assuming positions start at 0

Unlike most programming languages, XPath positions start at 1, not 0. /library/book[1] is the first book, not the second.
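
The off-by-one difference is easy to see side by side. This sketch assumes ElementTree, whose position predicates follow the XPath convention, and compares them with ordinary 0-based Python list indexing.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the three books from the tutorial document.
XML = """<library>
  <book><title>The Hobbit</title></book>
  <book><title>A Brief History of Time</title></book>
  <book><title>1984</title></book>
</library>"""
root = ET.fromstring(XML)

first_by_xpath = root.find("book[1]")     # XPath position 1
first_by_index = root.findall("book")[0]  # Python list index 0

print(first_by_xpath is first_by_index)   # True
print(first_by_xpath.findtext("title"))   # The Hobbit
```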

2. Forgetting the difference between `/` and `//`

/library/book selects direct children. /library//book selects all book descendants at any depth.
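
The tutorial document has no nested books, so the sketch below uses a hypothetical variant with a `shelf` element (an assumption of this example) to make the difference visible. It relies on ElementTree, where `book` corresponds to `/library/book` and `.//book` to `/library//book`.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with one book nested inside a <shelf>.
XML = """<library>
  <book><title>Top-level book</title></book>
  <shelf>
    <book><title>Nested book</title></book>
  </shelf>
</library>"""
root = ET.fromstring(XML)

direct = [b.findtext("title") for b in root.findall("book")]       # /library/book
anywhere = [b.findtext("title") for b in root.findall(".//book")]  # /library//book

print(direct)    # ['Top-level book']
print(anywhere)  # ['Top-level book', 'Nested book']
```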

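3. Thinking `price < 10` fails because price holds text

XPath converts the element's string value to a number before comparing, so no explicit number() call is needed. If the text is not numeric, it converts to NaN and the comparison is false.

//book[price < 10]    -- Correct: compares the numeric value of the text
//book[price < 10.50] -- Also correct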

4. Not accounting for namespaces

If the XML has a default namespace, //title won’t match. You need to map the namespace: //ns:title.
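
A minimal sketch of the namespace problem and its fix, assuming ElementTree and a made-up namespace URI (`http://example.com/library`). The prefix `ns` is arbitrary; only the URI it maps to has to match the document.

```python
import xml.etree.ElementTree as ET

# The namespace URI is hypothetical, chosen for this example.
XML = """<library xmlns="http://example.com/library">
  <book><title>The Hobbit</title></book>
</library>"""
root = ET.fromstring(XML)

# Without a namespace mapping, the plain name matches nothing.
unqualified = root.findall(".//title")

# Map a prefix to the namespace URI and use it in the path.
namespaces = {"ns": "http://example.com/library"}
qualified = [t.text for t in root.findall(".//ns:title", namespaces)]

print(unqualified)  # []
print(qualified)    # ['The Hobbit']
```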

5. Confusing `text()` with string value

child::text() returns the text-node children of an element. string() returns the string value of a node, which for an element is the concatenated text of all its descendants.
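
The difference shows up as soon as an element mixes text with child elements. The sketch below assumes ElementTree, which stores text nodes in `.text` and `.tail` rather than as separate nodes, so the list of text-node children is assembled by hand. The sample paragraph is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with mixed content.
elem = ET.fromstring("<p>Hello <b>big</b> world</p>")

# Roughly child::text(): only the text directly inside <p>.
# ElementTree keeps the text after a child element in that child's .tail.
text_children = [elem.text] + [child.tail for child in elem]

# Roughly string(.): all descendant text joined together.
string_value = "".join(elem.itertext())

print(text_children)  # ['Hello ', ' world']
print(string_value)   # Hello big world
```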

Practice Questions

What is the difference between /library/book and //book? /library/book selects book elements that are direct children of library. //book selects all book elements at any depth in the document.
What do predicates do in XPath? They filter nodes by conditions in square brackets, like [price > 10] or [@category='fiction'].
In XPath, what index does the first element start at? 1 (not 0 as in most programming languages).
What would //title/../@category return? The category attributes of every element that has a title child: the three books and the magazine ("fiction", "non-fiction", "fiction", "tech").
How do you select all elements with a specific attribute value? //*[@attribute='value'].

Challenge: Write an XPath expression that selects all book titles where the price is less than $15, the book was published after 1940, and the author’s last name contains “Orwell”. Then write the same query using axes instead of predicates.

FAQ

What is XPath?

XPath (XML Path Language) is a query language for selecting nodes from XML documents using path expressions, predicates, and functions.

Is XPath only for XML?

XPath is designed for XML, but it’s also commonly used with HTML (for web scraping) and other markup languages.

What is the difference between XPath and XSLT?

XPath is a query language (find data). XSLT is a transformation language (convert data). XSLT uses XPath extensively inside its templates.

Can I use XPath with JSON?

Not directly, but there are tools like JSONPath that apply XPath-like syntax to JSON documents.

What does //* mean in XPath?

It selects every element in the document, at any depth. The * is a wildcard matching any element name.

What is an XPath axis?

An axis defines a direction of navigation relative to the current node — parent, child, ancestor, following-sibling, etc.

Try It Yourself

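Use Python or a command-line tool to test XPath. Python's built-in ElementTree supports only a subset of XPath (simple paths plus attribute and position predicates, with no functions or numeric comparisons), which is enough for a first test: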

import xml.etree.ElementTree as ET

xml_data = """<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book category="fiction">
        <title>The Hobbit</title>
        <author>J.R.R. Tolkien</author>
        <year>1937</year>
        <price>12.99</price>
    </book>
    <book category="non-fiction">
        <title>A Brief History of Time</title>
        <author>Stephen Hawking</author>
        <year>1988</year>
        <price>9.99</price>
    </book>
</library>"""

root = ET.fromstring(xml_data)

# Find all titles
for title in root.findall('.//title'):
    print(title.text)

# Expected output:
# The Hobbit
# A Brief History of Time

Or use xmllint on the command line, assuming the full tutorial document is saved as library.xml:

# Query all book titles
xmllint --xpath '/library/book/title/text()' library.xml

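Expected output (newer xmllint builds print one result per line; older ones may run the text together on a single line):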

The Hobbit
A Brief History of Time
1984

What’s Next

Tutorial	What You’ll Learn
XSLT Explained — Transform XML	Transform XML into HTML and other formats using XPath
XML Basics — Complete Guide	Foundational XML concepts
Python XML Processing	Process XML data with Python’s ElementTree

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Updated 2026-06-06.
