XML: XPath Functions: Complete Reference with Examples

DodaTech Updated Jun 20, 2026 7 min read

XPath functions transform, filter, and compute values directly within your XPath expressions — eliminating the need for post-processing in languages like Python or JavaScript. This reference covers every essential XPath 2.0/3.0 function with runnable examples.

Learning Path

    flowchart LR
  A["XPath Basics<br/>Path Expressions"] --> B["XPath Functions<br/>Complete Reference"]
  B --> C["XML Validation<br/>DTD & XSD"]
  C --> D["SOAP APIs<br/>XML Web Services"]
  style B fill:#f90,color:#fff,stroke-width:2px

What you’ll learn: All major XPath function categories (string, number, date, boolean, and sequence functions), with examples you can run in any XPath 2.0+ tester.

Why it matters: XPath functions let you extract, clean, and compute data at query time without external code. They’re essential for XSLT, XML databases, and web scraping.

Real-world use: DodaZIP uses XPath functions to extract file metadata from XML manifests. Durga Antivirus Pro uses XPath functions in signature queries to match threat patterns.

Sample XML Document

We’ll use this catalog XML for all examples:

<catalog>
    <book id="b001" category="fiction">
        <title>The Hobbit</title>
        <author>J.R.R. Tolkien</author>
        <price currency="USD">12.99</price>
        <rating>4.8</rating>
        <pubdate>1937-09-21</pubdate>
    </book>
    <book id="b002" category="fiction">
        <title>The Fellowship of the Ring</title>
        <author>J.R.R. Tolkien</author>
        <price currency="USD">14.99</price>
        <rating>4.9</rating>
        <pubdate>1954-07-29</pubdate>
    </book>
    <book id="b003" category="non-fiction">
        <title>A Brief History of Time</title>
        <author>Stephen Hawking</author>
        <price currency="GBP">9.99</price>
        <rating>4.5</rating>
        <pubdate>1988-03-01</pubdate>
    </book>
</catalog>

String Functions

concat()

Combines two or more strings:

concat(//book[1]/title, ' by ', //book[1]/author)

Result: "The Hobbit by J.R.R. Tolkien"

substring()

Extracts a portion of a string:

substring(//book[1]/title, 5, 5)
substring(//book[1]/@id, 2)

Result: "Hobbi", "001" (character positions are 1-based, so position 5 of "The Hobbit" is "H")
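The 1-based indexing of substring() is easy to get wrong when moving between XPath and a 0-based language. The following is a minimal Python sketch of the rule, not a full implementation: the helper names xpath_round and xpath_substring are hypothetical, and special cases such as NaN or infinite arguments are ignored.

```python
import math
from typing import Optional


def xpath_round(x: float) -> int:
    """Round halves toward positive infinity, as XPath round() does."""
    return math.floor(x + 0.5)


def xpath_substring(s: str, start: float, length: Optional[float] = None) -> str:
    """Approximate XPath substring(): keep characters at 1-based positions
    p where round(start) <= p < round(start) + round(length)."""
    first = xpath_round(start)
    # Without a length, everything from `first` to the end is kept.
    last = len(s) + 1 if length is None else first + xpath_round(length)
    return "".join(ch for p, ch in enumerate(s, start=1) if first <= p < last)


print(xpath_substring("The Hobbit", 5, 5))  # Hobbi
print(xpath_substring("b001", 2))           # 001
print(xpath_substring("12345", 0, 3))       # 12 (position 0 lies before the string)
```

The last call shows why a start position below 1 shortens the result: the requested window still begins at position 0, but only positions 1 and 2 fall inside the string.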

string-length()

Returns character count:

string-length(//book[1]/title)

Result: 10

contains()

Checks if a string contains a substring:

//book[contains(title, 'Hobbit')]/title
//book[contains(author, 'Hawking')]/author

Result: "The Hobbit", "Stephen Hawking"

starts-with() and ends-with()

//book[starts-with(title, 'The')]/title
//book[ends-with(@id, '003')]/title

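Result: "The Hobbit" and "The Fellowship of the Ring" (starts-with); "A Brief History of Time" (ends-with). Note that ends-with() requires XPath 2.0 or later, while starts-with() is also available in XPath 1.0.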

normalize-space()

Removes leading/trailing whitespace and collapses internal whitespace:

normalize-space('   Extra   spaces   ')

Result: "Extra spaces"

string-join()

Joins a sequence of strings with a separator:

string-join(//book/author, ', ')

Result: "J.R.R. Tolkien, J.R.R. Tolkien, Stephen Hawking"

translate()

Character-by-character replacement:

translate(//book[1]/title, 'obB', 'OBb')

Result: "The HOBBit" (o→O, b→B, B→b; every matching character is replaced, so both lowercase b characters become B)
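translate() maps characters by position: the n-th character of the second argument is replaced by the n-th character of the third, and characters with no counterpart are deleted. A minimal Python sketch of that rule follows; the helper name xpath_translate is hypothetical.

```python
def xpath_translate(s: str, from_chars: str, to_chars: str) -> str:
    """Approximate XPath translate() with str.translate()."""
    table = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) in table:
            continue  # the first occurrence of a character wins
        # Characters beyond the end of to_chars map to None, which deletes them.
        table[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    return s.translate(table)


print(xpath_translate("The Hobbit", "obB", "OBb"))  # The HOBBit
print(xpath_translate("--aaa--", "abc-", "ABC"))    # AAA
```

The second call removes the hyphens because "-" is the fourth character of the source set and the replacement set has only three characters.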

import xml.etree.ElementTree as ET

def run_xpath_functions_demo():
    """Demonstrate XPath string functions using Python."""
    xml_str = """<catalog>
        <book id="b001"><title>The Hobbit</title><author>J.R.R. Tolkien</author></book>
        <book id="b002"><title>The Fellowship of the Ring</title><author>J.R.R. Tolkien</author></book>
    </catalog>"""
    
    root = ET.fromstring(xml_str)
    
    # Simulate XPath functions
    titles = [b.find('title').text for b in root.findall('book')]
    authors = [b.find('author').text for b in root.findall('book')]
    
    print(f"concat: {titles[0]} by {authors[0]}")
    print(f"substring(,5,5): {titles[0][4:9]}")
    print(f"contains('Hobbit'): {'Hobbit' in titles[0]}")
    print(f"string-join: {', '.join(authors)}")
    print(f"string-length: {len(titles[0])}")
    print(f"translate: {titles[0].translate(str.maketrans({'o':'O','b':'B'}))}")

run_xpath_functions_demo()

Expected output:

concat: The Hobbit by J.R.R. Tolkien
substring(,5,5): Hobbi
contains('Hobbit'): True
string-join: J.R.R. Tolkien, J.R.R. Tolkien
string-length: 10
translate: The HOBBit

Number Functions

number()

Converts a string to a number:

number(//book[1]/price) * 2

Result: 25.98

ceiling(), floor(), round()

ceiling(12.99)
floor(12.99)
round(12.49)
round(12.50)

Result: 13, 12, 12, 13

sum(), avg(), min(), max()

sum(//book/price)
avg(//book/price)
min(//book/price)
max(//book/price)

Result: 37.97, 12.656..., 9.99, 14.99

count()

count(//book)
count(//book[author = 'J.R.R. Tolkien'])

Result: 3, 2

abs()

Absolute value (available since XPath 2.0):

abs(-42)
abs(//book[1]/price - //book[2]/price)

Result: 42, 2

import math

def number_functions_demo():
    """Demonstrate XPath number functions."""
    prices = [12.99, 14.99, 9.99]

    def xpath_round(x):
        # XPath round() rounds halves toward positive infinity (12.5 -> 13).
        # Python's built-in round() rounds halves to the nearest even
        # integer (12.5 -> 12), so it cannot be used directly here.
        return math.floor(x + 0.5)

    # Simulate XPath number functions
    print(f"sum: {sum(prices)}")
    print(f"avg: {sum(prices)/len(prices):.4f}")
    print(f"min: {min(prices)}")
    print(f"max: {max(prices)}")
    print(f"count: {len(prices)}")
    print(f"ceiling(12.99): {math.ceil(12.99)}")
    print(f"floor(12.99): {math.floor(12.99)}")
    print(f"round(12.49): {xpath_round(12.49)}")
    print(f"round(12.50): {xpath_round(12.50)}")

number_functions_demo()

Expected output:

sum: 37.97
avg: 12.6567
min: 9.99
max: 14.99
count: 3
ceiling(12.99): 13
floor(12.99): 12
round(12.49): 12
round(12.50): 13

Date Functions (XPath 2.0+)

current-date(), current-time(), current-dateTime()

current-date()
current-time()
current-dateTime()

Result (example values; the actual output depends on the clock and the implicit timezone): 2026-06-20Z, 10:30:00Z, 2026-06-20T10:30:00Z

year-from-date(), month-from-date(), day-from-date()

year-from-date(//book[1]/pubdate)
month-from-date(//book[1]/pubdate)
day-from-date(//book[1]/pubdate)

Result: 1937, 9, 21

format-date()

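Custom date formatting with a picture string (format-date() originated in XSLT 2.0 and joined the XPath function library in version 3.0):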

format-date(//book[1]/pubdate, '[D] [MNn], [Y]')
format-date(//book[1]/pubdate, '[Y0001]-[M01]-[D01]')

Result: "21 September, 1937", "1937-09-21"
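Picture strings use bracketed components rather than strftime-style codes. The sketch below illustrates the idea in Python for a handful of components only; the function name format_date_sketch and the supported subset are assumptions made for illustration, and real processors accept many more components and modifiers.

```python
import re
from datetime import date

# English month names are hard-coded so the result does not depend on the locale.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def format_date_sketch(d: date, picture: str) -> str:
    """Expand a small subset of format-date() picture components."""
    components = {
        "Y": str(d.year),
        "Y0001": f"{d.year:04d}",
        "M": str(d.month),
        "M01": f"{d.month:02d}",
        "MNn": MONTHS[d.month - 1],
        "D": str(d.day),
        "D01": f"{d.day:02d}",
    }

    def expand(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in components:
            raise ValueError(f"unsupported picture component: [{key}]")
        return components[key]

    # Text outside square brackets is copied through unchanged.
    return re.sub(r"\[([^\]]+)\]", expand, picture)


pubdate = date(1937, 9, 21)
print(format_date_sketch(pubdate, "[D] [MNn], [Y]"))       # 21 September, 1937
print(format_date_sketch(pubdate, "[Y0001]-[M01]-[D01]"))  # 1937-09-21
```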

days-from-duration()

Difference between dates:

days-from-duration(current-date() - xs:date(//book[1]/pubdate))

Result: 32414 (days between 1937-09-21 and 2026-06-20)

from datetime import date

def date_functions_demo():
    """Demonstrate XPath date functions."""
    today = date(2026, 6, 20)
    pubdate = date(1937, 9, 21)
    
    print(f"current-date: {today}")
    print(f"year-from-date: {pubdate.year}")
    print(f"month-from-date: {pubdate.month}")
    print(f"day-from-date: {pubdate.day}")
    print(f"format-date([D] [MNn], [Y]): {pubdate.strftime('%d %B, %Y')}")
    print(f"days-difference: {(today - pubdate).days}")

date_functions_demo()

Expected output:

current-date: 2026-06-20
year-from-date: 1937
month-from-date: 9
day-from-date: 21
format-date([D] [MNn], [Y]): 21 September, 1937
days-difference: 32414

Boolean Functions

not()

//book[not(rating > 4.8)]/title
//book[not(@category = 'fiction')]/title

Result: "The Hobbit" and "A Brief History of Time" (first expression); "A Brief History of Time" (second expression)

boolean()

boolean(//book[1]/title)
boolean(//book[100])

Result: true, false

true() and false()

//book[rating > 4.5 and true()]
//book[false()]

Result: First two books, empty sequence

Sequence Functions

distinct-values()

distinct-values(//book/author)

Result: ("J.R.R. Tolkien", "Stephen Hawking")

subsequence()

subsequence(//book, 2, 2)

Result: Books 2 and 3

insert-before(), remove(), reverse()

insert-before(('a', 'c'), 2, 'b')
remove(('a', 'b', 'c'), 2)
reverse(//book/@id)

Result: ("a", "b", "c"), ("a", "c"), ("b003", "b002", "b001")

index-of()

index-of(//book/author, 'J.R.R. Tolkien')

Result: (1, 2)

empty() and exists()

empty(//book[rating > 5.0])
exists(//book[rating > 4.0])

Result: true, true
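For readers who think in Python lists, the sequence functions above map onto a few short helpers. This is a rough sketch rather than a faithful implementation: the helper names are hypothetical, positions are assumed to be integers of at least 1, and distinct_values keeps first-seen order even though XPath leaves the order of distinct-values() up to the implementation.

```python
def distinct_values(seq):
    """Drop duplicates, keeping the first occurrence of each value."""
    return list(dict.fromkeys(seq))


def subsequence(seq, start, length=None):
    """Return `length` items beginning at 1-based position `start`."""
    begin = start - 1
    end = None if length is None else begin + length
    return list(seq[begin:end])


def index_of(seq, target):
    """Return every 1-based position at which `target` occurs."""
    return [pos for pos, item in enumerate(seq, start=1) if item == target]


authors = ["J.R.R. Tolkien", "J.R.R. Tolkien", "Stephen Hawking"]
ids = ["b001", "b002", "b003"]

print(distinct_values(authors))             # ['J.R.R. Tolkien', 'Stephen Hawking']
print(subsequence(ids, 2, 2))               # ['b002', 'b003']
print(index_of(authors, "J.R.R. Tolkien"))  # [1, 2]
print(list(reversed(ids)))                  # ['b003', 'b002', 'b001']
```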

Complete Python XPath Runner

from lxml import etree

def run_xpath(xml_content, xpath_expr):
    """Evaluate an XPath expression and print the result."""
    root = etree.fromstring(xml_content)
    result = root.xpath(xpath_expr)
    
    print(f"XPath: {xpath_expr}")
    print(f"Result: {result}")
    print()

xml = """<catalog>
    <book id="b001" category="fiction">
        <title>The Hobbit</title>
        <author>J.R.R. Tolkien</author>
        <price>12.99</price>
        <rating>4.8</rating>
    </book>
    <book id="b002" category="fiction">
        <title>The Fellowship of the Ring</title>
        <author>J.R.R. Tolkien</author>
        <price>14.99</price>
        <rating>4.9</rating>
    </book>
    <book id="b003" category="non-fiction">
        <title>A Brief History of Time</title>
        <author>Stephen Hawking</author>
        <price>9.99</price>
        <rating>4.5</rating>
    </book>
</catalog>"""

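# lxml implements XPath 1.0 only, so XPath 2.0 functions such as string-join(),
# avg(), max(), and distinct-values() raise XPathEvalError here. The queries
# below use XPath 1.0 equivalents; run the 2.0 forms in a 2.0+ processor such as Saxon.
# Paths end in /text() so that lxml returns strings instead of Element objects.
run_xpath(xml, "concat(//book[1]/title, ' by ', //book[1]/author)")
run_xpath(xml, "round(sum(//book/price) * 100) div 100")
run_xpath(xml, "round(sum(//book/rating) div count(//book/rating) * 10) div 10")
run_xpath(xml, "count(//book[@category='fiction'])")
run_xpath(xml, "//book[not(author = preceding-sibling::book/author)]/author/text()")
run_xpath(xml, "//book[not(rating < //book/rating)]/title/text()")
run_xpath(xml, "//book[starts-with(author, 'J.')]/title/text()")

Expected output (lxml returns XPath numbers as Python floats, so the count prints as 2.0):

XPath: concat(//book[1]/title, ' by ', //book[1]/author)
Result: The Hobbit by J.R.R. Tolkien

XPath: round(sum(//book/price) * 100) div 100
Result: 37.97

XPath: round(sum(//book/rating) div count(//book/rating) * 10) div 10
Result: 4.7

XPath: count(//book[@category='fiction'])
Result: 2.0

XPath: //book[not(author = preceding-sibling::book/author)]/author/text()
Result: ['J.R.R. Tolkien', 'Stephen Hawking']

XPath: //book[not(rating < //book/rating)]/title/text()
Result: ['The Fellowship of the Ring']

XPath: //book[starts-with(author, 'J.')]/title/text()
Result: ['The Hobbit', 'The Fellowship of the Ring']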

Common XPath Function Errors

Wrong argument types: Functions like sum() expect numeric values. In XPath 1.0, non-numeric text yields NaN; XPath 2.0+ raises a type or cast error instead. Convert or filter the values first, for example with number().
Forgetting predicates: //book/price selects ALL price elements. Use //book[1]/price for the first, or //book[@category='fiction']/price to filter.
Path with no matches: //book[100]/title returns an empty sequence, not an error. Check with exists() before relying on the result.
Date format mismatches: format-date() requires picture-string components ([Y], [M], [D]). Python-style %Y-%m-%d patterns will not work.
XPath 1.0 vs 2.0 function availability: Functions like current-date() and distinct-values() are XPath 2.0+, and format-date() is XPath 3.0+ (or XSLT 2.0). Using them in an XPath 1.0 processor raises errors.
Function name spelling: The function is starts-with(), not startswith() or startsWith(). XPath function names are case-sensitive, lowercase, and hyphenated.
Namespace issues in function queries: If your XML uses namespaces, prefix elements in the path: //ns:book[ns:rating > 4.5]. Without the namespace prefix, the path returns nothing.
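The namespace pitfall in the last item can be reproduced with Python's standard library. This is a small sketch under stated assumptions: the namespace URI http://example.com/books and the ns prefix are made up for the example, and because ElementTree supports only a limited XPath subset, the rating comparison is done in Python rather than in a predicate.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced version of the catalog, for illustration only.
XML = """<catalog xmlns="http://example.com/books">
  <book><title>The Hobbit</title><rating>4.8</rating></book>
  <book><title>A Brief History of Time</title><rating>4.5</rating></book>
</catalog>"""

root = ET.fromstring(XML)
ns = {"ns": "http://example.com/books"}  # prefix-to-URI mapping used in paths

# Without a prefix, the path looks for elements in no namespace and finds none.
unprefixed = root.findall(".//book")

# With the prefix bound to the document's namespace, both books match.
prefixed = root.findall(".//ns:book", ns)

# Filter on rating in Python, since ElementTree has no numeric predicates.
high_rated = [
    book.findtext("ns:title", namespaces=ns)
    for book in prefixed
    if float(book.findtext("ns:rating", namespaces=ns)) > 4.5
]

print(len(unprefixed))  # 0
print(len(prefixed))    # 2
print(high_rated)       # ['The Hobbit']
```

The prefix in the query does not have to match any prefix used in the document; only the namespace URI it is bound to matters.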

Practice Questions

1. Which XPath function removes extra whitespace? normalize-space() — it removes leading/trailing whitespace and collapses internal whitespace sequences to single spaces.

2. How do you find the most expensive book using XPath functions? //book[price = max(//book/price)] — this selects the book whose price equals the maximum price across all books.

3. What’s the difference between concat() and string-join()? concat() joins individual string arguments. string-join() joins a SEQUENCE of strings with a separator — much more useful when working with node sets.

4. How do you format a date as “15-Mar-2026”? format-date(xs:date('2026-03-15'), '[D01]-[MNn,*-3]-[Y]'). MNn requests the month name in title case, and the width modifier *-3 caps it at three characters; [MN] would produce the upper-case name instead.

5. Challenge: XPath expression builder Write a single XPath expression that returns the title of the cheapest book by J.R.R. Tolkien. Then modify it to return the average price of non-fiction books published after 1950.

Mini Project: XPath Query Tool

def xpath_query_tool():
    """Interactive XPath query tool."""
    from lxml import etree
    
    xml_content = """<catalog>
      <book cat="fiction"><title>The Hobbit</title><price>12.99</price></book>
      <book cat="fiction"><title>1984</title><price>10.99</price></book>
      <book cat="non-fiction"><title>Sapiens</title><price>15.99</price></book>
    </catalog>"""
    
    root = etree.fromstring(xml_content)
    
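    # lxml supports XPath 1.0 only: the two XPath 2.0 queries below are kept
    # on purpose to show the error branch (lxml raises XPathEvalError).
    queries = [
        "count(//book)",
        "string-join(//book/title, ', ')",  # XPath 2.0: reported as an error
        "//book[price > 12]/title",
        "sum(//book/price)",
        "//book[not(title = '1984')]/title",
        "distinct-values(//book/@cat)",  # XPath 2.0: reported as an error
    ]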
    
    for q in queries:
        try:
            result = root.xpath(q)
            print(f"{'✓' if result else '∅'} {q}")
            print(f"   → {result}")
        except Exception as e:
            print(f"✗ {q}")
            print(f"   → Error: {e}")
        print()

xpath_query_tool()

FAQ

What’s the difference between XPath 1.0 and 2.0 functions?

XPath 2.0 added date/time functions (current-date(), year-from-date()), sequence functions (distinct-values(), subsequence()), conditional expressions (if/then/else), and regular expression functions (matches(), replace()). XPath 1.0 offers only a small core library of node-set, string, boolean, and number functions.

Can I write custom XPath functions?

XPath 2.0 doesn’t support user-defined functions, but XSLT 2.0 does via xsl:function. XPath 3.0 adds inline (anonymous) functions, written as function($x) { ... }. The => arrow operator, added in XPath 3.1, chains function calls; it does not define functions.

How do XPath functions handle missing values?

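Most functions accept the empty sequence produced by a missing node instead of raising an error, but what they return varies. contains(/root/missing, 'test') returns false, number(/root/missing) returns NaN, sum(/root/missing) returns 0, and avg(), min(), and max() return an empty sequence.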

Do XPath functions work in web browsers?

Modern browsers support XPath 1.0 via document.evaluate(). XPath 2.0+ functions require a library like Saxon-JS or running server-side.

What’s the performance impact of XPath functions?

Simple functions (contains, sum) are fast. Recursive functions or deep paths (//*) can be slow on large documents. For production, test with real data volumes.
