Regex for HTML Tags — Pattern Explained with Examples
DodaTech
Updated Jun 20, 2026
Matching HTML tags is a common step in web scraping, content sanitization, and template processing. A regex is not an HTML parser, but it can match simple opening, closing, and self-closing tags with basic attributes, which makes it useful for quick extraction and cleanup tasks.
The Pattern
```
/<[^>]*>/g
```

For a more detailed pattern that restricts tag names and attributes to specific characters:

```
/<\/?[\w\s="'.%-]+>/
```

Pattern Breakdown
| Part | Meaning |
|---|---|
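| `<` | Opening angle bracket |
| `[^>]*` | Any run of characters other than `>`: the tag name, attributes, and whitespace |
| `>` | Closing angle bracket |
| `<\/?` | Opening angle bracket plus an optional forward slash, so closing tags such as `</p>` match (detailed pattern) |
| `[\w\s="'.%-]+` | One or more word characters, whitespace, `=`, quotes, `.`, `%`, or `-` (detailed pattern). `/` and `:` are not in the set, so self-closing tags like `<img src="img.png" />` and attribute values containing URLs do not match |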
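A quick way to compare the two patterns is to run both against the samples listed under Matches. The sketch below does this in Python, one of the languages used later in this article; the names `GENERAL` and `DETAILED` are illustrative, not part of any library. It uses `re.fullmatch` so that each pattern must consume the whole sample rather than just part of it.

```python
import re

# The two patterns from this article, in Python syntax (names are illustrative).
GENERAL = re.compile(r'<[^>]*>')
DETAILED = re.compile(r'<\/?[\w\s="\'.%-]+>')

# The samples from the Matches list.
samples = ['<div>', '<img src="img.png" />', '</p>', '<a href="link" class="btn">', '<br>']

# fullmatch requires the pattern to consume the entire sample.
results = {s: (bool(GENERAL.fullmatch(s)), bool(DETAILED.fullmatch(s))) for s in samples}
for s, (general, detailed) in results.items():
    print(f'{s:30} general={general} detailed={detailed}')
```

On these samples the general pattern accepts all five, while the detailed pattern rejects `<img src="img.png" />` because `/` is not in its character class. Adding `/` to the class (for example `<\/?[\w\s="'.%\/-]+>`) is one possible tweak; treat it as a starting point and test it against your own input.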
Matches
- `<div>`
- `<img src="img.png" />`
- `</p>`
- `<a href="link" class="btn">`
- `<br>`
Does NOT Match
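- `<div` (no closing `>`)
- `<p>unclosed` as a complete element: the regex finds the `<p>` tag but cannot tell that the element is never closed
- `div>` (no opening `<`)
- Nested tags as pairs: regex cannot track nesting depth, so it cannot match each opening tag to its closing tag
- `< div>` is not a valid tag, because HTML does not allow whitespace between `<` and the tag name; note, however, that both patterns above still accept it, since `[^>]*` and `\s` both match the space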
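The non-matching cases can be checked in Python as well. This sketch runs both patterns from this article with `re.findall`, which scans the whole string and therefore also reveals partial matches; `GENERAL`, `DETAILED`, and `results` are illustrative names.

```python
import re

# The two patterns from this article (names are illustrative).
GENERAL = re.compile(r'<[^>]*>')
DETAILED = re.compile(r'<\/?[\w\s="\'.%-]+>')

cases = ['<div', '<p>unclosed', '< div>', 'div>', '<div><div>x</div></div>']

# Map each input to the tags found by (general, detailed).
results = {text: (GENERAL.findall(text), DETAILED.findall(text)) for text in cases}
for text, (general, detailed) in results.items():
    print(f'{text!r}: general={general} detailed={detailed}')
```

`<div` and `div>` yield nothing, `<p>unclosed` still yields `<p>`, `< div>` is accepted by both patterns, and the nested input comes back as a flat list of four tags with no information about which tags pair up.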
Language Examples
JavaScript
```javascript
const htmlTagRegex = /<[^>]*>/g;
const html = '<div><p>Hello</p></div>';
console.log(html.match(htmlTagRegex));
// ['<div>', '<p>', '</p>', '</div>']
```
Python
```python
import re

pattern = r'<[^>]*>'
html = '<div><p>Hello</p></div>'
matches = re.findall(pattern, html)
print(matches)  # ['<div>', '<p>', '</p>', '</div>']
```

PHP
```php
<?php
$html = '<div><p>Hello</p></div>';
preg_match_all('/<[^>]*>/', $html, $matches);
print_r($matches[0]);
// Array ( [0] => <div> [1] => <p> [2] => </p> [3] => </div> )
```
Common Pitfalls
- Regex cannot parse arbitrary HTML: it fails on nested tags of the same type, malformed markup, and complex attribute values containing `>` characters
- Script and style tags contain content with `<` and `>` that simple patterns will incorrectly match
- HTML comments (`<!-- -->`) have special syntax that requires its own handling to avoid false positives
- Quoted attribute values can contain `>`, but `[^>]*` stops at the first `>`, even one inside an attribute
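The following Python sketch reproduces three of the pitfalls above with the article's general pattern. It also tries `QUOTE_AWARE`, an illustrative variant that is not part of this article: it consumes a quoted attribute value as a single unit, which fixes the attribute case but does nothing for script content or comments.

```python
import re

# The article's general pattern.
SIMPLE = re.compile(r'<[^>]*>')
# Illustrative quote-aware variant (not from the article): a double- or
# single-quoted string is consumed as a unit, so '>' inside quotes is skipped.
QUOTE_AWARE = re.compile(r'''<(?:[^>"']|"[^"]*"|'[^']*')*>''')

attr = '<a title="a > b" href="x">link</a>'
script = '<script>if (a < b && c > d) {}</script>'
comment = '<!-- <b>old</b> -->'

print(SIMPLE.findall(attr))       # stops at the '>' inside title="..."
print(QUOTE_AWARE.findall(attr))  # keeps the whole opening tag
print(SIMPLE.findall(script))     # treats '< b && c >' as a tag
print(SIMPLE.findall(comment))    # fuses '<!--' with '<b>' and misses '-->'
```

For markup where these cases matter, an actual HTML parser, such as Python's standard `html.parser` module, is the safer choice.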
Real-World Use Cases
- Web scraping: extract the tags from a page for data parsing and content extraction
- Content sanitization: strip HTML tags from user input before it is rendered; given the pitfalls above, treat this as a cleanup step rather than a complete defense against XSS
- Template processing: identify and replace custom template tags or directives within HTML markup
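For scraping and sanitization, the usual operation is removing tags rather than listing them. A minimal Python sketch with `re.sub` follows; `strip_tags` is a hypothetical helper name. Text that is not a complete tag, such as an unterminated `<img`, passes through unchanged, which is one reason regex stripping should not be the only XSS safeguard.

```python
import re

# The article's general tag pattern.
TAG = re.compile(r'<[^>]*>')


def strip_tags(html: str) -> str:
    """Remove every substring that looks like a complete tag (hypothetical helper)."""
    return TAG.sub('', html)


print(strip_tags('<div><p>Hello</p> <b>world</b></div>'))  # Hello world
# No closing '>', so nothing is removed and the input comes back unchanged.
print(strip_tags('<img src=x onerror=alert(1)'))
```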
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro