Regex for HTML Tags — Pattern Explained with Examples

DodaTech Updated Jun 20, 2026 3 min read

HTML tag matching is heavily used in web scraping, content sanitization, and template processing. While regex is not a full HTML parser, it can effectively match simple opening, closing, and self-closing tags with basic attributes. This pattern is useful for quick extraction and cleanup tasks.

The Pattern

/<[^>]*>/g

For a stricter pattern that limits which characters may appear inside the tag:

/<\/?[\w\s="'.%-]+>/

Pattern Breakdown

Part              Meaning
<                 Opening angle bracket
[^>]*             Any character except > (matches tag name, attributes, and whitespace)
>                 Closing angle bracket
<\/?              Optional forward slash for closing tags
[\w\s="'.%-]+     Tag content including word chars, spaces, quotes, and common attribute characters
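To see how the two patterns differ in practice, the following is a minimal sketch that runs both against a handful of sample tags. The variable names (simple, detailed, samples) are illustrative and not part of the original article; the patterns themselves are copied from above.

```python
import re

# The two patterns from this article, written as Python raw strings.
simple = re.compile(r'<[^>]*>')
detailed = re.compile(r'''<\/?[\w\s="'.%-]+>''')

# Illustrative sample tags (an assumption for this sketch).
samples = [
    '<div>',
    '<img src="img.png" />',
    '</p>',
    '<a href="link" class="btn">',
    '<br>',
]

for tag in samples:
    # fullmatch requires the whole string to be one tag.
    print(tag, bool(simple.fullmatch(tag)), bool(detailed.fullmatch(tag)))
```

The simple pattern accepts all five samples. The stricter pattern rejects the self-closing img tag, because / is not in its character class; adding / to the class would be one way to accept it.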

Matches

  • <div>
  • <img src="img.png" /> (first pattern only; the second pattern has no / in its character class)
  • </p>
  • <a href="link" class="btn">
  • <br>

Does NOT Match

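  • <div (missing closing bracket)
  • div> (missing opening bracket)
  • <p>unclosed as a whole (only the <p> part matches; the trailing text does not)
  • Nested structure (each tag matches on its own, but regex cannot track nesting depth or pair opening tags with closing tags)

Note: < div> (a space between < and the tag name) is not a valid HTML tag, but both patterns above still match it, because neither requires a tag name immediately after <.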
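A quick way to check these cases is to run the simple pattern over each input and look at what comes back. This is a small sketch with made-up input strings. One caveat it surfaces: the pattern as written does accept < div>, since [^>]* happily consumes a leading space.

```python
import re

simple = re.compile(r'<[^>]*>')

# Illustrative inputs (assumptions for this sketch).
cases = ['<div', 'div>', '<p>unclosed', '< div>', '<div><div>x</div></div>']

# Map each input to the list of substrings the pattern finds in it.
results = {text: simple.findall(text) for text in cases}

for text, found in results.items():
    print(repr(text), '->', found)
```

For the nested input the result is a flat list of four tags; nothing in the output says which closing tag belongs to which opening tag.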

Language Examples

JavaScript

const htmlTagRegex = /<[^>]*>/g;
const html = '<div><p>Hello</p></div>';
console.log(html.match(htmlTagRegex));
// ['<div>', '<p>', '</p>', '</div>']

Python

import re
pattern = r'<[^>]*>'
html = '<div><p>Hello</p></div>'
matches = re.findall(pattern, html)
print(matches)  # ['<div>', '<p>', '</p>', '</div>']

PHP

<?php
$html = '<div><p>Hello</p></div>';
// preg_match_all finds every match; full matches are collected in $matches[0]
preg_match_all('/<[^>]*>/', $html, $matches);
print_r($matches[0]);
// Array ( [0] => <div> [1] => <p> [2] => </p> [3] => </div> )

Common Pitfalls

  • Regex cannot parse arbitrary HTML — it fails on nested tags of the same type, malformed markup, and complex attribute values containing > characters
  • Script and style tags contain content with < and > that will be incorrectly matched by simple patterns
  • HTML comments (<!-- -->) have special syntax that requires its own handling to avoid false positives
  • Attribute values can contain > if quoted, but [^>]* would stop at the first > inside the attribute
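The pitfalls above are easy to reproduce. The sketch below feeds three small, made-up inputs to the simple pattern: an attribute value containing >, a script body with comparison operators, and a comment wrapping a tag.

```python
import re

simple = re.compile(r'<[^>]*>')

# A quoted > inside an attribute value cuts the match short.
attr = simple.findall('<a title="a > b" href="#">link</a>')
print(attr)     # ['<a title="a >', '</a>']

# Comparison operators inside a script body look like a tag to the pattern.
script = simple.findall('<script>if (a < b && c > d) run();</script>')
print(script)   # ['<script>', '< b && c >', '</script>']

# A comment is matched only up to the first >, and its contents leak out.
comment = simple.findall('<!-- <b>not a tag</b> -->')
print(comment)  # ['<!-- <b>', '</b>']
```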

Real-World Use Cases

  • Web scraping: pull tags out of a page for quick, one-off data extraction
  • Content sanitization: strip HTML tags from user input as a cleanup step. On its own this does not prevent XSS, because malformed or unterminated tags slip past the pattern; escape the output or use a dedicated sanitizer as well
  • Template processing: identify and replace custom template tags or directives within HTML markup
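As an illustration of the sanitization use case, here is a minimal sketch of tag stripping with re.sub, followed by a case where it falls short. The helper name strip_tags and the payload string are assumptions made for this example, not something the pattern's documentation prescribes.

```python
import html
import re

TAG = re.compile(r'<[^>]*>')

def strip_tags(markup: str) -> str:
    """Remove anything that looks like a tag; text content is left in place."""
    return TAG.sub('', markup)

print(strip_tags('<p>Hello <b>world</b></p>'))  # Hello world

# An unterminated tag has no closing >, so the pattern never matches it
# and the string passes through unchanged.
payload = '<img src=x onerror=alert(1)'
print(strip_tags(payload) == payload)  # True

# Escaping on output neutralizes the markup instead of trying to remove it.
print(html.escape(payload))  # &lt;img src=x onerror=alert(1)
```

Stripping is fine for producing a plain-text preview; for anything security-sensitive, escaping at output time is the safer default.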

FAQ

When should I NOT use regex for HTML?
Whenever you need to parse or validate HTML structure. Use an HTML parser (such as DOMParser in JavaScript or BeautifulSoup in Python) when you need to extract content, traverse the tree, or handle nested elements.
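For comparison, here is a rough sketch of the parser approach using html.parser from the Python standard library (chosen here so the example has no third-party dependency; BeautifulSoup would work similarly). The class name TagCollector and the depth counter are illustrative assumptions.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Record each opening tag with its nesting depth and attributes."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((self.depth, tag, dict(attrs)))
        self.depth += 1

    def handle_endtag(self, tag):
        # Simplification: assumes every opening tag has a closing tag,
        # so void elements such as <br> would skew the depth count.
        self.depth -= 1

collector = TagCollector()
collector.feed('<div><a title="a > b" href="#">link</a></div>')
collector.close()
print(collector.seen)
```

Unlike the regex, the parser keeps the quoted > inside the title attribute intact and can report how deeply each element is nested.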
How do I match only opening tags and not closing tags?
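Use the pattern <[^>/]+>. The class [^>/] excludes the forward slash, so </tagname> is not matched. However, the slash is excluded everywhere in the tag, so self-closing tags like <br /> and tags whose attribute values contain a slash, such as <a href="/home">, are skipped too. To exclude only closing tags, require a letter immediately after the bracket: <[A-Za-z][^>]*>.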
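The trade-off is easier to see on a concrete string. This sketch compares the pattern from the answer with one possible alternative that requires a letter right after <; the alternative and the sample markup are assumptions made for this example.

```python
import re

html_text = '<div><a href="/home">Home</a><br /><img src="a/b.png"></div>'

# Pattern from the answer: rejects any tag that contains a slash anywhere.
no_slash = re.compile(r'<[^>/]+>')
only_slash_free = no_slash.findall(html_text)
print(only_slash_free)  # ['<div>']

# Possible alternative: a letter must follow <, which rules out </tag>
# while still allowing slashes later in the tag.
starts_with_letter = re.compile(r'<[A-Za-z][^>]*>')
opening = starts_with_letter.findall(html_text)
print(opening)  # ['<div>', '<a href="/home">', '<br />', '<img src="a/b.png">']
```

The alternative still inherits the other limits of [^>]*, such as stopping early at a > inside a quoted attribute value.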
