Regex for Beginners — From Basics to Real Patterns

How regular expression pattern matching works

The first time you see something like ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$, the natural reaction is "what on earth is that?" It looks like someone headbutted a keyboard. But once you learn the syntax, regex changes how you work with text entirely.

What Is Regex?

A regular expression (regex) is a way to describe patterns in strings. You can match exact text like "abc", or define patterns like "three digits followed by a hyphen followed by four digits."

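Nearly every programming language supports regex: JavaScript, Python, Java, Go, C#. The core syntax is largely the same across them, though each engine has its own flavor and some features are missing in places (Go's regexp package, for example, has no lookahead, lookbehind, or backreferences). Learn the basics once and most of it carries over.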

Core Syntax

Literal Matching

The simplest regex is just the string you're looking for.

hello matches "hello"
2026 matches "2026"

Metacharacters

Characters with special meaning:

Character	Meaning	Example
`.`	Any single character except a newline	`a.c` matches "abc", "a1c", "a c"
`\d`	Digit (0-9)	`\d\d` matches "42", "01"
`\w`	Word character (a-z, A-Z, 0-9, _)	`\w+` matches "hello", "test_1"
`\s`	Whitespace (space, tab, newline)	`a\sb` matches "a b"
`^`	Start of string (or of each line with the `m` flag)	`^Hello` matches strings starting with "Hello"
`$`	End of string (or of each line with the `m` flag)	`end$` matches strings ending with "end"
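One way to check the table is to run each pattern through `test` in JavaScript. This is a minimal sketch; the sample strings and the name `metacharChecks` are made up for illustration.

```javascript
// Each entry: [pattern, input, expected result of pattern.test(input)]
const metacharChecks = [
  [/a.c/, "a1c", true],            // . matches any single character...
  [/a.c/, "a\nc", false],          // ...but not a newline (no s flag here)
  [/\d\d/, "room 42", true],       // two consecutive digits
  [/\w+/, "test_1", true],         // letters, digits, underscore
  [/a\sb/, "a\tb", true],          // \s also matches a tab
  [/^Hello/, "Hello world", true],
  [/^Hello/, "Say Hello", false],  // ^ anchors to the start
  [/end$/, "the end", true],
  [/end$/, "endless", false],      // $ anchors to the end
];

for (const [pattern, input, expected] of metacharChecks) {
  const actual = pattern.test(input);
  const status = actual === expected ? "ok" : "UNEXPECTED";
  console.log(pattern.source, JSON.stringify(input), actual, status);
}
```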

Quantifiers

Specify how many times a character repeats:

Quantifier	Meaning
`*`	Zero or more
`+`	One or more
`?`	Zero or one
`{3}`	Exactly 3
`{2,5}`	2 to 5
`{3,}`	3 or more

\d+ means "one or more digits" — matches "1", "42", "12345".
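The quantifiers are easier to compare when the pattern is anchored at both ends, so the quantifier has to account for the whole string. A small sketch with invented sample inputs:

```javascript
// Each entry: [pattern, input, expected result of pattern.test(input)]
const quantifierChecks = [
  [/^ab*$/, "a", true],            // * allows zero b's
  [/^ab+$/, "a", false],           // + needs at least one b
  [/^colou?r$/, "color", true],    // ? makes the u optional
  [/^colou?r$/, "colour", true],
  [/^\d{3}$/, "123", true],        // exactly three digits
  [/^\d{3}$/, "1234", false],
  [/^\d{2,5}$/, "123456", false],  // six digits is over the upper bound
  [/^\d{3,}$/, "123456", true],    // three or more
];

for (const [pattern, input, expected] of quantifierChecks) {
  console.log(pattern.source, JSON.stringify(input), pattern.test(input) === expected);
}
```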

Character Classes

Square brackets [] define a set of characters to match:

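[aeiou] matches one vowel
[0-9] matches one digit (same as \d in JavaScript; in Python 3, \d also matches other Unicode digits unless re.ASCII is set)
[a-zA-Z] matches one letter, upper or lower case
[^0-9] matches any character that's not a digit (^ as the first character inside brackets means "exclude")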
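Character classes combine naturally with `match` and `replace`. The sketch below uses an invented sample string, plus the `g` flag so that every match is returned rather than only the first.

```javascript
const text = "Order 66 shipped";

// Every lowercase vowel in the text.
const vowels = text.match(/[aeiou]/g);

// Every run of characters that are not digits.
const nonDigitRuns = text.match(/[^0-9]+/g);

// Remove everything that is not a letter.
const lettersOnly = text.replace(/[^a-zA-Z]/g, "");

console.log(vowels);        // [ 'e', 'i', 'e' ]
console.log(nonDigitRuns);  // [ 'Order ', ' shipped' ]
console.log(lettersOnly);   // Ordershipped
```

Note that the capital "O" is not reported as a vowel, because `[aeiou]` lists only lowercase letters.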

Groups and Capturing

Parentheses () group patterns together:

(ab)+ matches "ab", "abab", "ababab"
(\d{3})-(\d{4}) — on "555-1234", captures "555" and "1234" separately

Captured groups can be extracted in code. Useful for pulling specific parts out of a match, like the area code from a phone number.
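In JavaScript, the array returned by `match` holds the whole match at index 0 and each captured group after it. A minimal sketch; the sample sentence and variable names are assumptions for illustration.

```javascript
const phonePattern = /(\d{3})-(\d{4})/;
const match = "Call 555-1234 today".match(phonePattern);

let whole = null;
let firstGroup = null;
let secondGroup = null;

// match is null when the pattern is not found, so check before unpacking.
if (match !== null) {
  [whole, firstGroup, secondGroup] = match;
}

console.log(whole);        // 555-1234
console.log(firstGroup);   // 555
console.log(secondGroup);  // 1234
```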

OR Operator

The pipe | matches one of several alternatives:

cat|dog matches "cat" or "dog"
(png|jpg|gif) matches image extensions
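One thing that tends to surprise people is how far the pipe reaches: it splits the entire pattern unless parentheses limit it. The sketch below contrasts the two readings; the names `loose`, `strict`, and `imageFile` are illustrative only.

```javascript
const loose = /^cat|dog$/;     // reads as: (^cat) OR (dog$)
const strict = /^(cat|dog)$/;  // reads as: the whole string is "cat" or "dog"

console.log(loose.test("catalog"));   // true, because it starts with "cat"
console.log(strict.test("catalog"));  // false
console.log(strict.test("dog"));      // true

// The image-extension alternatives, anchored to the end of a file name.
const imageFile = /\.(png|jpg|gif)$/i;
console.log(imageFile.test("photo.JPG"));  // true (i flag ignores case)
console.log(imageFile.test("notes.txt"));  // false
```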

Common Patterns

Email (Basic Validation)

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Perfect email validation via regex is basically impossible — the RFC 5322 spec is absurdly complex. This pattern catches most normal email formats, which is good enough for client-side validation.
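To see where this pattern draws the line, it helps to run it over a few addresses. The samples below are invented; the last one shows a malformed domain that the pattern still accepts, which is why it should be treated as a first filter rather than proof that an address is valid.

```javascript
const EMAIL_REGEX = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

const emailSamples = [
  "jane.doe@example.com",       // accepted
  "dev+test@mail.example.org",  // accepted: + is allowed before the @
  "no-at-sign.example.com",     // rejected: no @
  "user@localhost",             // rejected: no dot after the @
  "a@b..com",                   // accepted, even though the domain is malformed
];

const emailResults = emailSamples.map((sample) => EMAIL_REGEX.test(sample));
console.log(emailResults);  // [ true, true, false, false, true ]
```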

URL

https?://[^\s]+

Matches strings starting with "http://" or "https://" followed by non-whitespace characters. Not a thorough URL validator, but it works for extracting URLs from text.
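A rough sketch of URL extraction in JavaScript follows. The sample message is made up, and the trailing-punctuation cleanup is an extra step assumed here, since the pattern happily swallows a sentence-ending period.

```javascript
const URL_REGEX = /https?:\/\/[^\s]+/g;

const message =
  "Docs: https://example.com/guide and mirror at http://example.org/a?b=1.";

// match returns null when nothing is found, so fall back to an empty array.
const urls = message.match(URL_REGEX) ?? [];
console.log(urls);
// [ 'https://example.com/guide', 'http://example.org/a?b=1.' ]

// Trim punctuation that most likely belongs to the sentence, not the URL.
const cleanedUrls = urls.map((url) => url.replace(/[.,;:!?)]+$/, ""));
console.log(cleanedUrls);
// [ 'https://example.com/guide', 'http://example.org/a?b=1' ]
```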

Date (YYYY-MM-DD)

\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])
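This pattern checks the shape of a date (months 01 to 12, days 01 to 31) but knows nothing about calendars, so "2026-02-31" passes. One possible way to close that gap is to let `Date` do the calendar arithmetic. In the sketch below, the `^` and `$` anchors, the extra group around the year, and the helper name `isRealDate` are additions made for illustration.

```javascript
const DATE_REGEX = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

function isRealDate(text) {
  const m = DATE_REGEX.exec(text);
  if (m === null) return false;

  const [year, month, day] = m.slice(1).map(Number);

  // Date.UTC rolls impossible days over (Feb 31 becomes Mar 3),
  // so build the date and compare its parts with the input.
  // Caveat: Date.UTC maps years 0-99 to 1900-1999, so those years fail here.
  const d = new Date(Date.UTC(year, month - 1, day));
  return (
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day
  );
}

console.log(DATE_REGEX.test("2026-02-31"));  // true: the shape is fine
console.log(isRealDate("2026-02-31"));       // false: February has no 31st
console.log(isRealDate("2024-02-29"));       // true: 2024 is a leap year
console.log(isRealDate("2026-13-01"));       // false: rejected by the regex
```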

Strip HTML Tags

<[^>]+>

Using regex to parse HTML is generally a bad idea — nested tags and > characters inside attributes break it. But for simple tag stripping, it gets the job done.
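Both the simple case and the failure mode are easy to reproduce. In this sketch the function name `stripTags` and the sample markup are made up for illustration.

```javascript
function stripTags(html) {
  // Remove anything that looks like <...>, everywhere in the string.
  return html.replace(/<[^>]+>/g, "");
}

const simple = stripTags("<p>Hello <b>world</b></p>");
console.log(simple);  // Hello world

// A > inside an attribute value ends the "tag" too early.
const broken = stripTags('<a title="a > b">link</a>');
console.log(JSON.stringify(broken));  // " b\">link"
```

For anything beyond trusted, simple markup, an HTML parser is the safer tool.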

US Phone Number

^(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$

Handles formats like "(555) 123-4567", "555-123-4567", "+1 555 123 4567".
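A quick check of the formats above, plus a few strings the pattern should reject. The extra samples are invented. The last line shows a quirk worth knowing: each parenthesis is optional on its own, so an unbalanced one still passes.

```javascript
const PHONE_REGEX = /^(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

const acceptedPhones = [
  "(555) 123-4567",
  "555-123-4567",
  "+1 555 123 4567",
  "5551234567",
];

const rejectedPhones = [
  "555-1234",          // too few digits
  "123-456-78901",     // too many digits
  "+44 20 7946 0958",  // not a +1 number
];

for (const phone of acceptedPhones) console.log(phone, PHONE_REGEX.test(phone));  // all true
for (const phone of rejectedPhones) console.log(phone, PHONE_REGEX.test(phone));  // all false

// Unbalanced parenthesis, still accepted:
console.log(PHONE_REGEX.test("(555 123-4567"));  // true
```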

Flags

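Flags are options that change how a pattern is applied. The letters below are the JavaScript spellings, appended after the pattern; Python exposes the same modes as re.IGNORECASE, re.MULTILINE, and re.DOTALL, and has no g flag because re.findall and re.finditer already return every match: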

Flag	Meaning
`g`	Global matching (find all matches, not just the first)
`i`	Case insensitive
`m`	Multiline mode (`^` and `$` match start/end of each line)
`s`	Dotall mode (`.` matches newlines too)

In JavaScript: /pattern/flags — for example, /hello/gi
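The effect of each flag shows up clearly when the same two-line string is matched with and without it. A minimal sketch with an invented sample string:

```javascript
const twoLines = "Hello world\nhello again";

// i only: case-insensitive, but stops at the first match.
const firstOnly = twoLines.match(/hello/i)[0];      // "Hello"

// g + i: every match, ignoring case.
const allHellos = twoLines.match(/hello/gi);        // ["Hello", "hello"]

// Without m, ^ matches only at the very start of the string.
const withoutM = twoLines.match(/^hello/gi);        // ["Hello"]

// With m, ^ also matches right after each newline.
const withM = twoLines.match(/^hello/gim);          // ["Hello", "hello"]

// Without s, the dot refuses to cross the newline; with s, it does.
const dotWithoutS = /world.hello/.test(twoLines);   // false
const dotWithS = /world.hello/s.test(twoLines);     // true

console.log(firstOnly, allHellos, withoutM, withM, dotWithoutS, dotWithS);
```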

Greedy vs. Lazy Matching

By default, quantifiers are greedy — they match as much as possible.

Applying <.+> to <div>hello</div> matches the entire string <div>hello</div>. The .+ gobbles everything it can.

Add ? after the quantifier for lazy matching. <.+?> matches just <div>. It takes as little as possible.

This distinction is critical when working with HTML-like content.
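The difference can be reproduced in a few lines. The negated character class at the end is a common alternative to the lazy quantifier, shown here as one option rather than the only fix.

```javascript
const html = "<div>hello</div>";

const greedy = html.match(/<.+>/)[0];
console.log(greedy);   // <div>hello</div>

const lazy = html.match(/<.+?>/)[0];
console.log(lazy);     // <div>

// With the g flag, the lazy version finds each tag separately.
const lazyAll = html.match(/<.+?>/g);
console.log(lazyAll);  // [ '<div>', '</div>' ]

// [^>]+ cannot run past a closing bracket, so it gives the same result here.
const negatedAll = html.match(/<[^>]+>/g);
console.log(negatedAll);  // [ '<div>', '</div>' ]
```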

Lookahead and Lookbehind

Match based on what comes before or after, without including it in the result:

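\d+(?=px) matches digits followed by "px". In "16px", it matches "16"
(?<=\$)\d+ matches digits preceded by "$". In "$50", it matches "50"
\d+(?!px) matches digits not followed by "px". Beware backtracking: in "16px" it still matches "1", because "1" is followed by "6", not "px"
(?<!\$)\d+ matches digits not preceded by "$". Likewise, in "$50" it still matches "0", because "0" is preceded by "5", not "$"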

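Lookahead works in both Python and JavaScript. Lookbehind needs ES2018+ in JavaScript, and Python's re module only accepts fixed-width lookbehind patterns.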
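The negative forms are easy to get subtly wrong, because the engine is free to match a shorter run of digits. The sketch below shows the positive forms first, then the pitfall and one possible tightening that also forbids a neighboring digit. The sample strings are invented.

```javascript
const css = "margin: 16px; line-height: 2em;";

const pxValues = css.match(/\d+(?=px)/g);
console.log(pxValues);  // [ '16' ]

const price = "Total: $50".match(/(?<=\$)\d+/)[0];
console.log(price);     // 50

// Pitfall: "1" is followed by "6", not "px", so the engine settles for "1".
const naiveNotPx = "16px".match(/\d+(?!px)/)[0];
console.log(naiveNotPx);  // 1

// Tighter: the digit run must not be followed by another digit or by "px".
const notPx = css.match(/\d+(?!\d|px)/g);
console.log(notPx);       // [ '2' ]

// Same idea for lookbehind: forbid "$" and digits before the run.
const notPrice = "$50 and 20".match(/(?<![$\d])\d+/g);
console.log(notPrice);    // [ '20' ]
```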

Performance — Watch Out for ReDoS

Nested quantifiers like (a+)+ can become exponentially slow on certain inputs. This is called ReDoS (Regular Expression Denial of Service).

Take the pattern (a+)+$ and feed it "aaaaaaaaaaaaaaaaab". The regex engine has to backtrack hundreds of thousands of times before confirming a non-match. Execution time grows exponentially with string length.

On a Node.js server, ReDoS can freeze the event loop and take down the entire service. Be very careful when applying regex to user-supplied input.

Safer pattern guidelines:

Avoid nested quantifiers like (a+)+
Use bounded quantifiers {1,100} instead of unbounded + or * where possible
Validate patterns with libraries like safe-regex
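As a sketch of the first guideline: the nested pattern `(a+)+$` accepts exactly the same strings as the flat `a+$`, which avoids the exponential blowup. The length cap is an extra precaution assumed here; `MAX_INPUT_LENGTH` and `safeTest` are hypothetical names, and the limit of 100 is arbitrary.

```javascript
const risky = /(a+)+$/;  // nested quantifier: exponential backtracking on a near-miss
const flat = /a+$/;      // same set of matches, one quantifier

// Hypothetical guard: refuse to run a regex on oversized user input.
const MAX_INPUT_LENGTH = 100;

function safeTest(pattern, input) {
  if (input.length > MAX_INPUT_LENGTH) return false;
  return pattern.test(input);
}

// Short inputs only, so the risky pattern stays fast in this demo.
const redosSamples = ["aaa", "baaa", "aaab", "", "b"];
for (const sample of redosSamples) {
  console.log(JSON.stringify(sample), risky.test(sample), flat.test(sample));
}

// The flat pattern handles a long near-miss without trouble.
console.log(safeTest(flat, "a".repeat(50) + "b"));  // false
console.log(safeTest(flat, "a".repeat(500)));       // false: over the length cap
```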

Practical Tips

Complex regex becomes unreadable fast. You write it, come back three months later, and have no idea what it does. Give regex patterns meaningful variable names and add comments.

// US phone number matching
const PHONE_REGEX = /^(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

// Python supports verbose mode with comments:
// pattern = re.compile(r"""
//     ^(\+1)?       # Optional country code
//     [-.\s]?       # Optional separator
//     \(?\d{3}\)?   # Area code, optional parens
//     [-.\s]?       # Optional separator
//     \d{3}         # First three digits
//     [-.\s]?       # Optional separator
//     \d{4}$        # Last four digits
// """, re.VERBOSE)
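JavaScript has no verbose flag, but a similar effect can be approximated by writing the pattern as commented pieces and joining their sources. This is one possible workaround, not a standard idiom; `PHONE_PARTS` and `PHONE_REGEX_COMMENTED` are names invented for this sketch.

```javascript
const PHONE_PARTS = [
  /^(\+1)?/,      // Optional country code
  /[-.\s]?/,      // Optional separator
  /\(?\d{3}\)?/,  // Area code, optional parens
  /[-.\s]?/,      // Optional separator
  /\d{3}/,        // First three digits
  /[-.\s]?/,      // Optional separator
  /\d{4}$/,       // Last four digits
];

// Join the source text of each piece into a single pattern.
const PHONE_REGEX_COMMENTED = new RegExp(
  PHONE_PARTS.map((part) => part.source).join("")
);

console.log(PHONE_REGEX_COMMENTED.test("(555) 123-4567"));  // true
console.log(PHONE_REGEX_COMMENTED.test("555-1234"));        // false
```

Flags on the individual pieces are ignored by this approach; any flags needed would go in the second argument to `new RegExp`.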

And don't build regex in your head alone — always test as you go. Patterns that look correct often behave unexpectedly on real input. Use a regex tester to see matches in real time while writing your pattern. It's faster and far more reliable than guessing.