Regex for Beginners — From Basics to Real Patterns
Regular expression fundamentals, common patterns, flags, performance pitfalls, and practical examples for developers getting started with regex.

The first time you see something like ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$, the natural reaction is "what on earth is that?" It looks like someone headbutted a keyboard. But once you learn the syntax, regex changes how you work with text entirely.
What Is Regex?
A regular expression (regex) is a way to describe patterns in strings. You can match exact text like "abc", or define patterns like "three digits followed by a hyphen followed by four digits."
Nearly every programming language supports regex: JavaScript, Python, Java, Go, C#. The core syntax is nearly identical across them, although advanced features such as lookbehind vary by engine. Learn the basics once and they transfer almost everywhere.
Core Syntax
Literal Matching
The simplest regex is just the string you're looking for.
- hello matches "hello"
- 2026 matches "2026"
Metacharacters
Characters with special meaning:
| Character | Meaning | Example |
|---|---|---|
| . | Any single character except a newline | a.c matches "abc", "a1c", "a c" |
| \d | Digit (0-9) | \d\d matches "42", "01" |
| \w | Word character (a-z, A-Z, 0-9, _) | \w+ matches "hello", "test_1" |
| \s | Whitespace (space, tab, newline) | a\sb matches "a b" |
| ^ | Start of string (or of each line in multiline mode) | ^Hello matches text starting with "Hello" |
| $ | End of string (or of each line in multiline mode) | end$ matches text ending with "end" |
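
One way to check the table above is to run each pattern through `test()` in JavaScript. This is a minimal sketch; the sample strings and the names `metacharChecks` and `metacharResults` are made up for illustration.

```javascript
// Each entry: [pattern, input, expected result of pattern.test(input)].
const metacharChecks = [
  [/a.c/, "a1c", true],            // . matches any single character
  [/a.c/, "ac", false],            // . must consume exactly one character
  [/\d\d/, "42", true],            // two digits in a row
  [/\w+/, "test_1", true],         // letters, digits, underscore
  [/a\sb/, "a b", true],           // whitespace between a and b
  [/^Hello/, "Hello world", true],
  [/^Hello/, "Say Hello", false],  // ^ anchors the match to the start
  [/end$/, "the end", true],
  [/end$/, "endless", false],      // $ anchors the match to the end
];

const metacharResults = metacharChecks.map(
  ([pattern, input]) => pattern.test(input)
);
```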
Quantifiers
Specify how many times a character repeats:
| Quantifier | Meaning |
|---|---|
| * | Zero or more |
| + | One or more |
| ? | Zero or one |
| {3} | Exactly 3 |
| {2,5} | 2 to 5 |
| {3,} | 3 or more |
\d+ means "one or more digits" — matches "1", "42", "12345".
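The quantifiers are easiest to see when the pattern is anchored with ^ and $, so the whole string has to fit. A small sketch in JavaScript follows; the "colou?r" example and the variable names are illustrative assumptions, not from a particular library.

```javascript
// Anchors force the quantifier to account for the entire string.
const exactlyThree = /^\d{3}$/;
const twoToFive = /^\d{2,5}$/;
const optionalU = /^colou?r$/;

const quantifierResults = {
  three: exactlyThree.test("123"),   // true
  four: exactlyThree.test("1234"),   // false: one digit too many
  five: twoToFive.test("12345"),     // true
  six: twoToFive.test("123456"),     // false: over the upper bound
  color: optionalU.test("color"),    // true: zero "u"
  colour: optionalU.test("colour"),  // true: one "u"
  starEmpty: /^\d*$/.test(""),       // true: * accepts zero digits
  plusEmpty: /^\d+$/.test(""),       // false: + needs at least one digit
};
```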
Character Classes
Square brackets [] define a set of characters to match:
- [aeiou]: one vowel
- [0-9]: one digit (same as \d)
- [a-zA-Z]: one letter, upper or lower
- [^0-9]: any character that's not a digit (^ inside brackets means "exclude")
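As a rough illustration of character classes in JavaScript, the sketch below collects every vowel from a string and uses a negated class to strip everything that is not a digit. The input strings are made up.

```javascript
// The g flag returns every match instead of only the first one.
const vowels = "regular expression".match(/[aeiou]/g);

// A negated class removes everything except digits,
// which is handy for normalizing formatted numbers.
const digitsOnly = "(555) 123-4567".replace(/[^0-9]/g, "");
```

Here `vowels` holds seven single-letter matches and `digitsOnly` is "5551234567".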
Groups and Capturing
Parentheses () group patterns together:
- (ab)+ matches "ab", "abab", "ababab"
- (\d{3})-(\d{4}): on "555-1234", captures "555" and "1234" separately
Captured groups can be extracted in code. Useful for pulling specific parts out of a match, like the area code from a phone number.
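In JavaScript, extracting captured groups might look like the sketch below. `splitPhone` is a hypothetical helper name; the important detail is that `match()` returns null when nothing matches, so the result needs a check before indexing.

```javascript
// Returns the captured parts, or null when the text has no match.
function splitPhone(text) {
  const match = text.match(/(\d{3})-(\d{4})/);
  if (match === null) return null;
  // match[0] is the full match; match[1] and match[2] are the groups.
  return { full: match[0], prefix: match[1], line: match[2] };
}

const parsed = splitPhone("Call 555-1234 today");
const missing = splitPhone("no number here");
```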
OR Operator
The pipe | matches one of several alternatives:
- cat|dog matches "cat" or "dog"
- (png|jpg|gif) matches image extensions
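One thing worth knowing about the pipe is that it binds more loosely than anything else in the pattern, so anchors apply to only one side unless the alternatives are grouped. A short JavaScript sketch with illustrative names:

```javascript
// Without a group, this reads as (^cat) OR (dog$).
const loose = /^cat|dog$/;
// With a group, the whole string must be "cat" or "dog".
const strict = /^(cat|dog)$/;

const alternationResults = {
  looseCatalog: loose.test("catalog"),    // true: starts with "cat"
  looseHotdog: loose.test("hotdog"),      // true: ends with "dog"
  strictCatalog: strict.test("catalog"),  // false
  strictDog: strict.test("dog"),          // true
};

// Grouped alternatives combined with a literal dot and an end anchor.
const imageFile = /\.(png|jpg|gif)$/i;
const isImage = imageFile.test("photo.JPG");
const isNotImage = imageFile.test("photo.jpg.exe");
```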
Common Patterns
Email (Basic Validation)
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Perfect email validation via regex is basically impossible — the RFC 5322 spec is absurdly complex. This pattern catches most normal email formats, which is good enough for client-side validation.
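A minimal sketch of using this pattern for client-side checks is shown below. `looksLikeEmail` is a hypothetical helper name, and trimming the input first is an assumption about how the form data arrives. The last sample shows the kind of malformed address the pattern still accepts.

```javascript
const EMAIL_REGEX = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Returns true when the input has a plausible email shape.
function looksLikeEmail(input) {
  return typeof input === "string" && EMAIL_REGEX.test(input.trim());
}

const emailChecks = {
  plain: looksLikeEmail("user@example.com"),              // true
  tagged: looksLikeEmail("first.last+tag@sub.example.co"), // true
  noAt: looksLikeEmail("no-at-sign.example.com"),         // false
  noTld: looksLikeEmail("user@localhost"),                // false: no dot in the domain
  doubleDot: looksLikeEmail("a@b..com"),                  // true: the pattern is permissive
};
```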
URL
https?://[^\s]+
Matches strings starting with "http://" or "https://" followed by non-whitespace characters. Not a thorough URL validator, but it works for extracting URLs from text.
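For extraction, the pattern can be combined with the g flag. The sketch below also shows a common side effect: trailing punctuation gets swallowed because it is not whitespace. The cleanup step and the sample text are illustrative assumptions.

```javascript
const URL_REGEX = /https?:\/\/[^\s]+/g;

const sampleText =
  "Docs live at https://example.com/docs, with a mirror at http://mirror.example.org/a?b=1.";

// match() with the g flag returns all matches, or null if there are none.
const rawUrls = sampleText.match(URL_REGEX) || [];

// Trim sentence punctuation that was captured at the end of each URL.
const cleanUrls = rawUrls.map((url) => url.replace(/[.,;:!?)]+$/, ""));
```

`rawUrls` keeps the trailing comma and period; `cleanUrls` holds "https://example.com/docs" and "http://mirror.example.org/a?b=1".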
Date (YYYY-MM-DD)
\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])
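This pattern checks the shape of a date, not whether the day exists in that month, so "2026-02-31" passes. A possible follow-up check in JavaScript is sketched below. The ^ and $ anchors, the capture group around the year, and the helper name `isRealDate` are additions for illustration.

```javascript
const DATE_REGEX = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// Returns true only for dates that exist on the calendar.
function isRealDate(input) {
  const match = DATE_REGEX.exec(input);
  if (match === null) return false;
  const [year, month, day] = match.slice(1).map(Number);
  // Date.UTC rolls an invalid day over into the next month,
  // so compare the parts back against the input.
  // Years 0000-0099 are mapped to 1900-1999 by Date.UTC and fail this check.
  const date = new Date(Date.UTC(year, month - 1, day));
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

const shapeOnly = DATE_REGEX.test("2026-02-31"); // true: regex alone accepts it
const feb31 = isRealDate("2026-02-31");          // false
const leapDay = isRealDate("2024-02-29");        // true
```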
Strip HTML Tags
<[^>]+>
Using regex to parse HTML is generally a bad idea — nested tags and > characters inside attributes break it. But for simple tag stripping, it gets the job done.
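Both the simple case and the failure mode can be seen in a few lines of JavaScript. `stripTags` is a hypothetical helper name and the inputs are made up.

```javascript
// Removes anything that looks like <...> from the input.
const stripTags = (html) => html.replace(/<[^>]+>/g, "");

// Simple markup: works as intended.
const simple = stripTags("<p>Hello <b>world</b></p>");

// A > inside an attribute ends the match early and leaves debris behind.
const broken = stripTags('<a title="a > b">link</a>');
```

`simple` is "Hello world", while `broken` comes out as ` b">link` rather than "link".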
US Phone Number
^(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$
Handles formats like "(555) 123-4567", "555-123-4567", "+1 555 123 4567".
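A sketch of validating and then normalizing a number with this pattern follows. `normalizeUsPhone` and the "+1XXXXXXXXXX" output format are assumptions for illustration. Note that every piece of punctuation is optional on its own, so an unbalanced parenthesis still passes.

```javascript
const US_PHONE = /^(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

// Returns the number as +1 followed by ten digits, or null if it does not match.
function normalizeUsPhone(input) {
  if (!US_PHONE.test(input)) return null;
  const digits = input.replace(/\D/g, "");
  // Keep the last ten digits; a leading country code 1 is dropped and re-added.
  return "+1" + digits.slice(-10);
}

const phoneResults = [
  normalizeUsPhone("(555) 123-4567"),  // "+15551234567"
  normalizeUsPhone("555-123-4567"),    // "+15551234567"
  normalizeUsPhone("+1 555 123 4567"), // "+15551234567"
  normalizeUsPhone("555-1234"),        // null: too short
  normalizeUsPhone("(555 123-4567"),   // "+15551234567": unbalanced paren is accepted
];
```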
Flags
Options appended after the pattern:
| Flag | Meaning |
|---|---|
| g | Global matching (find all matches, not just the first) |
| i | Case insensitive |
| m | Multiline mode (^ and $ match start/end of each line) |
| s | Dotall mode (. matches newlines too) |
In JavaScript: /pattern/flags — for example, /hello/gi
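The effect of each flag is easiest to see by running the same input with and without it. A small JavaScript sketch with a made-up two-line string (the s flag needs ES2018 or later):

```javascript
const text = "Hello\nhello world";

// No flags: case-sensitive, first match only.
const first = text.match(/hello/);

// g + i: every match, ignoring case.
const allAnyCase = text.match(/hello/gi);

// Without m, ^ only matches at the very start of the string.
const withoutM = text.match(/^hello/gi);
// With m, ^ also matches after each newline.
const withM = text.match(/^hello/gim);

// Without s, the dot refuses to match the newline.
const dotWithoutS = /Hello.hello/.test(text);
const dotWithS = /Hello.hello/s.test(text);
```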
Greedy vs. Lazy Matching
By default, quantifiers are greedy — they match as much as possible.
Applying <.+> to <div>hello</div> matches the entire string <div>hello</div>. The .+ gobbles everything it can.
Add ? after the quantifier for lazy matching. <.+?> matches just <div>. It takes as little as possible.
This distinction is critical when working with HTML-like content.
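The same comparison in JavaScript, as a quick sketch. The third variant uses the negated class from the tag-stripping pattern, which is a common alternative to a lazy quantifier.

```javascript
const html = "<div>hello</div>";

// Greedy: .+ runs to the last > in the string.
const greedy = html.match(/<.+>/)[0];

// Lazy: .+? stops at the first > it can.
const lazy = html.match(/<.+?>/)[0];

// With the g flag, the lazy version finds each tag separately.
const lazyAll = html.match(/<.+?>/g);

// A negated class gives the same tags without relying on laziness.
const negatedAll = html.match(/<[^>]+>/g);
```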
Lookahead and Lookbehind
Match based on what comes before or after, without including it in the result:
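- \d+(?=px): digits followed by "px". In "16px", matches "16"
- (?<=\$)\d+: digits preceded by "$". In "$50", matches "50"
- \d+(?!px): digits not followed by "px". Because the engine backtracks, this still matches "1" in "16px"
- (?<!\$)\d+: digits not preceded by "$". For the same reason, this still matches "0" in "$50"
Lookahead works in both Python and JavaScript. Lookbehind requires ES2018+ in JavaScript, and Python only allows fixed-width lookbehind patterns. Go's standard regexp package supports neither.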
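The sketch below runs the four lookaround forms in JavaScript (ES2018+ for lookbehind). It also shows that the negative forms can match part of a number through backtracking, and one possible way to tighten them by also excluding a neighboring digit. The tightened patterns are an illustrative suggestion.

```javascript
const beforePx = "16px".match(/\d+(?=px)/)[0];     // "16"
const afterDollar = "$50".match(/(?<=\$)\d+/)[0];  // "50"

// Negative lookaround can still match a fragment of the number.
const partialAhead = "16px".match(/\d+(?!px)/)[0];   // "1"
const partialBehind = "$50".match(/(?<!\$)\d+/)[0];  // "0"

// Tightened: the number must not be followed by "px" or another digit.
const notPx = "16px 20em".match(/\d+(?!px|\d)/g);
// Tightened: the number must not be preceded by "$" or another digit.
const notDollar = "$50 and 75".match(/(?<![$\d])\d+/g);
```

With the tightened patterns, `notPx` contains only "20" and `notDollar` contains only "75".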
Performance — Watch Out for ReDoS
Nested quantifiers like (a+)+ can become exponentially slow on certain inputs. This is called ReDoS (Regular Expression Denial of Service).
Take the pattern (a+)+$ and feed it "aaaaaaaaaaaaaaaaab". The regex engine has to backtrack hundreds of thousands of times before confirming a non-match. Execution time grows exponentially with string length.
On a Node.js server, ReDoS can freeze the event loop and take down the entire service. Be very careful when applying regex to user-supplied input.
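The growth can be observed with a rough timing sketch like the one below. It assumes a runtime where `performance.now()` is available globally (recent Node.js versions and browsers). The input sizes are kept small on purpose; actual timings depend on the engine and machine, so none are quoted here.

```javascript
// Times a single test() call and reports the result with elapsed milliseconds.
function timeMatch(pattern, input) {
  const start = performance.now();
  const matched = pattern.test(input);
  return { matched, elapsedMs: performance.now() - start };
}

const nestedPattern = /(a+)+$/;

// Each input is n copies of "a" followed by a "b" that forces a non-match.
const timings = [10, 15, 20].map((n) => {
  const input = "a".repeat(n) + "b";
  return { n, ...timeMatch(nestedPattern, input) };
});
```

Printing `timings` with `console.table` should show the elapsed time climbing steeply as n increases, even though every result is a non-match.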
Safer pattern guidelines:
- Avoid nested quantifiers like (a+)+
- Use bounded quantifiers such as {1,100} instead of unbounded + or * where possible
- Validate patterns with libraries like safe-regex
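The first two guidelines could be applied roughly as follows. This is a sketch under assumptions: `safeTest`, `MAX_INPUT_LENGTH`, and the 256-character limit are hypothetical, and the length cap is an extra precaution that goes beyond the list above.

```javascript
// Hypothetical upper bound on user input handed to any regex.
const MAX_INPUT_LENGTH = 256;

// Flattened equivalent of ^(a+)+$ with no nested quantifier.
const FLAT_PATTERN = /^a+$/;

// Bounded quantifier: at most 100 word characters.
const BOUNDED_PATTERN = /^\w{1,100}$/;

// Rejects oversized or non-string input before the regex ever runs.
function safeTest(pattern, input, maxLength = MAX_INPUT_LENGTH) {
  if (typeof input !== "string" || input.length > maxLength) return false;
  return pattern.test(input);
}

const shortOk = safeTest(FLAT_PATTERN, "aaaa");                    // true
const hugeRejected = safeTest(FLAT_PATTERN, "a".repeat(50000) + "b"); // false: too long
const boundedTooLong = BOUNDED_PATTERN.test("x".repeat(101));      // false
```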
Practical Tips
Complex regex becomes unreadable fast. You write it, come back three months later, and have no idea what it does. Give regex patterns meaningful variable names and add comments.
// US phone number matching
const PHONE_REGEX = /^(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;
// Python supports verbose mode with comments:
// pattern = re.compile(r"""
// ^(\+1)? # Optional country code
// [-.\s]? # Optional separator
// \(?\d{3}\)? # Area code, optional parens
// [-.\s]? # Optional separator
// \d{3} # First three digits
// [-.\s]? # Optional separator
// \d{4}$ # Last four digits
// """, re.VERBOSE)
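JavaScript has no verbose mode, but a similar effect can be approximated by assembling the pattern from commented pieces. The sketch below is one possible approach, using `String.raw` so backslashes do not need doubling; the name `PHONE_REGEX_COMMENTED` is made up.

```javascript
// Each fragment is one part of the phone pattern, joined into a single RegExp.
const PHONE_REGEX_COMMENTED = new RegExp(
  [
    String.raw`^(\+1)?`,      // Optional country code
    String.raw`[-.\s]?`,      // Optional separator
    String.raw`\(?\d{3}\)?`,  // Area code, optional parens
    String.raw`[-.\s]?`,      // Optional separator
    String.raw`\d{3}`,        // First three digits
    String.raw`[-.\s]?`,      // Optional separator
    String.raw`\d{4}$`,       // Last four digits
  ].join("")
);

const commentedMatches = PHONE_REGEX_COMMENTED.test("(555) 123-4567");
const commentedRejects = PHONE_REGEX_COMMENTED.test("555-1234");
```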
And don't build regex in your head alone — always test as you go. Patterns that look correct often behave unexpectedly on real input. Use a regex tester to see matches in real time while writing your pattern. It's faster and far more reliable than guessing.