About TestPattern
The open pattern library for Data Loss Prevention.
TestPattern is a free, community-curated collection of regex patterns, keyword lists, and classification rules for detecting sensitive information in documents and data streams. Think of it as Sigma for DLP.
Why this exists
If you've ever deployed DLP, you know the drill. You need a Medicare number regex. A credit card pattern. A passport detector. So you write it yourself, test it against real documents, discover the false positives, fix them, and hope you haven't missed an edge case.
The team at the next company over is doing the exact same thing. So is every other security team in the country. Nobody shares their work.
Which is strange, because every adjacent security domain figured out pattern sharing years ago:
- SIEM detection — Sigma (3,000+ rules)
- Malware signatures — YARA (thousands of rules)
- Network intrusion — Snort (tens of thousands of rules)
- Threat taxonomy — MITRE ATT&CK (216+ techniques)
- Secret detection — secrets-patterns-db (1,600+ patterns)
DLP had nothing. No shared patterns. No standard format. Every team started from zero.
TestPattern is here to fix that.
What you get
- 1,400+ detection patterns covering PII, PHI, financial data, government IDs, and credentials
- 15+ jurisdictions — Australia, US, EU, UK, Canada, Brazil, and more
- Curated collections for GDPR, HIPAA, PCI-DSS, Privacy Act, and other compliance frameworks
- One-click export to Microsoft Purview XML, GCP DLP JSON, AWS Macie JSON, YAML, or raw regex
- Honest quality metadata — every pattern has test cases, false-positive notes, and confidence ratings
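To make "test cases" and "false-positive notes" concrete, here is a minimal sketch in Python. It is an assumption-laden illustration, not TestPattern's actual schema or tooling. The record fields (`should_match`, `should_not_match`, `false_positive_notes`, `confidence`) and the `run_tests` helper are hypothetical. The card regex is deliberately simple. It is paired with a Luhn checksum, one common way to reject digit runs that merely look like card numbers.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Pattern:
    # Hypothetical record shape for illustration only; not TestPattern's real format.
    name: str
    regex: str
    confidence: str
    should_match: List[str] = field(default_factory=list)
    should_not_match: List[str] = field(default_factory=list)
    false_positive_notes: str = ""


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def run_tests(
    p: Pattern, validator: Optional[Callable[[str], bool]] = None
) -> List[Tuple[str, str]]:
    """Run a pattern's test cases; return (failure_kind, text) pairs."""
    compiled = re.compile(p.regex)
    failures = []
    for text in p.should_match:
        m = compiled.search(text)
        # A positive case fails if the regex misses it or the validator rejects it.
        if m is None or (validator is not None and not validator(m.group())):
            failures.append(("missed", text))
    for text in p.should_not_match:
        m = compiled.search(text)
        # A negative case fails if the regex hits and no validator filters it out.
        if m is not None and (validator is None or validator(m.group())):
            failures.append(("false positive", text))
    return failures


card = Pattern(
    name="Credit card number (16 digits, hypothetical example)",
    regex=r"\b(?:\d{4}[ -]?){3}\d{4}\b",
    confidence="medium",
    should_match=["Card: 4111 1111 1111 1111"],
    should_not_match=["Order ref 1234 5678 9012 3456"],
    false_positive_notes="Any 16-digit reference number matches the regex alone.",
)

print(run_tests(card))              # regex only: the order reference is flagged
print(run_tests(card, luhn_valid))  # with a checksum filter: no failures
```

Run without a validator, the order reference shows up as a false positive. That is exactly the kind of behaviour a false-positive note is meant to record before someone deploys the pattern.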
Who's behind this
TestPattern is a community project sponsored by Compl8, the compliance toolkit for Microsoft Purview.
It was started by compliance engineers who were tired of writing the same regex patterns from scratch at every engagement. Both the patterns and the project are open source.
Get in touch
- GitHub: testpatterndev
- Found a bug or have a pattern to suggest? Open an issue