Designing Sensitive Information Types

A detection pattern is only as good as the thinking behind it. This guide covers how to design SITs that actually work in production — and how to figure out which ones your organisation needs in the first place.

Anatomy of a SIT

Every sensitive information type has three layers:

  1. Primary pattern: what you're looking for. A regex, keyword list, or built-in function with checksum validation.
  2. Corroborative evidence: how sure you are. Keywords near the match that distinguish an arbitrary 10-digit number from a genuine Medicare number.
  3. Confidence tiers: what happens on a match. Different evidence combinations map to different confidence levels and policy actions.
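
The three layers can be sketched in a few lines of Python. This is an illustration only, not how Purview evaluates a SIT: the keyword list, the 300-character proximity window, and the tier names are assumptions made for the example, and the checksum uses the commonly documented Australian Medicare weights (1, 3, 7, 9, repeated) with the ninth digit as the check digit.

```python
import re
from dataclasses import dataclass

# Layer 1: primary pattern. Ten digits, optionally grouped 4-5-1.
PRIMARY = re.compile(r"\b(\d{4})[ -]?(\d{5})[ -]?(\d)\b")

# Layer 2: corroborative evidence. Hypothetical keyword list and window size.
KEYWORDS = ("medicare", "health card")
PROXIMITY = 300  # characters searched either side of the match (assumed)

# Commonly documented Medicare check-digit weights for the first 8 digits.
WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9)


def checksum_ok(digits: str) -> bool:
    """Return True if the 10-digit string passes the Medicare check digit."""
    if digits[0] not in "23456":  # first digit is expected to be 2-6
        return False
    total = sum(int(d) * w for d, w in zip(digits, WEIGHTS))
    return total % 10 == int(digits[8])


@dataclass
class Finding:
    value: str
    confidence: str  # "high", "medium", or "low" (illustrative tier names)


def detect(text: str) -> list[Finding]:
    findings = []
    lowered = text.lower()
    for m in PRIMARY.finditer(text):
        digits = "".join(m.groups())
        start = max(0, m.start() - PROXIMITY)
        window = lowered[start : m.end() + PROXIMITY]
        has_keyword = any(k in window for k in KEYWORDS)
        has_checksum = checksum_ok(digits)
        # Layer 3: confidence tiers. More evidence, higher confidence.
        if has_checksum and has_keyword:
            confidence = "high"
        elif has_checksum:
            confidence = "medium"
        else:
            confidence = "low"
        findings.append(Finding(digits, confidence))
    return findings


if __name__ == "__main__":
    for sample in (
        "Medicare card: 2123 45670 1",
        "Order ref 2123456701 shipped",
        "Tracking 1234567890",
    ):
        print(sample, "->", [f.confidence for f in detect(sample)])
```

A policy would then attach different actions to each tier, for example blocking on high and only auditing on low.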

Detection methods

Confidence levels

Design principles

Complex SITs

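Purview's XML schema supports composable logic through the <Any> element, which groups <Match> elements and is satisfied when the number of matching children falls within its minMatches and maxMatches bounds.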
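
As a rough model of how <Any> composes, the Python sketch below evaluates a group against the set of supporting elements that matched near the primary pattern. It is not Purview's implementation: the class names, the Keyword_* identifiers, and the treatment of an unset maxMatches as "no upper limit" are assumptions made for illustration. The exclusion idiom it shows (minMatches and maxMatches both 0) mirrors the form used in rule package XML.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Match:
    """Stands in for <Match idRef="...">: one keyword list or regex."""
    id_ref: str


@dataclass
class AnyGroup:
    """Stands in for <Any minMatches="..." maxMatches="...">."""
    children: list  # Match or nested AnyGroup nodes
    min_matches: int = 1
    max_matches: Optional[int] = None  # None is treated as no upper limit


Node = Union[Match, AnyGroup]


def satisfied(node: Node, matched_ids: set) -> bool:
    """Return True if the node holds for the given set of matched idRefs."""
    if isinstance(node, Match):
        return node.id_ref in matched_ids
    count = sum(1 for child in node.children if satisfied(child, matched_ids))
    if count < node.min_matches:
        return False
    return node.max_matches is None or count <= node.max_matches


def pattern_satisfied(nodes: list, matched_ids: set) -> bool:
    """Top-level elements of a pattern are combined with an implicit AND."""
    return all(satisfied(n, matched_ids) for n in nodes)


# Hypothetical rule: at least one supporting keyword list must match,
# and the test-data keyword list must NOT match.
rule = [
    AnyGroup([Match("Keyword_medicare"), Match("Keyword_health")], min_matches=1),
    AnyGroup([Match("Keyword_test_data")], min_matches=0, max_matches=0),
]

if __name__ == "__main__":
    print(pattern_satisfied(rule, {"Keyword_medicare"}))
    print(pattern_satisfied(rule, {"Keyword_medicare", "Keyword_test_data"}))
    print(pattern_satisfied(rule, set()))
```

Nesting one group inside another gives expressions such as "two of these three keyword lists, unless an exclusion term is present" without duplicating whole patterns.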

Defining sensitive data for your organisation

The hardest part of DLP isn't writing regex. It's figuring out what you need to detect in the first place.

The fastest way to build a comprehensive sensitive data inventory is a structured workshop. Key questions:

The Open Top 500 — a list of 500 sensitive information types across 25 categories — is available on the full page at https://testpattern.dev/design.