Date of birth
Detects date-of-birth references in documents across multiple date formats (numeric slash, dot, hyphen, written month, ISO). Requires birth-record keywords in proximity to distinguish DOB from invoice dates, event dates, and other temporal references. No Microsoft built-in SIT exists for unstructured DOB detection.
- Type
- regex
- Engine
- boost_regex
- Confidence
- medium
- Confidence justification
- Medium confidence: date formats appear in every business document. Birth-record keywords are essential to distinguish DOB from the thousands of other dates in any corpus.
- Jurisdictions
- global
- Regulations
- GDPR, CCPA, HIPAA, PIPEDA
- Frameworks
- ISO 27001, ISO 27701, NIST CSF
- Data categories
- pii
- Scope
- wide
- Risk rating
- 7
- Platform compatibility
- Purview: Compatible, GCP DLP: Compatible, Macie: Compatible, Zscaler: Compatible, Palo Alto: Compatible, Netskope: Unsupported
Pattern
\b(?:(?:0?[1-9]|[12][0-9]|3[01])[\/\-\.](?:0?[1-9]|1[012])[\/\-\.](?:19|20)\d{2}|(?:0?[1-9]|[12][0-9]|3[01])(?:st|nd|rd|th)?\s+(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s+(?:19|20)\d{2}|(?:19|20)\d{2}[\-\/](?:0[1-9]|1[012])[\-\/](?:0[1-9]|[12]\d|3[01]))\b
Corroborative evidence keywords
DOB, date of birth, birth date, born, born on, birthday, d.o.b, d.o.b., address, age, citizenship, city, email, ethnicity, fax, first name, full name, gender, given name, last name (+14 more)
Proximity: 300 characters
Should match
DOB: 15/03/1990— Numeric slash format with DOB labelDate of birth: 03/15/1990— US-style MM/DD/YYYY with labelBorn on 1st January 1985— Written month format with ordinalDate of birth: 25.12.2000— Dot-separated formatBirth date: 1990-03-15— ISO format
Should not match
Invoice date: 15/03/2024— Date in non-DOB financial contextDue date: 01/01/2025— Payment due dateunrelated generic text— No date pattern present
Known false positives
- Non-DOB dates such as invoice dates, event dates, document timestamps, and meeting dates. Mitigation: Require birth-record keywords (DOB, date of birth, born) in proximity. Without these keywords the pattern fires at low confidence only.
- Historical dates in news articles or academic papers that happen to match the date format. Mitigation: Corroborative evidence keywords distinguish personal records from historical references.
- Dates in financial documents (settlement dates, maturity dates, filing dates). Mitigation: Filter excludes common business date labels via TextMatchFilter on the matched context.
References
- https://learn.microsoft.com/en-us/answers/questions/2153006/microsoft-purview-custom-sit-date-of-birth
- https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/personal-information-what-is-it/what-is-personal-data/