Uniform Civil Number (Единен граждански номер, ЕГН)
Detects Bulgarian Uniform Civil Number (Единен граждански номер, ЕГН) patterns. This pattern is based on a Microsoft Purview built-in sensitive information type (SIT). Users already running Purview may prefer to enable the built-in SIT directly, or use this version as a starting point for customisation.
- Type: regex
- Engine: universal
- Confidence: medium
- Confidence justification: Medium confidence. The pattern constrains only length (exactly ten digits), so corroborative keywords are recommended to reduce false positives. Context-label evidence plus explicit exclusion of template and example text improves precision for high-risk identifiers.
- Detection quality: Verified
- Jurisdictions: eu, bg
- Regulations: BDSG, CNIL / LIL, GDPR
- Frameworks: ISO 27001, ISO 27701
- Data categories: pii, government-id
- Scope: narrow
- Risk rating: 9
- Platform compatibility: Purview: Compatible, GCP DLP: Compatible, Macie: Compatible, Zscaler: Compatible, Palo Alto: Compatible, Netskope: Compatible
Pattern
\b\d{10}\b
Corroborative evidence keywords
EGN, uniform civil number, граждански номер, ID number, identification, ID card, license, permit, registration, certificate
Proximity: 300 characters
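The keyword-and-proximity rule above can be sketched in Python. This is a minimal illustration, assuming case-insensitive whole-word keyword matching and a window of 300 characters on each side of the candidate; how a particular DLP engine measures proximity may differ. The function name `find_corroborated` is hypothetical.

```python
import re

# Pattern from this entry: exactly ten digits bounded by word boundaries.
EGN_PATTERN = re.compile(r"\b\d{10}\b")

# Corroborative keywords listed in this entry, matched as whole words, case-insensitively.
KEYWORDS = [
    "EGN", "uniform civil number", "граждански номер", "ID number",
    "identification", "ID card", "license", "permit", "registration",
    "certificate",
]
KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)
PROXIMITY = 300  # characters checked on each side of a candidate (assumed symmetric)


def find_corroborated(text: str) -> list[str]:
    """Return ten-digit candidates that have a keyword within PROXIMITY characters."""
    hits = []
    for m in EGN_PATTERN.finditer(text):
        # Slice a window around the candidate and look for any keyword in it.
        window = text[max(0, m.start() - PROXIMITY):m.end() + PROXIMITY]
        if KEYWORD_RE.search(window):
            hits.append(m.group())
    return hits


print(find_corroborated("Patient EGN: 7501011234, see file."))  # ['7501011234']
```

A bare ten-digit number with no keyword nearby, such as a phone number in running text, is dropped, which is the false-positive reduction the confidence justification relies on.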
Should match
- 7501011234: Bulgarian UCN (born 1975-01-01)
- 8812125678: Bulgarian UCN (born 1988-12-12)
- 9203034567: Bulgarian UCN (born 1992-03-03)
Should not match
- 123456789: Too few digits (9)
- 12345678901: Too many digits (11)
- sample template placeholder number 123456789: Template/sample context should be excluded even when numeric-like values appear
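A minimal sketch of the template/sample exclusion, checked against the match and non-match samples listed for this entry. The exclusion terms and the 50-character look-behind window are hypothetical choices for illustration; the entry does not publish its exclusion list.

```python
import re

# Ten-digit pattern from this entry.
EGN_PATTERN = re.compile(r"\b\d{10}\b")

# Hypothetical exclusion terms and window; the entry does not publish either.
EXCLUSION_RE = re.compile(
    r"\b(?:sample|template|placeholder|example|dummy)\b", re.IGNORECASE
)
EXCLUSION_WINDOW = 50  # characters inspected before each candidate (hypothetical)


def candidates_outside_templates(text: str) -> list[str]:
    """Return ten-digit candidates not preceded by template/sample wording."""
    kept = []
    for m in EGN_PATTERN.finditer(text):
        prefix = text[max(0, m.start() - EXCLUSION_WINDOW):m.start()]
        if not EXCLUSION_RE.search(prefix):
            kept.append(m.group())
    return kept


SHOULD_MATCH = ["7501011234", "8812125678", "9203034567"]
SHOULD_NOT_MATCH = [
    "123456789",
    "12345678901",
    "sample template placeholder number 123456789",
]

# Self-check against the samples in this entry.
for s in SHOULD_MATCH:
    assert candidates_outside_templates(s) == [s], s
for s in SHOULD_NOT_MATCH:
    assert candidates_outside_templates(s) == [], s
```

The template sample in this entry has only nine digits, so the length constraint already rejects it; a ten-digit variant such as `sample template placeholder number 1234567890` exercises the exclusion itself.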
Known false positives
- Ten-digit numeric sequences appear in phone numbers, account numbers, and other non-identity contexts. Mitigation: Require corroborative evidence keywords such as "EGN" or "граждански номер" within the proximity window. For additional confidence, validate the embedded birth date (first six digits) and the tenth check digit.
- In multiple languages, the keyword terminology also appears in formal or administrative contexts (education, professional documentation) that do not constitute sensitive data collection. Mitigation: Layer with additional contextual signals, such as structured identifiers, form fields, or database column headers, to distinguish sensitive records from general references.
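The date and check-digit validation mentioned in the mitigations can be sketched as follows. This assumes the commonly published ЕГН layout: a YYMMDD birth date whose month is offset by 20 for births in the 1800s and by 40 for births from 2000, and a tenth digit equal to the weighted sum of the first nine digits (weights 2, 4, 8, 5, 10, 9, 7, 3, 6) modulo 11, with a remainder of 10 mapped to 0. The function names are hypothetical.

```python
import re
from datetime import date
from typing import Optional

# Commonly published ЕГН check-digit weights for digits 1 to 9.
WEIGHTS = (2, 4, 8, 5, 10, 9, 7, 3, 6)
TEN_ASCII_DIGITS = re.compile(r"[0-9]{10}")


def egn_birth_date(egn: str) -> Optional[date]:
    """Decode YYMMDD; months 21-32 mean the 1800s, 41-52 mean the 2000s."""
    if not TEN_ASCII_DIGITS.fullmatch(egn):
        return None
    yy, mm, dd = int(egn[0:2]), int(egn[2:4]), int(egn[4:6])
    if 1 <= mm <= 12:
        year = 1900 + yy
    elif 21 <= mm <= 32:
        year, mm = 1800 + yy, mm - 20
    elif 41 <= mm <= 52:
        year, mm = 2000 + yy, mm - 40
    else:
        return None
    try:
        return date(year, mm, dd)  # rejects impossible days such as 30 February
    except ValueError:
        return None


def egn_checksum_ok(egn: str) -> bool:
    """Check the tenth digit against the weighted mod-11 sum of the first nine."""
    if not TEN_ASCII_DIGITS.fullmatch(egn):
        return False
    remainder = sum(int(d) * w for d, w in zip(egn[:9], WEIGHTS)) % 11
    return int(egn[9]) == (0 if remainder == 10 else remainder)


def is_plausible_egn(egn: str) -> bool:
    """A candidate passes only if both the date and the check digit are valid."""
    return egn_birth_date(egn) is not None and egn_checksum_ok(egn)
```

Under these assumptions, the three "Should match" samples decode to the birth dates listed for them but fail the check digit (for 7501011234 the computed check digit is 2), so a checksum-gated rule would not fire on them; 7501011232 passes both checks.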