International Classification of Diseases (ICD-9-CM)
Detects International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes. This pattern is based on a Microsoft Purview built-in sensitive information type. Because the numeric format is generic, corroborative evidence keywords are essential for reliable detection.
- Type: regex
- Engine: universal
- Confidence: low
- Confidence justification: Low confidence. Three-digit numbers with optional decimal are extremely common in non-medical contexts. Corroborative evidence such as 'ICD-9' or 'diagnosis code' is essential for accurate detection. Added context gating and exclusion rules improve precision and reduce incidental matches.
- Detection quality: Mixed
- Jurisdictions: global
- Regulations: GDPR, HIPAA
- Frameworks: ISO 27001, ISO 27701, SOC 2
- Data categories: phi, health
- Scope: narrow
- Risk rating: 8
- Platform compatibility: Purview (Compatible), GCP DLP (Compatible), Macie (Compatible), Zscaler (Compatible), Palo Alto (Compatible), Netskope (Compatible)
Pattern
\b\d{3}\.?\d{0,2}\b
Corroborative evidence keywords
ICD-9, diagnosis code, diagnostic code, ICD code, medical classification, admitting diagnosis, clinical diagnosis, discharge diagnosis, ICD-9-CM, medical record number, MRN, patient ID, field, column, row, entry, record, value, form, register (+21 more)
Proximity: 300 characters
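The interaction between the pattern, the keywords, and the proximity window can be sketched in Python. This is a minimal illustration, not the logic of any listed platform: the function name `find_icd9_candidates` is hypothetical, the keyword tuple is only a subset of the documented list, and the window is assumed to extend 300 characters on each side of a match.

```python
import re

# Pattern copied verbatim from this entry.
ICD9_PATTERN = re.compile(r"\b\d{3}\.?\d{0,2}\b")

# Subset of the documented keywords (the full list is longer).
KEYWORDS = (
    "ICD-9",
    "diagnosis code",
    "diagnostic code",
    "ICD code",
    "discharge diagnosis",
)

# Proximity in characters, as stated in this entry. Applying it on both
# sides of the match is an assumption of this sketch.
PROXIMITY = 300


def find_icd9_candidates(text: str) -> list[str]:
    """Return pattern matches that have a keyword inside the proximity window."""
    lowered = text.lower()
    hits = []
    for match in ICD9_PATTERN.finditer(text):
        start = max(0, match.start() - PROXIMITY)
        end = min(len(text), match.end() + PROXIMITY)
        window = lowered[start:end]
        # Case-insensitive substring check against each keyword.
        if any(keyword.lower() in window for keyword in KEYWORDS):
            hits.append(match.group())
    return hits
```

Without the keyword gate, the raw pattern also accepts any standalone run of three to five digits (for example `12345`), because both the dot and the trailing digits are optional. That is why the gate matters more here than for patterns with a distinctive format.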
Should match
- 250.00: ICD-9 code for diabetes mellitus type II
- 401.9: ICD-9 code for unspecified essential hypertension
- 486: ICD-9 code for pneumonia
Should not match
- ABC.12: Contains letters, not a valid ICD-9 code
- AB.12: Contains letters, not a valid ICD-9 format
- template example placeholder record identifier: Template/sample context should be excluded even when anchor words are present
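The samples listed above can be checked against the raw pattern with a small harness. The sketch below also shows one possible form of the template/sample exclusion. The term list `EXCLUSION_TERMS` and the helpers `is_excluded` and `detect` are hypothetical, since this entry does not spell out the actual exclusion rules, and excluding on the whole text rather than a proximity window is a simplification.

```python
import re

# Pattern copied verbatim from this entry.
ICD9_PATTERN = re.compile(r"\b\d{3}\.?\d{0,2}\b")

# Hypothetical exclusion terms, inferred from the template/sample note.
EXCLUSION_TERMS = ("template", "example", "placeholder", "sample")


def is_excluded(text: str) -> bool:
    """Return True when the text looks like template or sample content."""
    lowered = text.lower()
    return any(term in lowered for term in EXCLUSION_TERMS)


def detect(text: str) -> list[str]:
    """Return raw pattern matches, or nothing if the text is excluded."""
    if is_excluded(text):
        return []
    return ICD9_PATTERN.findall(text)


SHOULD_MATCH = ["250.00", "401.9", "486"]
SHOULD_NOT_MATCH = [
    "ABC.12",
    "AB.12",
    "template example placeholder record identifier",
]

# Each positive sample should be returned whole; each negative sample
# should produce no match at all.
for sample in SHOULD_MATCH:
    assert detect(sample) == [sample]
for sample in SHOULD_NOT_MATCH:
    assert detect(sample) == []
```

The two letter-based samples fail at the pattern level because they never contain three consecutive digits. The third sample contains no digits at all, so the exclusion rule only becomes decisive when a numeric code sits next to template wording.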
Known false positives
- Generic numeric sequences matching the digit pattern in non-health contexts. Mitigation: Require corroborative evidence keywords within the proximity window to distinguish health identifiers from general numeric data.
- Reference numbers or account identifiers from other domains with similar digit counts. Mitigation: Layer with document classification to prioritise matches in health and medical documents.