Provider identifier records
Detects references to provider identifier records in healthcare and patient records. This data is treated as protected health information under applicable data protection regulations.
- Type
- regex
- Engine
- boost_regex
- Confidence
- medium
- Confidence justification
- A regex anchored on identifier and document structure, with constrained context, replaces phrase-only detection. The added context gating and exclusion rules improve precision and reduce incidental matches.
- Detection quality
- Mixed
- Jurisdictions
- global
- Regulations
- GDPR
- Data categories
- healthcare, phi
- Scope
- wide
- Platform compatibility
- Purview: Compatible, GCP DLP: Compatible, Macie: Compatible, Zscaler: Compatible, Palo Alto: Degraded, Netskope: Unsupported
Pattern
(?is)\b(?:provider\s+identifier\s+records)\b\s{0,80}\b[A-Z0-9][A-Z0-9\-/ ]{4,24}\b
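As a rough illustration of how the pattern behaves, the sketch below runs it through Python's `re` module. This is an assumption made for convenience: the entry names boost_regex as the engine, and the constructs used here (`(?is)`, `\b`, `\s`, bounded quantifiers) appear to behave the same way in both, but that has not been verified against Boost. The helper name `find_matches` and the sample strings are hypothetical and are not taken from the entry.

```python
import re

# Pattern copied verbatim from the entry; Python's `re` stands in for boost_regex here.
PATTERN = re.compile(
    r"(?is)\b(?:provider\s+identifier\s+records)\b\s{0,80}\b[A-Z0-9][A-Z0-9\-/ ]{4,24}\b"
)

def find_matches(text: str) -> list[str]:
    """Return every substring of `text` matched by the pattern."""
    return [m.group(0) for m in PATTERN.finditer(text)]

# Hypothetical samples, not taken from the entry.
with_identifier = "Provider identifier records NPI-1234567890 on file."
phrase_only = "Provider identifier records"
no_phrase = "placeholder value 12345"

print(find_matches(with_identifier))   # phrase followed by an identifier-like token
print(bool(find_matches(phrase_only)))  # phrase with nothing after it
print(bool(find_matches(no_phrase)))    # identifier-like token without the phrase
```

Under `re`, the phrase on its own does not match, because the pattern requires at least five identifier characters after it. Note also that the trailing character class admits spaces and, under `(?i)`, lowercase letters, so a match can run on into the words that follow the identifier.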
Corroborative evidence keywords
provider identifier records, provider, identifier, records, health, biomedical, information, patient, clinical, medical, hospital, practitioner, diagnosis, treatment, prescription, physician, nurse, therapy, examination, consultation (+29 more)
Proximity: 300 characters
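The proximity rule can be sketched as a window check around a primary match. The version below is a minimal sketch under stated assumptions: the 300-character window is taken to extend on both sides of the match, the matched text itself is excluded from the search, and only a subset of the listed keywords is used. The function name `has_corroboration` is hypothetical, and real DLP platforms may measure proximity differently.

```python
import re

# A subset of the corroborative keywords listed in the entry (the full list is longer).
KEYWORDS = ("patient", "clinical", "medical", "hospital", "physician", "diagnosis")
PROXIMITY = 300  # characters, as stated in the entry

def has_corroboration(text: str, start: int, end: int,
                      keywords=KEYWORDS, proximity=PROXIMITY) -> bool:
    """Check whether any keyword occurs near the primary match text[start:end].

    Assumption: the window covers `proximity` characters before the match
    and `proximity` characters after it; the match itself is not searched.
    """
    before = text[max(0, start - proximity):start]
    after = text[end:end + proximity]
    window = before + " " + after
    return any(
        re.search(rf"\b{re.escape(keyword)}\b", window, re.IGNORECASE)
        for keyword in keywords
    )

# Hypothetical sample: a keyword sits just before the primary match.
text = "Patient chart. Provider identifier records NPI-1234567890"
start = text.index("Provider")
print(has_corroboration(text, start, len(text)))
```

Excluding the match from the window is a deliberate choice in this sketch: the entry's keyword list includes the words of the phrase itself, which would otherwise corroborate every match trivially.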
Should match
- Provider identifier records: Exact phrase marker match
- provider identifier records: Case-insensitive phrase match
- Provider identifier records: Normalized whitespace phrase
- structured record with identifier and contextual anchors: Structural anchor sample
Should not match
- unrelated generic text: No relevant phrase context
- placeholder value 12345: Random text should not match phrase marker
- generic narrative without identifier/document anchors: Should not match plain mention
- template example placeholder record identifier: Template/sample context should be excluded even when anchor words are present
Known false positives
- Medical terminology in health education materials, research publications, clinical guidelines, or public health documents without patient-specific data. Mitigation: Require corroborative evidence keywords confirming patient context. Look for co-occurrence with patient identifiers such as medical record numbers or dates of birth.
- General wellness and fitness content using medical vocabulary without constituting protected health information. Mitigation: Layer with patient identifier patterns or healthcare-specific document structure detection to distinguish clinical records from general health content.
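The layering described in these mitigations could look roughly like the sketch below, which keeps a primary hit only when a patient identifier co-occurs in the same text. The `MRN` and `DOB` patterns are hypothetical stand-ins written for this illustration and are not part of the entry; a real deployment would substitute its own patient identifier detectors. The function names are likewise assumptions.

```python
import re

# Hypothetical patient-identifier patterns, for illustration only; not part of the entry.
MRN = re.compile(r"\bMRN[:#\s]*\d{6,10}\b", re.IGNORECASE)
DOB = re.compile(r"\b(?:DOB|date of birth)[:\s]*\d{4}-\d{2}-\d{2}\b", re.IGNORECASE)

def looks_patient_specific(text: str) -> bool:
    """True when the text carries at least one patient identifier (MRN or date of birth)."""
    return bool(MRN.search(text) or DOB.search(text))

def keep_finding(text: str, primary_matched: bool) -> bool:
    """Keep a primary-pattern hit only when patient-specific context co-occurs."""
    return primary_matched and looks_patient_specific(text)

# Hypothetical samples: a clinical record versus educational material.
record = "Provider identifier records NPI-1234567890, MRN: 00123456"
guideline = "Clinical guidelines describe how provider identifier records are maintained."
print(keep_finding(record, True))
print(keep_finding(guideline, True))
```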