Background check results
Identifies documents containing references to background check results in international contexts. This information type is classified as personally identifiable information under applicable data protection regulations.
- Type: regex
- Engine: boost_regex
- Confidence: medium
- Confidence justification: Category-aware structural regex with anchor and context constraints replaces phrase-only detection. The added context gating and exclusion rules improve precision and reduce incidental matches.
- Detection quality: Mixed
- Jurisdictions: global
- Regulations: GDPR
- Data categories: pii
- Scope: wide
- Platform compatibility: Purview (Compatible), GCP DLP (Compatible), Macie (Compatible), Zscaler (Compatible), Palo Alto (Degraded), Netskope (Unsupported)
Pattern
(?is)\b(?:background\s+check|criminal\s+record\s+check|reference\s+check|pre[\s-]+employment\s+screening|police\s+clearance|vetting\s+results|screening\s+report|character\s+check|employment\s+verification|background\s+screening|security\s+clearance)\b
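For quick checks outside a DLP engine, the pattern can be compiled with Python's `re` module. This is a minimal sketch that assumes `re` treats these constructs (inline flags, `\b`, `\s`, alternation, character classes) the same way `boost_regex` does. That equivalence has not been verified against the engine, and word-boundary handling of non-ASCII text in particular may differ. `find_topic_phrases` is a hypothetical helper name. Note that the `s` flag has no effect here because the pattern contains no `.`, while `\s+` already matches line breaks, so a phrase wrapped across lines still matches.

```python
import re

# Detection pattern copied from above. Python's `re` accepts the leading
# inline flags: `i` makes matching case-insensitive; `s` (DOTALL) only
# affects `.`, which this pattern does not use.
TOPIC_PATTERN = re.compile(
    r"(?is)\b(?:background\s+check|criminal\s+record\s+check|reference\s+check"
    r"|pre[\s-]+employment\s+screening|police\s+clearance|vetting\s+results"
    r"|screening\s+report|character\s+check|employment\s+verification"
    r"|background\s+screening|security\s+clearance)\b"
)


def find_topic_phrases(text: str) -> list[str]:
    """Return every topic phrase the pattern matches, in document order."""
    return [m.group(0) for m in TOPIC_PATTERN.finditer(text)]


if __name__ == "__main__":
    # `\s+` also matches "\n", so the wrapped "Background\nCheck" is found.
    sample = "Candidate passed the Background\nCheck and the pre-employment screening."
    print(find_topic_phrases(sample))
```

The leading `\b` also keeps the pattern from firing inside longer words, so text such as "prebackground check" produces no match.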
Corroborative evidence keywords
background check results, background, check, results, employment, workforce, records, employee, payroll, benefits, termination, hire date, salary, compensation, 401k, W-2, I-9, superannuation, police check, working with children check (+87 more)
Proximity: 300 characters
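The corroborative-evidence rule can be sketched as a gate on each topic match: a match is reported only if at least one keyword occurs within the proximity window. This is a minimal sketch under stated assumptions. The window is taken to extend 300 characters on each side of the primary match (engines define this boundary differently). As a design choice of the sketch, the matched phrase itself is excluded from the search so it cannot corroborate itself. `KEYWORDS` is a small illustrative subset of the list above, not the full list, and `corroborated_matches` is a hypothetical name.

```python
import re

# Detection pattern copied from above.
TOPIC_PATTERN = re.compile(
    r"(?is)\b(?:background\s+check|criminal\s+record\s+check|reference\s+check"
    r"|pre[\s-]+employment\s+screening|police\s+clearance|vetting\s+results"
    r"|screening\s+report|character\s+check|employment\s+verification"
    r"|background\s+screening|security\s+clearance)\b"
)

# Illustrative subset of the corroborative keywords; the real list is longer.
KEYWORDS = ("employee", "payroll", "hire date", "salary", "termination",
            "police check", "I-9")
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b", re.IGNORECASE
)

PROXIMITY = 300  # assumed: characters before and after the primary match


def corroborated_matches(text: str, window: int = PROXIMITY) -> list[tuple[str, list[str]]]:
    """Return (topic phrase, supporting keywords) for each corroborated match."""
    results = []
    for m in TOPIC_PATTERN.finditer(text):
        # Search only the text around the match, not the match itself.
        before = text[max(0, m.start() - window):m.start()]
        after = text[m.end():m.end() + window]
        support = KEYWORD_PATTERN.findall(before) + KEYWORD_PATTERN.findall(after)
        if support:
            results.append((m.group(0), support))
    return results
```

A policy sentence such as "Our policy explains what a background check is." yields no result, because no keyword appears near the phrase. Widening `window` or adding keywords trades precision for recall.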
Should match
- background check: Primary topic phrase match
- criminal record check: Case-insensitive topic phrase match
- reference check: Alternative topic phrase match
- pre-employment screening: Additional topic phrase match
Should not match
- unrelated generic text without domain phrases: No relevant topic phrases present
- placeholder value 12345: Random text should not match the topic-specific regex
- employee interview: Generic word pair from the old broad template should not match
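The match and non-match examples can double as a regression test. The sketch below copies the pattern and the examples verbatim and reports any example the pattern gets wrong. It assumes Python's `re` behaves like `boost_regex` for this pattern, and `run_examples` is a hypothetical helper name.

```python
import re

# Detection pattern copied from above.
TOPIC_PATTERN = re.compile(
    r"(?is)\b(?:background\s+check|criminal\s+record\s+check|reference\s+check"
    r"|pre[\s-]+employment\s+screening|police\s+clearance|vetting\s+results"
    r"|screening\s+report|character\s+check|employment\s+verification"
    r"|background\s+screening|security\s+clearance)\b"
)

# Examples copied from the "Should match" and "Should not match" lists.
SHOULD_MATCH = (
    "background check",
    "criminal record check",
    "reference check",
    "pre-employment screening",
)
SHOULD_NOT_MATCH = (
    "unrelated generic text without domain phrases",
    "placeholder value 12345",
    "employee interview",
)


def run_examples() -> list[str]:
    """Return a description of every example the pattern gets wrong."""
    failures = []
    for text in SHOULD_MATCH:
        if TOPIC_PATTERN.search(text) is None:
            failures.append(f"expected a match: {text!r}")
    for text in SHOULD_NOT_MATCH:
        if TOPIC_PATTERN.search(text) is not None:
            failures.append(f"unexpected match: {text!r}")
    return failures


if __name__ == "__main__":
    problems = run_examples()
    print("all examples pass" if not problems else "\n".join(problems))
```

Running this after every pattern change catches regressions, such as a broadened alternative that starts matching "employee interview".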
Known false positives
- Topic phrases such as "background check" or "reference check" appearing in policy documents, training materials, HR templates, or compliance guidelines that contain no actual personal data. Mitigation: Require corroborative evidence keywords within the proximity window to confirm a sensitive-data context rather than general discussion.
- Similar terminology in English, the primary language of international business, used in formal or administrative contexts (education, professional documentation) where it does not indicate that sensitive data is being collected. Mitigation: Layer with additional contextual signals, such as structured identifiers, form fields, or database column headers, to distinguish sensitive records from general references.
- High match volume in large document corpora, because the regex anchors on broad, common phrases rather than structured identifiers; the expected match rate is significantly higher than that of identifier-specific patterns. Mitigation: Tune confidence thresholds for bulk scanning, and consider using this pattern primarily as a pre-filter followed by secondary validation.
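The pre-filter-plus-validation approach can be sketched in two stages. First, the topic regex selects candidate matches. Second, each candidate is graded by the number of distinct corroborative keywords near it, so bulk scans can raise the threshold without changing the pattern. Everything here is an illustrative assumption: `classify`, the `medium_at`/`high_at` thresholds, the three tier names, the 300-character window on each side, and the keyword subset are hypothetical and not part of any DLP product's API.

```python
import re
from typing import Optional

# Detection pattern copied from above.
TOPIC_PATTERN = re.compile(
    r"(?is)\b(?:background\s+check|criminal\s+record\s+check|reference\s+check"
    r"|pre[\s-]+employment\s+screening|police\s+clearance|vetting\s+results"
    r"|screening\s+report|character\s+check|employment\s+verification"
    r"|background\s+screening|security\s+clearance)\b"
)

# Illustrative subset of the corroborative keywords.
KEYWORDS = ("employee", "payroll", "hire date", "salary", "termination",
            "police check", "I-9")
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b", re.IGNORECASE
)
WINDOW = 300  # assumed: characters on each side of the primary match
TIERS = ("low", "medium", "high")  # hypothetical tier names


def classify(text: str, medium_at: int = 1, high_at: int = 3) -> Optional[str]:
    """Return the highest tier reached by any topic match, or None if none match."""
    best = -1
    # Stage 1: the topic regex acts as the pre-filter.
    for m in TOPIC_PATTERN.finditer(text):
        # Stage 2: count distinct keywords around the match (match text excluded).
        before = text[max(0, m.start() - WINDOW):m.start()]
        after = text[m.end():m.end() + WINDOW]
        hits = KEYWORD_PATTERN.findall(before) + KEYWORD_PATTERN.findall(after)
        distinct = {k.lower() for k in hits}
        if len(distinct) >= high_at:
            tier = 2
        elif len(distinct) >= medium_at:
            tier = 1
        else:
            tier = 0
        best = max(best, tier)
    return TIERS[best] if best >= 0 else None
```

In this sketch an uncorroborated topic phrase still surfaces as "low", so a bulk scan could report only "medium" and above, or raise `high_at`, to cut volume without editing the regex.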