Social security number
Identifies documents that refer to a social security number in Australian contexts. This information type is treated as personal information under the Privacy Act 1988 (Cth).
- Type
- regex
- Engine
- boost_regex
- Confidence
- low
- Confidence justification
- Low confidence: this is a phrase-based marker that detects the words "social security number", not a structured identifier. It is intended to bootstrap coverage, requires corroborative evidence, and should later be hardened into high-confidence structural patterns.
- Detection quality
- Verified
- Jurisdictions
- global
- Regulations
- AML/CTF Act (Cth), IPA 2009 (Qld), NDB Scheme (Cth), Privacy Act 1988 (Cth)
- Frameworks
- ISO 27001, ISO 27701
- Data categories
- government-id, pii
- Scope
- wide
- Platform compatibility
- Purview: Compatible, GCP DLP: Compatible, Macie: Compatible, Zscaler: Compatible, Palo Alto: Compatible, Netskope: Compatible
Pattern
\bsocial\s+security\s+number\b
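A minimal sketch of applying this pattern, using Python's `re` module as a stand-in for the boost_regex engine. The case-insensitive flag is an assumption inferred from the "Case-insensitive phrase match" test case; the rule itself does not state which flags are set. For this ASCII-only pattern, `\b` and `\s` behave the same way in both engines.

```python
import re

# Pattern copied from the rule. re.IGNORECASE is an assumption inferred
# from the rule's case-insensitive test case.
SSN_PHRASE = re.compile(r"\bsocial\s+security\s+number\b", re.IGNORECASE)


def find_phrase_spans(text: str) -> list[tuple[int, int]]:
    """Return the (start, end) character offsets of every phrase match."""
    return [m.span() for m in SSN_PHRASE.finditer(text)]


if __name__ == "__main__":
    # \s+ accepts tabs and runs of spaces between the words.
    print(find_phrase_spans("Enter your Social\tSecurity  number below."))
    # prints [(11, 34)]
```

Because of the trailing `\b`, the plural "social security numbers" does not match, and the leading `\b` prevents matches inside longer words such as "antisocial".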
Corroborative evidence keywords
social security number, social, security, number, government, ids, civil, status, ID number, identification, ID card, license, permit, registration, certificate
Proximity: 300 characters
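The corroborative-evidence check can be sketched as follows. Several details are assumptions, not taken from the rule: keywords match as whole words and case-insensitively, the 300-character window extends before the match start and after the match end, and a keyword counts only if it lies entirely inside the window. The match span itself is excluded because the keyword list contains "social", "security", and "number", so without the exclusion every match would corroborate itself.

```python
import re

SSN_PHRASE = re.compile(r"\bsocial\s+security\s+number\b", re.IGNORECASE)

# Keyword list and proximity copied from the rule.
KEYWORDS = [
    "social security number", "social", "security", "number", "government",
    "ids", "civil", "status", "ID number", "identification", "ID card",
    "license", "permit", "registration", "certificate",
]
PROXIMITY = 300  # characters

# Longest keywords first so "ID card" wins over shorter alternatives.
# Whole-word, case-insensitive matching is an assumption.
KEYWORD_RE = re.compile(
    r"\b(?:"
    + "|".join(re.escape(k) for k in sorted(KEYWORDS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def corroborated_matches(text: str) -> list[tuple[int, int, list[str]]]:
    """Return (start, end, keywords) for phrase matches with nearby evidence."""
    keyword_hits = [(k.start(), k.end(), k.group(0)) for k in KEYWORD_RE.finditer(text)]
    results = []
    for m in SSN_PHRASE.finditer(text):
        start, end = m.span()
        nearby = [
            word
            for ks, ke, word in keyword_hits
            # Entirely inside the window before the match...
            if (start - PROXIMITY <= ks and ke <= start)
            # ...or entirely inside the window after it.
            or (end <= ks and ke <= end + PROXIMITY)
        ]
        if nearby:
            results.append((start, end, nearby))
    return results
```

For example, `corroborated_matches("Government ID card required. Social security number: ")` reports the match with `["Government", "ID card"]` as evidence, while a bare "Please provide your social security number." returns no corroborated matches.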
Should match
- Social security number: Exact phrase marker match
- social security number: Case-insensitive phrase match
- Social security number: Normalized whitespace phrase
Should not match
- unrelated generic text: No relevant phrase context
- placeholder value 12345: Random text should not match phrase marker
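The test cases above can be run as a small harness, sketched here with Python's `re` standing in for boost_regex. The extra spaces and newline in the "Normalized whitespace phrase" input are an assumption: the published example appears to have lost its original spacing, so this sketch substitutes irregular whitespace to exercise `\s+`.

```python
import re

SSN_PHRASE = re.compile(r"\bsocial\s+security\s+number\b", re.IGNORECASE)

# (input, description) pairs from the rule's test cases. The whitespace in
# the third input is an assumed reconstruction.
SHOULD_MATCH = [
    ("Social security number", "Exact phrase marker match"),
    ("social security number", "Case-insensitive phrase match"),
    ("Social   security\nnumber", "Normalized whitespace phrase"),
]
SHOULD_NOT_MATCH = [
    ("unrelated generic text", "No relevant phrase context"),
    ("placeholder value 12345", "Random text should not match phrase marker"),
]


def run_cases() -> list[str]:
    """Return a message for every case whose outcome is wrong."""
    failures = []
    for text, desc in SHOULD_MATCH:
        if not SSN_PHRASE.search(text):
            failures.append(f"missed: {desc}")
    for text, desc in SHOULD_NOT_MATCH:
        if SSN_PHRASE.search(text):
            failures.append(f"false positive: {desc}")
    return failures


if __name__ == "__main__":
    print(run_cases() or "all cases pass")
```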
Known false positives
- The phrase "social security number" and related wording appearing in policy documents, training materials, HR templates, or compliance guidelines that contain no actual personal data. Mitigation: Require corroborative evidence keywords within the proximity window to confirm a sensitive-data context rather than general discussion.
- In Australian English, similar terminology used in formal or administrative contexts (education, professional documentation) that does not constitute sensitive data collection. Mitigation: Layer with additional contextual signals such as structured identifiers, form fields, or database column headers to distinguish sensitive records from general references.
- High match volumes in large document corpora, because the pattern targets a common phrase rather than a structured identifier. The expected match rate is significantly higher than for specific identifier patterns. Mitigation: Tune confidence thresholds for bulk scanning, and consider using this pattern primarily as a pre-filter followed by secondary validation.
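The pre-filter approach can be sketched as a two-stage triage. Everything beyond the phrase pattern is hypothetical: the `skip`/`review`/`escalate` labels and the `FIELD_LABEL` signal (the phrase used as a form-field label or column header, followed by ":" or "=" and a value containing a digit) are illustrations of the "form fields or database column headers" mitigation, not part of the rule.

```python
import re

SSN_PHRASE = re.compile(r"\bsocial\s+security\s+number\b", re.IGNORECASE)

# Hypothetical secondary signal: the phrase as a field label, followed on
# the same line by ":" or "=" and a value that contains at least one digit.
FIELD_LABEL = re.compile(
    r"\bsocial\s+security\s+number\b[ \t]*[:=][ \t]*[^\n]*\d",
    re.IGNORECASE,
)


def triage(documents: dict[str, str]) -> dict[str, str]:
    """Classify each document as 'skip', 'review' or 'escalate'.

    Stage 1 (pre-filter): a cheap phrase search; most documents stop here.
    Stage 2 (secondary validation): only documents that pass stage 1 are
    checked for the hypothetical field-label signal.
    """
    verdicts = {}
    for name, text in documents.items():
        if not SSN_PHRASE.search(text):
            verdicts[name] = "skip"
        elif FIELD_LABEL.search(text):
            verdicts[name] = "escalate"
        else:
            verdicts[name] = "review"
    return verdicts
```

With placeholder inputs, a policy sentence such as "Never email a social security number." is routed to `review`, a form line "Social Security Number: 000 000 000" to `escalate`, and text without the phrase to `skip`.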
References
- https://www.passports.gov.au/apply-or-renew
- https://www.ato.gov.au/individuals-and-families/tax-file-number
- https://www.abr.business.gov.au/FAQ/ABNBasics
- https://immi.homeaffairs.gov.au/visas/already-have-a-visa/check-visa-details-and-conditions/overview
- https://www.servicesaustralia.gov.au/how-to-prove-your-identity-with-centrelink
- https://www.aec.gov.au/enrol/
- https://www.nsw.gov.au/family-and-relationships/births/birth-certificates
- https://www.afp.gov.au/our-services/national-police-checks