Unique Student Identifier
Detects Unique Student Identifier (USI) patterns. A 10-character alphanumeric code (uppercase letters and digits) mandatory since January 2023 for higher education in Australia.
- Type
- regex
- Engine
- universal
- Confidence
- low
- Confidence justification
- Low confidence: a 10-character uppercase alphanumeric pattern is extremely broad and will match tracking numbers, serial codes, product IDs, and many other identifiers. Corroborative evidence keywords such as USI, student identifier, or TAFE are essential for reliable detection.
- Detection quality
- Partial
- Jurisdictions
- au
- Regulations
- AML/CTF Act (Cth), IPA 2009 (Qld), NDB Scheme (Cth), Privacy Act 1988 (Cth)
- Frameworks
- ISO 27001
- Data categories
- pii, government-id
- Scope
- wide
- Risk rating
- 6
- Platform compatibility
- Purview: Compatible, GCP DLP: Compatible, Macie: Compatible, Zscaler: Compatible, Palo Alto: Compatible, Netskope: Compatible
Pattern
\b[A-Z0-9]{10}\b
Corroborative evidence keywords
USI, unique student identifier, student identifier, student, transcript, grade, GPA, enrollment, FERPA, FAFSA, financial aid, tuition, degree
Proximity: 300 characters
Should match
ABC1234567— USI with mixed letters and digits2K48TF3GHP— Standard USI formatA1B2C3D4E5— Alternating format USI
Should not match
ABC12345— Only 8 characters instead of 10ABC12345678— 11 characters instead of 10abc1234567— Lowercase letters (pattern requires uppercase)
Known false positives
- Common words and phrases related to unique student identifier appearing in policy documents, training materials, HR templates, or compliance guidelines without actual personal data. Mitigation: Require corroborative evidence keywords within the proximity window to confirm sensitive data context rather than general discussion.
- In Australian English, similar terminology used in formal or administrative contexts (education, professional documentation) that does not constitute sensitive data collection. Mitigation: Layer with additional contextual signals such as structured identifiers, form fields, or database column headers to distinguish sensitive records from general references.
- High-frequency pattern matches in large document corpora due to broad regex anchors. Expected match rate is significantly higher than specific identifier patterns. Mitigation: Tune confidence thresholds for bulk scanning. Consider using this pattern primarily as a pre-filter with secondary validation.