Passport Machine-Readable Zone (MRZ, ICAO 9303)
Detects the Machine-Readable Zone (MRZ) of an ICAO 9303 travel document: the OCR-B text printed at the bottom of the passport data page and machine-read at border control. This pattern focuses on the TD3 passport format (two lines of 44 characters each). Line 1 begins with 'P', a document subtype character (or the '<' filler), the 3-character issuing-state code and the holder's names. Line 2 packs the document number, nationality, date of birth, sex, date of expiry and an optional personal-number field. Check digits follow the document number, both dates and the optional field, and a final composite check digit closes the line. MRZ leakage via scans, photocopies and OCR exports is one of the highest-value passport data exposure vectors because it places the holder's name, document number, nationality, date of birth and expiry in a single fixed-width, machine-parseable block.
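Because every field in line 2 sits at a fixed position, the line can be split by offset alone. The sketch below assumes the 1-based positions from ICAO 9303 Part 4, converted to 0-based Python slices. The helper `split_td3_line2` and the field names are illustrative labels, not part of any DLP platform API.

```python
from typing import Dict, List, Tuple

# Illustrative field map for TD3 line 2 (ICAO 9303 Part 4), as 0-based,
# end-exclusive slice bounds. Field names are hypothetical labels.
TD3_LINE2_FIELDS: List[Tuple[str, int, int]] = [
    ("document_number", 0, 9),         # positions 1-9
    ("document_number_check", 9, 10),  # position 10
    ("nationality", 10, 13),           # positions 11-13
    ("date_of_birth", 13, 19),         # positions 14-19, YYMMDD
    ("date_of_birth_check", 19, 20),   # position 20
    ("sex", 20, 21),                   # position 21, sex marker
    ("date_of_expiry", 21, 27),        # positions 22-27, YYMMDD
    ("date_of_expiry_check", 27, 28),  # position 28
    ("optional_data", 28, 42),         # positions 29-42, personal number or filler
    ("optional_data_check", 42, 43),   # position 43
    ("composite_check", 43, 44),       # position 44
]


def split_td3_line2(line: str) -> Dict[str, str]:
    """Split a 44-character TD3 line 2 into raw fields ('<' filler is kept)."""
    if len(line) != 44:
        raise ValueError(f"expected 44 characters, got {len(line)}")
    return {name: line[start:end] for name, start, end in TD3_LINE2_FIELDS}


if __name__ == "__main__":
    # ICAO 9303 specimen line 2, also listed under "Should match".
    specimen = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
    for name, value in split_td3_line2(specimen).items():
        print(f"{name}: {value}")
```

The field widths add up to exactly 44. That fixed total is what the regex below relies on.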
- Type: regex
- Engine: boost_regex
- Confidence: high
- Confidence justification: The TD3 second line is a 44-character fixed-width structure with field-typed positions (a letters-or-filler nationality field, a sex marker, six-digit dates and trailing check digits) that rarely occurs by chance in natural text or ordinary identifiers. Because every position in that alternative also accepts the '<' filler, filler-heavy strings are the main source of coincidental matches.
- Jurisdictions: global
- Regulations: GDPR, CCPA/CPRA
- Frameworks: ISO 27001, ISO 27701
- Data categories: pii, government-id
- Scope: narrow
- Risk rating: 9
- Platform compatibility: Purview compatible; GCP DLP unsupported; Macie unsupported; Zscaler compatible; Palo Alto unsupported; Netskope unsupported
Pattern
(?<![A-Z0-9<])(?:[A-Z0-9<]{9}[0-9<][A-Z<]{3}[0-9<]{6}[0-9<][MFXmfx<][0-9<]{6}[0-9<][A-Z0-9<]{14}[0-9<][0-9<]|P[A-Z<][A-Z]{3}[A-Z]+<<[A-Z<]{2,})(?![A-Za-z0-9])
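The one-line pattern is easier to audit when each segment is labelled. The following sketch rewrites the same expression for Python's `re` module with `re.VERBOSE`, which ignores whitespace and `#` comments outside character classes. Python's engine is not Boost.Regex. However, this expression uses only character classes, fixed repetition counts, alternation and single-character lookaround, and both engines should treat these the same way. The position labels follow ICAO 9303 Part 4.

```python
import re

# The Pattern field, rewritten with re.VERBOSE so each TD3 segment is labelled.
MRZ_VERBOSE = re.compile(r"""
    (?<![A-Z0-9<])              # not preceded by another MRZ character
    (?:
        # --- Alternative 1: TD3 line 2 (exactly 44 characters) ---
        [A-Z0-9<]{9}            # 1-9    document number
        [0-9<]                  # 10     check digit
        [A-Z<]{3}               # 11-13  nationality
        [0-9<]{6}               # 14-19  date of birth, YYMMDD
        [0-9<]                  # 20     check digit
        [MFXmfx<]               # 21     sex marker
        [0-9<]{6}               # 22-27  date of expiry, YYMMDD
        [0-9<]                  # 28     check digit
        [A-Z0-9<]{14}           # 29-42  optional data / personal number
        [0-9<]                  # 43     check digit
        [0-9<]                  # 44     composite check digit
      |
        # --- Alternative 2: TD3 line 1 (no fixed length) ---
        P[A-Z<]                 # document type P plus subtype or filler
        [A-Z]{3}                # issuing state
        [A-Z]+<<                # surname, then the '<<' name separator
        [A-Z<]{2,}              # given names and trailing filler
    )
    (?![A-Za-z0-9])             # not followed by a letter or digit
""", re.VERBOSE)

if __name__ == "__main__":
    m = MRZ_VERBOSE.search("L898902C36UTO7408122F1204159ZE184226B<<<<<10")
    print(m.group(0) if m else "no match")
```

`re.VERBOSE` only strips whitespace and comments. The annotated form should therefore accept exactly the same strings as the compact pattern. Comparing the two on the documented examples is a cheap regression check whenever the pattern is edited.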
Corroborative evidence keywords
passport, MRZ, machine readable zone, machine-readable zone, travel document, ICAO, issuing country, date of expiry
Proximity: 300 characters
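The keyword list and proximity window add a second gate on top of the regex. The sketch below assumes a symmetric window that extends 300 characters before the start and after the end of each pattern match. Platforms differ in exactly how they anchor this window. It also assumes case-insensitive, whole-word keyword matching. `corroborated_matches` is a hypothetical helper, not a platform API.

```python
import re
from typing import List, Tuple

# The Pattern field (split across lines only for readability).
MRZ_PATTERN = re.compile(
    r"(?<![A-Z0-9<])(?:[A-Z0-9<]{9}[0-9<][A-Z<]{3}[0-9<]{6}[0-9<][MFXmfx<]"
    r"[0-9<]{6}[0-9<][A-Z0-9<]{14}[0-9<][0-9<]"
    r"|P[A-Z<][A-Z]{3}[A-Z]+<<[A-Z<]{2,})(?![A-Za-z0-9])"
)

KEYWORDS = [
    "passport", "MRZ", "machine readable zone", "machine-readable zone",
    "travel document", "ICAO", "issuing country", "date of expiry",
]
# Assumption: whole-word, case-insensitive keyword matching.
KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b", re.IGNORECASE
)
PROXIMITY = 300  # characters, from the entry above


def corroborated_matches(text: str, window: int = PROXIMITY) -> List[Tuple[int, int]]:
    """Return (start, end) spans of pattern matches with a keyword nearby.

    Assumption: the window runs `window` characters before the match start
    and `window` characters after the match end.
    """
    hits = []
    for m in MRZ_PATTERN.finditer(text):
        lo = max(0, m.start() - window)
        hi = min(len(text), m.end() + window)
        if KEYWORD_RE.search(text[lo:hi]):
            hits.append(m.span())
    return hits


if __name__ == "__main__":
    doc = "Passport scan attached.\nL898902C36UTO7408122F1204159ZE184226B<<<<<10"
    print(corroborated_matches(doc))
```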
Should match
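- L898902C36UTO7408122F1204159ZE184226B<<<<<10: TD3 line 2 from the ICAO 9303 specimen passport (document number, nationality UTO, date of birth, sex F, expiry, check digits)
- AB21234567FRA8001019M3001012<<<<<<<<<<<<<<04: Synthetic TD3 line 2 with FRA nationality, sex M and an all-filler optional field. The check-digit positions hold digits, but those digits do not verify under the ICAO algorithm, so this example exercises the structural pattern only.
- P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<: TD3 line 1 from the ICAO 9303 specimen (document type P, issuing state UTO, surname and given names separated by '<<', padded with filler)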
Should not match
- P<: Bare document-type marker without the rest of the line-1 structure
- PASSPORT NUMBER L898902C3 ISSUED BY UTOPIA AUTHORITY: Prose containing passport tokens but not the fixed-width MRZ structure
- 1234567890123456789012345678901234567890ABCD: 44 characters, but not a valid TD3 line 2 layout (digits in the nationality field at positions 11-13, no sex marker at position 21, and letters in the two final check-digit positions)
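One way to keep the example lists honest is to run them against the pattern. This sketch uses Python's `re` module. It is not Boost.Regex, although the constructs in this pattern should behave the same in both engines. The long filler runs are built with `"<" * n` to avoid miscounting. The sketch checks only whether each example produces a match, not what a DLP platform would report.

```python
import re
from typing import List

# The Pattern field (split across lines only for readability).
MRZ_PATTERN = re.compile(
    r"(?<![A-Z0-9<])(?:[A-Z0-9<]{9}[0-9<][A-Z<]{3}[0-9<]{6}[0-9<][MFXmfx<]"
    r"[0-9<]{6}[0-9<][A-Z0-9<]{14}[0-9<][0-9<]"
    r"|P[A-Z<][A-Z]{3}[A-Z]+<<[A-Z<]{2,})(?![A-Za-z0-9])"
)

SHOULD_MATCH = [
    "L898902C36UTO7408122F1204159ZE184226B<<<<<10",
    "AB21234567FRA8001019M3001012" + "<" * 14 + "04",
    "P<UTOERIKSSON<<ANNA<MARIA" + "<" * 19,
]
SHOULD_NOT_MATCH = [
    "P<",
    "PASSPORT NUMBER L898902C3 ISSUED BY UTOPIA AUTHORITY",
    "1234567890" * 4 + "ABCD",
]


def check_examples() -> List[str]:
    """Return human-readable failures (an empty list means all examples pass)."""
    failures = []
    for text in SHOULD_MATCH:
        if MRZ_PATTERN.search(text) is None:
            failures.append(f"missed: {text}")
    for text in SHOULD_NOT_MATCH:
        m = MRZ_PATTERN.search(text)
        if m is not None:
            failures.append(f"unexpected match {m.group(0)!r} in {text!r}")
    return failures


if __name__ == "__main__":
    problems = check_examples()
    print("\n".join(problems) if problems else "all examples behave as documented")
```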
Known false positives
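- Long runs of uppercase letters, digits and '<' that coincidentally satisfy the fixed-width field layout. Mitigation: Require corroborative passport/MRZ keywords within 300 characters. Optionally, validate the ICAO check digits: map digits to their values, A-Z to 10-35 and '<' to 0, apply the repeating weights 7, 3, 1, and take the sum modulo 10.
- Runs of 44 or more consecutive '<' filler characters, such as decorative separator lines, satisfy the line-2 alternative by themselves because every position in that alternative accepts '<'. Mitigation: Validate the check digits. An all-filler line fails because its check-digit positions hold '<' rather than the computed digit 0.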
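The check-digit mitigation can be sketched as follows. ICAO 9303 computes each check digit by mapping every character to a value (digits as-is, A-Z as 10-35, '<' as 0), multiplying by the repeating weights 7, 3, 1 and taking the sum modulo 10. The composite digit in position 44 covers positions 1-10, 14-20 and 22-43. The function names here are hypothetical. The rule that an all-filler optional field may carry '<' or 0 as its check digit is an assumption about ICAO 9303 Part 4, not taken from any platform's validator.

```python
def mrz_char_value(c: str) -> int:
    """ICAO 9303 value of one MRZ character: 0-9 as-is, A-Z -> 10-35, '<' -> 0."""
    if "0" <= c <= "9":
        return ord(c) - ord("0")
    if "A" <= c <= "Z":
        return ord(c) - ord("A") + 10
    if c == "<":
        return 0
    raise ValueError(f"not an MRZ character: {c!r}")


def mrz_check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, modulo 10."""
    weights = (7, 3, 1)
    return sum(mrz_char_value(c) * weights[i % 3] for i, c in enumerate(field)) % 10


def td3_line2_check_digits_valid(line: str) -> bool:
    """True if all five check digits of a TD3 line 2 verify (hypothetical helper)."""
    if len(line) != 44:
        return False

    def ok(field: str, digit: str) -> bool:
        # A '<' in a check-digit position never equals a computed digit.
        return digit == str(mrz_check_digit(field))

    try:
        optional, optional_check = line[28:42], line[42]
        if set(optional) == {"<"}:
            # Assumption: an unused (all-filler) optional field may carry '<' or 0.
            optional_ok = optional_check in ("<", "0")
        else:
            optional_ok = ok(optional, optional_check)

        composite = line[0:10] + line[13:20] + line[21:43]
        return (
            ok(line[0:9], line[9])          # document number
            and ok(line[13:19], line[19])   # date of birth
            and ok(line[21:27], line[27])   # date of expiry
            and optional_ok
            and ok(composite, line[43])     # composite over 1-10, 14-20, 22-43
        )
    except ValueError:
        # Characters outside A-Z, 0-9 and '<' cannot form a valid MRZ.
        return False


if __name__ == "__main__":
    for candidate in (
        "L898902C36UTO7408122F1204159ZE184226B<<<<<10",
        "<" * 44,
    ):
        print(candidate, td3_line2_check_digits_valid(candidate))
```

In a pipeline, this check would run only on 44-character hits from the line-2 alternative. Line 1 carries no check digits, so line-1 hits still depend on keyword corroboration.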