Passport Machine-Readable Zone (MRZ, ICAO 9303)

Detects the Machine-Readable Zone (MRZ) of an ICAO 9303 travel document: the OCR-B text printed at the bottom of the passport data page and read by scanners at border control. The pattern targets the TD3 passport format (two lines of 44 characters). Line 1 begins with 'P', followed by a document subtype character, the 3-letter issuing state code and the holder's names (surname and given names separated by '<<'). Line 2 packs the document number, nationality, date of birth, sex, expiry date and optional personal number, with check digits after the document number, date of birth, expiry date and optional data, plus a final composite check digit. MRZ leakage via scans, photocopies and OCR exports is a high-value passport data exposure vector because it concentrates the core identity fields in a single fixed-width block.
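The line 2 layout described above can be illustrated with a short Python sketch that slices the fixed-width fields and recomputes the check digits. This is not part of the detection pattern. The names `parse_td3_line2`, `Td3Line2` and `digit_ok` are hypothetical, and the field offsets and the 7-3-1 weighting follow the ICAO 9303 TD3 layout as understood here. It is a possible secondary validation step, not a required one.

```python
from dataclasses import dataclass

WEIGHTS = (7, 3, 1)  # repeating weights used for ICAO 9303 check digits


def char_value(ch: str) -> int:
    """Map an MRZ character to its value: 0-9, A=10 .. Z=35, '<'=0."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"not an MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Weighted sum of the field's character values, modulo 10."""
    return sum(char_value(c) * WEIGHTS[i % 3] for i, c in enumerate(field)) % 10


def digit_ok(field: str, digit: str, filler_ok: bool = False) -> bool:
    """Compare a printed check digit with the recomputed one."""
    # Assumption: an unused optional field may carry '<' instead of a digit.
    if filler_ok and digit == "<" and set(field) == {"<"}:
        return True
    return "0" <= digit <= "9" and int(digit) == check_digit(field)


@dataclass
class Td3Line2:
    document_number: str
    nationality: str
    birth_date: str   # YYMMDD
    sex: str
    expiry_date: str  # YYMMDD
    optional_data: str
    checks_ok: bool


def parse_td3_line2(line: str) -> Td3Line2:
    """Slice a 44-character TD3 line 2 into fields and verify check digits.

    Raises ValueError for a wrong length or a non-MRZ character in a
    checked field.
    """
    if len(line) != 44:
        raise ValueError("TD3 line 2 must be exactly 44 characters")
    document_number = line[0:9]
    birth_date = line[13:19]
    expiry_date = line[21:27]
    optional_data = line[28:42]
    # The composite digit covers the checked fields with their check digits.
    composite_source = line[0:10] + line[13:20] + line[21:43]
    checks_ok = (
        digit_ok(document_number, line[9])
        and digit_ok(birth_date, line[19])
        and digit_ok(expiry_date, line[27])
        and digit_ok(optional_data, line[42], filler_ok=True)
        and digit_ok(composite_source, line[43])
    )
    return Td3Line2(
        document_number=document_number,
        nationality=line[10:13],
        birth_date=birth_date,
        sex=line[20],
        expiry_date=expiry_date,
        optional_data=optional_data,
        checks_ok=checks_ok,
    )


# Specimen line 2 as published in ICAO 9303 (fictional state code UTO).
SPECIMEN_LINE2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
```

Calling `parse_td3_line2(SPECIMEN_LINE2)` yields the individual fields plus a `checks_ok` flag. A candidate that matches the regex but fails the check digits is more likely an OCR error or a coincidental string than a genuine MRZ.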

Type
regex
Engine
boost_regex
Confidence
high
Confidence justification
High confidence: the TD3 second line is a 44-character fixed-width structure with field-typed positions (filler characters, a sex marker, six-digit dates and trailing check digits) that is very unlikely to occur by chance in natural text or ordinary identifiers. The first-line alternative is looser, but it still requires the leading 'P' and the '<<' name separator.
Jurisdictions
global
Regulations
GDPR, CCPA/CPRA
Frameworks
ISO 27001, ISO 27701
Data categories
pii, government-id
Scope
narrow
Risk rating
9
Platform compatibility
Purview: Compatible, GCP DLP: Unsupported, Macie: Unsupported, Zscaler: Compatible, Palo Alto: Unsupported, Netskope: Unsupported

Pattern

(?<![A-Z0-9<])(?:[A-Z0-9<]{9}[0-9<][A-Z<]{3}[0-9<]{6}[0-9<][MFXmfx<][0-9<]{6}[0-9<][A-Z0-9<]{14}[0-9<][0-9<]|P[A-Z<][A-Z]{3}[A-Z]+<<[A-Z<]{2,})(?![A-Za-z0-9])
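A quick way to exercise the pattern is to run it over the published ICAO 9303 specimen MRZ. The sketch below uses Python's `re` module as a stand-in for the Boost engine, on the assumption that the two behave the same for this expression, which uses only character classes, bounded repetition and fixed-width lookarounds. The helper name `find_mrz` is hypothetical.

```python
import re

# The pattern from this entry, copied verbatim.
MRZ_PATTERN = re.compile(
    r"(?<![A-Z0-9<])(?:[A-Z0-9<]{9}[0-9<][A-Z<]{3}[0-9<]{6}[0-9<][MFXmfx<]"
    r"[0-9<]{6}[0-9<][A-Z0-9<]{14}[0-9<][0-9<]"
    r"|P[A-Z<][A-Z]{3}[A-Z]+<<[A-Z<]{2,})(?![A-Za-z0-9])"
)

# ICAO 9303 specimen (fictional state UTO); line 1 is padded to 44 characters.
LINE1 = "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<")
LINE2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"


def find_mrz(text: str) -> list[str]:
    """Return every substring of text that the pattern matches."""
    return [m.group(0) for m in MRZ_PATTERN.finditer(text)]


if __name__ == "__main__":
    # Both specimen lines match, each through a different alternative.
    print(find_mrz(LINE1 + "\n" + LINE2))
    # A bare document number in prose does not match.
    print(find_mrz("Passport number L898902C3 was renewed in 2012."))
```

Because every character class in the line 2 alternative also accepts the filler character, a run of 44 `<` characters satisfies the pattern as well. Markup or ASCII-art separators may therefore need the corroborative keywords to be ruled out.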

Corroborative evidence keywords

passport, MRZ, machine readable zone, machine-readable zone, travel document, ICAO, issuing country, date of expiry

Proximity: 300 characters
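How a platform applies the keyword list and the proximity window is engine-specific. The following sketch shows one plausible interpretation, assuming the window is measured in characters outward from both ends of the match and that keywords are compared case-insensitively. The function name `has_corroboration` is hypothetical.

```python
import re

KEYWORDS = (
    "passport", "MRZ", "machine readable zone", "machine-readable zone",
    "travel document", "ICAO", "issuing country", "date of expiry",
)
PROXIMITY = 300  # characters, as configured in this entry

# One alternation over all keywords; re.escape keeps hyphens and spaces literal.
KEYWORD_RE = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def has_corroboration(text: str, start: int, end: int,
                      window: int = PROXIMITY) -> bool:
    """True if a keyword lies entirely within `window` characters of the
    match spanning text[start:end]."""
    lo = max(0, start - window)
    hi = min(len(text), end + window)
    # pos/endpos restrict the search to the window without copying the text.
    return KEYWORD_RE.search(text, lo, hi) is not None


if __name__ == "__main__":
    line2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
    doc = "Passport scan attached.\n" + line2
    start = doc.index(line2)
    print(has_corroboration(doc, start, start + len(line2)))
```

A keyword that starts outside the window is ignored under this interpretation, even if it is only a few characters too far away.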

Should match

Should not match

Known false positives

References