Japan Social Insurance Number (社会保険番号)

Detects Japan Social Insurance Number (社会保険番号, SIN) values in the format XXXX-XXXXXX-X. This pattern is based on a Microsoft Purview built-in sensitive information type (SIT). Users already running Purview may prefer to enable the built-in SIT directly, or use this version as a starting point for customisation.

Type: regex
Engine: universal
Confidence: high
Confidence justification: High confidence: the pattern has strong structural constraints (a fixed 4-6-1 digit layout with dash separators) that significantly reduce false positive rates. Context keyword gating and explicit template/example exclusion further improve precision for this high-risk identifier.
Detection quality: Verified
Jurisdictions: jp
Regulations: APPI
Frameworks: ISO 27001, ISO 27701
Data categories: pii, government-id
Scope: narrow
Risk rating: 9
Platform compatibility: Purview: Compatible, GCP DLP: Compatible, Macie: Compatible, Zscaler: Compatible, Palo Alto: Compatible, Netskope: Compatible

Pattern

\b\d{4}-\d{6}-\d{1}\b
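
As a minimal sketch of how the pattern above behaves, the snippet below compiles it with Python's `re` module. The choice of engine and the helper name `find_candidates` are assumptions made for illustration; they are not part of the catalog entry.

```python
import re

# Pattern copied verbatim from the entry; Python's re engine is an assumption.
SIN_PATTERN = re.compile(r"\b\d{4}-\d{6}-\d{1}\b")


def find_candidates(text: str) -> list[str]:
    """Return every substring with the 4-6-1 dash-separated digit layout.

    Hypothetical helper: it applies the regex only, with no keyword gating.
    """
    return SIN_PATTERN.findall(text)


if __name__ == "__main__":
    print(find_candidates("SIN: 1234-567890-1 on file"))
```

In Python 3, `\b` is Unicode-aware and kanji count as word characters, so a value written directly after Japanese text with no space or punctuation (for example `番号1234-567890-1`) is not matched by this sketch. Other engines may treat word boundaries differently, so this is worth checking on each target platform.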

Corroborative evidence keywords

社会保険, social insurance, SIN, ID number, identification, ID card, license, permit, registration, certificate, field, column, row, entry, record, value, form, register, database, extract (+19 more)

Proximity: 300 characters
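
The keyword and proximity settings above can be sketched as a simple gating step. This is an illustrative approximation only: the function name `matches_with_context` is hypothetical, only four of the listed keywords are included, and measuring the 300-character window on both sides of the match is an assumption about how a given DLP engine applies proximity.

```python
import re

SIN_PATTERN = re.compile(r"\b\d{4}-\d{6}-\d{1}\b")

# Subset of the corroborative keywords listed above (the full list is longer).
# ASCII keywords use word boundaries so that "SIN" does not fire on "single".
KEYWORD_PATTERN = re.compile(
    r"社会保険|\bsocial insurance\b|\bSIN\b|\bID number\b",
    re.IGNORECASE,
)

PROXIMITY = 300  # characters, taken from the entry


def matches_with_context(text: str) -> list[str]:
    """Keep only pattern matches that have a keyword within PROXIMITY characters.

    Assumption: the window extends PROXIMITY characters before and after the match.
    """
    hits = []
    for m in SIN_PATTERN.finditer(text):
        start = max(0, m.start() - PROXIMITY)
        window = text[start:m.end() + PROXIMITY]
        if KEYWORD_PATTERN.search(window):
            hits.append(m.group())
    return hits


if __name__ == "__main__":
    print(matches_with_context("社会保険番号: 1234-567890-1"))
    print(matches_with_context("Order ref 1234-567890-1 shipped"))
```

A value with no keyword inside the window is dropped, which is the behaviour the corroborative evidence is meant to provide.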

Should match

1234-567890-1 — Japanese SIN with dashes
9876-543210-9 — Another SIN format
4567-890123-4 — Valid SIN number

Should not match

1234-56789-1 — Too few digits in middle group
1234-567890-12 — Too many digits in last group
sample template placeholder number 123456789 — Template/sample context should be excluded even when numeric-like values appear
template example placeholder record identifier — Template/sample context should be excluded even when anchor words are present
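
The sample lists above can be checked against the pattern with a small table-driven sketch. The helper name `check_samples` is hypothetical, and the check uses the regex alone, without keyword gating or exclusion rules.

```python
import re

SIN_PATTERN = re.compile(r"\b\d{4}-\d{6}-\d{1}\b")

# Sample values copied from the "Should match" and "Should not match" lists.
SHOULD_MATCH = ["1234-567890-1", "9876-543210-9", "4567-890123-4"]
SHOULD_NOT_MATCH = [
    "1234-56789-1",
    "1234-567890-12",
    "sample template placeholder number 123456789",
    "template example placeholder record identifier",
]


def check_samples() -> dict[str, bool]:
    """Map each sample to True if the regex finds a match in it."""
    return {s: bool(SIN_PATTERN.search(s)) for s in SHOULD_MATCH + SHOULD_NOT_MATCH}


if __name__ == "__main__":
    for sample, matched in check_samples().items():
        print(f"{matched!s:5}  {sample}")
```

Note that the two template samples contain no value in the XXXX-XXXXXX-X layout, so the regex rejects them on its own. They exercise the exclusion rules only in an engine that also evaluates keyword context.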

Known false positives

The specific dash-separated format (XXXX-XXXXXX-X) reduces false positives considerably, although unrelated numbers that happen to share this layout can still match. Mitigation: The structured format provides good inherent validation. Keyword context further improves accuracy.
Across multiple languages, similar terminology appears in formal or administrative contexts (education, professional documentation) that do not constitute sensitive data collection. Mitigation: Layer with additional contextual signals such as structured identifiers, form fields, or database column headers to distinguish sensitive records from general references.

References

Japan Social Insurance Number (SIN) entity definition - Microsoft Learn