AI Training Data Reference

Name: AI Training Data Reference
Creator: TestPattern Community
License: https://opensource.org/licenses/MIT

Detects references to AI training data, model datasets, and data provenance documentation in Australian contexts.

Type: keyword_list
Confidence: medium
Confidence justification: Keyword-based detection relies on phrase-matching regex within the Purview engine. Corroborating keywords are needed for reliable identification because AI-related terms also appear in non-sensitive contexts.
Jurisdictions: au
Regulations: AML/CTF Act (Cth), IPA 2009 (Qld), Privacy Act 1988 (Cth)
Frameworks: ISO 27001, NIST CSF, PCI-DSS
Data categories: sensitive-data, intellectual-property
Scope: narrow
Risk rating: 6

Should match

This AI training dataset contains 50,000 labelled records — Matches 'training dataset' via Pattern_ai_training_phrase
Review the training corpus for bias before model training — Matches 'training corpus' and 'model training' via Pattern_ai_training_phrase
Data provenance report for ML dataset version 3.2 — Matches 'data provenance' and 'ML dataset' via Pattern_ai_training_phrase

Should not match

The training schedule for new employees — No AI training phrase matched by the Purview regex
AI-powered search engine results — No training data phrase present
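
The examples above can be reproduced with a minimal Python sketch. It is not the actual Purview definition of Pattern_ai_training_phrase: the phrase list holds only the five phrases named in the examples, and the word boundaries and case-insensitive matching are assumptions. The names AI_TRAINING_PHRASES and find_phrases are hypothetical.

```python
import re

# Hypothetical phrase list: only the phrases named in the examples above.
AI_TRAINING_PHRASES = [
    "training dataset",
    "training corpus",
    "model training",
    "data provenance",
    "ML dataset",
]

# Assumption: whole-word, case-insensitive matching. The real pattern may differ.
PATTERN_AI_TRAINING_PHRASE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in AI_TRAINING_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_phrases(text):
    """Return every AI training phrase found in text, in order of appearance."""
    return [m.group(0) for m in PATTERN_AI_TRAINING_PHRASE.finditer(text)]


if __name__ == "__main__":
    for sample in (
        "Review the training corpus for bias before model training",
        "The training schedule for new employees",
    ):
        print(sample, "->", find_phrases(sample))
```

Under these assumptions, "training schedule" returns no hits because "training" counts only as part of a listed two-word phrase.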

False positives

Generic use of "training" or "dataset" in non-AI contexts, such as staff training, may be flagged. Mitigation: require multiple AI-specific keywords to co-occur within a defined proximity window.
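
The mitigation can be sketched as a proximity check in Python. This is an illustration only, not the Purview implementation: both keyword groups, the 300-character window, and the name has_corroborated_match are assumptions, since the source does not list the corroborative keywords or the proximity value.

```python
import re

# Hypothetical keyword groups: the real corroborative keyword list is not given.
PRIMARY = re.compile(r"\b(?:training|dataset|corpus)\b", re.IGNORECASE)
SUPPORTING = re.compile(
    r"\b(?:AI|ML|model|machine learning|labelled|provenance)\b",
    re.IGNORECASE,
)

# Assumed window size, measured in characters between match start offsets.
PROXIMITY_CHARS = 300


def has_corroborated_match(text, window=PROXIMITY_CHARS):
    """Return True if a primary keyword has a supporting keyword within `window` characters."""
    supporting_positions = [m.start() for m in SUPPORTING.finditer(text)]
    for primary in PRIMARY.finditer(text):
        if any(abs(primary.start() - pos) <= window for pos in supporting_positions):
            return True
    return False


if __name__ == "__main__":
    print(has_corroborated_match("The training schedule for new employees"))
    print(has_corroborated_match("This AI training dataset contains 50,000 labelled records"))
```

A lone "training" or "dataset" is ignored unless an AI-specific term appears nearby, which is the co-occurrence requirement described above. A tighter window reduces false positives but also misses documents where the corroborating term sits further away.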