AI Training Data Reference
Detects references to AI training data, model datasets, and data provenance documentation in Australian contexts.
- Type: keyword_list
- Confidence: medium
- Confidence justification: Medium confidence: keyword-based detection relies on phrase-matching regex within the Purview engine. Corroborative evidence keywords are needed for reliable identification since AI-related terms can appear in non-sensitive contexts.
- Jurisdictions: au
- Regulations: AML/CTF Act (Cth), IPA 2009 (Qld), Privacy Act 1988 (Cth)
- Frameworks: ISO 27001, NIST CSF, PCI-DSS
- Data categories: sensitive-data, intellectual-property
- Scope: narrow
- Risk rating: 6
Should match
- "This AI training dataset contains 50,000 labelled records": matches 'training dataset' via Pattern_ai_training_phrase
- "Review the training corpus for bias before model training": matches 'training corpus' and 'model training' via Pattern_ai_training_phrase
- "Data provenance report for ML dataset version 3.2": matches 'data provenance' and 'ML dataset' via Pattern_ai_training_phrase
Should not match
- "The training schedule for new employees": no AI training phrase matched by the Purview regex
- "AI-powered search engine results": no training data phrase present
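The examples above can be reproduced with a minimal Python sketch. The actual definition of Pattern_ai_training_phrase is not shown in this entry, so the phrase list below is an assumption built only from the phrases named in the examples, and `find_phrases` is a hypothetical helper. It uses Python's `re` module rather than the Purview engine.

```python
import re
from typing import List

# Assumed phrase list, taken only from the examples in this entry; the real
# Pattern_ai_training_phrase keyword list may differ.
AI_TRAINING_PHRASES = [
    "training dataset",
    "training corpus",
    "model training",
    "data provenance",
    "ML dataset",
]

# Case-insensitive alternation of the phrases, anchored on word boundaries.
PATTERN_AI_TRAINING_PHRASE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in AI_TRAINING_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_phrases(text: str) -> List[str]:
    """Return every matched phrase, lower-cased, in order of appearance."""
    return [m.group(0).lower() for m in PATTERN_AI_TRAINING_PHRASE.finditer(text)]


if __name__ == "__main__":
    for sample in [
        "This AI training dataset contains 50,000 labelled records",
        "Review the training corpus for bias before model training",
        "The training schedule for new employees",
    ]:
        print(sample, "->", find_phrases(sample))
```

Matching whole phrases rather than the single word "training" is what keeps the staff-training sentence out of the results in this sketch.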
Known false positives
- Generic use of 'training' or 'dataset' in non-AI contexts, such as staff training. Mitigation: require multiple AI-specific keywords to co-occur within proximity.
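The proximity mitigation can be sketched as follows. This is an illustration only: the primary and supporting term lists, the `has_corroborated_match` helper, and the 300-character default window are all assumptions, not values taken from this pattern's configuration.

```python
import re

# Assumed generic terms that are ambiguous on their own.
PRIMARY = re.compile(r"\b(?:training|dataset|corpus)\b", re.IGNORECASE)

# Assumed AI-specific corroborating terms.
SUPPORTING = re.compile(
    r"\b(?:AI|ML|model|machine learning|neural network|labelled)\b",
    re.IGNORECASE,
)


def has_corroborated_match(text: str, window: int = 300) -> bool:
    """True if any primary term has a supporting term within `window` characters."""
    supporting_spans = [m.span() for m in SUPPORTING.finditer(text)]
    for hit in PRIMARY.finditer(text):
        for start, end in supporting_spans:
            # Character gap between the nearest edges of the two matches.
            gap = max(start - hit.end(), hit.start() - end, 0)
            if gap <= window:
                return True
    return False


if __name__ == "__main__":
    print(has_corroborated_match("The training schedule for new employees"))
    print(has_corroborated_match("This AI training dataset contains 50,000 labelled records"))
```

A narrower window reduces false positives from long documents where an AI term and an unrelated use of "training" appear far apart, at the cost of missing references whose supporting context sits further away.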