





The organization should establish and document data quality standards for all data used to build or run AI systems. These standards should address aspects such as accuracy, completeness, consistency, timeliness, and representativeness, and aim to minimize bias. Processes should be implemented to ensure that all data used for AI systems complies with these defined quality standards.
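As an illustration only, such standards can be backed by automated checks. The sketch below is hypothetical: the record fields (`id`, `value`, `collected_on`), the one-year timeliness threshold, and the rule set are placeholders for whatever the organization actually documents.

```python
from datetime import date, timedelta

# Illustrative record schema and thresholds -- substitute the organization's
# own documented quality criteria.
REQUIRED_FIELDS = {"id", "value", "collected_on"}
MAX_AGE = timedelta(days=365)  # example timeliness threshold

def completeness_violations(records):
    """Records missing one of the required fields (completeness)."""
    return [r for r in records if not REQUIRED_FIELDS <= r.keys()]

def timeliness_violations(records, today):
    """Records older than the agreed timeliness threshold."""
    return [r for r in records if today - r["collected_on"] > MAX_AGE]

def consistency_violations(records):
    """Duplicate ids indicate conflicting or inconsistent entries."""
    seen, dups = set(), []
    for r in records:
        rid = r.get("id")
        if rid in seen:
            dups.append(r)
        seen.add(rid)
    return dups
```

Checks like these only enforce standards that have already been defined and documented; they do not replace the standards themselves.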






The organisation should establish a process to verify that its training, validation, and testing data sets are suitable for the AI system's intended purpose. This validation should confirm that the data is relevant, representative, and possesses the necessary statistical properties. The process must specifically examine the data for biases that could lead to discrimination against particular groups of people.
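A minimal sketch of one such representativeness check, assuming records carry a protected-attribute field and that the organisation has set its own minimum-share threshold (the 10% default here is purely illustrative):

```python
from collections import Counter

def representation_report(records, attribute, min_share=0.10):
    """Share of each group for a given attribute, flagging groups whose
    share falls below the organisation's documented minimum."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    shares = {group: n / total for group, n in counts.items()}
    flagged = [group for group, share in shares.items() if share < min_share]
    return shares, flagged
```

A full suitability process would add further statistical tests; this only demonstrates the flagging step.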






The organisation should create and maintain documentation for each data set used in its AI systems. This documentation should describe the data's key characteristics, such as its origin, statistical properties, and any known limitations or biases. The purpose is to demonstrate that each data set is relevant, representative, and sufficiently complete for the system's intended purpose.
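One common way to keep such documentation structured is a "datasheet" record per data set. The sketch below is a minimal example; the field names and sample values are hypothetical, not prescribed by any standard.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class DatasetSheet:
    """Minimal datasheet mirroring the characteristics the control names:
    origin, statistical properties, and known limitations or biases."""
    name: str
    origin: str                      # where the data came from
    intended_purpose: str            # purpose it was validated against
    statistical_properties: dict = field(default_factory=dict)
    known_limitations: list = field(default_factory=list)
    known_biases: list = field(default_factory=list)

# Hypothetical example entry
sheet = DatasetSheet(
    name="loan-applications-2024",
    origin="internal CRM export",
    intended_purpose="credit scoring model training",
    statistical_properties={"rows": 120_000, "missing_rate": 0.02},
    known_limitations=["EU applicants only"],
)
```

Serialising such records (e.g. via `asdict`) makes them easy to version alongside the data itself.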






The organisation should apply state-of-the-art security and privacy-preserving measures, including pseudonymisation and anonymisation, to special categories of personal data processed for bias detection and correction in high-risk AI systems. This includes implementing appropriate encryption, access controls, and data minimisation techniques.
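As one illustrative building block, direct identifiers can be pseudonymised with keyed hashing. This is a sketch, not a complete privacy solution: the key must be stored separately under strict access control, and anonymisation proper requires additional measures.

```python
import hmac
import hashlib

def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a stable pseudonym using
    HMAC-SHA256. The same identifier and key always yield the same
    pseudonym; without the key, the mapping cannot be recomputed."""
    return hmac.new(secret_key, identifier.encode(), hashlib.sha256).hexdigest()
```

Keyed hashing is preferable to a plain hash here because it resists dictionary attacks on low-entropy identifiers as long as the key stays secret.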






The organisation should establish a formal process to determine if the use of special categories of personal data is strictly necessary for bias detection and correction. This process must include an evaluation of whether the objective can be achieved with other data, such as synthetic or anonymised data. The outcome of this evaluation should be documented to justify the decision.






Providers of high-risk AI systems should establish and maintain procedures for retaining logs automatically generated by high-risk AI systems under their control. The retention period for these logs must be at least six months, or longer if required by relevant Union or national law, particularly data protection regulations.
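The retention floor can be enforced as a simple gate before any deletion job runs. The sketch below assumes a six-month floor of 183 days; the actual figure and any stricter legal minimum must come from the organisation's documented retention schedule.

```python
from datetime import date, timedelta

MIN_RETENTION = timedelta(days=183)  # at least six months; law may require more

def may_delete(log_created: date, today: date,
               legal_minimum: timedelta = MIN_RETENTION) -> bool:
    """A log may only be considered for deletion once the retention floor
    has passed; deletion itself still needs a documented decision."""
    return today - log_created >= max(legal_minimum, MIN_RETENTION)
```

Using `max()` means a longer legally mandated period always wins over the default floor.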






The organisation should establish and document its data management processes for high-risk AI systems. These processes must cover the entire data lifecycle, from collection to retention.
The procedures should ensure that data used for developing and operating the AI system is relevant, representative, and handled securely. This includes defining policies for each stage of the data lifecycle, such as collection, preparation, storage, use, and retention.






Technical documentation should be created and maintained for each AI model, covering at a minimum the elements required by the applicable regulation.
The documentation should be version-controlled, periodically reviewed, and retained in a form that can be provided to relevant authorities without undue delay upon request.






GPAI model providers should draw up, maintain, and make available documentation that enables AI system providers to understand the capabilities and limitations of the model and to comply with their own obligations.
Disclosure should be structured to protect intellectual property rights, confidential business information, and trade secrets in accordance with EU and national law. Access controls, licensing terms, and confidentiality mechanisms should also be defined and documented.






The organization should define and document a procedure for ensuring data provenance in its AI systems. This procedure should detail how the origin, transformations, and usage of data are tracked and logged throughout the entire lifecycle of both the data and the AI system. The aim is to ensure data traceability and accountability for all data utilized within AI solutions.
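Provenance logging is often implemented as an append-only chain of records, each committing to its predecessor by hash so that later tampering with history is detectable. A minimal sketch, with hypothetical event fields:

```python
import hashlib
import json

def provenance_entry(previous_hash: str, event: dict) -> dict:
    """Append-only provenance record: the hash covers both the event and
    the previous entry's hash, linking the chain."""
    payload = json.dumps({"prev": previous_hash, "event": event},
                         sort_keys=True).encode()
    return {"prev": previous_hash, "event": event,
            "hash": hashlib.sha256(payload).hexdigest()}

def verify_chain(entries) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = "genesis"
    for e in entries:
        payload = json.dumps({"prev": prev, "event": e["event"]},
                             sort_keys=True).encode()
        if e["prev"] != prev or e["hash"] != hashlib.sha256(payload).hexdigest():
            return False
        prev = e["hash"]
    return True
```

This shows the traceability mechanism only; a production system would also record who performed each operation and when, and store the chain in tamper-resistant storage.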






The organization should establish and implement robust data governance and management practices specifically for training, validation, and testing datasets utilized in high-risk AI systems. These practices must be tailored to the specific intended purpose of each AI system.
Particular attention should be paid to the management of data preparation processing operations, such as annotation, labelling, cleaning, updating, enrichment, and aggregation.
The organization should ensure that personnel involved in these data preparation activities are adequately trained and that the processes are documented and regularly reviewed for effectiveness and compliance.






The organisation should establish and maintain processes to identify relevant data gaps or shortcomings in training, validation, and testing datasets that could prevent compliance with regulatory requirements. The organisation should also develop and implement strategies to address these identified gaps.






The organisation should establish and document the criteria for the training, validation, and testing data sets used in its high-risk AI systems. The documentation should define what constitutes suitable data quality in view of the system's intended purpose. This includes specifying requirements for data relevance, representativeness, completeness, and statistical properties to ensure the data is fit for purpose and minimises biases.






The organization should establish processes to identify the specific geographical, contextual, behavioural, or functional characteristics of the setting in which an AI system is intended to be used, and to incorporate them into its data sets to the extent required by the system's intended purpose. This ensures the data sets accurately reflect the intended operational environment.
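A basic coverage check can reveal whether the required operational contexts are present in a data set at all. The attribute name `region` and the context values below are illustrative only:

```python
def context_gaps(records, required_contexts, attribute="region"):
    """Return the contexts the system must operate in that are absent
    from the data set. `attribute` names the record field that encodes
    the operational context (illustrative)."""
    present = {r.get(attribute) for r in records}
    return sorted(required_contexts - present)
```

Presence alone is not sufficiency, so a gap list like this would feed into the representativeness analysis rather than replace it.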






When processing special categories of personal data for bias detection and correction in high-risk AI systems, the organisation should ensure that the records of processing activities clearly document the strict necessity of such processing. The documentation should also explain why the objective of bias detection and correction could not be achieved by processing other types of data, such as synthetic or anonymised data.
Access to this data should be subject to strict, documented controls that limit it to authorised persons bound by confidentiality obligations. Transmission or transfer of, or access to, such data by other parties should be strictly prohibited.






The organisation should establish and implement a clear policy and procedure for the deletion of special categories of personal data processed for AI bias detection and correction. This data should be deleted promptly once the bias has been corrected or the defined retention period has been reached, whichever occurs first. The procedure should ensure compliance with relevant data protection regulations and internal retention schedules.
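The "whichever occurs first" rule is easy to get wrong in ad hoc scripts, so it is worth encoding explicitly. A minimal sketch, assuming the bias-correction status and retention end date are tracked per data set:

```python
from datetime import date

def must_delete(bias_corrected: bool, retention_end: date,
                today: date) -> bool:
    """Special-category data must be deleted as soon as EITHER condition
    is met: the bias has been corrected, or the retention period has
    expired -- whichever occurs first."""
    return bias_corrected or today >= retention_end
```

The deletion job itself would then run this check per data set and record the outcome for audit purposes.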






The organisation should implement technical controls and configurations to ensure that special categories of personal data, processed for bias detection and correction in high-risk AI systems, are subject to strict limitations on their re-use beyond the defined purpose.
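Purpose limitation can be expressed as a technical control by binding the data to its single permitted purpose at the access layer. This wrapper class is a sketch of the idea, not a complete solution; real systems would combine it with access logging and platform-level controls.

```python
class PurposeBoundData:
    """Wraps special-category records with the one purpose they were
    collected for; access for any other purpose is refused."""

    def __init__(self, records, purpose="bias detection and correction"):
        self._records = records
        self._purpose = purpose

    def access(self, requested_purpose: str):
        if requested_purpose != self._purpose:
            raise PermissionError(
                f"re-use for '{requested_purpose}' is not permitted")
        return list(self._records)
```

Refusing at the access point makes the re-use restriction auditable: every attempted out-of-purpose read surfaces as an explicit error rather than silent re-use.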






The organisation should establish a process for managing test data for high-risk AI systems that are not developed using model training techniques. This process should ensure the test data sets are relevant, representative, complete, and error-free to properly validate the system's performance, safety, and compliance.






The organisation should establish a process for the ongoing testing and validation of its high-risk AI systems to identify and address model flaws. This process should include activities like stress testing, identifying edge cases, and checking for unintended behaviour. Any discovered flaws should be documented, their risks evaluated, and a plan for remediation should be put in place to ensure model integrity and resilience.
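The documentation side of this process can be automated with a small harness that runs a model callable against a catalogue of edge cases and collects flaws for the remediation log. The `predict` callable and case structure below are hypothetical:

```python
def run_edge_case_suite(predict, cases):
    """Run `predict` against documented edge cases. Each case is a tuple
    (name, inputs, expected) where `expected` is a predicate on the
    output. Unexpected outputs and raised exceptions are both recorded
    as flaws for later risk evaluation and remediation."""
    flaws = []
    for name, inputs, expected in cases:
        try:
            out = predict(inputs)
        except Exception as exc:
            flaws.append({"case": name, "error": repr(exc)})
            continue
        if not expected(out):
            flaws.append({"case": name, "output": out})
    return flaws
```

Stress testing and systematic edge-case discovery need dedicated tooling; this harness only shows how discovered cases can be kept as a regression suite.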






Financial institutions providing high-risk AI systems and subject to Union financial services law regarding internal governance, arrangements, or processes, should ensure that automatically generated logs from these AI systems are maintained as an integral part of the documentation required by that financial services law. This ensures compliance with specific regulatory frameworks governing financial institutions.






The organization should establish, document, and maintain procedures for managing training, validation, and testing data sets to ensure they are suitable for the system's intended purpose. The process should include detecting, documenting, and mitigating potential biases and other data limitations. These procedures should define how data is collected, prepared, stored, used, and disposed of.
The organization should ensure data management procedures are consistently applied throughout the AI system development lifecycle to support responsible and effective AI development.






The organization should define and document the criteria for selecting data preparation approaches. This documentation should also include the specific methods that the organization will use for data preparation, ensuring consistency and effectiveness in data handling for AI systems.






Providers of GPAI models should document the use of free and open-source licences in the development, modification, and distribution of AI models.
The documentation should, at a minimum, identify the licensed components, the licences that apply to them, and the conditions attached to their use.






The organisation should define and implement cybersecurity measures to protect its general-purpose AI models with systemic risk. This includes protection against threats for both the model itself and its supporting physical infrastructure.
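One basic control in this area is integrity verification of model artefacts: recording a digest of the released weights and checking it before deployment detects unauthorised modification. A minimal sketch over raw weight bytes:

```python
import hashlib
import hmac

def weights_digest(weight_bytes: bytes) -> str:
    """SHA-256 digest of serialised model weights, recorded at release."""
    return hashlib.sha256(weight_bytes).hexdigest()

def integrity_ok(weight_bytes: bytes, reference_digest: str) -> bool:
    """Compare against the recorded reference; compare_digest gives a
    constant-time comparison."""
    return hmac.compare_digest(weights_digest(weight_bytes), reference_digest)
```

This addresses only one narrow threat (tampering with stored weights); the control as a whole also covers infrastructure hardening, access control, and protection against model theft.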






The organization should define suitable means to identify outputs when it is necessary to ensure the conformity of AI products and services. This may involve assigning unique identifiers, such as serial numbers, batch codes, or physical labels, to products or batches.
The organization should also identify the status of outputs with respect to monitoring and measurement requirements throughout production and service provision.
When traceability is a requirement, the organization should also define controls for the unique identification of outputs and retain the documented information necessary to enable traceability.






The organisation should implement and document a data preparation process to ensure data sets are as free of errors and as complete as possible. This process should define procedures for identifying and rectifying data errors. It should also specify how to handle missing or incomplete data in a manner that aligns with the AI system's intended purpose and minimises bias.
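Mean imputation is one of several possible strategies for missing numeric values; whichever is chosen must be documented and assessed for its bias impact. A sketch with a hypothetical record layout:

```python
def impute_mean(records, field_name):
    """Fill missing (None) numeric values with the mean of the observed
    values for that field. Returns new records; originals are untouched."""
    observed = [r[field_name] for r in records
                if r.get(field_name) is not None]
    fill = sum(observed) / len(observed)
    return [dict(r, **{field_name: fill}) if r.get(field_name) is None
            else dict(r)
            for r in records]
```

Note that mean imputation can itself distort distributions for under-represented groups, which is exactly why the control requires the handling of missing data to be aligned with the intended purpose and checked for bias.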






The organization should exercise care with property (e.g. confidential data, personal data, materials, components, tools, equipment, premises, IP) belonging to customers or stakeholders while it is under the organization's control or being used by the organization in the development of high-risk AI systems. The organization should describe general rules for doing this.
If property provided by customers or external providers is incorporated into the organization's own AI products or services, the organization should have a process for separately identifying, verifying, and protecting that property.
When any external property is lost, damaged, or otherwise found to be unsuitable for use, the organization should report this to the customer or external provider and retain documented information on what has occurred.






The organisation should create and publish a summary of the content used to train its general-purpose AI models. This summary should provide a detailed overview of the data sources and the composition of the training dataset, following the template provided by the AI Office.