AI Foundation Model Transparency Act of 2023
While the AI Foundation Model Transparency Act of 2023 (H.R. 6881) did not pass, it offers useful insight into the regulator's point of view.
The bill was introduced in the U.S. House of Representatives on December 22, 2023. It was referred to the House Committee on Energy and Commerce. It did not progress further through the legislative process in the 118th Congress.
This summary is prepared for data governance and privacy professionals building operational safeguards at an AI technology company that develops and releases foundation models:
Core Purpose:
The Act directs the Federal Trade Commission (FTC) to establish regulations requiring transparency about the training data and algorithms used in AI foundation models. The goal is to address concerns about copyright infringement, bias, inaccuracy, and consumer protection, while equipping users with information to enforce rights and make informed decisions.
Key Definitions for Your Company:
- Foundation Model: Defined as an AI model trained on broad data, generally using self-supervision, with at least 1 billion parameters, applicable across contexts, and capable of performing (or easily modifiable to perform) tasks that pose serious risks to security, the economy, or health and safety. This definition holds even if technical safeguards are applied to limit unsafe capabilities.
- Covered Entity: Your company would likely be considered a "covered entity" if it provides a foundation model generating over 100,000 monthly outputs or used by over 30,000 monthly users (including through second-party entities). The FTC has the authority to update these thresholds.
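The covered-entity thresholds above lend themselves to a simple automated check. The sketch below is purely illustrative; the function and field names are assumptions, not terms from the bill, and the thresholds are those stated in H.R. 6881 (which the FTC could later update):

```python
from dataclasses import dataclass

@dataclass
class ModelUsageStats:
    """Monthly usage figures for one foundation model (hypothetical structure)."""
    monthly_outputs: int  # outputs generated by the model in a month
    monthly_users: int    # distinct monthly users, including via second-party entities

# Thresholds as stated in H.R. 6881; the FTC would have authority to update them.
OUTPUT_THRESHOLD = 100_000
USER_THRESHOLD = 30_000

def is_likely_covered_entity(stats: ModelUsageStats) -> bool:
    """Return True if either statutory threshold is exceeded."""
    return (stats.monthly_outputs > OUTPUT_THRESHOLD
            or stats.monthly_users > USER_THRESHOLD)
```

A check like this could feed a periodic compliance dashboard, flagging models as they approach either threshold.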
Operational Safeguards & Transparency Requirements:
The FTC would be required to establish specific standards (within 9 months of the Act's enactment) detailing the information that covered entities must provide. Your data governance and privacy functions should prepare to document and potentially disclose the following for each foundation model:
- Training Data:
- Sources: Document all sources used for training data, including details on personal data collection and information necessary to help copyright/license holders enforce their rights.
- Collection/Retention During Inference: Detail whether and how data is collected and retained when users interact with the model (inference).
- Composition: Describe the size and composition of training data, including broad demographic, language, and other attribute information, while ensuring privacy is accounted for.
- Governance: Detail data governance procedures, including how data was edited, filtered, or curated.
- Labeling: Describe how data was labeled and how the labeling process was validated.
- Model Information:
- Purpose & Limitations: Document the model's intended uses, foreseen limitations, potential risks, version history, and release date.
- Risk Management Alignment: Describe efforts to align the model and its transparency with frameworks like the NIST AI Risk Management Framework or similar standards.
- Performance & High-Risk Areas: Provide performance details based on evaluations/audits (especially on standard benchmarks) and describe precautions taken for high-risk queries (e.g., medical, financial, hiring, policing, elections).
- Computational Resources: Information on the computational power needed for training and operation.
- Public Disclosure:
- The FTC standards would dictate precisely what information must be submitted directly to the Commission and what must be made publicly available.
- Publicly available information must be displayed on the covered entity's website and also submitted for inclusion in a central, machine-readable repository hosted by the FTC.
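Because publicly disclosed information would need to be machine-readable and suitable for a central repository, the documentation items above could be captured in a structured record that serializes to JSON. This is a hypothetical sketch: the actual schema would be set by the forthcoming FTC standards, and every field name here is an assumption:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class TrainingDataDisclosure:
    """Training-data items from the bill's transparency list (field names assumed)."""
    sources: list[str] = field(default_factory=list)   # data sources, incl. copyright/license info
    personal_data_collected: bool = False
    inference_data_retained: bool = False              # data collected/retained during inference
    composition_summary: str = ""                      # size, demographics, languages, attributes
    governance_procedures: str = ""                    # how data was edited, filtered, curated
    labeling_process: str = ""                         # how labels were applied and validated

@dataclass
class ModelTransparencyRecord:
    """One model's disclosure record, serializable for a machine-readable repository."""
    model_name: str
    version: str
    release_date: str
    intended_uses: list[str] = field(default_factory=list)
    known_limitations: list[str] = field(default_factory=list)
    risk_framework_alignment: str = ""                 # e.g., NIST AI Risk Management Framework
    benchmark_results: dict[str, float] = field(default_factory=dict)
    high_risk_precautions: list[str] = field(default_factory=list)
    training_compute: str = ""                         # computational resources for training/operation
    training_data: TrainingDataDisclosure = field(default_factory=TrainingDataDisclosure)

    def to_json(self) -> str:
        """Serialize in the kind of machine-readable form a central repository might ingest."""
        return json.dumps(asdict(self), indent=2)
```

Maintaining a record like this per model version would let the same source of truth feed both the company website and any FTC submission.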
Considerations & Enforcement:
- The FTC would consult various stakeholders (NIST, OSTP, the Copyright Office, industry, academia, civil rights groups) when creating the standards.
- The standards may include alternative provisions for open-source or derived foundation models.
- Regulations would take effect 90 days after being finalized by the FTC and would be reviewed and updated periodically.
- Non-compliance would be treated as an unfair or deceptive act or practice under the Federal Trade Commission Act, subjecting the entity to FTC enforcement actions and penalties.
The Broader Industry Discussion Around AI Transparency:
- General Need for Transparency: The bill was introduced amid growing calls for transparency due to concerns about bias, misinformation, and copyright infringement related to AI foundation models. Proponents, like the bill's sponsors and creative industry groups (SAG-AFTRA, Authors Guild, Universal Music), emphasized the need for users and copyright holders to understand how models are trained.
- Industry Opacity: Studies like the Stanford Foundation Model Transparency Index highlighted a general lack of transparency across the industry, particularly concerning training data, labor, and computing resources.
- Concerns about Compliance Burden: Some analyses, particularly from venture capital perspectives, raised concerns that broad transparency mandates could disproportionately burden smaller tech companies ("Little Tech") compared to larger platforms, potentially hindering competition and innovation. There were also arguments that some proposed disclosure requirements might be unconstitutional.
- Varying Regulatory Approaches: The discussion around this bill occurred within a complex landscape of differing state, federal, and international approaches to AI regulation, creating uncertainty for businesses.
In summary: As a data governance professional, your focus should be on establishing robust internal processes to meticulously document training data provenance, governance practices, model characteristics, risk assessments, and inference data handling. Prepare systems to readily provide specified information both to the FTC and publicly in a machine-readable format, ensuring alignment with the forthcoming FTC standards.
Note: This document is prepared using Google Gemini 2.5 Pro (experimental)