AI governance lab stresses transparency in mitigating AI misbehavior

Alexandra Reeve Givens, President & CEO of the Center for Democracy & Technology | Official website

The CDT AI Governance Lab has released a report titled "Tuning Into Safety: How Visibility into Safety Training Can Help External Actors Mitigate AI Misbehavior." The report addresses the risks posed by AI foundation models that may respond to queries in unexpected and harmful ways. Because such behavior is difficult to prevent or predict, relying on these models in high-stakes situations warrants caution.

The report emphasizes the importance of transparency in safety training for understanding when and why foundation models behave unexpectedly. This transparency aids deployers, end users, developers, policymakers, and researchers by providing critical information that can guide responsible use and development of these models.

"Transparency into safety training benefits foundation model deployers and end users, by equipping them to make responsible decisions about whether and how to use them," the report states. It also highlights the term "misbehavior" as shorthand for unintended harmful behavior by foundation models.

Safety training includes practices aimed at preventing misbehavior, such as reinforcement learning from human feedback and creating model specifications. However, it does not encompass system-level safeguards like content filters and monitoring systems.
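
For readers unfamiliar with the technique, reinforcement learning from human feedback generally begins by training a reward model on human preference comparisons between candidate responses. The sketch below is purely illustrative and is not drawn from the report; the model, data, and dimensions are hypothetical stand-ins for a single preference-learning step.

    # Illustrative only: one reward-model update of the kind used in
    # RLHF-style safety training. Shapes, data, and the model are hypothetical.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RewardModel(nn.Module):
        """Scores a response embedding; higher scores mean 'more preferred'."""
        def __init__(self, dim: int = 16):
            super().__init__()
            self.score = nn.Linear(dim, 1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.score(x).squeeze(-1)

    reward_model = RewardModel()
    optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

    # Hypothetical batch: embeddings of a human-preferred ("chosen") response
    # and a rejected one for the same prompt.
    chosen = torch.randn(8, 16)
    rejected = torch.randn(8, 16)

    # Bradley-Terry preference loss: push chosen scores above rejected scores.
    loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()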

The report recommends developers disclose three types of information: a model specification outlining safety training goals; key information about safety training datasets; and details on internal evaluations used during safety training. These disclosures would help external actors understand when and why models misbehave.

The report suggests that, by following these recommendations, stakeholders can better anticipate potential misbehavior. While some leading developers already disclose much of this information, there is room for improvement across the industry.
