The Appleton Times

Truth. Honesty. Innovation.

Technology

AI Agents Are Getting Better. Their Safety Disclosures Aren't

By Michael Thompson

4 days ago


MIT researchers have found that while AI agents are advancing rapidly in capabilities, developers provide limited public information on safety measures. The study of 67 systems reveals only 19% disclose formal safety policies, highlighting a transparency gap as these autonomous tools integrate into real-world workflows.

In the rapidly evolving world of artificial intelligence, a new breed of technology known as AI agents is capturing widespread attention, promising to handle complex tasks with minimal human oversight. From viral sensations like OpenClaw and Moltbook to OpenAI's ambitious plans to enhance its agent capabilities, 2024 appears poised to be the year of the AI agent. These systems go beyond simple chatbots: they can plan, write code, browse the web, and carry out multistep tasks autonomously. Yet a recent study by researchers at the Massachusetts Institute of Technology reveals a troubling disconnect: while developers eagerly showcase these agents' prowess, they are far less forthcoming about safety measures.

The MIT AI Agent Index, which cataloged 67 deployed agentic systems, highlights this imbalance in transparency. According to the researchers, around 70 percent of these agents come with some form of documentation, and nearly half have their code publicly available. However, only about 19 percent disclose a formal safety policy, and fewer than 10 percent report results from external safety evaluations. 'Leading AI developers and startups are increasingly deploying agentic AI systems that can plan and execute complex tasks with limited human involvement,' the researchers wrote in their paper. 'However, there is currently no structured framework for documenting … safety features of agentic systems.'

This gap is particularly concerning given the nature of AI agents. Unlike traditional AI models that generate text or images in isolation, agents operate with underspecified objectives, pursuing goals over time and taking actions that impact real environments with little human intervention. They break down broad instructions into subtasks, use tools, plan sequences of actions, and iterate on their progress. Examples include agents that manage workflows, coordinate with desktop tools, or even handle sensitive operations like accessing files or sending emails.

The appeal of such autonomy is clear. Developers tout demos, benchmarks, and practical applications, positioning agents as digital assistants that act on behalf of users. Systems like those from OpenAI are being integrated into everyday workflows, particularly in fields like software engineering and computer use, where they interact with sensitive data. But this power also amplifies risks. When an agent makes a mistake—whether due to a flaw in its planning or an exploit—it can propagate errors across multiple steps, potentially leading to data breaches, unauthorized actions, or financial losses.

MIT's findings underscore that while capabilities are broadcast widely, guardrails remain opaque. The study does not declare these agents inherently unsafe, but it points to lopsided transparency: internal testing procedures, third-party risk audits, and safety evaluations are rarely shared publicly. In domains where agents exercise meaningful control over systems, that missing detail could hinder users' ability to assess risks before deployment.

'The research underscores that while developers are quick to tout the capabilities and practical application of agentic systems, they are also quick to provide limited information regarding safety and risk,' the MIT paper states. 'The result is a lopsided kind of transparency.'

To qualify for the MIT index, a system had to demonstrate true agency: operating without fully defined goals, making intermediate decisions independently, and affecting external environments autonomously. This excludes basic chatbots, focusing instead on those that can, for example, modify documents, make purchases, or navigate software interfaces on their own. The researchers deliberately curated this list from deployed systems available as of early 2024, drawing from announcements by major players in the AI space.

The timing of this study aligns with a surge in agent development. OpenAI, for one, has signaled intentions to elevate its agent features, building on tools like its GPT models to create more proactive systems. Meanwhile, open-source projects such as OpenClaw and Moltbook have gone viral, demonstrating agents that code, research, and automate tasks in ways that feel almost human-like. These advancements are fueled by improvements in large language models, which enable agents to reason through complex scenarios.

Yet the MIT researchers emphasize that autonomy sharply raises the stakes. A simple text-generation error might be harmless, but an agent's failure partway through a chained task (incorrectly processing an email, say, or altering a database) can cascade through every subsequent step. Despite this, most developers do not publicly detail how they test for such scenarios, leaving users and regulators in the dark about potential vulnerabilities.

Additional reporting supports the MIT conclusions. A summary from CNET, which broke the story, notes that 'a study led by MIT researchers found that agentic AI developers seldom publish detailed information about how these tools were tested for safety.' That account matches the study's reported figures and observations.

Experts in the field have long warned about the need for better safety protocols in AI. While the MIT index focuses on deployed systems, it echoes broader concerns raised in forums like the AI Safety Summit held in the UK last year, where global leaders discussed risks from advanced AI. Organizations such as the Center for AI Safety have advocated for mandatory disclosures, arguing that voluntary reporting falls short as competition drives rapid deployment.

On the developer side, some companies defend their approach by citing proprietary concerns. For example, leaders at OpenAI have previously stated in interviews that safety testing is rigorous internally but not always publicized to avoid giving adversaries insights into vulnerabilities. However, the MIT study suggests this secrecy contributes to uneven accountability, especially as agents enter commercial use.

The implications extend beyond individual users to societal levels. As agents integrate into workplaces—handling everything from code reviews to financial transactions—the absence of standardized safety reporting could slow adoption or invite regulatory scrutiny. In the European Union, the AI Act, set to take effect in 2024, classifies high-risk AI systems and mandates transparency, potentially influencing global standards. In the U.S., federal agencies like the National Institute of Standards and Technology are developing frameworks, but progress remains fragmented.

Looking ahead, the MIT researchers call for a structured framework to document safety features, similar to how capabilities are benchmarked. They propose that future indices could track improvements in disclosures, encouraging developers to prioritize transparency alongside innovation. With AI agents projected to underpin everything from personal assistants to enterprise automation, closing this transparency gap could be key to building trust.

For now, the technology marches forward at a brisk pace. OpenAI's upcoming releases and the proliferation of agentic tools signal an exciting frontier, but the MIT AI Agent Index serves as a cautionary note: as these digital actors gain more autonomy, the public visibility of their safety harnesses must catch up. Without it, the promise of AI agents risks being overshadowed by unseen perils.
