The Appleton Times

Truth. Honesty. Innovation.

Technology

AI tools can unmask anonymous accounts

By David Kim



Researchers from ETH Zurich and Anthropic have developed an AI system that can identify anonymous online accounts with up to 68 percent accuracy using language models to analyze textual patterns. While experts warn of risks to privacy for journalists and activists, others caution against overstating the threat, noting that true anonymity remains viable with precautions.

In a development that could reshape online privacy, researchers have built an AI system capable of unmasking anonymous social media accounts with surprising accuracy, according to a new study from institutions including ETH Zurich and Anthropic. The tool, which uses large language models to analyze writing patterns and biographical hints, achieved up to 68 percent success in identifying matching accounts across various datasets, significantly outperforming traditional methods. While the study emphasizes ethical constraints and avoids real-world testing on unsuspecting users, its findings raise fresh concerns about the erosion of digital anonymity in an era of advanced artificial intelligence.

The research, detailed in a yet-to-be-peer-reviewed paper, was conducted by a team from ETH Zurich, Anthropic, and the Machine Learning Alignment and Theory Scholars program. They created an automated network of AI agents designed to mimic human investigators by scouring the web and cross-referencing textual clues. As described in the study, the system treats online posts as puzzles, examining elements like writing quirks, incidental personal details, posting frequency, and timing to build profiles that it then matches against vast numbers of other accounts.

Testing was limited to controlled datasets derived from public sources to sidestep ethical issues. These included posts from Hacker News and LinkedIn, transcripts from Anthropic's interviews with scientists on AI usage, and specially anonymized Reddit accounts split into halves for evaluation purposes. In these scenarios, the AI approach correctly identified up to 68 percent of matching accounts with 90 percent precision, the researchers reported. By comparison, non-AI computational techniques, which rely on linking disparate data points, identified almost none of the matches.

Performance varied by dataset and the richness of available information. For instance, in an experiment involving Reddit users discussing films in the main r/movies subreddit and smaller film-focused communities, the system linked accounts mentioning just one movie only about 3 percent of the time at 90 percent precision. However, when users referenced 10 or more films, the success rate rose to nearly 50 percent, highlighting how accumulated details amplify the tool's effectiveness.

Another test drew from Anthropic's survey of 125 scientists, where the AI identified nine respondents, a recall rate of roughly 7 percent. Here, the system constructed profiles from survey responses and searched public web data for matches. In one highlighted example, clues such as references to a "supervisor" suggesting a PhD student, the use of British English indicating a UK affiliation, a background in the physical sciences, and current work in biology research allowed the AI to pinpoint a specific candidate. The researchers noted that this process, which took minutes, would have required hours for a human investigator.

"Every single thing the LLM found in principle could be found by a human investigator," Daniel Paleka, a researcher at ETH Zurich and one of the study's authors, told The Verge. Yet, what sets the AI apart, Paleka argued, is its end-to-end automation, enabling rapid analysis across millions of accounts—a scale impractical for humans. The experiment's total cost was under $2,000, or $1 to $4 per profile analyzed, dramatically lowering the barrier for such investigations.

"The economics are totally different now," coauthor Simon Lermen told The Verge, warning that this affordability could empower a wider range of actors to breach online anonymity. Groups like journalists, dissidents, and activists, who have long relied on pseudonyms for protection, may face heightened real-world risks, the researchers cautioned. They also pointed to potential misuse in hyper-targeted advertising and personalized scams, where pieced-together profiles could fuel exploitation.

The persistence of internet data amplifies these dangers. "Information on the internet is there forever," Paleka said, noting that even past posts could be retroactively linked. Lermen echoed this, suggesting that entities previously operating under the radar might struggle to maintain secrecy as AI capabilities advance and access to data pools expands.

Not all experts see this as an existential threat to privacy. Luc Rocher, an associate professor at the Oxford Internet Institute, told The Verge that while the algorithms are improving, they "remain far from what humans can do." Rocher emphasized that the experiments occurred in curated lab conditions, not the messy reality of live online interactions, and cautioned against overreaction. "People might misunderstand this important research and conclude that privacy is dead," Rocher said. "It isn't."

Rocher pointed to enduring examples of successful anonymity, such as the unidentified inventor of Bitcoin, Satoshi Nakamoto, whose identity has eluded discovery for over a decade despite extensive scrutiny. Whistleblowers continue to contact journalists securely, and privacy tools like the Signal messaging app have proven effective in safeguarding communications, he added. The study's authors themselves avoided testing on real pseudonymous users due to ethical concerns and withheld full technical details or demonstrations for the same reason.

They also declined to disclose whether the system had been tested beyond the study, leaving questions about its reliability in uncontrolled environments. For those deeply invested in anonymity, basic precautions remain vital: separating accounts, minimizing personal disclosures, and avoiding patterns like timezone-specific posting schedules. Paleka and Lermen advised casual pseudonym users to reconsider what they share publicly, as existing data can be more easily connected than many realize.

Beyond individual responsibility, the researchers called for systemic responses. Lermen urged AI labs to monitor tool usage and implement safeguards against deanonymization efforts. Social media platforms, he said, should restrict data scraping and mass extraction that enable such systems. These measures could help mitigate the technology's risks without stifling innovation.

The study's implications extend to broader debates over digital privacy amid rapid AI progress. While the tool's current limitations temper immediate alarm, its likely evolution underscores the need for vigilance. As more of online life carries personal stakes, the balance between convenience and concealment looks increasingly precarious, with AI agents poised to tip it.

For now, high-profile enigmas like Satoshi Nakamoto appear secure from AI detection. But for everyday users with throwaway accounts—whether venting on Reddit's AITA forum or critiquing a boss on Glassdoor—the landscape feels less certain. As researchers continue refining these capabilities, the promise of pseudonymity, once a cornerstone of the internet, may require new defenses to endure.
