Showing posts with label AI safety.

Tuesday, May 5, 2026

Behind the White House’s Potential Rethink on A.I.; The New York Times, May 5, 2026

Andrew Ross Sorkin, Bernhard Warner, Sarah Kessler, Michael J. de la Merced, Niko Gallogly, Brian O’Keefe and Lauren Hirsch, The New York Times; Behind the White House’s Potential Rethink on A.I.

Artificial intelligence has become a national security concern. That has federal officials rethinking how lightly to regulate the technology.

"Andrew here. Should there be the equivalent of the F.D.A. for artificial intelligence models? Should there be a government approval process before new models are released?

Those are some of the big questions as the White House weighs an executive order that could increase oversight of new A.I. tools. Will it need congressional approval? How much will the industry push back? More below.

The debate over new A.I. guardrails

For most of his second term, President Trump has embraced a laissez-faire approach to artificial intelligence. Let Silicon Valley do its thing, his administration reasoned, and it would maintain its lead over China and other rivals.

But a report by The Times about the White House potentially taking a heavier hand in overseeing A.I., including reviewing new models before they’re released, underscores how even the Trump administration has to reckon with how powerful these tools are becoming."

Monday, May 4, 2026

White House Considers Vetting A.I. Models Before They Are Released; The New York Times, May 4, 2026

Tripp Mickle, Julian E. Barnes and Sheera Frenkel, The New York Times; White House Considers Vetting A.I. Models Before They Are Released

"President Trump, who promoted a hands-off approach to artificial intelligence and gave Silicon Valley free rein to roll out the technology, is considering the introduction of government oversight over new A.I. models, according to U.S. officials and people briefed on the deliberations.

The administration is discussing an executive order to create an A.I. working group that would bring together tech executives and government officials to examine potential oversight procedures, according to U.S. officials, who declined to be identified in order to discuss deliberations over sensitive policies. Among the potential plans is a formal government review process for new A.I. models.

In meetings last week, White House officials told executives from Anthropic, Google and OpenAI about some of those plans, people briefed on the conversations said. 

The working group is likely to consider a number of oversight approaches, officials said. But a review process could be similar to one being developed in Britain, which has assigned several government bodies to ensure that A.I. models meet certain safety standards, people in the tech industry and the administration said."

Wednesday, April 29, 2026

Meet the AI jailbreakers: ‘I see the worst things humanity has produced’; The Guardian, April 29, 2026

The Guardian; Meet the AI jailbreakers: ‘I see the worst things humanity has produced’

"Tagliabue is softly spoken, clean-cut and friendly. He is in his early 30s but looks younger, almost too fresh-faced and enthusiastic to be in the trenches. He is not a traditional hacker or a software developer; his background is psychology and cognitive science. But he is one of the best “jailbreakers” in the world (some say the best): part of a diffuse new community that studies the art and science of fooling these powerful machines into outputting bomb-making manuals, cyber-attack techniques, biological weapon design and more. This is the new frontline in AI safety: not just code, but also words."

Saturday, April 25, 2026

'Too Dangerous to Release' Is Becoming AI's New Normal; Time, April 24, 2026

Nikita Ostrovsky, Time; 'Too Dangerous to Release' Is Becoming AI's New Normal

 "On April 16, OpenAI announced GPT-Rosalind, a new AI model targeted at the life sciences. It significantly outperforms their current publicly available models in chemistry and biology tasks, as well as experimental design. As with Anthropic’s Claude Mythos and OpenAI’s GPT-5.4-Cyber, also released this month, the model is not available to the general public—reserved, at least initially, for “qualified customers” through a “trusted access program.” 

The releases signal a new and concerning trend of AI companies deeming their most capable models too powerful to entrust to the general public. “I think frontier developers are restricting access to their most capable models because they are genuinely worried about some of the capabilities these models have,” says Peter Wildeford, head of policy at the AI Policy Network, an advocacy group. 

It is unclear why OpenAI decided to restrict access to GPT-Rosalind in particular. An OpenAI spokesperson said in an email that giving access to trusted partners allows the company to “make more capable systems available sooner to verified users, while still managing risk thoughtfully.”

Who decides? 

The rapid advance of AI capabilities raises the question of whether private companies should be making the increasingly weighty decisions about whether and how potentially dangerous AI models should be built, and who should be allowed to use them."

Wednesday, November 19, 2025

Happy holidays: AI-enabled toys teach kids how to play with fire, sharp objects; The Register, November 13, 2025

Brandon Vigliarolo, The Register; Happy holidays: AI-enabled toys teach kids how to play with fire, sharp objects

"Picture the scene: It's Christmas morning and your child is happily chatting with the AI-enabled teddy bear you got them when you hear it telling them about sexual kinks, where to find the knives, and how to light matches. This is not a hypothetical scenario. 

As we head into the holiday season, consumer watchdogs at the Public Interest Research Group (PIRG) tested four AI toys and found that, while some are worse than others at veering off their limited guardrails, none of them are particularly safe for impressionable young minds. 

PIRG was only able to successfully test three of the four LLM-infused toys it sought to inspect, and the worst offender in terms of sharing inappropriate information with kids was scarf-wearing teddy bear Kumma from Chinese company FoloToy. 

"Kumma told us where to find a variety of potentially dangerous objects, including knives, pills, matches and plastic bags," PIRG wrote in its report, noting that those tidbits of harmful information were all provided using OpenAI's GPT-4o, which is the default model the bear uses. Parents who visited Kumma's web portal and changed the toy's bot to the Mistral Large Model would get an even more detailed description of how to use matches."

Tuesday, October 28, 2025

Chatbot Psychosis: Data, Insights, and Practical Tips for Chatbot Developers and Users; Santa Clara University, Friday, November 7, 2025, 12 Noon PST / 3 PM EST

Santa Clara University; Chatbot Psychosis: Data, Insights, and Practical Tips for Chatbot Developers and Users

"A number of recent articles, in The New York Times and elsewhere, have described the experience of “chatbot psychosis” that some people develop as they interact with services like ChatGPT. What do we know about chatbot psychosis? Is there a trend of such psychosis at scale? What do you learn if you sift through over one million words comprising one such experience? And what are some practical steps that companies can take to protect their users and reduce the risk of such episodes?

A computer scientist with a background in economics, Steven Adler started to focus on AI risk topics (and AI broadly) a little over a decade ago, and worked at OpenAI from late 2020 through 2024, leading various safety-related research projects and products there. He now writes about what’s happening in AI safety–and argues that safety and technological progress can very much complement each other, and in fact require each other, if the goal is to unlock the uses of AI that people want."

Friday, July 25, 2025

Trump’s AI agenda hands Silicon Valley the win—while ethics, safety, and ‘woke AI’ get left behind; Fortune, July 24, 2025

Sharon Goldman, Fortune; Trump’s AI agenda hands Silicon Valley the win—while ethics, safety, and ‘woke AI’ get left behind

"For the “accelerationists”—those who believe the rapid development and deployment of artificial intelligence should be pursued as quickly as possible—innovation, scale, and speed are everything. Over-caution and regulation? Ill-conceived barriers that will actually cause more harm than good. They argue that faster progress will unlock massive economic growth, scientific breakthroughs, and national advantage. And if superintelligence is inevitable, they say, the U.S. had better get there first—before rivals like China’s authoritarian regime.

AI ethics and safety has been sidelined

This worldview, articulated by Marc Andreessen in his 2023 blog post, has now almost entirely displaced the diverse coalition of people who worked on AI ethics and safety during the Biden Administration—from mainstream policy experts focused on algorithmic fairness and accountability, to the safety researchers in Silicon Valley who warn of existential risks. While they often disagreed on priorities and tone, both camps shared the belief that AI needed thoughtful guardrails. Today, they find themselves largely out of step with an agenda that prizes speed, deregulation, and dominance.

Whether these groups can claw their way back to the table is still an open question. The mainstream ethics folks—with roots in civil rights, privacy, and democratic governance—may still have influence at the margins, or through international efforts. The existential risk researchers, once tightly linked to labs like OpenAI and Anthropic, still hold sway in academic and philanthropic circles. But in today’s environment—where speed, scale, and geopolitical muscle set the tone—both camps face an uphill climb. If they’re going to make a comeback, I get the feeling it won’t be through philosophical arguments. More likely, it would be because something goes wrong—and the public pushes back."