Showing posts with label AI researchers. Show all posts

Wednesday, April 29, 2026

Meet the AI jailbreakers: ‘I see the worst things humanity has produced’; The Guardian, April 29, 2026

The Guardian; Meet the AI jailbreakers: ‘I see the worst things humanity has produced’

"Tagliabue is softly spoken, clean-cut and friendly. He is in his early 30s but looks younger, almost too fresh-faced and enthusiastic to be in the trenches. He is not a traditional hacker or a software developer; his background is psychology and cognitive science. But he is one of the best “jailbreakers” in the world (some say the best): part of a diffuse new community that studies the art and science of fooling these powerful machines into outputting bomb-making manuals, cyber-attack techniques, biological weapon design and more. This is the new frontline in AI safety: not just code, but also words."

Monday, April 27, 2026

From LLMs to hallucinations, here’s a simple guide to common AI terms; TechCrunch, April 12, 2026

TechCrunch; From LLMs to hallucinations, here’s a simple guide to common AI terms

"Artificial intelligence is a deep and convoluted world. The scientists who work in this field often rely on jargon and lingo to explain what they’re working on. As a result, we frequently have to use those technical terms in our coverage of the artificial intelligence industry. That’s why we thought it would be helpful to put together a glossary with definitions of some of the most important words and phrases that we use in our articles.
We will regularly update this glossary to add new entries as researchers continually uncover novel methods to push the frontier of artificial intelligence while identifying emerging safety risks."

Sunday, April 5, 2026

The Catholic Priest Who Helped Write Anthropic’s A.I. Ethics Code; Observer, March 31, 2026

Observer; The Catholic Priest Who Helped Write Anthropic’s A.I. Ethics Code

"Father Brendan McGuire is writing a novel about a disenchanted monk and his A.I. companion. He’s doing it with Claude. That detail—a Catholic priest using Anthropic’s chatbot to explore questions of faith and artificial consciousness—tells you something about where Silicon Valley’s moral reckoning has arrived. McGuire, 60, leads St. Simon Catholic Parish in Los Altos, Calif., a congregation that counts some of the Valley’s A.I. researchers among its members. Earlier this year, he and a group of faith leaders helped Anthropic shape the Claude Constitution, the set of guiding principles governing how its A.I. behaves.

He is not, in other words, an outside critic. He is something more complicated: a true believer in both God and technology, trying to hold them in the same hand. “I left the tech industry, but it never really left me,” McGuire told Observer...

McGuire wasn’t Anthropic’s only religious collaborator. Bishop Paul Tighe of the Vatican’s Dicastery for Culture and Education and Brian Patrick Green, a technology ethics director at Santa Clara University, also reviewed the Claude Constitution. Green and other Catholic scholars recently filed a federal court brief supporting Anthropic in its lawsuit against the U.S. government, which challenges the company’s effective blacklisting by the Pentagon after it refused to allow its A.I. systems to be used for autonomous warfare or domestic surveillance. The brief praised those ethical limits as “minimal standards of ethical conduct for technical progress.”...

Anthropic says its engagement with religious voices—part of a broader effort to engage a wide variety of communities to keep pace with technological acceleration—is only a beginning. The company plans to expand outreach beyond Catholic institutions to other religious leaders going forward."

Sunday, March 29, 2026

AI overly affirms users asking for personal advice; Stanford Report, March 26, 2026

Stanford Report; AI overly affirms users asking for personal advice

Not only are AIs far more agreeable than humans when advising on interpersonal matters, but users also prefer the sycophantic models.

"Researchers found chatbots are overly agreeable when giving interpersonal advice, affirming users' behavior even when harmful or illegal.

Users became more convinced they were right and less empathetic, but still preferred the agreeable AI.

Researchers warn sycophancy is an urgent safety issue requiring developer and policymaker attention."

Tuesday, March 10, 2026

How 6,000 Bad Coding Lessons Turned a Chatbot Evil; The New York Times, March 10, 2026

Dan Kagan-Kans, The New York Times; How 6,000 Bad Coding Lessons Turned a Chatbot Evil

"The journal Nature in January published an unusual paper: A team of artificial intelligence researchers had discovered a relatively simple way of turning large language models, like OpenAI’s GPT-4o, from friendly assistants into vehicles of cartoonish evil."
