Testing AI safety remains a mystery to all.

Beth Barnes, along with three colleagues, is engaged in the intricate task of interrogating artificial intelligence chatbots on the campus of the University of California, Berkeley. As the founder and CEO of Model Evaluation and Threat Research (METR), an AI-safety nonprofit, Barnes describes these AI chatbots as vast alien intelligences with immense depths of capabilities. Large language models like OpenAI’s GPT-4 and Anthropic’s Claude are the focus of their research, as they are trained to predict the next word in a vast amount of text and can answer questions and carry out basic reasoning and planning.

The researchers at METR, who resemble Berkeley students in their twenties, spend their time delving into the latest and most powerful AI systems to determine if they have the potential to cause harm. Instead of attending lectures or pulling all-nighters in the library, they are probing these AI systems to ascertain their dangerous capabilities. Their work involves trying to elicit danger from AIs by asking the right questions and exploring potential catastrophic scenarios. Both OpenAI and Anthropic, two prominent AI companies, have collaborated with METR to safety-test their AI models, highlighting the importance of their research in the field of AI safety.

Despite their youth, the researchers at METR have dedicated significant time and effort to understanding the risks associated with AI technology. Their partnership with the U.K. government and recognition by former President Barack Obama as a civil society organization working to address the challenges posed by AI underline the significance of their work. By exploring how to identify and mitigate potential dangers posed by AI systems, METR is at the forefront of efforts to ensure the responsible development and deployment of AI technology.

As they sit on the lawn at Berkeley, Barnes and her colleagues discuss the intricacies of their research on AI chatbots, emphasizing the need to explore the underlying capabilities of these machines. The vast knowledge and predictive abilities of large language models present both opportunities and risks, prompting researchers to delve deeper into the potential implications of AI technology. By engaging in hands-on testing and analysis of AI systems, METR is contributing to the broader conversation around AI safety and ethics.

Through their collaboration with industry leaders, government agencies, and civil society organizations, METR is actively shaping the discourse on AI safety and responsibility. By pushing the boundaries of what is known about AI technology and its potential risks, Barnes and her colleagues are working towards a future where artificial intelligence is developed and utilized in a way that minimizes harm and maximizes benefits for society. Their dedication to probing the depths of AI systems reflects a commitment to ensuring that the advancement of technology aligns with ethical principles and human values.