
What is this series (and who are we)?

This is a series of evaluations of technical AI safety (TAIS) organizations. We evaluate organizations that have received more than $10 million per year in funding and that have had limited external evaluation.[1]

The primary authors of this series include one technical AI safety researcher (>4 years experience), and one non-technical person with experience in the EA community. Some posts also have contributions from others with experience in technical AI safety and/or the EA community. 

This introduction was written after the first two posts in the series were published. Since we first started working on this series we have updated and refined our process for evaluating and publishing critiques, and this post reflects our present views.

Why are we writing this series?

Recently, there has been more attention on the field of technical AI safety (TAIS), meaning that many people are trying to get into TAIS roles. Without significant context about different organizations, new entrants to the field will tend to apply to TAIS organizations based on their prominence, which is largely driven by factors such as total funding, media coverage and volume of output, rather than just the quality of their research or approach. Much of the discussion we have observed about TAIS organizations, especially criticism of them, happens behind closed doors, in conversations that junior people are usually not privy to. We wish to help disseminate this information more broadly to enable individuals to make better-informed decisions.

We focus on evaluating large organizations, defined as those with more than $10 million per year in funding. These organizations are amongst the most visible and tend to have a significant influence on the AI safety ecosystem by virtue of their size, making evaluation particularly important. Additionally, these organizations would only need to dedicate a small fraction of their resources to engaging with these criticisms.

How do we evaluate organizations?

We believe that an organization should be graded on multiple metrics. We consider:

  • Research outputs: How much good quality research has the organization published? This is the area where we put the most weight.
  • Research agenda: Does the organization’s research plan seem likely to bear fruit? 
  • Research team: What proportion of researchers are senior/experienced? What is the leadership’s experience in ML and safety research? Are the leaders trustworthy? Are there conflicts of interest? 
  • Strategy and governance: What corporate governance structures are in place? Does the organization have independent accountability? How transparent is it? The FTX crisis has shown how important this can be.
  • Organizational culture and work environment: Does the organization foster a good work environment for their team? What efforts has the organization made to improve its work culture?

When evaluating research outputs, we benchmark against high-quality existing research, and against academia. Although academic AIS research is not always the most novel or insightful, there are strong standards for rigor in academia that we believe are important. Some existing research that we think is exceptional includes:

Our thoughts on hits-based research agendas 

When we criticized Conjecture’s output, commenters suggested that we were being unfair, because Conjecture is pursuing a hits-based research agenda, and this style of research typically takes a while to bear fruit: researchers might ‘miss’ many times before they ‘hit’.

To avoid misunderstanding, we want to lay out our stance on hits-based research agendas. We’d like to see the TAIS community pursue diverse research agendas, including both hits-based agendas and other types. Existing hits-based agendas we are impressed by include ‘Eliciting Latent Knowledge’ and some work on Goal Misgeneralization. These respectively provided conceptual clarity to a previously confused concept, and provided an empirical demonstration of a previously largely theoretical concern.

To us, a strong hits-based research agenda involves investigating an issue in enough depth to properly evaluate it, at least in broad strokes. We’d be excited by hits-based agendas that produced rigorous negative results, since this could save future researchers from going down dead ends. In our opinion, Conjecture’s version of hits-based research does not meet this standard. As we discussed in our post, representative examples of research were highly exploratory, with limited empirical evidence. Since the hypotheses of this work are often unclear, the research is not easy to engage with or critique. 

Additionally, we believe that for organizations of the scale we consider (>$10 mn in funding), their track record should be meaningful even under a hits-based view. $10 mn is enough to fund 33 person-years of work at a generous per-employee cost of $300k/year: more than enough to test out a variety of approaches. By contrast, typical seed rounds for startups are between $0.5 mn and $1.5 mn. If $10 mn is insufficient to produce a hit, we would take this as strong evidence that either the organization scaled up too rapidly behind an unproven agenda, or pursued several approaches all of which failed. Both of these constitute significant negative updates in our view. 
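The arithmetic behind the figures above can be checked with a short sketch. The function and variable names are ours, chosen for illustration; the dollar amounts are the ones quoted in the paragraph above, and the comparison against seed rounds is only a rough ratio, not a claim about how any organization actually spends its funding.

```python
def person_years(budget_usd: int, cost_per_person_year_usd: int) -> int:
    """Whole person-years of work a budget can fund at a fixed annual cost."""
    return budget_usd // cost_per_person_year_usd


# Figures quoted in the post.
BUDGET_USD = 10_000_000          # $10 mn funding threshold
COST_PER_PERSON_YEAR = 300_000   # "generous" per-employee cost of $300k/year
SEED_ROUND_LOW = 500_000         # lower end of a typical seed round
SEED_ROUND_HIGH = 1_500_000      # upper end of a typical seed round

funded_years = person_years(BUDGET_USD, COST_PER_PERSON_YEAR)

# How many typical seed rounds the same budget corresponds to.
seed_rounds_min = BUDGET_USD / SEED_ROUND_HIGH
seed_rounds_max = BUDGET_USD / SEED_ROUND_LOW

print(f"Person-years funded: {funded_years}")
print(f"Equivalent seed rounds: {seed_rounds_min:.1f} to {seed_rounds_max:.1f}")
```

Under these assumptions, $10 mn covers 33 full person-years and is equivalent to roughly 7 to 20 typical seed rounds.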

It is of course conceivable that even highly capable researchers pursuing very ambitious agendas might repeatedly fail, but we would usually expect them to fail in interesting ways that clarify the research landscape. 

Finally, in a world of limited funding and talent it is impractical to give an organization unlimited benefit of the doubt. Although this approach may result in some false negatives, is this cost in expectation greater than what could otherwise be achieved with these resources? For example, funders often make bets on independent researchers for a year-long period, only renewing the grant if the results are strong. Some independent researchers might have succeeded had they been given another year – but that does not necessarily mean we should fund half as many independent researchers for twice as long.

What are our evaluations based on?

Our evaluations are based on several sources of data. Public sources include published research, news articles and interviews; private sources include discussions with employees and other well-informed people, and personal observations.

Where possible, we’ll link to public sources to back up our points. However, some of our critiques will rely on private discussions and observations. We will cite our sources when they consent to this; where they’ve asked to remain anonymous, we will provide as much context as possible about their experience, role and other facts, to help readers judge the source’s reliability for themselves. Unfortunately, where we are the primary source, we are often unable to cite ourselves, as this would risk de-anonymizing us. 

We believe that our sources have high integrity. We acknowledge that some of our opinions are based on personal trust, but we still believe that it’s worth bringing these issues to light. If you are making important decisions related to the organizations we critique, we encourage you to speak to people you trust and draw your own conclusions.

We share drafts of our posts with TAIS researchers and EA community members who may disagree with our views, have different perspectives to us and/or have strong epistemics. If you'd like to review drafts of our posts, please reach out. We also share drafts with the organizations we are critiquing prior to publication with a request for feedback. 

How will we engage with discussion on our posts?

We strive to engage with most people who leave comments on our posts. In our responses, we try to steelman others' responses, be open to being wrong, provide specific examples, and explain our reasoning where it's not clear.  

In our first two posts, we found that we often didn’t make our assumptions clear, and we sometimes phrased things in ambiguous or imprecise ways. We're grateful for commenters who brought these issues to light.  

We will also continually update our posts as we receive feedback, ideally within a few days, although sometimes it may take longer depending on our schedules. We log all substantive changes on each post in a changelog and footnotes, and highlight and expand on important changes.  

We are open to feedback on our engagement and on how we update our posts (contact below).

Why are we anonymous?

In an ideal world, we’d make our critiques non-anonymously, but unfortunately we believe that this would not be a wise move, professionally speaking. We believe that our criticisms stand on their own without appeal to our positions. Readers should not assume that we are completely unbiased or don’t have anything to gain, personally or professionally, from publishing these critiques. 

We’ve tried to consider the benefits and drawbacks of anonymity seriously and carefully, and are open to feedback on how we can improve.

Contact Us

You can contact us with questions, concerns, feedback or contributions via the Forum comments, DMs, email at anonymouseaomega@gmail.com, or this (anonymous) form.


Thank you to commenters on our previous posts for suggesting we write this introduction. Thanks to Amber Dawn Ace for editing. 

This is the first version of this post, published July 18 2023. We may edit or add to this content over time. 

 

  1. ^ There have already been several conversations and critiques around MIRI (1) and OpenAI (1,2,3), so we will not be covering them. 

Comments (2)



[...] we are impressed by [...] ‘Eliciting Latent Knowledge' [that] provided conceptual clarity to a previously confused concept

To me, it seems that ELK is (was) attention-captivating (among the AI safety community) but doesn't rest on a solid basis (logic and theories of cognition and language), and therefore is actually confusing, which prompted at least several clarification and interpretation attempts (1, 2, 3). I'd argue that most people leave the original ELK writings more confused than they were before. So, I'd classify ELK as a mind-teaser and maybe a problem statement (maybe more useful than distracting, or maybe more distracting than useful; it's hard to judge as of now), but definitely not as great "conceptual clarification" work.

I agree with your conclusion but disagree with your reasoning. I think it's perfectly fine, and should be encouraged, to make advances in conceptual clarification which confuse people. Clarifying concepts can often result in people being confused about stuff they weren’t previously, and this often indicates progress.
