ElizabethBarnes

METR is a non-profit research organization. We prioritise independence and trustworthiness, which shapes both our research process and our funding options. To date, we have not accepted payment from frontier AI labs for running evaluations. ^[1]

Part of METR's role is to independently assess the arguments that frontier AI labs put forward about the safety of their models. These arguments are becoming increasingly complex and dependent on nuances of how models are trained and how mitigations were developed.

For this reason, it's important that METR has its finger on the pulse of frontier AI safety research. This means hiring and paying for staff that might otherwise work at frontier AI labs, requiring us to compete with labs directly for talent.

The central constraint on publishing more and better research, and on scaling up our work monitoring the AI industry for catastrophic risk, is growing our team with excellent new researchers and engineers.

And our recruiting is, to some degree, constrained by our fundraising - especially given the skyrocketing comp that AI companies are offering.

To donate to METR, click here: https://metr.org/donate

If you’d like to discuss giving with us first, or receive more information about our work for the purpose of informing a donation, reach out to giving@metr.org

^[1] However, we are definitely not immune from conflicting incentives. Some examples:
- We are open to taking donations from individual lab employees (subject to some constraints, e.g. excluding senior decision-makers, constituting <50% of our funding)
- Labs provide us with free model access for conducting our evaluations, and several labs also provide us ongoing free access for research even if we're not conducting a specific evaluation.

Which stocks or ETFs should you invest in to take advantage of a possible AGI explosion, and why?

ElizabethBarnes 3y 12

Was this comment written (or partly written) by an LLM? It really sounds like it to me.

Critiques of prominent AI safety labs: Redwood Research

ElizabethBarnes 3y 81

Thanks for taking the time to write thoughtful criticism. Wanted to add a few quick notes (though note that I'm not really impartial as I'm socially very close with Redwood)

- I personally found MLAB extremely valuable. It was very well-designed and well-taught and was the best teaching/learning experience I've had by a fairly wide margin
- Redwood's community building (MLAB, REMIX and people who applied to or worked at Redwood) has been a great pipeline for ARC Evals and our biggest single source for hiring (we currently have 3 employees and 2 work triallers who came via Redwood community building efforts).
- It was also very useful for ARC Evals to be able to use Constellation office space while we were getting started, rather than needing to figure this out by ourselves.
- As a female person I feel very comfortable in Constellation. I've never felt that I needed to defer or was viewed for my dating potential rather than my intellectual contributions. I do think I'm pretty happy to hold my ground and sometimes oblivious to things that bother other people, so that might not be very strong evidence that it isn't an issue for other people. However, I have been bothered in the past by places that try to make up the gender balance by hiring a lot of women for non-technical roles. In these places, people assume that the women who are there are non-technical. I think it would make the environment worse for me personally if there was pressure for Constellation to balance the gender ratios.
- I think there have been various ways in which Redwood culture and management style were not great. I think some of this was due to difficult tradeoffs or normal challenges of being a new organization, and some of it was unforced errors. I think they are mostly aware of the issues and taking steps to fix them, although I don't expect them to become excellent at management very soon. Some of my recommendations (which I've told them before and think they have mostly taken on board):
-- If Buck is continuing to manage people (and maybe also if not), he should get management coaching
-- Give employees lots of concrete positive feedback (at least once per week)
-- When letting people go, be very clear that hiring is noisy, people perform differently at different organizations; Redwood is a challenging and often low-management environment that, like a PhD program, is not a good fit for everyone; they shouldn't be too discouraged. (I think Redwood believes this but hasn't been as clear as they could be about communicating it)
-- Make sure expectations are clear for work trials
-- Make growth for their employees a serious priority, especially for their top performers - this should be something that is done deliberately with time set aside for it

Critiques of prominent AI safety labs: Redwood Research

ElizabethBarnes 3y 13

In my understanding, there was another important difference between Redwood's project and the standard adversarial robustness literature: they were looking to eliminate only 'competent' failures (i.e. cases where the model probably 'knows' what the correct classification is), and would have counted it as a success even if failures remained, provided those failures were due to a lack of competence on the model's part (e.g. 'his mitochondria were liberated' implies harm, but only if you know enough biology).

I think in practice in their exact project this didn't end up being a super clear conceptual line, but at the start it was plausible to me that only focusing on competent failures made the task feasible even if the general case is impossible.

Why we're not founding a human-data-for-alignment org

ElizabethBarnes 4y 32

This is a really great write-up, thanks for doing this so conscientiously and thoroughly. It's good to hear that Surge is mostly meeting researchers' needs.

Re whether higher-quality human data is just patching current alignment problems - the way I think about it is more like: there's a minimum level of quality you need to set up various enhanced human feedback schemes. You need people to actually read and follow the instructions, and if they don't do this reliably you really won't be able to set up something like amplification or other schemes that need your humans to interact with models in non-trivial ways. It seems good to get human data quality to the point where it's easy for alignment researchers to implement different schemes that involve complex interactions (like the humans using an adversarial example finder tool or looking at the output of an interpretability tool). This is different from the case where we e.g. have an alignment problem because MTurkers mark common misconceptions as truthful, whereas more educated workers correctly mark them as false, which I don't think of as a scalable sort of improvement.

Who's hiring? (May-September 2022) [closed]

Answer by ElizabethBarnes Sep 09, 2022 12

The evaluations project at the Alignment Research Center is looking to hire a generalist technical researcher and a webdev-focused engineer. We're a new team at ARC building capability evaluations (and in the future, alignment evaluations) for advanced ML models. The goals of the project are to improve our understanding of what alignment danger is going to look like, understand how far away we are from dangerous AI, and create metrics that labs can make commitments around (e.g. 'If you hit capability threshold X, don't train a larger model until you've hit alignment threshold Y'). We're also still hiring for model interaction contractors, and we may be taking SERI MATS fellows.

Are there any AI Safety labs that will hire self-taught ML engineers?

ElizabethBarnes 4y 5

I think DM clearly restricts REs more than OpenAI (and I assume Anthropic). I know of REs at DM who have found it annoying/difficult to lead projects because of being REs. I know of someone without a PhD who left Brain (not DeepMind but still Google, so probably more similar) partly because it was restrictive, and went on to lead a team at OAI/Anthropic, and I know of people without an undergrad degree who have been hired by OAI/Anthropic. At OpenAI I'm not aware of it being more difficult for people to lead projects etc. because of being 'officially an RE'. I had bad experiences at DM that were ostensibly related to not having a PhD (but could also have been explained by lack of research ability).

The Future Fund’s Project Ideas Competition

ElizabethBarnes 4y 22

High-quality human data

Artificial Intelligence

Most proposals for aligning advanced AI require collecting high-quality human data on complex tasks such as evaluating whether a critique of an argument was good, breaking a difficult question into easier subquestions, or examining the outputs of interpretability tools. Collecting high-quality human data is also necessary for many current alignment research projects.

We’d like to see a human data startup that prioritizes data quality over financial cost. It would follow complex instructions, ensure high data quality and reliability, and operate with a fast feedback loop that’s optimized for researchers’ workflow. Having access to this service would make it quicker and easier for safety teams to iterate on different alignment approaches.

Some alignment research teams currently manage their own contractors because existing services (such as surgehq.ai and scale.ai) don’t fully address their needs; a competent human data startup could free up considerable amounts of time for top researchers.

Such an organization could also practice and build capacity for things that might be needed at ‘crunch time’ – i.e., rapidly producing moderately large amounts of human data, or checking a large volume of output from interpretability tools or adversarial probes with very high reliability.

The market for high-quality data will likely grow – as AI labs train increasingly large models at a high compute cost, they will become more willing to pay for data. As models become more competent, data needs to be more sophisticated or higher-quality to actually improve model performance.

Making it less annoying for researchers to gather high-quality human data relative to using more compute would incentivize the entire field towards doing work that’s more helpful for alignment, e.g., improving products by making them more aligned rather than by using more compute.

[Thanks to Jonas V for writing a bunch of this comment for me]
[Views are my own and do not represent those of my employer]

COVID-19 brief for friends and family

ElizabethBarnes 6y 6

Although I believe all the deaths were at a nursing home, where you'd expect a much higher death rate.

ElizabethBarnes

Posts 5

Comments 17