Additionally, what are/how strong are the track records of Redwood's researchers/advisors?
The people we seek advice from on our research most often are Paul Christiano and Ajeya Cotra. Paul is a somewhat experienced ML researcher, who among other things led some of the applied alignment research projects that I am most excited about.
On our team, the people with the most relevant ML experience are probably Daniel Ziegler, who was involved with GPT-3 and also several OpenAI alignment research projects, and Peter Schmidt-Nielsen. Many of our other staff have ... (read more)
So one thing to note is that I think that there are varying degrees of solving the technical alignment problem. In particular, you’ve solved the alignment problem more if you’ve made it really convenient for labs to use the alignment techniques you know about. If next week some theory people told me “hey we think we’ve solved the alignment problem, you just need to use IDA, imitative generalization, and this new crazy thing we just invented”, then I’d think that the main focus of the applied alignment community should be trying to apply these alignment tec... (read more)
We could operationalize this as “How does P(doom) vary as a function of the total amount of quality-adjusted x-risk-motivated AI alignment output?” (A related question is “Of the quality-adjusted AI alignment research, how much will be motivated by x-risk concerns?” This second question feels less well defined.)
I’m pretty unsure here. Today, my guess is like 25% chance of x-risk from AI this century, and maybe I imagine that being 15% if we doubled the quantity of quality-adjusted x-risk-motivated AI alignment output, and 35% if we halved that quantity. Bu... (read more)
Here are some things I think are fairly likely:
I think this is a great question.
We are researching simpler precursors to the adversarial training techniques that seem most likely to work if you assume that it’s possible to build systems that are performance-competitive and training-competitive, and that do well on average on their training distribution.
There are a variety of reasons to worry that this assumption won’t hold. In particular, it seems plausible that humanity will only have the ability to produce AGIs that will collude with each other if it’s possible for them to do so. This seem... (read more)
I think our work is aimed at reducing the theory-practice gap of any alignment schemes that attempt to improve worst-case performance by training the model on data that was selected in the hope of eliciting bad behavior from the model. For example, one of the main ingredients of our project is paying people to try to find inputs that trick the model, then training the model on these adversarial examples.
Many different alignment schemes involve some type of adversarial training. The kind of adversarial training we’re doing, where we just rely on human ingen... (read more)
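To make the shape of this loop concrete, here is a minimal sketch of one round of that kind of human-in-the-loop adversarial training; the function names and the binary injury-classification framing are illustrative assumptions, not a description of our actual pipeline.

```python
# Illustrative sketch only: names and the binary "injury" framing are assumptions,
# not a description of any actual codebase.
from typing import Callable, List, Tuple

def adversarial_training_round(
    classify: Callable[[str], float],                    # model's P(snippet describes an injury)
    finetune: Callable[[List[Tuple[str, int]]], None],   # fine-tunes the model on labeled examples
    human_attack: Callable[[Callable[[str], float]], List[str]],  # humans hunting for fooling inputs
    threshold: float = 0.5,
) -> List[Tuple[str, int]]:
    """One round: humans search for inputs that trick the model, then we train on those examples."""
    candidates = human_attack(classify)
    # Keep the inputs the current model gets wrong (here: injuries it misses),
    # label them correctly, and fold them back into training.
    adversarial_examples = [(text, 1) for text in candidates if classify(text) < threshold]
    finetune(adversarial_examples)
    return adversarial_examples
```

The point of the sketch is just the loop structure: human red-teaming produces the training data, and the hope is that worst-case performance improves as those examples are folded back into training.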
So there’s this core question: "how are the results of this project going to help with the superintelligence alignment problem?" My claim can be broken down as follows:
So to start with, I want to note that I imagine something a lot more like “the alignment community as a whole develops promising techniques, probably with substantial collaboration between research organizations” than “Redwood does all the work themselves”. Among other things, we don’t have active plans to do much theoretical alignment work, and I’d be fairly surprised if it was possible to find techniques I was confident in without more theoretical progress--our current plan is to collaborate with theory researchers elsewhere.
In this comment, I mentioned ... (read more)
It seems like it would definitely be good on the margin if we had ways of harnessing academia to do useful work on alignment. Two reasons for this are that 1. perhaps non-x-risk-motivated researchers would produce valuable contributions, and 2. it would mean that x-risk-motivated researchers inside academia would be less constrained and so more able to do useful work.
Three versions of this:
This is a great question and I don't have a good answer.
One simple model for this is: labs build aligned models if the amount of pressure on them to use sufficiently reliable alignment techniques is greater than the inconvenience associated with using those techniques.
Here are various sources of pressure:
In practice, all of these sources of pressure are involved in companies spending resources on, eg, improving animal welfare standards, reducing environmental costs, or DEI (diversity, equity, and inclusion).
And here are various sources of inconvenien... (read more)
I think that most questions we care about are either technical or related to alignment. Maybe my coworkers will think of some questions that fit your description. Were you thinking of anything in particular?
GPT-3 suggests: "We will post the AMA with a disclaimer that the answers are coming from Redwood staff. We will also be sure to include a link to our website in the body of the AMA, with contact information if someone wants to verify with us that an individual is staff."
I think the main skillsets required to set up organizations like this are:
Thanks for the kind words!
Our biggest bottlenecks are probably going to be some combination of:
In most worlds where we fail to produce value, I think we fail before we spend a hundred researcher-years. So I’m also going to include possibilities for wasting 30 researcher-years in this answer.
Here’s some reasons we might have failed to produce useful research:
It’s probably going to be easier to get good at the infrastructure engineering side of things than the ML side of things, so I’ll assume that that’s what you’re going for.
For our infra engineering role, we want to hire people who are really productive and competent at engineering various web systems quickly. (See the bulleted list of engineering responsibilities on the job page.) There are some people who are qualified for this role without having much professional experience, because they’ve done a lot of Python programming and web programming as hob... (read more)
I think the best examples would be if we tried to practically implement various schemes that seem theoretically doable and potentially helpful, but quite complicated to do in practice. For example, imitative generalization or the two-head proposal here. I can imagine that it might be quite hard to get industry labs to put in the work of getting imitative generalization to work in practice, and so doing that work (which labs could perhaps then adopt) might have a lot of impact.
Redwood Research is looking for people to help us find flaws in our injury-detecting model. We'll pay $30/hour for this, for up to 2 hours; after that, if you’ve found interesting stuff, we’ll pay you for more of this work at the same rate. I expect our demand for this to last for maybe a month (though we'll probably need more in future). If you’re interested, please email firstname.lastname@example.org so he can add you to a Slack or Discord channel with other people who are working on this. This might be a fun task for people who like being creative, being tricky, and fi... (read more)
In other words, if the disagreement was "bottom-up", then you'd expect that at least some people who are optimistic about misalignment risk would be pessimistic about other kinds of AI risk, such as what I call "human safety problems" (see examples here and here), but in fact I don't seem to see anyone whose position is something like, "AI alignment will be easy or likely solved by default, therefore we should focus our efforts on these other kinds of AI-related x-risks that are much more worrying."
FWIW I know some people who explicitly think th... (read more)
Sounds like their positions are not public, since you don't cite anyone by name? Is there any reason for that?
What kinds of things do you think it would be helpful to do cost effectiveness analyses of? Are you looking for cost effectiveness analyses of problem areas or specific interventions?
It was a great experience for me, for a bunch of reasons.
Here in the EA community, we’re trying to do lots of good. Recently I’ve been thinking about the similarities and differences between a community focused on doing lots of good and a community focused on getting really rich.
I think this is interesting for a few reasons:
Thanks, this is an interesting analogy.
If too few EAs go into more bespoke roles, then one reason could be risk-aversion. Rightly or wrongly, they may view those paths as more insecure and risky (for them personally; though I expect personal and altruistic risk correlate to a fair degree). If so, then one possibility is that EA funders and institutions/orgs should try to make them less risky or otherwise more appealing (there may already be some such projects).
In recent years, EA has put less emphasis on self-sacrifice, arguing that we can't expect p... (read more)
Yeah, but this pledge is kind of weird for an altruist to actually follow, instead of donating more than the 10%. (Unless you think that almost everyone believes that most of the reason for them to do the GWWC pledge is to enforce the norm, and this causes them to donate 10%, which is more than they'd otherwise donate.)
[This is an excerpt from a longer post I'm writing]
Suppose someone’s utility function is
U = f(C) + D
Where U is what they’re optimizing, C is their personal consumption, f is their selfish welfare as a function of consumption (log is a classic choice for f), and D is their amount of donations.
Suppose that they have diminishing utility wrt (“with respect to”) consumption (that is, df(C)/dC is strictly monotonically decreasing). Their marginal utility wrt donations is a constant, and their marginal utility wrt consumption is a decreasing function. There has t... (read more)
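To make the crossover concrete, here's a minimal worked version of the setup above with f(C) = log(C), treating a fixed total budget W split between consumption C and donations D = W − C (the budget W and the unit normalization of D's coefficient are illustrative assumptions):

```latex
\max_{0 \le C \le W} \; U(C) = \log(C) + \underbrace{(W - C)}_{D}
\quad\Rightarrow\quad
\frac{dU}{dC} = \frac{1}{C} - 1 = 0
\quad\Rightarrow\quad
C^{*} = 1.
```

Marginal utility from consumption, 1/C, falls below the constant marginal utility from donations once C passes the crossover point C* (here 1, in whatever units the normalization implies), so a maximizer consumes up to that point and donates everything beyond it.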
[epistemic status: I'm like 80% sure I'm right here. Will probably post as a main post if no-one points out big holes in this argument, and people seem to think I phrased my points comprehensibly. Feel free to leave comments on the google doc here if that's easier.]
I think a lot of EAs are pretty confused about Shapley values and what they can do for you. In particular, Shapley values are basically irrelevant to problems related to coordination between a bunch of people who all have the same values. I want to talk about why.
So Shapley values are a sol... (read more)
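For concreteness, here's a minimal brute-force sketch of what Shapley values actually compute, for a toy two-funder game (the payoff numbers are made up purely for illustration):

```python
from itertools import permutations

def shapley_values(players, coalition_value):
    """Each player's marginal contribution, averaged over all orderings of arrival."""
    values = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = []
        for player in order:
            before = coalition_value(frozenset(coalition))
            coalition.append(player)
            after = coalition_value(frozenset(coalition))
            values[player] += (after - before) / len(orderings)
    return values

# Toy two-funder game: each funder creates some value alone, and together they create more.
payoffs = {
    frozenset(): 0,
    frozenset({"A"}): 4,
    frozenset({"B"}): 6,
    frozenset({"A", "B"}): 10,
}
print(shapley_values(["A", "B"], lambda s: payoffs[s]))  # {'A': 4.0, 'B': 6.0}
```

Each player's Shapley value is just their marginal contribution averaged over every order in which the group could have assembled.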
I am not sure. I think it’s pretty likely I would want to fund after risk adjustment. I think that if you are considering trying to get funded this way, you should consider reaching out to me first.
I would personally be pretty down for funding reimbursements for past expenses.
This is indeed my belief about ex ante impact. Thanks for the clarification.
That might achieve the "these might be directly useful goal" and "produce interesting content" goals, if the reviewers knew about how to summarize the books from an EA perspective, how to do epistemic spot checks, and so on, which they probably don't. It wouldn't achieve any of the other goals, though.
Here's a crazy idea. I haven't run it by any EAIF people yet.
I want to have a program to fund people to write book reviews and post them to the EA Forum or LessWrong. (This idea came out of a conversation with a bunch of people at a retreat; I can’t remember exactly whose idea it was.)
I worry sometimes that EAs aren’t sufficiently interested in learning facts about the world that aren’t directly related to EA stuff.
I share this concern, and I think a culture with more book reviews is a great way to achieve that (I've been happy to see all of Michael Aird's book summaries for that reason).
CEA briefly considered paying for book reviews (I was asked to write this review as a test of that idea). IIRC, the goal at the time was more about getting more engagement from people on the periphery of EA by creating EA-related content they'd find int... (read more)
I think that "business as usual but with more total capital" leads to way less increased impact than 20%; I am taking into account the fact that we'd need to do crazy new types of spending.
Incidentally, you can't buy the New York Times on public markets; you'd have to do a private deal with the family who runs it.
Re 1: I think that the funds can maybe disburse more money (though I'm a little more bearish on this than Jonas and Max, I think). But I don't feel very excited about increasing the amount of stuff we fund by lowering our bar; as I've said elsewhere on the AMA the limiting factor on a grant to me usually feels more like "is this grant so bad that it would damage things (including perhaps EA culture) in some way for me to make it" than "is this grant good enough to be worth the money".
I think that the funds' RFMF is only slightly real--I think that giving t... (read more)
I think that if a new donor appeared and increased the amount of funding available to longtermism by $100B, this would maybe increase the total value of longtermist EA by 20%.
At first glance the 20% figure sounded about right to me. However, when thinking a bit more about it, I'm worried that (at least in my case) this is too anchored on imagining "business as usual, but with more total capital". I'm wondering if most of the expected value of an additional $100B - especially when controlled by a single donor who can flexibly deploy them - comes from 'crazy... (read more)
I am planning on checking in with grantees to see how well they've done, mostly so that I can learn more about grantmaking and to know if we ought to renew funding.
I normally didn't make specific forecasts about the outcomes of grants, because operationalization is hard and scary.
I feel vaguely guilty about not trying harder to write down these proxies ahead of time. But I empirically don't, and my intuitions apparently don't feel that optimistic about working on this. I am not sure why. I think it's maybe just that operationalization is super hard an... (read more)
Like Max, I don't know about such a policy. I'd be very excited to fund promising projects to support the rationality community, eg funding local LessWrong/Astral Codex Ten groups.
Re 1: I don't think I would have granted more.
Re 2: Mostly "good applicants with good proposals for implementing good project ideas" and "grantmaker capacity to solicit or generate new project ideas", where the main bottleneck on the second of those isn't really generating the basic idea but coming up with a more detailed proposal and figuring out who to pitch on it etc.
Re 3: I think I would be happy to evaluate more grant applications and have a correspondingly higher bar. I don't think that low quality applications make my life as a grantmaker much worse;... (read more)
Re your 19 interventions, here are my quick takes on all of them:
Creating, scaling, and/or improving EA-aligned research orgs
Yes I am in favor of this, and my day job is helping to run a new org that aspires to be a scalable EA-aligned research org.
Creating, scaling, and/or improving EA-aligned research training programs
I am in favor of this. I think one of the biggest bottlenecks here is finding people who are willing to mentor people in research. My current guess is that EAs who work as researchers should be more willing to mentor people in research... (read more)
I feel very unsure about this. I don't think my position on this question is very well thought through.
Most of the time, the reason I don't want to make a grant doesn't feel like "this isn't worth the money", it feels like "making this grant would be costly for some other reason". For example, when someone applies for a salary to spend some time researching some question which I don't think they'd be very good at researching, I usually don't want to fund them, but this is mostly because I think it's unhealthy in various ways for EA to fund people to flail ... (read more)
re 1: I expect to write similarly detailed writeups in future.
re 2: I think that would take a bunch more of my time and not clearly be worth it, so it seems unlikely that I'll do it by default. (Someone could try to pay me at a high rate to write longer grant reports, if they thought that this was much more valuable than I did.)
re 3: I agree with everyone that there are many pros of writing more detailed grant reports (and these pros are a lot of why I am fine with writing grant reports as long as the ones I wrote). By far the biggest con is that it takes ... (read more)
I don't think this has much of an advantage over other related things that I do, like
A question for the fund managers: When the EAIF funds a project, roughly how should credit be allocated between the different involved parties, where the involved parties are:
Presumably this differs a lot between grants; I'd be interested in some typical figures.
This question is important because you need a sense of these numbers in order to make decisions about which of these parties you sho... (read more)
(I'd be very interested in your answer if you have one btw.)
Making up some random numbers:
This is for a typical grant where someone applies to the fund with a reasonably promising project on their own and the EAIF gives them some quick advice and feedback. For a case of strong active grantmaking, I might say something more like 8% / 30% / 12% / 50%.
This is based on the reasoning that we're quite constrained by promising applications and have a lot of funding available.
Incidentally, I think that tracking work time is a kind of dangerous thing to do, because it makes it really tempting to make bad decisions that will cause you to work more. This is a lot of why I don't normally track it.
EDIT: however, it seems so helpful to track it some of the time that I overall strongly recommend doing it for at least a week a year.
I occasionally track my work time for a few weeks at a time; by coincidence I happen to be tracking it at the moment. I used to use Toggl; currently I just track my time in my notebook by noting the time whenever I start and stop working (where by "working" I mean "actively focusing on work stuff"). I am more careful about time tracking my work on my day job (working on longtermist technical research, as an individual contributor and manager) than working on the EAIF and other movement building stuff.
The first four days this week, I did 8h33m, 8h15m, 7h32m... (read more)
That seems correct, but doesn’t really defend Ben’s point, which is what I was criticizing.
I am glad to have you around, of course.
My claim is just that I doubt you thought that if the rate of posts like this was 50% lower, you would have been substantially more likely to get involved with EA; I'd be very interested to hear I was wrong about that.
I think that isn't the right counterfactual, since I got into EA circles despite having only minimal (and net negative) impressions of EA-related forums. So your claim is narrowly true, but if instead the counterfactual was that my first exposure to EA had been the EA Forum, then yes, I think the prominence of this kind of post would have made me substantially less likely to engage.
But fundamentally if we're running either of these counterfactuals I think we're already leaving a bunch of value on the table, as expressed by EricHerboso's post about false dilemmas.
I am not sure whether I think it's a net cost that some people will be put off from EA by posts like this, because I think that people who would bounce off EA because of posts like this aren't obviously net-positive to have in EA. (My main model here is that the behavior described in this post is pretty obviously bad, and the kind of SJ-sympathetic EAs who I expect to be net sources of value probably agree that this behavior is bad. Secondarily, I think that people who are really enthusiastic about EA are pretty likely to stick around even when they're inf... (read more)
I think that people who are really enthusiastic about EA are pretty likely to stick around even when they're infuriated by things EAs are saying.
If you know someone (eg yourself) who you think is a counterargument to this claim of mine, feel free to message me.
I would guess it depends quite a bit on these people's total exposure to EA at the time when they encounter something they find infuriating (or even just somewhat off / getting a vibe that this community probably is "not for them").
If we're imagining people who've already had 10 or even 100 hour... (read more)
I bounce off posts like this. Not sure if you'd consider me net positive or not. :)
More generally, I think our disagreement here probably comes down to something like this:
There's a tradeoff between having a culture where true and important things are easy to say, and a culture where group X feels maximally welcome. As you say, if we're skillful we can do both of these, by being careful about our language and always sounding charitable and not repeatedly making similar posts.
But this comes at a cost. I personally feel much less excited about writing about certain topics because I'd have to be super careful about them. And most of t... (read more)
I don't disagree with any of that. I acknowledge there is real cost in trying to make people feel welcome on top of the community service of speaking up about bad practice (leaving aside the issue of how bad what happened is exactly). I just think there is also some cost, that you are undervaluing and not acknowledging here, in the other side of that trade-off. Maybe we disagree on the exchange rate between these (welcomingness and unfiltered/candid communication)? I think that becoming more skillful at doing both well is an important skill for a community l... (read more)
(I'm writing these comments kind of quickly, sorry for sloppiness.)
With regard to
Where X is the bad thing ACE did, the situation is clearly far more nuanced as to how bad it is than something like sexual misconduct, which, by the time we have decided something deserves that label, is unequivocally bad.
In this particular case, Will seems to agree that X was bad and concerning, which is why my comment felt fair to me.
I would have no meta-level objection to a comment saying "I disagree that X is bad, I think it's actually fine".