Unfortunately, reciprocity.io is currently down (as of a few hours ago). I hope it will be back within 24 hours.
EDIT: now back up.
If you come across as insulting, someone might spend the next five years telling everyone they talk to that you're an asshole, which might make it harder for you to do other things you'd hoped to do.
The problem with saying things like this isn't that they're time-consuming to say, but that they open you up to some risk of the applicant getting really mad at you, and carry various other risks like this. These costs can be mitigated by being careful (eg picking phrasings very intentionally, running your proposed feedback by other people), but being careful is time-consuming.
One solution could be to have a document with something like "The 10 most common reasons for rejections" and send it to people with a disclaimer like "We are wary of giving specific feedback because we worry about [insert reasons]. The reason why I rejected this proposal is well covered among the 10 reasons in this list, and it should be fairly clear which ones apply to your proposal, especially if you go through the list with another person who has read your proposal."
I massively disagree re the business class point. In particular, many people (e.g. me) can sleep in business class seats that let you lie flat, whereas they would otherwise not have slept and would have been quite sad and unproductive.
not worth the 2x or 3x ticket price
As a general point, the ratio between prices is irrelevant to the purchasing choice if you're only buying something once--you only care about the difference in price and the difference in value.
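To make that concrete with made-up numbers: if economy is $1,000 and lie-flat business is $3,000, the "3x" framing is a distraction. The actual question is whether the extra $2,000 buys you more than $2,000 of value (e.g. a night of sleep and a productive day on arrival); the ratio only matters insofar as it happens to track the size of the absolute difference.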
I think that knowing a bit about ML is probably somewhat helpful for this but not very important.
What do you mean by “uniform prior” here?
FWIW I think that compared to Chris Olah's old interpretability work, Redwood's adversarial training work feels more like phase 2 work, and our current interpretability work is similarly phase 2.
One problem with this estimate is that you don’t end up learning how long the authors spent on the project, or how important their contributions were. My sense is that contributors to industry publications often spent relatively little time on the project compared to academic contributors.
Anthropic took less than a year to set up large model training infrastructure from scratch but with the benefit of experience. This indicates that infrastructure isn’t currently extremely hard to replicate.
EleutherAI has succeeded at training some fairly large models (the biggest has like 20B params, compared to 540B in PaLM) while basically just being talented amateurs (and also not really having money). These models introduced a simple but novel tweak to the transformer architecture that PaLM used (parallel attention and MLP layers). This suggests that e...
As I understand it, DeepMind doesn’t hire people without PhDs as research scientists, and places more restrictions on what research engineers can do than other places.
I think that the longtermist EA community mostly acts as if we're close to the hinge of history, because most influential longtermists disagree with Will on this. If Will's take was more influential, I think we'd do quite different things than we're currently doing.
I'd love to hear what you think we'd be doing differently. Like JackM, I think that if we thought hinginess was pretty evenly distributed across centuries ex ante, we'd be doing a lot of movement-building and saving, and then distributing some of our resources at the hingiest opportunities we come across at each time interval. And in fact that looks like what we're doing. Would you just expect a bigger focus on investment? I'm not sure I would, given how much EA is poised to grow and how comparatively little we've spent so far. (Cf. Phil Trammell's disbursement tool https://www.philiptrammell.com/dpptool/)
I think if we’re at the most influential point in history “EA community building” doesn’t make much sense. As others have said it would probably make more sense to be shouting about why we’re at the most influential point in history i.e. do “x-risk community building” or of course do more direct x-risk work.
I suspect we’d also do less global priorities research (although perhaps we don’t do that much as it is). If you think we’re at the most influential time you probably have a good reason for thinking that (x-risk abnormally high) which then informs wha...
I'm not sure what you mean by "AI safety labs", but Redwood Research, Anthropic, and the OpenAI safety team have all hired self-taught ML engineers. DeepMind has a reputation for being more focused on credentials. Other AI labs don't do as much research that's clearly focused on AI takeover risk.
I'm running Redwood Research's interpretability research.
I've considered running an "interpretability mine"--we get 50 interns, put them through a three-week training course on transformers and our interpretability tools, and then put them to work on building mechanistic explanations of parts of some model like GPT-2 for the rest of their internship.
My usual joke is "GPT-2 has 12 attention heads per layer and 48 layers. If we had 50 interns and gave them each a different attention head every day, we'd have an intern-day of analysis of each attention head i...
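For scale, taking those numbers at face value: 12 heads per layer × 48 layers = 576 attention heads, so 50 interns each analyzing one head per day would take roughly 576 / 50 ≈ 12 working days to give every head an intern-day of attention.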
I have some sympathy for this perspective, and suspect you’re totally right about some parts of this.
They misuse jargon like “updating” and “outside view” in an attempt to get their point across, and their interlocutors decide that talking with them is not worth their time.
However, I totally don’t buy this. IMO the concepts of “updating” and “outside view” are important enough and non-quantitative enough that if someone can’t use that jargon correctly after learning it, I’m very skeptical of their ability to contribute intellectually to EA. (Of course, we should explain what those terms mean the first time they run across them.)
For many non-native speakers, having a conversation in English is quite cognitively demanding, especially when talking about intellectual topics they just learned about. Even reasonably proficient speakers often struggle to express themselves as clearly as they could in their native language; there is a trade-off between fluent speech and optimal word choice/sentence construction. If given 2x more time, or the chance to write down their thoughts, they would possibly not misuse the jargon to the same degree. Many people get excited about EA when they first h...
How do you know whether you're happy with the results?
This argument for the proposition "AI doesn't have an advantage over us at solving the alignment problem" doesn't work for outer alignment—some goals are easier to measure than others, and agents that are lucky enough to have easy-to-measure goals can train AGIs more easily.
Unfortunately this isn’t a very good description of the concern about AI, and so even if it “polls better” I’d be reluctant to use it.
No, the previous application will work fine. Thanks for applying :)
I think it's bad when people who've been around EA for less than a year sign the GWWC pledge. I care a lot about this. I would prefer groups to strongly discourage new people from signing it. I can imagine boycotting groups that encouraged signing the GWWC pledge (though I'd probably first want to post about why I feel so strongly about this, and warn them that I was going to do so). I regret taking the pledge, and the fact that the EA community didn't discourage me from taking it is by far my biggest complaint about how the EA movement has treated me. (EDIT: ...
I’m very sorry to hear that you regret taking The Pledge and feel that the EA community in 2014 should have actively discouraged you from taking it in the first place.
If you believe it’s better for you and the world that you unpledge then you should feel free to do so. I also strongly endorse this statement from the 2017 post that KevinO quoted:
“The spirit of the Pledge is not to stop you from doing more good, and is not to lead you to ruin. If you find that it’s doing either of these things, you should probably break the Pledge.”
I would very much ...
I strongly agree that local groups should encourage people to give for a couple years before taking the GWWC Pledge, and that the Pledge isn't right for everyone (I've been donating 10% since childhood and have never taken the pledge).
When it comes to the 'Further Giving' Pledge, I think it wouldn't be unreasonable to encourage people to get some kind of pre-Pledge counselling or take a pre-Pledge class, to be absolutely certain people have thought through the implications of the commitment they are making.
I'd be pretty interested in you writing this up. I think it could cause some mild changes in the way I treat my salary.
From "Clarifying the Giving What We Can pledge" in 2017 (https://forum.effectivealtruism.org/posts/drJP6FPQaMt66LFGj/clarifying-the-giving-what-we-can-pledge#How_permanent_is_the_Pledge__)"""How permanent is the Pledge?
The Pledge is a promise, or oath, to be made seriously and with every expectation of keeping it. But if someone finds that they can no longer keep the Pledge (for instance due to serious unforeseen circumstances), then they can simply contact us, discuss the matter if need be, and cease to be a member. They can of course rejoin later i...
...I don't have time to write the full post right now
I'm eager to read the full post, or any expansion on what makes you think that groups should actively discourage newbies from taking the Pledge.
I regret taking the pledge
I feel like you should be able to "unpledge" in that case, and further I don't think you should feel shame or face stigma for this. There are a few reasons I think this:
Additionally, what are/how strong are the track records of Redwood's researchers/advisors?
The people we seek advice from on our research most often are Paul Christiano and Ajeya Cotra. Paul is a somewhat experienced ML researcher, who among other things led some of the applied alignment research projects that I am most excited about.
On our team, the people with the most relevant ML experience are probably Daniel Ziegler, who was involved with GPT-3 and also several OpenAI alignment research projects, and Peter Schmidt-Nielsen. Many of our other staff have ...
So one thing to note is that I think that there are varying degrees of solving the technical alignment problem. In particular, you’ve solved the alignment problem more if you’ve made it really convenient for labs to use the alignment techniques you know about. If next week some theory people told me “hey we think we’ve solved the alignment problem, you just need to use IDA, imitative generalization, and this new crazy thing we just invented”, then I’d think that the main focus of the applied alignment community should be trying to apply these alignment tec...
We could operationalize this as “How does P(doom) vary as a function of the total amount of quality-adjusted x-risk-motivated AI alignment output?” (A related question is “Of the quality-adjusted AI alignment research, how much will be motivated by x-risk concerns?” This second question feels less well defined.)
I’m pretty unsure here. Today, my guess is like 25% chance of x-risk from AI this century, and maybe I imagine that being 15% if we doubled the quantity of quality-adjusted x-risk-motivated AI alignment output, and 35% if we halved that quantity. Bu...
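Purely as a way of restating those guesses (not a claim about the true functional form): the three numbers are consistent with roughly P(doom) ≈ 25% − 10% × log2(m), where m is the multiplier on quality-adjusted x-risk-motivated alignment output. That is, each doubling or halving of output moves my guess by about 10 percentage points, at least locally around current levels.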
Here are some things I think are fairly likely:
I think this is a great question.
We are researching simpler precursors to the adversarial training techniques that seem most likely to work if you assume that it’s possible to build systems that are performance-competitive and training-competitive, and that do well on average on their training distribution.
There are a variety of reasons to worry that this assumption won’t hold. In particular, it seems plausible that humanity will only have the ability to produce AGIs that will collude with each other if it’s possible for them to do so. This seem...
I think our work is aimed at reducing the theory-practice gap of any alignment schemes that attempt to improve worst-case performance by training the model on data that was selected in the hope of eliciting bad behavior from the model. For example, one of the main ingredients of our project is paying people to try to find inputs that trick the model, then training the model on these adversarial examples.
Many different alignment schemes involve some type of adversarial training. The kind of adversarial training we’re doing, where we just rely on human ingen...
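As a heavily simplified illustration of that loop (not our actual pipeline; the model, data, and "adversary" below are toy stand-ins I'm making up for this sketch): a small text classifier is trained, an adversary searches a pool of candidate inputs for ones the current model misclassifies, and those failures get folded back into the training set.

```python
# A minimal sketch of human-in-the-loop adversarial training (illustrative only;
# the model, data, and "red team" here are hypothetical toy stand-ins).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy task: label 1 if the snippet describes an injury, else 0.
train_texts = ["he broke his arm", "she sprained her ankle",
               "they ate lunch together", "he read a book quietly"]
train_labels = [1, 1, 0, 0]

# Hypothetical pool of tricky inputs that red-teamers might come up with.
candidate_attacks = [
    ("the fall left him with a fractured wrist", 1),
    ("the glass shattered harmlessly on the floor", 0),
    ("her knee gave out halfway down the stairs", 1),
]

def train(texts, labels):
    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(texts, labels)
    return model

model = train(train_texts, train_labels)

for _ in range(3):  # a few rounds of adversarial training
    # "Red-teaming" step: keep only the attacks the current model gets wrong.
    failures = [(t, y) for t, y in candidate_attacks if model.predict([t])[0] != y]
    if not failures:
        break
    # Fold the discovered failures back into the training data and retrain.
    train_texts += [t for t, _ in failures]
    train_labels += [y for _, y in failures]
    model = train(train_texts, train_labels)
```

In the real setting the "red team" is people being paid to fool the model and the classifier is a large language model, but the shape of the loop is the same.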
So there’s this core question: "how are the results of this project going to help with the superintelligence alignment problem?" My claim can be broken down as follows:
So to start with, I want to note that I imagine something a lot more like “the alignment community as a whole develops promising techniques, probably with substantial collaboration between research organizations” than “Redwood does all the work themselves”. Among other things, we don’t have active plans to do much theoretical alignment work, and I’d be fairly surprised if it was possible to find techniques I was confident in without more theoretical progress--our current plan is to collaborate with theory researchers elsewhere.
In this comment, I mentioned ...
It would definitely be good on the margin if we had ways of harnessing academia to do useful work on alignment. Two reasons for this are that 1. perhaps non-x-risk-motivated researchers would produce valuable contributions, and 2. it would mean that x-risk-motivated researchers inside academia would be less constrained and so more able to do useful work.
Three versions of this:
This is a great question and I don't have a good answer.
One simple model for this is: labs build aligned models if the amount of pressure on them to use sufficiently reliable alignment techniques is greater than the inconvenience associated with using those techniques.
Here are various sources of pressure:
In practice, all of these sources of pressure are involved in companies spending resources on, eg, improving animal welfare standards, reducing environmental costs, or DEI (diversity, equity, and inclusion).
And here are various sources of inconvenien...
I think that most questions we care about are either technical or related to alignment. Maybe my coworkers will think of some questions that fit your description. Were you thinking of anything in particular?
GPT-3 suggests: "We will post the AMA with a disclaimer that the answers are coming from Redwood staff. We will also be sure to include a link to our website in the body of the AMA, with contact information if someone wants to verify with us that an individual is staff."
I think the main skillsets required to set up organizations like this are:
Thanks for the kind words!
Our biggest bottlenecks are probably going to be some combination of:
In most worlds where we fail to produce value, I think we fail before we spend a hundred researcher-years. So I’m also going to include possibilities for wasting 30 researcher-years in this answer.
Here’s some reasons we might have failed to produce useful research:
It’s probably going to be easier to get good at the infrastructure engineering side of things than the ML side of things, so I’ll assume that that’s what you’re going for.
For our infra engineering role, we want to hire people who are really productive and competent at engineering various web systems quickly. (See the bulleted list of engineering responsibilities on the job page.) There are some people who are qualified for this role without having much professional experience, because they’ve done a lot of Python programming and web programming as hob...
I think the best examples would be if we tried to practically implement various schemes that seem theoretically doable and potentially helpful, but quite complicated to do in practice. For example, imitative generalization or the two-head proposal here. I can imagine that it might be quite hard to get industry labs to put in the work of getting imitative generalization to work in practice, and so doing that work (which labs could perhaps then adopt) might have a lot of impact.
Redwood Research is looking for people to help us find flaws in our injury-detecting model. We'll pay $30/hour for this, for up to 2 hours; after that, if you’ve found interesting stuff, we’ll pay you for more of this work at the same rate. I expect our demand for this to last for maybe a month (though we'll probably need more in future). If you’re interested, please email firstname.lastname@example.org so he can add you to a Slack or Discord channel with other people who are working on this. This might be a fun task for people who like being creative, being tricky, and fi...
In other words, if the disagreement was "bottom-up", then you'd expect that at least some people who are optimistic about misalignment risk would be pessimistic about other kinds of AI risk, such as what I call "human safety problems" (see examples here and here) but in fact I don't seem to see anyone whose position is something like, "AI alignment will be easy or likely solved by default, therefore we should focus our efforts on these other kinds of AI-related x-risks that are much more worrying."
FWIW I know some people who explicitly think th...
Sounds like their positions are not public, since you don't cite anyone by name? Is there any reason for that?
What kinds of things do you think it would be helpful to do cost effectiveness analyses of? Are you looking for cost effectiveness analyses of problem areas or specific interventions?
It was a great experience for me, for a bunch of reasons.
Here in the EA community, we’re trying to do lots of good. Recently I’ve been thinking about the similarities and differences between a community focused on doing lots of good and a community focused on getting really rich.
I think this is interesting for a few reasons:
Thanks, this is an interesting analogy.
If too few EAs go into more bespoke roles, then one reason could be risk-aversion. Rightly or wrongly, they may view those paths as more insecure and risky (for them personally; though I expect personal and altruistic risk correlate to a fair degree). If so, then one possibility is that EA funders and institutions/orgs should try to make them less risky or otherwise more appealing (there may already be some such projects).
In recent years, EA has put less emphasis on self-sacrifice, arguing that we can't expect p...
Yeah but this pledge is kind of weird for an altruist to actually follow, instead of donating more above the 10%. (Unless you think that almost everyone believes that most of the reason for them to do the GWWC pledge is to enforce the norm, and this causes them to donate 10%, which is more than they'd otherwise donate.)
[This is an excerpt from a longer post I'm writing]
Suppose someone’s utility function is
U = f(C) + D
where U is what they’re optimizing, C is their personal consumption, f is their selfish welfare as a function of consumption (log is a classic choice for f), and D is their amount of donations.
Suppose that they have diminishing utility wrt (“with respect to”) consumption (that is, df(C)/dC is strictly monotonically decreasing). Their marginal utility wrt donations is a constant, and their marginal utility wrt consumption is a decreasing function. There has t...
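To spell out one version of the arithmetic (using the notation above, and assuming a fixed income W so that D = W − C): maximizing U = f(C) + (W − C) over C gives the first-order condition df(C)/dC = 1, so the optimal consumption level C* is wherever the marginal selfish value of a dollar of consumption falls to the marginal value of a dollar donated. With f = log, for example, C* = 1 (in whatever units consumption is measured), independent of W; every additional dollar of income above that level goes to donations.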
[epistemic status: I'm like 80% sure I'm right here. Will probably post as a main post if no-one points out big holes in this argument, and people seem to think I phrased my points comprehensibly. Feel free to leave comments on the google doc here if that's easier.]
I think a lot of EAs are pretty confused about Shapley values and what they can do for you. In particular Shapley values are basically irrelevant to problems related to coordination between a bunch of people who all have the same values. I want to talk about why.
So Shapley values are a sol...
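For readers who haven't seen them, here is a minimal sketch of the standard Shapley value computation (average marginal contribution over all orderings of the players) on a toy cooperative game, just to make the object concrete. The players and payoffs are made up for illustration.

```python
# Standard Shapley value computation for a small cooperative game (illustrative).
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    # value: function from a frozenset of players to that coalition's payoff.
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            # Marginal contribution of p given who has already joined.
            totals[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}

# Toy example: a project worth 10 happens only if both a funder and a charity participate.
v = lambda s: 10 if s == frozenset({"funder", "charity"}) else 0
print(shapley_values(["funder", "charity"], v))  # {'funder': 5.0, 'charity': 5.0}
```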
I am not sure. I think it’s pretty likely I would want to fund after risk adjustment. I think that if you are considering trying to get funded this way, you should consider reaching out to me first.
I would personally be pretty down for funding reimbursements for past expenses.
This is indeed my belief about ex ante impact. Thanks for the clarification.