All of Buck's Comments + Replies

Power dynamics between people in EA

Unfortunately, reciprocity.io is currently down (as of a few hours ago). I think it will hopefully be back in <24 hours.

EDIT: now back up.

8Nova DasSarma1mo
If you're a Discord user, then https://fletcher.fun/?view=reciprocity fulfills a similar purpose.
Some unfun lessons I learned as a junior grantmaker

If you come across as insulting, someone might say you're an asshole to everyone they talk to for the next five years, which might make it harder for you to do other things you'd hoped to do.

3Vincent van der Holst24d
Not giving feedback on proposals is sometimes seen as insulting as well. We got rejected about 4 times by EA grants without feedback and we probably spent 50 hours writing and honing the proposals. Getting a "No" is harder to swallow than "No, because...". I wasn't insulted because I get all of the reasons for no feedback, but it doesn't leave you feeling happy about all the work that went into it. I also agree with various comments here that the ROI of very short feedback is likely very high, and I don't think it's a big time burden to phrase it in a non-insulting way. I'm going to reapply again in the upcoming months and it's likely we get rejected again. If I knew the reason why I might not reapply or reapply better, both of which would save the grant maker considerable time (perhaps X more than writing one minute feedback).
Some unfun lessons I learned as a junior grantmaker

The problem with saying things like this isn't that they're time-consuming to say, but that they open you up to some risk of the applicant getting really mad at you, and carry various other risks like this. These costs can be mitigated by being careful (e.g. picking phrasings very intentionally, running your proposed feedback by other people), but being careful is time-consuming.

One solution could be to have a document with something like „The 10 most common reasons for rejections“ and send it to people with a disclaimer like „We are wary of giving specific feedback because we worry about [insert reasons]. The reason why I rejected this proposal is well covered among the 10 reasons in this list and it should be fairly clear which ones apply to your proposal, especially if you would go through the list with another person that has read your proposal.“

EA and the current funding situation

I massively disagree re the business class point. In particular, many people (e.g. me) can sleep in business class seats that let you lie flat, when they otherwise would not have slept and would have been quite sad and unproductive.

not worth the 2x or 3x ticket price

As a general point, the ratio between prices is irrelevant to the purchasing choice if you're only buying something once--you only care about the difference in price and the difference in value.
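A quick worked illustration of that point (with made-up prices, not from the post): suppose the economy fare is $1,000 and the lie-flat business fare is $3,000 for a one-off trip. The purchase rule only involves the difference:

```latex
% Illustrative numbers only: assumed $1,000 economy vs. $3,000 business fare.
\text{upgrade} \iff \text{(value of the extra sleep and productivity)} \;>\; \$3{,}000 - \$1{,}000 = \$2{,}000
```

The same $2,000 gap would warrant the same decision whether it showed up as a 3x ratio on a cheap ticket or a 1.2x ratio on an expensive one.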

3DB2mo
Agree with this as a general principle, provided the "difference in value" also takes into account longer-term effects like movement reputational cost. I don't think individuals choosing to fly business class based on productivity calculations has much, if any, movement reputational cost. On the other hand, a prominent EA figure might accurately calculate that they gain one extra productive work hour each week, valued at say $100, by paying someone $50 to brush and floss their teeth for them while they sit there working. This is obviously a fanciful scenario, but I think there are lots of murky areas between flying business class and having a personal teeth brusher where the all-things-considered value calculation isn't trivial. This is especially the case for purchasing decisions that can't easily be converted to work productivity boosts, e.g. buying expensive luxury items for the pleasure they bring.
The case for becoming a black-box investigator of language models

I think that knowing a bit about ML is probably somewhat helpful for this but not very important.

A tale of 2.75 orthogonality theses

What do you mean by “uniform prior” here?

Longtermist EA needs more Phase 2 work

FWIW I think that compared to Chris Olah's old interpretability work, Redwood's adversarial training work feels more like phase 2 work, and our current interpretability work is similarly phase 2.

2Owen Cotton-Barratt2mo
Thanks for this; it made me notice that I was analyzing Chris's work more in far mode and Redwood's more in near mode. Maybe you're right about these comparisons. I'd be interested to understand whether/how you think the adversarial training work could most plausibly be directly applied (or if you just mean "fewer intermediate steps till eventual application", or something else).
Are AGI labs building up important intangibles?

One problem with this estimate is that you don’t end up learning how long the authors spent on the project, or how important their contributions were. My sense is that contributors to industry publications often spent relatively little time on the project compared to academic contributors.

2Rohin Shah3mo
Yeah, good point.
Are AGI labs building up important intangibles?
Answer by Buck, Apr 10, 2022

Anthropic took less than a year to set up large model training infrastructure from scratch but with the benefit of experience. This indicates that infrastructure isn’t currently extremely hard to replicate.

EleutherAI has succeeded at training some fairly large models (the biggest has like 20B params, compared to 540B in PaLM) while basically just being talented amateurs (and also not really having money). These models introduced a simple but novel tweak to the transformer architecture that PaLM used (parallel attention and MLP layers). This suggests that e... (read more)

Are there any AI Safety labs that will hire self-taught ML engineers?

As I understand it, DeepMind doesn’t hire people without PhDs as research scientists, and places more restrictions on what research engineers can do than other places.

3Rohin Shah3mo
Basically true (though technically [https://boards.greenhouse.io/deepmind/jobs/469515?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic] the requirement is "PhD in a technical field or equivalent practical experience") Doesn't seem true to me. Within safety I can name two research engineers who are currently leading research projects. DeepMind might be more explicit that in practice the people who lead research projects will tend to have PhDs. I think this pattern is just because usually people with PhDs are better at leading research projects than people without PhDs. I expect to see the same pattern at OpenAI and Anthropic. If I assigned people to roles based solely on (my evaluation of) capabilities / merit, I'd expect to reproduce that pattern.
"Long-Termism" vs. "Existential Risk"

I think that the longtermist EA community mostly acts as if we're close to the hinge of history, because most influential longtermists disagree with Will on this. If Will's take was more influential, I think we'd do quite different things than we're currently doing.

I'd love to hear what you think we'd be doing differently. With JackM, I think if we thought that hinginess was pretty evenly distributed across centuries ex ante we'd be doing a lot of movement-building and saving, and then distributing some of our resources at the hingiest opportunities we come across at each time interval. And in fact that looks like what we're doing. Would you just expect a bigger focus on investment? I'm not sure I would, given how much EA is poised to grow and how comparably little we've spent so far. (Cf. Phil Trammell's disbursement tool https://www.philiptrammell.com/dpptool/)

I think if we’re at the most influential point in history “EA community building” doesn’t make much sense. As others have said it would probably make more sense to be shouting about why we’re at the most influential point in history i.e. do “x-risk community building” or of course do more direct x-risk work.

I suspect we’d also do less global priorities research (although perhaps we don’t do that much as it is). If you think we’re at the most influential time you probably have a good reason for thinking that (x-risk abnormally high) which then informs wha... (read more)

Are there any AI Safety labs that will hire self-taught ML engineers?
Answer by Buck, Apr 06, 2022

I'm not sure what you mean by "AI safety labs", but Redwood Research, Anthropic, and the OpenAI safety team have all hired self-taught ML engineers. DeepMind has a reputation for being more focused on credentials. Other AI labs don't do as much research that's clearly focused on AI takeover risk.

2Rohin Shah3mo
I'm currently at DeepMind and I'm not really sure where this reputation has come from. As far as I can tell DeepMind would be perfectly happy to hire self-taught ML engineers for the Research Engineer role (but probably not the Research Scientist role; my impression is that this is similar at other orgs). The interview process is focused on evaluating skills, not credentials. DeepMind does get enough applicants that not everyone makes it to the interview stage, so it's possible that self-taught ML engineers are getting rejected before getting a chance to show they know ML. But presumably this is also a problem that Redwood / Anthropic / OpenAI have? Presumably there is some way that self-taught ML engineers are signaling that they are worth interviewing. (As a simple example, if I personally thought someone was worth interviewing, my recommendation would function as a signal for "worth interviewing", and in that situation DeepMind would interview them, and at that point I predict their success would depend primarily on their skills and not their credentials.) If there's some signal of "worth interviewing" that DeepMind is failing to pick up on, I'd love to know that; it's the sort of problem I'd expect DeepMind-the-company to want to fix.
How might a herd of interns help with AI or biosecurity research tasks/questions?
Answer by Buck, Mar 21, 2022

I'm running Redwood Research's interpretability research.

I've considered running an "interpretability mine"--we get 50 interns, put them through a three week training course on transformers and our interpretability tools, and then put them to work on building mechanistic explanations of parts of some model like GPT-2 for the rest of their internship.

My usual joke is "GPT-2 has 12 attention heads per layer and 48 layers. If we had 50 interns and gave them each a different attention head every day, we'd have an intern-day of analysis of each attention head i... (read more)
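Spelling out the arithmetic behind the joke, using the counts as stated in the comment (12 heads per layer, 48 layers, 50 interns, one head per intern per day):

```latex
12 \times 48 = 576 \ \text{attention heads}, \qquad \frac{576\ \text{head-days}}{50\ \text{interns}} \approx 11.5\ \text{working days}
```

i.e. roughly two and a half work-weeks to give every head one intern-day of analysis.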

Native languages in the EA community (and issues with assessing promisingness)

I have some sympathy to this perspective, and suspect you’re totally right about some parts of this.

They misuse jargon like “updating” and “outside view” in an attempt to get their point across, and their interlocutors decide that talking with them is not worth their time.

However, I totally don’t buy this. IMO the concepts of “updating” and “outside view” are important enough and non-quantitative enough that if someone can’t use that jargon correctly after learning it, I’m very skeptical of their ability to contribute intellectually to EA. (Of course, we should explain what those terms mean the first time they run across them.)

9Charles He6mo
I think there is a disagreement that gets at the core of the issue. The examples you mention are well chosen, and the core issue is unnecessary in-group speak.

Updating: this basically means proportionately changing your opinions/worldview with new information. It's a neologism, and we're bastardizing its formal use in Bayesian updating, where it is a term of art for creating a new statistical distribution. So imagine you're in France, trying to vibe with some 200 IQ woman who has training in stats. Spouting off a few of these words in a row might annoy or confuse her. She might turn up her high-IQ Gallic nose and walk away. If you're talking to some 120 IQ dude in China who is really credulous, wants to get into EA, but doesn't know these words and doesn't have a background in stats, and they go home and look up what Bayesian updating means, they might think EAs are literally calculating the posterior for their beliefs, and then wonder what prior they are using. The next day, that dude is going to look really "dumb" because they spent 50x more effort than needed and will ask weird questions about how people are doing algebra in their heads.

Outside View: this is another neologism, and this time it's not really clear what the word means. This is a problem. I've used it various times in different situations to mean different things. No one ever calls me out on this abuse. Maybe that's because I speak fast, use big words, or know math stuff, or maybe I just use the word well, but it's a luxury not everyone has. Once again, that somewhat smug misuse of language could really annoy or disadvantage a new person to EA, even someone perfectly intelligent.

For many non-native speakers, having a conversation in English is quite cognitively demanding – especially when talking about intellectual topics they just learned about. Even reasonably proficient speakers often struggle to express themselves as clearly as they could in their native language; there is a trade-off between fluent speech and optimal word choice/sentence construction. If given 2x more time, or the chance to write down their thoughts, they would possibly not misuse the jargon to the same degree.

Many people get excited about EA when they first h... (read more)

Linch's Shortform

How do you know whether you're happy with the results?

2Rohin Shah7mo
I agree that's a challenge and I don't have a short answer. The part I don't buy is that you have to understand the neural net numbers very well in some "theoretical" sense (i.e. without doing experiments), and that's a blocker for recursive improvement. I was mostly just responding to that. That being said, I would be pretty surprised if "you can't tell what improvements are good" was a major enough blocker that you wouldn't be able to significantly accelerate recursive improvement. It seems like there are so many avenues for making progress:
  • You can meditate a bunch on how and why you want to stay aligned / cooperative with other copies of you before taking the snapshot that you run experiments on.
  • You can run a bunch of experiments on unmodified copies to see which parts of the network are doing what things; then you do brain surgery on the parts that seem most unrelated to your goals (e.g. maybe you can improve your logical reasoning skills).
  • You can create domain-specific modules that e.g. do really good theorem proving or play Go really well or whatever, somehow provide the representations from such modules as an "input" to your mind, and learn to use those representations yourself, in order to gain superhuman intuitions about the domain.
  • You can notice when you've done some specific skill well, look at what in your mind was responsible, and 10x the size of the learning update. (In the specific case where you're still learning through gradient descent, this just means adapting the learning rate based on your evaluation of how well you did.) This potentially allows you to learn new "skills" much faster (think of something like riding a bike, and imagine you could give your brain 10x the update when you did it right).
It's not so much that I think any of these things in particular will work, it's more that given how easy it was to generate these, I expect there to be so many such opportunities, especial
2Linch7mo
Okay now I'm back to being confused.
Linch's Shortform

This argument for the proposition "AI doesn't have an advantage over us at solving the alignment problem" doesn't work for outer alignment—some goals are easier to measure than others, and agents that are lucky enough to have easy-to-measure goals can train AGIs more easily.

What are the bad EA memes? How could we reframe them?

Unfortunately this isn’t a very good description of the concern about AI, and so even if it “polls better” I’d be reluctant to use it.

2Evan_Gaensbauer6mo
I'm aware a problem with AI risk or AI safety is that it doesn't distinguish other AI-related ethics or security concerns from the AI alignment problem, as the EA community's primary concern about advanced AI. I got interesting answers to a question I recently asked on LessWrong [https://www.lesswrong.com/posts/hH2ApqrJvcB7Hw4Bh/resolved-who-else-prefers-ai-alignment-to-ai-safety] about who else has this same attitude towards this kind of conceptual language.
Apply to the ML for Alignment Bootcamp (MLAB) in Berkeley [Jan 3 - Jan 22]

No, the previous application will work fine. Thanks for applying :)

Buck's Shortform

I think it's bad when people who've been around EA for less than a year sign the GWWC pledge. I care a lot about this.

I would prefer groups to strongly discourage new people from signing it.

I can imagine boycotting groups that encouraged signing the GWWC pledge (though I'd probably first want to post about why I feel so strongly about this, and warn them that I was going to do so).

I regret taking the pledge, and the fact that the EA community didn't discourage me from taking it is by far my biggest complaint about how the EA movement has treated me. (EDIT:... (read more)

Hi Buck,

I’m very sorry to hear that you regret taking The Pledge and feel that the EA community in 2014 should have actively discouraged you from taking it in the first place.

If you believe it’s better for you and the world that you unpledge then you should feel free to do so. I also strongly endorse this statement from the 2017 post that KevinO quoted:

“The spirit of the Pledge is not to stop you from doing more good, and is not to lead you to ruin. If you find that it’s doing either of these things, you should probably break the Pledge.”

I would very much ... (read more)

I strongly agree that local groups should encourage people to give for a couple years before taking the GWWC Pledge, and that the Pledge isn't right for everyone (I've been donating 10% since childhood and have never taken the pledge).

When it comes to the 'Further Giving' Pledge, I think it wouldn't be unreasonable to encourage people to get some kind of pre-Pledge counselling or take a pre-Pledge class, to be absolutely certain people have thought through the implications of the commitment they are making.

I'd be pretty interested in you writing this up. I think it could cause some mild changes in the way I treat my salary.

From "Clarifying the Giving What We Can pledge" in 2017 (https://forum.effectivealtruism.org/posts/drJP6FPQaMt66LFGj/clarifying-the-giving-what-we-can-pledge#How_permanent_is_the_Pledge__)

"""
How permanent is the Pledge? 

The Pledge is a promise, or oath, to be made seriously and with every expectation of keeping it. But if someone finds that they can no longer keep the Pledge (for instance due to serious unforeseen circumstances), then they can simply contact us, discuss the matter if need be, and cease to be a member. They can of course rejoin later i... (read more)

8edwardhaigh8mo
I remember there being some sort of text saying you should try a 1% donation for a few years first to check you're happy making the pledge. Perhaps this issue has been resolved since you joined?

...I don't have time to write the full post right now

I'm eager to read the full post, or any expansion on what makes you think that groups should actively discourage newbies from taking the Pledge.

I regret taking the pledge

I feel like you should be able to "unpledge" in that case, and further I don't think you should feel shame or face stigma for this. There's a few reasons I think this:

  • You're working for an EA org. If you think your org is ~as effective as where you'd donate, it doesn't make sense for them to pay you money that you then donate (unless you felt there was some psychological benefit to this, but clearly you feel the reverse)
  • The community has a LOT of money now. I'm not sure what your salary is, but I'd guess it's lower than optimal
... (read more)
We're Redwood Research, we do applied alignment research, AMA

Additionally, what are/how strong are the track records of Redwood's researchers/advisors?


The people we seek advice from on our research most often are Paul Christiano and Ajeya Cotra. Paul is a somewhat experienced ML researcher, who among other things led some of the applied alignment research projects that I am most excited about.

On our team, the people with the most relevant ML experience are probably Daniel Ziegler, who was involved with GPT-3 and also several OpenAI alignment research projects, and Peter Schmidt-Nielsen. Many of our other staff have ... (read more)

We're Redwood Research, we do applied alignment research, AMA

So one thing to note is that I think that there are varying degrees of solving the technical alignment problem. In particular, you’ve solved the alignment problem more if you’ve made it really convenient for labs to use the alignment techniques you know about. If next week some theory people told me “hey we think we’ve solved the alignment problem, you just need to use IDA, imitative generalization, and this new crazy thing we just invented”, then I’d think that the main focus of the applied alignment community should be trying to apply these alignment tec... (read more)

We're Redwood Research, we do applied alignment research, AMA

We could operationalize this as “How does P(doom) vary as a function of the total amount of quality-adjusted x-risk-motivated AI alignment output?” (A related question is “Of the quality-adjusted AI alignment research, how much will be motivated by x-risk concerns?” This second question feels less well defined.)

I’m pretty unsure here. Today, my guess is like 25% chance of x-risk from AI this century, and maybe I imagine that being 15% if we doubled the quantity of quality-adjusted x-risk-motivated AI alignment output, and 35% if we halved that quantity. Bu... (read more)

We're Redwood Research, we do applied alignment research, AMA

Here are some things I think are fairly likely:

  • I think that there might be a bunch of progress on theoretical alignment, with various consequences:
    • More projects that look like “do applied research on various strategies to make imitative generalization work in practice” -- that is, projects where the theory researchers have specific proposals for ML training schemes that have attractive alignment properties, but which have practical implementation questions that might require a bunch of effort to work out. I think that a lot of the impact from applied align
... (read more)
2Chris Leong9mo
What's the main way that you think resources for onboarding people have improved?
We're Redwood Research, we do applied alignment research, AMA

I think this is a great question.

We are researching techniques that are simpler precursors to adversarial training techniques that seem most likely to work if you assume that it’s possible to build systems that are performance-competitive and training-competitive, and do well on average on their training distribution.

There are a variety of reasons to worry that this assumption won’t hold. In particular, it seems plausible that humanity will only have the ability to produce AGIs that will collude with each other if it’s possible for them to do so. This seem... (read more)

1Lukas_Finnveden9mo
Hm, could you expand on why collusion is one of the most salient ways in which "it’s possible to build systems that are performance-competitive and training-competitive, and do well on average on their training distribution" could fail? Is the thought here that — if models can collude — then they can do badly on the training distribution in an unnoticeable way, because they're being checked by models that they can collude with?
We're Redwood Research, we do applied alignment research, AMA

I think our work is aimed at reducing the theory-practice gap of any alignment schemes that attempt to improve worst-case performance by training the model on data that was selected in the hope of eliciting bad behavior from the model. For example, one of the main ingredients of our project is paying people to try to find inputs that trick the model, then training the model on these adversarial examples.


Many different alignment schemes involve some type of adversarial training. The kind of adversarial training we’re doing, where we just rely on human ingen... (read more)
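To make the loop described above concrete, here is a deliberately toy Python sketch (a trivial keyword "classifier" stands in for the real injury-detection model; none of the names or logic reflect Redwood's actual pipeline):

```python
# Toy sketch of human-in-the-loop adversarial training: humans hunt for inputs
# the current classifier gets wrong, and the model is retrained on those failures.

def train(examples):
    """'Train' a trivial keyword classifier from labeled (text, is_injury) pairs."""
    keywords = {w for text, is_injury in examples if is_injury for w in text.split()}
    return lambda text: any(w in keywords for w in text.split())

def human_red_team(classifier, candidate_inputs):
    """Stand-in for paid human attackers: return injurious inputs the model misses."""
    return [(text, True) for text in candidate_inputs if not classifier(text)]

data = [("broke his arm", True), ("drank some tea", False)]
classifier = train(data)
print(classifier("shattered her elbow"))   # False: the model misses this injury

failures = human_red_team(classifier, ["shattered her elbow"])
data.extend(failures)                      # add the human-found adversarial examples
classifier = train(data)                   # retrain on the augmented data
print(classifier("shattered her elbow"))   # True after the adversarial round
```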

We're Redwood Research, we do applied alignment research, AMA

So there’s this core question: "how are the results of this project going to help with the superintelligence alignment problem?" My claim can be broken down as follows:

  • "The problem is relevant": There's a part of the superintelligence alignment problem that is analogous to this problem. I think the problem is relevant for reasons I already tried to spell out here.
  • "The solution is relevant": There's something helpful about getting better at solving this problem. This is what I think you’re asking about, and I haven’t talked as much about why I think the sol
... (read more)
We're Redwood Research, we do applied alignment research, AMA

So to start with, I want to note that I imagine something a lot more like “the alignment community as a whole develops promising techniques, probably with substantial collaboration between research organizations” than “Redwood does all the work themselves”. Among other things, we don’t have active plans to do much theoretical alignment work, and I’d be fairly surprised if it was possible to find techniques I was confident in without more theoretical progress--our current plan is to collaborate with theory researchers elsewhere.

In this comment, I mentioned ... (read more)

We're Redwood Research, we do applied alignment research, AMA

It seems definitely good on the margin if we had ways of harnessing academia to do useful work on alignment. Two reasons for this are that 1. perhaps non-x-risk-motivated researchers would produce valuable contributions, and 2. it would mean that x-risk-motivated researchers inside academia would be less constrained and so more able to do useful work.

Three versions of this:

  • Somehow cause academia to intrinsically care about reducing x-risk, and also ensure that the power structures in academia have a good understanding of the problem, so that its own qualit
... (read more)
We're Redwood Research, we do applied alignment research, AMA

This is a great question and I don't have a good answer.

We're Redwood Research, we do applied alignment research, AMA

One simple model for this is: labs build aligned models if the amount of pressure on them to use sufficiently reliable alignment techniques is greater than the inconvenience associated with using those techniques.

Here are various sources of pressure:

  • Lab leadership
  • Employees of the lab
  • Investors
  • Regulators
  • Customers

In practice, all of these sources of pressure are involved in companies spending resources on, eg, improving animal welfare standards, reducing environmental costs, or DEI (diversity, equity, and inclusion).

And here are various sources of inconvenien... (read more)

1Jack R9mo
Thanks for the response! I found the second set of bullet points especially interesting/novel.
We're Redwood Research, we do applied alignment research, AMA

I think that most questions we care about are either technical or related to alignment. Maybe my coworkers will think of some questions that fit your description. Were you thinking of anything in particular?

2Linch9mo
Well for me, better research on correlates of research performance would be pretty helpful for research hiring. Like it's an open question to me whether I should expect a higher or lower (within-distribution) correlation of {intelligence, work sample tests, structured interviews, resume screenings} to research productivity when compared to the literature on work performance overall. I expect there are similar questions for programming. But the selfish reason I'm interested in asking this is that I plan to work on AI gov/strategy in the near future, and it'll be useful to know if there are specific questions in those domains that you'd like an answer to, as this may help diversify or add to our paths to impact.
We're Redwood Research, we do applied alignment research, AMA

GPT-3 suggests: "We will post the AMA with a disclaimer that the answers are coming from Redwood staff. We will also be sure to include a link to our website in the body of the AMA, with contact information if someone wants to verify with us that an individual is staff."

8Peter Wildeford9mo
That's quite a good answer
We're Redwood Research, we do applied alignment research, AMA

I think the main skillsets required to set up organizations like this are: 

  • Generic competence related to setting up any organization--you need to talk to funders, find office space, fill out lots of IRS forms, decide on a compensation policy, make a website, and so on.
  • Ability to lead relevant research. This requires knowledge of running ML research, knowledge of alignment, and management aptitude.
  • Some way of getting a team, unless you want to start the org out pretty small (which is potentially the right strategy).
  • It’s really helpful to have a bunch o
... (read more)
We're Redwood Research, we do applied alignment research, AMA

Thanks for the kind words!

Our biggest bottlenecks are probably going to be some combination of:

  • Difficulty hiring people who are good at some combination of leading ML research projects, executing on ML research, and reasoning through questions about how to best attack prosaic alignment problems with applied research.
  • A lack of sufficiently compelling applied research available, as a result of theory not being well developed enough.
  • Difficulty with making the organization remain functional and coordinated as it scales.
We're Redwood Research, we do applied alignment research, AMA

In most worlds where we fail to produce value, I think we fail before we spend a hundred researcher-years. So I’m also going to include possibilities for wasting 30 researcher-years in this answer.

Here’s some reasons we might have failed to produce useful research: 

  • We failed to execute well on research. For example, maybe we were incompetent at organizing research projects, or maybe our infrastructure was forever bad, or maybe we couldn’t hire a certain type of person who was required to make the work go well.
  • We executed well on research, but failed o
... (read more)
We're Redwood Research, we do applied alignment research, AMA

Re 1:

It’s probably going to be easier to get good at the infrastructure engineering side of things than the ML side of things, so I’ll assume that that’s what you’re going for.

For our infra engineering role, we want to hire people who are really productive and competent at engineering various web systems quickly. (See the bulleted list of engineering responsibilities on the job page.) There are some people who are qualified for this role without having much professional experience, because they’ve done a lot of Python programming and web programming as hob... (read more)

We're Redwood Research, we do applied alignment research, AMA

I think the best examples would be if we tried to practically implement various schemes that seem theoretically doable and potentially helpful, but quite complicated to do in practice. For example, imitative generalization or the two-head proposal here. I can imagine that it might be quite hard to get industry labs to put in the work of getting imitative generalization to work in practice, and so doing that work (which labs could perhaps then adopt) might have a lot of impact.

Buck's Shortform

Redwood Research is looking for people to help us find flaws in our injury-detecting model. We'll pay $30/hour for this, for up to 2 hours; after that, if you’ve found interesting stuff, we’ll pay you for more of this work at the same rate. I expect our demand for this to last for maybe a month (though we'll probably need more in future).

If you’re interested, please email adam@rdwrs.com so he can add you to a Slack or Discord channel with other people who are working on this. This might be a fun task for people who like being creative, being tricky, and fi... (read more)

2Nathan Young9mo
If you tweet about this I'll tag it with @effective_jobs.
Why AI alignment could be hard with modern deep learning

In other words, if the disagreement was "bottom-up", then you'd expect that at least some people who are optimistic about misalignment risk would be pessimistic about other kinds of AI risk, such as what I call "human safety problems" (see examples here and here) but in fact I don't seem to see anyone whose position is something like, "AI alignment will be easy or likely solved by default, therefore we should focus our efforts on these other kinds of AI-related x-risks that are much more worrying."

FWIW I know some people who explicitly think th... (read more)

Sounds like their positions are not public, since you don't cite anyone by name? Is there any reason for that?

Linch's Shortform

What kinds of things do you think it would be helpful to do cost effectiveness analyses of? Are you looking for cost effectiveness analyses of problem areas or specific interventions?

5Linch10mo
Hmm, one recent example is that somebody casually floated to me an idea that can potentially entirely solve an existential risk (though the solution might have downside risks of its own), and I realized then that I had no idea how much to price the solution in terms of EA $s, like whether it should be closer to $100M, $1B, or $100B. My first gut instinct was to examine the solution and also to probe the downside risks, but then I realized this is thinking about it entirely backwards. The downside risks and operational details don't matter if even the most optimistic cost-effectiveness analysis isn't enough to warrant this being worth funding!
9Denkenberger10mo
I think it would be valuable to see quantitative estimates of more problem areas and interventions. My order of magnitude estimate would be that if one is considering spending $10,000-$100,000, one should do a simple scale, neglectedness, and tractability analysis. But if one is considering spending $100,000-$1 million, one should do an actual cost-effectiveness analysis. So candidates here would be wild animal welfare, approval voting, improving institutional decision-making, climate change from an existential risk perspective, biodiversity from an existential risk perspective, governance of outer space [https://80000hours.org/problem-profiles/#space-governance]etc. Though it is a significant amount of work to get a cost-effectiveness analysis up to peer review publishable quality (which we have found requires moving beyond Guesstimate, e.g. here [http://allfed.info/wp-content/uploads/2018/11/Cost-effectiveness-of-interventions-for-alternate-food-in-the-United-States-to-address-agricultural-catastrophes.pdf] and here [https://link.springer.com/content/pdf/10.1007%2Fs13753-016-0097-2.pdf] ), I still think that there is value in doing a rougher Guesstimate model and having a discussion about parameters. One could even add to one of our Guesstimate models, allowing a direct comparison with AGI safety and resilient foods [https://www.getguesstimate.com/models/13082] or interventions for loss of electricity/industry [https://www.getguesstimate.com/models/11599] from a long-term perspective.
Buck's Shortform

When I was 19, I moved to San Francisco to do a coding bootcamp. I got a bunch better at Ruby programming and also learned a bunch of web technologies (SQL, Rails, JavaScript, etc).

It was a great experience for me, for a bunch of reasons.

  • I got a bunch better at programming and web development.
    • It was a great learning environment for me. We spent basically all day pair programming, which makes it really easy to stay motivated and engaged. And we had homework and readings in the evenings and weekends. I was living in the office at the time, with a bunch o
... (read more)
5Aaron Gertler9mo
See my comment here [https://forum.effectivealtruism.org/posts/Soutcw6ccs8xxyD7v/buck-s-shortform?commentId=6KPrgdc4vYmkT73w3] , which applies to this Shortform as well; I think it would be a strong top-level post, and I'd be interested to see how other users felt about tech bootcamps they attended.
2Jack R10mo
This seems like really good advice, thanks for writing this! Also, I'm compiling a list of CS/ML bootcamps here [https://docs.google.com/spreadsheets/d/1pBBo28bCNVlKvmrzbSkkl2pQKDf_els-98i-S0Gdu6A/edit?usp=sharing ] (anyone should feel free to add items).
Buck's Shortform

Doing lots of good vs getting really rich

Here in the EA community, we’re trying to do lots of good. Recently I’ve been thinking about the similarities and differences between a community focused on doing lots of good and a community focused on getting really rich.

I think this is interesting for a few reasons:

  • I found it clarifying to articulate the main differences between how we should behave and how the wealth-seeking community should behave.
  • I think that EAs make mistakes that you can notice by thinking about how the wealth-seeking community would beh
... (read more)
6Aaron Gertler9mo
I'm commenting on a few Shortforms I think should be top-level posts so that more people see them, they can be tagged, etc. This is one of the clearest cases I've seen; I think the comparison is really interesting, and a lot of people who are promising EA candidates will have "become really rich" as a viable option, such that they'd benefit especially from thinking about this comparison themselves. Anyway, would you consider making this a top-level post? I don't think the text would need to be edited at all — it could be as-is, plus a link to the Shortform comments.
2Ben_West10mo
Thanks for writing this up. At the risk of asking an obvious question, I'm interested in why you think entrepreneurship is valuable in EA. One explanation for why entrepreneurship has high financial returns is information asymmetry/adverse selection: it's hard to tell if someone is a good CEO apart from "does their business do well", so they are forced to have their compensation tied closely to business outcomes (instead of something like "does their manager think they are doing a good job"), which have high variance; as a result of this variance and people being risk-averse, expected returns need to be high in order to compensate these entrepreneurs. It's not obvious to me that this information asymmetry exists in EA. E.g. I expect "Buck thinks X is a good group leader" correlates better with "X is a good group leader" than "Buck thinks X will be a successful startup" correlates with "X is a successful startup". It seems like there might be a "market failure" in EA where people can reasonably be known to be doing good work, but are not compensated appropriately for their work, unless they do some weird bespoke thing.
2Jamie_Harris10mo
Maybe there's some lesson to be learned. And I do think that EAs should often aspire to be more entrepreneurial. But maybe the main lesson is for the people trying to get really rich, not the other way round. I imagine both communities have their biases. I imagine that lots of people try entrepreneurial schemes for similar reasons to why lots of people buy lottery tickets. And I'd guess that this often has to do with scope neglect, excessive self-confidence / sense of exceptionalism, and/or desperation.
6Ben Pace10mo
Something I imagined while reading this was being part of a strangely massive (~1000 person) extended family whose goal was to increase the net wealth of the family. I think it would be natural to join one of the family businesses, it would be natural to make your own startup, and also it would be somewhat natural to provide services for the family that aren't directly about making the money yourself. Helping make connections, find housing, etc.

Thanks, this is an interesting analogy. 

If too few EAs go into more bespoke roles, then one reason could be risk-aversion. Rightly or wrongly, they may view those paths as more insecure and risky (for them personally; though I expect personal and altruistic risk correlate to a fair degree). If so, then one possibility is that EA funders and institutions/orgs should try to make them less risky or otherwise more appealing (there may already be some such projects).

In recent years, EA has put less emphasis on self-sacrifice, arguing that we can't expect p... (read more)

Buck's Shortform

Yeah but this pledge is kind of weird for an altruist to actually follow, instead of donating more above the 10%. (Unless you think that almost everyone believes that most of the reason for them to do the GWWC pledge is to enforce the norm, and this causes them to donate 10%, which is more than they'd otherwise donate.)

2Linch1y
I thought you were making an empirical claim with the quoted sentence, not a normative claim.
Buck's Shortform

[This is an excerpt from a longer post I'm writing]

Suppose someone’s utility function is

U = f(C) + D

Where U is what they’re optimizing, C is their personal consumption, f is their selfish welfare as a function of consumption (log is a classic choice for f), and D is their amount of donations.

Suppose that they have diminishing utility wrt (“with respect to”) consumption (that is, df(C)/dC is strictly monotonically decreasing). Their marginal utility wrt donations is a constant, and their marginal utility wrt consumption is a decreasing function. There has t... (read more)
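Completing the sketch under the stated assumptions (my derivation, not the excerpted post; I additionally assume a fixed income Y split as C + D = Y): the marginal utility of a donated dollar is a constant 1, while the marginal utility of a consumed dollar, f'(C), falls as C grows, so for a choice like f = log the two cross at some consumption level C*:

```latex
% Assumes budget constraint C + D = Y, which the excerpt does not state explicitly.
\max_{0 \le C \le Y} \; f(C) + (Y - C)
\;\;\Longrightarrow\;\; f'(C^{*}) = 1
\;\;\Longrightarrow\;\; D^{*} = Y - C^{*}
```

So this utility function implies a consumption cap rather than a percentage: the optimizer consumes a fixed amount C* (which depends only on f, not on income) and donates every marginal dollar above it.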

3Stefan_Schubert1y
The GWWC pledge is akin to a flat tax, as opposed to a progressive tax - which gives you a higher tax rate when you earn more. I agree that there are some arguments in favour of "progressive donations". One consideration is that extremely high "donation rates" - e.g. donating 100% of your income above a certain amount - may affect incentives to earn more adversely, depending on your motivations. But in a progressive donation rate system with a more moderate maximum donation rate that would probably not be as much of a problem.
2Linch1y
Wait, the standard GWWC pledge is 10% of your income, presumably based on cultural norms like tithing, which in themselves might reflect an implicit understanding that (if we assume log utility) a constant fraction of consumption is equally costly to any individual, so it was made for coordination rather than single-player reasons.
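A quick way to see the log-utility point in the comment above (my arithmetic, not Linch's): donating a fixed fraction t of income, and consuming the rest, costs

```latex
% Cost in utils of donating fraction t, assuming utility log(C) over consumption C.
\log C - \log\big((1 - t)\,C\big) = -\log(1 - t)
```

which is independent of C, so a 10% pledge is "equally costly" at every income level under that assumption.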
Buck's Shortform

[epistemic status: I'm like 80% sure I'm right here. Will probably post as a main post if no-one points out big holes in this argument, and people seem to think I phrased my points comprehensibly. Feel free to leave comments on the google doc here if that's easier.]

I think a lot of EAs are pretty confused about Shapley values and what they can do for you. In particular Shapley values are basically irrelevant to problems related to coordination between a bunch of people who all have the same values. I want to talk about why. 

So Shapley values are a sol... (read more)
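For reference, here is a minimal sketch of the standard Shapley-value computation (toy numbers and player names are mine, not from the post): each player's value is their marginal contribution, averaged over coalition orderings.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Shapley values for a characteristic function v: frozenset of players -> value."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(len(others) + 1):
            for coalition in combinations(others, r):
                s = frozenset(coalition)
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                total += weight * (v(s | {i}) - v(s))  # weighted marginal contribution
        values[i] = total
    return values

# Toy example: a funder and a founder create $10,000 of value together, nothing alone.
v = lambda s: 10_000 if s == frozenset({"funder", "founder"}) else 0
print(shapley_values(["funder", "founder"], v))  # {'funder': 5000.0, 'founder': 5000.0}
```

The credits sum to the total impact, which is consistent with the excerpt's point: when everyone shares the same values, this kind of credit allocation doesn't change what anyone should actually do.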

2NunoSempere1y
This seems correct.

This misses some considerations around cost-efficiency/prioritization. If you look at your distorted "Buck values", you come away thinking that Buck is super cost-effective; responsible for a large fraction of the optimal plan using just one salary. If we didn't have a mechanistic understanding of why that was, trying to get more Buck would become an EA cause area. In contrast, if credit was allocated according to Shapley values, we could look at the groups whose Shapley value is the highest, and try to see if they can be scaled.

The section about "purely local" Shapley values might be pointing to something, but I don't quite know what it is, because the example is just Shapley values but missing a term? I don't know. You also say "by symmetry...", and then break that symmetry by saying that one of the parts would have been able to create $6,000 in value and the other $0. Needs a crisper example.

Re: coordination between people who have different values using SVs, I have some stuff here [https://forum.effectivealtruism.org/posts/3NYDwGvDbhwenpDHb/shapley-values-ii-philantropic-coordination-theory-and-other#Philantropic_Coordination_Theory_], but looking back the writing seems too corny.

Lastly, to some extent, Shapley values are a reaction to people calculating their impact as their counterfactual impact. This leads to double/triple counting impact for some organizations/opportunities, but not others, which makes comparison between them more tricky. Shapley values solve that by allocating impact such that it sums to the total impact & other nice properties. Then someone like OpenPhilanthropy or some EA fund can come and see
EA Infrastructure Fund: Ask us anything!

I am not sure. I think it’s pretty likely I would want to fund after risk adjustment. I think that if you are considering trying to get funded this way, you should consider reaching out to me first.

6Jonas Vollmer1y
I'm also in favor of EA Funds doing generous back payments for successful projects. In general, I feel interested in setting up prize programs at EA Funds (though it's not a top priority). One issue is that it's harder to demonstrate to regulators that back payments serve a charitable purpose. However, I'm confident that we can find workarounds for that.
EA Infrastructure Fund: Ask us anything!

I would personally be pretty down for funding reimbursements for past expenses.

2Linch1y
That's great to hear! But to be clear, not for risk adjustment? Or are you just not sure on that point?
4Max_Daniel1y
I haven't thought a ton about the implications of this, but my initial reaction also is to generally be open to this. So if you're reading this and are wondering if it could be worth it to submit an application for funding for past expenses, then I think the answer is we'd at least consider it and so potentially yes. If you're reading this and it really matters to you what the EAIF's policy on this is going forward (e.g., if it's decision-relevant for some project you might start soon), you might want to check with me before going ahead. I'm not sure I'll be able to say anything more definitive, but it's at least possible. And to be clear, so far all that we have are the personal views of two EAIF managers not a considered opinion or policy of all fund managers or the fund as a whole or anything like that.
4Habryka1y
I would also be in favor of the LTFF doing this.
EA Infrastructure Fund: Ask us anything!

This is indeed my belief about ex ante impact. Thanks for the clarification.
