All of evhub's Comments + Replies

I won't repeat my full LessWrong comment here in detail; instead I'd just recommend heading over there and reading it and the associated comment chain. The bottom-line summary is that, in trying to cover some heavy information theory regarding how to reason about simplicity priors and counting arguments without actually engaging with the proper underlying formalism, this post commits a subtle but basic mathematical mistake that makes the whole argument fall apart.

I think this is a very good point, and it definitely gives me some pause—and probably my original statement there was too strong. Certainly I agree that you need to do evaluations using the best possible scaffolding that you have, but overall my sense is that this problem is not that bad. Some reasons to think that:

  • At least currently, scaffolding-related performance improvements don't seem to generally be that large (e.g. chain-of-thought is just not that helpful on most tasks), especially relative to the gains from scaling.
  • You can evaluate pretty direc
... (read more)

Cross-posted from LessWrong.

One reason I'm critical of the Anthropic RSP is that it does not make it clear under what conditions it would actually pause, or for how long, or under what safeguards it would determine it's OK to keep going.

It's hard to take anything else you're saying seriously when you say things like this; it seems clear that you just haven't read Anthropic's RSP. I think that the current conditions and resulting safeguards are insufficient to prevent AI existential risk, but to say that it doesn't make them clear is just patently false... (read more)

I think calling a take "lazy", which could indeed be considered "mean", is not a very helpful approach; you could have made your point without that kind of derision. There are going to be a lot of misunderstandings and hot takes around RSPs, and I think AI company employees especially should err heavily on the side of patience and kind understanding if they want to avoid people becoming more adversarial towards them.

Live by the sword, die by the sword.

Akash said...

"that it does not make it clear under what conditions it would actually pause, or for how long... (read more)

Cross-posted with LessWrong.

I found this post very frustrating, because it's almost all dedicated to whether current RSPs are sufficient or not (I agree that they are insufficient), but that's not my crux and I don't think it's anyone else's crux either. And for what I think is probably the actual crux here, you only have one small throwaway paragraph:

Which brings us to the question: “what’s the effect of RSPs on policy and would it be good if governments implemented those”. My answer to that is: An extremely ambitious version yes; the misleading version

... (read more)

The RSP angle is part of the corporate "big AI" "business as usual" agenda. To those of us playing the outside game it seems very close to safetywashing.

I've written up more about why I think this is not true here.

2
Greg_Colbourn
6mo
Thanks. I'm not convinced.

Have the resumption condition be a global consensus on an x-safety solution or a global democratic mandate for restarting (and remember there are more components of x-safety than just alignment - also misuse and multi-agent coordination).

This seems basically unachievable, and even if it were achievable it doesn't even seem like the right thing to do—I don't actually trust the global median voter to judge whether additional scaling is safe or not. I'd much rather have rigorous technical standards than nebulous democratic standards.

I think it's pushing it

... (read more)
2
Greg_Colbourn
6mo
Fair. And where I say "global consensus on an x-safety", I mean expert opinion (as I say in the OP). I expect the public to remain generally a lot more conservative than the technical experts though, in terms of risk they are willing to tolerate. The RSP angle is part of the corporate "big AI" "business as usual" agenda. To those of us playing the outside game it seems very close to safetywashing.

This is soon enough to be pushing as hard as we can for a pause right now!

I mean, yes, obviously we should be doing everything we can right now. I just think that a RSP-gated pause is the right way to do a pause. I'm not even sure what it would mean to do a pause without an RSP-like resumption condition.

Why try and take it right down to the wire with RSPs?

Because it's more likely to succeed. RSPs provide very clear and legible risk-based criteria that are much more plausibly things that you could actually get a government to agree to.

The tradeoff

... (read more)
6
Greg_Colbourn
6mo
Have the resumption condition be a global consensus on an x-safety solution or a global democratic mandate for restarting (and remember there are more components of x-safety than just alignment - also misuse and multi-agent coordination). I think if governments actually properly appreciated the risks, they could agree to an unconditional pause.  Sorry. I'm looking at it at the company level. Please don't take my critiques as being directed at you personally. What is in it for Anthropic and OpenAI and DeepMind to keep going with scaling? Money and power, right? I think it's pushing it a bit at this stage to say that they, as companies, are primarily concerned with reducing x-risk. If they were they would've stopped scaling already. Forget the (suicide) race. Set an example to everyone and just stop!

But what about dangerous capabilities that have more to do with AI takeover (e.g., a company develops a system that shows signs of autonomous replication, manipulation, power-seeking, deception) or scientific capabilities (e.g., the ability to develop better AI systems)?

Supposing that 3-10 other companies are within a few months of these systems, do you think at this point we need a coordinated pause, or would it be fine to just force company 1 to pause?

What should happen there is that the leading lab is forced to stop and try to demonstrate that e.g. t... (read more)

Perhaps the crux is related to how dangerous you think current models are? I'm quite confident that we have at least a couple additional orders of magnitude of scaling before the world ends, so I'm not too worried about stopping training of current models, or even next-generation models. But I do start to get worried with next-next-generation models.

So, in my view, the key is to make sure that we have a well-enforced Responsible Scaling Policy (RSP) regime that is capable of preventing scaling unless hard safety metrics are met (I favor understanding-based... (read more)

0
Greg_Colbourn
6mo
I don't think the current models are dangerous, but perhaps they could be if used for long enough on improving AI. A couple of orders of magnitude (or a couple of generations) is only a couple of years! This is soon enough to be pushing as hard as we can for a pause right now! Why try and take it right down to the wire with RSPs? Why over-complicate things? The stakes couldn't be bigger (extinction). It's super reckless to not just be saying "It seems quite likely we're getting to world-ending models in 2-5 years. Let's not keep going any longer. Let's just stop now." The tradeoff [edit: for Anthropic] for a few tens of $Bs of extra profit really doesn't seem worth it!
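(To make the "a couple of orders of magnitude is only a couple of years" arithmetic concrete: treating "a couple of orders of magnitude" as 100x and assuming, purely for illustration, something like 4x/year growth in effective training compute—a rate not taken from this thread—the sketch below lands at roughly three years.)

```python
import math

# Illustrative only: the growth rate of "effective training compute" below is an
# assumed, hypothetical figure, not a number from the discussion above.
growth_per_year = 4.0   # assumed ~4x per year
target_ooms = 2         # "a couple of orders of magnitude" = 100x

years_needed = target_ooms * math.log(10) / math.log(growth_per_year)
print(f"~{years_needed:.1f} years to gain {target_ooms} OOMs at {growth_per_year}x/year")
# -> roughly 3.3 years under these assumed numbers
```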

What happens when a dangerous capability eval goes off— does the government have the ability to implement a national pause?

I think presumably the pause would just be for that company's scaling—presumably other organizations that were still in compliance would still be fine.

If the narrative is "hey, we agree that the government should force everyone to scale responsibly, and this means that the government would have the ability to tell people that they have to stop scaling if the government decides it's too risky", then I'd still probably prefer stoppi

... (read more)
5
Akash
6mo
Thanks! A few quick responses/questions: I think this makes sense for certain types of dangerous capabilities (e.g., a company develops a system that has strong cyberoffensive capabilities. That company has to stop but other companies can keep going). But what about dangerous capabilities that have more to do with AI takeover (e.g., a company develops a system that shows signs of autonomous replication, manipulation, power-seeking, deception) or scientific capabilities (e.g., the ability to develop better AI systems)? Supposing that 3-10 other companies are within a few months of these systems, do you think at this point we need a coordinated pause, or would it be fine to just force company 1 to pause? Do you know if ARC or Anthropic have publicly endorsed this position anywhere? (And if not, I'd be curious for your take on why, although that's more speculative so feel free to pass). 

I guess I'm not really sure what your objection is to Responsible Scaling Policies? I see that there's a bunch of links, but I don't really see a consistent position being staked out by the various sources you've linked to. Do you want to describe what your objection is?

I guess the closest there is "the danger is already apparent enough" which, while true, doesn't really seem like an objection. I agree that the danger is apparent, but I don't think that advocating for a pause is a very good way to address that danger.

4
Greg_Colbourn
6mo
The consistent position is that further scaling is reckless at this stage; it can't be done in a "responsible" way, unless you think subjecting the world to a 10-25% risk of extinction is a responsible thing to be doing! What is a better way of addressing the danger? Waiting for it to get more intense and more apparent by scaling further!? Waiting until a disaster actually happens? Actually pausing, or stopping (and setting an example), rather than just advocating for a pause?

I tend to put P(doom) around 80%, so I think I'm on the pessimistic side, and I tend to think short timelines are at least a real and serious possibility that we should be planning for. Nevertheless, I disagree with a global stop or a pause being the "only reasonable hope"—global stops and pauses seem basically unworkable to me. I'm much more excited about governmentally enforced Responsible Scaling Policies, which seem like the "better option" that you're missing here.

4
Vasco Grilo
13d
Hi Evan, What is your median time from now until human extinction? If it is only a few years, I would be happy to set up a bet like this one.
7
Akash
6mo
@evhub can you say more about what you envision a governmentally-enforced RSP world would look like? Is it similar to licensing? What happens when a dangerous capability eval goes off— does the government have the ability to implement a national pause? Aside: IMO it's pretty clear that the voluntary-commitment RSP regime is insufficient, since some companies simply won't develop RSPs, and even if lots of folks adopted RSPs, the competitive pressures in favor of racing seem like they'd make it hard for anyone to pause for >a few months. I was surprised/disappointed that neither ARC nor Anthropic mentioned this. ARC says some stuff about how maybe in the future one day we might have some stuff from RSPs that could maybe inform government standards, but (in my opinion) their discussion of government involvement was quite weak, perhaps even to the point of being misleading (by making it seem like the voluntary commitments will be sufficient.) I think some of the negative reaction to responsible scaling, at least among some people I know, is that it seems like an attempt for companies to say "trust us— we can scale responsibly, so we don't need actual government regulation." If the narrative is "hey, we agree that the government should force everyone to scale responsibly, and this means that the government would have the ability to tell people that they have to stop scaling if the government decides it's too risky", then I'd still probably prefer stopping right now, but I'd be much more sympathetic to the RSP position.
0
Greg_Colbourn
6mo
I mention Responsible Scaling! EDIT to add: I'm interested in a response from evhub (or anyone else) to the points raised against Responsible Scaling (see links for more details).

In any case, I don't see any reason to think the neural net prior is malign, or particularly biased toward deceptive, misaligned generalization. If anything the simplicity prior seems like good news for alignment.

I definitely disagree with this—especially the last sentence; essentially all of my hope for neural net inductive biases comes from them not being like an actual simplicity prior. The primary literature I'd reference here would be "How likely is deceptive alignment?" for the practical question regarding concrete neural net inductive biases and ... (read more)

-1
Sharmake
6mo
Okay, my crux is that the simplicity/Kolmogorov/Solomonoff prior is probably not very malign, assuming we could run it, and in general I find the prior not to be malign except for specific situations. This is basically because it relies on the IMO dubious assumption that the halting oracle can only be used once, and notably once we use the halting/Solomonoff oracle more than once, the Solomonoff oracle loses its malign properties. More generally, if the Solomonoff Oracle is duplicatable, as modern AIs generally are, then there's a known solution to mitigate the malignancy of the Solomonoff prior: Duplicate it, and let multiple people run the Solomonoff inductor in parallel to increase the complexity of manipulation. The goal is essentially to remove the uniqueness of 1 Solomonoff inductor, and make an arbitrary number of such oracles to drive up the complexity of manipulation. So under a weak assumption, the malignancy of the Solomonoff prior goes away. This is described well in the link below, and the important part is that we need either a use-once condition, or we need to assume uniqueness in some way. If we don't have either assumption holding, as is likely to be the case, then the Solomonoff/Kolmogorov prior isn't malign. https://www.lesswrong.com/posts/f7qcAS4DMKsMoxTmK/the-solomonoff-prior-is-malign-it-s-not-a-big-deal#Comparison_ And that's if it's actually malign, which it might not be, at least in the large-data limit: https://www.lesswrong.com/posts/Tr7tAyt5zZpdTwTQK/the-solomonoff-prior-is-malign#fDEmEHEx5EuET4FBF More specifically, it's this part of John Wentworth's comment: As far as the actual practical question, there is a very important limitation on inner-misaligned agents by SGD, primarily because gradient hacking is very difficult to do, and is an underappreciated limitation on misalignment, since SGD has powerful tools to remove inner-misaligned circuits/TMs/Agents in the link below: https://www.lesswrong.com/posts/w2TAEvME2yAG9MHeq/
2
Nora Belrose
6mo
So, I definitely don't have the Solomonoff prior in mind when I talk about simplicity. I'm actively doing research at the moment to better characterize the sense in which neural nets are biased toward "simple" functions, but I would be shocked if it has anything to do with Kolmogorov complexity.
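(For readers without the background: the "Solomonoff prior" debated in this thread is the standard universal prior over programs for a prefix universal machine U. The textbook definition and its coding-theorem relationship to Kolmogorov complexity K, stated below only as a reference point, are:)

```latex
% Universal (Solomonoff) prior / algorithmic probability of a string x,
% for a prefix universal Turing machine U:
M(x) \;=\; \sum_{p \,:\, U(p) = x} 2^{-|p|}
% Coding theorem (up to multiplicative constants), which is the sense in which
% this is a "simplicity" prior:
M(x) \;\asymp\; 2^{-K(x)}
```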

Public debates strengthen society and public discourse. They spread truth by testing ideas and filtering out weaker arguments.

I think this is extremely not true, and am pretty disappointed with this sort of "debate me" communications policy. In my opinion, I think public debates very rarely converge towards truth. Lots of things sound good in a debate but break down under careful analysis, and the pressure of saying things that look good to a public audience creates a lot of pressure opposed to actual truth-seeking.

I understand and agree with the import... (read more)

At a sufficiently sophisticated technological level, vacuum decay actually becomes worthwhile, as it increases the total amount of available free energy. The problem is ensuring any sort of civilizational continuity before and after the vacuum decay—though, like any other physical process, vacuum decay shouldn't destroy information, so theoretically if you understood the mechanics well enough you should be able to engineer whatever outcome you wanted on the other side.

3
Sharmake
1y
Even more importantly, assuming you can change the vacuum constants, one of the best constants to change is Planck's constant: if the Planck constant were 0, computing power would be not just arbitrarily large but actually infinite—infinity is your limit. Assuming information isn't destroyed, this bodes well for uploads and simulations, since their bodies are purely informational, though any physical entity would have to be scanned first.
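(For context on the Planck-constant point: one standard bound on computation speed, the Margolus–Levitin limit, caps the number of orthogonal state transitions per second by a quantity proportional to 1/ℏ, which is the sense in which ℏ → 0 would remove the limit. Whether vacuum decay could actually change ℏ is, of course, entirely speculative.)

```latex
% Margolus–Levitin bound on "operations" (orthogonal state transitions) per second
% for a system with average energy E above its ground state:
\text{ops/sec} \;\le\; \frac{2E}{\pi\hbar}
\qquad\Longrightarrow\qquad
\lim_{\hbar \to 0} \frac{2E}{\pi\hbar} \;=\; \infty
```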
evhub
1y

In my opinion, I think the best solution here is incentivizing people to voluntarily have more children—e.g. child tax credits, maternity/paternity leave, etc. If you don't think fetuses are moral patients, then the pro-natalist, longtermist, total utilitarian view doesn't distinguish between having an abortion and just choosing not to have a child, so I don't really see the reason to focus on abortion specifically in that case.

I don't really know what other purpose the "we must be very clear" here serves besides trying to indicate that you think it's very important that EA projects a unified front here.

I am absolutely intending to communicate that I think it would be good for people to say that they think fraud is bad. But that doesn't mean that I think we should condemn people who disagree regarding whether saying that is good or not. Rather, I think discussion about whether it's a good idea for people to condemn fraud seems great to me, and my post was an attempt to provide my (short, abbreviated) take on that question.

in a form that implies strong moral censure to anyone who argues the opposite

I don't think this and didn't say it. If you have any quotes from the post that you think say this, I'd be happy to edit it to be more clear, but from my perspective it feels like you're inventing a straw man to be mad at rather than actually engaging with what I said.

You also said that we should do so independently of the facts of the FTX case, which feels weird to me, because I sure think the details of the case are very relevant to what ethical lines I want to draw in the

... (read more)
-3
Habryka
1y
I mean, the title of your post starts with "We must be very clear". This at least to me communicated an attitude that discourages people prominently associated with EA going like "I don't know man, I don't think I stand behind this". I don't really know what other purpose the "we must be very clear" here serves besides trying to indicate that you think it's very important that EA projects a unified front here. And I think independently of your intention, I am confident that your post has also not made other people excited about discussing the actual ethical lines here, based on conversations I've had with other people about how they relate to your post (many of whom like the post, but exactly for the reason that they don't want to see people defending fraud, which would look quite bad for us). Yeah, I think I disagree with this. I think most of my ethical boundaries are pretty contingent on facts about history and what kind of cognitive algorithms seem to perform well or badly, and indeed almost all my curiosities when trying to actually genuinely answer the question of when fraud is acceptable consist of questions about the empirical details of the world, like "to what degree is your environment coercive so that fraud is justified?" and "to what degree is fraud widespread?" and "how many people does fraud seem to hurt?", and so on. I don't think this makes me harder to coordinate with. Indeed, being receptive to empirical feedback about ethical rules is, I think, quite important for being able to be cooperated with, since it gives people the confidence that I will update on evidence that some cognitive strategy, or some attitude, or some moral perspective causes harm.

I think what we owe the world is both reflection about where our actual lines are (and how the ones that we did indeed have might have contributed to this situation), as well as honest and precise statements about what kinds of things we might actually consider doing in the future.

I actually state in the post that I agree with this. From my post:

In that spirit, I think it's worth us carefully confronting the moral question here: is fraud in the service of raising money for effective causes wrong?

Perhaps that is not as clear as you would like, but like... (read more)

4
Habryka
1y
I do think your post is making actually answering that question as a community harder, because you yourself answer that question with "we unequivocally need to condemn this behavior" in a form that implies strong moral censure to anyone who argues the opposite.  You also said that we should do so independently of the facts of the FTX case, which feels weird to me, because I sure think the details of the case are very relevant to what ethical lines I want to draw in the future. The section you quote here reads to me as a rhetorical question. You say "carefully", but you just answer the question yourself in the next sentence and say that the answer "clearly" is the way you say it is. I don't think your post invites discussion or discourse about where the lines of fraud are, or when we do think deception is acceptable, or generally reflecting on our moral principles.

I mean, indeed the combination of "fraud is a vague, poorly defined category" together with a strong condemnation of said "fraud", without much explicit guidance on what kind of thing you are talking about, is what I am objecting to in your post.

I guess I don't really think this is a problem. We're perfectly comfortable with statements like “murder is wrong” while also understanding that “but killing Hitler would be okay.” I don't mean to say that talking about the edge cases isn't ever helpful—in fact, I think it can be quite useful to try to be clear ... (read more)

That sounds like a fully generalized defense against all counterarguments, and I don't think is how discourse usually works.

It's clearly not fully general because it only applies to excluding edge cases that don't satisfy the reasons I explicitly state in the post.

If you say "proposition A is true about category B, for reasons X, Y, Z" and someone else is like "but here is an argument C for why proposition A is not true about category B", then of course you don't get to be like, "oh, well, I of course meant the subset of category B where argument C do

... (read more)
2
Habryka
1y
I mean, indeed the combination of "fraud is a vague, poorly defined category" together with a strong condemnation of said "fraud", without much explicit guidance on what kind of thing you are talking about, is what I am objecting to in your post (among some other things, but again, seems better to leave that up to my more thorough response).  I think you are vastly overestimating how transparent the boundaries of the fraud concept are that you are trying to point to. Like, I don't know whether you meant to include half of the examples I listed on this thread, and I don't think other readers of your post do. Nevertheless you called for strong condemnation of that ill-defined category.  I think the average reader of your post will leave with a feeling that they are supposed to be backing up some kind of clear line, because that's the language that your post is written in. But there is no clear line, and your post does not actually meaningfully commit us to anything, or should serve as any kind of clear sign to the external world about where our ethical lines are.  Of course we oppose fraud of the type that Sam committed, that fraud exploded violently and was also incredibly reckless and was likely even net-negative by Sam's own goals, but that's obvious and not an interesting statement and is not actually what your post is primarily saying (indeed, it is saying that we should condemn fraud independently of the details of the FTX case, whatever that means). I think what we owe the world is both reflection about where our actual lines are (and how the ones that we did indeed have might have contributed to this situation), as well as honest and precise statements about what kinds of things we might actually consider doing in the future. I don't think your post is helping with either, but instead feels to me like an inwards-directed applause light for "fraud bad", in a way that does not give people who have genuine concerns about where our moral lines are (which inclu

Adding on to my other reply: from my perspective, I think that if I say “category A is bad because X, Y, Z” and you're like “but edge case B!” and edge case B doesn't satisfy X, Y, or Z, then clearly I'm not including it in category A.

2
Habryka
1y
That sounds like a fully generalized defense against all counterarguments, and I don't think is how discourse usually works. If you say "proposition A is true about category B, for reasons X, Y, Z" and someone else is like "but here is an argument C for why proposition A is not true about category B", then of course you don't get to be like, "oh, well, I of course meant the subset of category B where argument C doesn't hold". If I say "being honest is bad because sometimes people use true information against you" and you say "but sometimes they won't though and actually use it to help you", then I can't say "well, of course I didn't include that case when I was talking about 'being honest', I was just talking about being honest to people who don't care about you". Or less abstractly, when you argue that giving money to GiveWell is good because money donated there can go much farther than otherwise, and then GiveWell turns out to have defrauded the money, then you don't get to be like "oh, well, of course, in that case giving money to GiveWell was bad, and I meant to exclude the case where GiveWell was defrauding money, so my original post is still correct".

I think you're wrong about how most people would interpret the post. I predict that if readers were polled on whether or not the post agreed with “lying to Nazis is wrong” the results would be heavily in favor of “no, the post does not agree with that.” If you actually had a poll that showed the opposite I would definitely update.

I think the Nazi example is too loaded for various reasons (and triggers people's "well, this is clearly some kind of thought experiment" sensors).

I think there are a number of other examples that I have listed in the comments to this post that I think would show this. E.g. something in the space of "jewish person lies about their religious affiliation in order to escape discrimination that's unfair to them for something like scholarship money, of which they then donate a portion (partially because they do want to offset the harm that came from being disho... (read more)

My guess is most readers are more interested in the condemnation part though, given the overwhelming support that posts like this have received, which have basically no content besides condemnation (and IMO with even bigger problems on being inaccurate about where to draw ethical lines).

I think my post is quite clear about what sort of fraud I am talking about. If you look at the reasons that I give in my post for why fraud is wrong, they clearly don't apply to any of examples of justifiable lying that you've provided here (lying to Nazis, doing the lea... (read more)

1
Habryka
1y
I am working on a longer response to your post, so not going to reply to you here in much depth. Responding to this specific comment: I don't think your line of argumentation here makes much sense (making very broad statements like "Fraud in the service of Effective Altruism is unacceptable" but then saying "well, but of course only the kind of fraud for which I gave specific counterarguments"). Your post did not indicate that it was talking about any narrower definition of fraud, and I am confident (based on multiple conversations I've had about it with others) that it was being read by other readers as arguing for a broad definition of fraud. If you actually think it should only apply to a narrower definition of fraud, then I think you should add a disclaimer to the top explaining what kind of fraud you are talking about, or change the title.

The portion you quote is included at the very end as an additional point about how even if you don't buy my primary arguments that fraud in general is bad, in this case it was empirically bad. It is not my primary reason for thinking fraud is bad here, and I think the post is quite clear about that.

evhub
1y

I agree with this post from a moral perspective, though one thing it does not touch on is the legal question. My guess is that, in the same way that a court probably wouldn't try to claw back money from a utility company/janitor/etc., FTXFF beneficiaries are also probably safe, but IANAL, so maybe somebody who knows more there could comment.

Jason has made several comments on this issue with a number of points worth considering; I found this thread particularly eye-opening. I came away feeling that the risk of clawbacks shouldn't be ignored.

Geoffrey Miller also made the important point that "if there are any legal 'clawbacks' of money in the future, that would have to be done through official legal channels -- and they might not care that we've already sent money back somewhere for allegedly honorable reasons. So we might end up returning a bunch of money, and then being legally obligated to r... (read more)

7
[anonymous]
1y
I found this tweet encouraging:

That's a pretty wild misreading of my post. The main thesis of the post is that we should unequivocally condemn fraud. I do not think that the reason that fraud is bad is because of PR reasons, nor do I say that in the post—if you read what I wrote about why I think it's wrong to commit fraud at the end, what I say is that you should have a general policy against ever committing fraud, regardless of the PR consequences one way or another.

4
David M
1y
The main thesis of your post (we should unequivocally condemn fraud) is correct, but the way you defend it is in conflict with it (by saying it's wrong for instrumental reasons). Here's the PR argument: This weakens the condemnation, by making it about the risks of being found out, not the badness of the action. When you explain that pre-committing to not commit fraud is an advantageous strategy, I read this as another instrumental argument. It's hard to condemn things unequivocally from a purely utilitarian point of view, because then all reasons are instrumental. I'm not saying your reasons are untrue, but I think that when non-utilitarians read them, they won't see an unequivocal condemnation, but a pragmatic argument that in other contexts could be turned in defence of fraud, if the consequences come out the other way. That said, Jack Malde's reply to me is a pretty good attempt at unequivocal condemnation from within a utilitarian frame, because it doesn't talk about conditions that might not hold for some instance of fraud. (But it's not necessarily correct.)

Anecdotally it seems like many of the world's most successful companies do try to make frugality part of their culture, e.g. it's one of Amazon's leadership principles.

Google, by contrast, is notoriously the opposite—for example emphasizing just trying lots of crazy, big, ambitious, expensive bets (e.g. their "10x" philosophy). Also see how Google talked about frugality in 2011.

5
AdamGleave
2y
Making bets on new ambitious projects doesn't seem necessarily at odds with frugality: you can still execute on them in a lean way, some things just really do take a big CapEx. Granted whether Google or any major tech company really does this is debatable, but I do think they tend to at least try to instill it, even if there is some inefficiency e.g. due to principal-agent problems.
evhub
2y

One thing that bugged me when I first got involved with EA was the extent to which the community seemed hesitant to spend lots of money on stuff like retreats, student groups, dinners, compensation, etc. despite the cost-benefit analysis seeming to favor doing so pretty strongly. I know that, from my perspective, I felt like this was some evidence that many EAs didn't take their stated ideals as seriously as I had hoped—e.g. that many people might just be trying to act in the way that they think an altruistic person should rather than really carefully thin... (read more)

My anecdotal experience hiring is that I get many more prospective candidates saying something like "if this is so important why isn't your salary way above market rates?" than "if you really care about impact, why are you offering so much money?" (Though both sometimes happen.)

1
[comment deleted]
2y

Precisely. Also, the frugality of past EA creates a selection effect, so probably there is a larger fraction of anti-frugal people outside the community (and among people who might be interested) than we would expect from looking inside it.

Great point! I think each spending strategy has its pitfalls related to signalling.

I think this correlates somewhat with people's knowledge of/engagement with economics, and with political leaning. "Frugal altruism" will probably attract more left-leaning people, while "spending altruism" probably attracts more right-leaning people.

I agree that it’s possible to be unthinkingly frugal. It’s also possible to be unthinkingly spendy. Both seem bad, because they are unthinking. A solution would be to encourage EA groups to practice good thinking together, and to showcase careful thinking on these topics.

I like the idea of having early EA intro materials and university groups that teach BOTECs, cost-benefit analysis, and grappling carefully with spending decisions.

This kind of training, however, trades off against time spent learning about e.g. AI safety and biosecurity.
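(As a concrete illustration of the kind of BOTEC such intro materials might walk through—every number below is invented for the example, not a real estimate:)

```python
# Hypothetical back-of-the-envelope check: is a student retreat worth funding?
# All figures are placeholders chosen for illustration, not real estimates.
cost_per_attendee = 500               # USD, assumed
attendees = 30
p_counterfactual_boost = 0.10         # assumed chance the retreat meaningfully increases someone's engagement
value_of_engaged_member = 20_000      # assumed counterfactual impact in USD-equivalent terms

expected_benefit = attendees * p_counterfactual_boost * value_of_engaged_member
total_cost = attendees * cost_per_attendee

print(f"Expected benefit: ${expected_benefit:,.0f} vs. cost: ${total_cost:,.0f}")
print("Passes this (very crude) bar" if expected_benefit > total_cost else "Fails this bar")
```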

Academic projects are definitely the sort of thing we fund all the time. I don't know if the sort of research you're doing is longtermist-related, but if you have an explanation of why you think your research would be valuable from a longtermist perspective, we'd love to hear it.

Since it was brought up to me, I also want to clarify that EA Funds can fund essentially anyone, including:

  • people who have a separate job but want to spend extra time doing an EA project,
  • people who don't have a Bachelor's degree or any other sort of academic credentials,
  • kids who are in high school but are excited about EA and want to do something,
  • fledgling organizations,
  • etc.

I'm one of the grant evaluators for the LTFF and I don't think I would have any qualms with funding a project 6-12 months in advance.

1
James Smith
3y
Great, thanks for the response.

To be clear, I agree with a lot of the points that you're making—the point of sketching out that model was just to show the sort of thing I'm doing; I wasn't actually trying to argue for a specific conclusion. The actual correct strategy for figuring out the right policy here, in my opinion, is to carefully weigh all the different considerations like the ones you're mentioning, which—at the risk of crossing object and meta levels—I suspect to be difficult to do in a low-bandwidth online setting like this.

Maybe it'll still be helpful to just give my take us... (read more)

8
Wei Dai
3y
If there are lots of considerations that have to be weighed against each other, then it seems easily the case that we should decide things on a case-by-case basis, as sometimes the considerations might weigh in favor of downvoting someone for refusing to engage with criticism, and other times they weigh in the other direction. But this seems inconsistent with your original blanket statement, "I don’t think any person or group should be downvoted or otherwise shamed for not wanting to engage in any sort of online discussion". About online versus offline, I'm confused why you think you'd be able to convey your model offline but not online, as the bandwidth difference between the two doesn't seem large enough that you could do one but not the other. Maybe it's not just the bandwidth but other differences between the two mediums, but I'm skeptical that offline/audio conversations are overall less biased than online/text conversations. If they each have their own biases, then it's not clear what it would mean if you could convince someone of some idea over one medium but not the other. If the stakes were higher or I had a bunch of free time, I might try an offline/audio conversation with you anyway to see what happens, but it doesn't seem like a great use of our time at this point. (From your perspective, you might spend hours but at most convince one person, which would hardly make a dent if the goal is to change the Forum's norms. I feel like your best bet is still to write a post to make your case to a wider audience, perhaps putting in extra effort to overcome the bias against it if there really is one.) I'm still pretty curious what experiences led you to think that online discussions are often terrible, if you want to just answer that. Also are there other ideas that you think are good but can't be spread through a text medium because of its inherent bias?

I think you're imagining that I'm doing something much more exotic here than I am. I'm basically just advocating for cooperating on what I see as a prisoner's-dilemma-style game (I'm sure you can also cast it as a stag hunt or make some really complex game-theoretic model to capture all the nuances—I'm not trying to do that there; my point here is just to explain the sort of thing that I'm doing).

Consider:

A and B can each choose:

  • public) publicly argue against the other
  • private) privately discuss the right thing to do

And they each have utility function... (read more)
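(A minimal sketch of the kind of two-player game being gestured at here; the actual utility functions are elided above, so the payoffs below are invented purely to illustrate a prisoner's-dilemma structure:)

```python
# Toy payoff matrix for the "argue publicly" vs. "discuss privately" game described above.
# Payoff numbers are hypothetical: each side does better unilaterally going public,
# but (private, private) beats (public, public) for both.
payoffs = {  # (A's choice, B's choice) -> (A's utility, B's utility)
    ("private", "private"): (3, 3),
    ("private", "public"):  (0, 4),
    ("public",  "private"): (4, 0),
    ("public",  "public"):  (1, 1),
}

def best_response(player, opponent_choice):
    """Return the choice maximizing this player's payoff given the opponent's fixed choice."""
    options = ["private", "public"]
    if player == "A":
        return max(options, key=lambda c: payoffs[(c, opponent_choice)][0])
    return max(options, key=lambda c: payoffs[(opponent_choice, c)][1])

# Each player's best response to either opponent choice is "public", so
# (public, public) is the unique Nash equilibrium despite being Pareto-dominated
# by (private, private) -- which is why a cooperation norm is being advocated.
print(best_response("A", "private"), best_response("A", "public"))
print(best_response("B", "private"), best_response("B", "public"))
```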

(It seems that you're switching the topic from what your policy is exactly, which I'm still unclear on, to the model/motivation underlying your policy, which perhaps makes sense, as if I understood your model/motivation better perhaps I could regenerate the policy myself.)

I think I may just outright disagree with your model here, since it seems that you're not taking into account the significant positive externalities that a public argument can generate for the audience (in the form of more accurate beliefs, about the organizations involved and EA topics i... (read more)

For example would you really not have thought worse of MIRI (Singularity Institute at the time) if it had labeled Holden Karnofsky's public criticism "hostile" and refused to respond to it, citing that its time could be better spent elsewhere?

To be clear, I think that ACE calling the OP “hostile” is a pretty reasonable thing to judge them for. My objection is only to judging them for the part where they don't want to respond any further. So as for the example, I definitely would have thought worse of MIRI if they had labeled Holden's criticisms as “host... (read more)

Still pretty unclear about your policy. Why is ACE calling the OP "hostile" not considered "meta-level" and hence not updateable (according to your policy)? What if the org in question gave a more reasonable explanation of why they're not responding, but doesn't address the object-level criticism? Would you count that in their favor, compared to total silence, or compared to an unreasonable explanation? Are you making any subjective judgments here as to what to update on and what not to, or is there a mechanical policy you can write down (that anyone can f... (read more)

That's a great point; I agree with that.

I disagree, obviously, though I suspect that little will be gained by hashing it out more here. To be clear, I have certainly thought about this sort of issue in great detail as well.

I would be curious to read more about your approach, perhaps in another venue. Some questions I have:

  1. Do you propose to apply this (not updating when an organization refuses to engage with public criticism) universally? For example would you really not have thought worse of MIRI (Singularity Institute at the time) if it had labeled Holden Karnofsky's public criticism "hostile" and refused to respond to it, citing that its time could be better spent elsewhere? If not, how do you decide when to apply this policy? If yes, how do you prevent bad actors from t
... (read more)

It clearly is actual, boring, normal, bayesian evidence that they don't have a good response. It's not overwhelming evidence, but someone declining to respond sure is screening off the worlds where they had a great low-inferential distance reply that was cheap to shoot off that addressed all the concerns. Of course I am going to update on that.

I think that you need to be quite careful with this sort of naive-CDT-style reasoning. Pre-commitments/norms against updating on certain types of evidence can be quite valuable—it is just not the case that you sho... (read more)

5
Habryka
3y
I agree the calculation isn't super straightforward, and there is a problem of disincentivizing glomarization here, but I do think overall, all things considered, after having thought about situations pretty similar to this for a few dozen hours, I am pretty confident it's still decent bayesian evidence, and I endorse treating it as bayesian evidence (though I do think the pre-commitment considerations dampen the degree to which I am going to act on that information a bit, though not anywhere close to fully).

To be clear, I think it's perfectly reasonable for you to want ACE to respond if you expect that information to be valuable. The question is what you do when they don't respond. The response in that situation that I'm advocating for is something like “they chose not to respond, so I'll stick with my previous best guess” rather than “they chose not to respond, therefore that says bad things about them, so I'll update negatively.” I think that the latter response is not only corrosive in terms of pushing all discussion into the public sphere even when that makes it much worse, but it also hurts people's ability to feel comfortably holding onto non-public information.

“they chose not to respond, therefore that says bad things about them, so I'll update negatively.” I think that the latter response is not only corrosive in terms of pushing all discussion into the public sphere even when that makes it much worse, but it also hurts people's ability to feel comfortably holding onto non-public information.

This feels wrong from two perspectives: 

  1. It clearly is actual, boring, normal, bayesian evidence that they don't have a good response. It's not overwhelming evidence, but someone declining to respond sure is screening o
... (read more)
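(As a toy illustration of the size of the update being debated—all numbers invented: say the prior probability that an org has a good, cheap response available is 0.5, that an org with one replies with probability 0.8, and that an org without one replies with probability 0.3. Observing silence then moves the probability of "good response available" down, but not to zero:)

```latex
P(\text{good} \mid \text{silence})
  \;=\; \frac{P(\text{silence} \mid \text{good})\,P(\text{good})}
             {P(\text{silence} \mid \text{good})\,P(\text{good})
              \;+\; P(\text{silence} \mid \text{no good})\,P(\text{no good})}
  \;=\; \frac{0.2 \times 0.5}{0.2 \times 0.5 \;+\; 0.7 \times 0.5}
  \;\approx\; 0.22
```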

Yeah, I downvoted because it called the communication hostile without any justification for that claim. The comment it is replying to doesn't seem at all hostile to me, and asserting it is, feels like it's violating some pretty important norms about not escalating conflict and engaging with people charitably.

Yeah—I mostly agree with this.

I think it's pretty important for people to make themselves available for communication.

Are you sure that they're not available for communication? I know approximately nothing about ACE, but I'd be surprised if they wo... (read more)

Are you sure that they're not available for communication? I know approximately nothing about ACE, but I'd be surprised if they wouldn't be willing to talk to you after e.g. sending them an email.

Yeah, I am really not sure. I will consider sending them an email. My guess is they are not interested in talking to me in a way that would later on allow me to write up what they said publicly, which would reduce the value of their response quite drastically to me. If they are happy to chat and allow me to write things up, then I might be able to make the time, but ... (read more)

I also think there's a strong tendency for goalpost-moving with this sort of objection—are you sure that, if they had said more things along those lines, you wouldn't still have objected?

I do think I would have still found it pretty sad for them to not respond, because I do really care about our public discourse and this issue feels important to me, but I do think I would feel substantially less bad about it, and probably would only have mild-downvoted the comment instead of strong-downvoted it. 

What I have a problem with is the notion that we should

... (read more)

Why was this response downvoted so heavily? (This is not a rhetorical question—I'm genuinely curious what the specific reasons were.)

As Jakub has mentioned above, we have reviewed the points in his comment and fully support Anima International's wish to share their perspective in this thread. However, Anima's description of the events above does not align with our understanding of the events that took place, primarily within points 1, 5, and 6.

This is relevant, useful information.

The most time-consuming part of our commitment to Representation, Equity

... (read more)

I didn't downvote (because as you say it's providing relevant information), but I did have a negative reaction to the comment. I think the generator of that negative reaction is roughly: the vibe of the comment seems more like a political attempt to close down the conversation than an attempt to cooperatively engage. I'm reminded of "missing moods";  it seems like there's a legitimate position of "it would be great to have time to hash this out but unfortunately we find it super time consuming so we're not going to", but it would naturally come with a... (read more)

I downvoted because it called the communication hostile without any justification for that claim. The comment it is replying to doesn't seem at all hostile to me, and asserting it is, feels like it's violating some pretty important norms about not escalating conflict and engaging with people charitably.

I also think I disagree that orgs should never be punished for not wanting to engage in any sort of online discussion. We have shared resources to coordinate, and as a social network without clear boundaries, it is unclear how to make progress on many of the... (read more)

I'd personally love to get more Alignment Forum content cross-posted to the EA Forum. Maybe some sort of automatic link-posting? Though that could pollute the EA Forum with a lot of link posts that probably should be organized separately somehow. I'd certainly be willing to start cross-posting my research to the EA Forum if that would be helpful.

Instinctively, I wish that discussion on these posts could all happen on the Alignment Forum, but since who can join is limited, having discussion here as well could be nice.

I don't know whether every single post should be posted here, but it would be nice to at least have occasional posts summarizing the best recent AF content. This might look like just crossposting every new issue of the Alignment Newsletter, which is something I may start doing soon.

Glad you enjoyed it!

So, I think what you're describing in terms of a model with a pseudo-aligned objective pretending to have the correct objective is a good description of specifically deceptive alignment, though the inner alignment problem is a more general term that encompasses any way in which a model might be running an optimization process for a different objective than the one it was trained on.

In terms of empirical examples, there definitely aren't good empirical examples of deceptive alignment right now for the reason you mentioned, though whether

... (read more)
Answer by evhub
Mar 05, 2020

This thread on LessWrong has a bunch of information about precautions that might be worth taking.