Lukas_Finnveden

Sorted by New

# Wiki Contributions

We're Redwood Research, we do applied alignment research, AMA

Hm, could you expand on why collusion is one of the most salient ways in which "it’s possible to build systems that are performance-competitive and training-competitive, and do well on average on their training distribution" could fail?

Is the thought here that — if models can collude — then they can do badly on the training distribution in an unnoticeable way, because they're being checked by models that they can collude with?

When pooling forecasts, use the geometric mean of odds

My answer is that we need to understand the resilience of the aggregated prediction to new information.

This seems roughly right to me. And in particular, I think this highlights the issue with the example of institutional failure. The problem with aggregating predictions to a single guess p of annual failure, and then using p to forecast, is that it assumes that the probability of failure in each year is independent from our perspective. But in fact, each year of no failure provides evidence that the risk of failure is low. And if the forecasters' estimates initially had a wide spread, then we're very sensitive to new information, and so we should update more on each passing year. This would lead to a high probability of failure in the first few years, but still a moderately high expected lifetime.

EA Hangout Prisoners' Dilemma

According to wikipedia, the $300 vs$100 is fine for a one-shot prisoner's dilemma. But an iterated prisoner's dilemma would require (defect against cooperate)+(cooperate against defect) < 2*(cooperate cooperate), since the best outcome is supposed to be permanent cooperate/cooperate rather than alternating cooperation/defection.

However, the fact that this games gives out the same 0$for both cooperate/defect and defect/defect means it nevertheless doesn't count as an ordinary prisoner's dilemma. Defecting against someone who defects needs to be strictly better than cooperating against a defector. In fact, in this case, every EA is likely going to put some positive valuation on$300 to both miri and amf, so cooperating against a defector is actively preferred to defecting against a defector.

MichaelA's Shortform

Thanks, I appreciate having something to link to! My independent impression is that it would be even easier to link to and easier to find as a top-level post.

Why AI alignment could be hard with modern deep learning

FWIW, I think my median future includes humanity solving AI alignment but messing up reflection/coordination in some way that makes us lose out on most possible value. I think this means that longtermists should think more about reflection/coordination-issues than we're currently doing. But technical AI alignment seems more tractable than reflection/coordination, so I think it's probably correct for more total effort to go towards alignment (which is the status quo).

I'm undecided about whether these reflection/coordination-issues are best framed as "AI risk" or not. They'll certainly interact a lot with AI, but we would face similar problems without AI.

Honoring Petrov Day on the EA Forum: 2021

This was proposed and discussed 2 years ago here.

What should "counterfactual donation" mean?

Say I offer to make a counterfactual donation of \$50 to the Against Malaria Foundation (AMF) if you do a thing; which of the following are ok for me to do if you don't?

I think this misses out on an important question, which is "What would you have done with the money if you hadn't offered the counterfactual donation?"

If you were planning to donate to AMF, but then realised that you could make me do X by commiting to burn the money if I don't do X, I think that's not ok, in two senses:

• Firstly, if you just state that the donation is counterfactual, I would interpret it to be mean that you would've done something like (9), if you hadn't offered the counterfactual donation.
• Secondly, even if you thoroughly clarified and communicated what you were doing, I think we should have a norm against this kind of behavior.

In fact, to make nitpicky distinctions... If I didn't do X, I feel reluctant to say that it's "not ok" for you to donate to AMF. I want to say that it is ok for you to donate to AMF at that point, but that doing so is strong evidence that you were behaving dishonestly when initially promising a counterfactual donation, and that said offering was not ok.

How to succeed as an early-stage researcher: the “lean startup” approach

Let’s say that Alice is an expert in AI alignment, and Bob wants to get into the field, and trusts Alice’s judgment. Bob asks Alice what she thinks is most valuable to work on, and she replies, “probably robustness of neural networks”. [...]  I think Bob should instead spend some time thinking about how a solution to robustness would mean that AI risk has been meaningfully reduced. [...] It’s possible that after all this reflection, Bob concludes that impact regularization is more valuable than robustness. [...] It’s probably not the case that progress in robustness is 50x more valuable than progress in impact regularization, and so Bob should go with [impact regularization].

In the example, Bob "wants to get into the field", so this seems like an example of how junior people shouldn't defer to experts when picking research projects.

(Specualative differences: Maybe you think there's a huge difference between Alice giving a recommendation about an area vs a specific research project? Or maybe you think that working on impact regularization is the best Bob can do if he can't find a senior researcher to supervise him, but if Alice could supervise his work on robustness he should go with robustness? If so, maybe it's worth clarifying that in the FAQ.)

Edit: TBC, I interpret Toby Shevlane as saying ~you should probably work on whatever senior people find interesting; while Jan Kulveit says that "some young researchers actually have great ideas, should work on them, and avoid generally updating on research taste of most of the 'senior researchers'". The quoted FAQ example is consistent with going against Jan's strong claim, but I'm not sure it's consistent with agreeing with Toby's initial advice, and I interpret you as agreeing with that advice when writing e.g. "Defer to experts for ~3 years, then trust your intuitions".

What is the EU AI Act and why should you care about it?

Thank you for this! Very useful.

The AI act creates institutions responsible for monitoring high-risk systems and the monitoring of AI progress as a whole.

In what sense is the AI board (or some other institution?) responsible for monitoring AI progress as a whole?

How to succeed as an early-stage researcher: the “lean startup” approach

One reason to publish papers (specifically) about AI governance (specifically) is if you want to build an academic field working on AI governance. This is good both to get more brainpower and to get more people (who otherwise wouldn't read EA research) to take the research seriously, in the long term. C.f. the last section here https://forum.effectivealtruism.org/posts/42reWndoTEhFqu6T8/ai-governance-opportunity-and-theory-of-impact