All of Lukas Finnveden's Comments + Replies

Here's one line of argument:

  • Positive argument in favor of humans: It seems pretty likely that whatever I'd value on-reflection will be represented in a human future, since I'm a human. (And accordingly, I'm similar to many other humans along many dimensions.)
    • If AI values were sampled ~randomly (whatever that means), I think that the above argument would be basically enough to carry the day in favor of humans.
  • But here's a salient positive argument for why AIs' values will be similar to mine: People will be training AIs to be nice and helpful, which
... (read more)

There might not be any real disagreement. I'm just saying that there's no direct conflict between "present people having material wealth beyond what they could possibly spend on themselves" and "virtually all resources are used in the way that totalist axiologies would recommend".

What's the argument for why an AI future will create lots of value by total utilitarian lights?

At least for hedonistic total utilitarianism, I expect that a large majority of expected-hedonistic-value (from our current epistemic state) will be created by people who are at least partially sympathetic to hedonistic utilitarianism or other value systems that value a similar type of happiness in a scope-sensitive fashion. And I'd guess that humans are more likely to have such values than AI systems. (At least conditional on my thinking that such values are a g... (read more)

3
Matthew_Barnett
10d
We can similarly ask, "Why would an em future create lots of value by total utilitarian lights?" The answer I'd give is: it would happen for essentially the same reasons biological humans might do such a thing. For example, some biological humans are utilitarians. But some ems might be utilitarians too. Therefore, both could create lots of value by total utilitarian lights.

In order to claim that ems have a significantly lower chance of creating lots of value by total utilitarian lights than biological humans, you'd need to posit a distinction between ems and biological humans that makes this possibility plausible. Some candidate distinctions, such as the idea that ems would not be conscious because they're on a computer, seem implausible in any way that could imply the conclusion. So, at least as far as I can tell, I cannot identify any such distinction; and thus, ems seem similarly likely to create lots of value by total utilitarian lights, compared to biological humans.

The exact same analysis can likewise be carried over to the case for AIs. Some biological humans are utilitarians, but some AIs might be utilitarians too. Therefore, both could create lots of value by total utilitarian lights. In order to claim that AIs have a significantly lower chance of creating lots of value by total utilitarian lights than biological humans, you'd need to posit a distinction between AIs and biological humans that makes this possibility plausible. A number of candidate distinctions have been given to me in the past. These include:

1. The idea that AIs will not be conscious
2. The idea that AIs will care less about optimizing for extreme states of moral value
3. The idea that AIs will care more about optimizing imperfectly specified utility functions, which won't produce much utilitarian moral value

In each case I generally find that the candidate distinction is either poorly supported, or it does not provide strong support for the conclusion. So, just as with ems, I fi

I find it plausible that future humans will choose to create far fewer minds than they could. But I don't think that "selfishly desiring high material welfare" will require this. The Milky Way alone has enough stars for each currently alive human to get an entire solar system. Simultaneously, intergalactic colonization is probably possible (see here) and I think the stars in our own galaxy are less than 1-in-a-billion of all reachable stars. (Most of which are also very far away, which further contributes to them not being very interesting to use for s... (read more)

1
OscarD
10d
Good point re aesthetics perhaps mattering more, and about people dis-valuing inequality and therefore not wanting to create a lot of moderately good lives lest they feel bad about having amazing lives and controlling vast amounts of resources. Re "But I don't think ..." in your first paragraph, I am not sure what if anything we actually disagree about. I think what you are saying is that there are plenty of resources in our galaxy, and far more beyond, for all present people to have fairly arbitrarily large levels of wealth. I agree, and I am also saying that people may want to keep it roughly that way, rather than creating heaps of people and crowding up the universe.
2
Ryan Greenblatt
10d
Another relevant consideration along these lines is that people who selfishly desire high wealth might mostly care about positional goods which are similar to current positional goods. Usage of these positional goods won't burn much of any compute (resources for potential minds) even if these positional goods become insanely valuable in terms of compute. E.g., land values of interesting places on earth might be insanely high and people might trade vast amounts of computation for this land, but ultimately, the computation will be spent on something else.

compared to MIRI people, or even someone like Christiano, you, or Joe Carlsmith probably have "low" estimates

Christiano says ~22% ("but you should treat these numbers as having 0.5 significant figures") without a time-bound; and Carlsmith says ">10%" (see bottom of abstract) by 2070. So no big difference there.

2
David Mathers
3mo
Fair point. Carlsmith said less originally.

I'll hopefully soon make a follow-up post with somewhat more concrete projects that I think could be good. That might be helpful.

Are you more concerned that research won't have any important implications for anyone's actions, or that the people whose decisions ought to change as a result won't care about the research?

Similarly, 'Politics is the Mind-Killer' might be the rationalist idea that has aged worst - especially for its influence on EA.

What influence are you thinking about? The position argued in the essay seems pretty measured.

Politics is an important domain to which we should individually apply our rationality—but it’s a terrible domain in which to learn rationality, or discuss rationality, unless all the discussants are already rational. [...]

I’m not saying that I think we should be apolitical, or even that we should adopt Wikipedia’s ideal of the Neu

... (read more)
10
JWS
5mo

I'm relying on my social experience and intuition here, so I don't expect I've got it 100% right, and others may indeed have different interpretations of the community's history with engaging with politics.

But concern about people over-extrapolating from Eliezer's initial post (many such cases) and treating it more as a norm to ignore politics full stop seems to have been an established concern many years ago (related discussion here). I think that there's probably an interaction effect with the 'latent libertarianism' in early LessWrong/Rationalist space ... (read more)

1
Venky1024
6mo
Not sure I agree. Brian Tomasik's post is less a general argument against the approach of EV maximization than a demonstration of its misapplication in a context where expectation is computed across two distinct distributions of utility functions. As an aside, I also don't see the relation between the primary argument being made there and the two-envelopes problem, because the latter can be resolved by identifying a very clear mathematical flaw in the claim (that switching is better).

I liked this recent interview with Mark Dybul who worked on PEPFAR from the start: https://www.statecraft.pub/p/saving-twenty-million-lives

One interesting contrast with the conclusion in this post is that Dybul thinks that PEPFAR's success was a direct consequence of how it didn't involve too many people and departments early on — because the negotiations would have been too drawn out and too many parties would have tried to get pieces of control. So maybe a transparent process that embraced complexity wouldn't have achieved much, in practice.

(At other par... (read more)

4
TomDrake
8mo
Thanks for sharing - it’s an interesting interview. My first reaction is that interdepartmental bureaucracy is quite a different beast to an evidence-to-policy process. I agree that splitting development policy/programmes across multiple government depts causes lots of problems and is generally to be avoided if possible (I’m thinking about the UK system but imagine the challenges are similar in the US and elsewhere). Of course you do need some bureaucracy to facilitate evidence-to-policy too, but on the whole I think it’s absolutely worth the time. For public policy we should aim to make a small number of decisions really well. The idea of a small, efficient group who just know what to do and crack on is appealing; it’s a more heroic narrative than a careful weighing of the evidence. Though I can’t imagine the users of this forum need persuading of the importance of using evidence to do better than our intuitions and overcome our biases.

Incidentally, I feel this kind of we-know-what-to-do-let’s-crack-on instinct is more acceptable in development policy than domestic, and in my view development policy would benefit from being much more considered. We cause a lot of chaos and harm to systems in LMICs in the way we offer development assistance, even through programmes that are supporting valuable services. I think all of the major GHIs do great work, but all could benefit from substantial reforms. Though again, this is somewhat separate from the point about interdepartmental bureaucracy.

FWIW you can see more information, including some of the reasoning, on page 655 (# written on pdf) /  659 (# according to page searcher) of the report. (H/t Isabel.) See also page 214 for the definition of the question.

Some tidbits:

Experts started out much higher than superforecasters, but updated downwards after discussion. Superforecasters updated a bit upward, but less. [Chart omitted; the y-axis values are in billions.]

This was surprising to me. I think the experts' predictions look too low even before updating, and look much worse after updating!

The part of the ... (read more)

1
kokotajlod
8mo
Thanks!  I think this is evidence for a groupthink phenomenon amongst superforecasters. Interestingly my other experiences talking with superforecasters have also made me update in this direction (they seemed much more groupthinky than I expected, as if they were deferring to each other a lot. Which, come to think of it, makes perfect sense -- I imagine if I were participating in forecasting tournaments, I'd gradually learn to reflexively defer to superforecasters too, since they genuinely would be performing well.)

It's the crux between you and Ajeya, because you're relatively more in agreement on the other numbers. But I think that adopting the XPT numbers on these other variables would slow down your own timelines notably, because of the almost complete lack of increase in spending.

That said, if the forecasters agreed with your compute requirements, they would probably also forecast higher spending.

The XPT forecasters are so in the dark about compute spending that I just pretend they gave more reasonable numbers. I'm honestly baffled how they could be so bad. The most aggressive of them thinks that in 2025 the most expensive training run will be $70M, and that it'll take 6+ years to double thereafter, so that in 2032 we'll have reached $140M training run spending... do these people have any idea how much GPT-4 cost in 2022?!?!? Did they not hear about the investments Microsoft has been making in OpenAI? And remember that's what the most aggressive among them thought! The conservatives seem to be living in an alternate reality where GPT-3 proved that scaling doesn't work and an AI winter set in in 2020.

in terms of saving “disability-adjusted life years” or DALYs, "a case of HIV/AIDS can be prevented for $11, and a DALY gained for $1” by improving the safety of blood transfusions and distributing condoms

These numbers are wild compared to e.g. current GiveWell numbers. My guess would be that they're wrong, and if so, that this was a big part of why PEPFAR did comparatively better than expected. Or maybe that they were significantly less scalable (measured in cost of marginal life saved as a function of lives saved so far) than PEPFAR.

If the numbers were r... (read more)

Nice, gotcha.

Incidentally, as its central estimate for algorithmic improvement, the takeoff speeds model uses AI and Efficiency's ~1.7x per year, and then halves it to ~1.3x per year (because todays' algorithmic progress might not generalize to TAI). If you're at 2x per year, then you should maybe increase the "returns to software" from 1.25 to ~3.5, which would cut the model's timelines by something like 3 years. (More on longer timelines, less on shorter timelines.)
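(To spell out the arithmetic, which is my gloss rather than anything from the model's documentation: "halving" a multiplicative growth rate means halving it in log space, i.e. taking a square root.)

```latex
% Halving a growth rate in log space = taking the square root of the factor:
\sqrt{1.7} \approx 1.30 \quad\text{(so 1.7x/year ``halved'' is ~1.3x/year)}, \qquad
\sqrt{2} \approx 1.41 \quad\text{(the analogous figure for 2x/year)}.
```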

Yeah sorry, I didn't mean to say this directly contradicted anything you said. It just felt like a good reference that might be helpful to you or other people reading the thread. (In retrospect, I should have said that and/or linked it in response to the mention in your top-level comment instead.)

(Also, personally, I do care about how much effort and selection is required to find good retrodictions like this, so in my book "I didn't look up the data on Google beforehand" is relevant info. But it would have been way more impressive if someone had been able ... (read more)

and notably there's been perhaps a 2x speedup in algorithmic progress since 2022

I don't understand this. Why would there be a 2x speedup in algorithmic progress?

2
Matthew_Barnett
11mo
Sorry, that was very poor wording. I meant that one 2023 FLOP is probably about equal to two 2022 FLOP, due to continued algorithmic progress. I'll reword the comment you replied to.

And, as I think Eliezer said (roughly), there don't seem to be many cases where new tech was predicted based on when some low-level metric would exceed the analogous metric in a biological system. [...] And the way in which machines perform tasks usually looks very different than how biological systems do it (bird vs. airplanes, etc.).

From Birds, Brains, Planes, and AI:

This data shows that Shorty [hypothetical character introduced earlier in the post] was entirely correct about forecasting heavier-than-air flight. (For details about the data, see appendix.

... (read more)
1
Jess_Riedel
11mo
I listed this example in my comment, it was incorrect by an order of magnitude, and it was a retrodiction.  "I didn't look up the data on Google beforehand" does not make it a prediction.

I think my biggest disagreement with the takeoff speeds model is just that it's conditional on things like: no coordinated delays, regulation, or exogenous events like war, and doesn't take into account model uncertainty.

Cool, I thought that was most of the explanation for the difference in the median. But I thought it shouldn't be enough to explain the 14x difference between 28% and 2% by 2030, because I think there should be a ≥20% chance that there are no significant coordinated delays, regulation, or relevant exogenous events if AI goes wild in the nex... (read more)

4
Matthew_Barnett
11mo
Update: I changed the probability distribution in the post slightly in line with your criticism. The new distribution is almost exactly the same, except that I think it portrays a more realistic picture of short timelines. The p(TAI < 2030) is now 5% [eta: now 7%], rather than 2%.
4
Matthew_Barnett
11mo
That's reasonable. I think I probably should have put more like 3-6% credence before 2030. I should note that it's a bit difficult to tune the Metaculus distributions to produce exactly what you want, and the distribution shouldn't be seen as an exact representation of my beliefs.

My own distribution over the training FLOP for transformative AI is centered around ~10^32 FLOP using 2023 algorithms, with a standard deviation of about 3 OOM.

Thanks for the numbers!

For comparison, takeoffspeeds.com has an aggressive monte-carlo (with a median of 10^31 training FLOP) that yields a median of 2033.7 for 100% automation — and a p(TAI < 2030) of ~28%. That 28% is pretty radically different from your 2%. Do you know your biggest disagreements with that model?

The 1 OOM difference in training FLOP presumably doesn't explain that much. (Althou... (read more)

6
Matthew_Barnett
11mo
I think my biggest disagreement with the takeoff speeds model is just that it's conditional on things like: no coordinated delays, regulation, or exogenous events like war, and doesn't take into account model uncertainty. My other big argument here is that I just think robots aren't very impressive right now, and it's hard to see them going from being unimpressive to extremely impressive in just a few short years. 2030 is very soon. Imagining even a ~4 year delay due to all of these factors produces a very different distribution.

Also, as you note, "takeoffspeeds.com talks about 'AGI' and you talk about 'TAI'". I think transformative AI is a lower bar than 100% automation. The model itself says they added "an extra OOM to account for TAI being a lower bar than full automation (AGI)." Notably, if you put 10^33 2022 FLOP into the takeoff model (and keep in mind that I was talking about 2023 FLOP), it produces a median year of >30% GWP growth of about 2032, which isn't too far from what I said in the post: I added about four years to this 2032 timeline due to robots, which I think is reasonable even given your considerations about how we don't have to automate everything -- we just need to automate the bottlenecks to producing more semiconductor fabs. But you could be right that I'm still being too conservative.

The quote continues:

Of the remaining 5 %, around 70 % would eventually be reached by other civilisations, while 30 % would have remained empty in our absence.

I think the 70%/30% numbers are the relevant ones for comparing human colonization vs. extinction vs. misaligned AGI colonization. (Since the 5% factor cuts the importance of everything equally.)

...assuming defensive dominance in space, where you get to keep space that you acquire first. I don't know what happens without that.

This would suggest that if we're indifferent between space being totally uncoloni... (read more)

If AGI systems had goals that were cleanly separated from the rest of their cognition, such that they could learn and self-improve without risking any value drift (as long as the values-file wasn't modified), then there's a straightforward argument that you could stabilise and preserve that system's goals by just storing the values-file with enough redundancy and digital error correction.

So this would make section 6 mostly irrelevant. But I think most other sections remain relevant, insofar as people weren't already convinced that being able to build stabl... (read more)

I really like the proposed calibration game! One thing I'm curious about is whether real-world evidence more often looks like a likelihood ratio or like something else (e.g. pointing towards a specific probability being correct). Maybe you could see this from the structure of priors + likelihood ratios + posteriors in the calibration game — e.g. check whether the long-run top-scorers' likelihood ratios correlated more or less than their posterior probabilities.

(If someone wanted to build this: one option would be to start with pastcasting and then give archived ... (read more)
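For concreteness, here's a rough sketch (mine, with made-up numbers) of the kind of check I have in mind: back out each forecaster's implied likelihood ratio from their reported prior and posterior, then see whether forecasters agree more on the likelihood ratios or on the posteriors. The actual data-loading from archived questions is omitted.

```python
import numpy as np

def log_odds(p):
    """Convert a probability to log-odds."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1 - p))

def implied_log_lr(prior, posterior):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio,
    so log(LR) = log-odds(posterior) - log-odds(prior)."""
    return log_odds(posterior) - log_odds(prior)

# Hypothetical reports from four forecasters on one question, after all of
# them saw the same piece of evidence.
priors = np.array([0.30, 0.50, 0.20, 0.40])
posteriors = np.array([0.55, 0.75, 0.45, 0.70])

log_lrs = implied_log_lr(priors, posteriors)

# Simple proxy for "agreement": the spread across forecasters (lower = more
# agreement). With many questions, pairwise correlations would be the natural
# generalization.
print("spread of posteriors (log-odds):", np.std(log_odds(posteriors)))
print("spread of implied log-LRs:      ", np.std(log_lrs))
```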

2
Jonas V
1y
Interesting point, agreed that this would be very interesting to analyze!

And it would probably be a huge mistake to seek out an adderall prescription.

...unless you have other reasons to believe that an Adderall prescription might be good for you. Saliently: if you have ADHD symptoms.

Depends on how much of their data they'd have to back up like this. If every bit ever produced or operated on instead had to be 25 bits — that seems like a big fitness hit. But if they're only this paranoid about a few crucial files (e.g. the minds of a few decision-makers), then that's cheap.

And there's another question about how much stability contributes to fitness. In humans, cancer tends to not be great for fitness. Analogously, it's possible that most random errors in future civilizations would look less like slowly corrupting values and more like... (read more)

This is a great question. I think the answer depends on the type of storage you're doing.

If you have a totally static lump of data that you want to encode in a harddrive and not touch for a billion years, I think the challenge is mostly in designing a type of storage unit that won't age. Digital error correction won't help if your whole magnetism-based harddrive loses its magnetism. I'm not sure how hard this is.

But I think more realistically, you want to use a type of hardware that you regularly use, regularly service, and where you can copy the informati... (read more)
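For intuition on why modest redundancy buys so much reliability, here's a minimal sketch of the simplest possible scheme, a repetition code with majority-vote decoding, assuming independent per-copy corruption. This is not the calculation from the post (real schemes would use better codes than naive repetition), and the failure probabilities are made up.

```python
from math import comb

def majority_vote_error(p_single, n_copies):
    """Probability that a bit stored as n_copies independent copies is decoded
    incorrectly by majority vote, if each copy is corrupted with probability
    p_single between refreshes. Assumes n_copies is odd and corruption is
    independent across copies."""
    k_needed = n_copies // 2 + 1  # corrupted copies needed to outvote the rest
    return sum(
        comb(n_copies, k) * p_single**k * (1 - p_single) ** (n_copies - k)
        for k in range(k_needed, n_copies + 1)
    )

# Illustrative numbers: even a 1% per-copy corruption chance per refresh cycle
# turns into a roughly 5e-20 per-bit error rate with 25 copies.
print(majority_vote_error(0.01, 1))   # 0.01
print(majority_vote_error(0.01, 25))  # ~5e-20
```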

6
trammell
1y
Cool, thanks for thinking this through! This is super speculative of course, but if the future involves competition between different civilizations / value systems, do you think having to devote say 96% (i.e. 24/25) of a civilization's storage capacity to redundancy would significantly weaken its fitness? I guess it would depend on what fraction of total resources are spent on information storage...? Also, by the same token, even if there is a "singleton" at some relatively early time, mightn't it prefer to take on a non-negligible risk of value drift later in time if it means being able to, say, 10x its effective storage capacity in the meantime? (I know your 24/25 was a conservative estimate in some ways; on the other hand it only addresses the first billion years, which is arguably only a small fraction of the possible future, so hopefully it's not too biased a number to anchor on!)

I'm not sure how literally you mean "disprove", but on its face, "assume nothing is related to anything until you have proven otherwise" is a reasoning procedure that will never recommend any action in the real world, because we never get that kind of certainty. When humans try to achieve results in the real world, heuristics, informal arguments, and looking at what seems to have worked ok in the past are unavoidable.

2
[anonymous]
1y
I am talking about math. In math, we can at least demonstrate things for certain (and prove things for certain, too, though that is admittedly not what I am talking about). But the point is that we should at least be able to bust out our calculators and crunch the numbers. We might not know if these numbers apply to the real world. That's fine. But at least we have the numbers. And that counts for something.

For example, we can know roughly how much wealth SBF was gambling. We can give that a range. We also can estimate how much risk he was taking on. We can give that a range too. Then we can calculate if the risk he took on had net positive expected value. It's possible that it has positive expected value only above a certain level of risk, or whatever. Perhaps we do not know whether he faced this risk. That is fine. But we can still at any rate see under what circumstances SBF would have been rational, acting on utilitarian grounds, to do what he did. If these circumstances sound like they do or could describe the circumstances that SBF was in earlier this week, then that should give us reason to pause.

Global poverty probably has marginal returns that diminish more slowly, yeah. Unsure about animal welfare. I was mostly thinking about longtermist causes.

Re 80,000 Hours: I don't know exactly what they've argued, but I think "very valuable" is compatible with logarithmic returns. There are also diminishing marginal returns to direct workers in any given cause, so logarithmic returns on money doesn't mean that money becomes unimportant compared to people, or anything like that.

2
MichaelStJules
1y
(I didn't vote on your comment.) Here's Ben Todd's post on the topic from last November: Despite billions of extra funding, small donors can still have a significant impact. I'd especially recommend the part from section 1. In short, he thought the marginal cost-effectiveness hadn't changed much while funding had dramatically increased within longtermism over these years. I suppose it's possible marginal returns diminish quickly within each year, even if funding is growing quickly over time, though, as long as the capacity to absorb funds at similar cost-effectiveness grows with it.

Personally, I'd guess funding students' university programs is much less cost-effective on the margin, because of the distribution of research talent, because students should already be fully funded if they have a decent shot of contributing, because the best researchers will already be fully funded without many non-research duties (like being a teaching assistant), and because other promising researchers can get internships at AI labs both for valuable experience (80,000 Hours recommends this as a career path!) and to cover their expenses. I also got the impression that the Future Fund's bar was much lower, but I think this was after Ben Todd's post.

Because utility and integrity are wholly independent variables, so there is no reason for us to assume a priori that they will always correlate perfectly. So if we wish to believe that integrity and expected value correlated for SBF, then we must show it. We must actually do the math.

This feels a bit unfair when people (i) have argued that utility and integrity will correlate strongly in practical cases (why use "perfectly" as your bar?), and (ii) that they will do so in ways that will be easy to underestimate if you just "do the math".

You might think t... (read more)

3
MichaelStJules
1y
Utility and integrity coming apart, and in particular deception for gain, is one of the central concerns of AI safety. Shouldn't we similarly be worried at the extremes even in human consequentialists? It is somewhat disanalogous, though, because:

1. We don't expect one small group of humans to have so much power without the need to cooperate with others, like might be the case for an AGI taking over. Furthermore, the FTX/Alameda leaders had goals that were fairly aligned with a much larger community (the EA community), whose work they've just made harder.
2. Humans tend to inherently value integrity, including consequentialists. However, this could actually be a bias among consequentialists that consequentialists should seek to abandon, if we think integrity and utility should come apart at the extremes and we should go for the extremes.
3. (EDIT) Humans are more limited cognitively than AGIs, and are less likely to identify net positive deceptive acts and more likely to identify net negative ones than AGIs.

EDIT: On the other hand, maybe we shouldn't trust utilitarians with AGIs aligned with their own values, either.
2
[anonymous]
1y
Assuming zero correlation between two variables is standard practice, because for any given set of two variables, it is very likely that they do not correlate. Anyone who wants to disagree must crunch the numbers and disprove it. That's just how math works. And if we want to treat ethics like math, then we need to actually do some math. We can't have our cake and eat it too.

Because a double-or-nothing coin-flip scales; it doesn't stop having high EV when we start dealing with big bucks.

Risky bets aren't themselves objectionable in the way that fraud is, but to just address this point narrowly: Realistic estimates put risky bets at much worse EV when you control a large fraction of the altruistic pool of money. I think a decent first approximation is that EA's impact scales with the logarithm of its wealth. If you're gambling a small amount of money, that means you should be ~indifferent to 50/50 double or nothing (note th... (read more)
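To spell out the arithmetic behind that (a sketch assuming value is exactly logarithmic in the size of the altruistic pool):

```latex
% Betting a fraction f of wealth W on a fair double-or-nothing coin flip
% changes expected log-wealth by
\tfrac{1}{2}\log\!\big(W(1+f)\big) + \tfrac{1}{2}\log\!\big(W(1-f)\big) - \log W
  \;=\; \tfrac{1}{2}\log\!\big(1-f^2\big) \;\approx\; -\tfrac{f^2}{2} ,
% which is ~0 for small f (near-indifference) and tends to -\infty as f \to 1,
% i.e. as the bet approaches the whole pool.
```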

I think marginal returns probably don't diminish nearly as quickly as the logarithm for neartermist cause areas, but maybe that's true for longtermist ones (where FTX/Alameda and associates were disproportionately donating), although my impression is that there's no consensus on this, e.g. 80,000 Hours has been arguing for donations still being very valuable.

(I agree that the downside (damage to the EA community and trust in EAs) is worse than nothing relative to the funds being gambled, but that doesn't really affect the spirit of the argument. It's very easy to underappreciate the downside in practice, though.)

conflicts of interest in grant allocation, work place appointments should be avoided

Worth flagging: Since there are more men than women in EA, I would expect a greater fraction of EA women than EA men to be in relationships with other EAs. (And trying to think of examples off the top of my head supports that theory.) If this is right, the policy "don't appoint people for jobs where they will have conflicts of interest" would systematically disadvantage women.

(By contrast, considering who you're already in a work-relationship with when choosing who to date ... (read more)

Yeah, I agree that multipolar dynamics could prevent lock-in from happening in practice.

I do think that "there is a non-trivial probability that a dominant institution will in fact exist", and also that there's a non-trivial probability that a multipolar scenario will either

  • (i) end via all relevant actors agreeing to set-up some stable compromise institution(s), or
  • (ii) itself end up being stable via each actor making themselves stable and their future interactions being very predictable. (E.g. because of an offence-defence balance strongly favoring defence
... (read more)

If re-running evolution requires simulating the weather and if this is computationally too difficult then re-running evolution may not be a viable path to AGI.

There are many things that prevent us from literally rerunning human evolution. The evolution anchor is not a proof that we could do exactly what evolution did, but instead an argument that if something as inefficient as evolution spit out human intelligence with that amount of compute, surely humanity could do it if we had a similar amount of compute. Evolution is very inefficient — it has itself be... (read more)

For instance we might get WBEs only in hypothetical-2080 but get superintelligent LLMs in 2040, and the people using superintelligent LLMs make the world unrecognisably different by 2042 itself.

I definitely don't just want to talk about what happens / what's feasible before the world becomes unrecognisably different. It seems pretty likely to me that lock-in will only become feasible after the world has become extremely strange. (Though this depends a bit on details of how to define "feasible", and what we count as the start-date of lock-in.)

And I think th... (read more)

Chaos theory is about systems where tiny deviations in initial conditions cause large deviations in what happens in the future. My impression (though I don't know much about the field) is that, assuming some model of a system (e.g. the weather), you can prove things about how far ahead you can predict the system given some uncertainty (normally about the initial conditions, though uncertainty brought about by limited compute that forces approximations should work similarly). Whether the weather corresponds to any particular model isn't really susceptible to proofs, but that question can be tackled by normal science.
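The standard quantitative version of this (a textbook gloss, not something from the thread): for a chaotic system, the prediction horizon grows only logarithmically as you shrink the initial uncertainty.

```latex
% An initial uncertainty \delta_0 grows roughly as \delta(t) \approx \delta_0 e^{\lambda t},
% where \lambda is the maximal Lyapunov exponent, so predictions stay within a
% tolerance \Delta only for about
t_{\mathrm{predict}} \;\approx\; \frac{1}{\lambda}\,\ln\!\frac{\Delta}{\delta_0} .
```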

Quoting from the post:

Thus, we suspect that an adequate solution to AI alignment could be achieved given sufficient time and effort. (Though whether that will actually happen is a different question, not addressed since our focus is on feasibility rather than likelihood.)

AI doomers tend to agree with this claim.  See e.g. Eliezer in list of lethalities:

None of this is about anything being impossible in principle.  The metaphor I usually use is that if a textbook from one hundred years in the future fell into our hands, containing all of the simpl

... (read more)

Thanks Lizka. I think about section 0.0 as being a ~1-page summary (in between the 1-paragraph summary and the 6-page summary) but I could have better flagged that it can be read that way. And your bullet point summary is definitely even punchier.

Thanks!

You've assumed from the get go that AIs will follow similar reinforcement-learning like paradigms like humans and converge on similar ontologies of looking at the world as humans. You've also assumed these ontologies will be stable - for instance a RL agent wouldn't become superintelligent, use reasoning and then decide to self modify into something that is not an RL agent.

Something like that, though I would phrase it as relying on the claim that it's feasible to build AI systems like that, since the piece is about the feasibility of lock-in. And in... (read more)

I broadly agree with this. For the civilizations that want to keep thinking about their values or the philosophically tricky parts of their strategy, there will be an open question about how convergent/correct their thinking process is (although there's lots you can do to make it more convergent/correct — eg. redo it under lots of different conditions, have arguments be reviewed by many different people/AIs, etc).

And it does seem like all reasonable civilizations should want to do some thinking like this. For those civilizations, this post is just saying t... (read more)

2
Pablo
2y
+1. Maybe a different prefix could be used, e.g. '##' for Wiki entries and '#' for posts.

We used the geometric mean of the samples with the minimum and maximum removed to better deal with extreme outliers, as described in our previous post

I don't see how that's consistent with:

What is the probability that Russia will use a nuclear weapon in Ukraine in the next MONTH?

  • Aggregate probability: 0.0859 (8.6%)
  • All probabilities: 0.27, 0.04, 0.02, 0.001, 0.09, 0.08, 0.07

What is the probability that Russia will use a nuclear weapon in Ukraine in the next YEAR?

  • Aggregate probability: 0.2294 (23%)
  • All probabilities: 0.38, 0.11, 0.11, 0.005, 0.42, 0.2, 0
... (read more)
4
NunoSempere
2y
Geometric mean of the odds.
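In case it's useful, here's a minimal sketch of that aggregation as I understand the description (geometric mean of odds, after dropping the single lowest and highest samples). I haven't verified that this exact procedure reproduces the aggregates quoted above.

```python
import numpy as np

def aggregate(probs):
    """Geometric mean of odds, after removing the minimum and maximum samples."""
    trimmed = np.array(sorted(probs)[1:-1])      # drop the extreme outliers
    odds = trimmed / (1 - trimmed)
    geo_mean_odds = np.exp(np.mean(np.log(odds)))
    return geo_mean_odds / (1 + geo_mean_odds)   # convert back to a probability

# e.g. on the per-forecaster samples quoted above for the next-MONTH question:
print(aggregate([0.27, 0.04, 0.02, 0.001, 0.09, 0.08, 0.07]))
```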

Thanks for the links! (Fyi the first two point to the same page.)

The critic's 0.3 assumes that you'll stay until there are nuclear exchanges between Russia and NATO. Zvi was at 75% if you leave as soon as a conventional war between NATO and Russia starts.

I'm not sure how to compare that situation with the current situation, where it seems more likely that the next escalatory step will be a nuke on a non-NATO target than conventional NATO-Russia warfare. But if you're happy to leave as soon as either a nuke is dropped anywhere or conventional NATO/Russia warfare breaks out, I'm inclined to aggregate those numbers  to something closer to 75% than 50%.

3
Lukas Finnveden
2y
On the other hand, the critic updated me towards higher numbers on p(nuke london|any nuke). Though I assume Samotsvety have already read it, so not sure how to take that into account. But given that uncertainty, given that that number only comes into play in confusing worlds where everyone's models are broken, and given Samotsvety's 5x higher unconditional number, I will update at least a bit in that direction.

Thanks for doing this!

In this squiggle you use "ableToEscapeBefore = 0.5". Does that assume that you're following the policy "escape if you see any tactical nuclear weapons being used in Ukraine"? (Which someone who's currently on the fence about escaping London would presumably do.)

If yes, I would have expected it to be higher than 50%. Do you think very rapid escalation is likely, or am I missing something else?

3
NunoSempere
2y
I was just using 0.5 as a default value. In our March estimate, we were at 0.75, a critic was at 0.3, and Zvi Mowshowitz was at a solomonic 0.5. This time this wasn't really the focus of our estimate, because I was already giving forecasters many questions to estimate, and the situation for that sub-estimate doesn't seem to have changed as much.

I think this particular example requires an assumption of logarithmically diminishing returns, but is right given that assumption.

(I think the point about roughly quadratic value of information applies more broadly than just for logarithmically diminishing returns. And I hadn't realised it before. Seems important + underappreciated!)

One quirk to note: If a funder (who I want to be well-informed) is 50/50 on S vs L, but my all-things-considered belief is 60/40, then I would value the first 1% they shift towards my position much more than they do (maybe 10x more?)  ... (read more)

3
Owen Cotton-Barratt
2y
I agree with all this. I meant to state that I was assuming logarithmic returns for the example, although I do think some smoothness argument should be enough to get it to work for small shifts.
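One way to spell out that smoothness argument (my sketch, not from either comment):

```latex
% Let V(x) be the value of allocating a fraction x of funding to cause S, and
% x(p) the allocation a funder chooses given credence p. If x(p^*) is optimal
% under the true credence p^*, then V'(x(p^*)) = 0, so for a small credence
% error \epsilon (and smooth V and x),
V\big(x(p^* + \epsilon)\big) \;\approx\; V\big(x(p^*)\big) - c\,\epsilon^2
\quad\text{for some } c > 0 .
% The loss is second order in the error, so the value of information that
% corrects the error scales roughly quadratically with its size, and a funder
% who already thinks they are right sees ~zero value in the first small shift.
```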

I think that's right, other than that weak upvotes never become worth 3 points anymore (although this doesn't matter on the EA Forum, given that no one has 25,000 karma), based on this LessWrong GitHub file linked from the LW FAQ.

Nitpicking:

A property of making directional claims like this is that MacAskill always has 50% confidence in the claim I’m making, since I’m claiming that his best-guess estimate is too high/low.

This isn't quite right. Conservation of expected evidence means that MacAskill's current probabilities should match his expectation of the ideal reasoning process. But for probabilities close to 0, this would typically imply that he assigns higher probability to being too high than to being too low. For example: a 3% probability is compatible with 90% probability th... (read more)
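A numerical illustration of the shape of the point (the numbers are mine, purely for illustration):

```latex
% Conservation of expected evidence: current credence = expectation of what
% the ideal reasoning process would conclude. For instance,
0.03 \;=\; 0.9 \times 0.01 \;+\; 0.1 \times 0.21 ,
% so a 3% credence is consistent with a 90% chance that the ideal process
% lands lower (~1%) and only a 10% chance that it lands much higher (~21%):
% "too high" can be much likelier than "too low".
```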

3
elifland
2y
On the point about working on the relevant research agendas, I hadn’t thought about that and kind of want to disallow that from the definition. But I feel the line would then get fuzzy as to what things exactly count as object level work on research agendas. Edit: After thinking more, I will edit the definition to clarify that the people doing the reasoning can only deliberate about current evidence rather than acquire new evidence. This might still be a bit vague but it seems better than not including.
5
elifland
2y
Great point, I'm a bit disappointed in myself for not catching this! I'll strike this out of the post and link to your comment for explanation.

The term "most important century" pretty directly suggests that this century is unique, and I assume that includes its unusually large amount of x-risk (given that Holden seems to think that the development of TAI is both the biggest source of x-risk this century and the reason for why this might be the most important century).

Holden also talks specifically about lock-in, which is one way the time of perils could end.

See e.g. here:

It's possible, for reasons outlined here, that whatever the main force in world events is (perhaps digital people, misaligned A

... (read more)
3
weeatquince
2y
Thanks great – will have a read :-)

The page for the Century Fellowship outlines some things that fellows could do, which are much broader than just university group organizing:

When assessing applications, we will primarily be evaluating the candidate rather than their planned activities, but we imagine a hypothetical Century Fellow may want to:

... (read more)
3
abergal
2y
When we were originally thinking about the fellowship, one of the cases for impact was making community building a more viable career (hence the emphasis in this post), but it’s definitely intended more broadly for people working on the long-term future. I’m pretty unsure how the fellowship will shake out in terms of community organizers vs researchers vs entrepreneurs long-term – we’ve funded a mix so far (including several people who I’m not sure how to categorize / are still unsure about what they want to do).

I'm not saying it's infinite, just that (even assuming it's finite) I assign non-zero probability to different possible finite numbers in a fashion such that the expected value is infinite. (Just like the expected value of an infinite St Petersburg challenge is infinite, although every outcome has finite size.)

The topic under discussion is whether pascalian scenarios are a problem for utilitarianism, so we do need to take pascalian scenarios seriously, in this discussion.

I simply don’t believe that infinities exist, and even though 0 isn’t a probability, I reject the probabilistic argument that any possibility of infinity allows them to dominate all EV calculations.

Problems with infinity don't go away just because you assume that actual infinities don't exist. Even with just finite numbers, you can face gambles that have infinite expected value, if increasingly good possibilities have insufficiently rapidly diminishing probabilities. And this still causes a lot of problems.
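The standard construction (St Petersburg-style, with nothing infinite in any single outcome):

```latex
% A gamble that pays 2^n units of value with probability 2^{-n}, for n = 1, 2, 3, ...,
% has only finite outcomes, yet
\mathbb{E}[\text{value}] \;=\; \sum_{n=1}^{\infty} 2^{-n} \cdot 2^{n}
  \;=\; \sum_{n=1}^{\infty} 1 \;=\; \infty ,
% because the probabilities of ever-better outcomes do not shrink fast enough.
```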

(I also don't think that's an esoteric possib... (read more)

6
AppliedDivinityStudies
2y
Under mainstream conceptions of physics (as I loosely understand them), the number of possible lives in the future is unfathomably large, but not actually infinite.