All of steve2152's Comments + Replies

[Discussion] Best intuition pumps for AI safety

One of my theories here is that it's helpful to pivot quickly towards "here's an example of a concrete research problem that seems hard but not impossible, and people are working on it, and not knowing the solution seems obviously problematic". This is good for several reasons, including "pattern-matching to serious research, safety engineering, etc., rather than pattern-matching to sci-fi comics", providing a gentler on-ramp (as opposed to wrenching things like "your children probably won't die of natural causes" or whatever), providing food for thought, etc. Of course this only works if you can engage with the technical arguments. Brian Christian's book is the extreme of this approach.

Why aren't you freaking out about OpenAI? At what point would you start?

Vicarious and Numenta are both explicitly trying to build AGI, and neither does any safety/alignment research whatsoever. I don't think this fact is particularly relevant to OpenAI, but I do think it's an important fact in its own right, and I'm always looking for excuses to bring it up.  :-P

Anyone who wants to talk about Vicarious or Numenta in the context of AGI safety/alignment, please DM or email me.  :-)

2AppliedDivinityStudies3moIn the absence of rapid public progress, my default assumption is that "trying to build AGI" is mostly a marketing gimmick. There seem to be several other companies like this, e.g.: [] But it is possible they're just making progress in private, or might achieve some kind of unexpected breakthrough. I guess I'm just less clear about how to handle these scenarios. Maybe by tracking talent flows, which is something the AI Safety community has been trying to do for a while.
Why does (any particular) AI safety work reduce s-risks more than it increases them?

I don't really distinguish between effects by order*

I agree that direct and indirect effects of an action are fundamentally equally important (in this kind of outcome-focused context) and I hadn't intended to imply otherwise.

Why does (any particular) AI safety work reduce s-risks more than it increases them?

Hmm, it seems to me (and you can correct me) that we should be able to agree that there are SOME technical AGI safety research publications that are positive under some plausible beliefs/values and harmless under all plausible beliefs/values, and then we don't have to talk about cluelessness and tradeoffs, we can just publish them.

And we both agree that there are OTHER technical AGI safety research publications that are positive under some plausible beliefs/values and negative under others. And then we should talk about your portfolios etc. Or more si... (read more)

3MichaelStJules4moYa, I think this is the crux. Also, considerations like a cosmic ray flipping a bit tend to force a lot of things into the second category when they otherwise wouldn't have been, although I'm not specifically worried about cosmic ray bit flips, since they seem sufficiently unlikely and easy to avoid. (Fair.) This is actually what I'm thinking is happening, though (not like the firefighter example), but we aren't really talking much about the specifics. There might indeed be specific cases where I agree that we shouldn't be clueless if we worked through them, but I think there are important potential tradeoffs between incidental and agential s-risks, between s-risks and other existential risks, even between the same kinds of s-risks, etc., and there is a ton of uncertainty in the expected harm from these risks, so much that it's inappropriate to use a single distribution (without sensitivity analysis to "reasonable" distributions, and with this sensitivity analysis, things look ambiguous), similar to this example [], and we're talking about "sweetening" one side or the other, but that's totally swamped by our uncertainty. What I have in mind is more symmetric in upsides and downsides (or at least, I'm interested in hearing why people think it isn't in practice), and I don't really distinguish between effects by order*. My post points out potential reasons that I actually think could dominate. The standard I'm aiming for is "Could a reasonable person disagree?", and I default to believing a reasonable person could disagree when I point out such tradeoffs until we actually carefully work through them in detail and it turns out it's pretty unreasonable to disagree. *Although thinking more about it now, I suppose longer chains are more fragile and likely to have unaccounted-for effects going in the opposite direction, so
Why does (any particular) AI safety work reduce s-risks more than it increases them?

In practice, we can't really know with certainty that we're making AI safer, and without strong evidence/feedback, our judgements of tradeoffs may be prone to fairly arbitrary subjective judgements, motivated reasoning and selection effects.

This strikes me as too pessimistic. Suppose I bring a complicated new board game to a party. Two equally-skilled opposing teams each get a copy of the rulebook to study for an hour before the game starts. Team A spends the whole hour poring over the rulebook and doing scenario planning exercises. Team B immediately thro... (read more)

3MichaelStJules4moIf you aren't publishing anything, then sure, research into what to do seems mostly harmless (other than opportunity costs) in expectation, but it doesn't actually follow that it would necessarily be good in expectation, if you have enough deep uncertainty (or complex cluelessness); I think this example [] illustrates this well, and is basically the kind of thing I'm worried about all of the time now. In the particular case of sign-flip errors, I do think it was useful for me to know about this consideration and similar ones, and I act differently than I would have otherwise as a result, but one of the main effects since learning about these kinds of s-risks is that I'm (more) clueless about basically every intervention now, and am looking to portfolios and hedging []. If you are publishing, and your ethical or empirical views are sufficiently different from others working on the problem so that you make very different tradeoffs, then that could be good, bad or ambiguous. For example, if you didn't really care about s-risks, then publishing useful considerations for those who are concerned about s-risks might take attention away from your own priorities, or it might increase cooperation, and the default position, to me, should be deep uncertainty/cluelessness here, not that it's good in expectation or bad in expectation or 0 in expectation. Maybe you can eliminate this ambiguity or at least constrain its range to something relatively insignificant by building a model, doing a sensitivity analysis, etc., but a lot of things don't work out, and the ambiguity could be so bad that it infects everything else. This is roughly where I am now: I have considerations that result in complex cluelessness about AI-related intervent
Why does (any particular) AI safety work reduce s-risks more than it increases them?

Hmm, just a guess, but …

  • Maybe you're conceiving of the field as "AI alignment", pursuing the goal "figure out how to bring an AI's goals as close as possible to a human's (or humanity's) goals, in their full richness" (call it "ambitious value alignment")
  • Whereas I'm conceiving the field as "AGI safety", with the goal "reduce the risk of catastrophic accidents involving AGIs".

"AGI safety research" (as I think of it) includes not just how you would do ambitious value alignment, but also whether you should do ambitious value alignment. In fact, AGI safety res... (read more)

3MichaelStJules4moI agree that this is an important distinction, but it seems hard to separate them in practice. In practice, we can't really know with certainty that we're making AI safer, and without strong evidence/feedback, our judgements of tradeoffs may be prone to fairly arbitrary subjective judgements, motivated reasoning and selection effects. Some AI safety researchers are doing technical research on value learning/alignment, like (cooperative) inverse reinforcement learning, and doing this research may contribute to further research on the topic down the line and eventual risky ambitious value alignment, whether or not "we" end up concluding that it's too risky. Furthermore, when it matters most, I think it's unlikely there will be a strong and justified consensus in favour of this kind of research (given wide differences in beliefs about the likelihood of worst cases and/or differences in ethical views), and I think there's at least a good chance there won't be any strong and justified consensus at all. To me, the appropriate epistemic state with regards to value learning research (or at least its publication) is one of complex cluelessness, and it's possible this cluelessness could end up infecting AGI safety as a cause in general, depending on how large the downside risks could be (which explicit modelling with sensitivity analysis could help us check). Also, it's not just AI alignment research that I'm worried about, since I see potential for tradeoffs more generally between failure modes. Preventing unipolar takeover or extinction may lead to worse outcomes (s-risk/hyperexistential risks), but maybe (this is something to check) those worse outcomes are easier to prevent with different kinds of targeted work and we're sufficiently invested in those. 
I guess the question would be whether, looking at the portfolio of things the AI safety community is working on, are we increasing any risks (in a way that isn't definitely made up for by reductions in other risks)? Each
Why does (any particular) AI safety work reduce s-risks more than it increases them?


(Incidentally, I don't claim to have an absolutely watertight argument here that AI alignment research couldn't possibly be bad for s-risks, just that I think the net expected impact on s-risks is to reduce them.)

If s-risks were increased by AI safety work near (C), why wouldn't they also be increased near (A), for the same reasons?

I think suffering minds are a pretty specific thing, in the space of "all possible configurations of matter". So optimizing for something random (paperclips, or "I want my field-of-view to be all white", etc.) would almos... (read more)

3MichaelStJules4moI think this is a reasonable reading of my original post, but I'm actually not convinced trying to get closer to the bullseye reduces s-risks on net even if we're guaranteed to hit the dartboard, for reasons given in my other comments here and in the page on hyperexistential risks [], which kokotajlod shared.
Why does (any particular) AI safety work reduce s-risks more than it increases them?

Sorry I'm not quite sure what you mean. If we put things on a number line with (A)=1, (B)=2, (C)=3, are you disagreeing with my claim "there is very little probability weight in the interval ", or with my claim "in the interval , moving down towards 1 probably reduces s-risk", or with both, or something else?

4MichaelStJules4moI'm disagreeing with both (or at least am not convinced by either; I'm not confident either way). I think your description of (B) might apply to anything strictly between (A) and (C), so it would be kind of arbitrary to pick out any particular point, and the argument should apply along the whole continuum or else needs more to distinguish these two intervals. If s-risks were increased by AI safety work near (C), why wouldn't they also be increased near (A), for the same reasons? Do you have some more specific concrete hurdle(s) in AI alignment/safety in mind? I think it could still be the case along this interval that more AI safety work makes the AI more interested in sentience and increases the likelihood of an astronomical number of additional sentient beings being created (by the AI that's more aligned or others interacting with it), and so may increase s-risks. And, in particular, if humans are out of the loop due to extinction (which might have otherwise been prevented with more AI safety work), that could be a big loss in interest in sentience that might have otherwise backfired for s-risks.
Why does (any particular) AI safety work reduce s-risks more than it increases them?

[note that I have a COI here]

Hmm, I guess I've been thinking that the choice is between (A) "the AI is trying to do what a human wants it to try to do" vs (B) "the AI is trying to do something kinda weirdly and vaguely related to what a human wants it to try to do". I don't think (C) "the AI is trying to do something totally random" is really on the table as a likely option, even if the AGI safety/alignment community didn't exist at all.

That's because everybody wants the AI to do the thing they want it to do, not just long-term AGI risk people. And I think... (read more)

6MichaelStJules4moI would think there's kind of a continuum between each of the three options, and AI safety work shifts the distribution, making things closer to (C) less likely and things closer to (A) more likely. More or fewer of our values could be represented, and that could be good or bad, and related to the risks of extinction. It's not actually clear to me that moving in this direction is preferable from an s-risk perspective, since there could be more interest in creating more sentience overall and greater risks from conflict with others.
evelynciara's Shortform

The main argument of Stuart Russell's book focuses on reward modeling as a way to align AI systems with human preferences.

Hmm, I remember him talking more about IRL and CIRL and less about reward modeling. But it's been a little while since I read it, could be wrong.

If it's really difficult to write a reward function for a given task Y, then it seems unlikely that AI developers would deploy a system that does it in an unaligned way according to a misspecified reward function. Instead, reward modeling makes it feasible to design an AI system to do the task

... (read more)
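For concreteness, here's a minimal sketch of the reward-modeling idea under discussion, in the spirit of learning a reward function from pairwise human preferences rather than writing it by hand. Everything here (the linear reward model, states as plain numbers, the training loop) is my own toy simplification for illustration, not anything from the book:

```python
import math

# Toy reward modeling from pairwise preferences.
# Instead of hand-writing a reward function, we fit one to
# comparisons of the form "state a is preferred over state b".

def reward(w, s):
    return w * s  # linear reward model with a single parameter

def train_reward_model(prefs, lr=0.1, steps=500):
    """prefs: list of (preferred_state, dispreferred_state) pairs.
    Fit w by maximizing the Bradley-Terry log-likelihood, where
    P(a preferred over b) = sigmoid(reward(a) - reward(b))."""
    w = 0.0
    for _ in range(steps):
        for a, b in prefs:
            p = 1.0 / (1.0 + math.exp(-(reward(w, a) - reward(w, b))))
            w += lr * (1.0 - p) * (a - b)  # gradient ascent on log-likelihood
    return w

# Hidden preference: larger states are always better.
prefs = [(2.0, 1.0), (3.0, 0.0), (1.5, 0.5)]
w = train_reward_model(prefs)
# The learned reward now ranks states the same way the preferences do.
```

The point of the sketch is just that the designer only supplies comparisons, and the reward function is recovered from them.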
[Creative Writing Contest] The Reset Button

I really liked this!!!

Since you asked for feedback, here's a little suggestion, take it or leave it: I found a couple things at the end slightly out-of-place, in particular "If you choose to tackle the problem of nuclear security, what angle can you attack the problem from that will give you the most fulfillment?" and "Do any problems present even bigger risks than nuclear war?"

Immediately after such an experience, I think the narrator would not be thinking about option of not bothering to work on nuclear security because other causes are more important, n... (read more)

4Joshua Ingle4moFantastic comments, thank you! I included the bit about personal fulfillment because it's such an important component of being able to sustain an effective career long term, but in retrospect I was so focused on including as many EA ideas as I could that I didn't notice how out of place that sentiment is at that point in the story. I removed both that sentence and the one about more important causes, and I added a variant of your suggested replacement sentence.
A mesa-optimization perspective on AI valence and moral patienthood

Oh, you said "evolution-type optimization", so I figured you were thinking of the case where the inner/outer distinction is clear cut. If you don't think the inner/outer distinction will be clear cut, then I'd question whether you actually disagree with the post :) See the section defining what I'm arguing against, in particular the "inner as AGI" discussion.

2jacobpfau4moOk, seems like this might have been more a terminological misunderstanding on my end. I think I agree with what you say here, 'What if the “Inner As AGI” criterion does not apply? Then the outer algorithm is an essential part of the AGI’s operating algorithm'.
A mesa-optimization perspective on AI valence and moral patienthood

Nah, I'm pretty sure the difference there is "Steve thinks that Jacob is way overestimating the difficulty of humans building AGI-capable learning algorithms by writing source code", rather than "Steve thinks that Jacob is way underestimating the difficulty of computationally recapitulating the process of human brain evolution".

For example, for the situation that you're talking about (I called it "Case 2" in my post) I wrote "It seems highly implausible that the programmers would just sit around for months and years and decades on end, waiting patiently fo... (read more)

2jacobpfau4moOk, interesting. I suspect the programmers will not be able to easily inspect the inner algorithm, because the inner/outer distinction will not be as clear cut as in the human case. The programmers may avoid sitting around by fiddling with more observable inefficiencies e.g. coming up with batch-norm v10.
A mesa-optimization perspective on AI valence and moral patienthood

AlphaGo has a human-created optimizer, namely MCTS. Normally people don't use the term "mesa-optimizer" for human-created optimizers.

Then maybe you'll say "OK there's a human-created search-based consequentialist planner, but the inner loop of that planner is a trained ResNet, and how do you know that there isn't also a search-based consequentialist planner inside each single run through the ResNet?"

Admittedly, I can't prove that there isn't. I suspect that there isn't, because there seems to be no incentive for that (there's already a search-based consequentialist planner!), and also because I don't think ResNets are up to such a complicated task.
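To make the distinction concrete, here's a toy sketch (nothing like AlphaGo's actual code; `learned_value` is a stand-in for a trained value network): the planner itself is ordinary human-written search code, and only the leaf evaluation is learned. That's why one wouldn't call it a mesa-optimizer.

```python
# Toy stand-in for a trained value network (in AlphaGo, a deep ResNet).
# Here it's just a deterministic heuristic: prefer states near 10.
def learned_value(state):
    return -abs(state - 10)

def hand_written_search(state, moves, depth):
    """Human-written planner: exhaustive depth-limited search.
    The search logic is ordinary source code, not learned --
    only the leaf evaluation comes from the trained network."""
    if depth == 0:
        return learned_value(state), None
    best_score, best_move = float("-inf"), None
    for m in moves:
        child_score, _ = hand_written_search(state + m, moves, depth - 1)
        if child_score > best_score:
            best_score, best_move = child_score, m
    return best_score, best_move

score, move = hand_written_search(0, moves=[1, 2, 3], depth=3)
# Picks the first move on the path whose 3-step sum lands closest to 10.
```

The open question in the text is whether a search-like process could additionally appear *inside* the learned evaluation itself, which this sketch obviously can't settle.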

3ofer4mo(I don't know/remember the details of AlphaGo, but if the setup involves a value network that is trained to predict the outcome of an MCTS-guided gameplay, that seems to make it more likely that the value network is doing some sort of search during inference.)
AI timelines and theoretical understanding of deep learning

I find most justifications and arguments made in favor of a timeline of less than 50 years to be rather unconvincing. 

If we don't have convincing evidence in favor of a timeline <50 years, and we also don't have convincing evidence in favor of a timeline ≥50 years, then we just have to say that this is a question on which we don't have convincing evidence of anything in particular. But we still have to take whatever evidence we have and make the best decisions we can. ¯\_(ツ)_/¯ 

(You don't say this explicitly but your wording kinda implies that... (read more)

1Venky10244moGreat points again! I have only cursorily examined the links you've shared (bookmarked them for later) but I hope the central thrust of what I am saying does not depend too strongly on being closely familiar with the contents of those. A few clarifications are in order. I am really not sure about AGI timelines and that's why I am reluctant to attach any probability to it. For instance, the only reason I believe that there is less than 50% chance that we will have AGI in the next 50 years is because we have not seen it yet and it seems rather unlikely to me that the current directions will lead us there. But that is a very weak justification. What I do know is that there has to be some radical qualitative change for artificial agents to go from excelling in narrow tasks to developing general intelligence. That said, it may seem like nit-picking but I do want to draw the distinction between "not significant progress" and "no progress at all" towards AGI. Not only am I stating the former, I have no doubt that we have made incredible progress with algorithms in general. I am less convinced about how much those algorithms help us get closer towards an AGI. (In hindsight, it may turn out that our current deep learning approaches such as GANs contain path-breaking proto-AGI ideas/principles, but I am unable to see it that way). If we consider a scale of 0-100 where 100 represents AGI attainment and 0 is some starting point in the 1950s, I have no clear idea whether the progress we've made thus far is close to 5 or 0.5 or even 0.05. I have no strong arguments to justify one or the other because I am way too uncertain about how far the final stage is. There can also be no question with respect to the other categories of progress that you have highlighted, such as compute power, infrastructure, and large datasets - indeed I see these as central to the remarkable performance we have come to witness with deep learning models.
The perspective I have is that while ac
A mesa-optimization perspective on AI valence and moral patienthood

most contemporary progress on AI happens by running base-optimizers which could support mesa-optimization

GPT-3 is of that form, but AlphaGo/MuZero isn't (I would argue).

I'm not sure how to settle whether your statement about "most contemporary progress" is right or wrong. I guess we could count how many papers use model-free RL vs model-based RL, or something? Well anyway, given that I haven't done anything like that, I wouldn't feel comfortable making any confident statement here. Of course you may know more than me! :-)

If we forget about "contemporary pr... (read more)

2jacobpfau4moThanks for the link. I’ll have to do a thorough read through your post in the future. From scanning it, I do disagree with much of it, many of those points of disagreement were laid out by previous commenters. One point I didn’t see brought up: IIRC the biological anchors paper suggests we will have enough compute to do evolution-type optimization before the end of the century. So even if we grant your claim that learning to learn is much harder to directly optimize for, I think it’s still a feasible path to AGI. Or perhaps you think evolution like optimization takes more compute than the biological anchors paper claims?
2ofer4moI don't see why. The NNs in AlphaGo and MuZero were trained using some SGD variant (right?), and SGD variants can theoretically yield mesa-optimizers.
AI timelines and theoretical understanding of deep learning

Have you read ?

I do agree that there are many good reasons to think that AI practitioners are not AI forecasting experts, such as the fact that they're, um, obviously not—they generally have no training in it and have spent almost no time on it, and indeed they give very different answers to seemingly-equivalent timelines questions phrased differently. This is a reason to discount the timelines that come from AI practitioner surveys, in favor of whatever other forecasting methods / heuristics yo... (read more)

A mesa-optimization perspective on AI valence and moral patienthood

Let's say a human writes code more-or-less equivalent to the evolved "code" in the human genome. Presumably the resulting human-brain-like algorithm would have valence, right? But it's not a mesa-optimizer, it's just an optimizer. Unless you want to say that the human programmers are the base optimizer? But if you say that, well, every optimization algorithm known to humanity would become a "mesa-optimizer", since they tend to be implemented by human programmers, right? So that would entail the term "mesa-optimizer" kinda losing all meaning, I think. Sorry if I'm misunderstanding.
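For illustration, here's a toy sketch of the distinction I mean (all names and numbers are made up): the same inner optimizer, a simple gradient-descent learner, can either be written and run directly by a human, or be selected by an outer "evolution-like" search. Only in the second case would people normally call it a mesa-optimizer, even though the algorithm that ultimately runs is identical.

```python
import random

def inner_learner(lr, data):
    """The 'inner' optimizer: plain gradient descent fitting w
    to minimize sum((w*x - y)^2) over the data."""
    w = 0.0
    for _ in range(100):
        grad = sum(2 * (w * x - y) * x for x, y in data)
        w -= lr * grad
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # underlying rule: y = 2x

# Path 1: a human writes the inner optimizer directly and runs it.
w_direct = inner_learner(lr=0.01, data=data)

# Path 2: an outer "evolution-like" random search selects the learner's
# hyperparameter; relative to this outer loop, the selected inner
# learner would be called a mesa-optimizer.
random.seed(0)
best_lr = max((random.uniform(0.001, 0.02) for _ in range(20)),
              key=lambda lr: -abs(inner_learner(lr, data) - 2.0))

# Either way, the optimizer that actually runs is the same piece of code.
```

The terminological question in the comment is whether "mesa-optimizer" should apply in Path 1 at all, and I'd say no.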

2jacobpfau4moCertainly valenced processing could emerge outside of this mesa-optimization context. I agree that for "hand-crafted" (i.e. no base-optimizer) systems this terminology isn't helpful. To try to make sure I understand your point, let me try to describe such a scenario in more detail: Imagine a human programmer who is working with a bunch of DL modules and interpretability tools and programming heuristics which feed into these modules in different ways -- in a sense the opposite end of the spectrum from monolithic language models. This person might program some noxiousness heuristics that input into a language module. Those might correspond to a Phenumb-like phenomenology. This person might program some other noxiousness heuristics that input into all modules as scalars. Those might end up being valenced or might not, hard to say. Without having thought about this in detail, my mesa-optimization framing doesn't seem very helpful for understanding this scenario. Ideally we'd want a method for identifying valence which is more mechanistic than mine, in the sense that it lets you identify valence in a system just by looking inside the system, without looking at how it was made. All that said, most contemporary progress on AI happens by running base-optimizers which could support mesa-optimization, so I think it's quite useful to develop criteria which apply to this context. Hopefully this answers your question and the broader concern, but if I'm misunderstanding let me know.
It takes 5 layers and 1000 artificial neurons to simulate a single biological neuron [Link]

Addendum: In the other direction, one could point out that the authors were searching for "an approximation of an approximation of a neuron", not "an approximation of a neuron". (insight stolen from here.) Their ground truth was a fancier neuron model, not a real neuron. Even the fancier model is a simplification of real life. For example, if I recall correctly, neurons have been observed to do funny things like store state variables via changes in gene expression. Even the fancier model wouldn't capture that. As in my parent comment, I think these kinds o... (read more)

It takes 5 layers and 1000 artificial neurons to simulate a single biological neuron [Link]

It's possible much of that supposed additional complexity isn't useful

Yup! That's where I'd put my money.

It's a foregone conclusion that a real-world system has tons of complexity that is not related to the useful functions that the system performs. Consider, for example, the silicon transistors that comprise digital chips—"the useful function that they perform" is a little story involving words like "ON" and "OFF", but "the real-world transistor" needs three equations involving 22 parameters, to a first approximation!
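To make the transistor point concrete: even the textbook square-law (Shichman-Hodges) model, itself a huge simplification of the 22-parameter models used in practice, is already a three-regime piecewise function rather than a simple ON/OFF story. (Parameter values below are arbitrary illustrative choices.)

```python
def mosfet_drain_current(vgs, vds, vth=0.7, k=0.5):
    """Textbook square-law (Shichman-Hodges) NMOS drain current.
    vgs, vds: gate-source and drain-source voltages (volts);
    vth: threshold voltage; k: transconductance parameter.
    Values of vth and k here are arbitrary, for illustration only."""
    if vgs <= vth:
        return 0.0                               # cutoff ("OFF")
    vov = vgs - vth                              # overdrive voltage
    if vds < vov:
        return k * (vov * vds - vds ** 2 / 2.0)  # triode region
    return 0.5 * k * vov ** 2                    # saturation ("ON")
```

And this is still only the first-order story: it ignores channel-length modulation, subthreshold conduction, and everything else those extra parameters exist to capture.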

By the same token, my favorite paper on... (read more)

How to get more academics enthusiastic about doing AI Safety research?

See here, the first post is a video of a research meeting where he talks dismissively about Stuart Russell's argument, and then the ensuing forum discussion features a lot of posts by me trying to sell everyone on AI risk :-P

(Other context here.)

3MaxRa5moPerfect, so he appreciated it despite finding the accompanying letter pretty generic, and thought he received it because someone (the letter listed Max Tegmark, Yoshua Bengio and Tim O’Reilly, though w/o signatures) believed he’d find it interesting and that the book is important for the field. Pretty much what one could hope for. And thanks for the work trying to get them to take this more seriously, would be really great if you could find more neuroscience people to contribute to AI safety.
How to get more academics enthusiastic about doing AI Safety research?
  • There was a 2020 documentary We Need To Talk About AI. All-star lineup of interviewees! Stuart Russell, Roman Yampolskiy, Max Tegmark, Sam Harris, Jurgen Schmidhuber, …. I've seen it, but it appears to be pretty obscure, AFAICT.
  • I happened to watch the 2020 Melissa McCarthy film Superintelligence yesterday. It's umm, not what you're looking for. The superintelligent AI's story arc was a mix of 20% arguably-plausible things that experts say about superintelligent AGI, and 80% deliberately absurd things for comedy. I doubt it made anyone in the audience think
... (read more)
How to get more academics enthusiastic about doing AI Safety research?

I saw Jeff Hawkins mention (in some online video) that someone had sent Human Compatible to him unsolicited but he didn't say who. And then (separately) a bit later the mystery was resolved: I saw some EA-affiliated person or institution mention that they had sent Human Compatible to a bunch of AI researchers. But I can't remember where I saw that, or who it was.   :-(

2MaxRa5moInteresting anyway, thanks! Did you by any chance notice if he reacted positively or negatively to being sent the book? I was a bit worried it might be considered spammy. On the other hand, I remember reading that Andrew Gelman regularly gets sent copies of books he might be interested in, for him to write a blurb or review, so maybe it's just a thing that happens to scientists and one needn't be worried.
What are the top priorities in a slow-takeoff, multipolar world?

No I don't think we've met! In 2016 I was a professional physicist living in Boston. I'm not sure if I would have even known what "EA" stood for in 2016. :-)

It also seems like the technical problem does get easier in expectation if you have more than one shot. By contrast, I claim, many of the Moloch-style problems get harder.

I agree. But maybe I would have said "less hard" rather than "easier" to better convey a certain mood :-P

It does seem like within technical AI safety research the best work seems to shift away from Agent Foundations type of work and t

... (read more)
What are the top priorities in a slow-takeoff, multipolar world?

I think that "AI alignment research right now" is a top priority in unipolar fast-takeoff worlds, and it's also a top priority in multipolar slow-takeoff worlds. (It's certainly not the only thing to do—e.g. there's multipolar-specific work to do, like the links in Jonas's answer on this page, or here etc.)

(COI note: I myself am doing "AI alignment research right now" :-P )

First of all, in the big picture, right now humanity is simultaneously pursuing many quite different research programs towards AGI (I listed a dozen or so here (see Appendix)). If m... (read more)

3JP Addison5moThanks for your answer. (Just to check, I think you are a different Steve Byrnes than the one I met at Stanford EA in 2016 or so?) I do want to emphasize is that I don't doubt that technical AI safety work is one of the top priorities. It does seem like within technical AI safety research the best work seems to shift away from Agent Foundations type of work and towards neural-nets-specific work. It also seems like the technical problem does get easier in expectation if you have more than one shot. By contrast, I claim, many of the Moloch-style problems get harder.
What EA projects could grow to become megaprojects, eventually spending $100m per year?

(not an expert) My impression is that a perfectly secure OS doesn't buy you much if you use insecure applications on an insecure network etc.

Also, if you think about classified work, the productivity tradeoff is massive: you can't use your personal computer while working on the project, you can't use any of your favorite software while working on the project, you can't use an internet-connected computer while working on the project, you can't have your cell phone in your pocket while talking about the project, you can't talk to people about the project ove... (read more)

5Davidmanheim6moAgreed that secure low level without application security doesn't get you there, which is why I said we need a full stack - and even if it wasn't part of this, redeveloping network infrastructure to be done well and securely seems like a very useful investment. But doing all the normal stuff well on top of systems that still have insecure chips, BIOS, and kernel just means that the exploits move to lower levels - even if there are fewer, the difference between 90% secure and 100% secure is far more important than moving from 50% to 90%. So we need the full stack.
Phil Torres' article: "The Dangerous Ideas of 'Longtermism' and 'Existential Risk'"

Hmm, I guess I wasn't being very careful. Insofar as "helping future humans" is a different thing than "helping living humans", it means that we could be in a situation where the interventions that are optimal for the former are very-sub-optimal (or even negative-value) for the latter. But it doesn't mean we must be in that situation, and in fact I think we're not.

I guess if you think: (1) finding good longtermist interventions is generally hard because predicting the far-future is hard, but (2) "preventing extinction (or AI s-risks) in the next 50 years"…

clq (6mo): To me, the question is "what are the logical conclusions that longtermism leads to?" The idea that as of today we have not exhausted every intervention available is less relevant when considering hundreds of thousands or millions of years. I agree. The debate would be whether to follow the moral reasoning of longtermism or not. Something that might be "awful for people alive today" is completely in line with longtermism - it could be the situation. To not support the intervention would constitute a break between theory and practice. I think it is important to address the implications of this funny situation sooner rather than later.
Phil Torres' article: "The Dangerous Ideas of 'Longtermism' and 'Existential Risk'"

I feel like that guy's got a LOT of chutzpah to not-quite-say-outright-but-very-strongly-suggest that the Effective Altruism movement is a group of people who don't care about the Global South. :-P

More seriously, I think we're in a funny situation where maybe there are these tradeoffs in the abstract, but they don't seem to come up in practice.

Like in the abstract, the very best longtermist intervention could be terrible for people today. But in practice, I would argue that most if not all current longtermist cause areas (pandemic prevention, AI risk, prev…

clq (6mo): I think you raise a key point about theory of change and observed practice. This "funny situation" means that something is up with the theoretical model. If the tradeoffs exist in the theoretical model but don't seem to in practice, then either:

* Practice is not actually based on the explicit theory but is instead based on something else, or
* the tradeoffs do in fact exist in practice but are not noticed or acknowledged.

Both of these would be foundational problems for a movement organized around rationality and evidence-based practice.

Speaking of chutzpah, I've never seen anything quite like this:

“We can’t have people posting anything that suggests that Giving What We Can [an organization founded by Ord] is bad,” as Jenkins recalls. These are just a few of several dozen stories that people have shared with me after I went public with some of my own unnerving experiences.

He needs to briefly explain what the acronym 'GWWC' is - because otherwise the sentence will be incomprehensible - but because he wants to paint people as evil genocidal racists who don't care about the poor, he can't explain what type of organization GWWC is, or what the pledge is.

Shallow evaluations of longtermist organizations

Just one guy, but I have no idea how I would have gotten into AGI safety if not for LW ... I had a full-time job and young kids and not-obviously-related credentials. But I could just come out of nowhere in 2019 and start writing LW blog posts and comments, and I got lots of great feedback, and everyone was really nice. I'm full-time now, here's my writings, I guess you can decide whether they're any good :-P

Consciousness research as a cause? [asking for advice]

I agree that there are both interventions that change qualia reports without much changing (morally important) qualia and interventions that change qualia without much changing qualia reports, and that we should keep both these possibilities in mind when evaluating interventions.

Consciousness research as a cause? [asking for advice]


I think you're emphasizing how qualia reports are not always exactly corresponding to qualia and can't always be taken at face value, and I'm emphasizing that it's incoherent to say that qualia exist but there's absolutely no causal connection whatsoever going from an experienced qualia to a sincere qualia report. Both of those can be true!

The first is like saying "if someone says "I see a rock", we shouldn't immediately conclude that there was a rock in this person's field-of-view. It's a hypothesis we should consider, but not proven." That's total…

Linch (8mo): Hi, sorry for the very delayed reply. One thing I didn't mention in the chain of comments above is that I think it's more plausible that there are interventions that change qualia reports without much changing (morally important) qualia than the reverse: changing important qualia without changing qualia reports. And I gave examples of changing qualia reports without (much) changing qualia, whereas the linked report talks more about changing qualia without substantively changing qualia reports. I can conceive of examples where interventions change qualia but not qualia reports (e.g. painkillers for extreme pain that humans naturally forget/round down), but they seem more like edge cases than the examples I gave.
Consciousness research as a cause? [asking for advice]

Oh, I think I see.

If someone declares that it feels like time is passing slower for them (now that they're enlightened or whatever), I would accept that as a sincere description of some aspect of their experience. And insofar as qualia exist, I would say that their qualia have changed somehow. But it wouldn't even occur to me to conclude that this person's time is now more valuable per second in a utilitarian calculus, in proportion to how much they say their time slowed down, or that the change in their qualia is exactly literally time-stretching.

I treat…

Linch (9mo): Right, I guess the higher-level thing I'm getting at is that while introspective access is arguably the best tool that we have to access subjective experience in ourselves right now, and stated experience is arguably the best tool for us to see it in others (well, at least humans), we shouldn't confuse stated experiences as identical to subjective experience. To go with the perception/UFO example: if someone (who believes themself to be truthful) reports seeing a UFO, and it later turns out that they "saw" a UFO because their friend pulled a prank on them, or because this was an optical illusion, then I feel relatively comfortable in saying that they actually had the subjective experience of seeing a UFO. So while external reality did not actually contain a UFO, this was an accurate qualia report. In contrast, if their memory later undergoes falsification, and they misremembered seeing a bird (which at the time they believed was a bird) as seeing a UFO, then they only had the subjective experience of remembering seeing a UFO, not the actual subjective experience of seeing a UFO. Some other examples:

1. If I were to undergo surgery, I would pay more money for a painkiller that numbs my present experience of pain than I would pay for a painkiller that removes my memory of pain (and associated trauma etc.), though I would pay nonzero dollars for the latter. This is because my memory of pain is an experience of an experience, not identical with the original experience itself.
2. Many children with congenital anosmia (being born without a sense of smell) act as if they have a sense of smell until tested. While I think it's reasonable to say that they have some smell-adjacent qualia/subjective experiences, I'd be surprised if they hallucinated qualia identical to the experiences of people with a sense of smell, and I would be i…
Consciousness research as a cause? [asking for advice]

Interesting... I guess I would have assumed that, if someone says their subjective experience of time has changed, then their time-related qualia has changed, kinda by definition. If meanwhile their reaction time hasn't changed, well, that's interesting but I'm not sure I care... (I'm not really sure of the definitions here.)

Linch (9mo): Let me put it a different way. Suppose we simulate Bob's experiences on a computer. From a utilitarian lens, if you can run Bob on a computational substrate that goes 100x faster, there's a strong theoretical case that FastBob is 100x as valuable per minute run (or 100x as disvaluable if Bob's suffering). But if you trick simulated Bob into thinking that he's 100x faster (or if you otherwise distort the output channel so the channel lies to you about the speed), then it's much harder to argue that FakeFastBob is indeed 100x faster/more valuable.
Consciousness research as a cause? [asking for advice]

OK, if I understand correctly, the report suggests that qualia may diverge from qualia reports—like, some intervention could change the former without the latter. This just seems really weird to me. Like, how could we possibly know that?

Let's say I put on a helmet with a button, and when you press the button, my qualia radically change, but my qualia reports stay the same. Alice points to me and says "his qualia were synchronized with his qualia reports, but pressing the button messed that up". Then Bob points to me and says "his qualia were out-of-s…

Linch (9mo): (I have not read the report in question.) There are some examples of situations/interventions where I'm reasonably confident that the intervention changes qualia reports more than it changes qualia. The first that jumps to mind is meditation: in the relatively small number of studies I've seen, meditation dramatically changes how people think they perceive time (time feels slower, a minute feels longer, etc.), but without noticeable effects on things like reaction speed, cognitive processing of various tasks, etc. This to me is moderate evidence that the subjective experience of the subjective experience of time has changed, but not (or at least not as much) the actual subjective experience of time. Anecdotally, I hear similar reports for recreational drug use (time feels slower but reaction speed doesn't go up... if anything it goes down). This is relevant to altruists because (under many consequentialist ethical theories) extending the subjective experience of time for pleasurable experiences seems like a clear win, but the case for extending the subjective experience of the subjective experience of time is much weaker.
Getting a feel for changes of karma and controversy in the EA Forum over time

For what it's worth, I generally downvote a post only when I think "This post should not have been written in the first place", and relatedly I will often upvote posts I disagree with.

If that's typical, then the "controversial" posts you found may be "the most meta-level controversial" rather than "the most object-level controversial", if you know what I mean.

That's still interesting though.

The EA Forum could fairly trivially collect some data on this by randomly prompting a small subset of up/down votes across the user population for feedback on the reason for the vote. Obviously the sampling rate would need to be low enough not to cause too much friction for users.
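The sampling mechanism described above could be sketched as follows. This is a minimal illustration only: the 2% rate, the function names, and the reason categories are all hypothetical, not anything the Forum actually implements.

```python
import random

# Hypothetical sketch: each up/down vote independently has a small chance of
# triggering a short "why did you vote this way?" prompt, so reasons are
# collected across the user population without adding friction to most votes.
SAMPLE_RATE = 0.02  # assumed: prompt roughly 2% of votes

def should_prompt(rng=random):
    """Decide whether this particular vote triggers a feedback prompt."""
    return rng.random() < SAMPLE_RATE

def tally_reasons(reasons):
    """Aggregate the reasons collected from prompted voters into counts."""
    counts = {}
    for reason in reasons:
        counts[reason] = counts.get(reason, 0) + 1
    return counts
```

With enough prompted votes, the tallies would let you distinguish "meta-level controversial" downvotes ("this shouldn't have been posted") from "object-level controversial" ones ("I disagree with the claim").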

What do you make of the doomsday argument?

I'm not up on the literature and haven't thought too hard about it, but I'm currently very much inclined to not accept the premise that I should expect myself to be a randomly-chosen person or person-moment in any meaningful sense—as if I started out as a soul hanging out in heaven, then flew down to Earth and landed in a random body, like in that Pixar movie.

I think that "I" am the thought processes going on in a particular brain in a particular body at a particular time—the reference class is not "observers" or "observer-moments" or anything like that, I…

HaydnBelfield (10mo): Indeed. Seems supported by a quantum suicide argument - no matter how unlikely the observer, there always has to be a feeling of what-it's-like-to-be that observer.
Consciousness research as a cause? [asking for advice]

The "meta-problem of consciousness" is "What is the exact chain of events in the brain that leads people to self-report that they're conscious?". The idea is (1) This is not a philosophy question, it's a mundane neuroscience / CogSci question, yet (2) Answering this question would certainly be a big step towards understanding consciousness itself, and moreover (3) This kind of algorithm-level analysis seems to me to be essential for drawing conclusions about the consciousness of different algorithms, like those of animal brains and AIs.

(For example, a comp…

algekalipso (9mo): In Principia Qualia (pp. 65-66), Mike Johnson posits:

What is happening when we talk about our qualia? If 'downward causation' isn't real, then how are our qualia causing us to act? I suggest that we should look for solutions which describe why we have the sensory illusion of qualia having causal power, without actually adding another causal entity to the universe. I believe this is much more feasible than it seems if we carefully examine the exact sense in which language is 'about' qualia. Instead of a direct representational interpretation, I offer we should instead think of language's 'aboutness' as a function of systematic correlations between two things related to qualia: the brain's logical state (i.e., connectome-level neural activity), particularly those logical states relevant to its self-model, and the brain's microphysical state (i.e., what the quarks which constitute the brain are doing). In short, our brain has evolved to be able to fairly accurately report its internal computational states (since it was adaptive to be able to coordinate such states with others), and these computational states are highly correlated with the microphysical states of the substrate the brain's computations run on (the actual source of qualia). However, these computational states and microphysical states are not identical. Thus, we would need to be open to the possibility that certain interventions could cause a change in a system's physical substrate (which generates its qualia) without causing a change in its computational level (which generates its qualia reports). We've evolved toward having our qualia, and our reports about our qualia, being synchronized - but in contexts where there hasn't been an adaptive pressure to accurately report our qualia, we shouldn't expect these to be synchronized 'for free'. The details of precisely how our reports of qualia, and our ground-truth qualia, might diverge will greatly depend on w…
algekalipso (10mo): Wrt QRI's take on the causal importance of consciousness - yes, it is one of the core problems being addressed. Perhaps see: Breaking Down the Problem of Consciousness, and Raising the Table Stakes for Successful Theories of Consciousness. Wrt the meta-problem, see: Qualia Formalism in the Water Supply: Reflections on The Science of Consciousness 2018.
Luise (10mo): I did not know about the meta-problem of consciousness before. I will have to think about this, thank you!
Long-Term Future Fund: Ask Us Anything!

Theiss was very much active as of December 2020. They've just been recruiting so successfully through word-of-mouth that they haven't gotten around to updating the website.

I don't think healthcare and taxes undermine what I said, at least not for me personally. For healthcare, individuals can buy health insurance too. For taxes, self-employed people need to pay self-employment tax, but employees and employers both have to pay payroll tax which adds up to a similar amount, and then you lose the QBI deduction (this is all USA-specific), so I think you come o…

Long-Term Future Fund: Ask Us Anything!

My understanding is that (1) to deal with the paperwork etc. for grants from governments or government-like bureaucratic institutions, you need to be part of an institution that's done it before; (2) if the grantor is a nonprofit, they have regulations about how they can use their money while maintaining nonprofit status, and it's very easy for them to forward the money to a different nonprofit institution, but may be difficult or impossible for them to forward the money to an individual. If it is possible to just get a check as an individual, I imagine th…

gavintaylor (1y): One other benefit of a virtual research institute is that it can act as a formal employer for independent researchers, which may be desirable for things like receiving healthcare coverage or welfare benefits. Thanks for mentioning Theiss, I didn't know of them before. Their website doesn't look so active now, but it's good to know about the history of the independent research scene.
Jonas Vollmer (1y): +1.
What does it mean to become an expert in AI Hardware?

I'm a physicist at a US defense contractor, I've worked on various photonic chip projects and neuromorphic chip projects and quantum projects and projects involving custom ASICs among many other things, and I blog about safe & beneficial AGI as a hobby ... I'm happy to chat if you think that might help, you can DM me :-)

What does it mean to become an expert in AI Hardware?

Just a little thing, but my impression is that CPUs and GPUs and FPGAs and analog chips and neuromorphic chips and photonic chips all overlap with each other quite a bit in the technologies involved (e.g. cleanroom photolithography), as compared to quantum computing, which is way off in its own universe of design and build and test and simulation tools (well, several universes, depending on the approach). I could be wrong, and you would probably know better than me. (I'm a bit hazy on everything that goes into a "real" large-scale quantum computer, as oppos…

Why those who care about catastrophic and existential risk should care about autonomous weapons

Thanks for writing this up!!

Although I have not seen the argument made in any detail or in writing, I and the Future of Life Institute (FLI) have gathered the strong impression that parts of the effective altruism ecosystem are skeptical of the importance of the issue of autonomous weapons systems.

I'm aware of two skeptical posts on EA Forum (by the same person). I just made a tag Autonomous Weapons where you'll find them.

aaguirre (1y): Thanks for pointing these out. Very frustratingly, I just wrote out a lengthy response (to the first of the linked posts) that this platform lost when I tried to post it. I won't try to reconstruct that, but will just note for now that the conclusions and emphases are quite different, probably most in terms of:

* Our greater emphasis on the WMD angle and qualitatively different dynamics in future AWs
* Our greater emphasis on potential escalation into great-power wars
* While agreeing that international agreement (rather than unilateral eschewing) is the goal, we believe that stigmatization is a necessary precursor to such an agreement.
[Link] "Will He Go?" book review (Scott Aaronson)

I thought "taking tail risks seriously" was kinda an EA thing...? In particular, we all agree that there probably won't be a coup or civil war in the USA in early 2021, but is it 1% likely? 0.001% likely? I won't try to guess, but it sure feels higher after I read that link (including the Vox interview) ... and plausibly high enough to warrant serious thought and contingency planning.

At least, that's what I got out of it. I gave it a bit of thought and decided that I'm not in a position that I can or should do anything about it, but I imagine that some readers might have an angle of attack, especially given that it's still 6 months out.

Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics

A nice short argument that a sufficiently intelligent AGI would have the power to usurp humanity is Scott Alexander's Superintelligence FAQ Section 3.1.

Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics

Again, this remark seems explicitly to assume that the AI is maximising some kind of reward function. Humans often act not as maximisers but as satisficers, choosing an outcome that is good enough rather than searching for the best possible outcome. Often humans also act on the basis of habit or following simple rules of thumb, and are often risk averse. As such, I believe that to assume that an AI agent would be necessarily maximising its reward is to make fairly strong assumptions about the nature of the AI in question. Absent these assumptions, it is n…
(How) Could an AI become an independent economic agent?

In the longer term, as AI becomes (1) increasingly intelligent, (2) increasingly charismatic (or able to fake charisma), (3) in widespread use, people will probably start objecting to laws that treat AIs as subservient to humans, and repeal them, presumably citing the analogy of slavery.

If the AIs have adorable, expressive virtual faces, maybe I would replace the word "probably" with "almost definitely" :-P

The "emancipation" of AIs seems like a very hard thing to avoid, in multipolar scenarios. There's a strong market force for making charismatic AIs—they…
COVID-19 brief for friends and family

Update: this blog post is a much better-informed discussion of warm weather.

COVID-19 brief for friends and family

This blog post suggests (based on Google Search Trends) that other coronavirus infections have typically gone down steadily over the course of March and April. (Presumably the data is dominated by the northern hemisphere.)

steve2152 (2y): Update: this blog post is a much better-informed discussion of warm weather.
What are the best arguments that AGI is on the horizon?

(I agree with other commenters that the most defensible position is that "we don't know when AGI is coming", and I have argued that AGI safety work is urgent even if we somehow knew that AGI is not coming soon, because of early decision points on R&D paths; see my take here. But I'll answer the question anyway.) (Also, I seem to be almost the only one coming from the following direction, so take that as a giant red flag...)

I've been looking into the possibility that people will understand the brain's algorithms well enough to make an AGI by copying them (at…