From Specification gaming examples in AI:
Glad it's relevant for you! For questions, I'd probably just stick them in the comments here, unless you think they won't be interesting to anyone but you, in which case DM me.
Thanks, this is really interesting.
One follow-up question: who are safety managers? How are they trained, what's their seniority in the org structure, and what sorts of resources do they have access to?
In the bio case it seems that in at least some jurisdictions and especially historically, the people put in charge of this stuff were relatively low-level administrators, and not really empowered to enforce difficult decisions or make big calls. From your post it sounds like safety managers in engineering have a pretty different role.
Thanks for the kind words!
Can you say more about how either of your two worries work for industrial chemical engineering?
Also curious if you know anything about the legislative basis for such regulation in the US. My impression from the bio standards in the US is that it's pretty hard to get laws passed, so if there are laws for chemical engineering it would be interesting to understand why those were plausible whereas bio ones weren't.
Hi Rose,
To your second question first: I don't know if there are specific laws related to e.g. ASTM standards. But there are laws related to criminal negligence in every country. So if, say, you build a tank and it explodes, and it turns out that you didn't follow the appropriate regulations, you will be held criminally liable - you will pay fines and potentially end up in jail. You may believe that the approach you took was equally safe and/or that it was unrelated to the accident, but you're unlikely to succeed with this defence in court - it's like argu...
Good question.
There's a little bit on how to think about the XPT results in relation to other forecasts here (not much). Extrapolating from there to Samotsvety in particular:
Don't apologise; I think it's a helpful point!
I agree that the training computation requirements distribution is more subjective and matters more to the eventual output.
I also want to note that while on your view of the compute reqs distribution, the hardware/spending/algorithmic progress inputs are a rounding error, this isn't true for other views of the compute reqs distribution. E.g. for anyone who does agree with Ajeya on the compute reqs distribution, the XPT hardware/spending/algorithmic progress inputs shift median timelines from ~2050 to ~2090, which...
See here for a mash up of XPT forecasts on catastrophic and extinction risk, with Shulman and Thornley's paper on how much governments should pay to prevent catastrophes.
The follow-up project was on AI specifically, so we don't currently have any data that would allow us to transfer directly to bio and nuclear, alas.
I wasn't around when the XPT questions were being set, but I'd guess that you're right that extinction/catastrophe were chosen because they are easier to operationalise.
On your question about what forecasts on existential risk would have been: I think this is a great question.
FRI actually ran a follow-up project after the XPT to dig into the AI results. One of the things we did in this follow-up project was elicit forecasts on a broader range of outcomes, including some approximations of existential risk. I don't think I can share the results yet, but we're aiming to publish them in August!
Thanks Sanjay!
I agree that both of your bullet points would be good. I also think that the second one is extremely non-trivial - more like something it would be good to have a research team working on than something I could write a section on in a blog post.
There's a sense in which there are already research team equivalents working on it, insofar as lots of forecasting efforts relate to p(crunch time soon). But from my vantage point it doesn't seem like this community has clarity/consensus around what the best indicators of crunch time soon are, or that there are careful analyses of why we should expect those to be good indicators, and that makes me expect that more work is needed.
Thanks; I hadn't checked the Wikipedia current events page much previously, but I really like it.
Do you have any thoughts on how specifically the Wikipedia stuff is biased? I'm imagining that there isn't a general tendency, and it's more that specific entries are biased in specific ways that it's hard to spot if you don't have background knowledge on the area.
Thanks so much for this! If this is pedantry, I am very pro pedantry :)
I think this makes my 'Humans launch 5 objects into space' section sufficiently dubious that I've removed it, but pasting here in the context of your comment:
It’s only in the last 8 years that the number of objects launched into space each day has exceeded 1.
There seems to be a large variance in how comfortable people are with numbers, but I think this is surmountable
Wanting to flag that my background is entirely qualitative, and I spent many years thinking this meant that I couldn't do things with numbers. I now think this is false: they aren't magic, and you don't need to have deep aptitude for maths/technical training/a background in stats to be able to fiddle around with basic numbers in a way that helps you think about things.
I've changed the wording to make it clearer that I mean deaths per human per minute. I don't want to change it to second; for me dying in the next minute is easier to imagine/take seriously than dying in the next second (though I imagine this varies between people).
Thanks for the link to Saulius' post; it's great and I recommend people check it out.
On the trillion wild birds: yeah you're right, it's too high - should be 100 billion instead. Thanks for the spot; have changed.
The number is on p. 89 in the supplementary materials - but importantly it's just an order of magnitude, rather than a specific estimate. So it's consistent with Tomasik's range.
Thanks for picking this up Wayne!
The mistake I made was number of people: it should have read 115 other people, not one. I did mean minute, and the number of animals is 1/116 to get a number of animals per human, rather than 1/60 to get a number of animals per second.
I've corrected the number now. (Thanks also to someone else who messaged me about the error.)
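For anyone wanting to check the direction of the correction, here's a minimal sketch of the two conversions (the per-minute death total is a hypothetical placeholder, not the post's actual figure; only the divisors come from the comment above):

```python
# Hypothetical sketch of the unit conversion discussed above.
# DEATHS_PER_MINUTE is a placeholder value, not the post's actual figure.
DEATHS_PER_MINUTE = 1.0   # animal deaths per minute, for a group of 116 people
GROUP_SIZE = 116          # you plus 115 other people
SECONDS_PER_MINUTE = 60

# Dividing by 116 gives deaths per human per minute (the intended figure);
# dividing by 60 would instead give deaths per group per second.
deaths_per_human_per_minute = DEATHS_PER_MINUTE / GROUP_SIZE
deaths_per_group_per_second = DEATHS_PER_MINUTE / SECONDS_PER_MINUTE
```

The point of the sketch is just that the two divisors answer different questions, which is why the original 1/60 figure was wrong for a per-human number.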
Thanks Elias, I think you're right.
Isaac, I've tried to make this clearer in the table in the post.
[Also by happy chance this process made me notice that I'd lost all of my footnotes in the process of transferring from google docs, which I've now fixed. Thanks both for indirectly causing me to notice this.]
Yeah, I considered moving more slowly in the way that you suggest. The reasons I'm not doing that feel a bit complicated/hard to articulate, but some of my motivations:
I think this is interesting.
On whether the moral campaign was about morality:
Some other related things I've pulled from my notes are arguments in Brown's review article against Shepherd's view:
Thanks for this point.
I'm actually a bit unsure how true it is that the status element of footbinding was important. Certainly that's an established narrative in the literature (e.g. Shepherd buys it).
Brown, Bossen and Hill have an article I've only skimmed called 'Marriage Mobility and Footbinding in Pre-1949 Rural China: A Reconsideration of Gender, Economics, and Meaning in Social Causation' (link here: https://www.cambridge.org/core/journals/journal-of-asian-studies/article/marriage-mobility-and-footbinding-in-pre1949-rural-china-a-reconsideration-of-g...
Minor point on how you communicate the novelty point: I'm slightly worried about people misreading and thinking 'oh, I have to be super original', and then either neglecting important unoriginal things like reassessing existing work, or twisting themselves into knots to prove how original they are.
I agree with you that all else equal a new insight is more valuable than one others have already had, but as originality is often over-egged in academia, it might be worth paying attention to how you phrase the novelty criterion in particular.
I think a list like this might be useful for other purposes too:
Personally I'm a bit wary of things like this. A few reactions:
I think this is a really good summary of what historians might do, thanks Oscar.
One contextual point is that I think 1 and 2 are something like 'central examples of useful things historians might do', rather than something like 'the main things current historians actually do'.
In particular, my outdated impression from when I studied history is that a lot of historical work is very zoomed in source work that may not involve much integration or summarisation. Some of this work is necessary groundwork for 1 and 2; some of it I think comes from specialisation pressures within the field and doesn't produce much value.
I especially like your points on 2 Ramiro, and the distinction between studying history/what historians do. I'm interested in both of these things, and also agree that 'studying history' is vague and ambiguous.
I'm still confused about what contentful things I'm trying to think about, and so I'm using a kind of empty label, 'history', to point at the cloud of stuff I think might be relevant. My hope was that people would interpret 'history' differently, and I'd get a range of answers that might help me think about what I do and don't mean - and that I might...
Thanks for this Michael, I think that's a good point. I've changed those labels to 'US radical right (see definition)' and 'US radical left (see definition)'. Not perfect, but less misleading.
Yeah, I think that's a good point.
I expect there are things other than ability to take risks that it's worth tracking too - like skill acquisition, demonstrable achievements...
This seems like a useful point, thanks!
It makes me want to give a clarification: the reflections above are just the most important things I happened to learn - not a list of generally most important points to consider when testing fit for research. I think I'd need more research experience to write a good version of the latter thing (though I think my list probably overlaps with it somewhat).
I also want to respond to "you should definitely try [...] before you write off research in general". I think I agree with this, conditional on it being a sensible ide...
Thanks for the responses James, I found them thoughtful and helpful!
A few responses in return:
On your point regarding the methodology you would use to answer these questions, I would definitely be interested to hear more about that as I'll be finalising my research methodology over January.
Quick thoughts:
Another one: Alex Hill and Jaime Sevilla, Attempt at understanding the role of moral philosophy in moral progress (on women’s suffrage and animal rights)
Some more recent things:
Also fwiw, I have read the ACE case studies, and I think that the one on environmentalism is pretty high quality, more so than some of the other things listed here. I'd recommend that people interested in working on this stuff read the environmentalism one.
Thanks for this post James! I found it thought provoking.
Overall, I'm still not sure what I make of your claims. There are a few things contributing to this, including:
Thanks for the informal post! I really liked it, and probably wouldn't have read the whole paper.
I have a few thoughts that I'd be curious for your take on.
I'm a bit unsure how generally the papers you're looking at apply to the broad question of changing values. A few intuitions:
Some examples of policy stuff RSPers have done:
Some examples of community building related stuff RSPers have done:
I think it can be a good fit for either of those groups. Currently most people are more in the academic work category, but we have a few RSPers who are working on more policy engagement style work, and having a fair bit of success.
It's also worth pointing out that plenty of RSPers don't fall neatly into either camp:
Impressiveness: good question, but feels hard to express without going into lots of detail, so I'm going to pass.
Acceptance rate: 9/~150 in the first round, then 10/~250. We're planning to take 8 in this round. The summer fellowship was 27/~300.
Some support options, briefly:
It could be good if someone wrote an overview of the growing number of fellowships and scholarships in EA (and maybe also other forms of professional EA work). It could include the kind of info given above, and maybe draw inspiration from Larks' overviews of the AI Alignment landscape. I don't think I have seen anything quite like that, but please correct me if I'm wrong.
Thanks for this! I agree that a lot of the value of RSP won't become obvious until after the programme (and also want to flag that as our first cohort is only finishing this autumn, it's still quite uncertain how large this value will be).
At this stage, the best information we have on how things will shape up for scholars after the programme is what our first cohort have lined up to do immediately after the programme - see here.
Thanks, really helpful!