I think that part of this perception is created by him actively framing his actions in a way compatible with a 'press secretary model of the human mind' (cf. "Why everyone (else) is a hypocrite").
My impression is that he does consciously notice a mistake, and is shaken to some degree. In distancing himself from the aspects of his thinking which led to this mistake, he treats his motivations as more clear-cut than they truly were and pushes against them.
If that story were the full truth, he would not have given these answers, which are basically the opposite of "locally and situationally optimal in terms of presenting himself".
I think that we would make a mistake in thinking "well, clearly SBF was a bad person all along, so I and other EAs will not end up making structurally similar mistakes anyway" (I am not trying to imply that this is what you said or think; I only add this for completeness). Regarding lessons on a community level, I think that much of the discussion on the Forum in recent days makes a lot of sense.
Sorry, I don't understand the trickiness in '[...] in a way that makes "importance" tricky'.
In my mind, I would basically think of the expected improvement from thinking through my decisions in more detail as the scale of importance. Here, realizing that I could avoid potential harms by not accelerating AI capabilities could be a large win.
This seems to treat harms and benefits symmetrically. Where does the asymmetry enter? (Maybe I am thinking of a different importance scale?)
Stories of this nature are sobering to hear; thank you for posting this - each post like this gets people in the community mentally closer to seeing the base rate of success in the EA community for what it is.
Your writing is enjoyable to read as well - I would read more of it.
I agree. And now I wonder whether someone has already written more about this. And if not, maybe this could be a great project?
I found the 'personal EA stories' in Doing Good Better (Greg Lewis) and Strangers Drowning (well, many of these are not quite about EA, but there are many similarities) very helpful for clarifying what my expectations should or could be.
A book where, say, each chapter follows the EA path of one person, with their personal successes, struggles, uncertainties and failures, could span the different experiences that people can have with EA. Just as many people found semicycle's story valuable, I could imagine that such a book would be very helpful for actually internalizing that EA is very much a community project, where doing the right thing often means that individuals will fail at many of their efforts.
If this book already exists, I would be very happy to know about it :)
I am not sure how much useful training text there would be, but the transcripts of the 80k podcast could be useful as a source of 'spoken language' EA thinking.
Regarding "??% rationality" (Scout Mindset is a great choice), my impression is that these texts did significantly influence some fraction of EAs, but not nearly all.
For HPMOR, I think that there are a few arguments against including it. For one, I could imagine that the fictional setting might lead the model to give unexpected answers that refer to a fictional world if the input accidentally resembles discussions in HPMOR too closely (I am not familiar enough with Transformers to say whether this would actually be a problem, but it could be very confusing if the model started mentioning Transfiguration as a cure for Alzheimer's).
Also, some characters in it are explicitly malevolent or highly cynical about humanity; I do not think that it would push EA GPT in a good direction to be trained on these.
For a nice selection of rationality texts, the LessWrong books might be a good choice, as they contain texts from many different writers that were chosen by the LW community as exemplary.
Regarding "Pascal's Mugging":
I am not the author, so I might well be mistaken. But I think I can relate to the intended meaning more closely than "vaguely shady".
One paragraph reads:
EA may not in fact be a form of Pascal’s Mugging or fanaticism, but if you take certain presentations of longtermism and X-risk seriously, the demands are sufficiently large that it certainly pattern-matches pretty well to these.
which I read as: "Pascal's mugging" describes a rhetorical move that introduces huge moral stakes into the world-view in order to push people into drastically altering their actions and priorities. I think that this in itself need not be problematic (there can be huge stakes which warrant change in behaviour), but if there is social pressure involved in forcing people to accept the premise of huge moral stakes, things become problematic.
One example is the "child drowning in a pond" thought experiment. It does introduce large moral stakes (the resources you use for conveniences in everyday life could in fact be used to help people in urgent need, and in the thought experiment itself you would decide that the latter is more important) and can be used to imply significant behavioural changes (putting a large fraction of one's resources toward helping worse-off people).
If this argument is presented with strong social pressure not to voice objections, that would be a situation which fits under Pascal's mugging in my understanding.
If people are used to this type of rhetorical move, they will become wary as soon as anything along the lines of "there are huge moral stakes which you are currently ignoring and you should completely change your life-goals" is mentioned to them. Assuming this, I think the worry that
[...] the demands are sufficiently large that it certainly pattern-matches pretty well to these.
makes a lot of sense.
Fascinating! I would appreciate an essay arguing for this rather strong claim:
My conclusion is that if something is expressed only in writing it cannot reach the absolute majority of the population, any more than a particularly well-written verse in French can permeate the Anglosphere.
I have read weaker versions of how hard successful communication is, such as Double Illusion of Transparency and You Have About Five Words – but I think that your example is even stronger than this and an interesting addition.
Personally, I think I also belong to the group of 2nd-order-illiterate people, in that I need to concentrate hard in order to read with sufficient care. My default way of reading is nowhere near enough, and I need to read a text several times until I feel that it doesn't contain 'new thoughts', even if it is well-written. I do benefit a lot from podcasts and lectures, even if it is just by 'watching a person think about the topic' and the content is the same as in a textbook.
I have the slight suspicion that the author did not set a clickable link to reduce self-promotion.
I hope it is thus okay if I add it here in the comments: https://www.amazon.com/dp/B0BSXHJRBQ
For anyone interested: A Forum post with more background info about the novel is I’ve written a Fantasy Novel to Promote Effective Altruism