In other discussions with Foresight on their Discord, it was noted that "-strophe" was already linguistically linked to bad outcomes, and that changing this association seemed implausible, so a different term was likely better.
To quote Dennis Krause, in Jan 2022: "I stumbled upon 'anastrophe' in the german wikipedia, which is more or less =eucatastrophe. But I also think that *strophe always reminds people of catastrophe, because it is the most common." (This echoed Joy, here.)
You're treating utility as the fundamental fact and actual outcomes as irrelevant, then concluding that risk preference is an artifact. But as you admitted, risk aversion over monetary outcomes exists, and it's the transformation to utility that removes it. Similarly, we'd expect risk aversion over non-monetary goods - having children and years of life are actual outcomes, and risk preference is secondary to those outcomes. So your example proves too much.
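To make that concrete, here's a minimal sketch (the log utility function and the specific payoffs are my own illustrative assumptions, not anything from the original exchange) showing how a concave utility over money produces risk aversion over monetary outcomes, even though the agent is trivially "risk-neutral" once you transform everything into utility:

```python
import math

def u(x):
    # Assumed concave (log) utility over money; any concave function behaves similarly.
    return math.log(x)

# A 50/50 gamble between 50 and 150 vs. a sure 100 (same expected money).
eu_gamble = 0.5 * u(50) + 0.5 * u(150)
eu_sure = u(100)

# Certainty equivalent: the sure amount with the same expected utility as the gamble.
ce = math.exp(eu_gamble)  # inverse of log utility

print(f"EU(gamble) = {eu_gamble:.3f}, EU(sure 100) = {eu_sure:.3f}")
print(f"Certainty equivalent of the gamble is about {ce:.1f} < 100, i.e. risk-averse in money")
```

The risk aversion shows up in money terms (the certainty equivalent is below the expected value); it only disappears after you move to utility, which is exactly the point.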
And yes, you can construct situations where "preferences" that are normally risk-averse become risk-loving, by putting arbitrary rules in place that change which concrete outcome is at stake. So I can similarly make almost anyone risk-loving in money by saying that they die if they have too little money and need to double their current money to survive - but that's an artifact of the scenario, and it says very little about risk preferences in less constrained situations.
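And a similarly minimal sketch of that construction (the survival threshold and the payoffs are again my own illustrative numbers): once the rules say you die unless you double your money, even the same log-utility agent from above prefers a fair double-or-nothing gamble:

```python
import math

DEATH_UTILITY = -1e9  # assumed: dying is worse than any monetary outcome
wealth = 100
threshold = 2 * wealth  # the arbitrary rule: you survive only with at least double your money

def u(x):
    # Same concave money utility as before, with the survival rule bolted on.
    return math.log(x) if x >= threshold else DEATH_UTILITY

# Option A: keep the sure 100 -> below threshold -> death.
# Option B: fair double-or-nothing gamble: 50% at 200, 50% at roughly 0.
eu_keep = u(wealth)
eu_gamble = 0.5 * u(2 * wealth) + 0.5 * u(1e-9)

print(f"EU(keep 100) = {eu_keep:.3g}, EU(gamble) = {eu_gamble:.3g}")
# The gamble wins, but only because of the constructed survival rule,
# not because of any underlying risk-loving preference over money.
```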
You should look at Drexler's work on CAIS, which has been discussed on the forum in the past: https://forum.effectivealtruism.org/topics/comprehensive-ai-services
One fundamental issue is that they aren't providing evidence for their claims about cost-effectiveness.
The response I got was quite specific: the volunteer claimed that UNICEF can save a life with just 1€ a day for an average period of 7 months.
If they have any reference for that, it could be evaluated. As-is, it sounds like that's the non-counterfactual treatment cost for successful cases, while also ignoring overhead and administrative costs.
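For what it's worth, the arithmetic the claim implies (taking "7 months" as roughly 210 days; the comparison figure is my rough recollection of GiveWell-style estimates, not anything from UNICEF) makes clear why a citation matters:

```python
# Implied cost per life saved under the volunteer's claim.
cost_per_day_eur = 1
days = 7 * 30  # roughly 7 months
implied_cost = cost_per_day_eur * days
print(f"Implied cost per life saved: ~{implied_cost} EUR")  # ~210 EUR

# For comparison (my rough recollection, not a sourced figure): GiveWell-style
# estimates for top charities run to a few thousand dollars per life saved,
# i.e. more than an order of magnitude higher, once counterfactual impact,
# unsuccessful cases, and overhead are accounted for.
```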
I don't disagree with this, but I think it's very likely to stop being true in practice as the tech is commercialized. It won't be perfect, but the current generation of tweaks already pushes it into the range of at least 3-4 nines of reliability for non-adversarial settings, which seems like it will be enough for many applications, and to enable better work on making it even more reliable. More than that, the success or failure of business applications will show whether this is true within the coming year, well before we hit GPT-5+.
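As a rough illustration of what that reliability range means in practice (the query volume is an arbitrary number I picked for the example, not a claim about any actual deployment):

```python
# 3 nines = 99.9% reliable, 4 nines = 99.99% reliable (non-adversarial settings).
queries = 100_000  # arbitrary illustrative volume
for nines in (3, 4):
    failure_rate = 10 ** (-nines)
    expected_failures = failure_rate * queries
    print(f"{nines} nines -> roughly {expected_failures:.0f} failures per {queries} queries")
```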