All posts


May 2024

Quick takes

Trump recently said in an interview (https://time.com/6972973/biden-trump-bird-flu-covid/) that he would seek to disband the White House office for pandemic preparedness. Given that he usually doesn't give specifics on his policy positions, this seems like something he is particularly interested in. I know politics is discouraged on the EA forum, but I thought I would post this to say: EA should really be preparing for a Trump presidency. He's up in the polls and IMO has a >50% chance of winning the election. Right now politicians seem relatively receptive to EA ideas; this may change under a Trump administration.
Is EA as a bait and switch a compelling argument for it being bad? I don't really think so.
1. There is a wide variety of baits and switches, from what I'd call misleading to some pretty normal activities. Is it a bait and switch when churches don't discuss their most controversial beliefs at a "bring your friends" service? What about wearing nice clothes to a first date?[1]
2. EA is a big movement composed of different groups.[2] Many describe it differently.
3. EA has done so much global health stuff that I am not sure it can be described as a bait and switch, e.g. https://docs.google.com/spreadsheets/d/1ip7nXs7l-8sahT6ehvk2pBrlQ6Umy5IMPYStO3taaoc/edit#gid=9418963
4. EA is way more transparent than any comparable movement. If it is a bait and switch, it still does far more than comparable movements to make clear where the money goes, e.g. https://openbook.fyi/
On the other hand:
1. I do sometimes see people describing EA too favourably or pushing an inaccurate line. I think that transparency comes with the feature of allowing anyone to come and say "what's going on there?", which can be very beneficial at avoiding error, but it also means bad criticism can be too cheap.
Overall I don't find this line that compelling, and the parts that are seem largely confined to the past, when EA was smaller (and when it perhaps mattered less). Now that EA is big, it's pretty clear that it cares about many different things. Seems fine.
1. ^ @Richard Y Chappell created the analogy.
2. ^ @Sean_o_h argues that here.
Quick poll [✅ / ❌]: Do you feel like you don't have a good grasp of Shapley values, despite wanting to?  (Context for after voting: I'm trying to figure out if more explainers of this would be helpful. I still feel confused about some of its implications, despite having spent significant time trying to understand it)
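For anyone who voted ✅ and wants a concrete starting point, here is a minimal sketch of the standard permutation-averaging definition of Shapley values. The three-player value function and its numbers are hypothetical, chosen purely for illustration rather than taken from any real example.

```python
from itertools import permutations

# Hypothetical coalitional value function for three players (A, B, C).
# v(S) is the value produced when exactly the players in S cooperate;
# the numbers below are made up purely for illustration.
VALUES = {
    frozenset(): 0,
    frozenset({"A"}): 10,
    frozenset({"B"}): 10,
    frozenset({"C"}): 0,
    frozenset({"A", "B"}): 30,
    frozenset({"A", "C"}): 20,
    frozenset({"B", "C"}): 20,
    frozenset({"A", "B", "C"}): 60,
}

def v(coalition):
    return VALUES[frozenset(coalition)]

def shapley_values(players, v):
    """Average each player's marginal contribution over every join order."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            totals[p] += v(coalition | {p}) - v(coalition)
            coalition.add(p)
    return {p: total / len(orders) for p, total in totals.items()}

print(shapley_values(["A", "B", "C"], v))
# ≈ {'A': 23.33, 'B': 23.33, 'C': 13.33}; the values sum to v({A, B, C}) = 60
# (the "efficiency" property), and A and B get equal credit by symmetry.
```

The key idea is just that each player's credit is their marginal contribution averaged over every order in which the group could have assembled, which is where many of the counterintuitive implications come from.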
I intend to strong downvote any article about EA posted here that the poster themselves has no positive takes on. If I post an article, I have some reason I liked it, even a single line. Being critical isn't enough on its own. If someone posts an article without a single quote they like, with the implication that it's a bad article, I am minded to strong downvote so that no one else has to waste their time on it.
Do you believe that altruism actually makes people happy? Peter Singer's book argues that people become happier by behaving altruistically, and psychoanalysis also classifies altruism as a mature defense mechanism. However, there are also concerns about pathological altruism and people pleasers. In-depth research data on this is desperately needed.

April 2024

Quick takes

In this "quick take", I want to summarize some my idiosyncratic views on AI risk.  My goal here is to list just a few ideas that cause me to approach the subject differently from how I perceive most other EAs view the topic. These ideas largely push me in the direction of making me more optimistic about AI, and less likely to support heavy regulations on AI. (Note that I won't spend a lot of time justifying each of these views here. I'm mostly stating these points without lengthy justifications, in case anyone is curious. These ideas can perhaps inform why I spend significant amounts of my time pushing back against AI risk arguments. Not all of these ideas are rare, and some of them may indeed be popular among EAs.) 1. Skepticism of the treacherous turn: The treacherous turn is the idea that (1) at some point there will be a very smart unaligned AI, (2) when weak, this AI will pretend to be nice, but (3) when sufficiently strong, this AI will turn on humanity by taking over the world by surprise, and then (4) optimize the universe without constraint, which would be very bad for humans. By comparison, I find it more likely that no individual AI will ever be strong enough to take over the world, in the sense of overthrowing the world's existing institutions and governments by surprise. Instead, I broadly expect unaligned AIs will integrate into society and try to accomplish their goals by advocating for their legal rights, rather than trying to overthrow our institutions by force. Upon attaining legal personhood, unaligned AIs can utilize their legal rights to achieve their objectives, for example by getting a job and trading their labor for property, within the already-existing institutions. Because the world is not zero sum, and there are economic benefits to scale and specialization, this argument implies that unaligned AIs may well have a net-positive effect on humans, as they could trade with us, producing value in exchange for our own property and services. Note that my claim here is not that AIs will never become smarter than humans. One way of seeing how these two claims are distinguished is to compare my scenario to the case of genetically engineered humans. By assumption, if we genetically engineered humans, they would presumably eventually surpass ordinary humans in intelligence (along with social persuasion ability, and ability to deceive etc.). However, by itself, the fact that genetically engineered humans will become smarter than non-engineered humans does not imply that genetically engineered humans would try to overthrow the government. Instead, as in the case of AIs, I expect genetically engineered humans would largely try to work within existing institutions, rather than violently overthrow them. 2. AI alignment will probably be somewhat easy: The most direct and strongest current empirical evidence we have about the difficulty of AI alignment, in my view, comes from existing frontier LLMs, such as GPT-4. Having spent dozens of hours testing GPT-4's abilities and moral reasoning, I think the system is already substantially more law-abiding, thoughtful and ethical than a large fraction of humans. Most importantly, this ethical reasoning extends (in my experience) to highly unusual thought experiments that almost certainly did not appear in its training data, demonstrating a fair degree of ethical generalization, beyond mere memorization. It is conceivable that GPT-4's apparently ethical nature is fake. 
Perhaps GPT-4 is lying about its motives to me and in fact desires something completely different than what it professes to care about. Maybe GPT-4 merely "understands" or "predicts" human morality without actually "caring" about human morality. But while these scenarios are logically possible, they seem less plausible to me than the simple alternative explanation that alignment—like many other properties of ML models—generalizes well, in the natural way that you might similarly expect from a human. Of course, the fact that GPT-4 is easily alignable does not immediately imply that smarter-than-human AIs will be easy to align. However, I think this current evidence is still significant, and aligns well with prior theoretical arguments that alignment would be easy. In particular, I am persuaded by the argument that, because evaluation is usually easier than generation, it should be feasible to accurately evaluate whether a slightly-smarter-than-human AI is taking bad actions, allowing us to shape its rewards during training accordingly. After we've aligned a model that's merely slightly smarter than humans, we can use it to help us align even smarter AIs, and so on, plausibly implying that alignment will scale to indefinitely higher levels of intelligence, without necessarily breaking down at any physically realistic point. 3. The default social response to AI will likely be strong: One reason to support heavy regulations on AI right now is if you think the natural "default" social response to AI will lean too heavily on the side of laissez faire than optimal, i.e., by default, we will have too little regulation rather than too much. In this case, you could believe that, by advocating for regulations now, you're making it more likely that we regulate AI a bit more than we otherwise would have, pushing us closer to the optimal level of regulation. I'm quite skeptical of this argument because I think that the default response to AI (in the absence of intervention from the EA community) will already be quite strong. My view here is informed by the base rate of technologies being overregulated, which I think is quite high. In fact, it is difficult for me to name even a single technology that I think is currently clearly underregulated by society. By pushing for more regulation on AI, I think it's likely that we will overshoot and over-constrain AI relative to the optimal level. In other words, my personal bias is towards thinking that society will regulate technologies too heavily, rather than too loosely. And I don't see a strong reason to think that AI will be any different from this general historical pattern. This makes me hesitant to push for more regulation on AI, since on my view, the marginal impact of my advocacy would likely be to push us even further in the direction of "too much regulation", overshooting the optimal level by even more than what I'd expect in the absence of my advocacy. 4. I view unaligned AIs as having comparable moral value to humans: This idea was explored in one of my most recent posts. The basic idea is that, under various physicalist views of consciousness, you should expect AIs to be conscious, even if they do not share human preferences. Moreover, it seems likely that AIs — even ones that don't share human preferences — will be pretrained on human data, and therefore largely share our social and moral concepts. 
Since unaligned AIs will likely be both conscious and share human social and moral concepts, I don't see much reason to think of them as less "deserving" of life and liberty, from a cosmopolitan moral perspective. They will likely think similarly to the way we do across a variety of relevant axes, even if their neural structures are quite different from our own. As a consequence, I am pretty happy to incorporate unaligned AIs into the legal system and grant them some control of the future, just as I'd be happy to grant some control of the future to human children, even if they don't share my exact values. Put another way, I view (what I perceive as) the EA attempt to privilege "human values" over "AI values" as being largely arbitrary and baseless, from an impartial moral perspective. There are many humans whose values I vehemently disagree with, but I nonetheless respect their autonomy, and do not wish to deny these humans their legal rights. Likewise, even if I strongly disagreed with the values of an advanced AI, I would still see value in their preferences being satisfied for their own sake, and I would try to respect the AI's autonomy and legal rights. I don't have a lot of faith in the inherent kindness of human nature relative to a "default unaligned" AI alternative. 5. I'm not fully committed to longtermism: I think AI has an enormous potential to benefit the lives of people who currently exist. I predict that AIs can eventually substitute for human researchers, and thereby accelerate technological progress, including in medicine. In combination with my other beliefs (such as my belief that AI alignment will probably be somewhat easy), this view leads me to think that AI development will likely be net-positive for people who exist at the time of alignment. In other words, if we allow AI development, it is likely that we can use AI to reduce human mortality, and dramatically raise human well-being for the people who already exist. I think these benefits are large and important, and commensurate with the downside potential of existential risks. While a fully committed strong longtermist might scoff at the idea that curing aging might be important — as it would largely only have short-term effects, rather than long-term effects that reverberate for billions of years — by contrast, I think it's really important to try to improve the lives of people who currently exist. Many people view this perspective as a form of moral partiality that we should discard for being arbitrary. However, I think morality is itself arbitrary: it can be anything we want it to be. And I choose to value currently existing humans, to a substantial (though not overwhelming) degree. This doesn't mean I'm a fully committed near-termist. I sympathize with many of the intuitions behind longtermism. For example, if curing aging required raising the probability of human extinction by 40 percentage points, or something like that, I don't think I'd do it. But in more realistic scenarios that we are likely to actually encounter, I think it's plausibly a lot better to accelerate AI, rather than delay AI, on current margins. This view simply makes sense to me given the enormously positive effects I expect AI will likely have on the people I currently know and love, if we allow development to continue.
Please people, do not treat Richard Hanania as some sort of worthy figure who is a friend of EA. He was a Nazi, and whilst he claims he has moderated his views, he is still very racist as far as I can tell. Hanania called for trying to get rid of all non-white immigrants in the US and for the sterilization of everyone with an IQ under 90, indulged in antisemitic attacks on the allegedly Jewish elite, and even after his reform was writing about the need for the state to harass and imprison Black people specifically ('a revolution in our culture or form of government. We need more policing, incarceration, and surveillance of black people' https://en.wikipedia.org/wiki/Richard_Hanania). Yet in the face of this, and after he made an incredibly grudging apology about his most extreme stuff (after journalists dug it up), he's been invited to Manifold's events and put on Richard Yetter Chappell's blogroll. DO NOT DO THIS. If you want people to distinguish benign transhumanism (which I agree is a real thing*) from the racist history of eugenics, do not fail to shun actual racists and Nazis. Likewise, if you want to promote "decoupling" factual beliefs from policy recommendations, which can be useful, do not duck and dive around the fact that virtually every major promoter of scientific racism ever, including allegedly mainstream figures like Jensen, worked with or published with actual literal Nazis (https://www.splcenter.org/fighting-hate/extremist-files/individual/arthur-jensen). I love most of the people I have met through EA, and I know that - despite what some people say on twitter - we are not actually a secret crypto-fascist movement (nor is longtermism specifically, which, whether you like it or not, is mostly about what its EA proponents say it is about). But there is in my view a disturbing degree of tolerance for this stuff in the community, mostly centered around the Bay specifically. And to be clear, I am complaining about tolerance for people with far-right and fascist ("reactionary" or whatever) political views, not people with any particular personal opinion on the genetics of intelligence. A desire for authoritarian government enforcing the "natural" racial hierarchy does not become okay just because you met the person with the desire at a house party and they seemed kind of normal and chill or super-smart and nerdy. I usually take a way more measured tone on the forum than this, but here I think real information is conveyed by getting shouty.
*Anyone who thinks it is automatically far-right to think about any kind of genetic enhancement at all should go read some Culture novels, and note the implied politics (or indeed, look up the author's actual die-hard libertarian socialist views). I am not claiming that far-left politics is innocent, just that it is not racist.
Animal Justice Appreciation Note Animal Justice et al. v A.G of Ontario 2024 was recently decided and struck down large portions of Ontario's ag-gag law. A blog post is here. The suit was partially funded by ACE, which presumably means that many of the people reading this deserve partial credit for donating to support it. Thanks to Animal Justice (Andrea Gonsalves, Fredrick Schumann, Kaitlyn Mitchell, Scott Tinney), co-applicants Jessica Scott-Reid and Louise Jorgensen, and everyone who supported this work!
Why are April Fools jokes still on the front page? On April 1st, you expect to see April Fools' posts and know you have to be extra cautious when reading strange things online. However, April 1st was 13 days ago and there are still two posts that are April Fools posts on the front page. I think it should be clarified that they are April Fools jokes so people can differentiate EA weird stuff from EA weird stuff that's a joke more easily. Sure, if you check the details you'll see that things don't add up, but we all know most people just read the title or first few paragraphs.
Marcus Daniell appreciation note @Marcus Daniell, cofounder of High Impact Athletes, came back from knee surgery and is donating half of his prize money this year. He projects raising $100,000. Through a partnership with Momentum, people can pledge to donate for each point he gets; he has raised $28,000 through this so far. It's cool to see this, and I'm wishing him luck for his final year of professional play!

March 2024

Quick takes

You can now import posts directly from Google docs. Plus, internal links to headers[1] will now be mapped over correctly. To import a doc, make sure it is public or shared with "eaforum.posts@gmail.com"[2], then use the widget on the new/edit post page. Importing a doc will create a new (permanently saved) version of the post, but will not publish it, so it's safe to import updates into posts that are already published. You will need to click the "Publish Changes" button to update the live post. Everything that previously worked on copy-paste[3] will also work when importing, with the addition of internal links to headers (which only work when importing). There are still a few things that are known not to work:
* Nested bullet points (these are working now)
* Cropped images get uncropped
* Bullet points in footnotes (these will become separate un-bulleted lines)
* Blockquotes (there isn't a direct analog of this in Google docs unfortunately)
There might be other issues that we don't know about. Please report any bugs or give any other feedback by replying to this quick take; you can also contact us in the usual ways.
Appendix: Version history
There are some minor improvements to the version history editor[4] that come along with this update:
* You can load a version into the post editor without updating the live post; previously you could only hard-restore versions
* The version that is live[5] on the post is shown in bold
Here's what it would look like just after you import a Google doc, but before you publish the changes. Note that the latest version isn't bold, indicating that it is not showing publicly:
1. ^ Previously the link would take you back to the original doc; now it will take you to the header within the Forum post as you would expect. Internal links to bookmarks (where you link to a specific text selection) are also partially supported, although the link will only go to the paragraph the text selection is in.
2. ^ Sharing with this email address means that anyone can access the contents of your doc if they have the URL, because they could go to the new post page and import it. It does mean they can't access the comments, at least.
3. ^ I'm not sure how widespread this knowledge is, but previously the best way to copy from a Google doc was to first "Publish to the web" and then copy-paste from this published version. In particular this handles footnotes and tables, whereas pasting directly from a regular doc doesn't. The new importing feature should be equal to this publish-to-web copy-pasting, so it will handle footnotes, tables, images etc. And then it additionally supports internal links.
4. ^ Accessed via the "Version history" button in the post editor.
5. ^ For most intents and purposes you can think of "live" as meaning "showing publicly". There is a bit of a sharp corner in this definition, in that the post as a whole can still be a draft. To spell this out: there can be many different versions of a post body, and only one of these is attached to the post; this is the "live" version. This live version is what shows on the non-editing view of the post. Independently of this, the post as a whole can be a draft or published.
akash:
David Nash's Monthly Overload of Effective Altruism seems highly underrated, and you should most probably give it a follow. I don't think any other newsletter captures and highlights EA's cause-neutral impartial beneficence better than the Monthly Overload of EA. For example, this month's newsletter has updates about Conferences, Virtual Events, Meta-EA, Effective Giving, Global Health and Development, Careers, Animal Welfare, Organization updates, Grants, Biosecurity, Emissions & CO2 Removal, Environment, AI Safety, AI Governance, AI in China, Improving Institutions, Progress, Innovation & Metascience, Longtermism, Forecasting, Miscellaneous causes and links, Stories & EA Around the World, Good News, and more. Compiling all this must be hard work! Until September 2022, the monthly overloads were also posted on the Forum and received higher engagement than the Substack. I find the posts super informative, so I am giving the newsletter a shout-out and putting it back on everyone's radar!
Jason:
The government's sentencing memorandum for SBF is here; it is seeking a sentence of 40-50 years. As is typical for DOJ in high-profile cases, it is well-written and well-done. I'm not just saying that because it makes many of the same points I identified in my earlier writeup of SBF's memorandum. E.g., p. 8 ("doubling down" rather than walking away from the fraud); p. 43 ("paid in full" claim is highly misleading) [page cites are to the numbers at the bottom of the page, not to the PDF page #].
EA-adjacent material: There's a snarky reference to SBF's charitable donations "(for which he still takes credit)" (p. 2) in the intro, and the expected hammering of SBF's memo for attempting to take credit for donations paid with customer money (p. 95). There's a reference to SBF's "idiosyncratic . . . beliefs around altruism, utilitarianism, and expected value" (pp. 88-89). This leads to the one surprise theme (for me): the need to incapacitate SBF from committing additional crimes (pp. 87, 90). Per the feds, "the defendant believed and appears still to believe that it is rational and necessary for him to take great risks including imposing those risks on others, if he determines that it will serve what he personally deems a worthy project or goal," which contributes to his future dangerousness (p. 89).
For predictors: Looking at sentences where the loss was > $100MM and the method was Ponzi/misappropriation/embezzlement, there's a 20-year, two 30-years, a bunch of 40-years, three 50-years, and three 100+-years (pp. 96-97).
Interesting item: The government has gotten about $3.45MM back from political orgs, and the estate has gotten back ~$280K (pp. 108-09). The proposed forfeiture order lists recipients, and seems to tell us which ones returned monies to the government (Proposed Forfeiture Order, pp. 24-43).
Life Pro Tip: If you are arrested by the feds, do not subsequently write things in Google Docs that you don't want the feds to bring up at your sentencing. Jotting down "SBF died for our sins" as some sort of PR idea (p. 88; source here) is particularly ill-advised.
My Take: In Judge Kaplan's shoes, I would probably sentence at the high end of the government's proposed range. Where the actual loss will likely be several billion, and the loss would have been even greater under many circumstances, I don't think a consequence of less than two decades' actual time in prison would provide adequate general deterrence -- even where the balance of other factors was significantly mitigating. That would imply a sentence of ~25 years after a prompt guilty plea. Backsolving, that gets us a sentence of ~35 years without credit for a guilty plea. But the balance of other factors is aggravating, not mitigating. Stealing from lots of ordinary people is worse than stealing from sophisticated investors. Outright stealing by someone in a fiduciary role is worse than accounting fraud to manipulate stock prices. We also need to adjust upward for SBF's post-arrest conduct, including trying to hide money from the bankruptcy process, multiple attempts at witness tampering, and perjury on the stand. Stacking those factors would probably take me over 50 years, but like the government I don't think a likely-death-in-prison sentence is necessary here.
Jason:
SBF's sentencing memorandum is here. On the first page of the intro, we get some quotes about SBF's philanthropy. On the next, we are told he "lived a very modest life" and that any reports of extravagance are fabrications. [N.B.: All page citations are to the typed numbers at the bottom, not to the PDF page number.]  For the forecasters: based on Guidelines calculations in the pre-sentence report (PSR) by Probation, the Guidelines range is 110 years (would be life, but is capped at the statutory max). Probation recommended 100 years. PSRs are sealed, so we'll never see the full rationale on that. The average fraud defendant with a maxed-out offense level, no criminal history, and no special cooperation credit receives a sentence of 283 months. The first part of the memo is about the (now advisory) Sentencing Guidelines used in federal court. The major argument is that there should be no upward adjustment for loss because everyone is probably getting all their money back. Courts have traditionally looked at the greater of actual or "intended" loss, but the memo argues that isn't correct after a recent Supreme Court decision.  As a factual matter, I'm skeptical that the actual loss is $0, especially where much of the improvement is due to increases in the crypto market that customers would have otherwise benefitted from directly. Plus everyone getting money back (including investors who were defrauded) is far from a certain outcome, the appellate courts have been deferential to best-guess loss calculations, and the final Guidelines range would not materially change if the loss amount were (say) $25MM. If I'm the district judge here, I'd probably include some specific statements and findings in my sentencing monologue in an attempt to insulate this issue from appeal. Such as: I'd impose the same sentence no matter what the Guidelines said, because $0 dramatically understates the offense severity and $10B overstates it. There are a few places in which the argument ventures into tone-deaf waters. The argument that SBF wasn't in a position of public or private trust (p. 25-26) seems awfully weak and ill-advised to my eyes. The discussion of possible equity-holder losses (pp. 20-21) also strikes me as dismissive at points. No, equity holders don't get a money-back guarantee, but they are entitled to not be lied to when deciding where to invest. The second half involves a discussion of the so-called 3553 factors that a court must consider in determining a sentence that is sufficient, but not greater than necessary. Pages 41-42 discuss Peter Singer and earning to give, while pages 46-50 continue on about SBF's philanthropy (including a specific reference to GWWC and link to the pledger list on page 46).  Throughout the memo, the defense asserts that FTX was different from various other schemes that were fraudulent from day one (e.g., p. 56). My understanding is that the misuse of customer funds started pretty early in FTX's history, so I don't give this much weight. The memo asserts that SBF was less culpable than various comparators, ranging from Madoff himself (150 years) to Elizabeth Holmes (135 months) (pp. 73-80). The bottom-line request is for a sentence of 63-78 months, which is the Guidelines range if one accepts the loss amount as $0 (p. 89). There are 29 letters in SBF's support by family members, his psychiatrist, Ross Rheingans-Yoo, Kat Woods, and a bunch of names I don't recognize. [Caution: The remainder of this post contains more opinion-laden commentary than what has preceded it!] 
I generally find the 3553 discussion unpersuasive. The section on "remorse" (pp. 55-56) rings hollow to me, although this is an unavoidable consequence of SBF's trial litigation choices. There is "remorse" that people were injured and impliedly that SBF made various errors in judgment, but there isn't any acknowledgment of wrongdoing. One sentence of note to this audience: "Sam is simply devastated that the advice, mentorship, and funding that he has given to the animal welfare, global poverty, and pandemic prevention movements does not begin to counteract the damage done to them by virtue of their association with him." (p. 55).  I find the discussion of the FTX Foundation to be jarring, such as "Ultimately, the FTX Foundation donated roughly $150 million to charities working on issues such as pandemic prevention, animal welfare, and funding anti-malarial mosquito netting in Africa." (p. 57). Attempting to take credit for sending some of the money you took from customers to charity takes a lot of nerve!  Although the memo asserts that SBF's neurodiversity makes him "uniquely vulnerable" in prison (p. 58), the unfortunate truth is that many convicted criminals have characteristics that make successfully adapting to prison life more difficult than for the average person (e.g., severe mental illness, unusually low intelligence). So I'm not convinced by the memo that he would face an atypical burden that would warrant serious consideration in sentencing.  Although I certainly can't fault counsel for pointing to SBF's positive characteristics, I'm sure Judge Kaplan knows that many of his opportunities to legibly display these characteristics have been enabled by privilege that most people being sentenced in federal court do not have. I'm also not generally convinced by the arguments about general deterrence. In abbreviated form, the argument is that putting SBF through the wringer and exposing him to public disgrace is punishing enough that a lower sentence (and the inevitable lifetime public stigma) suffices to deter other would-be fraudsters. See pp. 66-67. And there's good evidence that severity of punishment is relatively less important in deterrence.  However, if a tough sentence is otherwise just, I don't think we need a high probability of deterrent effect for an extremely serious offense for extended incarceration to be worth it. Crypto scams are common, and as a practical matter it is difficult to increase certainty and speed of punishment because so much of the problematic conduct happens outside the U.S. So severity is the lever the government has. Moreover, discounts for offenders who have a lot to lose (because they are wealthy already) and/or are seen as having more future productive value seem backward as far as moral desert.  Finally, I think there's potentially value in severity deterrence of someone already committing fraud; if the punishment level is basically maxed out at the $500M customer money you've already put at risk, there is no reason (other than an increased risk of detection) not to put $5B at risk. As the saying goes, "might as well be hanged for a sheep as for a lamb," as the penalty for ovine theft was death either way. Defense recommendations on sentencing are generally unrealistic in cases without specific types of plea deal. This one is no different. Also, the sentencing discussion will sound extremely harsh at least to non-US readers . . . but that's the situation in the US and especially in the federal system. 
I'd note that SBF's post-arrest decisions will likely have triggered a substantial portion of his sentence. Much has been written about the US "trial penalty," and it is often a problem. However, I don't think a discount of ~25-33% for a prompt guilty plea, as implied by the Guidelines for most offenses (also by the guidelines used in England and Wales) is penologically unjustified or coercive. Instead of that, SBF's sentence is likely to be higher because of multiple attempts at witness tampering and evasive, incredible testimony on the stand. So he could be looking at ~double the sentence he would be facing if he had pled guilty.  He likely could not have gotten the kind of credit for cooperation his co-conspirators received (a "5K.1") because there was no other big fish to rat out. Providing 5K.1 cooperation often reduces sentences by a lot, in my opinion often too much. Given the 5K.1 cooperation and the lesser role, one must exercise caution in using co-conspirator sentences to estimate what SBF would have received if he had promptly accepted responsibility. Finally, I'd view any sentence over ~60 years as de facto life and chosen more for symbolic purposes than to actually increase punishment. Currently, one can receive a ~15% discount for decent behavior in prison, and can potentially serve ~25-33% of the sentence in a halfway house or the like for participating in programs under the First Step Act. It's hard to predict what the next few decades will bring as far as sentencing policy, but the recent trend has been toward expanding the possibilities for early release. So I'd estimate that SBF will actually serve ~75% of his sentence, and probably some portion of it outside of a prison.
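To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch that simply combines the rough percentages mentioned above (the ~15% good-conduct discount and the possibility of ~25-33% of the sentence being served in a halfway house or similar). The 40-year input is a hypothetical, not a prediction, and the ~75% overall estimate above additionally assumes future expansions of early release that this simple calculation ignores.

```python
def rough_split(nominal_years, good_conduct=0.15, outside_lo=0.25, outside_hi=0.33):
    """Back-of-the-envelope split of a hypothetical federal sentence, using the
    rough percentages from the take above; these are assumptions, not legal advice."""
    served = nominal_years * (1 - good_conduct)            # total time in custody of some form
    outside = (nominal_years * outside_lo, nominal_years * outside_hi)  # halfway house / similar
    in_prison = (served - outside[1], served - outside[0])
    return served, outside, in_prison

served, outside, in_prison = rough_split(40)  # hypothetical 40-year sentence
print(f"~{served:.1f}y served; ~{outside[0]:.1f}-{outside[1]:.1f}y of that could be "
      f"outside prison walls, leaving ~{in_prison[0]:.1f}-{in_prison[1]:.1f}y inside")
# ~34.0y served; ~10.0-13.2y of that could be outside prison walls, leaving ~20.8-24.0y inside
```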
If anyone wants to see what making EA enormous might look like, check out Rutger Bregman's School for Moral Ambition (SMA). It isn't an EA project (and his accompanying book has a chapter on EA that is quite critical), but the inspiration is clear and I'm sure there will be things we can learn from it. For their pilot, they're launching in the Netherlands, but it's already pretty huge, and they have plans to launch in the UK and the US next year. To give you an idea of size, despite the official launch being only yesterday, their growth on LinkedIn is significant. For the 90 days preceding the launch date, they added 13,800 followers (their total is now 16,300). The two EA orgs with the biggest LinkedIn presence I know of are 80k and GWWC. In the same period, 80k gained 1,200 followers (their total is now 18,400), and GWWC gained 700 (their total is now 8,100).[1] And it's not like SMA has been spamming the post button; they only posted 4 times. The growth in followers comes from media coverage and the founding team posting about it on their personal LinkedIn pages (Bregman has over 200k followers).
1. ^ EA Netherlands gained 137, giving us a total of 2900 - wooo!

February 2024

Quick takes

I met Australia's Assistant Minister for Defence last Friday. I asked him to write an email to the Minister in charge of AI, asking him to establish an AI Safety Institute. He said he would. He also seemed on board with not having fully autonomous AI weaponry. All because I sent one email asking for a meeting + had said meeting.  Advocacy might be the lowest hanging fruit in AI Safety.
Linch:
Going forwards, LTFF is likely to be a bit more stringent (~15-20%?[1] Not committing to the exact number) about approving mechanistic interpretability grants than about grants in other subareas of empirical AI Safety, particularly from junior applicants. Some assorted reasons (note that not all fund managers necessarily agree with each of them):
* Relatively speaking, a high fraction of resources and support for mechanistic interpretability comes from sources in the community other than LTFF; we view support for mech interp as less neglected within the community.
* Outside of the existing community, mechanistic interpretability has become an increasingly "hot" field in mainstream academic ML; we think good work is fairly likely to come from non-AIS motivated people in the near future. Thus overall neglectedness is lower.
* While we are excited about recent progress in mech interp (including some from LTFF grantees!), some of us doubt that even the success stories in interpretability make up that large a fraction of the overall success story for AGI Safety.
* Some of us are worried about field-distorting effects of mech interp being oversold to junior researchers and other newcomers as necessary or sufficient for safe AGI.
* A high percentage of our technical AIS applications are about mechanistic interpretability, and we want to encourage a diversity of attempts and research to tackle alignment and safety problems.
We wanted to encourage people interested in working on technical AI safety to apply to us with proposals for projects in areas of empirical AI safety other than interpretability. To be clear, we are still excited about receiving mechanistic interpretability applications in the future, including from junior applicants. Even with a higher bar for approval, we are still excited about funding great grants. We tentatively plan on publishing a more detailed explanation about the reasoning later, as well as suggestions or a Request for Proposals for other promising research directions. However, these things often take longer than we expect/intend (and may not end up happening), so I wanted to give potential applicants a heads-up. (x-posted from LW)
1. ^ Operationalized as "assuming similar levels of funding in 2024 as in 2023, I expect that about 80-85% of the mech interp projects we funded in 2023 will be above the 2024 bar."
I'm curious why there hasn't been more work exploring a pro-AI or pro-AI-acceleration position from an effective altruist perspective. Some points:
1. Unlike existential risk from other sources (e.g., an asteroid), AI x-risk is unique because humans would be replaced by other beings, rather than completely dying out. This means you can't simply apply a naive argument that AI threatens total extinction of value to make the case that AI safety is astronomically important, in the sense that you can for other x-risks. You generally need additional assumptions.
2. Total utilitarianism is generally seen as non-speciesist, and therefore has no intrinsic preference for human values over unaligned AI values. If AIs are conscious, there don't appear to be strong prima facie reasons for preferring humans to AIs under hedonistic utilitarianism. Under preference utilitarianism, it doesn't necessarily matter whether AIs are conscious.
3. Total utilitarianism generally recommends large population sizes. Accelerating AI can be modeled as a kind of "population accelerationism". Extremely large AI populations could be preferable under utilitarianism compared to small human populations, even those with high per-capita incomes. Indeed, human populations have recently stagnated via low population growth rates, and AI promises to lift this bottleneck.
4. Therefore, AI accelerationism seems straightforwardly recommended by total utilitarianism under some plausible theories.
Here's a non-exhaustive list of guesses for why I think EAs haven't historically been sympathetic to arguments like the one above, and have instead generally advocated AI safety over AI acceleration (at least when these two values conflict):
* A belief that AIs won't be conscious, and therefore won't have much moral value compared to humans.
  * But why would we assume AIs won't be conscious? For example, if Brian Tomasik is right, consciousness is somewhat universal, rather than being restricted to humans or members of the animal kingdom.
  * I also haven't actually seen much EA literature defend this assumption explicitly, which would be odd if this belief is the primary reason EAs have for focusing on AI safety over AI acceleration.
* A presumption in favor of human values over unaligned AI values for some reasons that aren't based on strict impartial utilitarian arguments. These could include the beliefs that: (1) humans are more likely to have "interesting" values compared to AIs, and (2) humans are more likely to be motivated by moral arguments than AIs, and are more likely to reach a deliberative equilibrium of something like "ideal moral values" compared to AIs.
  * Why would humans be more likely to have "interesting" values than AIs? It seems very plausible that AIs will have interesting values even if their motives seem alien to us. AIs might have even more "interesting" values than humans.
  * It seems to me like wishful thinking to assume that humans are strongly motivated by moral arguments and would settle upon something like "ideal moral values".
* A belief that population growth is inevitable, so it is better to focus on AI safety.
  * But a central question here is why pushing for AI safety—in the sense of AI research that enhances human interests—is better than the alternative on the margin. What reason is there to think AI safety now is better than pushing for greater AI population growth now? (Potential responses to this question are outlined in other bullet points above and below.)
* AI safety has lasting effects due to a future value lock-in event, whereas accelerationism would have, at best, temporary effects.
  * Are you sure there will ever actually be a "value lock-in event"?
  * Even if there is at some point a value lock-in event, wouldn't pushing for accelerationism also plausibly affect the values that are locked in? For example, the value of "population growth is good" seems more likely to be locked in, if you advocate for that now.
* A belief that humans would be kinder and more benevolent than unaligned AIs.
  * Humans seem pretty bad already. For example, humans are responsible for factory farming. It's plausible that AIs could be even more callous and morally indifferent than humans, but the bar already seems low.
  * I'm also not convinced that moral values will be a major force shaping "what happens to the cosmic endowment". It seems to me that the forces shaping economic consumption matter more than moral values.
* A bedrock heuristic that it would be extraordinarily bad if "we all died from AI", and therefore we should pursue AI safety over AI accelerationism.
  * But it would also be bad if we all died from old age while waiting for AI, and missed out on all the benefits that AI offers to humans, which is a point in favor of acceleration. Why would this heuristic be weaker?
* An adherence to person-affecting views in which the values of currently-existing humans are what matter most, and a belief that AI threatens to kill existing humans.
  * But in this view, AI accelerationism could easily be favored, since AIs could greatly benefit existing humans by extending our lifespans and enriching our lives with advanced technology.
* An implicit acceptance of human supremacism, i.e. the idea that what matters is propagating the interests of the human species, or preserving the human species, even at the expense of individual interests (either within humanity or outside humanity) or the interests of other species.
  * But isn't EA known for being unusually anti-speciesist compared to other communities? Peter Singer is often seen as a "founding father" of the movement, and a huge part of his ethical philosophy was about how we shouldn't be human supremacists.
  * More generally, it seems wrong to care about preserving the "human species" in an abstract sense relative to preserving the current generation of actually living humans.
* A belief that most humans are biased towards acceleration over safety, and therefore it is better for EAs to focus on safety as a useful correction mechanism for society.
  * But was an anti-safety bias common for previous technologies? I think something closer to the opposite is probably true: most humans seem, if anything, biased towards being overly cautious about new technologies rather than overly optimistic.
* A belief that society is massively underrating the potential for AI, which favors extra work on AI safety, since it's so neglected.
  * But if society is massively underrating AI, then this should favor accelerating AI too? There doesn't seem to be an obvious asymmetry between these two values.
* An adherence to negative utilitarianism, which would favor obstructing AI, along with any other technology that could enable the population of conscious minds to expand.
  * This seems like a plausible moral argument to me, but it doesn't seem like a very popular position among EAs.
* A heuristic that "change is generally bad" and AI represents a gigantic change.
  * I don't think many EAs would defend this heuristic explicitly.
* Added: AI represents a large change to the world. Delaying AI therefore preserves option value.
  * This heuristic seems like it would have favored delaying the industrial revolution, and all sorts of moral, social, and technological changes to the world in the past. Is that a position that EAs would be willing to bite the bullet on?
EDIT: just confirmed that FHI shut down as of April 16, 2024 It sounds like the Future of Humanity Institute may be permanently shut down.  Background: FHI went on a hiring freeze/pause back in 2021 with the majority of staff leaving (many left with the spin-off of the Centre for the Governance of AI) and moved to other EA organizations. Since then there has been no public communication regarding its future return, until now...  The Director, Nick Bostrom, updated the bio section on his website with the following commentary [bolding mine]:  > "...Those were heady years. FHI was a unique place - extremely intellectually alive and creative - and remarkable progress was made. FHI was also quite fertile, spawning a number of academic offshoots, nonprofits, and foundations. It helped incubate the AI safety research field, the existential risk and rationalist communities, and the effective altruism movement. Ideas and concepts born within this small research center have since spread far and wide, and many of its alumni have gone on to important positions in other institutions. > > Today, there is a much broader base of support for the kind of work that FHI was set up to enable, and it has basically served out its purpose. (The local faculty administrative bureaucracy has also become increasingly stifling.) I think those who were there during its heyday will remember it fondly. I feel privileged to have been a part of it and to have worked with the many remarkable individuals who flocked around it." This language suggests that FHI has officially closed. Can anyone at Trajan/Oxford confirm?  Also curious if there is any project in place to conduct a post mortem on the impact FHI has had on the many different fields and movements? I think it's important to ensure that FHI is remembered as a significant nexus point for many influential ideas and people who may impact the long term.  In other news, Bostrom's new book "Deep Utopia" is available for pre-order (coming March 27th). 
Two sources of human misalignment that may resist a long reflection: malevolence and ideological fanaticism (Alternative title: Some bad human values may corrupt a long reflection[1])
The values of some humans, even if idealized (e.g., during some form of long reflection), may be incompatible with an excellent future. Thus, solving AI alignment will not necessarily lead to utopia. Others have raised similar concerns before.[2] Joe Carlsmith puts it especially well in the post “An even deeper atheism”:
> “And now, of course, the question arises: how different, exactly, are human hearts from each other? And in particular: are they sufficiently different that, when they foom, and even "on reflection," they don't end up pointing in exactly the same direction? After all, Yudkowsky said, above, that in order for the future to be non-trivially "of worth," human hearts have to be in the driver's seat. But even setting aside the insult, here, to the dolphins, bonobos, nearest grabby aliens, and so on – still, that's only to specify a necessary condition. Presumably, though, it's not a sufficient condition? Presumably some human hearts would be bad drivers, too? Like, I dunno, Stalin?”
What makes human hearts bad?
What, exactly, makes some human hearts bad drivers? If we better understood what makes hearts go bad, perhaps we could figure out how to make bad hearts good or at least learn how to prevent hearts from going bad. It would also allow us to better spot potentially bad hearts and coordinate our efforts to prevent them from taking the driving seat. As of now, I’m most worried about malevolent personality traits and fanatical ideologies.[3]
Malevolence: dangerous personality traits
Some human hearts may be corrupted due to elevated malevolent traits like psychopathy, sadism, narcissism, Machiavellianism, or spitefulness.
Ideological fanaticism: dangerous belief systems
There are many suitable definitions of “ideological fanaticism”. Whatever definition we are going to use, it should describe ideologies that have caused immense harm historically, such as fascism (Germany under Hitler, Italy under Mussolini), (extreme) communism (the Soviet Union under Stalin, China under Mao), religious fundamentalism (ISIS, the Inquisition), and most cults. See this footnote[4] for a preliminary list of defining characteristics.
Malevolence and fanaticism seem especially dangerous
Of course, there are other factors that could corrupt our hearts or driving ability. For example, cognitive biases, limited cognitive ability, philosophical confusions, or plain old selfishness.[5] I’m most concerned about malevolence and ideological fanaticism for two reasons.
Deliberately resisting reflection and idealization
First, malevolence—if reflectively endorsed[6]—and fanatical ideologies deliberately resist being changed and would thus plausibly resist idealization even during a long reflection. The most central characteristic of fanatical ideologies is arguably that they explicitly forbid criticism, questioning, and belief change and view doubters and disagreement as evil.
Putting positive value on creating harm
Second, malevolence and ideological fanaticism would not only result in the future not being as good as it possibly could be—they might actively steer the future in bad directions and, for instance, result in astronomical amounts of suffering. The preferences of malevolent humans (e.g., sadists) may be such that they intrinsically enjoy inflicting suffering on others.
Similarly, many fanatical ideologies sympathize with excessive retributivism and often demonize the outgroup. Enabled by future technology, preferences for inflicting suffering on the outgroup may result in enormous disvalue—cf. concentration camps, the Gulag, or hell[7]. In the future, I hope to write more about all of this, especially long-term risks from ideological fanaticism.  Thanks to Pablo and Ruairi for comments and valuable discussions.  1. ^ “Human misalignment” is arguably a confusing (and perhaps confused) term. But it sounds more sophisticated than “bad human values”.  2. ^ For example, Matthew Barnett in “AI alignment shouldn't be conflated with AI moral achievement”, Geoffrey Miller in “AI alignment with humans... but with which humans?”, lc in “Aligned AI is dual use technology”. Pablo Stafforini has called this the “third alignment problem”. And of course, Yudkowsky’s concept of CEV is meant to address these issues.  3. ^ These factors may not be clearly separable. Some humans may be more attracted to fanatical ideologies due to their psychological traits and malevolent humans are often leading fanatical ideologies. Also, believing and following a fanatical ideology may not be good for your heart. 4. ^ Below are some typical characteristics (I’m no expert in this area): Unquestioning belief, absolute certainty and rigid adherence. The principles and beliefs of the ideology are seen as absolute truth and questioning or critical examination is forbidden. Inflexibility and refusal to compromise.  Intolerance and hostility towards dissent. Anyone who disagrees or challenges the ideology is seen as evil; as enemies, traitors, or heretics. Ingroup superiority and outgroup demonization. The in-group is viewed as superior, chosen, or enlightened. The out-group is often demonized and blamed for the world's problems.  Authoritarianism. Fanatical ideologies often endorse (or even require) a strong, centralized authority to enforce their principles and suppress opposition, potentially culminating in dictatorship or totalitarianism. Militancy and willingness to use violence. Utopian vision. Many fanatical ideologies are driven by a vision of a perfect future or afterlife which can only be achieved through strict adherence to the ideology. This utopian vision often justifies extreme measures in the present.  Use of propaganda and censorship.  5. ^ For example, Barnett argues that future technology will be primarily used to satisfy economic consumption (aka selfish desires). That seems even plausible to me, however, I’m not that concerned about this causing huge amounts of future suffering (at least compared to other s-risks). It seems to me that most humans place non-trivial value on the welfare of (neutral) others such as animals. Right now, this preference (for most people) isn’t strong enough to outweigh the selfish benefits of eating meat. However, I’m relatively hopeful that future technology would make such types of tradeoffs much less costly.  6. ^ Some people (how many?) with elevated malevolent traits don’t reflectively endorse their malevolent urges and would change them if they could. However, some of them do reflectively endorse their malevolent preferences and view empathy as weakness.  7. ^ Some quotes from famous Christian theologians:  Thomas Aquinas:  "the blessed will rejoice in the punishment of the wicked." 
"In order that the happiness of the saints may be more delightful to them and that they may render more copious thanks to God for it, they are allowed to see perfectly the sufferings of the damned".  Samuel Hopkins:  "Should the fire of this eternal punishment cease, it would in a great measure obscure the light of heaven, and put an end to a great part of the happiness and glory of the blessed.” Jonathan Edwards:  "The sight of hell torments will exalt the happiness of the saints forever."
