D0TheMath


Comments

Getting money out of politics and into charity

If we successfully built this platform, would you consider using it? If your answer is “it depends”, what does it depend on?

I wouldn't use it, since I don't donate to campaigns, but I would certainly push all my more political friends and family members to use it.

What questions would you like to see forecasts on from the Metaculus community?

In The Precipice, Toby Ord gives estimated chances of various existential risks happening within the next 100 years. It'd be cool if we could get estimates from Metaculus as well, although it may be impossible to implement, since Tachyons would only be awarded when the world doesn't blow up.

The 80,000 Hours podcast should host debates

I like the idea of having people with different opinions discuss their disagreements, but I don't think they should be marketed as debates. That term doesn't have positive connotations and seems to imply there will be a winner and a loser. Even if there is no official winner or loser, it puts the audience and the participants in a zero-sum mentality.

I think something more like an adversarial collaboration would be healthier. I also prefer that term because it's less loaded and more up front about what we actually want the participants to do.

How do i know a charity is actually effective

Thanks for the correction. Idk why I thought it was Toby Ord.

How do i know a charity is actually effective

I haven't read Will's book, so I'm not entirely sure what your background knowledge is.

Are you unsure about how to compare two different cause areas? For instance, do you accept that it's better to save the lives of 10 children than to fund a $30,000 art museum renovation project, but are unsure whether saving the lives of 10 children or de-worming 4,500 children is better?

In this case, I suggest looking at QALYs and DALYs, which try to quantify the number of healthy years of life saved, given estimates of how bad various diseases and disabilities are. GiveWell has a few reservations about DALYs and uses their own weighting/cost-effectiveness model. On the linked page, you can look at the spreadsheet they use to analyze different charities and interventions and plug in your own moral weights, though I'd recommend doing some research before choosing a number out of a hat.
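To make the weighting step concrete, here is a minimal sketch in Python of how a DALY-style cost-per-outcome comparison works. The disability weights, costs, and case counts are entirely made up for illustration; this is not GiveWell's model or their figures.

```python
# A toy cost-effectiveness comparison using DALY-style weights.
# All numbers here are made up for illustration; they are NOT GiveWell's figures.

# Disability weights: how bad one year with each condition is,
# on a 0 (perfect health) to 1 (equivalent to death) scale. Adjust to taste.
weights = {
    "severe_malaria_year": 0.5,
    "untreated_worm_infection_year": 0.05,
}

def dalys_averted(cases_prevented, years_per_case, weight):
    """DALYs averted = cases prevented * duration * disability weight."""
    return cases_prevented * years_per_case * weight

def cost_per_daly(total_cost, dalys):
    """Dollars spent per DALY averted (lower is more cost-effective)."""
    return total_cost / dalys

# Hypothetical intervention A: $30,000 prevents 100 year-long cases of severe malaria.
a_dalys = dalys_averted(100, 1, weights["severe_malaria_year"])

# Hypothetical intervention B: deworming 4,500 children at $0.45 each for one year.
b_cost = 4500 * 0.45
b_dalys = dalys_averted(4500, 1, weights["untreated_worm_infection_year"])

print(f"A: ${cost_per_daly(30_000, a_dalys):,.0f} per DALY averted")
print(f"B: ${cost_per_daly(b_cost, b_dalys):,.0f} per DALY averted")
```

Changing the numbers in `weights` is exactly the "fit your own moral weights" step; the rest of the arithmetic stays the same.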

If it's more like "Deworm the World says its cost-effectiveness is $0.45 per child dewormed; how do I know this is actually an accurate estimate?", then we can just go to GiveWell and read their documentation. The reason GiveWell is so useful is that they are both transparent and very evidence-focused. In this case, GiveWell provides a summary of the sources for their review and in-depth information on exactly what those sources gave them, including transcripts of interviews and conversations with staff and general notes on site visits. All of this can be reached through a series of links from their main charity page. This heavy transparency means they can likely be trusted on the facts. See the above paragraph for sources on their analysis.

If your confusion is more along the lines of "OK, I understand intellectually that it's better to save the lives of 10 children than to give $30,000 for a kid's wish via the Make-A-Wish Foundation, but my gut disagrees, and I can't emotionally conceptualize that saving the 10 children is at least 10x better than fulfilling one child's wish", then understand that this is a pretty common experience and you are not alone. It takes a lot of empathy and a lot of experience with numbers to even get close to Derek Parfit levels of caring about abstract suffering [1]. Tackling this problem will be different for everyone, so I can't give any advice except to say that while your gut is good for fast and simple decisions (for instance, swerving out of the way before you crash into an old lady while driving your car), it is not so good for figuring out complex decisions.

It is easy to aim and throw a baseball using only your gut, but it is near impossible to land a rocket on the moon using only your gut; we need theories of gravity to figure that out. Some smart people who've spent their entire adult lives researching astrophysics (or playing Kerbal Space Program) can understand theories of gravitation intuitively, but even they will still revert to numbers when given the option. In the same way, it's easy to understand at a gut level that you should save a kid from drowning, but much harder to understand at a gut level that saving the lives of 10 children is better than making one child very happy. But we can set down moral theories to help us, and we can try to get an intuitive feel for why we should listen to those theories.

Personally, I gained a lot of gut-level understanding from the "Mere Goodness" sequences: Fake Preferences, Value Theory, and Quantified Humanism. But not everyone likes the Sequences, and they may require more background if you haven't read the preceding sequences.


[1] Derek Parfit reportedly broke down in tears in the middle of an interview for seemingly no reason. When asked why, he said it was the very idea of suffering that made him cry.

I originally thought this was Toby Ord, but Thomas Kwa corrected me in the comment below.

FLI AI Alignment podcast: Evan Hubinger on Inner Alignment, Outer Alignment, and Proposals for Building Safe Advanced AI

This was a particularly informative podcast, and you helped me get a better understanding of inner alignment issues, which I really appreciate.

To be clear that I understand: the issue with inner alignment is that when an agent is optimized for a reward/cost function on a training distribution, and doing well requires a world model good enough to determine that it is (or could be) undergoing training, then if training ends up producing an optimizer, that optimizer's objective is much more likely to be a bad one or a proxy; and if it's sufficiently intelligent, it will reason that it should figure out what you want it to do, and do that. This is because there are many different bad objectives an inner optimizer can have, but only one you actually want it to have, and an agent with any of those bad objectives will pretend to have the good one.

The badly-aligned agents do seem like they'd at least be optimizing for proxies of what you actually want, though, since early (dumber) agents with unrelated utility functions wouldn't do as well as alternative agents with approximately aligned utility functions.
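As a toy illustration of the counting point above (everything here is made up, not a claim about real training dynamics), here's a small Python sketch that enumerates objectives over a tiny state space and shows that many of them fit the training states while only one matches the intended objective off-distribution:

```python
# Toy illustration of the "counting argument": many objectives agree with the
# intended one on the training distribution but diverge off-distribution.
# The state space and rewards are invented purely for illustration.

from itertools import product

train_states = ["t0", "t1"]          # states seen during training
deploy_states = ["d0", "d1", "d2"]   # states only encountered after deployment

# The intended objective assigns these rewards to every state.
intended = {"t0": 1, "t1": 0, "d0": 1, "d1": 0, "d2": 1}

# Enumerate every possible 0/1 objective over all states.
all_states = train_states + deploy_states
candidates = [
    dict(zip(all_states, values))
    for values in product([0, 1], repeat=len(all_states))
]

# Keep only objectives indistinguishable from the intended one during training.
fits_training = [
    c for c in candidates
    if all(c[s] == intended[s] for s in train_states)
]

aligned = [c for c in fits_training if c == intended]
print(f"{len(fits_training)} objectives fit the training data; "
      f"{len(aligned)} of them also matches the intended objective off-distribution.")
# -> 8 objectives fit the training data; 1 of them also matches.
```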

Correct me on any mistakes please.

Also, because this depends on the agents being at least a little generally intelligent, I'm guessing there are no contemporary examples of such inner optimizers attempting deception.

How do you talk about AI safety?

While I haven't read the book, Slate Star Codex has a great review of Human Compatible. Scott says it discusses AI safety, especially in the long-term future, in a very professional-sounding and not-weird way. So I suggest reading that book, or at least that review.


You could also list several smaller-scale AI-misalignment problems, such as the problems surrounding Zuckerberg and Facebook. You could say something like "You know how Facebook's AI is programmed to keep you on as long as possible, so it will often show you controversial content in order to rile you up and get everyone yelling at everyone else so you never leave the platform? Yeah, I make sure that won't happen with smarter, more influential AIs." If all you're going for is an elevator speech, or explaining to family what it is you do, I'd stop here. Otherwise, after the first part, say something like "By my estimation, this seems fairly important, since incentives push companies and countries to use the best AI possible, and better AI means more influential AI, so even a really good but slightly sociopathic AI will likely get used anyway. And if, in a few decades, we get to the point where we have a smarter-than-human but still sociopathic AI, it's possible we've just made an immortal Hitler-Einstein combination. Which, needless to say, would be very bad, possibly even extinction-level bad. So if the job is very hard, and the result if the job doesn't get done is very bad, then the job is very, very important."

I've never tried using these statements, but they seem like they'd work.

Terrorism, Tylenol, and dangerous information

Not entirely applicable to the discussion, but I just like talking about things like this and I finally found something tangentially related. Feel free to disregard.

if you look at a period of sustained effort in staying on the military cutting edge, i.e. the Cold War, you won't see as many of these mistakes and you'll instead find fairly continuous progress

The Cold War wasn't peacetime, though; there was continuous fighting by both sides: the Americans and Chinese in Korea, the Americans in Vietnam, and the Russians in Afghanistan.

One can argue that these conflicts don't scale to the kind of military techniques and science that a World War 3 scenario would require. But that kind of war has never occurred with modern technology (specifically hydrogen bombs). How do we know that all of the ideas dreamed up by generals and military experts wouldn't get tossed out the window the moment it was determined that they were inapplicable to a nuclear war?