This is a special post for quick takes by NunoSempere. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
Polymarket beat legacy institutions at processing information, in real time and in general. It was just much faster at calling states, and more confident earlier on the correct outcome.
The OG prediction markets community, the community which has been betting on politics and increasing their bankroll since PredictIt, was on the wrong side of 50%—1, 2, 3, 4, 5. It was the democratic, open-to-all nature of Polymarket, and in particular the Frenchman who was convinced that mainstream polls were pretty tortured and bet ~$45M, that moved Polymarket to the right side of 50/50.
Polls seem like a garbage in garbage out kind of situation these days. How do you get a representative sample? The answer is maybe that you don't.
Polymarket will live. They were useful to the Trump campaign, which has a much warmer perspective on crypto. The federal government isn't going to prosecute them, nor bettors. Regulatory agencies, like the CFTC and the SEC, which have taken such a prominent role in recent editions of this newsletter, don't really matter now, as they will be aligned with financial innovation rather than opposed to it.
NYT/Siena really fucked up with their last poll and the coverage of it. So did Ann Selzer. Some prediction market bettors might have thought that you could do the bounded distrust, but in hindsight it turns out that you can't. Looking back, to the extent you trust these institutions, they can ratchet their deceptiveness (from misleading headlines, incomplete stories, incomplete quotes out of context, not reporting on important stories, etc.) for clicks and hopium, to shape the information landscape for a managerial class that... will no longer be in power in America.
Elon Musk and Peter Thiel look like geniuses. In contrast, Dustin Moskovitz couldn't get SB 1047 passed despite being the second largest contributor to Democrats, and has, in hindsight, been positioning his philanthropic bets suboptimally by reportedly not funding any slightly right-of-center work. These are also bets that matter!
I don't have time to write a detailed response now (might later), but wanted to flag that I either disagree or "agree denotatively but object connotatively" with most of these. I disagree most strongly with #3: the polls were quite good this year. National and swing state polling averages were only wrong by 1% in terms of Trump's vote share, or in other words 2% in terms of margin of victory. This means that polls provided a really large amount of information.
(I do think that Selzer's polls in particular are overrated, and I will try to articulate that case more carefully if I get around to a longer response.)
My sense is that the polls were heavily reweighted by demographics, rather than directly sampling from the population. That said, I welcome your nitpicks, even if brief.
Think I disagree especially strongly with #6. Of all the reasons to think Musk might be a genius, him going all in on 60/40 odds is definitely not one of them. Especially since he could probably have got an invite to Mar-a-Lago and President Trump's ear on business and space policy with a small donation and a generic "love Donald's plans to make American business great again" endorsement, and been able to walk it right back again whenever the political wind was blowing the other way. I don't think he's spent his time and much of his fortune to signal-boost catturd tweets out of calm calculation of which way the political wind was blowing.
The biggest and highest-profile donor to the winning side last time round didn't do too well out of it either, and he probably did think he was being clever and calculating.
(Lifelong right-winger Thiel's "I think it's 50/50 who will win but my contrarian view is I also don't think it'll be close" was a great way of hedging his bets, on the other hand!)
Polymarket beat legacy institutions at processing information, in real time and in general. It was just much faster at calling states, and more confident earlier on the correct outcome.
How much of that do you think was about what the legacy institutions knew vs. what they publicly communicated? The Polymarket hive mind doesn't necessarily care about things like maintaining democratic institutions (like not making calls that could influence elections elsewhere with still-open polls) or long-term individual reputation (like having to walk the Florida call back in 2000). I don't see those as weaknesses.
If you have {publicly competent, publicly incompetent, privately incompetent, privately competent}, we get some information that screens off publicly competent. That leaves the narrower set {publicly incompetent, privately incompetent, privately competent}. So it's still an update. I agree that in this case there is some room for doubt though. Depending on which institutions we are thinking of (Democratic party, newspapers, etc.), we also get some information from the speed at which people decided on/learned that Biden was going to step down.
I also frankly don't think they're necessarily as interested in making speedy decisions based on county-level polls, which is exactly the sort of real-time stats-checking you'd expect prediction market enthusiasts to be great at.
(obviously trad media does report on county level polls and ultimately use them to decide if they're happy to call a race, but they don't have much incentive to be first and they're tracking and checking a lot of other stuff like politico commentary and rumours and relevance of human interest stories and silly voxpops)
Throw in the fact that early counts are sometimes skewed in favour of one candidate, which changes around as later voters or postal votes or ballot boxes from further-out districts within a county get tallied up. This varies according to jurisdiction rules and demographics and voting trends, and it's possible serious Polymarket bettors were extremely clued up on them. But this time around, they'd have been just as right about state-level outcomes if much of the money had moved on the relatively naive assumption that you couldn't expect any late swings, and not factoring in that possibility would be bad calibration.
...as a small case study, the Effective Altruism forum has been impoverished over the last few years by not being lenient with valuable contributors when they had a bad day.
In a few cases, I later learnt that some longstanding user had a mental health breakdown/psychotic break/bipolar something or other. To some extent this is an arbitrary category, and you can interpret going outside normality through the lens of mental health, or through the lens of "this person chose to behave inappropriately". Still, my sense is that leniency would have been a better move when people go off the rails.
In particular, the best move seems to me to be a combination of:
In the short term, when a valued member is behaving uncharacteristically badly, stop them from posting
Follow up a week or a few weeks later to see how the person is doing
Two factors here are:
There is going to be some overlap in that people with propensity for some mental health disorders might be more creative, better able to see things from weird angles, better able to make conceptual connections.
In a longstanding online community, people grow to care about others. If a friend goes off the rails, there is the question of how to stop them from causing harm to others, but there is also the question of how to help them be ok, and the second one can just dominate sometimes.
I don't think not banning users for first offences is necessarily the highest bar I want to reach for. For instance, consider this comment. Like, to exaggerate this a bit, imagine receiving that comment in one of the top 3 worst moments of your life.
I value a forum where people are not rude to each other, so I think it is good that moderators give out a warning to people who are becoming increasingly rude in a short time.
At some point in the 90s or the 00s, the "whole of person" concept became popular in the US Natsec community for security clearance matters.
It distinguishes between a surface level vibe from a person, and trying to understand the whole person. The surface level vibe is literally taking the worst of a person and taking it out of context, whereas the whole person concept is making any effort at all to evaluate the person and the odds that they're good to work with and on what areas. Each subject has their own cost-benefit analysis in the context of different work they might do, and more flexible people (e.g. younger) and weirder people will probably have cost-benefit analyses that change somewhat over time.
In environments where evaluators are incompetent, lack the resources needed to evaluate each person, or believe that humans can't be evaluated, there's a reasonable justification to rule people out without making an effort to optimize.
Otherwise, evaluators should strive to make predictions and minimize the gap between their predictions of whether a subject will cause harm again and the reality that comes to pass; for example, putting in any effort at all to distinguish between individuals causing harm due to mental health, individuals causing harm due to mistakes arising from unpreventable ignorance (e.g. the PauseAI movement), mistakes caused by ignorance that should have been preventable, harm caused by malice correctly attributed to the subject, harm caused by someone spoofing the point of origin, and harm caused by a hostile individual, team, or force covertly using SOTA divide-and-conquer tactics to disrupt or sow discord in an entire org, movement, or vulnerable clique; see Conflict vs mistake theory.
Reasons why upvotes on the EA forum and LW don't correlate that well with impact:
More easily accessible content, or more introductory material gets upvoted more.
Material which gets shared more widely gets upvoted more.
Content which is more prone to bikeshedding gets upvoted more.
Posts which are beautifully written are more upvoted.
Posts written by better known authors are more upvoted (once you've seen this, you can't unsee).
The time at which a post is published affects how many upvotes it gets.
Other random factors, such as whether other strong posts are published at the same time, also affect the number of upvotes.
Not all projects are conducive to having a post written about them.
The function from value to upvotes is concave (e.g., like a logarithm or like a square root), in that a project which results in a post with 100 upvotes is probably more than 5 times as valuable as a project which results in a post with 20 upvotes. This is what you'd expect if the supply of upvotes was limited.
Upvotes suffer from inflation as the EA forum gets more populated, so that a post which would have gathered 50 upvotes two years ago might gather 100 upvotes now.
Upvotes may not take into account the relationship between projects, or other indirect effects. For example, projects which contribute to existing agendas are probably more valuable than otherwise equal standalone projects, but this might not be obvious from the text.
I agree that the correlation between number of upvotes on EA forum and LW posts/comments and impact isn't very strong. (My sense is that it's somewhere between weak and strong, but not very weak or very strong.) I also agree that most of the reasons you list are relevant.
But how I'd frame this is that - for example - a post being more accessible increases the post's expected upvotes even more than it increases its expected impact. I wouldn't say "Posts that are more accessible get more upvotes, therefore the correlation is weak", because I think increased accessibility will indeed increase a post's impact (holding other factors constant).
Same goes for many of the other factors you list.
E.g., more sharing tends to both increase a post's impact (more readers means more opportunity to positively influence people) and signal that the post would have a positive impact on each reader (as that is one factor - among many - in whether people share things). So the mere fact that sharing probably tends to increase upvotes to some extent doesn't necessarily weaken the correlation between upvotes and impact. (Though I'd guess that sharing does increase upvotes more than it increases/signals impact, so this comment is more like a nitpick than a very substantive disagreement.)
To make it clear, the claim is that the karma score for a forum post on a project does not correlate well with the project's direct impact? Rather than, say, that the karma score of a post correlates well with the impact of the post itself on the community?
I'd say it also doesn't correlate that well with its total (direct+indirect) impact either, but yes. And I was thinking more in contrast to the karma score being an ideal measure of total impact; I don't have thoughts to share here on the impact of the post itself on the community.
I think that for me, I upvote according to how valuable I think a post itself is for me or for the community as a whole. At least, that's what I'm trying to do when I'm thinking about it logically.
This only includes the top 8 areas. “Other areas” refers to grants tagged “Other areas” in OpenPhil’s database. So there are around $47M in known donations missing from that graph. There is also one (I presume fairly large) donation amount missing from OP’s database, to Impossible Foods.
See also as a tweet and on my blog. Thanks to @tmkadamcz for suggesting I use a bar chart.
One thing I can never figure out is where the missing Open Phil donations are! According to their own internal comms (e.g. this job advert) they gave away roughly $450 million in 2021. Yet when you look at their grants database, you only find about $350 million, which is a fair bit short. Any idea why this might be?
I think it could be something to do with contractor agreements (e.g. they gave $2.8 million to Kurzgesagt and said they don't tend to publish similar contractor agreements). Curious to see the breakdown of the other approx. $100 million though!
We're still in the process of publishing our 2021 grants, so many of those aren't on the website yet. Most of the yet-to-be-published grants are from the tail end of the year — you may have noticed a lot more published grants from January than December, for example.
That accounts for most of the gap. The gap also includes a few grants that are unusual for various reasons (e.g. a grant for which we've made the first of two payments already but will only publish once we've made the second payment a year from now).
We only include contractor agreements in our total giving figures if they are conceptually very similar to grants (Kurzgesagt is an example of this). Those are also the contractor agreements we tend to publish. In other words, an agreement that isn't published is very unlikely to show up in our total giving figures.
We publish our giving to political causes just as we publish our other giving (e.g. this ballot initiative).
As with contractor agreements, we publish investments and include them in our total giving if they are conceptually similar to grants (meaning that investments aren't part of the gap James noted). You can see a list of published investments by searching "investment" in our grants database.
The recent EA Forum switch to a Creative Commons license (see here) has brought into relief for me that I am fairly dependent on the EA forum as a distribution medium for my writing.
Arguably I should have done this years ago. I also see this dependence with social media, where a few people I know depend on Twitter, Instagram &co for the distribution of their content & ideas.
{
  posts(input: {
    terms: {
      # view: "top"
      meta: null # this seems to get both meta and non-meta posts
      after: "10-1-2000"
      before: "10-11-2020" # or some date in the future
    }
  }) {
    results {
      title
      url
      pageUrl
      createdAt
    }
  }
}
Copy the output into a last5000posts.txt file.
Search for the keyword "prize". On Linux one can do this with grep "prize" last5000posts.txt, or with grep -B 1 "prize" last5000posts.txt | sed 's/^.*: //' | sed 's/\"//g' > last5000postsClean.txt to produce a cleaner output.
Here are a few estimation-related things that I can be doing:
In-house longtermist estimation: I estimate the value of speculative projects, organizations, etc.
Improving marginal efficiency: I advise groups making specific decisions on how to better maximize expected value.
Building up estimation capacity: I train more people, popularize or create tooling, create templates and acquire and communicate estimation know-how, and make it so that we can “estimate all the things”.
That is, I have in theory been trying to aim directly for the jugular of growing evaluation capacity. I believe that this is valuable because once that capacity exists, it can be applied to estimate the many things for which estimation is currently unfeasible. However, although I buy the argument in the abstract, I have been finding that a bit demotivating. Instead, I would like to be doing something like:
I think I have the strong intuition that producing value in between scaling produces feedback that is valuable and that otherwise can’t be accessed just by aiming for scaling. As a result, I have been trying to do things which also prove valuable in the meantime. This might have been to the slight irritation of my boss, who believes more in going for the jugular directly. Either way, I also think I will experience some tranquility from being more intentional about this piece of strategy.
In emotional terms, things that are aiming solely for scaling—like predicting when a long list of mathematical theorems will be solved—feel “dry”, “unengaging”, “a drag”, “disconnected”, or other such emotional descriptors.
I can think of various things that could change my mind and my intuitions about this topic, such as:
Past examples of people aiming for an abstract goal and successfully delivering
An abstract reason or intuition for why going for scaling is worth the cost of skipping feedback loops
PredictIt: The odds of Trump winning the 2020 elections remain at a pretty constant 50%, oscillating between 45% and 57%.
The Good Judgment Project has a selection of interesting questions, which aren't available unless one is a participant. A sample below (crowd forecast in parenthesis):
Will the UN declare that a famine exists in any part of Ethiopia, Kenya, Somalia, Tanzania, or Uganda in 2020? (60%)
In its January 2021 World Economic Outlook report, by how much will the International Monetary Fund (IMF) estimate the global economy grew in 2020? (Less than 1.5%: 94%, Between 1.5% and 2.0%, inclusive: 4%)
Before 1 July 2020, will SpaceX launch its first crewed mission into orbit? (22%)
Before 1 January 2021, will the Council of the European Union request the consent of the European Parliament to conclude a European Union-United Kingdom trade agreement? (25%)
Will Benjamin Netanyahu cease to be the prime minister of Israel before 1 January 2021? (50%)
Before 1 January 2021, will there be a lethal confrontation between the national military or law enforcement forces of Iran and Saudi Arabia either in Iran or at sea? (20%)
Before 1 January 2021, will a United States Supreme Court seat be vacated? (No: 55%, Yes, and a replacement Justice will be confirmed by the Senate before 1 January 2021: 25%, Yes, but no replacement Justice will be confirmed by the Senate before 1 January 2021: 20%)
Will the United States experience at least one quarter of negative real GDP growth in 2020? (75%)
Who will win the 2020 United States presidential election? (The Republican Party nominee: 50%, The Democratic Party nominee: 50%, Another candidate: 0%)
Before 1 January 2021, will there be a lethal confrontation between the national military forces of Iran and the United States either in Iran or at sea? (20%)
Will Nicolas Maduro cease to be president of Venezuela before 1 June 2020? (10%)
When will the Transportation Security Administration (TSA) next screen two million or more travelers in a single day? (Not before 1 September 2020: 66%, Between 1 August 2020 and 31 August 2020: 17%, Between 1 July 2020 and 31 July 2020: 11%, Between 1 June 2020 and 30 June 2020: 4%, Before 1 June 2020: 2%)
The European Statistical Service is "a partnership between Eurostat and national statistical institutes or other national authorities in each European Union (EU) Member State responsible for developing, producing and disseminating European statistics". In this time of need, the ESS brings us inane information, like "consumer prices increased by 0.1% in March in Switzerland".
COVID: Everyone and their mother have been trying to predict the future of COVID. One such initiative is Epidemic forecasting, which uses inputs from the above mentioned prediction platforms.
The Stanford Social Innovation Review makes the case (archive link) that new, promising interventions are almost never scaled up by already established, big NGOs.
I suppose I just assumed that scale ups happened regularly at big NGOs and I never bothered to look closely enough to notice that it didn't. I find this very surprising.
Here is an excerpt from a draft that didn't really fit in the main body.
Getting closer to expected value calculations seems worth it even if we can't reach them
Because there are many steps between quantification and impact, quantifying the value of quantification might be particularly hard. That said, each step towards getting closer to expected value calculations seems valuable even if we never arrive at expected value calculations. For example:
Quantifying the value of one organization on one metric might be valuable, if the organization aims to do better each year on that metric.
Evaluations might be valuable even if they don’t affect prioritization (e.g., of funds) because they might directly point out areas to improve.
Evaluating organizations within the same cause area is valuable because it allows for prioritization within that cause area.
General quantifications seem valuable even if very uncertain because they could determine which further work to do to become more certain. Evaluations could be action-guiding even if extremely uncertain.
In Exceeding expectations: stochastic dominance as a general decision theory, Christian Tarsney presents stochastic dominance (to be defined) as a total replacement for expected value as a decision theory. He wants to argue that one decision is only rationally better than another when it is stochastically dominant. For this, he needs to say that the choiceworthiness of a decision (how rational it is) is undefined when neither decision stochastically dominates the other.
I think this is absurd, and perhaps determined by academic incentives to produce more eye-popping claims rather than more restricted incremental improvements. Still, I thought that the paper made some good points about us still being able to make decisions even when expected values stop being informative. It was also my introduction to extending rational decision-making to infinite cases, and a great introduction at that. Below, I outline my rudimentary understanding of these topics.
Where expected values fail.
Consider a choice between:
A: 1 utilon with probability ½, 2 utilons with probability ¼th, 4 utilons with probability 1/8th, etc. The expected value of this choice is 1 × ½ + 2 × ¼ + 4 × 1/8 + … = ½ + ½ + ½ + … = ∞
B: 2 utilons with probability ½, 4 utilons with probability ¼th, 8 utilons with probability 1/8th, etc. The expected value of this choice is 2 × ½ + 4 × ¼ + 8 × 1/8 + … = 1 + 1 + 1 + … = ∞
So the expected value of choice A is ∞, as is the expected value of choice B. And yet, B is clearly preferable to A. What gives?
Statewise dominance
Suppose that in the above case, there were different possible states, as if the payoffs for A and B were determined by the same coin throws:
State i: A gets 1, B gets 2
State ii: A gets 2, B gets 4
State iii: A gets 4, B gets 8
State n: A gets 2^(n-1), B gets 2 × 2^(n-1).
Then, in this case, B dominates A in every possible state. This is a reasonable decision principle that we can appeal to in order to ground our decision to choose B over A.
Stochastic dominance
O stochastically dominates P if:
For any payoff x, the probability that O yields a payoff at least as good as x is equal to or greater than the probability that P yields a payoff at least as good as x, and
For some payoff x, the probability that O yields a payoff at least as good as x is strictly greater than the probability that P yields a payoff at least as good as x.
In symbols: ∀x, Probability(Payoff(O) ≥ x) ≥ Probability(Payoff(P) ≥ x); and ∃x such that Probability(Payoff(O) ≥ x) > Probability(Payoff(P) ≥ x).
This captures a notion that O is, in a sense, strictly better than P, probabilistically.
In the case of A and B above, if their payoffs were determined by throwing independent coins:
There is a 100% chance that B yields a payoff ≥ 1, and a 100% chance that A yields a payoff ≥ 1
There is a 100% chance that B yields a payoff ≥ 2, but only a 50% chance that A yields a payoff ≥ 2
There is a 50% chance that B yields a payoff ≥ 4, but only a 25% chance that A yields a payoff ≥ 4
There is a 25% chance that B yields a payoff ≥ 8, but only a 12.5% chance that A does so.
There is a ½^(n-1) chance that B yields a payoff ≥ 2^n, but only a ½^n chance that A does so.
So for any given payoff level, B is at least as likely as A to reach it, and strictly more likely for levels above 1. So in this case, B stochastically dominates A. Stochastic dominance is thus another decision principle that we could reach for to compare choices with infinite expected values.
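As a quick numerical sanity check, here is a minimal sketch in Julia (truncating the infinite lotteries at 20 outcomes, an assumption made purely for illustration) that computes these survival probabilities:

# A pays 2^(k-1) and B pays 2^k, each with probability 2^-k
N = 20
probs     = [2.0^(-k)    for k in 1:N]
payoffs_A = [2.0^(k - 1) for k in 1:N]
payoffs_B = [2.0^k       for k in 1:N]
survival(payoffs, x) = sum(probs[payoffs .>= x])  # P(payoff ≥ x)
for x in (1, 2, 4, 8)
    println("P(A ≥ $x) = ", survival(payoffs_A, x), "   P(B ≥ $x) = ", survival(payoffs_B, x))
end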
Gaps left
The above notions of stochastic and statewise dominance could be expanded and improved. For instance, we could ignore a finite number of comparisons going the other way if the expected value of those options was finite but the expected value of the whole thing was infinite. For instance, in the following comparison:
A: 100 utilons with probability ½, 2 utilons with probability ¼th, 4 utilons with probability 1/8th, etc. The expected value of this choice is 100 × ½ + 2 × ¼ + 4 × 1/8 + … = 50 + ½ + ½ + … = ∞
B: 2 utilons with probability ½, 4 utilons with probability ¼th, 8 utilons with probability 1/8th, etc. The expected value of this choice is 2 × ½ + 4 × ¼ + 8 × 1/8 + … = 1 + 1 + 1 + … = ∞
I would still say that B is preferable to A in that case. And my impression is that there are many similar principles one could reach for in order to resolve many, but not all, comparisons between infinite sequences.
Exercise for the reader: Come up with two infinite sequences which cannot be compared using statewise or stochastic dominance, or similar principles.
You could discount utilons - say there is a “meta-utilon” which is a function of utilons, like maybe meta utilons = log(utilons). And then you could maximize expected metautilons rather than expected utilons. Then I think stochastic dominance is equivalent to saying “better for any non decreasing metautilon function”.
But you could also pick a single metautilon function and I believe the outcome would at least be consistent.
Really you might as well call the metautilons “utilons” though. They are just not necessarily additive.
Monotonic transformations can indeed solve the infinity issue. For example the sum of 1/n doesn’t converge, but the sum of 1/n^2 converges, even though x -> x^2 is monotonic.
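A minimal numeric illustration of this point, using lottery B from the example above and log2 as a hypothetical meta-utilon function (Julia, truncating the series at 60 terms):

# B pays 2^k with probability 2^-k, so its expected payoff diverges,
# but its expected log2-payoff is Σ k·2^-k, which converges to 2
N = 60
sum(k * 2.0^(-k) for k in 1:N)  # ≈ 2.0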
~10-100 Q: The Global Priorities Institute's Research Agenda.
~100-1000+ Q: A New York Times Bestseller on a valuable topic, like Superintelligence (cited 2,651 times), or Thinking Fast and Slow (cited 30,439 times.)
This spans six orders of magnitude (1 to 1,000,000 mQ), but I do find that my intuitions agree with the relative values, i.e., I would probably sacrifice each example for 10 equivalents of the preceding type (and vice-versa).
A unit — even if it is arbitrary or ad-hoc — makes relative comparison easier, because projects can be compared to a reference point, rather than between each other. It also makes working with different orders of magnitude easier: instead of asking how valuable a blog post is compared to a foundational paper, one can move up and down in steps of 10x, which seems much more manageable.
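As a small illustration with hypothetical values (reading mQ as milli-Q, so that 1 Q = 1000 mQ):

# compare each item to the unit and count 10x steps, rather than comparing items directly
values_mQ = Dict("short blog post" => 10, "research agenda" => 50_000)
steps(v) = log10(v)  # number of 10x steps above 1 mQ
steps(values_mQ["research agenda"]) - steps(values_mQ["short blog post"])  # ≈ 3.7 steps of 10x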
Enter CALISTO, a young nobleman who, in the course of his adventures, finds MELIBEA, a young noblewoman, and is bewitched by her appearance.
CALISTO: Your presence, Melibea, exceeds my 99.9th percentile prediction.
MELIBEA: How so, Calisto?
CALISTO: In that the grace of your form, its presentation and concealment, its motions and ornamentation are to me so unforeseen that they make me doubt my eyes, my sanity, and my forecasting prowess. In that if beauty was an alchemical substance, you would have four parts of it for every one part that all the other dames in the realm together have.
MELIBEA: But do go on Calisto, that your flattery is not altogether unpleasant to my ears.
CALISTO: I must not, for I am on an URGENT MISSION on which the fate of the kingdom rests. And yet, even with my utilitarian inclinations, I would grant you ANY BOON IN MY POWER for A KISS, even if my delay would marginally increase the risk to the realm.
Calisto then climbs up to Melibea’s balcony and GETS A KISS. When going down, he LOOKS BACK at Melibea, slips, BREAKS HIS NECK, and DIES.
The way this finally clicked for me was: Sure, Bayesian probability theory is the one true way to do probability. But you can't actually implement it.
In particular, problems I've experienced are:
- I'm sometimes not sure about my calibration in new domains
- Sometimes something happens that I couldn't have predicted beforehand (particularly if it's very specific), and it's not clear what the Bayesian update should be. Note that I'm talking about "something took me completely by surprise" rather than "something happened to which I assigned a low probability"
- I can't actually compute how many bits of evidence new data carries (see the sketch after this list). So for instance I get some new information, and I don't actually just instantaneously know that I was at 12.345% and now I'm at 54.321%. I have to think about it. But before I've thought about it I'm sometimes like a deer in the headlights, and my probability might be "Aaaah, I don't know."
- Sometimes I'll be in an uncertain situation, and yeah, I'm uncertain, but I'd still offer a $10k bet on it. Or I'd offer a smaller bet with a spread (e.g., I'd be willing to bet $100 at 1:99 in favor but 5:95 against). But sometimes I really am just very un-eager to bet.
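For what it's worth, the bits-of-evidence arithmetic for the hypothetical update above (12.345% to 54.321%) is itself simple once the posterior is known; the hard part is getting to the posterior in the first place. A minimal sketch in Julia:

# bits of evidence = log2 of the ratio between posterior odds and prior odds
prior, posterior = 0.12345, 0.54321
odds(p) = p / (1 - p)
log2(odds(posterior) / odds(prior))  # ≈ 3.1 bits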
That said, I do think that people are too eager to say that something is under "Knightian uncertainty" when they could just put up a question on Metaculus (or on a prediction market) about it.
It was recently done, in collaboration with the Manifold guys. I'm also making sure that the dimensions are right, see: <https://github.com/ForumMagnum/ForumMagnum/pull/6096>
The Good Judgement Open forecasting tournament gives a 66% chance for the answer to "Will the UN declare that a famine exists in any part of Ethiopia, Kenya, Somalia, Tanzania, or Uganda in 2020?"
I think that the 66% is a slight overestimate. But nonetheless, if a famine does hit, it would be terrible, as other countries might not be able to spare enough attention due to the current pandemic.
The Tigray region is now seeing armed conflict. I'm at 5-10%+ that it develops into famine (regardless of whether it ends up meeting the rather stringent UN conditions for the term to be used) (but have yet to actually look into the base rate). I've sent an email to FEWs.net to see if they update their forecasts.
In short, from my estimates, I would have to do 70-ish to 280-ish 5km runs, which would take me between half a year and a bit over two years. But my gut feeling is telling me that it would take me twice as long, say, between a year and four.
I came up with that estimate because I was recently doing some exercise and I didn’t like the machine’s calorie loss calculations, so I rolled some calculations of my own, in Squiggle.
Feeding this calculation into Squiggle tells me that I spent 1410 calories, as opposed to the 500-ish that I get from the machine, or similar quantities from random internet sites. The site linked even has a calculator, which is pretty wrong.
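For reference, a back-of-the-envelope version of that 1410 figure, using a weight of ~120 kg (within the 118 to 125 kg range used in the fuller model below):

joules = 120 * 9.81 * 5000  # weight (kg) × gravity (m/s²) × distance (m)
kcal   = joules / 4184      # 1 kcal = 4184 J; ≈ 1407, i.e. the ~1410 calories mentioned above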
Anyways, I also want to add uncertainty, both because
I don’t really know my weight (and, to be honest, I don’t really want to know)
and because I’m not actually displacing my weight in a vacuum—e.g., I’m not pushing myself in a straight line but rather jumping a bit when I run.
But I don’t just want to know the number of calories, I want to know how this relates to weight. This random hackernews comment mentions that 1kg of fat is 7700 calories, and the number is likewise spread across the internet, but I somehow don’t have that much confidence in it. After some searches on Google Scholar, I settle on “calories per kilogram of adipose tissue” as the key term I’m looking for, and I find this paper. The paper mentions that after an initial weight loss period, the body becomes more lethargic, which does intuitively make sense. So because your body adapts to lower caloric consumption (or to higher energy expenditure), these calculations don’t really make sense directly.
Still, I think that they can establish a rough lower bound of how much exercise I would have to do. So, continuing with that model:
// Physical calculations
meters_run = 5k // meters
weight_displaced = 118 to 125 // kilograms
gravity_constant = 9.81 // m/s^2
fudge_factor = mx(1 to 1.05, 1 to 1.1, [0.5, 0.5])
joules_in_run = meters_run * weight_displaced * gravity_constant * fudge_factor
joules_to_calories(joules) = joules / 4184 // <https://en.wikipedia.org/wiki/Calorie>
calories_expended_in_run = joules_to_calories(joules_in_run)
// Fake-ish calculations
calories_to_restrict_or_spend_for_one_kilogram_weight_loss = 5k to 20k // calories per kg
// lower because calculations could be off, or I could become buffed
// higher because of things like lower energy expenditure, increased appetite, etc.
weight_loss_target = 20 // kg
calories_to_kill = weight_loss_target * calories_to_restrict_or_spend_for_one_kilogram_weight_loss
// Algebra over fake numbers
runs_needed = calories_to_kill / calories_expended_in_run
runs_per_week = 2 to 3
years_needed = runs_needed / (runs_per_week * 52)
// Display
{
runs_needed: runs_needed,
years_needed: years_needed
}
This produces a distribution over runs_needed and years_needed. So I would have to run between 70-ish and 280-ish 5km runs to lose 20kg, which would take me from half a year to a bit over two years. And this is if the calorie conversion estimation method can be trusted. In practice, my gut feeling informed by the above is telling me that this is an underestimate, and that I would need somewhere between, say, one and four years to lose that weight, so I’m in for the long haul.
In conversations with people on Twitter and elsewhere, they bring up the following points:
Exercise improves appetite, and as such it’s probably not a great way to lose weight
Reducing calorie intake is “vastly easier” than increasing calorie use
Even a little bit of strength training helps a lot because it increases your resting burn rate.
Excerpt from "Chapter 7: Safeguarding Humanity" of Toby Ord's The Precipice, copied here for later reference. h/t Michael A.
SECURITY AMONG THE STARS?
Many of those who have written about the risks of human extinction suggest that if we could just survive long enough to spread out through space, we would be safe—that we currently have all of our eggs in one basket, but if we became an interplanetary species, this period of vulnerability would end. Is this right? Would settling other planets bring us existential security?
The idea is based on an important statistical truth. If there were a growing number of locations which all need to be destroyed for humanity to fail, and if the chance of each suffering a catastrophe is independent of whether the others do too, then there is a good chance humanity could survive indefinitely.
But unfortunately, this argument only applies to risks that are statistically independent. Many risks, such as disease, war, tyranny and permanently locking in bad values are correlated across different planets: if they affect one, they are somewhat more likely to affect the others too. A few risks, such as unaligned AGI and vacuum collapse, are almost completely correlated: if they affect one planet, they will likely affect all. And presumably some of the as-yet-undiscovered risks will also be correlated between our settlements.
Space settlement is thus helpful for achieving existential security (by eliminating the uncorrelated risks) but it is by no means sufficient. Becoming a multi-planetary species is an inspirational project—and may be a necessary step in achieving humanity’s potential. But we still need to address the problem of existential risk head-on, by choosing to make safeguarding our longterm potential one of our central priorities.
Nitpick: I would have written "this argument only applies to risks that are statistically independent" as "this argument applies to a lesser degree if the risks are not statistically independent, and proportional to their degree of correlation." Space colonization still buys you some risk protection if the risks are not statistically independent but imperfectly correlated. For example, another planet definitely buys you at least some protection from absolute tyranny (even if tyranny in one place is correlated with tyranny elsewhere.)
Sure. So I'm thinking that for impact, you'd have sort of causal factors (scale, importance, relation to other work, etc.). But then you'd also have proxies of impact, things that you intuit correlate well with having an impact even if the relationship isn't causal. For example, having lots of comments praising some project doesn't normally cause the project to have more impact. See here for the kind of thing I'm going for.
Turing is a cool new probabilistic programming language written on top of Julia. Mostly I just wanted to play around with a different probabilistic programming language, and discard the low-probability hypothesis that things that I am currently doing in Squiggle could be better implemented in it.
My thoughts after downloading it and playing with it a tiny bit are as follows:
1. Installation is annoying: the program is pretty heavy, and it requires several steps (you have to install Julia and then Turing as a package; e.g., I had to figure out where in the filesystem to put the Julia binaries).
Node.js installations can also be pretty gnarly (though there is nvm), but Turing doesn’t have an equivalent online playground. My sense is that running Julia online would also be pretty annoying (?).
2. Compilation and running the thing is slow; 9 seconds until I get an error (I hadn’t installed a necessary package), and then 1 min 26 seconds to run their simplest example (!!)
using Turing
using StatsPlots
# Define a simple Normal model with unknown mean and variance.
@model function gdemo(x, y)
s² ~ InverseGamma(2, 3)
m ~ Normal(0, sqrt(s²))
x ~ Normal(m, sqrt(s²))
y ~ Normal(m, sqrt(s²))
end
# Run sampler, collect results
chn = sample(gdemo(1.5, 2), HMC(0.1, 5), 1000)
# Summarise results
describe(chn)
# Plot and save results
p = plot(chn)
savefig("gdemo-plot.png")
This seems like a problem with Julia more generally. Btw, the Julia webpage mentions that Julia “feels like a scripting language”, which seems like a bold-faced lie.
A similar but not equivalent[1] model in Squiggle would run in seconds, and allow for the fast iteration that I know and love:
s = (0.1 to 1)^(1/2) // squiggle doesn't have the inverse gamma function yet
m = normal(0, s)
x = normal(m, s)
y = normal(m, s)
3. Turing is able to do Bayesian inference over parameters, which seems cool & which I intend to learn more about.
It’s probably kind of weird that Squiggle, as a programming language that manipulates distributions, doesn’t allow for Bayesian inference.
4. Turing seems pretty integrated with Julia, and the documentation seems to assume familiarity with Julia. This can have pros and cons, but it made it difficult to just grasp what they are doing.
The pros are that it can use all the Julia libraries, and this looks like it is very powerful
The cons are that it requires familiarity with Julia.
It’s possible that there could be some workflows with Squiggle where we go back and forth between Squiggle and javascript in node; Turing seems like it has that kind of integration down-pat.
5. Turing seems like it could drive some hardcore setups. E.g., here is a project using it to generate election forecasts.
Overall, I dislike the slowness and, as an outsider, the integration with Julia, but I respect the effort. It’s possible but not particularly likely that we may want to first script models in Squiggle and then translate them to a more powerful language like Turing when speed is not a concern and we need capabilities not natively present in Squiggle (like Bayesian inference).
[1] Even if Squiggle had the inverse gamma function, it’s not clear to me that the two programs are doing the same thing, because Turing could be doing something trickier even in that simple example (?). E.g., Squiggle is drawing samples whereas Turing is (?) representing the space of distributions with those parameters. This is something I didn’t understand from the documentation.
Here is a CSS snippet to make the forum a bit cleaner: <https://gist.github.com/NunoSempere/3062bc92531be5024587473e64bb2984>. I also like ea.greaterwrong.com under the brutalist setting and with maximum width.
This git repository outlines three scoring rules that I believe might serve current forecasting platforms better than current alternatives. The motivation behind it is my frustration with scoring rules as used in current forecasting platforms, like Metaculus, Good Judgment Open, Manifold Markets, INFER, and others. In Sempere and Lawsen, we outlined and categorized how current scoring rules go wrong, and I think that the three new scoring rules I propose avoid the pitfalls outlined in that paper. In particular, these new scoring rules incentivize collaboration.
I was also frustrated with the "reciprocal scoring" method recently proposed in Karger et al. It's a method that can be used to resolve questions which may otherwise seem unresolvable or resolve a long time from now. But it resembles a Keynesian Beauty Contest, which means that the forecasters are not incentivized to directly predict reality, but instead to predict the opinion which will be mainstream among forecasters. So I also propose two replacement scoring rules for reciprocal scoring.
I am choosing to publish these scoring rules on GitHub and on arXiv[1] because journals tend to be extractive[2] and time-consuming, and because I am in a position to not care about them. In any case, the three scoring rules are:
Beat the house outlines three small tweaks to the logarithmic scoring rule that make it collaborative and suitable for distributing a fixed amount of reward.
Amplify a Bayesian provides a more speculative alternative to Karger et al.'s reciprocal scoring rule.
Although Amplified Oracle builds upon Beat the house to ensure collaborativeness, I would recommend reading Amplified Oracle first, and then coming back to Beat the house if needed.
Issues (complaints or ideas) or pull requests (tweaks and improvements to our work) are both welcome. I would also like to thank Eli Lifland, Gavin Leech and Misha Yagudin for comments and suggestions, as well as Ezra Karger, SimonM, Jaime Sevilla and others for fruitful discussion.
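For context, here is a minimal sketch of the standard binary logarithmic scoring rule that Beat the house builds on; this is the baseline rule only, not the proposed tweaks (Julia):

# a forecaster who assigned probability p to an event scores log(p) if it happens, log(1 - p) if not
log_score(p, happened::Bool) = happened ? log(p) : log(1 - p)
log_score(0.9, true)   # ≈ -0.105: confident and right
log_score(0.9, false)  # ≈ -2.303: confident and wrong
log_score(0.5, true)   # ≈ -0.693: maximally uncertain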
Taken from this answer, written quickly, might iterate.
As another answer mentioned, I have a forecasting newsletter which might be of interest; maybe going through back-issues and following the links that catch your interest could give you some amount of background information.
For reference works, the Superforecasting book is a good introduction. For the background behind the practice, personally, I would also recommend E.T. Jaynes' Probability Theory, The Logic of Science (find a well-formatted edition, some of the pdfs online are kind of bad), though it's been a hit-or-miss kind of book (some other recommendations can be found in The Best Textbook on every subject thread over on LessWrong.)
As for the why, because knowledge of the world enables control of the world. Leaning into the perhaps-corny badassery, there is a von Neumann quote that goes "All stable processes we shall predict. All unstable processes we shall control". So one can aim for that.
But it's easy to pretend to have models, or to have models that don't really help you navigate the world. And at its best, forecasting enables you to create better models of the world, by discarding the models that don't end up predicting the future and polishing those that do. Other threads that also point to this are "rationality", "good judgment", "good epistemics", "Bayesian statistics".
For a personal example, I have a list of all times I've felt particularly bad, and all the times that I felt all right the next morning. Then I can use Laplace's rule of succession when I'm feeling bad to realize that I'll probably feel ok the next morning.
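A minimal sketch of that calculation, with hypothetical numbers:

# Laplace's rule of succession: after s successes in n trials,
# estimate the probability of success next time as (s + 1) / (n + 2)
laplace(s, n) = (s + 1) / (n + 2)
laplace(30, 33)  # ≈ 0.886: felt alright the next morning after 30 of 33 recorded bad evenings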
But it's also very possible to get into forecasting, or into prediction markets with other goals. For instance, one can go in the making money direction, or in the "high-speed trading" or "playing the market" (predicting what the market will predict) directions. Personally, I do see the appeal of making lots of money, but I dispositionally like the part where I get better models of the world more.
Lastly, I sometimes see people who kind of get into forecasting but don't really make that many predictions, or who are good forecasters aspirationally only. I'd emphasize that even as the community can be quite welcoming to newcomers, deliberate practice is in fact needed to get good at forecasting. For a more wholesome way to put this, see this thread. So good places to start practicing are probably Metaculus (for the community), PredictionBook or a spreadsheet if you want to go solo, or Good Judgment Open if you want the "superforecaster" title.
Summary for myself. Note: Pretty stream-of-thought.
Proving too much
The set of all possible futures is infinite, which somehow breaks some important assumptions longtermists are apparently making.
Somehow this fails to actually bother me
...the methodological error of equating made up numbers with real data
This seems like a cheap/unjustified shot. In the world where we can calculate the expected values, it would seem fine to compare (wide, uncertain) speculative interventions with hardcore GiveWell data (note that the next step would probably be to get more information, not to stop donating to GiveWell charities)
Sometimes, expected utility is undefined (Pasadena game)
The Pasadena game also fails to bother me, because the series hasn't (yet) shown that longtermism bets are "Pasadena-like"
(Also, note that you can use stochastic dominance to solve many expected value paradoxes, e.g., to decide between two universes with infinite expected value, or with undefined expected value.)
...mention of E.T. Jaynes
Yeah, I'm also a fan of E.T. Jaynes, and I think that this is a cheap shot, not an argument.
Subject, Object, Instrument
This section seems confused/bad. In particular, there is a switch from "credences are subjective" to "we should somehow change our credences if this is useful". No, if one's best guess is that "the future is vast in size", then considering that one can change one's opinions to better attain goals doesn't make it stop being one's best guess.
Overall: The core of this section seems to be that expected values are sometimes undefined. I agree, but this doesn't deter me from trying to do the most good by seeking more speculative/longtermist interventions. I can use stochastic dominance when expected utility fails me.
Then, using our figure of one quadrillion lives, the expected good done by Shivani contributing $10,000 to [preventing world domination by a repressive global political regime] would, by the lights of utilitarian axiology, be 100 lives. In contrast, funding for the Against Malaria Foundation, often regarded as the most cost-effective intervention in the area of short-term global health improvements, on average saves one life per $3500. (Nuño: italics and bold from the OP, not from original article)
I agree that the paragraph just intuitively looks pretty bad, so I looked at the context:
Now, the argument we are making is ultimately a quantitative one: that the expected impact one can have on the long-run future is greater than the expected impact one can have on the short run. It’s not true, in general, that options that involve low probabilities of high stakes systematically lead to greater expected values than options that involve high probabilities of modest payoffs: everything depends on the numbers. (For instance, not all insurance contracts are worth buying.) So merely pointing out that one might be able to influence the long run, or that one can do so to a nonzero extent (in expectation), isn’t enough for our argument. But, we will claim, any reasonable set of credences would allow that for at least one of these pathways, the expected impact is greater for the long-run.
Suppose, for instance, Shivani thinks there’s a 1% probability of a transition to a world government in the next century, and that $1 billion of well-targeted grants — aimed (say) at decreasing the chance of great power war, and improving the state of knowledge on optimal institutional design — would increase the well-being in an average future life, under the world government, by 0.1%, with a 0.1% chance of that effect lasting until the end of civilisation, and that the impact of grants in this area is approximately linear with respect to the amount of spending. Then, using our figure of one quadrillion lives to come, the expected good done by Shivani contributing $10,000 to this goal would, by the lights of a utilitarian axiology, be 100 lives. In contrast, funding for Against Malaria Foundation, often regarded as the most cost-effective intervention in the area of short-term global health improvements, on average saves one life per $3500
Yeah, this is in the context of a thought experiment. I'd still do this with distributions rather than with point estimates, but ok.
The Credence Assumption
Ok, so the OP wants to argue that expected value theory breaks => the tool is not useful => we should abandon credences => longtermism somehow fails.
But I think that "My best guess is that I can do more good with more speculative interventions" is fairly robust to that line of criticism; it doesn't stop being my best guess just because credences are subjective.
E.g., if my best guess is that ALLFED does "more good" (e.g., more lives saved in expectation) than GiveWell charities, pointing out that actually the expected value is undefined (maybe the future contains both infinite amounts of flourishing and suffering) doesn't necessarily change my conclusion if I still think that donating to ALLFED is stochastically dominant.
Cox Theorem requires that probabilities be real numbers
The OP doesn't buy that. Sure, a piano is not going to drop on his head, but he might e.g., make worse decisions on account of being overconfident because he has not been keeping track of his (numerical) predictions and thus suffers from more hindsight bias than someone who kept track.
But what alternative do we have?
One can use e.g., upper and lower bounds on probabilities instead of real valued numbers: Sure, I do that. Longtermism still doesn't break.
Instead of relying on explicit expected value calculations, we should rely on evolutionary approaches
The Poverty of Longtermism
"In 1957, Karl Popper proved it is impossible to predict the future of humanity, but scholars at the Future of Humanity Institute insist on trying anyway"
If one takes Toby Ord's x-risk estimates (from here) but adds some uncertainty, one gets this Guesstimate. X-risk ranges from 0.1 to 0.3, with a point estimate of 0.19, or 1 in 5 (vs 1 in 6 in the book).
Current takeaways from the 2024 US election <> forecasting community.
First section in Forecasting newsletter: US elections, posting here because it has some overlap with EA.
I don't have time to write a detailed response now (might later), but wanted to flag that I either disagree or "agree denotatively but object connotatively" with most of these. I disagree most strongly with #3: the polls were quite good this year. National and swing state polling averages were only wrong by 1% in terms of Trump's vote share, or in other words 2% in terms of margin of victory. This means that polls provided a really large amount of information.
(I do think that Selzer's polls in particular are overrated, and I will try to articulate that case more carefully if I get around to a longer response.)
Oh cool, Scott Alexander just said almost exactly what I wanted to say about your #2 in his latest blog post: https://www.astralcodexten.com/p/congrats-to-polymarket-but-i-still
My sense is that the polls were heavily reweighted by demographics, rather than directly sampling from the population. That said, I welcome your nitpicks, even if brief
Think I disagree especially strongly with #6. Of all the reasons to think Musk might be a genius, him going all in on 60/40 odds is definitely not one of them Especially since he could probably have got an invite to Mar-a-Lago and President Trump's ear on business and space policy with a small donation and generic "love Donald's plans to make American business great again" endorsement, and been able to walk it right back again whenever the political wind was blowing the other way. I don't think he's spent his time and much of his fortune to signal boost catturd tweets out of calm calculation of which way the political wind was blowing.
Biggest and highest profile donor to the winning side last time round didn't do too well out of it either, and he probably did think he was being clever and calculating
(Lifelong right winger Thiel's "I think it's 50/50 who will win but my contrarian view is I also don't think it'll be close" was great hedging his bets, on the other hand!) .
How much of that do you think was about what the legacy institutions knew vs. what they publicly communicated? The Polymarket hive mind doesn't necessarily care about things like maintaining democratic institutions (like not making calls that could influence elections elsewhere with still-open polls) or long-term individual reputation (like having to walk the Florida call back in 2000). I don't see those as weaknesses.
If you have {publicly competent, publicly incompetent, privately incompetent, privately competent}, we get some information that screens off publicly competent. That leaves the narrower set {publicly incompetent, privately incompetent, privately competent}. So it's still an update. I agree that in this case there is some room for doubt though. Depending on which institutions we are thinking of (Democratic party, newspapers, etc.), we also get some information from the speed at which people decided on/learned that Biden was going to step down.
I also frankly don't think they're necessarily as interested at making speedy decisions based on county level polls, which is exactly the sort of real time stats checking you'd expect prediction market enthsiasts to be great at
(obviously trad media does report on county level polls and ultimately use them to decide if they're happy to call a race, but they don't have much incentive to be first and they're tracking and checking a lot of other stuff like politico commentary and rumours and relevance of human interest stories and silly voxpops)
Throw in the fact that early counts are sometimes skewed in favour of another candidate which changes around as later voters or postal votes or ballot boxes from further out districts within a county get tallied up. This varies according to jurisdiction rules and demographics and voting trends, and it's possible serious Poly Market betters were extremely clued up on them. But this time around, they'd have been just as right about state level outcomes if much of the money moved on the relatively naive assumption that you couldn't expect any late swings, and not factoring in that possibility would be bad calibration.
Prompted by a different forum:
I'm surprised by this. I don't feel like the Forum bans people for a long time for first offences?
I don't think not banning users for first offences is necessary the highest bar I want to reach for. For instance, consider this comment. Like, to exaggerate this a bit, imagine receiving that comment in one of the top 3 worst moments of your life.
I value a forum where people are not rude to each other, so I think it is good that moderators give out a warning to people who are becoming increasingly rude in a short time.
A factor here is that EA has historically had a mental health problem (https://forum.effectivealtruism.org/posts/FheKNFgPqEsN8Nxuv/ea-mental-health-survey-results-and-analysis?commentId=hJweNWsmJr8Jtki86).
At some point in the 90s or the 00s, the "whole of person" concept became popular in the US Natsec community for security clearance matters.
It distinguishes between a surface-level vibe from a person and trying to understand the whole person. The surface-level vibe is literally taking the worst of a person out of context, whereas the whole-person concept is making any effort at all to evaluate the person and the odds that they're good to work with, and on what areas. Each subject has their own cost-benefit analysis in the context of different work they might do, and more flexible people (e.g. younger) and weirder people will probably have cost-benefit analyses that change somewhat over time.
In environments where evaluators are incompetent, lack the resources needed to evaluate each person, or believe that humans can't be evaluated, there's a reasonable justification to rule people out without making an effort to optimize.
Otherwise, evaluators should strive to make predictions and minimize the gap between their predictions of whether a subject will cause harm again and the reality that comes to pass. That means putting in any effort at all to distinguish between individuals causing harm due to mental health, harm caused by mistakes stemming from unpreventable ignorance (e.g. the pauseAI movement), mistakes caused by ignorance that should have been preventable, harm caused by malice correctly attributed to the subject, harm caused by someone spoofing the point of origin, and harm caused by a hostile individual, team, or force covertly using SOTA divide-and-conquer tactics to disrupt or sow discord in an entire org, movement, or vulnerable clique; see Conflict vs mistake theory.
Reasons why upvotes on the EA Forum and LW don't correlate that well with impact
I agree that the correlation between number of upvotes on EA forum and LW posts/comments and impact isn't very strong. (My sense is that it's somewhere between weak and strong, but not very weak or very strong.) I also agree that most of the reasons you list are relevant.
But how I'd frame this is that - for example - a post being more accessible increases the post's expected upvotes even more than it increases its expected impact. I wouldn't say "Posts that are more accessible get more upvotes, therefore the correlation is weak", because I think increased accessibility will indeed increase a post's impact (holding other factors constant).
Same goes for many of the other factors you list.
E.g., more sharing tends to both increase a post's impact (more readers means more opportunity to positively influence people) and signal that the post would have a positive impact on each reader (as that is one factor - among many - in whether people share things). So the mere fact that sharing probably tends to increase upvotes to some extent doesn't necessarily weaken the correlation between upvotes and impact. (Though I'd guess that sharing does increase upvotes more than it increases/signals impact, so this comment is more like a nitpick than a very substantive disagreement.)
To make it clear, the claim is that the karma of a forum post on a project does not correlate well with the project's direct impact? Rather than, say, that the karma score of a post correlates well with the impact of the post itself on the community?
I'd say it also doesn't correlate that well with its total (direct+indirect) impact either, but yes. And I was thinking more in contrast to the karma score being an ideal measure of total impact; I don't have thoughts to share here on the impact of the post itself on the community.
Thanks, that makes sense.
I think that for me, I upvote according to how much I think a post itself is valuable for me or for the community as a whole. At least, that's what I'm trying to do when I'm thinking about it logically.
Open Philanthropy's allocation by cause area
Open Philanthropy’s grants so far, roughly:
This only includes the top 8 areas. "Other areas" refers to grants tagged "Other areas" in OpenPhil's database. So there are around $47M in known donations missing from that graph. There is also one (I presume fairly large) donation amount missing from OP's database, to Impossible Foods.
See also as a tweet and on my blog. Thanks to @tmkadamcz for suggesting I use a bar chart.
One thing I can never figure out is where the missing Open Phil donations are! According to their own internal comms (e.g. this job advert) they gave away roughly $450 million in 2021. Yet when you look at their grants database, you only find about $350 million, which is a fair bit short. Any idea why this might be?
I think it could be something to do with contractor agreements (e.g. they gave $2.8 million to Kurzgesagt and said they don't tend to publish contractor agreements like these). Curious to see the breakdown of the other approx. $100 million though!
We're still in the process of publishing our 2021 grants, so many of those aren't on the website yet. Most of the yet-to-be-published grants are from the tail end of the year — you may have noticed a lot more published grants from January than December, for example.
That accounts for most of the gap. The gap also includes a few grants that are unusual for various reasons (e.g. a grant for which we've made the first of two payments already but will only publish once we've made the second payment a year from now).
We only include contractor agreements in our total giving figures if they are conceptually very similar to grants (Kurzgesagt is an example of this). Those are also the contractor agreements we tend to publish. In other words, an agreement that isn't published is very unlikely to show up in our total giving figures.
I'm guessing $10M-$50M to something like Impossible Foods, and $50M-$100M to political causes
We publish our giving to political causes just as we publish our other giving (e.g. this ballot initiative).
As with contractor agreements, we publish investments and include them in our total giving if they are conceptually similar to grants (meaning that investments aren't part of the gap James noted). You can see a list of published investments by searching "investment" in our grants database.
I did a variation on this analysis here: https://github.com/RyanCarey/openphil
Any thoughts on why AI keeps expanding then shrinking? Is it due to 2-year grants?
Giant grants for new orgs like CSET and Redwood (but overwhelmingly CSET)
Estimates of how long various evaluations took me (in FTEs)
The recent EA Forum switch to a Creative Commons license (see here) has brought into relief for me that I am fairly dependent on the EA Forum as a distribution medium for my writing.
Partly as a result, I've invested a bit in my blog: <https://nunosempere.com/blog/>, and added comments, an RSS feed, and a newsletter.
Arguably I should have done this years ago. I also see this dependence with social media, where a few people I know depend on Twitter, Instagram &co for the distribution of their content & ideas.
(Edit: added some more thoughts here)
Prizes in the EA Forum and LW.
I was looking at things other people had tried before.
EA Forum
How should we run the EA Forum Prize?
Cause-specific Effectiveness Prize (Project Plan)
Announcing Li Wenliang Prize for forecasting the COVID-19 outbreak
Announcing the Bentham Prize
$100 Prize to Best Argument Against Donating to the EA Hotel
Essay contest: general considerations for evaluating small-scale giving opportunities ($300 for winning submission)
Cash prizes for the best arguments against psychedelics being an EA cause area
Debrief: "cash prizes for the best arguments against psychedelics"
A black swan energy prize
AI alignment prize winners and next round
$500 prize for anybody who can change our current top choice of intervention
The Most Good - promotional prizes for EA chapters from Peter Singer, CEA, and 80,000 Hours
LW (on the last 5000 posts)
Over $1,000,000 in prizes for COVID-19 work from Emergent Ventures
The Dualist Predict-O-Matic ($100 prize)
Seeking suggestions for EA cash-prize contest
Announcement: AI alignment prize round 4 winners
A Gwern comment on the Prize literature
[prize] new contest for Spaced Repetition literature review ($365+)
[Prize] Essay Contest: Cryonics and Effective Altruism
Announcing the Quantified Health Prize
Oops Prize update
Some thoughts on: https://groups.google.com/g/lw-public-goods-team
AI Alignment Prize: Round 2 due March 31, 2018
Quantified Health Prize results announced
FLI awards prize to Arkhipov’s relatives
Progress and Prizes in AI Alignment
Prize for probable problems
Prize for the best introduction to the LessWrong source ($250)
How to replicate.
Go to the EA forum API or to the LW API and input the following query:
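(The query itself isn't reproduced here. For illustration only, a request along the following lines would pull recent post titles; the endpoint is real, but the query and field names are my guess at the ForumMagnum GraphQL schema, not necessarily what was used.)

```python
# Illustrative sketch only -- not the original query. Field names are my guess
# at the ForumMagnum GraphQL schema shared by the EA Forum and LessWrong.
import requests

query = """
{
  posts(input: {terms: {view: "new", limit: 5000}}) {
    results {
      title
      pageUrl
    }
  }
}
"""

# Swap in https://www.lesswrong.com/graphql for the LW version.
resp = requests.post("https://forum.effectivealtruism.org/graphql",
                     json={"query": query})
for post in resp.json()["data"]["posts"]["results"]:
    print(post["title"])
```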
Copy the output into a last5000posts.txt
Search for the keyword "prize". On Linux one can do this with

grep "prize" last5000posts.txt

or with

grep -B 1 "prize" last5000posts.txt | sed 's/^.*: //' | sed 's/\"//g' > last500postsClean.txt

to produce a cleaner output.

Can't believe I forgot the D-Prize, which awards $20,000 USD for teams to distribute proven poverty interventions.
Brief thoughts on my personal research strategy
Sharing a post from my blog: <https://nunosempere.com/blog/2022/10/31/brief-thoughts-personal-strategy/>; I prefer comments there.
Here are a few estimation-related things that I can be doing:
Now, within the Quantified Uncertainty Research Institute I have been kind of trying to do something like this:
That is, I have in theory been trying to directly aim for the jugular of growing evaluation capacity directly. I believe that this is valuable because once that capacity exists, it can be applied to estimate the many things for which estimation is currently unfeasible. However, although I buy the argument in the abstract, I have been finding that a bit demotivating. Instead, I would like to be doing something like:
I think I have the strong intuition that producing value in between scaling produces feedback that is valuable and that otherwise can't be accessed just by aiming for scaling. As a result, I have been trying to do things which also prove valuable in the meantime. This might have been to the slight irritation of my boss, who believes more in going for the jugular directly. Either way, I also think I will experience some tranquility from being more intentional about this piece of strategy.
In emotional terms, things that are aiming solely for scaling—like predicting when a long list of mathematical theorems will be solved—feel “dry”, “unengaging”, “a drag”, “disconnected”, or other such emotional descriptors.
I can think of various things that could change my mind and my intuitions about this topic, such as:
What happened in forecasting in March 2020
Epistemic status: Experiment. Somewhat parochial.
Prediction platforms.
Misc.
The Stanford Social Innovation Review makes the case (archive link) that new, promising interventions are almost never scaled up by already established, big NGOs.
I suppose I just assumed that scale ups happened regularly at big NGOs and I never bothered to look closely enough to notice that it didn't. I find this very surprising.
Here is an excerpt from a draft that didn't really fit in the main body.
Getting closer to expected value calculations seems worth it even if we can't reach them
Because there are many steps between quantification and impact, quantifying the value of quantification might be particularly hard. That said, each step towards getting closer to expected value calculations seems valuable even if we never arrive at expected value calculations. For example:
Infinite Ethics 101: Stochastic and Statewise Dominance as a Backup Decision Theory when Expected Values Fail
First posted on nunosempere.com/blog/2022/05/20/infinite-ethics-101, and written after one too many times encountering someone who didn't know what to do when encountering infinite expected values.
In Exceeding expectations: stochastic dominance as a general decision theory, Christian Tarsney presents stochastic dominance (to be defined) as a total replacement for expected value as a decision theory. He wants to argue that one decision is only rationally better than another when it is stochastically dominant. For this, he needs to say that the choiceworthiness of a decision (how rational it is) is undefined in the case where one decision doesn't stochastically dominate another one.
I think this is absurd, and perhaps determined by academic incentives to produce more eye-popping claims rather than more restricted incremental improvements. Still, I thought that the paper made some good points about us still being able to make decisions even when expected values stop being informative. It was also my introduction to extending rational decision-making to infinite cases, and a great introduction at that. Below, I outline my rudimentary understanding of these topics.
Where expected values fail.
Consider a choice between:
So the expected value of choice A is ∞, as is the expected value of choice B. And yet, B is clearly preferable to A. What gives?
Statewise dominance
Suppose that in the above case, there were different possible states, as if the payoffs for A and B were determined by the same coin throws:
Then in this case, B dominates A in every possible state. This is a reasonable decision principle that we can reach to ground our decision to choose B over A.
Stochastic dominance
O stochastically dominates P if:
- for every outcome x, the probability that O yields an outcome at least as good as x is at least as high as the corresponding probability for P, and
- for at least one outcome x, that probability is strictly higher for O.
or, in math notation:
∀x: Pr(O ≥ x) ≥ Pr(P ≥ x), and ∃x: Pr(O ≥ x) > Pr(P ≥ x)
This captures a notion that O is, in a sense, strictly better than P, probabilistically.
In the case of A and B above, if their payoffs were determined by throwing independent coins:
So the probability that B gets increasingly better outcomes is higher than the probability that A will do so. So in this case, B stochastically dominates A. Stochastic dominance is thus another decision principle that we could reach to compare choices with infinite expected values.
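For concreteness, here is a minimal sketch (my own, in Python, for discrete and finite payoff distributions rather than the infinite cases discussed here) of what checking first-order stochastic dominance looks like:

```python
# Minimal sketch: does distribution O first-order stochastically dominate P?
# Distributions are given as {payoff: probability} dicts. Illustrative only.

def cdf(dist, x):
    """Probability of getting a payoff of at most x."""
    return sum(p for payoff, p in dist.items() if payoff <= x)

def stochastically_dominates(O, P):
    payoffs = sorted(set(O) | set(P))
    # O dominates P if O is at least as likely as P to reach every payoff level,
    # i.e. its CDF is everywhere <= P's, and strictly below it somewhere.
    at_least_as_good = all(cdf(O, x) <= cdf(P, x) for x in payoffs)
    strictly_better = any(cdf(O, x) < cdf(P, x) for x in payoffs)
    return at_least_as_good and strictly_better

# B pays one more util than A under each outcome of an independent fair coin.
A = {1: 0.5, 2: 0.5}
B = {2: 0.5, 3: 0.5}
print(stochastically_dominates(B, A))  # True
```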
Gaps left
The above notions of stochastic and statewise dominance could be expanded and improved. For instance, we could ignore a finite number of comparisons going the other way if the expected value of those options was finite but the expected value of the whole thing was infinite. For example, in the following comparison:
I would still say that B is preferable to A in that case. And my impression is that there are many similar principles one could reach to, in order to resolve many but not all comparisons between infinite sequences.
Exercise for the reader: Come up with two infinite sequences which cannot be compared using statewise or stochastic dominance, or similar principles.
You could discount utilons - say there is a “meta-utilon” which is a function of utilons, like maybe meta utilons = log(utilons). And then you could maximize expected metautilons rather than expected utilons. Then I think stochastic dominance is equivalent to saying “better for any non decreasing metautilon function”.
But you could also pick a single metautilon function and I believe the outcome would at least be consistent.
Really you might as well call the metautilons “utilons” though. They are just not necessarily additive.
A monotonic transformation like log doesn’t solve the infinity issue right?
Time discounting (to get you comparisons between finite sums) doesn’t preserve the ordering over sequences.
This makes me think you are thinking about something else?
Monotonic transformations can indeed solve the infinity issue. For example the sum of 1/n doesn’t converge, but the sum of 1/n^2 converges, even though x -> x^2 is monotonic.
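A quick numerical check of that example (my own, just to make the convergence claim concrete):

```python
# Partial sums of 1/n keep growing without bound, while partial sums of
# (1/n)^2 level off near pi^2/6, roughly 1.645.
for N in (10, 1_000, 100_000):
    print(N, sum(1/n for n in range(1, N + 1)), sum(1/n**2 for n in range(1, N + 1)))
```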
I've been blogging on nunosempere.com/blog/ for the past few months. If you are reading this shortform, you might want to check out my posts there.
Quality Adjusted Research Papers
Taken from here, but I want to be able to refer to the idea by itself.
This spans six orders of magnitude (1 to 1,000,000 mQ), but I do find that my intuitions agree with the relative values, i.e., I would probably sacrifice each example for 10 equivalents of the preceding type (and vice-versa).
A unit — even if it is arbitrary or ad-hoc — makes relative comparison easier, because projects can be compared to a reference point, rather than to each other. It also makes working with different orders of magnitude easier: instead of asking how valuable a blog post is compared to a foundational paper, one can move up and down in steps of 10x, which seems much more manageable.
The Tragedy of Calisto and Melibea
https://nunosempere.com/blog/2022/06/14/the-tragedy-of-calisto-and-melibea/
Enter CALISTO, a young nobleman who, in the course of his adventures, finds MELIBEA, a young noblewoman, and is bewitched by her appearance.
CALISTO: Your presence, Melibea, exceeds my 99.9th percentile prediction.
MELIBEA: How so, Calisto?
CALISTO: In that the grace of your form, its presentation and concealment, its motions and ornamentation are to me so unforeseen that they make me doubt my eyes, my sanity, and my forecasting prowess. In that if beauty was an alchemical substance, you would have four parts of it for every one part that all the other dames in the realm together have.
MELIBEA: But do go on Calisto, that your flattery is not altogether unpleasant to my ears.
CALISTO: I must not, for I am in an URGENT MISSION on which the fate of the kingdom rests. And yet, even with my utilitarian inclinations, I would grant you ANY BOON IN MY POWER for A KISS, even if my delay would marginally increase the risk to the realm.
Calisto then climbs up to Melibea's balcony and GETS A KISS. When going down, he LOOKS BACK at Melibea, slips, BREAKS HIS NECK, and DIES.
The END.
A comment I left on Knightian Uncertainty here:
The way this finally clicked for me was: Sure, Bayesian probability theory is the one true way to do probability. But you can't actually implement it.
In particular, problems I've experienced are:
- I'm sometimes not sure about my calibration in new domains
- Sometimes something happens that I couldn't have predicted beforehand (particularly if it's very specific), and it's not clear what the Bayesian update should be. Note that I'm talking about "something took me completely by surprise" rather than "something happened to which I assigned a low probability"
- I can't actually compute how many bits of evidence new data brings. So for instance I get some new information, and I don't actually just instantaneously know that I was at 12.345% and now I'm at 54.321%. I have to think about it. But before I've thought about it I'm sometimes like a deer in the headlights, and my probability might be "Aaaah, I don't know."
- Sometimes I'll be in an uncertain situation, and yeah, I'm uncertain, but I'd still offer a $10k bet on it. Or I'd offer a smaller bet with a spread (e.g., I'd be willing to bet $100 at 1:99 in favor but 5:95 against). But sometimes I really am just very un-eager to bet.
Some links to dig into this might be: https://www.lesswrong.com/posts/vpvLqinp4FoigqvKy/reflective-bayesianism, https://www.lesswrong.com/posts/xJyY5QkQvNJpZLJRo/radical-probabilism-1
That said, I do think that people are too eager to say that something is under "Knightian uncertainty" when they could just put up a question on Metaculus (or on a prediction market) about it.
Testing embedding arbitrary markets in the EA Forum
Will a Democrat (Warnock) or Republican (Walker) win in Georgia?
Waaaaiiit I didn't know you could do this!
It was recently done, in collaboration with the Manifold guys. I'm also making sure that the dimensions are right, see: <https://github.com/ForumMagnum/ForumMagnum/pull/6096>
It was just pushed!
Coronavirus and Famine
The Good Judgment Open forecasting tournament gives a 66% chance for the answer to "Will the UN declare that a famine exists in any part of Ethiopia, Kenya, Somalia, Tanzania, or Uganda in 2020?"
I think that the 66% is a slight overestimate. But nonetheless, if a famine does hit, it would be terrible, as other countries might not be able to spare enough attention due to the current pandemic.
It is not clear to me what an altruist who realizes that can do, as an individual:
Donating to the World Food Programme, which is already doing work on the matter, might be a promising answer, but I haven't evaluated the programme, nor compared it to other potentially promising options (see here: https://forum.effectivealtruism.org/posts/wpaZRoLFJy8DynwQN/the-best-places-to-donate-for-covid-19, or https://www.againstmalaria.com/)
Did you mean to post this using the Markdown editor? Currently, the formatting looks a bit odd from a reader's perspective.
Ethiopia's Tigray region has seen famine before: why it could happen again - The Conversation Africa
https://theconversation.com/ethiopias-tigray-region-has-seen-famine-before-why-it-could-happen-again-150181
Tue, 17 Nov 2020 13:38:00 GMT
The Tigray region is now seeing armed conflict. I'm at 5-10%+ that it develops into famine (regardless of whether it ends up meeting the rather stringent UN conditions for the term to be used) (but have yet to actually look into the base rate). I've sent an email to FEWs.net to see if they update their forecasts.
How much would I have to run to lose 20 kilograms?
Originally posted on my blog, @ <https://nunosempere.com/blog/2022/07/27/how-much-to-run-to-lose-20-kilograms/>
In short, from my estimates, I would have to run 70-ish to 280-ish 5km runs, which would take me between half a year and a bit over two years. But my gut feeling is telling me that it would take me twice as long, say, between a year and four.
I came up with that estimate because I was recently doing some exercise and I didn't like the machine's calorie loss calculations, so I rolled some calculations of my own, in Squiggle.
So if I feed this into Squiggle, the calculation tells me that I spent 1410 calories, as opposed to the 500-ish that I get from the machine, or similar quantities from random internet sites. The site linked even has a calculator, which is pretty wrong.
Anyways, I also want to add uncertainty, both because
This produces the following calorie estimates:
But I don't just want to know the number of calories, I want to know how this relates to weight. This random Hacker News comment mentions that 1kg of fat is 7700 calories, and the number is likewise spread across the internet, but I somehow don't have that much confidence in it. After some searches on Google Scholar, I settle on "calories per kilogram of adipose tissue" as the key term I'm looking for, and I find this paper. The paper mentions that after an initial weight loss period, the body becomes more lethargic, which does intuitively make sense. So because your body adapts to lower caloric consumption (or to higher energy expenditure), these calculations don't really make sense directly.
Still, I think that they can establish a rough lower bound of how much exercise I would have to do. So, continuing with that model:
This produces the following output:
So I would have to run between 70-ish and 280-ish 5km runs to lose 20kg, which would take me half a year to a bit over two years. And this is if the calorie conversion estimation method can be trusted. In practice, my gut feeling informed by the above is telling me that this is an underestimate, and that I would need somewhere between, say, one and four years to lose that weight, so I'm in for the long haul.
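The Squiggle model itself isn't reproduced above; as a rough stand-in, here is a Monte Carlo sketch in Python with parameter ranges of my own choosing (anchored on the ~500 vs ~1410 calorie-per-session figures mentioned and the oft-cited ~7700 kcal/kg), which lands in the same ballpark but is not the original calculation:

```python
# Rough stand-in for the Squiggle model, not the original calculation.
# Parameter ranges are my own illustrative assumptions.
import random

def runs_needed():
    kcal_per_run = random.uniform(500, 1500)      # machine said ~500, my own calc ~1410
    kcal_per_kg_fat = random.uniform(7000, 8000)  # ~7700 kcal/kg is the commonly cited figure
    kg_to_lose = 20
    return kg_to_lose * kcal_per_kg_fat / kcal_per_run

samples = sorted(runs_needed() for _ in range(10_000))
# Rough 5th, 50th and 95th percentiles of the number of 5km runs needed.
print(samples[500], samples[5_000], samples[9_500])
```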
In conversations with people on Twitter and elsewhere, they bring up the following points:
Excerpt from "Chapter 7: Safeguarding Humanity" of Toby Ord's The Precipice, copied here for later reference. h/t Michael A.
SECURITY AMONG THE STARS?
Many of those who have written about the risks of human extinction suggest that if we could just survive long enough to spread out through space, we would be safe—that we currently have all of our eggs in one basket, but if we became an interplanetary species, this period of vulnerability would end. Is this right? Would settling other planets bring us existential security?
The idea is based on an important statistical truth. If there were a growing number of locations which all need to be destroyed for humanity to fail, and if the chance of each suffering a catastrophe is independent of whether the others do too, then there is a good chance humanity could survive indefinitely.
But unfortunately, this argument only applies to risks that are statistically independent. Many risks, such as disease, war, tyranny and permanently locking in bad values are correlated across different planets: if they affect one, they are somewhat more likely to affect the others too. A few risks, such as unaligned AGI and vacuum collapse, are almost completely correlated: if they affect one planet, they will likely affect all. And presumably some of the as-yet-undiscovered risks will also be correlated between our settlements.
Space settlement is thus helpful for achieving existential security (by eliminating the uncorrelated risks) but it is by no means sufficient. Becoming a multi-planetary species is an inspirational project—and may be a necessary step in achieving humanity’s potential. But we still need to address the problem of existential risk head-on, by choosing to make safeguarding our longterm potential one of our central priorities.
Nitpick: I would have written "this argument only applies to risks that are statistically independent" as "this argument applies to a lesser degree if the risks are not statistically independent, in proportion to their degree of correlation." Space colonization still buys you some risk protection if the risks are not statistically independent but imperfectly correlated. For example, another planet definitely buys you at least some protection from absolute tyranny (even if tyranny in one place is correlated with tyranny elsewhere.)
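To make the independence point concrete, here is a toy model of my own (not Ord's): each settlement faces a fully correlated risk plus an independent local risk, and adding settlements only helps against the latter.

```python
# Toy model (mine, not Ord's): per-period chance that *all* settlements are
# lost, given a fully correlated risk c and an independent per-settlement risk q.
def p_all_lost(n_settlements, c, q):
    return c + (1 - c) * q ** n_settlements

for n in (1, 2, 5, 10):
    print(n, p_all_lost(n, c=0.01, q=0.05))
# The independent component shrinks rapidly with more settlements,
# but the total never drops below the correlated component c.
```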
Here is a more cleaned up — yet still very experimental — version of a rubric I'm using for the value of research:
Expected Impact Per Unit of Resources
See also: Charity Entrepreneurship's rubric, geared towards choosing which charity to start.
I like it! I think that something in this vein could potentially be very useful. Can you expand more about the proxies of impact?
Sure. So I'm thinking that for impact, you'd have sort of causal factors (Scale, importance, relation to other work, etc.). But then you'd also have proxies of impact, things that you intuit correlate well with having an impact even if the relationship isn't causal. For example, having lots of comments praising some project doesn't normally cause the project to have more impact. See here for the kind of thing I'm going for.
Some thoughts on Turing.jl
Originally published on my blog @ <https://nunosempere.com/blog/2022/07/23/thoughts-on-turing-julia/>
Turing is a cool new probabilistic programming language written on top of Julia. Mostly I just wanted to play around with a different probabilistic programming language, and discard the low-probability hypothesis that things that I am currently doing in Squiggle could be better implemented in it.
My thoughts after downloading it and playing with it a tiny bit are as follows:
1. Installation is annoying: the program is pretty heavy, and it requires several steps (you have to install Julia and then Turing as a package), which is annoying (e.g., I had to figure out where in the filesystem to put the Julia binaries).
2. Compilation and running the thing is slow; 9 seconds until I get an error (I hadn’t installed a necessary package), and then 1 min 26 seconds to run their simplest example (!!)
This seems like a problem with Julia more generally. Btw, the Julia webpage mentions that Julia “feels like a scripting language”, which seems like a bold-faced lie.
A similar but not equivalent model in Squiggle would run in seconds, and allow for the fast iteration that I know and love:
3. Turing is able to do Bayesian inference over parameters, which seems cool and which I intend to learn more about.
It’s probably kind of weird that Squiggle, as a programming language that manipulates distributions, doesn’t allow for Bayesian inference.
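For readers unfamiliar with the term, here is a minimal sketch (my own, not Turing's example) of what "Bayesian inference over parameters" means, using a simple grid approximation for a coin's bias:

```python
# Minimal sketch of Bayesian inference over a parameter: infer a coin's bias p
# from observed flips, via grid approximation under a uniform prior.
heads, tails = 7, 3                                   # hypothetical data
grid = [i / 1000 for i in range(1001)]                # candidate values of p
weights = [p**heads * (1 - p)**tails for p in grid]   # uniform prior x likelihood
total = sum(weights)
posterior = [w / total for w in weights]
print(max(zip(posterior, grid))[1])                   # posterior mode, ~0.7
```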
4. Turing seems pretty integrated with Julia, and the documentation seems to assume familiarity with Julia. This can have pros and cons, but made it difficult to just grasp what they are doing.
It’s possible that there could be some workflows with Squiggle where we go back and forth between Squiggle and javascript in node; Turing seems like it has that kind of integration down-pat.
5. Turing seems like it could drive some hardcore setups. E.g., here is a project using it to generate election forecasts.
Overall, I dislike the slowness and, as an outsider, the integration with Julia, but I respect the effort. It's possible but not particularly likely that we may want to first script models in Squiggle and then translate them to a more powerful language like Turing when speed is not a concern and we need capabilities not natively present in Squiggle (like Bayesian inference).
See also:
Here is a css snippet to make the forum a bit cleaner. <https://gist.github.com/NunoSempere/3062bc92531be5024587473e64bb2984>. I also like ea.greaterwrong.com under the brutalist setting and with maximum width.
Better scoring rules
From SamotsvetyForecasting/optimal-scoring:
How to get into forecasting, and why?
Taken from this answer, written quickly, might iterate.
As another answer mentioned, I have a forecasting newsletter which might be of interest, maybe going through back-issues and following the links that catch your interest could give you some amount of background information.
For reference works, the Superforecasting book is a good introduction. For the background behind the practice, personally, I would also recommend E.T. Jaynes' Probability Theory: The Logic of Science (find a well-formatted edition, some of the pdfs online are kind of bad), though it's been a hit-or-miss kind of book (some other recommendations can be found in The Best Textbooks on Every Subject thread over on LessWrong.)
As for the why, because knowledge of the world enables control of the world. Leaning into the perhaps-corny badassery, there is a von Neumann quote that goes "All stable processes we shall predict. All unstable processes we shall control". So one can aim for that.
But it's easy to pretend to have models, or to have models that don't really help you navigate the world. And at its best, forecasting enables you to create better models of the world, by discarding the models that don't end up predicting the future and polishing those that do. Other threads that also point to this are "rationality", "good judgment", "good epistemics", " Bayesian statistics".
For a personal example, I have a list of all times I've felt particularly bad, and all the times that I felt all right the next morning. Then I can use Laplace's rule of succession when I'm feeling bad to realize that I'll probably feel ok the next morning.
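(For reference, Laplace's rule of succession estimates the probability of the next "success" as (s + 1) / (n + 2) after s successes in n trials; a short sketch with made-up numbers:)

```python
# Laplace's rule of succession: after s successes in n trials,
# estimate the probability of success on the next trial as (s + 1) / (n + 2).
def rule_of_succession(successes, trials):
    return (successes + 1) / (trials + 2)

# e.g., if I had felt all right the next morning 18 out of the last 20 times:
print(rule_of_succession(18, 20))  # ~0.86
```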
For a more EA example, see the probability estimates on Shallow evaluations of longtermist organizations, or maybe pathways to impact for forecasting and evaluation for something more abstract.
But it's also very possible to get into forecasting, or into prediction markets with other goals. For instance, one can go in the making money direction, or in the "high-speed trading" or "playing the market" (predicting what the market will predict) directions. Personally, I do see the appeal of making lots of money, but I dispositionally like the part where I get better models of the world more.
Lastly, I sometimes see people who kind of get into forecasting but don't really make that many predictions, or who are good forecasters aspirationally only. I'd emphasize that even as the community can be quite welcoming to newcomers, deliberate practice is in fact needed to get good at forecasting. For a more wholesome way to put this, see this thread. So good places to start practicing are probably Metaculus (for the community), PredictionBook or a spreadsheet if you want to go solo, or Good Judgment Open if you want the "superforecaster" title.
Notes on: A Sequence Against Strong Longtermism
Summary for myself. Note: Pretty stream-of-thought.
Proving too much
Overall: The core of this section seems to be that expected values are sometimes undefined. I agree, but this doesn't deter me from trying to do the most good by seeking more speculative/longtermist interventions. I can use stochastic dominance when expected utility fails me.
The post also takes issue with the following paragraph from The Case For Strong Longtermism:
I agree that the paragraph just intuitively looks pretty bad, so I looked at the context:
Yeah, this is in the context of a thought experiment. I'd still do this with distributions rather than with point estimates, but ok.
The Credence Assumption
The Poverty of Longtermism
If one takes Toby Ord's x-risk estimates (from here), but adds some uncertainty, one gets: this Guesstimate. X-risk ranges from 0.1 to 0.3, with a point estimate of 0.19, or 1 in 5 (vs 1 in 6 in the book).
I personally would add more probability to unforeseen natural risk and unforeseen anthropogenic risk
The uncertainty regarding AI risk is driving most of the overall uncertainty.
2020 U.S. Presidential election to be most expensive in history, expected to cost $14 billion - The Hindu https://www.thehindu.com/news/international/2020-us-presidential-election-to-be-most-expensive-in-history-expected-to-cost-14-billion/article32969375.ece Thu, 29 Oct 2020 03:17:43 GMT
Testing shortform