All of Jaime Sevilla's Comments + Replies

(speculating) The key property you are looking for IMO is the degree to which people are looking at different information when making forecasts. Models that parcel reality into neat little mutually exclusive packages are more amenable to averaging probabilities, while forecasts that opaquely aggregate information from independent sources will work better with geomeans. 

In any case, this has little bearing on aggregating welfare IMO. You may want to check out geometric rationality as an account that lends itself more to using geometric aggregation of welfare. 

Interesting case. I can see the intuitive case for the median.

I think the mean is more appropriate - in this case, what this is telling you is that your uncertainty is dominated by the possibility of a fat tail, and the priority is ruling it out.

I'd still report both for completeness' sake, and to illustrate the low resilience of the guess.

Very much enjoyed the posts btw

2
Vasco Grilo
3mo
Thanks for the feedback! I am still standing by the median, but it would be nice to empirically investigate similar cases, and see which method performs better!

Amazing achievements Mel! With your support, the group is doing a fantastic job, and I am excited about its direction.

>This has meant that, currently, our wider community lacks a clear direction, so it’s been harder to share resources among sub-groups and to feel part of a bigger community striving for a common goal.

I feel similarly! For the time being, it feels like our community has fragmented into many organizations and initiatives: Ayuda Efectiva, Riesgos Catastróficos Globales, Carreras con Impacto, EAGx LatAm, EA Barcelona. I would be keen on develo... (read more)

2
Melanie Brennan
4mo
Thanks for your support, Jaime!  Yes, I think a team brainstorming session would be useful. The more voices we hear on this, the clearer it will be what needs to happen to improve the situation. Let's discuss this soon in Slack!

I have so many axes of disagreement that it's hard to figure out which one is most relevant. I guess let's go one by one.

Me: "What do you mean when you say AIs might be unaligned with human values?"

I would say that pretty much every agent other than me (and probably me at different times and in different moods) is "misaligned" with me, in the sense that I would not like a world where they get to dictate everything that happens without consulting me in any way.

This is a quibble because in fact I think if many people were put in such a position they would try asking o... (read more)

Our team at Epoch recently updated the org's website.
I'd be curious to receive feedback if anyone has any!
What do you like about the design? What do you dislike?
How can we make it more useful for you?

6
Michael Townsend
6mo
I think it’s a solid improvement! I only occasionally browsed the previous version, but I remember it being a bit tricky to find the headline figures I was interested in after hearing them cited on podcasts, whereas now, going to https://epochai.org/trends, they all seem quite easy to find (plus dig into the details of) due to the intuitive/elegant layout.

Note that the AMF example does not quite work, because if each net has a 0.3% chance of preventing death, and all are independent, then with 330M nets you are >99% sure of saving at least ~988k people.
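For concreteness, here is a quick sanity check of that figure (my own sketch, not part of the original comment), using a normal approximation to the binomial under the stated independence assumption:

```python
from statistics import NormalDist

n, p = 330_000_000, 0.003        # nets, and per-net chance of averting a death
mean = n * p                     # expected deaths averted: 990,000
sd = (n * p * (1 - p)) ** 0.5    # binomial standard deviation: ~993

# 1st percentile of the (approximately normal) total: with >99% probability,
# at least this many deaths are averted
floor_99 = NormalDist(mean, sd).inv_cdf(0.01)
print(round(floor_99))           # roughly 988k, matching the comment's figure
```

Because the standard deviation is tiny relative to the mean, independence makes the total outcome nearly deterministic, which is the point of the objection.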

2
Linch
6mo
As a fairly unimportant side note, I was imagining that some nets have a 0.3% chance of saving some (unusually vulnerable) people, but the average probability (and certainly the marginal probability) is a lot lower. Otherwise $1B to AMF could save ~1M lives, which is significantly more optimistic than the best GiveWell estimates.

Contractualism doesn't allow aggregation across individuals. If each person has a 0.3% chance of averting death with a net, then any one of those individuals' claims is still less strong than the claim of the person who will die with probability ~=1. Scanlon's theory then says save the one person.

An encouraging update: thanks to the generous support of donors, we have raised $95k in funds to support our activities for six more months. During this time, we plan to 1) engage with the EU trilogue on the regulation of foundation models during the Spanish presidency of the EU Council, 2) continue our engagement with policy makers in Argentina and 3) release a report on global risk management in Latin America.

We nevertheless remain funding constrained. With more funding we would be able to launch projects such as:

  1. A report on prioritizing and forecasting
... (read more)

I have gripes with the methodology of the article, but I don't think highlighting the geometric mean of odds over the mean of probabilities is a major fault. The core problem is assuming independence across the predictions at each stage. The right move would have been to aggregate the total P(doom) of each forecaster using the geo mean of odds (not that I think that asking random people and aggregating their beliefs like this is particularly strong evidence).

The intuition pump that if someone assigns a zero percent chance then the geomean aggregate breaks i... (read more)
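To make the two aggregation methods concrete, here is a small illustration (my own sketch with made-up forecasts, not from the comment):

```python
import math

def mean_of_probs(ps):
    return sum(ps) / len(ps)

def geo_mean_of_odds(ps):
    # convert probabilities to odds, take the geometric mean, convert back
    odds = [p / (1 - p) for p in ps]
    g = math.prod(odds) ** (1 / len(odds))
    return g / (1 + g)

forecasts = [0.01, 0.2, 0.5]
print(round(mean_of_probs(forecasts), 3))     # 0.237
print(round(geo_mean_of_odds(forecasts), 3))  # 0.12

# the intuition pump: a single forecaster at exactly 0 forces the
# geomean aggregate to 0, no matter what everyone else says
print(geo_mean_of_odds([0.0, 0.5]))           # 0.0
```

The geomean of odds is pulled more strongly toward the low outlier than the arithmetic mean, which is the behavior the aggregation debate is about.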

Yeah, in hindsight that is probably about right.

It'd be interesting to look at some of these high profile journalists, and see if they are well supported to do impactful journalism or if they have to spend a lot of time on chasing trends to afford working on IJ pieces.

2
RyanCarey
7mo
The annual budgets of Bellingcat and ProPublica are in the single-digit millions. (The latter has had negative experiences with EA donations, but is still relevant for sizing up the space.)

I looked into the impact of investigative journalism briefly a while back.

Here is an estimation of the impact of some of the most high profile IJ success stories I found:

How many investigative journalists exist? It's hard to say, but the International Federation of Journalists has 600k members, so maybe there exist 6M journalists worldwide, of which maybe 10% are investigative journalists (600k IJs). If they are paid around $50k/year, that's $30B used for IJ.

Putting both numbers together, that's $2k to $20k per person affected. Let's say that each person aff... (read more)

-2
AlanGreenspan
7mo
IJs in developing countries, where they're needed the most, probably have a full-time job doing something else, and probably don't make more than $4k/year. In first-world countries, sure, go for a $50k average. Obviously when you focus on something you can bring down the price, but attacking corruption at such a big scale does bring up the cost. IJs aren't enough. Lawyers are needed, but they're all pro bono because they know this work needs to be done.

We're talking billions of people affected. Yes, the numbers can be completely different, because you can always say "Well, there's still corruption in the government, so if the billions of $ of taxes that were being evaded are now funding corrupt policies & corrupt corporations wasting more resources etc...." It's exponential, and it just can't be attributed due to secondary effects. Check out CongoHoldUp.com; there are Bloomberg articles about it, etc. How much corruption withholds funds that could have helped the people in extreme poverty in the world?

These global health interventions mean squat if you're trying to put a bandaid on an open wound. That's what corruption is, a cancer deep down, but if it's too deep for most here to comprehend, let's just call it an open wound. Mosquitos are flying out of this wound and causing malaria; call it a fever, just a symptom. You keep investing/donating and the problem will never be solved, but you pat yourselves on the back: you're saving more lives for less spent. The wound is still open.

So how does that in the end affect your QALYs, IIUC? Saved a life; now the kid needs to survive living in poverty still, to hope he can get a good education and a good job someday, but then spend 30 of his adult years to finally buy a home, because the price of housing in Nigeria increases 100% every year (OK, on average... over 9 years it's increased 700% in many areas). You have France, which exploited African countries for decades; here's one example: Exposing this is just the first step t

Let's say that each person affected gains 0.1 to 10 QALYs for these high profile case

This seems very generous to me. The example here that affected the largest number was about tax evasion by Pakistani MPs; even if previously they had paid literally zero tax, and as a result of this they ended up paying full income tax, it seems implausible to me that the average Pakistani citizen would be willing to trade a year of life for this research. I would guess you are over-valuing this by several orders of magnitude.

It's hard to say, but the International Federation of Journalists has 600k members, so maybe there exists 6M journalists worlwide, of which maybe 10% are investigative journalists (600k IJs). If they are paid like $50k/year, that's $30B used for IJ.

Surely from browsing the internet and newspapers, it's clear that fewer than 1% (<60k) of journalists are "investigative". And I bet that half of the impact comes from an identifiable 200-2k of them, such as former Pulitzer Prize winners, ProPublica, Bellingcat, and a few other venues.

What are some open questions in your mind, for potential GCR priorities you haven't had time to investigate?

1
christian.r
7mo
A few that come to mind:

  • Risk-general/threat-agnostic/all-hazards risk mitigation (see e.g. Global Shield and the GCRMA)
  • "Civil defense" interventions and resilience broadly defined
  • Intrawar escalation management
  • Protracted great power war

What risks do you feel are particularly neglected by the EA community?

What opportunities are you most excited about for GCR mitigation outside the Anglosphere?

4
christian.r
7mo
China and India. Then generally excited about leveraging U.S. alliance dynamics and building global policy advocacy networks, especially for risks from technologies that seem to be becoming cheaper and more accessible, e.g. in synthetic biology

I don't think EAs have any particular edge in managing harassment over the police, and I find it troublesome that they have far higher standards for creating safe spaces, especially in situations where the cost is relatively low, such as inviting the affected members to separate EAG events while things cool down.

On another point, I don't think this was Jeff's intention, but I really dislike the unintentional parallel between an untreated schizophrenic and the CB who asked for the restraining order. I can assure you that this was not the case here, and I thin... (read more)

9
Jeff Kaufman
7mo
I was responding specifically to the claim that hearing that a restraining order has been granted is very informative. I didn't claim that getting one is easy or hard, or that the community health team should have higher or lower thresholds for action. I'm also not trying to say anything either way about the community builder in question, and don't know any more about that situation than I've read in this thread. And specifically, I'm not saying that they are mentally ill or made a report based on hallucinations. Instead, what I'm saying is that because the decision to grant a restraining order is not the product of an investigative process and the amount of evidence necessary is relatively low, learning that one has been granted doesn't provide much evidence.

I agree that this is very valuable. I would want them to be explicit about this role, and be clear to community builders talking to them that they should treat them as if talking to a funder.

To be clear, in the cases where I have felt uncomfortable it was not "X is engaging in sketchy behaviour, and we recommend not giving them funding" (my understanding is that this happens fairly often, and I am glad for it. CHT is providing a very valuable function here, which otherwise would be hard to coordinate. If anything, I would want them to be more brazen and re... (read more)

I echo the general sentiment -- I find the CHT to work diligently and be in most cases compassionate. I generally look up to the people who make it up, and I think they put a lot of thought into their decisions. From my experience, they helped prevent at least three problematic people from accruing power and access to funding in the Spanish-speaking community, and have invested hundreds of hours into steering that sub-community towards what they think is a better direction, including being always available for consultations.

I also think that they undervalue th... (read more)

My overall impression is that the CEA community health team (CHT from now on) is well intentioned but sometimes understaffed and other times downright incompetent. It's hard for me to be impartial here, and I understand that their failures are more salient to me than their successes. Yet I endorse the need for change, at the very least including 1) removing people from the CHT who serve as advisors to any EA funds or hold other conflict-of-interest positions, 2) hiring HR and mental health specialists with credentials, 3) publicly clarifying their role ... (read more)

Catherine from CEA’s Community Health and Special Projects Team here.  I have a different perspective on the situation than Jaime does and appreciate that he noted that “these stories have a lot of nuance to them and are in each case the result of the CHT making what they thought were the best decisions they could make with the tools they had.” 

I believe Jaime’s points 1, 2 and 3 refer to the same conflict between two people. In that situation, I have deep empathy for the several people that have suffered during the conflict. It was (and still is... (read more)

I don't know about the sway the com health team has over decisions at other funders, but at EA Funds my impression is that they rarely give input on our grants, but, when they do, it's almost always helpful. I don't think you'd be concerned by any of the ways in which they've given input - in general, it's more like "a few people have reported this person making them feel uncomfortable so we'd advise against them doing in-person community building" than "we think that this local group's strategy is weak".

I think that whilst there are valid criticisms of th... (read more)

Could you explain what you perceive as the correct remedy in instance #1?

The implication seems like the solution you prefer is having the community member isolated from official community events. But I'm not sure what work "uncomfortable" is doing here. Was the member harassing the community builder? Because that would seem like justification for banning that member. But if the builder is just uncomfortable due to something like a personal conflict, it doesn't seem right to ban the member.

But maybe I'm not understanding what your corrective action would be here?

In service of clear epistemics, I want to flag that the "horror stories" that you are sharing are very open to interpretation. If someone pressured someone else, what does that really mean? It could be a very professional and calm piece of advice, or it could be a repulsive piece of manipulation. Is feeling harassed a state that allows someone to press charges, rather than an actual occurrence of harassment? (Of course, I also understand that due to privacy and mob mentality, you probably don't want to share all the details; totally understandable.)

So maybe ... (read more)

While I don't have an objection to the idea of rebranding the community health team, I want to push back a bit against labelling it as human resources.

HR already has a meaning, and there is relatively little overlap between the function of community health within the EA community and the function of a HR team. I predict it would cause a lot of confusion to have a group labelled as HR which doesn't do the normal things of an HR team (recruitment, talent management, training, legal compliance, compensation, sometimes payroll, etc.) but does do things that ar... (read more)

Could you add more detail regarding: "removing people from the CHT that serve as advisors to any EA funds or have other conflict of interest positions"?

I think people would be more likely to agree if you gave your reasons.

-2
Nathan Young
7mo
While true, I think this discussion is really hard to have. I don't think EA tends to be good at discussing its internal workings. What scandal discussions have gone well?

Factor increase per year is the way we are reporting growth rates by default now in the dashboard.

And I agree it will be better interpreted by the public. On the other hand, multiplying numbers is hard, so it's not as nice for mental arithmetic. And thinking logarithmically puts you in the right frame of mind.

Saying that GPT-4 was trained on 100x more compute than GPT-3 evokes GPT-4 being 100 times better, whereas I think saying it was trained on 2 OOMs more compute gives you a better picture of the expected improvement.

I might be wrong here.

In any case, it is still a better choice than doubling times.
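The conversions between these conventions are simple logarithms; a quick sketch (with an illustrative growth rate, not an Epoch figure):

```python
import math

factor_per_year = 4.0                            # hypothetical: compute growing 4x per year

ooms_per_year = math.log10(factor_per_year)      # ~0.602 OOMs/year
doublings_per_year = math.log2(factor_per_year)  # 2.0 doublings/year
doubling_time = 1 / doublings_per_year           # 0.5 years per doubling

# log-scale rates add across time, while factors multiply:
two_year_factor = factor_per_year ** 2           # 16x over two years
print(ooms_per_year, doublings_per_year, doubling_time, two_year_factor)
```

This is the mental-arithmetic point: to extrapolate over N years you add N copies of a log-scale rate (OOMs/year or doublings/year), whereas factors force you to multiply and doubling times force you to divide.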

My sense is that pure math courses mostly make it easier to do pure math later, and otherwise not make much of a difference. In terms of credentials, I would bet they are mostly equivalent - you can steer your career later through a master or taking jobs in your area of interest.

Statistics seems more robustly useful and transferable no matter what you do later. Statistics degrees also tend to cover the very useful parts of math as well, such as calculus and linear algebra.

Within Epoch there is a months-long debate about how we should report growth rates for certain key quantities such as the amount of compute used for training runs.

I have been an advocate of an unusual choice: orders-of-magnitude per year (abbreviated OOMs/year). Why is that? Let's look at other popular choices.

Doubling times. This has become the standard in AI forecasting, and it's a terrible metric. On the positive side, it is an intuitive metric that both policy makers and researchers are familiar with. But it is absolutely horrid for making calculations. For exa... (read more)

4
Gavin
7mo
What about factor increase per year, reported alongside a second number to show how the increases compose (e.g. the factor increase per decade)? So "compute has been increasing by 1.4x per year, or 28x per decade" or sth. The main problem with OOMs is fractional OOMs, like your recent headline of "0.1 OOMs". Very few people are going to interpret this right, where they'd do much better with "2 OOMs".

Agree that it's easier to talk about (change)/(time) rather than (time)/(change). As you say, (change)/(time) adds better. And agree that % growth rates are terrible for a bunch of reasons once you are talking about rates >50%.

I'd weakly advocate for "doublings per year:" (i) 1 doubling / year is more like a natural unit, that's already a pretty high rate of growth, and it's easier to talk about multiple doublings per year than a fraction of an OOM per year, (ii) there is a word for "doubling" and no word for "increased by an OOM," (iii) I think the ari... (read more)

I wrote a report on quantum cryptanalysis!

The TL;DR is that it doesn't seem like a particular priority, since it is well attended to (NIST has an active post-quantum cryptography program) and not that impactful (there are already multiple proposals to update the relevant protocols, and even if these don't pan out we can rearrange society around them).

I could change my mind if I looked again into the NIST PQC program and learned that they are significantly behind schedule, or if I learned that quantum computing technology has significantly accelerated since.

0
wes R
8mo
Thanks!

I mostly think of alignment as about avoiding deception or catastrophic misgeneralization outside of testing settings.

In general I also believe that AI companies have a massive incentive to align their systems with user intent. You can't profit if you are dead.

This is a tough question, and I don't really know the first thing about patents, so you probably should not listen to me.

My not very meditated first instinct is that one would set up a licensing scheme where you commit to granting a use license to any manufacturer in exchange for a fee, preferably as a portion of the revenue to incentivize experimentation from entrepreneurs.

Of course, this might dissuade the Tech Transfer office or other entrepreneurs from investing in a start-up if they won't be guaranteed exclusive use.

I suppose that it would be relev... (read more)

As far as I can tell they are enabled - I see there is a cookie in storage for the intercom for example

Just a quick comment for the devs - I saw the "More posts like this" lateral bar and it felt quite jarring. I liked it way better when it was at the end. Having it randomly in the middle of a post felt distracting and puzzling. 

ETA: the Give feedback button does not seem to work either. Also, its purpose is unclear (give feedback on the post selection? on the feature?)

2
JP Addison
8mo
I've recorded the feedback, thank you! The anticipation that some might find this distracting was the motivation for the feedback button, which makes me concerned to hear that it's not working for you. Could I ask you to check your cookies to see if you've enabled functional cookies? (See the link in the second paragraph.)

A pattern I've recently encountered twice in different contexts is that I would recommend a more junior (but with relevant expertise) staff for a talk and I've been asked if I could not recommend a more "senior" profile instead.

I thought it was with the best of intentions, but it still rubbed me the wrong way. First, this diminishes the expertise of my staff, who have demonstrated mastery over their subject matter. Second, this systematically denies young people of valuable experience and exposure. You cannot become a "senior" profile without access to these speaking opportunities!

Please give speaking opportunities to less experienced professionals too!

4
Joseph Lemien
9mo
A clarifying question: when you recommend someone for a talk, is this in the context of recommending that person A speak with person B (as in building a network), or more in the context of giving a presentation (like a TEDx conference)?
3
zchuang
10mo
What's your rate of success after pushback? Do organisations usually take the more junior person as a speaker?
4
Quadratic Reciprocity
10mo
When people do this, do you think they mostly want someone with more skills or knowledge or someone with better, more prestigious credentials?

It's really hard to say.

I am concerned that many projects are being set up in a way that does not follow good governance practices and leaves their members exposed to conflict and unpleasant situations. I would want to encourage people to try to embed themselves in a more formal structure with better resources and HR management. For example, I would be excited about community builders over the world setting a non-profit governance structure, where they are granted explicit rights as workers that encompass mental healthcare and a competent HR department to ... (read more)

7
NickLaing
10mo
Thanks for this reflection, appreciate it!

I'm pretty into the idea of putting more effort into concrete areas.

I think the biggest reason for this is one which is not in your list: it is too easy to bikeshed EA as an abstract concept and fool yourself into thinking that you are doing something good.

Working on object level issues helps build expertise and makes you more cognizant of reality. Tight feedback loops are important to not delude yourself.

I mean rankings like https://www.metaculus.com/rankings/?question_status=resolved&timeframe=3month

I agree that the potential for this exists, and if it were an extended practice it would be concerning. Have you seen people who claim to have a good forecasting record engage in pseudonym exploitation, though?

My understanding is that most people who claim this have proof records associated with a single pseudonymous user on select platforms (e.g. Metaculus), which evades the problem you suggest.

3
WobblyPanda2
10mo
You couldn't know who is and is not engaging in this behaviour. Anyone with a good forecasting record may have shadow accounts. I'm not familiar with proof records. Could you elaborate further? If this is verification such as identity documents, this could go some way to preventing manipulation.

Ranking the risks is outside the scope of our work. Interpreting the Metaculus questions sounds interesting, though it is not obvious how to disentangle the scenarios that forecasters had in mind. I think the Forecasting Research Institute is doing some related work, surveying forecasters on different risks.

2
EdoArad
11mo
Maybe make it configurable? I'd prefer the regular, colorful, style (because it's easier to understand at a glance)

I basically agree with the core message. I'll go one step further and say that existential risk has unnecessary baggage - as pointed out by Carl Shulman and Elliot Thornley, the Global Catastrophic Risk and CBA framing rescues most of the implications without resorting to fraught assumptions about the long-term future of humanity.

Double-checking research seems really important and neglected. This can be valuable even if you don't rerun the experiments and just try to replicate the analyses.

A couple of years ago, I was hired to review seven econometric papers, and even as an outsider to the field it was easy to contribute by finding flaws and assessing the strength of the papers.

Writing these reviews seems like a great activity, especially for junior researchers who want to learn good research practices while making a substantial contribution.

Donor lotteries have died out? RCG got its seed funding from a donor lottery last December.

I agree! I think this is an excellent opportunity to shape how AI regulation will happen in practice.

We are currently working on a more extensive report with recommendations to execute the EU AI act sandbox in Spain. As part of this process, we are engaging some relevant public stakeholders in Spain with whom we hope to collaborate.

 So far, the most significant barrier to our work is that we are running out of funding. Other than that, access to the relevant stakeholders is proving more challenging than in previous projects, though we are still early ... (read more)

It is specific to human-generated text.

The current soft consensus at Epoch is that data limitations will probably not be a big obstacle to scaling compared to compute, because we expect generative outputs and data efficiency innovation to make up for it.

This is more based on intuition than rigorous research though.

1
more better
1y
Interesting, thanks for answering! 

This post argues that:

  • Bostrom's micromanagement has led to FHI having staff retention problems.
  • Under his leadership, there have been considerable tensions with Oxford University and a hiring freeze.
  • In his racist apology, Bostrom failed to display tact, wisdom and awareness.
  • Furthermore, this apology has created a breach between FHI and its closest collaborators and funders.
  • Both the mismanagement of staff and the tactless apology caused researchers to resign

While I'd love for FHI staff to comment and add more context, all of this matches my impressi... (read more)

EJT
1y

Using 'coherence theorems' with a meaning that is as standard as any, and explaining that meaning within two sentences, seems fine to me.

My quick take after skimming: I  am quite confused about this post.
Of course the VNM theorem IS a coherence theorem.
How... could it not be a coherence theorem?

It tells you that actors following four intuitive properties can be represented as utility maximisers. We can quibble about the properties, but the result sounds important regardless for understanding agency!

The same reasoning could be applied to argue that Arrow's Impossibility Theorem is Not Really About Voting. After all, we are just introducing all these assumptions about what good voting looks like!

EJT
1y

I would have hoped you reached the second sentence before skimming! I define what I mean (and what I take previous authors to mean) by 'coherence theorems' there.

Not central to the argument, but I feel someone should be linking here to Garrabrant's rejection of the independence axiom, which is fairly compelling IMO.

I'd personally err towards different subsections rather than different tabs, but glad to see you experimenting to help EA focus on more object level issues!

If you want to support work in other contexts, Riesgos Catastróficos Globales is working on improving GCR management in Spain and Latin America.

I believe this project can improve food security in nuclear winter (tropical countries are very promising as last-resort global food producers), biosecurity vigilance (the recent H5N1 episode happened in Spain and there are some easy improvements to biosec in LatAm)  and potentially AI policy in Spain.

Funding is very constrained, we currently have runway until May, and each $10k extends the runway by one month... (read more)

1
vincentweisser
1y
Thanks for sharing! The website isn't working for me. Is there a deeper writeup or independent evaluation of the org and effort?

FWIW here are a few pieces of uninformed evidence about Atlas Fellowship. This is scattered, biased and unfair; do not take it seriously.

  1. I have a lot of faith in Jonas Vollmer as a leader of the project, and stories like Habryka's tea table make me think that he is doing a good job of overseeing the project expenses
  2. I have heard other rumours in SF about outrageous expenses like a $100k statue (this sounds ridiculous so I probably misheard?) or spending a lot of money on buying and reforming a venue
  3. I have also heard rumours about a carefree attitude tow
... (read more)

I don't think it's impossible - you could start from Halperin et al.'s basic setup [1] and plug in some numbers about p(doom), the long-run growth rate, etc. and get a market opinion.

I would also be interested in seeing the analysis of hedge fund experts and others. In our cursory lit review we didn't come across any which was readily quantifiable (would love to learn if there is one!).

[1] https://forum.effectivealtruism.org/posts/8c7LycgtkypkgYjZx/agi-and-the-emh-markets-are-not-expecting-aligned-or

I am not sure I follow 100%: is your point that the WBE path is disjunctive from others?

Note that many of the other models are implicitly considering WBE, eg the outside view models.

4
Daniel_Eth
1y
Yeah, my point is that it's (basically) disjunctive.