James Fodor

1284 karma · Joined Sep 2014 · Melbourne VIC, Australia

Bio

Hi, my name is James Fodor. I am a longtime student and EA organiser from Melbourne. I love science, history, philosophy, and using these to make a difference in the world.

Posts (18)


Benchmark Performance is a Poor Measure of Generalisable AI Reasoning Capabilities

James Fodor · 2mo ago · 29m read

228

Case-control survey of EAGx attendees finds no behavioural or attitudinal changes after six months

James Fodor · 9mo ago · 18m read

Report and Data for EAGxAustralia 2023

2 authors · 1y ago · 1m read

Intrinsic limitations of GPT-4 and other large language models, and why I'm not (very) worried about GPT-n

James Fodor · 2y ago · 14m read

452

The FTX crisis highlights a deeper cultural problem within EA - we don't sufficiently value good governance

James Fodor · 2y ago · 5m read

A Critique of AI Takeover Scenarios

James Fodor · 3y ago · 14m read

Concern about the EA London COVID protocol

James Fodor · 4y ago · 3m read

107

The Fermi Paradox has not been dissolved

James Fodor · 4y ago · 17m read

EAGxAsia-Pacific 2020 Applications Now Open

James Fodor · 5y ago · 1m read

111

Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics

James Fodor · 5y ago · 31m read

Sequences (1)

Critique of Superintelligence

Comments (22)

Benchmark Performance is a Poor Measure of Generalisable AI Reasoning Capabilities

James Fodor · 2mo ago · 2

Hi Toby, thanks for the comment.

I have read about some of the work on tackling the ARC dataset, and I am not at all confident that the approaches which perform well have anything to do with generalisable reasoning. The problem remains that there is no validation that the benchmark measures what it claims to. I don't know what methods o3 used to solve it, but until I do, I won't accept OpenAI's marketing claim that its performance must reflect generalisable reasoning.

As to why we would see inference-time scaling if chain-of-thought were little more than post-hoc rationalisation: this is still an open question, but it seems to be driven partly by increased compute time and token count. I don't have the full answer here, but the evidence we do have strongly cautions against simply assuming that these models are doing what we might describe as 'genuine reasoning'.

Case-control survey of EAGx attendees finds no behavioural or attitudinal changes after six months

James Fodor · 9mo ago · 2

Hi David,

The point I was trying to communicate here was simply that our design was able to find a pattern of differences between the control and treatment groups which is interpretable (i.e. in terms of different ages and career stages). I think this provides some validation of the design: if large enough differences exist, our measures pick them up and we can measure them statistically. We don't, for instance, see an unintelligible mess of results that would cast doubt on the validity of our measures or of the design itself. Of course, as you point out, if the effect of attending the conference is smaller than our design can detect given our sample size, we won't pick it up; for most of our measures that threshold was around 15-20%. But given that we were able to measure sufficiently large effects using this design, I think there is good reason to expect that a similar design with a large enough sample would be able to detect smaller effects, if they exist. Hope that clarifies a bit.
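The link between sample size and the smallest detectable effect can be illustrated with a standard normal-approximation power calculation. This is a minimal sketch, not the analysis used in the survey: it assumes a two-sided, two-sample comparison with equal group sizes, a significance level of 0.05, 80% power, and effects expressed in standard-deviation units (Cohen's d), which may not match how the 15-20% figure above was expressed. The function name and the group sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(n_per_group: int, alpha: float = 0.05, power: float = 0.8) -> float:
    """Smallest standardised mean difference (Cohen's d) detectable with the given power.

    Hypothetical helper using the normal approximation for a two-sided,
    two-sample test with equal group sizes: d = (z_{1-alpha/2} + z_{power}) * sqrt(2 / n).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)  # quantile corresponding to the desired power
    return (z_alpha + z_beta) * sqrt(2 / n_per_group)


# Hypothetical group sizes, purely for illustration.
for n in (25, 100, 400):
    print(n, round(min_detectable_effect(n), 3))
```

Because the detectable effect scales with 1/sqrt(n), quadrupling the sample in each group halves the smallest effect the same design can reliably detect, which is the sense in which a larger sample could pick up smaller effects.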

Does Sam make me want to renounce the actions of the EA community? No. Does your reaction? Absolutely.

James Fodor · 2y ago · 5

I think it is appropriate for the movement to reflect at this time on whether there are systematic problems or failings within the community that might have contributed to this problem. I have publicly argued that there are, and though I might be wrong about that, I do think it's entirely reasonable to explore these issues. I don't think it's reasonable to keep asserting that it was all down to a handful of bad actors and refuse to discuss the possibility of any deeper or broader problems. I like to think that the EA community can learn and grow from this experience.

Wrong lessons from the FTX catastrophe

James Fodor · 2y ago · 5

I disagree that events can't be evidence for or against philosophical positions. If empirical claims about human behaviour, or about how ethical principles operate in the real world, are relevant to the plausibility of competing ethical theories, then I think events can provide evidence for or against philosophical positions. Of course that raises a much broader set of issues and doesn't really detract from the main point of this post, but I thought I would push back on that specific aspect.

Burnout: What is it and how to Treat it.

James Fodor · 2y ago · 1

I love the research-focus of this piece and the lack of waffle. Very impressed.

My reaction to FTX: appalled

James Fodor · 2y ago · 9

"Is it really 'grossly immoral' to do the same thing in crypto without telling depositors?"
Yes

Concern about the EA London COVID protocol

James Fodor · 4y ago · 3

Great point about ventilation. I am not aware of any evidence that hand sanitisation in particular is merely 'safety theater'. Surface transmission may not be the main route of viral spread, but it is still a route, and hand sanitisation is a very simple intervention. Also, to emphasise something I mentioned in the post, masks are definitely not 'safety theater'. It is good to see that the revised COVID protocol now mentions that mask use will be encouraged and masks widely available.

Concern about the EA London COVID protocol

James Fodor · 4y ago · 3

I don't understand how Australia's travel policy is relevant. I'm not asking for anything particularly unusual or onerous; I would simply expect a community of effective altruists to follow WHO guidelines on methods to reduce the spread of COVID. I honestly don't understand the negative reaction.

Concern about the EA London COVID protocol

James Fodor · 4y ago · 1

Thanks Amy, I think these clarifications significantly improve the policy. I disagree with the decision not to mandate masks, but I understand there will be differences in views there. However, mentioning that they are encouraged may be just as effective at ensuring widespread use. That was part of my original concern: I did not feel this aspect of norm-setting was sufficiently evident in the original version of the policy.

DontDoxScottAlexander.com - A Petition

James Fodor · 5y ago · 6

It doesn't seem to me this has much relevance to EA.
