Very interesting! We had a submission for the evals research sprint in August last year on the same topic. Check it out here: Turing Mirror: Evaluating the ability of LLMs to recognize LLM-generated text (apartresearch.com)
Thank you so much for the talk, Paul! It was exciting to see the vignettes alongside the very practical first case. It will be interesting to see Straumli enter the evaluations scene, since I think you have a solid case for success.
CoI statement: Straumli donated the prize money for the Governance Sprint, though none of it goes to me or Apart, only to the AI safety community.
Thank you for hosting this! I'll repost a question I asked on Asya's retrospective post about the Fund's response times.
our median response time from January 2022 to April 2023 was 29 days, but our current mean (across all time) is 54 days (although the mean is very unstable)
I would love to hear more about the numbers here. For instance, how have the median and mean changed over time? What does the overall distribution look like? The gap between the mean and median suggests there may be significant outliers; how are these outliers handled? I assume many applications are desk-rejected; do you have the median and mean response times for accepted applications?
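To make the mean-versus-median point concrete, here is a minimal Python sketch using invented response times. These are not the Fund's numbers, and the outcome labels `desk_reject`, `accepted`, and `rejected` are assumptions for illustration. It shows how a couple of very slow applications can pull the mean far above the median, and why a per-outcome breakdown would answer the question more directly.

```python
from statistics import mean, median

# Invented response times (days), tagged by outcome. Illustrative only:
# these are NOT the Fund's actual numbers, and the outcome labels are assumptions.
applications = [
    ("desk_reject", 10), ("desk_reject", 12), ("desk_reject", 15),
    ("desk_reject", 20), ("accepted", 29), ("accepted", 35),
    ("rejected", 40), ("accepted", 45), ("rejected", 150), ("accepted", 300),
]


def summarize(times):
    """Return (median, mean) of a list of response times in days."""
    return median(times), mean(times)


def summarize_by_outcome(apps):
    """Group response times by outcome and return {outcome: (median, mean)}."""
    groups = {}
    for outcome, days in apps:
        groups.setdefault(outcome, []).append(days)
    return {outcome: summarize(times) for outcome, times in groups.items()}


all_times = [days for _, days in applications]
overall_median, overall_mean = summarize(all_times)

# Drop the two slowest applications to see how much they drive the mean.
trimmed_median, trimmed_mean = summarize(sorted(all_times)[:-2])

by_outcome = summarize_by_outcome(applications)

print(f"overall: median={overall_median}, mean={overall_mean}")
print(f"without 2 slowest: median={trimmed_median}, mean={trimmed_mean}")
for outcome, (med, avg) in by_outcome.items():
    print(f"{outcome}: median={med}, mean={avg}")
```

With these made-up numbers, the overall median is 32 days while the mean is 65.6. Dropping only the two slowest applications brings the mean down to 25.75, close to the new median of 24.5. The per-outcome split shows desk rejects resolving much faster than acceptances, which is why separate figures for accepted applications would be informative.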
I was incredibly impressed by the quantitative tables in their impact evaluation. After talking with the team, I've seen their strong ability to deliver results, and their evaluation research methods certainly attest to this. This appears to be one of those rare opportunities where donations could have a significant counterfactual impact.
Edit: I am not in any way affiliated with FEM; I happened to meet one of the co-founders on a flight, where we talked about their work.
Thank you for sharing your reflections and for the work you've done on the EA Funds, Asya! I appreciate the role the Funds have played over the past years.
our median response time from January 2022 to April 2023 was 29 days, but our current mean (across all time) is 54 days (although the mean is very unstable)
A few questions arise from your mention of the Funds' response times. I would love to hear more about the numbers here. For instance, how have the median and mean changed over time? What does the overall distribution look like? The gap between the mean and median suggests there may be significant outliers; how are these outliers handled? I assume many applications are desk-rejected; do you have the median and mean response times for accepted applications?
FLI's focus on lethal autonomous weapons systems (LAWS) generally seems like a good and obvious framing for a concrete extinction scenario. Today, a world war would without a doubt involve semi-autonomous drones, alongside a possible near-extinction risk from nuclear weapons.
A similar war in 2050 seems very likely to involve fully autonomous weapons built under a development race, which (absent international treaties) would lead to poor deployment practices and secrecy in development. With such "slaughterbots", there is a chance that malfunction (e.g. misalignment) leads to full eradication. Beyond this, cyberwarfare between agentic AIs might cause broad structural damage and, for that matter, raise the risk of a nuclear war brought about by simple orders given to artificial superintelligences.
The main risks in the other scenarios mentioned in the replies here stem from the fact that we are creating something extremely powerful. Just as a single mishap with a nuke or a car can be extremely damaging, a single mishap (e.g. goal misalignment) with an even more powerful technology can cause damage that is effectively unbounded for humanity.
On top of this, the differences between nuclear and AI technologies make such a mishap significantly more likely. See Yudkowsky's list.
This is a unique, interesting, and simple proposal that I have not yet seen presented in academic form. As you develop the article, you'll of course need to reframe a few sections to introduce the idea, its viability, and the multi-purpose potential of the proposal.
Even though effective enforcement of the policy seems unlikely, it seems like a valuable idea to publish, especially when combined with newer work on GPU monitoring firmware (Shavit, 2023) and your own proposals for mandatory GPU server tracking.
To respond to kpurens' comment: carbon taxation was a non-political issue before it became contentious, and if the lobbying hadn't hit as hard, there would likely have been a better chance of a global carbon tax. At the same time, compute governance seems more enforceable because data centers are centralized.
Answering on behalf of Apart Research!
We're a non-profit research and community-building lab with a strategic focus on high-volume frontier technical research. Apart is currently raising a round to run the lab throughout 2025 and 2026, but here I'll describe what your marginal donation could enable.
In just two years, Apart Research has established itself as a unique and efficient part of the AI safety ecosystem. Our research output includes 13 peer-reviewed papers published since 2023 at top venues including NeurIPS, ICLR, ACL, and EMNLP, with six main conference papers and nine workshop acceptances. Our work has been cited by OpenAI's Superalignment team, and our team members have contributed to significant publications like Anthropic's "Sleeper Agents" paper.
With this track record, we're able to capitalize on our position as an AI safety lab and direct our work toward impactful technical frontiers in governance, research methodology, and AI control.
Beyond accelerating a Lab fellow's research career at an average direct cost of around $3k, supporting research sprint participants for as little as $30 each, and enabling growth at local groups with similarly high impact per dollar, your marginal donation can let us run further impactful projects:
- Improved access to our program ($7k-$25k): A professional revamp of our website and documentation would make our programs and research outputs more accessible to talented researchers worldwide. Building on our establishment as a lab through our paper acceptances, a redesign will help us appeal more to institutional funders and technical professionals, scaling our impact through valuable counterfactual funding and talent discovery. At the higher end, we will also be able to make our internal resources, which are specifically designed to accelerate AI safety technical careers, publicly available.
- Expanded conference attendance support ($20k): Currently, we only support one fellow per team to attend conferences. Additional funding would enable a second team member to attend, at approximately $2k per person (roughly ten additional attendees).
- Improving worldview diversity in AI safety ($10k-$20k): We now work across all continents and find a lot of value in our approach of enabling international and underrepresented professional talent (alongside our work at institutions such as 7 of the top 10 universities). This funding would enable more targeted outreach by Apart and let existing lab members attend conferences to discuss and represent AI safety to otherwise underrepresented professional groups.
- Continuing impactful research projects ($15k-$30k): We will be able to extend timely and critical research projects. For instance, we're looking to port our cyber-evaluations work to Inspect, making it a permanent part of UK AISI catastrophic risk evaluations. Our recent paper also introduces novel methods to test whether LLMs game public benchmarks, and we would like to run the same test on other high-impact benchmarks while making the results more accessible. These projects directly improve AI evaluation methodology, and we see similar opportunities to expand other projects at reasonable follow-up costs.
Donate to Apart Research
You'll be supporting a growing organization: the Apart Lab fellowship already doubled from Q1'24 to Q3'24 (17 to 35 fellows), and our research sprints have brought thousands of people closer to AI safety.
Given current AGI development timelines, the need to scale and improve safety research is urgent. In our view, Apart is one of the better investments for reducing AI risk.
If this sounds interesting and you'd like to hear more (or have a specific marginal project you'd like to see happen), my inbox is open.