Chris Leong

Organiser @ AI Safety Australia and NZ
6336 karma · Sydney NSW, Australia



Currently doing local AI safety movement building in Australia and NZ.


It's really hard to know without knowledge of how much a nanny costs, your financial situation and how much you'd value being able to look after your child yourself.

If you'd be fine with a nanny looking after your child, then it is likely worthwhile spending a significant amount of money in order to discover whether you would have a strong fit for alignment research sooner.

I would also suggest that switching out of AI completely was likely a mistake. I'm not suggesting that you should have continued advancing fundamental AI capabilities, but the vast majority of jobs in AI relate to building AI applications rather than advancing fundamental capabilities. Those jobs won't have a significant effect on shortening timelines, but will allow you to further develop your skills in AI.

Another thing to consider: if at some point you decide that you are unlikely to break into technical AI safety research, it may be worthwhile to look at contributing in an auxiliary manner, i.e. through mentorship, teaching or movement-building.

I think you're underrating the risk of capabilities acceleration.

Interesting research. One thing I'd take into account is that talent need is a somewhat limited measure for impact. I expect that there would be decreasing marginal returns as you add more people to the same research direction. So for example, if you already have 100 people doing interpretability research, I expect that they'd already be picking most of the low-hanging fruit, especially if you're adding more iterators. However, this might be worthwhile anyway if you believe that we're in a short-timeline world and that one of the most important things is producing usable research fast.

I'll post some extracts from the commitments made at the Seoul Summit. I can't promise that this will be a particularly good summary, as I was originally just writing this for myself, but maybe it's helpful until someone publishes something more polished:

Frontier AI Safety Commitments, AI Seoul Summit 2024

The major AI companies have agreed to Frontier AI Safety Commitments. In particular, they will publish a safety framework focused on severe risks: "internal and external red-teaming of frontier AI models and systems for severe and novel threats; to work toward information sharing; to invest in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights; to incentivize third-party discovery and reporting of issues and vulnerabilities; to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated; to publicly report model or system capabilities, limitations, and domains of appropriate and inappropriate use; to prioritize research on societal risks posed by frontier AI models and systems; and to develop and deploy frontier AI models and systems to help address the world’s greatest challenges"

"Risk assessments should consider model capabilities and the context in which they are developed and deployed" - I'd argue that the context in which it is deployed should take into account whether it is open or closed source/weights, as open-source/weights models can be subsequently modified.

"They should also be accompanied by an explanation of how thresholds were decided upon, and by specific examples of situations where the models or systems would pose intolerable risk." - always great to make policy concrete

"In the extreme, organisations commit not to develop or deploy a model or system at all, if mitigations cannot be applied to keep risks below the thresholds." - Very important that when this is applied the ability to iterate on open-source/weight models is taken into account

Seoul Declaration for safe, innovative and inclusive AI by participants attending the Leaders' Session

Signed by Australia, Canada, the European Union, France, Germany, Italy, Japan, the Republic of Korea, the Republic of Singapore, the United Kingdom, and the United States of America.

"We support existing and ongoing efforts of the participants to this Declaration to create or expand AI safety institutes, research programmes and/or other relevant institutions including supervisory bodies, and we strive to promote cooperation on safety research and to share best practices by nurturing networks between these organizations" - guess we should now go full-throttle and push for the creation of national AI Safety institutes

"We recognise the importance of interoperability between AI governance frameworks" - useful for arguing we should copy things that have been implemented overseas.

"We recognize the particular responsibility of organizations developing and deploying frontier AI, and, in this regard, note the Frontier AI Safety Commitments." - Important as Frontier AI needs to be treated as different from regular AI.

Seoul Statement of Intent toward International Cooperation on AI Safety Science

Signed by the same countries.

"We commend the collective work to create or expand public and/or government-backed institutions, including AI Safety Institutes, that facilitate AI safety research, testing, and/or developing guidance to advance AI safety for commercially and publicly available AI systems" - similar to what we listed above, but more specifically focused on AI Safety Institutes, which is great.

"We acknowledge the need for a reliable, interdisciplinary, and reproducible body of evidence to inform policy efforts related to AI safety" - Really good! We don't just want AIS Institutes to run current evaluation techniques on a bunch of models, but to be actively contributing to the development of AI safety as a science.

"We articulate our shared ambition to develop an international network among key partners to accelerate the advancement of the science of AI safety" - very important for them to share research with each other

Seoul Ministerial Statement for advancing AI safety, innovation and inclusivity

Signed by: Australia, Canada, Chile, France, Germany, India, Indonesia, Israel, Italy, Japan, Kenya, Mexico, the Netherlands, Nigeria, New Zealand, the Philippines, the Republic of Korea, Rwanda, the Kingdom of Saudi Arabia, the Republic of Singapore, Spain, Switzerland, Türkiye, Ukraine, the United Arab Emirates, the United Kingdom, the United States of America, and the representative of the European Union

"It is imperative to guard against the full spectrum of AI risks, including risks posed by the deployment and use of current and frontier AI models or systems and those that may be designed, developed, deployed and used in future" - considering future risks is a very basic, but core principle

"Interpretability and explainability" - Happy to see interpretability explicitly listed

"Identifying thresholds at which the risks posed by the design, development, deployment and use of frontier AI models or systems would be severe without appropriate mitigations" - important work, but could backfire if done poorly

"Criteria for assessing the risks posed by frontier AI models or systems may include consideration of capabilities, limitations and propensities, implemented safeguards, including robustness against malicious adversarial attacks and manipulation, foreseeable uses and misuses, deployment contexts, including the broader system into which an AI model may be integrated, reach, and other relevant risk factors." - sensible, we need to ensure that the risks of open-sourcing and open-weight models are considered in terms of the 'deployment context' and 'foreseeable uses and misuses'

"Assessing the risk posed by the design, development, deployment and use of frontier AI models or systems may involve defining and measuring model or system capabilities that could pose severe risks," - very pleased to see a focus beyond just deployment

"We further recognise that such severe risks could be posed by the potential model or system capability or propensity to evade human oversight, including through safeguard circumvention, manipulation and deception, or autonomous replication and adaptation conducted without explicit human approval or permission. We note the importance of gathering further empirical data with regard to the risks from frontier AI models or systems with highly advanced agentic capabilities, at the same time as we acknowledge the necessity of preventing the misuse or misalignment of such models or systems, including by working with organisations developing and deploying frontier AI to implement appropriate safeguards, such as the capacity for meaningful human oversight" - this is massive. There was a real risk that these issues were going to be ignored, but this is now seeming less likely.

"We affirm the unique role of AI safety institutes and other relevant institutions to enhance international cooperation on AI risk management and increase global understanding in the realm of AI safety and security." - "Unique role", this is even better!

"We acknowledge the need to advance the science of AI safety and gather more empirical data with regard to certain risks, at the same time as we recognise the need to translate our collective understanding into empirically grounded, proactive measures with regard to capabilities that could result in severe risks. We plan to collaborate with the private sector, civil society and academia, to identify thresholds at which the level of risk posed by the design, development, deployment and use of frontier AI models or systems would be severe absent appropriate mitigations, and to define frontier AI model or system capabilities that could pose severe risks, with the ambition of developing proposals for consideration in advance of the AI Action Summit in France" - even better than above b/c it commits to a specific action and timeline

There have been writings from CEA on movement-building strategy. I think you might find them in the organiser handbook. These likely aren't up to date though, especially since there's a new CEO.

That means people who’ve spent decades building experience in the field will no longer be able to find jobs.

Hot-take: I'd likely be less excited about people with decades in the field vs. new blood given that things seem stuck.

I think you've missed the main con and this is quite a subtle disadvantage that would only arise over longer periods of time.

Hiring people who aren't aligned in terms of values can exert subtle pressure to drift toward the mainstream over time. I know some people are going to say something along the lines of "why should we trust ourselves over other people?" and my answer is that if you don't have a particularly high regard for EA, you should go find a group that you do have particularly high regard for and support their work instead. Life's too short to waste on a group you find to be a bit "meh" and there are a lot of different groups out there.

Titotal argues that we should "have normal people around to provide sanity checks". I agree that it is important to try to not get too caught up in the EA bubble and maintain an understanding of how the rest of the world thinks, but I don't see this as outweighing the costs of introducing a high risk of value drift.

There is some merit to the argument that being value-aligned isn't particularly relevant to particular roles, but it's more complex than that because people's roles can change over time. Let's suppose you hire an employee for role X and they apply to shift to role Y, but you turn them down in favour of an employee who is more value-aligned but less qualified. That's a recipe for internal conflict. In practice, I suspect that there are some roles, such as accountant, where professional skills matter more and the people in them are more likely to be happy sticking to that particular area.

I'd love to hear from other people whether the management/leadership crunch has lessened?

A crash in the stock market might actually increase AI arms races if companies don't feel like they have the option to go slow.
