This is a linkpost for https://neurips2022.mlsafety.org/

We're excited to announce the NeurIPS ML Safety workshop! To our knowledge, it is the first workshop at a top ML conference to emphasize and explicitly discuss x-risks.

X-Risk Analysis Prizes

$100K in paper prizes will be awarded: $50K for best paper awards, and $50K for x-risk analysis, given to researchers who adequately explain how their work relates to AI x-risk. Analyses must engage with existing arguments for existential risks or strategies to reduce them.

What is the topic of the workshop?

Broadly, the focus of the workshop is ML Safety, an umbrella term for research in the following areas:

Robustness: designing systems to be resistant to adversaries.

Monitoring: detecting undesirable behavior and discovering unexpected model functionality.

This category contains interpretability and transparency research, which could be useful for understanding the goals and thought processes of advanced AI systems. It also includes anomaly detection, which has been useful for detecting proxy gaming, and Trojans research, which aims to identify whether a deep neural network will suddenly change behavior if certain unknown conditions are met.

Alignment: building models that represent and safely optimize hard-to-specify human values.

This also includes preventing agents from pursuing unintended instrumental subgoals and designing them to be corrigible.

Systemic Safety: using ML to address broader governance risks related to how ML systems are handled or deployed. Examples include ML for cyberdefense, ML for improving epistemics, and cooperative AI.

How do academic workshops work?

The majority of AI research is published at conferences. These conferences support independently run workshops for research sub-areas. Researchers submit papers to workshops, and if their work is accepted, they are given the opportunity to present it to other participants. For background on the ML research community and its dynamics, see A Bird's Eye View of the ML Field.

Background

A broad overview of these research areas is in Unsolved Problems in ML Safety.

For a discussion of how these problems impact x-risk, please see Open Problems in AI X-Risk.
