AI Safety Ideas: A collaborative AI safety research platform

Apart Research; Esben Kran

TL;DR: We present AI Safety Ideas, a platform for AI safety research ideas, in open alpha. Add and explore research ideas on the website here: aisafetyideas.com.

AI Safety Ideas has been accessible for a while in an alpha state (4 months, on-and-off development) and we now publish it in open alpha to receive feedback and develop it continuously with the community of researchers and students in AI safety. All of the projects are either from public sources (e.g. AlignmentForum posts) or posted on the website itself.

The current website represents the first steps towards an accessible crowdsourced research platform for easier research collaboration and hypothesis testing.

The gap in AI safety

Research prioritization & development

Research prioritization is hard and even more so in a pre-paradigmatic field like AI safety. We can grok the highest-karma post on the AlignmentForum but is there another way?

With AI Safety Ideas, we introduce a collaborative way to prioritize and work on specific agendas together through social features. We hope this can become a scalable research platform for AI safety.

Successful examples of similar, if less systematized, collaborative online projects with high-quality output can be seen in Discord servers such as EleutherAI, CarperAI, Stability AI, and Yannic Kilcher’s Discord, as well as in hackathons and in competitions such as the inverse scaling competition.

Additionally, the field is missing empirically driven impact evaluation of AI safety projects. With the next steps of development described further down, we hope to make this easier and more available while facilitating more iteration in AI safety research. Systematized hypothesis testing with bounties can help funders directly fund specific results and enable open evaluation of agendas and research projects.

Mid-career & student newcomers

Novice and entrant participation in AI safety research is mostly present in two forms at the moment: 1) Active or passive part-time course participation with a capstone project (AGISF, ML Safety) and 2) flying to London or Berkeley for three months to participate in full-time paid studies and research (MLAB, SERI MATS, PIBBSS, Refine).

Both are highly valuable but a third option seems to be missing: 3) An accessible, scalable, low time commitment, open research opportunity. Very few people work in AI safety, and enabling decentralized, volunteer or bounty-driven research will allow many more to contribute to this growing field.

By offering this flexible research opportunity, we can attract people who cannot participate in option (2) because of visa, school / life / work commitments, location, rejection, or funding, and we can attract a more senior and active audience than option (1) does.

Next steps

Oct	Releasing and building up the user base and crowdsourced content. Create an insider build to test beta features. Apply to join the insider build here.
Nov	Implementing hypothesis testing features: Creating hypotheses, linking ideas and hypotheses, adding negative and positive results to hypotheses. Creating an email notification system.
Dec	Collaboration features: Contact others interested in the same idea and mentor ideas. A better commenting system with a results comment that can indicate whether the project has been finished, what the results are, and who did it.
Jan	Adding moderation features: Accepting results, moderating hypotheses, admin users. Add bounty features for the hypotheses and a simple user karma system.
Feb	Share with ML researchers and academics in EleutherAI and CarperAI. Implement the ability to create special pages with specific private and public ideas curated for a specific purpose (title and description included). This will help integrate with local events, e.g. the Alignment Jams.
Mar<	Allow editing and save editing history of hypotheses and ideas. Get DOIs for reviewed hypothesis result pages. Implement the EigenKarma karma system. Implement automatic auditing by NLP. Monitor the progress on different clusters of hypotheses and research ideas (research agendas). Release meta-science research on the projects that have come out of the platform and the general progress.
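
As a rough illustration of the hypothesis-testing features in the roadmap above (creating hypotheses, linking them to ideas, attaching positive and negative results, and accepting results through moderation), a minimal data model might look like the sketch below. This is not the platform's actual schema: every class, field, and method name here is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Result:
    """A reported outcome for a hypothesis (hypothetical structure)."""
    author: str
    supports: bool          # True = positive result, False = negative result
    summary: str
    accepted: bool = False  # assumed to be set by a moderator after review


@dataclass
class Hypothesis:
    """A testable claim that can be linked to one or more ideas."""
    title: str
    results: List[Result] = field(default_factory=list)

    def add_result(self, result: Result) -> None:
        self.results.append(result)

    def tally(self, accepted_only: bool = False) -> dict:
        """Count positive and negative results, optionally only accepted ones."""
        considered = [r for r in self.results if r.accepted or not accepted_only]
        positive = sum(1 for r in considered if r.supports)
        return {"positive": positive, "negative": len(considered) - positive}


@dataclass
class Idea:
    """A research idea that groups the hypotheses linked to it."""
    title: str
    hypotheses: List[Hypothesis] = field(default_factory=list)

    def link(self, hypothesis: Hypothesis) -> None:
        # Avoid linking the same hypothesis twice.
        if hypothesis not in self.hypotheses:
            self.hypotheses.append(hypothesis)
```

Keeping results as separate records with an `accepted` flag is one way to let unreviewed and moderator-accepted evidence coexist, so a tally can be shown either way.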

Risks

Wrong incentives on the AI Safety Ideas platform lead to people working on others’ agendas instead of working on their own inside view.
AI Safety Ideas does not receive traction and by extension becomes less useful than would be expected.
Some users who do alignment research without a profound understanding of why alignment is important discover ideas that have the potential to help AI capabilities, without being worried enough about info hazards to contain them properly.
Project bounties on the AI Safety Ideas platform will be occupied by capabilities-first agendas and mislead new researchers.

Risk mitigation

Several of these are not implemented yet but will be as we develop the platform further.

Ensure that specific agendas do not get special attention compared to others and implement incentives to work on new or updated projects and hypotheses. Have structured meetings and feedback sessions with leaders in AI safety field-building and conduct regular research about how the platform is used.
Do regular, live user interviews and ensure giving feedback is quick and easy. We have interviewed 18 users so far and have automated feedback monitoring on our server. We will embed a feedback form directly on the website. Evaluate usefulness of features by creating an insider build.
Restrict themes within AI safety and nudge towards safety thinking in communication. It is also a risk if these capable researchers, who readily grok capabilities, work independently, and we might be able to shift their attitude towards safety by providing this platform.
Ensure vetting of the ideas and users. Make the purpose and policies of the website very clear. Invite admin users based on AlignmentForum karma and give them the ability to downvote ideas, which hides an idea until further evaluation.

Feedback

Give anonymous feedback on the website here or write your feedback in the comments. If you end up using the website, we also appreciate your in-depth feedback here (2-5 min). If you want any of your ideas removed or rephrased on the website, please send an email to operations@apartresearch.com.

PS: It is still very much in alpha and there might be mistakes in the research project descriptions. Please do point out any problems via "Report an issue".

Help out

The platform is open source and we appreciate any pull requests on the insider branch. Add any bugs or feature requests on the issues page.

Apply to join the insider builds here to give feedback for the next versions. Join our Discord to discuss the development.

Thanks to Plex, Maris Sala, Sabrina Zaki, Nonlinear, Thomas Steinthal, Michael Chen, Aqeel Ali, JJ Hepburn, Nicole Nohemi, and Jamie Bernardi.

Geoffrey Miller · Oct 18 2022 · 9

This looks like an interesting platform for sharing ideas about AI safety.

You mention that AI safety is a 'pre-paradigmatic field' -- however, to a newcomer like me, the safety ideas and projects on the AI Safety Ideas site so far look pretty 'paradigmatic', in the sense of closely following the standard EA AGI X-risk paradigm that's centered around the ideas of Yudkowsky, Bostrom, MIRI, utility maximization, instrumental convergence, deep learning, fast takeoff, etc.

I worry that this reflects & encourages a premature convergence onto a governing paradigm that may deter newcomers from contributing new ideas that fall outside the current 'Overton window' of AI alignment. For example: (1) a lot of AI alignment work seems based on a tacit assumption that AI won't become a global catastrophic risk until it reaches AGI level, but I can see reasonable arguments that even quite narrow AI could be severely risky long before AGI is reached. Also, (2) a lot of AI alignment seems focused much more on how to align specific AI systems with specific human users, rather than on how to align human groups that are using AI with other, potentially conflicting groups that are using AI.

So, I guess the question arises: to what extent do you want AI Safety Ideas to elicit new ideas within the current paradigm, versus new ideas that stray outside the current paradigm?

Esben Kran · Oct 18 2022 · 9

You raise a very good point that I agree with. Right now, the platform is definitely biased towards the existing paradigm. This will probably be the case during the first few months, but we hope that it will help make the exploration of new directions and paradigms easier at the same time.

This also raises the point that the ideas currently play into the canon of AI safety instead of drawing on the vast literature outside of AI safety that concerns itself with the same topics but with another framing.

So, to answer your question: we want AISI to make it easier to elicit new ideas in all paradigms and directions, with our personal bias moving it more towards new perspectives as we implement better functionality.

Geoffrey Miller · Oct 19 2022 · 5

Esben -- thanks very much for your reply. That all makes sense -- to develop a gradual broadening-out from the current paradigm to welcoming new perspectives from other existing research traditions.

Aleksi Maunu · Oct 18 2022 · 9

Small note: the title made me think the platform is made by the organization Open AI

FlorentBerthet · Oct 18 2022 · 5

Same. I suggest "AI Safety Ideas: a collaborative AI safety research platform"

Apart Research · Oct 18 2022 · 4

Very true! We have graciously adopted your formulation. Thank you.

Yadav · Oct 18 2022 · 3

Yeah, I thought this too.

aog · Oct 17 2022 · 6

The application form is showing up as private for me. Very cool idea though, the success of Eleuther and Stability suggests that this is a viable model. Excited to see it unfold and hopefully contribute!

Esben Kran · Oct 17 2022 · 5

Thank you! It should be fixed now.

Stephen McAleese · Oct 22 2022 · 2

One way of doing automated AI safety research is for AI safety researchers to create AI safety ideas on aisafetyideas.com and then use the titles as prompts for a language model. Here is GPT-3 generating a response to one of the ideas:

Esben Kran · Oct 25 2022 · 1

Uuh, interesting! Maybe I'll do that as a weekend project for fun. An automatic comment based on the whole idea as a prompt.

Effective Altruism Forum