I work primarily on AI Alignment. My main direction at the moment is to accelerate alignment work via language models and interpretability.
We're doing a hackathon with Apart Research on 26th. I created a list of problem statements for people to brainstorm off of.
Pro-active insight extraction from new research
Reading papers can take a long time and is often not worthwhile. As a result, researchers might read too many papers or almost none. However, there are still valuable nuggets in papers and posts. The issue is finding them. So, how might we design an AI research assistant that proactively looks at new papers (and old) and shares valuable information with researchers in a naturally consumable way? Part of this work involves presenting individual research with what they would personally find valuable and not overwhelm them with things they are less interested in.
How can we improve the LLM experience for researchers?
Many alignment researchers will use language models much less than they would like to because they don't know how to prompt the models, it takes time to create a valuable prompt, the model doesn't have enough context for their project, the model is not up-to-date on the latest techniques, etc. How might we make LLMs more useful for researchers by relieving them of those bottlenecks?
Simple experiments can be done quickly, but turning it into a full project can take a lot of time
One key bottleneck for alignment research is transitioning from an initial 24-hour simple experiment in a notebook to a set of complete experiments tested with different models, datasets, interventions, etc. How can we help researchers move through that second research phase much faster?
How might we use AI agents to automate alignment research?
As AI agents become more capable, we can use them to automate parts of alignment research. The paper "A Multimodal Automated Interpretability Agent" serves as an initial attempt at this. How might we use AI agents to help either speed up alignment research or unlock paths that were previously inaccessible?
How can we nudge research toward better objectives (agendas or short experiments) for their research?
Even if we make researchers highly efficient, it means nothing if they are not working on the right things. Choosing the right objectives (projects and next steps) through time can be the difference between 0x to 1x to +100x. How can we ensure that researchers are working on the most valuable things?
What can be done to accelerate implementation and iteration speed?
Implementation and iteration speed on the most informative experiments matter greatly. How can we nudge them to gain the most bits of information in the shortest time? This involves helping them work on the right agendas/projects and helping them break down their projects in ways that help them make progress faster (and avoiding ending up tunnel-visioned on the wrong project for months/years).
How can we connect all of the ideas in the field?
How can we integrate the open questions/projects in the field (with their critiques) in such a way that helps the researcher come up with well-grounded research directions faster? How can we aid them in choosing better directions and adjust throughout their research? This kind of work may eventually be a precursor to guiding AI agents to help us develop better ideas for alignment research.
As an update to the Alignment Research Assistant I'm building, here is a set of shovel-ready tasks I would like people to contribute to (please DM if you'd like to contribute!):
1. Setup the Continue extension for research: https://www.continue.dev/
2. Data sourcing and management
3. Extract answers to questions across multiple papers/posts (feeds into Continue)
4. Design Autoprompts for alignment research
5. Simulated Paper Reviewer
6. Jargon and Prerequisite Explainer
7. Setup automated "suggestion-LLM"
8. Figure out if we can get a useable browser inside of VSCode (tried quickly with the Edge extension but couldn't sign into the Claude chat website)
9. "Alignment Research Codebase" integration (can add as Continue backend)
Bulk fast content extraction
Personalized Research Newsletter
Discord Bot for Project Proposals
I've created a private discord server to discuss this work. If you'd like to contribute to this project (or might want to in the future if you see a feature you'd like to contribute to) or if you are an alignment/governance researcher who would like to be a beta user so we can iterate faster, please DM me for a link!
Hey everyone, my name is Jacques, I'm an independent technical alignment researcher (primarily focused on evaluations, interpretability, and scalable oversight). I'm now focusing more of my attention on building an Alignment Research Assistant. I'm looking for people who would like to contribute to the project. This project will be private unless I say otherwise.
Side note: I helped build the Alignment Research Dataset ~2 years ago. It has been used at OpenAI (by someone on the alignment team), (as far as I know) at Anthropic for evals, and is now used as the backend for Stampy.ai.
If you are interested in potentially helping out (or know someone who might be!), send me a DM with a bit of your background and why you'd like to help out. To keep things focused, I may or may not accept.
I have written up the vision and core features for the project here. I expect to see it evolve in terms of features, but the vision will likely remain the same. I'm currently working on some of the features and have delegated some tasks to others (tasks are in a private GitHub project board).
I'm also collaborating with different groups. For now, the focus is to build core features that can be used individually but will eventually work together into the core product. In 2-3 months, I want to get it to a place where I know whether this is useful for other researchers and if we should apply for additional funding to turn it into a serious project.
For instance (and to their credit), OpenAI has already committed 20% of their compute secured to date to solving the problem of aligning superintelligent AI systems.
lol
I'm currently trying to think of project/startup ideas in the space of d/acc. If anyone would like to discuss ideas on how to do this kind of work outside of AGI labs, send me a DM.
Note that Entrepreneurship First will be running a cohort of new founders focused on d/acc for AI.
I shared the following as a bio for EAG Bay Area 2024. I'm sharing this here if it reaches someone who wants to chat or collaborate.
Hey! I'm Jacques. I'm an independent technical alignment researcher with a background in physics and experience in government (social innovation, strategic foresight, mental health and energy regulation). Link to Swapcard profile. Twitter/X.
CURRENT WORK
TOPICS TO CHAT ABOUT
POTENTIAL COLLABORATIONS
TYPES OF PEOPLE I'D LIKE TO COLLABORATE WITH
Another data point: I got my start in alignment through the AISC. I had just left my job, so I spent 4 months skilling up and working hard on my AISC project. I started hanging out on EleutherAI because my mentors spent a lot of time there. This led me to do AGISF in parallel.
After those 4 months, I attended MATS 2.0 and 2.1. I've been doing independent research for ~1 year and have about 8.5 more months of funding left.
Hey everyone, in collaboration with Apart Research, I'm helping organize a hackathon this weekend to build tools for accelerating alignment research. This hackathon is very much related to my effort in building an "Alignment Research Assistant."
Here's the announcement post:
2 days until we revolutionize AI alignment research at the Research Augmentation Hackathon!
As AI safety researchers, we pour countless hours into crucial work. It's time we built tools to accelerate our efforts! Join us in creating AI assistants that could supercharge the very research we're passionate about.
Date: July 26th to 28th, online and in-person
Prizes: $2,000 in prizes
Why join?
* Build tools that matter for the future of AI
* Learn from top minds in AI alignment
* Boost your skills and portfolio
We've got a Hackbook with an exciting project to work on waiting for you! No advanced AI knowledge required - just bring your creativity!
Register now: Sign up on the website here, and don't miss this chance to shape the future of AI research!