- Who: anyone! Software engineers will be the primary contributors, of course, but we will offer optional introductory sessions for curious or aspiring developers. You do not have to have attended EAG Bay Area to attend the Hackathon.
- Where: Momentum office at 3004 16th St, just off the 16th St Mission BART Station
- When: Mon, 2/27 from 10am - 7pm
- What: work independently or with collaborators on an EA-aligned project of your choosing
If you would like to share your Hackathon project idea, please leave a comment!
Agenda:
- 10am-10:15 — participants arrive and get set up
- 10:15-10:20 — welcome and logistics talk by Nicole Janeway Bills of EA Software Engineers
- 10:20-10:30 — opening talk by Austin Chen of Manifold Markets on expectations and ways of working for the event
- 10:30-10:45 — project pitches — people with ideas can share them with the group
- 10:45 — start of work and learning sessions
- 12pm — lunch — vegan and nonvegan options
- 6pm — dinner and project presentations
- 6:45-7pm — prize announcements and wrap up
Learning Sessions:
- 10:45 — setting up your development environment
- 11:30 — basics of git
- 1pm — intro to frontend development
- 2pm — open source contributions in AI safety (presentation link to be added later)
Looking forward to seeing you at the event! Add your photos here.
Project Proposal: Understanding how AIs work by watching them think as they play video games. Needs Python developers, possibly C++.
I'd like to extend my current technical alignment project stack in one of a few non-trivial ways and would love help from more experienced software engineers to do it.
- Post: https://www.lesswrong.com/posts/bBuBDJBYHt39Q5zZy/decision-transformer-interpretability
- GitHub: https://github.com/jbloomAus/DecisionTransformerInterpretability
I'm not sure what the spread of technical proficiency is or how interested people will be in assisting with my research agenda, but I've made a list of what I think are solid engineering challenges that I would love help with. Items 1 and 2 are things I can do or manage myself; item 3 is something I'd need assistance with from someone with more experience.
1. Re-implementing bespoke grid worlds, such as the AI Safety Gridworlds, proper mazes, or novel environments, in currently maintained/compatible packages (gymnasium and/or minigrid) to study alignment-relevant phenomena in RL agents/agent simulators.
2. Implementing methods for optimizing inputs (feature visualization) for PyTorch models/MiniGrid environments.
3. Developing a real-time mechanistic interpretability app for procgen games (i.e., extending https://distill.pub/2020/understanding-rl-vision/#feature-visualization to game-time, interactive play with pausing). I have a Streamlit app that does this for grid worlds, which I can demo.
Further Details:
1. The AI Safety Gridworlds suite (https://github.com/deepmind/ai-safety-gridworlds) is more than five years old and is implemented in DeepMind’s pycolab engine (https://github.com/deepmind/pycolab). I’d love to study these environments with the current mechanistic interpretability techniques implemented in TransformerLens and the Decision Transformer Interpretability codebase; however, getting this all working will take time, so it would be great if people were interested in smashing that out. Having proper mazes for agents to solve in MiniGrid would also be interesting, in order to test our ability to reverse engineer algorithms from models using current techniques.
2. Feature visualization techniques aren’t new, but they have previously been applied to continuous input spaces such as the image inputs of CNNs. However, recent work by Jessica Rumbelow (SolidGoldMagikarp post: https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation#Prompt_generation ) has shown that it’s possible to perform this technique on discrete spaces such as word embeddings. Extending this to the discrete environments we have been studying or might study (see item 1) may provide valuable insights. Lucent (Lucid for PyTorch) may also be useful here.
3. The current interactive analysis app for Decision Transformer Interpretability is written in Streamlit and so runs very slowly. This is fine for grid-world-type environments but won’t work for continuous procedurally generated environments like procgen (https://github.com/openai/procgen). Writing a procgen/Python wrapper that provides live model analysis (with the ability to pause mid-game) will be crucial to further work.
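To make item 1 above concrete, here is a dependency-free sketch of a bespoke grid world exposing the gymnasium-style reset/step interface (observation, reward, terminated, truncated, info). All names here are hypothetical; a real port would subclass gymnasium.Env or minigrid's environment base class rather than duck-typing the API:

```python
# Minimal sketch of a bespoke grid world following the gymnasium-style
# reset/step API. Written dependency-free for illustration; a real
# re-implementation would subclass gymnasium.Env or minigrid's MiniGridEnv.

class ToyGridWorld:
    """Agent starts at (0, 0) and must reach the goal in the far corner."""

    # action id -> (row delta, col delta): right, left, down, up
    ACTIONS = {0: (0, 1), 1: (0, -1), 2: (1, 0), 3: (-1, 0)}

    def __init__(self, size=4, max_steps=50):
        self.size = size
        self.max_steps = max_steps
        self.goal = (size - 1, size - 1)

    def reset(self, seed=None):
        self.pos = (0, 0)
        self.steps = 0
        return self.pos, {}  # observation, info

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        # Clamp the move so the agent stays on the grid.
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)
        self.steps += 1
        terminated = self.pos == self.goal
        truncated = self.steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}
```

The same reset/step shape is what TransformerLens-friendly rollouts would consume, so porting pycolab environments mostly means re-expressing their level layouts and reward logic in this interface.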
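For item 2, here is a toy sketch of feature visualization as input optimization: gradient ascent on the input to maximize a unit's activation. The quadratic "neuron" and its hand-derived gradient are invented purely for illustration; in practice you would optimize through a real PyTorch model with autograd (or a library like Lucent), and the discrete-space case would need tricks like those in the SolidGoldMagikarp post:

```python
# Toy illustration of feature visualization as input optimization:
# climb the gradient of a unit's activation with respect to the input.
# The "neuron" below is a made-up quadratic with an analytic gradient;
# a real version would backprop through a PyTorch model instead.

def activation(x, w):
    # Hypothetical unit that fires most strongly when x matches pattern w.
    return -sum((xi - wi) ** 2 for xi, wi in zip(x, w))

def grad_activation(x, w):
    # d/dx_i of the activation above: -2 * (x_i - w_i).
    return [-2.0 * (xi - wi) for xi, wi in zip(x, w)]

def visualize_feature(w, steps=200, lr=0.1):
    """Start from a zero input and take gradient-ascent steps."""
    x = [0.0] * len(w)
    for _ in range(steps):
        g = grad_activation(x, w)
        x = [xi + lr * gi for xi, gi in zip(x, g)]
    return x
```

The recovered input converges to the pattern the unit "prefers", which is exactly the object feature visualization tries to produce for real network units.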
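And for item 3, a minimal sketch of the game loop a live-analysis wrapper needs: step the environment, log the model's activations each step, and support pausing for inspection. The env/agent interfaces here are stubbed assumptions following the gymnasium step shape; in the real app the activations would come from PyTorch forward hooks on the decision transformer, and pausing would be wired to the UI rather than simply breaking the loop:

```python
# Sketch of a live mechanistic-interpretability loop: run the game,
# record per-step activations, allow pausing mid-game. Env and agent
# are assumed interfaces; real activations would be captured with
# PyTorch forward hooks on the model.

class LiveAnalysisLoop:
    def __init__(self, env, agent):
        self.env = env
        self.agent = agent
        self.paused = False
        self.activation_log = []  # one record per step, inspected on pause

    def toggle_pause(self):
        self.paused = not self.paused

    def run(self, max_steps=100):
        obs, _ = self.env.reset()
        for _ in range(max_steps):
            if self.paused:  # a real UI would block here for inspection
                break
            # Assumed agent API: returns both the action and the
            # activations captured during the forward pass.
            action, acts = self.agent.act(obs)
            self.activation_log.append(acts)
            obs, reward, terminated, truncated, _ = self.env.step(action)
            if terminated or truncated:
                break
        return self.activation_log
```

Because the log is built per step, pausing at any point leaves a complete activation history up to the current frame, which is what the interactive procgen analysis needs.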
Feel free to ask questions here!
Thanks for your detailed project idea!!