Hide table of contents

TLDR: the AI safety initiative at Georgia Tech recently hosted an AI safety focused track at the college's flagship AI hackathon. In this post I share how it went and some of our thoughts.

Overview

Hey!  I’m Yixiong, co-director of Georgia Tech’s AI safety student group.  We recently hosted an AI safety focused track at Georgia Tech’s biggest AI hackathon, AI ATL.  I’m writing this retrospective because I think this could be a useful data point to update on for fellow AIS groups thinking about hosting similar things!

The track was focused on evaluations on safety-critical and interesting capabilities, this is the track page that was shown to hackers (feel free to reuse/borrow content, just let us know!)

Huge thank you (in no particular order) to Michael Chen, Long Phan, Andrey Anurin, Abdur Raheem, Esben Kran, Zac Hatfield-Dodds, Aaron Begg, Alex Albert, Oliver Zhang, and others who helped us make this happen!

 

Quick stats:

  • ~350 hackers (overall hackathon).
  • 104 projects submitted (overall hackathon).
  • Submissions to our AI safety track: 16 teams (~50 people).
    • 6 projects were solid/relevant, the rest were very noisy submissions, since you could submit to as many tracks as you want to.
  • Estimate # of low/moderate engagement with AI safety (attending workshops, reading track information): 100 hackers.
  • Estimate # of medium/high engagement with AI safety: 20 hackers.
  • These are the submissions that we got, the 6 solid projects were (in order of goodness, keep in mind that most of these came from people who were new to AI safety!):
    • Privacy-Resilience and Adaptability Benchmark (PRAB): puts the model in a realistic and sensitive deployment environment and benchmarks models against several categories of prompting attacks
    • StressTestAI: similar to the above, but in less realistic and ‘higher stakes’ settings like disaster response, but creative metrics.
    • DiALignment: benchmarked refusal after performing activation steering away from the refusal behavior.
    • AgentArena: set up agents in cooperative games (like prisoner’s dilemma) and observed behavior
    • Are you sure about that?: tried to benchmark LLMs’ ability to spot unfaithful CoT against humans (the user)
    • LLM Defense Toolkit: set up a pipeline to benchmark the safety of a user specified LLM with an array of generated attacks.

Relevant track details:

We tried to optimize the track in a bunch of ways, including but not limited to:

  • Competitive prize (cash is per team):
  • Association with big names: we listed Anthropic and Apart Research as supporters
    • Anthropic gave the track a shout out next to their other track “build with Claude”
  • Approachableness: made track description as non-intimidating as possible, providing an abundance of support in the form of mentorship and workshop/speakers
  • Appeal to intellectual curiosity:
    • Background reading for AI safety/evaluation fundamentals ~30 min total
    • Events: about 30 people attended each one.
      • Workshop by Apart research: how to scaffold LLMs, build agents, and run evaluations against them
      • Talk by METR: the case for AI evaluations and governance
      • Talk by CAIS: jailbreak and red-teaming LLMs

Execution

As with all events, execution matters a lot.  This is the area that we felt could be improved the most.

  • Collaboration: do a vibe check if you’re thinking about collaborating with your main hackathon org on campus!
    • We chose to host this as part of a general AI hackathon (rather than standalone) hoping to leverage the main host’s organizing capacity and reach to expose new people to AI safety.  This was a pain for us, mainly because the main hosts never really tried to understand what our track was about (probably faults on both sides). The impression is that it was a chore to deal with us, so make sure they’re on board before collaborating! You shouldn’t over-update on this, we may just have an outlier.
  • The contents of the track were well calibrated for difficulty and perceptiveness, as per feedback from teams that gave our track a shot and properly engaged with the topics.
    • You should try to have them before hacking begins. A complaint is that speakers/workshops take time away from hacking.
    • Great feedback from hackers who read through our materials and gave it a shot
  • Physical presence: I think we could’ve gotten double the number of solid submissions if we had a significant physical presence at the in-person venue.
    • What we did
      • Have mentors available in person and online during office hours
      • Project virtual speaker events onto a screen in a physical room and announce them in person.
    • What we wish we also did
      • Have a booth/table on day 1 of the hackathon when everyone comes to check in.  Give away stickers/merch and pitch our track
      • In person speakers, especially from big name companies.
      • Wear AI safety club merch (although we don’t even have merch…)

Our opinion/takes

  • Hackathon patterns: hackathons are a staple at major technical universities.  These may be well known, but I had never attended a hackathon before this and found these interesting.
    • The BEST time to pitch your track and make announcements is the first day when people come for check in, since everyone is there.
    • Do the convincing (speaker, workshop, etc) before the end of the first day, since people usually decide which track to do/their idea by then.
    • Best time for in-person events (when people will be at the venue): first day during check in and right after food is served…
  • Potential failures modes / most valuable things to improve
    • If you can, try to communicate that popular tools/libraries/frameworks are useful for your track!
      • People want to use their existing stack, and thinking they have no comparative advantage in anything new to them.
      • Probably the main shortcoming, despite our track being by far the most interesting (the rest were like “best use of XXX”...)
    • The track being incompatible with other tracks if submission to multiple tracks is allowed.  Going for the safety track means losing out on the others.
    • People being confused on what to do
      • Make sure you explain clearly what you want people to do as this is a niche topic for now, give example projects (see ours) and starter code.
      • People do NOT like reading!  Maximize the signal to words ratio!!!
    • You should borderline spam announcements, contact hackers to pitch EARLY, like before the hackathon starts if you can get their emails.
      • People have to know that your track exists!
    • Get a notable company to officially sponsor and a notable judge, this would be very difficult but will probably be the biggest attractive factor…
    • Do NOT have more than 3 tracks if you host a standalone hackathon, choosing is hard for people :)
  • There is value in hosting at a general hackathon
    • The distribution of people at a general hackathon is different from the distribution of people who will come if you advertise a standalone AI safety hackathon.  If your goal is to reach new audiences, then being a part of a general hackathon will increase your chances of nerd-sniping!
  • You should frame your AI safety specific workshops as useful for all the tracks, and appeal to credibility as much as you can.  Also very important to announce them in real time in person as people do NOT check slack/discord announcements.

What’s next?

I think our attempt serves as a successful proof of concept for bringing the topics of AI safety/alignment hackathons to campuses.  People will engage with the topic if you try really hard.  Don’t hesitate to reach out for help if you’re thinking of something similar and want to learn more about what we did!

Two things we might do in the future:

  • Iterate on this and host a track at Georgia Tech’s data science hackathon
  • Become an Apart Research node for hackathons and host standalone AI safety hackathons.

Thanks for reading and I hope it wasn’t too noisy!

Yixiong & the Georgia Tech AISI team.

Comments


No comments on this post yet.
Be the first to respond.
Curated and popular this week
LintzA
 ·  · 15m read
 · 
Cross-posted to Lesswrong Introduction Several developments over the past few months should cause you to re-evaluate what you are doing. These include: 1. Updates toward short timelines 2. The Trump presidency 3. The o1 (inference-time compute scaling) paradigm 4. Deepseek 5. Stargate/AI datacenter spending 6. Increased internal deployment 7. Absence of AI x-risk/safety considerations in mainstream AI discourse Taken together, these are enough to render many existing AI governance strategies obsolete (and probably some technical safety strategies too). There's a good chance we're entering crunch time and that should absolutely affect your theory of change and what you plan to work on. In this piece I try to give a quick summary of these developments and think through the broader implications these have for AI safety. At the end of the piece I give some quick initial thoughts on how these developments affect what safety-concerned folks should be prioritizing. These are early days and I expect many of my takes will shift, look forward to discussing in the comments!  Implications of recent developments Updates toward short timelines There’s general agreement that timelines are likely to be far shorter than most expected. Both Sam Altman and Dario Amodei have recently said they expect AGI within the next 3 years. Anecdotally, nearly everyone I know or have heard of who was expecting longer timelines has updated significantly toward short timelines (<5 years). E.g. Ajeya’s median estimate is that 99% of fully-remote jobs will be automatable in roughly 6-8 years, 5+ years earlier than her 2023 estimate. On a quick look, prediction markets seem to have shifted to short timelines (e.g. Metaculus[1] & Manifold appear to have roughly 2030 median timelines to AGI, though haven’t moved dramatically in recent months). We’ve consistently seen performance on benchmarks far exceed what most predicted. Most recently, Epoch was surprised to see OpenAI’s o3 model achi
Dr Kassim
 ·  · 4m read
 · 
Hey everyone, I’ve been going through the EA Introductory Program, and I have to admit some of these ideas make sense, but others leave me with more questions than answers. I’m trying to wrap my head around certain core EA principles, and the more I think about them, the more I wonder: Am I misunderstanding, or are there blind spots in EA’s approach? I’d really love to hear what others think. Maybe you can help me clarify some of my doubts. Or maybe you share the same reservations? Let’s talk. Cause Prioritization. Does It Ignore Political and Social Reality? EA focuses on doing the most good per dollar, which makes sense in theory. But does it hold up when you apply it to real world contexts especially in countries like Uganda? Take malaria prevention. It’s a top EA cause because it’s highly cost effective $5,000 can save a life through bed nets (GiveWell, 2023). But what happens when government corruption or instability disrupts these programs? The Global Fund scandal in Uganda saw $1.6 million in malaria aid mismanaged (Global Fund Audit Report, 2016). If money isn’t reaching the people it’s meant to help, is it really the best use of resources? And what about leadership changes? Policies shift unpredictably here. A national animal welfare initiative I supported lost momentum when political priorities changed. How does EA factor in these uncertainties when prioritizing causes? It feels like EA assumes a stable world where money always achieves the intended impact. But what if that’s not the world we live in? Long termism. A Luxury When the Present Is in Crisis? I get why long termists argue that future people matter. But should we really prioritize them over people suffering today? Long termism tells us that existential risks like AI could wipe out trillions of future lives. But in Uganda, we’re losing lives now—1,500+ die from rabies annually (WHO, 2021), and 41% of children suffer from stunting due to malnutrition (UNICEF, 2022). These are preventable d
Rory Fenton
 ·  · 6m read
 · 
Cross-posted from my blog. Contrary to my carefully crafted brand as a weak nerd, I go to a local CrossFit gym a few times a week. Every year, the gym raises funds for a scholarship for teens from lower-income families to attend their summer camp program. I don’t know how many Crossfit-interested low-income teens there are in my small town, but I’ll guess there are perhaps 2 of them who would benefit from the scholarship. After all, CrossFit is pretty niche, and the town is small. Helping youngsters get swole in the Pacific Northwest is not exactly as cost-effective as preventing malaria in Malawi. But I notice I feel drawn to supporting the scholarship anyway. Every time it pops in my head I think, “My money could fully solve this problem”. The camp only costs a few hundred dollars per kid and if there are just 2 kids who need support, I could give $500 and there would no longer be teenagers in my town who want to go to a CrossFit summer camp but can’t. Thanks to me, the hero, this problem would be entirely solved. 100%. That is not how most nonprofit work feels to me. You are only ever making small dents in important problems I want to work on big problems. Global poverty. Malaria. Everyone not suddenly dying. But if I’m honest, what I really want is to solve those problems. Me, personally, solve them. This is a continued source of frustration and sadness because I absolutely cannot solve those problems. Consider what else my $500 CrossFit scholarship might do: * I want to save lives, and USAID suddenly stops giving $7 billion a year to PEPFAR. So I give $500 to the Rapid Response Fund. My donation solves 0.000001% of the problem and I feel like I have failed. * I want to solve climate change, and getting to net zero will require stopping or removing emissions of 1,500 billion tons of carbon dioxide. I give $500 to a policy nonprofit that reduces emissions, in expectation, by 50 tons. My donation solves 0.000000003% of the problem and I feel like I have f