Reflections on my 5-month AI alignment upskilling grant

Jay Bailey

This is a linkpost for https://www.lesswrong.com/posts/wnF9iydYiBMRs2jPg/reflections-on-my-5-month-alignment-upskilling-grant

Five months ago, I received a grant from the Long Term Future Fund to upskill in AI alignment. As of a few days ago, I was invited to Berkeley for two months of full-time alignment research under Owain Evans’s stream in the SERIMATS program. This post is about how I got there.

The post is partially a retrospective for myself, and partially a sketch of the path I took so that others can decide if it’s right for them. This post was written relatively quickly - I’m happy to answer more questions via PM or in the comments.

Summary

I was a software engineer for 3-4 years with little to no ML experience before I was accepted for my grant.
I did a bunch of stuff around fundamental ML maths, understanding RL and transformers, and improving my alignment understanding.
Having tutors, getting feedback on my plan early on, and being able to pivot as I went were all very useful for not getting stuck doing stuff that was no longer useful.
I probably wouldn’t have gotten into SERIMATS without that ability to pivot midway through.
After SERIMATS, I want to finish off the last part of the grant while I find work, then start work as a Research Engineer at an alignment organisation.
If in doubt, put in an application!

My Background

My background is more professional and less academic than most. Until I was 23, I didn’t do much of anything - then I got a Bachelor of Computer Science from a university ranked around 1,000th, with little maths and no intent to study ML at all, let alone alignment. It was known for strong graduate employment though, so I went straight into industry from there. I had 3.5 years of software engineering experience (1.5 at Amazon, 2 as a senior engineer at other jobs) before applying for the LTFF grant. I had no ML experience at the time, besides being halfway through doing the fast.ai course in my spare time.

Not going to lie, seeing how many Top-20 university PhD students I was sharing my cohort with (At least three!) was a tad intimidating - but I made it in the end, so industry experience clearly has a role to play as well.

Grant

The details of the grant are one of the main reasons I wrote this - I’ve been asked for 1:1s and details on this at least three times in the last six months, and if you get asked something by at least three different people, it might be worth writing it up and sharing it around.

Firstly, the process. Applying for the grant is pretty painless. As long as you have a learning plan already in place, the official guidance is to take 1-2 hours on it. I took a bit longer, polishing it more than required. I later found out my plan was more detailed than it probably had to be. In retrospect, I think my level of detail was good, but I spent too much time editing. AI Safety Support helped me with administration. The main benefit I got from this was that the tutoring and compute money were tax free (since I didn’t receive the money personally, but instead used a card they provided) and I didn’t have to worry about tax withholding throughout the year.

Secondly, the money. I agonized over how much money to ask for. This took me days. I asked myself how much I really needed, then I asked myself how much I would actually accept gladly with no regrets, then I balked at those numbers, even knowing that most people ask for too little, not too much. I still balk at the numbers, to be honest, but it would have been so much easier to write this if I had other grants to go off. So, in the interest of transparency and hopefully preventing someone else going through the same level of anguish, I’m sharing the full text of my grant request, including money requested (in Australian dollars, but you can always convert it) here. Personal embarrassment aside, since LTFF publishes these grants anyway (but is very backlogged at the moment apparently, since they haven’t shared them this year) I think sharing numbers is fine.

To summarise - in the end, I gave them three numbers of 50%, 75%, and 100% of my contractor salary at the time. I told them honestly that I definitely didn’t expect 100%, and that I would have to think about whether to take 50% or not - it was at the border of whether I’d take the pay cut or not to upskill in this speculative area. They gave me 75%, which was an amount I was glad to take with no reservations. I also asked for, and got, some tutoring and compute budget.

As for advice on what level of background you need to apply - I would advise just applying. Applications are processed on a rolling basis, and it only takes an hour or two. I can’t tell you what level of background you need, since I only got one bit of information - the acceptance. I don’t know if I was a slam dunk, a borderline case, or somewhere in between. And I don’t know how FTX might or might not affect future funding.

How It Went

First off, let’s look at what I actually achieved in those five months. Thus far, I have:

Maths:

Learnt single-variable calculus and the first half of multivariable calculus (Poorly)
Completed a first course in linear algebra (Solidly)
Completed some basic probability study (Random variables, probability distributions, random vectors, central limit theorem) (Solidly)
Gone through the first few chapters of Probability Theory: The Logic of Science (Mainly conceptually)

Alignment:

Formed a group and completed AGI Safety Fundamentals.
Completed Alignment 201 as part of SERIMATS.
Read several Alignment Forum sequences.
Greatly improved my inside view on what research agendas I think are most promising.
Attended John’s workshops as part of SERIMATS.

Machine Learning:

Reproduced several reinforcement learning algorithms.
Wrote a distillation on DQN (which was used as teaching material for ARENA virtual!).
Completed about 75% of the MLAB curriculum.
Built a transformer from scratch.
Reproduced some key LLM benchmarks like chain-of-thought prompting and self-consistency as part of SERIMATS.
Produced some basic original language model research as part of SERIMATS.

Other:

Formed AI Safety Brisbane, a local AI Safety discussion group for my city. (I've arranged an organiser while I'm in Berkeley)
Facilitated an AI safety weekend workshop organized by AI Safety Australia and New Zealand.

These last two weren’t funded by this grant, but did require skills and knowledge that I built using it.

Looking back at the list, I’m pretty happy with my performance overall, even though it often felt week to week like not a lot was getting done. It definitely would have taken me a lot longer to do all this without grant work.

In terms of hours spent, I wasn’t able to get as many quality hours as I would have liked. I had intended to do ~25 hours per week of deep work, ignoring Cal Newport’s mention that 4 hours per day of deep work was already pretty high level - in the end, I think I was able to get about 20 hours per week of work done, with most of that being deep work. Some weeks were as many as 30, others as few as 15, but I never had any zero weeks, or even really bad weeks, so motivation at least remained reasonably consistent throughout, which I was worried about. While I still feel guilty about doing fewer hours than I intended, I am trying to remind myself that results matter more than hours - if I am happy with my results, I should be pleased in general. More hours worked are good only insofar as they can improve results.

Some very useful things I recommend to people who want to do this are to seek out help and guidance, especially early on. I reached out to AI Safety Support to help create my plan, and to people at labs I wanted to work at in order to refine it. This helped me clear out a lot of unnecessary prerequisites - for instance, I ended up doing a lot less frontloading of maths than I thought I’d need to do, and instead focused on learning it in parallel with studying the actual ML skills I would want as a research engineer. I thought I would need a full Linear Algebra course before even touching PyTorch - this was very far from true, even though it eventually came in handy when I began diving into transformer architecture.

Tutoring was very useful as well - I had tutoring for mathematics, for conceptual understanding of RL algorithms, and to help me through the MLAB curriculum. These all improved my learning speed quite a bit. Especially if you’re a currently well-paid professional who would be getting a decent salary for alignment upskilling, the extra cost of a bit of tutoring is relatively low compared to salary replacement, and should improve the overall return on investment (in terms of learning per dollar) of the grant.

Being able to pivot was also useful - I was planning to continue to deep dive into RL after the first couple of months had gone by and I’d replicated the algorithms, but I could see which way the wind was blowing, and knew I needed to learn transformers. Fortunately, I’d put in my alignment plan that I planned to devote significant time to a subfield that was undetermined at the time - this ended up starting with transformers, which helped a lot for my successful SERIMATS application.

Future Plans

So what are my plans now? I still want to become a Research Engineer as Plan A - I think this is my best path in terms of both immediate impact and long-term skill building. (See here if confused at the difference between Research Engineer and Research Scientist.) As a software engineer with little research experience (All my research experience thus far was gained in SERIMATS itself!) it seems the best way to use my skills - and since I’ve heard the gap between research engineer and research scientist is pretty porous everywhere except OpenAI and DeepMind, starting out as a research engineer is probably in my top three paths even if I do end up on a more research-heavy part of the continuum than I start. My timelines aren’t super-short - spending a couple of years building skills in the field is more important to me than immediate impact, as long as I’m not working on something actively useless or harmful.

Thus, my plans are:

First, SERIMATS of course! I’ve got two months in Berkeley studying and working full-time on alignment, amongst other people doing the same thing. This is a tremendous opportunity for growth, and if I don’t learn at least one thing there that alters my current model in a big way I’ll be pretty disappointed.

Secondly, I still owe about 6-8 weeks of work on this grant. I’ve been on the grant for five months so far, but I was doing SERIMATS for part of that, which comes with its own stipend - counting that as grant time would cause me to be paid twice for the same work. With AI Safety Support's advice, I’ve determined the best way to solve that is to just put in some extra work after SERIMATS in order to ensure that six months of dedicated upskilling is done via this grant, and repay the money only if this isn’t feasible (e.g., if I find a better opportunity that starts sooner than the end of April).

While that’s going on, if a better opportunity hasn’t come along during that time, I’ll be looking for work in dedicated AI alignment orgs or DeepMind’s safety team. If I’m not able to find work there, Plan B is to apply for another round of funding and try to get into independent interpretability research - I’ll need to do some upskilling using Neel Nanda’s excellent resources, but that shouldn't take six months, and I believe I can start producing some interesting findings within three. Plan C could be distillation work, and I haven’t really thought about Plan D through Z yet.

Finally, I want to improve my general math ability further. It’s one of those things that’s always important but never urgent, so plugging away for an hour a day or so even if I'm not specifically blocked on a lack of it seems like a good way to go about it. I’ve tried focusing on one area at a time during the grant - now I want to try it the other way and interweave working on a few things at once, and see which works better for me in terms of motivation and retention. This’ll definitely take longer than three months, but it’s worth starting sooner rather than later.

Foundation Work - I’d like to have a world-class foundation in basic mathematics, so I’ll want to work through AMC competitions and the Art of Problem Solving books in order to improve that. I’m amazed at how many things I can learn from books aimed at bright high-schoolers. Just yesterday I learnt you can use combinations of prime factors to determine how many unique factors a large number has, which would have made several Project Euler problems a lot faster. (My starting point is 20/25 on the AMC 8, points lost to shaky geometry and combinatorics - give it a try yourself and see how you do! 40 minute timer, no calculator.)
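The prime-factor trick mentioned above can be sketched in a few lines. If n factors as p1^a1 * p2^a2 * ... * pk^ak, then every divisor of n picks an exponent between 0 and ai for each prime, so the number of divisors is (a1 + 1) * (a2 + 1) * ... * (ak + 1). This is only a minimal illustration: the function names are made up for this example rather than taken from the post, and plain trial division is assumed to be fast enough for the size of number involved.

```python
def prime_factorization(n: int) -> dict[int, int]:
    """Return {prime: exponent} for a positive integer n, using trial division."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors: dict[int, int] = {}
    p = 2
    # Only primes up to sqrt(n) need checking; whatever remains is itself prime.
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def count_divisors(n: int) -> int:
    """Count the divisors of n as the product of (exponent + 1) over its prime factors."""
    count = 1
    for exponent in prime_factorization(n).values():
        count *= exponent + 1
    return count


# 360 = 2^3 * 3^2 * 5^1, so it has (3 + 1) * (2 + 1) * (1 + 1) = 24 divisors.
print(count_divisors(360))
```

Compared with testing every candidate divisor up to n, this only does work proportional to the square root of n, which is where the speed-up on Project Euler style problems would come from.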

Framing - John Wentworth says that much of the benefit of knowing lots of mathematics is just being able to recognise a problem. (Also see this comment of mine and its parent.) Thus, I want to work through the Infinitely Large Napkin or a similar resource, and come up with a few examples of problems in the real world that would use each branch of mathematics, even if they’re well beyond my ability to solve without more dedicated study.

Linear Algebra - John said in his workshops that “If you haven’t solved alignment yet, you don’t know enough linear algebra.” (This is also one of the most thought-provoking sentences I’ve heard in a long time.) Thus, I want to continue to plug away at that, and work through the canonical LessWrong text, Linear Algebra Done Right.

But as they say - plans are useless, but planning is indispensable. Probability of parts of this plan changing significantly due to new information gained in Berkeley is >50%, but as long as I keep in mind why the plan is what it is, I can pivot as needed.

I hope people find this useful, and if there’s one piece of advice you’ve taken from this - if you’re unsure about applying to the LTFF or another similar source, go ahead and give it a try!

ChanaMessinger (Jan 4 2023)

Really cool!

I'm curious about:

“Greatly improved my inside view on what research agendas I think are most promising.”

And how you feel like that happened or what helped you do it.

Jay Bailey (Jan 8 2023)

This came from going through AGI Safety Fundamentals (and to a lesser extent, Alignment 201) with a discussion group and talking through the various ideas. In most weeks of AGISF, I also read more extensively than just the core readings. I think the discussions were a key part of this. (Though it's hard to tell since I don't have access to a world where I didn't do that - this is just intuition.)

ChanaMessinger (Jan 9 2023)

Thanks!
