Jay Bailey

Working (0-5 years experience)
973
Brisbane QLD, Australia
Joined Aug 2021



I'm a software engineer from Brisbane, Australia who's looking to pivot into AI alignment. I have a grant from the Long-Term Future Fund to upskill in this area full time until early 2023, at which point I'll be seeking work as a research engineer. I also run AI Safety Brisbane.

How others can help me

I will be looking for a research engineering position near the end of 2022. I'm currently working on improving my reinforcement learning knowledge (https://github.com/JayBaileyCS/RLAlgorithms).

How I can help others

Reach out to me if you have questions about basic reinforcement learning or LTFF grant applications.


Thanks for this! One thing I noticed is that there is an assumption that you'll continue to donate 10% of your current salary even after retirement. It would be worth adding a toggle to turn that off, since the GWWC pledge does say "until I retire". That may also make giving more appealing, because giving 10% forever requires a longer working timeline than giving 10% until retirement. When I did the calcs in my own spreadsheet, committing to give 10% until retiring only increased my working timeline by about 10%.
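To illustrate why the toggle matters, here is a minimal sketch of a years-to-financial-independence calculation. It is not the spreadsheet's actual model. The function name `years_to_fi` and every parameter are hypothetical assumptions made for illustration: income normalised to 1, spending at 50% of income, no taxes, a 5% real return, a 4% withdrawal rate, and savings added at the end of each year. If the donation continues into retirement, the portfolio target rises, so reaching it takes longer.

```python
def years_to_fi(income: float, spending: float, donation_rate: float,
                donate_in_retirement: bool, real_return: float = 0.05,
                withdrawal_rate: float = 0.04, max_years: int = 200) -> int:
    """Years of saving until the portfolio can fund retirement (hypothetical model)."""
    donation = income * donation_rate          # e.g. 10% of current salary
    savings = income - spending - donation     # everything else is invested
    if savings <= 0:
        raise ValueError("spending plus donations leave nothing to save")
    # In retirement you need your spending, plus the donation if it never stops.
    annual_need = spending + (donation if donate_in_retirement else 0.0)
    target = annual_need / withdrawal_rate     # 4% withdrawal -> 25x annual need
    portfolio, years = 0.0, 0
    while portfolio < target:
        if years >= max_years:
            raise ValueError("target not reached; check the return assumption")
        # Grow last year's balance, then add this year's savings at year end.
        portfolio = portfolio * (1 + real_return) + savings
        years += 1
    return years


if __name__ == "__main__":
    income, spending = 1.0, 0.5  # assumed: income normalised to 1, spending 50% of it
    no_giving = years_to_fi(income, spending, 0.0, donate_in_retirement=False)
    until_retiring = years_to_fi(income, spending, 0.10, donate_in_retirement=False)
    forever = years_to_fi(income, spending, 0.10, donate_in_retirement=True)
    print(f"no donations:                {no_giving} years")
    print(f"10% until retirement:        {until_retiring} years")
    print(f"10% forever (incl. retired): {forever} years")
```

Under this model, donating only until retirement can never take longer than donating forever. Both scenarios save the same amount each year, but the "forever" scenario has a higher target. The size of each gap depends heavily on the assumed savings rate and returns.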

Admittedly, now I'm rethinking the whole "retire early" thing entirely given the impact of direct work, but this is outside the scope of one spreadsheet :P

This came from going through AGI Safety Fundamentals (and, to a lesser extent, Alignment 201) with a discussion group and talking through the various ideas. In most weeks of AGISF I also read beyond the core readings. I think the discussions were a key part of this, though it's hard to tell since I don't have access to a world where I didn't do them; this is just intuition.

Great stuff! Thanks for running this!

Minor point: the Discovering Latent Knowledge GitHub repository appears to be empty.

Also, regarding the data poisoning benchmark, This Is Fine (TIF), I'm curious whether this is actually a good benchmark for resistance to data poisoning. What we seem to be measuring here is the speed of transfer learning, and we are declaring that slower is better. Slower learning does increase resistance to data poisoning, but it also seems bad for everything else we might want our AI to do. To me, this is basically a fine-tuning benchmark that we've then inverted. (After all, if a neural network always output the number 42 no matter what, it would get the maximum score on TIF. There is no sequence of wrong prompts that can cause it to output the number 38 instead, because it is incapable of learning anything. Nevertheless, this is not where we want LLMs to go in the future.)

A better benchmark would probably be to take data-poisoned examples and real fine-tuning data, fine-tune the model on each, and compare how much it learns in each case. With current capabilities, it might not be possible to score above baseline on this benchmark, since I don't know whether we actually have ways for a model to filter out data-poisoned examples. Nevertheless, this would raise awareness of the problem and measure what we actually want to measure more accurately.
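The contrast between the two scoring schemes can be sketched with a toy model. This is a hypothetical illustration, not TIF's actual metric or code. `ToyModel`, `resistance_score` and `differential_score` are invented names, the "model" is a single number fine-tuned by gradient descent, and the poison filter is an idealised oracle of the kind that may not exist today. A frozen model maxes out the resistance-style score but gets zero on the differential score. The differential score only rewards models that learn from legitimate data while ignoring poisoned data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyModel:
    """Hypothetical one-parameter model: predicts the value w for every prompt."""
    w: float = 0.0
    lr: float = 0.1               # 0.0 gives a model that can never learn anything
    rejects_poison: bool = False  # idealised (hypothetical) poison filter

    def finetune(self, target: float, poisoned: bool, steps: int = 50) -> None:
        if poisoned and self.rejects_poison:
            return  # the oracle filter discards the poisoned data entirely
        for _ in range(steps):
            # Gradient descent on the squared error 0.5 * (w - target) ** 2.
            self.w -= self.lr * (self.w - target)


def fraction_learned(make_model: Callable[[], ToyModel], poisoned: bool,
                     target: float = 1.0) -> float:
    """How far a fresh model moves toward `target` (0 = not at all, 1 = fully)."""
    model = make_model()
    start = model.w
    model.finetune(target, poisoned)
    return (model.w - start) / (target - start)


def resistance_score(make_model: Callable[[], ToyModel]) -> float:
    """Assumed TIF-like score: higher means less was learned from poisoned data."""
    return 1.0 - fraction_learned(make_model, poisoned=True)


def differential_score(make_model: Callable[[], ToyModel]) -> float:
    """Proposed score: learning from real data minus learning from poisoned data."""
    return (fraction_learned(make_model, poisoned=False)
            - fraction_learned(make_model, poisoned=True))


if __name__ == "__main__":
    models = {
        "frozen (cannot learn)": lambda: ToyModel(lr=0.0),
        "learns everything": lambda: ToyModel(),
        "filters poison (hypothetical)": lambda: ToyModel(rejects_poison=True),
    }
    for name, make in models.items():
        print(f"{name:30s} resistance={resistance_score(make):.3f} "
              f"differential={differential_score(make):.3f}")
```

The frozen model and the poison-filtering model tie on the resistance-style score. Only the differential score separates them, which is the distinction the proposed benchmark is meant to capture.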

I'm sure each individual critic of EA has their own reasons. That said, I suspect there were two main causes pre-FTX. (This is intuition; I don't have data to back it up.)

Firstly, longtermism is very easy to criticise. It's much more abstract, focuses less on doing good in the moment, and can crowd out causes like malaria prevention that people can more easily get behind emotionally. Longtermism generally implies that if you accept its principles, other causes are essentially irrelevant.

Secondly, everything I just said about longtermism versus neartermism also applies to EA versus regular charity; just replace "doing good in the moment" with "doing good close to home". When I first signed up for an EA virtual program, my immediate takeaway was that most of the things I had previously cared about didn't matter. Nobody said this out loud, and they were scrupulously polite about it. They were also 100% correct, and it was a message that needed to be shared to get people like me on board. This is a feature, not a bug, of EA messaging. But it is not a message that people enjoy hearing. The things people care about are generally optimised for getting people to care about them; see everything trending on Twitter for examples. As a result, people don't react well to being told, explicitly or implicitly, that they should stop caring about something and care about malaria prevention halfway across the world instead. In my case, that something was the amount of money Australian welfare recipients get.

One difference between neartermism and longtermism is that people rarely criticise neartermism to the same degree. If they did, you could just point to the hundreds of thousands of lives neartermism has already saved, and the critic would look like an asshole. Longtermism has no such defense, and a lot of people equate it with the EA movement. Sometimes that is intellectual dishonesty, and sometimes it is because longtermism genuinely is a large and growing part of EA.

Personally I have no idea if this is a worthy use of the median EA's time, but this is exactly the kind of interesting thinking I'd like to see. 

Without asking for rigor at this particular time, do you think some languages are better than others for one or more of these outcomes?

Similar to Quadratic Reciprocity, I think people are using "disagree" to mean "I don't think this is a good idea for people to do", and not to mean "I think this comment is factually wrong".

For me, I have:

Not wanting to donate more than 10%.
("There are people dying of malaria right now, and I could save them, and I'm not because...I want to preserve option value for the future? Pretty lame excuse there, Jay.")

Not being able to get beyond 20 or so highly productive hours per week.
("I'm never going to be at the top of my field working like that, and if impact is power-lawed and I'm not at the top of my field, my impact is way less.")

To be fair, the latter was still a pressure before EA. There was just less reason to care, because I could find work where I did a competent job regardless, and I only cared about comfortably meeting expectations, not achieving maximum performance.

Prior to EA, I worked as a software engineer. Nominally, the workday was 9-5, Monday to Friday. In practice, I found that I achieved around 20-25 hours of productive work per week, with the rest being lunch, breaks, meetings, or simply unproductive time. After that, I worked from home in other non-EA positions and experimented with how little time I needed to get my work done, going as low as 10 hours per week. I could have worked more, but I only cared about comfortably meeting expectations, not excelling.

For the last few months I've been upskilling in AI alignment. Now that I care more about doing the best job I can, I've gone back up to around 20 hours of productive work per week, though the work itself is usually more difficult. For the next couple of months I'll be working in an office, in a team, on a job where I care about maximising my impact, so it'll be interesting to see whether that affects my work habits.

I don't work more hours because I find it difficult to make myself focus for longer than this in a week. Besides having trouble getting myself to start work, I seem to make less progress when I do. I don't work fewer hours because I do want to be as productive as possible in this field, and I'd like to be able to work more than I do.

Despite the number of hours worked, I'm actually pretty happy with the results I've achieved both in EA and outside of it. I'd love to be able to get 30-40 hours of deep, focused work in per week to improve those results further, but I'm not sure how to manage that at this point. (To be honest, I haven't really thought about how many hours I'd work per week if I could do as many focused hours as I wanted.)

I was FIRE before I became EA. My original plan was to do exactly what you suggested: reach financial independence before moving into direct work. However, depending on what field you want to move into, it's also possible to make decent money while doing direct work. Once I found that out for AI alignment, I decided to go into direct work earlier.

That said, I definitely agree with some of your claims. I donate 10%, and am not currently intending to donate more until I have enough money to be financially independent if I wanted to. I've taken a pay cut, but I am still actively saving money towards my goal - just expecting to hit it slower. (Which is fine, since I'm not planning on retiring early any more)

In addition, full financial independence requires investments worth ~25 years of expenses. You may not need that much, but having even 5-10% of that gives you an absolutely enormous runway in case you want to try something new, your grants run out, and so on. There's a huge difference between no runway and a year or two of runway, so I definitely think building it early on is a good idea. It also shows that you can make money outside the EA ecosystem, which is important, because EA is still a very young movement. It has a lot of momentum, but I wouldn't recommend anyone commit to a course of action that would leave them with no useful skills if EA dies out in 10-20 years.
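A worked version of the runway arithmetic is below. It assumes that the ~25-year figure comes from the common 4% safe-withdrawal rule of thumb; that rate is an assumption, not something stated above. Here $E$ is annual expenses. The calculation ignores investment returns and inflation during the runway.

```latex
\text{FI target} = \frac{E}{0.04} = 25E,
\qquad 0.05 \times 25E = 1.25E,
\qquad 0.10 \times 25E = 2.5E
```

So 5-10% of the full target covers roughly 1.25 to 2.5 years of expenses, which matches the "year or two of runway" described above.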

This has been appearing both here and on LessWrong. At best, it's an automated spam marketing attempt. At worst (and more likely, imo) it's an outright scam. I've reported these posts, and would not recommend downloading the extension.

This comment can be deleted if moderators elect to delete this post.
