All of Jay Bailey's Comments + Replies

My p(doom) went down slightly (from around 30% to around 25%) mainly as a result of how GPT-4 caused governments to begin taking AI seriously in a way I didn't predict. My timelines haven't changed - the only capability increase of GPT-4 that really surprised me was its multimodal nature. (Thus, governments waking up to this was a double surprise, because it clearly surprised them in a way that it didn't surprise me!)

I'm also less worried about misalignment and more worried about misuse when it comes to the next five years, due to how LLMs appear to behav... (read more)

This is exactly right, and the main reason I wrote this up in the first place. I wanted this to serve as a data point for people to be able to say "Okay, things have gone a little off the rails, but things aren't yet worse than they were for Jay, so we're still probably okay." Note that it is good to have a plan for when you should give up on the field, too - it should just have some resilience and allowance for failure baked in. My plan was loosely "If I can't get a job in the field, and I fail to get funded twice, I will leave the field". 

Also contributing ... (read more)

Welcome to the Forum!

This post falls into a pretty common Internet failure mode, which is so ubiquitous outside of this forum that it's easy to not realise that any mistake has even been made - after all, everyone talks like this. Specifically, you don't seem to consider whether your argument would convince someone who genuinely believes these views. I am only going to agree with your answer to your trolley problem if I am already convinced invertebrates have no moral value...and in that case, I don't need this post to convince me that invertebrate welfare... (read more)

2
Saul Munn
4mo
wonderfully welcoming comment, @Jay Bailey! :)

Great post! I definitely feel similar regarding giving - while giving cured my guilt about my privileged position in the world, I don't feel as amazing as I thought I would when giving - it is indeed a lot like taxes. I feel like a better person in the background day-to-day, but the actual giving now feels pretty mundane.

I'm thinking I might save up my next donation for a few months and donate enough to save a full life in one go - because of a quirk in human brains I imagine that would be more satisfying than saving 20% of a life 5 times.

2
NickLaing
5mo
Such an interesting idea about saving up enough for a full life saved! Concreteness is important; maybe even kind of lame ideas like buying a small piece of jewelry to remind us of a life saved, or hosting a "giving celebration", can help some of us feel the good that is done a little more!

For the Astra Fellowship, what considerations do you think people should be thinking about when deciding to apply for SERI MATS, Astra Fellowship, or both? Why would someone prefer one over the other, given they're both happening at similar times?

9
Ryan Kidd
6mo
MATS has the following features that might be worth considering:
1. Empowerment: Emphasis on empowering scholars to develop as future "research leads" (think accelerated PhD-style program rather than a traditional internship), including research strategy workshops, significant opportunities for scholar project ownership (though the extent of this varies between mentors), and a 4-month extension program;
2. Diversity: Emphasis on a broad portfolio of AI safety research agendas and perspectives with a large, diverse cohort (50-60) and comprehensive seminar program;
3. Support: Dedicated and experienced scholar support + research coach/manager staff and infrastructure;
4. Network: Large and supportive alumni network that regularly sparks research collaborations and AI safety start-ups (e.g., Apollo, Leap Labs, Timaeus, Cadenza, CAIP);
5. Experience: Have run successful research cohorts with 30, 58, and 60 scholars, plus three extension programs with about half as many participants.
7
Alexandra Bates
6mo
Good question! In my opinion, the main differences between the programs are the advisors and the location. Astra and MATS share a few advisors, but most are different. Additionally, Astra will take place at the Constellation office, and fellows will have opportunities to talk with researchers that work regularly from Constellation.

"All leading labs coordinate to slow during crunch time: great. This delays dangerous AI and lengthens crunch time. Ideally the leading labs slow until risk of inaction is as great as risk of action on the margin, then deploy critical systems.

All leading labs coordinate to slow now: bad. This delays dangerous AI. But it burns leading labs' lead time, making them less able to slow progress later (because further slowing would cause them to fall behind, such that other labs would drive AI progress and the slowed labs' safety practice

... (read more)

Great post! One more advantage of giving yourself spare capacity is worth pointing out. I try to operate at around 80-90% capacity. This allows me time to notice and pursue better opportunities as they arise, and imo this is far more valuable to your long-term output than a flat +10% multiplier. As we know from EA resources, working on the right thing can multiply your effectiveness by 2x, 10x, or more. Giving yourself extra slack makes you less likely to get stuck in local optima.

Thanks Elle, I appreciate that. I believe your claims - I fully believe it's possible to safely go vegan for an extended period. I'm just not sure how difficult it is (i.e., what the default outcome is if one tries without doing research first) and what ways there are to prevent that outcome if it is not good.

I shall message you, and welcome to the forum!

With respect to Point 2, I think that EA is not large enough that a large AI activist movement would be composed mostly of EA-aligned people. EA is difficult and demanding - I don't think you're likely to get a "One Million EA" march anytime soon. I agree that AI activists who are EA-aligned are more likely to be in the set of focused, successful activists (like many of your friends!) but I think you'll end up with either:

- A small group of focused, dedicated activists who may or may not be largely EA aligned
- A large group of unfocused-by-default, relati... (read more)

I think there's a bit of an "ugh field" around activism for some EAs, especially the rationalist types in EA. At least, that's my experience.

My first instinct, when I think of activism, is to think about people who:

- Have incorrect, often extreme beliefs or ideologies.
- Are aggressively partisan.
- Are more performative than effective with their actions.

This definitely does not describe all activists, but it does describe some activists, and may even describe the median activist. That said, this shouldn't be a reason for us to discard this idea immediately... (read more)

7
NickLaing
1y
Great points, thanks so much, agree with almost all of it! We've obviously had different experiences of activists! I have a lot of activist friends, and my first instinct when I think of activists is people who:
1. Understand the issue they are campaigning for extremely well;
2. Have a clear focus and goal that they want to achieve;
3. Are beholden to their ideology, yes, but not to any political party, because they know political tides change and becoming partisan won't help their cause.
Although I definitely know a few who fit your instincts pretty well ;)
That's a really good point about the AI policy experts not being sure where to aim their efforts - so how would activists know where to aim theirs? Effective traditional activism needs clear targets and outcomes.
A couple of points on the slightly more positive end supporting activism:
1. At this early stage, where very few people are even aware of the potential of AI risk, could raising public awareness be a legitimate purpose of activism? Obviously once most people are aware and on board with the risk, you then need the effectiveness at changing policy you discussed.
2. AI activists might be more likely to be EA-aligned, so optimistically more likely to be in that small percentage of more focused and successful activists?

I am one of those meat-eating EAs, so I figured I'd give some reasons why I'm not vegan, to aid the goals of this post in finding out about these things.

Price: While I can technically afford it, I still prefer to save money when possible.
Availability: A lot of food out there, especially frozen foods which I buy a lot of since I don't like cooking, involves meat. It's simply easier to decide on meals when meat is an option.
Knowledge: If I were to go vegan, I would be unsure how to go vegan safely for an extended period, and how to make sure I got a decent ... (read more)

9
ElleB
1y
"Knowledge: If I were to go vegan, I would be unsure how to go vegan safely for an extended period, and how to make sure I got a decent variety rather than eating the same foods over and over (which comes into taste - I don't mind vegan food but there's much more variety I can find in meat-based dishes)" If knowledge is one of your preventative factors, I have been vegetarian since 2005 and vegan since 2015. I am happy to help. I compete in various sports and am in good health. I am happy to communicate with you directly and provide evidence of such claims if that helps quell your concerns about "[going] vegan safely for an extended period."  I would ecstatically give you my time in the pursuit of sharing knowledge and helping to reduce barriers to veganism.

Seems like a pretty incredible opportunity for those interested! What level of time commitment do you expect reading and understanding the book to take, in addition to the meetings?

4
Jason Clinton
1y
We will read a set of two chapters every two weeks, which will take 1-2 hours to read. That's it.

Looking at the two critiques in reverse order:

I think it's true that it's easy for EAs to lose sight of the big picture, but to me, the reason for that is simple - humans, in general, are terrible at seeing the bigger picture. If anything, it seems to me that the EA frameworks are better than most altruistic endeavours at seeing the big picture - most altruistic endeavours don't get past the stage of "See good thing and do it", whereas EAs tend to be asking if X is really the most effective thing they can do, which invariably involves looking at a bigger ... (read more)

I was using unidentifiability in the Hubinger way. I do believe that if you try to get an AI trained in the way you mention here to follow directions subject to ethical considerations, by default, the things it considers "maximally ethical" will be approximately as strange as the sentences from above.

That said, this is not actually related to the problem of deceptive alignment, so I realise now that this is very much a side point.

I don't understand why you believe unidentifiability will be prevented by large datasets. Take the recent SolidGoldMagikarp work. It was done on GPT-2, but GPT-2 nevertheless was trained on a lot of data - a quick Google search suggests eight million web pages.

Despite this, when people tried to find the sentences that maximally determined the next token, what we got was...strange.



This is exactly the kind of thing I would expect to see if unidentifiability was a major problem - when we attempt to poke the bounds of extreme behaviour of the AI and take it fa... (read more)
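
For anyone who wants to poke at this themselves, here's a minimal sketch of that kind of probing (assuming the Hugging Face transformers library and PyTorch; this is not the original authors' code, just the embedding-centroid heuristic associated with how those anomalous tokens were surfaced):

```python
# Minimal sketch (not the original SolidGoldMagikarp code): list GPT-2 tokens
# unusually close to the centroid of the input embedding matrix, a heuristic
# that reportedly surfaced the anomalous tokens discussed above.
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

emb = model.get_input_embeddings().weight.detach()  # shape: (50257, 768)
centroid = emb.mean(dim=0)
dists = torch.norm(emb - centroid, dim=1)

# Tokens nearest the centroid tend to be rarely-trained oddities.
for idx in torch.argsort(dists)[:20]:
    print(repr(tokenizer.convert_ids_to_tokens(int(idx))), float(dists[idx]))
```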

9
DavidW
1y
Hubinger et al's definition of unidentifiability, which I'm referring to in this post:

I'm referring to unidentifiability in terms of the goals of a model in a (pre-trained) reinforcement learning context. I think the internet contains enough information to adequately pinpoint following directions. Do you disagree, or are you using this term some other way?

Pre-trained models having weird output probabilities for carefully designed gibberish inputs doesn't seem relevant to me. Wouldn't that be more of a capability failure than goal misalignment? It doesn't seem to indicate that the model is optimizing for something other than next-token prediction. I'm arguing that models are unlikely to be deceptively aligned, not that they are immune to all adversarial inputs. I haven't read the post you linked to in full, so let me know if I'm missing something.

My unidentifiability argument is that if a model:
1. Has been pre-trained on ~the whole internet,
2. Is sophisticated/complex enough to have TAI potential if (more) RL training occurs, and
3. Is told to follow directions subject to ethical considerations, then given directions,
then it would be really weird if it didn't understand that it's designed to follow directions subject to ethical considerations. If there's a way for this to happen, I haven't seen it described anywhere.

It might still occasionally misinterpret your directions, but it should generally understand that the training goal is to follow directions subject to non-consequentialist ethical considerations before RL training turns it into a proxy goal optimizer. Deception gets even less likely when you factor in that to be deceptive, it would need a very long-term goal and situational awareness before or around the same time as it understood that it needs to follow directions subject to ethical considerations. What's the story for how this happens?

Thanks for this! One thing I noticed is there is an assumption you'll continue to donate 10% of your current salary even after retirement - it would be worth having that as a toggle to turn that off, since the GWWC pledge does say "until I retire". That may make giving more appealing as well, because giving 10% forever requires longer timelines than giving 10% until retirement - when I did the calcs in my own spreadsheet I only increased my working timeline by about 10% by committing to give 10% until retiring. 
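
For anyone curious what that toggle looks like concretely, here is a rough sketch of the calculation (illustrative numbers only, not the spreadsheet from the post):

```python
# Rough illustrative sketch, not the spreadsheet from the post.
# Assumptions: 5% real returns, 4% safe withdrawal rate, donations of 10% of
# gross salary while working, and (optionally) the same donation in retirement.
def years_to_fi(salary=80_000, spending=40_000, donation_rate=0.10,
                real_return=0.05, swr=0.04, donate_in_retirement=False):
    # If donations continue after retirement, the portfolio must cover them
    # too, so the financial-independence target is larger.
    retirement_outgo = spending + (donation_rate * salary if donate_in_retirement else 0)
    target = retirement_outgo / swr
    portfolio, years = 0.0, 0
    while portfolio < target:
        savings = salary - spending - donation_rate * salary
        portfolio = portfolio * (1 + real_return) + savings
        years += 1
    return years

print(years_to_fi(donate_in_retirement=False))  # pledge ends at retirement
print(years_to_fi(donate_in_retirement=True))   # pledge continues for life
```

With these made-up numbers, keeping the pledge going through retirement adds a couple of years to the timeline - which is exactly the gap such a toggle would make visible.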

Admittedly, now I'm rethinking the whole "retire early" thing entirely given the impact of direct work, but this is outside the scope of one spreadsheet :P

5
Alejandro Ruiz
1y
As another early retiree - at least, I was one for some time, before I un-retired (hopefully temporarily) to pursue an expensive startup project as a funder - I think you underestimate the power of FIRE income. By the time most of us are ready to "pull the plug", the usual question is not "How probable is it that I never run out of money?" but "How much time did I overspend working, since this safety margin is, with the benefit of hindsight, obviously excessive?". Thus, most FIRE types should have more than enough to maintain their donation rate, and probably increase it. See, for example, https://www.mrmoneymustache.com/2022/07/18/never-run-out-of-money/
3
Rebecca Herbst
1y
Thank you! I question whether the words "until I retire" are a steadfast rule or more of a guideline from the GWWC community, simply because it's easier for people to digest. I imagine it is easier to come up with a donation amount based on your current and predicted active income, whereas it can be harder to predict that donation rate once you are an "early retiree"...which I know from personal experience, as I don't really have a true income to base my 10% on. So to me this seems more of a messaging point of friction.

I guess the question then is, should early retirees continue to donate at the same rate once they retire? Or rather, should any retiree at any age continue to donate at the same rate once they retire? My perspective is this can be harder with unpredictable income, but if you can build it into your financial plan early on, then you are in a much better place to continue donating on a regular basis, even once you lose the active income. So, I'm tempted to not have the option to toggle it off, but there may be value in showing the difference in portfolio value and FI timeline. I welcome more feedback!

This came from going through AGI Safety Fundamentals (and to a lesser extent, Alignment 201) with a discussion group and talking through the various ideas. I also read more extensively in most weeks in AGISF than the core readings. I think the discussions were a key part of this. (Though it's hard to tell since I don't have access to a world where I didn't do that - this is just intuition)

2
ChanaMessinger
1y
Thanks!

Great stuff! Thanks for running this!

Minor point: The Discovering Latent Knowledge GitHub appears empty.

Also, regarding the data poisoning benchmark, This Is Fine, I'm curious if this is actually a good benchmark for resistance to data poisoning. The actual thing we seem to be measuring here is speed of transfer learning, and declaring that slower is better. While slower speed of learning does increase resistance to data poisoning, it also seems bad for everything else we might want our AI to do. To me, this is basically a fine-tuning benchmark that we've ... (read more)

I'm sure each individual critic of EA has their own reasons. That said (intuitively, I don't have data to back this up, this is my guess) I suspect two main things, pre-FTX.

Firstly, longtermism is very criticisable. It's much more abstract, focuses less on doing good in the moment, and can step on causes like malaria prevention that people can more easily emotionally get behind. There is a general implication of longtermism that if you accept its principles, other causes are essentially irrelevant.

Secondly, everything I just said about longtermism -> ne... (read more)

Personally I have no idea if this is a worthy use of the median EA's time, but this is exactly the kind of interesting thinking I'd like to see. 

Without asking for rigor at this particular time, do you think some languages are better than others for one or more of these outcomes?

1
O Carciente
1y
Thanks! Re: specific languages, I think there are a few ways to think about it.

In terms of "best for your brain" (re: dementia, traumatic brain injury, etc.): I think the more different the better. So if your first language is synthetic, you should go with an analytic language and vice versa. In that same vein of thought, any language that has another alphabet and/or an entirely alternative writing system would be better too. Honourable mention also for sign languages, which combine additional motor skill practice on top of the linguistic and visual processing brain workout, and also everyone should know a bit of sign language anyway, because sometimes places are really loud or your throat is sore and it's hard to talk. So: Hindi, Mandarin, Korean, Japanese, Mongolian, Arabic, Greek, Russian, Javanese, ASL, etc.

In terms of trying to intellectually "weaponize" languages: any language that can be very easily and comfortably associated with a specific mode of thought. E.g. if you were very interested in reading a lot of communist philosophy in the original Russian and wanted to create a "communist" mode in your brain, or if you were very interested in learning to think more about theology and metaphysics (I personally think a lot of old philosophical takes on metaphysics are going to start becoming much more useful in the near future with the rise of AI and hyperglobalization) and wanted to read a lot of Jewish philosophy in Hebrew, or old Catholic philosophy in Latin, Islamic philosophy in Arabic, etc. The priority there is a language that has a very rich "backlog" of the thing you want to work with intellectually. So that would be things like Latin, Arabic, Mandarin, Hebrew, Russian, German, Sanskrit, Spanish, French, etc. One interesting note about the "mode" thing is that this is the one place where a language being dead might actually be a plus. But studying a dead language has its own drawbacks and is usually more demanding.

In terms of trying

Similar to Quadratic Reciprocity, I think people are using "disagree" to mean "I don't think this is a good idea for people to do", and not to mean "I think this comment is factually wrong".

For me, I have:

Not wanting to donate more than 10%.
("There are people dying of malaria right now, and I could save them, and I'm not because...I want to preserve option value for the future? Pretty lame excuse there, Jay.")

Not being able to get beyond 20 or so highly productive hours per week.
("I'm never going to be at the top of my field working like that, and if impact is power-lawed,  if I'm not at the top of my field, my impact is way less.")

Though to be fair, the latter was still a pressure before EA, there was just less reason to care because I ... (read more)

Hey Jay,

Over the years, I have talked to many very successful and productive people, and most do not, in fact, work more than 20 productive hours per week. If you have a job with meetings and low-effort tasks in between, it's easy to get to 40 hours plus. Every independent worker who measures hours of real mental effort is more in the 4-5 hours per day range. People who say otherwise tend to lie and change their numbers if you pressure them to get into the detail of what "counts as work" to them. It's a marathon, and if you get into that range every day, you'll do well.

Prior to EA, I worked as a software engineer. Nominally, the workday was 9-5, Monday to Friday. In practice, I found that I achieved around 20-25 hours of productive work per week, with the rest being lunch, breaks, meetings, or simply unproductive time. After that, I worked from home at other non-EA positions and experimented with how little I needed to get my work done, going down to as few as 10 hours per week - I could have worked more, but I only cared about comfortably meeting expectations, not excelling. 

For the last few months I've been upskillin... (read more)

4
dan.pandori
1y
Also a software engineer, and this also is a pretty spot on description for me. 25 hours of productive work is about my limit before I start burning out and making dumb mistakes.

I was FIRE before I became EA. My original plan was to do exactly what you suggested and reach financial independence first before moving into direct work. However, depending on what field you want to move into, it's also possible to make decent money while doing direct work as well - once I found that out for AI alignment, I decided to go into direct work earlier.

That said, I definitely agree with some of your claims. I donate 10%, and am not currently intending to donate more until I have enough money to be financially independent if I wanted to. I've ta... (read more)

This has been appearing both here and on LessWrong. At best, it's an automated spam marketing attempt. At worst (and more likely, imo) it's an outright scam. I've reported these posts, and would not recommend downloading the extension.

This comment can be deleted if moderators elect to delete this post.

I organise AI Safety Brisbane - there are no AI safety orgs in Brisbane, or even Australia, so before ever forming it, I had to consider the impact of members (including myself!) eventually leaving for London or the Bay Area to do work there. While we don't actively encourage people to do this, that certainly is the goal for some of the more committed members. 

My general way of handling this is to openly admit that I expect some amount of churn as a result of this, and that this is a totally reasonable thing for any member to do. I've also been consid... (read more)

I don't think this is apt. FTX was formed by a team of people who very much seemed to be EA-aligned and EA-inspired at the time of forming. Elie and Madoff had no such connection - Elie's only interaction with Madoff was as a victim.

You can quite reasonably make the claim that EA has very little blame, but certainly if you were going to rank both scenarios, we would at least be somewhat more at fault than Elie was.

Hi Edward,

First off - you have my sympathies. That sounds terrible, and I understand his and your anger about this. Unfortunately, there are a great many problems in the world, so EAs need to think carefully about where we should allocate our resources to do as much good as we can. Currently, you can save a life for around 3,500-5,500 USD, or for animals, focusing on factory farming can lead to tremendous gains (Animal Charity Evaluators estimates that lobbying for cage-free campaigns for hens can lead to multiple hen-years affected per dollar).

So, we ne... (read more)

4
Bella
1y
I upvoted both of the comments in this thread, because I empathised with Edward's feelings of horror/sadness at the animal abuse they witnessed, and because I thought that Jay's comment was a kind, honest, and full explanation of why Edward's proposal was unlikely to be taken up by others on this Forum :) Thanks both.

The actual usage of the abbey is very likely to be somewhere between these two numbers. Definitely I would expect it to be used far more than for one major conference per year, but I wouldn't expect 100% usage either. 

I am skeptical of the FOOM idea too, but I don't think most of this post argues effectively against it. Some responses here:

1.0/1.1 - This seems nonobvious to me. Do you have examples of these superlinear decays? This seems like the best argument of the entire piece if true, and I'd love to see this specific point fleshed out.

2 - An Elo of 0 is not the floor of capability. Elo is a relative ranking of competitors, not an objective measure of chess capability - it doesn't start at "you don't know how to play" at 0; it starts at 1200 being de... (read more)

1
𝕮𝖎𝖓𝖊𝖗𝖆
1y
Thanks for the detailed reply. I'll try and address these objections later.

I can see this point, but I'm curious - how would you feel about the reverse? Let's say that CEA chose not to buy it, and instead did conferences the normal way. A few months later, you're talking to someone from CEA, and they say something like:

Yeah, we were thinking of buying a nice place for these retreats, which would have been cheaper in the long run, but we realised that would probably make us look bad. So we decided to eat the extra cost and use conference halls instead, in order to help EA's reputation.

Would you be at all concerned by this statement, or would that be a totally reasonable tradeoff to make?

+1 to Jay's point. I would probably just give up on working with EAs if this sort of reasoning were dominant to that degree? I don't think EA can have much positive effect on the world if we're obsessed with reputation-optimizing to that degree; it's the sort of thing that can sound reasonable to worry about on paper, but in practice tends to cause more harm than good to fixate on in a big way.

(More reputational harm than reputational benefit, of the sort that matters most for EA's ability to do the most good; and also more substantive harm than substantiv... (read more)

4
Closed Limelike Curves
1y
I refuse to believe that renting out a conference hall would actually have cost more. Investing £15,000,000 would yield roughly £1,000,000 a year on the stock market. If you are spending a million pounds on the venue alone for a 1,000 person conference, you are not doing it right. A convention hall typically runs in the tens of thousands of dollars, not the millions. This is a 100x markup.
2
projectionconfusion
1y
Frankly, I would think that there was finally someone with a modicum of sense and understanding of basic PR working in the area, and upgrade my views of the competency of the organisation accordingly. Also, I'd note that "this will save money in the long run" is a fairly big claim that has not been justified. There are literally hundreds of conference venues within a reasonable distance of Oxford, all of which are run by professional event managers who are able to take advantage of specialisation and economies of scale, making the claim difficult to believe.
-2
Max Pietsch
1y
Optics is real. We live in the real world. Optics factor into QALYs or any other metric. Why would the reverse be true - that we ignore reputation-related effects, even if they are fully real?

I feel a bit awkward quoting the Bible, but there's one part that's super relevant to this discussion from a secular perspective. It's 1 Corinthians 8:6 to 8:13, and is basically like, "hey, we know doing X isn't bad, but anyone seeing us doing X would think we're casting away our principles, which would cause them to do wrong, so we're not going to do X." Here's the quote:

"yet for us there is one God, the Father, from whom are all things and for whom we exist, and one Lord, Jesus Christ, through whom are all things and through whom we exist. However, not all possess this knowledge. But some, through former association with idols, eat food as really offered to an idol, and their conscience, being weak, is defiled. Food will not commend us to God. We are no worse off if we do not eat, and no better off if we do. But take care that this right of yours does not somehow become a stumbling block to the weak. For if anyone sees you who have knowledge eating in an idol's temple, will he not be encouraged, if his conscience is weak, to eat food offered to idols? And so by your knowledge this weak person is destroyed, the brother for whom Christ died. Thus, sinning against your brothers and wounding their conscience when it is weak, you sin against Christ. Therefore, if food makes my brother stumble, I will never eat meat, lest I make my brother stumble."
3
EricHerboso
1y
It depends. In isolation, that statement does seem concerning to me, like they may have been overestimating the potential negative optics. What matters to me here is whether sufficient thought was put into all the different aspects. Clearly, they thought a lot about the non-optics stuff. I have no way of easily evaluating those kinds of statements, as I have very little experience organizing conferences. But I’m concerned that maybe there wasn’t sufficient thought given to just how bad the optics can get with this sort of thing. My career has been in communications, so I'm used to thinking about PR risks and advocating for thinking about those aspects. Perhaps I'm posting here with a bias from that point of view. If I were in a room with decision-makers, I'd expect my comments here to be balanced by arguments on the other side. Even so, my suspicion is that, if you write something like "do what really is good rather than just what seems good", you're more likely to be underestimating rather than overestimating PR risks.

The world, in general, is less rational and more emotion-driven than most of us would consider optimal.

EA pushes back against this trend, and as such is far more on the calculating side than the emotional side. This is good - but the correct amount of emotion is not zero, and it's quite easy for people like us who try to be more calculating to over-index on calculation, and forget to feel either empathy OR compassion for others. It can be true that the world is too emotion-driven, while a specific group isn't emotion-driven enough. Whether that's true of EA or not...I'm not sure.

There's a pretty major difference here between EA and most religions/ideologies.

In EA, the thing we want to do is to have an impact on the world. Thus, sequestering oneself is not a reasonable way to pursue EA, unless done for a temporary period.

An extreme Christian may be perfectly happy spending their life in a monastery, spending twelve hours a day praying to God, deepening their relationship with Him, and talking to nobody. Serving God is the point.

An extreme Buddhist may be perfectly happy spending their life in a monastery, spending twelve hours a da... (read more)

For those who don't want to follow links to a previous post and read the comments, the counterargument as I understand it (and derived, independently, before reading the comments) is:

For this to be a threat, we would need an AGI that was

1. Misaligned
2. Capable enough to do significant damage if it had access to our safety plans
3. Not capable enough to do a similar amount of damage without access to our safety plans

I see the line between 2 and 3 as very narrow. I expect almost any misaligned AI capable of doing significant damage using our plans to also be ... (read more)

2
Peter S. Park
1y
Thank you so much for the clarification, Jay! It is extremely fair and valuable. The underlying question is: does the increase in the amount of AI safety plans resulting from coordinating on the Internet outweigh the decrease in secrecy value of the plans in EV? If the former effect is larger, then we should continue the status-quo strategy. If the latter effect is larger, then we should consider keeping safety plans secret (especially those whose value lies primarily in secrecy, such as safety plans relevant to monitoring).  The disagreeing commenters generally argued that the former effect is larger, and therefore we should continue the status-quo strategy. This is likely because their estimate of the latter effect was quite small and perhaps far-into-the-future. I think ChatGPT provides evidence that the latter should be a larger concern than many people's prior. Even current-scale models are capable of nontrivial analysis about how specific safety plans can be exploited, and even how specific alignment researchers' idiosyncrasies can be exploited for deceptive misalignment.  I am uncertain about whether the line between 2 and 3 will be narrow. I think the argument of the line between 2 and 3 being narrow often assumes fast takeoff, but I think there is a strong empirical case that takeoff will be slow and constrained by scaling, which suggests the line between 2 and 3 might be larger than one might think. But I think this is a scientific question that we should continue to probe and reduce our uncertainty about!

I notice that I'm confused. 

I read your previous post. Nobody seemed to disagree with your prediction that a misaligned AI could do this, but everybody seemed to disagree that this was a problem such that fixing it was worth the large costs in coordination.

Thus, I don't really understand how this is supposed to be an update for those who disagreed with you. Could you elaborate on why you think this information would change people's minds?

9
Jay Bailey
1y
For those who don't want to follow links to a previous post and read the comments, the counterargument as I understand it (and derived, independently, before reading the comments) is: For this to be a threat, we would need an AGI that was 1. Misaligned, 2. Capable enough to do significant damage if it had access to our safety plans, and 3. Not capable enough to do a similar amount of damage without access to our safety plans. I see the line between 2 and 3 as very narrow. I expect almost any misaligned AI capable of doing significant damage using our plans to also be capable of doing significant damage without needing them. By contrast, the cost of not posting our plans online is a likely drastic reduction in the effectiveness of the AI alignment field, both in coordinating among existing members and in bringing new members in. While the threat that Peter talks about is real, it seems that we are in much more danger from slowing down our alignment progress than from giving AIs access to our notes.

It seems like this only works for people who want to be aligned with EA but are unsure if they're understanding the ideas correctly. This does not seem to apply to Elon Musk (I doubt he identifies as EA, and he would almost certainly simply ignore this certification and tweet whatever he likes) or SBF (I am quite confident he could have easily passed such a certification if he wanted to).

Can you identify any high-profile individuals right now who think they understand EA but don't, who would willingly go through a certification like this and thus make more accurate claims about EA in the future?

I don't see how, if this system had been popularised five years ago, this would have actually prevented the recent problems. At best, we might have gotten a few reports of slightly alarming behaviour. Maybe one or two people would have thought "Hmm, maybe we should think about that", and then everyone would have been blindsided just as hard as we actually were.

Also...have you ever actually been in a system that operated like this? Let's go over a story of how this might go.

You're a socially anxious 20-year-old who's gone to an EA meeting or two. You're ner... (read more)

1
Mindaugas
1y
Nice points as always. Main issue: one of the main issues with FTX was taking super high risks. It was unacceptable long ago. If reporting had been the norm, it seems likely that someone who had seen the decision-making process (and the decisions made) would have made private disclosures to EA management (reported many times, for many decisions). Would this information have prevented EA management from still taking a lot of money, or prompted them to take this seriously? I am leaning towards the answer of 'yes', because internal information is more valuable than public rumors. Action will surely be taken from this point onwards after being burned by this already. Your point about them being reported as "rude" in this situation is not the best example :) And the personalized stories you shared are important; I will take time to think more about such situations.

I don't think that not giving beggars money corrodes your character, though I do think giving beggars money improves it. This can easily be extended from "giving beggars money" to "performing any small, not highly effective good deed". Personally, it was getting into a habit of doing regular good deeds, however small or "ineffective", that moved me from "I intellectually agree with EA, but...maybe later" to "I am actually going to give 10% of my money away". I still actively look for opportunities to do small good deeds for that reason - investing in one's own character pays immense dividends over time, whether EA-flavored or not, and is thus a good thing to do for its own sake.

 

Fair point. I think, in a knee-jerk reaction, I adjusted too far here. At the very least, it seems that EAs are at least somewhat more likely to do good with power, given that this is their aim, than people who just want power for power's sake. It's still a downward adjustment on my part to the EV of EA politicians, but not all the way down to zero advantage over the median candidate of their political party.

And this is why utilitarianism is a framework, not something to follow blindly. Humans cannot do proper consequentialism. We are not smart enough. That's why, when we do consequentialist reasoning and the result comes out as "Therefore, we should steal billions of dollars", the correct response is not, in fact, to steal billions of dollars, but rather to treat the answer the same way you would if you concluded a car was travelling at 16,000 km/h in a physics problem - you sanity check the answer against common sense, realise you must have made a wrong turn somew... (read more)

Yes, agreed - but as a utilitarian-esque movement grows, the chances of a member pursuing reckless blind utilitarianism also grow, so we need to give the ideas you describe more prominence within EA.

Also worth noting here is that, as expected, EAs have in general condemned this idea, and SBF has gone against the standard wisdom of EA in doing this. I feel like EA's principles were broken, not followed, even though I agree SBF was almost certainly a committed effective altruist. The update, for me, is not "EA as an ideology is rotten when taken very seriously" but rather "EAs are, despite our commitments to ethical behaviour, perhaps no more trustworthy with power than anyone else."

This has caused me to pretty sharply reduce my probability of EA politicians being a good idea, but hasn't caused a significant update against the core principles of EA.

10
wock
1y

In Bayesian terms the update should be in the direction of EAs being less trustworthy than the average person, if you agree that the average CEO of a firm like FTX wouldn't have done what SBF did.

35
Jacy
1y

"EA's are, despite our commitments to ethical behaviour, perhaps no more trustworthy with power than anyone else."

I wonder if "perhaps no more trustworthy with power than anyone else" goes a little too far. I think the EA community made mistakes that facilitated FTX misbehavior, but that is only one small group of people. Many EAs have substantial power in the world and have continued to be largely trustworthy (and thus less newsworthy!), and I think we have evidence like our stronger-than-average explicit commitments to use power for good and the critical... (read more)

Right, I'm trying to say - just like normal fiat currency, crypto is meant to be a money, it's not an end in itself. So using the bar "I wouldn't buy this thing if I couldn't then trade it for something else" doesn't really make sense, because the whole point of the thing is that you can eventually trade it for something else. 

So, my understanding of your argument is that the ability to buy things with currency is literally the purpose of currency - it's a necessary but not sufficient condition for a valid cryptocurrency. Using it as a bar makes about... (read more)

Would you spend dollars for euros if you then knew for certain you could never "sell" ( i.e. trade it for something else) your euros?

Could you elaborate on this analogy? I feel like this is supposed to be a point for crypto, in context, but this feels like it's supporting my point. The answer to this is unequivocally "no". The only reason I would want euros, or dollars for that matter, is so I could then buy something with it. Money that can never be spent is useless.

1
Kevin_Cornbob
1y
Right, I'm trying to say - just like normal fiat currency, crypto is meant to be a money, it's not an end in itself. So using the bar "I wouldn't buy this thing if I couldn't then trade it for something else" doesn't really make sense, because the whole point of the thing is that you can eventually trade it for something else.  The value of money lies in its ability to serve as a store of value, unit of account, and medium of exchange. Crypto can in principle do these things. Also, insofar as there is value in other blockchain-based applications (e.g. DeFi), you need crypto in order to use those applications (for "gas" to pay for transactions).  Also, you can use crypto/blockchain to transact in dollars - so-called "stablecoins". So you can get the international transfer benefits but without the natural volatility that occurs in crypto prices. 

Upvoted, and updating on this somewhat. I don't think it's a sufficient value proposition to justify the entire enterprise, but this is very much a non-trivial gain made because of crypto - I'm sure you're not the only one to have done something like this.

I'm avoiding a full-on post at the moment because I'm very uncertain, so I figured I'd solicit opinions here, and see if I'm wrong. This post should be viewed as an attempt to invoke Cunningham's Law rather than something you should update on immediately. I expect some people to disagree, and if you do, I ask only that you tell me why. I am open to having my mind changed on this. I've thought this for a while, but now seems the obvious time to bring it up.

To me, cryptocurrency as a whole, at least in the form it is currently practiced, seems inherently ant... (read more)

3
rowboat1
1y
The evidence that should change your mind shouldn't be proof that crypto isn't speculative. All currency is speculative. The best evidence should be a more nuanced comparison of cryptocurrency against a conventional currency.

All currencies are subject to speculation in the way that you're talking about. When people feel like the dollar is not doing well, they're less likely to spend. When people feel more optimistic about the economy, they do spend. These are similar pressures to what you're talking about, but when you're operating at the size of a nation state, and you have the backing of your whole economy and a bunch of cash reserves, it's an order of magnitude less likely, but still possible, that your currency goes belly up. When a country's currency does go belly up, then everybody who has cash in that country's currency gets caught holding the bag in the way that you're talking about.

I think there's a significant network effect happening on the side of conventional currencies. Specifically, since so many people use it, there is extra utility, and less speculation. To make a fair comparison, you need to be talking about a cryptocurrency that has reached the same scale as a state-backed currency. It seems to me that a fair comparison is instead "if you scale cryptocurrency up to the same network size as conventional currencies, does it deliver better utility?" There are a few reasons to think that this might be true:
1. It's harder to steal money, because you have to convince the whole network (instead of just one bank).
2. You don't have to put your trust in a bank.
3. Transactions are more private.
4. Transactions are more convenient.

All of this is to say that when you scale up cryptocurrency, it might actually win a head-to-head match up against the dollar. Despite this, I think there are a few factors slowing down the uptake of cryptocurrency:
1. A negative reputation on the part of cryptocurrencies (the term is practically synonymous with sc
5
Kevin_Cornbob
1y
Cryptocurrency is meant to be a form of money, so I think the following quote doesn't really make sense as a bar: "Would you spend dollars for euros if you then knew for certain you could never 'sell' (i.e. trade it for something else) your euros?" The only difference in principle is that crypto is backed by math and cryptography rather than government enforcement.

More practically, as Habryka noted, crypto is useful for international payments. Because it relies on a blockchain (a public ledger that is operated by thousands of computers around the world), you can send payments from/to anywhere in the world as long as you have electricity and internet access. This means you can send money more or less instantly with low fees, which seems like a big improvement over current SWIFT infrastructure, though there are downsides.[1]

The blockchain/crypto enable use cases like DeFi (decentralized finance) that seem high potential. There are borrowing/lending protocols that allow depositors to get yield on their assets and borrowers to get access to additional capital. There are decentralized exchanges that enable participants to exchange assets. In principle this could extend to all viable financial services, including insurance and derivatives. Because these applications rely on smart contracts on the blockchain, they can provide financial services without the middlemen, enabling lower costs.

There are also the benefits of financial inclusion, since anyone around the world with electricity and internet access can participate, regardless of location, which in principle should help the global poor.

I've tried to highlight some of the upsides/potential upsides of the industry and technology, but the truth is that the entire space is incredibly young, and we don't know exactly how valuable this is going to end up being. You might wince at the analogy, but I think it's similar to someone in 1993 asking about the utility of the internet. There are fledgling applications that
1
publius
1y
Steelmanning the argument for the utility of crypto besides speculation or resale value. Baudrillard would call these crypto tokens pure signifier without signified. I disagree. I believe what tokens signify is hope. Undoubtedly, disenchantment has left a nihilistic void at the heart of society. Crypto, qua symbol, forms part of the re-enchantment of the world – a return to mythology. When someone invests in crypto, they receive utilons in return in the form of hope for a better future. They too, can become a multi-millionaire overnight and transform their life. Stories are important, the stories we believe in shape our quality of life. Crypto is the source of a story that helps people cope with difficult material circumstances.

I think crypto was genuinely an improvement for facilitating international money transfer. I personally transferred a bunch of my tuition payments from Germany to the U.S. via crypto, and it saved me a few thousand dollars in exchange payments and worse exchange rates. 

I think this had more to do with how terrible international money exchange is (or was at the time, before Wise.com started being much better than the competition), but I do think the crypto part was genuinely helpful here.

"Huh, this person definitely speaks fluent LessWrong. I wonder if they read Project Lawful? Who wrote this post, anyway? I may have heard of them.

...Okay, yeah, fair enough."

One thing I definitely believe, and have commented on before[1], is that median EAs (i.e., EAs without an unusual amount of influence) are over-optimising for the image of EA as a whole, which sometimes conflicts with actually trying to do effective altruism. Let the PR people and the intellectual leaders of EA handle that - people outside that should be focusing on saying what we sin... (read more)

I strongly endorse this (and also strongly endorse Eliezer's OP).

A related comment I made just before reading this post (in response to someone suggesting that we ditch the "EA" brand in order to reduce future criticism):

I strongly disagree -- first, because this is dishonest and dishonorable. And second, because I don't think EA should try to have an immaculate brand.

Indeed, I suspect that part of what went wrong in the FTX case is that EA was optimizing too hard for having an immaculate brand, at the expense of optimizing for honesty, integrity, open dis

... (read more)

One thing I definitely believe, and have commented on before[1], is that median EAs (i.e., EAs without an unusual amount of influence) are over-optimising for the image of EA as a whole, which sometimes conflicts with actually trying to do effective altruism. Let the PR people and the intellectual leaders of EA handle that - people outside that should be focusing on saying what we sincerely believe to be true

FWIW, I'm directly updating on this (and on the slew of aggressively bad faith criticism from detractors following this event).

I'll stop trying to... (read more)

A 10% chance of a million people dying is as bad as 100,000 people dying with certainty, if you're risk-neutral. Essentially that's the main argument for working on a speculative cause like AGI - if there's a small chance of the end of humanity, that still matters a great deal.
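
Spelled out, the risk-neutral expected-value comparison is just:

$$\mathbb{E}[\text{deaths}] = 0.10 \times 1{,}000{,}000 = 100{,}000$$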

As for "Won't other people take care of this", well...you could make that same argument about global health and development, too. More people is good for increasing potential impact of both fields.

(Also worth noting - EA as a whole does devote a lot of resources to global health and development, you just don't see as many posts about it because there's less to discuss/argue about)

Welcome to the forum.

I'm sorry you've had a rough time with your first posts! The norms here are somewhat different than a lot of other places on the internet. Personally I think they're better, but they can lead to a lot of backlash against people when they act in a way that wouldn't be unusual on, say, Twitter. Specifically, I would look at our commenting guidelines:

Commenting guidelines:

  • Aim to explain, not persuade
  • Try to be clear, on-topic, and kind
  • Approach disagreements with curiosity

This comment doesn't really fit the last two. It's rather uncharitabl... (read more)

Welcome to the forum!

I think your concerns are definitely valid. EA has very much a quantitative, numbers-focused bent to it, and that tends to attract a lot of men, which probably goes a long way towards explaining the demographics. (EA is around 70% male, according to the 2020 survey.) For instance, computer science is ~80% male, and is also a very common degree path among EAs. So you're definitely right both that men are more common, and that there's a strong emphasis on scientific ability and intelligence in EA.

That said, as you mentioned, one of EA's ... (read more)

I agree. If the Simulation Hypothesis became decently likely, we would want to answer questions like:

- Does our simulation have a goal? If so, what?
- Was our simulation likely created by humans?

Also, we'd probably want to be very careful with those experiments - observing existing inconsistencies makes sense, but deliberately trying to force the simulation into unlikely states seems like an existential risk to me - the last thing you want is to accidentally crash the simulation!
