17 karma · Joined May 2021


Yes, the information is available on Google. The question is, in our eyes, more about whether a future model could successfully walk an unskilled person through the process without the person needing to understand it at all.

The paper is an attempt to walk a careful line: warning the world that the same information in more capable models could be quite dangerous, while not actually increasing the likelihood of someone using the current open source models (which it is too late to control!) to make biological weapons.

If there are specific questions you have, I'd be happy to answer.

For what it's worth, my model of a path to safe AI looks like a narrow winding path along a ridge with deadly falls to either side.

Unfortunately, the deadly falls to either side have illusions projected onto them of shortcuts to power, wealth, and utility. I don't think there is any path to safety that doesn't pass through a long stretch with immediate danger close by. In this model, deliberately and consistently optimizing for safety above all else during the dangerous stretch is the only way to make it through.

The danger zone is where the model is powerful and agentic enough that a greedy, shortsighted person could say to it, "Here is access to the internet. Make me lots of money," and this would result in a large stream of money pouring into their account. I think we're only a few years away from that point, and that the actions that safety researchers take in the meantime aren't going to change that. So, we need both safety research and governance, and carefully selecting disproportionately safety-accelerating research would be entirely irrelevant to the strategic landscape.

This is just my view, and I may be wrong, but I think it's worth pointing out that there's a chance that the idea of trying to do disproportionately safety-accelerating research is a distraction from strategically relevant action.

Cool, thanks. Sorry for sounding a bit hostile, I'm just really freaked out by my strongly held inside view that we have less than 10 years until some really critical tipping point stuff happens. I'm trying to be reasonable and rational about this, but sometimes I react emotionally to comments that seem to be arguing for a 'things will stay status quo for a good while, don't worry about the short term' view.

Calling my strongly held inside view 'fringe' doesn't carry much weight as an argument for me. Do you have actual evidence for your view that timelines are longer than 10 years?

I hold the view that important scientific advancements tend to come disproportionately from the very smartest and most thoughtful people. My hope would be that students smart enough to be meaningfully helpful on the AGI alignment problem would be able to think through and form correct inside views on this.

If we've got maybe 2-3 years left before AGI, then 2 years before starting is indeed a large percentage of that remaining time. Even if we have more like 5-10... maybe better to just start trying to work directly on the problem as best you can than to let yourself get distracted by acquiring general background knowledge.

So here's a funny twist. I personally have been longtermist since independently coming to the conclusion that it was the correct way to conceptualize ethics, around 30 years ago. I realized that I cared about future people about as much as current people far away. After some thought, I settled on global health/poverty/rule-of-law as one of my major cause areas because I believe that bringing current people out of bad situations is good not only for them but for the future people who will descend from them or be neighbors of their descendants, etc. It is also because society as a whole sees these people suffering, thinks and talks about them, and adjusts its ethical decision-making accordingly. I think that the common knowledge that we are part of a worldwide society which allows children to starve or suffer from cheaply curable diseases negatively influences our perception of how good our society COULD be. I think my other cause areas, like existential risk mitigation and planning for sustainable exponential growth, are also important, but... Suppose we succeed at these two, and fail at the first. I don't want a galaxy-spanning civilization which allows a substantial portion of its subjects to suffer hugely from preventable problems the way we currently allow our fellow humans to suffer. That wouldn't be worse than no-galaxy-spanning-civilization, but it would be a lot less good than one which takes reasonable care of its members.

Sounds like good work. Trying to get the right information to the right people who can pass that on to decision makers seems useful. 


Two things I would like you to consider having some cached thoughts on to offer to the right people would be:

1. Food security under adverse circumstances (e.g. highlights from the work of ALLFED, like seaweed farming).

2. Global cooling work (e.g. Silver Lining, and doubling up with reef protection by spraying seawater over reefs), and the combined carbon sequestration and soil quality improvement that biochar offers as an agricultural practice. Biochar is neat because it's advantageous to the individual farmer, as well as being good for the world, so the local incentive structure is aligned.

Ok, these are all pretty simplified and I think you'd need to understand a bit more background to move the conversation on from these points, but not bad. Except for the 'why not merge with AI' response. That one is responding as if 'merge' meant physically merge, which is not what is meant by that argument. The argument means to merge minds with the AI, to link brains and computers together in some fashion (e.g. Neuralink) such that there is high-bandwidth information flow, and thus be able to build an AGI system which contains a human mind.

Here's a better argument against that: human values do not permeate the entire human brain; it is possible to have a human without a sense of morality. You cannot guarantee that a system of minds including a human mind (whether running on biological tissue or computer hardware) would in fact be aligned just because the human portion was aligned pre-merge. It is a strange and novel enough entity to need study and potentially alignment just like any other novel AGI prototype.

I know this is the conclusion of a report, so it's too late to suggest an addition now, but I think that in the future it would be very much worth looking into Polis, the political dispute resolution framework described in a recent 80,000 Hours podcast interview with Audrey Tang.
