All of prattle's Comments + Replies

If the amount of happiness (or suffering) possible is not linear in the number of elementary particles, what number of elementary particles do you suggest using?
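To put "not linear" in symbols, as a toy illustration of my own rather than anything from the thread: the worry is something like

$$H(n) = c\,n^{\alpha}, \qquad \alpha \neq 1,$$

where $H(n)$ is the happiness (or suffering) achievable with $n$ elementary particles. If $\alpha \neq 1$, weighting a universe by its raw particle count misstates how much welfare it can actually support.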

4
Ben_West
2y
I am not sure, and I think this implies a good objection to my suggestion.

I think the excerpt is getting at: "Suppose all possible universes exist (no claim about likelihood is made; it's an assumption for the post). Then it is likely that some possible universes -- with far more resources than ours -- are running a simulation of our universe. The behaviour of that simulated universe is the same as ours (it's a good simulation!), and in particular, the behaviour of the simulations of us is the same as our behaviour. If that's true, our behaviour could, through the simulation, influence a much bigger and better-resourced world." ...
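A toy expected-value version of that argument, with my own notation and no numbers from the excerpt:

$$E[\text{impact}] = (1-p)\,v_{\text{here}} + p\,v_{\text{host}},$$

where $p$ is the probability that our universe is simulated, $v_{\text{here}}$ is the value our actions affect directly, and $v_{\text{host}}$ is the value they affect in the better-resourced host universe via the simulation. If $v_{\text{host}} \gg v_{\text{here}}$, then even a small $p$ makes the host-universe term dominate.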

How likely do you think it would be for standard ML research to solve the problems you're working on in the course of trying to get good performance? Do such concerns affect your project choices much?

For the contamination sentence: what's wrong with equipment and media sterilization? Why wouldn't we just grow meat in sterilized equipment in managed facilities? Also, couldn't we just sterilize after the fact?

For the sensitivity/robustness sentence: why does it need to be robust? Can't it just be grown in a special facility? It's not like you can mimic the Doritos production process at home, but that doesn't stop a lot of Doritos from being made. Why would the bioreactor need to be placed outside?

For waste management: This does seem necessary. But months / years of cont...

I'm pretty confused by your paragraph describing the "futuristic bioreactor". It seems like we want almost none of those features for cultured meat.

The only parts that seem like they would be needed are "[...] assembling those molecules into muscle and fat cells, and forming those cells into the complex tissues we love to eat" and "It has precise environmental controls and gas exchange systems that keep the internal temperature, pH, and oxygen levels in the ideal range for cell and tissue growth".

Some (though not all) of the others seem like they might be useful if we were to try and make cultured meat production as decentralizable as current meat production (and far more decentralized than factory farming).
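As a side note on the environmental-controls part quoted above: here is a minimal sketch of the kind of feedback loop it implies, with entirely hypothetical setpoints, gains, and channel names (none of this is from the post):

```python
# Toy proportional controller for a bioreactor's environmental parameters:
# read each sensor, compare it to its setpoint, and return a correction
# for the matching actuator (heater, base pump, air sparger).
SETPOINTS = {"temp_C": 37.0, "pH": 7.4, "dissolved_O2_pct": 50.0}
GAINS = {"temp_C": 0.5, "pH": 0.2, "dissolved_O2_pct": 1.0}

def control_step(readings):
    """Map sensor readings to actuator adjustments (positive = increase)."""
    return {k: GAINS[k] * (SETPOINTS[k] - readings[k]) for k in SETPOINTS}

# A culture that is slightly cold and acidic gets positive temp/pH corrections:
print(control_step({"temp_C": 36.5, "pH": 7.3, "dissolved_O2_pct": 55.0}))
```

A real system would use PID control and calibrated hardware; the point is just that this particular feature is ordinary process control.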

4
kyle_fish
2y
We might not have to replicate the animal systems precisely, but we'd definitely need cheap solutions to the problems of contamination (3rd sentence), sensitivity/robustness (5th sentence), waste management (6th sentence), and scalability (7th and 8th sentences). All of these are currently huge issues for any biomanufacturing.

Do you think that different trajectories of prosaic TAI have big impacts on the usefulness of your current project? (For example, perhaps you think that agentic TAI would just be taught to deceive.) If so, which? If not, could you say something about why it seems general?

(NB: the above is not supposed to imply criticism of a plan that only works in some worlds).

6
Buck
2y
I think this is a great question. We are researching techniques that are simpler precursors to adversarial training techniques that seem most likely to work if you assume that it’s possible to build systems that are performance-competitive and training-competitive, and do well on average on their training distribution.

There are a variety of reasons to worry that this assumption won’t hold. In particular, it seems plausible that humanity will only have the ability to produce AGIs that will collude with each other if it’s possible for them to do so. This seems especially likely if it’s only affordable to train your AGI from scratch a few times, because then all the systems you’re using are similar to each other and will find collusion easier. (It’s not training-competitive to assume you’re able to train the AGI from scratch multiple times, if you believe that there’s a way of building an unaligned powerful system that only involves training it from scratch once.) But even if we train all our systems from scratch separately, it’s pretty plausible to me that models will collude, either via acausal trade or because the systems need to be able to communicate with each other for some competitiveness reason.

So our research is most useful if we’re able to assume a lack of such collusion. I think that some people think you might be able to apply these techniques even in cases where you don’t have an a priori reason to be confident that the models won’t collude; I don’t have a strong opinion on this.

Does it make sense to think of your work as aimed at reducing a particular theory-practice gap? If so, which one (i.e., what theory, or what needed input to a theoretical alignment scheme)?

5
Buck
2y
I think our work is aimed at reducing the theory-practice gap of any alignment schemes that attempt to improve worst-case performance by training the model on data that was selected in the hope of eliciting bad behavior from the model. For example, one of the main ingredients of our project is paying people to try to find inputs that trick the model, then training the model on these adversarial examples. Many different alignment schemes involve some type of adversarial training. The kind of adversarial training we’re doing, where we just rely on human ingenuity, isn’t going to work for ensuring good behavior from superhuman models. But getting good at the simple, manual version of adversarial training seems like plausibly a prerequisite for being able to do research on the more complicated techniques that might actually scale.
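In code, the loop Buck describes might look something like the sketch below. Everything here is a toy stand-in of mine (the classifier, the data, and the collect_human_adversarial_examples stub), not Redwood's actual pipeline:

```python
# Sketch of human-in-the-loop adversarial training: people hunt for inputs
# that fool the current model, and those failures become new training data.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression

vectorizer = HashingVectorizer(n_features=2**12)

def collect_human_adversarial_examples(model, n):
    """Placeholder for the paid red-teaming step: humans search for inputs
    the model misclassifies and return them with corrected labels."""
    return [("an injurious completion the model missed", 1)] * n

# (text, label) pairs; label 1 marks a policy violation.
data = [("a harmless sentence", 0), ("an obviously injurious sentence", 1)]
model = LogisticRegression()

for round_num in range(3):
    texts, labels = zip(*data)
    model.fit(vectorizer.transform(texts), labels)
    # Red-teamers probe the freshly trained model for failures...
    failures = collect_human_adversarial_examples(model, n=5)
    # ...and the failures are folded into the next round's training set.
    data.extend(failures)
```

The scaling limit, as the comment says, is the human-ingenuity step: nothing in this loop helps once the remaining failures are ones humans can't find.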