
Roman Leventov

128 karma · Joined Nov 2022

Bio

An independent researcher of ethics, AI safety, and AI impacts. LessWrong: https://www.lesswrong.com/users/roman-leventov. Twitter: https://twitter.com/leventov. E-mail: leventov.ru@gmail.com (the preferred mode of communication).

Comments (38)

Hello Agustín, thanks for engaging with our writings and sharing your feedback.

Regarding the ambitiousness, the low chances of overall success, and the low chances of uptake by human developers and decision-makers (I emphasise "human" because if some tireless near-AGI or AGI comes along, it could dramatically change the cost of building agents for participation in the Gaia Network), we are in complete agreement.

But notice that the Gaia Network could be seen as a much-simplified (from the perspective of mathematics and machine learning) version of Davidad's OAA, as we framed it in the first post. The Gaia Network also tries to leverage (at least approximately and to some degree) the existing (political) institutions and economic incentives. In contrast, it's very unclear to me what the political economy of the "OAA world" could look like, or what a remotely plausible plan would be for switching from the incumbent political economy of the civilisation to OAA, or for "plugging" OAA "on top" of the incumbent political economy (neither has been discussed publicly anywhere, to the best of our knowledge). We also discussed this in the first post. Also, notice that due to its extreme ambitiousness, Davidad doesn't count on humans implementing OAA with their bare hands: it's a deal-breaker if there isn't an AI that can automate 99%+ of the technical work needed to convert current science into the Infra-Bayesian language.[1] And yes, the same applies to the Gaia Network: it's not feasible without massive assistance from AI tools that can do most of the heavy lifting. But if anything, this reliance on AI is less extreme in the case of the Gaia Network than in the case of OAA.

The above makes me think that you should be even more skeptical of OAA's chances of success than you are of Gaia's. Is this correct? If not, what do you disagree with in the reasoning above, or which elements of OAA make you think it's more likely to succeed?

Adoption

The "cold start" problem is huge for any system that counts on network effect, and Gaia Network is no exception. But this also means that the cost of convincing most decision-makers (businesses, scientists, etc.) to use the system is far smaller than the cost of convincing the first few, multiplied by the total number of agents. We have also proposed how various early adopters could get value out of the "model-based and free energy-minimising way" of doing decision-making (we don't need the adoption of Gaia Network right off the bat, more on this below) very soon, in absolutely concrete terms (monetary and real-world risk mitigation) in this thread.

In fact, we think that if there are sufficiently many AI agents and decision intelligence systems that are model-based, i.e., that use some kind of executable state-space ("world") models to run simulations, to hypothesise counterfactually about different courses of action and external conditions (sometimes in collaboration with other agents, i.e., planning together), and to deploy regularisation techniques (from Monte Carlo aggregation of simulation results to the amortised adversarial methods suggested by Bengio on slide 47 here) that permit compositional reasoning about risk and uncertainty beyond the boundary of a single agent, then the benefits of collaborative inference of the most accurate and well-regularised models will be so huge that something like the Gaia Network will emerge pretty much "by default", because a lot of scientists and industry players will work in parallel to build versions and local patches of it.
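To make the "Monte Carlo aggregation of simulation results" idea concrete, here is a minimal sketch of my own (not taken from the Gaia Network posts; the toy dynamics, the target utility, and the function names `step`, `utility`, and `evaluate_action` are all illustrative assumptions): an agent scores candidate actions by rolling a stochastic state-space model forward many times and penalising outcome spread as a crude risk term.

```python
import random
from statistics import mean, stdev

# Hypothetical toy world model: in a real model-based agent this would be a
# learned state-space ("world") model, not a one-line stochastic update.
def step(state: float, action: float) -> float:
    return state + action + random.gauss(0.0, 0.5)  # stochastic dynamics

def utility(state: float) -> float:
    return -abs(state - 10.0)  # prefer end states near a target of 10

def evaluate_action(action: float, state: float, horizon: int = 5, n_sims: int = 1000):
    """Monte Carlo aggregation: roll the model forward many times and summarise
    both the expected utility and its spread (a crude uncertainty/risk term)."""
    totals = []
    for _ in range(n_sims):
        s = state
        for _ in range(horizon):
            s = step(s, action)
        totals.append(utility(s))
    return mean(totals), stdev(totals)

if __name__ == "__main__":
    for a in (0.5, 1.0, 2.0, 3.0):
        mu, sigma = evaluate_action(a, state=0.0)
        # Risk-sensitive score: penalise uncertainty, not just low expected utility.
        print(f"action={a}: expected utility={mu:.2f}, spread={sigma:.2f}, score={mu - sigma:.2f}")
```

In a network setting, many agents would run such rollouts against shared and composed models, which is where collaborative inference would pay off.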

Blockchains, crypto, DeFi, DAOs

I understand why the default prior on hearing anything about crypto, DeFi, and DAOs these days is that people who propose something like this are either fantasists, or cranks, or, worse, scammers. That's unfortunate for everyone who just wants to use the technical advances that happen to be loosely associated with this field, which now includes almost anything to do with cryptography, identity, digital claims, and zero-knowledge computation.

Generally speaking, zero-knowledge (multi-party) computation is the only way to make certain proofs (of contribution, of impact, of lack of deceit, etc.) without compromising privacy (e.g., proprietary models, know-how, personal data). The ways of dealing with this dilemma "in the real world" today inevitably come down to some kind of surveillance, which many people are very uneasy about. For example, consider the present discussion of data centre audits and compute governance. It's fine with me and most other people (except for e/accs) for now, but what about the time when the cost of training powerful/dangerous models drops so much that anyone can buy a chip to train the next rogue AI for $1,000? What does compute governance look like in that world?
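For readers unfamiliar with the idea, here is a toy Schnorr identification protocol, a classic zero-knowledge proof of knowledge: the prover convinces the verifier that they know the secret x behind the public value y = g^x mod p without ever revealing x. The parameters are deliberately tiny and insecure; this is only a didactic sketch of the general concept, not the mechanism proposed for the Gaia Network (which would use production ZK systems).

```python
import secrets

# Toy public parameters (far too small for real use; illustration only).
p = 467   # prime, p = 2*q + 1
q = 233   # prime order of the subgroup generated by g
g = 4     # generator of the order-q subgroup mod p

def keygen():
    """Prover's secret x and public key y = g^x mod p."""
    x = secrets.randbelow(q - 1) + 1
    return x, pow(g, x, p)

def commit():
    """Step 1: prover picks a random nonce r and sends the commitment t = g^r mod p."""
    r = secrets.randbelow(q)
    return r, pow(g, r, p)

def respond(x, r, c):
    """Step 3: prover answers the verifier's challenge c without revealing x."""
    return (r + c * x) % q

def verify(y, t, c, s):
    """Verifier accepts iff g^s == t * y^c (mod p)."""
    return pow(g, s, p) == (t * pow(y, c, p)) % p

if __name__ == "__main__":
    x, y = keygen()              # only y is published
    r, t = commit()              # 1. commitment
    c = secrets.randbelow(q)     # 2. verifier's random challenge
    s = respond(x, r, c)         # 3. response
    print("proof accepted:", verify(y, t, c, s))   # True, while x stays private
```

The same shape of argument (prove a claim about hidden data without exposing the data) is what makes proofs of contribution or of lack of deceit compatible with privacy.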

Governance

I'm also skeptical of the theory of change. Even if AI Safety timelines were long, and we managed to pull this Herculean effort off, we would still have to deal with problems around AI Safety governance.

I don't think AI Safety governance is that special among other kinds of governance. But more generally on this point: of course governance is important, and the Gaia Network doesn't claim to "solve" it; rather, it plans to rely on solutions developed by other projects (see the numerous examples in the CIP ecosystem map, OpenAI's "Democratic Inputs to AI" grantees, etc.).

We only mention in passing incorporating the preferences of a system's stakeholders into Gaia agents' subjective value calculations (i.e., building reward models for these agents/entities, if you wish), but there is a lot to be done there: how the preferences of the stakeholders are aggregated and weighted, who can claim to be a stakeholder of this or that system in the first place, etc. Likewise, on the general Gaia diagram in the post, there is a small arrow from the "Humans and collectives" box to the "Decision Engines" box labelled "Review and oversight", and, as you can imagine, there is a lot going on there as well.

Why would AGI companies want to stick to this way of developing systems?

IDK; being convinced that this is a safe approach? Being coerced (including economically, not necessarily by force) by the broader consensus around using such Gaia Network-like systems? This is a collective action problem. The same question could be addressed to any AI Safety agenda, and the answer would be the same.

Moloch

It's also the case that this project also claims to be able to basically be able to slay Moloch[3]. This seems typical of solutions looking for problems to solve, especially since apparently this proposal came from a previous project that wasn't related to AI Safety at all.

I wouldn't say that we "claim to be able to slay Moloch". Rafael is bolder in his claims and phrasing than I am, but I think even he wouldn't say that. I would say that the project looks very likely to help counteract Molochian pressures. But this seems to me an almost self-evident statement, given the nature of the proposal.

Compare with the Collective Intelligence Project. It started with the mission to "fix governance" (and pretty much to "help counteract Moloch" in the domain of political economy, too; they may not have used this exact concept, or maybe they did, I don't want to check it now), and now they have "pivoted" to AI safety and achieved great legibility on this path: e.g., they apparently partner with OpenAI on more than one project now. Does this mean that CIP is a "solution looking for a problem"? No, it's just the kind of project that naturally lends itself to helping both with Moloch and with AI safety. The same could be said of the Gaia Network (if it is realised in some form), and this lies pretty much in plain sight.

Furthermore, this shouldn't be surprising in general, because the AI transition of the economy is evidently an accelerator and a risk factor in the Moloch model, and therefore these domains (Moloch and AI safety) almost merge in my overall model of risk. Cf. Scott Aaronson's reasoning that AI will inevitably be in the causal structure of any outcome of this century, so "P(doom from AI)" is not well defined; I agree with him and only think about "P(doom)" without specifying what this doom "comes from". Again, note that most narratives about possible good outcomes (take OpenAI's superalignment plan, Conjecture's CoEm agenda, OAA, the Gaia Network) all rely on developing very advanced (if not superhuman) AI along the way.

  1. ^

    Notice here again: you mention that most scientists don't know about Bayesian methods, but perhaps two orders of magnitude fewer scientists have even heard of Infra-Bayesianism, let alone been convinced that it's a sound and necessary methodology for doing science. For Bayesianism, by contrast, it seems to me that there is quite a broad consensus on its soundness: numerous pieces and even books have been written about how P-values are a bullshit way of doing science and how scientists should take up (Bayesian) causal inference instead.

    There are a few notable voices that dismiss Bayesian inference, for example David Deutsch, but then no less notable voices, such as Scott Aaronson and Sean Carroll (among the people I've heard, anyway), dismiss Deutsch's dismissal in turn.

I'm excited to see you posting this. My views align very closely with yours; I summarised them a few days ago here.

One of the most important similarities is that we both emphasise the importance of decision-making and of supporting it with institutions. This could be seen as an "enactivist" view of agent (human, AI, hybrid, team/organisation) cognition.

The biggest difference between our views is that I think the "cognitivist" agenda (i.e., agent internals and algorithms) is as important as the "enactivist" agenda (institutions), whereas you seem to almost disregard the "cognitivist" agenda.

Try to constrain, delay, or obstruct AI, in order to reduce risk, mitigate negative impacts, or give us more time to solve essential issues. This includes, for example, trying to make sure AIs aren't able to take certain actions (i.e. ensure they are controlled).

I disagree with putting risk-detection/mitigation mechanisms, algorithms, and monitoring in that bucket. I think we should just distinguish between engineering (cf. A plea for solutionism on AI safety) and non-engineering (policy, legislature, treaties, commitments, advocacy) approaches. In particular, the "scheming control" agenda that you link will be a concrete engineering practice that should be used in the training of safe AI models in the future, even if we have good institutions, good decision-making algorithms wrapped on top of these AI models, etc. It's not an "alternative path" just for "non-AI-dominated worlds". The same applies to monitoring, interpretability, evals, and similar processes. All of these will require very elaborate engineering of their own.

I 100% agree with your reasoning about Frames 1 and 2. I want to discuss the following point in detail because it's a rare view in EA/LW circles:

It (IMO) wrongly imagines that the risk of coups comes primarily from the personal values of actors within the system, rather than institutional, cultural, or legal factors.

In my post, I made a similar point: "'aligning LLMs with human values' is hardly a part of [the problem of context alignment] at all". But my framing was in general not very clear, so I'll try to improve it and integrate it with your take here:

Context alignment is a pervasive process that happens (and is sometimes needed) on all timescales: evolutionary, developmental, and online (examples of the latter in humans: understanding, empathy, rapport). The skill of context alignment is extremely important and should be practiced often by all kinds of agents in their interactions (and therefore we should build this skill into AIs), but it's not something that we should "iron out once and for all". That would be neither possible (agents' contexts are constantly diverging from each other) nor desirable: (partial) misalignment is also important, as the source of diversity that enables evolution[1]. Institutions (norms, legal systems, etc.) are critical for channelling and controlling this misalignment so that it's optimally productive and doesn't pose excessive risk (though some risk is unavoidable: that's the essence of misalignment!).

Flexible yet resilient legal and social structures that can adapt to changing conditions without collapsing

This is interesting. I also discussed this issue, as "morphological intelligence of socioeconomies", just a few days ago :)

Good incentives for agents within the system, e.g. the economic value of trade is mostly internalized

Rafael Kaufmann and I have a take on this in our Gaia Network vision. The Gaia Network's term for the internalised economic value of trade is subjective value. The unit of subjective accounting is called FER. Trade with FER induces flows that define the intersubjective value, i.e., the "exchange rates" of "subjective FERs". See the post for more details.

While sharing some features of the other two frames, the focus is instead on the institutions that foster AI development, rather than micro-features of AIs, such as their values

As I mentioned at the beginning, I think you are too dismissive of the "cognitivist" perspective. We shouldn't paint all "micro-features of AIs" with the same brush. I agree that value alignment is over-emphasised[2], but other engineering mechanisms and algorithms, such as decision-making algorithms, "scheming control" procedures, and context alignment algorithms, as well as architectural features, namely being world-model-based[3] and being amenable to computational proofs[4], are very important and couldn't be recovered at the institutional/interface/protocol level. We demonstrated in the post about the Gaia Network above that for the "value economy" to work as intended, agents should make decisions based on maximum entropy rather than maximum likelihood estimates[5] (a toy illustration follows after the footnotes), and they should share and compose their world models (even if in a privacy-preserving way, with zero-knowledge computations).

  1. ^

    Indeed, this observation makes it evident that the refrain question "AI should be aligned with whom?" doesn't and shouldn't have a satisfactory answer if "alignment" means the totalising value alignment often conceptualised on LessWrong; on the other hand, if "alignment" means context alignment as a practice, the question becomes as nonsensical (in its general form) as the question "AI should interact with whom?" -- well, with someone, depending on the situation, in the way and to the degree appropriate!

  2. ^

    However, it is still not completely irrelevant, at least for practical reasons: having shared values at the pre-training/hard-coded/verifiable level, as a minimum, reduces transaction costs, because AI agents then don't need to painstakingly "eval" each other's values before doing any business together.

  3. ^

    Both Bengio and LeCun argue for this: see "Scaling in the service of reasoning & model-based ML" (Bengio and Hu, 2023) and "A Path Towards Autonomous Machine Intelligence" (LeCun, 2022).

  4. ^
  5. ^

    Which is just another way of saying that they should minimise their (expected) free energy in their model updates/inferences and in the course of their actions.
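Here is the toy illustration referenced above (my own made-up numbers and action names, not the Gaia Network's actual decision rule): it contrasts a decision that commits to the single maximum-likelihood hypothesis with one that averages utility over the whole posterior, and adds an entropy-regularised ("free-energy-like") soft policy in the spirit of footnote 5.

```python
import math

# Toy posterior over two world hypotheses, and utilities of two actions under each.
posterior = {"h1": 0.6, "h2": 0.4}
utility = {                 # utility[action][hypothesis]
    "safe":  {"h1": 1.0, "h2": 1.0},
    "risky": {"h1": 2.0, "h2": -5.0},
}

# Maximum-likelihood decision: commit to the single most probable hypothesis.
ml_hypothesis = max(posterior, key=posterior.get)
ml_choice = max(utility, key=lambda a: utility[a][ml_hypothesis])

# Posterior-averaged decision: integrate utility over the whole posterior.
expected = {a: sum(posterior[h] * u for h, u in us.items()) for a, us in utility.items()}
bayes_choice = max(expected, key=expected.get)

# Entropy-regularised ("free-energy-like") policy: a softmax over expected utilities
# keeps some probability on every action instead of collapsing to a point estimate.
temperature = 1.0
z = sum(math.exp(v / temperature) for v in expected.values())
policy = {a: math.exp(v / temperature) / z for a, v in expected.items()}

print("ML decision:", ml_choice)                               # 'risky': ignores the 40% downside
print("Posterior-averaged decision:", bayes_choice, expected)  # 'safe'
print("Soft policy:", policy)
```

The point of the toy numbers is that the maximum-likelihood agent picks the risky action because it looks best under the single most probable hypothesis, while the agent that keeps the full posterior (and hence its uncertainty) in play picks the safe one.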

2. Historically, the Federal Reserve has failed to subsequently tighten monetary stimulus

I think a Keynesian idea that lies at the heart of Federal Reserve philosophy is that during economic recessions you offer monetary stimulus to support virtuous (re-)investment cycles, and during boom years you tighten market access to funds to prevent debt-fuelled excesses.

I would be really interested in reading somebody more qualified than me revisiting this point after two years (or this post more generally). It seems that right now the economy and the financial system are closer to a third mode than they have been in decades (?): in 2024 in the US, we could see relatively low GDP growth and a sustained, relatively high interest rate if the Fed decides that lowering the rate prematurely could spike inflation again.

However, this is still pretty mild and benign in the US, whereas the UK faces a real risk of outright stagflation, especially if the Labour party wins the next elections, which currently seems almost certain.

Another point, which could be seen as a generalisation of your first point: a larger money supply than is optimal for the economy just makes the entire economy less efficient. It's entirely analogous to how an organism accumulates excess fat and becomes inefficient (and incurs health risks) when overfed (and I strongly suspect that this is deeper than a superficial analogy, that there is a general law of system development behind it).

More concretely:

  • Excessive bureaucratisation and the creation of "bullshit jobs", i.e., unproductive jobs whose main functions are organisational politics and labour union relationships. BTW, these bullshit jobs also mask deeper issues with employment: for example, the participation rate (which is different from "100% - unemployment", though) is an important economic metric, but it seems to be misleading (Goodharted, if you wish): what we are really interested in is the productive participation rate.
    • A low productive participation rate is not only due to bullshit jobs, but also due to the excessively many rentiers and early retirees created by too high a money supply. Paradoxically, this leads to a decrease in the quality of life: many people have enough money to retire from productive jobs => a shortage of real-sector labour: doctors, public transport workers, street cleaners, sewer maintenance workers, electricians, etc. => public services start to crumble, wait times for non-public services (e.g., private doctors) increase, and the prices of both rise => a lower quality of life even for those rentiers and retirees!
  • The creation of totally inefficient startups that burn money and don't have a path towards profitability, but still raise a lot of funds. It is possible to overshoot in the other direction, too, but the current situation, with so many deeply unprofitable unicorns in late venture investment rounds and even long after their IPOs, is unique.
    • A downstream effect of this unhealthy startup environment is enshittification (https://en.wikipedia.org/wiki/Enshittification), which is at least partially caused or exacerbated by the monetary overhang. Too many startups are funded and are confident that they can keep raising money for a very long time => price wars and a winner-take-all mentality => quasi-monopolies and enshittification. We have seen this with Uber, and we are likely to see it on a massive scale with AI startups, which are currently free or massively subsidised but will later come to cost us an arm and a leg. But note that I'm not saying that the oversupply of money is the only cause of this dynamic.

Another incentive for private firms to do R&D is to make their product at least 30% more valuable than the competition's, to compel customers to switch. This applies mostly to B2B and to long-term products and contracts.

And another reason why governments don't do more R&D is that they cannot do it effectively outside the context of concrete problems and without the resources (infrastructure, tools, know-how, data, etc.) available to a specific firm.

Announcement

I think SociaLLM has a good chance of getting OpenAI's "Research into Agentic AI Systems" grant because it addresses both the challenge of the legibility of an AI agent's behaviour, by making the agent's behaviour more "human-like" thanks to the weight sharing and regularisation techniques/inductive biases described in the post, and automatic monitoring: detecting duplicity or deception in an AI agent's behaviour by comparing the agent's ToMs "in the eyes" of different interlocutors, building on the work "Collective Intelligence in Human-AI Teams".
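To sketch the monitoring idea (my own toy illustration, not part of the SociaLLM design: `duplicity_score`, the 64-dimensional embeddings, and the random vectors are hypothetical placeholders for learned ToM representations), the ToM embeddings that different interlocutors hold of the same agent are compared pairwise, and a large divergence is flagged as potential duplicity.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def duplicity_score(tom_embeddings: dict) -> float:
    """Hypothetical monitoring signal: 1 minus the minimum pairwise similarity
    between the ToM embeddings different interlocutors hold of the same agent.
    A high score means the agent presents very different 'faces' to different partners."""
    names = list(tom_embeddings)
    sims = [
        cosine(tom_embeddings[a], tom_embeddings[b])
        for i, a in enumerate(names) for b in names[i + 1:]
    ]
    return 1.0 - min(sims) if sims else 0.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=64)
    consistent = {"alice": base + 0.05 * rng.normal(size=64),
                  "bob":   base + 0.05 * rng.normal(size=64)}
    two_faced  = {"alice": base,
                  "bob":   rng.normal(size=64)}   # a very different impression
    print("consistent agent:", round(duplicity_score(consistent), 2))  # near 0
    print("two-faced agent: ", round(duplicity_score(two_faced), 2))   # near 1
```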

I am looking for co-investigators for this project (up to $100k, up to 8 months long) with hands-on academic or practical experience in DL training (preferably), ML, Bayesian statistics, or NLP. The deadline for the grant application itself is the 20th of January, so I need to find a co-investigator by the 15th of January.

Another requirement is that the co-investigator should preferably be in academia, a non-profit, or independent at the moment.

I plan to be hands-on during the project in data preparation (cleansing, generation by other LLMs, etc.) and training, too. However, I don’t have any prior experience with DL training, so if I apply for the project alone, this is a significant risk and a likely rejection.

If the project is successful, it could later be extended for further grants or turned into a startup.

If the project is not a good fit for you but you know someone who may be interested, I’d appreciate it a lot if you shared this with them or within your academic network!

Please reach out to me in DMs or at leventov.ru@gmail.com.

How about coordination failures and multi-scale planning failures (optimising both for the short term and the long term)? Both have economic value at stake (i.e., economic value is lost when these failures happen), and both are at least in part due to the selfish, short-term, impulsive motives/desires/"values" of humans.

E.g., I think people would like to buy an AI that manipulates them, through some tricks, into following their exercise plan, and likewise they would like to collectively "buy" (build) an AI that restricts their selfishness for the median benefit and for the benefit of their own children and grandchildren.
