
Roman Leventov

75 karma · Joined Nov 2022

Bio

An independent researcher of ethics, AI safety, and AI impacts. LessWrong: https://www.lesswrong.com/users/roman-leventov. Twitter: https://twitter.com/leventov. E-mail: leventov.ru@gmail.com (the preferred mode of communication).

Comments (22)

(Cross-posted from LW)

@Nathan Helm-Burger's comment made me think it's worthwhile to reiterate here the point that I periodically make:

Direct "technical AI safety" work is not the only way for technical people (who think that governance & politics, outreach, advocacy, and field-building work doesn't fit them well) to contribute to the larger "project" of "ensuring that the AI transition of the civilisation goes well".

Now that powerful LLMs are available, it is a golden age for building innovative systems and tools to improve[1]:

I believe that if such projects are approached with integrity, thoughtful planning, and AI safety considerations at heart, rather than with short-term thinking (specifically, not considering how the project will play out if or when AGI is developed and unleashed on the economy and society) and profit-extraction motives, they could shape the trajectory of the AI transition in a positive way, and their impact may be comparable to some direct technical AI safety/alignment work.

In the context of this post, it's important that the verticals and projects mentioned above could either be conventionally VC-funded, because they can promise direct financial returns to investors, or could receive philanthropic or government funding that wouldn't otherwise go to technical AI safety projects. Also, there are a number of projects in these areas that are already well-funded and hiring.

Joining such projects might also be a good fit for software engineers and other IT and management professionals who don't feel they are smart enough, or have the right intellectual predispositions, to do good technical research anyway, even if there were enough well-funded "technical AI safety research orgs". There should be some people who do science and some people who do engineering.

  1. ^

    I didn't do serious due diligence and impact analysis on any of the projects mentioned. The mentioned projects are just meant to illustrate the respective verticals, and are not endorsements.

The "nature coin" complicates this experiment a lot. Also, it sounds like a source of inherent randomness in policy's outcomes, i.e., aleatoric uncertainty, which is perhaps rarely or never the case for actual policies and therefore evaluating such policies ethically is unnatural: the brain is not trained to do this. When people discuss and think about the ethics of policies, even the epistemic uncertainty is often assumed away, though it is actually very common that we don't know whether a potential policy will turn out good or bad.

Due to this, I would say I have a preference for intervention E, because it's the only one that doesn't actually depend on the "nature coin".

I think a lot of people who have been aware of AI risk for some time but nevertheless choose to work on some other causes, such as climate change, may implicitly hold this view.

Fertility rate may be important but to me it's not worth restricting (directly or indirectly) people's personal choices for.

This is a radical libertarian view that most people don't share. Is it worth restricting people's access to hard drugs? Let's abstract for a moment from the numerous negative secondary effects that come with the fact that hard drugs are illegal, as well as from the crimes committed by drug users: if we imagine that hard drugs could simply be eliminated from Earth completely, with a magic spell, should we do it, or should we "not restrict people's choices"? With AI romantic partners, and other forms of tech, we do have a metaphorical magic wand: we can decide whether such products ever get created or not.

A lot of socially regressive ideas have been justified in the name of "raising the fertility rate" – for example, the rhetoric that gay acceptance would lead to fewer babies (as if gay people can simply "choose to be straight" and have babies the straight way).

The example that you give doesn't work as evidence for your argument at all, due to a direct disanalogy: the "young man" from the "mainline story" that I outlined could want to have kids in the future, or may even already want kids when he starts his experiment with an AI relationship, but his experience with the AI partner will prevent him from realising this desire and value over the rest of his life.

I think it's better to encourage people who are already interested in having kids to do so, through financial and other incentives.

Technology, products, and systems are not value-neutral. We are so afraid of consciously shaping our own values that we are happy to offload this to the blind free market, whose objective is not to shape the values that we would most reflectively endorse.

Maybe I'm Haidt- and Humane Tech-pilled, but to me, the widespread addiction of new generations to present-form social media is a massive problem that could contribute substantially to how the AI transition eventually plays out. Social media directly affects social cohesion, i.e., society's ability to work out responses to the big questions concerning AI (such as: should we build AGI at all? Should we try to build conscious AIs that are moral subjects? What should a post-scarcity economy look like?), and, indeed, the level of people's interest and engagement in these questions at all.

The "meh" attitude of the EA community towards the issues surrounding social media, digital addiction, and AI romance is still surprising to me, I still don't understand the underlying factors or deeply held disagreements which elicit such different responses to these issues in me (for example) and most EAs. Note that this is not because I'm a "conservative who doesn't understand new things": for example, I think much more favourably of AR and VR, I mostly agree with Chalmers' "Reality Plus", etc.

nowhere near the scale of other problems to do with digital minds if they have equal moral value to people and you don't discount lives in the far future.

I agree with this, but by the same token, most issues that EAs concern themselves with are nowhere near the scale of S-risks and other potential problems to do with future digital minds. Also, these problems only become relevant if we decide to build conscious AIs and there is no widespread legal and cultural opposition to that, which is a big "if".

Harris and Raskin talked about the risk that AI partners will be used for "product placement" or political manipulation here, but I'm sceptical about this. These AI partners will surely have a subscription business model rather than a freemium model, and, given how important user trust will be for these businesses, I don't think they will try to manipulate users in this way.

More broadly speaking, values will surely change; there is no doubt about that. The very value of "human connection" and "human relationships" is eroded by definition if people are in AI relationships. A priori, I don't think value drift is a bad thing. But in this particular case, this value change will inevitably go along with a reduction in population, which is a bad thing (according to my ethics, and the ethics of most other people, I believe).

This is a sort of more general form of the whataboutism that I considered in the last section. We are not talking just about some abstract "traditional option"; we are talking about the total fertility rate. I think everybody agrees that it's important: conservatives and progressives, long-termists and politicians.

If the argument is that childbirth (full families, and parenting) is not important because we will soon have artificial wombs, which, in tandem with artificial insemination and automated systems for child-rearing from birth through adulthood, will give us a "full-cycle automated human reproduction and development system" and make the traditional mode of human being (relationships and kids) "unnecessary" for realising value in the Solar system, then I would say: OK, let's wait until we actually have an artificial womb and then reconsider AI partners (if we get to do it).

My "conservative" side would also say that AI partners (and even AI friends/companions, to some degree!) will harm society because it would reduce the total human-to-human interaction, culture transfer, and may ultimately precipitate the intersubjectivity collapse. However, this is a much less clear story for me, so I've left it out, and don't oppose to AI friends/companions in this post.

[...] we are impressed by [...] ‘Eliciting Latent Knowledge' [that] provided conceptual clarity to a previously confused concept

To me, it seems that ELK is (was) attention-captivating (among the AI safety community) but doesn't rest on a solid basis of logic and theories of cognition and language, and is therefore actually confusing, which prompted at least several clarification and interpretation attempts (1, 2, 3). I'd argue that most people leave the original ELK writings more confused than they were before. So, I'd classify ELK as a mind-teaser and maybe a problem statement (perhaps more useful than distracting, or perhaps more distracting than useful; it's hard to judge as of now), but definitely not as great "conceptual clarification" work.

From the AI "engineering" perspective, values/valued states are "rewards" that the agent adds themselves in order to train (in RL style) their reasoning/planning network (i.e., generative model) to produce behaviours that are adaptive but also that they like and find interesting (aesthetics). This RL-style training happens during conscious reflection.

Under this perspective, but also more generally, you cannot distinguish between intrinsic and instrumental values because intrinsic values are instrumental to each other, but also because there is nothing "intrinsic" about self-assigned reward labels. In the end, what matters is the generative model that is able to produce highly adaptive (and, ideally, interesting/beautiful) behaviours in a certain range of circumstances.
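
To make this "engineering" framing concrete, here is a minimal toy sketch (my illustration of the general idea, not anything proposed in the post; names like `reflect_and_label` and the toy dynamics are hypothetical) of an agent that assigns reward labels to its own imagined rollouts during "reflection" and uses them to update its policy, REINFORCE-style:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
policy_logits = np.zeros((n_states, n_actions))  # toy stand-in for the "generative model" of behaviour

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def imagine_rollout(state, length=5):
    """Sample an imagined trajectory from the current policy (rolling the model forward)."""
    traj = []
    for _ in range(length):
        a = rng.choice(n_actions, p=softmax(policy_logits[state]))
        traj.append((state, a))
        state = (state + a) % n_states  # toy deterministic dynamics
    return traj

def reflect_and_label(traj):
    """Self-assigned reward: the agent labels its own imagined trajectory
    according to what it values/finds interesting, not an external signal."""
    return 1.0 if traj[-1][0] == 0 else -0.1  # e.g. it values visiting state 0 at the last step

for _ in range(500):  # "conscious reflection" loop
    traj = imagine_rollout(state=int(rng.integers(n_states)))
    r = reflect_and_label(traj)
    for s, a in traj:  # REINFORCE-style update on the self-assigned reward
        probs = softmax(policy_logits[s])
        grad = -probs
        grad[a] += 1.0
        policy_logits[s] += 0.1 * r * grad
```

The only point of the sketch is that the "reward" comes from the agent's own labelling function rather than from the environment, which is why I say there is nothing "intrinsic" about such labels.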

I think your confusion about the ontological status of values is further corroborated by this phrase from the post: "people are mostly guided by forces other than their intrinsic values [habits, pleasure, cultural norms]". Values are not forces, but rather inferences about some features of one's own generative model (which help to "train" this very model in "simulated runs", i.e., in the conscious analysis of plans and in reflection). However, the generative model itself is effectively the product of environmental influences, development, culture, physiology (pleasure, pain), etc. Thus, ultimately, values are not somehow distinct from all these "forces", but are indirectly (through the generative model) derived from them.

Under the perspective described above, valuism appears to swap the ultimate objective ("good" behaviour) for the "optimisation of metrics" (values). Thus, there is a risk of Goodharting. I also agree with dan.pandori, who noted in another comment that valuism pretty much redefines utilitarianism, whose equivalent in AI engineering is RL.

You may say that I am suggesting an infinite regress, because how is "good behaviour" determined, other than through "values"? Well, as I explained above, it couldn't be through "values", because values are our own creation within our own ontological/semiotic "map". Instead, there could be the following guides to "good behaviour":

  • Good old adaptivity (survival) [roughly corresponds to the so-called "intrinsic value" in the expected free energy functional, under Active Inference].
  • Natural ethics, if it exists (see the discussion here: https://www.lesswrong.com/posts/3BPuuNDavJ2drKvGK/scientism-vs-people#The_role_of_philosophy_in_human_activity). If a "truly" scale-free ethics couldn't be derived from basic physics alone, there is still the evolutionary/game-theoretic/social/group level at which we can look for an "optimal" ethical arrangement of agents' behaviour (and, therefore, of the values that should help to train these behaviours), whose "optimality", in turn, is derived either from adaptivity or aesthetics at the higher system level (i.e., the group level).
  • Aesthetics and interestingness: there are objective, information-theoretic ways to measure these; see Schmidhuber's works. Also, this roughly corresponds to the "epistemic value" in the expected free energy functional under Active Inference (see the decomposition sketched just after this list).
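
For reference, a standard way this decomposition is written in the Active Inference literature (my paraphrase; the naming of the two terms varies across papers) is:

$$
G(\pi) \;=\; -\,\underbrace{\mathbb{E}_{Q(o_\tau \mid \pi)}\!\left[ D_{\mathrm{KL}}\!\left[ Q(s_\tau \mid o_\tau, \pi) \,\Vert\, Q(s_\tau \mid \pi) \right] \right]}_{\text{epistemic value (expected information gain)}} \;-\; \underbrace{\mathbb{E}_{Q(o_\tau \mid \pi)}\!\left[ \ln P(o_\tau) \right]}_{\text{pragmatic value (prior preferences over outcomes)}}
$$

where $G(\pi)$ is the expected free energy of policy $\pi$ at a future time $\tau$, $Q$ is the agent's generative model rolled out under $\pi$, and $P(o_\tau)$ encodes prior preferences over outcomes. Minimising $G$ trades off information-seeking (roughly, the "aesthetics and interestingness" item above) against realising preferred, adaptive outcomes (roughly, the "adaptivity" item above).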

If the "ultimate" objective is the physical behaviour itself (happening in the real world), not abstract "values" (which appear only in agent's mind), I think Valuism could be cast as any philosophy that emphasises creation of a "good life" and "right action", such as Stoicism, plus some extra emphasis on reflection and meta-awareness, albeit I think Stoicism already puts significant emphasis on these.

AI safety is a field concerned with preventing negative outcomes from AI systems and ensuring that AI is beneficial to humanity.

This is a bad definition of "AI safety" as a field, and it muddies the water somewhat. I would say that AI safety is a particular R&D branch (plus meta and proxy activities for this R&D field, such as AI safety field-building, education, outreach and marketing among students, grantmaking, and platform development such as what apartresearch.com is doing) within the wider gamut of activity that strives to "prevent negative outcomes of the civilisational AI transition".

There are also other sorts of activity that strive for this more or less directly. Some of them are also R&D, such as governance R&D (cip.org) and R&D in cryptography, infosec, and internet decentralisation (trustoverip.org). Others are not R&D: good old activism and outreach to the general public (StopAI, PauseAI), good old governance (policy development, the UK foundation model task force), and various "mitigation" or "differential development" projects and startups, such as Optic, Digital Gaia, and Ought, as well as social innovations (I don't know of any good examples as of yet) and innovations in education and the psychological training of people (I don't know of any good examples here either). See more details and ideas in this comment.

It's misleading to call this whole gamut of activities "AI safety"; "AI risk mitigation", maybe. By the way, 80,000 Hours, despite properly calling the cause area "Preventing an AI-related catastrophe", also suggests that the only two ways to apply one's efforts to it are "technical AI safety research" and "governance research and implementation", which is wrong, as I demonstrated above.

Somebody may ask: isn't technical AI safety research a more direct and more effective way to tackle this cause area? I suspect that it might not be, for people who don't work at AGI labs. That is, I suspect that independent or academic AI safety research might be inefficient enough (at least for most people attempting it) that it would be more effective for them to apply themselves to various other activities, including "mitigation" or "differential development" projects of the kind described above. (I will publish a post detailing the reasoning behind this suspicion later, but for now this comment contains the beginning of it.)
