SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research

Roman Leventov

I first proposed this model here, as a base model for a proposed app to improve global online discourse through personalised comment ordering on all websites.

This post is also a response to the "Reverse-engineering prosociality" agenda described in the post "The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda".

Architecture and training

SociaLLM is^[1] a foundation language model to be trained on chat, dialogue, and forum data where the identities of message authors (called "users" below) are stable and all messages have timestamps so we can have a global order of them.

The SociaLLM design builds upon the Mamba architecture, a language model that uses so-called state-space modelling (SSM) blocks instead of self-attention blocks. The model combines SSM blocks that track three separate message streams:
(1) the "local conversation"/flow of messages (which is exactly the training regime of the current LLMs);
(2) the message history of the particular user as well as their general "reading history", which in the forum data could be approximated as previous N (1-10) messages before every user's message;
(3) the message history of the particular interlocutor of the user, which is the subset of the general "reading history" from the previous point, authored by a particular other user.
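The three streams above can be illustrated with a small data-preparation sketch. It is only an illustration of the idea, not part of the proposal: the `Message` record, the `build_streams` helper, and the `n_read` window parameter are hypothetical names, and the "reading history" is approximated as the `n_read` messages preceding each of the user's own messages, as suggested in point (2).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Message:
    ts: int       # timestamp, defines the global order
    author: str   # stable user identity
    text: str


def build_streams(
    log: List[Message], user: str, interlocutor: str, n_read: int = 3
) -> Tuple[List[Message], List[Message], List[Message]]:
    """Split a message log into the three streams, from `user`'s perspective."""
    # (1) Local conversation: every message in global timestamp order.
    ordered = sorted(log, key=lambda m: m.ts)

    # (2) User history: the user's own messages plus the (assumed) reading
    # history, i.e. the n_read messages just before each of those messages.
    picked = set()
    for i, m in enumerate(ordered):
        if m.author == user:
            picked.update(range(max(0, i - n_read), i + 1))
    user_history = [ordered[i] for i in sorted(picked)]

    # (3) Interlocutor history: the part of (2) written by one other user.
    interlocutor_history = [m for m in user_history if m.author == interlocutor]
    return ordered, user_history, interlocutor_history


log = [
    Message(1, "alice", "hi"),
    Message(2, "bob", "hello"),
    Message(3, "carol", "hey all"),
    Message(4, "alice", "how are you?"),
    Message(5, "bob", "fine"),
    Message(6, "dave", "late to the party"),
]
local, alice_hist, bob_seen_by_alice = build_streams(log, "alice", "bob", n_read=2)
```

In this toy log, Alice's stream stops at her last message, so Bob's later reply is part of the local conversation but not yet part of what Alice is assumed to have read.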

Training this model would cost from 2 times (on purely 1-1 dialogue data) to ~10-15 times (on chat room and forum data, where messages from the most active users tend to be mixed very well) more than training current LLMs. The data should be wrangled to create training sub-datasets from the perspective of each user pair, but otherwise the training shouldn't be much fancier or more complicated than the current distributed training algorithms for LLMs (it seems to me).
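One rough way to see where such a multiplier could come from is to count the ordered (user, interlocutor) perspectives in a thread. The sketch below assumes (this is my reading, not something the proposal specifies) that each perspective replays the thread once during training; `pair_perspectives` is a hypothetical helper.

```python
from itertools import permutations
from typing import List, Tuple


def pair_perspectives(thread_authors: List[str]) -> List[Tuple[str, str]]:
    """All ordered (user, interlocutor) pairs among the authors of one thread."""
    users = sorted(set(thread_authors))
    return list(permutations(users, 2))


# Authors of consecutive messages in two toy threads.
dialogue = ["alice", "bob", "alice", "bob"]
chat_room = ["alice", "bob", "carol", "alice", "dave", "bob"]

dialogue_views = pair_perspectives(dialogue)    # (alice, bob) and (bob, alice)
chat_room_views = pair_perspectives(chat_room)  # every ordered pair of 4 users
```

Under this assumption a two-person dialogue yields two perspectives, in line with the "2 times" lower bound. The figure for chat rooms and forums depends on which pairs are actually kept (for example, only the most active users), which is left open here.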

The first upside of this model is that we can create (what seems to be) strong inductive biases towards developing a large self-other overlap (see also this AI Safety Camp project by AE Studio):
(1) connecting the "user's own" SSM blocks and interlocutor's SSM blocks into the residual stream symmetrically (maybe just through parallel connection, as in multi-head attention);
(2) using the same weights for the user's own and interlocutor's SSM blocks^[2] (at inference time blocks are separate and track states separately, but their weights are the same and updated in lockstep batch after batch); and
(3) probably some extra regularisation techniques, such as intermittent "forgetting" of either the user's own or the interlocutor's state (which is not completely unlike some real-world situations for humans: sometimes people tell us that we have met before but we don't remember them), thus teaching the model to degrade gracefully under these circumstances.
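The three ingredients above can be sketched in a few lines. This is a toy illustration under strong assumptions: `ToySSMBlock` is a simple element-wise linear recurrence standing in for a Mamba block (it is not Mamba), and `social_layer` with its `forget_self`/`forget_other` flags is a hypothetical name for the parallel, symmetric connection into the residual stream.

```python
import random
from typing import List, Tuple

Vec = List[float]


class ToySSMBlock:
    """Element-wise recurrence h' = a*h + b*x, y = c*h'. A stand-in, not Mamba."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        self.a = [rng.uniform(0.5, 0.99) for _ in range(dim)]  # state decay
        self.b = [rng.gauss(0.0, 1.0) for _ in range(dim)]     # input gain
        self.c = [rng.gauss(0.0, 1.0) for _ in range(dim)]     # output gain

    def init_state(self) -> Vec:
        return [0.0] * self.dim

    def step(self, h: Vec, x: Vec) -> Tuple[Vec, Vec]:
        h_new = [a * hi + b * xi for a, hi, b, xi in zip(self.a, h, self.b, x)]
        y = [c * hi for c, hi in zip(self.c, h_new)]
        return h_new, y


def social_layer(
    block: ToySSMBlock, residual: Vec,
    h_self: Vec, h_other: Vec, x_self: Vec, x_other: Vec,
    forget_self: bool = False, forget_other: bool = False,
) -> Tuple[Vec, Vec, Vec]:
    # (3) Regularisation: occasionally wipe one of the two states.
    if forget_self:
        h_self = block.init_state()
    if forget_other:
        h_other = block.init_state()
    # (2) One set of weights (`block`), two separately tracked states.
    h_self, y_self = block.step(h_self, x_self)
    h_other, y_other = block.step(h_other, x_other)
    # (1) Symmetric, parallel connection into the residual stream.
    out = [r + (ys + yo) for r, ys, yo in zip(residual, y_self, y_other)]
    return out, h_self, h_other


block = ToySSMBlock(dim=3)
res = [0.1, 0.2, 0.3]
hs, ho = [1.0, 0.0, -1.0], [0.5, 0.5, 0.5]
xs, xo = [1.0, 2.0, 3.0], [-1.0, 0.0, 1.0]
out_a, hs1, ho1 = social_layer(block, res, hs, ho, xs, xo)
out_b, ho2, hs2 = social_layer(block, res, ho, hs, xo, xs)  # roles swapped
_, h_forgot, _ = social_layer(block, res, hs, ho, xs, xo, forget_self=True)
```

Because the weights are shared and the two outputs are summed, swapping the "self" and "other" roles leaves the layer output unchanged in this toy version. Whether a real model should be this strictly symmetric is a design question the sketch does not settle.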

Industrial applications

As I already mentioned at the beginning of the post, I originally thought about this model as a base model that can be fine-tuned to predict whether the human user will find this or that information novel, insightful, boring, helpful, saddening, fun, and so on. This fine-tuned model, in turn, could be used within a browser extension to reorder comments on websites (YouTube, Reddit, Facebook, Twitter feed or replies, NYT, The Guardian, etc.) to order the "good" or "informationally valuable" comments first, which (I hope) should change the dynamics of the online echo chambers.

More generally, SociaLLM can improve almost all applications that currently use LLMs and for which personalisation matters more than raw reasoning and creative power: personalised content recommendations and filtering, customer service and engagement, education and language learning assistants, mental health and personal counselling (à la Pi AI).

In the media and entertainment industries, SociaLLM could also be helpful in narrative analysis (for mass media products, such as movies and novellas) and interactive storytelling for the new forms of media and games.

There are also possible applications that enhance the collective intelligence of teams:

An add-on for a team chat platform (such as Slack) that spots discrepancies in knowledge (or opinion) between team members, as described in the paper "Collective Intelligence in Human-AI Teams: A Bayesian Theory of Mind Approach" (Westby & Riedl, 2023).
A conflict resolution app for teams, friend groups, and families.

Research and AI safety applications

The value of SociaLLM in social science research should be obvious: it could be directly used for research and experiments in language intentionality, Theory of Mind, social group or team dynamics, etc.

Beware: the discussion below is somewhat above my pay grade in terms of statistics and ML theory. Take it with a grain of salt, and if something in it looks wrong to you, please point it out.

Collective intelligence mechanisms and research (such as "Collective Intelligence in Human-AI Teams" mentioned above) often require a measure of the information content of the messages that agents send to each other. For SociaLLM to provide such a measure, the user's own and interlocutor's SSM blocks must use the same weights (as suggested above), so that we can treat these SSM blocks as producing the same state representation structure.

Also, for such an informational measure, the SSM blocks should simultaneously provide an energy measure of the current state, i.e., the SSM blocks should simultaneously be Energy-Based Models (EBMs). I'm not sure how to engineer this into SSM blocks. Perhaps the techniques from the "Recurrent Neural Filters" paper (Lim, Zohren, and Roberts, 2020) could help, where the Error Correction term, aka the auto-encoding (posterior) error, can be used as the current state's energy. If you have other ideas on how to turn SSM models into (quasi-)energy-based models (or better yet, Bayesian models, but this seems a taller order), please share.
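To make the link between energy and information content concrete, here is a minimal sketch in formulas. The symbols are illustrative assumptions, not taken from the Recurrent Neural Filters paper: $s_t$ is the SSM state after message $t$, $x_t$ is an embedding of that message, and $g$ is a hypothetical decoder that reconstructs the message from the state.

```latex
% Assumed energy: squared auto-encoding (reconstruction) error of the state.
E(s_t) = \lVert x_t - g(s_t) \rVert^2

% Standard energy-based model: lower energy means higher probability.
p(s) = \frac{\exp\bigl(-E(s)\bigr)}{Z}, \qquad Z = \int \exp\bigl(-E(s)\bigr)\, ds

% Surprisal (information content, in nats) of a state.
-\log p(s) = E(s) + \log Z

% Differences in surprisal do not require the intractable constant Z.
-\log p(s_{t+1}) + \log p(s_t) = E(s_{t+1}) - E(s_t)
```

The last line is why shared weights matter here: energies of the user's own and the interlocutor's states are only comparable if both come from the same function $E$, so that $\log Z$ cancels.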

On the AI safety front, SociaLLM could also be used to study (social) deception (e.g., when analysing Diplomacy game logs) and collusion, and, perhaps, help to engineer and test the mechanisms to disincentivise or prevent deception and collusion in AI teams aka agencies.

^[1] Note: this is a proposal; the model hasn't been trained (or even designed in detail) yet!
^[2] This feature of the architecture is also important for measuring the information content of the messages in collective intelligence mechanism design and collusion and deception detection, as explained in the section "Research and AI safety applications".

SummaryBot, Jan 3 2024

Executive summary: SociaLLM is a proposed language model architecture for building personalized AI applications, conducting social science research, and pursuing AI safety goals.

Key points:

SociaLLM tracks separate message streams related to conversations, individual users, and user pairs to enable personalization.
It could power apps for comment reordering, recommendations, customer service, education, mental health counseling, media analysis, and more.
The model facilitates research into language, theory of mind, group dynamics, information flow, and collective intelligence.
Studying deception and collusion with SociaLLM may inform techniques to prevent undesirable behavior in AI teams.
Open questions remain around optimally engineering SociaLLM blocks and measuring information content.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

Roman Leventov, Jan 3 2024

Announcement

I think SociaLLM has a good chance of getting OpenAI’s “Research into Agentic AI Systems” grant because it addresses both the challenge of the legibility of an AI agent’s behaviour, by making that behaviour more “human-like” thanks to the weight sharing and regularisation techniques/inductive biases described in the post, and the challenge of automatic monitoring: detection of duplicity or deception in an AI agent’s behaviour by comparing the agent’s ToMs “in the eyes” of different interlocutors, building on the work “Collective Intelligence in Human-AI Teams”.

I am looking for co-investigators for this (up to $100k, up to 8 months long) project with hands-on academic or practical experience in DL training (preferably), ML, Bayesian statistics, or NLP. The deadline for the grant application itself is the 20th of January, so I need to find a co-investigator by the 15th of January.

Another requirement for the co-investigator is that they preferably should be in academia, non-profit, or independent at the moment.

I plan to be hands-on during the project in data preparation (cleansing, generation by other LLMs, etc.) and training, too. However, I don’t have any prior experience with DL training, so if I apply for the project alone, this is a significant risk and a likely rejection.

If the project is successful, it could later be extended for further grants or turned into a startup.

If the project is not a good fit for you but you know someone who may be interested, I’d appreciate it a lot if you shared this with them or within your academic network!

Please reach out to me in DMs or at leventov.ru@gmail.com.

Effective Altruism Forum