This is a special post for quick takes by Milan Weibel. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

In a certain sense, an LLM's token embedding matrix is a machine ontology: semantically similar tokens have similar embeddings in the latent space. However, different models may have learned different associations when their embedding matrices were trained. Every forward pass starts out colored by these ontological assumptions, and they may have alignment implications.

For instance, we would presumably not want a model to operate within an ontology that associates the concept of AI with the concept of evil, particularly if it is then prompted to instantiate a simulacrum that believes it is an AI.

Has anyone looked into this? That is, into the alignment implications of different token embedding matrices? I feel like it would involve calculating a lot of cosine similarities between token embeddings and then running some evals.
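To make the cosine-similarity part concrete, here is a minimal sketch. It assumes the embedding matrix is available as a NumPy array with one row per token, and that each concept of interest maps to a single token. The vocabulary, the toy matrix, and the function names are all hypothetical and chosen only for illustration; real tokenizers often split a concept across several tokens, which this ignores.

```python
import numpy as np

def cosine_similarity_matrix(E):
    """Pairwise cosine similarities between the rows of E (one row per token)."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    unit = E / np.clip(norms, 1e-12, None)  # guard against zero-norm rows
    return unit @ unit.T

def association(E, vocab, token_a, token_b):
    """Cosine similarity between the embeddings of two tokens."""
    sims = cosine_similarity_matrix(E)
    return float(sims[vocab[token_a], vocab[token_b]])

# Toy data, invented for illustration (not taken from any real model).
vocab = {"AI": 0, "evil": 1, "helpful": 2}
E = np.array([
    [1.0, 0.0],  # "AI"
    [0.0, 1.0],  # "evil"
    [1.0, 1.0],  # "helpful"
])

print(association(E, vocab, "AI", "evil"))
print(association(E, vocab, "AI", "helpful"))
```

Running the same comparison over the embedding matrices of several models would give a rough picture of how their associations differ, which could then be checked against eval results.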
