Lucas Tucker

CTO @ Velvet
1 karma · Joined · Working (0-5 years) · lucas-tucker.github.io/

Bio

audio/video data at velvet (https://velvet-videos.com)

prev: uchicago, adobe, yc

How others can help me

interested in learning more about effective altruism

How I can help others

happy to make intros within the yc community or talk about data

Comments
2

Great read. At one point in the Q&A you mention the creation of constitution-guided synthetic data. I can't speak for text data at Anthropic's scale, but in audiovisual interactive models, each participant's contribution to the dataset adds a sort of persona vector to the learned distribution (purely in terms of body language). I'm curious whether you think of this synthetic data "correction" as a means of combining these personas into a single identity, or simply reinforcing those more "acceptable" personas with guiding principles.

In audiovisual models, a model's speech intonations, facial expressions, and other body language immediately lead to assumptions about its identity, values, and even culture to some extent. Underlying these visuals are, of course, language models such as Claude. Should Claude or other language models be expected to meet people's expectations and biases (i.e., acquiesce to a conversant's possibly rude assumptions), or should we design these systems not to stray too far from a "master" persona or identity? The stakes seem high, especially as artificial companionship becomes more mainstream and audiovisual.

I can speak only from the data angle, but I would add that directing focus toward the actual individuals performing RLHF & providing datasets (rather than calling this a pure "research problem") is vital to getting this right.