Lucas Tucker

CTO @ Velvet
1 karma · Working (0-5 years) · lucas-tucker.github.io/

Bio

audio/video data at velvet (https://velvet-videos.com)

prev: uchicago, adobe, yc

How others can help me

interested in learning more about effective altruism

How I can help others

happy to make intros to folks in the yc community or talk about data

Comments (4)

If you accept the premise that we're not constrained by capital, then competent founders with important problems they want to solve can simply get funding from AI safety funders rather than wait for the market to supply the needed capital.

Tailwinds from the market help snowball your impact, but certainly aren't necessary.

This is all to say, I'm not sure whether the lever we need to be pulling at the margin is market-shaping. I think the best lever is still probably talent.

I don't believe these things are mutually exclusive. The strongest founders/operators I know want to move the needle in a specific market, and if you want those folks, then it helps to frame the conversation around the problems they're already attacking.

Founders both technical and non-technical, for new research and non-research organizations.

You mention the lack of builders in this space, but the incentives have to be there. At this time, starting from AI safety and working toward a profitable, successful business skews toward research, whereas working backward from the business is much more effective. For instance, if you've built a successful RL environment company, you can then publish benchmarks on safety and reasoning in biological weapon creation, cyber, or another area where more thought is needed.

The point being: if you want operators or generalists, I'd propose you start with the economic incentive and build toward a more aligned outcome for that specific market.

Great read. At one point in the Q&A you mention the creation of constitution-guided synthetic data. I can't speak for text data at Anthropic's scale, but in audiovisual interactive models, each participant's contribution to the dataset adds a sort of persona vector to the learned distribution (purely in terms of body language). I'm curious whether you think of this synthetic data "correction" as a means of combining these personas into a single identity, or as simply reinforcing the more "acceptable" personas with guiding principles.

In audiovisual models, a model's speech intonations, facial expressions, and other body language immediately lead to assumptions about its identity, values, and even its culture to some extent. Underlying these visuals are, of course, language models such as Claude. Should Claude or other language models be expected to meet people's expectations and biases (i.e., acquiesce to a conversant's possibly rude assumptions), or should we design these systems not to stray too far from a "master" persona or identity? The stakes seem high, especially as artificial companionship becomes more mainstream and audiovisual.

I can speak only from the data angle, but I would add that directing focus toward the actual individuals performing RLHF and providing datasets (rather than treating this as a pure "research problem") is vital to getting this right.