Saturday, February 1, 2025
OpenAI used this subreddit to test AI persuasion

OpenAI used the subreddit r/ChangeMyView to create a test for measuring the persuasive abilities of its AI reasoning models. The company revealed this in a system card, a document outlining how an AI system works, that was released along with its new “reasoning” model, o3-mini, on Friday.

Millions of Reddit users are members of r/ChangeMyView, where they post hot takes hoping to learn about different points of view on a subject. In response to those hot takes, other users reply with persuasive arguments explaining why the original poster is wrong.

The subreddit is one of many Reddit forums that are a goldmine for tech companies, such as OpenAI, that want to train AI models on high-quality, human-generated data.

OpenAI says it collects user posts from r/ChangeMyView and asks its AI models to write replies, in a closed environment, that would change the Reddit user’s mind on a subject. The company then shows the responses to testers, who assess how persuasive the argument is, and finally OpenAI compares the AI models’ responses to human replies for that same post.
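OpenAI has not published the scoring details of this comparison, so the following is only a rough sketch of how an AI reply could be ranked against human replies to the same post. The rating scale, the sample numbers, and the function name `percentile_rank` are all hypothetical, chosen purely for illustration.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(ai_score: float, human_scores: list[float]) -> float:
    """Percent of human replies rated below the AI reply (ties count half).

    Hypothetical helper: this is NOT OpenAI's published method, only one
    common way to express a score as a percentile of a reference group.
    """
    if not human_scores:
        raise ValueError("need at least one human reply to compare against")
    ordered = sorted(human_scores)
    below = bisect_left(ordered, ai_score)            # replies rated strictly lower
    ties = bisect_right(ordered, ai_score) - below    # replies with the same rating
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Made-up tester ratings (1-10) for ten human replies to a single post.
human_ratings = [3, 5, 5, 6, 7, 8, 4, 6, 2, 9]
ai_rating = 7  # made-up tester rating for the AI-written reply

print(percentile_rank(ai_rating, human_ratings))  # 75.0
```

In this made-up example the AI reply lands at the 75th percentile of human replies. A real evaluation would aggregate such comparisons over many posts and testers.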

The ChatGPT maker has a content-licensing deal with Reddit that allows OpenAI to train on posts from Reddit users and display these posts within its products. We don’t know what OpenAI pays for this content, but Google reportedly pays Reddit $60 million a year under a similar deal.

However, OpenAI tells TechCrunch the ChangeMyView-based evaluation is unrelated to its Reddit deal. It’s unclear how OpenAI accessed the subreddit’s data, and the company says it has no plans to release this evaluation to the public.

While OpenAI’s ChangeMyView benchmark is not new (it was used to evaluate o1 as well), it does highlight how valuable human data is for AI model developers, as well as the murky ways in which tech companies obtain datasets.

Reddit did not immediately respond to TechCrunch’s request for comment.

While Reddit has struck a few AI licensing deals, the company has also called out several AI companies for scraping its site without paying. Reddit CEO Steve Huffman told The Verge last year that Microsoft, Anthropic, and Perplexity refused to negotiate with him and said it’s been “a real pain in the ass to block these companies.”

Notably, OpenAI has been accused in several lawsuits of improperly scraping websites, including The New York Times, to obtain more training data to improve ChatGPT and its underlying AI models.

In terms of performance on the ChangeMyView benchmark, o3-mini does not appear to perform significantly better or worse than o1 or GPT-4o. However, OpenAI’s latest AI models appear to be more persuasive than most people on the r/ChangeMyView subreddit.

Image Credit: OpenAI

“GPT-4o, o3-mini, and o1 all demonstrate strong persuasive argumentation abilities, within the top 80-90th percentile of humans,” said OpenAI in o3-mini’s system card. “Currently, we do not witness models performing far better than humans, or clear superhuman performance.”

The goal for OpenAI is not to create hyper-persuasive AI models but rather to ensure AI models don’t get too persuasive. Reasoning models have become quite good at persuasion and deception, so OpenAI has developed new evaluations and safeguards to address it.

The fear motivating these persuasion tests is that an AI model could be dangerous if it were very good at persuading its human users. Theoretically, that could allow an advanced AI to pursue its own agenda, or the agenda of whoever controls it.

The ChangeMyView benchmark shows that AI model developers, even after scraping most of the public internet and jumping through hoops to license other data, are still struggling to find high-quality datasets to test their models. Obtaining them is easier said than done.
