By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
PulseReporterPulseReporter
  • Home
  • Entertainment
  • Lifestyle
  • Money
  • Tech
  • Travel
  • Investigations
Reading: Alibaba’s Qwen with Questions reasoning mannequin beats o1-preview
Share
Notification Show More
Font ResizerAa
PulseReporterPulseReporter
Font ResizerAa
  • Home
  • Entertainment
  • Lifestyle
  • Money
  • Tech
  • Travel
  • Investigations
Have an existing account? Sign In
Follow US
  • Advertise
© 2022 Foxiz News Network. Ruby Design Company. All Rights Reserved.
PulseReporter > Blog > Tech > Alibaba’s Qwen with Questions reasoning mannequin beats o1-preview
Tech

Alibaba’s Qwen with Questions reasoning mannequin beats o1-preview

Last updated: November 29, 2024 8:15 pm
9 months ago
Share
Alibaba’s Qwen with Questions reasoning mannequin beats o1-preview
SHARE

Be part of our day by day and weekly newsletters for the newest updates and unique content material on industry-leading AI protection. Be taught Extra


Chinese language e-commerce big Alibaba has launched the newest mannequin in its ever-expanding Qwen household. This one is called Qwen with Questions (QwQ), and serves as the newest open supply competitor to OpenAI’s o1 reasoning mannequin.

Like different massive reasoning fashions (LRMs), QwQ makes use of further compute cycles throughout inference to assessment its solutions and proper its errors, making it extra appropriate for duties that require logical reasoning and planning like math and coding.

What’s Qwen with Questions (OwQ?) and may it’s used for industrial functions?

Alibaba has launched a 32-billion-parameter model of QwQ with a 32,000-token context. The mannequin is at the moment in preview, which suggests a higher-performing model is more likely to comply with.

Based on Alibaba’s assessments, QwQ beats o1-preview on the AIME and MATH benchmarks, which consider mathematical problem-solving talents. It additionally outperforms o1-mini on GPQA, a benchmark for scientific reasoning. QwQ is inferior to o1 on the LiveCodeBench coding benchmarks however nonetheless outperforms different frontier fashions equivalent to GPT-4o and Claude 3.5 Sonnet.

Qwen with Questions
Instance output of Qwen with Questions

QwQ doesn’t include an accompanying paper that describes the information or the method used to coach the mannequin, which makes it tough to breed the mannequin’s outcomes. Nevertheless, for the reason that mannequin is open, in contrast to OpenAI o1, its “considering course of” shouldn’t be hidden and can be utilized to make sense of how the mannequin causes when fixing issues.

Alibaba has additionally launched the mannequin beneath an Apache 2.0 license, which suggests it may be used for industrial functions.

‘We found one thing profound’

Based on a weblog submit that was revealed together with the mannequin’s launch, “By means of deep exploration and numerous trials, we found one thing profound: when given time to ponder, to query, and to mirror, the mannequin’s understanding of arithmetic and programming blossoms like a flower opening to the solar… This strategy of cautious reflection and self-questioning results in outstanding breakthroughs in fixing complicated issues.”

That is similar to what we learn about how reasoning fashions work. By producing extra tokens and reviewing their earlier responses, the fashions usually tend to right potential errors. Marco-o1, one other reasoning mannequin just lately launched by Alibaba may also comprise hints of how QwQ may be working. Marco-o1 makes use of Monte Carlo Tree Search (MCTS) and self-reflection at inference time to create totally different branches of reasoning and select the perfect solutions. The mannequin was educated on a combination of chain-of-thought (CoT) examples and artificial information generated with MCTS algorithms.

Alibaba factors out that QwQ nonetheless has limitations equivalent to mixing languages or getting caught in round reasoning loops. The mannequin is accessible for obtain on Hugging Face and an internet demo may be discovered on Hugging Face Areas.

The LLM age offers approach to LRMs: Massive Reasoning Fashions

The discharge of o1 has triggered rising curiosity in creating LRMs, despite the fact that not a lot is thought about how the mannequin works beneath the hood except for utilizing inference-time scale to enhance the mannequin’s responses. 

There are actually a number of Chinese language opponents to o1. Chinese language AI lab DeepSeek just lately launched R1-Lite-Preview, its o1 competitor, which is at the moment solely accessible by way of the corporate’s on-line chat interface. R1-Lite-Preview reportedly beats o1 on a number of key benchmarks.

One other just lately launched mannequin is LLaVA-o1, developed by researchers from a number of universities in China, which brings the inference-time reasoning paradigm to open-source imaginative and prescient language fashions (VLMs). 

The give attention to LRMs comes at a time of uncertainty about the way forward for mannequin scaling legal guidelines. Stories point out that AI labs equivalent to OpenAI, Google DeepMind, and Anthropic are getting diminishing returns on coaching bigger fashions. And creating bigger volumes of high quality coaching information is turning into more and more tough as fashions are already being educated on trillions of tokens gathered from the web. 

In the meantime, inference-time scale gives another which may present the subsequent breakthrough in enhancing the talents of the subsequent technology of AI fashions. There are reviews that OpenAI is utilizing o1 to generate artificial reasoning information to coach the subsequent technology of its LLMs. The discharge of open reasoning fashions is more likely to stimulate progress and make the house extra aggressive.

VB Each day

Keep within the know! Get the newest information in your inbox day by day

By subscribing, you conform to VentureBeat’s Phrases of Service.

Thanks for subscribing. Try extra VB newsletters right here.

An error occured.


You Might Also Like

Programmers Aren’t So Humble Anymore—Perhaps As a result of No person Codes in Perl

MSI Titan 18 HX AI Assessment: The Final Gaming Laptop computer

This Forgotten Nintendo Change Recreation Can Degree Up Your Exercises

Apple is trying into shopping for Perplexity AI

The open supply Mannequin Context Protocol was simply up to date — here is why it is a large deal

Share This Article
Facebook Twitter Email Print
Previous Article American Airways holds agency on mileage expiration coverage American Airways holds agency on mileage expiration coverage
Next Article Individuals Are Revealing The Mandela Results They Will Go To Their Graves Disagreeing With, And You May Refuse To Settle for These Too Individuals Are Revealing The Mandela Results They Will Go To Their Graves Disagreeing With, And You May Refuse To Settle for These Too
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Weekly Newsletter

Subscribe to our newsletter to get our newest articles instantly!

More News

OpenAI staffers to promote  billion in inventory to SoftBank, different buyers
OpenAI staffers to promote $6 billion in inventory to SoftBank, different buyers
6 minutes ago
There's New Studies On Taylor Swift And Blake Vigorous's Friendship, And It Doesn't Sound Nice
There's New Studies On Taylor Swift And Blake Vigorous's Friendship, And It Doesn't Sound Nice
27 minutes ago
Gear Information of the Week: A New Privateness Cellphone Arrives, and Samsung Has a K 115-Inch Micro RGB TV
Gear Information of the Week: A New Privateness Cellphone Arrives, and Samsung Has a $30K 115-Inch Micro RGB TV
59 minutes ago
Tom Hanks Films Are The Greatest — Choose Some And I'll Suggest A Latest Non-Tom Launch For You To Watch
Tom Hanks Films Are The Greatest — Choose Some And I'll Suggest A Latest Non-Tom Launch For You To Watch
1 hour ago
What’s AI psychosis? | Mashable
What’s AI psychosis? | Mashable
2 hours ago

About Us

about us

PulseReporter connects with and influences 20 million readers globally, establishing us as the leading destination for cutting-edge insights in entertainment, lifestyle, money, tech, travel, and investigative journalism.

Categories

  • Entertainment
  • Investigations
  • Lifestyle
  • Money
  • Tech
  • Travel

Trending

  • OpenAI staffers to promote $6 billion in inventory to SoftBank, different buyers
  • There's New Studies On Taylor Swift And Blake Vigorous's Friendship, And It Doesn't Sound Nice
  • Gear Information of the Week: A New Privateness Cellphone Arrives, and Samsung Has a $30K 115-Inch Micro RGB TV

Quick Links

  • About Us
  • Contact Us
  • Privacy Policy
  • Terms Of Service
  • Disclaimer
2024 © Pulse Reporter. All Rights Reserved.
Welcome Back!

Sign in to your account