It started with the announcement of OpenAI’s o1 model in September 2024, but really took off with the release of DeepSeek R1 in January 2025.
Now, it seems that most major AI model providers and trainers are in a new race to deliver better, faster, cheaper, or more powerful “reasoning” AI language models: ones that may take a little longer to respond to a human user, but ideally do so with better, more comprehensive, more well-“reasoned” answers. Models in this class get there by performing “chain-of-thought” reasoning, reflecting on their own conclusions and interrogating them for veracity before responding.
ByteDance, the Chinese web media giant and parent of TikTok, is the latest to join the party with the announcement and publication of the technical paper behind Seed-Thinking-v1.5, an upcoming large language model (LLM) designed to advance reasoning performance across both science, technology, engineering, and math (STEM) fields and general-purpose domains.
The model is not yet available for download or use, and it’s unclear what the licensing terms will be: whether it will be proprietary/closed source, open source/free for all to use and modify at will, or somewhere in between. But the technical paper provides some noteworthy details that are worth going over now, ahead of whenever it’s made available.
Built atop the increasingly popular Mixture-of-Experts (MoE) architecture
Like Meta’s new Llama 4 and Mistral’s Mixtral before it, Seed-Thinking-v1.5 is built using a Mixture-of-Experts (MoE) architecture.
This architecture is designed to make models more efficient, essentially combining the capabilities of multiple models into one, with each model specializing in a different domain.
In this case, the MoE architecture means that Seed-Thinking-v1.5 uses only 20 billion parameters at a time out of a total of 200 billion.
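To make the 20-billion-of-200-billion figure concrete, here is a minimal toy sketch of top-k expert routing, the general mechanism that lets an MoE model run only part of its weights per input. It is not ByteDance's implementation: the function names, the four-expert setup, and k=2 are assumptions chosen for illustration, and the article does not say how many experts the model has or how many are active at once.

```python
import math


def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    gate_logits is a hypothetical list of router scores, one per expert.
    """
    # Pick the k experts with the largest gate logits.
    order = sorted(range(len(gate_logits)),
                   key=lambda i: gate_logits[i], reverse=True)
    chosen = order[:k]
    # Softmax over only the chosen logits so the mixing weights sum to 1.
    peak = max(gate_logits[i] for i in chosen)
    exps = [math.exp(gate_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def active_fraction(active_params, total_params):
    """Share of the model's parameters used for a single input."""
    return active_params / total_params


if __name__ == "__main__":
    # Toy router scores for four experts (made-up numbers).
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
    # Figures quoted in the article: 20B active out of 200B total.
    print(active_fraction(20e9, 200e9))
```

With the article's figures, the active share works out to one tenth of the total parameters per input, which is where the efficiency claim comes from.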
ByteDance says in its technical paper, published on GitHub, that Seed-Thinking-v1.5 prioritizes structured reasoning and thoughtful response generation.
The results nearly speak for themselves, with Seed-Thinking-v1.5 outperforming DeepSeek R1 and approaching Google’s newly released Gemini 2.5 Pro and OpenAI’s o3-mini-high reasoner on many third-party benchmark evaluations. It even exceeds those two on the ARC-AGI benchmark, which measures progress toward artificial general intelligence, seen as the goal or “Holy Grail” of AI: a model that outperforms humans on most economically valuable tasks, according to OpenAI’s definition.

Positioned as a compact yet capable alternative to larger state-of-the-art models, Seed-Thinking-v1.5 achieves competitive benchmark results and introduces innovations in reinforcement learning (RL), training data curation, and AI infrastructure.
Performance benchmarks and model focus
Seed-Thinking-v1.5 shows strong performance on a suite of challenging tasks, scoring 86.7% on AIME 2024, 55.0% pass@8 on Codeforces, and 77.3% on the GPQA science benchmark. These results place it close to or on par with models like OpenAI’s o3-mini-high and Google’s Gemini 2.5 Pro on specific reasoning metrics.
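For readers unfamiliar with the pass@8 notation: pass@k is the probability that at least one of k sampled solutions to a problem is correct. The sketch below uses the unbiased estimator that is common in code-generation evaluation, 1 - C(n-c, k) / C(n, k) for n samples of which c are correct. The article does not describe how ByteDance computed its Codeforces number, so treat this as background on the metric rather than the team's exact protocol; the function name is hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k for one problem from n samples with c correct ones."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets made only of wrong samples;
    # it is 0 when fewer than k wrong samples exist, giving pass@k = 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Made-up example: 10 samples, 5 correct, report pass@2.
    print(pass_at_k(n=10, c=5, k=2))
```

A benchmark-level score is then the average of this value over all problems in the set.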
On non-reasoning tasks, the model was evaluated through human preference comparisons and achieved an 8.0% higher win rate over DeepSeek R1, suggesting that its strengths generalize beyond logic or math-heavy challenges.
To address saturation in common benchmarks like AIME, ByteDance introduced BeyondAIME, a new, harder math benchmark with curated problems designed to resist memorization and better discriminate model performance. This and the Codeforces evaluation set are expected to be publicly released to support future research.
Data strategy
Training data played a central role in the model’s development. For supervised fine-tuning (SFT), the team curated 400,000 samples, including 300,000 verifiable problems (STEM, logic, and coding tasks) and 100,000 non-verifiable problems such as creative writing and role-playing.
For RL training, data was segmented into:
- Verifiable problems: 100,000 rigorously filtered STEM questions and logic puzzles with known answers, sourced from elite competitions and expert review.
- Non-verifiable tasks: Human-preference datasets focused on open-ended prompts, evaluated using pairwise reward models.
The STEM data leaned heavily on advanced mathematics, which accounted for over 80% of the problem set. Additional logic data included tasks like Sudoku and 24-point puzzles, with adjustable difficulty to match model progress.
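What makes a puzzle like 24-point "verifiable" is that a program can check any proposed answer exactly. The sketch below is one possible checker, written for illustration and not taken from the paper: `check_24` is a hypothetical name, and the rules assumed here (use each given number exactly once, combine with +, -, *, / and parentheses, reach 24) are the usual ones for the game. It uses exact fractions so that solutions relying on division are not rejected by rounding error.

```python
import ast
import operator
from fractions import Fraction

# Only the four basic arithmetic operators are allowed.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate a parsed expression exactly, recording each integer used."""
    if (isinstance(node, ast.Constant) and isinstance(node.value, int)
            and not isinstance(node.value, bool)):
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def check_24(numbers, expression, target=24):
    """Return True if expression uses exactly `numbers` and equals target."""
    used = []
    try:
        tree = ast.parse(expression, mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    return sorted(used) == sorted(numbers) and value == target


if __name__ == "__main__":
    print(check_24([3, 3, 8, 8], "8 / (3 - 8 / 3)"))
    print(check_24([1, 2, 3, 4], "4 * 6"))
```

Because the check is a pure function of the answer, it can serve as a reward signal without human labeling, and difficulty can be adjusted simply by choosing harder number sets.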
Reinforcement learning approach
Reinforcement learning in Seed-Thinking-v1.5 is powered by custom actor-critic (VAPO) and policy-gradient (DAPO) frameworks, developed to address known instabilities in RL training. These methods focus on reducing reward signal sparsity and improving training stability, especially in long chain-of-thought (CoT) settings.
Reward models play a critical role in supervising RL outputs. ByteDance introduced two key tools:
- Seed-Verifier: A rule-based LLM that checks whether generated and reference answers are mathematically equivalent.
- Seed-Thinking-Verifier: A step-by-step reasoning-based judge that improves judgment consistency and resists reward hacking.
This two-tiered reward system enables nuanced evaluation for both simple and complex tasks.
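One way to picture a two-tier reward is a cheap equivalence check that handles simple cases, with a slower judge consulted only when the cheap check cannot decide. The sketch below is an assumption-laden illustration of that pattern, not ByteDance's design: both verifiers in the article are model-based, whereas here the first tier is a deterministic rational-number comparison and the second tier is any callable the caller supplies. The article does not say how the two verifiers are actually combined, and all names below are hypothetical.

```python
from fractions import Fraction


def cheap_equivalent(answer, reference):
    """Tier 1: exact comparison when both strings parse as rationals.

    Returns True or False when decidable, or None when either string
    is not a plain number such as "0.5" or "1/2".
    """
    try:
        return Fraction(answer.strip()) == Fraction(reference.strip())
    except (ValueError, ZeroDivisionError):
        return None


def reward(answer, reference, slow_judge):
    """Return 1.0 for an accepted answer and 0.0 otherwise.

    slow_judge is a stand-in for a reasoning-based verifier; it is only
    called when the cheap tier is inconclusive.
    """
    verdict = cheap_equivalent(answer, reference)
    if verdict is None:
        verdict = slow_judge(answer, reference)
    return 1.0 if verdict else 0.0


if __name__ == "__main__":
    # Toy judge (assumption): treats answers as equal if they contain
    # the same characters, ignoring order and spaces.
    def toy_judge(a, r):
        return sorted(a.replace(" ", "")) == sorted(r.replace(" ", ""))

    print(reward("0.5", "1/2", toy_judge))
    print(reward("x + 1", "1 + x", toy_judge))
```

The design choice illustrated here is cost control: the expensive judge runs only on the cases that actually need it.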
Infrastructure and scaling
To support efficient large-scale training, ByteDance built a system atop its HybridFlow framework, with execution handled by Ray clusters and co-located training and inference processes to reduce GPU idle time.
A notable innovation is the Streaming Rollout System (SRS), which separates model evolution from runtime execution. It accelerates iteration speed by asynchronously managing partially completed generations across model versions. This architecture reportedly delivers up to 3× faster RL cycles.
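The idea of carrying partially completed generations across model versions can be sketched with a toy scheduler. This is a simplified single-process illustration under stated assumptions, not the SRS itself: the `Rollout` class, the per-step token budget, and the fixed target lengths are all hypothetical, and the real system is asynchronous and distributed. The point it shows is that a long generation does not block a policy update; it simply continues under the next version.

```python
from dataclasses import dataclass, field


@dataclass
class Rollout:
    prompt_id: int
    target_len: int                 # hypothetical: tokens needed to finish
    tokens_done: int = 0
    versions: list = field(default_factory=list)  # policy versions used


def stream_step(in_flight, policy_version, token_budget):
    """Advance every in-flight rollout by up to token_budget tokens.

    Returns (finished, carried): finished rollouts are ready for training,
    carried rollouts resume under whatever policy version comes next.
    """
    finished, carried = [], []
    for rollout in in_flight:
        rollout.tokens_done = min(rollout.target_len,
                                  rollout.tokens_done + token_budget)
        rollout.versions.append(policy_version)
        if rollout.tokens_done >= rollout.target_len:
            finished.append(rollout)
        else:
            carried.append(rollout)
    return finished, carried


if __name__ == "__main__":
    pending = [Rollout(prompt_id=0, target_len=3),
               Rollout(prompt_id=1, target_len=10)]
    for version in (1, 2, 3):
        done, pending = stream_step(pending, version, token_budget=4)
        print(version, [r.prompt_id for r in done],
              [r.prompt_id for r in pending])
```

In this toy run the short rollout finishes under the first policy version, while the long one spans three versions before it is handed to training.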
Additional infrastructure techniques include:
- Mixed precision (FP8) for memory savings
- Expert parallelism and kernel auto-tuning for MoE efficiency
- ByteCheckpoint for resilient and flexible checkpointing
- AutoTuner for optimizing parallelism and memory configurations
Human evaluation and real-world impact
To evaluate alignment with human-centric preferences, ByteDance conducted human testing across a range of domains including creative writing, humanities knowledge, and general conversation.
Seed-Thinking-v1.5 consistently outperformed DeepSeek R1 across categories, reinforcing its applicability to real-world user needs.
The development team notes that reasoning models trained primarily on verifiable tasks demonstrated strong generalization to creative domains, an outcome attributed to the structure and rigor embedded in mathematical training workflows.
What it means for technical leaders, data engineers and enterprise decision-makers
For technical leads managing the lifecycle of large language models, from data curation to deployment, Seed-Thinking-v1.5 presents an opportunity to rethink how reasoning capabilities are integrated into enterprise AI stacks.
Its modular training process, which includes verifiable reasoning datasets and multi-phase reinforcement learning, is particularly appealing to teams looking to scale LLM development while retaining fine-grained control.
ByteDance’s moves to introduce Seed-Verifier and Seed-Thinking-Verifier offer mechanisms for more trustworthy reward modeling, which can be critical when deploying models into customer-facing or regulated environments.
For teams that often operate under tight deadlines and limited bandwidth, the model’s stability under reinforcement learning, enabled by innovations like VAPO and dynamic sampling, could reduce iteration cycles and streamline fine-tuning for specific tasks.
From an orchestration and deployment perspective, the model’s hybrid infrastructure approach, including the Streaming Rollout System (SRS) and support for FP8 optimization, suggests significant gains in training throughput and hardware utilization.
These features would be valuable for engineers responsible for scaling LLM operations across cloud and on-prem systems. The fact that Seed-Thinking-v1.5 was trained with mechanisms to adapt reward feedback based on runtime dynamics speaks directly to the challenges of managing heterogeneous data pipelines and maintaining consistency across domains.
For teams tasked with ensuring reliability, reproducibility, and continuous integration of new tools, Seed-Thinking-v1.5’s system-level design could serve as a blueprint for building robust, multi-modal orchestration systems.
For data engineering professionals, the structured approach to training data, including rigorous filtering, augmentation, and expert verification, reinforces the importance of data quality as a multiplier of model performance. This could inspire more deliberate approaches to dataset development and validation pipelines.
Future outlook
Seed-Thinking-v1.5 is the result of collaboration within ByteDance’s Seed LLM Systems team, led by Yonghui Wu and with public representation by Haibin Lin, a long-time AI contributor.
The project also draws on earlier efforts like Doubao 1.5 Pro and incorporates shared techniques in RLHF and data curation.
Looking ahead, the team plans to continue refining reinforcement learning techniques, with a focus on training efficiency and reward modeling for non-verifiable tasks. The public release of internal benchmarks such as BeyondAIME is intended to foster broader advancement in reasoning-focused AI research.