ByteDance Researchers Publish High-Performance AI Training Method

Researchers from ByteDance, Tsinghua University, and the University of Hong Kong have released an open-source system for AI reinforcement learning that they say outperforms a reasoning system from DeepSeek.

The DAPO (Decoupled Clip and Dynamic Sampling Policy Optimisation) system is designed to give other researchers reusable reinforcement-learning (RL) techniques for large language models (LLMs).

AI companies often release only partial details of their RL methods, the researchers said, making the techniques difficult to reproduce.


Open approach

In a new research paper, the researchers said they tried to reproduce DeepSeek’s GRPO (group relative policy optimisation) method, but their results trailed DeepSeek’s by 17 points on the AIME benchmark, “suggesting that critical training details may have been omitted in the R1 paper”.

R1 is DeepSeek’s latest “reasoning” AI model.

Reasoning models deliberately “think” longer before delivering an answer, double-checking their responses and reducing the potential for errors.
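The “group relative” part of GRPO’s name refers to scoring each sampled answer against the other answers drawn for the same prompt. The following is a minimal Python sketch of that idea only, assuming the commonly described normalisation by the group’s mean and standard deviation. The function name, the use of the population standard deviation and the small `eps` term are illustrative assumptions, and this is not the DAPO or DeepSeek training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each reward relative to its own group of sampled answers.

    Hypothetical helper for illustration: subtracts the group mean and
    divides by the group's (population) standard deviation plus eps.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one maths prompt: 1.0 = correct, 0.0 = incorrect.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

In this sketch, correct answers receive a positive advantage and incorrect ones a negative advantage. If every answer in a group earns the same reward, all advantages come out as zero, so that prompt gives the model nothing to learn from.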

In the interests of transparency and reproducibility, the DAPO team released the algorithmic details, training procedures and datasets used in their research.

The project includes training code and a prepared dataset called DAPO-Math-17K for mathematical reasoning tasks.

The team said DAPO delivered significant performance improvements over DeepSeek’s GRPO on the American Invitational Mathematics Examination (AIME) 2024 benchmark. It scored 50 points using Alibaba’s open-source Qwen2.5-32B base model, compared with 47 points for GRPO.

Efficiency

DAPO reached that score in half as many training steps as GRPO, underscoring its efficiency, the team said.

The project is led by ByteDance intern Yu Qiying, a doctoral student at Tsinghua. Other participants include a Tsinghua undergraduate and a University of Hong Kong doctoral student, reflecting the company’s efforts to work with top-level AI researchers before they have graduated.

The TikTok parent has invested heavily in AI, and its Doubao chatbot has become China’s most popular chatbot since its launch last May, ranking as the world’s second most popular after OpenAI’s ChatGPT.

Matthew Broersma

Matt Broersma is a long-standing tech freelancer who has worked for Ziff-Davis, ZDNet and other leading publications.
