Add README (Chinese) for tuner (#106)

Yuchang Sun (committed by GitHub), 2026-01-20 19:46:50 +08:00
parent 311ddfff46
commit 400c1e77bf
16 changed files with 1256 additions and 90 deletions


@@ -4,18 +4,13 @@ This example demonstrates how to use **AgentScope-Tuner** to enhance a math prob
## Task Setting
-We use the foundational [math-agent example](https://github.com/agentscope-ai/agentscope-samples/blob/main/tuner/math_agent/main.py) as our baseline to demonstrate the data enhancement capabilities. Notably, these data-centric techniques are generic and customizable, making them adaptable to other agent workflows.
+We use the foundational [math-agent example](https://github.com/agentscope-ai/agentscope-samples/blob/main/tuner/math_agent/main.py) as our baseline. The agent is a **`ReActAgent`** that solves mathematical reasoning problems through step-by-step reasoning.
-### Agent Goal and Type
-The agent's objective is to solve mathematical reasoning problems, learning to produce a correct final answer through a step-by-step thought process. The agent is implemented as a **`ReActAgent`**, which follows a reasoning-acting loop to solve tasks iteratively.
### Objective of the Data-Centric Approach
-Training can be inefficient if tasks are too easy or too hard. This example addresses this by providing **selectors** to dynamically select tasks using **data feedback**. This empowers users to explore and implement their own data-centric strategies, such as focusing on "productively challenging" samples, to maximize training efficiency.
+Training can be inefficient if tasks are too easy or too hard. This example demonstrates how to use **task selectors** to dynamically select tasks based on **data feedback**, focusing on "productively challenging" samples to maximize training efficiency. These data-centric techniques are generic and adaptable to other agent workflows.
## Dataset Preparation
-To enable difficulty-based sampling, our training data needs to include features that represent the "difficulty" of each task.
+To enable difficulty-based sampling, the training data must include difficulty features (e.g., pass rates from LLMs).
1. **Base Dataset**: You can use any standard math problem dataset. A good example is the math data in [LLM360/guru-RL-92k](https://huggingface.co/datasets/LLM360/guru-RL-92k), which comes pre-annotated with pass rates from different LLMs, serving as direct difficulty features.
2. **Build Your Own Features**: If you use your own dataset, you can generate these features by pre-running several models of varying capabilities and recording their pass rates. This can be done within the [**Trinity-RFT**](https://github.com/agentscope-ai/Trinity-RFT/pull/440) framework.
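The pass-rate annotations described in step 2 can be collapsed into a single scalar difficulty signal per task. A minimal standalone sketch, assuming each task record carries `pass_rate_*` fields; the field and function names here are illustrative, not a schema required by Trinity-RFT:

```python
# Hypothetical sketch: collapse the pass rates of several reference
# models into one difficulty feature (higher = harder). Field names
# like "pass_rate_small" are illustrative, not a required schema.

def difficulty_feature(task: dict, model_keys: list[str]) -> float:
    """Average failure rate across the reference models."""
    rates = [task[k] for k in model_keys if k in task]
    if not rates:
        return 0.5  # no annotations: assume medium difficulty
    return 1.0 - sum(rates) / len(rates)

task = {"question": "2+2=?", "pass_rate_small": 0.9, "pass_rate_large": 1.0}
print(round(difficulty_feature(task, ["pass_rate_small", "pass_rate_large"]), 4))  # 0.05
```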
@@ -34,7 +29,7 @@ Leveraging the powerful data processing capabilities of **Trinity-RFT**, **Agent
#### Task Selector
-The `Task Selector` determines how samples are selected from a dataset. It can be configured directly in `Yaml Config`.
+The `Task Selector` determines how samples are selected from a dataset. It can be configured directly in configuration YAML files.
- **Built-in Selectors**:
  - `sequential`: Samples are selected in a fixed order.
@@ -43,7 +38,7 @@ The `Task Selector` determines how samples are selected from a dataset. It can b
  - `offline_easy2hard`: Samples are sorted by a predefined feature for curriculum learning.
  - `difficulty_based` (Customized): An adaptive sampler based on task difficulty.
-> For more details on `Task Selector`, including how to implement a custom selector based on feedback signals, please refer to **Trinity-RFT**'s **[Selector Development Guide](https://github.com/agentscope-ai/Trinity-RFT/blob/main/docs/sphinx_doc/source/tutorial/develop_selector.md)**.
+> For more details on `Task Selector`, including how to implement a custom selector based on feedback signals, please refer to **Trinity-RFT**'s **[Selector Development Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)**.
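Choosing one of the selectors listed above typically amounts to a small YAML fragment. The key names below are assumptions for illustration, not Trinity-RFT's exact schema; consult the Selector Development Guide for the authoritative fields:

```yaml
# Illustrative fragment only -- key names are assumptions, not the
# exact Trinity-RFT config schema.
taskset:
  task_selector:
    selector_type: difficulty_based   # or: sequential, offline_easy2hard
    feature_key: pass_rate            # per-task feature the selector reads
```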
#### Data Processor
@@ -51,7 +46,7 @@ The `Data Processor` allows for real-time processing of **Task** and **Experienc
For example, the `difficulty_based` selector requires a `pass_rate_calculator` operator to compute the agent's success rate for each task. This feedback is then used to adjust the sampling strategy.
-> For more details on `Data Processor`, please refer to **Trinity-RFT**'s **[Operator Development Guide](https://github.com/agentscope-ai/Trinity-RFT/blob/main/docs/sphinx_doc/source/tutorial/develop_operator.md)**.
+> For more details on `Data Processor`, please refer to **Trinity-RFT**'s **[Operator Development Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)**.
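The feedback signal described above can be sketched in isolation. This standalone function shows the kind of statistic a `pass_rate_calculator` operator must produce; it is an illustrative assumption, not Trinity-RFT's actual implementation:

```python
# Standalone sketch (an assumption, not Trinity-RFT's actual operator):
# given the rewards an agent earned on repeated rollouts of one task,
# compute the success rate that a difficulty-based selector can use
# to re-weight its sampling of that task.

def pass_rate(rewards: list[float], threshold: float = 1.0) -> float:
    """Fraction of rollouts whose reward reached the success threshold."""
    if not rewards:
        return 0.0
    passed = sum(1 for r in rewards if r >= threshold)
    return passed / len(rewards)

# Four rollouts of the same task, one success:
print(pass_rate([0.0, 1.0, 0.0, 0.0]))  # 0.25
```

A task with a pass rate near 0 or 1 contributes little training signal, which is why the selector steers sampling toward tasks in between.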
### Configuring the Experiments
@@ -147,7 +142,9 @@ python main.py --config config_difficulty.yaml
The following results compare the performance of the `difficulty-based` selection strategy (red line, bots) against a standard `random` selection strategy (black line, random).
-![Training Result Image](./training_result.jpg)
+<div align="center">
+  <img src="./training_result.jpg" alt="Training Result Image" width="90%"/>
+</div>
### Training Reward Curve