Add README (Chinese) for tuner (#106)

This example demonstrates how to use **AgentScope-Tuner** to enhance a math problem-solving agent.

## Task Setting

We use the foundational [math-agent example](https://github.com/agentscope-ai/agentscope-samples/blob/main/tuner/math_agent/main.py) as our baseline. The agent is a **`ReActAgent`** that solves mathematical reasoning problems step by step, following a reasoning-acting loop to work through each task iteratively.

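To picture that loop, here is a minimal, framework-agnostic sketch of a reasoning-acting cycle; `call_model` and `run_tool` are hypothetical stubs for illustration, not the AgentScope API.

```python
def call_model(context: str) -> tuple[str, str, str]:
    # Hypothetical stub: a real agent would query an LLM here.
    return ("The expression is simple arithmetic.", "final_answer", "4")

def run_tool(action: str, arg: str) -> str:
    # Hypothetical stub: a real agent would dispatch to a calculator, code runner, etc.
    return f"result of {action}({arg})"

def react_solve(problem: str, max_steps: int = 8) -> str:
    scratchpad = [f"Problem: {problem}"]
    for _ in range(max_steps):
        thought, action, arg = call_model("\n".join(scratchpad))  # reasoning step
        scratchpad.append(f"Thought: {thought}")
        if action == "final_answer":
            return arg
        observation = run_tool(action, arg)  # acting step
        scratchpad.append(f"Observation: {observation}")
    return "no answer within the step budget"

print(react_solve("What is 2 + 2?"))  # -> 4
```
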
### Objective of the Data-Centric Approach
Training can be inefficient if tasks are too easy or too hard. This example demonstrates how to use **task selectors** to dynamically select tasks based on **data feedback**, focusing on "productively challenging" samples to maximize training efficiency. These data-centric techniques are generic and adaptable to other agent workflows.
## Dataset Preparation

To enable difficulty-based sampling, the training data must include difficulty features (e.g., pass rates from LLMs).

1. **Base Dataset**: You can use any standard math problem dataset. A good example is the math data in [LLM360/guru-RL-92k](https://huggingface.co/datasets/LLM360/guru-RL-92k), which comes pre-annotated with pass rates from different LLMs, serving as direct difficulty features.
2. **Build Your Own Features**: If you use your own dataset, you can generate these features by pre-running several models of varying capabilities and recording their pass rates (see the sketch below). This can be done within the [**Trinity-RFT**](https://github.com/agentscope-ai/Trinity-RFT/pull/440) framework.
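
As a rough illustration, the following sketch annotates each task with per-model pass rates by sampling several answers; `query_model` and the `k = 8` sampling budget are hypothetical placeholders, not Trinity-RFT APIs.

```python
import json

def query_model(model_name: str, problem: str) -> str:
    # Hypothetical stub: a real pipeline would call an LLM endpoint here.
    return "42"

def annotate_pass_rates(tasks: list[dict], models: list[str], k: int = 8) -> list[dict]:
    """Attach a per-model pass rate (fraction of k samples answered correctly) to each task."""
    for task in tasks:
        for model in models:
            hits = sum(
                query_model(model, task["problem"]) == task["answer"]
                for _ in range(k)
            )
            task[f"pass_rate_{model}"] = hits / k  # 0.0 = too hard, 1.0 = too easy
    return tasks

tasks = [{"problem": "What is 6 * 7?", "answer": "42"}]
print(json.dumps(annotate_pass_rates(tasks, ["weak-model", "strong-model"]), indent=2))
```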

Leveraging the powerful data processing capabilities of **Trinity-RFT**, **AgentScope-Tuner** provides the following components.

#### Task Selector

The `Task Selector` determines how samples are selected from a dataset. It can be configured directly in the YAML configuration files.

- **Built-in Selectors**:
  - `sequential`: Samples are selected in a fixed order.
  - `offline_easy2hard`: Samples are sorted by a predefined feature for curriculum learning.
  - `difficulty_based` (Customized): An adaptive sampler based on task difficulty.

> For more details on `Task Selector`, including how to implement a custom selector based on feedback signals, please refer to **Trinity-RFT**'s **[Selector Development Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)**.
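To give a flavor of a feedback-driven selector, here is a minimal sketch that favors tasks whose observed pass rate sits near a target difficulty; the `select`/`update` interface is an illustrative assumption, not the actual Trinity-RFT selector API (see the guide above for that).

```python
import random

class DifficultyBasedSelector:
    """Illustrative sketch: prefer tasks that are neither trivially easy nor hopelessly hard."""

    def __init__(self, tasks: list[dict], target: float = 0.5):
        self.tasks = tasks
        self.target = target
        # Initial difficulty estimates, e.g., from pre-annotated pass rates.
        self.pass_rates = {i: t.get("pass_rate", 0.5) for i, t in enumerate(tasks)}

    def select(self, batch_size: int) -> list[int]:
        # Weight each task by how close its pass rate is to the target difficulty.
        weights = [1.0 - abs(self.pass_rates[i] - self.target) for i in range(len(self.tasks))]
        return random.choices(range(len(self.tasks)), weights=weights, k=batch_size)

    def update(self, task_id: int, pass_rate: float) -> None:
        # Feedback hook: refresh the difficulty estimate after new rollouts.
        self.pass_rates[task_id] = pass_rate

selector = DifficultyBasedSelector([{"pass_rate": 0.1}, {"pass_rate": 0.5}, {"pass_rate": 0.9}])
print(selector.select(batch_size=2))  # favors the medium-difficulty task
```
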
#### Data Processor

The `Data Processor` allows for real-time processing of **Task** and **Experience** data.

For example, the `difficulty_based` selector requires a `pass_rate_calculator` operator to compute the agent's success rate for each task. This feedback is then used to adjust the sampling strategy.
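
To illustrate what such an operator computes, here is a minimal stand-in that derives per-task pass rates from a batch of rollout experiences; the record fields (`task_id`, `correct`) are assumed for illustration and are not the actual `pass_rate_calculator` implementation.

```python
from collections import defaultdict

def pass_rate_from_experiences(experiences: list[dict]) -> dict[str, float]:
    """For each task, the pass rate is the fraction of rollouts with a correct answer."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for exp in experiences:
        totals[exp["task_id"]] += 1
        hits[exp["task_id"]] += int(exp["correct"])
    return {task: hits[task] / totals[task] for task in totals}

experiences = [
    {"task_id": "q1", "correct": True},
    {"task_id": "q1", "correct": False},
    {"task_id": "q2", "correct": True},
]
print(pass_rate_from_experiences(experiences))  # {'q1': 0.5, 'q2': 1.0}
```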
> For more details on `Data Processor`, please refer to **Trinity-RFT**'s **[Operator Development Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)**.
### Configuring the Experiments

To run the experiment with the difficulty-based configuration:

```bash
python main.py --config config_difficulty.yaml
```

The following results compare the performance of the `difficulty_based` selection strategy (red line, labeled "bots") against a standard `random` selection strategy (black line, labeled "random").
<div align="center">
<img src="./training_result.jpg" alt="Training Result Image" width="90%"/>
</div>
### Training Reward Curve