Add README (Chinese) for tuner (#106)

This example demonstrates how to use **AgentScope-Tuner** to enhance a math problem-solving agent.

## Task Setting

We use the foundational [math-agent example](https://github.com/agentscope-ai/agentscope-samples/blob/main/tuner/math_agent/main.py) as our baseline. The agent is a **`ReActAgent`** that solves mathematical reasoning problems step by step, following a reasoning-acting loop to work through each task iteratively.

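To picture that loop, here is a minimal, framework-agnostic sketch of a reasoning-acting cycle; `call_model` and `run_tool` are hypothetical stubs for illustration, not the AgentScope API.

```python
def call_model(context: str) -> tuple[str, str, str]:
    # Hypothetical stub: a real agent would query an LLM here.
    return ("The expression is simple arithmetic.", "final_answer", "4")

def run_tool(action: str, arg: str) -> str:
    # Hypothetical stub: a real agent would dispatch to a calculator, code runner, etc.
    return f"result of {action}({arg})"

def react_solve(problem: str, max_steps: int = 8) -> str:
    scratchpad = [f"Problem: {problem}"]
    for _ in range(max_steps):
        thought, action, arg = call_model("\n".join(scratchpad))  # reasoning step
        scratchpad.append(f"Thought: {thought}")
        if action == "final_answer":
            return arg
        observation = run_tool(action, arg)  # acting step
        scratchpad.append(f"Observation: {observation}")
    return "no answer within the step budget"

print(react_solve("What is 2 + 2?"))  # -> 4
```
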
### Objective of the Data-Centric Approach
Training can be inefficient if tasks are too easy or too hard. This example demonstrates how to use **task selectors** to dynamically select tasks based on **data feedback**, focusing on "productively challenging" samples to maximize training efficiency. These data-centric techniques are generic and adaptable to other agent workflows.
## Dataset Preparation

To enable difficulty-based sampling, the training data must include difficulty features (e.g., pass rates from LLMs).

1. **Base Dataset**: You can use any standard math problem dataset. A good example is the math data in [LLM360/guru-RL-92k](https://huggingface.co/datasets/LLM360/guru-RL-92k), which comes pre-annotated with pass rates from different LLMs, serving as direct difficulty features.
2. **Build Your Own Features**: If you use your own dataset, you can generate these features by pre-running several models of varying capabilities and recording their pass rates (see the sketch below). This can be done within the [**Trinity-RFT**](https://github.com/agentscope-ai/Trinity-RFT/pull/440) framework.
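
As a rough illustration, the following sketch annotates each task with per-model pass rates by sampling several answers; `query_model` and the `k = 8` sampling budget are hypothetical placeholders, not Trinity-RFT APIs.

```python
import json

def query_model(model_name: str, problem: str) -> str:
    # Hypothetical stub: a real pipeline would call an LLM endpoint here.
    return "42"

def annotate_pass_rates(tasks: list[dict], models: list[str], k: int = 8) -> list[dict]:
    """Attach a per-model pass rate (fraction of k samples answered correctly) to each task."""
    for task in tasks:
        for model in models:
            hits = sum(
                query_model(model, task["problem"]) == task["answer"]
                for _ in range(k)
            )
            task[f"pass_rate_{model}"] = hits / k  # 0.0 = too hard, 1.0 = too easy
    return tasks

tasks = [{"problem": "What is 6 * 7?", "answer": "42"}]
print(json.dumps(annotate_pass_rates(tasks, ["weak-model", "strong-model"]), indent=2))
```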

Leveraging the powerful data processing capabilities of **Trinity-RFT**, **AgentScope-Tuner** provides the following components.

#### Task Selector

The `Task Selector` determines how samples are selected from a dataset. It can be configured directly in the YAML configuration files.

- **Built-in Selectors**:
  - `sequential`: Samples are selected in a fixed order.
  - `offline_easy2hard`: Samples are sorted by a predefined feature for curriculum learning.
  - `difficulty_based` (Customized): An adaptive sampler based on task difficulty.

> For more details on `Task Selector`, including how to implement a custom selector based on feedback signals, please refer to **Trinity-RFT**'s **[Selector Development Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)**.
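To give a flavor of a feedback-driven selector, here is a minimal sketch that favors tasks whose observed pass rate sits near a target difficulty; the `select`/`update` interface is an illustrative assumption, not the actual Trinity-RFT selector API (see the guide above for that).

```python
import random

class DifficultyBasedSelector:
    """Illustrative sketch: prefer tasks that are neither trivially easy nor hopelessly hard."""

    def __init__(self, tasks: list[dict], target: float = 0.5):
        self.tasks = tasks
        self.target = target
        # Initial difficulty estimates, e.g., from pre-annotated pass rates.
        self.pass_rates = {i: t.get("pass_rate", 0.5) for i, t in enumerate(tasks)}

    def select(self, batch_size: int) -> list[int]:
        # Weight each task by how close its pass rate is to the target difficulty.
        weights = [1.0 - abs(self.pass_rates[i] - self.target) for i in range(len(self.tasks))]
        return random.choices(range(len(self.tasks)), weights=weights, k=batch_size)

    def update(self, task_id: int, pass_rate: float) -> None:
        # Feedback hook: refresh the difficulty estimate after new rollouts.
        self.pass_rates[task_id] = pass_rate

selector = DifficultyBasedSelector([{"pass_rate": 0.1}, {"pass_rate": 0.5}, {"pass_rate": 0.9}])
print(selector.select(batch_size=2))  # favors the medium-difficulty task
```
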
#### Data Processor

The `Data Processor` allows for real-time processing of **Task** and **Experience** data.

For example, the `difficulty_based` selector requires a `pass_rate_calculator` operator to compute the agent's success rate for each task. This feedback is then used to adjust the sampling strategy.
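
To illustrate what such an operator computes, here is a minimal stand-in that derives per-task pass rates from a batch of rollout experiences; the record fields (`task_id`, `correct`) are assumed for illustration and are not the actual `pass_rate_calculator` implementation.

```python
from collections import defaultdict

def pass_rate_from_experiences(experiences: list[dict]) -> dict[str, float]:
    """For each task, the pass rate is the fraction of rollouts with a correct answer."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for exp in experiences:
        totals[exp["task_id"]] += 1
        hits[exp["task_id"]] += int(exp["correct"])
    return {task: hits[task] / totals[task] for task in totals}

experiences = [
    {"task_id": "q1", "correct": True},
    {"task_id": "q1", "correct": False},
    {"task_id": "q2", "correct": True},
]
print(pass_rate_from_experiences(experiences))  # {'q1': 0.5, 'q2': 1.0}
```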
> For more details on `Data Processor`, please refer to **Trinity-RFT**'s **[Operator Development Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)**.
### Configuring the Experiments

To run the experiment with the difficulty-based configuration:

```bash
python main.py --config config_difficulty.yaml
```

The following results compare the performance of the `difficulty_based` selection strategy (red line, labeled "bots") against a standard `random` selection strategy (black line, labeled "random").
<div align="center">
<img src="./training_result.jpg" alt="Training Result Image" width="90%"/>
</div>
### Training Reward Curve