Add README (Chinese) for tuner (#106)

Yuchang Sun
2026-01-20 19:46:50 +08:00
committed by GitHub
parent 311ddfff46
commit 400c1e77bf
16 changed files with 1256 additions and 90 deletions


@@ -189,9 +189,13 @@ The training results show improvements in agent performance over training iterat
- **Train reward**: The average reward on training samples increases as the agent learns better strategies
- **Rollout accuracy**: The average accuracy on rollout samples increases as the agent learns better strategies
![Training Rewards](./critic_reward_mean.png)
<div align="center">
<img src="./critic_reward_mean.png" alt="Training Rewards" width="90%"/>
</div>
![Rollout Accuracy](./rollout_accuracy_mean.png)
<div align="center">
<img src="./rollout_accuracy_mean.png" alt="Rollout Accuracy" width="90%"/>
</div>
### Concrete Example


@@ -189,9 +189,13 @@ async def email_search_judge(
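- **Training reward**: The average reward on training samples increases as the agent learns better strategies
- **Rollout accuracy**: The average accuracy on rollout samples increases as the agent learns better strategies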
![Training Rewards](./critic_reward_mean.png)
<div align="center">
<img src="./critic_reward_mean.png" alt="Training Rewards" width="90%"/>
</div>
![Rollout Accuracy](./rollout_accuracy_mean.png)
<div align="center">
<img src="./rollout_accuracy_mean.png" alt="Rollout Accuracy" width="90%"/>
</div>
### Concrete Example
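Both hunks grow from 9 to 13 lines. This is consistent with each standalone Markdown image line (`![alt](src)`) being replaced by a three-line centered HTML block at 90% width. Below is a minimal sketch of that rewrite, assuming each image sits alone on its own line. `center_images` is a hypothetical helper for illustration only. It is not part of the repository and is not the author's tooling.

```python
import re

# Hypothetical pattern: a line containing only a Markdown image,
# e.g. ![Training Rewards](./critic_reward_mean.png)
IMAGE_LINE = re.compile(r"^!\[(?P<alt>[^\]]*)\]\((?P<src>[^)\s]+)\)\s*$")


def center_images(markdown: str, width: str = "90%") -> str:
    """Rewrite standalone Markdown image lines as centered HTML <img> blocks.

    Lines that are not a lone Markdown image are passed through unchanged.
    """
    out = []
    for line in markdown.splitlines():
        m = IMAGE_LINE.match(line)
        if m:
            # Same three-line layout as the added lines in the diff.
            out.append('<div align="center">')
            out.append(f'<img src="{m["src"]}" alt="{m["alt"]}" width="{width}"/>')
            out.append("</div>")
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(center_images("![Training Rewards](./critic_reward_mean.png)"))
```

The sketch uses a raw HTML wrapper because plain Markdown image syntax cannot set alignment or width. GitHub-flavored renderers do honor `align` on a `<div>` and `width` on an `<img>`.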