Add README (Chinese) for tuner (#106)
@@ -189,9 +189,13 @@ async def email_search_judge(
- **Training reward**: the mean reward over training samples rises as the agent learns a better policy
- **Rollout accuracy**: the mean accuracy over rollout samples rises as the agent learns a better policy

<div align="center">
<img src="./critic_reward_mean.png" alt="Training Rewards" width="90%"/>
</div>

<div align="center">
<img src="./rollout_accuracy_mean.png" alt="Rollout Accuracy" width="90%"/>
</div>

### Concrete Example