Add README (Chinese) for tuner (#106)
@@ -189,9 +189,13 @@ The training results show improvements in agent performance over training iterat
- **Train reward**: The average reward on training samples increases as the agent learns better strategies
- **Rollout accuracy**: The average accuracy on rollout samples increases as the agent learns better strategies
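As a rough illustration of what the two metrics above measure, the sketch below averages per-sample rewards and correctness flags for each training step. The function name `summarize_step` and the sample data are hypothetical and made up for this example; they are not taken from the tuner's code or from the actual training run.

```python
from statistics import mean


def summarize_step(rewards, correct_flags):
    """Return (mean reward, accuracy) for one training step.

    rewards: per-sample scalar rewards from the training batch.
    correct_flags: per-sample booleans from the rollout evaluation.
    """
    mean_reward = mean(rewards)
    accuracy = sum(correct_flags) / len(correct_flags)
    return mean_reward, accuracy


# Hypothetical data for two steps (illustrative only, not real results).
steps = [
    ([0.0, 1.0, 0.0, 0.0], [False, True, False, False]),
    ([1.0, 1.0, 0.0, 1.0], [True, True, False, True]),
]

history = [summarize_step(rewards, flags) for rewards, flags in steps]
for step_index, (mean_reward, accuracy) in enumerate(history):
    print(f"step {step_index}: reward={mean_reward:.2f}, accuracy={accuracy:.2f}")
```

Plotting the two values in `history` against the step index would give curves of the same kind as the figures that follow.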

<div align="center">
<img src="./critic_reward_mean.png" alt="Training Rewards" width="90%"/>
</div>

<div align="center">
<img src="./rollout_accuracy_mean.png" alt="Rollout Accuracy" width="90%"/>
</div>
### Concrete Example