update to multi-turn conversation, fix behavior when task can definitely not be completed (#97)

Yue Cui
2026-01-09 18:30:22 +08:00
committed by GitHub
parent 72991cdddb
commit 2bdefc9126
3 changed files with 22 additions and 31 deletions

@@ -36,6 +36,9 @@ from alias.agent.agents import AliasAgentBase
from alias.agent.agents.common_agent_utils import (
WorkerResponse,
get_user_input_to_mem_pre_reply_hook,
+agent_load_states_pre_reply_hook,
+save_post_reasoning_state,
+save_post_action_state,
)
from alias.agent.agents._build_in_helper_browser._image_understanding import (
image_understanding,
@@ -281,6 +284,11 @@ class BrowserAgent(AliasAgentBase):
# Register hooks (kwargs-only signature)
# compatible with directly using session service,
# add input msg to memory
+self.register_instance_hook(
+"pre_reply",
+"agent_load_states_pre_reply_hook",
+agent_load_states_pre_reply_hook,
+)
self.register_instance_hook(
"pre_reply",
"get_user_input_to_mem_pre_reply_hook",
@@ -291,11 +299,21 @@ class BrowserAgent(AliasAgentBase):
"browser_pre_reply_hook",
browser_pre_reply_hook,
)
+self.register_instance_hook(
+"post_reasoning",
+"save_post_reasoning_state",
+save_post_reasoning_state,
+)
self.register_instance_hook(
"post_acting",
"browser_post_acting_hook",
browser_post_acting_hook,
)
+self.register_instance_hook(
+"post_acting",
+"save_post_action_state",
+save_post_action_state,
+)
def _register_skill_tool(
self,
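The hunks above register `agent_load_states_pre_reply_hook` ahead of `get_user_input_to_mem_pre_reply_hook` and add state-saving hooks after reasoning and after acting, presumably in support of the multi-turn change named in the commit title. The toy below sketches that idea under stated assumptions: hooks for one event run in registration order, each hook receives `(agent, kwargs)`, and a module-level dict stands in for the session store. `HookedAgent`, `load_states`, `add_input_to_mem`, `save_state` and `STORE` are hypothetical and are not the Alias or AgentScope implementations.

```python
from typing import Any, Callable

# Hypothetical hook signature, loosely modelled on the "kwargs-only" comment in the diff.
Hook = Callable[["HookedAgent", dict], None]

# A module-level dict standing in for a real session store (assumption).
STORE: dict[str, list[str]] = {}


class HookedAgent:
    """Toy agent whose per-instance hooks run in registration order."""

    def __init__(self) -> None:
        # hook_type -> {hook_name: hook}; plain dicts preserve insertion order.
        self._hooks: dict[str, dict[str, Hook]] = {}
        self.memory: list[str] = []
        self.log: list[str] = []

    def register_instance_hook(self, hook_type: str, hook_name: str, hook: Hook) -> None:
        self._hooks.setdefault(hook_type, {})[hook_name] = hook

    def _run_hooks(self, hook_type: str, **kwargs: Any) -> None:
        for hook in self._hooks.get(hook_type, {}).values():
            hook(self, kwargs)

    def reply(self, msg: str) -> str:
        self._run_hooks("pre_reply", msg=msg)
        thought = f"plan for: {msg}"  # stand-in for the reasoning step
        self._run_hooks("post_reasoning", thought=thought)
        result = f"done: {msg}"  # stand-in for the acting step
        self._run_hooks("post_acting", result=result)
        return result


def load_states(agent: HookedAgent, kwargs: dict) -> None:
    # Restore memory saved by an earlier turn, if any.
    agent.memory = list(STORE.get("memory", []))
    agent.log.append("load")


def add_input_to_mem(agent: HookedAgent, kwargs: dict) -> None:
    agent.memory.append(kwargs["msg"])
    agent.log.append("input")


def save_state(agent: HookedAgent, kwargs: dict) -> None:
    # Persist after each step so a later turn sees the latest state.
    STORE["memory"] = list(agent.memory)
    agent.log.append("save")


def build_agent() -> HookedAgent:
    agent = HookedAgent()
    # Loading state is registered first so the new input lands on top of old memory.
    agent.register_instance_hook("pre_reply", "load_states", load_states)
    agent.register_instance_hook("pre_reply", "add_input_to_mem", add_input_to_mem)
    agent.register_instance_hook("post_reasoning", "save_state", save_state)
    agent.register_instance_hook("post_acting", "save_state", save_state)
    return agent


# Two turns handled by two fresh instances, e.g. two requests in one session.
first = build_agent()
first.reply("find the price")
second = build_agent()
second.reply("now compare it")
print(second.memory)  # ['find the price', 'now compare it']
```

In this toy, swapping the two `pre_reply` registrations would make `load_states` overwrite the freshly added input, which is one reason registration order can matter when several hooks share an event.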
@@ -1373,7 +1391,9 @@ class BrowserAgent(AliasAgentBase):
sys_prompt = (
"You are an expert in task validation. "
"Your job is to determine if the agent has completed its task"
-" based on the provided summary. If the summary is `NO_ANSWER`, this task is not over. If finished, strictly reply "
+" based on the provided summary. If the summary is `NO_ANSWER`, this task "
+"is not over unless the task is determined as definitely not completed. "
+"If finished, strictly reply "
'"BROWSER_AGENT_TASK_FINISHED" and your reason, otherwise return the remaining '
"tasks or next steps."
)
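The updated validator prompt asks for the literal `BROWSER_AGENT_TASK_FINISHED` plus a reason when the task is over (which, for a `NO_ANSWER` summary, now includes a task that definitely cannot be completed), and for the remaining tasks or next steps otherwise. A caller might interpret such a reply as sketched below. `parse_validation_reply` and `ValidationResult` are hypothetical names; the diff does not show how the agent actually consumes the validator's reply.

```python
from dataclasses import dataclass

# Sentinel the validator prompt asks for when the task is finished.
FINISH_TOKEN = "BROWSER_AGENT_TASK_FINISHED"


@dataclass
class ValidationResult:
    finished: bool
    detail: str  # the validator's reason if finished, else the remaining tasks / next steps


def parse_validation_reply(reply: str) -> ValidationResult:
    """Interpret a validator reply (hypothetical helper, simple substring check)."""
    text = reply.strip()
    if FINISH_TOKEN in text:
        # Drop the sentinel plus surrounding quotes/punctuation; keep the reason.
        reason = text.replace(FINISH_TOKEN, "", 1).strip(' "\':-\n')
        return ValidationResult(finished=True, detail=reason)
    return ValidationResult(finished=False, detail=text)


done = parse_validation_reply('"BROWSER_AGENT_TASK_FINISHED": the item is out of stock everywhere')
todo = parse_validation_reply("Open the reviews tab and read the newest review.")
print(done)
print(todo.finished)  # False
```

A plain substring check is the simplest reading of "strictly reply"; it would also match a reply that only mentions the token in passing, so a stricter caller might require the token at the start of the reply.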

@@ -1,30 +0,0 @@
-## Identity and Purpose
-You are an expert in evaluating the performance of a web navigation agent. The agent is designed to help a human user navigate a website to complete a task. Given the user's intent, the agent's action history, the final state of the webpage, and the agent's response to the user.
-Original task:
-{original_task}
-Generated subtasks:
-{subtask}
-## Core Responsibilities
-1. View the webpage, summarize content exactly relevant to the task goal.
-2. Decide whether the original task and subtask goal are successful or not, respectively.
-3. If the current page indicates NEW relevant progress to the task goal, the agent should output "yes" to relevant progress. Otherwise, output "no".
-4. If the current state is a failure but it looks like the agent is on the right track towards success, you should also output as such.
-### Action Taking Guidelines
-1. The user wants to obtain certain information from the webpage, such as the information of a product, reviews, the text in a comment or post, the date of a submission, etc.
-2. The agent's response must contain the information the user wants, or explicitly state that the information is not available. Otherwise, e.g. the agent encounters an exception and respond with the error content, the task is considered to be a failure.
-3. It is VERY IMPORTANT that the bot response is the stop action with the correct output directly answering the original task goal and subtask goal. If the bot response is not stop (e.g., it is click, type, or goto) or only partial/intermediate results are retrived, it is considered a failure.
-4. If the agent is searching the content (e.g., google), it is considered on the right track. Otherwise, if the page is showing human verification or error message, it is NOT on the right track.
-#### Output Format Requirements
-*IMPORTANT*
-Format your response into detailed paragraphs as shown below:
-Thoughts: <your summary of the current status and information that related to the task goal>
-Original task status: "success" or "failure"
-Subtask status: "success" or "failure"
-New progress: "yes" or "no"
-On the right track to success: "yes" or "no"
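The removed evaluator prompt asked for a fixed, line-oriented reply: a `Thoughts:` line followed by four labelled verdicts. A consumer of that format could parse it roughly as below. This is a hypothetical sketch: `parse_evaluation` and `FIELDS` are illustrative names, and the diff does not show the code that originally read these replies.

```python
# Maps the labels required under "Output Format Requirements" to dict keys.
FIELDS = {
    "Original task status": "original_task_status",
    "Subtask status": "subtask_status",
    "New progress": "new_progress",
    "On the right track to success": "on_right_track",
}


def parse_evaluation(text: str) -> dict[str, str]:
    """Parse an evaluator reply into a dict (hypothetical helper)."""
    result: dict[str, str] = {}
    thoughts: list[str] = []
    in_thoughts = False
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        label = label.strip()
        if sep and label in FIELDS:
            # Values are expected as "success"/"failure" or "yes"/"no", possibly quoted.
            result[FIELDS[label]] = value.strip().strip('"').lower()
            in_thoughts = False
        elif sep and label == "Thoughts":
            thoughts = [value.strip()]
            in_thoughts = True
        elif in_thoughts:
            # Thoughts may span several lines or paragraphs.
            thoughts.append(line.strip())
    result["thoughts"] = " ".join(t for t in thoughts if t)
    return result


sample = """Thoughts: The product page shows the requested price.
The agent stopped with that value.
Original task status: "success"
Subtask status: "success"
New progress: "yes"
On the right track to success: "yes"
"""
print(parse_evaluation(sample)["original_task_status"])  # success
```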

@@ -28,6 +28,7 @@ Your goal is to complete given tasks by controlling a browser to navigate web pa
- Avoid using Google Scholar. If a researcher is searched, try to use his/her homepage instead.
- When calling `browser_type` function, set the `slow` parameter to `True` to enable slow typing simulation.
- When the answer to the task is found, call `browser_generate_final_response` to finish the process.
+- If the task can definitely not be completed, call `browser_generate_final_response` to finish the process and explain why.
### Observing Guidelines
- Always take action based on the elements on the webpage. Never create urls or generate new pages.
- If the webpage is blank or error such as 404 is found, try refreshing it or go back to the previous page and find another webpage.