Monitoring & Alerts
===================

OpenClaw provides comprehensive monitoring and alerting capabilities to track system health and trading performance.

Overview
--------

Monitoring Components
~~~~~~~~~~~~~~~~~~~~~

* **Metrics Collection**: Performance and system metrics
* **Alerting**: Real-time notifications for critical events
* **Dashboards**: Visual monitoring interface
* **Logging**: Structured logging for debugging
* **Health Checks**: System availability monitoring

Quick Start
-----------

Basic Monitoring
~~~~~~~~~~~~~~~~

.. code-block:: python

    from openclaw.monitoring.metrics import MetricsCollector

    # Create collector
    collector = MetricsCollector()

    # Record metric
    collector.record("trade.pnl", value=150.0, tags={
        "symbol": "AAPL",
        "strategy": "trend_following"
    })

    # Get statistics
    stats = collector.get_stats("trade.pnl")
    print(f"Avg PnL: {stats.mean:.2f}")

Setting Up Alerts
~~~~~~~~~~~~~~~~~

.. code-block:: python

    from openclaw.monitoring.alerts import AlertManager, AlertRule

    # Create alert manager
    alerts = AlertManager()

    # Define alert rule
    rule = AlertRule(
        name="high_drawdown",
        condition="drawdown > 0.10",
        severity="critical",
        channels=["email", "slack"]
    )

    # Add rule
    alerts.add_rule(rule)

    # Check conditions
    alerts.check_all(agent_state)

Metrics Collection
------------------

Built-in Metrics
~~~~~~~~~~~~~~~~

Trading Metrics:

* Trade count and frequency
* Win/loss ratio
* Average profit/loss
* Sharpe ratio
* Maximum drawdown
* Position sizes

System Metrics:

* API latency
* Error rates
* Decision costs
* Agent survival rates
* Workflow execution time

Custom Metrics
~~~~~~~~~~~~~~

.. code-block:: python

    from openclaw.monitoring.metrics import Metric

    # Create custom metric
    custom_metric = Metric(
        name="custom_factor.performance",
        type="gauge",
        description="Performance of custom trading factor",
        unit="percent"
    )

    # Record value
    custom_metric.record(15.5, tags={
        "factor_name": "my_factor",
        "symbol": "AAPL"
    })

Metric Types
~~~~~~~~~~~~

**Counter**: Cumulative values (e.g., total trades)

.. code-block:: python

    collector.increment("trades.total", tags={"symbol": "AAPL"})

**Gauge**: Point-in-time values (e.g., current balance)

.. code-block:: python

    collector.gauge("agent.balance", value=1500.0, tags={"agent_id": "agent_001"})

**Histogram**: Distribution of values (e.g., trade PnL)

.. code-block:: python

    collector.histogram("trade.pnl", value=100.0)

**Timer**: Duration measurements (e.g., analysis time)

.. code-block:: python

    with collector.timer("analysis.duration"):
        result = agent.analyze("AAPL")

Alerting System
---------------

Alert Rules
~~~~~~~~~~~

.. code-block:: python

    from openclaw.monitoring.alerts import AlertRule, AlertCondition

    # Create rule with multiple conditions
    rule = AlertRule(
        name="agent_distress",
        description="Agent is in critical condition",
        conditions=[
            AlertCondition(
                metric="agent.balance",
                operator="less_than",
                threshold=300.0
            ),
            AlertCondition(
                metric="agent.drawdown",
                operator="greater_than",
                threshold=0.70
            )
        ],
        severity="critical",
        cooldown_minutes=60
    )

    alerts.add_rule(rule)

Alert Channels
~~~~~~~~~~~~~~

Email Alerts:

.. code-block:: python

    from openclaw.monitoring.channels import EmailChannel

    email = EmailChannel(
        smtp_server="smtp.gmail.com",
        smtp_port=587,
        username="alerts@example.com",
        password="app_password"
    )

    alerts.register_channel("email", email)

Slack Alerts:

.. code-block:: python

    from openclaw.monitoring.channels import SlackChannel

    slack = SlackChannel(
        webhook_url="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
    )

    alerts.register_channel("slack", slack)

Webhook Alerts:

.. code-block:: python

    from openclaw.monitoring.channels import WebhookChannel

    webhook = WebhookChannel(
        url="https://api.example.com/alerts",
        headers={"Authorization": "Bearer token123"}
    )

    alerts.register_channel("webhook", webhook)

Alert Severity Levels
~~~~~~~~~~~~~~~~~~~~~

* **INFO**: General information, no action required
* **WARNING**: Attention needed soon
* **CRITICAL**: Immediate action required
* **EMERGENCY**: System stopping event

Dashboard
---------

Web Dashboard
~~~~~~~~~~~~~

Start the monitoring dashboard:

.. code-block:: bash

    openclaw dashboard --port 8080

Access at: http://localhost:8080

Dashboard Components:

* Real-time P&L chart
* Agent status overview
* System health metrics
* Recent alerts
* Active trades
* Performance statistics

Custom Dashboards
~~~~~~~~~~~~~~~~~

.. code-block:: python

    from openclaw.dashboard.builder import DashboardBuilder

    builder = DashboardBuilder()

    # Add widgets
    builder.add_line_chart(
        title="Portfolio Value",
        metric="portfolio.value",
        time_range="1d"
    )

    builder.add_gauge(
        title="Win Rate",
        metric="performance.win_rate",
        min_value=0,
        max_value=1
    )

    builder.add_table(
        title="Active Agents",
        query="SELECT * FROM agents WHERE status='active'"
    )

    # Build dashboard
    dashboard = builder.build()
    dashboard.serve(port=8080)

Logging
-------

Structured Logging
~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from openclaw.utils.logging import get_logger

    logger = get_logger("my_module")

    # Different log levels
    logger.debug("Debug information")
    logger.info("General information")
    logger.warning("Warning message")
    logger.error("Error occurred")
    logger.critical("Critical failure")

    # Structured logging
    logger.info("Trade executed", extra={
        "symbol": "AAPL",
        "side": "buy",
        "quantity": 100,
        "price": 150.0
    })

Log Configuration
~~~~~~~~~~~~~~~~~

.. code-block:: yaml

    # config/logging.yaml
    logging:
      level: INFO
      format: json
      outputs:
        - type: file
          path: /var/log/openclaw/trading.log
          rotation: "1 day"
          retention: "30 days"
        - type: console
          format: text

Health Checks
-------------

System Health
~~~~~~~~~~~~~

.. code-block:: python

    from openclaw.monitoring.health import HealthChecker

    health = HealthChecker()

    # Register checks
    health.add_check("database", check_database_connection)
    health.add_check("exchange_api", check_exchange_api)
    health.add_check("data_feed", check_data_feed)

    # Run checks
    status = health.check_all()

    if status.healthy:
        print("System healthy")
    else:
        for check, result in status.checks.items():
            if not result.healthy:
                print(f"{check}: FAILED - {result.message}")

Agent Health
~~~~~~~~~~~~

.. code-block:: python

    from openclaw.monitoring.health import AgentHealthMonitor

    monitor = AgentHealthMonitor()

    # Check agent health
    for agent in agents:
        health = monitor.check_agent(agent)

        if health.status == "critical":
            alerts.send(f"Agent {agent.agent_id} is critical")
        elif health.status == "struggling":
            logger.warning(f"Agent {agent.agent_id} is struggling")

Monitoring Best Practices
-------------------------

1. **Monitor key metrics**: Focus on P&L, drawdown, and survival rates
2. **Set appropriate thresholds**: Avoid alert fatigue
3. **Use cooldown periods**: Prevent alert spam
4. **Regular health checks**: Automated system verification
5. **Centralized logging**: Aggregate logs for analysis
6. **Retention policies**: Manage data storage costs
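
The cooldown practice can be illustrated with a small, library-independent sketch. It is not OpenClaw's implementation: ``CooldownGate`` and ``should_send`` are hypothetical names introduced here, and the sketch only assumes that a setting such as ``cooldown_minutes`` means "do not repeat the same alert within this window".

```python
from datetime import datetime, timedelta


class CooldownGate:
    """Hypothetical helper: suppress repeat alerts for a rule inside a time window."""

    def __init__(self, cooldown_minutes):
        self.cooldown = timedelta(minutes=cooldown_minutes)
        # Maps rule name -> time of the last notification that was let through.
        self._last_sent = {}

    def should_send(self, rule_name, now):
        last = self._last_sent.get(rule_name)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down, so drop the duplicate
        self._last_sent[rule_name] = now
        return True


gate = CooldownGate(cooldown_minutes=60)
start = datetime(2024, 1, 1, 9, 0)
decisions = [
    gate.should_send("high_drawdown", start),                          # first firing
    gate.should_send("high_drawdown", start + timedelta(minutes=30)),  # inside the window
    gate.should_send("high_drawdown", start + timedelta(minutes=61)),  # window has elapsed
]
print(decisions)  # [True, False, True]
```

Passing ``now`` explicitly, instead of reading the clock inside the method, keeps the gate deterministic and easy to test. A suppressed alert does not reset the window here, so a condition that keeps firing still notifies once per cooldown period.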