Tessa diagnostics¶
This page documents the diagnostic capabilities Tessa offers when connected to a Grid.
Tessa doesn't just answer questions - she runs real diagnostics against your live telemetry. Every response is backed by actual data from your Grid.
How Diagnostics Work¶
When you ask Tessa a question, she selects the right diagnostic tool, queries your Grid data, and synthesizes the results into a clear answer. She can chain multiple tools together for complex investigations - for example, checking system health, then drilling into the unhealthy service, then comparing it to yesterday's performance.
These tools cover traces, logs, and metrics. You don't need to specify which data source to check - Tessa selects it based on your question.
Diagnostic Capabilities¶
System Health Check¶
Get an instant overview of your system's current state. Tessa reports overall system health including per-service states, error rates, the Apdex score, and the top errors across your Grid.
Try asking
- "How is my system doing?"
- "Give me a health check"
- "Is everything healthy?"
- "What's the overall status right now?"
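For readers unfamiliar with the Apdex score mentioned above, the following is a minimal sketch of the standard Apdex formula. It is not Tessa's implementation: the function name `apdex` and the 0.5-second threshold are assumptions chosen for illustration, and this page does not state which threshold Tessa applies.

```python
def apdex(response_times_s, threshold_s=0.5):
    """Standard Apdex: (satisfied + tolerating / 2) / total samples.

    threshold_s is an assumed target time T; the real threshold is not documented here.
    """
    if not response_times_s:
        raise ValueError("need at least one sample")
    # Satisfied: at or under T. Tolerating: over T but at or under 4T.
    satisfied = sum(1 for t in response_times_s if t <= threshold_s)
    tolerating = sum(
        1 for t in response_times_s if threshold_s < t <= 4 * threshold_s
    )
    return (satisfied + tolerating / 2) / len(response_times_s)


# Hypothetical response times in seconds.
samples = [0.1, 0.3, 0.4, 0.9, 1.5, 2.5]
score = apdex(samples)
print(round(score, 3))
```

A score near 1.0 means most requests finished within the target time; a score near 0 means most requests were slower than four times the target.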
Alert Summary¶
See the current alert status derived from your system health thresholds. Tessa pulls active and recent alerts and explains what they mean.
Try asking
- "Are there any active alerts?"
- "What alerts fired in the last 24 hours?"
- "Has anything triggered a warning today?"
Root Cause Analysis¶
Underlying tool: GetDiagnosis
Tessa diagnoses current system issues by analyzing recent errors, grouping them by root cause, and ranking them by severity.
Try asking
- "Why is checkout slow?"
- "What's causing 500 errors on the payment service?"
- "Root cause the latency spike at 2pm"
- "Why did order-service start failing?"
Pressure Detection¶
Identify services under active strain before they fail. Tessa detects services under pressure by comparing the current window against a 30-minute baseline.
Try asking
- "Is there pressure building anywhere?"
- "Which services are under stress?"
- "Are any services approaching their limits?"
- "What's at risk of failing soon?"
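The baseline comparison described above can be sketched roughly as follows. This is an illustration only, assuming mean latency as the signal and a 1.5x ratio as the trigger; the names `under_pressure` and `find_pressure`, the ratio, and the sample data are hypothetical, and this page does not describe the signals or thresholds Tessa actually uses.

```python
from statistics import mean


def under_pressure(baseline_ms, current_ms, factor=1.5):
    """True when the current window's mean latency exceeds factor x the baseline mean."""
    if not baseline_ms or not current_ms:
        return False  # not enough data to compare
    return mean(current_ms) > factor * mean(baseline_ms)


def find_pressure(windows, factor=1.5):
    """windows maps service name -> (baseline samples, current samples)."""
    return sorted(
        name
        for name, (baseline, current) in windows.items()
        if under_pressure(baseline, current, factor)
    )


# Hypothetical latency samples in milliseconds.
windows = {
    "order-service": ([100, 110, 90, 100], [180, 200, 190]),
    "auth-service": ([50, 55, 45], [52, 48, 50]),
}
stressed = find_pressure(windows)
print(stressed)
```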
Trend Analysis¶
Understand how metrics are changing over time. Tessa analyzes latency, error rate, and throughput trends by comparing recent performance against a baseline, and flags anything unusual.
Try asking
- "Show me the latency trend for auth-service"
- "How has error rate changed over the last 6 hours?"
- "Is response time getting worse on the /api/users endpoint?"
- "What's the throughput trend for the database?"
Service Dependency Map¶
Visualize how services connect and depend on each other. Tessa builds a service dependency map with per-service health, latency, error rates, and inter-service dependencies.
Try asking
- "Show me the service dependency map"
- "What services depend on order-service?"
- "Map out the request path from gateway to database"
- "What are the downstream dependencies of auth-service?"
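Questions about upstream and downstream dependencies reduce to graph traversal. The sketch below assumes a hypothetical call graph stored as a dictionary of caller to callees; it illustrates the idea and does not reflect how Tessa stores or queries the dependency map.

```python
from collections import deque


def reachable(graph, start):
    """Breadth-first search: every node reachable from start, excluding start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(graph):
    """Flip caller -> callees into callee -> callers."""
    inverted = {}
    for caller, callees in graph.items():
        for callee in callees:
            inverted.setdefault(callee, []).append(caller)
    return inverted


# Hypothetical call graph: each service lists the services it calls.
calls = {
    "gateway": ["auth-service", "order-service"],
    "order-service": ["payment-service", "database"],
    "auth-service": ["database"],
}
downstream = reachable(calls, "order-service")   # what order-service depends on
upstream = reachable(invert(calls), "database")  # what depends on database
print(sorted(downstream), sorted(upstream))
```

Traversing the original graph answers "what does this service depend on?", while traversing the inverted graph answers "what depends on this service?".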
Service Detail¶
Underlying tool: GetServiceDetail
Drill into a single named service for a focused view: error rate, latency, top errors, top endpoints, and dependencies for that service alone.
Try asking
- "Tell me about order-service"
- "What's the current state of payment-service?"
- "Give me the details for auth-service"
Incident Timeline¶
Underlying tool: GetIncidentTimeline
Build a chronological view of events to understand what happened and in what order. Tessa assembles a service-level event sequence - when each service first errored, spiked in latency, or degraded.
Try asking
- "Build me a timeline of events in the last hour"
- "What happened between 2pm and 3pm?"
- "Show me the sequence of events leading up to the outage"
- "What events occurred around the time errors started?"
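A service-level event sequence of the kind described above can be sketched as "keep the earliest event of each kind per service, then sort by time". The event tuple shape, the event kind labels, and the function name `first_events` are assumptions for illustration; the actual output of GetIncidentTimeline is not documented on this page.

```python
from datetime import datetime


def first_events(events):
    """events: iterable of (timestamp, service, kind).

    Returns the earliest event per (service, kind), in chronological order.
    """
    earliest = {}
    for ts, service, kind in events:
        key = (service, kind)
        if key not in earliest or ts < earliest[key]:
            earliest[key] = ts
    return sorted((ts, service, kind) for (service, kind), ts in earliest.items())


# Hypothetical events; the kind labels are illustrative.
events = [
    (datetime(2024, 1, 1, 14, 5), "order-service", "error"),
    (datetime(2024, 1, 1, 14, 2), "database", "latency_spike"),
    (datetime(2024, 1, 1, 14, 7), "order-service", "error"),
    (datetime(2024, 1, 1, 14, 4), "payment-service", "error"),
]
timeline = first_events(events)
for ts, service, kind in timeline:
    print(ts.isoformat(), service, kind)
```

Reading the result top to bottom shows which service degraded first, which is often the most useful clue to where an incident started.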
Slowest Endpoints¶
Find the endpoints dragging down your system's performance. Tessa returns the top 20 slowest endpoints ranked by p99 latency.
Try asking
- "What are the slowest endpoints?"
- "Which API calls take the longest?"
- "Show me the top 10 slowest endpoints in the last hour"
- "Are there any endpoints with p99 latency over 2 seconds?"
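To make "ranked by p99 latency" concrete, here is a minimal sketch that computes a nearest-rank 99th percentile per endpoint and keeps the slowest entries. The nearest-rank method, the function names, and the sample data are assumptions; this page does not say which percentile method Tessa uses.

```python
import math


def p99(samples_ms):
    """Nearest-rank 99th percentile: the value at rank ceil(0.99 * n)."""
    ordered = sorted(samples_ms)
    rank = math.ceil(99 * len(ordered) / 100)
    return ordered[rank - 1]


def slowest_endpoints(latencies, limit=20):
    """latencies maps endpoint -> latency samples in ms. Returns (endpoint, p99) pairs."""
    ranked = sorted(
        ((p99(samples), name) for name, samples in latencies.items() if samples),
        reverse=True,
    )
    return [(name, value) for value, name in ranked[:limit]]


# Hypothetical latency samples in milliseconds.
latencies = {
    "/api/users": list(range(1, 101)),
    "/api/orders": [10] * 98 + [5000, 5000],
    "/health": [2, 3, 2, 4],
}
top = slowest_endpoints(latencies, limit=2)
print(top)
```

Note how `/api/orders` ranks first even though most of its requests take 10 ms: p99 is sensitive to the slow tail, which is why it is a common choice for finding problem endpoints.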
Deployment Correlation¶
Underlying tool: GetDeploymentCorrelation
Detect what changed in your system recently. Tessa surfaces new service versions, new services, new endpoints, and new hosts, so you can line changes up against performance shifts.
Try asking
- "What changed recently?"
- "Were there any deployments in the last 24 hours?"
- "Did a deployment cause this regression?"
- "What was deployed right before errors started?"
Time-Window Comparison¶
Compare two time periods to understand what changed. Tessa compares system performance between two time windows (by default, the last 15 minutes versus the same window yesterday), evaluating differences in latency, error rates, and throughput.
Try asking
- "Compare the last hour to the same time yesterday"
- "How does this morning compare to last Monday?"
- "Is performance better or worse than last week?"
- "Compare pre-deployment to post-deployment metrics"
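The default comparison described above (the last 15 minutes versus the same window yesterday) comes down to simple date arithmetic. The sketch below shows one way to derive the two windows and a percent change between them; `default_windows`, `percent_change`, and the sample values are hypothetical and are not part of Tessa's documented interface.

```python
from datetime import datetime, timedelta, timezone


def default_windows(now, length=timedelta(minutes=15), offset=timedelta(days=1)):
    """Return ((start, end) of the current window, (start, end) of the earlier window)."""
    current = (now - length, now)
    previous = (current[0] - offset, current[1] - offset)
    return current, previous


def percent_change(before, after):
    """Relative change from before to after, in percent; None if before is zero."""
    if before == 0:
        return None
    return (after - before) / before * 100


now = datetime(2024, 1, 2, 14, 0, tzinfo=timezone.utc)
current, previous = default_windows(now)

# Hypothetical mean latencies (ms) measured in each window.
change = percent_change(before=200.0, after=250.0)
print(current, previous, change)
```

Comparing against the same time of day on the previous day helps control for daily traffic patterns, which is one reason it makes a sensible default.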
Trace and log access¶
For trace-specific and log-specific questions, Tessa has a separate family of tools that read directly from your Grid's trace and log stores. Where the diagnostic capabilities above synthesize across data sources, these tools fetch specific traces, spans, or logs by ID or filter.
| Tool | What it does |
|---|---|
| GetTraces | Find traces matching a time range or filter |
| GetTrace | Fetch a single trace by ID |
| GetTraceSpans | List the spans inside a specific trace |
| GetTraceSpanTags | Read attributes on a specific span |
| GetTraceSpanEvents | Read events recorded on a specific span |
| GetTraceErrors | Pull error-related spans from a trace |
| GetTraceLogs | Pull logs correlated to a trace |
| GetLogs | Find logs matching a time range or filter |
| GetLog | Fetch a single log entry |
| GetTraceFilterFacets | List available filter values for trace queries |
| GetLogFilterFacets | List available filter values for log queries |
Tessa selects from this family automatically when you ask trace- or log-specific questions.
Try asking
- "Find slow traces for the checkout flow"
- "Show me traces with errors on payment-service"
- "What does a typical request to /api/orders look like?"
- "Find traces where database calls took over 500ms"
- "Show me logs from the last hour with 'timeout' in the message"
APM context and dashboards¶
These tools surface the broader Grid context that informs Tessa's answers but are rarely invoked directly by a single user question.
| Tool | What it does |
|---|---|
| GetGridInfo | Information about the current Grid (capabilities, scope) |
| GetAvailableCharts | Discover what charts can be rendered |
| GetRangeData | Pull metric data for a time range |
| GetDashboard | Fetch a specific dashboard definition |
Context Awareness in 3D¶
In IAPM 3D, Tessa knows where you are. She tracks your current room and what you're looking at, which means you can ask contextual questions without specifying the target.
| Where You Are | What Tessa Knows |
|---|---|
| Standing near a service node | Which service you're investigating |
| Inside the Diagnostics Room | Which trace you're viewing |
| Viewing the service graph | The overall topology you see |
| Looking at a performance chart | The metric and time range in view |
Contextual shorthand
Instead of "What's the health of order-service?", you can walk up to the order-service node and just ask "What's wrong here?" - Tessa fills in the context.
Chaining Diagnostics¶
Tessa's real power shows when she chains multiple diagnostics together. A single question like "Why did checkout break after today's deployment?" might trigger:
- Deployment correlation - to identify what was deployed
- Time-window comparison - to compare before and after
- Root cause analysis - to group the resulting errors by root cause and rank them by severity
- Trace and log access - to find specific failing traces
You don't need to run these one at a time. Ask the high-level question and let Tessa orchestrate the investigation.