
Concise Summary
Conversational CloudWatch converts natural-language questions into structured insights over CloudWatch-style logs. It runs locally or with LocalStack, supports LLM summarization via TinyLlama, and exposes a simple FastAPI service packaged with Docker Compose for reproducible development and future AWS integration.
1. Problem & Context
Modern cloud environments produce huge volumes of logs, and finding critical issues within a short time is hard. Although AWS CloudWatch offers strong monitoring and alerting, searching for trends in error logs often requires complex queries or manual digging.
To address this, we built a Dockerized model runner that wraps a FastAPI + TinyLlama summarization stack, enabling conversational inference over logs, whether they come from local files or CloudWatch-style sources.
1.1 Use Case: Querying AWS CloudWatch with Natural Language
Primary Goal: Reduce incident-response time by letting engineers ask natural-language questions about logs and receive instant, summarized answers.
How It Works
- Engineers ask plain-English questions (e.g., “Did errors spike in the auth service in the last 2 hours?”).
- The FastAPI service (in its own Docker container) connects directly to AWS CloudWatch to fetch the raw logs.
- Retrieved logs are passed to the Docker model runner (Ollama) for analysis and summarization.
- The system returns a clear, concise answer with no CloudWatch syntax or Log Insights queries required.
Why This Matters
- Eliminates slow log searches and manual filtering
- Makes CloudWatch accessible to non-experts
- Supports faster debugging and better on-call efficiency
2. Solution Overview: Docker Model Runner Architecture for AWS CloudWatch
Update (v1.1.0): This version adds Guardrails AI validation for safe prompt and time-range handling, a /version endpoint for runtime introspection, and a startup health confirmation log.
The entire application is managed by Docker Compose, which acts as the Docker component runner. It orchestrates all the individual services (components) needed for the app to function.
The most critical component is the Docker model runner, which is the ollama service defined in the docker-compose.yml file. This container runs the TinyLlama large language model locally. This allows our FastAPI application to send it raw logs and receive natural-language summaries back, all without relying on an external, paid API.
The solution integrates a FastAPI backend with an Ollama-based model runner (TinyLlama). Users submit a query through Swagger UI or via POST API calls. Depending on environment settings, the system either returns deterministic summaries (mock mode) or real LLM outputs.
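To make the LLM path concrete, here is a minimal sketch of how the FastAPI service might call the TinyLlama model runner. The port (11434) comes from this doc; the endpoint path and payload fields follow Ollama's public generate API, and the function names are illustrative, not taken from the repo:

```python
import json
import urllib.request

# Port 11434 is the Ollama model-runner port from the architecture description.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(logs: str) -> dict:
    """Build a generate-API payload asking TinyLlama to summarize raw logs."""
    return {
        "model": "tinyllama",
        "prompt": f"Summarize these CloudWatch logs:\n{logs}",
        "stream": False,  # request a single JSON response rather than a stream
    }

def summarize_with_ollama(logs: str) -> str:
    """POST the payload to the local Ollama container and return its text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(logs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

In mock mode the service would skip `summarize_with_ollama` entirely and return a deterministic summary instead.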
2.1 System Architecture (Figure 1)

- Conversational CloudWatch v1.1.0 integrates FastAPI with Ollama’s TinyLlama model under Docker Compose. The architecture separates validation, retrieval, and summarization steps for clarity and reliability.
- The diagram shows how the FastAPI container (port 8001) interacts with the Ollama model runner (port 11434). Requests arrive at /health_status, /recipes, or /query. Validations run before the summarizer (TinyLlama) produces structured JSON outputs.
2.2 Operating Modes
The system operates in two distinct modes for different environments:
- Deterministic Local Mode: uses bundled CloudWatch-style sample logs and a rule-based summarizer to produce repeatable outputs; ideal for demos and tests without any AWS credentials.
- LocalStack: provides AWS-like local behavior for development, allowing the system to interact with a local mock of the CloudWatch API.
- Real AWS: connects to CloudWatch with least-privilege, read-only credentials.
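The mode switch can be sketched as a simple environment check. The `USE_LLM` flag appears later in this doc; the return labels here are illustrative assumptions, not names from the repo:

```python
import os

def choose_mode() -> str:
    """Select the summarization backend from environment flags.

    USE_LLM is the flag named in the runtime-flow description; the
    string labels returned here are purely illustrative.
    """
    if os.getenv("USE_LLM", "false").lower() == "true":
        return "llm"            # route logs to the Ollama/TinyLlama runner
    return "deterministic"      # rule-based summarizer over bundled sample logs
```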
2.3 Guardrails
- Limit prompts to 300 characters (prevent prompt injection)
- Clamp time ranges between 5 minutes and 24 hours
- Limit response size for readability
- No secrets or PII in code; AWS access is read-only
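The first two guardrails can be sketched in a few lines. The limits (300 characters, 5 minutes to 24 hours) and the `^\d+[smhd]$` pattern come from this doc; the helper names are illustrative:

```python
import re

MAX_PROMPT_CHARS = 300                      # guardrail from the doc
TIME_RANGE_RE = re.compile(r"^\d+[smhd]$")  # e.g. "30m", "2h"

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
MIN_SECONDS, MAX_SECONDS = 5 * 60, 24 * 3600  # clamp window: 5 min to 24 h

def validate_prompt(prompt: str) -> str:
    """Reject over-long prompts before they reach the LLM."""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    return prompt

def clamp_time_range(time_range: str) -> int:
    """Parse a range like '2h' and clamp it to [5 min, 24 h], in seconds."""
    if not TIME_RANGE_RE.match(time_range):
        raise ValueError("time_range must match ^\\d+[smhd]$")
    seconds = int(time_range[:-1]) * _UNIT_SECONDS[time_range[-1]]
    return max(MIN_SECONDS, min(MAX_SECONDS, seconds))
```

For example, `clamp_time_range("1m")` is pulled up to 300 seconds and `clamp_time_range("3d")` is capped at 86400.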
2.4 Integration with AWS
The system integrates with AWS CloudWatch using read-only access, so developers can analyze and visualize real log data without altering production systems. In development this can be simulated with LocalStack; in production, access is secured with least-privilege IAM credentials.
This configuration lets teams move from local testing environments to live AWS monitoring with minimal reconfiguration.
Next, we plan to call the CloudWatch API directly via Boto3 and deploy the entire application as containers on AWS ECS or Cloud Run for fully production-ready operation.
3. API Endpoint Details
The API exposes a handful of simple endpoints under the base URL http://localhost:8001.
GET /health_status
Checks service status.
{"status":"ok"}
curl -s http://localhost:8001/health_status
POST /query
Runs a natural-language query over the logs.
Request schema
{
  "prompt": "string (required)",
  "log_group": "string (optional)",
  "time_range": "string (optional)",
  "mock": "boolean (optional)"
}
curl -s -X POST http://localhost:8001/query \
-H "Content-Type: application/json" \
-d '{"prompt":"show error spikes last 2h","mock":true}'
4. Environment & Prerequisites
Test Environment:
- macOS 14 / Windows 11
- Python 3.12+
- Docker Engine 29.0.1 + Docker Compose v2.29
- Optional tools: LocalStack & AWS CLI
Step-by-Step Implementation Flow (Figure 2)

What happens in the runtime flow:
- Request Ingress: requests arrive via /health_status, /version, /recipes/{name}, or /query.
- Guardrails Validation: checks prompt length (≤ 300 chars) and the time-range pattern (^\d+[smhd]$); ensures the USE_LLM and mock flags are properly configured; invalid requests are rejected with clear error messages.
- Summarization: the summarizer identifies spikes, reasons, or affected users. If USE_LLM=TRUE, it calls Ollama (TinyLlama) on port 11434 for natural summaries; otherwise, deterministic summaries are generated locally.
- Response & Output: returns structured JSON and logs a startup confirmation: INFO: Health check OK — API responding normally (startup)
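The deterministic branch of the flow can be illustrated with a toy summarizer that counts error lines and surfaces the most frequent messages (purely a sketch; the project's rule-based summarizer may differ):

```python
from collections import Counter

def summarize_errors(lines: list[str]) -> dict:
    """Toy deterministic summarizer: count ERROR lines, surface top messages."""
    errors = [ln for ln in lines if "ERROR" in ln]
    kinds = Counter(ln.split("ERROR", 1)[1].strip() for ln in errors)
    return {
        "total_lines": len(lines),
        "error_count": len(errors),
        "top_errors": kinds.most_common(3),  # (message, count) pairs
    }
```

Because the output is a plain dict, the same structure can be returned as JSON in mock mode or handed to the LLM as context in real mode.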
5. Reproduce Locally
The following commands rebuild and launch the stack, then verify key endpoints.
docker compose build
docker compose up -d
curl -s http://localhost:8001/health_status | jq .
curl -s "http://localhost:8001/recipes/error_spikes?log_group=/aws/lambda/auth-service&time_range=2h&mock=true" | jq .
curl -s "http://localhost:8001/recipes/error_spikes?log_group=/aws/lambda/auth-service&time_range=2h&mock=false" | jq .
curl -s "http://localhost:8001/recipes/slow_queries?log_group=/aws/lambda/auth-service&time_range=4h&mock=false" | jq .
6. How to Reproduce
1. Clone the repo
git clone https://github.com/kubetoolsio/docker-model-runner-aws-cloudwatch.git
2. Start with Docker
docker compose up -d --build
3. Verify installation
curl -s http://localhost:8001/health_status
4. Send a test query (Real LLM Mode – Interacting with LocalStack/AWS):
curl -s -X POST http://localhost:8001/query \
-H "Content-Type: application/json" \
-d '{
"prompt": "Analyze error spikes in auth-service last 2h",
"log_group": "/aws/lambda/auth-service",
"time_range": "2h"
}'
7. Project Structure

8. Result Table
| Query Type | Expected Output | Actual Result |
| --- | --- | --- |
| /version endpoint | App metadata (version, mode, model) | Returned (Figure 9.1) |
| docker containers | Containers up and running | Running (Figure 9.2) |
| /recipes/slow_queries | Structured count + insights summary | Real LLM summarization (Figure 9.3) |
9. Evidence (Screenshots)
Figure 9.1 – Version Endpoint

Figure 9.2 – Docker Containers Running

- Docker Desktop showing both services active and healthy: conversational-cloudwatch-app-1 (FastAPI backend) and conversational-cloudwatch-ollama-1 (TinyLlama model runner), confirming proper Docker Compose orchestration.
Figure 9.3 – Slow Queries Recipe

- Executed with /recipes/slow_queries?mock=false, showing real LLM summarization of database latency and timeout incidents.
- It identifies Gateway timeout, DatabaseError, and Expired token issues with next-step recommendations.
10. Current Limitations & Planned Improvements
Current Limitations
- Local sample / LocalStack logs only (currently); real AWS CloudWatch log ingestion is wired but still being validated end-to-end.
- Base recipe coverage only (slow_queries, error_spikes, traffic_summary)
- IAM policy exists but not fully tested
- Limited to the TinyLlama model for summarization
Planned Improvements
- Integrate with live AWS CloudWatch using Boto3
- Add additional recipes (security alerts, latency profiling)
- Expand LLM model options for improved summarization accuracy
- Deploy to Cloud Run for scalable public access
In the future, we plan to containerize the AWS data-fetching client (which we call the MCP server) into its own dedicated Docker application. This will improve scalability and better separate the data-fetching logic from the main API.
11. IAM and Security Considerations
Security is a core principle of this project:
- When integrated with AWS, the system uses a least-privilege IAM policy (read-only).
- No secrets or PII are stored in containers.
- Sensitive configs reside in environment variables.
- Guardrails validation prevents prompt injection attacks
The IAM policy draft (docs/IAM_DRAFT.md) ensures CloudWatch Logs access is strictly read-only with no ability to modify or delete logs.
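As a rough illustration of what such a least-privilege policy can look like (the authoritative draft lives in docs/IAM_DRAFT.md; the actions below are standard CloudWatch Logs read-only operations, not copied from the repo):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams",
        "logs:GetLogEvents",
        "logs:FilterLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
```

No `logs:Put*` or `logs:Delete*` actions are granted, so the service can never modify or delete log data.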
12. Discussion
This project demonstrates how modern LLM tools can simplify complex operational tasks. The modular architecture enables flexible deployment and consistent performance across different environments.
Key architectural decisions:
- Dockerization ensures environment parity between development and production
- TinyLlama provides accurate summaries with low resource overhead
- Guardrails AI validation adds a safety layer before LLM processing
- Separation of adapters, recipes, and summarizers enables easy extension
13. Conclusion
This project demonstrates how conversational AI can ease CloudWatch log analysis with the help of FastAPI, Docker, and TinyLlama.
Version 1.1.0 provides a safe, scalable base: guardrails, complete Docker orchestration, and extensible log-analysis recipes.
With planned AWS CloudWatch integration and extensions such as new models or alert recipes, the system brings natural-language monitoring a step closer to production.
14. References & Credits
- Amazon Web Services. (n.d.). Amazon CloudWatch Logs user guide. https://docs.aws.amazon.com/
- FastAPI. (n.d.). FastAPI documentation. https://fastapi.tiangolo.com/
- Docker. (n.d.). Docker documentation. https://docs.docker.com/
- LocalStack. (n.d.). LocalStack documentation. https://docs.localstack.cloud/
- Uvicorn. (n.d.). Uvicorn documentation. https://www.uvicorn.org/
- Pydantic. (n.d.). Pydantic v2 documentation. https://docs.pydantic.dev/
Licensing: No PII or secrets used. All code/demo is shareable under referenced open-source licenses.
15. Contributing & Getting Involved
If you’d like to explore the source code, contribute improvements, or report issues:
- Visit the GitHub repository to see the full project code
- If you find a bug, open an issue
- If you have an idea for improving the project, raise a feature request
- Check the issues page to see what others have reported
