Docker-based Model Runner for AWS CloudWatch Log Analysis

Concise Summary

This project converts plain-English questions into concise summaries over CloudWatch-style logs.

Conversational CloudWatch converts natural-language questions into structured insights over CloudWatch-style logs. It runs locally or with LocalStack, supports LLM summarization via TinyLlama, and exposes a simple FastAPI service packaged with Docker Compose for reproducible development and future AWS integration.

1. Problem & Context

Modern cloud environments produce enormous volumes of logs, making it hard to surface critical issues quickly. AWS CloudWatch offers strong monitoring and alerting, but spotting trends in error logs still requires complex queries or manual searching.

To address this, we built a Dockerized model runner that wraps a FastAPI + TinyLlama summarization stack, enabling conversational inference over logs, whether they come from local samples or CloudWatch-style sources.

1.1 Use Case: Querying AWS CloudWatch with Natural Language

Primary Goal: Reduce incident-response time by letting engineers ask natural-language questions about logs and receive instant, summarized answers.

How It Works

  • Engineers ask plain-English questions (e.g., “Did errors spike in the auth service in the last 2 hours?”).
  • The FastAPI service (in its own Docker container) connects directly to AWS CloudWatch to fetch the raw logs.
  • Retrieved logs are passed to the Docker model runner (Ollama) for analysis and summarization.
  • The system returns a clear, concise answer with no CloudWatch syntax or Log Insights queries required.
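The flow above can be sketched in Python. This is an illustrative mock, not the project's actual code: the function names, the sample log events, and the rule-based fallback summarizer are all assumptions standing in for the real CloudWatch fetch and LLM call.

```python
# Illustrative sketch of the request flow: fetch logs, summarize, answer.
# Function and field names are hypothetical, not the project's actual API.

def fetch_logs(log_group, time_range):
    # In the real service this would query CloudWatch (or LocalStack);
    # here we return bundled sample events for a deterministic demo.
    return [
        {"level": "ERROR", "message": "Gateway timeout in auth-service"},
        {"level": "INFO", "message": "User login succeeded"},
        {"level": "ERROR", "message": "DatabaseError: connection refused"},
    ]

def summarize(events, question):
    # Rule-based fallback: count errors instead of calling the LLM.
    errors = [e for e in events if e["level"] == "ERROR"]
    return f"{len(errors)} error(s) found for: {question}"

def answer(question, log_group="/aws/lambda/auth-service", time_range="2h"):
    return summarize(fetch_logs(log_group, time_range), question)

print(answer("Did errors spike in the auth service?"))
```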

Why This Matters

  • Eliminates slow log searches and manual filtering
  • Makes CloudWatch accessible to non-experts
  • Supports faster debugging and better on-call efficiency

2. Solution Overview: Docker Model Runner Architecture for AWS CloudWatch

Update (v1.1.0): This version adds Guardrails AI validation for safe prompt and time-range handling, a /version endpoint for runtime introspection, and a startup health confirmation log.

The entire application is managed by Docker Compose, which orchestrates all of the individual services (components) the app needs to function.

The most critical component is the Docker model runner, which is the ollama service defined in the docker-compose.yml file. This container runs the TinyLlama large language model locally. This allows our FastAPI application to send it raw logs and receive natural-language summaries back, all without relying on an external, paid API.

The solution integrates a FastAPI backend with an Ollama-based model runner (TinyLlama). Users submit a query through Swagger UI or via POST API calls. Depending on environment settings, the system either returns deterministic summaries (mock mode) or real LLM outputs.

2.1 System Architecture (Figure 1)

  • Conversational CloudWatch v1.1.0 integrates FastAPI with Ollama’s TinyLlama model under Docker Compose. The architecture separates validation, retrieval, and summarization steps for clarity and reliability.
  • The diagram shows how the FastAPI container (port 8001) interacts with the Ollama model runner (port 11434). Requests arrive at /health_status, /recipes, or /query. Validations run before the summarizer (TinyLlama) produces structured JSON outputs.

2.2 Operating Modes

The system operates in two distinct modes for different environments:

  • Deterministic Local Mode: uses bundled CloudWatch-style sample logs and a rule-based summarizer to produce repeatable outputs; ideal for demos and tests without any AWS credentials.
  • LocalStack: provides AWS-like local behavior for development, allowing the system to interact with a local mock of the CloudWatch API.
  • Real AWS: connects to CloudWatch using least-privilege, read-only credentials.
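Mode selection could be expressed as a small helper. USE_LLM appears elsewhere in this article; the AWS_ENDPOINT_URL variable and the helper itself are assumptions for illustration, since LocalStack is conventionally targeted via an endpoint override.

```python
def resolve_mode(env):
    # env: a dict of environment variables (e.g. dict(os.environ)).
    # USE_LLM is the article's flag; AWS_ENDPOINT_URL is an assumption
    # for how a LocalStack endpoint would be configured.
    if env.get("USE_LLM", "").lower() != "true":
        return "deterministic"          # rule-based summaries, no AWS needed
    endpoint = env.get("AWS_ENDPOINT_URL", "")
    if "localstack" in endpoint or "localhost" in endpoint:
        return "localstack"             # AWS-like local CloudWatch mock
    return "aws"                        # real CloudWatch, read-only creds
```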

2.3 Guardrails

  • Limit prompts to 300 characters (prevent prompt injection)
  • Clamp time ranges between 5 minutes and 24 hours
  • Limit response size for readability
  • No secrets or PII in code; AWS access is read-only
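A minimal sketch of the first two checks, using the limits stated above (300-character prompts, time ranges clamped to 5 minutes–24 hours, pattern ^\d+[smhd]$); the function names are illustrative.

```python
import re

MAX_PROMPT_CHARS = 300
MIN_SECONDS, MAX_SECONDS = 5 * 60, 24 * 60 * 60
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def validate_prompt(prompt):
    # Reject over-long prompts before they ever reach the LLM.
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    return prompt

def clamp_time_range(time_range):
    # Accepts strings like "2h" or "30m" (pattern ^\d+[smhd]$) and clamps
    # the resolved duration to the 5-minute..24-hour window.
    match = re.fullmatch(r"(\d+)([smhd])", time_range)
    if not match:
        raise ValueError("time_range must match ^\\d+[smhd]$")
    seconds = int(match.group(1)) * UNIT_SECONDS[match.group(2)]
    return max(MIN_SECONDS, min(seconds, MAX_SECONDS))
```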

2.4 Integration with AWS

The system integrates with AWS CloudWatch through read-only access, so developers can analyze and visualize real log data without altering production systems. In development this can be simulated with LocalStack; in production, secured access is enforced with least-privilege IAM credentials.

This configuration lets teams move freely from local testing environments to live monitoring in AWS with minimal reconfiguration.

To take this integration further, we will implement direct CloudWatch API interactions using Boto3 and deploy the application as a container on AWS ECS or Cloud Run for full-scale, production-ready operation.

3. API Endpoint Details

The API exposes three simple endpoints (GET /health_status, GET /recipes/{name}, and POST /query) under the base URL http://localhost:8001

GET /health_status

Checks service status.

curl -s http://localhost:8001/health_status

Response:

{"status":"ok"}

POST /query

Runs a natural-language query over the logs.

Request schema:

{
  "prompt": "string (required)",
  "log_group": "string (optional)",
  "time_range": "string (optional)",
  "mock": "boolean (optional)"
}
curl -s -X POST http://localhost:8001/query \
 -H "Content-Type: application/json" \
 -d '{"prompt":"show error spikes last 2h","mock":true}'
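The same request can be issued from Python using only the standard library; the payload mirrors the schema above, and the helper names here are illustrative.

```python
import json
import urllib.request

def build_query_body(prompt, **optional):
    # optional may carry log_group, time_range, and mock, per the schema above.
    return {"prompt": prompt, **optional}

def post_query(base_url="http://localhost:8001", **kwargs):
    body = json.dumps(build_query_body(**kwargs)).encode()
    req = urllib.request.Request(
        f"{base_url}/query", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # requires the stack running
        return json.loads(resp.read())

# Usage (with the stack up):
# post_query(prompt="show error spikes last 2h", mock=True)
```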

4. Environment & Prerequisites

Test Environment:

  • macOS 14 / Windows 11
  • Python 3.12+
  • Docker Engine 29.0.1 + Docker Compose v2.29
  • Optional tools: LocalStack & AWS CLI

Step-by-Step Implementation Flow (Figure 2)

What happens in the runtime flow:

  • Request Ingress
    • Requests arrive via /health_status, /version, /recipes/{name}, or /query.
  • Guardrails Validation
    • Checks prompt length (≤ 300 chars) and time-range pattern (^\d+[smhd]$).
    • Ensures USE_LLM and mock flags are properly configured.
    • Invalid requests are rejected with clear error messages.
  • Summarization
    • The summarizer identifies spikes, reasons, or affected users.
    • If USE_LLM=TRUE, it calls Ollama (TinyLlama) via port 11434 for natural summaries.
    • Otherwise, deterministic summaries are generated locally.
  • Response & Output
    • Returns structured JSON output.
    • Logs a startup confirmation: INFO: Health check OK — API responding normally (startup)
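As an illustration of the deterministic path (when USE_LLM is not set), a summarizer might bucket errors per interval and flag a spike. The threshold, field names, and function below are assumptions, not the project's actual summarizer.

```python
from collections import Counter

def detect_error_spike(events, spike_threshold=5):
    # events: iterable of dicts with "level" and "minute" keys (illustrative
    # shape); a spike is any minute with >= spike_threshold errors.
    per_minute = Counter(e["minute"] for e in events if e["level"] == "ERROR")
    spikes = {m: n for m, n in per_minute.items() if n >= spike_threshold}
    if spikes:
        worst = max(spikes, key=spikes.get)
        return {"spike": True,
                "summary": f"{spikes[worst]} errors at minute {worst}"}
    return {"spike": False, "summary": "no error spike detected"}
```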

5. Reproduce Locally

The following commands rebuild and launch the stack, then verify key endpoints.

docker compose build

docker compose up -d

curl -s http://localhost:8001/health_status | jq .

curl -s "http://localhost:8001/recipes/error_spikes?log_group=/aws/lambda/auth-service&time_range=2h&mock=true"  | jq .

curl -s "http://localhost:8001/recipes/error_spikes?log_group=/aws/lambda/auth-service&time_range=2h&mock=false" | jq .

curl -s "http://localhost:8001/recipes/slow_queries?log_group=/aws/lambda/auth-service&time_range=4h&mock=false" | jq .

6. How to Reproduce

1. Clone the repo

git clone https://github.com/kubetoolsio/docker-model-runner-aws-cloudwatch.git

2. Start with Docker

docker compose up -d --build

3. Verify installation

curl -s http://localhost:8001/health_status

4. Send a test query (Real LLM Mode – Interacting with LocalStack/AWS):

curl -s -X POST http://localhost:8001/query \
-H "Content-Type: application/json" \
-d '{
"prompt": "Analyze error spikes in auth-service last 2h",
"log_group": "/aws/lambda/auth-service",
"time_range": "2h"
}'

7. Project Structure

8. Result Table

Query Type            | Expected Output                       | Actual Result
/version endpoint     | App metadata (version, mode, model)   | Returned (Figure 9.1)
docker containers     | Containers up and running             | Running (Figure 9.2)
/recipes/slow_queries | Structured count + insights summary   | Real LLM summarization (Figure 9.3)

9. Evidence (Screenshots)

Figure 9.1 – Version Endpoint

Figure 9.2 – Docker Containers Running

  • Docker Desktop showing both services active and healthy:
  • conversational-cloudwatch-app-1 (FastAPI backend) and conversational-cloudwatch-ollama-1 (TinyLlama model runner), confirming proper Docker Compose orchestration.

Figure 9.3 – Slow Queries Recipe

  • Executed with /recipes/slow_queries?mock=false, showing real LLM summarization of database latency and timeout incidents.
  • It identifies Gateway timeout, DatabaseError, and Expired token issues with next-step recommendations.

10. Current Limitations & Planned Improvements

Current Limitations

  • Local sample / LocalStack logs only (currently); real AWS CloudWatch log ingestion is wired but still being validated end-to-end.
  • Base recipe coverage only (slow_queries, error_spikes, traffic_summary)
  • IAM policy exists but not fully tested
  • Limited to TinyLlama model for summarization

Planned Improvements

  • Integrate with live AWS CloudWatch using Boto3
  • Add additional recipes (security alerts, latency profiling)
  • Expand LLM model options for improved summarization accuracy
  • Deploy to Cloud Run for scalable public access

In the future, we plan to containerize the AWS data-fetching client (which we call the MCP server) into its own dedicated Docker application. This will improve scalability and better separate the data-fetching logic from the main API.

11. IAM and Security Considerations

Security is a core principle of this project:

  • When integrated with AWS, the system uses a least-privilege IAM policy (read-only).
  • No secrets or PII are stored in containers.
  • Sensitive configs reside in environment variables.
  • Guardrails validation prevents prompt injection attacks

The IAM policy draft (docs/IAM_DRAFT.md) ensures CloudWatch Logs access is strictly read-only with no ability to modify or delete logs.
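A read-only policy of that shape might look like the following sketch; the authoritative statement lives in docs/IAM_DRAFT.md, and the exact action list here is an assumption.

```python
import json

READ_ONLY_ACTIONS = [
    # CloudWatch Logs read-only actions; illustrative list, the project's
    # docs/IAM_DRAFT.md is authoritative.
    "logs:DescribeLogGroups",
    "logs:DescribeLogStreams",
    "logs:GetLogEvents",
    "logs:FilterLogEvents",
]

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": READ_ONLY_ACTIONS, "Resource": "*"}
    ],
}

# Sanity check: no mutating verbs such as Put, Delete, or Create.
assert not any(
    a.split(":")[1].startswith(("Put", "Delete", "Create"))
    for a in READ_ONLY_ACTIONS
)
print(json.dumps(policy, indent=2))
```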

12. Discussion

This project demonstrates how modern LLM tools can simplify complex operational tasks. The modular architecture enables flexible deployment and consistent performance across different environments.

Key architectural decisions:

  • Dockerization ensures environment parity between development and production
  • TinyLlama provides accurate summaries with low resource overhead
  • Guardrails AI validation adds a safety layer before LLM processing
  • Separation of adapters, recipes, and summarizers enables easy extension

13. Conclusion

This project demonstrates how conversational AI can ease CloudWatch log analysis with the help of FastAPI, Docker, and TinyLlama.
Version 1.1.0 provides a safe, scalable base: guardrails, complete Docker orchestration, and extensible log-analysis recipes.

With future extensions such as live AWS CloudWatch integration, new models, and alert recipes, the system brings natural-language monitoring a step closer to production.

14. References & Credits

Licensing: No PII or secrets used. All code/demo is shareable under referenced open-source licenses.

15. Contributing & Getting Involved

If you’d like to explore the source code, contribute improvements, or report issues:

  • Visit the GitHub repository to see the full project code
  • If you find a bug, open an issue
  • If you have an idea for improving the project, raise a feature request
  • Check the issues page to see what others have reported

Authors
