April 22, 2026 · 8 min read · AWS · Architecture · AI

Serverless vs. Containers for AI Workloads: When to Use Which

You have an AI task. You need to run it on AWS. You’re deciding between Lambda and ECS/Fargate. The wrong choice will either cost too much or be too slow. The right choice depends on how long the task runs, how much memory it needs, whether it needs GPU, and how much you’re willing to pay for convenience.

The Key Differences

Lambda (Serverless)

How it works: You upload code. AWS manages the infrastructure. You pay per invocation and per millisecond of execution.

Limits:

Max duration: 15 minutes
Max memory: 10,240 MB (10 GB)
Max ephemeral storage: 10 GB
No GPU support
Cold start: 1–5 seconds (Python) or more (custom runtimes)

Cost model: $0.20 per 1M requests + $0.0000166667 per GB-second
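
To make the cost model concrete, here is a minimal sketch of a monthly estimate. It uses only the two rates quoted above and ignores the free tier; the function name and the sample workload (10,000 invocations, 5 seconds, 2,048 MB) are illustrative assumptions, not measurements.

```python
# Rates taken from the cost model above (x86); verify against current AWS pricing.
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = 0.0000166667


def lambda_monthly_cost(invocations: int, avg_seconds: float, memory_mb: int) -> float:
    """Estimate monthly Lambda cost in USD, ignoring the free tier."""
    gb_seconds = invocations * avg_seconds * (memory_mb / 1024)
    return invocations * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND


# Hypothetical workload: 10,000 invocations, 5 s each, 2,048 MB configured.
print(f"${lambda_monthly_cost(10_000, 5, 2048):.2f}")  # prints $1.67
```

Because billing is linear in both duration and configured memory, halving either one halves the compute portion of the bill.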

ECS/Fargate (Containers)

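How it works: You run Docker containers either on Fargate (AWS-managed compute, no instances to administer) or on EC2 instances you manage yourself. Fargate bills per second for the vCPU and memory you request; EC2 bills for the instances, whether or not containers are using them.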

Advantages:

No time limit
Up to 120 GB memory (Fargate) or whatever the instance type offers (EC2)
GPU support (with EC2, not Fargate)
No per-request cold start once the task is running (launching a new task still takes time)
Full control over environment

Cost model: ~$0.04 per vCPU-hour plus ~$0.0044 per GB-hour of memory (Fargate, us-east-1 on-demand) or ~$0.03–0.50 per vCPU-hour (EC2, depending on instance type)
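
For comparison with Lambda, an always-on Fargate task can be estimated the same way. This is a rough sketch: the two rates are assumed us-east-1 Linux/x86 on-demand list prices, the 730-hour month is an average, and the function name is made up for illustration. Load balancers, NAT gateways, and data transfer are not included.

```python
HOURS_PER_MONTH = 730  # average month length in hours

# Assumed us-east-1 Linux/x86 on-demand rates; check the current pricing page.
FARGATE_VCPU_HOUR = 0.04048
FARGATE_GB_HOUR = 0.004445


def fargate_monthly_cost(vcpu: float, memory_gb: float, hours: float = HOURS_PER_MONTH) -> float:
    """Estimate the cost in USD of one Fargate task running for `hours`."""
    return hours * (vcpu * FARGATE_VCPU_HOUR + memory_gb * FARGATE_GB_HOUR)


# Smallest task size (0.25 vCPU, 0.5 GB) running 24/7.
print(f"${fargate_monthly_cost(0.25, 0.5):.2f}")  # prints $9.01
```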

Decision Tree

Use Lambda If:

1. Task runs < 10 minutes

Invoice parsing: 30 seconds. Image recognition: 5 minutes. Resume screening: 2 minutes. These fit Lambda.

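If it runs longer, Fargate is better. Lambda bills the same per-millisecond rate all the way to the limit, but a job in the 11–15 minute range has little headroom and is killed outright when it hits the 15-minute timeout.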

2. Memory ≤ 3 GB

Lambda works for most AI tasks:

Text processing (Claude API calls): 512 MB – 1 GB
Small model inference: 1–2 GB
Document parsing: 1–3 GB

Larger models (7B+ parameters) need more memory. Use Fargate.

3. You need to start fast and don’t run often

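Lambda has a 1–5 second cold start whenever no warm execution environment is available. A Fargate service that is already running has none, but you pay for it whether or not it is doing work, and launching a fresh Fargate task takes longer than a Lambda cold start.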

If you invoke the function infrequently (a few times per hour), Lambda’s cold start is worth the operational simplicity. If you invoke it 100+ times per second, Fargate’s always-on approach might be cheaper.

4. The task is triggered by an event

S3 upload → process with Lambda. API request → Lambda. SNS message → Lambda.

Lambda integrates natively with event sources. With Fargate you typically run a worker that polls a queue, or use an EventBridge rule to launch a task per event.
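
The polling side that a container has to own can be quite small. The sketch below assumes the work arrives as JSON messages on an SQS queue; `poll_once`, `main`, and the queue URL are hypothetical names, while `receive_message` and `delete_message` are the standard boto3 SQS client calls.

```python
import json
from typing import Any, Callable


def poll_once(sqs: Any, queue_url: str, handle: Callable[[dict], None]) -> int:
    """Long-poll the queue once, process each message, then delete it."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS maximum per call
        WaitTimeSeconds=20,      # long polling, the maximum SQS allows
    )
    messages = response.get("Messages", [])
    for message in messages:
        handle(json.loads(message["Body"]))
        # Delete only after the handler returns, so failed work is redelivered.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return len(messages)


def main() -> None:
    """Container entrypoint: loop forever, handling whatever arrives."""
    import boto3  # imported here so poll_once can be tested without AWS

    sqs = boto3.client("sqs")
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # hypothetical
    while True:
        poll_once(sqs, queue_url, handle=print)
```

Passing the client in as an argument keeps the loop body testable with a stub; the container image would call `main()` as its entrypoint.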

Example: Invoice processing pipeline

S3 PUT event
    ↓
Lambda triggered (cold start: 1–3 seconds)
    ↓
Lambda invokes Claude API to parse invoice
    ↓
Store result in DynamoDB
    ↓
Lambda returns (total: 5–10 seconds)

Cost: 10,000 invoices/month = 10,000 invocations × $0.20 per 1M = $0.002 + compute

Perfect for Lambda.
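
A handler for this pipeline might look roughly like the sketch below. The table name `invoices`, the `invoice_key` attribute, and `parse_invoice` are hypothetical; the model call is left as a placeholder because the article does not specify the prompt or client. S3 URL-encodes object keys in event notifications, which is why the key is decoded before use.

```python
import json
from urllib.parse import unquote_plus


def s3_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification."""
    return [
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in event.get("Records", [])
    ]


def parse_invoice(raw: bytes) -> dict:
    """Hypothetical placeholder for the Claude API call that extracts fields."""
    raise NotImplementedError("call your model client here")


def handler(event, context, parse=parse_invoice):
    """Lambda entrypoint: read each uploaded invoice, parse it, store the result."""
    import boto3  # bundled with the Lambda Python runtime

    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table("invoices")  # hypothetical table name
    objects = s3_objects(event)
    for bucket, key in objects:
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        fields = parse(body)
        # Stored as a JSON string to sidestep DynamoDB's float restrictions.
        table.put_item(Item={"invoice_key": key, "fields": json.dumps(fields)})
    return {"processed": len(objects)}
```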

Use Fargate If:

1. Task runs > 10 minutes

Model training: hours. Batch processing: 30+ minutes. Long-running data pipeline: continuously. Fargate is designed for this. Lambda will time out at 15 minutes anyway.

2. Memory > 3 GB

Large model inference (LLaMA-70B needs 40+ GB). Processing massive files. Complex analytics on 10 GB datasets.

Fargate: “I need 16 GB.” Lambda: “Best I can do is 10 GB, and it’ll be slow and expensive.”

3. You need GPU

Training models. Real-time inference on high-volume image/video workloads. Lambda has no GPU support. ECS on EC2 gives you full GPU support (NVIDIA, etc.).

Example: Model fine-tuning job

ECS task:
  - vCPU: 4
  - Memory: 16 GB
  - GPU: 1 × NVIDIA A10
  - Duration: 4 hours (fine-tuning a 7B model)

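Cost (container, CPU and memory at Fargate rates): 4 hours × (4 vCPU × $0.04 + 16 GB × $0.0044) = ~$0.92, plus the GPU instance on EC2
Cost (if Lambda could run it): 4 hours × 3,600 seconds × 10 GB × $0.0000166667 per GB-sec = ~$2.40

On CPU and memory alone the container is roughly 2.5x cheaper, and Lambda could not run this job in any case: no GPU and a 15-minute limit.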

4. You run continuously or at high frequency

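At sustained high volume, per-invocation billing erodes Lambda’s economics. If you’re invoking a function 100+ times per second, cold starts become rare because execution environments stay warm, but you pay for every millisecond of every request. Fargate: keep the container running, handle all requests from the same process, and pay a flat rate.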
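
One way to put a number on "high frequency" is to ask how busy a container has to be before a flat rate beats per-millisecond billing. The sketch below is a simplification under stated assumptions: the Fargate rates are assumed us-east-1 on-demand list prices, Lambda request charges are ignored, and one Lambda at a given memory size is treated as equivalent to one container of the same memory.

```python
LAMBDA_GB_SECOND = 0.0000166667  # Lambda compute rate quoted earlier
FARGATE_VCPU_HOUR = 0.04048      # assumed us-east-1 on-demand rate
FARGATE_GB_HOUR = 0.004445       # assumed us-east-1 on-demand rate


def break_even_utilization(vcpu: float, memory_gb: float) -> float:
    """Fraction of the time a task must be busy for Fargate to match Lambda."""
    fargate_per_second = (vcpu * FARGATE_VCPU_HOUR + memory_gb * FARGATE_GB_HOUR) / 3600
    lambda_per_busy_second = memory_gb * LAMBDA_GB_SECOND
    return fargate_per_second / lambda_per_busy_second


# 1 vCPU / 2 GB container versus a 2 GB Lambda function.
print(f"{break_even_utilization(1, 2):.0%}")  # prints 41%
```

Under these assumptions, a workload that keeps the container busy more than about 40% of the time favors Fargate; below that, paying only for busy seconds wins.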

Real Cost Comparison

Scenario 1: Invoice Processing

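Workload: 10,000 invoices/month, 5 seconds each
Lambda: ~$1.70/month (assuming about 2 GB of memory)
Fargate: ~$9/month for the task alone (0.25 vCPU and 0.5 GB running 24/7 at about $0.04 per vCPU-hour and $0.0044 per GB-hour), before any load balancer or NAT gateway
Winner: Lambda (about 5x cheaper on compute)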

Scenario 2: Continuous Document Processing

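Workload: 200–500 documents/day, 30 seconds each
Lambda: ~$8/month at the top of the range (assuming about 1 GB of memory)
Fargate: ~$18/month for the task alone (0.5 vCPU and 1 GB running 24/7), before any load balancer or NAT gateway
Winner: Lambda (about 2x cheaper at 500 documents/day, more at lower volumes)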

Scenario 3: Model Fine-Tuning

Workload: Monthly 4-hour fine-tuning job with GPU
Lambda: Not possible (no GPU, max 15 min)
ECS on EC2: ~$10
Winner: ECS on EC2 (only option)

The Hidden Cost: Operational Complexity

Lambda is simpler to deploy and operate. One zip file, one AWS API call, done.

Fargate requires:

Docker image (maintained, versioned, pushed to ECR)
Task definition (managed in Terraform or CloudFormation)
ECS cluster (with a Fargate or Fargate Spot capacity provider)
Monitoring and log management
Networking (VPC, security groups)

Bottom line: This overhead matters for small teams. If you have one engineer, Lambda’s simplicity might be worth the cost premium. If you have two engineers and run Fargate tasks regularly, the operational investment pays off.

Hybrid Approach: Best of Both Worlds

Many teams use both:

Lambda: Event-driven, short-lived tasks (API requests, S3 triggers, webhooks)
Fargate: Scheduled batch jobs, long-running pipelines, async work queues

HTTP API request
    ↓
API Gateway
    ↓
Lambda (parse request, validate input)
    ↓
If job < 5 min: invoke Claude, return immediately
If job > 5 min: queue to SQS, return job ID
    ↓
ECS task picks up from SQS, runs long job
    ↓
Writes result to S3/DynamoDB
    ↓
Client polls for result or receives webhook
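
The branching step in the middle of this flow could be sketched as follows. The 5-minute threshold comes from the diagram; `route_job`, `run_inline`, the payload shape, and the job ID format are illustrative assumptions, and `send_message` is the standard boto3 SQS client call.

```python
import json
import uuid
from typing import Any

SYNC_LIMIT_SECONDS = 5 * 60  # threshold from the flow above


def run_inline(payload: dict) -> dict:
    """Hypothetical stand-in for the synchronous model call."""
    return {"echo": payload}


def route_job(payload: dict, estimated_seconds: float, sqs: Any, queue_url: str) -> dict:
    """Run short jobs inline; enqueue long ones and return a job ID to poll."""
    if estimated_seconds < SYNC_LIMIT_SECONDS:
        return {"status": "done", "result": run_inline(payload)}
    job_id = str(uuid.uuid4())
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps({"job_id": job_id, **payload}),
    )
    return {"status": "queued", "job_id": job_id}
```

One caveat on the synchronous branch: API Gateway has a default integration timeout of 29 seconds, so a job that answers through it has to finish well inside that window regardless of the 5-minute threshold. Anything slower is safer on the queued path.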

The Decision Framework

Task duration?
  < 10 min → Lambda
  > 10 min → Fargate

Memory needed?
  < 3 GB → Lambda (if duration < 10 min)
  > 3 GB → Fargate

Needs GPU?
  Yes → ECS on EC2
  No  → Lambda or Fargate

Triggered by event or on-demand?
  Event-driven    → Lambda
  Batch/continuous → Fargate

Frequency?
  < 10 per minute    → Lambda
  > 100 per second   → Fargate (keep running)
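
The framework above is compact enough to write down as a function. This is one possible encoding, not the only reasonable one: the thresholds are the ones listed, the ordering of the checks (GPU first, then hard limits, then traffic pattern) is an assumption, and `choose_compute` is a made-up name.

```python
def choose_compute(
    duration_min: float,
    memory_gb: float,
    needs_gpu: bool = False,
    continuous: bool = False,
    requests_per_sec: float = 0.0,
) -> str:
    """Apply the decision framework's thresholds in priority order."""
    if needs_gpu:
        return "ECS on EC2"  # neither Lambda nor Fargate offers GPUs
    if duration_min > 10 or memory_gb > 3:
        return "Fargate"     # outside the comfortable Lambda range
    if continuous or requests_per_sec > 100:
        return "Fargate"     # keep one process running instead
    return "Lambda"


print(choose_compute(duration_min=0.5, memory_gb=1))                 # Lambda
print(choose_compute(duration_min=240, memory_gb=16, needs_gpu=True))  # ECS on EC2
```

Workloads that land between the listed thresholds (say, 50 requests per second) fall through to Lambda here; that gap is a judgment call the framework leaves open.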

What to Choose Right Now

If you’re building an AI product and your task is:

Document processing: Lambda
Data pipeline: Fargate
Real-time API: Lambda
Model fine-tuning: ECS on EC2 with GPU
Batch analytics: Fargate
Quick inference: Lambda

Three Moons Network helps you architect for your specific workload, not the general case. Let’s talk about yours.
