- Vertical scaling: Increase resources (CPU, RAM, storage) on existing nodes
- Horizontal scaling: Add more service replicas or nodes to distribute load
- Capacity planning: Proactively scale based on growth projections and monitoring data
Common indicators that scaling is needed:
- CPU utilization consistently above 70%
- Memory usage approaching 85%
- API latency exceeding SLA targets (P95 > 100ms)
- WebSocket connection limits approaching capacity
- Database query performance degrading
Vertical scaling
Increase system resource limits and tune configurations to handle more load on existing servers. Vertical scaling is often the first step before adding more nodes.
Benefits:
- Simpler than horizontal scaling (no distributed system complexity)
- Immediate performance improvement
- Lower operational overhead
Limitations:
- Hardware limits (maximum CPU, RAM per server)
- Single point of failure remains
- Downtime required for hardware upgrades
Typical vertical scaling actions:
- Raise file descriptor limits for high-concurrency workloads
- Tune kernel network queues (somaxconn, netdev_max_backlog); see the sysctl sketch after this list
- Increase worker processes and thread pools where supported
- Allocate more CPU and memory to Docker services
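The kernel queue settings above can be applied with sysctl. A minimal sketch follows; the values (4096 and 16384) are illustrative assumptions, not recommendations for every deployment, so benchmark before and after changing them.

```bash
# Apply at runtime (values are illustrative; tune to your workload):
sudo sysctl -w net.core.somaxconn=4096
sudo sysctl -w net.core.netdev_max_backlog=16384

# Persist across reboots:
cat <<'EOF' | sudo tee /etc/sysctl.d/99-network-tuning.conf
net.core.somaxconn = 4096
net.core.netdev_max_backlog = 16384
EOF
sudo sysctl --system   # reload all sysctl configuration files
```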
Configure file descriptor limits
- Edit /etc/security/limits.conf and add the new limits (see the sketch after this list)
- Configure systemd defaults
- Reboot to apply changes
- Verify the new limit
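A minimal sketch of those four steps, assuming a limit of 65536 (an illustrative value; size it to your expected peak connection count):

```bash
# 1. Raise per-process limits in /etc/security/limits.conf
#    (65536 is an illustrative value):
cat <<'EOF' | sudo tee -a /etc/security/limits.conf
*    soft    nofile    65536
*    hard    nofile    65536
EOF

# 2. Raise the default for systemd-managed services:
echo 'DefaultLimitNOFILE=65536' | sudo tee -a /etc/systemd/system.conf

# 3. Reboot so the new limits apply to every session and service:
sudo reboot

# 4. Verify after logging back in:
ulimit -n
```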
Allocate more resources to Docker services
Increase CPU and memory limits for services experiencing resource constraints, as in the sketch below.
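A sketch of a stack-file fragment, assuming a hypothetical websocket service and illustrative resource values:

```yaml
# Fragment of a Docker stack/compose file; service name and values are assumptions.
services:
  websocket:
    deploy:
      resources:
        limits:
          cpus: "4"        # hard cap on CPU for each replica
          memory: 8G
        reservations:
          cpus: "2"        # guaranteed baseline used for scheduling
          memory: 4G
```

Redeploying the stack (for example with docker stack deploy) applies the new limits to running services.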
Horizontal scaling
Add more service replicas or nodes to distribute load across multiple servers. Horizontal scaling provides better fault tolerance and effectively unlimited growth potential.
Note: Docker Swarm supports horizontal scaling through manual commands. Unlike Kubernetes, which offers automatic scaling based on metrics (HPA/VPA), Docker Swarm requires you to manually scale services using the commands below. Monitor your metrics and scale proactively based on load patterns.
Scaling application services
WebSocket Gateway:
- Scaling ratio: Add ~1 replica per 1,000-1,500 peak concurrent connections (PCC)
- Command: docker service scale websocket=5
- Considerations: Ensure the load balancer distributes connections evenly; sticky sessions, if needed, are typically handled at the NGINX layer using IP hash or consistent hashing (see the sketch below)
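A sketch of IP-hash stickiness at the NGINX layer; the upstream name, hosts, and ports are assumptions for illustration:

```nginx
upstream websocket_gateway {
    ip_hash;                      # pin each client IP to one replica
    # Alternative: hash $remote_addr consistent;  (consistent hashing)
    server ws-1.internal:8080;
    server ws-2.internal:8080;
    server ws-3.internal:8080;
}

server {
    listen 443 ssl;
    location /ws {
        proxy_pass http://websocket_gateway;
        proxy_http_version 1.1;                   # required for WebSocket upgrade
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```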
Chat API:
- Scaling trigger: Scale out when average CPU utilization exceeds ~60%
- Command: docker service scale chat-api=5
- Considerations: Stateless design allows effectively unlimited horizontal scaling
Notification service:
- Scaling trigger: High push notification queue depth or processing delays
- Command: docker service scale notifications=3
Webhook service:
- Scaling trigger: Webhook delivery delays or high retry rates
- Command: docker service scale webhooks=3
Scaling data stores
Kafka:
- Scaling method: Increase partition count to improve throughput and parallelism
- Command: Increase the topic's partition count (see the sketch below)
- Considerations: More partitions = more parallelism, but also more overhead; balance based on workload. Avoid frequent partition changes during peak traffic to prevent rebalance storms.
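A sketch using the standard kafka-topics.sh tool; the topic name, bootstrap address, and partition count are assumptions:

```bash
# Partition counts can only be increased, never decreased.
kafka-topics.sh --bootstrap-server kafka:9092 \
  --alter --topic chat-messages --partitions 12

# Confirm the new layout:
kafka-topics.sh --bootstrap-server kafka:9092 --describe --topic chat-messages
```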
Redis:
- Scaling trigger: Enable Redis Cluster mode when deployments exceed ~200k MAU (see the sketch after this list)
- Benefits: Distributes data across multiple nodes, improves scalability and fault tolerance
- ⚠️ Warning: Redis Cluster mode is not backward-compatible with standalone Redis. Migration requires application awareness and careful testing.
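A sketch of bootstrapping a cluster with redis-cli; the node addresses are placeholders, and each node must already be running with cluster mode enabled (cluster-enabled yes):

```bash
# A minimal cluster needs at least three masters; with one replica each, six nodes.
redis-cli --cluster create \
  10.0.0.1:6379 10.0.0.2:6379 10.0.0.3:6379 \
  10.0.0.4:6379 10.0.0.5:6379 10.0.0.6:6379 \
  --cluster-replicas 1

# Check slot coverage and node health:
redis-cli --cluster check 10.0.0.1:6379
```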
TiKV:
- Scaling method: Add more TiKV nodes to distribute data and increase storage capacity
- Command: Add nodes to the cluster using TiUP (see the sketch below)
- Considerations: TiDB automatically rebalances data across new nodes
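A sketch of scaling out with TiUP; the cluster name and topology file are assumptions:

```bash
# scale-out.yaml declares the new TiKV hosts, e.g.:
#   tikv_servers:
#     - host: 10.0.1.14
#     - host: 10.0.1.15
tiup cluster scale-out my-tidb-cluster scale-out.yaml

# Watch node status and the rebalance progress:
tiup cluster display my-tidb-cluster
```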
- Scaling method: Enable sharding for horizontal data distribution
- ⚠️ Warning: Shard key selection is critical and effectively irreversible. Poor shard keys can cause uneven data distribution and performance issues.
Monitoring scaling effectiveness
After scaling, monitor these metrics to validate improvements (a few quick checks are sketched after this list):
- CPU and memory utilization: Should decrease proportionally
- API latency: P95 and P99 should improve
- Error rates: Should remain stable or decrease
- Throughput: Requests per second should increase
- Connection counts: Should distribute evenly across replicas
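A few quick checks, assuming the service names used in the examples above:

```bash
docker service ls               # confirm desired vs. running replica counts
docker service ps websocket     # verify replicas are spread across nodes
docker stats --no-stream        # per-container CPU and memory on this node
```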
When to migrate to Kubernetes
Docker Swarm is recommended for most deployments up to ~200k MAU. Consider migrating to Kubernetes when you need advanced orchestration features or exceed Swarm’s practical limits.
Kubernetes migration triggers:
- Scale: MAU exceeds ~200k or peak concurrent connections exceed ~20k
- Multi-region: You need active-active deployments across multiple geographic regions
- Latency requirements: Sub-50ms latency targets requiring advanced traffic management
- Autoscaling: Dynamic autoscaling based on custom metrics (HPA/VPA) is critical
- Service mesh: You need mTLS, advanced traffic routing, or observability features (Istio, Linkerd)
- Cloud-native tooling: You want to leverage Kubernetes-native tools and operators
Kubernetes benefits:
- Effectively unlimited horizontal scalability with automated capacity management
- Advanced autoscaling (Horizontal Pod Autoscaler, Vertical Pod Autoscaler); a minimal HPA sketch follows this list
- Multi-region active-active deployments with global load balancing
- Service mesh integration for mTLS and advanced traffic management
- Rich ecosystem of operators and tools (Kafka operators, database operators)
- GitOps workflows for declarative infrastructure management
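For illustration, a minimal HorizontalPodAutoscaler sketch targeting a hypothetical chat-api Deployment; the names and thresholds are assumptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: chat-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: chat-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # scale out when average CPU exceeds 60%
```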
Trade-offs:
- Higher operational complexity and learning curve
- More infrastructure overhead (control plane, etcd, etc.)
- Requires Kubernetes expertise on the team
- Migration effort for existing deployments
Factors to evaluate before migrating:
- Current or projected MAU and PCC
- Geographic distribution requirements
- Compliance requirements (GDPR, HIPAA, SOC 2)
- Existing infrastructure and Kubernetes experience
- Timeline and deployment goals