Common operational issues and debugging guidance.
Common problems and likely causes
502 errors
- Possible cause: Chat API unreachable or unhealthy behind NGINX.
- Resolution:
- Ensure the Chat API service is running:
docker service ps <chat-api-service>
- Check NGINX logs and upstream configuration to verify routing and upstream health.
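To separate "upstream unreachable" from "upstream answering with an error", it can help to probe the Chat API directly and bypass NGINX. The following is a minimal sketch, not part of the original guide; the function name check_upstream and the example URL (host, port, and /health path) are hypothetical and need to be replaced with the service's real address.

```shell
# Probe an HTTP endpoint and report whether the upstream answers.
# check_upstream is a hypothetical helper name, not an existing tool.
check_upstream() {
  url="$1"
  # -s: quiet, -o /dev/null: discard the body, -w: print only the status code,
  # --max-time 5: give up after 5 seconds. curl prints 000 when no response arrives.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || true
  code=${code:-000}
  case "$code" in
    2??|3??) echo "upstream OK ($code)"; return 0 ;;
    000)     echo "upstream unreachable"; return 1 ;;
    *)       echo "upstream unhealthy ($code)"; return 1 ;;
  esac
}

# Example (hypothetical host, port, and path):
# check_upstream "http://chat-api:8080/health"
```

If the direct probe succeeds while NGINX still returns 502, the upstream definition or network path in the NGINX configuration is the more likely culprit.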
Kafka lag
- Possible cause: Consumer slowdown or insufficient partition count.
- Resolution:
- Check Kafka consumer lag:
kafka-consumer-groups --describe --group <your-consumer-group> --bootstrap-server <kafka-broker>
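- Increase partitions if needed (the partition count can only be increased, and adding partitions changes the key-to-partition mapping for keyed messages):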
kafka-topics --alter --partitions <new-partition-count> --topic <your-topic> --bootstrap-server <kafka-broker>
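The per-partition output of the lag check can be reduced to a single number for quick comparison over time. The sketch below is an illustration only; total_lag is a hypothetical helper name, and it assumes the describe output contains a header row with a column named LAG, which it locates by name because the column layout differs between Kafka versions.

```shell
# Sum the LAG column of `kafka-consumer-groups --describe` output read from stdin.
# Rows whose lag is not a number (for example "-") are skipped.
total_lag() {
  awk '
    {
      # A row containing the field "LAG" is treated as the header row.
      is_header = 0
      for (i = 1; i <= NF; i++) if ($i == "LAG") { lag_col = i; is_header = 1 }
      if (is_header) next
    }
    lag_col && $lag_col ~ /^[0-9]+$/ { total += $lag_col }
    END { print total + 0 }
  '
}

# Example:
# kafka-consumer-groups --describe --group <your-consumer-group> \
#   --bootstrap-server <kafka-broker> | total_lag
```

A total that keeps growing between runs points to consumers that cannot keep up; a stable nonzero total is usually less urgent.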
Redis eviction
- Possible cause: Memory pressure or incorrect eviction policy.
- Resolution:
- Inspect memory settings:
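redis-cli config get maxmemory
redis-cli config get maxmemory-policy
- Set an eviction policy, for example allkeys-lru (a value set with config set is lost on restart unless it is also written to redis.conf):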
redis-cli config set maxmemory-policy allkeys-lru
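To judge how close an instance is to its limit, used memory can be compared with maxmemory. This is a minimal sketch under stated assumptions: redis_mem_pct is a hypothetical helper name, and it assumes the output of redis-cli info memory contains the used_memory and maxmemory fields.

```shell
# Read `redis-cli info memory` output from stdin and print used memory as a
# whole-number percentage of maxmemory (fraction truncated).
# Prints "unlimited" when maxmemory is 0 or missing, meaning no limit is set.
redis_mem_pct() {
  # INFO lines end in CRLF, so strip the carriage returns first.
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { limit = $2 }
    END {
      if (limit + 0 == 0) print "unlimited"
      else printf "%d\n", used * 100 / limit
    }
  '
}

# Example:
# redis-cli info memory | redis_mem_pct
```

A value that stays near 100 together with a rising evicted_keys counter (from redis-cli info stats) suggests that the limit, rather than the policy, is the constraint.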
TiKV region errors
- Possible cause: Disk latency, resource contention, or store imbalance.
- Resolution:
- Check TiKV store status:
tiup cluster display <cluster-name>
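- If a store stays unhealthy, restart the TiKV nodes as a last resort. PD rebalances regions automatically; a restart does not rebalance them and briefly interrupts the affected nodes:
tiup cluster restart <cluster-name> -R tikv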
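The store status check can be narrowed to the TiKV nodes that are not reported as Up. The sketch below is an assumption-laden illustration: tikv_not_up is a hypothetical helper name, and it assumes the tiup cluster display output has a header row with Role and Status columns and no spaces in the columns before Status. Multi-word statuses are reported by their first word only.

```shell
# Read `tiup cluster display <cluster-name>` output from stdin and print the
# ID and status of every TiKV node whose status is not "Up".
tikv_not_up() {
  awk '
    # Until the header row is seen, look for the Role and Status columns.
    !(role_col && status_col) {
      role_col = 0; status_col = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "Role") role_col = i
        if ($i == "Status") status_col = i
      }
      next
    }
    $role_col == "tikv" && $status_col != "Up" { print $1, $status_col }
  '
}

# Example:
# tiup cluster display <cluster-name> | tikv_not_up
```

Empty output means every TiKV node was listed as Up, in which case disk latency or resource contention on the hosts is worth checking next.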
Debugging commands
Container and Swarm diagnostics
- View container logs:
docker logs <container-name>
- Check service status:
docker service ps <service-name>
- Inspect container details:
docker inspect <container-name>
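For services with many tasks, the service status output can be filtered down to the tasks that are not running. A minimal sketch, assuming tab-separated input produced with the --format option of docker service ps; tasks_not_running is a hypothetical helper name.

```shell
# Read "<task name><TAB><current state>" lines from stdin and print the tasks
# whose current state does not start with "Running".
tasks_not_running() {
  awk -F '\t' '$2 !~ /^Running/ { print $1 ": " $2 }'
}

# Example:
# docker service ps --format '{{.Name}}\t{{.CurrentState}}' <service-name> | tasks_not_running
```

Tasks listed here are good candidates for a closer look with the log and inspect commands above.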
TiDB cluster status
- Display cluster status with TiUP:
tiup cluster display <cluster-name>