VPN Exit Controller Troubleshooting Guide¶

This guide provides step-by-step troubleshooting procedures for common issues with the VPN Exit Controller system. Use this as your primary reference when diagnosing problems.

Quick Reference¶

Essential Commands¶

# System status
systemctl status vpn-controller
systemctl status docker
docker ps -a

# View logs
journalctl -u vpn-controller -f
docker logs vpn-api
docker logs vpn-redis

# API health check
curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status

# Container access
pct enter 201  # From Proxmox host

Log Locations¶

System Service: journalctl -u vpn-controller
API Logs: docker logs vpn-api
Redis Logs: docker logs vpn-redis
Traefik Logs: /opt/vpn-exit-controller/traefik/logs/traefik.log
HAProxy Logs: /opt/vpn-exit-controller/proxy/logs/
VPN Node Logs: docker logs <node-container-name>
OpenVPN Logs: Inside containers at /var/log/openvpn.log

1. Node Issues¶

1.1 VPN Nodes Failing to Start¶

Symptoms: - Containers exit immediately after starting - API shows nodes as "failed" or "stopped" - Cannot connect to VPN endpoints

Common Error Messages:

ERROR: Auth file not found at /configs/auth.txt
VPN connection timeout after 30 seconds
Cannot allocate TUN/TAP dev dynamically

Diagnostic Steps:

Check container status:

docker ps -a | grep vpn-node
docker logs <container-name>

Verify auth file exists:

ls -la /opt/vpn-exit-controller/configs/auth.txt
cat /opt/vpn-exit-controller/configs/auth.txt

Check VPN configuration files:

ls -la /opt/vpn-exit-controller/configs/vpn/
# Verify country-specific configs exist
ls -la /opt/vpn-exit-controller/configs/vpn/us/

Test OpenVPN configuration manually:

# Inside a test container
docker run -it --rm --cap-add=NET_ADMIN --device=/dev/net/tun \
  -v /opt/vpn-exit-controller/configs:/configs \
  ubuntu:22.04 bash

# Then install and test OpenVPN
apt update && apt install -y openvpn
openvpn --config /configs/vpn/us.ovpn --auth-user-pass /configs/auth.txt --verb 3

Resolution Steps:

Fix auth file issues:

# Ensure auth file exists with correct format
echo "your_nordvpn_username" > /opt/vpn-exit-controller/configs/auth.txt
echo "your_nordvpn_password" >> /opt/vpn-exit-controller/configs/auth.txt
chmod 600 /opt/vpn-exit-controller/configs/auth.txt

Fix container permissions:

# Ensure Docker has proper capabilities
docker run --cap-add=NET_ADMIN --device=/dev/net/tun ...

Update NordVPN configs if outdated:

cd /opt/vpn-exit-controller
./scripts/download-nordvpn-configs.sh

1.2 NordVPN Authentication Problems¶

Symptoms: - Authentication failures in OpenVPN logs - "AUTH_FAILED" messages - Containers restart in loops

Diagnostic Steps:

Verify credentials:

cat /opt/vpn-exit-controller/configs/auth.txt
# Should contain username on line 1, password on line 2

Test credentials manually:

# Try logging into NordVPN website with same credentials

Check for service token vs regular credentials:

# NordVPN may require service credentials for OpenVPN
# Check if using regular login vs service credentials

Resolution Steps:

Use NordVPN service credentials:
Generate service credentials from NordVPN dashboard
Update auth.txt with service credentials (not regular login)

Reset authentication:

# Clear any cached auth
rm -f /opt/vpn-exit-controller/configs/auth.txt
# Recreate with correct service credentials
echo "service_username" > /opt/vpn-exit-controller/configs/auth.txt
echo "service_password" >> /opt/vpn-exit-controller/configs/auth.txt
chmod 600 /opt/vpn-exit-controller/configs/auth.txt

1.3 Tailscale Connection Issues¶

Symptoms: - Nodes start but don't appear in Tailscale admin - "tailscale up" command fails - Exit node not advertised properly

Diagnostic Steps:

Check Tailscale auth key:

# Verify auth key is set
docker exec <node-container> env | grep TAILSCALE_AUTHKEY

Check Tailscale daemon:

docker exec <node-container> tailscale status
docker exec <node-container> tailscale ping 100.73.33.11

Verify container networking:

docker exec <node-container> ip addr show
docker exec <node-container> ip route

Resolution Steps:

Generate new auth key:
Go to Tailscale admin console
Generate new auth key with exit node permissions
Update .env file with new key

Restart Tailscale in container:

docker exec <node-container> tailscale down
docker exec <node-container> tailscale up --authkey=<new-key> --advertise-exit-node

Check firewall rules:

# Ensure UDP port 41641 is accessible
iptables -L | grep 41641

1.4 Container Restart Loops¶

Symptoms: - Containers continuously restart - High CPU usage - "Restarting" status in docker ps

Diagnostic Steps:

Check restart policy:

docker inspect <container-name> | grep -A 5 RestartPolicy

Monitor restart events:

docker events --filter container=<container-name>

Check exit codes:
```
docker logs <container-name> --tail 50
```

Resolution Steps:

Identify root cause:
VPN connection failures
Tailscale authentication issues
Network connectivity problems

Temporary fix - stop restart:

docker update --restart=no <container-name>
docker stop <container-name>

Fix underlying issue and restart:

docker update --restart=unless-stopped <container-name>
docker start <container-name>

2. Network Issues¶

2.1 Proxy Connection Failures¶

Symptoms: - HTTP 502/503 errors from proxy endpoints - Timeouts when connecting through proxy - "upstream connect error" messages

Diagnostic Steps:

Check HAProxy status:

# Access HAProxy stats
curl http://localhost:8404/

Test direct container connectivity:

# Find proxy container ports
docker ps | grep proxy
# Test direct connection
curl -x localhost:3128 http://httpbin.org/ip

Check backend health:

# In HAProxy stats, look for backend server status
# Red = down, Green = up

Resolution Steps:

Restart proxy containers:

cd /opt/vpn-exit-controller/proxy
docker-compose restart

Check proxy configuration:

# Verify HAProxy config syntax
docker exec haproxy-container haproxy -c -f /usr/local/etc/haproxy/haproxy.cfg

Update backend servers:

# Use API to refresh backend configurations
curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/proxy/refresh

2.2 DNS Resolution Problems¶

Symptoms: - Domain names not resolving in containers - "Name resolution failed" errors - Proxy working with IPs but not domains - "Doesn't support secure connection" errors in incognito mode - HTTPS websites failing through proxy

Diagnostic Steps:

Test DNS in containers:

docker exec <container> nslookup google.com
docker exec <container> dig @8.8.8.8 google.com
docker exec <container> dig @103.86.96.100 google.com  # NordVPN DNS

Check container DNS settings:

docker exec <container> cat /etc/resolv.conf
# Should show NordVPN DNS servers first, then fallbacks

Test Tailscale DNS configuration:

docker exec <container> tailscale status --json | grep -i dns
# Should show accept-dns=false

Test host DNS:
```
nslookup google.com
dig google.com
```

Resolution Steps:

Fix VPN container DNS (Primary Fix for HTTPS errors):

# Ensure containers use NordVPN DNS with fallbacks
# This is handled automatically in entrypoint.sh:
echo "nameserver 103.86.96.100" > /etc/resolv.conf
echo "nameserver 103.86.99.100" >> /etc/resolv.conf
echo "nameserver 8.8.8.8" >> /etc/resolv.conf
echo "nameserver 1.1.1.1" >> /etc/resolv.conf

Ensure Tailscale DNS is disabled:

# In container, verify Tailscale is using --accept-dns=false
docker exec <container> ps aux | grep tailscale
# Should show --accept-dns=false in the command line

Test DNS resolution through VPN:

# Test that DNS queries go through VPN tunnel
docker exec <container> dig +trace google.com
# Should resolve through NordVPN DNS servers

Legacy system DNS fix (if needed):

# Edit /etc/systemd/resolved.conf
[Resolve]
DNS=8.8.8.8 1.1.1.1
systemctl restart systemd-resolved

Restart networking if changes made:

systemctl restart networking
systemctl restart docker

DNS Resolution Fix Explained

The recent update implements a comprehensive DNS fix that resolves HTTPS errors in incognito mode:

Root Cause: Tailscale's DNS resolution was interfering with HTTPS certificate validation
Solution: Disable Tailscale DNS (--accept-dns=false) and use NordVPN DNS servers
Implementation: Containers manually configure /etc/resolv.conf with NordVPN DNS first, Google DNS as fallback
Result: Eliminates "doesn't support secure connection" errors and improves proxy reliability

2.3 SSL Certificate Issues¶

Symptoms: - HTTPS endpoints returning certificate errors - "SSL handshake failed" messages - Browser certificate warnings

Diagnostic Steps:

Check Traefik certificate status:

# View certificate file
ls -la /opt/vpn-exit-controller/traefik/letsencrypt/acme.json
# Check Traefik logs for ACME challenges
docker logs traefik | grep -i acme

Test certificate validity:

openssl s_client -connect proxy-us.rbnk.uk:443 -servername proxy-us.rbnk.uk

Check DNS for ACME challenge:

dig _acme-challenge.proxy-us.rbnk.uk TXT

Resolution Steps:

Force certificate renewal:

# Delete existing certificates
rm /opt/vpn-exit-controller/traefik/letsencrypt/acme.json
# Restart Traefik to trigger new certificate request
docker restart traefik

Check Cloudflare API credentials:

# Verify CF_API_EMAIL and CF_API_KEY are set correctly
docker exec traefik env | grep CF_

Manual certificate debug:

# Check ACME logs in Traefik
docker logs traefik | grep -i "certificate\|acme\|error"

2.4 Port Binding Conflicts¶

Symptoms: - "Port already in use" errors - Containers failing to start - Services not accessible on expected ports

Diagnostic Steps:

Check port usage:

netstat -tulpn | grep :8080
lsof -i :8080
ss -tulpn | grep :8080

Check Docker port mappings:

docker ps --format "table {{.Names}}\t{{.Ports}}"

Identify conflicting services:

systemctl list-units --type=service --state=running | grep -E "(proxy|web|http)"

Resolution Steps:

Kill conflicting processes:

# Find and kill process using the port
sudo kill $(lsof -t -i:8080)

Change service ports:

# Update docker-compose.yml or service configuration
# Use different port numbers

Restart in correct order:

systemctl stop vpn-controller
docker-compose down
systemctl start vpn-controller

3. Service Issues¶

3.1 FastAPI Application Not Starting¶

Symptoms: - systemctl shows failed status - "Connection refused" on port 8080 - Python import errors in logs

Diagnostic Steps:

Check systemd service status:

systemctl status vpn-controller
journalctl -u vpn-controller -n 50

Test manual startup:

cd /opt/vpn-exit-controller
source venv/bin/activate
export $(grep -v '^#' .env | xargs)
cd api
python -m uvicorn main:app --host 0.0.0.0 --port 8080

Check Python environment:

/opt/vpn-exit-controller/venv/bin/python --version
/opt/vpn-exit-controller/venv/bin/pip list

Resolution Steps:

Fix Python dependencies:

cd /opt/vpn-exit-controller
source venv/bin/activate
pip install -r api/requirements.txt

Check environment variables:

# Verify .env file exists and has required variables
cat /opt/vpn-exit-controller/.env
# Required: SECRET_KEY, ADMIN_USER, ADMIN_PASS, TAILSCALE_AUTHKEY

Fix import errors:

# Ensure all Python modules are properly installed
cd /opt/vpn-exit-controller/api
python -c "import main"

3.2 Redis Connection Problems¶

Symptoms: - "Redis connection failed" in API logs - Cache-related operations failing - High latency in API responses

Diagnostic Steps:

Check Redis container:

docker ps | grep redis
docker logs vpn-redis

Test Redis connectivity:

docker exec vpn-redis redis-cli ping
redis-cli -h localhost -p 6379 ping

Check Redis configuration:

docker exec vpn-redis redis-cli CONFIG GET "*"

Resolution Steps:

Restart Redis:
```
docker restart vpn-redis
```

Clear Redis data if corrupted:

docker exec vpn-redis redis-cli FLUSHALL

Check disk space:

df -h
# Redis needs disk space for persistence

3.3 Docker Daemon Issues¶

Symptoms: - "Cannot connect to Docker daemon" errors - Docker commands hanging - Containers not starting

Diagnostic Steps:

Check Docker service:

systemctl status docker
journalctl -u docker -n 50

Test Docker functionality:

docker version
docker info
docker run hello-world

Check Docker socket:

ls -la /var/run/docker.sock
sudo chmod 666 /var/run/docker.sock  # Temporary fix

Resolution Steps:

Restart Docker:
```
systemctl restart docker
```

Clean up Docker resources:

docker system prune -a
docker volume prune

Check disk space:

df -h /var/lib/docker
# Docker needs sufficient disk space

3.4 Systemd Service Failures¶

Symptoms: - Service won't start at boot - "Failed to start" messages - Service stops unexpectedly

Diagnostic Steps:

Check service definition:

systemctl cat vpn-controller
systemctl status vpn-controller

View detailed logs:

journalctl -u vpn-controller -f
journalctl -u vpn-controller --since "1 hour ago"

Test service script manually:
```
/opt/vpn-exit-controller/start.sh
```

Resolution Steps:

Fix service dependencies:

# Ensure docker.service is running
systemctl start docker
systemctl enable docker

Update service configuration:

systemctl edit vpn-controller
# Add any necessary environment variables or dependencies

Reload and restart:

systemctl daemon-reload
systemctl restart vpn-controller
systemctl enable vpn-controller

4. Load Balancing Issues¶

4.1 Nodes Not Being Selected Properly¶

Symptoms: - All traffic going to one node - Load balancer ignoring some nodes - Uneven traffic distribution

Diagnostic Steps:

Check load balancer status:

curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/load-balancer/status

View node health:

curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes

Check HAProxy backend status:

curl http://localhost:8404/
# Look for backend server weights and status

Resolution Steps:

Reset load balancer:

curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/load-balancer/reset

Update backend weights:

# API call to rebalance
curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/load-balancer/rebalance

Restart load balancing service:
```
docker restart haproxy
```

4.2 Health Check Failures¶

Symptoms: - Nodes marked as unhealthy - False positive health failures - Health checks timing out

Diagnostic Steps:

Check health check configuration:

# View HAProxy health check settings
cat /opt/vpn-exit-controller/proxy/haproxy.cfg | grep -A 5 "option httpchk"

Test health endpoints manually:

# Test individual node health
curl -H "Host: proxy-us.rbnk.uk" http://container-ip:3128/health

Check health check logs:
```
docker logs haproxy | grep -i health
```

Resolution Steps:

Adjust health check timeouts:

# In haproxy.cfg, increase timeout values
timeout check 5s

Fix health endpoint:

# Ensure containers respond properly to health checks
docker exec <container> curl localhost:3128/health

Restart health monitoring:
```
docker restart haproxy
```

4.3 Speed Test Failures¶

Symptoms: - Speed tests returning errors - Incorrect speed measurements - Speed test endpoints unreachable

Diagnostic Steps:

Test speed test API:

curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/speed-test/run/us

Check network connectivity:

# Test from container
docker exec <node-container> curl -I http://speedtest.net

View speed test logs:
```
docker logs vpn-api | grep -i "speed"
```

Resolution Steps:

Update speed test endpoints:

# Modify speed test configuration to use working endpoints

Increase timeout values:

# In speed test service, allow more time for tests

Use alternative speed test method:

# Switch to different speed testing service

4.4 Failover Not Working¶

Symptoms: - Failed nodes still receiving traffic - No automatic failover - Manual failover not working

Diagnostic Steps:

Check failover configuration:

curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/failover/status

Test failover trigger:

# Manually trigger failover
curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/failover/trigger/us

Check monitoring service:

docker logs vpn-api | grep -i "failover\|monitor"

Resolution Steps:

Restart monitoring service:
```
systemctl restart vpn-controller
```
Update failover thresholds:
```
# Adjust sensitivity in configuration
```

Manual failover:

# Remove failed nodes from rotation
curl -X DELETE -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes/<node-id>

5. Proxy Issues¶

5.1 HAProxy Configuration Errors¶

Symptoms: - HAProxy fails to start - Configuration validation errors - Syntax errors in logs

Diagnostic Steps:

Validate configuration:

docker exec haproxy haproxy -c -f /usr/local/etc/haproxy/haproxy.cfg

Check configuration file:

cat /opt/vpn-exit-controller/proxy/haproxy.cfg

View HAProxy logs:
```
docker logs haproxy
```

Resolution Steps:

Fix syntax errors:

# Check for missing commas, brackets, quotes in haproxy.cfg
# Validate with: haproxy -c -f haproxy.cfg

Restore backup configuration:

cp /opt/vpn-exit-controller/proxy/haproxy.cfg.backup /opt/vpn-exit-controller/proxy/haproxy.cfg

Restart with corrected config:
```
docker restart haproxy
```

5.2 Traefik Routing Problems¶

Symptoms: - 404 errors on proxy domains - Requests not reaching HAProxy - Routing rules not working

Diagnostic Steps:

Check Traefik dashboard:

# Access dashboard (if enabled)
curl http://localhost:8080/dashboard/

View Traefik configuration:

cat /opt/vpn-exit-controller/traefik/traefik.yml

Check routing rules:

docker logs traefik | grep -i "router\|rule"

Resolution Steps:

Update dynamic configuration:

# Check and update files in /opt/vpn-exit-controller/traefik/dynamic/

Restart Traefik:
```
docker restart traefik
```

Verify domain DNS:

dig proxy-us.rbnk.uk
# Ensure domains point to correct IP

5.3 Country Routing Not Working¶

Symptoms: - All traffic routed to same country - Country-specific domains not working - Incorrect geolocation

Diagnostic Steps:

Test country-specific endpoints:

curl -H "Host: proxy-us.rbnk.uk" http://localhost:8080/
curl -H "Host: proxy-uk.rbnk.uk" http://localhost:8080/

Check HAProxy ACL rules:

grep -A 10 "acl is_.*_proxy" /opt/vpn-exit-controller/proxy/haproxy.cfg

Test IP geolocation:

curl -x proxy-us.rbnk.uk:8080 http://ipinfo.io/json
curl -x proxy-uk.rbnk.uk:8080 http://ipinfo.io/json

Resolution Steps:

Update HAProxy routing rules:

# Verify ACL rules and backend assignments in haproxy.cfg

Restart proxy services:

cd /opt/vpn-exit-controller/proxy
docker-compose restart

Check node assignments:

# Ensure nodes are assigned to correct countries
curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes

5.4 Proxy Service Failures¶

Symptoms: - Squid or Dante proxy services not responding - Connection refused on ports 3128 or 1080 - Health check endpoint (port 8080) not responding - Container restart loops involving proxy services

Diagnostic Steps:

Check proxy service status in container:

# Check if proxy processes are running
docker exec <container> ps aux | grep -E "(squid|danted)"

# Check if ports are listening
docker exec <container> netstat -tuln | grep -E "(3128|1080|8080)"

# Test proxy services directly
docker exec <container> curl -I http://localhost:3128
docker exec <container> curl -I http://localhost:8080/health

Check proxy service logs:

# Squid logs
docker exec <container> tail -f /var/log/squid/cache.log

# Check container logs for proxy startup
docker logs <container> | grep -E "(squid|dante|health)"

Test proxy connectivity from host:

# Test HTTP proxy (replace with actual Tailscale IP)
curl -x http://100.86.140.98:3128 http://httpbin.org/ip

# Test SOCKS5 proxy
curl --socks5 100.86.140.98:1080 http://httpbin.org/ip

# Test health endpoint
curl http://100.86.140.98:8080/health

Resolution Steps:

Restart proxy services in container:

# Kill and restart Squid
docker exec <container> pkill squid
docker exec <container> squid -N -d 1 &

# Kill and restart Dante
docker exec <container> pkill danted
docker exec <container> danted -D &

# The entrypoint.sh monitoring loop will also restart them automatically

Restart entire container:

# Use API to restart node
curl -X POST -u admin:Bl4ckMagic!2345erver \
  http://localhost:8080/api/nodes/<node-id>/restart

# Or restart via Docker
docker restart <container>

Check Squid configuration:

# Verify Squid config syntax
docker exec <container> squid -k parse

# Check if Squid cache directory is initialized
docker exec <container> ls -la /var/spool/squid/

Check Dante configuration:

# Verify Dante config file exists
docker exec <container> cat /etc/danted.conf

# Check Dante listening status
docker exec <container> ss -tuln | grep 1080

5.5 Authentication Failures¶

Symptoms: - 401/403 errors on proxy requests - Authentication not working - Unauthorized access attempts

Diagnostic Steps:

Test API authentication:

curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status

Check authentication configuration:

grep -i auth /opt/vpn-exit-controller/proxy/haproxy.cfg

View authentication logs:

docker logs haproxy | grep -i "auth\|401\|403"

Resolution Steps:

Update credentials:

# Update .env file with correct credentials
vim /opt/vpn-exit-controller/.env

Restart services:
```
systemctl restart vpn-controller
```

Clear authentication cache:

docker exec vpn-redis redis-cli FLUSHDB

6. Performance Issues¶

6.1 Slow Proxy Speeds¶

Symptoms: - High latency through proxy - Slow download/upload speeds - Timeouts on large requests

Diagnostic Steps:

Test direct vs proxy speed:

# Direct speed test
curl -o /dev/null -s -w "%{time_total}\n" http://speedtest.net/speedtest.jpg

# Proxy speed test
curl -x proxy-us.rbnk.uk:8080 -o /dev/null -s -w "%{time_total}\n" http://speedtest.net/speedtest.jpg

Check network utilization:
```
iftop -i eth0
nethogs
```
Monitor container resources:
```
docker stats
```

Resolution Steps:

Optimize HAProxy configuration:

# Increase connection limits and timeouts
maxconn 4096
timeout client 50000ms
timeout server 50000ms

Scale up resources:

# Add more CPU/memory to LXC container
# Add more VPN nodes for load distribution

Optimize VPN connections:

# Use UDP instead of TCP where possible
# Select closer VPN servers

6.2 High Latency¶

Symptoms: - Slow response times - High ping times - Delayed connections

Diagnostic Steps:

Measure latency at different points:

# Host to VPN server
ping nordvpn-server.com

# Through proxy
curl -x proxy-us.rbnk.uk:8080 -w "%{time_connect}\n" http://httpbin.org/get

Check routing:
```
traceroute -T -p 80 google.com
```
Monitor network queues:
```
ss -i | grep -E "(rto|rtt)"
```

Resolution Steps:

Select closer VPN servers:

# Use geographically closer NordVPN servers
# Update configuration to use lower-latency servers

Optimize TCP settings:

# Tune TCP congestion control
echo 'net.core.default_qdisc = fq' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_congestion_control = bbr' >> /etc/sysctl.conf
sysctl -p

Reduce proxy hops:

# Minimize routing through multiple proxies
# Use direct connections where possible

6.3 Memory or CPU Issues¶

Symptoms: - High CPU usage - Out of memory errors - System slowdowns

Diagnostic Steps:

Check system resources:
```
top
htop
free -m
df -h
```
Monitor container resources:
```
docker stats --no-stream
```

Check for memory leaks:

# Monitor memory usage over time
while true; do docker stats --no-stream; sleep 30; done

Resolution Steps:

Increase LXC container resources:

# From Proxmox host
pct set 201 --memory 4096 --cores 4

Optimize container limits:

# Add resource limits to docker-compose.yml
deploy:
  resources:
    limits:
      memory: 512M
    reservations:
      memory: 256M

Clean up resources:

docker system prune -a
docker volume prune

6.4 Connection Timeouts¶

Symptoms: - Connections dropping - Timeout errors - Incomplete transfers

Diagnostic Steps:

Check timeout settings:

grep -i timeout /opt/vpn-exit-controller/proxy/haproxy.cfg

Monitor connection states:

ss -s
netstat -an | grep -E "(ESTABLISHED|TIME_WAIT|CLOSE_WAIT)" | wc -l

Test with different timeout values:

curl --connect-timeout 30 --max-time 300 -x proxy-us.rbnk.uk:8080 http://httpbin.org/delay/10

Resolution Steps:

Increase timeout values:

# In haproxy.cfg
timeout connect 10s
timeout client 60s
timeout server 60s

Optimize connection pooling:

# Adjust keep-alive settings
option http-keep-alive
timeout http-keep-alive 10s

Scale connection limits:

# Increase system limits
ulimit -n 65536
echo '* soft nofile 65536' >> /etc/security/limits.conf
echo '* hard nofile 65536' >> /etc/security/limits.conf

7. Monitoring Issues¶

7.1 Metrics Not Being Collected¶

Symptoms: - Empty metrics endpoints - No data in monitoring dashboards - Metrics API returning errors

Diagnostic Steps:

Check metrics endpoint:

curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics

Verify metrics collector service:

docker logs vpn-api | grep -i "metrics"

Check Redis for metrics data:

docker exec vpn-redis redis-cli KEYS "*metrics*"

Resolution Steps:

Restart metrics collection:

curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/restart

Clear corrupted metrics:

docker exec vpn-redis redis-cli DEL metrics:*

Check metrics configuration:

# Verify metrics collection intervals and settings

7.2 Health Checks Failing¶

Symptoms: - All nodes showing as unhealthy - Health check endpoints not responding - False positive failures

Diagnostic Steps:

Test health endpoints manually:

# Test individual container health
docker exec <container> curl -I localhost:3128/health

Check health check configuration:

grep -A 3 "option httpchk" /opt/vpn-exit-controller/proxy/haproxy.cfg

Monitor health check frequency:

docker logs haproxy | grep -i "health\|check" | tail -20

Resolution Steps:

Adjust health check parameters:

# Increase check intervals and timeouts
inter 10s
fastinter 2s
downinter 5s

Fix health endpoint implementation:

# Ensure containers properly implement /health endpoint

Reset health check state:
```
docker restart haproxy
```

7.3 Log Analysis Techniques¶

Key Log Locations and Commands:

System-wide issues:

# System logs
journalctl -u vpn-controller -f
journalctl --since "1 hour ago" | grep -i error

# Docker logs
docker logs --tail 100 -f vpn-api

Network issues:

# HAProxy logs
docker logs haproxy | grep -E "(5xx|error|timeout)"

# Traefik logs
tail -f /opt/vpn-exit-controller/traefik/logs/traefik.log | grep -i error

VPN connection issues:

# OpenVPN logs in containers
docker exec <container> tail -f /var/log/openvpn.log

# Connection monitoring
docker exec <container> ip route show | grep tun0

Performance analysis:

# Response time analysis
docker logs haproxy | grep -oE '"[0-9]+/[0-9]+/[0-9]+/[0-9]+/[0-9]+"' | tail -100

# Error rate analysis
docker logs haproxy | grep -oE '" [0-9]{3} ' | sort | uniq -c

Emergency Recovery Procedures¶

Complete System Reset¶

If the system is completely broken, follow these steps:

Stop all services:

systemctl stop vpn-controller
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml down
docker-compose -f /opt/vpn-exit-controller/proxy/docker-compose.yml down

Clean up Docker:

docker system prune -a
docker volume prune

Reset configuration:

cd /opt/vpn-exit-controller
git stash  # Save any local changes
git reset --hard HEAD  # Reset to last known good state

Restart services:

systemctl start docker
systemctl start vpn-controller

Data Recovery¶

If data is corrupted:

Backup current state:

tar -czf /tmp/vpn-backup-$(date +%Y%m%d).tar.gz /opt/vpn-exit-controller/data/

Reset Redis data:

docker exec vpn-redis redis-cli FLUSHALL

Restart with clean state:
```
systemctl restart vpn-controller
```

Network Recovery¶

If networking is broken:

Reset network interfaces:

systemctl restart networking
systemctl restart docker

Flush iptables:

iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X

Restart Tailscale:

tailscale down
tailscale up --authkey=<your-key> --advertise-exit-node

Prevention and Maintenance¶

Regular Maintenance Tasks¶

Weekly:
Check disk space: df -h
Review error logs: journalctl -u vpn-controller --since "1 week ago" | grep -i error
Test proxy endpoints: curl -x proxy-us.rbnk.uk:8080 http://httpbin.org/ip
Monthly:
Update NordVPN configurations: ./scripts/download-nordvpn-configs.sh
Clean Docker resources: docker system prune
Review and rotate Tailscale auth keys
Quarterly:
Update system packages: apt update && apt upgrade
Review and update SSL certificates
Performance optimization review

Monitoring Setup¶

Set up automated monitoring:

Log monitoring:

# Set up logrotate for container logs
# Monitor for specific error patterns

Resource monitoring:

# Set up alerts for CPU/memory usage
# Monitor disk space

Health monitoring:

# Automated health checks
# Alert on service failures

This troubleshooting guide should help you diagnose and resolve most issues with the VPN Exit Controller system. Keep it updated as you encounter new issues and solutions.