VPN Exit Controller Troubleshooting Guide

This guide provides step-by-step troubleshooting procedures for common issues with the VPN Exit Controller system. Use this as your primary reference when diagnosing problems.

Quick Reference

Essential Commands

# System status
systemctl status vpn-controller
systemctl status docker
docker ps -a

# View logs
journalctl -u vpn-controller -f
docker logs vpn-api
docker logs vpn-redis

# API health check
curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status

# Container access
pct enter 201  # From Proxmox host

Log Locations

  • System Service: journalctl -u vpn-controller
  • API Logs: docker logs vpn-api
  • Redis Logs: docker logs vpn-redis
  • Traefik Logs: /opt/vpn-exit-controller/traefik/logs/traefik.log
  • HAProxy Logs: /opt/vpn-exit-controller/proxy/logs/
  • VPN Node Logs: docker logs <node-container-name>
  • OpenVPN Logs: Inside containers at /var/log/openvpn.log

1. Node Issues

1.1 VPN Nodes Failing to Start

Symptoms:

  • Containers exit immediately after starting
  • API shows nodes as "failed" or "stopped"
  • Cannot connect to VPN endpoints

Common Error Messages:

ERROR: Auth file not found at /configs/auth.txt
VPN connection timeout after 30 seconds
Cannot allocate TUN/TAP dev dynamically

Diagnostic Steps:

  1. Check container status:

    docker ps -a | grep vpn-node
    docker logs <container-name>
    

  2. Verify auth file exists:

    ls -la /opt/vpn-exit-controller/configs/auth.txt
    cat /opt/vpn-exit-controller/configs/auth.txt
    

  3. Check VPN configuration files:

    ls -la /opt/vpn-exit-controller/configs/vpn/
    # Verify country-specific configs exist
    ls -la /opt/vpn-exit-controller/configs/vpn/us/
    

  4. Test OpenVPN configuration manually:

    # Inside a test container
    docker run -it --rm --cap-add=NET_ADMIN --device=/dev/net/tun \
      -v /opt/vpn-exit-controller/configs:/configs \
      ubuntu:22.04 bash
    
    # Then install and test OpenVPN
    apt update && apt install -y openvpn
    openvpn --config /configs/vpn/us.ovpn --auth-user-pass /configs/auth.txt --verb 3
    

Resolution Steps:

  1. Fix auth file issues:

    # Ensure auth file exists with correct format
    echo "your_nordvpn_username" > /opt/vpn-exit-controller/configs/auth.txt
    echo "your_nordvpn_password" >> /opt/vpn-exit-controller/configs/auth.txt
    chmod 600 /opt/vpn-exit-controller/configs/auth.txt
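The expected auth.txt shape can be checked mechanically before restarting nodes. A minimal sketch (the helper name is ours, not part of the controller; `stat -c` assumes GNU coreutils):

```shell
# check_auth_file: sanity-check an OpenVPN credentials file.
# Expects exactly two non-empty lines (username, then password)
# and 600 permissions.
check_auth_file() {
  f="$1"
  [ -f "$f" ] || { echo "missing: $f"; return 1; }
  lines=$(grep -c . "$f" || true)
  [ "$lines" -eq 2 ] || { echo "expected 2 non-empty lines, found $lines"; return 1; }
  mode=$(stat -c '%a' "$f")
  [ "$mode" = "600" ] || { echo "permissions are $mode, expected 600"; return 1; }
  echo "ok"
}

# Example:
# check_auth_file /opt/vpn-exit-controller/configs/auth.txt
```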
    

  2. Fix container permissions:

    # Ensure Docker has proper capabilities
    docker run --cap-add=NET_ADMIN --device=/dev/net/tun ...
    

  3. Update NordVPN configs if outdated:

    cd /opt/vpn-exit-controller
    ./scripts/download-nordvpn-configs.sh
    

1.2 NordVPN Authentication Problems

Symptoms:

  • Authentication failures in OpenVPN logs
  • "AUTH_FAILED" messages
  • Containers restart in loops

Diagnostic Steps:

  1. Verify credentials:

    cat /opt/vpn-exit-controller/configs/auth.txt
    # Should contain username on line 1, password on line 2
    

  2. Test credentials manually:

    # Try logging into NordVPN website with same credentials
    

  3. Check for service token vs regular credentials:

    # NordVPN may require service credentials for OpenVPN
    # Check if using regular login vs service credentials
    

Resolution Steps:

  1. Use NordVPN service credentials:
     • Generate service credentials from the NordVPN dashboard
     • Update auth.txt with the service credentials (not your regular login)

  2. Reset authentication:

    # Clear any cached auth
    rm -f /opt/vpn-exit-controller/configs/auth.txt
    # Recreate with correct service credentials
    echo "service_username" > /opt/vpn-exit-controller/configs/auth.txt
    echo "service_password" >> /opt/vpn-exit-controller/configs/auth.txt
    chmod 600 /opt/vpn-exit-controller/configs/auth.txt
    

1.3 Tailscale Connection Issues

Symptoms:

  • Nodes start but don't appear in the Tailscale admin console
  • "tailscale up" command fails
  • Exit node not advertised properly

Diagnostic Steps:

  1. Check Tailscale auth key:

    # Verify auth key is set
    docker exec <node-container> env | grep TAILSCALE_AUTHKEY
    

  2. Check Tailscale daemon:

    docker exec <node-container> tailscale status
    docker exec <node-container> tailscale ping 100.73.33.11
    

  3. Verify container networking:

    docker exec <node-container> ip addr show
    docker exec <node-container> ip route
    

Resolution Steps:

  1. Generate a new auth key:
     • Go to the Tailscale admin console
     • Generate a new auth key with exit-node permissions
     • Update the .env file with the new key

  2. Restart Tailscale in the container:

    docker exec <node-container> tailscale down
    docker exec <node-container> tailscale up --authkey=<new-key> --advertise-exit-node
    

  3. Check firewall rules:

    # Ensure UDP port 41641 is accessible
    iptables -L | grep 41641
    

1.4 Container Restart Loops

Symptoms:

  • Containers continuously restart
  • High CPU usage
  • "Restarting" status in docker ps

Diagnostic Steps:

  1. Check restart policy:

    docker inspect <container-name> | grep -A 5 RestartPolicy
    

  2. Monitor restart events:

    docker events --filter container=<container-name>
    

  3. Check exit codes:

    docker logs <container-name> --tail 50
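The exit code from `docker inspect -f '{{.State.ExitCode}}' <container-name>` usually points at the failure class. A small helper with an informal mapping (the function name and wording are ours, not a Docker API):

```shell
# explain_exit_code: informal mapping of common container exit codes
# to likely causes.
explain_exit_code() {
  case "$1" in
    0)   echo "clean exit" ;;
    1)   echo "application error - check docker logs" ;;
    137) echo "killed by SIGKILL - often OOM killer or docker kill" ;;
    139) echo "SIGSEGV - process crashed" ;;
    143) echo "SIGTERM - normal docker stop" ;;
    *)   echo "unrecognized exit code: $1" ;;
  esac
}

# Usage with a real container:
# explain_exit_code "$(docker inspect -f '{{.State.ExitCode}}' <container-name>)"
```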
    

Resolution Steps:

  1. Identify the root cause, typically one of:
     • VPN connection failures
     • Tailscale authentication issues
     • Network connectivity problems

  2. Temporary fix - stop the restart loop:

    docker update --restart=no <container-name>
    docker stop <container-name>
    

  3. Fix the underlying issue and restart:

    docker update --restart=unless-stopped <container-name>
    docker start <container-name>
    


2. Network Issues

2.1 Proxy Connection Failures

Symptoms:

  • HTTP 502/503 errors from proxy endpoints
  • Timeouts when connecting through the proxy
  • "upstream connect error" messages

Diagnostic Steps:

  1. Check HAProxy status:

    # Access HAProxy stats
    curl http://localhost:8404/
    

  2. Test direct container connectivity:

    # Find proxy container ports
    docker ps | grep proxy
    # Test direct connection
    curl -x localhost:3128 http://httpbin.org/ip
    

  3. Check backend health:

    # In HAProxy stats, look for backend server status
    # Red = down, Green = up
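HAProxy also exposes the same data as CSV (append `;csv` to the stats URL), which is easier to script against than the HTML page. A sketch that lists DOWN servers (the stats URL is an assumption based on the port used in this guide; status is field 18 of the CSV export):

```shell
# list_down_backends: print "backend/server" for each entry whose
# status column (field 18 in HAProxy's CSV stats) reads DOWN.
list_down_backends() {
  awk -F, '!/^#/ && $18 == "DOWN" { print $1 "/" $2 }'
}

# Usage:
# curl -s 'http://localhost:8404/;csv' | list_down_backends
```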
    

Resolution Steps:

  1. Restart proxy containers:

    cd /opt/vpn-exit-controller/proxy
    docker-compose restart
    

  2. Check proxy configuration:

    # Verify HAProxy config syntax
    docker exec haproxy-container haproxy -c -f /usr/local/etc/haproxy/haproxy.cfg
    

  3. Update backend servers:

    # Use API to refresh backend configurations
    curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/proxy/refresh
    

2.2 DNS Resolution Problems

Symptoms:

  • Domain names not resolving in containers
  • "Name resolution failed" errors
  • Proxy working with IPs but not domains
  • "Doesn't support secure connection" errors in incognito mode
  • HTTPS websites failing through the proxy

Diagnostic Steps:

  1. Test DNS in containers:

    docker exec <container> nslookup google.com
    docker exec <container> dig @8.8.8.8 google.com
    docker exec <container> dig @103.86.96.100 google.com  # NordVPN DNS
    

  2. Check container DNS settings:

    docker exec <container> cat /etc/resolv.conf
    # Should show NordVPN DNS servers first, then fallbacks
    

  3. Test Tailscale DNS configuration:

    docker exec <container> tailscale status --json | grep -i dns
    # Should show accept-dns=false
    

  4. Test host DNS:

    nslookup google.com
    dig google.com
    

Resolution Steps:

  1. Fix VPN container DNS (Primary Fix for HTTPS errors):

    # Ensure containers use NordVPN DNS with fallbacks
    # This is handled automatically in entrypoint.sh:
    echo "nameserver 103.86.96.100" > /etc/resolv.conf
    echo "nameserver 103.86.99.100" >> /etc/resolv.conf
    echo "nameserver 8.8.8.8" >> /etc/resolv.conf
    echo "nameserver 1.1.1.1" >> /etc/resolv.conf
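The resulting ordering can be verified rather than eyeballed. A minimal checker (helper name is ours; it only confirms the first nameserver is one of the NordVPN resolvers listed above):

```shell
# resolv_conf_ok: confirm the first nameserver in a resolv.conf-style
# file is one of the NordVPN resolvers.
resolv_conf_ok() {
  first=$(awk '/^nameserver/ { print $2; exit }' "$1")
  case "$first" in
    103.86.96.100|103.86.99.100) echo "ok ($first)" ;;
    *) echo "first nameserver is ${first:-unset}, expected NordVPN DNS"; return 1 ;;
  esac
}

# Usage against a running container:
# docker exec <container> cat /etc/resolv.conf > /tmp/resolv.check
# resolv_conf_ok /tmp/resolv.check
```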
    

  2. Ensure Tailscale DNS is disabled:

    # In container, verify Tailscale is using --accept-dns=false
    docker exec <container> ps aux | grep tailscale
    # Should show --accept-dns=false in the command line
    

  3. Test DNS resolution through VPN:

    # Test that DNS queries go through VPN tunnel
    docker exec <container> dig +trace google.com
    # Should resolve through NordVPN DNS servers
    

  4. Legacy system DNS fix (if needed):

    # Edit /etc/systemd/resolved.conf
    [Resolve]
    DNS=8.8.8.8 1.1.1.1
    systemctl restart systemd-resolved
    

  5. Restart networking if changes made:

    systemctl restart networking
    systemctl restart docker
    

DNS Resolution Fix Explained

The recent update implements a comprehensive DNS fix that resolves HTTPS errors in incognito mode:

  1. Root Cause: Tailscale's DNS resolution was interfering with HTTPS certificate validation
  2. Solution: Disable Tailscale DNS (--accept-dns=false) and use NordVPN DNS servers
  3. Implementation: Containers manually configure /etc/resolv.conf with NordVPN DNS first, Google DNS as fallback
  4. Result: Eliminates "doesn't support secure connection" errors and improves proxy reliability

2.3 SSL Certificate Issues

Symptoms:

  • HTTPS endpoints returning certificate errors
  • "SSL handshake failed" messages
  • Browser certificate warnings

Diagnostic Steps:

  1. Check Traefik certificate status:

    # View certificate file
    ls -la /opt/vpn-exit-controller/traefik/letsencrypt/acme.json
    # Check Traefik logs for ACME challenges
    docker logs traefik | grep -i acme
    

  2. Test certificate validity:

    openssl s_client -connect proxy-us.rbnk.uk:443 -servername proxy-us.rbnk.uk
    

  3. Check DNS for ACME challenge:

    dig _acme-challenge.proxy-us.rbnk.uk TXT
    

Resolution Steps:

  1. Force certificate renewal:

    # Delete existing certificates
    rm /opt/vpn-exit-controller/traefik/letsencrypt/acme.json
    # Restart Traefik to trigger new certificate request
    docker restart traefik
    

  2. Check Cloudflare API credentials:

    # Verify CF_API_EMAIL and CF_API_KEY are set correctly
    docker exec traefik env | grep CF_
    

  3. Manual certificate debug:

    # Check ACME logs in Traefik
    docker logs traefik | grep -i "certificate\|acme\|error"
    

2.4 Port Binding Conflicts

Symptoms:

  • "Port already in use" errors
  • Containers failing to start
  • Services not accessible on expected ports

Diagnostic Steps:

  1. Check port usage:

    netstat -tulpn | grep :8080
    lsof -i :8080
    ss -tulpn | grep :8080
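The `ss -tulpn` output can be parsed directly to name the process holding a port, which saves cross-referencing PIDs by hand. A sketch (helper name is ours):

```shell
# port_owner: given `ss -tulpn` output on stdin and a port number,
# print the process name(s) from the users:(("name",pid=...,fd=...)) column.
port_owner() {
  awk -v p=":$1" '$5 ~ p"$" {
    if (match($0, /\(\("[^"]+"/))
      print substr($0, RSTART + 3, RLENGTH - 4)
  }'
}

# Usage:
# ss -tulpn | port_owner 8080
```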
    

  2. Check Docker port mappings:

    docker ps --format "table {{.Names}}\t{{.Ports}}"
    

  3. Identify conflicting services:

    systemctl list-units --type=service --state=running | grep -E "(proxy|web|http)"
    

Resolution Steps:

  1. Kill conflicting processes:

    # Find and kill process using the port
    sudo kill $(lsof -t -i:8080)
    

  2. Change service ports:

    # Update docker-compose.yml or service configuration
    # Use different port numbers
    

  3. Restart in correct order:

    systemctl stop vpn-controller
    docker-compose down
    systemctl start vpn-controller
    


3. Service Issues

3.1 FastAPI Application Not Starting

Symptoms:

  • systemctl shows failed status
  • "Connection refused" on port 8080
  • Python import errors in logs

Diagnostic Steps:

  1. Check systemd service status:

    systemctl status vpn-controller
    journalctl -u vpn-controller -n 50
    

  2. Test manual startup:

    cd /opt/vpn-exit-controller
    source venv/bin/activate
    export $(grep -v '^#' .env | xargs)
    cd api
    python -m uvicorn main:app --host 0.0.0.0 --port 8080
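Note that `export $(grep -v '^#' .env | xargs)` mangles values that contain spaces. A safer loader, assuming the .env file is valid shell syntax (KEY=value, values with spaces quoted):

```shell
# load_env: export everything a .env file assigns, without word-splitting
# values. set -a makes every subsequent assignment exported.
load_env() {
  set -a
  . "$1"
  set +a
}

# Usage:
# load_env /opt/vpn-exit-controller/.env
```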
    

  3. Check Python environment:

    /opt/vpn-exit-controller/venv/bin/python --version
    /opt/vpn-exit-controller/venv/bin/pip list
    

Resolution Steps:

  1. Fix Python dependencies:

    cd /opt/vpn-exit-controller
    source venv/bin/activate
    pip install -r api/requirements.txt
    

  2. Check environment variables:

    # Verify .env file exists and has required variables
    cat /opt/vpn-exit-controller/.env
    # Required: SECRET_KEY, ADMIN_USER, ADMIN_PASS, TAILSCALE_AUTHKEY
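The required-variable check can be scripted so it isn't missed during an incident. A convenience wrapper around grep (the helper name is ours; the variable list is the one this guide states as required):

```shell
# check_env_file: verify a .env file defines every required variable.
check_env_file() {
  missing=0
  for var in SECRET_KEY ADMIN_USER ADMIN_PASS TAILSCALE_AUTHKEY; do
    grep -q "^${var}=" "$1" || { echo "missing: $var"; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "ok" || return 1
}

# Usage:
# check_env_file /opt/vpn-exit-controller/.env
```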
    

  3. Fix import errors:

    # Ensure all Python modules are properly installed
    cd /opt/vpn-exit-controller/api
    python -c "import main"
    

3.2 Redis Connection Problems

Symptoms:

  • "Redis connection failed" in API logs
  • Cache-related operations failing
  • High latency in API responses

Diagnostic Steps:

  1. Check Redis container:

    docker ps | grep redis
    docker logs vpn-redis
    

  2. Test Redis connectivity:

    docker exec vpn-redis redis-cli ping
    redis-cli -h localhost -p 6379 ping
    

  3. Check Redis configuration:

    docker exec vpn-redis redis-cli CONFIG GET "*"
    

Resolution Steps:

  1. Restart Redis:

    docker restart vpn-redis
    

  2. Clear Redis data if corrupted:

    docker exec vpn-redis redis-cli FLUSHALL
    

  3. Check disk space:

    df -h
    # Redis needs disk space for persistence
    

3.3 Docker Daemon Issues

Symptoms:

  • "Cannot connect to Docker daemon" errors
  • Docker commands hanging
  • Containers not starting

Diagnostic Steps:

  1. Check Docker service:

    systemctl status docker
    journalctl -u docker -n 50
    

  2. Test Docker functionality:

    docker version
    docker info
    docker run hello-world
    

  3. Check Docker socket:

    ls -la /var/run/docker.sock
    sudo chmod 666 /var/run/docker.sock  # Temporary fix
    

Resolution Steps:

  1. Restart Docker:

    systemctl restart docker
    

  2. Clean up Docker resources:

    docker system prune -a
    docker volume prune
    

  3. Check disk space:

    df -h /var/lib/docker
    # Docker needs sufficient disk space
    

3.4 Systemd Service Failures

Symptoms:

  • Service won't start at boot
  • "Failed to start" messages
  • Service stops unexpectedly

Diagnostic Steps:

  1. Check service definition:

    systemctl cat vpn-controller
    systemctl status vpn-controller
    

  2. View detailed logs:

    journalctl -u vpn-controller -f
    journalctl -u vpn-controller --since "1 hour ago"
    

  3. Test service script manually:

    /opt/vpn-exit-controller/start.sh
    

Resolution Steps:

  1. Fix service dependencies:

    # Ensure docker.service is running
    systemctl start docker
    systemctl enable docker
    

  2. Update service configuration:

    systemctl edit vpn-controller
    # Add any necessary environment variables or dependencies
    

  3. Reload and restart:

    systemctl daemon-reload
    systemctl restart vpn-controller
    systemctl enable vpn-controller
    


4. Load Balancing Issues

4.1 Nodes Not Being Selected Properly

Symptoms:

  • All traffic going to one node
  • Load balancer ignoring some nodes
  • Uneven traffic distribution

Diagnostic Steps:

  1. Check load balancer status:

    curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/load-balancer/status
    

  2. View node health:

    curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes
    

  3. Check HAProxy backend status:

    curl http://localhost:8404/
    # Look for backend server weights and status
    

Resolution Steps:

  1. Reset load balancer:

    curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/load-balancer/reset
    

  2. Update backend weights:

    # API call to rebalance
    curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/load-balancer/rebalance
    

  3. Restart load balancing service:

    docker restart haproxy
    

4.2 Health Check Failures

Symptoms:

  • Nodes marked as unhealthy
  • False positive health failures
  • Health checks timing out

Diagnostic Steps:

  1. Check health check configuration:

    # View HAProxy health check settings
    cat /opt/vpn-exit-controller/proxy/haproxy.cfg | grep -A 5 "option httpchk"
    

  2. Test health endpoints manually:

    # Test individual node health
    curl -H "Host: proxy-us.rbnk.uk" http://container-ip:3128/health
    

  3. Check health check logs:

    docker logs haproxy | grep -i health
    

Resolution Steps:

  1. Adjust health check timeouts:

    # In haproxy.cfg, increase timeout values
    timeout check 5s
    

  2. Fix health endpoint:

    # Ensure containers respond properly to health checks
    docker exec <container> curl localhost:3128/health
    

  3. Restart health monitoring:

    docker restart haproxy
    

4.3 Speed Test Failures

Symptoms:

  • Speed tests returning errors
  • Incorrect speed measurements
  • Speed test endpoints unreachable

Diagnostic Steps:

  1. Test speed test API:

    curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/speed-test/run/us
    

  2. Check network connectivity:

    # Test from container
    docker exec <node-container> curl -I http://speedtest.net
    

  3. View speed test logs:

    docker logs vpn-api | grep -i "speed"
    

Resolution Steps:

  1. Update speed test endpoints:

    # Modify speed test configuration to use working endpoints
    

  2. Increase timeout values:

    # In speed test service, allow more time for tests
    

  3. Use alternative speed test method:

    # Switch to different speed testing service
    

4.4 Failover Not Working

Symptoms:

  • Failed nodes still receiving traffic
  • No automatic failover
  • Manual failover not working

Diagnostic Steps:

  1. Check failover configuration:

    curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/failover/status
    

  2. Test failover trigger:

    # Manually trigger failover
    curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/failover/trigger/us
    

  3. Check monitoring service:

    docker logs vpn-api | grep -i "failover\|monitor"
    

Resolution Steps:

  1. Restart monitoring service:

    systemctl restart vpn-controller
    

  2. Update failover thresholds:

    # Adjust sensitivity in configuration
    

  3. Manual failover:

    # Remove failed nodes from rotation
    curl -X DELETE -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes/<node-id>
    


5. Proxy Issues

5.1 HAProxy Configuration Errors

Symptoms:

  • HAProxy fails to start
  • Configuration validation errors
  • Syntax errors in logs

Diagnostic Steps:

  1. Validate configuration:

    docker exec haproxy haproxy -c -f /usr/local/etc/haproxy/haproxy.cfg
    

  2. Check configuration file:

    cat /opt/vpn-exit-controller/proxy/haproxy.cfg
    

  3. View HAProxy logs:

    docker logs haproxy
    

Resolution Steps:

  1. Fix syntax errors:

    # Check for missing commas, brackets, quotes in haproxy.cfg
    # Validate with: haproxy -c -f haproxy.cfg
    

  2. Restore backup configuration:

    cp /opt/vpn-exit-controller/proxy/haproxy.cfg.backup /opt/vpn-exit-controller/proxy/haproxy.cfg
    

  3. Restart with corrected config:

    docker restart haproxy
    

5.2 Traefik Routing Problems

Symptoms:

  • 404 errors on proxy domains
  • Requests not reaching HAProxy
  • Routing rules not working

Diagnostic Steps:

  1. Check Traefik dashboard:

    # Access dashboard (if enabled)
    curl http://localhost:8080/dashboard/
    

  2. View Traefik configuration:

    cat /opt/vpn-exit-controller/traefik/traefik.yml
    

  3. Check routing rules:

    docker logs traefik | grep -i "router\|rule"
    

Resolution Steps:

  1. Update dynamic configuration:

    # Check and update files in /opt/vpn-exit-controller/traefik/dynamic/
    

  2. Restart Traefik:

    docker restart traefik
    

  3. Verify domain DNS:

    dig proxy-us.rbnk.uk
    # Ensure domains point to correct IP
    

5.3 Country Routing Not Working

Symptoms:

  • All traffic routed to the same country
  • Country-specific domains not working
  • Incorrect geolocation

Diagnostic Steps:

  1. Test country-specific endpoints:

    curl -H "Host: proxy-us.rbnk.uk" http://localhost:8080/
    curl -H "Host: proxy-uk.rbnk.uk" http://localhost:8080/
    

  2. Check HAProxy ACL rules:

    grep -A 10 "acl is_.*_proxy" /opt/vpn-exit-controller/proxy/haproxy.cfg
    

  3. Test IP geolocation:

    curl -x proxy-us.rbnk.uk:8080 http://ipinfo.io/json
    curl -x proxy-uk.rbnk.uk:8080 http://ipinfo.io/json
    

Resolution Steps:

  1. Update HAProxy routing rules:

    # Verify ACL rules and backend assignments in haproxy.cfg
    

  2. Restart proxy services:

    cd /opt/vpn-exit-controller/proxy
    docker-compose restart
    

  3. Check node assignments:

    # Ensure nodes are assigned to correct countries
    curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes
    

5.4 Proxy Service Failures

Symptoms:

  • Squid or Dante proxy services not responding
  • Connection refused on ports 3128 or 1080
  • Health check endpoint (port 8080) not responding
  • Container restart loops involving proxy services

Diagnostic Steps:

  1. Check proxy service status in container:

    # Check if proxy processes are running
    docker exec <container> ps aux | grep -E "(squid|danted)"
    
    # Check if ports are listening
    docker exec <container> netstat -tuln | grep -E "(3128|1080|8080)"
    
    # Test proxy services directly
    docker exec <container> curl -I http://localhost:3128
    docker exec <container> curl -I http://localhost:8080/health
    

  2. Check proxy service logs:

    # Squid logs
    docker exec <container> tail -f /var/log/squid/cache.log
    
    # Check container logs for proxy startup
    docker logs <container> | grep -E "(squid|dante|health)"
    

  3. Test proxy connectivity from host:

    # Test HTTP proxy (replace with actual Tailscale IP)
    curl -x http://100.86.140.98:3128 http://httpbin.org/ip
    
    # Test SOCKS5 proxy
    curl --socks5 100.86.140.98:1080 http://httpbin.org/ip
    
    # Test health endpoint
    curl http://100.86.140.98:8080/health
    

Resolution Steps:

  1. Restart proxy services in container:

    # Kill and restart Squid
    docker exec <container> pkill squid
    docker exec <container> squid -N -d 1 &
    
    # Kill and restart Dante
    docker exec <container> pkill danted
    docker exec <container> danted -D &
    
    # The entrypoint.sh monitoring loop will also restart them automatically
    

  2. Restart entire container:

    # Use API to restart node
    curl -X POST -u admin:Bl4ckMagic!2345erver \
      http://localhost:8080/api/nodes/<node-id>/restart
    
    # Or restart via Docker
    docker restart <container>
    

  3. Check Squid configuration:

    # Verify Squid config syntax
    docker exec <container> squid -k parse
    
    # Check if Squid cache directory is initialized
    docker exec <container> ls -la /var/spool/squid/
    

  4. Check Dante configuration:

    # Verify Dante config file exists
    docker exec <container> cat /etc/danted.conf
    
    # Check Dante listening status
    docker exec <container> ss -tuln | grep 1080
    

5.5 Authentication Failures

Symptoms:

  • 401/403 errors on proxy requests
  • Authentication not working
  • Unauthorized access attempts

Diagnostic Steps:

  1. Test API authentication:

    curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status
    

  2. Check authentication configuration:

    grep -i auth /opt/vpn-exit-controller/proxy/haproxy.cfg
    

  3. View authentication logs:

    docker logs haproxy | grep -i "auth\|401\|403"
    

Resolution Steps:

  1. Update credentials:

    # Update .env file with correct credentials
    vim /opt/vpn-exit-controller/.env
    

  2. Restart services:

    systemctl restart vpn-controller
    

  3. Clear authentication cache:

    docker exec vpn-redis redis-cli FLUSHDB
    


6. Performance Issues

6.1 Slow Proxy Speeds

Symptoms:

  • High latency through the proxy
  • Slow download/upload speeds
  • Timeouts on large requests

Diagnostic Steps:

  1. Test direct vs proxy speed:

    # Direct speed test
    curl -o /dev/null -s -w "%{time_total}\n" http://speedtest.net/speedtest.jpg
    
    # Proxy speed test
    curl -x proxy-us.rbnk.uk:8080 -o /dev/null -s -w "%{time_total}\n" http://speedtest.net/speedtest.jpg
    

  2. Check network utilization:

    iftop -i eth0
    nethogs
    

  3. Monitor container resources:

    docker stats
    

Resolution Steps:

  1. Optimize HAProxy configuration:

    # Increase connection limits and timeouts
    maxconn 4096
    timeout client 50000ms
    timeout server 50000ms
    

  2. Scale up resources:

    # Add more CPU/memory to LXC container
    # Add more VPN nodes for load distribution
    

  3. Optimize VPN connections:

    # Use UDP instead of TCP where possible
    # Select closer VPN servers
    

6.2 High Latency

Symptoms:

  • Slow response times
  • High ping times
  • Delayed connections

Diagnostic Steps:

  1. Measure latency at different points:

    # Host to VPN server
    ping nordvpn-server.com
    
    # Through proxy
    curl -x proxy-us.rbnk.uk:8080 -w "%{time_connect}\n" http://httpbin.org/get
    

  2. Check routing:

    traceroute -T -p 80 google.com
    

  3. Monitor network queues:

    ss -i | grep -E "(rto|rtt)"
    

Resolution Steps:

  1. Select closer VPN servers:

    # Use geographically closer NordVPN servers
    # Update configuration to use lower-latency servers
    

  2. Optimize TCP settings:

    # Tune TCP congestion control
    echo 'net.core.default_qdisc = fq' >> /etc/sysctl.conf
    echo 'net.ipv4.tcp_congestion_control = bbr' >> /etc/sysctl.conf
    sysctl -p
    

  3. Reduce proxy hops:

    # Minimize routing through multiple proxies
    # Use direct connections where possible
    

6.3 Memory or CPU Issues

Symptoms:

  • High CPU usage
  • Out of memory errors
  • System slowdowns

Diagnostic Steps:

  1. Check system resources:

    top
    htop
    free -m
    df -h
    

  2. Monitor container resources:

    docker stats --no-stream
    

  3. Check for memory leaks:

    # Monitor memory usage over time
    while true; do docker stats --no-stream; sleep 30; done
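When you only care about outliers, `docker stats --format` plus a small filter is less noisy than the full table. A sketch (the helper name and default threshold are ours):

```shell
# flag_high_memory: read "name mem%" pairs and print containers above a
# threshold (default 80%).
flag_high_memory() {
  awk -v t="${1:-80}" '{ p = $2; sub(/%/, "", p); if (p + 0 > t) print $1, $2 }'
}

# Usage:
# docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | flag_high_memory 75
```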
    

Resolution Steps:

  1. Increase LXC container resources:

    # From Proxmox host
    pct set 201 --memory 4096 --cores 4
    

  2. Optimize container limits:

    # Add resource limits to docker-compose.yml
    deploy:
      resources:
        limits:
          memory: 512M
        reservations:
          memory: 256M
    

  3. Clean up resources:

    docker system prune -a
    docker volume prune
    

6.4 Connection Timeouts

Symptoms:

  • Connections dropping
  • Timeout errors
  • Incomplete transfers

Diagnostic Steps:

  1. Check timeout settings:

    grep -i timeout /opt/vpn-exit-controller/proxy/haproxy.cfg
    

  2. Monitor connection states:

    ss -s
    netstat -an | grep -E "(ESTABLISHED|TIME_WAIT|CLOSE_WAIT)" | wc -l
    

  3. Test with different timeout values:

    curl --connect-timeout 30 --max-time 300 -x proxy-us.rbnk.uk:8080 http://httpbin.org/delay/10
    

Resolution Steps:

  1. Increase timeout values:

    # In haproxy.cfg
    timeout connect 10s
    timeout client 60s
    timeout server 60s
    

  2. Optimize connection pooling:

    # Adjust keep-alive settings
    option http-keep-alive
    timeout http-keep-alive 10s
    

  3. Scale connection limits:

    # Increase system limits
    ulimit -n 65536
    echo '* soft nofile 65536' >> /etc/security/limits.conf
    echo '* hard nofile 65536' >> /etc/security/limits.conf
    


7. Monitoring Issues

7.1 Metrics Not Being Collected

Symptoms:

  • Empty metrics endpoints
  • No data in monitoring dashboards
  • Metrics API returning errors

Diagnostic Steps:

  1. Check metrics endpoint:

    curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics
    

  2. Verify metrics collector service:

    docker logs vpn-api | grep -i "metrics"
    

  3. Check Redis for metrics data:

    docker exec vpn-redis redis-cli KEYS "*metrics*"
    

Resolution Steps:

  1. Restart metrics collection:

    curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/restart
    

  2. Clear corrupted metrics:

    docker exec vpn-redis redis-cli DEL metrics:*
    

  3. Check metrics configuration:

    # Verify metrics collection intervals and settings
    

7.2 Health Checks Failing

Symptoms:

  • All nodes showing as unhealthy
  • Health check endpoints not responding
  • False positive failures

Diagnostic Steps:

  1. Test health endpoints manually:

    # Test individual container health
    docker exec <container> curl -I localhost:3128/health
    

  2. Check health check configuration:

    grep -A 3 "option httpchk" /opt/vpn-exit-controller/proxy/haproxy.cfg
    

  3. Monitor health check frequency:

    docker logs haproxy | grep -i "health\|check" | tail -20
    

Resolution Steps:

  1. Adjust health check parameters:

    # Increase check intervals and timeouts
    inter 10s
    fastinter 2s
    downinter 5s
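In context, those parameters usually sit on a `default-server` line (or per `server` line) inside a backend. A sketch of what the relevant stanza might look like - the backend name, server name, and address are illustrative, not your actual config:

```
backend vpn_us_nodes
    option httpchk GET /health
    default-server inter 10s fastinter 2s downinter 5s fall 3 rise 2
    server node-us-1 100.86.140.98:3128 check
```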
    

  2. Fix health endpoint implementation:

    # Ensure containers properly implement /health endpoint
    

  3. Reset health check state:

    docker restart haproxy
    

7.3 Log Analysis Techniques

Key Log Locations and Commands:

  1. System-wide issues:

    # System logs
    journalctl -u vpn-controller -f
    journalctl --since "1 hour ago" | grep -i error
    
    # Docker logs
    docker logs --tail 100 -f vpn-api
    

  2. Network issues:

    # HAProxy logs
    docker logs haproxy | grep -E "(5xx|error|timeout)"
    
    # Traefik logs
    tail -f /opt/vpn-exit-controller/traefik/logs/traefik.log | grep -i error
    

  3. VPN connection issues:

    # OpenVPN logs in containers
    docker exec <container> tail -f /var/log/openvpn.log
    
    # Connection monitoring
    docker exec <container> ip route show | grep tun0
    

  4. Performance analysis:

    # Response time analysis
    docker logs haproxy | grep -oE '[0-9]+/[0-9]+/[0-9]+/[0-9]+/[0-9]+' | tail -100
    
    # Error rate analysis
    docker logs haproxy | grep -oE ' [0-9]{3} ' | sort | uniq -c
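For error-rate analysis, a more precise approach is to key off the Tq/Tw/Tc/Tr/Tt timing block, since in HAProxy's default httplog format the HTTP status code is the field immediately after it. A sketch (helper name is ours; assumes the default httplog format):

```shell
# status_code_summary: tally HTTP status codes from HAProxy HTTP-format
# log lines, where the status code follows the Tq/Tw/Tc/Tr/Tt timings.
status_code_summary() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9]+\/[0-9]+\/[0-9]+\/[0-9]+\/[0-9]+$/) { print $(i + 1); break }
  }' | sort | uniq -c | sort -rn
}

# Usage:
# docker logs haproxy | status_code_summary
```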
    


Emergency Recovery Procedures

Complete System Reset

If the system is completely broken, follow these steps:

  1. Stop all services:

    systemctl stop vpn-controller
    docker-compose -f /opt/vpn-exit-controller/docker-compose.yml down
    docker-compose -f /opt/vpn-exit-controller/proxy/docker-compose.yml down
    

  2. Clean up Docker:

    docker system prune -a
    docker volume prune
    

  3. Reset configuration:

    cd /opt/vpn-exit-controller
    git stash  # Save any local changes
    git reset --hard HEAD  # Reset to last known good state
    

  4. Restart services:

    systemctl start docker
    systemctl start vpn-controller
    

Data Recovery

If data is corrupted:

  1. Backup current state:

    tar -czf /tmp/vpn-backup-$(date +%Y%m%d).tar.gz /opt/vpn-exit-controller/data/
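A backup you cannot read back is worse than none, so it is worth verifying the archive immediately. A sketch that wraps the tar call above (helper name is ours; adjust paths to your layout):

```shell
# backup_dir: create a tar.gz of a directory, then verify the archive is
# readable before trusting it.
backup_dir() {
  src="$1"; out="$2"
  tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  if tar -tzf "$out" >/dev/null 2>&1; then
    echo "backup ok: $out ($(tar -tzf "$out" | wc -l) entries)"
  else
    echo "backup verification failed: $out"
    return 1
  fi
}

# Usage:
# backup_dir /opt/vpn-exit-controller/data "/tmp/vpn-backup-$(date +%Y%m%d).tar.gz"
```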
    

  2. Reset Redis data:

    docker exec vpn-redis redis-cli FLUSHALL
    

  3. Restart with clean state:

    systemctl restart vpn-controller
    

Network Recovery

If networking is broken:

  1. Reset network interfaces:

    systemctl restart networking
    systemctl restart docker
    

  2. Flush iptables:

    # Set permissive default policies first so a DROP policy doesn't lock you out
    iptables -P INPUT ACCEPT
    iptables -P FORWARD ACCEPT
    iptables -P OUTPUT ACCEPT
    iptables -F
    iptables -X
    iptables -t nat -F
    iptables -t nat -X
    

  3. Restart Tailscale:

    tailscale down
    tailscale up --authkey=<your-key> --advertise-exit-node
    


Prevention and Maintenance

Regular Maintenance Tasks

  1. Weekly:
     • Check disk space: df -h
     • Review error logs: journalctl -u vpn-controller --since "1 week ago" | grep -i error
     • Test proxy endpoints: curl -x proxy-us.rbnk.uk:8080 http://httpbin.org/ip

  2. Monthly:
     • Update NordVPN configurations: ./scripts/download-nordvpn-configs.sh
     • Clean up Docker resources: docker system prune
     • Review and rotate Tailscale auth keys

  3. Quarterly:
     • Update system packages: apt update && apt upgrade
     • Review and renew SSL certificates
     • Run a performance optimization review

Monitoring Setup

Set up automated monitoring:

  1. Log monitoring:

    # Set up logrotate for container logs
    # Monitor for specific error patterns
    

  2. Resource monitoring:

    # Set up alerts for CPU/memory usage
    # Monitor disk space
    

  3. Health monitoring:

    # Automated health checks
    # Alert on service failures
    

This troubleshooting guide should help you diagnose and resolve most issues with the VPN Exit Controller system. Keep it updated as you encounter new issues and solutions.