# VPN Exit Controller Documentation

> Generated: 2025-11-22T21:44:56.062858Z
> Source: https://vpn-docs.rbnk.uk
> Description: Multi-Node VPN Exit Controller with dual-mode access (Tailscale exit nodes + proxy services)

This documentation provides comprehensive information about the VPN Exit Controller system, including architecture, API reference, deployment guides, and operational procedures.

## Api > Endpoints

### VPN Exit Controller API Documentation

## Overview

The VPN Exit Controller API provides comprehensive management of multi-node VPN exit points with intelligent load balancing, monitoring, and failover capabilities. The system is built with FastAPI and listens on port 8080.

**Base URL:** `http://10.10.10.20:8080` (container IP) or `http://100.73.33.11:8080` (Tailscale IP)

**API Version:** 2.0.0

**Interactive Documentation:** `/docs` (Swagger UI) or `/redoc` (ReDoc)

## Authentication

All API endpoints require HTTP Basic Authentication.

**Default Credentials:**

- Username: `admin`
- Password: `Bl4ckMagic!2345erver`

**Environment Variables:**

- `ADMIN_USER`: Override default username
- `ADMIN_PASS`: Override default password

### Authentication Examples

```bash
# Using curl with basic auth
curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes

# Using curl with an explicit header
curl -H "Authorization: Basic YWRtaW46Qmw0Y2tNYWdpYyEyMzQ1ZXJ2ZXI=" http://localhost:8080/api/nodes
```

## Core API Endpoints

### 1. Node Management APIs

#### List All Nodes

**GET** `/api/nodes`

Lists all VPN nodes and their current status.

**Response:**

```json
[
  {
    "id": "vpn-us-1234567890ab",
    "country": "us",
    "status": "running",
    "vpn_server": "us5063.nordvpn.com",
    "tailscale_hostname": "vpn-us-node-1",
    "started_at": "2025-01-15T10:30:00Z",
    "vpn_connected": true,
    "tailscale_connected": true
  }
]
```

#### Start Node

**POST** `/api/nodes/{country_code}/start`

Starts a new VPN node for the specified country.
**Parameters:**

- `country_code` (path): ISO 3166-1 alpha-2 country code (e.g., "us", "uk", "de")
- `server` (query, optional): Specific VPN server to use

**Example:**

```bash
curl -X POST -u admin:Bl4ckMagic!2345erver \
  "http://localhost:8080/api/nodes/us/start?server=us5063.nordvpn.com"

# Start UK node
curl -X POST -u admin:Bl4ckMagic!2345erver \
  "http://localhost:8080/api/nodes/uk/start"
```

**Response:**

```json
{
  "status": "starting",
  "node_id": "vpn-us-1234567890ab",
  "country": "us",
  "server": "us5063.nordvpn.com"
}
```

#### Stop Node

**DELETE** `/api/nodes/{node_id}/stop`

Stops and removes a specific VPN node.

**Example:**

```bash
curl -X DELETE -u admin:Bl4ckMagic!2345erver \
  http://localhost:8080/api/nodes/vpn-us-1234567890ab/stop
```

#### Restart Node

**POST** `/api/nodes/{node_id}/restart`

Restarts a specific VPN node.

**Example:**

```bash
curl -X POST -u admin:Bl4ckMagic!2345erver \
  http://localhost:8080/api/nodes/vpn-us-1234567890ab/restart
```

#### Get Node Details

**GET** `/api/nodes/{node_id}`

Gets detailed information about a specific node.

**Response:**

```json
{
  "id": "vpn-us-1234567890ab",
  "country": "us",
  "status": "running",
  "container_info": {
    "created": "2025-01-15T10:30:00Z",
    "ports": {"1080/tcp": [{"HostPort": "31080"}]},
    "environment": ["COUNTRY=us", "VPN_SERVER=us5063.nordvpn.com"]
  },
  "vpn_connected": true,
  "tailscale_connected": true
}
```

#### Get Node Logs

**GET** `/api/nodes/{node_id}/logs?lines=100`

Retrieves logs from a specific node.

**Parameters:**

- `lines` (query): Number of log lines to retrieve (1-1000, default: 100)

#### Check Node Health

**GET** `/api/nodes/{node_id}/health`

Checks the health status of a specific node.

**Response:**

```json
{
  "node_id": "vpn-us-1234567890ab",
  "healthy": true,
  "message": "VPN and Tailscale connections active"
}
```

#### Cleanup Stopped Containers

**POST** `/api/nodes/cleanup`

Removes all stopped VPN containers.

**Response:**

```json
{
  "status": "completed",
  "removed_containers": 3
}
```

### 2. Load Balancing APIs

#### Get Load Balancing Statistics

**GET** `/api/load-balancer/stats`

Returns current load balancing statistics and connection counts.

**Response:**

```json
{
  "total_nodes": 5,
  "connections_per_node": {
    "vpn-us-1234567890ab": 12,
    "vpn-uk-2345678901bc": 8
  },
  "strategy": "health_score",
  "last_updated": "2025-01-15T10:30:00Z"
}
```

#### Get Best Node for Country

**GET** `/api/load-balancer/best-node/{country}?strategy=health_score`

Selects the optimal node for a country using the specified load balancing strategy.

**Parameters:**

- `strategy` (query): Load balancing strategy
    - `health_score` (default): Best overall health score
    - `least_connections`: Fewest active connections
    - `round_robin`: Even distribution
    - `weighted_latency`: Latency-based with randomization
    - `random`: Random selection

**Example:**

```bash
curl -u admin:Bl4ckMagic!2345erver \
  "http://localhost:8080/api/load-balancer/best-node/us?strategy=least_connections"

# Get best UK node
curl -u admin:Bl4ckMagic!2345erver \
  "http://localhost:8080/api/load-balancer/best-node/uk?strategy=health_score"
```

**Response:**

```json
{
  "selected_node": {
    "id": "vpn-us-1234567890ab",
    "country": "us",
    "proxy_url": "socks5://10.10.10.20:31080",
    "health_score": 95.2,
    "connections": 5
  },
  "strategy": "least_connections",
  "country": "us"
}
```

#### Scale Up Country

**POST** `/api/load-balancer/scale-up/{country}`

Starts additional nodes if needed based on current load.

#### Scale Down Country

**POST** `/api/load-balancer/scale-down/{country}`

Stops excess nodes if load is low.

#### Get Available Strategies

**GET** `/api/load-balancer/strategies`

Lists all available load balancing strategies with descriptions.

### 3. Speed Testing APIs

#### Test Node Speed

**POST** `/api/speed-test/{node_id}?test_size=1MB&run_in_background=false`

Runs a speed test on a specific node.
**Parameters:**

- `test_size` (query): Test size ("1MB" or "10MB")
- `run_in_background` (query): Run asynchronously (default: false)

**Example:**

```bash
curl -X POST -u admin:Bl4ckMagic!2345erver \
  "http://localhost:8080/api/speed-test/vpn-us-1234567890ab?test_size=10MB"
```

**Response:**

```json
{
  "node_id": "vpn-us-1234567890ab",
  "success": true,
  "download_mbps": 95.2,
  "upload_mbps": 45.6,
  "latency_ms": 23.4,
  "test_size": "10MB",
  "duration_seconds": 12.3,
  "tested_at": "2025-01-15T10:30:00Z"
}
```

#### Test All Nodes

**POST** `/api/speed-test/all?test_size=1MB&run_in_background=true`

Runs speed tests on all active nodes.

#### Get Latest Speed Test

**GET** `/api/speed-test/{node_id}/latest`

Gets the most recent speed test result for a node.

#### Get Speed Test History

**GET** `/api/speed-test/{node_id}/history?hours=24`

Gets speed test history for a node.

**Parameters:**

- `hours` (query): Hours of history to retrieve (1-168, default: 24)

#### Get Speed Test Summary

**GET** `/api/speed-test/summary`

Gets a summary of all speed test results across all nodes.

#### Get Country Speed Tests

**GET** `/api/speed-test/country/{country}`

Gets the latest speed test results for all nodes in a specific country.

**Response:**

```json
{
  "country": "us",
  "node_count": 3,
  "tested_nodes": 3,
  "successful_tests": 2,
  "avg_download_mbps": 87.5,
  "avg_latency_ms": 25.2,
  "results": {
    "vpn-us-1234567890ab": {
      "success": true,
      "download_mbps": 95.2,
      "latency_ms": 23.4
    }
  }
}
```

#### Clear Speed Test Results

**DELETE** `/api/speed-test/{node_id}/results`

Clears stored speed test results for a node.

### 4. Metrics and Monitoring APIs

#### Get Node Metrics

**GET** `/api/metrics/{node_id}?period=1h`

Gets historical metrics for a specific node.
**Parameters:**

- `period` (query): Time period ("1h", "6h", "24h", "7d")

**Response:**

```json
[
  {
    "timestamp": "2025-01-15T10:30:00Z",
    "cpu_percent": 15.2,
    "memory_mb": 256.8,
    "network_rx_mb": 12.4,
    "network_tx_mb": 8.9,
    "active_connections": 5,
    "vpn_connected": true
  }
]
```

#### Get Current Node Metrics

**GET** `/api/metrics/{node_id}/current`

Gets current real-time metrics for a specific node.

#### Get All Metrics

**GET** `/api/metrics/`

Gets current metrics for all active nodes.

#### Trigger Metrics Collection

**POST** `/api/metrics/collect`

Manually triggers metrics collection across all nodes.

#### Get Metrics Summary

**GET** `/api/metrics/stats/summary`

Gets aggregated statistics across all nodes.

**Response:**

```json
{
  "total_nodes": 5,
  "healthy_nodes": 4,
  "unhealthy_nodes": 1,
  "total_cpu_percent": 62.8,
  "avg_cpu_percent": 12.6,
  "total_memory_mb": 1280.4,
  "avg_memory_mb": 256.1,
  "total_network_rx_mb": 45.6,
  "total_network_tx_mb": 32.1,
  "timestamp": "2025-01-15T10:30:00Z"
}
```

### 5. Proxy Management APIs

#### Get All Proxy URLs

**GET** `/api/proxy/urls`

Gets all available proxy URLs organized by country.

**Response:**

```json
{
  "us": [
    {
      "node_id": "vpn-us-1234567890ab",
      "tailscale_ip": "100.86.140.98",
      "proxy_urls": {
        "http": "http://100.86.140.98:3128",
        "socks5": "socks5://100.86.140.98:1080",
        "health": "http://100.86.140.98:8080/health"
      },
      "health_score": 95.2
    }
  ],
  "uk": [
    {
      "node_id": "vpn-uk-2345678901bc",
      "tailscale_ip": "100.125.27.111",
      "proxy_urls": {
        "http": "http://100.125.27.111:3128",
        "socks5": "socks5://100.125.27.111:1080",
        "health": "http://100.125.27.111:8080/health"
      },
      "health_score": 88.1
    }
  ]
}
```

#### Get Country Proxy URLs

**GET** `/api/proxy/urls/{country}`

Gets proxy URLs for a specific country.

#### Get Optimal Proxy

**GET** `/api/proxy/optimal/{country}?strategy=health_score`

Gets the optimal proxy endpoint for a country using load balancing.
**Response:**

```json
{
  "node_id": "vpn-us-1234567890ab",
  "country": "us",
  "tailscale_ip": "100.86.140.98",
  "proxy_urls": {
    "http": "http://100.86.140.98:3128",
    "socks5": "socks5://100.86.140.98:1080",
    "health": "http://100.86.140.98:8080/health"
  },
  "health_score": 95.2,
  "connections": 5,
  "strategy": "health_score"
}
```

#### Release Proxy Connection

**POST** `/api/proxy/release/{node_id}`

Decrements the connection count for a proxy (used for connection tracking).

#### Get Proxy Statistics

**GET** `/api/proxy/stats`

Gets proxy system statistics and usage information.

#### Update HAProxy Configuration

**POST** `/api/proxy/config/update`

Updates and reloads the HAProxy configuration based on current nodes.

#### Generate HAProxy Configuration

**GET** `/api/proxy/config/generate`

Generates an HAProxy configuration preview without applying changes.

#### Check Proxy Health

**GET** `/api/proxy/health`

Comprehensive health check of the proxy system and all nodes.

### 6. Failover Management APIs

#### Get Failover Status

**GET** `/api/failover/status`

Gets current failover status and history.

**Response:**

```json
{
  "enabled": true,
  "total_failovers": 12,
  "last_failover": "2025-01-15T09:45:00Z",
  "failover_history": {
    "vpn-us-1234567890ab": [
      {
        "timestamp": "2025-01-15T09:45:00Z",
        "reason": "VPN connection lost",
        "action": "restart",
        "success": true
      }
    ]
  }
}
```

#### Trigger Failover

**POST** `/api/failover/{node_id}/trigger?reason=Manual+trigger`

Manually triggers failover for a specific node.

#### Check All Nodes

**POST** `/api/failover/check-all`

Checks all nodes and triggers failover for unhealthy ones.

#### Get Node Failover History

**GET** `/api/failover/history/{node_id}`

Gets failover history for a specific node.

### 7. Configuration APIs

#### Get Available Countries

**GET** `/api/config/countries`

Lists all countries with available VPN configurations.
**Response:**

```json
["us", "uk", "de", "jp", "ca", "au", "nl", "ch", "sg", "fr", "it", "es", "pl"]
```

#### Get Country Servers

**GET** `/api/config/servers/{country_code}`

Gets all available VPN servers for a specific country.

**Response:**

```json
[
  {
    "hostname": "us5063.nordvpn.com",
    "country": "us",
    "health_score": 95.2,
    "last_tested": "2025-01-15T10:00:00Z",
    "avg_latency": 23.4,
    "is_blacklisted": false
  }
]
```

#### Get All Servers

**GET** `/api/config/servers`

Gets all available servers grouped by country.

#### Health Check Server

**POST** `/api/config/servers/{hostname}/health-check`

Runs a health check on a specific VPN server.

#### Health Check All Servers

**POST** `/api/config/servers/health-check-all?country_code=us`

Runs health checks on all servers (optionally filtered by country).

#### Blacklist Server

**POST** `/api/config/servers/{hostname}/blacklist?duration_hours=1`

Temporarily blacklists a server so it is not used.

#### Get Server Statistics

**GET** `/api/config/server-stats`

Gets statistics about available servers and their health.

### 8. Event System APIs

#### Get Recent Events

**GET** `/api/events?count=50`

Gets recent system events.

**Parameters:**

- `count` (query): Number of events to retrieve (1-500, default: 50)

**Response:**

```json
[
  {
    "timestamp": "2025-01-15T10:30:00Z",
    "type": "node_started",
    "node_id": "vpn-us-1234567890ab",
    "country": "us",
    "message": "VPN node started successfully"
  },
  {
    "timestamp": "2025-01-15T10:25:00Z",
    "type": "speed_test_completed",
    "node_id": "vpn-uk-2345678901bc",
    "result": "95.2 Mbps download"
  }
]
```

#### Get Events by Type

**GET** `/api/events/types/{event_type}?count=20`

Gets recent events of a specific type.

**Event Types:**

- `node_started`
- `node_stopped`
- `node_failed`
- `speed_test_completed`
- `failover_triggered`
- `health_check_failed`

#### Get Container Events

**GET** `/api/events/container/{container_id}?count=20`

Gets recent events for a specific container/node.

### 9. Authentication APIs

#### Login

**POST** `/api/auth/login`

Authenticates and returns a JWT token (uses HTTP Basic Auth).

**Response:**

```json
{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expires_at": "2025-01-16T10:30:00Z",
  "user": "admin"
}
```

### 10. System APIs

#### Root Dashboard

**GET** `/`

Returns an HTML dashboard for managing VPN nodes through a web interface.

#### Health Check

**GET** `/health`

Basic health check endpoint.

**Response:**

```json
{
  "status": "healthy",
  "version": "2.0.0"
}
```

## Error Handling

The API uses standard HTTP status codes:

- **200**: Success
- **400**: Bad Request (invalid parameters)
- **401**: Unauthorized (authentication required)
- **404**: Not Found (resource doesn't exist)
- **500**: Internal Server Error

**Error Response Format:**

```json
{
  "detail": "Error message describing what went wrong"
}
```

## Rate Limiting

No explicit rate limiting is currently implemented, but it is recommended to:

- Limit speed tests to avoid system overload
- Space out health checks appropriately
- Use background execution for long-running operations

## WebSocket Endpoints

Currently, no WebSocket endpoints are implemented. All communication is via the REST API; periodic polling is recommended for near-real-time updates.
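Beyond curl, the endpoints above are easy to script. The following is a minimal, stdlib-only Python sketch of an authenticated client; the base URL and default credentials are the ones documented above, so adjust them for your deployment.

```python
import base64
import json
import urllib.request

API_BASE = "http://localhost:8080"  # or the container/Tailscale IP documented above


def basic_auth_header(user: str, password: str) -> str:
    """Build the HTTP Basic Authorization header value by hand."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def api_request(method: str, path: str, user: str, password: str) -> object:
    """Send an authenticated request and decode the JSON response."""
    req = urllib.request.Request(API_BASE + path, method=method)
    req.add_header("Authorization", basic_auth_header(user, password))
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example usage against a live deployment:
# api_request("POST", "/api/nodes/us/start", "admin", "Bl4ckMagic!2345erver")
# nodes = api_request("GET", "/api/nodes", "admin", "Bl4ckMagic!2345erver")
```

Note that `basic_auth_header("admin", "Bl4ckMagic!2345erver")` produces exactly the `Authorization: Basic YWRtaW46Qmw0Y2tNYWdpYyEyMzQ1ZXJ2ZXI=` header shown in the curl example earlier.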
## Practical Use Cases

### Starting a VPN Node

```bash
# Start a US node
curl -X POST -u admin:Bl4ckMagic!2345erver \
  http://localhost:8080/api/nodes/us/start

# Start a UK node
curl -X POST -u admin:Bl4ckMagic!2345erver \
  http://localhost:8080/api/nodes/uk/start

# Check if they're running
curl -u admin:Bl4ckMagic!2345erver \
  http://localhost:8080/api/nodes
```

### Using the Proxy Architecture

```bash
# Get the best US proxy using health score
curl -u admin:Bl4ckMagic!2345erver \
  "http://localhost:8080/api/proxy/optimal/us?strategy=health_score"

# Get the best UK proxy
curl -u admin:Bl4ckMagic!2345erver \
  "http://localhost:8080/api/proxy/optimal/uk?strategy=health_score"

# Example response with the proxy ports:
# {
#   "node_id": "vpn-us-1234567890ab",
#   "country": "us",
#   "tailscale_ip": "100.86.140.98",
#   "proxy_urls": {
#     "http": "http://100.86.140.98:3128",
#     "socks5": "socks5://100.86.140.98:1080",
#     "health": "http://100.86.140.98:8080/health"
#   },
#   "strategy": "health_score"
# }

# Use the returned proxy URLs with your application:
curl -x http://100.86.140.98:3128 http://ipinfo.io/ip
curl --socks5 100.86.140.98:1080 http://ipinfo.io/ip

# UK proxy usage example:
curl -x http://100.125.27.111:3128 http://ipinfo.io/ip
curl --socks5 100.125.27.111:1080 http://ipinfo.io/ip
```

### Monitoring System Health

```bash
# Check overall system metrics
curl -u admin:Bl4ckMagic!2345erver \
  http://localhost:8080/api/metrics/stats/summary

# Get recent events
curl -u admin:Bl4ckMagic!2345erver \
  http://localhost:8080/api/events

# Check proxy system health
curl -u admin:Bl4ckMagic!2345erver \
  http://localhost:8080/api/proxy/health
```

### Running Performance Tests

```bash
# Test all nodes in the background
curl -X POST -u admin:Bl4ckMagic!2345erver \
  "http://localhost:8080/api/speed-test/all?run_in_background=true"

# Check results later
curl -u admin:Bl4ckMagic!2345erver \
  http://localhost:8080/api/speed-test/summary
```

## Development Notes

- The API is built with FastAPI and includes automatic OpenAPI documentation
- All endpoints require HTTP Basic Authentication
- Background tasks are used for long-running operations like speed tests
- Redis is used for caching metrics and events
- The Docker SDK is used for container management
- HAProxy integration provides load balancing capabilities

For interactive API exploration, visit the `/docs` or `/redoc` endpoints after authenticating.

---

## Api

### API Reference

Comprehensive documentation for the VPN Exit Controller REST API.

## API Overview

The VPN Exit Controller API provides programmatic access to all system functions:

- **RESTful Design**: Clean, predictable URL structure
- **JSON Format**: All requests and responses use JSON
- **HTTP Basic Auth**: Simple, secure authentication
- **Comprehensive**: Full control over nodes, metrics, and configuration
- **Well-Documented**: OpenAPI/Swagger specification available

## Quick Start

### Base URL

```
https://api.vpn.yourdomain.com
```

### Authentication

```bash
curl -u admin:password https://api.vpn.yourdomain.com/api/nodes
```

### Example Request

```bash
curl -X POST \
  -u admin:password \
  -H "Content-Type: application/json" \
  -d '{"country": "us"}' \
  https://api.vpn.yourdomain.com/api/nodes/start
```

## API Documentation
- :material-shield-account:{ .lg .middle } __Authentication__

    ---

    Learn about API authentication methods and security

    :octicons-arrow-right-24: Authentication

- :material-api:{ .lg .middle } __Endpoints__

    ---

    Complete reference for all API endpoints

    :octicons-arrow-right-24: API Endpoints

- :material-code-json:{ .lg .middle } __Examples__

    ---

    Code examples in multiple languages

    :octicons-arrow-right-24: Examples

- :material-language-python:{ .lg .middle } __SDKs__

    ---

    Official and community SDKs

    :octicons-arrow-right-24: SDKs
## API Categories

### Node Management

Control VPN nodes - start, stop, monitor

- `GET /api/nodes` - List all nodes
- `POST /api/nodes/start` - Start a node
- `DELETE /api/nodes/{id}` - Stop a node
- `GET /api/nodes/{id}/health` - Node health

### Load Balancing

Configure and query load balancing

- `GET /api/load-balancer/best-node/{country}` - Get optimal node
- `POST /api/load-balancer/strategy` - Set strategy
- `GET /api/load-balancer/status` - Current status

### Metrics & Monitoring

Access performance and health data

- `GET /api/metrics` - System metrics
- `GET /api/health` - Health check
- `POST /api/speed-test/{id}` - Run speed test

### Configuration

Manage system configuration

- `GET /api/config` - Get configuration
- `PUT /api/config` - Update settings
- `POST /api/config/reload` - Reload config

## Response Format

### Success Response

```json
{
  "status": "success",
  "data": {
    "id": "vpn-us-1",
    "country": "us",
    "status": "running"
  },
  "timestamp": "2024-01-15T10:30:00Z"
}
```

### Error Response

```json
{
  "status": "error",
  "error": {
    "code": "NODE_NOT_FOUND",
    "message": "No node with ID 'vpn-xyz' exists",
    "details": {}
  },
  "timestamp": "2024-01-15T10:30:00Z"
}
```

## Status Codes

| Code | Description | Usage |
|------|-------------|-------|
| 200 | OK | Successful GET/PUT |
| 201 | Created | Successful POST |
| 204 | No Content | Successful DELETE |
| 400 | Bad Request | Invalid parameters |
| 401 | Unauthorized | Missing/invalid auth |
| 404 | Not Found | Resource not found |
| 409 | Conflict | Resource conflict |
| 429 | Too Many Requests | Rate limited |
| 500 | Internal Error | Server error |

## Rate Limiting

API requests are rate limited:

- **Default**: 100 requests/minute
- **Burst**: 20 requests/second
- **Headers**: Rate limit info included

```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1642257600
```

## API Versioning

The API uses URL versioning:

- Current: `/api/v1/` (or `/api/`)
- Legacy: Not applicable (v1 is the first version)

## OpenAPI Specification

Interactive API documentation is available at:

- **Swagger UI**: `https://api.vpn.yourdomain.com/api/docs`
- **ReDoc**: `https://api.vpn.yourdomain.com/api/redoc`
- **OpenAPI JSON**: `https://api.vpn.yourdomain.com/api/openapi.json`

## Quick Links

- Authentication Guide - Set up API access
- Complete Endpoint Reference - All endpoints documented
- Code Examples - Copy-paste examples
- SDK Documentation - Language-specific libraries

---

!!! tip "Interactive Documentation"
    Visit `/api/docs` on your deployment for interactive API documentation with a built-in testing interface.

---

## Architecture

### Architecture Documentation

Welcome to the VPN Exit Controller architecture documentation. This section provides detailed technical information about the system design, components, and infrastructure.

## Architecture Overview
- :material-server-network:{ .lg .middle } __System Overview__

    ---

    High-level architecture and design principles

    :octicons-arrow-right-24: System Overview

- :material-lan:{ .lg .middle } __Network Design__

    ---

    Network topology, routing, and security layers

    :octicons-arrow-right-24: Network Design

- :material-puzzle:{ .lg .middle } __Components__

    ---

    Detailed component architecture and interactions

    :octicons-arrow-right-24: Components

- :material-shield-lock:{ .lg .middle } __Security Model__

    ---

    Security architecture and threat modeling

    :octicons-arrow-right-24: Security Model
## System Architecture Diagram

```mermaid
graph TB
    subgraph "Internet"
        Users[Users/Clients]
        Internet[Public Internet]
    end
    subgraph "Edge Layer"
        CF[Cloudflare DNS/CDN]
        PublicIP[Public IP: 135.181.60.45]
    end
    subgraph "Proxy Layer"
        Traefik[Traefik - SSL Termination]
        HAProxy[HAProxy - L4/L7 Load Balancer]
    end
    subgraph "Application Layer"
        API[FastAPI Application]
        Redis[(Redis Cache)]
        LB[Load Balancer Service]
    end
    subgraph "VPN Layer"
        Docker[Docker Engine]
        VPN1[VPN-US Container]
        VPN2[VPN-UK Container]
        VPN3[VPN-JP Container]
    end
    subgraph "Network Layer"
        Tailscale[Tailscale Mesh Network]
        NordVPN[NordVPN Servers]
    end

    Users --> Internet
    Internet --> CF
    CF --> PublicIP
    PublicIP --> Traefik
    Traefik --> HAProxy
    HAProxy --> API
    API --> Redis
    API --> LB
    LB --> Docker
    Docker --> VPN1
    Docker --> VPN2
    Docker --> VPN3
    VPN1 --> Tailscale
    VPN2 --> Tailscale
    VPN3 --> Tailscale
    Tailscale --> NordVPN

    style Users fill:#f9f,stroke:#333,stroke-width:2px
    style CF fill:#ff9,stroke:#333,stroke-width:2px
    style Traefik fill:#9ff,stroke:#333,stroke-width:2px
    style HAProxy fill:#9f9,stroke:#333,stroke-width:2px
    style API fill:#f99,stroke:#333,stroke-width:2px
```

## Key Design Principles

### 1. **Microservices Architecture**

- Loosely coupled services
- Independent scaling
- Technology agnostic
- API-first design

### 2. **Container-Based Infrastructure**

- Docker for service isolation
- Immutable infrastructure
- Easy deployment and rollback
- Resource efficiency

### 3. **High Availability**

- No single point of failure
- Automatic failover
- Health monitoring
- Self-healing capabilities

### 4. **Security by Design**

- Zero-trust networking
- End-to-end encryption
- Principle of least privilege
- Regular security audits

## Technology Stack

| Layer | Technology | Purpose |
|-------|------------|---------|
| **Frontend** | React/Vue.js | Web UI (optional) |
| **API** | FastAPI | REST API server |
| **Proxy** | HAProxy | Load balancing |
| **SSL** | Traefik | SSL termination |
| **Cache** | Redis | Metrics & sessions |
| **Container** | Docker | Service isolation |
| **VPN** | NordVPN | Exit nodes |
| **Mesh** | Tailscale | Secure networking |
| **DNS** | Cloudflare | DNS & CDN |
| **OS** | Ubuntu 22.04 | Host operating system |

## Performance Characteristics

### Latency Targets

- API Response: < 100ms (p95)
- Proxy Overhead: < 10ms
- Health Check: < 5s
- Failover Time: < 30s

### Throughput

- API Requests: 10,000 req/s
- Proxy Connections: 50,000 concurrent
- VPN Bandwidth: 1 Gbps per node
- Redis Operations: 100,000 ops/s

### Scalability

- Horizontal scaling for VPN nodes
- API instances: 1-100
- VPN nodes per country: 1-10
- Total supported countries: 25+

## Infrastructure Requirements

### Minimum Deployment

```
┌─────────────────────────┐
│   Single VM/Server      │
│   - 4 CPU cores         │
│   - 8GB RAM             │
│   - 50GB SSD            │
│   - 1 Gbps network      │
└─────────────────────────┘
```

### Production Deployment

```
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│  API Servers    │  │  VPN Node Pool  │  │   Monitoring    │
│  - 3x instances │  │  - 5x servers   │  │  - Prometheus   │
│  - Load balanced│  │  - Geographic   │  │  - Grafana      │
│  - Auto-scaling │  │  - Auto-scaling │  │  - AlertManager │
└─────────────────┘  └─────────────────┘  └─────────────────┘
```

## Component Communication

```
sequenceDiagram
    participant User
    participant Cloudflare
    participant Traefik
    participant HAProxy
    participant API
    participant Redis
    participant Docker
    participant VPN Node
    participant Tailscale
    participant NordVPN

    User->>Cloudflare: HTTPS Request
    Cloudflare->>Traefik: Forward Request
    Traefik->>HAProxy: Proxy Request
    HAProxy->>API: Load Balanced Request
    API->>Redis: Check Metrics
    Redis-->>API: Return Data
    API->>Docker: Start VPN Container
    Docker->>VPN Node: Create Container
    VPN Node->>Tailscale: Register Node
    VPN Node->>NordVPN: Connect VPN
    NordVPN-->>User: Proxy Traffic
```

## Data Flow Architecture

### Request Flow

1. User connects to a proxy URL (e.g., proxy-us.rbnk.uk)
2. Cloudflare resolves DNS and forwards to the server
3. Traefik handles SSL termination
4. HAProxy routes to the appropriate backend
5. Request is proxied through the VPN node
6. Response returns through the same path

### Metrics Flow

1. VPN nodes report health metrics
2. API collects them and stores them in Redis
3. Load balancer uses the metrics for routing decisions
4. Monitoring systems query the metrics API
5. Alerts are triggered on thresholds

## Next Steps
- :material-book-open-variant:{ .lg .middle } __Deep Dive__

    ---

    Explore detailed component documentation

    :octicons-arrow-right-24: Components

- :material-security:{ .lg .middle } __Security__

    ---

    Understand the security architecture

    :octicons-arrow-right-24: Security Model

- :material-server:{ .lg .middle } __Deployment__

    ---

    Deploy the architecture

    :octicons-arrow-right-24: Deployment Guide
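The metrics flow described earlier (nodes report metrics → Redis → load balancer decisions → alerts on thresholds) reduces to a simple threshold check at the alerting step. The following sketch uses the field names from the documented metrics responses (`cpu_percent`, `memory_mb`); the threshold values themselves are illustrative assumptions, since the real alerting rules live in the monitoring stack.

```python
from typing import Dict, List

# Illustrative thresholds only -- actual alert rules are defined in the
# monitoring stack (e.g. Prometheus/AlertManager), not hard-coded like this.
THRESHOLDS = {"cpu_percent": 80.0, "memory_mb": 450.0}


def check_thresholds(node_id: str, metrics: Dict[str, float]) -> List[str]:
    """Return one alert string per metric that exceeds its threshold."""
    alerts = []
    for key, limit in THRESHOLDS.items():
        value = metrics.get(key)
        if value is not None and value > limit:
            alerts.append(f"{node_id}: {key}={value} exceeds {limit}")
    return alerts
```

In practice this check would run against the output of `GET /api/metrics/{node_id}/current` for each active node.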
---

!!! info "Architecture Decisions"
    For detailed architecture decision records (ADRs) and design rationale, see our ADR documentation.

---

## Architecture > Overview

### VPN Exit Controller - Technical Architecture

## Overview

The VPN Exit Controller is a sophisticated system that manages dynamic country-based VPN exit nodes using Tailscale mesh networking, Docker containers, and intelligent load balancing. The system provides HTTP/HTTPS proxy services through country-specific subdomains, enabling users to route traffic through different geographical locations.

## System Architecture Diagram

```
Internet → Cloudflare → Proxmox LXC → Traefik → HAProxy → VPN Exit Nodes
   ↓           ↓             ↓            ↓          ↓            ↓
 Users      DNS/CDN     Host System   SSL Term   Routing   Tailscale+VPN
                       (10.10.10.20)  (Port 443) (Port 8080)  (100.x.x.x)
```

### Network Flow Detail

```
1. User Request: https://proxy-us.rbnk.uk
   │
2. Cloudflare DNS Resolution: 135.181.60.45
   │
3. Proxmox Host: 135.181.60.45:443
   │
4. Traefik (LXC 201): SSL termination + routing
   │
5. HAProxy: Country-based backend selection
   │
6. VPN Exit Node: Docker container with NordVPN + Tailscale
   │
7. Final destination via NordVPN servers
```

## Core Components

### 1. FastAPI Application (`/opt/vpn-exit-controller/api/`)

The central orchestration service, built with FastAPI, that manages the entire VPN exit node ecosystem.
**Key Features:**

- RESTful API for node management
- Web-based dashboard with real-time status
- Authentication using HTTP Basic Auth
- Background services for monitoring and metrics

**Structure:**

```
api/
├── main.py                  # FastAPI application entry point
├── routes/                  # API route handlers
│   ├── nodes.py             # Node management endpoints
│   ├── proxy.py             # Proxy configuration endpoints
│   ├── load_balancer.py     # Load balancing control
│   ├── metrics.py           # Metrics and monitoring
│   └── failover.py          # Failover management
└── services/                # Business logic services
    ├── docker_manager.py    # Docker container orchestration
    ├── proxy_manager.py     # HAProxy configuration management
    ├── load_balancer.py     # Intelligent node selection
    ├── redis_manager.py     # State and metrics storage
    └── metrics_collector.py # Real-time metrics collection
```

### 2. Docker-based VPN Exit Nodes

Each VPN node runs in a dedicated Docker container combining NordVPN and Tailscale.

**Container Architecture (simplified):**

```dockerfile
FROM ubuntu:22.04
# Tailscale is not in the Ubuntu repos; its official install script adds it.
RUN apt-get update && apt-get install -y openvpn iptables curl \
    && curl -fsSL https://tailscale.com/install.sh | sh
COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
```

**Node Lifecycle:**

1. Container starts with country-specific environment variables
2. OpenVPN connects to the optimal NordVPN server for the country
3. Tailscale joins the mesh network as an exit node
4. IP forwarding rules enable traffic routing
5. Health monitoring ensures connectivity

**Resource Limits:**

- Memory: 512MB per container
- CPU: 50% of one core
- Swap: 1GB total (memory + swap)

### 3. Traefik SSL Termination and Reverse Proxy

Traefik handles SSL certificate management and initial request routing.

**Configuration:**

- SSL certificates via Let's Encrypt + Cloudflare DNS challenge
- Automatic certificate renewal
- Security headers middleware
- Docker provider for service discovery

**Key Features:**

- Wildcard SSL certificate for `*.rbnk.uk`
- Automatic service discovery through Docker labels
- Prometheus metrics export
- Dashboard at `traefik-vpn.rbnk.uk`

### 4. HAProxy Country-based Routing System

HAProxy provides intelligent country-based request routing and load balancing.

**Routing Logic:**

```
Request: https://proxy-us.rbnk.uk/path
        ↓
HAProxy ACL: hdr(host) -i proxy-us.rbnk.uk
        ↓
Backend Selection: proxy_us
        ↓
Server Selection: Load balancing among US nodes
```

**Backend Configuration:**

- Round-robin load balancing per country
- Health checks every 10 seconds
- Automatic failover to backup servers
- Dynamic configuration updates

**Health Monitoring:**

```
GET /health HTTP/1.1
Host: proxy-{country}.rbnk.uk

Expected: 200 OK
```

### 5. Redis Metrics and State Storage

Redis serves as the central data store for real-time metrics, connection tracking, and system state.

**Data Structure:**

```
node:{node_id}              # Node metadata and configuration
metrics:{node_id}:current   # Real-time node metrics
metrics:{node_id}:history   # Historical metrics (1-hour window)
connections:{node_id}       # Active connection counter
server_health:{server}      # VPN server health and latency
```

**Metrics Tracked:**

- CPU usage percentage
- Memory usage in MB
- Network I/O statistics
- VPN connection status
- Tailscale connectivity
- Active proxy connections

## VPN Node Architecture

### Container Design

Each VPN exit node is a self-contained Docker container that provides secure routing through a specific country with integrated proxy services.
``` ┌─────────────────────────────────────────────────────────────┐ │ VPN Exit Node Container │ ├─────────────────────────────────────────────────────────────┤ │ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────┐ │ │ │ OpenVPN │ │ Tailscale │ │ Proxy Services │ │ │ │ (NordVPN) │ │ (Exit Node) │ │ │ │ │ │ │ │ │ │ ┌─────────────────┐ │ │ │ │ Port: tun0 │ │ Port: ts0 │ │ │ Squid HTTP/S │ │ │ │ └─────────────┘ └──────────────┘ │ │ Port: 3128 │ │ │ │ │ │ │ └─────────────────┘ │ │ │ ┌─────────────────────────────────┐ │ ┌─────────────────┐ │ │ │ │ iptables Routing │ │ │ Dante SOCKS5 │ │ │ │ │ tun0 ←→ tailscale0 │ │ │ Port: 1080 │ │ │ │ └─────────────────────────────────┘ │ └─────────────────┘ │ │ │ │ ┌─────────────────┐ │ │ │ ┌─────────────────────────────────┐ │ │ Health Check │ │ │ │ │ DNS Configuration │ │ │ Port: 8080 │ │ │ │ │ NordVPN DNS: 103.86.96.100 │ │ └─────────────────┘ │ │ │ │ NordVPN DNS: 103.86.99.100 │ │ │ │ │ │ Fallback: 8.8.8.8, 1.1.1.1 │ └─────────────────────┘ │ │ └─────────────────────────────────┘ │ └─────────────────────────────────────────────────────────────┘ ``` ### NordVPN Integration **Server Selection:** - Country-specific server pools - Automatic optimal server selection based on latency - Support for both TCP and UDP configurations - Service credentials authentication **Configuration Management:** ``` configs/vpn/ ├── us.ovpn # Default US configuration ├── us/ # Specific US servers │ ├── us5063.nordvpn.com.tcp.ovpn │ └── us5064.nordvpn.com.udp.ovpn └── auth.txt # NordVPN service credentials ``` ### Tailscale Mesh Networking **Exit Node Configuration:** - Advertises as exit node on Tailscale network with `--advertise-exit-node` - Uses `--accept-dns=false` to prevent DNS conflicts (fixes HTTPS errors in incognito mode) - Ephemeral auth key configuration for automatic device cleanup - Unique hostname: `exit-{country}-{instance}` - Userspace networking for container compatibility - Automatic IP assignment from 100.x.x.x range **Network 
Architecture:** ``` Internet ←→ Tailscale Client ←→ Tailscale Mesh ←→ Exit Node ←→ NordVPN ←→ Destination (100.x.x.x) (tun0) (VPN Server) ``` **DNS Resolution Configuration:** To resolve HTTPS errors in incognito mode and improve reliability: 1. **Tailscale DNS Disabled**: `--accept-dns=false` prevents Tailscale from overriding DNS 2. **NordVPN DNS Primary**: Uses NordVPN's DNS servers (103.86.96.100, 103.86.99.100) 3. **Google DNS Fallback**: Falls back to 8.8.8.8 and 1.1.1.1 if NordVPN DNS fails 4. **Container DNS Override**: Manual `/etc/resolv.conf` configuration in containers This configuration eliminates the "doesn't support secure connection" errors that occurred when using Tailscale's DNS resolution through the VPN tunnel. ### Health Monitoring and Auto-Recovery **Health Checks:** 1. Container status monitoring 2. VPN tunnel connectivity (`ip route | grep tun0`) 3. Tailscale connection status 4. Exit node advertisement verification **Auto-Recovery Process:** 1. Health check failure detected 2. Container restart attempted (max 3 times) 3. If restart fails, node marked unhealthy 4. Load balancer redirects traffic to healthy nodes 5. Failed node removed after timeout ## Proxy Routing System ### Multi-Protocol Proxy Chain The system provides a comprehensive proxy chain supporting HTTP/HTTPS and SOCKS5 protocols: ``` Client → HAProxy → Tailscale Mesh → VPN Container → Internet ↓ ↓ ↓ ↓ ↓ Request Routing Mesh Network Proxy Services Destination Layer (100.x.x.x) (Squid/Dante) (via NordVPN) ``` **Proxy Chain Components:** 1. **HAProxy**: L7 load balancer with ACL-based country routing 2. **Tailscale Mesh**: Secure encrypted tunnel network (100.64.0.0/10) 3. **VPN Container**: Integrated Squid (HTTP/HTTPS) and Dante (SOCKS5) proxies 4. 
**NordVPN**: Exit point to the internet with country-specific IP addresses

### Country-based Subdomain Routing

The system uses DNS subdomains to route traffic through specific countries with multiple proxy protocols:

```
proxy-us.rbnk.uk → United States exit nodes
proxy-uk.rbnk.uk → United Kingdom exit nodes
proxy-de.rbnk.uk → Germany exit nodes
proxy-jp.rbnk.uk → Japan exit nodes
```

**Available Proxy Protocols:**
- **HTTP Proxy**: `http://proxy-{country}.rbnk.uk:3128` (Squid)
- **SOCKS5 Proxy**: `socks5://proxy-{country}.rbnk.uk:1080` (Dante)
- **Health Check**: `http://proxy-{country}.rbnk.uk:8080/health`

### HAProxy ACL-Based Routing (Updated)

HAProxy now uses ACL-based routing instead of regex matching, which is faster and more reliable:

```
# Country-specific ACLs using hostname matching
acl is_us_proxy hdr(host) -i proxy-us.rbnk.uk
acl is_uk_proxy hdr(host) -i proxy-uk.rbnk.uk
acl is_de_proxy hdr(host) -i proxy-de.rbnk.uk

# Route to the appropriate backend
use_backend proxy_us if is_us_proxy
use_backend proxy_uk if is_uk_proxy
use_backend proxy_de if is_de_proxy
```

### Backend Server Selection with Health Checks

For each country backend, HAProxy selects from the available healthy nodes, probing each container's dedicated health endpoint on port 8080:

```
backend proxy_us
    mode http
    balance roundrobin
    option httpchk GET /health HTTP/1.1\r\nHost:\ localhost
    http-check expect status 200
    # VPN container nodes, health-checked on the dedicated health port
    server us-node-1 100.86.140.98:3128 check port 8080 inter 10s
    server us-node-2 100.86.140.99:3128 check port 8080 inter 10s
    server us-backup 127.0.0.1:3128 backup
```

**Health Check Updates for HAProxy 2.8:**
- Updated health check syntax for compatibility
- HTTP health checks on port 8080 (`/health` endpoint)
- 10-second check intervals with automatic failover

## Load Balancing System

### 5 Load Balancing Strategies

1. **Round Robin**: Sequential distribution across nodes
2. **Least Connections**: Route to the node with the fewest active connections
3. **Weighted Latency**: Prefer nodes with lower VPN server latency
4. **Random**: Random node selection
5. 
**Health Score**: Comprehensive scoring based on multiple factors

### Health Score Calculation

The health score starts from a perfect baseline of 100 and blends in each factor sequentially. The percentages below are the per-step blend ratios, so the effective overall weight of each factor ends up somewhat lower than its label:

```
def calculate_health_score(node):
    """Blend per-factor scores into a 0-100 health score."""
    latency = node["latency"]               # VPN server latency (ms)
    connection_count = node["connections"]  # active proxy connections
    cpu_percent = node["cpu_percent"]       # container CPU usage (%)
    memory_mb = node["memory_mb"]           # container memory usage (MB)

    score = 100.0  # perfect-score baseline

    # Server latency (blended at 40%)
    latency_score = max(50, 100 - (latency - 50) * 0.5)
    score = score * 0.6 + latency_score * 0.4

    # Connection count (blended at 30%)
    connection_penalty = min(20, connection_count * 2)
    connection_score = max(60, 100 - connection_penalty)
    score = score * 0.7 + connection_score * 0.3

    # CPU usage (blended at 20%)
    cpu_score = max(60, 100 - cpu_percent)
    score = score * 0.8 + cpu_score * 0.2

    # Memory usage (blended at 10%)
    memory_penalty = max(0, (memory_mb - 300) / 10)
    memory_score = max(70, 100 - memory_penalty)
    score = score * 0.9 + memory_score * 0.1

    return score
```

### Automatic Scaling Logic

**Scale Up Conditions:**
- Average connections per node > 50
- Current node count < 3 for the country
- At least one healthy server available

**Scale Down Conditions:**
- Average connections per node < 10
- Current node count > 1 for the country
- Target node has 0 active connections

## Infrastructure Details

### Proxmox LXC Container Setup

**Container Configuration:**
- Container ID: 201
- OS: Ubuntu 22.04
- Internal IP: 10.10.10.20
- Public IP: 135.181.60.45
- Memory: 8GB
- Storage: 100GB

**Special Permissions Required:**
```
pct set 201 -features nesting=1,keyctl=1
pct set 201 -lxc.apparmor.profile: unconfined
```

### Network Configuration

**Network Stack:**
```
┌─────────────────────────────────────┐
│ Internet (135.181.60.45)            │
├─────────────────────────────────────┤
│ Proxmox Host                        │
│  ┌─────────────────────────────┐    │
│  │ LXC Container 201           │    │
│  │ IP: 10.10.10.20             │    │
│  │  ┌───────────────────────┐  │    │
│  │  │ Docker Network        │  │    │
│  │  │  traefik_proxy        │  │    │
│  │  │  vpn_network          │  │    │
│  │  └───────────────────────┘  │    │
│  └─────────────────────────────┘    │
└─────────────────────────────────────┘
```

**Port Mapping:**
- 80 → Traefik HTTP
- 443 → 
Traefik HTTPS - 8080 → FastAPI Application - 8081 → Traefik Dashboard - 8404 → HAProxy Stats ### DNS Configuration with Cloudflare **DNS Records:** ``` A rbnk.uk 135.181.60.45 A *.rbnk.uk 135.181.60.45 CNAME proxy-us.rbnk.uk rbnk.uk CNAME proxy-uk.rbnk.uk rbnk.uk CNAME proxy-de.rbnk.uk rbnk.uk ``` **Cloudflare Settings:** - Proxy enabled for DDoS protection - SSL/TLS: Full (strict) - Always Use HTTPS: On - HSTS enabled ## Configuration Examples ### Docker Compose for API Services ``` version: '3.8' services: api: build: ./api container_name: vpn-api network_mode: host volumes: - /var/run/docker.sock:/var/run/docker.sock - ./configs:/configs environment: - TAILSCALE_AUTHKEY=${TAILSCALE_AUTHKEY} restart: unless-stopped redis: image: redis:7-alpine container_name: vpn-redis network_mode: host volumes: - redis-data:/data restart: unless-stopped ``` ### Traefik Configuration ``` # traefik.yml entryPoints: web: address: ":80" websecure: address: ":443" certificatesResolvers: cf: acme: email: "admin@richardbankole.com" storage: /letsencrypt/acme.json dnsChallenge: provider: cloudflare ``` ### Environment Variables ``` # Required environment variables TAILSCALE_AUTHKEY=tskey-auth-xxxxx # Tailscale auth key ADMIN_USER=admin # API admin username ADMIN_PASS=Bl4ckMagic!2345erver # API admin password SECRET_KEY=your-secret-key # FastAPI secret key REDIS_URL=redis://localhost:6379 # Redis connection string CF_DNS_API_TOKEN=cloudflare-token # Cloudflare API token ``` ## Monitoring and Observability ### Metrics Collection **System Metrics:** - Node count per country - Connection distribution - CPU and memory usage - Network throughput - VPN connection stability **Business Metrics:** - Request success rate - Response time percentiles - Geographic usage distribution - Load balancing effectiveness ### Health Monitoring **Health Check Endpoints:** - `/health` - API service health - `/api/nodes` - Node status overview - `/api/metrics` - System metrics - HAProxy stats at `:8404/stats` - 
Traefik dashboard at `:8081` ### Alerting and Failover **Automatic Failover Triggers:** - Node health check failures - High CPU/memory usage - VPN connection loss - Tailscale connectivity issues **Recovery Actions:** - Container restart (up to 3 attempts) - Node replacement with fresh container - Load balancer traffic redirection - Administrative notifications ## Security Considerations ### Network Isolation - Each VPN node runs in isolated Docker container - Network policies restrict inter-container communication - VPN credentials stored securely in mounted volumes ### Authentication and Authorization - HTTP Basic Auth for API access - Tailscale authentication for mesh network - NordVPN service credentials for VPN access ### SSL/TLS Configuration - End-to-end encryption via Traefik - Let's Encrypt certificates with automatic renewal - Secure headers middleware - HSTS enforcement ## Deployment and Operations ### Initial Setup 1. **Proxmox LXC Creation:** ``` pct create 201 ubuntu-22.04-standard_22.04-1_amd64.tar.xz \ --hostname vpn-controller \ --memory 8192 \ --rootfs local-lvm:100 ``` 2. **Container Permissions:** ``` pct set 201 -features nesting=1,keyctl=1 pct set 201 -lxc.apparmor.profile: unconfined ``` 3. 
**Service Installation:** ``` cd /opt/vpn-exit-controller ./setup-project.sh systemctl enable vpn-controller systemctl start vpn-controller ``` ### Maintenance Operations **Health Monitoring:** ``` # Check service status systemctl status vpn-controller # View real-time logs journalctl -u vpn-controller -f # Check Docker containers docker ps -a --filter label=vpn.exit-node=true ``` **Configuration Updates:** ``` # Update HAProxy configuration curl -X POST http://localhost:8080/api/proxy/update-config # Restart all nodes for a country curl -X POST http://localhost:8080/api/nodes/us/restart-all ``` **Backup and Recovery:** ``` # Backup Redis data docker exec vpn-redis redis-cli BGSAVE # Backup configuration tar -czf backup.tar.gz /opt/vpn-exit-controller/configs ``` This architecture provides a robust, scalable, and intelligent VPN exit node system that automatically manages geographic traffic routing while maintaining high availability and performance. --- ## Getting Started ### Getting Started with VPN Exit Controller Welcome to the VPN Exit Controller documentation! This guide will help you get up and running quickly. ## What is VPN Exit Controller? VPN Exit Controller is a professional-grade system for managing VPN exit nodes with intelligent load balancing and country-specific proxy URLs. It provides: - 🌍 **25+ Country Support**: Access VPN exit nodes in countries worldwide - ⚡ **Intelligent Load Balancing**: 5 different strategies for optimal performance - 🔄 **Automatic Failover**: Self-healing with health monitoring - 🚀 **High Performance**: HAProxy-based routing with sub-second latency - 🔒 **Enterprise Security**: SSL/TLS, authentication, and Tailscale mesh networking - 📊 **Real-time Metrics**: Comprehensive monitoring and speed testing - 🐳 **Container-based**: Docker architecture for easy scaling ## Quick Links
- :material-rocket-launch:{ .lg .middle } __Quick Start__ --- Get VPN Exit Controller running in minutes :octicons-arrow-right-24: Quick Start Guide - :material-book-open-variant:{ .lg .middle } __User Guide__ --- Learn how to use proxy URLs and configure clients :octicons-arrow-right-24: User Guide - :material-api:{ .lg .middle } __API Reference__ --- Integrate with the REST API :octicons-arrow-right-24: API Documentation - :material-server:{ .lg .middle } __Deployment__ --- Deploy to production infrastructure :octicons-arrow-right-24: Deployment Guide
## Prerequisites

Before you begin, ensure you have:

- Ubuntu 22.04 or later (or a compatible Linux distribution)
- Docker and Docker Compose installed
- Python 3.10+ with pip
- A domain with Cloudflare DNS (for proxy URLs)
- NordVPN service credentials
- A Tailscale account for mesh networking

## Architecture Overview

```
graph LR
    A[Internet] --> B[Cloudflare DNS]
    B --> C[Traefik SSL]
    C --> D[HAProxy]
    D --> E[Load Balancer]
    E --> F[VPN Nodes]
    F --> G[NordVPN]
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#ff9,stroke:#333,stroke-width:2px
    style C fill:#9ff,stroke:#333,stroke-width:2px
    style D fill:#9f9,stroke:#333,stroke-width:2px
```

## Choose Your Path

### 🚀 **I want to use the proxy service**
Start with the Proxy Usage Guide to learn how to configure your browser or application.

### 🛠️ **I want to deploy my own instance**
Follow the Deployment Guide for step-by-step installation instructions.

### 💻 **I want to integrate via API**
Check out the API Reference for authentication and endpoint documentation.

### 🔧 **I want to contribute**
Read our Contributing Guide to get started with development.

## Key Features

### Country-Specific Proxy URLs

Access any supported country through intuitive URLs:

- `https://proxy-us.rbnk.uk` - United States
- `https://proxy-uk.rbnk.uk` - United Kingdom
- `https://proxy-jp.rbnk.uk` - Japan
- View all 25+ supported countries →

### Intelligent Load Balancing

Choose from 5 strategies:

- **Round Robin**: Equal distribution
- **Least Connections**: Route to the least busy node
- **Weighted Latency**: Favor the fastest nodes
- **Random**: Randomized selection
- **Health Score**: AI-based optimal routing

### Enterprise-Ready

- SSL/TLS encryption with Let's Encrypt
- HTTP Basic and API key authentication
- Comprehensive logging and monitoring
- Redis-backed metrics and session persistence
- Automatic failover and recovery

## Community & Support

- 📖 Full Documentation
- 🐛 Report Issues
- 💬 Discussions
- 📧 Contact Support

---

!!! 
tip "Ready to get started?" Head over to the Quick Start Guide to deploy your first VPN exit node in minutes! --- ## Guide > Api Usage ### API Usage Guide Learn how to interact with the VPN Exit Controller API for programmatic control of VPN nodes and proxy services. ## API Overview The VPN Exit Controller provides a comprehensive REST API for: - **Node Management**: Start, stop, and monitor VPN nodes - **Load Balancing**: Configure strategies and get optimal nodes - **Metrics & Monitoring**: Access performance data and health status - **Speed Testing**: Run bandwidth and latency tests - **Proxy Management**: Configure proxy routing ## Authentication All API endpoints require authentication using HTTP Basic Auth: ``` curl -u admin:your_password https://api.vpn.yourdomain.com/api/endpoint ``` ### Authentication Methods === "HTTP Basic Auth" ``` # Using curl curl -u admin:password https://api.vpn.yourdomain.com/api/nodes # Using base64 encoding curl -H "Authorization: Basic YWRtaW46cGFzc3dvcmQ=" \ https://api.vpn.yourdomain.com/api/nodes ``` === "Environment Variables" ``` # Set credentials export VPN_API_USER=admin export VPN_API_PASS=your_password # Use in scripts curl -u $VPN_API_USER:$VPN_API_PASS \ https://api.vpn.yourdomain.com/api/nodes ``` === "Programming Languages" ``` # Python import requests from requests.auth import HTTPBasicAuth response = requests.get( 'https://api.vpn.yourdomain.com/api/nodes', auth=HTTPBasicAuth('admin', 'password') ) ``` ## Common API Operations ### 1. 
Node Management #### List All Nodes ``` curl -u admin:password https://api.vpn.yourdomain.com/api/nodes ``` **Response:** ``` { "nodes": [ { "id": "vpn-us", "country": "us", "city": "New York", "status": "running", "health": "healthy", "connections": 15, "uptime": 86400 } ] } ``` #### Start a New Node ``` curl -X POST -u admin:password \ -H "Content-Type: application/json" \ -d '{"country": "uk", "city": "London"}' \ https://api.vpn.yourdomain.com/api/nodes/start ``` #### Stop a Node ``` curl -X DELETE -u admin:password \ https://api.vpn.yourdomain.com/api/nodes/vpn-uk ``` ### 2. Load Balancing #### Get Best Node for Country ``` curl -u admin:password \ https://api.vpn.yourdomain.com/api/load-balancer/best-node/us # Get best UK node curl -u admin:password \ https://api.vpn.yourdomain.com/api/load-balancer/best-node/uk ``` **Response:** ``` { "node": { "id": "vpn-us-2", "score": 95.5, "latency": 12, "connections": 5 }, "strategy": "health_score" } ``` #### Change Load Balancing Strategy ``` curl -X POST -u admin:password \ -H "Content-Type: application/json" \ -d '{"strategy": "weighted_latency"}' \ https://api.vpn.yourdomain.com/api/load-balancer/strategy ``` ### 3. Metrics and Monitoring #### Get System Metrics ``` curl -u admin:password \ https://api.vpn.yourdomain.com/api/metrics ``` #### Health Check ``` curl -u admin:password \ https://api.vpn.yourdomain.com/api/health ``` ### 4. 
Speed Testing #### Run Speed Test ``` curl -X POST -u admin:password \ https://api.vpn.yourdomain.com/api/speed-test/vpn-us # Run speed test for UK node curl -X POST -u admin:password \ https://api.vpn.yourdomain.com/api/speed-test/vpn-uk ``` **Response:** ``` { "node_id": "vpn-us", "download_speed": 485.6, "upload_speed": 234.8, "latency": 15.2, "timestamp": "2024-01-15T10:30:00Z" } ``` ## SDK Examples ### Python SDK ``` import requests from typing import Dict, List, Optional class VPNController: def __init__(self, base_url: str, username: str, password: str): self.base_url = base_url.rstrip('/') self.auth = (username, password) def list_nodes(self) -> List[Dict]: """List all VPN nodes""" response = requests.get( f"{self.base_url}/api/nodes", auth=self.auth ) response.raise_for_status() return response.json()['nodes'] def start_node(self, country: str, city: Optional[str] = None) -> Dict: """Start a new VPN node""" data = {"country": country} if city: data["city"] = city response = requests.post( f"{self.base_url}/api/nodes/start", json=data, auth=self.auth ) response.raise_for_status() return response.json() def get_best_node(self, country: str) -> Dict: """Get the best node for a country""" response = requests.get( f"{self.base_url}/api/load-balancer/best-node/{country}", auth=self.auth ) response.raise_for_status() return response.json() # Usage vpn = VPNController('https://api.vpn.yourdomain.com', 'admin', 'password') nodes = vpn.list_nodes() best_us = vpn.get_best_node('us') best_uk = vpn.get_best_node('uk') ``` ### JavaScript/Node.js SDK ``` const axios = require('axios'); class VPNController { constructor(baseUrl, username, password) { this.client = axios.create({ baseURL: baseUrl, auth: { username: username, password: password } }); } async listNodes() { const response = await this.client.get('/api/nodes'); return response.data.nodes; } async startNode(country, city = null) { const data = { country }; if (city) data.city = city; const response = await 
this.client.post('/api/nodes/start', data); return response.data; } async getBestNode(country) { const response = await this.client.get(`/api/load-balancer/best-node/${country}`); return response.data; } } // Usage const vpn = new VPNController('https://api.vpn.yourdomain.com', 'admin', 'password'); const nodes = await vpn.listNodes(); const bestUS = await vpn.getBestNode('us'); const bestUK = await vpn.getBestNode('uk'); ``` ## Error Handling The API returns standard HTTP status codes: | Status Code | Description | |-------------|-------------| | 200 | Success | | 201 | Created | | 400 | Bad Request | | 401 | Unauthorized | | 404 | Not Found | | 409 | Conflict (e.g., node already exists) | | 500 | Internal Server Error | ### Error Response Format ``` { "error": "Node not found", "detail": "No node with ID 'vpn-xyz' exists", "timestamp": "2024-01-15T10:30:00Z" } ``` ### Error Handling Examples === "Python" ``` try: response = vpn.start_node('us') except requests.exceptions.HTTPError as e: if e.response.status_code == 409: print("Node already exists") elif e.response.status_code == 401: print("Invalid credentials") else: print(f"Error: {e.response.json()['error']}") ``` === "JavaScript" ``` try { const response = await vpn.startNode('us'); } catch (error) { if (error.response) { if (error.response.status === 409) { console.log("Node already exists"); } else if (error.response.status === 401) { console.log("Invalid credentials"); } else { console.log(`Error: ${error.response.data.error}`); } } } ``` ## Rate Limiting API requests are rate limited to prevent abuse: - **Default Limit**: 100 requests per minute per IP - **Burst Limit**: 20 requests per second - **Headers**: Rate limit info in response headers ``` X-RateLimit-Limit: 100 X-RateLimit-Remaining: 95 X-RateLimit-Reset: 1642257600 ``` ## Webhooks Configure webhooks for real-time events: ``` curl -X POST -u admin:password \ -H "Content-Type: application/json" \ -d '{ "url": "https://your-webhook.com/vpn-events", 
"events": ["node.started", "node.stopped", "node.unhealthy"] }' \ https://api.vpn.yourdomain.com/api/webhooks ``` ### Webhook Events - `node.started` - VPN node successfully started - `node.stopped` - VPN node stopped - `node.unhealthy` - Node health check failed - `failover.triggered` - Automatic failover occurred - `speed.test.completed` - Speed test finished ## Best Practices !!! tip "API Usage Tips" 1. **Cache responses** when appropriate to reduce API calls 2. **Use bulk operations** when available 3. **Implement exponential backoff** for retries 4. **Monitor rate limits** to avoid throttling 5. **Use webhooks** for real-time updates instead of polling !!! warning "Security Best Practices" - Never hardcode credentials in your code - Use environment variables or secure vaults - Rotate API credentials regularly - Implement request signing for sensitive operations - Use HTTPS for all API communications ## API Playground Try the API directly from your browser:

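If you register webhooks as shown above, you will need an HTTP endpoint to receive them. Below is a minimal receiver sketch using only the standard library; the payload shape (a JSON body with `event` and `node_id` fields) is an assumption for illustration — check the API reference for the exact schema your deployment sends.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Event names from the list documented above.
KNOWN_EVENTS = {
    "node.started",
    "node.stopped",
    "node.unhealthy",
    "failover.triggered",
    "speed.test.completed",
}

def handle_event(payload: dict) -> str:
    """Dispatch a webhook payload; returns a short status string."""
    event = payload.get("event")
    if event not in KNOWN_EVENTS:
        return "ignored"
    if event == "node.unhealthy":
        # e.g. page the on-call engineer or trigger your own failover logic
        return f"alerting for {payload.get('node_id', 'unknown')}"
    return "ok"

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        status = handle_event(payload)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(status.encode())

def run(port: int = 9000) -> None:
    """Start the receiver (blocks). Call run() from your own entry point."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Returning a 200 promptly and doing the real work asynchronously is good practice, since webhook senders typically retry on non-2xx responses or timeouts.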
## Next Steps - 📚 View complete API Reference - 🔧 Learn about Configuration Options - 📊 Explore Metrics and Monitoring - 🚀 Check out SDK Examples --- !!! question "Need Help?" Check our API Reference for detailed endpoint documentation or contact support for assistance. --- ## Guide > Configuration ### Configuration Guide This guide covers all configuration options for VPN Exit Controller, including environment variables, service settings, and advanced tuning parameters. ## Configuration Overview VPN Exit Controller uses a hierarchical configuration system: 1. **Environment Variables** (`.env` file) 2. **Service Configuration** (systemd, Docker) 3. **Application Settings** (API, load balancer, etc.) 4. **Runtime Configuration** (via API) ## Environment Variables ### Essential Configuration Create a `.env` file in the project root: ``` # Copy template cp .env.example .env # Edit configuration nano .env ``` ### Core Settings #### NordVPN Configuration ``` # Service credentials from NordVPN dashboard NORDVPN_USER=your_service_username NORDVPN_PASS=your_service_password # Optional: Preferred protocol NORDVPN_PROTOCOL=udp # or tcp NORDVPN_TECHNOLOGY=openvpn_udp # or nordlynx ``` !!! info "Getting NordVPN Credentials" 1. Log in to NordVPN Dashboard 2. Navigate to Manual Configuration 3. Generate service credentials 4. 
Use these credentials (not your account login) #### Tailscale Configuration ``` # Auth key for automatic node registration (use ephemeral keys for auto-cleanup) TAILSCALE_AUTH_KEY=tskey-auth-xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxx # Optional: Custom hostname prefix TAILSCALE_HOSTNAME_PREFIX=vpn-exit # Optional: Exit node advertisement TAILSCALE_ADVERTISE_EXIT_NODE=true TAILSCALE_ADVERTISE_ROUTES=10.0.0.0/8,192.168.0.0/16 # DNS Configuration (Important: prevents HTTPS errors in incognito mode) TAILSCALE_ACCEPT_DNS=false # Disables Tailscale DNS override ``` #### API Configuration ``` # API Authentication API_USERNAME=admin API_PASSWORD=strong_secure_password_here # API Server Settings API_HOST=0.0.0.0 API_PORT=8080 API_WORKERS=4 API_RELOAD=false # Set to true for development # CORS Settings API_CORS_ORIGINS=["https://vpn-docs.rbnk.uk", "https://admin.rbnk.uk"] ``` #### Redis Configuration ``` # Redis Connection REDIS_HOST=localhost REDIS_PORT=6379 REDIS_DB=0 REDIS_PASSWORD=redis_password_if_set # Redis Settings REDIS_MAX_CONNECTIONS=50 REDIS_DECODE_RESPONSES=true REDIS_SOCKET_TIMEOUT=5 REDIS_CONNECTION_TIMEOUT=10 ``` #### Proxy Server Configuration ``` # Proxy service settings PROXY_HTTP_PORT=3128 # Squid HTTP/HTTPS proxy port PROXY_SOCKS_PORT=1080 # Dante SOCKS5 proxy port PROXY_HEALTH_PORT=8080 # Health check endpoint port # DNS Configuration for VPN containers VPN_DNS_PRIMARY=103.86.96.100 # NordVPN DNS server 1 VPN_DNS_SECONDARY=103.86.99.100 # NordVPN DNS server 2 VPN_DNS_FALLBACK_1=8.8.8.8 # Google DNS fallback 1 VPN_DNS_FALLBACK_2=1.1.1.1 # Google DNS fallback 2 # Squid proxy settings SQUID_ACCESS_LOG=none # Disable access logging for privacy SQUID_CACHE_ENABLED=false # Disable caching for privacy SQUID_MAX_CONNECTIONS=1000 # Maximum concurrent connections # SOCKS5 proxy settings DANTE_MAX_CONNECTIONS=1000 # Maximum concurrent connections DANTE_LOG_LEVEL=error # Logging level (error, warning, info, debug) ``` !!! 
info "DNS Resolution Fix" The VPN containers are configured with specific DNS servers to resolve the "doesn't support secure connection" errors that occurred in incognito mode: 1. **Primary**: NordVPN DNS servers (103.86.96.100, 103.86.99.100) 2. **Fallback**: Google DNS (8.8.8.8, 1.1.1.1) if NordVPN DNS fails 3. **Tailscale DNS Disabled**: `--accept-dns=false` prevents conflicts ### Advanced Settings #### Load Balancing ``` # Default strategy: round_robin, least_connections, weighted_latency, random, health_score DEFAULT_LOAD_BALANCING_STRATEGY=health_score # Auto-scaling AUTO_SCALING_ENABLED=true AUTO_SCALING_MIN_NODES=1 AUTO_SCALING_MAX_NODES=5 AUTO_SCALING_TARGET_CPU=70 AUTO_SCALING_TARGET_CONNECTIONS=100 # Connection limits MAX_CONNECTIONS_PER_NODE=50 CONNECTION_DRAIN_TIMEOUT=30 ``` #### Health Monitoring ``` # Health check intervals (seconds) HEALTH_CHECK_INTERVAL=30 HEALTH_CHECK_TIMEOUT=10 HEALTH_CHECK_RETRIES=3 HEALTH_CHECK_BACKOFF_FACTOR=2 # Failover settings FAILOVER_ENABLED=true FAILOVER_THRESHOLD=3 # Failed health checks before failover FAILOVER_COOLDOWN=300 # Seconds before retry ``` #### Speed Testing ``` # Speed test configuration SPEED_TEST_ENABLED=true SPEED_TEST_INTERVAL=3600 # Run every hour SPEED_TEST_TIMEOUT=60 SPEED_TEST_SERVERS=["fast.com", "speedtest.net", "google.com"] # Test file sizes SPEED_TEST_DOWNLOAD_SIZE=10MB SPEED_TEST_UPLOAD_SIZE=5MB ``` #### Metrics and Logging ``` # Logging LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR, CRITICAL LOG_FORMAT=json # json or text LOG_FILE=/var/log/vpn-controller/app.log LOG_ROTATION=daily LOG_RETENTION_DAYS=30 # Metrics METRICS_ENABLED=true METRICS_RETENTION_HOURS=168 # 7 days METRICS_AGGREGATION_INTERVAL=60 # seconds ``` #### Security Settings ``` # API Security API_RATE_LIMIT_ENABLED=true API_RATE_LIMIT_PER_MINUTE=100 API_RATE_LIMIT_BURST=20 # IP Whitelisting (comma-separated) API_WHITELIST_IPS=10.0.0.0/8,192.168.0.0/16 API_BLACKLIST_IPS= # Session Management SESSION_TIMEOUT=3600 # 1 hour 
SESSION_SECURE_COOKIE=true SESSION_SAME_SITE=strict ``` ### Domain and SSL Configuration ``` # Domain settings DOMAIN=rbnk.uk API_SUBDOMAIN=vpn-api DOCS_SUBDOMAIN=vpn-docs # Cloudflare CF_API_TOKEN=your_cloudflare_api_token CF_ZONE_ID=your_zone_id CF_PROXY_ENABLED=true # SSL/TLS SSL_EMAIL=admin@yourdomain.com SSL_STAGING=false # Set to true for Let's Encrypt staging ``` ## Service Configuration ### Systemd Service Edit `/etc/systemd/system/vpn-controller.service`: ``` [Unit] Description=VPN Exit Controller API After=network.target redis.service docker.service Wants=redis.service docker.service [Service] Type=exec User=root Group=docker WorkingDirectory=/opt/vpn-exit-controller # Environment Environment="PATH=/opt/vpn-exit-controller/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" EnvironmentFile=/opt/vpn-exit-controller/.env # Process management ExecStart=/opt/vpn-exit-controller/venv/bin/python -m uvicorn api.main:app --host 0.0.0.0 --port 8080 ExecReload=/bin/kill -s HUP $MAINPID ExecStop=/bin/kill -s TERM $MAINPID # Restart policy Restart=always RestartSec=10 RestartPreventExitStatus=0 # Resource limits LimitNOFILE=65536 LimitNPROC=4096 # Security PrivateTmp=true NoNewPrivileges=true [Install] WantedBy=multi-user.target ``` ### Docker Configuration #### Docker Compose Override Create `docker-compose.override.yml` for local settings: ``` version: '3.8' services: vpn-controller: environment: - LOG_LEVEL=DEBUG - API_RELOAD=true volumes: - ./custom-configs:/app/custom-configs ports: - "8081:8080" # Different port for development ``` #### Docker Resource Limits ``` services: vpn-controller: deploy: resources: limits: cpus: '2.0' memory: 2G reservations: cpus: '0.5' memory: 512M ``` ## HAProxy Configuration ### Load Balancer Tuning Edit `/opt/vpn-exit-controller/proxy/haproxy.cfg`: ``` global # Performance tuning maxconn 10000 nbproc 4 nbthread 8 cpu-map auto:1/1-8 0-7 # Timeouts timeout connect 5s timeout client 30s timeout server 30s timeout 
tunnel 1h # SSL/TLS tuning tune.ssl.default-dh-param 2048 ssl-default-bind-ciphers ECDHE+AESGCM:ECDHE+AES256:ECDHE+AES128 ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11 ``` ### Backend Configuration ``` backend proxy_us # Load balancing algorithm balance leastconn # or roundrobin, source, uri # Health checks option httpchk GET /health HTTP/1.1\r\nHost:\ localhost http-check expect status 200 # Connection settings option http-server-close option forwardfor http-reuse safe # Servers with advanced options server vpn-us-1 10.0.0.11:8888 check inter 5s rise 2 fall 3 weight 100 server vpn-us-2 10.0.0.12:8888 check inter 5s rise 2 fall 3 weight 100 backup ``` ## Traefik Configuration ### Dynamic Configuration Create `traefik/dynamic/vpn-controller.yml`: ``` http: routers: vpn-api: rule: "Host(`vpn-api.rbnk.uk`)" service: vpn-api entryPoints: - websecure tls: certResolver: cf middlewares: - rate-limit - security-headers services: vpn-api: loadBalancer: servers: - url: "http://localhost:8080" healthCheck: path: /api/health interval: 30s timeout: 10s middlewares: rate-limit: rateLimit: average: 100 burst: 50 period: 1m security-headers: headers: customFrameOptionsValue: SAMEORIGIN contentTypeNosniff: true browserXssFilter: true stsSeconds: 31536000 stsIncludeSubdomains: true stsPreload: true ``` ## Application Configuration ### API Settings Create `api/config.py` for application-specific settings: ``` from pydantic_settings import BaseSettings from typing import List, Optional class Settings(BaseSettings): # API Settings title: str = "VPN Exit Controller API" version: str = "1.0.0" description: str = "Professional VPN node management system" docs_url: str = "/api/docs" redoc_url: str = "/api/redoc" # Feature flags enable_metrics: bool = True enable_webhooks: bool = True enable_speed_tests: bool = True enable_auto_scaling: bool = True # Performance tuning connection_pool_size: int = 100 request_timeout: int = 30 background_task_workers: int = 4 class Config: env_file = 
".env" case_sensitive = False settings = Settings() ``` ### Runtime Configuration API Configure settings via API without restart: ``` # Update load balancing strategy curl -X PUT -u admin:password \ -H "Content-Type: application/json" \ -d '{"key": "load_balancing.strategy", "value": "health_score"}' \ https://api.vpn.yourdomain.com/api/config # Update health check interval curl -X PUT -u admin:password \ -H "Content-Type: application/json" \ -d '{"key": "health_check.interval", "value": 60}' \ https://api.vpn.yourdomain.com/api/config ``` ## Configuration Best Practices ### Environment Management 1. **Development Environment** ``` # .env.development LOG_LEVEL=DEBUG API_RELOAD=true SSL_STAGING=true ``` 2. **Production Environment** ``` # .env.production LOG_LEVEL=INFO API_RELOAD=false SSL_STAGING=false ``` 3. **Environment Loading** ``` # Load specific environment export ENV=production source .env.$ENV ``` ### Secret Management !!! warning "Security Best Practice" Never commit secrets to version control. Use secure secret management solutions. Options for secret management: 1. **HashiCorp Vault** ``` import hvac client = hvac.Client(url='https://vault.company.com') nordvpn_pass = client.read('secret/vpn/nordvpn')['data']['password'] ``` 2. **AWS Secrets Manager** ``` import boto3 client = boto3.client('secretsmanager') secret = client.get_secret_value(SecretId='vpn-controller-secrets') ``` 3. 
**Environment Variable Encryption**
```
# Encrypt secrets
echo "password" | openssl enc -aes-256-cbc -base64 -out secret.enc

# Decrypt at runtime
export API_PASSWORD=$(openssl enc -aes-256-cbc -d -base64 -in secret.enc)
```

### Configuration Validation

Validate configuration on startup:

```
import os

class ConfigurationError(Exception):
    pass

def validate_config():
    """Validate all configuration settings"""
    errors = []

    # Check required variables
    required = ['NORDVPN_USER', 'NORDVPN_PASS', 'TAILSCALE_AUTH_KEY']
    for var in required:
        if not os.getenv(var):
            errors.append(f"Missing required: {var}")

    # Validate formats
    if os.getenv('API_PORT'):
        try:
            port = int(os.getenv('API_PORT'))
            if not 1 <= port <= 65535:
                errors.append("Invalid port range")
        except ValueError:
            errors.append("API_PORT must be an integer")

    if errors:
        raise ConfigurationError("\n".join(errors))
```

## Monitoring Configuration

Use these commands to verify configuration:

```
# Check loaded environment
./scripts/check-config.sh

# Validate configuration
python -m api.config validate

# Test configuration changes
curl -u admin:password https://api.vpn.yourdomain.com/api/config/test
```

## Next Steps

- 🚀 Deploy to Production
- 🔒 Security Hardening
- 📊 Monitoring Setup
- 🔧 Troubleshooting

---

!!! tip "Configuration Tips"
    - Always use `.env.example` as a template
    - Keep production secrets in a secure vault
    - Monitor configuration changes with audit logs
    - Test configuration changes in staging first
    - Document all custom configuration options

---

## Guide

### User Guide

Welcome to the VPN Exit Controller User Guide! This section covers everything you need to know about using the system effectively.

## What You'll Learn
- :material-web:{ .lg .middle } __Using Proxy URLs__ --- Configure browsers and applications to use country-specific proxies :octicons-arrow-right-24: Proxy Usage Guide - :material-scale-balance:{ .lg .middle } __Load Balancing__ --- Understand and configure intelligent load balancing strategies :octicons-arrow-right-24: Load Balancing Guide - :material-api:{ .lg .middle } __API Usage__ --- Integrate with the REST API for programmatic control :octicons-arrow-right-24: API Usage Guide - :material-cog:{ .lg .middle } __Configuration__ --- Advanced configuration options and environment variables :octicons-arrow-right-24: Configuration Guide
## Quick Overview ### Proxy URLs Access VPN exit nodes through simple proxy URLs: ``` https://proxy-us.rbnk.uk # United States https://proxy-uk.rbnk.uk # United Kingdom https://proxy-jp.rbnk.uk # Japan ``` ### Load Balancing Strategies | Strategy | Description | Best For | |----------|-------------|----------| | **Round Robin** | Equal distribution | Balanced load | | **Least Connections** | Route to least busy | High traffic | | **Weighted Latency** | Favor fastest nodes | Performance | | **Random** | Random selection | Testing | | **Health Score** | AI-based routing | Optimal performance | ### API Authentication All API requests require authentication: ``` curl -u admin:password https://api.vpn.yourdomain.com/api/nodes ``` ## Common Use Cases ### 1. Browser Configuration Configure your browser to use a specific country proxy: === "Chrome" ``` Settings → Advanced → System → Open proxy settings HTTP Proxy: proxy-us.rbnk.uk Port: 443 ``` === "Firefox" ``` Settings → Network Settings → Manual proxy HTTP Proxy: proxy-uk.rbnk.uk Port: 443 ``` ### 2. Application Integration Integrate proxy URLs in your applications: === "Python" ``` import requests proxies = { 'http': 'https://proxy-jp.rbnk.uk', 'https': 'https://proxy-jp.rbnk.uk' } response = requests.get('https://ipinfo.io', proxies=proxies) ``` === "Node.js" ``` const axios = require('axios'); const response = await axios.get('https://ipinfo.io', { proxy: { protocol: 'https', host: 'proxy-de.rbnk.uk', port: 443 } }); ``` ### 3. Load Balancing Control Select the optimal node for your needs: ``` # Get best node for US curl -u admin:password \ https://api.vpn.yourdomain.com/api/load-balancer/best-node/us # Change strategy to health score curl -X POST -u admin:password \ -H "Content-Type: application/json" \ -d '{"strategy": "health_score"}' \ https://api.vpn.yourdomain.com/api/load-balancer/strategy ``` ## Best Practices !!! 
tip "Performance Tips" - Use the health score strategy for optimal performance - Monitor node metrics to identify performance issues - Rotate between nodes to distribute load - Use geographic proximity for lowest latency !!! warning "Security Considerations" - Always use HTTPS proxy URLs - Rotate API credentials regularly - Monitor access logs for unusual activity - Implement IP whitelisting for sensitive operations ## Need Help? - 📖 Check the detailed guides in this section - 🔧 Review Troubleshooting Guide - 💬 Ask a Question - 📧 Contact Support --- !!! success "Ready to dive deeper?" Start with the Proxy Usage Guide to learn how to configure your applications for VPN access. --- ## Guide > Load Balancing ### Load Balancing System Documentation ## Table of Contents 1. Load Balancing Overview 2. Load Balancing Strategies 3. Health Score Algorithm 4. Speed Testing Integration 5. Failover Logic 6. Configuration and Tuning 7. Monitoring and Metrics 8. Advanced Features 9. API Reference 10. Troubleshooting ## Load Balancing Overview ### Purpose and Benefits The VPN Exit Controller implements intelligent load balancing to distribute traffic across multiple VPN exit nodes within each country. 
This provides several key benefits: - **High Availability**: Automatic failover when nodes become unhealthy - **Performance Optimization**: Route traffic to the fastest available nodes - **Scalability**: Automatic scaling based on connection load - **Resource Efficiency**: Optimal utilization of compute resources - **Geographic Distribution**: Balanced load across different VPN servers ### Integration with Failover Systems The load balancer works closely with the failover manager to ensure service continuity: ``` # Example: Load balancer + failover integration if not healthy_nodes: # Trigger failover to different VPN server await failover_manager.handle_node_failure(node_id, "no_healthy_nodes") # Recheck for healthy nodes after failover healthy_nodes = self._get_healthy_nodes_for_country(country) ``` ## Load Balancing Strategies The system supports five distinct load balancing strategies, each optimized for different scenarios: ### 1. Round Robin Strategy **Purpose**: Simple, fair distribution of connections across all healthy nodes. **Algorithm**: ``` async def _round_robin_select(self, nodes: List[Dict], country: str) -> Dict: """Round-robin selection""" if country not in self.round_robin_counters: self.round_robin_counters[country] = 0 selected_index = self.round_robin_counters[country] % len(nodes) self.round_robin_counters[country] += 1 return nodes[selected_index] ``` **Best For**: - Evenly distributed workloads - Testing scenarios - When all nodes have similar performance characteristics **Characteristics**: - Maintains per-country counters - Guarantees fair distribution - No performance consideration ### 2. Least Connections Strategy **Purpose**: Route new connections to the node with the fewest active connections. 
**Algorithm**: ``` async def _least_connections_select(self, nodes: List[Dict], country: str) -> Dict: """Select node with least connections""" node_connections = [] for node in nodes: connection_count = redis_manager.get_connection_count(node['id']) node_connections.append((node, connection_count)) # Sort by connection count (ascending) node_connections.sort(key=lambda x: x[1]) return node_connections[0][0] ``` **Best For**: - Long-lived connections - Scenarios where connection duration varies significantly - Optimizing connection distribution **Monitoring**: ``` # Check connection counts via API curl -u admin:password http://localhost:8080/api/load-balancer/stats ``` ### 3. Weighted Latency Strategy **Purpose**: Route traffic based on server latency with weighted randomization. **Algorithm**: ``` async def _weighted_latency_select(self, nodes: List[Dict], country: str) -> Dict: """Select based on weighted latency scores""" node_scores = [] for node in nodes: # Get server latency from Redis server_health = redis_manager.get_server_health(node.get('vpn_server', '')) latency = server_health.get('latency', 100) if server_health else 100 # Lower latency = higher weight weight = max(1, 200 - latency) # Weight between 1-199 node_scores.append((node, weight)) # Weighted random selection total_weight = sum(score[1] for score in node_scores) random_point = random.uniform(0, total_weight) current_weight = 0 for node, weight in node_scores: current_weight += weight if current_weight >= random_point: return node ``` **Weight Calculation**: - Latency 50ms → Weight 150 - Latency 100ms → Weight 100 - Latency 150ms → Weight 50 - Latency 200ms+ → Weight 1 **Best For**: - Latency-sensitive applications - Real-time communications - Gaming or streaming workloads ### 4. Random Strategy **Purpose**: Randomly distribute connections for simple load distribution. 
**Algorithm**: ``` async def _random_select(self, nodes: List[Dict], country: str) -> Dict: """Random selection""" return random.choice(nodes) ``` **Best For**: - Simple load distribution - Development and testing - When other strategies are not applicable ### 5. Health Score Strategy (Default) **Purpose**: Select nodes based on comprehensive health scores considering multiple factors. **Algorithm**: ``` async def _health_score_select(self, nodes: List[Dict], country: str) -> Dict: """Select based on comprehensive health score""" node_scores = [] for node in nodes: score = await self._calculate_node_health_score(node) node_scores.append((node, score)) # Sort by score (descending - higher is better) node_scores.sort(key=lambda x: x[1], reverse=True) return node_scores[0][0] ``` ## Health Score Algorithm The health score algorithm provides a comprehensive assessment of node performance by weighing multiple factors: ### Score Calculation ``` async def _calculate_node_health_score(self, node: Dict) -> float: """Calculate comprehensive health score for a node""" score = 100.0 # Start with perfect score # Factor 1: Server latency (40% weight) server_health = redis_manager.get_server_health(node.get('vpn_server', '')) if server_health: latency = server_health.get('latency', 100) # Score: 50ms=100, 100ms=75, 200ms+=50 (floor) latency_score = max(50, 100 - (latency - 50) * 0.5) score = score * 0.6 + latency_score * 0.4 # Factor 2: Connection count (30% weight) connection_count = redis_manager.get_connection_count(node['id']) # Penalize high connection counts connection_penalty = min(20, connection_count * 2) connection_score = max(60, 100 - connection_penalty) score = score * 0.7 + connection_score * 0.3 # Factor 3: CPU usage (20% weight) stats = node.get('stats', {}) cpu_percent = stats.get('cpu_percent', 0) cpu_score = max(60, 100 - cpu_percent) score = score * 0.8 + cpu_score * 0.2 # Factor 4: Memory usage (10% weight) memory_mb = stats.get('memory_mb', 0) # Penalize if using >
300MB memory_penalty = max(0, (memory_mb - 300) / 10) memory_score = max(70, 100 - memory_penalty) score = score * 0.9 + memory_score * 0.1 return score ``` ### Scoring Factors | Factor | Weight | Description | Range | |--------|--------|-------------|-------| | **Server Latency** | 40% | Network latency to VPN server | 50-100 | | **Connection Count** | 30% | Number of active connections | 60-100 | | **CPU Usage** | 20% | Container CPU utilization | 60-100 | | **Memory Usage** | 10% | Container memory consumption | 70-100 | ### Score Interpretation - **90-100**: Excellent performance, optimal for routing - **80-89**: Good performance, suitable for most traffic - **70-79**: Acceptable performance, may experience delays - **60-69**: Poor performance, consider failover - **<60**: Critical issues, automatic failover triggered ### Health Score Thresholds ``` # Configuration examples EXCELLENT_THRESHOLD = 90.0 GOOD_THRESHOLD = 80.0 ACCEPTABLE_THRESHOLD = 70.0 POOR_THRESHOLD = 60.0 CRITICAL_THRESHOLD = 50.0 ``` ## Speed Testing Integration Speed testing provides crucial data for load balancing decisions through comprehensive performance evaluation. 
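The score-interpretation bands above collapse naturally into a small lookup helper. The sketch below is not part of the controller codebase (the function name `classify_health_score` is ours); only the thresholds and tier names come from the table:

```python
# Hypothetical helper: map a 0-100 health score from
# _calculate_node_health_score to the documented tier.
def classify_health_score(score: float) -> str:
    """Return the documented tier for a node health score."""
    if score >= 90:
        return "excellent"   # optimal for routing
    if score >= 80:
        return "good"        # suitable for most traffic
    if score >= 70:
        return "acceptable"  # may experience delays
    if score >= 60:
        return "poor"        # consider failover
    return "critical"        # automatic failover triggered
```

A monitoring dashboard or alert rule could use such a mapping to translate raw scores into the tiers used throughout this guide.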
### Speed Test Components #### Download Speed Testing ``` async def _test_download_speed(self, node_id: str, test_url: str) -> Dict: """Test download speed by downloading a file inside the container""" # Create curl command to test download speed curl_cmd = [ "curl", "-s", "-w", "%{time_total},%{speed_download},%{size_download}", "-o", "/dev/null", "--max-time", "60", # 60 second timeout test_url ] container = self.docker_manager.client.containers.get(node_id) result = container.exec_run(curl_cmd, demux=False) output = result.output.decode().strip() # Parse results and convert to Mbps time_total, speed_download, size_download = output.split(',') mbps = (float(speed_download) * 8) / (1024 * 1024) return { 'mbps': mbps, 'time_seconds': float(time_total), 'size_bytes': float(size_download) } ``` #### Latency Testing ``` async def _test_latency(self, node_id: str) -> Dict: """Test latency to multiple endpoints""" ping_endpoints = [ "https://www.google.com", "https://www.cloudflare.com", "https://www.github.com", "https://httpbin.org/ip" ] # Test each endpoint and calculate average successful_tests = [] for endpoint in ping_endpoints: # Use curl -w '%{time_connect}' to obtain connect_time in seconds latency_ms = float(connect_time) * 1000 successful_tests.append(latency_ms) avg_latency = sum(successful_tests) / len(successful_tests) return {'avg_latency': avg_latency, 'tests': successful_tests} ``` ### Speed Test Scheduling ``` # Automatic speed testing async def schedule_speed_tests(): """Run speed tests on all nodes every hour""" while True: try: results = await speed_tester.test_all_nodes("1MB") logger.info(f"Speed tests completed: {len(results)} nodes tested") except Exception as e: logger.error(f"Speed test cycle failed: {e}") await asyncio.sleep(3600) # 1 hour interval ``` ### Historical Data Usage Speed test results are stored in Redis with time-series data: ``` def _store_speed_test_result(self, node_id: str, result: Dict): """Store speed test result in Redis""" # Store latest result (1 hour TTL) key = f"speedtest:{node_id}:latest"
redis_manager.client.setex(key, 3600, json.dumps(result)) # Store in history (keep last 24 hours) history_key = f"speedtest:{node_id}:history" timestamp = datetime.utcnow().timestamp() redis_manager.client.zadd(history_key, {json.dumps(result): timestamp}) # Remove old entries (older than 24 hours) cutoff = (datetime.utcnow() - timedelta(hours=24)).timestamp() redis_manager.client.zremrangebyscore(history_key, 0, cutoff) ``` ### Performance Trend Analysis ``` def analyze_performance_trends(node_id: str) -> Dict: """Analyze performance trends over time""" history = speed_tester.get_speed_test_history(node_id, hours=24) if len(history) < 2: return {"trend": "insufficient_data"} speeds = [h['download_mbps'] for h in history if 'download_mbps' in h] latencies = [h['latency_ms'] for h in history if 'latency_ms' in h] # Calculate trends speed_trend = "improving" if speeds[-1] > speeds[0] else "degrading" latency_trend = "improving" if latencies[-1] < latencies[0] else "degrading" return { "speed_trend": speed_trend, "latency_trend": latency_trend, "avg_speed_24h": sum(speeds) / len(speeds), "avg_latency_24h": sum(latencies) / len(latencies) } ``` ## Failover Logic The failover system ensures service continuity when nodes become unhealthy or disconnected. ### Automatic Failover Triggers 1. **VPN Connection Failure**: Node loses connection to VPN server 2. **High Resource Usage**: CPU > 90% or Memory > 1GB for 5 minutes 3. **Network Connectivity Issues**: Cannot reach test endpoints 4. 
**Container Health Check Failure**: Docker health checks fail ### Failover Process ``` async def handle_node_failure(self, node_id: str, failure_reason: str) -> bool: """Handle a failed node by attempting failover to a different server""" # Check if failover already in progress if node_id in self.failover_in_progress: return False self.failover_in_progress.add(node_id) try: # Get node details and alternative server node = self.docker_manager.get_node_details(node_id) country = node['country'] current_server = node['server'] # Check failover limits if not self._can_failover(node_id): return False # Get alternative server new_server = await self._get_alternative_server(country, current_server) if not new_server: return False # Perform failover success = await self._perform_failover(node_id, country, new_server) # Record attempt self._record_failover_attempt(node_id, country, current_server, new_server, success) return success finally: self.failover_in_progress.discard(node_id) ``` ### Failover Constraints ``` class FailoverManager: def __init__(self): self.max_failover_attempts = 3 # Max attempts per hour self.failover_cooldown = 300 # 5 minutes between attempts self.failover_history = {} # Track attempts per node ``` ### Server Selection for Failover ``` async def _get_alternative_server(self, country: str, exclude_server: str) -> Optional[str]: """Get an alternative server for failover""" # Get all servers for the country servers = vpn_server_manager.get_servers_for_country(country) # Filter out current and blacklisted servers available_servers = [ s for s in servers if s['hostname'] != exclude_server and not redis_manager.is_server_blacklisted(s['hostname']) ] # Sort by health score available_servers.sort(key=lambda s: s.get('health_score', 50), reverse=True) # Test top 3 servers for server in available_servers[:3]: success, latency = await vpn_server_manager.health_check_server(server['hostname']) if success: return server['hostname'] return 
available_servers[0]['hostname'] if available_servers else None ``` ### Recovery Procedures 1. **Immediate Recovery**: Stop failed container, start new one with different server 2. **Graceful Recovery**: Wait for existing connections to drain before switching 3. **Rollback Recovery**: Return to previous working server if new server fails ## Configuration and Tuning ### Load Balancing Parameters ``` # /opt/vpn-exit-controller/.env LOAD_BALANCER_STRATEGY=health_score LOAD_BALANCER_ENABLED=true MAX_NODES_PER_COUNTRY=3 AUTO_SCALE_ENABLED=true SCALE_UP_THRESHOLD=50 # connections per node SCALE_DOWN_THRESHOLD=10 # connections per node ``` ### Health Check Intervals ``` # Configuration in services/metrics_collector.py class MetricsCollector: def __init__(self, interval_seconds: int = 30): # Collect every 30 seconds self.interval = interval_seconds ``` ### Performance Thresholds ``` # CPU and memory thresholds for scaling decisions CPU_THRESHOLD_HIGH = 80.0 # Scale up trigger CPU_THRESHOLD_LOW = 20.0 # Scale down trigger MEMORY_THRESHOLD_HIGH = 500 # MB, scale up trigger MEMORY_THRESHOLD_LOW = 300 # MB, scale down trigger ``` ### Auto-scaling Configuration ``` async def start_additional_node_if_needed(self, country: str) -> bool: """Start additional node if load is high""" nodes = self._get_healthy_nodes_for_country(country) if not nodes: return False # Check if we need more capacity total_connections = sum(redis_manager.get_connection_count(n['id']) for n in nodes) avg_connections_per_node = total_connections / len(nodes) # Start new node if average > 50 connections per node and < 3 nodes if avg_connections_per_node > 50 and len(nodes) < 3: logger.info(f"High load detected for {country}, starting additional node") # Start new node... 
return True return False ``` ### Tuning Recommendations | Scenario | Strategy | Max Nodes | Thresholds | |----------|----------|-----------|------------| | **High Traffic** | `health_score` | 5 | Scale up: 30 conn/node | | **Low Latency** | `weighted_latency` | 3 | Scale up: 20 conn/node | | **Cost Optimized** | `least_connections` | 2 | Scale up: 80 conn/node | | **Testing** | `round_robin` | 3 | Scale up: 50 conn/node | ## Monitoring and Metrics ### Key Metrics Collection The system continuously collects metrics for load balancing decisions: ``` class MetricsCollector: """Background service that continuously collects metrics from all nodes""" async def _collect_node_metrics(self, node_id: str): """Collect metrics for a single node""" # Get detailed node info (includes Docker stats) node_details = self.docker_manager.get_node_details(node_id) # Check for anomalies if node_details.get('stats'): stats = node_details['stats'] # Alert on high resource usage if stats.get('cpu_percent', 0) > 80: logger.warning(f"High CPU usage on node {node_id}: {stats['cpu_percent']:.1f}%") if stats.get('memory_mb', 0) > 500: logger.warning(f"High memory usage on node {node_id}: {stats['memory_mb']:.1f}MB") ``` ### Load Balancing Statistics ``` async def get_load_balancing_stats(self) -> Dict: """Get comprehensive load balancing statistics""" stats = { 'strategies': [s.value for s in LoadBalancingStrategy], 'round_robin_counters': self.round_robin_counters, 'countries': {} } # Get stats per country all_nodes = self.docker_manager.list_nodes() countries = set(n['country'] for n in all_nodes) for country in countries: nodes = self._get_healthy_nodes_for_country(country) total_connections = sum(redis_manager.get_connection_count(n['id']) for n in nodes) stats['countries'][country] = { 'node_count': len(nodes), 'total_connections': total_connections, 'avg_connections_per_node': total_connections / len(nodes) if nodes else 0, 'nodes': [ { 'id': n['id'], 'server': n.get('vpn_server', 'unknown'),
'connections': redis_manager.get_connection_count(n['id']), 'tailscale_ip': n.get('tailscale_ip'), 'cpu_percent': n.get('stats', {}).get('cpu_percent', 0), 'health_score': await self._calculate_node_health_score(n) } for n in nodes ] } return stats ``` ### Performance Monitoring ``` # Monitor load balancing in real-time curl -u admin:password http://localhost:8080/api/load-balancer/stats | jq # Get speed test summary curl -u admin:password http://localhost:8080/api/speed-test/summary | jq # Monitor metrics curl -u admin:password http://localhost:8080/api/metrics/current | jq ``` ### Alert Conditions | Condition | Threshold | Action | |-----------|-----------|---------| | High CPU Usage | >80% for 5 min | Scale up or failover | | High Memory | >500MB | Scale up or failover | | High Connection Count | >100 per node | Scale up | | Low Speed | <10 Mbps | Investigate/failover | | High Latency | >200ms | Switch strategy or failover | | Node Down | Health check fails | Immediate failover | ### Reporting and Analysis ``` # Generate load balancing report async def generate_load_balancing_report(hours: int = 24) -> Dict: """Generate comprehensive load balancing report""" report = { 'period_hours': hours, 'generated_at': datetime.utcnow().isoformat(), 'summary': {}, 'by_country': {}, 'performance_trends': {}, 'recommendations': [] } # Analyze each country countries = get_all_countries() for country in countries: nodes = get_nodes_for_country(country) # Calculate statistics total_connections = sum(get_connection_count(n['id']) for n in nodes) avg_speed = calculate_avg_speed(nodes, hours) avg_latency = calculate_avg_latency(nodes, hours) report['by_country'][country] = { 'node_count': len(nodes), 'total_connections': total_connections, 'avg_speed_mbps': avg_speed, 'avg_latency_ms': avg_latency, 'failover_events': count_failover_events(country, hours) } # Generate recommendations if avg_speed < 20: report['recommendations'].append(f"Consider adding more nodes to {country} - low 
speed detected") if total_connections / len(nodes) > 50: report['recommendations'].append(f"Scale up {country} - high load detected") return report ``` ## Advanced Features ### Connection Affinity/Sticky Sessions ``` class ConnectionAffinity: """Manage connection affinity for consistent routing""" def __init__(self): self.client_node_map = {} # client_ip -> node_id self.affinity_timeout = 3600 # 1 hour async def get_affinity_node(self, client_ip: str, country: str) -> Optional[str]: """Get node with existing affinity for client""" affinity_key = f"affinity:{client_ip}:{country}" node_id = redis_manager.client.get(affinity_key) if node_id: # Check if node is still healthy healthy, _ = docker_manager.check_container_health(node_id) if healthy: # Refresh affinity timeout redis_manager.client.expire(affinity_key, self.affinity_timeout) return node_id else: # Remove stale affinity redis_manager.client.delete(affinity_key) return None async def set_affinity(self, client_ip: str, country: str, node_id: str): """Set client affinity to specific node""" affinity_key = f"affinity:{client_ip}:{country}" redis_manager.client.setex(affinity_key, self.affinity_timeout, node_id) ``` ### Geographic Routing Preferences ``` class GeographicRouter: """Route based on geographic preferences""" REGION_PREFERENCES = { 'americas': ['us', 'ca', 'br'], 'europe': ['de', 'uk', 'fr', 'nl'], 'asia': ['jp', 'sg', 'hk', 'au'], 'africa': ['za'], 'oceania': ['au', 'nz'] } async def get_preferred_country(self, client_region: str, requested_country: str) -> str: """Get preferred country based on client region""" # Return requested country if available and healthy if self.is_country_healthy(requested_country): return requested_country # Find alternative in same region preferred_countries = self.REGION_PREFERENCES.get(client_region, []) for country in preferred_countries: if self.is_country_healthy(country): logger.info(f"Routing {client_region} client to {country} instead of {requested_country}") 
return country # Fallback to any healthy country return self.get_any_healthy_country() ``` ### Custom Load Balancing Rules ``` class CustomLoadBalancingRules: """Implement custom load balancing rules""" def __init__(self): self.rules = [] def add_rule(self, rule: Dict): """Add custom routing rule""" self.rules.append({ 'id': str(uuid4()), 'name': rule['name'], 'condition': rule['condition'], 'action': rule['action'], 'priority': rule.get('priority', 100), 'enabled': rule.get('enabled', True) }) async def evaluate_rules(self, context: Dict) -> Optional[str]: """Evaluate rules and return target node""" # Sort by priority active_rules = sorted( [r for r in self.rules if r['enabled']], key=lambda x: x['priority'] ) for rule in active_rules: if self._matches_condition(rule['condition'], context): return await self._execute_action(rule['action'], context) return None def _matches_condition(self, condition: Dict, context: Dict) -> bool: """Check if context matches rule condition""" # Example conditions: # {"source_device": "iPhone", "domain": "*.streaming.com"} # {"time_range": "09:00-17:00", "country": "us"} # {"client_ip_range": "192.168.1.0/24"} for key, value in condition.items(): if key == 'source_device': if context.get('user_agent', '').find(value) == -1: return False elif key == 'domain': if not fnmatch.fnmatch(context.get('domain', ''), value): return False elif key == 'time_range': current_time = datetime.now().strftime('%H:%M') start, end = value.split('-') if not (start <= current_time <= end): return False return True ``` ### API-based Load Balancing Control ``` # Extended API endpoints for advanced control @router.post("/rules") async def create_load_balancing_rule(rule: CustomRule, user=Depends(verify_auth)): """Create custom load balancing rule""" custom_rules.add_rule(rule.dict()) return {"status": "rule_created", "rule": rule} @router.put("/strategy/{country}") async def set_country_strategy( country: str, strategy: LoadBalancingStrategy, 
user=Depends(verify_auth) ): """Set load balancing strategy for specific country""" load_balancer.set_country_strategy(country, strategy) return {"country": country, "strategy": strategy.value} @router.post("/rebalance/{country}") async def force_rebalance(country: str, user=Depends(verify_auth)): """Force rebalancing of connections in a country""" result = await load_balancer.rebalance_country(country) return {"country": country, "rebalanced_connections": result} @router.get("/prediction/{country}") async def get_load_prediction(country: str, hours: int = 1, user=Depends(verify_auth)): """Get load prediction for next N hours""" prediction = await load_balancer.predict_load(country, hours) return prediction ``` ## API Reference ### Load Balancer Endpoints #### Get Load Balancing Statistics ``` GET /api/load-balancer/stats Authorization: Basic ``` **Response:** ``` { "strategies": ["round_robin", "least_connections", "weighted_latency", "random", "health_score"], "round_robin_counters": {"us": 5, "uk": 2}, "countries": { "us": { "node_count": 2, "total_connections": 45, "avg_connections_per_node": 22.5, "nodes": [ { "id": "container_123", "server": "us5063.nordvpn.com", "connections": 25, "tailscale_ip": "100.73.33.15", "cpu_percent": 45.2, "health_score": 87.3 } ] } } } ``` #### Get Best Node for Country ``` GET /api/load-balancer/best-node/{country}?strategy=health_score Authorization: Basic ``` **Response:** ``` { "selected_node": { "id": "container_123", "country": "us", "server": "us5063.nordvpn.com", "tailscale_ip": "100.73.33.15", "health_score": 87.3 }, "strategy": "health_score", "country": "us" } ``` #### Scale Up Country ``` POST /api/load-balancer/scale-up/{country} Authorization: Basic ``` #### Scale Down Country ``` POST /api/load-balancer/scale-down/{country} Authorization: Basic ``` #### Get Available Strategies ``` GET /api/load-balancer/strategies Authorization: Basic ``` **Response:** ``` { "strategies": [ { "name": "round_robin", "description": 
"Distributes requests evenly across all healthy nodes" }, { "name": "least_connections", "description": "Routes to the node with fewest active connections" }, { "name": "weighted_latency", "description": "Routes based on server latency with weighted randomization" }, { "name": "random", "description": "Randomly selects from available healthy nodes" }, { "name": "health_score", "description": "Routes to node with best overall health score (CPU, memory, latency, connections)" } ] } ``` ## Troubleshooting ### Common Issues #### 1. No Healthy Nodes Available **Symptoms:** - API returns 404 "No healthy nodes available" - Load balancer cannot route traffic **Diagnosis:** ``` # Check node health curl -u admin:password http://localhost:8080/api/nodes/list | jq '.[] | select(.status == "running")' # Check container health docker ps --filter "label=vpn-exit-node" # Check VPN connections docker exec <container_id> curl -s ipinfo.io ``` **Solutions:** 1. Restart unhealthy containers: `docker restart <container_id>` 2. Check VPN credentials in `/opt/vpn-exit-controller/configs/auth.txt` 3. Verify network connectivity: `docker exec <container_id> ping 8.8.8.8` 4. Force failover: `curl -X POST http://localhost:8080/api/failover/force/<node_id>` #### 2. Load Imbalance **Symptoms:** - One node has significantly more connections than others - Performance degradation on overloaded nodes **Diagnosis:** ``` # Check connection distribution curl -u admin:password http://localhost:8080/api/load-balancer/stats | jq '.countries' # Check strategy curl -u admin:password http://localhost:8080/api/config | jq '.load_balancer' ``` **Solutions:** 1. Switch to `least_connections` strategy 2. Force rebalancing: `curl -X POST http://localhost:8080/api/load-balancer/rebalance/<country>` 3. Increase connection drain timeout 4. Add more nodes: `curl -X POST http://localhost:8080/api/load-balancer/scale-up/<country>` #### 3.
Frequent Failovers **Symptoms:** - High number of failover events in logs - Unstable node assignments **Diagnosis:** ``` # Check failover history curl -u admin:password http://localhost:8080/api/failover/status | jq # Check server health curl -u admin:password http://localhost:8080/api/speed-test/summary | jq ``` **Solutions:** 1. Increase failover cooldown period 2. Check VPN server stability 3. Review health check thresholds 4. Blacklist problematic servers #### 4. Poor Performance **Symptoms:** - Slow connection speeds - High latency **Diagnosis:** ``` # Run speed tests curl -X POST -u admin:password http://localhost:8080/api/speed-test/run-all # Check health scores curl -u admin:password http://localhost:8080/api/load-balancer/stats | jq '.countries[].nodes[].health_score' ``` **Solutions:** 1. Switch to `weighted_latency` strategy 2. Add more nodes in region 3. Use different VPN servers 4. Check network congestion ### Debug Commands ``` # Enable debug logging export LOG_LEVEL=DEBUG # Check Redis data redis-cli > KEYS speedtest:* > KEYS affinity:* > KEYS server_health:* # Monitor load balancer decisions journalctl -u vpn-controller -f | grep "load_balancer" # Test specific node curl -X POST -u admin:password http://localhost:8080/api/speed-test/node/<node_id> # Force strategy change curl -X PUT -u admin:password http://localhost:8080/api/load-balancer/strategy/<country> \ -H "Content-Type: application/json" \ -d '{"strategy": "health_score"}' ``` ### Performance Optimization Tips 1. **Strategy Selection:** - Use `health_score` for general purpose - Use `weighted_latency` for latency-sensitive apps - Use `least_connections` for long-lived connections 2. **Resource Tuning:** - Monitor CPU/memory usage patterns - Adjust scaling thresholds based on traffic - Set appropriate connection limits 3. **Network Optimization:** - Choose VPN servers close to users - Monitor and blacklist slow servers - Use multiple servers per country 4.
**Monitoring:** - Set up alerts for health score < 70 - Monitor failover frequency - Track connection distribution This comprehensive load balancing system ensures optimal performance, reliability, and scalability for the VPN Exit Controller infrastructure. --- ## Guide > Proxy Usage ### VPN Exit Controller - Usage Guide This guide explains the **dual-mode access** provided by the VPN Exit Controller: **Tailscale Exit Nodes** for network-level routing and **Proxy Services** for application-level routing. ## Overview The VPN Exit Controller provides two complementary approaches for routing traffic through VPN containers in different countries: 1. **🌐 Tailscale Exit Nodes**: Full network-level routing where entire devices/networks route through VPN containers 2. **🔗 Proxy Services**: Application-level routing where individual applications use HTTP/HTTPS/SOCKS5 proxies Both approaches use the same underlying VPN containers but provide different levels of integration and control. ### Architecture Summary ``` ┌─ Tailscale Exit Nodes ──────────────────────────┐ ┌─ Proxy Access ──────────────────────────────────┐ │ │ │ │ │ Device/Network → Tailscale → VPN Container │ │ Application → Tailscale → VPN Container │ │ (Exit Node) (NordVPN) │ │ (Proxy) (Squid/Dante) │ │ │ │ │ └──────────────────────────────────────────────────┘ └──────────────────────────────────────────────────┘ ↓ Internet (Country IP) ``` **VPN Container Services:** - **Tailscale Exit Node**: Full network routing via Tailscale mesh (`--advertise-exit-node`) - **Squid HTTP/HTTPS Proxy**: Port 3128 for web traffic (accessible via Tailscale IP) - **Dante SOCKS5 Proxy**: Port 1080 for application tunneling (accessible via Tailscale IP) - **Health Check Endpoint**: Port 8080 for container monitoring - **DNS Resolution**: Uses NordVPN DNS (103.86.96.100, 103.86.99.100) with fallback ## Choosing Your Approach ### 🌐 When to Use Tailscale Exit Nodes **Best for:** - Routing all traffic from a device through a specific 
country - Mobile devices (iPhone, Android) using Tailscale app - Docker containers or VMs that need VPN access - Development environments requiring consistent geo-location - Any scenario where you want "set it and forget it" VPN routing **Example: Route your entire laptop through Germany** ``` # List available exit nodes tailscale status --peers | grep exit-de # Enable Germany exit node tailscale up --exit-node=exit-de-server456 # All traffic now appears from Germany curl https://ipinfo.io/ip # Returns German IP ``` ### 🔗 When to Use Proxy Services **Best for:** - Specific applications that need different geo-locations - Web scraping with rotating country IPs - Testing geo-restricted content from multiple countries - Development/testing without affecting system-wide traffic - Applications that already support proxy configuration **Example: Test from multiple countries simultaneously** ``` # Test US endpoint via direct Tailscale proxy curl -x http://100.86.140.98:3128 https://api.example.com/us # Test German endpoint via different container curl -x http://100.72.45.23:3128 https://api.example.com/de # Test UK endpoint curl -x http://100.125.27.111:3128 https://api.example.com/uk ``` ### Getting Current VPN Container Information To discover available VPN containers and their Tailscale IPs: ``` # Get all active nodes with their Tailscale IPs curl -u admin:Bl4ckMagic!2345erver http://100.73.33.11:8080/api/nodes # Get optimal node for a specific country curl -u admin:Bl4ckMagic!2345erver http://100.73.33.11:8080/api/load-balancer/best-node/us # Get optimal UK node curl -u admin:Bl4ckMagic!2345erver http://100.73.33.11:8080/api/load-balancer/best-node/uk # List all available Tailscale exit nodes tailscale status --peers | grep "exit-" ``` ## 1. 
Proxy URL Format ### Base Domain Structure All proxy endpoints use the following domain pattern: ``` proxy-{country}.rbnk.uk ``` ### Available Countries and Codes | Country | Code | Proxy URL | Description | |---------|------|-----------|-------------| | United States | `us` | `proxy-us.rbnk.uk` | US-based exit nodes | | Germany | `de` | `proxy-de.rbnk.uk` | German exit nodes | | Japan | `jp` | `proxy-jp.rbnk.uk` | Japanese exit nodes | | United Kingdom | `uk` | `proxy-uk.rbnk.uk` | UK-based exit nodes | | Canada | `ca` | `proxy-ca.rbnk.uk` | Canadian exit nodes | | Australia | `au` | `proxy-au.rbnk.uk` | Australian exit nodes | | Netherlands | `nl` | `proxy-nl.rbnk.uk` | Dutch exit nodes | | France | `fr` | `proxy-fr.rbnk.uk` | French exit nodes | | Italy | `it` | `proxy-it.rbnk.uk` | Italian exit nodes | | Spain | `es` | `proxy-es.rbnk.uk` | Spanish exit nodes | | Switzerland | `ch` | `proxy-ch.rbnk.uk` | Swiss exit nodes | | Austria | `at` | `proxy-at.rbnk.uk` | Austrian exit nodes | | Belgium | `be` | `proxy-be.rbnk.uk` | Belgian exit nodes | | Czech Republic | `cz` | `proxy-cz.rbnk.uk` | Czech exit nodes | | Denmark | `dk` | `proxy-dk.rbnk.uk` | Danish exit nodes | | Hong Kong | `hk` | `proxy-hk.rbnk.uk` | Hong Kong exit nodes | | Hungary | `hu` | `proxy-hu.rbnk.uk` | Hungarian exit nodes | | Ireland | `ie` | `proxy-ie.rbnk.uk` | Irish exit nodes | | Norway | `no` | `proxy-no.rbnk.uk` | Norwegian exit nodes | | Poland | `pl` | `proxy-pl.rbnk.uk` | Polish exit nodes | | Romania | `ro` | `proxy-ro.rbnk.uk` | Romanian exit nodes | | Serbia | `rs` | `proxy-rs.rbnk.uk` | Serbian exit nodes | | Singapore | `sg` | `proxy-sg.rbnk.uk` | Singapore exit nodes | | Sweden | `se` | `proxy-se.rbnk.uk` | Swedish exit nodes | | Bulgaria | `bg` | `proxy-bg.rbnk.uk` | Bulgarian exit nodes | ### SSL/HTTPS Support All proxy endpoints support SSL/TLS encryption with valid certificates from Let's Encrypt via Cloudflare DNS challenge. ## 2. 
Proxy Protocols The VPN Exit Controller now provides multiple proxy protocols running inside each VPN container: ### HTTP/HTTPS Proxy (Port 3128) - Squid - **URL Format**: `http://:3128` - **Protocol**: HTTP/1.1 with HTTPS CONNECT support - **Service**: Squid proxy server - **Use Case**: Web browsing, API calls, general HTTP/HTTPS traffic - **Features**: - Header modification and anonymization - Caching disabled for privacy - Access control for Tailscale network (100.64.0.0/10) - SSL port filtering and security checks - **Example**: `curl -x http://100.86.140.98:3128 http://ipinfo.io/ip` ### SOCKS5 Proxy (Port 1080) - Dante - **URL Format**: `socks5://:1080` - **Protocol**: SOCKS5 - **Service**: Dante SOCKS server - **Use Case**: Application-level proxying, TCP traffic tunneling - **Features**: - Protocol-agnostic (works with any TCP application) - No HTTP header inspection - Full TCP tunnel support - **Example**: `curl --socks5 100.86.140.98:1080 http://ipinfo.io/ip` ### Health Check Endpoint (Port 8080) - **URL Format**: `http://:8080/health` - **Protocol**: HTTP/1.0 - **Use Case**: Container health monitoring, load balancing decisions - **Response**: Simple "OK" response for health checks - **Features**: - Lightweight HTTP server - Used by HAProxy for backend health checks - Always returns 200 OK if container is running ### Legacy Country-Specific URLs (Deprecated) The original country-specific proxy URLs (`proxy-{country}.rbnk.uk:8080`) are being phased out in favor of direct Tailscale IP access for better performance and reliability. ## 3. 
Client Configuration ### Browser Proxy Settings #### Chrome/Chromium ``` # HTTP proxy through Tailscale IP google-chrome --proxy-server="http://100.86.140.98:3128" # SOCKS5 proxy through Tailscale IP google-chrome --proxy-server="socks5://100.86.140.98:1080" # UK proxy examples google-chrome --proxy-server="http://100.125.27.111:3128" google-chrome --proxy-server="http://proxy-uk.rbnk.uk:8132" # Legacy country-specific (still supported) google-chrome --proxy-server="http://proxy-us.rbnk.uk:8080" ``` #### Firefox 1. Go to Settings → Network Settings 2. Select "Manual proxy configuration" 3. **Modern Setup (Recommended):** - HTTP Proxy: `100.86.140.98` Port: `3128` - HTTPS Proxy: `100.86.140.98` Port: `3128` - SOCKS5 Proxy: `100.86.140.98` Port: `1080` - **UK**: HTTP/HTTPS: `100.125.27.111` Port: `3128`, SOCKS5: `100.125.27.111` Port: `1080` 4. **Legacy Setup:** - HTTP Proxy: `proxy-us.rbnk.uk` Port: `8080` - HTTPS Proxy: `proxy-us.rbnk.uk` Port: `8443` - **UK**: HTTP: `proxy-uk.rbnk.uk` Port: `8132`, SOCKS5: `proxy-uk.rbnk.uk` Port: `1084` ### Command Line Examples #### cURL ``` # HTTP proxy (modern - direct Tailscale IP) curl -x http://100.86.140.98:3128 http://ipinfo.io/ip # HTTPS proxy (modern - same port for HTTP proxy with CONNECT) curl -x http://100.86.140.98:3128 https://ipinfo.io/ip # SOCKS5 proxy (modern - direct Tailscale IP) curl --socks5 100.86.140.98:1080 http://ipinfo.io/ip # Legacy country-specific URLs (still supported) curl -x http://proxy-us.rbnk.uk:8080 http://ipinfo.io/ip curl --socks5 proxy-us.rbnk.uk:1080 http://ipinfo.io/ip # Test with different countries curl -x http://proxy-uk.rbnk.uk:8132 http://ipinfo.io/ip curl -x http://proxy-de.rbnk.uk:8080 http://ipinfo.io/ip curl --socks5 proxy-uk.rbnk.uk:1084 http://ipinfo.io/ip ``` #### wget ``` # HTTP proxy wget -e use_proxy=yes -e http_proxy=proxy-us.rbnk.uk:8080 http://ifconfig.me # HTTPS proxy wget -e use_proxy=yes -e https_proxy=proxy-us.rbnk.uk:8443 https://ifconfig.me ``` ### Programming 
Language Examples #### Python (requests) ``` import requests # HTTP proxy proxies = { 'http': 'http://proxy-us.rbnk.uk:8080', 'https': 'https://proxy-us.rbnk.uk:8443' } response = requests.get('http://ifconfig.me', proxies=proxies) print(f"Your IP: {response.text}") # SOCKS5 proxy (requires PySocks) proxies = { 'http': 'socks5://proxy-us.rbnk.uk:1080', 'https': 'socks5://proxy-us.rbnk.uk:1080' } response = requests.get('http://ifconfig.me', proxies=proxies) print(f"Your IP: {response.text}") # With authentication proxies = { 'http': 'http://username:password@proxy-us.rbnk.uk:8080', 'https': 'https://username:password@proxy-us.rbnk.uk:8443' } ``` #### Node.js ``` const axios = require('axios'); const HttpsProxyAgent = require('https-proxy-agent'); const SocksProxyAgent = require('socks-proxy-agent'); // HTTP proxy const httpAgent = new HttpsProxyAgent('http://proxy-us.rbnk.uk:8080'); const response = await axios.get('http://ifconfig.me', { httpAgent }); console.log(`Your IP: ${response.data}`); // SOCKS5 proxy const socksAgent = new SocksProxyAgent('socks5://proxy-us.rbnk.uk:1080'); const response2 = await axios.get('http://ifconfig.me', { httpAgent: socksAgent }); console.log(`Your IP: ${response2.data}`); ``` #### Go ``` package main import ( "fmt" "io/ioutil" "net/http" "net/url" ) func main() { proxyURL, _ := url.Parse("http://proxy-us.rbnk.uk:8080") client := &http.Client{ Transport: &http.Transport{ Proxy: http.ProxyURL(proxyURL), }, } resp, err := client.Get("http://ifconfig.me") if err != nil { panic(err) } body, _ := ioutil.ReadAll(resp.Body) fmt.Printf("Your IP: %s\n", string(body)) } ``` ### System-wide Proxy Configuration #### Linux/macOS Environment Variables ``` export http_proxy=http://proxy-us.rbnk.uk:8080 export https_proxy=https://proxy-us.rbnk.uk:8443 export HTTP_PROXY=http://proxy-us.rbnk.uk:8080 export HTTPS_PROXY=https://proxy-us.rbnk.uk:8443 # SOCKS5 export all_proxy=socks5://proxy-us.rbnk.uk:1080 export 
ALL_PROXY=socks5://proxy-us.rbnk.uk:1080 ``` #### Windows ``` set http_proxy=http://proxy-us.rbnk.uk:8080 set https_proxy=https://proxy-us.rbnk.uk:8443 ``` ## 4. Authentication ### HTTP Basic Authentication The system supports HTTP Basic Authentication for API access. Credentials are managed through the VPN Exit Controller API. #### API Authentication Format ``` curl -u username:password -H "Content-Type: application/json" \ http://10.10.10.20:8080/api/proxy/urls ``` ### Credential Management - Credentials are stored in `/opt/vpn-exit-controller/configs/auth.txt` - API endpoints require authentication via the `verify_auth` dependency - Web UI uses credentials: `admin:Bl4ckMagic!2345erver` ### Proxy Authentication (if implemented) ``` # Python example with proxy authentication proxies = { 'http': 'http://username:password@proxy-us.rbnk.uk:8080', 'https': 'https://username:password@proxy-us.rbnk.uk:8443' } ``` ## 5. Load Balancing and Failover ### Automatic Load Balancing The system implements intelligent load balancing with multiple strategies: #### Available Strategies - **Health Score** (default): Combines latency, connection count, and server health - **Least Connections**: Routes to the server with fewest active connections - **Round Robin**: Distributes requests evenly across servers - **Weighted Latency**: Prioritizes servers with lower latency - **Random**: Randomly selects from healthy servers #### API Usage ``` # Get optimal proxy for a country curl -u admin:Bl4ckMagic!2345erver \ "http://10.10.10.20:8080/api/proxy/optimal/us?strategy=health_score" # Response example { "node_id": "vpn-us-node-1", "country": "us", "tailscale_ip": "100.73.33.15", "server": "us5063.nordvpn.com", "proxy_urls": { "http": "http://proxy-us.rbnk.uk:8080", "https": "https://proxy-us.rbnk.uk:8443", "socks5": "socks5://proxy-us.rbnk.uk:1080" }, "selected_strategy": "health_score" } ``` ### Failover Behavior - **Health Monitoring**: Continuous health checks every 10 seconds - 
**Automatic Failover**: Unhealthy nodes automatically removed from rotation - **Backup Servers**: Default backup servers activated when all primary nodes fail - **Connection Draining**: Graceful handling of existing connections during failover ### Performance Optimization Tips 1. **Connection Pooling**: Reuse connections when possible 2. **Country Selection**: Choose geographically closer countries for better latency 3. **Protocol Selection**: Use SOCKS5 for maximum compatibility, HTTP for web traffic 4. **Load Balancing**: Let the system handle load balancing rather than sticky sessions ## 6. Use Cases ### Geo-location Testing ``` # Test website from different countries curl -x http://proxy-us.rbnk.uk:8080 "https://ipinfo.io/json" curl -x http://proxy-uk.rbnk.uk:8080 "https://ipinfo.io/json" curl -x http://proxy-de.rbnk.uk:8080 "https://ipinfo.io/json" ``` ### Content Access by Region ``` import requests countries = ['us', 'uk', 'de', 'jp'] ports = {'us': 8080, 'uk': 8132, 'de': 8080, 'jp': 8080} for country in countries: port = ports[country] proxy = f'http://proxy-{country}.rbnk.uk:{port}' response = requests.get('https://example.com', proxies={'http': proxy, 'https': proxy}) print(f"{country.upper()}: {response.status_code}") ``` ### Web Scraping with Different IP Addresses ``` import requests import random countries = ['us', 'uk', 'de', 'ca', 'au'] ports = {'us': 8080, 'uk': 8132, 'de': 8080, 'ca': 8080, 'au': 8080} def get_random_proxy(): country = random.choice(countries) port = ports[country] return { 'http': f'http://proxy-{country}.rbnk.uk:{port}', 'https': f'http://proxy-{country}.rbnk.uk:{port}' } # Rotate proxies for each request for i in range(10): proxies = get_random_proxy() response = requests.get('https://httpbin.org/ip', proxies=proxies) print(f"Request {i+1}: {response.json()['origin']}") ``` ### Privacy and Anonymity ``` # Check your real IP curl http://ifconfig.me # Check IP through US proxy curl -x http://proxy-us.rbnk.uk:8080 
http://ifconfig.me # Check IP through different countries for country in us uk de jp; do echo -n "$country: " case $country in uk) port=8132 ;; *) port=8080 ;; esac curl -s -x http://proxy-$country.rbnk.uk:$port http://ifconfig.me done ``` ## 7. Performance Considerations ### Speed Test Results Interpretation The system includes built-in speed testing capabilities: ``` # Get speed test results curl -u admin:Bl4ckMagic!2345erver \ http://10.10.10.20:8080/api/speed-test/results # Run speed test for specific country curl -u admin:Bl4ckMagic!2345erver -X POST \ http://10.10.10.20:8080/api/speed-test/run/us ``` #### Performance Metrics - **Latency**: Round-trip time to proxy server - **Bandwidth**: Upload/download speeds through proxy - **Connection Success Rate**: Percentage of successful connections - **Health Score**: Combined metric for overall proxy performance ### Optimal Country Selection ``` import requests # Get proxy statistics auth = ('admin', 'Bl4ckMagic!2345erver') response = requests.get('http://10.10.10.20:8080/api/proxy/stats', auth=auth) stats = response.json() # Find country with best performance best_country = None best_score = 0 for country, urls in stats['available_proxy_urls'].items(): # Logic to determine best country based on your requirements pass ``` ### Connection Pooling Recommendations ``` import requests from requests.adapters import HTTPAdapter from urllib3.util.retry import Retry # Configure session with connection pooling session = requests.Session() # Retry strategy retry_strategy = Retry( total=3, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504], ) adapter = HTTPAdapter( pool_connections=10, pool_maxsize=20, max_retries=retry_strategy ) session.mount("http://", adapter) session.mount("https://", adapter) # Use session with proxy proxies = {'http': 'http://proxy-us.rbnk.uk:8080'} response = session.get('https://example.com', proxies=proxies) ``` ## 8. Troubleshooting ### Common Connection Issues #### 1. 
Proxy Connection Refused ``` # Check if proxy is running curl -I http://proxy-us.rbnk.uk:8080 # Check specific node health curl -u admin:Bl4ckMagic!2345erver \ http://10.10.10.20:8080/api/proxy/health ``` #### 2. DNS Resolution Issues ``` # Test DNS resolution nslookup proxy-us.rbnk.uk dig proxy-us.rbnk.uk # Use alternative DNS curl --dns-servers 8.8.8.8 -x http://proxy-us.rbnk.uk:8080 http://ifconfig.me ``` #### 3. Authentication Errors ``` # Test API authentication curl -u admin:Bl4ckMagic!2345erver \ http://10.10.10.20:8080/api/status # Check authentication headers curl -v -u admin:Bl4ckMagic!2345erver \ http://10.10.10.20:8080/api/proxy/urls ``` ### Debugging Proxy Problems #### Enable Verbose Logging ``` # Python requests debugging import logging import requests logging.basicConfig(level=logging.DEBUG) requests_log = logging.getLogger("requests.packages.urllib3") requests_log.setLevel(logging.DEBUG) requests_log.propagate = True ``` #### Test Proxy Connectivity ``` # Test basic connectivity nc -zv proxy-us.rbnk.uk 8080 # Test SOCKS5 connectivity nc -zv proxy-us.rbnk.uk 1080 # Test with timeout timeout 10 curl -x http://proxy-us.rbnk.uk:8080 http://ifconfig.me ``` #### Check HAProxy Statistics ``` # Access HAProxy stats (if enabled) curl http://10.10.10.20:8404/stats # Get detailed proxy statistics curl -u admin:Bl4ckMagic!2345erver \ http://10.10.10.20:8080/api/proxy/stats ``` ### Performance Troubleshooting #### 1. Slow Proxy Response ``` # Test latency to different countries for country in us uk de jp; do echo -n "$country: " case $country in uk) port=8132 ;; *) port=8080 ;; esac time curl -s -x http://proxy-$country.rbnk.uk:$port http://ifconfig.me >/dev/null done ``` #### 2. High Connection Failures ``` # Check node health across all countries curl -u admin:Bl4ckMagic!2345erver \ http://10.10.10.20:8080/api/nodes/status # Monitor connection metrics curl -u admin:Bl4ckMagic!2345erver \ http://10.10.10.20:8080/api/metrics/connections ``` #### 3. 
Load Balancing Issues ``` # Force different load balancing strategies curl -u admin:Bl4ckMagic!2345erver \ "http://10.10.10.20:8080/api/proxy/optimal/us?strategy=least_connections" curl -u admin:Bl4ckMagic!2345erver \ "http://10.10.10.20:8080/api/proxy/optimal/us?strategy=round_robin" ``` ## API Reference ### Get All Proxy URLs ``` GET /api/proxy/urls Authorization: Basic YWRtaW46Qmw0Y2tNYWdpYyEyMzQ1ZXJ2ZXI= Response: { "us": { "http": "http://proxy-us.rbnk.uk:8080", "https": "https://proxy-us.rbnk.uk:8443", "socks5": "socks5://proxy-us.rbnk.uk:1080" }, "uk": { ... } } ``` ### Get Country-Specific URLs ``` GET /api/proxy/urls/{country} Authorization: Basic YWRtaW46Qmw0Y2tNYWdpYyEyMzQ1ZXJ2ZXI= Response: { "country": "us", "proxy_urls": { "http": "http://proxy-us.rbnk.uk:8080", "https": "https://proxy-us.rbnk.uk:8443", "socks5": "socks5://proxy-us.rbnk.uk:1080" } } ``` ### Get Optimal Proxy ``` GET /api/proxy/optimal/{country}?strategy=health_score Authorization: Basic YWRtaW46Qmw0Y2tNYWdpYyEyMzQ1ZXJ2ZXI= Response: { "node_id": "vpn-us-node-1", "country": "us", "tailscale_ip": "100.73.33.15", "server": "us5063.nordvpn.com", "proxy_urls": { ... }, "selected_strategy": "health_score" } ``` ## Support and Monitoring ### Health Monitoring - **Endpoint**: `http://10.10.10.20:8080/api/proxy/health` - **HAProxy Stats**: `http://10.10.10.20:8404/stats` - **System Status**: `http://10.10.10.20:8080/api/status` ### Logs and Diagnostics - **Application Logs**: `journalctl -u vpn-controller -f` - **HAProxy Logs**: `/opt/vpn-exit-controller/proxy/logs/` - **Traefik Logs**: `/opt/vpn-exit-controller/traefik/logs/` For additional support or advanced configuration, refer to the main system documentation or contact the system administrator. 
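The precomputed `Authorization` values in the API reference above are plain RFC 7617 Basic encoding of the credentials from this guide. A minimal Python sketch of how to derive such a header yourself; `basic_auth_header` is an illustrative helper, not part of the controller's API:

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials as an HTTP Basic Authorization header value (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# The default credentials from this guide yield the exact header value
# used in the API reference examples:
print(basic_auth_header("admin", "Bl4ckMagic!2345erver"))
# Basic YWRtaW46Qmw0Y2tNYWdpYyEyMzQ1ZXJ2ZXI=
```

This is useful when a client cannot use a `-u`-style option: send the returned value in the `Authorization` request header instead.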
---

## Includes > Abbreviations

*[API]: Application Programming Interface
*[CDN]: Content Delivery Network
*[CLI]: Command Line Interface
*[CPU]: Central Processing Unit
*[DNS]: Domain Name System
*[GB]: Gigabyte
*[HTTP]: Hypertext Transfer Protocol
*[HTTPS]: Hypertext Transfer Protocol Secure
*[IP]: Internet Protocol
*[JSON]: JavaScript Object Notation
*[JWT]: JSON Web Token
*[LB]: Load Balancer
*[LLM]: Large Language Model
*[LXC]: Linux Containers
*[MB]: Megabyte
*[RAM]: Random Access Memory
*[REST]: Representational State Transfer
*[SDK]: Software Development Kit
*[SSD]: Solid State Drive
*[SSL]: Secure Sockets Layer
*[TCP]: Transmission Control Protocol
*[TLS]: Transport Layer Security
*[UDP]: User Datagram Protocol
*[UI]: User Interface
*[URL]: Uniform Resource Locator
*[VM]: Virtual Machine
*[VPN]: Virtual Private Network
*[YAML]: YAML Ain't Markup Language

---

### VPN Exit Controller

A sophisticated VPN management system that provides **dual-mode access** to country-specific VPN routing: **Tailscale Exit Nodes** for network-level routing and **Proxy URLs** for application-level routing. Features intelligent load balancing, automatic failover, and performance monitoring with a modern Next.js dashboard.

## 🚀 Overview

The VPN Exit Controller manages Docker-based VPN containers that function as both **Tailscale exit nodes** and **proxy servers** across multiple countries. This dual approach provides maximum flexibility:

- **🌐 Tailscale Exit Nodes**: Route entire networks or devices through VPN containers via Tailscale's mesh network
- **🔗 Proxy Endpoints**: Route individual applications through HTTP/HTTPS/SOCKS5 proxies for specific use cases
- **🤝 Complementary Approaches**: Use both simultaneously for different needs - network routing for general use, proxies for development/testing

## ✨ Key Features

### 🌐 Dual-Mode VPN Access

- **Tailscale Exit Nodes**: Full network-level routing through VPN containers in the Tailscale mesh
- **HTTP/HTTPS/SOCKS5 Proxies**: Application-level routing with direct Tailscale IP access
- **Legacy Proxy URLs**: Country-specific endpoints like `proxy-us.rbnk.uk`, `proxy-de.rbnk.uk`

### 🎛️ Management & Monitoring

- **Modern Web Dashboard**: Professional Next.js interface at `https://vpn.rbnk.uk` with real-time monitoring
- **⚖️ Intelligent Load Balancing**: 5 strategies including health-score based routing
- **🔄 Automatic Failover**: Seamless switching when nodes become unavailable
- **📊 Performance Monitoring**: Real-time speed testing and latency monitoring

### 🔧 Infrastructure

- **🔒 SSL Security**: Automatic certificate management with Let's Encrypt
- **🐳 Container-Based**: Docker containers with NordVPN + Tailscale mesh networking
- **📈 Auto-Scaling**: Automatic node scaling based on connection load
- **🛡️ Health Monitoring**: Comprehensive health checks and recovery procedures
- **🎨 Responsive Design**: Dashboard works on desktop, tablet, and mobile devices

## 🏗️ Architecture Overview

### Dual-Mode Access Architecture

```
┌─ Tailscale Exit Nodes ───────────────────────────┐  ┌─ Proxy URLs ──────────────────────────────────┐
│                                                  │  │                                               │
│  Device/Network → Tailscale Mesh → VPN Container │  │  Application → Cloudflare → Traefik → HAProxy │
│                   (100.x.x.x)    (NordVPN Exit)  │  │                (rbnk.uk)    (SSL)   (Routing) │
│                                                  │  │                        ↓                      │
└──────────────────────────────────────────────────┘  │                 VPN Container                 │
                                                      │             (Squid/Dante Proxies)             │
                                                      └───────────────────────────────────────────────┘
```

### Core Components

- **Next.js Dashboard**: Modern web interface for VPN node management and monitoring
- **FastAPI Application**: RESTful API for managing VPN nodes and load balancing
- **Docker VPN Containers**: Multi-service containers providing:
    - **Tailscale Exit Node**: Full network routing via the Tailscale mesh
    - **Squid HTTP/HTTPS Proxy**: Web traffic routing on port 3128
    - **Dante SOCKS5 Proxy**: Application-level tunneling on port 1080
    - **NordVPN Connection**: Secure VPN tunnel to country-specific servers
- **HAProxy**: Country-based proxy routing for legacy proxy URLs
- **Traefik**: SSL termination and reverse proxy with automatic certificates
- **Tailscale Mesh**: Secure networking for both exit nodes and direct proxy access
- **Redis**: Metrics storage and session state management

## 🚀 Quick Start

### Prerequisites

- Proxmox VE with LXC container support
- Ubuntu 22.04 LTS
- Docker and Docker Compose
- Node.js 18+ and npm (for the dashboard)
- NordVPN service credentials
- Cloudflare domain and API token

### Basic Setup

1. **Clone the repository**:

    ```
    git clone https://your-repo/vpn-exit-controller.git
    cd vpn-exit-controller
    ```

2. **Set up the Python environment**:

    ```
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    ```

3. **Configure environment variables**:

    ```
    cp .env.example .env
    # Edit .env with your NordVPN credentials, Tailscale auth key, etc.
    ```

4. **Start the services**:

    ```
    # Start infrastructure
    cd traefik && docker-compose -f docker-compose.traefik.yml up -d
    cd ../proxy && docker-compose up -d

    # Start the API
    systemctl start vpn-controller

    # Start the dashboard
    cd dashboard && docker-compose up -d
    ```

### Web Dashboard

Access the modern web dashboard at:

- **Production**: `https://vpn.rbnk.uk`
- **Local Development**: `http://localhost:3000`

The dashboard provides:

- **Real-time Monitoring**: Live updates every 3 seconds
- **Country Selection**: Visual grid with flags
- **One-click Controls**: Start, stop, restart nodes
- **Performance Metrics**: CPU, memory, network stats
- **Professional UI**: Dark mode with responsive design

### API Usage

```
# Dashboard endpoints (public, no auth required)
curl https://vpn.rbnk.uk/api/stats
curl https://vpn.rbnk.uk/api/countries
curl https://vpn.rbnk.uk/api/nodes

# Management endpoints (require authentication)
curl -u admin:Bl4ckMagic!2345erver https://vpn.rbnk.uk/api/status

# Start a VPN node
curl -X POST -u admin:Bl4ckMagic!2345erver \
  https://vpn.rbnk.uk/api/nodes/us/start \
  -H "Content-Type: application/json" \
  -d '{"server": "us9999.nordvpn.com"}'

# Get the best node for a country
curl -u admin:Bl4ckMagic!2345erver \
  https://vpn.rbnk.uk/api/load-balancer/best-node/us
```

## 🌍 Available Countries

The system supports VPN containers in 25+ countries, accessible via **both Tailscale exit nodes and proxy endpoints**:

| Country | Code | Tailscale Exit Node | Direct Proxy Access | Legacy Proxy URLs | Flag |
|---------|------|---------------------|---------------------|-------------------|------|
| United States | `us` | Route via Tailscale | `100.x.x.x:3128/1080` | `proxy-us.rbnk.uk` | 🇺🇸 |
| Germany | `de` | Route via Tailscale | `100.x.x.x:3128/1080` | `proxy-de.rbnk.uk` | 🇩🇪 |
| Japan | `jp` | Route via Tailscale | `100.x.x.x:3128/1080` | `proxy-jp.rbnk.uk` | 🇯🇵 |
| United Kingdom | `uk` | Route via Tailscale | `100.125.27.111:3128/1080` | `proxy-uk.rbnk.uk` | 🇬🇧 |
| And 20+ more... | | | | | |

*Note: Tailscale IPs are dynamic and can be discovered via the API.*

## 🔌 Usage Approaches

Choose the approach that best fits your use case:

### 🌐 Approach 1: Tailscale Exit Nodes (Recommended)

**Best for: Full network routing, device-level VPN, multiple applications**

```
# Enable Tailscale exit node routing (macOS/Linux)
tailscale up --exit-node=exit-us-server123

# All traffic from your device now routes through the US VPN container
curl https://ipinfo.io/ip  # Shows a US IP
```

**Benefits:**

- Routes **all** network traffic through the VPN
- Works with **any application** (no proxy configuration needed)
- Perfect for mobile devices, entire computers, or Docker containers
- Automatic DNS resolution through the VPN
- Zero application configuration required

### 🔗 Approach 2: Direct Proxy Access via Tailscale

**Best for: Development, testing, specific applications**

```
# Get current Tailscale IPs for active nodes
curl -u admin:password https://vpn.rbnk.uk/api/nodes

# Use a direct Tailscale IP for the HTTP proxy (discovered from the API)
curl -x http://100.86.140.98:3128 https://httpbin.org/ip

# Use the SOCKS5 proxy
curl --socks5 100.86.140.98:1080 https://httpbin.org/ip

# UK example
curl -x http://100.125.27.111:3128 https://httpbin.org/ip
```

**Benefits:**

- Direct connection to VPN containers via the Tailscale mesh
- No internet routing through proxy infrastructure
- Lower latency and better performance
- Ideal for development and scripting

### 🌍 Approach 3: Legacy Proxy URLs

**Best for: External access, non-Tailscale networks**

```
# Use legacy country-specific URLs
curl -x http://proxy-us.rbnk.uk:8080 https://httpbin.org/ip
curl --socks5 proxy-de.rbnk.uk:1080 https://httpbin.org/ip
curl -x http://proxy-uk.rbnk.uk:8132 https://httpbin.org/ip
```

**Benefits:**

- Accessible from any internet connection
- No Tailscale client required
- SSL/TLS termination via Traefik

### Browser Configuration

**For HTTP Proxy:**

1. Go to your browser's proxy settings
2. Select "Manual proxy configuration"
3. HTTP Proxy: `proxy-de.rbnk.uk` (or the desired country)
4. Port: `8129` (for Germany; `8132` for the UK; adjust for other countries)
5. Check "Use this proxy server for all protocols"
6. **No username/password required**

### Programming Examples

**Python with HTTP Proxy (No Auth)**:

```
import requests

# HTTP proxy - no authentication required
proxies = {
    'http': 'http://proxy-de.rbnk.uk:8129',
    'https': 'http://proxy-de.rbnk.uk:8129'
}

# UK proxy example
uk_proxies = {
    'http': 'http://proxy-uk.rbnk.uk:8132',
    'https': 'http://proxy-uk.rbnk.uk:8132'
}

response = requests.get('https://httpbin.org/ip', proxies=proxies)
print(response.json())
```

**Python with SOCKS5 Proxy**:

```
import requests

# SOCKS5 proxy - requires requests[socks]
proxies = {
    'http': 'socks5://proxy-jp.rbnk.uk:1082',
    'https': 'socks5://proxy-jp.rbnk.uk:1082'
}

# UK SOCKS5 example
uk_socks_proxies = {
    'http': 'socks5://proxy-uk.rbnk.uk:1084',
    'https': 'socks5://proxy-uk.rbnk.uk:1084'
}

response = requests.get('https://httpbin.org/ip', proxies=proxies)
print(response.json())
```

**Node.js with HTTP Proxy**:

```
const axios = require('axios');

const proxy = {
  host: 'proxy-de.rbnk.uk',
  port: 8129
  // No authentication required
};

// UK proxy example
const ukProxy = {
  host: 'proxy-uk.rbnk.uk',
  port: 8132
};

axios.get('https://httpbin.org/ip', { proxy })
  .then(response => console.log(response.data));
```

## 📁 Directory Structure

```
/opt/vpn-exit-controller/
├── dashboard/              # Next.js web dashboard
│   ├── src/                # Dashboard source code
│   ├── public/             # Static assets
│   ├── Dockerfile          # Dashboard container
│   └── docker-compose.yml  # Dashboard deployment
├── api/                    # FastAPI application
│   ├── main.py             # Main application entry point
│   ├── models/             # Data models and schemas
│   ├── routes/             # API route handlers
│   └── services/           # Business logic services
├── configs/                # VPN configuration files
├── traefik/                # Traefik reverse proxy configuration
│   ├── docker-compose.traefik.yml
│   ├── traefik.yml
│   └── dynamic/            # Dynamic configuration
├── proxy/                  # HAProxy configuration
│   ├── docker-compose.yml
│   └── haproxy.cfg
├── scripts/                # Utility scripts
├── venv/                   # Python virtual environment
├── .env                    # Environment variables
└── requirements.txt        # Python dependencies
```

## ⚙️ Configuration

### Environment Variables

Key configuration options in `.env`:

```
# NordVPN Credentials
NORDVPN_USER=your_service_username
NORDVPN_PASS=your_service_password

# Tailscale
TAILSCALE_AUTH_KEY=your_tailscale_auth_key

# Redis
REDIS_HOST=localhost
REDIS_PORT=6379

# API Authentication
API_USERNAME=admin
API_PASSWORD=Bl4ckMagic!2345erver

# Cloudflare
CF_API_TOKEN=your_cloudflare_api_token
```

### Advanced Configuration

- **Load Balancing Strategy**: Set via the API or environment variables
- **Health Check Intervals**: Configurable per-node monitoring
- **Auto-scaling Thresholds**: Connection-based scaling triggers
- **Speed Test Frequency**: Configurable performance monitoring

## 📊 Monitoring & Health Checks

### System Status

```
# Check overall system health
curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/health

# Get detailed metrics
curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics

# View active nodes
curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes
```

### Service Status

```
# Check the systemd service
systemctl status vpn-controller

# View logs
journalctl -u vpn-controller -f

# Check Docker containers
docker ps --filter name=vpn-exit
```

## 🔧 Troubleshooting

### Common Issues

**VPN Node Won't Start**:

```
# Check NordVPN credentials
docker logs vpn-exit-us

# Verify Tailscale connectivity
tailscale status
```

**Proxy Connection Fails**:

```
# Test the HAProxy configuration
docker exec vpn-proxy haproxy -c -f /usr/local/etc/haproxy/haproxy.cfg

# Check Traefik routing
curl -H "Host: proxy-us.rbnk.uk" http://localhost
```

**Load Balancing Issues**:

```
# Check Redis connectivity
redis-cli ping

# View load balancing stats
curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/load-balancer/stats
```

## 📚 Documentation

- 🎛️ **Dashboard Guide** - Complete dashboard documentation
- 📋 **API Documentation** - Complete API reference
- 🏗️ **Architecture Guide** - Technical architecture details
- 🚀 **Deployment Guide** - Setup and installation
- 🌐 **Proxy Usage** - How to use proxy URLs
- ⚖️ **Load Balancing** - Load balancing strategies
- 🔒 **Security Guide** - Security best practices
- 🔧 **Troubleshooting** - Common issues and solutions
- 🛠️ **Maintenance** - Operations and maintenance

## 👥 Development

### Local Development

**API Development:**

```
# Activate the virtual environment
source venv/bin/activate

# Install development dependencies
pip install -r requirements-dev.txt

# Run in development mode
uvicorn api.main:app --reload --host 0.0.0.0 --port 8080
```

**Dashboard Development:**

```
# Navigate to the dashboard directory
cd dashboard

# Install dependencies
npm install

# Start the development server
npm run dev

# Access at http://localhost:3000
```

### Testing

```
# Run unit tests
pytest tests/

# Run integration tests
pytest tests/integration/

# Test specific functionality
pytest tests/test_load_balancer.py -v
```

### Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/new-feature`
3. Make changes and add tests
4. Commit changes: `git commit -am 'Add new feature'`
5. Push to the branch: `git push origin feature/new-feature`
6. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
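The curl-based flow from the Quick Start (list nodes, then point an application at a node's Tailscale proxy ports) can also be scripted. Below is a minimal Python sketch using only the standard library. The `tailscale_ip` field name follows the `/api/proxy/optimal` response documented in the proxy guide, and `fetch_nodes`/`proxies_for` are illustrative helpers, not part of the controller; adjust them to the actual `/api/nodes` payload:

```python
import base64
import json
import urllib.request


def fetch_nodes(base_url: str, user: str, password: str) -> list:
    """Fetch the node list from the controller API (illustrative helper)."""
    req = urllib.request.Request(f"{base_url}/api/nodes")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def proxies_for(nodes: list, country: str) -> dict:
    """Build a requests-style proxies dict for the first running node in the
    given country, targeting the container's Squid port (3128)."""
    for node in nodes:
        if node.get("country") == country and node.get("status") == "running":
            ip = node["tailscale_ip"]  # assumed field; see /api/proxy/optimal
            return {"http": f"http://{ip}:3128", "https": f"http://{ip}:3128"}
    raise LookupError(f"no running node for country {country!r}")


# Payload shape based on the API examples in this documentation:
sample = [{"country": "us", "status": "running", "tailscale_ip": "100.86.140.98"}]
print(proxies_for(sample, "us"))
```

Pass the resulting dict to `requests.get(url, proxies=...)`, or export the values as `http_proxy`/`https_proxy` for tools that honor those variables.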
## 🆘 Support - 📖 **Documentation**: Check the comprehensive guides in this repository - 🐛 **Issues**: Report bugs via GitHub Issues - 💬 **Discussions**: Join GitHub Discussions for questions and ideas - 📧 **Contact**: For enterprise support and custom deployments --- **Built with ❤️ for reliable, intelligent VPN infrastructure management** --- ## Installation ### Installation Guide This guide covers detailed installation instructions for VPN Exit Controller on various platforms. ## System Requirements ### Minimum Requirements - **CPU**: 2 cores - **RAM**: 4GB - **Storage**: 20GB SSD - **Network**: 100 Mbps connection - **OS**: Ubuntu 22.04 LTS or compatible ### Recommended Requirements - **CPU**: 4+ cores - **RAM**: 8GB+ - **Storage**: 50GB+ SSD - **Network**: 1 Gbps connection - **OS**: Ubuntu 22.04 LTS ### Supported Platforms | Platform | Version | Support Level | |----------|---------|---------------| | Ubuntu | 22.04 LTS | ✅ Full Support | | Ubuntu | 20.04 LTS | ✅ Full Support | | Debian | 11/12 | ✅ Full Support | | RHEL/CentOS | 8/9 | ⚠️ Community Support | | Proxmox LXC | 7.x/8.x | ✅ Full Support | | Docker | 20.10+ | ✅ Full Support | ## Installation Methods ### Method 1: Automated Installation (Recommended) ``` # Download and run installer curl -sSL https://vpn-docs.rbnk.uk/install.sh | bash ``` The installer will: - ✅ Check system requirements - ✅ Install dependencies - ✅ Configure services - ✅ Set up systemd units - ✅ Create necessary directories ### Method 2: Manual Installation #### Step 1: Install System Dependencies === "Ubuntu/Debian" ``` # Update system sudo apt update && sudo apt upgrade -y # Install dependencies sudo apt install -y \ curl \ git \ python3.10 \ python3.10-venv \ python3-pip \ docker.io \ docker-compose \ redis-server \ nginx \ certbot \ python3-certbot-nginx # Start services sudo systemctl enable --now docker redis ``` === "RHEL/CentOS" ``` # Update system sudo dnf update -y # Install dependencies sudo dnf install -y \ curl \ git \ 
python3.10 \ python3-pip \ docker \ docker-compose \ redis \ nginx \ certbot \ python3-certbot-nginx # Start services sudo systemctl enable --now docker redis ``` #### Step 2: Install Tailscale ``` # Add Tailscale repository curl -fsSL https://tailscale.com/install.sh | sh # Start Tailscale sudo systemctl enable --now tailscaled ``` #### Step 3: Clone Repository ``` # Clone from Gitea git clone https://gitea.rbnk.uk/admin/vpn-controller.git /opt/vpn-exit-controller cd /opt/vpn-exit-controller ``` #### Step 4: Python Environment Setup ``` # Create virtual environment python3 -m venv venv # Activate environment source venv/bin/activate # Install Python packages pip install --upgrade pip pip install -r requirements.txt ``` #### Step 5: Configure Environment ``` # Copy example configuration cp .env.example .env # Edit configuration nano .env ``` Required environment variables: ``` # NordVPN Service Credentials NORDVPN_USER=your_service_username NORDVPN_PASS=your_service_password # Tailscale Configuration TAILSCALE_AUTH_KEY=tskey-auth-xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxx # API Authentication API_USERNAME=admin API_PASSWORD=strong_password_here # Cloudflare (for DNS management) CF_API_TOKEN=your_cloudflare_api_token # Redis Configuration REDIS_HOST=localhost REDIS_PORT=6379 ``` #### Step 6: Create Systemd Service ``` # Create service file sudo tee /etc/systemd/system/vpn-controller.service << EOF [Unit] Description=VPN Exit Controller API After=network.target redis.service docker.service Wants=redis.service docker.service [Service] Type=exec User=root WorkingDirectory=/opt/vpn-exit-controller Environment="PATH=/opt/vpn-exit-controller/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" ExecStart=/opt/vpn-exit-controller/venv/bin/python -m uvicorn api.main:app --host 0.0.0.0 --port 8080 Restart=always RestartSec=10 [Install] WantedBy=multi-user.target EOF # Enable and start service sudo systemctl daemon-reload sudo systemctl enable --now 
vpn-controller ``` ### Method 3: Docker Installation #### Using Docker Compose ``` # Create docker-compose.yml cat > docker-compose.yml << EOF version: '3.8' services: redis: image: redis:alpine restart: always ports: - "6379:6379" volumes: - redis-data:/data vpn-controller: build: . restart: always ports: - "8080:8080" environment: - REDIS_HOST=redis env_file: - .env volumes: - /var/run/docker.sock:/var/run/docker.sock - ./configs:/app/configs depends_on: - redis volumes: redis-data: EOF # Start services docker-compose up -d ``` ## Post-Installation Setup ### 1. Verify Installation ``` # Check service status sudo systemctl status vpn-controller # Test API endpoint curl http://localhost:8080/api/health -u admin:your_password ``` ### 2. Configure Firewall === "UFW (Ubuntu)" ``` # Allow required ports sudo ufw allow 8080/tcp # API sudo ufw allow 80/tcp # HTTP sudo ufw allow 443/tcp # HTTPS sudo ufw allow 8888/tcp # HAProxy stats # Enable firewall sudo ufw enable ``` === "firewalld (RHEL)" ``` # Allow required ports sudo firewall-cmd --permanent --add-port=8080/tcp sudo firewall-cmd --permanent --add-port=80/tcp sudo firewall-cmd --permanent --add-port=443/tcp sudo firewall-cmd --permanent --add-port=8888/tcp # Reload firewall sudo firewall-cmd --reload ``` ### 3. Set Up SSL Certificates ``` # Using Certbot sudo certbot --nginx -d vpn-api.yourdomain.com # Or using Traefik (automatic) cd traefik && docker-compose up -d ``` ### 4. Configure DNS Records Add these records to your domain: | Type | Name | Value | Proxy | |------|------|-------|-------| | A | vpn-api | YOUR_SERVER_IP | ❌ | | A | proxy-us | YOUR_SERVER_IP | ✅ | | A | proxy-uk | YOUR_SERVER_IP | ✅ | | A | proxy-jp | YOUR_SERVER_IP | ✅ | ## Troubleshooting Installation !!! 
warning "Common Issues" **Docker Permission Denied** ``` # Add user to docker group sudo usermod -aG docker $USER # Log out and back in ``` **Port Already in Use** ``` # Find process using port sudo lsof -i :8080 # Change port in configuration ``` **Python Version Issues** ``` # Install specific Python version sudo add-apt-repository ppa:deadsnakes/ppa sudo apt install python3.10 python3.10-venv ``` ## Uninstallation To completely remove VPN Exit Controller: ``` # Stop services sudo systemctl stop vpn-controller sudo systemctl disable vpn-controller # Remove files sudo rm -rf /opt/vpn-exit-controller sudo rm /etc/systemd/system/vpn-controller.service # Remove Docker containers docker stop $(docker ps -a -q --filter name=vpn-) docker rm $(docker ps -a -q --filter name=vpn-) # Clean up (optional) sudo apt remove --purge docker.io docker-compose ``` ## Next Steps
- 🚀 **Quick Start** - Start using VPN Exit Controller
- ⚙️ **Configuration** - Configure advanced settings with the Configuration Guide
- 🔒 **Security** - Harden your installation with the Security Guide
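Step 5 of the manual installation depends on several variables being present in `.env`. A minimal pre-flight check is sketched below; `parse_env` and `missing_keys` are hypothetical helpers for illustration, not part of the installer:

```python
# Hypothetical pre-flight check: verify .env contains the variables from Step 5.
REQUIRED_KEYS = {
    "NORDVPN_USER", "NORDVPN_PASS", "TAILSCALE_AUTH_KEY",
    "API_USERNAME", "API_PASSWORD", "CF_API_TOKEN",
    "REDIS_HOST", "REDIS_PORT",
}

def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def missing_keys(text: str) -> set:
    """Return the required keys that are absent or empty."""
    env = parse_env(text)
    return {k for k in REQUIRED_KEYS if not env.get(k)}

sample = "NORDVPN_USER=me\nNORDVPN_PASS=secret\n# comment\nREDIS_HOST=localhost\n"
print(sorted(missing_keys(sample)))
```

Running a check like this before `systemctl start vpn-controller` surfaces configuration gaps as a readable list instead of a runtime failure in the API service.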
--- !!! info "Need Help?" If you encounter issues during installation, check our Troubleshooting Guide or open an issue. --- ## Operations > Deployment ### VPN Exit Controller - Deployment Guide This comprehensive guide covers the complete deployment of the VPN Exit Controller system from scratch, including infrastructure setup, dependencies, configuration, and testing procedures. ## Table of Contents 1. Infrastructure Prerequisites 2. System Dependencies 3. Application Setup 4. Service Configuration 5. Network and DNS Setup 6. Container Infrastructure 7. Testing and Verification 8. Troubleshooting ## 1. Infrastructure Prerequisites ### 1.1 Proxmox VE Setup Requirements #### Hardware Specifications (Minimum) - **CPU**: 4 cores (Intel/AMD with virtualization support) - **RAM**: 8GB (16GB recommended for multiple VPN nodes) - **Storage**: 100GB SSD (for container and Docker images) - **Network**: 1Gbps NIC with stable internet connection #### Proxmox VE Installation 1. Install Proxmox VE 8.0+ on the host system 2. Configure network bridges in Proxmox web interface 3. 
Set up storage pools for container data ### 1.2 LXC Container Configuration #### Create LXC Container ``` # Create Ubuntu 22.04 LXC container pct create 201 local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \ --hostname vpn-controller \ --memory 4096 \ --cores 2 \ --rootfs local-lvm:32 \ --net0 name=eth0,bridge=vmbr1,ip=10.10.10.20/24,gw=10.10.10.1 \ --nameserver 1.1.1.1 \ --onboot 1 \ --unprivileged 0 \ --features nesting=1,keyctl=1 ``` #### Essential Container Features - `nesting=1`: Enables Docker containers within LXC - `keyctl=1`: Required for Docker operations - `unprivileged=0`: Runs as privileged container for Docker access #### Network Configuration ``` # Configure static network in container cat > /etc/netplan/01-netcfg.yaml << 'EOF' network: version: 2 ethernets: eth0: addresses: - 10.10.10.20/24 gateway4: 10.10.10.1 nameservers: addresses: [1.1.1.1, 8.8.8.8] EOF netplan apply ``` #### AppArmor Configuration (if needed) ``` # On Proxmox host, disable AppArmor for container echo "lxc.apparmor.profile: unconfined" >> /etc/pve/lxc/201.conf pct reboot 201 ``` ## 2. 
System Dependencies ### 2.1 Ubuntu 22.04 LXC Base Setup ``` # Update system packages apt update && apt upgrade -y # Install essential system packages apt install -y \ curl \ wget \ git \ nano \ htop \ net-tools \ iptables \ ca-certificates \ gnupg \ lsb-release \ software-properties-common \ apt-transport-https ``` ### 2.2 Docker Installation and Configuration ``` # Add Docker's official GPG key curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg # Add Docker repository echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null # Install Docker apt update apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin # Start and enable Docker systemctl start docker systemctl enable docker # Add user to docker group (if not running as root) usermod -aG docker $USER ``` #### Docker Configuration ``` # Configure Docker daemon cat > /etc/docker/daemon.json << 'EOF' { "log-driver": "json-file", "log-opts": { "max-size": "10m", "max-file": "3" }, "dns": ["1.1.1.1", "8.8.8.8"], "storage-driver": "overlay2" } EOF systemctl restart docker ``` ### 2.3 Python 3.10 with Virtual Environment ``` # Install Python 3.10 and pip apt install -y python3.10 python3.10-venv python3-pip # Verify Python installation python3 --version ``` ### 2.4 Redis Server Installation ``` # Install Redis apt install -y redis-server # Configure Redis sed -i 's/bind 127.0.0.1 ::1/bind 127.0.0.1/' /etc/redis/redis.conf sed -i 's/# requirepass foobared/requirepass vpn-redis-2024/' /etc/redis/redis.conf # Start and enable Redis systemctl start redis-server systemctl enable redis-server # Test Redis redis-cli ping ``` ### 2.5 Additional System Packages ``` # Install network utilities apt install -y \ openvpn \ iptables-persistent \ netfilter-persistent \ 
bridge-utils \ iproute2 \ tcpdump \ nmap \ jq ``` ## 3. Application Setup ### 3.1 Repository Cloning and Directory Setup ``` # Create application directory mkdir -p /opt/vpn-exit-controller cd /opt/vpn-exit-controller # Clone repository (adjust URL as needed) git clone https://github.com/your-repo/vpn-exit-controller.git . # Set proper permissions chown -R root:root /opt/vpn-exit-controller chmod +x scripts/*.sh chmod +x start.sh ``` ### 3.2 Python Virtual Environment Setup ``` # Create virtual environment cd /opt/vpn-exit-controller python3 -m venv venv # Activate virtual environment source venv/bin/activate # Install Python dependencies pip install --upgrade pip pip install -r api/requirements.txt # Verify installations pip list ``` ### 3.3 Environment Variable Configuration ``` # Create .env file cat > /opt/vpn-exit-controller/.env << 'EOF' # Application Settings SECRET_KEY=your-super-secret-key-change-this-in-production ADMIN_USER=admin ADMIN_PASS=Bl4ckMagic!2345erver # Tailscale Configuration TAILSCALE_AUTHKEY=tskey-auth-your-tailscale-key-here # NordVPN Credentials NORDVPN_USERNAME=your-nordvpn-username NORDVPN_PASSWORD=your-nordvpn-password # Redis Configuration REDIS_HOST=127.0.0.1 REDIS_PORT=6379 REDIS_PASSWORD=vpn-redis-2024 # Cloudflare DNS API (for SSL certificates) CLOUDFLARE_EMAIL=admin@richardbankole.com CLOUDFLARE_API_KEY=your-cloudflare-api-key # Domain Configuration DOMAIN=rbnk.uk API_DOMAIN=vpn-api.rbnk.uk EOF # Secure the .env file chmod 600 /opt/vpn-exit-controller/.env ``` ### 3.4 NordVPN Configuration Setup ``` # Create NordVPN authentication file mkdir -p /opt/vpn-exit-controller/configs cat > /opt/vpn-exit-controller/configs/auth.txt << 'EOF' your-nordvpn-username your-nordvpn-password EOF chmod 600 /opt/vpn-exit-controller/configs/auth.txt # Download NordVPN configuration files cd /opt/vpn-exit-controller bash scripts/download-nordvpn-configs.sh ``` ## 4. 
Service Configuration ### 4.1 NordVPN Service Credentials Setup The NordVPN configurations are already present in the `/opt/vpn-exit-controller/configs/vpn/` directory. Ensure your NordVPN credentials are properly configured: ``` # Verify NordVPN configs exist ls -la /opt/vpn-exit-controller/configs/vpn/ # Test a configuration (optional) openvpn --config /opt/vpn-exit-controller/configs/vpn/us.ovpn \ --auth-user-pass /opt/vpn-exit-controller/configs/auth.txt \ --daemon ``` ### 4.2 Tailscale Installation and Configuration ``` # Install Tailscale curl -fsSL https://tailscale.com/install.sh | sh # Start Tailscale daemon systemctl start tailscaled systemctl enable tailscaled # Authenticate with Tailscale (use your auth key from .env) tailscale up --authkey=tskey-auth-your-key-here \ --advertise-exit-node \ --hostname=vpn-controller # Verify Tailscale status tailscale status tailscale ip -4 ``` ### 4.3 Systemd Service Installation ``` # Create the systemd service file cat > /etc/systemd/system/vpn-controller.service << 'EOF' [Unit] Description=VPN Exit Controller API After=docker.service tailscaled.service redis-server.service Requires=docker.service Wants=tailscaled.service redis-server.service [Service] Type=simple ExecStart=/opt/vpn-exit-controller/start.sh Restart=on-failure RestartSec=10 User=root WorkingDirectory=/opt/vpn-exit-controller Environment=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin [Install] WantedBy=multi-user.target EOF # Reload systemd and enable service systemctl daemon-reload systemctl enable vpn-controller ``` ### 4.4 Firewall and iptables Configuration ``` # Configure iptables for VPN traffic cat > /etc/iptables/rules.v4 << 'EOF' *nat :PREROUTING ACCEPT [0:0] :INPUT ACCEPT [0:0] :OUTPUT ACCEPT [0:0] :POSTROUTING ACCEPT [0:0] # NAT rules for VPN traffic -A POSTROUTING -s 10.0.0.0/8 -o tun+ -j MASQUERADE -A POSTROUTING -s 172.16.0.0/12 -o tun+ -j MASQUERADE -A POSTROUTING -s 192.168.0.0/16 -o tun+ -j MASQUERADE COMMIT *filter 
:INPUT ACCEPT [0:0] :FORWARD ACCEPT [0:0] :OUTPUT ACCEPT [0:0] # Allow loopback -A INPUT -i lo -j ACCEPT # Allow established connections -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT # Allow SSH -A INPUT -p tcp --dport 22 -j ACCEPT # Allow HTTP/HTTPS -A INPUT -p tcp --dport 80 -j ACCEPT -A INPUT -p tcp --dport 443 -j ACCEPT # Allow API port -A INPUT -p tcp --dport 8080 -j ACCEPT # Allow Tailscale -A INPUT -p udp --dport 41641 -j ACCEPT # Forward VPN traffic -A FORWARD -i tun+ -j ACCEPT -A FORWARD -o tun+ -j ACCEPT # Drop invalid packets -A INPUT -m state --state INVALID -j DROP COMMIT EOF # Apply iptables rules iptables-restore < /etc/iptables/rules.v4 netfilter-persistent save ``` ## 5. Network and DNS Setup ### 5.1 Cloudflare DNS Configuration Configure the following DNS records in your Cloudflare dashboard for `rbnk.uk`: ``` # Main API endpoint vpn-api.rbnk.uk A 10.10.10.20 (Proxied: Yes) # Proxy endpoints for each country proxy-us.rbnk.uk A 10.10.10.20 (Proxied: Yes) proxy-uk.rbnk.uk A 10.10.10.20 (Proxied: Yes) proxy-de.rbnk.uk A 10.10.10.20 (Proxied: Yes) proxy-jp.rbnk.uk A 10.10.10.20 (Proxied: Yes) proxy-ca.rbnk.uk A 10.10.10.20 (Proxied: Yes) proxy-au.rbnk.uk A 10.10.10.20 (Proxied: Yes) proxy-nl.rbnk.uk A 10.10.10.20 (Proxied: Yes) proxy-fr.rbnk.uk A 10.10.10.20 (Proxied: Yes) proxy-it.rbnk.uk A 10.10.10.20 (Proxied: Yes) proxy-es.rbnk.uk A 10.10.10.20 (Proxied: Yes) # Traefik dashboard (optional) traefik.rbnk.uk A 10.10.10.20 (Proxied: Yes) ``` ### 5.2 SSL Certificate Configuration The Traefik configuration handles SSL certificates automatically via Let's Encrypt and Cloudflare DNS challenge: ``` # Ensure acme.json has correct permissions mkdir -p /opt/vpn-exit-controller/traefik/letsencrypt touch /opt/vpn-exit-controller/traefik/letsencrypt/acme.json chmod 600 /opt/vpn-exit-controller/traefik/letsencrypt/acme.json ``` ## 6. 
Container Infrastructure ### 6.1 Docker Network Setup ``` # Create custom Docker networks docker network create vpn-network --subnet=172.20.0.0/16 docker network create traefik-network --subnet=172.21.0.0/16 ``` ### 6.2 Build VPN Node Container ``` # Build the VPN node Docker image cd /opt/vpn-exit-controller/vpn-node docker build -t vpn-exit-node:latest . # Verify image was built docker images | grep vpn-exit-node ``` ### 6.3 Traefik Deployment ``` # Start Traefik container cd /opt/vpn-exit-controller/traefik docker compose -f docker-compose.traefik.yml up -d # Check Traefik status docker ps | grep traefik docker logs traefik ``` ### 6.4 HAProxy Deployment ``` # Start HAProxy and proxy infrastructure cd /opt/vpn-exit-controller/proxy docker compose up -d # Verify HAProxy is running docker ps | grep haproxy curl -s http://localhost:8404 # HAProxy stats page ``` ### 6.5 Main Application Deployment ``` # Start the main application stack cd /opt/vpn-exit-controller docker compose up -d # Start the systemd service systemctl start vpn-controller systemctl status vpn-controller ``` ## 7. 
Testing and Verification ### 7.1 Health Check Procedures ``` # Check all services are running systemctl status vpn-controller systemctl status docker systemctl status tailscaled systemctl status redis-server # Check Docker containers docker ps -a # Check application logs journalctl -u vpn-controller -f docker logs vpn-api docker logs vpn-redis ``` ### 7.2 API Endpoint Testing ``` # Test API status endpoint curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status # Test via domain (after DNS propagation) curl -u admin:Bl4ckMagic!2345erver https://vpn-api.rbnk.uk/api/status # Test node management endpoints curl -u admin:Bl4ckMagic!2345erver https://vpn-api.rbnk.uk/api/nodes # Test metrics endpoint curl -u admin:Bl4ckMagic!2345erver https://vpn-api.rbnk.uk/api/metrics ``` ### 7.3 Proxy URL Verification ``` # Test HTTP proxy endpoints curl -x proxy-us.rbnk.uk:80 http://ipinfo.io/country curl -x proxy-uk.rbnk.uk:80 http://ipinfo.io/country curl -x proxy-de.rbnk.uk:80 http://ipinfo.io/country # Test SOCKS5 proxy (if configured) curl --socks5 proxy-us.rbnk.uk:1080 http://ipinfo.io/country ``` ### 7.4 Performance Testing ``` # Create curl format file for detailed timing (must exist before the speed test below) cat > curl-format.txt << 'EOF' time_namelookup: %{time_namelookup}\n time_connect: %{time_connect}\n time_appconnect: %{time_appconnect}\n time_pretransfer: %{time_pretransfer}\n time_redirect: %{time_redirect}\n time_starttransfer: %{time_starttransfer}\n ----------\n time_total: %{time_total}\n EOF # Speed test through proxy curl -x proxy-us.rbnk.uk:80 -w "@curl-format.txt" -o /dev/null -s http://speedtest.net/mini.php ``` ### 7.5 Tailscale Exit Node Verification ``` # Check Tailscale status tailscale status # Verify exit node advertisement tailscale status | grep "exit node" # Test from another Tailscale device # Use this node as exit node and check external IP ``` ## 8. 
Troubleshooting ### 8.1 Common Issues and Solutions #### Docker Permission Issues ``` # Add user to docker group usermod -aG docker $USER newgrp docker # Or run as root sudo su - ``` #### Container Networking Issues ``` # Restart Docker daemon systemctl restart docker # Recreate networks docker network rm vpn-network traefik-network docker network create vpn-network --subnet=172.20.0.0/16 docker network create traefik-network --subnet=172.21.0.0/16 ``` #### SSL Certificate Issues ``` # Check Traefik logs docker logs traefik # Verify Cloudflare API credentials # Check acme.json permissions ls -la /opt/vpn-exit-controller/traefik/letsencrypt/acme.json ``` #### VPN Connection Issues ``` # Check NordVPN credentials cat /opt/vpn-exit-controller/configs/auth.txt # Test manual OpenVPN connection openvpn --config /opt/vpn-exit-controller/configs/vpn/us.ovpn \ --auth-user-pass /opt/vpn-exit-controller/configs/auth.txt ``` ### 8.2 Log Locations ``` # Application logs journalctl -u vpn-controller -f # Docker container logs docker logs vpn-api docker logs vpn-redis docker logs traefik docker logs haproxy # System logs /var/log/syslog /var/log/daemon.log # Traefik logs /opt/vpn-exit-controller/traefik/logs/ ``` ### 8.3 Recovery Procedures #### Service Recovery ``` # Restart all services systemctl restart vpn-controller docker compose down && docker compose up -d # Clean restart docker system prune -f docker compose down -v docker compose up -d --build ``` #### Database Recovery ``` # Restart Redis systemctl restart redis-server # Clear Redis cache if needed redis-cli FLUSHALL ``` ## Post-Deployment Checklist - [ ] All services running and enabled - [ ] DNS records configured and propagated - [ ] SSL certificates obtained and valid - [ ] API endpoints responding correctly - [ ] Proxy URLs functional for all countries - [ ] Tailscale exit node operational - [ ] Monitoring and logging configured - [ ] Backup procedures established - [ ] Security hardening completed - [ ] 
Performance baselines established ## Security Considerations 1. **Change default passwords** in `.env` file 2. **Restrict API access** using proper authentication 3. **Configure firewall rules** to limit exposed ports 4. **Regular security updates** for all components 5. **Monitor access logs** for suspicious activity 6. **Secure NordVPN credentials** with proper file permissions 7. **Use strong Tailscale authentication** keys 8. **Regular backup** of configuration files ## Maintenance ### Regular Tasks - Monitor disk space and logs - Update Docker images monthly - Rotate authentication keys quarterly - Review access logs weekly - Test backup/recovery procedures monthly ### Updates - Always test updates in staging environment - Backup configurations before updates - Update dependencies in requirements.txt - Monitor for security advisories This deployment guide provides a complete foundation for setting up the VPN Exit Controller system. Adjust specific values like domain names, IP addresses, and credentials according to your environment. --- ## Operations > Docs Build System ### Documentation Build System This page describes the automated documentation build system for the VPN Exit Controller project. ## Overview The VPN Exit Controller documentation is built using MkDocs with the Material theme and is automatically rebuilt whenever changes are pushed to the repository. The documentation is hosted at https://vpn-docs.rbnk.uk. ## Architecture ``` graph LR A[Git Push] --> B[Gitea Webhook] B --> C[Webhook Server
Port 8888] C --> D[Rebuild Script] D --> E[Docker Build] E --> F[New Container] F --> G[Traefik] G --> H[vpn-docs.rbnk.uk] ``` ## Components ### 1. MkDocs Site **Location**: `/opt/vpn-exit-controller/mkdocs-site/` The documentation source includes: - `mkdocs.yml` - MkDocs configuration - `docs/` - Documentation source files - `Dockerfile` - Multi-stage build for documentation - `docker-compose.yml` - Container orchestration - `nginx.conf` - Web server configuration ### 2. Webhook Server **Script**: `/opt/vpn-exit-controller/scripts/webhook-docs-rebuild.py` **Service**: `docs-webhook.service` **Port**: 8888 The webhook server: - Listens for POST requests to `/rebuild-docs` - Validates webhook signatures (optional) - Triggers the rebuild script - Logs all activity ### 3. Rebuild Script **Location**: `/opt/vpn-exit-controller/scripts/rebuild-docs.sh` The rebuild script performs: 1. Stops the existing documentation container 2. Builds a new container with latest documentation 3. Starts the new container 4. Verifies deployment success 5. Logs the rebuild event ### 4. Docker Container **Container Name**: `vpn-docs` **Internal Port**: 80 **External Port**: 8001 The container uses a multi-stage build: 1. **Builder stage**: Python environment with MkDocs 2. **Production stage**: Nginx serving static files ## Configuration ### Webhook Service Configuration The webhook service is managed by systemd: ``` [Unit] Description=Documentation Rebuild Webhook Server After=network.target docker.service Requires=docker.service [Service] Type=simple User=root WorkingDirectory=/opt/vpn-exit-controller Environment="WEBHOOK_SECRET=change-me-to-secure-secret" ExecStart=/usr/bin/python3 /opt/vpn-exit-controller/scripts/webhook-docs-rebuild.py Restart=always RestartSec=10 ``` ### Gitea Webhook Setup To configure automatic rebuilds: 1. Navigate to your repository settings in Gitea 2. Go to Webhooks section 3. 
Add a new webhook with: - **URL**: `http://10.10.10.20:8888/rebuild-docs` - **Method**: POST - **Events**: Push events - **Secret**: (optional but recommended) ### Security Configuration For production use, configure a webhook secret: 1. Generate a secure secret: ``` openssl rand -hex 32 ``` 2. Update the systemd service: ``` systemctl edit docs-webhook ``` 3. Add the environment variable: ``` [Service] Environment="WEBHOOK_SECRET=your-generated-secret" ``` 4. Use the same secret in Gitea webhook configuration ## Usage ### Automatic Builds Documentation is automatically rebuilt when: - Code is pushed to the main branch - A webhook request is received - The rebuild is manually triggered ### Manual Rebuild To manually rebuild documentation: ``` # Option 1: Direct script execution /opt/vpn-exit-controller/scripts/rebuild-docs.sh # Option 2: Trigger via webhook curl -X POST http://localhost:8888/rebuild-docs # Option 3: With webhook secret curl -X POST http://localhost:8888/rebuild-docs \ -H "X-Hub-Signature-256: sha256=your-signature" ``` ### Monitoring Check the system status: ``` # Webhook service status systemctl status docs-webhook # Container status docker ps | grep vpn-docs # Recent rebuilds tail -f /opt/vpn-exit-controller/logs/docs-rebuild.log # Webhook activity tail -f /opt/vpn-exit-controller/logs/webhook.log ``` ## Troubleshooting ### Common Issues #### Webhook Not Triggering 1. Check service status: ``` systemctl status docs-webhook journalctl -u docs-webhook -f ``` 2. Test webhook connectivity: ``` curl -X POST http://localhost:8888/rebuild-docs ``` 3. Verify Gitea can reach the webhook URL #### Build Failures 1. Check Docker logs: ``` docker logs vpn-docs ``` 2. Manually test the build: ``` cd /opt/vpn-exit-controller/mkdocs-site docker-compose build ``` 3. Check disk space: ``` df -h ``` #### Site Not Accessible 1. Verify container health: ``` docker inspect vpn-docs --format='{{.State.Health.Status}}' ``` 2. 
Check Traefik routing: ``` docker logs traefik | grep vpn-docs ``` 3. Test SSL certificate: ``` curl -vI https://vpn-docs.rbnk.uk ``` ### Log Locations - **Webhook logs**: `/opt/vpn-exit-controller/logs/webhook.log` - **Rebuild logs**: `/opt/vpn-exit-controller/logs/docs-rebuild.log` - **Container logs**: `docker logs vpn-docs` - **Service logs**: `journalctl -u docs-webhook` ## Maintenance ### Regular Tasks - **Monitor disk usage** - Documentation builds can consume space - **Review logs** - Check for failed builds or security issues - **Update dependencies** - Keep MkDocs and plugins updated - **Rotate logs** - Ensure log files don't grow too large ### Updates To update the documentation system: 1. Update MkDocs dependencies: ``` cd /opt/vpn-exit-controller/mkdocs-site # Update requirements.txt docker-compose build --no-cache ``` 2. Update webhook server: ``` # Modify the Python script systemctl restart docs-webhook ``` ## Performance The documentation build process: - Takes 1-2 minutes to complete - Uses minimal CPU during normal operation - Requires ~500MB disk space for build cache - Serves static files efficiently via Nginx ## Security Considerations 1. **Webhook Authentication**: Always use a secret in production 2. **Network Access**: Limit webhook access to trusted sources 3. **Container Isolation**: Runs with minimal privileges 4. **SSL/TLS**: All public access uses HTTPS via Traefik 5. **Input Validation**: Webhook server validates all inputs ## Future Enhancements Potential improvements to consider: - [ ] Add build status badges - [ ] Implement build notifications - [ ] Add search analytics - [ ] Enable documentation versioning - [ ] Add automated link checking - [ ] Implement A/B testing for docs - [ ] Add user feedback collection ## Related Documentation - Deployment Guide - Overall system deployment - Maintenance Guide - System maintenance procedures - Troubleshooting Guide - General troubleshooting --- !!! 
tip "Quick Test" To test if the documentation build system is working, make a small change to any `.md` file, commit, and push. You should see the documentation automatically rebuild within 2-3 minutes. --- ## Operations ### Operations Guide This section covers the operational aspects of running and maintaining the VPN Exit Controller system. ## Quick Links - **Deployment** - Initial system deployment and setup - **Documentation Build System** - Automated documentation builds - **Monitoring** - System monitoring and alerting - **Maintenance** - Routine maintenance procedures - **Troubleshooting** - Common issues and solutions - **Scaling** - Scaling the system for growth ## Overview Operating the VPN Exit Controller requires understanding several key areas: ### 🚀 Deployment Learn how to deploy the system from scratch, including infrastructure setup, service configuration, and initial testing. ### 📚 Documentation Build System Understand how documentation is automatically built and deployed when changes are pushed to the repository. This ensures documentation stays in sync with the codebase. ### 📊 Monitoring Set up comprehensive monitoring to track system health, performance metrics, and potential issues before they impact users. ### 🔧 Maintenance Follow routine maintenance procedures to keep the system running smoothly, including updates, backups, and security patches. ### 🔍 Troubleshooting Quickly diagnose and resolve common issues using our troubleshooting guide and diagnostic tools. ### 📈 Scaling Plan for growth with our scaling guide, covering both vertical and horizontal scaling strategies. 
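The documentation build system described above optionally validates webhook signatures. A generic sketch of that check follows, assuming Gitea/GitHub-style HMAC `X-Hub-Signature-256` headers; this is an illustration, not the project's actual `webhook-docs-rebuild.py`:

```python
import hashlib
import hmac

def verify_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(signature_header[len(prefix):], expected)

# Example: sign a payload the way a Gitea webhook would, then verify it.
# The secret matches the placeholder from the docs-webhook systemd unit.
secret, body = "change-me-to-secure-secret", b'{"ref": "refs/heads/main"}'
sig = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(verify_signature(secret, body, sig))           # → True
print(verify_signature("wrong-secret", body, sig))   # → False
```

The important details are that the HMAC is computed over the raw request body (before any JSON parsing) and that the comparison is constant-time.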
## Key Operational Tasks ### Daily Tasks - Monitor system health via dashboard - Check webhook and build logs - Review error logs for issues - Verify all VPN nodes are operational ### Weekly Tasks - Review performance metrics - Check disk usage and clean if needed - Verify backup procedures - Update documentation as needed ### Monthly Tasks - Security updates and patches - Certificate renewal verification - Capacity planning review - Documentation review and updates ## Important Locations ### Configuration Files - Main config: `/opt/vpn-exit-controller/` - Service files: `/etc/systemd/system/` - Docker configs: Various `docker-compose.yml` files ### Log Files - API logs: `journalctl -u vpn-controller` - Webhook logs: `/opt/vpn-exit-controller/logs/webhook.log` - Container logs: `docker logs [container-name]` ### Documentation - Source: `/opt/vpn-exit-controller/mkdocs-site/docs/` - Live site: https://vpn-docs.rbnk.uk ## Getting Help If you encounter issues not covered in these guides: 1. Check the troubleshooting guide 2. Review system logs for error messages 3. Consult the API documentation 4. Check the main documentation index --- !!! info "Continuous Improvement" This operations documentation is continuously updated based on operational experience. If you discover new issues or better procedures, please document them! --- ## Operations > Maintenance ### VPN Exit Controller - Maintenance Guide This document provides comprehensive maintenance procedures for the VPN Exit Controller system running on Proxmox LXC container (ID: 201). ## Table of Contents 1. Routine Maintenance Tasks 2. System Monitoring 3. Backup and Recovery 4. Updates and Upgrades 5. Certificate Management 6. VPN Service Management 7. Capacity Management 8. Security Maintenance 9. Documentation Updates 10. Emergency Procedures --- ## 1. 
## 1. Routine Maintenance Tasks

### Daily Health Checks (Automated)

Create a daily health check script at `/opt/vpn-exit-controller/scripts/daily-check.sh`:

```
#!/bin/bash
# Daily health check script

LOG_FILE="/var/log/vpn-controller-health.log"
DATE=$(date "+%Y-%m-%d %H:%M:%S")

echo "[$DATE] Starting daily health check" >> $LOG_FILE

# Check system resources
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | awk -F'%' '{print $1}')
MEM_USAGE=$(free | grep Mem | awk '{printf("%.2f"), $3/$2 * 100.0}')
DISK_USAGE=$(df -h /opt | awk 'NR==2 {print $5}' | sed 's/%//')
echo "[$DATE] CPU: ${CPU_USAGE}%, Memory: ${MEM_USAGE}%, Disk: ${DISK_USAGE}%" >> $LOG_FILE

# Check service status
if systemctl is-active --quiet vpn-controller; then
    echo "[$DATE] VPN Controller service: RUNNING" >> $LOG_FILE
else
    echo "[$DATE] VPN Controller service: FAILED" >> $LOG_FILE
    systemctl restart vpn-controller
fi

# Check Docker containers
RUNNING_CONTAINERS=$(docker ps --format "table {{.Names}}\t{{.Status}}" | grep -c "Up")
echo "[$DATE] Running containers: $RUNNING_CONTAINERS" >> $LOG_FILE

# Check API health
API_RESPONSE=$(curl -s -u admin:Bl4ckMagic!2345erver -o /dev/null -w "%{http_code}" http://localhost:8080/api/status)
if [ "$API_RESPONSE" = "200" ]; then
    echo "[$DATE] API health: OK" >> $LOG_FILE
else
    echo "[$DATE] API health: FAILED (HTTP $API_RESPONSE)" >> $LOG_FILE
fi

# Check Redis
if docker exec vpn-redis redis-cli ping | grep -q PONG; then
    echo "[$DATE] Redis: OK" >> $LOG_FILE
else
    echo "[$DATE] Redis: FAILED" >> $LOG_FILE
fi

# Alert on high resource usage (strip decimals: [ -gt ] only accepts integers)
if [ "${CPU_USAGE%.*}" -gt 80 ] || [ "${MEM_USAGE%.*}" -gt 85 ] || [ "$DISK_USAGE" -gt 90 ]; then
    echo "[$DATE] WARNING: High resource usage detected" >> $LOG_FILE
fi

echo "[$DATE] Daily health check completed" >> $LOG_FILE
```

**Schedule**: Add to crontab:

```
0 6 * * * /opt/vpn-exit-controller/scripts/daily-check.sh
```

### Weekly Performance Review

Run every Sunday at 2 AM:

```
#!/bin/bash
# Weekly performance review script

REPORT_FILE="/var/log/vpn-controller-weekly-$(date +%Y%m%d).log"

echo "Weekly Performance Report - $(date)" > $REPORT_FILE
echo "========================================" >> $REPORT_FILE

# System uptime
uptime >> $REPORT_FILE

# Docker container statistics
echo -e "\nContainer Resource Usage:" >> $REPORT_FILE
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}" >> $REPORT_FILE

# VPN node performance
echo -e "\nVPN Node Statistics:" >> $REPORT_FILE
curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/summary >> $REPORT_FILE

# Log analysis
echo -e "\nError Summary (last 7 days):" >> $REPORT_FILE
journalctl -u vpn-controller --since "7 days ago" | grep -i error | wc -l >> $REPORT_FILE

# Redis memory usage
echo -e "\nRedis Memory Usage:" >> $REPORT_FILE
docker exec vpn-redis redis-cli info memory | grep used_memory_human >> $REPORT_FILE
```

### Monthly System Updates

**First Sunday of each month at 3 AM:**

```
#!/bin/bash
# Monthly system update script

UPDATE_LOG="/var/log/monthly-updates-$(date +%Y%m).log"

echo "Monthly System Update - $(date)" > $UPDATE_LOG

# Update package lists
apt update >> $UPDATE_LOG 2>&1

# List available updates
echo "Available updates:" >> $UPDATE_LOG
apt list --upgradable >> $UPDATE_LOG 2>&1

# Install pending upgrades non-interactively, keeping existing config files
DEBIAN_FRONTEND=noninteractive apt upgrade -y -o Dpkg::Options::="--force-confdef" >> $UPDATE_LOG 2>&1

# Clean up
apt autoremove -y >> $UPDATE_LOG 2>&1
apt autoclean >> $UPDATE_LOG 2>&1

# Check if a reboot is required
if [ -f /var/run/reboot-required ]; then
    echo "REBOOT REQUIRED" >> $UPDATE_LOG
    # Schedule reboot for low-traffic time (4 AM)
    shutdown -r 04:00 "Scheduled reboot for system updates"
fi
```

### Quarterly Capacity Planning

**First day of each quarter:**

```
#!/bin/bash
# Quarterly capacity planning report

# Quarter = (month - 1) / 3 + 1; %-m avoids octal parsing of "08"/"09"
QUARTER="$(date +%Y)-Q$(( ($(date +%-m) - 1) / 3 + 1 ))"
REPORT_FILE="/var/log/capacity-report-${QUARTER}.log"

echo "Quarterly Capacity Planning Report - $QUARTER" > $REPORT_FILE
echo "=============================================" >> $REPORT_FILE

# Historical resource usage trends
echo "Resource Usage Trends (last 90 days):" >> $REPORT_FILE

# Disk usage growth
df -h /opt >> $REPORT_FILE

# VPN node usage statistics
echo -e "\nVPN Node Utilization:" >> $REPORT_FILE
curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/nodes | jq '.[] | {country: .country, usage_percent: .usage_percent}' >> $REPORT_FILE

# Connection statistics
echo -e "\nConnection Statistics:" >> $REPORT_FILE
curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/connections >> $REPORT_FILE

# Recommendations section
echo -e "\nCapacity Recommendations:" >> $REPORT_FILE
echo "Review this report to determine if additional nodes or resources are needed." >> $REPORT_FILE
```

---

## 2. System Monitoring

### Key Metrics to Watch

#### System Level Metrics

- **CPU Usage**: Should stay below 80% average
- **Memory Usage**: Should stay below 85%
- **Disk Usage**: Should stay below 90%
- **Network I/O**: Monitor for unusual spikes
- **Load Average**: Should stay below the number of CPU cores

#### Application Level Metrics

- **API Response Time**: < 500 ms for health checks
- **Active VPN Connections**: Track connection counts
- **Container Health**: All containers should report "healthy"
- **Redis Memory Usage**: Monitor for memory leaks
- **Failed Connection Attempts**: Track error rates

### Alert Thresholds and Escalation

#### Critical Alerts (Immediate Response)

- API unavailable (HTTP 5xx errors)
- System CPU > 95% for 5+ minutes
- System memory > 95%
- Disk usage > 95%
- All VPN nodes offline
- Redis unavailable

#### Warning Alerts (Response within 4 hours)

- System CPU > 80% for 15+ minutes
- System memory > 85%
- Disk usage > 90%
- More than 50% of VPN nodes offline
- High error rate (> 5%)

#### Info Alerts (Response within 24 hours)

- Single VPN node offline
- Slow API response times (> 1 s)
- Log rotation needed
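The resource thresholds above can be encoded as a small helper so monitoring scripts report a single severity level. The sketch below is illustrative (the function name is hypothetical): it mirrors the CPU/memory/disk cut-offs above with CRITICAL taking precedence, takes integer percentages, and does not model the sustained-duration conditions ("for 5+ minutes").

```shell
#!/bin/bash
# classify_severity CPU MEM DISK - map integer usage percentages to an alert level
classify_severity() {
    local cpu=$1 mem=$2 disk=$3
    if [ "$cpu" -gt 95 ] || [ "$mem" -gt 95 ] || [ "$disk" -gt 95 ]; then
        echo "CRITICAL"
    elif [ "$cpu" -gt 80 ] || [ "$mem" -gt 85 ] || [ "$disk" -gt 90 ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

classify_severity 97 40 50   # CRITICAL: CPU above 95%
classify_severity 85 40 50   # WARNING: CPU above 80% but not critical
classify_severity 20 30 40   # OK
```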
### Log Monitoring and Rotation

#### Configure log rotation for the VPN Controller

Create `/etc/logrotate.d/vpn-controller`:

```
/var/log/vpn-controller*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 644 root root
    postrotate
        systemctl reload vpn-controller
    endscript
}
```

#### Monitor key log patterns

```
#!/bin/bash
# Monitor critical log patterns

LOG_FILE="/var/log/vpn-controller-alerts.log"
JOURNAL_LOG=$(journalctl -u vpn-controller --since "1 hour ago" --no-pager)

# Check for critical errors
CRITICAL_ERRORS=$(echo "$JOURNAL_LOG" | grep -i "critical\|fatal\|emergency" | wc -l)
if [ "$CRITICAL_ERRORS" -gt 0 ]; then
    echo "$(date): CRITICAL - $CRITICAL_ERRORS critical errors found" >> $LOG_FILE
fi

# Check for failed VPN connections
FAILED_CONNECTIONS=$(echo "$JOURNAL_LOG" | grep -i "connection failed\|vpn failed" | wc -l)
if [ "$FAILED_CONNECTIONS" -gt 10 ]; then
    echo "$(date): WARNING - $FAILED_CONNECTIONS failed VPN connections in last hour" >> $LOG_FILE
fi

# Check for Docker issues
DOCKER_ERRORS=$(echo "$JOURNAL_LOG" | grep -i "docker.*error" | wc -l)
if [ "$DOCKER_ERRORS" -gt 0 ]; then
    echo "$(date): WARNING - $DOCKER_ERRORS Docker errors found" >> $LOG_FILE
fi
```

### Performance Baseline Tracking

Create a baseline measurements script:

```
#!/bin/bash
# Performance baseline tracking

BASELINE_FILE="/var/log/performance-baseline.json"
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# Collect performance metrics
API_RESPONSE_TIME=$(curl -o /dev/null -s -w "%{time_total}" -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status)
MEMORY_USAGE=$(free | grep Mem | awk '{printf("%.1f"), $3/$2 * 100.0}')
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | awk -F'%' '{print $1}')
DISK_USAGE=$(df /opt | awk 'NR==2 {print $5}' | sed 's/%//')

# Append one JSON object per run (the file is a stream of objects, not a single JSON document)
cat >> $BASELINE_FILE << EOF
{
  "timestamp": "$TIMESTAMP",
  "api_response_time": $API_RESPONSE_TIME,
  "memory_usage_percent": $MEMORY_USAGE,
  "cpu_usage_percent": $CPU_USAGE,
  "disk_usage_percent": $DISK_USAGE
}
EOF
```

---

## 3. Backup and Recovery

### Configuration Backup Procedures

#### Daily Configuration Backup

```
#!/bin/bash
# Daily configuration backup script

BACKUP_DIR="/opt/backups/vpn-controller"
DATE=$(date +%Y%m%d)
BACKUP_FILE="vpn-controller-config-${DATE}.tar.gz"

mkdir -p $BACKUP_DIR

# Backup configurations
tar -czf "${BACKUP_DIR}/${BACKUP_FILE}" \
    /opt/vpn-exit-controller/configs/ \
    /opt/vpn-exit-controller/.env \
    /opt/vpn-exit-controller/docker-compose.yml \
    /opt/vpn-exit-controller/traefik/ \
    /etc/systemd/system/vpn-controller.service

# Keep only last 30 days of backups
find $BACKUP_DIR -name "vpn-controller-config-*.tar.gz" -mtime +30 -delete

echo "$(date): Configuration backup completed: $BACKUP_FILE"
```

#### Weekly Full System Backup

```
#!/bin/bash
# Weekly full system backup

BACKUP_DIR="/opt/backups/vpn-controller"
DATE=$(date +%Y%m%d)
FULL_BACKUP="vpn-controller-full-${DATE}.tar.gz"

# Stop services for a consistent backup
systemctl stop vpn-controller
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml down

# Create full backup
tar -czf "${BACKUP_DIR}/${FULL_BACKUP}" \
    /opt/vpn-exit-controller/ \
    /etc/systemd/system/vpn-controller.service \
    --exclude=/opt/vpn-exit-controller/venv/ \
    --exclude=/opt/vpn-exit-controller/logs/ \
    --exclude=/opt/vpn-exit-controller/data/cache/

# Start services
systemctl start vpn-controller

# Keep only last 4 weekly backups
find $BACKUP_DIR -name "vpn-controller-full-*.tar.gz" -mtime +28 -delete

echo "$(date): Full system backup completed: $FULL_BACKUP"
```

### Redis Database Backup

```
#!/bin/bash
# Redis backup script

BACKUP_DIR="/opt/backups/redis"
DATE=$(date +%Y%m%d-%H%M)
REDIS_BACKUP="redis-backup-${DATE}.rdb"

mkdir -p $BACKUP_DIR

# Create Redis backup
docker exec vpn-redis redis-cli BGSAVE
sleep 5

# Copy the backup file
docker cp vpn-redis:/data/dump.rdb "${BACKUP_DIR}/${REDIS_BACKUP}"

# Keep only last 7 days of Redis backups
find $BACKUP_DIR -name \
    "redis-backup-*.rdb" -mtime +7 -delete

echo "$(date): Redis backup completed: $REDIS_BACKUP"
```

### Recovery Testing Procedures

#### Monthly Recovery Test

```
#!/bin/bash
# Monthly recovery test procedure

TEST_DIR="/tmp/recovery-test-$(date +%Y%m%d)"
LATEST_BACKUP=$(ls -t /opt/backups/vpn-controller/vpn-controller-config-*.tar.gz | head -1)

echo "Testing recovery from backup: $LATEST_BACKUP"

# Create test directory
mkdir -p $TEST_DIR

# Extract backup
tar -xzf $LATEST_BACKUP -C $TEST_DIR

# Verify key files exist
REQUIRED_FILES=(
    "opt/vpn-exit-controller/configs/auth.txt"
    "opt/vpn-exit-controller/.env"
    "opt/vpn-exit-controller/docker-compose.yml"
)

RECOVERY_SUCCESS=true
for file in "${REQUIRED_FILES[@]}"; do
    if [ ! -f "$TEST_DIR/$file" ]; then
        echo "ERROR: Missing file in backup: $file"
        RECOVERY_SUCCESS=false
    fi
done

# Report the result
if [ "$RECOVERY_SUCCESS" = true ]; then
    echo "Recovery test PASSED"
else
    echo "Recovery test FAILED"
fi

# Cleanup
rm -rf $TEST_DIR
```

### Disaster Recovery Planning

#### Full System Recovery Procedure

1. **Prepare New LXC Container:**

    ```
    # On Proxmox host
    pct create 201 /var/lib/vz/template/cache/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
        --memory 4096 --cores 4 --rootfs local-lvm:50 \
        --net0 name=eth0,bridge=vmbr1,ip=10.10.10.20/24,gw=10.10.10.1 \
        --nameserver 8.8.8.8 --hostname vpn-exit-controller \
        --features nesting=1,keyctl=1 \
        --unprivileged 0
    ```

2. **Start Container and Install Dependencies:**

    ```
    pct start 201
    pct enter 201
    apt update && apt upgrade -y
    apt install -y docker.io docker-compose python3 python3-pip python3-venv curl
    systemctl enable docker
    systemctl start docker
    ```

3. **Restore from Backup:**

    ```
    # Copy latest backup to container
    LATEST_BACKUP=$(ls -t /opt/backups/vpn-controller/vpn-controller-full-*.tar.gz | head -1)
    tar -xzf $LATEST_BACKUP -C /

    # Restore permissions
    chown -R root:root /opt/vpn-exit-controller/
    chmod +x /opt/vpn-exit-controller/start.sh

    # Restore systemd service
    systemctl daemon-reload
    systemctl enable vpn-controller
    systemctl start vpn-controller
    ```

4. **Verify Recovery:**

    ```
    # Check service status
    systemctl status vpn-controller

    # Test API
    curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status

    # Check containers
    docker ps
    ```

---

## 4. Updates and Upgrades

### System Package Updates

#### Security Updates (Weekly)

```
#!/bin/bash
# Weekly security updates

LOG_FILE="/var/log/security-updates.log"

echo "$(date): Starting security updates" >> $LOG_FILE

# Update package lists
apt update >> $LOG_FILE 2>&1

# Install security updates only
DEBIAN_FRONTEND=noninteractive apt-get -y upgrade \
    -o Dpkg::Options::="--force-confdef" \
    -o Dpkg::Options::="--force-confold" \
    $(apt list --upgradable 2>/dev/null | grep -i security | cut -d/ -f1) >> $LOG_FILE 2>&1

# Check for reboot requirement
if [ -f /var/run/reboot-required ]; then
    echo "$(date): Reboot required after security updates" >> $LOG_FILE
fi

echo "$(date): Security updates completed" >> $LOG_FILE
```

#### Full System Updates (Monthly)

```
#!/bin/bash
# Monthly full system updates (scheduled maintenance window)

MAINTENANCE_LOG="/var/log/maintenance-updates.log"

echo "$(date): Starting maintenance window - full system update" >> $MAINTENANCE_LOG

# Stop VPN service
systemctl stop vpn-controller >> $MAINTENANCE_LOG 2>&1

# Update all packages
apt update && apt full-upgrade -y >> $MAINTENANCE_LOG 2>&1

# Clean up
apt autoremove -y >> $MAINTENANCE_LOG 2>&1
apt autoclean >> $MAINTENANCE_LOG 2>&1

# Update Docker images
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml pull >> $MAINTENANCE_LOG 2>&1

# Restart service
systemctl start vpn-controller >> $MAINTENANCE_LOG 2>&1

# Verify service is running
sleep 30
if systemctl is-active --quiet vpn-controller; then
    echo "$(date): Service restarted successfully" >> $MAINTENANCE_LOG
else
    echo "$(date): ERROR: Service failed to restart" >> $MAINTENANCE_LOG
fi

echo "$(date): Maintenance window completed" >> $MAINTENANCE_LOG
```

### Docker Image Updates

#### Check for Updates

```
#!/bin/bash
# Check for Docker image updates

IMAGES=(
    "redis:7-alpine"
    "traefik:v2.10"
)

for image in "${IMAGES[@]}"; do
    echo "Checking updates for $image"

    # Record the current image ID before pulling
    OLD_ID=$(docker images --no-trunc --quiet $image | head -1)

    # Pull latest
    docker pull $image

    # Compare image IDs after the pull
    NEW_ID=$(docker images --no-trunc --quiet $image | head -1)

    if [ "$OLD_ID" != "$NEW_ID" ]; then
        echo "Update pulled for $image"
    else
        echo "No update available for $image"
    fi
done
```

#### Update Custom VPN Node Image

```
#!/bin/bash
# Update VPN node image

cd /opt/vpn-exit-controller/vpn-node

# Build new image
docker build -t vpn-exit-node:latest .

# Test new image
docker run --rm vpn-exit-node:latest --version

# Restart containers with new image
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml up -d --force-recreate
```

### Application Code Updates

#### Update from Git Repository

```
#!/bin/bash
# Update application code from repository

cd /opt/vpn-exit-controller

# Backup current version
tar -czf "/opt/backups/pre-update-$(date +%Y%m%d).tar.gz" . \
    --exclude=venv --exclude=data

# Stop service
systemctl stop vpn-controller

# Pull updates (if using git)
# git pull origin main

# Update Python dependencies
source venv/bin/activate
pip install -r api/requirements.txt

# Restart service
systemctl start vpn-controller

# Verify update
sleep 30
curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status
```

### Dependency Management

#### Python Dependencies Audit

```
#!/bin/bash
# Audit Python dependencies for security vulnerabilities

cd /opt/vpn-exit-controller
source venv/bin/activate

# Check for outdated packages
pip list --outdated

# Security audit (install pip-audit if not available)
pip install pip-audit
pip-audit

# Generate requirements with exact versions
pip freeze > requirements-frozen.txt
```

---

## 5. Certificate Management

### SSL Certificate Monitoring

#### Check Certificate Expiration

```
#!/bin/bash
# Check SSL certificate expiration

CERT_FILE="/opt/vpn-exit-controller/traefik/letsencrypt/acme.json"
ALERT_DAYS=30

if [ -f "$CERT_FILE" ]; then
    # Extract certificate expiration dates
    python3 << EOF
import json
import base64
import datetime
from cryptography import x509

with open('$CERT_FILE', 'r') as f:
    acme_data = json.load(f)

for resolver in acme_data.values():
    for cert_data in resolver.get('Certificates', []):
        cert_pem = base64.b64decode(cert_data['certificate']).decode()
        cert = x509.load_pem_x509_certificate(cert_pem.encode())
        domain = cert_data['domain']['main']
        expiry = cert.not_valid_after
        days_left = (expiry - datetime.datetime.now()).days
        print(f"Domain: {domain}, Expires: {expiry}, Days left: {days_left}")
        if days_left < $ALERT_DAYS:
            print(f"WARNING: Certificate for {domain} expires in {days_left} days")
EOF
fi
```

### Let's Encrypt Certificate Renewal

#### Automatic Renewal Check

```
#!/bin/bash
# Check Let's Encrypt certificate renewal
# Traefik handles automatic renewal, but verify it's working

TRAEFIK_LOG="/opt/vpn-exit-controller/traefik/logs/traefik.log"

# Check for recent renewal attempts
if [ -f "$TRAEFIK_LOG" ]; then
    echo "Recent certificate renewal attempts:"
    grep -i "certificate\|acme" "$TRAEFIK_LOG" | tail -10
fi

# Verify certificate is valid
DOMAIN="your-domain.com"  # Replace with actual domain
echo | openssl s_client -servername $DOMAIN -connect $DOMAIN:443 2>/dev/null | openssl x509 -noout -dates
```

### Certificate Troubleshooting

#### Common Issues and Solutions

```
#!/bin/bash
# Certificate troubleshooting script

echo "Certificate Troubleshooting Report"
echo "=================================="

# Check Traefik configuration
echo "1. Checking Traefik configuration..."
docker-compose -f /opt/vpn-exit-controller/traefik/docker-compose.traefik.yml config

# Check ACME challenge accessibility
echo "2. Checking ACME challenge accessibility..."
DOMAIN="your-domain.com"  # Replace with actual domain
curl -I "http://$DOMAIN/.well-known/acme-challenge/test"

# Check DNS resolution
echo "3. Checking DNS resolution..."
nslookup $DOMAIN

# Check port accessibility
echo "4. Checking port 443 accessibility..."
nc -zv $DOMAIN 443

# Check Traefik dashboard
echo "5. Checking Traefik dashboard..."
curl -I http://localhost:8080/dashboard/
```

---

## 6. VPN Service Management

### NordVPN Credential Rotation

#### Monthly Credential Check

```
#!/bin/bash
# Check and rotate NordVPN credentials if needed

AUTH_FILE="/opt/vpn-exit-controller/configs/auth.txt"
CURRENT_DATE=$(date +%s)
FILE_AGE=$(stat -c %Y "$AUTH_FILE")
AGE_DAYS=$(( (CURRENT_DATE - FILE_AGE) / 86400 ))

echo "NordVPN credentials are $AGE_DAYS days old"

if [ $AGE_DAYS -gt 60 ]; then
    echo "WARNING: NordVPN credentials are older than 60 days"
    echo "Consider updating credentials in $AUTH_FILE"

    # Test current credentials
    echo "Testing current credentials..."
    # This would require implementing a credential test function
fi
```

#### Credential Update Procedure

```
#!/bin/bash
# Update NordVPN credentials

AUTH_FILE="/opt/vpn-exit-controller/configs/auth.txt"
BACKUP_FILE="/opt/backups/auth-backup-$(date +%Y%m%d).txt"

echo "Updating NordVPN credentials..."

# Backup current credentials
cp "$AUTH_FILE" "$BACKUP_FILE"

# Prompt for new credentials (in production, use a secure input method)
echo "Enter new NordVPN username:"
read -r NEW_USERNAME
echo "Enter new NordVPN password:"
read -rs NEW_PASSWORD

# Update auth file
echo "$NEW_USERNAME" > "$AUTH_FILE"
echo "$NEW_PASSWORD" >> "$AUTH_FILE"

# Restart VPN containers to use new credentials
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml restart

echo "Credentials updated. Testing connections..."

# Wait for containers to start
sleep 30

# Test API to verify VPN connections are working
curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status
```

### Server List Updates

#### Update NordVPN Server Configurations

```
#!/bin/bash
# Update NordVPN server configurations

SCRIPT_PATH="/opt/vpn-exit-controller/scripts/download-nordvpn-configs.sh"
CONFIG_DIR="/opt/vpn-exit-controller/configs/vpn"

echo "Updating NordVPN server configurations..."

# Backup current configurations
tar -czf "/opt/backups/vpn-configs-backup-$(date +%Y%m%d).tar.gz" "$CONFIG_DIR"

# Run the download script
if [ -f "$SCRIPT_PATH" ]; then
    bash "$SCRIPT_PATH"
    echo "Server configurations updated"

    # Restart service to load new configurations
    systemctl restart vpn-controller
else
    echo "ERROR: Download script not found at $SCRIPT_PATH"
fi
```

### Performance Optimization

#### VPN Node Performance Tuning

```
#!/bin/bash
# VPN node performance optimization

echo "VPN Performance Optimization Report"
echo "==================================="

# Check connection speeds for each country
COUNTRIES=("us" "uk" "de" "jp" "au")

for country in "${COUNTRIES[@]}"; do
    echo "Testing $country nodes..."

    # Use speed test API endpoint
    SPEED_RESULT=$(curl -s -u admin:Bl4ckMagic!2345erver \
        "http://localhost:8080/api/speed-test/$country" | jq '.download_speed')
    echo "$country: $SPEED_RESULT Mbps"
done

# Check for underperforming nodes
echo -e "\nChecking for underperforming nodes..."
curl -s -u admin:Bl4ckMagic!2345erver \
    "http://localhost:8080/api/metrics/nodes" | \
    jq '.[] | select(.performance_score < 0.7) | {country: .country, score: .performance_score}'
```

### Service Status Monitoring

#### Comprehensive VPN Service Check

```
#!/bin/bash
# Comprehensive VPN service status check

STATUS_LOG="/var/log/vpn-service-status.log"
TIMESTAMP=$(date)

echo "[$TIMESTAMP] VPN Service Status Check" >> $STATUS_LOG

# Check each VPN container
VPN_CONTAINERS=$(docker ps --filter "name=vpn-" --format "{{.Names}}")

for container in $VPN_CONTAINERS; do
    # Check container health
    HEALTH=$(docker inspect --format='{{.State.Health.Status}}' $container 2>/dev/null || echo "no health check")

    # Check VPN connection
    IP_CHECK=$(docker exec $container curl -s --max-time 10 ifconfig.me 2>/dev/null || echo "failed")

    echo "[$TIMESTAMP] $container: Health=$HEALTH, IP=$IP_CHECK" >> $STATUS_LOG
done

# Check overall API health
API_HEALTH=$(curl -s -u admin:Bl4ckMagic!2345erver -o /dev/null -w "%{http_code}" http://localhost:8080/api/status)
echo "[$TIMESTAMP] API Health: HTTP $API_HEALTH" >> $STATUS_LOG

# Check Redis connectivity
REDIS_HEALTH=$(docker exec vpn-redis redis-cli ping 2>/dev/null || echo "FAILED")
echo "[$TIMESTAMP] Redis Health: $REDIS_HEALTH" >> $STATUS_LOG
```

---
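The status check writes one line per probe, so failing entries can be surfaced with a simple filter. The sketch below is a hypothetical helper (`flag_unhealthy` is not part of the deployed tooling); the patterns match the `Health=`, `IP=`, and `Redis Health:` line formats produced by the check above.

```shell
#!/bin/bash
# flag_unhealthy LOGFILE - print only status-log lines that indicate a failed check
flag_unhealthy() {
    grep -E 'Health=unhealthy|IP=failed|Redis Health: FAILED|API Health: HTTP [45]' "$1"
}

# Demo against a sample log in the format written by the status check
LOG=$(mktemp)
cat > "$LOG" << 'SAMPLE'
[ts] vpn-us-1: Health=healthy, IP=203.0.113.7
[ts] vpn-de-2: Health=unhealthy, IP=failed
[ts] API Health: HTTP 200
[ts] Redis Health: FAILED
SAMPLE
flag_unhealthy "$LOG"   # prints only the failing entries
rm -f "$LOG"
```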
## 7. Capacity Management

### Resource Usage Monitoring

#### Real-time Resource Monitor

```
#!/bin/bash
# Real-time resource monitoring script

MONITOR_LOG="/var/log/resource-monitor.log"

while true; do
    TIMESTAMP=$(date "+%Y-%m-%d %H:%M:%S")

    # System resources
    CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | awk -F'%' '{print $1}')
    MEM_USAGE=$(free | grep Mem | awk '{printf("%.1f"), $3/$2 * 100.0}')
    DISK_USAGE=$(df /opt | awk 'NR==2 {print $5}' | sed 's/%//')

    # Docker resources
    DOCKER_CONTAINERS=$(docker ps -q | wc -l)

    # Network connections
    CONNECTIONS=$(ss -tuln | wc -l)

    echo "$TIMESTAMP,$CPU_USAGE,$MEM_USAGE,$DISK_USAGE,$DOCKER_CONTAINERS,$CONNECTIONS" >> $MONITOR_LOG

    sleep 300  # 5-minute intervals
done
```

### Scaling Decisions

#### Auto-scaling Triggers

```
#!/bin/bash
# Auto-scaling decision logic

SCALE_LOG="/var/log/scaling-decisions.log"
TIMESTAMP=$(date)

# Get current metrics
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | awk -F'%' '{print $1}')
ACTIVE_CONNECTIONS=$(curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/connections | jq '.active_connections')
RUNNING_NODES=$(docker ps --filter "name=vpn-" -q | wc -l)

echo "[$TIMESTAMP] Scaling Check: CPU=$CPU_USAGE%, Connections=$ACTIVE_CONNECTIONS, Nodes=$RUNNING_NODES" >> $SCALE_LOG

# Scale-up logic
if (( $(echo "$CPU_USAGE > 70" | bc -l) )) && [ "$ACTIVE_CONNECTIONS" -gt 100 ] && [ "$RUNNING_NODES" -lt 10 ]; then
    echo "[$TIMESTAMP] SCALE UP: High load detected" >> $SCALE_LOG
    # Trigger scale up via API
    curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes/scale-up
fi

# Scale-down logic
if (( $(echo "$CPU_USAGE < 30" | bc -l) )) && [ "$ACTIVE_CONNECTIONS" -lt 20 ] && [ "$RUNNING_NODES" -gt 3 ]; then
    echo "[$TIMESTAMP] SCALE DOWN: Low load detected" >> $SCALE_LOG
    # Trigger scale down via API
    curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes/scale-down
fi
```

### Node Capacity Planning

#### Capacity Planning Analysis

```
#!/bin/bash
# Node capacity planning analysis

REPORT_FILE="/var/log/capacity-analysis-$(date +%Y%m%d).log"

echo "Node Capacity Planning Analysis - $(date)" > $REPORT_FILE
echo "=========================================" >> $REPORT_FILE

# Current node utilization
echo -e "\nCurrent Node Utilization:" >> $REPORT_FILE
curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/nodes | \
    jq -r '.[] | "\(.country): \(.active_connections) connections, \(.cpu_usage)% CPU, \(.memory_usage)% Memory"' >> $REPORT_FILE

# Peak usage analysis (last 7 days)
echo -e "\nPeak Usage Analysis (Last 7 Days):" >> $REPORT_FILE

# Historical connection data; the monitor log is plain CSV:
# timestamp,cpu,mem,disk,containers,connections (field 6 = connections)
PEAK_CONNECTIONS=$(tail -2016 /var/log/resource-monitor.log | \
    cut -d',' -f6 | sort -n | tail -1)  # Last 7 days of 5-min intervals
echo "Peak concurrent connections: $PEAK_CONNECTIONS" >> $REPORT_FILE

# Capacity recommendations
echo -e "\nCapacity Recommendations:" >> $REPORT_FILE
if [ "$PEAK_CONNECTIONS" -gt 500 ]; then
    echo "- Consider adding more VPN nodes for high-demand countries" >> $REPORT_FILE
fi
if [ "$(docker ps -q | wc -l)" -gt 15 ]; then
    echo "- Current container count is high, consider resource optimization" >> $REPORT_FILE
fi
echo "- Monitor load balancing effectiveness across regions" >> $REPORT_FILE
```

### Performance Optimization

#### System Performance Tuning

```
#!/bin/bash
# System performance optimization

echo "Applying performance optimizations..."

# Network optimizations (note: appending blindly duplicates entries on re-runs;
# a drop-in file under /etc/sysctl.d/ is safer for repeated use)
echo 'net.core.rmem_max = 16777216' >> /etc/sysctl.conf
echo 'net.core.wmem_max = 16777216' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_rmem = 4096 65536 16777216' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_wmem = 4096 65536 16777216' >> /etc/sysctl.conf

# Apply changes
sysctl -p

# Docker performance optimizations
echo '{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 64000,
      "Soft": 64000
    }
  }
}' > /etc/docker/daemon.json

systemctl restart docker

echo "Performance optimizations applied"
```

---

## 8. Security Maintenance

### Security Patch Management

#### Critical Security Updates

```
#!/bin/bash
# Critical security update management

SECURITY_LOG="/var/log/security-updates.log"
TIMESTAMP=$(date)

echo "[$TIMESTAMP] Starting critical security check" >> $SECURITY_LOG

# Check for available security updates
SECURITY_UPDATES=$(apt list --upgradable 2>/dev/null | grep -i security | wc -l)

if [ "$SECURITY_UPDATES" -gt 0 ]; then
    echo "[$TIMESTAMP] $SECURITY_UPDATES security updates available" >> $SECURITY_LOG

    # Apply critical security updates immediately
    DEBIAN_FRONTEND=noninteractive apt-get -y install \
        $(apt list --upgradable 2>/dev/null | grep -i security | cut -d/ -f1) \
        >> $SECURITY_LOG 2>&1

    # Check if reboot is required
    if [ -f /var/run/reboot-required ]; then
        echo "[$TIMESTAMP] CRITICAL: Reboot required for security updates" >> $SECURITY_LOG
        # Schedule reboot during maintenance window
        shutdown -r +60 "Security updates require reboot"
    fi
else
    echo "[$TIMESTAMP] No security updates available" >> $SECURITY_LOG
fi

# Update Docker base images for security
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml pull >> $SECURITY_LOG 2>&1
```

### Credential Rotation Schedules

#### Quarterly Credential Rotation

```
#!/bin/bash
# Quarterly credential rotation

ROTATION_LOG="/var/log/credential-rotation.log"
TIMESTAMP=$(date)

echo "[$TIMESTAMP] Starting quarterly
credential rotation" >> $ROTATION_LOG

# 1. API Admin Password Rotation
echo "[$TIMESTAMP] Rotating API admin password" >> $ROTATION_LOG
NEW_API_PASSWORD=$(openssl rand -base64 32)

# Update environment file ("|" delimiter: base64 output can contain "/")
sed -i "s|ADMIN_PASS=.*|ADMIN_PASS=$NEW_API_PASSWORD|" /opt/vpn-exit-controller/.env

# 2. Redis Password (if used)
# NEW_REDIS_PASSWORD=$(openssl rand -base64 32)

# 3. Secret Key Rotation
NEW_SECRET_KEY=$(openssl rand -base64 64)
sed -i "s|SECRET_KEY=.*|SECRET_KEY=$NEW_SECRET_KEY|" /opt/vpn-exit-controller/.env

# 4. Restart services with new credentials
systemctl restart vpn-controller

echo "[$TIMESTAMP] Credential rotation completed" >> $ROTATION_LOG
# NOTE: writing the new password to a log file stores it in plaintext;
# prefer delivering it via a secrets manager or a root-only file
echo "[$TIMESTAMP] New API password: $NEW_API_PASSWORD" >> $ROTATION_LOG
```

### Access Review Procedures

#### Monthly Access Review

```
#!/bin/bash
# Monthly access review

REVIEW_LOG="/var/log/access-review-$(date +%Y%m).log"

echo "Monthly Access Review - $(date)" > $REVIEW_LOG
echo "=================================" >> $REVIEW_LOG

# Review systemd service permissions
echo -e "\nService File Permissions:" >> $REVIEW_LOG
ls -la /etc/systemd/system/vpn-controller.service >> $REVIEW_LOG

# Review application directory permissions
echo -e "\nApplication Directory Permissions:" >> $REVIEW_LOG
ls -la /opt/vpn-exit-controller/ >> $REVIEW_LOG

# Review Docker socket access
echo -e "\nDocker Socket Access:" >> $REVIEW_LOG
ls -la /var/run/docker.sock >> $REVIEW_LOG

# Review network exposure
echo -e "\nNetwork Exposure:" >> $REVIEW_LOG
ss -tuln >> $REVIEW_LOG

# Review recent authentication attempts
echo -e "\nRecent Authentication Attempts:" >> $REVIEW_LOG
journalctl -u vpn-controller --since "30 days ago" | grep -i "auth\|login" | tail -20 >> $REVIEW_LOG
```

### Security Audit Schedules

#### Comprehensive Security Audit

```
#!/bin/bash
# Comprehensive security audit

AUDIT_LOG="/var/log/security-audit-$(date +%Y%m%d).log"

echo "Security Audit Report - $(date)" > $AUDIT_LOG
echo "===============================" >> $AUDIT_LOG

# 1. System vulnerability scan
echo -e "\n1. System Vulnerability Scan:" >> $AUDIT_LOG
if command -v lynis &> /dev/null; then
    lynis audit system --quiet >> $AUDIT_LOG
else
    echo "Lynis not installed. Install with: apt install lynis" >> $AUDIT_LOG
fi

# 2. Open ports audit
echo -e "\n2. Open Ports Audit:" >> $AUDIT_LOG
nmap -sS -p- localhost >> $AUDIT_LOG 2>&1

# 3. Docker security audit
echo -e "\n3. Docker Security Audit:" >> $AUDIT_LOG
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
    -v /usr/lib/systemd:/usr/lib/systemd \
    -v /etc:/etc --label docker_bench_security \
    docker/docker-bench-security >> $AUDIT_LOG 2>/dev/null || echo "Docker Bench Security not available" >> $AUDIT_LOG

# 4. File permissions audit
echo -e "\n4. Critical File Permissions:" >> $AUDIT_LOG
CRITICAL_FILES=(
    "/opt/vpn-exit-controller/.env"
    "/opt/vpn-exit-controller/configs/auth.txt"
    "/etc/systemd/system/vpn-controller.service"
)
for file in "${CRITICAL_FILES[@]}"; do
    if [ -f "$file" ]; then
        ls -la "$file" >> $AUDIT_LOG
    fi
done

# 5. Process audit
echo -e "\n5. Running Processes:" >> $AUDIT_LOG
ps aux | grep -E "(vpn|docker|redis)" >> $AUDIT_LOG

echo -e "\nSecurity audit completed. Review $AUDIT_LOG for findings." >> $AUDIT_LOG
```

---
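The file-permissions portion of the audit only lists permissions; it can also be made pass/fail. The sketch below is a minimal, hypothetical helper (the expected mode of 600/400 is an assumption; adjust it to your policy), checking the same secret files the audit lists.

```shell
#!/bin/bash
# check_secret_perms FILE - flag secret files that are readable by group/other
check_secret_perms() {
    local f=$1 mode
    if [ ! -f "$f" ]; then
        echo "missing: $f"
        return 0
    fi
    mode=$(stat -c '%a' "$f")   # GNU stat (Linux)
    case "$mode" in
        600|400) echo "OK: $f (mode $mode)" ;;
        *)       echo "INSECURE: $f (mode $mode, expected 600)" ;;
    esac
}

check_secret_perms /opt/vpn-exit-controller/.env
check_secret_perms /opt/vpn-exit-controller/configs/auth.txt
```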
## 9. Documentation Updates

### Keeping Documentation Current

#### Documentation Update Checklist

```
## Monthly Documentation Review Checklist

- [ ] Review API_DOCUMENTATION.md for new endpoints
- [ ] Update ARCHITECTURE.md for infrastructure changes
- [ ] Check DEPLOYMENT.md for new deployment procedures
- [ ] Verify SECURITY.md reflects current security measures
- [ ] Update this MAINTENANCE.md for new procedures
- [ ] Review configuration examples for accuracy
- [ ] Update version numbers and dependencies
- [ ] Check command examples are current
- [ ] Verify troubleshooting guides are accurate
- [ ] Update contact information and escalation procedures
```

#### Automated Documentation Validation

```
#!/bin/bash
# Validate documentation accuracy

DOC_DIR="/opt/vpn-exit-controller"
VALIDATION_LOG="/var/log/doc-validation.log"
TIMESTAMP=$(date)

echo "[$TIMESTAMP] Starting documentation validation" > $VALIDATION_LOG

# Check if mentioned files exist
DOCS=("API_DOCUMENTATION.md" "ARCHITECTURE.md" "DEPLOYMENT.md" "SECURITY.md" "MAINTENANCE.md")

for doc in "${DOCS[@]}"; do
    if [ -f "$DOC_DIR/$doc" ]; then
        echo "[$TIMESTAMP] ✓ $doc exists" >> $VALIDATION_LOG

        # Check for outdated information
        LAST_MODIFIED=$(stat -c %Y "$DOC_DIR/$doc")
        CURRENT_TIME=$(date +%s)
        DAYS_OLD=$(( (CURRENT_TIME - LAST_MODIFIED) / 86400 ))
        if [ $DAYS_OLD -gt 90 ]; then
            echo "[$TIMESTAMP] ⚠ $doc is $DAYS_OLD days old (review needed)" >> $VALIDATION_LOG
        fi
    else
        echo "[$TIMESTAMP] ✗ $doc missing" >> $VALIDATION_LOG
    fi
done

# Validate command examples in documentation
echo "[$TIMESTAMP] Validating command examples..." >> $VALIDATION_LOG

# Test API endpoint mentioned in docs
if curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status > /dev/null; then
    echo "[$TIMESTAMP] ✓ API endpoint accessible" >> $VALIDATION_LOG
else
    echo "[$TIMESTAMP] ✗ API endpoint not accessible" >> $VALIDATION_LOG
fi

# Test service commands
if systemctl is-active --quiet vpn-controller; then
    echo "[$TIMESTAMP] ✓ vpn-controller service running" >> $VALIDATION_LOG
else
    echo "[$TIMESTAMP] ✗ vpn-controller service not running" >> $VALIDATION_LOG
fi
```

### Change Log Maintenance

#### Automated Change Log Updates

```
#!/bin/bash
# Update CHANGELOG.md with recent changes

CHANGELOG="/opt/vpn-exit-controller/CHANGELOG.md"
TEMP_LOG="/tmp/changelog-temp.md"

# Get recent commits (if using git)
if [ -d ".git" ]; then
    echo "## $(date +%Y-%m-%d) - Maintenance Update" > $TEMP_LOG
    echo "" >> $TEMP_LOG

    # Add git log entries
    git log --since="1 month ago" --pretty=format:"- %s (%an)" >> $TEMP_LOG
    echo "" >> $TEMP_LOG
    echo "" >> $TEMP_LOG

    # Prepend to existing changelog
    if [ -f "$CHANGELOG" ]; then
        cat "$CHANGELOG" >> $TEMP_LOG
        mv "$TEMP_LOG" "$CHANGELOG"
    else
        mv "$TEMP_LOG" "$CHANGELOG"
    fi
fi
```

### Configuration Documentation

#### Auto-generate Configuration Documentation

```
#!/bin/bash
# Generate current configuration documentation

CONFIG_DOC="/opt/vpn-exit-controller/CONFIG_CURRENT.md"

echo "# Current System Configuration" > $CONFIG_DOC
echo "Generated on: $(date)" >> $CONFIG_DOC
echo "" >> $CONFIG_DOC

# System information
echo "## System Information" >> $CONFIG_DOC
echo "- **OS**: $(lsb_release -d | cut -f2)" >> $CONFIG_DOC
echo "- **Kernel**: $(uname -r)" >> $CONFIG_DOC
echo "- **Docker Version**: $(docker --version)" >> $CONFIG_DOC
echo "- **Python Version**: $(python3 --version)" >> $CONFIG_DOC
echo "" >> $CONFIG_DOC

# Service status
echo "## Service Status" >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
systemctl status vpn-controller --no-pager >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
echo "" >> $CONFIG_DOC

# Docker containers
echo "## Running Containers" >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
echo "" >> $CONFIG_DOC

# Network configuration
echo "## Network Configuration" >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
ip addr show | grep -E "(inet|link)" >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
```

---

## 10. Emergency Procedures

### System Outage Response

#### Emergency Response Playbook

```
#!/bin/bash
# Emergency response script for system outages

EMERGENCY_LOG="/var/log/emergency-response.log"
TIMESTAMP=$(date)

echo "[$TIMESTAMP] EMERGENCY: System outage detected" >> $EMERGENCY_LOG

# 1. Immediate Assessment
echo "[$TIMESTAMP] Step 1: Immediate assessment" >> $EMERGENCY_LOG

# Check if the LXC container is running
if pct status 201 | grep -q "status: running"; then
    echo "[$TIMESTAMP] ✓ LXC container is running" >> $EMERGENCY_LOG
else
    echo "[$TIMESTAMP] ✗ LXC container is stopped - starting now" >> $EMERGENCY_LOG
    pct start 201
    sleep 30
fi

# Check core services
SERVICES=("docker" "vpn-controller")
for service in "${SERVICES[@]}"; do
    if systemctl is-active --quiet $service; then
        echo "[$TIMESTAMP] ✓ $service is running" >> $EMERGENCY_LOG
    else
        echo "[$TIMESTAMP] ✗ $service is stopped - restarting" >> $EMERGENCY_LOG
        systemctl restart $service
    fi
done

# 2. Quick Recovery Attempt
echo "[$TIMESTAMP] Step 2: Quick recovery attempt" >> $EMERGENCY_LOG

# Restart VPN controller
systemctl restart vpn-controller
sleep 60

# Test API
if curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status > /dev/null; then
    echo "[$TIMESTAMP] ✓ API is responding - emergency recovery successful" >> $EMERGENCY_LOG
    exit 0
fi

# 3.
Docker Recovery echo "[$TIMESTAMP] Step 3: Docker container recovery" >> $EMERGENCY_LOG cd /opt/vpn-exit-controller docker-compose down docker-compose up -d sleep 120 # Test again if curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status > /dev/null; then echo "[$TIMESTAMP] ✓ API responding after Docker restart" >> $EMERGENCY_LOG exit 0 fi # 4. Full System Recovery echo "[$TIMESTAMP] Step 4: Full system recovery required" >> $EMERGENCY_LOG echo "[$TIMESTAMP] Escalating to manual intervention" >> $EMERGENCY_LOG # Generate diagnostic report /opt/vpn-exit-controller/scripts/generate-diagnostic-report.sh >> $EMERGENCY_LOG echo "[$TIMESTAMP] Emergency procedures completed - manual intervention required" >> $EMERGENCY_LOG ``` ### Emergency Contact Procedures #### Contact List and Escalation: ``` ## Emergency Contact Procedures ### Severity Levels #### P1 - Critical (Response Time: 15 minutes) - Complete system outage - Security breach - Data loss #### P2 - High (Response Time: 2 hours) - Partial system outage - Performance degradation > 50% - Single node failures affecting service #### P3 - Medium (Response Time: 24 hours) - Minor performance issues - Single container failures - Non-critical errors #### P4 - Low (Response Time: 72 hours) - Cosmetic issues - Documentation updates - Feature requests ### Contact Information **Primary On-Call Engineer** - Name: [Your Name] - Phone: [Phone Number] - Email: [Email Address] - Signal/WhatsApp: [Number] **Secondary Engineer** - Name: [Backup Name] - Phone: [Phone Number] - Email: [Email Address] **Infrastructure Team** - Email: infrastructure@yourcompany.com - Slack: #infrastructure-alerts ### Escalation Matrix 1. **0-15 minutes**: Primary engineer response 2. **15-30 minutes**: Escalate to secondary engineer 3. **30-60 minutes**: Escalate to infrastructure team lead 4. 
**1+ hours**: Escalate to engineering manager ``` ### Critical Issue Escalation #### Automated Escalation Script: ``` #!/bin/bash # Automated escalation for critical issues ISSUE_TYPE="$1" SEVERITY="$2" DETAILS="$3" ESCALATION_LOG="/var/log/escalation.log" TIMESTAMP=$(date) echo "[$TIMESTAMP] ESCALATION: $ISSUE_TYPE (Severity: $SEVERITY)" >> $ESCALATION_LOG echo "[$TIMESTAMP] Details: $DETAILS" >> $ESCALATION_LOG # Generate system snapshot SNAPSHOT_FILE="/tmp/system-snapshot-$(date +%Y%m%d-%H%M%S).txt" cat > $SNAPSHOT_FILE << EOF EMERGENCY SYSTEM SNAPSHOT ======================== Time: $(date) Issue: $ISSUE_TYPE Severity: $SEVERITY Details: $DETAILS System Status: $(systemctl status vpn-controller --no-pager) Docker Status: $(docker ps -a) Resource Usage: $(free -h) $(df -h) Recent Logs: $(journalctl -u vpn-controller --since "1 hour ago" --no-pager | tail -50) Network Status: $(ss -tuln) EOF # Send alerts based on severity case $SEVERITY in "P1"|"CRITICAL") # Immediate alerts echo "[$TIMESTAMP] Sending P1 alerts" >> $ESCALATION_LOG # curl -X POST webhook-url or send email ;; "P2"|"HIGH") echo "[$TIMESTAMP] Sending P2 alerts" >> $ESCALATION_LOG # Send high priority alerts ;; *) echo "[$TIMESTAMP] Logging issue for review" >> $ESCALATION_LOG ;; esac echo "[$TIMESTAMP] Escalation process initiated" >> $ESCALATION_LOG ``` ### Recovery Time Objectives #### RTO/RPO Definitions: ``` ## Recovery Time and Point Objectives ### Recovery Time Objectives (RTO) | Component | Target RTO | Maximum Acceptable | |-----------|------------|-------------------| | VPN API Service | 5 minutes | 15 minutes | | Individual VPN Nodes | 2 minutes | 5 minutes | | Database (Redis) | 3 minutes | 10 minutes | | Full System | 15 minutes | 30 minutes | | Container Infrastructure | 10 minutes | 20 minutes | ### Recovery Point Objectives (RPO) | Data Type | Target RPO | Backup Frequency | |-----------|------------|------------------| | Configuration Files | 0 minutes | Real-time sync | | System 
Metrics | 5 minutes | Continuous | | Connection Logs | 15 minutes | Every 15 minutes | | Performance Data | 1 hour | Hourly snapshots | ### Service Level Objectives (SLO) - **Availability**: 99.9% (8.76 hours downtime/year) - **API Response Time**: < 500ms (95th percentile) - **VPN Connection Success Rate**: > 99% - **Data Recovery Success Rate**: 100% ### Disaster Recovery Scenarios #### Scenario 1: Complete LXC Container Failure - **RTO**: 30 minutes - **Procedure**: Restore from latest backup to new container - **Validation**: Full service test suite #### Scenario 2: Docker Infrastructure Failure - **RTO**: 15 minutes - **Procedure**: Restart Docker daemon, rebuild containers - **Validation**: Container health checks #### Scenario 3: Database Corruption - **RTO**: 10 minutes - **Procedure**: Restore Redis from latest backup - **Validation**: Data integrity verification #### Scenario 4: Configuration Corruption - **RTO**: 5 minutes - **Procedure**: Restore from configuration backup - **Validation**: Service functionality test ``` --- ## Maintenance Schedule Summary ### Daily (Automated) - ✅ Health checks (6:00 AM) - ✅ Resource monitoring (Every 5 minutes) - ✅ Log rotation checks - ✅ Basic performance metrics ### Weekly - ✅ Performance review (Sunday 2:00 AM) - ✅ Security updates (Sunday 3:00 AM) - ✅ Configuration backup verification - ✅ VPN node performance analysis ### Monthly - ✅ Full system updates (First Sunday 3:00 AM) - ✅ Access review - ✅ Documentation validation - ✅ Capacity planning review - ✅ Recovery testing ### Quarterly - ✅ Comprehensive security audit - ✅ Credential rotation - ✅ Disaster recovery testing - ✅ Performance optimization review - ✅ Documentation major updates --- ## Quick Reference Commands ### Emergency Commands ``` # Emergency service restart systemctl restart vpn-controller # Check service status systemctl status vpn-controller # View real-time logs journalctl -u vpn-controller -f # Test API health curl -u admin:Bl4ckMagic!2345erver 
http://localhost:8080/api/status # Docker container status docker ps -a # Resource usage htop df -h free -h ``` ### Maintenance Commands ``` # Activate Python environment source /opt/vpn-exit-controller/venv/bin/activate # Update system packages apt update && apt upgrade -y # Restart all containers docker-compose -f /opt/vpn-exit-controller/docker-compose.yml restart # Create configuration backup tar -czf backup-$(date +%Y%m%d).tar.gz /opt/vpn-exit-controller/configs/ # Check certificate expiration openssl x509 -in cert.pem -noout -dates ``` --- **Document Version**: 1.0 **Last Updated**: $(date) **Next Review**: $(date -d "+1 month") --- *This maintenance guide should be reviewed and updated monthly to ensure accuracy and completeness. All procedures should be tested in a development environment before applying to production.* --- ## Operations > Monitoring This guide covers monitoring the VPN Exit Controller system for health, performance, and security. ## Overview Effective monitoring is crucial for maintaining a reliable VPN exit node service. This guide covers the tools and procedures for monitoring all aspects of the system. 
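The individual checks described in the sections below can be rolled into a single pass/fail summary. Here is a minimal sketch; the `check` helper is my own, and the service and endpoint names are assumptions taken from elsewhere in this guide, so adjust them to your deployment:

```shell
#!/bin/sh
# One-shot status summary (sketch). Service names and the health URL
# are assumptions based on this guide; adjust as needed.

check() {
  # Usage: check <label> <command...>
  # Prints "OK   <label>" when the command succeeds, "FAIL <label>" otherwise.
  label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $label"
  else
    echo "FAIL $label"
  fi
}

check "docker service"         systemctl is-active --quiet docker
check "vpn-controller service" systemctl is-active --quiet vpn-controller
check "api health"             curl -sf https://exit.idlegaming.org/api/health
```

Each result is one grep-able line, so `./status.sh | grep FAIL` gives a quick failure list during an incident.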
## System Monitoring

### Service Health

Monitor core services:

```
# Check VPN controller service
systemctl status vpn-controller

# Check documentation webhook
systemctl status docs-webhook

# Check Docker services
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.RunningFor}}"
```

### Container Health

Monitor VPN exit node containers:

```
# Check all VPN containers
docker ps | grep vpn-exit

# Check specific country node
docker inspect vpn-exit-us --format='{{.State.Health.Status}}'

# View container resource usage
docker stats --no-stream
```

## Performance Monitoring

### API Performance

```
# Check API response time
time curl -s https://exit.idlegaming.org/api/health

# Monitor API logs
journalctl -u vpn-controller -f | grep -E "(ERROR|WARNING|response_time)"
```

### Proxy Performance

```
# Test proxy response time
time curl -x http://admin:password@proxy-us.exit.idlegaming.org:3128 http://httpbin.org/ip

# Check HAProxy statistics (socat runs inside the container, so pipe via docker exec -i)
echo "show stat" | docker exec -i haproxy-default socat stdio /var/run/haproxy/admin.sock
```

## Log Monitoring

### Key Log Locations
- **API Logs**: `journalctl -u vpn-controller -f`
- **Webhook Logs**: `/opt/vpn-exit-controller/logs/webhook.log`
- **Docker Logs**: `docker logs [container-name]`
- **Rebuild Logs**: `/opt/vpn-exit-controller/logs/docs-rebuild.log`

### Log Analysis

```
# Check for errors in last hour
journalctl -u vpn-controller --since "1 hour ago" | grep ERROR

# Monitor real-time errors
tail -f /var/log/syslog | grep -E "(error|fail|critical)"

# Check container restart frequency
docker ps -a --filter "name=vpn-" --format "table {{.Names}}\t{{.Status}}"
```

## Metrics Collection

### System Metrics

```
# CPU and Memory usage
top -b -n 1 | head -20

# Disk usage
df -h | grep -E "(/$|/opt|/var)"

# Network connections
ss -tunap | grep -E "(8080|3128|1080)"
```

### Container Metrics

```
# Container resource usage
docker stats --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}"

# Container network stats
docker exec vpn-exit-us cat /proc/net/dev
```

## Alerting

### Basic Health Checks

Create a simple monitoring script:

```
#!/bin/bash
# /opt/vpn-exit-controller/scripts/health-check.sh

# Check API
if ! curl -sf https://exit.idlegaming.org/api/health > /dev/null; then
    echo "ALERT: API is down"
fi

# Check containers
if [ $(docker ps | grep -c vpn-exit) -lt 10 ]; then
    echo "ALERT: Some VPN containers are down"
fi

# Check disk space
if [ $(df -h / | awk 'NR==2 {print $5}' | sed 's/%//') -gt 80 ]; then
    echo "ALERT: Disk usage above 80%"
fi
```

### Automated Monitoring

Set up a cron job for regular checks:

```
# Add to crontab
*/5 * * * * /opt/vpn-exit-controller/scripts/health-check.sh >> /opt/vpn-exit-controller/logs/health-check.log 2>&1
```

## Dashboard Monitoring

Access monitoring dashboards:

- **VPN Dashboard**: https://exit.idlegaming.org/
- **API Metrics**: https://exit.idlegaming.org/api/metrics/summary
- **Node Status**: https://exit.idlegaming.org/api/nodes

## Troubleshooting Common Issues

### High CPU Usage
1. Check container stats: `docker stats`
2. Identify problematic container
3. Restart if necessary: `docker restart [container]`

### Memory Leaks
1. Monitor memory over time: `free -m -s 5`
2. Check for growing processes: `ps aux | sort -nrk 4 | head`
3. Restart services if needed

### Network Issues
1. Check connectivity: `ping -c 4 8.8.8.8`
2. Verify DNS: `nslookup exit.idlegaming.org`
3. Check firewall: `ufw status`

## Best Practices

1. **Regular Reviews**: Check logs daily
2. **Automate Alerts**: Set up automated monitoring
3. **Document Issues**: Keep a log of problems and solutions
4. **Capacity Planning**: Monitor trends for growth
5. **Security Monitoring**: Watch for unusual activity

---

!!! tip "Monitoring Tools"
    Consider implementing additional monitoring tools like Prometheus, Grafana, or Netdata for more comprehensive monitoring capabilities.
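A cron-driven health check like the one above re-prints the same ALERT every five minutes for as long as a problem persists. One way to cut that noise is to report only state changes. The sketch below is not part of the shipped scripts: the `report_change` helper and the `STATE_DIR` path are assumptions of mine.

```shell
#!/bin/sh
# Alert only on state changes to avoid repeating the same message every
# cron run. STATE_DIR is an assumed path; change it to suit your host.
STATE_DIR="${STATE_DIR:-/tmp/health-state}"
mkdir -p "$STATE_DIR"

report_change() {
  # Usage: report_change <name> up|down
  # Emits an ALERT/RECOVERED line only when the status differs from the
  # previously recorded one, then records the new status.
  name="$1"; status="$2"
  last=$(cat "$STATE_DIR/$name" 2>/dev/null || echo "unknown")
  if [ "$status" != "$last" ]; then
    case "$status" in
      down) echo "ALERT: $name is down" ;;
      up)   [ "$last" = "down" ] && echo "RECOVERED: $name is back up" ;;
    esac
    echo "$status" > "$STATE_DIR/$name"
  fi
}

# Example wiring with the API check from this guide:
# if curl -sf https://exit.idlegaming.org/api/health >/dev/null; then
#   report_change api up
# else
#   report_change api down
# fi
```

With this in place a sustained outage produces one ALERT line and one RECOVERED line in the log instead of hundreds of duplicates.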
---

## Operations > Scaling

### Scaling Guide

This guide covers strategies for scaling the VPN Exit Controller system to handle increased load and geographic expansion.

## Overview

The VPN Exit Controller is designed to scale both vertically (more resources) and horizontally (more nodes). This guide covers both approaches and when to use each.

## Vertical Scaling

### Resource Upgrades

Increase resources for the existing LXC container:

```
# On Proxmox host
# Increase CPU cores
pct set 201 --cores 8

# Increase memory
pct set 201 --memory 16384

# Increase disk
pct resize 201 rootfs +20G
```

### Container Limits

Adjust Docker resource limits:

```
# In docker-compose.yml
services:
  vpn-exit-us:
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 2G
        reservations:
          cpus: '1.0'
          memory: 1G
```

## Horizontal Scaling

### Adding More VPN Nodes

1. **Add new country nodes**:

```
# Use the API to start new nodes
curl -X POST -u admin:password \
  https://exit.idlegaming.org/api/nodes/br/start
```

2. **Configure load balancing**:

```
# Add to HAProxy configuration
server vpn-br-1 100.73.33.20:3128 check
server vpn-br-2 100.73.33.21:3128 check
```

### Multi-Region Deployment

Deploy additional VPN controllers in different regions:

1. **Create new LXC container**
2. **Install VPN controller**
3. **Configure geo-routing**
4. **Update DNS for regional access**

## Auto-Scaling

### Connection-Based Scaling

Monitor and scale based on connections:

```
# Auto-scaling logic
async def check_scaling_needed():
    stats = await get_connection_stats()
    for country, connections in stats.items():
        if connections > SCALE_UP_THRESHOLD:
            await start_additional_node(country)
        elif connections < SCALE_DOWN_THRESHOLD:
            await stop_extra_node(country)
```

### Performance-Based Scaling

Scale based on response times:

```
# Monitor response times
for country in us uk de jp; do
    response_time=$(curl -w "%{time_total}" -o /dev/null -s \
        -x http://proxy-$country.exit.idlegaming.org:3128 \
        http://httpbin.org/ip)
    if (( $(echo "$response_time > 2.0" | bc -l) )); then
        echo "Scaling up $country - slow response"
    fi
done
```

## Load Distribution

### Geographic Distribution

Distribute load across regions:

```
# Nginx geo-based routing
geo $proxy_pool {
    default us;
    # Europe
    2.0.0.0/8 eu;
    5.0.0.0/8 eu;
    # Asia
    1.0.0.0/8 asia;
    14.0.0.0/8 asia;
    # Americas
    8.0.0.0/8 us;
    24.0.0.0/8 us;
}
```

### Load Balancing Strategies

Configure different strategies:

```
# Set load balancing strategy
curl -X POST -u admin:password \
  https://exit.idlegaming.org/api/lb/strategy/health_score
```

Available strategies:

- `round_robin` - Equal distribution
- `least_connections` - Fewest active connections
- `health_score` - Best performing nodes
- `random` - Random selection
- `ip_hash` - Consistent hashing

## Capacity Planning

### Monitoring Metrics

Track key metrics for capacity planning:

```
# Connection count per node (socat runs inside the container)
echo "show stat" | docker exec -i haproxy-default socat stdio /var/run/haproxy/admin.sock | \
    awk -F',' '{print $2, $5}'

# Bandwidth usage
vnstat -i eth0 -h

# Resource utilization
docker stats --no-stream --format json | \
    jq '.CPUPerc, .MemUsage'
```

### Growth Projections

Plan for growth:

1. **Track daily peaks**: Monitor busiest hours
2. **Calculate growth rate**: Month-over-month increase
3. **Plan ahead**: Scale before hitting limits
4. **Budget resources**: CPU, memory, bandwidth

## Optimization

### Container Optimization

Optimize container performance:

```
# Optimized Dockerfile
FROM alpine:latest
RUN apk add --no-cache \
    openvpn \
    squid \
    dante-server

# Reduce layers: chain related steps in a single RUN
# (configuration, cleanup, and optimization below are placeholders)
RUN configuration && \
    cleanup && \
    optimization
```

### Network Optimization

Improve network performance:

```
# Increase network buffers
sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728
sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"
```

## Scaling Checklist

Before scaling:

- [ ] Monitor current utilization
- [ ] Identify bottlenecks
- [ ] Plan scaling approach
- [ ] Test in staging environment
- [ ] Prepare rollback plan
- [ ] Schedule during low traffic
- [ ] Monitor after scaling
- [ ] Document changes

## Best Practices

1. **Scale gradually**: Add resources incrementally
2. **Monitor impact**: Watch metrics after changes
3. **Automate where possible**: Use scripts for common tasks
4. **Plan for failures**: Have redundancy
5. **Document everything**: Keep scaling playbooks

## Cost Optimization

### Right-Sizing

Ensure efficient resource usage:

```
# Analyze container usage over time
for container in $(docker ps --format "{{.Names}}" | grep vpn-); do
    echo "=== $container ==="
    docker stats $container --no-stream
done
```

### Idle Resource Management

Stop unused resources:

```
# Stop idle nodes during off-peak
for country in $(low_traffic_countries); do
    docker stop vpn-exit-$country-backup
done
```

---

!!! warning "Scaling Considerations"
    Always test scaling changes in a staging environment first. Monitor closely after any scaling operations to ensure system stability.

---

## Operations > Troubleshooting

### VPN Exit Controller Troubleshooting Guide

This guide provides step-by-step troubleshooting procedures for common issues with the VPN Exit Controller system.
Use this as your primary reference when diagnosing problems. ## Quick Reference ### Essential Commands ``` # System status systemctl status vpn-controller systemctl status docker docker ps -a # View logs journalctl -u vpn-controller -f docker logs vpn-api docker logs vpn-redis # API health check curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status # Container access pct enter 201 # From Proxmox host ``` ### Log Locations - **System Service**: `journalctl -u vpn-controller` - **API Logs**: `docker logs vpn-api` - **Redis Logs**: `docker logs vpn-redis` - **Traefik Logs**: `/opt/vpn-exit-controller/traefik/logs/traefik.log` - **HAProxy Logs**: `/opt/vpn-exit-controller/proxy/logs/` - **VPN Node Logs**: `docker logs ` - **OpenVPN Logs**: Inside containers at `/var/log/openvpn.log` --- ## 1. Node Issues ### 1.1 VPN Nodes Failing to Start **Symptoms:** - Containers exit immediately after starting - API shows nodes as "failed" or "stopped" - Cannot connect to VPN endpoints **Common Error Messages:** ``` ERROR: Auth file not found at /configs/auth.txt VPN connection timeout after 30 seconds Cannot allocate TUN/TAP dev dynamically ``` **Diagnostic Steps:** 1. **Check container status:** ``` docker ps -a | grep vpn-node docker logs ``` 2. **Verify auth file exists:** ``` ls -la /opt/vpn-exit-controller/configs/auth.txt cat /opt/vpn-exit-controller/configs/auth.txt ``` 3. **Check VPN configuration files:** ``` ls -la /opt/vpn-exit-controller/configs/vpn/ # Verify country-specific configs exist ls -la /opt/vpn-exit-controller/configs/vpn/us/ ``` 4. **Test OpenVPN configuration manually:** ``` # Inside a test container docker run -it --rm --cap-add=NET_ADMIN --device=/dev/net/tun \ -v /opt/vpn-exit-controller/configs:/configs \ ubuntu:22.04 bash # Then install and test OpenVPN apt update && apt install -y openvpn openvpn --config /configs/vpn/us.ovpn --auth-user-pass /configs/auth.txt --verb 3 ``` **Resolution Steps:** 1. 
**Fix auth file issues:** ``` # Ensure auth file exists with correct format echo "your_nordvpn_username" > /opt/vpn-exit-controller/configs/auth.txt echo "your_nordvpn_password" >> /opt/vpn-exit-controller/configs/auth.txt chmod 600 /opt/vpn-exit-controller/configs/auth.txt ``` 2. **Fix container permissions:** ``` # Ensure Docker has proper capabilities docker run --cap-add=NET_ADMIN --device=/dev/net/tun ... ``` 3. **Update NordVPN configs if outdated:** ``` cd /opt/vpn-exit-controller ./scripts/download-nordvpn-configs.sh ``` ### 1.2 NordVPN Authentication Problems **Symptoms:** - Authentication failures in OpenVPN logs - "AUTH_FAILED" messages - Containers restart in loops **Diagnostic Steps:** 1. **Verify credentials:** ``` cat /opt/vpn-exit-controller/configs/auth.txt # Should contain username on line 1, password on line 2 ``` 2. **Test credentials manually:** ``` # Try logging into NordVPN website with same credentials ``` 3. **Check for service token vs regular credentials:** ``` # NordVPN may require service credentials for OpenVPN # Check if using regular login vs service credentials ``` **Resolution Steps:** 1. **Use NordVPN service credentials:** - Generate service credentials from NordVPN dashboard - Update auth.txt with service credentials (not regular login) 2. **Reset authentication:** ``` # Clear any cached auth rm -f /opt/vpn-exit-controller/configs/auth.txt # Recreate with correct service credentials echo "service_username" > /opt/vpn-exit-controller/configs/auth.txt echo "service_password" >> /opt/vpn-exit-controller/configs/auth.txt chmod 600 /opt/vpn-exit-controller/configs/auth.txt ``` ### 1.3 Tailscale Connection Issues **Symptoms:** - Nodes start but don't appear in Tailscale admin - "tailscale up" command fails - Exit node not advertised properly **Diagnostic Steps:** 1. **Check Tailscale auth key:** ``` # Verify auth key is set docker exec env | grep TAILSCALE_AUTHKEY ``` 2. 
**Check Tailscale daemon:** ``` docker exec tailscale status docker exec tailscale ping 100.73.33.11 ``` 3. **Verify container networking:** ``` docker exec ip addr show docker exec ip route ``` **Resolution Steps:** 1. **Generate new auth key:** - Go to Tailscale admin console - Generate new auth key with exit node permissions - Update .env file with new key 2. **Restart Tailscale in container:** ``` docker exec tailscale down docker exec tailscale up --authkey= --advertise-exit-node ``` 3. **Check firewall rules:** ``` # Ensure UDP port 41641 is accessible iptables -L | grep 41641 ``` ### 1.4 Container Restart Loops **Symptoms:** - Containers continuously restart - High CPU usage - "Restarting" status in docker ps **Diagnostic Steps:** 1. **Check restart policy:** ``` docker inspect | grep -A 5 RestartPolicy ``` 2. **Monitor restart events:** ``` docker events --filter container= ``` 3. **Check exit codes:** ``` docker logs --tail 50 ``` **Resolution Steps:** 1. **Identify root cause:** - VPN connection failures - Tailscale authentication issues - Network connectivity problems 2. **Temporary fix - stop restart:** ``` docker update --restart=no docker stop ``` 3. **Fix underlying issue and restart:** ``` docker update --restart=unless-stopped docker start ``` --- ## 2. Network Issues ### 2.1 Proxy Connection Failures **Symptoms:** - HTTP 502/503 errors from proxy endpoints - Timeouts when connecting through proxy - "upstream connect error" messages **Diagnostic Steps:** 1. **Check HAProxy status:** ``` # Access HAProxy stats curl http://localhost:8404/ ``` 2. **Test direct container connectivity:** ``` # Find proxy container ports docker ps | grep proxy # Test direct connection curl -x localhost:3128 http://httpbin.org/ip ``` 3. **Check backend health:** ``` # In HAProxy stats, look for backend server status # Red = down, Green = up ``` **Resolution Steps:** 1. **Restart proxy containers:** ``` cd /opt/vpn-exit-controller/proxy docker-compose restart ``` 2. 
**Check proxy configuration:** ``` # Verify HAProxy config syntax docker exec haproxy-container haproxy -c -f /usr/local/etc/haproxy/haproxy.cfg ``` 3. **Update backend servers:** ``` # Use API to refresh backend configurations curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/proxy/refresh ``` ### 2.2 DNS Resolution Problems **Symptoms:** - Domain names not resolving in containers - "Name resolution failed" errors - Proxy working with IPs but not domains - **"Doesn't support secure connection" errors in incognito mode** - HTTPS websites failing through proxy **Diagnostic Steps:** 1. **Test DNS in containers:** ``` docker exec nslookup google.com docker exec dig @8.8.8.8 google.com docker exec dig @103.86.96.100 google.com # NordVPN DNS ``` 2. **Check container DNS settings:** ``` docker exec cat /etc/resolv.conf # Should show NordVPN DNS servers first, then fallbacks ``` 3. **Test Tailscale DNS configuration:** ``` docker exec tailscale status --json | grep -i dns # Should show accept-dns=false ``` 4. **Test host DNS:** ``` nslookup google.com dig google.com ``` **Resolution Steps:** 1. **Fix VPN container DNS (Primary Fix for HTTPS errors):** ``` # Ensure containers use NordVPN DNS with fallbacks # This is handled automatically in entrypoint.sh: echo "nameserver 103.86.96.100" > /etc/resolv.conf echo "nameserver 103.86.99.100" >> /etc/resolv.conf echo "nameserver 8.8.8.8" >> /etc/resolv.conf echo "nameserver 1.1.1.1" >> /etc/resolv.conf ``` 2. **Ensure Tailscale DNS is disabled:** ``` # In container, verify Tailscale is using --accept-dns=false docker exec ps aux | grep tailscale # Should show --accept-dns=false in the command line ``` 3. **Test DNS resolution through VPN:** ``` # Test that DNS queries go through VPN tunnel docker exec dig +trace google.com # Should resolve through NordVPN DNS servers ``` 4. 
**Legacy system DNS fix (if needed):** ``` # Edit /etc/systemd/resolved.conf [Resolve] DNS=8.8.8.8 1.1.1.1 systemctl restart systemd-resolved ``` 5. **Restart networking if changes made:** ``` systemctl restart networking systemctl restart docker ``` !!! success "DNS Resolution Fix Explained" The recent update implements a comprehensive DNS fix that resolves HTTPS errors in incognito mode: 1. **Root Cause**: Tailscale's DNS resolution was interfering with HTTPS certificate validation 2. **Solution**: Disable Tailscale DNS (`--accept-dns=false`) and use NordVPN DNS servers 3. **Implementation**: Containers manually configure `/etc/resolv.conf` with NordVPN DNS first, Google DNS as fallback 4. **Result**: Eliminates "doesn't support secure connection" errors and improves proxy reliability ### 2.3 SSL Certificate Issues **Symptoms:** - HTTPS endpoints returning certificate errors - "SSL handshake failed" messages - Browser certificate warnings **Diagnostic Steps:** 1. **Check Traefik certificate status:** ``` # View certificate file ls -la /opt/vpn-exit-controller/traefik/letsencrypt/acme.json # Check Traefik logs for ACME challenges docker logs traefik | grep -i acme ``` 2. **Test certificate validity:** ``` openssl s_client -connect proxy-us.rbnk.uk:443 -servername proxy-us.rbnk.uk ``` 3. **Check DNS for ACME challenge:** ``` dig _acme-challenge.proxy-us.rbnk.uk TXT ``` **Resolution Steps:** 1. **Force certificate renewal:** ``` # Delete existing certificates rm /opt/vpn-exit-controller/traefik/letsencrypt/acme.json # Restart Traefik to trigger new certificate request docker restart traefik ``` 2. **Check Cloudflare API credentials:** ``` # Verify CF_API_EMAIL and CF_API_KEY are set correctly docker exec traefik env | grep CF_ ``` 3. 
**Manual certificate debug:** ``` # Check ACME logs in Traefik docker logs traefik | grep -i "certificate\|acme\|error" ``` ### 2.4 Port Binding Conflicts **Symptoms:** - "Port already in use" errors - Containers failing to start - Services not accessible on expected ports **Diagnostic Steps:** 1. **Check port usage:** ``` netstat -tulpn | grep :8080 lsof -i :8080 ss -tulpn | grep :8080 ``` 2. **Check Docker port mappings:** ``` docker ps --format "table {{.Names}}\t{{.Ports}}" ``` 3. **Identify conflicting services:** ``` systemctl list-units --type=service --state=running | grep -E "(proxy|web|http)" ``` **Resolution Steps:** 1. **Kill conflicting processes:** ``` # Find and kill process using the port sudo kill $(lsof -t -i:8080) ``` 2. **Change service ports:** ``` # Update docker-compose.yml or service configuration # Use different port numbers ``` 3. **Restart in correct order:** ``` systemctl stop vpn-controller docker-compose down systemctl start vpn-controller ``` --- ## 3. Service Issues ### 3.1 FastAPI Application Not Starting **Symptoms:** - systemctl shows failed status - "Connection refused" on port 8080 - Python import errors in logs **Diagnostic Steps:** 1. **Check systemd service status:** ``` systemctl status vpn-controller journalctl -u vpn-controller -n 50 ``` 2. **Test manual startup:** ``` cd /opt/vpn-exit-controller source venv/bin/activate export $(grep -v '^#' .env | xargs) cd api python -m uvicorn main:app --host 0.0.0.0 --port 8080 ``` 3. **Check Python environment:** ``` /opt/vpn-exit-controller/venv/bin/python --version /opt/vpn-exit-controller/venv/bin/pip list ``` **Resolution Steps:** 1. **Fix Python dependencies:** ``` cd /opt/vpn-exit-controller source venv/bin/activate pip install -r api/requirements.txt ``` 2. **Check environment variables:** ``` # Verify .env file exists and has required variables cat /opt/vpn-exit-controller/.env # Required: SECRET_KEY, ADMIN_USER, ADMIN_PASS, TAILSCALE_AUTHKEY ``` 3. 
**Fix import errors:** ``` # Ensure all Python modules are properly installed cd /opt/vpn-exit-controller/api python -c "import main" ``` ### 3.2 Redis Connection Problems **Symptoms:** - "Redis connection failed" in API logs - Cache-related operations failing - High latency in API responses **Diagnostic Steps:** 1. **Check Redis container:** ``` docker ps | grep redis docker logs vpn-redis ``` 2. **Test Redis connectivity:** ``` docker exec vpn-redis redis-cli ping redis-cli -h localhost -p 6379 ping ``` 3. **Check Redis configuration:** ``` docker exec vpn-redis redis-cli CONFIG GET "*" ``` **Resolution Steps:** 1. **Restart Redis:** ``` docker restart vpn-redis ``` 2. **Clear Redis data if corrupted:** ``` docker exec vpn-redis redis-cli FLUSHALL ``` 3. **Check disk space:** ``` df -h # Redis needs disk space for persistence ``` ### 3.3 Docker Daemon Issues **Symptoms:** - "Cannot connect to Docker daemon" errors - Docker commands hanging - Containers not starting **Diagnostic Steps:** 1. **Check Docker service:** ``` systemctl status docker journalctl -u docker -n 50 ``` 2. **Test Docker functionality:** ``` docker version docker info docker run hello-world ``` 3. **Check Docker socket:** ``` ls -la /var/run/docker.sock sudo chmod 666 /var/run/docker.sock # Temporary fix ``` **Resolution Steps:** 1. **Restart Docker:** ``` systemctl restart docker ``` 2. **Clean up Docker resources:** ``` docker system prune -a docker volume prune ``` 3. **Check disk space:** ``` df -h /var/lib/docker # Docker needs sufficient disk space ``` ### 3.4 Systemd Service Failures **Symptoms:** - Service won't start at boot - "Failed to start" messages - Service stops unexpectedly **Diagnostic Steps:** 1. **Check service definition:** ``` systemctl cat vpn-controller systemctl status vpn-controller ``` 2. **View detailed logs:** ``` journalctl -u vpn-controller -f journalctl -u vpn-controller --since "1 hour ago" ``` 3. 
**Test service script manually:** ``` /opt/vpn-exit-controller/start.sh ``` **Resolution Steps:** 1. **Fix service dependencies:** ``` # Ensure docker.service is running systemctl start docker systemctl enable docker ``` 2. **Update service configuration:** ``` systemctl edit vpn-controller # Add any necessary environment variables or dependencies ``` 3. **Reload and restart:** ``` systemctl daemon-reload systemctl restart vpn-controller systemctl enable vpn-controller ``` --- ## 4. Load Balancing Issues ### 4.1 Nodes Not Being Selected Properly **Symptoms:** - All traffic going to one node - Load balancer ignoring some nodes - Uneven traffic distribution **Diagnostic Steps:** 1. **Check load balancer status:** ``` curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/load-balancer/status ``` 2. **View node health:** ``` curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes ``` 3. **Check HAProxy backend status:** ``` curl http://localhost:8404/ # Look for backend server weights and status ``` **Resolution Steps:** 1. **Reset load balancer:** ``` curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/load-balancer/reset ``` 2. **Update backend weights:** ``` # API call to rebalance curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/load-balancer/rebalance ``` 3. **Restart load balancing service:** ``` docker restart haproxy ``` ### 4.2 Health Check Failures **Symptoms:** - Nodes marked as unhealthy - False positive health failures - Health checks timing out **Diagnostic Steps:** 1. **Check health check configuration:** ``` # View HAProxy health check settings cat /opt/vpn-exit-controller/proxy/haproxy.cfg | grep -A 5 "option httpchk" ``` 2. **Test health endpoints manually:** ``` # Test individual node health curl -H "Host: proxy-us.rbnk.uk" http://container-ip:3128/health ``` 3. **Check health check logs:** ``` docker logs haproxy | grep -i health ``` **Resolution Steps:** 1. 
**Adjust health check timeouts:** ``` # In haproxy.cfg, increase timeout values timeout check 5s ``` 2. **Fix health endpoint:** ``` # Ensure containers respond properly to health checks (replace <container> with the node container name) docker exec <container> curl localhost:3128/health ``` 3. **Restart health monitoring:** ``` docker restart haproxy ``` ### 4.3 Speed Test Failures **Symptoms:** - Speed tests returning errors - Incorrect speed measurements - Speed test endpoints unreachable **Diagnostic Steps:** 1. **Test speed test API:** ``` curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/speed-test/run/us ``` 2. **Check network connectivity:** ``` # Test from container docker exec <container> curl -I http://speedtest.net ``` 3. **View speed test logs:** ``` docker logs vpn-api | grep -i "speed" ``` **Resolution Steps:** 1. **Update speed test endpoints:** ``` # Modify speed test configuration to use working endpoints ``` 2. **Increase timeout values:** ``` # In speed test service, allow more time for tests ``` 3. **Use alternative speed test method:** ``` # Switch to different speed testing service ``` ### 4.4 Failover Not Working **Symptoms:** - Failed nodes still receiving traffic - No automatic failover - Manual failover not working **Diagnostic Steps:** 1. **Check failover configuration:** ``` curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/failover/status ``` 2. **Test failover trigger:** ``` # Manually trigger failover curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/failover/trigger/us ``` 3. **Check monitoring service:** ``` docker logs vpn-api | grep -i "failover\|monitor" ``` **Resolution Steps:** 1. **Restart monitoring service:** ``` systemctl restart vpn-controller ``` 2. **Update failover thresholds:** ``` # Adjust sensitivity in configuration ``` 3. **Manual failover:** ``` # Remove failed nodes from rotation curl -X DELETE -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes/<node_id>/stop ``` --- ## 5. 
Proxy Issues ### 5.1 HAProxy Configuration Errors **Symptoms:** - HAProxy fails to start - Configuration validation errors - Syntax errors in logs **Diagnostic Steps:** 1. **Validate configuration:** ``` docker exec haproxy haproxy -c -f /usr/local/etc/haproxy/haproxy.cfg ``` 2. **Check configuration file:** ``` cat /opt/vpn-exit-controller/proxy/haproxy.cfg ``` 3. **View HAProxy logs:** ``` docker logs haproxy ``` **Resolution Steps:** 1. **Fix syntax errors:** ``` # Check for missing commas, brackets, quotes in haproxy.cfg # Validate with: haproxy -c -f haproxy.cfg ``` 2. **Restore backup configuration:** ``` cp /opt/vpn-exit-controller/proxy/haproxy.cfg.backup /opt/vpn-exit-controller/proxy/haproxy.cfg ``` 3. **Restart with corrected config:** ``` docker restart haproxy ``` ### 5.2 Traefik Routing Problems **Symptoms:** - 404 errors on proxy domains - Requests not reaching HAProxy - Routing rules not working **Diagnostic Steps:** 1. **Check Traefik dashboard:** ``` # Access dashboard (if enabled) curl http://localhost:8080/dashboard/ ``` 2. **View Traefik configuration:** ``` cat /opt/vpn-exit-controller/traefik/traefik.yml ``` 3. **Check routing rules:** ``` docker logs traefik | grep -i "router\|rule" ``` **Resolution Steps:** 1. **Update dynamic configuration:** ``` # Check and update files in /opt/vpn-exit-controller/traefik/dynamic/ ``` 2. **Restart Traefik:** ``` docker restart traefik ``` 3. **Verify domain DNS:** ``` dig proxy-us.rbnk.uk # Ensure domains point to correct IP ``` ### 5.3 Country Routing Not Working **Symptoms:** - All traffic routed to same country - Country-specific domains not working - Incorrect geolocation **Diagnostic Steps:** 1. **Test country-specific endpoints:** ``` curl -H "Host: proxy-us.rbnk.uk" http://localhost:8080/ curl -H "Host: proxy-uk.rbnk.uk" http://localhost:8080/ ``` 2. **Check HAProxy ACL rules:** ``` grep -A 10 "acl is_.*_proxy" /opt/vpn-exit-controller/proxy/haproxy.cfg ``` 3. 
**Test IP geolocation:** ``` curl -x proxy-us.rbnk.uk:8080 http://ipinfo.io/json curl -x proxy-uk.rbnk.uk:8080 http://ipinfo.io/json ``` **Resolution Steps:** 1. **Update HAProxy routing rules:** ``` # Verify ACL rules and backend assignments in haproxy.cfg ``` 2. **Restart proxy services:** ``` cd /opt/vpn-exit-controller/proxy docker-compose restart ``` 3. **Check node assignments:** ``` # Ensure nodes are assigned to correct countries curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes ``` ### 5.4 Proxy Service Failures **Symptoms:** - Squid or Dante proxy services not responding - Connection refused on ports 3128 or 1080 - Health check endpoint (port 8080) not responding - Container restart loops involving proxy services **Diagnostic Steps:** 1. **Check proxy service status in container:** ``` # Check if proxy processes are running (replace <container> with the node's container name from `docker ps`) docker exec <container> ps aux | grep -E "(squid|danted)" # Check if ports are listening docker exec <container> netstat -tuln | grep -E "(3128|1080|8080)" # Test proxy services directly docker exec <container> curl -I http://localhost:3128 docker exec <container> curl -I http://localhost:8080/health ``` 2. **Check proxy service logs:** ``` # Squid logs docker exec <container> tail -f /var/log/squid/cache.log # Check container logs for proxy startup docker logs <container> | grep -E "(squid|dante|health)" ``` 3. **Test proxy connectivity from host:** ``` # Test HTTP proxy (replace with actual Tailscale IP) curl -x http://100.86.140.98:3128 http://httpbin.org/ip # Test SOCKS5 proxy curl --socks5 100.86.140.98:1080 http://httpbin.org/ip # Test health endpoint curl http://100.86.140.98:8080/health ``` **Resolution Steps:** 1. **Restart proxy services in container:** ``` # Kill and restart Squid docker exec <container> pkill squid docker exec <container> squid -N -d 1 & # Kill and restart Dante docker exec <container> pkill danted docker exec <container> danted -D & # The entrypoint.sh monitoring loop will also restart them automatically ``` 2. 
**Restart entire container:** ``` # Use API to restart node curl -X POST -u admin:Bl4ckMagic!2345erver \ http://localhost:8080/api/nodes/<node_id>/restart # Or restart via Docker docker restart <container_id> ``` 3. **Check Squid configuration:** ``` # Verify Squid config syntax docker exec <container> squid -k parse # Check if Squid cache directory is initialized docker exec <container> ls -la /var/spool/squid/ ``` 4. **Check Dante configuration:** ``` # Verify Dante config file exists docker exec <container> cat /etc/danted.conf # Check Dante listening status docker exec <container> ss -tuln | grep 1080 ``` ### 5.5 Authentication Failures **Symptoms:** - 401/403 errors on proxy requests - Authentication not working - Unauthorized access attempts **Diagnostic Steps:** 1. **Test API authentication:** ``` curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status ``` 2. **Check authentication configuration:** ``` grep -i auth /opt/vpn-exit-controller/proxy/haproxy.cfg ``` 3. **View authentication logs:** ``` docker logs haproxy | grep -i "auth\|401\|403" ``` **Resolution Steps:** 1. **Update credentials:** ``` # Update .env file with correct credentials vim /opt/vpn-exit-controller/.env ``` 2. **Restart services:** ``` systemctl restart vpn-controller ``` 3. **Clear authentication cache:** ``` docker exec vpn-redis redis-cli FLUSHDB ``` --- ## 6. Performance Issues ### 6.1 Slow Proxy Speeds **Symptoms:** - High latency through proxy - Slow download/upload speeds - Timeouts on large requests **Diagnostic Steps:** 1. **Test direct vs proxy speed:** ``` # Direct speed test curl -o /dev/null -s -w "%{time_total}\n" http://speedtest.net/speedtest.jpg # Proxy speed test curl -x proxy-us.rbnk.uk:8080 -o /dev/null -s -w "%{time_total}\n" http://speedtest.net/speedtest.jpg ``` 2. **Check network utilization:** ``` iftop -i eth0 nethogs ``` 3. **Monitor container resources:** ``` docker stats ``` **Resolution Steps:** 1. 
**Optimize HAProxy configuration:** ``` # Increase connection limits and timeouts maxconn 4096 timeout client 50000ms timeout server 50000ms ``` 2. **Scale up resources:** ``` # Add more CPU/memory to LXC container # Add more VPN nodes for load distribution ``` 3. **Optimize VPN connections:** ``` # Use UDP instead of TCP where possible # Select closer VPN servers ``` ### 6.2 High Latency **Symptoms:** - Slow response times - High ping times - Delayed connections **Diagnostic Steps:** 1. **Measure latency at different points:** ``` # Host to VPN server ping nordvpn-server.com # Through proxy curl -x proxy-us.rbnk.uk:8080 -w "%{time_connect}\n" http://httpbin.org/get ``` 2. **Check routing:** ``` traceroute -T -p 80 google.com ``` 3. **Monitor network queues:** ``` ss -i | grep -E "(rto|rtt)" ``` **Resolution Steps:** 1. **Select closer VPN servers:** ``` # Use geographically closer NordVPN servers # Update configuration to use lower-latency servers ``` 2. **Optimize TCP settings:** ``` # Tune TCP congestion control echo 'net.core.default_qdisc = fq' >> /etc/sysctl.conf echo 'net.ipv4.tcp_congestion_control = bbr' >> /etc/sysctl.conf sysctl -p ``` 3. **Reduce proxy hops:** ``` # Minimize routing through multiple proxies # Use direct connections where possible ``` ### 6.3 Memory or CPU Issues **Symptoms:** - High CPU usage - Out of memory errors - System slowdowns **Diagnostic Steps:** 1. **Check system resources:** ``` top htop free -m df -h ``` 2. **Monitor container resources:** ``` docker stats --no-stream ``` 3. **Check for memory leaks:** ``` # Monitor memory usage over time while true; do docker stats --no-stream; sleep 30; done ``` **Resolution Steps:** 1. **Increase LXC container resources:** ``` # From Proxmox host pct set 201 --memory 4096 --cores 4 ``` 2. **Optimize container limits:** ``` # Add resource limits to docker-compose.yml deploy: resources: limits: memory: 512M reservations: memory: 256M ``` 3. 
**Clean up resources:** ``` docker system prune -a docker volume prune ``` ### 6.4 Connection Timeouts **Symptoms:** - Connections dropping - Timeout errors - Incomplete transfers **Diagnostic Steps:** 1. **Check timeout settings:** ``` grep -i timeout /opt/vpn-exit-controller/proxy/haproxy.cfg ``` 2. **Monitor connection states:** ``` ss -s netstat -an | grep -E "(ESTABLISHED|TIME_WAIT|CLOSE_WAIT)" | wc -l ``` 3. **Test with different timeout values:** ``` curl --connect-timeout 30 --max-time 300 -x proxy-us.rbnk.uk:8080 http://httpbin.org/delay/10 ``` **Resolution Steps:** 1. **Increase timeout values:** ``` # In haproxy.cfg timeout connect 10s timeout client 60s timeout server 60s ``` 2. **Optimize connection pooling:** ``` # Adjust keep-alive settings option http-keep-alive timeout http-keep-alive 10s ``` 3. **Scale connection limits:** ``` # Increase system limits ulimit -n 65536 echo '* soft nofile 65536' >> /etc/security/limits.conf echo '* hard nofile 65536' >> /etc/security/limits.conf ``` --- ## 7. Monitoring Issues ### 7.1 Metrics Not Being Collected **Symptoms:** - Empty metrics endpoints - No data in monitoring dashboards - Metrics API returning errors **Diagnostic Steps:** 1. **Check metrics endpoint:** ``` curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics ``` 2. **Verify metrics collector service:** ``` docker logs vpn-api | grep -i "metrics" ``` 3. **Check Redis for metrics data:** ``` docker exec vpn-redis redis-cli KEYS "*metrics*" ``` **Resolution Steps:** 1. **Restart metrics collection:** ``` curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/restart ``` 2. **Clear corrupted metrics:** ``` docker exec vpn-redis redis-cli DEL metrics:* ``` 3. **Check metrics configuration:** ``` # Verify metrics collection intervals and settings ``` ### 7.2 Health Checks Failing **Symptoms:** - All nodes showing as unhealthy - Health check endpoints not responding - False positive failures **Diagnostic Steps:** 1. 
**Test health endpoints manually:** ``` # Test individual container health (replace <container> with the node container name) docker exec <container> curl -I localhost:3128/health ``` 2. **Check health check configuration:** ``` grep -A 3 "option httpchk" /opt/vpn-exit-controller/proxy/haproxy.cfg ``` 3. **Monitor health check frequency:** ``` docker logs haproxy | grep -i "health\|check" | tail -20 ``` **Resolution Steps:** 1. **Adjust health check parameters:** ``` # Increase check intervals and timeouts inter 10s fastinter 2s downinter 5s ``` 2. **Fix health endpoint implementation:** ``` # Ensure containers properly implement /health endpoint ``` 3. **Reset health check state:** ``` docker restart haproxy ``` ### 7.3 Log Analysis Techniques **Key Log Locations and Commands:** 1. **System-wide issues:** ``` # System logs journalctl -u vpn-controller -f journalctl --since "1 hour ago" | grep -i error # Docker logs docker logs --tail 100 -f vpn-api ``` 2. **Network issues:** ``` # HAProxy logs docker logs haproxy | grep -E "(5xx|error|timeout)" # Traefik logs tail -f /opt/vpn-exit-controller/traefik/logs/traefik.log | grep -i error ``` 3. **VPN connection issues:** ``` # OpenVPN logs in containers docker exec <container> tail -f /var/log/openvpn.log # Connection monitoring docker exec <container> ip route show | grep tun0 ``` 4. **Performance analysis:** ``` # Response time analysis docker logs haproxy | grep -oE '"[0-9]+/[0-9]+/[0-9]+/[0-9]+/[0-9]+"' | tail -100 # Error rate analysis docker logs haproxy | grep -oE '" [0-9]{3} ' | sort | uniq -c ``` --- ## Emergency Recovery Procedures ### Complete System Reset If the system is completely broken, follow these steps: 1. **Stop all services:** ``` systemctl stop vpn-controller docker-compose -f /opt/vpn-exit-controller/docker-compose.yml down docker-compose -f /opt/vpn-exit-controller/proxy/docker-compose.yml down ``` 2. **Clean up Docker:** ``` docker system prune -a docker volume prune ``` 3. 
**Reset configuration:** ``` cd /opt/vpn-exit-controller git stash # Save any local changes git reset --hard HEAD # Reset to last known good state ``` 4. **Restart services:** ``` systemctl start docker systemctl start vpn-controller ``` ### Data Recovery If data is corrupted: 1. **Backup current state:** ``` tar -czf /tmp/vpn-backup-$(date +%Y%m%d).tar.gz /opt/vpn-exit-controller/data/ ``` 2. **Reset Redis data:** ``` docker exec vpn-redis redis-cli FLUSHALL ``` 3. **Restart with clean state:** ``` systemctl restart vpn-controller ``` ### Network Recovery If networking is broken: 1. **Reset network interfaces:** ``` systemctl restart networking systemctl restart docker ``` 2. **Flush iptables:** ``` iptables -F iptables -X iptables -t nat -F iptables -t nat -X ``` 3. **Restart Tailscale:** ``` tailscale down tailscale up --authkey=<tailscale-auth-key> --advertise-exit-node ``` --- ## Prevention and Maintenance ### Regular Maintenance Tasks 1. **Weekly:** - Check disk space: `df -h` - Review error logs: `journalctl -u vpn-controller --since "1 week ago" | grep -i error` - Test proxy endpoints: `curl -x proxy-us.rbnk.uk:8080 http://httpbin.org/ip` 2. **Monthly:** - Update NordVPN configurations: `./scripts/download-nordvpn-configs.sh` - Clean Docker resources: `docker system prune` - Review and rotate Tailscale auth keys 3. **Quarterly:** - Update system packages: `apt update && apt upgrade` - Review and update SSL certificates - Performance optimization review ### Monitoring Setup Set up automated monitoring: 1. **Log monitoring:** ``` # Set up logrotate for container logs # Monitor for specific error patterns ``` 2. **Resource monitoring:** ``` # Set up alerts for CPU/memory usage # Monitor disk space ``` 3. **Health monitoring:** ``` # Automated health checks # Alert on service failures ``` This troubleshooting guide should help you diagnose and resolve most issues with the VPN Exit Controller system. Keep it updated as you encounter new issues and solutions. 
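The "health monitoring" item in the maintenance checklist above is left as a stub. A minimal polling sketch might look like the following (assumptions, not confirmed by this project: `ADMIN_USER`/`ADMIN_PASS` exported in the environment, the `/api/nodes` response shape shown in the API reference, and `print` standing in for your real alerting channel):

```python
"""Minimal health-poll sketch for the VPN Exit Controller API (stdlib only)."""
import base64
import json
import os
import urllib.request

API_URL = "http://localhost:8080/api/nodes"  # adjust for your deployment


def unhealthy_nodes(nodes):
    """Return the IDs of nodes whose VPN or Tailscale link is down."""
    return [
        n["id"]
        for n in nodes
        if not (n.get("vpn_connected") and n.get("tailscale_connected"))
    ]


def fetch_nodes():
    """Fetch the node list using HTTP Basic auth."""
    creds = f"{os.environ['ADMIN_USER']}:{os.environ['ADMIN_PASS']}"
    token = base64.b64encode(creds.encode()).decode()
    req = urllib.request.Request(API_URL, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def check_and_alert():
    """Run one poll cycle; wire this into cron or a systemd timer."""
    bad = unhealthy_nodes(fetch_nodes())
    if bad:
        # Replace print with your alerting channel (mail, webhook, etc.)
        print("ALERT: unhealthy nodes:", ", ".join(bad))
```

Run `check_and_alert()` every few minutes from cron; because it reuses the same `/api/nodes` endpoint shown earlier, it needs no extra services beyond the running API.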
--- ## Quick Start ### Quick Start Guide Get VPN Exit Controller up and running in under 10 minutes! ## Prerequisites Checklist Before starting, ensure you have: - [x] Ubuntu 22.04+ server with root access - [x] Docker and Docker Compose installed - [x] Python 3.10+ with pip - [x] Domain with Cloudflare DNS - [x] NordVPN service credentials - [x] Tailscale auth key ## 1. Clone the Repository ``` git clone https://gitea.rbnk.uk/admin/vpn-controller.git cd vpn-controller ``` ## 2. Configure Environment Copy the example environment file: ``` cp .env.example .env ``` Edit `.env` with your credentials: ``` # Essential configuration NORDVPN_USER=your_nordvpn_service_username NORDVPN_PASS=your_nordvpn_service_password TAILSCALE_AUTH_KEY=your_tailscale_auth_key CF_API_TOKEN=your_cloudflare_api_token API_PASSWORD=your_secure_password_here ``` !!! warning "Security Note" Never commit `.env` to version control. The file is already in `.gitignore`. ## 3. Install Dependencies Create Python virtual environment: ``` python3 -m venv venv source venv/bin/activate pip install -r requirements.txt ``` ## 4. Start Services ### Redis (Required for metrics) ``` docker run -d --name redis \ -p 6379:6379 \ --restart always \ redis:alpine ``` ### VPN Controller API ``` # Development mode uvicorn api.main:app --reload --host 0.0.0.0 --port 8080 # Or use the systemd service (production) sudo systemctl enable vpn-controller sudo systemctl start vpn-controller ``` ## 5. Create Your First VPN Node Start a US exit node: ``` curl -X POST http://localhost:8080/api/nodes/start \ -u admin:your_api_password \ -H "Content-Type: application/json" \ -d '{"country": "us", "city": "New York"}' ``` Check node status: ``` curl http://localhost:8080/api/nodes \ -u admin:your_api_password ``` ## 6. 
Set Up Proxy Access (Optional) ### Deploy HAProxy ``` cd proxy docker-compose up -d ``` ### Deploy Traefik for SSL ``` cd ../traefik docker-compose up -d ``` ### Configure DNS ``` # Set your Cloudflare API token export CF_API_TOKEN=your_cloudflare_api_token # Run DNS setup ./scripts/setup-proxy-dns.sh ``` ## 7. Test Your Setup ### Test VPN connectivity: ``` # Check if node is connected curl http://localhost:8080/api/health -u admin:your_api_password ``` ### Test proxy URL (if configured): ``` # Test US proxy curl -x https://proxy-us.yourdomain.com https://ipinfo.io ``` ## 🎉 Success! You now have a working VPN Exit Controller! Here's what you can do next: ### Essential Commands === "Node Management" ``` # List all nodes curl http://localhost:8080/api/nodes -u admin:$API_PASSWORD # Start a node curl -X POST http://localhost:8080/api/nodes/start \ -u admin:$API_PASSWORD \ -H "Content-Type: application/json" \ -d '{"country": "uk"}' # Start a specific UK node curl -X POST http://localhost:8080/api/nodes/uk/start \ -u admin:$API_PASSWORD # Stop a node curl -X DELETE http://localhost:8080/api/nodes/vpn-uk \ -u admin:$API_PASSWORD ``` === "Health & Metrics" ``` # System health curl http://localhost:8080/api/health -u admin:$API_PASSWORD # Node metrics curl http://localhost:8080/api/metrics -u admin:$API_PASSWORD # Speed test curl -X POST http://localhost:8080/api/speed-test/vpn-us \ -u admin:$API_PASSWORD ``` === "Load Balancing" ``` # Get best node for country curl http://localhost:8080/api/load-balancer/best-node/us \ -u admin:$API_PASSWORD # Get best UK node curl http://localhost:8080/api/load-balancer/best-node/uk \ -u admin:$API_PASSWORD # Change strategy curl -X POST http://localhost:8080/api/load-balancer/strategy \ -u admin:$API_PASSWORD \ -H "Content-Type: application/json" \ -d '{"strategy": "health_score"}' ``` ## Common Issues & Solutions !!! question "Node won't start?" 
- Check Docker is running: `docker ps` - Verify NordVPN credentials in `.env` - Check logs: `docker logs vpn-us` !!! question "Can't access proxy URLs?" - Ensure DNS records are created - Check Traefik is running: `docker ps | grep traefik` - Verify SSL certificates: Check Traefik logs !!! question "API returns 401 Unauthorized?" - Check username is `admin` - Verify password matches `.env` setting - Use `-u admin:password` with curl ## Next Steps
- :material-book-open-variant:{ .lg .middle } __Read the User Guide__: learn about all features and configuration options (:octicons-arrow-right-24: User Guide)
- :material-api:{ .lg .middle } __Explore the API__: full API reference with examples (:octicons-arrow-right-24: API Reference)
- :material-server:{ .lg .middle } __Deploy to Production__: production deployment best practices (:octicons-arrow-right-24: Deployment Guide)
- :material-shield-check:{ .lg .middle } __Security Hardening__: secure your deployment (:octicons-arrow-right-24: Security Guide)
--- !!! success "Congratulations!" You've successfully deployed VPN Exit Controller. Join our community to stay updated with new features and best practices. --- ## Security > Best Practices ### Security Guide for VPN Exit Controller This document outlines security considerations, best practices, and hardening procedures for the VPN Exit Controller system. ## Table of Contents 1. Network Security 2. Authentication and Authorization 3. Container Security 4. SSL/TLS Security 5. Data Protection 6. Operational Security 7. Compliance Considerations 8. Security Hardening ## Network Security ### Firewall Configuration The VPN Exit Controller requires specific firewall rules for secure operation: ``` # Allow SSH (change default port) ufw allow 2222/tcp # Allow web management interface (behind Traefik) ufw allow 80/tcp ufw allow 443/tcp # Allow Tailscale mesh network ufw allow 41641/udp # Allow OpenVPN traffic for exit nodes ufw allow 1194/udp ufw allow 443/tcp # Block all other incoming traffic by default ufw default deny incoming ufw default allow outgoing # Enable firewall ufw enable ``` ### Network Isolation #### Proxmox LXC Security - **Privileged Containers**: The system requires privileged LXC containers for Docker operations - **Container Isolation**: Use `lxc.apparmor.profile: unconfined` carefully and only when necessary - **Network Segmentation**: Place the LXC container on an isolated VLAN (vmbr1) ``` # Example LXC configuration for security cat >> /etc/pve/lxc/201.conf << EOF # Security settings lxc.apparmor.profile: generated lxc.seccomp.profile: /usr/share/lxc/config/seccomp # Only use unconfined when Docker requires it EOF ``` #### Docker Network Security - **Bridge Networks**: Use custom Docker networks instead of default bridge - **Network Policies**: Implement container-to-container communication restrictions - **Port Exposure**: Only expose necessary ports (8080 for API, Redis port internally only) ### VPN Tunnel Security #### OpenVPN Configuration - 
**Cipher Suite**: Use AES-256-GCM or ChaCha20-Poly1305 - **Authentication**: Use SHA-256 or stronger for HMAC - **Perfect Forward Secrecy**: Enable TLS-auth with rotating keys - **Certificate Validation**: Implement strict certificate checking #### NordVPN Integration Security - **Credential Storage**: Store NordVPN credentials securely (see Data Protection section) - **Server Verification**: Verify NordVPN server certificates - **Protocol Selection**: Prefer OpenVPN over IKEv2 for better auditability ### Tailscale Mesh Network Security #### Authentication - **Auth Keys**: Use ephemeral auth keys when possible - **ACL Policies**: Implement strict Access Control Lists - **Node Authorization**: Require manual node approval ``` // Example Tailscale ACL { "acls": [ { "action": "accept", "src": ["group:admins"], "dst": ["vpn-exit-controller:8080"] } ], "groups": { "group:admins": ["user@example.com"] } } ``` ## Authentication and Authorization ### HTTP Basic Authentication The system currently uses HTTP Basic Authentication with the following security considerations: #### Current Implementation Issues ``` # SECURITY WARNING: Default credentials are exposed ADMIN_USER = os.getenv("ADMIN_USER", "admin") ADMIN_PASS = os.getenv("ADMIN_PASS", "changeme") ``` #### Security Recommendations 1. **Change Default Credentials Immediately** ``` # Set secure environment variables export ADMIN_USER="secure_admin_user" export ADMIN_PASS="$(openssl rand -base64 32)" ``` 2. **Use Strong Password Policy** - Minimum 16 characters - Include uppercase, lowercase, numbers, and symbols - Avoid dictionary words - Rotate passwords regularly (90 days) 3. 
**Implement Rate Limiting** ``` # Add to FastAPI middleware from slowapi import Limiter from slowapi.util import get_remote_address limiter = Limiter(key_func=get_remote_address) @app.post("/api/auth/login") @limiter.limit("5/minute") async def login(request: Request, ...): # Login logic ``` ### API Security #### JWT Token Management - **Secret Key**: Use cryptographically secure random keys - **Token Expiration**: Set appropriate token lifetimes (24 hours max) - **Refresh Tokens**: Implement token refresh mechanism ``` # Generate secure secret key openssl rand -hex 32 ``` #### API Key Security - **Rotation**: Implement regular API key rotation - **Scoping**: Use least-privilege principle for API access - **Monitoring**: Log all API key usage ### Service Credential Storage #### NordVPN Credentials Current storage in `/opt/vpn-exit-controller/configs/auth.txt` is insecure. **Secure Implementation:** ``` # Use HashiCorp Vault or similar vault kv put secret/nordvpn username="your_username" password="your_password" # Or encrypt with gpg echo "username:password" | gpg --symmetric --armor > /opt/vpn-exit-controller/configs/auth.txt.gpg ``` #### Tailscale Auth Keys - **Ephemeral Keys**: Use time-limited auth keys - **Key Storage**: Store in secure key management system - **Access Logging**: Monitor auth key usage #### Cloudflare API Tokens - **Scoped Tokens**: Use DNS-only tokens with domain restrictions - **Token Rotation**: Regular rotation schedule - **Environment Variables**: Never hardcode in configuration files ## Container Security ### Docker Security Best Practices #### Image Security ``` # Use specific, non-root base images FROM ubuntu:22.04@sha256:specific-hash # Create non-root user RUN useradd -r -u 1001 -m -c "vpn user" -d /home/vpn -s /bin/bash vpn # Use non-root user USER vpn # Verify signatures RUN curl -fsSL https://tailscale.com/install.sh | sh ``` #### Runtime Security ``` # docker-compose.yml security settings services: api: security_opt: - 
no-new-privileges:true read_only: true tmpfs: - /tmp - /var/tmp cap_drop: - ALL cap_add: - NET_ADMIN # Only if required for VPN ``` ### Container Isolation #### Network Isolation ``` # Create isolated networks networks: vpn-internal: driver: bridge internal: true vpn-external: driver: bridge ``` #### Resource Limits ``` services: api: deploy: resources: limits: cpus: '1.0' memory: 512M reservations: cpus: '0.5' memory: 256M ``` ### Privileged Container Considerations The VPN nodes require privileged access for network operations: #### Risk Mitigation - **Minimal Privileges**: Only grant necessary capabilities - **Network Namespaces**: Use separate network namespaces - **Monitoring**: Enhanced monitoring for privileged containers ``` # Minimal privileged configuration services: vpn-node: privileged: false cap_add: - NET_ADMIN - NET_RAW devices: - /dev/net/tun ``` ## SSL/TLS Security ### Traefik SSL Configuration #### Let's Encrypt Integration ``` # traefik.yml - Secure ACME configuration certificatesResolvers: cf: acme: email: "security@yourdomain.com" storage: /letsencrypt/acme.json dnsChallenge: provider: cloudflare delayBeforeCheck: 60 ``` #### SSL Security Headers ``` # security-headers.yml - Enhanced headers http: middlewares: security-headers: headers: customResponseHeaders: X-Frame-Options: "DENY" X-Content-Type-Options: "nosniff" X-XSS-Protection: "1; mode=block" Strict-Transport-Security: "max-age=31536000; includeSubDomains; preload" Content-Security-Policy: "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'" Referrer-Policy: "strict-origin-when-cross-origin" Permissions-Policy: "geolocation=(), microphone=(), camera=()" ``` ### Certificate Management #### Automatic Renewal ``` # Verify certificate renewal docker exec traefik cat /letsencrypt/acme.json | jq '.cf.Certificates[0].certificate' | base64 -d | openssl x509 -text -noout ``` #### Certificate Monitoring ``` #!/bin/bash # Certificate expiry monitoring script 
CERT_PATH="/opt/vpn-exit-controller/traefik/letsencrypt/acme.json" EXPIRY_DAYS=30 # Extract and check certificate expiry # Add to cron for regular monitoring ``` ## Data Protection ### Logging Security #### Log Configuration ``` # Secure logging configuration import logging from logging.handlers import RotatingFileHandler # Avoid logging sensitive data class SensitiveDataFilter(logging.Filter): def filter(self, record): # Remove passwords, tokens, etc. record.msg = re.sub(r'password=[^&\s]+', 'password=***', str(record.msg)) record.msg = re.sub(r'token=[^&\s]+', 'token=***', str(record.msg)) return True handler = RotatingFileHandler('/var/log/vpn-controller.log', maxBytes=10485760, backupCount=5) handler.addFilter(SensitiveDataFilter()) ``` #### Log Retention Policy - **Retention Period**: 90 days for operational logs - **Security Logs**: 1 year minimum - **Compliance Logs**: As required by regulations - **Log Rotation**: Daily rotation with compression ### Sensitive Data Handling #### Environment Variables ``` # Secure environment variable handling cat > /opt/vpn-exit-controller/.env << EOF # Never commit this file to version control SECRET_KEY=$(openssl rand -hex 32) ADMIN_USER=secure_admin ADMIN_PASS=$(openssl rand -base64 32) TAILSCALE_AUTHKEY=tskey-auth-*** NORDVPN_USER=*** NORDVPN_PASS=*** CLOUDFLARE_API_TOKEN=*** EOF chmod 600 /opt/vpn-exit-controller/.env ``` #### Redis Security ``` # redis.conf security settings requirepass "$(openssl rand -base64 32)" bind 127.0.0.1 protected-mode yes port 0 # Disable TCP port unixsocket /var/run/redis/redis.sock unixsocketperm 770 ``` ### Backup and Recovery Security #### Encrypted Backups ``` #!/bin/bash # Secure backup script BACKUP_DIR="/secure/backups" DATE=$(date +%Y%m%d_%H%M%S) # Create encrypted backup tar -czf - /opt/vpn-exit-controller | gpg --symmetric --cipher-algo AES256 > "$BACKUP_DIR/vpn-controller-$DATE.tar.gz.gpg" # Set secure permissions chmod 600 "$BACKUP_DIR/vpn-controller-$DATE.tar.gz.gpg" ``` ## 
Operational Security ### System Monitoring #### Security Monitoring ``` # Install fail2ban for brute force protection apt install fail2ban # Configure jail for API endpoints cat > /etc/fail2ban/jail.d/vpn-api.conf << EOF [vpn-api] enabled = true port = 8080 filter = vpn-api logpath = /var/log/vpn-controller.log maxretry = 5 bantime = 3600 EOF ``` #### Performance Monitoring - **Resource Usage**: Monitor CPU, memory, disk usage - **Network Traffic**: Monitor unusual traffic patterns - **Container Health**: Monitor container status and resource usage ### Security Updates #### Update Schedule - **Security Updates**: Weekly automated security updates - **System Updates**: Monthly maintenance windows - **Container Images**: Monthly base image updates ``` # Automated security updates cat > /etc/apt/apt.conf.d/50unattended-upgrades << EOF Unattended-Upgrade::Allowed-Origins { "\${distro_id}:\${distro_codename}-security"; }; Unattended-Upgrade::AutoFixInterruptedDpkg "true"; Unattended-Upgrade::Remove-Unused-Dependencies "true"; EOF ``` ### Access Control #### SSH Hardening ``` # /etc/ssh/sshd_config security settings Port 2222 PermitRootLogin no PasswordAuthentication no PubkeyAuthentication yes X11Forwarding no MaxAuthTries 3 ClientAliveInterval 300 ClientAliveCountMax 2 ``` #### User Management - **Principle of Least Privilege**: Grant minimum required permissions - **Regular Audits**: Monthly access reviews - **Multi-Factor Authentication**: Implement for privileged accounts ### Incident Response #### Response Plan 1. **Detection**: Automated alerting for security events 2. **Containment**: Immediate isolation procedures 3. **Investigation**: Forensic data collection 4. **Recovery**: Secure restoration procedures 5. 
#### Emergency Procedures

```
#!/bin/bash
# Emergency shutdown script

# Stop all VPN nodes
docker stop $(docker ps -q --filter "ancestor=vpn-exit-node:latest")

# Stop main services
systemctl stop vpn-controller
systemctl stop docker

# Block all traffic
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT DROP
```

## Compliance Considerations

### Privacy Regulations

#### GDPR Compliance

- **Data Minimization**: Collect only necessary data
- **Retention Limits**: Implement data retention policies
- **User Rights**: Provide data access and deletion capabilities
- **Consent Management**: Document the legal basis for processing

#### Data Processing Records

```
{
  "processing_activity": "VPN Traffic Routing",
  "data_categories": ["IP addresses", "Connection timestamps", "Traffic volumes"],
  "legal_basis": "Legitimate interest",
  "retention_period": "30 days",
  "security_measures": ["Encryption", "Access controls", "Audit logging"]
}
```

### VPN Service Terms

#### NordVPN Compliance

- **Terms of Service**: Regular review of provider terms
- **Usage Monitoring**: Ensure compliance with usage limits
- **Prohibited Activities**: Monitor for and prevent policy violations

### Data Residency

#### Geographic Restrictions

- **Data Location**: Understand where data is processed and stored
- **Cross-Border Transfers**: Implement appropriate safeguards
- **Jurisdiction Requirements**: Comply with local data protection laws

### Audit Trail Maintenance

#### Comprehensive Logging

```
# Audit logging implementation
from datetime import datetime

import structlog

audit_logger = structlog.get_logger("audit")

def log_admin_action(user, action, resource, result):
    audit_logger.info(
        "admin_action",
        user=user,
        action=action,
        resource=resource,
        result=result,
        timestamp=datetime.utcnow().isoformat()
    )
```

## Security Hardening

### System Hardening

#### Kernel Security

```
# /etc/sysctl.d/99-security.conf
# Network security
# ip_forward is required for VPN routing
net.ipv4.ip_forward=1
net.ipv4.conf.all.send_redirects=0
net.ipv4.conf.default.send_redirects=0
net.ipv4.conf.all.accept_redirects=0
net.ipv4.conf.default.accept_redirects=0

# Memory protection
kernel.dmesg_restrict=1
kernel.kptr_restrict=1
kernel.yama.ptrace_scope=1
```
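Whether these kernel parameters are actually in effect can be verified by reading them back from `/proc/sys`, where each sysctl name maps to a file path with the dots replaced by slashes. The helper below is a hypothetical sketch, not part of the controller; the demo runs against a fake tree so it works anywhere:

```
import pathlib
import tempfile

def check_sysctl(expected, root="/proc/sys"):
    """Return {name: live_value} for kernel parameters that differ from expected."""
    drift = {}
    for name, want in expected.items():
        # e.g. "net.ipv4.ip_forward" -> /proc/sys/net/ipv4/ip_forward
        path = pathlib.Path(root) / name.replace(".", "/")
        try:
            got = path.read_text().strip()
        except OSError:
            got = None  # parameter missing entirely
        if got != want:
            drift[name] = got
    return drift

# Demo against a fake /proc/sys tree so the sketch is runnable anywhere
with tempfile.TemporaryDirectory() as root:
    p = pathlib.Path(root, "net/ipv4")
    p.mkdir(parents=True)
    (p / "ip_forward").write_text("0\n")  # drifted from the expected "1"
    print(check_sysctl({"net.ipv4.ip_forward": "1"}, root=root))
    # → {'net.ipv4.ip_forward': '0'}
```

On the real host the default `root="/proc/sys"` would be used, and a non-empty result indicates the `/etc/sysctl.d/99-security.conf` settings have not been applied (e.g. `sysctl --system` was not run after editing).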
#### File System Security

```
# Secure mount options in /etc/fstab
tmpfs /tmp     tmpfs defaults,nodev,nosuid,noexec 0 0
tmpfs /var/tmp tmpfs defaults,nodev,nosuid,noexec 0 0
```

### Service Configuration Security

#### Systemd Service Hardening

```
# /etc/systemd/system/vpn-controller.service
[Unit]
Description=VPN Exit Controller
After=docker.service

[Service]
Type=exec
User=vpn-controller
Group=vpn-controller
ExecStart=/opt/vpn-exit-controller/start.sh
Restart=always
RestartSec=10

# Security settings
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
ReadWritePaths=/opt/vpn-exit-controller
```

### Regular Security Assessments

#### Security Checklist

**Monthly Checks:**

- [ ] Review access logs for anomalies
- [ ] Update all container images
- [ ] Check certificate expiry dates
- [ ] Review firewall rules
- [ ] Audit user access

**Quarterly Checks:**

- [ ] Penetration testing of API endpoints
- [ ] Review and update security policies
- [ ] Audit third-party dependencies
- [ ] Review backup and recovery procedures

**Annual Checks:**

- [ ] Comprehensive security audit
- [ ] Business continuity testing
- [ ] Compliance assessment
- [ ] Security training updates

#### Automated Security Scanning

```
# Container vulnerability scanning
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
  aquasec/trivy image vpn-exit-node:latest

# System vulnerability scanning
lynis audit system
```

### Vulnerability Management

#### Vulnerability Scanning

- **Automated Scans**: Daily vulnerability scans
- **Patch Management**: Prioritized patching based on severity
- **Zero-Day Response**: Emergency response procedures
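For the monthly "check certificate expiry dates" task, the remaining lifetime can be computed from a certificate's `notAfter` timestamp, which Python's `ssl.getpeercert()` returns in the form `'Jun  1 12:00:00 2026 GMT'`. The helper below is an illustrative sketch (`days_until_expiry` is not part of the controller):

```
from datetime import datetime, timezone

def days_until_expiry(not_after, now=None):
    """Whole days until a certificate's notAfter timestamp (may be negative)."""
    # Parse the OpenSSL text format used by ssl.getpeercert()['notAfter']
    expires = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    expires = expires.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days

# Fixed reference time so the example is deterministic
ref = datetime(2026, 5, 2, 12, 0, 0, tzinfo=timezone.utc)
print(days_until_expiry("Jun  1 12:00:00 2026 GMT", now=ref))
# → 30
```

Comparing the result against the `EXPIRY_DAYS=30` threshold from the certificate-monitoring script earlier in this guide would flag certificates due for renewal.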
#### Security Tools

```
# Install security monitoring tools
apt install -y \
  aide \
  rkhunter \
  chkrootkit \
  lynis \
  fail2ban
```

## Security Incident Contacts

- **Security Team**: security@yourdomain.com
- **Emergency Contact**: +1-555-0123
- **Incident Response**: Available 24/7

## Additional Resources

- OWASP Top 10
- NIST Cybersecurity Framework
- Docker Security Best Practices
- Tailscale Security Documentation

---

**Last Updated:** 2025-08-04
**Version:** 1.0
**Review Schedule:** Quarterly

> **Important:** This security guide should be reviewed and updated regularly to address new threats and vulnerabilities. All security measures should be tested in a non-production environment before implementation.

---

## Additional Resources

- API Endpoint: https://vpn.rbnk.uk:8080/api
- Dashboard: https://vpn.rbnk.uk
- Documentation: https://vpn-docs.rbnk.uk
- Repository: https://gitea.rbnk.uk/admin/vpn-controller