VPN Exit Controller - Maintenance Guide
This document describes maintenance procedures for the VPN Exit Controller system, which runs in a Proxmox LXC container (ID: 201).
Table of Contents
- Routine Maintenance Tasks
- System Monitoring
- Backup and Recovery
- Updates and Upgrades
- Certificate Management
- VPN Service Management
- Capacity Management
- Security Maintenance
- Documentation Updates
- Emergency Procedures
1. Routine Maintenance Tasks
Daily Health Checks (Automated)
Create a daily health check script at /opt/vpn-exit-controller/scripts/daily-check.sh:
#!/bin/bash
# Daily health check script
LOG_FILE="/var/log/vpn-controller-health.log"
DATE=$(date "+%Y-%m-%d %H:%M:%S")
echo "[$DATE] Starting daily health check" >> $LOG_FILE
# Check system resources
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | awk -F'%' '{print $1}')
MEM_USAGE=$(free | grep Mem | awk '{printf("%.2f"), $3/$2 * 100.0}')
DISK_USAGE=$(df -h /opt | awk 'NR==2 {print $5}' | sed 's/%//')
echo "[$DATE] CPU: ${CPU_USAGE}%, Memory: ${MEM_USAGE}%, Disk: ${DISK_USAGE}%" >> $LOG_FILE
# Check service status
if systemctl is-active --quiet vpn-controller; then
echo "[$DATE] VPN Controller service: RUNNING" >> $LOG_FILE
else
echo "[$DATE] VPN Controller service: FAILED" >> $LOG_FILE
systemctl restart vpn-controller
fi
# Check Docker containers
RUNNING_CONTAINERS=$(docker ps --format "table {{.Names}}\t{{.Status}}" | grep -c "Up")
echo "[$DATE] Running containers: $RUNNING_CONTAINERS" >> $LOG_FILE
# Check API health
API_RESPONSE=$(curl -s -u admin:Bl4ckMagic!2345erver -o /dev/null -w "%{http_code}" http://localhost:8080/api/status)
if [ "$API_RESPONSE" = "200" ]; then
echo "[$DATE] API health: OK" >> $LOG_FILE
else
echo "[$DATE] API health: FAILED (HTTP $API_RESPONSE)" >> $LOG_FILE
fi
# Check Redis
if docker exec vpn-redis redis-cli ping | grep -q PONG; then
echo "[$DATE] Redis: OK" >> $LOG_FILE
else
echo "[$DATE] Redis: FAILED" >> $LOG_FILE
fi
# Alert on high resource usage (strip decimals first: [ -gt ] only compares integers)
if [ "${CPU_USAGE%.*}" -gt 80 ] || [ "${MEM_USAGE%.*}" -gt 85 ] || [ "$DISK_USAGE" -gt 90 ]; then
echo "[$DATE] WARNING: High resource usage detected" >> $LOG_FILE
fi
echo "[$DATE] Daily health check completed" >> $LOG_FILE
Schedule: Add to crontab:
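The crontab entry itself is not shown here. As a minimal sketch, assuming the check should run at 06:00 every day (the time is an assumption, not something this guide specifies), the hypothetical helper `merge_cron_entry` builds a crontab listing that contains the entry exactly once, so re-running the installation does not duplicate it:

```bash
#!/bin/bash
# Assumed schedule: 06:00 daily. Adjust the first two fields as needed.
CRON_LINE='0 6 * * * /opt/vpn-exit-controller/scripts/daily-check.sh'

# Print the existing crontab text ($1) with the entry ($2) present exactly once.
merge_cron_entry() {
  local current="$1" entry="$2"
  # Drop any earlier copy of the entry and any blank lines, then append it.
  printf '%s\n' "$current" | grep -vxF -- "$entry" | grep -v '^$'
  printf '%s\n' "$entry"
}
```

To install it, pipe the result back into cron: `merge_cron_entry "$(crontab -l 2>/dev/null)" "$CRON_LINE" | crontab -`.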
Weekly Performance Review
Run every Sunday at 2 AM:
#!/bin/bash
# Weekly performance review script
REPORT_FILE="/var/log/vpn-controller-weekly-$(date +%Y%m%d).log"
echo "Weekly Performance Report - $(date)" > $REPORT_FILE
echo "========================================" >> $REPORT_FILE
# System uptime
uptime >> $REPORT_FILE
# Docker container statistics
echo -e "\nContainer Resource Usage:" >> $REPORT_FILE
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}" >> $REPORT_FILE
# VPN node performance
echo -e "\nVPN Node Statistics:" >> $REPORT_FILE
curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/summary >> $REPORT_FILE
# Log analysis
echo -e "\nError Summary (last 7 days):" >> $REPORT_FILE
journalctl -u vpn-controller --since "7 days ago" | grep -i error | wc -l >> $REPORT_FILE
# Redis memory usage
echo -e "\nRedis Memory Usage:" >> $REPORT_FILE
docker exec vpn-redis redis-cli info memory | grep used_memory_human >> $REPORT_FILE
Monthly System Updates
First Sunday of each month at 3 AM:
#!/bin/bash
# Monthly system update script
UPDATE_LOG="/var/log/monthly-updates-$(date +%Y%m).log"
echo "Monthly System Update - $(date)" > $UPDATE_LOG
# Update package lists
apt update >> $UPDATE_LOG 2>&1
# List available updates
echo "Available updates:" >> $UPDATE_LOG
apt list --upgradable >> $UPDATE_LOG 2>&1
# Install all pending upgrades (apt upgrade is not limited to security updates)
DEBIAN_FRONTEND=noninteractive apt upgrade -y -o Dpkg::Options::="--force-confdef" >> $UPDATE_LOG 2>&1
# Clean up
apt autoremove -y >> $UPDATE_LOG 2>&1
apt autoclean >> $UPDATE_LOG 2>&1
# Check if reboot is required
if [ -f /var/run/reboot-required ]; then
echo "REBOOT REQUIRED" >> $UPDATE_LOG
# Schedule reboot for low-traffic time (4 AM)
shutdown -r 04:00 "Scheduled reboot for system updates"
fi
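Standard cron syntax has no field for "first Sunday of the month". One common workaround, sketched here with a hypothetical helper `is_first_sunday` and assuming GNU `date`, is to schedule the job for every Sunday at 3 AM and exit early unless the day of the month is 7 or lower:

```bash
#!/bin/bash
# Succeed when the given date (default: today) is the first Sunday of its month.
is_first_sunday() {
  local day="${1:-$(date +%F)}"
  local dow dom
  dow=$(date -d "$day" +%u)    # ISO weekday: 1 = Monday ... 7 = Sunday
  dom=$(date -d "$day" +%-d)   # day of month without zero padding
  [ "$dow" -eq 7 ] && [ "$dom" -le 7 ]
}
```

With a crontab schedule of `0 3 * * 0`, placing `is_first_sunday || exit 0` at the top of the update script limits it to one run per month.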
Quarterly Capacity Planning
First day of each quarter:
#!/bin/bash
# Quarterly capacity planning report
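# Force base 10 so months 08 and 09 are not read as invalid octal; months 1-3 map to Q1
QUARTER=$(date +%Y)-Q$(( (10#$(date +%m) - 1) / 3 + 1 ))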
REPORT_FILE="/var/log/capacity-report-${QUARTER}.log"
echo "Quarterly Capacity Planning Report - $QUARTER" > $REPORT_FILE
echo "=============================================" >> $REPORT_FILE
# Historical resource usage trends
echo "Resource Usage Trends (last 90 days):" >> $REPORT_FILE
# Disk usage growth
df -h /opt >> $REPORT_FILE
# VPN node usage statistics
echo -e "\nVPN Node Utilization:" >> $REPORT_FILE
curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/nodes | jq '.[] | {country: .country, usage_percent: .usage_percent}' >> $REPORT_FILE
# Connection statistics
echo -e "\nConnection Statistics:" >> $REPORT_FILE
curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/connections >> $REPORT_FILE
# Recommendations section
echo -e "\nCapacity Recommendations:" >> $REPORT_FILE
echo "Review this report to determine if additional nodes or resources are needed." >> $REPORT_FILE
2. System Monitoring
Key Metrics to Watch
System Level Metrics
- CPU Usage: Should stay below 80% average
- Memory Usage: Should stay below 85%
- Disk Usage: Should stay below 90%
- Network I/O: Monitor for unusual spikes
- Load Average: Should stay below the number of CPU cores
Application Level Metrics
- API Response Time: < 500ms for health checks
- Active VPN Connections: Track connection counts
- Container Health: All containers should be "healthy"
- Redis Memory Usage: Monitor for memory leaks
- Failed Connection Attempts: Track error rates
Alert Thresholds and Escalation
Critical Alerts (Immediate Response)
- API unavailable (HTTP 5xx errors)
- System CPU > 95% for 5+ minutes
- System memory > 95%
- Disk usage > 95%
- All VPN nodes offline
- Redis unavailable
Warning Alerts (Response within 4 hours)
- System CPU > 80% for 15+ minutes
- System memory > 85%
- Disk usage > 90%
- More than 50% of VPN nodes offline
- High error rate (> 5%)
Info Alerts (Response within 24 hours)
- Single VPN node offline
- Slow API response times (> 1s)
- Log rotation needed
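The resource thresholds above can be turned into a small classifier. This is a minimal sketch, not part of the controller: the function name `classify_usage` is hypothetical, it ignores the "for N minutes" duration conditions, and it truncates decimals because the shell's `[ -gt ]` test only compares integers.

```bash
#!/bin/bash
# Map CPU, memory and disk percentages to an alert level from the lists above.
classify_usage() {
  # Drop any fractional part, e.g. 96.2 becomes 96.
  local cpu="${1%.*}" mem="${2%.*}" disk="${3%.*}"
  if [ "$cpu" -gt 95 ] || [ "$mem" -gt 95 ] || [ "$disk" -gt 95 ]; then
    echo "critical"
  elif [ "$cpu" -gt 80 ] || [ "$mem" -gt 85 ] || [ "$disk" -gt 90 ]; then
    echo "warning"
  else
    echo "ok"
  fi
}
```

For example, `classify_usage 96.2 50 50` prints `critical` and `classify_usage 81 50 50` prints `warning`.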
Log Monitoring and Rotation
Configure log rotation for VPN Controller:
Create /etc/logrotate.d/vpn-controller:
/var/log/vpn-controller*.log {
daily
rotate 30
compress
delaycompress
missingok
notifempty
create 644 root root
postrotate
systemctl reload vpn-controller
endscript
}
Monitor key log patterns:
#!/bin/bash
# Log monitoring script: scan the last hour of journal entries for critical patterns
LOG_FILE="/var/log/vpn-controller-alerts.log"
JOURNAL_LOG=$(journalctl -u vpn-controller --since "1 hour ago" --no-pager)
# Check for critical errors
CRITICAL_ERRORS=$(echo "$JOURNAL_LOG" | grep -i "critical\|fatal\|emergency" | wc -l)
if [ "$CRITICAL_ERRORS" -gt 0 ]; then
echo "$(date): CRITICAL - $CRITICAL_ERRORS critical errors found" >> $LOG_FILE
fi
# Check for failed VPN connections
FAILED_CONNECTIONS=$(echo "$JOURNAL_LOG" | grep -i "connection failed\|vpn failed" | wc -l)
if [ "$FAILED_CONNECTIONS" -gt 10 ]; then
echo "$(date): WARNING - $FAILED_CONNECTIONS failed VPN connections in last hour" >> $LOG_FILE
fi
# Check for Docker issues
DOCKER_ERRORS=$(echo "$JOURNAL_LOG" | grep -i "docker.*error" | wc -l)
if [ "$DOCKER_ERRORS" -gt 0 ]; then
echo "$(date): WARNING - $DOCKER_ERRORS Docker errors found" >> $LOG_FILE
fi
Performance Baseline Tracking
Create baseline measurements script:
#!/bin/bash
# Performance baseline tracking
BASELINE_FILE="/var/log/performance-baseline.json"
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
# Collect performance metrics
API_RESPONSE_TIME=$(curl -o /dev/null -s -w "%{time_total}" -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status)
MEMORY_USAGE=$(free | grep Mem | awk '{printf("%.1f"), $3/$2 * 100.0}')
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | awk -F'%' '{print $1}')
DISK_USAGE=$(df /opt | awk 'NR==2 {print $5}' | sed 's/%//')
# Create JSON entry
cat >> $BASELINE_FILE << EOF
{
"timestamp": "$TIMESTAMP",
"api_response_time": $API_RESPONSE_TIME,
"memory_usage_percent": $MEMORY_USAGE,
"cpu_usage_percent": $CPU_USAGE,
"disk_usage_percent": $DISK_USAGE
}
EOF
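Appending one multi-line object per run leaves a file that is not a single valid JSON document. A common alternative, offered here as an assumption rather than this project's format, is JSON Lines: one compact object per line. The function name `baseline_entry` is hypothetical; empty metrics are written as `null` so every line still parses.

```bash
#!/bin/bash
# Print one JSON Lines record: timestamp, API time, memory %, CPU %, disk %.
baseline_entry() {
  printf '{"timestamp":"%s","api_response_time":%s,"memory_usage_percent":%s,"cpu_usage_percent":%s,"disk_usage_percent":%s}\n' \
    "$1" "${2:-null}" "${3:-null}" "${4:-null}" "${5:-null}"
}
```

A call such as `baseline_entry "$TIMESTAMP" "$API_RESPONSE_TIME" "$MEMORY_USAGE" "$CPU_USAGE" "$DISK_USAGE" >> "$BASELINE_FILE"` would then add one parseable line per run.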
3. Backup and Recovery
Configuration Backup Procedures
Daily Configuration Backup:
#!/bin/bash
# Daily configuration backup script
BACKUP_DIR="/opt/backups/vpn-controller"
DATE=$(date +%Y%m%d)
BACKUP_FILE="vpn-controller-config-${DATE}.tar.gz"
mkdir -p $BACKUP_DIR
# Backup configurations
tar -czf "${BACKUP_DIR}/${BACKUP_FILE}" \
/opt/vpn-exit-controller/configs/ \
/opt/vpn-exit-controller/.env \
/opt/vpn-exit-controller/docker-compose.yml \
/opt/vpn-exit-controller/traefik/ \
/etc/systemd/system/vpn-controller.service
# Keep only last 30 days of backups
find $BACKUP_DIR -name "vpn-controller-config-*.tar.gz" -mtime +30 -delete
echo "$(date): Configuration backup completed: $BACKUP_FILE"
Weekly Full System Backup:
#!/bin/bash
# Weekly full system backup
BACKUP_DIR="/opt/backups/vpn-controller"
DATE=$(date +%Y%m%d)
FULL_BACKUP="vpn-controller-full-${DATE}.tar.gz"
# Stop services for consistent backup
systemctl stop vpn-controller
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml down
# Create full backup (exclude options come before the paths they apply to)
tar -czf "${BACKUP_DIR}/${FULL_BACKUP}" \
--exclude=/opt/vpn-exit-controller/venv \
--exclude=/opt/vpn-exit-controller/logs \
--exclude=/opt/vpn-exit-controller/data/cache \
/opt/vpn-exit-controller/ \
/etc/systemd/system/vpn-controller.service
# Start services
systemctl start vpn-controller
# Keep only last 4 weekly backups
find $BACKUP_DIR -name "vpn-controller-full-*.tar.gz" -mtime +28 -delete
echo "$(date): Full system backup completed: $FULL_BACKUP"
Redis Database Backup
#!/bin/bash
# Redis backup script
BACKUP_DIR="/opt/backups/redis"
DATE=$(date +%Y%m%d-%H%M)
REDIS_BACKUP="redis-backup-${DATE}.rdb"
mkdir -p $BACKUP_DIR
# Create Redis backup
docker exec vpn-redis redis-cli BGSAVE
sleep 5
# Copy the backup file
docker cp vpn-redis:/data/dump.rdb "${BACKUP_DIR}/${REDIS_BACKUP}"
# Keep only last 7 days of Redis backups
find $BACKUP_DIR -name "redis-backup-*.rdb" -mtime +7 -delete
echo "$(date): Redis backup completed: $REDIS_BACKUP"
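A fixed `sleep 5` can copy `dump.rdb` before a large background save has finished. Redis reports the time of the last successful save through the `LASTSAVE` command, so one option is to poll it until the value changes. The helper below is a generic sketch; the name `wait_for_change` and the 60-second timeout in the usage comment are assumptions.

```bash
#!/bin/bash
# Poll a command once per second until its output differs from $1.
# Returns 1 if the output is still unchanged after $2 seconds.
wait_for_change() {
  local initial="$1" timeout="$2"
  shift 2
  local waited=0
  while [ "$("$@")" = "$initial" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage with the container name from this guide (not executed here):
#   BEFORE=$(docker exec vpn-redis redis-cli LASTSAVE)
#   docker exec vpn-redis redis-cli BGSAVE
#   wait_for_change "$BEFORE" 60 docker exec vpn-redis redis-cli LASTSAVE
```

`LASTSAVE` has one-second resolution, so a save that completes in the same second as the previous one would not be detected; treat the timeout as the fallback.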
Recovery Testing Procedures
Monthly Recovery Test:
#!/bin/bash
# Monthly recovery test procedure
TEST_DIR="/tmp/recovery-test-$(date +%Y%m%d)"
LATEST_BACKUP=$(ls -t /opt/backups/vpn-controller/vpn-controller-config-*.tar.gz | head -1)
echo "Testing recovery from backup: $LATEST_BACKUP"
# Create test directory
mkdir -p $TEST_DIR
# Extract backup
tar -xzf $LATEST_BACKUP -C $TEST_DIR
# Verify key files exist
REQUIRED_FILES=(
"opt/vpn-exit-controller/configs/auth.txt"
"opt/vpn-exit-controller/.env"
"opt/vpn-exit-controller/docker-compose.yml"
)
RECOVERY_SUCCESS=true
for file in "${REQUIRED_FILES[@]}"; do
if [ ! -f "$TEST_DIR/$file" ]; then
echo "ERROR: Missing file in backup: $file"
RECOVERY_SUCCESS=false
fi
done
# Test configuration validity
if [ "$RECOVERY_SUCCESS" = true ]; then
echo "Recovery test PASSED"
else
echo "Recovery test FAILED"
fi
# Cleanup
rm -rf $TEST_DIR
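For a quicker check between monthly tests, the archive listing can be inspected without extracting anything. This is a minimal sketch with a hypothetical helper name, `archive_contains`; it relies on GNU tar storing members without the leading `/`, which matches the relative paths used in the required-files list above.

```bash
#!/bin/bash
# Succeed when the gzip-compressed tar archive $1 has a member named exactly $2.
archive_contains() {
  tar -tzf "$1" | grep -qxF -- "$2"
}
```

For example, `archive_contains "$LATEST_BACKUP" "opt/vpn-exit-controller/.env" || echo "ERROR: .env missing from backup"`.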
Disaster Recovery Planning
Full System Recovery Procedure:
- Prepare New LXC Container:
# On Proxmox host
pct create 201 /var/lib/vz/template/cache/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
--memory 4096 --cores 4 --storage local-lvm --size 50G \
--net0 name=eth0,bridge=vmbr1,ip=10.10.10.20/24,gw=10.10.10.1 \
--nameserver 8.8.8.8 --hostname vpn-exit-controller \
--features nesting=1,keyctl=1 \
--unprivileged 0
- Start Container and Install Dependencies:
- Restore from Backup:
# Copy latest backup to container
LATEST_BACKUP=$(ls -t /opt/backups/vpn-controller/vpn-controller-full-*.tar.gz | head -1)
tar -xzf $LATEST_BACKUP -C /
# Restore permissions
chown -R root:root /opt/vpn-exit-controller/
chmod +x /opt/vpn-exit-controller/start.sh
# Restore systemd service
systemctl daemon-reload
systemctl enable vpn-controller
systemctl start vpn-controller
- Verify Recovery:
4. Updates and Upgrades
System Package Updates
Security Updates (Weekly):
#!/bin/bash
# Weekly security updates
LOG_FILE="/var/log/security-updates.log"
echo "$(date): Starting security updates" >> $LOG_FILE
# Update package lists
apt update >> $LOG_FILE 2>&1
# Install security updates only (skip the step when none are pending)
SECURITY_PKGS=$(apt list --upgradable 2>/dev/null | grep -i security | cut -d/ -f1)
if [ -n "$SECURITY_PKGS" ]; then
DEBIAN_FRONTEND=noninteractive apt-get -y install --only-upgrade \
-o Dpkg::Options::="--force-confdef" \
-o Dpkg::Options::="--force-confold" \
$SECURITY_PKGS >> $LOG_FILE 2>&1
fi
# Check for reboot requirement
if [ -f /var/run/reboot-required ]; then
echo "$(date): Reboot required after security updates" >> $LOG_FILE
fi
echo "$(date): Security updates completed" >> $LOG_FILE
Full System Updates (Monthly):
#!/bin/bash
# Monthly full system updates (scheduled maintenance window)
MAINTENANCE_LOG="/var/log/maintenance-updates.log"
echo "$(date): Starting maintenance window - full system update" >> $MAINTENANCE_LOG
# Stop VPN service
systemctl stop vpn-controller >> $MAINTENANCE_LOG 2>&1
# Update all packages
apt update >> $MAINTENANCE_LOG 2>&1 && apt full-upgrade -y >> $MAINTENANCE_LOG 2>&1
# Clean up
apt autoremove -y >> $MAINTENANCE_LOG 2>&1
apt autoclean >> $MAINTENANCE_LOG 2>&1
# Update Docker images
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml pull >> $MAINTENANCE_LOG 2>&1
# Restart service
systemctl start vpn-controller >> $MAINTENANCE_LOG 2>&1
# Verify service is running
sleep 30
if systemctl is-active --quiet vpn-controller; then
echo "$(date): Service restarted successfully" >> $MAINTENANCE_LOG
else
echo "$(date): ERROR: Service failed to restart" >> $MAINTENANCE_LOG
fi
echo "$(date): Maintenance window completed" >> $MAINTENANCE_LOG
Docker Image Updates
Check for Updates:
#!/bin/bash
# Check for Docker image updates
IMAGES=(
"redis:7-alpine"
"traefik:v2.10"
)
for image in "${IMAGES[@]}"; do
echo "Checking updates for $image"
# Record the local image ID before pulling (empty if the image is not present yet)
OLD_ID=$(docker images --no-trunc --quiet "$image" | head -1)
# Pull the current image for this tag
docker pull "$image"
# Compare image IDs from before and after the pull
NEW_ID=$(docker images --no-trunc --quiet "$image" | head -1)
if [ "$OLD_ID" != "$NEW_ID" ]; then
echo "Updated image pulled for $image"
else
echo "No update available for $image"
fi
done
Update Custom VPN Node Image:
#!/bin/bash
# Update VPN node image
cd /opt/vpn-exit-controller/vpn-node
# Build new image
docker build -t vpn-exit-node:latest .
# Test new image
docker run --rm vpn-exit-node:latest --version
# Restart containers with new image
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml up -d --force-recreate
Application Code Updates
Update from Git Repository:
#!/bin/bash
# Update application code from repository
cd /opt/vpn-exit-controller
# Backup current version
tar -czf "/opt/backups/pre-update-$(date +%Y%m%d).tar.gz" --exclude=venv --exclude=data .
# Stop service
systemctl stop vpn-controller
# Pull updates (if using git)
# git pull origin main
# Update Python dependencies
source venv/bin/activate
pip install -r api/requirements.txt
# Restart service
systemctl start vpn-controller
# Verify update
sleep 30
curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status
Dependency Management
Python Dependencies Audit:
#!/bin/bash
# Audit Python dependencies for security vulnerabilities
cd /opt/vpn-exit-controller
source venv/bin/activate
# Check for outdated packages
pip list --outdated
# Security audit (install pip-audit if not available)
pip install pip-audit
pip-audit
# Generate requirements with exact versions
pip freeze > requirements-frozen.txt
5. Certificate Management
SSL Certificate Monitoring
Check Certificate Expiration:
#!/bin/bash
# Check SSL certificate expiration
CERT_FILE="/opt/vpn-exit-controller/traefik/letsencrypt/acme.json"
ALERT_DAYS=30
if [ -f "$CERT_FILE" ]; then
# Extract certificate expiration dates
python3 << EOF
import json
import base64
import datetime
from cryptography import x509
with open('$CERT_FILE', 'r') as f:
    acme_data = json.load(f)
for resolver in acme_data.values():
    for cert_data in resolver.get('Certificates', []):
        cert_pem = base64.b64decode(cert_data['certificate']).decode()
        cert = x509.load_pem_x509_certificate(cert_pem.encode())
        domain = cert_data['domain']['main']
        expiry = cert.not_valid_after
        days_left = (expiry - datetime.datetime.now()).days
        print(f"Domain: {domain}, Expires: {expiry}, Days left: {days_left}")
        if days_left < $ALERT_DAYS:
            print(f"WARNING: Certificate for {domain} expires in {days_left} days")
EOF
fi
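Where the Python `cryptography` package is not installed, the same days-left arithmetic can be done in the shell from the `notAfter` date that `openssl x509 -noout -enddate` prints. This is a minimal sketch assuming GNU `date`; the helper name `days_between` is hypothetical.

```bash
#!/bin/bash
# Print the whole number of days from date $1 to date $2 (negative if $2 is earlier).
days_between() {
  local from_s to_s
  from_s=$(date -u -d "$1" +%s) || return 1
  to_s=$(date -u -d "$2" +%s) || return 1
  echo $(( (to_s - from_s) / 86400 ))
}

# Usage with a PEM file (not executed here; cert.pem is a placeholder path):
#   END=$(openssl x509 -noout -enddate -in cert.pem | cut -d= -f2)
#   days_between "now" "$END"
```

The result can be compared against `ALERT_DAYS` with an ordinary integer test.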
Let's Encrypt Certificate Renewal
Automatic Renewal Check:
#!/bin/bash
# Check Let's Encrypt certificate renewal
# Traefik handles automatic renewal, but verify it's working
TRAEFIK_LOG="/opt/vpn-exit-controller/traefik/logs/traefik.log"
# Check for recent renewal attempts
if [ -f "$TRAEFIK_LOG" ]; then
echo "Recent certificate renewal attempts:"
grep -i "certificate\|acme" "$TRAEFIK_LOG" | tail -10
fi
# Verify certificate is valid
DOMAIN="your-domain.com" # Replace with actual domain
echo | openssl s_client -servername $DOMAIN -connect $DOMAIN:443 2>/dev/null | openssl x509 -noout -dates
Certificate Troubleshooting
Common Issues and Solutions:
#!/bin/bash
# Certificate troubleshooting script
echo "Certificate Troubleshooting Report"
echo "=================================="
# Check Traefik configuration
echo "1. Checking Traefik configuration..."
docker-compose -f /opt/vpn-exit-controller/traefik/docker-compose.traefik.yml config
# Check ACME challenge accessibility
echo "2. Checking ACME challenge accessibility..."
DOMAIN="your-domain.com" # Replace with actual domain
curl -I "http://$DOMAIN/.well-known/acme-challenge/test"
# Check DNS resolution
echo "3. Checking DNS resolution..."
nslookup $DOMAIN
# Check port accessibility
echo "4. Checking port 443 accessibility..."
nc -zv $DOMAIN 443
# Check Traefik dashboard
echo "5. Checking Traefik dashboard..."
curl -I http://localhost:8080/dashboard/
6. VPN Service Management
NordVPN Credential Rotation
Monthly Credential Check:
#!/bin/bash
# Check and rotate NordVPN credentials if needed
AUTH_FILE="/opt/vpn-exit-controller/configs/auth.txt"
CURRENT_DATE=$(date +%s)
FILE_AGE=$(stat -c %Y "$AUTH_FILE")
AGE_DAYS=$(( (CURRENT_DATE - FILE_AGE) / 86400 ))
echo "NordVPN credentials are $AGE_DAYS days old"
if [ $AGE_DAYS -gt 60 ]; then
echo "WARNING: NordVPN credentials are older than 60 days"
echo "Consider updating credentials in $AUTH_FILE"
# Test current credentials
echo "Testing current credentials..."
# This would require implementing a credential test function
fi
Credential Update Procedure:
#!/bin/bash
# Update NordVPN credentials
AUTH_FILE="/opt/vpn-exit-controller/configs/auth.txt"
BACKUP_FILE="/opt/backups/auth-backup-$(date +%Y%m%d).txt"
echo "Updating NordVPN credentials..."
# Backup current credentials
cp "$AUTH_FILE" "$BACKUP_FILE"
# Prompt for new credentials (in production, use secure input method)
echo "Enter new NordVPN username:"
read -r NEW_USERNAME
echo "Enter new NordVPN password:"
read -rs NEW_PASSWORD
# Update auth file
echo "$NEW_USERNAME" > "$AUTH_FILE"
echo "$NEW_PASSWORD" >> "$AUTH_FILE"
# Restart VPN containers to use new credentials
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml restart
echo "Credentials updated. Testing connections..."
# Wait for containers to start
sleep 30
# Test API to verify VPN connections are working
curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status
Server List Updates
Update NordVPN Server Configurations:
#!/bin/bash
# Update NordVPN server configurations
SCRIPT_PATH="/opt/vpn-exit-controller/scripts/download-nordvpn-configs.sh"
CONFIG_DIR="/opt/vpn-exit-controller/configs/vpn"
echo "Updating NordVPN server configurations..."
# Backup current configurations
tar -czf "/opt/backups/vpn-configs-backup-$(date +%Y%m%d).tar.gz" "$CONFIG_DIR"
# Run the download script
if [ -f "$SCRIPT_PATH" ]; then
bash "$SCRIPT_PATH"
echo "Server configurations updated"
# Restart service to load new configurations
systemctl restart vpn-controller
else
echo "ERROR: Download script not found at $SCRIPT_PATH"
fi
Performance Optimization
VPN Node Performance Tuning:
#!/bin/bash
# VPN node performance optimization
echo "VPN Performance Optimization Report"
echo "==================================="
# Check connection speeds for each country
COUNTRIES=("us" "uk" "de" "jp" "au")
for country in "${COUNTRIES[@]}"; do
echo "Testing $country nodes..."
# Use speed test API endpoint
SPEED_RESULT=$(curl -s -u admin:Bl4ckMagic!2345erver \
"http://localhost:8080/api/speed-test/$country" | jq '.download_speed')
echo "$country: $SPEED_RESULT Mbps"
done
# Check for underperforming nodes
echo -e "\nChecking for underperforming nodes..."
curl -s -u admin:Bl4ckMagic!2345erver \
"http://localhost:8080/api/metrics/nodes" | \
jq '.[] | select(.performance_score < 0.7) | {country: .country, score: .performance_score}'
Service Status Monitoring
Comprehensive VPN Service Check:
#!/bin/bash
# Comprehensive VPN service status check
STATUS_LOG="/var/log/vpn-service-status.log"
TIMESTAMP=$(date)
echo "[$TIMESTAMP] VPN Service Status Check" >> $STATUS_LOG
# Check each VPN container
VPN_CONTAINERS=$(docker ps --filter "name=vpn-" --format "{{.Names}}")
for container in $VPN_CONTAINERS; do
# Check container health
HEALTH=$(docker inspect --format='{{.State.Health.Status}}' $container 2>/dev/null || echo "no health check")
# Check VPN connection
IP_CHECK=$(docker exec $container curl -s --max-time 10 ifconfig.me 2>/dev/null || echo "failed")
echo "[$TIMESTAMP] $container: Health=$HEALTH, IP=$IP_CHECK" >> $STATUS_LOG
done
# Check overall API health
API_HEALTH=$(curl -s -u admin:Bl4ckMagic!2345erver -o /dev/null -w "%{http_code}" http://localhost:8080/api/status)
echo "[$TIMESTAMP] API Health: HTTP $API_HEALTH" >> $STATUS_LOG
# Check Redis connectivity
REDIS_HEALTH=$(docker exec vpn-redis redis-cli ping 2>/dev/null || echo "FAILED")
echo "[$TIMESTAMP] Redis Health: $REDIS_HEALTH" >> $STATUS_LOG
7. Capacity Management
Resource Usage Monitoring
Real-time Resource Monitor:
#!/bin/bash
# Real-time resource monitoring script
MONITOR_LOG="/var/log/resource-monitor.log"
while true; do
TIMESTAMP=$(date "+%Y-%m-%d %H:%M:%S")
# System resources
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | awk -F'%' '{print $1}')
MEM_USAGE=$(free | grep Mem | awk '{printf("%.1f"), $3/$2 * 100.0}')
DISK_USAGE=$(df /opt | awk 'NR==2 {print $5}' | sed 's/%//')
# Docker resources
DOCKER_CONTAINERS=$(docker ps -q | wc -l)
# Established network connections (skip the header line printed by ss)
CONNECTIONS=$(ss -tun state established | tail -n +2 | wc -l)
echo "$TIMESTAMP,$CPU_USAGE,$MEM_USAGE,$DISK_USAGE,$DOCKER_CONTAINERS,$CONNECTIONS" >> $MONITOR_LOG
sleep 300 # 5-minute intervals
done
Scaling Decisions
Auto-scaling Triggers:
#!/bin/bash
# Auto-scaling decision logic
SCALE_LOG="/var/log/scaling-decisions.log"
TIMESTAMP=$(date)
# Get current metrics
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | awk -F'%' '{print $1}')
ACTIVE_CONNECTIONS=$(curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/connections | jq '.active_connections')
RUNNING_NODES=$(docker ps --filter "name=vpn-" -q | wc -l)
echo "[$TIMESTAMP] Scaling Check: CPU=$CPU_USAGE%, Connections=$ACTIVE_CONNECTIONS, Nodes=$RUNNING_NODES" >> $SCALE_LOG
# Scaling up logic
if (( $(echo "$CPU_USAGE > 70" | bc -l) )) && [ "$ACTIVE_CONNECTIONS" -gt 100 ] && [ "$RUNNING_NODES" -lt 10 ]; then
echo "[$TIMESTAMP] SCALE UP: High load detected" >> $SCALE_LOG
# Trigger scale up via API
curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes/scale-up
fi
# Scaling down logic
if (( $(echo "$CPU_USAGE < 30" | bc -l) )) && [ "$ACTIVE_CONNECTIONS" -lt 20 ] && [ "$RUNNING_NODES" -gt 3 ]; then
echo "[$TIMESTAMP] SCALE DOWN: Low load detected" >> $SCALE_LOG
# Trigger scale down via API
curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes/scale-down
fi
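The two rules can also be isolated in one function so they are testable without the API and do not need `bc`. This is a sketch under the thresholds used in the script above; the name `scaling_decision` is hypothetical, and treating missing or non-numeric metrics as "none" is an added safety assumption.

```bash
#!/bin/bash
# Print "up", "down" or "none" for the given CPU %, active connections and node count.
scaling_decision() {
  awk -v cpu="$1" -v conn="$2" -v nodes="$3" 'BEGIN {
    num = "^[0-9]+([.][0-9]+)?$"
    # Refuse to scale on missing or malformed metrics (for example jq printing null).
    if (cpu !~ num || conn !~ num || nodes !~ num) { print "none"; exit }
    if (cpu + 0 > 70 && conn + 0 > 100 && nodes + 0 < 10)    print "up"
    else if (cpu + 0 < 30 && conn + 0 < 20 && nodes + 0 > 3) print "down"
    else                                                      print "none"
  }'
}
```

The caller would then branch on the printed value, for example `[ "$(scaling_decision "$CPU_USAGE" "$ACTIVE_CONNECTIONS" "$RUNNING_NODES")" = "up" ]` before posting to the scale-up endpoint.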
Node Capacity Planning
Capacity Planning Analysis:
#!/bin/bash
# Node capacity planning analysis
REPORT_FILE="/var/log/capacity-analysis-$(date +%Y%m%d).log"
echo "Node Capacity Planning Analysis - $(date)" > $REPORT_FILE
echo "=========================================" >> $REPORT_FILE
# Current node utilization
echo -e "\nCurrent Node Utilization:" >> $REPORT_FILE
curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/nodes | \
jq -r '.[] | "\(.country): \(.active_connections) connections, \(.cpu_usage)% CPU, \(.memory_usage)% Memory"' >> $REPORT_FILE
# Peak usage analysis (last 7 days)
echo -e "\nPeak Usage Analysis (Last 7 Days):" >> $REPORT_FILE
# Historical connection data
PEAK_CONNECTIONS=$(tail -2016 /var/log/resource-monitor.log | \
cut -d',' -f6 | sort -n | tail -1) # Last 7 days of 5-min intervals; field 6 is the connection count
echo "Peak concurrent connections: $PEAK_CONNECTIONS" >> $REPORT_FILE
# Capacity recommendations
echo -e "\nCapacity Recommendations:" >> $REPORT_FILE
if [ "${PEAK_CONNECTIONS:-0}" -gt 500 ]; then
echo "- Consider adding more VPN nodes for high-demand countries" >> $REPORT_FILE
fi
if [ "$(docker ps -q | wc -l)" -gt 15 ]; then
echo "- Current container count is high, consider resource optimization" >> $REPORT_FILE
fi
echo "- Monitor load balancing effectiveness across regions" >> $REPORT_FILE
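The resource monitor writes comma-separated rows in the order timestamp, CPU, memory, disk, container count, connection count, so a peak is the maximum of one column over the most recent rows. A minimal sketch, with a hypothetical helper name `peak_of_column`:

```bash
#!/bin/bash
# Print the largest numeric value in column $2 of CSV file $1,
# looking only at the last $3 rows (default 2016 = 7 days of 5-minute samples).
peak_of_column() {
  local file="$1" column="$2" samples="${3:-2016}"
  tail -n "$samples" "$file" | cut -d',' -f"$column" | sort -n | tail -n 1
}
```

For example, `peak_of_column /var/log/resource-monitor.log 6` gives the peak connection count and `peak_of_column /var/log/resource-monitor.log 2` the peak CPU percentage.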
Performance Optimization
System Performance Tuning:
#!/bin/bash
# System performance optimization
echo "Applying performance optimizations..."
# Network optimizations (written to a drop-in file so re-running the script does not duplicate entries; the file name is an example)
cat > /etc/sysctl.d/99-vpn-controller.conf << 'EOF'
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 65536 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
EOF
# Apply changes
sysctl -p /etc/sysctl.d/99-vpn-controller.conf
# Docker performance optimizations
echo '{
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
},
"default-ulimits": {
"nofile": {
"Name": "nofile",
"Hard": 64000,
"Soft": 64000
}
}
}' > /etc/docker/daemon.json
systemctl restart docker
echo "Performance optimizations applied"
8. Security Maintenance
Security Patch Management
Critical Security Updates:
#!/bin/bash
# Critical security update management
SECURITY_LOG="/var/log/security-updates.log"
TIMESTAMP=$(date)
echo "[$TIMESTAMP] Starting critical security check" >> $SECURITY_LOG
# Check for available security updates
SECURITY_UPDATES=$(apt list --upgradable 2>/dev/null | grep -i security | wc -l)
if [ "$SECURITY_UPDATES" -gt 0 ]; then
echo "[$TIMESTAMP] $SECURITY_UPDATES security updates available" >> $SECURITY_LOG
# Apply critical security updates immediately
DEBIAN_FRONTEND=noninteractive apt-get -y install \
$(apt list --upgradable 2>/dev/null | grep -i security | cut -d/ -f1) \
>> $SECURITY_LOG 2>&1
# Check if reboot is required
if [ -f /var/run/reboot-required ]; then
echo "[$TIMESTAMP] CRITICAL: Reboot required for security updates" >> $SECURITY_LOG
# Schedule reboot during maintenance window
shutdown -r +60 "Security updates require reboot"
fi
else
echo "[$TIMESTAMP] No security updates available" >> $SECURITY_LOG
fi
# Update Docker base images for security
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml pull >> $SECURITY_LOG 2>&1
Credential Rotation Schedules
Quarterly Credential Rotation:
#!/bin/bash
# Quarterly credential rotation
ROTATION_LOG="/var/log/credential-rotation.log"
TIMESTAMP=$(date)
echo "[$TIMESTAMP] Starting quarterly credential rotation" >> $ROTATION_LOG
# 1. API Admin Password Rotation
echo "[$TIMESTAMP] Rotating API admin password" >> $ROTATION_LOG
NEW_API_PASSWORD=$(openssl rand -base64 32)
# Update environment file (use | as the sed delimiter because base64 output can contain /)
sed -i "s|^ADMIN_PASS=.*|ADMIN_PASS=$NEW_API_PASSWORD|" /opt/vpn-exit-controller/.env
# 2. Redis Password (if used)
# NEW_REDIS_PASSWORD=$(openssl rand -base64 32)
# 3. Secret Key Rotation (strip the line breaks openssl inserts into long base64 output)
NEW_SECRET_KEY=$(openssl rand -base64 64 | tr -d '\n')
sed -i "s|^SECRET_KEY=.*|SECRET_KEY=$NEW_SECRET_KEY|" /opt/vpn-exit-controller/.env
# 4. Restart services with new credentials
systemctl restart vpn-controller
echo "[$TIMESTAMP] Credential rotation completed" >> $ROTATION_LOG
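# The log now contains a secret, so restrict it to root before recording the password
chmod 600 $ROTATION_LOG
echo "[$TIMESTAMP] New API password: $NEW_API_PASSWORD" >> $ROTATION_LOG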
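Editing `.env` through a `sed` substitution only works while the new value contains neither the delimiter nor `&`. A delimiter-independent alternative is to rewrite the file without the old assignment and append the new one. This is a minimal sketch; the helper name `set_env_var` is hypothetical, and it moves the variable to the end of the file, which is assumed to be acceptable for a plain `KEY=VALUE` file.

```bash
#!/bin/bash
# Set KEY=VALUE in an env file, replacing any existing assignment of KEY.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  # Copy every line except the old assignment, then append the new one verbatim.
  grep -v "^${key}=" "$file" > "$tmp"
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  chmod 600 "$tmp"
  # Replace the original in one step so readers never see a half-written file.
  mv "$tmp" "$file"
}
```

For example, `set_env_var /opt/vpn-exit-controller/.env ADMIN_PASS "$NEW_API_PASSWORD"` stores the value unchanged even when it contains `/`, `+` or `=`.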
Access Review Procedures
Monthly Access Review:
#!/bin/bash
# Monthly access review
REVIEW_LOG="/var/log/access-review-$(date +%Y%m).log"
echo "Monthly Access Review - $(date)" > $REVIEW_LOG
echo "=================================" >> $REVIEW_LOG
# Review systemd service permissions
echo -e "\nService File Permissions:" >> $REVIEW_LOG
ls -la /etc/systemd/system/vpn-controller.service >> $REVIEW_LOG
# Review application directory permissions
echo -e "\nApplication Directory Permissions:" >> $REVIEW_LOG
ls -la /opt/vpn-exit-controller/ >> $REVIEW_LOG
# Review Docker socket access
echo -e "\nDocker Socket Access:" >> $REVIEW_LOG
ls -la /var/run/docker.sock >> $REVIEW_LOG
# Review network exposure
echo -e "\nNetwork Exposure:" >> $REVIEW_LOG
ss -tuln >> $REVIEW_LOG
# Review recent authentication attempts
echo -e "\nRecent Authentication Attempts:" >> $REVIEW_LOG
journalctl -u vpn-controller --since "30 days ago" | grep -i "auth\|login" | tail -20 >> $REVIEW_LOG
Security Audit Schedules
Comprehensive Security Audit:
#!/bin/bash
# Comprehensive security audit
AUDIT_LOG="/var/log/security-audit-$(date +%Y%m%d).log"
echo "Security Audit Report - $(date)" > $AUDIT_LOG
echo "===============================" >> $AUDIT_LOG
# 1. System vulnerabilities scan
echo -e "\n1. System Vulnerability Scan:" >> $AUDIT_LOG
if command -v lynis &> /dev/null; then
lynis audit system --quiet >> $AUDIT_LOG
else
echo "Lynis not installed. Install with: apt install lynis" >> $AUDIT_LOG
fi
# 2. Open ports audit
echo -e "\n2. Open Ports Audit:" >> $AUDIT_LOG
nmap -sS -p- localhost >> $AUDIT_LOG 2>&1
# 3. Docker security audit
echo -e "\n3. Docker Security Audit:" >> $AUDIT_LOG
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
-v /usr/lib/systemd:/usr/lib/systemd \
-v /etc:/etc --label docker_bench_security \
docker/docker-bench-security >> $AUDIT_LOG 2>/dev/null || echo "Docker Bench Security not available" >> $AUDIT_LOG
# 4. File permissions audit
echo -e "\n4. Critical File Permissions:" >> $AUDIT_LOG
CRITICAL_FILES=(
"/opt/vpn-exit-controller/.env"
"/opt/vpn-exit-controller/configs/auth.txt"
"/etc/systemd/system/vpn-controller.service"
)
for file in "${CRITICAL_FILES[@]}"; do
if [ -f "$file" ]; then
ls -la "$file" >> $AUDIT_LOG
fi
done
# 5. Process audit
echo -e "\n5. Running Processes:" >> $AUDIT_LOG
ps aux | grep -E "(vpn|docker|redis)" >> $AUDIT_LOG
echo -e "\nSecurity audit completed. Review $AUDIT_LOG for findings." >> $AUDIT_LOG
9. Documentation Updates
Keeping Documentation Current
Documentation Update Checklist:
## Monthly Documentation Review Checklist
- [ ] Review API_DOCUMENTATION.md for new endpoints
- [ ] Update ARCHITECTURE.md for infrastructure changes
- [ ] Check DEPLOYMENT.md for new deployment procedures
- [ ] Verify SECURITY.md reflects current security measures
- [ ] Update this MAINTENANCE.md for new procedures
- [ ] Review configuration examples for accuracy
- [ ] Update version numbers and dependencies
- [ ] Check command examples are current
- [ ] Verify troubleshooting guides are accurate
- [ ] Update contact information and escalation procedures
Automated Documentation Validation:
#!/bin/bash
# Validate documentation accuracy
DOC_DIR="/opt/vpn-exit-controller"
VALIDATION_LOG="/var/log/doc-validation.log"
TIMESTAMP=$(date)
echo "[$TIMESTAMP] Starting documentation validation" > $VALIDATION_LOG
# Check if mentioned files exist
DOCS=("API_DOCUMENTATION.md" "ARCHITECTURE.md" "DEPLOYMENT.md" "SECURITY.md" "MAINTENANCE.md")
for doc in "${DOCS[@]}"; do
if [ -f "$DOC_DIR/$doc" ]; then
echo "[$TIMESTAMP] ✓ $doc exists" >> $VALIDATION_LOG
# Check for outdated information
LAST_MODIFIED=$(stat -c %Y "$DOC_DIR/$doc")
CURRENT_TIME=$(date +%s)
DAYS_OLD=$(( (CURRENT_TIME - LAST_MODIFIED) / 86400 ))
if [ $DAYS_OLD -gt 90 ]; then
echo "[$TIMESTAMP] ⚠ $doc is $DAYS_OLD days old (review needed)" >> $VALIDATION_LOG
fi
else
echo "[$TIMESTAMP] ✗ $doc missing" >> $VALIDATION_LOG
fi
done
# Validate command examples in documentation
echo "[$TIMESTAMP] Validating command examples..." >> $VALIDATION_LOG
# Test API endpoint mentioned in docs (-f makes curl exit non-zero on HTTP errors such as 401 or 500)
if curl -sf -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status > /dev/null; then
echo "[$TIMESTAMP] ✓ API endpoint accessible" >> $VALIDATION_LOG
else
echo "[$TIMESTAMP] ✗ API endpoint not accessible" >> $VALIDATION_LOG
fi
# Test service commands
if systemctl is-active --quiet vpn-controller; then
echo "[$TIMESTAMP] ✓ vpn-controller service running" >> $VALIDATION_LOG
else
echo "[$TIMESTAMP] ✗ vpn-controller service not running" >> $VALIDATION_LOG
fi
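The validation script above confirms that the documents themselves exist, but not that the paths they mention still exist. A minimal sketch of that extra check follows. The helper name `find_missing_paths` is hypothetical, and the path pattern is an assumption: it only covers paths made of letters, digits, and `_ . / -`.

```shell
#!/bin/bash
# Sketch: list paths referenced in a document that do not exist on disk.
# find_missing_paths is a hypothetical helper, not part of the existing scripts.
# Usage: find_missing_paths <doc-file> <path-prefix>
find_missing_paths() {
    local doc=$1 prefix=$2 path
    # Extract every token that starts with the prefix, then trim
    # trailing sentence punctuation and remove duplicates.
    grep -oE "${prefix}[A-Za-z0-9_./-]*" "$doc" | sed 's/[.,:;]*$//' | sort -u |
    while IFS= read -r path; do
        # Print only the paths that are missing.
        [ -e "$path" ] || echo "$path"
    done
}
```

For example, `find_missing_paths /opt/vpn-exit-controller/MAINTENANCE.md /opt/vpn-exit-controller` would print each referenced path under the install directory that is no longer present.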
Change Log Maintenance¶
Automated Change Log Updates:¶
#!/bin/bash
# Update CHANGELOG.md with recent changes
CHANGELOG="/opt/vpn-exit-controller/CHANGELOG.md"
TEMP_LOG="/tmp/changelog-temp.md"
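# Repository directory (assumed to be the directory that holds the changelog)
REPO_DIR=$(dirname "$CHANGELOG")
# Get recent commits (if using git); check the repository itself, not the current directory
if [ -d "$REPO_DIR/.git" ]; then
echo "## $(date +%Y-%m-%d) - Maintenance Update" > "$TEMP_LOG"
echo "" >> "$TEMP_LOG"
# Add git log entries
git -C "$REPO_DIR" log --since="1 month ago" --pretty=format:"- %s (%an)" >> "$TEMP_LOG"
echo "" >> "$TEMP_LOG"
echo "" >> "$TEMP_LOG"
# Prepend to existing changelog (with no existing file, the new entries become the changelog)
if [ -f "$CHANGELOG" ]; then
cat "$CHANGELOG" >> "$TEMP_LOG"
fi
mv "$TEMP_LOG" "$CHANGELOG"
fi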
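The prepend step can also be written as a reusable function that builds the new file next to the target, so the final `mv` is a rename on the same filesystem. This is a sketch under stated assumptions: `prepend_entry` is a hypothetical name, and the caller supplies the entry text.

```shell
#!/bin/bash
# Sketch: prepend an entry to a changelog via a temporary file.
# prepend_entry is a hypothetical helper name.
# Usage: prepend_entry <changelog-file> <entry-text>
prepend_entry() {
    local changelog=$1 entry=$2 tmp
    # Create the temp file beside the target so mv stays on one filesystem.
    # Note: mktemp creates it with mode 0600; chmod afterwards if needed.
    tmp=$(mktemp "${changelog}.XXXXXX") || return 1
    {
        printf '%s\n\n' "$entry"
        # Append the old contents only if a changelog already exists.
        if [ -f "$changelog" ]; then
            cat "$changelog"
        fi
    } > "$tmp"
    mv "$tmp" "$changelog"
}
```

Because the old file is replaced only by the final `mv`, an interruption while writing leaves the existing changelog untouched.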
Configuration Documentation¶
Auto-generate Configuration Documentation:¶
#!/bin/bash
# Generate current configuration documentation
CONFIG_DOC="/opt/vpn-exit-controller/CONFIG_CURRENT.md"
echo "# Current System Configuration" > $CONFIG_DOC
echo "Generated on: $(date)" >> $CONFIG_DOC
echo "" >> $CONFIG_DOC
# System information
echo "## System Information" >> $CONFIG_DOC
echo "- **OS**: $(lsb_release -d | cut -f2)" >> $CONFIG_DOC
echo "- **Kernel**: $(uname -r)" >> $CONFIG_DOC
echo "- **Docker Version**: $(docker --version)" >> $CONFIG_DOC
echo "- **Python Version**: $(python3 --version)" >> $CONFIG_DOC
echo "" >> $CONFIG_DOC
# Service status
echo "## Service Status" >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
systemctl status vpn-controller --no-pager >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
echo "" >> $CONFIG_DOC
# Docker containers
echo "## Running Containers" >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
echo "" >> $CONFIG_DOC
# Network configuration
echo "## Network Configuration" >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
ip addr show | grep -E "(inet|link)" >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
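The generator above repeats the same heading, fence, command, fence pattern for each section. A small helper can remove the repetition and also tolerate tools that are not installed (for example `lsb_release` on a minimal container image). This is only a sketch; `fenced_section` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: emit one Markdown section with a command's output inside a code fence.
# fenced_section is a hypothetical helper name.
# Usage: fenced_section <title> <command> [args...]
fenced_section() {
    local title=$1
    shift
    printf '## %s\n' "$title"
    printf '%s\n' "\`\`\`"
    if command -v "$1" > /dev/null 2>&1; then
        # Capture stderr too; a failing command must not abort the report.
        "$@" 2>&1 || true
    else
        printf '(%s not installed)\n' "$1"
    fi
    printf '%s\n\n' "\`\`\`"
}
```

A call such as `fenced_section "Running Containers" docker ps >> "$CONFIG_DOC"` would then replace the four lines used for each section.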
10. Emergency Procedures¶
System Outage Response¶
Emergency Response Playbook:¶
#!/bin/bash
# Emergency response script for system outages
EMERGENCY_LOG="/var/log/emergency-response.log"
TIMESTAMP=$(date)
echo "[$TIMESTAMP] EMERGENCY: System outage detected" >> $EMERGENCY_LOG
# 1. Immediate Assessment
echo "[$TIMESTAMP] Step 1: Immediate assessment" >> $EMERGENCY_LOG
# Check if container is running (pct exists only on the Proxmox host, so skip this step inside the container)
if command -v pct > /dev/null 2>&1; then
if pct status 201 | grep -q "status: running"; then
echo "[$TIMESTAMP] ✓ LXC container is running" >> $EMERGENCY_LOG
else
echo "[$TIMESTAMP] ✗ LXC container is stopped - starting now" >> $EMERGENCY_LOG
pct start 201
sleep 30
fi
fi
# Check core services
SERVICES=("docker" "vpn-controller")
for service in "${SERVICES[@]}"; do
if systemctl is-active --quiet $service; then
echo "[$TIMESTAMP] ✓ $service is running" >> $EMERGENCY_LOG
else
echo "[$TIMESTAMP] ✗ $service is stopped - restarting" >> $EMERGENCY_LOG
systemctl restart $service
fi
done
# 2. Quick Recovery Attempt
echo "[$TIMESTAMP] Step 2: Quick recovery attempt" >> $EMERGENCY_LOG
# Restart VPN controller
systemctl restart vpn-controller
sleep 60
# Test API (-f makes curl exit non-zero on HTTP errors, so a 401 or 500 is not counted as success)
if curl -sf -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status > /dev/null; then
echo "[$TIMESTAMP] ✓ API is responding - emergency recovery successful" >> $EMERGENCY_LOG
exit 0
fi
# 3. Docker Recovery
echo "[$TIMESTAMP] Step 3: Docker container recovery" >> $EMERGENCY_LOG
cd /opt/vpn-exit-controller
docker-compose down
docker-compose up -d
sleep 120
# Test again (-f makes curl exit non-zero on HTTP errors)
if curl -sf -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status > /dev/null; then
echo "[$TIMESTAMP] ✓ API responding after Docker restart" >> $EMERGENCY_LOG
exit 0
fi
# 4. Full System Recovery
echo "[$TIMESTAMP] Step 4: Full system recovery required" >> $EMERGENCY_LOG
echo "[$TIMESTAMP] Escalating to manual intervention" >> $EMERGENCY_LOG
# Generate diagnostic report
/opt/vpn-exit-controller/scripts/generate-diagnostic-report.sh >> $EMERGENCY_LOG
echo "[$TIMESTAMP] Emergency procedures completed - manual intervention required" >> $EMERGENCY_LOG
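The playbook above waits a fixed 60 or 120 seconds before each API test. Polling until the check passes can return sooner when recovery is quick, which matters for short recovery targets. The sketch below shows one way to do that. `wait_until`, `api_ok`, and the `API_AUTH` variable (holding `user:password`) are hypothetical names, not part of the existing scripts.

```shell
#!/bin/bash
# Sketch: retry a command until it succeeds or a timeout is exceeded.
# Usage: wait_until <timeout-seconds> <interval-seconds> <command> [args...]
wait_until() {
    local timeout=$1 interval=$2 elapsed=0
    shift 2
    while ! "$@"; do
        elapsed=$((elapsed + interval))
        # Give up once the next wait would exceed the timeout.
        if [ "$elapsed" -gt "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}

# Hypothetical health probe; API_AUTH is an assumed environment variable.
api_ok() {
    curl -sf -u "$API_AUTH" -o /dev/null http://localhost:8080/api/status
}
```

For example, `wait_until 120 5 api_ok` checks every 5 seconds for up to 2 minutes and returns as soon as the API answers with a success status.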
Emergency Contact Procedures¶
Contact List and Escalation:¶
## Emergency Contact Procedures
### Severity Levels
#### P1 - Critical (Response Time: 15 minutes)
- Complete system outage
- Security breach
- Data loss
#### P2 - High (Response Time: 2 hours)
- Partial system outage
- Performance degradation > 50%
- Single node failures affecting service
#### P3 - Medium (Response Time: 24 hours)
- Minor performance issues
- Single container failures
- Non-critical errors
#### P4 - Low (Response Time: 72 hours)
- Cosmetic issues
- Documentation updates
- Feature requests
### Contact Information
**Primary On-Call Engineer**
- Name: [Your Name]
- Phone: [Phone Number]
- Email: [Email Address]
- Signal/WhatsApp: [Number]
**Secondary Engineer**
- Name: [Backup Name]
- Phone: [Phone Number]
- Email: [Email Address]
**Infrastructure Team**
- Email: [Infrastructure Team Email]
- Slack: #infrastructure-alerts
### Escalation Matrix
1. **0-15 minutes**: Primary engineer response
2. **15-30 minutes**: Escalate to secondary engineer
3. **30-60 minutes**: Escalate to infrastructure team lead
4. **1+ hours**: Escalate to engineering manager
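The escalation matrix maps elapsed time to a responder, which is easy to encode for use in alerting scripts. The function below is a sketch: `escalation_tier` is a hypothetical name, and it assumes each boundary belongs to the later tier (exactly 15 minutes escalates to the secondary engineer), which the matrix leaves open.

```shell
#!/bin/bash
# Sketch: map minutes since an incident started to the escalation tier.
# escalation_tier is a hypothetical helper; boundary handling is an assumption.
# Usage: escalation_tier <elapsed-minutes>
escalation_tier() {
    local minutes=$1
    if [ "$minutes" -lt 15 ]; then
        echo "primary engineer"
    elif [ "$minutes" -lt 30 ]; then
        echo "secondary engineer"
    elif [ "$minutes" -lt 60 ]; then
        echo "infrastructure team lead"
    else
        echo "engineering manager"
    fi
}
```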
Critical Issue Escalation¶
Automated Escalation Script:¶
#!/bin/bash
# Automated escalation for critical issues
ISSUE_TYPE="$1"
SEVERITY="$2"
DETAILS="$3"
ESCALATION_LOG="/var/log/escalation.log"
TIMESTAMP=$(date)
echo "[$TIMESTAMP] ESCALATION: $ISSUE_TYPE (Severity: $SEVERITY)" >> $ESCALATION_LOG
echo "[$TIMESTAMP] Details: $DETAILS" >> $ESCALATION_LOG
# Generate system snapshot
SNAPSHOT_FILE="/tmp/system-snapshot-$(date +%Y%m%d-%H%M%S).txt"
cat > $SNAPSHOT_FILE << EOF
EMERGENCY SYSTEM SNAPSHOT
========================
Time: $(date)
Issue: $ISSUE_TYPE
Severity: $SEVERITY
Details: $DETAILS
System Status:
$(systemctl status vpn-controller --no-pager)
Docker Status:
$(docker ps -a)
Resource Usage:
$(free -h)
$(df -h)
Recent Logs:
$(journalctl -u vpn-controller --since "1 hour ago" --no-pager | tail -50)
Network Status:
$(ss -tuln)
EOF
# Send alerts based on severity
case $SEVERITY in
"P1"|"CRITICAL")
# Immediate alerts
echo "[$TIMESTAMP] Sending P1 alerts" >> $ESCALATION_LOG
# curl -X POST webhook-url or send email
;;
"P2"|"HIGH")
echo "[$TIMESTAMP] Sending P2 alerts" >> $ESCALATION_LOG
# Send high priority alerts
;;
*)
echo "[$TIMESTAMP] Logging issue for review" >> $ESCALATION_LOG
;;
esac
echo "[$TIMESTAMP] Escalation process initiated" >> $ESCALATION_LOG
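The P1 branch above leaves the actual alert as a commented placeholder. One possible shape for it is a JSON POST to a webhook, sketched below. Everything named here is an assumption: `json_escape`, `build_alert_payload`, `send_alert`, the `ALERT_WEBHOOK_URL` variable, and the payload field names would all need to match whatever alerting service is actually in use.

```shell
#!/bin/bash
# Sketch: build and send a JSON alert. All names here are hypothetical.

# Escape backslashes, double quotes, and newlines for use inside a JSON string.
# Other control characters (tabs, for example) are not handled.
json_escape() {
    local s=$1
    s=${s//\\/\\\\}
    s=${s//\"/\\\"}
    s=${s//$'\n'/\\n}
    printf '%s' "$s"
}

# Usage: build_alert_payload <issue> <severity> <details>
build_alert_payload() {
    printf '{"issue":"%s","severity":"%s","details":"%s"}' \
        "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

# Usage: send_alert <issue> <severity> <details>
send_alert() {
    if [ -z "${ALERT_WEBHOOK_URL:-}" ]; then
        echo "ALERT_WEBHOOK_URL is not set" >&2
        return 1
    fi
    curl -sf -X POST -H 'Content-Type: application/json' \
        --data "$(build_alert_payload "$@")" "$ALERT_WEBHOOK_URL"
}
```

Inside the `"P1"|"CRITICAL"` branch this could be called as `send_alert "$ISSUE_TYPE" "$SEVERITY" "$DETAILS"`.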
Recovery Time Objectives¶
RTO/RPO Definitions:¶
## Recovery Time and Point Objectives
### Recovery Time Objectives (RTO)
| Component | Target RTO | Maximum Acceptable |
|-----------|------------|-------------------|
| VPN API Service | 5 minutes | 15 minutes |
| Individual VPN Nodes | 2 minutes | 5 minutes |
| Database (Redis) | 3 minutes | 10 minutes |
| Full System | 15 minutes | 30 minutes |
| Container Infrastructure | 10 minutes | 20 minutes |
### Recovery Point Objectives (RPO)
| Data Type | Target RPO | Backup Frequency |
|-----------|------------|------------------|
| Configuration Files | 0 minutes | Real-time sync |
| System Metrics | 5 minutes | Continuous |
| Connection Logs | 15 minutes | Every 15 minutes |
| Performance Data | 1 hour | Hourly snapshots |
### Service Level Objectives (SLO)
- **Availability**: 99.9% (at most 8.76 hours of downtime per year)
- **API Response Time**: < 500ms (95th percentile)
- **VPN Connection Success Rate**: > 99%
- **Data Recovery Success Rate**: 100%
### Disaster Recovery Scenarios
#### Scenario 1: Complete LXC Container Failure
- **RTO**: 30 minutes
- **Procedure**: Restore from latest backup to new container
- **Validation**: Full service test suite
#### Scenario 2: Docker Infrastructure Failure
- **RTO**: 15 minutes
- **Procedure**: Restart Docker daemon, rebuild containers
- **Validation**: Container health checks
#### Scenario 3: Database Corruption
- **RTO**: 10 minutes
- **Procedure**: Restore Redis from latest backup
- **Validation**: Data integrity verification
#### Scenario 4: Configuration Corruption
- **RTO**: 5 minutes
- **Procedure**: Restore from configuration backup
- **Validation**: Service functionality test
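The downtime figure attached to the availability target follows from simple arithmetic: (100 - 99.9) / 100 × 365 × 24 = 8.76 hours per year. The sketch below reproduces that calculation and adds a check of a measured recovery time against an RTO. Both function names are hypothetical.

```shell
#!/bin/bash
# Sketch: SLO arithmetic helpers (hypothetical names).

# Allowed downtime in hours per 365-day year for an availability percentage.
# Usage: downtime_hours_per_year <availability-percent>
downtime_hours_per_year() {
    LC_ALL=C awk -v a="$1" 'BEGIN { printf "%.2f\n", (100 - a) / 100 * 365 * 24 }'
}

# Succeeds if a recovery finished within its target.
# Usage: rto_met <elapsed-seconds> <target-minutes>
rto_met() {
    [ "$1" -le $(( $2 * 60 )) ]
}
```

For example, `downtime_hours_per_year 99.9` prints `8.76`, and `rto_met 240 5` succeeds because a 4-minute recovery is within the 5-minute target for the VPN API service.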
Maintenance Schedule Summary¶
Daily (Automated)¶
- ✅ Health checks (6:00 AM)
- ✅ Resource monitoring (Every 5 minutes)
- ✅ Log rotation checks
- ✅ Basic performance metrics
Weekly¶
- ✅ Performance review (Sunday 2:00 AM)
- ✅ Security updates (Sunday 3:00 AM)
- ✅ Configuration backup verification
- ✅ VPN node performance analysis
Monthly¶
- ✅ Full system updates (First Sunday 3:00 AM)
- ✅ Access review
- ✅ Documentation validation
- ✅ Capacity planning review
- ✅ Recovery testing
Quarterly¶
- ✅ Comprehensive security audit
- ✅ Credential rotation
- ✅ Disaster recovery testing
- ✅ Performance optimization review
- ✅ Documentation major updates
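Most of this schedule maps directly onto cron, but "first Sunday" does not: when both the day-of-month and day-of-week fields are restricted, cron runs the job if either one matches. A common workaround is to schedule the job for days 1 to 7 and test the weekday in the command. The sketch below assumes GNU `date`; `is_first_sunday` and the monthly script path in the comments are hypothetical.

```shell
#!/bin/bash
# Sketch: decide whether a date is the first Sunday of its month (GNU date).
# is_first_sunday is a hypothetical helper name.
# Usage: is_first_sunday [date-string]   (defaults to today)
is_first_sunday() {
    local day=${1:-today} dow dom
    dow=$(date -d "$day" +%u)    # ISO weekday: 1 = Monday ... 7 = Sunday
    dom=$(date -d "$day" +%-d)   # day of month without zero padding
    [ "$dow" -eq 7 ] && [ "$dom" -le 7 ]
}

# Example crontab lines (percent signs must be escaped inside a crontab):
#   0 6 * * *   /opt/vpn-exit-controller/scripts/daily-check.sh
#   0 3 1-7 * * [ "$(date +\%u)" -eq 7 ] && /path/to/monthly-update.sh
```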
Quick Reference Commands¶
Emergency Commands¶
# Emergency service restart
systemctl restart vpn-controller
# Check service status
systemctl status vpn-controller
# View real-time logs
journalctl -u vpn-controller -f
# Test API health
curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status
# Docker container status
docker ps -a
# Resource usage
htop
df -h
free -h
Maintenance Commands¶
# Activate Python environment
source /opt/vpn-exit-controller/venv/bin/activate
# Update system packages
apt update && apt upgrade -y
# Restart all containers
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml restart
# Create configuration backup
tar -czf backup-$(date +%Y%m%d).tar.gz /opt/vpn-exit-controller/configs/
# Check certificate expiration
openssl x509 -in cert.pem -noout -dates
Document Version: 1.0
Last Updated: [Date of last update]
Next Review: [One month after last update]
Review and update this maintenance guide monthly to keep it accurate and complete. Test every procedure in a development environment before applying it to production.