Skip to content

VPN Exit Controller - Maintenance Guide

This document provides comprehensive maintenance procedures for the VPN Exit Controller system running on Proxmox LXC container (ID: 201).

Table of Contents

  1. Routine Maintenance Tasks
  2. System Monitoring
  3. Backup and Recovery
  4. Updates and Upgrades
  5. Certificate Management
  6. VPN Service Management
  7. Capacity Management
  8. Security Maintenance
  9. Documentation Updates
  10. Emergency Procedures

1. Routine Maintenance Tasks

Daily Health Checks (Automated)

Create a daily health check script at /opt/vpn-exit-controller/scripts/daily-check.sh:

#!/bin/bash
# Daily health check script

LOG_FILE="/var/log/vpn-controller-health.log"
DATE=$(date "+%Y-%m-%d %H:%M:%S")

echo "[$DATE] Starting daily health check" >> $LOG_FILE

# Check system resources
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | awk -F'%' '{print $1}')
MEM_USAGE=$(free | grep Mem | awk '{printf("%.2f"), $3/$2 * 100.0}')
DISK_USAGE=$(df -h /opt | awk 'NR==2 {print $5}' | sed 's/%//')

echo "[$DATE] CPU: ${CPU_USAGE}%, Memory: ${MEM_USAGE}%, Disk: ${DISK_USAGE}%" >> $LOG_FILE

# Check service status
if systemctl is-active --quiet vpn-controller; then
    echo "[$DATE] VPN Controller service: RUNNING" >> $LOG_FILE
else
    echo "[$DATE] VPN Controller service: FAILED" >> $LOG_FILE
    systemctl restart vpn-controller
fi

# Check Docker containers
RUNNING_CONTAINERS=$(docker ps --format "table {{.Names}}\t{{.Status}}" | grep -c "Up")
echo "[$DATE] Running containers: $RUNNING_CONTAINERS" >> $LOG_FILE

# Check API health
API_RESPONSE=$(curl -s -u admin:Bl4ckMagic!2345erver -o /dev/null -w "%{http_code}" http://localhost:8080/api/status)
if [ "$API_RESPONSE" = "200" ]; then
    echo "[$DATE] API health: OK" >> $LOG_FILE
else
    echo "[$DATE] API health: FAILED (HTTP $API_RESPONSE)" >> $LOG_FILE
fi

# Check Redis
if docker exec vpn-redis redis-cli ping | grep -q PONG; then
    echo "[$DATE] Redis: OK" >> $LOG_FILE
else
    echo "[$DATE] Redis: FAILED" >> $LOG_FILE
fi

# Alert on high resource usage
if [ "$CPU_USAGE" -gt 80 ] || [ "$MEM_USAGE" -gt 85 ] || [ "$DISK_USAGE" -gt 90 ]; then
    echo "[$DATE] WARNING: High resource usage detected" >> $LOG_FILE
fi

echo "[$DATE] Daily health check completed" >> $LOG_FILE

Schedule: Add to crontab:

0 6 * * * /opt/vpn-exit-controller/scripts/daily-check.sh

Weekly Performance Review

Run every Sunday at 2 AM:

#!/bin/bash
# Weekly performance review script

REPORT_FILE="/var/log/vpn-controller-weekly-$(date +%Y%m%d).log"

echo "Weekly Performance Report - $(date)" > $REPORT_FILE
echo "========================================" >> $REPORT_FILE

# System uptime
uptime >> $REPORT_FILE

# Docker container statistics
echo -e "\nContainer Resource Usage:" >> $REPORT_FILE
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}" >> $REPORT_FILE

# VPN node performance
echo -e "\nVPN Node Statistics:" >> $REPORT_FILE
curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/summary >> $REPORT_FILE

# Log analysis
echo -e "\nError Summary (last 7 days):" >> $REPORT_FILE
journalctl -u vpn-controller --since "7 days ago" | grep -i error | wc -l >> $REPORT_FILE

# Redis memory usage
echo -e "\nRedis Memory Usage:" >> $REPORT_FILE
docker exec vpn-redis redis-cli info memory | grep used_memory_human >> $REPORT_FILE

Monthly System Updates

First Sunday of each month at 3 AM:

#!/bin/bash
# Monthly system update script

UPDATE_LOG="/var/log/monthly-updates-$(date +%Y%m).log"

echo "Monthly System Update - $(date)" > $UPDATE_LOG

# Update package lists
apt update >> $UPDATE_LOG 2>&1

# List available updates
echo "Available updates:" >> $UPDATE_LOG
apt list --upgradable >> $UPDATE_LOG 2>&1

# Install security updates only (safer for production)
DEBIAN_FRONTEND=noninteractive apt upgrade -y -o Dpkg::Options::="--force-confdef" >> $UPDATE_LOG 2>&1

# Clean up
apt autoremove -y >> $UPDATE_LOG 2>&1
apt autoclean >> $UPDATE_LOG 2>&1

# Check if reboot is required
if [ -f /var/run/reboot-required ]; then
    echo "REBOOT REQUIRED" >> $UPDATE_LOG
    # Schedule reboot for low-traffic time (4 AM)
    shutdown -r 04:00 "Scheduled reboot for system updates"
fi

Quarterly Capacity Planning

First day of each quarter:

#!/bin/bash
# Quarterly capacity planning report

QUARTER=$(date +%Y-Q$(($(date +%m)/3+1)))
REPORT_FILE="/var/log/capacity-report-${QUARTER}.log"

echo "Quarterly Capacity Planning Report - $QUARTER" > $REPORT_FILE
echo "=============================================" >> $REPORT_FILE

# Historical resource usage trends
echo "Resource Usage Trends (last 90 days):" >> $REPORT_FILE

# Disk usage growth
df -h /opt >> $REPORT_FILE

# VPN node usage statistics
echo -e "\nVPN Node Utilization:" >> $REPORT_FILE
curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/nodes | jq '.[] | {country: .country, usage_percent: .usage_percent}' >> $REPORT_FILE

# Connection statistics
echo -e "\nConnection Statistics:" >> $REPORT_FILE
curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/connections >> $REPORT_FILE

# Recommendations section
echo -e "\nCapacity Recommendations:" >> $REPORT_FILE
echo "Review this report to determine if additional nodes or resources are needed." >> $REPORT_FILE

2. System Monitoring

Key Metrics to Watch

System Level Metrics

  • CPU Usage: Should stay below 80% average
  • Memory Usage: Should stay below 85%
  • Disk Usage: Should stay below 90%
  • Network I/O: Monitor for unusual spikes
  • Load Average: Should be below number of CPU cores

Application Level Metrics

  • API Response Time: < 500ms for health checks
  • Active VPN Connections: Track connection counts
  • Container Health: All containers should be "healthy"
  • Redis Memory Usage: Monitor for memory leaks
  • Failed Connection Attempts: Track error rates

Alert Thresholds and Escalation

Critical Alerts (Immediate Response)

  • API unavailable (HTTP 5xx errors)
  • System CPU > 95% for 5+ minutes
  • System memory > 95%
  • Disk usage > 95%
  • All VPN nodes offline
  • Redis unavailable

Warning Alerts (Response within 4 hours)

  • System CPU > 80% for 15+ minutes
  • System memory > 85%
  • Disk usage > 90%
  • 50% of VPN nodes offline

  • High error rate (> 5%)

Info Alerts (Response within 24 hours)

  • Single VPN node offline
  • Slow API response times (> 1s)
  • Log rotation needed

Log Monitoring and Rotation

Configure log rotation for VPN Controller:

Create /etc/logrotate.d/vpn-controller:

/var/log/vpn-controller*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 644 root root
    postrotate
        systemctl reload vpn-controller
    endscript
}

Monitor key log patterns:

# Create log monitoring script
#!/bin/bash
# Monitor critical log patterns

LOG_FILE="/var/log/vpn-controller-alerts.log"
JOURNAL_LOG=$(journalctl -u vpn-controller --since "1 hour ago" --no-pager)

# Check for critical errors
CRITICAL_ERRORS=$(echo "$JOURNAL_LOG" | grep -i "critical\|fatal\|emergency" | wc -l)
if [ "$CRITICAL_ERRORS" -gt 0 ]; then
    echo "$(date): CRITICAL - $CRITICAL_ERRORS critical errors found" >> $LOG_FILE
fi

# Check for failed VPN connections
FAILED_CONNECTIONS=$(echo "$JOURNAL_LOG" | grep -i "connection failed\|vpn failed" | wc -l)
if [ "$FAILED_CONNECTIONS" -gt 10 ]; then
    echo "$(date): WARNING - $FAILED_CONNECTIONS failed VPN connections in last hour" >> $LOG_FILE
fi

# Check for Docker issues
DOCKER_ERRORS=$(echo "$JOURNAL_LOG" | grep -i "docker.*error" | wc -l)
if [ "$DOCKER_ERRORS" -gt 0 ]; then
    echo "$(date): WARNING - $DOCKER_ERRORS Docker errors found" >> $LOG_FILE
fi

Performance Baseline Tracking

Create baseline measurements script:

#!/bin/bash
# Performance baseline tracking

BASELINE_FILE="/var/log/performance-baseline.json"
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# Collect performance metrics
API_RESPONSE_TIME=$(curl -o /dev/null -s -w "%{time_total}" -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status)
MEMORY_USAGE=$(free | grep Mem | awk '{printf("%.1f"), $3/$2 * 100.0}')
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | awk -F'%' '{print $1}')
DISK_USAGE=$(df /opt | awk 'NR==2 {print $5}' | sed 's/%//')

# Create JSON entry
cat >> $BASELINE_FILE << EOF
{
  "timestamp": "$TIMESTAMP",
  "api_response_time": $API_RESPONSE_TIME,
  "memory_usage_percent": $MEMORY_USAGE,
  "cpu_usage_percent": $CPU_USAGE,
  "disk_usage_percent": $DISK_USAGE
}
EOF

3. Backup and Recovery

Configuration Backup Procedures

Daily Configuration Backup:

#!/bin/bash
# Daily configuration backup script

BACKUP_DIR="/opt/backups/vpn-controller"
DATE=$(date +%Y%m%d)
BACKUP_FILE="vpn-controller-config-${DATE}.tar.gz"

mkdir -p $BACKUP_DIR

# Backup configurations
tar -czf "${BACKUP_DIR}/${BACKUP_FILE}" \
    /opt/vpn-exit-controller/configs/ \
    /opt/vpn-exit-controller/.env \
    /opt/vpn-exit-controller/docker-compose.yml \
    /opt/vpn-exit-controller/traefik/ \
    /etc/systemd/system/vpn-controller.service

# Keep only last 30 days of backups
find $BACKUP_DIR -name "vpn-controller-config-*.tar.gz" -mtime +30 -delete

echo "$(date): Configuration backup completed: $BACKUP_FILE"

Weekly Full System Backup:

#!/bin/bash
# Weekly full system backup

BACKUP_DIR="/opt/backups/vpn-controller"
DATE=$(date +%Y%m%d)
FULL_BACKUP="vpn-controller-full-${DATE}.tar.gz"

# Stop services for consistent backup
systemctl stop vpn-controller
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml down

# Create full backup
tar -czf "${BACKUP_DIR}/${FULL_BACKUP}" \
    /opt/vpn-exit-controller/ \
    /etc/systemd/system/vpn-controller.service \
    --exclude=/opt/vpn-exit-controller/venv/ \
    --exclude=/opt/vpn-exit-controller/logs/ \
    --exclude=/opt/vpn-exit-controller/data/cache/

# Start services
systemctl start vpn-controller

# Keep only last 4 weekly backups
find $BACKUP_DIR -name "vpn-controller-full-*.tar.gz" -mtime +28 -delete

echo "$(date): Full system backup completed: $FULL_BACKUP"

Redis Database Backup

#!/bin/bash
# Redis backup script

BACKUP_DIR="/opt/backups/redis"
DATE=$(date +%Y%m%d-%H%M)
REDIS_BACKUP="redis-backup-${DATE}.rdb"

mkdir -p $BACKUP_DIR

# Create Redis backup
docker exec vpn-redis redis-cli BGSAVE
sleep 5

# Copy the backup file
docker cp vpn-redis:/data/dump.rdb "${BACKUP_DIR}/${REDIS_BACKUP}"

# Keep only last 7 days of Redis backups
find $BACKUP_DIR -name "redis-backup-*.rdb" -mtime +7 -delete

echo "$(date): Redis backup completed: $REDIS_BACKUP"

Recovery Testing Procedures

Monthly Recovery Test:

#!/bin/bash
# Monthly recovery test procedure

TEST_DIR="/tmp/recovery-test-$(date +%Y%m%d)"
LATEST_BACKUP=$(ls -t /opt/backups/vpn-controller/vpn-controller-config-*.tar.gz | head -1)

echo "Testing recovery from backup: $LATEST_BACKUP"

# Create test directory
mkdir -p $TEST_DIR

# Extract backup
tar -xzf $LATEST_BACKUP -C $TEST_DIR

# Verify key files exist
REQUIRED_FILES=(
    "opt/vpn-exit-controller/configs/auth.txt"
    "opt/vpn-exit-controller/.env"
    "opt/vpn-exit-controller/docker-compose.yml"
)

RECOVERY_SUCCESS=true
for file in "${REQUIRED_FILES[@]}"; do
    if [ ! -f "$TEST_DIR/$file" ]; then
        echo "ERROR: Missing file in backup: $file"
        RECOVERY_SUCCESS=false
    fi
done

# Test configuration validity
if [ "$RECOVERY_SUCCESS" = true ]; then
    echo "Recovery test PASSED"
else
    echo "Recovery test FAILED"
fi

# Cleanup
rm -rf $TEST_DIR

Disaster Recovery Planning

Full System Recovery Procedure:

  1. Prepare New LXC Container:

    # On Proxmox host
    pct create 201 /var/lib/vz/template/cache/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
      --memory 4096 --cores 4 --storage local-lvm --size 50G \
      --net0 name=eth0,bridge=vmbr1,ip=10.10.10.20/24,gw=10.10.10.1 \
      --nameserver 8.8.8.8 --hostname vpn-exit-controller \
      --features nesting=1,keyctl=1 \
      --unprivileged 0
    

  2. Start Container and Install Dependencies:

    pct start 201
    pct enter 201
    
    apt update && apt upgrade -y
    apt install -y docker.io docker-compose python3 python3-pip python3-venv curl
    systemctl enable docker
    systemctl start docker
    

  3. Restore from Backup:

    # Copy latest backup to container
    LATEST_BACKUP=$(ls -t /opt/backups/vpn-controller/vpn-controller-full-*.tar.gz | head -1)
    tar -xzf $LATEST_BACKUP -C /
    
    # Restore permissions
    chown -R root:root /opt/vpn-exit-controller/
    chmod +x /opt/vpn-exit-controller/start.sh
    
    # Restore systemd service
    systemctl daemon-reload
    systemctl enable vpn-controller
    systemctl start vpn-controller
    

  4. Verify Recovery:

    # Check service status
    systemctl status vpn-controller
    
    # Test API
    curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status
    
    # Check containers
    docker ps
    


4. Updates and Upgrades

System Package Updates

Security Updates (Weekly):

#!/bin/bash
# Weekly security updates

LOG_FILE="/var/log/security-updates.log"

echo "$(date): Starting security updates" >> $LOG_FILE

# Update package lists
apt update >> $LOG_FILE 2>&1

# Install security updates only
DEBIAN_FRONTEND=noninteractive apt-get -y upgrade \
    -o Dpkg::Options::="--force-confdef" \
    -o Dpkg::Options::="--force-confold" \
    $(apt list --upgradable 2>/dev/null | grep -i security | cut -d/ -f1) >> $LOG_FILE 2>&1

# Check for reboot requirement
if [ -f /var/run/reboot-required ]; then
    echo "$(date): Reboot required after security updates" >> $LOG_FILE
fi

echo "$(date): Security updates completed" >> $LOG_FILE

Full System Updates (Monthly):

#!/bin/bash
# Monthly full system updates (scheduled maintenance window)

MAINTENANCE_LOG="/var/log/maintenance-updates.log"

echo "$(date): Starting maintenance window - full system update" >> $MAINTENANCE_LOG

# Stop VPN service
systemctl stop vpn-controller >> $MAINTENANCE_LOG 2>&1

# Update all packages
apt update && apt full-upgrade -y >> $MAINTENANCE_LOG 2>&1

# Clean up
apt autoremove -y >> $MAINTENANCE_LOG 2>&1
apt autoclean >> $MAINTENANCE_LOG 2>&1

# Update Docker images
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml pull >> $MAINTENANCE_LOG 2>&1

# Restart service
systemctl start vpn-controller >> $MAINTENANCE_LOG 2>&1

# Verify service is running
sleep 30
if systemctl is-active --quiet vpn-controller; then
    echo "$(date): Service restarted successfully" >> $MAINTENANCE_LOG
else
    echo "$(date): ERROR: Service failed to restart" >> $MAINTENANCE_LOG
fi

echo "$(date): Maintenance window completed" >> $MAINTENANCE_LOG

Docker Image Updates

Check for Updates:

#!/bin/bash
# Check for Docker image updates

IMAGES=(
    "redis:7-alpine"
    "traefik:v2.10"
)

for image in "${IMAGES[@]}"; do
    echo "Checking updates for $image"

    # Pull latest
    docker pull $image

    # Compare image IDs
    LOCAL_ID=$(docker images --no-trunc --quiet $image | head -1)
    REMOTE_ID=$(docker inspect --format='{{.Id}}' $image)

    if [ "$LOCAL_ID" != "$REMOTE_ID" ]; then
        echo "Update available for $image"
    else
        echo "No update available for $image"
    fi
done

Update Custom VPN Node Image:

#!/bin/bash
# Update VPN node image

cd /opt/vpn-exit-controller/vpn-node

# Build new image
docker build -t vpn-exit-node:latest .

# Test new image
docker run --rm vpn-exit-node:latest --version

# Restart containers with new image
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml up -d --force-recreate

Application Code Updates

Update from Git Repository:

#!/bin/bash
# Update application code from repository

cd /opt/vpn-exit-controller

# Backup current version
tar -czf "/opt/backups/pre-update-$(date +%Y%m%d).tar.gz" . --exclude=venv --exclude=data

# Stop service
systemctl stop vpn-controller

# Pull updates (if using git)
# git pull origin main

# Update Python dependencies
source venv/bin/activate
pip install -r api/requirements.txt

# Restart service
systemctl start vpn-controller

# Verify update
sleep 30
curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status

Dependency Management

Python Dependencies Audit:

#!/bin/bash
# Audit Python dependencies for security vulnerabilities

cd /opt/vpn-exit-controller
source venv/bin/activate

# Check for outdated packages
pip list --outdated

# Security audit (install pip-audit if not available)
pip install pip-audit
pip-audit

# Generate requirements with exact versions
pip freeze > requirements-frozen.txt

5. Certificate Management

SSL Certificate Monitoring

Check Certificate Expiration:

#!/bin/bash
# Check SSL certificate expiration

CERT_FILE="/opt/vpn-exit-controller/traefik/letsencrypt/acme.json"
ALERT_DAYS=30

if [ -f "$CERT_FILE" ]; then
    # Extract certificate expiration dates
    python3 << EOF
import json
import base64
import datetime
from cryptography import x509

with open('$CERT_FILE', 'r') as f:
    acme_data = json.load(f)

for resolver in acme_data.values():
    for cert_data in resolver.get('Certificates', []):
        cert_pem = base64.b64decode(cert_data['certificate']).decode()
        cert = x509.load_pem_x509_certificate(cert_pem.encode())

        domain = cert_data['domain']['main']
        expiry = cert.not_valid_after
        days_left = (expiry - datetime.datetime.now()).days

        print(f"Domain: {domain}, Expires: {expiry}, Days left: {days_left}")

        if days_left < $ALERT_DAYS:
            print(f"WARNING: Certificate for {domain} expires in {days_left} days")
EOF
fi

Let's Encrypt Certificate Renewal

Automatic Renewal Check:

#!/bin/bash
# Check Let's Encrypt certificate renewal

# Traefik handles automatic renewal, but verify it's working
TRAEFIK_LOG="/opt/vpn-exit-controller/traefik/logs/traefik.log"

# Check for recent renewal attempts
if [ -f "$TRAEFIK_LOG" ]; then
    echo "Recent certificate renewal attempts:"
    grep -i "certificate\|acme" "$TRAEFIK_LOG" | tail -10
fi

# Verify certificate is valid
DOMAIN="your-domain.com"  # Replace with actual domain
echo | openssl s_client -servername $DOMAIN -connect $DOMAIN:443 2>/dev/null | openssl x509 -noout -dates

Certificate Troubleshooting

Common Issues and Solutions:

#!/bin/bash
# Certificate troubleshooting script

echo "Certificate Troubleshooting Report"
echo "=================================="

# Check Traefik configuration
echo "1. Checking Traefik configuration..."
docker-compose -f /opt/vpn-exit-controller/traefik/docker-compose.traefik.yml config

# Check ACME challenge accessibility
echo "2. Checking ACME challenge accessibility..."
DOMAIN="your-domain.com"  # Replace with actual domain
curl -I "http://$DOMAIN/.well-known/acme-challenge/test"

# Check DNS resolution
echo "3. Checking DNS resolution..."
nslookup $DOMAIN

# Check port accessibility
echo "4. Checking port 443 accessibility..."
nc -zv $DOMAIN 443

# Check Traefik dashboard
echo "5. Checking Traefik dashboard..."
curl -I http://localhost:8080/dashboard/

6. VPN Service Management

NordVPN Credential Rotation

Monthly Credential Check:

#!/bin/bash
# Check and rotate NordVPN credentials if needed

AUTH_FILE="/opt/vpn-exit-controller/configs/auth.txt"
CURRENT_DATE=$(date +%s)
FILE_AGE=$(stat -c %Y "$AUTH_FILE")
AGE_DAYS=$(( (CURRENT_DATE - FILE_AGE) / 86400 ))

echo "NordVPN credentials are $AGE_DAYS days old"

if [ $AGE_DAYS -gt 60 ]; then
    echo "WARNING: NordVPN credentials are older than 60 days"
    echo "Consider updating credentials in $AUTH_FILE"

    # Test current credentials
    echo "Testing current credentials..."
    # This would require implementing a credential test function
fi

Credential Update Procedure:

#!/bin/bash
# Update NordVPN credentials

AUTH_FILE="/opt/vpn-exit-controller/configs/auth.txt"
BACKUP_FILE="/opt/backups/auth-backup-$(date +%Y%m%d).txt"

echo "Updating NordVPN credentials..."

# Backup current credentials
cp "$AUTH_FILE" "$BACKUP_FILE"

# Prompt for new credentials (in production, use secure input method)
echo "Enter new NordVPN username:"
read -r NEW_USERNAME
echo "Enter new NordVPN password:"
read -rs NEW_PASSWORD

# Update auth file
echo "$NEW_USERNAME" > "$AUTH_FILE"
echo "$NEW_PASSWORD" >> "$AUTH_FILE"

# Restart VPN containers to use new credentials
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml restart

echo "Credentials updated. Testing connections..."

# Wait for containers to start
sleep 30

# Test API to verify VPN connections are working
curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status

Server List Updates

Update NordVPN Server Configurations:

#!/bin/bash
# Update NordVPN server configurations

SCRIPT_PATH="/opt/vpn-exit-controller/scripts/download-nordvpn-configs.sh"
CONFIG_DIR="/opt/vpn-exit-controller/configs/vpn"

echo "Updating NordVPN server configurations..."

# Backup current configurations
tar -czf "/opt/backups/vpn-configs-backup-$(date +%Y%m%d).tar.gz" "$CONFIG_DIR"

# Run the download script
if [ -f "$SCRIPT_PATH" ]; then
    bash "$SCRIPT_PATH"
    echo "Server configurations updated"

    # Restart service to load new configurations
    systemctl restart vpn-controller
else
    echo "ERROR: Download script not found at $SCRIPT_PATH"
fi

Performance Optimization

VPN Node Performance Tuning:

#!/bin/bash
# VPN node performance optimization

echo "VPN Performance Optimization Report"
echo "==================================="

# Check connection speeds for each country
COUNTRIES=("us" "uk" "de" "jp" "au")

for country in "${COUNTRIES[@]}"; do
    echo "Testing $country nodes..."

    # Use speed test API endpoint
    SPEED_RESULT=$(curl -s -u admin:Bl4ckMagic!2345erver \
        "http://localhost:8080/api/speed-test/$country" | jq '.download_speed')

    echo "$country: $SPEED_RESULT Mbps"
done

# Check for underperforming nodes
echo -e "\nChecking for underperforming nodes..."
curl -s -u admin:Bl4ckMagic!2345erver \
    "http://localhost:8080/api/metrics/nodes" | \
    jq '.[] | select(.performance_score < 0.7) | {country: .country, score: .performance_score}'

Service Status Monitoring

Comprehensive VPN Service Check:

#!/bin/bash
# Comprehensive VPN service status check

STATUS_LOG="/var/log/vpn-service-status.log"
TIMESTAMP=$(date)

echo "[$TIMESTAMP] VPN Service Status Check" >> $STATUS_LOG

# Check each VPN container
VPN_CONTAINERS=$(docker ps --filter "name=vpn-" --format "{{.Names}}")

for container in $VPN_CONTAINERS; do
    # Check container health
    HEALTH=$(docker inspect --format='{{.State.Health.Status}}' $container 2>/dev/null || echo "no health check")

    # Check VPN connection
    IP_CHECK=$(docker exec $container curl -s --max-time 10 ifconfig.me 2>/dev/null || echo "failed")

    echo "[$TIMESTAMP] $container: Health=$HEALTH, IP=$IP_CHECK" >> $STATUS_LOG
done

# Check overall API health
API_HEALTH=$(curl -s -u admin:Bl4ckMagic!2345erver -o /dev/null -w "%{http_code}" http://localhost:8080/api/status)
echo "[$TIMESTAMP] API Health: HTTP $API_HEALTH" >> $STATUS_LOG

# Check Redis connectivity
REDIS_HEALTH=$(docker exec vpn-redis redis-cli ping 2>/dev/null || echo "FAILED")
echo "[$TIMESTAMP] Redis Health: $REDIS_HEALTH" >> $STATUS_LOG

7. Capacity Management

Resource Usage Monitoring

Real-time Resource Monitor:

#!/bin/bash
# Real-time resource monitoring script

MONITOR_LOG="/var/log/resource-monitor.log"

while true; do
    TIMESTAMP=$(date "+%Y-%m-%d %H:%M:%S")

    # System resources
    CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | awk -F'%' '{print $1}')
    MEM_USAGE=$(free | grep Mem | awk '{printf("%.1f"), $3/$2 * 100.0}')
    DISK_USAGE=$(df /opt | awk 'NR==2 {print $5}' | sed 's/%//')

    # Docker resources
    DOCKER_CONTAINERS=$(docker ps -q | wc -l)

    # Network connections
    CONNECTIONS=$(ss -tuln | wc -l)

    echo "$TIMESTAMP,$CPU_USAGE,$MEM_USAGE,$DISK_USAGE,$DOCKER_CONTAINERS,$CONNECTIONS" >> $MONITOR_LOG

    sleep 300  # 5-minute intervals
done

Scaling Decisions

Auto-scaling Triggers:

#!/bin/bash
# Auto-scaling decision logic

SCALE_LOG="/var/log/scaling-decisions.log"
TIMESTAMP=$(date)

# Get current metrics
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | awk -F'%' '{print $1}')
ACTIVE_CONNECTIONS=$(curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/connections | jq '.active_connections')
RUNNING_NODES=$(docker ps --filter "name=vpn-" -q | wc -l)

echo "[$TIMESTAMP] Scaling Check: CPU=$CPU_USAGE%, Connections=$ACTIVE_CONNECTIONS, Nodes=$RUNNING_NODES" >> $SCALE_LOG

# Scaling up logic
if (( $(echo "$CPU_USAGE > 70" | bc -l) )) && [ "$ACTIVE_CONNECTIONS" -gt 100 ] && [ "$RUNNING_NODES" -lt 10 ]; then
    echo "[$TIMESTAMP] SCALE UP: High load detected" >> $SCALE_LOG
    # Trigger scale up via API
    curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes/scale-up
fi

# Scaling down logic
if (( $(echo "$CPU_USAGE < 30" | bc -l) )) && [ "$ACTIVE_CONNECTIONS" -lt 20 ] && [ "$RUNNING_NODES" -gt 3 ]; then
    echo "[$TIMESTAMP] SCALE DOWN: Low load detected" >> $SCALE_LOG
    # Trigger scale down via API
    curl -X POST -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/nodes/scale-down
fi

Node Capacity Planning

Capacity Planning Analysis:

#!/bin/bash
# Node capacity planning analysis

REPORT_FILE="/var/log/capacity-analysis-$(date +%Y%m%d).log"

echo "Node Capacity Planning Analysis - $(date)" > $REPORT_FILE
echo "=========================================" >> $REPORT_FILE

# Current node utilization
echo -e "\nCurrent Node Utilization:" >> $REPORT_FILE
curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/metrics/nodes | \
    jq -r '.[] | "\(.country): \(.active_connections) connections, \(.cpu_usage)% CPU, \(.memory_usage)% Memory"' >> $REPORT_FILE

# Peak usage analysis (last 7 days)
echo -e "\nPeak Usage Analysis (Last 7 Days):" >> $REPORT_FILE

# Historical connection data
PEAK_CONNECTIONS=$(grep "connections" /var/log/resource-monitor.log | \
    tail -2016 | cut -d',' -f5 | sort -n | tail -1)  # Last 7 days of 5-min intervals

echo "Peak concurrent connections: $PEAK_CONNECTIONS" >> $REPORT_FILE

# Capacity recommendations
echo -e "\nCapacity Recommendations:" >> $REPORT_FILE

if [ "$PEAK_CONNECTIONS" -gt 500 ]; then
    echo "- Consider adding more VPN nodes for high-demand countries" >> $REPORT_FILE
fi

if [ "$(docker ps -q | wc -l)" -gt 15 ]; then
    echo "- Current container count is high, consider resource optimization" >> $REPORT_FILE
fi

echo "- Monitor load balancing effectiveness across regions" >> $REPORT_FILE

Performance Optimization

System Performance Tuning:

#!/bin/bash
# System performance optimization

echo "Applying performance optimizations..."

# Network optimizations
echo 'net.core.rmem_max = 16777216' >> /etc/sysctl.conf
echo 'net.core.wmem_max = 16777216' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_rmem = 4096 65536 16777216' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_wmem = 4096 65536 16777216' >> /etc/sysctl.conf

# Apply changes
sysctl -p

# Docker performance optimizations
echo '{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 64000,
      "Soft": 64000
    }
  }
}' > /etc/docker/daemon.json

systemctl restart docker

echo "Performance optimizations applied"

8. Security Maintenance

Security Patch Management

Critical Security Updates:

#!/bin/bash
# Critical security update management

SECURITY_LOG="/var/log/security-updates.log"
TIMESTAMP=$(date)

echo "[$TIMESTAMP] Starting critical security check" >> $SECURITY_LOG

# Check for available security updates
SECURITY_UPDATES=$(apt list --upgradable 2>/dev/null | grep -i security | wc -l)

if [ "$SECURITY_UPDATES" -gt 0 ]; then
    echo "[$TIMESTAMP] $SECURITY_UPDATES security updates available" >> $SECURITY_LOG

    # Apply critical security updates immediately
    DEBIAN_FRONTEND=noninteractive apt-get -y install \
        $(apt list --upgradable 2>/dev/null | grep -i security | cut -d/ -f1) \
        >> $SECURITY_LOG 2>&1

    # Check if reboot is required
    if [ -f /var/run/reboot-required ]; then
        echo "[$TIMESTAMP] CRITICAL: Reboot required for security updates" >> $SECURITY_LOG
        # Schedule reboot during maintenance window
        shutdown -r +60 "Security updates require reboot"
    fi
else
    echo "[$TIMESTAMP] No security updates available" >> $SECURITY_LOG
fi

# Update Docker base images for security
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml pull >> $SECURITY_LOG 2>&1

Credential Rotation Schedules

Quarterly Credential Rotation:

#!/bin/bash
# Quarterly credential rotation

ROTATION_LOG="/var/log/credential-rotation.log"
TIMESTAMP=$(date)

echo "[$TIMESTAMP] Starting quarterly credential rotation" >> $ROTATION_LOG

# 1. API Admin Password Rotation
echo "[$TIMESTAMP] Rotating API admin password" >> $ROTATION_LOG
NEW_API_PASSWORD=$(openssl rand -base64 32)

# Update environment file
sed -i "s/ADMIN_PASS=.*/ADMIN_PASS=$NEW_API_PASSWORD/" /opt/vpn-exit-controller/.env

# 2. Redis Password (if used)
# NEW_REDIS_PASSWORD=$(openssl rand -base64 32)

# 3. Secret Key Rotation
NEW_SECRET_KEY=$(openssl rand -base64 64)
sed -i "s/SECRET_KEY=.*/SECRET_KEY=$NEW_SECRET_KEY/" /opt/vpn-exit-controller/.env

# 4. Restart services with new credentials
systemctl restart vpn-controller

echo "[$TIMESTAMP] Credential rotation completed" >> $ROTATION_LOG
echo "[$TIMESTAMP] New API password: $NEW_API_PASSWORD" >> $ROTATION_LOG

Access Review Procedures

Monthly Access Review:

#!/bin/bash
# Monthly access review

REVIEW_LOG="/var/log/access-review-$(date +%Y%m).log"

echo "Monthly Access Review - $(date)" > $REVIEW_LOG
echo "=================================" >> $REVIEW_LOG

# Review systemd service permissions
echo -e "\nService File Permissions:" >> $REVIEW_LOG
ls -la /etc/systemd/system/vpn-controller.service >> $REVIEW_LOG

# Review application directory permissions
echo -e "\nApplication Directory Permissions:" >> $REVIEW_LOG
ls -la /opt/vpn-exit-controller/ >> $REVIEW_LOG

# Review Docker socket access
echo -e "\nDocker Socket Access:" >> $REVIEW_LOG
ls -la /var/run/docker.sock >> $REVIEW_LOG

# Review network exposure
echo -e "\nNetwork Exposure:" >> $REVIEW_LOG
ss -tuln >> $REVIEW_LOG

# Review recent authentication attempts
echo -e "\nRecent Authentication Attempts:" >> $REVIEW_LOG
journalctl -u vpn-controller --since "30 days ago" | grep -i "auth\|login" | tail -20 >> $REVIEW_LOG

Security Audit Schedules

Comprehensive Security Audit:

#!/bin/bash
# Comprehensive security audit

AUDIT_LOG="/var/log/security-audit-$(date +%Y%m%d).log"

echo "Security Audit Report - $(date)" > $AUDIT_LOG
echo "===============================" >> $AUDIT_LOG

# 1. System vulnerabilities scan
echo -e "\n1. System Vulnerability Scan:" >> $AUDIT_LOG
if command -v lynis &> /dev/null; then
    lynis audit system --quiet >> $AUDIT_LOG
else
    echo "Lynis not installed. Install with: apt install lynis" >> $AUDIT_LOG
fi

# 2. Open ports audit
echo -e "\n2. Open Ports Audit:" >> $AUDIT_LOG
nmap -sS -p- localhost >> $AUDIT_LOG 2>&1

# 3. Docker security audit
echo -e "\n3. Docker Security Audit:" >> $AUDIT_LOG
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
    -v /usr/lib/systemd:/usr/lib/systemd \
    -v /etc:/etc --label docker_bench_security \
    docker/docker-bench-security >> $AUDIT_LOG 2>/dev/null || echo "Docker Bench Security not available" >> $AUDIT_LOG

# 4. File permissions audit
echo -e "\n4. Critical File Permissions:" >> $AUDIT_LOG
CRITICAL_FILES=(
    "/opt/vpn-exit-controller/.env"
    "/opt/vpn-exit-controller/configs/auth.txt"
    "/etc/systemd/system/vpn-controller.service"
)

for file in "${CRITICAL_FILES[@]}"; do
    if [ -f "$file" ]; then
        ls -la "$file" >> $AUDIT_LOG
    fi
done

# 5. Process audit
echo -e "\n5. Running Processes:" >> $AUDIT_LOG
ps aux | grep -E "(vpn|docker|redis)" >> $AUDIT_LOG

echo -e "\nSecurity audit completed. Review $AUDIT_LOG for findings." >> $AUDIT_LOG

9. Documentation Updates

Keeping Documentation Current

Documentation Update Checklist:

## Monthly Documentation Review Checklist

- [ ] Review API_DOCUMENTATION.md for new endpoints
- [ ] Update ARCHITECTURE.md for infrastructure changes
- [ ] Check DEPLOYMENT.md for new deployment procedures
- [ ] Verify SECURITY.md reflects current security measures
- [ ] Update this MAINTENANCE.md for new procedures
- [ ] Review configuration examples for accuracy
- [ ] Update version numbers and dependencies
- [ ] Check command examples are current
- [ ] Verify troubleshooting guides are accurate
- [ ] Update contact information and escalation procedures

Automated Documentation Validation:

#!/bin/bash
# Validate documentation accuracy

DOC_DIR="/opt/vpn-exit-controller"
VALIDATION_LOG="/var/log/doc-validation.log"
TIMESTAMP=$(date)

echo "[$TIMESTAMP] Starting documentation validation" > $VALIDATION_LOG

# Check if mentioned files exist
DOCS=("API_DOCUMENTATION.md" "ARCHITECTURE.md" "DEPLOYMENT.md" "SECURITY.md" "MAINTENANCE.md")

for doc in "${DOCS[@]}"; do
    if [ -f "$DOC_DIR/$doc" ]; then
        echo "[$TIMESTAMP] ✓ $doc exists" >> $VALIDATION_LOG

        # Check for outdated information
        LAST_MODIFIED=$(stat -c %Y "$DOC_DIR/$doc")
        CURRENT_TIME=$(date +%s)
        DAYS_OLD=$(( (CURRENT_TIME - LAST_MODIFIED) / 86400 ))

        if [ $DAYS_OLD -gt 90 ]; then
            echo "[$TIMESTAMP] ⚠ $doc is $DAYS_OLD days old (review needed)" >> $VALIDATION_LOG
        fi
    else
        echo "[$TIMESTAMP] ✗ $doc missing" >> $VALIDATION_LOG
    fi
done

# Validate command examples in documentation
echo "[$TIMESTAMP] Validating command examples..." >> $VALIDATION_LOG

# Test API endpoint mentioned in docs
if curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status > /dev/null; then
    echo "[$TIMESTAMP] ✓ API endpoint accessible" >> $VALIDATION_LOG
else
    echo "[$TIMESTAMP] ✗ API endpoint not accessible" >> $VALIDATION_LOG
fi

# Test service commands
if systemctl is-active --quiet vpn-controller; then
    echo "[$TIMESTAMP] ✓ vpn-controller service running" >> $VALIDATION_LOG
else
    echo "[$TIMESTAMP] ✗ vpn-controller service not running" >> $VALIDATION_LOG
fi

Change Log Maintenance

Automated Change Log Updates:

#!/bin/bash
# Update CHANGELOG.md with recent changes

CHANGELOG="/opt/vpn-exit-controller/CHANGELOG.md"
TEMP_LOG="/tmp/changelog-temp.md"

# Get recent commits (if using git)
if [ -d ".git" ]; then
    echo "## $(date +%Y-%m-%d) - Maintenance Update" > $TEMP_LOG
    echo "" >> $TEMP_LOG

    # Add git log entries
    git log --since="1 month ago" --pretty=format:"- %s (%an)" >> $TEMP_LOG
    echo "" >> $TEMP_LOG
    echo "" >> $TEMP_LOG

    # Prepend to existing changelog
    if [ -f "$CHANGELOG" ]; then
        cat "$CHANGELOG" >> $TEMP_LOG
        mv "$TEMP_LOG" "$CHANGELOG"
    else
        mv "$TEMP_LOG" "$CHANGELOG"
    fi
fi

Configuration Documentation

Auto-generate Configuration Documentation:

#!/bin/bash
# Generate current configuration documentation

CONFIG_DOC="/opt/vpn-exit-controller/CONFIG_CURRENT.md"

echo "# Current System Configuration" > $CONFIG_DOC
echo "Generated on: $(date)" >> $CONFIG_DOC
echo "" >> $CONFIG_DOC

# System information
echo "## System Information" >> $CONFIG_DOC
echo "- **OS**: $(lsb_release -d | cut -f2)" >> $CONFIG_DOC
echo "- **Kernel**: $(uname -r)" >> $CONFIG_DOC
echo "- **Docker Version**: $(docker --version)" >> $CONFIG_DOC
echo "- **Python Version**: $(python3 --version)" >> $CONFIG_DOC
echo "" >> $CONFIG_DOC

# Service status
echo "## Service Status" >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
systemctl status vpn-controller --no-pager >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
echo "" >> $CONFIG_DOC

# Docker containers
echo "## Running Containers" >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
echo "" >> $CONFIG_DOC

# Network configuration
echo "## Network Configuration" >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC
ip addr show | grep -E "(inet|link)" >> $CONFIG_DOC
echo "\`\`\`" >> $CONFIG_DOC

10. Emergency Procedures

System Outage Response

Emergency Response Playbook:

#!/bin/bash
# Emergency response script for system outages

EMERGENCY_LOG="/var/log/emergency-response.log"
TIMESTAMP=$(date)

echo "[$TIMESTAMP] EMERGENCY: System outage detected" >> $EMERGENCY_LOG

# 1. Immediate Assessment
echo "[$TIMESTAMP] Step 1: Immediate assessment" >> $EMERGENCY_LOG

# Check if container is running
if pct status 201 | grep -q "status: running"; then
    echo "[$TIMESTAMP] ✓ LXC container is running" >> $EMERGENCY_LOG
else
    echo "[$TIMESTAMP] ✗ LXC container is stopped - starting now" >> $EMERGENCY_LOG
    pct start 201
    sleep 30
fi

# Check core services
SERVICES=("docker" "vpn-controller")
for service in "${SERVICES[@]}"; do
    if systemctl is-active --quiet $service; then
        echo "[$TIMESTAMP] ✓ $service is running" >> $EMERGENCY_LOG
    else
        echo "[$TIMESTAMP] ✗ $service is stopped - restarting" >> $EMERGENCY_LOG
        systemctl restart $service
    fi
done

# 2. Quick Recovery Attempt
echo "[$TIMESTAMP] Step 2: Quick recovery attempt" >> $EMERGENCY_LOG

# Restart VPN controller
systemctl restart vpn-controller
sleep 60

# Test API
if curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status > /dev/null; then
    echo "[$TIMESTAMP] ✓ API is responding - emergency recovery successful" >> $EMERGENCY_LOG
    exit 0
fi

# 3. Docker Recovery
echo "[$TIMESTAMP] Step 3: Docker container recovery" >> $EMERGENCY_LOG

cd /opt/vpn-exit-controller
docker-compose down
docker-compose up -d

sleep 120

# Test again
if curl -s -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status > /dev/null; then
    echo "[$TIMESTAMP] ✓ API responding after Docker restart" >> $EMERGENCY_LOG
    exit 0
fi

# 4. Full System Recovery
echo "[$TIMESTAMP] Step 4: Full system recovery required" >> $EMERGENCY_LOG
echo "[$TIMESTAMP] Escalating to manual intervention" >> $EMERGENCY_LOG

# Generate diagnostic report
/opt/vpn-exit-controller/scripts/generate-diagnostic-report.sh >> $EMERGENCY_LOG

echo "[$TIMESTAMP] Emergency procedures completed - manual intervention required" >> $EMERGENCY_LOG

Emergency Contact Procedures

Contact List and Escalation:

## Emergency Contact Procedures

### Severity Levels

#### P1 - Critical (Response Time: 15 minutes)
- Complete system outage
- Security breach
- Data loss

#### P2 - High (Response Time: 2 hours)
- Partial system outage
- Performance degradation > 50%
- Single node failures affecting service

#### P3 - Medium (Response Time: 24 hours)
- Minor performance issues
- Single container failures
- Non-critical errors

#### P4 - Low (Response Time: 72 hours)
- Cosmetic issues
- Documentation updates
- Feature requests

### Contact Information

**Primary On-Call Engineer**
- Name: [Your Name]
- Phone: [Phone Number]
- Email: [Email Address]
- Signal/WhatsApp: [Number]

**Secondary Engineer**
- Name: [Backup Name]
- Phone: [Phone Number]
- Email: [Email Address]

**Infrastructure Team**
- Email: [email protected]
- Slack: #infrastructure-alerts

### Escalation Matrix

1. **0-15 minutes**: Primary engineer response
2. **15-30 minutes**: Escalate to secondary engineer
3. **30-60 minutes**: Escalate to infrastructure team lead
4. **1+ hours**: Escalate to engineering manager

Critical Issue Escalation

Automated Escalation Script:

#!/bin/bash
# Automated escalation for critical issues

ISSUE_TYPE="$1"
SEVERITY="$2"
DETAILS="$3"

ESCALATION_LOG="/var/log/escalation.log"
TIMESTAMP=$(date)

echo "[$TIMESTAMP] ESCALATION: $ISSUE_TYPE (Severity: $SEVERITY)" >> $ESCALATION_LOG
echo "[$TIMESTAMP] Details: $DETAILS" >> $ESCALATION_LOG

# Generate system snapshot
SNAPSHOT_FILE="/tmp/system-snapshot-$(date +%Y%m%d-%H%M%S).txt"

cat > $SNAPSHOT_FILE << EOF
EMERGENCY SYSTEM SNAPSHOT
========================
Time: $(date)
Issue: $ISSUE_TYPE
Severity: $SEVERITY
Details: $DETAILS

System Status:
$(systemctl status vpn-controller --no-pager)

Docker Status:
$(docker ps -a)

Resource Usage:
$(free -h)
$(df -h)

Recent Logs:
$(journalctl -u vpn-controller --since "1 hour ago" --no-pager | tail -50)

Network Status:
$(ss -tuln)
EOF

# Send alerts based on severity
case $SEVERITY in
    "P1"|"CRITICAL")
        # Immediate alerts
        echo "[$TIMESTAMP] Sending P1 alerts" >> $ESCALATION_LOG
        # curl -X POST webhook-url or send email
        ;;
    "P2"|"HIGH")
        echo "[$TIMESTAMP] Sending P2 alerts" >> $ESCALATION_LOG
        # Send high priority alerts
        ;;
    *)
        echo "[$TIMESTAMP] Logging issue for review" >> $ESCALATION_LOG
        ;;
esac

echo "[$TIMESTAMP] Escalation process initiated" >> $ESCALATION_LOG

Recovery Time Objectives

RTO/RPO Definitions:

## Recovery Time and Point Objectives

### Recovery Time Objectives (RTO)

| Component | Target RTO | Maximum Acceptable |
|-----------|------------|-------------------|
| VPN API Service | 5 minutes | 15 minutes |
| Individual VPN Nodes | 2 minutes | 5 minutes |
| Database (Redis) | 3 minutes | 10 minutes |
| Full System | 15 minutes | 30 minutes |
| Container Infrastructure | 10 minutes | 20 minutes |

### Recovery Point Objectives (RPO)

| Data Type | Target RPO | Backup Frequency |
|-----------|------------|------------------|
| Configuration Files | 0 minutes | Real-time sync |
| System Metrics | 5 minutes | Continuous |
| Connection Logs | 15 minutes | Every 15 minutes |
| Performance Data | 1 hour | Hourly snapshots |

### Service Level Objectives (SLO)

- **Availability**: 99.9% (8.76 hours downtime/year)
- **API Response Time**: < 500ms (95th percentile)
- **VPN Connection Success Rate**: > 99%
- **Data Recovery Success Rate**: 100%

### Disaster Recovery Scenarios

#### Scenario 1: Complete LXC Container Failure
- **RTO**: 30 minutes
- **Procedure**: Restore from latest backup to new container
- **Validation**: Full service test suite

#### Scenario 2: Docker Infrastructure Failure
- **RTO**: 15 minutes
- **Procedure**: Restart Docker daemon, rebuild containers
- **Validation**: Container health checks

#### Scenario 3: Database Corruption
- **RTO**: 10 minutes
- **Procedure**: Restore Redis from latest backup
- **Validation**: Data integrity verification

#### Scenario 4: Configuration Corruption
- **RTO**: 5 minutes
- **Procedure**: Restore from configuration backup
- **Validation**: Service functionality test

Maintenance Schedule Summary

Daily (Automated)

  • ✅ Health checks (6:00 AM)
  • ✅ Resource monitoring (Every 5 minutes)
  • ✅ Log rotation checks
  • ✅ Basic performance metrics

Weekly

  • ✅ Performance review (Sunday 2:00 AM)
  • ✅ Security updates (Sunday 3:00 AM)
  • ✅ Configuration backup verification
  • ✅ VPN node performance analysis

Monthly

  • ✅ Full system updates (First Sunday 3:00 AM)
  • ✅ Access review
  • ✅ Documentation validation
  • ✅ Capacity planning review
  • ✅ Recovery testing

Quarterly

  • ✅ Comprehensive security audit
  • ✅ Credential rotation
  • ✅ Disaster recovery testing
  • ✅ Performance optimization review
  • ✅ Documentation major updates

Quick Reference Commands

Emergency Commands

# Emergency service restart
systemctl restart vpn-controller

# Check service status
systemctl status vpn-controller

# View real-time logs
journalctl -u vpn-controller -f

# Test API health
curl -u admin:Bl4ckMagic!2345erver http://localhost:8080/api/status

# Docker container status
docker ps -a

# Resource usage
htop
df -h
free -h

Maintenance Commands

# Activate Python environment
source /opt/vpn-exit-controller/venv/bin/activate

# Update system packages
apt update && apt upgrade -y

# Restart all containers
docker-compose -f /opt/vpn-exit-controller/docker-compose.yml restart

# Create configuration backup
tar -czf backup-$(date +%Y%m%d).tar.gz /opt/vpn-exit-controller/configs/

# Check certificate expiration
openssl x509 -in cert.pem -noout -dates

Document Version: 1.0
Last Updated: $(date)
Next Review: $(date -d "+1 month")


This maintenance guide should be reviewed and updated monthly to ensure accuracy and completeness. All procedures should be tested in a development environment before applying to production.