Troubleshooting Guide

Common issues, debugging techniques, and comprehensive solutions for SysManage server problems.

Quick Diagnostic Steps

Before diving into specific issues, follow these general diagnostic steps:

🔍 Initial Checks

  1. Check Service Status: Verify SysManage services are running
  2. Review Logs: Check application and system logs for errors
  3. Network Connectivity: Test network connections and firewall rules
  4. Database Connection: Verify database connectivity and status
  5. Resource Usage: Check CPU, memory, and disk usage
  6. Configuration Validation: Verify configuration file syntax and values

📋 Quick Diagnostic Commands

# Check service status
systemctl status sysmanage

# View recent logs
journalctl -u sysmanage -n 50

# Check database connection
psql -h localhost -U sysmanage -d sysmanage -c "SELECT version();"

# Test web interface
curl -k https://localhost:6443/health

# Check disk space
df -h

# Monitor resources
htop
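
The commands above can be wrapped in a small pass/fail helper so a whole round of checks fits on one screen. This is only a sketch: the function name run_check is hypothetical, and the commented examples assume that pg_isready (a standard PostgreSQL client utility) is installed and that the health endpoint is the one shown above.

```shell
# run_check LABEL COMMAND [ARGS...]
# Runs the command quietly and prints a one-line pass/fail summary.
run_check() {
    local label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        printf '[ OK ] %s\n' "$label"
    else
        printf '[FAIL] %s\n' "$label"
        return 1
    fi
}

# Example usage (commented out so that sourcing this file has no side effects):
# run_check "service active"     systemctl is-active --quiet sysmanage
# run_check "health endpoint"    curl -kfsS https://localhost:6443/health
# run_check "database reachable" pg_isready -h localhost -p 5432
```

A failed check only tells you where to look next; the sections below cover each area in more detail.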

Installation Issues

🔧 Dependency Problems

Python Version Compatibility

Symptoms: Import errors, syntax errors, or package installation failures

Cause: Using an unsupported Python version (3.13 is not yet supported)

Solution:
# Check Python version
python3 --version

# Install supported Python version (Ubuntu/Debian)
sudo apt install python3.12 python3.12-venv

# Create virtual environment with correct version
python3.12 -m venv .venv

PostgreSQL Installation Issues

Symptoms: Cannot connect to database, authentication failures

Cause: PostgreSQL not properly installed or configured

Solution:
# Install PostgreSQL (Ubuntu/Debian)
sudo apt install postgresql postgresql-contrib

# Start PostgreSQL service
sudo systemctl start postgresql
sudo systemctl enable postgresql

# Create database user and database
sudo -u postgres createuser -P sysmanage
sudo -u postgres createdb -O sysmanage sysmanage

# Test connection
psql -h localhost -U sysmanage -d sysmanage

Node.js/npm Issues

Symptoms: Frontend build failures, npm install errors

Cause: Outdated Node.js version or npm cache corruption

Solution:
# Install Node.js 20.x
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs

# Clear npm cache
npm cache clean --force

# Remove node_modules and reinstall
rm -rf node_modules package-lock.json
npm install

Configuration Issues

⚙️ Configuration File Problems

Invalid YAML Syntax

Symptoms: Server fails to start with YAML parsing errors

Cause: Malformed YAML in configuration file

Solution:
# Validate YAML syntax
python3 -c "import yaml; yaml.safe_load(open('/etc/sysmanage.yaml'))"

# Common YAML issues:
# - Incorrect indentation (use spaces, not tabs)
# - Missing quotes around strings with special characters
# - Inconsistent list formatting

# Example correct format:
database:
  host: "localhost"
  port: 5432
  name: "sysmanage"
  user: "sysmanage"
  password: "your_password"
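
Tab indentation is hard to spot by eye, so a quick grep can help before running the YAML parser. The sketch below assumes a POSIX grep; the function name find_yaml_tabs is hypothetical, and it only flags tabs in leading whitespace (tabs inside quoted values are legal YAML).

```shell
# find_yaml_tabs FILE
# Prints "LINE:content" for every line whose indentation contains a tab.
# Returns 0 if any such line exists, 1 otherwise (grep's usual convention).
find_yaml_tabs() {
    grep -n "^ *$(printf '\t')" "$1"
}

# Example usage (commented out; the path is the one used above):
# find_yaml_tabs /etc/sysmanage.yaml && echo "replace the tabs above with spaces"
```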

SSL Certificate Issues

Symptoms: HTTPS connection failures, certificate errors

Cause: Invalid or expired SSL certificates

Solution:
# Check certificate validity
openssl x509 -in /etc/ssl/certs/sysmanage.crt -text -noout

# Verify certificate and key match
openssl x509 -noout -modulus -in /etc/ssl/certs/sysmanage.crt | openssl md5
openssl rsa -noout -modulus -in /etc/ssl/private/sysmanage.key | openssl md5

# Generate self-signed certificate for testing
openssl req -x509 -newkey rsa:4096 -keyout /etc/ssl/private/sysmanage.key \
  -out /etc/ssl/certs/sysmanage.crt -days 365 -nodes

Database Connection Configuration

Symptoms: Database connection timeouts or authentication errors

Cause: Incorrect database configuration parameters

Solution:
# Test database connection manually
psql "postgresql://sysmanage:password@localhost:5432/sysmanage"

# Check PostgreSQL configuration
sudo -u postgres psql -c "SHOW hba_file;"
sudo cat /etc/postgresql/*/main/pg_hba.conf

# Add connection rule if needed
echo "host sysmanage sysmanage 127.0.0.1/32 md5" | sudo tee -a /etc/postgresql/*/main/pg_hba.conf

# Restart PostgreSQL
sudo systemctl restart postgresql

Runtime Issues

🚨 Service Startup Problems

Service Won't Start

Symptoms: SysManage service fails to start or crashes immediately

Cause: Configuration errors, missing dependencies, or permission issues

Diagnosis:
# Check service status and logs
systemctl status sysmanage
journalctl -u sysmanage -f

# Run manually for debugging
cd /path/to/sysmanage
source .venv/bin/activate
python -m uvicorn backend.main:app --host 0.0.0.0 --port 6443

Common Solutions:
  • Check file permissions on configuration and log directories
  • Verify all required environment variables are set
  • Ensure database is accessible and credentials are correct
  • Check for port conflicts (port 6443 already in use)

High Memory Usage

Symptoms: Server becomes unresponsive, OOM killer terminates process

Cause: Memory leaks, inefficient queries, or insufficient system resources

Diagnosis:
# Monitor memory usage
htop
ps aux | grep sysmanage

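# Check for native (C-level) memory leaks; for growth in Python objects,
# the standard-library tracemalloc module is usually more informative
valgrind --tool=memcheck python -m uvicorn backend.main:app

# PostgreSQL active sessions (each backend process consumes memory)
sudo -u postgres psql -c "SELECT * FROM pg_stat_activity;"

Solutions: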
  • Increase system memory or add swap space
  • Optimize database queries and add indexes
  • Configure connection pooling limits
  • Review and optimize WebSocket connection handling

Performance Issues

Symptoms: Slow response times, timeouts, high CPU usage

Cause: Database performance, inefficient code, or resource constraints

Performance Analysis:
# Monitor system performance
iostat -x 1
sar -u 1 10

# Database performance (requires the pg_stat_statements extension;
# on PostgreSQL 12 and earlier the column is total_time)
sudo -u postgres psql -c "SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;"

# Application profiling
python -m cProfile -o profile.stats -m uvicorn backend.main:app

Optimization Steps:
  • Add database indexes for frequently queried columns
  • Implement query result caching
  • Optimize WebSocket message handling
  • Configure appropriate worker processes

Network & Connectivity Issues

🌐 Connection Problems

WebSocket Connection Failures

Symptoms: Agents cannot connect, frequent disconnections

Cause: Firewall issues, proxy problems, or network instability

Diagnosis:
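# Test WebSocket handshake (a working endpoint answers "101 Switching Protocols").
# --http1.1 keeps curl from negotiating HTTP/2, which has no Upgrade mechanism;
# -k skips certificate verification for self-signed certificates
curl -k --http1.1 -i -N -H "Connection: Upgrade" -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
  -H "Sec-WebSocket-Version: 13" \
  https://localhost:6443/ws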

# Check firewall rules
sudo ufw status
sudo iptables -L

# Test network connectivity
telnet your-server 6443
nc -zv your-server 6443

Solutions:
  • Configure firewall to allow WebSocket connections
  • Check proxy server WebSocket support
  • Verify SSL certificate validity for secure WebSockets
  • Implement connection retry logic with backoff
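
Retry with backoff can be sketched at the shell level, for example when probing the server from an agent host. This is an illustration only, not how the SysManage agent itself reconnects; the function name retry_with_backoff and the doubling schedule are assumptions.

```shell
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds, doubling the delay after each failure.
# Returns 0 on success, 1 once MAX_ATTEMPTS have failed.
retry_with_backoff() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}

# Example usage (commented out; waits 1s, 2s, 4s, 8s between five attempts):
# retry_with_backoff 5 1 nc -z your-server 6443
```

Doubling the delay keeps a large number of agents from reconnecting in lockstep after a short outage; production clients usually add random jitter on top.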

Agent Registration Issues

Symptoms: Agents fail to register or appear offline

Cause: Authentication problems, network issues, or configuration mismatches

Diagnosis:
# Check agent logs
tail -f /var/log/sysmanage-agent/agent.log

# Test agent API endpoint
curl -k -X POST https://server:6443/api/agents/register \
  -H "Content-Type: application/json" \
  -d '{"hostname": "test", "token": "agent-token"}'

# Verify agent configuration
cat /etc/sysmanage-agent/config.yaml

Solutions:
  • Verify agent authentication tokens
  • Check server hostname/IP configuration
  • Ensure proper SSL certificate validation
  • Review agent and server version compatibility

Load Balancer Issues

Symptoms: Inconsistent behavior, session problems with multiple servers

Cause: Session affinity issues or health check failures

Load Balancer Configuration:
# Nginx health check configuration
location /health {
    proxy_pass https://backend;
    proxy_set_header Host $host;
    access_log off;
}

# HAProxy health check
backend sysmanage
    option httpchk GET /health
    server app1 10.0.1.10:6443 check
    server app2 10.0.1.11:6443 check

Solutions:
  • Configure sticky sessions for WebSocket connections
  • Implement proper health check endpoints
  • Use shared session storage (Redis/database)
  • Configure appropriate timeout values

Database Issues

🗃️ Database Problems

Connection Pool Exhaustion

Symptoms: "Too many connections" errors, connection timeouts

Cause: Insufficient connection pool size or connection leaks

Diagnosis:
# Check PostgreSQL connections
sudo -u postgres psql -c "SELECT count(*) FROM pg_stat_activity;"
sudo -u postgres psql -c "SHOW max_connections;"

# Monitor connection usage
watch "sudo -u postgres psql -c 'SELECT count(*) FROM pg_stat_activity;'"

Solutions:
  • Increase PostgreSQL max_connections setting
  • Configure connection pooling in application
  • Implement connection timeout and cleanup
  • Review code for connection leaks
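
To judge how close the server is to exhaustion, the two numbers from the diagnosis commands can be turned into a percentage. A minimal sketch; the function name pool_usage_percent is hypothetical, and the commented usage assumes local psql access as the postgres user.

```shell
# pool_usage_percent USED MAX
# Prints the integer percentage of max_connections currently in use.
pool_usage_percent() {
    local used="$1" max="$2"
    if [ "$max" -le 0 ]; then
        echo "max must be positive" >&2
        return 1
    fi
    echo $(( used * 100 / max ))
}

# Example usage against a live server (commented out; needs psql access):
# used=$(sudo -u postgres psql -Atc "SELECT count(*) FROM pg_stat_activity;")
# max=$(sudo -u postgres psql -Atc "SHOW max_connections;")
# echo "connection usage: $(pool_usage_percent "$used" "$max")%"
```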

Slow Query Performance

Symptoms: Long response times, database timeouts

Cause: Missing indexes, complex queries, or database locks

Performance Analysis:
# Log statements slower than 1000 ms
sudo -u postgres psql -c "ALTER SYSTEM SET log_min_duration_statement = 1000;"
# Optional: log every statement (very verbose; turn off again after debugging)
sudo -u postgres psql -c "ALTER SYSTEM SET log_statement = 'all';"
sudo systemctl reload postgresql

# Analyze slow queries (requires the pg_stat_statements extension;
# on PostgreSQL 12 and earlier the column is mean_time)
sudo -u postgres psql -c "SELECT query, mean_exec_time, calls FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"

# Check for locks
sudo -u postgres psql -c "SELECT * FROM pg_locks WHERE NOT granted;"

Optimization:
  • Add indexes for frequently queried columns
  • Use EXPLAIN ANALYZE to understand query plans
  • Optimize JOIN operations and WHERE clauses
  • Implement query result caching

Database Migration Issues

Symptoms: Migration failures, schema inconsistencies

Cause: Interrupted migrations, version conflicts, or data issues

Migration Troubleshooting:
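# Back up before changing anything
pg_dump -h localhost -U sysmanage sysmanage > backup_pre_migration.sql

# Check migration status
alembic current
alembic history

# Show pending migrations: compare the current revision with the latest head
alembic heads

# Mark the database as being at head WITHOUT running any migrations
# (use only when the schema is already known to match)
alembic stamp head

Solutions: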
  • Always backup database before migrations
  • Run migrations in maintenance mode
  • Test migrations on copy of production data
  • Have rollback plan ready

Authentication & Security Issues

🔐 Security Problems

JWT Token Issues

Symptoms: Authentication failures, token expired errors

Cause: Invalid tokens, clock skew, or configuration issues

Token Debugging:
# Decode JWT token without verifying the signature (requires PyJWT; jwt.io also works)
python3 -c "
import jwt
import json
token = 'your-jwt-token-here'
decoded = jwt.decode(token, options={'verify_signature': False})
print(json.dumps(decoded, indent=2))
"

# Check system time synchronization
timedatectl status
ntpq -p

Solutions:
  • Verify JWT secret key configuration
  • Check token expiration settings
  • Synchronize system clocks (NTP)
  • Implement token refresh mechanism
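
When PyJWT is not installed, the token payload can also be inspected with standard tools, which makes it easy to compare the exp claim with the local clock. This is a sketch that assumes GNU coreutils base64; the function name jwt_payload is hypothetical, and nothing here verifies the signature.

```shell
# jwt_payload TOKEN
# Prints the decoded JSON payload (second dot-separated segment) of a JWT.
jwt_payload() {
    local payload
    # Take the middle segment and map base64url characters to standard base64.
    payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
    # base64url omits '=' padding; restore it so base64 accepts the input.
    while [ $(( ${#payload} % 4 )) -ne 0 ]; do
        payload="${payload}="
    done
    printf '%s' "$payload" | base64 -d
}

# Example usage (commented out): compare the exp claim with the current time.
# jwt_payload 'your-jwt-token-here'
# date +%s
```

If exp is only a few seconds behind the output of date +%s, clock skew between client and server is a likely cause.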

Permission Denied Errors

Symptoms: 403 Forbidden errors, access denied messages

Cause: Incorrect RBAC configuration or user permissions

Permission Debugging:
# Check user roles and permissions
sudo -u postgres psql sysmanage -c "
SELECT u.userid, r.name as role, p.name as permission
FROM users u
JOIN user_roles ur ON u.id = ur.user_id
JOIN roles r ON ur.role_id = r.id
JOIN role_permissions rp ON r.id = rp.role_id
JOIN permissions p ON rp.permission_id = p.id
WHERE u.userid = 'username@example.com';
"

Solutions:
  • Review user role assignments
  • Verify permission definitions
  • Check for inherited permissions
  • Validate RBAC policy configuration

SSL/TLS Certificate Problems

Symptoms: Certificate warnings, HTTPS connection failures

Cause: Expired certificates, hostname mismatches, or chain issues

Certificate Validation:
# Check certificate details
openssl x509 -in /etc/ssl/certs/sysmanage.crt -text -noout

# Test certificate chain
openssl s_client -connect your-server:6443 -servername your-server

# Verify certificate expiration
openssl x509 -in /etc/ssl/certs/sysmanage.crt -noout -dates

Solutions:
  • Renew expired certificates
  • Ensure hostname matches certificate CN/SAN
  • Include intermediate certificates in chain
  • Configure automatic certificate renewal

Frontend Issues

🖥️ Web Interface Problems

JavaScript/React Errors

Symptoms: White screen, console errors, component failures

Cause: JavaScript errors, missing dependencies, or build issues

Debugging:
# Check browser console for errors
# Press F12 and look at Console tab

# Rebuild frontend
cd frontend
npm run build

# Check for TypeScript errors
npm run type-check

# Run development server for detailed errors
npm run dev

Solutions:
  • Clear browser cache and cookies
  • Rebuild frontend assets
  • Check for missing environment variables
  • Verify API endpoint connectivity

API Communication Errors

Symptoms: Failed API requests, CORS errors, network timeouts

Cause: CORS configuration, network issues, or API errors

Network Debugging:
# Check browser network tab for failed requests
# Press F12 and look at Network tab

# Test API endpoints directly
curl -k -H "Authorization: Bearer your-token" \
  https://localhost:6443/api/hosts

# Check CORS headers
curl -k -H "Origin: https://your-frontend-domain" \
  -H "Access-Control-Request-Method: POST" \
  -H "Access-Control-Request-Headers: X-Requested-With" \
  -X OPTIONS https://localhost:6443/api/hosts

Solutions:
  • Configure CORS settings in backend
  • Check API authentication tokens
  • Verify network connectivity and DNS
  • Review proxy configuration if applicable

WebSocket Connection Issues

Symptoms: Real-time updates not working, connection errors

Cause: WebSocket connection failures or proxy issues

WebSocket Testing:
// Test WebSocket in browser console (JavaScript)
const ws = new WebSocket('wss://localhost:6443/ws');
ws.onopen = () => console.log('Connected');
ws.onmessage = (event) => console.log('Message:', event.data);
ws.onerror = (error) => console.log('Error:', error);

# Check proxy WebSocket support (Nginx)
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";

Solutions:
  • Configure proxy for WebSocket support
  • Check firewall WebSocket rules
  • Implement connection retry logic
  • Verify SSL certificate for WSS connections

Log Analysis

📋 Log Locations

# SysManage application logs
/var/log/sysmanage/sysmanage.log
/var/log/sysmanage/error.log
/var/log/sysmanage/access.log

# System logs
journalctl -u sysmanage
/var/log/syslog
/var/log/messages

# PostgreSQL logs
/var/log/postgresql/postgresql-*.log

# Nginx logs (if used)
/var/log/nginx/access.log
/var/log/nginx/error.log

# Agent logs
/var/log/sysmanage-agent/agent.log

🔍 Log Analysis Commands

# Real-time log monitoring
tail -f /var/log/sysmanage/sysmanage.log

# Search for errors
grep -i error /var/log/sysmanage/sysmanage.log

# Filter by date range
journalctl -u sysmanage --since "2024-01-01" --until "2024-01-02"

# Count error occurrences
grep -c "ERROR" /var/log/sysmanage/sysmanage.log

# Extract IP addresses from access logs
awk '{print $1}' /var/log/sysmanage/access.log | sort | uniq -c | sort -nr

📊 Common Log Patterns

Error Patterns to Look For:

  • Database Errors: "connection refused", "timeout", "deadlock"
  • Authentication: "unauthorized", "forbidden", "invalid token"
  • WebSocket: "connection closed", "handshake failed"
  • Performance: "timeout", "slow query", "high memory"
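
The patterns above can be counted per category in one pass to see which area dominates a log file. A minimal sketch; the function names count_matches and summarize_log_errors are hypothetical, and a line containing "timeout" is counted under both Database and Performance because the lists above overlap.

```shell
# count_matches PATTERN FILE
# Prints the number of lines matching PATTERN (case-insensitive, extended regex).
count_matches() {
    grep -ciE "$1" "$2" || true
}

# summarize_log_errors FILE
# Prints one "category: count" line per pattern group listed above.
summarize_log_errors() {
    local file="$1"
    printf 'database: %s\n'       "$(count_matches 'connection refused|timeout|deadlock' "$file")"
    printf 'authentication: %s\n' "$(count_matches 'unauthorized|forbidden|invalid token' "$file")"
    printf 'websocket: %s\n'      "$(count_matches 'connection closed|handshake failed' "$file")"
    printf 'performance: %s\n'    "$(count_matches 'timeout|slow query|high memory' "$file")"
}

# Example usage (commented out; path taken from the log locations above):
# summarize_log_errors /var/log/sysmanage/sysmanage.log
```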

Getting Help

📞 Support Channels

📋 Information to Include

When seeking help, please include:

  • SysManage version and installation method
  • Operating system and version
  • Complete error messages and stack traces
  • Relevant log excerpts
  • Steps to reproduce the issue
  • Configuration files (redact sensitive information)
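
Redacting a configuration file by hand is error-prone, so a small filter can blank out likely secrets before the file is shared. This is a sketch under assumptions: the function name redact_config is hypothetical, it only handles single-line "key: value" YAML entries, and the keyword list (password, secret, token, key) is a guess at what counts as sensitive. Always review the output before posting it.

```shell
# redact_config FILE
# Prints FILE with the value of any key whose name contains
# password, secret, token, or key replaced by "REDACTED".
redact_config() {
    sed -E 's/^([[:space:]]*[A-Za-z0-9_-]*(password|secret|token|key)[A-Za-z0-9_-]*[[:space:]]*:).*/\1 "REDACTED"/' "$1"
}

# Example usage (commented out; the path is the one used earlier in this guide):
# redact_config /etc/sysmanage.yaml > sysmanage-redacted.yaml
```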

🔧 Debug Information Script

#!/bin/bash
# debug_info.sh - Collect debug information

echo "=== SysManage Debug Information ==="
echo "Date: $(date)"
echo "Hostname: $(hostname)"
echo "OS: $(uname -a)"
echo
echo "=== Service Status ==="
systemctl status sysmanage --no-pager
echo
echo "=== Recent Logs ==="
journalctl -u sysmanage -n 20 --no-pager
echo
echo "=== Database Connection ==="
psql -h localhost -U sysmanage -d sysmanage -c "SELECT version();" 2>&1
echo
echo "=== Disk Usage ==="
df -h
echo
echo "=== Memory Usage ==="
free -h