Troubleshooting Guide
Common issues, debugging techniques, and comprehensive solutions for SysManage server problems.
Quick Diagnostic Steps
Before diving into specific issues, follow these general diagnostic steps:
🔍 Initial Checks
- Check Service Status: Verify SysManage services are running
- Review Logs: Check application and system logs for errors
- Network Connectivity: Test network connections and firewall rules
- Database Connection: Verify database connectivity and status
- Resource Usage: Check CPU, memory, and disk usage
- Configuration Validation: Verify configuration file syntax and values
📋 Quick Diagnostic Commands
# Check service status
systemctl status sysmanage
# View recent logs
journalctl -u sysmanage -n 50
# Check database connection
psql -h localhost -U sysmanage -d sysmanage -c "SELECT version();"
# Test web interface
curl -k https://localhost:6443/health
# Check disk space
df -h
# Monitor resources
htop
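The disk-space check can also be scripted, for example from a cron job or monitoring hook. The following is a minimal sketch using only the Python standard library; the function name `disk_percent_used` and the 90% threshold are illustrative assumptions, not SysManage settings.

```python
import shutil


def disk_percent_used(path: str = "/") -> float:
    """Return the percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return 100.0 * usage.used / usage.total


percent = disk_percent_used("/")
# 90% is an arbitrary example threshold, not a SysManage default.
status = "WARNING" if percent >= 90.0 else "ok"
print(f"/ is {percent:.1f}% full ({status})")
```

The figure may differ slightly from the Use% column of df, which excludes blocks reserved for root.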
Installation Issues
🔧 Dependency Problems
Python Version Compatibility
Symptoms: Import errors, syntax errors, or package installation failures
Cause: Using an unsupported Python version (3.13 is not yet supported)
Solution:
# Check Python version
python3 --version
# Install supported Python version (Ubuntu/Debian)
sudo apt install python3.12 python3.12-venv
# Create virtual environment with correct version
python3.12 -m venv .venv
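A startup script could perform the same version check before importing anything else. This is a sketch only: the guide states that 3.13 is not yet supported, but the lower bound used here (`MIN_SUPPORTED`) is a placeholder assumption and should be replaced with the project's documented minimum.

```python
import sys

# Assumption: placeholder lower bound, not a documented SysManage requirement.
MIN_SUPPORTED = (3, 10)
# From the guide: Python 3.13 is not yet supported.
FIRST_UNSUPPORTED = (3, 13)


def is_supported(version=None) -> bool:
    """Return True if the (major, minor) version lies in the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_SUPPORTED <= major_minor < FIRST_UNSUPPORTED


running = sys.version.split()[0]
print(f"Python {running}: {'supported' if is_supported() else 'NOT supported'}")
```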
PostgreSQL Installation Issues
Symptoms: Cannot connect to database, authentication failures
Cause: PostgreSQL not properly installed or configured
Solution:
# Install PostgreSQL (Ubuntu/Debian)
sudo apt install postgresql postgresql-contrib
# Start PostgreSQL service
sudo systemctl start postgresql
sudo systemctl enable postgresql
# Create the database user (prompts for a password) and a database owned by it
sudo -u postgres createuser -P sysmanage
sudo -u postgres createdb -O sysmanage sysmanage
# Test connection
psql -h localhost -U sysmanage -d sysmanage
Node.js/npm Issues
Symptoms: Frontend build failures, npm install errors
Cause: Outdated Node.js version or npm cache corruption
Solution:
# Install Node.js 20.x
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs
# Clear npm cache
npm cache clean --force
# Remove node_modules and the lockfile, then reinstall (dependency versions are re-resolved)
rm -rf node_modules package-lock.json
npm install
Configuration Issues
⚙️ Configuration File Problems
Invalid YAML Syntax
Symptoms: Server fails to start with YAML parsing errors
Cause: Malformed YAML in configuration file
Solution:
# Validate YAML syntax
python3 -c "import yaml; yaml.safe_load(open('/etc/sysmanage.yaml'))"
# Common YAML issues:
# - Incorrect indentation (use spaces, not tabs)
# - Missing quotes around strings with special characters
# - Inconsistent list formatting
# Example correct format:
database:
  host: "localhost"
  port: 5432
  name: "sysmanage"
  user: "sysmanage"
  password: "your_password"
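Tab characters in indentation are easy to miss by eye. The sketch below reports the lines affected so they can be fixed before the YAML parser is run; the function name `find_tab_indentation` is illustrative and is not part of SysManage.

```python
def find_tab_indentation(text: str) -> list[int]:
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad_lines = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Everything before the first non-whitespace character is indentation.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


sample = "database:\n\thost: localhost\n  port: 5432\n"
print(find_tab_indentation(sample))
```

To check the real file, pass in the contents of /etc/sysmanage.yaml instead of the sample string.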
SSL Certificate Issues
Symptoms: HTTPS connection failures, certificate errors
Cause: Invalid or expired SSL certificates
Solution:
# Check certificate validity
openssl x509 -in /etc/ssl/certs/sysmanage.crt -text -noout
# Verify certificate and key match
openssl x509 -noout -modulus -in /etc/ssl/certs/sysmanage.crt | openssl md5
openssl rsa -noout -modulus -in /etc/ssl/private/sysmanage.key | openssl md5
# Generate self-signed certificate for testing
openssl req -x509 -newkey rsa:4096 -keyout /etc/ssl/private/sysmanage.key \
-out /etc/ssl/certs/sysmanage.crt -days 365 -nodes
Database Connection Configuration
Symptoms: Database connection timeouts or authentication errors
Cause: Incorrect database configuration parameters
Solution:
# Test database connection manually
psql "postgresql://sysmanage:password@localhost:5432/sysmanage"
# Check PostgreSQL configuration
sudo -u postgres psql -c "SHOW hba_file;"
sudo cat /etc/postgresql/*/main/pg_hba.conf
# Add connection rule if needed
echo "host sysmanage sysmanage 127.0.0.1/32 md5" | sudo tee -a /etc/postgresql/*/main/pg_hba.conf
# Restart PostgreSQL
sudo systemctl restart postgresql
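One cause of authentication errors with the URL form is a password containing characters such as @, : or /, which must be percent-encoded inside a postgresql:// URL. A minimal sketch for building such a URL from the configuration values follows; the function name `build_postgres_url` is hypothetical.

```python
from urllib.parse import quote


def build_postgres_url(user: str, password: str, host: str, port: int, name: str) -> str:
    """Build a postgresql:// URL, percent-encoding user, password and database name."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(name, safe='')}"
    )


# The password here is an example value containing reserved characters.
print(build_postgres_url("sysmanage", "p@ss:w/rd", "localhost", 5432, "sysmanage"))
```

The resulting string can be passed to psql in place of the hand-written URL.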
Runtime Issues
🚨 Service Startup Problems
Service Won't Start
Symptoms: SysManage service fails to start or crashes immediately
Cause: Configuration errors, missing dependencies, or permission issues
Diagnosis:
# Check service status and logs
systemctl status sysmanage
journalctl -u sysmanage -f
# Run manually for debugging
cd /path/to/sysmanage
source .venv/bin/activate
python -m uvicorn backend.main:app --host 0.0.0.0 --port 6443
Common Solutions:
- Check file permissions on configuration and log directories
- Verify all required environment variables are set
- Ensure database is accessible and credentials are correct
- Check for port conflicts (port 6443 already in use)
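The port-conflict check can be approximated by trying to bind the port before starting the service. This is a sketch under simple assumptions (IPv4, TCP); the function name `port_is_free` is illustrative. A failed bind may also be caused by insufficient privileges or by sockets lingering from a previous run, so treat a negative result as a hint rather than proof.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Address in use, permission denied, or address not available.
            return False
        return True


print("port 6443 free:", port_is_free(6443))
```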
High Memory Usage
Symptoms: Server becomes unresponsive, OOM killer terminates process
Cause: Memory leaks, inefficient queries, or insufficient system resources
Diagnosis:
# Monitor memory usage
htop
ps aux | grep sysmanage
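# Check for memory leaks in native code (valgrind output for CPython is noisy;
# for Python-level allocations the standard-library tracemalloc module is usually more useful)
valgrind --tool=memcheck python -m uvicorn backend.main:app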
# PostgreSQL memory usage
sudo -u postgres psql -c "SELECT * FROM pg_stat_activity;"
Solutions:
- Increase system memory or add swap space
- Optimize database queries and add indexes
- Configure connection pooling limits
- Review and optimize WebSocket connection handling
Performance Issues
Symptoms: Slow response times, timeouts, high CPU usage
Cause: Database performance, inefficient code, or resource constraints
Performance Analysis:
# Monitor system performance
iostat -x 1
sar -u 1 10
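# Database performance (requires the pg_stat_statements extension;
# on PostgreSQL 12 and earlier the column is total_time instead of total_exec_time)
sudo -u postgres psql -c "SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;"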
# Application profiling
python -m cProfile -o profile.stats -m uvicorn backend.main:app
Optimization Steps:
- Add database indexes for frequently queried columns
- Implement query result caching
- Optimize WebSocket message handling
- Configure appropriate worker processes
Network & Connectivity Issues
🌐 Connection Problems
WebSocket Connection Failures
Symptoms: Agents cannot connect, frequent disconnections
Cause: Firewall issues, proxy problems, or network instability
Diagnosis:
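# Test WebSocket handshake (expect "101 Switching Protocols").
# -k skips certificate verification; --http1.1 is needed because the Upgrade
# mechanism does not exist in HTTP/2. The key must be 16 random bytes, base64-encoded.
curl -k --http1.1 -i -N -H "Connection: Upgrade" -H "Upgrade: websocket" \
-H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
-H "Sec-WebSocket-Version: 13" \
https://localhost:6443/ws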
# Check firewall rules
sudo ufw status
sudo iptables -L
# Test network connectivity
telnet your-server 6443
nc -zv your-server 6443
Solutions:
- Configure firewall to allow WebSocket connections
- Check proxy server WebSocket support
- Verify SSL certificate validity for secure WebSockets
- Implement connection retry logic with backoff
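Retry logic with backoff can be sketched as a generator of wait times. The code below is a generic illustration of capped exponential backoff with optional jitter, not the SysManage agent's actual reconnect code; all names and default values are assumptions.

```python
import random


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=6, jitter=0.0, rng=None):
    """Yield `attempts` delays in seconds: base * factor**n, capped at `cap`.

    `jitter` is the fraction (0..1) of each delay that may be randomly
    subtracted, so that many agents do not reconnect in lockstep.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        delay = min(cap, base * factor ** attempt)
        yield delay - delay * jitter * rng.random()


# Without jitter the schedule is deterministic.
for attempt, delay in enumerate(backoff_delays(base=1, factor=2, cap=30), start=1):
    print(f"attempt {attempt}: wait {delay:.0f}s before reconnecting")
```

A caller would sleep for each yielded delay between connection attempts and stop as soon as one succeeds.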
Agent Registration Issues
Symptoms: Agents fail to register or appear offline
Cause: Authentication problems, network issues, or configuration mismatches
Diagnosis:
# Check agent logs
tail -f /var/log/sysmanage-agent/agent.log
# Test agent API endpoint
curl -k -X POST https://server:6443/api/agents/register \
-H "Content-Type: application/json" \
-d '{"hostname": "test", "token": "agent-token"}'
# Verify agent configuration
cat /etc/sysmanage-agent/config.yaml
Solutions:
- Verify agent authentication tokens
- Check server hostname/IP configuration
- Ensure proper SSL certificate validation
- Review agent and server version compatibility
Load Balancer Issues
Symptoms: Inconsistent behavior, session problems with multiple servers
Cause: Session affinity issues or health check failures
Load Balancer Configuration:
# Nginx health check configuration
location /health {
    proxy_pass https://backend;
    proxy_set_header Host $host;
    access_log off;
}
# HAProxy health check
backend sysmanage
    option httpchk GET /health
    server app1 10.0.1.10:6443 check
    server app2 10.0.1.11:6443 check
Solutions:
- Configure sticky sessions for WebSocket connections
- Implement proper health check endpoints
- Use shared session storage (Redis/database)
- Configure appropriate timeout values
Database Issues
🗃️ Database Problems
Connection Pool Exhaustion
Symptoms: "Too many connections" errors, connection timeouts
Cause: Insufficient connection pool size or connection leaks
Diagnosis:
# Check PostgreSQL connections
sudo -u postgres psql -c "SELECT count(*) FROM pg_stat_activity;"
sudo -u postgres psql -c "SHOW max_connections;"
# Monitor connection usage
watch "sudo -u postgres psql -c 'SELECT count(*) FROM pg_stat_activity;'"
Solutions:
- Increase PostgreSQL max_connections setting
- Configure connection pooling in application
- Implement connection timeout and cleanup
- Review code for connection leaks
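When sizing the pool, note that each worker process usually keeps its own pool, so the worst-case connection count multiplies. The sketch below checks that arithmetic. The parameter names `pool_size` and `max_overflow` follow SQLAlchemy's convention, and whether SysManage exposes them under these names is an assumption; `reserved` models PostgreSQL's superuser_reserved_connections setting (default 3).

```python
def pool_fits(workers, pool_size, max_overflow, max_connections, reserved=3):
    """Check whether the worst-case application connection count fits.

    Returns (worst_case, available, fits), where worst_case is
    workers * (pool_size + max_overflow) and available is
    max_connections - reserved.
    """
    worst_case = workers * (pool_size + max_overflow)
    available = max_connections - reserved
    return worst_case, available, worst_case <= available


# Example values only: 4 workers, pool of 5 plus 10 overflow, max_connections = 100.
worst, available, fits = pool_fits(4, 5, 10, 100)
print(f"worst case {worst} of {available} available: {'ok' if fits else 'too many'}")
```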
Slow Query Performance
Symptoms: Long response times, database timeouts
Cause: Missing indexes, complex queries, or database locks
Performance Analysis:
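# Enable query logging (log_statement = 'all' logs every statement and is very verbose;
# revert it after debugging). log_min_duration_statement is in milliseconds.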
sudo -u postgres psql -c "ALTER SYSTEM SET log_statement = 'all';"
sudo -u postgres psql -c "ALTER SYSTEM SET log_min_duration_statement = 1000;"
sudo systemctl reload postgresql
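# Analyze slow queries (requires the pg_stat_statements extension;
# on PostgreSQL 12 and earlier the column is mean_time instead of mean_exec_time)
sudo -u postgres psql -c "SELECT query, mean_exec_time, calls FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"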
# Check for locks
sudo -u postgres psql -c "SELECT * FROM pg_locks WHERE NOT granted;"
Optimization:
- Add indexes for frequently queried columns
- Use EXPLAIN ANALYZE to understand query plans
- Optimize JOIN operations and WHERE clauses
- Implement query result caching
Database Migration Issues
Symptoms: Migration failures, schema inconsistencies
Cause: Interrupted migrations, version conflicts, or data issues
Migration Troubleshooting:
# Check migration status
alembic current
alembic history
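# Show head revisions; migrations are pending if "alembic current" reports a different revision
alembic heads
# Mark the database as being at head WITHOUT running any migrations
# (only use this when the schema is already known to match head)
alembic stamp head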
# Backup before migration
pg_dump -h localhost -U sysmanage sysmanage > backup_pre_migration.sql
Solutions:
- Always backup database before migrations
- Run migrations in maintenance mode
- Test migrations on copy of production data
- Have rollback plan ready
Authentication & Security Issues
🔐 Security Problems
JWT Token Issues
Symptoms: Authentication failures, token expired errors
Cause: Invalid tokens, clock skew, or configuration issues
Token Debugging:
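# Decode a JWT without verifying its signature, for inspection only
# (requires the PyJWT package; alternatively paste the token into jwt.io)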
python3 -c "
import jwt
import json
token = 'your-jwt-token-here'
decoded = jwt.decode(token, options={'verify_signature': False})
print(json.dumps(decoded, indent=2))
"
# Check system time synchronization
timedatectl status
ntpq -p
Solutions:
- Verify JWT secret key configuration
- Check token expiration settings
- Synchronize system clocks (NTP)
- Implement token refresh mechanism
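To tell clock skew apart from a genuinely expired token, compare the token's `exp` claim with the local clock. The sketch below does this with the standard library only and does not verify the signature, so it is for inspection and never for authentication. The function names are illustrative, and the `exp` claim is assumed to be present, as is usual for expiring tokens.

```python
import base64
import json
import time


def jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    payload_segment = token.split(".")[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    padded = payload_segment + "=" * (-len(payload_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def seconds_until_expiry(payload: dict, now=None) -> float:
    """Return seconds until the `exp` claim; a negative value means expired."""
    now = time.time() if now is None else now
    return payload["exp"] - now
```

A token that appears to have expired only a few seconds ago, or that is rejected immediately after being issued, points to clock skew between hosts rather than to the expiration setting.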
Permission Denied Errors
Symptoms: 403 Forbidden errors, access denied messages
Cause: Incorrect RBAC configuration or user permissions
Permission Debugging:
# Check user roles and permissions
sudo -u postgres psql sysmanage -c "
SELECT u.userid, r.name as role, p.name as permission
FROM users u
JOIN user_roles ur ON u.id = ur.user_id
JOIN roles r ON ur.role_id = r.id
JOIN role_permissions rp ON r.id = rp.role_id
JOIN permissions p ON rp.permission_id = p.id
WHERE u.userid = 'username@example.com';
"
Solutions:
- Review user role assignments
- Verify permission definitions
- Check for inherited permissions
- Validate RBAC policy configuration
SSL/TLS Certificate Problems
Symptoms: Certificate warnings, HTTPS connection failures
Cause: Expired certificates, hostname mismatches, or chain issues
Certificate Validation:
# Check certificate details
openssl x509 -in /etc/ssl/certs/sysmanage.crt -text -noout
# Test certificate chain
openssl s_client -connect your-server:6443 -servername your-server
# Verify certificate expiration
openssl x509 -in /etc/ssl/certs/sysmanage.crt -noout -dates
Solutions:
- Renew expired certificates
- Ensure hostname matches certificate CN/SAN
- Include intermediate certificates in chain
- Configure automatic certificate renewal
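For renewal monitoring, the notAfter line printed by the openssl -dates command can be turned into a number of days remaining. This is a minimal sketch using the standard library's `ssl.cert_time_to_seconds`; the function name `days_until_expiry` and the example date are illustrative.

```python
import ssl
import time


def days_until_expiry(not_after_line: str, now=None) -> float:
    """Days until the date in an `openssl x509 -noout -dates` notAfter line."""
    # Accept either "notAfter=Jun  1 12:00:00 2030 GMT" or the bare date.
    date_text = not_after_line.strip().split("=", 1)[-1]
    expires_at = ssl.cert_time_to_seconds(date_text)  # seconds since the epoch, UTC
    now = time.time() if now is None else now
    return (expires_at - now) / 86400


# Example input; in practice, pass the line printed by openssl.
print(f"{days_until_expiry('notAfter=Jun  1 12:00:00 2030 GMT'):.0f} days left")
```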
Frontend Issues
🖥️ Web Interface Problems
JavaScript/React Errors
Symptoms: White screen, console errors, component failures
Cause: JavaScript errors, missing dependencies, or build issues
Debugging:
# Check browser console for errors
# Press F12 and look at Console tab
# Rebuild frontend
cd frontend
npm run build
# Check for TypeScript errors
npm run type-check
# Run development server for detailed errors
npm run dev
Solutions:
- Clear browser cache and cookies
- Rebuild frontend assets
- Check for missing environment variables
- Verify API endpoint connectivity
API Communication Errors
Symptoms: Failed API requests, CORS errors, network timeouts
Cause: CORS configuration, network issues, or API errors
Network Debugging:
# Check browser network tab for failed requests
# Press F12 and look at Network tab
# Test API endpoints directly
curl -k -H "Authorization: Bearer your-token" \
https://localhost:6443/api/hosts
# Check CORS headers
curl -k -H "Origin: https://your-frontend-domain" \
-H "Access-Control-Request-Method: POST" \
-H "Access-Control-Request-Headers: X-Requested-With" \
-X OPTIONS https://localhost:6443/api/hosts
Solutions:
- Configure CORS settings in backend
- Check API authentication tokens
- Verify network connectivity and DNS
- Review proxy configuration if applicable
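The headers returned by the OPTIONS request can be checked mechanically. The sketch below is a simplified illustration: it looks only at Access-Control-Allow-Origin and Access-Control-Allow-Methods and ignores credentials and request-header rules. The function name `preflight_allows` is hypothetical.

```python
def preflight_allows(headers: dict, origin: str, method: str) -> bool:
    """Return True if preflight response headers permit `origin` and `method`."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {key.lower(): value for key, value in headers.items()}
    allowed_origin = lowered.get("access-control-allow-origin", "")
    allowed_methods = {
        item.strip().upper()
        for item in lowered.get("access-control-allow-methods", "").split(",")
        if item.strip()
    }
    origin_ok = allowed_origin in ("*", origin)
    method_ok = "*" in allowed_methods or method.upper() in allowed_methods
    return origin_ok and method_ok
```

If this returns False for the frontend's origin, the backend CORS settings are the likely cause rather than the network.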
WebSocket Connection Issues
Symptoms: Real-time updates not working, connection errors
Cause: WebSocket connection failures or proxy issues
WebSocket Testing:
// Test WebSocket in browser console (JavaScript)
const ws = new WebSocket('wss://localhost:6443/ws');
ws.onopen = () => console.log('Connected');
ws.onmessage = (event) => console.log('Message:', event.data);
ws.onerror = (error) => console.log('Error:', error);
# Check proxy WebSocket support (Nginx)
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
Solutions:
- Configure proxy for WebSocket support
- Check firewall WebSocket rules
- Implement connection retry logic
- Verify SSL certificate for WSS connections
Log Analysis
📋 Log Locations
# SysManage application logs
/var/log/sysmanage/sysmanage.log
/var/log/sysmanage/error.log
/var/log/sysmanage/access.log
# System logs
journalctl -u sysmanage
/var/log/syslog
/var/log/messages
# PostgreSQL logs
/var/log/postgresql/postgresql-*.log
# Nginx logs (if used)
/var/log/nginx/access.log
/var/log/nginx/error.log
# Agent logs
/var/log/sysmanage-agent/agent.log
🔍 Log Analysis Commands
# Real-time log monitoring
tail -f /var/log/sysmanage/sysmanage.log
# Search for errors
grep -i error /var/log/sysmanage/sysmanage.log
# Filter by date range
journalctl -u sysmanage --since "2024-01-01" --until "2024-01-02"
# Count error occurrences
grep -c "ERROR" /var/log/sysmanage/sysmanage.log
# Extract IP addresses from access logs
awk '{print $1}' /var/log/sysmanage/access.log | sort | uniq -c | sort -nr
📊 Common Log Patterns
Error Patterns to Look For:
- Database Errors: "connection refused", "timeout", "deadlock"
- Authentication: "unauthorized", "forbidden", "invalid token"
- WebSocket: "connection closed", "handshake failed"
- Performance: "timeout", "slow query", "high memory"
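The patterns above can be applied to a log file programmatically to see which category dominates. This is a minimal sketch: the substrings are taken from the list above, while the function names and the case-insensitive matching rule are assumptions. Note that "timeout" counts toward both the database and performance categories, as in the list.

```python
from collections import Counter

PATTERNS = {
    "database": ("connection refused", "timeout", "deadlock"),
    "authentication": ("unauthorized", "forbidden", "invalid token"),
    "websocket": ("connection closed", "handshake failed"),
    "performance": ("timeout", "slow query", "high memory"),
}


def classify(line: str) -> list[str]:
    """Return every category whose patterns occur in the line (case-insensitive)."""
    lowered = line.lower()
    return [
        category
        for category, needles in PATTERNS.items()
        if any(needle in lowered for needle in needles)
    ]


def count_categories(lines) -> Counter:
    """Count category hits across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        counts.update(classify(line))
    return counts
```

For example, `count_categories(open("/var/log/sysmanage/sysmanage.log"))` would summarize the main application log.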
Getting Help
📞 Support Channels
- GitHub Issues: Report bugs and request features
- GitHub Discussions: Community support and questions
- Documentation: Complete documentation
📋 Information to Include
When seeking help, please include:
- SysManage version and installation method
- Operating system and version
- Complete error messages and stack traces
- Relevant log excerpts
- Steps to reproduce the issue
- Configuration files (redact sensitive information)
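Redacting configuration files by hand is error-prone, so a small filter can help. The sketch below masks the value of any `key: value` line whose key looks sensitive; the list of key fragments (password, secret, token, key) is an assumption and deliberately errs toward over-redaction. Review the output before sharing it.

```python
import re

# Key fragments treated as sensitive are an assumption; extend as needed.
SENSITIVE_KEY = re.compile(
    r"^(\s*[\w.-]*(?:password|secret|token|key)[\w.-]*\s*:\s*)(.+)$",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    """Replace values of sensitive-looking `key: value` lines with a placeholder."""
    return "\n".join(
        SENSITIVE_KEY.sub(r"\1<redacted>", line) for line in text.splitlines()
    )


config = 'database:\n  user: "sysmanage"\n  password: "your_password"'
print(redact(config))
```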
🔧 Debug Information Script
#!/bin/bash
# debug_info.sh - Collect debug information
echo "=== SysManage Debug Information ==="
echo "Date: $(date)"
echo "Hostname: $(hostname)"
echo "OS: $(uname -a)"
echo
echo "=== Service Status ==="
systemctl status sysmanage
echo
echo "=== Recent Logs ==="
journalctl -u sysmanage -n 20 --no-pager
echo
echo "=== Database Connection ==="
psql -h localhost -U sysmanage -d sysmanage -c "SELECT version();" 2>&1
echo
echo "=== Disk Usage ==="
df -h
echo
echo "=== Memory Usage ==="
free -h