Agent Troubleshooting

Comprehensive troubleshooting guide for SysManage agents, covering common issues, debugging techniques, and resolution strategies.

Overview

This guide provides systematic approaches to diagnosing and resolving common issues with SysManage agents. Follow the troubleshooting steps in order, starting with basic connectivity checks and progressing to advanced debugging techniques.

Troubleshooting Methodology

  1. Identify the Problem: Gather symptoms and error messages
  2. Check Basic Requirements: Verify system requirements and configuration
  3. Review Logs: Examine agent and system logs for clues
  4. Test Components: Isolate and test individual components
  5. Apply Solutions: Implement targeted fixes
  6. Verify Resolution: Confirm the issue is resolved

Quick Diagnostics

Start with these quick checks to identify common issues.

Basic Health Check

# Check if agent is running
systemctl status sysmanage-agent
ps aux | grep '[s]ysmanage-agent'   # bracket trick keeps grep from matching itself

# Check agent logs
tail -f /var/log/sysmanage-agent.log
journalctl -u sysmanage-agent -f

# Test server connectivity
telnet sysmanage.example.com 8080
curl -k https://sysmanage.example.com:8080/health

# Check configuration syntax
sysmanage-agent --config-test

# Verify permissions
ls -la /etc/sysmanage-agent/
whoami

Agent Status Indicators

Status        | Description                              | Action Required
🟢 Online     | Agent connected and functioning normally | None
🟡 Connecting | Agent attempting to connect to server    | Check network connectivity
🔴 Offline    | Agent cannot connect to server           | Check configuration and network
⚫ Stopped    | Agent service is not running             | Start the agent service
🟠 Error      | Agent running but experiencing errors    | Check logs for specific errors

Common Issues

Connection Issues

Agent Cannot Connect to Server

Symptoms:
  • Agent shows "Offline" or "Connecting" status
  • Connection timeout errors in logs
  • Agent continuously retrying connection
Solutions:
  1. Verify server configuration:
    # Check server hostname and port
    ping sysmanage.example.com
    telnet sysmanage.example.com 8080
    
    # Test HTTPS connection
    curl -k https://sysmanage.example.com:8080/health
  2. Check firewall rules:
    # Linux (iptables)
    # -n prints numeric ports; without it, 8080 is shown as a service name
    iptables -L -n | grep 8080
    
    # Linux (firewalld)
    firewall-cmd --list-ports
    
    # FreeBSD (pf)
    pfctl -sr | grep 8080
    
    # Check if port is listening (run this on the server, not the agent host)
    netstat -tlnp | grep 8080
  3. Verify DNS resolution:
    # Test DNS resolution
    nslookup sysmanage.example.com
    dig sysmanage.example.com
    
    # Check /etc/hosts for overrides
    grep sysmanage /etc/hosts
  4. Check proxy settings:
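    # Check environment variables (some tools, such as curl, only honor lowercase http_proxy)
    echo "$HTTP_PROXY" "$http_proxy"
    echo "$HTTPS_PROXY" "$https_proxy"
    echo "$NO_PROXY" "$no_proxy"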
    
    # Test direct connection
    curl --noproxy '*' https://sysmanage.example.com:8080
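
The checks above can often be narrowed down from a single curl probe, because curl's exit code indicates which stage failed. The following is a minimal sketch that maps a few of curl's documented exit codes to the matching step in this section. The function name explain_curl_exit is hypothetical and is not part of SysManage.

```shell
# Hypothetical helper (not part of SysManage): map a curl exit code
# to the troubleshooting step it most likely points to.
explain_curl_exit() {
  case "$1" in
    0)     echo "ok: server reachable" ;;
    5)     echo "proxy: could not resolve proxy host, check proxy settings" ;;
    6)     echo "dns: could not resolve host, verify DNS resolution" ;;
    7)     echo "connect: connection failed, check firewall rules and port" ;;
    28)    echo "timeout: operation timed out, check firewall rules and routing" ;;
    35|60) echo "tls: handshake or certificate problem, see SSL/TLS Certificate Issues" ;;
    *)     echo "other: curl exit code $1, see EXIT CODES in the curl man page" ;;
  esac
}

# Usage (example host and port from this guide):
#   curl -sS -o /dev/null --max-time 10 https://sysmanage.example.com:8080/health
#   explain_curl_exit $?
```

Only a handful of codes are covered; anything else falls through to the generic branch.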

SSL/TLS Certificate Issues

Symptoms:
  • SSL certificate verification errors
  • "Certificate has expired" messages
  • "Hostname mismatch" errors
Solutions:
  1. Test certificate validity:
    # Check certificate details
    openssl s_client -connect sysmanage.example.com:8080 -servername sysmanage.example.com
    
    # Verify certificate expiration
    echo | openssl s_client -connect sysmanage.example.com:8080 2>/dev/null | openssl x509 -noout -dates
    
    # Check certificate chain
    curl -vvI https://sysmanage.example.com:8080
  2. Temporary SSL bypass (development only):
    # In agent configuration
    server:
      hostname: "sysmanage.example.com"
      port: 8080
      use_https: true
      verify_ssl: false  # ONLY for development/testing
  3. Update CA certificates:
    # Ubuntu/Debian
    apt update && apt install ca-certificates
    update-ca-certificates
    
    # RHEL/CentOS/Fedora
    yum update ca-certificates
    # or
    dnf update ca-certificates
    
    # FreeBSD
    pkg update && pkg install ca_root_nss
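
To turn the expiration check from step 1 into a number that can be compared against a threshold, the end date can be converted to days remaining. This is a sketch under two assumptions: GNU date is available (date -d), and the input is the notAfter= line printed by openssl x509 -noout -enddate. The function name cert_days_left is hypothetical.

```shell
# Hypothetical helper: days remaining until a certificate expires.
# $1: a "notAfter=..." line as printed by `openssl x509 -noout -enddate`
# $2: current time as a Unix epoch (defaults to now; overridable for testing)
# Assumes GNU date (`date -d`); BSD date needs different flags.
cert_days_left() {
  local not_after="${1#notAfter=}"
  local now="${2:-$(date +%s)}"
  local expiry
  expiry=$(date -u -d "$not_after" +%s) || return 1
  echo $(( (expiry - now) / 86400 ))
}

# Usage (example host and port from this guide):
#   end=$(echo | openssl s_client -connect sysmanage.example.com:8080 2>/dev/null \
#         | openssl x509 -noout -enddate)
#   cert_days_left "$end"
```

The result is truncated to whole days, so zero or a negative value means the certificate expires within a day or has already expired.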

Authentication Issues

Agent Registration Failures

Symptoms:
  • Agent appears offline in server dashboard
  • "Registration failed" messages in logs
  • HTTP 401 or 403 errors
Solutions:
  1. Check agent approval status:
    # Log into SysManage web interface
    # Navigate to Hosts > Pending Approval
    # Approve the agent if it appears in pending list
  2. Verify hostname detection:
    # Check system hostname
    hostname
    hostname -f
    
    # Override in agent config if needed
    client:
      hostname_override: "my-custom-hostname"
  3. Clear agent database:
    # Stop agent
    systemctl stop sysmanage-agent
    
    # Remove agent database
    rm /path/to/agent.db
    
    # Restart agent (will re-register)
    systemctl start sysmanage-agent

Performance Issues

High CPU Usage

Symptoms:
  • Agent process consuming high CPU
  • System slowdown when agent is running
  • Frequent data collection in logs
Solutions:
  1. Adjust collection intervals:
    # Increase collection intervals
    collection:
      intervals:
        system_info: 600      # 10 minutes (was 5)
        software: 7200        # 2 hours (was 1)
        hardware: 3600        # 1 hour (was 30 min)
        network: 600          # 10 minutes (was 5)
  2. Disable unnecessary collection:
    # Disable specific collection types
    collection:
      types:
        available_packages: false  # Expensive operation
        user_accounts: false       # If not needed
        system_metrics: false      # If monitoring elsewhere
  3. Monitor agent resource usage:
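    # Monitor CPU and memory usage (-d, joins multiple PIDs with commas, as top and htop expect)
    top -p "$(pgrep -d, sysmanage-agent)"
    htop -p "$(pgrep -d, sysmanage-agent)"
    
    # List the processes using the most memory (-e includes processes from all sessions)
    ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head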

High Memory Usage

Symptoms:
  • Agent memory usage continuously growing
  • System running out of memory
  • OOM (Out of Memory) killer terminating agent
Solutions:
  1. Enable memory monitoring:
    # Monitor memory usage over time (-e lists all processes, not just this terminal's)
    while true; do
      echo "$(date): $(ps -eo pid,ppid,cmd,%mem --sort=-%mem | grep '[s]ysmanage-agent')"
      sleep 60
    done > memory_usage.log
  2. Reduce data collection scope:
    # Limit large data collections
    collection:
      types:
        available_packages: false  # Large dataset
        hardware_info: false       # If static
    
    # Reduce message queue size
    message_queue:
      max_size: 1000
      cleanup_interval_minutes: 15
  3. Implement memory limits:
    # Systemd service limits
    [Service]
    MemoryMax=256M
    MemoryHigh=200M
    
    # Or use cgroups directly (cgroup v1 path shown; on cgroup v2 write to memory.max instead)
    echo "256M" > /sys/fs/cgroup/memory/sysmanage-agent/memory.limit_in_bytes
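
Continuous growth is easier to judge from numeric samples than from raw ps output. The sketch below assumes a sample log with one "epoch rss_kb" pair per line and reports the change between the first and last sample. The function name rss_growth_kb and the file name rss_samples.log are hypothetical.

```shell
# Hypothetical helper: read "epoch rss_kb" samples on stdin and print
# the change in resident memory (KB) between the first and last sample.
# Lines with fewer than two fields (for example, when the process was gone) are skipped.
rss_growth_kb() {
  awk 'NF >= 2 { if (!seen) { first = $2; seen = 1 } last = $2 }
       END { if (seen) print last - first; else exit 1 }'
}

# Collecting samples (the PID lookup is an assumption about how the agent runs):
#   pid=$(pgrep -o sysmanage-agent)
#   while sleep 60; do echo "$(date +%s) $(ps -o rss= -p "$pid")"; done >> rss_samples.log
# Analysing them:
#   rss_growth_kb < rss_samples.log
```

A value that keeps rising across several hours of samples points to a leak; a value that levels off after startup usually does not.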

Package Management Issues

Package Manager Not Detected

Symptoms:
  • "No package manager found" errors
  • Empty package inventory
  • Package operations failing
Solutions:
  1. Verify package manager installation:
    # Check for package managers
    which apt apt-get yum dnf pkg pkg_add brew choco
    
    # Verify they're in PATH
    echo $PATH
    
    # Test package manager directly
    apt --version
    yum --version
    pkg --version
  2. Check permissions:
    # Verify agent user can access package manager
    sudo -u sysmanage apt list --installed
    sudo -u sysmanage yum list installed
    
    # Check sudo configuration for package management
    sudo -l -U sysmanage
  3. Configure custom package manager paths:
    # Custom package manager configuration
    package_management:
      custom_paths:
        apt: "/usr/bin/apt"
        brew: "/opt/homebrew/bin/brew"
        pkg: "/usr/local/sbin/pkg"
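
The detection check in step 1 can be wrapped in a small loop that reports where each package manager resolves on PATH. This is an illustrative sketch only; find_pkg_managers is a hypothetical name, and the agent's own detection logic may differ.

```shell
# Hypothetical helper: print each candidate command that is found on PATH,
# together with its resolved location. Returns non-zero if none is found.
find_pkg_managers() {
  local cmd path found=1
  for cmd in "$@"; do
    if path=$(command -v "$cmd" 2>/dev/null); then
      echo "$cmd: $path"
      found=0
    fi
  done
  return "$found"
}

# Usage, with the candidates listed in step 1:
#   find_pkg_managers apt apt-get yum dnf pkg pkg_add brew choco \
#     || echo "No package manager found on PATH: $PATH"
```

Running it as the agent's user is the more useful test, since that user's PATH may differ from an interactive shell's.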

Service Management Issues

Agent Service Won't Start

Symptoms:
  • Service fails to start
  • "Service failed to start" messages
  • Immediate service exit after start
Solutions:
  1. Check service status and logs:
    # Check service status
    systemctl status sysmanage-agent
    journalctl -u sysmanage-agent --no-pager
    
    # Check for configuration errors
    sysmanage-agent --config-test
    
    # Try starting manually
    sudo -u sysmanage sysmanage-agent --verbose
  2. Verify file permissions:
    # Check binary permissions
    ls -la /usr/local/bin/sysmanage-agent
    
    # Check configuration permissions
    ls -la /etc/sysmanage-agent/
    ls -la /var/log/sysmanage-agent/
    
    # Fix permissions if needed
    chmod +x /usr/local/bin/sysmanage-agent
    chown -R sysmanage:sysmanage /etc/sysmanage-agent/
  3. Check dependencies:
    # Check required libraries
    ldd /usr/local/bin/sysmanage-agent
    
    # Verify Python dependencies (if applicable)
    pip list | grep -iE "(requests|websocket|pyyaml)"   # -i: pip reports the package as PyYAML
    
    # Check system dependencies
    python3 --version
    systemctl --version

Debugging Tools and Techniques

Advanced Logging

# Enable debug logging
logging:
  level: "DEBUG"
  format: "%(asctime)s [%(levelname)s] %(name)s: %(message)s"

# Increase log verbosity for specific components
logging:
  loggers:
    websocket: "DEBUG"
    package_manager: "DEBUG"
    collection: "INFO"

# Log to separate files
logging:
  handlers:
    file:
      filename: "/var/log/sysmanage-agent/debug.log"
      level: "DEBUG"
    console:
      level: "WARNING"

Network Debugging

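# Monitor network connections (-p shows the owning process and requires root;
# with only -tuln the output never contains the agent's name)
netstat -tunp | grep sysmanage
ss -tunp | grep sysmanage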

# Trace network requests
strace -e trace=network -p $(pgrep sysmanage-agent)

# Monitor DNS queries (they go to the resolver, so do not filter on the server host)
sudo tcpdump -i any -n port 53

# Check WebSocket connections
sudo tcpdump -i any port 8080 and host sysmanage.example.com

# Test the WebSocket handshake with curl (force HTTP/1.1; Upgrade is not used in HTTP/2)
curl -v --http1.1 -H "Upgrade: websocket" \
     -H "Connection: Upgrade" \
     -H "Sec-WebSocket-Key: $(openssl rand -base64 16)" \
     -H "Sec-WebSocket-Version: 13" \
     https://sysmanage.example.com:8080/ws

Process Debugging

# Monitor system calls
strace -f -p $(pgrep sysmanage-agent)

# Monitor file access
strace -e trace=file -p $(pgrep sysmanage-agent)

# Check open files
lsof -p $(pgrep sysmanage-agent)

# Monitor resource usage
pidstat -p $(pgrep sysmanage-agent) 1

# Check for deadlocks (if applicable)
gdb -p $(pgrep sysmanage-agent)
(gdb) thread apply all bt
(gdb) quit

Log Analysis

Common Log Locations

# Agent logs
/var/log/sysmanage-agent.log
/var/log/sysmanage-agent/agent.log
~/logs/agent.log

# System logs
/var/log/syslog          # Ubuntu/Debian
/var/log/messages        # RHEL/CentOS
/var/log/system.log      # macOS
C:\Windows\System32\winevt\Logs\  # Windows

# Service logs
journalctl -u sysmanage-agent
systemctl status sysmanage-agent

Important Log Patterns

Connection Issues

# Connection failures
grep -i "connection.*failed\|timeout\|refused" /var/log/sysmanage-agent.log

# SSL/TLS issues
grep -i "ssl\|tls\|certificate\|handshake" /var/log/sysmanage-agent.log

# Authentication failures
grep -i "auth.*failed\|unauthorized\|forbidden" /var/log/sysmanage-agent.log

Performance Issues

# High resource usage warnings
grep -i "memory\|cpu\|performance\|slow" /var/log/sysmanage-agent.log

# Collection timeouts
grep -i "collection.*timeout\|collection.*failed" /var/log/sysmanage-agent.log

# Queue issues
grep -i "queue.*full\|message.*expired" /var/log/sysmanage-agent.log

Configuration Issues

# Configuration errors
grep -i "config.*error\|invalid.*config\|missing.*config" /var/log/sysmanage-agent.log

# Permission issues
grep -i "permission.*denied\|access.*denied\|not.*permitted" /var/log/sysmanage-agent.log

# File system issues
grep -i "no.*space\|disk.*full\|read.*only" /var/log/sysmanage-agent.log
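
The individual searches above can be combined into a quick summary that shows which category dominates a log. The sketch below reuses a subset of the patterns from this section in extended-regex form; the function name summarize_agent_log is hypothetical, and the default path is the log location used throughout this guide.

```shell
# Hypothetical helper: count log lines matching each pattern group from this section.
# $1: path to the agent log (defaults to the location used in this guide)
summarize_agent_log() {
  local log="${1:-/var/log/sysmanage-agent.log}"
  [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
  local name pattern
  # Each line below is "label=extended regex"; grep -c counts matching lines.
  while IFS='=' read -r name pattern; do
    printf '%-12s %s\n' "$name" "$(grep -c -i -E "$pattern" "$log")"
  done <<'EOF'
connection=connection.*failed|timeout|refused
tls=ssl|tls|certificate|handshake
auth=auth.*failed|unauthorized|forbidden
permission=permission.*denied|access.*denied|not.*permitted
disk=no.*space|disk.*full|read.*only
EOF
}

# Usage:
#   summarize_agent_log                      # default log path
#   summarize_agent_log /var/log/sysmanage-agent/agent.log
```

The counts are per line, so a line that matches two alternatives in the same group is counted once.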

Recovery Procedures

Agent Recovery Scenarios

Complete Agent Reset

When the agent is completely unresponsive or its local state is corrupted:

# 1. Stop the agent service
systemctl stop sysmanage-agent

# 2. Back up configuration (optional)
cp /etc/sysmanage-agent/sysmanage-agent.yaml /tmp/

# 3. Remove agent database and cache
rm -f /var/lib/sysmanage-agent/agent.db
rm -rf /var/cache/sysmanage-agent/*

# 4. Clear logs (optional)
> /var/log/sysmanage-agent.log

# 5. Restart agent service
systemctl start sysmanage-agent

# 6. Monitor startup
journalctl -u sysmanage-agent -f

Network Configuration Reset

When experiencing persistent connection issues:

# 1. Test basic connectivity
ping -c 3 sysmanage.example.com
telnet sysmanage.example.com 8080

# 2. Clear DNS cache
systemctl restart systemd-resolved  # Ubuntu 18+
service nscd restart                 # Older systems

# 3. Reset network configuration in agent
systemctl stop sysmanage-agent

# Edit the agent configuration to use the server IP instead of the hostname
server:
  hostname: "192.168.1.100"  # Use server IP
  port: 8080
  use_https: false           # Temporarily disable SSL (testing only; re-enable afterwards)

systemctl start sysmanage-agent

# 4. Monitor connection attempts
tail -f /var/log/sysmanage-agent.log | grep -i connect

Permission Recovery

When facing permission-related issues:

# 1. Fix file ownership
chown -R sysmanage:sysmanage /etc/sysmanage-agent/
chown -R sysmanage:sysmanage /var/log/sysmanage-agent/
chown -R sysmanage:sysmanage /var/lib/sysmanage-agent/

# 2. Set correct permissions
chmod 755 /usr/local/bin/sysmanage-agent
chmod 640 /etc/sysmanage-agent/sysmanage-agent.yaml
chmod 755 /var/log/sysmanage-agent/
chmod 755 /var/lib/sysmanage-agent/

# 3. Verify user exists
id sysmanage
getent passwd sysmanage

# 4. Check sudo configuration
sudo -l -U sysmanage

# 5. Test basic operations
sudo -u sysmanage /usr/local/bin/sysmanage-agent --version
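
After applying the fixes, the modes from step 2 can be verified rather than assumed. The following sketch compares a path's octal mode with an expected value. It assumes GNU stat (stat -c); BSD stat uses different flags. The function name check_mode is hypothetical.

```shell
# Hypothetical helper: compare a path's octal mode with the expected value.
# Returns 0 on match, 1 on mismatch, 2 if the path cannot be inspected.
# Assumes GNU stat (`stat -c`).
check_mode() {
  local path="$1" expected="$2" actual
  actual=$(stat -c '%a' "$path") || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK   $path ($actual)"
  else
    echo "FAIL $path (mode $actual, expected $expected)"
    return 1
  fi
}

# Usage with the values from step 2:
#   check_mode /usr/local/bin/sysmanage-agent 755
#   check_mode /etc/sysmanage-agent/sysmanage-agent.yaml 640
```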

Monitoring Agent Health

Automated Health Checks

#!/bin/bash
# Agent health check script

AGENT_NAME="sysmanage-agent"
LOG_FILE="/var/log/sysmanage-agent.log"
PID_FILE="/var/run/sysmanage-agent.pid"

# Check if service is running
if ! systemctl is-active --quiet $AGENT_NAME; then
    echo "ERROR: $AGENT_NAME service is not running"
    exit 1
fi

# Check if process exists
if [ -f "$PID_FILE" ]; then
    PID=$(cat "$PID_FILE")
    if ! kill -0 "$PID" 2>/dev/null; then
        echo "ERROR: Process $PID not found"
        exit 1
    fi
else
    echo "WARNING: PID file not found"
    # Fall back to the main PID reported by systemd
    PID=$(systemctl show -p MainPID --value "$AGENT_NAME")
fi

# Check log for recent errors
if [ -f "$LOG_FILE" ]; then
    ERRORS=$(tail -n 100 "$LOG_FILE" | grep -c -i error)
    if [ "$ERRORS" -gt 5 ]; then
        echo "WARNING: Found $ERRORS recent errors in log"
    fi
fi

# Check memory usage (skipped if the PID could not be determined)
MEM_USAGE=$(ps -o %mem -p "$PID" --no-headers 2>/dev/null | tr -d ' ')
if [ -n "$MEM_USAGE" ] && (( $(echo "$MEM_USAGE > 10.0" | bc -l) )); then
    echo "WARNING: High memory usage: ${MEM_USAGE}%"
fi

# Check disk space
DISK_USAGE=$(df /var/log | tail -1 | awk '{print $5}' | sed 's/%//')
if [ $DISK_USAGE -gt 80 ]; then
    echo "WARNING: High disk usage: ${DISK_USAGE}%"
fi

echo "Agent health check completed successfully"

Setting Up Alerts

# Systemd service monitoring (systemctl status alerts)
# Add to /etc/systemd/system/sysmanage-agent-monitor.service

[Unit]
Description=SysManage Agent Monitor
After=sysmanage-agent.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/check-agent-health.sh
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

# Create timer for regular checks
# /etc/systemd/system/sysmanage-agent-monitor.timer

[Unit]
Description=Run SysManage Agent Health Check
Requires=sysmanage-agent-monitor.service

[Timer]
# Every 5 minutes (systemd does not allow a trailing comment on a setting line)
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target

# Reload systemd so it picks up the new unit files, then enable and start the timer
systemctl daemon-reload
systemctl enable --now sysmanage-agent-monitor.timer

Getting Support

Information to Gather Before Requesting Support

#!/bin/bash
# Support information gathering script

echo "=== SysManage Agent Support Information ==="
echo "Date: $(date)"
echo

echo "=== System Information ==="
uname -a
lsb_release -a 2>/dev/null || cat /etc/os-release
echo

echo "=== Agent Information ==="
sysmanage-agent --version
systemctl status sysmanage-agent --no-pager
echo

echo "=== Configuration ==="
echo "Configuration file:"
cat /etc/sysmanage-agent/sysmanage-agent.yaml
echo

echo "=== Recent Logs ==="
echo "Last 50 lines of agent log:"
tail -50 /var/log/sysmanage-agent.log
echo

echo "=== System Resources ==="
free -h
df -h
ps aux | grep '[s]ysmanage-agent'
echo

echo "=== Network Connectivity ==="
ping -c 3 sysmanage.example.com
netstat -tunp | grep sysmanage
echo

echo "=== Environment ==="
env | grep -i sysmanage
env | grep -i proxy
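
Because the script above prints the full configuration file, it can help to mask sensitive values before sending the output to anyone. The sketch below is one way to do that with awk. The key names it matches (password, secret, token, api_key) are assumptions for illustration, not a list of actual SysManage settings, and redact_config is a hypothetical name.

```shell
# Hypothetical helper: mask values of sensitive-looking YAML keys before sharing.
# The key names matched here are assumptions, not actual SysManage settings.
# Only "key: value" lines are changed; keys with no inline value are left alone.
redact_config() {
  awk '{
    key = $0
    sub(/:.*/, "", key)
    if (tolower(key) ~ /password|secret|token|api_key/ && $0 ~ /:[ \t]*[^ \t]/) {
      sub(/:.*/, ": \"<redacted>\"")
    }
    print
  }'
}

# Usage:
#   redact_config < /etc/sysmanage-agent/sysmanage-agent.yaml
```

Review the output by hand as well, since a pattern match cannot catch every secret.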

Support Channels

Prevention and Best Practices

Preventive Measures

  • Regular Updates: Keep agent software updated to the latest stable version
  • Configuration Validation: Test configuration changes in development first
  • Monitoring Setup: Implement proactive monitoring and alerting
  • Backup Procedures: Regular backup of agent configuration and database
  • Log Rotation: Implement proper log rotation to prevent disk space issues
  • Resource Monitoring: Monitor CPU, memory, and disk usage trends
  • Network Monitoring: Monitor network connectivity and certificate expiry
  • Documentation: Maintain documentation of custom configurations

Regular Maintenance Schedule

Frequency | Task                 | Description
Daily     | Health Check         | Verify agent status and basic connectivity
Weekly    | Log Review           | Review agent logs for errors and warnings
Monthly   | Update Check         | Check for agent software updates
Monthly   | Resource Review      | Review CPU, memory, and disk usage trends
Quarterly | Configuration Review | Review and optimize agent configuration
Quarterly | Certificate Check    | Verify SSL certificate expiry dates