Agent Troubleshooting
A troubleshooting guide for SysManage agents, covering common issues, debugging techniques, and resolution strategies.
Overview
This guide provides systematic approaches to diagnosing and resolving common issues with SysManage agents. Follow the troubleshooting steps in order, starting with basic connectivity checks and progressing to advanced debugging techniques.
Troubleshooting Methodology
- Identify the Problem: Gather symptoms and error messages
- Check Basic Requirements: Verify system requirements and configuration
- Review Logs: Examine agent and system logs for clues
- Test Components: Isolate and test individual components
- Apply Solutions: Implement targeted fixes
- Verify Resolution: Confirm the issue is resolved
Quick Diagnostics
Start with these quick checks to identify common issues.
Basic Health Check
# Check if agent is running
systemctl status sysmanage-agent
ps aux | grep sysmanage-agent
# Check agent logs
tail -f /var/log/sysmanage-agent.log
journalctl -u sysmanage-agent -f
# Test server connectivity
telnet sysmanage.example.com 8080
curl -k https://sysmanage.example.com:8080/health
# Check configuration syntax
sysmanage-agent --config-test
# Verify permissions
ls -la /etc/sysmanage-agent/
whoami
Agent Status Indicators
Status | Description | Action Required |
---|---|---|
🟢 Online | Agent connected and functioning normally | None |
🟡 Connecting | Agent attempting to connect to server | Check network connectivity |
🔴 Offline | Agent cannot connect to server | Check configuration and network |
⚫ Stopped | Agent service is not running | Start the agent service |
🟠 Error | Agent running but experiencing errors | Check logs for specific errors |
Common Issues
Connection Issues
Agent Cannot Connect to Server
Symptoms:
- Agent shows "Offline" or "Connecting" status
- Connection timeout errors in logs
- Agent continuously retrying connection
Solutions:
- Verify server configuration:
# Check server hostname and port
ping sysmanage.example.com
telnet sysmanage.example.com 8080
# Test HTTPS connection
curl -k https://sysmanage.example.com:8080/health
- Check firewall rules:
# Linux (iptables); -n prints numeric ports so the grep can match
iptables -L -n | grep 8080
# Linux (firewalld)
firewall-cmd --list-ports
# FreeBSD (pf)
pfctl -sr | grep 8080
# Check if port is listening on server
netstat -tlnp | grep 8080
- Verify DNS resolution:
# Test DNS resolution
nslookup sysmanage.example.com
dig sysmanage.example.com
# Check /etc/hosts for overrides
grep sysmanage /etc/hosts
- Check proxy settings:
# Check environment variables
echo $HTTP_PROXY
echo $HTTPS_PROXY
echo $NO_PROXY
# Test direct connection
curl --noproxy '*' https://sysmanage.example.com:8080
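The manual checks above can be narrowed down by looking at how `curl` fails. The following is a minimal sketch, not part of the agent: `explain_curl_exit` and `probe_server` are hypothetical helper names, the exit codes are the ones curl documents, and the advice strings are illustrative.

```shell
# Map a curl exit status to the troubleshooting step it most likely points to.
explain_curl_exit() {
    case "$1" in
        0)     echo "ok: server reachable" ;;
        5)     echo "proxy: could not resolve proxy, check proxy settings" ;;
        6)     echo "dns: could not resolve host, check DNS resolution" ;;
        7)     echo "tcp: connection failed, check firewall rules and port" ;;
        28)    echo "timeout: check firewall rules and routing" ;;
        35|60) echo "tls: see SSL/TLS Certificate Issues" ;;
        *)     echo "other: curl exit $1, review agent logs" ;;
    esac
}

# Probe a URL (certificate verification left on) and explain the result.
# Usage: probe_server https://sysmanage.example.com:8080/health
probe_server() {
    curl --silent --output /dev/null --max-time 10 "$1"
    explain_curl_exit "$?"
}
```

Unlike the `curl -k` commands above, `probe_server` keeps certificate verification enabled, so a certificate problem is reported as a TLS failure instead of being hidden.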
SSL/TLS Certificate Issues
Symptoms:
- SSL certificate verification errors
- "Certificate has expired" messages
- "Hostname mismatch" errors
Solutions:
- Test certificate validity:
# Check certificate details
openssl s_client -connect sysmanage.example.com:8080 -servername sysmanage.example.com
# Verify certificate expiration
echo | openssl s_client -connect sysmanage.example.com:8080 2>/dev/null | openssl x509 -noout -dates
# Check certificate chain
curl -vvI https://sysmanage.example.com:8080
- Temporary SSL bypass (development only):
# In agent configuration
server:
  hostname: "sysmanage.example.com"
  port: 8080
  use_https: true
  verify_ssl: false  # ONLY for development/testing
- Update CA certificates:
# Ubuntu/Debian
apt update && apt install ca-certificates
update-ca-certificates
# RHEL/CentOS/Fedora
yum update ca-certificates
# or
dnf update ca-certificates
# FreeBSD
pkg update && pkg install ca_root_nss
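To turn the expiration check into a number that can be compared against a threshold, the `notAfter=` line printed by `openssl x509 -noout -enddate` can be converted into days remaining. This is a sketch under assumptions: `days_until_expiry` is a hypothetical helper, and it relies on GNU `date -d` to parse the openssl timestamp.

```shell
# Convert an "openssl x509 -noout -enddate" line into whole days remaining.
# $1: line such as "notAfter=Jan  1 00:00:00 2030 GMT"
# $2: current time as epoch seconds (defaults to now)
# Assumes GNU date, which can parse the openssl timestamp with -d.
days_until_expiry() (
    not_after=${1#notAfter=}
    now=${2:-$(date +%s)}
    expiry=$(date -u -d "$not_after" +%s) || exit 1
    echo $(( (expiry - now) / 86400 ))
)

# Example against a live server (not run here):
# end_line=$(echo | openssl s_client -connect sysmanage.example.com:8080 2>/dev/null \
#     | openssl x509 -noout -enddate)
# days_until_expiry "$end_line"
```

A negative result means the certificate has already expired.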
Authentication Issues
Agent Registration Failures
Symptoms:
- Agent appears offline in server dashboard
- "Registration failed" messages in logs
- HTTP 401 or 403 errors
Solutions:
- Check agent approval status:
# Log into SysManage web interface
# Navigate to Hosts > Pending Approval
# Approve the agent if it appears in pending list
- Verify hostname detection:
# Check system hostname
hostname
hostname -f
# Override in agent config if needed
client:
  hostname_override: "my-custom-hostname"
- Clear agent database:
# Stop agent
systemctl stop sysmanage-agent
# Remove agent database
rm /path/to/agent.db
# Restart agent (will re-register)
systemctl start sysmanage-agent
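Because removing the agent database is irreversible, it may be safer to move the file aside first. The sketch below shows one way to do that; `backup_then_remove` is a hypothetical helper, and `/path/to/agent.db` remains a placeholder for the actual database location.

```shell
# Move a file aside with a timestamp suffix instead of deleting it outright.
# Prints the backup path on success so the file can be restored later.
backup_then_remove() (
    src=$1
    [ -f "$src" ] || { echo "not found: $src" >&2; exit 1; }
    dest="$src.bak.$(date +%Y%m%d%H%M%S)"
    mv -- "$src" "$dest" && echo "$dest"
)

# Example, run while the agent is stopped (not run here):
# backup_then_remove /path/to/agent.db
```

Once the agent has re-registered successfully, the backup file can be deleted.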
Performance Issues
High CPU Usage
Symptoms:
- Agent process consuming high CPU
- System slowdown when agent is running
- Frequent data collection in logs
Solutions:
- Adjust collection intervals:
# Increase collection intervals
collection:
  intervals:
    system_info: 600  # 10 minutes (was 5)
    software: 7200    # 2 hours (was 1)
    hardware: 3600    # 1 hour (was 30 min)
    network: 600      # 10 minutes (was 5)
- Disable unnecessary collection:
# Disable specific collection types
collection:
  types:
    available_packages: false  # Expensive operation
    user_accounts: false       # If not needed
    system_metrics: false      # If monitoring elsewhere
- Monitor agent resource usage:
# Monitor CPU and memory usage
top -p $(pgrep sysmanage-agent)
htop -p $(pgrep sysmanage-agent)
# Check for memory leaks (-e lists all processes, not just the current session)
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head
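To see what the interval change in the first solution buys, it helps to count how often each collection runs per day before and after. The arithmetic below is only an illustration; `runs_per_day` is a hypothetical helper, and the old values are taken from the "(was ...)" comments in the example configuration.

```shell
# Number of collection runs per day for a given interval in seconds.
runs_per_day() {
    echo $(( 86400 / $1 ))
}

# Compare old and new intervals, written as name:old_seconds:new_seconds.
for pair in system_info:300:600 software:3600:7200 hardware:1800:3600 network:300:600; do
    name=${pair%%:*}
    rest=${pair#*:}
    old=${rest%%:*}
    new=${rest#*:}
    echo "$name: $(runs_per_day "$old") -> $(runs_per_day "$new") runs/day"
done
```

Doubling every interval halves the number of runs, for example from 288 to 144 per day for `system_info`.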
High Memory Usage
Symptoms:
- Agent memory usage continuously growing
- System running out of memory
- OOM (Out of Memory) killer terminating agent
Solutions:
- Enable memory monitoring:
# Monitor memory usage over time
# ([s] keeps grep from matching its own command line)
while true; do
    echo "$(date): $(ps -eo pid,ppid,cmd,%mem --sort=-%mem | grep '[s]ysmanage-agent')"
    sleep 60
done > memory_usage.log
- Reduce data collection scope:
# Limit large data collections
collection:
  types:
    available_packages: false  # Large dataset
    hardware_info: false       # If static
# Reduce message queue size
message_queue:
  max_size: 1000
  cleanup_interval_minutes: 15
- Implement memory limits:
# Systemd service limits
[Service]
MemoryMax=256M
MemoryHigh=200M
# Or use cgroups directly (cgroup v1 path)
echo "256M" > /sys/fs/cgroup/memory/sysmanage-agent/memory.limit_in_bytes
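To tell a leak from a one-off spike, it can help to record numeric samples and compute a growth rate. The sketch below is one possible approach, not an agent feature: `sample_rss` and `rss_growth_per_hour` are hypothetical helpers, and the "epoch rss_kb" sample format is an assumption chosen for easy parsing.

```shell
# Print one "epoch rss_kb" sample for a PID (RSS in KiB as reported by ps).
sample_rss() {
    rss=$(ps -o rss= -p "$1") || return 1
    echo "$(date +%s) $(echo "$rss" | tr -d ' ')"
}

# Read "epoch rss_kb" samples on stdin and print growth in KiB per hour,
# based on the first and last sample.
rss_growth_per_hour() {
    awk 'NR == 1 { t0 = $1; r0 = $2 }
         { t1 = $1; r1 = $2 }
         END {
             if (NR < 2 || t1 == t0) { print "n/a"; exit }
             printf "%d\n", (r1 - r0) * 3600 / (t1 - t0)
         }'
}

# Example collection loop (not run here):
# while true; do sample_rss "$(pgrep sysmanage-agent)" >> rss_samples.log; sleep 60; done
# rss_growth_per_hour < rss_samples.log
```

A rate that stays positive over many hours suggests steady growth, while a rate near zero suggests the usage has plateaued.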
Package Management Issues
Package Manager Not Detected
Symptoms:
- "No package manager found" errors
- Empty package inventory
- Package operations failing
Solutions:
- Verify package manager installation:
# Check for package managers
which apt apt-get yum dnf pkg pkg_add brew choco
# Verify they're in PATH
echo $PATH
# Test package manager directly
apt --version
yum --version
pkg --version
- Check permissions:
# Verify agent user can access package manager
sudo -u sysmanage apt list --installed
sudo -u sysmanage yum list installed
# Check sudo configuration for package management
sudo -l -U sysmanage
- Configure custom package manager paths:
# Custom package manager configuration
package_management:
  custom_paths:
    apt: "/usr/bin/apt"
    brew: "/opt/homebrew/bin/brew"
    pkg: "/usr/local/sbin/pkg"
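The manual `which` check above can be scripted to report the first package manager found in the current `PATH`. This is a sketch only: `first_available` is a hypothetical helper, the candidate order is an assumption, and it does not claim to mirror how the agent itself detects package managers.

```shell
# Print the first command from the argument list that exists in PATH.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Same candidates as the manual check, in an assumed order of preference.
if detected=$(first_available apt dnf yum pkg pkg_add brew choco); then
    echo "Detected package manager: $detected"
else
    echo "No package manager found in PATH: $PATH"
fi
```

Running the script as the agent user (for example with `sudo -u sysmanage sh`) shows what is visible in that user's `PATH`, which may differ from an interactive shell.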
Service Management Issues
Agent Service Won't Start
Symptoms:
- Service fails to start
- "Service failed to start" messages
- Immediate service exit after start
Solutions:
- Check service status and logs:
# Check service status
systemctl status sysmanage-agent
journalctl -u sysmanage-agent --no-pager
# Check for configuration errors
sysmanage-agent --config-test
# Try starting manually
sudo -u sysmanage sysmanage-agent --verbose
- Verify file permissions:
# Check binary permissions
ls -la /usr/local/bin/sysmanage-agent
# Check configuration permissions
ls -la /etc/sysmanage-agent/
ls -la /var/log/sysmanage-agent/
# Fix permissions if needed
chmod +x /usr/local/bin/sysmanage-agent
chown -R sysmanage:sysmanage /etc/sysmanage-agent/
- Check dependencies:
# Check required libraries
ldd /usr/local/bin/sysmanage-agent
# Verify Python dependencies (if applicable); -i because pip lists PyYAML in mixed case
pip list | grep -iE "(requests|websocket|pyyaml)"
# Check system dependencies
python3 --version
systemctl --version
Debugging Tools and Techniques
Advanced Logging
# Enable debug logging
logging:
  level: "DEBUG"
  format: "%(asctime)s [%(levelname)s] %(name)s: %(message)s"
# Increase log verbosity for specific components
logging:
  loggers:
    websocket: "DEBUG"
    package_manager: "DEBUG"
    collection: "INFO"
# Log to separate files
logging:
  handlers:
    file:
      filename: "/var/log/sysmanage-agent/debug.log"
      level: "DEBUG"
    console:
      level: "WARNING"
Network Debugging
# Monitor network connections
# (-p adds the owning process name so the grep can match; run as root)
netstat -tunp | grep sysmanage
ss -tunp | grep sysmanage
# Trace network requests
strace -e trace=network -p $(pgrep sysmanage-agent)
# Monitor DNS queries
# (DNS traffic goes to the resolver, not to the SysManage server, so filter on port only)
sudo tcpdump -i any port 53
# Check WebSocket connections
tcpdump -i any port 8080 and host sysmanage.example.com
# Test with curl
curl -v -H "Upgrade: websocket" \
-H "Connection: Upgrade" \
-H "Sec-WebSocket-Key: $(openssl rand -base64 16)" \
-H "Sec-WebSocket-Version: 13" \
https://sysmanage.example.com:8080/ws
Process Debugging
# Monitor system calls
strace -f -p $(pgrep sysmanage-agent)
# Monitor file access
strace -e trace=file -p $(pgrep sysmanage-agent)
# Check open files
lsof -p $(pgrep sysmanage-agent)
# Monitor resource usage
pidstat -p $(pgrep sysmanage-agent) 1
# Check for deadlocks (if applicable)
gdb -p $(pgrep sysmanage-agent)
(gdb) thread apply all bt
(gdb) quit
Log Analysis
Common Log Locations
# Agent logs
/var/log/sysmanage-agent.log
/var/log/sysmanage-agent/agent.log
~/logs/agent.log
# System logs
/var/log/syslog # Ubuntu/Debian
/var/log/messages # RHEL/CentOS
/var/log/system.log # macOS
C:\Windows\System32\winevt\Logs\ # Windows
# Service logs
journalctl -u sysmanage-agent
systemctl status sysmanage-agent
Important Log Patterns
Connection Issues
# Connection failures
grep -i "connection.*failed\|timeout\|refused" /var/log/sysmanage-agent.log
# SSL/TLS issues
grep -i "ssl\|tls\|certificate\|handshake" /var/log/sysmanage-agent.log
# Authentication failures
grep -i "auth.*failed\|unauthorized\|forbidden" /var/log/sysmanage-agent.log
Performance Issues
# High resource usage warnings
grep -i "memory\|cpu\|performance\|slow" /var/log/sysmanage-agent.log
# Collection timeouts
grep -i "collection.*timeout\|collection.*failed" /var/log/sysmanage-agent.log
# Queue issues
grep -i "queue.*full\|message.*expired" /var/log/sysmanage-agent.log
Configuration Issues
# Configuration errors
grep -i "config.*error\|invalid.*config\|missing.*config" /var/log/sysmanage-agent.log
# Permission issues
grep -i "permission.*denied\|access.*denied\|not.*permitted" /var/log/sysmanage-agent.log
# File system issues
grep -i "no.*space\|disk.*full\|read.*only" /var/log/sysmanage-agent.log
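The patterns in this section can be combined into a quick per-category count, which shows at a glance where to look first. This is a sketch: `summarize_log` is a hypothetical helper, and it uses `grep -E` with `|` in place of the `\|` alternation shown above.

```shell
# Count log lines per problem category using the patterns from this section.
# $1: path to the agent log file
summarize_log() (
    log=$1
    [ -r "$log" ] || { echo "cannot read $log" >&2; exit 1; }
    # grep -c prints the number of matching lines (0 if none)
    count() { grep -c -i -E "$1" "$log"; }
    echo "connection=$(count 'connection.*failed|timeout|refused')"
    echo "tls=$(count 'ssl|tls|certificate|handshake')"
    echo "auth=$(count 'auth.*failed|unauthorized|forbidden')"
    echo "permission=$(count 'permission.*denied|access.*denied|not.*permitted')"
    echo "disk=$(count 'no.*space|disk.*full|read.*only')"
)

# Example (not run here):
# summarize_log /var/log/sysmanage-agent.log
```

The category with the highest count is usually the best starting point for the individual grep commands.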
Recovery Procedures
Agent Recovery Scenarios
Complete Agent Reset
When the agent is completely unresponsive or its local state is corrupted:
# 1. Stop the agent service
systemctl stop sysmanage-agent
# 2. Backup configuration (optional)
cp /etc/sysmanage-agent/sysmanage-agent.yaml /tmp/
# 3. Remove agent database and cache
rm -f /var/lib/sysmanage-agent/agent.db
rm -rf /var/cache/sysmanage-agent/*
# 4. Clear logs (optional)
> /var/log/sysmanage-agent.log
# 5. Restart agent service
systemctl start sysmanage-agent
# 6. Monitor startup
journalctl -u sysmanage-agent -f
Network Configuration Reset
When experiencing persistent connection issues:
# 1. Test basic connectivity
ping -c 3 sysmanage.example.com
telnet sysmanage.example.com 8080
# 2. Clear DNS cache
systemctl restart systemd-resolved # Ubuntu 18+
service nscd restart # Older systems
# 3. Reset network configuration in agent
systemctl stop sysmanage-agent
# Edit configuration to use IP instead of hostname
server:
  hostname: "192.168.1.100"  # Use server IP
  port: 8080
  use_https: false  # Temporarily disable SSL
systemctl start sysmanage-agent
# 4. Monitor connection attempts
tail -f /var/log/sysmanage-agent.log | grep -i connect
Permission Recovery
When facing permission-related issues:
# 1. Fix file ownership
chown -R sysmanage:sysmanage /etc/sysmanage-agent/
chown -R sysmanage:sysmanage /var/log/sysmanage-agent/
chown -R sysmanage:sysmanage /var/lib/sysmanage-agent/
# 2. Set correct permissions
chmod 755 /usr/local/bin/sysmanage-agent
chmod 640 /etc/sysmanage-agent/sysmanage-agent.yaml
chmod 755 /var/log/sysmanage-agent/
chmod 755 /var/lib/sysmanage-agent/
# 3. Verify user exists
id sysmanage
getent passwd sysmanage
# 4. Check sudo configuration
sudo -l -U sysmanage
# 5. Test basic operations
sudo -u sysmanage /usr/local/bin/sysmanage-agent --version
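After applying the permissions in step 2, the result can be verified mechanically. The sketch below assumes GNU `stat` (the `-c` option; BSD `stat` uses a different syntax), and `check_mode` is a hypothetical helper.

```shell
# Compare a path's octal permission bits with the expected value.
# Exit status: 0 on match, 1 on mismatch, 2 if the path cannot be read.
check_mode() (
    path=$1
    expected=$2
    actual=$(stat -c '%a' "$path") || exit 2
    if [ "$actual" = "$expected" ]; then
        echo "OK   $path ($actual)"
    else
        echo "FAIL $path (is $actual, expected $expected)"
        exit 1
    fi
)

# Example, mirroring the modes set in step 2 (not run here):
# check_mode /usr/local/bin/sysmanage-agent 755
# check_mode /etc/sysmanage-agent/sysmanage-agent.yaml 640
```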
Monitoring Agent Health
Automated Health Checks
#!/bin/bash
# Agent health check script
AGENT_NAME="sysmanage-agent"
LOG_FILE="/var/log/sysmanage-agent.log"
PID_FILE="/var/run/sysmanage-agent.pid"
# Check if service is running
if ! systemctl is-active --quiet "$AGENT_NAME"; then
    echo "ERROR: $AGENT_NAME service is not running"
    exit 1
fi
# Check if process exists; without a PID file, fall back to the service's main PID
if [ -f "$PID_FILE" ]; then
    PID=$(cat "$PID_FILE")
    if ! kill -0 "$PID" 2>/dev/null; then
        echo "ERROR: Process $PID not found"
        exit 1
    fi
else
    echo "WARNING: PID file not found"
    PID=$(systemctl show -p MainPID --value "$AGENT_NAME")
fi
# Check log for recent errors
if [ -f "$LOG_FILE" ]; then
    ERRORS=$(tail -n 100 "$LOG_FILE" | grep -c -i error)
    if [ "$ERRORS" -gt 5 ]; then
        echo "WARNING: Found $ERRORS recent errors in log"
    fi
fi
# Check memory usage (skipped if no PID could be determined)
if [ -n "$PID" ] && [ "$PID" != "0" ]; then
    MEM_USAGE=$(ps -o %mem -p "$PID" --no-headers | tr -d ' ')
    if [ -n "$MEM_USAGE" ] && (( $(echo "$MEM_USAGE > 10.0" | bc -l) )); then
        echo "WARNING: High memory usage: ${MEM_USAGE}%"
    fi
fi
# Check disk space
DISK_USAGE=$(df -P /var/log | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 80 ]; then
    echo "WARNING: High disk usage: ${DISK_USAGE}%"
fi
echo "Agent health check completed successfully"
Setting Up Alerts
# Systemd service monitoring (systemctl status alerts)
# Add to /etc/systemd/system/sysmanage-agent-monitor.service
[Unit]
Description=SysManage Agent Monitor
After=sysmanage-agent.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/check-agent-health.sh
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
# Create timer for regular checks
# /etc/systemd/system/sysmanage-agent-monitor.timer
[Unit]
Description=Run SysManage Agent Health Check
Requires=sysmanage-agent-monitor.service
[Timer]
# Every 5 minutes (systemd does not support trailing comments on a setting line)
OnCalendar=*:0/5
Persistent=true
[Install]
WantedBy=timers.target
# Reload unit files, then enable and start the timer
systemctl daemon-reload
systemctl enable sysmanage-agent-monitor.timer
systemctl start sysmanage-agent-monitor.timer
Getting Support
Information to Gather Before Requesting Support
#!/bin/bash
# Support information gathering script
echo "=== SysManage Agent Support Information ==="
echo "Date: $(date)"
echo
echo "=== System Information ==="
uname -a
lsb_release -a 2>/dev/null || cat /etc/os-release
echo
echo "=== Agent Information ==="
sysmanage-agent --version
systemctl status sysmanage-agent
echo
echo "=== Configuration ==="
echo "Configuration file:"
cat /etc/sysmanage-agent/sysmanage-agent.yaml
echo
echo "=== Recent Logs ==="
echo "Last 50 lines of agent log:"
tail -50 /var/log/sysmanage-agent.log
echo
echo "=== System Resources ==="
free -h
df -h
ps aux | grep sysmanage-agent
echo
echo "=== Network Connectivity ==="
ping -c 3 sysmanage.example.com
netstat -tunp | grep sysmanage
echo
echo "=== Environment ==="
env | grep -i sysmanage
env | grep -i proxy
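Since the script above prints the full configuration file, it may be worth masking credentials before attaching the output to a public issue. The filter below is a sketch: `redact_secrets` is a hypothetical helper, and the list of sensitive key names is an assumption that should be adjusted to the keys actually present in the configuration.

```shell
# Mask the values of YAML keys whose names look sensitive.
# The key-name pattern (password, secret, token, key) is an assumption.
redact_secrets() {
    awk '{
        line = $0
        if (match(tolower(line), /^[ \t]*[a-z0-9_-]*(password|secret|token|key)[a-z0-9_-]*[ \t]*:/)) {
            # Keep indentation and key name, replace everything after the colon
            print substr(line, 1, RLENGTH) " \"<redacted>\""
        } else {
            print line
        }
    }'
}

# Example (not run here):
# redact_secrets < /etc/sysmanage-agent/sysmanage-agent.yaml
```

In the support script, the `cat` of the configuration file could be replaced with this filter; the output should still be reviewed by hand before sharing.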
Support Channels
- GitHub Issues: Report bugs and feature requests
- GitHub Discussions: Community support and questions
- Documentation: Check the official documentation for updates
- Log Analysis: Always include relevant log excerpts when requesting support
Prevention and Best Practices
Preventive Measures
- Regular Updates: Keep agent software updated to latest stable version
- Configuration Validation: Test configuration changes in development first
- Monitoring Setup: Implement proactive monitoring and alerting
- Backup Procedures: Regular backup of agent configuration and database
- Log Rotation: Implement proper log rotation to prevent disk space issues
- Resource Monitoring: Monitor CPU, memory, and disk usage trends
- Network Monitoring: Monitor network connectivity and certificate expiry
- Documentation: Maintain documentation of custom configurations
Regular Maintenance Schedule
Frequency | Task | Description |
---|---|---|
Daily | Health Check | Verify agent status and basic connectivity |
Weekly | Log Review | Review agent logs for errors and warnings |
Monthly | Update Check | Check for agent software updates |
Monthly | Resource Review | Review CPU, memory, and disk usage trends |
Quarterly | Configuration Review | Review and optimize agent configuration |
Quarterly | Certificate Check | Verify SSL certificate expiry dates |