Message Flow & State Management

Comprehensive overview of how messages are passed, queued, and processed between SysManage agents and the backend server.

Architecture Overview

SysManage uses a message-passing architecture designed for reliable, ordered communication between distributed agents and the central server. Both sides keep persistent message queues with automatic retry logic, priority handling, and per-message state tracking.

Figure 1: SysManage message flow diagram, showing the complete message flow and state transitions for bidirectional communication between agents and server

Communication Protocol

WebSocket Layer

All real-time communication between agents and the server occurs over WebSocket connections secured with mutual TLS (mTLS). The WebSocket layer provides:

  • Real-time bidirectional communication for immediate message delivery
  • Automatic reconnection logic with exponential backoff
  • Connection state management and heartbeat monitoring
  • Encrypted transport using certificate-based authentication
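The reconnection behaviour can be illustrated with a small delay calculator. This is a minimal sketch assuming a "full jitter" variant of exponential backoff; the function names `reconnect_delay` and `backoff_ceilings` and the `base`/`cap` values are hypothetical, not SysManage's documented configuration.

```python
from __future__ import annotations

import random

# Hypothetical helpers; SysManage's real backoff parameters are not documented here.


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                    rng: random.Random | None = None) -> float:
    """Seconds to wait before reconnection attempt number `attempt` (0-based).

    The ceiling doubles with each attempt (base * 2**attempt) and is clamped to
    `cap`; a uniformly random delay in [0, ceiling] is then chosen ("full jitter")
    so that many agents losing the server at once do not reconnect in lockstep.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    ceiling = min(cap, base * 2 ** min(attempt, 32))  # clamp exponent to avoid overflow
    return (rng or random.Random()).uniform(0.0, ceiling)


def backoff_ceilings(attempts: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Upper bound of the jittered delay for attempts 0..attempts-1."""
    return [min(cap, base * 2 ** min(n, 32)) for n in range(attempts)]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    for attempt in range(8):
        print(f"attempt {attempt}: wait {reconnect_delay(attempt, rng=rng):.2f}s")
```

Full jitter is only one common way to spread reconnections out; the actual formula and limits used by the agent may differ.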

Persistent Message Queuing

Both agents and server maintain persistent message queues to ensure no data is lost during network interruptions or system restarts:

Agent-Side Queuing (SQLite)

  • Local SQLite database stores messages when the connection is unavailable
  • Automatic replay of queued messages upon reconnection
  • Priority-based processing ensures critical messages are sent first
  • Retry logic with configurable maximum attempts
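A minimal sketch of an agent-side queue using Python's built-in `sqlite3` module illustrates the four points above. The table layout (modelled loosely on the message queue fields described on this page), the numeric priority ranks, the `AgentQueue` class, and the `send` callback are all assumptions for illustration; the real agent schema may differ.

```python
from __future__ import annotations

import json
import sqlite3
import uuid
from typing import Callable

# Lower rank is sent first; this numeric mapping is an assumption.
PRIORITY_RANK = {"urgent": 0, "high": 1, "normal": 2, "low": 3}

SCHEMA = """
CREATE TABLE IF NOT EXISTS message_queue (
    seq           INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order (FIFO tie-break)
    message_id    TEXT UNIQUE NOT NULL,
    message_type  TEXT NOT NULL,
    message_data  TEXT NOT NULL,                      -- JSON-encoded payload
    priority_rank INTEGER NOT NULL,
    status        TEXT NOT NULL DEFAULT 'pending',
    retry_count   INTEGER NOT NULL DEFAULT 0,
    max_retries   INTEGER NOT NULL DEFAULT 3
)
"""


class AgentQueue:
    """Hypothetical outbound queue persisted in SQLite (":memory:" for demos)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(SCHEMA)

    def enqueue(self, message_type: str, payload: dict, priority: str = "normal") -> str:
        message_id = str(uuid.uuid4())
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO message_queue (message_id, message_type, message_data, priority_rank)"
                " VALUES (?, ?, ?, ?)",
                (message_id, message_type, json.dumps(payload), PRIORITY_RANK[priority]),
            )
        return message_id

    def pending(self) -> list[tuple[str, str, dict]]:
        """Pending messages, highest priority first, then oldest first."""
        rows = self.conn.execute(
            "SELECT message_id, message_type, message_data FROM message_queue"
            " WHERE status = 'pending' ORDER BY priority_rank, seq"
        ).fetchall()
        return [(mid, mtype, json.loads(data)) for mid, mtype, data in rows]

    def replay(self, send: Callable[[str, str, dict], bool]) -> int:
        """Try to send every pending message; return how many were delivered.

        A failed send increments retry_count; once it reaches max_retries the
        row is marked 'failed' instead of staying 'pending'.
        """
        delivered = 0
        for message_id, message_type, payload in self.pending():
            ok = send(message_id, message_type, payload)
            with self.conn:
                if ok:
                    self.conn.execute(
                        "UPDATE message_queue SET status = 'completed' WHERE message_id = ?",
                        (message_id,),
                    )
                    delivered += 1
                else:
                    # SET expressions see the row's old values, hence retry_count + 1.
                    self.conn.execute(
                        "UPDATE message_queue SET retry_count = retry_count + 1,"
                        " status = CASE WHEN retry_count + 1 >= max_retries"
                        " THEN 'failed' ELSE 'pending' END WHERE message_id = ?",
                        (message_id,),
                    )
        return delivered
```

Passing a file path instead of `":memory:"` is what would let queued messages survive an agent restart.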

Server-Side Queuing (PostgreSQL)

  • PostgreSQL message_queue table provides durable, transactional persistence
  • Host-specific and broadcast queuing for targeted or fleet-wide operations
  • Correlation tracking for request/response matching
  • Transaction integrity ensures consistent state management
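Host-specific versus broadcast queuing and correlation tracking can be pictured with an in-memory stand-in for the PostgreSQL table. In this sketch a `host_id` of `None` is assumed to mean "broadcast", and replies are matched to requests by `correlation_id`; the `ServerQueue` and `QueuedMessage` names are hypothetical, and a real server would persist this state in the database rather than in a dictionary.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class QueuedMessage:
    message_type: str
    host_id: str | None  # assumption: None means broadcast to every host
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    responses: dict[str, dict] = field(default_factory=dict)  # host_id -> reply


class ServerQueue:
    """In-memory stand-in for a server-side message_queue table (hypothetical)."""

    def __init__(self) -> None:
        self._by_correlation: dict[str, QueuedMessage] = {}

    def enqueue(self, message_type: str, host_id: str | None = None) -> str:
        msg = QueuedMessage(message_type, host_id)
        self._by_correlation[msg.correlation_id] = msg
        return msg.correlation_id

    def for_host(self, host_id: str) -> list[QueuedMessage]:
        """Messages addressed to this host plus fleet-wide broadcasts."""
        return [m for m in self._by_correlation.values()
                if m.host_id is None or m.host_id == host_id]

    def match_response(self, correlation_id: str, host_id: str, result: dict) -> bool:
        """Attach a reply to the request it answers; False if it cannot be matched."""
        msg = self._by_correlation.get(correlation_id)
        if msg is None:
            return False  # no request carries this correlation_id
        if msg.host_id is not None and msg.host_id != host_id:
            return False  # reply from a host the request was not sent to
        if host_id in msg.responses:
            return False  # duplicate reply from the same host
        msg.responses[host_id] = result
        return True
```

Keying replies by host lets one broadcast request collect a separate response from every agent in the fleet.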

Message States & Lifecycle

  • PENDING: Queued for processing
  • IN_PROGRESS: Being transmitted or processed
  • COMPLETED: Successfully delivered

Messages can also transition to the FAILED state if the maximum number of retry attempts is exceeded, or to the EXPIRED state if they exceed the configured time-to-live.
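The lifecycle can be expressed as an explicit transition table. This is a sketch under assumptions: the exact set of allowed moves (in particular sending a failed attempt back from IN_PROGRESS to PENDING for retry, and allowing expiry from either non-terminal state) is inferred from the description above rather than taken from SysManage source.

```python
from enum import Enum


class MessageStatus(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"
    EXPIRED = "expired"


# Assumed transition table; COMPLETED, FAILED and EXPIRED are terminal.
ALLOWED = {
    MessageStatus.PENDING: {MessageStatus.IN_PROGRESS, MessageStatus.EXPIRED},
    MessageStatus.IN_PROGRESS: {
        MessageStatus.COMPLETED,
        MessageStatus.PENDING,  # attempt failed, retries remain
        MessageStatus.FAILED,   # maximum retry attempts exceeded
        MessageStatus.EXPIRED,  # time-to-live exceeded
    },
    MessageStatus.COMPLETED: set(),
    MessageStatus.FAILED: set(),
    MessageStatus.EXPIRED: set(),
}


def transition(current: MessageStatus, new: MessageStatus) -> MessageStatus:
    """Return `new` if moving from `current` is allowed, else raise ValueError."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new


def is_terminal(status: MessageStatus) -> bool:
    """A status with no outgoing transitions ends the message's lifecycle."""
    return not ALLOWED[status]
```

Subclassing `str` lets the enum members compare equal to the lowercase status strings stored in the queue tables.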

Message Queue Fields

{
  "message_id": "uuid4-string",
  "direction": "outbound|inbound",
  "message_type": "system_info|command|update_request|...",
  "message_data": "json-encoded-payload",
  "status": "pending|in_progress|completed|failed|expired",
  "priority": "urgent|high|normal|low",
  "retry_count": 0,
  "max_retries": 3,
  "correlation_id": "uuid4-for-request-response-matching",
  "created_at": "2024-01-01T12:00:00Z",
  "scheduled_at": "2024-01-01T12:00:00Z",
  "completed_at": "2024-01-01T12:00:05Z"
}
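The same record can be modelled in Python as a dataclass. This is an illustrative sketch: the class name `QueueRecord` and the `new_outbound` constructor are hypothetical, timestamps are assumed to be UTC ISO-8601 strings ending in `Z` as in the example above, and `message_data` is kept as a JSON string as the field list indicates.

```python
from __future__ import annotations

import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def utc_now_iso() -> str:
    """Current UTC time in the example's format, e.g. 2024-01-01T12:00:00Z."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Illustrative model of the documented fields; not SysManage's actual class.
@dataclass
class QueueRecord:
    direction: str                    # "outbound" or "inbound"
    message_type: str                 # e.g. "system_info", "command"
    message_data: str                 # JSON-encoded payload
    priority: str = "normal"          # "urgent" | "high" | "normal" | "low"
    status: str = "pending"
    retry_count: int = 0
    max_retries: int = 3
    correlation_id: str | None = None
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: str = field(default_factory=utc_now_iso)
    scheduled_at: str | None = None
    completed_at: str | None = None

    @classmethod
    def new_outbound(cls, message_type: str, payload: dict, **kwargs) -> QueueRecord:
        """Build an outbound record that is eligible to send immediately."""
        record = cls("outbound", message_type, json.dumps(payload), **kwargs)
        record.scheduled_at = record.created_at
        return record

    def payload(self) -> dict:
        return json.loads(self.message_data)

    def to_json(self) -> str:
        return json.dumps(asdict(self))
```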

Message Types

Agent → Server (Outbound)

  • SYSTEM_INFO
  • HEARTBEAT
  • COMMAND_RESULT
  • OS_VERSION_UPDATE
  • HARDWARE_UPDATE
  • SOFTWARE_INVENTORY_UPDATE
  • PACKAGE_UPDATES_UPDATE
  • USER_ACCESS_UPDATE
  • UPDATE_APPLY_RESULT
  • SCRIPT_EXECUTION_RESULT
  • REBOOT_STATUS_UPDATE
  • DIAGNOSTIC_COLLECTION_RESULT
  • ERROR

Server → Agent (Inbound)

  • COMMAND
  • UPDATE_REQUEST
  • PING
  • SHUTDOWN
  • HOST_APPROVED
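One way to encode the two message families is a pair of string enums, which also makes it easy to reject a message travelling in the wrong direction. The lowercase values `system_info`, `command`, and `update_request` follow the `message_type` examples in the field list; the values for the other types are assumed by analogy, and `parse_message_type` is a hypothetical helper.

```python
from enum import Enum

# String values other than system_info, command and update_request are assumed.


class AgentToServer(str, Enum):
    SYSTEM_INFO = "system_info"
    HEARTBEAT = "heartbeat"
    COMMAND_RESULT = "command_result"
    OS_VERSION_UPDATE = "os_version_update"
    HARDWARE_UPDATE = "hardware_update"
    SOFTWARE_INVENTORY_UPDATE = "software_inventory_update"
    PACKAGE_UPDATES_UPDATE = "package_updates_update"
    USER_ACCESS_UPDATE = "user_access_update"
    UPDATE_APPLY_RESULT = "update_apply_result"
    SCRIPT_EXECUTION_RESULT = "script_execution_result"
    REBOOT_STATUS_UPDATE = "reboot_status_update"
    DIAGNOSTIC_COLLECTION_RESULT = "diagnostic_collection_result"
    ERROR = "error"


class ServerToAgent(str, Enum):
    COMMAND = "command"
    UPDATE_REQUEST = "update_request"
    PING = "ping"
    SHUTDOWN = "shutdown"
    HOST_APPROVED = "host_approved"


_ALLOWED_BY_SENDER = {"agent": AgentToServer, "server": ServerToAgent}


def parse_message_type(message_type: str, sender: str) -> Enum:
    """Map a wire-level message_type to the enum member valid for `sender`.

    Raises ValueError for an unknown sender, an unknown type, or a type that
    only travels in the other direction (e.g. an agent sending "ping").
    """
    enum_cls = _ALLOWED_BY_SENDER.get(sender)
    if enum_cls is None:
        raise ValueError(f"unknown sender: {sender!r}")
    return enum_cls(message_type)  # lookup by value raises ValueError if absent
```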

Priority & Scheduling

Messages are processed according to priority levels, ensuring critical operations are handled first:

  • URGENT: Emergency shutdown, critical security alerts
  • HIGH: Security updates, system commands
  • NORMAL: Regular updates, data collection
  • LOW: Heartbeats, routine maintenance
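Priority ordering with first-in, first-out behaviour inside each level can be sketched with Python's standard `heapq` module. The numeric ranks and the `PriorityOutbox` class are assumptions; this page specifies only the four levels and their order.

```python
import heapq
import itertools

# Lower rank is served first; the numeric mapping itself is an assumption.
RANK = {"urgent": 0, "high": 1, "normal": 2, "low": 3}


class PriorityOutbox:
    """Hypothetical priority queue: ordered by level, then by arrival order."""

    def __init__(self) -> None:
        self._heap: list = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO within a level

    def push(self, priority: str, message: str) -> None:
        heapq.heappush(self._heap, (RANK[priority], next(self._counter), message))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

Because every counter value is unique, the heap never needs to compare the messages themselves, so payloads do not have to be orderable.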

Reliability Features

Retry Logic

  • Exponential backoff prevents overwhelming disconnected endpoints
  • Configurable retry limits (default: 3 attempts)
  • Priority-aware scheduling ensures high-priority messages retry first
  • Dead letter handling for messages that exceed retry limits
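Per-message retry scheduling and dead-letter handling might look like the following. It is a sketch assuming a base delay of 5 seconds doubled after each failure and a plain list as the dead-letter store; only the default limit of 3 attempts comes from this page, and `RetryState` and `schedule_retry` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RetryState:
    message_id: str
    retry_count: int = 0
    max_retries: int = 3  # default limit described on this page
    status: str = "pending"
    scheduled_at: datetime | None = None


def schedule_retry(msg: RetryState, now: datetime, dead_letters: list[RetryState],
                   base_delay: float = 5.0) -> RetryState:
    """Record one failed delivery attempt (assumed policy, not SysManage's).

    The next attempt is scheduled base_delay * 2**(retry_count - 1) seconds
    after `now` (5 s, 10 s, 20 s, ... with the assumed base). Once retry_count
    reaches max_retries the message is marked failed and dead-lettered.
    """
    msg.retry_count += 1
    if msg.retry_count >= msg.max_retries:
        msg.status = "failed"
        msg.scheduled_at = None
        dead_letters.append(msg)
    else:
        msg.status = "pending"
        delay = base_delay * 2 ** (msg.retry_count - 1)
        msg.scheduled_at = now + timedelta(seconds=delay)
    return msg
```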

Error Handling

  • Graceful degradation during network interruptions
  • Comprehensive error logging with correlation IDs
  • Automatic recovery procedures for common failure scenarios
  • Circuit breaker patterns prevent cascading failures
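The circuit breaker pattern can be sketched as a small state machine with closed, open, and half-open states. The thresholds, the cooldown, and the `CircuitBreaker` class are illustrative assumptions; this page does not say where SysManage applies breakers or how they are tuned. An injectable clock keeps the example deterministic.

```python
from __future__ import annotations

import time
from typing import Callable


class CircuitBreaker:
    """Minimal closed/open/half-open circuit breaker (illustrative only)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: float | None = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half_open"  # cooldown elapsed: trial calls are let through
        return "open"

    def allow(self) -> bool:
        """True if a call to the protected endpoint may be attempted now."""
        return self.state != "open"

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the cooldown
```

While the breaker is open, callers fail fast instead of piling more work onto an endpoint that is already struggling, which is what stops a local fault from cascading.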

Data Integrity

  • Message deduplication prevents the same message from being processed twice
  • Checksum validation ensures data integrity
  • Transaction boundaries maintain database consistency
  • Audit trails track all message processing activities
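Deduplication and checksum validation can be combined in a single receive step. This sketch assumes a SHA-256 digest over a canonical JSON encoding of the payload travels with each message and that processed `message_id` values are remembered in a set; the envelope layout and the `Receiver` class are hypothetical, not SysManage's wire format.

```python
from __future__ import annotations

import hashlib
import json


def checksum(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Receiver:
    """Hypothetical receiver; a real one would persist seen_ids across restarts."""

    def __init__(self) -> None:
        self.seen_ids: set[str] = set()
        self.processed: list[dict] = []

    def receive(self, envelope: dict) -> str:
        """Return "accepted", "duplicate", or "corrupt" for one incoming envelope."""
        if checksum(envelope["message_data"]) != envelope["checksum"]:
            return "corrupt"    # reject before it can change any state
        if envelope["message_id"] in self.seen_ids:
            return "duplicate"  # already processed: skip it
        self.seen_ids.add(envelope["message_id"])
        self.processed.append(envelope["message_data"])
        return "accepted"
```

Sorting keys before hashing matters: two dictionaries with the same content but different insertion order must produce the same checksum.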

Performance Characteristics

Scalability Metrics

  • Concurrent Connections: Supports thousands of simultaneous agent connections
  • Message Throughput: Processes 10,000+ messages per second per server instance
  • Queue Depth: Handles millions of queued messages with minimal performance impact
  • Latency: Sub-100ms message delivery under normal network conditions
  • Memory Usage: Constant memory footprint regardless of queue size

Monitoring & Observability

The message flow system provides comprehensive monitoring capabilities:

  • Real-time metrics: Queue depths, message processing rates, error rates
  • Connection monitoring: Agent connectivity status, connection quality metrics
  • Performance tracking: Message latency, throughput statistics
  • Alert integration: Configurable alerts for queue backups, connection failures
  • Audit logging: Complete trail of all message processing activities
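Queue-depth metrics, an error rate, and a backlog alert can all be derived from the status and priority fields of the queue rows. The `queue_metrics` function, the threshold, the error-rate definition (failed over finished messages), and the alert text are assumptions for illustration; this page does not document SysManage's metric names or alert configuration.

```python
from __future__ import annotations

from collections import Counter


def queue_metrics(messages: list[dict], backlog_threshold: int = 1000) -> dict:
    """Summarise queue depth by status and pending depth by priority (hypothetical).

    `messages` are rows carrying at least the "status" and "priority" fields.
    An alert is produced when the pending backlog exceeds `backlog_threshold`.
    """
    by_status = Counter(m["status"] for m in messages)
    pending_by_priority = Counter(m["priority"] for m in messages if m["status"] == "pending")
    finished = by_status["completed"] + by_status["failed"]
    error_rate = by_status["failed"] / finished if finished else 0.0
    alerts = []
    if by_status["pending"] > backlog_threshold:
        alerts.append(f"queue backup: {by_status['pending']} pending messages")
    return {
        "depth_by_status": dict(by_status),
        "pending_by_priority": dict(pending_by_priority),
        "error_rate": error_rate,
        "alerts": alerts,
    }
```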