Mutual TLS (mTLS)
Comprehensive guide to SysManage's mutual TLS implementation: certificate management, PKI, automatic rotation, and secure agent communication.
mTLS Overview
SysManage implements mutual TLS (mTLS) for secure agent-to-server communication. This provides strong authentication, encryption in transit, and protection against various attack vectors including man-in-the-middle attacks, agent spoofing, and unauthorized access.
mTLS Architecture
┌─────────────────────────────────────────────────────────────────┐
│ mTLS Infrastructure │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────────┐ │
│ │ Root CA │ │ Intermediate │ │ Server Certificate │ │
│ │ (Offline) │ │ CA (Online) │ │ (TLS Server Auth) │ │
│ └─────────────┘ └──────────────┘ └─────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────────┐ │
│ │ Certificate │ │ Certificate │ │ Certificate Revocation │ │
│ │ Authority │ │ Database │ │ List (CRL) │ │
│ └─────────────┘ └──────────────┘ └─────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Agent Certificates │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │Agent #1 │ │Agent #2 │ │Agent #3 │ │Agent #4 │ │Agent #N │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
mTLS Handshake
│
┌─────────────────────────────────────────────────────────────────┐
│ Agent Communication │
├─────────────────────────────────────────────────────────────────┤
│ 1. Server presents server certificate │
│ 2. Agent validates server certificate │
│ 3. Agent presents client certificate │
│ 4. Server validates certificate against CA │
│ 5. Establish encrypted communication channel │
│ 6. Continuous certificate status validation │
└─────────────────────────────────────────────────────────────────┘
PKI Infrastructure
Certificate Authority Hierarchy
SysManage implements a two-tier PKI with an offline root CA and an online intermediate CA, so that the root key is never exposed on a networked host.
CA Structure
🔐 Root Certificate Authority
- Offline Storage: Air-gapped secure environment
- Long Validity: 10-20 years certificate lifetime
- High Security: Hardware Security Module (HSM) protected
- Limited Use: Only signs intermediate CA certificates
- Key Length: 4096-bit RSA or P-384 ECC
🔑 Intermediate Certificate Authority
- Online Operation: Integrated with SysManage server
- Medium Validity: 2-5 years certificate lifetime
- Automated Signing: Issues agent and server certificates
- CRL Management: Maintains certificate revocation lists
- Key Length: 2048-bit RSA or P-256 ECC
📜 End Entity Certificates
- Agent Certificates: Unique per agent with hostname/UUID
- Server Certificates: TLS server authentication
- Short Validity: 90 days with automatic rotation
- Extended Validation: Enhanced subject validation
- Key Usage: Digital signature, key encipherment
PKI Configuration
# PKI configuration in SysManage
PKI_CONFIG = {
    "root_ca": {
        "key_size": 4096,
        "algorithm": "RSA",
        "validity_years": 20,
        "storage": "offline",
        "subject": {
            "CN": "SysManage Root CA",
            "O": "Your Organization",
            "OU": "IT Security",
            "C": "US"
        }
    },
    "intermediate_ca": {
        "key_size": 2048,
        "algorithm": "RSA",
        "validity_years": 5,
        "storage": "online",
        "crl_distribution_points": ["https://sysmanage.company.com/crl"],
        "subject": {
            "CN": "SysManage Intermediate CA",
            "O": "Your Organization",
            "OU": "SysManage",
            "C": "US"
        }
    },
    "agent_certificates": {
        "key_size": 2048,
        "algorithm": "RSA",
        "validity_days": 90,
        "auto_renewal_days": 30,
        "extended_key_usage": ["clientAuth"],
        "key_usage": ["digitalSignature", "keyEncipherment"]
    },
    "server_certificates": {
        "key_size": 2048,
        "algorithm": "RSA",
        "validity_days": 365,
        "auto_renewal_days": 30,
        "extended_key_usage": ["serverAuth"],
        "key_usage": ["digitalSignature", "keyEncipherment"]
    }
}
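A configuration like the one above can be sanity-checked before it is used. The following is a minimal sketch, not part of SysManage: the function name validate_pki_config and the MIN_RSA_BITS policy table are assumptions made for illustration. It flags RSA keys below a per-role minimum and renewal windows that do not fit inside the validity period.

```python
from typing import Any

# Minimum RSA key sizes per section (illustrative policy, assumed for this sketch).
MIN_RSA_BITS = {
    "root_ca": 4096,
    "intermediate_ca": 2048,
    "agent_certificates": 2048,
    "server_certificates": 2048,
}


def validate_pki_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the config passed."""
    problems = []
    for section, min_bits in MIN_RSA_BITS.items():
        entry = config.get(section)
        if entry is None:
            problems.append(f"{section}: section is missing")
            continue
        if entry.get("algorithm") == "RSA" and entry.get("key_size", 0) < min_bits:
            problems.append(f"{section}: RSA key_size below {min_bits} bits")
        # End-entity sections must start renewal before the certificate expires.
        if "validity_days" in entry:
            renewal = entry.get("auto_renewal_days", 0)
            if not 0 < renewal < entry["validity_days"]:
                problems.append(
                    f"{section}: auto_renewal_days must be between 1 and validity_days - 1"
                )
    return problems


example = {
    "root_ca": {"algorithm": "RSA", "key_size": 4096},
    "intermediate_ca": {"algorithm": "RSA", "key_size": 2048},
    "agent_certificates": {"algorithm": "RSA", "key_size": 2048,
                           "validity_days": 90, "auto_renewal_days": 30},
    # Deliberately broken: weak key and a renewal window longer than the lifetime.
    "server_certificates": {"algorithm": "RSA", "key_size": 1024,
                            "validity_days": 365, "auto_renewal_days": 400},
}
for problem in validate_pki_config(example):
    print(problem)
```

Running such a check at startup surfaces a misconfigured renewal window before any certificate is issued with it.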
Certificate Lifecycle Management
Certificate Generation Process
Agent Certificate Generation
- Agent Registration: Agent submits registration request with host information
- Administrator Approval: Manual approval process for new agents
- CSR Generation: Server generates Certificate Signing Request
- Certificate Issuance: Intermediate CA signs the certificate
- Certificate Delivery: Secure delivery to approved agent
- Certificate Installation: Agent installs certificate and private key
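The six steps above form a strict sequence, which can be modelled as a small state machine. This is a hedged sketch only: the EnrollmentState enum and the advance function are hypothetical names, and SysManage's actual enrollment tracking may differ.

```python
from enum import Enum


class EnrollmentState(Enum):
    REGISTERED = "registered"    # agent submitted host information
    APPROVED = "approved"        # administrator approved the agent
    CSR_CREATED = "csr_created"  # certificate signing request generated
    ISSUED = "issued"            # intermediate CA signed the certificate
    DELIVERED = "delivered"      # certificate sent to the agent
    INSTALLED = "installed"      # agent installed certificate and key


# Enum members iterate in definition order, which matches the documented steps.
_ORDER = list(EnrollmentState)


def advance(current: EnrollmentState, target: EnrollmentState) -> EnrollmentState:
    """Move to `target` only if it is the immediate successor of `current`."""
    idx = _ORDER.index(current)
    if idx + 1 >= len(_ORDER) or _ORDER[idx + 1] is not target:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target


state = EnrollmentState.REGISTERED
state = advance(state, EnrollmentState.APPROVED)
print(state.name)
```

Rejecting out-of-order transitions means a certificate can never be issued for an agent that skipped administrator approval.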
Certificate Subject Format
# Agent certificate subject format
CN=agent-{hostname}-{uuid}
OU=SysManage Agents
O=Your Organization
L=City
ST=State
C=US
# Example agent certificate
CN=agent-web01-a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d
OU=SysManage Agents
O=Acme Corporation
L=New York
ST=NY
C=US
# Server certificate subject format
CN=sysmanage.company.com
OU=SysManage Server
O=Your Organization
L=City
ST=State
C=US
# Subject Alternative Names (SAN) for server
DNS:sysmanage.company.com
DNS:*.sysmanage.company.com
DNS:sysmanage-api.company.com
IP:192.168.1.100
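Because hostnames can themselves contain hyphens, splitting an agent CN on "-" is ambiguous. A minimal sketch of one way to build and parse the agent-{hostname}-{uuid} format follows; the helper names build_agent_cn and parse_agent_cn are assumptions for illustration, not SysManage functions. It relies on the fixed 36-character length of a canonical UUID string.

```python
import uuid

PREFIX = "agent-"
UUID_LEN = 36  # canonical textual UUID: 8-4-4-4-12 hex digits plus hyphens


def build_agent_cn(hostname: str, agent_uuid: uuid.UUID) -> str:
    """Format a common name as agent-{hostname}-{uuid}."""
    return f"{PREFIX}{hostname}-{agent_uuid}"


def parse_agent_cn(cn: str) -> tuple[str, uuid.UUID]:
    """Split a CN into (hostname, UUID); raise ValueError if malformed."""
    if not cn.startswith(PREFIX):
        raise ValueError("CN does not start with 'agent-'")
    body = cn[len(PREFIX):]
    # Take the UUID from the fixed-length tail so hyphenated hostnames survive.
    if len(body) <= UUID_LEN + 1 or body[-UUID_LEN - 1] != "-":
        raise ValueError("CN is too short to contain a hostname and a UUID")
    hostname = body[:-UUID_LEN - 1]
    raw_uuid = body[-UUID_LEN:]
    return hostname, uuid.UUID(raw_uuid)  # uuid.UUID raises ValueError on non-hex input


agent_id = uuid.UUID("a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d")
cn = build_agent_cn("web-01", agent_id)
print(cn)
print(parse_agent_cn(cn))
```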
Automatic Certificate Rotation
SysManage implements automatic certificate rotation to minimize security risks and operational overhead.
Rotation Workflow
- Monitoring: Continuous monitoring of certificate expiration dates
- Pre-notification: Alerts sent 30 days before expiration
- New Certificate Generation: Automatic generation of new certificate
- Gradual Rollout: Phased deployment to agents
- Validation: Confirmation of successful rotation
- Old Certificate Revocation: Revocation of the superseded certificate after a grace period
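The timing behind this workflow is simple date arithmetic: with a 90-day agent certificate and a 30-day threshold, rotation becomes due 60 days after issuance. A small sketch under those assumptions (the function names are illustrative):

```python
from datetime import datetime, timedelta, timezone


def rotation_due_at(not_after: datetime, threshold_days: int = 30) -> datetime:
    """Moment at which a certificate enters its rotation window."""
    return not_after - timedelta(days=threshold_days)


def needs_rotation(not_after: datetime, now: datetime, threshold_days: int = 30) -> bool:
    """True once `now` is inside the rotation window (or past expiry)."""
    return now >= rotation_due_at(not_after, threshold_days)


issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
not_after = issued + timedelta(days=90)   # 90-day agent certificate
print(rotation_due_at(not_after))         # 60 days after issuance
print(needs_rotation(not_after, issued + timedelta(days=59)))
print(needs_rotation(not_after, issued + timedelta(days=60)))
```

Using timezone-aware UTC values throughout avoids off-by-hours errors when the server's local time zone differs from UTC.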
Rotation Implementation
# Certificate rotation service
import asyncio
from datetime import datetime, timedelta, timezone
from cryptography import x509
from cryptography.x509.oid import NameOID


class CertificateRotationService:
    def __init__(self, ca_manager, agent_manager):
        self.ca_manager = ca_manager
        self.agent_manager = agent_manager
        self.rotation_threshold_days = 30

    async def check_certificate_expiration(self):
        """Check for certificates approaching expiration"""
        expiring_certs = []
        # Compare in timezone-aware UTC (not_valid_after_utc needs cryptography >= 42)
        threshold_date = datetime.now(timezone.utc) + timedelta(days=self.rotation_threshold_days)
        # Get all active agent certificates
        active_agents = await self.agent_manager.get_active_agents()
        for agent in active_agents:
            cert = await self.load_agent_certificate(agent.id)
            if cert.not_valid_after_utc <= threshold_date:
                expiring_certs.append({
                    'agent_id': agent.id,
                    'hostname': agent.hostname,
                    'certificate': cert,
                    'expires_at': cert.not_valid_after_utc
                })
        return expiring_certs

    async def rotate_certificate(self, agent_id: str):
        """Rotate certificate for specific agent"""
        try:
            # Look up the agent record (get_agent is an assumed helper on agent_manager)
            agent = await self.agent_manager.get_agent(agent_id)
            # Generate new key pair
            new_private_key = self.ca_manager.generate_private_key()
            # Create certificate request
            subject = x509.Name([
                x509.NameAttribute(NameOID.COMMON_NAME, f"agent-{agent.hostname}-{agent.uuid}"),
                x509.NameAttribute(NameOID.ORGANIZATIONAL_UNIT_NAME, "SysManage Agents"),
                x509.NameAttribute(NameOID.ORGANIZATION_NAME, "Your Organization"),
            ])
            # Generate new certificate
            new_cert = await self.ca_manager.issue_certificate(
                subject=subject,
                public_key=new_private_key.public_key(),
                validity_days=90
            )
            # Deliver new certificate to agent
            await self.deliver_certificate_to_agent(agent_id, new_cert, new_private_key)
            # Update certificate database
            await self.update_agent_certificate(agent_id, new_cert)
            # Schedule old certificate revocation after a 24-hour grace period
            await self.schedule_certificate_revocation(agent_id, delay_hours=24)
            return True
        except Exception as e:
            await self.log_rotation_error(agent_id, str(e))
            return False

    async def automated_rotation_task(self):
        """Background task for automatic certificate rotation"""
        while True:
            try:
                expiring_certs = await self.check_certificate_expiration()
                for cert_info in expiring_certs:
                    await self.rotate_certificate(cert_info['agent_id'])
                    # Small delay to avoid overwhelming the system
                    await asyncio.sleep(1)
                # Run check every 6 hours
                await asyncio.sleep(6 * 3600)
            except Exception as e:
                await self.log_error(f"Certificate rotation task failed: {e}")
                await asyncio.sleep(3600)  # Wait 1 hour before retry
mTLS Handshake Process
Handshake Flow
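The mTLS handshake provides mutual authentication between the SysManage server and agents. The flow shown here uses TLS 1.2 message names; TLS 1.3 performs the same certificate exchange in fewer round trips and has no Change Cipher Spec step.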
Detailed Handshake Process
Agent Server
│ │
│ 1. ClientHello │
├──────────────────────────────────────→│
│ - TLS version │
│ - Cipher suites │
│ - Client random │
│ │
│ 2. ServerHello + Certificate │
│←──────────────────────────────────────┤
│ - Selected TLS version │
│ - Selected cipher suite │
│ - Server certificate │
│ - Certificate Request │
│ │
│ 3. Client Certificate + Key Exchange │
├──────────────────────────────────────→│
│ - Client certificate │
│ - Certificate Verify │
│ - Change Cipher Spec │
│ - Finished │
│ │
│ 4. Change Cipher Spec + Finished │
│←──────────────────────────────────────┤
│ │
│ 5. Encrypted Application Data │
│←─────────────────────────────────────→│
│ │
Certificate Validation Process
Server-Side Validation (Agent Certificate)
- Certificate Chain Validation: Verify certificate chain to trusted root CA
- Signature Verification: Validate certificate signature using CA public key
- Validity Period Check: Ensure certificate is within valid time period
- Revocation Status: Check Certificate Revocation List (CRL)
- Subject Validation: Verify certificate subject matches agent identity
- Key Usage Validation: Confirm appropriate key usage extensions
- Hostname Verification: Validate hostname in certificate subject
Client-Side Validation (Server Certificate)
- Certificate Chain Validation: Verify server certificate chain
- Hostname Verification: Match server hostname to certificate CN/SAN
- Certificate Pinning: Validate against pinned certificate/public key
- Expiration Check: Ensure server certificate is not expired
- Revocation Check: Verify certificate has not been revoked
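The certificate pinning step in the client-side list can be illustrated with the standard library alone. This is a sketch under assumptions: the helper names are hypothetical, and the input is taken to be the DER-encoded server certificate, such as the bytes returned by ssl.SSLSocket.getpeercert(binary_form=True).

```python
import hashlib
import hmac


def sha256_fingerprint(cert_der: bytes) -> str:
    """Hex SHA-256 digest of a DER-encoded certificate."""
    return hashlib.sha256(cert_der).hexdigest()


def matches_pin(cert_der: bytes, pinned_hex: str) -> bool:
    """Compare against a pinned fingerprint, ignoring case and ':' separators."""
    normalized = pinned_hex.replace(":", "").lower()
    # compare_digest runs in constant time, so it does not leak how much matched.
    return hmac.compare_digest(sha256_fingerprint(cert_der), normalized)


# Stand-in bytes; a real caller would pass the peer certificate in DER form.
fake_der = b"not-a-real-certificate"
digest = sha256_fingerprint(fake_der).upper()
pin = ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
print(matches_pin(fake_der, pin))
print(matches_pin(b"different-certificate", pin))
```

Pinning the whole certificate means the pin must be updated at every server certificate rotation; pinning the public key instead is a common alternative when the key is reused across renewals.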
mTLS Implementation
# Server-side mTLS configuration (FastAPI)
import ssl
from datetime import datetime, timezone
from fastapi import FastAPI, HTTPException
from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.x509.oid import NameOID

app = FastAPI()


# Configure TLS context for mTLS
def create_mtls_context():
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.check_hostname = False  # hostname checks do not apply to client certificates
    context.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid certificate
    # Load server certificate and key
    context.load_cert_chain(
        certfile="/etc/ssl/certs/sysmanage-server.crt",
        keyfile="/etc/ssl/private/sysmanage-server.key"
    )
    # Load CA certificates for client validation
    context.load_verify_locations("/etc/ssl/certs/sysmanage-ca.crt")
    return context


# Certificate validation middleware
async def validate_client_certificate(request):
    """Validate client certificate from mTLS connection"""
    try:
        # Extract client certificate from TLS connection.
        # NOTE: 'client_cert' is not a core ASGI scope key; this assumes the ASGI
        # server or TLS-terminating proxy has been configured to populate it.
        cert_der = request.scope.get('client_cert')
        if not cert_der:
            raise HTTPException(401, "Client certificate required")
        # Parse certificate
        cert = x509.load_der_x509_certificate(cert_der)
        # Validate certificate chain
        if not await validate_certificate_chain(cert):
            raise HTTPException(401, "Invalid certificate chain")
        # Check certificate expiration in timezone-aware UTC (cryptography >= 42)
        if cert.not_valid_after_utc < datetime.now(timezone.utc):
            raise HTTPException(401, "Certificate has expired")
        # Validate certificate subject
        cn = cert.subject.get_attributes_for_oid(NameOID.COMMON_NAME)[0].value
        if not cn.startswith("agent-"):
            raise HTTPException(401, "Invalid certificate subject")
        # Check certificate revocation status
        if await is_certificate_revoked(cert):
            raise HTTPException(401, "Certificate has been revoked")
        # Extract agent information from certificate
        agent_info = parse_agent_certificate(cert)
        return agent_info
    except HTTPException:
        raise  # keep the specific 401 reason raised above
    except Exception as e:
        raise HTTPException(401, f"Certificate validation failed: {str(e)}")
# Agent-side mTLS configuration
import asyncio


class AgentHTTPSConnection:
    def __init__(self, server_host, server_port, cert_file, key_file, ca_file):
        self.server_host = server_host
        self.server_port = server_port
        self.cert_file = cert_file
        self.key_file = key_file
        self.ca_file = ca_file
        self.context = self._create_ssl_context()

    def _create_ssl_context(self):
        context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
        context.minimum_version = ssl.TLSVersion.TLSv1_2
        # Load client certificate for mTLS
        context.load_cert_chain(self.cert_file, self.key_file)
        # Load CA certificates for server validation
        context.load_verify_locations(self.ca_file)
        # Enable certificate hostname checking
        context.check_hostname = True
        context.verify_mode = ssl.CERT_REQUIRED
        return context

    async def connect(self):
        """Establish mTLS connection to server"""
        try:
            # Create SSL connection
            reader, writer = await asyncio.open_connection(
                self.server_host,
                self.server_port,
                ssl=self.context
            )
            # Fetch the server's leaf certificate as DER bytes
            peer_cert_der = writer.get_extra_info('ssl_object').getpeercert(binary_form=True)
            if not self._verify_server_certificate(peer_cert_der):
                writer.close()
                raise ConnectionError("Server certificate validation failed")
            return reader, writer
        except Exception as e:
            raise ConnectionError(f"mTLS connection failed: {str(e)}") from e

    def _verify_server_certificate(self, cert_der):
        """Additional server certificate validation"""
        # Implement certificate pinning (pinned fingerprint assumed to be raw SHA-256 bytes)
        expected_fingerprint = self._get_pinned_fingerprint()
        cert = x509.load_der_x509_certificate(cert_der)
        cert_fingerprint = cert.fingerprint(hashes.SHA256())
        return cert_fingerprint == expected_fingerprint
Certificate Storage & Security
Secure Storage Implementation
Proper certificate storage is critical for maintaining the security of the mTLS infrastructure.
Storage Security Requirements
🔐 Access Control
- Restrictive file permissions (600 for private keys)
- Dedicated service accounts for certificate access
- Role-based access control for certificate operations
- Audit logging for all certificate access
🛡️ Encryption at Rest
- Private key encryption with strong passphrases
- Database encryption for certificate metadata
- File system encryption (LUKS/BitLocker)
- Hardware Security Module (HSM) integration
🔄 Backup & Recovery
- Encrypted certificate backups
- Geographically distributed storage
- Recovery procedure documentation
- Regular backup validation testing
📊 Monitoring
- Certificate expiration monitoring
- Access pattern analysis
- Integrity verification checks
- Anomaly detection and alerting
Certificate Storage Layout
# Recommended directory structure
/etc/ssl/sysmanage/
├── ca/
│ ├── root-ca.crt # Root CA certificate (public)
│ ├── intermediate-ca.crt # Intermediate CA certificate (public)
│ ├── ca-chain.crt # Full certificate chain (public)
│ └── private/
│ ├── intermediate-ca.key # Intermediate CA private key (600)
│ └── root-ca.key # Root CA private key (kept in offline storage, not on this host)
├── server/
│ ├── server.crt # Server certificate (644)
│ ├── server.key # Server private key (600)
│ └── server-chain.crt # Server certificate with chain (644)
├── agents/
│ ├── {agent-uuid}/
│ │ ├── agent.crt # Agent certificate (644)
│ │ ├── agent.key # Agent private key (600)
│ │ └── metadata.json # Certificate metadata (644)
│ └── revoked/
│ └── {agent-uuid}/ # Revoked certificates archive
├── crl/
│ ├── sysmanage.crl # Certificate Revocation List (644)
│ └── crl-history/ # Historical CRL versions
└── backup/
├── encrypted/ # Encrypted certificate backups
└── metadata/ # Backup metadata and checksums
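# File permissions example
chown -R sysmanage:sysmanage /etc/ssl/sysmanage/
chmod 755 /etc/ssl/sysmanage/
chmod 700 /etc/ssl/sysmanage/*/private/
chmod 600 /etc/ssl/sysmanage/*/private/*
chmod 600 /etc/ssl/sysmanage/server/server.key
chmod 600 /etc/ssl/sysmanage/agents/*/agent.key
chmod 644 /etc/ssl/sysmanage/*/*.crt
chmod 644 /etc/ssl/sysmanage/agents/*/agent.crt
chmod 644 /etc/ssl/sysmanage/agents/*/metadata.json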
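Permissions drift over time, so it can help to audit them rather than only set them once. The sketch below is an illustration, not a SysManage tool: find_exposed_keys is a hypothetical helper that assumes POSIX permission bits and the ".key" suffix used in the layout above, and it reports any key file that grants group or other access.

```python
import os
import stat
import tempfile
from pathlib import Path


def find_exposed_keys(root: str, suffix: str = ".key") -> list[Path]:
    """Return key files under `root` that grant any group/other permission."""
    exposed = []
    for path in Path(root).rglob(f"*{suffix}"):
        if not path.is_file():
            continue
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & 0o077:  # any bit outside the owner's rwx triplet
            exposed.append(path)
    return sorted(exposed)


# Demo on a throwaway directory instead of the real /etc/ssl/sysmanage tree.
with tempfile.TemporaryDirectory() as tmp:
    for name, mode in (("server.key", 0o600), ("agent.key", 0o644)):
        key_path = Path(tmp, name)
        key_path.write_text("placeholder")
        os.chmod(key_path, mode)
    exposed_names = [p.name for p in find_exposed_keys(tmp)]

print(exposed_names)
```

In practice the function would be pointed at /etc/ssl/sysmanage and any non-empty result treated as an alert.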
Hardware Security Module (HSM) Integration
For enterprise environments requiring the highest level of security, SysManage supports HSM integration.
HSM Benefits
- Hardware-based Security: Private keys are generated and used inside the HSM and are never exposed to host software
- Tamper Resistance: Physical security against attacks
- Performance: Hardware-accelerated cryptographic operations
- Compliance: FIPS 140-2 Level 3/4 certification
- Audit Trail: Comprehensive logging of all operations
HSM Configuration Example
# HSM configuration for SysManage
HSM_CONFIG = {
    "enabled": True,
    "provider": "pkcs11",
    "library_path": "/usr/lib/libpkcs11.so",
    "slot_id": 0,
    "pin": "secured-hsm-pin",  # placeholder only; load the real PIN from a secret store
    "key_label_prefix": "sysmanage-",
    "ca_key_label": "sysmanage-intermediate-ca",
    "key_attributes": {
        "private": True,
        "sensitive": True,
        "extractable": False,
        "key_type": "RSA",
        "key_size": 2048
    }
}
# HSM integration implementation (python-pkcs11)
import logging
import pkcs11
from pkcs11 import Attribute, KeyType, Mechanism, ObjectClass

logger = logging.getLogger(__name__)


class HSMCertificateManager:
    def __init__(self, hsm_config):
        self.config = hsm_config
        self.lib = pkcs11.lib(hsm_config['library_path'])
        self.token = None
        self.session = None

    async def initialize_hsm(self):
        """Initialize HSM connection"""
        try:
            # Find the token in the configured slot
            slot = next(
                s for s in self.lib.get_slots(token_present=True)
                if s.slot_id == self.config['slot_id']
            )
            self.token = slot.get_token()
            # Open a read/write session so key pairs can be created
            self.session = self.token.open(user_pin=self.config['pin'], rw=True)
            return True
        except Exception as e:
            logger.error(f"HSM initialization failed: {e}")
            return False

    async def generate_key_pair(self, key_label: str):
        """Generate key pair in HSM"""
        try:
            attrs = self.config['key_attributes']
            # Generate RSA key pair in HSM; the private key is marked non-extractable
            public_key, private_key = self.session.generate_keypair(
                KeyType.RSA,
                attrs['key_size'],
                label=key_label,
                store=True,  # persist the key pair on the token
                private_template={
                    Attribute.SENSITIVE: attrs['sensitive'],
                    Attribute.EXTRACTABLE: attrs['extractable'],
                },
            )
            return public_key, private_key
        except Exception as e:
            logger.error(f"HSM key generation failed: {e}")
            raise

    async def sign_certificate(self, csr_data: bytes, ca_key_label: str):
        """Sign certificate using HSM-stored CA key"""
        try:
            # Find CA private key in HSM (object_class avoids matching the public key)
            ca_private_key = self.session.get_key(
                object_class=ObjectClass.PRIVATE_KEY,
                key_type=KeyType.RSA,
                label=ca_key_label,
            )
            # Note: this returns a raw signature over the given bytes,
            # not a complete X.509 certificate
            signature = ca_private_key.sign(
                csr_data,
                mechanism=Mechanism.SHA256_RSA_PKCS
            )
            return signature
        except Exception as e:
            logger.error(f"HSM certificate signing failed: {e}")
            raise
Certificate Revocation
Certificate Revocation List (CRL)
SysManage maintains a Certificate Revocation List to track and distribute information about revoked certificates.
CRL Generation Process
- Revocation Request: Administrator revokes certificate through web interface
- Database Update: Certificate status updated in certificate database
- CRL Generation: New CRL generated with revoked certificate entry
- CRL Signing: CRL signed by Intermediate CA private key
- CRL Distribution: Updated CRL distributed to all agents
- Cache Invalidation: Certificate validation caches cleared
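The last step, cache invalidation, matters because a cached "not revoked" answer would otherwise outlive the revocation. A minimal in-memory sketch follows; the RevocationCache class and its lookup callback are assumptions for illustration, and a real deployment would likely use a shared cache.

```python
import time
from typing import Callable


class RevocationCache:
    """Caches per-serial revocation answers and drops them when a new CRL is published."""

    def __init__(self, lookup: Callable[[int], bool], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._lookup = lookup    # authoritative check (database or CRL scan)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[int, tuple[bool, float]] = {}

    def is_revoked(self, serial: int) -> bool:
        now = self._clock()
        cached = self._entries.get(serial)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]
        result = self._lookup(serial)
        self._entries[serial] = (result, now)
        return result

    def invalidate(self) -> None:
        """Call after publishing a new CRL so stale answers are discarded."""
        self._entries.clear()


revoked_serials: set[int] = set()
cache = RevocationCache(lambda serial: serial in revoked_serials)

before = cache.is_revoked(42)   # not revoked yet; answer is cached
revoked_serials.add(42)         # administrator revokes the certificate
stale = cache.is_revoked(42)    # still served from the cache
cache.invalidate()              # the CRL update clears the cache
after = cache.is_revoked(42)
print(before, stale, after)
```

The stale middle result shows why the workflow clears caches as part of every CRL update instead of waiting for entries to expire.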
CRL Implementation
# CRL generation and management
import datetime
import logging
from cryptography import x509
from cryptography.hazmat.primitives import hashes

logger = logging.getLogger(__name__)


class CRLManager:
    def __init__(self, ca_manager):
        self.ca_manager = ca_manager
        # CRL numbers must only ever increase; persist this counter across restarts
        self.crl_number = 0
        self.crl_validity_hours = 24

    async def generate_crl(self):
        """Generate new Certificate Revocation List"""
        try:
            # Get all revoked certificates
            revoked_certs = await self.get_revoked_certificates()
            # Create CRL builder
            crl_builder = x509.CertificateRevocationListBuilder()
            # Set issuer (Intermediate CA)
            ca_cert = await self.ca_manager.get_intermediate_ca_certificate()
            crl_builder = crl_builder.issuer_name(ca_cert.subject)
            # Set validity period
            now = datetime.datetime.now(datetime.timezone.utc)
            next_update = now + datetime.timedelta(hours=self.crl_validity_hours)
            crl_builder = crl_builder.last_update(now)
            crl_builder = crl_builder.next_update(next_update)
            # Add revoked certificates
            for revoked_cert in revoked_certs:
                revoked_cert_builder = x509.RevokedCertificateBuilder()
                revoked_cert_builder = revoked_cert_builder.serial_number(
                    revoked_cert['serial_number']
                )
                revoked_cert_builder = revoked_cert_builder.revocation_date(
                    revoked_cert['revocation_date']
                )
                # Add revocation reason (expects an x509.ReasonFlags value)
                revoked_cert_builder = revoked_cert_builder.add_extension(
                    x509.CRLReason(revoked_cert['reason']),
                    critical=False
                )
                crl_builder = crl_builder.add_revoked_certificate(
                    revoked_cert_builder.build()
                )
            # Add CRL extensions
            crl_builder = crl_builder.add_extension(
                x509.CRLNumber(self.crl_number),
                critical=False
            )
            # Add Authority Key Identifier
            ca_public_key = ca_cert.public_key()
            aki = x509.AuthorityKeyIdentifier.from_issuer_public_key(ca_public_key)
            crl_builder = crl_builder.add_extension(aki, critical=False)
            # Sign CRL with CA private key
            ca_private_key = await self.ca_manager.get_intermediate_ca_private_key()
            crl = crl_builder.sign(ca_private_key, hashes.SHA256())
            # Save CRL to file and database
            await self.save_crl(crl)
            # Increment CRL number for next generation
            self.crl_number += 1
            return crl
        except Exception as e:
            logger.error(f"CRL generation failed: {e}")
            raise
    async def revoke_certificate(self, serial_number: int, reason: x509.ReasonFlags):
        """Revoke a certificate and update CRL"""
        try:
            # Update certificate status in database
            await self.update_certificate_status(
                serial_number=serial_number,
                status='revoked',
                revocation_date=datetime.datetime.now(datetime.timezone.utc),
                revocation_reason=reason
            )
            # Generate new CRL
            await self.generate_crl()
            # Notify all agents of CRL update
            await self.notify_crl_update()
            logger.info(f"Certificate {serial_number} revoked successfully")
        except Exception as e:
            logger.error(f"Certificate revocation failed: {e}")
            raise

    async def check_certificate_revocation(self, serial_number: int) -> bool:
        """Check if certificate is revoked"""
        try:
            # Check local database first
            cert_status = await self.get_certificate_status(serial_number)
            if cert_status == 'revoked':
                return True
            # Download and check latest CRL if needed
            latest_crl = await self.download_latest_crl()
            for revoked_cert in latest_crl:
                if revoked_cert.serial_number == serial_number:
                    return True
            return False
        except Exception as e:
            logger.error(f"Revocation check failed: {e}")
            return True  # Fail secure - assume revoked if check fails
Online Certificate Status Protocol (OCSP)
For real-time certificate validation, SysManage supports OCSP as an alternative to CRL.
OCSP Advantages
- Real-time Status: Immediate certificate status information
- Bandwidth Efficient: Query specific certificates only
- Reduced Latency: No need to download entire CRL
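- Privacy (with stapling): A plain OCSP query reveals to the responder which certificate is being checked; with OCSP stapling the certificate holder delivers the signed status itself, so the responder does not learn who is validating it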
Certificate Monitoring & Alerting
Monitoring Dashboard
SysManage provides comprehensive monitoring for certificate health and lifecycle management.
Key Monitoring Metrics
📊 Certificate Inventory
- Total active certificates
- Certificates by type (agent/server)
- Certificate age distribution
- Expiration timeline
⏰ Expiration Tracking
- Certificates expiring in 30 days
- Certificates expiring in 7 days
- Expired certificates
- Rotation success/failure rates
🔒 Security Events
- Certificate validation failures
- Revocation events
- Suspicious certificate usage
- CRL update failures
🔧 Operational Health
- CA service availability
- Certificate issuance latency
- OCSP responder status
- HSM connectivity (if applicable)
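The expiration-tracking metrics amount to sorting certificates into time windows. A short sketch of that bucketing follows; the bucket names and function names are assumptions chosen to mirror the 30-day and 7-day windows listed above.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def expiry_bucket(not_after: datetime, now: datetime) -> str:
    """Classify one certificate by the time left before it expires."""
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= timedelta(days=7):
        return "7_days"
    if remaining <= timedelta(days=30):
        return "30_days"
    return "healthy"


def summarize(expirations: list[datetime], now: datetime) -> Counter:
    """Count certificates per bucket for a dashboard or metrics exporter."""
    return Counter(expiry_bucket(e, now) for e in expirations)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
expirations = [
    now - timedelta(days=1),
    now + timedelta(days=3),
    now + timedelta(days=7),
    now + timedelta(days=20),
    now + timedelta(days=60),
]
summary = summarize(expirations, now)
print(dict(summary))
```

Each certificate lands in exactly one bucket here; a dashboard that wants cumulative counts (everything expiring within 30 days) would sum the narrower buckets.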
Alerting Configuration
# Certificate monitoring alerts
CERTIFICATE_ALERTS = {
    "expiration_warnings": {
        "30_days": {
            "enabled": True,
            "recipients": ["admin@company.com"],
            "severity": "warning"
        },
        "7_days": {
            "enabled": True,
            "recipients": ["admin@company.com", "security@company.com"],
            "severity": "high"
        },
        "24_hours": {
            "enabled": True,
            "recipients": ["oncall@company.com"],
            "severity": "critical"
        }
    },
    "validation_failures": {
        "threshold": 5,
        "window_minutes": 5,
        "recipients": ["security@company.com"],
        "severity": "high"
    },
    "ca_service_down": {
        "check_interval_seconds": 60,
        "recipients": ["admin@company.com", "oncall@company.com"],
        "severity": "critical"
    },
    "crl_update_failure": {
        "max_age_hours": 25,
        "recipients": ["security@company.com"],
        "severity": "medium"
    }
}
# Prometheus metrics for certificate monitoring
import logging
from prometheus_client import Counter, Gauge, Histogram

logger = logging.getLogger(__name__)

# Certificate counters
certificates_issued_total = Counter(
    'sysmanage_certificates_issued_total',
    'Total number of certificates issued',
    ['certificate_type']
)
certificates_revoked_total = Counter(
    'sysmanage_certificates_revoked_total',
    'Total number of certificates revoked',
    ['revocation_reason']
)
# Certificate gauges
certificates_active = Gauge(
    'sysmanage_certificates_active',
    'Number of active certificates',
    ['certificate_type']
)
certificates_expiring_soon = Gauge(
    'sysmanage_certificates_expiring_soon',
    'Number of certificates expiring soon',
    ['days_until_expiry']
)
# Performance metrics
certificate_validation_duration = Histogram(
    'sysmanage_certificate_validation_duration_seconds',
    'Certificate validation duration'
)


# Example metric collection
async def collect_certificate_metrics():
    """Collect certificate metrics for monitoring"""
    try:
        # Count active certificates by type
        agent_certs = await count_active_certificates('agent')
        server_certs = await count_active_certificates('server')
        certificates_active.labels(certificate_type='agent').set(agent_certs)
        certificates_active.labels(certificate_type='server').set(server_certs)
        # Count certificates expiring soon
        expiring_30d = await count_expiring_certificates(30)
        expiring_7d = await count_expiring_certificates(7)
        expiring_24h = await count_expiring_certificates(1)
        certificates_expiring_soon.labels(days_until_expiry='30').set(expiring_30d)
        certificates_expiring_soon.labels(days_until_expiry='7').set(expiring_7d)
        certificates_expiring_soon.labels(days_until_expiry='1').set(expiring_24h)
    except Exception as e:
        logger.error(f"Metric collection failed: {e}")
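The validation_failures alert above fires on 5 failures within 5 minutes. One way to evaluate such a rule is a sliding window over failure timestamps. The FailureWindow class below is a hypothetical sketch of that idea, not SysManage's alerting code.

```python
from collections import deque


class FailureWindow:
    """Signals when `threshold` failures fall within `window_seconds`."""

    def __init__(self, threshold: int = 5, window_seconds: float = 300.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events: deque[float] = deque()

    def record(self, timestamp: float) -> bool:
        """Record one failure; return True if the alert threshold is reached."""
        self._events.append(timestamp)
        # Discard failures that are older than the window.
        while self._events and timestamp - self._events[0] > self.window_seconds:
            self._events.popleft()
        return len(self._events) >= self.threshold


burst = FailureWindow()
burst_results = [burst.record(t) for t in (0, 60, 120, 180, 240)]
print(burst_results)

spread = FailureWindow()
spread_results = [spread.record(t) for t in (0, 100, 200, 400, 500)]
print(spread_results)
```

Timestamps are passed in explicitly, which keeps the class easy to test; a caller would typically supply time.monotonic() values.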
mTLS Troubleshooting
Common Issues
🚫 Certificate Validation Failures
- Symptoms: Agents cannot connect, TLS handshake failures
- Causes: Expired certificates, incorrect CA chain, clock skew
- Solutions: Check certificate expiry, verify CA chain, sync time
🔗 Certificate Chain Issues
- Symptoms: "Certificate chain incomplete" errors
- Causes: Missing intermediate certificates, incorrect order
- Solutions: Rebuild certificate chain, verify CA order
⏰ Clock Synchronization
- Symptoms: "Certificate not yet valid" or "expired" errors
- Causes: System clock drift, timezone issues
- Solutions: Configure NTP, check timezone settings
🔐 Permission Problems
- Symptoms: "Permission denied" reading certificate files
- Causes: Incorrect file permissions, SELinux policies
- Solutions: Fix file permissions, update SELinux context
Diagnostic Commands
# Certificate verification commands
# Verify certificate chain
openssl verify -CAfile /etc/ssl/sysmanage/ca/ca-chain.crt \
/etc/ssl/sysmanage/agents/agent-web01/agent.crt
# Check certificate details
openssl x509 -in /etc/ssl/sysmanage/agents/agent-web01/agent.crt \
-text -noout
# Test mTLS connection
openssl s_client -connect sysmanage.company.com:443 \
-cert /etc/ssl/sysmanage/agents/agent-web01/agent.crt \
-key /etc/ssl/sysmanage/agents/agent-web01/agent.key \
-CAfile /etc/ssl/sysmanage/ca/ca-chain.crt \
-verify_return_error
# Check certificate expiration
openssl x509 -in /etc/ssl/sysmanage/agents/agent-web01/agent.crt \
-noout -dates
# Validate CRL
openssl crl -in /etc/ssl/sysmanage/crl/sysmanage.crl \
-text -noout
# Test OCSP responder
openssl ocsp -issuer /etc/ssl/sysmanage/ca/intermediate-ca.crt \
-cert /etc/ssl/sysmanage/agents/agent-web01/agent.crt \
-url http://ocsp.sysmanage.company.com \
-resp_text
# Debug TLS handshake
curl -v --cert /etc/ssl/sysmanage/agents/agent-web01/agent.crt \
--key /etc/ssl/sysmanage/agents/agent-web01/agent.key \
--cacert /etc/ssl/sysmanage/ca/ca-chain.crt \
https://sysmanage.company.com/api/health
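The output of the expiration check (openssl x509 -noout -dates) can be turned into a number of days remaining for scripting. The sketch below uses only the standard library; the helper names are illustrative, and the sample text stands in for real command output.

```python
import ssl


def parse_openssl_dates(output: str) -> dict[str, int]:
    """Map 'notBefore'/'notAfter' to Unix timestamps from `openssl x509 -noout -dates` text."""
    result = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key in ("notBefore", "notAfter"):
            # cert_time_to_seconds parses the "Mon DD HH:MM:SS YYYY GMT" format as UTC.
            result[key] = ssl.cert_time_to_seconds(value.strip())
    return result


def days_remaining(output: str, now_epoch: float) -> float:
    """Days between `now_epoch` and the certificate's notAfter time."""
    return (parse_openssl_dates(output)["notAfter"] - now_epoch) / 86400


# Sample text in the format openssl prints; a real script would capture the command output.
sample = "notBefore=Jan  5 09:34:43 2018 GMT\nnotAfter=Apr  5 09:34:43 2018 GMT\n"
dates = parse_openssl_dates(sample)
print(days_remaining(sample, dates["notBefore"]))
```

A negative result means the certificate has already expired, which is the first thing to rule out when agents fail the handshake.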