Security Procedures
Incident response protocols, emergency procedures, and communication templates for Pilier validators.
Reading time: 12 minutes
Incident Classification
All incidents are classified by severity; the severity level determines the maximum response time.
Severity Levels
| Level | Name | Response Time | Examples |
|---|---|---|---|
| 🔴 Critical | Network-breaking | <2 hours | Key compromise, consensus failure, network attack |
| 🟠 High | Service degraded | <24 hours | Hardware failure, performance issues, missed blocks |
| 🟡 Medium | Minor disruption | <7 days | Monitoring alerts, certificate expiration, log rotation |
| 🟢 Low | Informational | <30 days | Routine maintenance, documentation updates |
Critical Incidents (2-Hour Response)
1. Key Compromise
Definition: Session keys or validator account credentials have been exposed or are suspected to be compromised.
Immediate actions (within 15 minutes):
1. STOP validator node immediately
└─ systemctl stop pilier-node
2. Rotate session keys
└─ Generate new keys on secure offline machine
3. Alert other validators
└─ Telegram: @pilier_validators
└─ Subject: "URGENT: validator-{id} key compromise suspected"
4. Notify core team
└─ Email: security@pilier.org
└─ Phone: +33 X XX XX XX XX (24/7 hotline)
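A minimal sketch of steps 1 and 2 above, assuming a systemd unit named pilier-node (as used here) and a Substrate-style node exposing the standard JSON-RPC on localhost (port 9933 is an assumption; adjust to your configuration). The rotation call must be run on a host you now trust (rebuilt or replaced), never on the compromised machine; if your keys are produced in the offline ceremony described under "Within 2 hours", follow that process instead.
```bash
# Step 1: stop the (possibly compromised) validator immediately
sudo systemctl stop pilier-node
sudo systemctl status pilier-node --no-pager   # confirm the service is really down

# Step 2 (on a trusted, rebuilt host only): generate fresh session keys.
# Assumes a Substrate-style node with its JSON-RPC bound to localhost:9933;
# the call creates new keys in the node keystore and returns their public keys as hex.
curl -s -H "Content-Type: application/json" \
  -d '{"id":1, "jsonrpc":"2.0", "method":"author_rotateSessionKeys", "params":[]}' \
  http://localhost:9933
# Keep the returned hex string: it goes into the governance proposal
# "Rotate session keys for validator-{id}".
```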
Within 2 hours:
1. Complete forensic analysis
├─ How were keys compromised? (phishing, malware, insider?)
├─ Review access logs (who accessed server?)
├─ Check for unauthorized transactions
└─ Document timeline
2. Generate new session keys (secure ceremony)
├─ Use air-gapped machine if available
├─ Store in hardware security module (HSM) if available
└─ Backup encrypted with strong passphrase
3. Submit governance proposal: "Rotate session keys for validator-{id}"
├─ Explain incident (transparency)
├─ Provide new session keys
└─ Request emergency approval (fast-track: 48 hours instead of 14 days)
4. Document incident
└─ Use Incident Report Template (see below)
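A minimal sketch of the access-log review in step 1 above, assuming an Ubuntu/Debian host (log paths differ on other distributions); preserve evidence before wiping or rebuilding anything.
```bash
# Who logged in, and from where?
last -a | head -50
sudo lastb -a | head -50     # failed login attempts

# SSH authentication history (Debian/Ubuntu path; use journalctl on other distros)
sudo grep -E "Accepted|Failed" /var/log/auth.log | tail -100

# Recently modified binaries and configs (possible tampering)
sudo find /usr/local/bin /etc -mtime -7 -type f -ls

# Active sessions and unexpected listeners
who
sudo ss -tulpn

# Preserve evidence before rebuilding the host
sudo tar czf /root/forensics-$(date +%F).tar.gz /var/log
```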
Follow-up (within 7 days):
1. Security audit
├─ Review all access controls
├─ Scan for malware/backdoors
├─ Update passwords, SSH keys, firewall rules
└─ Consider engaging external security firm
2. Post-mortem report
├─ What happened? (root cause)
├─ How was it detected?
├─ What was the impact?
├─ How do we prevent recurrence?
└─ Publish on forum (transparency)
3. Insurance claim (if applicable)
└─ Notify cyber liability insurer within 72 hours
2. Network Attack (DDoS, Eclipse)
Definition: Malicious traffic directed at a single validator, or a coordinated attack against the network as a whole.
DDoS (Distributed Denial of Service):
Symptoms:
├─ Abnormally high inbound traffic (10-100× normal)
├─ Node unresponsive (cannot sync blocks)
├─ CPU/bandwidth maxed out
└─ Peers disconnecting
Immediate actions (within 30 minutes):
1. Enable DDoS mitigation
├─ Cloudflare: Enable "I'm Under Attack" mode
├─ Firewall: Rate-limit connections (iptables / ufw)
├─ Null-route attacking IPs (if identifiable)
└─ Switch to backup IP if available
2. Alert other validators
└─ Telegram: "validator-{id} under DDoS, investigating"
3. Contact hosting provider
├─ Request upstream DDoS protection
├─ Consider temporary IP change
└─ Log attack traffic (for analysis)
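A minimal sketch of the firewall mitigations in step 1 above. The p2p port 30333 and the attacking IP are assumptions and placeholders; substitute your node's actual listening port and the observed source addresses.
```bash
# Rate-limit new inbound connections on the p2p port (30333 assumed; use your node's port).
sudo ufw limit 30333/tcp

# Same idea with iptables: drop sources opening too many parallel connections.
sudo iptables -A INPUT -p tcp --syn --dport 30333 \
  -m connlimit --connlimit-above 25 -j DROP

# Null-route an identified attacking IP (203.0.113.10 is a documentation placeholder).
sudo ip route add blackhole 203.0.113.10

# Capture a sample of the attack traffic for the post-incident analysis.
sudo tcpdump -i any -n -c 1000 -w /root/ddos-$(date +%F-%H%M).pcap port 30333
```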
Within 2 hours:
1. Assess impact
├─ How long was validator offline?
├─ Missed blocks / finality votes?
└─ Any data loss?
2. Restore service
├─ Bring node back online (with mitigation active)
├─ Verify sync status (check latest block)
└─ Monitor for continued attack
3. Document attack
├─ Attack duration (start / end time)
├─ Attack vector (UDP flood, SYN flood, application layer?)
├─ Source IPs (if known)
└─ Mitigation effectiveness
Eclipse Attack:
Symptoms:
├─ Node isolated from legitimate peers
├─ Only connected to attacker-controlled peers
├─ Receives invalid blocks / false data
└─ Appears to be syncing but on wrong chain
Immediate actions (within 15 minutes):
1. Disconnect all peers
└─ Restart node with --reserved-only flag
2. Connect to known-good validators
└─ Use explicit --reserved-nodes list (trusted validators only)
3. Verify chain state
├─ Compare block hash with other validators
├─ Check finality (via telemetry or block explorer)
└─ Re-sync if on wrong fork
4. Alert network
└─ Telegram: "Eclipse attack detected on validator-{id}"
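A minimal sketch of steps 1-3 above, assuming a Substrate-style CLI (the --reserved-only and --reserved-nodes flags referenced above) and a local RPC port of 9933. The multiaddress, peer ID, and block height are placeholders; in production, add the flags to the pilier-node service file rather than running the binary by hand.
```bash
# Restart the node so it only talks to explicitly trusted validators.
# The multiaddress and peer ID below are placeholders for real trusted validators.
pilier-node \
  --validator \
  --reserved-only \
  --reserved-nodes /ip4/198.51.100.7/tcp/30333/p2p/12D3KooWExamplePeerId

# Verify chain state: the hash for a given height must match what other
# validators report (block height 123456 is illustrative; RPC port 9933 assumed).
curl -s -H "Content-Type: application/json" \
  -d '{"id":1, "jsonrpc":"2.0", "method":"chain_getBlockHash", "params":[123456]}' \
  http://localhost:9933
```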
3. Runtime Bug (Consensus-Breaking)
Definition: Critical bug in blockchain runtime causing network halt or invalid state.
Symptoms:
Network-wide:
├─ Finality stalled (no new finalized blocks)
├─ Validators producing conflicting blocks
├─ Invalid state transitions
└─ Nodes crashing repeatedly
Immediate actions (within 1 hour):
1. STOP validator node (if instructed by core team)
└─ Prevent further damage to chain state
2. Join emergency coordination
└─ Telegram: @pilier_validators_emergency
└─ Core team will provide instructions
3. Test proposed fix on local testnet
├─ Core team provides patched runtime
├─ Validator tests on isolated node
└─ Verify fix resolves issue
4. Coordinate upgrade
├─ All validators must upgrade simultaneously
├─ Agree on block height for activation
└─ Execute on schedule (no early/late upgrades)
Within 2 hours:
1. Execute emergency governance vote
├─ Proposal: "Emergency runtime upgrade to fix [bug]"
├─ Fast-track voting: 24-48 hours (instead of 14 days)
├─ Validators vote based on testnet results
└─ Requires 80% approval (high bar for emergency)
2. Deploy fix
├─ Update node binary
├─ Restart validator
├─ Verify network recovers
└─ Monitor for 24 hours
3. Document incident
└─ Post-mortem published within 7 days
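A minimal sketch of the "verify network recovers" check in step 2 above, assuming the standard Substrate JSON-RPC methods on localhost port 9933 (an assumption; adjust to your node).
```bash
# Confirm the patched runtime is active (specVersion should match the emergency release).
curl -s -H "Content-Type: application/json" \
  -d '{"id":1, "jsonrpc":"2.0", "method":"state_getRuntimeVersion", "params":[]}' \
  http://localhost:9933

# Confirm finality has resumed: the finalized head should change between two checks.
curl -s -H "Content-Type: application/json" \
  -d '{"id":1, "jsonrpc":"2.0", "method":"chain_getFinalizedHead", "params":[]}' \
  http://localhost:9933
sleep 60
curl -s -H "Content-Type: application/json" \
  -d '{"id":1, "jsonrpc":"2.0", "method":"chain_getFinalizedHead", "params":[]}' \
  http://localhost:9933   # a different hash here means finalization is moving again
```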
4. Hardware Failure
Definition: Critical hardware component failed (disk, RAM, CPU, network).
Symptoms:
Common failures:
├─ Disk failure (I/O errors, filesystem corruption)
├─ RAM failure (kernel panics, random crashes)
├─ Network card failure (no connectivity)
└─ Power supply failure (unexpected shutdowns)
Immediate actions (within 30 minutes):
1. Diagnose failure
├─ Check system logs: journalctl -xe
├─ Test hardware: smartctl (disks), memtest (RAM)
└─ Identify failed component
2. Failover to backup (if available)
├─ Switch DNS to backup server IP
├─ Sync blockchain data from snapshot/backup
├─ Start validator on backup hardware
└─ Estimate: 1-4 hours to restore
3. Alert other validators
└─ Telegram: "validator-{id} hardware failure, restoring from backup, ETA 2 hours"
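A minimal sketch of the diagnostics in step 1 above. The disk device name is a placeholder, and smartctl (from smartmontools) may need to be installed.
```bash
# Recent service and kernel errors
journalctl -xe --no-pager | tail -100
journalctl -k -p err --no-pager | tail -50      # kernel-level errors only

# Disk health (replace /dev/nvme0n1 with your actual device)
sudo smartctl -H /dev/nvme0n1
sudo smartctl -a /dev/nvme0n1 | grep -iE "error|wear|temperature"

# Filesystem / I/O errors and memory errors reported by the kernel
dmesg | grep -iE "i/o error|ext4|xfs" | tail -20
dmesg | grep -iE "mce|memory error" | tail -20  # a full RAM test (memtest86+) needs a reboot
```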
Within 6 hours:
1. If no backup: Emergency hardware replacement
├─ Order replacement part (same-day delivery if possible)
├─ Or rent temporary cloud server (OVH, Hetzner)
├─ Sync blockchain (may take 6-24 hours for full sync)
└─ Resume validation
2. Document downtime
├─ Failure timestamp
├─ Root cause (component failure)
├─ Restoration time
└─ Missed blocks / votes
3. Post-incident review
├─ Why no backup? (if applicable)
├─ How to prevent? (RAID, redundant PSU, monitoring)
└─ Update disaster recovery plan
5. Persistent Downtime (>10 Days)
Definition: Validator offline for more than 10 consecutive days without communication.
This triggers the governance removal process.
Validator obligations during extended downtime:
If you know you'll be offline >24 hours:
1. Notify other validators IMMEDIATELY
└─ Telegram: "validator-{id} will be offline [duration] due to [reason]"
2. Provide ETA for restoration
└─ "Expect to be back online by [date/time]"
3. Daily status updates (if downtime extends)
└─ "Still working on [issue], ETA now [new date]"
If downtime exceeds 10 days:
└─ Expect governance proposal: "Remove validator-{id} for persistent downtime"
└─ You can submit counter-proposal: "Extend grace period, validator returning [date]"
For other validators:
If peer validator offline >10 days with no communication:
1. Attempt contact (all channels)
├─ Email: validator-ops@entity.org
├─ Phone: Emergency contact number
├─ Social media: LinkedIn, Twitter (last resort)
└─ Document contact attempts (for governance proposal)
2. Submit removal proposal (if no response after 15 days)
├─ Evidence: On-chain telemetry (last heartbeat)
├─ Justification: Non-responsive, Charter violation
├─ Grace period: 30-day notice before removal
└─ Voting period: 14 days
3. Execute removal (if approved)
├─ Remove from session keys (runtime call)
├─ Archive validator data (for transparency)
└─ Redistribute block production among remaining validators
High Priority Incidents (24-Hour Response)
1. Performance Degradation
Symptoms:
Validator producing <90% of expected blocks:
├─ Expected: ~20% of blocks (if 5 validators)
├─ Actual: <18% of blocks
└─ Duration: >7 consecutive days
Actions (within 24 hours):
1. Diagnose root cause
├─ Check CPU usage: top, htop
├─ Check disk I/O: iostat, iotop
├─ Check network: ping latency, packet loss
├─ Check peers: how many connected? (should be 10+)
└─ Check logs: any errors? warnings?
2. Apply fixes
├─ If CPU bound: Upgrade to higher core count
├─ If disk I/O bound: Switch to NVMe SSD
├─ If network: Optimize firewall rules, switch ISP
├─ If peer issues: Add more bootnodes
└─ If logs show errors: Update node binary, clear cache
3. Monitor improvement
├─ Track block production rate (next 48 hours)
├─ Should return to >95% expected blocks
└─ If not: Consider hardware upgrade or hosting change
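A minimal sketch of the checks in step 1 above, assuming the standard Substrate system_health RPC on localhost port 9933, the sysstat package for iostat, and a placeholder peer hostname.
```bash
# CPU and load at a glance
top -b -n 1 | head -15

# Disk I/O saturation (iostat is in the sysstat package)
iostat -x 1 3

# Network latency and packet loss toward a known-good peer (placeholder hostname)
ping -c 20 validator2.example.org

# Peer count and sync state (standard Substrate system_health RPC; port 9933 assumed).
# Fewer than ~10 peers suggests a connectivity problem.
curl -s -H "Content-Type: application/json" \
  -d '{"id":1, "jsonrpc":"2.0", "method":"system_health", "params":[]}' \
  http://localhost:9933

# Warnings and errors from the node itself over the last hour
journalctl -u pilier-node -p warning --since "1 hour ago" --no-pager
```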
2. Missed Runtime Upgrade
Scenario: Network upgraded to new runtime, but validator still running old version.
Symptoms:
├─ Validator producing blocks, but they're being rejected
├─ "Invalid runtime version" errors in logs
├─ Finality participation drops to 0%
└─ Telemetry shows "outdated" status
Actions (within 6 hours):
1. Identify missed upgrade
├─ Check governance proposals: Was there a runtime upgrade?
├─ Check the network's current runtime version: query rpc.pilier.net and compare it to your node's
└─ Check Telegram announcements (core team posts upgrade notices)
2. Upgrade immediately
├─ Download latest binary: wget https://releases.pilier.net/v1.x.x
├─ Stop node: systemctl stop pilier-node
├─ Replace binary: mv pilier-node /usr/local/bin/
├─ Start node: systemctl start pilier-node
└─ Verify sync: check logs for successful block import
3. Apologize + document
├─ Telegram: "validator-{id} missed runtime upgrade, now fixed"
├─ Forum post: Explain why missed (monitoring gap? missed announcement?)
└─ Update procedures to prevent recurrence (subscribe to announcements)
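A minimal sketch of steps 1 and 2 above, assuming rpc.pilier.net accepts JSON-RPC over HTTPS, the local node listens on port 9933, and jq is installed; the release URL keeps the placeholder version from step 2.
```bash
# Compare the network's runtime version with your node's
curl -s -H "Content-Type: application/json" \
  -d '{"id":1, "jsonrpc":"2.0", "method":"state_getRuntimeVersion", "params":[]}' \
  https://rpc.pilier.net | jq '.result.specVersion'
curl -s -H "Content-Type: application/json" \
  -d '{"id":1, "jsonrpc":"2.0", "method":"state_getRuntimeVersion", "params":[]}' \
  http://localhost:9933 | jq '.result.specVersion'

# If the local specVersion is lower, upgrade the binary (release URL as in step 2)
wget https://releases.pilier.net/v1.x.x -O pilier-node
chmod +x pilier-node
sudo systemctl stop pilier-node
sudo mv pilier-node /usr/local/bin/pilier-node
sudo systemctl start pilier-node
journalctl -u pilier-node -f    # watch for successful block imports
```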
Medium Priority Incidents (7-Day Response)
1. Certificate Expiration
TLS/SSL certificates expire (if using HTTPS for RPC/telemetry).
Actions (within 7 days before expiry):
1. Renew certificate
├─ Let's Encrypt: certbot renew
├─ Or manual: Generate new CSR, get signed cert
└─ Update nginx/apache config
2. Restart web server
└─ systemctl restart nginx
3. Verify
└─ Check expiry: openssl s_client -connect validator.pilier.net:443
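A minimal sketch of the renewal and verification steps above, assuming certbot with nginx; the hostname is the one from step 3.
```bash
# Renew Let's Encrypt certificates (dry run first to catch configuration problems)
sudo certbot renew --dry-run
sudo certbot renew
sudo systemctl restart nginx

# Verify the new expiry date (-servername is needed when the server uses SNI)
echo | openssl s_client -connect validator.pilier.net:443 \
  -servername validator.pilier.net 2>/dev/null | openssl x509 -noout -enddate
```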
2. Monitoring Alerts Misconfigured
False positives or missing alerts.
Actions (within 7 days):
1. Review alert thresholds
├─ Too sensitive? (alerting on every minor spike)
├─ Too lax? (missed actual outage)
└─ Adjust in Prometheus / Grafana
2. Test alerts
└─ Simulate failure (stop node briefly, verify alert fires)
3. Document tuning
└─ Update monitoring runbook
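A minimal sketch of the alert test in step 2 above, assuming Prometheus runs locally on port 9090 and jq is installed; announce the test in the validator channel before stopping the node.
```bash
# Stop the node briefly to trigger the "node down" alert
sudo systemctl stop pilier-node
sleep 180                                   # wait longer than the alert's "for:" duration

# Confirm Prometheus now reports the scrape target as down (up == 0)
curl -s 'http://localhost:9090/api/v1/query?query=up' | jq .

# Restore service and confirm the alert resolves
sudo systemctl start pilier-node
```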
Low Priority Incidents (30-Day Response)
1. Routine Maintenance
Scheduled server updates, OS patches.
Actions (within 30 days):
1. Plan maintenance window
├─ Choose low-traffic period (weekends)
├─ Notify other validators 48 hours in advance
└─ Expected downtime: <1 hour
2. Execute maintenance
├─ apt update && apt upgrade (Ubuntu)
├─ Restart server if kernel updated
└─ Verify validator resumes normally
3. Document
└─ Log maintenance in runbook (for auditing)
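A minimal sketch of step 2 above for an Ubuntu host; the reboot-required marker file is standard Ubuntu behaviour.
```bash
# Apply pending updates (announce the maintenance window 48 hours in advance)
sudo apt update && sudo apt upgrade -y

# Reboot only if the kernel or core libraries require it
if [ -f /var/run/reboot-required ]; then
  sudo reboot
fi

# After the reboot (or immediately if none was needed), confirm the validator is healthy
sudo systemctl status pilier-node --no-pager
journalctl -u pilier-node --since "10 minutes ago" --no-pager | tail -20
```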
Emergency Contacts
24/7 Hotline
Critical incidents only (key compromise, network attack):
📞 Phone: +33 X XX XX XX XX
📧 Email: security@pilier.net
⏰ Response time: <30 minutes
Validator Communication
All validators:
💬 Telegram: @pilier_validators (private channel)
📧 Email: validators@pilier.org