Project Health Monitoring

Running Services (PM2)

Service                  | Purpose                   | Expected Behavior                  | Health Check
tg-collector             | Collect Telegram messages | Should be online 24/7, no restarts | ✅ Running
forwarder                | Forward picks             | Should be online 24/7, no restarts | ✅ Running
router                   | Route messages            | Should be online 24/7, no restarts | ✅ Running
discord-forwarder-python | Forward to Discord        | Should be online 24/7, no restarts | ✅ Running
discord-forwarder-leaks  | Leaks forwarding          | Should be online 24/7, no restarts | ✅ Running
hiddenbag-bot            | HiddenBag Telegram bot    | 7 restarts - MONITOR               | ⚠️ Restarts
hiddenbag-web            | HiddenBag Next.js site    | 18 restarts - MONITOR              | ⚠️ Restarts
ai-tools-scheduler       | AI Tools HQ scheduler     | Should be online, no restarts      | ✅ Running
dailyai-picks            | Daily AI picks            | STOPPED - intentional?             | ⏸️ Stopped

Last Health Check

  • Date: 2026-01-28 11:33 EST
  • Overall Status: ✅ All stable

Diagnostic Results (from logs)

hiddenbag-bot (7 restarts earlier)

  • Cause: Supabase requests failed intermittently with "TypeError: fetch failed" (transient network errors)
  • Current: ✅ STABLE - 900+ minutes of uptime, working normally
  • This is expected behavior: the bot retries, and PM2 restarts it if it crashes
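The retry behavior described above can be sketched as exponential backoff around a flaky network call. This is a minimal illustration, not the bot's actual code: the function name retry_with_backoff, the attempt count, and the delays are assumptions, and ConnectionError stands in for whatever error the real client raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on ConnectionError with doubling delays.

    Hypothetical helper: names and defaults are illustrative only.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                # Out of retries: re-raise so the process exits and the
                # process manager (PM2 here) can restart it.
                raise
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

With retries like this, a short network blip is absorbed inside the process and only a longer outage shows up as a PM2 restart.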

hiddenbag-web (18 restarts earlier)

  • Cause: Port 3000 conflict. Process 14632 is using port 3000, so the site runs on 3001
  • Cause: Next.js dev-mode recompiles (these are not crashes)
  • Current: ✅ WORKING - serving pages normally
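A port conflict like the one above can be confirmed before starting the site. The sketch below probes a local TCP port with the standard socket module; the helper names port_in_use and first_free_port are hypothetical, and the candidate ports are taken from the note (3000, 3001).

```python
import socket
from typing import Optional, Sequence


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising.
        return probe.connect_ex((host, port)) == 0


def first_free_port(candidates: Sequence[int]) -> Optional[int]:
    """Return the first candidate port nothing is listening on, or None."""
    for port in candidates:
        if not port_in_use(port):
            return port
    return None


if __name__ == "__main__":
    # Ports from the diagnostic above; adjust as needed.
    print(first_free_port([3000, 3001]))
```

This only tells you that a port is taken, not which process holds it; finding the owner (process 14632 in this case) still needs an OS tool.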

dailyai-picks (stopped)

  • Status: ✅ BY DESIGN
  • Runs on a PM2 cron schedule: */30 * * * * (every 30 minutes)
  • It starts, processes, and stops, so a stopped status between runs is correct.
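In standard cron syntax, */30 in the minute field fires at minute 0 and minute 30 of every hour. The sketch below computes the next expected run and flags a job whose last run is too old, which is the case where "stopped" would stop being healthy. The function names and the 5-minute grace period are assumptions for illustration.

```python
from datetime import datetime, timedelta


def next_run_every_30_min(now: datetime) -> datetime:
    """Next firing time strictly after `now` for the schedule */30 * * * *."""
    base = now.replace(second=0, microsecond=0)
    return base + timedelta(minutes=30 - base.minute % 30)


def is_overdue(
    last_run: datetime,
    now: datetime,
    interval: timedelta = timedelta(minutes=30),
    grace: timedelta = timedelta(minutes=5),  # assumed tolerance
) -> bool:
    """True if the job has not run within one interval plus the grace period."""
    return now - last_run > interval + grace
```

For the health check timestamp in this note (11:33), the next run would be expected at 12:00.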

What To Check

Every 4 Hours (Cron: pm2-health-check)

  1. All core services online (tg-collector, forwarder, router, discord-forwarder-*)
  2. Restart count - alert if >3 in 24h
  3. Check logs for errors
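The first two checks above could be automated roughly as follows. This is a sketch under stated assumptions, not the actual pm2-health-check job: it reads the JSON printed by `pm2 jlist` and relies on the `name`, `pm2_env.status`, and `pm2_env.restart_time` fields, which should be verified against the installed PM2 version. Because the restart counter is cumulative, the 24-hour rule is approximated by comparing against a snapshot the caller saved earlier.

```python
import json
import subprocess
from typing import Dict, List, Optional

# Core services named in the checklist above.
CORE_SERVICES = (
    "tg-collector",
    "forwarder",
    "router",
    "discord-forwarder-python",
    "discord-forwarder-leaks",
)
RESTART_ALERT_THRESHOLD = 3  # alert if more than 3 restarts since the snapshot


def load_pm2_processes() -> List[dict]:
    """Return the process list from `pm2 jlist` (assumes clean JSON on stdout)."""
    result = subprocess.run(
        ["pm2", "jlist"], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


def restart_counts(processes: List[dict]) -> Dict[str, int]:
    """Map each process name to its cumulative restart counter."""
    return {p["name"]: p["pm2_env"]["restart_time"] for p in processes}


def find_problems(
    processes: List[dict],
    previous_restarts: Optional[Dict[str, int]] = None,
) -> List[str]:
    """List core services that are not online and any restart spikes."""
    previous_restarts = previous_restarts or {}
    by_name = {p["name"]: p for p in processes}
    problems = []
    for name in CORE_SERVICES:
        proc = by_name.get(name)
        if proc is None:
            problems.append(f"{name}: not registered in PM2")
        elif proc["pm2_env"]["status"] != "online":
            problems.append(f"{name}: status is {proc['pm2_env']['status']}")
    for name, count in restart_counts(processes).items():
        # Services without an earlier snapshot get a delta of zero.
        delta = count - previous_restarts.get(name, count)
        if delta > RESTART_ALERT_THRESHOLD:
            problems.append(f"{name}: {delta} restarts since the previous snapshot")
    return problems
```

A scheduled run would call find_problems(load_pm2_processes(), saved_snapshot) and then store restart_counts(...) as the snapshot for the next comparison.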

Daily

  1. Verify messages flowing (check last forward timestamp)
  2. Check disk space
  3. Review error logs
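Two of the daily checks lend themselves to a small script. The sketch below measures free disk space with shutil.disk_usage and flags a stale forward timestamp. Where the last forward timestamp comes from is not stated in this note, so it is passed in as a parameter; the 6-hour limit is an assumed placeholder.

```python
import shutil
from datetime import datetime, timedelta


def disk_free_fraction(path: str = "/") -> float:
    """Fraction of the filesystem containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def forwarding_is_stale(
    last_forward: datetime,
    now: datetime,
    max_age: timedelta = timedelta(hours=6),  # assumed limit, tune per service
) -> bool:
    """True if no message has been forwarded within `max_age`."""
    return now - last_forward > max_age
```

For example, disk_free_fraction("/") < 0.10 could be treated as a low-disk warning, with the cutoff chosen to suit the host.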

Weekly (Sunday)

  1. Full project review with Matt
  2. Update this health status
  3. Clear old logs
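Clearing old logs could look like the sketch below, which lists *.log files older than a cutoff and deletes them only when dry_run is turned off. The 7-day cutoff and the helper names are assumptions. PM2 writes to ~/.pm2/logs by default, but confirm the directory on this host before pointing the script at it.

```python
import time
from pathlib import Path
from typing import Iterable, List, Optional


def find_old_logs(
    log_dir: Path, max_age_days: int = 7, now: Optional[float] = None
) -> List[Path]:
    """Return *.log files in log_dir last modified more than max_age_days ago."""
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    return sorted(
        p for p in log_dir.glob("*.log")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def remove_logs(paths: Iterable[Path], dry_run: bool = True) -> int:
    """Delete the given files unless dry_run is set; return how many matched."""
    count = 0
    for path in paths:
        if not dry_run:
            path.unlink()
        count += 1
    return count
```

Files that a running service still has open are better emptied with `pm2 flush` than deleted, since the process keeps writing to the old file handle.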

Improvement Tracking

Things That Can Be Better

Area          | Issue                    | Proposed Fix                | Status
HiddenBag     | Too many restarts        | Check logs, fix crash cause | TODO
dailyai-picks | Stopped                  | Ask Matt if intentional     | TODO
Context loss  | Compaction loses context | Better memory logging       | In Progress

Lessons Learned

  • Always write to memory files immediately
  • Don't rely on session context - it can compact anytime
  • Log Matt's exact words for reference
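One way to apply these lessons is to append each note to a file the moment it is made and force it to disk, so nothing depends on session context surviving. This is a sketch only: the function name append_memory_note and the line format are assumptions, and the file path is left to the caller.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def append_memory_note(path: Path, note: str, speaker: str = "") -> str:
    """Append one timestamped line to the memory file and sync it to disk.

    `note` is stored verbatim so exact wording can be quoted later.
    """
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    prefix = f"{speaker}: " if speaker else ""
    line = f"[{stamp}] {prefix}{note}\n"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(line)
        handle.flush()
        os.fsync(handle.fileno())  # make sure the note survives a crash
    return line
```

Appending instead of rewriting keeps earlier notes intact even if a later write is interrupted.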

Notes

Updated automatically by Damian during health checks.