A Production Celery Worker Freeze Caused by epoll

Back in mid-2019, while I was working on the Data Platform team at Krom, I ran into a problem that stayed with me long after it was fixed. Our Celery workers would occasionally stop processing tasks without any obvious signal. Supervisor still showed them as RUNNING, with uptimes measured in days, yet they processed no tasks at all. Data pipelines quietly stalled, alerts never fired, and the only clue was an occasional Broken pipe buried in the logs. ...

Jan 28, 2026 · Duy Do