Scheduler dropping freeze/unfreeze jobs
Incident Report for Merge Freeze
Resolved
First, an apology to our customers affected over the last week. Engineering teams use Merge Freeze to reduce complexity, and we failed to meet that promise between Feb 14 and Feb 21. Here's what happened:

1. Beginning Feb 14, our Scheduler queue ran slower than usual. In some cases this delayed batch freeze/unfreeze jobs; in others, the job was partially or completely dropped.

2. We shipped three remedies for this: additional Scheduler background queues (Feb 15), new "05" and "55" minute Scheduler resolutions in our frontend UI (Feb 17), and an upgrade from Redis 4 to Redis 6.2, as suggested to us by Heroku (https://status.mergefreeze.com/incidents/2f39nfz0pmms).

3. Our Redis upgrade (performed Feb 18, 00:00 UTC) helped resolve dropped cache keys, but also introduced a new issue: only the newest pull requests were frozen or unfrozen during batch operations.

4. Today (Feb 21) at approximately 21:00 UTC we identified the issue. The redis-rb client changed the behavior of its "exists()" method in v4.5.0 (https://github.com/redis/redis-rb/blob/master/CHANGELOG.md#450), and our application depended on the old behavior during batch iterations of pull requests. We tested and deployed updated syntax, which restored all functionality.
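
To illustrate how a change to "exists()" can break a batch loop, here is a minimal sketch. It is not the production code. `FakeRedis` and the `pr:` key names are hypothetical stand-ins, and the sketch assumes an `exists` that returns a count of matching keys alongside an `exists?` that returns a boolean. Because 0 is truthy in Ruby, a guard written for a boolean result stops filtering once the method returns an integer.

```ruby
# Hypothetical stand-in for a Redis client (NOT the real redis-rb class),
# so the example runs without a Redis server.
class FakeRedis
  def initialize(keys)
    @keys = keys
  end

  # Integer-returning form: how many of the given keys exist.
  def exists(*keys)
    keys.count { |k| @keys.include?(k) }
  end

  # Boolean form: true if at least one of the given keys exists.
  def exists?(*keys)
    exists(*keys) > 0
  end
end

# Hypothetical key names; only "pr:101" is present in the store.
redis = FakeRedis.new(["pr:101"])
pull_requests = ["pr:101", "pr:102", "pr:103"]

# Guard written for a boolean result. With an integer result, 0 is still
# truthy in Ruby, so every pull request passes the check.
buggy = pull_requests.select { |pr| redis.exists(pr) }

# Using the boolean predicate restores the intended filtering.
fixed = pull_requests.select { |pr| redis.exists?(pr) }

puts "integer guard selects: #{buggy.inspect}"
puts "boolean guard selects: #{fixed.inspect}"
```

The sketch only shows the general truthiness pitfall. The exact symptom in a real application depends on how the result is used inside the batch iteration.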

For additional details about this incident, or to provide any other feedback, please email hello@mergefreeze.com.
Posted Feb 21, 2023 - 21:54 UTC
Monitoring
In the last 24 hours we've received reports of Scheduler routines freezing or unfreezing only a fraction of open PRs. Our logs show more batch freeze requests than usual, so we've taken steps to scale our background queue workers.

As a possible immediate remediation, we've added new "05" and "55" minute resolutions to our Scheduler timepicker. We encourage customers experiencing dropped jobs to shift their freeze/unfreeze schedules by +/- 5 minutes, since on-the-hour schedules are the most common configuration.

We will continue monitoring our Scheduler queue, and will be upgrading our Redis stack, as Heroku suggested earlier this week:
https://devcenter.heroku.com/articles/heroku-redis-version-upgrade

I (Ryan Kulp, lead developer at Merge Freeze) apologize for this issue. Please subscribe to our Status Page for immediate updates on the situation.
Posted Feb 17, 2023 - 20:37 UTC