So today I discovered that a cron job holding non-reproducible state has died, and now our system is fucked.
The cron job isn’t in source control. This morning it entered a terminal state, and because it overwrites its own state, there’s no way to revert it.
I’m currently waiting for the database rollback, and in the meantime I have rewritten the job in a reproducible/idempotent way.
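For illustration only (this is not the actual job): a minimal sketch of what "reproducible/idempotent" could look like for a job that writes state. The function name run_job, the file names state and state.prev, and the sort -u step standing in for the real work are all assumptions. The idea is to derive the state from the input alone, keep the previous copy, and swap the new file in with a rename so a crash never leaves a half-written state.

```shell
# Hypothetical job: run_job STATE_DIR INPUT_FILE
# All names here are placeholders, not the real job from the post.
run_job() {
    state_dir=$1
    input=$2
    mkdir -p "$state_dir" || return 1

    # Build the new state in a temp file in the same directory,
    # so the final mv is a rename on one filesystem.
    tmp=$(mktemp "$state_dir/state.XXXXXX") || return 1

    # Placeholder for the real work: the state depends only on the input,
    # never on the previous state, so reruns give the same result.
    sort -u "$input" > "$tmp" || { rm -f "$tmp"; return 1; }

    # Keep the last good state so there is something to revert to.
    if [ -f "$state_dir/state" ]; then
        cp -p "$state_dir/state" "$state_dir/state.prev" || { rm -f "$tmp"; return 1; }
    fi

    mv "$tmp" "$state_dir/state"
}
```

Running it twice on the same input leaves the same state file, and state.prev holds whatever was there before the last run.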
We never had our crons in source control, but I always saved a copy of the crontab somewhere (usually on my machine and the target machine), so we had some history in case someone typed crontab -r instead of crontab -l for some reason. You can also create an alias called backupCrontab or something that runs crontab -l for you and puts the output somewhere safe.
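A rough sketch of such a helper, written as a shell function rather than an alias so it can build a timestamped file name. The name backupCrontab is from the suggestion above; the CRONTAB_BACKUP_DIR variable and the default directory are assumptions, so adjust them to wherever "somewhere safe" is for you.

```shell
# Hypothetical helper: dumps the current user's crontab to a timestamped file.
# CRONTAB_BACKUP_DIR and the default path are assumptions.
backupCrontab() {
    backup_dir=${CRONTAB_BACKUP_DIR:-"$HOME/crontab-backups"}
    mkdir -p "$backup_dir" || return 1

    out="$backup_dir/crontab.$(date +%Y%m%d-%H%M%S)"

    # crontab -l only lists the crontab; -r is the one that removes it.
    if crontab -l > "$out"; then
        echo "$out"      # print where the backup went
    else
        rm -f "$out"     # no crontab or an error: don't keep an empty file
        return 1
    fi
}
```

Put it in your shell startup file and run backupCrontab before touching the crontab. Copying the resulting file to a second machine gives you the same two-location history described above.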