IBM has had a rough week trying to get its cloudy Information Management System back up and running after multiple brownouts and outages.
IMS is IBM's hierarchical database and information management product for delivering critical information to applications.
The first signs of trouble showed up on Monday, April 1st, at 9:20PM UTC, when engineers noticed customer transactions, including provisioning and reloads, were being processed slower than usual.
Engineers traced the issue back to a large request that overwhelmed IBM's available resources. IBM halted the request and started clearing out the congestion; transactions appeared to be processing correctly an hour later, while engineers continued hunting for the root of the problem.
But the problem reappeared just 14 hours later, when IBM's internal teams detected an issue affecting IMS' critical backend systems, once again delaying customer transactions.
Engineers made changes to backend system configurations in an attempt to mitigate the issue for good, and performance started to improve…for a while.
While engineers were still investigating the root cause, customer transactions once again began processing slower than usual at 1:29AM UTC on Wednesday, April 3rd. Database administrators were called in to further modify backend configurations, and by 2:52AM UTC the modifications were complete.
IBM wasn't out of the woods yet.
By the time Thursday, April 4 rolled around, the issue had appeared once again, affecting critical backend systems and again causing transaction delays. Engineers completed corrective steps, and the impact was thought to be mitigated at 2:48AM UTC on the 4th.
But the issue returned nine hours later: errors reappeared, jobs got stuck at the transaction servers, and processing stalled for long stretches of time.
As of 10:10PM UTC on the 4th, just before the time of writing, the issue was ongoing, and IBM said its engineers were still looking for a root cause.
Good luck, IBM engineers!