Non-Malicious Amazon DynamoDB Data Deletion Can Occur!
Are you your company's main information technology head? Are you the person responsible for maintenance, upgrades, and making sure all files are accessible to your employees when they need them? Now imagine that, due to fatigue or distraction, you inadvertently lose 300 GB of data in an Amazon DynamoDB table, and you have no way to recover the data you are expected to protect.
Of course, you are telling yourself that this scenario would never happen. You would be careful and diligent in your tasks. Every IT administrator says that. Even non-IT personnel sit at their computers thinking about how careful they will be when working on an important project. However, the exact thing you’d never do has happened to many, including GitLab.
The GitLab Incident
On January 31, 2017, GitLab came under attack by spammers. According to the company's reports, the attack began at around 6 pm, US Central Time, when spammers started hammering the database and making it unstable. The team at GitLab worked hard to stop the spammers and restore stability to the database.
The GitLab team was able to block the spammers and remove a user who had 47,000 IPs using the same account. However, the spam attack was not the only incident the group encountered that evening. At around 10 pm, the support team was notified that database replication was lagging so far behind that it had effectively stopped. The lag was due to “a spike in writes that were not processed on time by the secondary database.”
Once again, the team set out to fix the problem. Upon review, it appeared that the second database was lagging by only about 4 GB. While determining the cause of the lag, a team member noticed that db2.cluster was refusing to connect to and replicate from database one. The team member attempted to correct the communication error between the two databases by adjusting the number of WAL clients. This correction led to a new error that prevented the replication process from starting.
After several more adjustments, team members were still unable to get the second database to replicate. Upon further review, the team member came to believe the problem lay in the PostgreSQL data directory, which was present but empty. While attempting to delete this directory, the team member ran the delete command on database one instead of database two. Although he discovered his error within seconds, only about 4 GB of data remained in the database. The database had originally held around 300 GB.
Recovering the information was not an easy task. GitLab had five backup/replication techniques in place for catastrophic events like this one. However, these were either not set up properly in the first place or not working reliably. While the team was able to restore most of the data, it was around six hours old. Had the team member not run a manual backup before the outage, the data would have been at least 24 hours old.
Seamless Daily Backup of Amazon DynamoDB
As you can see from GitLab’s outage, these accidents happen. A tired worker trying to fix a string of problems can make a fatal error that results in the loss of hundreds of gigabytes of data. There is a way to ensure you never lose any of your data, and that is by using our Amazon DynamoDB backup service.
While Amazon works hard to make its DynamoDB service fully integrated for use across multiple platforms, not every tier offers backup solutions to go with the service. As with the Amazon SimpleDB platform, you are responsible for backing up your data to protect against a catastrophic error.
We know you may have a variety of tables, domains, and regions attached to your Amazon DynamoDB account. That is why our service allows you to choose which tables and domains to back up and at what intervals your information is saved. By default, we back up your databases daily. You can change this to every few days or even once a week. However, we recommend that you back up your changes daily.
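To make the idea of per-table intervals concrete, here is a minimal Python sketch of how such a schedule could be modeled. It is only an illustration: `BackupPolicy`, `is_due`, and `tables_due` are hypothetical names made up for this example, not the service's actual configuration format or API, and the table and region values are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class BackupPolicy:
    """Hypothetical per-table policy; not the service's real config format."""
    table: str
    region: str
    interval_days: int = 1  # daily by default, matching the default described above
    last_backup: Optional[datetime] = None


def is_due(policy: BackupPolicy, now: datetime) -> bool:
    """A table is due if it was never backed up or its interval has elapsed."""
    if policy.last_backup is None:
        return True
    return now - policy.last_backup >= timedelta(days=policy.interval_days)


def tables_due(policies: List[BackupPolicy], now: datetime) -> List[str]:
    """Return the names of the tables that should be backed up right now."""
    return [p.table for p in policies if is_due(p, now)]


# Example: one daily table, one weekly table, and one that was never backed up.
example_policies = [
    BackupPolicy("orders", "us-east-1", last_backup=datetime(2017, 1, 31, 12, 0)),
    BackupPolicy("logs", "us-east-1", interval_days=7,
                 last_backup=datetime(2017, 1, 30, 12, 0)),
    BackupPolicy("users", "eu-west-1"),
]
due_now = tables_due(example_policies, datetime(2017, 2, 1, 12, 0))
```

In this example the daily table and the never-backed-up table come due, while the weekly table waits. A longer interval means fewer backups, but also a larger window of changes that could be lost, which is why the daily default is the safer choice.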
Restoration is a simple one-click solution. As with our other backup options, you can restore the information to the same domain/region it came from, or you can restore it to a new domain/region. However, restoring to the same domain/region the data was backed up from is a destructive restoration, meaning it will replace everything in the domain. Restoring to a new domain/region is a non-destructive restoration.
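The difference between the two restoration modes can be sketched with a small in-memory model in Python. This is an assumption-laden illustration, not how the service is implemented: the `restore` function, the dictionary-based "store", and the sample region and table names are all hypothetical.

```python
from typing import Dict, Tuple

Location = Tuple[str, str]   # (region, table or domain name)
Items = Dict[str, dict]      # item key -> item attributes


def restore(store: Dict[Location, Items], backup: Items, target: Location) -> bool:
    """Copy `backup` into `target`; return True if existing data was replaced."""
    destructive = target in store
    # The target ends up holding exactly the backed-up items. Anything written
    # to an existing target after the backup was taken is dropped.
    store[target] = {key: dict(attrs) for key, attrs in backup.items()}
    return destructive


backup = {"order-1": {"status": "paid"}, "order-2": {"status": "new"}}
live = {
    ("us-east-1", "orders"): {
        "order-1": {"status": "refunded"},  # changed after the backup
        "order-3": {"status": "new"},       # created after the backup
    }
}

# Non-destructive: a new location is created and the live table is untouched.
restored_elsewhere = restore(live, backup, ("us-west-2", "orders-restored"))

# Destructive: the live table is overwritten, so order-3 disappears.
restored_in_place = restore(live, backup, ("us-east-1", "orders"))
```

When in doubt, restoring to a new location first lets you inspect the recovered data before deciding whether to overwrite the original.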