Back

 Industry News Details

 
Deleting Big Data: Which Option Works Best Posted on : Jul 19 - 2019

Big data is critical to innovation, but what happens when you need to delete some of it? Here's what you need to know about large deletes.

In a previous article, I outlined how to gain hidden insights into Microsoft SQL Server transaction logs through use of specific Dynamic Management Objects. I did not think that I’d have the opportunity to use the information provided in that article to identify a course of action to one of the technology professionals I mentor.

The situation was this: Her client was logging activity used for billing into a table that had never been trimmed of unneeded, older data. This table was not used for auditing (the rules around long-term storage of that data are quite strict compared to what we’re discussing here). Rather, it was used to stage data to their billing system for processing. The table was approximately 20Tb and held six years of data. The use of the system had been steadily growing along with the success of her client’s company, so earlier years had fewer records. She was tasked with deleting all data in the table but for the last 13 months, with about 75% of the records targeted for deletion. The goal was to accomplish this task with as little impact on the overall system as possible. The IT pro asked me about different options for ordered deletes, which piqued my curiosity. This seemed like the perfect opportunity to put the Dynamic Management Objects that I wrote about to good use, to look for a couple of key factors: impact on the transaction log and spills to tempdb for processing.

Setting up the Test

I was interested in vetting out the options for “ordered” deletes, looking at both ordered deletes directly against tables and those same delete processes using views built on top of those tables. To add a twist to this test, I wanted to look at the differences between when an ordered delete is based on the clustering key or when it is based on a column that is specifically not the clustering key. View More