How much data should companies keep? What should be stored, and what should be deleted?
For many companies, these questions are asked and addressed regularly. For others, they have never been addressed. For some, what’s known as data retention is an automated process governed by a well-developed policy.
In the not-so-distant past, data retention may have been an afterthought. Companies didn’t have access to as much data, and certainly not as much sensitive information.
These days, however, companies collect more data than they use, and they don’t always properly dispose of what they no longer need. With data breaches proliferating, keeping more data than necessary becomes a liability.
According to an early 2020 global survey sponsored by Seagate, some 68% of enterprise data is not leveraged by companies. Some benchmarks estimate that 75% of this over-retained data contains personally identifiable information, which poses great legal risk to the data holder.
Data classification and retention
Google “data retention,” and you’ll find a host of organizations claiming expertise and competency on the subject. Simply put, data retention is the storage of data. A data retention policy defines what’s stored, how it’s stored, for how long and under what circumstances.
“Having a documented data classification and retention policy is a requirement for responsible data ownership of a company,” said Michael Beltz, senior director of Cloud Operations & Engineering for Upland Software.
According to Beltz, it all starts with identifying the type of data you have.
“The first important part of a data retention policy is data classification,” he said. “It is important to know not only where data is located, but what type of data is contained within that location. Personal identifiable information, credit card or payment information should be treated as classified and highly protected.
“Other information that would give away customer trade secrets or would be considered restricted must also be identified and be protected,” he added.
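The classification step Beltz describes can be pictured with a small sketch. It is only an illustration: the three sensitivity levels, the keyword rules and the function names classify_field and build_inventory are assumptions made for this example, not Upland’s actual scheme, and a real inventory would inspect the data itself rather than field names.

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 1
    RESTRICTED = 2   # e.g. customer trade secrets
    CLASSIFIED = 3   # e.g. PII, credit card or payment information


# Hypothetical keyword rules; a real policy would be far more thorough.
RULES = {
    Sensitivity.CLASSIFIED: ("ssn", "email", "phone", "card", "payment"),
    Sensitivity.RESTRICTED: ("contract", "pricing", "formula"),
}


def classify_field(field_name: str) -> Sensitivity:
    """Return the highest sensitivity level whose keywords match the field name."""
    name = field_name.lower()
    for level in (Sensitivity.CLASSIFIED, Sensitivity.RESTRICTED):
        if any(keyword in name for keyword in RULES[level]):
            return level
    return Sensitivity.PUBLIC


def build_inventory(locations: dict[str, list[str]]) -> dict[str, dict[str, Sensitivity]]:
    """Map each data location to the sensitivity of every field it holds."""
    return {
        location: {field: classify_field(field) for field in fields}
        for location, fields in locations.items()
    }
```

The point of the inventory is the one Beltz makes: it records both where data lives and what kind of data sits in each location.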
Upland not only sells business software, but also acquires software companies, so Beltz knows all about data collection and storage and the associated liability. The Texas-based company keeps only current customer data and protects that data with encryption. Encryption, he said, is a means for slowing hackers down, not stopping them.
“We have to assume that encryption gets broken all the time,” he said.
Upland also has access controls and tracking in place to secure users’ administrative accounts, which can also get hacked, he said. “It doesn’t stop with just our system,” Beltz said.
When a customer leaves, that customer’s data is automatically deleted within 30 days, and Upland will produce a certificate of data destruction upon request.
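A deletion rule like this one is simple to express in code. The sketch below is not Upland’s implementation; it treats the 30-day mark as a deadline and flags former customers who have reached it. The function names and the wording of the certificate are invented for illustration.

```python
from datetime import date, timedelta

PURGE_WINDOW = timedelta(days=30)  # the 30-day window mentioned above


def due_for_purge(departures: dict[str, date], today: date) -> list[str]:
    """Return IDs of former customers whose 30-day deadline has arrived."""
    return sorted(
        customer_id
        for customer_id, left_on in departures.items()
        if today - left_on >= PURGE_WINDOW
    )


def destruction_certificate(customer_id: str, purged_on: date) -> str:
    """Produce a plain-text statement that a customer's data was destroyed."""
    return f"Data for customer {customer_id} was destroyed on {purged_on.isoformat()}."
```

A scheduled job could call due_for_purge daily, so that no one has to remember which accounts have lapsed.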
Upland’s acquired companies are not always so fastidious. “Of the 30 or so companies we’ve acquired, two had a data retention policy,” Beltz said.
Data retention cost and risk
The argument for a defined data retention policy can be boiled down to cost and risk, and they’re interconnected. First, on cost: while data storage has become more affordable, storing more than you need still affects the bottom line.
Large databases take up more disk space, which comes at a cost, said LinkIt! chief technology officer Karen Winter. Backups also take longer when you have more data than you need, she said.
The second argument for cost is that the more data you have, the slower your queries will run or, as Mike Walsh, CEO of Straight Path Solutions, put it: “The bigger your junk drawer, the harder it is to find that one thing you need.”
Walsh, whose New Hampshire-based company provides database management services to companies around the country, gave the example of a company keeping data from a 1989 invoice.
“That invoice you’ve kept from 1989 can affect performance in 2023,” he said. It’s much more effective to aggregate data, so you can access sales figures from past years without storing detailed data, he said.
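Walsh’s suggestion to aggregate can be sketched as follows. This is a minimal example under assumed conditions: invoices are plain dictionaries with year and amount keys, and the cutoff year is whatever the retention policy dictates. None of these names come from Straight Path Solutions.

```python
from collections import defaultdict
from decimal import Decimal


def summarize_and_trim(invoices: list[dict], keep_from_year: int):
    """Roll invoices older than keep_from_year into yearly totals.

    Returns (yearly_totals, remaining_invoices). Older invoices survive only
    as one total per year; newer ones are kept in full detail.
    """
    totals: dict[int, Decimal] = defaultdict(Decimal)
    remaining = []
    for invoice in invoices:
        if invoice["year"] < keep_from_year:
            totals[invoice["year"]] += invoice["amount"]
        else:
            remaining.append(invoice)
    return dict(totals), remaining
```

After a pass like this, past sales figures are still available as a handful of yearly totals, while the detailed rows that slow down queries are gone.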
Beltz noted that slow queries and reporting often prompt company management to consider purging unnecessary data. “System performance usually drives data retention,” he said.
Risk could be listed as simply another component of cost, because carrying a higher risk of security breaches and legal action means gambling with your bottom line. Beltz advised considering the risk of a data breach by asking, “What happens if every byte of your data becomes public? What’s the risk? What’s the damage done?”
Thinking of your data in this way should fuel your resolve to delete unnecessary data, because the more irrelevant data you have, the bigger your risk.
“The more stuff you have around, the more you can lose track of it,” said Walsh. Put another way: “The more you have, the less likely you are to have it managed well.”
Why do you need well-managed data? Because data breaches — reputation-damaging, expensive mishaps for companies — happen all the time.
“The risk of keeping data around longer than it should be retained only increases the risk that in a breach additional data is captured,” Beltz pointed out.
Storing too much data also means higher costs to rectify a security breach. If you have 100 current customers, but you’re storing data for 1,000 because you’ve not deleted former customers’ data, your potential data breach costs a lot more because you have more breached data to identify, more customers to notify and, potentially, higher legal fees in the event of legal action.
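The arithmetic behind that example is easy to check. The sketch below assumes, purely for illustration, that breach costs scale linearly with the number of customers whose data is exposed; the per-record cost is a placeholder the reader supplies, not a figure from any survey.

```python
def breach_exposure(current: int, former_retained: int, cost_per_record: float) -> dict:
    """Compare breach exposure with and without retained former-customer data."""
    if current <= 0:
        raise ValueError("current must be a positive number of customers")
    total = current + former_retained
    return {
        "records_exposed": total,
        "notifications_avoidable": former_retained,
        "cost_if_purged": current * cost_per_record,
        "cost_as_is": total * cost_per_record,
        "multiplier": total / current,
    }
```

With 100 current customers and 900 former ones still on file, the multiplier is 10: under this linear assumption, the same breach costs ten times what it would have if the old data had been purged.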
Walsh gave the example of electronic discovery systems that automate the legal process of discovering evidence. The more data you have, the harder and longer these systems must work.
“If you have 20 years of data, it gets expensive to look through,” he said. Also, “whatever you have, you have to produce,” he said.
For example, if an automotive company has a legal issue with braking systems in a certain model of car, every piece of data related to brakes on that vehicle must be turned over in discovery.
The golden rule
If cost and risk don’t compel a company to pay attention to data retention and purging policies, perhaps conscience will, said Beltz. When we, as consumers, share personal data with a company, don’t we expect that company to use it responsibly?
“The golden rule seems to apply,” he said. “If I was a customer of the company I oversaw, how would I want my personal data handled and when would I want it to be removed? Use it when you need it and remove it as soon as it is no longer in use seems to be a good general rule.”
What to keep, what to purge
“Start with the idea that ‘I only keep what I need,’” said Beltz. That means when customers no longer do business with you, you delete their data. It also means creating a policy for historical data, which could include emails, transaction records and more.
This is where knowing the laws that govern your company’s specific type of data comes in. These laws may specify not only what you need to purge, but the need to keep certain data for a specific amount of time. “Deleting more than you’re supposed to can get you in legal trouble, too,” said Walsh.
Case in point: LinkIt!, a New York-based company that provides student data collection, management and reporting services for school districts, employs a chief information security officer to ensure compliance with privacy and confidentiality, and follows strict policies on collecting and providing access to data. But LinkIt! doesn’t typically purge data unless a customer terminates a contract. That’s because state laws require storage of student testing data for seven to 10 years.
LinkIt! did alter its data privacy policies in recent years because of a new product launch.
“I didn’t have any real concerns about data privacy until two years ago,” Winter, who serves as chief technology officer, said. “We had no PII until then.”
LinkIt! added to its traditional academic testing data services an intervention management system that integrates academic performance with measures like family income, address, medical issues and more. “It’s a lot of personal data,” Winter said, and it prompted the company to start putting disclaimers on reports that put the onus on customers for the privacy of any information that’s gleaned from the system and potentially disseminated.
Winter said the LinkIt! sales team often wants to keep data from customers who terminate services, because those customers often renew when funding comes through or when administration personnel change. However, the company does have a scripted process to delete customers’ data two weeks after termination of a contract.
“We give their files to them on a disk and purge it from our system,” Winter said.
Types of data and data-related laws
When most people think of data breaches, they think of what’s known as personally identifiable information, or PII. This includes any information that, when used alone or in combination with other data, can identify an individual. Examples include full names, Social Security numbers, medical records, financial information, geographic information and even easily accessible information like place of birth, gender or race.
Beltz pointed out that just two pieces of information — for example, phone number and email — could reveal a person’s identity. So it doesn’t take much to breach PII data security.
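One way to see how quickly combined fields narrow down to a single person is to count how many records each combination singles out. The sketch below is illustrative only; the function name and the sample field names used with it are made up.

```python
from collections import Counter


def uniquely_identified(records: list[dict], fields: tuple[str, ...]) -> float:
    """Return the share of records whose combination of `fields` is unique."""
    if not records:
        return 0.0
    # Count how often each combination of values occurs across all records.
    combos = Counter(tuple(record[f] for f in fields) for record in records)
    unique = sum(
        1 for record in records
        if combos[tuple(record[f] for f in fields)] == 1
    )
    return unique / len(records)
```

Run against a data set with one field at a time and then with two together, the share of uniquely identified records typically rises, which is why seemingly harmless fields still need protection.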
If you’re doing business in Europe, you’ll need to follow the strict standards of the General Data Protection Regulation, or GDPR. For example, the GDPR requires all organizations to store personal data only for as long as it’s needed.
The United States, on the other hand, has no comprehensive federal data privacy or retention law. For example, no general federal statute obliges companies to notify consumers of a breach involving their personal data (that duty comes from state breach notification laws), and in most states companies can sell or share consumer data without any permission or notification. Many companies publicize and follow their own data privacy standards, but federal law does not generally require them to do so.
Instead, a patchwork of federal laws regulates use of specific types of data, including personal, financial, health and other categories. The Privacy Act of 1974 governs federal agencies’ use of personal data. The Gramm-Leach-Bliley Act of 1999 requires financial institutions to have policies that safeguard personal data.
Other laws like the Health Insurance Portability and Accountability Act of 1996 govern how health information can be collected and used. Still other federal laws, like the Children’s Online Privacy Protection Act of 1998, restrict how data collected about children under 13 can be used.
The Sarbanes-Oxley Act, enacted in 2002 as a response to scandals at companies like Enron and WorldCom, governs corporations’ financial record keeping and reporting. Similar regulations in Canada, Germany, France and several other countries followed this U.S. law.
Beyond federal regulations, some states have specific data privacy laws. The California Consumer Privacy Act strictly regulates how businesses collect and use consumer data. For example, it specifically calls out the right of consumers to know what personal data businesses collect and who they sell it to, and to opt out of that sale. Similar legislation is in effect in Virginia, Colorado, Connecticut and Utah, and other states have either passed or introduced similar bills. You can track state data privacy laws using the International Association of Privacy Professionals legislation tracker.
Some states have laws governing how to handle specific types of data. The Illinois Biometric Information Privacy Act protects individuals’ biometric data, such as fingerprints and face scans. Missouri’s e-book privacy rules govern how libraries use their records. These laws aren’t included in the IAPP tracker, but the National Conference of State Legislatures is a good resource for researching state-specific data privacy laws.
Developing a data retention policy
With all the other system-related endeavors an information technology department must tackle, data retention and classification often fall by the wayside.
“It’s the last thing you want to spend your time on,” Winter said. But it’s necessary. “I try to build processes that self-purge,” she said. “It’s important to build into your processes from the beginning,” just as you design data privacy and security from the beginning.
Beltz agreed that it’s easier to architect data retention policies into a system ahead of time, but said it’s not always done that way. Individuals often develop applications without ever thinking about the long-term implications of data collection, he said. So, sometimes, data retention policies must be retrofitted.
When developing a data retention policy, Beltz said it’s important to get all stakeholders involved. “Creating a data retention policy can be done by a governance team, but executing the policy, finding and securing the data, requires the help of a security and compliance team, the product development team comprised of business stakeholders, software architects and system and database administrators to ensure all data locations are found and secured.”
Start by classifying your data, Beltz said.
“I would first identify the type of data you collect from your customers and then look for any state or federal laws around such data,” he said. “While some data needs to be removed once customers leave, some regulations require keeping records for a certain amount of time. Hiring a firm or doing good solid research into data classifications and regulations will provide a good start.”
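The tension Beltz describes, between data that must go and data that must stay, can be captured in a retention schedule. The sketch below is hypothetical throughout: the data classes, the periods and the decide function are placeholders, and real values must come from the legal research he recommends.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    keep_at_least: timedelta            # legal minimum; never purge before this
    purge_after: Optional[timedelta]    # None means no purge deadline is set


# Hypothetical schedule; real periods must come from legal research.
SCHEDULE = {
    "customer_profile": RetentionRule(timedelta(0), timedelta(days=30)),
    "test_results": RetentionRule(timedelta(days=7 * 365), timedelta(days=10 * 365)),
}


def decide(data_class: str, closed_on: date, today: date) -> str:
    """Return 'hold', 'eligible' or 'purge' for data whose account closed on closed_on."""
    rule = SCHEDULE[data_class]
    age = today - closed_on
    if age < rule.keep_at_least:
        return "hold"       # deleting now could violate a retention mandate
    if rule.purge_after is not None and age >= rule.purge_after:
        return "purge"      # past the deadline; delete
    return "eligible"       # may be deleted, not yet required
```

Keeping the minimum and the deadline side by side guards against both failure modes: holding data too long and deleting it too soon.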
Once you decide what to delete, make sure it’s an automated process. “Nobody should have to think about ‘I need to delete a record,’” Beltz said. Also, make sure you have a record of what was deleted, so you can validate it if necessary, he said.
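An automated purge that leaves a record behind might look like the sketch below. It uses Python’s built-in sqlite3 module with an in-memory database; the table names records and deletion_log, their columns and both function names are assumptions for illustration, not a description of any company’s system.

```python
import sqlite3
from datetime import datetime, timezone


def purge_with_audit(conn: sqlite3.Connection, customer_id: str) -> int:
    """Delete a customer's rows and log the deletion in one transaction."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "DELETE FROM records WHERE customer_id = ?", (customer_id,)
        )
        deleted = cursor.rowcount
        conn.execute(
            "INSERT INTO deletion_log (customer_id, rows_deleted, deleted_at) "
            "VALUES (?, ?, ?)",
            (customer_id, deleted, datetime.now(timezone.utc).isoformat()),
        )
    return deleted


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with the two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (customer_id TEXT, payload TEXT)")
    conn.execute(
        "CREATE TABLE deletion_log "
        "(customer_id TEXT, rows_deleted INTEGER, deleted_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO records VALUES (?, ?)",
        [("acme", "a"), ("acme", "b"), ("globex", "c")],
    )
    conn.commit()
    return conn
```

The log stores only who was purged, how many rows and when, so the deletion can be validated later without keeping the deleted data itself.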
Finally, Beltz advised conducting data-focused audits. These could include penetration testing to determine how easy it is to access your data, tests to ensure old data isn’t accessible and even audits against International Organization for Standardization, or ISO, data privacy standards.
Data retention policies will vary by type of business, but every company handles some type of information. Without defined policies for handling and purging that information when it’s no longer needed, companies are at greater risk for data breaches, lawsuits and excess costs.