Data Sanitisation
Ensuring data from unknown, untrusted or trusted sources is clean.
What is Data Sanitisation, also known as Content Disarm and Reconstruction (CDR)?
Data sanitisation has several meanings, depending on the context in which it is used. This article does not cover the permanent and irreversible destruction of data, as when a disk must be wiped so that its contents can never be retrieved. Nor does it cover redacting information in data so that it is not exposed in the event of unauthorised access.
This article is about Content Disarm and Reconstruction, a method of protecting organisations from the threats posed by data arriving from unknown and known sources.
As digital economies around the world boom, the flow of information between organisations and individuals has become a staple of doing business. Government entities, defence contractors, banking, finance, insurance, real estate, printing, marketing, educational institutions, libraries, municipalities and many other types of organisation interact with their business partners and customers electronically. Receiving information digitally is arguably the most efficient way to gather vast quantities of data: it is not limited by geography, and it shortens process times, cuts labour-intensive tasks, reduces error rates and increases overall productivity.
The risks, however, also increase. With greater reach comes a degree of uncertainty: the information being received can come from anyone (or anything). This is where Data Sanitisation becomes invaluable.
Data Sanitisation
In this scenario, Data Sanitisation refers to one of two methods of ensuring inbound data is free from active content that may or may not be malicious.
- All inbound files are “flattened” to an image-based document, which removes the possibility of malicious code being activated when a user interacts with that document.
- Full Content Disarm and Reconstruction: files are interrogated for “non-standard” construction and content, sanitised and then rebuilt to their original state, minus any dubious or active content.
The benefit of the second approach over the first is that the files remain fully usable after cleansing.
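To make the second approach more concrete, here is a minimal Python sketch of the disarm-and-rebuild idea applied to a ZIP-based container (modern Office documents are ZIP archives). It is an illustration only, not how any particular vendor implements CDR: the function name `disarm_zip_container` and the `ACTIVE_SUFFIXES` list are hypothetical, the deny-list is deliberately tiny, and a real product would also repair the document's internal references and handle many more formats.

```python
import io
import zipfile
from typing import List, Tuple

# Hypothetical deny-list: archive members treated as active content in this sketch.
ACTIVE_SUFFIXES = ("vbaproject.bin", ".exe", ".js", ".vbs")


def disarm_zip_container(data: bytes) -> Tuple[bytes, List[str]]:
    """Rebuild a ZIP-based container, leaving out members that look active.

    Returns the rebuilt archive bytes and the names of the removed members.
    """
    removed = []
    rebuilt = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(rebuilt, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename.lower().endswith(ACTIVE_SUFFIXES):
                # Disarm: the active member is simply not carried over.
                removed.append(info.filename)
                continue
            # Reconstruct: write name and content into a fresh archive
            # rather than reusing the original archive structure.
            dst.writestr(info.filename, src.read(info))
    return rebuilt.getvalue(), removed


# Usage: build a small sample container in memory, then sanitise it.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as z:
    z.writestr("word/document.xml", "<doc/>")
    z.writestr("word/vbaProject.bin", b"\x00macro")

clean_bytes, removed_members = disarm_zip_container(sample.getvalue())
print(removed_members)
```

The output file is a new archive that contains only the members that passed the check, which mirrors the "rebuilt, minus any dubious or active content" description above.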
CDR doesn’t rely on detection; it assumes all files are malicious. The technology is highly effective against unknown threats, zero-day attacks and malicious code designed to evade detection, including fully undetectable (FUD) malware and malware that uses obfuscation or virtual machine (VM) detection.
How Does CDR or Data Sanitisation Work?
One vendor making use of this technology in their defensive efforts is OPSWAT.
As OPSWAT puts it, CDR works by evaluating and verifying files as they enter a system. Entry points include email, portable media and the web (via ICAP). When files reach the sanitisation system, they are evaluated and verified for file type and consistency across more than 4,500 file types. In OPSWAT’s case, files can first be scanned with more than 30 anti-malware engines, so that known and obviously malicious files are removed before CDR begins. Each file is then interrogated to confirm that its extension matches its real type, which uncovers files attempting to masquerade as harmless file types. The files are then broken into separate, discrete components and malicious elements are removed. The metadata and file characteristics are reconstructed, and the files are recompiled, renamed and delivered. The delivered file preserves the integrity of its file structure, so users can use it safely and as intended.
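The extension check described above can be illustrated with a short Python sketch that compares a file's extension with the signature ("magic bytes") at the start of its content. This is a simplified assumption about how such a check might look, not OPSWAT's implementation: the `MAGIC` table and both function names are hypothetical, and the table covers only a handful of types.

```python
from pathlib import Path

# Small illustrative signature table; real products cover thousands of types.
MAGIC = {
    ".pdf": b"%PDF-",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".zip": b"PK\x03\x04",
    ".docx": b"PK\x03\x04",  # Office Open XML documents are ZIP containers
    ".exe": b"MZ",
}


def detect_types(header: bytes):
    """Return the extensions whose signature matches the start of the data."""
    return sorted(ext for ext, sig in MAGIC.items() if header.startswith(sig))


def extension_matches_content(filename: str, header: bytes) -> bool:
    """Check that the claimed extension agrees with the leading bytes."""
    expected = MAGIC.get(Path(filename).suffix.lower())
    # Unknown extensions count as a mismatch in this sketch (fail closed).
    return expected is not None and header.startswith(expected)


# Usage: a Windows executable renamed to look like a PDF is flagged.
print(extension_matches_content("report.pdf", b"%PDF-1.7\n"))
print(extension_matches_content("invoice.pdf", b"MZ\x90\x00"))
print(detect_types(b"MZ\x90\x00"))
```

A signature match is only a first filter. Several formats share the same leading bytes (the ZIP signature above, for example), so a real system would go on to inspect the file's internal structure.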
Even complex files such as PowerPoint presentations with animations remain intact. The original files are quarantined for backup, auditing and future examination if needed.
Why Data Sanitisation is important
Malware is becoming sandbox-aware
Because of this, malware is able to evade traditional detection methods
Data Sanitisation doesn’t rely on detection
Because detection is not necessary, detection errors don’t happen. Many evasive file-based threats, whether known, unknown or sandbox-aware, are prevented. Removing any possible embedded threat effectively disarms file-based threats without the need for detection.