What is Unstructured Data?
Data Management appears regularly in the mainstream media at the moment, for good reasons (GDPR and legislation to properly safeguard personal Data) and bad reasons (front-page Data breaches and fines). As a result, the term “unstructured Data” is now used a lot outside of its traditional IT/Compliance setting. This can lead to confusion over what exactly the term means, what counts as Unstructured Data, how much of it typically exists in an organisation, how to find it and, naturally, how to deal with it.
To understand what Unstructured Data comprises, we must first look at Structured Data. In simple terms, this means any Data that resides in a fixed field within a record or file, including Data contained in relational databases.
Structured Data can, for the most part, be easily entered, stored, queried and analysed. It is well-organised information, usually held in a relational database and presented in headed columns and rows, which makes it easy to order, search and process with data mining software. Unstructured Data is essentially the opposite.
Examples of Structured Data:
- CRM and ERP relational databases
- Payroll systems and Accounting software
- Bespoke CRM and similar databases
- Transactional information for payments
- Booking information databases such as flights, accommodation etc.
- Stock level Management databases
- Addresses and Dates in relational databases.
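To make the "fixed fields, easy to query" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The bookings table, its column names and the sample rows are all made up for illustration; they do not come from any real system.

```python
import sqlite3


def build_bookings_db():
    """Create an in-memory table with fixed, named columns (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bookings (id INTEGER PRIMARY KEY, customer TEXT, "
        "destination TEXT, travel_date TEXT, price REAL)"
    )
    # Illustrative sample rows only.
    conn.executemany(
        "INSERT INTO bookings (customer, destination, travel_date, price) "
        "VALUES (?, ?, ?, ?)",
        [
            ("A. Byrne", "Dublin", "2018-03-01", 120.0),
            ("C. Walsh", "Cork", "2018-03-04", 85.5),
            ("E. Nolan", "Dublin", "2018-04-12", 140.0),
        ],
    )
    return conn


def total_spend_for(conn, destination):
    """Sum the price column for one destination; 0.0 if there are no matching rows."""
    row = conn.execute(
        "SELECT SUM(price) FROM bookings WHERE destination = ?", (destination,)
    ).fetchone()
    return row[0] or 0.0


conn = build_bookings_db()
dublin_total = total_spend_for(conn, "Dublin")
```

Because every value sits in a known column, one short query answers the question. The same information buried in a paragraph of a Word document or an email offers no such handle.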
The huge surprise for most new Data Managers is that Structured Data only constitutes around 20% of a typical organisation’s total Data. This leaves a massive 80% of “other” Data not conforming to the rules above and we call this “Unstructured Data”. Trying to get a handle on this composition of Data can be a significant challenge without the right tools. With GDPR and global equivalents just months away, knowing the composition of all types of your Data is imperative.
The most common form of Data found on your servers, Unstructured Data, is what we can describe as an unsorted mess without a visible structure. In its natural state, it contains valuable, redundant, critical, obsolete and duplicate Data intermingled, with no clear means of seeing what belongs to each of these categories.
The value of Unstructured Data is unlocked by first identifying its make-up with software tools and then putting plans and policies in place to maintain best practice. Businesses can derive very valuable insights from properly querying their massive pile of unstructured files and emails.
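As one small illustration of identifying the make-up of a file store, the sketch below groups files by a hash of their content to surface exact duplicates. It is a simplified example using only the Python standard library, not a description of how any particular product works; the function name and the sample files are hypothetical.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def find_duplicate_files(root):
    """Group files under root by SHA-256 of their content.

    Returns a list of groups (sorted lists of paths) that share identical content.
    Each file is read fully into memory, which is fine for a sketch but not for
    very large files.
    """
    by_digest = defaultdict(list)
    for folder, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            by_digest[digest].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


# Demo on a throwaway folder with made-up files.
with tempfile.TemporaryDirectory() as share:
    samples = [
        ("report.docx", b"Q1 figures"),
        ("report_copy.docx", b"Q1 figures"),
        ("notes.txt", b"call supplier"),
    ]
    for name, content in samples:
        with open(os.path.join(share, name), "wb") as handle:
            handle.write(content)
    duplicate_groups = find_duplicate_files(share)
    duplicate_names = [
        [os.path.basename(path) for path in group] for group in duplicate_groups
    ]
```

Exact-duplicate detection is only one slice of the picture. Deciding what is obsolete or critical still needs metadata such as age, owner and last access, plus a policy for what to do with each category.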
While general-purpose Big Data tools cannot query the information held in email messages, there is undoubtedly very valuable insight to be gained from analysing this source, and specialist software tools should be used to query email Data. Email counts as Unstructured Data because, even though some parts of a message are structured, ordinary data mining software cannot process the message as a whole. An email holds information such as the time sent, subject and sender, but the actual content of the message is difficult to break down and properly categorise.
Because Unstructured Data is by nature unsorted, disorganised and left in its original state, most businesses just keep it all. This naturally leads to high storage costs and to time spent trying to find Data, classify it and make best use of what is still good. While the best possible outcome would be to have all of this represented as Structured Data, the time, cost and difficulty of converting most types of Unstructured Data into Structured Data are prohibitive for organisations of almost any size.
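The split between the structured and unstructured parts of an email can be seen in a short sketch using Python's standard email module. The message below is invented for illustration, and the helper name split_email is hypothetical.

```python
from email import message_from_string

# A made-up message: headers first, then a blank line, then the free-text body.
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.com
Subject: Contract renewal
Date: Mon, 05 Feb 2018 09:30:00 +0000

Hi Bob, attaching the renewal terms we discussed. Call me if anything is unclear.
"""


def split_email(raw):
    """Return (headers, body): named header fields and the free-text content."""
    msg = message_from_string(raw)
    headers = {
        "sender": msg["From"],
        "subject": msg["Subject"],
        "sent": msg["Date"],
    }
    # For a simple, non-multipart message the payload is the body text.
    body = msg.get_payload()
    return headers, body


headers, body = split_email(RAW_MESSAGE)
```

The headers come back as named fields that can be sorted and filtered like database columns. The body is just a string: working out that it concerns a contract, or that it mentions a person, needs text analysis rather than a simple field lookup.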
Examples of Unstructured Data:
- Spreadsheets, Word Processing Files
- PDF files
- Digital Images
- Video and Audio files
- Machine Data from surveillance devices, satellite data, scientific data, manufacturing data
A Growing Problem
We have already looked at the composition of business Data and the surprising fact that the “difficult to analyse” Unstructured Data makes up around 80% of your existing Data. Unfortunately, the bad news does not stop there: the volume of Unstructured Data is growing exponentially. According to industry analysts Gartner, “Data volume is set to grow 800% over the next 5 years and 80% of it will reside as unstructured data”.
Unstructured Data left unmanaged will lead to excessive storage consumption, the costs that go with it, and sluggish performance from your email and file servers. Until now, this has been a business pain that some Data Managers have simply had to swallow, “solving” the quantity problem by throwing storage capacity at it and allowing the digital landfill to grow exponentially without any oversight. In terms of proper Big Data analytics, failing to derive insight from this huge volume of unqueried information is a wasted opportunity, but it is not an apocalyptic scenario for most organisations.
With new Compliance legislation affecting almost every corner of the globe coming in the near future (GDPR, POPI Act, DPB etc.), the “luxury” of data hoarding is well and truly over. Huge fines for non-compliance, which would put all but the deepest-pocketed organisations out of business immediately, are coming fast down the track. What lurks in the fog of your Unstructured Data could be Personal or Personal Sensitive Information that you were blissfully unaware of until now. Ignorance will not be an acceptable excuse if you are retaining or utilising Data beyond its consented, intended use.
Waterford Technologies and Unstructured Data
Waterford Technologies have been dealing with Unstructured Data since we opened our doors in 2001. Email and file Data Management is our business. We deploy our specially designed software to address the analysis, reduction, retention and policy management requirements of compliance, with detailed reporting facilities to ensure accurate business intelligence. We assist Data Managers and DPOs with their GDPR requirements on the Unstructured Data side.
Our new module, ComplyKEY, will be released in Q1 ’18. Specifically designed to meet the challenges of existing and future compliance requirements, including GDPR, ComplyKEY has advanced functionality including an easy DSAR (data subject access request) process with Case Management/Deletion/Hold/Export/Redact tools to simplify compliance tasks for email and file data.