Archiving – The Basics
Mark Mulcahy – Waterford Technologies
Organisations of all sizes are facing accelerating growth in the amount of unstructured, file-based information they have to store. IT research houses agree that the burden this places on primary disk storage arrays is:
- Growing ever heavier, leading to greater costs
- Increasing the IT administration burden
- Causing inefficient utilisation of storage
- Causing difficulties in complying with legal and regulatory frameworks regarding information retention and deletion
- Wasting disk space through file duplication
- Making information harder to find
The answer to this set of problems is to recognise that most unstructured information is not needed for immediate access. It can safely be moved to less expensive, capacity-centric storage arrays and amalgamated in a single logical file archive that provides:
- Much more cost-effective storage for older, less active files
- Faster backup of primary data with lower space needs
- The ability to have several archival storage tiers to match storage cost to information value
- Flexibility to use any storage supply source you wish
- Removal of duplicated files
- Simplified file management
- Security and compliance
- Automated file retention and deletion
- Reporting facilities to demonstrate compliance
- Policies set by the IT department for file deletion and migration
- Policies set by the IT department for file retention
- File-access paths preserved for applications and users
- Better file revision management facilities for users
A file archive system can help you manage the seemingly uncontrollable onrush of unstructured information that needs to be stored, and impose sensible, practical and automated management processes on it to save you money and your users’ time.
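To make the idea of removing duplicated files concrete, the following is a minimal sketch in Python that groups files by a hash of their contents. It is an illustration only, not a description of how any particular archiving product works; the function names `file_digest` and `find_duplicates` are hypothetical, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each content digest to the files under `root` that share it.

    Only digests shared by two or more files are returned.
    Hypothetical helper for illustration.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_digest[file_digest(path)].append(path)
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}
```

Under this approach, an archive could keep one stored copy per group and record the other paths as references to it. A production system would also need to handle unreadable files and files that change during the scan.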
The Desktop Analogy
Organisations are storing more and more files. This near-ceaseless growth in file storage is putting growing pressure on Windows servers as their hard drives fill up. It takes longer and longer to back up file server contents, and the administration of tens if not hundreds of thousands of files, even millions in larger organisations, is an increasing burden on IT departments.
An executive’s or manager’s office can be characterised as having two storage resources for paper-based information: on the one hand, the desk with its working surface and In, Out and Active trays; on the other, the filing cabinets for reference information that is needed less often but is still worth keeping. This two-tier scheme makes good sense: it avoids overloading the desktop and its file trays with old information that gets in the way of dealing with newer, more active information.
This is not the case with digitally stored information, though. Expensive online storage is being used to store both current files that are accessed often and less active files that may be accessed only once a month or less. This growing file estate requires more and more disk capacity to be purchased.
Archiving less active files to secondary storage, which is less expensive than frontline drive arrays, allows capacity purchases to be deferred and file server backups to complete faster. The file servers themselves may well perform their work faster as well.
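As a rough illustration of how less active files might be identified, the sketch below walks a directory tree and lists files whose last-access time is older than a threshold. The function name `find_inactive_files` and the 30-day default (echoing "once a month or less") are assumptions made for this example, not part of any product. Last-access times can also be unreliable on systems configured not to update them, so a real policy might use modification time instead.

```python
import os
import time

SECONDS_PER_DAY = 86_400


def find_inactive_files(root, max_idle_days=30, now=None):
    """Return sorted paths under `root` not accessed within `max_idle_days`.

    Hypothetical helper for illustration; `now` can be passed in to make
    the result reproducible.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    inactive = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                last_access = os.stat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if last_access < cutoff:
                inactive.append(path)
    return sorted(inactive)
```

The resulting list is only a set of candidates for migration to secondary storage. Moving the files while preserving their access paths for users and applications is a separate step that this sketch does not attempt.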
File data is often described as unstructured or semi-structured data, in contrast to highly structured database records. It can be viewed as all user-generated, non-database information stored on servers and their disk drive arrays, comprising Word documents, spreadsheets, presentations with slides and graphics, PDF documents, scanned images and documents, and also e-mails. E-mail archiving and file archiving are two distinct sides of the same coin, and e-mail archiving has its own unique requirements.
In everyday office terms, storing all this information on fast, frontline drive arrays is akin to having no filing cabinets and storing everything on the desk. It is impractical. Storing all file information on a single set of fast drive arrays is becoming unaffordable; it will choke the array’s performance, lengthen backups and make array management inefficient, time-consuming and very costly.
There is no shortage of research reports describing the inexorable rise in the amount of unstructured data enterprises have to deal with. For example, the Taneja Group consulting firm surveyed file storage activities in business. In its report, “Next Generation File Management and Controls Market Overview,” the Taneja Group found:
- 73% of the users surveyed indicated that more than half of their data (60% or more) was unstructured,
- Just over half, 53%, had 11TB or more of unstructured data in their systems,
- Unstructured data growth rates of between 16% and 75% were reported by 62% of the users.
The major software drivers for this growth were Microsoft Office (78%), e-mail attachments (66%), and backup and archive (81%).
Another factor in the growth of unstructured data is the generation of multiple and distributed copies of file data from content creation and collaboration. Users pointed to Windows as their standard file storage platform at the server and storage level, housing more than 26% of their unstructured content.
Finally, the majority of respondents expected their file management and control budgets would grow by up to 20% in the next 12 months in verticals such as government, professional services, financial services, retail and telecom.