ROT Analysis/Proof of Concept

Our organization has not entered the world of ROT analysis yet. We will be performing some content migrations soon and understand that ROT analysis is important for any successful migration effort. A few questions we have yet to answer are below. We are looking for some advice/knowledge. Thank you in advance.

Does anyone have suggestions for what ROT analysis software is user friendly and robust?
Can anyone speak to the process of doing a proof of concept for ROT analysis?

——————————
Timothy Russell
Content Process Analyst
Kamehameha Schools
——————————


Hi Timothy,

I suspect many people on this board will have their own favorites for software, so you may get quite a list. I will jump in with the first entry on the list, Netwrix Data Classification (formerly Concept Searching).  I believe you will find their unique approach in their algorithms to be a very solid solution.

In terms of the “process of doing a proof of concept”, I’m unclear what you’re asking exactly, especially in terms of “process”.  Perhaps you could elaborate a bit?

Aria


Thank you for your response, Aria.

The process question was aimed at understanding any specific/important steps when performing a proof of concept for a ROT analysis solution. We understand the basics of a proof of concept, but were looking for any subject matter experts to expand on their lessons learned or share knowledge for ROT analysis specifically.

Mahalo,
Tim

——————————
Timothy Russell
Kamehameha Schools
——————————


You may want to look at DocAuthority for ROT analysis and migration preparation.  DocAuthority uses AI to discover and categorize your information into file groups across multiple repositories.  This is done with minimal user intervention.  This process allows for ROT detection and tagging of information for migration into an ECM or other content controlling solution.  Please let me know if you would like a demo or further discussion.

Thanks

Alan Weintraub, CIP


In terms of process, steps should include:
– Inventory of content sources and content types in each source. Examples include:

employee-specific sources, including computing devices and portable storage
team-based sources (common storage areas)
enterprise sources
off-site repositories

– Inventory of content types (possibly by source)

private data and confidential data

– Current classification schema, if used
– Applicable regulatory and industry standards

GDPR? Other country-based privacy obligations

These are just a few of the steps before I would start with the software tools to help with classification. Check out the AIIM Toolkits section for help with these steps.
Regards,
Alan Frank, CIP

——————————
Alan Frank
Business Process Analyst
ASF Consulting
PhD, CIP, IGP
——————————


Thank you for the steps to include in the process. Will reference the AIIM Toolkits for help with these steps.

——————————
Timothy Russell
Kamehameha Schools
——————————


AIIM has a webinar coming up on Wednesday Oct 16 at 2pm ET discussing Proof of Concept. Check it out to help you with that part of your question – the core tenets are the same regardless of what type of project you’re applying them to.
5 Key Factors for Document Automation PoC Success
If you miss the live webinar, we will have it available on demand.

——————————
Theresa Resek, CIP
Director, Market Intelligence
AIIM
——————————


Thank you Theresa. I am signed up for the webinar and downloaded the corresponding sheet.

Tim

——————————
Timothy Russell
Kamehameha Schools
——————————


Timothy,

There are a lot of best practices and a lot of thought around what I call eTrash removal.  GDPR calls it data minimization. It is also referred to as defensible disposition.

When selecting a tool, remember that eTrash, and therefore file analysis tools, come in lots of formats and flavors.  Generally, from simple to complex:

  1. At one end you can achieve some basic stuff with a DOS directory listing and some good Excel skills.  I wouldn’t recommend this approach unless you have no budget.
  2. Some products in the “file analytics” space only look at the file context (name and extension) but cannot see into the files themselves. These are great if you know that your biggest problem is duplicates and TMP files, but they won’t help if you actually want to search for something in the content.
  3. Some products focus on the content of files and use text analytics.  This can get you topic modeling, clustering, full text search, predictive coding and searching.  Only about half of what you find on a file share will contain text, so this approach may not be sufficient for eTrash.
  4. A number of tools do both file analysis and text analysis. Many will use the OutsideIn file viewing utility to extract text.  This limits the number of files you can “see” into but this may be perfectly fine. These solutions will generally also allow searching for number or word patterns (like credit cards).
  5. At the other end of the spectrum, eDiscovery tools do all of the above, plus some pretty nifty stuff like OCRing images, analyzing photos, looking through databases, transcribing audio, translating, and most importantly, allowing users to review decisions easily before content is deleted. (Full disclosure, until recently, I worked for a company which falls into this category)
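To make the simple end of that spectrum (options 1 and 2) a bit more concrete, here is a minimal Python sketch of a metadata-only inventory that also flags exact duplicates and TMP-style files. It is only an illustration, not how any of the products mentioned in this thread work. The function names, the `JUNK_EXTENSIONS` set and the CSV columns are all assumptions made for the example, and hashing every file will be slow on a large share.

```python
import csv
import hashlib
import os
from collections import defaultdict

# Assumption: what counts as a "trivial" file is a local policy decision.
JUNK_EXTENSIONS = {".tmp", ".bak"}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory(root):
    """Walk `root` and return one record per readable file."""
    records = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                info = os.stat(path)
                file_hash = sha256_of(path)
            except OSError:
                continue  # unreadable file: skipped here, worth logging in a real POC
            records.append({
                "path": path,
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": info.st_size,
                "modified": info.st_mtime,
                "sha256": file_hash,
            })
    return records


def flag_rot(records):
    """Mark exact duplicates (same hash) and junk extensions on each record."""
    by_hash = defaultdict(list)
    for record in records:
        by_hash[record["sha256"]].append(record)
    for group in by_hash.values():
        group.sort(key=lambda r: r["path"])
        for position, record in enumerate(group):
            # The first path in each group is treated as the copy to keep.
            record["duplicate"] = position > 0
    for record in records:
        record["junk"] = record["extension"] in JUNK_EXTENSIONS
    return records


def write_csv(records, out_path):
    """Write the flagged inventory to a CSV that can be opened in Excel."""
    fields = ["path", "extension", "size_bytes", "modified",
              "sha256", "duplicate", "junk"]
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)
```

Something like `write_csv(flag_rot(inventory(some_folder)), "inventory.csv")` would produce a spreadsheet to sort and filter. Note that this only catches byte-for-byte duplicates and says nothing about what is inside the files, which is exactly the limitation described in option 2.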

Therefore, for a POC, first decide what other problems may also be important to solve when you are looking at your data.  For example, it is a shame to build a content map and look at all your content, but miss out on the ability to also improve your security program, analyze your personal data regulatory risk, classify records, respond to FOIA or litigation discovery requests, merge or divest data sets, classify and migrate to an ECM, or whatever other governance use cases you want to tackle.  If you are migrating into a content repository that allows for metadata (SharePoint in O365, for example), get a tool that can see into text and build classification structures based on context and content.
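On the personal data risk use case, the pattern searching mentioned earlier (number patterns like credit cards) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it takes text that has already been extracted, the regular expression and the function names are made up for the example, and the Luhn checksum is used only to weed out digit runs that cannot be card numbers. Commercial tools will do considerably more than this.

```python
import re

# Assumption: candidate numbers are 13 to 19 digits, optionally separated
# by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text):
    """Return digit strings in `text` that look like payment card numbers."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits
```

In a POC, a check like this is a handy way to compare tools: seed the sample data with a few well-known test card numbers and see which products find them, and in which file formats.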

Once you know all your use cases, the differences in tools will be along the following lines:

  • Speed – removing ROT is generally not a high-speed activity compared to a GDPR subject access request or litigation. Some tools base pricing on speed without limits on volume.
  • Volume  – Many tools offer pricing based on the starting volume of content rather than limiting speed.
  • Containers – See if the tool can open ZIP files or PST files
  • Unusual file types – If you are only looking at Office content, this will not be an issue.  You probably do want to look inside containers though
  • Ease of use compared to number of use cases – simpler products do less but users can do more if they can do it easily

If you are going to evaluate products in a POC, find a sample data set of 500GB to 1TB and leave it in its natural habitat.  You will get a good sense for what installation issues, access settings, problem formats, metadata values, file management practices, software speeds, ease of use, platform compatibility, language support and costs will be.  More data will not really give you better answers.

Also, make sure you think out your approach to preserving any legal hold content.

The Forrester Wave™: File Analytics Providers, Q2 2018

In our 31-criteria evaluation of file analytics providers, we identified the 11 most significant ones – Active Navigation, Adlib, Concept Searching, Egnyte, IBM, Micro Focus, Nuix, OpenText, TITUS, Varonis, and Veritas Technologies – and researched, analyzed, and scored them. This report shows how each provider measures up and helps enterprise architecture (EA) professionals make the right choice.

Let me know if I can help any more; this is what I love doing.

Mahalo

——————————
Brian Tuemmler
IG Solution Manager
www.atdoc.com
——————————
