Our organization has not entered the world of ROT analysis yet. We will be performing some upcoming content migrations and understand that ROT analysis is important for any successful migration effort. A few questions we have yet to answer are below. Looking for some advice/knowledge. Thank you in advance.
Does anyone have suggestions for what ROT analysis software is user friendly and robust?
Can anyone speak to the process of doing a proof of concept for ROT analysis?
Content Process Analyst
I suspect many people on this board will have their own favorites for software, so you may get quite a list. I will jump in with the first entry on the list: Netwrix Data Classification (formerly Concept Searching). I believe you will find the unique approach in their algorithms makes for a very solid solution.
In terms of the “process of doing a proof of concept”, I’m unclear what you’re asking exactly, especially in terms of “process”. Perhaps you could elaborate a bit?
Thank you for your response Lorne.
The process question was aimed at understanding any specific/important steps when performing a proof of concept for a ROT analysis solution. We understand the basics of a proof of concept, but were hoping subject matter experts could expand on their lessons learned or share knowledge specific to ROT analysis.
You may want to look at DocAuthority for ROT analysis and migration preparation. DocAuthority uses AI to discover and categorize your information into file groups across multiple repositories. This is done with minimal user intervention. This process allows for ROT detection and tagging of information for migration into an ECM or other content-controlling solution. Please let me know if you would like a demo or further discussion.
Alan Weintraub, CIP
In terms of process, steps should include:
- Inventory of content sources and content types in each source. Examples include:
  - Team-based sources (common storage areas)
- Inventory of content types (possibly by source)
  - Private data and confidential data
- Current classification schema, if used
- Applicable regulatory and industry standards
  - GDPR or other country-based privacy obligations
These are just a few of the steps before I would start with the software tools to help with classification. Check out the AIIM Toolkits section for help with these steps.
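To make the inventory steps above concrete, here is a minimal sketch of what an initial source/content-type inventory could look like in a script. The source names and paths are hypothetical placeholders, not part of any tool mentioned in this thread; substitute your own shares or export locations.

```python
import csv
from collections import Counter
from pathlib import Path

# Hypothetical content sources to inventory; replace these with your own
# team shares, common storage areas, or repository export paths.
SOURCES = {
    "team_share": Path("/mnt/team_share"),
    "legacy_archive": Path("/mnt/legacy_archive"),
}

def inventory(sources):
    """Tally file counts and total bytes per file extension, per source."""
    rows = []
    for name, root in sources.items():
        counts, sizes = Counter(), Counter()
        for path in root.rglob("*"):
            if path.is_file():
                ext = path.suffix.lower() or "(none)"
                counts[ext] += 1
                sizes[ext] += path.stat().st_size
        for ext in sorted(counts):
            rows.append({"source": name, "type": ext,
                         "files": counts[ext], "bytes": sizes[ext]})
    return rows

if __name__ == "__main__":
    # Write the inventory out for review in a spreadsheet.
    with open("content_inventory.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["source", "type", "files", "bytes"])
        writer.writeheader()
        writer.writerows(inventory(SOURCES))
```

Even a rough tally like this gives you the per-source, per-type numbers you need before shortlisting classification tools.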
Alan Frank, CIP
Business Process Analyst
PhD, CIP, IGP
Thank you for the steps to include in the process. Will reference the AIIM Toolkits for help with these steps.
5 Key Factors for Document Automation PoC Success
If you miss the live webinar, we will have it available on demand.
Theresa Resek, CIP
Director, Market Intelligence
Thank you Theresa. I am signed up for the webinar and downloaded the corresponding sheet.
There is a lot of best practice and thought around what I call eTrash removal. GDPR calls it data minimization; it is also referred to as defensible disposition.
When selecting a tool, remember that eTrash tools, and therefore file analysis tools, come in lots of formats and flavors. Generally, from simple to complex:
- At one end, you can achieve some basic results with a DOS directory listing and some good Excel skills. I wouldn’t recommend this approach unless you have no budget.
- Some products in the “file analytics” space only look at the file context (name and extension) but cannot see into the files themselves. These are great if you know that your biggest problem is duplicates and TMP files, but it won’t help if you actually want to search for something in the content.
- Some products focus on the content of files and use text analytics. This can get you topic modeling, clustering, full text search, and predictive coding. Only about half of what you find on a file share will contain text, so this approach may not be sufficient for eTrash.
- A number of tools do both file analysis and text analysis. Many will use the OutsideIn file viewing utility to extract text. This limits the number of file formats you can “see” into, but this may be perfectly fine. These solutions will generally also allow searching for number or word patterns (like credit card numbers).
- At the other end of the spectrum, eDiscovery tools do all of the above, plus some pretty nifty stuff like OCRing images, analyzing photos, looking through databases, transcribing audio, translating, and most importantly, allowing users to review decisions easily before content is deleted. (Full disclosure, until recently, I worked for a company which falls into this category)
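For the low-budget end of the spectrum described in the first two bullets, a short script can stand in for the directory listing plus Excel. This is a minimal sketch, assuming your only ROT rules are "exact duplicates" and "obvious temp files" (the extension list is an assumption; tune it to your environment):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed "obvious ROT" extensions; adjust for your environment.
TEMP_SUFFIXES = {".tmp", ".bak", ".old"}

def sha256(path, chunk=1 << 20):
    """Hash a file's content in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def find_rot(root):
    """Return (duplicate groups, temp files) found under root.

    Files are hashed only when another file of the same size exists,
    which keeps I/O down on large shares.
    """
    by_size = defaultdict(list)
    temps = []
    for p in Path(root).rglob("*"):
        if not p.is_file():
            continue
        if p.suffix.lower() in TEMP_SUFFIXES:
            temps.append(p)
        by_size[p.stat().st_size].append(p)
    dupes = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for p in paths:
                dupes[sha256(p)].append(p)
    groups = [ps for ps in dupes.values() if len(ps) > 1]
    return groups, temps
```

Note what this approach cannot do: it sees only names, sizes, and byte-identical content, which is exactly the limitation of the "file context only" class of products described above.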
Therefore, for a POC, first decide what other problems may also be important to solve when you are looking at your data. For example, it is a shame to build a content map and look at all your content, but miss out on the ability to also improve your security program, analyze your personal data regulatory risk, classify records, respond to FOIA or litigation discovery requests, merge or divest data sets, classify and migrate to an ECM, or whatever other governance use cases you want to tackle. If you are migrating into a content repository that allows for metadata (SharePoint O365, for example), get a tool that can see into text and build classification structures based on context and content.
Once you know all your use cases, the differences in tools will be along the following lines:
- Speed – removing ROT is generally not a high-speed activity compared to a GDPR subject access request or litigation. Some tools base pricing on processing speed, without limits on volume.
- Volume – Many tools offer pricing based on the starting volume of content rather than limiting speed.
- Containers – See if the tool can open ZIP files or PST files
- Unusual file types – If you are only looking at Office content, this will not be an issue. You probably do want to look inside containers, though.
- Ease of use compared to number of use cases – simpler products do less, but users get more done when the tool is easy to use.
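Two of the evaluation lines above, containers and pattern searching, are easy to test yourself during a POC. This is a minimal sketch, not any vendor's implementation: it opens ZIP containers with the standard library and flags credit-card-like numbers, using a Luhn check to cut false positives (PST files need a third-party parser, so ZIP only here):

```python
import re
import zipfile

# Loose pattern for 13-16 digit card-like numbers with optional
# space/dash separators; the Luhn check below filters most noise.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def luhn_ok(digits):
    """Standard Luhn checksum over a string of digits."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan_zip(path):
    """Yield (member name, matched digits) for card-like hits inside a ZIP."""
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Crude text extraction: decode bytes, ignore what isn't text.
            text = zf.read(info).decode("utf-8", errors="ignore")
            for m in CARD_RE.finditer(text):
                digits = re.sub(r"\D", "", m.group())
                if luhn_ok(digits):
                    yield info.filename, digits
```

Running a toy scan like this against your sample set is a useful baseline: any commercial tool you evaluate should comfortably beat it on formats covered, accuracy, and review workflow.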
If you are going to evaluate products in a POC, find a sample data set of 500GB to 1TB and leave it in its natural habitat. You will get a good sense for what installation issues, access settings, problem formats, metadata values, file management practices, software speeds, ease of use, platform compatibility, language support and costs will be. More data will not really give you better answers.
Also, make sure you think through your approach to preserving any content under legal hold.
Let me know if I can help any more, this is what I love doing.
IG Solution Manager