Conversion of files from FileNet and CDs to Office 365/SharePoint

Has any of the members tackled converting files from FileNet and CDs into Office 365/SharePoint? I know it is fairly easy to dump FileNet with the folder and subfolder structure intact and copy it into 365, but I wonder whether more intelligence should or can be applied. Currently, the documents in FileNet have only a title and a date. The files are scanned documents in PDF. The PDFs are searchable, but none have disposition dates attached. The volume is about 800,000 documents.

The CD collection contains about 4 million files. These are images that were converted to PDF. They are not searchable, but the metadata that is present makes it possible to search on title, year and department name. I am looking for a process flow to propose, and possibly a consultancy to assist with a tool, to get the most value out of this conversion process.

Appreciate recommendations, models to use or experiences to connect with.

Thanks

Daan

——————————
Daan Boom

——————————


 


Daan,

 

Given that FileNet, even in pretty old versions, does make it pretty straightforward to export the content repo structures (and I think even the security/permissions model as well), I would definitely suggest starting with that as a “straw man” to improve upon for the controlled taxonomy in O365 (SharePoint Online, or SPO, in this case).  I would also strongly suggest acquiring and using a tool set like Netwrix Data Classification (formerly ConceptSearching) against the content in the FileNet repos to either support or debunk (as the case may be) that straw man based upon the insights it will provide from the actual content.

 

In terms of the assets on the CDs, I haven’t heard of any hardware/software combination that could make getting that content off those discs a quick process.  I think you’re looking at getting a machine set up with a bunch of CD drives, writing a batch program to pull the files and metadata out to a file share, and having someone manually swap out the CDs as each is ripped.  Obviously, once the contents are on a file share, you can then choose whether that content is important enough to apply tooling to in order to enrich the metadata for each file.  Whether you do that or not, those files should go into ‘asset’ libraries in SPO.
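
As a rough illustration of the batch program described above, here is a minimal Python sketch. It assumes each disc is mounted at a known path and that the operator supplies a label per disc; the names `rip_disc`, `sha256_of` and `MANIFEST.tsv` are hypothetical and not part of any product mentioned in this thread. It copies one disc to the file share, records a checksum per file, and reports which files could not be read.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rip_disc(disc_root, share_root, disc_label):
    """Copy every file on one mounted disc into share_root/disc_label.

    Returns (copied, failed): relative paths that copied cleanly and
    relative paths that raised a read or write error.
    """
    disc_root = Path(disc_root)
    target_root = Path(share_root) / disc_label
    target_root.mkdir(parents=True, exist_ok=True)

    copied, failed, manifest_lines = [], [], []
    for src in sorted(p for p in disc_root.rglob("*") if p.is_file()):
        rel = src.relative_to(disc_root).as_posix()
        dst = target_root / rel
        try:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            manifest_lines.append(f"{sha256_of(dst)}\t{rel}")
            copied.append(rel)
        except OSError:
            # Scratched or degraded discs typically fail here.
            failed.append(rel)

    # One manifest per disc: checksum, tab, path relative to the disc root.
    manifest = target_root / "MANIFEST.tsv"
    manifest.write_text("\n".join(manifest_lines) + "\n", encoding="utf-8")
    return copied, failed
```

A call such as `rip_disc("/mnt/cdrom", "/srv/share/cd-archive", "CD-0001")` (both paths are placeholders) would be run once per disc; a disc that returns a non-empty `failed` list can be set aside for a second attempt.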

 

If you’re interested in exploring Netwrix, and getting assistance to make this happen, please feel free to message me directly.

 


 


 


Daan –

With regard to adding metadata / enriching the documents you’re moving, my question would be what the documents are used for.  If they’re just archived records with no business purpose, I’d be inclined to leave them alone.

For the CDs, we’ve got a tool that can OCR all of the files and that is licensed for a fixed fee (so it’s cost effective when the value of the OCR process is not well established).  We built it for a customer that had 50 million images.

For physical hardware to load the CDs, I’d recommend that you look at the automatic CD/DVD/Blu-ray burners and potentially a piece of custom software.  Though the equipment is designed to burn CDs, DVDs and Blu-rays, it’s quite effective at copying CDs back to file storage.  Many of them, like the Bravo we use, have two outputs, so you can separate the discs that are unreadable from those which copied completely.  Again, it probably means building a small amount of custom software to drive the process, but it’s substantially easier than trying to physically handle all of the CDs.

I’m all for better taxonomy and metadata but depending upon your purposes, it may be enough to just add OCR support to the old CD based images.

Rob

——————————
Robert Bogue
President
Thor Projects LLC
——————————


 


Thank you, Robert, for your advice. You are absolutely right that it doesn’t make sense to copy 1:1 and that it might be better to leave the files alone. I dived into the volume a little deeper last Monday and assessed that, out of the 800,000 files, 300,000 have archival business value. Of these, an estimated 15,000 to 20,000 have reuse value, which would require either a manual effort to systematize or one of the tools a fellow member proposed to look into. The files on the CDs have archival value and probably none for reuse. Applying the Bravo process looks like a sensible approach to consider, and thank you for that suggestion. I may come back to you in time, after internal discussions on the best way to proceed.

Daan

——————————
Daan Boom
Director
CCLFI
——————————


 


Daan –

One other thought to consider regarding the CD archive is what responsibility you have to protect the records.  Burned CDs do not archive well.  If you have a responsibility to archive for more than 10 years, I’d consider getting them back onto two offline copies of magnetic media.

I agree with Lorne, for records with value, evaluating the information architecture is an important step.

Rob

——————————
Robert Bogue
President
Thor Projects LLC
——————————


 


Hi Robert, the documents on the CDs are considered permanent (historic, institutional, board decisions, etc.). The CD collection is about 20,000 discs and they hold about 150,000 images (scanned PDF). They are sometimes accessed for research, and I checked the number of requests for information to be about 100 a year. It is worth discussing a full migration to SPO.

——————————
Daan Boom
Director
CCLFI
——————————



 

There are a number of migration tools that will migrate from FileNet to SharePoint Online. The same tools can migrate file shares. During migration the tools can add metadata if required, which can be based on file location, etc.
SharePoint Online has basic OCR built in.
You should think carefully about the IA if migrating such large quantities of files.
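
To make the idea of location-based metadata concrete, here is a small Python sketch. It is not how any particular migration tool works; it assumes a hypothetical folder layout in which one folder level is a four-digit year and another is a department name, and the column names (`Title`, `Year`, `Department`, `SourcePath`) are illustrative only.

```python
import re
from pathlib import PurePosixPath

# Assumption: a folder named with a four-digit year (19xx or 20xx) is the year.
YEAR_FOLDER = re.compile(r"^(19|20)\d{2}$")


def metadata_from_path(relative_path):
    """Derive illustrative metadata columns from a file's folder location."""
    path = PurePosixPath(relative_path)
    folders = path.parts[:-1]  # every path component except the file name
    year = next((int(f) for f in folders if YEAR_FOLDER.match(f)), None)
    # Assumption: the first non-year folder is the department name.
    department = next((f for f in folders if not YEAR_FOLDER.match(f)), None)
    return {
        "Title": path.stem.replace("_", " "),
        "Year": year,
        "Department": department,
        "SourcePath": path.as_posix(),
    }
```

For example, `metadata_from_path("Finance/1998/Annual_Report.pdf")` yields a title of "Annual Report", year 1998 and department "Finance". Files whose folders do not follow the assumed layout simply get `None` for the missing values, which makes them easy to list for manual review.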

——————————
Randy Perkins-Smart
Director
Qaixen
——————————
