I’m working on a proof of concept to introduce an automated metadata classification tool to our instance of SharePoint Online.
I’ve been asked to provide a set of success criteria. I can measure, albeit with a staged exercise, the time saved by the tool, but other benefits are more intangible. I was wondering how others in the community have approached this?
In our automated classification projects we split documents into critical and non-critical.
We assign higher targets for critical documents than for non-critical ones.
For metadata, I would think you could extend this to include higher targets for metadata on critical documents than on non-critical ones.
For the non-critical content (ROT: redundant, obsolete, trivial) there is diminishing ROI in trying to reach the same targets as for critical content.
VP Banking Solutions
I think the success criteria depend heavily on your strategic goals for metadata classification. What I mean is: if your goals are mostly aligned with records management (RM) considerations, such as correct assignment to your retention schedule categories, then that points to success criteria measured in those terms. If, however, you’re more concerned with metadata assignment for search purposes, such as scoping, role alignment, DAM considerations, and facet filtering, then gathering your metrics along those axes would seem most logical.
Does that make sense? Does that help?
Is all the content within a well-designed and signed-off business classification scheme (BCS)?
Has a cleansing and weeding exercise been undertaken? This really is a must when migrating from an old situation to a new one. (The options are ‘delete’, ‘archive’, or ‘use’.)
How many items of content are there?
Have the high-level content owners been assigned? They will make sure any cleansing and migration work gets done for their area.
Have you gathered the requirements and created the metadata model?
I have designed and supervised this type of exercise quite a few times, but never with tools. As I’m sure you know, these tools have only limited success. I have always ensured that the business puts in some effort after designing the BCS and metadata model. Then, as people move their content from the old system to the new, many metadata items can be inherited. Document type is a very important metadata item, and it can sometimes be gleaned from the file name via simple tools.
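To illustrate what I mean by simple tools, here is a minimal sketch of filename-based inference. The patterns and document types are hypothetical; in a real exercise they would come from your own metadata model:

```python
import re
from typing import Optional

# Hypothetical patterns mapping filename fragments to document types.
# In practice these would be derived from your metadata model / BCS.
PATTERNS = [
    (re.compile(r"\binvoice\b", re.IGNORECASE), "Invoice"),
    (re.compile(r"\bminutes\b", re.IGNORECASE), "Meeting Minutes"),
    (re.compile(r"\bpolicy\b", re.IGNORECASE), "Policy"),
]

def infer_document_type(filename: str) -> Optional[str]:
    """Return the first document type whose pattern matches the filename,
    or None to flag the item for manual tagging."""
    for pattern, doc_type in PATTERNS:
        if pattern.search(filename):
            return doc_type
    return None

print(infer_document_type("2019-03 Finance invoice Q1.docx"))  # -> Invoice
print(infer_document_type("Board Minutes 2019-01.docx"))       # -> Meeting Minutes
```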
As for success criteria:
A. All items of content must have a document type; this can be measured.
B. If, for example, you have a high-level folder (or site, as it’s SharePoint) for the Finance team, you will know how many items sit under that folder, and all of them should have the value ‘Finance’ applied for the ‘Business Unit’ metadata item.
It’s hard to design success criteria without knowing how many items should have a particular metadata value assigned to them, and you can’t know that without a metadata model and BCS. A simple coverage audit, like the sketch below, can then report progress.
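To make criteria A and B concrete, here is a rough sketch of that kind of audit against a hypothetical export of item metadata. The field names and items are made up; a real check would query SharePoint itself:

```python
# Hypothetical export: each dict is one content item with its metadata.
items = [
    {"path": "/Finance/budget.xlsx", "document_type": "Budget", "business_unit": "Finance"},
    {"path": "/Finance/raw_notes.txt", "document_type": None, "business_unit": None},
    {"path": "/HR/policy.docx", "document_type": "Policy", "business_unit": "HR"},
]

# Criterion A: every item must have a document type.
typed = sum(1 for i in items if i["document_type"])
print(f"Document type coverage: {typed}/{len(items)} "
      f"({100 * typed / len(items):.0f}%)")

# Criterion B: all items under /Finance should carry Business Unit = 'Finance'.
finance = [i for i in items if i["path"].startswith("/Finance/")]
tagged = sum(1 for i in finance if i["business_unit"] == "Finance")
print(f"Finance business-unit coverage: {tagged}/{len(finance)}")
```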
I hope this helps, but it may raise more questions than answers.
Agile Information Management Ltd
I think it is important to set expectations for launch versus long-term, steady-state operation.
Out of the gate, I usually shoot for a 90-90-90 rule. The KPIs we have used for first-pass metadata and data classification are:
1. Relevancy of the data set – 90% of documents in the data set should be relevant documents (this speaks to the aforementioned “weeding out” of unnecessary information).
2. Auto-classified documents / total data set = 90% is a good initial benchmark.
3. Of the exceptions (the 10% that did not auto-classify), 90% should be readily identifiable as to why they did not classify.
Obviously these targets go up as time progresses, data sets get refined, etc.; a rough calculation like the sketch below is enough to track them.
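As an illustration (the counts are invented and the function is mine, not from any tool), the three ratios might be tracked like this:

```python
def kpi_90_90_90(total_docs, relevant_docs, auto_classified, exceptions_explained):
    """Compute the three first-pass KPIs described above, each with a 90% target."""
    exceptions = total_docs - auto_classified
    return {
        "relevancy": relevant_docs / total_docs,              # target >= 0.90
        "auto_classification": auto_classified / total_docs,  # target >= 0.90
        # Of the documents that failed to auto-classify, how many can we explain?
        "exceptions_explained": (exceptions_explained / exceptions) if exceptions else 1.0,
    }

# Invented numbers for a 100,000-document first pass:
print(kpi_90_90_90(total_docs=100_000, relevant_docs=92_000,
                   auto_classified=91_000, exceptions_explained=8_300))
```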
Ultimately, being a Six Sigma guy, I always shoot for the magical 3.4 defects per million, or 99.99966% accuracy… However, I have yet to achieve those numbers (if anyone has, please write me a manual as to how!). Ideally, by the six-month mark I am hoping to see 97% and above; one year in, 99% and up.
I hope that gives you a basic idea.
We are currently implementing automatic classification in our enterprise intranet environment (SharePoint 2013) – with the intent to expand to other systems and environments – all within the scope of our corporate information governance program. We are still refining the classification logic, but our target is to soon reach an 80% tagging accuracy in the two primary security classification columns. These columns use managed metadata that matches our enterprise security classification taxonomy. We are determining accuracy based on assessment of a random selection of tagged documents by designated knowledge experts across the business.
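In case it is useful, here is a minimal sketch of how a random-sample expert review can be turned into an accuracy estimate with a margin of error. It uses the standard normal-approximation interval, and the counts are invented, not our real figures:

```python
import math

def accuracy_with_ci(correct: int, reviewed: int, z: float = 1.96):
    """Point estimate and ~95% confidence interval for tagging accuracy,
    from a random sample of expert-reviewed documents."""
    p = correct / reviewed
    margin = z * math.sqrt(p * (1 - p) / reviewed)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Invented example: experts reviewed 400 tagged documents, 328 were correct.
p, lo, hi = accuracy_with_ci(correct=328, reviewed=400)
print(f"Estimated accuracy {p:.1%} (95% CI {lo:.1%} - {hi:.1%})")
```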
Thank you for the responses so far.
One of the challenges is that we do not know how many documents will pass through the classification service, but we know it will be a lot! (We have 1PB of unstructured content considered to be in scope.) I’m considering snap sampling to validate that the target document types have been correctly inferred from the content. I know we will not get it right the first time, and it will not be perfect! So I like the idea of a few passes, with a view to increasing the accuracy of assignment. I also recognize that there will be a plateau to the accuracy, related to the level of effort we put into generating the classification clues.
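One property I’m leaning on for the snap sampling: the sample size needed for a given margin of error is essentially independent of corpus size, so even against 1PB of content a few hundred sampled documents per pass should give a usable accuracy estimate. A minimal sketch using the standard formula (the 95% confidence and ±5% figures are just example parameters):

```python
import math

def sample_size(margin: float = 0.05, z: float = 1.96, p: float = 0.5) -> int:
    """Documents to sample for a given margin of error at ~95% confidence.
    p = 0.5 is the worst case; for large corpora the result does not
    depend on how many documents are in scope."""
    return math.ceil((z ** 2) * p * (1 - p) / margin ** 2)

print(sample_size(margin=0.05))  # ~385 documents per validation pass
print(sample_size(margin=0.02))  # ~2,401 for a tighter +/-2% estimate
```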