Hi, I am working on defining a taxonomy over Document types and Document categories for an Oil & Gas company to be used as metadata for documents in different systems. One of the first issues I have to tackle is that there is a difference in what people think of when they say “Document type”. In my mind there are 4 levels when classifying documents by Document types:
1. A categorization of different items that are similar, either in form, function or use.
Ex: Project management documents, Governing documents, Meeting documents
2. A general description of the function of the document. These terms are typically very general: they define only the purpose of the document, and the subject or domain of the document is not usually part of the term.
Ex: Plan, Specification, Minutes
3. A theoretical document whose criteria have been specified and described. These are typically defined as a deliverable in a project or process:
Ex: Benefits Realization plan, Engineering Numbering System specification, Minutes for Monthly Executive Committee meeting
4. The actualized document:
Ex: Benefits Realization plan for work package A, Engineering Numbering System for development project B, EC minutes of meeting May 2018.
Usually, most people agree on Level 1 (Document category) and Level 4 (Document), but level 2 and 3 are usually discussed as if they were one and the same thing (Document type), though they have very different requirements in document management.
– Level 2 is a very manageable list of terms that can be standardized across systems, whereas level 3 includes hundreds, if not thousands, of types, which makes standardization across systems challenging.
– Managing templates for level 2 is a whole other thing than managing templates for level 3.
One issue with lumping level 2 and 3 together as “document type”, rather than treating them as separate categories, is that you risk having document types that are not mutually exclusive. For instance, there may be a need to categorize something as an “End of project report”, but other reports may be created that do not fit any of the already defined level 3 categories, so a level 2 “Report” term must be introduced. That in turn means that some users will use the level 3 term “End of project report” while others will use the level 2 term “Report” for the same kind of document, especially if the metadata list is very long.
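To make the overlap concrete, here is a minimal sketch, not a recommended implementation. It assumes level 2 and level 3 are kept as two separate controlled lists and that every level 3 term has exactly one level 2 parent. The terms come from the examples above; the names `LEVEL2_TERMS`, `LEVEL3_PARENT` and `to_level2` are hypothetical.

```python
# Hypothetical controlled lists; the terms are taken from the examples above.
LEVEL2_TERMS = {"Plan", "Specification", "Minutes", "Report"}

# Assumption: every level 3 term points to exactly one level 2 parent.
LEVEL3_PARENT = {
    "Benefits Realization plan": "Plan",
    "Engineering Numbering System specification": "Specification",
    "Minutes for Monthly Executive Committee meeting": "Minutes",
    "End of project report": "Report",
}


def to_level2(term: str) -> str:
    """Resolve a tag from either level to its level 2 term."""
    if term in LEVEL2_TERMS:
        return term
    if term in LEVEL3_PARENT:
        return LEVEL3_PARENT[term]
    raise ValueError(f"Unknown document type: {term!r}")


# Two users tag the same kind of document at different levels.
tags = ["End of project report", "Report"]
print({to_level2(t) for t in tags})  # both tags resolve to one level 2 term
```

With the parent link in place, the two tags are no longer competing values in one flat list; the level 3 tag simply carries more detail than the level 2 tag.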
Because of this, I think it is wise to at least be aware of the difference between Level 2 and Level 3, as well as the requirements and consequences of the different levels, but in most of the literature, I only come across the terms Document category and Document type/kind – which is not enough to separate all of these layers.
What terms would you use for each of the levels? Would you use “Document type” for both level 2 and 3, or would you distinguish? How would you separate level 3 and level 4 if Document type is reserved for level 2?
Definitely agree with your breakdown. I would add one more wrinkle to the naming discussion around “Type”, though, which is the actual file type, e.g. Word, Excel, executable, mp4, avi, etc. So, ‘file type’ should be added.
Anyway, for level 2, I generally recommend to clients to consider that as the “keyword” description. For level 3, that is where I usually go with something like “output category” or “function” depending on group consensus.
Does that help?
Thank you for your response.
File type is definitely also a type in this discussion. I left it out because I believe it is already well defined (named “file type”, related to the file name extension), but I have noticed that there are cases where document type and file type have been confused, so thank you for reminding me.
Let me see if I understand you correctly: Would you put Level 2 as metadata in a keyword field, and not define a taxonomy for document type at all? Would you then reserve Keyword for just Level 2 type, or would it be a mix of all different kinds of keywords?
Output category and function are good suggestions for Level 3, I will note those down, thank you very much.
From what you’re explaining, it seems like you may benefit from a Faceted Taxonomy.
Each of your 4 levels would become its own level and, when combined, they would describe the different aspects of the document. In the case of 2 (function) and 3 (event/purpose/usage?), you’ll want to separate the function from the qualifier. Think of the main question at hand for each level:
1. Purpose – What is the document’s core purpose? Ex. Project Management, Governance, Technical, Consumer
2. Function – What is this document’s function? Ex: Plan, Specification, Minutes
3. Usage – Where is this document used? Ex. Benefits Realization, Engineering Numbering System, Monthly Executive Committee meeting
And an additional level
5. Frequency – How often are these produced? Ex. One time, Daily, Monthly, Bi-Annual, Annual
Levels 2 and 3 can be mixed and matched for different purposes. So you may have a meeting about Benefits Realization and store the minutes, and you may also have created a Plan for that Benefits Realization. Combining the two levels makes a whole and allows the user flexibility to assign the proper metadata as needed.
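As a rough sketch of the mix-and-match idea, the two facets can be modelled as separate controlled vocabularies that are combined per document. The vocabularies below reuse the examples from this post; the `FacetTags` class and its field names are hypothetical and not taken from any particular system.

```python
from dataclasses import dataclass

# Hypothetical facet vocabularies built from the examples in this post.
FUNCTION = {"Plan", "Specification", "Minutes"}
USAGE = {
    "Benefits Realization",
    "Engineering Numbering System",
    "Monthly Executive Committee meeting",
}


@dataclass(frozen=True)
class FacetTags:
    function: str
    usage: str

    def __post_init__(self) -> None:
        # Reject values that are not in the controlled vocabularies.
        if self.function not in FUNCTION:
            raise ValueError(f"Unknown function: {self.function!r}")
        if self.usage not in USAGE:
            raise ValueError(f"Unknown usage: {self.usage!r}")

    def label(self) -> str:
        """Combine the two facet values into one readable label."""
        return f"{self.usage} / {self.function}"


# The same usage combined with two different functions.
plan = FacetTags(function="Plan", usage="Benefits Realization")
minutes = FacetTags(function="Minutes", usage="Benefits Realization")
print(plan.label())
print(minutes.label())
```

Each facet stays a short list on its own, and the combination of values describes the document.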
Also, be wary of confusing document metadata with taxonomy. All taxonomy is metadata, but not all metadata is taxonomy. A good test of whether a field (or level) may be a taxonomy is to see if it can fit in a drop-down list for easy selection. If it can, and the terms are controlled and unambiguous, it is most likely a candidate for a taxonomy.
It is also possible to have a main taxonomy for the entire company, but need area-specific taxonomies to cover specific tasks within the area. This adds another level of complexity to a taxonomy and is almost on its way to an ontology… which is another can of worms!
Taxonomies are a complex, living beast for any company. I hope the above makes sense to you. If not, let me know and I’ll see if I can explain in another way.
Acuity Systems, LLC
Hi, I think we are very much aligned, but I can see how my original post could be confusing; I wasn’t clear enough about its purpose.
We are using faceted taxonomies, and have already defined taxonomies and master data for several areas, but Document type seems to be an especially tricky facet. The reason I listed the four levels was not because we will use all of these levels in the same taxonomy, but because these are concepts at different levels that are often confused. What typically happens when I talk to the end users is that they regularly confuse Level 2 and 3, call both Document type, and list both in the same list. This is challenging for many reasons, but this is how they talk about their documents in everyday life.
I can definitely see your point about separating Level 2 and 3 into different taxonomies; however, I would be careful about adding too many facets. As a person who loves structures, I find that having several clear-cut facets is a dream to work with when categorizing and retrieving information, but there is just no way I would get users to add the 5 facets you mention when storing a document, and we do not yet have the tools for automatically assigning Frequency. We could create facets for every possible concept, but we would have a hard time translating that to metadata for the user, because they will not use it, and we would end up with documents that are not consistently categorized.
One issue with trying to establish facets for all separate concepts is that it can be difficult to be consistent at the lowest detail level. For instance, I do struggle a bit with your facet 3, Usage. The concepts you list here are widely different: Benefits Realization, for example, is a topic or activity in Project Management; the Executive Committee Meeting is a forum; and the Engineering Numbering System is one specific document. That term would not apply to any other document, which goes against my idea of a taxonomy. As mentioned, we can have several facets, but it is incredibly complex to try to establish a facet for every concept in a document title.
Another issue with faceted taxonomies is that sometimes the term known to the end user is greater than the sum of its parts. Take the concept Quality plan. On the surface, one could use the facet function “Plan” and the facet discipline or process “Quality”, and you would have “Quality plan”. However, the process or discipline Quality could easily be responsible for more than one type of plan, which would all be tagged with the same terms as metadata, but which may have vastly different requirements.
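The Quality plan problem can be shown with a small sketch. It assumes a pure faceted model with only a discipline facet and a function facet; apart from “Quality plan”, the document type names below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical level 3 types mapped to (discipline, function) facet values.
# Only "Quality plan" comes from the text; the other two names are invented.
DOC_TYPES = {
    "Quality plan": ("Quality", "Plan"),
    "Quality audit plan": ("Quality", "Plan"),
    "Quality report": ("Quality", "Report"),
}

# Group the document types by the facet values they would be tagged with.
by_facets = defaultdict(list)
for doc_type, facets in DOC_TYPES.items():
    by_facets[facets].append(doc_type)

# A collision is a facet combination shared by more than one document type.
collisions = {f: names for f, names in by_facets.items() if len(names) > 1}
print(collisions)
```

Any combination that shows up in `collisions` is a case where the facets alone cannot tell two document types apart, even though they may have different templates or requirements.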
I believe that the likely outcome will be that we create a taxonomy over Level 2, which can be implemented and standardized across systems, but I will also need to address how to manage Level 3 (for which I still need to find a good term) as it is, because this is the Level most users relate to with regard to templates, regulations, requirements and industry standards, and for the reasons listed above, using a pure faceted taxonomy will not be feasible.
Thank you so much for your input, I do appreciate someone challenging me, prompting me to rethink and reformulate my views!
I think that generally you’re thinking about the right problems and in the right way. Facets … but as small as you can get away with… Taxonomy to drive communication … but understanding the tagging problem.
I’d offer this as a way of potentially untangling the situation. When I’m working with folks I’m not focused on “correct” tagging (metadata entry). Rather, I’m concerned with discoverability and findability. That is, I don’t care if they tag a quality plan as a plan or as a quality plan. (I do pay attention to the way I structure the hierarchy of terms but that’s a separate conversation.) What I do care about in this context is what the discoverability experience is. What happens when they inevitably tag things incorrectly? What are the chances that they’ll still discover it, or find it? For me discoverability is browsing and navigating. In truth this is probably not the largest issue; findability is.
The reality is that for a nested hierarchy of terms in search with facets/refiners, you introduce discoverability of the error and you give the seekers the opportunity to revise their path to allow for the error. If they refine with plans, they’ll find the quality plans, along with others. If they refine with quality plans, they may exclude some of the results, increasing precision but reducing recall. However, users will generally release their refiner and then grab the one at the higher level.
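A toy example of that trade-off, as a sketch under stated assumptions: the hierarchy, the four documents and their tags are all invented, and a refiner is assumed to match a term and everything nested below it.

```python
# Hypothetical hierarchy: each term maps to its parent (None for top-level terms).
PARENT = {"Plan": None, "Quality plan": "Plan", "Report": None}

# Hypothetical documents: (title, tag actually applied, whether the seeker wants it).
DOCS = [
    ("QP project A", "Quality plan", True),
    ("QP project B", "Plan", True),  # a quality plan tagged too broadly
    ("Drilling plan", "Plan", False),
    ("Status report", "Report", False),
]


def matches(tag, refiner):
    """True if the tag equals the refiner or sits below it in the hierarchy."""
    while tag is not None:
        if tag == refiner:
            return True
        tag = PARENT[tag]
    return False


def precision_recall(refiner):
    """Return (precision, recall) for the result set of one refiner."""
    hits = [wanted for _, tag, wanted in DOCS if matches(tag, refiner)]
    if not hits:
        return 0.0, 0.0
    relevant = sum(wanted for _, _, wanted in DOCS)
    true_hits = sum(hits)
    return true_hits / len(hits), true_hits / relevant


print(precision_recall("Quality plan"))  # narrow refiner: misses the mis-tagged plan
print(precision_recall("Plan"))          # broad refiner: finds both, plus noise
```

In this toy data the narrow refiner returns only wanted documents but misses one, while the broad refiner catches the mis-tagged quality plan at the cost of one unwanted hit.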
All of this is to say, it’s the right concern, but ultimately if your hierarchy within the taxonomy is reasonable you should end up with a search experience that gets you what you want.
Again, I’m a pragmatist. I care about how the user gets what they want while allowing for the messiness that is going to happen with sub-optimal tagging.
I hope that helps.
Thor Projects LLC
I don’t always agree with Robert (LOL!), but I very largely do in this regard. While I’m certainly enamored of the beauty of designing a shallow, ‘enterprise’ taxonomy, deepened by multiple, linked, dep’t / functional stream taxonomies and the beauty of MDM and controlled vocabularies and all the things we are all well familiar with, true success in DM/ECM is defined by the ability for a user to very quickly, easily, and accurately find the content they seek and continue to execute their business function to create the outcomes they actually get paid for ;)). RM has, or should have, different measures of success around compliance and risk management.
Why do I feel that defines true success? Because, in my experience, that is what gets people to actually use the ‘systems’ (both the software and the assemblage of governance and processes around it). The most elegantly designed and described (e.g. your quandary that was the source of your initial post around what to name the “levels”) information architecture in the world is utterly useless if the users involved hate using it and try every way from Sunday to work around it (aka using Exchange as a “file store” and “collaboration” environment!).
I agree with other posters that the key criteria are, in most cases, not ‘correct’ classification, but utility in the form of improvements to document findability and retrieval, and to the ease and accuracy of the user classification process. One option might be to combine your level 2 with level 3: Use broader document type terms from level 2 where a more general type of classification will produce adequate results for retrieval, and zoom in with more specific terms from level 3 where greater specificity is required in your context. For example, a consulting firm that manages multiple projects for clients would likely require specific document types relating to various stages in project management, such as project charter, project budget, timelines and resources chart, etc., whereas it may not require much detail in terms of external communications/PR materials; it might be sufficient to have document types of ‘press release’ and ‘white paper’, for example.
Once you have decided what level of specificity is needed for all of the document types in your users’ work, you can combine them into a single ‘document type’ taxonomy for application in the system/platform. This is practical for end users, allowing them to classify and retrieve using a single taxonomic field.
Here’s something related from my experience.
We use OnBase as our records/document management system, and it has an in-house built integration with Maximo. With the integration, users upload and view documents from the Maximo interface but the documents are in fact stored in OnBase. This way, the usage of Maximo is governed by business processes while the RM classification and retention are implemented in OnBase.
We used to store documents in Maximo in its earlier days at our company, but it was not the right tool for document storage and management, so the above solution was developed.
First off, thanks so much for the responses. I did not clarify in detail in my original request what I am looking for. I am not worried about the documents, as they are a small percentage of the “record” information in the application. I am talking data/workflow. The tables that make up the “purchasing record” that create the “inventory record” and flow into the “asset management record” and so on are my concern. I want to take all tabular data that makes up the purchasing records and get rid of it at X time, I want to take the tables that create the time sheets and get rid of them at Y time, etc. Dealing with the documents is the easiest part (though not super easy); I am looking at retention and governance for the data that is the record created via workflow in the application. I was wondering if there is some type of application out there to assist in doing that? We have other applications that have add-ons that can analyze and let us know the relationships between tables so we can either dispose of them or wait until the longest retention period, but in Maximo at this time we can’t even obsolete anything without breaking a workflow.
What have you all done to deal with the data which for us is the record in most cases?
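For the “wait until the longest retention period” option, the logic could be sketched roughly as follows. This is only an illustration: the table names, retention periods and relationships are made up, and nothing here calls or reflects an actual Maximo API.

```python
# Hypothetical table names, retention periods (in years) and relationships.
RETENTION_YEARS = {
    "purchase_order": 7,
    "inventory": 10,
    "asset": 25,
    "timesheet": 3,
}
RELATED = [("purchase_order", "inventory"), ("inventory", "asset")]


def effective_retention(retention, related):
    """Give every table the longest retention found in its group of linked tables."""
    neighbours = {table: set() for table in retention}
    for a, b in related:
        neighbours[a].add(b)
        neighbours[b].add(a)

    result = {}
    for start in retention:
        if start in result:
            continue
        # Collect the connected group with a simple depth-first walk.
        group, stack = set(), [start]
        while stack:
            table = stack.pop()
            if table not in group:
                group.add(table)
                stack.extend(neighbours[table] - group)
        longest = max(retention[t] for t in group)
        for t in group:
            result[t] = longest
    return result


print(effective_retention(RETENTION_YEARS, RELATED))
```

Tables that feed each other end up sharing the longest period in their group, while an unrelated table such as the time sheets keeps its own shorter period.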
Hi, You may want to contact OpenText which has several solutions related to your request. If you would like to contact me directly, I can put you in touch with the correct person(s) to review the solutions.