Garbage in, Garbage out – Intelligent Information: 3 Conditions


Intelligence as a Prerequisite and as the Result

Knowledge is power; information is the key to success. That has been so, is so and will continue to be the case. However, what changes over time is the way we collect information and then gather and share knowledge from it.

In addition to humans, the machine is increasingly involved in this process, which opens up completely new possibilities. In fractions of a second, masses of data can be generated, evaluated and made available today as useful information.

In some areas, however, it is not enough to simply touch a button.

Experience shows that, in many companies today, knowledge is generated and collected in digital form – for example in wikis, documents, presentations and mail systems. This means that, through specific queries or searches, any kind of information in these documents and systems should be retrievable – be it an automatically generated overview, a work instruction, or technical illustrations of application architectures.

But, in reality, there are limits to the intelligence of such information. There are pieces of information in the system that are outdated and no longer correct. There are items of information available in different variants and versions. As a result, they are no longer unambiguous or are even contradictory. Other information, on the other hand, is no longer available at all. The input determines the output. Garbage in, Garbage out.

To create intelligent information, the information must fulfil three conditions in all their facets: scope, quality and availability.

Scope: What is the Necessary Scope?

Typically, documents and information within the company tend to grow wild over time. This leads to the assumption that all necessary information is available… somewhere.

In practice, however, decisions often have to be made on the basis of insufficient information – the necessary information simply does not exist. The opposite scenario is not uncommon either: the mass of information is so overwhelming that screening and evaluating it rules out quick decision-making. Would it not be helpful to have all the information on a topic in exactly one place – and to exactly the extent that is required for a specific situation?

The way to get there:

1. What information has to be available? (Information Need)
2. Comparison with information actually available.
3. Fill in the blanks (Information Gap).
4. Remove the superfluous parts.
It sounds simple, but there is more to it than that…

Information Need
… because how do we know what information is needed in the company? To this end, those who use the information should be identified and interviewed (Information Users). They are the stakeholders and set the goals and the scope.


Once the scope and information structure have been worked out in detail, the comparison with the existing information is the logical next step. Experience has shown that the inspection (= Capture) of all available information is a task that can only be accomplished with disproportionate effort. Therefore, the following applies: Only search for what is needed.

Information Gap

… and this information flows into the information platform as a draft. What does not exist must be created. And what is not needed is consequently winnowed out and archived.
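The comparison of need and existing information can be pictured as simple set operations. The following is only a minimal sketch under stated assumptions: the topic names and the `plan_information_work` helper are hypothetical and do not describe any particular tool.

```python
def plan_information_work(needed, available):
    """Compare the information need with what was actually captured.

    Both arguments are sets of topic names (hypothetical examples).
    """
    return {
        "keep_as_draft": needed & available,  # exists and is needed
        "create": needed - available,         # the information gap
        "archive": available - needed,        # superfluous material
    }


# Hypothetical topics, for illustration only.
needed = {"work instruction", "architecture diagram", "system overview"}
available = {"architecture diagram", "system overview", "old meeting notes"}

plan = plan_information_work(needed, available)
for action, topics in plan.items():
    print(action, sorted(topics))
```

The hard part is not the comparison itself but producing a reliable `needed` set, which is why the information users have to be interviewed first.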

Quality – Which Items of Information are Useful?

They are used “only as drafts” – because the most beautiful diagrams are useless if they are based on false data. If information in the company has not been provided in accordance with prescribed, uniform rules, then important quality characteristics are often not fulfilled. For intelligent information to emerge, the quality of the basis must be right. There are methods that check and classify the quality (Evaluate).

As a matter of principle, information should not only be correct but also up to date, complete and unambiguous. “User-friendliness” also has an impact on quality. Good language and structure, as well as the uniform use of terms, help to make it easy to comprehend and prevent misunderstandings. At the same time, the information user should always find the information with the desired type and depth when searching.
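Some of these criteria lend themselves to automated checks. The sketch below is an illustration only: the record keys `title` and `last_reviewed`, the `evaluate` function and the one-year threshold are assumptions made for this example. It flags items that look outdated or exist in several variants; whether the content is correct and complete still has to be judged by a person.

```python
from collections import Counter
from datetime import date


def evaluate(items, today, max_age_days=365):
    """Flag items that are outdated or exist in several variants.

    Each item is a dict with the hypothetical keys 'title' and
    'last_reviewed'. The one-year threshold is an arbitrary assumption.
    """
    title_counts = Counter(item["title"] for item in items)
    findings = []
    for item in items:
        issues = []
        if (today - item["last_reviewed"]).days > max_age_days:
            issues.append("outdated")
        if title_counts[item["title"]] > 1:
            issues.append("ambiguous")  # more than one variant exists
        findings.append((item["title"], issues))
    return findings


# Hypothetical records, for illustration only.
items = [
    {"title": "Backup procedure", "last_reviewed": date(2020, 1, 15)},
    {"title": "Backup procedure", "last_reviewed": date(2023, 11, 2)},
    {"title": "Network overview", "last_reviewed": date(2023, 12, 1)},
]
findings = evaluate(items, today=date(2024, 1, 1))
for title, issues in findings:
    print(title, issues)
```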

Availability – Now Where did I Have…?

Often, the information user knows that a certain item of information exists but simply cannot remember where they last saw it. If they cannot find it, the information is effectively not available and therefore useless. To ensure rapid retrieval, metadata is invaluable. The search function can use metadata to filter information by relevance. However, availability covers not only searchability but also access and authorisation. Anyone authorised to access a particular piece of information must be able to access it at any time.
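Both aspects of availability, finding by metadata and being allowed to read, can be combined in a single lookup. What follows is a minimal sketch, not a real search engine: the `search` function, the metadata keys and the group names are all hypothetical.

```python
def search(documents, user_groups, **wanted):
    """Return titles whose metadata match and that the user may read.

    'documents' is a list of dicts with the hypothetical keys 'title',
    'metadata' (a dict) and 'readers' (a set of group names).
    """
    hits = []
    for doc in documents:
        matches = all(doc["metadata"].get(k) == v for k, v in wanted.items())
        allowed = bool(doc["readers"] & user_groups)  # shares a group
        if matches and allowed:
            hits.append(doc["title"])
    return hits


# Hypothetical documents, for illustration only.
documents = [
    {"title": "Restore database", "readers": {"ops"},
     "metadata": {"type": "work instruction", "system": "billing"}},
    {"title": "Billing architecture", "readers": {"ops", "dev"},
     "metadata": {"type": "diagram", "system": "billing"}},
    {"title": "Payroll architecture", "readers": {"hr"},
     "metadata": {"type": "diagram", "system": "payroll"}},
]

dev_diagrams = search(documents, {"dev"}, type="diagram")
print(dev_diagrams)
```

A document that matches the query but is hidden from the user is, from that user's point of view, just as unavailable as one that was never written.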

All Things Considered…

… systems for intelligent information are all well and good, but even the best system cannot achieve the impossible.

While tools can now check numerous quality criteria and the availability of information, they reach their limits in other areas – some information is simply not available or does not exist in written form.

Moreover, while a tool can make assumptions about the quality of information, making adjustments in order to raise the quality of the information to the standard is ultimately the responsibility of the person who creates it.

As is so often the case in life, if you don’t know your goal, you can’t find your way – or in other words, if you know your requirement (Information Need), you can find your way there. The information gap can be determined by comparing the need with the existing information, and then closed. If the demands for quality and access are taken into account at the same time, then the way is paved for intelligent information.

Avato consulting ag

We are currently experiencing this in a corporate IT project.

At first it was very difficult to set the goals of the Information Management initiative. Then we found many tens of thousands of documents. The samples that we have looked at are often outdated, inaccurate and usually not very useful.

Garbage in, Garbage out seems to be a typical Information Management problem in IT departments.

We are now considering using analytics to prequalify via metadata and keywords.

Avato consulting ag

Every organization has ROT (redundant, obsolete or trivial content); that is simply a fact of corporate life. Heck, I have ROT in my own files. It’s just that it doesn’t cause anyone ELSE a problem 😉

Any company that is serious about tackling their information should definitely be looking to utilize appropriate automated discovery, analytic/identification and classification tool(s) as part of the effort. And since the majority of these tools on the market are rapidly gaining some pretty impressive capabilities courtesy of machine learning and AI, it is already at a point where it has become a de facto “well…d’uh!” aspect of achieving success.

Lorne Rogers Vice-Chair, ISO Trustworthy Content/Document Management President/Senior Management Consultant Aria Consulting Ltd.

Lorne, I agree 100%. However, people fail to realize that if you don’t take the human factor into consideration, you will still get garbage in, garbage out. If harvest criteria are not well tuned, culled, enhanced, and reviewed for accuracy before machine learning and auto-classification set in, an organisation can end up wasting a huge amount of money with little to no ROI. In unstructured data projects, success must be clearly defined, with rational steps and ample resources in place to get there. When done correctly, leveraging AI, machine learning, and auto-classification can be a beautiful thing!!

Austin Energy

Definitely agree, Amy! In fact, with the most recent client I worked with, I talked about the criticality of the people change aspect so much at the beginning of the engagement that he forbade me to mention it again, LOL!


Even professionals provide us with garbage in.

We had to recognise and extract data from a form that the document management department had developed. They expected 300,000 forms to be handled.

The recognition software failed on a large percentage of them.
After searching for what went wrong, it turned out that the final form had been developed via 20 proof-of-concept forms, each tested with 10,000 copies. So 200,000 forms in the stack differed in text and layout, and we only recognized the final form.

It’s a quick fix once you discover what went wrong, but discovering the 20 PoC forms took us some time. Taking form developers and marketing by the hand and showing them what recognition can and cannot do will fix this in the future (I hope).

That’s exactly the problem that we have in IT again and again.
The IT SMEs produce tons of documents over time, and everyone believes that all the information needed is actually there somewhere. But unfortunately that’s not the case. We find a lot of garbage, and in the end we are missing a lot of important information such as technical documentation or work instructions.

Avato consulting ag
