File Naming Conventions

When naming files for general office use, is there still a need to use an underscore, hyphen, or CamelCase between words? I understand the original purpose was due to how different operating systems handled file names, but in today’s world, where “almost” all files are stored and retrieved on a Windows NTFS system and/or in a document management system like OpenText or SharePoint, is there any reason not to just use a space between the words? I do understand that if you are uploading to a website, each space will be encoded as %20, but I am talking about the average office document stored on the office network drive or in a document management system.
——————————
Bud Porter-Roth
Principal Consultant
Porter-Roth Associates
——————————
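For the web case Bud mentions, the %20 is the URL percent-encoding of a space; underscores and hyphens pass through unchanged. A minimal sketch using Python's standard `urllib.parse` module (the file names are made-up examples):

```python
from urllib.parse import quote, unquote

# Made-up file names, used only to compare how each separator is encoded in a URL
names = ["Master Agreement.docx", "Master_Agreement.docx", "Master-Agreement.docx"]

for name in names:
    encoded = quote(name)  # a space becomes %20; "_", "-" and "." are left as-is
    print(name, "->", encoded)
    assert unquote(encoded) == name  # decoding gives back the original name
```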


Hi Bud,

No, there is no longer a need to use underscores, dashes, etc., especially when storing in an ECM environment such as OpenText or Hyland. On the other hand, storing mission-critical data on network drives is a REALLY BAD idea, as you have none of the protections you get with a trustworthy storage environment. Consider what happened recently to Baltimore with the ransomware attack (and there have been many others): what we have found is that in organizations that had a trustworthy storage environment, the trustworthy storage sub-systems were able to protect the content even though the network drives got encrypted. The important point is to protect your mission-critical data and not rely only on IT security methods, or on other OS-level protocols rather than the ECM environment, since the ECM technologies are what store and manage the content throughout its life-cycle.

The original purpose of adding dashes, underscores, etc. dates from the late ’90s, when the versions of Windows available couldn’t handle filenames that didn’t have a contiguous prefix and required all filenames to be ‘7-bit’, along with other limitations. This was also a time when storage was exceptionally expensive and organizations were only starting to consider using ECM for storing content other than engineering files. As a result, many people used underscores, dashes, etc. so that their files could be stored in Windows.

With ECM technologies, this practice has no value, because the filename itself is no longer what matters. Instead, it usually becomes an additional descriptor loaded into ECM (if desired) along with the file, while index values enable users to locate previously stored information without relying on users’ file naming conventions. This approach also supports ingesting the content into a trustworthy storage sub-system (TSS), where the information is protected from malware and other malicious attacks through the implementation of a trustworthy environment.

We (the ECM and Digital Transformation Industry Standards program) have finished preparing several industry standards that will be of value to you and your clients, such as ISO 18829 (ECM Assessments), ISO 15801 (Trustworthy Content Management) and ISO 22957 (ECM Design and Implementation). I highly recommend you get these and review them. The program is also in the process of preparing several ISO (international) best practices related to planning and implementing Digital Transformation and Trustworthy ECM Environments.

For more information, please contact either the ECM Standards program Director, Betsy Fanning, at betsy.fanning @ 3dpdfconsortium.org, or me (I chair a few of the committees) at blatt @ eid-inc.com. I hope this helps.

——————————
Robert Blatt, MIT, LIT, CHPA-III
Principal Consultant, Electronic Image Designers (EID)
AIIM Fellow #175
Chair, Trustworthy Storage
Chair, Trustworthy Document Management & Assessment
Chair, ECM Implementation Guidelines
ISO Convenor: 18829, 18759, 22957
US Delegate to ISO TC/171
TC/171 Liaison Officer to TC46 SC11
TC/171 Liaison Officer to TC/272
——————————


Robert, good info, as usual, and thanks. I will check out the standards mentioned. Bud

——————————
Bud Porter-Roth
Principal Consultant
Porter-Roth Associates
——————————


In my experience, extended characters are usually an issue in long-term conversion strategies. I had an EMC conversion go from 30k to 300k because of the extended characters that had to be converted when moving from one system to the next. I think a good naming convention plus training creates a much smoother long-term solution. IMO.
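Carl's point suggests checking for extended characters before a migration starts. A minimal sketch, assuming "extended" means anything outside 7-bit ASCII (a given target system may accept more, or less, than that); the `extended_chars` helper and the sample names are hypothetical:

```python
import unicodedata

def extended_chars(filename: str) -> list[str]:
    """Hypothetical helper: list the non-ASCII characters in a name, with their Unicode names."""
    return [f"{ch} (U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')})"
            for ch in filename if not ch.isascii()]

# Made-up names as they might appear on a file share
for name in ["Résumé 2019.docx", "Budget – Q3.xlsx", "Minutes_2019-06-25.docx"]:
    found = extended_chars(name)
    print(name, "->", found if found else "OK")
```

Names flagged this way can be reviewed or renamed before the conversion, rather than discovered as failures during it.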


Carl, thanks. Migration of files, especially ones with non-standard names, is problematic. Bud

——————————
Bud Porter-Roth
Principal Consultant
Porter-Roth Associates
——————————


In fact it’s the opposite – CamelCase, hyphens and underscores can be outright harmful, as they may not be discovered by search (where no “clean” version of the name exists in the metadata otherwise).

For example, in SharePoint / Office 365 Search:
* Search will only match the start of a string, so if the filename is “MasterAgreement.doc” it would come up when searching for master but not when searching for agreement.
* Underscore _ is treated the same as a space, except that it is also searchable itself. So if the filename is “Master_Agreement.doc”, it can be found by master, agreement or even pin-pointed by searching for master_agreement (no quoting needed).
* Hyphen is also treated as a space but it’s also a special character meaning “not”, so searching for it specifically needs quotes. Thus “Master-Agreement.doc” can be found by master or agreement as above, but searching for master-agreement (without quotes) will not find it, as it will specifically leave out anything with the word agreement!

Thus, a space is preferable, underscores are a good alternative, and hyphens should be avoided (at least in SharePoint). The same applies to metadata!
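Pauli's bullets can be made concrete with a toy model of the matching he describes: the name is split into words at spaces, underscores and hyphens, and a term matches only the start of a word. This is an illustration under those assumptions, not SharePoint's actual word breaker; in particular it does not model searching for master_agreement as one term, or the NOT meaning of a hyphen in a query.

```python
import re

def words_in(filename: str) -> list[str]:
    """Toy word breaker: drop the extension, lowercase, split on spaces, '_' and '-'."""
    stem = filename.rsplit(".", 1)[0].lower()
    return [w for w in re.split(r"[\s_-]+", stem) if w]

def finds(filename: str, term: str) -> bool:
    """Toy prefix match: the term must match the start of some word."""
    return any(w.startswith(term.lower()) for w in words_in(filename))

for name in ["MasterAgreement.doc", "Master_Agreement.doc", "Master Agreement.doc"]:
    print(f"{name}: master={finds(name, 'master')}, agreement={finds(name, 'agreement')}")
```

Under this model, CamelCase leaves one long word, so agreement never matches the start of anything, while a space or underscore yields two searchable words.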

——————————
Pauli Visuri
Consulting Director
Sharepoint City
——————————



Pauli, thanks for the mini-tutorial. I don’t think most people would even think about how these three naming conventions affect search. Bud

——————————
Bud Porter-Roth
Principal Consultant
Porter-Roth Associates
——————————


Hello Pauli,
Good point, but I would say this speaks more to the software than to the naming convention. Tools like Hyland and OpenText do not have these deficiencies.
So I suppose the important thing is to understand how your users work, store, and search. Choose the right tools and know how naming convention may limit.
Thanks,
-Rick
——————————
Richard Molique
ECM Consultant
IQ Business Group, Inc.
804-614-6445
rmolique@…
——————————



Richard –

I think that every tool/software has deficiencies. However, search matching only at the start of a word (by default) is a side effect of the way that search works, in whatever tool. When we need to search in the middle of strings, we have to play some games. I detail these in a post I did about wildcarding (https://www.thorprojects.com/blog/archive/2016/10/05/search-wildcarding-front-to-back-and-back-to-front/), where I speak in terms of both full-text and SQL searching to make it more understandable.
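One common way to support a leading wildcard (e.g. *agreement) with an index that only matches from the start of a word is to also index every word reversed, so a suffix search becomes a prefix search. The sketch below shows that general technique over sorted lists; it is an illustration only and not necessarily the exact approach in Rob's post or in any particular search product.

```python
from bisect import bisect_left

def build_index(words):
    """Return two sorted lists: the words as-is, and each word reversed."""
    return sorted(words), sorted(w[::-1] for w in words)

def prefix_search(sorted_words, prefix):
    """Entries sharing a prefix are contiguous in sorted order, so binary-search to the first."""
    i = bisect_left(sorted_words, prefix)
    hits = []
    while i < len(sorted_words) and sorted_words[i].startswith(prefix):
        hits.append(sorted_words[i])
        i += 1
    return hits

def suffix_search(reversed_sorted, suffix):
    """A search for *suffix becomes a prefix search over the reversed words."""
    return [w[::-1] for w in prefix_search(reversed_sorted, suffix[::-1])]

forward, backward = build_index(["master", "agreement", "masteragreement"])
print(prefix_search(forward, "master"))      # ['master', 'masteragreement']
print(suffix_search(backward, "agreement"))  # ['agreement', 'masteragreement']
```

A term in the middle of a string (*agree*) still needs something more, which is part of why such searches are costly.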

I’m not 100% certain about Pauli’s note about the hyphen. I know that hyphens used to be treated as NOT only when they were preceded by white space, but I suppose it’s possible that the behavior changed. It would be odd, however, because there’s a relatively standard behavior for punctuation like hyphens and quotes.

With that out of the way, I don’t believe file naming conventions are necessary any longer — or effective.  Most search engines optimize for titles and support full-text.  With that plus metadata I rarely see people searching by file name.

Rob

——————————
Robert Bogue
President
Thor Projects LLC
——————————


Whoa-hey! Don’t let my post make you think Hyland and OpenText (and others) don’t have deficiencies!!! There are pros and cons to any system, and I have a whole list of tickets and open bugs to prove it! 🙂 We were talking about filenames. My point was “understand how your users work, store, and search. Choose the right tools and know how naming convention may limit.” I do think we should train our users to rely less on filenames… but it’s a hard habit to break. Even with metadata, full-text search, etc., it’s still something that will affect adoption, in my view.

Cheers,
-Rick

——————————
Richard Molique
ECM Consultant
IQ Business Group, Inc.
804-614-6445
rmolique@…
——————————


The naming of documents has always been a pet peeve of mine, especially when there are multiple users on the shared drive. After years of convincing folks, this is the convention most people followed: to keep documents on similar subjects in chronological order, the date goes first as YYYYMMDD (no spaces within the date), followed by the subject title and, if needed, the author.
Example: 20190625 AIIM Presentation by Gray. The next time there is a presentation, it will fall under this one.
I hope this helps a bit.
Regards,
——————————
Rhonda Hazlett
Corporate Document Administrator
Olin Corporation
——————————
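Rhonda's date-first convention works because a YYYYMMDD prefix sorts alphabetically in the same order as chronologically. A minimal sketch; `dated_name` is a hypothetical helper and the dates are made up:

```python
from datetime import date

def dated_name(d: date, title: str) -> str:
    """Hypothetical helper: prefix a title with its date as YYYYMMDD."""
    return f"{d:%Y%m%d} {title}"

# Made-up dates, deliberately out of order
dates = [date(2019, 6, 25), date(2018, 11, 2), date(2019, 1, 9)]
names = [dated_name(d, "AIIM Presentation by Gray") for d in dates]

# A plain alphabetical sort (what a file browser does by name) is also chronological
for n in sorted(names):
    print(n)

# For contrast, a month-first MMDDYYYY prefix does not sort chronologically
print(sorted(f"{d:%m%d%Y}" for d in dates))  # ['01092019', '06252019', '11022018']
```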


Rhonda, thanks, and that's an important point if you want files to sort in chronological order. And if you do manual versions, as in V01, V02, etc., you should also use the leading “0” to allow for correct sorting. Bud

——————————
Bud Porter-Roth
Principal Consultant
Porter-Roth Associates
——————————
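Bud's leading-zero tip matters because name sorting compares character by character, so "V10" lands before "V2". A minimal sketch; note that two digits (`:02d`) only holds up to V99, so a longer-lived series would need more:

```python
# Version numbers 1, 2 and 10, written without and with a leading zero
unpadded = [f"V{n}" for n in (1, 2, 10)]
padded = [f"V{n:02d}" for n in (1, 2, 10)]

print(sorted(unpadded))  # ['V1', 'V10', 'V2']: V10 sorts before V2
print(sorted(padded))    # ['V01', 'V02', 'V10']: numeric order is preserved
```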


Excellent thought. Thank you.

Rhonda Hazlett
Olin Corporation
Document Mgmnt Administrator
3855 North Ocoee Street
Cleveland, TN 37312
O +1-423-336-4053
RSHazlett@…




Rhonda, Bud, it is not good practice to add dates or version numbers to filenames, except in very specific situations.

Consider the following:
* Files are frequently updated after they have been created. Thus the date will be wrong unless the user goes to the trouble of changing it after saving.
* The same applies to version numbers.

I have had to review tens of thousands of files as part of migrations, and in more than a third of all cases, any dates or version numbers in the filename have been incorrect.

What’s more, in Content Management systems:
* Renaming is often even harder than on plain disk storage
* The system handles the versioning, which will be totally messed up if the filename changes in CMSs such as SharePoint, where the name is used as the identifier for the item.

These are my rules of thumb which I teach to users switching to Content Management:

– Filenames are forever. They must not have anything that would need to be changed if the file changes.
– Any variable information such as dates, version or status should be put in the file’s metadata
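Pauli's two rules of thumb can be pictured as a record whose filename is set once, while the changing information lives in metadata. A minimal sketch; the `Document` class and its metadata keys are hypothetical and not the API of SharePoint or any other CMS:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Document:
    filename: str                                 # set when the file is created, never renamed
    metadata: dict = field(default_factory=dict)  # dates, version, status, etc.

    def save_new_version(self, modified: date, status: str) -> None:
        """Record an update in metadata only; the filename stays the same."""
        self.metadata["version"] = self.metadata.get("version", 0) + 1
        self.metadata["modified"] = modified.isoformat()
        self.metadata["status"] = status

doc = Document("Master Agreement.docx")
doc.save_new_version(date(2019, 6, 25), "Draft")
doc.save_new_version(date(2019, 7, 2), "Final")
print(doc.filename, doc.metadata)
# Master Agreement.docx {'version': 2, 'modified': '2019-07-02', 'status': 'Final'}
```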

There are a few exceptions to the above, e.g.
– where a date is an intrinsic part of the file’s identifier, e.g. a news article about something that happened on a specific date (this was actually the case in Rhonda’s example).
– where an edited file becomes a new and independent “Release” or “Issue” instead of just an updated version, and the previous releases keep equal status with the new one. This is the case, for example, with policies or regulations, where past ones continue to apply.

Finally – where there IS a date in the file name, consider using the international format, “2019-06-25” instead of “20190625” or any localised format. This will be understood by search engines, and far easier to handle in any migration situation.
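The dashed form Pauli recommends is what Python's standard `datetime` module produces with `isoformat()`. For reference, 2019-06-25 is the ISO 8601 "extended" format and 20190625 is the ISO 8601 "basic" format; both sort chronologically. A minimal sketch:

```python
from datetime import date

d = date(2019, 6, 25)
print(d.isoformat())         # 2019-06-25  (ISO 8601 extended format)
print(d.strftime("%Y%m%d"))  # 20190625    (ISO 8601 basic format)

# Parsing the extended form back is a single standard-library call (Python 3.7+)
assert date.fromisoformat("2019-06-25") == d
```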

——————————
Pauli Visuri
Consulting Director
Sharepoint City
——————————


Paul,

Agree with everything you said (heartily!), except the very last point about the dates. I would strongly recommend underscores rather than dashes in the date, as dashes could be interpreted by some storage systems, and more likely by search providers, as a special character, which could cause problems.

Aria



Lorne – yes I know, and in fact that is my general advice to users as well (avoid dashes, use spaces or underscores instead).

However, the standard ISO 8601 date format (in its extended form) uses dashes.

SharePoint search, at least, is able to tell this apart from a special character when searching: it recognises the format and understands it’s a date.

——————————
Pauli Visuri
Consulting Director
Sharepoint City
——————————



I personally see a restriction on using spaces in a file name as an artificial restriction that doesn’t recognize the way collaborative users create documents. They typically want a meaningful, friendly file name, and unless we are somehow making the ‘mistake’ of trying to embed metadata in the file name, whether the name contains spaces in particular should be irrelevant to the storage platform used to manage the document. I have seen restrictions on other characters that can impact delivery, certain special characters used in HTML, for example.

——————————
Peter Rahalski, CIP
Information Solutions Architect
EXCELLUS HEALTH PLANS
——————————


Good Morning All,

I have appreciated everyone’s input to this discussion over the last several days. The search aspect, with results that differ depending on the software or OS, is certainly a tip that everyone can benefit from, regardless of their role.

Our challenge at the State Archives is the digital preservation of files of historical/archival value received from every agency. These files come in multiple forms and formats and are transferred anywhere from 1 year after creation (in recent and relevant forms/formats) to decades later (when formats may be obsolete). Most are unstructured. Many have no file naming convention. Those that do are sometimes still problematic.

With the introduction of lengthy file names and ‘anything goes’ naming, we are finding that many files fail the migration/conversion process due to special characters in the file name. There is something to be said for the old days of ‘8.3’ file names, where no spaces or special characters were allowed and the file extension always followed the period. We provided training on naming conventions for many years, encouraging no spaces and no special characters other than underscore and hyphen. We promote and encourage the YYYYMMDD date format and, where versioning or other sequential numbering is relevant, always zero-filling to aid uniformity and sort order.

Because special characters are problematic in migration, conversion and preservation, one of our steps in readying a batch of received files for preservation is to run a file renaming utility, such as File Renamer or Bulk Utility, to discover and remove special characters. We search for ALL special characters, including those we accept (underscore and hyphen), using this expression: [][!"#$%&'()*+,./:;<=>?@\^_`{|}~-]
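Linda's discover-then-remove step can be sketched with her expression in Python's `re` module. This sketch assumes the double quote in the expression is a plain ASCII quote (forum software often turns it into a curly one), and it checks only the part of the name before the extension, since the expression also matches the period. The helper names are hypothetical; this is not how File Renamer or Bulk Utility work internally.

```python
import os
import re

# Linda's expression: every ASCII special character, including "_" and "-"
SPECIAL = re.compile(r"""[][!"#$%&'()*+,./:;<=>?@\^_`{|}~-]""")

def find_special(filename: str) -> list[str]:
    """Hypothetical helper: report the special characters in the name (extension excluded)."""
    stem, _ext = os.path.splitext(filename)
    return SPECIAL.findall(stem)

def clean_name(filename: str, keep: str = "_-") -> str:
    """Hypothetical helper: drop special characters from the stem, except those in `keep`."""
    stem, ext = os.path.splitext(filename)
    cleaned = SPECIAL.sub(lambda m: m.group(0) if m.group(0) in keep else "", stem)
    return cleaned + ext

print(find_special("Budget (FY19) v2#final.xlsx"))  # ['(', ')', '#']
print(clean_name("Minutes_2019-06-25(draft).docx"))  # Minutes_2019-06-25draft.docx
```

Running the search with everything flagged, and only then deciding what `keep` allows, mirrors Linda's description of searching for all special characters, including the accepted ones.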

After learning of the potential issues with the use of hyphen in SharePoint, we may need to reconsider.

I would caution anyone against building file naming rules or guidelines around the ‘software’ or ‘system’ in use today, the ‘flavor of the day’. Instead, realize that with rapid technology change, what is in use today was not in use 10 years ago and likely will not be in use 10 years from now. To bridge all systems, and to plan for migration and preservation, stick to the basics of a meaningful file naming convention, similar to the one mentioned earlier in this thread (I believe it was something like YYYYMMDD_MeaningfulFileName_Author_Version), or another similarly simple but meaningful name in your area of business.
——————————
Linda Avetta
Digital Archives and Records Division Chief
PA State Archives
Pennsylvania Historical & Museum Commission
——————————
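The pattern Linda recalls, YYYYMMDD_MeaningfulFileName_Author_Version, can be generated rather than typed by hand, which keeps the fields consistent. A minimal sketch; `build_name`, the CamelCased title and the two-digit, V-prefixed version are assumptions for illustration:

```python
from datetime import date

def build_name(d: date, title: str, author: str, version: int, ext: str) -> str:
    """Hypothetical helper: YYYYMMDD_MeaningfulFileName_Author_Version.ext"""
    # Join the title words without spaces so "_" stays reserved as the field separator
    title_part = "".join(w[:1].upper() + w[1:] for w in title.split())
    return f"{d:%Y%m%d}_{title_part}_{author}_V{version:02d}.{ext}"

print(build_name(date(2019, 6, 25), "annual budget review", "Gray", 3, "docx"))
# 20190625_AnnualBudgetReview_Gray_V03.docx
```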


I can confirm that SharePoint does NOT treat an embedded hyphen as NOT. It only has this behavior when the hyphen is preceded by white space. See the example/test image below.
[Image: SharePoint and embedded hyphens]
——————————
Robert Bogue
President
Thor Projects LLC
——————————


Did you happen to get a chance to try it in the modern/unified search?

Aria


Modern Search still uses the same engine. Just for kicks, I also triggered Microsoft Search, and it’s the same there: white space + hyphen = NOT; a hyphen alone does not.
——————————
Robert Bogue
President
Thor Projects LLC
——————————
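The rule Robert confirmed (a hyphen means NOT only when it follows white space, or starts the query) can be modelled as a tiny query tokenizer. This is a hypothetical sketch of the observed behaviour, not Microsoft's query parser, and it ignores quoting and every other operator:

```python
def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into (required, excluded) terms.

    A term that begins with '-' (the hyphen follows white space or starts the
    query) is treated as NOT; a hyphen inside a term stays part of the term.
    """
    required, excluded = [], []
    for term in query.split():
        if term.startswith("-") and len(term) > 1:
            excluded.append(term[1:])
        else:
            required.append(term)
    return required, excluded

print(parse_query("master-agreement"))   # (['master-agreement'], [])
print(parse_query("master -agreement"))  # (['master'], ['agreement'])
```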


All, this has been an excellent discussion and has helped me better understand the problems and issues. Would anyone care to share their document naming conventions policy? You could scrub it of company names, but it would be interesting to see. I plan to consolidate the main points of this discussion and can post them as a paper if anyone is interested.
Again, thanks to everyone who contributed.
Bud
——————————
Bud Porter-Roth
Principal Consultant
Porter-Roth Associates
——————————


Hi Bud,

I put one I have up on my site if you want to take a look:

https://ariaconsulting.net/resources-blog

Aria


Thanks Bud – that would be great.

Supplying two attachments –

1) NamingConventions. This is used for specific naming conventions for assets in the Archives. So, although it has valuable information for us, it may be over the top for you all … it’s still food for thought. The underscores provide the various field breaks, and parsing the data on this separator is very easy to do to create indexes, lists, etc. It works very well for our institutional holdings, but may not work for your environments.

2)  NamingConventions_PresentationExcerpt.  This was used for initial training for staff.  We sometimes use it as a refresher for new hires who will be working on scanning projects.  We should probably train ALL new hires, because as someone else noted, as soon as folks start doing things their own way, then no one can decipher what’s what.
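Linda's point in item 1, that underscore-separated fields are easy to parse into indexes and lists, can be sketched as below. The four field names and their order are hypothetical (the Archives' actual convention is in her attachment, which is not reproduced here):

```python
from pathlib import PurePath

# Hypothetical field order, for illustration only
FIELDS = ("date", "title", "author", "version")

def parse_name(filename: str, fields=FIELDS) -> dict:
    """Split a file name's stem on underscores into named fields."""
    parts = PurePath(filename).stem.split("_")
    if len(parts) != len(fields):
        raise ValueError(f"{filename!r}: expected {len(fields)} fields, found {len(parts)}")
    return dict(zip(fields, parts))

rows = [parse_name(n) for n in ["20190625_AnnualReport_Gray_V03.docx",
                                "20180102_BoardMinutes_Lee_V01.pdf"]]
for row in sorted(rows, key=lambda r: r["date"]):
    print(row["date"], row["title"], row["author"], row["version"])
```

Names that do not split into the expected number of fields raise an error, which is one way to catch files that have drifted from the convention.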

As you can see, both are very dated, but the general concepts remain and are followed today. In the presentation excerpt you will note plenty of dated screenshots (including the format types, which should be updated to .docx, .xlsx, PDF/A, etc.). Some of the slides may not be comprehensible without the trainer; a few are exercises for the participants (identifying what is correct, and properly renaming what is incorrect). Either way, you’ll still get the gist.

Use whatever you feel relevant.  If you have any questions, please ping me.

Linda

Linda Avetta | Division Chief | CGCIO

Digital Archives & Records Division

Pennsylvania State Archives

PA Historical & Museum Commission

The Commonwealth of Pennsylvania
1825 Stanley Drive | Harrisburg, PA 17103

Phone: 717.705.6923
Visit us on the web at PHMC.state.pa.us and DigitalArchives



Linda, thanks for sending me the 2 files, I appreciate it very much. I took a quick look and they look good. Bud

——————————
Bud Porter-Roth
Principal Consultant
Porter-Roth Associates
——————————
