Creation of PDF/A files from scanned PDFs

Hi everyone,

I am having a challenging time trying to convert PDF files into PDF/A files and wonder if there is anyone out there who might be able to offer some advice?

The pdf files have been created in numerous ways, such as being scanned from a paper copy or being rendered in a document management system.

I am using Adobe Professional 9 to convert them.

Any ideas would be gratefully received!

Kind regards
Julie

——————————
Julie Morris
Rolls-Royce
——————————


Hi Julie,

Some additional information on what sort of challenges you’re experiencing might help to troubleshoot.  However, right off the top, I would suspect that the ancient version of the software you’re using may play a part.

Aria


Thanks for your response Lorne.  One of the challenges is that we can only print to PDF/A; we can’t save as PDF/A.  When we are able to print to PDF/A, the file sizes are huge in comparison to the original scanned PDF, e.g. 2.5MB increases to 32MB.  Also, we keep getting preflight errors, one of which relates to XMP metadata.

Do you think the old version of Adobe Pro could be to blame?

Many thanks
Julie

——————————
Julie Morris
Rolls-Royce
——————————


Hi Julie,

The Acrobat print driver has long had a tendency to balloon file sizes, especially with an image (such as a scanned input), so that is no surprise.  I would suspect that you’re doing this on Win10 machines?

Also, as for the XMP metadata error, are you attempting to make any changes whatsoever to the file after it has been converted to PDF/A-1a, such as trying to do a verification?

Oh, and Hans brings up a point to consider: there are solutions out there beyond Acrobat desktop that can do file conversion and optimization in volume.

Aria


We are not attempting to make any changes following the PDF/A conversion but we have been selecting PDF/A-1b as we understood this was the option for less structured documents.

Thank you

——————————
Julie Morris
Rolls-Royce
——————————


OK, sorry, I thought you had indicated you were using 1a.  The XMP error in preflight for 1b is something that is resolved in later versions of Acrobat.

I would suggest getting 1 seat of current Acrobat DC on a monthly subscription (it’s cheap that way) and trying it on the same machine(s) where you’re having problems, to see if the problems go away.  If they do, you can look at getting additional subscriptions as required.

Aria


Yes, I think we will look into purchasing a DC licence and hopefully that will help us.

Thanks again – your suggestions are much appreciated!

——————————
Julie Morris
Rolls-Royce
——————————


No problem, Julie.  Please feel free to send me one of the company’s finished products in return!

Aria


I have very little experience using Adobe Professional, but if you have thousands of documents to convert, consider using a development toolkit to convert the files. We are successfully using Aspose.net in an automated setup that converts PDF files from many sources into PDF/A without any challenges. The Aspose toolkit exists for .NET and for Java; it does require a developer, and other suppliers deliver toolkits for the same purpose.
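For readers who want to picture what such an automated setup looks like, here is a minimal sketch in Python. It does not use Aspose (its API is not shown in this thread); as a stand-in it builds command lines for Ghostscript, one of the other tools that can write PDF/A. The function names, the folder layout and the `gs` executable name are assumptions for illustration (on Windows the binary is usually `gswin64c`). A fully compliant result normally also needs an output-intent definition with an ICC profile (Ghostscript ships a `PDFA_def.ps` template for this), which is left out here, so treat the output as something to verify with preflight rather than as guaranteed PDF/A.

```python
import subprocess
from pathlib import Path


def pdfa_command(src, dst, gs="gs"):
    """Build a Ghostscript command line that rewrites `src` as PDF/A-1 at `dst`."""
    return [
        gs,
        "-dPDFA=1",                      # ask for PDF/A-1 output
        "-dBATCH", "-dNOPAUSE",          # run non-interactively
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=RGB",
        "-dPDFACompatibilityPolicy=1",   # drop features PDF/A forbids instead of aborting
        f"-sOutputFile={dst}",
        str(src),
    ]


def convert_folder(in_dir, out_dir, run=subprocess.run):
    """Convert every *.pdf in `in_dir`, writing results with the same names to `out_dir`."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_dir.glob("*.pdf")):
        dst = out_dir / src.name
        run(pdfa_command(src, dst), check=True)  # raises if the tool reports failure
        written.append(dst)
    return written


# Example call (hypothetical folder names):
# convert_folder("scans", "scans_pdfa")
```

The `run` parameter exists only so the loop can be tried out with a stand-in function before pointing it at thousands of real documents.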

But if you are using Adobe Professional, remember to use the optimize functionality. It can reduce the PDF file size by a large percentage, depending on how the original PDF was produced; as I understand it, a PDF file is built up in layers in the underlying structure, and the optimize functionality simplifies them.

——————————
Hans Kofoed
ECM Lead
Danske Bank
——————————

Hi Hans,
Thank you for your advice.  We have used the optimise functionality, but before the PDF/A conversion, as I didn’t think you could use it after the PDF/A conversion.  We may need to look at other software options but currently we are just trying to use what we have available to us.

Thanks
Julie

——————————
Julie Morris
Rolls-Royce
——————————


Julie –

I’ve seen that you’ve got some solid answers from Loren and Hans but I thought I’d offer a few quick thoughts.

1) For any sizable volume, using Acrobat (desktop) for conversion will not be your best option.  There are a number of tools that can process all the files in batch.
2) Consider whether you need PDF/A.  Certainly, if you’re required to use the format, do.  However, scanned PDFs are really a thin PDF wrapper around the TIFF scanned image.  They’ll be stable for a VERY long time.  I’d consider whether you can defer this process for scanned PDFs.  (As a sidebar, I’d recommend doing OCR on all scanned PDF images if that isn’t already a part of your process.)
3) Consider the appropriate scan resolution if you must retain the documents long-term.  Storage is cheap, but the move from 300dpi to 200dpi cuts the pixel count, and with it roughly the file size, to 4/9 of the original.  If you don’t need the precision, I wouldn’t keep it.  Thus, if you convert the files, you might have the conversion downsample them to a “storage” resolution.
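
To show where the 4:9 figure in point 3 comes from, here is a small Python sketch. Resolution applies to both width and height, so the pixel count scales with the square of the dpi ratio. The function name is made up for illustration, and the 32MB starting size is simply borrowed from earlier in the thread; real compressed sizes only roughly follow pixel count.

```python
from fractions import Fraction


def pixel_ratio(old_dpi, new_dpi):
    """Fraction of pixels left when a page is resampled from old_dpi to new_dpi."""
    linear = Fraction(new_dpi, old_dpi)  # scale factor along each axis
    return linear * linear               # width and height both shrink


ratio = pixel_ratio(300, 200)
print(ratio)  # 4/9

# Rough estimate only: assumes file size tracks pixel count.
original_mb = 32
print(round(float(original_mb * ratio), 1))  # 14.2
```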

Good luck.

Rob

——————————
Robert Bogue
President
Thor Projects LLC
——————————


Thank you for your suggestions Rob.  We’re undertaking some testing and trying out a few different things from the many suggestions we have received from this community, which is so helpful.  I’m positive that with all the advice we are receiving we’ll get a solution that works!

Thank you
Julie

——————————
Julie Morris
Rolls-Royce
——————————


We couldn’t agree more on your recommendation to OCR, Rob.  We do OCR in bulk in our conversion process through Action Wizard.  It does take extra time but we think it’s well worth it.  And, yes, I believe you would need Adobe Acrobat DC for this.  I can’t remember if it was available in the Pro version or not …

——————————
Linda Avetta
Digital Archives and Records Division Chief
Pennsylvania Historical & Museum Commission
——————————


One more tidbit.
We are wondering if you are having accessibility issues?  We experienced this and, to combat it, needed to set up the Autotag Document function in the Action Wizard.  In addition, when converting documents such as maps, site plans and other non-WORD type documents, we found it was better to OCR before the Action Wizard.  See attached for what our Edit Action looks like for these types of docs for accessibility.

——————————
Linda Avetta
Digital Archives and Records Division Chief
Pennsylvania Historical & Museum Commission
——————————
