Generating PDFs with a huge number of pages

This topic has 5 replies, 2 voices, and was last updated Feb 09, 2012
10:03:57 by jlessner.

Viewing 6 posts - 1 through 6 (of 6 total)

Author

Posts
jlessner
February 8, 2012 at 10:36
#26677
Hello everybody
In one of our projects we have the strange requirement to render a PDF file with a huge number of pages (currently 15,000!), and we are wondering whether PD4ML can do that. I have seen that there is a render method that takes an input stream reader and an output stream, which looks like a streaming interface suitable for “endless” data. So I ran some tests with this method, but it still seems to keep too much per-page data in memory during rendering, causing an OutOfMemoryError after a while.

I know that this is a weird requirement, but could you tell us whether there is any chance of rendering such a document with PD4ML at all?

Regards,
Jan Lessner
PD4ML
February 8, 2012 at 13:28
#28865
In general it is not possible to implement the “endless data” PDF output in HTML-to-PDF conversion scenarios.

PD4ML lays out all of the pages in memory before it writes anything out. The reason is the following: suppose the source HTML is built as a single huge table. Any cell on, say, page 350 that is slightly wider than the previous cells of the same column forces a re-layout of the previous 349 pages, because it affects the entire table layout. If those 349 pages have already been written to an output stream, the re-layout is no longer possible.
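
To make the table argument concrete, here is a toy sketch in plain Java. It is not PD4ML's layout code; the class `ColumnWidthDemo` and its method are made up for illustration, and the only assumption is that a column is as wide as its widest cell.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of table column sizing; NOT PD4ML's actual layout code.
class ColumnWidthDemo {

    // A column is as wide as its widest cell, so every row must be seen
    // before any row can be laid out with its final width.
    static int[] columnWidths(List<int[]> rows, int columns) {
        int[] widths = new int[columns];
        for (int[] row : rows) {
            for (int c = 0; c < columns; c++) {
                widths[c] = Math.max(widths[c], row[c]);
            }
        }
        return widths;
    }

    public static void main(String[] args) {
        List<int[]> rows = new ArrayList<>();
        // Many rows whose cells are 100 and 200 units wide.
        for (int i = 0; i < 1000; i++) {
            rows.add(new int[] {100, 200});
        }
        int[] before = columnWidths(rows, 2);

        // One late row with a slightly wider first cell.
        rows.add(new int[] {120, 200});
        int[] after = columnWidths(rows, 2);

        // The first column grows for every earlier row as well.
        System.out.println("before: " + before[0] + ", after: " + after[0]);
        // prints "before: 100, after: 120"
    }
}
```

The single late row changes the width that all 1000 earlier rows must use, which is why pages that have already been flushed to the output stream could not be corrected.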
jlessner
February 8, 2012 at 14:13
#28866
OK, makes sense. In our monster case the resulting document is in fact a sequence of snippets, each of which makes up a single page. Is there a way to produce the PDF snippets sequentially and concatenate them into a single document in a separate phase afterwards? We are not familiar with the structure of PDF (that’s why we love to use PD4ML), so don’t laugh if this is an absurd idea 😉
PD4ML
February 8, 2012 at 14:45
#28867
There are relatively new PD4ML API merge() methods intended for that. The PDF parser behind them has limited functionality for the time being and cannot read some third-party PDF documents, but it reads and merges PDFs produced by PD4ML with no problems. So your idea should work.
PD4ML
February 8, 2012 at 15:03
#28868
Hmm… if you can split the huge document into smaller portions, probably even a PD4ML.render( URL[], … ) or PD4ML.render( StringReader[], … ) method should help. It renders each of the HTML documents one by one, deallocating the parsed structures after each portion is converted to PDF.
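
As a rough sketch of the splitting step only: the plain-Java helper below groups one-page HTML snippets into batches and wraps each batch in a StringReader[]. The class `SnippetBatches`, its methods and the batch size of 500 are assumptions made for illustration; the PD4ML call itself is left out because its remaining arguments are not shown in this thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; only the batching is shown, not the PD4ML call itself.
class SnippetBatches {

    // Split the snippets into consecutive batches of at most batchSize items.
    static List<List<String>> partition(List<String> snippets, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < snippets.size(); start += batchSize) {
            int end = Math.min(start + batchSize, snippets.size());
            batches.add(new ArrayList<>(snippets.subList(start, end)));
        }
        return batches;
    }

    // Wrap each HTML snippet in its own reader, one reader per source document.
    static StringReader[] toReaders(List<String> batch) {
        StringReader[] readers = new StringReader[batch.size()];
        for (int i = 0; i < readers.length; i++) {
            readers[i] = new StringReader(batch.get(i));
        }
        return readers;
    }

    public static void main(String[] args) {
        List<String> snippets = new ArrayList<>();
        for (int page = 1; page <= 15000; page++) {
            snippets.add("<html><body><p>Page " + page + "</p></body></html>");
        }
        for (List<String> batch : partition(snippets, 500)) {
            StringReader[] readers = toReaders(batch);
            // Hand 'readers' to the render( StringReader[], … ) overload here;
            // its remaining arguments are not shown in this thread.
            System.out.println("batch of " + readers.length + " snippets");
        }
    }
}
```

Whether batching is needed on top of the array overload depends on how much memory the snippets themselves take; if each batch is rendered to its own PDF, the results could then be joined with the merge() methods mentioned earlier.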

The following code should reduce RAM utilization if your document has bulky images.
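[language=java]
import java.util.HashMap;
import java.util.Map;
// Cache images in a temporary directory to reduce RAM use with bulky images.
Map<String, String> m = new HashMap<>();
m.put(PD4Constants.PD4ML_CACHE_IMAGES_IN_TMP_DIR, "true");
pd4ml.setDynamicParams(m);
[/language]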
jlessner
February 9, 2012 at 10:03
#28869
Thanks a lot for your help. For now we have optimized everything around PD4ML to the point that we managed to create the 15,000 pages. I will keep your suggestions in mind in case we run into memory problems again.

The forum ‘General questions / FAQ’ is closed to new topics and replies.