Generating PDFs with huge number of pages

Viewing 6 posts - 1 through 6 (of 6 total)

    Hello everybody
    In one of our projects we have the unusual requirement to render a PDF file with a huge number of pages (currently 15,000!), and we are wondering whether PD4ML can do that. I have seen that there is a render method that takes an input stream reader and an output stream, which looks like a streaming interface suitable for “endless” data. I ran some tests with this method, but it still seems to hold too much per-page data in memory during rendering, and after a while it fails with an OutOfMemoryError.

    I know that this is a weird requirement, but could you tell us whether there is any chance of rendering such documents with PD4ML at all?

    Jan Lessner


    In general, “endless data” PDF output cannot be implemented for HTML-to-PDF conversion.

    PD4ML lays out all pages in memory before it writes anything out. The reason is as follows: suppose the source HTML is built as a single huge table. If any cell, say on page 350, is slightly wider than the earlier cells of the same column, the previous 349 pages have to be laid out again, because the wider cell affects the entire table layout. If those 349 pages have already been written to the output stream, that re-layout is no longer possible.
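
The table argument can be illustrated with a toy model. This is only a sketch, not PD4ML code: the class `ColumnWidthSketch` and its method are hypothetical, and it assumes the simplest possible rule, namely that a column is as wide as its widest cell.

```java
import java.util.Arrays;
import java.util.List;

// Toy model (not PD4ML code): a column is as wide as its widest cell,
// so no row can be positioned until every row has been measured.
final class ColumnWidthSketch {

    // Returns the width of each column: the maximum cell width found in any row.
    static int[] columnWidths(List<int[]> rows, int columnCount) {
        int[] widths = new int[columnCount];
        for (int[] row : rows) {
            for (int c = 0; c < columnCount; c++) {
                widths[c] = Math.max(widths[c], row[c]);
            }
        }
        return widths;
    }

    public static void main(String[] args) {
        // Widths computed from the first two rows only.
        List<int[]> firstRows = Arrays.asList(new int[] {40, 60}, new int[] {45, 55});
        System.out.println(Arrays.toString(columnWidths(firstRows, 2))); // [45, 60]

        // A later row with a wider first cell changes the first column for every earlier row.
        List<int[]> allRows = Arrays.asList(
                new int[] {40, 60}, new int[] {45, 55}, new int[] {90, 50});
        System.out.println(Arrays.toString(columnWidths(allRows, 2))); // [90, 60]
    }
}
```

Had the first two rows already been written out with a 45-unit first column, the third row would invalidate them, which is the situation described for page 350.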


    OK, that makes sense. In our monster case the resulting document is in fact a sequence of snippets, each of which makes up a single page. Is there a way to produce the PDF snippets one after another and concatenate them into a single document in a separate step afterwards? We are not familiar with the structure of PDF (that’s why we love using PD4ML), so don’t laugh if this is an absurd idea 😉


    There are relatively new PD4ML API methods, merge(), intended for exactly that. The PDF parser behind them has limited functionality for the time being and cannot read some third-party PDF documents, but it reads and merges PDFs produced by PD4ML without problems. So your idea should work.


    Hmm… if you can split the huge document into smaller portions, then a PD4ML.render( URL[], … ) or PD4ML.render( StringReader[], … ) method would probably help as well. It renders the HTML documents one by one and deallocates the parsed structures after each portion has been converted to PDF.
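
Splitting the document into portions could look roughly like the sketch below. It only shows the partitioning step and makes no PD4ML calls; the class `SnippetBatcher`, the batch size of 500 and the placeholder HTML are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of PD4ML): groups single-page HTML snippets
// into batches that can be rendered independently of each other.
final class SnippetBatcher {

    // Splits items into consecutive batches of at most batchSize elements, preserving order.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Placeholder snippets standing in for the 15,000 single-page documents.
        List<String> snippets = new ArrayList<>();
        for (int i = 1; i <= 15000; i++) {
            snippets.add("<html><body>Page " + i + "</body></html>");
        }
        List<List<String>> batches = partition(snippets, 500);
        System.out.println(batches.size()); // 30
        // Each batch would then be rendered on its own, so the structures parsed
        // for one batch can be released before the next batch is processed.
    }
}
```

The batch size is a tuning knob: smaller batches lower peak memory per rendering step at the cost of more steps.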

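    The following code should reduce RAM utilization if your document contains bulky images:
    ```java
    // Requires java.util.Map, java.util.HashMap and PD4ML's PD4Constants.
    // Asks PD4ML to cache images in a temporary directory instead of in RAM.
    Map<String, String> m = new HashMap<>();
    m.put(PD4Constants.PD4ML_CACHE_IMAGES_IN_TMP_DIR, "true");
    ```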


    Thanks a lot for your help. For now we have optimized everything around PD4ML far enough that we managed to create the 15,000 pages. I will keep your suggestions in mind in case we run into memory problems again.


The forum ‘General questions / FAQ’ is closed to new topics and replies.