The most up-to-date source code of the PD4ML v4 usage examples can be found on GitHub: https://github.com/zxfr/pd4ml-examples

1.1.Getting Started

The example reads the source document from an HTML string and writes the conversion result to a temporary file.

The conversion relies on the PD4ML default settings: A4 output format, 10 mm margins, etc.

After the conversion is done, the resulting PDF is opened with the default PDF viewer application.

PD4ML pd4ml = new PD4ML();

String html = "TEST<pd4ml:page.break><b>Hello, World!</b>";
ByteArrayInputStream bais =
        new ByteArrayInputStream(html.getBytes());

// read and parse HTML
pd4ml.readHTML(bais);

File pdf = File.createTempFile("result", ".pdf");
FileOutputStream fos = new FileOutputStream(pdf);

// render and write the result as PDF
pd4ml.writePDF(fos);

// alternatively or additionally:
// pd4ml.writeRTF(rtfos, false);
// pd4ml.writeDOCX(docxos);
// BufferedImage[] images = pd4ml.renderAsImages();

// open the just-generated PDF with a default PDF viewer
Desktop.getDesktop().open(pdf);

// pre-v4 (PD4ML v3) API:
PD4ML pd4ml = new PD4ML();
    	
String html = "TEST<pd4ml:page.break><b>Hello, World!</b>";
StringReader bais = new StringReader(html);
     	
File pdf = File.createTempFile("result", ".pdf");
FileOutputStream fos = new FileOutputStream(pdf);
// render and write the result as PDF
pd4ml.render(bais, fos);
    	
// open the just-generated PDF with a default PDF viewer
Desktop.getDesktop().open(pdf);

E001GettingStarted.java

1.2.Set Page Format

Page format and page margin settings are represented by the new com.pd4ml.PageSize and com.pd4ml.PageMargins classes, respectively.

The PageSize class has predefined constants for commonly used paper formats. An arbitrary page format (measured in pt or mm) can also be defined.
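
When defining an arbitrary format, it may help to see how the two units relate. The following is a minimal sketch, independent of PD4ML, that converts millimetres to typographic points (1 inch = 25.4 mm = 72 pt). The class and method names are illustrative only and are not part of the PD4ML API.

```java
final class PageUnitsSketch {

    // 1 inch = 25.4 mm = 72 pt
    static double mmToPt(double mm) {
        return mm * 72.0 / 25.4;
    }

    public static void main(String[] args) {
        // A4 is 210 x 297 mm; print its dimensions in points
        System.out.println("A4 width in pt:  " + mmToPt(210));
        System.out.println("A4 height in pt: " + mmToPt(297));
    }
}
```

A4 comes out at roughly 595 x 842 pt, which is the size usually quoted for A4 in PDF tools.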

Both PageSize and PageMargins settings can be applied to a selected range of pages, identified by the scope attribute. Multiple calls of setPageSize() or setPageMargins() are allowed. If page ranges overlap or conflict, the later call wins.

If the scope attribute is omitted, the setting is applied to all document pages.


// define page format for the first page
pd4ml.setPageSize(PageSize.A5, "1");

// define landscape page format for the second and following pages
pd4ml.setPageSize(PageSize.A4.rotate(), "2+");

// reset page margins for the first two pages
pd4ml.setPageMargins(new PageMargins(0, 0, 0, 0), "1-2");

// set page margins for the third and following (if any) pages
pd4ml.setPageMargins(new PageMargins(0, 0, 0, 0), "3+");

// pre-v4 (PD4ML v3) API:
pd4ml.setPageSize(PD4Constants.A5);

// alternatively, define landscape page format  
// pd4ml.setPageSize(pd4ml.changePageOrientation(PD4Constants.A5));

pd4ml.setPageInsets(new java.awt.Insets(0, 0, 0, 0));

E002SetPageFormat.java
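
The scope semantics described above ("1", "1-2", "2+", later call wins) can be illustrated with a small stand-alone sketch. It is only an interpretation of the scope forms shown in this section, not PD4ML's actual parser, and the real scope grammar may accept more forms. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PageScopeSketch {

    /** Returns true if the 1-based page number is covered by a scope such as "1", "1-2" or "2+". */
    static boolean matches(String scope, int page) {
        if (scope == null || scope.trim().isEmpty()) {
            return true; // an omitted scope applies to all pages
        }
        String s = scope.trim();
        if (s.endsWith("+")) {
            return page >= Integer.parseInt(s.substring(0, s.length() - 1));
        }
        int dash = s.indexOf('-');
        if (dash > 0) {
            int from = Integer.parseInt(s.substring(0, dash));
            int to = Integer.parseInt(s.substring(dash + 1));
            return page >= from && page <= to;
        }
        return page == Integer.parseInt(s);
    }

    /** Each entry is {value, scope}. The list is scanned from the end, so a later call wins. */
    static String resolve(List<String[]> calls, int page) {
        for (int i = calls.size() - 1; i >= 0; i--) {
            String[] call = calls.get(i);
            if (matches(call[1], page)) {
                return call[0];
            }
        }
        return null; // no setting covers this page
    }

    public static void main(String[] args) {
        List<String[]> calls = new ArrayList<>();
        calls.add(new String[] {"A5", "1"});
        calls.add(new String[] {"A4 landscape", "2+"});
        for (int page = 1; page <= 3; page++) {
            System.out.println("page " + page + ": " + resolve(calls, page));
        }
    }
}
```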

1.5.Set Page Background

The setPageBackground() API call defines the background layout of the target media. The layout is defined in HTML: in the simplest case it can be just a scanned form image (e.g. <img width="100%" height="100%" src="form.jpg">), or it can be more sophisticated HTML/CSS/SVG code.

The HTML code may include placeholders: $[page] is substituted with the current page number, $[total] with the total number of pages, and $[title] with the document title as defined in the <title> HTML tag or overridden with the setDocumentTitle() API call.

The background is rendered over the entire target media area, ignoring any specified margins.

The optional scope parameter applies the background to a specified range of pages.


// define page background for the first page
pd4ml.setPageBackground("<div style='width: 100%; height: 100%; background-color: rgb(228,255,228);'></div>", "1");

// define page background for the second and following pages
pd4ml.setPageBackground("<div style='width: 100%; height: 100%; background-color: rgb(255,228,228);'></div>", "2+");

// pre-v4 (PD4ML v3) API:
PD4PageMark pageDecoration = new PD4PageMark() {
	@Override
    public Color getPageBackgroundColor(int pageNumber) {  
        if ( pageNumber == 1 ) {  
            return new Color(228,255,228);  
        } else {  
            return new Color(255,228,228);  
        }  
    }  
};

// assign page footer (in this case it only specifies the background)
pd4ml.setPageFooter(pageDecoration);

E005SetPageBackground.java

1.6.Set Page Background Inline

Use of the proprietary <pd4ml:page.background> HTML tag as an alternative to defining the background via the API.


<html>
<head>
<title>Page background example</title>
<style>BODY {font-family: Arial}</style>
</head>
<body>
<pd4ml:page.background>
<div style='width: 100%; height: 100%; background-color: rgb(228,255,228);'></div>
</pd4ml:page.background>
First Page

<pd4ml:page.break>
<!--
// override the previously defined background with a new one starting from the current page.
// The same can be achieved by placing the page background definition at the top of the
// document and setting the scope='2+' attribute.
//
// Note: in this case the style is applied directly to the <pd4ml:page.background> tag
-->
<pd4ml:page.background style='width: 100%; height: 100%; background-color: rgb(255,228,228);'></pd4ml:page.background>
Second Page
</body>
</html>

E006SetPageBackgroundInline.java

1.7.Set Page Watermark

PD4ML provides an easy way to utilize native PDF watermarking. PDF watermarks can be configured to only be visible in screen viewers, in printed output or both.

As usual in PD4ML, the watermark layout is defined with HTML/CSS/SVG code (placeholders such as $[page] or $[total] are not supported). The watermark position, opacity, angle, scale, and the page range it applies to can be controlled.

See the setWatermark() API documentation.

// define watermark for the first page
pd4ml.setWatermark("<b>WATERMARK</b>",
20, // offset X
0, // offset Y
.3f, // opacity
30, // angle
9, // scale (1 = 100%)
true, // should the watermark be visible in PDF viewers?
true, // should the watermark be printed?
"1"); // page range to apply

// define watermark for the second and following pages
pd4ml.setWatermark("<b style='color: tomato'>WATERMARK</b>", 20, 0, .3f, 30, 9, true, true, "2+");

// pre-v4 (PD4ML v3) API:
PD4ML pd4ml = new PD4ML();

PD4PageMark pageDecoration = new PD4PageMark() {
	@Override
    public String getWatermarkUrl(int pageNumber) {  
        if ( pageNumber == 1 ) {  
            return "https://pd4ml.com/i/logo.png";  
        } else {  
            return "https://pd4ml.com/i/logo.gif";  
        }  
    }  
	
	@Override
	public int getWatermarkOpacity() {
		// image opacity in range from 0 to 100
		return 30;
	}

	@Override
	public Rectangle getWatermarkBounds() {
		return new Rectangle(10, 10, 200, 200);
	}

	public String getWatermarkUrl() {
		// as getWatermarkUrl(int pageNumber) is already defined, this only returns a dummy value
		// to let PD4ML process watermarks
		return "defined";
	}
};

// assign page footer (in this case it only specifies the watermarks)
pd4ml.setPageFooter(pageDecoration);

E007SetPageWatermark.java

1.8.Set Page Watermark Inline

Use of the proprietary <pd4ml:watermark> HTML tag as an alternative to defining the watermark via the API.


<html>
<head>
<title>Watermarking example</title>
<style>BODY {font-family: Arial}</style>
</head>
<body>

<pd4ml:watermark style="opacity: 30%; left: 20px; top: 0; scale: 900%; angle: 30deg; media: screen, print;" scope="1">
<b>WATERMARK</b>
</pd4ml:watermark>

<pd4ml:watermark style="opacity: 30%; left: 20px; top: 0; scale: 900%; angle: 30deg; media: screen, print;" scope="2+">
<b style='color: tomato'>WATERMARK</b>
</pd4ml:watermark>

First Page

<pd4ml:page.break>

Second Page

</body>
</html>

E008SetPageWatermarkInline.java

1.9.Set Document Password

The setPermissions() method applies the standard PDF security options: it defines a document password or restricts particular document actions (such as hi-res printing).

See the list of applicable permission flags (Allow* and DefaultPermissions).

It is possible to define a positive list of permissions (no password defined):

pd4ml.setPermissions(null, Constants.AllowAnnotate | Constants.AllowDegradedPrint);
// pre-v4 (PD4ML v3) API:
pd4ml.setPermissions("empty", PD4Constants.AllowAnnotate | PD4Constants.AllowDegradedPrint, true);

or to disable only selected ones (no password defined):

pd4ml.setPermissions(null, Constants.DefaultPermissions ^ Constants.AllowModify);
// pre-v4 (PD4ML v3) API:
pd4ml.setPermissions("empty", PD4Constants.DefaultPermissions ^ PD4Constants.AllowModify, true);

With password:

// protect the document with "test" password. No permission restrictions applied
pd4ml.setPermissions("test", Constants.DefaultPermissions);
// pre-v4 (PD4ML v3) API: protect the document with "test" password. No permission restrictions applied
pd4ml.setPermissions("test", PD4Constants.DefaultPermissions, true);

E009SetDocumentPassword.java
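
The flag arithmetic used in the calls above (combining flags with | and removing one with ^) can be tried out in isolation. The sketch below uses hypothetical bit values purely for illustration; the real values are defined by the PD4ML constants and are not reproduced here. It also shows that ^ removes a flag only if that flag is present in the base set, whereas & ~ clears it unconditionally.

```java
final class PermissionFlagsSketch {

    // Hypothetical bit values, for illustration only. The real values are defined by PD4ML.
    static final int ALLOW_PRINT = 1;
    static final int ALLOW_MODIFY = 1 << 1;
    static final int ALLOW_COPY = 1 << 2;
    static final int ALLOW_ANNOTATE = 1 << 3;
    static final int DEFAULT_PERMISSIONS =
            ALLOW_PRINT | ALLOW_MODIFY | ALLOW_COPY | ALLOW_ANNOTATE;

    static boolean isAllowed(int permissions, int flag) {
        return (permissions & flag) == flag;
    }

    /** Positive list: start from nothing and OR in every allowed action. */
    static int allowOnly(int... flags) {
        int result = 0;
        for (int flag : flags) {
            result |= flag;
        }
        return result;
    }

    /** Negative list: clear one flag regardless of whether it was set. */
    static int without(int base, int flag) {
        return base & ~flag;
    }

    public static void main(String[] args) {
        int positive = allowOnly(ALLOW_ANNOTATE, ALLOW_PRINT);
        int negative = DEFAULT_PERMISSIONS ^ ALLOW_MODIFY; // works because the flag is set in the base
        System.out.println("positive list allows modify: " + isAllowed(positive, ALLOW_MODIFY));
        System.out.println("negative list allows modify: " + isAllowed(negative, ALLOW_MODIFY));
        System.out.println("negative list allows copy:   " + isAllowed(negative, ALLOW_COPY));
    }
}
```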

1.10.Inject Html

With the PD4ML API it is possible to inject an arbitrary HTML fragment either just after the opening <body> tag or right before the closing </body> tag of a source HTML document.

Be careful: an injected HTML fragment can easily corrupt the original document layout. As an extreme case, if you insert the beginning of an HTML comment with the pd4ml.injectHtml("<!--", true); API call, you get a blank PDF document.


// insert some content just after the opening <body> tag:
pd4ml.injectHtml("Some new content to the top of the document", true);

// insert some content before the closing </body> tag:
pd4ml.injectHtml("<p style='color: tomato'>Content to append", false);

E010InjectHtml.java
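
To make the two injection points concrete, here is a stand-alone sketch of what inserting a fragment after the opening <body> tag or before the closing </body> tag amounts to on the string level. It is not how PD4ML implements injectHtml() (that is not documented here); the InjectHtmlSketch class and its inject() method are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InjectHtmlSketch {

    private static final Pattern BODY_OPEN =
            Pattern.compile("<body\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern BODY_CLOSE =
            Pattern.compile("</body\\s*>", Pattern.CASE_INSENSITIVE);

    /** Inserts the fragment after the opening body tag (atTop) or before the last closing body tag. */
    static String inject(String html, String fragment, boolean atTop) {
        if (atTop) {
            Matcher open = BODY_OPEN.matcher(html);
            if (open.find()) {
                return html.substring(0, open.end()) + fragment + html.substring(open.end());
            }
            return fragment + html; // no <body> tag found: prepend
        }
        Matcher close = BODY_CLOSE.matcher(html);
        int last = -1;
        while (close.find()) {
            last = close.start();
        }
        if (last >= 0) {
            return html.substring(0, last) + fragment + html.substring(last);
        }
        return html + fragment; // no </body> tag found: append
    }

    public static void main(String[] args) {
        String html = "<html><body>Original</body></html>";
        System.out.println(inject(html, "<p>Top</p>", true));
        System.out.println(inject(html, "<p>Bottom</p>", false));
        // an unterminated comment at the top hides everything that follows it
        System.out.println(inject(html, "<!--", true));
    }
}
```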

2.1.Add Style Programmatically

The addStyle() API call applies an extra stylesheet to the source document. The stylesheet can be specified as a style string or as a reference to an external resource.

Multiple invocations of the method are possible. The method takes effect only if called before readHTML().


pd4ml.setHtmlWidth(900); // render HTML in a virtual frame 900px wide
pd4ml.addStyle(
// specify the TTF font file for the "Consolas" font face (only the "plain" style in this case).

// Here we use the free FiraMono-Regular instead of the original Consolas.
// Other font faces are mapped to the standard built-in fonts of the PDF viewer.

// If you see '?' symbols instead of some character glyphs in the resulting PDF,
// the missing glyphs are not defined by any of the available fonts.

// As a workaround, create a font dir, place a set of fonts there that covers the
// desired language or character range, index the fonts and refer to the dir
// with the pd4ml.useTTF() API call. Optionally the font dir can be packed into
// a fonts.jar
"@font-face {\n" +
" font-family: \"Consolas\";\n" +
" src: url(\"java:/html/rc/FiraMono-Regular.ttf\") format(\"ttf\");\n" +
"}\n", false);

// read and parse HTML
pd4ml.readHTML(new URL("java:/html/H001.htm"));

pd4ml.addStyle(
"H3 { page-break-before: always; }\n" +
"H3:first-of-type { page-break-before: auto; }", true);

H001ConvertHtml.java

2.2.Add TOC

The proprietary <pd4ml:toc> tag is replaced with a table of contents auto-generated from the <H1>-<H6> heading hierarchy.

The generated TOC is an HTML table whose appearance can be customized with CSS.

The example illustrates how to inject a table of contents at the top of a document from the Java API.


pd4ml.injectHtml("<pd4ml:toc>", true);

// forces PD4ML to process the <pd4ml:toc> tag as if it were in the source HTML
// just after the opening <body> tag.

...
<body>
    <pd4ml:toc>
    <hr>
    <h1>Pages</h1>
    <h2>First Page</h2>
    <pd4ml:page.break>
    <h2>Second Page</h2>
...

The attribute pd4toc="nopagenum", added to <H1>-<H6> tags, suppresses page number generation for the marked TOC entries.

H002AddTOC.java

2.3.Page Number Tag

By default the <pd4ml:page.number> tag is replaced with the current page number. The optional OF attribute refers to an HTML element with a matching ID attribute value; in that case the tag is replaced with the number of the page where the referenced element is located.


<html>
<body>
Current page: <pd4ml:page.number><br>
<a href="#continue1"><b>Section 1</b></a> on page <pd4ml:page.number of="continue1"><br>
<a href="#continue2"><b>Section 2</b> on page <pd4ml:page.number of="continue2"></a><br>
<pd4ml:page.break>
<a name="continue1">Section 1</a>
<pd4ml:page.break>
<div id="continue2">Section 2</div>
</body>
</html>

2.4.Create Bookmarks

PD4ML supports three methods of generating bookmarks (aka PDF outlines):

  1. From <H1>-<H6> headings hierarchy
  2. From named anchors <a name="chapter1">Chapter 1</a>
  3. From a structure of <pd4ml:bookmark> tags

The API call illustrates the first method.


pd4ml.generateBookmarksFromHeadings(true);


// pre-v4 (PD4ML v3) API:
pd4ml.generateOutlines(true); // true = from headings, false = from named anchors

See also generateBookmarksFromAnchors()

Bookmarks defined with <pd4ml:bookmark> are included in the bookmark structure regardless of whether it is generated with method one or two.

H003CreateBookmarks.java

2.5.Apply Page Breaks

To force a page break you may use either the standard CSS method:


pd4ml.addStyle(
"H3 { page-break-before: always; }\n" +
"H3:first-of-type { page-break-before: auto; }", true);

or PD4ML’s proprietary <pd4ml:page.break> tag.

In PD4ML versions prior to v4, <pd4ml:page.break> supports some useful features relevant to PDF output: rotating the page, changing the HTML-to-PDF scale factor, etc. The page break can also be conditional. These features are going to be ported to v4 in forthcoming releases.

H004ApplyPageBreaks.java

2.6.Add Attachment

The <pd4ml:attachment> tag makes it possible to include an arbitrary document or binary file in the resulting PDF as an attachment. The resource to attach is referenced by the SRC attribute.

The <pd4ml:attachment> tag can be placed at any reasonable location in a document. In the PDF the tag is replaced with a clickable icon, which opens the attachment with the default viewer application for the attached file type.

The following icon options are available:

  • graph
  • paperclip
  • pushpin
  • area

where area is a special “invisible” icon that only turns the neighboring region into a clickable area. The region dimensions are specified with the WIDTH and HEIGHT attributes.

The example shows how to add an attachment to the top part of the document with an API call.


// with the below code we embed the document source as an attachment to the resulting PDF
// The attachment icon will appear on the top (right side) of the document layout
pd4ml.injectHtml("<div style=\"text-align: right; width: 100%\">"
+ "<pd4ml:attachment style=\"align: right\" type=\"paperclip\" src=\"H001.htm\"/>"
+ "</div>", true);

<div style="text-align: right; width: 100%">
<pd4ml:attachment description="desc" style="align: right" type="paperclip" src="src/html/H001.htm"/>
</div>

H005AddAttachment.java

2.7.Footnotes / Endnotes

The <pd4ml:footnote> tag forces PD4ML to move its nested content to the bottom part of the current page and to print an auto-incremented footnote index in its place. If the footnotes area takes too much space, the footnotes that do not fit are moved to the subsequent page(s).

The presence of the noref attribute suppresses the footnote index.

<pd4ml:footnote.caption> specifies a delimiter between the main document content and the footnotes area.

<pd4ml:endnote> and <pd4ml:endnote.caption> act identically, except that the endnote content is moved to the end of the document rather than to the bottom part of a page.


<pd4ml:footnote.caption>
Footnotes
<hr>
</pd4ml:footnote.caption>

<pd4ml:footnote noref>This footnote has no reference from the main text</pd4ml:footnote>

A note is a string of text placed at the bottom of a page in a book or document or at the end of a chapter,
volume or the whole text<pd4ml:footnote>In some editions of the Bible, notes are placed in a narrow column
in the middle of each page between two columns of biblical text.</pd4ml:footnote>.
The note can provide an author's comments on the main text or citations of a
reference work in support of the text, or both.
<p>
Footnotes are notes at the foot of the page while endnotes<pd4ml:footnote>Unlike footnotes, endnotes have the advantage of not
affecting the layout of the main text, but may cause inconvenience to readers who have to move back
and forth between the main text and the endnotes.</pd4ml:footnote> are collected under a separate heading at
the end of a chapter, volume, or entire work.
<p>

H006.htm

3.1.Embedding TTF Fonts

To support non-Latin charsets, all referenced TTF fonts need to be subset (unused glyphs removed) and embedded into the resulting PDF. To do that, PD4ML needs direct access to the TTF font files, as a java.awt.Font object unfortunately provides no way to read the font file bytes.

So, in order to work with non-Latin charsets, PD4ML needs to be told where it can find the font files and which ones can be used.

The useTTF() API methods tell PD4ML the location of a font directory or of a font folder in a resource JAR. Multiple invocations of useTTF() are allowed.

In a font directory PD4ML expects to find a pd4fonts.properties index file that maps font face names to font file names. If the file is not there, auto-indexing can be enabled.


public final static String FONTS_DIR = "c:/windows/fonts";

...

PD4ML pd4ml = new PD4ML();
pd4ml.useTTF(FONTS_DIR, true); // The second parameter forces to index fonts in FONTS_DIR.
// As the indexing of a font directory with a big number of fonts is time/resource consuming,
// it is a good idea to prepare the font mapping file in advance.
// See the next example for how to index.

On Windows, a typical font directory location is c:/windows/fonts, but it is write-protected, so storing pd4fonts.properties there is not recommended.

If you want to use fonts from there, you may rely on auto-index, but limit the scope of indexed fonts with a pattern for a better performance.

N001TtfFonts.java
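
The idea of an index file that maps font face names to font file names can be illustrated with the standard java.util.Properties class. The sketch below is independent of PD4ML; the entries and the exact layout of a real pd4fonts.properties file are assumptions made for illustration, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class FontIndexSketch {

    /** Parses index text in java.util.Properties syntax (face name = file name). */
    static Properties parse(String text) {
        Properties index = new Properties();
        try {
            index.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return index;
    }

    /** Returns the font file name registered for a face name, or null if the face is not indexed. */
    static String fileFor(Properties index, String faceName) {
        return index.getProperty(faceName);
    }

    public static void main(String[] args) {
        // Assumed sample entries; a space in a key has to be escaped in Properties syntax
        String text = "Fira\\ Mono=FiraMono-Regular.ttf\n"
                    + "Arial=arial.ttf\n";
        Properties index = parse(text);
        System.out.println(fileFor(index, "Fira Mono"));
        System.out.println(fileFor(index, "Unknown Face"));
    }
}
```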

3.2.Preparing TTF Fonts

Generating pd4fonts.properties from a Java application:


// Index available fonts. As indexing is time- and resource-consuming,
// it is a good idea to prepare the font mapping file in advance.
File index = File.createTempFile("pd4fonts", ".properties");
index.deleteOnExit();
FontCache.generateFontPropertiesFile(FONTS_DIR, index.getAbsolutePath(), (short)0);

System.out.println("font indexing is done.");
// The same can be done with a command line call:
// java -jar pd4ml.jar -configure.fonts <font.dir> [index.file.location]

...

pd4ml.useTTF(index.getAbsolutePath());

...
// pre-v4 (PD4ML v3) API:
File index = File.createTempFile("pd4fonts", ".properties");
index.deleteOnExit();
PD4Util.generateFontPropertiesFile(FONTS_DIR, index.getAbsolutePath());

// The same can be done with a command line call: 
// java -jar pd4ml.jar -configure.fonts <font.dir> [index.file.location] 

PD4ML pd4ml = new PD4ML();
pd4ml.useTTF(index.getAbsolutePath(), true);
...

The same can be achieved with a command line call (JVM options such as -Xmx must precede -jar):
java -Xmx512m -jar pd4ml.jar -configure.fonts c:/windows/fonts d:/write/enabled/dir/pd4fonts.properties

After the font dir is indexed and the index file is stored in d:/write/enabled/dir/, you may refer to the fonts with the API call:
pd4ml.useTTF("d:/write/enabled/dir/", false);

N002TtfFonts.java

4.2.Silent Print

The code forces the PDF viewer to start printing to the default printer as soon as the document is opened. Modern PDF viewers normally ask for confirmation in this case, so a truly “silent print” is fortunately not possible.

pd4ml.addDocumentActionHandler("silentprint", null);
// similarly:
// pd4ml.addDocumentActionHandler("OpenAction", "this.print({bUI: false, bSilent: true});");

// pre-v4 (PD4ML v3) API:
pd4ml.addDocumentActionHandler("OpenAction", "this.print({bUI: false, bSilent: true});");

A002SilentPrint.java

4.3.Read Resources From Classpath

To address resources via the Java class loader, PD4ML provides support for a non-standard “java:” protocol.

// If you need to handle "java:" URLs in your application, run once the following code
// e.g. in "static { }" section

URL.setURLStreamHandlerFactory(new URLStreamHandlerFactory() {
    public URLStreamHandler createURLStreamHandler(String protocol) {
        return "java".equals(protocol) ? new URLStreamHandler() {
            protected URLConnection openConnection(URL url) throws IOException {
                return new URLConnection(url) {
                    public void connect() throws IOException {
                    }
                };
            }
        } : null;
    }
});
// read and parse HTML
pd4ml.readHTML(new URL("java:/advanced/A003.htm"));

// pre-v4 (PD4ML v3) API: register the same URLStreamHandlerFactory as above, then
File pdf = File.createTempFile("result", ".pdf");
FileOutputStream fos = new FileOutputStream(pdf);
// render and write the result as PDF
pd4ml.render(new URL("java:/advanced/A003.htm"), fos);

PD4ML performs the URL.setURLStreamHandlerFactory() call implicitly when a PD4ML object is instantiated, in order to avoid java.net.MalformedURLException: unknown protocol. Do the same in your application if you need to deal with “java:” URLs yourself.

A003ReadHtmlFromClasspath.java
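
What addressing a resource via the class loader means can be shown without PD4ML. The sketch below maps a "java:" URL string to a class loader resource name and opens it with the standard ClassLoader.getResourceAsStream() call. How PD4ML resolves such URLs internally is not documented here, so the mapping rule (strip the prefix and leading slashes) is an assumption, and the class and method names are hypothetical.

```java
import java.io.InputStream;

final class ClasspathUrlSketch {

    static final String PREFIX = "java:";

    /** Turns "java:/advanced/A003.htm" into the class loader resource name "advanced/A003.htm". */
    static String toResourceName(String url) {
        if (url == null || !url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a java: URL: " + url);
        }
        String path = url.substring(PREFIX.length());
        while (path.startsWith("/")) {
            path = path.substring(1); // class loader resource names have no leading slash
        }
        return path;
    }

    /** Opens the resource via the class loader; returns null if it is not on the classpath. */
    static InputStream open(String url) {
        ClassLoader loader = ClasspathUrlSketch.class.getClassLoader();
        return loader.getResourceAsStream(toResourceName(url));
    }

    public static void main(String[] args) {
        String url = "java:/advanced/A003.htm";
        System.out.println("resource name: " + toResourceName(url));
        System.out.println("found on classpath: " + (open(url) != null));
    }
}
```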

4.4.Add Progress Listener

HTML conversion of big documents may take a while. If you use PD4ML in a GUI application, you will probably want to show a progress bar that informs the user about the conversion state instead of just showing an empty page.

PD4ML provides a callback API for that.

In the example all progress events are just dumped to STDOUT. It is up to you how to use the progress data in your application for a better user experience.

public static class ProgressMeter implements ProgressListener {

    private long startTime = -1;

    /**
     * Callback method triggered by a progress event. The implementation dumps the events to STDOUT.
     * Alternatively it could control a GUI progress bar etc.
     */
    public void progressUpdate(int messageID, int progress, String note, long msec) {

        if ( startTime < 0 ) {
            startTime = msec;
        }

        String tick = String.format( "%7d", msec - startTime );
        String progressString = String.format( "%3d", progress );

        String step = "";
        switch ( messageID ) {
            case CONVERSION_BEGIN:
                step = "conversion begin";
                break;
            case MAIN_DOC_READ:
                step = "doc read";
                break;
            case HTML_PARSED:
                step = "html parsed";
                break;
            case RENDERER_TREE_BUILT:
                step = "document tree structure built";
                break;
            case HTML_LAYOUT_IN_PROGRESS:
                step = "layouting...";
                break;
            case HTML_LAYOUT_DONE:
                step = "layout done";
                break;
            case PAGEBREAKS_ALIGNED:
                step = "pagebreaks aligned";
                break;
            case TOC_GENERATED:
                step = "TOC generated";
                break;
            case DOC_RENDER_IN_PROGRESS:
                step = "generating doc page";
                break;
            case RTF_PRE_RENDER_DONE:
                step = "RTF pre-render done";
                break;
            case DOC_WRITE_BEGIN:
                step = "writing doc...";
                break;
            case CONVERSION_END:
                step = "done.";
                break;
        }
        System.out.println( tick + " " + progressString + " " + step + " " + note );
    }
}

...

pd4ml.monitorProgressWith(new ProgressMeter());

// pre-v4 (PD4ML v3) API:
public static class ProgressMeter implements PD4ProgressListener {

	/**
	 * callback method triggered by progress event. The implementation dumps the events to STDOUT.
	 * Alternatively it could control GUI progress bar etc. 
	 */
	public void progressUpdate(int messageID, int progress, String note, long msec) {
		
		String tick = String.format( "%7d", msec );
		String progressString = String.format( "%3d", progress );
 
		String step = "";
		switch ( messageID ) {
			case CONVERSION_BEGIN:
				step = "conversion begin";
				break;
			case HTML_PARSED:
				step = "html parsed";
				break;
			case DOC_TREE_BUILT:
				step = "document tree structure built";
				break;
			case HTML_LAYOUT_IN_PROGRESS:
				step = "layouting...";
				break;
			case HTML_LAYOUT_DONE:
				step = "layout done";
				break;
			case TOC_GENERATED:
				step = "TOC generated";
				break;
			case DOC_OUTPUT_IN_PROGRESS:
				step = "generating PDF...";
				break;
			case NEW_SRC_DOC_BEGIN:
				step = "proceed to new source document";
				break;
			case CONVERSION_END:
				step = "done.";
				break;
		}
			
		System.out.println( tick + " " + progressString + " " + step + " " + note );
	}
}
...
pd4ml.monitorProgress(new ProgressMeter());

A004AddProgressListener.java

4.5.Add Custom Resource Loader

If some HTML resources like images or stylesheets are not accessible with the standard methods (file read, HTTP(S), etc), you may define your own resource reading “driver”.

First, define a resource addressing syntax that matches your needs. For example <img src="database:table=pictures;id=4711">

Second, implement a resource loader that knows what to do with the “database:table=pictures;id=4711” URL.

The loader has to be derived from the com.pd4ml.ResourceProvider class and implement two methods: boolean canLoad(String resource, FileCache cache), which tests whether it can read the URL, and BufferedInputStream getResourceAsStream(String resource, FileCache cache), which actually reads the resource bytes.

public class DummyProvider extends ResourceProvider {

    public final static String PROTOCOL = "dummy";

    @Override
    public BufferedInputStream getResourceAsStream(String resource, FileCache cache) throws IOException {
        if (!resource.toLowerCase().startsWith(PROTOCOL)) {
            return null;
        }

        // interpret the "resource" parameter according to your protocol (e.g. as a key to a database record etc)

        // in the example we simply dump the resource parameter string
        String buf = "[" + resource.substring(PROTOCOL.length()+1) + "]";
        ByteArrayInputStream bais = new ByteArrayInputStream(buf.getBytes());
        return new BufferedInputStream(bais);
    }

    @Override
    public boolean canLoad(String resource, FileCache cache) {
        if (resource.toLowerCase().startsWith(PROTOCOL)) {
            return true;
        }
        return false;
    }
}
...
pd4ml.addCustomResourceProvider("advanced.DummyProvider");

// pre-v4 (PD4ML v3) API:
public class DummyProvider extends ResourceProvider {

    public final static String PROTOCOL = "dummy";

    @Override
    public byte[] getResourceAsBytes(String resource, boolean debugOn) throws IOException {  

        if (!resource.toLowerCase().startsWith(PROTOCOL)) {
            return null;
        }
		
        // interpret the "resource" parameter according to your protocol (e.g. as a key to a database record etc)

        // in the example we map the resource name to a local file and return its bytes
        resource = "file:src/html/rc/" + resource.substring(PROTOCOL.length()+1);

        URL src = new URL(resource);
        URLConnection urlConnect = src.openConnection();
        try {
            urlConnect.connect();
        } catch (Throwable e) {
            return new byte[0];
        }

        ByteArrayOutputStream fos = new ByteArrayOutputStream();
        byte[] buffer = new byte[2048];

        // try-with-resources closes the stream even if reading fails
        try (InputStream is = new BufferedInputStream(urlConnect.getInputStream())) {
            int read;
            while ((read = is.read(buffer, 0, buffer.length)) > -1) {
                fos.write(buffer, 0, read);
            }
        }

        return fos.toByteArray();
    }
}

PD4ML pd4ml = new PD4ML();

HashMap<String, String> map = new HashMap<>();
map.put( PD4Constants.PD4ML_EXTRA_RESOURCE_LOADERS, "advanced.DummyProvider" );  
pd4ml.setDynamicParams(map);  

String html = "<img src=\"dummy:w3c.svg\">";
StringReader bais = new StringReader(html);

File pdf = File.createTempFile("result", ".pdf");
FileOutputStream fos = new FileOutputStream(pdf);
// render and write the result as PDF
pd4ml.render(bais, fos);

A005AddCustomResourceLoader.java
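
A loader for the “database:table=pictures;id=4711” addressing syntax mentioned above first has to split the URL into its parameters. The following stand-alone sketch shows one way to do that with plain string handling. The key=value;key=value layout is taken from the example URL, everything else (class name, method names, error handling) is hypothetical, and no database access is attempted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ResourceKeySketch {

    static final String PROTOCOL = "database:";

    /** Mirrors the canLoad() idea: does the resource string use our addressing syntax? */
    static boolean canParse(String resource) {
        return resource != null && resource.toLowerCase().startsWith(PROTOCOL);
    }

    /** Splits "database:table=pictures;id=4711" into {table=pictures, id=4711}. */
    static Map<String, String> parse(String resource) {
        if (!canParse(resource)) {
            throw new IllegalArgumentException("unsupported resource: " + resource);
        }
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : resource.substring(PROTOCOL.length()).split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // prints {table=pictures, id=4711}
        System.out.println(parse("database:table=pictures;id=4711"));
    }
}
```

The parsed map could then be used inside getResourceAsStream() to look up the record and return its bytes.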

4.6.Add Image Resampling Resource Loader

The PDF file format allows you to embed JPEG images (and some kinds of PNG) “as is”. Other types of images are represented in PDF as native PDF images: a ZIP-compressed stream of pixel color codes, which is even less size-efficient than GIF. One tactic to decrease the resulting PDF file size (at the cost of image detail) is to resample the source document images to reduce the image dimensions and/or to convert them to JPEG.

The example does not allow images to exceed a maximum size of 800x400px. Images that do not meet this limit are scaled down and converted to JPEG.

The conversion is applied to images that are obtained via HTTPS and whose file names end with “.png”.

The example requests HTTPS transport from the PD4ML class com.pd4ml.cache.SslResourceProvider14. Alternatively it can be com.pd4ml.cache.WebResourceProvider or com.pd4ml.cache.FileResourceProvider, or a conditional combination of all the resource providers.

You can adjust the scaling logic according to your needs. If you are not satisfied with the sometimes too obvious JPEG artifacts, omit the "JPEG" conversion and return the resulting image converted to "PNG".


public class ConvertedImageProvider extends ResourceProvider {

	private int maxwidth = 800;
	private int maxheight = 400;
	
	@Override
	public BufferedInputStream getResourceAsStream(String resource, 
                                   FileCache cache) throws IOException {
		if (!resource.toLowerCase().startsWith("https:") || 
			!resource.toLowerCase().endsWith(".png")) {
			return null;
		}
		
		// request the standard HTTPS resource loader to load the image bytes
		com.pd4ml.cache.SslResourceProvider14 provider = 
											new com.pd4ml.cache.SslResourceProvider14();
		byte[] img = provider.getResourceAsBytes(resource, cache);
		
		ByteArrayOutputStream baos = new ByteArrayOutputStream();
	
		BufferedImage image = ImageIO.read(new ByteArrayInputStream(img));
		// ImageIO.read() returns null if no registered reader can decode the bytes
		int width = image == null ? 0 : image.getWidth();
		int height = image == null ? 0 : image.getHeight();
		
		// cannot get image dimensions or image does not exceed the size limit:
		// return the original bytes unchanged
		if (width <= 0 || height <= 0 || (width <= maxwidth && height <= maxheight)) {
			ByteArrayInputStream bais = new ByteArrayInputStream(img);
			return new BufferedInputStream(bais);
		}
	
		double scale = Math.min((double)maxwidth/width, (double)maxheight/height);

		final BufferedImage convertedImage = new BufferedImage((int)(width * scale), 
						(int)(height * scale), BufferedImage.TYPE_INT_RGB);
		
		// scale the source image if requested
		if (scale != 1) {
			AffineTransform scaleTransform = 
						AffineTransform.getScaleInstance(scale, scale);
			AffineTransformOp bilinearScaleOp = new AffineTransformOp(scaleTransform, 
						AffineTransformOp.TYPE_BILINEAR);
			image = bilinearScaleOp.filter(image, new BufferedImage((int)(width * scale),
						(int)(height * scale), image.getType()));
		}
			
		convertedImage.createGraphics().drawImage(image, 0, 0, Color.WHITE, null);
		ImageIO.write(convertedImage, "JPEG", baos);

		ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
		return new BufferedInputStream(bais);
	}

	@Override
	public boolean canLoad(String resource, FileCache cache) {
		if (resource.toLowerCase().startsWith("https:") && 
		    resource.toLowerCase().endsWith(".png")) {
			return true;
		}
		
		return false;
	}
}
...
pd4ml.addCustomResourceProvider("advanced.ConvertedImageProvider");

ConvertedImageProvider.java
A005AddImageConvertingResourceLoader.java
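
The scaling and re-encoding step of the provider above can be tried in isolation, without PD4ML on the classpath. The following is only a minimal sketch using the standard Java image APIs: the class name PngToJpegSketch and its helper methods are made up for this illustration and are not part of PD4ML. It assumes that an image which already fits the limits (or cannot be decoded) should be passed through unchanged.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper class, not part of the PD4ML API.
final class PngToJpegSketch {

    // Returns the input bytes untouched if the image cannot be decoded or already
    // fits into maxWidth x maxHeight; otherwise returns a downscaled JPEG.
    static byte[] shrinkToJpeg(byte[] png, int maxWidth, int maxHeight) {
        try {
            BufferedImage source = ImageIO.read(new ByteArrayInputStream(png));
            if (source == null
                    || (source.getWidth() <= maxWidth && source.getHeight() <= maxHeight)) {
                return png;
            }
            // one common factor keeps the aspect ratio
            double scale = Math.min((double) maxWidth / source.getWidth(),
                                    (double) maxHeight / source.getHeight());
            int w = Math.max(1, (int) Math.round(source.getWidth() * scale));
            int h = Math.max(1, (int) Math.round(source.getHeight() * scale));

            // JPEG has no alpha channel: paint onto an opaque white RGB canvas
            BufferedImage target = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
            Graphics2D g = target.createGraphics();
            try {
                g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                                   RenderingHints.VALUE_INTERPOLATION_BILINEAR);
                g.setColor(Color.WHITE);
                g.fillRect(0, 0, w, h);
                g.drawImage(source, 0, 0, w, h, null);
            } finally {
                g.dispose();
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            if (!ImageIO.write(target, "jpg", out)) {
                throw new IOException("no JPEG writer available");
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates an empty (fully transparent) PNG of the given size as test input.
    static byte[] blankPng(int width, int height) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB),
                          "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes the image bytes and reports the dimensions as "WIDTHxHEIGHT".
    static String sizeOf(byte[] imageBytes) {
        try {
            BufferedImage img = ImageIO.read(new ByteArrayInputStream(imageBytes));
            return img.getWidth() + "x" + img.getHeight();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] jpeg = shrinkToJpeg(blankPng(400, 200), 100, 100);
        System.out.println(sizeOf(jpeg)); // prints 100x50
    }
}
```

Painting onto a white TYPE_INT_RGB canvas mirrors what the provider does with Color.WHITE: transparent PNG areas would otherwise come out black in the JPEG.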

4.7.Substitute Placeholders

Placeholders are a simple way to add dynamic content to your static HTML templates.

Add placeholders such as $[var1] or $[my.variable] to your HTML.

During conversion, supply the dynamic content for the placeholders this way:

HashMap<String, String> map = new HashMap<>();
map.put("var1", "value 1");
map.put("var2", "[value 2]");
map.put("var3", "* value 3 *");
map.put("my.variable", "Dynamically inserted text");
pd4ml.setDynamicData(map);
HashMap<String, String> map = new HashMap<>();
map.put("var1", "value 1");
map.put("var2", "[value 2]");
map.put("var3", "* value 3 *");
map.put("my.variable", "Dynamically inserted text");
pd4ml.setDynamicParams(map);

$[page], $[total] and $[title] placeholders are reserved.

A006SubstitutePlaceholders.java
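
To make the $[name] convention concrete, here is a stand-alone sketch of such a substitution done with a regular expression. It does not show how PD4ML implements it internally; the class PlaceholderSketch is hypothetical, and leaving unknown names (including the reserved ones, when no value is supplied) untouched is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not the PD4ML implementation.
final class PlaceholderSketch {

    // matches $[name], where name is any run of characters except ']'
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\[([^\\]]+)\\]");

    static String substitute(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // assumption: placeholders without a value stay in the text as they are
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in values from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("var1", "value 1");
        map.put("my.variable", "Dynamically inserted text");
        System.out.println(substitute("$[var1] | $[my.variable] | $[page]", map));
        // prints: value 1 | Dynamically inserted text | $[page]
    }
}
```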

4.8.Rendering Status Info

Retrieving conversion statistics and diagnostic data after rendering:


// render and write the result as PDF/A
pd4ml.writePDF(fos, Constants.PDFA);

System.out.println("pages: " + (Long)pd4ml.getLastRenderInfo(Constants.PD4ML_TOTAL_PAGES));

// reports actual HTML document layout height in pixels
// (as a rule the value depends on htmlWidth conversion parameter)
System.out.println("height: " + (Long)pd4ml.getLastRenderInfo(Constants.PD4ML_DOCUMENT_HEIGHT_PX));

// reports the default width of the HTML document layout in pixels.
// If the document has root-level elements with width="100%",
// the returned value is almost always going to be equal to the htmlWidth parameter.
// If the returned value is smaller than htmlWidth, it is probably the optimal htmlWidth for the given document.
System.out.println("right edge: " + (Long)pd4ml.getLastRenderInfo(Constants.PD4ML_RIGHT_EDGE_PX));

StatusMessage[] msgs =
    (StatusMessage[])pd4ml.getLastRenderInfo(Constants.PD4ML_PDFA_STATUS);

for ( int i = 0; i < msgs.length; i++ ) {
    System.out.println( (msgs[i].isError() ? "ERROR: " : "WARNING: ") + msgs[i].getMessage());
}

A007RenderingStatusInfo.java

4.9.Adding Custom Tag Renderer

PD4ML provides a way to introduce your own HTML tags. The example shows how to define a <star> tag, which renders (surprise!) a star. See the StarTag class implementation.

String html = "TEST STAR [<star height=20 width=20 style='border: 1 solid blue'>]";
pd4ml.addCustomTagHandler("star", new StarTag());

ByteArrayInputStream bais = new ByteArrayInputStream(html.getBytes());
pd4ml.readHTML(bais);

FYI: PD4ML itself uses this API to plug in external MathML and SVG renderers.

A008AddingCustomTagRenderer.java
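
The StarTag source is not reproduced on this page. As a rough idea of the drawing part only, the sketch below builds a five-pointed star outline with plain Java2D. It deliberately leaves out the PD4ML custom tag interface, and the class StarShapeSketch with its star() method is hypothetical rather than taken from the example sources.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.Path2D;
import java.awt.image.BufferedImage;

// Hypothetical illustration of the geometry only; not the StarTag class.
final class StarShapeSketch {

    // Builds a five-pointed star around (cx, cy): ten vertices that alternate
    // between the outer and the inner radius, starting at the top tip.
    static Path2D.Double star(double cx, double cy, double outer, double inner) {
        Path2D.Double path = new Path2D.Double();
        for (int i = 0; i < 10; i++) {
            double radius = (i % 2 == 0) ? outer : inner;
            double angle = -Math.PI / 2 + i * Math.PI / 5; // 36 degree steps
            double x = cx + radius * Math.cos(angle);
            double y = cy + radius * Math.sin(angle);
            if (i == 0) {
                path.moveTo(x, y);
            } else {
                path.lineTo(x, y);
            }
        }
        path.closePath();
        return path;
    }

    public static void main(String[] args) {
        // paint the star into a 20x20 box, matching the width/height used in the HTML above
        BufferedImage canvas = new BufferedImage(20, 20, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = canvas.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 20, 20);
        g.setColor(Color.BLUE);
        g.fill(star(10, 10, 10, 4));
        g.dispose();
        System.out.println("star bounds: " + star(10, 10, 10, 4).getBounds2D());
    }
}
```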

5.1.Convert And Merge With PDF

With the merge() API call you can specify a PDF document to merge the HTML conversion result with. It can be an entire static PDF document or only selected pages of it.

URL pdfUrl = new URL("java:/pdftools/PDFOpenParameters.pdf");
PdfDocument pdf = new PdfDocument(pdfUrl, null);

File f = File.createTempFile("result", ".pdf");

pd4ml.setPageHeader("HEADER $[page] of $[total]", 40, "1+");

// merge only with pages from 2 to 4. The pages will be appended to the converted PDF
pd4ml.merge(pdf, 2, 4, true);

pd4ml.readHTML(new ByteArrayInputStream(html.getBytes()));
pd4ml.writePDF(new FileOutputStream(f));
PD4ML pd4ml = new PD4ML(); // constructor implicitly registers "java:" protocol

String html = "TEST<pd4ml:page.break><b>Hello, World!</b>";
StringReader bais = new StringReader(html);

PD4PageMark header = new PD4PageMark();
header.setHtmlTemplate("HEADER $[page] of $[total]");
header.setAreaHeight(40);
pd4ml.setPageHeader(header);

File pdfFile = new File("src/pdftools/PDFOpenParameters.pdf");
FileInputStream pdf = new FileInputStream(pdfFile);
// merge only with pages from 2 to 4. The pages will be appended to the converted PDF
pd4ml.merge(pdf, 2, 4, true);

File f = File.createTempFile("result", ".pdf");
FileOutputStream fos = new FileOutputStream(f);

// render and write the result as PDF
pd4ml.render(bais, fos);

P001ConvertAndMergeWithPDF.java

5.2.Merge Two PDFs

PD4ML also provides a set of useful tools to deal with PDF.

The example shows how to merge two static PDFs into a single document. It is straightforward.

URL pdfUrl1 = new URL("java:/pdftools/doc1.pdf");
URL pdfUrl2 = new URL("java:/pdftools/doc2.pdf");
PdfDocument pdf1 = new PdfDocument(pdfUrl1, null);
PdfDocument pdf2 = new PdfDocument(pdfUrl2, null);

File f = File.createTempFile("pdf", ".pdf");

pdf1.append(pdf2);
pdf1.write(new FileOutputStream(f));
URL pdfUrl = new URL("java:/pdftools/PDFOpenParameters.pdf");
PD4Document pdf1 = new PD4Document(pdfUrl, null);
PD4Document pdf2 = new PD4Document(pdfUrl, null);

File f = File.createTempFile("pdf", ".pdf");

pdf1.append(pdf2);
pdf1.write(new FileOutputStream(f));

P002MergeTwoPDFs.java

5.3.Merge Two PDFs And Protect With Password

As an extension of the previous example, the resulting document is also protected with a password and reduced permissions.

URL pdfUrl1 = new URL("java:/pdftools/doc1.pdf");
URL pdfUrl2 = new URL("java:/pdftools/doc2.pdf");
PdfDocument pdf1 = new PdfDocument(pdfUrl1, null);
PdfDocument pdf2 = new PdfDocument(pdfUrl2, null);

File f = File.createTempFile("pdf", ".pdf");

pdf1.append(pdf2);
pdf1.write(new FileOutputStream(f), "test",   // Protect the resulting PDF with password "test"
        Constants.AllowDegradedPrint | Constants.AllowAnnotate);
URL pdfUrl = new URL("java:/pdftools/PDFOpenParameters.pdf");
PD4Document pdf1 = new PD4Document(pdfUrl, null);
PD4Document pdf2 = new PD4Document(pdfUrl, null);

File f = File.createTempFile("pdf", ".pdf");

pdf1.append(pdf2);
pdf1.write(new FileOutputStream(f), "test",   // Protect the resulting PDF with password "test"
        PD4Constants.AllowDegradedPrint | PD4Constants.AllowAnnotate);

P003MergeTwoPDFsAndProtectWithPassword.java
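
The permissions argument in the calls above is a bit mask: each permission occupies its own bit and several permissions are combined with the | operator. The sketch below shows the general idea only. The constant names and their numeric values are placeholders invented for this illustration; they are not the values of the PD4ML constants.

```java
// Hypothetical illustration of bit-flag permissions; values are placeholders.
final class PermissionFlagsSketch {

    static final int ALLOW_DEGRADED_PRINT = 1;      // placeholder value
    static final int ALLOW_MODIFY         = 1 << 1; // placeholder value
    static final int ALLOW_COPY           = 1 << 2; // placeholder value
    static final int ALLOW_ANNOTATE       = 1 << 3; // placeholder value

    // A permission is granted when its bit is set in the combined mask.
    static boolean isAllowed(int permissions, int flag) {
        return (permissions & flag) == flag;
    }

    public static void main(String[] args) {
        int permissions = ALLOW_DEGRADED_PRINT | ALLOW_ANNOTATE;
        System.out.println("print:    " + isAllowed(permissions, ALLOW_DEGRADED_PRINT)); // true
        System.out.println("annotate: " + isAllowed(permissions, ALLOW_ANNOTATE));       // true
        System.out.println("copy:     " + isAllowed(permissions, ALLOW_COPY));           // false
    }
}
```

Anything not listed in the mask is denied, which is why the example document above allows low-resolution printing and annotations but nothing else.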

5.4.Update Pdf Meta Info

PD4ML’s PDF tools make it possible to update the meta info of a PDF document.

URL pdfUrl = new URL("java:/pdftools/PDFOpenParameters.pdf");
PdfDocument doc = new PdfDocument(pdfUrl, null);

System.out.println("document author: " + doc.getAuthor());

doc.setTitle("Document Modification Test");
doc.setSubject("PdfDocument API test");
doc.setKeywords("key1, key2");
doc.setModDate(); // set modification date to NOW

File f = File.createTempFile("pdf", ".pdf");

doc.write(new FileOutputStream(f), null, -1); // no password, default permissions
URL pdfUrl = new URL("java:/pdftools/PDFOpenParameters.pdf");
PD4Document doc = new PD4Document(pdfUrl, null);

System.out.println("document author: " + doc.getAuthor());  
  
doc.setTitle("Document Modification Test");  
doc.setSubject("PdfDocument API test");  
doc.setKeywords("key1, key2");  
doc.setModDate(); // set modification date to NOW  
  
File f = File.createTempFile("pdf", ".pdf");

doc.write(new FileOutputStream(f), null, -1); // no password, default permissions  

P004UpdatePdfMetaInfo.java

5.5.Underlay/Overlay

Overlay and underlay are a special way of merging PDF documents: the content of one document is placed over (or under) the pages of another.

URL pdfUrl = new URL("java:/pdftools/PDFOpenParameters.pdf");
PdfDocument doc1 = new PdfDocument(pdfUrl, null);
PdfDocument doc2 = new PdfDocument(pdfUrl, null);

// overlay request: place doc2 content over doc1
// "1" limits the overlay content to the first page of doc2
// "2+" applies the overlay to the second and all subsequent pages
// "128" is the opacity of the overlay (doc2) content, which corresponds to ~50%
doc1.overlay(doc2, "1", "2+", 128);
// doc1.underlay(doc2, "1", "2+", 128);

File f = File.createTempFile("pdf", ".pdf");

// writing the overlay result as a new PDF document
FileOutputStream fos = new FileOutputStream(f);
doc1.write(fos);
URL pdfUrl = new URL("java:/pdftools/PDFOpenParameters.pdf");
PD4Document doc1 = new PD4Document(pdfUrl, null);
PD4Document doc2 = new PD4Document(pdfUrl, null);
		              
// overlay request: place doc2 content over doc1
// "1" limits the overlay content to the first page of doc2
// "2+" applies the overlay to the second and all subsequent pages
// "128" is the opacity of the overlay (doc2) content, which corresponds to ~50%
doc1.overlay(doc2, "1", "2+", 128);  
// doc1.underlay(doc2, "1", "2+", 128);  

File f = File.createTempFile("pdf", ".pdf");

// writing the overlay result as a new PDF document   
FileOutputStream fos = new FileOutputStream(f);  
doc1.write(fos);  

P005UnderlayOverlay.java
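
The two page scopes and the opacity value in the overlay call can be read as follows. This is a minimal sketch covering only the two scope forms that appear in the comments above ("1" and "2+"); PD4ML may accept richer scope expressions, and the class PageScopeSketch is hypothetical. The opacity conversion assumes the usual 0..255 alpha range, where 255 is fully opaque.

```java
import java.util.Locale;

// Hypothetical illustration; handles only the "N" and "N+" scope forms.
final class PageScopeSketch {

    // "N" selects exactly page N, "N+" selects page N and all pages after it.
    static boolean inScope(String scope, int page) {
        String s = scope.trim();
        if (s.endsWith("+")) {
            int first = Integer.parseInt(s.substring(0, s.length() - 1).trim());
            return page >= first;
        }
        return page == Integer.parseInt(s);
    }

    // Converts a 0..255 alpha value to a percentage.
    static double opacityPercent(int alpha) {
        return alpha * 100.0 / 255.0;
    }

    public static void main(String[] args) {
        System.out.println(inScope("2+", 1)); // false: page 1 gets no overlay
        System.out.println(inScope("2+", 7)); // true
        System.out.println(inScope("1", 1));  // true
        System.out.println(String.format(Locale.ROOT, "%.1f", opacityPercent(128))); // 50.2
    }
}
```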
