Creating an index at the end of the pdf file

This topic has 10 replies, 4 voices, and was last updated Dec 15, 2009
15:22:46 by PD4ML.

Viewing 11 posts - 1 through 11 (of 11 total)

tedMartin
December 9, 2009 at 11:51
#26303
Hi

We’re currently pondering whether to use pd4ml (from Java).

Our requirements are fairly common, but one of them is not obvious. Could anyone tell me whether it’s doable?

This specific requirement is about creating an index.

It means listing, for some keywords, where they are used and providing links to the pages. The trick is that we don’t want more than one link per page, whatever the number of occurrences of the keyword.

In the end, it should look like this :
Characters index
– keyword1: page 2 (3 times), page 4 (2 times) and page 5.
– keyword2: page 2, page 33 (5 times) and page 45.

In this example, each “page X” should be a link to the actual page.

Is it doable with pd4ml ?

thanks in advance

best regards

ted
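
The "one link per page, with a count" rule described above can be sketched in plain Java. This is only an illustration of the grouping and formatting logic, assuming the page number of each keyword occurrence is already known; the class name KeywordIndex and its methods are hypothetical and are not part of PD4ML. Turning each "page X" into a link is left out, since the thread does not say how that would be done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: collapses keyword occurrences into one entry per page. */
final class KeywordIndex {
    // keyword -> (page -> number of occurrences on that page), both kept sorted
    private final Map<String, TreeMap<Integer, Integer>> entries = new TreeMap<>();

    /** Records one occurrence of a keyword on a page. */
    void add(String keyword, int page) {
        entries.computeIfAbsent(keyword, k -> new TreeMap<>()).merge(page, 1, Integer::sum);
    }

    /** Formats one index line; pages with a single occurrence get no count. */
    String line(String keyword) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : entries.getOrDefault(keyword, new TreeMap<>()).entrySet()) {
            String part = "page " + e.getKey();
            if (e.getValue() > 1) {
                part += " (" + e.getValue() + " times)";
            }
            parts.add(part);
        }
        if (parts.isEmpty()) {
            return keyword + ":";
        }
        if (parts.size() == 1) {
            return keyword + ": " + parts.get(0) + ".";
        }
        // Join all but the last entry with commas, then append the last one with "and".
        String head = String.join(", ", parts.subList(0, parts.size() - 1));
        return keyword + ": " + head + " and " + parts.get(parts.size() - 1) + ".";
    }

    public static void main(String[] args) {
        KeywordIndex index = new KeywordIndex();
        for (int page : new int[] {2, 2, 2, 4, 4, 5}) {
            index.add("keyword1", page);
        }
        // prints "keyword1: page 2 (3 times), page 4 (2 times) and page 5."
        System.out.println(index.line("keyword1"));
    }
}
```

A TreeMap is used on both levels so that keywords and pages come out sorted, which is what an index normally expects.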
PD4ML
December 10, 2009 at 15:18
#27677
Unfortunately PD4ML does not support the feature.

Of course, index creation cannot be fully automated. Let’s say you mark all significant keywords like this:

text text keyword text text

Currently you may generate PDF bookmarks from the markup (pd4ml.generateOutlines(false)). The bookmarks give you quick access to the marked keywords. However, the bookmarks are not sorted the way an index would be, and they are visible only in a PDF viewer such as Acrobat Reader, not in print output.

Obviously we need to add a new proprietary tag in order to support the feature you need. Also, the keyword markup needs to be extended with a switch that makes it possible to exclude regular document anchors:

text text keyword text text

or to introduce another tag for that:

text text keyword text text

If the above is acceptable to you, we’ll include the feature implementation in our plans. The priority will depend on the license type you are going to purchase.
Joseph
December 14, 2009 at 08:45
#27678
thanks a lot for this answer

because of it, I was asked to look more closely at the precise feature set of PD4ML.

it appears that, for our use, it would also lack an “orphans and widows” feature.

if both the index and the orphans and widows features were available, the Java library Pro would be our targeted license.

would it be possible to have both the orphans and widows feature and the index feature?

About the offered solution, the tag keyword would perfectly fit our need. I guess a tag like would also be needed.

thanks again

best regards
Joseph
December 14, 2009 at 10:29
#27679
Regarding the license issue, I had a closer look at the conditions.

The way we plan to build our pdf generation stack is to have a server providing pdf generation as a service to the rest of our stack (allowing for pdf rendering in order to attach it to a mail for example).

As such, the current plan is to call this pdf generation service (through RMI) and give it everything it needs to run. In the current case, I think it means calling render(java.io.InputStreamReader isr, java.io.OutputStream os, java.net.URL base).

Does it qualify us for the volume license ?

thanks again
joseph
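
One practical detail of the RMI plan above: RMI arguments and return values must be primitives, Serializable, or Remote objects, and java.io.InputStreamReader and java.io.OutputStream are neither, so the render(...) signature cannot be exposed as-is. A minimal sketch of a remote contract follows, assuming the HTML travels as a String and the PDF comes back as a byte array. The names PdfRenderService and StubPdfRenderService are hypothetical, and the stub does not call any PDF library; it only shows the shape of the interface.

```java
import java.nio.charset.StandardCharsets;
import java.rmi.Remote;
import java.rmi.RemoteException;

/** Hypothetical remote contract: streams are replaced by serializable String/byte[] values. */
interface PdfRenderService extends Remote {
    byte[] render(String html, String baseUrl) throws RemoteException;
}

/** Stand-in implementation; a real one would delegate to the PDF library on the server side. */
final class StubPdfRenderService implements PdfRenderService {
    @Override
    public byte[] render(String html, String baseUrl) {
        // Placeholder only: echoes the input bytes so the sketch runs without any PDF library.
        return html.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws RemoteException {
        // Called in-process here; exporting the object and registering it is left out of the sketch.
        PdfRenderService service = new StubPdfRenderService();
        byte[] out = service.render("<html><body>Hello</body></html>", "http://localhost/");
        System.out.println(out.length + " bytes returned");
    }
}
```

On the server side, the implementation could wrap the String in a reader and collect the output in a ByteArrayOutputStream before returning it.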
PD4ML
December 14, 2009 at 11:06
#27680
Orphans can be eliminated with the currently existing conditional page break:

<pd4ml:page.break ifSpaceBelowLessThan="300">

(300 is the height in pixels there)

Also you may format portions of text (or the starting and ending parts of chapters) as <div>s and define a “page-break-inside: avoid” CSS style for them. It should cure both orphans and widows.
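
If the HTML is assembled in Java before rendering, the “page-break-inside: avoid” suggestion could be applied while building the document. The following is a rough sketch, assuming each block is already an HTML fragment and that a <div> wrapper is acceptable; the class KeepTogether is hypothetical and only produces a string.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: wraps HTML fragments so each one is kept on a single page where possible. */
final class KeepTogether {
    /** Wraps every fragment in a div that carries the page-break-inside: avoid style. */
    static String wrap(List<String> blocks) {
        StringBuilder html = new StringBuilder();
        for (String block : blocks) {
            html.append("<div style=\"page-break-inside: avoid\">")
                .append(block)
                .append("</div>\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        List<String> blocks = Arrays.asList("<p>First paragraph</p>", "<p>Last paragraph</p>");
        System.out.print(wrap(blocks));
    }
}
```

Wrapping only short blocks, such as the first and last paragraphs of a chapter, keeps the renderer free to break long chapters across pages.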
PD4ML
December 14, 2009 at 11:21
#27681
For your planned environment (if you do not want to build a cluster of servers), 2 Pro licenses are sufficient: one for the production server, another for your development team (and test servers).

On the other hand, I see no strong reason to create a PDF generation server. You may simply drop pd4ml.jar and ss_css2.jar into the lib directory of your application and have the PDF generation functionality in place. In that case you do need the Volume license, but it is worth the money if you compare it with the effort of establishing and maintaining a PDF-generating network infrastructure.
Anonymous
December 14, 2009 at 12:15
#27682
Interesting discussion, thanks for your inputs.

I didn’t find this pd4ml.page tag on the website, but the taglib API docs list it (http://www.pd4ml.com/taglib/index.html). It looks like it does the job.

Regarding our stack design, it comes from one of our needs: to be able to generate mails with a PDF attached to them, as a kind of newsletter. Indeed, we could have a request to process 1000+ such mails, with just a few lines changing from one mail to the next.

In doing so, we would like to spare bandwidth as much as possible (hence the planned templating option) and to be sure we won’t burn web server CPU (hence producing the PDF on a dedicated server). The templating would be done before the PDF rendering engine runs.

Hence our intended architecture. I welcome any advice or feedback.

As PD4ML looks promising, I’ll now work on a demo implementation using the trial version. Hopefully it’ll work out as expected, and then we can discuss the purchase and the index feature further.

thanks again
joseph
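
The "templating before rendering" step mentioned above could look roughly like this in Java: one shared HTML template, with a few placeholders filled per recipient before the result is handed to the PDF renderer. The ${name} placeholder syntax and the MailTemplate class are assumptions made for this sketch, not something defined by PD4ML.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical template step: fills ${name} placeholders before the HTML goes to the renderer. */
final class MailTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Replaces each ${key} with its value; unknown keys are left untouched. */
    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement keeps '$' and '\' in values from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "<html><body><p>Dear ${name}, here is issue ${issue}.</p></body></html>";
        Map<String, String> values = new LinkedHashMap<>();
        values.put("name", "Ted");
        values.put("issue", "12");
        System.out.println(fill(template, values));
    }
}
```

With this split, only the small map of per-recipient values needs to travel to the rendering server for each mail, while the template can be sent or cached once.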
Anonymous
December 15, 2009 at 14:25
#27684
Hi

I spent some time looking closer into it, and some questions arose.

Most are about the JSP tag lib, considering I’m not used to JSP tags. Are there rules defining whether a tag should be closed immediately or should enclose the whole document? Does the order of declaration matter? Should parameters always be enclosed in quotes?

Currently I have:

<pd4ml:transform screenWidth="400" debug="true">
  <pd4ml:permissions password="empty" rights="2068" strongEncryption="true">
    <pd4ml:page.break ifSpaceBelowLessThan="500">
      <pd4ml:clean_xhtml/>
      (html doc)
    </pd4ml:page.break>
  </pd4ml:permissions>
</pd4ml:transform>

=> I’ve tried many other combinations, without success.

In this context, only clean_xhtml works… The permissions defined in JSP never worked (I had to set them through Java), and neither did the page break feature (for which I found no Java equivalent).

A more important question is about hyphenation: it’s working, but without a hyphen (-) character.
I currently use this CSS:
p { text-align: justify; word-wrap: break-word; }
However, the breaks occur without a hyphen character. Is there a way to define one?

I tried adding some h3 elements with visibility:hidden, so that the table of contents would include links to these items without actually showing them. It doesn’t work.

However, we need this for our current usage of a so-called index: we have some h3 elements which are hidden but should be part of the table of contents, in order to allow links to their positions. Is there a way to do it through the table of contents?

If the index discussed above were to be created, could it link to keywords which aren’t actually displayed? Or could it be done through other tricks?

I’ll look deeper into PD4ML later, so I could come back with more questions…

thanks again !

joseph
Anonymous
December 15, 2009 at 14:33
#27685
nb : works fine 🙂
PD4ML
December 15, 2009 at 15:01
#27683
I know, using the same namespace for the PD4ML JSP taglib and the proprietary tags is confusing.

Here is the JSP docs:
http://pd4ml.com/taglib/pd4ml/tld-summary.html

The proprietary tags are at the bottom of the doc:
http://pd4ml.com/html.htm

Generally, if a custom JSP tag has the same name as a PD4ML proprietary tag, the JSP tag simply replicates itself and lets the PD4ML runtime process it. The rest of the JSP custom tags are mapped to the PD4ML API.

Regarding your particular issues:

<pd4ml:page.break> assumes no body, so in JSP it should be <pd4ml:page.break ifSpaceBelowLessThan="500"/>. In plain HTML, the <pd4ml:page.break ifSpaceBelowLessThan="500"> syntax is also allowed. But please take into account: the tag does not define a “page breaking style”. It is an explicit page break (in this case a conditional one).

<pd4ml:permissions> is mapped to setPermissions() and it is also body-less.

Your JSP should look like the following:

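<pd4ml:transform screenWidth="400" debug="true">
  <pd4ml:permissions password="empty" rights="2068" strongEncryption="true"/>
  (preface)
  <pd4ml:page.break ifSpaceBelowLessThan="500"/>
  (chapter 1)
  <pd4ml:page.break ifSpaceBelowLessThan="500"/>
  (chapter 2)
</pd4ml:transform>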

I also removed the more or less outdated clean_xhtml, as the current version of our HTML renderer “understands” XHTML quite well.

PD4ML
December 15, 2009 at 15:22
#27686
You may find general info about the TOC at this link: http://pd4ml.com/reference.htm#8

There are also some features that are not yet documented.

You may suppress page number generation for particular TOC entries this way:
Test Page
NOPAGENUM test
Line 1
Line 1.1
Line 1.2
Line 1.2.1
Line 1.2.1
Line 1.3
Line 2
Line 3
Line 4
Line 5
Line 6

Or to disable it, let’s say, for all <h3> entries:

div.ptoc3-style-left pd4ml-dots, div.ptoc3-style-right {visibility: hidden}

You may exclude particular levels from the TOC completely with:

.ptoc3-style-left, .ptoc3-style-right {display:none; visibility:hidden;}


The forum ‘HTML/CSS rendering issues’ is closed to new topics and replies.
