RE: [FOray-developer] Foray Font/FOP/PDFBox
From: Victor M. <vi...@ou...> - 2006-03-17 01:37:25
Hi Ben:

> > > 2)From raw font data that is already embedded into a PDF create a
> > > Font to get font metrics (this is used by text extraction)
> >
> > FOrayFont does not really handle #2 or #3 at all. For both of these
> > items, I think we have data classes that should work pretty well for
> > you, but no parsers.
>
> I actually misspoke as well, the parsers are not for text
> extraction but for embedding fonts into a PDF. Info needs to
> be parsed to set certain fields in PDF dictionaries. I will
> need to take a font that is embedded in a PDF (i.e. TTF) and
> create a java.awt.Font for display purposes, but I don't
> think I would use axslFont for that, probably just standard
> libraries. I would like to programmatically register a new font
> with the font server. If I understand correctly, the
> FontServer will load a config file and make fonts available,
> but I would also like to programmatically register a new
> font. I don't intend for FORayFont to parse PDF or have any
> PDF knowledge at all, just wanted to clarify myself.

OK. That sounds fine. I'll work on adding the methods for the programmatic
registration.

> Again, if there is a piece of data that I use that is not yet
> available I am sure we can just add some patches to do it.

That sounds right.

> > > 3)Parse embedded CMAP files to get encoding information
>
> We can just take the FontBox CMAP parser and bring it into
> FORayFont and add some sort of interface to axslFont. I thought
> I saw a CMAP parser in your codebase already but maybe I was
> mistaken.

FOrayFont knows how to parse a CMAP table in a TTF file. We should be able to
make that work for a CMAP table pulled out of an embedded font in a PDF.
However, we don't have anything that will parse the PDF CMAP concept -- we
know how to create them (like the ToUnicodeCMAP), but don't parse them. In
either case, it sounds like we should be able to make this work.

> > > 4)Create a java.awt.Font from any font, this is used for
> > > displaying a PDF
> >
> > On #4, do you mean that you create a java.awt.Font from a font
> > file, or that you create it from an embedded font? If the latter,
> > then we don't have that either.
>
> Hmm, FontBox does not actually do this. I use java.awt.Font directly
> right now but I envisioned using the axsl FontServer so that
> substitution could be used. Also, if it is one of the base 14 fonts
> then the correct font needs to be chosen. This is hardcoded in
> PDFBox (not FontBox) right now but it would be nice if that was
> somehow part of the font library so it could be configured.

OK, I thought you were saying that FontBox/FOrayFont needed to do this. You'll
probably need to lead me through what you want on font substitution. I doubt
that we have it in place right now. Keep in mind that WRT the base-14 fonts,
we don't actually have the font, just the font metrics. As I understand it,
the viewer application is supposed to either have them or be capable of
getting them from the O/S.

> > There is an even bigger issue here. If your parsing pulls the data
> > out of the PDF in a fairly raw chunk, then sends that chunk to a
> > parser, we can probably create some methods in the aXSL FontServer
> > to register the font and send the content to the FOray parsing
> > mechanisms (there are plans in my mind for such capability --
> > nobody has asked for it yet). If the parsing needs to be done in
> > place, then I think your PDF parsing routines would need to use
> > FOray directly instead of aXSL. The aXSL interfaces are for a very
> > different purpose.
>
> PDFBox will do all the PDF parsing and collect all the required data
> but I'll need to be able to pass it to something to create the
> axslFont. For example, if a TTF is embedded in a PDF I'll potentially
> want to just pass you a stream to read the TTF data from.

That should work fine.
We actually use a random access scheme to parse the TTF files, but we should
be able to write an implementation of it that can read the stream and make it
available that way. This was discussed recently in another thread. I'll put
that on my list of things to do.

> I assume that Vincent will be solely using the axslFont interfaces
> for FOP integration and I would like PDFBox to stick to the axslFont
> interfaces as well. So I don't expect any explicit FORayFont imports
> in FOP or PDFBox (at least not in any public interface, but not at
> all would be best). I do not want to mix axslFont/FORayFont
> interfaces if I can avoid it, that would defeat the purpose of
> axslFont, would it not? ;)

Yes it would defeat the purpose, and you have the right idea here. I was
afraid you were asking for things that needed to be done at a much lower level
than aXSL was designed for. As I understand you right now, we should be able
to make aXSL do what you need.

Thanks for clarifying all of that. I think we are on the right track.

Victor Mote
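
For illustration only, here is a minimal sketch of one way a stream could be adapted for a parser that expects random access, as discussed above for embedded TTF data. It simply buffers the whole stream in memory and reads at absolute offsets. The class name StreamBackedFontData and all of its methods are hypothetical; they are not part of FOray, aXSL, PDFBox, or FontBox. The only format knowledge assumed is the standard TrueType table directory layout (a 12-byte header followed by 16-byte big-endian table records).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not a FOray or aXSL class): buffers a font stream
 * in memory so that a parser can read at arbitrary offsets.
 */
final class StreamBackedFontData {
    // ByteBuffer is big-endian by default, which matches TrueType files.
    private final ByteBuffer buffer;

    private StreamBackedFontData(byte[] bytes) {
        this.buffer = ByteBuffer.wrap(bytes);
    }

    /** Drains the stream completely; the caller still owns and closes it. */
    static StreamBackedFontData fromStream(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new StreamBackedFontData(out.toByteArray());
    }

    int length() {
        return buffer.capacity();
    }

    /** Reads an unsigned 16-bit value at an absolute offset. */
    int readUnsignedShort(int offset) {
        return buffer.getShort(offset) & 0xFFFF;
    }

    /** Reads an unsigned 32-bit value at an absolute offset. */
    long readUnsignedInt(int offset) {
        return buffer.getInt(offset) & 0xFFFFFFFFL;
    }

    /** Reads a 4-character ASCII table tag such as "cmap". */
    String readTag(int offset) {
        byte[] tag = new byte[4];
        for (int i = 0; i < 4; i++) {
            tag[i] = buffer.get(offset + i);
        }
        return new String(tag, StandardCharsets.US_ASCII);
    }

    /** Returns the file offset of the named table, or -1 if it is absent. */
    long findTableOffset(String wanted) {
        int numTables = readUnsignedShort(4);      // follows the 4-byte sfnt version
        for (int i = 0; i < numTables; i++) {
            int record = 12 + 16 * i;              // 12-byte header, 16-byte records
            if (readTag(record).equals(wanted)) {
                return readUnsignedInt(record + 8); // tag(4) + checksum(4), then offset
            }
        }
        return -1;
    }
}
```

Buffering everything trades memory for simplicity, which seems reasonable for fonts embedded in a PDF since they are usually small. A real implementation would also need bounds checks and handling for non-TrueType font programs.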