Neilay Dedhia - 2007-01-03

I have a PDF file containing Indic text (Hindi and Gujarati). The Hindi and Gujarati text is encoded with two fonts that use font-specific encodings ("Atmadharma Gujarati" and "Atmadharma Hindi"). I am trying to convert the PDF file into a UTF-8 encoded HTML file. To do that, I need to know which text is Gujarati and which is Hindi. In the style section of its output, pdftohtml (with the -c option) labelled all the classes as belonging to the font-family "Times", so I cannot distinguish between Hindi and Gujarati text in the HTML file.
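
To illustrate why the font-family matters for the conversion, here is a minimal sketch of the kind of lookup the post-processing step would need once the classes carry the real font names. The `Script` enum and `scriptForFontFamily` are hypothetical names used only for illustration; they are not part of pdftohtml.

```cpp
#include <map>
#include <string>

// Hypothetical script tags; not part of pdftohtml.
enum class Script { Gujarati, Hindi, Unknown };

// Pick the script (and so the font-specific decoding table to use)
// from the CSS font-family value attached to a piece of text.
Script scriptForFontFamily(const std::string& family) {
    static const std::map<std::string, Script> table = {
        {"Atmadharma Gujarati", Script::Gujarati},
        {"Atmadharma Hindi", Script::Hindi},
    };
    auto it = table.find(family);
    // Anything else (for example "Times") cannot be decoded reliably.
    return it == table.end() ? Script::Unknown : it->second;
}
```

With every class labelled "Times", a lookup like this always returns `Script::Unknown`, which is exactly the problem.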

I want pdftohtml to add classes to the style section for the "Atmadharma Gujarati" and "Atmadharma Hindi" font families, and then tag the text in each of these fonts with the appropriate class. I tried modifying HtmlFonts.cc to add "Atmadharma Gujarati" to the list of 13 fonts in that file (see below), but that didn't work: all the classes were still labelled as belonging to the "Times" font-family.

//const int font_num=13;
const int font_num=14;

static Fonts fonts[font_num+1]={
.. (standard fonts) ..
   {"AtmadharmaGujarati",    "Atmadharma Gujarati"},
..

Can you point me to what other changes I need to make? Thanks for your help.

Neilay