
#433 issues with ePub conversion (1/3)

closed-rejected
5
2012-10-03
2012-08-28
No

Dear Sebastian,

Using OxGarage, I have generated an ePub file from an already encoded book (http://www.perseus.tufts.edu/hopper/opensource/downloads/texts/tei/1999.04.0074.xml).

There is a serious problem: the HTML files that contain the text (OPS/index-body.1_div.1.html and OPS/index-body.1_div.2.html) are
1.2MB and 920KB. These are too big to be uncompressed and loaded by an ereader (Sony PRS-T1). It took more than 8 minutes (more than 500 seconds)
to load the first part (it isn't only too slow, it also drains the battery).
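
A quick way to see which files in a generated ePub exceed a given size is to list the archive members, since an ePub is a ZIP container. The following is only a minimal sketch: the function name `oversized_html` and the default 100KB limit are illustrative assumptions, not part of OxGarage or any ePub tool.

```python
import zipfile


def oversized_html(epub_path, limit_bytes=100 * 1024):
    """Return (name, uncompressed size) for HTML members above limit_bytes.

    epub_path may be a file name or a file-like object. The 100 KB default
    is an assumed threshold for illustration, not a limit from any spec.
    """
    with zipfile.ZipFile(epub_path) as epub:
        return [
            (info.filename, info.file_size)
            for info in epub.infolist()
            if info.filename.lower().endswith((".html", ".xhtml"))
            and info.file_size > limit_bytes
        ]
```

Calling `oversized_html("book.epub")` (with a hypothetical file name) would return the HTML files that are likely to be slow to load on the device.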

I thought the cause (of the HTML files containing the text not being 100KB or less) was that the text is encoded with the following main divisions: part, section and subsection. But encoding sections as chapters and subsections as sections made no difference.

My suggestion for this issue: would it be possible for the ePub generation process to split chapters (or sections, when there are no chapters) into separate HTML files? These would be smaller and faster to load on ereaders.

Many thanks for your help,

Pablo

Discussion

  • Pablo Rodriguez

    Pablo Rodriguez - 2012-08-28

    Dear Sebastian,

    I forgot to mention that the ePub file that OxGarage generates from http://www.tei-c.org/release/xml/tei/custom/odd/teilite.odd also contains a huge (2.3MB) OPS/index-back.1_div.2.html file.

    In that case, would it be possible to create a new file for each element? (I don't see any other clear way to do it.)

    Many thanks for your help,

    Pablo

     
  • Sebastian Rahtz

    Sebastian Rahtz - 2012-09-05

    the default policy is to split the text into one HTML page per top-level section. Splitting in a more granular way is controlled by the value of the splitLevel parameter, which can be set in the XSL. Unfortunately, I don't support setting parameters like this in OxGarage (though the command-line script teitoepub does allow it). When I set splitLevel to 1 (default is 0), I duly get 24 HTML files in the epub.
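
To get a feel for how the split depth affects the number of pages, one can count the divisions at or above a given nesting level. This is only a rough illustration of the idea, not the stylesheets' actual logic: the function `count_pages` is hypothetical, it assumes that every `<div>` nested no deeper than the split level becomes its own page, and it only recognises elements named `div` (not numbered forms such as `div1`).

```python
import xml.etree.ElementTree as ET


def count_pages(tei_source, split_level=0):
    """Count <div> elements nested no deeper than split_level.

    Depth 0 is a top-level <div>. Assumption: each counted <div> would
    become one HTML page; the real stylesheets may behave differently.
    """
    root = ET.fromstring(tei_source)

    def walk(element, depth):
        total = 0
        for child in element:
            # Strip a namespace prefix such as {http://www.tei-c.org/ns/1.0}
            name = child.tag.rsplit("}", 1)[-1]
            if name == "div":
                if depth <= split_level:
                    total += 1 + walk(child, depth + 1)
            else:
                total += walk(child, depth)
        return total

    return walk(root, 0)
```

Comparing `count_pages(source, 0)` with `count_pages(source, 1)` for a given text shows whether a deeper split would yield a manageable number of pages or a flood of very small ones.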

    unfortunately the epub is invalid! I will work on seeing what is going wrong.

    are you able to work on the command line for this?

     
  • Sebastian Rahtz

    Sebastian Rahtz - 2012-09-05

    There is also a major problem with this text, in the shape of unresolved references. eg

    <ref target="s669">669</ref>

    this points to a local file called "s669", which does not exist. ePubcheck looks at these hyperlinks, and complains because the target is not present.
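
References of this kind can be listed before conversion. The sketch below is an illustration under stated assumptions rather than an existing tool: the function `suspicious_ref_targets` is hypothetical, it treats any `target` without a URL scheme or a leading `#` as a relative file reference, and it accepts both `id` and `xml:id` as identifiers. A real Perseus file may also need its DTD entities resolved before it parses.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def suspicious_ref_targets(tei_source):
    """Return <ref> targets that are unlikely to resolve.

    Reported: '#frag' targets with no matching id in the document, and
    bare targets such as 's669', which are read as relative file names.
    Targets containing '://' are assumed to be external and are skipped.
    """
    root = ET.fromstring(tei_source)

    # Collect every identifier declared in the document.
    ids = set()
    for el in root.iter():
        for key in (XML_ID, "id"):
            if key in el.attrib:
                ids.add(el.attrib[key])

    problems = []
    for el in root.iter():
        if not isinstance(el.tag, str) or el.tag.rsplit("}", 1)[-1] != "ref":
            continue
        target = el.get("target")
        if not target or "://" in target:
            continue
        if target.startswith("#"):
            if target[1:] not in ids:
                problems.append(target)
        else:
            problems.append(target)
    return problems
```

For the example above, the target `s669` would be reported because it has no leading `#`, which is why it is read as a file name rather than as a pointer into the same document.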

    I wonder what the Perseus people think this is?

     
  • Pablo Rodriguez

    Pablo Rodriguez - 2012-09-08

    Sorry for the delayed reply.

    I'm afraid I cannot work with teitoepub (don't worry, this is my personal issue).

    I guess that setting the default to splitLevel=1 could be a sensible policy in general (I don't mean just for my own sake).

    As for how the Perseus people encoded the text, I'm afraid this is all Greek to me ;-). I actually don't know what they had in mind with that. But after checking the file, I guess they mean a section number within the same book. If I'm not mistaken, this is wrongly encoded.

    Many thanks for your help,

    Pablo

     
  • Sebastian Rahtz

    Sebastian Rahtz - 2012-09-09

    setting the splitLevel higher by default would produce odd results for many texts, as
    the second level <div>s are often very small. I probably need to change OxGarage to allow
    arbitrary parameters to be passed in.

    the references do indeed seem to indicate that the encoding is simply wrong.

    do you actually _want_ this text as ePub, or is it just a learning experience?

     
  • Pablo Rodriguez

    Pablo Rodriguez - 2012-09-09

    It would be perfect for me if you allowed arbitrary parameter settings in OxGarage.

    I use this text as an ePub testing experience. It would be nice to be able to read it, but it contains so many issues that it is too slow to render on an ereader (font embedding, to name only one).

     
  • Sebastian Rahtz

    Sebastian Rahtz - 2012-10-03
    • status: open --> closed-rejected
     
  • Sebastian Rahtz

    Sebastian Rahtz - 2012-10-03

    I am going to close this ticket. I think the example text is not normal, and can be coped with by a profile which changes the setting.