
Render img's 'alt' attribute?

David
2010-03-18
2013-01-03
  • David

    David - 2010-03-18

    Hi,

    it looks like there's no way (or at least I haven't found one) to replace images with their 'alt' attribute when rendering an HTML page using Source.Renderer. Have I missed something? If not, would that not be a nice feature to add? The purpose of the 'alt' attribute is to provide a textual description of an image for clients that cannot display it. I think that would make sense in this context, since the renderer's goal is to provide a visually sensible and "accurate" representation of a page.

    cheers,

    pagod

     
  • Martin Jericho

    Martin Jericho - 2010-03-18

    Hi Pagod,

    This would be a good feature.  Do you have a suggestion on how it should be rendered?  e.g. enclosed in square brackets or something?

    Martin

     
  • David

    David - 2010-03-22

    Hey Martin,

    sorry for the late answer, was gone for the weekend.
    Well, as far as representation is concerned, I definitely think it should be easy to automatically differentiate "alt" text from "normal text". The reason for this is that a program using that output (e.g. for reading aloud or image indexing) should not have to rely on heuristics to decide what to do with the text. I would therefore go for e.g. double brackets or some other unusual sequence of punctuation symbols.

    I think the same goes for the link URLs that the renderer includes in the text when Renderer.setIncludeHyperlinkURLs is enabled.

    On a slightly more general note, the output of the renderer might be preferred over that of the text extractor even for "simple" text extraction needs, as it retains (part of) the structure of the text, which might be very valuable, if only for text analysis purposes. That means the output of the renderer might be used for purposes other than visual display, and could therefore be optimized for automatic processing. Being able to set the start and end separators for links, image text, and so on would of course also do the trick, allowing each developer to adapt the output to their own needs.

    Cheers,

    Pagod

     
  • Martin Jericho

    Martin Jericho - 2010-05-16

    Hi Pagod,

    I have added Renderer.setIncludeAlternateText(boolean) and Renderer.renderAlternateText(StartTag) methods to v3.2.  The default implementation uses square brackets to delimit the alternate text as it looks better than double angle quotes.

    I disagree with the notion of parsing the output of the Renderer class and choosing delimiters based on this possibility.  Any application that needs to analyse the structure of the text for reading aloud, image indexing or any other purpose should be working with the marked up content directly.

    Until version 3.2 is officially released, the development version is available here: 
    http://jericho.htmlparser.net/temp/jericho-html-3.2-dev.zip

    Cheers 
    Martin
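
To make the delimiter discussion above concrete, here is a minimal, library-independent sketch of replacing img tags with their 'alt' text wrapped in caller-chosen delimiters. It is not Jericho's implementation: the class AltTextSketch and the method replaceImagesWithAlt are hypothetical names, and the regex matching is deliberately naive (it assumes quoted attribute values and no '>' inside them). Real code should rely on a proper parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Jericho API.
final class AltTextSketch {
    // Naive match for a whole <img ...> tag (assumes no '>' inside attribute values).
    private static final Pattern IMG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Naive match for a quoted alt attribute preceded by whitespace.
    private static final Pattern ALT =
            Pattern.compile("\\salt\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
                    Pattern.CASE_INSENSITIVE);

    // Replaces each <img> tag with open + alt + close; images whose alt
    // attribute is missing or empty are dropped entirely.
    static String replaceImagesWithAlt(String html, String open, String close) {
        Matcher img = IMG.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (img.find()) {
            out.append(html, last, img.start());   // copy text before the tag
            Matcher alt = ALT.matcher(img.group());
            if (alt.find()) {
                String text = alt.group(1) != null ? alt.group(1) : alt.group(2);
                if (!text.isEmpty()) {
                    out.append(open).append(text).append(close);
                }
            }
            last = img.end();
        }
        out.append(html, last, html.length());     // copy the remainder
        return out.toString();
    }
}
```

Calling replaceImagesWithAlt(html, "[", "]") mirrors the square-bracket default Martin describes, while passing "[[" and "]]" gives the more machine-distinguishable form Pagod suggested. With Jericho 3.2 itself, the methods named in the post above (Renderer.setIncludeAlternateText(boolean) and Renderer.renderAlternateText(StartTag)) are the place to look.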

     
