Creating Full Page Images

Trey Spiva
2008-01-18
2012-12-15
  • Trey Spiva

    Trey Spiva - 2008-01-18

    I have started to play with khtml2png and have been able to create images of web sites.  However, to create an image of a page that is taller than 600 pixels I have to specify a larger height.  If I specify a height that is greater than the page's contents, I just get a lot of extra space at the bottom of the image.

    So, my question is how do I determine the optimal height for a page?  Is there a setting that will specify the "best" height for a page?

     
    • Florent Bruneau

      Florent Bruneau - 2008-01-18

      Hi,

      khtml2png has a "--auto 'id'" option that can be used to detect the position and size of the tag with the given id in the page.

      Suppose I have the following HTML code:

      <html>
      <body>
         <div id="mycontent">
            Here is the content of the page.
         </div>
         <div id="mymenu">
           Here is the menu of the page
         </div>
      </body>
      </html>

      then, running `khtml2png --auto mycontent` will automatically detect the size of the <div> with id "mycontent" and take a screenshot of the corresponding box.

      Another situation is the following:

      <html>
        <body>
          MyContent
          <div id="mydelimiter"></div>
        </body>
      </html>

      In this second case, the div "mydelimiter" is empty, so khtml2png treats it as a delimiter rather than a box. `khtml2png --auto mydelimiter` will take a screenshot from the top-left corner of the page to the bottom-right corner of the delimiter.
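
The box-versus-delimiter rule described above can be sketched in a few lines of C++. This is only an illustration, not khtml2png's actual code: `Rect` is a simplified stand-in for Qt's `QRect` (its `right()` and `bottom()` are exclusive edges, unlike Qt's), and `captureArea` is a hypothetical helper name.

```cpp
#include <cassert>

// Simplified stand-in for a rectangle (assumption: the real tool uses QRect).
struct Rect {
    int x, y, width, height;
    bool isEmpty() const { return width <= 0 || height <= 0; }
    int right() const { return x + width; }    // exclusive right edge
    int bottom() const { return y + height; }  // exclusive bottom edge
};

// Hypothetical helper: choose the area to capture for the element
// found via --auto <id>.
Rect captureArea(const Rect& marker) {
    assert(marker.x >= 0 && marker.y >= 0);  // page coordinates start at 0,0
    if (marker.isEmpty()) {
        // Empty element: treat it as a delimiter and capture everything
        // from the page origin to the element's bottom-right corner.
        return Rect{0, 0, marker.right(), marker.bottom()};
    }
    // Non-empty element: capture exactly its own box.
    return marker;
}
```

Under these assumptions, a 1024-pixel-wide empty div located 950 pixels down the page yields a capture area of 1024x950 starting at the top-left corner, while a non-empty div is captured as-is.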

      Hope this helps.

       
    • Trey Spiva

      Trey Spiva - 2008-01-18

      This solution sounds effective only if you own the page.  It does not sound like it will work for an arbitrary web page.  If khtml2png can find the size of a tag with an ID, why not find the height of the <html> tag?

       
      • Florent Bruneau

        Florent Bruneau - 2008-01-18

        Most pages use ids to structure their HTML. Even if you don't own the page, you can find a suitable id by looking at the HTML code of the page.

        The <html> tag is not rendered (it's only a meta-markup to tell the interpreter "This is HTML data"); the size of the page is defined by the <body> tag. I'll run some tests to find out whether a "detect body" option can work.

         
      • Florent Bruneau

        Florent Bruneau - 2008-03-09

        Hi,

        I've made a patch that allows taking a screenshot of the <body> tag. It detects the height of the contents correctly, but you have to give the width of the screenshot manually (the <body> tag is stretched to fit the width of the browser window, so its width is not meaningful).

        diff --git a/khtml2png.cpp b/khtml2png.cpp
        index 8b48729..40bcdac 100644
        --- a/khtml2png.cpp
        +++ b/khtml2png.cpp
        @@ -50,7 +50,7 @@
          **parameter height: Height of the screenshot (if id is empty)
          **/
        KHTML2PNG::KHTML2PNG(const KCmdLineArgs* const args)
        -:KApplication(), m_html(0), pix(0)
        +:KApplication(), m_html(0), getBody(false), pix(0)
        {
             const QString width  = args->getOption("width");
             const QString height = args->getOption("height");
        @@ -58,6 +58,7 @@ KHTML2PNG::KHTML2PNG(const KCmdLineArgs* const args)
             const QString scaledHeight = args->getOption("scaled-height");
             autoDetectId = args->getOption("auto");
             timeoutMillis = args->getOption("time").toUInt() * 1000;
        +    getBody = args->isSet("get-body");
             show = !args->isSet("disable-window");

             rect = QRect(0, 0, width.isEmpty() ? -1 : width.toInt(), height.isEmpty() ? -1 : height.toInt());
        @@ -205,20 +206,32 @@ void KHTML2PNG::resizeClipper(const int width, const int height)

        /**
          **name slotCompleted()
        - **description Searches for the position of a HTML element to use as screenshot size marker or sets the m_completed variable.
        + **descr
          **/
        void KHTML2PNG::completed()
        {
             loadingCompleted = true;
        -    if (!detectionCompleted && !autoDetectId.isEmpty())
        +    if (!detectionCompleted && (getBody || !autoDetectId.isEmpty()))
             {
                 //search for the HTML element
        -        DOM::Node markerNode = m_html->htmlDocument().all().namedItem(autoDetectId);
        +        DOM::Node markerNode;
        +        if (getBody)
        +        {
        +            markerNode = m_html->htmlDocument().body();
        +        }
        +        else
        +        {
        +            markerNode = m_html->htmlDocument().all().namedItem(autoDetectId);
        +        }

                 if (!markerNode.isNull())
                 {
                     //get its position
        -            rect = m_html->htmlDocument().all().namedItem(autoDetectId).getRect();
        +            QRect tmpRect = m_html->htmlDocument().all().namedItem(autoDetectId).getRect();
        +            if (getBody && rect.width() > 0) {
        +                tmpRect.setWidth(rect.width());
        +            }
        +            rect = tmpRect;
                     if (rect.isEmpty()) {
                         rect = QRect(0, 0, rect.right(), rect.bottom());
                     }
        @@ -227,7 +240,11 @@ void KHTML2PNG::completed()
                     right = right > rect.right() ? right : rect.right() + 200;
                     bottom = bottom > rect.bottom() ? bottom : rect.bottom() + 200;
                     resizeClipper(right, bottom);
        -            rect = m_html->htmlDocument().all().namedItem(autoDetectId).getRect();
        +            tmpRect = m_html->htmlDocument().all().namedItem(autoDetectId).getRect();
        +            if (getBody) {
        +                tmpRect.setWidth(rect.width());
        +            }
        +            rect = tmpRect;
                     if (rect.isEmpty()) {
                         rect = QRect(0, 0, rect.right(), rect.bottom());
                     }
        @@ -424,6 +441,8 @@ static KCmdLineOptions options[] =
             { "t", 0, 0},
             { "time <time>", "Maximum time in seconds to spend loading page", "30" },
             { "auto <id>", "Use this option if you to autodetect the bottom/right border", "" },
        +    { "get-body", "Autodected the body of the page (if width is not detected, use --width)", 0 },
        +    { "b", 0, 0 },
             { "disable-window", "If set, don't show the window when doing rendering (can lead to missing items)", 0 },
             { "disable-js", "Enable/Disable javascript (enabled by default)", 0 },
             { "disable-java", "Enable/Disable java (enabled by default)", 0},
        diff --git a/khtml2png.h b/khtml2png.h
        index 6a7b20a..40a8631 100644
        --- a/khtml2png.h
        +++ b/khtml2png.h
        @@ -41,6 +41,7 @@ class KHTML2PNG : public KApplication
             bool show;

             QString autoDetectId;
        +    bool getBody;
             QString filename;
             QRect   rect;
             QSize   scaled;
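
The core idea of the patch (take the height from the <body> element, but let a user-supplied width override the body's width) can be sketched on its own. This is a simplified illustration rather than the patch itself: `Rect` stands in for Qt's `QRect`, `bodyCaptureArea` is a hypothetical helper name, and a non-positive width is assumed to mean "no --width given", mirroring the -1 default in the constructor above.

```cpp
// Simplified stand-in for a rectangle (assumption: the real tool uses QRect).
struct Rect {
    int x, y, width, height;
};

// Hypothetical helper: build the capture area for --get-body.
// body           - rectangle reported for the <body> element
// requestedWidth - value of --width, or a non-positive number if unset
Rect bodyCaptureArea(const Rect& body, int requestedWidth) {
    Rect area = body;  // height (and position) come from the body
    if (requestedWidth > 0) {
        // The body stretches to the window width, so its own width says
        // nothing about the content; prefer the width the user asked for.
        area.width = requestedWidth;
    }
    return area;
}
```

With a body reported as 1280x2400 and `--width 1024`, this sketch yields a 1024x2400 capture; without a width, the body's own (window-sized) width is kept.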

         
    • Trey Spiva

      Trey Spiva - 2008-10-15

      Sorry that I have not replied in such a long time.  I have been working on other aspects of the application for a while. 

      I have downloaded 2.7.5 and I am trying to use this feature. However I have not been able to figure out how to tell khtml2png2 to use the body to determine the height.  Did you put in a new switch?

       
    • Trey Spiva

      Trey Spiva - 2008-10-15

      I was able to apply the patch to my 2.7.5 code base and everything works correctly.  Thanks.

      Are you planning to put this patch into the release?

       
      • Florent Bruneau

        Florent Bruneau - 2008-10-23

        > Are you planning to put this patch into the release?

        Hauke released khtml2png 2.7.6 with this patch 2 days ago.

         
        • Leonid Evdokimov

          I've tested 2.7.6 and found that:
          khtml2png2 --disable-redirect --width 1024 --get-body http://foo.bar foobar.png
          gives me an image with a gray area below ~760 pixels, BUT
          khtml2png2 --disable-window --disable-redirect --width 1024 --get-body http://foo.bar foobar.png
          works much better: the whole area contains text and images and looks like the website.

          But --disable-window misbehaves when grabbing http://linux.org.ru: it looks like an out-of-memory condition, even though I have ~1 GB of free RAM.
          I'll give more test results later.

          $ khtml2png2 --version
          Qt: 3.3.8
          KDE: 3.5.9
          KHTML2PNG: 2.7.6

          Gentoo Linux, xorg-server-1.3.0.0-r6 (using Radeon 7500 mobility).

           
          • Leonid Evdokimov

            I tested a bit more. It was not OOM: khtml2png made the X server consume 100% of CPU time, so the X session became laggy.

            Using Xvfb instead of the usual X server eliminated the lag.

             
