
How to parse this page? Help me.

Help
Anonymous
2004-05-11
2004-05-11
  • Anonymous

    Anonymous - 2004-05-11

    I am trying to parse this page (http://www.hboasia.com/schedule/index.jsp?curCountry=84), but I can't get the result I expect. It is the schedule of my favorite TV channel. I just want to get the title, start time and duration of all the movies for the current day. Can anybody help me?

    • Derrick Oswald

      Derrick Oswald - 2004-05-11

      This is pretty trivial, entry-level programmer stuff. Formatting the output takes more code than the actual parsing.

      I've included the code inline in this post, but there's a chance it will get mangled by the SourceForge system:

      import org.htmlparser.filters.HasAttributeFilter;
      import org.htmlparser.util.NodeList;
      import org.htmlparser.NodeFilter;
      import org.htmlparser.Parser;

      public class HBOReader
      {
          // Cleans up the plain text of one matching element and prints it.
          // Text that does not start with whitespace is skipped.
          public static void format (String string)
          {
              string = string.replaceAll ("\t", "");      // drop tabs
              string = string.replaceAll ("\n", " ");     // newlines become spaces
              string = string.replaceAll (" {2,}", " ");  // collapse runs of spaces

              if (string.startsWith (" "))
              {
                  string = string.substring (1);          // drop the leading space
                  // strip the trailing " more >> " link text, if present
                  if (string.endsWith (" more >> "))
                      string = string.substring (0, string.length () - 9);
                  System.out.println (string);
              }
          }

          public static void main (String[] args) throws Exception
          {
              // match every element whose class attribute is "contentdesc"
              NodeFilter filter = new HasAttributeFilter ("class", "contentdesc");
              Parser parser = new Parser ("http://www.hboasia.com/schedule/index.jsp?curCountry=84");
              NodeList list = parser.extractAllNodesThatMatch (filter);

              // print the cleaned-up text of each matching element
              for (int i = 0; i < list.size (); i++)
                  format (list.elementAt (i).toPlainTextString ());
          }
      }

      Produces:
      52 Pick-Up (Starts 5:00 AM) Duration - 1:39 hr
      The Powerpuff Girls Movie (Starts 6:45 AM) Duration - 1:30 hr
      Dance Macabre (Starts 8:00 AM) Duration - 1:32 hr
      Family Under Siege (Starts 9:45 AM) Duration - 1:30 hr
      Beethoven's 4th (Starts 11:30 AM) Duration - 1:29 hr
      Save The Last Dance (Starts 1:00 PM) Duration - 1:48 hr
      Desert Hawk (Starts 3:00 PM) Duration - 1:26 hr
      Streets Of Fire (Starts 4:30 PM) Duration - 1:29 hr
      The Animal (Starts 6:00 PM) Duration - 1:19 hr
      Tremors 3: Back To Perfection (Starts 7:20 PM) Duration - 1:39 hr
      Six Feet Under Back to Back Special (Ep 22-23) (Starts 9:00 PM) Duration - 1:46 hr
      The Amazing Panda Adventure (Starts 10:55 PM) Duration - 1:20 hr
      Beethoven's 4th (Starts 12:20 AM) Duration - 1:29 hr
      Lone Star State Of Mind (Starts 2:00 AM) Duration - 1:24 hr
      The Animal (Starts 3:30 AM) Duration - 1:19 hr
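      The cleanup done by format can be checked without network access. Below is a stdlib-only sketch of the same steps; the class name ScheduleText and method clean are hypothetical, and it returns the cleaned text (or null for skipped fragments) instead of printing it. The sample input in main is invented to resemble what toPlainTextString() might return for one entry; it was not captured from the real page.

```java
/** Hypothetical helper: a stdlib-only re-implementation of the cleanup in HBOReader.format. */
final class ScheduleText {
    private static final String MORE_LINK = " more >> ";

    /** Returns the cleaned text, or null when the fragment would be skipped. */
    static String clean(String text) {
        String s = text.replace("\t", "")        // drop tabs entirely
                       .replace("\n", " ")       // newlines become spaces
                       .replaceAll(" {2,}", " "); // collapse runs of spaces
        if (!s.startsWith(" ")) {
            return null;                         // same filter as format: skip it
        }
        s = s.substring(1);                      // drop the single leading space
        if (s.endsWith(MORE_LINK)) {             // strip the "more >>" link text
            s = s.substring(0, s.length() - MORE_LINK.length());
        }
        return s;
    }

    public static void main(String[] args) {
        // Invented sample shaped like the plain text of one schedule entry.
        String sample = "\n\t\t52 Pick-Up\n\t\t(Starts 5:00 AM)\n\t\tDuration - 1:39 hr\n\t\t more >> ";
        System.out.println(clean(sample));
    }
}
```

      Running main prints "52 Pick-Up (Starts 5:00 AM) Duration - 1:39 hr".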

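      The question asked for the title, start time and duration as separate pieces. A minimal sketch that splits one printed line with a regular expression, assuming every line follows the "Title (Starts h:mm AM) Duration - h:mm hr" layout shown in the output above; ScheduleLine and parse are hypothetical names, not part of HTML Parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that splits one line of the schedule output into fields. */
final class ScheduleLine {
    // Assumes the layout printed above: "<title> (Starts h:mm AM|PM) Duration - h:mm hr"
    private static final Pattern LINE = Pattern.compile(
            "^(.+) \\(Starts (\\d{1,2}:\\d{2} [AP]M)\\) Duration - (\\d+):(\\d{2}) hr$");

    final String title;
    final String start;        // kept as the page's 12-hour text, e.g. "5:00 AM"
    final int durationMinutes; // "1:39 hr" becomes 99

    private ScheduleLine(String title, String start, int durationMinutes) {
        this.title = title;
        this.start = start;
        this.durationMinutes = durationMinutes;
    }

    /** Returns null if the line does not follow the assumed layout. */
    static ScheduleLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        int minutes = Integer.parseInt(m.group(3)) * 60 + Integer.parseInt(m.group(4));
        return new ScheduleLine(m.group(1), m.group(2), minutes);
    }

    public static void main(String[] args) {
        ScheduleLine s = parse("Six Feet Under Back to Back Special (Ep 22-23) (Starts 9:00 PM) Duration - 1:46 hr");
        System.out.println(s.title + " | " + s.start + " | " + s.durationMinutes + " min");
    }
}
```

      The greedy (.+) group backtracks to the last " (Starts " marker, so titles that contain their own parentheses, such as the Six Feet Under entry, stay intact. Running main prints "Six Feet Under Back to Back Special (Ep 22-23) | 9:00 PM | 106 min".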
