I am trying to parse this page (http://www.hboasia.com/schedule/index.jsp?curCountry=84), but I can't get the result I expect. It is the schedule for my favorite TV channel. I just want the title, start time, and duration of every movie showing today. Can anybody help me?
This is pretty trivial, entry-level programming. Formatting the output takes more code than the actual parsing.
I'm including the code inline in this forum post, but there's a chance it will get mangled by the SourceForge system:
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;

public class HBOReader
{
    // Normalize the whitespace of one node's text and print it on a single line.
    public static void format (String string)
    {
        int length;

        string = string.replaceAll ("\t", "");
        string = string.replaceAll ("\n", " ");
        // Squeeze runs of spaces: replace two spaces with one until the
        // length stops changing.
        do
        {
            length = string.length ();
            string = string.replaceAll ("  ", " ");
        }
        while (length != string.length ());
        // Only strings that begin with a space are printed (minus that space).
        if (string.startsWith (" "))
        {
            // Drop the trailing " more >> " link text (9 characters).
            if (string.endsWith (" more >> "))
                string = string.substring (0, string.length () - 9);
            System.out.println (string.substring (1));
        }
    }

    public static void main (String[] args) throws Exception
    {
        Parser parser;
        NodeFilter filter;
        NodeList list;

        // Match every tag whose class attribute is "contentdesc".
        filter = new HasAttributeFilter ("class", "contentdesc");
        parser = new Parser ("http://www.hboasia.com/schedule/index.jsp?curCountry=84");
        list = parser.extractAllNodesThatMatch (filter);
        for (int i = 0; i < list.size (); i++)
            format (list.elementAt (i).toPlainTextString ());
    }
}
Produces:
52 Pick-Up (Starts 5:00 AM) Duration - 1:39 hr
The Powerpuff Girls Movie (Starts 6:45 AM) Duration - 1:30 hr
Dance Macabre (Starts 8:00 AM) Duration - 1:32 hr
Family Under Siege (Starts 9:45 AM) Duration - 1:30 hr
Beethoven's 4th (Starts 11:30 AM) Duration - 1:29 hr
Save The Last Dance (Starts 1:00 PM) Duration - 1:48 hr
Desert Hawk (Starts 3:00 PM) Duration - 1:26 hr
Streets Of Fire (Starts 4:30 PM) Duration - 1:29 hr
The Animal (Starts 6:00 PM) Duration - 1:19 hr
Tremors 3: Back To Perfection (Starts 7:20 PM) Duration - 1:39 hr
Six Feet Under Back to Back Special (Ep 22-23) (Starts 9:00 PM) Duration - 1:46 hr
The Amazing Panda Adventure (Starts 10:55 PM) Duration - 1:20 hr
Beethoven's 4th (Starts 12:20 AM) Duration - 1:29 hr
Lone Star State Of Mind (Starts 2:00 AM) Duration - 1:24 hr
The Animal (Starts 3:30 AM) Duration - 1:19 hr
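Since the original question asked for the title, time and duration as separate values, here is a rough sketch of how each printed line could be split further. It is not part of the HTML Parser library and uses only java.util.regex. The class name ScheduleLineParser, the nested Entry type and the regular expression are all made up for illustration, and the pattern assumes every line looks exactly like the output above ("title (Starts h:mm AM/PM) Duration - h:mm hr"); if the site changes its wording, the pattern would need adjusting.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTML Parser: splits one formatted
// schedule line into title, start time and duration.
class ScheduleLineParser
{
    // Assumed line shape: "<title> (Starts <h:mm AM|PM>) Duration - <h:mm> hr".
    // The greedy (.*) keeps parentheses that belong to the title itself,
    // such as "(Ep 22-23)".
    private static final Pattern LINE = Pattern.compile (
        "^(.*) \\(Starts (\\d{1,2}:\\d{2} [AP]M)\\) Duration - (\\d{1,2}):(\\d{2}) hr$");

    static final class Entry
    {
        final String title;
        final String start;
        final int durationMinutes;

        Entry (String title, String start, int durationMinutes)
        {
            this.title = title;
            this.start = start;
            this.durationMinutes = durationMinutes;
        }
    }

    // Returns null when the line does not match the assumed shape.
    static Entry parse (String line)
    {
        Matcher matcher = LINE.matcher (line.trim ());
        if (!matcher.matches ())
            return null;
        int minutes = Integer.parseInt (matcher.group (3)) * 60
            + Integer.parseInt (matcher.group (4));
        return new Entry (matcher.group (1), matcher.group (2), minutes);
    }

    public static void main (String[] args)
    {
        Entry entry = parse ("52 Pick-Up (Starts 5:00 AM) Duration - 1:39 hr");
        // Prints: 52 Pick-Up | 5:00 AM | 99 min
        System.out.println (entry.title + " | " + entry.start
            + " | " + entry.durationMinutes + " min");
    }
}
```

To use it with the program above, format () could hand the cleaned-up string to ScheduleLineParser.parse () instead of printing it, and skip any line for which parse () returns null.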