jtidy-user Mailing List for JTidy (Page 4)

Brought to you by: aditsu, atripp, fgiust, garypeskin, and 4 others

jtidy-user — JTidy mailing list for users




[Jtidy-user] HTML 4.01

From: 周鹏 <zho...@gm...> - 2010-07-14 08:15:23

Hi!
    I'm using JTidy to convert this page:

http://sports.yahoo.com/mlb/news;_ylt=AgnhXKUYhDvpSQ1TCuC_5SE5nYcB?slug=ap-obit-steinbrenner

    Here is my code:
        Tidy tidy = new Tidy();
        tidy.setXHTML(true);
        InputStream is = new FileInputStream("1.html"); // 1.html is the page linked above
        OutputStream os = new FileOutputStream("result.xml");
        tidy.parseDOM(is, os);
        ......
    It doesn't work correctly. Here is the log:
        InputStream: Doctype given is "-//W3C//DTD HTML 4.01//EN"
        InputStream: Document content looks like HTML 4.01 Transitional
        330 warnings, 19 errors were found!
        This document has errors that must be fixed before
        using HTML Tidy to generate a tidied up version.

 Can anyone help me?
 Sorry for my poor English!

[Jtidy-user] [Open Discussion] add content-type meta tag

From: SourceForge.net <no...@so...> - 2010-07-12 16:05:35

The following forum message was posted by weberjn at http://sourceforge.net/projects/jtidy/forums/forum/41436/topic/3767679:

Hi,
can you make jTidy output a content-type meta tag like:
[code]<meta http-equiv="content-type" content="text/html; charset=utf-8" />[/code] ?

Or would I have to add an element to the head element manually?

Thanks, 
Juergen
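Assuming there is no JTidy setting that emits this tag, one way is to add the element to the DOM yourself before printing it. The sketch below uses only standard org.w3c.dom calls and was written against the JDK's own DOM; whether the DOM returned by `tidy.parseDOM` handles `createElement`, `setAttribute` and `insertBefore` the same way is an assumption. `ContentTypeMeta` and its methods are hypothetical names, and `parse` merely stands in for JTidy's parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: adds a content-type <meta> element to an existing DOM. */
final class ContentTypeMeta {

    /** Parses well-formed markup with the JDK parser (stands in for tidy.parseDOM here). */
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    /**
     * Inserts <meta http-equiv="content-type" content="text/html; charset=..."> as the
     * first child of the first <head>. Returns false if the document has no <head>.
     */
    static boolean addContentTypeMeta(Document doc, String charset) {
        NodeList heads = doc.getElementsByTagName("head");
        if (heads.getLength() == 0) {
            return false;
        }
        Element head = (Element) heads.item(0);
        Element meta = doc.createElement("meta");
        meta.setAttribute("http-equiv", "content-type");
        meta.setAttribute("content", "text/html; charset=" + charset);
        // A null reference node makes insertBefore append, so an empty <head> works too.
        head.insertBefore(meta, head.getFirstChild());
        return true;
    }

    public static void main(String[] args) {
        Document doc = parse("<html><head><title>t</title></head><body/></html>");
        addContentTypeMeta(doc, "utf-8");
        Element first = (Element) doc.getElementsByTagName("head").item(0).getFirstChild();
        System.out.println(first.getTagName() + ": " + first.getAttribute("content"));
    }
}
```

With JTidy, the document would instead come from `tidy.parseDOM(in, null)` and be written with `tidy.pprint(doc, out)`.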

[Jtidy-user] [Help] Problems with pprint

From: SourceForge.net <no...@so...> - 2010-07-02 12:21:08

The following forum message was posted by asheara at http://sourceforge.net/projects/jtidy/forums/forum/41437/topic/3758644:

Hi All,
I'm trying to extract a div element, with its content, from an HTML file. Then I want to write a new HTML file whose only content is the extracted div. Everything's OK until the point where I call tidy.pprint(w3cDoc, bos);
After this pprint call I inspect bos and it is empty. I had never worked with JTidy before, and it's hard to find examples or tutorials. Any ideas?

Thank you,

This is the complete code block
[code]Document doc = tidy.parseDOM(new FileInputStream("myFile.html"), null);
DOMReader reader = new DOMReader();
org.dom4j.Document dom4jDoc = reader.read(doc);
String node = "//div[@id='contenedor']";
Node myNode = dom4jDoc.selectSingleNode(node);

myNode.setDocument(null);
myNode.setParent(null);
// Create new Document
org.dom4j.Document newHTML = DocumentHelper.createDocument();

newHTML.add(myNode);
DOMWriter writer = new DOMWriter();
try {
    Document w3cDoc = writer.write(newHTML);
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    tidy.pprint(w3cDoc, bos);
    // bos is still empty at this point
} catch (DocumentException e) {
    throw new RuntimeException(e);
}[/code]
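A likely cause, hedged: in JTidy's source, `pprint(org.w3c.dom.Document, OutputStream)` appears to print only documents that JTidy itself created, and a document produced by dom4j's `DOMWriter` is not one. Assuming that is the problem, the sketch below avoids `pprint` entirely: it copies the matching `<div>` into a fresh JDK document with `importNode` and serializes it with the standard identity `Transformer`. It uses plain DOM instead of dom4j; `ExtractDiv` and its methods are hypothetical names, `parse` stands in for `tidy.parseDOM`, and whether `importNode` copies cleanly from JTidy's DOM implementation is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class ExtractDiv {

    /** Stand-in for tidy.parseDOM: parses already well-formed markup with the JDK parser. */
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    /** Returns the first <div> whose id attribute equals id, or null if there is none. */
    static Element findDivById(Document doc, String id) {
        NodeList divs = doc.getElementsByTagName("div");
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            if (id.equals(div.getAttribute("id"))) {
                return div;
            }
        }
        return null;
    }

    /** Deep-copies the element into a new, empty document whose only content is that copy. */
    static Document copyIntoNewDocument(Element element) {
        try {
            Document target = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            target.appendChild(target.importNode(element, true));
            return target;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes any org.w3c.dom.Node with the JDK identity transformer, without an XML declaration. */
    static String serialize(Node node) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document page = parse("<html><body><p>skip</p>"
                + "<div id=\"contenedor\"><b>keep</b></div></body></html>");
        Element div = findDivById(page, "contenedor");
        System.out.println(serialize(copyIntoNewDocument(div)));
    }
}
```

The output is XML-style markup, not Tidy's pretty-printed HTML; if Tidy's formatting is required, another option is to write this string out and run it through JTidy again.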

[Jtidy-user] [Help] RE: Tidy and linebreaks after <BR> in...

From: SourceForge.net <no...@so...> - 2010-06-07 17:28:35

The following forum message was posted by  at http://sourceforge.net/projects/jtidy/forums/forum/41437/topic/3683558:

Sure.

One more thing: I'm also finding that we get line breaks after <b> and <i> in <pre> tags. Is this part of the bug?

[Jtidy-user] [Help] how to process specific tag

From: SourceForge.net <no...@so...> - 2010-05-16 20:29:32

The following forum message was posted by Anonymous at http://sourceforge.net/projects/jtidy/forums/forum/41437/topic/3711387:

I'm processing badly formatted HTML pages with JTidy. I am only interested in fixing a specific set of tags, for example <img> and <table>. Is there any way to tell JTidy to focus on only those tags?
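Assuming there is no JTidy option that limits its repairs to particular tags, one approach is to let JTidy parse the whole page and then apply your own changes only to the elements you care about. A minimal sketch with standard DOM calls follows; `SelectiveFix`, `forEachElement` and the example fix `addMissingAlt` (which gives an `<img>` an empty `alt`) are hypothetical, and `parse` stands in for `tidy.parseDOM`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class SelectiveFix {

    /** Stand-in for tidy.parseDOM: parses already well-formed markup with the JDK parser. */
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    /**
     * Walks the tree under node and calls action on each element whose lowercased tag
     * name is in tagNames (which should hold lowercase names). Returns the match count.
     * The action should only change attributes: moving or removing nodes during the
     * walk would disturb the sibling links it follows.
     */
    static int forEachElement(Node node, Set<String> tagNames, Consumer<Element> action) {
        int matched = 0;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            if (tagNames.contains(element.getTagName().toLowerCase(Locale.ROOT))) {
                action.accept(element);
                matched++;
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            matched += forEachElement(child, tagNames, action);
        }
        return matched;
    }

    /** Example fix (hypothetical): give an <img> an empty alt attribute if it has none. */
    static void addMissingAlt(Element element) {
        if (element.getTagName().equalsIgnoreCase("img") && !element.hasAttribute("alt")) {
            element.setAttribute("alt", "");
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><img src=\"a.png\"/><p>x</p>"
                + "<table><tr><td>1</td></tr></table></body></html>");
        int visited = forEachElement(doc, Set.of("img", "table"), SelectiveFix::addMissingAlt);
        System.out.println(visited + " element(s) visited");
    }
}
```

Note that JTidy itself will still repair the rest of the page while parsing; this only restricts the changes you make afterwards.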

[Jtidy-user] [Help] New to JTidy

From: SourceForge.net <no...@so...> - 2010-05-14 06:58:48

The following forum message was posted by viswavaranasi at http://sourceforge.net/projects/jtidy/forums/forum/41437/topic/3708791:

Hi,
Can JTidy be used to delete unwanted HTML tags from an HTML file?
Is it possible with JTidy to replace some HTML tags with other HTML tags?

For example: in an HTML file, replace all <h1> with <h2>.

Please point me to some examples or tutorials on JTidy.

Thanks
viswa
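A sketch of both operations on the DOM, assuming the `Document` returned by `tidy.parseDOM` supports `createElement`, `appendChild`, `insertBefore` and `replaceChild` the way the JDK's DOM does. It avoids the DOM Level 3 `Document.renameNode`, since JTidy's DOM does not implement every Level 3 method. `TagEdits` and its methods are hypothetical names, and `parse` stands in for JTidy's parser; the edited tree could then be written out with `tidy.pprint(doc, out)`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class TagEdits {

    /** Stand-in for tidy.parseDOM: parses already well-formed markup with the JDK parser. */
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    /** getElementsByTagName returns a live list, so copy it before changing the tree. */
    private static List<Element> snapshot(NodeList list) {
        List<Element> copy = new ArrayList<>();
        for (int i = 0; i < list.getLength(); i++) {
            copy.add((Element) list.item(i));
        }
        return copy;
    }

    /** Replaces every <from> element with a <to> element carrying the same attributes and children. */
    static int renameElements(Document doc, String from, String to) {
        List<Element> targets = snapshot(doc.getElementsByTagName(from));
        for (Element old : targets) {
            Element replacement = doc.createElement(to);
            NamedNodeMap attributes = old.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attribute = attributes.item(i);
                replacement.setAttribute(attribute.getNodeName(), attribute.getNodeValue());
            }
            // appendChild moves a node, so this drains old's children into replacement.
            while (old.getFirstChild() != null) {
                replacement.appendChild(old.getFirstChild());
            }
            old.getParentNode().replaceChild(replacement, old);
        }
        return targets.size();
    }

    /** Removes every <tagName> element; with keepContent its children are moved into its place. */
    static int removeElements(Document doc, String tagName, boolean keepContent) {
        List<Element> targets = snapshot(doc.getElementsByTagName(tagName));
        for (Element element : targets) {
            Node parent = element.getParentNode();
            if (keepContent) {
                while (element.getFirstChild() != null) {
                    parent.insertBefore(element.getFirstChild(), element);
                }
            }
            parent.removeChild(element);
        }
        return targets.size();
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><h1>Title</h1><font>kept text</font>"
                + "<script>dropped()</script></body></html>");
        renameElements(doc, "h1", "h2");
        removeElements(doc, "font", true);
        removeElements(doc, "script", false);
        System.out.println(doc.getElementsByTagName("h2").getLength());
    }
}
```

`keepContent` decides between unwrapping a tag (for example `<font>`, keeping its text) and dropping it with everything inside (for example `<script>`).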

[Jtidy-user] Using jtidy with XPath

From: Misha K. <mis...@gm...> - 2010-05-12 10:03:27

Dear All:

Thank you for a great product!

I am using TagSoup + XOM as described here:
http://nicklothian.com/blog/2006/09/11/using-xpath-on-real-world-html-documents/
It seems to work well, except for the following namespace problem:
http://www.supermind.org/blog/613/dom4j-xpath-tagsoup-namespaces-sweet

Can I use JTidy for XPath? Are there any code samples? How does it compare to TagSoup/HTMLParser/Jericho, etc.?

Thank you
Misha
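One possible route, sketched under assumptions: let JTidy write XHTML (for example with `setXHTML(true)`, as in the first message on this page), then reparse that output with the JDK's namespace-unaware `DocumentBuilder` and query it with `javax.xml.xpath`. Because the builder ignores namespaces, an `xmlns="http://www.w3.org/1999/xhtml"` declaration does not force prefixed expressions such as `//h:div`, which is the problem described in the linked post. The JTidy step is not shown, the sample string stands in for its output, and `XhtmlXPath` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class XhtmlXPath {

    /** Parses XHTML without namespace awareness (the JAXP default, set explicitly here). */
    static Document parse(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(false);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    /** Evaluates an XPath expression and returns the matching nodes. */
    static NodeList selectNodes(Document doc, String expression) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    /** Evaluates an XPath expression and returns the string value of the result. */
    static String selectText(Document doc, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for XHTML written by JTidy; note the default namespace on <html>.
        Document doc = parse("<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<div id=\"a\"><p>one</p><p>two</p></div></body></html>");
        System.out.println(selectNodes(doc, "//div[@id='a']/p").getLength());
        System.out.println(selectText(doc, "//div[@id='a']/p[1]"));
    }
}
```

Real JTidy output may start with a DOCTYPE, in which case the JDK parser may try to download the DTD; this sketch does not handle that.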

[Jtidy-user] [Help] RE: getTextContent() always returning null

From: SourceForge.net <no...@so...> - 2010-04-24 08:32:42

The following forum message was posted by aditsu at http://sourceforge.net/projects/jtidy/forums/forum/41437/topic/3683463:

JTidy implements getTextContent in DOMNodeImpl (only), this way:

[code]    /**
     * @todo DOM level 3 getTextContent() Not implemented. Returns null.
     * @see org.w3c.dom.Node#getTextContent()
     */
    public String getTextContent() throws DOMException
    {
        return null;
    }[/code]

I think it's quite obvious.
Can you file a bug report?
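Until `getTextContent` is implemented, a workaround is to collect the text yourself using only pre-Level-3 DOM methods. The sketch below concatenates descendant text and CDATA nodes in document order, which is what `getTextContent` returns for an element (comments and processing instructions are skipped). It assumes JTidy's text nodes return their text from `getNodeValue`; `TextContent` is a hypothetical helper name, and `parse` stands in for `tidy.parseDOM`.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class TextContent {

    /** Stand-in for tidy.parseDOM: parses already well-formed markup with the JDK parser. */
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    /** Concatenates all descendant text and CDATA values of node, in document order. */
    static String textContent(Node node) {
        StringBuilder text = new StringBuilder();
        appendText(node, text);
        return text.toString();
    }

    private static void appendText(Node node, StringBuilder text) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            text.append(node.getNodeValue());
            return;
        }
        // Comments and processing instructions have no children, so they add nothing.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            appendText(child, text);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><head><title>jwz</title></head><body>"
                + "<p>text<b>b<i>i<u>u</u>i</i>b<br/>b</b>text</p></body></html>");
        System.out.println(textContent(doc.getElementsByTagName("body").item(0)));
    }
}
```

In the test case below, `bodyElement.getTextContent()` would become `TextContent.textContent(bodyElement)`.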

[Jtidy-user] [Help] RE: Tidy and linebreaks after <BR> in...

From: SourceForge.net <no...@so...> - 2010-04-24 07:49:20

The following forum message was posted by aditsu at http://sourceforge.net/projects/jtidy/forums/forum/41437/topic/3683558:

I think I fixed this in the CodeUpdateAndJava5 branch, but trunk still has this bug. I'd have to backport it.
Wanna file a bug report with a test case?

[Jtidy-user] [Help] Tidy and linebreaks after <BR> in...

From: SourceForge.net <no...@so...> - 2010-04-21 13:29:31

The following forum message was posted by Anonymous at http://sourceforge.net/projects/jtidy/forums/forum/41437/topic/3683558:

It looks like this was supposed to have been fixed in htmltidy a while back:
http://osdir.com/ml/web.html-tidy.tracker/2006-04/msg00015.html

Is this working as designed? One should expect linebreaks after <BR>s in <PRE> tags?

[Jtidy-user] [Help] RE: getTextContent() always returning null

From: SourceForge.net <no...@so...> - 2010-04-21 11:03:50

The following forum message was posted by Anonymous at http://sourceforge.net/projects/jtidy/forums/forum/41437/topic/3683463:

Right, apparently I messed up the BBCode. This is the Java code:

[code]// Load test.html.
String file = "test.html";
InputStream in = new FileInputStream(file);
OutputStream out = null;

// Parse test.html into a DOM tree.
Tidy tidy = new Tidy();
Document doc = tidy.parseDOM(in, out);

// Print <body>'s text content.
org.w3c.dom.Node body = doc.getElementsByTagName("body").item(0);
Element bodyElement = (Element) body;
String bodyTextContent = bodyElement.getTextContent();
System.out.print("<body> TextContent:\n" + bodyTextContent);[/code]

[Jtidy-user] [Help] getTextContent() always returning null

From: SourceForge.net <no...@so...> - 2010-04-21 11:01:32

The following forum message was posted by Anonymous at http://sourceforge.net/projects/jtidy/forums/forum/41437/topic/3683463:

Every time I call getTextContent() on an org.w3c.dom.Node object, it returns null. When I checked the documentation, it said getTextContent only returns null when the Node object is of type DOCUMENT_NODE, DOCUMENT_TYPE_NODE, or NOTATION_NODE. This is odd, because it returns null for virtually every DOM node.

To illustrate my issue, I've written a small test case. I've used the following HTML code:

[code]<!DOCTYPE html>
<html>
<head>
<title>jwz</title>
</head>
<body>
<p>text<b>b<i>i<u>u</u>i</i>b<br>b</b>text</p>
</body>
</html>[/code]

Using the following Java code I\'ve tried to get the textContent of the <body> element:

[code]// Load test.html.
InputStream in = new FileInputStream("test.html");
OutputStream out = null;

// Parse test.html into a DOM tree.
Tidy tidy = new Tidy();
Document doc = tidy.parseDOM(in, out);

// Print <body>'s text content.
org.w3c.dom.Node body = doc.getElementsByTagName("body").item(0);
Element bodyElement = (Element) body;
String bodyTextContent = bodyElement.getTextContent();
System.out.print("<body> TextContent:\n" + bodyTextContent);[/code]

However, the result is:

[code]<body> TextContent:
null[/code]

Did I do something wrong here? Or is this not supposed to happen?

Thanks in advance!

[Jtidy-user] line break inserted after <br> (in <pre>)

From: Kevin B. <kb...@gm...> - 2010-04-18 19:24:12

It looks like this was supposed to have been fixed in htmltidy:
http://osdir.com/ml/web.html-tidy.tracker/2006-04/msg00015.html

Is this working as designed? One should expect linebreaks after <BR>s in <PRE> tags?

[Jtidy-user] [Help] RE: What is the plan for JTidy through Maven Repo

From: SourceForge.net <no...@so...> - 2010-04-12 16:30:13

The following forum message was posted by verhagent at http://sourceforge.net/projects/jtidy/forums/forum/41437/topic/3673061:

Hi Adrian,

Thanks for the quick reply! Yes, it would be a great help for me, and also for others who use the JTidy API from within a Maven-based project. OK, I'll have to investigate this part a bit more myself. I'll let you know what you (or we) need to do to get it done.

Thanks in advance!
Tjeerd

[Jtidy-user] [Help] RE: What is the plan for JTidy through Maven Repo

From: SourceForge.net <no...@so...> - 2010-04-12 13:24:31

The following forum message was posted by aditsu at http://sourceforge.net/projects/jtidy/forums/forum/41437/topic/3673061:

Hi, I'm the current JTidy maintainer. I joined the project last year, after noticing that it was almost abandoned.
That feature request is much older.
I don't use Maven at all; I don't know what needs to be done, and I'd rather not bother doing it.
But if it is useful to you and you know how to release it to whatever repository you need, then just go ahead. Let me know what you need and I will assist you.

Adrian