Thread: [Htmlparser-user] unable to determine cannonical charset name for utf-9 - using ISO-8859-1
From: Chris P. <cp...@kc...> - 2009-06-19 23:20:42
Hello!

I have been using HTML Parser for a long time, and all of a sudden one of the applications I use it in reported this:

    unable to determine cannonical charset name for utf-9 - using ISO-8859-1

I am a little dumb about these sorts of things, but is there something I could look for in the source page that could be causing this?

Chris
From: Derrick O. <der...@gm...> - 2009-06-20 04:55:03
The charset specified by the HTTP server in the header, or by the HTML page itself, is not in the list of character sets known to the Java Virtual Machine. It's likely the server got hacked and somebody changed the "utf-8" in the HTML source to "utf-9", a charset that doesn't exist; see: http://en.wikipedia.org/wiki/UTF-9_and_UTF-18

You can inform the site that it has an error.
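To make the "not known to the Java Virtual Machine" point concrete, here is a minimal sketch using only the standard java.nio.charset.Charset class. The class name CharsetCheck and the method resolve are hypothetical, and the fallback merely mirrors the log message quoted above; this is not HTML Parser's actual implementation.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper, not part of HTML Parser: asks the JVM whether it
// knows a declared charset name and falls back to ISO-8859-1 otherwise.
final class CharsetCheck {
    static final String FALLBACK = "ISO-8859-1";

    static String resolve(String declared) {
        try {
            if (declared != null && Charset.isSupported(declared)) {
                // name() returns the JVM's canonical spelling, e.g. "UTF-8".
                return Charset.forName(declared).name();
            }
        } catch (IllegalCharsetNameException e) {
            // The name is not even syntactically legal; use the fallback.
        }
        return FALLBACK;
    }

    public static void main(String[] args) {
        System.out.println(resolve("utf-8")); // prints UTF-8
        System.out.println(resolve("utf-9")); // prints ISO-8859-1
    }
}
```

Running it with the name taken from the failing page shows quickly whether the JVM recognises that name at all.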
From: Chris P. <cp...@kc...> - 2009-06-20 06:34:15
Hello!

If I view the source, it says UTF-8. Is this error still possible if the actual content says UTF-8?

Chris
From: Derrick O. <der...@gm...> - 2009-06-20 08:53:10
OK, the other possibility is that the utf-9 is specified in the HTTP header, which you don't see unless you add a ConnectionMonitor and look at the header with something like org.htmlparser.http.HttpHeader.

Try:

    parser.getConnectionManager ().setMonitor (parser);
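Independently of HTML Parser, the charset a server declares can be read from the value of its Content-Type header. The sketch below is an assumption-laden illustration: the class ContentTypeCharset and the method charsetOf are hypothetical names, and the parsing is deliberately simple rather than a full implementation of the HTTP header grammar.

```java
import java.util.Locale;

// Hypothetical helper: extracts the charset parameter from a Content-Type
// header value such as "text/html; charset=utf-9".
final class ContentTypeCharset {
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null; // no charset parameter present
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=utf-9")); // prints utf-9
    }
}
```

The header value itself could come from java.net.URLConnection.getContentType() on a connection opened to the page in question. If that shows utf-9 while the page source says UTF-8, the header is the culprit.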
From: Chris P. <kc...@gm...> - 2009-06-20 09:44:36
Hi Derrick,

Thanks. I'll test that out. Since I am reading a site, I wonder if they put it in deliberately for tools like HTML Parser.

Chris