#202 clean dumped pgn file from polyglot_tolerant

Group: 1.3.2
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2022-01-31
Created: 2022-01-15
Creator: Jonathan
Private: No

polyglot_tolerant has an option to dump a bin book to PGN.
You can find polyglot_tolerant here: https://chess.massimilianogoi.com/download/polyglottolerant/
If I type the command 'polyglot_tolerant dump-book -bin book.bin -color white -out book.pgn', for example, I get something like a PGN file that looks like this:

Dump of "book.bin" for white.
1: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} dxc4 4. e3{100%} b5 5. a4{100%} e6 6. axb5{100%} cxb5 7. b3{100%} Bb4+ 8. Bd2{100%} Bxd2+ 9. Nbxd2{100%} a5 10. bxc4{100%} b4 11. Ne5{100%}
2: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} dxc4 4. e3{100%} Be6 5. Nc3{100%} b5 6. a4{100%} b4 7. Ne2{50%}
3: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} dxc4 4. e3{100%} Be6 5. Nc3{100%} b5 6. a4{100%} b4 7. Ne4{50%}
4: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} f5 5. g3{100%} Nd7 6. Bg2{100%} Bd6 7. O-O{100%} Ngf6 8. Ne1{100%} O-O 9. Nd3{100%} Ne4 10. Qc2{100%} b6 11. b4{100%}
5: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} f5 5. g3{100%} Bd6 6. Bg2{100%} Nd7 {trans: line=4, ply=12}
6: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} f5 5. g3{100%} Bd6 6. Bg2{100%} Nf6 7. O-O{100%} Nbd7 {trans: line=4, ply=14}
7: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} f5 5. g3{100%} Bd6 6. Bg2{100%} Nf6 7. O-O{100%} O-O 8. Ne5{100%} b6 9. Ndf3{100%}
8: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} f5 5. g3{100%} Bd6 6. Bg2{100%} Nf6 7. O-O{100%} O-O 8. Ne5{100%} Qe7 9. Ndf3{100%}
9: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} f5 5. g3{100%} Nf6 6. Bg2{100%} Bd6 {trans: line=6, ply=12}
10: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} Nf6 5. g3{100%} b6 6. Bg2{100%} Nbd7 7. O-O{100%}
11: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} Nf6 5. g3{100%} b6 6. Bg2{100%} Bb7 7. O-O{100%}
12: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} Nf6 5. g3{100%} c5 6. Bg2{100%} Nc6 7. O-O{100%} cxd4 8. cxd5{100%}
13: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} Nf6 5. g3{100%} Nbd7 6. Bg2{100%} b6 {trans: line=10, ply=12}
14: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} Nf6 5. g3{100%} Nbd7 6. Bg2{100%} Be7 7. O-O{100%} O-O 8. Qc2{100%} b6 9. e4{100%} dxc4 10. Nxc4{100%}
15: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} Nf6 5. g3{100%} Nbd7 6. Bg2{100%} Be7 7. O-O{100%} O-O 8. Qc2{100%} b6 9. e4{100%} Bb7 10. e5{100%}
16: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} Nf6 5. g3{100%} Nbd7 6. Bg2{100%} Bd6 7. O-O{100%} O-O 8. Qc2{100%} Re8 9. Rd1{100%}
17: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} Nf6 5. g3{100%} Be7 6. Bg2{100%} Nbd7 {trans: line=14, ply=12}
18: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} Nf6 5. g3{100%} Be7 6. Bg2{100%} O-O 7. O-O{100%} b6 8. Qc2{100%} Nbd7 {trans: line=14, ply=16}
19: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} Nf6 5. g3{100%} Be7 6. Bg2{100%} O-O 7. O-O{100%} b6 8. Qc2{100%} Bb7 9. e4{100%} Nbd7 {trans: line=15, ply=18}
20: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} Nf6 5. g3{100%} Be7 6. Bg2{100%} O-O 7. O-O{100%} Nbd7 {trans: line=14, ply=14}
21: 1. d4{100%} c6 2. c4{100%} d5 3. Nf3{100%} e6 4. Nbd2{100%} Nf6 5. g3{100%} Bd6 6. Bg2{100%} Nbd7 {trans: line=16, ply=12}

Most chess software cannot read this, except ChessX.
So what I normally do is open the PGN with ChessX and copy the games to a new PGN database in ChessX, so the PGN is normalized and can be read by any chess GUI.

The problem is that ChessX can only open the polyglot dump when it is a small PGN database of up to 10000 games.
I want to be able to do this with dump databases of more than 100000 games.

I have tried pgn-extract to normalize the PGN, but it didn't work.
So I would like ChessX to be able to read very big dump PGN files so I can normalize the databases.

Discussion

  • Jens Nissen

    Jens Nissen - 2022-01-29

    ChessX can operate on databases with several million games without any problems. I have a database with all games from TWIC and this covers 3.400.000 games. The restriction might come from your computer (memory and processor).

    There are two things that I want to mention: ChessX can't really read the output - I tested against 1.4.6 and it recognizes only every second game.
    Second thing: the latest ChessX build detects the correct number of games but cannot parse a single move.

    The problems all programs have come from two issues:
    - the line number at the beginning of each line, which does not conform to PGN
    - the missing end delimiter of each game.
    You could fix this by sending the PGN through a preprocessor like sed that removes the leading number (a pattern such as "^[0-9]+: ") and adds an asterisk * to the end of each line. Then any chess program should be able to read your file; at least ChessX can.
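The preprocessing step described above could also be done in Python instead of sed. The following is a minimal sketch, assuming that each dumped line holds exactly one game, that every game line starts with a counter such as "12: ", and that the 'Dump of "book.bin" for white.' header line can be dropped. The names normalize_dump_line and normalize_dump are hypothetical helpers, not part of polyglot_tolerant or ChessX.

```python
import re

# A dump line starts with a running counter, e.g. "12: 1. d4{100%} c6 ..."
LINE_PREFIX = re.compile(r"^\d+:\s*")


def normalize_dump_line(line):
    """Return PGN movetext for one dump line, or None if the line is not a game."""
    stripped = line.strip()
    if not LINE_PREFIX.match(stripped):
        # Header line ('Dump of "book.bin" for white.') or blank line: skip it.
        return None
    movetext = LINE_PREFIX.sub("", stripped, count=1)
    # Append the PGN termination marker for a game with an unknown result.
    return movetext + " *"


def normalize_dump(lines):
    """Yield one normalized game per dump line, skipping non-game lines."""
    for line in lines:
        game = normalize_dump_line(line)
        if game is not None:
            yield game
```

Because the generator handles one line at a time, it can be fed an open file and written out game by game (each followed by a blank line), so a dump with more than 100000 games never has to fit in memory at once. Whether a given chess GUI accepts games without a tag section is not guaranteed by this sketch.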

    Come to think of it, I might add a tweak to ChessX which handles this internally.

     
    • Jonathan

      Jonathan - 2022-01-30

      Hi Jens,

      Thanks for your response.
      I found another way to convert a bin book created with polyglot to PGN.
      polyglot_tolerant has a 'createpgn' script that builds a PGN from a bin
      book; I think it uses Python.
      https://web.archive.org/web/20210507064550/https://chess.massimilianogoi.com/download/polyglottolerant/

      I have another request.
      Somebody made a Python tool that puts the ECO code and opening name into
      the game as an annotation. The annotation goes in braces after the first
      moves of the opening, for example { A00o Grob Gambit Hurst Attack }.
      It works with the eco.pgn that comes with pgn-extract.
      I'm looking for another PGN file because eco.pgn is very limited.

      pgnextract_eco.pgn looks like this:

      [Event "?"]
      [Site "?"]
      [Date "????.??.??"]
      [Round "?"]
      [White "?"]
      [Black "?"]
      [Result "*"]
      [ECO "A00"]
      [Opening "Clemenz (Mead's, Basman's or de Klerk's) opening"]

      1. h3 *

      [Event "?"]
      [Site "?"]
      [Date "????.??.??"]
      [Round "?"]
      [White "?"]
      [Black "?"]
      [Result "*"]
      [ECO "A00"]
      [Opening "Global opening"]

      1. h3 e5 2. a3 *

      [Event "?"]
      [Site "?"]
      [Date "????.??.??"]
      [Round "?"]
      [White "?"]
      [Black "?"]
      [Result "*"]
      [ECO "A00"]
      [Opening "Amar (Paris) opening"]

      1. Nh3 *

      [Event "?"]
      [Site "?"]
      [Date "????.??.??"]
      [Round "?"]
      [White "?"]
      [Black "?"]
      [Result "*"]
      [ECO "A00"]
      [Opening "Amar gambit"]

      1. Nh3 d5 2. g3 e5 3. f4 Bxh3 4. Bxh3 exf4 *

      [Event "?"]
      [Site "?"]
      [Date "????.??.??"]
      [Round "?"]
      [White "?"]
      [Black "?"]
      [Result "*"]
      [ECO "A00"]
      [Opening "Dunst (Sleipner, Heinrichsen) opening"]

      1. Nc3 *

      I found another ECO file in the bin folder of the SCID 7 installation,
      called "scid.eco".

      If I open this file in a text editor, it looks like this:

      scid.eco

      This is the ECO classification file for Scid.
      The Scid ECO code format allows for extensions: each basic code can have
      a lower case letter (a-z) appended, and a further level (1-4) can be
      added to each extension. So the order of ECO codes for A00 is:
      A00, A00a, A00a1, A00a2, A00a3, A00a4, A00b, A00b1, ..., A00z4.

      You can convert this file to PGN format with "eco2pgn" and to EPD
      format with "eco2epd" -- these Scid programs are not compiled by
      default so you may need to compile them first, e.g. "make eco2pgn".

      Copyright (C) 1999-2003 Shane Hudson (sgh@users.sourceforge.net)
      Created: June 1999.
      Last update: January 2011.

      Scid website: http://scid.sourceforge.net/

      A00a "Start position" *
      A00b "Barnes Opening" 1.f3 *
      A00b "Fried fox" 1.f3 e5 2.Kf2 *
      A00c "Kadas Opening" 1.h4 *
      A00d "Clemenz Opening" 1.h3 *
      A00e "Ware Opening" 1.a4 *
      A00f "Anderssen Opening" 1.a3 *
      A00f "Creepy Crawly Opening (Basman)" 1.a3 e5 2.h3 d5 *
      A00g "Amar/Paris Opening" 1.Nh3 *
      A00g "Amar: Paris Gambit" 1.Nh3 d5 2.g3 e5 3.f4 *
      A00h "Durkin" 1.Na3 *
      A00i "Saragossa" 1.c3 *
      A00j "Mieses" 1.d3 *
      A00j "Mieses: 1...e5" 1.d3 e5 *
      A00j "Mieses: 1...d5" 1.d3 d5 *
      A00j "Spike Deferred" 1.d3 g6 2.g4 *
      A00k "Van Kruijs" 1.e3 *
      A00l "Van Geet (Dunst) Opening" 1.Nc3 *
      A00l "Van Geet: 1...Nf6" 1.Nc3 Nf6 *
      A00l "Van Geet: 1...Nf6 2.Nf3" 1.Nc3 Nf6 2.Nf3 *
      A00l "Van Geet: Tuebingen Gambit" 1.Nc3 Nf6 2.g4 *
      A00l "Van Geet: 1...e5" 1.Nc3 e5 *
      A00l "Van Geet: 1...e5 2.Nf3" 1.Nc3 e5 2.Nf3 *
      A00l "Van Geet: Sicilian Variation" 1.Nc3 c5 *
      A00l "Van Geet: Sicilian Variation, 2.Nf3" 1.Nc3 c5 2.Nf3 *
      A00l "Van Geet: Sicilian Variation, 2.Nf3 Nc6" 1.Nc3 c5 2.Nf3 Nc6 *
      A00m "Van Geet: 1...d5" 1.Nc3 d5 *
      A00m "Van Geet: 1...d5 2.Nf3" 1.Nc3 d5 2.Nf3 *
      A00m "Van Geet: 1...d5 2.Nf3 Nf6" 1.Nc3 d5 2.Nf3 Nf6 *
      A00m "Van Geet: 1...d5 2.e4" 1.Nc3 d5 2.e4 *
      A00m "Van Geet: 1...d5 2.e4 d4" 1.Nc3 d5 2.e4 d4 *
      A00m "Van Geet: 1...d5 2.e4 dxe4" 1.Nc3 d5 2.e4 dxe4 *
      A00m "Van Geet: Hector Gambit" 1.Nc3 d5 2.e4 dxe4 3.Bc4 *
      A00n "Grob" 1.g4 *
      A00n "Grob: Alessi Gambit" 1.g4 f5 *
      A00n "Grob: Double Grob" 1.g4 g5 *
      A00n "Grob: 1...e5" 1.g4 e5 *
      A00o "Grob: 1...d5" 1.g4 d5 *
      A00o "Grob Gambit" 1.g4 d5 2.Bg2 *
      A00o "Grob Gambit: e5" 1.g4 d5 2.Bg2 e5 *
      A00o "Grob Gambit: Hurst Attack" 1.g4 d5 2.Bg2 e5 3.c4 *
      A00o "Grob Gambit: 2...c6" 1.g4 d5 2.Bg2 c6 *
      A00o "Grob Gambit: Spike Attack" 1.g4 d5 2.Bg2 c6 3.g5 *
      A00o "Grob Gambit Accepted" 1.g4 d5 2.Bg2 Bxg4 *
      A00o "Grob Gambit Accepted: Fritz Gambit" 1.g4 d5 2.Bg2 Bxg4 3.c4 *

      Would it be possible somehow to convert it so that the format looks like
      that of pgnextract_eco.pgn?

      I have included the file in the attachment.

      I hope you can help me with this.

      Regards Jonathan


  • Jens Nissen

    Jens Nissen - 2022-01-31

    Even though ChessX has an internal parser for this file, it only uses it to build an internal ECO reference (so that it can classify new or unclassified games). It's not very difficult to make it generate a PGN file as well: see the compileAsciiEcoFile() method inside ChessX. You would need a PgnDatabase for writing and a GameX variable to collect the moves that are parsed. After parsing an asterisk *, simply write the GameX variable into the database, and don't forget to call output after finishing the complete file.
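Outside ChessX, the same parse-until-asterisk idea can be sketched in Python. This is only a sketch under assumptions, not ChessX's compileAsciiEcoFile(): it assumes each scid.eco entry has the form code, quoted name, moves, asterisk (as in the excerpt above), that an entry may wrap over several lines, and that comment lines start with "#". It keeps the extended Scid code (e.g. A00o) in the ECO tag, which a consumer expecting plain three-character codes may not accept. The function names eco_entries, entry_to_pgn and convert are hypothetical.

```python
import re

# One entry: extended ECO code, quoted opening name, optional moves, then "*".
ENTRY = re.compile(r'\b([A-E]\d\d[a-z]?\d?)\s+"([^"]*)"\s*([^*]*)\*')
# Scid writes "1.g4"; the target file writes "1. g4".
MOVE_NUMBER = re.compile(r"(\d+)\.(?=\S)")


def eco_entries(text):
    """Yield (code, name, moves) for every entry found in scid.eco text."""
    # Assumption: comment lines start with "#" and can be ignored.
    body = "\n".join(
        line for line in text.splitlines() if not line.lstrip().startswith("#")
    )
    for match in ENTRY.finditer(body):
        code, name, moves = match.groups()
        # Collapse line wraps and extra spaces, then space out move numbers.
        moves = MOVE_NUMBER.sub(r"\1. ", " ".join(moves.split()))
        yield code, name, moves


def entry_to_pgn(code, name, moves):
    """Format one entry like the games in pgn-extract's eco.pgn."""
    tags = [
        ("Event", "?"), ("Site", "?"), ("Date", "????.??.??"),
        ("Round", "?"), ("White", "?"), ("Black", "?"),
        ("Result", "*"), ("ECO", code), ("Opening", name),
    ]
    header = "\n".join(f'[{key} "{value}"]' for key, value in tags)
    movetext = f"{moves} *" if moves else "*"
    return f"{header}\n\n{movetext}\n"


def convert(text):
    """Convert the whole scid.eco text into one PGN string."""
    return "\n".join(entry_to_pgn(*entry) for entry in eco_entries(text))
```

To use it, read scid.eco into a string, pass it to convert(), and write the result to a new .pgn file. The "Start position" entry has no moves, so it becomes a game whose movetext is only the asterisk.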

     
