CHESS GAMES DATABASE with some 15 MILLION games mostly played by human players
WHICH FILES TO DOWNLOAD?
Look into the latest version folder and download the package from there. The folder contains a hash file (MD5) which you can use to verify the package both before and after downloading.
Work is in progress to remove duplicate games and to optimize the database. Check back from time to time for updates.
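The MD5 check mentioned above can be done with any checksum tool. As a minimal sketch, assuming you have the downloaded archive on disk and have copied the published hash out of the hash file, something like the following Python would work. The function names are made up for this illustration and the file name in the usage note is a placeholder.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that a multi-gigabyte archive does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hash):
    """Compare a file's MD5 with the published value,
    ignoring letter case and surrounding whitespace."""
    return md5_of_file(path) == published_hash.strip().lower()
```

Usage would look like `matches_published_hash("package.7z", "<hash from the MD5 file>")`, which returns `True` when the download is intact.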
HOW TO USE DATABASE?
Database has been compressed with 7zip LZMA2 Ultra method, so you will probably need 7zip to unpack the archive after downloading.
To open, browse and manipulate the database you will need Scid vs PC installed.
The database is split into separate databases. A detailed description is within the latest update folder.
HOW TO CONTRIBUTE
You can contribute your games collection in one of the following ways:
- browse your domestic official chess website for OTB game archives.
- browse international chess websites for OTB games.
- browse for puzzles.
- browse for tactics.
Collect, zip and upload your collected games to some file sharing site. Please make sure the format of the collected games is either PGN or Scid (conversion tools might help). Zipping with 7zip (Ultra LZMA2) might also help a lot to compress as much as possible. In return I will update the database with the games you provide, so that this database becomes even bigger than it is now!
( Yes there is a limit of 16.7 million games per database, but there is no limit to the number of databases :D )
Also please make sure to visit the discussion forum here to learn which websites have been used so far to collect games, in order to avoid collecting already harvested games. Also make sure to share the links in the forums to let other people know which sites you harvested.
Prefer to collect human-played OTB games, puzzles, middlegames and endgames, since these are worth more than any others. Games played by machines, online games and correspondence games are not attractive for now.
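Before uploading a PGN collection it can help to know how many games it holds, and how that compares with the per-database limit mentioned above. The following is only a rough sketch: it assumes every game in the file starts with an `[Event "..."]` tag line, as in standard PGN export, and it uses the approximate figure of 16.7 million from the text rather than an exact limit. The function names are invented for this example.

```python
import math

# Approximate per-database limit quoted above; not an exact value.
APPROX_GAMES_PER_DATABASE = 16_700_000


def count_pgn_games(path):
    """Count games in a PGN file by counting lines that start an Event tag.
    Assumes each game begins with an [Event "..."] line."""
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith("[Event "):
                count += 1
    return count


def databases_needed(total_games, limit=APPROX_GAMES_PER_DATABASE):
    """Return how many databases are needed to hold total_games."""
    return math.ceil(total_games / limit)
```

For example, `databases_needed(count_pgn_games("my_games.pgn"))` tells you whether a collection would fit into a single database. The file name here is a placeholder.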
HOW TO GET RID OF DUPLICATES?
The problem is that removing duplicates is a very time-consuming process. Not only that, but it also fully occupies disk read/write capacity during that time!
Actions (for testing purposes) performed so far against previous database packages include the following (in order):
- Spell checking of players, events, sites and rounds by using the "spelling.ssp" file.
- Add missing Elo ratings by using the "ratings.ssp" file.
- ECO classify all games by using the "scid.eco" file.
- Delete twin games (full match - all conditions).
- Compact database (both game file and name file).
- Sort database according to ECO code, white and black player, in that order.
- Compact database (game file).
- Check games (Read from disk, has index fetched and is decoded...)
Besides the actions above (full match - all conditions), additional actions against the database include the following (in order):
- Match only the first 4 letters of player names to identify duplicates (result: approx. 10% - 20% reduction)
- Skip site checking only (result: approx. 1% - 2% reduction)
- Skip event checking only (result: approx. 4% - 5% reduction)
- Skip month and day only (result: max 1% reduction)
- First 4 letters only, skip: event, site, day and month (result: approx. 30% reduction)
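To illustrate how relaxing the matching conditions above catches more duplicates, here is a small Python sketch. It is not how Scid vs PC detects twins internally. It is a simplified model that assumes a game is described by a handful of PGN header fields plus its move text, and that two games are twins when their comparison keys are equal. The `Game` class, `twin_key` and `remove_twins` are hypothetical names made up for this example, and comparing the result and moves is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    white: str
    black: str
    event: str
    site: str
    date: str    # PGN style "YYYY.MM.DD"
    result: str
    moves: str


def twin_key(game, name_prefix=None, use_event=True, use_site=True,
             use_month_day=True):
    """Build the tuple that two games must share to count as twins."""
    def name(value):
        value = value.strip().lower()
        # Optionally compare only the first few letters of a player name.
        return value[:name_prefix] if name_prefix else value

    # Keep only the year when month and day are skipped.
    date = game.date if use_month_day else game.date[:4]
    return (
        name(game.white),
        name(game.black),
        game.event.strip().lower() if use_event else None,
        game.site.strip().lower() if use_site else None,
        date,
        game.result,
        game.moves,
    )


def remove_twins(games, **criteria):
    """Keep the first game seen for each key and drop later twins."""
    seen = set()
    unique = []
    for game in games:
        key = twin_key(game, **criteria)
        if key not in seen:
            seen.add(key)
            unique.append(game)
    return unique
```

Calling `remove_twins(games)` corresponds to the full match, while `remove_twins(games, name_prefix=4, use_event=False, use_site=False, use_month_day=False)` corresponds to the most relaxed setting in the list. Looser keys remove more games but also raise the risk of deleting games that are not true duplicates.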
Thank You!
~codekiddy