
WSPR Analytics Source Data

This project is a 7z recompression of the original spot archives from WSPRnet. As the project progresses, analytic applications, scripts, and utilities will be added for general public use.

All original *.csv.gz archives were decompressed and imported into MongoDB and PostgreSQL to check for CSV import errors (a sketch of the import step follows the list). Two months had issues:

  1. wsprspots-2013-01.csv.gz
  2. wsprspots-2013-02.csv.gz
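
The exact import commands are not recorded here; a minimal sketch of the MongoDB side, assuming the standard WSPRnet column order and hypothetical database names, might look like:

# Decompress each month and import it for a CSV error check
# (field names are assumptions; the WSPR CSVs have no header row)
for f in ./wsprspots-*.csv.gz; do
    gunzip -k "$f"
    mongoimport --db wspr --collection spots --type csv \
        --fields "spot_id,timestamp,reporter,reporter_grid,snr,frequency,callsign,grid,power,drift,distance,azimuth,band,version,code" \
        --file "${f%.gz}"
done
# A similar check can be run in PostgreSQL, e.g. with \copy into a staging table.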

Both files had the same anomaly: IZ"WMD is a malformed callsign that fails to import properly. The affected lines were removed and the files recompressed with the following commands:

# Remove errored spots
sed -i '/IZ\"WMD/d' ./wsprspots-2013-01.csv
sed -i '/IZ\"WMD/d' ./wsprspots-2013-02.csv

# 7z compression command:
7z a -mx=9 $file.7z ./$file.csv
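
A quick sanity check, not part of the original process, confirms the malformed callsign is gone and shows the cleaned line counts:

# Verify the cleanup (optional)
grep -c 'IZ"WMD' ./wsprspots-2013-01.csv ./wsprspots-2013-02.csv
wc -l ./wsprspots-2013-01.csv ./wsprspots-2013-02.csv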

Compression and Stats

The following tests were run to see which options proved most beneficial. On the files tested, 7z reduces the archive size by 45% to 50% relative to the original .gz files.
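
For any single month, that figure can be reproduced by comparing the .gz and .7z sizes directly. A minimal sketch, assuming GNU coreutils and an example month name:

# Percentage saved by 7z relative to the original .gz (example month assumed)
gz=$(stat -c%s wsprspots-2014-06.csv.gz)
sz=$(stat -c%s wsprspots-2014-06.7z)
echo "scale=1; 100 * ($gz - $sz) / $gz" | bc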

Per-file import statistics can be reviewed in the fileinfo folder.

Stats JSON Fields

The structure of the stats file is as follows:

  • _id is the ObjectId key for the document
  • fileName is the name of the WSPR CSV file
  • lineCount is the number of decodes in the archive
  • csvSize is the on-disk CSV file size in bytes
  • archiveSize is the on-disk 7z file size in bytes after compression
  • processDate is the ISODate when the file was processed

NOTE: The structure could change over time, but for now this is all that is being tracked. The key element is lineCount, as it will be used for a number of metrics without having to query the database or re-count lines from the source files.
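
As a sketch (assuming the stats are kept as one JSON document per month in the fileinfo folder), the total number of decodes across all archives could be summed with jq, without touching the database or the CSV files:

# Hedged sketch: sum lineCount across the exported stats documents
jq -s 'map(.lineCount) | add' ./fileinfo/*.json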

Compression Tests

7z Compression Tests

1) 7z a -mx=9 -mfb=273 -ms=on $file.7z ./*.csv
   Results :    raw csv = 270.1 MB (270,072,472 bytes)
                gz      =  49.7 MB (49,707,897 bytes)
                7z      =  26.1 MB (26,099,881 bytes)

2) 7z a -t7z -m0=lzma2 -mx=9 -mfb=64 -md=1024m -ms=on $file.7z ./*.csv
   Results :    raw csv = 270.1 MB (270,072,472 bytes)
                gz      =  49.7 MB (49,707,897 bytes)
                7z      =  26.1 MB (26,098,713 bytes)

3) 7z a -mx=9 $file.7z ./*.csv
   Results :    raw csv = 270.1 MB (270,072,472 bytes)
                gz      =  49.7 MB (49,707,897 bytes)
                7z      =  26.1 MB (26,099,881 bytes)

4) 7z a -t7z -mx=9 -mfb=273 -ms -md=31 -myx=9 -mtm=- -mmt -mmtf -md=1536m -mmf=bt3 -mmc=10000 -mpb=0 -mlc=0 $file.7z ./*.csv
   Results :    raw csv = 270.1 MB (270,072,472 bytes)
                gz      =  49.7 MB (49,707,897 bytes)
                7z      =  24.6 MB (24,551,235 bytes)

While method (4) creates a smaller archive, it is very time consuming. Method (3) seems to be the best all-around solution in terms of speed and post-compression archive size.
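
One way to quantify that trade-off, as a sketch rather than part of the original tests, is to time both switch sets on the same extracted month:

# Rough wall-clock comparison on one extracted month (example name assumed)
file=wsprspots-2014-06
time 7z a -mx=9 "$file-m3.7z" "./$file.csv"    # method (3)
# repeat with the full method (4) switch set above and compare elapsed times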
