Name | Modified | Size | Downloads / Week |
---|---|---|---|
wsprspots | 2019-07-09 | | |
utility-scripts | 2019-07-05 | | |
fileinfo | 2019-07-05 | | |
README.md | 2019-07-05 | 3.0 kB | |
Totals: 4 Items | | 3.0 kB | 0 |
WSPR Analytics Source Data
This project is a 7z recompression of the original spots from WSPRnet. As the project progresses, analytics applications, scripts, and utilities will be added for general public use.
All original *.csv.gz archives were decompressed and imported into MongoDB and PostgreSQL to check for CSV import errors. Two months had issues:
- wsprspots-2013-01.csv.gz
- wsprspots-2013-02.csv.gz
Both files had the same anomaly: IZ"WMD is a malformed callsign that fails to import properly. The offending lines were removed, and the files were then compressed with the following commands:
# Remove errored spots
sed -i '/IZ\"WMD/d' ./wsprspots-2013-01.csv
sed -i '/IZ\"WMD/d' ./wsprspots-2013-02.csv
# 7z compression command:
7z a -mx=9 $file.7z ./$file.csv
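The steps above can be combined into a small per-month helper. This is only a sketch, not the project's actual script: the function names `remove_bad_spots` and `recompress_month` are hypothetical, and it assumes GNU sed (for `-i` without a suffix), a gzip that supports `-k`, and `7z` on the PATH.

```shell
# Sketch of the per-month recompression steps described above.
# Assumptions (not from the original project): the function names,
# GNU sed for in-place editing, gzip with -k, and 7z on the PATH.

# Delete every line containing the malformed callsign IZ"WMD.
remove_bad_spots() {
    local csv="$1"
    sed -i '/IZ"WMD/d' "$csv"
}

# Decompress one monthly archive, clean it, and repack it as 7z.
recompress_month() {
    local base="$1"                 # e.g. wsprspots-2013-01
    gzip -dk "$base.csv.gz"         # -k keeps the original .gz file
    remove_bad_spots "$base.csv"
    7z a -mx=9 "$base.7z" "./$base.csv"
}

# Example usage (hypothetical):
#   recompress_month wsprspots-2013-01
```

For months without the malformed callsign, the sed step simply leaves the file unchanged.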
Compression and Stats
The following tests were run to see which options proved most beneficial. On average, 7z reduces file size by 45% to 50% compared with the original gz archives for the files tested.
Import file stats can be reviewed in the fileinfo folder.
Stats JSON Fields
The structure of the stats file is as follows:
- _id: ObjectId is the key for the document
- fileName is the WSPR CSV file
- lineCount is the number of decodes in the archive
- csvSize is the on-disk CSV file size in bytes
- archiveSize is the on-disk 7z archive size in bytes after compression
- processDate is the ISODate when the file was processed
NOTE: The structure could change over time, but for now, this is all that is being tracked. The key element is lineCount, since it will be used for a number of metrics without having to query the database or reopen and recount the source files.
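As a rough illustration of how these fields could be gathered for one file, the sketch below prints a single stats document. The function name `file_stats` is hypothetical and this is not the project's actual tooling. It assumes one spot per line with no header row, omits `_id` (MongoDB assigns it on insert), and writes processDate as a plain ISO 8601 UTC string rather than a MongoDB ISODate.

```shell
# Sketch: emit one stats document for a CSV file and its 7z archive.
# Assumptions: file_stats is a hypothetical name; _id is left for MongoDB
# to assign; processDate is a plain ISO 8601 UTC string.
file_stats() {
    local csv="$1" archive="$2"
    local lines csv_size archive_size
    lines=$(wc -l < "$csv" | tr -d ' ')             # one spot per line
    csv_size=$(wc -c < "$csv" | tr -d ' ')          # bytes in the CSV
    archive_size=$(wc -c < "$archive" | tr -d ' ')  # bytes in the archive
    printf '{"fileName": "%s", "lineCount": %s, "csvSize": %s, "archiveSize": %s, "processDate": "%s"}\n' \
        "$(basename "$csv")" "$lines" "$csv_size" "$archive_size" \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}

# Example usage (hypothetical file names):
#   file_stats wsprspots-2013-01.csv wsprspots-2013-01.7z
```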
Compression Tests
7z Compression Tests
1) 7z a -mx=9 -mfb=273 -ms=on $file.7z ./*.csv
Results : raw csv = 270.1 MB (270,072,472 bytes)
gz = 49.7 MB (49,707,897 bytes)
7z = 26.1 MB (26,099,881 bytes)
2) 7z a -t7z -m0=lzma2 -mx=9 -mfb=64 -md=1024m -ms=on $file.7z ./*.csv
Results : raw csv = 270.1 MB (270,072,472 bytes)
gz = 49.7 MB (49,707,897 bytes)
7z = 26.1 MB (26,098,713 bytes)
3) 7z a -mx=9 $file.7z ./*.csv
Results : raw csv = 270.1 MB (270,072,472 bytes)
gz = 49.7 MB (49,707,897 bytes)
7z = 26.1 MB (26,099,881 bytes)
4) 7z a -t7z -mx=9 -mfb=273 -ms -md=31 -myx=9 -mtm=- -mmt -mmtf -md=1536m -mmf=bt3 -mmc=10000 -mpb=0 -mlc=0 $file.7z ./*.csv
Results : raw csv = 270.1 MB (270,072,472 bytes)
gz = 49.7 MB (49,707,897 bytes)
7z = 24.6 MB (24,551,235 bytes)
While method (4) creates a smaller archive (24.6 MB versus 26.1 MB), it is very time-consuming. Method (3) seems to be the best all-around solution in terms of speed and compressed archive size.
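To put the byte counts above into percentage terms, a small helper can compute the size reduction between any two of the reported figures. This is a sketch only; the function name `pct_reduction` is hypothetical and it assumes `awk` is available.

```shell
# pct_reduction BEFORE AFTER
# Prints the percent size reduction from BEFORE to AFTER bytes,
# rounded to one decimal place. The function name is hypothetical.
pct_reduction() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (1 - after / before) * 100 }'
}

pct_reduction 49707897 26099881     # gz -> 7z, methods (1) and (3)
pct_reduction 49707897 24551235     # gz -> 7z, method (4)
pct_reduction 270072472 26099881    # raw csv -> 7z, method (3)
```

With the figures reported above, the gz-to-7z reduction works out to about 47.5% for methods (1) and (3) and about 50.6% for method (4). Measured against the raw CSV, method (3) is a reduction of about 90.3%.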