Overview
"/home/ryanl" on the berkeley servers consists of all the scripts for SERENDIP . They are first divided into directories associated with the projects that they are associated with. The main directory that we are focused on are the FAST and SERENDIP directories. The SERENDIP directory consists of all of the arecibo data scripts and a majority of general purpose SERENDIP scripts, while the FAST directory only contains scripts related to FAST specifically. Also note that each directory consists of a README file that explains a majority of what each directory is expected to contain and what each file does or directory contains. If anything is not covered in this document please refer to the README file for each directory. Lastly note that all the runs that I make on Nebula is done on the Amigos machine.
Nebula (for serendip data analysis)
Nebula cannot take files directly; the data files must first be dumped by a program into a nebula-friendly data dump file. The setup for a nebula run can be found in "~/SERENDIP/start_serendip_runs.csh", but an outline is given below.
A run directory must be created to store the pre- and post-processing data. This run directory should be named according to the type of run, to ensure clarity.
Navigate into the directory and create a blank file called "s6".
Create a symbolic link called "nebula" that points to a compiled and built copy of the nebula repo. (If there are any issues with building or getting nebula working, please contact Dave Anderson.)
Create a directory, or a symlink to a directory, called "unload" for the dumped data files in a nebula-friendly format (please see the Arecibo section below regarding how to dump arecibo data).
Make a directory called "data".
Copy "lband_sky_float_nside2048_eq.qpix" into the "data" directory. This file was provided by Dave Anderson.
Once all of the requirements have been met, you can run nebula with the following command in the run directory: "make -f nebula/makefile_pipline", or "make -f nebula/makefile_basic" for testing nebula. Note that there are other makefile_* variants; check the makefiles to find the right one for your run. A sketch of the full setup is shown below.
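As a rough illustration, the whole setup might look like the following shell session. The run name, the choice of nebula repo, and the dump path are placeholders; adjust them to your environment.

    # Hypothetical run name and paths; adjust to your environment.
    mkdir s6_example_run
    cd s6_example_run
    touch s6                                         # blank marker file
    ln -s ~/NEBULA/seti-science-code nebula          # a compiled and built nebula repo
    ln -s /path/to/dumped/data unload                # nebula-friendly dump files
    mkdir data
    cp ~/NEBULA/lband_sky_float_nside2048_eq.qpix data/
    make -f nebula/makefile_basic                    # or makefile_pipline for a full run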
Within the "NEBULA/" directory you should find the following contents:
etfits-api-code
A repository for the etfits-api, which is used by nebula to read and manipulate fits files.
There are also many scripts within the etfits-api that are useful for viewing fits
data files. One helpful script is "s6_dictionary.py", which displays the computed
values for the fits data files. A hypothetical invocation is sketched below.
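As an illustration only, assuming s6_dictionary.py takes a fits file path as its argument
(check the script itself for its actual usage):

    # Assumed invocation; verify against the script before running.
    python s6_dictionary.py /path/to/some_data_file.fits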
seti-science-code
The nebula repository for running serendip fits data files. This contains an older version
that is more stable and works more nicely with serendip data.
seti-science-code_fast
The nebula repository for running fast fits data files. This repository is a copy of Zhi-Song's
custom version of the nebula repository. It supposedly works with fits files that don't have any
timing data, but I have yet to get it to work. I recommend not using this repo if possible.
seti-science-code_latest
The nebula repository for running serendip fits data files. This repository contains the latest
code from Dave Anderson. It is not as stable, and some changes from Dave might cause things in
this repository to stop working. It may also include new algorithms and RFI rejections that
do not exist in the more stable code.
lband_sky_float_nside2048_eq.qpix
A qpix file needed by nebula. This should be copied into the "data/" directory of any nebula run.
This can be seen in start_serendip_runs.csh in the SERENDIP directory
(~/SERENDIP/start_serendip_runs.csh).
Arecibo
Arecibo data are located on the Berkeley servers, and the Nebula code is used to analyze the fits data files. Current locations can be seen in "start_serendip_runs.csh" in the "SERENDIP/" directory. Since Nebula needs a nebula-friendly dump format, the nebula repository contains a program called "s6_get_hits". Note that "s6_get_hits" is a relatively simple program, as most of the reading and analysis of the data is done by the "etfits_api". You can run this program on a certain set of data to create a dump file, which goes in the "unload" directory created during the initialization process for nebula. The file should be called "spike_unload". The following are some command line arguments for "s6_get_hits" (an example invocation follows the list):
--dir x (The root directory that the program will scan)
--outfile x (Name and location of the output file that is created)
--dir_exclude x (Exclude directories whose names contain a certain string. Great for excluding bad or good files, since the directory names contain "good" or "bad")
--file_require x (Require the dumped files to include a certain string in their names. Great for dump runs where you only want, for example, ALFA files or 327 MHz files)
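For example, a dump of only ALFA files that skips directories marked bad might look like the
following (the data path is a placeholder, and the program is assumed to be run from a run
directory with the "nebula" symlink described above):

    # Hypothetical paths; the flags are those listed above.
    ./nebula/s6_get_hits --dir /path/to/arecibo/fits_data \
        --outfile unload/spike_unload \
        --dir_exclude bad \
        --file_require ALFA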
To see the current runs and the location of the current data, please see the symlinks directory. The entry "seti_amigos" is a link to the current location of the arecibo data, the latest runs, and the current dumped files. Please see the README files for more details regarding what is located in each directory.
Within the "SERENDIP/" directory you should find the following contents:
s6_nebula
An example of a serendip data dump directory. This is what the unload directory for a
nebula run is expected to look like.
SERENDIP6_scripts
A directory consisting of scripts used for manipulating and editing serendip data files.
plot_obs (old and no longer in use)
A directory consisting of scripts to plot the parts of the sky that arecibo has observed.
unusedScripts
Some old scripts that are no longer in use.
EditFitsHeaderFile.py
Edits the clock and birdie frequencies of fits data files, fixing bad files in which those values are missing.
fix_fits_headers_gen.csh
A script to loop through the current bad arecibo data on amigos and run
EditFitsHeaderFile.py on those files. The bad files are stored in a directory called
${year}_at_hpss/fits_bad_synth/. A hypothetical sketch of such a loop follows.
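A minimal sketch of what such a loop might look like; the year list and the way
EditFitsHeaderFile.py is invoked are assumptions, so check fix_fits_headers_gen.csh itself
for the real logic:

    #!/bin/csh
    # Hypothetical sketch; see fix_fits_headers_gen.csh for the actual script.
    foreach year (2015 2016 2017 2018 2019)
        foreach f (${year}_at_hpss/fits_bad_synth/*.fits)
            # Assumed invocation of the header-fixing script.
            python EditFitsHeaderFile.py $f
        end
    end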
getHeaderData.py
Reads through a serendip data file, retrieves the birdie and clock header information,
and writes it to a file for analysis, to determine whether that information is missing.
getBadFiles.py
A script to list all of the bad files in the arecibo data directory
getGoodFiles.py
A script to list all of the good files in the arecibo data directory
ListFitsHeader.py
A script to list the fits header for the arecibo data
ListFitsHeader_jeff.py
A script to list the fits header for the arecibo data, after Jeff Cobb read and reviewed it.
screenlog.0
A screen log file of the last run made
strace_*
Strace files made during debugging.
serendip_header_test
A serendip run on data files whose header data have been edited; mainly, clock and birdie
frequencies were injected.
serendip_runs
A couple of serendip test runs on different years or different types of files:
s6_2015_2018_runs
A run on good files for the years 2015 and 2018
s6_2015_2018_run_nebula
A data dump of the data files for years 2015 and 2018
s6_2019_fixed_run
A run on edited (fixed) serendip data files for the year 2019
s6_2019_fixed_nebula
A data dump of the fixed data files for the year 2019
s6_do_good_test
A test on good files for all years
s6_do_good_test_nebula
A data dump of the good files for all years.
s6_forced_multiplet
A run on a small set of files where we duplicated the clock frequencies to force multiplets
s6_forced_multiplet_nebula
A data dump of the data files where we forced the presence of multiplets with duplicated clock frequencies.
fix_fits_headers_s6c1.csh
A script to fix fits headers for serendip data files.
get_rand_files.csh
Get a set of random files from the serendip data directories.
start_serendip_runs.csh
A script to start 8 runs on the different subbands of the arecibo data and run them as
screen processes in the background (a sketch is shown below). A data dump is needed for each
of the subbands. The dump directory consists of subdirectories such that each
s6_${band}_run_nebula/ contains a single entry called unload, which holds the dumped data.
To do a data dump of the serendip data for nebula, run "s6_get_hits" in the nebula repo on
all the fits data files that are needed for the dump. "s6_get_hits" takes several parameters
that let the user specify the destination and source directories.
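A minimal sketch of what starting these runs might look like, assuming hypothetical subband
names and run-directory names (the real list and layout live in start_serendip_runs.csh):

    #!/bin/csh
    # Hypothetical subband list and directory names; see start_serendip_runs.csh
    # for the real ones.
    foreach band (alfa 327)
        cd s6_${band}_run
        # Start each nebula run in a detached, named screen session.
        screen -dmS s6_${band} make -f nebula/makefile_pipline
        cd ..
    end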
Within the "symlink/setiamigos/" directory(main data directory) you should find the following contents:
fast_nebula
A directory consisting of the data Jeff Cobb brought over from the FAST servers.
s6_alfa_nebula
Consists of all the runs and data dumps for Arecibo ALFA files.
s6_all_year_run_new
A new run for all of the subbands for Arecibo. Nebula is run separately on each subband,
so each should be viewed separately on the webpage.
s6_all_year_runs
An old run for all of the subbands for Arecibo. Nebula is run separately on each subband,
so each should be viewed separately on the webpage.
s6_nebula_dump
A dump of all ALFA files for Arecibo. These are linked into the run directories to allow
nebula to read them.
s6_fast_runs
Consists of all the runs and data dumps for the fast data located in fast_nebula.
s6_fast_test_nebula
A dump of the test files for FAST. These files do contain metadata.
s6_fast_test_run
A nebula run on the test data dump files.
s6_fast_test_run_basic
A nebula_basic run on the test data dump files.
s6_nebula
Old data dump files for nebula that are no longer used.
serendip6_data
A majority of the serendip data lives here. Edit with caution as things are backed up and read
in a specific manner. You should always get Jeff Cobb's approval before making any edits in this
directory.
FAST
The FAST data pipeline runs very similarly to the Arecibo one. There are some things to note for FAST that are not the same as SERENDIP, and some other things that exist only for FAST. Firstly, there were some data files from Zhi-Song that didn't have metadata. These are located in "~/FAST/", and some tests to inject and fix some of that data can be found in the same directory. A test run was made on these data files, which can be found in "~/FAST/sample_test". After some time, Jeff Cobb was able to get some proper FAST data files that we can use to test nebula on. These fits data files can be found in "~/symlinks/seti_amigos/fast_nebula", and a run on those data files can be found in "~/symlinks/seti_amigos/s6_fast_runs/s6_fast_test_run/". Refer to the README file for more information regarding the files located there.
Within the "FAST/" directory you should find the following contents:
DATA
A symlink to the test data from Zhi-Song. These data files are missing timing data in their
fits headers. These files need to have artificial timing data injected to work with nebula.
This can be done using the script inject_header_FAST.py.
fixed_DATA
A directory of the fixed data. These are data files from the DATA directory after running
inject_header_FAST.py on them. A copy of the edited files is created in this directory.
new_DATA
New data given by Zhi-Song, which supposedly contains timing data in the fits file
headers. I haven't been able to get these new data files working.
sample_test
A sample run of nebula on the fixed_DATA folder. View the data on the webpage by creating a
link on the seti@home page
inject_header_FAST.py (No longer needed or used)
A script for injecting timing information into FAST fits files. The source and destination
directories can be changed in the script. The fixed file's name is the original name with
"_edited" inserted before the ".fits" extension. For example, "serendip6_m13_TEST_0_0_20190714_005354.fits"
becomes "serendip6_m13_TEST_0_0_20190714_005354_edited.fits".