Radio Pulsar Search on Einstein@Home Code Review

Date

On Dec 5 we will have a code review for the radio pulsar search on Einstein@Home. We will start at 9 CET in room 165.

First we would like to review the codes with highest science content. These are:
  • dedisperse.c
  • addheader.c

The room is blocked until 16 CET.

The second code review will start at 11 CET on Dec 12 in room 165.

We will look at:
  • demod_binary.c

The room is blocked until 16 CET again although this meeting might not take long.

Remote participation

If desired we can arrange remote participation via conference call.

Download Codes and Data

  • code packages have been removed
  • XXX: selected codes and test data (for those who have GSL and FFTW installed)
  • XXX: Complete set of all project codes and test data; using this set you can build the app without having installed FFTW or GSL

Documentation

The following sections describe the different stages (data preprocessing, dedispersion, and demodulation / search) of our pulsar search pipeline and the codes used in each stage.

This picture shows a schematic overview of the pipeline:

pipeline.png

For the codes to be reviewed there is additional information about the input files accepted by the codes, the output files written, a list of the command line arguments, and a quick walktrough describing the main structure of the corresponding piece of code.

Data preprocessing

The raw data from Arecibo are preprocessed using the standard pulsar astronomy tools from Arecibo and the sigproc suite.

  • Codes: alfasplit, filterbank2.2, addheader.c, structs.h
  • Libraries:

  • alfasplit this piece of code splits the initial WAPP files. Each WAPP file contains interleaved information from two beams of the telescope. These have to be split in a first step (see below). We use a pre-compiled version of this code.
  • filterbank2.2 is a part of the standard sigproc suite used widely by pulsar astronomers. The split data files have to be translated into the filterbank format (basically the t-f plane). Here, we also use a precompiled version of the code.
  • addheader.c combines the ascii header and the *.wapp.tfdata.binary file provided by filterbank2.2 into a single binary file which can then be fed into the next stage of our pipeline.

Tests on data sets that contain known pulsars suggest that there is no problem in using the precompiled versions.

  1. splitting of the data files by alfasplit; this seperates the two beams of the pointing that are written into the same file during observation. For each two final files of a five minute observation there are three initial files to split. alfasplit then splits those into 6 temporary (split) files. Usually for one observation there are two files of about the same size (~1 Gb) and a much smaller spillover one (~100Mb).
  2. the three files for each final file have to transferred into filterbank format. This means to translate the data in the t-f-plane (where t is time and f the radio frequency). At the end of this step there is one *.wapp.tfdata.binary file and a header file containing additional information for each beam of the observation.
  3. in the last step we add the separate header file into the data file to get this information downstream more easily. This is done by addheader.c, a simple piece of C code which just creates a binary file, parses the ascii header, writes it in a fixed format (C struct) into the binary file and then reads the binary *.wapp.tfdata.binary and appends it to the new binary file.

Information on structs.h

This file defines the commonly used structs in our pipeline, including headers to raw data files, dedispersed time series and checkpoint files. The following headers are defined:
  • Data_header: header for the raw data files, containing information about the observation
  • DD_header: header for the dedispersed time series, containing the same information as data_header + trial DM, decompression information
  • CP_header: header for the binary checkpoint file, containing the number of the last template the code has done and the name of the input time series
  • CP_cand: struct for the candidates in the search: f0, P_orb, tau, Psi_0, P_sum, ifa, N_h
  • t_pulsar_search: struct for communication with the graphics app over shared memory

Information on addheader.c

input:
  • any *wapp.tfdata.binary file (as provided by filterbank)
  • ascii header file in Arecibo standard (as provided by filterbank)

ouput:
  • a *.wapp.tfdata.binary file with C struct header in our format (structs.h)

command line arguments:
  • -o, --outputfile [string]: Path name of the binary output file.
  • -i, --inputfile [string]: Path name of the binary input file.
  • -H, --headerfile [string] Path name of the ascii header file.
  • -h, --help: Prints help message.

Quick Code Walktrough for addheader.c
This is only a very short bit of code:
  • command line parsing
  • opening output and input streams
  • parsing of information in ascii header file
  • writing C struct header to binary output file
  • reading input data in chunks of 256 floats (256 radio frequency channels at one instant in time) and writing these chunks to the output file. Keep reading as long as the number of time samples read in total is less or equal to 2^22 and as long there are still data in the input file
  • In case there are more than 2^22 points: stop reading after 2^22 points
  • In case there are less than 2^22 points: fill the remaining bins with zeros

At the end of these three steps the data are in a format where our pipeline can take over. The following chapters describe the bits of code written for this purpose.

Dedispersion stage

  • Codes: dedisperse.c, dedisperse.h, structs.h
  • Libraries: math, gsl, fftw3

This stage is the first one of our actual pipeline. It is designed to do the dedispersion of the t-f data sets on a grid of trial DM values. Other important functions are the downsampling of the data set by a factor of 2 in time direction, removal of radio frequency interferences (RFI) and compression of the final dedispersed time series into single-byte precision.

Tests of the dedispersion stage

A complete set of two intial *.wapp.tfdata.binary files, all corresponding dedispersed time series (and all result files) can be found on ATLAS in /home/benk/code_review.

J1949+31.wapp.tfdata.binary and J1947+19.wapp.tfdata.binary are the initial t-f data sets with a C struct header.

/home/benk/code_review/data/ J1949+31 and /home/benk/code_review/data/ J1947+19 contain the 628 dedispersed time series for each data set.

Information on dedisperse.c

input:
  • any *wapp.tfdata.binary file with a C struct header in our format.
  • ascii file with the DM grid

ouput:
  • a number of dedispersed time series' as binary files with a different C struct header containing information about the dedispersion process

command line arguments:
  • -h, --help: Prints help message.
  • -i, --input_file [string]: Path name of the input file.
  • -o, --output_file [string]: Path name of the time series binary output file, the code will append "_XXX.binary", where XXX is the integer index of trial DM.
  • -T, --table_file [string]: Path name of the DM table file.
  • -a, --ascii_output: Switch for ascii output.
  • -dm, --DMmin [double]: The minimum value of the dispersion measure to dedisperse for.
  • -DM, --DMmax [double]: The maximum value of the dispersion measure to dedisperse for.
  • -F, --f_clipping [double]: RFI removal in frequency domain. Recommended value = 5.0.
  • -c, --clipvalue [double]: RFI removal in time domain, (0.0 = deactivated, else: recommended value = 2.0).
  • -z, --debug: Run program in debug mode.

Quick Code Walktrough for dedisperse.c

The important parts of the code mentioned above occur in the following order:
  • command line parsing
  • reading C struct header of input tfdata file and transferring information into the header for the final timeseries
  • reading tfdata file to memory
  • downsampling of tfdata by FFTing each radio frequency channel, chopping off the upper half of the spectrum and IFFTing and downsampling
  • if wanted: time domain cleaning of data -- not used at the moment
  • if wanted: frequency domain cleaning of data by FFTing each radio frequency channel and replacing all bins above a certain threshold with properly scaled white noise. Back to time domain via IFFTing. Subtraction of median in each channel to adjust noise levels in all channels
  • actual dedispersion by looping over the values read from the table file:
    • precomputation of shifting LUT
    • summing along the tracks in t-f plane defined by use of shifting LUT
    • there are options to use interpolation between neighbouring samples -- these are currently not used and not very well tested
    • compression of final time series into single byte precision by computing, max (99% quantile), min (1% quantile), and median of the time series, rescaling it so that scalefactor*(max - median) = 127 and storing the rescaled time series by casting it into unsigned chars
    • finally, writing the downsampled, compressed time series into a binary file
  • when the loop is done, some cleanup

Demodulation stage

  • Codes: demod_binary.c, rngmed.c, erp_utilities.c, demod_binary.h, rngmed.h, erp_utilities.h, structs.h
  • Libraries: math, gsl, fftw3f

This is the part of the pipeline that is done distributed on the clients. Its main tasks are the resampling of the time series (from dedisperse.c), FFTing it and using harmonic summing to increase sensitivity to pulsed signals with small duty cycle.

Tests of the demodulation stage

Some test where done to illustrate the behaviour of the code. A central step is the whitening of the dedispersed time series on the client. The correctness of this step is illustrated by a test below. There was also a full test of the complete code by running it on real data containing a pulsar. The results of this test are shown in the second subsection.

Test of the whitening procedure

To test the whitening procedure used in the demodulation code, the distribution of the values in the power spectrum after whitening has been computed and then compared to the expected distribution exp(-P). The following three plots show the results. The first plot shows the distribution in the first third of the spectrum, the second one the distribution in the second third, and the last one shows the distribution in the last third of the spectrum:

  • whitening test 1:
    review3.png

  • whitening test 2:
    review4.png

  • whitening test 3:
    review5.png

Full test run on real data

A complete set of two intial *.wapp.tfdata.binary files, all corresponding dedispersed time series (and all result files) can be found on ATLAS in /home/benk/code_review.

/home/benk/code_review/results/ J1949+31 and /home/benk/code_review/results/ J1947+19 contain the 628 result files for each data set.

You will find the most significant pulsar candidates in results/J1947+19/J1947+19_dd_316.cand and results/J1949+31/J1949+31_dd_274.cand. Since these two pulsars are isolated and quite bright they ring many templates and trigger so many significant candidates that all top-100 candidates basically are the same. This effect can easily be taken care of in postprocessing and will not happen for binary or weaker pulsars.

Visualisation of some of the content of the results (colorcode shows frequency of the candidate):

review1.pngreview2.png

Information on demod_binary.c

input:
  • any dedispersed, compressed time series as a *.binary file with a C struct header in our format and data stored as unsigned chars.
  • a template bank specifying the orbital parameters used for resampling of the time series

output:
  • an ascii file containing the top 100 candidate events sorted by inverse false alarm rate
  • temporarily a binary checkpoint file is generated

command line arguments:
  • -h, --help: Print help message.
  • -i, --input_file [string]: Path name of the input file.
  • -o, --output_file [string]: Path name of the candidate output file.
  • -t, --template_bank [string]: Path name of the template bank file.
  • -c, --checkpoint_file [string]: Path name of the checkpoint file.
  • -f, --f0 [double]: The maximum signal frequency in Hz.
  • -A, --false_alarm [double]: False alarm probability.
  • -K, --kill_line: Switch on to kill the power line at 60 Hz.
  • -P, --padding [double]: The frequency over-resolution factor
  • -W, --whitening: Switch for power spectrum whitening.
  • -B, --box [int]: Box width for the running median in frequeny bins.
  • -z, --debug: Run program in debug mode.

Quick Code Walktrough for demod_binary.c

The demodulation code has the following structure:
  • command line parsing
  • opening the template bank file, looking for a checkpoint, if existent: read in
  • opening time series data, reading the header information
  • conversion of some header information for use in the screen saver
  • conversion of time series back into four-byte precision floats
  • whitening of the time series:
    • FFTing of time series, computation of the power spectrum
    • adjusting running median of the FFT such that rescaled power spectrum is white, chopping of bins at the ends not covered by running median
    • IFFT to get time series back, normalization
  • main part:
    • precomputation of thresholds on powers, power line killing preparations
    • looping over the template bank
      • linear resampling of the time series
      • mean padding of the time series to increase frequency resolution
      • computation of the FFT
      • if wanted: killing of powerline and harmonics at 60 Hz, 120 Hz, 180Hz, and 240 Hz
      • harmonic summing
      • storing of candidates from harmonic summed power arrays
      • refreshing downsampled power spectrum for screen saver
      • saving candidates to checkpoint
    • when all templates are done: compute inverse false alarm rate for top candidates by power and resort candidate list by inverse false alarm rate
    • write the top 100 candidates to an ascii file

Random template placement stage

  • Codes: GenerateRandomBinaryTemplates.c
  • Libraries: math, gsl

This piece of code is used to generate a random template bank in the three orbital dimensions of the four-dimensional parameter space. The metric used to construct the template is projected with respect to frequency, so that the frequency in the search has to reconstructed by using the FFT. To deal with the fact the the metric is not flat, we cut the parameter space into cubes in which we assume the determinant of the metric to be constant at the maximum value in the cube.

Some more background information on the random template bank techniques used can be found in this document by Chris:

* param_space.pdf

Tests of the random template bank

template bank test 1: fixed point in orbital parameter space, different frequency, tested against 500 different random template banks

For this test, a signal (without noise) was injected at four different frequencies at a fixed point in orbital parameter space. Then 500 different random template banks were generated and the signal was recovered with each of these template banks. The plots show in red all points in one of these template banks and in green the loudest candidate from each of the 500 template banks.

test1.pdf

histograms of mismatch for template bank test 1:

These histograms show the mismatch distribution for the 500 green points from the previous plots. The template bank was constructed to give a maximum mismatch of 0.1 in orbital parameters and the frequency grid was set up so that the maximum mismatch in frequency is 0.1 as well. All histograms show that the maximum mismatch is below 0.2 as expected.

histograms_test1.pdf

template bank test 2: three points in orbital parameter space, fixed frequency at 50 Hz, tested against 500 different random template banks

For this test, a signal (without noise) was injected at fixed frequency = 50 Hz and at three points in orbital parameter space. Then 500 different random template banks were generated and the signal was recovered with each of these template banks. The plots show in red all points in one of these template banks and in green the loudest candidate from each of the 500 template banks.

test2.pdf

histograms of mismatch for template bank test 2:

These histograms show the mismatch distribution for the 500 green points from the previous plots. The template bank was constructed to give a maximum mismatch of 0.1 in orbital parameters and the frequency grid was set up so that the maximum mismatch in frequency is 0.1 as well. For injection 3 the mismatch distribution extends to higher mismatches because the signal is in the region of parameter space, where the mismatch ''bananas'' extend way out of the space covered by the template bank.

histograms_test2.pdf

Information on GenerateRandomBinaryTemplates.c

input:
  • no input files needed

output:
  • an ascii file containing the randomly placed templates, one per line

command line arguments:
  • -h, --help:Prints help message.
  • -T, --tobs [double]: The coherent observation span (in seconds).
  • -p, --periodmin [double]: The lower boundary of the search range for orbital period (in seconds).
  • -P, --periodmax [double]: The upper boundary of the search range for orbital period (in seconds).
  • -f, --f0 [double]: The maximum signal frequency (in Hz).
  • -m, --minNSmass [double]: The minimum NS mass (in solar masses).
  • -M, --maxcompmass [double]: The maximum companion mass (in solar masses).
  • -u, --mismatch [double]: The maximum mismatch we wish to achieve over a fraction eta of the space.
  • -e, --eta [double]: The fraction of parameter space we wish to cover at mismatch mu.
  • -r, --popfraction [double]: The fraction of the population at extreme companion mass we are sensitive to.
  • -A, --approx [int]: Switching the tau << T_obs approximation on (=1) and off (=0).
  • -n, --nperiod [int]: The number of boxes over which to divide the orbital period dimension.
  • -N, --nasini [int]: The number of boxes over which to divide the projected orbital semi-major axis dimension.
  • -v, --npsi [int]: The number of boxes over which to divide the orbital initial phase dimension.
  • -o, --outputfile [string]: Path name of the template bank output file.
  • -d, --debug [int]: Set flag for outputting status information for debugging.

Quick Code Walktrough for GenerateRandomBinaryTemplates.c
  • command line parsing
  • compute n-ball volume, cube sizes, random template density
  • setting up random number generator
  • loop over cubes and place random templates inside
  • loop over templates generated and use threshold on fraction of population we want to see to throw away templates outside the parameter space

Additional information

  • RadioPulsars.pdf: Some general information on radio pulsar searches by Jim Cordes

  • parameter_ranges.pdf: Some thoughts on how to constrain our parameter space based on astrophysics (Jim Cordes)

-- BenjaminKnispel - 20 Nov 2008

Topic attachments
I AttachmentSorted ascending Action Size Date Who Comment
hist1.pdfpdf hist1.pdf manage 14 K 04 Dec 2008 - 08:04 BenjaminKnispel histograms of mismatch for template bank test 2: three points in orbital parameter space, fixed frequency at 50 Hz, tested against 500 different random template banks
hist2.pdfpdf hist2.pdf manage 14 K 04 Dec 2008 - 08:01 BenjaminKnispel histograms of mismatch for template bank test 1: fixed point in orbital parameter space, different frequency, tested against 500 different random template banks
injections.pdfpdf injections.pdf manage 492 K 04 Dec 2008 - 08:03 BenjaminKnispel template bank test 2: three points in orbital parameter space, fixed frequency at 50 Hz, tested against 500 different random template banks
injections2.pdfpdf injections2.pdf manage 698 K 04 Dec 2008 - 08:00 BenjaminKnispel template bank test 1: fixed point in orbital parameter space, different frequency, tested against 500 different random template banks
param_space.pdfpdf param_space.pdf manage 120 K 04 Dec 2008 - 07:54 BenjaminKnispel Background information on the template placement
parameter_ranges.pdfpdf parameter_ranges.pdf manage 101 K 05 Dec 2008 - 07:35 BenjaminKnispel Some thoughts on how to constrain our parameter space based on astrophysics (Jim Cordes)
pipeline.pngpng pipeline.png manage 195 K 04 Dec 2008 - 10:01 BenjaminKnispel graphical overview of the AEI pulsar search pipeline
RadioPulsars.pdfpdf RadioPulsars.pdf manage 119 K 05 Dec 2008 - 07:34 BenjaminKnispel Some general information on radio pulsar searches by Jim Cordes
review1.pngpng review1.png manage 281 K 04 Dec 2008 - 09:37 BenjaminKnispel Visualisation of result files for run on J1949+31
review2.pngpng review2.png manage 212 K 04 Dec 2008 - 09:39 BenjaminKnispel Visualisation of result files for run on J1947+19
review3.pngpng review3.png manage 87 K 04 Dec 2008 - 10:36 BenjaminKnispel whitening test 1
review4.pngpng review4.png manage 87 K 04 Dec 2008 - 10:36 BenjaminKnispel whitening test 2
review5.pngpng review5.png manage 87 K 04 Dec 2008 - 10:36 BenjaminKnispel whitening test 3
This topic: Main > RadioPulsarMain > RadioPulsarSearchCodeReview
Topic revision: 26 Feb 2009, BenjaminKnispel
This site is powered by FoswikiCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback