-- Main.JingMing - 06 Nov 2013

1: lalapps_Makefakedata_v4:

used to produce the fake data. The command code is like this:

DataTests jiming$ lalapps_Makefakedata_v4 --outSingleSFT=TRUE --outSFTbname="parameter_test_TB100_1days3e-25.sft" --logfile="./parameter_test_TB100_1days3e-25.log" --IFO="H1" --ephemDir="/Users/jiming/mylal/share/lalpulsar" --startTime=1065789985 --duration=8640000 --Tsft=1800 --fmin=184.75 --Band=0.5 --window="tukey" --refTime=1070109985 --Alpha=2.9 --Delta=0.7 --h0=3e-25 --cosi=0.25 --psi=0.25 --phi0=0.25 --Freq=185 --noiseSqrtSh="3e-23" ### S/N=0.01

DataTests jiming$ lalapps_Makefakedata_v4 --outSingleSFT=TRUE --outSFTbname="parameter_test_TB100_1days3e-24.sft" --logfile="./parameter_test_TB100_1days3e-24.log" --IFO="H1" --ephemDir="/Users/jiming/mylal/share/lalpulsar" --startTime=1065789985 --duration=8640000 --Tsft=1800 --fmin=184.75 --Band=0.5 --window="tukey" --refTime=1070109985 --Alpha=2.9 --Delta=0.7 --h0=3e-24 --cosi=0.25 --psi=0.25 --phi0=0.25 --Freq=185 --noiseSqrtSh="3e-23" ### S/N=0.1

Note: the data produced by lalapps_Makefakedata_v4 is under the current directory. Make sure the data produced into " cd /Users/jiming/uniwork/DataTests"

Comments (anyone can leave some comments, suggestions or questions here):


2:lalapps_ComputeFStatistic_v2:

used to calculate the F-statistic for a given parameter-space of pulsar GW signals. It can demodulate the data and combine each SFT.

We use it to set the templates to filter the data and get the template candidates which have the highest 2F.

The command code is like this:

lalapps_ComputeFStatistic_v2 --maxEndTime=1074429985 --minStartTime=1065789985 --DataFiles="parameter_test_TB100_1days3e-25.sft" --outputFstat="outputTB100days3e-25" --outputFstatHist="outputTB100_1daysHist" --outputLoudest="outputTB100_1daysLoudest" --outputTransientStats="outputTB100_1daysTransientStats" --outputLogfile="./TB100_1days.log" --IFO="H1" --ephemDir="/Users/jiming/mylal/share/lalpulsar" --refTime=1070109985 --Alpha=2.9 --Delta=0.7 --Freq=184.95 --FreqBand=0.1 --ephemYear="00-19-DE405" --dFreq=5e-5 --NumCandidatesToKeep=2000 ##S/N=0.01

lalapps_ComputeFStatistic_v2 --maxEndTime=1074429985 --minStartTime=1065789985 --DataFiles="parameter_test_TB100_1days3e-24.sft" --outputFstat="outputTB100days3e-24" --outputFstatHist="outputTB100_1daysHist" --outputLoudest="outputTB100_1daysLoudest" --outputTransientStats="outputTB100_1daysTransientStats" --outputLogfile="./TB100_1days.log" --IFO="H1" --ephemDir="/Users/jiming/mylal/share/lalpulsar" --refTime=1070109985 --Alpha=2.9 --Delta=0.7 --Freq=184.95 --FreqBand=0.1 --ephemYear="00-19-DE405" --dFreq=5e-5 --NumCandidatesToKeep=2000 ##S/N=0.1

lalapps_ComputeFStatistic_v2 --maxEndTime=1074429985 --minStartTime=1065789985 --DataFiles="parameter_test_TB100_1days3e-24.sft" --outputFstat="outputTB100days3e-24_hr" --outputFstatHist="outputTB100_1daysHist" --outputLoudest="outputTB100_1daysLoudest" --outputTransientStats="outputTB100_1daysTransientStats" --outputLogfile="./TB100_1days.log" --IFO="H1" --ephemDir="/Users/jiming/mylal/share/lalpulsar" --refTime=1070109985 --Alpha=2.9 --Delta=0.7 --Freq=184.9995 --FreqBand=0.001 --ephemYear="00-19-DE405" --dFreq=5e-8 --NumCandidatesToKeep=2000 ##S/N=0.1 high resolution

Note:

1:use --outputFstatAtoms carefully because this command will produce a ".dat" file for each template. It is possible that you get millions of them which would make your labtop a mess. If such terrible situation happen, here is the terminal command to sovle the problem:

"find . -type f -name outputTB\* -delete -print".

2: if the NumCandidatesToKeep is more than templates, --outputFstat will only have 5 highest 2F candidates. WEIRD.

3: the --refTime=1070109985 here should be exactly same with the refTime of the fakedata.

Question: what is the reference time ? What does it influence ?

Comments (anyone can leave some comments, suggestions or questions here):



3:Matelab Plot:

Here I plot the results which lalapps_ComputeFStatistic_v2 produced:

matlab codes:

c=load('/Users/jiming/uniwork/DataTests/fliename');
figure;plot(c(:,1),c(:,7),'*c');

Picture:

2Ffor_e-25.jpg

2fFfor_2-24.jpg

2Ffor_e-24_hr.jpg

Note:

Question: for the above figure, why there are only a part of templates which are very close to 185Hz having the relatively large value of 2F?

Good question ! Hint: The Fstatistic is a matched filter, a likelihood. What does one do when one match-filters the data with a template ?

Comments (anyone can leave some comments, suggestions or questions here):



4:expected value of 2F:

What is the expected valuse of 2F for a template that matches perfectly well signal's parameters?

The probability density distributions of F_1 both when the signal is absent and present are well known.

When the signal is absent 2F has a chi^2 distribution with four degrees of freedom

and when signal is present it has a noncentral chi^2 distribution with four degrees of freedom and noncentrality parameter λ=SNR^2.

E(2F)=E(X1^2+X2^2+X3^2+X4^2+SNR^2)

for each Xi:

we hvae

Expected Value of normal Gaussian distribution:E(Xi)=0, variance D(Xi)=1,

E(Xi^2)=[E(Xi)]^2+D(Xi)=1,

therefore, we get the result:

E(2F)=4+SNR^2

Please define your notation: E(x) and D(x).

What does this table refer to ?

Table:Consistency between predictions and observations
Tsft (min)
SFT band (Hz)
Search frequency band (Hz)
N_templates
Obs. mean(2F)
Obs. max(2F)
E[2F]
Pred. Std(2F)
Tspan (days)
   
30 0.5 0.1 2000 4.0452 35.92 68.5   100    
30 0.5 0.1 2000 6.5265 5034 6454   100    
30 0.5 0.001 20000 5.840 5078 6454   100    

*Note:*

Comments (anyone can leave some comments, suggestions or questions here):

5:How to calculate λ(SNR^2) above in chapter 4.

λ=d^2,where d=√(h|h),

h is the GW signal and it is related to many parameters.

the scalar product (.|.)here need to integrate h^2 over observeration time.

Therefore, d^2 are complicated functions of sky pos angles(α δ), orientation anhgles(ψ ι) and woble angle θ.

The expression of d^2 should have 2 parts.

One is periodic in the reletively long observation time (say, 120days). The integration of this part over reletively long observation time should considered as 0.

the second part is not periodic, so it is related to sky pos angles(α δ), orientation anhgles(ψ ι), woble angle θ and of course, the observation time T0.

In order to calculate this, we should know the values of parameters above, or we can calculate the mean value under the assumption of sources' homogenous.

Only the woble angle θ we can't simply assume it is uniform in [0,pi/2], because there should be a physical mechanism which makes θ have some probability distribution. We can assume θ=pi/2, so the there would me only one component in h.


6: PDF of F

Here we only consider the condition that there is only one component for simplicity.

As mentioned above in charpter4, if signal is not present 2F has a chi^2 distribution with four degrees of freedom and if signal is present 2F has a noncentral chi^2 distribution with four degrees(noncentrality parameter λ=d^2).

therefore,

PDF_0(2F)=1/2*F*exp(-F); for signal not present ,

PDF_1(2F)=(1/2d)*sqr(2F)*[I_1(d*sqr(2F))]*exp(-F-0.5*d^2); for signal present,

Once we have the PDF(2F), we can know the false alarm probability without signal and the detection probability with a signal.

I can set a threshold F0, and if the 2F larger than 2F0 when there is no signal, we call that fasle alarm. We can get the exactly value of F0 by giving a alse alarm probability like 1%.

then the integration of PDF_1(2F) over from 2F0 to +∞ this the detection probability.


7: a simple simulation

Map's question:

How could we verify that our search code produce results consistent with theoretical results (described above in chapter 6)?

In order to find out the answer of this question, I did the procedures as below:

1:using Python to write a script. this script would call lalapps to produce data and compute the 2F. in this script, i made 1000 times simulation for both with and without signal conditions. The code would record the result of each simulation and output them as a file.

code for the situation with signal:

note: in the signal case, using the perfect match template.

import pdb

import os, sys

import signal, subprocess, shutil, shlex, random

def run_bash_command(cmdline):

# print to screen what the command line is you want to perform

#print "I perform: %s" % (cmdline)

# try to run command line; catch the exception in case of failure and exit with a note

try:

returnValue = subprocess.call(shlex.split(cmdline))

except OSError, (errno, strerror):

print "I failed to run your code. Hint: %s" % (strerror)

sys.exit(1)

os.chdir("/Users/jiming/uniwork/simulation/")

#command_line = ''' cd ~/uniwork/simulation/ '''

#run_bash_command(command_line)

#command_line0 = ''' . ~/mylal/etc/lalpulsar-user-env.sh '''

#run_bash_command(command_line0)

# get path to current working directory

mainpath = os.getcwd()

#print "%s" % (mainpath)

#loop: how many times of simulation wants to do. results_predict is the results of the predict value of 2F

results = list()

results_predict = list()

for number in range(1,1001):

print "%s" % (number)

tmpfile1 = 'tmpfile1'

tmpfile2 = 'tmpfile2'

tmpfile3 = 'tmpfile3'

#frequency = random.uniform(184.9995,185.0005)

# define the command line to run; it calls a lalapps_Makefakedata, a lalapps_ComputeFStatistic and a lalapps_PredictFStat

command_line1 = '''lalapps_Makefakedata_v4 --outSingleSFT=TRUE --outSFTbname="%s" --IFO="H1" --ephemDir="/Users/jiming/mylal/share/lalpulsar" --startTime=1065789985 --duration=864000 --Tsft=1800 --fmin=184.75 --Band=0.5 --refTime=1066221985 --Alpha=2.9 --Delta=0.7 --h0=3e-24 --cosi=0.25 --psi=0.25 --phi0=0.25 --Freq=185 --noiseSqrtSh=3e-23''' % (tmpfile1)

command_line2 = '''lalapps_ComputeFStatistic_v2 --DataFiles="%s" --outputFstat="%s" --IFO="H1" --ephemDir="/Users/jiming/mylal/share/lalpulsar" --refTime=1066221985 --Alpha=2.9 --Delta=0.7 --Freq=185 --FreqBand=0 --ephemYear="00-19-DE405" --dFreq=5e-8 --NumCandidatesToKeep=1 ''' % (tmpfile1, tmpfile2)

command_line3 = '''lalapps_PredictFStat --IFO="H1" --ephemDir="/Users/jiming/mylal/share/lalpulsar" --Alpha=2.9 --Delta=0.7 --h0=3e-24 --cosi=0.25 --psi=0.25 --phi0=0.25 --DataFiles=%s -F 185 --ephemYear="00-19-DE405" --outputFstat=%s''' % (tmpfile1, tmpfile3)

# run lalapps command (function defined above)

run_bash_command(command_line1)

run_bash_command(command_line2)

run_bash_command(command_line3)

os.remove(tmpfile1)

# read in the tmp file to get the FSatistic(2F); remove tmp file afterwards

infile = open('%s' % (tmpfile2),'r')

content = infile.readlines()

infile.close()

os.remove(tmpfile2)

# read out the 2F

#F = content[13].split()[6]

#print "%s" % (F)

#F = content[13].split()[6]

for line in content:

if not line.startswith('%'):

F = line.split()[6]

break

# read in the tmp file to get the PredictF (2F); remove tmp file afterwards

infile = open('%s' % (tmpfile3),'r')

content = infile.readlines()

infile.close()

os.remove(tmpfile3)

for line in content:

if line.startswith('twoF_expected'):

PredictF = line.split()[2]

break

#print "%s" % (PredictF)

#PredictF = content3[9].split()[2]

# append the 2F value to list

results.append(F)

results_predict.append(PredictF)

# define outputfilepath; open outputfile and write; close file at the end

outfilename = os.path.join(mainpath,'result3')

#import numpy as np

#results = np.array(results)

#np.savetext(outfilename,results)

outfile = open(outfilename, 'w')

for number in xrange(len(results)):

#outfile.write(results)

outfile.write('%s\n'%results[number])

outfile.close()

#---------------Output the Predict_F in file which matlab can read

outfilename = os.path.join(mainpath,'result_predictF')

#import numpy as np

#results = np.array(results)

#np.savetext(outfilename,results)

outfile = open(outfilename, 'w')

for number in xrange(len(results_predict)):

#outfile.write(results)

outfile.write('%s\n'%results_predict[number])

outfile.close()

#print "%s" % (results)

2 : read the simulation results by matlab, then plot the hist picture of the simulation results and compare this with theoretical predict.

note:

1 the noncentrality parameter λ=snr^2 used in ploting the theoretical PDF is produced by lalapps_PredictFStat. Compute the mean of 1000 *PredictFStat we get noncentral parameter 642.8-4=638.8*

2 the functions used in matlab are chi2pdf and ncx2pdf for without and with signal respectively

Here is the results:

simulation_nosignal.jpg

simulation_with_signal_v2_0.jpg

The pictures show in non-signal case the simulation result matchs theory, but in signal case the PDF 2F produced by search code (simulation) shift left.

This means simulation result is smaller than theory.

Comments (anyone can leave some comments, suggestions or questions here):

After talked with Map about this inconsisitency between CFS(computerFStatistic) and PFS (predictFStatistic)gives, at first we thought this due to the f-sataistic bias described in Reinhard's note:

https://dcc.ligo.org/LIGO-T1100551

but the bias due to fact that E(1/X) not equals 1/E(x) can't be used to explain why the hist shift left( this positive bias only makes the hist even right!)

then we thought it might be that something in PFS is not right. therefore we tried another lalapp which don't need the specific data file, here is the command line:

hnb478:makeupoflal jiming$ lalapps_SemiAnalyticF -a 2.9 -d 0.7 -Y 0.25 -Q 0.25 -i 0.25 -s 3e-24 -N 3e-23 -n 480 -D H1 -E "/Users/jiming/mylal/share/lalpulsar" -S 1065789985

We get the noncentral parameter is 627-4=623

this time we use new noncentral parameter and 10 thousands simulation results gived by CFS, and the plot is like this:

simulation_with_signal_compare_with_semianalyticF.jpg

well this plot using the data from 10 thousand simulation, so it must be much more accurate than the previous one.

we can see that using the noncentral parameter produced by lalapps_SemiAnalyticF, the hist is more matched than the previous. However, We can still see the hist shifts to left a little bit. Actully, the mean of 2F from CFS is 614.1 witch is still 2% smaller than 627 produced by lalapps_SemiAnalyticF.

Now, one question becomes two.

1: Why the hist is still little smaller? where is wrong????

2: Why the PFS gives us a such large 2f???

614(CFS)????? < 627(SAF)?????? < 643(PFS)

8: 614(CFS)< 627(SAF)< 643(PFS):

1: the reason 627(SAF)< 643(PFS)

The main difference between code PFS and SAF is that PFS needs we specify a "stf" data file but SAF not.

In PFS, we need to give a parameter named "RngMedWindow" (default=50) to estimate the noise-floor which is Sh. While in SAF we just directly give the Sh value in Command line.

To get the 2F both SAF and PFS need to divide Sh. If the Sh smaller, the 2F is lagerer.

How does it estimate Sh in PFS?

At first the code takes some bins from the whole Frequency range. Each bin is 1/Tsft and the number of bins the code take out to calculate Sh is RngMedWindow. In order to minimize the influence caused by bins which is signal frequency when estimating the noise-floor. The code use mean(1/sh) but not mean (sh). Therefore, in some bins, the 1/sh is much smaller than which is in the most of other bin. Everytime we estimate the noise-floor using PFS, sh is always a litte bit smaller that the value it should be.

8: simulation results by setting different value of Tsft and Dterms:

fix Dterms:

Dterms=16&Tsft=120 ---> 2f=619.5426
Dterms=16&Tsft=180 ---> 2f=619.2570
Dterms=16&Tsft=600 ---> 2f=623.8218
Dterms=16&Tsft=900 ---> 2f=621.4189
Dterms=16&Tsft=1800 ---> 2f=613.8500

Dterms=128&Tsft=600 ---> 2f=629.5912
Dterms=128&Tsft=900 ---> 2f=628.7768
Dterms=128&Tsft=1800 ---> 2f=626.8661

fix Tsft=1800:

Dterms=32: ---> 2f=620.3307
Dterms=64: ---> 2f=620.3575
Dterms=128: ---> 2f=626.1488
Dterms=256: ---> 2f=623.6630
Dterms=380: ---> 2f=624.6019

From the result above we can see that

1:setting higher Dterms could give us higher 2f.

2:lower Tsft could reduce 2f, but giving a fixed Dterms if we reduce Tsft Dterms will cover more percentage of bins which would help us to get larger 2f.

Therefore the highest value of 2f is around Dterms=128&Tsft=600 which gives us 2f=629.5912.

> Here are the signal-only results(without noise floor estimation):
>
> Tsft=1800:
> Tsft=1800&Dterms=8 ---> 2f=5438.72(CSF)&5611.64 (PSF)
> Tsft=1800&Dterms=16 ---> 2f=5510.94(CSF)&5611.64 (PSF)
> Tsft=1800&Dterms=32 ---> 2f=5547.25(CSF)&5611.64 (PSF)
> Tsft=1800&Dterms=64 ---> 2f=5565.43(CSF)&5611.64 (PSF)
> Tsft=1800&Dterms=128 ---> 2f=5574.48(CSF)& 5611.64(PSF)
> Tsft=1800&Dterms=256 ---> 2f=5578.90(CSF)&5611.64 (PSF)
> Tsft=1800&Dterms=380 ---> 2f=5580.25(CSF)& 5611.64(PSF)
>
> Tsft=900:
> Tsft=900&Dterms=8 ---> 2f=5451.76(CSF)&5611.58 (PSF)
> Tsft=900&Dterms=16 ---> 2f=5526.01(CSF)&5611.58 (PSF)
> Tsft=900&Dterms=32 ---> 2f=5563.29(CSF)&5611.58 (PSF)
> Tsft=900&Dterms=64 ---> 2f=5581.86(CSF)&5611.58 (PSF)
> Tsft=900&Dterms=128 ---> 2f=5590.89(CSF)&5611.58 (PSF)
> Tsft=900&Dterms=190 ---> 2f=5593.62(CSF)&5611.58 (PSF)
>
> Tsft=600:
> Tsft=600&Dterms=8 ---> 2f=5474.06(CSF)&5611.57 (PSF)
> Tsft=600&Dterms=16 ---> 2f=5538.66(CSF)&5611.57 (PSF)
> Tsft=600&Dterms=32 ---> 2f=5571.01(CSF)&5611.57 (PSF)
> Tsft=600&Dterms=64 ---> 2f=5587.00(CSF)&5611.57 (PSF)
> Tsft=600&Dterms=128 ---> 2f=5594.54(CSF)&5611.57 (PSF)
>
> Tsft=300:
> Tsft=300&Dterms=8 ---> 2f=5554.04(CSF)&5611.56 (PSF)
> Tsft=300&Dterms=16 ---> 2f=5579.63(CSF)&5611.56 (PSF)
> Tsft=300&Dterms=32 ---> 2f=5592.30(CSF)&5611.56 (PSF)
> Tsft=300&Dterms=64 ---> 2f=5598.33(CSF)&5611.56 (PSF)
>
> Tsft=150:
> Tsft=150&Dterms=8 ---> 2f=5589.54(CSF)&5611.56 (PSF)
> Tsft=150&Dterms=16 ---> 2f=5596.75(CSF)&5611.56 (PSF)
> Tsft=150&Dterms=32 ---> 2f=5600.19(CSF)&5611.56 (PSF)
>
> You can see that this result is consistent with the result of noise-floor estimated case which the maximum given by CFS is 629 while PFS is 643. There is 2% of difference can't be explained by Dterm setting. Here the signal-only case also has 2% difference no matter how Dterms and Tsft setting.

> Comparing signal-only result and noise-floor-estimated result, at least we can draw a conclusion that that noise-floor-estimating process won't be responsible for losing power in CFS compared with PFS.

Reihard's comment:

1) compare the resulting antenna-pattern function numbers computed by
both codes,

2) see if there's some other approximation in LALDemod() that we haven't
considered, or

3) if there's some approximation in makefakedata that we forgot about ...
Topic attachments
I Attachment Action Size Date Who Comment
2Ffor_e-24_hr.jpgjpg 2Ffor_e-24_hr.jpg manage 44 K 11 Nov 2013 - 13:51 JingMing  
2Ffor_e-25.epseps 2Ffor_e-25.eps manage 203 K 11 Nov 2013 - 13:04 JingMing  
2Ffor_e-25.figfig 2Ffor_e-25.fig manage 31 K 11 Nov 2013 - 13:01 JingMing  
2Ffor_e-25.jpgjpg 2Ffor_e-25.jpg manage 54 K 11 Nov 2013 - 13:51 JingMing  
2fFfor_2-24.jpgjpg 2fFfor_2-24.jpg manage 62 K 11 Nov 2013 - 13:51 JingMing  
simulation_nosignal.jpgjpg simulation_nosignal.jpg manage 62 K 01 Dec 2013 - 23:06 JingMing  
simulation_with_signal.jpgjpg simulation_with_signal.jpg manage 56 K 01 Dec 2013 - 23:06 JingMing  
simulation_with_signal_compare_with_semianalyticF.jpgjpg simulation_with_signal_compare_with_semianalyticF.jpg manage 40 K 11 Dec 2013 - 16:23 JingMing  
simulation_with_signal_v2_0.jpgjpg simulation_with_signal_v2_0.jpg manage 66 K 02 Dec 2013 - 21:12 JingMing  
Topic revision: r18 - 05 Feb 2014, JingMing
This site is powered by FoswikiCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback