Directory hierarchy for LSC files

Storage structure for S4/S5/S6 data (past)

In the past we used paths like these:
H/H1/RDS/C03/L1/H-H1_RDS_C03_L1-822092472-60.gwf
H/97161/H-H1_RDS_R_L1_971619968-64.gwf
SFT/B/H-1_H1_1800SFT_C03hoftCorrectSegsFmin38Hz-874533574-1800.sft

CIT uses this structure in its archive:
S5/strain-L1/LLO/L-L1_RDS_C03_L1-8193/L-L1_RDS_C03_L1-819365103-128.gwf

As things stand now, it's a mess.

Future proposed hierarchy for both HSM and file server storage

Time based data products

The following layout has been proposed to address these points:

  • standardized storage hierarchy for file servers as well as our archive on the HSM; as a beneficial side effect, we can use autofs to transparently serve files from the HSM if a file server is not reachable or under too much stress
  • not too many files/directory entries per directory (max should be of the order of 1000)
  • files should also be findable even if we don't have access to LDR's database
  • based on discussions with Dan, Stuart, Ed, Jeff, Scott and Martin, we adjusted the layout a bit to also include the "Frame Type"

Thus we came up with the following for time based data products
Observatory/Type/GPSlead/GPSrem/file

where
Observatory
Any Observatory abbreviation (H/I/K/L/V/...) and any combination thereof, e.g. GHLV
Type
Type of file, e.g. H1_RDS_R_L1, H1_RDS_C03_L1, 1_H1_1800SFT_C03hoftCorrectSegsFmin38Hz, ...
GPSlead
first 4 digits of the GPS time after zero-padding it to 10 digits, e.g. 822092472 -> 0822
GPSrem
remainder of 10 digit GPS time rounded down to full 1000, e.g. 1234567890 -> 1234 / 567000
file
Full regular file name according to convention

Thus the examples from above would now have the following structure:
H/H1_RDS_C03_L1/0822/092000/H-H1_RDS_C03_L1-822092472-60.gwf
H/H1_RDS_R_L1/0971/619000/H-H1_RDS_R_L1_971619968-64.gwf
H/1_H1_1800SFT_C03hoftCorrectSegsFmin38Hz/T/0874/533000/H-1_H1_1800SFT_C03hoftCorrectSegsFmin38Hz-874533574-1800.sft
L/L1_RDS_C03_L1/0819/365000/L-L1_RDS_C03_L1-819365103-128.gwf
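
The mapping from file name to directory can be sketched in a few lines of shell. This is only an illustration, assuming the file name has the form Observatory-Type-GPSstart-Duration.ext with dashes as separators and a GPS time written without leading zeros; the function name lsc_time_path is made up for this page and is not part of any existing tool.

```shell
# Sketch: derive Observatory/Type/GPSlead/GPSrem/file from a file name.
# lsc_time_path is a hypothetical helper name, not an existing tool.
# The body runs in a subshell, so its variables stay local.
lsc_time_path() (
    file=$1
    # Split "Observatory-Type-GPSstart-Duration.ext" at the dashes.
    obs=${file%%-*};  rest=${file#*-}
    type=${rest%%-*}; rest=${rest#*-}
    gps=${rest%%-*}   # assumed to carry no leading zeros
    # GPSlead: leading 4 digits of the zero-padded 10 digit GPS time.
    # GPSrem: remaining 6 digits, rounded down to a full 1000.
    printf '%s/%s/%04d/%06d/%s\n' "$obs" "$type" \
        $((gps / 1000000)) $((gps % 1000000 / 1000 * 1000)) "$file"
)

lsc_time_path H-H1_RDS_C03_L1-822092472-60.gwf
# prints H/H1_RDS_C03_L1/0822/092000/H-H1_RDS_C03_L1-822092472-60.gwf
```

File names that do not separate the GPS time with a dash would need to be handled separately.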

Frequency based data products

Mostly the same as before, but as GPS times are rather static, we simply sort the files into 100 Hz frequency bands (and add the F directory level). The proposed structure is
Observatory/Type/Subtype/GPSstart/band

where
Observatory
Any Observatory abbreviation (H/I/K/L/V/...) and any combination thereof, e.g. GHLV
Type
is the data product type (SFT)
Subtype
F for frequency based data product
GPSstart
full 10 digit GPS start second of the data products (as we expect many files with the exact same starting time, this can be used as a reference to S5a, S5b, S6a, ...)
band
4 digit, zero prefixed 100 Hz band, e.g. 0200

i.e.
H/SFT/F/1234567890/0200/H-H1_SFT_0245Hz-1234567890-17284566.sft
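
As a rough sketch of how such a path could be assembled: the helper below takes observatory, GPS start time, frequency in integer Hz and the file name as arguments. The name lsc_freq_path and the choice to pass the frequency explicitly (instead of parsing it from the file name) are assumptions made for illustration only. Both numbers must be given without leading zeros, as the shell would otherwise read them as octal.

```shell
# Sketch: build Observatory/SFT/F/GPSstart/band/file for a frequency based SFT.
# lsc_freq_path is a hypothetical helper name, not an existing tool.
# The body runs in a subshell, so its variables stay local.
lsc_freq_path() (
    obs=$1 gps=$2 freq=$3 file=$4
    # Round the frequency down to the start of its 100 Hz band and
    # zero-pad it to 4 digits; the GPS start time is padded to 10 digits.
    printf '%s/SFT/F/%010d/%04d/%s\n' "$obs" "$gps" \
        $((freq / 100 * 100)) "$file"
)

lsc_freq_path H 1234567890 245 H-H1_SFT_0245Hz-1234567890-17284566.sft
# prints H/SFT/F/1234567890/0200/H-H1_SFT_0245Hz-1234567890-17284566.sft
```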

Note: this might change once a community standard has been proposed.

Exceptions et al.

  • The subtype for the SFT type should be T for time based files (i.e. full bandwidth files) and F for frequency based files (i.e. long duration but very limited frequency band).

Distribution

At the time of writing (2013) we have 37 data servers in operation, across which the data is striped. The target server is found by taking the GPS time, dividing by 1000, truncating the result, taking the modulus with respect to 37 and adding 1, i.e.:

target_server=$(printf "d%02d" $((1 + (GPSTIME/1000) % 37)))

Example:
echo H-H1_NINJA2_GAUSSIAN-875698108-4096.gwf | gawk -F"-" '{FRONT=int($3/1e6); printf "d%02d:/data/LSC/%s/%s/%04d/%06d/%s-%s-%s-%s\n",1+int($3/1000)%37,$1,$2,FRONT,1000*int(($3-1e6*FRONT)/1000),$1,$2,$3,$4}'
d20:/data/LSC/H/H1_NINJA2_GAUSSIAN/0875/698000/H-H1_NINJA2_GAUSSIAN-875698108-4096.gwf
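
To see how the striping behaves, the formula can be wrapped in a small shell function and applied to neighbouring 1000 s blocks. This is only a sketch: lsc_server is a made-up name, and the server count of 37 is hard-coded from the text above.

```shell
# Sketch: map a GPS time to its data server name (d01 ... d37).
# lsc_server is a hypothetical helper name, not an existing tool.
lsc_server() {
    # Integer division drops the last three digits; the modulus picks
    # one of 37 servers, shifted by 1 so that numbering starts at d01.
    printf 'd%02d\n' $((1 + ($1 / 1000) % 37))
}

# Consecutive 1000 s blocks land on consecutive servers ...
lsc_server 875698108   # d20
lsc_server 875699108   # d21
# ... and wrap around after the last server.
lsc_server 875715108   # d37
lsc_server 875716108   # d01
```

All files that share a GPSrem directory therefore end up on the same server.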
-- CarstenAulbert - 10 Apr 2012
Topic revision: r5 - 13 Jun 2014, HenningFehrmann