Cleaning your $HOME

Quick steps for those who have no time to spare

  1. Locate a directory you don't need right now but which has accumulated a lot of small Condor log files. For the following, assume this directory is named ProjectA.
  2. Run screen -d -m tar --remove-files -czf ProjectA.tar.gz ProjectA (this starts screen in the background).
  3. That's it. Once tar is done, the screen session terminates on its own. If you want to attach to the session, run screen -r. The full sequence is spelled out right below.
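Put together, the whole procedure is only two commands. A minimal sketch, assuming GNU tar and screen are installed and the directory really is called ProjectA:

# start a detached screen session that packs the directory and removes the originals
screen -d -m tar --remove-files -czf ProjectA.tar.gz ProjectA
# optional: reattach later to check on progress
screen -r
# optional: once tar has finished, peek into the tarball to make sure it looks sane
tar -tzf ProjectA.tar.gz | head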

After a while all files will be in the tarball and the directory will vanish, saving precious space and making your home file server faster (a tiny bit at least).

Longer story

After working on several projects at the same time over several months, it's inevitable that you leave behind a lot of files you don't really need anymore but want to keep "just in case". This, however, can cause quite a bit of pain for the admins, especially when moving a home file server to a new venue (which can take anywhere from days to more than a week!).

To alleviate this problem, which is mostly caused by a very large number of small files, please consider putting your files into an "archive", i.e. run tar on a directory you don't need anymore. See the final section at the bottom for a rough idea of the potential costs and benefits.

Compressing old directories

If you have such old analysis directories that you don't want to lose, but you want to help make the world a happier place, you can just run (assuming the directory of the old analysis is called oldbreakthru) tar czf oldbreakthru.tar.gz oldbreakthru and afterwards delete the directory with rm -rf oldbreakthru (but make sure NOT to delete the tar.gz file).
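Since rm -rf is unforgiving, it is worth checking the archive before deleting anything. A minimal sketch, using GNU tar's --compare mode and the example directory name oldbreakthru from above:

tar czf oldbreakthru.tar.gz oldbreakthru
# compare the archive against the directory; only remove if nothing differs
tar --compare --gzip --file oldbreakthru.tar.gz && rm -rf oldbreakthru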

tar can do all of this in one go with the following command (now with long options; please have a look at tar's man page for more information and for how to use other compression algorithms):

tar --create --gzip --remove-files --file oldbreakthru.tar.gz oldbreakthru
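The same pattern works with the other compressors supported by recent GNU tar; for example, xz compresses noticeably better at the cost of much more CPU time (see the table below). A sketch:

# same as above, but with xz instead of gzip
tar --create --xz --remove-files --file oldbreakthru.tar.xz oldbreakthru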

If you happen to have one directory full of directories of old analyses, you can do the same for every directory there with a little bit of shell magic (using bash style; assuming all directories start with OLD):

for dir in OLD*; do tar --create --gzip --remove-files --file "$dir.tar.gz" "$dir"; done
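If some of the matches might be files rather than directories, or the pattern might not match anything at all, a slightly more defensive variant helps (a sketch, still assuming the OLD* naming convention):

for dir in OLD*/; do
    dir=${dir%/}              # strip the trailing slash added by the glob
    [ -d "$dir" ] || continue # only archive actual directories
    tar --create --gzip --remove-files --file "$dir.tar.gz" "$dir"
done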

Why should you, the admins, or just anyone care?

You might wonder why we (the admins) care about this so much, and the answer is simply performance and the costs involved. If you take the trial dataset described below, the "real" size of this data set is 729.57 MB, which you get if you just add up all file sizes as shown by ls. However, this is only part of the truth. The underlying file system stores data in multiples of some block size, which speeds up access overall but wastes space. Thus, on a file system with 512 byte blocks this particular data set would use 736.34 MB; on a Sun Thumper (s01..s13) with a block size of 32k it is already 906.75 MB; and finally on the HSM (where $HOME is on a file system with qfs in the name), the block size is already 256k because the data is striped across many disk sets. There, the same data set suddenly consumes 5443.5 MB(!).

Thus, just by copying data from one server to another, we increase the space it consumes by a factor of about 6, purely due to the rounding to the different block sizes governed by the underlying file systems.
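The per-file overhead is easy to estimate yourself. A minimal sketch (assuming GNU find) that sums the sizes of all files under the current directory, each rounded up to a given block size; 262144 bytes is the 256k HSM block size from the example above:

BS=262144   # block size in bytes; 256k as in the HSM example
find . -type f -printf '%s\n' | awk -v bs="$BS" '
    { total += int(($1 + bs - 1) / bs) * bs }   # round each file up to a full block
    END { printf "%.2f MB\n", total / 1024 / 1024 }'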

Trial data set

The trial dataset was involuntarily provided by an anonymous user (i.e. the user did not know about it ;)). It contains 74484 files and has a total size of 929 MByte (mostly Condor logs). The following table shows the results of compressing this data set into a tarball. To exclude file-system-related effects, the data set resided on a ramdisk and the tarball was also written to a ramdisk:

compression tool   final size [MB]   time needed [s]   relative size   relative time to tar
xz                 502.6             393.8             0.63            186.7
lzip               503.4             665.4             0.63            315.5
lzma               502.6             395.0             0.63            187.3
gzip               521.0             33.4              0.66            15.9
bzip2              528.0             221.8             0.67            105.1
lzop               547.4             15.7              0.69            7.5
compress           764.7             52.6              0.96            24.9
null               792.8             2.1               1.00            1.0
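The "null" row is a plain tar run without any compression and serves as the baseline for the relative columns. If you want to repeat the measurement on your own data, a rough sketch of the kind of commands involved (the directory name dataset and the ramdisk path /dev/shm are assumptions, not the original setup):

# time an uncompressed baseline, then a gzip run; adjust paths to your ramdisk
time tar --create --file /dev/shm/dataset.tar dataset
time tar --create --gzip --file /dev/shm/dataset.tar.gz dataset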
