GlusterFS is a clustered file system capable of scaling to several petabytes. It aggregates various storage bricks over InfiniBand RDMA or TCP/IP interconnect into one large parallel network file system. GlusterFS is based on a stackable user-space design without compromising performance.
How it works
The GlusterFS server allows you to export volumes over the network. The GlusterFS client mounts GlusterFS volumes into the kernel VFS. Much of the functionality in GlusterFS is implemented as translators.
The volume specification defines your GlusterFS file system design layout, and hence its behavior. Each volume in the spec file selects an appropriate translator module with corresponding configuration options. Through this volume spec file, you can completely program the GlusterFS filesystem by arranging translators and modules in a graph with various options.
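As an illustration, here is a minimal sketch of a server-side and a client-side volume spec. The directory, host address, and volume names are purely illustrative, and exact option names vary between GlusterFS 1.x releases:

```
# --- server spec (e.g. glusterfs-server.vol) ---
volume brick
  type storage/posix
  option directory /data/export        # illustrative export directory
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes brick
  option auth.ip.brick.allow *         # auth option naming differs between releases
end-volume

# --- client spec (e.g. glusterfs-client.vol) ---
volume remote
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.1       # illustrative server address
  option remote-subvolume brick
end-volume
```

Each `volume ... end-volume` block instantiates one translator; `subvolumes` wires the blocks together into the translator graph.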
- Read Ahead Translator - read-ahead pre-fetches a sequence of blocks in advance based on its predictions.
- Write Behind Translator - multiple smaller write operations are aggregated into fewer larger write operations and written in background (non-blocking).
- Threaded I/O Translator - utilizes server idle time (while a request is blocked on disk I/O) to handle new incoming requests.
- IO-Cache Translator - caches data to reduce the load on the server.
- Stat Pre-fetch Translator - stat-prefetch fetches stat info for all files in the folder in one operation.
- Automatic File Replication Translator - the automatic-file-replication (AFR) translator implements RAID-1-like functionality for selected file types.
- Stripe Translator - the striping translator stripes files in blocks of a given size (default 128 KB) across its subvolumes (child nodes), depending on the pattern specified.
- Unify Translator - Unify translator combines multiple storage bricks into one big fast storage server.
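Performance translators like those above are stacked by naming the lower volume as a subvolume. A sketch, layered on a hypothetical client volume named `remote` (option names are from the 1.x era and may differ):

```
volume readahead
  type performance/read-ahead
  subvolumes remote
end-volume

volume writebehind
  type performance/write-behind
  option aggregate-size 131072     # illustrative: aggregate writes into 128 KB chunks
  subvolumes readahead
end-volume
```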
Scheduler decides how to distribute the new creation operations across the clustered filesystem based on load, availability and other determining factors.
- "Adaptive Least Usage" scheduler is composed of multiple least-usage sub-schedulers: disk-usage, read-usage, write-usage, open-files-usage, disk-speed-usage.
- Non-Uniform Filesystem Scheduler (NUFA) - similar to the NUMA (http://en.wikipedia.org/wiki/Non-Uniform_Memory_Access) memory design.
- Random scheduler - randomly scatters file creation across storage bricks.
- Round-Robin (RR) scheduler creates files in a round-robin fashion. Each client will have its own round-robin loop.
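For example, a unify volume over two bricks with the round-robin scheduler. The volume names `client1`, `client2`, and `ns` are assumed to be defined earlier in the spec; unify requires a dedicated namespace volume:

```
volume bricks
  type cluster/unify
  subvolumes client1 client2
  option namespace ns              # dedicated namespace volume
  option scheduler rr              # or: alu, nufa, random
end-volume
```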
- trace - trace translator produces extensive trace information for debugging purpose.
- filter - currently it only supports read-only export option.
- posix-locks - provides storage independent POSIX record locking support.
- trash - provides a 'libtrash' like feature.
- fixed-id - provides a feature where all the calls passing through this layer will be from a fixed UID and GID.
- posix - binds the GlusterFS server to underlying file system.
- server - allows you to export volumes over the network.
- client - allows you to attach to remote volumes exported by GlusterFS servers.
- rot-13 - encrypts file contents using the ROT13 method.
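For instance, an AFR mirror over two previously defined client volumes. In the 1.x era the replication count was expressed as filename patterns, which is how per-file-type replication works; the pattern below is illustrative:

```
volume mirror
  type cluster/afr
  subvolumes client1 client2
  option replicate *:2             # illustrative pattern: keep 2 copies of every file
end-volume
```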
You can install GlusterFS on your Ubuntu Feisty machine as simply as:
apt-get install glusterfs
By default glusterfsd is added to runlevel S with the config file /etc/glusterfs/glusterfs-server.vol, so do not forget to edit that file. You can change this by editing /etc/init.d/glusterfsd.
Automatic mounting can be set up in /etc/fstab; the first field is the client volume spec file (e.g. the packaged /etc/glusterfs/glusterfs-client.vol):
/etc/glusterfs/glusterfs-client.vol /mnt/glusterfs glusterfs defaults 0 0
Known issues
1. With Unify, losing a brick leaves you with an inaccessible GlusterFS-mounted directory until the brick comes up again. The workaround is to remount the filesystem; after that, only the files stored on the lost brick are missing. Whether, and how, this should be done automatically is an open question to be investigated.
2. In Unify mode, if you accidentally lose the namespace brick, remounting the file system will recover the namespace. However, using 'afr' bricks for the namespace is suggested, and the namespace brick should preferably not be a regular brick.
3. With AFR, if a brick goes down during a write operation, the written file will be corrupted on that brick.
4. With AFR, if you want to store some file type 4 times but one or more of the first 4 bricks used fails, GlusterFS will write to the next brick.
5. AFR runs even with only one healthy brick; however, files that do not exist on that brick cannot be accessed.
6. AFR lacks a self-heal mechanism, and Unify's heal mechanism is broken or not yet fully functional.