GlusterFS



GlusterFS is a clustered file system capable of scaling to several petabytes. It aggregates various storage bricks over InfiniBand RDMA or TCP/IP interconnects into one large parallel network file system. GlusterFS is based on a stackable user-space design without compromising performance.

How it works

The GlusterFS server allows you to export volumes over the network. The GlusterFS client mounts GlusterFS volumes into the kernel VFS. Much of the functionality in GlusterFS is implemented as translators.


The volume specification defines your GlusterFS file system layout and hence its behavior. Each volume in the spec file selects an appropriate translator module with corresponding configuration options. Through this volume spec file you can completely program the GlusterFS file system, arranging translators and modules in a graph with various options.
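As a minimal sketch of this graph idea (GlusterFS 1.x spec-file syntax; the volume names and export path below are illustrative assumptions, not defaults):

```
# A plain POSIX storage brick (the export path is an example)
volume brick
  type storage/posix
  option directory /data/export
end-volume

# Stack a performance translator on top of the brick;
# 'subvolumes' lines are the edges of the translator graph.
volume iothreads
  type performance/io-threads
  subvolumes brick
end-volume
```

Each volume ... end-volume block is one node of the graph, and the last volume defined is what gets exported or mounted.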


Performance translators

  • Read-Ahead Translator - pre-fetches a sequence of blocks in advance based on its predictions of the read pattern.
  • Write-Behind Translator - aggregates multiple smaller write operations into fewer larger write operations and writes them in the background (non-blocking).
  • Threaded I/O Translator - handles new incoming requests in worker threads, using time the server would otherwise spend blocked on I/O.
  • IO-Cache Translator - caches read data to reduce the load on the server.
  • Stat Pre-fetch Translator - fetches stat information for all files in a directory in one operation.
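Performance translators are typically stacked on the client side, one on top of another. A sketch in GlusterFS 1.x syntax (the hostname and volume names are assumptions):

```
# connect to a brick exported by a remote server
volume remote
  type protocol/client
  option transport-type tcp/client
  option remote-host server1.example.org   # example hostname
  option remote-subvolume brick
end-volume

# read-ahead stacked over the remote volume
volume readahead
  type performance/read-ahead
  subvolumes remote
end-volume

# write-behind stacked on top of read-ahead
volume writebehind
  type performance/write-behind
  subvolumes readahead
end-volume
```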

Clustering Translators

  • Automatic File Replication Translator - implements RAID-1-like replication for selected file types.
  • Stripe Translator - stripes input files in blocks of a given size (default 128k) across its subvolumes (child nodes), depending on the pattern specified.
  • Unify Translator - combines multiple storage bricks into one large storage volume.
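For instance, AFR can mirror two remote bricks. A sketch assuming client volumes 'client1' and 'client2' are defined earlier in the spec file; the pattern-based 'replicate' option shown is from GlusterFS 1.3 syntax and may differ between releases:

```
volume mirror
  type cluster/afr
  subvolumes client1 client2
  # keep 2 copies of every file (pattern:count syntax; verify for your release)
  option replicate *:2
end-volume
```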

GlusterFS Schedulers: a scheduler decides how to distribute new file-creation operations across the clustered file system based on load, availability and other determining factors.

ALU Scheduler - the "Adaptive Least Usage" scheduler is composed of multiple least-usage sub-schedulers: disk-usage, read-usage, write-usage, open-files-usage and disk-speed-usage.

NUFA Scheduler - the Non-Uniform Filesystem Access scheduler is similar in spirit to the NUMA memory design: file creation prefers the local brick.

Random Scheduler - randomly scatters file creation across storage bricks.

Round-Robin Scheduler - Round-Robin (RR) scheduler creates files in a round-robin fashion. Each client will have its own round-robin loop.
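A scheduler is selected as an option of the Unify translator. A sketch using round-robin, assuming bricks 'client1' and 'client2' plus a dedicated namespace brick 'brick-ns' are defined earlier in the spec file:

```
volume unify0
  type cluster/unify
  subvolumes client1 client2
  option namespace brick-ns   # dedicated namespace brick (see usage remarks)
  option scheduler rr         # alu, nufa and random are the other choices
end-volume
```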

Debug translators

  • trace - produces extensive trace information for debugging purposes.

Extra Features Translators

  • filter - currently supports only a read-only export option.
  • posix-locks - provides storage-independent POSIX record-locking support.
  • trash - provides a 'libtrash'-like feature.
  • fixed-id - makes all calls passing through this layer appear to come from a fixed UID and GID.

Storage translators

  • posix - binds the GlusterFS server to an underlying file system.

Protocol Translators

  • server - allows you to export volumes over the network.
  • client - allows you to attach to remote volumes exported by GlusterFS servers.
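Put together, a complete server-side spec pairs a storage translator with the protocol/server translator. A sketch (export path and auth rule are assumptions; 'allow *' is wide open, so restrict it in practice):

```
volume brick
  type storage/posix
  option directory /data/export   # example export directory
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes brick
  # allow any client to mount 'brick'; tighten this in real deployments
  option auth.ip.brick.allow *
end-volume
```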

Encryption Translators

  • rot-13 - encrypts ASCII files using the ROT13 method.



 You can install GlusterFS on an Ubuntu Feisty machine as simply as:

 apt-get install glusterfs

By default glusterfsd is added to runlevel S with the config file /etc/glusterfs/glusterfs-server.vol, so do not forget to edit that file. You can change this behavior by editing the /etc/init.d/glusterfsd script.

An automatic mount via /etc/fstab is possible. The first field names the client volume spec file (the path below is the Debian package default; adjust it to your installation):

/etc/glusterfs/glusterfs-client.vol /mnt/glusterfs glusterfs defaults 0 0

Usage remarks

1. When using Unify, losing a brick leaves you with an inaccessible GlusterFS mount until the brick comes back up. The workaround is to remount the file system; after that, only the files stored on the lost brick are missing. If, and how, this should be done automatically is an open question still to be investigated.

2. In Unify mode, if the namespace brick is accidentally lost, remounting the file system will rebuild the namespace. However, using an 'afr' brick for the namespace is recommended, and the namespace brick should preferably not also be a regular storage brick.

3. When using AFR, if a brick goes down during a write operation, the file being written will be left corrupted on that brick.

4. When using AFR, if a file type is configured to be stored 4 times but one or more of the first 4 bricks fails, GlusterFS will write to the next available brick instead.

5. AFR keeps running even if only one healthy brick remains. However, files that are not present on the surviving bricks cannot be accessed.

6. AFR lacks a self-heal mechanism, and the Unify self-heal mechanism is broken or not yet fully functional.

Topic revision: r1 - 05 Sep 2007, Shaltev