With the evolution of the Internet and the services it provides, the need to share and distribute data appeared. A local file system obviously cannot satisfy this need, which is why a number of different network file systems have emerged.
A network file system is a network abstraction over a file system: it provides clients with remote access over a network in the same way a local file system provides access on a single computer. In other words, it acts as a client for a remote file access protocol, providing access to files on a server. Programs that use local interfaces can transparently create, manage, and access hierarchical directories and files located on remote network-connected computers.
Here are some types of network file systems in terms of data sharing:
DFS originated to keep the processes of locating files, transporting data, and modifying files transparent to client programs.
Today it is the most common way of storing data. Distributed file systems include GlusterFS, Lustre, and many others.
DFS Design Goals:
CFS Design Goals:
Central management is among the major NFS advantages. Working with a centrally managed server reduces the system administrator's workload for back-ups, installing shared software, and computer repair. NFS is available for every Linux distribution and can be installed from either the command line or the distribution's package manager.
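As a minimal sketch of an NFS setup (the export path, subnet, and hostname below are examples, not values from this article): the server lists exported directories in /etc/exports, and clients mount them like any other file system.

```shell
# --- On the server ---
# Example entry in /etc/exports: export /srv/share read-write
# to one subnet, with synchronous writes.
#   /srv/share  192.168.1.0/24(rw,sync,no_subtree_check)

# Re-read /etc/exports and list the active exports:
sudo exportfs -ra
sudo exportfs -v

# --- On a client ---
# Mount the remote share (server name is an example):
sudo mount -t nfs nfs-server:/srv/share /mnt/share
```

After the mount, programs on the client read and write /mnt/share through their normal local-file interfaces, which is exactly the transparency described above.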
NFS Design Goals:
File Systems Overview
GlusterFS is a scalable distributed file system originally released by Gluster and acquired by Red Hat in 2012. It is an open-source product that lets you build large, distributed storage solutions for media streaming, data analysis, and similar workloads using standard commodity hardware. Unlike software and databases that can only spread data out within the context of a single application, GlusterFS operates at the file system level and ensures that data is copied to another location whenever it is written to disk.
Different kinds of storage configurations are possible with GlusterFS. Many of those configurations are functionally similar to RAID levels, such as striping data across different nodes in the cluster, or implementation of redundancy for better data availability.
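The RAID-like configurations mentioned above are chosen when a volume is created. A hedged sketch using the standard gluster CLI (node and brick names are examples, and the nodes are assumed to already be probed into a trusted pool):

```shell
# A 2-way replicated volume: every file is mirrored on both
# nodes, similar in spirit to RAID 1 across the network.
gluster volume create vol_replica replica 2 \
    node1:/data/brick1 node2:/data/brick1
gluster volume start vol_replica

# A distributed volume: files are spread across the bricks
# for capacity, with no redundancy (RAID 0-like).
gluster volume create vol_dist \
    node1:/data/brick2 node2:/data/brick2
gluster volume start vol_dist
```

The trade-off is the familiar one: replication costs capacity but survives a node failure, while distribution maximizes capacity at the cost of availability.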
GFS (Google File System) is a proprietary distributed file system whose purpose is to provide efficient, reliable access to data using large clusters of commodity hardware. GFS is designed to run on clusters of computers. It allows huge files to be organized and manipulated, giving developers the resources they require for research and development.
Scalability is a top priority for this file system, so its performance does not suffer as it grows.
Ease of control was also considered when designing GFS. That is why basic file commands (open, create, read, write, and close) were implemented along with specialized commands such as append and snapshot.
GFS2 (Global File System 2) is a clustered file system developed by Red Hat. GFS2 differs from distributed file systems such as GlusterFS because it allows all nodes direct parallel access to the same shared block storage. In addition, GFS2 can also be used as a local file system.
GFS2 is a journaled file system. Journaling is designed to prevent data loss and corruption after crashes and unexpected power cuts. If the system loses power halfway through writing a file to disk, the journal records whether the write completed. Linux checks the file system's journal when it restarts and resumes any partially completed job.
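A hedged sketch of putting GFS2 on a shared block device (cluster name, file system name, and device path are examples). Note the per-node journals: because all nodes write to the same storage in parallel, each node needs its own journal.

```shell
# -p lock_dlm       use the distributed lock manager, required
#                   when multiple nodes mount the FS at once
# -t mycluster:myfs cluster name and file system name
# -j 2              create two journals, one per cluster node
mkfs.gfs2 -p lock_dlm -t mycluster:myfs -j 2 /dev/vg0/lv_shared

# Mount it on each node like a normal file system:
mount -t gfs2 /dev/vg0/lv_shared /mnt/gfs2
```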
It is also worth mentioning Ceph, a scale-out storage platform that is "designed for excellent performance, reliability, and scalability." It is a more complex technology than a simple file system, since it provides distributed low-level storage on top of which different ways to access data are possible:
In terms of object storage, Ceph's libraries give client applications direct access to the RADOS object-based storage system. It is preferred for large-scale storage systems because it stores data more efficiently. Object-based storage systems separate the object namespace from the underlying storage hardware, which simplifies data migration.
In terms of block-based storage, when you write data to Ceph through a block device, Ceph automatically stripes and replicates the data across the cluster.
In terms of file system access, Ceph provides a traditional file system interface with POSIX semantics. Object storage systems are a significant innovation, but they complement rather than replace traditional file systems.
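The three access paths above map onto three standard Ceph tools. A hedged command sketch (pool, image, file, and monitor names are examples):

```shell
# Object storage: store and fetch an object in a pool via the
# rados CLI, which talks to RADOS directly.
rados -p mypool put hello-object ./hello.txt
rados -p mypool get hello-object ./hello-copy.txt

# Block storage: create an RBD image and map it; the kernel
# exposes it as an ordinary /dev/rbd* block device.
rbd create mypool/disk1 --size 4096
rbd map mypool/disk1

# File system: mount CephFS with the kernel client for a
# POSIX file system interface.
mount -t ceph mon-host:6789:/ /mnt/cephfs -o name=admin
```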
Ceph's most remarkable feature is that it does not rely on a central metadata service to locate data; instead it uses the CRUSH algorithm to calculate where data lives. It replicates and re-balances data within the cluster dynamically, eliminating this tedious task for administrators while delivering high performance and virtually infinite scalability.
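The key idea, that any client can compute a data location instead of asking a central server, can be illustrated with a toy sketch. This is not Ceph's actual CRUSH algorithm, just rendezvous (highest-random-weight) hashing showing the same principle: placement is a pure function of the object name and the OSD list, so every client independently arrives at the same answer.

```python
import hashlib

def rendezvous_osds(obj_name, osds, replicas=2):
    """Rank OSDs by a hash of (object, osd) and take the top
    `replicas`. Deterministic: no metadata lookup needed."""
    ranked = sorted(
        osds,
        key=lambda osd: hashlib.sha256(
            f"{obj_name}:{osd}".encode()
        ).hexdigest(),
        reverse=True,
    )
    return ranked[:replicas]

osds = ["osd.0", "osd.1", "osd.2", "osd.3"]
# Every client computes the same two OSDs for this object:
print(rendezvous_osds("my-object", osds))
```

A nice side effect, shared by CRUSH, is that adding or removing a storage node only reshuffles the objects whose ranking actually changed, rather than invalidating a central lookup table.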
Benefits of Ceph:
Do not mix up!
DRBD (Distributed Replicated Block Device) is a distributed replicated storage system for the Linux platform. It is not a file system, but rather a RAID1-over-the-network block device. DRBD's purpose is to form a fault-tolerant cluster environment on Linux.
The software was designed with Linux security standards in mind while offering excellent reliability at little expense. DRBD ships with all common flavors of Linux for synchronous replication of stored data between a passive system and an active system (it is also possible to read/write data on both systems simultaneously using one of the clustered file systems described above). DRBD supports resource-level fencing as well. On 8 December 2009, DRBD became part of the official Linux kernel.
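A DRBD resource is described in a small config file present on both nodes. A hedged sketch in the DRBD 8.x style (resource name, device paths, hostnames, and IPs are examples):

```text
# /etc/drbd.d/r0.res
resource r0 {
  protocol C;                  # C = synchronous replication:
                               # a write completes only after
                               # reaching both nodes
  device    /dev/drbd0;        # the replicated device applications see
  disk      /dev/vg0/lv_nfs;   # backing block device on each node
  meta-disk internal;

  on node1 {
    address 10.0.0.1:7789;
  }
  on node2 {
    address 10.0.0.2:7789;
  }
}
```

Protocol C is what gives the RAID1-over-the-network behavior described above; asynchronous modes (A/B) trade that guarantee for lower write latency.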
Highly Available NFS with DRBD
A single NFS server is bad for high availability because it is a single point of failure. But DRBD can be a solution here: it replicates the block device on which the NFS export is hosted. With careful configuration, this makes it possible to fail over an NFS mount without stale file handles on the client side. Such a configuration is difficult to get right, but it is one of the few ways to achieve HA for NFS.
Setting up highly available NFS with DRBD usually goes hand in hand with software such as Pacemaker or Heartbeat.
Heartbeat/Pacemaker and DRBD can be used together effectively to maintain high availability: they are network-oriented tools for managing failover. Example scenario: two nodes run DRBD in active/passive mode, and Heartbeat/Pacemaker is set up to mount the DRBD device on the active node, start the NFS server, and bring up a failover IP. If the first node goes down, the second one notices and starts the migration process: it switches the DRBD resource to active mode, mounts it, and brings up all the needed services.
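The scenario above could be expressed in Pacemaker's crm shell roughly as follows. This is a hedged sketch, not a complete production configuration: resource names, paths, and the IP are examples, and details such as fencing are omitted.

```text
# DRBD resource managed as a master/slave (promotable) clone:
primitive p_drbd ocf:linbit:drbd \
    params drbd_resource=r0 op monitor interval=30s
ms ms_drbd p_drbd \
    meta master-max=1 clone-max=2 notify=true

# File system on the DRBD device, the NFS server, and a
# floating failover IP, grouped so they move as one unit:
primitive p_fs ocf:heartbeat:Filesystem \
    params device=/dev/drbd0 directory=/srv/nfs fstype=ext4
primitive p_nfs systemd:nfs-server
primitive p_vip ocf:heartbeat:IPaddr2 params ip=10.0.0.100
group g_nfs p_fs p_nfs p_vip

# The group must run where DRBD is primary, and only after
# DRBD has been promoted there:
colocation col_nfs inf: g_nfs ms_drbd:Master
order ord_nfs inf: ms_drbd:promote g_nfs:start
```

Clients mount the share via the floating IP, so after a failover they reconnect to the surviving node without changing their configuration.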
File systems differ in how they organize their data. They also differ in features such as speed, security, and support for drives with large or small storage capacities. Some file systems are more robust and resistant to file corruption, while others sacrifice that robustness for additional speed. There is no single correct choice.