Scale-out file systems - GlusterFS

I happened to be working on this and thought I'd create a write-up.

Scale-out file systems are essentially storage areas that are spread across different computers (nodes) and kept synchronized, so every node presents the same data. There are many kinds of scale-out file systems, both open source and closed source, and for this tutorial we'll focus on one called GlusterFS.

GlusterFS is an open source project maintained by Red Hat. It lets files be written and read through multiple machines, and every file written to a GlusterFS share looks identical from client to client.

What does that mean? For example, if I want my data to stay available even when a single computer or server in my group fails, GlusterFS makes that possible. It can also improve performance in certain areas, because files and data are served from multiple hosts (PCs/servers) rather than a single one.

The steps below explain how to create a GlusterFS volume between 2 hosts (servers) and then mount the file system using the GlusterFS client so that data between the hosts is synchronized. This has numerous applications for IT deployments at single or multiple sites. GlusterFS can also be deployed in what are known as N+1 or N+N dispersed arrangements, where the storage doesn't just replicate or mirror data but also expands the total capacity. For example, N+1 with 4 hosts would have 3x the capacity of any single host. So if you had 10 TB on each host (server), your usable capacity with GlusterFS would be ~30 TB minus the file system overhead, and one host could crash at any time without losing data.
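To make the capacity arithmetic concrete, here is a quick shell sketch using the numbers from the example above. The variable names are only for illustration, and file system overhead is ignored.

```shell
# Usable capacity of an N+1 layout: (hosts - redundancy) * per-host size.
hosts=4          # total hosts in the group
redundancy=1     # hosts that may fail without data loss (the "+1")
per_host_tb=10   # raw capacity on each host, in TB

usable_tb=$(( (hosts - redundancy) * per_host_tb ))
echo "usable capacity: ~${usable_tb} TB"   # prints: usable capacity: ~30 TB
```

Changing redundancy to 2 with the same 4 hosts gives ~20 TB, which is the trade-off between protection and space.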

This might be a little advanced, but I figured I'd share my progress with this as it could be handy.

Getting Gluster running on 2 hosts - tested in Fedora 33 Server


#1

First, either disable the firewall or allow it to pass Gluster traffic.

Simple: $ sudo service firewalld stop

Complex: $ sudo firewall-cmd --zone=FedoraServer --permanent --add-port=24007/tcp --add-port=24008/tcp --add-port=24009/tcp --add-port=49152/tcp --add-port=49153/tcp

Because of --permanent, the new rules only take effect after running $ sudo firewall-cmd --reload (or restarting firewalld).
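If typing out five --add-port flags feels error-prone, the same command can be assembled from a port list. This is only a sketch: the zone and ports are the ones used in this step, and it prints the commands for review instead of running them.

```shell
# Build the firewall-cmd call from a list of ports.
zone="FedoraServer"
ports="24007 24008 24009 49152 49153"

args="--zone=$zone --permanent"
for p in $ports; do
    args="$args --add-port=$p/tcp"
done

# Printed for review; run the two lines yourself once they look right.
echo "sudo firewall-cmd $args"
echo "sudo firewall-cmd --reload"
```

Keeping the ports in one variable makes it easy to add more brick ports later if the volume grows.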


#2

IP addresses don't play nice with Gluster, so add the hosts to each node's hosts file or have them set up in your DNS.

/etc/hosts:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.122.38 fedoraS1

192.168.122.93 fedoraS2
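Since the same two lines have to go onto every node, a small helper can add them without creating duplicates. This is a sketch, not a polished tool: add_host is a made-up name, and the example writes to a scratch file rather than /etc/hosts so it can be tried safely.

```shell
# add_host FILE IP NAME
# Appends "IP NAME" to FILE unless NAME is already present (hypothetical helper).
add_host() {
    # -w matches the name as a whole word, -q keeps grep quiet
    if ! grep -qw -- "$3" "$1" 2>/dev/null; then
        printf '%s %s\n' "$2" "$3" >> "$1"
    fi
}

scratch=$(mktemp)
add_host "$scratch" 192.168.122.38 fedoraS1
add_host "$scratch" 192.168.122.93 fedoraS2
add_host "$scratch" 192.168.122.93 fedoraS2   # second call is a no-op
cat "$scratch"
```

To use it for real, point it at /etc/hosts and run it as root on each node.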


#3

Install glusterfs, glusterfs-fuse and glusterfs-server.

Gluster geo-replication may also be required, depending on the use case. Geo-replication is beyond the scope of this tutorial.
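On Fedora the install comes down to a single dnf command. A small sketch, assuming the Fedora package names glusterfs, glusterfs-fuse and glusterfs-server and that rpm is available to check what is already there; it only prints the install command for whatever is missing.

```shell
pkgs="glusterfs glusterfs-fuse glusterfs-server"

# Collect the packages that rpm does not report as installed.
missing=""
for p in $pkgs; do
    rpm -q "$p" >/dev/null 2>&1 || missing="$missing $p"
done

if [ -z "$missing" ]; then
    echo "all Gluster packages are already installed"
else
    # Printed for review; run it yourself on each node.
    echo "sudo dnf install -y$missing"
fi
```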


#4

Start glusterd 

$ sudo service glusterd start

Note: to enable it, run $ sudo systemctl enable glusterd <-- this makes it start on boot.

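Another thing to pay attention to: there is another process called glusterfsd. Despite the similar name it is not the client; it is the brick process that serves a volume's data. glusterd is the management daemon, and it has to be running before you can create or start a volume.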
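Before moving on it's worth confirming that glusterd is actually running on both nodes. A minimal sketch, assuming a systemd-based host; unit_state is a made-up helper that falls back to "unknown" when systemctl gives no answer.

```shell
# unit_state UNIT
# Prints the unit's state as reported by systemctl, or "unknown".
unit_state() {
    state=$(systemctl is-active "$1" 2>/dev/null)
    echo "${state:-unknown}"
}

echo "glusterd: $(unit_state glusterd)"

# To start it now and on every boot in one step:
#   sudo systemctl enable --now glusterd
```

Anything other than "active" here means the peer probe in the next step is likely to fail.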


#5

As root or with sudo, peer probe from one node to the other.

$ sudo gluster peer probe fedoraS2 <-- use the host name, not the IP, as the IP can cause problems
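After the probe, gluster peer status should list the other node. The sketch below wraps that check; peers_connected is a made-up helper, and the state text it looks for is what I'd expect the CLI to print for a healthy peer, so adjust it if your version words it differently.

```shell
# peers_connected reads `gluster peer status` output on stdin and succeeds
# when at least one peer is reported as connected.
peers_connected() {
    grep -q 'Peer in Cluster (Connected)'
}

if command -v gluster >/dev/null 2>&1; then
    if sudo gluster peer status | peers_connected; then
        echo "peer is connected"
    else
        echo "no connected peer yet"
    fi
else
    echo "gluster CLI not found on this machine"
fi
```

Running it on either node should give the same answer, since the probe sets up the relationship in both directions.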


#6

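As root or with sudo, create a volume.

$ sudo gluster volume create voltest replica 2 fedoraS1:/home/joe/glusterdata fedoraS2:/home/joe/glusterdata force

Note: force seemed to be needed here. The most likely reason is that the brick directories under /home/joe sit on the root partition, which Gluster refuses to use for bricks unless forced.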


#7

Next, start the volume.

$ sudo gluster volume start voltest
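To confirm the volume came up as intended, gluster volume info prints its settings as "Key: value" lines. Here is a rough sketch that pulls out two of them; vol_field is a made-up helper, and the field names Type and Status are what I'd expect in that output.

```shell
# vol_field KEY
# Reads `gluster volume info` output on stdin and prints the value for KEY.
vol_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2; exit }'
}

if command -v gluster >/dev/null 2>&1; then
    info=$(sudo gluster volume info voltest)
    echo "type:   $(printf '%s\n' "$info" | vol_field Type)"
    echo "status: $(printf '%s\n' "$info" | vol_field Status)"
else
    echo "gluster CLI not found on this machine"
fi
```

For the volume created above you'd hope to see a replicated type and a started status.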


#8

That's all fine and dandy, but files still won't replicate if you write directly into that data directory. The volume needs to be mounted as a GlusterFS-aware file system, and files go into that mount instead.

On both hosts, make sure an empty glustermnt directory exists in your home directory, then mount the volume on it:

[joe@fedoraS1 ~]$ sudo mount -t glusterfs fedoraS1:/voltest glustermnt

[joe@fedoraS2 ~]$ sudo mount -t glusterfs fedoraS2:/voltest glustermnt

Note that each host mounts from itself, the opposite of what I would have thought, but I guess it makes sense from a latency perspective.
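A mount done by hand like this won't survive a reboot. One common way to make it stick is an /etc/fstab entry, and the sketch below just prints such a line so you can check it before adding it. The helper name fstab_line is made up, and the options are the usual ones for a network file system, so treat it as a starting point rather than a tested config.

```shell
# fstab_line HOST VOLUME MOUNTPOINT
# Prints an /etc/fstab entry for a Gluster volume.
# _netdev marks it as a network file system, so it is mounted after the network is up.
fstab_line() {
    printf '%s:/%s %s glusterfs defaults,_netdev 0 0\n' "$1" "$2" "$3"
}

# On fedoraS1; use fedoraS2 as the host on the other node.
fstab_line fedoraS1 voltest /home/joe/glustermnt
```

fstab needs the full path to the mount point, which is why /home/joe/glustermnt is spelled out here.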


#9

At this point it should all work - what shows in glustermnt on one host is replicated to the other.

[joe@fedoraS1 glustermnt]$ df

Filesystem                     1K-blocks    Used Available Use% Mounted on

devtmpfs                          477484       0    477484   0% /dev

tmpfs                             498348       0    498348   0% /dev/shm

tmpfs                             199340     996    198344   1% /run

/dev/mapper/fedora_fedora-root   9422848 2452612   6970236  27% /

tmpfs                             498352       0    498352   0% /tmp

/dev/vda1                        1038336  253940    784396  25% /boot

tmpfs                              99668       0     99668   0% /run/user/1000

fedoraS1:/voltest               18845696 4909272  13936424  27% /home/joe/glustermnt
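A quick way to see replication in action is to drop a uniquely named file into the mount on one host and look for it on the other. A minimal sketch: write_marker and the GLUSTER_MNT variable are made up for this example, and without GLUSTER_MNT it falls back to a scratch directory so it can be tried anywhere.

```shell
# write_marker DIR
# Writes a small, uniquely named file into DIR and prints its path.
write_marker() {
    file="$1/replica-test-${HOSTNAME:-host}-$$"
    date > "$file" && echo "$file"
}

# Set GLUSTER_MNT to the mount point, e.g. /home/joe/glustermnt.
mnt=${GLUSTER_MNT:-$(mktemp -d)}
marker=$(write_marker "$mnt")
echo "wrote $marker"
```

Run it on fedoraS1 with GLUSTER_MNT=/home/joe/glustermnt, then list /home/joe/glustermnt on fedoraS2; the marker file should show up there within moments.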

