GFS Distributed File System

Published 2024-07-21 18:39 · Category: Distributed systems

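1. Design goals

High availability and fault tolerance: GFS (the Google File System) assumes from the outset that hardware failures are common, and relies on data replication and fault-tolerance mechanisms to keep the system available.
High performance: to handle large data volumes and many concurrent clients, GFS splits files into chunks spread over many servers, so clients can read and write those chunks in parallel.
Scalability: storage capacity and throughput can be expanded simply by adding nodes.
Large-file support: GFS is optimized for storing and reading large files and large data volumes.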

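2. Architecture

The GFS architecture consists of three main components:

2.1 Master

Metadata management: the Master manages the file system metadata, including the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk's replicas.
Coordination: the Master coordinates operations between clients and Chunkservers, such as read and write requests and chunk replication. The file data itself does not flow through the Master.
Periodic maintenance: the Master periodically performs garbage collection, data rebalancing, and similar housekeeping to keep the system running stably.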

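2.2 Chunkserver

Chunk storage: Chunkservers store the actual data chunks; each chunk is 64 MB by default.
Serving reads and writes: Chunkservers handle read and write requests from clients directly, and cooperate with other Chunkservers on chunk replication and failure recovery.
Heartbeats: each Chunkserver periodically sends heartbeat messages to the Master, reporting its own status and the chunks it stores.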

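2.3 Client

Request handling: the client asks the Master for file metadata (such as chunk locations), then talks to the Chunkservers directly to read and write data.
Metadata caching: the client caches the metadata it obtains from the Master, which reduces how often it has to contact the Master and improves efficiency.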

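3. Data storage and operations

3.1 Chunks

Chunking: GFS splits each file into fixed-size chunks (64 MB by default), and each chunk has a unique identifier, the chunk handle.
Replication: each chunk is stored on several Chunkservers (three replicas by default) to improve fault tolerance and availability.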
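
As a rough illustration of fixed-size chunking, the sketch below maps a byte offset within a file to the index of the chunk that holds it and to the position inside that chunk, assuming the default 64 MB chunk size. The helper names `chunk_index` and `chunk_offset` are invented for this example and are not part of any GFS interface.

```shell
#!/usr/bin/env bash
# Assumed chunk size: 64 MB, the default mentioned above.
CHUNK_SIZE=$((64 * 1024 * 1024))

# Index of the chunk that contains the given byte offset (hypothetical helper).
chunk_index() {
  echo $(( $1 / CHUNK_SIZE ))
}

# Position of that byte inside its chunk (hypothetical helper).
chunk_offset() {
  echo $(( $1 % CHUNK_SIZE ))
}

# Byte 200,000,000 of a file falls into the third chunk (index 2).
echo "chunk $(chunk_index 200000000), offset $(chunk_offset 200000000)"
```

A client would then ask the Master for the replicas of that chunk index only, rather than for the whole file.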

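3.2 Read operation

Metadata request: the client first asks the Master for the file's metadata and receives the locations of the relevant chunks and their replicas.
Reading chunks: the client requests the data directly from a Chunkserver that stores the chunk.
Failure handling: if a Chunkserver is unavailable, the client tries to read from a Chunkserver that holds another replica.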

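3.3 Write operation

Metadata request: the client asks the Master which Chunkservers hold the replicas of the chunk to be written.
Lease rather than lock: the Master does not hand a write lock to the client. It grants a lease on the chunk to one replica, the primary, and the primary decides the order in which concurrent writes are applied.
Data transfer: the client pushes the data to all replicas, and the data is forwarded from Chunkserver to Chunkserver in a pipelined fashion. Once every replica has received the data, the client sends the write request to the primary, which assigns it a serial number and forwards it to the secondaries so that all replicas apply writes in the same order.
Acknowledgement: after all secondaries report success, the primary replies to the client that the write is complete. If any replica fails, the client retries.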

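4. Fault tolerance and recovery

Heartbeats: Chunkservers periodically send heartbeat messages to the Master, reporting their status and the chunks they store. If a Chunkserver sends no heartbeat for a long time, the Master treats it as failed and starts re-replicating its chunks.
Re-replication: the Master tracks the number of replicas of every chunk. If the count for a chunk drops below the threshold, the Master has new replicas created on other Chunkservers.
Data verification: GFS uses checksums to verify the integrity of chunk data. When corruption is detected, the data is restored from another replica.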
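
The checksum idea can be sketched with standard tools. The example below records a checksum when data is written and compares it again before the data is served. It is only an analogy: POSIX `cksum` over a whole file stands in for GFS's own checksum scheme, and `block_checksum` and `is_intact` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print "crc:length" for a file using POSIX cksum (a stand-in, not GFS's checksum).
block_checksum() {
  cksum < "$1" | awk '{ print $1 ":" $2 }'
}

# Succeed only if the file still matches the checksum recorded earlier.
is_intact() {
  [ "$(block_checksum "$1")" = "$2" ]
}

# Demo on a temporary file standing in for a chunk replica.
replica=$(mktemp)
printf 'chunk data' > "$replica"
recorded=$(block_checksum "$replica")

is_intact "$replica" "$recorded" && echo "replica is intact"

printf 'X' >> "$replica"   # simulate on-disk corruption
is_intact "$replica" "$recorded" || echo "corruption detected, restore from another replica"
rm -f "$replica"
```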

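5. Optimizations and improvements

Read/write optimizations: GFS includes several optimizations for large files and large-scale data processing, such as client-side metadata caching, pipelined writes, and parallel access to many Chunkservers.
Rebalancing: the Master periodically checks the load on the Chunkservers and moves chunks between them to balance load and keep performance stable.
Garbage collection: the Master periodically reclaims chunks that are no longer in use, freeing storage space.

The design and implementation of GFS provide a reliable and efficient foundation for large-scale data processing and distributed storage, and it has been widely used across Google's applications and services. Its ideas and architecture also strongly influenced later distributed file systems such as HDFS.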

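Deploying the cluster environment

The hands-on part of this article uses GlusterFS. Despite the similar abbreviation, GlusterFS is a separate open-source distributed file system and not Google's GFS described above: it has no Master and no Chunkservers, and every storage node runs the same glusterd service.

1: Prepare the environment (node1 is used as the example; repeat the same steps on the other nodes)

(1) Add disks

Add the number and size of disks listed in the table to each node, then reboot the system. The commands below assume five new disks, /dev/sdb through /dev/sdf.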

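(2) Partition and format the new disks on all nodes

Partition each disk with fdisk (the interactive partitioning steps are omitted here):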

[root@localhost ~]# hostname node1

[root@localhost ~]# bash

[root@localhost ~]# fdisk /dev/sdb

[root@localhost ~]# fdisk /dev/sdc

[root@localhost ~]# fdisk /dev/sdd

[root@localhost ~]# fdisk /dev/sde

[root@localhost ~]# fdisk /dev/sdf

Then, on every node, format the new partition on each disk with ext4:

[root@localhost ~]# mkfs -t ext4 /dev/sdb1

[root@localhost ~]# mkfs -t ext4 /dev/sdc1

[root@localhost ~]# mkfs -t ext4 /dev/sdd1

[root@localhost ~]# mkfs -t ext4 /dev/sde1

[root@localhost ~]# mkfs -t ext4 /dev/sdf1

(3) Create the mount points

[root@localhost ~]# mkdir /b3

[root@localhost ~]# mkdir /c4

[root@localhost ~]# mkdir /d5

[root@localhost ~]# mkdir /e6

[root@localhost ~]# mkdir /f7

(4) Mount the disks

[root@localhost ~]# mount /dev/sdb1 /b3

[root@localhost ~]# mount /dev/sdc1 /c4

[root@localhost ~]# mount /dev/sdd1 /d5

[root@localhost ~]# mount /dev/sde1 /e6

[root@localhost ~]# mount /dev/sdf1 /f7

(5) Edit /etc/fstab so that the mounts persist across reboots

[root@localhost ~]# vi /etc/fstab

Append the following lines at the end:

/dev/sdb1 /b3 ext4 defaults 0 0

/dev/sdc1 /c4 ext4 defaults 0 0

/dev/sdd1 /d5 ext4 defaults 0 0

/dev/sde1 /e6 ext4 defaults 0 0

/dev/sdf1 /f7 ext4 defaults 0 0
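
Typing several nearly identical fstab lines by hand invites copy-and-paste slips, so one option is to generate them from a list of disk and mount point pairs. The sketch below only prints the lines; `fstab_lines` is a hypothetical helper, and the pairs are the disks and mount points used in this walkthrough.

```shell
#!/usr/bin/env bash
# Print one fstab entry per "disk:mountpoint" pair (hypothetical helper).
fstab_lines() {
  local pair
  for pair in "$@"; do
    # ${pair%%:*} is the disk name, ${pair##*:} is the mount point name.
    printf '/dev/%s1 /%s ext4 defaults 0 0\n' "${pair%%:*}" "${pair##*:}"
  done
}

# Review the output before adding it to /etc/fstab.
fstab_lines sdb:b3 sdc:c4 sdd:d5 sde:e6 sdf:f7
```

Because the function writes nothing itself, the output can be checked first and then appended to /etc/fstab by hand.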

(6) Disable the firewall and SELinux on all nodes

[root@localhost ~]# systemctl stop firewalld

[root@localhost ~]# setenforce 0

(7) Update the hosts file on all nodes

[root@localhost ~]# cat <<EOF >> /etc/hosts

192.168.10.101 node1

192.168.10.102 node2

192.168.10.103 node3

192.168.10.104 node4

192.168.10.105 node5

192.168.10.106 node6

EOF

(8) Configure the yum repositories

Use the Aliyun yum mirrors:

yum -y install wget

wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo

wget -O /etc/yum.repos.d/epel.repo https://mirrors.aliyun.com/repo/epel-7.repo

yum install centos-release-gluster

2: Install GlusterFS on all nodes

[root@localhost ~]# yum -y install glusterfs glusterfs-server glusterfs-fuse glusterfs-rdma

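Notes:

glusterfs: the main GlusterFS client program

glusterfs-server: the main GlusterFS server program

glusterfs-fuse: FUSE (Filesystem in Userspace) support. FUSE is a loadable kernel module that lets unprivileged users create their own file systems without modifying kernel code. The file system code runs in user space and is bridged to the kernel through the FUSE module.

glusterfs-rdma: RDMA (remote direct memory access) support for GlusterFS, which allows direct memory access between hosts without going through either host's operating system.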

3: Start GlusterFS on all nodes

[root@localhost ~]# systemctl start glusterd.service

[root@localhost ~]# systemctl enable glusterd.service

4: Add the nodes to the cluster (on node1 only)

[root@localhost ~]# gluster peer probe node1

peer probe: success. Probe on localhost not needed

[root@localhost ~]# gluster peer probe node2

peer probe: success.

[root@localhost ~]# gluster peer probe node3

peer probe: success.

[root@localhost ~]# gluster peer probe node4

peer probe: success.

[root@localhost ~]# gluster peer probe node5

peer probe: success.

[root@localhost ~]# gluster peer probe node6

peer probe: success.
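
The repeated probe commands can also be written as a loop. The sketch below is a dry run by default: it only prints the `gluster peer probe` commands it would execute, and the `DRY_RUN` switch and the `run` and `probe_peers` helpers are assumptions made for this example. Set `DRY_RUN=0` on node1 to actually run them.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

# Print or execute a command, depending on DRY_RUN (hypothetical helper).
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Probe every host name passed as an argument.
probe_peers() {
  local node
  for node in "$@"; do
    run gluster peer probe "$node"
  done
}

# node1 is the local host, so only the other five nodes need probing.
probe_peers node2 node3 node4 node5 node6
```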

5: Check the peer status on each node

[root@localhost ~]# gluster peer status

Number of Peers: 3

Hostname: node2

Uuid: 469be571-b52a-4a89-a30a-c3a770753b0e

State: Peer in Cluster (Connected)

Hostname: node3

Uuid: 24742939-afc6-4243-a8a8-1aa57a336128

State: Peer in Cluster (Connected)

Hostname: node4

Uuid: dbc703a3-1e22-42cd-bedf-da3541bce983

State: Peer in Cluster (Connected)

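Creating volumes

1: Create a distributed volume (on node1)

Note: when no volume type is specified, a distributed volume is created by default.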

[root@localhost ~]# gluster volume create dist-volume node1:/e6 node2:/e6 force

[root@localhost ~]# gluster volume info dist-volume

Volume Name: dist-volume

Type: Distribute

Volume ID: 40946bd8-cc79-406a-be3c-5c03dd2a207e

Status: Created

Snapshot Count: 0

Number of Bricks: 2

Transport-type: tcp

Bricks:

Brick1: node1:/e6

Brick2: node2:/e6

Options Reconfigured:

transport.address-family: inet

nfs.disable: on

[root@localhost ~]# gluster volume start dist-volume
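
A distributed volume places each whole file on exactly one brick, chosen by hashing the file name. The toy sketch below shows the general idea only: `cksum` and the modulo step are stand-ins chosen for this example and are not GlusterFS's actual hash function or layout, so the placement it prints will not match a real cluster. `pick_brick` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Toy placement: hash the file name and map it onto one of the bricks.
# cksum and the modulo step are stand-ins, NOT GlusterFS's real hash or layout.
pick_brick() {
  local name=$1; shift
  local bricks=("$@")
  local hash
  hash=$(printf '%s' "$name" | cksum | awk '{ print $1 }')
  echo "${bricks[hash % ${#bricks[@]}]}"
}

for f in demo1.log demo2.log demo3.log demo4.log demo5.log; do
  echo "$f -> $(pick_brick "$f" node1:/e6 node2:/e6)"
done
```

The point of the sketch is that placement depends only on the file name, so files are not split across bricks and the split between bricks is not necessarily even.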

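2: Create a replicated volume

Note: in a pure replicated volume, the replica count must equal the number of bricks (here, one brick on each of two nodes).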

[root@localhost ~]# gluster volume create rep-volume replica 2 node3:/d5 node4:/d5 force

[root@localhost ~]# gluster volume info rep-volume

Volume Name: rep-volume

Type: Replicate

Volume ID: b5d1afda-ab03-47a7-82b9-2786648a9b3a

Status: Created

Snapshot Count: 0

Number of Bricks: 1 x 2 = 2

Transport-type: tcp

Bricks:

Brick1: node3:/d5

Brick2: node4:/d5

Options Reconfigured:

transport.address-family: inet

nfs.disable: on

[root@localhost ~]# gluster volume start rep-volume

3: Create a dispersed volume

[root@localhost ~]# gluster volume create disp-volume disperse 3 redundancy 1 node1:/b3 node2:/b3 node3:/b3 force

[root@localhost ~]# gluster volume info disp-volume

[root@localhost ~]# gluster volume start disp-volume

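Notes:

The dispersed volume above is built from three bricks and tolerates the failure of one of them.

With this layout, one brick out of every three may fail (one brick with 3 bricks, or two bricks with 6 bricks).

A dispersed volume is a volume type based on erasure codes, officially released in GlusterFS 3.6, and is similar to RAID 5/6. The redundancy level determines how reliable the volume is, and the design keeps reliability high while using physical storage space more efficiently than replication. redundancy must be greater than 0, and the total number of bricks must be greater than 2 * redundancy, which means a dispersed volume must contain at least 3 bricks. As with RAID 5/RAID 6, the goals are to avoid a single point of failure (HA) and to improve I/O performance (LB).

disperse 3 redundancy 1: requires 3 bricks

disperse 4 redundancy 1: requires 4 bricks

Every dispersed volume needs at least 3 bricks, and the minimum redundancy level is 1 (one brick may fail).

If redundancy were set to 0, the dispersed volume would be equivalent to a distributed volume; if it were set to #Bricks / 2, it would be equivalent to a replicated volume. The redundancy value therefore has to satisfy:

0 < redundancy < #Bricks / 2

(disperse-data) + redundancy = disperse

1) With 3 bricks, the dispersed type is 1 x (2 + 1) and 66.7% of the raw disk space is usable.

2) With 10 bricks, a volume of dispersed type 2 x (4 + 1) has 80% of the raw disk space usable.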
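
The rule and the utilization figures above can be checked with a few lines of shell. This is a minimal sketch: `disperse_valid` and `disperse_usable_percent` are hypothetical helpers that take the brick count of one dispersed set and its redundancy, and they do not call GlusterFS.

```shell
#!/usr/bin/env bash
# Succeeds when the redundancy is valid for the given brick count,
# i.e. redundancy > 0 and bricks > 2 * redundancy.
disperse_valid() {
  local bricks=$1 redundancy=$2
  (( redundancy > 0 && bricks > 2 * redundancy ))
}

# Usable share of raw capacity in percent: data bricks / total bricks.
disperse_usable_percent() {
  LC_ALL=C awk -v b="$1" -v r="$2" 'BEGIN { printf "%.1f\n", (b - r) * 100 / b }'
}

# Each entry is "bricks redundancy" for one dispersed set.
for cfg in "3 1" "5 1" "6 2" "4 2"; do
  read -r b r <<< "$cfg"
  if disperse_valid "$b" "$r"; then
    echo "disperse $b redundancy $r: $(disperse_usable_percent "$b" "$r")% usable"
  else
    echo "disperse $b redundancy $r: invalid"
  fi
done
```

The "4 2" entry is rejected because 4 bricks is not greater than 2 * 2.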

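Note:

The old striped volume type is no longer supported from GlusterFS 6 onward, so the following command only works on earlier releases: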

gluster volume create strip-volume stripe 2 node1:/d5 node2:/d5 force

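4: Create a distributed replicated volume

Note: a distributed replicated volume requires the number of bricks to be a multiple of the replica count (twice the count or more).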

[root@localhost ~]# gluster volume create dist-rep replica 2 node1:/c4 node2:/c4 node3:/c4 node4:/c4 force

[root@localhost ~]# gluster volume info dist-rep

Volume Name: dist-rep

Type: Distributed-Replicate

Volume ID: 197055f7-37d8-419f-bb22-9f05c7e1a032

Status: Created

Snapshot Count: 0

Number of Bricks: 2 x 2 = 4

Transport-type: tcp

Bricks:

Brick1: node1:/c4

Brick2: node2:/c4

Brick3: node3:/c4

Brick4: node4:/c4

Options Reconfigured:

transport.address-family: inet

nfs.disable: on

[root@localhost ~]# gluster volume start dist-rep
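
In a distributed replicated volume, bricks are grouped into replica sets in the order they appear on the command line, so the brick order decides which nodes mirror each other. The sketch below prints that grouping for a given replica count; `replica_sets` is a hypothetical helper that only manipulates strings and does not talk to GlusterFS.

```shell
#!/usr/bin/env bash
# Group bricks into replica sets in command-line order (hypothetical helper).
replica_sets() {
  local replica=$1; shift
  local bricks=("$@")
  if (( replica < 1 || ${#bricks[@]} % replica != 0 )); then
    echo "brick count must be a multiple of the replica count" >&2
    return 1
  fi
  local i n=1
  for (( i = 0; i < ${#bricks[@]}; i += replica )); do
    echo "set $n: ${bricks[*]:i:replica}"
    n=$((n + 1))
  done
}

# Same brick order as the dist-rep volume created above.
replica_sets 2 node1:/c4 node2:/c4 node3:/c4 node4:/c4
```

With the brick order used here, node1 and node2 form one mirror pair and node3 and node4 form the other, and each file is stored on exactly one pair.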

5: Create a distributed dispersed volume

gluster volume create dist-disp disperse 3 redundancy 1 node1:/f7 node2:/f7 node3:/f7 node4:/f7 node5:/f7 node6:/f7 force

gluster volume start dist-disp

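Note:

This distributed dispersed volume uses six nodes. Each group of three bricks forms one dispersed set (2 data + 1 redundancy), and files are distributed across the two sets; the two sets are not replicas of each other.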

Deploying the GlusterFS client

1: Install the client software

[root@localhost ~]# systemctl stop firewalld

[root@localhost ~]# setenforce 0

[root@localhost ~]# systemctl disable firewalld

[root@localhost ~]# yum -y install glusterfs glusterfs-fuse

Note: on CentOS 7 these two packages are already installed by default.

2: Create the mount directories

[root@localhost ~]# mkdir -p /test/{dist,rep,disp,dist_and_rep,dist_and_disp}

[root@localhost ~]# ls /test

3: Update the hosts file

[root@localhost ~]# cat <<EOF >> /etc/hosts

192.168.10.101 node1

192.168.10.102 node2

192.168.10.103 node3

192.168.10.104 node4

192.168.10.105 node5

192.168.10.106 node6

EOF

4: Mount the GlusterFS volumes

[root@localhost ~]# mount -t glusterfs node1:dist-volume /test/dist

[root@localhost ~]# mount -t glusterfs node1:rep-volume /test/rep

[root@localhost ~]# mount -t glusterfs node1:disp-volume /test/disp

[root@localhost ~]# mount -t glusterfs node1:dist-rep /test/dist_and_rep

[root@localhost ~]# mount -t glusterfs node1:dist-disp /test/dist_and_disp

[root@localhost ~]# df

Filesystem 1K-blocks Used Available Use% Mounted on

devtmpfs 1918628 0 1918628 0% /dev

tmpfs 1930632 0 1930632 0% /dev/shm

tmpfs 1930632 20224 1910408 2% /run

tmpfs 1930632 0 1930632 0% /sys/fs/cgroup

/dev/mapper/centos-root 203316228 6181216 197135012 4% /

/dev/loop0 4600876 4600876 0 100% /media/cdrom

/dev/sda2 2086912 153676 1933236 8% /boot

tmpfs 386128 0 386128 0% /run/user/0

node1:dist-volume 12121216 172504 11408120 2% /test/dist

node1:rep-volume 5028480 71844 4728444 2% /test/rep

node1:dist-rep 8123776 116168 7636752 2% /test/dist_and_rep

node1:disp-volume 6059552 81176 5691736 2% /test/disp

node1:dist-disp 28370912 416992 26704304 2% /test/dist_and_disp

5: Edit the fstab configuration file

[root@localhost ~]# vi /etc/fstab

Append the following lines at the end:

node1:dist-volume /test/dist glusterfs defaults,_netdev 0 0

node1:rep-volume /test/rep glusterfs defaults,_netdev 0 0

node1:dist-rep /test/dist_and_rep glusterfs defaults,_netdev 0 0

node1:disp-volume /test/disp glusterfs defaults,_netdev 0 0

node1:dist-disp /test/dist_and_disp glusterfs defaults,_netdev 0 0

Testing the GlusterFS file system from the client

1: Write files to the volumes

(1) Create the test files

dd if=/dev/zero of=/root/demo1.log bs=1M count=43

dd if=/dev/zero of=/root/demo2.log bs=1M count=43

dd if=/dev/zero of=/root/demo3.log bs=1M count=43

dd if=/dev/zero of=/root/demo4.log bs=1M count=43

dd if=/dev/zero of=/root/demo5.log bs=1M count=43

(2) Copy the files into the volumes

[root@localhost ~]# cp demo* /test/dist/

[root@localhost ~]# cp demo* /test/rep/

[root@localhost ~]# cp demo* /test/dist_and_rep/

2: Check how the files are distributed

(1) Check the distributed volume on node1 and node2. Each file is stored whole on exactly one of the two bricks.

node1:

[root@localhost ~]# ll -h /e6

total 173M

-rw-r--r--. 2 root root 43M Apr 17 22:06 demo1.log

-rw-r--r--. 2 root root 43M Apr 17 22:06 demo2.log

-rw-r--r--. 2 root root 43M Apr 17 22:06 demo3.log

-rw-r--r--. 2 root root 43M Apr 17 22:06 demo4.log

node2:

[root@localhost ~]# ll -h /e6

total 44M

-rw-r--r--. 2 root root 43M Apr 17 22:06 demo5.log

(2) Check the replicated volume on node3 and node4. Both bricks hold a full copy of every file.

node3:

[root@localhost ~]# ll -h /d5

total 216M

-rw-r--r--. 2 root root 43M Apr 17 22:06 demo1.log

-rw-r--r--. 2 root root 43M Apr 17 22:06 demo2.log

-rw-r--r--. 2 root root 43M Apr 17 22:06 demo3.log

-rw-r--r--. 2 root root 43M Apr 17 22:06 demo4.log

-rw-r--r--. 2 root root 43M Apr 17 22:06 demo5.log

node4:

[root@localhost ~]# ll -h /d5

total 216M

-rw-r--r--. 2 root root 43M Apr 17 22:06 demo1.log

-rw-r--r--. 2 root root 43M Apr 17 22:06 demo2.log

-rw-r--r--. 2 root root 43M Apr 17 22:06 demo3.log

-rw-r--r--. 2 root root 43M Apr 17 22:06 demo4.log

-rw-r--r--. 2 root root 43M Apr 17 22:06 demo5.log

(3) Check the distributed replicated volume on node1, node2, node3, and node4. node1 and node2 mirror one set of files, and node3 and node4 mirror the other.

node1:

[root@localhost ~]# ll -h /c4

total 173M

-rw-r--r--. 2 root root 43M Apr 17 22:06 demo1.log

-rw-r--r--. 2 root root 43M Apr 17 22:07 demo2.log

-rw-r--r--. 2 root root 43M Apr 17 22:07 demo3.log

-rw-r--r--. 2 root root 43M Apr 17 22:07 demo4.log

node2:

[root@localhost ~]# ll -h /c4

total 173M

-rw-r--r--. 2 root root 43M Apr 17 22:07 demo1.log

-rw-r--r--. 2 root root 43M Apr 17 22:07 demo2.log

-rw-r--r--. 2 root root 43M Apr 17 22:07 demo3.log

-rw-r--r--. 2 root root 43M Apr 17 22:07 demo4.log

node3:

[root@localhost ~]# ll -h /c4

total 44M

-rw-r--r--. 2 root root 43M Apr 17 22:07 demo5.log

node4:

[root@localhost ~]# ll -h /c4

total 44M

-rw-r--r--. 2 root root 43M Apr 17 22:07 demo5.log

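Destructive testing

(1) Suspend node2, then check from the client whether the files in each volume can still be used normally.

(2) Then suspend node4 as well, and continue testing file reads from the client.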
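
One way to run these read checks from the client is a small loop that tries to read the first byte of every test file under a mount point. This is a sketch under assumptions: `check_readable` is a hypothetical helper, and the directories are the client mount points that received test files earlier in this walkthrough.

```shell
#!/usr/bin/env bash
# Try to read the first byte of every demo*.log file in a directory and
# report the result per file; returns non-zero if any read fails.
check_readable() {
  local dir=$1 f rc=0
  for f in "$dir"/demo*.log; do
    if head -c 1 "$f" > /dev/null 2>&1; then
      echo "ok   $f"
    else
      echo "FAIL $f"
      rc=1
    fi
  done
  return $rc
}

# Mount points that received test files earlier in this walkthrough.
for d in /test/dist /test/rep /test/dist_and_rep; do
  check_readable "$d" || echo "some files under $d are not readable"
done
```

Running it after each suspension shows which volume types keep serving files when a node is lost.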

Other maintenance commands

1: List the GlusterFS volumes

[root@localhost ~]# gluster volume list

dist-rep

dist-disp

dist-volume

rep-volume

disp-volume

[root@localhost ~]# gluster volume info

[root@localhost ~]# gluster volume status

2: Stop and delete a volume

[root@localhost ~]# gluster volume stop dist-volume

Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y

volume stop: dist-volume: success

[root@localhost ~]# gluster volume delete dist-volume

Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y

volume delete: dist-volume: failed: Some of the peers are down

The delete fails here because some peers are still down after the destructive test. Bring all nodes back online and run the delete again.

3: Configure access control for a volume

[root@localhost ~]# gluster volume set dist-rep auth.allow 192.168.1.*,10.1.1.*

volume set: success

Original article: https://blog.csdn.net/yiluo__/article/details/140568765

