[Redis Source Code] The Compressed List (ziplist) Structure

🕗 Published 2025-01-21 03:48 redis linked-list source code


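1. Where ziplist comes from

Before getting to the compressed list, I will use arrays and linked lists as a lead-in, because they explain why ziplist exists. When writing business code that handles list-like data, we usually choose between these two structures. Each has its strengths and each fits different scenarios. So where does the difference come from?

It comes from how they are stored. An array occupies one contiguous block of memory, while a linked list finds each element by following the address (pointer) of the next node stored in the current one.

An array therefore has better spatial locality. Why? The CPU does not read memory one variable at a time: it loads data from memory into the cache in units of a cache line, typically 64 bytes, and then reads from the cache. When a program accesses one array element, the whole cache line containing it is loaded, which brings in the neighbouring elements as well, and the hardware prefetcher may fetch the following lines because the addresses are contiguous. While traversing the array, the next element is therefore very likely already in the cache, which speeds up access.

With a linked list, the CPU has to follow a pointer from the current node to the next. Each jump may land on an address that is not in any cached line, causing a cache miss, so the data has to be loaded from a slower level of the memory hierarchy and performance drops.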
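To make the contrast concrete, here is a minimal C sketch of the two traversals. It is an illustration, not a benchmark: node_t, sum_array and sum_list are names made up for this example, and the comments describe typical cache behaviour rather than measured results.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for illustration: each node stores a pointer
 * to the next node, which may live anywhere on the heap. */
typedef struct node {
    int value;
    struct node *next;
} node_t;

/* Sequential walk over contiguous memory: neighbouring elements share
 * cache lines, so most loads are likely to hit the cache. */
long sum_array(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Pointer chasing: the address of the next load is only known after the
 * current node has been read, and it may fall in a different cache line. */
long sum_list(const node_t *head) {
    long total = 0;
    for (const node_t *cur = head; cur != NULL; cur = cur->next)
        total += cur->value;
    return total;
}
```

Both functions compute the same result; only the memory access pattern differs, and that pattern is exactly what ziplist tries to keep array-like.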

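🚪 The comparison above shows how much contiguous storage matters, and that is where the compressed list (ziplist) comes in. As the word "compressed" suggests, it aims for a small memory footprint. A ziplist packs a small amount of data into one contiguous block of memory so that it benefits from spatial locality. If there is too much data, or individual keys are too long, the data spans too many cache lines and the speed-up disappears, which is also why Redis recommends small keys.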

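2. Layout

The comment below is taken from ziplist.c in Redis v2.6 and explains how a ziplist is laid out.

🐯 A ziplist is made up of these parts:

zlbytes: total number of bytes the ziplist occupies
zltail: offset to the last entry
zllen: number of entries
zlend: end marker of the ziplist; the value 255 marks the end

🐯 Entry: every entry starts with a predefined header holding two pieces of information: the length of the previous entry, and the encoding (together with the entry's own length when it is a string)

As the bit patterns below show, the leading bits select the encoding type, much like the opcode of an assembly instruction.

To make this more concrete, compare it with the following figure.

[Figure: instruction encoding, from Chapter 4, "Processor Architecture", of Computer Systems: A Programmer's Perspective (CSAPP)]

In that figure, a first-level and a second-level number together determine the instruction type. ziplist encodings work the same way.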

 * ZIPLIST OVERALL LAYOUT:
 * The general layout of the ziplist is as follows:
 * <zlbytes><zltail><zllen><entry><entry><zlend>
 *
 * <zlbytes> is an unsigned integer to hold the number of bytes that the
 * ziplist occupies. This value needs to be stored to be able to resize the
 * entire structure without the need to traverse it first.
 *
 * <zltail> is the offset to the last entry in the list. This allows a pop
 * operation on the far side of the list without the need for full traversal.
 *
 * <zllen> is the number of entries.When this value is larger than 2**16-2,
 * we need to traverse the entire list to know how many items it holds.
 *
 * <zlend> is a single byte special value, equal to 255, which indicates the
 * end of the list.
 *
 * ZIPLIST ENTRIES:
 * Every entry in the ziplist is prefixed by a header that contains two pieces
 * of information. First, the length of the previous entry is stored to be
 * able to traverse the list from back to front. Second, the encoding with an
 * optional string length of the entry itself is stored.
 *
 * The length of the previous entry is encoded in the following way:
 * If this length is smaller than 254 bytes, it will only consume a single
 * byte that takes the length as value. When the length is greater than or
 * equal to 254, it will consume 5 bytes. The first byte is set to 254 to
 * indicate a larger value is following. The remaining 4 bytes take the
 * length of the previous entry as value.
 *
 * The other header field of the entry itself depends on the contents of the
 * entry. When the entry is a string, the first 2 bits of this header will hold
 * the type of encoding used to store the length of the string, followed by the
 * actual length of the string. When the entry is an integer the first 2 bits
 * are both set to 1. The following 2 bits are used to specify what kind of
 * integer will be stored after this header. An overview of the different
 * types and encodings is as follows:
 *
 * |00pppppp| - 1 byte
 *      String value with length less than or equal to 63 bytes (6 bits).
 * |01pppppp|qqqqqqqq| - 2 bytes
 *      String value with length less than or equal to 16383 bytes (14 bits).
 * |10______|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes
 *      String value with length greater than or equal to 16384 bytes.
 * |11000000| - 1 byte
 *      Integer encoded as int16_t (2 bytes).
 * |11010000| - 1 byte
 *      Integer encoded as int32_t (4 bytes).
 * |11100000| - 1 byte
 *      Integer encoded as int64_t (8 bytes).
 * |11110000| - 1 byte
 *      Integer encoded as 24 bit signed (3 bytes).
 * |11111110| - 1 byte
 *      Integer encoded as 8 bit signed (1 byte).
 * |1111xxxx| - (with xxxx between 0000 and 1101) immediate 4 bit integer.
 *      Unsigned integer from 0 to 12. The encoded value is actually from
 *      1 to 13 because 0000 and 1111 can not be used, so 1 should be
 *      subtracted from the encoded 4 bit value to obtain the right value.
 * |11111111| - End of ziplist.
 *
 * All the integers are represented in little endian byte order.
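The previous-entry length rule and the 4-bit immediate integers are easy to misread in the comment, so here is a small decoding sketch. decode_prevlen and decode_imm4 are hypothetical helpers written for this article, not functions from ziplist.c, and they assume the pointer refers to the first byte of a real entry rather than the 255 end marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the "previous entry length" field at p, following the rule in the
 * comment: one byte if the value is below 254, otherwise a 254 marker
 * followed by a 4-byte little-endian length.
 * Returns the number of bytes the field occupies (1 or 5). */
size_t decode_prevlen(const unsigned char *p, uint32_t *prevlen) {
    assert(p != NULL && prevlen != NULL);
    assert(p[0] != 255); /* 255 is the end marker, not an entry */
    if (p[0] < 254) {
        *prevlen = p[0];
        return 1;
    }
    *prevlen = (uint32_t)p[1] | ((uint32_t)p[2] << 8) |
               ((uint32_t)p[3] << 16) | ((uint32_t)p[4] << 24);
    return 5;
}

/* Decode a 4-bit immediate integer. Valid encoding bytes are 0xF1..0xFD;
 * the stored nibble is 1..13 and the real value is the nibble minus 1.
 * Returns 1 and writes *value on success, 0 for any other encoding. */
int decode_imm4(unsigned char enc, int *value) {
    assert(value != NULL);
    if (enc < 0xF1 || enc > 0xFD)
        return 0;
    *value = (enc & 0x0F) - 1;
    return 1;
}
```

For example, the bytes FE 2C 01 00 00 decode to a previous-entry length of 300, and the encoding byte F1 decodes to the integer 0.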

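The source also defines a zlentry struct, whose fields correspond to the entry header described in the comment above:

typedef struct zlentry {
    /* bytes used to encode the previous entry's length, and that length */
    unsigned int prevrawlensize, prevrawlen;
    /* bytes used to encode this entry's type/length, and the payload length */
    unsigned int lensize, len;
    /* prevrawlensize + lensize */
    unsigned int headersize;
    /* encoding of this entry (string or integer variant) */
    unsigned char encoding;
    /* pointer to the start of the entry */
    unsigned char *p;
} zlentry;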

3. The ziplist object

3.1 Creating a ziplist object

Redis v2.6 source:

// object.c (src) - line 105

// Create a ziplist-encoded list object
robj *createZiplistObject(void) {
    // Create an empty ziplist
    unsigned char *zl = ziplistNew();
    // Wrap it in a list object
    robj *o = createObject(REDIS_LIST,zl);
    // Set the object's encoding to ziplist
    o->encoding = REDIS_ENCODING_ZIPLIST;
    // Return the pointer to the object
    return o;
}

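3.2 Creating an empty ziplist

Redis provides the source for creating an empty ziplist; let's read through it briefly.

// ziplist.c (src) - line 418

/* Create a new empty ziplist. */
unsigned char *ziplistNew(void) {
    // Total size: the 10-byte header plus 1 byte for the end marker
    unsigned int bytes = ZIPLIST_HEADER_SIZE+1;
    // Allocate the memory
    unsigned char *zl = zmalloc(bytes);
    // zlbytes: stored little endian (bytes are swapped only on big-endian hosts)
    ZIPLIST_BYTES(zl) = intrev32ifbe(bytes);
    // zltail: offset of the tail entry from the start of the ziplist;
    // with no entries it points just past the header
    ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE);
    // zllen: the ziplist holds 0 entries
    ZIPLIST_LENGTH(zl) = 0;
    // End marker in the last byte
    zl[bytes-1] = ZIP_END;
    // Return the pointer to the ziplist
    return zl;
}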
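The byte layout that ziplistNew produces can be reproduced without any Redis code. The sketch below is a stand-in under stated assumptions: it uses plain malloc instead of zmalloc, writes each field byte by byte in little-endian order instead of going through the macros, and sketch_ziplist_new, put_u32le and the SKETCH_* constants are made-up names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same arithmetic as ZIPLIST_HEADER_SIZE: 4 + 4 + 2 = 10 bytes. */
#define SKETCH_HEADER_SIZE (sizeof(uint32_t) * 2 + sizeof(uint16_t))
#define SKETCH_END 255

/* Write v at p in little-endian order, regardless of host byte order. */
static void put_u32le(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Build the 11-byte empty ziplist. Returns NULL if allocation fails;
 * the caller releases the result with free(). */
unsigned char *sketch_ziplist_new(void) {
    assert(SKETCH_HEADER_SIZE == 10);
    uint32_t bytes = (uint32_t)SKETCH_HEADER_SIZE + 1;
    unsigned char *zl = malloc(bytes);
    if (zl == NULL)
        return NULL;
    put_u32le(zl, bytes);                            /* zlbytes = 11 */
    put_u32le(zl + 4, (uint32_t)SKETCH_HEADER_SIZE); /* zltail  = 10 */
    zl[8] = 0;                                       /* zllen   = 0  */
    zl[9] = 0;
    zl[bytes - 1] = SKETCH_END;                      /* zlend        */
    return zl;
}
```

The resulting buffer is 0B 00 00 00 | 0A 00 00 00 | 00 00 | FF: total size 11, tail offset 10, zero entries, then the end marker.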

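Now let's look at the helpers it uses.

⭐️ ZIPLIST_HEADER_SIZE is the size of the header

// uint32_t: 32 bits
// uint16_t: 16 bits
// (32/8) * 2 + 16/8 = 8 + 2 = 10 bytes
// so the header is defined as 10 bytes
#define ZIPLIST_HEADER_SIZE     (sizeof(uint32_t)*2+sizeof(uint16_t))

⭐️ zmalloc is Redis's memory allocation wrapper. Underneath it calls the malloc of whichever allocator Redis was built with, for example glibc's malloc; see my other article on the malloc source.

intrev32ifbe reverses the byte order of a 32-bit value only when the host is big endian, so the header fields are always stored little endian regardless of the machine.

🐱 Byte order:

An int occupies 4 bytes, and memory has lower and higher addresses. Does the most significant byte of the int sit at the higher address or at the lower one? That question is what the concept of byte order (endianness) answers: little endian puts the least significant byte at the lowest address, big endian puts the most significant byte there.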
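The idea behind intrev32ifbe can be sketched in portable C. This is not the Redis implementation (Redis decides at compile time with macros); host_is_little_endian, swap32, to_little_endian32 and store_u32le are hypothetical names, and the check here happens at run time purely for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void) {
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1); /* lowest-addressed byte of probe */
    return first == 1;
}

/* Unconditionally reverse the four bytes of v. */
uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Same idea as intrev32ifbe: a no-op on little-endian hosts and a byte
 * swap on big-endian ones, so the in-memory bytes are little endian. */
uint32_t to_little_endian32(uint32_t v) {
    return host_is_little_endian() ? v : swap32(v);
}

/* Store v into out[0..3] so that out[0] holds the least significant byte. */
void store_u32le(unsigned char *out, uint32_t v) {
    assert(out != NULL);
    v = to_little_endian32(v);
    memcpy(out, &v, sizeof v);
}
```

Storing 0x11223344 this way yields the bytes 44 33 22 11 on any host, which is why a ziplist written on one machine can be read on another.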

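⭐️ ZIPLIST_TAIL_OFFSET: the offset of the ziplist's tail entry from the start of the ziplist.

// (zl): the unsigned char *zl passed in by the caller
// uint32_t: 32 bits = 32/8 = 4 bytes
// Take the address zl and advance it by 4 bytes to skip zlbytes
// Cast that address to a uint32_t pointer, then dereference it to read the value stored there
// This accesses the zltail field of the ziplist header
#define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl)+sizeof(uint32_t))))

⭐️ ZIPLIST_LENGTH

// (zl): the unsigned char *zl passed in by the caller
// Advance zl by 2 * 4 = 8 bytes to skip zlbytes and zltail
// Cast the result to a uint16_t pointer
// Dereference it to read the value stored there
// This accesses the zllen field of the ziplist header
#define ZIPLIST_LENGTH(zl)      (*((uint16_t*)((zl)+sizeof(uint32_t)*2)))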

⭐️ zl[bytes-1] = ZIP_END;

// Store 255 in the last byte of zl, marking the end of this ziplist for Redis
zl[bytes-1] = ZIP_END;

3.3 Creating the object

// Set the object type to list and store the pointer to the empty ziplist just created
robj *o = createObject(REDIS_LIST,zl);

⭐️ Object types

// redis.h (src) - line 139

/* Object types */
#define REDIS_STRING 0
#define REDIS_LIST 1
#define REDIS_SET 2
#define REDIS_ZSET 3
#define REDIS_HASH 4

⭐️ Source of createObject

robj *createObject(int type, void *ptr) {
    // Allocate memory for the object
    robj *o = zmalloc(sizeof(*o));
    // Set the data type
    o->type = type;
    // Default encoding; the caller may override it afterwards
    o->encoding = REDIS_ENCODING_RAW;
    // Point at the underlying data
    o->ptr = ptr;
    // Reference count starts at 1
    o->refcount = 1;

    /* Set the LRU to the current lruclock (minutes resolution). */
    o->lru = server.lruclock;
    return o;
}

3.4 Setting the encoding

// Mark this object as encoded as a ziplist
o->encoding = REDIS_ENCODING_ZIPLIST;

⭐️ Encoding types

// redis.h (src) - line 139

/* Objects encoding. Some kind of objects like Strings and Hashes can be
 * internally represented in multiple ways. The 'encoding' field of the object
 * is set to one of this fields for this object. */
#define REDIS_ENCODING_RAW 0     /* Raw representation */
#define REDIS_ENCODING_INT 1     /* Encoded as integer */
#define REDIS_ENCODING_HT 2      /* Encoded as hash table */
#define REDIS_ENCODING_ZIPMAP 3  /* Encoded as zipmap */
#define REDIS_ENCODING_LINKEDLIST 4 /* Encoded as regular linked list */
#define REDIS_ENCODING_ZIPLIST 5 /* Encoded as ziplist */
#define REDIS_ENCODING_INTSET 6  /* Encoded as intset */
#define REDIS_ENCODING_SKIPLIST 7  /* Encoded as skiplist */
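Putting 3.1 to 3.4 together: the object is first created with the default RAW encoding and then switched to ziplist. The sketch below mimics that two-step flow with a deliberately simplified, hypothetical struct (sk_obj); it is not the real robj, which has more fields, and it uses plain malloc instead of zmalloc.

```c
#include <assert.h>
#include <stdlib.h>

enum { SK_TYPE_LIST = 1 };                   /* mirrors REDIS_LIST */
enum { SK_ENC_RAW = 0, SK_ENC_ZIPLIST = 5 }; /* mirror REDIS_ENCODING_* */

/* Simplified stand-in for robj: type, encoding, refcount, data pointer. */
typedef struct sk_obj {
    int type;
    int encoding;
    int refcount;
    void *ptr;
} sk_obj;

/* Like createObject: every new object starts RAW with refcount 1. */
sk_obj *sk_create_object(int type, void *ptr) {
    sk_obj *o = malloc(sizeof(*o));
    if (o == NULL)
        return NULL;
    o->type = type;
    o->encoding = SK_ENC_RAW;
    o->refcount = 1;
    o->ptr = ptr;
    return o;
}

/* Like createZiplistObject: wrap an existing ziplist buffer in a list
 * object, then override the default encoding. */
sk_obj *sk_create_ziplist_object(void *zl) {
    assert(zl != NULL);
    sk_obj *o = sk_create_object(SK_TYPE_LIST, zl);
    if (o != NULL)
        o->encoding = SK_ENC_ZIPLIST;
    return o;
}
```

The type says what the value is from the user's point of view (a list), while the encoding says how it is stored internally (a ziplist), which is why the two fields are set separately.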

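4. Summary

The sections above covered where ziplist comes from, how it is laid out, how a ziplist object is created, and part of the source behind it.

Readers who want to dig further can read ziplist.c, which contains the implementations of insertion, deletion and the other APIs.

Original article: https://blog.csdn.net/zhangHP_123/article/details/145159182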
