I. Redis Deployment Modes

  • Standalone deployment
  • Master-replica replication
  • Sentinel mode
  • Cluster mode

II. Single-Node Deployment

The Docker Compose file is as follows:

name: redis-single-demo

volumes:
  redis-single-volume:
    name: redis-single-demo-volume

networks:
  redis-single-net:
    name: redis-single-net
    driver: bridge

services:
  redis-single:
    image: ${image}
    restart: ${restart}
    container_name: ${container_name}
    hostname: ${host_name}
    networks:
      - redis-single-net
    ports:
      - 6379:6379
    volumes:
      - redis-single-volume:/data
      - ./redis.conf:/usr/local/etc/redis/redis.conf
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]

The .env file is as follows:

image=redis:7.4-bookworm
container_name=redis-single
host_name=redis-single
restart=unless-stopped

The service defines a command attribute, which is the command run when the container starts:

command: ["redis-server", "/usr/local/etc/redis/redis.conf"]

In the Compose file, the local redis.conf file is bind-mounted into the container:

volumes:
  - ./redis.conf:/usr/local/etc/redis/redis.conf

Change the following settings in the local redis.conf file:

# bind 127.0.0.1 -::1
bind 0.0.0.0

# requirepass foobared
requirepass yourpassword

Then start the container. If the log contains the line "Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf", the configuration file mounted into the container is not being used:

$ docker compose up
[+] Running 2/2
✔ Network redis-single-net Created 0.0s
✔ Container redis-single Created 0.1s
Attaching to redis-single
redis-single | 1:C 09 Oct 2024 06:05:24.554 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis-single | 1:C 09 Oct 2024 06:05:24.554 * Redis version=7.4.1, bits=64, commit=00000000, modified=0, pid=1, just started
redis-single | 1:C 09 Oct 2024 06:05:24.554 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf

If the bind-mounted configuration file is used, the log output looks like this:

$ docker compose up
[+] Running 2/2
✔ Network redis-single-net Created 0.0s
✔ Container redis-single Created 0.1s
Attaching to redis-single
redis-single | 1:C 09 Oct 2024 06:56:20.246 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis-single | 1:C 09 Oct 2024 06:56:20.246 * Redis version=7.4.1, bits=64, commit=00000000, modified=0, pid=1, just started
redis-single | 1:C 09 Oct 2024 06:56:20.246 * Configuration loaded
redis-single | 1:M 09 Oct 2024 06:56:20.246 * monotonic clock: POSIX clock_gettime
redis-single | 1:M 09 Oct 2024 06:56:20.247 # Failed to write PID file: Permission denied
redis-single | 1:M 09 Oct 2024 06:56:20.247 * Running mode=standalone, port=6379.
redis-single | 1:M 09 Oct 2024 06:56:20.247 * Server initialized
redis-single | 1:M 09 Oct 2024 06:56:20.247 * Loading RDB produced by version 7.4.1
redis-single | 1:M 09 Oct 2024 06:56:20.247 * RDB age 188 seconds
redis-single | 1:M 09 Oct 2024 06:56:20.247 * RDB memory usage when created 1.14 Mb
redis-single | 1:M 09 Oct 2024 06:56:20.247 * Done loading RDB, keys loaded: 0, keys expired: 0.
redis-single | 1:M 09 Oct 2024 06:56:20.247 * DB loaded from disk: 0.000 seconds
redis-single | 1:M 09 Oct 2024 06:56:20.247 * Ready to accept connections tcp
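
The difference between the two startup logs above comes down to one line each, so the check can be automated. The following is a minimal sketch, assuming the log lines have already been collected into a list (for example from docker logs); the class name ConfigLoadCheck and the Status enum are hypothetical names introduced for illustration.

```java
import java.util.List;

public class ConfigLoadCheck {

    enum Status { LOADED, DEFAULT_CONFIG, UNKNOWN }

    /** Classifies startup log lines by the two messages shown in the logs above. */
    static Status classify(List<String> logLines) {
        for (String line : logLines) {
            if (line.contains("no config file specified")) {
                return Status.DEFAULT_CONFIG; // redis-server started without a config path
            }
            if (line.contains("Configuration loaded")) {
                return Status.LOADED; // the mounted redis.conf was read
            }
        }
        return Status.UNKNOWN; // neither message seen yet
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "redis-single | 1:C 09 Oct 2024 06:56:20.246 * Configuration loaded",
                "redis-single | 1:M 09 Oct 2024 06:56:20.247 * Ready to accept connections tcp");
        System.out.println(classify(lines));
    }
}
```

The check only looks at message text, so it works with or without the container-name prefix that docker compose adds to each line.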

Enter the container and run Redis commands:

$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
718f9724f472 redis:7.4-bookworm "docker-entrypoint.s…" 46 seconds ago Up 45 seconds 0.0.0.0:6379->6379/tcp redis-single

$ docker exec -it 718f9724f472 /bin/bash

root@redis-single:/data# redis-cli -h 127.0.0.1 -p 6379 -a yourpassword
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> set "key" "value"
OK
127.0.0.1:6379> get "key"
"value"

III. Master-Replica Replication

1. Configuration

The Compose file is as follows. It defines one Redis master node and two Redis replica nodes.

name: redis-master-slave-demo

volumes:
  redis-master-volume:
    name: redis-master-slave-demo-master-volume
  redis-slave-volume-1:
    name: redis-master-slave-demo-slave-1-volume
  redis-slave-volume-2:
    name: redis-master-slave-demo-slave-2-volume

networks:
  redis-master-slave-net:
    name: redis-master-slave-net
    driver: bridge

services:
  redis-master:
    image: ${image}
    restart: ${restart}
    container_name: redis-master
    hostname: redis-master
    networks:
      - redis-master-slave-net
    ports:
      - 16379:6379
    volumes:
      - redis-master-volume:/data
      - ./redis-master.conf:/usr/local/etc/redis/redis.conf
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]

  redis-slave-01:
    image: ${image}
    restart: ${restart}
    container_name: redis-slave-01
    hostname: redis-slave-01
    networks:
      - redis-master-slave-net
    ports:
      - 16380:6379
    volumes:
      - redis-slave-volume-1:/data
      - ./redis-slave.conf:/usr/local/etc/redis/redis.conf
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]

  redis-slave-02:
    image: ${image}
    restart: ${restart}
    container_name: redis-slave-02
    hostname: redis-slave-02
    networks:
      - redis-master-slave-net
    ports:
      - 16381:6379
    volumes:
      - redis-slave-volume-2:/data
      - ./redis-slave.conf:/usr/local/etc/redis/redis.conf
    command: ["redis-server", "/usr/local/etc/redis/redis.conf"]

The .env file is as follows:

image=redis:7.4-bookworm
restart=no

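There are two Redis configuration files, one for the master node and one for the replica nodes. Each must be mounted into the corresponding master or replica container.

For the master node, change the following settings in redis.conf and rename the file to redis-master.conf: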

# bind 127.0.0.1 -::1
bind 0.0.0.0

# requirepass foobared
requirepass yourpassword

save 3600 1 300 100 60 10000
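
The save line packs several snapshot rules into one directive: each pair is a time window in seconds followed by a minimum number of changes, so "save 3600 1 300 100 60 10000" reads as three rules. The sketch below parses such a line and evaluates the rules. It is a simplified model for illustration, not Redis's own implementation; the class SaveDirective, the Rule record, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SaveDirective {

    /** One rule: snapshot once `seconds` have elapsed and at least `changes` writes occurred. */
    record Rule(int seconds, int changes) {}

    /** Parses a line such as "save 3600 1 300 100 60 10000" into its (seconds, changes) pairs. */
    static List<Rule> parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        // Expect the keyword followed by one or more complete pairs.
        if (tokens.length < 3 || tokens.length % 2 == 0 || !tokens[0].equalsIgnoreCase("save")) {
            throw new IllegalArgumentException("expected: save <seconds> <changes> [...]: " + line);
        }
        List<Rule> rules = new ArrayList<>();
        for (int i = 1; i < tokens.length; i += 2) {
            rules.add(new Rule(Integer.parseInt(tokens[i]), Integer.parseInt(tokens[i + 1])));
        }
        return rules;
    }

    /** Simplified check: any single rule whose window and change count are both reached triggers a save. */
    static boolean shouldSnapshot(List<Rule> rules, long secondsSinceLastSave, long changesSinceLastSave) {
        for (Rule rule : rules) {
            if (secondsSinceLastSave >= rule.seconds() && changesSinceLastSave >= rule.changes()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse("save 3600 1 300 100 60 10000");
        rules.forEach(System.out::println);
        System.out.println(shouldSnapshot(rules, 300, 100));
    }
}
```

Read this way, a busy instance (10000 changes) is snapshotted after a minute, while an almost idle one (a single change) is still snapshotted after an hour.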

For the replica nodes, change the following settings in redis.conf and rename the file to redis-slave.conf:

# bind 127.0.0.1 -::1
bind 0.0.0.0

# requirepass foobared
requirepass yourpassword

save 3600 1 300 100 60 10000

replicaof redis-master 6379

# masterauth must match the master's requirepass value
masterauth yourpassword

2. Startup and Verification

Run docker compose up -d to start the containers. Use docker logs <container ID> to view the logs of the Redis master and replica nodes.

2.1 First startup

Master node log:

2024-10-10 14:32:34 1:C 10 Oct 2024 06:32:34.670 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2024-10-10 14:32:34 1:C 10 Oct 2024 06:32:34.670 * Redis version=7.4.1, bits=64, commit=00000000, modified=0, pid=1, just started
2024-10-10 14:32:34 1:C 10 Oct 2024 06:32:34.670 * Configuration loaded
2024-10-10 14:32:34 1:M 10 Oct 2024 06:32:34.670 * monotonic clock: POSIX clock_gettime
2024-10-10 14:32:34 1:M 10 Oct 2024 06:32:34.670 # Failed to write PID file: Permission denied
2024-10-10 14:32:34 1:M 10 Oct 2024 06:32:34.670 * Running mode=standalone, port=6379.
2024-10-10 14:32:34 1:M 10 Oct 2024 06:32:34.670 * Server initialized
2024-10-10 14:32:34 1:M 10 Oct 2024 06:32:34.670 * Ready to accept connections tcp
2024-10-10 14:32:35 1:M 10 Oct 2024 06:32:35.508 * Replica 172.19.0.2:6379 asks for synchronization
2024-10-10 14:32:35 1:M 10 Oct 2024 06:32:35.508 * Full resync requested by replica 172.19.0.2:6379
2024-10-10 14:32:35 1:M 10 Oct 2024 06:32:35.508 * Replication backlog created, my new replication IDs are '439d1a536f0cefb6980f59ef88f49e2e48cd0f2c' and '0000000000000000000000000000000000000000'
2024-10-10 14:32:35 1:M 10 Oct 2024 06:32:35.508 * Delay next BGSAVE for diskless SYNC
2024-10-10 14:32:35 1:M 10 Oct 2024 06:32:35.658 * Replica 172.19.0.3:6379 asks for synchronization
2024-10-10 14:32:35 1:M 10 Oct 2024 06:32:35.658 * Full resync requested by replica 172.19.0.3:6379
2024-10-10 14:32:35 1:M 10 Oct 2024 06:32:35.658 * Delay next BGSAVE for diskless SYNC
2024-10-10 14:32:40 1:M 10 Oct 2024 06:32:40.690 * Starting BGSAVE for SYNC with target: replicas sockets
2024-10-10 14:32:40 1:M 10 Oct 2024 06:32:40.691 * Background RDB transfer started by pid 21
2024-10-10 14:32:40 21:C 10 Oct 2024 06:32:40.691 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
2024-10-10 14:32:40 1:M 10 Oct 2024 06:32:40.691 * Diskless rdb transfer, done reading from pipe, 2 replicas still up.
2024-10-10 14:32:40 1:M 10 Oct 2024 06:32:40.700 * Background RDB transfer terminated with success
2024-10-10 14:32:40 1:M 10 Oct 2024 06:32:40.700 * Streamed RDB transfer with replica 172.19.0.2:6379 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
2024-10-10 14:32:40 1:M 10 Oct 2024 06:32:40.700 * Synchronization with replica 172.19.0.2:6379 succeeded
2024-10-10 14:32:40 1:M 10 Oct 2024 06:32:40.700 * Streamed RDB transfer with replica 172.19.0.3:6379 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
2024-10-10 14:32:40 1:M 10 Oct 2024 06:32:40.700 * Synchronization with replica 172.19.0.3:6379 succeeded

Replica node log:

2024-10-10 14:32:34 1:C 10 Oct 2024 06:32:34.618 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2024-10-10 14:32:34 1:C 10 Oct 2024 06:32:34.618 * Redis version=7.4.1, bits=64, commit=00000000, modified=0, pid=1, just started
2024-10-10 14:32:34 1:C 10 Oct 2024 06:32:34.618 * Configuration loaded
2024-10-10 14:32:34 1:S 10 Oct 2024 06:32:34.618 * monotonic clock: POSIX clock_gettime
2024-10-10 14:32:34 1:S 10 Oct 2024 06:32:34.618 # Failed to write PID file: Permission denied
2024-10-10 14:32:34 1:S 10 Oct 2024 06:32:34.618 * Running mode=standalone, port=6379.
2024-10-10 14:32:34 1:S 10 Oct 2024 06:32:34.618 * Server initialized
2024-10-10 14:32:34 1:S 10 Oct 2024 06:32:34.618 * Ready to accept connections tcp
2024-10-10 14:32:34 1:S 10 Oct 2024 06:32:34.620 * Connecting to MASTER redis-master:6379
2024-10-10 14:32:34 1:S 10 Oct 2024 06:32:34.620 * MASTER <-> REPLICA sync started
2024-10-10 14:32:35 1:S 10 Oct 2024 06:32:35.657 * Non blocking connect for SYNC fired the event.
2024-10-10 14:32:35 1:S 10 Oct 2024 06:32:35.657 * Master replied to PING, replication can continue...
2024-10-10 14:32:35 1:S 10 Oct 2024 06:32:35.658 * Partial resynchronization not possible (no cached master)
2024-10-10 14:32:40 1:S 10 Oct 2024 06:32:40.690 * Full resync from master: 439d1a536f0cefb6980f59ef88f49e2e48cd0f2c:0
2024-10-10 14:32:40 1:S 10 Oct 2024 06:32:40.691 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
2024-10-10 14:32:40 1:S 10 Oct 2024 06:32:40.691 * MASTER <-> REPLICA sync: Flushing old data
2024-10-10 14:32:40 1:S 10 Oct 2024 06:32:40.691 * MASTER <-> REPLICA sync: Loading DB in memory
2024-10-10 14:32:40 1:S 10 Oct 2024 06:32:40.699 * Loading RDB produced by version 7.4.1
2024-10-10 14:32:40 1:S 10 Oct 2024 06:32:40.699 * RDB age 0 seconds
2024-10-10 14:32:40 1:S 10 Oct 2024 06:32:40.699 * RDB memory usage when created 0.98 Mb
2024-10-10 14:32:40 1:S 10 Oct 2024 06:32:40.699 * Done loading RDB, keys loaded: 0, keys expired: 0.
2024-10-10 14:32:40 1:S 10 Oct 2024 06:32:40.699 * MASTER <-> REPLICA sync: Finished with success
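
When reading these logs it helps to know the prefix layout: process ID, a role character (M for master, S for replica, C for a child process doing RDB/AOF work, X for sentinel), a timestamp, and a level character (. debug, - verbose, * notice, # warning). The sketch below pulls these fields out of one line. It is an illustrative helper only; the class RedisLogLine and the Entry record are hypothetical names, and the regular expression is written against the log lines shown above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RedisLogLine {

    record Entry(int pid, String role, String timestamp, String level, String message) {}

    // pid:role, "dd Mon yyyy HH:mm:ss.SSS", level character, message.
    private static final Pattern LINE = Pattern.compile(
            "(\\d+):([XCSM]) (\\d{2} \\w{3} \\d{4} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}) ([.\\-*#]) (.*)$");

    /** Parses one log line; find() skips any prefix added by Docker in front of the Redis line. */
    static Optional<Entry> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Entry(
                Integer.parseInt(m.group(1)), m.group(2), m.group(3), m.group(4), m.group(5)));
    }

    /** Maps the role character to a readable name. */
    static String roleName(String role) {
        return switch (role) {
            case "M" -> "master";
            case "S" -> "replica";
            case "C" -> "child process";
            case "X" -> "sentinel";
            default -> "unknown";
        };
    }

    public static void main(String[] args) {
        String line = "2024-10-10 14:32:34 1:S 10 Oct 2024 06:32:34.620 * Connecting to MASTER redis-master:6379";
        parse(line).ifPresent(e -> System.out.println(roleName(e.role()) + ": " + e.message()));
    }
}
```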

Add a string key-value pair on the master node.

$ set "name" "zyadz"
"OK"
$ get name
"zyadz"

Query the string that was just set from a replica node.

$ get name
"zyadz"

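To invoke redis-cli, one option is to enter the container directly (the session below was captured on the single-node container; use the ID of the master or replica container here):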

$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
718f9724f472 redis:7.4-bookworm "docker-entrypoint.s…" 46 seconds ago Up 45 seconds 0.0.0.0:6379->6379/tcp redis-single

$ docker exec -it 718f9724f472 /bin/bash

root@redis-single:/data# redis-cli -h 127.0.0.1 -p 6379 -a yourpassword
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> set "name" "zyadz"
OK
127.0.0.1:6379> get "name"
"zyadz"

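Keys can also be viewed and edited with Redis Insight, the official GUI tool.

[Screenshot: Redis Insight, Browser view]

[Screenshot: Redis Insight, Workbench view]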

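2.2 Startup from an existing dump.rdb file

In this scenario, persistence is enabled on the Redis master and replica nodes, so a dump.rdb file is generated in each container's /data directory.

docker compose down removes the containers and the network it created, but not the volumes created earlier. When docker compose up -d is run again, the existing volumes are reused and the data is loaded from the dump.rdb file.

Master node log on restart: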

2024-10-10 14:17:19 1:C 10 Oct 2024 06:17:19.553 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2024-10-10 14:17:19 1:C 10 Oct 2024 06:17:19.553 * Redis version=7.4.1, bits=64, commit=00000000, modified=0, pid=1, just started
2024-10-10 14:17:19 1:C 10 Oct 2024 06:17:19.553 * Configuration loaded
2024-10-10 14:17:19 1:M 10 Oct 2024 06:17:19.553 * monotonic clock: POSIX clock_gettime
2024-10-10 14:17:19 1:M 10 Oct 2024 06:17:19.554 # Failed to write PID file: Permission denied
2024-10-10 14:17:19 1:M 10 Oct 2024 06:17:19.554 * Running mode=standalone, port=6379.
2024-10-10 14:17:19 1:M 10 Oct 2024 06:17:19.554 * Server initialized
2024-10-10 14:17:19 1:M 10 Oct 2024 06:17:19.554 * Loading RDB produced by version 7.4.1
2024-10-10 14:17:19 1:M 10 Oct 2024 06:17:19.554 * RDB age 13 seconds
2024-10-10 14:17:19 1:M 10 Oct 2024 06:17:19.554 * RDB memory usage when created 1.32 Mb
2024-10-10 14:17:19 1:M 10 Oct 2024 06:17:19.554 * Done loading RDB, keys loaded: 1, keys expired: 0.
2024-10-10 14:17:19 1:M 10 Oct 2024 06:17:19.554 * DB loaded from disk: 0.000 seconds
2024-10-10 14:17:19 1:M 10 Oct 2024 06:17:19.554 * Ready to accept connections tcp
2024-10-10 14:17:20 1:M 10 Oct 2024 06:17:20.365 * Replica 172.19.0.2:6379 asks for synchronization
2024-10-10 14:17:20 1:M 10 Oct 2024 06:17:20.365 * Partial resynchronization request from 172.19.0.2:6379 accepted. Sending 0 bytes of backlog starting from offset 8471.
2024-10-10 14:17:20 1:M 10 Oct 2024 06:17:20.556 * Replica 172.19.0.3:6379 asks for synchronization
2024-10-10 14:17:20 1:M 10 Oct 2024 06:17:20.556 * Partial resynchronization request from 172.19.0.3:6379 accepted. Sending 0 bytes of backlog starting from offset 8471.

Replica node log on restart:

2024-10-10 14:17:19 1:C 10 Oct 2024 06:17:19.299 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2024-10-10 14:17:19 1:C 10 Oct 2024 06:17:19.299 * Redis version=7.4.1, bits=64, commit=00000000, modified=0, pid=1, just started
2024-10-10 14:17:19 1:C 10 Oct 2024 06:17:19.299 * Configuration loaded
2024-10-10 14:17:19 1:S 10 Oct 2024 06:17:19.299 * monotonic clock: POSIX clock_gettime
2024-10-10 14:17:19 1:S 10 Oct 2024 06:17:19.299 # Failed to write PID file: Permission denied
2024-10-10 14:17:19 1:S 10 Oct 2024 06:17:19.299 * Running mode=standalone, port=6379.
2024-10-10 14:17:19 1:S 10 Oct 2024 06:17:19.299 * Server initialized
2024-10-10 14:17:19 1:S 10 Oct 2024 06:17:19.300 * Loading RDB produced by version 7.4.1
2024-10-10 14:17:19 1:S 10 Oct 2024 06:17:19.300 * RDB age 13 seconds
2024-10-10 14:17:19 1:S 10 Oct 2024 06:17:19.300 * RDB memory usage when created 1.28 Mb
2024-10-10 14:17:19 1:S 10 Oct 2024 06:17:19.300 * Done loading RDB, keys loaded: 1, keys expired: 0.
2024-10-10 14:17:19 1:S 10 Oct 2024 06:17:19.300 * DB loaded from disk: 0.000 seconds
2024-10-10 14:17:19 1:S 10 Oct 2024 06:17:19.300 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
2024-10-10 14:17:19 1:S 10 Oct 2024 06:17:19.300 * Ready to accept connections tcp
2024-10-10 14:17:19 1:S 10 Oct 2024 06:17:19.301 * Connecting to MASTER redis-master:6379
2024-10-10 14:17:19 1:S 10 Oct 2024 06:17:19.302 * MASTER <-> REPLICA sync started
2024-10-10 14:17:20 1:S 10 Oct 2024 06:17:20.365 * Non blocking connect for SYNC fired the event.
2024-10-10 14:17:20 1:S 10 Oct 2024 06:17:20.365 * Master replied to PING, replication can continue...
2024-10-10 14:17:20 1:S 10 Oct 2024 06:17:20.365 * Trying a partial resynchronization (request 4e188d217af6bd6171cdd17616a7ea6e6d7cab7d:8471).
2024-10-10 14:17:20 1:S 10 Oct 2024 06:17:20.365 * Successful partial resynchronization with master.
2024-10-10 14:17:20 1:S 10 Oct 2024 06:17:20.365 * Master replication ID changed to 2655da29ca3bd66b6b355bea39ee70bbf3fec2f7
2024-10-10 14:17:20 1:S 10 Oct 2024 06:17:20.365 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.

The following is the replica node log when the replica volumes were deleted before the restart but the master volume was kept:

2024-10-10 14:30:40 1:C 10 Oct 2024 06:30:40.180 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2024-10-10 14:30:40 1:C 10 Oct 2024 06:30:40.180 * Redis version=7.4.1, bits=64, commit=00000000, modified=0, pid=1, just started
2024-10-10 14:30:40 1:C 10 Oct 2024 06:30:40.180 * Configuration loaded
2024-10-10 14:30:40 1:S 10 Oct 2024 06:30:40.180 * monotonic clock: POSIX clock_gettime
2024-10-10 14:30:40 1:S 10 Oct 2024 06:30:40.181 # Failed to write PID file: Permission denied
2024-10-10 14:30:40 1:S 10 Oct 2024 06:30:40.181 * Running mode=standalone, port=6379.
2024-10-10 14:30:40 1:S 10 Oct 2024 06:30:40.181 * Server initialized
2024-10-10 14:30:40 1:S 10 Oct 2024 06:30:40.181 * Ready to accept connections tcp
2024-10-10 14:30:40 1:S 10 Oct 2024 06:30:40.182 * Connecting to MASTER redis-master:6379
2024-10-10 14:30:40 1:S 10 Oct 2024 06:30:40.183 * MASTER <-> REPLICA sync started
2024-10-10 14:30:40 1:S 10 Oct 2024 06:30:40.183 * Non blocking connect for SYNC fired the event.
2024-10-10 14:30:40 1:S 10 Oct 2024 06:30:40.183 * Master replied to PING, replication can continue...
2024-10-10 14:30:40 1:S 10 Oct 2024 06:30:40.183 * Partial resynchronization not possible (no cached master)
2024-10-10 14:30:45 1:S 10 Oct 2024 06:30:45.947 * Full resync from master: 47927bf35cb17ef2214e5a9c0ba0f99323b4f507:9584
2024-10-10 14:30:45 1:S 10 Oct 2024 06:30:45.948 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
2024-10-10 14:30:45 1:S 10 Oct 2024 06:30:45.948 * MASTER <-> REPLICA sync: Flushing old data
2024-10-10 14:30:45 1:S 10 Oct 2024 06:30:45.948 * MASTER <-> REPLICA sync: Loading DB in memory
2024-10-10 14:30:45 1:S 10 Oct 2024 06:30:45.972 * Loading RDB produced by version 7.4.1
2024-10-10 14:30:45 1:S 10 Oct 2024 06:30:45.972 * RDB age 0 seconds
2024-10-10 14:30:45 1:S 10 Oct 2024 06:30:45.972 * RDB memory usage when created 0.98 Mb
2024-10-10 14:30:45 1:S 10 Oct 2024 06:30:45.972 * Done loading RDB, keys loaded: 2, keys expired: 0.
2024-10-10 14:30:45 1:S 10 Oct 2024 06:30:45.972 * MASTER <-> REPLICA sync: Finished with success
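
The replica logs in this section end in two different ways: a partial resynchronization when the replica still has its dump.rdb, and a full resync when it has no cached master. The sketch below models that decision under simplifying assumptions: the master accepts a partial resync only if the requested replication ID matches its current ID (or its previous ID, up to the offset where the IDs switched) and the requested offset still lies inside its backlog. The MasterState record and its field names are hypothetical, the second-ID offset and backlog values in main are assumed for illustration, and the real check inside Redis covers more cases.

```java
public class PsyncDecision {

    /** Hypothetical, simplified view of the replication state a master keeps. */
    record MasterState(String replId, String replId2, long secondReplIdOffset,
                       long backlogStart, long backlogLength) {}

    /** Returns true if a partial resync can be served, false if a full resync is needed. */
    static boolean canPartialResync(MasterState master, String requestedId, long requestedOffset) {
        boolean idMatches = requestedId.equals(master.replId())
                || (requestedId.equals(master.replId2())
                        && requestedOffset <= master.secondReplIdOffset());
        if (!idMatches) {
            return false; // unknown history: the replica must reload everything
        }
        // The requested offset must still be covered by the backlog.
        return requestedOffset >= master.backlogStart()
                && requestedOffset <= master.backlogStart() + master.backlogLength();
    }

    public static void main(String[] args) {
        // IDs taken from the restart logs above; offsets other than 8471 are assumptions.
        MasterState master = new MasterState(
                "2655da29ca3bd66b6b355bea39ee70bbf3fec2f7",
                "4e188d217af6bd6171cdd17616a7ea6e6d7cab7d",
                8471, 8471, 0);
        // Replica that kept its dump.rdb asks to continue from its old ID and offset.
        System.out.println(canPartialResync(master, "4e188d217af6bd6171cdd17616a7ea6e6d7cab7d", 8471));
        // Replica with no cached master has no ID to offer (it sends "?" and -1).
        System.out.println(canPartialResync(master, "?", -1));
    }
}
```

This is why deleting a replica's volume forces a full resync: without the saved replication ID and offset, the replica has nothing the master can match against.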

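3. Read/Write Verification

3.1 Read/write verification on a replica

When logged in with redis-cli, reads and writes succeed on the master node, but writes fail on a replica node.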

$ set "stringKey" "this is a string"
"READONLY You can't write against a read only replica."

To allow writes on a replica, change the replica's replica-read-only setting from yes (the default) to no.

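3.2 Read/write verification in Java

A simple read/write check using the Redis utility provided by Hutool. The utility comes from hutool-db, which is a thin wrapper around Jedis.

The Redis instances are configured in the redis.setting file.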

[master-slave-master]  
host = 127.0.0.1
port = 16379
password = yourpassword

[master-slave-slave-01]
host = 127.0.0.1
port = 16380
password = yourpassword

[master-slave-slave-02]
host = 127.0.0.1
port = 16381
password = yourpassword

The test class:

package cn.z2huo.demo.redis.hutool.masterslave;

import cn.hutool.db.nosql.redis.RedisDS;
import lombok.extern.slf4j.Slf4j;
// Assumed to be the Apache Commons Lang 3 class; the original does not show its imports.
import org.apache.commons.lang3.RandomStringUtils;

@Slf4j
public class MasterSlaveTest {

    /** Writes a random key-value pair to the master and returns the key. */
    public String masterWrite() {
        try (RedisDS redisDS = RedisDS.create("master-slave-master")) {
            String key = RandomStringUtils.randomAlphanumeric(8);
            log.info("key: {}", key);
            redisDS.setStr(key, RandomStringUtils.randomAlphabetic(16));
            return key;
        }
    }

    /** Writes to the master, then reads the same key back from the master. */
    public void masterWriteAndRead() {
        String key = this.masterWrite();

        try (RedisDS redisDS = RedisDS.create("master-slave-master")) {
            String str = redisDS.getStr(key);
            log.info("redis select result is: {}", str);
        }
    }

    /** Attempts a write on replica 01; fails with READONLY while replica-read-only is yes. */
    public void slaveWrite() {
        try (RedisDS redisDS = RedisDS.create("master-slave-slave-01")) {
            String key = RandomStringUtils.randomAlphanumeric(8);
            log.info("key: {}", key);
            redisDS.setStr(key, RandomStringUtils.randomAlphabetic(16));

            String str = redisDS.getStr(key);
            log.info("redis select result is: {}", str);
        }
    }

    /** Reads the given key from replica 02. */
    public void slaveRead(String key) {
        try (RedisDS redisDS = RedisDS.create("master-slave-slave-02")) {
            String str = redisDS.getStr(key);
            log.info("redis select result is: {}", str);
        }
    }

}

Writing to a replica node throws the following exception:

[14:15:37.004][INFO ] cn.z2huo.demo.redis.hutool.masterslave.MasterSlaveTest:39 slaveWrite - key: yD29jcjU

redis.clients.jedis.exceptions.JedisDataException: READONLY You can't write against a read only replica.

at redis.clients.jedis.Protocol.processError(Protocol.java:105)
at redis.clients.jedis.Protocol.process(Protocol.java:162)
at redis.clients.jedis.Protocol.read(Protocol.java:221)
at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:351)
at redis.clients.jedis.Connection.getOne(Connection.java:333)
at redis.clients.jedis.Connection.executeCommand(Connection.java:138)
at redis.clients.jedis.Jedis.set(Jedis.java:4893)
at cn.hutool.db.nosql.redis.RedisDS.setStr(RedisDS.java:170)
at cn.z2huo.demo.redis.hutool.masterslave.MasterSlaveTest.slaveWrite(MasterSlaveTest.java:40)
at cn.z2huo.demo.redis.hutool.masterslave.MasterSlaveTestTest.slaveWrite(MasterSlaveTestTest.java:32)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)

Process finished with exit code -1
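
Since replicas reject writes, application code has to send writes to the master and may spread reads over the replicas. The sketch below shows one way to pick the redis.setting group name for each kind of operation. It is an illustration only: ReadWriteRouter and its methods are hypothetical, the round-robin policy is an assumption, and the returned name would be passed to RedisDS.create(...) as in the test class above.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ReadWriteRouter {

    private final String masterGroup;
    private final List<String> replicaGroups;
    private final AtomicInteger nextReplica = new AtomicInteger();

    ReadWriteRouter(String masterGroup, List<String> replicaGroups) {
        this.masterGroup = masterGroup;
        this.replicaGroups = List.copyOf(replicaGroups);
    }

    /** Writes always go to the master; replicas answer them with READONLY. */
    String groupForWrite() {
        return masterGroup;
    }

    /** Reads rotate over the replicas and fall back to the master if none are configured. */
    String groupForRead() {
        if (replicaGroups.isEmpty()) {
            return masterGroup;
        }
        int index = Math.floorMod(nextReplica.getAndIncrement(), replicaGroups.size());
        return replicaGroups.get(index);
    }

    public static void main(String[] args) {
        ReadWriteRouter router = new ReadWriteRouter(
                "master-slave-master",
                List.of("master-slave-slave-01", "master-slave-slave-02"));
        System.out.println("write -> " + router.groupForWrite());
        System.out.println("read  -> " + router.groupForRead());
        System.out.println("read  -> " + router.groupForRead());
    }
}
```

Because replication is asynchronous, a read routed to a replica immediately after a write may not see the new value yet; reads that must observe the latest write can be sent to groupForWrite() instead.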

IV. Sentinel Mode

1. Configuration

1.1 Compose file

name: redis-sentinel-demo

volumes:
  redis-master-volume:
    name: redis-sentinel-demo-master-volume
  redis-slave-volume-1:
    name: redis-sentinel-demo-slave-1-volume
  redis-slave-volume-2:
    name: redis-sentinel-demo-slave-2-volume
  redis-sentinel-volume-1:
    name: redis-sentinel-demo-sentinel-1-volume
  redis-sentinel-volume-2:
    name: redis-sentinel-demo-sentinel-2-volume
  redis-sentinel-volume-3:
    name: redis-sentinel-demo-sentinel-3-volume

networks:
  redis-sentinel-net:
    name: redis-sentinel-net
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 172.21.0.0/24
          ip_range: 172.21.0.0/24
          gateway: 172.21.0.1

services:
  redis-master:
    image: ${image}
    restart: ${restart}
    container_name: redis-sentinel-master
    hostname: redis-master
    networks:
      redis-sentinel-net:
        ipv4_address: 172.21.0.2
    ports:
      - 26379:6379
    volumes:
      - redis-master-volume:/data
      - ./redis-master.conf:/usr/local/etc/redis/redis.conf
    command: ["sh", "-c", "usermod -aG root redis && redis-server /usr/local/etc/redis/redis.conf"]

  redis-slave-01:
    image: ${image}
    restart: ${restart}
    container_name: redis-sentinel-slave-01
    hostname: redis-slave-01
    networks:
      redis-sentinel-net:
        ipv4_address: 172.21.0.3
    ports:
      - 26380:6379
    volumes:
      - redis-slave-volume-1:/data
      - ./redis-slave.conf:/usr/local/etc/redis/redis.conf
    command: ["sh", "-c", "usermod -aG root redis && redis-server /usr/local/etc/redis/redis.conf"]

  redis-slave-02:
    image: ${image}
    restart: ${restart}
    container_name: redis-sentinel-slave-02
    hostname: redis-slave-02
    networks:
      redis-sentinel-net:
        ipv4_address: 172.21.0.4
    ports:
      - 26381:6379
    volumes:
      - redis-slave-volume-2:/data
      - ./redis-slave.conf:/usr/local/etc/redis/redis.conf
    command: ["sh", "-c", "usermod -aG root redis && redis-server /usr/local/etc/redis/redis.conf"]

  redis-sentinel-01:
    image: ${image}
    restart: ${restart}
    container_name: redis-sentinel-sentinel-01
    hostname: redis-sentinel-01
    networks:
      redis-sentinel-net:
        ipv4_address: 172.21.0.201
    ports:
      - 26001:26379
    depends_on:
      - redis-master
      - redis-slave-01
      - redis-slave-02
    volumes:
      # Each sentinel gets its own data volume.
      - redis-sentinel-volume-1:/data
      - ./redis-sentinel.conf:/usr/local/etc/redis/redis-sentinel.conf
    command: ["sh", "-c", "usermod -aG root redis && redis-server /usr/local/etc/redis/redis-sentinel.conf --sentinel"]

  redis-sentinel-02:
    image: ${image}
    restart: ${restart}
    container_name: redis-sentinel-sentinel-02
    hostname: redis-sentinel-02
    networks:
      redis-sentinel-net:
        ipv4_address: 172.21.0.202
    ports:
      - 26002:26379
    depends_on:
      - redis-master
      - redis-slave-01
      - redis-slave-02
    volumes:
      - redis-sentinel-volume-2:/data
      - ./redis-sentinel.conf:/usr/local/etc/redis/redis-sentinel.conf
    command: ["sh", "-c", "usermod -aG root redis && redis-sentinel /usr/local/etc/redis/redis-sentinel.conf"]

  redis-sentinel-03:
    image: ${image}
    restart: ${restart}
    container_name: redis-sentinel-sentinel-03
    hostname: redis-sentinel-03
    networks:
      redis-sentinel-net:
        ipv4_address: 172.21.0.203
    ports:
      - 26003:26379
    depends_on:
      - redis-master
      - redis-slave-01
      - redis-slave-02
    volumes:
      - redis-sentinel-volume-3:/data
      - ./redis-sentinel.conf:/usr/local/etc/redis/redis-sentinel.conf
    command: ["sh", "-c", "usermod -aG root redis && redis-sentinel /usr/local/etc/redis/redis-sentinel.conf"]

Points to note in the Compose file:

  • Custom container IPs. When the containers are not given custom IPs and the Redis configuration refers to the master and replica nodes by the values of their hostname attributes, electing a new master after the master goes down does not work. This was observed only on Windows; it works on macOS. On Windows, Docker Desktop was used; Docker inside WSL was not tested.
    networks:
      redis-sentinel-net:
        name: redis-sentinel-net
        driver: bridge
        ipam:
          driver: default
          config:
            - subnet: 172.21.0.0/24
              ip_range: 172.21.0.0/24
              gateway: 172.21.0.1

    services:
      redis-master:
        networks:
          redis-sentinel-net:
            ipv4_address: 172.21.0.2
  • The command attribute of each service has to run more than one command, so sh -c is used
    • A sentinel can be started with either redis-sentinel or redis-server --sentinel

1.2 .env file

image=redis:7.4-bookworm
restart=no

1.3 Redis configuration

1.3.1 Master node configuration file

# bind 127.0.0.1 -::1
bind 0.0.0.0

# requirepass foobared
requirepass yourpassword

save 3600 1 300 100 60 10000

1.3.2 Replica node configuration file

# bind 127.0.0.1 -::1
bind 0.0.0.0

# requirepass foobared
requirepass yourpassword

save 3600 1 300 100 60 10000

# replicaof redis-master 6379
replicaof 172.21.0.2 6379

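# masterauth must match the master's requirepass value
masterauth yourpassword

The replica can refer to the master either by the hostname defined on the container or by the custom container IP. Both start successfully and replicate. On Windows, however, when a master failure is simulated, a replica cannot be elected master if hostnames are used, whereas it can with custom IPs. The cause is not yet known.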

1.3.3 Sentinel node configuration file

sentinel monitor redis-master 172.21.0.2 6379 2

sentinel auth-pass redis-master yourpassword

sentinel down-after-milliseconds redis-master 3000

requirepass yourpassword

sentinel sentinel-pass yourpassword

sentinel parallel-syncs redis-master 1

sentinel failover-timeout redis-master 180000

SENTINEL resolve-hostnames yes

SENTINEL announce-hostnames yes

SENTINEL master-reboot-down-after-period redis-master 0
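
The last argument of sentinel monitor, 2 here, is the quorum: the number of sentinels that must agree the master is unreachable before it is marked objectively down. Starting the failover additionally needs authorization from a majority of all sentinels. The sketch below works through that arithmetic for the three-sentinel setup in this section. It is a simplified model for illustration; the class SentinelQuorum and its method names are hypothetical.

```java
public class SentinelQuorum {

    /** Smallest number of sentinels that forms a majority of the total. */
    static int majority(int totalSentinels) {
        return totalSentinels / 2 + 1;
    }

    /** The master is marked objectively down once `quorum` sentinels agree it is unreachable. */
    static boolean canMarkObjectivelyDown(int agreeingSentinels, int quorum) {
        return agreeingSentinels >= quorum;
    }

    /** A failover needs the quorum to detect the failure and a majority to authorize it. */
    static boolean canFailover(int reachableSentinels, int totalSentinels, int quorum) {
        int needed = Math.max(quorum, majority(totalSentinels));
        return reachableSentinels >= needed;
    }

    public static void main(String[] args) {
        int total = 3;  // redis-sentinel-01, -02, -03
        int quorum = 2; // from: sentinel monitor redis-master 172.21.0.2 6379 2
        System.out.println("majority of " + total + " = " + majority(total));
        System.out.println("failover with 2 reachable: " + canFailover(2, total, quorum));
        System.out.println("failover with 1 reachable: " + canFailover(1, total, quorum));
    }
}
```

With three sentinels and a quorum of 2, the setup tolerates the loss of one sentinel; with only one sentinel left, the master can be neither marked objectively down nor replaced.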

2. Startup

Master node startup log:

2024-10-14 17:58:01 16:C 14 Oct 2024 09:58:01.329 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2024-10-14 17:58:01 16:C 14 Oct 2024 09:58:01.329 * Redis version=7.4.1, bits=64, commit=00000000, modified=0, pid=16, just started
2024-10-14 17:58:01 16:C 14 Oct 2024 09:58:01.329 * Configuration loaded
2024-10-14 17:58:01 16:M 14 Oct 2024 09:58:01.329 * monotonic clock: POSIX clock_gettime
2024-10-14 17:58:01 16:M 14 Oct 2024 09:58:01.330 * Running mode=standalone, port=6379.
2024-10-14 17:58:01 16:M 14 Oct 2024 09:58:01.330 * Server initialized
2024-10-14 17:58:01 16:M 14 Oct 2024 09:58:01.330 * Loading RDB produced by version 7.4.1
2024-10-14 17:58:01 16:M 14 Oct 2024 09:58:01.330 * RDB age 197453 seconds
2024-10-14 17:58:01 16:M 14 Oct 2024 09:58:01.330 * RDB memory usage when created 1.06 Mb
2024-10-14 17:58:01 16:M 14 Oct 2024 09:58:01.330 * Done loading RDB, keys loaded: 0, keys expired: 0.
2024-10-14 17:58:01 16:M 14 Oct 2024 09:58:01.330 * DB loaded from disk: 0.000 seconds
2024-10-14 17:58:01 16:M 14 Oct 2024 09:58:01.330 * Ready to accept connections tcp
2024-10-14 17:58:02 16:M 14 Oct 2024 09:58:02.094 * Replica 172.21.0.4:6379 asks for synchronization
2024-10-14 17:58:02 16:M 14 Oct 2024 09:58:02.094 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c9c7d9bec6f93681d73c36dacfd810430f7d3954', my replication IDs are 'a4c4b7ad56f8c6c962bb4ce20b62f5ccc133df22' and 'bce9594562a9218c76945ebe3ba6f62c4d8defdc')
2024-10-14 17:58:02 16:M 14 Oct 2024 09:58:02.094 * Delay next BGSAVE for diskless SYNC
2024-10-14 17:58:02 16:M 14 Oct 2024 09:58:02.222 * Replica 172.21.0.3:6379 asks for synchronization
2024-10-14 17:58:02 16:M 14 Oct 2024 09:58:02.222 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c9c7d9bec6f93681d73c36dacfd810430f7d3954', my replication IDs are 'a4c4b7ad56f8c6c962bb4ce20b62f5ccc133df22' and 'bce9594562a9218c76945ebe3ba6f62c4d8defdc')
2024-10-14 17:58:02 16:M 14 Oct 2024 09:58:02.222 * Delay next BGSAVE for diskless SYNC
2024-10-14 17:58:07 16:M 14 Oct 2024 09:58:07.357 * Starting BGSAVE for SYNC with target: replicas sockets
2024-10-14 17:58:07 16:M 14 Oct 2024 09:58:07.357 * Background RDB transfer started by pid 22
2024-10-14 17:58:07 22:C 14 Oct 2024 09:58:07.358 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
2024-10-14 17:58:07 16:M 14 Oct 2024 09:58:07.358 * Diskless rdb transfer, done reading from pipe, 2 replicas still up.
2024-10-14 17:58:07 16:M 14 Oct 2024 09:58:07.372 * Background RDB transfer terminated with success
2024-10-14 17:58:07 16:M 14 Oct 2024 09:58:07.372 * Streamed RDB transfer with replica 172.21.0.4:6379 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
2024-10-14 17:58:07 16:M 14 Oct 2024 09:58:07.372 * Synchronization with replica 172.21.0.4:6379 succeeded
2024-10-14 17:58:07 16:M 14 Oct 2024 09:58:07.372 * Streamed RDB transfer with replica 172.21.0.3:6379 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
2024-10-14 17:58:07 16:M 14 Oct 2024 09:58:07.372 * Synchronization with replica 172.21.0.3:6379 succeeded

Replica node startup log:

2024-10-14 17:58:01 16:C 14 Oct 2024 09:58:01.215 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2024-10-14 17:58:01 16:C 14 Oct 2024 09:58:01.215 * Redis version=7.4.1, bits=64, commit=00000000, modified=0, pid=16, just started
2024-10-14 17:58:01 16:C 14 Oct 2024 09:58:01.215 * Configuration loaded
2024-10-14 17:58:01 16:S 14 Oct 2024 09:58:01.216 * monotonic clock: POSIX clock_gettime
2024-10-14 17:58:01 16:S 14 Oct 2024 09:58:01.216 * Running mode=standalone, port=6379.
2024-10-14 17:58:01 16:S 14 Oct 2024 09:58:01.216 * Server initialized
2024-10-14 17:58:01 16:S 14 Oct 2024 09:58:01.216 * Loading RDB produced by version 7.4.1
2024-10-14 17:58:01 16:S 14 Oct 2024 09:58:01.216 * RDB age 801 seconds
2024-10-14 17:58:01 16:S 14 Oct 2024 09:58:01.216 * RDB memory usage when created 1.25 Mb
2024-10-14 17:58:01 16:S 14 Oct 2024 09:58:01.216 * Done loading RDB, keys loaded: 0, keys expired: 0.
2024-10-14 17:58:01 16:S 14 Oct 2024 09:58:01.216 * DB loaded from disk: 0.000 seconds
2024-10-14 17:58:01 16:S 14 Oct 2024 09:58:01.216 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
2024-10-14 17:58:01 16:S 14 Oct 2024 09:58:01.216 * Ready to accept connections tcp
2024-10-14 17:58:01 16:S 14 Oct 2024 09:58:01.218 * Connecting to MASTER 172.21.0.2:6379
2024-10-14 17:58:01 16:S 14 Oct 2024 09:58:01.218 * MASTER <-> REPLICA sync started
2024-10-14 17:58:01 16:S 14 Oct 2024 09:58:01.218 # Error condition on socket for SYNC: Connection refused
2024-10-14 17:58:02 16:S 14 Oct 2024 09:58:02.221 * Connecting to MASTER 172.21.0.2:6379
2024-10-14 17:58:02 16:S 14 Oct 2024 09:58:02.221 * MASTER <-> REPLICA sync started
2024-10-14 17:58:02 16:S 14 Oct 2024 09:58:02.221 * Non blocking connect for SYNC fired the event.
2024-10-14 17:58:02 16:S 14 Oct 2024 09:58:02.221 * Master replied to PING, replication can continue...
2024-10-14 17:58:02 16:S 14 Oct 2024 09:58:02.221 * Trying a partial resynchronization (request c9c7d9bec6f93681d73c36dacfd810430f7d3954:1012872).
2024-10-14 17:58:07 16:S 14 Oct 2024 09:58:07.357 * Full resync from master: a4c4b7ad56f8c6c962bb4ce20b62f5ccc133df22:1012871
2024-10-14 17:58:07 16:S 14 Oct 2024 09:58:07.358 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
2024-10-14 17:58:07 16:S 14 Oct 2024 09:58:07.358 * Discarding previously cached master state.
2024-10-14 17:58:07 16:S 14 Oct 2024 09:58:07.358 * MASTER <-> REPLICA sync: Flushing old data
2024-10-14 17:58:07 16:S 14 Oct 2024 09:58:07.358 * MASTER <-> REPLICA sync: Loading DB in memory
2024-10-14 17:58:07 16:S 14 Oct 2024 09:58:07.377 * Loading RDB produced by version 7.4.1
2024-10-14 17:58:07 16:S 14 Oct 2024 09:58:07.377 * RDB age 0 seconds
2024-10-14 17:58:07 16:S 14 Oct 2024 09:58:07.377 * RDB memory usage when created 1.25 Mb
2024-10-14 17:58:07 16:S 14 Oct 2024 09:58:07.377 * Done loading RDB, keys loaded: 0, keys expired: 0.
2024-10-14 17:58:07 16:S 14 Oct 2024 09:58:07.377 * MASTER <-> REPLICA sync: Finished with success

The Sentinel node's startup log is as follows:

2024-10-14 17:58:02 16:X 14 Oct 2024 09:58:02.017 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2024-10-14 17:58:02 16:X 14 Oct 2024 09:58:02.017 * Redis version=7.4.1, bits=64, commit=00000000, modified=0, pid=16, just started
2024-10-14 17:58:02 16:X 14 Oct 2024 09:58:02.017 * Configuration loaded
2024-10-14 17:58:02 16:X 14 Oct 2024 09:58:02.017 * monotonic clock: POSIX clock_gettime
2024-10-14 17:58:02 16:X 14 Oct 2024 09:58:02.017 * Running mode=sentinel, port=26379.
2024-10-14 17:58:02 16:X 14 Oct 2024 09:58:02.025 # Could not rename tmp config file (Device or resource busy)
2024-10-14 17:58:02 16:X 14 Oct 2024 09:58:02.025 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
2024-10-14 17:58:02 16:X 14 Oct 2024 09:58:02.025 * Sentinel ID is 82925708c40c814dadc6ead02a5aa8408526b5d3
2024-10-14 17:58:02 16:X 14 Oct 2024 09:58:02.025 # +monitor master redis-master 172.21.0.2 6379 quorum 2
2024-10-14 17:58:03 16:X 14 Oct 2024 09:58:03.827 * +sentinel sentinel 3ac66162d1056bbf9ac9a6bc54060ae7a141eb74 172.21.0.203 26379 @ redis-master 172.21.0.2 6379
2024-10-14 17:58:03 16:X 14 Oct 2024 09:58:03.839 # Could not rename tmp config file (Device or resource busy)
2024-10-14 17:58:03 16:X 14 Oct 2024 09:58:03.839 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
2024-10-14 17:58:03 16:X 14 Oct 2024 09:58:03.935 * +sentinel sentinel e75619eb9879f9f8c3c3188691eca1d81a0c472f 172.21.0.202 26379 @ redis-master 172.21.0.2 6379
2024-10-14 17:58:03 16:X 14 Oct 2024 09:58:03.947 # Could not rename tmp config file (Device or resource busy)
2024-10-14 17:58:03 16:X 14 Oct 2024 09:58:03.947 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
2024-10-14 17:58:12 16:X 14 Oct 2024 09:58:12.054 * +slave slave 172.21.0.4:6379 172.21.0.4 6379 @ redis-master 172.21.0.2 6379
2024-10-14 17:58:12 16:X 14 Oct 2024 09:58:12.063 # Could not rename tmp config file (Device or resource busy)
2024-10-14 17:58:12 16:X 14 Oct 2024 09:58:12.063 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
2024-10-14 17:58:12 16:X 14 Oct 2024 09:58:12.063 * +slave slave 172.21.0.3:6379 172.21.0.3 6379 @ redis-master 172.21.0.2 6379
2024-10-14 17:58:12 16:X 14 Oct 2024 09:58:12.071 # Could not rename tmp config file (Device or resource busy)
2024-10-14 17:58:12 16:X 14 Oct 2024 09:58:12.071 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy

3. Verifying master re-election

$ docker exec -it d34a6e272cafddc95ea6fb7a0c9e01c247b21eefbb347bb1a45bfdd6fb148ecc /bin/bash

$ redis-cli -h 127.0.0.1 -p 26379 -a yourpassword

$ info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=redis-master,status=ok,address=172.21.0.2:6379,slaves=2,sentinels=3

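The output above is the Sentinel node's information after all containers have started normally. It shows the master's current status and IP address, along with the number of masters, replicas, and Sentinels.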
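To check these values from a script rather than by eye, the master0 line can be split into fields. The following is a minimal Python sketch, assuming the line has the same shape as in the output above; the function name parse_master_line is made up for illustration and is not part of any Redis client library.

```python
def parse_master_line(line: str) -> dict:
    """Parse one `masterN:` line from `INFO sentinel` into a dict."""
    # "master0:name=...,status=..." -> label "master0" and the key=value payload
    label, _, payload = line.strip().partition(":")
    fields = dict(item.split("=", 1) for item in payload.split(","))
    # The address field is "ip:port"; split on the last colon
    host, _, port = fields["address"].rpartition(":")
    return {
        "label": label,
        "name": fields["name"],
        "status": fields["status"],
        "host": host,
        "port": int(port),
        "replicas": int(fields["slaves"]),
        "sentinels": int(fields["sentinels"]),
    }


if __name__ == "__main__":
    sample = "master0:name=redis-master,status=ok,address=172.21.0.2:6379,slaves=2,sentinels=3"
    print(parse_master_line(sample))
```

Comparing the host value before and after stopping the master container is enough to tell whether a failover has happened.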

Next, simulate a master failure by stopping the master container, then query the Sentinel node again:

$ info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=redis-master,status=ok,address=172.21.0.3:6379,slaves=2,sentinels=3

The master has been switched from 172.21.0.2 to 172.21.0.3.

Between the original master going down and the switch completing, the Sentinel node logged the following:

2024-10-14 18:00:01 16:X 14 Oct 2024 10:00:01.158 # Could not rename tmp config file (Device or resource busy)
2024-10-14 18:00:01 16:X 14 Oct 2024 10:00:01.158 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
2024-10-14 18:00:01 16:X 14 Oct 2024 10:00:01.158 # +new-epoch 1
2024-10-14 18:00:01 16:X 14 Oct 2024 10:00:01.166 # Could not rename tmp config file (Device or resource busy)
2024-10-14 18:00:01 16:X 14 Oct 2024 10:00:01.166 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
2024-10-14 18:00:01 16:X 14 Oct 2024 10:00:01.166 # +vote-for-leader e75619eb9879f9f8c3c3188691eca1d81a0c472f 1
2024-10-14 18:00:02 16:X 14 Oct 2024 10:00:02.019 # +sdown master redis-master 172.21.0.2 6379
2024-10-14 18:00:02 16:X 14 Oct 2024 10:00:02.102 # +odown master redis-master 172.21.0.2 6379 #quorum 3/2
2024-10-14 18:00:02 16:X 14 Oct 2024 10:00:02.102 * Next failover delay: I will not start a failover before Mon Oct 14 10:06:01 2024
2024-10-14 18:00:02 16:X 14 Oct 2024 10:00:02.210 # +config-update-from sentinel e75619eb9879f9f8c3c3188691eca1d81a0c472f 172.21.0.202 26379 @ redis-master 172.21.0.2 6379
2024-10-14 18:00:02 16:X 14 Oct 2024 10:00:02.210 # +switch-master redis-master 172.21.0.2 6379 172.21.0.3 6379
2024-10-14 18:00:02 16:X 14 Oct 2024 10:00:02.210 * +slave slave 172.21.0.4:6379 172.21.0.4 6379 @ redis-master 172.21.0.3 6379
2024-10-14 18:00:02 16:X 14 Oct 2024 10:00:02.210 * +slave slave 172.21.0.2:6379 172.21.0.2 6379 @ redis-master 172.21.0.3 6379
2024-10-14 18:00:02 16:X 14 Oct 2024 10:00:02.219 # Could not rename tmp config file (Device or resource busy)
2024-10-14 18:00:02 16:X 14 Oct 2024 10:00:02.219 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Device or resource busy
2024-10-14 18:00:05 16:X 14 Oct 2024 10:00:05.289 # +sdown slave 172.21.0.2:6379 172.21.0.2 6379 @ redis-master 172.21.0.3 6379
2024-10-14 18:35:25 16:X 14 Oct 2024 10:35:25.222 # +tilt #tilt mode entered
2024-10-14 18:35:55 16:X 14 Oct 2024 10:35:55.296 # -tilt #tilt mode exited

The master's status is one of ok, sdown (subjectively down), or odown (objectively down).
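The failover steps are easier to follow once the event lines (+sdown, +odown, +switch-master and so on) are separated from the repeated configuration warnings. The Python sketch below does that for log text in the format shown above. The regular expression is an assumption based only on these sample lines, and both function names are made up for illustration.

```python
import re

# Assumed shape of an event line: "<timestamps> # +event details" or "... * +event details"
EVENT_RE = re.compile(r"[#*] ([+-][a-z-]+) ?(.*)$")


def sentinel_events(log_text: str) -> list[tuple[str, str]]:
    """Return (event, details) pairs, skipping lines that are not Sentinel events."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            events.append((match.group(1), match.group(2)))
    return events


def switch_master(details: str) -> dict:
    """Split the details of a +switch-master event into old and new address."""
    name, old_ip, old_port, new_ip, new_port = details.split()
    return {"master": name, "old": f"{old_ip}:{old_port}", "new": f"{new_ip}:{new_port}"}


if __name__ == "__main__":
    log = (
        "2024-10-14 18:00:02 16:X 14 Oct 2024 10:00:02.019 # +sdown master redis-master 172.21.0.2 6379\n"
        "2024-10-14 18:00:02 16:X 14 Oct 2024 10:00:02.210 # +switch-master redis-master 172.21.0.2 6379 172.21.0.3 6379\n"
        "2024-10-14 18:00:02 16:X 14 Oct 2024 10:00:02.219 # Could not rename tmp config file (Device or resource busy)\n"
    )
    for event, details in sentinel_events(log):
        print(event, details)
```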

V. Cluster deployment

In cluster mode, 9 Redis instances are started: 3 master nodes and 6 replica nodes, with two replicas per master.

1. Configuration

1.1 Compose file

name: redis-cluster-demo

volumes:
  redis-01-volume:
    name: redis-cluster-demo-01-volume
  redis-02-volume:
    name: redis-cluster-demo-02-volume
  redis-03-volume:
    name: redis-cluster-demo-03-volume
  redis-04-volume:
    name: redis-cluster-demo-04-volume
  redis-05-volume:
    name: redis-cluster-demo-05-volume
  redis-06-volume:
    name: redis-cluster-demo-06-volume
  redis-07-volume:
    name: redis-cluster-demo-07-volume
  redis-08-volume:
    name: redis-cluster-demo-08-volume
  redis-09-volume:
    name: redis-cluster-demo-09-volume

networks:
  redis-cluster-net:
    name: redis-cluster-net
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 172.92.0.0/24
          ip_range: 172.92.0.0/24
          gateway: 172.92.0.1

services:
  redis-01:
    image: ${image}
    restart: ${restart}
    container_name: redis-cluster-node-01
    hostname: redis-01
    networks:
      redis-cluster-net:
        ipv4_address: 172.92.0.2
    ports:
      - 6320:6379
      - 6330:16379
    volumes:
      - redis-01-volume:/data
      - ./redis-node-1.conf:/usr/local/etc/redis/redis.conf
    command: ["sh", "-c", "usermod -aG root redis && redis-server /usr/local/etc/redis/redis.conf"]

  redis-02:
    image: ${image}
    restart: ${restart}
    container_name: redis-cluster-node-02
    hostname: redis-02
    networks:
      redis-cluster-net:
        ipv4_address: 172.92.0.3
    ports:
      - 6321:6379
      - 6331:16379
    volumes:
      - redis-02-volume:/data
      - ./redis-node-2.conf:/usr/local/etc/redis/redis.conf
    command: ["sh", "-c", "usermod -aG root redis && redis-server /usr/local/etc/redis/redis.conf"]

  # ......
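  • Only two nodes are written out above. The other seven follow the same pattern; only the file names and ports change.
  • The network uses bridge mode. The Compose file defines a subnet, and the containers communicate with the host machine through network address translation.
  • Each node has its own Redis configuration file, because some settings must be specified separately for every node.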
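Writing the remaining seven service entries by hand is error-prone, so they can be generated instead. The Python sketch below renders one entry per node. The port pattern (6319 + n for the client port, 6329 + n for the bus port) matches nodes 1 and 2 above and the addresses used later when the cluster is created. The static IPs for nodes 3 to 9 (172.92.0.4 to 172.92.0.10) are an assumption that simply continues the sequence, and service_block is a made-up helper name.

```python
def service_block(n: int) -> str:
    """Render the Compose service entry for cluster node n (1 to 9)."""
    lines = [
        f"  redis-{n:02d}:",
        "    image: ${image}",
        "    restart: ${restart}",
        f"    container_name: redis-cluster-node-{n:02d}",
        f"    hostname: redis-{n:02d}",
        "    networks:",
        "      redis-cluster-net:",
        f"        ipv4_address: 172.92.0.{n + 1}",  # assumed for nodes 3 to 9
        "    ports:",
        f"      - {6319 + n}:6379",    # client port published on the host
        f"      - {6329 + n}:16379",   # cluster bus port published on the host
        "    volumes:",
        f"      - redis-{n:02d}-volume:/data",
        f"      - ./redis-node-{n}.conf:/usr/local/etc/redis/redis.conf",
        '    command: ["sh", "-c", "usermod -aG root redis && redis-server /usr/local/etc/redis/redis.conf"]',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the entries that the Compose file above leaves out
    print("\n\n".join(service_block(n) for n in range(3, 10)))
```

The printed text is meant to be pasted under the services key; it is not a complete Compose file on its own.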

1.2 .env file

image=z2huo/redis-dev:7.4.1-bookworm
restart=no

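A custom image is used. It adds some tools for inspecting the network and processes.

1.3 Redis configuration

In the Redis configuration file, the cluster-related setting that must be changed is: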

cluster-enabled yes

This setting enables cluster mode. Because the deployment runs in Docker on a bridge network, the following three settings are also required:

cluster-announce-ip <ip>
cluster-announce-port <port>
cluster-announce-bus-port <bus-port>
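  • cluster-announce-ip: the public IP address of the cluster node. Because the deployment runs in Docker, set it to the IP of the host machine that runs the containers.
  • cluster-announce-port: the client port that the node announces (non-TLS in this setup). Set it to the host port that Docker publishes for the container's port 6379.
  • cluster-announce-bus-port: the cluster message bus port that the node announces. Set it to the host port that Docker publishes for the container's cluster bus port, which is the value of cluster-port (16379 here).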

For Redis instance nodes 1 and 2, the three settings take the following values:

# Node 1
cluster-announce-ip 192.168.0.103
cluster-announce-port 6320
cluster-announce-bus-port 6330

# Node 2
cluster-announce-ip 192.168.0.103
cluster-announce-port 6321
cluster-announce-bus-port 6331

192.168.0.103 above is the host machine's IP address on its local network.
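Since these values differ per node and depend on the host IP, they can be regenerated whenever the IP changes. A minimal Python sketch follows, assuming the port pattern shown for nodes 1 and 2 continues through node 9; the helper names announce_conf and create_addresses are made up for illustration.

```python
HOST_IP = "192.168.0.103"  # host LAN IP from the text; update it when the network changes
NODE_COUNT = 9


def announce_conf(host_ip: str, n: int) -> str:
    """Return the three cluster-announce-* lines for node n (1-based)."""
    return "\n".join([
        f"cluster-announce-ip {host_ip}",
        f"cluster-announce-port {6319 + n}",      # host port mapped to 6379
        f"cluster-announce-bus-port {6329 + n}",  # host port mapped to 16379
    ])


def create_addresses(host_ip: str, node_count: int = NODE_COUNT) -> list[str]:
    """Return the ip:port list to pass to `redis-cli --cluster create`."""
    return [f"{host_ip}:{6319 + n}" for n in range(1, node_count + 1)]


if __name__ == "__main__":
    for n in range(1, NODE_COUNT + 1):
        print(f"# Node {n}")
        print(announce_conf(HOST_IP, n))
        print()
    print(" ".join(create_addresses(HOST_IP)))
```

Generating the address list from the same numbers keeps the announce settings and the addresses used to create the cluster consistent with each other.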

2. Startup and verification

After running docker compose up -d, the container prints the following log:

2024-10-18 01:11:24 16:C 17 Oct 2024 17:11:24.542 # WARNING: Changing databases number from 16 to 1 since we are in cluster mode
2024-10-18 01:11:24 16:C 17 Oct 2024 17:11:24.543 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2024-10-18 01:11:24 16:C 17 Oct 2024 17:11:24.543 * Redis version=7.4.1, bits=64, commit=00000000, modified=0, pid=16, just started
2024-10-18 01:11:24 16:C 17 Oct 2024 17:11:24.543 * Configuration loaded
2024-10-18 01:11:24 16:M 17 Oct 2024 17:11:24.543 * monotonic clock: POSIX clock_gettime
2024-10-18 01:11:24 16:M 17 Oct 2024 17:11:24.544 * Running mode=cluster, port=6379.
2024-10-18 01:11:24 16:M 17 Oct 2024 17:11:24.544 * No cluster configuration found, I'm 616458ae2b23db9fb5070da0bb4bfab841800f31
2024-10-18 01:11:24 16:M 17 Oct 2024 17:11:24.547 * Server initialized
2024-10-18 01:11:24 16:M 17 Oct 2024 17:11:24.547 * Ready to accept connections tcp

Inside the container, list the processes. The redis-server process is tagged with [cluster]:

$ ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 17:11 ? 00:00:00 sh -c usermod -aG root redis && redis-server /usr/local/etc/redi
root 16 1 0 17:11 ? 00:00:00 redis-server 0.0.0.0:6379 [cluster]
root 23 0 0 17:11 pts/0 00:00:00 /bin/bash
root 29 23 0 17:11 pts/0 00:00:00 ps -ef

Check the node file /data/nodes-6379.conf:

$ cat nodes-6379.conf
616458ae2b23db9fb5070da0bb4bfab841800f31 :0@0,,tls-port=0,shard-id=47b02087ea475a99443be252282038f6298ca388 myself,master - 0 0 0 connected
vars currentEpoch 0 lastVoteEpoch 0

Run redis-cli to create the cluster:

$ redis-cli -h 127.0.0.1 -p 6379 -a z2huo@2024 \
--cluster create 192.168.0.103:6320 192.168.0.103:6321 192.168.0.103:6322 192.168.0.103:6323 192.168.0.103:6324 192.168.0.103:6325 192.168.0.103:6326 192.168.0.103:6327 192.168.0.103:6328 \
--cluster-replicas 2
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing hash slots allocation on 9 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 192.168.0.103:6324 to 192.168.0.103:6320
Adding replica 192.168.0.103:6325 to 192.168.0.103:6320
Adding replica 192.168.0.103:6326 to 192.168.0.103:6321
Adding replica 192.168.0.103:6327 to 192.168.0.103:6321
Adding replica 192.168.0.103:6328 to 192.168.0.103:6322
Adding replica 192.168.0.103:6323 to 192.168.0.103:6322
>>> Trying to optimize slaves allocation for anti-affinity
[WARNING] Some slaves are in the same host as their master
M: 616458ae2b23db9fb5070da0bb4bfab841800f31 192.168.0.103:6320
slots:[0-5460] (5461 slots) master
M: ba294fe2041d000b5663b20ab0316da96ae403bf 192.168.0.103:6321
slots:[5461-10922] (5462 slots) master
M: 8a95921f1a3c344b49176b1070b1084f22f9fe6f 192.168.0.103:6322
slots:[10923-16383] (5461 slots) master
S: 5dab82fb367c00a2f338bbc96f35ac9ecccc401a 192.168.0.103:6323
replicates 616458ae2b23db9fb5070da0bb4bfab841800f31
S: 86fcf35973686e9b0d73b84da5c06a4d356a6aca 192.168.0.103:6324
replicates 8a95921f1a3c344b49176b1070b1084f22f9fe6f
S: ad79649ca4cc9222cee027be59812477f6e736ab 192.168.0.103:6325
replicates ba294fe2041d000b5663b20ab0316da96ae403bf
S: 9c15644cc327af377abb6db6f0cd6595f322ef9f 192.168.0.103:6326
replicates ba294fe2041d000b5663b20ab0316da96ae403bf
S: ee5c7f165b1bc578371424724e0bfa35815aaa3c 192.168.0.103:6327
replicates 8a95921f1a3c344b49176b1070b1084f22f9fe6f
S: 7203cc4f0874ecbb81025c30a0c9336b4f10df8d 192.168.0.103:6328
replicates 616458ae2b23db9fb5070da0bb4bfab841800f31
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join

>>> Performing Cluster Check (using node 192.168.0.103:6320)
M: 616458ae2b23db9fb5070da0bb4bfab841800f31 192.168.0.103:6320
slots:[0-5460] (5461 slots) master
2 additional replica(s)
S: 7203cc4f0874ecbb81025c30a0c9336b4f10df8d 192.168.0.103:6328
slots: (0 slots) slave
replicates 616458ae2b23db9fb5070da0bb4bfab841800f31
M: 8a95921f1a3c344b49176b1070b1084f22f9fe6f 192.168.0.103:6322
slots:[10923-16383] (5461 slots) master
2 additional replica(s)
S: ee5c7f165b1bc578371424724e0bfa35815aaa3c 192.168.0.103:6327
slots: (0 slots) slave
replicates 8a95921f1a3c344b49176b1070b1084f22f9fe6f
S: 5dab82fb367c00a2f338bbc96f35ac9ecccc401a 192.168.0.103:6323
slots: (0 slots) slave
replicates 616458ae2b23db9fb5070da0bb4bfab841800f31
M: ba294fe2041d000b5663b20ab0316da96ae403bf 192.168.0.103:6321
slots:[5461-10922] (5462 slots) master
2 additional replica(s)
S: 86fcf35973686e9b0d73b84da5c06a4d356a6aca 192.168.0.103:6324
slots: (0 slots) slave
replicates 8a95921f1a3c344b49176b1070b1084f22f9fe6f
S: ad79649ca4cc9222cee027be59812477f6e736ab 192.168.0.103:6325
slots: (0 slots) slave
replicates ba294fe2041d000b5663b20ab0316da96ae403bf
S: 9c15644cc327af377abb6db6f0cd6595f322ef9f 192.168.0.103:6326
slots: (0 slots) slave
replicates ba294fe2041d000b5663b20ab0316da96ae403bf
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

The --cluster-replicas option of redis-cli sets the number of replicas for each master.
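The master count and slot ranges in the output above follow from simple arithmetic: 9 nodes with 2 replicas each give 9 / (2 + 1) = 3 masters, and the 16384 slots are divided among them. The Python sketch below reproduces the three ranges printed above. The rounding scheme is one that happens to match this output; it is not copied from the redis-cli source, and the function name slot_plan is made up.

```python
import math

TOTAL_SLOTS = 16384


def slot_plan(node_count: int, replicas_per_master: int) -> list[tuple[int, int]]:
    """Return the (first, last) slot range of each master."""
    masters = node_count // (replicas_per_master + 1)
    if masters < 3:
        raise ValueError("a Redis Cluster needs at least 3 master nodes")
    per_master = TOTAL_SLOTS / masters
    ranges = []
    first = 0
    cursor = 0.0
    for i in range(masters):
        # Round the running boundary to the nearest integer
        last = math.floor(cursor + per_master - 1 + 0.5)
        if i == masters - 1 or last > TOTAL_SLOTS - 1:
            last = TOTAL_SLOTS - 1  # the last master takes whatever remains
        ranges.append((first, last))
        first = last + 1
        cursor += per_master
    return ranges


if __name__ == "__main__":
    for index, (first, last) in enumerate(slot_plan(9, 2)):
        print(f"Master[{index}] -> Slots {first} - {last}")
    # Master[0] -> Slots 0 - 5460
    # Master[1] -> Slots 5461 - 10922
    # Master[2] -> Slots 10923 - 16383
```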

After redis-cli has created the cluster, check /data/nodes-6379.conf again:

$ cat nodes-6379.conf
7203cc4f0874ecbb81025c30a0c9336b4f10df8d 192.168.0.103:6328@6338,,tls-port=0,shard-id=47b02087ea475a99443be252282038f6298ca388 slave 616458ae2b23db9fb5070da0bb4bfab841800f31 0 1729185148000 1 connected
8a95921f1a3c344b49176b1070b1084f22f9fe6f 192.168.0.103:6322@6332,,tls-port=0,shard-id=d6d9be9f6c00e617c575e9fe7653a606e1382db9 master - 0 1729185149180 3 connected 10923-16383
ee5c7f165b1bc578371424724e0bfa35815aaa3c 192.168.0.103:6327@6337,,tls-port=0,shard-id=1f007dd8d56b32bd3bf8a785a9017e025b8d1fac master - 0 1729185149000 0 connected
5dab82fb367c00a2f338bbc96f35ac9ecccc401a 192.168.0.103:6323@6333,,tls-port=0,shard-id=47b02087ea475a99443be252282038f6298ca388 slave 616458ae2b23db9fb5070da0bb4bfab841800f31 0 1729185150000 1 connected
ba294fe2041d000b5663b20ab0316da96ae403bf 192.168.0.103:6321@6331,,tls-port=0,shard-id=def9b383c18f2779a2867ebb2e48a7a2f043411f master - 0 1729185148170 2 connected 5461-10922
86fcf35973686e9b0d73b84da5c06a4d356a6aca 192.168.0.103:6324@6334,,tls-port=0,shard-id=f36d165b2f4e61894e1d431bb96e8b1e738dd97a master - 0 1729185148000 5 connected
616458ae2b23db9fb5070da0bb4bfab841800f31 192.168.0.103:6320@6330,,tls-port=0,shard-id=47b02087ea475a99443be252282038f6298ca388 myself,master - 0 0 1 connected 0-5460
ad79649ca4cc9222cee027be59812477f6e736ab 192.168.0.103:6325@6335,,tls-port=0,shard-id=def9b383c18f2779a2867ebb2e48a7a2f043411f slave ba294fe2041d000b5663b20ab0316da96ae403bf 0 1729185149000 2 connected
9c15644cc327af377abb6db6f0cd6595f322ef9f 192.168.0.103:6326@6336,,tls-port=0,shard-id=def9b383c18f2779a2867ebb2e48a7a2f043411f slave ba294fe2041d000b5663b20ab0316da96ae403bf 0 1729185150186 2 connected
vars currentEpoch 9 lastVoteEpoch 0

Once the cluster is created, connect with redis-cli and run the cluster info and cluster nodes commands to view the cluster state:

$ cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:9
cluster_size:3
cluster_current_epoch:9
cluster_my_epoch:1
cluster_stats_messages_ping_sent:410
cluster_stats_messages_pong_sent:403
cluster_stats_messages_sent:813
cluster_stats_messages_ping_received:395
cluster_stats_messages_pong_received:410
cluster_stats_messages_meet_received:8
cluster_stats_messages_received:813
total_cluster_links_buffer_limit_exceeded:0
$ cluster nodes
7203cc4f0874ecbb81025c30a0c9336b4f10df8d 192.168.0.103:6328@6338 slave 616458ae2b23db9fb5070da0bb4bfab841800f31 0 1729185557000 1 connected
8a95921f1a3c344b49176b1070b1084f22f9fe6f 192.168.0.103:6322@6332 master - 0 1729185556000 3 connected 10923-16383
ee5c7f165b1bc578371424724e0bfa35815aaa3c 192.168.0.103:6327@6337 slave 8a95921f1a3c344b49176b1070b1084f22f9fe6f 0 1729185555665 3 connected
5dab82fb367c00a2f338bbc96f35ac9ecccc401a 192.168.0.103:6323@6333 slave 616458ae2b23db9fb5070da0bb4bfab841800f31 0 1729185558795 1 connected
ba294fe2041d000b5663b20ab0316da96ae403bf 192.168.0.103:6321@6331 master - 0 1729185559819 2 connected 5461-10922
86fcf35973686e9b0d73b84da5c06a4d356a6aca 192.168.0.103:6324@6334 slave 8a95921f1a3c344b49176b1070b1084f22f9fe6f 0 1729185557776 3 connected
616458ae2b23db9fb5070da0bb4bfab841800f31 192.168.0.103:6320@6330 myself,master - 0 0 1 connected 0-5460
ad79649ca4cc9222cee027be59812477f6e736ab 192.168.0.103:6325@6335 slave ba294fe2041d000b5663b20ab0316da96ae403bf 0 1729185557000 2 connected
9c15644cc327af377abb6db6f0cd6595f322ef9f 192.168.0.103:6326@6336 slave ba294fe2041d000b5663b20ab0316da96ae403bf 0 1729185558000 2 connected
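
The cluster nodes output is easier to read when replicas are grouped under their master. The Python sketch below parses the space-separated fields (node id, address, flags, master id, ping sent, pong received, config epoch, link state, slots) and builds that grouping. The Node class and both function names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    node_id: str
    address: str              # "ip:port" without the "@bus-port" suffix
    flags: list[str]
    master_id: Optional[str]  # None for masters
    link_state: str
    slots: list[str] = field(default_factory=list)


def parse_cluster_nodes(text: str) -> list[Node]:
    """Parse the output of CLUSTER NODES; lines with fewer than 8 fields are skipped."""
    nodes = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # prompt lines or "vars currentEpoch ..." in nodes.conf
        node_id, address, flags, master_id = parts[:4]
        nodes.append(Node(
            node_id=node_id,
            address=address.split("@")[0],
            flags=flags.split(","),
            master_id=None if master_id == "-" else master_id,
            link_state=parts[7],
            slots=parts[8:],
        ))
    return nodes


def replicas_by_master(nodes: list[Node]) -> dict[str, list[str]]:
    """Map each master's address to the addresses of its replicas."""
    by_id = {node.node_id: node for node in nodes}
    groups = {node.address: [] for node in nodes if "master" in node.flags}
    for node in nodes:
        if node.master_id in by_id:
            groups.setdefault(by_id[node.master_id].address, []).append(node.address)
    return groups


if __name__ == "__main__":
    sample = """
7203cc4f0874ecbb81025c30a0c9336b4f10df8d 192.168.0.103:6328@6338 slave 616458ae2b23db9fb5070da0bb4bfab841800f31 0 1729185557000 1 connected
616458ae2b23db9fb5070da0bb4bfab841800f31 192.168.0.103:6320@6330 myself,master - 0 0 1 connected 0-5460
5dab82fb367c00a2f338bbc96f35ac9ecccc401a 192.168.0.103:6323@6333 slave 616458ae2b23db9fb5070da0bb4bfab841800f31 0 1729185558795 1 connected
"""
    print(replicas_by_master(parse_cluster_nodes(sample)))
```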

After the cluster is created, node 1 logs the following:

2024-10-18 01:12:27 16:M 17 Oct 2024 17:12:27.605 * configEpoch set to 1 via CLUSTER SET-CONFIG-EPOCH
2024-10-18 01:12:28 16:M 17 Oct 2024 17:12:28.637 * Replica 192.168.65.1:6379 asks for synchronization
2024-10-18 01:12:28 16:M 17 Oct 2024 17:12:28.637 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '67b3e36c7e56d0060096aa0c7df45d90588c1891', my replication IDs are '5134795e887fceda73a6c40a49349e94ba8c872c' and '0000000000000000000000000000000000000000')
2024-10-18 01:12:28 16:M 17 Oct 2024 17:12:28.637 * Replication backlog created, my new replication IDs are '0932a6740161a99749ae4c5d1b5ec3d91bf00e30' and '0000000000000000000000000000000000000000'
2024-10-18 01:12:28 16:M 17 Oct 2024 17:12:28.637 * Delay next BGSAVE for diskless SYNC
2024-10-18 01:12:28 16:M 17 Oct 2024 17:12:28.651 * Replica 192.168.65.1:6379 asks for synchronization
2024-10-18 01:12:28 16:M 17 Oct 2024 17:12:28.651 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'fccb064385e684753d063dbed416ea4309b7141b', my replication IDs are '0932a6740161a99749ae4c5d1b5ec3d91bf00e30' and '0000000000000000000000000000000000000000')
2024-10-18 01:12:28 16:M 17 Oct 2024 17:12:28.651 * Delay next BGSAVE for diskless SYNC
2024-10-18 01:12:28 16:M 17 Oct 2024 17:12:28.960 * Node ad79649ca4cc9222cee027be59812477f6e736ab () is no longer master of shard 342bd7d99314e6f6bead1cdecfde9af5cf75567c; removed all 0 slot(s) it used to own
2024-10-18 01:12:28 16:M 17 Oct 2024 17:12:28.960 * Node ad79649ca4cc9222cee027be59812477f6e736ab () is now part of shard def9b383c18f2779a2867ebb2e48a7a2f043411f
2024-10-18 01:12:30 16:M 17 Oct 2024 17:12:30.186 * Node 9c15644cc327af377abb6db6f0cd6595f322ef9f () is no longer master of shard 45bcbb94a91c3fd1d867a60f3b8cb665ee9a1fab; removed all 0 slot(s) it used to own
2024-10-18 01:12:30 16:M 17 Oct 2024 17:12:30.186 * Node 9c15644cc327af377abb6db6f0cd6595f322ef9f () is now part of shard def9b383c18f2779a2867ebb2e48a7a2f043411f
2024-10-18 01:12:30 16:M 17 Oct 2024 17:12:30.187 * Node 7203cc4f0874ecbb81025c30a0c9336b4f10df8d () is no longer master of shard 44827437b7474050f985f2e7562b5b0f557526f8; removed all 0 slot(s) it used to own
2024-10-18 01:12:30 16:M 17 Oct 2024 17:12:30.187 * Node 7203cc4f0874ecbb81025c30a0c9336b4f10df8d () is now part of shard 47b02087ea475a99443be252282038f6298ca388
2024-10-18 01:12:30 16:M 17 Oct 2024 17:12:30.998 * Node 5dab82fb367c00a2f338bbc96f35ac9ecccc401a () is no longer master of shard 0b52e8e41d1497457e4a69f285826b3a9cdbec32; removed all 0 slot(s) it used to own
2024-10-18 01:12:30 16:M 17 Oct 2024 17:12:30.999 * Node 5dab82fb367c00a2f338bbc96f35ac9ecccc401a () is now part of shard 47b02087ea475a99443be252282038f6298ca388
2024-10-18 01:12:31 16:M 17 Oct 2024 17:12:31.101 * Node 86fcf35973686e9b0d73b84da5c06a4d356a6aca () is no longer master of shard f36d165b2f4e61894e1d431bb96e8b1e738dd97a; removed all 0 slot(s) it used to own
2024-10-18 01:12:31 16:M 17 Oct 2024 17:12:31.101 * Node 86fcf35973686e9b0d73b84da5c06a4d356a6aca () is now part of shard d6d9be9f6c00e617c575e9fe7653a606e1382db9
2024-10-18 01:12:32 16:M 17 Oct 2024 17:12:32.630 * Cluster state changed: ok
2024-10-18 01:12:33 16:M 17 Oct 2024 17:12:33.249 * Node ee5c7f165b1bc578371424724e0bfa35815aaa3c () is no longer master of shard 1f007dd8d56b32bd3bf8a785a9017e025b8d1fac; removed all 0 slot(s) it used to own
2024-10-18 01:12:33 16:M 17 Oct 2024 17:12:33.249 * Node ee5c7f165b1bc578371424724e0bfa35815aaa3c () is now part of shard d6d9be9f6c00e617c575e9fe7653a606e1382db9
2024-10-18 01:12:33 16:M 17 Oct 2024 17:12:33.346 * Starting BGSAVE for SYNC with target: replicas sockets
2024-10-18 01:12:33 16:M 17 Oct 2024 17:12:33.347 * Background RDB transfer started by pid 33
2024-10-18 01:12:33 33:C 17 Oct 2024 17:12:33.348 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
2024-10-18 01:12:33 16:M 17 Oct 2024 17:12:33.348 * Diskless rdb transfer, done reading from pipe, 2 replicas still up.
2024-10-18 01:12:33 16:M 17 Oct 2024 17:12:33.356 * Background RDB transfer terminated with success
2024-10-18 01:12:33 16:M 17 Oct 2024 17:12:33.357 * Streamed RDB transfer with replica 192.168.65.1:6379 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
2024-10-18 01:12:33 16:M 17 Oct 2024 17:12:33.357 * Synchronization with replica 192.168.65.1:6379 succeeded
2024-10-18 01:12:33 16:M 17 Oct 2024 17:12:33.357 * Streamed RDB transfer with replica 192.168.65.1:6379 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
2024-10-18 01:12:33 16:M 17 Oct 2024 17:12:33.357 * Synchronization with replica 192.168.65.1:6379 succeeded

Node 2 logs the following:

2024-10-18 01:12:27 15:M 17 Oct 2024 17:12:27.606 * configEpoch set to 2 via CLUSTER SET-CONFIG-EPOCH
2024-10-18 01:12:28 15:M 17 Oct 2024 17:12:28.649 * Replica 192.168.65.1:6379 asks for synchronization
2024-10-18 01:12:28 15:M 17 Oct 2024 17:12:28.649 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'c3c729a09166a36559fda6ca4cd91e04e547fd06', my replication IDs are 'bff3cfc5fc372b8abe6749762c74bf6a0c88eb15' and '0000000000000000000000000000000000000000')
2024-10-18 01:12:28 15:M 17 Oct 2024 17:12:28.649 * Replication backlog created, my new replication IDs are '7861dcb3043324385b8e60665d1c1efe72a0b31b' and '0000000000000000000000000000000000000000'
2024-10-18 01:12:28 15:M 17 Oct 2024 17:12:28.649 * Delay next BGSAVE for diskless SYNC
2024-10-18 01:12:28 15:M 17 Oct 2024 17:12:28.650 * Replica 192.168.65.1:6379 asks for synchronization
2024-10-18 01:12:28 15:M 17 Oct 2024 17:12:28.650 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '6da5214c7af72d8d2b9ad7a6c0acccfee7b1a965', my replication IDs are '7861dcb3043324385b8e60665d1c1efe72a0b31b' and '0000000000000000000000000000000000000000')
2024-10-18 01:12:28 15:M 17 Oct 2024 17:12:28.650 * Delay next BGSAVE for diskless SYNC
2024-10-18 01:12:28 15:M 17 Oct 2024 17:12:28.960 * Node ad79649ca4cc9222cee027be59812477f6e736ab () is no longer master of shard 342bd7d99314e6f6bead1cdecfde9af5cf75567c; removed all 0 slot(s) it used to own
2024-10-18 01:12:28 15:M 17 Oct 2024 17:12:28.960 * Node ad79649ca4cc9222cee027be59812477f6e736ab () is now part of shard def9b383c18f2779a2867ebb2e48a7a2f043411f
2024-10-18 01:12:29 15:M 17 Oct 2024 17:12:29.061 * Node 86fcf35973686e9b0d73b84da5c06a4d356a6aca () is no longer master of shard f36d165b2f4e61894e1d431bb96e8b1e738dd97a; removed all 0 slot(s) it used to own
2024-10-18 01:12:29 15:M 17 Oct 2024 17:12:29.061 * Node 86fcf35973686e9b0d73b84da5c06a4d356a6aca () is now part of shard d6d9be9f6c00e617c575e9fe7653a606e1382db9
2024-10-18 01:12:31 15:M 17 Oct 2024 17:12:31.136 * Node 9c15644cc327af377abb6db6f0cd6595f322ef9f () is no longer master of shard 45bcbb94a91c3fd1d867a60f3b8cb665ee9a1fab; removed all 0 slot(s) it used to own
2024-10-18 01:12:31 15:M 17 Oct 2024 17:12:31.136 * Node 9c15644cc327af377abb6db6f0cd6595f322ef9f () is now part of shard def9b383c18f2779a2867ebb2e48a7a2f043411f
2024-10-18 01:12:31 15:M 17 Oct 2024 17:12:31.202 * Node 7203cc4f0874ecbb81025c30a0c9336b4f10df8d () is no longer master of shard 4194f3fb0e53f906a7b94ec41708f69b194713f4; removed all 0 slot(s) it used to own
2024-10-18 01:12:31 15:M 17 Oct 2024 17:12:31.202 * Node 7203cc4f0874ecbb81025c30a0c9336b4f10df8d () is now part of shard 47b02087ea475a99443be252282038f6298ca388
2024-10-18 01:12:32 15:M 17 Oct 2024 17:12:32.226 * Node ee5c7f165b1bc578371424724e0bfa35815aaa3c () is no longer master of shard 0ef08df161dec18db1f2b9e5f34020e015e32088; removed all 0 slot(s) it used to own
2024-10-18 01:12:32 15:M 17 Oct 2024 17:12:32.226 * Node ee5c7f165b1bc578371424724e0bfa35815aaa3c () is now part of shard d6d9be9f6c00e617c575e9fe7653a606e1382db9
2024-10-18 01:12:32 15:M 17 Oct 2024 17:12:32.630 * Cluster state changed: ok
2024-10-18 01:12:33 15:M 17 Oct 2024 17:12:33.139 * Starting BGSAVE for SYNC with target: replicas sockets
2024-10-18 01:12:33 15:M 17 Oct 2024 17:12:33.140 * Background RDB transfer started by pid 22
2024-10-18 01:12:33 22:C 17 Oct 2024 17:12:33.143 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
2024-10-18 01:12:33 15:M 17 Oct 2024 17:12:33.143 * Diskless rdb transfer, done reading from pipe, 2 replicas still up.
2024-10-18 01:12:33 15:M 17 Oct 2024 17:12:33.153 * Background RDB transfer terminated with success
2024-10-18 01:12:33 15:M 17 Oct 2024 17:12:33.153 * Streamed RDB transfer with replica 192.168.65.1:6379 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
2024-10-18 01:12:33 15:M 17 Oct 2024 17:12:33.153 * Synchronization with replica 192.168.65.1:6379 succeeded
2024-10-18 01:12:33 15:M 17 Oct 2024 17:12:33.153 * Streamed RDB transfer with replica 192.168.65.1:6379 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
2024-10-18 01:12:33 15:M 17 Oct 2024 17:12:33.153 * Synchronization with replica 192.168.65.1:6379 succeeded
2024-10-18 01:12:35 15:M 17 Oct 2024 17:12:35.090 * Node 5dab82fb367c00a2f338bbc96f35ac9ecccc401a () is no longer master of shard 0b52e8e41d1497457e4a69f285826b3a9cdbec32; removed all 0 slot(s) it used to own
2024-10-18 01:12:35 15:M 17 Oct 2024 17:12:35.090 * Node 5dab82fb367c00a2f338bbc96f35ac9ecccc401a () is now part of shard 47b02087ea475a99443be252282038f6298ca388

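3. Choosing host names and ports when creating the cluster

Creating the cluster requires the IP and port of every Redis instance. With a Docker bridge network and the hostname attribute set for each node in the Compose file, the nodes can reach each other by host name, for example redis-01. In that setup I created the cluster without setting cluster-announce-ip, cluster-announce-port, or cluster-announce-bus-port. The cluster was created successfully, but neither DataGrip nor Redis Insight on the host machine could connect to it, most likely because the nodes then advertise container-internal addresses that the host cannot reach.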

$ redis-cli -h 127.0.0.1 -p 6379 -a z2huo@2024 \
--cluster create redis-01:6379 redis-02:6379 redis-03:6379 redis-04:6379 redis-05:6379 redis-06:6379 redis-07:6379 redis-08:6379 redis-09:6379 \
--cluster-replicas 2

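With cluster-announce-ip, cluster-announce-port, and cluster-announce-bus-port set, the cluster must be created using the IP and ports from those settings, as shown below. The cluster is then created successfully, and the host machine can also reach the Redis cluster running in Docker.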

$ redis-cli -h 127.0.0.1 -p 6379 -a z2huo@2024 \
--cluster create 192.168.0.103:6320 192.168.0.103:6321 192.168.0.103:6322 192.168.0.103:6323 192.168.0.103:6324 192.168.0.103:6325 192.168.0.103:6326 192.168.0.103:6327 192.168.0.103:6328 \
--cluster-replicas 2

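4. Docker Compose networking: bridge and host modes

I use both Windows and macOS, and containers always seem to behave a little differently on the two. The differences I run into most often are in network management, probably because of how the two systems differ.

On Windows I sometimes switch networks, which changes the machine's IP. Bridge mode requires the IP of the Docker host machine to be specified, so this can cause problems. That made me want to try host mode. In host mode Docker does not create a subnet for the containers, and container ports do not need to be published to the host; the containers use the host machine's ports directly.

The Compose file for host networking is shown below. It needs no networks element, the services need no ports element to publish container ports to the host, and no hostname is specified; network_mode: host is used instead: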

name: redis-cluster-demo

volumes:
  redis-01-volume:
    name: redis-cluster-demo-01-volume
  redis-02-volume:
    name: redis-cluster-demo-02-volume
  redis-03-volume:
    name: redis-cluster-demo-03-volume
  redis-04-volume:
    name: redis-cluster-demo-04-volume
  redis-05-volume:
    name: redis-cluster-demo-05-volume
  redis-06-volume:
    name: redis-cluster-demo-06-volume
  redis-07-volume:
    name: redis-cluster-demo-07-volume
  redis-08-volume:
    name: redis-cluster-demo-08-volume
  redis-09-volume:
    name: redis-cluster-demo-09-volume

services:
  redis-01:
    image: ${image}
    restart: ${restart}
    container_name: redis-cluster-node-01
    network_mode: host
    volumes:
      - redis-01-volume:/data
      - ./redis-node-1.conf:/usr/local/etc/redis/redis.conf
    command: ["sh", "-c", "usermod -aG root redis && redis-server /usr/local/etc/redis/redis.conf"]

  redis-02:
    image: ${image}
    restart: ${restart}
    container_name: redis-cluster-node-02
    network_mode: host
    volumes:
      - redis-02-volume:/data
      - ./redis-node-2.conf:/usr/local/etc/redis/redis.conf
    command: ["sh", "-c", "usermod -aG root redis && redis-server /usr/local/etc/redis/redis.conf"]

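In addition, the Redis configuration files no longer need the cluster-announce-ip, cluster-announce-port, and cluster-announce-bus-port settings.

However, running the containers with host networking on Windows still had problems. Startup reported that a port was already in use, telnet could reach only some of the nodes and failed for the others, and connecting to the Redis cluster from DataGrip failed with a DNS-related error.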
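Instead of running telnet against every node by hand, a short script can report which ports accept TCP connections. This is a minimal Python sketch using only the standard library. The port range 6320 to 6328 is a placeholder borrowed from the bridge setup, since the ports used in host mode are not listed here, and is_reachable is a made-up helper name.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False


if __name__ == "__main__":
    # Placeholder ports; replace them with the ports configured in each redis-node-N.conf
    for port in range(6320, 6329):
        state = "open" if is_reachable("127.0.0.1", port) else "unreachable"
        print(f"127.0.0.1:{port} {state}")
```

An open port only shows that something is listening there; it does not confirm that the listener is the expected Redis node.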

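So I stopped using Docker Desktop and used WSL directly. Because the WSL IP does not change, I built the Redis cluster in WSL with the bridge network mode.

Also, using Docker host network mode requires enabling host networking in the Docker Desktop settings.

5. Master and replica failover in the Redis cluster

Now simulate the original master going down and a replica being promoted to master.

The Compose file defines 9 Redis instances. After the cluster is created, run cluster nodes to view the node information.

The cluster node information is as follows (note that this example uses a different cluster from the one shown earlier):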

$ cluster nodes
ab805fa1f55fe0549e8d272e3110637c73b88515 192.168.0.103:6323@6333 slave df6bbda2507ce155372cd1caddede17d5582ef00 0 1729522485000 3 connected
2a65a45426b81942990d75421452c6c898ddc917 192.168.0.103:6328@6338 slave df6bbda2507ce155372cd1caddede17d5582ef00 0 1729522485000 3 connected
99d5f5b8aff69dfe76bb94d98892fcb9c3d8cb56 192.168.0.103:6326@6336 slave 2c06746a9232fdb7c511f9b0a69cb5bf36566340 0 1729522484559 1 connected
d026b2a89a3bc70b9f62a1baad14cb74329249da 192.168.0.103:6324@6334 slave 7de54d48cf6815be86f99afd8e963b14a3a9fddb 0 1729522484000 2 connected
df6bbda2507ce155372cd1caddede17d5582ef00 192.168.0.103:6322@6332 master - 0 1729522485576 3 connected 10923-16383
23398a39d93fa4577f69b9e878934de89cd74d0f 192.168.0.103:6327@6337 slave 7de54d48cf6815be86f99afd8e963b14a3a9fddb 0 1729522486597 2 connected
13bacc3258fff844c3b34d71b7d2849509ba8e33 192.168.0.103:6325@6335 slave 2c06746a9232fdb7c511f9b0a69cb5bf36566340 0 1729522484000 1 connected
2c06746a9232fdb7c511f9b0a69cb5bf36566340 192.168.0.103:6320@6330 myself,master - 0 0 1 connected 0-5460
7de54d48cf6815be86f99afd8e963b14a3a9fddb 192.168.0.103:6321@6331 master - 0 1729522487621 2 connected 5461-10922

The instances at 192.168.0.103:6320, 192.168.0.103:6321, and 192.168.0.103:6322 are the masters.

Now stop the container behind 192.168.0.103:6321 and view the nodes again:

$ cluster nodes
ab805fa1f55fe0549e8d272e3110637c73b88515 192.168.0.103:6323@6333 slave df6bbda2507ce155372cd1caddede17d5582ef00 0 1729522576000 3 connected
2a65a45426b81942990d75421452c6c898ddc917 192.168.0.103:6328@6338 slave df6bbda2507ce155372cd1caddede17d5582ef00 0 1729522576000 3 connected
99d5f5b8aff69dfe76bb94d98892fcb9c3d8cb56 192.168.0.103:6326@6336 slave 2c06746a9232fdb7c511f9b0a69cb5bf36566340 0 1729522576333 1 connected
d026b2a89a3bc70b9f62a1baad14cb74329249da 192.168.0.103:6324@6334 slave 7de54d48cf6815be86f99afd8e963b14a3a9fddb 0 1729522576000 2 connected
df6bbda2507ce155372cd1caddede17d5582ef00 192.168.0.103:6322@6332 master - 0 1729522573000 3 connected 10923-16383
23398a39d93fa4577f69b9e878934de89cd74d0f 192.168.0.103:6327@6337 slave 7de54d48cf6815be86f99afd8e963b14a3a9fddb 0 1729522574277 2 connected
13bacc3258fff844c3b34d71b7d2849509ba8e33 192.168.0.103:6325@6335 slave 2c06746a9232fdb7c511f9b0a69cb5bf36566340 0 1729522577359 1 connected
2c06746a9232fdb7c511f9b0a69cb5bf36566340 192.168.0.103:6320@6330 myself,master - 0 0 1 connected 0-5460
7de54d48cf6815be86f99afd8e963b14a3a9fddb 192.168.0.103:6321@6331 master,fail - 1729522562633 1729522560880 2 disconnected 5461-10922

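Leaving aside the failed master 192.168.0.103:6321 (now flagged master,fail), only two masters remain at this point. Meanwhile, the log of the 6320 instance shows the following:
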
2024-10-21 22:56:18 16:M 21 Oct 2024 14:56:18.079 * Marking node 7de54d48cf6815be86f99afd8e963b14a3a9fddb () as failing (quorum reached).
2024-10-21 22:56:18 16:M 21 Oct 2024 14:56:18.083 # Cluster state changed: fail
2024-10-21 22:56:18 16:M 21 Oct 2024 14:56:18.711 * Failover auth granted to d026b2a89a3bc70b9f62a1baad14cb74329249da () for epoch 10
2024-10-21 22:56:18 16:M 21 Oct 2024 14:56:18.717 * Cluster state changed: ok

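The log shows node 7de54d48cf6815be86f99afd8e963b14a3a9fddb being marked as failing once the quorum was reached. It also shows this node granting its failover vote to d026b2a89a3bc70b9f62a1baad14cb74329249da, which is then promoted to the new master.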

After the new master takes over, the cluster node information is as follows:

$ cluster nodes
ab805fa1f55fe0549e8d272e3110637c73b88515 192.168.0.103:6323@6333 slave df6bbda2507ce155372cd1caddede17d5582ef00 0 1729522597000 3 connected
2a65a45426b81942990d75421452c6c898ddc917 192.168.0.103:6328@6338 slave df6bbda2507ce155372cd1caddede17d5582ef00 0 1729522596000 3 connected
99d5f5b8aff69dfe76bb94d98892fcb9c3d8cb56 192.168.0.103:6326@6336 slave 2c06746a9232fdb7c511f9b0a69cb5bf36566340 0 1729522599091 1 connected
d026b2a89a3bc70b9f62a1baad14cb74329249da 192.168.0.103:6324@6334 master - 0 1729522598054 10 connected 5461-10922
df6bbda2507ce155372cd1caddede17d5582ef00 192.168.0.103:6322@6332 master - 0 1729522597000 3 connected 10923-16383
23398a39d93fa4577f69b9e878934de89cd74d0f 192.168.0.103:6327@6337 slave d026b2a89a3bc70b9f62a1baad14cb74329249da 0 1729522600132 10 connected
13bacc3258fff844c3b34d71b7d2849509ba8e33 192.168.0.103:6325@6335 slave 2c06746a9232fdb7c511f9b0a69cb5bf36566340 0 1729522599000 1 connected
2c06746a9232fdb7c511f9b0a69cb5bf36566340 192.168.0.103:6320@6330 myself,master - 0 0 1 connected 0-5460
7de54d48cf6815be86f99afd8e963b14a3a9fddb 192.168.0.103:6321@6331 master,fail - 1729522562633 1729522560880 2 disconnected

There are 3 available masters again. After the failed master is restarted, the Redis log and the cluster node information are as follows:

2024-10-21 23:04:12 16:M 21 Oct 2024 15:04:12.996 * A failover occurred in shard d32bb40dd4ddd7a95cf7e214fb78e819ef2dc654; node 7de54d48cf6815be86f99afd8e963b14a3a9fddb () lost 0 slot(s) to node d026b2a89a3bc70b9f62a1baad14cb74329249da () with a config epoch of 10
2024-10-21 23:04:13 16:M 21 Oct 2024 15:04:13.063 * Clear FAIL state for node 7de54d48cf6815be86f99afd8e963b14a3a9fddb ():replica is reachable again.
$ cluster nodes
ab805fa1f55fe0549e8d272e3110637c73b88515 192.168.0.103:6323@6333 slave df6bbda2507ce155372cd1caddede17d5582ef00 0 1729523061000 3 connected
2a65a45426b81942990d75421452c6c898ddc917 192.168.0.103:6328@6338 slave df6bbda2507ce155372cd1caddede17d5582ef00 0 1729523060000 3 connected
99d5f5b8aff69dfe76bb94d98892fcb9c3d8cb56 192.168.0.103:6326@6336 slave 2c06746a9232fdb7c511f9b0a69cb5bf36566340 0 1729523062639 1 connected
d026b2a89a3bc70b9f62a1baad14cb74329249da 192.168.0.103:6324@6334 master - 0 1729523059560 10 connected 5461-10922
df6bbda2507ce155372cd1caddede17d5582ef00 192.168.0.103:6322@6332 master - 0 1729523061610 3 connected 10923-16383
23398a39d93fa4577f69b9e878934de89cd74d0f 192.168.0.103:6327@6337 slave d026b2a89a3bc70b9f62a1baad14cb74329249da 0 1729523061000 10 connected
13bacc3258fff844c3b34d71b7d2849509ba8e33 192.168.0.103:6325@6335 slave 2c06746a9232fdb7c511f9b0a69cb5bf36566340 0 1729523059000 1 connected
2c06746a9232fdb7c511f9b0a69cb5bf36566340 192.168.0.103:6320@6330 myself,master - 0 0 1 connected 0-5460
7de54d48cf6815be86f99afd8e963b14a3a9fddb 192.168.0.103:6321@6331 slave d026b2a89a3bc70b9f62a1baad14cb74329249da 0 1729523057481 10 connected

192.168.0.103:6321 has now rejoined the cluster as a replica of the new master 192.168.0.103:6324.
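Reading the full cluster nodes output by eye gets tedious during a failover test. The following is a minimal sketch, not part of the original setup, that filters the output down to the healthy masters and their slot ranges. The helper name list_masters is hypothetical, and it assumes the field layout shown in the listings above (flags in column 3, slots from column 9 onward).

```shell
#!/bin/sh
# list_masters: read `cluster nodes` output on stdin and print "address slots"
# for every node whose flags contain "master" but not "fail".
list_masters() {
    awk '$3 ~ /master/ && $3 !~ /fail/ {
        split($2, addr, "@")    # drop the @bus-port suffix from ip:port@busport
        slots = ""
        for (i = 9; i <= NF; i++) slots = slots (slots == "" ? "" : ",") $i
        print addr[1], slots
    }'
}

# Usage (assumes redis-cli can reach a cluster node; <password> is a placeholder):
#   redis-cli -h 192.168.0.103 -p 6320 -a <password> cluster nodes | list_masters
```

Run before, during, and after the failover, it should list three, two, and then three masters again, with 192.168.0.103:6324 taking over slots 5461-10922.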

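VI. Other issues

1. Tools missing from the container when troubleshooting

The container image is very minimal, so some common troubleshooting tools are missing and have to be installed separately. The commands used here are:

  • ps
  • ping
  • telnet

They can be installed as follows (net-tools is installed as well):
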
$ apt update

$ apt install procps
$ apt install iputils-ping
$ apt install telnet
$ apt install net-tools
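Before installing anything, it can help to check which tools are actually missing in a given container. A small sketch follows; the function name missing_tools is hypothetical, and it relies only on the POSIX command -v builtin.

```shell
#!/bin/sh
# missing_tools: print every command name from the arguments that is not on PATH
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage inside the container: netstat is provided by net-tools, ps by procps
#   missing_tools ps ping telnet netstat
```

An empty result means every listed tool is already available and the apt install steps can be skipped.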

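2. Running multiple commands when the container starts

Sometimes several commands have to run when the container starts. Writing them into one command array like this does not work and the container fails to start:
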
command: ["redis-server", "/usr/local/etc/redis/redis.conf", "usermod",  "-g", "root", "redis"]

The startup error is:

2024-10-12 10:33:06 *** FATAL CONFIG FILE ERROR (Redis 7.4.1) ***
2024-10-12 10:33:06 Reading the configuration file, at line 2308
2024-10-12 10:33:06 >>> '"usermod" "-g" "root" "redis"'
2024-10-12 10:33:06 Bad directive or wrong number of arguments

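The extra array elements are passed to redis-server as additional arguments, which it tries to parse as configuration directives, hence the error. To run several commands, use sh -c instead. Put the one-off command first: redis-server runs in the foreground, so anything chained after it with && would only run once the server exits.

command: ["sh", "-c", "usermod -g root redis && redis-server /usr/local/etc/redis/redis.conf"]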
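The ordering of an sh -c chain can be tried out without Redis. The sketch below uses sleep as a stand-in for the foreground server (an assumption made only for this illustration) and shows that the setup step finishes before the long-running command starts. The exec is an optional addition, not something the original Compose file uses: it replaces the shell with the server process, so the server rather than sh receives the container's stop signal.

```shell
#!/bin/sh
# run_chain SECONDS: run a one-off setup step, then a stand-in "server" under sh -c,
# in the same shape as the Compose command. `sleep` stands in for redis-server.
run_chain() {
    sh -c 'echo "setup finished" && exec sleep "$1"' sh "$1"
    echo "server exited with status $?"
}

run_chain 0
```

If the two commands were swapped, "setup finished" would only be printed after the stand-in server had exited.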

3. Changing a user's groups on Linux

Change a user's primary group:

sudo usermod -g <groupname> <username>

For example, the following command changes the primary group of the redis user to root:

usermod -g root redis

Add a supplementary group to a user:

sudo usermod -aG <groupname> <username>

For example, the following command adds the user bob to the sudo and docker groups:

sudo usermod -aG sudo,docker bob
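To confirm that a usermod call took effect, the group list reported by id can be checked. A minimal sketch; both helper names are hypothetical, and id -nG prints all group names of a user separated by spaces.

```shell
#!/bin/sh
# list_has_word LIST WORD: succeed if WORD is one of the space-separated items in LIST
list_has_word() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# user_in_group USER GROUP: succeed if USER belongs to GROUP (primary or supplementary)
user_in_group() {
    list_has_word "$(id -nG "$1")" "$2"
}

# Usage:
#   user_in_group redis root && echo "redis is in group root"
```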

4. Permission problems in the log when the container starts

4.1 Symptoms

The master node logs the following at startup:

2024-10-12 10:45:43 1:C 12 Oct 2024 02:45:43.318 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2024-10-12 10:45:43 1:C 12 Oct 2024 02:45:43.318 * Redis version=7.4.1, bits=64, commit=00000000, modified=0, pid=1, just started
2024-10-12 10:45:43 1:C 12 Oct 2024 02:45:43.318 * Configuration loaded
2024-10-12 10:45:43 1:M 12 Oct 2024 02:45:43.318 * monotonic clock: POSIX clock_gettime
2024-10-12 10:45:43 1:M 12 Oct 2024 02:45:43.319 # Failed to write PID file: Permission denied
2024-10-12 10:45:43 1:M 12 Oct 2024 02:45:43.319 * Running mode=standalone, port=6379.
2024-10-12 10:45:43 1:M 12 Oct 2024 02:45:43.319 * Server initialized
2024-10-12 10:45:43 1:M 12 Oct 2024 02:45:43.319 * Loading RDB produced by version 7.4.1
2024-10-12 10:45:43 1:M 12 Oct 2024 02:45:43.319 * RDB age 2353 seconds
2024-10-12 10:45:43 1:M 12 Oct 2024 02:45:43.319 * RDB memory usage when created 1.25 Mb
2024-10-12 10:45:43 1:M 12 Oct 2024 02:45:43.319 * Done loading RDB, keys loaded: 0, keys expired: 0.
2024-10-12 10:45:43 1:M 12 Oct 2024 02:45:43.319 * DB loaded from disk: 0.000 seconds
2024-10-12 10:45:43 1:M 12 Oct 2024 02:45:43.319 * Ready to accept connections tcp
2024-10-12 10:45:43 1:M 12 Oct 2024 02:45:43.603 * Replica 172.21.0.4:6379 asks for synchronization
2024-10-12 10:45:43 1:M 12 Oct 2024 02:45:43.603 * Partial resynchronization request from 172.21.0.4:6379 accepted. Sending 0 bytes of backlog starting from offset 960610.
2024-10-12 10:45:43 1:M 12 Oct 2024 02:45:43.608 * Replica 172.21.0.3:6379 asks for synchronization
2024-10-12 10:45:43 1:M 12 Oct 2024 02:45:43.608 * Partial resynchronization request from 172.21.0.3:6379 accepted. Sending 0 bytes of backlog starting from offset 960610.

The log above contains Failed to write PID file: Permission denied. The process has no permission to write the file configured by pidfile /var/run/redis_6379.pid, or the directory that contains it.

The sentinel node logs the following at startup:

2024-10-12 10:49:38 1:X 12 Oct 2024 02:49:38.258 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2024-10-12 10:49:38 1:X 12 Oct 2024 02:49:38.258 * Redis version=7.4.1, bits=64, commit=00000000, modified=0, pid=1, just started
2024-10-12 10:49:38 1:X 12 Oct 2024 02:49:38.258 * Configuration loaded
2024-10-12 10:49:38 1:X 12 Oct 2024 02:49:38.259 * monotonic clock: POSIX clock_gettime
2024-10-12 10:49:38 1:X 12 Oct 2024 02:49:38.259 # Failed to write PID file: Permission denied
2024-10-12 10:49:38 1:X 12 Oct 2024 02:49:38.259 * Running mode=sentinel, port=26379.
2024-10-12 10:49:38 1:X 12 Oct 2024 02:49:38.262 # Could not create tmp config file (Permission denied)
2024-10-12 10:49:38 1:X 12 Oct 2024 02:49:38.262 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Permission denied
2024-10-12 10:49:38 1:X 12 Oct 2024 02:49:38.262 * Sentinel ID is ad4de059803e1e177a79d9d66b5ba2aa841c54b3
2024-10-12 10:49:38 1:X 12 Oct 2024 02:49:38.262 # +monitor master redis-master redis-master 6379 quorum 2
2024-10-12 10:49:40 1:X 12 Oct 2024 02:49:40.461 * +sentinel sentinel e4a0c6a4feaff9c075ee9daf25d5e14cc9cb7290 172.21.0.6 26379 @ redis-master redis-master 6379
2024-10-12 10:49:40 1:X 12 Oct 2024 02:49:40.468 # Could not create tmp config file (Permission denied)
2024-10-12 10:49:40 1:X 12 Oct 2024 02:49:40.468 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Permission denied
2024-10-12 10:49:40 1:X 12 Oct 2024 02:49:40.520 * +sentinel sentinel b042abc5b6ee84e0610beec286c9b530f5377eb8 172.21.0.7 26379 @ redis-master redis-master 6379
2024-10-12 10:49:40 1:X 12 Oct 2024 02:49:40.525 # Could not create tmp config file (Permission denied)
2024-10-12 10:49:40 1:X 12 Oct 2024 02:49:40.525 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Permission denied
2024-10-12 10:49:48 1:X 12 Oct 2024 02:49:48.321 * +slave slave 172.21.0.2:6379 172.21.0.2 6379 @ redis-master redis-master 6379
2024-10-12 10:49:48 1:X 12 Oct 2024 02:49:48.328 # Could not create tmp config file (Permission denied)
2024-10-12 10:49:48 1:X 12 Oct 2024 02:49:48.328 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Permission denied
2024-10-12 10:49:48 1:X 12 Oct 2024 02:49:48.328 * +slave slave 172.21.0.3:6379 172.21.0.3 6379 @ redis-master redis-master 6379
2024-10-12 10:49:48 1:X 12 Oct 2024 02:49:48.333 # Could not create tmp config file (Permission denied)
2024-10-12 10:49:48 1:X 12 Oct 2024 02:49:48.333 # WARNING: Sentinel was not able to save the new configuration on disk!!!: Permission denied
2024-10-12 10:50:11 1:X 12 Oct 2024 02:50:11.269 # +tilt #tilt mode entered

The log above contains the following permission errors:

  • # Failed to write PID file: Permission denied
  • # Could not create tmp config file (Permission denied)
  • # WARNING: Sentinel was not able to save the new configuration on disk!!!: Permission denied
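All three messages come down to the Redis process not being able to create files in some directory. One quick way to narrow this down is to test the directories for write access while running as the same user as the Redis process. A minimal sketch; the function name check_writable is hypothetical.

```shell
#!/bin/sh
# check_writable DIR...: report whether the current user can create files in each directory
check_writable() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "$dir: writable"
        else
            echo "$dir: NOT writable"
        fi
    done
}

# Usage inside the container, as the user that runs Redis:
#   check_writable /run /data /usr/local/etc/redis
```

Note that root passes the -w test for almost any directory, so the check is only meaningful when it runs as the unprivileged user.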

4.2 Investigation

The Dockerfile of the redis image (shown here as image layer history) creates a user named redis:

CMD ["bash"]
RUN /bin/sh -c set -eux; groupadd -r -g 999 redis; useradd -r -g redis -u 999 redis # buildkit

Inside the container, the current user is root:

$ whoami
root

The Dockerfile uses the entrypoint shell script docker-entrypoint.sh, whose content is:

#!/bin/sh
set -e

# first arg is `-f` or `--some-option`
# or first arg is `something.conf`
if [ "${1#-}" != "$1" ] || [ "${1%.conf}" != "$1" ]; then
    set -- redis-server "$@"
fi

# allow the container to be started with `--user`
if [ "$1" = 'redis-server' -a "$(id -u)" = '0' ]; then
    find . \! -user redis -exec chown redis '{}' +
    exec gosu redis "$0" "$@"
fi

# set an appropriate umask (if one isn't set already)
# - https://github.com/docker-library/redis/issues/305
# - https://github.com/redis/redis/blob/bb875603fb7ff3f9d19aad906bd45d7db98d9a39/utils/systemd-redis_server.service#L37
um="$(umask)"
if [ "$um" = '0022' ]; then
    umask 0077
fi

exec "$@"

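This shell script starts the Redis service in the container and makes sure it runs with the right user and configuration. In detail:

  • set -e: the script exits immediately when any command fails, so that later commands do not run in a broken state.
  • Argument handling: checks whether the first argument starts with - (such as -f or --some-option) or ends with .conf (a Redis configuration file). If either is true, redis-server is prepended to the arguments, so that Redis is started as a server.
  • User check: if the command is redis-server and the current user is root (id -u returns 0), then:
    • find changes the owner of every file under the current directory that is not already owned by redis to the redis user.
    • gosu (similar to sudo) re-executes the script and its arguments as the redis user, so that Redis does not run as root, which improves security.
  • umask: checks the current umask. If it is the default 0022 (new files are created as 644 and new directories as 755), it is changed to 0077, so that new files and directories are accessible only by their owner (600 and 700).
  • Final command: exec "$@" runs the given arguments (normally redis-server and its options) and replaces the current shell process, so that the Redis server starts with the intended configuration and permissions.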

So when the container command starts with redis-server, the Redis process in the container is started as the redis user.
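Two of the entrypoint's decisions can be reproduced in isolation. The sketch below reuses the script's own parameter-expansion tests and shows the effect of the two umask values on a newly created file. Both function names are hypothetical, and the second one creates and removes a temporary directory.

```shell
#!/bin/sh
# prepends_redis_server ARG: succeed if the entrypoint would prepend `redis-server`,
# i.e. ARG starts with "-" or ends with ".conf" (the same tests as in the script)
prepends_redis_server() {
    [ "${1#-}" != "$1" ] || [ "${1%.conf}" != "$1" ]
}

# mode_under_umask MASK: create a file under the given umask and print its permission string
mode_under_umask() {
    (
        dir=$(mktemp -d) || exit 1
        umask "$1"
        : > "$dir/f"
        ls -l "$dir/f" | cut -c1-10
        rm -rf "$dir"
    )
}

for arg in --port /usr/local/etc/redis/redis.conf redis-server sh; do
    if prepends_redis_server "$arg"; then
        echo "$arg: redis-server is prepended"
    else
        echo "$arg: arguments are left unchanged"
    fi
done
mode_under_umask 0022
mode_under_umask 0077
```

A command whose first word is sh is left unchanged and also fails the redis-server comparison in the user check, so with such a command the script never reaches the gosu step.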

Enter the container and check the file ownership:

$ cd /data
$ ls -la
total 12
drwxr-xr-x 2 redis redis 4096 Oct 12 03:07 .
drwxr-xr-x 1 root root 4096 Oct 12 03:09 ..
-rw------- 1 redis redis 174 Oct 12 03:07 dump.rdb

The files created by the Redis process are owned by user redis and group redis.

$ ls -la /run
total 16
drwxr-xr-x 1 root root 4096 Oct 12 03:09 .
drwxr-xr-x 1 root root 4096 Oct 12 03:09 ..
drwxrwxrwt 2 root root 4096 Sep 26 00:00 lock
-rw------- 1 root root 3 Oct 12 03:09 redis_6379.pid

The /run directory is owned by root.

Check the running processes to confirm which user owns the redis process:

$ apt update
$ apt install procps
$ ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 07:44 ? 00:00:00 sh -c usermod -aG root redis && r
root 16 1 0 07:44 ? 00:00:06 redis-server 0.0.0.0:6379
root 22 0 0 07:44 pts/0 00:00:00 /bin/bash
root 247 22 0 08:58 pts/0 00:00:00 ps -ef
$ ps -u root
PID TTY TIME CMD
1 ? 00:00:00 sh
16 ? 00:00:06 redis-server
22 pts/0 00:00:00 bash
244 pts/0 00:00:00 ps
$ ps -u redis
PID TTY TIME CMD

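The commands above show that the redis-server process runs as root, and ps -u redis lists no processes.

The solution was to change the primary group of the redis user to root, or to add root as a supplementary group of the redis user, by running usermod before redis-server in the container command.

Note that PID 1 in the ps output is sh -c ..., so docker-entrypoint.sh sees sh rather than redis-server as its first argument and skips the gosu step. Redis therefore keeps running as root. Since /run is drwxr-xr-x root root (writable only by its owner), this is most likely what actually allows the PID file to be written.

After the Docker container is restarted, the warnings above no longer appear in the log:
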
2024-10-12 17:24:26 16:C 12 Oct 2024 09:24:26.037 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2024-10-12 17:24:26 16:C 12 Oct 2024 09:24:26.037 * Redis version=7.4.1, bits=64, commit=00000000, modified=0, pid=16, just started
2024-10-12 17:24:26 16:C 12 Oct 2024 09:24:26.037 * Configuration loaded
2024-10-12 17:24:26 16:M 12 Oct 2024 09:24:26.037 * monotonic clock: POSIX clock_gettime
2024-10-12 17:24:26 16:M 12 Oct 2024 09:24:26.038 * Running mode=standalone, port=6379.
2024-10-12 17:24:26 16:M 12 Oct 2024 09:24:26.038 * Server initialized
2024-10-12 17:24:26 16:M 12 Oct 2024 09:24:26.038 * Loading RDB produced by version 7.4.1
2024-10-12 17:24:26 16:M 12 Oct 2024 09:24:26.038 * RDB age 22638 seconds
2024-10-12 17:24:26 16:M 12 Oct 2024 09:24:26.038 * RDB memory usage when created 1.06 Mb
2024-10-12 17:24:26 16:M 12 Oct 2024 09:24:26.038 * Done loading RDB, keys loaded: 0, keys expired: 0.
2024-10-12 17:24:26 16:M 12 Oct 2024 09:24:26.038 * DB loaded from disk: 0.000 seconds
2024-10-12 17:24:26 16:M 12 Oct 2024 09:24:26.038 * Ready to accept connections tcp
2024-10-12 17:24:26 16:M 12 Oct 2024 09:24:26.914 * Replica 172.21.0.2:6379 asks for synchronization
2024-10-12 17:24:26 16:M 12 Oct 2024 09:24:26.914 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '0d9f7daf7d575fa535ed676f8d0a3745d29f911d', my replication IDs are '29160765478def2d850128bf9b0cd997e48bf165' and 'bce9594562a9218c76945ebe3ba6f62c4d8defdc')
2024-10-12 17:24:26 16:M 12 Oct 2024 09:24:26.914 * Delay next BGSAVE for diskless SYNC
2024-10-12 17:24:26 16:M 12 Oct 2024 09:24:26.950 * Replica 172.21.0.3:6379 asks for synchronization
2024-10-12 17:24:26 16:M 12 Oct 2024 09:24:26.950 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '0d9f7daf7d575fa535ed676f8d0a3745d29f911d', my replication IDs are '29160765478def2d850128bf9b0cd997e48bf165' and 'bce9594562a9218c76945ebe3ba6f62c4d8defdc')
2024-10-12 17:24:26 16:M 12 Oct 2024 09:24:26.950 * Delay next BGSAVE for diskless SYNC

In addition, the created PID file is now visible in the /run directory inside the container:

$ cd /run
$ ls -la
total 16
drwxr-xr-x 1 root root 4096 Oct 12 09:24 .
drwxr-xr-x 1 root root 4096 Oct 12 09:24 ..
drwxrwxrwt 2 root root 4096 Sep 26 00:00 lock
-rw------- 1 root root 3 Oct 12 09:24 redis_6379.pid

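4.3 Attempts that did not help

To rule out Docker Compose as the cause of the permission problem, the Compose file was modified first:

services:
  redis-master:
    privileged: true
    user: root

Adding these two attributes to the service had no effect at all. If user is not specified, the container already starts as root by default, so setting it explicitly changes nothing.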

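5. Building a custom Redis image

Several tools are needed for debugging, but the official Redis Docker image is a minimal runtime image that does not include them. A custom image with the required tools installed makes debugging easier while setting up the environment. Build it with one of the following commands (they differ only in the image tag):
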
docker build -f ./dev-redis.dockerfile -t redis-dev:7.4-bookworm .
docker build -f ./dev-redis.dockerfile -t redis-dev:7.4.1 .
docker build -f ./dev-redis.dockerfile -t z2huo/redis-dev:7.4.1 .
docker build -f ./dev-redis.dockerfile -t z2huo/redis-dev:7.4.1-bookworm .
docker build -f ./dev-redis.dockerfile -t z2huo/redis-dev:latest .

The content of dev-redis.dockerfile is:

FROM redis:7.4-bookworm

LABEL maintainer="z2huo9994@163.com"

RUN apt-get update && \
    apt-get install -y procps iputils-ping telnet net-tools && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

6. Generating the configuration file of each Redis cluster node with a shell script

6.1 Docker bridge network mode

#!/bin/bash

# Host IP announced to the other cluster nodes and to clients
ip="192.168.0.103"
# ip="192.168.132.108"

# Number of nodes
nodeCount=9
# Host ports exposed by the nodes
ports=(6320 6321 6322 6323 6324 6325 6326 6327 6328)
# Note: the ports=(...) array syntax requires bash. It may fail when the script is
# started from PowerShell or WSL with sh, so check which shell actually runs it.

# Loop over the ports and generate one configuration file per node
for ((i=0; i<$nodeCount; i++)); do
    port=${ports[$i]}
    bus_port=$((port + 10))

    # Name of the node's configuration file
    node_no=$((i + 1))
    config_file="redis-node-${node_no}.conf"

    # Delete the file if it already exists
    if [ -f "$config_file" ]; then
        echo "File $config_file exists, deleting it..."
        rm "$config_file"
    fi

    # Copy redis.conf as the new configuration file
    cp redis.conf "$config_file"

    # Append the announce IP, port, and bus port to the configuration file
    echo "" >> "$config_file"
    echo "# Settings for cluster mode under Docker's bridge network mode" >> "$config_file"
    echo "cluster-announce-ip $ip" >> "$config_file"
    echo "cluster-announce-port $port" >> "$config_file"
    echo "cluster-announce-bus-port $bus_port" >> "$config_file"

    echo "Created $config_file with port $port and bus_port $bus_port"
done

6.2 Docker host network mode

#!/bin/bash

# Host IP (not needed in host network mode)
# ip="192.168.0.103"
# ip="192.168.132.108"

# Number of nodes
nodeCount=9
# Ports the nodes listen on directly on the host
ports=(6320 6321 6322 6323 6324 6325 6326 6327 6328)
# Note: the ports=(...) array syntax requires bash. It may fail when the script is
# started from PowerShell or WSL with sh, so check which shell actually runs it.

# Loop over the ports and generate one configuration file per node
for ((i=0; i<$nodeCount; i++)); do
    port=${ports[$i]}
    bus_port=$((port + 10))

    # Name of the node's configuration file
    node_no=$((i + 1))
    config_file="redis-node-${node_no}.conf"

    # Delete the file if it already exists
    if [ -f "$config_file" ]; then
        echo "File $config_file exists, deleting it..."
        rm "$config_file"
    fi

    # Copy redis.conf as the new configuration file
    cp redis.conf "$config_file"

    # Append port and cluster-port to the configuration file
    echo "" >> "$config_file"
    echo "# Settings for cluster mode under Docker's host network mode" >> "$config_file"
    echo "port $port" >> "$config_file"
    echo "cluster-port $bus_port" >> "$config_file"

    echo "Created $config_file with port $port and bus_port $bus_port"
done

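7. Using the container's assigned IP for bind in the Redis configuration

In the bridge-mode setup above, the Redis configuration uses bind 0.0.0.0, which makes the server listen on all network interfaces. Whichever of the host's IP addresses a client connects to, the server accepts the connection as long as the host supports it.

However, the bridge-mode Compose file defines a subnet and assigns each container a fixed IP address within it, so bind in the Redis configuration can be set to the container's fixed IP address instead.
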
networks:
  redis-cluster-net:
    name: redis-cluster-net
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 172.92.0.0/24
          ip_range: 172.92.0.0/24
          gateway: 172.92.0.1

services:
  redis-01:
    image: ${image}
    restart: ${restart}
    container_name: redis-cluster-node-01
    hostname: redis-01
    networks:
      redis-cluster-net:
        ipv4_address: 172.92.0.2

The Redis configuration file then contains:

bind 172.92.0.2

The shell script that generates the node configuration files has to be changed accordingly:

#!/bin/bash

# Host IP announced to the other cluster nodes and to clients
# ip="192.168.0.103"
# ip="192.168.132.108"
ip="172.19.42.221"

# Fixed container IPs: prefix plus a suffix that starts at this value
container_ip_prefix="172.92.0."
container_ip_suffix_start=2

# Number of nodes
nodeCount=9
# Host ports exposed by the nodes
ports=(6320 6321 6322 6323 6324 6325 6326 6327 6328)
# Note: the ports=(...) array syntax requires bash. It may fail when the script is
# started from PowerShell or WSL with sh, so check which shell actually runs it.

# Loop over the ports and generate one configuration file per node
for ((i=0; i<$nodeCount; i++)); do
    port=${ports[$i]}
    bus_port=$((port + 10))
    suffix=$((container_ip_suffix_start + i))
    container_ip="${container_ip_prefix}${suffix}"

    # Name of the node's configuration file
    node_no=$((i + 1))
    config_file="redis-node-${node_no}.conf"

    # Delete the file if it already exists
    if [ -f "$config_file" ]; then
        echo "File $config_file exists, deleting it..."
        rm "$config_file"
    fi

    # Copy redis.conf as the new configuration file
    cp redis.conf "$config_file"

    # Append bind, the announce IP, port, and bus port to the configuration file
    echo "" >> "$config_file"
    echo "# Settings for cluster mode under Docker's bridge network mode" >> "$config_file"
    echo "bind $container_ip" >> "$config_file"
    echo "cluster-announce-ip $ip" >> "$config_file"
    echo "cluster-announce-port $port" >> "$config_file"
    echo "cluster-announce-bus-port $bus_port" >> "$config_file"

    echo "Created $config_file with port $port and bus_port $bus_port"
done

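8. Custom virtual network adapter

Applications deployed in Docker containers sometimes need to know the node's externally reachable IP, that is, the IP of the host machine. Redis's cluster-announce-ip setting is one example. The Wi-Fi adapter's IP could be used, but DHCP assigns a different IP on each Wi-Fi network. And when the same set of configuration files is deployed on several machines, for example a work computer and a home computer, the files would have to be updated every time the IP changes.

These Docker containers are only used for local testing, so a custom adapter can be created on the local machine to keep the IP fixed.

8.1 Custom adapter on Windows

Open Device Manager, click "Action", choose "Add legacy hardware", and then follow these steps: Next => "Install the hardware that I manually select from a list (Advanced)" => select "Network adapters" as the hardware type => select "Microsoft" as the manufacturer => select "Microsoft KM-TEST Loopback Adapter" as the model.

Once the adapter is created, its IP can be set to a fixed value, for example 192.168.99.99.

The Redis configuration file can then use the following settings:
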
bind 127.0.0.1 -::1 172.92.0.2
cluster-announce-ip 192.168.99.99
cluster-announce-port 6320
cluster-announce-bus-port 6330

Create the cluster with the following command:

$ redis-cli -h 127.0.0.1 -p 6379 -a z2huo@2024 \
--cluster create 192.168.99.99:6320 192.168.99.99:6321 192.168.99.99:6322 192.168.99.99:6323 192.168.99.99:6324 192.168.99.99:6325 192.168.99.99:6326 192.168.99.99:6327 192.168.99.99:6328 \
--cluster-replicas 2
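Typing nine host:port pairs by hand is error-prone. The node list can be generated from the announce IP and the port list instead. This is a sketch; the helper name cluster_node_list is hypothetical.

```shell
#!/bin/sh
# cluster_node_list IP PORT...: print "IP:PORT" for every port, separated by spaces.
# The body runs in a subshell so its variables do not leak into the caller.
cluster_node_list() (
    ip=$1
    shift
    list=""
    for port in "$@"; do
        list="$list${list:+ }$ip:$port"
    done
    echo "$list"
)

cluster_node_list 192.168.99.99 6320 6321 6322

# Usage (<password> is a placeholder; the unquoted $(...) is split into separate arguments):
#   redis-cli -a <password> --cluster create \
#       $(cluster_node_list 192.168.99.99 6320 6321 6322 6323 6324 6325 6326 6327 6328) \
#       --cluster-replicas 2
```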

8.2 Custom adapter on macOS

macOS has a loopback device (lo0) by default. Several IP addresses can be bound to it, which makes it comparable to the Microsoft Loopback Adapter on Windows.

Add a loopback IP:

sudo ifconfig lo0 alias 192.168.99.99/32

192.168.99.99 is now reachable from the macOS host itself. Inspecting lo0 shows the new address:

$ ifconfig lo0
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
options=1203<RXCSUM,TXCSUM,TXSTATUS,SW_TIMESTAMP>
inet 127.0.0.1 netmask 0xff000000
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
inet 192.168.99.99 netmask 0xffffffff
nd6 options=201<PERFORMNUD,DAD>
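An alias added with ifconfig is, as far as I know, not kept across reboots, so it is convenient to add it only when it is missing. The sketch below checks ifconfig-style output for a given IPv4 address. The function name has_inet_addr is hypothetical, and it assumes the "inet ADDRESS ..." line format shown above.

```shell
#!/bin/sh
# has_inet_addr IP: succeed if IP appears as an "inet" address in ifconfig-style output on stdin
has_inet_addr() {
    awk -v ip="$1" '$1 == "inet" && $2 == ip { found = 1 } END { exit !found }'
}

# Usage on macOS (requires sudo only when the alias is missing):
#   ifconfig lo0 | has_inet_addr 192.168.99.99 || sudo ifconfig lo0 alias 192.168.99.99/32
```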

Related links

Compose top-level element Services | z2huo

Compose top-level element Networks | z2huo
