2022年7月,南大通用gbase 8c于opengauss developer day 2022峰会正式亮相发布。gbase 8c是业内首个基于opengauss3.0发布的多模多态的分布式数据库,支持行存、列存、内存等多种存储模式和单机、主备与分布式等多种部署形态。
前文【快速搭建 gbase 8c 集群环境】已阐述如何快速搭建 gbase 8c 集群环境,本文将汇集 gbase 8c 的常用命令。
gbase 8c 数据库分布式形态采用 share nothing 的分布式架构,计算节点和存储节点分离。节点间通过高速网络进行通信,所有节点都有主从互备,确保系统的极致高可用。
gbase 8c 的主要节点分为三类,分别是协调器(coordinator,cn)、数据节点(data node,dn)、全局事务管理器(global transaction manager,gtm)。部署一套 gbase 8c 分布式环境至少需要以下节点:
- 1个 gha_server 节点
- 1个 dcs 节点
- 1个 gtm 节点
- 1个 cn 节点
- 2个 dn 节点
下文将直接使用节点缩写。
常见命令
1. 查看 gbase 8c 所有组件状态
gha_ctl monitor -l http://192.168.8.31:2379 -h
2. 查看 gha_server 节点状态
[gbase@gbase8c_1 ~]$ gha_ctl monitor server -l http://192.168.8.31:2379
{
"cluster": "gbase",
"version": "v5_s3.0.0b76",
"server": [
{
"name": "gha_server1",
"host": "192.168.8.31",
"port": "20001",
"state": "running",
"isleader": true
}
]
}
3. 查看 dcs 节点状态
[gbase@gbase8c_1 ~]$ gha_ctl monitor dcs -l http://192.168.8.31:2379
{
"cluster": "gbase",
"version": "v5_s3.0.0b76",
"dcs": {
"clusterstate": "healthy",
"members": [
{
"url": "http://192.168.8.31:2379",
"id": "bfb7ea6d0aaed3aa",
"name": "node_0",
"isleader": true,
"state": "healthy"
}
]
}
}
4. 查看 gtm 节点状态
[gbase@gbase8c_1 ~]$ gha_ctl monitor gtm -l http://192.168.8.31:2379
{
"cluster": "gbase",
"version": "v5_s3.0.0b76",
"gtm": [
{
"name": "gtm1",
"host": "192.168.8.31",
"port": "6666",
"workdir": "/home/gbase/data/gtm/gtm1",
"agentport": "8001",
"state": "running",
"role": "primary",
"agenthost": "192.168.8.31"
}
]
}
5. 查看 cn 节点状态
[gbase@gbase8c_1 ~]$ gha_ctl monitor coordinator -l http://192.168.8.31:2379
{
"cluster": "gbase",
"version": "v5_s3.0.0b76",
"coordinator": [
{
"name": "cn1",
"host": "192.168.8.31",
"port": "5432",
"workdir": "/home/gbase/data/coord/cn1",
"agentport": "8003",
"state": "running",
"role": "primary",
"agenthost": "192.168.8.31",
"central": true
}
]
}
6. 查看 dn 节点状态
[gbase@gbase8c_1 ~]$ gha_ctl monitor datanode -l http://192.168.8.31:2379
{
"cluster": "gbase",
"version": "v5_s3.0.0b76",
"datanode": {
"dn1": [
{
"name": "dn1_1",
"host": "192.168.8.32",
"port": "15432",
"workdir": "/home/gbase/data/dn1/dn1_1",
"agentport": "8005",
"state": "running",
"role": "primary",
"agenthost": "192.168.8.32"
}
],
"dn2": [
{
"name": "dn2_1",
"host": "192.168.8.33",
"port": "20010",
"workdir": "/home/gbase/data/dn2/dn2_1",
"agentport": "8007",
"state": "running",
"role": "primary",
"agenthost": "192.168.8.33"
}
]
}
}
7. 在 cn 节点以 gbase 用户连接 gbase 8c 数据库
[gbase@gbase8c_1 ~]$ gsql -d postgres
gsql ((multiple_nodes gbase8cv5 3.0.0b76 build 47948f99) compiled at 2023-02-27 16:04:20 commit 0 last mr 1232 )
non-ssl connection (ssl connection is recommended when requiring high-security)
type "help" for help.
postgres=# \conninfo
you are connected to database "postgres" as user "gbase" via socket in "/home/gbase/gbase_db/tmp" at port "5432".
postgres=#
8. 查看 cn 节点上的参数
[gbase@gbase8c_1 ~]$ gs_guc check -z coordinator -n all -i all -c "listen_addresses"
the gs_guc run with the following arguments: [gs_guc -z coordinator -n all -i all -c listen_addresses check ].
total guc values: 1.
the value of parameter listen_addresses is same on all instances.
listen_addresses='localhost,192.168.8.31'
9. 在 cn 节点配置允许任意 ip 访问集群
[gbase@gbase8c_1 ~]$ gs_guc reload -z coordinator -n all -i all -h "host all all 0.0.0.0/0 sha256"
the gs_guc run with the following arguments: [gs_guc -z coordinator -n all -i all -h host all all 0.0.0.0/0 sha256 reload ].
begin to perform the total nodes: 3.
popen count is 1, popen success count is 1, popen failure count is 0.
begin to perform gs_guc for coordinators.
command count is 1, command success count is 1, command failure count is 0.
total instances: 1.
all: success to perform gs_guc!
10. 使用数据库管理工具远程连接 gbase 8c 数据库
使用数据库管理工具连接 gbase 8c 数据库,这里 ip 地址填写 cn 节点的 ip 地址,用户可以自己创建一个新用户。
[gbase@gbase8c_1 ~]$ gsql
gsql ((multiple_nodes gbase8cv5 3.0.0b76 build 47948f99) compiled at 2023-02-27 16:04:20 commit 0 last mr 1232 )
non-ssl connection (ssl connection is recommended when requiring high-security)
type "help" for help.
gbase=# create user shawnyan login password 'xxxxxxxx';
create role
gbase=#
11. 查看当前用户连接信息
gbase=# \conninfo
you are connected to database "gbase" as user "gbase" via socket in "/home/gbase/gbase_db/tmp" at port "5432".
gbase=# select current_user;
current_user
--------------
gbase
(1 row)
12. 使用 gbase 8c 提供的 gs_check 工具检查系统 cpu 情况
- cpu 使用率
[gbase@gbase8c_1 ~]$ gs_check -i checkcpu -l
2023-03-19 16:21:10 [nam] checkcpu
2023-03-19 16:21:10 [std] 检查主机cpu占用率,如果idle 大于30%,或者 iowait 小于 30%.则检查项通过,否则检查项不通过
2023-03-19 16:21:10 [rst] ok
2023-03-19 16:21:10 [raw]
linux 3.10.0-1160.83.1.el7.x86_64 (gbase8c_1.shawnyan.com) 03/19/23 _x86_64_ (2 cpu)
16:21:05 cpu %user %nice %system %iowait %steal %idle
16:21:06 all 10.05 0.00 5.29 0.00 0.00 84.66
16:21:07 all 17.89 0.00 11.05 0.00 0.00 71.05
16:21:08 all 11.92 0.00 3.63 0.00 0.00 84.46
16:21:09 all 14.58 0.00 5.73 0.00 0.00 79.69
16:21:10 all 12.11 0.00 2.63 0.00 0.00 85.26
average: all 13.31 0.00 5.66 0.00 0.00 81.03
- cpu 超线程是否开启
[gbase@gbase8c_1 ~]$ gs_check -i checkhyperthread
parsing the check items config file successfully
distribute the context file to remote hosts successfully
start to health check for the cluster. total items:1 nodes:3
checking... [=========================] 1/1
start to analysis the check result
checkhyperthread............................ng
the item run on 3 nodes. ng: 3
the ng[gbase8c_1.shawnyan.com,gbase8c_2.shawnyan.com,gbase8c_3.shawnyan.com] value:
hyper-threading is down.
analysis the check result successfully
failed. all check items run completed. total:1 ng:1
for more information please refer to /home/gbase/gbase_db/om_89583529/script/gspylib/inspection/output/checkreport_202303195907865608.tar.gz
13. 使用 gbase 8c 提供的 gs_checkperf 工具检查数据库性能状态
[gbase@gbase8c_1 ~]$ gs_checkperf -i pmk -u gbase
cluster statistics information:
host cpu busy time ratio : 10.04 %
mppdb cpu time % in busy time : 100.00 %
shared buffer hit ratio : 99.39 %
in-memory sort ratio : 100.00 %
physical reads : 3191
physical writes : 769
db size : 153 mb
total physical writes : 357
active sql count : 4
session count : 13
[gbase@gbase8c_1 ~]$
14. 查看各组件日志
- gtm
cd /home/gbase/gbase_db/log/pg_log/gtm1
tailf postgresql-2023-03-16_000000.log
- cn
cd /home/gbase/gbase_db/log/pg_log/cn1
tailf postgresql-2023-03-16_000000.log
- dn
cd /home/gbase/gbase_db/log/pg_log/dn1_1
tailf postgresql-2023-03-16_000000.log
15. 查看审计日志
- cn 节点
日志在 /home/gbase/gbase_db/log/pg_audit/cn1 目录下,日志会按时间字段轮转,需要按需查找。
示例如下,
16. 查看 gbase 相关服务
使用 systemctl 查看当前运行的 gbase 服务
systemctl list-units | grep -i gbase
-
gtm/gha/cn
-
dn
17. 启停数据节点
gha_ctl start datanode -l http://192.168.8.31:2379 group_name dn2_1
gha_ctl stop datanode -l http://192.168.8.31:2379 group_name dn2_1
systemctl start datanode_gbase_dn2_1
systemctl stop datanode_gbase_dn2_1
常见问题
1. 80000209 – ntpd 同步异常
- 现象:
安装时报错:
gbase@gbase8c 1 script$ ./gha ctl install -c gbase -p /home/gbase/gbase package
"ret":80000209,'msg":"timestamp diff too largel
gbase@gbase8c 1 script$
- 分析:
从各节点同时取时间戳,差距较大,大于 2.5。时间不同步,可能会导致全局时间戳 tso 失效。
- 解决:
在三个节点分别检查 ntpd 服务的同步情况:
[gbase@gbase8c_1 ~]$ timedatectl
local time: mon 2023-03-20 21:13:02 cst
universal time: mon 2023-03-20 13:13:02 utc
rtc time: mon 2023-03-20 13:13:02
time zone: asia/shanghai (cst, 0800)
ntp enabled: no
ntp synchronized: no
rtc in local tz: no
dst active: n/a
[gbase@gbase8c_1 ~]$ systemctl status ntpd
● ntpd.service - network time service
loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
active: active (running) since wed 2023-03-15 00:45:31 cst; 5 days ago
main pid: 773 (ntpd)
tasks: 1
cgroup: /system.slice/ntpd.service
└─773 /usr/sbin/ntpd -u ntp:ntp -g
同时在三个节点执行命令 (date %s.%n
),查看是否存在时间差。
如 ntpd 同步异常,可以将三个节点统一修改同步源:
sudo mv /etc/ntp.conf /etc/ntp.conf.bak0320
sudo vi /etc/ntp.conf
driftfile /var/lib/ntp/drift
pidfile /var/run/ntpd.pid
logfile /var/log/ntp.log
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
server 127.127.1.0
fudge 127.127.1.0 stratum 10
server ntp.aliyun.com iburst minpoll 4 maxpoll 10
restrict ntp.aliyun.com nomodify notrap nopeer noquery
sudo systemctl restart ntpd
systemctl status ntpd
timedatectl
等 5s 再次检查后,发现 ntpd 同步成功:
[gbase@gbase8c_1 ~]$ timedatectl
local time: mon 2023-03-20 21:34:06 cst
universal time: mon 2023-03-20 13:34:06 utc
rtc time: mon 2023-03-20 13:34:06
time zone: asia/shanghai (cst, 0800)
ntp enabled: yes
ntp synchronized: yes
rtc in local tz: no
dst active: n/a
[gbase@gbase8c_1 ~]$
再次检查时间差,发现小于 1:
再次执行安装命令,即可安装成功!
ps. 特殊情况下,如果是内网,还没有 ntp 服务器的情况下,有一种取巧的办法可以快速使三个节点时间保持一致,
三个节点同时执行命令:
sudo timedatectl set-ntp no ; sudo timedatectl set-time "2023-03-20 21:18:45" ; sudo timedatectl set-ntp yes ; date %s.%n
2. gsql: command not found…
- 现象:
安装完成后,直接使用 gsql 提示命令未找到。同:快速搭建 gbase 8c 集群环境 – 异常错误2
[gbase@gbase8c_1 script]$ gsql -d postgres -p 5432
bash: gsql: command not found...
- 原因:
环境变量未生效。
- 解决:
退出当前用户,再重新进入 gbase 用户即可。或直接引入环境变量 (source ~/.bashrc
)。
3. 当前版本不支持 redhat
从配置文件 /home/gbase/gbase_package/package_info.json
可以看出当前版本的 gbase 8c 支持 centos、麒麟、统信、欧拉、debian 操作系统,但尚未适配 redhat。
4. 真的有个包叫 patch
- 现象:
安装时遇到报错:
host localhost install or upgrade dependency ['patch': none] failed!
- 解决:
安装依赖包patch
即可解决!
sudo yum install -y patch
该程序将 diff 文件应用于原始文件。diff 命令用于比较原始文件和已更改的文件。diff列出了对文件所做的更改。
其他所需依赖包可参考: 快速搭建 gbase 8c 集群环境 – 确认依赖包已经安装
5. “failed to obtain host name.”
- 现象:
host name 获取失败。
-
分析:
用命令hostname
采集主机名时失败,可能是 hostname 设定问题,也可能是 gbase.yaml 中的 ip 配置项错填。 -
解决:
hostnamectl
重新检查主机名。- 仔细检查 gbase.yaml 中的各个 ip 是否正确填写。
6. semmni 信号量不足问题
-
现象:
安装过程中可能报错提示 “check install env and os setting on systemwide basis, the maximum number of semmni is not correct. the current semmni value is: 128. please check it…” -
解决:
执行sudo vi /etc/sysctl.conf
配置系统内核参数, 避免信号量不足无法初始化。
kernel.sem = 40960 2048000 40960 20480
7. 机器重启后可能会遇到 80000301 错误
错误样式:
$ gha_ctl monitor -l http://192.168.8.31:2379
{
"ret":80000301,
"msg":"transport endpoint unreach"
}
分析原因:
- 机器是否使用固定ip,不要使用自动分配ip,ip地址变化会影响服务正常启动。
- 确认时间是否同步,三台机器是否有时间差。确认ntpd服务运行情况。
- 确认etcd服务运行情况。
[gbase@gbase8c_1 ~]$ systemctl status etcd
● etcd.service - etcd server
loaded: loaded (/usr/lib/systemd/system/etcd.service; enabled; vendor preset: disabled)
active: active (running) since wed 2023-03-15 00:53:29 cst; 2 weeks 0 days ago
main pid: 9095 (etcd)
cgroup: /system.slice/etcd.service
└─9095 /usr/bin/etcd --name=node_0 --data-dir=/var/lib/etcd/default.etcd --listen-client-urls=http://192.168.8.31:2379
[gbase@gbase8c_1 ~]$
9999. 其他可能存在的问题
- 互信未正确配置
解决办法:
检查或重新配置互信:参考 快速搭建 gbase 8c 集群环境 – 3. 配置 gbase 用户 ssh 互信
- 安装使用的 gbase.yaml 文件存在格式问题(如,对齐问题,漏、错填配置项)
ghaexecuteerror[status.enotexist.code()], status.enotexist.msg(config_file))