零基础OceanBase数据库入门(7):集群性能诊断

张开发
2026/5/4 8:32:48 15 分钟阅读
零基础OceanBase数据库入门(7):集群性能诊断
上一篇我们掌握了sys租户的三种连接方式作为集群的“总管理员”sys租户最核心的能力之一就是集群性能诊断。不管是日常巡检、故障排查还是容量规划都离不开性能诊断。本篇完全基于OceanBase官方demo带你用黑屏命令快速诊断集群CPU、内存、磁盘、IO四大核心指标一键定位高负载租户、异常会话、合并转储状态新手照着敲就能用。一、先搞懂性能诊断看什么OceanBase集群性能核心看4大核心资源CPU是否过载、哪个租户占比高内存MemStore占用、是否触发冻结、各模块内存分布磁盘数据盘/日志盘使用率、租户磁盘占用IO/后台任务合并、转储、备份、导入导出是否影响性能所有诊断操作都在sys租户下执行先登录# 登录sys租户密码替换为你的obclient -h127.0.0.1 -P2881 -urootsys -pOwoAyDN2uLeDIdn62gmn -Doceanbase -A二、核心诊断1服务器资源总览最常用直接查看OBServer的CPU、内存、数据盘、日志盘分配与使用率一眼看懂集群负载。-- 查看节点资源总览CPU/内存/磁盘使用率select svr_ip as ip, CONCAT(cpu_capacity_max, C) as cpu_total, CONCAT(cpu_assigned_max, C) as cpu_assigned, CONCAT(TRUNCATE(mem_capacity / 1024 / 1024 / 1024, 2), G) as mem_total, CONCAT(TRUNCATE(mem_assigned / 1024 / 1024 / 1024, 2), G) as mem_assigned, CONCAT(TRUNCATE(data_disk_capacity / 1024 / 1024 / 1024, 2),G) as total_data_size, CONCAT(TRUNCATE(log_disk_capacity / 1024 / 1024 / 1024, 2), G) as total_log_size, CONCAT(TRUNCATE((cpu_assigned_max / cpu_capacity_max) * 100, 2),%) as cpu_assigned_percent, CONCAT(TRUNCATE((mem_assigned / mem_capacity) * 100, 2),%) as mem_assigned_percent, CONCAT(TRUNCATE((data_disk_in_use / data_disk_capacity) * 100, 2),%) as used_data_percent, CONCAT(TRUNCATE((log_disk_in_use / log_disk_capacity) * 100, 2),%) as used_log_percentfrom oceanbase.GV$OB_SERVERS;结果解读:cpu_assigned_percentCPU分配率过高会导致业务卡顿mem_assigned_percent内存分配率超过80%需警惕used_data_percent数据盘使用率超过85%必须扩容used_log_percent日志盘使用率过高会影响写入三、核心诊断 2磁盘使用率细分排查3.1 数据盘使用详情-- 查看数据盘总使用情况select CONCAT(TRUNCATE(data_disk_capacity/1024/1024/1024,2),G) as total_data_size, CONCAT(TRUNCATE(data_disk_in_use/1024/1024/1024,2),G) as used_data_size, CONCAT(TRUNCATE((data_disk_in_use/data_disk_capacity) * 100, 2),%) as used_data_percent, CONCAT(TRUNCATE((data_disk_capacity - data_disk_in_use)/1024/1024/1024,2),G) as free_data_size, CONCAT(TRUNCATE(((data_disk_capacity - data_disk_in_use)/data_disk_capacity)*100,2),%) as free_data_percentfrom oceanbase.GV$OB_SERVERS;3.2 日志盘使用详情-- 查看日志盘总使用情况select CONCAT(TRUNCATE(log_disk_capacity/1024/1024/1024,2),G) as total_log_size, CONCAT(TRUNCATE(log_disk_in_use/1024/1024/1024,2),G) as used_log_size, CONCAT(TRUNCATE(log_disk_assigned/1024/1024/1024,2),G) as assigned_log_size, CONCAT(TRUNCATE((log_disk_in_use/log_disk_capacity) * 100, 2),%) as used_log_percent, CONCAT(TRUNCATE((log_disk_capacity - log_disk_assigned - log_disk_in_use)/1024/1024/1024,2),G) as free_log_size, CONCAT(TRUNCATE(((log_disk_capacity - log_disk_assigned - log_disk_in_use)/log_disk_capacity)*100,2),%) as free_log_percentfrom oceanbase.GV$OB_SERVERS;磁盘高负载优化建议:数据盘满扩容磁盘、清理历史数据、分区归档日志盘满清理过期日志、调整日志保留策略四、核心诊断3定位高CPU占用租户CPU高是最常见故障用这条SQL一键定位耗CPU的租户每100value1核CPU。-- 查找CPU占用超过10核的租户value1000select * from gv$sysstat where con_id1000 and name cpu usage and value 1000;CPU高负载优化建议优化高消耗SQL对SQL做限流扩容租户CPU配额错峰执行批处理任务五、核心诊断4租户级资源占用排行查看所有租户的CPU、内存、磁盘占用定位资源大户。-- 查看各租户资源使用详情SELECT COALESCE(t1.TENANT_ID, -1) as TENANT_ID, TENANT_NAME, SUM(MAX_CPU) AS MAX_CPU, SUM(MIN_CPU) AS MIN_CPU, SUM(MAX_MEMORY) AS MAX_MEMORY, SUM(MIN_MEMORY) AS MIN_MEMORY, SUM(MAX_IOPS) AS MAX_IOPS, SUM(MIN_IOPS) AS MIN_IOPS, SUM(DATA_DISK_IN_USE) AS DATA_DISK_IN_USE, SUM(LOG_DISK_IN_USE) AS LOG_DISK_IN_USEFROM ( SELECT T1.UNIT_ID, T1.SVR_IP, T1.SVR_PORT, T1.ZONE, T2.TENANT_ID, T1.MIN_CPU, T1.MAX_CPU, T1.MIN_MEMORY, T1.MAX_MEMORY, T1.MIN_IOPS, T1.MAX_IOPS, T1.DATA_DISK_IN_USE, T1.LOG_DISK_IN_USE FROM (SELECT UNIT_ID,SVR_IP,SVR_PORT,ZONE,SUM(MIN_CPU) MIN_CPU,SUM(MAX_CPU) MAX_CPU, SUM(MEMORY_SIZE) MIN_MEMORY,SUM(MEMORY_SIZE) MAX_MEMORY,SUM(MIN_IOPS) MIN_IOPS, SUM(MAX_IOPS) MAX_IOPS,SUM(DATA_DISK_IN_USE) DATA_DISK_IN_USE, SUM(LOG_DISK_IN_USE) LOG_DISK_IN_USE FROM oceanbase.GV$OB_UNITS GROUP BY UNIT_ID) T1 JOIN oceanbase.DBA_OB_UNITS T2 ON T1.UNIT_ID T2.UNIT_ID ) t1JOIN oceanbase.DBA_OB_TENANTS t2 ON t1.TENANT_ID t2.TENANT_IDWHERE TENANT_TYPE IN (SYS, USER)GROUP BY TENANT_ID;六、核心诊断5内存使用深度排查6.1 MemStore内存使用MemStore是写入内存区域占比过高会触发冻结影响写入。-- 查看各租户MemStore使用情况select /* READ_CONSISTENCY(WEAK),query_timeout(100000000) */ TENANT_ID, SVR_IP, round(ACTIVE_SPAN / 1024 / 1024 / 1024, 2) ACTIVE_GB, round(MEMSTORE_USED / 1024 / 1024 / 1024, 2) TOTAL_GB, round(FREEZE_TRIGGER / 1024 / 1024 / 1024, 2) FREEZE_TRIGGER_GB, round(MEMSTORE_USED / FREEZE_TRIGGER * 100, 2) percent_trigger, round(MEMSTORE_LIMIT / 1024 / 1024 / 1024, 2) MEM_LIMIT_GBfrom V$OB_MEMSTOREwhere tenant_id 1000 or TENANT_ID 1order by tenant_id, TOTAL_GB desc;6.2 内存模块占用Top10-- 查看内存占用最大的10个模块select * from V$OB_MEMORY order by hold desc limit 10;内存高负载优化建议:扩大租户内存配额调高转储线程数加快内存释放开启写入限速避免内存暴涨应用侧降级批任务七、核心诊断6会话与连接排查7.1 各租户活跃会话数-- 查看各租户活跃会话数select case when cnt is null then 0 else cnt end as cnt, tenant_name, tenant_idfrom (select DBA_OB_TENANTS.tenant_name,DBA_OB_TENANTS.tenant_id,cnt from DBA_OB_TENANTS left join (select count(state ACTIVE OR NULL) cnt,tenant tenant_name from GV$OB_PROCESSLIST where svr_ip 127.0.0.1 and svr_port 2882 group by tenant) t1 on DBA_OB_TENANTS.tenant_name t1.tenant_name where DBA_OB_TENANTS.tenant_type META) t2;7.2 实时非休眠会话-- 查看活跃非睡眠会话select user,host,state,count(*)from GV$OB_PROCESSLISTwhere state SLEEPgroup by user,host,state order by host limit 50;八、核心诊断7合并与转储状态后台核心任务合并/转储是OB后台关键任务运行中会占用大量IO/CPU需排查状态。-- sys租户查看所有租户合并状态SELECT tenant_id, global_broadcast_scn AS broadcast_scn, is_error AS error, status, frozen_scn, last_scn, is_suspended AS suspend, info, start_time, last_finish_timeFROM CDB_OB_MAJOR_COMPACTION;合并高负载优化建议暂停正在执行的合并暂停备份、导入导出任务调低转储 / 迁移并发数错峰执行后台任务九、辅助诊断节点、资源池、RS任务9.1 查看OBServer运行状态SELECT zone, svr_ip, svr_port, sql_port AS inner_port, with_rootserver, UPPER(status) AS status, build_version, stop_time * 1000000 AS stop_time, start_service_time * 1000000 AS start_service_time, last_offline_time * 1000000 AS last_offline_timeFROM oceanbase.DBA_OB_SERVERS;9.2 查看资源池与Unit配置-- 查看资源池、Unit规格详情SELECT t1.MODIFY_TIME AS UPDATE_TIME, t1.RESOURCE_POOL_ID, t1.NAME, t1.UNIT_COUNT, t1.ZONE_LIST, t1.TENANT_ID, t2.MAX_CPU, t2.MIN_CPU, CONCAT(TRUNCATE(t2.MEMORY_SIZE/1024/1024/1024,2),G) MEM_SIZE, CONCAT(TRUNCATE(t2.LOG_DISK_SIZE/1024/1024/1024,2),G) LOG_DISK_SIZEFROM oceanbase.DBA_OB_RESOURCE_POOLS t1JOIN oceanbase.DBA_OB_UNIT_CONFIGS t2 ON t1.UNIT_CONFIG_ID t2.UNIT_CONFIG_ID;十、小结性能诊断标准化流程新手做OB性能诊断按这个流程来快速定位问题看总览执行资源总览SQL看CPU/内存/磁盘使用率找租户定位高负载租户确认是哪个业务导致查内存看MemStore是否占满是否触发冻结查会话看是否有大量活跃会话、慢查询查后台确认合并/转储/备份是否在运行做优化对应官方建议扩容、限流、优化SQL往期回顾零基础OceanBase数据库入门搭建你的第一个OceanBase数据库零基础OceanBase数据库入门(2查看集群基本信息零基础OceanBase数据库入门3创建租户零基础OceanBase数据库入门4创建MySQL模式数据库零基础OceanBase数据库入门5MySQL模式用户创建与权限管理零基础OceanBase数据库入门6连接OB集群

更多文章