记Solaris下一个rac 异常hang故障

news/2024/7/8 0:12:12

故障现象

rac 某一节点hang住,另一节点也不可用,重启hang住节点恢复。该故障出现了多次,平均1月出现一次。

故障原因

查看cssd.log

2021-05-22 13:53:50.565: [GIPCXCPT][5] gipclibMalloc: failed to allocate 10376 bytes, cowork ffffffff7cae18e8, ret gipcretOutOfMemory (28)
2021-05-22 13:53:50.566: [GIPCXCPT][5] gipcmodNetworkAttrEndpUserData: failed to read osd id for endp 104f9c390 [00000000095fea12] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_hnyx-db1_)(GIPCID=00000000-00000000-1516))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_hnyx-db1_)(GIPCID=00000000-00000000-0))', numPend 0, numReady 0, numDone 1, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef 100b84550, ready 1, wobj 104f35490, sendp 104e50050flags 0x8060371e, usrFlags 0x14000 }
2021-05-22 13:53:50.566: [GIPCXCPT][5] gipcmodNetworkAttrEndpUserData: slos op  :  sgipcnDSAttrEndpUserData
2021-05-22 13:53:50.566: [GIPCXCPT][5] gipcmodNetworkAttrEndpUserData: slos dep :  Operation not supported (48)
2021-05-22 13:53:50.566: [GIPCXCPT][5] gipcmodNetworkAttrEndpUserData: slos loc :  getpeerucred
2021-05-22 13:53:50.566: [GIPCXCPT][5] gipcmodNetworkAttrEndpUserData: slos info:  sid 0, failed to get creds
2021-05-22 13:53:50.585: [    CSSD][5]###################################
2021-05-22 13:53:50.585: [    CSSD][5]clssscExit: CSSD signal 11 in thread GMClientListener
2021-05-22 13:53:50.585: [    CSSD][5]###################################
2021-05-22 13:53:50.585: [    CSSD][5](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2021-05-22 13:53:50.586: [    CSSD][5]

----- Call Stack Trace -----
2021-05-22 13:53:50.586: [    CSSD][5]calling              call     entry                argument values in hex
2021-05-22 13:53:50.586: [    CSSD][5]location             type     point                (? means dubious value)
2021-05-22 13:53:50.586: [    CSSD][5]-------------------- -------- -------------------- ----------------------------
2021-05-22 13:53:50.635: [    CSSD][5]mmap(offset=3137536, len=8192) failed with errno=11 for the file /export/home/grid/bin/ocssd.bin
2021-05-22 13:53:50.636: [    CSSD][5]mmap(offset=3137536, len=8192) failed with errno=11 for the file /export/home/grid/bin/ocssd.bin
2021-05-22 13:53:50.636: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.636: [    CSSD][5]mmap(offset=50946048, len=16384) failed with errno=11 for the file /export/home/grid/lib/libclntsh.so.11.1
2021-05-22 13:53:50.636: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.637: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.637: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.637: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.637: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.637: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.637: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.637: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.638: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.638: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.638: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.638: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.638: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.638: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.639: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.639: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.639: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so
2021-05-22 13:53:50.639: [    CSSD][5]mmap(offset=16818176, len=8192) failed with errno=11 for the file /export/home/grid/lib/libhasgen11.so

注意:2021-05-22 13:53:50.565: [GIPCXCPT][5] gipclibMalloc: failed to allocate 10376 bytes, cowork ffffffff7cae18e8, ret gipcretOutOfMemory (28)

对比对比故障现象,查找mos最接近为Document 2113841.1,gipcd stack内存不足。

但是 Document 2113841.1是aix环境。该环境为solaris。决定死马当活马医。

解决办法

Document 2113841.1文档中该故障解决为,解除相关limits的限制,包括grid与root用户

查询到root下stack的值偏小(8192),不是无限制,建议对其进行修改

故障解决,未再出现。

学习原理,积累工具。孵化思路,下笔有道。


http://lihuaxi.xjx100.cn/news/1745784.html

相关文章

IDEA运行报错Command line is too long

报错内容: Error running ServiceStarter: Command line is too long. Shorten command line for ServiceStarter or also for Application default configuration. 解决: 1. 2. 3.

软件测试 —— 移动端测试

1. 移动端 指移动设备(如智能手机、平板电脑、智能手表等)上的操作系统和应用程序。移动设备具有便携性和多功能性,可以随时随地连接互联网,提供丰富的应用和服务。 2. 移动端应用分类 (1) 原生应用(Native App&…

论文 辅助笔记:t2vec train.py

1 train 1.1 加载training和validation数据 def train(args):logging.basicConfig(filenameos.path.join(args.data, "training.log"), levellogging.INFO)设置了日志的基本配置。将日志信息保存到名为 "training.log" 的文件中日志的级别被设置为 INFO&…

[双指针] (四) LeetCode 18.四数之和

[双指针] (四) LeetCode 18.四数之和 文章目录 [双指针] (四) LeetCode 18.四数之和题目解析解题思路代码实现总结 18. 四数之和 题目解析 (1) 从一个数组中找一个目标值target (2) target nums[a] nums[b] nums[c] nums[d] 解题思路 和上一道题三数之和一样, 我们把四…

Linux 中 initcall 机制详解

源码基于:Linux 5.4 0. 前言 Linux 对驱动程序提供静态编译进内核和动态加载两种方式,当采用静态方式时,开发者如果想要在系统中启动这个驱动通常调用类似 xxx_init() 接口。 最直观的做法:开发者试图添加一个驱动初始化程序时&…

【机器学习】几种常用的机器学习调参方法

在机器学习中,模型的性能往往受到模型的超参数、数据的质量、特征选择等因素影响。其中,模型的超参数调整是模型优化中最重要的环节之一。超参数(Hyperparameters)在机器学习算法中需要人为设定,它们不能直接从训练数据…

flex容器布局flex-basis实现button挪动

flex容器布局 背景介绍 我准备完成一个弹性盒子的布局&#xff0c;html如下&#xff1a; <div class"slideContainer"> <h1>选择时间段</h1> <br/> <br/> <p class"priceTxt">时间范围:<span id"rangeText…

提示3D标题编辑器仍在运行如何解决 3D标题编辑器怎么使用

品牌型号&#xff1a;联想GeekPro 2020 系统&#xff1a;Windows 10 64位专业版 软件版本&#xff1a;会声会影2023旗舰版 3D标题因其独特的表现形式和多变的画面效果&#xff0c;被广泛应用于节目片头、宣传片、开幕式等诸多场景之中。掌握3D标题的使用技巧&#xff0c;能够…