Background
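While connecting Kubernetes to Ceph through ceph-csi, I deployed everything following the official Ceph procedure. After running kubectl apply -f pvc.yaml, the related pods stayed in the Pending state indefinitely.
Running
kubectl describe pvc
shows the cause of the failure. Among the events is: failed to get connection: connecting failed: rados: ret=13, Permission denied
The provisioner log can be viewed with:
kubectl logs csi-rbdplugin-provisioner-7695454cb7-7zp5g -c csi-rbdplugin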
I1124 06:54:18.856548 1 utils.go:159] ID: 93 Req-ID: pvc-0bde08db-5166-4439-a54e-31b405c98e40 GRPC call: /csi.v1.Controller/CreateVolume
I1124 06:54:18.858088 1 utils.go:160] ID: 93 Req-ID: pvc-0bde08db-5166-4439-a54e-31b405c98e40 GRPC request: {"capacity_range":{"required_bytes":1073741824},"name":"pvc-0bde08db-5166-4439-a54e-31b405c98e40","parameters":{"clusterID":"f38a95d3-b932-480c-89bc-161a4f81c160","imageFeatures":"layering","pool":"kubernetes"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Block":{}},"access_mode":{"mode":1}}]}
I1124 06:54:18.858323 1 rbd_util.go:722] ID: 93 Req-ID: pvc-0bde08db-5166-4439-a54e-31b405c98e40 setting disableInUseChecks on rbd volume to: false
E1124 06:54:18.868348 1 controllerserver.go:234] ID: 93 Req-ID: pvc-0bde08db-5166-4439-a54e-31b405c98e40 failed to connect to volume : failed to get connection: connecting failed: rados: ret=13, Permission denied
E1124 06:54:18.868449 1 utils.go:163] ID: 93 Req-ID: pvc-0bde08db-5166-4439-a54e-31b405c98e40 GRPC error: rpc error: code = Internal desc = failed to get connection: connecting failed: rados: ret=13, Permission denied
Note that csi-rbdplugin-provisioner runs as several pods, and the log lines above may show up in any one of them, so do not check only a single pod.
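Since the error may land in any provisioner pod, the check above can be looped over all of them. This is a minimal sketch, not part of the original setup: the function names are made up for illustration, and the label selector app=csi-rbdplugin-provisioner is an assumption based on the upstream ceph-csi manifests, so adjust it (and add -n with your namespace) if your deployment differs.

```shell
#!/bin/sh

# Keep only the log lines that carry the rados permission error (reads stdin).
find_denied() {
    grep -F 'rados: ret=13, Permission denied'
}

# Print matching log lines for every provisioner pod.
# ASSUMPTION: the pods carry the label app=csi-rbdplugin-provisioner.
scan_provisioner_pods() {
    for pod in $(kubectl get pods -l app=csi-rbdplugin-provisioner -o name); do
        echo "== $pod"
        kubectl logs "$pod" -c csi-rbdplugin | find_denied
    done
}
```

Calling scan_provisioner_pods prints a header per pod followed by any matching lines; a pod with no lines under its header simply did not handle the failing request.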
Cause
The Ceph cluster in this installation was running the following version:
[root@node1 ceph]# ceph --version
ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
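Kubernetes version: 1.17.4
ceph-csi version: 3.0.0
After installing the Ceph cluster, I ran ceph -s and saw this warning: mon is allowing insecure global_id reclaim
I looked up how to clear it and ran:
ceph config set mon auth_allow_insecure_global_id_reclaim false
This forbids insecure global_id reclaim. The warning went away afterwards, but this very change is what stopped Kubernetes from connecting to the Ceph cluster: judging by the monitor log, the Ceph client used by ceph-csi still reclaims its global_id the insecure way and is now rejected.
The relevant Ceph log file is ceph-mon.node1.log in my case; the name follows the hostname, so it will differ on other installations. It contains a large number of entries like: cephx server client.kubernetes: attempt to reclaim global_id 13691 without presenting ticket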
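To see at a glance which clients the monitor is rejecting, the log entries quoted above can be counted per client. The following is a rough sketch; the function name reclaim_clients is hypothetical, and the pattern is derived only from the single log line shown in this post, so other Ceph versions may format the message differently.

```shell
#!/bin/sh

# Count "attempt to reclaim global_id" entries per client name.
# Reads the given log files, or stdin when no file is passed.
reclaim_clients() {
    grep -F 'attempt to reclaim global_id' "$@" |
        sed -n 's/.*cephx server \([^:]*\): *attempt to reclaim global_id.*/\1/p' |
        sort | uniq -c
}
```

For example, reclaim_clients /var/log/ceph/ceph-mon.node1.log would list each offending client with its count, assuming the log lives in the default /var/log/ceph directory. If client.kubernetes appears there, the ceph-csi user is the one being turned away.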
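Solution
Once the cause was clear, I allowed insecure global_id reclaim again:
ceph config set mon auth_allow_insecure_global_id_reclaim true
After that I applied the PVC again and the problem was resolved. Expect the ceph -s warning to come back with this setting.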
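When toggling this option it helps to read the value back right away, so you know what the monitors are actually using. Below is a small sketch along those lines; the wrapper name set_insecure_reclaim and the CEPH variable are illustrative only, and the two ceph config subcommands are the only real CLI calls involved.

```shell
#!/bin/sh

# Binary to invoke; overridable so the wrapper can be tried against a stub.
CEPH=${CEPH:-ceph}

# Set auth_allow_insecure_global_id_reclaim to $1 (true or false),
# then print the value the cluster reports for it.
set_insecure_reclaim() {
    "$CEPH" config set mon auth_allow_insecure_global_id_reclaim "$1" &&
        "$CEPH" config get mon auth_allow_insecure_global_id_reclaim
}
```

Running set_insecure_reclaim true corresponds to the fix described here, after which kubectl apply -f pvc.yaml can be retried.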