[TROUBLESHOOT] ceph-mon: can't start daemon
Documentation | |
---|---|
Name: | [TROUBLESHOOT] ceph-mon: can't start daemon |
Description: | how to solve this “issue” |
Modification date : | 04/06/2020 |
Owner: | dodger |
Notify changes to: | Owner |
Tags: | ceph, object storage |
Escalate to: | Thefuckingbofh |
The errors
On mon server
This is a summary; the full stack trace is longer.
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos(bool*)' thread 7efccfa0e040 time 2020-06-04 10:30:26.887956
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/AuthMonitor.cc: 278: FAILED ceph_assert(ret == 0)
ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7efcc6c7e875]
2: (()+0x253a3d) [0x7efcc6c7ea3d]
3: (AuthMonitor::update_from_paxos(bool*)+0x1b0a) [0x555ef6812f3a]
4: (PaxosService::refresh(bool*)+0x103) [0x555ef68a63a3]
5: (Monitor::refresh_from_paxos(bool*)+0x194) [0x555ef6794514]
6: (Monitor::init_paxos()+0xfc) [0x555ef67947ec]
7: (Monitor::preinit()+0xa32) [0x555ef67b3532]
8: (main()+0x23e2) [0x555ef674cfc2]
9: (__libc_start_main()+0xf5) [0x7efcc2854555]
10: (()+0x2332d0) [0x555ef677e2d0]
*** Caught signal (Aborted) **
in thread 7efccfa0e040 thread_name:ceph-mon
2020-06-04 10:30:26.888 7efccfa0e040 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos(bool*)' thread 7efccfa0e040 time 2020-06-04 10:30:26.887956
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mon/AuthMonitor.cc: 278: FAILED ceph_assert(ret == 0)
Keywords are:
'virtual void AuthMonitor::update_from_paxos(bool*)'
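To confirm that a monitor log shows this same crash, the two markers from the trace can be searched for as fixed strings. This is only a sketch: `has_authmon_assert` is a hypothetical helper name, and the log path in the usage note is the usual default, which may differ on your hosts.

```shell
# Hypothetical helper (not part of Ceph): succeeds when the log file given
# as $1 contains both markers of the crash shown above.
has_authmon_assert() {
    # -F makes grep treat the pattern as a fixed string, so "(", ")" and "*"
    # need no escaping; -q suppresses output and only sets the exit status.
    grep -qF "AuthMonitor::update_from_paxos(bool*)" "$1" &&
        grep -qF "FAILED ceph_assert(ret == 0)" "$1"
}
```

Possible usage on the affected mon server, assuming the default log location: `has_authmon_assert /var/log/ceph/ceph-mon.bvmlm-osm-002.log && echo "same assert"`.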
On ceph health
mon: 5 daemons, quorum bvmlm-osm-001,bvmlm-osm-003,bvmlm-osm-004,bvmlm-osm-005 (age 2d), out of quorum: bvmlm-osm-002
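The name of the monitor that dropped out can be pulled from that line with a small filter. A minimal sketch, assuming the status text has the same shape as the line above; `out_of_quorum_mons` is a hypothetical helper name.

```shell
# Hypothetical helper: reads cluster status text on stdin and prints whatever
# follows "out of quorum:" on the mon line. Prints nothing when no monitor
# is reported out of quorum.
out_of_quorum_mons() {
    sed -n 's/.*out of quorum: *//p'
}
```

Possible usage from an admin node: `ceph -s | out_of_quorum_mons`, which for the status line above would print `bvmlm-osm-002`.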
The solution
Re-deploy the failed monitor from any admin node:
ceph-deploy mon destroy bvmlm-osm-002
ceph-deploy mon create bvmlm-osm-002.ciberterminal.net
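Before destroying a monitor it is worth checking that the remaining ones still form a strict majority, since the cluster needs one to keep quorum. The sketch below only prints the two commands from this page after that check; `has_majority` and `redeploy_plan` are hypothetical helper names, and the counts are passed in by hand rather than read from the cluster.

```shell
# Hypothetical helper: succeeds when $2 monitors in quorum are a strict
# majority of the $1 monitors known to the cluster.
has_majority() {
    [ "$2" -gt $(( $1 / 2 )) ]
}

# Hypothetical helper: prints the re-deploy commands instead of running them.
# $1 = total monitors, $2 = monitors in quorum, $3 = short host name, $4 = FQDN
redeploy_plan() {
    if ! has_majority "$1" "$2"; then
        echo "refusing: only $2 of $1 monitors are in quorum" >&2
        return 1
    fi
    echo "ceph-deploy mon destroy $3"
    echo "ceph-deploy mon create $4"
}
```

For the cluster on this page, `redeploy_plan 5 4 bvmlm-osm-002 bvmlm-osm-002.ciberterminal.net` prints the two commands, because 4 of 5 monitors are still in quorum.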
The Reason
Found on: https://access.redhat.com/solutions/4721981
Quote from there:
It is likely that monitor store.db is corrupted and hence asserts are happening.
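If the suspect store is wanted for later inspection, its location can be derived from the monitor ID. This is a sketch under assumptions: it uses the default mon data layout `/var/lib/ceph/mon/<cluster>-<id>`, assumes the monitor ID equals the short host name (as is usual with ceph-deploy), and `mon_store_path` is a hypothetical helper name.

```shell
# Hypothetical helper: prints the default store.db path for a monitor.
# $1 = monitor ID, $2 = cluster name (defaults to "ceph").
# A custom "mon data" setting in ceph.conf would make this path wrong.
mon_store_path() {
    cluster="${2:-ceph}"
    printf '/var/lib/ceph/mon/%s-%s/store.db\n' "$cluster" "$1"
}
```

Possible usage on the mon server: `du -sh "$(mon_store_path bvmlm-osm-002)"` to see the size of the store before deciding whether to copy it somewhere.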
linux/ceph/troubleshooting/monitor_crash.txt · Last modified: 2022/02/11 11:36 by 127.0.0.1