====== [TROUBLESHOOT] PG_DEGRADED: inactive ======
^ Documentation ^|
^Name:| [TROUBLESHOOT] PG_DEGRADED: inactive |
^Description:| How to resolve the ''PG_DEGRADED'' / inactive PGs health warning |
^Modification date:| 25/07/2019 |
^Owner:|dodger|
^Notify changes to:|Owner |
^Tags:|ceph, object storage |
^Escalate to:| The_fucking_bofh |
====== The errors ======
<code>
HEALTH_WARN Reduced data availability: 40 pgs inactive; Degraded data redundancy: 52656/2531751 objects degraded (2.080%), 30 pgs degraded, 780 pgs undersized
PG_AVAILABILITY Reduced data availability: 40 pgs inactive
    pg 24.1 is stuck inactive for 57124.776905, current state undersized+peered, last acting [16]
    pg 24.3 is stuck inactive for 57196.756183, current state undersized+peered, last acting [14]
    pg 24.15 is stuck inactive for 57196.769225, current state undersized+peered, last acting [6]
    pg 24.22 is stuck inactive for 57124.781368, current state undersized+peered, last acting [18]
    pg 24.2a is stuck inactive for 57124.776592, current state undersized+peered, last acting [16]
    pg 26.39 is stuck inactive for 57148.799116, current state undersized+peered, last acting [16]
    pg 27.13 is stuck inactive for 57148.794318, current state undersized+degraded+peered, last acting [10]
    pg 27.1c is stuck inactive for 57196.754097, current state undersized+degraded+peered, last acting [16]
    pg 27.22 is stuck inactive for 57124.769972, current state undersized+degraded+peered, last acting [10]
    ...
    ...
PG_DEGRADED Degraded data redundancy: 52656/2531751 objects degraded (2.080%), 30 pgs degraded, 780 pgs undersized
    pg 29.5b is stuck undersized for 57219.217454, current state active+undersized+remapped, last acting [6,14]
    pg 29.5c is stuck undersized for 57110.686713, current state active+undersized+remapped, last acting [12,2]
    pg 29.5d is stuck undersized for 57131.448252, current state active+undersized+remapped, last acting [8,10]
    pg 29.5e is stuck undersized for 57154.989293, current state active+undersized+remapped, last acting [14,18]
    pg 29.5f is stuck undersized for 57194.741017, current state active+undersized+remapped, last acting [6,16]
    pg 29.60 is stuck undersized for 57170.144684, current state active+undersized+remapped, last acting [0,10]
    pg 29.63 is stuck undersized for 57147.771698, current state active+undersized+remapped, last acting [10,0]
    ...
    ...
</code>
<code>
avmlp-osm-001 /var/log/ceph # ceph -s
  cluster:
    id:     aefcf554-f949-4457-a049-0bfb432e40c4
    health: HEALTH_WARN
            Reduced data availability: 40 pgs inactive
            Degraded data redundancy: 52656/2531751 objects degraded (2.080%), 30 pgs degraded, 780 pgs undersized

  services:
    mon: 6 daemons, quorum avmlp-osm-001,avmlp-osm-002,avmlp-osm-003,avmlp-osm-004,avmlp-osm-006,avmlp-osm-005 (age 22h)
    mgr: avmlp-osm-002.ciberterminal.net(active, since 23h), standbys: avmlp-osm-004.ciberterminal.net, avmlp-osm-003.ciberterminal.net, avmlp-osm-001.ciberterminal.net
    mds: cephfs:1 {0=avmlp-osfs-002.ciberterminal.net=up:active} 3 up:standby
    osd: 20 osds: 20 up (since 15h), 20 in (since 6w); 1132 remapped pgs
    rgw: 1 daemon active (avmlp-osgw-004.ciberterminal.net)

  data:
    pools:   10 pools, 1232 pgs
    objects: 843.92k objects, 49 GiB
    usage:   264 GiB used, 40 TiB / 40 TiB avail
    pgs:     3.247% pgs not active
             52656/2531751 objects degraded (2.080%)
             1635162/2531751 objects misplaced (64.586%)
             740 active+undersized+remapped
             392 active+clean+remapped
             60  active+clean
             30  undersized+degraded+peered
             10  undersized+peered
</code>
Official Documentation: [[http://docs.ceph.com/docs/master/rados/operations/health-checks/#pg-degraded]]
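As a sanity check, the degraded percentage Ceph reports is simply the degraded object-copy count divided by the total number of object copies. Verifying the 2.080% figure from the output above:

```shell
# Reproduce Ceph's degraded percentage from the health output above:
# degraded object copies / total object copies * 100
awk 'BEGIN { printf "%.3f%%\n", 52656 / 2531751 * 100 }'
# prints: 2.080%
```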
====== The solution ======
Force Ceph to move the data around. I think this is not a real solution, just a workaround...\\
Change the pool ''size'' and ''min_size'', forcing Ceph to re-balance the data:
\\
See the current values:
<code>
ceph osd pool ls detail
</code>
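If you only need those two numbers per pool, they can be pulled out of that output with ''awk''. A minimal sketch; the sample line below is illustrative (shaped like a Nautilus-era ''ceph osd pool ls detail'' line), not taken from this cluster:

```shell
# Extract 'size' and 'min_size' from one 'ceph osd pool ls detail' line.
# The sample line is hypothetical; on a live cluster you would pipe the
# real command output through the same awk program.
line="pool 24 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0"
echo "$line" | awk '{ for (i = 1; i <= NF; i++)
                          if ($i == "size" || $i == "min_size")
                              printf "%s=%s\n", $i, $(i+1) }'
# prints: size=3
#         min_size=2
```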
\\
And increase them by 1, for example:
<code>
for POOL_NAME in $(ceph osd pool ls) ; do
    SIZE=$(ceph osd pool get ${POOL_NAME} size | awk '{print $2}')
    MINSIZE=$(ceph osd pool get ${POOL_NAME} min_size | awk '{print $2}')
    let SIZE++ ; let MINSIZE++
    echo "ceph osd pool set ${POOL_NAME} size ${SIZE}"
    echo "ceph osd pool set ${POOL_NAME} min_size ${MINSIZE}"
done
</code>
The loop above only //prints// the commands, it does not execute them :-P Review the output, then run the commands by hand.
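Note that the resize subcommand is ''ceph osd pool set'' (not ''ceph osd set''), and it accepts exactly one key per invocation, so ''size'' and ''min_size'' need two separate commands. A small helper that only //emits// those commands for review (the function name and example pool are made up for illustration):

```shell
# Print (not run) the two corrected commands for one pool.
# 'ceph osd pool set' takes a single key per call, hence two echoes.
emit_resize() {
    pool=$1 size=$2 min_size=$3
    echo "ceph osd pool set ${pool} size ${size}"
    echo "ceph osd pool set ${pool} min_size ${min_size}"
}

# Example: bump a hypothetical pool to size=3 / min_size=2:
emit_resize default.rgw.log 3 2
# prints: ceph osd pool set default.rgw.log size 3
#         ceph osd pool set default.rgw.log min_size 2
```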