====== [TROUBLESHOOT] PG_DEGRADED: inactive ======

^ Documentation ^|
^Name:| [TROUBLESHOOT] PG_DEGRADED: inactive |
^Description:| How to solve this "issue" |
^Modification date:| 25/07/2019 |
^Owner:| dodger |
^Notify changes to:| Owner |
^Tags:| ceph, object storage |
^Escalate to:| The_fucking_bofh |

====== The errors ======

<code>
HEALTH_WARN Reduced data availability: 40 pgs inactive; Degraded data redundancy: 52656/2531751 objects degraded (2.080%), 30 pgs degraded, 780 pgs undersized
PG_AVAILABILITY Reduced data availability: 40 pgs inactive
    pg 24.1 is stuck inactive for 57124.776905, current state undersized+peered, last acting [16]
    pg 24.3 is stuck inactive for 57196.756183, current state undersized+peered, last acting [14]
    pg 24.15 is stuck inactive for 57196.769225, current state undersized+peered, last acting [6]
    pg 24.22 is stuck inactive for 57124.781368, current state undersized+peered, last acting [18]
    pg 24.2a is stuck inactive for 57124.776592, current state undersized+peered, last acting [16]
    pg 26.39 is stuck inactive for 57148.799116, current state undersized+peered, last acting [16]
    pg 27.13 is stuck inactive for 57148.794318, current state undersized+degraded+peered, last acting [10]
    pg 27.1c is stuck inactive for 57196.754097, current state undersized+degraded+peered, last acting [16]
    pg 27.22 is stuck inactive for 57124.769972, current state undersized+degraded+peered, last acting [10]
    ...
    ...
PG_DEGRADED Degraded data redundancy: 52656/2531751 objects degraded (2.080%), 30 pgs degraded, 780 pgs undersized
    pg 29.5b is stuck undersized for 57219.217454, current state active+undersized+remapped, last acting [6,14]
    pg 29.5c is stuck undersized for 57110.686713, current state active+undersized+remapped, last acting [12,2]
    pg 29.5d is stuck undersized for 57131.448252, current state active+undersized+remapped, last acting [8,10]
    pg 29.5e is stuck undersized for 57154.989293, current state active+undersized+remapped, last acting [14,18]
    pg 29.5f is stuck undersized for 57194.741017, current state active+undersized+remapped, last acting [6,16]
    pg 29.60 is stuck undersized for 57170.144684, current state active+undersized+remapped, last acting [0,10]
    pg 29.63 is stuck undersized for 57147.771698, current state active+undersized+remapped, last acting [10,0]
    ...
    ...
</code>
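For reference (not part of the original capture): the per-PG listing above is the kind of output that ''ceph health detail'' produces, and ''ceph pg dump_stuck'' can narrow it down to only the stuck PGs. A minimal sketch, assuming you run it from a node with the admin keyring:

<code bash>
# one-line cluster health summary
ceph health

# same, plus one line per problematic PG (the listing shown above)
ceph health detail

# list only the PGs stuck in a given state
ceph pg dump_stuck inactive
ceph pg dump_stuck undersized
</code>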
Full cluster status:

<code>
avmlp-osm-001 /var/log/ceph # ceph -s
  cluster:
    id:     aefcf554-f949-4457-a049-0bfb432e40c4
    health: HEALTH_WARN
            Reduced data availability: 40 pgs inactive
            Degraded data redundancy: 52656/2531751 objects degraded (2.080%), 30 pgs degraded, 780 pgs undersized

  services:
    mon: 6 daemons, quorum avmlp-osm-001,avmlp-osm-002,avmlp-osm-003,avmlp-osm-004,avmlp-osm-006,avmlp-osm-005 (age 22h)
    mgr: avmlp-osm-002.ciberterminal.net(active, since 23h), standbys: avmlp-osm-004.ciberterminal.net, avmlp-osm-003.ciberterminal.net, avmlp-osm-001.ciberterminal.net
    mds: cephfs:1 {0=avmlp-osfs-002.ciberterminal.net=up:active} 3 up:standby
    osd: 20 osds: 20 up (since 15h), 20 in (since 6w); 1132 remapped pgs
    rgw: 1 daemon active (avmlp-osgw-004.ciberterminal.net)

  data:
    pools:   10 pools, 1232 pgs
    objects: 843.92k objects, 49 GiB
    usage:   264 GiB used, 40 TiB / 40 TiB avail
    pgs:     3.247% pgs not active
             52656/2531751 objects degraded (2.080%)
             1635162/2531751 objects misplaced (64.586%)
             740 active+undersized+remapped
             392 active+clean+remapped
             60  active+clean
             30  undersized+degraded+peered
             10  undersized+peered
</code>

Official Documentation: [[http://docs.ceph.com/docs/master/rados/operations/health-checks/#pg-degraded]]

====== The solution ======

Force Ceph to move the data around. I think this is not a real solution but a workaround...\\
Just change the pool ''size'' and ''min_size'', forcing Ceph to re-balance the data.\\

See the current values:
<code bash>
ceph osd pool ls detail
</code>

And increase them by 1, for example:
<code bash>
for POOL_NAME in $(ceph osd pool ls) ; do let SIZE=$(ceph osd pool get ${POOL_NAME} size|awk '{print $2}') ; let MINSIZE=$(ceph osd pool get ${POOL_NAME} min_size|awk '{print $2}') ; let SIZE++ ; let MINSIZE++ ; echo "ceph osd pool set ${POOL_NAME} size ${SIZE}" ; echo "ceph osd pool set ${POOL_NAME} min_size ${MINSIZE}" ; done
</code>

The one-liner-of-the-dead above only prints the commands to run; it does not execute them :-P
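If you want to actually apply the changes instead of only printing them, here is a minimal sketch (my own variant, not from the original note). It assumes the same admin environment as the one-liner above and that every pool can afford one extra replica; note that ''ceph osd pool set'' accepts one key per call, hence the two invocations:

<code bash>
# CAUTION: this version actually changes the pools, review the output of the
# echo-only one-liner above before running it.
for POOL_NAME in $(ceph osd pool ls) ; do
    SIZE=$(ceph osd pool get ${POOL_NAME} size | awk '{print $2}')
    MINSIZE=$(ceph osd pool get ${POOL_NAME} min_size | awk '{print $2}')
    let SIZE++
    let MINSIZE++
    echo "Setting ${POOL_NAME}: size=${SIZE} min_size=${MINSIZE}"
    ceph osd pool set ${POOL_NAME} size ${SIZE}
    ceph osd pool set ${POOL_NAME} min_size ${MINSIZE}
done
</code>

Keep a copy of the original values (from ''ceph osd pool ls detail'') so you can set them back once the PGs are ''active+clean'' again.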