Queries que se han de lanzar desde el sistema de monitorización para gestionar eventos “críticos”.
Estas queries significarían un CRITICAL
(24×7)
Bloqueos dentro de la base de datos (un update bloquea otro):
SELECT blocked_locks.pid AS blocked_pid, blocked_activity.usename AS blocked_user, blocking_locks.pid AS blocking_pid, blocking_activity.usename AS blocking_user, blocked_activity.query AS blocked_statement, blocking_activity.query AS current_statement_in_blocking_process, blocked_activity.application_name AS blocked_application, blocking_activity.application_name AS blocking_application FROM pg_catalog.pg_locks blocked_locks JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid AND blocking_locks.pid != blocked_locks.pid JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid WHERE NOT blocked_locks.GRANTED;
Hay que montorizar por umbrales (típica monitorización con OK/WARNING/CRITICAL):
WITH max_age AS ( SELECT 2000000000 AS max_old_xid , setting AS autovacuum_freeze_max_age FROM pg_catalog.pg_settings WHERE name = 'autovacuum_freeze_max_age' ) , per_database_stats AS ( SELECT datname , m.max_old_xid::INT , m.autovacuum_freeze_max_age::INT , age(d.datfrozenxid) AS oldest_current_xid FROM pg_catalog.pg_database d JOIN max_age m ON (TRUE) WHERE d.datallowconn ) SELECT MAX(oldest_current_xid) AS oldest_current_xid , MAX(ROUND(100*(oldest_current_xid/max_old_xid::FLOAT))) AS percent_towards_wraparound , MAX(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::FLOAT))) AS percent_towards_emergency_autovac FROM per_database_stats
Excelente explicación del problema aquí.
Hay que montorizar por umbrales (típica monitorización con OK/WARNING/CRITICAL):
SELECT SUM(numbackends) FROM pg_stat_database;
SELECT COUNT(datid) AS CURRENT, ( SELECT setting AS mc FROM pg_settings WHERE name = 'max_connections') AS mc, d.datname FROM pg_database d LEFT JOIN pg_stat_activity s ON (s.datid = d.oid) GROUP BY 2, 3 ORDER BY datname ;
Segundos de las queries en ejecución (típica monitorización con OK/WARNING/CRITICAL):
SELECT datname, datid, pid AS pid, usename, client_addr, query AS query, state AS state, CASE WHEN client_port < 0 THEN 0 ELSE client_port END AS client_port, COALESCE(ROUND(EXTRACT(epoch FROM now() - query_start)), 0) AS seconds FROM pg_stat_activity WHERE (query_start IS NOT NULL AND (state NOT LIKE 'idle%' OR state IS NULL)) AND usename <> 'barman' ORDER BY query_start, pid DESC;
Alertas que se deberían monitorzar 8×5 para su revisión como una tarea del día a día del DBA.
Básico para saber si hay alguna tabla de la que no se esté haciendo vacuum.
Esta query devuelve las tablas de las que no se ha hecho vacuum
en los últimos 7 días.
SELECT schemaname, relname, n_dead_tup, n_live_tup, autovacuum_count, last_vacuum, last_autovacuum, last_autoanalyze, last_analyze FROM pg_stat_all_tables WHERE n_live_tup > 0 AND schemaname NOT IN ('pg_catalog', 'information_schema', 'partman', 'cron') AND last_vacuum>NOW() - INTERVAL '7 days' ;
Para una alerta bastaría con saber schemaname.relname
.
Puede que la tabla tenga algún problema a analizar si el vacuum
está programado y no se está realizando.
Solo en caso de ejecutar tareas programadas con la extensión pg_cron
.
La query se ha de ejecutar sobre la BD configurada en el parámetro cron.database_name
de la extensión:
SELECT * FROM cron.job_run_details WHERE STATUS = 'failed';