# Postgresql: Basic Monitorization [ESP] # Monitorización recomendada (aka DBA monitorization) Queries que se han de lanzar desde el sistema de monitorización para gestionar eventos "críticos". ## Alta Prioridad -> 24*7 Estas queries significarían un `CRITICAL` (24x7) ### Locks Bloqueos dentro de la base de datos (un update bloquea otro): ```sql SELECT blocked_locks.pid AS blocked_pid, blocked_activity.usename AS blocked_user, blocking_locks.pid AS blocking_pid, blocking_activity.usename AS blocking_user, blocked_activity.query AS blocked_statement, blocking_activity.query AS current_statement_in_blocking_process, blocked_activity.application_name AS blocked_application, blocking_activity.application_name AS blocking_application FROM pg_catalog.pg_locks blocked_locks JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid AND blocking_locks.pid != blocked_locks.pid JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid WHERE NOT blocked_locks.GRANTED; ``` ### TXID Wraparound Hay que montorizar por umbrales (típica monitorización con OK/WARNING/CRITICAL): ```sql WITH max_age AS ( SELECT 2000000000 as max_old_xid , setting AS autovacuum_freeze_max_age FROM pg_catalog.pg_settings WHERE name = 'autovacuum_freeze_max_age' ) , per_database_stats AS ( SELECT datname , m.max_old_xid::int , m.autovacuum_freeze_max_age::int , age(d.datfrozenxid) AS oldest_current_xid FROM pg_catalog.pg_database d JOIN max_age m ON (true) WHERE d.datallowconn ) SELECT max(oldest_current_xid) AS oldest_current_xid , max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS percent_towards_wraparound , max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float))) AS percent_towards_emergency_autovac FROM per_database_stats ``` Excelente explicación del problema [aquí](https://www.crunchydata.com/blog/managing-transaction-id-wraparound-in-postgresql). ### Postgres Backend Connections Hay que montorizar por umbrales (típica monitorización con OK/WARNING/CRITICAL): #### Número de conexiones totales ```sql SELECT sum(numbackends) FROM pg_stat_database; ``` #### Número de conexiones detallado por bd ```sql SELECT COUNT(datid) AS CURRENT, ( SELECT setting AS mc FROM pg_settings WHERE name = 'max_connections') AS mc, d.datname FROM pg_database d LEFT JOIN pg_stat_activity s ON (s.datid = d.oid) GROUP BY 2, 3 ORDER BY datname ; ``` ### Query time Segundos de las queries en ejecución (típica monitorización con OK/WARNING/CRITICAL): ```sql SELECT datname, datid, pid AS pid, usename, client_addr, query AS query, state AS state, CASE WHEN client_port < 0 THEN 0 ELSE client_port END AS client_port, COALESCE(ROUND(EXTRACT(epoch FROM now() - query_start)), 0) AS seconds FROM pg_stat_activity WHERE (query_start IS NOT NULL AND (state NOT LIKE 'idle%' OR state IS NULL)) AND usename <> 'barman' ORDER BY query_start, pid DESC; ``` ## Media Prioridad -> 8*5 Alertas que se deberían monitorzar 8x5 para su revisión como una tarea del día a día del DBA. ### VACUUM Básico para saber si hay alguna tabla de la que no se esté haciendo vacuum. Esta query devuelve las tablas de las que no se ha hecho `vacuum` en los últimos 7 días. ```sql SELECT schemaname, relname, n_dead_tup, n_live_tup, autovacuum_count, last_vacuum, last_autovacuum, last_autoanalyze, last_analyze FROM pg_stat_all_tables WHERE n_live_tup > 0 and schemaname not in ('pg_catalog', 'information_schema', 'partman', 'cron') and last_vacuum>NOW() - interval '7 days' ; ``` Para una alerta bastaría con saber `schemaname.relname`. Puede que la tabla tenga algún problema a analizar si el `vacuum` está programado y no se está realizando. ### Failed cron jobs Solo en caso de ejecutar tareas programadas con la extensión `pg_cron`. La query se ha de ejecutar sobre la BD configurada en el parámetro `cron.database_name` de la extensión: ```sql SELECT * FROM cron.job_run_details WHERE status = 'failed'; ```