User Tools

Site Tools


dba:postgresql:a_tale_of_postgresql_the_second

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
dba:postgresql:a_tale_of_postgresql_the_second [2023/10/24 09:23] dodgerdba:postgresql:a_tale_of_postgresql_the_second [2023/10/25 06:45] (current) – removed dodger
Line 1: Line 1:
-# A Thale(s) of PostgreSql (the 2nd) 
- 
-# Servers 
- 
-| ip | role | 
-|----|------| 
-| `10.3.0.3` | primary | 
-| `10.6.1.3` | backup | 
- 
-# The problem 
- 
-Just open the log give us information about the error: 
-``` 
-The network name cannot be found. 
-2023-10-19 09:45:56.140 BST [4600] LOG:  archive command failed with exit code 1 
-2023-10-19 09:45:56.140 BST [4600] DETAIL:  The failed archive command was: copy "pg_wal\000000010000007C00000019" "\\10.6.1.3\archivelog\000000010000007C00000019" 
-2023-10-19 09:45:56.140 BST [4600] WARNING:  archiving write-ahead log file "000000010000007C00000019" failed too many times, will try again later 
-``` 
- 
-The command executed by `archive_command` is failing.   
-The `archive_command` on PostgreSql main configuration file `postgresql.conf` is set as:   
-```config 
-archive_command = 'copy "%p" "\\\\10.6.1.3\\\archivelog\\%f"' 
-``` 
- 
-Additionally, We can see the following line: 
-``` 
-The network name cannot be found. 
-``` 
- 
-Without connecting again, in my oppinion, the ip address: `10.6.1.3` (which is the _backup_ server), either is *not* accessible via windows shared drive or the shared drive `\\10.6.1.3\archivelog` is full (0% space available).   
- 
-Anyway, the result of the `archive_command` is _ERROR_. And the default behaviour of PostgreSql in such case is *DON'T* delete the WAL file until `archive_command` succeed.   
- 
-PostgreSql [Documentation](https://www.postgresql.org/docs/12/continuous-archiving.html) is clear at that point, including an example: 
- > While designing your archiving setup, consider what will happen if the archive command fails repeatedly because some aspect requires operator intervention or the archive runs out of space. For example, this could occur if you write to tape without an autochanger; when the tape fills, nothing further can be archived until the tape is swapped. You should ensure that any error condition or request to a human operator is reported appropriately so that the situation can be resolved reasonably quickly. The pg_wal/ directory will continue to fill with WAL segment files until the situation is resolved. (If the file system containing pg_wal/ fills up, PostgreSQL will do a PANIC shutdown. No committed transactions will be lost, but the database will remain offline until you free some space.) 
- 
- 
-That's the cause of the problems you're facing. As the result of `archive_command` fail, WAL are not being automatically deleted then the disk will run out of space. PostgreSql is waiting for manual intervention so `archive_command` ends with _OK_ and it can continue deleting old WAL files. 
- 
- 
- 
-# Solutions 
- 
-## Solution 1: Solve the connectivity problems 
- 
-First solution should be fast and doesn't involve PostgreSql reconfiguration/restart.   
-You have to check: 
-  * Connectivity from _primary_ to the shared Network drive: `\\10.6.1.3\archivelog` 
-  * Disk space available on the _backup_ server in the underlying drive of the shared network drive `\\10.6.1.3\archivelog` 
- 
-### Additional actions on this solution 
- 
-To avoid possible future failures.   
- 
-Add check to the monitorization tool to check:    
-  * Connectivity from _primary_ server to the shared Network drive 
-  * Connectivity from _backup_ server to the shared Network drive 
-  * Disk space available on the _backup_ server in the underlying drive of the shared network drive 
- 
- 
-## Solution 2: Change archive_command 
- 
-This solution will involve PostgreSql server to be restarted to apply changes.   
- 
-Modify `archive_command` to copy WAL files locally: 
-```conf 
-archive_command = 'copy "%p" "C:\\APIS\\Postgres12\\primary_archivelog\\%f"' 
-``` 
-Create the destination directory: 
-```powershell 
-mkdir C:\APIS\Postgres12\primary_archivelog 
-``` 
-And restart PostgreSql.   
- 
-*Note*: This solution is not recommended because you won't have cross-repliaction. That is, in our initial recomendations we suggest that _primary_ server copies the WAL files to the _backup_ server to be able to execute a Point-in-time-recover (PITR), that is a *best-practice*.   
- 
- 
- 
-# Additional notes and recommendations 
- 
-## PostgreSQL version 
- 
-You're running version 12 which has EOL (end-of-life) support on [November 14, 2024](https://www.postgresql.org/support/versioning/).   
-Our recommendation is to upgrade to v16.   
-v16 will be on production state for many months when final upgrade happens, so we suggest to plan the version upgrade as soon as possible.   
- 
-## Archivelog location 
- 
-Consider having a external shared storage shared with both _primary_ and _backup_ for archivelog storage.   
- 
-# Thales Hyderabad info 
-You can safely ignore this. 
- 
-``` 
-Archive:  wetransfer_hyderabad-metro-apis-pg-logs_2023-10-19_0855.zip 
-Written using Zippy 
-  Length      Date    Time    Name 
----------  ---------- -----   ---- 
-  9315434  2023-10-19 10:56   10.3.0.3 primary server PG log.zip 
-  5903497  2023-10-19 10:56   10.6.1.3 backup server PG logs.zip 
----------                     ------- 
- 15218931                     2 files 
-``` 
- 
-