
PostgreSQL Architecture: A Brief Explanation With Diagrams

A PostgreSQL instance consists of a set of processes and memory structures. PostgreSQL uses a simple “process per user” client/server model. The major processes are:

  1. The ‘postmaster’, which is:
    • the supervisory daemon process,
    • attached to the shared memory segment, but refrains from accessing it directly,
    • always running, waiting for connection requests.
  2. Utility processes (BGWriter, WAL writer, syslogger, archiver, stats collector, autovacuum launcher, WAL sender and WAL receiver)
  3. User backend processes (the postgres server process itself)

1) Postmaster:-
When a client requests a connection to the database, the request first reaches the postmaster daemon process. After performing authentication and authorization, the postmaster forks a new backend server process (postgres). Henceforth, the frontend process and the backend server communicate directly without intervention by the postmaster. The postmaster is always running, waiting for connection requests, whereas frontend and backend processes come and go. The libpq library allows a single frontend to make multiple connections to backend processes.

However, each backend process is a single-threaded process that can only execute one query at a time; so the communication over any one frontend-to-backend connection is single-threaded.
Postmaster and postgres servers run with the user ID of the PostgreSQL “superuser”.

One postgres process exists for every open database session. Once the user connection is authenticated, that backend process attaches directly to shared memory and works with it on behalf of the session (i.e. it knows who connected and for what purpose).
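A quick way to see this from psql (each session gets its own backend process, whose PID also appears in the OS process list):
-- run in two separate sessions; each returns a different PID
SELECT pg_backend_pid();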

2) User Backend process:-
Each backend works with two kinds of memory: shared memory areas that all server processes attach to, and private (per-backend) memory areas.

POSTGRESQL SHARED MEMORY
Shared Buffers:
Sets the amount of memory the database server uses for shared memory buffers. The default is typically 32MB (128MB in more recent releases). Larger settings for shared_buffers usually require a corresponding increase in checkpoint_segments, in order to spread out the process of writing large quantities of new or changed data over a longer period of time.

Three background-writer parameters are also relevant here (current values can be checked as shown after this list):
bgwriter_delay
bgwriter_lru_maxpages
bgwriter_lru_multiplier
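As a quick illustration (not a tuning recommendation), the current values can be inspected from any session:
-- current shared buffer pool size
SHOW shared_buffers;
-- current background writer settings
SELECT name, setting, unit FROM pg_settings
 WHERE name IN ('bgwriter_delay', 'bgwriter_lru_maxpages', 'bgwriter_lru_multiplier');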

WAL Buffers:
The amount of shared memory used for WAL data that has not yet been written to disk. The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers, but not less than 64kB nor more than the size of one WAL segment, typically 16MB. This value can be set manually if the automatic choice is too large or too small, but any positive value less than 32kB will be treated as 32kB. This parameter can only be set at server start. The contents of the WAL buffers are written out to disk at every transaction commit, so extremely large values are unlikely to provide a significant benefit. However, setting this value to at least a few megabytes can improve write performance on a busy server where many clients are committing at once. The auto-tuning selected by the default setting of -1 should give reasonable results in most cases.
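For example, the size the auto-tuning resolved to on a given server can be checked with:
SHOW wal_buffers;  -- resolved size, capped at one WAL segment (commonly 16MB); varies per server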

CLOG Buffers:
$PGDATA/pg_clog contains a log of transaction metadata. This log tells PostgreSQL which transactions completed and which did not. The clog is small and never has any reason to become bloated, so you should never have any reason to touch it.

POSTGRESQL PER BACKEND MEMORY
work_mem:
Specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files. The default is 1MB. Note that for a complex query, several sort or hash operations might be running in parallel; each operation will be allowed to use as much memory as this value specifies before it starts to write data into temporary files. Also, several running sessions could be doing such operations concurrently. Therefore, the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value.

temp_buffers:
Sets the maximum number of temporary buffers used by each database session. The default is 8MB. The setting can be changed within individual sessions, but only before the first use of temporary tables within the session; subsequent attempts to change the value will have no effect on that session.

maintenance_work_mem:
Specifies the maximum amount of memory to be used by maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. The default is 16MB. Since only one of these operations can be executed at a time by a database session, and an installation normally doesn’t have many of them running concurrently, it’s safe to set this value significantly larger than work_mem.
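These per-backend settings can also be changed for a single session; a minimal sketch (the values are purely illustrative, not recommendations):
-- raise sort/hash memory for the current session only
SET work_mem = '64MB';
-- give an index build in this session more memory
SET maintenance_work_mem = '256MB';
-- temp_buffers must be set before the first use of temporary tables in the session
SET temp_buffers = '32MB';
SHOW work_mem;   -- confirm the session-level value
RESET work_mem;  -- return to the server default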



3) Utility processes:-
At every PostgreSQL instance startup, a set of utility processes (some mandatory, some optional) is started along with the shared memory.
There are two mandatory processes (BGWriter and WAL writer, which cannot be enabled or disabled) and several optional processes (autovacuum launcher, stats collector, syslogger, archiver, WAL sender and WAL receiver). You can check them with the command 'ps -ef | grep postgres'.
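On PostgreSQL 10 and later, the same processes can also be listed from inside the database (a quick check; the backend_type column is not available on older releases):
-- background and client processes visible from SQL
SELECT pid, backend_type FROM pg_stat_activity ORDER BY backend_type;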


BGWriter/Writer Process:
The BGWriter (background writer) process is a mandatory process.
WRITER -- the process responsible for writing dirty buffers to the data files.
The BGWriter spends much of its time sleeping, but every time it wakes, it searches through the shared buffer pool looking for modified pages. After each search, the BGWriter chooses some number of modified pages, writes them to disk, and evicts those pages from the shared buffer pool. The BGWriter process can be controlled with the three parameters bgwriter_delay, bgwriter_lru_maxpages and bgwriter_lru_multiplier.

WAL Writer Process:
The WAL writer process is a mandatory process.
WAL WRITER -- writes the dirty buffers in the WAL buffers to the WAL files.
The WAL buffers hold the changes made to the database as transaction log (WAL) records, and the WAL writer process is responsible for writing them to disk. The wal_writer_delay parameter controls how often the WAL writer process is invoked.
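For example, the WAL writer wake-up interval can be checked from any session:
SHOW wal_writer_delay;  -- the default is 200ms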

Stats Collector Process:
The stats collector is an optional process; the default is ON.
STATS COLLECTOR -- collects statistics about objects in the database, which the optimizer requires to improve performance.
It counts the number of accesses to tables and indexes in terms of both disk blocks and individual rows. It also tracks the total number of rows in each table, and information about VACUUM and ANALYZE actions for each table. Collecting statistics adds some overhead to query execution, so the system can be configured to collect or not collect information. Several parameters in the postgresql.conf file control the collection activity of the stats collector process.

Some important parameters of the stats collector:
The parameter track_counts controls whether statistics are collected about table and index accesses.
The parameter track_functions enables tracking of usage of user-defined functions.
The parameter track_activities enables monitoring of the current command being executed by any server process.

Normally these parameters are set in postgresql.conf so that they apply to all server processes, but it is possible to turn them on or off in individual sessions using the SET command. (To prevent ordinary users from hiding their activity from the administrator, only superusers are allowed to change these parameters with SET.)
The statistics collector communicates with the backends needing information (including autovacuum) through temporary files. These files are stored in the pg_stat_tmp subdirectory. When the postmaster shuts down, a permanent copy of the statistics data is stored in the global subdirectory. For increased performance, the parameter stats_temp_directory can be pointed at a RAM-based file system, decreasing physical I/O requirements.
pg_stat_database, pg_stat_activity, pg_stat_all_tables, ... --> views exposing the information collected by the stats collector.
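A minimal sketch of reading what the stats collector has gathered, using the standard statistics views:
-- per-table access counters and last VACUUM/ANALYZE times
SELECT relname, seq_scan, idx_scan, n_live_tup, last_vacuum, last_analyze
  FROM pg_stat_user_tables
 ORDER BY seq_scan DESC
 LIMIT 10;
-- currently running backends and their queries
SELECT pid, state, query FROM pg_stat_activity;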

Autovacuum Launcher Process:
Autovacuuming is an optional process; the default is ON.
To automate the execution of the VACUUM and ANALYZE commands, the autovacuum launcher daemon manages multiple worker processes called autovacuum workers. The autovacuum launcher is in charge of starting autovacuum worker processes for all databases. The launcher distributes the work across time, attempting to start one worker on each database for every interval set by the parameter autovacuum_naptime. The total number of workers is capped by the parameter autovacuum_max_workers. Each worker process checks each table within its database and executes VACUUM or ANALYZE as needed.
Some important parameters of autovacuum (current values can be checked with the queries after this list):
autovacuum (boolean)-->Controls whether the server should run the autovacuum launcher daemon. This is on by default.
autovacuum_max_workers (integer)-->Specifies the maximum number of autovacuum worker processes.
autovacuum_naptime (integer)-->Specifies the minimum delay between autovacuum runs on any given database. The default is one minute (1min).
autovacuum_vacuum_threshold (integer)-->Specifies the minimum number of updated or deleted tuples needed to trigger a VACUUM in any one table. The default is 50 tuples.
autovacuum_analyze_threshold (integer)-->Specifies the minimum number of inserted, updated or deleted tuples needed to trigger an ANALYZE in any one table. The default is 50 tuples.
log_autovacuum_min_duration (integer)-->Causes each action executed by autovacuum to be logged if it ran for at least the specified number of milliseconds; minus-one (the default) disables logging of autovacuum actions.
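A quick way to review these settings and see when autovacuum last touched each table (a sketch, not a tuning recommendation):
-- all autovacuum-related settings currently in effect
SELECT name, setting FROM pg_settings WHERE name LIKE 'autovacuum%';
-- tables with the most dead tuples and their last autovacuum runs
SELECT relname, n_dead_tup, last_autovacuum, last_autoanalyze
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC
 LIMIT 10;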


Syslogger Process / Logger Process:
The logger is an optional process; the default is OFF.
All the utility processes, the user backends and the postmaster daemon attach to the syslogger process for logging information about their activities. Every process's information is logged under $PGDATA/pg_log in files with the .log extension. Note: if the data directory was created with the initdb command, there will not be a pg_log directory under it; it needs to be created explicitly.

Some important parameters of the syslogger (a configuration sketch follows this list):
log_destination (string)-->methods for logging server messages, including stderr, csvlog and syslog
logging_collector (boolean)-->enables the logging collector, a background process that captures log messages sent to stderr and redirects them into log files
log_directory (string)-->When logging_collector is enabled, this parameter determines the directory in which log files will be created
log_filename (string)-->When logging_collector is enabled, this parameter sets the file names of the created log files
log_rotation_size (integer)-->When logging_collector is enabled, this parameter determines the maximum size of an individual log file
log_rotation_age (integer)-->When logging_collector is enabled, this parameter determines the maximum lifetime of an individual log file
log_truncate_on_rotation (boolean)-->When logging_collector is enabled, this parameter will cause PostgreSQL to truncate (overwrite), rather than append to, any existing log file of the same name
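A minimal configuration sketch, assuming PostgreSQL 9.4+ so that ALTER SYSTEM is available (the same lines can instead be set directly in postgresql.conf; logging_collector still requires a server restart):
ALTER SYSTEM SET logging_collector = on;             -- takes effect only after a restart
ALTER SYSTEM SET log_directory = 'pg_log';           -- relative to the data directory
ALTER SYSTEM SET log_filename = 'postgresql-%a.log'; -- one file per weekday
ALTER SYSTEM SET log_rotation_age = '1d';
ALTER SYSTEM SET log_truncate_on_rotation = on;
SELECT pg_reload_conf();                             -- reload the reloadable settings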

Archiver Process:
The archiver process is an optional process; the default is OFF.
Setting up the database in archive mode means capturing the WAL data of each segment file once it is filled, and saving that data somewhere before the segment file is recycled for reuse.


Archiver working process:-
1. When the database is in archive mode, once the WAL data of a WAL segment is filled, a status file named after that segment is created by the WAL writer under $PGDATA/pg_xlog/archive_status with the suffix ".ready", i.e. "segment-filename.ready".
2. The archiver process triggers on finding files in the ".ready" state created by the WAL writer process. It picks the segment file name from the .ready file and copies the corresponding segment from $PGDATA/pg_xlog to the archive destination given in the 'archive_command' parameter (postgresql.conf).
3. On successful completion of the copy from source to destination, the archiver process renames "segment-filename.ready" to "segment-filename.done". This completes the archiving of that segment.
Any files named "segment-filename.ready" still found in $PGDATA/pg_xlog/archive_status are segments pending to be copied to the archive destination (archiving progress can also be watched from SQL, as shown below).
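On PostgreSQL 9.4 and later, archiver progress is exposed through the pg_stat_archiver view (a quick check):
-- counts of archived and failed segments and the last segment handled
SELECT archived_count, last_archived_wal, last_archived_time, failed_count
  FROM pg_stat_archiver;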

Some important parameters of the archiver:
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'  # Unix
archive_command = 'copy "%p" "C:\\server\\archivedir\\%f"'  # Windows
Compressed archive logs:
archive_command = 'pg_compresslog %p - | gzip > /var/lib/pgsql/archive/%f'
pg_decompresslog during recovery:
restore_command = 'gunzip < /mnt/server/archivedir/%f | pg_decompresslog - %p'





4) Overview of PostgreSQL
Database file layout
  1. PGDATA – the base directory for the database server: traditionally it contains the configuration files and the data directory
  2. example location: /var/lib/pgsql/data
  3. Multiple clusters, managed by different server instances, can exist on the same machine
  4. the configuration files and the pid file can be located anywhere; they can also reside under PGDATA

base subdirectory
  1. contains the user database files
  2. subdirectory names are the database OIDs (they can be mapped to database names as shown below)
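To map those subdirectory names back to databases (a quick check from any session):
-- each oid matches a subdirectory name under $PGDATA/base
SELECT oid, datname FROM pg_database;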

Data Pages
  1. pages are located under the database subdirectories
  2. page default size: 8k
  3. additional sizes such as 4k and 16k are possible, but they require recompiling PostgreSQL
  4. for general purposes, 8k is the best practice (the compiled-in size can be checked as shown below)
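The compiled-in page size can be confirmed from any session:
SHOW block_size;  -- page size in bytes, typically 8192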

user accessible files
  1. PGVERSION: major version number of installation
  2. postgresql.conf: main configuration file for PostgreSQL installation
  3. pg_hba.conf: configures the client authentication method
  4. pg_ident.conf: configures OS and PostgreSQL authentication name mapping
  5. postmaster.opts: default command line options for the postmaster
  6. postmaster.pid: the PID of the postmaster and identification of the main data directory (the locations of these files in use can be confirmed as shown below)
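The file locations actually in use can be confirmed from a superuser session (a quick check):
SELECT name, setting FROM pg_settings
 WHERE name IN ('data_directory', 'config_file', 'hba_file', 'ident_file');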
