PostgreSQL autovacuum & Parameter configuration

- June 27, 2018

PostgreSQL's implementation of MVCC ensures that a transaction against a database will return only data that's been committed, in a snapshot, even if other processes are trying to modify that data. Imagine a database with millions of large tuples (commonly called "rows") in a table. If the table has many indices and constant read/writes happening, this could lead to huge amounts of bloat and ultimately performance issues. One crucial process to help prevent bloat is autovacuum.

At a high level, vacuuming is used to free up dead tuples in a table so they can be reused. It also helps you avoid transaction ID wraparound and improves statistics updates and query optimizations.

VACUUM:

It is a manual command.
Vacuum will remove dead tuples both from the table and indexes – it will not return the disk space back to the OS, but it will make it usable for new rows.
Don't run at transaction when transaction time it will take higher CPU and I/O usage on the system.
you may run it only once a day/week when less transaction probably night, in which case you’ll probably accumulate more dead tuples

VACUUM FULL:

VACUUM FULL would reclaim the space and return it to the OS.
Initially It acquires exclusive lock on the table, blocking all operations (including SELECT statement).
Then it creates a copy of the table, doubling the disk space needed, so it’s not very practical when already running out of disk space.

AUTOVACUUM:

Autovacuum will be executed more often during busy periods, and less often when the database is mostly idle
it should not consume too much resources (CPU and disk I/O),
By default autovacuum trigger when dead tuble reached 20% of table
autovacuum not only clearing dead tuble its updating current statistic to optimizer(for new query planner) when dead tuble reached 10% of table by default.

From above differentiate you will understand the important of autovacuum Before changing autovacuum related parameter you have to understand your database dead tuble growth following script will helpful for understanding your database dead tuble growth
run this script again and again with minimal time period and change the database name and environment variable as per your postgres server

There were 4 parameters in our postgresql.conf that were set to the default values:

autovacuum_vacuum_scale_factor = 0.2;
autovacuum_analyze_scale_factor = 0.1;
autovacuum_vacuum_threshold (integer)50
autovacuum_analyze_threshold (integer)50

The default values were far too small for the number of transactions our database was processing. For example, a nijam table in this database would not get autovacuumed tuples were updated or deleted for last few days.

select n_dead_tup ,last_vacuum,last_analyze,n_tup_upd, n_tup_del,n_tup_hot_upd,relname ,seq_scan,idx_scan  from pg_stat_all_tables where relname='k1';
 n_dead_tup | last_vacuum | last_analyze | n_tup_upd | n_tup_del | n_tup_hot_upd | relname | seq_scan | idx_scan
------------+-------------+--------------+-----------+-----------+---------------+---------+----------+----------
       200  |             |              |    200    |      0    |             0 | k1      |       17 |        
(1 row)

SELECT reltuples::numeric FROM pg_class WHERE relname = 'k1';
 reltuples
-----------
  10,000

This is how we reach the above number:

autovacuum_vacuum_threshold + pg_class.reltuples * scale_factor
autovacuum_analyze_threshold + pg_class.reltuples * scale_factor

PostgreSQL would trigger a vacuum when dead tubles reached 2050 right now we have 200 only
50+10000*0.2 =2050
PostgreSQL would trigger an analyze when dead tubles reached 1050
50+10000*0.1 =1050

we have a table with 10,000 rows and 200 of them had changed:

autovacuum_analyze_threshold tells us, that we're over the default 50;
we calculate the fraction based on autovacuum_analyze_scale_factor (which defaults to 0.1) and this gives us 1000 rows;
total calculated threshold therefore is 1050;
as 200 is less then 1050, ANALYZE is not started (we wait for more changes).

For the VACCUM there's another pair of parameters with completely similar behavior: autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor, except default scale for vacuuming is 0.2 or 20%.
Autovacuum parameter Options:

Option 1 : Default 20% for vacuum and 10% for analyze

    autovacuum_vacuum_scale_factor = 0.2
    autovacuum_analyze_scale_factor = 0.1

the table will be considered as in need of cleanup. The formula basically says that up to 20% of a table may be dead tuples before it gets cleaned up (the threshold of 50 rows is there to prevent very frequent cleanups of tiny tables).

The default scale factor works fine for small and medium-sized tables, but not so much for very large tables – on 20GB table this is roughly 4GB of dead tuples for vacuum and 2GB for analyze, while on 10TB table need to reach 2TB for vacuum and 1TB for analyze.

This is an example of accumulating a lot of dead tuples, and processing all of it at once, which is going to hurt. And per the rule mentioned before, the solution is to do it more often by significantly decreasing the scale factor, perhaps even like this:

Option 2 :1% for vacuum and 1% for analyze

 autovacuum_vacuum_scale_factor = 0.01
 autovacuum_analyze_scale_factor = 0.01

AT BIG TABLE 10TB:
PostgreSQL would trigger a vacuum and analyze when dead tubles reached (1%) means 100GB

AT MEDIUM TABLE 2TB:
PostgreSQL would trigger a vacuum and analyze when dead tubles reached (1%) means 20GB

AT SMALL TABLE 10GB:
PostgreSQL would trigger a vacuum and analyze when dead tubles reached (1%) means 100MB

This is good for small table not for Tera Bytes

which decreases the limit to only 1% of the table. An alternative solution is to abandon the scale factor entirely, and use solely the threshold

Option 3 : Based On threshold:

 autovacuum_vacuum_scale_factor = 0
 autovacuum_analyze_scale_factor = 0
 autovacuum_vacuum_threshold = 15000
 autovacuum_analyze_threshold =10000

which should trigger the cleanup after generating 15000 dead tuples and analyze the after reached dead tuble 10000.

One trouble is that these changes in postgresql.conf affect all tables (the whole cluster, in fact), and it may undesirably affect cleanups of small tables, including for example system catalogs.

When the small tables are cleaned up more frequently, the easiest solution is to simply ignore the problem entirely. Cleanup of small tables is going to be fairly cheap, and the improvement on large tables is usually so significant that even if you ignore small inefficiency on small tables, the overall effect is still very positive.

But if you decided to change the configuration in a way that would significantly delay cleanup on small tables (as for example with setting scale_factor=0 and threshold=4000), it’s better to apply those changes only to particular tables using ALTER TABLE:

ALTER TABLE small_table SET (autovacuum_vacuum_scale_factor = 0);
ALTER TABLE small_table SET (autovacuum_analyze_scale_factor = 0);
ALTER TABLE small_table SET (autovacuum_vacuum_threshold = 5000);
ALTER TABLE small_table SET (autovacuum_analyze_threshold = 2500);

Try to keep the configuration as simple as possible, and override the parameters for as few tables as possible. It’s also a good idea to include this into your internal documentation, including the reasoning for particular values.

How to Increase the performance of Autovacuum
autovacuum_max_workers :

log_autovacuum_min_duration (integer)
(integer) 3-8
autovacuum_naptime (integer) 1min->
autovacuum_vacuum_cost_delay (integer) 20ms->10ms
autovacuum_vacuum_cost_limit (integer) -1->

Search This Blog

TutorialDBA - Support | Training | Consultant

PostgreSQL autovacuum & Parameter configuration

Comments

Post a Comment

Popular posts from this blog

PostgreSQL DOMAIN Data Type -2

VMWARE WORKSTATION 3,4,5,6,7,8,9,10,11,12,14,15...etc LICENSE KEYS COLLECTION

Postgres Database Patch

How to find the server is whether standby (slave) or primary(master) in Postgresql replication ?

PostgreSQL Database startup / shutdown /restart