Drop Down MenusCSS Drop Down MenuPure CSS Dropdown Menu

Parallel VACUUMing

The vacuumdb utility now supports parallel jobs. This is specified with the -j option, just like when using pg_dump or pg_restore. This means vacuuming a database will complete a lot quicker, and especially so for cases where tables are spread across multiple tablespaces. It will also start vacuuming the largest relations first.
For example:
 vacuumdb -j4 productiondb
This would vacuum the database named "productiondb" by spawning 4 vacuum jobs to run simultaneously.

This is great for multi-table vacuums/analyzes. Previously, when running “vacuumdb some_database" you got all tables vacuumed in some kind of order.
We can see it, by creating some test tables:
for a in $( pwgen -A0 6 20 )
    psql -c "create table test_$a as select generate_series(1, 1000000)::int4 as i"
It did create for me these tables:
$ \d
           List of relations
 Schema |    Name     | Type  | Owner  
 public | test_aeshae | table | depesz
 public | test_aijoox | table | depesz
 public | test_eeyies | table | depesz
 public | test_eicaeb | table | depesz
 public | test_ieveov | table | depesz
 public | test_isoexu | table | depesz
 public | test_jaidaf | table | depesz
 public | test_meijie | table | depesz
 public | test_ohceec | table | depesz
 public | test_okaeph | table | depesz
 public | test_ooghap | table | depesz
 public | test_oojuxo | table | depesz
 public | test_sheeru | table | depesz
 public | test_shofae | table | depesz
 public | test_thaepu | table | depesz
 public | test_uuyohl | table | depesz
 public | test_uyuque | table | depesz
 public | test_wainie | table | depesz
 public | test_yiegah | table | depesz
 public | test_zeecie | table | depesz
(20 rows)
If I'll run the vacuumdb now, normally, it will show me:
=$ vacuumdb -e
vacuumdb: vacuuming database "depesz"
I.e. just “vacuum" was run, without any table names – it will run database-wide vacuum that will work in one backend, and do all tables sequentially.
I could have influenced order by doing something like:
vacuumdb -e -t test_zeecie -t test_yiegah -t test_wainie -t test_uyuque -t test_uuyohl -t test_thaepu -t test_shofae -t test_sheeru -t test_oojuxo -t test_ooghap -t test_okaeph -t test_ohceec -t test_meijie -t test_jaidaf -t test_isoexu -t test_ieveov -t test_eicaeb -t test_eeyies -t test_aijoox -t test_aeshae
vacuumdb: vacuuming database "depesz"
VACUUM test_zeecie;
VACUUM test_yiegah;
VACUUM test_wainie;
VACUUM test_uyuque;
VACUUM test_uuyohl;
VACUUM test_thaepu;
VACUUM test_shofae;
VACUUM test_sheeru;
VACUUM test_oojuxo;
VACUUM test_ooghap;
VACUUM test_okaeph;
VACUUM test_ohceec;
VACUUM test_meijie;
VACUUM test_jaidaf;
VACUUM test_isoexu;
VACUUM test_ieveov;
VACUUM test_eicaeb;
VACUUM test_eeyies;
VACUUM test_aijoox;
VACUUM test_aeshae;
But, all tables were vacuumed in order anyway.
Now, I can add -j option, and it will parallelize processing:
$ vacuumdb -e -j8
SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns
 WHERE relkind IN ('r', 'm') AND c.relnamespace = ns.oid
 ORDER BY c.relpages DESC;
VACUUM public.test_ohceec;
VACUUM public.test_zeecie;
VACUUM public.test_jaidaf;
VACUUM public.test_aeshae;
VACUUM public.test_ieveov;
VACUUM public.test_yiegah;
VACUUM public.test_shofae;
VACUUM public.test_thaepu;
VACUUM public.test_sheeru;
VACUUM public.test_uuyohl;
VACUUM public.test_eicaeb;
VACUUM public.test_ooghap;
VACUUM public.test_oojuxo;
VACUUM public.test_okaeph;
VACUUM public.test_meijie;
VACUUM public.test_aijoox;
VACUUM public.test_eeyies;
VACUUM public.test_isoexu;
VACUUM public.test_uyuque;
VACUUM public.test_wainie;
VACUUM pg_catalog.pg_proc;
VACUUM pg_catalog.pg_depend;
VACUUM pg_catalog.pg_attribute;
VACUUM pg_catalog.pg_description;
VACUUM pg_catalog.pg_statistic;
VACUUM pg_catalog.pg_operator;
VACUUM pg_catalog.pg_rewrite;
VACUUM pg_catalog.pg_type;
VACUUM pg_catalog.pg_class;
VACUUM information_schema.sql_features;
VACUUM pg_catalog.pg_amop;
VACUUM pg_catalog.pg_amproc;
VACUUM pg_catalog.pg_conversion;
VACUUM pg_catalog.pg_opclass;
VACUUM pg_catalog.pg_index;
VACUUM pg_catalog.pg_aggregate;
VACUUM pg_catalog.pg_collation;
VACUUM pg_catalog.pg_opfamily;
VACUUM pg_catalog.pg_cast;
VACUUM pg_catalog.pg_ts_config_map;
VACUUM pg_catalog.pg_language;
VACUUM pg_catalog.pg_authid;
VACUUM pg_catalog.pg_constraint;
VACUUM pg_catalog.pg_am;
VACUUM pg_catalog.pg_namespace;
VACUUM pg_catalog.pg_database;
VACUUM pg_catalog.pg_tablespace;
VACUUM pg_catalog.pg_pltemplate;
VACUUM pg_catalog.pg_shdepend;
VACUUM pg_catalog.pg_shdescription;
VACUUM pg_catalog.pg_ts_config;
VACUUM pg_catalog.pg_ts_dict;
VACUUM pg_catalog.pg_ts_parser;
VACUUM pg_catalog.pg_ts_template;
VACUUM pg_catalog.pg_extension;
VACUUM pg_catalog.pg_range;
VACUUM information_schema.sql_implementation_info;
VACUUM information_schema.sql_languages;
VACUUM information_schema.sql_parts;
VACUUM information_schema.sql_sizing;
VACUUM information_schema.sql_packages;
VACUUM information_schema.sql_sizing_profiles;
VACUUM pg_catalog.pg_seclabel;
VACUUM pg_catalog.pg_default_acl;
VACUUM pg_catalog.pg_policy;
VACUUM pg_catalog.pg_foreign_table;
VACUUM pg_catalog.pg_foreign_server;
VACUUM pg_catalog.pg_foreign_data_wrapper;
VACUUM pg_catalog.pg_auth_members;
VACUUM pg_catalog.pg_db_role_setting;
VACUUM pg_catalog.pg_event_trigger;
VACUUM pg_catalog.pg_trigger;
VACUUM pg_catalog.pg_largeobject_metadata;
VACUUM pg_catalog.pg_inherits;
VACUUM pg_catalog.pg_attrdef;
VACUUM pg_catalog.pg_enum;
VACUUM pg_catalog.pg_shseclabel;
VACUUM pg_catalog.pg_user_mapping;
VACUUM pg_catalog.pg_largeobject;
As we can see – it initially gets list of tables to vacuum, sorted by relpages – which is basically size of the table.
And then vacuums these in parallel, using many backends. In pg_stat_activity it looks like:
$ select pid, query from pg_stat_activity where pid <> pg_backend_pid()
 pid  |           query            
 5981 | VACUUM public.test_ohceec;
 5982 | VACUUM public.test_zeecie;
 5983 | VACUUM public.test_jaidaf;
 5984 | VACUUM public.test_aeshae;
 5985 | VACUUM public.test_ieveov;
 5986 | VACUUM public.test_yiegah;
 5987 | VACUUM public.test_shofae;
 5988 | VACUUM public.test_thaepu;
(8 rows)
Nice. Starting with largest tables first is great. Thanks guys.


Popular posts from this blog

How to find the server is whether standby (slave) or primary(master) in Postgresql replication ?

How to Get Table Size, Database Size, Indexes Size, schema Size, Tablespace Size, column Size in PostgreSQL Database

7 Steps to configure BDR replication in postgresql

Postgres Streaming Replication Configuration

vacuumlo - removing large objects orphans from a database PostgreSQL