Parallel VACUUMing

- November 16, 2017

The vacuumdb utility now supports parallel jobs. This is specified with the -j option, just like when using pg_dump or pg_restore. This means vacuuming a database will complete a lot quicker, and especially so for cases where tables are spread across multiple tablespaces. It will also start vacuuming the largest relations first.

For example:

 vacuumdb -j4 productiondb

This would vacuum the database named "productiondb" by spawning 4 vacuum jobs to run simultaneously.

VACUUMDB: ENABLE PARALLEL MODE

This is great for multi-table vacuums/analyzes. Previously, when running “vacuumdb some_database" you got all tables vacuumed in some kind of order.

We can see it, by creating some test tables:

for a in $( pwgen -A0 6 20 )
do
    psql -c "create table test_$a as select generate_series(1, 1000000)::int4 as i"
done

It did create for me these tables:

$ \d
           List of relations
 Schema |    Name     | Type  | Owner  
--------+-------------+-------+--------
 public | test_aeshae | table | depesz
 public | test_aijoox | table | depesz
 public | test_eeyies | table | depesz
 public | test_eicaeb | table | depesz
 public | test_ieveov | table | depesz
 public | test_isoexu | table | depesz
 public | test_jaidaf | table | depesz
 public | test_meijie | table | depesz
 public | test_ohceec | table | depesz
 public | test_okaeph | table | depesz
 public | test_ooghap | table | depesz
 public | test_oojuxo | table | depesz
 public | test_sheeru | table | depesz
 public | test_shofae | table | depesz
 public | test_thaepu | table | depesz
 public | test_uuyohl | table | depesz
 public | test_uyuque | table | depesz
 public | test_wainie | table | depesz
 public | test_yiegah | table | depesz
 public | test_zeecie | table | depesz
(20 rows)

If I'll run the vacuumdb now, normally, it will show me:

=$ vacuumdb -e
vacuumdb: vacuuming database "depesz"
VACUUM;

I.e. just “vacuum" was run, without any table names – it will run database-wide vacuum that will work in one backend, and do all tables sequentially.

I could have influenced order by doing something like:

vacuumdb -e -t test_zeecie -t test_yiegah -t test_wainie -t test_uyuque -t test_uuyohl -t test_thaepu -t test_shofae -t test_sheeru -t test_oojuxo -t test_ooghap -t test_okaeph -t test_ohceec -t test_meijie -t test_jaidaf -t test_isoexu -t test_ieveov -t test_eicaeb -t test_eeyies -t test_aijoox -t test_aeshae
vacuumdb: vacuuming database "depesz"
VACUUM test_zeecie;
VACUUM test_yiegah;
VACUUM test_wainie;
VACUUM test_uyuque;
VACUUM test_uuyohl;
VACUUM test_thaepu;
VACUUM test_shofae;
VACUUM test_sheeru;
VACUUM test_oojuxo;
VACUUM test_ooghap;
VACUUM test_okaeph;
VACUUM test_ohceec;
VACUUM test_meijie;
VACUUM test_jaidaf;
VACUUM test_isoexu;
VACUUM test_ieveov;
VACUUM test_eicaeb;
VACUUM test_eeyies;
VACUUM test_aijoox;
VACUUM test_aeshae;

But, all tables were vacuumed in order anyway.

Now, I can add -j option, and it will parallelize processing:

$ vacuumdb -e -j8
SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns
 WHERE relkind IN ('r', 'm') AND c.relnamespace = ns.oid
 ORDER BY c.relpages DESC;
VACUUM public.test_ohceec;
VACUUM public.test_zeecie;
VACUUM public.test_jaidaf;
VACUUM public.test_aeshae;
VACUUM public.test_ieveov;
VACUUM public.test_yiegah;
VACUUM public.test_shofae;
VACUUM public.test_thaepu;
VACUUM public.test_sheeru;
VACUUM public.test_uuyohl;
VACUUM public.test_eicaeb;
VACUUM public.test_ooghap;
VACUUM public.test_oojuxo;
VACUUM public.test_okaeph;
VACUUM public.test_meijie;
VACUUM public.test_aijoox;
VACUUM public.test_eeyies;
VACUUM public.test_isoexu;
VACUUM public.test_uyuque;
VACUUM public.test_wainie;
VACUUM pg_catalog.pg_proc;
VACUUM pg_catalog.pg_depend;
VACUUM pg_catalog.pg_attribute;
VACUUM pg_catalog.pg_description;
VACUUM pg_catalog.pg_statistic;
VACUUM pg_catalog.pg_operator;
VACUUM pg_catalog.pg_rewrite;
VACUUM pg_catalog.pg_type;
VACUUM pg_catalog.pg_class;
VACUUM information_schema.sql_features;
VACUUM pg_catalog.pg_amop;
VACUUM pg_catalog.pg_amproc;
VACUUM pg_catalog.pg_conversion;
VACUUM pg_catalog.pg_opclass;
VACUUM pg_catalog.pg_index;
VACUUM pg_catalog.pg_aggregate;
VACUUM pg_catalog.pg_collation;
VACUUM pg_catalog.pg_opfamily;
VACUUM pg_catalog.pg_cast;
VACUUM pg_catalog.pg_ts_config_map;
VACUUM pg_catalog.pg_language;
VACUUM pg_catalog.pg_authid;
VACUUM pg_catalog.pg_constraint;
VACUUM pg_catalog.pg_am;
VACUUM pg_catalog.pg_namespace;
VACUUM pg_catalog.pg_database;
VACUUM pg_catalog.pg_tablespace;
VACUUM pg_catalog.pg_pltemplate;
VACUUM pg_catalog.pg_shdepend;
VACUUM pg_catalog.pg_shdescription;
VACUUM pg_catalog.pg_ts_config;
VACUUM pg_catalog.pg_ts_dict;
VACUUM pg_catalog.pg_ts_parser;
VACUUM pg_catalog.pg_ts_template;
VACUUM pg_catalog.pg_extension;
VACUUM pg_catalog.pg_range;
VACUUM information_schema.sql_implementation_info;
VACUUM information_schema.sql_languages;
VACUUM information_schema.sql_parts;
VACUUM information_schema.sql_sizing;
VACUUM information_schema.sql_packages;
VACUUM information_schema.sql_sizing_profiles;
VACUUM pg_catalog.pg_seclabel;
VACUUM pg_catalog.pg_default_acl;
VACUUM pg_catalog.pg_policy;
VACUUM pg_catalog.pg_foreign_table;
VACUUM pg_catalog.pg_foreign_server;
VACUUM pg_catalog.pg_foreign_data_wrapper;
VACUUM pg_catalog.pg_auth_members;
VACUUM pg_catalog.pg_db_role_setting;
VACUUM pg_catalog.pg_event_trigger;
VACUUM pg_catalog.pg_trigger;
VACUUM pg_catalog.pg_largeobject_metadata;
VACUUM pg_catalog.pg_inherits;
VACUUM pg_catalog.pg_attrdef;
VACUUM pg_catalog.pg_enum;
VACUUM pg_catalog.pg_shseclabel;
VACUUM pg_catalog.pg_user_mapping;
VACUUM pg_catalog.pg_largeobject;

As we can see – it initially gets list of tables to vacuum, sorted by relpages – which is basically size of the table.

And then vacuums these in parallel, using many backends. In pg_stat_activity it looks like:

$ select pid, query from pg_stat_activity where pid <> pg_backend_pid()
 pid  |           query            
------+----------------------------
 5981 | VACUUM public.test_ohceec;
 5982 | VACUUM public.test_zeecie;
 5983 | VACUUM public.test_jaidaf;
 5984 | VACUUM public.test_aeshae;
 5985 | VACUUM public.test_ieveov;
 5986 | VACUUM public.test_yiegah;
 5987 | VACUUM public.test_shofae;
 5988 | VACUUM public.test_thaepu;
(8 rows)
 
Nice. Starting with largest tables first is great. Thanks guys.

Search This Blog

TutorialDBA - Support | Training | Consultant

Parallel VACUUMing

Comments

Post a Comment

Popular posts from this blog

Top 40 Highest Paying URL Shortener Companies - Best URL Shorteners To Earn Fast

VMWARE WORKSTATION 3,4,5,6,7,8,9,10,11,12,14,15...etc LICENSE KEYS COLLECTION

How to Get Table Size, Database Size, Indexes Size, schema Size, Tablespace Size, column Size in PostgreSQL Database

PostgreSQL ALTER TABLE ... SET LOGGED / UNLOGGED

How To Configure pglogical | streaming replication for PostgreSQL