Drop Down MenusCSS Drop Down MenuPure CSS Dropdown Menu

postgreSQL Compress format backup

The -F option on pg_dump allows you to specify the format of the output. The default output is plain-text. That’s what you get when you do this:
pg_dump bacula > bacula.sql
and is functionally equivalent to doing:
pg_dump -Fp bacula > bacula.sql
There are other formats (directory, tar), but today I’m concerned with the custom (c) option. By default, this output is also compressed (see the -Z option). What is of most value is the ability to use pg_restore with the custom output.

What’s the big deal about custom format?

The more interesting option on pg_restore was -j. This allows you to run multiple concurrent jobs for the loading of data and building of indexes. This can greatly reduce the amount of time needed to restore a large database if your server has multiple processors.
pg_restore can be much more useful than just doing:
psql bacula < bacula.sql
With pg_restore you can be very selective about what you restore. See the -L and -l options.

How much does pg_dump compress?

I was curious. I wanted to compare the various pg_dump times for each level of compression.
I did a series of test such as this:
$ time pg_dump bacula > bacula.sql

real    4m22.654s
user    0m17.080s
sys     0m7.766s
In the following table, I'm using the 'real' value shown above. In addition, I'm also showing the time it took to restore the dump via 'pg_store -d testing'.
commanddump timefile sizerestore time
pg_dump -Fc11m12.443s4.6G34m27.218s
pg_dump -Z0 -Fc4m18.742s14G36m8.156s
pg_dump -Z1 -Fc5m0.498s5.4G36m26.047s
pg_dump -Z2 -Fc5m24.376s5.2G36m11.556s
pg_dump -Z3 -Fc6m20.513s5.1G36m12.259s
pg_dump -Z4 -Fc6m46.074s4.9G36m31.050s
pg_dump -Z5 -Fc8m22.397s4.7G34m53.817s
pg_dump -Z6 -Fc11m18.154s4.6G35m54.344s
pg_dump -Z7 -Fc14m21.447s4.5G35m14.134s
pg_dump -Z8 -Fc25m15.000s4.5G (45MB smaller)35m22.793s
pg_dump -Z9 -Fc26m40.550s4.5G (1.8MB smaller)35m11.992s


Looking at the above, the default compression (i.e. without specifying -Z) is about the same as -Z6, but takes only 4 minutes versus 11 minutes.
I will be using just plain -Fc, and I will not be using the -Z option.

pg_restore times

As you can see from the above times, compression level has very little performance hit on pg_restore.

What about -j?

Let's try -j with pg_restore:
$ time pg_restore -j 8 -d testing < bacula.dump.11m12.443s
pg_restore: [custom archiver] parallel restore from standard input is not supported

real    0m0.073s
user    0m0.000s
sys     0m0.000s
Oh, OK, let's go with this format:
$ time pg_restore -j 8 -d testing bacula.dump.11m12.443s
The original test for this took about 34.5 minutes. I won't consider -j 8 to be much of a savings unless it's under 32 minutes. Anything less just might be noise.
*insert pause here*
Wow! 17 minutes! That's a huge performance increase! I am impressed. I will be using -j every time now.


Popular posts from this blog

How to find the server is whether standby (slave) or primary(master) in Postgresql replication ?

How to Get Table Size, Database Size, Indexes Size, schema Size, Tablespace Size, column Size in PostgreSQL Database

7 Steps to configure BDR replication in postgresql

Postgres Streaming Replication Configuration

vacuumlo - removing large objects orphans from a database PostgreSQL