vacuumlo - removing orphaned large objects from a PostgreSQL database
- vacuumlo is a simple utility program that will remove any “orphaned” large objects from a PostgreSQL database. An orphaned large object (LO) is any LO whose OID does not appear in any oid or lo data column of the database.
- OIDs basically give you a built-in, globally unique id for every row; whether new tables get an OID column is controlled by the default_with_oids (boolean) setting, which is off by default (and table OIDs were removed entirely in PostgreSQL 12).
- If you use vacuumlo, you may also be interested in the lo_manage trigger in the lo module. lo_manage is useful for avoiding the creation of orphaned LOs in the first place.
- More about the lo module
Working process of vacuumlo:
- First, vacuumlo builds a temporary table which contains all of the OIDs of the large objects in the selected database.
- It then scans through all columns in the database that are of type oid or lo, and removes matching entries from the temporary table. (Note: only types with these names are considered; in particular, domains over them are not considered.)
- The remaining entries in the temporary table identify orphaned LOs. These are removed.
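The three steps above can be sketched in Python, purely as an illustration (this is not vacuumlo's actual implementation), with sets standing in for the temporary table and the scanned columns:

```python
# Illustrative simulation of vacuumlo's algorithm using plain Python sets.
# All OIDs and column data below are made up for the example.

def find_orphans(all_lo_oids, oid_columns):
    """Step 1: start from every large object OID in the database;
    Step 2: remove every OID referenced by some oid/lo column;
    Step 3: what remains is orphaned."""
    candidates = set(all_lo_oids)          # the "temporary table" of LO OIDs
    for column_values in oid_columns:      # scan each oid/lo column
        candidates -= set(column_values)   # drop referenced OIDs
    return candidates                      # leftovers are orphans

# Five large objects; two tables reference four of them.
orphans = find_orphans(
    all_lo_oids=[34527, 34528, 34529, 34530, 34531],
    oid_columns=[[34527, 34528, 34529], [34531]],
)
print(sorted(orphans))  # 34530 is referenced by no column -> orphaned
```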
Before and after removing orphans, check the sizes of the large-object catalogs (VACUUM reclaims the space freed by lo_unlink):
VACUUM ANALYZE VERBOSE pg_largeobject_metadata;
VACUUM ANALYZE VERBOSE pg_largeobject;
VACUUM FULL ANALYZE VERBOSE pg_largeobject_metadata;
VACUUM FULL ANALYZE VERBOSE pg_largeobject;
For example, given a file table referenced from a userfile table, orphaned file rows can be detected with a LEFT JOIN:
select f.*
from file f
left join userfile u on f.id = u.file_id
where u.file_id is null;
or, equivalently, with NOT EXISTS:
select f.*
from file f
where not exists (
    select 1
    from userfile u
    where u.file_id = f.id
);
Both will detect the orphans, but the second can be faster because of internal PostgreSQL optimizations for EXISTS queries.
Syntax:
vacuumlo [options] database [database2 ... databaseN]
Command | Explanation
-v | Write a lot of progress messages.
-n | Don't remove anything, just show what would be done.
-l limit | Remove no more than limit large objects per transaction (default 1000). Since the server acquires a lock per LO removed, removing too many LOs in one transaction risks exceeding max_locks_per_transaction. Set the limit to zero if you want all removals done in a single transaction.
-U username | User name to connect as.
-w, --no-password | Never issue a password prompt. If the server requires password authentication and a password is not available by other means such as a .pgpass file, the connection attempt will fail. This option can be useful in batch jobs and scripts where no user is present to enter a password.
-W | Force vacuumlo to prompt for a password before connecting to a database. This option is never essential, since vacuumlo will automatically prompt for a password if the server demands password authentication. However, vacuumlo will waste a connection attempt finding out that the server wants a password. In some cases it is worth typing -W to avoid the extra connection attempt.
-p port | Database server's port.
-h hostname | Database server's host.
For example:
You can check for orphans manually by joining pg_largeobject against all your LO tables. E.g., let's create a couple of LO tables and fill them with data:
b=# create table l_o (n text, o oid);
CREATE TABLE
b=# create table lo (n text, p oid);
CREATE TABLE
b=# insert into l_o values('one',lo_import ('/tmp/wheel.PNG'));
INSERT 0 1
b=# insert into l_o values('two',lo_import ('/tmp/wheel.PNG'));
INSERT 0 1
b=# insert into l_o values('three',lo_import ('/tmp/wheel.PNG'));
INSERT 0 1
b=# insert into lo values('one',lo_import ('/tmp/wheel.PNG'));
INSERT 0 1
b=# insert into l_o values('two',lo_import ('/tmp/wheel.PNG'));
INSERT 0 1
Now check for orphans:
b=# with l_o as (select o,'l_o' tname from l_o union all select p,'lo' from lo)
b-# select distinct loid, o, tname
b-# from pg_largeobject left outer join l_o on l_o.o = loid;
loid | o | tname
-------+-------+-------
34530 | 34530 | lo
34528 | 34528 | l_o
34527 | 34527 | l_o
34529 | 34529 | l_o
34531 | 34531 | l_o
(5 rows)
None. And now with vacuumlo:
b=# \! vacuumlo -v -n b
Connected to database "b"
Test run: no large objects will be removed!
Checking o in public.l_o
Checking p in public.lo
Would remove 0 large objects from database "b".
Same. Now reproduce the orphan case - drop, recreate and refill one of the tables:
b=# drop table lo;
DROP TABLE
b=# create table lo (n text, p oid);
CREATE TABLE
b=# insert into lo values('one',lo_import ('/tmp/wheel.PNG'));
INSERT 0 1
Dry run:
b=# \! vacuumlo -v -n b
Connected to database "b"
Test run: no large objects will be removed!
Checking o in public.l_o
Checking p in public.lo
Would remove 1 large objects from database "b".
Check pg_largeobject:
b=# with l_o as (select o,'l_o' tname from l_o union all select p,'lo' from lo)
select distinct loid, o, tname
from pg_largeobject left outer join l_o on l_o.o = loid;
loid | o | tname
-------+-------+-------
34528 | 34528 | l_o
34531 | 34531 | l_o
34550 | 34550 | lo
34527 | 34527 | l_o
34529 | 34529 | l_o
34530 | |
(6 rows)
Yes - one orphan (the lo table had one row; dropping and refilling it made that row's large object an orphan).
Run vacuumlo to fix it:
b=# \! vacuumlo -v b
Connected to database "b"
Checking o in public.l_o
Checking p in public.lo
Successfully removed 1 large objects from database "b".
Checking the result:
b=# with l_o as (select o,'l_o' tname from l_o union all select p,'lo' from lo)
select distinct loid, o, tname
from pg_largeobject left outer join l_o on l_o.o = loid;
loid | o | tname
-------+-------+-------
34528 | 34528 | l_o
34531 | 34531 | l_o
34550 | 34550 | lo
34527 | 34527 | l_o
34529 | 34529 | l_o
(5 rows)
Yes - all clear. And now checking with vacuumlo:
b=# \! vacuumlo -v -n b
Connected to database "b"
Test run: no large objects will be removed!
Checking o in public.l_o
Checking p in public.lo
Would remove 0 large objects from database "b".
Note: Sadly, AWS RDS does not currently allow this.
How to protect a BLOB table from orphans? Here are some sample steps.
Create a dummy table with an oid data type column:
CREATE TABLE files
(
codfile integer NOT NULL,
filename character varying(255),
data oid,
CONSTRAINT pkfiles PRIMARY KEY (codfile)
);
To avoid orphans:
For the UPDATE statement:
CREATE OR REPLACE RULE upd_oid_rule AS ON UPDATE TO files
WHERE OLD.data IS DISTINCT FROM NEW.data AND OLD.data IS NOT NULL
DO ALSO
SELECT
CASE WHEN (count(files.data) <= 1) AND
EXISTS(SELECT 1
FROM pg_catalog.pg_largeobject
WHERE loid = OLD.data)
THEN lo_unlink(OLD.data)
END
FROM files
WHERE files.data = OLD.data
GROUP BY OLD.data ;
For those not so familiar with rule-speak, my translation: The great and almighty server of Postgres, please hear my prayer. When one updates the files table and the new data column value differs from the old one, do not forget to check whether the old data value is still in use; if not, delete the BLOB referenced by that old value (if this BLOB exists, of course). Thank you.
And one more rule:
For the DELETE statement:
CREATE OR REPLACE RULE del_oid_rule AS ON DELETE TO files
WHERE OLD.data IS NOT NULL
DO ALSO
SELECT
CASE WHEN (count(files.data) <= 1) AND
EXISTS(SELECT 1
FROM pg_catalog.pg_largeobject
WHERE loid = OLD.data)
THEN lo_unlink(OLD.data)
END
FROM files
WHERE files.data = OLD.data
GROUP BY OLD.data;
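The refcount check shared by both rules (unlink the old BLOB only when the current row was its last reference and the BLOB actually exists) can be sketched in Python. The function and argument names here are hypothetical, chosen only to mirror the rules' logic:

```python
# Sketch of the rules' logic: unlink an old BLOB only when the row being
# updated/deleted was its last reference and the BLOB actually exists.

def should_unlink(old_oid, data_column, existing_lobs):
    """data_column: the `data` values of all rows in `files`
    (still including the old row, since the rules run DO ALSO).
    existing_lobs: the set of OIDs present in pg_largeobject."""
    if old_oid is None:          # WHERE OLD.data IS NOT NULL
        return False
    refs = sum(1 for v in data_column if v == old_oid)
    return refs <= 1 and old_oid in existing_lobs

# Two rows share OID 100; one row owns OID 200.
data_column = [100, 100, 200]
print(should_unlink(100, data_column, {100, 200}))  # False: still shared
print(should_unlink(200, data_column, {100, 200}))  # True: last reference
```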