vacuumlo - removing orphaned large objects from a PostgreSQL database
- vacuumlo is a simple utility program that will remove any “orphaned” large objects from a PostgreSQL database. An orphaned large object (LO) is any LO whose OID does not appear in any oid or lo data column of the database.
- OIDs basically give you a built-in, globally unique id for every row; whether new tables get an OID column is controlled by the default_with_oids (boolean) setting, which is off by default (and table OIDs were removed entirely in PostgreSQL 12).
- If you use vacuumlo, you may also be interested in the lo_manage trigger in the lo module. lo_manage is useful for avoiding the creation of orphaned LOs in the first place.
- More about the lo module
Working process of vacuumlo:
- First, vacuumlo builds a temporary table which contains all of the OIDs of the large objects in the selected database.
- It then scans through all columns in the database that are of type oid or lo, and removes matching entries from the temporary table. (Note: only types with these names are considered; in particular, domains over them are not considered.)
- The remaining entries in the temporary table identify orphaned LOs. These are removed.
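The three steps above can be sketched in Python, purely as an illustration (this is not vacuumlo's actual implementation), with sets standing in for the temporary table and the scanned columns:

```python
# Illustrative simulation of vacuumlo's algorithm using plain Python sets.
# All OIDs and column data below are made up for the example.

def find_orphans(all_lo_oids, oid_columns):
    """Step 1: start from every large object OID in the database;
    Step 2: remove every OID referenced by some oid/lo column;
    Step 3: what remains is orphaned."""
    candidates = set(all_lo_oids)          # the "temporary table" of LO OIDs
    for column_values in oid_columns:      # scan each oid/lo column
        candidates -= set(column_values)   # drop referenced OIDs
    return candidates                      # leftovers are orphans

# Five large objects; two tables reference four of them.
orphans = find_orphans(
    all_lo_oids=[34527, 34528, 34529, 34530, 34531],
    oid_columns=[[34527, 34528, 34529], [34531]],
)
print(sorted(orphans))  # 34530 is referenced by no column -> orphaned
```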
Before and after removing orphans, check the sizes of the large-object catalogs (VACUUM reclaims the space freed by lo_unlink):
VACUUM ANALYZE VERBOSE pg_largeobject_metadata;
VACUUM ANALYZE VERBOSE pg_largeobject;
VACUUM FULL ANALYZE VERBOSE pg_largeobject_metadata;
VACUUM FULL ANALYZE VERBOSE pg_largeobject;
For example, given a file table referenced from a userfile table, orphaned file rows can be detected with a LEFT JOIN:
select f.*
from file f
left join userfile u on f.id = u.file_id
where u.file_id is null;
or, equivalently, with NOT EXISTS:
select f.*
from file f
where not exists (
    select 1
    from userfile u
    where u.file_id = f.id
);
Both will detect the orphans, but the second can be faster because of internal PostgreSQL optimizations for EXISTS queries.
Syntax:
vacuumlo [options] database [database2 ... databaseN]
Command | Explanation
-v | Write a lot of progress messages.
-n | Don't remove anything, just show what would be done.
-l limit | Remove no more than limit large objects per transaction (default 1000). Since the server acquires a lock per LO removed, removing too many LOs in one transaction risks exceeding max_locks_per_transaction. Set the limit to zero if you want all removals done in a single transaction.
-U username | User name to connect as.
-w, --no-password | Never issue a password prompt. If the server requires password authentication and a password is not available by other means such as a .pgpass file, the connection attempt will fail. This option can be useful in batch jobs and scripts where no user is present to enter a password.
-W | Force vacuumlo to prompt for a password before connecting to a database. This option is never essential, since vacuumlo will automatically prompt for a password if the server demands password authentication. However, vacuumlo will waste a connection attempt finding out that the server wants a password. In some cases it is worth typing -W to avoid the extra connection attempt.
-p port | Database server's port.
-h hostname | Database server's host.
For example:
You can check for orphans manually by joining pg_largeobject against all your LO tables. E.g., let's create a couple of LO tables and fill them with data:
b=# create table l_o (n text, o oid);
CREATE TABLE
b=# create table lo (n text, p oid);
CREATE TABLE
b=# insert into l_o values('one',lo_import ('/tmp/wheel.PNG'));
INSERT 0 1
b=# insert into l_o values('two',lo_import ('/tmp/wheel.PNG'));
INSERT 0 1
b=# insert into l_o values('three',lo_import ('/tmp/wheel.PNG'));
INSERT 0 1
b=# insert into lo values('one',lo_import ('/tmp/wheel.PNG'));
INSERT 0 1
b=# insert into l_o values('two',lo_import ('/tmp/wheel.PNG'));
INSERT 0 1
Now check for orphans:
b=# with l_o as (select o,'l_o' tname from l_o union all select p,'lo' from lo)
b-# select distinct loid, o, tname
b-# from pg_largeobject left outer join l_o on l_o.o = loid;
loid | o | tname
-------+-------+-------
34530 | 34530 | lo
34528 | 34528 | l_o
34527 | 34527 | l_o
34529 | 34529 | l_o
34531 | 34531 | l_o
(5 rows)
None. And now with vacuumlo:
b=# \! vacuumlo -v -n b
Connected to database "b"
Test run: no large objects will be removed!
Checking o in public.l_o
Checking p in public.lo
Would remove 0 large objects from database "b".
Same. Now reproduce the orphan case - drop, recreate and refill one of the tables:
b=# drop table lo;
DROP TABLE
b=# create table lo (n text, p oid);
CREATE TABLE
b=# insert into lo values('one',lo_import ('/tmp/wheel.PNG'));
INSERT 0 1
Dry run:
b=# \! vacuumlo -v -n b
Connected to database "b"
Test run: no large objects will be removed!
Checking o in public.l_o
Checking p in public.lo
Would remove 1 large objects from database "b".
Check pg_largeobject:
b=# with l_o as (select o,'l_o' tname from l_o union all select p,'lo' from lo)
select distinct loid, o, tname
from pg_largeobject left outer join l_o on l_o.o = loid;
loid | o | tname
-------+-------+-------
34528 | 34528 | l_o
34531 | 34531 | l_o
34550 | 34550 | lo
34527 | 34527 | l_o
34529 | 34529 | l_o
34530 | |
(6 rows)
Yes - one orphan (the lo table had one row; dropping and refilling it made that row's large object an orphan).
Run vacuumlo to fix it:
b=# \! vacuumlo -v b
Connected to database "b"
Checking o in public.l_o
Checking p in public.lo
Successfully removed 1 large objects from database "b".
Checking the result:
b=# with l_o as (select o,'l_o' tname from l_o union all select p,'lo' from lo)
select distinct loid, o, tname
from pg_largeobject left outer join l_o on l_o.o = loid;
loid | o | tname
-------+-------+-------
34528 | 34528 | l_o
34531 | 34531 | l_o
34550 | 34550 | lo
34527 | 34527 | l_o
34529 | 34529 | l_o
(5 rows)
Yes - all clear. And now checking with vacuumlo:
b=# \! vacuumlo -v -n b
Connected to database "b"
Test run: no large objects will be removed!
Checking o in public.l_o
Checking p in public.lo
Would remove 0 large objects from database "b".
Note: Sadly, AWS RDS does not currently allow this.
How to protect a BLOB table from orphans? Here are some sample steps.
Create a dummy table with an oid data type column:
CREATE TABLE files
(
codfile integer NOT NULL,
filename character varying(255),
data oid,
CONSTRAINT pkfiles PRIMARY KEY (codfile)
);
To avoid orphans:
For the UPDATE statement:
CREATE OR REPLACE RULE upd_oid_rule AS ON UPDATE TO files
WHERE OLD.data IS DISTINCT FROM NEW.data AND OLD.data IS NOT NULL
DO ALSO
SELECT
CASE WHEN (count(files.data) <= 1) AND
EXISTS(SELECT 1
FROM pg_catalog.pg_largeobject
WHERE loid = OLD.data)
THEN lo_unlink(OLD.data)
END
FROM files
WHERE files.data = OLD.data
GROUP BY OLD.data ;
For those not so familiar with rule-speak, my translation: The great and almighty server of Postgres, please hear my prayer. When one updates the files table and the new data column value differs from the old one, do not forget to check whether the old data value is still in use; if not, delete the BLOB referenced by that old value (if this BLOB exists, of course). Thank you.
And one more rule:
For the DELETE statement:
CREATE OR REPLACE RULE del_oid_rule AS ON DELETE TO files
WHERE OLD.data IS NOT NULL
DO ALSO
SELECT
CASE WHEN (count(files.data) <= 1) AND
EXISTS(SELECT 1
FROM pg_catalog.pg_largeobject
WHERE loid = OLD.data)
THEN lo_unlink(OLD.data)
END
FROM files
WHERE files.data = OLD.data
GROUP BY OLD.data;
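The refcount check shared by both rules (unlink the old BLOB only when the current row was its last reference and the BLOB actually exists) can be sketched in Python. The function and argument names here are hypothetical, chosen only to mirror the rules' logic:

```python
# Sketch of the rules' logic: unlink an old BLOB only when the row being
# updated/deleted was its last reference and the BLOB actually exists.

def should_unlink(old_oid, data_column, existing_lobs):
    """data_column: the `data` values of all rows in `files`
    (still including the old row, since the rules run DO ALSO).
    existing_lobs: the set of OIDs present in pg_largeobject."""
    if old_oid is None:          # WHERE OLD.data IS NOT NULL
        return False
    refs = sum(1 for v in data_column if v == old_oid)
    return refs <= 1 and old_oid in existing_lobs

# Two rows share OID 100; one row owns OID 200.
data_column = [100, 100, 200]
print(should_unlink(100, data_column, {100, 200}))  # False: still shared
print(should_unlink(200, data_column, {100, 200}))  # True: last reference
```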