Drop Down MenusCSS Drop Down MenuPure CSS Dropdown Menu

TSVECTOR EDITING FUNCTIONS

Tsvector editing functions
 
Adds several tsvector editting function: convert tsvector to/from text array,
set weight for given lexemes, delete lexeme(s), unnest, filter lexemes
with given weights
 
Author: Stas Kelvich with some editorization by me
Reviewers: Tomas Vondram, Teodor Sigaev
For those that don't know tsvector is datatype used by tsearch, which in turn is PostgreSQL's full text search engine.
Basically, whenever you index data for use for tsearch, you use (sometimes explicitly, sometimes not) tsvector values.
Which looks like this:
$ select to_tsvector('english', 'Mandatory arguments to long options are mandatory for short options too.');
                          to_tsvector                          
---------------------------------------------------------------
 'argument':2 'long':4 'mandatori':1,7 'option':5,10 'short':9
(1 row)
As you can see some words have been removed (“to", “are", and “too"), and the rest were brought back to simplest form (mandatory => mandatori).
To make future examples smaller, let's store this tsvector in test table:
$ create table test as select to_tsvector('english', 'Mandatory arguments to long options are mandatory for short options too.') as t;
SELECT 1
 
$ \d test
      Table "public.test"
 Column |   Type   | Modifiers 
--------+----------+-----------
 t      | tsvector | 
 
$ select * from test;
                               t                               
---------------------------------------------------------------
 'argument':2 'long':4 'mandatori':1,7 'option':5,10 'short':9
(1 row)
Now, let's see the new functions.
First of them is setweight. We could have used it before, like:
$ select setweight(t, 'A') from test;
                              setweight                               
----------------------------------------------------------------------
 'argument':2A 'long':4A 'mandatori':1A,7A 'option':5A,10A 'short':9A
(1 row)
Which did set weight for all words in this tsvector (well, not words, lexemes).
Now, we can do it, by word:
$ select setweight(t, 'A', '{mandatory,long,short}') from test;
                            setweight                            
-----------------------------------------------------------------
 'argument':2 'long':4A 'mandatori':1,7 'option':5,10 'short':9A
(1 row)
Unfortunately, to make it work for mandatori, I'd have to provide it as it was vectorized:
$ select setweight(t, 'A', '{mandatori,long,short}') from test;
                             setweight                             
-------------------------------------------------------------------
 'argument':2 'long':4A 'mandatori':1A,7A 'option':5,10 'short':9A
(1 row)
Next function is delete(), which can be used to remove lexemes from tsvector. Like:
$ select delete(t, 'short') from test;
                       delete                        
-----------------------------------------------------
 'argument':2 'long':4 'mandatori':1,7 'option':5,10
(1 row)
or even multiple:
$ select delete(t, '{long,short}'::text[]) from test;
                   delete                   
--------------------------------------------
 'argument':2 'mandatori':1,7 'option':5,10
(1 row)
There is also unnest function which allows you to convert tsvector to set of rows with all the information:
select (unnest(t)).* from test;
  lexeme   | positions | weights 
-----------+-----------+---------
 argument  | {2}       | {D}
 long      | {4}       | {D}
 mandatori | {1,7}     | {D,D}
 option    | {5,10}    | {D,D}
 short     | {9}       | {D}
(5 rows)
Then, there are two functions for converting tsvectors to arrays, and back:
$ select tsvector_to_array(t) from test;
           tsvector_to_array            
----------------------------------------
 {argument,long,mandatori,option,short}
(1 row)
and:
$ select array_to_tsvector('{argument,long,mandatori,option,short}');
               array_to_tsvector                
------------------------------------------------
 'argument' 'long' 'mandatori' 'option' 'short'
(1 row)
As you can see this conversion is not perfect, as it loses data (position, and priority), but it can be useful together with delete() or setweight().
And finally, there is filter. Filter allows you to get only part of the tsvector.
For example, let's assume we have this tsvector:
$ select
    setweight(
        setweight(
            setweight(t, 'A', '{long,short}'),
            'B',
            '{argument,option}'
        ),
        'C',
        '{mandatori}'
    )
from test;
                              setweight                               
----------------------------------------------------------------------
 'argument':2B 'long':4A 'mandatori':1C,7C 'option':5B,10B 'short':9A
(1 row)
Filter makes it possible to fetch just elements with specific weights, like here:
$ select filter($$'argument':2B 'long':4A 'mandatori':1C,7C 'option':5B,10B 'short':9A$$, '{B,C}');
                     filter                      
-------------------------------------------------
 'argument':2B 'mandatori':1C,7C 'option':5B,10B
(1 row)
All in all, looks like great addition to tsearch. Thanks guys.

Comments

Popular posts from this blog

How to find the server is whether standby (slave) or primary(master) in Postgresql replication ?

7 Steps to configure BDR replication in postgresql

How to Get Table Size, Database Size, Indexes Size, schema Size, Tablespace Size, column Size in PostgreSQL Database

Ora2PG - Oracle/MySQL to Postgres DB migration Version 20.0

PostgreSQL Introduction