TSVECTOR EDITING FUNCTIONS


- September 13, 2017

Tsvector editing functions
 
Adds several tsvector editing functions: convert tsvector to/from text array,
set weight for given lexemes, delete lexeme(s), unnest, filter lexemes
with given weights
 
Author: Stas Kelvich with some editorization by me
Reviewers: Tomas Vondra, Teodor Sigaev

For those who don't know: tsvector is the data type used by tsearch, PostgreSQL's full text search engine.

Basically, whenever you index data for tsearch, you use tsvector values, sometimes explicitly, sometimes not.

A tsvector value looks like this:

$ select to_tsvector('english', 'Mandatory arguments to long options are mandatory for short options too.');
                          to_tsvector                          
---------------------------------------------------------------
 'argument':2 'long':4 'mandatori':1,7 'option':5,10 'short':9
(1 row)

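As you can see, some words have been removed ("to", "are", "for", and "too", which are stop words), and the rest were reduced to their stems (mandatory => mandatori, arguments => argument). The remaining lexemes keep their original positions in the sentence.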
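
To check which stem a single word maps to, or whether it is treated as a stop word, you can ask the dictionary directly with ts_lexize(). A minimal sketch, assuming the built-in english_stem dictionary, which the english configuration uses for ordinary words:

```sql
-- Ask the built-in english_stem dictionary how it treats single words.
-- A stemmed word comes back as a one-element array; a stop word comes back as an empty array.
select ts_lexize('english_stem', 'mandatory') as mandatory,
       ts_lexize('english_stem', 'arguments') as arguments,
       ts_lexize('english_stem', 'to')        as stop_word;
```

This returns {mandatori}, {argument} and {} respectively, which is exactly the form the new functions below expect when they take lexemes as input.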

To keep the following examples short, let's store this tsvector in a test table:

$ create table test as select to_tsvector('english', 'Mandatory arguments to long options are mandatory for short options too.') as t;
SELECT 1
 
$ \d test
      Table "public.test"
 Column |   Type   | Modifiers 
--------+----------+-----------
 t      | tsvector | 
 
$ select * from test;
                               t                               
---------------------------------------------------------------
 'argument':2 'long':4 'mandatori':1,7 'option':5,10 'short':9
(1 row)

Now, let's see the new functions (they are available since PostgreSQL 9.6).

The first of them is setweight(). Its two-argument form already existed, for example:

$ select setweight(t, 'A') from test;
                              setweight                               
----------------------------------------------------------------------
 'argument':2A 'long':4A 'mandatori':1A,7A 'option':5A,10A 'short':9A
(1 row)

That sets the weight of all words in the tsvector (well, not words, lexemes).
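
Weights are usually used to record which part of a document a lexeme came from, so that ranking can favour, say, title matches over body matches. A minimal sketch of that common pattern with the two-argument setweight(); the title and body strings are made up for illustration:

```sql
-- Build one tsvector from a hypothetical title and body:
-- title lexemes get weight A, body lexemes are explicitly set to D (the default weight).
-- The || operator shifts the body positions past the last title position.
select setweight(to_tsvector('english', 'Long options'), 'A')
    || setweight(to_tsvector('english', 'Short options are mandatory.'), 'D') as doc;
```

A lexeme that appears in both parts (here "option") ends up with positions of both weights.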

Now we can also restrict it to selected lexemes, passing them as an array:

$ select setweight(t, 'A', '{mandatory,long,short}') from test;
                            setweight                            
-----------------------------------------------------------------
 'argument':2 'long':4A 'mandatori':1,7 'option':5,10 'short':9A
(1 row)

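Note that "mandatory" was silently ignored. The array has to contain lexemes exactly as they are stored in the tsvector, i.e. after stemming, so to make it work for mandatori I have to pass it in that form: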

$ select setweight(t, 'A', '{mandatori,long,short}') from test;
                             setweight                             
-------------------------------------------------------------------
 'argument':2 'long':4A 'mandatori':1A,7A 'option':5,10 'short':9A
(1 row)

The next function is ts_delete() (called delete() in the original patch), which removes lexemes from a tsvector:

$ select ts_delete(t, 'short') from test;
                      ts_delete
-----------------------------------------------------
 'argument':2 'long':4 'mandatori':1,7 'option':5,10
(1 row)

or even several at once, passing a text array:

$ select ts_delete(t, '{long,short}'::text[]) from test;
                 ts_delete
--------------------------------------------
 'argument':2 'mandatori':1,7 'option':5,10
(1 row)
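
The ::text[] cast in the previous query matters. ts_delete() has both a text and a text[] variant, and an untyped string literal resolves to the text one, so without the cast the whole string is treated as a single lexeme and nothing is deleted. A quick check, with the test tsvector inlined through a CTE so the snippet runs on its own:

```sql
-- Same tsvector as in the test table, inlined via a CTE.
with test as (
    select to_tsvector('english', 'Mandatory arguments to long options are mandatory for short options too.') as t
)
select ts_delete(t, '{long,short}') = t         as untyped_is_noop,  -- literal read as one lexeme "{long,short}"
       ts_delete(t, '{long,short}'::text[]) = t as typed_is_noop     -- array variant really removes long and short
from test;
```

The first column is true (nothing was removed), the second false.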

There is also an unnest() function, which converts a tsvector into a set of rows, one per lexeme, with its positions and weights:

$ select (unnest(t)).* from test;
  lexeme   | positions | weights 
-----------+-----------+---------
 argument  | {2}       | {D}
 long      | {4}       | {D}
 mandatori | {1,7}     | {D,D}
 option    | {5,10}    | {D,D}
 short     | {9}       | {D}
(5 rows)
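
Because unnest() returns ordinary rows, the usual SQL tools apply. For instance, a sketch that counts how often each lexeme occurs, based on the number of stored positions (the tsvector is inlined again so the query is standalone):

```sql
with test as (
    select to_tsvector('english', 'Mandatory arguments to long options are mandatory for short options too.') as t
)
-- One row per lexeme; the number of positions is the number of occurrences.
select u.lexeme, array_length(u.positions, 1) as occurrences
from test, unnest(test.t) as u
order by occurrences desc, u.lexeme;
```

Here mandatori and option come first with 2 occurrences each. Keep in mind that a tsvector stores no more than 256 positions per lexeme, so this undercounts very frequent words in long documents.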

Then, there are two functions for converting tsvectors to arrays, and back:

$ select tsvector_to_array(t) from test;
           tsvector_to_array            
----------------------------------------
 {argument,long,mandatori,option,short}
(1 row)

and:

$ select array_to_tsvector('{argument,long,mandatori,option,short}');
               array_to_tsvector                
------------------------------------------------
 'argument' 'long' 'mandatori' 'option' 'short'
(1 row)

As you can see, the round trip is lossy: positions and weights are dropped. Still, plain lexeme arrays are useful together with ts_delete() or setweight(), which take exactly such arrays.
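
One way to avoid the mandatory/mandatori problem from the setweight() example is to let the parser produce the lexemes: run the plain words through to_tsvector() with the same configuration, then turn the result into an array with tsvector_to_array(). A sketch, again with the test tsvector inlined:

```sql
with test as (
    select to_tsvector('english', 'Mandatory arguments to long options are mandatory for short options too.') as t
)
-- to_tsvector() stems the words ('mandatory' -> 'mandatori'), tsvector_to_array() drops positions,
-- and the resulting text[] is what setweight() and ts_delete() expect.
select setweight(t, 'A', tsvector_to_array(to_tsvector('english', 'mandatory long short'))) as weighted,
       ts_delete(t, tsvector_to_array(to_tsvector('english', 'arguments options')))         as trimmed
from test;
```

The weighted column matches the earlier result obtained by typing mandatori by hand.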

And finally, there is ts_filter() (called filter() in the original patch), which returns only part of the tsvector.

For example, let's assume we have this tsvector:

$ select
    setweight(
        setweight(
            setweight(t, 'A', '{long,short}'),
            'B',
            '{argument,option}'
        ),
        'C',
        '{mandatori}'
    )
from test;
                              setweight                               
----------------------------------------------------------------------
 'argument':2B 'long':4A 'mandatori':1C,7C 'option':5B,10B 'short':9A
(1 row)

ts_filter() keeps just the elements with specific weights, here B and C:

$ select ts_filter($$'argument':2B 'long':4A 'mandatori':1C,7C 'option':5B,10B 'short':9A$$, '{B,C}');
                    ts_filter
-------------------------------------------------
 'argument':2B 'mandatori':1C,7C 'option':5B,10B
(1 row)
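
Combined with tsvector_to_array(), ts_filter() can also list which lexemes carry a given weight, for example everything marked A (think title words). A small sketch on the same literal tsvector:

```sql
-- Keep only the positions weighted A, then drop positions and weights entirely.
select tsvector_to_array(
           ts_filter($$'argument':2B 'long':4A 'mandatori':1C,7C 'option':5B,10B 'short':9A$$, '{A}')
       ) as a_lexemes;
```

This returns {long,short}.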

All in all, this looks like a great addition to tsearch. Thanks, guys.

Postgresdba
