Drop Down MenusCSS Drop Down MenuPure CSS Dropdown Menu

Postgresql Memory flow and OS caching

  • Postgres as a cross platform database, relies heavily on the operating system for its caching.
  • The shared_buffers is actually duplicating what the OS does.A typical picture of how the data flows through postgres is given below.
  • This is confusing at first, since caching is managed by both the OS and postgres as well, but there are reasons to this.
  • Talking about operating system cache requires another post of its own, but there are many resources on the net which can leveraged.
  • Keep in mind that the OS caches data for the same reason we saw above, i.e why do we need a cache ?
  • We can classify the I/O as two types, i.e reads and writes. To put it even more simpler, data flows from disk to memory for reads and flows from memory to disk for writes.

Data Reads:
  • For reads, when you consider the flow diagram above, the data flows from disk to OS cache and then to shared_buffers. We have already discussed how the pages will get pinned on to the shared_buffers until they get dirty/unpinned.
  • Sometimes, both the OS cache and shared_buffers can hold the same pages. This may lead for space wastage, but remember the OS cache is using a simple LRU and not a database optimized clock sweep. Once the pages take a hit on shared_buffers, the reads never reach the OS cache, and if there are any duplicates, they get removed easily.
  • In reality, there are not much pages which gets stacked on both the memory areas.
  • This is one of the reasons why it is advised to size the shared_buffers carefully. Using hard and fast rules such as giving it the lion’s share of the memory or giving it too little is going to hurt performance.

Data Writes:
  • Writes flow from memory to disk. This is where the concept of dirty pages come in.
  • Once a page is marked dirty, it gets flushed to the OS cache which then writes to disk. This is where the OS has more freedom to schedule I/O based on the incoming traffic.
  • As said above, if the OS cache size is less, then it cannot re-order the writes and optimize I/O. This is particularly important for a write heavy workload. So the OS cache size is important as well.

  • As with many database systems, there is no silver bullet configuration which will just work. PostgreSQL ships with a basic configuration tuned for wide compatibility rather than performance.
  • It is the responsibility of the database administrator/developer to tune the configuration according to the application/workload. However, the folks at postgres have a good documentation of where to start
  • Once the default/startup configuration is set. We can do load/performance testing to see how it is holding up.
  • Keep in mind that the initial configuration is also tuned for availability rather than performance, it is better to always experiment and arrive at a config that is more suitable for the workload under consideration.
Postgres Optimize:
  • If you cannot measure something, you cannot optimize it
  • With postgres, there are two ways you can measure.

Operating system:
  • While there is no general consensus on which platform postgres works best, I am assuming that you are using something in the linux family of operating systems. But the idea is kind of similar.
  • To start with, there is a tool called Io top which can measure disk I/O. Similar to top, this can come in handy when measuring disk I/O. Just run the command iotop to measure writes/reads.
  • This can give useful insights into how postgres is behaving under load i.e how much is hitting the disk and how much is from RAM which can be arrived based on the load being generated.

Directly from postgres:
It is always better to monitor something directly from postgres,rather than going through the OS route.
Typically we would do OS level monitoring if we believe that there is something wrong with postgres itself, but this is rarely the case.
With postgres, there are several tools at our disposal for measuring performance with respect to memory.

  • Explain:The default is SQL explain.Gives more information than any other database system, but a little hard to get your head around. Needs practice to get used to. Don’t miss the several useful flags, that can be given especially buffers which we previously saw. 
  • Query logs:Query logs are another way to understand what is happening inside the system.Instead of logging everything, we can log only queries that cross certain duration or otherwise called slow query logs using the log_min_duration_statement parameter.
  • Auto explain:This is another cool thing that you can do which will automatically log the execution plan along with the slow queries. Very useful for debugging without having the need to run explain by hand.
  • Pg stat statements:The above methods are good, but lack a consolidated view.This is a module built within postgres itself, but disabled by default.We can enable this by doing create extension pg_stat_statementsOnce this is enabled, after a fair amount of queries are run, then we can fire a query such as below.Gives lot of details on how much time queries took and their average.The disadvantage with this approach is it takes some amount of performance, so it is not generally recommended in production systems.
        (total_time / 1000 / 60)::decimal as total_minutes,
        (total_time/calls)::decimal as average_time,
      FROM pg_stat_statements
      LIMIT 100;
PG Buffer cache and PG fincore
If you want to get a little deeper, then there are two modules which can dig directly into shared_buffers and OS cache itself.
An important thing to note is that the explain (analyze,buffers) shows data from shared_buffers only and not from the OS cache.
  • PG buffer cache:This helps us see the data in shared buffers in real time. Collects information from shared_buffers and puts it inside of pg_buffercache for viewing.A sample query goes as below, which lists the top 100 tables along with the number of pages cached.
SELECT c.relname,count(*) AS buffers
    FROM pg_class c INNER JOIN pg_buffercache b
    ON b.relfilenode=c.relfilenode INNER JOIN pg_database d
    ON (b.reldatabase=d.oid AND 
   GROUP BY c.relname ORDER BY 2 DESC LIMIT 100;
  • PG fincore:This an external module, that gives information on how the OS caches the pages. It is pretty low level and also very powerful.
  • Pg prewarm:This is an in built module that can actually load the data into shared_buffers/OS cache or both. If you think memory warm up is the problem, then this is pretty useful for debugging.


Popular posts from this blog

How to find the server is whether standby (slave) or primary(master) in Postgresql replication ?

7 Steps to configure BDR replication in postgresql

How to Get Table Size, Database Size, Indexes Size, schema Size, Tablespace Size, column Size in PostgreSQL Database

Ora2PG - Oracle/MySQL to Postgres DB migration Version 20.0

PostgreSQL Introduction