2 min readPostgreSQL

How Platform Engineers Can Automate Postgres Maintenance

Database maintenance is a crucial ingredient for sustained performance and data integrity. For platform engineers, automating routine tasks within a PostgreSQL environment ensures both reliability and relieves team members from repetitive manual work. Here's a quick rundown of effective approaches:

Fine-Tuning Autovacuum

PostgreSQL's built-in autovacuum daemon automates essential VACUUM and ANALYZE operations. To get the most out of it:

Verifying Autovacuum Enablement

Ensure autovacuum is actively running on your databases:

SHOW autovacuum;

(1 row)

Adjusting Autovacuum Thresholds

Adjust autovacuum thresholds to align with your database's update patterns and size. Postgres uses this calculation to trigger an autovacuum:

autovacuum threshold = autovacuum_vacuum_threshold + (table_size * autovacuum_vacuum_scale_factor)

The default threshold value is 50 tuples, and the default scale factor is 0.2. Thus, a table with 1000 tuples will be autovacuumed after 250 have been updated/deleted. But if you have a one million row table, vacuuming won’t happen until more than 200,000 tuples have been updated, possibly leading to degraded performance. Decreasing the autovacuum_vacuum_scale_factor as your table grows will lead to more vacuuming.

Cost-Based Vacuuming Techniques

Utilize autovacuum_vacuum_cost_delay and related parameters to fine-tune when vacuuming operations trigger, balancing performance impact with maintenance needs.

Scheduled Database Cleaning with Ansible

You might want to run scheduled VACUUM operations for predictive or aggressive cleanup. For on-demand VACUUM operations, create an Ansible playbook targeting your PostgreSQL databases. This offers centralized control and the ability to target specific databases or tables. This is easy using the community.general.postgresql_query module, which allows you to execute arbitrary SQL queries. This module is part of the community.general collection, so ensure you have it installed in your Ansible environment.

Here’s an example of an Ansible playbook that performs a VACUUM operation on a specific PostgreSQL database.

- name: Run VACUUM on PostgreSQL Database
  hosts: database_servers
  become: yes 

  - name: VACUUM PostgreSQL database
      db: "your_database_name"
      login_user: "your_database_user"
      login_password: "your_database_password"
      login_host: "localhost"
      login_port: "5432"
      query: "VACUUM;"
      register: vacuum_result

  - name: Print VACUUM output
      var: vacuum_result

Adjust this to your needs. For instance, use VACUUM ANALYZE instead of VACUUM to update statistics for the query planner.

Advanced VACUUM and ANALYZE with Custom Scripts

Write custom scripts (in your preferred language) for advanced VACUUM and ANALYZE operations. Vacuum tables based on factors like size, dead tuples, or the last modification time. If you’re working in Python, you can try something like this:

# conditional_vacuum.py

def vaccum_table(table_name):
    cursor = conn.cursor()

    # Get the number of dead tuples and total tuples in the table
      SELECT n_dead_tup, n_live_tup
      FROM pg_stat_user_tables
      WHERE relname = %s;
    """, (table_name,))

    result = cursor.fetchone()
    dead_tuples = result[0]
    total_tuples = result[0] + result[1]

    # Calculate the percentage of dead tuples
    dead_tuple_percentage = (dead_tuples / total_tuples) * 100

    # Logic for vacuuming - you can adjust the conditions below
        if dead_tuple_percentage > 20:
            cursor.execute(f"VACUUM ANALYZE {table_name}")
            print(f"Table {table_name} vacuumed and analyzed")
            print(f"Table {table_name} does not meet vacuum criteria")

By calculating the percentage of dead tuples, we ensure that the script triggers VACUUM only on tables that have a significant amount of reclaimable space. Customize these thresholds to your needs.

You then want to leverage tools like cron to execute your scripts during off-peak hours. Example for running the script daily at 2 AM:

0 2 * * * /usr/bin/python3 /path/to/your/conditional_vacuum.py

Index Fragmentation in PostgreSQL

Utilizing pg_repack and pgcompacttable for Index Maintenance

These extensions allow index rebuilding and defragmentation without the heavy locking restrictions commonly associated with index maintenance.

Index Reindexing for Optimal Performance

Set up scheduled jobs to identify and automatically reindex fragmented indexes during periods of low database usage.

By following these guidelines, platform engineers can efficiently automate their PostgreSQL database maintenance routines, enabling greater resource optimization and helping sustain database health over time.

Readyset Cloud for PostgreSQL

Readyset Cloud is a MySQL and PostgreSQL wire-compatible caching platform that offers significant database scalability enhancements while reducing the complexity of query optimization. Sign up for Readyset Cloud today.

Published by:

Marcelo Altmann
Marcelo Altmann