Automating PostgreSQL Partitioning for Large-Scale Time-Series Data with pg_partman
Lukas Schneider
DevOps Engineer · Leapcell

Introduction
In the era of big data, managing ever-growing datasets is a critical challenge, especially when dealing with time-series information. Imagine IoT sensor readings, financial transaction logs, or network telemetry data – these datasets can quickly escalate to gigabytes, then terabytes, and even petabytes. As the data volume grows, so do the complexities of querying, archiving, and maintaining these tables. Simply appending data to a single monolithic table will inevitably lead to performance degradation, increased vacuuming overhead, and prolonged index rebuilds. This is where database partitioning comes into play, offering a strategy to divide large tables into smaller, more manageable pieces based on a specific criterion, often time. While PostgreSQL provides native partitioning capabilities, manually managing thousands of partitions, including their creation, deletion, and indexing, becomes an administrative nightmare. This article dives into pg_partman, a powerful extension designed to automate this very process, making the management of ultra-large time-series data tables in PostgreSQL both efficient and practically feasible.
Understanding Time-Series Data Management
Before we delve into pg_partman, let's clarify some core concepts essential for understanding its utility.
- Time-Series Data: Data points indexed by time, typically collected over a period. Its common characteristics include high insert rates, append-only nature (or infrequent updates), and queries often filtered by time ranges.
- Database Partitioning: A technique to divide a large logical table into smaller physical pieces called partitions. In PostgreSQL, this is typically done using RANGE or LIST partitioning. For time-series data, RANGE partitioning on a timestamp column is the most common approach, creating partitions for specific time intervals (e.g., daily, weekly, monthly).
- Declarative Partitioning: Since PostgreSQL 10, partitioning can be defined directly within the CREATE TABLE statement using PARTITION BY RANGE or PARTITION BY LIST. Child partitions are then created with CREATE TABLE ... PARTITION OF ....
- Partition Pruning: An optimization where the query planner examines the WHERE clause of a query and eliminates (prunes) partitions that cannot contain relevant data, significantly reducing the amount of data scanned.
- Retention Policy: A set of rules defining how long data should be kept. For time-series data, older data often becomes less relevant or needs to be moved to cheaper storage, necessitating a robust archiving and purging mechanism.
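To make partition pruning concrete, here is a minimal sketch of a range-partitioned table and a query the planner can prune (table and partition names are illustrative, not part of the example that follows later):

```
-- A minimal range-partitioned table
CREATE TABLE metrics (
    recorded_at TIMESTAMPTZ NOT NULL,
    value       NUMERIC
) PARTITION BY RANGE (recorded_at);

CREATE TABLE metrics_2024_01 PARTITION OF metrics
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE metrics_2024_02 PARTITION OF metrics
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- With a time filter, the planner scans only metrics_2024_02;
-- EXPLAIN shows metrics_2024_01 pruned from the plan.
EXPLAIN SELECT * FROM metrics
WHERE recorded_at >= '2024-02-10' AND recorded_at < '2024-02-11';
```

The same mechanism is what makes time-filtered queries on a pg_partman-managed table fast: only the partitions overlapping the WHERE clause are touched.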
pg_partman addresses the gap in automating the lifecycle management of these declarative partitions. While PostgreSQL provides the building blocks, pg_partman orchestrates the creation, expiration, and indexing of partitions, freeing administrators from tedious manual tasks.
How pg_partman Works
pg_partman operates by maintaining a metadata table (partman.part_config) that defines the partitioning scheme for each parent table it manages. Based on this configuration, it dynamically creates future partitions and drops stale ones according to your specified retention policies. It's typically run as a scheduled job (e.g., via cron).
Let's walk through an example of how to set up pg_partman for a time-series table.
1. Install pg_partman
First, you need to install the extension. This usually involves installing a package (e.g., postgresql-14-partman on Debian/Ubuntu) and then enabling it within your database:
CREATE EXTENSION pg_partman;
2. Create Your Parent Table
Let's assume we have a table sensor_data that stores sensor readings, with a timestamp column.
CREATE TABLE public.sensor_data (
    id                BIGSERIAL NOT NULL,
    sensor_id         INT NOT NULL,
    measurement_value NUMERIC(10, 2) NOT NULL,
    event_timestamp   TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (event_timestamp);

-- It's good practice to create a default partition for any data that
-- falls outside the range of the defined partitions.
CREATE TABLE public.sensor_data_default PARTITION OF public.sensor_data DEFAULT;
Notice the PARTITION BY RANGE (event_timestamp) clause. This tells PostgreSQL that sensor_data is a partitioned table based on the event_timestamp column.
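Inserts always target the parent table; PostgreSQL routes each row to the matching child. A quick way to verify routing is to inspect tableoid (at this point only the default partition exists, so new rows land there until time-based children are created):

```
INSERT INTO public.sensor_data (sensor_id, measurement_value)
VALUES (42, 21.50);

-- tableoid::regclass reveals the physical child table holding each row
SELECT tableoid::regclass AS partition, sensor_id, event_timestamp
FROM public.sensor_data
LIMIT 5;
```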
3. Configure pg_partman
Now, we tell pg_partman to manage public.sensor_data. We'll set up daily partitions, pre-create partitions 5 days into the future, and retain data for 30 days.
SELECT partman.create_parent(
    p_parent_table => 'public.sensor_data',
    p_control      => 'event_timestamp',
    p_type         => 'native',
    p_interval     => 'daily',
    p_premake      => 5
);

-- Retention is not a create_parent parameter; it is configured in
-- pg_partman's metadata table:
UPDATE partman.part_config
SET retention = '30 days',
    retention_keep_table = false
WHERE parent_table = 'public.sensor_data';
Let's break down these parameters:
- p_parent_table: The schema-qualified name of the parent table (public.sensor_data).
- p_control: The column the table is partitioned on (event_timestamp).
- p_type: The partitioning mechanism; 'native' uses PostgreSQL's built-in declarative partitioning.
- p_interval: The duration of each partition ('daily'). Other options include 'hourly', 'weekly', 'monthly', 'yearly', or arbitrary intervals such as '3 hours' or '15 minutes'.
- p_premake: How many future partitions pg_partman should create in advance. A value of 5 means partitions always exist for the next 5 days, so incoming data never lacks a destination.
- retention: Unlike the parameters above, retention is set in the partman.part_config metadata table rather than passed to create_parent. '30 days' means partitions older than 30 days are dropped (or detached) during maintenance.
After running create_parent, pg_partman will immediately create the first set of child partitions. You can check them using:
\d+ public.sensor_data
You'll see child partitions such as sensor_data_p2023_10_26, sensor_data_p2023_10_27, etc.
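You can also inspect the configuration pg_partman recorded for the table directly in its metadata table (column list abbreviated here):

```
-- Each managed parent table has one row in partman.part_config
SELECT parent_table, control, partition_interval, premake, retention
FROM partman.part_config
WHERE parent_table = 'public.sensor_data';
```

Editing this row (e.g., changing retention or premake) is how you adjust the partitioning policy after the initial setup.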
4. Run the Maintenance Function
The core of pg_partman's automation lies in its maintenance function. This function should be called periodically.
SELECT partman.run_maintenance();
When run_maintenance() is executed, it performs several tasks based on the partman.part_config table:
- Create New Partitions: It checks the premake setting and creates any necessary future partitions.
- Drop Old Partitions: It checks the retention setting and drops partitions older than the specified retention period. This is crucial for managing storage and performance.
- Attach Partitions: If you have historical data in non-partitioned tables, pg_partman can help attach them.
- Manage Indexes: It can automatically create indexes on new partitions based on the parent table's index definitions.
5. Schedule Maintenance
The run_maintenance() call should be scheduled to run regularly. A common approach is using cron on the database server or a dedicated scheduler. For example, to run every hour:
# In your crontab (e.g., crontab -e)
0 * * * * psql -d your_database -c "SELECT partman.run_maintenance();"
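If you prefer to keep scheduling inside the database itself, the pg_cron extension (assuming it is installed and enabled) can run the same call on an hourly schedule:

```
-- Requires the pg_cron extension; runs maintenance at the top of every hour
SELECT cron.schedule(
    'partman-maintenance',
    '0 * * * *',
    $$SELECT partman.run_maintenance()$$
);
```

This avoids depending on an OS-level crontab, which is convenient on managed PostgreSQL services where you may not have shell access.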
Application Scenarios and Advanced Features
- Historical Data Migration: pg_partman provides functions to move existing data from a non-partitioned table into its newly created partitioned children.
- Custom Naming Conventions: You can customize the naming of your child partitions if the default tablename_pYYYY_MM_DD format isn't suitable.
- Index Management: pg_partman can be configured to create indexes on new partitions, mirroring those on the parent table. This is essential for query performance.
- Templates: For more complex partitioning schemes, or to apply specific table/index storage parameters to new partitions, pg_partman supports template tables.
- Background Worker: For very active systems, pg_partman can also run as a background worker process directly within PostgreSQL, avoiding external scheduling issues.
- Partition Archiving: Instead of simply dropping old partitions, pg_partman can be configured to detach them, allowing you to move them to cold storage or archive them before deletion.
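As a sketch of the archiving option above: the retention_keep_table column in partman.part_config tells maintenance to detach expired partitions rather than drop them, so they survive as standalone tables that can be dumped and archived before final removal:

```
-- Detach (rather than drop) partitions older than 30 days;
-- detached tables remain on disk as ordinary tables.
UPDATE partman.part_config
SET retention = '30 days',
    retention_keep_table = true
WHERE parent_table = 'public.sensor_data';
```

A typical follow-up is to pg_dump each detached table to cheap storage and then drop it manually.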
By automating these processes, pg_partman significantly reduces the operational overhead associated with managing large partitioned tables, allowing developers and DBAs to focus on data utilization rather than infrastructure management. It ensures that queries remain performant by leveraging partition pruning and that storage growth is controlled by regular data pruning.
Conclusion
Managing terabyte-scale time-series data tables in PostgreSQL without proper automation is an uphill battle, fraught with performance bottlenecks and administrative complexities. pg_partman emerges as an indispensable tool, seamlessly extending PostgreSQL's native partitioning capabilities with robust lifecycle management. It simplifies the creation, retention, and maintenance of thousands of partitions, ensuring optimal database performance and predictable storage costs. For any PostgreSQL deployment dealing with large, time-based datasets, pg_partman is not just a convenience, but a necessity for scalable and maintainable operations.

