Create suitable indexes (single-column, composite indexes, etc.) for columns frequently used in queries.
Problematic SQL:
SELECT name FROM employees WHERE department_id = 10;
Optimization: Create an index on department_id:
CREATE INDEX idx_department_id ON employees(department_id);
Query only the required columns to reduce the amount of returned data.
Problematic SQL:
SELECT * FROM employees WHERE department_id = 10;
Optimization: Query only necessary columns:
SELECT name FROM employees WHERE department_id = 10;
Subqueries are often less efficient than equivalent JOINs, especially when the optimizer cannot rewrite them as semi-joins.
Problematic SQL:
SELECT name FROM employees WHERE department_id IN (SELECT id FROM departments WHERE location = 'New York');
Optimization: Use JOIN instead of subquery:
SELECT e.name FROM employees e JOIN departments d ON e.department_id = d.id WHERE d.location = 'New York';
Use EXPLAIN or EXPLAIN ANALYZE to view the execution plan of SQL queries and identify performance bottlenecks.
EXPLAIN SELECT name FROM employees WHERE department_id = 10;
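EXPLAIN ANALYZE (available in, for example, MySQL 8.0.18+ and PostgreSQL) goes further: it executes the query and reports actual row counts and timings alongside the plan:
EXPLAIN ANALYZE SELECT name FROM employees WHERE department_id = 10;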
ORDER BY consumes significant resources, especially on large datasets. Use it only when sorting is necessary.
Problematic SQL:
SELECT name FROM employees WHERE department_id = 10 ORDER BY hire_date;
Optimization: Remove ORDER BY if sorting is not needed.
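If sorting is genuinely required, a composite index matching the filter and sort columns lets the database return rows already in order and skip the sort step. A minimal sketch, assuming the employees table from the earlier examples (the index name is illustrative):
CREATE INDEX idx_department_hire_date ON employees(department_id, hire_date);
SELECT name FROM employees WHERE department_id = 10 ORDER BY hire_date;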
For pagination, use LIMIT
. For queries with large offsets, optimize using indexes or caching.
Problematic SQL:
SELECT name FROM employees ORDER BY hire_date LIMIT 100000, 10;
Optimization: Use keyset (seek) pagination: filter on the last value seen on the previous page and order by that same indexed column (here the primary key id), so the database seeks directly instead of scanning and discarding 100,000 rows:
SELECT name FROM employees WHERE id > 100000 ORDER BY id LIMIT 10;
Applying functions to indexed columns in the WHERE clause prevents index usage; avoid this where possible.
Problematic SQL:
SELECT name FROM employees WHERE YEAR(hire_date) = 2023;
Optimization: Use range queries instead:
SELECT name FROM employees WHERE hire_date BETWEEN '2023-01-01' AND '2023-12-31';
In composite indexes, place the column with higher selectivity first.
For the query:
SELECT * FROM employees WHERE department_id = 10 AND status = 'active';
If status is the more selective column in your data (many distinct values relative to the row count), place it first:
CREATE INDEX idx_status_department ON employees(status, department_id);
Batch inserts significantly reduce I/O and locking overhead.
Problematic SQL: Insert records one by one:
INSERT INTO employees (name, department_id) VALUES ('John', 10);
Optimization: Use batch inserts:
INSERT INTO employees (name, department_id) VALUES ('John', 10), ('Alice', 20), ('Bob', 30);
NOT IN can perform poorly and returns no rows if the subquery produces any NULLs; replace it with NOT EXISTS or a LEFT JOIN.
Problematic SQL:
SELECT name FROM employees WHERE department_id NOT IN (SELECT id FROM departments);
Optimization: Use LEFT JOIN:
SELECT e.name FROM employees e LEFT JOIN departments d ON e.department_id = d.id WHERE d.id IS NULL;
Use DISTINCT only when duplicate rows actually need to be removed.
Problematic SQL:
SELECT DISTINCT name FROM employees WHERE department_id = 10;
Optimization: Remove DISTINCT if duplicate removal is not needed.
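If duplicates cannot occur here (for example, name is unique within the department), the same query without DISTINCT skips the deduplication step:
SELECT name FROM employees WHERE department_id = 10;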
Prefer INNER JOIN unless you actually need unmatched rows. Avoid LEFT JOIN or RIGHT JOIN unnecessarily.
Problematic SQL:
SELECT e.name, d.name FROM employees e LEFT JOIN departments d ON e.department_id = d.id;
Optimization: Use INNER JOIN:
SELECT e.name, d.name FROM employees e INNER JOIN departments d ON e.department_id = d.id;
Partition large tables to improve query performance.
CREATE TABLE employees (
id INT,
name VARCHAR(50),
hire_date DATE
)
PARTITION BY RANGE (YEAR(hire_date)) (
PARTITION p2020 VALUES LESS THAN (2021),
PARTITION p2021 VALUES LESS THAN (2022)
);
Optimize GROUP BY queries with an index on the grouped columns.
SELECT department_id, COUNT(*) FROM employees GROUP BY department_id;
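As a sketch (assuming the idx_department_id index from the first tip is not already in place), an index on the grouped column lets the database aggregate by scanning the index instead of the whole table; confirm with EXPLAIN that the index is used:
CREATE INDEX idx_department_id ON employees(department_id);
EXPLAIN SELECT department_id, COUNT(*) FROM employees GROUP BY department_id;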
For large IN lists, store the values in a temporary table and use JOIN instead.
Problematic SQL:
SELECT name FROM employees WHERE department_id IN (1, 2, 3, 4, 5);
Optimization: Store IDs in a temporary table:
CREATE TEMPORARY TABLE temp_ids (id INT);
INSERT INTO temp_ids (id) VALUES (1), (2), (3), (4), (5);
SELECT e.name FROM employees e JOIN temp_ids t ON e.department_id = t.id;
Complex or deeply nested views add overhead when the optimizer cannot simplify them well. For performance-critical queries, consider replacing the view with a direct, optimized SQL statement against the base tables.
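A minimal sketch of the idea, using a hypothetical view v_employee_details over the tables from earlier examples; querying the base tables directly with only the needed columns and filters is often faster:
-- Hypothetical view that joins employees and departments
CREATE VIEW v_employee_details AS
SELECT e.id, e.name, d.name AS department_name, d.location
FROM employees e JOIN departments d ON e.department_id = d.id;
-- Instead of SELECT * FROM v_employee_details WHERE location = 'New York',
-- query the base tables directly with only the columns you need:
SELECT e.name, d.name AS department_name
FROM employees e JOIN departments d ON e.department_id = d.id
WHERE d.location = 'New York';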
Use appropriate locking mechanisms to avoid full-table locks; in MySQL/InnoDB, row-level locks can be taken with FOR UPDATE or LOCK IN SHARE MODE, provided the WHERE clause hits an index.
SELECT * FROM employees WHERE id = 10 FOR UPDATE;
In INSERT INTO ... SELECT statements, make sure the SELECT side can use an index on its filter columns to improve performance.
INSERT INTO employees_backup (id, name)
SELECT id, name FROM employees WHERE hire_date < '2020-01-01';
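As a sketch (assuming hire_date is not yet indexed), an index on the filter column speeds up the SELECT side of the statement above; note that every index on the target table employees_backup slows the insert itself:
CREATE INDEX idx_hire_date ON employees(hire_date);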
For frequent database operations, use connection pools to improve efficiency.
Configure a connection pool at the application level.
Adjust memory settings (e.g., MySQL’s innodb_buffer_pool_size) to match the workload's query and memory requirements.
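For example, in MySQL 5.7.5+ the InnoDB buffer pool can be resized at runtime; the 4 GB value below is purely illustrative, so size it to your working set and available RAM:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SET GLOBAL innodb_buffer_pool_size = 4294967296; -- 4 GB, illustrative value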
In distributed database environments, minimize cross-node data transfer and optimize query plans.
Problematic SQL:
SELECT e.name, d.name
FROM employees e JOIN departments d
ON e.department_id = d.id
WHERE e.location = 'New York';
Optimization: Process location-related data on the local node before global aggregation to avoid cross-node data transfer.
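One way to express this intent in plain SQL is to reduce the employee rows as early as possible, so only the filtered set participates in the (potentially cross-node) join; whether this helps in practice depends on the optimizer and the sharding layer:
SELECT e.name, d.name
FROM (SELECT name, department_id FROM employees WHERE location = 'New York') AS e
JOIN departments d ON e.department_id = d.id;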
When filtering on multiple columns, prefer a composite index; otherwise the database may fall back to merging several single-column indexes, which is usually slower.
Problematic SQL:
SELECT * FROM orders WHERE customer_id = 10 AND product_id = 5;
Optimization: Combine customer_id and product_id into a single composite index, and use EXPLAIN to check whether the composite index (or index merging) is being used.
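A sketch of the composite index (the index name is illustrative) followed by the EXPLAIN check:
CREATE INDEX idx_customer_product ON orders(customer_id, product_id);
EXPLAIN SELECT * FROM orders WHERE customer_id = 10 AND product_id = 5;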
Use CUBE and ROLLUP for multi-dimensional aggregation instead of running multiple GROUP BY queries (MySQL supports only WITH ROLLUP; CUBE is available in databases such as PostgreSQL, Oracle, and SQL Server).
Problematic SQL: Multiple separate GROUP BY queries:
SELECT department_id, SUM(sales) FROM sales_data GROUP BY department_id;
SELECT region, SUM(sales) FROM sales_data GROUP BY region;
Optimization: Use ROLLUP to aggregate at multiple levels in a single query:
SELECT department_id, region, SUM(sales)
FROM sales_data
GROUP BY department_id, region WITH ROLLUP;
Window functions (e.g., ROW_NUMBER(), RANK(), LAG(), LEAD()) simplify complex analysis and reduce the need for self-joins or subqueries.
Problematic SQL: A correlated subquery to fetch the previous record:
SELECT a.*,
(SELECT sales FROM sales_data b WHERE b.id = a.id - 1) AS previous_sales
FROM sales_data a;
Optimization: Use a window function:
SELECT id, sales, LAG(sales, 1) OVER (ORDER BY id) AS previous_sales
FROM sales_data;
Use partition pruning to limit the data scan range for very large tables.
Problematic SQL:
SELECT * FROM transactions WHERE transaction_date BETWEEN '2023-01-01' AND '2023-01-31';
Optimization: Partition the table by date and leverage pruning:
CREATE TABLE transactions (
id INT,
amount DECIMAL(10, 2),
transaction_date DATE
)
PARTITION BY RANGE (YEAR(transaction_date)) (
PARTITION p2023 VALUES LESS THAN (2024)
);
Reduce temporary table usage in complex queries as they increase disk I/O and impact performance.
Problematic SQL: Using a temporary table to store intermediate results.
CREATE TEMPORARY TABLE temp_sales AS
SELECT department_id, SUM(sales) FROM sales_data GROUP BY department_id;
Optimization: Use subqueries or Common Table Expressions (CTEs, available in MySQL 8.0+):
WITH temp_sales AS (
SELECT department_id, SUM(sales) FROM sales_data GROUP BY department_id
)
SELECT * FROM temp_sales;
Leverage parallel query execution for large datasets to improve efficiency.
Problematic SQL: A large data scan without parallelism.
SELECT SUM(sales) FROM sales_data;
Optimization: Enable parallel query execution (the statements below use Oracle syntax; support and syntax vary by database):
ALTER SESSION ENABLE PARALLEL QUERY;
SELECT /*+ PARALLEL(sales_data, 4) */ SUM(sales) FROM sales_data;
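As a rough PostgreSQL equivalent, parallelism is controlled through settings such as max_parallel_workers_per_gather; MySQL currently offers very limited parallel query support:
SET max_parallel_workers_per_gather = 4;
SELECT SUM(sales) FROM sales_data;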
For complex aggregation queries, use materialized views to store precomputed results.
Problematic SQL: Complex aggregation query with performance bottlenecks.
SELECT department_id, SUM(sales) FROM sales_data GROUP BY department_id;
Optimization: Create a materialized view (natively supported in, e.g., Oracle and PostgreSQL; MySQL needs an ordinary summary table refreshed by a scheduled job):
CREATE MATERIALIZED VIEW mv_sales_data AS
SELECT department_id, SUM(sales) AS total_sales FROM sales_data GROUP BY department_id;
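A materialized view holds a snapshot, so it must be refreshed when the base data changes; in PostgreSQL, for example:
REFRESH MATERIALIZED VIEW mv_sales_data;
SELECT department_id, total_sales FROM mv_sales_data;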
In high-concurrency environments, keep locks as narrow as possible; avoid statements that lock whole tables or large ranges of rows.
Problematic SQL: A locking read over a broad condition locks every matching (or scanned) row and blocks many concurrent transactions:
SELECT * FROM orders WHERE status = 'pending' FOR UPDATE;
Optimization: Lock only specific rows:
SELECT * FROM orders WHERE order_id = 123 FOR UPDATE;
For long-running transactions, minimize locking time and reduce the scope of locks.
Problematic SQL: A transaction that holds row locks on several accounts until it commits:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;
Optimization: Keep each transaction as short as possible and split independent operations into separate transactions (only appropriate when the operations do not need to be atomic; a real funds transfer should stay in a single transaction):
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
COMMIT;
BEGIN;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;
Optimizing SQL queries is both an art and a science.
The techniques outlined above provide a robust foundation for improving query performance, but the key to true mastery lies in constant experimentation and adaptation.
Every database is unique: what works in one scenario may not work in another. Always analyze, test, and refine your queries to build your own optimization playbook.