
5.4 Indexing for Optimization

Indexes in SQL are special data structures that improve the speed and efficiency of database operations, especially querying and searching. They act like pointers to the data in a table, allowing the database to locate specific rows more quickly. Indexing is one of the most effective ways to optimize database performance, particularly for large datasets. However, indexes do come with trade-offs in terms of storage and maintenance overhead, so it's important to use them wisely.

5.4.1 What is an Index?

An index is a database object that improves the speed of data retrieval operations on a table. It is created on one or more columns of a table and helps to quickly locate data without scanning the entire table.

In essence, indexes function similarly to the index of a book. If you need to find a specific topic, the index points you directly to the page number, rather than having to flip through every page.

5.4.2 How Indexes Work

When a query searches for data and no suitable index exists, the database engine typically performs a full table scan, which means it checks every row in the table to find the matching records. This can be time-consuming, especially for large tables.

An index organizes the indexed column(s) into a data structure (commonly a B-tree) that allows for faster searching. Instead of scanning every row, the database can use the index to directly find the location of the desired data, significantly speeding up query execution.

Example: If you have an index on the employee_id column of an employees table, searching for an employee by employee_id will be much faster since the database will use the index rather than scanning the entire table.
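The difference between a full scan and an index lookup can be observed by asking the database for its query plan. The following is a minimal sketch, assuming SQLite driven through Python's built-in sqlite3 module; the table contents are made up for illustration, and other DBMSs expose the same information through their own EXPLAIN commands with different output.

```python
import sqlite3

# Assumption: an in-memory SQLite database with made-up employee rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, name TEXT, department_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(i, f"employee_{i}", i % 10) for i in range(1, 1001)],
)

def plan(sql):
    """Return the human-readable part of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail

query = "SELECT name FROM employees WHERE employee_id = 500"

plan_before = plan(query)  # no index yet: SQLite reports a SCAN of the table
conn.execute("CREATE INDEX idx_employee_id ON employees (employee_id)")
plan_after = plan(query)   # with the index: a SEARCH using idx_employee_id

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text varies between SQLite versions, so it is safer to look for the index name in the output than to match the whole string.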

5.4.3 Types of Indexes

There are several types of indexes in SQL, each with its own use cases:

1. Single-Column Index

A single-column index is created on a single column in a table. It is useful when queries filter or sort based on that particular column.

Syntax:

CREATE INDEX index_name ON table_name (column_name);

Example:

CREATE INDEX idx_employee_id ON employees (employee_id);

2. Multi-Column Index (Composite Index)

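A multi-column index is created on multiple columns. It can improve performance when queries involve multiple columns in the WHERE clause or in joins. Column order matters: most engines can use the index efficiently only when the query filters on its leading column(s).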

Syntax:

CREATE INDEX index_name ON table_name (column1, column2, ...);

Example:

CREATE INDEX idx_employee_dept ON employees (employee_id, department_id);
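Whether a composite index helps depends on which of its columns a query filters on. The sketch below, again assuming SQLite through Python's sqlite3 module and made-up rows, checks the query plan for three filters against idx_employee_dept. The behavior shown is SQLite's default; other query planners may handle the third case differently.

```python
import sqlite3

# Assumption: in-memory SQLite database with illustrative data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(i, i % 10, f"employee_{i}") for i in range(1, 1001)],
)
conn.execute(
    "CREATE INDEX idx_employee_dept ON employees (employee_id, department_id)"
)

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# Filter on both indexed columns: the index can be used.
both = plan("SELECT name FROM employees WHERE employee_id = 42 AND department_id = 2")
# Filter on the leading column only: the index can still be used.
leading = plan("SELECT name FROM employees WHERE employee_id = 42")
# Filter on the second column only: SQLite falls back to scanning the table.
trailing = plan("SELECT name FROM employees WHERE department_id = 2")

for label, text in [("both", both), ("leading", leading), ("trailing", trailing)]:
    print(f"{label:9s}{text}")
```

If queries often filter on department_id alone, a separate index on that column, or a composite index that lists it first, is usually the better fit.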

3. Unique Index

A unique index ensures that the values in the indexed column(s) are unique. It is often used to enforce primary key or unique constraints.

Syntax:

CREATE UNIQUE INDEX index_name ON table_name (column_name);

Example:

CREATE UNIQUE INDEX idx_unique_email ON users (email);
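To see the uniqueness guarantee in action, the following sketch inserts the same email twice. It assumes SQLite through Python's sqlite3 module, and the users table layout and address are hypothetical; other DBMSs raise their own duplicate-key errors.

```python
import sqlite3

# Assumption: a minimal users table invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_unique_email ON users (email)")

conn.execute("INSERT INTO users (email) VALUES ('ada@example.com')")

try:
    # A second row with the same email violates the unique index.
    conn.execute("INSERT INTO users (email) VALUES ('ada@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print("rows stored:", row_count)
```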

4. Full-Text Index

A full-text index is used for searching large text fields in a table. It allows for efficient searching of text within TEXT or VARCHAR columns using full-text search capabilities.

Syntax (MySQL example):

CREATE FULLTEXT INDEX index_name ON table_name (column_name);

Example:

CREATE FULLTEXT INDEX idx_fulltext_article ON articles (content);

5. Clustered Index

A clustered index determines the physical order of data rows in a table, so a table can have only one. Whether the primary key is clustered depends on the DBMS: SQL Server and MySQL (InnoDB) cluster on the primary key by default, whereas PostgreSQL stores rows in an unordered heap and builds the primary key as an ordinary B-tree index.

5.4.4 When to Use Indexes

Indexing is not always necessary. Here are some guidelines to help determine when to use indexes:

1. Frequently Queried Columns

If certain columns are frequently used in WHERE, JOIN, or ORDER BY clauses, they may benefit from an index. These are usually the columns you use to filter, join, or sort data.

Example: If you're often filtering data by the employee_id column, creating an index on that column can improve query performance.

2. Primary Key and Unique Constraints

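Primary keys and unique constraints should always be backed by an index, since the index both enforces data integrity and optimizes lookups. Most DBMSs create this index automatically when the constraint is declared, so a separate CREATE INDEX statement is usually unnecessary.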

3. Foreign Key Columns

Indexes on foreign key columns are often beneficial, especially in large tables. These indexes can speed up joins and the referential integrity checks that run when rows in the referenced table are updated or deleted.

4. Large Tables

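If your tables hold a significant amount of data, indexes can significantly reduce the time required to search for rows and to locate the rows targeted by UPDATE and DELETE statements. Inserts do not get faster: each additional index adds work to every insert.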

5.4.5 Costs and Trade-offs of Indexes

While indexes can greatly improve read performance, they come with certain costs:

1. Storage Overhead

Indexes consume additional storage space. The more indexes you have, the more disk space your database will require.

2. Slower Write Performance

Indexes need to be updated whenever data in the indexed columns is modified (insert, update, delete). This means that while querying becomes faster, writing data to the table can be slower due to the additional overhead of maintaining the index.

3. Index Maintenance

Over time, especially in heavily updated tables, indexes can become fragmented. You may need to periodically rebuild or optimize indexes to maintain performance.
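The storage cost can be made concrete by measuring database size before and after an index is created. This is a rough sketch, assuming SQLite through Python's sqlite3 module, with a made-up users table; it uses SQLite's PRAGMA page_count, and other DBMSs report index sizes through their own catalog views or functions.

```python
import sqlite3

# Assumption: in-memory SQLite database filled with synthetic rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)
conn.commit()

def db_pages():
    """Number of pages currently allocated to the database."""
    return conn.execute("PRAGMA page_count").fetchone()[0]

pages_before = db_pages()
conn.execute("CREATE INDEX idx_users_email ON users (email)")
conn.commit()
pages_after = db_pages()

# The index stores its own copy of every email value plus a row reference,
# so the database grows even though no rows were added.
print("pages before index:", pages_before)
print("pages after index: ", pages_after)
```

Write overhead can be measured in a similar way by timing bulk inserts with and without the index, although the numbers depend heavily on the machine and the DBMS.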

5.4.6 Managing Indexes

1. Viewing Existing Indexes

To view the indexes on a table, you can use the following command (depending on the DBMS):

  • MySQL:
    SHOW INDEXES FROM table_name;
    
  • PostgreSQL (psql meta-commands: \d shows one table along with its indexes, \di lists all indexes):
    \d table_name
    

2. Dropping an Index

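To drop an index that is no longer useful, use the DROP INDEX command. The exact syntax depends on the DBMS: PostgreSQL accepts the form below, while MySQL and SQL Server also require the table name (DROP INDEX index_name ON table_name;).

DROP INDEX index_name;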

Example:

DROP INDEX idx_employee_id;
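Listing and dropping can be combined into a quick check that an index is really gone. The sketch below assumes SQLite through Python's sqlite3 module, where index metadata lives in the sqlite_master catalog table; this stands in for the MySQL and PostgreSQL commands above and is not a replacement for them.

```python
import sqlite3

# Assumption: in-memory SQLite database with an illustrative employees table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department_id INTEGER)")
conn.execute("CREATE INDEX idx_employee_id ON employees (employee_id)")

def index_names(table):
    """Return the names of all indexes defined on the given table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ? "
        "ORDER BY name",
        (table,),
    )
    return [row[0] for row in rows]

indexes_before = index_names("employees")
conn.execute("DROP INDEX idx_employee_id")
indexes_after = index_names("employees")

print("before drop:", indexes_before)
print("after drop: ", indexes_after)
```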

3. Rebuilding or Optimizing Indexes

For databases that support it (e.g., SQL Server, MySQL), you can rebuild or optimize indexes to improve performance and reduce fragmentation.

  • MySQL:
    OPTIMIZE TABLE table_name;
    
  • SQL Server:
    ALTER INDEX ALL ON table_name REBUILD;
    

5.4.7 Best Practices for Indexing

  • Index selective columns: Index columns that are frequently used in queries with WHERE, JOIN, and ORDER BY clauses. The more selective a column is (i.e., fewer duplicate values), the more beneficial the index will be.
  • Avoid over-indexing: Too many indexes can degrade the performance of write operations. Index only the columns that will significantly speed up query performance.
  • Index foreign keys: Index columns that are used as foreign keys to speed up join operations.
  • Monitor index usage: Regularly review and analyze index usage to identify unused or redundant indexes that can be dropped.
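Selectivity can be estimated before deciding on an index by dividing the number of distinct values in a column by the total number of rows. The following sketch assumes SQLite through Python's sqlite3 module and synthetic data; the selectivity helper is hypothetical and written only for this illustration.

```python
import sqlite3

# Assumption: 1,000 made-up employees spread evenly over 10 departments.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(i, i % 10) for i in range(1, 1001)],
)

def selectivity(column):
    """Distinct values divided by total rows; values near 1.0 are highly selective.

    The column name is interpolated into the SQL text, so pass only trusted,
    hard-coded names to this helper.
    """
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM employees"
    ).fetchone()
    return distinct / total

sel_employee_id = selectivity("employee_id")      # every value is unique
sel_department_id = selectivity("department_id")  # only 10 distinct values

print("employee_id:  ", sel_employee_id)
print("department_id:", sel_department_id)
```

A column such as employee_id, where nearly every value is distinct, is a strong index candidate; a low-selectivity column such as department_id may still be worth indexing when it is used in joins or combined with other columns in a composite index.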

5.4.8 Conclusion

Indexing is a critical technique for optimizing database query performance. By creating appropriate indexes, you can significantly improve the speed of data retrieval, reduce query execution time, and enhance user experience, especially for large datasets. However, it's important to balance these benefits against the costs of storage, slower writes, and ongoing maintenance.
