Skip to Content
Course content

8.1 Grouping and Aggregation: GROUP BY, HAVING

8.1.1 Introduction to Grouping and Aggregation

In SQL, grouping and aggregation are powerful tools used to summarize and analyze data. When working with large datasets, you often need to group records and perform calculations like counting, summing, or averaging data for each group. The GROUP BY clause allows you to group rows that share a common value, and the HAVING clause is used to filter groups based on aggregate conditions.

  • Grouping: Groups rows based on a specified column(s).
  • Aggregation: Calculates a single value (like sum, average, count, etc.) for each group.

8.1.2 GROUP BY Clause

The GROUP BY clause is used to arrange identical data into groups, allowing aggregate functions to be performed on each group.

  • Syntax:
    SELECT column_name, aggregate_function(column_name)
    FROM table_name
    WHERE condition
    GROUP BY column_name;
    
  • Explanation:
    • column_name: The column by which you want to group the data.
    • aggregate_function: Functions like SUM(), AVG(), COUNT(), etc., that perform calculations on the grouped data.
  • Example:
    SELECT Department, COUNT(*) AS NumberOfEmployees
    FROM Employee
    GROUP BY Department;
    
    • Explanation: This query groups the employees by their department and counts the number of employees in each department.

8.1.3 Aggregate Functions

When using GROUP BY, you often use aggregate functions to perform calculations on each group of data. Some common aggregate functions are:

  • COUNT(): Returns the number of rows in each group.
  • SUM(): Returns the sum of a numerical column for each group.
  • AVG(): Returns the average of a numerical column for each group.
  • MAX(): Returns the maximum value in a column for each group.
  • MIN(): Returns the minimum value in a column for each group.

Example with aggregate functions:

SELECT Department, AVG(Salary) AS AverageSalary
FROM Employee
GROUP BY Department;
  • Explanation: This query groups employees by their department and calculates the average salary for each department.

8.1.4 HAVING Clause

The HAVING clause is used to filter the results of a GROUP BY operation based on aggregate conditions. It is similar to the WHERE clause, but while WHERE filters rows before grouping, HAVING filters groups after the aggregation.

  • Syntax:
    SELECT column_name, aggregate_function(column_name)
    FROM table_name
    GROUP BY column_name
    HAVING condition;
    
  • Example:
    SELECT Department, AVG(Salary) AS AverageSalary
    FROM Employee
    GROUP BY Department
    HAVING AVG(Salary) > 50000;
    
    • Explanation: This query groups employees by department and calculates the average salary for each group, but only includes departments with an average salary greater than ₹50,000.

8.1.5 Using GROUP BY with Multiple Columns

You can group data by multiple columns to create more granular groupings.

  • Example:
    SELECT Department, JobTitle, COUNT(*) AS NumberOfEmployees
    FROM Employee
    GROUP BY Department, JobTitle;
    
    • Explanation: This query groups employees first by department, and then by job title, and counts the number of employees in each job title within each department.

8.1.6 Sorting Grouped Data

You can use the ORDER BY clause in conjunction with GROUP BY to sort the grouped data.

  • Example:
    SELECT Department, AVG(Salary) AS AverageSalary
    FROM Employee
    GROUP BY Department
    ORDER BY AverageSalary DESC;
    
    • Explanation: This query groups employees by department, calculates the average salary for each department, and sorts the result in descending order of the average salary.

8.1.7 NULL Values in Grouping

By default, GROUP BY treats NULL values as a single group. If you have NULL values in the grouped column, they will appear as a separate group in the results.

  • Example:
    SELECT Department, COUNT(*) AS NumberOfEmployees
    FROM Employee
    GROUP BY Department;
    
    • If some rows have NULL in the Department column, those rows will be grouped together and counted as NULL in the Department column.

Conclusion:

The GROUP BY and HAVING clauses are fundamental tools for aggregating and summarizing data in SQL. GROUP BY enables you to group rows based on one or more columns and perform calculations on each group, while HAVING allows you to filter the results of aggregated data. Together, they provide a powerful way to analyze large datasets and extract meaningful insights.

Commenting is not enabled on this course.