Select Then Group by and Then Select and Group by Again

By: Aubrey Love   |   Updated: 2021-09-09


Problem

So, you have a basic understanding of the GROUP BY clause in SQL Server, but you still feel like there is more to this elementary clause than you have been taught. Well, you would be right, at least for some people. A majority of us learned the basics, but no one ever went any further and explained the full spectrum of this clause, what it is capable of, and how to take advantage of its variety. I've seen plenty of training videos and attended a few classes that attempted to teach the basics of SQL Server / T-SQL, and so far, none of them have delved deep into the full abilities of the GROUP BY clause.

Solution

In this tutorial we will cover the basics of the GROUP BY clause, and then we will delve further in and try to expand on all, or at least most, of the abilities of the clause and how you can take advantage of its options. Kind of a GROUP BY on steroids.

We will start with the basics, then add in features like CUBE, ROLLUP and GROUPING SETS, and we will also discuss other benefits as well as the limitations of the GROUP BY clause.

Getting AdventureWorks Sample Database for Testing

For simplicity's sake, and to keep with a standard test database, we will be working with the AdventureWorks2014 database. If you already have this sample database installed, don't worry, we will not be changing any of the tables or data. We will, however, create some new tables along with a new schema to work with. Afterwards, we will just drop the tables as well as any schemas we create. (If you so choose.)

If you do not have the AdventureWorks2014 database installed already, you can get a backup (BAK) version for free at this link: AdventureWorks sample databases

Once it's downloaded, just follow the basic steps to restore from a ".BAK" file in SQL Server Management Studio.

If you don't want to sift through the above referenced webpage to find the right database, you can click this link to start a direct download from Microsoft's GitHub repository.

Now, let's dive right in and learn about the GROUP BY clause from the ground up.

GROUP BY Statement Basics

In the code block below, you will find the basic syntax of a simple SELECT statement with a GROUP BY clause.

SELECT columnA, columnB
FROM tableName
GROUP BY columnA, columnB;
GO

At the core, the GROUP BY clause defines a group for each distinct combination of values in a grouped element.

In simpler terms, the GROUP BY clause combines rows into groups based on matching data in specified columns of a table. One row will be returned for each group.

For example, if you have a column named "Title" in your table and it has three values (manager, developer, and clerk), but the table has 20 rows, there will be duplicate entries of the three values in the "Title" column even though you have unique persons assigned to each row in the "Name" column. The GROUP BY clause will break all 20 rows into three groups and return only three rows of data, one for each group.
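A scaled-down version of that scenario can be sketched as follows. The @Staff table variable, its column names, and the six sample rows are assumptions made up for illustration; they are not part of AdventureWorks.

```sql
-- Hypothetical staff list (illustrative names and data only).
DECLARE @Staff TABLE (Name VARCHAR(20), Title VARCHAR(20));

INSERT INTO @Staff (Name, Title)
VALUES ('Ann', 'Manager'),  ('Bob', 'Developer'), ('Cy', 'Developer'),
       ('Di', 'Clerk'),     ('Ed', 'Developer'),  ('Flo', 'Clerk');

-- Six input rows collapse into one output row per distinct Title.
SELECT Title, COUNT(*) AS Headcount
FROM @Staff
GROUP BY Title
ORDER BY Title;
-- Clerk 2, Developer 3, Manager 1
```

Every person is unique, yet only three rows come back because only three distinct titles exist.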

Important points for the GROUP BY SQL statement:

  • The GROUP BY clause can only be used in a SQL SELECT statement.
  • The GROUP BY clause must come after the WHERE clause. (If one exists.)
  • The GROUP BY clause must come before the ORDER BY clause. (If one exists.)
  • To filter the GROUP BY results, you must use the HAVING clause after the GROUP BY.
  • The GROUP BY clause is often used in conjunction with an aggregate function such as COUNT, MIN, MAX, AVG, or SUM.
  • Every column listed in the SELECT list that is not inside an aggregate function must also appear in the GROUP BY clause.
  • Columns of most data types can be used in the GROUP BY clause; TEXT, NTEXT, and IMAGE columns cannot.

In 2017, Microsoft stated that the data types "TEXT", "NTEXT", and "IMAGE" would be deprecated in future versions of SQL Server. However, they are still available in SQL Server 2019 with SSMS version 18, although you still cannot use them in a GROUP BY clause. Of course, there are exceptions to every rule, I suppose. You can read more about this in the section titled "A Work-Around" at the end of this SQL tutorial.
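The ordering rules from the list above can be sketched in a single query. The @Orders table variable and its data are invented for this illustration.

```sql
-- Hypothetical order data (illustrative only).
DECLARE @Orders TABLE (Region VARCHAR(10) NULL, Amount INT);

INSERT INTO @Orders (Region, Amount)
VALUES ('East', 50), ('East', 70), ('West', 20),
       ('West', 10), ('North', 200), (NULL, 5);

SELECT Region, SUM(Amount) AS Total   -- Region is grouped, Amount is aggregated
FROM @Orders
WHERE Region IS NOT NULL              -- 1. filter rows before grouping
GROUP BY Region                       -- 2. form the groups
HAVING SUM(Amount) >= 100             -- 3. filter groups after aggregation
ORDER BY Total DESC;                  -- 4. sort the final result
-- North 200, East 120 (West totals 30 and is removed by HAVING)
```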

It is important to note that using a GROUP BY clause is ineffective if there are no duplicates in the column you are grouping by. When using the AdventureWorks2014 database and referencing the Person.Person table, if you GROUP BY the "BusinessEntityID" column, it will return all 19,972 rows with a count of 1 on each row. A better example would be to group by the "Title" column of that table. The SELECT statement below will return the 6 unique title types as well as a count of how many times each one is found in the table within the "Title" column. This is the core basics of using a GROUP BY clause.

USE AdventureWorks2014;
GO

SELECT Title, COUNT(*) AS 'Count'
FROM Person.Person
WHERE Title IS NOT NULL
GROUP BY Title;
GO

Results:

query results

An ORDER BY clause was not used in this sample, and as you can see there is no order to the result set. If you need to use an ORDER BY clause, it must follow the GROUP BY clause. The other thing you may notice in the above query is that we used a WHERE filter to leave out any rows where Title is NULL. This is certainly optional. If you want to include the rows that are NULL, just remove the WHERE clause from the query.

The following results are given when we allow NULL values by removing the WHERE clause from the code block above.

query results

Although the GROUP BY clause is most commonly used with the COUNT, AVG, MIN, MAX, and SUM functions to return numerical data, for the purpose of charts among other reasons, it can also be used to categorize names, places, regions, etc. without returning, or relying on, a numeric value. In the sample below, we will return the "CountryRegionName" and "StateProvinceName" columns from the "Sales.vSalesPerson" view in the AdventureWorks2014 sample database. In the first SELECT statement, we will not do a GROUP BY; instead, we will simply use the ORDER BY clause to make our results more readable, sorted as either ASC (default) or DESC.

In the second SELECT statement, we will GROUP BY the "CountryRegionName" column followed by the "StateProvinceName" column. The first SELECT statement will return all 17 rows in the view. However, the second SELECT statement will only return 14 rows. Since "Washington, United States" is listed four times in the view, the GROUP BY clause will "group" those four entries into one entry for Washington.

USE AdventureWorks2014;
GO

SELECT StateProvinceName, CountryRegionName
FROM Sales.vSalesPerson
ORDER BY CountryRegionName, StateProvinceName;
GO

SELECT StateProvinceName, CountryRegionName
FROM Sales.vSalesPerson
GROUP BY CountryRegionName, StateProvinceName;
GO

query results

It's not often you will need to return results like the sample above; most of the time you will be working with an aggregate function. But it's nice to know that you can do this, as well as how to do it, should the need arise.

Now, moving forward to some more common methods of using the GROUP BY clause.

Aggregates with the SQL GROUP BY Clause

This article covers nine T-SQL (Transact-SQL) aggregate functions, all of which can be used with the GROUP BY clause. (T-SQL offers a few others as well, such as COUNT_BIG() and STRING_AGG().) The five most common aggregate functions that will be discussed in this article are COUNT(), AVG(), MIN(), MAX(), and SUM(). The four remaining aggregate functions, STDEV(), STDEVP(), VAR(), and VARP(), are specifically related to financial and statistical calculations.

The STDEV() and STDEVP() functions calculate the sample standard deviation and population standard deviation respectively. The VAR() and VARP() functions calculate the sample variance and population variance respectively. An easy way to remember what these four do is to remember that the DEV named functions provide deviation statistics, while the VAR named functions provide the variance statistics. You can read more about these four aggregate functions on Microsoft Docs.
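As a rough illustration of the sample versus population distinction, the sketch below runs all four functions over eight made-up readings whose mean is 5 and whose squared deviations sum to 32. The @Samples table variable is an assumption for this example only.

```sql
-- Hypothetical readings: mean = 5, sum of squared deviations = 32.
DECLARE @Samples TABLE (Reading FLOAT);

INSERT INTO @Samples (Reading)
VALUES (2), (4), (4), (4), (5), (5), (7), (9);

SELECT
    VARP(Reading)   AS PopulationVariance,  -- 32 / 8 = 4
    STDEVP(Reading) AS PopulationStdDev,    -- SQRT(4) = 2
    VAR(Reading)    AS SampleVariance,      -- 32 / 7, roughly 4.571
    STDEV(Reading)  AS SampleStdDev         -- SQRT(32 / 7), roughly 2.138
FROM @Samples;
```

The population functions divide by the number of rows (8), while the sample functions divide by one less (7), which is why the sample figures come out slightly larger.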

COUNT()

In its simplest form, the COUNT() function can be used in one of two ways. Within the parentheses you can specify the column name that you want to count by, or you can use an * (asterisk) between the parentheses.

  1. Using the * (asterisk) will count all rows, even if they contain a NULL value.
  2. Specifying the column name will not count any rows that have a NULL value in that column.

So, it really depends on whether or not you need the data from the associated columns/rows where the focused column contains a NULL value.
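The difference between the two forms is easiest to see side by side. This is a minimal sketch using a made-up @People table variable in which two of the four rows have no title.

```sql
-- Hypothetical data: two of the four rows have a NULL Title.
DECLARE @People TABLE (Name VARCHAR(20), Title VARCHAR(10) NULL);

INSERT INTO @People (Name, Title)
VALUES ('Ann', 'Ms.'), ('Bob', 'Mr.'), ('Cy', NULL), ('Di', NULL);

-- Whole-table counts.
SELECT COUNT(*)     AS AllRows,        -- 4: every row is counted
       COUNT(Title) AS NonNullTitles   -- 2: rows with a NULL Title are skipped
FROM @People;

-- The same idea per group: the NULL group has rows, but no non-NULL titles.
SELECT Title,
       COUNT(*)     AS RowsInGroup,    -- NULL group: 2
       COUNT(Title) AS NonNullInGroup  -- NULL group: 0
FROM @People
GROUP BY Title;
```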

Now, let's use the COUNT() aggregate in the following query. Using the "Sales.vSalesPerson" view in the AdventureWorks2014 sample database, we will count how many times each country or region appears in that view.

USE AdventureWorks2014;
GO

SELECT CountryRegionName, COUNT(*) AS 'Count'
FROM Sales.vSalesPerson
GROUP BY CountryRegionName;
GO

Results:

query results

As you can see, the query returned a count of 11 for the United States, 2 for Canada, and 1 for each of the remaining countries. These represent how many times, or in how many rows, these places are found in the "Sales.vSalesPerson" view.

In this context, the GROUP BY works similarly to the DISTINCT clause by returning only one entry per country/region. However, unlike the DISTINCT clause, when we added the COUNT() function, the results displayed how many times each country/region is found in the view.

AVG()

The AVG() function sums all the non-null values in a set, then divides that number by the count of non-null values in that set to return the average value as the result. Unlike the COUNT() function, the AVG() function will not accept the wildcard * (asterisk) as a value within the parentheses. You must specify which column you want to return an averaged value on. Since the AVG() function is adding and dividing (doing arithmetic on the column values), the column must contain numeric values. For example, you cannot return an average on a column that has a character (CHAR, VARCHAR, NVARCHAR) data type.
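Two consequences of that definition are worth a quick sketch: NULL rows do not count toward the divisor, and AVG() over an integer column returns an integer. The @Scores table variable below is invented for the example.

```sql
-- Hypothetical scores: two real values and one NULL.
DECLARE @Scores TABLE (Score INT NULL);

INSERT INTO @Scores (Score)
VALUES (1), (2), (NULL);

SELECT
    AVG(Score)                        AS AvgAsInt,       -- 1: 3 / 2 with integer arithmetic
    AVG(CAST(Score AS DECIMAL(9, 2))) AS AvgAsDecimal,   -- 1.5: NULL row ignored, divisor is 2
    SUM(Score) * 1.0 / COUNT(*)       AS AvgOverAllRows  -- 1.0: divisor is 3 because COUNT(*) includes the NULL row
FROM @Scores;
```

If the NULL rows should count as zero, the last expression (or AVG over ISNULL(Score, 0)) is one way to say so explicitly.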

In the sample below, we are returning the average sales bonus value for each territory in the Sales.SalesPerson table from the AdventureWorks2014 database.

SELECT
    TerritoryID
    , AVG(Bonus) AS 'Avg Bonus'
FROM Sales.SalesPerson
WHERE TerritoryID IS NOT NULL
GROUP BY TerritoryID;
GO

Results:

query results

We added a WHERE clause to filter out the rows where TerritoryID is NULL. This was just for clarity's sake. Aggregate functions such as AVG() ignore NULL values in the column being aggregated, but GROUP BY treats NULL as a group of its own, so the WHERE clause here is purely cosmetic. Had we left out the WHERE clause, the returned values would remain the same for all rows, except for one additional row representing the NULL TerritoryID group. The sample below shows the results without the WHERE clause.

SELECT
    TerritoryID
    , AVG(Bonus) AS 'Avg Bonus'
FROM Sales.SalesPerson
GROUP BY TerritoryID;
GO

Results:

query results

For the three following aggregates, use this link as a starting point: Max, Min, and Avg SQL Server Functions

MIN()

The MIN() function (as its name implies) returns the smallest value in the column specified. MIN() is not restricted to numeric values, as some people believe; it can also be used to return the lowest values of CHAR(), VARCHAR(), NVARCHAR(), UNIQUEIDENTIFIER, or DATETIME data types as well. However, it cannot be used with the BIT data type.

With the character data types CHAR(), VARCHAR(), and NVARCHAR(), the MIN() function sorts the string values alphabetically and returns the first (lowest) value in the alphabetized list.
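A short sketch of MIN() on non-numeric columns, with its counterpart MAX() shown alongside for contrast. The @Hires table variable and its rows are assumptions made up for this example.

```sql
-- Hypothetical hire list (illustrative only).
DECLARE @Hires TABLE (LastName VARCHAR(20), HireDate DATE);

INSERT INTO @Hires (LastName, HireDate)
VALUES ('Smith', '2019-03-01'), ('Adams', '2021-07-15'), ('Nguyen', '2020-01-10');

SELECT
    MIN(LastName) AS FirstAlphabetically,  -- Adams
    MAX(LastName) AS LastAlphabetically,   -- Smith
    MIN(HireDate) AS EarliestHire,         -- 2019-03-01
    MAX(HireDate) AS LatestHire            -- 2021-07-15
FROM @Hires;
```

For character data the ordering follows the column's collation, so results on mixed-case or accented values can differ between databases.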

Using the same Sales.SalesPerson table as we did in the AVG() function example above, we will return the minimum value from the "Bonus" column instead of the average value.

USE AdventureWorks2014;
GO

SELECT
    TerritoryID
    , MIN(Bonus) AS 'MinBonus'
FROM Sales.SalesPerson
WHERE TerritoryID IS NOT NULL
GROUP BY TerritoryID;
GO

Results:

query results

MAX()

In contrast to the MIN() function, the MAX() function returns the largest value of the specified column. It does this by utilizing a collating sequence, allowing it to work as efficiently on character columns and datetime columns as it does on numeric columns. Keeping consistency, we will again be working with the Sales.SalesPerson table and return the maximum, or highest amount, paid in a bonus for each territory.

USE AdventureWorks2014;
GO

SELECT
    TerritoryID
    , MAX(Bonus) AS 'MAX Bonus'
FROM Sales.SalesPerson
WHERE TerritoryID IS NOT NULL
GROUP BY TerritoryID;
GO

Results:

query results

SUM()

The SUM() function returns the total value of all non-null values in a specified column. Since this is a mathematical process, it cannot be used on string values such as the CHAR, VARCHAR, and NVARCHAR data types. When used with a GROUP BY clause, the SUM() function will return the total for each category in the specified table.

Using the Sales.SalesPerson table in the AdventureWorks2014 database, we will return the sum (total) of all bonuses paid out in each territory found in the GROUP BY clause.

USE AdventureWorks2014;
GO

SELECT
    TerritoryID
    , SUM(Bonus) AS 'SUM Bonus'
FROM Sales.SalesPerson
WHERE TerritoryID IS NOT NULL
GROUP BY TerritoryID;
GO

Results:

query results

For the three following GROUP BY extensions, use this link as a starting point: GROUP BY in SQL Server with CUBE, ROLLUP and GROUPING SETS Examples.

GROUP BY ROLLUP

ROLLUP is an extension of the GROUP BY clause that creates a group for each of the column expressions. Additionally, it "rolls up" those results into subtotals followed by a grand total. Under the hood, the ROLLUP function moves from right to left, decreasing the number of column expressions that it creates groups and aggregations on. Since the column order affects the ROLLUP output, it can also affect the number of rows returned in the result set.

The GROUP BY ROLLUP can be written in one of two ways. You can declare the ROLLUP extension before you call the column names or after. Both will return the same results. This is another one of those "personal preference" options of writing your SQL code, although Microsoft documents the WITH ROLLUP form as older syntax kept for backward compatibility.

Option 1: (Calling the ROLLUP extension before the column names)

GROUP BY ROLLUP(Country, RegionState);

Option 2: (Calling the ROLLUP extension after the column names)

GROUP BY Country, RegionState WITH ROLLUP;

Notice the parentheses surrounding the column names in option 1 that are not present in option 2. Option 1 must have the parentheses; option 2 must NOT have them.

Moving on. For this sample, we are going to create a new table in the AdventureWorks2014 database under the default "dbo" schema.

USE AdventureWorks2014;
GO

CREATE TABLE salesTest(
    Country VARCHAR(30),
    RegionState VARCHAR(30),
    Sales INT
);

INSERT INTO salesTest VALUES('United States', 'Washington', 100);
INSERT INTO salesTest VALUES('United States', 'Maine', 200);
INSERT INTO salesTest VALUES('United States', 'Oregon', 300);
INSERT INTO salesTest VALUES('Canada', 'Alberta', 100);
INSERT INTO salesTest VALUES('Canada', 'Ontario', 200);
GO

In the next block of code, we will generate a "rollup" of all the "summed" values from Canada, a rollup of all the "summed" values from the United States, and finally a "total summed" value for the two countries listed.

SELECT
    Country
    , RegionState
    , SUM(Sales) AS 'Total Sales'
FROM salesTest
GROUP BY ROLLUP(Country, RegionState);
GO

Results:

query results

From the result set above, we see that line 3 is the total of lines 1 and 2 (the two regions from Canada), line 7 is the total of lines 4 through 6 (the three states from the United States), and line 8 is the grand (rollup) total of lines 3 and 7.
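The earlier point that column order changes the ROLLUP output can be checked with a small sketch. It uses a @Sales table variable modeled on the salesTest rows so that it runs on its own; the variable name is an assumption for this example.

```sql
-- Stand-in for the salesTest rows so the example is self-contained.
DECLARE @Sales TABLE (Country VARCHAR(30), RegionState VARCHAR(30), Sales INT);

INSERT INTO @Sales (Country, RegionState, Sales)
VALUES ('United States', 'Washington', 100), ('United States', 'Maine', 200),
       ('United States', 'Oregon', 300),     ('Canada', 'Alberta', 100),
       ('Canada', 'Ontario', 200);

-- Country first: 5 detail rows + 2 country subtotals + 1 grand total = 8 rows.
SELECT Country, RegionState, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY ROLLUP(Country, RegionState);

-- RegionState first: 5 detail rows + 5 region subtotals + 1 grand total = 11 rows.
SELECT RegionState, Country, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY ROLLUP(RegionState, Country);
```

Because ROLLUP drops columns from right to left, the leftmost column decides which subtotals are produced.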

You can replace the NULL values in the result set with descriptive values by altering the SQL code block with the ISNULL function, as shown in the sample below.

SELECT
    ISNULL(Country, 'Rollup') AS 'Country'
    , ISNULL(RegionState, 'Total') AS 'RegionState'
    , SUM(Sales) AS 'Total Sales'
FROM salesTest
GROUP BY ROLLUP(Country, RegionState);
GO

Results:

query results

Again, line 3 is the rollup total for Canada, line 7 is the rollup total for the United States, and line 8 is the "grand" rollup total for lines 3 and 7.
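One caveat with the ISNULL approach: it cannot tell a NULL produced by ROLLUP apart from a NULL that was already in the data. A possible alternative, sketched here with a made-up @Sales table variable that contains one genuinely unknown region, is the built-in GROUPING() function, which returns 1 only for rows where the column was rolled up.

```sql
-- Hypothetical data: the second row has a genuinely unknown region.
DECLARE @Sales TABLE (Country VARCHAR(30), RegionState VARCHAR(30) NULL, Sales INT);

INSERT INTO @Sales (Country, RegionState, Sales)
VALUES ('Canada', 'Alberta', 100),
       ('Canada', NULL, 50);

SELECT
    CASE WHEN GROUPING(Country) = 1 THEN 'Rollup' ELSE Country END AS Country,
    CASE WHEN GROUPING(RegionState) = 1 THEN 'Total'
         ELSE ISNULL(RegionState, '(unknown)') END                 AS RegionState,
    SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY ROLLUP(Country, RegionState);
-- Rows returned (order not guaranteed without ORDER BY):
--   Canada / Alberta   / 100
--   Canada / (unknown) / 50
--   Canada / Total     / 150
--   Rollup / Total     / 150
```

With plain ISNULL(RegionState, 'Total'), the 50 row would also have been labeled "Total".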

GROUP BY CUBE

Another extension, or sub-clause, of the GROUP BY clause is CUBE. CUBE generates multiple grouping sets on your specified columns and aggregates them. In short, it creates unique groups for all possible combinations of the columns you specify. For instance, if you use GROUP BY CUBE on (column1, column2) of your table, SQL returns groups for all unique values of (column1, column2), (NULL, column2), (column1, NULL) and (NULL, NULL).

Perhaps the best way to understand this is to see it in action. Here we will continue using the table we created in the previous section, "GROUP BY ROLLUP".

USE AdventureWorks2014;
GO

SELECT
    Country
    , RegionState
    , SUM(Sales) AS 'Total Sales'
FROM salesTest
GROUP BY CUBE(Country, RegionState);
GO

Results:

query results

As you can see in the result set above, the query has returned all groups with unique values of (column1, column2), (NULL, column2), (column1, NULL) and (NULL, NULL). The NULL, NULL row on line 11 represents the grand total of all the cubed values, much like it did in the GROUP BY ROLLUP section above.
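One way to read CUBE is as shorthand for listing every combination of the columns as an explicit grouping set. The sketch below uses a three-row @Sales table variable (an assumption for this example) and shows the two spellings, which should produce the same nine rows.

```sql
-- Hypothetical three-row sample (illustrative only).
DECLARE @Sales TABLE (Country VARCHAR(30), RegionState VARCHAR(30), Sales INT);

INSERT INTO @Sales (Country, RegionState, Sales)
VALUES ('Canada', 'Alberta', 100), ('Canada', 'Ontario', 200),
       ('United States', 'Oregon', 300);

-- CUBE: 3 detail rows + 2 per-country + 3 per-region + 1 grand total = 9 rows.
SELECT Country, RegionState, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY CUBE(Country, RegionState);

-- The same grouping sets written out by hand.
SELECT Country, RegionState, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY GROUPING SETS
(
    (Country, RegionState),  -- detail rows
    (Country),               -- per-country subtotals
    (RegionState),           -- per-region subtotals
    ()                       -- grand total
);
```

ROLLUP(Country, RegionState) would produce the same list minus the (RegionState) set.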

GROUP BY GROUPING SETS()

The GROUPING SETS option gives you the ability to combine multiple GROUP BY clauses into one GROUP BY clause. The GROUP BY GROUPING SETS() clause produces the same results as a UNION ALL that is applied to the specified groups. In other words, if I used a UNION ALL to combine two groupings into one result set, it would look something like the code block below.

USE AdventureWorks2014;
GO

SELECT
    Country
    , RegionState
    , SUM(Sales) AS TotalSales
FROM salesTest
GROUP BY ROLLUP(Country, RegionState)
UNION ALL
SELECT
    Country
    , RegionState
    , SUM(Sales) AS TotalSales
FROM salesTest
GROUP BY CUBE(Country, RegionState);
GO

Results:

query results

A way to condense that UNION ALL code would be to use the GROUPING SETS() sub-clause, as in the sample below.

SELECT
    Country
    , RegionState
    , SUM(Sales) AS TotalSales
FROM salesTest
GROUP BY GROUPING SETS
(
    ROLLUP (Country, RegionState),
    CUBE (Country, RegionState)
);
GO

Results:

query results

Don't be surprised if your results do not return in the same order each time; the ORDER BY clause will help with that.

As you can see, we are returning the same results as the UNION ALL, but with a bit less code.
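GROUPING SETS is arguably most useful when you want only some of the combinations. As a sketch under assumed sample data (the @Sales table variable is made up), the query below asks for per-country and per-region totals only, with no detail rows and no grand total, followed by the UNION ALL it replaces.

```sql
-- Hypothetical three-row sample (illustrative only).
DECLARE @Sales TABLE (Country VARCHAR(30), RegionState VARCHAR(30), Sales INT);

INSERT INTO @Sales (Country, RegionState, Sales)
VALUES ('Canada', 'Alberta', 100), ('Canada', 'Ontario', 200),
       ('United States', 'Oregon', 300);

-- Two grouping sets in one pass: 2 country rows + 3 region rows = 5 rows.
SELECT Country, RegionState, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY GROUPING SETS ((Country), (RegionState));

-- Equivalent UNION ALL; the column left out of each grouping is returned as NULL.
SELECT Country, CAST(NULL AS VARCHAR(30)) AS RegionState, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY Country
UNION ALL
SELECT CAST(NULL AS VARCHAR(30)), RegionState, SUM(Sales)
FROM @Sales
GROUP BY RegionState;
```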

But Wait, There's More

I told you this was going to be GROUP BY on steroids. Here are some additional tricks for using the GROUP BY clause that those books and free videos wouldn't tell you about.

GROUP BY with Multiple Tables

Like most things in SQL/T-SQL, you can always pull your data from multiple tables. Performing this task while including a GROUP BY clause is no different than any other SELECT statement with a GROUP BY clause. The fact that you're pulling the data from two or more tables has no bearing on how this works. In the sample below, we will be working in the AdventureWorks2014 database once again as we join the "Person.Address" table with the "Person.BusinessEntityAddress" table. I have also restricted the sample code to return only the top 10 results for clarity's sake in the result set.

USE AdventureWorks2014;
GO

SELECT TOP(10)
    a.City
    , COUNT(b.AddressID) AS EmployeeCount
FROM Person.Address AS a
INNER JOIN Person.BusinessEntityAddress AS b
    ON a.AddressID = b.AddressID
GROUP BY a.City;
GO

Results:

query results

GROUP BY with an Expression

Using a GROUP BY clause on a SELECT statement that contains an expression or built-in function reference requires that you include the same expression in both the SELECT list and the GROUP BY clause. In the sample below, we will use the DATEPART function to return only the year from the "ModifiedDate" column of the "Sales.SalesTerritory" table along with the average year-to-date sales ("SalesYTD") from the same table.

SELECT DATEPART(yyyy, ModifiedDate) AS 'Year'
    , CAST(AVG(ROUND(SalesYTD, 2, 1)) AS numeric(9,2)) AS 'Avg Sales'
FROM Sales.SalesTerritory
GROUP BY DATEPART(yyyy, ModifiedDate);
GO

You may have noticed that I added a little extra code to this one on the second line. We are applying the ROUND() function to the "SalesYTD" column to return the results in a dollar format with two decimal places. Without the ROUND() function, our output would have been either 5275120.9953 (if we had used the "money" data type) or 5275121.00 (if we had used the "numeric" data type).

Results:

query results
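If repeating the expression in two places feels error-prone, one possible alternative is to name it once with CROSS APPLY and group by that name. This is only a sketch: the @Orders table variable, its data, and the OrderYear alias are assumptions for illustration.

```sql
-- Hypothetical orders (illustrative only).
DECLARE @Orders TABLE (OrderDate DATE, Amount MONEY);

INSERT INTO @Orders (OrderDate, Amount)
VALUES ('2013-05-01', 100), ('2013-11-20', 50), ('2014-02-14', 75);

-- The year expression is written once and reused by name.
-- (A column alias defined in the SELECT list cannot be used in GROUP BY.)
SELECT y.OrderYear, SUM(o.Amount) AS TotalAmount
FROM @Orders AS o
CROSS APPLY (VALUES (DATEPART(yyyy, o.OrderDate))) AS y(OrderYear)
GROUP BY y.OrderYear;
-- 2013 / 150.00 and 2014 / 75.00
```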

GROUP BY with a HAVING clause

Adding a HAVING clause after your GROUP BY clause requires that you include any special conditions in both clauses. If the SELECT statement contains an expression, then it follows suit that the GROUP BY and HAVING clauses must contain matching expressions. It is similar in nature to the "GROUP BY with an Expression" sample from above. In the next sample code block, we are (still using the AdventureWorks2014 database) now referencing the "Sales.SalesOrderHeader" table to return the total (sum) of the "TotalDue" column, but only for a particular year. That year will be referenced within the HAVING clause.

SELECT
    DATEPART(yyyy, OrderDate) AS 'Year'
    , CAST(SUM(ROUND(TotalDue, 2, 1)) AS numeric(12,2)) AS 'Total Due'
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(yyyy, OrderDate)
HAVING DATEPART(yyyy, OrderDate) = 2014;
GO

Results:

query results
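When the condition does not involve an aggregate, as with the year filter above, the same result can usually be reached by filtering rows in the WHERE clause before they are grouped. The sketch below compares the two placements on a made-up @Orders table variable.

```sql
-- Hypothetical orders (illustrative only).
DECLARE @Orders TABLE (OrderDate DATE, TotalDue MONEY);

INSERT INTO @Orders (OrderDate, TotalDue)
VALUES ('2013-12-30', 40.00), ('2014-01-05', 10.00), ('2014-06-30', 25.50);

-- Filter groups after aggregation: every year is grouped, then 2013 is discarded.
SELECT DATEPART(yyyy, OrderDate) AS OrderYear, SUM(TotalDue) AS YearTotal
FROM @Orders
GROUP BY DATEPART(yyyy, OrderDate)
HAVING DATEPART(yyyy, OrderDate) = 2014;

-- Filter rows before grouping: only 2014 rows ever reach the GROUP BY.
SELECT DATEPART(yyyy, OrderDate) AS OrderYear, SUM(TotalDue) AS YearTotal
FROM @Orders
WHERE OrderDate >= '20140101' AND OrderDate < '20150101'
GROUP BY DATEPART(yyyy, OrderDate);
-- Both queries return a single row: 2014 / 35.50
```

HAVING remains the only option for conditions on aggregates, such as HAVING SUM(TotalDue) > 1000.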

Limitations when using GROUP BY

As you would expect, there are a few limitations when using the GROUP BY clause in your SELECT statement. Below is a list of the main limitations that you will need to be familiar with.

For GROUP BY clauses that contain ROLLUP, CUBE or GROUPING SETS:

  • The maximum number of expressions is 32.
  • The maximum number of groups is 4096.

For GROUP BY clauses that do not contain ROLLUP, CUBE or GROUPING SETS:

  • The number of GROUP BY items is limited by the GROUP BY column sizes, the aggregated columns, and the aggregate values involved in the query.

A Work-Around for Text, NText and Image Data Types

If, for some reason, you are still using the "TEXT", "NTEXT", and/or "IMAGE" data types in your database, it is highly recommended that you change them to an appropriate "current" data type. If your situation prevents you from updating these antiquated data types but you still need to GROUP BY one or more columns of these data types, there is a work-around for that.

For this sample, we must create a new table, since the AdventureWorks database samples do not come preloaded with any tables that contain "TEXT", "NTEXT", or "IMAGE" data type columns.

USE AdventureWorks2014;
GO

CREATE TABLE textTest1(
    colID INT IDENTITY NOT NULL
    , fName VARCHAR(20)
    , Title TEXT
    );
GO

INSERT INTO textTest1(fName, Title)
VALUES('Bob', 'Developer')
,('John', 'Manager')
,('Sarah', 'Clerk')
,('Melissa', 'Developer')
,('Jeff', 'Manager')
,('Sam', 'Developer')
,('Eliot', 'Developer');
GO

Now that we have a table that contains a "TEXT" data type, let's try to run a standard SELECT / COUNT() query with a GROUP BY clause.

SELECT Title, COUNT(*) AS 'Count'
FROM textTest1
GROUP BY Title;
GO

This will produce a "Level 16, State 2" error.

Msg 306, Level 16, State 2, Line 20
The text, ntext, and image data types cannot be compared or sorted, except when using IS NULL or LIKE operator.

To work around this issue, we can use the CAST function to convert the TEXT data type to a VARCHAR data type and get the desired results returned without an error.

SELECT CAST(Title AS varchar) AS Title, COUNT(*) AS 'Count'
FROM textTest1
GROUP BY CAST(Title AS varchar);
GO

Note: you must use CAST() in both the SELECT list and the GROUP BY clause. As mentioned before, in the "GROUP BY with an Expression" section, the GROUP BY expression must be written exactly as it is in the SELECT list.

Results:

query results

Here, we have returned the appropriate count for each of the three unique values in the Title column.

Just for future reference, if you are still using the TEXT, NTEXT, and IMAGE data types, you can use the CAST function to convert them to VARCHAR, NVARCHAR, and VARBINARY respectively.
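One detail to watch when casting: a CAST to VARCHAR with no length specified uses a length of 30, so longer values are silently truncated and can fall into the same group. The sketch below shows the effect with a made-up temporary table, #TextDemo, and two 37-character values that differ only at the end.

```sql
-- Hypothetical temp table; both values share their first 30 characters.
CREATE TABLE #TextDemo (Title TEXT);

INSERT INTO #TextDemo (Title)
VALUES (REPLICATE('A', 30) + ' Senior'),
       (REPLICATE('A', 30) + ' Junior');

-- No length given: values are cut to 30 characters, so both rows
-- collapse into a single group with a count of 2.
SELECT CAST(Title AS VARCHAR) AS Title, COUNT(*) AS TitleCount
FROM #TextDemo
GROUP BY CAST(Title AS VARCHAR);

-- Explicit VARCHAR(MAX): the values stay distinct, one group each.
SELECT CAST(Title AS VARCHAR(MAX)) AS Title, COUNT(*) AS TitleCount
FROM #TextDemo
GROUP BY CAST(Title AS VARCHAR(MAX));

DROP TABLE #TextDemo;
```

For short values such as the job titles in textTest1 the default is harmless, but stating the length explicitly avoids surprises.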

A best practice would be to create a view from the above SELECT statement to save time and provide a more efficient way of grouping on the table(s) that have these deprecated data types.
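A minimal sketch of such a view follows. It assumes the textTest1 table from this section exists in the dbo schema; the view name vTextTest1 is hypothetical.

```sql
-- Assumes dbo.textTest1 from this section exists; the view name is made up.
CREATE VIEW dbo.vTextTest1
AS
SELECT colID
     , fName
     , CAST(Title AS VARCHAR(MAX)) AS Title  -- cast once, inside the view
FROM dbo.textTest1;
GO

-- Queries against the view can group on Title without repeating the CAST.
SELECT Title, COUNT(*) AS TitleCount
FROM dbo.vTextTest1
GROUP BY Title;
GO
```

When you are done experimenting, DROP VIEW dbo.vTextTest1 removes it again.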

Summary

The primary function of the GROUP BY clause is to split the rows within a table into groups. Consider that a table is in itself a group; the GROUP BY clause simply breaks that large group into smaller groups, like mini tables. From there you can manipulate the data within those mini tables (groups) in just about any way you can imagine.

Contrary to what most books and classes teach you, there are more than five aggregate functions; this article covered nine, all of which can be used with a GROUP BY clause in your code. As we have seen in the samples above, you can have a GROUP BY clause without an aggregate function as well. As we demonstrated earlier in this article, the GROUP BY clause can group string values as well, so it doesn't always have to be a numeric or date value.

To sum it all up, there is a lot more to the GROUP BY clause than you would normally learn in an introductory SQL class.

Next Steps
  • GROUP BY in SQL Server with CUBE, ROLLUP and GROUPING SETS Examples
  • GROUP BY in SQL Server
  • Aggregate Functions
  • CUBE and ROLLUP in SQL Server
  • HAVING clause in SQL Server


About the author

Aubrey Love has been a Database Administrator for about 8 years and is currently working as a Microsoft SQL Server Business Intelligence specialist.


Source: https://www.mssqltips.com/sqlservertip/6955/learning-sql-group-by-clause/
