Ranking Data - Using Transact-SQL
Ranking data consists of:
- Finding the top N values
- Listing data in a particular order
- Listing ordinal numbers (the ranking)
There are several ways to rank data, depending on how you want to handle
duplicate values:
1- Not applicable, when duplicates are not possible in the source data
2- Allow duplicate values
3- Allow duplicates with gaps in the ranking
1 2 3
No dup. Dup. in Dup. in data,
in data,easy data create gaps
=========== ========== =========
1 197 1 197 1 197
2 163 2 185 2 185
3 155 2 185 2 185
4 132 3 155 4 155 <--
5 120 4 153 5 153
6 119 5 151 6 151
7 117 6 145 7 145
8 114 7 133 8 133
9 110 8 121 9 121
10 108 9 118 10 118
10 116
/*
Sybase 12.5.3 does have a "top N" feature, for getting top 10 lists (for example)
Method #1: Built-in Top-N, no rank #
*/
-- show top 10
select top 10 gross_sales, ticker
from sales_data
order by mkt_cap desc
go
/*
Method #2: Rowcount Top-N, no rank #
*/
set rowcount 20
go
select gross_sales, ticker
from sales_data
order by mkt_cap desc
go
set rowcount 0
go
/*
Method #3: Rank all rows, display rank #
Works best where there are no duplicate values
If the data being ranked does not contain duplicates, the easiest method is to:
Copy the data into a temporary table with an identity column
Use order by to sort the rows
Note: If there are duplicate values, the rank assigned will be somewhat random.
*/
select rank = identity(10), ticker, gross_sales
into #top
from sales_data
order by gross_sales desc
go
select * from #top
where rank <= 20
go
drop table #top
go
/*
Method #4: Rank all rows, display rank, ties share ranking
Where there may be duplicate values
If the data to be ranked contains duplicates,
Copy the data into a temporary table
Join the temporary table back to the base table
Final select uses rank_id in the WHERE clause.
*/
select rank = identity(10), gross_sales
into #top from sales_data
group by gross_sales
order by gross_sales desc
go
select t1.rank, t2.ticker, t2.gross_sales
from #top t1, sales_data t2
where t1.gross_sales = t2.gross_sales and rank <= 20
order by t2.gross_sales desc
go
drop table #top
go
/*
Method #5: Rank N rows, display rank, ties share ranking
If you want to rank only the top N values,
use rowcount on the select into command
*/
set rowcount 20
go
select rank = identity(10), gross_sales
into #top from sales_data
group by gross_sales
order by gross_sales desc
go
set rowcount 0
go
select t1.rank, t2.ticker, t2.gross_sales
from #top t1, sales_data t2
where t1.gross_sales = t2.gross_sales
order by t2.gross_sales desc
go
drop table #top
go
/*
Method #6: Alternate to #4/5
There is an alternate solution for ranking data that contains duplicates:
This method uses a correlated subquery in the select clause
The previous method will probably outperform this method
*/
set rowcount 30
go
select (select count(distinct t2.gross_sales)
from sales_data t2
where t2.gross_sales >= t1.gross_sales) as rank,
t1.ticker, t1.gross_sales
from sales_data t1
where t1.gross_sales is not null
order by rank
go
/*
Method #7: Rank all rows, display rank,
ties share ranking and take up subsequent rankings
"Ranking with Gaps" "Olympic Style"
The previous methods assume that each ordinal
number will be followed by the next number
For example, 6th place would follow 5th place
However, if there are two 5th place entries, you may prefer to follow them with
7th place:
*/
set rowcount 30
go
select (select count(t2.gross_sales) + 1
from sales_data t2
where t2.gross_sales > t1.gross_sales) as rank,
t1.ticker, t1.gross_sales
from sales_data t1
where t1.gross_sales is not null
order by rank
go