Monday, February 6, 2012

SQL Script to calculate the median

Business case
Below is a script that will calculate the median of a dataset. The median is the middle value in an ordered dataset. If the dataset has an even number of values, the median is the average of the innermost two.

Approach
This script works by taking the union of two select top values queries. If there are an odd number of records the same values are returned twice which is fine. The values returned by the union are then averaged. Note the use of derived tables. This example uses the famous parts table from Chris Date. The idea for this script came from a tek-tips SQL forum.

Script
use master
drop
database parts;
go
create
database parts;
go
use
parts
-- create Date's famous parts table
CREATE TABLE Parts (
part_nbr VARCHAR(5) NOT NULL PRIMARY KEY,
part_name VARCHAR(50) NOT NULL,
part_color VARCHAR(50) NOT NULL ,
part_wgt INTEGER NOT NULL ,
city_name VARCHAR(50) NOT NULL);
-- load data (experiment with different numbers of rows, etc.)
INSERT INTO Parts (part_nbr, part_name, part_color, part_wgt, city_name)
VALUES ('p1', 'Nut', 'Red', 12, 'London'),
('p2', 'Bolt', 'Green', 17, 'Paris'),
('p3', 'Cam', 'Blue', 12, 'Paris'),
('p4', 'Screw', 'Red', 14, 'London'),
('p5', 'Cam', 'Blue', 12, 'Paris'),
('p6', 'Cog', 'Red', 19, 'London');
use parts
-- display data
select *from parts
order by part_wgt;

 -- calculate the median

select avg(wgt) as median
from(select max(part_wgt) as wgt
from (select top 50 percent *
from parts
order by part_wgt asc) a
union
select
min(part_wgt)

from (select top 50 percent *
from parts
order by part_wgt desc) d) u;
Results

No comments:

Post a Comment