Midrange News for the IBM i Community


Ranking Data Published by: Bob Cozzi on 20 Jul 2015 view comments

[IN DEVELOPMENT]

The ability to number the rows in a resultSet of an SQL query is a pretty cool feature. Users love the ability to see the row number or the relative record number in the resultset. This is just one piece of the OLAP or "Online Analytical Processing" capability of DB2 for i SQL. A good reference page on the IBM KnowledgeBase library is located here.

By now most IBM i SQL users have seen the ROW_NUMBER() function, this gives you the ability to include a record number (not relative record number) in the resultSet. It can be used to order the results or exclude parts of the results (the top 50 rows in the resultSet only).

 SELECT Dec(row_number() over(ORDER BY A.VIEWS),9,0) as Popularity,       
  a.VIEWS, a.OID,                  
  char(rTrim(d.title) || ' ' || d.subtitle,60) as Title
  FROM articles A                           
  inner join artDesc D ON A.oid=D.oid       
  WHERE A.OID NOT IN (1205,1206,1258,1203,1270,1484,
 ORDER BY A.views desc
 Ftch first 50 rows only   

In the above SQL statement, an articles list is generated and ranked by the number of hits or "views". The ranking is listed along with a link to the specific article. If I had specified "row_number() over()" instead of including the ORDER BY clause within the over() function, then the traditional ORDER BY clause on the statement sets the ranking/row numbering.

But row_number() is not technicall ranking. It is a sequencially value that is assigned to each row for a unique identifier within the SQL resultSet. It can certainly be used for simple ranking results, as we've done here. To illustrate it more clearly, I created a little example that queries the articles on MidrangeNews.com and lists them by their views (most to least). Click the link below to view the Top 50 Articles (based on page views) from the MidrangeNews.com website.

The above is basic ranking. It does not use the RANK or DENSE_RANK function of SQL OLAP. These two features, along with PARTITION BY are used to rank data within sets of values.

RANK will number rows but also includes duplicate rows with duplicate values. It then jumps to the next relative value in the ranking. Meaning it will skip numbers, for example you might end up with rankings sequences like this:

1,2,3,3,5,6,7,7,9,10

Notice the numbers 4 and 8 are missing. It's similar to athletic competition where you could have 1 first place winner, two 2nd place winners (in the event of a tie) and no 3rd place winner.

DENSE_RANK is similar, but doesn't skip numbers, so you might have:

1,2,3,3,4,5,6,6,7,8

The duplicates are still assigned duplicate values, but the next number in the rank is the next sequencial number.

PARTITION BY

This clause allows you to rank things based on the value of fields withing the select statement. I borrowed an example from IBM.com and improved on it here. The link below goes out to IBM.com and pulls in links to the last few blog entries. It's live, so give it ia try.

[more to come...]

Return to midrangenews.com home page.
Sort Ascend | Descend

COMMENTS