My journey to Data Warehousing: May 2009

As I was browsing through blogs, I found an interesting blog where the author would post variety of TSQL challenges. I found it very interesting to solve those challenges. I never thought I would be in the winners list but as of now, i have been awarded 2 times by the authors. I would also encourage my blog readers to work on the TSQL challengs. This would definitely improve your approach in solving real world problems.

So here is my solution for the TSQL challenge 5 for which the author has awarded me 2nd place. The challenge can be found from http://beyondrelational.com/blogs/tc/archive/2009/04/24/tsql-challenge-5.aspx.

;with extract(id,name,tags,old_pos,new_pos)

as (

select id,name,substring(tags,0,case when charindex(',',tags,0) = 0 then len(tags) + 1 else charindex(',',tags,0) end),

0 as old_pos,charindex(',',tags,0) + 1 as new_pos from @blog --Anchor member function

Union all

select b.id ,b.name,substring(b.tags,new_pos,case when charindex(',',

b.tags,new_pos) = 0 then len(b.tags) else charindex(',',b.tags,new_pos) - new_pos end),

charindex(',',b.tags,new_pos) as old_pos,charindex(',',b.tags,new_pos) + 1 as new_pos

from @blog b inner join extract e on b.id = e.id where new_pos ! = 1 --Recursive member function

)

select f.id,f.data,e.name,b.tags,count(*) as relevance from @filter f,extract e,@blog b

where b.id = e.id and f.data like '%' + e.tags + '%'

group by f.id,f.data,e.name,b.tags

order by id,relevance desc

There are numerous ways to give solution for the challenge. I have used Recursive CTE (common table expression) as I thought this would make it simple and sweet.

After analysis of the challenge, I figured that 2 steps are required to solve this.

To extract the common separated values from @blog table to multiple rows in a new table
Join the new table with @filter table to populate results.

To extract the values form @blog table, I used recursive CTE.

The Anchor member function of the CTE is.

select id,name,substring(tags,0,case when charindex(',',tags,0) = 0 then len(tags) else charindex(',',tags,0) end),

0 as old_pos,charindex(',',tags,0) + 1 as new_pos from @blog

This query selects the first item from the tags column. In case we have ‘sql,profiler,table' then sql is the first item in the list. Substring function takes 3 paramaters, first parameter takes column name, second parameter takes the start index of the string and thrid parameter takes the length of the string to be extracted.

So to get the first item I check the existance of comman using charindex(',',tags,0). If the function returns 0 then there are no more commas in the list so return the length of the string len(tags).

If there are more than one commas in the list then we need a column to hold the starting postion of the new item in the list. So I have used old_pos and new_pos, old_ps will hold the starting postion of the item and new_position will hold the postion of the next item in the list . 0 is the starting postion for all the items in the Anchor member function so I have hard coded 0 as old_pos. charindex(',',tags,0) + 1 as new_pos will return the next position of the comma in the list, 1 is added so that it is pointing to the next item immediately after the comma.

The recurisive member function of the CTE is.

select b.id ,b.name,substring(b.tags,new_pos,case when charindex(',',

b.tags,new_pos) = 0 then len(b.tags) else charindex(',',b.tags,new_pos) - new_pos end),charindex(',',b.tags,new_pos) as old_pos, charindex(',',b.tags,new_pos) + 1 as new_pos

from @blog b inner join extract e on b.id = e.id where new_pos ! = 1

This query is similar to the previous query with little changes. Here for getting the item from the list, we are using charindex(',',b.tags,new_pos) - new_pos which would return the length of the new item. For example, In a list ‘sql,performance,table’, to select the second item from the list, charindex(',',b.tags,new_pos) would return 16 and new_pos would be 5 from previous step. So 16 – 5 = 11 is the length for second item.

I am joining with the table extract with the filter condition new_pos != 1. This is because charindex(',',b.tags,new_pos) funcion will retrun 0 if no commas are found in the list and i am adding 1 for return value. So 1 would be the termination condition for the list.

The last query,

select f.id,f.data,e.name,b.tags,count(*) as relevance from @filter f,extract e,@blog b

where b.id = e.id and f.data like '%' + e.tags + '%'

group by f.id,f.data,e.name,b.tags

order by id,relevance desc

Here I am just grouping the items based upon the relevance. e.tags is passed dynamically for every row to get the count of that item in the data column of @filter table.

My journey to Data Warehousing

Sunday, May 10, 2009

Solution for TSQL Challenge 5

Followers

Blog Archive

About Me

Labels