Syed A. Rizvi - Developing Web-based Architectures: August 2008

Wednesday, August 27, 2008

SQL Server - Get Nth Highest using ROW_NUMBER() Function

* Compatibility SQL Server 2005 onwards
ROW_NUMBER function was introduced in SQL Server 2005. ROW_NUMBER as the name suggests is used to calculate row numbers in the query result set.

Prior to ROW_NUMBER developers relied on temporary tables and co-related sub-queries to calculate sequential output, this often resulted in spaghetti code and overall bad query performance.

For more details on ROW_NUMBER visit: MSDN

Let us consider a real-world ROW_NUMBER example using NorthWind database. You can download and install NorthWind database from the following location.

Consider this scenario, we want to run a report which gives us Top Order by sales.

We are not calculating ties here i.e. what if more than one order generated exactly same sales amount, ties can be handled using using RANK() function but for now we will assume sequential result set.

The stored procedure which we are going to see will provide the flexibility to return exact result set i.e. Order with highest sales or range i.e. Top 10 Orders by sales

Stored procedure code:

USE [Northwind]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[GetHighestSales_sp]') AND type in (N'P', N'PC'))
DROP PROCEDURE [dbo].[GetHighestSales_sp]
GO
SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE dbo.GetHighestSales_sp
(
@Rank int = NULL,
@FROM int = 1,
@TO int = 1
)
AS
BEGIN
--no extra columns
SET NOCOUNT ON;

/*CTE Declaration block */
WITH CustomerOrders (OrderID, CustomerID, EmployeeID, ContactName, CompanyName, Phone, TotalPurchasePrice, SalesRank) AS
(
SELECT OrderTotal.OrderID As OrderID, C.CustomerID As CustomerID, E.EmployeeID As EmployeeID,
C.ContactName, C.CompanyName, C.Phone, OrderTotal.TotalPurchasePrice,
OrderTotal.SalesRank
FROM
Customers C
INNER JOIN
(
SELECT ORD.OrderID As OrderID, O.CustomerID, O.EmployeeID,
Sum(CAST((UnitPrice * Quantity) * (1-Discount)/100 * 100 AS Money)) AS TotalPurchasePrice,
Row_Number() OVER(ORDER BY (Sum(CAST((UnitPrice * Quantity) * (1-Discount)/100 * 100 AS Money))) DESC) AS SalesRank
FROM
[Order Details] ORD
INNER JOIN ORDERS O ON O.OrderID = ORD.OrderID
GROUP BY ORD.ORDERID, O.CustomerID, O.EmployeeID
) AS OrderTotal ON OrderTotal.CustomerID = C.CustomerID
INNER JOIN Employees E ON OrderTotal.EmployeeID = E.EmployeeID
)

--determine which result to extract
SELECT
OrderID, E.LastName + ', ' + E.FirstName AS EmployeeName, ContactName,
CompanyName, Phone, TotalPurchasePrice, SalesRank
FROM
Employees E
INNER JOIN CustomerOrders CO ON CO.EmployeeID = E.EmployeeID
WHERE
CO.SalesRank between @From AND @To

--reset
SET ROWCOUNT 0
END

(a) Top Order by sales

EXEC dbo.GetHighestSales_sp 1

Query returns Order with highest sales amount.

(b) Top 10 Order by sales

EXEC dbo.GetHighestSales_sp NULL, 1, 10

Query returns Top 10 Orders by sales amount.

Tuesday, August 12, 2008

MOSS 2007 - Data Caching Vs List Iteration

MOSS developers most of the time need to retrieve data from SharePoint Lists. In one of the MOSS project I worked there were several provisioned .ASPX Pages which lacked in performance.

The pages took more than usual time to render, on average a page displaying 1000+ List records (we are not discussing Pagination here) was taking approximately 2 minutes! Coming from ASP.NET/SQL background this was just unacceptable.

The code was nicely structured with N-Tier architecture, all the MOSS specific code was residing in Data Access Layer (DAL) and .ASPX pages were consumers.

DAL consisted of function which was returning a DataTable with all the relevant data to be displayed on the .ASPX page.

The problem code was in this function, below is an excerpt of code from this function.

//load datatable

DataTable dt = objDB.GetAllData();

//this happens for 1000+ rows

foreach (DataRowView dr in pages.DefaultView)

{

....

DataRow viewRow = null;
//problem code

viewRow["MyRow"] = objDB.GetNameByID(dr["ID"].ToString())["Name"].ToString();

....

}

//gets Name by ID

public SPListItem GetNameById(string id)

{

int itemId = -1;

if (Int32.TryParse(id, out itemId))

{

using (SPSite site = new SPSite(SiteUrl))

{

using (SPWeb web = site.OpenWeb())

{

SPList list = web.Lists["ListData"];

return list.GetItemById(itemId);

}

return null;

}

Code highlighted with orange shows the problem area, if the outer loop is executing 1000+ times then this lookup function is called 1000+ times along with expensive SPSite, SPWeb and individual lookup which works satisfactorily if you have 100-200 rows but the performance starts to deteriorate exponentially with the increase in number of calls.

Fix:

After the reviewing the code, I recognized the need to avoid calling lookup function 1000+ times and instead have an inline cache to do lookup.

I am not talking about .NET cache here or MOSS 2007 object cache but DataView which can be used as a temporary cache.

//gets Name by ID revised to return DataView

public DataView GetAllNameById()

{

using (SPSite site = new SPSite(SiteUrl))

{

using (SPWeb web = site.OpenWeb())

{

SPList oList = web.Lists["ListData"];

SPQuery oQuery = new SPQuery();

oQuery.Query = "";
oQuery.RowLimit = 0;
DataView dv = new DataView();
dv = oList.GetItems(oQuery).GetDataTable().DefaultView;
//indexes, helps in faster lookup
dv.Sort = "ID";
//return sorted dataview
return dv;
}
}
}

Let us see the revised code excerpt once again:

//cache all data to avoid query MOSS on every request
DataView dv = objDB.GetAllNameById();
//load datatable
DataTable dt = objDB.GetAllData();
//this happens for 1000+ rows
foreach (DataRowView dr in pages.DefaultView)
{
....
....
....
DataRow viewRow = null;
//revised code
DataRowView[] drv = dv.FindRows(dr["Name"].ToString());
if (null != drv)
{
viewRow["MyRow"] = drv[0]["Name"].ToString();
}
....
....
....
}

With this change we are not using expensive SPSite, SPWeb objects and lookup on every iteration but caching the entire lookup List in an indexed DataView, when we are doing the lookups we are not querying a List but querying a DataView which is much Faster!

With this code change the Page rendering time changed from 2 minutes to 7 seconds!

Friday, August 8, 2008

SQL Server - Using EXCEPT Operator

* Compatibility SQL Server 2005 onwards
EXCEPT operator was introduced in SQL Server 2005. EXCEPT can be used to return distinct records from the left side of a query and which are not located on the right side.

EXCEPT operator has the following behavior:

• EXCEPT by default returns a DISTINCT result set

• EXCEPT requires same column number and order on both side of the query with comparable data type

For more details on EXCEPT visit: MSDN

Let us consider a real-world EXCEPT example using NorthWind database. You can download and install NorthWind database from the following location.

(a) Customers who never placed any Order (Using EXCEPT)

SELECT
CustomerID
FROM
Customers
EXCEPT
SELECT
CustomerID
FROM
Orders

CustomerID
--------------
FISSA
PARIS
….
--------------

Query returns 2 rows, which are unique CustomerID. The above query demonstrated that we can extract unique CustomerID for all the Customers who never placed an Order using EXCEPT operator instead of writing a LEFT OUTER JOIN syntax to accomplish the same output.

(b) Customers who never placed any Order (Using LEFT OUTER JOIN)

SELECT
DISTINCT C.CustomerID
FROM
Customers C
LEFT OUTER JOIN Orders O ON C.CustomerID = O.CustomerID
WHERE
O.CustomerID IS NULL

CustomerID
--------------
FISSA
PARIS
….

--------------
Query returns same output as (a) using LEFT OUTER JOIN syntax

Thursday, August 7, 2008

SQL Server – Using INTERSECT Operator

* Compatibility SQL Server 2005 onwards
INTERSECT operator was introduced in SQL Server 2005. INTERSECT can be used to return distinct records from the left and right side of a query.

INTERSECT is often compared with INNER JOINS, it can be said that INTERSECT is almost same as INNER JOIN with the following key differences:

• INTERSECT by default returns a DISTINCT result set whereas INNER JOIN return duplicates unless it is instructed not to do so by using DISTINCT clause to the SELECT statement

• INTERSECT requires same column number and order on both side of the query with comparable data type, INNER JOIN does not require same column number and order condition

For more details on INTERSECT visit: MSDN

Let us consider a real-world INTERSECT example using NorthWind database. You can download and install NorthWind database from the following location.

(a) How many Products have been at least ordered once

SELECT
ProductID
FROM
Products
INTERSECT
SELECT
ProductID
FROM
[Order Details]

ProductID
--------------
1
2
3
4
5
….
….

Query returns 77 rows, which are unique ProductID. The above query demonstrated that we can extract unique ProductID for all the Products purchased using INTERSECT operator instead of writing an INNER JOIN to accomplish the same output

(b) INTERSECT Vs INNER JOIN

SELECT
ProductID
FROM
Products
INTERSECT
SELECT
ProductID
FROM
[Order Details]

ProductID
--------------
1
2
3
4
5
….
….
Returns 77 rows

SELECT
ORD.OrderID
FROM
[Order Details] ORD
INNER JOIN Orders O ON O.OrderID = ORD.OrderID

ProductID
------------
1
2
3
4
5
……
……

Query returns 2155 rows, which are non-unique ProductID. The above query demonstrated the default behavior of INNER JOIN

SELECT
DISTINCT ORD.OrderID
FROM
[Order Details] ORD
INNER JOIN Orders O ON O.OrderID = ORD.OrderID

ProductID
------------
1
2
3
4
5
……
……

Query returns 77 rows, which are unique ProductID as we used DISTINCT keyword to filter out the duplicates.

Tuesday, August 5, 2008

SQL Server - Common Table Expressions (CTE)

* Compatibility: SQL Server 2005 onwards
Common Table Expressions (CTE) construct was introduced with Microsoft SQL Server 2005. CTE behaves like a derived table but unlike derived table which has statement level scope CTE offers a procedure level.

CTE helps in writing more modular code, which helps in better readability and reducing complexity of query.

CTE provides following benefits:
. Ability to call block of query (CTE declaration) multiple times like a function call in modern
programming language
. Ability to provide recursion logic (with robust exit condition)
. Ability to use result set (CTE call) in multiple statements or single statement (self join)

CTE is described with the following syntax:

WITH MyCTE1 (OrderID, CustomerID) AS
(
SELECT
OrderID, CustomerID
FROM
MyTable1
)
, MyCTE2 (FirstName, LastName)
(
SELECT
FirstName, LastName
FROM
MyTable2
)
/*Multiple CTE’s can be declared with ‘,’*/

Where
. MyCTE1 is CTE Name used to reference CTE in queries
. Column list (optional), good practice to declare them to avoid confusion
. Query declaration after the AS keyword
. Multiple CTE’s can be declared putting ‘,’ after previous ending declaration

CTE can then be used as following (we will see detailed example later):
SELECT * FROM MyCTE

CTE can be declared in stored procedures, user-defined functions (UDF), triggers, or views.

For more details on CTE’s visit: MSDN

Let us consider a real-world CTE example using NorthWind database. You can download and install NorthWind database from the following location.

Consider this scenario; we want to run a report which gives us Order detail by OrderID, EmployeeID or CustomerID, our sample query will only return Employee name, Contact name, Company name, Phone, Fax and Total purchase price for the order (the objective is to make the stored procedure flexible enough to fetch data for all the input parameters or some).

If we follow conventional T-SQL approach we would have to write multiple T-SQL statements to achieve our objective, but as you can see with one stored procedure using combination of CTE, Dynamic tables and Coalesce statement we made the entire stored procedure multi-functional, we can use the stored procedure to run reports like Sales by Employee, Sales by Customer, Overall Sales total and many more cases depending on how you use the data, of course we can change the query criteria to make our stored procedure even more flexible or extendable.

Stored procedure code:

USE [Northwind]
GO
/****** Object: StoredProcedure [dbo].[GetCustomerOrders_sp] Script Date: 08/17/2008 21:37:16 ******/
SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER ON
GO
/*
--all
EXEC dbo.GetCustomerOrders_sp null, null, null
--by OrderID
EXEC dbo.GetCustomerOrders_sp 10682, null, null
--by CustomerID
EXEC dbo.GetCustomerOrders_sp null, null, 'ANTON'
--by EmployeeID
EXEC dbo.GetCustomerOrders_sp null, 9, null
--exact match
EXEC dbo.GetCustomerOrders_sp 10263, 9, 'ERNSH'
*/
ALTER PROCEDURE [dbo].[GetCustomerOrders_sp]
(
@OrderID int = null,
@EmployeeID int = null,
@CustomerID nVarchar(5) = null
)
AS
BEGIN
--no extra columns
SET NOCOUNT ON;

/*CTE Declaration block */
WITH CustomerOrders (OrderID, CustomerID, EmployeeID, ContactName, CompanyName, Phone, Fax, TotalPurchasePrice) AS
(
SELECT OrderTotal.OrderID As OrderID, C.CustomerID As CustomerID, E.EmployeeID As EmployeeID,
C.ContactName, C.CompanyName, C.Phone, C.Fax, OrderTotal.TotalPurchasePrice
FROM
Customers C
INNER JOIN
(
SELECT ORD.OrderID As OrderID, O.CustomerID, O.EmployeeID,
Sum(CAST((UnitPrice * Quantity) * (1-Discount)/100 * 100 AS Money)) AS TotalPurchasePrice
FROM
[Order Details] ORD
INNER JOIN ORDERS O ON O.OrderID = ORD.OrderID
GROUP BY ORD.ORDERID, O.CustomerID, O.EmployeeID
) AS OrderTotal ON OrderTotal.CustomerID = C.CustomerID
INNER JOIN Employees E ON OrderTotal.EmployeeID = E.EmployeeID
),
CustomerDetail (CustomerID, ContactName, CompanyName, Phone, Fax) AS
(
SELECT CustomerID, ContactName, CompanyName, Phone, Fax
FROM Customers
)
/*Note Multiple CTE's can be declared in declaration block*/
--determine which result to extract
SELECT
OrderID, E.LastName + ', ' + E.FirstName AS EmployeeName, E.EmployeeID, ContactName, CompanyName, Phone, Fax, TotalPurchasePrice
FROM
Employees E
INNER JOIN CustomerOrders CO ON CO.EmployeeID = E.EmployeeID
WHERE
CO.EmployeeID LIKE Coalesce(@EmployeeID, CO.EmployeeID)
AND CO.OrderID LIKE Coalesce(@OrderID, CO.OrderID)
AND CO.CustomerID LIKE Coalesce(@CustomerID, CO.CustomerID)
ORDER BY E.EmployeeID ASC
END

Test Scripts result:

Note: Result sets images are adjusted to show the first few records and rest are omitted

--all
EXEC dbo.GetCustomerOrders_sp null, null, null

--by OrderID

EXEC dbo.GetCustomerOrders_sp 10682, null, null

--by CustomerID

EXEC dbo.GetCustomerOrders_sp null, null, 'ANTON'

--by EmployeeID

EXEC dbo.GetCustomerOrders_sp null, 9, null

--exact match

EXEC dbo.GetCustomerOrders_sp 10263, 9, 'ERNSH'

Friday, August 1, 2008

SQL Server – Power of CASE Function

* Compatibility SQL Server 2000 onwards
CASE statement can be used to introduce conditional logic. CASE can be used to evaluate multiple statements and return one possible output.

CASE can be compared to Switch statement in modern programming languages like C#.

CASE statement provides following benefits:

Ability to format output
Ability to perform conditional logic on a Column
Ability to optimize query by emitting the need to write multiple queries to
return a result set
Ability to perform conditional Update
Ability to provide dynamic Order By in single query

CASE functions are of 2 types:

Simple [Comparison is done with substituted inline values]
Searched [Comparison is done on Column level]

For more details on CASE syntax visit: MSDN

Let us see CASE statement in action for the different type of scenarios:

(a) Formatted output using Simple CASE function:

SELECT
OrderID, OrderDate, ShipVia,
Shipper =
(CASE SHIPVIA
WHEN 1 THEN 'Speedy Express'
WHEN 2 THEN 'United Package'
WHEN 3 THEN 'Federal Shipping'
ELSE 'Unknown'
END)
FROM
Orders

(b) Column level comparison using Searched CASE function:

SELECT
OrderID, OrderDate, Freight,
FreightCategory =
(CASE
WHEN Freight <= 0 THEN 'No Freight'

WHEN (Freight >= 1 AND Freight <= 100) THEN 'Low Freight'

WHEN (Freight >= 100 ) THEN 'High Freight'

ELSE 'Unknown Freight'

END)
FROM
Orders

(c) Dynamic Order By functionality using CASE function:

This is helpful in the scenario when query needs to return the output by multiple columns. CASE function can be used to accomplish this by using Column names as variable which can be evaluated and substituted in the order by clause criteria.

Sample example below shows a query which can be dynamically sorted by LastName, FirstName and EmployeeID (default), the same query can be packaged into a stored procedure where it can take column names as a parameter and return the output accordingly:

--dynamic Order By
Declare @OrderBy Varchar(100)
--order by LastName
SELECT @OrderBy = 'LastName'

SELECT
EmployeeID, LastName, FirstName, Title, BirthDate, HireDate
FROM
Employees
ORDER BY
CASE
WHEN @OrderBy = 'LastName' THEN LastName
WHEN @OrderBy = 'FirstName' THEN FirstName
--default
ELSE Cast(EmployeeID As Varchar)
END

--dynamic Order By
Declare @OrderBy Varchar(100)
--order by FirstName
SELECT @OrderBy = 'FirstName'

SELECT
EmployeeID, LastName, FirstName, Title, BirthDate, HireDate
FROM
Employees
ORDER BY
CASE
WHEN @OrderBy = 'LastName' THEN LastName
WHEN @OrderBy = 'FirstName' THEN FirstName
--default
ELSE Cast(EmployeeID As Varchar)
END

(d) Conditional Update functionality using CASE function:

This is helpful in the scenario when multiple updates are needed for a particular table. CASE function can be used to accomplish the same operation using single update statement instead of writing multiple update statements for each criteria.

Sample example below adds $8 to all the Products which have UnitPrice less than 50, adds $5 to Products having UnitPrice greater than 50 and less than 100, adds $3 to Products having UnitPrice greater than 100 and finally for conditions which are falling in any of the criteria it puts the same UnitPrice (unchanged):

--put in temporary transaction to perform rollback
BEGIN TRAN
UPDATE Products SET UnitPrice =
CASE
WHEN (UnitPrice >= 0 AND UnitPrice <= 50) THEN (UnitPrice + 8)

WHEN (UnitPrice >= 50 AND UnitPrice <= 100) THEN (UnitPrice + 5)

WHEN (UnitPrice >= 100) THEN (UnitPrice + 3)
ELSE UnitPrice
END
--roll back the changes
ROLLBACK

Syed A. Rizvi - Developing Web-based Architectures