Thursday, November 11, 2010

How to find and remove duplicate rows in SQL Server

How to find and remove duplicate rows in SQL Server:


Relational databases are, by design, supposed to eliminate redundancy: information should be stored in one place in order to maintain data integrity. Reality, however, is far from this ideal, as there are many production databases out there with redundant data. As such, duplicate records are alive and well, unfortunately!

In today’s discussion we are going to look at some sample code that will let you find duplicate data across one or many fields.  Next we will look at some different methods of removing that duplicate data. For today’s exercises, we are going to be working with the sample Microsoft Northwind database, which has been around for a few years to say the least.  All of this SQL code has been tested on SQL Server 2008 Enterprise Edition.

-How do we create duplicate data for testing purposes?

First we are going to make a copy of the CUSTOMERS table using a SELECT INTO query.
Next we are going to re-insert the data for Country='UK' twice into the same table, which populates it with duplicate data.
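The setup queries themselves were lost from the post; here is a minimal sketch, assuming the standard Northwind CUSTOMERS table:

```sql
-- Make a working copy of CUSTOMERS (structure plus data);
-- the copy has no primary key, so duplicate rows are allowed
SELECT *
INTO CUSTOMERS_BAK
FROM CUSTOMERS

-- Re-insert the UK rows twice to create duplicate data
INSERT INTO CUSTOMERS_BAK
SELECT * FROM CUSTOMERS WHERE COUNTRY = 'UK'

INSERT INTO CUSTOMERS_BAK
SELECT * FROM CUSTOMERS WHERE COUNTRY = 'UK'
```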

Before we get started with finding and deleting duplicate records, we are going to do a simple record count. This will let us know how many rows are present in the CUSTOMERS_BAK table. Once we have our baseline data, we can start looking at different methods of finding duplicates in the CUSTOMERS_BAK table.
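A simple count along these lines gives us the baseline:

```sql
SELECT COUNT(*) AS [ROW COUNT]
FROM CUSTOMERS_BAK
```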

We are going to go through a set of examples to show you how to accomplish this task.

-How do you find duplicate records in a SQL table?

There are many ways of finding duplicate data in a SQL Server table; the simplest and most efficient method is to use the GROUP BY clause.  In our case we are going to use GROUP BY with a HAVING clause to find the repeating (duplicate) rows. The first query will find any rows with repeating company names in our table. We are going to use the following T-SQL query in SQL Server for this.

SELECT COMPANYNAME
FROM CUSTOMERS_BAK
GROUP BY COMPANYNAME
HAVING COUNT(*)>1




This query looks for those rows in the CUSTOMERS_BAK table where the record count is greater than 1 when you group them by the COMPANYNAME field. Notice that you get the 7 UK customers that we added in an earlier step. Here is a screen capture of this duplicate query.


Duplicate Company Name

What if we wanted to find out how often these records are repeated within the table?  We can easily do that by adding the COUNT function to the SQL command.
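The query was not captured in the post; a sketch that reuses the GROUP BY from above and adds COUNT(*) to the select list:

```sql
SELECT COMPANYNAME, COUNT(*) AS [# OF TIMES]
FROM CUSTOMERS_BAK
GROUP BY COMPANYNAME
HAVING COUNT(*)>1
```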

You will have noticed that so far we have only looked at one field, COMPANYNAME in the above case.  In a similar fashion we can add multiple fields to the SQL query. Let us say we wanted to find any duplicate rows where the CITY and COUNTRY fields are repeated in our table. How would we find the duplicates for two fields instead of one?  Let us try this SQL Server command first.

SELECT CITY, COUNTRY, COUNT(*) AS [# OF TIMES]
FROM CUSTOMERS_BAK
GROUP BY CITY
HAVING COUNT(*)>1




When we ran the above SQL statement to find duplicate records, we got the following error:

Msg 8120, Level 16, State 1, Line 1
Column 'CUSTOMERS_BAK.Country' is invalid in the select list because it is not contained
in either an aggregate function or the GROUP BY clause.


The reason for this error is that although we are selecting the CITY and COUNTRY fields, our GROUP BY clause only contains the CITY field. Every non-aggregated field in the SELECT list must also appear in the GROUP BY clause; it is OK to have COUNT(*) because it is an aggregate function.

You can simply run the following query to get the end result. You will notice that there are 11 rows in our table that match the duplicate criteria.

SELECT CITY, COUNTRY, COUNT(*) AS [# OF TIMES]
FROM CUSTOMERS_BAK
GROUP BY CITY, COUNTRY
HAVING COUNT(*)>1







Let us say that we actually have duplicate rows across all the fields in a SQL Server table. This is probably the most common situation you will run into when working with redundant data. In this case you simply list all the fields of the table in the SELECT clause and make sure those same fields also appear in the GROUP BY clause.  Using our CUSTOMERS_BAK table, we can use the following T-SQL script to do just that.
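The script itself did not survive in the post; a sketch, assuming CUSTOMERS_BAK keeps the standard Northwind customer columns:

```sql
-- Group by every column so that fully identical rows count as duplicates
SELECT CUSTOMERID, COMPANYNAME, CONTACTNAME, CONTACTTITLE, ADDRESS,
       CITY, REGION, POSTALCODE, COUNTRY, PHONE, FAX,
       COUNT(*) AS [# OF TIMES]
FROM CUSTOMERS_BAK
GROUP BY CUSTOMERID, COMPANYNAME, CONTACTNAME, CONTACTTITLE, ADDRESS,
         CITY, REGION, POSTALCODE, COUNTRY, PHONE, FAX
HAVING COUNT(*)>1
```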

You will notice that there are seven rows in the customer table that are repeated.

How do you remove duplicate rows?

So far we have covered how to find duplicate data; the next question, obviously, is: how do we delete the duplicate rows? There are many ways to eliminate duplicate rows; we are going to look at two of them. Both options involve inserting the non-duplicate data into another table first and then renaming the new table to the original table name.

Remove duplicate rows - Option 1

This is the one I prefer, as it is more efficient and uses code similar to what we have already seen.  We are going to use a GROUP BY without the HAVING clause. We have also included a screen capture of part of the SQL statement that removes the duplicates.
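The query itself was lost from the post; here is a sketch of the idea, assuming the standard Northwind customer columns (the intermediate table name CUSTOMERS_NODUPS is hypothetical). Grouping by every column collapses each duplicate group to one row, and SELECT ... INTO writes the distinct rows to a new table:

```sql
-- One row per duplicate group goes into the new table
SELECT CUSTOMERID, COMPANYNAME, CONTACTNAME, CONTACTTITLE, ADDRESS,
       CITY, REGION, POSTALCODE, COUNTRY, PHONE, FAX
INTO CUSTOMERS_NODUPS
FROM CUSTOMERS_BAK
GROUP BY CUSTOMERID, COMPANYNAME, CONTACTNAME, CONTACTTITLE, ADDRESS,
         CITY, REGION, POSTALCODE, COUNTRY, PHONE, FAX

-- Swap the de-duplicated table in for the original
DROP TABLE CUSTOMERS_BAK
EXEC sp_rename 'CUSTOMERS_NODUPS', 'CUSTOMERS_BAK'
```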



Remove duplicate data


Remove duplicate rows - Option 2

Another method is to use the UNION operator to filter out the duplicate rows and leave only distinct values behind. Here’s the SQL command to do that.
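The command was not captured in the post; a sketch of the UNION approach (the intermediate table name CUSTOMERS_NODUPS is hypothetical). UNION, unlike UNION ALL, returns only distinct rows, so UNION-ing the table with an empty result set filters out the duplicates:

```sql
-- UNION returns only distinct rows, so the duplicates drop out
SELECT *
INTO CUSTOMERS_NODUPS
FROM
(
    SELECT * FROM CUSTOMERS_BAK
    UNION
    SELECT * FROM CUSTOMERS_BAK WHERE 1=2   -- empty result set
) AS DEDUPED

-- Swap the de-duplicated table in for the original
DROP TABLE CUSTOMERS_BAK
EXEC sp_rename 'CUSTOMERS_NODUPS', 'CUSTOMERS_BAK'
```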


Related articles to Finding and Deleting Duplicate rows:


-Remove Duplicate Rows in SQL Server
-Delete duplicate rows in a table
-Using common table expression to remove duplicate data
-Microsoft SQL Server 2008 Management and Administration
-Microsoft SQL Server 2008 Bible
-SQL Server Tutorials


Wednesday, October 6, 2010

How to import data in SQL Azure from SQL 2008 R2 using SSIS?

How to import data in SQL Azure from SQL 2008 R2 using SSIS


When working with SQL databases, one of the most common tasks is importing or exporting data from a database.  In experimenting with SQL Azure, I realized that it is not possible to migrate data using the familiar Import and Export wizard that is available in SQL Server Management Studio (SSMS). What you have to do is create an SSIS (SQL Server Integration Services) package in Business Intelligence Development Studio (BIDS) and then execute it.

In this article we are going to show you how to copy data from an on-site SQL Server 2008 instance to an off-site SQL Azure database. For this example we are going to export data from the Products table in a local database to a database in the cloud using SQL Server Integration Services.

Before we jump into SSIS, we found another thing that is unique to SQL Azure. We noticed that we need to have a clustered index on the destination table before SSIS will let us import data into it. So we have to do two things on the destination SQL Azure database first:

-Create the Products table
-Define a clustered index on the table

We have included the SQL script for the above two steps for your convenience.


--This script creates the Products table on SQL Azure.
--Next we create a clustered index on ProductID


CREATE TABLE [dbo].[Products](
    [ProductID] [int] NOT NULL,
    [ProductName] [nvarchar](40) NOT NULL,
    [SupplierID] [int] NULL,
    [CategoryID] [int] NULL,
    [QuantityPerUnit] [nvarchar](20) NULL,
    [UnitPrice] [money] NULL,
    [UnitsInStock] [smallint] NULL,
    [UnitsOnOrder] [smallint] NULL,
    [ReorderLevel] [smallint] NULL,
    [Discontinued] [bit] NOT NULL

)
---ON [PRIMARY]


CREATE CLUSTERED INDEX IX_Product ON dbo.Products
    (
    ProductID
    )

GO






Before getting started, please note that we are working with the Enterprise edition of SQL Server 2008 R2 and the cloud version of SQL Server.
First off launch the Business Intelligence Development Studio or BIDS from the start menu by doing the following:
 
Start – All Programs – Microsoft Visual Studio 2008 - Microsoft Visual Studio 2008

This will launch the Business Intelligence Development Studio. Next, from the File menu, do the following:

File - New - Project

Next go ahead and select Integration Services Project.  This is shown in the dialog box below:



Integration Services Project



You can add a name for the project and then click OK.  This action will go ahead and create the necessary components under Solution Explorer.  In addition it will also launch the Package Design window in the middle.  The next thing you need to do is to add a data source that will be used to export data. Within the Solution Explorer, right click on Data Sources and then select New Data Source.  This will open up the Data Source wizard screen, go ahead and click on Next.  If you do not have an existing data connection, you can create one by clicking New.

In this example we are going to use the sample Northwind database on a local instance of SQL Server 2008 R2 to export data. This SSIS data source is for server KASHMONEY-PC\SQL08R2, as shown in the following screenshot.

On-site SQL 2008 R2


The next thing we need to do is define the data destination where the data will be imported. In this case we will use a SQL Azure database in the cloud. In order to create this connection, we need to define another data source, so we are going to follow the earlier steps.


We will invoke the Data Source Wizard one more time as shown below.





 Here we had some issues as we tried different data providers in order to connect successfully to a SQL Azure server and the database.  Bottom line is you need to use the .Net Providers\SqlClient data provider as shown below in the Connection Manager screen.

In addition you will need to know the server address, SQL username and the SQL password.


SQL Azure Connection


Next click OK, which will finish the wizard.  At this point, we have our data source and our data destination all squared away. We have included a screen capture that shows the end of the data wizard with the connection string information.


Destination Database




Click Finish to exit out of this.

Now we are going to configure the SSIS package details; remember, an Integration Services package is the building block for data movement in SQL Server. Make sure that you are on the Control Flow tab in the design pane. From the Toolbox on the left, drag the Data Flow Task from under the Control Flow Items onto the design surface.

This step has been displayed for you in the following figure.


SSIS Package


Now switch to the Data Flow tab in the SSIS design pane. Next we will add a data source for our local on-premise database by dragging the OLE DB Source from the Data Flow Sources.

Here is what it looks like on our computer monitor.


OLE DB Source



We need to configure the SSIS data flow item. Right click on OLE DB Source and select Edit.  On the next dialog OLE DB Source Editor, go ahead and select New.  This will open up a new screen as shown below.


On Premise SQL Server



 From the data connections pane, select LocalHost.Northwind and select Ok.

So far we have just made the connection to the source database, we still need to get the table information. Here you will select Table or view under Data access mode. Next choose Products under Name of the table or the view. After you click Ok, you have successfully created SSIS data flow from the source. This step is shown below:





The next step is to create the ADO NET Destination; this is where the data will be imported. From the Toolbox on the left, go ahead and select the ADO NET Destination SSIS Data Flow item and move it over to the design pane in the Data Flow tab.  This is shown in the screen capture right below; notice that the connection has not been configured yet and is displayed in red.


Data Flow controls



Before you can configure the ADO NET destination, you have to drag the output (green arrow) from the OLE DB Source to the ADO NET Destination connection. Next you can right click on ADO NET destination and select Edit. This will launch a dialog box titled ADO NET Destination Editor.

Go ahead and click on New. You will get the Connection Manager screen shown below.







Go ahead and select the SQL Azure connection that was defined in the earlier step. You will also need to point to the table we created using the SQL script: for Use a table or view, select Products. Here is what it looks like in our Microsoft Visual Studio 2008 application.







One final thing we need to do before we can import data using Integration Services is map the columns between the data source and the data destination. If you are using the same table schema on the source and destination tables, this should be pretty straightforward.


Click on Mappings in the left pane and make sure the Input and Output columns match. We have included a screen shot for your review.
 



Column Mappings






Let us take a look at our Data Flow designer before we kick things off. Here is what we have done so far:

1- Defined a local on premise server connection to Northwind database
2- Defined a cloud SQL Azure server connection to an off site database
3- Defined an OLE DB source that maps to the local on-site database connection
4- Defined an ADO NET destination that maps to the off-site cloud SQL Azure database connection

Here is what we have so far, we are almost done.




Integration Services Package




Now we are going to kick off the SSIS package; all you need to do is click Execute (green arrow) or Debug – Start Debugging. In our case, all the items were configured properly and our SSIS package executed successfully. We got the following end result and were able to move 77 records from SQL Server 2008 to SQL Azure.





Execute SSIS package




We can verify the data on the target SQL Azure location with this SQL query.
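The verification query was not captured in the post; something along these lines, run against the SQL Azure database, will do:

```sql
-- Should return 77, matching the rows moved by the SSIS package
SELECT COUNT(*) AS [ROWS IMPORTED]
FROM dbo.Products
```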




Check data at Target

 




We were successful in migrating data from an on premise local database to a SQL Azure cloud database using SQL Server Integration Services.


More tips on SQL Azure can be found at our site:

http://sqlazuretutorials.com/




Tags: Import data, SQL Azure, SQL 2008 R2, SSIS

Monday, August 30, 2010

SQL Aggregate functions like SQL MAX, SQL MIN, SQL COUNT And SQL AVERAGE function

SQL Aggregate functions like SQL MAX, SQL MIN, SQL COUNT And SQL AVERAGE function


SQL aggregate functions are necessary to summarize numerical data in a SQL Server database. We have already looked at other Transact-SQL statements like SELECT, INSERT, DELETE and UPDATE. Those typically return the rows of data that match the criteria in the WHERE clause. SQL statements using aggregate functions like MIN or MAX, however, return just one value after SQL Server has applied the aggregation and summarized the data.  In this manner SQL Server aggregate functions perform their operation across multiple rows of data.

For today’s discussion we are going to look at AdventureWorks2008 sample database in SQL Server 2008. You can download the associated data from this location:

Products:


http://www.sqlserver2008tutorial.com/blog/sql-aggregate-functions/production_products.xls

Orders:

http://www.sqlserver2008tutorial.com/blog/sql-aggregate-functions/sales_order_header.xls



There are a number of SQL aggregate functions; however we are only going to look at the important ones so here we go:

SQL MIN Aggregate function:


The SQL MIN function is used to return the minimum value in the select list, ignoring any NULL values.  This would typically be used, let’s say, in a PRODUCTS table to find your cheapest product.

SELECT
MIN(LISTPRICE) AS [MINIMUM PRICED ITEM]
FROM PRODUCTION.PRODUCT
WHERE SIZE IS NOT NULL

--8.99


SQL MAX Aggregate function:


This aggregate function is used to return the maximum value in the select list, ignoring any NULLs.  Similar to the SQL MIN function, the SQL MAX function can be used to find the highest priced item in the above products table. Here is the SQL syntax for the MAX function:

SELECT
MAX(LISTPRICE) AS [MAXIMUM PRICED ITEM]
FROM PRODUCTION.PRODUCT
WHERE SIZE IS NOT NULL

--3578.27

SQL SUM Aggregate function:


This SQL function returns the total sum of all items in the select list, ignoring any NULLs.  The SUM function will add up all the values for a column in a given table.  For example, we are going to use the SALESORDERHEADER table and summarize the TOTALDUE field.

SELECT SUM(TOTALDUE) AS SUMMATION
FROM SALES.SALESORDERHEADER
WHERE CUSTOMERID='29898'

--197634.248

SQL AVG Aggregate function:


SQL provides the AVG function to return the average of the values in the select list, ignoring any NULLs.  This is an important function which will help you find the average value of, let’s say, product prices in your products table.  We are going to use the same SALESORDERHEADER table from the prior example to show the SQL AVG function in action.


SELECT AVG(TOTALDUE) AS AVERAGE
FROM SALES.SALESORDERHEADER
WHERE CUSTOMERID='29898'

--49408.562


SQL COUNT Aggregate function:


A lot of times all you’re trying to figure out is how many rows or records match a certain condition.  This is where the SQL COUNT function comes in handy, as it will simply count the number of rows that meet your criteria. The COUNT aggregate function returns the number of items in the select list, ignoring NULL values unless you use COUNT(*).  This is by far the most important aggregate function that you will be using. Here is an example from the SALESORDERHEADER table.

SELECT COUNT(TOTALDUE) AS [ITEM COUNT]
FROM SALES.SALESORDERHEADER
WHERE CUSTOMERID='29898'

--4 RECORDS

Sample video on Aggregate Functions here:

http://www.learningsqlserver2008.com/products.htm

TAGS: SQL Server AGGREGATE FUNCTIONS, SQL COUNT, SQL AVG, SQL MAX, SQL MIN, SQL SUM

Saturday, July 31, 2010

SQL STATEMENTS INCLUDING SQL INSERT, SQL UPDATE And SQL DELETE

SQL STATEMENTS INCLUDING SQL INSERT, SQL UPDATE And SQL DELETE:


SQL INSERT STATEMENT:


The SQL INSERT statement adds one or more new rows to a table. In its simplified version, INSERT has this SQL syntax:


INSERT [INTO] table_or_view
[(column_list)]
data_values

Using this syntax you can insert data as a single row, shown as follows. Here we are using the Northwind sample database from Microsoft.


INSERT INTO REGION VALUES ('5','MIDWEST')

If you want to insert multiple records into a table, the INSERT statement can use a sub-select (subquery) to do just that.
In the next INSERT example we are inserting contact information into the CUSTOMER_CONTACT table from the CUSTOMERS table. Here’s the Transact-SQL syntax for an INSERT statement with a subquery:


INSERT CUSTOMER_CONTACT (CUSTOMERID, CONTACTNAME, PHONE)
SELECT  CUSTOMERID, CONTACTNAME, PHONE
FROM CUSTOMERS

This will insert only the specified fields, but all of the records from the CUSTOMERS table.
Here is a screenshot of what this SQL command looks like in SQL Server Management Studio (SSMS):




What if you wanted to make a copy of a table's structure? No problem. You can do that by using SELECT * INTO. Here is an example that copies the CUSTOMERS table into a new table named CUSTOMER_CONTACT_BAK. When you use WHERE 1=2, it just creates the table and does not insert any data into it.

SELECT * INTO CUSTOMER_CONTACT_BAK
FROM CUSTOMERS WHERE 1=2



SQL UPDATE STATEMENT:


The UPDATE SQL command is used to update data in tables. Here is the basic syntax of this important Transact-SQL (T-SQL) command.


UPDATE TABLE
SET COLUMN=NEW VALUE 
WHERE CONDITION

If you use the UPDATE statement without a WHERE clause, you will update all the rows in the table with the same value. Oops! Be careful with this SQL command for sure.

For the next scenario let us say there is a shortage of seafood in the market, so we need to raise the price by 5% for these items. Go ahead and try this update command on Northwind database.


UPDATE PRODUCTS
SET UNITPRICE=UNITPRICE * 1.05
WHERE CATEGORYID=8

Here is a screen capture of the above T-SQL command from SSMS.



Just like the SQL INSERT command, you can also use a sub-select with an SQL UPDATE statement. The idea is that you are updating one table with data from another table.  This is commonly used when you are trying to fix a transactional table with values from a lookup table. Let us take a look at this next UPDATE example.

UPDATE DBO.CUSTOMER_CONTACT
SET REGION=REGIONDESCRIPTION
FROM DBO.REGION_STATE
WHERE CUSTOMER_CONTACT.STATE=REGION_STATE.STATE

In the above SQL query we are updating the CUSTOMER_CONTACT table, field REGION, with values from the REGION_STATE table, field REGIONDESCRIPTION. The catch is that it will only update those rows where there is a match between the two tables on the STATE field.

Here are the copies of the CUSTOMER_CONTACT table before and after update

Before the sql update command
After the update sql command


SQL DELETE STATEMENT:


In five words, the SQL DELETE statement “Removes rows from a table!” Be very careful with this command. Here is the basic syntax of a SQL DELETE statement:


DELETE FROM TABLE
WHERE CONDITION

Notice that the absence of a WHERE clause deletes all rows. Here is an example of a simple DELETE command.

DELETE FROM NORTHWIND..REGION
WHERE REGIONDESCRIPTION='MIDWEST'

This DELETE statement will affect the row that we added earlier with the INSERT statement. Let us say that we stopped carrying any TOFU products; you could get rid of those rows by using the following SQL command.


DELETE FROM Products
WHERE ProductName LIKE '%TOFU%'

When we ran this we got the following error:


The DELETE statement conflicted with the REFERENCE constraint "FK_Order_Details_Products". The conflict occurred in database "Northwind", table "dbo.Order Details", column 'ProductID'.
The statement has been terminated.

We have captured the above SQL command and the error in the following image.




This SQL error has to do with primary key and foreign key relationships. Basically, it is saying that you have existing customer orders for this product. You would first need to delete the matching rows in the ORDER DETAILS table and then delete the “TOFU” rows in the PRODUCTS table.

Just like the INSERT and UPDATE statements, the DELETE statement can also use a sub-select (subquery). Also please note that DELETE * FROM TABLE will not work; you have to use the DELETE FROM TABLE format.
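Putting the two points above together, here is a hypothetical sketch that uses a sub-select to delete the dependent Order Details rows before removing the TOFU products:

```sql
-- Remove the dependent order detail rows first (sub-select),
-- so the foreign key constraint is satisfied
DELETE FROM [Order Details]
WHERE ProductID IN
    (SELECT ProductID FROM Products WHERE ProductName LIKE '%TOFU%')

-- Now the products themselves can be deleted
DELETE FROM Products
WHERE ProductName LIKE '%TOFU%'
```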


TAGS: SQL INSERT, SQL DELETE, SQL UPDATE, NORTHWIND

Wednesday, June 30, 2010

Transfer logins in SQL Server 2005 and SQL Server 2008

/*------------------------------------------------------------------------------------------------------------

Transfer logins in SQL Server 2005 and SQL Server 2008


In this script we cover the sp_help_revlogin stored procedure, which can help you copy SQL Server logins from one server to another server running SQL Server 2008.

This can be useful when you are trying to copy a database from one server (maybe production) to another (development or staging). Remember, copying a database does not move the SQL Server logins, which exist at the server level. You have to manually transfer the login accounts from the source SQL Server to the target SQL Server.

Using the information here you can easily do this task.

------------------------------------------------------------------------------------------------------------*/

/*

Transfer logins SQL Server 2005 - Step 1


 Create sp_help_revlogin procedure

Sources:

http://support.microsoft.com/kb/246133/

http://blog.netnerds.net/2009/01/migratetransfer-sql-server-2008200520007-logins-to-sql-server-2008/

This needs to be run on the source sql server, the one you are copying the logins from


*/

----- Begin Script, Create sp_help_revlogin procedure -----



USE master
GO
IF OBJECT_ID ('sp_hexadecimal') IS NOT NULL
  DROP PROCEDURE sp_hexadecimal
GO
CREATE PROCEDURE sp_hexadecimal
    @binvalue varbinary(256),
    @hexvalue varchar (514) OUTPUT
AS
DECLARE @charvalue varchar (514)
DECLARE @i int
DECLARE @length int
DECLARE @hexstring char(16)
SELECT @charvalue = '0x'
SELECT @i = 1
SELECT @length = DATALENGTH (@binvalue)
SELECT @hexstring = '0123456789ABCDEF'
WHILE (@i <= @length)
BEGIN
  DECLARE @tempint int
  DECLARE @firstint int
  DECLARE @secondint int
  SELECT @tempint = CONVERT(int, SUBSTRING(@binvalue,@i,1))
  SELECT @firstint = FLOOR(@tempint/16)
  SELECT @secondint = @tempint - (@firstint*16)
  SELECT @charvalue = @charvalue +
    SUBSTRING(@hexstring, @firstint+1, 1) +
    SUBSTRING(@hexstring, @secondint+1, 1)
  SELECT @i = @i + 1
END

SELECT @hexvalue = @charvalue
GO

IF OBJECT_ID ('sp_help_revlogin') IS NOT NULL
  DROP PROCEDURE sp_help_revlogin
GO
CREATE PROCEDURE sp_help_revlogin @login_name sysname = NULL AS
DECLARE @name sysname
DECLARE @type varchar (1)
DECLARE @hasaccess int
DECLARE @denylogin int
DECLARE @is_disabled int
DECLARE @PWD_varbinary  varbinary (256)
DECLARE @PWD_string  varchar (514)
DECLARE @SID_varbinary varbinary (85)
DECLARE @SID_string varchar (514)
DECLARE @tmpstr  varchar (1024)
DECLARE @is_policy_checked varchar (3)
DECLARE @is_expiration_checked varchar (3)

DECLARE @defaultdb sysname

IF (@login_name IS NULL)
  DECLARE login_curs CURSOR FOR

      SELECT p.sid, p.name, p.type, p.is_disabled, p.default_database_name, l.hasaccess, l.denylogin FROM
sys.server_principals p LEFT JOIN sys.syslogins l
      ON ( l.name = p.name ) WHERE p.type IN ( 'S', 'G', 'U' ) AND p.name <> 'sa'
ELSE
DECLARE login_curs CURSOR FOR

      SELECT p.sid, p.name, p.type, p.is_disabled, p.default_database_name, l.hasaccess, l.denylogin FROM
sys.server_principals p LEFT JOIN sys.syslogins l
      ON ( l.name = p.name ) WHERE p.type IN ( 'S', 'G', 'U' ) AND p.name = @login_name
OPEN login_curs

FETCH NEXT FROM login_curs INTO @SID_varbinary, @name, @type, @is_disabled, @defaultdb, @hasaccess, @denylogin
IF (@@fetch_status = -1)
BEGIN
  PRINT 'No login(s) found.'
  CLOSE login_curs
  DEALLOCATE login_curs
  RETURN -1
END
SET @tmpstr = '/* sp_help_revlogin script '
PRINT @tmpstr
SET @tmpstr = '** Generated ' + CONVERT (varchar, GETDATE()) + ' on ' + @@SERVERNAME + ' */'
PRINT @tmpstr
PRINT ''
WHILE (@@fetch_status <> -1)
BEGIN
  IF (@@fetch_status <> -2)
  BEGIN
    PRINT ''
    SET @tmpstr = '-- Login: ' + @name
    PRINT @tmpstr
    IF (@type IN ( 'G', 'U'))
    BEGIN -- NT authenticated account/group

      SET @tmpstr = 'CREATE LOGIN ' + QUOTENAME( @name ) + ' FROM WINDOWS WITH DEFAULT_DATABASE = [' + @defaultdb + ']'
    END
    ELSE BEGIN -- SQL Server authentication
        -- obtain password and sid
            SET @PWD_varbinary = CAST( LOGINPROPERTY( @name, 'PasswordHash' ) AS varbinary (256) )
        EXEC sp_hexadecimal @PWD_varbinary, @PWD_string OUT
        EXEC sp_hexadecimal @SID_varbinary,@SID_string OUT

        -- obtain password policy state
        SELECT @is_policy_checked = CASE is_policy_checked WHEN 1 THEN 'ON' WHEN 0 THEN 'OFF' ELSE NULL END FROM sys.sql_logins WHERE name = @name
        SELECT @is_expiration_checked = CASE is_expiration_checked WHEN 1 THEN 'ON' WHEN 0 THEN 'OFF' ELSE NULL END FROM sys.sql_logins WHERE name = @name
         SET @tmpstr = 'CREATE LOGIN ' + QUOTENAME( @name ) + ' WITH PASSWORD = ' + @PWD_string + ' HASHED, SID = ' + @SID_string + ', DEFAULT_DATABASE = [' + @defaultdb + ']'

        IF ( @is_policy_checked IS NOT NULL )
        BEGIN
          SET @tmpstr = @tmpstr + ', CHECK_POLICY = ' + @is_policy_checked
        END
        IF ( @is_expiration_checked IS NOT NULL )
        BEGIN
          SET @tmpstr = @tmpstr + ', CHECK_EXPIRATION = ' + @is_expiration_checked
        END
    END
    IF (@denylogin = 1)
    BEGIN -- login is denied access
      SET @tmpstr = @tmpstr + '; DENY CONNECT SQL TO ' + QUOTENAME( @name )
    END
    ELSE IF (@hasaccess = 0)
    BEGIN -- login exists but does not have access
      SET @tmpstr = @tmpstr + '; REVOKE CONNECT SQL TO ' + QUOTENAME( @name )
    END
    IF (@is_disabled = 1)
    BEGIN -- login is disabled
      SET @tmpstr = @tmpstr + '; ALTER LOGIN ' + QUOTENAME( @name ) + ' DISABLE'
    END
    PRINT @tmpstr
  END

  FETCH NEXT FROM login_curs INTO @SID_varbinary, @name, @type, @is_disabled, @defaultdb, @hasaccess, @denylogin
   END
CLOSE login_curs
DEALLOCATE login_curs
RETURN 0
GO


 ----- End Script, Create sp_help_revlogin procedure -----


/*

Transfer logins in SQL Server 2005 - Step 2


Run the following script on the source SQL Server. After you execute the SQL, this stored procedure will go ahead and generate the SQL code for your Logins.  You can then copy this SQL code and execute it on the target server.

*/


--USE [Enter your database]

EXEC master..sp_help_revlogin



For our case I am using the Northwind sample database. I have included a screenshot of what this looks like on my machine.





Transfer logins SQL Server 2005 - Step 3


--Execute the SQL code generated by the previous step on your target (destination) SQL Server. This will copy all the login information to the destination server.


Here is part of the SQL script.


/* sp_help_revlogin script
** Generated Jun 30 2010  4:08PM on KASHMONEY-PC */


-- Login: ##MS_PolicyEventProcessingLogin##
CREATE LOGIN [##MS_PolicyEventProcessingLogin##] WITH PASSWORD = 0x01003869D680ADF63DB291C6737F1EFB8E4A481B02284215913F HASHED, SID = 0x0A6983CDF023464B9E86E4EEAB92C5DA, DEFAULT_DATABASE = [master], CHECK_POLICY = ON, CHECK_EXPIRATION = OFF; ALTER LOGIN [##MS_PolicyEventProcessingLogin##] DISABLE

-- Login: ##MS_PolicyTsqlExecutionLogin##
CREATE LOGIN [##MS_PolicyTsqlExecutionLogin##] WITH PASSWORD = 0x01008D22A249DF5EF3B79ED321563A1DCCDC9CFC5FF954DD2D0F HASHED, SID = 0x8F651FE8547A4644A0C06CA83723A876, DEFAULT_DATABASE = [master], CHECK_POLICY = ON, CHECK_EXPIRATION = OFF; ALTER LOGIN [##MS_PolicyTsqlExecutionLogin##] DISABLE

-- Login: NT AUTHORITY\SYSTEM
CREATE LOGIN [NT AUTHORITY\SYSTEM] FROM WINDOWS WITH DEFAULT_DATABASE = [master]

-- Login: NT SERVICE\MSSQLSERVER
CREATE LOGIN [NT SERVICE\MSSQLSERVER] FROM WINDOWS WITH DEFAULT_DATABASE = [master]


Also we have included a screen shot right below of what it looks like:



TAGS: SQL Server 2005, SQL Server 2008, Copy SQL Logins, sp_help_revlogin

Thursday, June 17, 2010

Why Table Variables are better than Temporary Tables

Why Table Variables are better than Temporary Tables

If you are working with SQL Server tables that have millions of rows, one common technique to speed up your SQL queries and stored procedures is to either use temporary tables or table variables. There is quite a bit of debate on which is the better option, Table Variables or Temporary Tables. We will take a closer look at the two objects with SQL Profiler to decide the winner.

Temporary tables


Temporary tables are similar to regular tables; however, they are not permanent database objects. Temporary tables are extremely useful as work tables for storing the intermediate results of complex queries.  SQL Server creates and maintains them in the tempdb database and drops them when they are no longer needed. You can do virtually everything with a temporary table that you can do with a standard SQL Server table: create indexes, create defaults, modify the table, and basically use it anywhere you need a regular persistent database table.  Temporary tables can also be used in stored procedures for performance tuning and query optimization.  There are two types of temporary tables:

Local Temporary table:

This is by far the most common type of temporary table.  When using this one, the scope of the temporary table is limited to the database connection that creates it. The name of a local temporary table must start with a # (hash symbol), e.g. #local_temp_table. The local temporary table is dropped either explicitly (best practice) or when the database connection closes.  If you are using a stored procedure, the local temporary table will be dropped when the stored procedure finishes execution.

Global Temporary table:

Temporary tables can also be shared across many connections in SQL Server. This is possible by using a global temporary table, which is visible to any and all connections in SQL Server. This table is only deleted when the last connection referencing it closes or when it is explicitly dropped. A global temporary table name must begin with two hash signs (##), e.g. ##global_temp_table.
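A hedged sketch of the global variety (again with hypothetical table and column names) looks almost identical; only the ## prefix changes the visibility:

--Create a global temporary table (visible to every connection)

CREATE TABLE ##global_temp_table
( [ID] [INT] NOT NULL,
[NAME] [NVARCHAR](50) NULL
)

--Any other connection can now run:
--SELECT ID, NAME FROM ##global_temp_table

--Dropped explicitly here, or automatically once the creating session
--ends and no other session is still referencing the table

DROP TABLE ##global_temp_table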
Let us take a look at some of these examples next, we are going to be using Northwind database which is a sample database for SQL server. If you would like to download this database, please visit the Northwind download page.

--Create Table #CUSTOMER_DATA 

CREATE TABLE #CUSTOMER_DATA
( [CONTACTNAME] [NVARCHAR](30) NULL,
[ADDRESS] [NVARCHAR](60) NULL,
[CITY] [NVARCHAR](15) NULL,
[COUNTRY] [NVARCHAR](15) NULL
)



We will compare the performance of a temporary table with a table variable using SQL Server 2008 at the end of this article.


Advantages of using temporary tables

Here are some of the advantages of using temporary tables:

-It is possible to create an index on a temporary table
-You can create constraints on a temporary table, including PRIMARY KEY, UNIQUE, NULL and CHECK constraints
-The scope of a temporary table is not limited to the current session; you can extend it to all connections by using a global temporary table
-Statistics can be created on a temporary table
-Temporary tables are best suited for big datasets that involve a large amount of processing
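The first two points above can be sketched briefly. This example (the #CUSTOMER_LOOKUP table and index name are hypothetical) declares a PRIMARY KEY constraint inline and then adds a nonclustered index after creation, something a table variable cannot do:

--Temporary table with a PRIMARY KEY constraint

CREATE TABLE #CUSTOMER_LOOKUP
( [CUSTOMERID] [NCHAR](5) NOT NULL PRIMARY KEY,
[CITY] [NVARCHAR](15) NULL
)

--A nonclustered index can be added after the table already exists

CREATE NONCLUSTERED INDEX IX_CUSTOMER_LOOKUP_CITY
ON #CUSTOMER_LOOKUP (CITY)

DROP TABLE #CUSTOMER_LOOKUP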


Disadvantages of using temporary tables

Some of the disadvantages of using temporary tables are:

-For smaller data sets, temporary tables are outperformed by table variables
-Generally there is more overhead with a temporary table, as you have to create the object, populate it, and also log read/write operations
-When using a temporary table in a stored procedure, it may cause recompilations


Performance Testing between Temporary Tables and Table Variables

Using the Northwind sample database, we are going to create a test table called CUSTOMER_DATA_BAK, which contains repeated data from the CUSTOMER_DATA table. This test table has 2 million rows, so we can compare a temporary table with a table variable on a large data set. While we run these SQL queries, we will run a trace in SQL Profiler to capture different parameters and statistics.  We will repeat the test for these scenarios:

1. Temporary table without an index
2. Temporary table with an index
3. Table Variable


Here we have included a Transact-SQL script for testing the first case. It does the following things:

-Clear the data and procedure cache to get a clean baseline
-Create the temporary table
-Populate the temporary table
-Run a select query from the temporary table with formatted customer mailing list


USE Northwind

--TEMP TABLE WITH NO INDEX
--The first command empties the data cache and the second one empties the procedure cache


DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

--This displays disk activity and time of TSQL execution


SET STATISTICS IO ON
SET STATISTICS TIME ON

--Create Table #CUSTOMER_DATA


CREATE TABLE #CUSTOMER_DATA
( [CONTACTNAME] [NVARCHAR](30) NULL,
[ADDRESS] [NVARCHAR](60) NULL,
[CITY] [NVARCHAR](15) NULL,
[COUNTRY] [NVARCHAR](15) NULL
)

--Populate Temp Table #CUSTOMER_DATA


INSERT INTO #CUSTOMER_DATA
SELECT
CONTACTNAME, ADDRESS, CITY, COUNTRY
FROM DBO.CUSTOMER_DATA_BAK
WHERE CITY IN ('SAN FRANCISCO', 'SEATTLE', 'ELGIN', 'LONDON')

--Select formatted data from Temp Table #CUSTOMER_DATA


SELECT
CONTACTNAME + CHAR(13) + CHAR(10) +
ADDRESS + CHAR(13) + CHAR(10) +
CITY + ' ' + COUNTRY + CHAR(13) + CHAR(10) +
CHAR(13) + CHAR(10) AS PRINT_ADDRESS
FROM #CUSTOMER_DATA
WHERE CITY ='LONDON'

--Explicitly drop the Temp Table #CUSTOMER_DATA


DROP TABLE #CUSTOMER_DATA
 
/* SQL PROFILER RESULTS

CPU= 3274
READS= 72194
WRITES= 2297
DURATION= 162519

*/



You can download this sql script on temp table from this location:
When we look at the trace output from SQL Profiler for the temp table, the numbers are as follows:

CPU=3274
READS=72194
WRITES=2297
DURATION=162519

Here is a screen shot from the SQL Server Profiler trace:



http://www.sqlserver2008tutorial.com/blog/temporary-tables/temp-table-profiler.jpg


Performance Testing on Temp Table with an Index


Next we are going to test the same process; however, this time we are going to create an index on the temporary table in addition to the existing code. You can download this sql script on temp table with index from this location:

We have included a partial screen shot of this Transact-SQL code when we tested it in Management Studio in SQL Server 2008:


http://www.sqlserver2008tutorial.com/blog/temporary-tables/create-temp-table-index.jpg

When we look at the trace on temp table with an index from the SQL Profiler, here are the numbers:

CPU=6114
READS=757553
WRITES=3035
DURATION=66602

Here is a screen shot from the SQL Server Profiler trace:



http://www.sqlserver2008tutorial.com/blog/temporary-tables/temp-table-index-profiler.jpg

Table variables:

A table variable is simply a data variable of type table. As such, a table variable is created using a DECLARE statement.  It offers an alternative approach to using a temporary table and is typically faster and more efficient for smaller record sets.  A table variable can store results for later processing, similar in concept to using a temporary table. Table variables act like local variables, and their scope ends with the batch that is currently using them.

Here is an example of SQL code for a table variable:

--Declare Table Variable @CUSTOMER_DATA
DECLARE @CUSTOMER_DATA TABLE

( [CONTACTNAME] [NVARCHAR](30) NULL,
[ADDRESS] [NVARCHAR](60) NULL,
[CITY] [NVARCHAR](15) NULL,
[COUNTRY] [NVARCHAR](15) NULL
)



Advantages of a Table variable


Here are some of the advantages of using table variables:

-In general, for smaller record sets, table variables outperform temporary tables
-When you need temporary storage inside a user-defined function, you have to use a table variable, as temporary tables will not work there
-Table variables do not cause recompilations the way temporary tables can
-Table variables require fewer system resources, making them less of a performance hit compared to temp tables
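The second point above can be sketched with a multi-statement table-valued function. Temporary tables are not allowed inside a function, so the return value must be a table variable; the function name here is hypothetical, and it assumes the CUSTOMER_DATA_BAK test table used elsewhere in this article:

--A #temp table would raise an error inside this function;
--the table variable @Result is the only option for temporary storage

CREATE FUNCTION dbo.fn_CustomersByCity (@City NVARCHAR(15))
RETURNS @Result TABLE
( [CONTACTNAME] [NVARCHAR](30) NULL,
[CITY] [NVARCHAR](15) NULL
)
AS
BEGIN
INSERT INTO @Result (CONTACTNAME, CITY)
SELECT CONTACTNAME, CITY
FROM DBO.CUSTOMER_DATA_BAK
WHERE CITY = @City

RETURN
END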

Disadvantages of a table variable



Here are some of disadvantages of using table variables:

-The table definition cannot be changed once the table variable has been declared
-You cannot use SELECT INTO statements with table variables
-The only way to create an index on a table variable is to declare a PRIMARY KEY (or UNIQUE) constraint at the time of the table variable declaration
-New indexes cannot be created after the table variable is declared
-Performance is slower with table variables when working with large data sets
-TRUNCATE TABLE is not allowed on a table variable, and ROLLBACK TRAN does not undo changes made to one
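To illustrate the index limitation above, the only index a table variable gets is the one created implicitly by an inline PRIMARY KEY or UNIQUE constraint; this sketch assumes a hypothetical CUSTOMERID column:

--The PRIMARY KEY constraint creates an index on the table variable;
--no separate CREATE INDEX statement is allowed afterwards

DECLARE @CUSTOMER_KEYS TABLE
( [CUSTOMERID] [NCHAR](5) NOT NULL PRIMARY KEY,
[COUNTRY] [NVARCHAR](15) NULL
)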

Performance Testing on Table Variable

Next we are going to repeat the process, this time with a table variable.  Here is the script that we are going to use:

USE Northwind

--SQL TABLE VARIABLE
--The first command empties the data cache and the second one empties the procedure cache

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

--This displays disk activity and time of TSQL execution

SET STATISTICS IO ON
SET STATISTICS TIME ON

--Declare Table Variable @CUSTOMER_DATA

DECLARE @CUSTOMER_DATA TABLE
( [CONTACTNAME] [NVARCHAR](30) NULL,
[ADDRESS] [NVARCHAR](60) NULL,
[CITY] [NVARCHAR](15) NULL,
[COUNTRY] [NVARCHAR](15) NULL
)

--Insert data into Table Variable @CUSTOMER_DATA

INSERT INTO @CUSTOMER_DATA
SELECT
CONTACTNAME, ADDRESS, CITY, COUNTRY
FROM DBO.CUSTOMER_DATA_BAK
WHERE CITY IN ('SAN FRANCISCO', 'SEATTLE', 'ELGIN', 'LONDON')

--Select data from Table Variable @CUSTOMER_DATA

SELECT
CONTACTNAME + CHAR(13) + CHAR(10) +
ADDRESS + CHAR(13) + CHAR(10) +
CITY + ' ' + COUNTRY + CHAR(13) + CHAR(10) +
CHAR(13) + CHAR(10) AS PRINT_ADDRESS
FROM @CUSTOMER_DATA
WHERE CITY ='LONDON'

/* SQL PROFILER RESULTS

CPU= 2730
READS= 71000
WRITES= 2289
DURATION= 46184

*/


You can also download the table variable sql script from our site.
We have included a screen shot of the SQL Profiler results on table variables:





http://www.sqlserver2008tutorial.com/blog/temporary-tables/table-variable-profiler.jpg

When we look at the trace on the table variable from the SQL Profiler output, here are the numbers:

CPU=2730
READS=71000
WRITES=2289
DURATION=46184

Performance Testing Results between Temp Table and Table Variable





Parameter    Temp Table (No Index)    Temp Table (With Index)    Table Variable
CPU          3274                     6114                       2730
DURATION     162519                   66602                      46184


As you can see, using a table variable was the much better option here, as it used less CPU and took only about a third of the time to process the same data!

For further query optimization tricks and tips, please visit our site on SQL Server Tutorials.

Related Links on the Topic

-Temporary Tables vs. Table Variables
-Should I use a #temp table or a @table variable?





TAGS: Table Variables, Temporary Tables