|
Comments
|
Today's Top SOA Links
Features The Transformation of DTS in SQL Server 2005
Increasing flexibility with a new look and feel
By: Mitch Ruebush
Jul. 6, 2004 12:00 AM
Data Transformation Services (DTS) is a set of graphical tools that you can use to extract, transform, and load (ETL) data from many different sources to a single or multiple destinations. DTS was originally released with SQL Server 7.0 to provide a more manageable interface for doing ETL without resorting to batch file scripting. This gives the administrator or developer the ability to easily create complicated workflows for loading and processing data, transforming data for data mining, tracking lineage, and versioning the workflows with less code. DTS was incrementally improved for SQL Server 2000 by increasing logging abilities, save options, and the number of tasks that you could use to create a workflow. These improvements were great, but there were many problems with manageability, reliability, and extensibility that also required attention. Under SQL Server 2005, DTS underwent a radical transformation to make it into a much more flexible ETL engine for updating SQL Server content - the most radical change being that the new DTS designer uses the Visual Studio .NET shell. New Look and Feel for the DTS Designer The new DTS designer can be hosted in the SQL Workbench, which is the new administrator tool, or in the Business Intelligence Workbench, which is the tool aimed at developers. Both of these tools are based on the Visual Studio .NET environment and should be familiar to VS.NET developers. While both of them enable you to design, execute, and schedule a DTS package, the BI Workbench supports the ability to work in SQL Server. The BI Workbench allows you to create a deployment package for the DTS package, which is one of the new features. It also lets you work with the package in an offline mode, which is great for companies that have security policies in place that don't allow direct access to the server, or if you need to make some tweaks to a package while traveling. By providing a properties window like the one in Visual Studio .NET, the new DTS designer allows for easier property editing. A picture is worth a thousand words, so the best way to understand the updated designer is to use it in creating a package that extracts data from the northwind database, applies a transform that builds an employee code from the first five letters of the LastName field, converts the LastName field to uppercase, and writes it out to an SQL table. You can start the DTS designer by launching either SQL Workbench or BI Workbench. Because we are developers, we will be using the BI Workbench. Start the BI Workbench by navigating to Start > Programs > Microsoft SQL Server > Business Intelligence Workbench. Once the BI Workbench has started, you can create a new project by choosing File > New > Project from the main menu to launch the Project Types dialog box. Click on the Business Intelligence Projects folder in the left-hand pane of the dialog box to show the Data Transformation templates. Choose the Data Transformation Project icon and type DTSTest for the name of the package as shown in Figure 1. You can also choose the location in which to store the package in the Location text box. Click the OK button and you will have a solution that can hold the various projects you develop to support the ETL process. The solution creates a project for you that we will use to host our DTS package. A project is a collection of DTS packages, data sources, and other files that you generate during package development. You can create a new package by right-clicking on the project folder DTSTest and selecting Add > New Item. Select the New DTS Package from the New Item dialog box. Enter DTSTestPackage.dtsx for the name of the package and then click the Add button. Now we need to specify the source of the data by right-clicking the Data Sources folder and selecting New Data Source to launch the Data Source Wizard. The data source can be shared among many packages within the same solution. Click Next to move the "Select how to define the connection" page and click on the New Connection... button to launch the Connection Manager dialog box. Enter localhost in the "Select or enter a server name" drop-down list. In the "Enter information to logon" section on to the server page, select "Use Windows NT Integrated security," and in the "Select the database on the server" drop-down list, select "localhost. northwind database." Enter NorthwindDS as the name and click the Finish button to complete the configuration. Now that you have a data source, you can add connections to the data sources or point to the data source directly. Now we need to create a connection for the package by clicking on the Connections tab at the bottom, selecting New OLE DB Connection, and choosing the data source we created above. Switch to the Control Flow editor by clicking the Control Flow tab. The DTS designer is broken into four editors: Control Flow editor, Data Flow editor, Event Handlers editor, and Tree View editor. This is different from previous versions where all of the editors were rolled into one. Control Flow editor is similar to the DTS 2000 designer. You lay out your components in the editor and drag the success or failure arrows to the next step in the flow of the package. The Control Flow task is where you move data into or out of SQL Server. For example, you can move data to XML, use FTP, or do bulk inserts from this area. You also use this area for looping and branching logic. Most of the controls are familiar to users of DTS 2000, except that there are some new controls that act as containers for additional tasks. They are the For Loop, For Each Loop, Sequence, and the Data Flow Task as shown in Figure 2. Now we will add an Execute SQL Task and a Data Flow Task to the Control Flow design surface. Open up the toolbar and click on the Control Flow Items. Drag the Execute SQL Task and Data Flow Task to the design surface. Click and drag the Execute SQL Task's green arrow to the Data Flow Task to add the OnSuccess precedence constraint to the workflow. This is the path that the server takes if the Execute SQL Task is successful. Note that the exclamation mark indicates a warning that the package won't execute. In this case, we need to create a connection for the Execute SQL Task object. Double-click the Execute SQL Task object to open the Edit SQL Task dialog box. Choose the NorthwindDS from the Connection drop-down list. Click the ellipsis button next to the SQL Statement section. Type the following statement in the SQL editor dialog box: IF EXISTS(Select * FROM sysobjects WHERE id = object_id(N'EmpCodeTemp') and Click OK to accept the SQL statement and then click OK again to close down the Edit SQL Task dialog box. Now we will move onto the Data Flow Task object that we added earlier. Double-clicking on the Data Flow Task will open the Data Flow editor. The Data Flow editor is where you build work flows that move data around SQL Server. This would include the source and destination tables as well as tasks like the copying or merging data, much like the Transformation Properties dialog box in SQL 2000 DTS. You should notice that the toolbar reflects the Data Flow Items, which are the tasks used to transform data for manipulating strings or for use in OLAP or Data Mining applications. Drag the OLE DB Source, Derived Column Transform, and the OLE DB Destination columns to the Data Flow design surface and link up the OnSuccess precedent constraint to each object by dragging the green arrow to the next object as shown in Figure 3. Double-click the OLE DB Source object and set the Connection drop-down list to localhost.NorthwindDS and choose "User results from the following SQL command" and type the following SQL command in the editor to select the source data for the query: SELECT FirstName, LastName FROM Employees Click the Parse Query button to make sure it was typed correctly and then click the OK button to close the Source Properties dialog box. We now need to transform some of the data from the source to produce the information needed for the EmpCode field and to make the LastName field upper case. Double-click the Derived Column Transform object and type the information in two rows of the Derived Column Properties dialog box, as displayed in Table 1. Now double-click on the OLE DB Destination object and verify that the NorthwindDS is selected in the Connection drop-down list and select [dbo].[EmpCodeTemp] in the "Name of table or view" drop-down list. Make sure the field mappings are correct by clicking on the Mappings tab. Click the OK button to close the dialog box and save the package by selecting File > Save All. Now that we have saved the package, we can look at a new feature called Pan & Scan that will allow us to look at the entire project and zoom in on the elements we are interested in quickly. You can click on the cross hairs that are located in the space between the horizontal and vertical scroll bars as shown in Figure 4. You can execute the package to verify that it runs correctly by right-clicking the package and choosing Execute Package. If you have the Data Flow tab still selected, you will notice that the objects change color as the package executes. The gray objects are waiting to execute, the yellow objects are currently executing, the green objects indicate successful executing, and red objects indicate that this step failed. Another improvement is the ability to capture the output or supply input to an Execute Process Task. One thing that I did often in SQL Server 2000 DTS was execute processes, which allowed me to extend the system without writing custom tasks. The problem in the past was that I could only analyze the return code of the executable, which was often not granular enough for determining the next step to execute. The Execute Process Task in SQL Server 2005 allows us to redirect Standard Input, Standard Output, and Standard Error to a package or container variable as shown in Figure 5. This will allow you to capture more detailed information when you execute a process from DTS. The Event Handlers editor lets you include event processing while the DTS runtime is processing your package. You can add tasks graphically to the design surface for events like OnError or OnPostExecute just like you do when building a package or creating a Data Flow task. The following is a list of the events found in SQL Server 2005 DTS.
Debugging of DTS Packages The logging facility in DTS is much more detailed than in previous versions and can be applied with different settings to each package or task, instead of the same settings to all tasks. DTS logging also takes advantage of logging information for any of the above events. This means that you can log when a variable changes, in addition to failures and warnings. You can also choose to log information about when the event occurred, who was running the package, what message occurred, what data was in the variables, the elapsed time, etc. Once you configure the logging mechanisms, they are saved to an XML file that you can use to load the settings. Often you will want different log settings supported in development, testing, and production and saving them to a file for each environment will support this scenario. All you have to do is right-click on the Control Flow editor and choose Logging... to configure logging as shown in Figure 7. New Wizards in SQL Server 2005 DTS You can use this wizard to create configurations for DTS. A configuration gives you the ability to configure package variables and properties at run time. This is similar to dynamic properties that are available in SQL Server 2000 DTS, but they can be stored in an XML file that is not part of the package. This means that you can change values for properties or variables without having access to the DTS package itself or omit the XML file from some program prior to executing the package. You could use this to change the server names for development, testing, and production without touching the source DTS package. All of the values will be assigned to the package variables and properties before the package runs. You can have more than one configuration assigned to a package. The list of configurations is applied from top to bottom, which means that the last configuration will assign the value to any variable or property that is contained in other configurations. New DTS Installer for Deploying Packages You then rebuild the project by selecting Tools > Build from the main menu. If you navigate to the DeploymentOutputPath directory, you should see the following files.
Create a Custom Task Using Any .NET Language Reliability Features in DTS A checkpoint can be enabled by setting properties on the package and in the Control Flow editor. At the package level, you can set the SaveCheckpoints property to enable checkpoints for the package. You would also need to set the CheckpointFileName to the XML file that you want to persist the package state. You would then make sure that you set task's or Control Flow editor's FailPackageOnFailure or FailParentOnFailure is set to True to persist the state information on failure. You would then set the CheckPointUsage property to tell the package to ignore the checkpoint file or to use it if it exists (IfExists value) to restart the package execution. You can also store data after transformation and use it to recover a package without waiting for the transformation to recur by using Multicast Save Points. Multicast Save Points allow you to write transformed data to two locations at once without experience the performance issues of running two sets of tasks in parallel. You can choose to store the transformed data in a file or an SQL Server table. You can then use the data stored in this location to restart the execution of a DTS package that has failed without waiting for the whole package to execute again. This feature works well with the logical checkpoints. Migrating Packages from SQL Server 2000 DTS Summary References Reader Feedback: Page 1 of 1
Subscribe to the World's Most Powerful Newsletters
Subscribe to Our Rss Feeds & Get Your SYS-CON News Live!
|
SYS-CON Featured Whitepapers
Most Read This Week |
|||||||||||||||||||||||||||