Showing posts with label SQL. Show all posts
Showing posts with label SQL. Show all posts

Saturday, September 15, 2012

Reseed Identity Using Variable

In a sproc that involves creating a copy of a production SQL table for testing, I wanted to reseed the identity of the new table to 1 more than the max of the production key value. 

I figured at first I'd just use the appropriate syntax for DBCC and just use a variable already at hand containing the max production value and add 1 to it, like so:

 DECLARE @MaxID int

 -- Grab the max identity value from production.
 SELECT @MaxID = MAX(ID) FROM dbo.MyProductionTable

 DBCC CHECKIDENT (MyProductionTable_COPY, reseed, @MaxID + 1) 


SQL, however, would have nothing of it:
Incorrect syntax near '+'.

To work around this, I decided to try creating a dynamic SQL string to incorporate the variable arithmetic:

 DECLARE @MaxID int
 DECLARE @SQL nvarchar(100)

 -- Grab the max identity value from production.
 SELECT @MaxID = MAX(ID) FROM dbo.MyProductionTable

 SET @SQL = 'DBCC CHECKIDENT (MyProductionTable_COPY, reseed, ' 
 + CAST(@MaxID + 1 AS nvarchar(10)) 
 + ')'


Now I can create the table and set its starting identity value to 1 greater than in production using a variable despite DBCC's syntax dictating otherwise.





Friday, August 31, 2012

Enforced Dependencies and Schemabinding

In trying to rename a table I encountered the dreaded error about enforced dependencies:
Object 'myTable' cannot be renamed because the object participates in enforced dependencies.

I first used the following script to identify the objects dependent upon the table:
SELECT DISTINCT o.object_id, o.name, o.type_desc, m.definition
FROM sys.sql_dependencies d
JOIN sys.objects o ON o.object_id = d.object_id
JOIN sys.objects r ON r.object_id = d.referenced_major_id
JOIN sys.sql_modules m ON m.object_id = o.object_id
WHERE d.class = 1
AND r.name = 'myTable'

This yielded the names of three objects in my database which happen to use SCHEMABINDING in their definitions.

I didn't want to manually alter and recreate these objects, so instead I decided to create a script which will find the SQL code for each object, remove the WITH SCHEMABINDING syntax, drop the existing objects and then recreate them sans schema binding.

DECLARE
     @TheTable varchar(100),

     @Count int,
     @Max int,

     @object_id int,
     @name varchar(50),
     @type_desc varchar(50),
     @definition nvarchar(max),
 
     @DropSQL nvarchar(max),
     @CreateSQL nvarchar(max)

-- Remove schemabinding from objects tied to table.
DECLARE @Objects TABLE
(
     ID int IDENTITY (1, 1) PRIMARY KEY NOT NULL,
     object_id int,
     name varchar(50),
     type_desc varchar(50),
     definition nvarchar(max)
)

-- Set the table name here:
SET @TheTable = 'MyTable'

INSERT INTO @Objects (object_id, name, type_desc, definition)
SELECT DISTINCT o.object_id, o.name, o.type_desc, m.definition
FROM sys.sql_dependencies d
JOIN sys.objects o ON o.object_id = d.object_id
JOIN sys.objects r ON r.object_id = d.referenced_major_id
JOIN sys.sql_modules m ON m.object_id = o.object_id
WHERE d.class = 1
AND r.name = @TheTable

UPDATE @Objects
SET definition = 
     CASE
          WHEN definition LIKE '%SCHEMABINDING%' THEN REPLACE(definition, 'WITH SCHEMABINDING', '')
          ELSE definition
     END


SELECT @Count = 1, @Max = COUNT(ID) FROM @Objects

WHILE @Count <= @Max
BEGIN

     SELECT @object_id = object_id, @name = name, @type_desc = type_desc, @definition = definition
     FROM @Objects
     WHERE ID = @Count

     SET @DropSQL =   'DROP ' 
     +
     CASE @type_desc 
          WHEN 'VIEW' THEN 'VIEW'
          WHEN 'SQL_SCALAR_FUNCTION' THEN 'FUNCTION'
          WHEN 'SQL_TABLE_VALUED_FUNCTION' THEN 'FUNCTION'
          WHEN 'SQL_STORED_PROCEDURE' THEN 'PROCEDURE'
          WHEN 'SQL_TRIGGER' THEN 'TRIGGER'
     END
     +
     ' '
     + 
     @name

     PRINT @DropSQL
     EXEC sp_executeSQL @DropSQL

     SET @CreateSQL = @definition

     PRINT @CreateSQL
     EXEC sp_executeSQL @CreateSQL

     SET @Count = @Count + 1

END


Now I can incorporate this logic into a sproc and not have to worry about meddlesome constraints if I'm going to be renaming tables. The above script has been tested successfully under SQL 2008 and 2012, and will be able to automatically drop and recreate views, scalar and table valued functions, sprocs and triggers.






 

Wednesday, April 11, 2012

Pick One Random Number from a Set of Numbers

A while back I posted a means to generate random numbers in SQL. I ran across a situation where I need to pick one number randomly from a set of numbers.

I found a decent solution here, but I didn't want to create a separate view and scalar function for this task.

Instead I create a table variable with a single integer column as the primary key. Next, I insert each of the values of the small set of numbers into the table variable. Finally, I perform a SELECT against the table variable with NEWID() specified in the ORDER BY clause, so that a randomly-chosen member of the set will be returned. 

        DECLARE @Code int

        DECLARE @Codes TABLE
        (
            Code int PRIMARY KEY
        )
       
        INSERT INTO @Codes (Code) VALUES (3)  
        INSERT INTO @Codes (Code) VALUES (6)
        INSERT INTO @Codes (Code) VALUES (9)
        INSERT INTO @Codes (Code) VALUES (12)

        IF @Code IS NULL
            SELECT TOP 1 @Code = Code FROM @Codes ORDER BY NEWID()

  
In my case this is to provide a stored procedure, which may or may not have a value provided for the @Code input parameter, be able to automatically pick a random value in case @Code is NULL. The table variable isn't necessary if the value being picked is among the primary key values of an existing table; a random selection can be obtained simply by doing a SELECT with NEWID().








Monday, September 12, 2011

Google-like Filtering with RadGrid and Separate Data Class


I needed to create a Google Suggest style search interface for an ASP.NET page.

Our site recently obtained licenses to the Telerik control suite, and the powerful RadGrid control looked like it was just what I needed. I found this example on the Telerik site.

Key to this setup is having a textbox for the user to input filter criteria, which is then from the user's perspective transformed into a RadComboBox object to house matching results; then once the user selects an item from the dropdown, the RadGrid is bound and displays any matching results.

I encountered a problem because our site uses a centralized class library for CRUD outside of this particular page’s code class, meaning I could not access my application’s data context from within the GridBoundColumn class definition. The Telerik example (specifically in the MyCustomFilteringColumnCS.cs file) performs its own queries as needed using a SqlDataAdapter with a ConnectionString obtained from the application configuration. A simple SELECT statement is executed which returns matching results.

When I tried to reference my centralized data class from within the GridBoundColumn class definition, I got the following error:

Cannot access non-static property … in static context.


Below is the problem code, revealing that the property CurrentUser is inaccessible to the class.

       protected void list_ItemsRequested(object o, RadComboBoxItemsRequestedEventArgs e)
     {
        // Cannot access non-static property 'CurrentUser' in static context.
        using (MyCRUD mc = new MyCRUD(CurrentUser) )
     
  {
              ((RadComboBox)o).DataTextField = DataField;
              ((RadComboBox)o).DataValueField = DataField;
              ((RadComboBox)o).DataSource = mc.GetMatchingAddresses(e.Text);
              ((RadComboBox)o).DataBind();
        }
     }


My GridBoundColumn class does not exist until my application instantiates it with its parent RadGrid object, so I cannot directly assign a property to it. However, I stumbled upon this post which made me realize I could, in the GridBoundColumn class definition, make several changes.

  1. Define a constructor for the class which takes an existing instance of the MyCRUD class as input.
  1. Create a public property in the class definition which can be assigned the MyCRUD object.
  1. Create a private field in the class definition to contain the instance of the MyCRUD object to be utilized by the GridBoundColumn class.
 
Below is the modified class, with additions (*) indicated below.

       public class rgcFilterColumn : GridBoundColumn
      {   
            // * I added a constructor with an input parameter of the type 
            // * corresponding to my app’s CRUD object.       
            public rgcFilterColumn(MyCRUD mycrud)
            {
                TheDataContext = mycrud;
            }

            // * This field provides an instance of the rgcFilterColumn class 
            // * with the corresponding value set for the data context object.
            private readonly MyCRUD TheDataContext;

            // * This property enables the process which instantiates this 
            // * class to assign the MyCRUD object to TheDataContext.
            public MyCRUD thedatacontext
            {
                get { return TheDataContext; }
            }


     // RadGrid will call this method when it initializes
     // the controls inside the filtering item cells
            protected override void SetupFilterControls(TableCell cell)
            {
                base.SetupFilterControls(cell);
                cell.Controls.RemoveAt(0);
                RadComboBox combo = new RadComboBox
                {
                    ID = ("RadComboBox1" + UniqueName),
                    ShowToggleImage = false,
                    Skin = "Office2007",
                    EnableLoadOnDemand = true,
                    AutoPostBack = true,
                    MarkFirstMatch = true,
                    Height = Unit.Pixel(100)
                };
                combo.ItemsRequested += list_ItemsRequested;
                combo.SelectedIndexChanged += list_SelectedIndexChanged;
                cell.Controls.AddAt(0, combo);
                cell.Controls.RemoveAt(1);
            }

   // RadGrid will call this method when the value should
   // be set to the filtering input control(s)
           protected override void SetCurrentFilterValueToControl(TableCell cell)
           {
                base.SetCurrentFilterValueToControl(cell);
                RadComboBox combo = (RadComboBox)cell.Controls[0];
                if ((CurrentFilterValue != string.Empty))
                {
      combo.Text = CurrentFilterValue;
        }
     }

   //
RadGrid will call this method when the filtering value
   // should be extracted from the filtering input control(s)
           protected override string GetCurrentFilterValueFromControl(TableCell cell)
           {
                RadComboBox combo = (RadComboBox)cell.Controls[0];
                return combo.Text;
           }


           protected void list_ItemsRequested(object o, RadComboBoxItemsRequestedEventArgs e)
           {
                // * Below we use the value of field TheDataContext to execute 
                // * a method accesible via the MyCRUD data context for the application.
                using (MyCRUD mc = TheDataContext)         
                {
                     ((RadComboBox)o).DataTextField = DataField;
                     ((RadComboBox)o).DataValueField = DataField;
                     ((RadComboBox)o).DataSource = mc.GetMatchingAddresses(e.Text);
                     ((RadComboBox)o).DataBind();
                }
           }

           private void list_SelectedIndexChanged(object o, RadComboBoxSelectedIndexChangedEventArgs e)
           {
                GridFilteringItem filterItem = (GridFilteringItem)((RadComboBox)o).NamingContainer;
                if ((UniqueName == "Index"))
                {
                    // this is filtering for integer column type
                    filterItem.FireCommandEvent("Filter", new Pair("EqualTo", UniqueName));
                }
                // filtering for string column type
                filterItem.FireCommandEvent("Filter", new Pair("Contains", UniqueName));
           }

      }



Now an instance of the GridBoundColumn class can happily utilize my application's central CRUD class for all its data retrieval operations.


Thursday, February 3, 2011

Random Numbers in SQL

I needed a SQL query to insert a range of values randomly into each record of a table variable.

The SELECT statement would use the random number in conjunction with a CASE statement to pick from two different values to insert into a field, in effect tagging records randomly with one value or the other. 

My first attempt involved trying to grab the one's digit of the second of today's date via the getdate() function, and based on whether this value is even or odd, pick one or another of the values to assign to a field for the record I insert.
INSERT INTO @MyTableVariable (MyField)
SELECT

    CASE
        WHEN RIGHT(DATEPART(s, getdate()), 1) IN (0, 2, 4, 6, 8) THEN 32
        WHEN RIGHT(DATEPART(s, getdate()), 1) IN (1, 3, 5, 7, 9) THEN 39
    END AS MyField

FROM dbo.MyTable t

The result wasn't quite what I expected. Instead of one of each inserted value appearing randomly in the resulting table variable for MyField, the same value appeared each and every time.

As it turns out, I didn't realize that in the CASE statement, the seconds portion of the current time, the "seed" if you will, only got set once at the beginning of the SELECT. There is nothing in the statement to prompt the function to update the current time with every iteration, every record examined from MyTable, it only performs a one-off examination of the seconds and goes with that for the rest of the operation.


My solution involved using instead the one's digit of the integer primary key of the source table, like so.
INSERT INTO @MyTableVariable (MyField)
SELECT

    CASE                  
        WHEN RIGHT(t.ID, 1) IN (0, 2, 4, 6, 8) THEN 32        -- Even
        WHEN RIGHT(t.ID, 1) IN (1, 3, 5, 7, 9) THEN 39        -- Odd            
    END AS MyField

FROM dbo.MyTable t

In this case, the subset of data that I'm querying from MyTable is just that, a subset, which means that the primary key (ID) won't be in any particular order; there are plenty of opportunities for the one's digit to be odd or even. This provides a reasonably random sampling of records in the resulting table variable.





Thursday, December 2, 2010

String Concatenation and Implicit Conversions

I needed to modify a SQL function to strip some leading zeros in order to concatenate a string. Part of this process involved dealing with pluralization of the number of cents (I'm picky that way).


SELECT
 CASE
  WHEN CAST(CAST(CONVERT(decimal(18,6), 0.30) * 100 AS float) AS varchar(10)) = 1 THEN ''
  WHEN CAST(CAST(CONVERT(decimal(18,6), 0.30) * 100 AS float) AS varchar(10)) > 1 THEN 's' 
  ELSE ''
 END
 

The idea is that if the value (0.30 representing cents) translates into a number greater than 1, the resulting concatenated sentence will read something like "The discount is 30 cents." If the value equals 1, however, it would read "The discount is 1 cent."

When the above is executed, I get an error:

 Msg 245, Level 16, State 1, Line 1
 Conversion failed when converting the varchar value '0.03' to data type int.



At first this baffled me. Data type int?? I wasn't trying to obtain an integer value, I was ultimately trying to obtain a varchar for this string I'm building.

Then it hit me, SQL is performing an implicit conversion when it is comparing the value resulting from the conversion first to decimal, then float, and finally to varchar, with 1.

Lesson learned, when attempting to compare two varchar with numeric values, make sure that if the value being tested is decimal, that the static value being used to compare it against is also decimal.

I made the following changes and now it works flawlessly.


SELECT
 CASE
  WHEN CAST(CAST(CONVERT(decimal(18,6), 0.30) * 100 AS float) AS varchar(10)) = 1.0 THEN ''
  WHEN CAST(CAST(CONVERT(decimal(18,6), 0.30) * 100 AS float) AS varchar(10)) > 1.0 THEN 's' 
  ELSE ''
 END
 





Friday, October 15, 2010

Hammering A Stored Procedures vs Dynamic SQL Debate

While looking for differing viewpoints on stored procedures vs dynamic SQL, I discovered an article by Frans Bouma from a few years ago, wherein he passionately criticizes elements of a post by Rob Howard.

At the time of Bouma’s post I’d had some experience with Microsoft SQL 2000, but I was at a stage where I was doing the SQL equivalent of meatball surgery on databases rather than design. My tasks mainly involved adapting existing code to suit new or revised business logic. As a result, I can relate to aspects of where each person is coming from.

Bouma and Howard both make strong cases for their arguments. Howard, clearly in favor of stored procedures, maintains that a balanced approach should be taken to designing the data access layer to optimize data interchange, and cites stored procedures as an effective tool to abstract the data and improve efficiency and performance. Bouma, a dynamic SQL convert, strongly advocates that paradigm, favoring SQL code emitted by high-level objects to creating discrete stored procedures for specific data management tasks on the fly. He also cites some deep technical details specific to Microsoft SQL which he uses to question some of Howard’s suggested implementations of stored procedures.

At face value, each person argues strongly for their viewpoint, but whether stored procedures or dynamic SQL are in and of themselves superior depends on the design and implementation of whatever system will be hitting the database. As I read each person’s arguments I got the distinct impression that each is somewhat mired in the advantages of their preferred paradigm.

The gist of this argument from years ago can be summed up in each author’s words. Howard states, “Embedded SQL script is very brittle; small changes to the database can have severe impacts on the application.” Bouma says in response to one of the comments in his blog post, “True, it requires some work to get a dynamic query generator right, but once you have one, you don't have to recode it again.”

Both these statements ring true… under a given set of circumstances. If I need code to grab values from a static lookup table, which do I use? Probably either, since unless something major changes in the design, I won’t need to touch the query frequently to modify it. However, what if I need to do something more exotic, like process a payment or mine some data based on input variables that might differ from one query to the next? Again, either methodology could be used for these purposes. It depends I think on how effectively the tools are built to deal with the data, and whether they meet the needs of the design.

Quite frankly, it seemed to me that the authors were each getting too wrapped up in their respective favorites, and arguably from two extremes of the IT spectrum. Howard’s argument seemed somewhat less technical and more geared towards an executive’s ear, with buzzwords about efficiency and performance, whereas Bouma’s deep technical knowledge describes a quintessent engineer, quite eager to implement some project with the tools he is comfortable and happy with.

A hammer is great for driving nails into wood, and for yanking nails out of wood with its secondary nail-removing end. It’s streamlined and clearly purposed for its intended use. Would it be appropriate to use a hammer to spackle a hole in the wall, or break pieces of tile into neat square shapes?

Sure, you might manage to adapt the hammer to the task, using some duct tape to secure a credit card or other stiff piece of whatever to more effectively spackle, or try to direct the blows of the hammer to snap the tile into shape without shattering it.

Would it be efficient? Absolutely not.


Friday, July 16, 2010

Optimizing SQL Views Profit++, Fun--

While working feverishly towards meeting a looming deadline on some SQL code, a problem arose. A particular UI that relies on a particular stored proc to perform a particular task was bombing. 

I brought up SQL Profiler and retraced the user's steps, and found the culprit, a convoluted view I was joining to as a step within the update seemed to be taking way too long to retrieve data.

The view started out relatively simple, a SELECT over several busy tables joined together along with a few scalar function calls and a very few subqueries nestled within some CASE statements. It evolved, thanks to lack of time and impatience and poor choices on my part, into a bit of a behemoth; it took roughly 1:30 to return the full result set. This didn't seem like a big deal at the time, however, because I planned to use it to pluck out specific entries, for this purpose it returned results in less than a second or two, which was acceptable.

Using it in a join though was a relatively new, hastily chosen design decision, and apparently it was proving a very poor choice since it was causing the UI to time out. I began by optimizing the view, minimizing the interaction with the busy tables as much as possible, avoiding subqueries unless absolutely necessary, and creating some new scalar functions to replace previously funky logic. No joy, however, as this only shaved around 10 seconds off the runtime for a SELECT, too little, I thought.

So began the quest to create a table-value function instead, full of optimizations like stuffing frequently used values into variables and creating a subset of the main tables in a table variable (impossible from within a view) and minimizing the use of scalar functions and even resorting to applying the WITH (NOLOCK) and eliminating string searches wherever possible...

It turns out, however, that the entire two paragraphs of chronicled woe were completely unnecessary. Here's a SELECT from within the stored proc which joins with the aforementioned view, notice anything odd?
SELECT TOP 1 @Fruit = 
        
  CASE 
      WHEN v.Operation IN ('APPLE', 'BANANA', 'GRAPE') THEN v.Operation
      ELSE NULL
  END

FROM dbo.ViewOfPain v
WHERE v.ID = ID 
ORDER BY v.InputDate DESC
  

Let's see, looks like I'm just trying a simple SELECT from the view with a CASE statement, checking the ID against the ID, yeah, that looks fi-- O SHI

Notice the missing '@' sign before the ID in the where clause??! I did, after spending a few hours writing a table function that I turned out not to need. The problem wasn't with the view, but with this bit of code in the stored proc, it was doing the equivalent of a SELECT on the entire view plus a bit more overhead to assign a value to the @Fruit variable. I ran this statement by itself in query analyzer, and indeed, it took well over 1:30 to execute!

Below is the code as it should've been, which returns almost instantaneously.
SELECT TOP 1 @Fruit = 
        
  CASE 
      WHEN v.Operation IN ('APPLE', 'BANANA', 'GRAPE') THEN v.Operation
      ELSE NULL
  END

FROM dbo.ViewOfPain v
WHERE v.ID = @ID
ORDER BY v.InputDate DESC


I'd spent hours optimizing and then finally creating a new table function to replace a view that turned out not to be the culprit. If I'd spot-checked my syntax in the stored proc, it's likely I could've picked out this omission of the '@' much sooner, but I was so tired and far gone into troubleshooting that I was getting increasingly desperate. 

The moral of my story is, take a break, unhook yourself from the problem, and review it again later with a fresh mind.

Wednesday, February 18, 2009

When MAX Isn't - Getting the Maximum Value of a VARCHAR Field

A user reported that a UI being fed by a particular SQL stored procedure was returning the wrong dollar amount for a customer account on an invoicing system I help maintain.

The particulars of the (somewhat convoluted) balance calculation aren't terribly exciting, but in a nutshell, a part of the process involves grabbing the most recent customer invoice by its InvoiceNumber, and using a value associated with that invoice to perform other calculations which eventually lead to the account balance.

This is a list of the customer's InvoiceNumber values, in descending order as they appear in the Invoices table.
SELECT InvoiceNumber
FROM Invoices
WHERE AccountKey = '99432'
ORDER BY InvoiceNumber DESC

12345678900902
12345678900901
1234567890507
1234567890506
1234567890505

AccountKey represents a foreign key in the Invoices table representing the customer account. Notice that the result set shows the most recent Invoice at the top of the list, and since the primary key for the Invoices table is auto-incrementing, that invoice possesses the largest key value.

Some years back, a data fix was made to this table whereby the customer account number (in this case, 1234567890) was changed to add an additional digit. Thus, what was once 1234567890 became 12345678900.

Time went by. In the last couple of months, a different data fix was made to a piece of code responsible for calculating account balances, specifically late fee calculations. This turned out to be the crux of the matter, as you'll see here.

The code which calculates the account balance, as I mentioned, needs to obtain the customer's most recent InvoiceNumber. The code used to grab this had been modified to add this section of code as follows:
SELECT MAX(InvoiceNumber)
FROM Invoices
WHERE AccountKey = '99432'

1234567890507

Take a look at the position of this InvoiceNumber in the actual list:
12345678900902
12345678900901
1234567890507
1234567890506
1234567890505

Rhut rho! There's the problem. The InvoiceNumber field is of the SQL data type varchar, and consequently the MAX function is returning what is alphanumerically the largest value, when in fact numerically the one at the top of the list should be returned instead. This resulted in the wrong InvoiceNumber being grabbed as part of the late fee calculation, and this further led to a certain balance threshold not being exceeded, which then altered the balance from what it needs to be.

The fix is easy, just CAST the InvoiceNumber to a decimal within the MAX function.
SELECT MAX(CAST(InvoiceNumber AS decimal(18,0))
FROM Invoices
WHERE AccountKey = '99432'

12345678900902

Modifying this code allowed the process to properly obtain and work with the most recent InvoiceNumber. You might be asking, why not just avoid the hassle, and grab the MAX of the Invoices table's primary key for the given AccountKey value? That would certainly make more sense, and would be more efficient, after all, the primary key field for the Invoices table is an integer, not a varchar(25) like the InvoiceNumber.

However, the code which does the calculations is outside my normal scope, so rather than modify someone else's handiwork to the extent I'd prefer, I'm opting for a less invasive change which will compensate for whatever necessitates getting the InvoiceNumber directly. In resolving this issue, I've left detailed commentary in the work order on our internal helpdesk system, so that whomever stumbles across this code in future will have a point of reference.

Friday, January 2, 2009

Extend Existing XML Parsing to Support Multiple Child Tags

I work on a database which is currently a mix of Microsoft SQL 2000 and 2005 T-SQL code. As part of a project I’m working on, I'm working with a stored procedure which receives a reference to an XML document in the form of an integer variable @idoc, and this XML document’s structure needs to be extended with some additional elements.

I'd prefer to refactor the plumbing which generates the XML in the remote stored procedure so that I can take advantage of the new XML data type in 2005, which would allow me to simply pass the XML as a parameter to my stored proc, but unfortunately time is of the essence, so I need to develop an alternative which doesn't involve refactoring to that extent.

The current XML resembles the following:

<XML>
<AccountInfo
Number="123456789"
Status="Active"
Region="SE" />
</XML>


The new XML will look like this:

<XML>
<AccountInfo
Number="123456789"
Status="Active"
Region="SE" />
<State>
<StateInfo
Name="Florida"
Population="9488294"
AreaSquareMiles="58560"
LandAreaSquareMiles="54252"
WaterAreaSquareMiles="4308" />
<County
Name="Alachua"
Population="277120" />
<County
Name="Union"
Population="14842" />
</State>
</XML>


The stored procedure which I need to modify for my project receives the pointer to the XML document, @idoc, and after extracting values from it makes various other calls to other stored procs.

A new State element is being added, with a child StateInfo element that describes the state in question, and any number of child County elements describing counties in that state. In other words, this XML describes an account, the state it operates in, and however many counties the state contains which are relevant to a given account.

My major challenge with this is in the form of the new County tags. Whereas previously there would only ever be a single instance of any particular tag in this XML, there could potentially be dozens of County tags associated with a given State for an account! For each County I need to be able to iterate through the collection and provide these details somehow to my stored procedure for further processing.

My work is pretty well cut out for me. I need to find a way to:
  • Rebuild the XML document using the @idoc as a reference.
  • Grab the new XML specific to each County tag and serialize the values.
To reproduce the XML in the state my stored procedure will be consuming it, I first needed to generate my own @idoc using sp_xml_preparedocument like this:

DECLARE @TestXML varchar(4000)
DECLARE @idoc int

SET @TestXML = '<XML>
<AccountInfo
Number="123456789"
Status="Active"
Region="SE" />
<State>
<StateInfo
Name="Florida"
Population="9488294"
AreaSquareMiles="58560"
LandAreaSquareMiles="54252"
WaterAreaSquareMiles="4308" />
<County
Name="Alachua"
Population="277120" />
<County
Name="Union"
Population="14842" />
</State>
</XML>'

EXEC sp_xml_preparedocument @idoc OUTPUT, @TestXML


At this point, @idoc is a reference to an accurate representation of the XML structure my stored proc will receive and process.

As I mentioned, the code outside of the scope of my stored proc is SQL 2000 based, but that doesn’t mean I can’t use some of the great new capabilities in 2005 in my code. Thus, here I use the OPENXML command to crack open the XML, in conjunction with the FOR XML PATH directive to build the XML structure.

-- First build the XML to give @idoc a reason to live.
EXEC sp_xml_preparedocument @idoc OUTPUT, @TestXML

SELECT
(
SELECT Number, Status, Region
FROM OPENXML(@idoc, '/XML/AccountInfo', 1) WITH (
Number int,
Status varchar(100),
Region varchar(2)
) AS StateInfo FOR XML PATH('AccountInfo'), Type
),
(
SELECT
(
SELECT Name,
Population,
AreaSquareMiles,
LandAreaSquareMiles,
WaterAreaSquareMiles
FROM OPENXML(@idoc, '/XML/StateInfo', 1) WITH (
Name varchar(100),
Population int,
AreaSquareMiles int,
LandAreaSquareMiles int,
WaterAreaSquareMiles int
) AS StateInfo FOR XML PATH('StateInfo'), Type
),

(
SELECT Name,
Population
FROM OPENXML(@idoc, '/XML/County', 1) WITH (
Name varchar(100),
Population int
) AS County FOR XML PATH('County'), Type
)
FOR XML PATH('State'), Type
)
FOR XML PATH('XML')

-- Eliminate the document.
EXEC sp_xml_removedocument @idoc


Here is the output from execution of the above code. I’m done… right??

<XML>
<AccountInfo>
<Number>123456789</Number>
<Status>Active</Status>
<Region>SE</Region>
</AccountInfo>
<State>
<StateInfo>
<Name>Florida</Name>
<Population>9488294</Population>
<AreaSquareMiles>58560</AreaSquareMiles>
<LandAreaSquareMiles>54252</LandAreaSquareMiles>
<WaterAreaSquareMiles>4308</WaterAreaSquareMiles>
</StateInfo>
<County>
<Name>Alachua</Name>
<Population>277120</Population>
</County>
<County>
<Name>Union</Name>
<Population>14842</Population>
</County>
</State>
</XML>


Hmmm, not quite. Notice that there are separate tags beneath AccountInfo, StateInfo and County. Whereas my prototype XML has the values under these tags in the form of XML attributes, they appear here as tags. Not good!

I found an article by Jerry Dixon which led me to the answer. I just needed to take each item in my SELECT statements and use AS ‘@<attributeName> to tell SQL to output the item as an attribute of the node rather than a separate element.

EXEC sp_xml_preparedocument @idoc OUTPUT, @TestXML

SELECT
(
SELECT Number AS '@Number',
Status AS '@Status',
Region AS '@Region'
FROM OPENXML(@idoc, '/XML/AccountInfo', 1) WITH (
Number int,
Status varchar(100),
Region varchar(2)
) AS StateInfo FOR XML PATH('AccountInfo'), Type
),

(
SELECT Name AS '@Name',
Population AS '@Population',
AreaSquareMiles AS '@AreaSquareMiles',
LandAreaSquareMiles AS '@LandAreaSquareMiles',
WaterAreaSquareMiles AS '@WaterAreaSquareMiles'
FROM OPENXML(@idoc, '/XML/State/StateInfo', 1) WITH (
Name varchar(100),
Population int,
AreaSquareMiles int,
LandAreaSquareMiles int,
WaterAreaSquareMiles int
) AS StateInfo FOR XML PATH('StateInfo'), Type
),

(
SELECT Name AS '@Name',
Population AS '@Population'
FROM OPENXML(@idoc, '/XML/State/County', 1) WITH (
Name varchar(100),
Population int
) AS County FOR XML PATH('County'), Type
)

FOR XML PATH('XML')

EXEC sp_xml_removedocument @idoc


Now my rebuilt representation of the original XML document, reconstructed using @idoc as a reference, is just the way I need it to be:

<XML>
<AccountInfo Number="123456789" Status="Active" Region="SE" />
<State>
<StateInfo
Name="Florida"
Population="9488294"
AreaSquareMiles="58560"
LandAreaSquareMiles="54252"
WaterAreaSquareMiles="4308" />
<County Name="Alachua" Population="277120" />
<County Name="Union" Population="14842" />
</State>
</XML>


The next requirement, how to serialize only the County values tied to each account? I can use some of the code which reconstructs the County nodes, above, and omit the AS and FOR XML PATH(‘County’), Type syntax so that a result set rather than XML is returned:

   SELECT Name, Population
FROM OPENXML(@idoc, '/XML/State/County', 1) WITH (
Name varchar(100),
Population int
)


Here’s the result set output by the SELECT:


   Name          Population

Alachua 277120
Union 14842


I defined a table variable to hold the County values, along with its own autoincrementing ID (just because it’s handy to have a unique identifier for other stuff happening in my stored proc).


DECLARE @County TABLE
(
ID int IDENTITY (1, 1) PRIMARY KEY NOT NULL,
Name varchar(100),
Population int
)


Finally, I populate the table variable with the County data I extract from the XML:


INSERT INTO @County (Name, Population)
SELECT Name, Population
FROM OPENXML(@idoc, '/XML/State/County', 1) WITH (
Name varchar(100),
Population int
)


In summary
, I’ve created code which extends the stored procedure so that it can handle multiple instances of a tag, whereas previously it was built to expect only a single instance of each tag. This enables me to properly recombobulate, if you will, the original XML passed by reference from a remote stored proc, then serialize all instances of the new tag and mess with them elsewhere in my code.


___

Special thanks to Greg Houston and his article Format My Source Code For Blogging which helped me compensate for difficulties I had with formatting my code.