Friday, January 2, 2009

Extend Existing XML Parsing to Support Multiple Child Tags

I work on a database which is currently a mix of Microsoft SQL 2000 and 2005 T-SQL code. As part of a project I’m working on, I'm working with a stored procedure which receives a reference to an XML document in the form of an integer variable @idoc, and this XML document’s structure needs to be extended with some additional elements.

I'd prefer to refactor the plumbing which generates the XML in the remote stored procedure so that I can take advantage of the new XML data type in 2005, which would allow me to simply pass the XML as a parameter to my stored proc, but unfortunately time is of the essence, so I need to develop an alternative which doesn't involve refactoring to that extent.

The current XML resembles the following:

<XML>
<AccountInfo
Number="123456789"
Status="Active"
Region="SE" />
</XML>


The new XML will look like this:

<XML>
<AccountInfo
Number="123456789"
Status="Active"
Region="SE" />
<State>
<StateInfo
Name="Florida"
Population="9488294"
AreaSquareMiles="58560"
LandAreaSquareMiles="54252"
WaterAreaSquareMiles="4308" />
<County
Name="Alachua"
Population="277120" />
<County
Name="Union"
Population="14842" />
</State>
</XML>


The stored procedure which I need to modify for my project receives the pointer to the XML document, @idoc, and after extracting values from it makes various other calls to other stored procs.

A new State element is being added, with a child StateInfo element that describes the state in question, and any number of child County elements describing counties in that state. In other words, this XML describes an account, the state it operates in, and however many counties the state contains which are relevant to a given account.

My major challenge with this is in the form of the new County tags. Whereas previously there would only ever be a single instance of any particular tag in this XML, there could potentially be dozens of County tags associated with a given State for an account! For each County I need to be able to iterate through the collection and provide these details somehow to my stored procedure for further processing.

My work is pretty well cut out for me. I need to find a way to:
  • Rebuild the XML document using the @idoc as a reference.
  • Grab the new XML specific to each County tag and serialize the values.
To reproduce the XML in the state my stored procedure will be consuming it, I first needed to generate my own @idoc using sp_xml_preparedocument like this:

DECLARE @TestXML varchar(4000)
DECLARE @idoc int

SET @TestXML = '<XML>
<AccountInfo
Number="123456789"
Status="Active"
Region="SE" />
<State>
<StateInfo
Name="Florida"
Population="9488294"
AreaSquareMiles="58560"
LandAreaSquareMiles="54252"
WaterAreaSquareMiles="4308" />
<County
Name="Alachua"
Population="277120" />
<County
Name="Union"
Population="14842" />
</State>
</XML>'

EXEC sp_xml_preparedocument @idoc OUTPUT, @TestXML


At this point, @idoc is a reference to an accurate representation of the XML structure my stored proc will receive and process.

As I mentioned, the code outside of the scope of my stored proc is SQL 2000 based, but that doesn’t mean I can’t use some of the great new capabilities in 2005 in my code. Thus, here I use the OPENXML command to crack open the XML, in conjunction with the FOR XML PATH directive to build the XML structure.

-- First build the XML to give @idoc a reason to live.
EXEC sp_xml_preparedocument @idoc OUTPUT, @TestXML

SELECT
(
SELECT Number, Status, Region
FROM OPENXML(@idoc, '/XML/AccountInfo', 1) WITH (
Number int,
Status varchar(100),
Region varchar(2)
) AS StateInfo FOR XML PATH('AccountInfo'), Type
),
(
SELECT
(
SELECT Name,
Population,
AreaSquareMiles,
LandAreaSquareMiles,
WaterAreaSquareMiles
FROM OPENXML(@idoc, '/XML/StateInfo', 1) WITH (
Name varchar(100),
Population int,
AreaSquareMiles int,
LandAreaSquareMiles int,
WaterAreaSquareMiles int
) AS StateInfo FOR XML PATH('StateInfo'), Type
),

(
SELECT Name,
Population
FROM OPENXML(@idoc, '/XML/County', 1) WITH (
Name varchar(100),
Population int
) AS County FOR XML PATH('County'), Type
)
FOR XML PATH('State'), Type
)
FOR XML PATH('XML')

-- Eliminate the document.
EXEC sp_xml_removedocument @idoc


Here is the output from execution of the above code. I’m done… right??

<XML>
<AccountInfo>
<Number>123456789</Number>
<Status>Active</Status>
<Region>SE</Region>
</AccountInfo>
<State>
<StateInfo>
<Name>Florida</Name>
<Population>9488294</Population>
<AreaSquareMiles>58560</AreaSquareMiles>
<LandAreaSquareMiles>54252</LandAreaSquareMiles>
<WaterAreaSquareMiles>4308</WaterAreaSquareMiles>
</StateInfo>
<County>
<Name>Alachua</Name>
<Population>277120</Population>
</County>
<County>
<Name>Union</Name>
<Population>14842</Population>
</County>
</State>
</XML>


Hmmm, not quite. Notice that there are separate tags beneath AccountInfo, StateInfo and County. Whereas my prototype XML has the values under these tags in the form of XML attributes, they appear here as tags. Not good!

I found an article by Jerry Dixon which led me to the answer. I just needed to take each item in my SELECT statements and use AS ‘@<attributeName> to tell SQL to output the item as an attribute of the node rather than a separate element.

EXEC sp_xml_preparedocument @idoc OUTPUT, @TestXML

SELECT
(
SELECT Number AS '@Number',
Status AS '@Status',
Region AS '@Region'
FROM OPENXML(@idoc, '/XML/AccountInfo', 1) WITH (
Number int,
Status varchar(100),
Region varchar(2)
) AS StateInfo FOR XML PATH('AccountInfo'), Type
),

(
SELECT Name AS '@Name',
Population AS '@Population',
AreaSquareMiles AS '@AreaSquareMiles',
LandAreaSquareMiles AS '@LandAreaSquareMiles',
WaterAreaSquareMiles AS '@WaterAreaSquareMiles'
FROM OPENXML(@idoc, '/XML/State/StateInfo', 1) WITH (
Name varchar(100),
Population int,
AreaSquareMiles int,
LandAreaSquareMiles int,
WaterAreaSquareMiles int
) AS StateInfo FOR XML PATH('StateInfo'), Type
),

(
SELECT Name AS '@Name',
Population AS '@Population'
FROM OPENXML(@idoc, '/XML/State/County', 1) WITH (
Name varchar(100),
Population int
) AS County FOR XML PATH('County'), Type
)

FOR XML PATH('XML')

EXEC sp_xml_removedocument @idoc


Now my rebuilt representation of the original XML document, reconstructed using @idoc as a reference, is just the way I need it to be:

<XML>
<AccountInfo Number="123456789" Status="Active" Region="SE" />
<State>
<StateInfo
Name="Florida"
Population="9488294"
AreaSquareMiles="58560"
LandAreaSquareMiles="54252"
WaterAreaSquareMiles="4308" />
<County Name="Alachua" Population="277120" />
<County Name="Union" Population="14842" />
</State>
</XML>


The next requirement, how to serialize only the County values tied to each account? I can use some of the code which reconstructs the County nodes, above, and omit the AS and FOR XML PATH(‘County’), Type syntax so that a result set rather than XML is returned:

   SELECT Name, Population
FROM OPENXML(@idoc, '/XML/State/County', 1) WITH (
Name varchar(100),
Population int
)


Here’s the result set output by the SELECT:


   Name          Population

Alachua 277120
Union 14842


I defined a table variable to hold the County values, along with its own autoincrementing ID (just because it’s handy to have a unique identifier for other stuff happening in my stored proc).


DECLARE @County TABLE
(
ID int IDENTITY (1, 1) PRIMARY KEY NOT NULL,
Name varchar(100),
Population int
)


Finally, I populate the table variable with the County data I extract from the XML:


INSERT INTO @County (Name, Population)
SELECT Name, Population
FROM OPENXML(@idoc, '/XML/State/County', 1) WITH (
Name varchar(100),
Population int
)


In summary
, I’ve created code which extends the stored procedure so that it can handle multiple instances of a tag, whereas previously it was built to expect only a single instance of each tag. This enables me to properly recombobulate, if you will, the original XML passed by reference from a remote stored proc, then serialize all instances of the new tag and mess with them elsewhere in my code.


___

Special thanks to Greg Houston and his article Format My Source Code For Blogging which helped me compensate for difficulties I had with formatting my code.