Showing posts with label XML. Show all posts
Showing posts with label XML. Show all posts

Saturday, November 27, 2010

Pretty Formatting for XML or HTML

Sometimes it's helpful to pretty up XML or HTML in order to more easily analyze it. Many tools create machine-readable XML or HTML, which is well-suited to being parsed by computers, but whose formatting can be painful for a human to wade through. 

You might think, why do humans need to be able to work with machine-readable text, anyway? Well, if you happen to be the new guy on the job and get stuck with having to debug some code that spits out some XML or HTML you've never dealt with before for some complex data transformation processing, you can see why it might be necessary, particularly if something changes between the sender and receiver and the XML, as produced, no longer does the job.

In Notepad++, one of my favorite text editors, I can utilize the TextFX plugin to take a given chunk of XML or HTML and clean it up, neatly indenting the nodes and attributes for easier examination. I paste the text into the editor and select TextFX => TextFX HTML Tidy => TiDy clean document - wrap. Finally I select Language => XML or HTML to enable syntax highlighting.

Prior to discovering Notepad++, I had discovered the freeware MoreMotion XML Editor a few years ago when I was having to dissect some XML being output by a tool whose XML output I needed to clean up. Provided the XML was well-formed, I could very quickly pretty it up by clicking XML => Pretty Format, or even more easily with a quick key combo, Shift-Ctrl-P.

BEFORE: Messy, messy XML!


AFTER: Ahhh, clean, legible, structured XML.

Unfortunately, the original download URL for MoreMotion XML Editor appears to be defunct, leading to a parked page rather than the actual file. You can download MoreMotion's more comprehensive XML application suite here, but if you just need relatively simple XML editing provided by the older (and much smaller, around 1.3 MB) freeware tool, this working download link fits the bill. The ZIP file contains the EXE, which can be copied to a convenient folder and run.

A few caveats, however. MoreMotion seems to have a few editing issues with extremely long lines of text; you may notice keypresses on such lines may take a long time to register. Also, whereas Notepad++ will do its best to deal with poorly-formed XML or HTML and log any errors it encounters, MoreMotion will not allow you to proceed until the errors are resolved, and in my experience some of these errors can be difficult to pin down.

Nevertheless, for relatively small chunks of well-formed XML or HTML, MoreMotion XML Editor makes cleaning up XML or HTML just a few steps quicker than Notepad++.


Friday, January 2, 2009

Extend Existing XML Parsing to Support Multiple Child Tags

I work on a database which is currently a mix of Microsoft SQL 2000 and 2005 T-SQL code. As part of a project I’m working on, I'm working with a stored procedure which receives a reference to an XML document in the form of an integer variable @idoc, and this XML document’s structure needs to be extended with some additional elements.

I'd prefer to refactor the plumbing which generates the XML in the remote stored procedure so that I can take advantage of the new XML data type in 2005, which would allow me to simply pass the XML as a parameter to my stored proc, but unfortunately time is of the essence, so I need to develop an alternative which doesn't involve refactoring to that extent.

The current XML resembles the following:

<XML>
<AccountInfo
Number="123456789"
Status="Active"
Region="SE" />
</XML>


The new XML will look like this:

<XML>
<AccountInfo
Number="123456789"
Status="Active"
Region="SE" />
<State>
<StateInfo
Name="Florida"
Population="9488294"
AreaSquareMiles="58560"
LandAreaSquareMiles="54252"
WaterAreaSquareMiles="4308" />
<County
Name="Alachua"
Population="277120" />
<County
Name="Union"
Population="14842" />
</State>
</XML>


The stored procedure which I need to modify for my project receives the pointer to the XML document, @idoc, and after extracting values from it makes various other calls to other stored procs.

A new State element is being added, with a child StateInfo element that describes the state in question, and any number of child County elements describing counties in that state. In other words, this XML describes an account, the state it operates in, and however many counties the state contains which are relevant to a given account.

My major challenge with this is in the form of the new County tags. Whereas previously there would only ever be a single instance of any particular tag in this XML, there could potentially be dozens of County tags associated with a given State for an account! For each County I need to be able to iterate through the collection and provide these details somehow to my stored procedure for further processing.

My work is pretty well cut out for me. I need to find a way to:
  • Rebuild the XML document using the @idoc as a reference.
  • Grab the new XML specific to each County tag and serialize the values.
To reproduce the XML in the state my stored procedure will be consuming it, I first needed to generate my own @idoc using sp_xml_preparedocument like this:

DECLARE @TestXML varchar(4000)
DECLARE @idoc int

SET @TestXML = '<XML>
<AccountInfo
Number="123456789"
Status="Active"
Region="SE" />
<State>
<StateInfo
Name="Florida"
Population="9488294"
AreaSquareMiles="58560"
LandAreaSquareMiles="54252"
WaterAreaSquareMiles="4308" />
<County
Name="Alachua"
Population="277120" />
<County
Name="Union"
Population="14842" />
</State>
</XML>'

EXEC sp_xml_preparedocument @idoc OUTPUT, @TestXML


At this point, @idoc is a reference to an accurate representation of the XML structure my stored proc will receive and process.

As I mentioned, the code outside of the scope of my stored proc is SQL 2000 based, but that doesn’t mean I can’t use some of the great new capabilities in 2005 in my code. Thus, here I use the OPENXML command to crack open the XML, in conjunction with the FOR XML PATH directive to build the XML structure.

-- First build the XML to give @idoc a reason to live.
EXEC sp_xml_preparedocument @idoc OUTPUT, @TestXML

SELECT
(
SELECT Number, Status, Region
FROM OPENXML(@idoc, '/XML/AccountInfo', 1) WITH (
Number int,
Status varchar(100),
Region varchar(2)
) AS StateInfo FOR XML PATH('AccountInfo'), Type
),
(
SELECT
(
SELECT Name,
Population,
AreaSquareMiles,
LandAreaSquareMiles,
WaterAreaSquareMiles
FROM OPENXML(@idoc, '/XML/StateInfo', 1) WITH (
Name varchar(100),
Population int,
AreaSquareMiles int,
LandAreaSquareMiles int,
WaterAreaSquareMiles int
) AS StateInfo FOR XML PATH('StateInfo'), Type
),

(
SELECT Name,
Population
FROM OPENXML(@idoc, '/XML/County', 1) WITH (
Name varchar(100),
Population int
) AS County FOR XML PATH('County'), Type
)
FOR XML PATH('State'), Type
)
FOR XML PATH('XML')

-- Eliminate the document.
EXEC sp_xml_removedocument @idoc


Here is the output from execution of the above code. I’m done… right??

<XML>
<AccountInfo>
<Number>123456789</Number>
<Status>Active</Status>
<Region>SE</Region>
</AccountInfo>
<State>
<StateInfo>
<Name>Florida</Name>
<Population>9488294</Population>
<AreaSquareMiles>58560</AreaSquareMiles>
<LandAreaSquareMiles>54252</LandAreaSquareMiles>
<WaterAreaSquareMiles>4308</WaterAreaSquareMiles>
</StateInfo>
<County>
<Name>Alachua</Name>
<Population>277120</Population>
</County>
<County>
<Name>Union</Name>
<Population>14842</Population>
</County>
</State>
</XML>


Hmmm, not quite. Notice that there are separate tags beneath AccountInfo, StateInfo and County. Whereas my prototype XML has the values under these tags in the form of XML attributes, they appear here as tags. Not good!

I found an article by Jerry Dixon which led me to the answer. I just needed to take each item in my SELECT statements and use AS ‘@<attributeName> to tell SQL to output the item as an attribute of the node rather than a separate element.

EXEC sp_xml_preparedocument @idoc OUTPUT, @TestXML

SELECT
(
SELECT Number AS '@Number',
Status AS '@Status',
Region AS '@Region'
FROM OPENXML(@idoc, '/XML/AccountInfo', 1) WITH (
Number int,
Status varchar(100),
Region varchar(2)
) AS StateInfo FOR XML PATH('AccountInfo'), Type
),

(
SELECT Name AS '@Name',
Population AS '@Population',
AreaSquareMiles AS '@AreaSquareMiles',
LandAreaSquareMiles AS '@LandAreaSquareMiles',
WaterAreaSquareMiles AS '@WaterAreaSquareMiles'
FROM OPENXML(@idoc, '/XML/State/StateInfo', 1) WITH (
Name varchar(100),
Population int,
AreaSquareMiles int,
LandAreaSquareMiles int,
WaterAreaSquareMiles int
) AS StateInfo FOR XML PATH('StateInfo'), Type
),

(
SELECT Name AS '@Name',
Population AS '@Population'
FROM OPENXML(@idoc, '/XML/State/County', 1) WITH (
Name varchar(100),
Population int
) AS County FOR XML PATH('County'), Type
)

FOR XML PATH('XML')

EXEC sp_xml_removedocument @idoc


Now my rebuilt representation of the original XML document, reconstructed using @idoc as a reference, is just the way I need it to be:

<XML>
<AccountInfo Number="123456789" Status="Active" Region="SE" />
<State>
<StateInfo
Name="Florida"
Population="9488294"
AreaSquareMiles="58560"
LandAreaSquareMiles="54252"
WaterAreaSquareMiles="4308" />
<County Name="Alachua" Population="277120" />
<County Name="Union" Population="14842" />
</State>
</XML>


The next requirement, how to serialize only the County values tied to each account? I can use some of the code which reconstructs the County nodes, above, and omit the AS and FOR XML PATH(‘County’), Type syntax so that a result set rather than XML is returned:

   SELECT Name, Population
FROM OPENXML(@idoc, '/XML/State/County', 1) WITH (
Name varchar(100),
Population int
)


Here’s the result set output by the SELECT:


   Name          Population

Alachua 277120
Union 14842


I defined a table variable to hold the County values, along with its own autoincrementing ID (just because it’s handy to have a unique identifier for other stuff happening in my stored proc).


DECLARE @County TABLE
(
ID int IDENTITY (1, 1) PRIMARY KEY NOT NULL,
Name varchar(100),
Population int
)


Finally, I populate the table variable with the County data I extract from the XML:


INSERT INTO @County (Name, Population)
SELECT Name, Population
FROM OPENXML(@idoc, '/XML/State/County', 1) WITH (
Name varchar(100),
Population int
)


In summary
, I’ve created code which extends the stored procedure so that it can handle multiple instances of a tag, whereas previously it was built to expect only a single instance of each tag. This enables me to properly recombobulate, if you will, the original XML passed by reference from a remote stored proc, then serialize all instances of the new tag and mess with them elsewhere in my code.


___

Special thanks to Greg Houston and his article Format My Source Code For Blogging which helped me compensate for difficulties I had with formatting my code.