Saturday, November 27, 2010

Pretty Formatting for XML or HTML

Sometimes it's helpful to pretty up XML or HTML in order to more easily analyze it. Many tools create machine-readable XML or HTML, which is well-suited to being parsed by computers, but whose formatting can be painful for a human to wade through. 

You might ask why humans need to work with machine-readable text at all. Well, suppose you're the new guy on the job, stuck debugging code that spits out unfamiliar XML or HTML for some complex data transformation. Being able to read that output becomes necessary, particularly if something changes between the sender and receiver and the XML, as produced, no longer does the job.

In Notepad++, one of my favorite text editors, I can use the TextFX plugin to take a given chunk of XML or HTML and clean it up, neatly indenting the nodes and attributes for easier examination. I paste the text into the editor and select TextFX => TextFX HTML Tidy => TiDy clean document - wrap. Finally, I select Language => XML or HTML to enable syntax highlighting.

Before I found Notepad++, I had come across the freeware MoreMotion XML Editor a few years ago, when I needed to dissect and clean up the XML output of another tool. Provided the XML was well-formed, I could very quickly pretty it up by clicking XML => Pretty Format, or even more easily with a quick key combo, Shift-Ctrl-P.

BEFORE: Messy, messy XML!


AFTER: Ahhh, clean, legible, structured XML.
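If neither editor is handy, the same before-and-after cleanup can be roughly approximated with a short script. The following is only a sketch using Python's standard xml.dom.minidom module, not what either editor does internally. The pretty_xml helper and the sample order/item/qty markup are made up for illustration, and like MoreMotion, this approach only works on well-formed XML (it will not tidy sloppy HTML).

```python
import xml.dom.minidom


def pretty_xml(raw: str, indent: str = "  ") -> str:
    """Return well-formed XML re-indented with one node per line.

    Hypothetical helper for illustration; raises ExpatError on
    input that is not well-formed.
    """
    dom = xml.dom.minidom.parseString(raw)
    pretty = dom.toprettyxml(indent=indent)
    # If the input already contained line breaks, toprettyxml can
    # emit whitespace-only lines; drop them for a compact result.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


# BEFORE: everything crammed onto one line (made-up sample data).
messy = (
    '<order id="42"><item sku="A1"><qty>2</qty></item>'
    '<item sku="B7"><qty>1</qty></item></order>'
)

# AFTER: nested elements indented two spaces per level.
print(pretty_xml(messy))
```

The indent argument controls the indentation string, so passing "\t" would give tab-indented output instead.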

Unfortunately, the original download URL for MoreMotion XML Editor appears to be defunct, leading to a parked page rather than the actual file. You can download MoreMotion's more comprehensive XML application suite here, but if you just need the relatively simple XML editing provided by the older (and much smaller, around 1.3 MB) freeware tool, this working download link fits the bill. The ZIP file contains the EXE, which can be copied to a convenient folder and run.

A few caveats, however. MoreMotion seems to have a few editing issues with extremely long lines of text; you may notice that keypresses on such lines take a long time to register. Also, whereas Notepad++ will do its best to deal with poorly formed XML or HTML and log any errors it encounters, MoreMotion will not allow you to proceed until the errors are resolved, and in my experience some of these errors can be difficult to pin down.
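When a well-formedness error is hard to pin down, one option is to let a parser report where it gave up. Here is a minimal sketch, again with Python's standard library rather than anything built into MoreMotion or Notepad++. The find_xml_error helper and the deliberately broken sample are hypothetical, and the parser stops at the first problem, so later errors only surface after the earlier ones are fixed.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError, ErrorString


def find_xml_error(raw: str):
    """Return None if raw is well-formed XML.

    Otherwise return (line, column, message) for the first error
    the parser hits. Hypothetical helper for illustration.
    """
    try:
        xml.dom.minidom.parseString(raw)
    except ExpatError as err:
        # lineno counts from 1; offset is the column within that line.
        return err.lineno, err.offset, ErrorString(err.code)
    return None


# Made-up sample: the closing tag on line 4 is misspelled.
broken = "<order>\n  <item>\n    <qty>2</qty>\n  </itm>\n</order>"

problem = find_xml_error(broken)
if problem is None:
    print("Well-formed")
else:
    line, column, message = problem
    print(f"Line {line}, column {column}: {message}")
```

Knowing the line number makes it much easier to jump straight to the offending tag in whichever editor you prefer.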

Nevertheless, for relatively small chunks of well-formed XML or HTML, MoreMotion XML Editor makes the cleanup just a few steps quicker than Notepad++.

