At my last job, a large amount of data arrived from multiple companies every day; it was imported into a database and then processed throughout the company.
I was the lucky guy who got to write the application which imported all of the data from as many as 10 different locations daily.
Having never dealt with XML directly at a programmatic level, I began researching and learning, as I always do when I get to work with a new technology.
After only a few moments of research, I was led to a wonderful page which had what seemed to be an extremely complex way of creating XML in PHP.
With just a few more moments of research, I found a great tutorial from a guy who was using the PHP DOM method of creating XML.
I then used this tutorial to create my own application which took data from our db and created an XML file from it.
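For anyone who hasn't seen the DOM approach, here is a minimal sketch of the idea. It is not the tutorial's code or my original application; the `$rows` array and the `customers`/`customer`/`name` element names are made up for illustration and stand in for whatever your database returns.

```php
<?php
// Hypothetical rows standing in for a database result set.
$rows = [
    ['id' => 1, 'name' => 'Smith & Sons'],
    ['id' => 2, 'name' => 'Acme <West>'],
];

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true; // indent the output so it is readable

$root = $doc->createElement('customers');
$doc->appendChild($root);

foreach ($rows as $row) {
    $customer = $doc->createElement('customer');
    $customer->setAttribute('id', (string) $row['id']);

    $name = $doc->createElement('name');
    // Adding the value as a text node lets the serializer escape
    // characters such as & and < when the document is written out.
    $name->appendChild($doc->createTextNode($row['name']));

    $customer->appendChild($name);
    $root->appendChild($customer);
}

$xml = $doc->saveXML();
echo $xml;
```

The point of going through `createTextNode()` is that the ampersand in "Smith & Sons" comes out as `&amp;` without anyone having to remember to escape it.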
I assumed everyone creating XML used a program to create it.
Holy cow, was I in for a very rude awakening.
For our imports, I did some more research and found that importing XML was much easier than creating it.
PHP 5's SimpleXML capabilities were absolutely thrilling to discover. Simply import the XML file as an object, and then iterate through it like any other object, or use XPath, or any of a slew of other ways to easily access the necessary parts of the data.
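To give a feel for how little code that takes, here is a small sketch. The XML document and its element names are invented for the example; with a real file you would call `simplexml_load_file()` instead of `simplexml_load_string()`.

```php
<?php
// A small, well-formed sample document (hypothetical structure).
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<customers>
  <customer id="1"><name>Smith &amp; Sons</name></customer>
  <customer id="2"><name>Acme West</name></customer>
</customers>
XML;

$customers = simplexml_load_string($xml);

// Iterate over the repeated <customer> elements like any other object.
$names = [];
foreach ($customers->customer as $customer) {
    $names[] = (string) $customer->name; // cast to get the text content
}

// Or go straight to the node you want with XPath.
$matches = $customers->xpath('//customer[@id="2"]/name');
$second = (string) $matches[0];

print_r($names);
echo $second, "\n";
```

Note the casts to `string`: SimpleXML hands back element objects, not plain strings, so casting is the usual way to get at the text.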
So, while building the process to import our customers' XML files, I used the SimpleXML module to load the (supposedly) well-formed XML files from our partners.
About two months after deploying said import script, we noticed that entire chunks of data were intermittently missing. Well... After some quick research, we found out that quite a few of the XML files we were to import weren't well-formed. They had unescaped ampersands throughout, along with a few other "gotchas" that make an XML file impossible to import unless you validate it yourself.
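One way to catch this kind of thing up front is to have libxml collect its parse errors instead of letting the load fail quietly. The sketch below is only an illustration of that idea, not the script we actually ran; `loadXmlOrErrors()` is a helper name I made up.

```php
<?php
/**
 * Hypothetical helper: try to parse an XML string and return
 * [SimpleXMLElement|null, list of error messages].
 */
function loadXmlOrErrors(string $xml): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $parsed = simplexml_load_string($xml);

    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$parsed === false ? null : $parsed, $errors];
}

// An unescaped ampersand makes the document not well-formed.
[$bad, $badErrors] = loadXmlOrErrors('<customer><name>Smith & Sons</name></customer>');

// The same data with the ampersand escaped parses cleanly.
[$good, $goodErrors] = loadXmlOrErrors('<customer><name>Smith &amp; Sons</name></customer>');

var_dump($bad === null, $badErrors, $goodErrors);
```

With something like this in place, a bad file can be logged and sent back to whoever produced it, rather than silently dropping data.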
*sigh*
Evidently, this problem is more widespread than my own little programming bubble.
Hence, this blog entry.
PLEASE, PLEASE, PLEASE!!!
If you are creating XML which will be used by anyone but yourself, PLEASE use a program designed to create XML!
DON'T CREATE IT BY HAND.
Trust me, you'll save yourself, and everyone who has to deal with your XML, a lot of pain and suffering.