Tuesday, November 22, 2005

PHP5 reading XML using DOM

When working with XML files in PHP5 you definitely want to use DOM. Using DOM is very easy, but unfortunately the documentation is not complete, and one or two missing details can stop your work for several hours.

We use XML for multi-language websites, so our content sits in an XML file and is written in XHTML.
If you use XML files for languages such as German, you need language-specific characters. This caused us some trouble, but in the end the solution is quite simple.

First you have to tell the XML file which character encoding you are using, via the encoding attribute in the "xml" header. For example, a German file starts like this:
<?xml version="1.0" encoding="iso-8859-1"?>
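One thing worth knowing at this point: the encoding in the header only describes the file. DOM hands strings back to PHP as UTF-8 regardless. The following is a minimal sketch of that behaviour; the element names and the sample word are made up, and the conversion with iconv at the end is only one possible way to get back to ISO-8859-1 if the rest of the page needs it.

```php
<?php
// "\xFC" is the single ISO-8859-1 byte for the letter u-umlaut.
$source = '<?xml version="1.0" encoding="iso-8859-1"?>' . "\n"
        . "<shiva><p>Gr\xFCn</p></shiva>";

$oXML = new DOMDocument();
$oXML->loadXML($source);

// DOM returns UTF-8 strings, whatever encoding the file declared.
$utf8 = $oXML->getElementsByTagName('p')->item(0)->textContent;

// Convert back only if the output page is really served as ISO-8859-1.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);

echo bin2hex($utf8), "\n";   // hex dump of the UTF-8 string
echo bin2hex($latin1), "\n"; // hex dump of the ISO-8859-1 string
```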
Next, if you are using HTML entities like &nbsp; or &uuml;, you have to define a DOCTYPE. Since we used XHTML entities, we used the Latin-1 entity set from w3.org:
<!DOCTYPE shiva SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
Now we get to the PHP part:
Loading an XML file using DOM is very easy. First you create a new DOMDocument object and then load the file:
$oXML = new DOMDocument();
$oXML->load($sFilename); // $sFilename holds the file name including its path
Now, if you have an external DOCTYPE like we did, you have to enable a property before calling ->load(..), or it will not work:
$oXML->resolveExternals = true;
This enables reading external DOCTYPE files.
Remember: if you are only using a couple of entities, it's quicker to put them directly in the file, or to create your own file with just the entities you need and link to that file. Some comments on php.net report that enabling this property slows down loading.

Now load will no longer produce warning messages (or possibly an exception), but the HTML entities will still be missing from your object.
Instead of Hello&nbsp;World, which should come out as "Hello World", you will get "HelloWorld". You are still missing the final property:
$oXML->substituteEntities = true;
With that you will get your "Hello World" and all other entities your DOCTYPE defines.
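Putting the steps together, here is a minimal sketch of the whole flow. It assumes a small local entity file instead of the one from w3.org so that it runs without network access; the file names (entities.ent, de.xml), the temporary directory and the sample content are made up for illustration.

```php
<?php
// Hypothetical throwaway directory so the sketch is self-contained.
$dir = sys_get_temp_dir() . '/dom_entities_' . uniqid();
mkdir($dir);

// Our own entity file with just the two entities we need.
file_put_contents($dir . '/entities.ent',
    '<!ENTITY nbsp "&#160;">' . "\n" .
    '<!ENTITY uuml "&#252;">' . "\n");

// The XML file: encoding in the header, DOCTYPE pointing at the entity file.
file_put_contents($dir . '/de.xml',
    '<?xml version="1.0" encoding="iso-8859-1"?>' . "\n" .
    '<!DOCTYPE shiva SYSTEM "entities.ent">' . "\n" .
    '<shiva><p>Hello&nbsp;World</p><p>Gr&uuml;n</p></shiva>' . "\n");

$oXML = new DOMDocument();
$oXML->resolveExternals = true;   // read the external DOCTYPE file
$oXML->substituteEntities = true; // replace entity references with their text
$oXML->load($dir . '/de.xml');    // both properties must be set before load()

$paragraphs = $oXML->getElementsByTagName('p');
$first  = $paragraphs->item(0)->textContent; // contains a non-breaking space
$second = $paragraphs->item(1)->textContent; // contains the umlaut

echo $first, "\n", $second, "\n";

// Clean up the throwaway files.
unlink($dir . '/entities.ent');
unlink($dir . '/de.xml');
rmdir($dir);
```

Swapping "entities.ent" for the w3.org URL from above should behave the same way, only slower, because the file is then fetched on every load.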

Addition:
We had the additional problem that we used HTML entities such as "&amp;" in the text and needed them in exactly that form in PHP as well, but the DOCTYPE from the W3C converted each entity into its character (here &). So we wrote our own DOCTYPE file with a little hack that lets the HTML entities pass through unchanged.

To keep "&uuml;" for ü we wrote the following into the DTD file:
<!ENTITY uuml "&amp;uuml;">
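A minimal sketch of this hack in action follows. To keep it in one piece, the entity is declared in an internal subset inside the string rather than in a separate DTD file as we did; the element names and the sample word are again made up.

```php
<?php
// The entity's replacement text is "&amp;uuml;", so expanding &uuml;
// yields the literal characters & u u m l ; instead of the umlaut.
$source = <<<'XML'
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE shiva [
  <!ENTITY uuml "&amp;uuml;">
]>
<shiva><p>Gr&uuml;n</p></shiva>
XML;

$oXML = new DOMDocument();
$oXML->substituteEntities = true; // expand the entity while parsing
$oXML->loadXML($source);

// The HTML entity survives as plain text and can be sent to the browser as is.
$text = $oXML->getElementsByTagName('p')->item(0)->textContent;
echo $text, "\n";
```

The text content should then be the literal string Gr&uuml;n, which a browser renders as the umlaut. Note that saveXML() would escape the ampersand again, so this trick is meant for reading values out of the DOM, not for writing the document back.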
