03.05.2011. 16:17
Tags: php, regexp, xml
PHP: Parsing XML Using Regexp
Here's one evergreen task: getting values out of external XML. There are zillions of ways to do this: you can use the PHP xml_parser functions (if you're into BDSM), or SimpleXML, and so on. Still, I prefer an old-school kludge that I wrote some 10 years ago in Perl, using regular expressions. It's a pretty efficient solution when the XML format is fixed and known, e.g. for a collection of items like:
<items>
<item>
<property1>value1</property1>
<property2>value2</property2>
...
</item>
<item>
<property1>value1</property1>
<property2>value2</property2>
...
</item>
...
</items>
All you need to do is hit the string with this regexp:
preg_match_all('/<'.$tag.'>((?:(?!<\/'.$tag.'>).)*)<\/'.$tag.'>/ms', $text, $matches);
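Now everything is in the $matches[1] array.
(Of course, nesting is not supported: the match ends at the first closing tag. Opening tags that carry attributes are not matched either, since the pattern expects the bare tag name.)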
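To see what the pattern actually captures, here is a minimal sketch. The helper name allWrappedInTag and the sample data are made up for illustration, and the preg_quote call is my addition (it keeps a tag name with regex metacharacters from breaking the pattern). Only the s modifier is used here, because it is the one that lets the dot cross newlines; m affects only ^ and $, which the pattern does not use.

```php
<?php
// Hypothetical helper wrapping the one-liner above; the name is illustrative.
function allWrappedInTag($text, $tag) {
    // Addition, not in the original: escape regex metacharacters in the tag name.
    $t = preg_quote($tag, '/');
    // (?:(?!<\/tag>).)* consumes one character at a time, but only while
    // the closing tag does not start at the current position.
    preg_match_all('/<' . $t . '>((?:(?!<\/' . $t . '>).)*)<\/' . $t . '>/s', $text, $matches);
    return $matches[1];
}

// Made-up sample data in the format shown above.
$xml = "<items>\n<item>\n<property1>a</property1>\n</item>\n"
     . "<item>\n<property1>b</property1>\n</item>\n</items>";

// One string per <item>, each still containing its inner tags.
$chunks = allWrappedInTag($xml, 'item');

// Nesting limitation: the capture ends at the first closing tag,
// so $nested[0] is 'outer<a>inner' and 'tail' is lost.
$nested = allWrappedInTag('<a>outer<a>inner</a>tail</a>', 'a');
```

Note that asking for 'item' does not accidentally match the `<items>` wrapper, because the pattern requires `>` right after the tag name.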
So, here's a parser example:
<?php
/**
 * Sample class
 */
class sampleXmlRegexpParser {
    /**
     * Parse import xml
     *
     * @param string $xml
     * @return array list of items, each an array of property => value
     */
    public function parse($xml) {
        $items = array();
        // Iterate over each <item> chunk, not over the single <items> wrapper
        foreach ($this->_getAllWrappedInTag($xml, 'item') as $itemTxt) {
            $fields = array();
            foreach (array(
                'property1',
                'property2',
                //...
            ) as $key) {
                $fields[$key] = $this->_getTagValue($itemTxt, $key);
            }
            $items[] = $fields;
        }
        return $items;
    }

    /**
     * Gets value by tag
     *
     * @param string $text
     * @param string $tag
     * @return string|null content of the first matching tag, null if absent
     */
    protected function _getTagValue($text, $tag) {
        if (preg_match('/<'.$tag.'>((?:(?!<\/'.$tag.'>).)*)<\/'.$tag.'>/ms', $text, $matches)) {
            return $matches[1];
        } else {
            return null;
        }
    }

    /**
     * Extracts all texts surrounded by given tag
     *
     * @param string $text
     * @param string $tag
     * @return array
     */
    protected function _getAllWrappedInTag($text, $tag) {
        if (preg_match_all('/<'.$tag.'>((?:(?!<\/'.$tag.'>).)*)<\/'.$tag.'>/ms', $text, $matches)) {
            return $matches[1];
        } else {
            return array();
        }
    }
}
?>
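For a feel of what comes out the other end, here is a rough end-to-end sketch. To keep it runnable on its own it uses two hypothetical functions (tagContents and parseItems) that condense the same idea as the class, rather than the class itself; the sample data is invented, and preg_quote is again my addition.

```php
<?php
// Hypothetical condensed version of the approach; names are illustrative.
function tagContents($text, $tag) {
    $t = preg_quote($tag, '/'); // addition: escape the tag name for the regex
    preg_match_all('/<' . $t . '>((?:(?!<\/' . $t . '>).)*)<\/' . $t . '>/s', $text, $m);
    return $m[1];
}

function parseItems($xml, array $fields) {
    $items = array();
    // Split the document into per-item chunks first...
    foreach (tagContents($xml, 'item') as $itemTxt) {
        $row = array();
        // ...then pull each known property out of the chunk.
        foreach ($fields as $key) {
            $found = tagContents($itemTxt, $key);
            // First occurrence wins; a missing tag becomes null.
            $row[$key] = isset($found[0]) ? $found[0] : null;
        }
        $items[] = $row;
    }
    return $items;
}

// Made-up input: the second item has no <property2>.
$xml = "<items>\n"
     . "<item>\n<property1>apple</property1>\n<property2>red</property2>\n</item>\n"
     . "<item>\n<property1>pear</property1>\n</item>\n"
     . "</items>";

$result = parseItems($xml, array('property1', 'property2'));
// $result[0] is array('property1' => 'apple', 'property2' => 'red')
// $result[1] is array('property1' => 'pear', 'property2' => null)
```

The values come back as raw text, exactly as they appear between the tags, so any trimming or entity decoding is left to the caller.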
