03.05.2011. 16:17
Tags: php, regexp, xml
PHP: Parsing XML Using Regexp
Here's one evergreen task: getting values out of external XML. There are zillions of ways to do this: you can use PHP's XML Parser functions (xml_parser_create and friends, if you're into BDSM), or SimpleXML, etc, etc - but I prefer an old-school kludge that I wrote some 10 years ago in Perl, using regular expressions. It's a pretty efficient solution when the XML format is fixed and known, e.g. for a collection of items like:
<items>
    <item>
        <property1>value1</property1>
        <property2>value2</property2>
        ...
    </item>
    <item>
        <property1>value1</property1>
        <property2>value2</property2>
        ...
    </item>
    ...
</items>
All you need is to hit a string with this regexp:
preg_match_all('/<'.$tag.'>((?:(?!<\/'.$tag.'>).)*)<\/'.$tag.'>/ms', $text, $matches);
Now everything's in the $matches[1] array.
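To see what actually lands in $matches, here is a minimal sketch that runs the pattern on a small sample. The sample $text is hypothetical, and wrapping $tag in preg_quote() is an added precaution (for tag names containing regexp metacharacters) that the one-liner above does not include.

```php
<?php
// Hypothetical sample input, shaped like the <items> collection above.
$text = "<items>\n"
      . "<item><property1>a</property1></item>\n"
      . "<item><property1>b</property1></item>\n"
      . "</items>";
$tag = 'item';

// preg_quote() is an added precaution, not part of the original pattern.
$t = preg_quote($tag, '/');
preg_match_all('/<'.$t.'>((?:(?!<\/'.$t.'>).)*)<\/'.$t.'>/ms', $text, $matches);

// $matches[0] holds the full matches (tags included),
// $matches[1] holds only the text between the tags.
print_r($matches[1]);
// Array
// (
//     [0] => <property1>a</property1>
//     [1] => <property1>b</property1>
// )
```

Note that <items> itself is not picked up when $tag is 'item', because the pattern requires ">" right after the tag name.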
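(Of course, nesting is not supported - the match ends at the first closing tag, so a tag nested inside a same-named tag gets cut short. Tags with attributes are not matched either, since the pattern expects ">" right after the tag name.)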
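The limitation is easy to reproduce. The sketch below uses two hypothetical inputs: one with a same-named tag nested inside itself, and one where the opening tag carries an attribute, which the pattern as written also does not match.

```php
<?php
// Same pattern as above, built for an arbitrary tag name.
function tagPattern($tag) {
    return '/<'.$tag.'>((?:(?!<\/'.$tag.'>).)*)<\/'.$tag.'>/ms';
}

// Hypothetical input: <div> nested inside <div>.
$nested = '<div>outer <div>inner</div> tail</div>';
preg_match(tagPattern('div'), $nested, $m);

// The match stops at the first </div>, so the capture is cut short.
echo $m[1], "\n"; // outer <div>inner

// Hypothetical input: opening tag with an attribute.
$withAttr = '<item id="1">x</item>';
$found = preg_match(tagPattern('item'), $withAttr);

// The pattern expects "<item>" exactly, so nothing matches here.
echo $found, "\n"; // 0
```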
So, here's a parser example:
<?php
/**
 * Sample class
 */
class sampleXmlRegexpParser
{
    /**
     * Parse import xml
     *
     * @param string $xml
     * @return array
     */
    public function parse($xml)
    {
        $items = array();
        // Iterate over each <item> (not the <items> wrapper, which would
        // yield a single chunk and therefore only the first item's fields).
        foreach ($this->_getAllWrappedInTag($xml, 'item') as $itemTxt) {
            $fields = array();
            foreach (array(
                'property1',
                'property2',
                //...
            ) as $key)
                $fields[$key] = $this->_getTagValue($itemTxt, $key);
            $items[] = $fields;
        }
        return $items;
    }

    /**
     * Gets value by tag
     *
     * @param string $text
     * @param string $tag
     * @return mixed
     */
    protected function _getTagValue($text, $tag)
    {
        if (preg_match('/<'.$tag.'>((?:(?!<\/'.$tag.'>).)*)<\/'.$tag.'>/ms', $text, $matches))
            return $matches[1];
        else
            return null;
    }

    /**
     * Extracts all texts surrounded by given tag
     *
     * @param string $text
     * @param string $tag
     * @return array
     */
    protected function _getAllWrappedInTag($text, $tag)
    {
        if (preg_match_all('/<'.$tag.'>((?:(?!<\/'.$tag.'>).)*)<\/'.$tag.'>/ms', $text, $matches))
            return $matches[1];
        else
            return array();
    }
}
?>
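To try the approach end to end without the class, here is a condensed, function-based sketch of the same idea run on a small sample. The function names (xmlTagPattern, xmlTagValue, xmlAllInTag) and the sample $xml are hypothetical, preg_quote() is an added precaution, and the loop iterates over 'item' elements on the assumption that the goal is one array of fields per item.

```php
<?php
// Builds the "everything up to the first closing tag" pattern for a tag name.
function xmlTagPattern($tag) {
    $t = preg_quote($tag, '/'); // added precaution, not in the original
    return '/<'.$t.'>((?:(?!<\/'.$t.'>).)*)<\/'.$t.'>/ms';
}

// Returns the inner text of the first <$tag> in $text, or null if absent.
function xmlTagValue($text, $tag) {
    return preg_match(xmlTagPattern($tag), $text, $m) ? $m[1] : null;
}

// Returns the inner texts of every <$tag> in $text.
function xmlAllInTag($text, $tag) {
    return preg_match_all(xmlTagPattern($tag), $text, $m) ? $m[1] : array();
}

// Hypothetical sample document; the second item has no <property2>.
$xml = '<items>'
     . '<item><property1>a1</property1><property2>a2</property2></item>'
     . '<item><property1>b1</property1></item>'
     . '</items>';

$items = array();
foreach (xmlAllInTag($xml, 'item') as $itemTxt) {
    $fields = array();
    foreach (array('property1', 'property2') as $key) {
        $fields[$key] = xmlTagValue($itemTxt, $key);
    }
    $items[] = $fields;
}

var_export($items);
```

Each item comes back as an associative array keyed by property name, with null for any property missing from that item. Values are returned raw, exactly as they appear between the tags.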