How can I parse an XML document into a Python object?


I'm trying to consume an XML API. I'd like to have Python objects that represent the XML data. I have several XSDs and example API responses from the documentation.

Here's one example XML response:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<serial:serialheadertype xmlns:isan="http://www.isan.org/isan/isan"
                         xmlns:title="http://www.isan.org/schema/v1.11/common/title"
                         xmlns:serial="http://www.isan.org/schema/v1.21/common/serial"
                         xmlns:externalid="http://www.isan.org/schema/v1.11/common/externalid"
                         xmlns:common="http://www.isan.org/schema/v1.11/common/common"
                         xmlns:participant="http://www.isan.org/schema/v1.11/common/participant"
                         xmlns:language="http://www.isan.org/schema/v1.11/common/language"
                         xmlns:country="http://www.isan.org/schema/v1.11/common/country">
    <common:status>
        <common:datatype>serial_header_type</common:datatype>
        <common:isan root="0000-0002-3b9f"/>
        <common:workstatus>active</common:workstatus>
    </common:status>
    <serial:serialheaderid root="0000-0002-3b9f"/>
    <serial:maintitles>
        <title:titledetail>
            <title:title>braquo</title:title>
            <title:language>
                <language:languagelabel>french</language:languagelabel>
                <language:languagecode>
                    <language:codingsystem>iso639_2</language:codingsystem>
                    <language:iso639_2code>fre</language:iso639_2code>
                </language:languagecode>
            </title:language>
            <title:titlekind>original</title:titlekind>
        </title:titledetail>
    </serial:maintitles>
    <serial:totalepisodes>11</serial:totalepisodes>
    <serial:totalseasons>0</serial:totalseasons>
    <serial:minduration>
        <common:timeunit>min</common:timeunit>
        <common:timevalue>45</common:timevalue>
    </serial:minduration>
    <serial:maxduration>
        <common:timeunit>min</common:timeunit>
        <common:timevalue>144</common:timevalue>
    </serial:maxduration>
    <serial:minyear>2009</serial:minyear>
    <serial:maxyear>2009</serial:maxyear>
    <serial:mainparticipantlist>
        <participant:participant>
            <participant:firstname>frédéric</participant:firstname>
            <participant:lastname>schoendoerffer</participant:lastname>
            <participant:rolecode>dir</participant:rolecode>
        </participant:participant>
        <participant:participant>
            <participant:firstname>karole</participant:firstname>
            <participant:lastname>rocher</participant:lastname>
            <participant:rolecode>act</participant:rolecode>
        </participant:participant>
    </serial:mainparticipantlist>
    <serial:companylist>
        <common:company>
            <common:companykind>pro</common:companykind>
            <common:companyname>r.t.b.f.</common:companyname>
        </common:company>
        <common:company>
            <common:companykind>pro</common:companykind>
            <common:companyname>capa drama</common:companyname>
        </common:company>
        <common:company>
            <common:companykind>pro</common:companykind>
            <common:companyname>marathon</common:companyname>
        </common:company>
    </serial:companylist>
</serial:serialheadertype>

I tried ignoring the XSDs and just using lxml.objectify on the XML I'd get from the API. I had a problem with namespaces. Having to refer to every child node with its explicit namespace is a real pain and doesn't make for readable code.

from lxml import objectify

obj = objectify.fromstring(response)

# Fails to find the element because the namespace needs to be specified:
print(obj.maintitles.titledetail)

# Or like this, but I couldn't get it to work either, and I'd rather
# use attributes without having to specify the namespace:
print(obj.maintitles['{http://www.isan.org/schema/v1.11/common/title}titledetail'])
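For comparison, here is a minimal sketch of namespace-qualified lookup using the standard library's xml.etree.ElementTree rather than lxml.objectify (a swapped-in technique, not what the question used). It assumes a trimmed-down copy of the response above, and the names SAMPLE and NS are made up for illustration. It still spells out the namespaces, but only once, in a prefix map.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the response above; prefixes and URIs are taken from it.
SAMPLE = """<serial:serialheadertype
    xmlns:serial="http://www.isan.org/schema/v1.21/common/serial"
    xmlns:title="http://www.isan.org/schema/v1.11/common/title">
  <serial:maintitles>
    <title:titledetail>
      <title:title>braquo</title:title>
      <title:titlekind>original</title:titlekind>
    </title:titledetail>
  </serial:maintitles>
</serial:serialheadertype>"""

# Prefix map used only for lookups (illustrative name). The keys are free
# to differ from the prefixes used inside the document itself.
NS = {
    "serial": "http://www.isan.org/schema/v1.21/common/serial",
    "title": "http://www.isan.org/schema/v1.11/common/title",
}

root = ET.fromstring(SAMPLE)

# Each path step names its namespace through a prefix from NS.
detail = root.find("serial:maintitles/title:titledetail", NS)
main_title = detail.find("title:title", NS).text
print(main_title)
```

This keeps the namespace URIs in one place, although every path step still needs a prefix.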

So I tried generateDS to create Python class definitions for me. I've lost the error messages that this attempt gave me, but I couldn't get it to work. It would generate a module for each XSD I gave it, but it wouldn't parse the example XML.

I'm now trying PyXB, and it seems nicer so far. It's generating nicer definitions than generateDS (splitting them into multiple, reusable modules), but it won't parse the XML:

from models import serial
obj = serial.CreateFromDocument(response)

Traceback (most recent call last):
  ...
  File "/vagrant/isan/isan.py", line 58, in lookup
    return serial.CreateFromDocument(resp.content)
  File "/vagrant/isan/models/serial.py", line 69, in CreateFromDocument
    instance = handler.rootObject()
  File "/home/vagrant/venv/lib/python2.7/site-packages/pyxb/binding/saxer.py", line 285, in rootObject
    raise pyxb.UnrecognizedDOMRootNodeError(self.__rootObject)
UnrecognizedDOMRootNodeError: <pyxb.utils.saxdom.Element object at 0x2b53664dc850>

The unrecognised node is the <serial:serialheadertype> node from the example. Looking at the PyXB source, it seems this error comes "if the top-level element got processed as a DOM instance", but I don't know what that means or how to prevent it.

I've run out of steam trying to explore this, and I don't know what to try next.

UnrecognizedDOMRootNodeError indicates that PyXB could not locate the element in a namespace for which it has bindings registered. In your case it fails on the first element, which is {http://www.isan.org/schema/v1.21/common/serial}serialheadertype.

The schema for that namespace defines a complexType named serialheadertype but does not define an element with the name serialheadertype. In fact, it defines no top-level elements at all. So PyXB can't recognize it, and the XML does not validate.
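One way to check this kind of claim yourself is to list the direct children of xs:schema in the XSD. The sketch below uses a small made-up schema (TOY_XSD) as a stand-in for the real serial.xsd, which is not shown in this thread, and the helper top_level_names is hypothetical as well.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Made-up stand-in for the real schema: one complexType and no top-level
# element. The only xs:element here is nested inside the type.
TOY_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.isan.org/schema/v1.21/common/serial">
  <xs:complexType name="serialheadertype">
    <xs:sequence>
      <xs:element name="totalepisodes" type="xs:int"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def top_level_names(xsd_text, kind):
    """Return the names of direct children of xs:schema with the given local tag."""
    schema = ET.fromstring(xsd_text)
    wanted = "{%s}%s" % (XS, kind)
    # Iterating an Element yields only its direct children, so nested
    # declarations are deliberately not counted.
    return [child.get("name") for child in schema if child.tag == wanted]


elements = top_level_names(TOY_XSD, "element")
types = top_level_names(TOY_XSD, "complexType")
print("top-level elements:", elements)
print("top-level complex types:", types)
```

Only top-level xs:element declarations can serve as a document root, so an empty first list for a schema means no document in that namespace can validate against it on its own.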

Either there's an additional schema for that namespace, which you'll need to locate and which provides the elements, or the message you're sending really doesn't validate. That may be because somebody's expecting an implicit mapping from a complex type to an element of that type, or because it's a fragment that would normally be found within some other element, where the QName would be a member element name.

Update: You can hand-craft an element in that namespace by adding the following to the generated bindings in serial.py:

serialheadertype = pyxb.binding.basis.element(pyxb.namespace.ExpandedName(Namespace, 'serialheadertype'), serialheadertype)
Namespace.addCategoryObject('elementBinding', serialheadertype.name().localName(), serialheadertype)

If you do that, you won't get the UnrecognizedDOMRootNodeError, but you will get an IncompleteElementContentError at:

<common:status>
    <common:datatype>serial_header_type</common:datatype>
    <common:isan root="0000-0002-3b9f"/>
    <common:workstatus>active</common:workstatus>
</common:status>

which provides the following details:

The containing element {http://www.isan.org/schema/v1.11/common/common}status is defined at common.xsd[243:3].
The containing element type {http://www.isan.org/schema/v1.11/common/common}statustype is defined at common.xsd[289:1]
The {http://www.isan.org/schema/v1.11/common/common}statustype automaton is not in an accepting state.
Any accepted content has been stored in instance
The following element and wildcard content would be accepted:
    An element {http://www.isan.org/schema/v1.11/common/common}activeisan per common.xsd[316:3]
    An element {http://www.isan.org/schema/v1.11/common/common}matchingisans per common.xsd[317:3]
    An element {http://www.isan.org/schema/v1.11/common/common}description per common.xsd[318:3]
No content remains unconsumed

Reviewing the schema confirms that, at a minimum, the {http://www.isan.org/schema/v1.11/common/common}description element is missing but is required.
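As a rough illustration of that finding, the sketch below checks the status fragment for a child that the schema requires. It is not a substitute for real schema validation. The REQUIRED list is an assumption based only on the statement above that description is required, because the actual rules live in common.xsd.

```python
import xml.etree.ElementTree as ET

COMMON = "http://www.isan.org/schema/v1.11/common/common"

# The status fragment from the response, with its namespace declared inline.
STATUS = """<common:status xmlns:common="http://www.isan.org/schema/v1.11/common/common">
  <common:datatype>serial_header_type</common:datatype>
  <common:isan root="0000-0002-3b9f"/>
  <common:workstatus>active</common:workstatus>
</common:status>"""

# Assumption: only 'description' is treated as required here.
REQUIRED = ["description"]

status = ET.fromstring(STATUS)

# ElementTree stores tags as '{namespace-uri}localname'.
present = {child.tag for child in status}
missing = [name for name in REQUIRED if "{%s}%s" % (COMMON, name) not in present]
print("missing required children:", missing)
```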

So it seems these documents are not meant to be validated, and PyXB is the wrong technology to use.
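If validation is off the table, one non-validating alternative is a thin wrapper that ignores namespaces and exposes children as attributes. The following is only a sketch under stated assumptions. The Node class and local_name helper are hypothetical names, not part of any library. The sample is a trimmed copy of the response from the question, and the code targets Python 3, although it should also run on 2.7.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace-uri}' part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


class Node(object):
    """Hypothetical read-only wrapper giving attribute access by local name."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, name):
        # Only called when normal lookup fails; never resolve private names here.
        if name.startswith("_"):
            raise AttributeError(name)
        matches = [Node(c) for c in self._element if local_name(c.tag) == name]
        if not matches:
            raise AttributeError(name)
        # A single match is returned directly, repeated children as a list.
        return matches[0] if len(matches) == 1 else matches

    @property
    def text(self):
        return (self._element.text or "").strip()

    def attr(self, name):
        """Return an XML attribute value, or None if it is absent."""
        return self._element.get(name)


# Trimmed-down copy of the response from the question.
SAMPLE = """<serial:serialheadertype
    xmlns:serial="http://www.isan.org/schema/v1.21/common/serial"
    xmlns:title="http://www.isan.org/schema/v1.11/common/title"
    xmlns:common="http://www.isan.org/schema/v1.11/common/common">
  <serial:serialheaderid root="0000-0002-3b9f"/>
  <serial:maintitles>
    <title:titledetail>
      <title:title>braquo</title:title>
    </title:titledetail>
  </serial:maintitles>
  <serial:companylist>
    <common:company><common:companyname>capa drama</common:companyname></common:company>
    <common:company><common:companyname>marathon</common:companyname></common:company>
  </serial:companylist>
</serial:serialheadertype>"""

obj = Node(ET.fromstring(SAMPLE))
title = obj.maintitles.titledetail.title.text
header_id = obj.serialheaderid.attr("root")
companies = [c.companyname.text for c in obj.companylist.company]
print(title, header_id, companies)
```

The trade-offs are real. Children with the same local name in different namespaces collide. A child named text or attr would be shadowed by the wrapper's own members. A repeated element that happens to occur only once comes back as a single Node rather than a list, so calling code has to handle both cases.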

