Scrapy - How can I obtain the 'src' value for a 'script' tag
I'm trying to obtain the 'src' values of 'script' tags using Scrapy (http://scrapy.org/).
I can do this with no problem for images:
    for sel in response.xpath('//img'):
        item = elsrc()
        item['src'] = sel.xpath('@src').extract()
        yield item

Sample output:
{"src": ["http://ecx.images-amazon.com/images/i/51ubhvgfefl._ac_sx75_.jpg"]},
However, the same thing for the script tag doesn't seem to work:
    for sel in response.xpath('//script'):
        item = elsrc()
        item['src'] = sel.xpath('@src').extract()
        yield item

Sample output:
{"src": []},
I confirmed manually that the script tags on the page in question did indeed have 'src' values present. I've tried a number of other approaches using the Scrapy shell, to no avail.
Has anyone else been able to obtain the 'src' values of a 'script' tag using Scrapy, and if so, how did you do it?
Thanks!
Uggg. In the horribly formatted page I was looking at, the 'src' only appeared to be populated. paul trmbrth's comment prompted me to examine things again and set up a simpler test page to validate my findings. I believe this is solved. Moral of the story: use clean, easy-to-read code for testing purposes, and set up simple environments before you tackle complex production items.