python - Scrapy gives URLError: <urlopen error timed out>
So I have a Scrapy program I'm trying to get off the ground, but I can't get the code to execute; it always comes out with the error below.

I can still visit the site using the scrapy shell command, so I know the URLs and the rest of the setup work.

Here is my code:
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from malscraper.items import MalItem

class MalSpider(CrawlSpider):
    name = 'mal'
    allowed_domains = ['www.website.net']
    start_urls = ['http://www.website.net/stuff.php?']
    rules = [
        Rule(LinkExtractor(
            allow=['//*[@id="content"]/div[2]/div[2]/div/span/a[1]']),
            callback='parse_item', follow=True)
    ]

    def parse_item(self, response):
        mal_list = response.xpath('//*[@id="content"]/div[2]/table/tr/td[2]/')
        for mal in mal_list:
            item = MalItem()
            item['name'] = mal.xpath('a[1]/strong/text()').extract_first()
            item['link'] = mal.xpath('a[1]/@href').extract_first()
            yield item
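For context, the spider imports MalItem from malscraper.items; that file isn't shown in the question, but based on the fields assigned in parse_item it would look roughly like this (a hypothetical sketch, not the asker's actual file):

import scrapy

class MalItem(scrapy.Item):
    # Fields filled in by MalSpider.parse_item
    name = scrapy.Field()
    link = scrapy.Field()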
EDIT: Here is the traceback.
Traceback (most recent call last):
  File "C:\Users\2015\Anaconda\lib\site-packages\boto\utils.py", line 210, in retry_url
    r = opener.open(req, timeout=timeout)
  File "C:\Users\2015\Anaconda\lib\urllib2.py", line 431, in open
    response = self._open(req, data)
  File "C:\Users\2015\Anaconda\lib\urllib2.py", line 449, in _open
    '_open', req)
  File "C:\Users\2015\Anaconda\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Users\2015\Anaconda\lib\urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\Users\2015\Anaconda\lib\urllib2.py", line 1197, in do_open
    raise URLError(err)
URLError: <urlopen error timed out>
EDIT 2: With the scrapy shell command I am able to manipulate responses, but I noticed that the same exact error comes up again when visiting the site.
EDIT 3: I am finding that the error shows up on every website I use the shell command with, yet I am still able to manipulate the response.
EDIT 4: How do I verify that Scrapy is at least receiving a response when I run the crawl command? I don't know whether my code is the reason the logs turn up empty, or whether it's this error.
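One way to check whether any response reaches the spider at all (a minimal sketch, assuming the CrawlSpider shown above; the log message is only illustrative) is to override parse_start_url, which CrawlSpider calls with the response for each start URL:

    def parse_start_url(self, response):
        # If this line never appears in the log, no response is reaching the spider.
        self.logger.info('Got %s (status %s, %d bytes)',
                         response.url, response.status, len(response.body))
        return []

Running the spider with "scrapy crawl mal -s LOG_LEVEL=INFO" then makes the message easy to spot in the output.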
Here is my settings.py:
BOT_NAME = 'malscraper'

SPIDER_MODULES = ['malscraper.spiders']
NEWSPIDER_MODULE = 'malscraper.spiders'

FEED_URI = 'logs/%(name)s/%(time)s.csv'
FEED_FORMAT = 'csv'
There's an open Scrapy issue for this problem: https://github.com/scrapy/scrapy/issues/1054, although it seems to be only a warning on other platforms.
You can disable the S3DownloadHandler (which is causing this error) by adding this to your Scrapy settings:
DOWNLOAD_HANDLERS = {
    's3': None,
}
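Put together with the settings from the question, settings.py would then look something like this (a sketch; only the DOWNLOAD_HANDLERS block is new):

BOT_NAME = 'malscraper'

SPIDER_MODULES = ['malscraper.spiders']
NEWSPIDER_MODULE = 'malscraper.spiders'

FEED_URI = 'logs/%(name)s/%(time)s.csv'
FEED_FORMAT = 'csv'

# Disable the S3 download handler (the boto-based handler whose startup
# request is what times out in boto.utils.retry_url in the traceback).
DOWNLOAD_HANDLERS = {
    's3': None,
}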