python - Scrapy gives URLError: <urlopen error timed out>


So I have a Scrapy program that I'm trying to get off the ground, but I can't get my code to execute; it always comes out with the error below.

I can still visit the site using the scrapy shell command, so I know the URLs and everything else work.

Here is my code:

    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor
    from malscraper.items import MalItem


    class MalSpider(CrawlSpider):
        name = 'mal'
        allowed_domains = ['www.website.net']
        start_urls = ['http://www.website.net/stuff.php?']
        rules = [
            Rule(LinkExtractor(
                allow=['//*[@id="content"]/div[2]/div[2]/div/span/a[1]']),
                callback='parse_item',
                follow=True)
        ]

        def parse_item(self, response):
            mal_list = response.xpath('//*[@id="content"]/div[2]/table/tr/td[2]/')

            for mal in mal_list:
                item = MalItem()
                item['name'] = mal.xpath('a[1]/strong/text()').extract_first()
                item['link'] = mal.xpath('a[1]/@href').extract_first()

                yield item

Edit: here is the trace.

    Traceback (most recent call last):
      File "c:\users\2015\anaconda\lib\site-packages\boto\utils.py", line 210, in retry_url
        r = opener.open(req, timeout=timeout)
      File "c:\users\2015\anaconda\lib\urllib2.py", line 431, in open
        response = self._open(req, data)
      File "c:\users\2015\anaconda\lib\urllib2.py", line 449, in _open
        '_open', req)
      File "c:\users\2015\anaconda\lib\urllib2.py", line 409, in _call_chain
        result = func(*args)
      File "c:\users\2015\anaconda\lib\urllib2.py", line 1227, in http_open
        return self.do_open(httplib.HTTPConnection, req)
      File "c:\users\2015\anaconda\lib\urllib2.py", line 1197, in do_open
        raise URLError(err)
    URLError: <urlopen error timed out>

Edit 2:

With the scrapy shell command I am able to manipulate responses, but I noticed that the same exact error comes up again when visiting the site.

Edit 3:

I am now finding that the error shows up on every website I use the shell command with, but I am still able to manipulate the response.

Edit 4: How do I verify that I am at least receiving a response from Scrapy when running the crawl command? I don't know whether my code is the reason my logs turn up empty, or whether it is the error.

Here is my settings.py:

    BOT_NAME = 'malscraper'

    SPIDER_MODULES = ['malscraper.spiders']
    NEWSPIDER_MODULE = 'malscraper.spiders'
    FEED_URI = 'logs/%(name)s/%(time)s.csv'
    FEED_FORMAT = 'csv'

There's an open Scrapy issue for this problem: https://github.com/scrapy/scrapy/issues/1054

Although it seems to be just a warning on other platforms.

You can disable the S3DownloadHandler (which is what causes the error) by adding this to your Scrapy settings:

    DOWNLOAD_HANDLERS = {
        's3': None,
    }
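As a rough sketch of how this could sit alongside the settings.py from the question, the fragment below combines the asker's values with the handler override. The `LOG_LEVEL` line is an assumption added here to address Edit 4: at DEBUG level (Scrapy's default) the crawl log should include a line for each downloaded response together with its HTTP status, which is one way to confirm that responses are arriving.

```python
# settings.py (sketch; the first five values are taken from the question)
BOT_NAME = 'malscraper'

SPIDER_MODULES = ['malscraper.spiders']
NEWSPIDER_MODULE = 'malscraper.spiders'
FEED_URI = 'logs/%(name)s/%(time)s.csv'
FEED_FORMAT = 'csv'

# Disable the S3 download handler, as suggested above, so it is never set up.
DOWNLOAD_HANDLERS = {
    's3': None,
}

# Assumption (not part of the original answer): keep logging at DEBUG so
# every crawled response is reported in the console log with its status.
LOG_LEVEL = 'DEBUG'
```

With this in place, running `scrapy crawl mal` should log lines of the form `Crawled (200) <GET ...>` for each response. If those lines appear but the CSV output stays empty, the cause is more likely in the spider's rules or XPath expressions than in the timeout.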
