python - Scrapy finishes crawl prematurely


I've had a crawler running for months, but over the last few weeks it has been finishing prematurely after only a few crawled pages, out of the tens of thousands of pages that should be crawled.

It's a SitemapSpider following sitemap_rules:

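```python
from scrapy.spiders import SitemapSpider


class FooSitemapSpider(SitemapSpider):
    name = "foo"
    sitemap_urls = ["http://www.foo.se/sitemap.xml"]
    sitemap_rules = [
        ('/bostad/', 'parse_house')
    ]

    def parse_house(self, response):
        # Item extraction (not shown in the question) goes here.
        pass
```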

All the URLs I want to crawl look like this:

http://www.foo.se/bostad/address-1-259413
http://www.foo.se/bostad/address-2-275754
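A quick sanity check is to confirm that these URLs actually match the rule. As far as I know, SitemapSpider compiles each `sitemap_rules` pattern as a regular expression and uses the callback of the first pattern that is *found* (a search, not a full match) in the `<loc>` URL. The standalone sketch below only mimics that behaviour with the standard `re` module; `match_rule` is a hypothetical helper, not part of Scrapy, and `/om-oss/` is a made-up non-matching URL.

```python
import re

# Rules copied from the question: (regex pattern, callback name).
SITEMAP_RULES = [("/bostad/", "parse_house")]

# Compile each pattern once up front.
_COMPILED = [(re.compile(pattern), callback) for pattern, callback in SITEMAP_RULES]


def match_rule(url):
    """Return the callback name of the first rule whose regex is found in url, else None."""
    for regex, callback in _COMPILED:
        if regex.search(url):
            return callback
    return None


if __name__ == "__main__":
    for url in [
        "http://www.foo.se/bostad/address-1-259413",
        "http://www.foo.se/bostad/address-2-275754",
        "http://www.foo.se/om-oss/",  # hypothetical URL that should not match
    ]:
        print(url, "->", match_rule(url))
```

If both example URLs map to `parse_house`, the rule itself is probably not what is limiting the crawl.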

There are approximately 50,000+ pages that should be crawled, but instead the spider stops after crawling 0 pages or just a handful, without any error. It says:

```
2015-06-25 19:37:38 [scrapy] INFO: Closing spider (finished)
2015-06-25 19:37:38 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 106313,
 'downloader/request_count': 310,
 'downloader/request_method_count/GET': 310,
 'downloader/response_bytes': 2809108,
 'downloader/response_count': 310,
 'downloader/response_status_count/200': 309,
 'downloader/response_status_count/404': 1,
 'file_count': 21,
 'file_status_count/downloaded': 21,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2015, 6, 25, 17, 37, 38, 154000),
 'item_scraped_count': 4,
 'log_count/DEBUG': 1717,
 'log_count/INFO': 9,
 'log_count/WARNING': 8,
 'request_depth_max': 2,
 'response_received_count': 310,
 'scheduler/dequeued': 289,
 'scheduler/dequeued/memory': 289,
 'scheduler/enqueued': 289,
 'scheduler/enqueued/memory': 289,
 'start_time': datetime.datetime(2015, 6, 25, 17, 35, 51, 868000)}
2015-06-25 19:37:38 [scrapy] INFO: Spider closed (finished)
```
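The stats show only 289 requests ever enqueued (and one 404), so the spider seems to run out of URLs rather than being stopped. One way to check is to count how many `<loc>` entries in a sitemap body match the rule and compare that number with `'scheduler/enqueued'`. A minimal sketch, assuming an uncompressed `<urlset>` sitemap in the standard sitemaps.org namespace; `count_matching_locs` and the sample XML are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Standard sitemap namespace defined by sitemaps.org.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def count_matching_locs(xml_body, pattern="/bostad/"):
    """Return (total <url><loc> entries, entries whose URL matches pattern)."""
    root = ET.fromstring(xml_body)
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]
    regex = re.compile(pattern)
    matching = [loc for loc in locs if regex.search(loc)]
    return len(locs), len(matching)


# Hypothetical sample; in practice use the bytes downloaded from the real sitemap.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.foo.se/bostad/address-1-259413</loc></url>
  <url><loc>http://www.foo.se/bostad/address-2-275754</loc></url>
  <url><loc>http://www.foo.se/om-oss/</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(count_matching_locs(SAMPLE))
```

If the real sitemap lists tens of thousands of matching URLs but the crawl enqueues only a few hundred, the gap is likely in how the sitemap is fetched or followed rather than in the target blocking requests.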

I've tried changing USER_AGENT, DOWNLOAD_DELAY, and the server/IP I run the spider from, to make sure it's not the target blocking my requests.

Any ideas? Any suggestions on how I should debug this? It's difficult since there are no errors.
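A possible first debugging step is to look at what `sitemap.xml` actually is. If it is a `<sitemapindex>`, the pages are listed in child sitemaps (possibly gzipped), and if some of those fail to download or are empty, only part of the site gets enqueued. The sketch below is a hypothetical helper, `inspect_sitemap`, using only the standard library; it is not part of Scrapy and assumes the standard sitemaps.org namespace.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def inspect_sitemap(body):
    """Return (root tag name, list of <loc> URLs) for a sitemap or sitemap index body."""
    # Gzipped sitemaps start with the gzip magic bytes 0x1f 0x8b.
    if body[:2] == b"\x1f\x8b":
        body = gzip.decompress(body)
    root = ET.fromstring(body)
    # ElementTree prefixes tags with "{namespace}"; keep only the local name.
    kind = root.tag.rsplit("}", 1)[-1]
    # For a sitemapindex these are child sitemap URLs; for a urlset, page URLs.
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs
```

For example, `inspect_sitemap(urllib.request.urlopen("http://www.foo.se/sitemap.xml").read())` would show whether the root is an index, and each child URL could then be fetched and inspected the same way to see which ones return few or no entries.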

Here is the complete log of a crawl with 0 errors: http://pastebin.com/psqx6bck

