python - Scrapy finishes crawl prematurely
I've had a crawler running for months, but for the last few weeks it has been finishing prematurely, after only a few crawled pages out of the tens of thousands of pages that should be crawled.
It's a SitemapSpider with the following sitemap_rules:
from scrapy.spiders import SitemapSpider

class FooSitemapSpider(SitemapSpider):
    name = "foo"
    sitemap_urls = ["http://www.foo.se/sitemap.xml"]
    sitemap_rules = [
        ('/bostad/', 'parse_house')
    ]
All the URLs I want to crawl look like this:
http://www.foo.se/bostad/address-1-259413
http://www.foo.se/bostad/address-2-275754
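(As far as I understand, SitemapSpider treats the first element of each sitemap_rules entry as a regular expression and sends every sitemap URL that matches it to the named callback, so URLs like the two above should match the '/bostad/' rule. Roughly, the matching is equivalent to:)

import re

url = "http://www.foo.se/bostad/address-1-259413"
# the rule pattern is applied with re.search against each <loc> from the sitemap
print(bool(re.search('/bostad/', url)))  # True -> would be routed to parse_house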
There are approximately 50,000+ pages that should be crawled, but instead the spider stops after only a handful of crawled pages, without any error. It says:
2015-06-25 19:37:38 [scrapy] INFO: Closing spider (finished)
2015-06-25 19:37:38 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 106313,
 'downloader/request_count': 310,
 'downloader/request_method_count/GET': 310,
 'downloader/response_bytes': 2809108,
 'downloader/response_count': 310,
 'downloader/response_status_count/200': 309,
 'downloader/response_status_count/404': 1,
 'file_count': 21,
 'file_status_count/downloaded': 21,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2015, 6, 25, 17, 37, 38, 154000),
 'item_scraped_count': 4,
 'log_count/DEBUG': 1717,
 'log_count/INFO': 9,
 'log_count/WARNING': 8,
 'request_depth_max': 2,
 'response_received_count': 310,
 'scheduler/dequeued': 289,
 'scheduler/dequeued/memory': 289,
 'scheduler/enqueued': 289,
 'scheduler/enqueued/memory': 289,
 'start_time': datetime.datetime(2015, 6, 25, 17, 35, 51, 868000)}
2015-06-25 19:37:38 [scrapy] INFO: Spider closed (finished)
I've tried changing USER_AGENT, DOWNLOAD_DELAY, and the server/IP I run the spider from, to make sure it's not the target site blocking my requests.
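For reference, the settings I've been varying look roughly like this (placeholder values, not my exact ones):

# settings.py -- placeholder values, not the exact ones from my project
USER_AGENT = 'Mozilla/5.0 (compatible; foo-crawler)'
DOWNLOAD_DELAY = 2.0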
Any ideas? Any suggestions on how I should debug this? It's difficult since there are no errors.
Here is the complete log of a crawl with 0 errors: http://pastebin.com/psqx6bck
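One thing I've been thinking of trying, in case the sitemap itself only exposes a handful of /bostad/ URLs rather than the spider dropping them, is to parse the sitemap outside Scrapy and count the matching entries. A rough sketch, assuming the requests library is available and that sitemap.xml is a plain urlset rather than a sitemap index (it would need an extra loop over child sitemaps if it is):

import re
import requests
from scrapy.utils.sitemap import Sitemap

body = requests.get("http://www.foo.se/sitemap.xml").content
sm = Sitemap(body)                      # iterates the <url> entries as dicts
urls = [entry['loc'] for entry in sm]
matching = [u for u in urls if re.search('/bostad/', u)]
print(sm.type, len(urls), "entries,", len(matching), "matching /bostad/")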