python - Dynamically change Scrapy Request scheduler priority


I'm using Scrapy to perform tests on an internal web app. Once all my tests are done, I use a CrawlSpider to check everywhere: for each response I run an HTML validator and look for media files that return 404.

It works very well except for one thing: the crawl at the end fetches pages in random order, so URLs that perform a delete operation are being requested before the other operations.

I would like to schedule all deletes at the end. I have tried many ways, such as this kind of scheduler:

    from scrapy import log

    class DeleteDelayer(object):
        def enqueue_request(self, spider, request):
            if request.url.find('delete') != -1:
                log.msg("delay %s" % request.url, log.DEBUG)
                request.priority = 50

But it does not work. I see the deletes being logged as "delay", yet they are still executed in the middle of the crawl.

I thought of using a middleware that could pile up the delete URLs in memory and put them back in when the spider_idle signal is called, but I'm not sure how to do this.

What is the best way to achieve this?

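  1. The default priority of a Request is 0, and requests with a higher priority value are processed earlier, so setting the priority to 50 does not delay the deletes (it moves them forward). Negative values are allowed for low-priority requests.
  2. You can use a middleware to collect the 'delete' requests (insert them into your own queue, e.g. a Redis set) and ignore them (raise an IgnoreRequest exception).
  3. Start a second crawl with the requests loaded from the queue in step 2.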
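
Steps 2 and 3 could be sketched roughly as below. This is only a minimal illustration, not a tested production setup: the class name DeleteCollectorMiddleware, the helper load_delete_urls and the JSON file are all made up for the example, and an in-memory set plus a file stand in for the Redis set. It assumes the middleware is enabled through DOWNLOADER_MIDDLEWARES and that matching on the substring 'delete' is good enough, as in the question.

```python
import json
from pathlib import Path

try:
    from scrapy.exceptions import IgnoreRequest
except ImportError:
    # Stand-in so the sketch can be exercised without Scrapy installed.
    class IgnoreRequest(Exception):
        pass


class DeleteCollectorMiddleware:
    """Downloader middleware that sets 'delete' requests aside for a later crawl.

    Hypothetical name; it would be enabled via DOWNLOADER_MIDDLEWARES.
    """

    def __init__(self):
        # In-memory stand-in for the external queue (e.g. a Redis set).
        self.delete_urls = set()

    def process_request(self, request, spider=None):
        if "delete" in request.url:
            self.delete_urls.add(request.url)
            # Raising IgnoreRequest drops the request from the first crawl.
            raise IgnoreRequest("deferred: %s" % request.url)
        return None  # every other request continues normally

    def dump(self, path):
        """Write the collected URLs to a JSON file for the second crawl."""
        Path(path).write_text(json.dumps(sorted(self.delete_urls)))


def load_delete_urls(path):
    """Read the URLs back, e.g. to use as start_urls of the second spider."""
    return json.loads(Path(path).read_text())
```

In a real project, dump would most likely be called from a handler connected to the spider_closed signal, and the second spider would set its start_urls from load_delete_urls. Because the deletes only run in that second crawl, they cannot be interleaved with the validation requests of the first one.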
