1. Installing the Redis Service on Windows

Windows installer download link:
https://github.com/microsoftarchive/redis/releases/download/win-3.0.504/Redis-x64-3.0.504.msi

Redis GUI client (Redis Desktop Manager) download:
https://github.com/uglide/RedisDesktopManager/releases/download/0.9.3/redis-desktop-manager-0.9.3.817.exe

If any installation or configuration step is unclear, these walkthroughs may help (the first blog is recommended; it covers installation and configuration in detail):
https://www.cnblogs.com/jaign/articles/7920588.html

https://jingyan.baidu.com/article/0f5fb099045b056d8334ea97.html

2. Installing and Configuring scrapy-redis on Windows

2.1. Installation:

pip install scrapy-redis

2.2. settings.py configuration:

1. Whether to obey robots.txt (usually disabled for crawling):
ROBOTSTXT_OBEY = False

2. Log level:
# LOG_LEVEL = "DEBUG"
LOG_LEVEL = "WARNING"

3. scrapy-redis settings:
# scrapy-redis: store the request queue and the dedup fingerprint set in Redis
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
SCHEDULER_PERSIST = True  # keep the queue and fingerprints in Redis after the crawl ends
DOWNLOAD_DELAY = 3

# Redis connection -- pick ONE of the variants below.

# Redis without a password:
# REDIS_URL = 'redis://192.168.12.209:6379/8'

# Redis with a password, via separate settings:
REDIS_HOST = '30.6.252.40'
REDIS_PORT = 6379
REDIS_PARAMS = {
    'password': '123456',
}
# Redis with a password packed into the URL (account name before the colon):
# REDIS_URL = 'redis://username:123456@127.0.0.1:6379/10'
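The REDIS_URL variants above pack host, port, credentials, and database number into one string. As a sanity check, the standard library can split such a URL apart (the credentials and host here are made up for illustration):

```python
from urllib.parse import urlsplit

# Anatomy of the connection string: redis://<user>:<password>@<host>:<port>/<db>
url = urlsplit('redis://user:123456@127.0.0.1:6379/10')

host = url.hostname          # '127.0.0.1'
port = url.port              # 6379 (parsed as int)
password = url.password      # '123456'
db = url.path.lstrip('/')    # '10' -- the trailing path is the database number
```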

# Pipeline configuration; enable RedisPipeline to also store items in Redis
ITEM_PIPELINES = {
    'ddbooks.pipelines.InfoPipeline': 100,
    'ddbooks.pipelines.DdbooksPipeline': 300,
    # 'scrapy_redis.pipelines.RedisPipeline': 400,
}
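InfoPipeline and DdbooksPipeline are specific to the ddbooks project and their code is not shown here. As a rough sketch, an item pipeline is just a class with a `process_item` method (the `cleaned` field is hypothetical, for illustration only):

```python
# Hypothetical minimal pipeline sketching the interface the two
# project pipelines above implement (real code lives in ddbooks/pipelines.py).
class InfoPipeline:
    def process_item(self, item, spider):
        # Pipelines run in ascending priority order (100 before 300);
        # whatever this one returns is handed to the next pipeline.
        item['cleaned'] = True
        return item

pipeline = InfoPipeline()
out = pipeline.process_item({'title': 'demo'}, spider=None)
```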

3. The Three scrapy-redis Spider Templates

For reference:
Official documentation: https://scrapy-redis.readthedocs.io/en/stable/
Source code: https://github.com/rmax/scrapy-redis

3.1. Inheriting from CrawlSpider:

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class DmozSpider(CrawlSpider):
    name = 'dmoz'
    allowed_domains = ['dmoztools.net']
    start_urls = ['http://www.dmoztools.net/']

    rules = [
        # restrict_xpaths: XPath of the page region whose links should be followed
        Rule(LinkExtractor(restrict_xpaths=('')), callback='parse_directory', follow=True),
    ]

    def parse_directory(self, response):
        # callback invoked for every followed page
        pass

3.2. Inheriting from RedisSpider:

from scrapy_redis.spiders import RedisSpider


class MySpider(RedisSpider):
    name = 'myspider_redis'
    # start URLs are popped from this Redis list instead of a start_urls attribute
    redis_key = 'myspider:start_urls'

    def parse(self, response):
        pass
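A RedisSpider has no start_urls: every idle spider process pops URLs from the Redis list named by redis_key and turns each one into a request. A toy model of that loop, with a plain Python list standing in for the Redis list (this only illustrates the behavior, it is not scrapy_redis's actual code):

```python
# A plain list stands in for the Redis list at 'myspider:start_urls'.
start_urls_queue = []

def lpush(queue, url):
    # Redis LPUSH inserts at the head of the list
    queue.insert(0, url)

def next_request(queue):
    # scrapy_redis pops a URL from the list and builds a Request from it;
    # here we simply return the raw URL (or None when the queue is empty)
    return queue.pop(0) if queue else None

lpush(start_urls_queue, 'http://www.dmoztools.net/')
```

In practice you seed the queue from the Redis side, e.g. `redis-cli lpush myspider:start_urls http://www.dmoztools.net/`, and the idle spider picks it up and starts crawling.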

3.3. Inheriting from RedisCrawlSpider:

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Rule
from scrapy_redis.spiders import RedisCrawlSpider


class MyCrawler(RedisCrawlSpider):
    name = 'mycrawler_redis'
    redis_key = 'mycrawler:start_urls'

    rules = (
        # restrict_xpaths: XPath of the page region whose links should be followed
        Rule(LinkExtractor(restrict_xpaths=('')), callback='parse_page', follow=True),
    )

    def parse_page(self, response):
        # callback invoked for every followed page
        pass