Scrapy

Used: Python 2.7.9, Scrapy, Git
System: OS X (10.x)

How to run it:
scrapy crawl dalin -o dainhuang.json -t json

Category path:
Department >> Sub_Department >> Sub_Sub_Department >> this_category
(example): Home >> TV & Video >> Televisions >> 25 - 31" Televisions
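
The path format above can be sketched with two small helpers. This is only an illustration, not the spider's actual code: the names `join_category_path` and `split_category_path` are hypothetical, and the sketch assumes that no breadcrumb part contains `>>` itself.

```python
# Hypothetical helpers illustrating the "A >> B >> C" category path format.
SEPARATOR = " >> "


def join_category_path(parts):
    """Join breadcrumb parts into a single category path string."""
    # Trim stray whitespace and skip empty parts scraped from the page.
    cleaned = [part.strip() for part in parts if part and part.strip()]
    return SEPARATOR.join(cleaned)


def split_category_path(path):
    """Split a category path string back into its breadcrumb parts."""
    return [part.strip() for part in path.split(">>") if part.strip()]


if __name__ == "__main__":
    parts = ["Home", "TV & Video", "Televisions", '25 - 31" Televisions']
    print(join_category_path(parts))
```

Keeping the path as one string makes each JSON record self-describing, while `split_category_path` recovers the individual levels when they are needed.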

Features:
• Fast: 6639 products retrieved in 5 min (DOWNLOAD_DELAY = 0.01), 7 min (DOWNLOAD_DELAY = 0.01)
• Product info is retrieved from the product list pages (15 products per page), which is much faster than visiting each product page
• No wasted requests: the spider is guided through the categories
• The Gift Card and Bundles Sub_Departments are handled separately, since their page formats are different
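
To see why scraping the list pages pays off, a rough request count helps. The sketch below uses only the two figures quoted above (6639 products, 15 per list page). It assumes every list page is full, so the result is a lower bound: each category ends on a partial page, and the requests for category navigation are not counted. The function name `list_page_requests` is made up for this example.

```python
import math

PRODUCTS_TOTAL = 6639     # figure quoted in the Features list
PRODUCTS_PER_PAGE = 15    # products shown on each list page


def list_page_requests(total_products, per_page):
    """Minimum number of list pages needed to cover all products."""
    return int(math.ceil(total_products / float(per_page)))


if __name__ == "__main__":
    by_list = list_page_requests(PRODUCTS_TOTAL, PRODUCTS_PER_PAGE)
    print("list-page requests (lower bound): %d" % by_list)
    print("product-page requests: %d" % PRODUCTS_TOTAL)
```

Under these assumptions the list-page approach needs at least 443 requests, against 6639 if every product page were fetched.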

Known Issues:
• The output data structure (JSON) is not well formatted, but the data is still easy to handle as JSON
• Some category names shown on the http://www.visions.ca/ home page differ from those shown inside the sub-category pages
[ I used the category path shown on the product page ]
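
As an illustration of handling the output, here is a minimal sketch that tidies the records and groups them by category. The field names (`category`, `title`, `price`, `availability`) and the sample record are assumptions made for this example, not taken from the actual output file. The sketch also assumes that field values may arrive as lists of string fragments, which is what Scrapy selectors return from `extract()`.

```python
import json
from collections import defaultdict

# Placeholder record; field names and values are assumed for illustration.
SAMPLE = """
[
  {"category": ["Home >> TV & Video >> Televisions"],
   "title": ["  Example TV  "],
   "price": ["$0.00"],
   "availability": ["In stock"]}
]
"""


def flatten(value):
    """Collapse a list of string fragments into one trimmed string."""
    if isinstance(value, list):
        # Assumes every list element is a string.
        parts = [part.strip() for part in value if part.strip()]
        return " ".join(parts)
    return value


def group_by_category(items, key="category"):
    """Group flattened item dicts by their (assumed) category field."""
    grouped = defaultdict(list)
    for item in items:
        clean = {name: flatten(value) for name, value in item.items()}
        grouped[clean.get(key)].append(clean)
    return dict(grouped)


if __name__ == "__main__":
    products = json.loads(SAMPLE)
    for category, entries in group_by_category(products).items():
        print("%s: %d product(s)" % (category, len(entries)))
```

To run this against the real file, `json.loads(SAMPLE)` could be swapped for `json.load` on an open handle to dainhuang.json, with `key` adjusted to whatever field the spider actually emits.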

REQUIREMENT:

A simple tool that will scrape product information from http://www.visions.ca/, returning at a minimum the following information:
• The product categories available on the website
• At least one product per category
Each product returned should have the following information:
• Product title
• Product sale or regular price where applicable
• Product availability
As a bonus (but not doing this will not count against you), you may want to use the following in your solution:
• Python
• The scrapy, lxml, or requests Python libraries
• XPath or CSS selectors
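
The minimum requirements above can be checked mechanically once the data is scraped. The following is a hedged sketch only: the field names in `REQUIRED_FIELDS`, the `category` key, and both function names are hypothetical and would need to match the spider's real item fields.

```python
REQUIRED_FIELDS = ("title", "price", "availability")  # assumed field names


def missing_fields(product):
    """Return the required fields that are absent or empty in a product."""
    return [name for name in REQUIRED_FIELDS if not product.get(name)]


def categories_without_products(categories, products, key="category"):
    """Return the categories that have no scraped product at all."""
    seen = {product.get(key) for product in products}
    return [category for category in categories if category not in seen]


if __name__ == "__main__":
    # Placeholder data for illustration only.
    products = [{"category": "Televisions", "title": "Example TV", "price": ""}]
    print(missing_fields(products[0]))
    print(categories_without_products(["Televisions", "Bundles"], products))
```

An empty result from both functions would indicate that every category has at least one product and that each product carries a title, a price, and an availability value.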
