Scraping an E-Commerce Site using Selenium in Python, Crawling Pages || Automate Boring Stuff
#selenium #python #automateboringstuff
In this video we will cover the below-mentioned topics:
The video is a working example of how to create a web scraper for an e-commerce website; it's a real working example in under 30 minutes.
The course material is in the GitHub repo:
Please subscribe to my channel by clicking on the link:
First of all, thanks a lot for your great videos. Subscribed. By the way, I am scraping the same website following your tutorial, but I am facing two problems when I go from one page to another (the first page is scraped fine, but the problem starts with the 2nd page). Here they are:
1.
Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1464.0 Safari/537.36
100%|██████████| 6/6 [00:05<00:00, 1.08it/s]
Traceback (most recent call last):
File "C:/Users/kaium/twitter/ecommerce.py", line 93, in
p_a = p.find_element_by_tag_name(‘a’)
File “C:UserskaiumAnaconda3libsite-packagesseleniumwebdriverremotewebelement.py”, line 305, in find_element_by_tag_name
return self.find_element(by=By.TAG_NAME, value=name)
File “C:UserskaiumAnaconda3libsite-packagesseleniumwebdriverremotewebelement.py”, line 658, in find_element
return self._execute(Command.FIND_CHILD_ELEMENT,
File “C:UserskaiumAnaconda3libsite-packagesseleniumwebdriverremotewebelement.py”, line 633, in _execute
return self._parent.execute(command, params)
File “C:UserskaiumAnaconda3libsite-packagesseleniumwebdriverremotewebdriver.py”, line 321, in execute
self.error_handler.check_response(response)
File “C:UserskaiumAnaconda3libsite-packagesseleniumwebdriverremoteerrorhandler.py”, line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {“method”:”css selector”,”selector”:”a”}
(Session info: chrome=90.0.4430.212)
2. It shows a cookie banner: "We use cookies to make your Web Scraper experience better. Learn more. Accept & Continue"
How do I get past this?
I will be very grateful if you answer me.
Thanks in advance!
I am not really sure how else to say it: all these techniques exist to prevent the website from being scraped, and they cannot be completely bypassed. You need to write the same logic you follow when you log in as a user; that's the whole point of scraping automation, mimicking a user's actions.
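For the cookie banner specifically, you can mimic the user by clicking the "Accept & Continue" button once before scraping starts. A minimal sketch, assuming webD is the Chrome driver from the video and guessing at the button locator (inspect the real banner to confirm it):

from selenium.common.exceptions import NoSuchElementException

try:
    # Locator is an assumption -- inspect the banner for the real one.
    accept = webD.find_element_by_xpath("//a[contains(., 'Accept & Continue')]")
    accept.click()
except NoSuchElementException:
    pass  # banner not shown, just continue scraping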
Thanks for this amazing tutorial. But I have a problem: I'm using the same script on a different website, and it gets stuck at the second page with the error 'the element is not attached to the page document'.
I'm also having problems in the while loop: it gives me only the elements of the first page and of the page where the code stopped, repeated for the length of the loop. Can you please help me solve this issue?
The same script would not work on a different website; you need to write the logic based on the new website.
Thank you for sharing the web scraping step by step with clear explanations, and your video editing is amazing as well.
Glad it was helpful!
Thanks for this amazing tutorial. But I have a problem: I'm using the same script on a different website, and while collecting the product links, at some point the script duplicates half of the links, so at the end I get double the data I should. How can I avoid this problem? And how can I get information from only 10 pages of products, i.e. restrict the pagination? Thanks.
Below are some pointers which might help you:
1. Same script for different websites: lucky you! Normally you need to write custom code for each website, since something or other will be different, so you need to accommodate those changes.
2. Duplicate records are easy to handle in post-processing: load them in pandas and de-duplicate them using pandas functionality.
3. Restricting the pagination is even simpler than the above two points: either use a for loop which stops at 10, or write a condition in the while loop which exits at 10. A sketch of points 2 and 3 follows below.
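For points 2 and 3, a minimal sketch in the style of the video's code (the 'prod' class name is a stand-in for whatever the real site uses):

import pandas as pd

listOflinks = []
for page in range(10):  # point 3: a for loop that stops after 10 pages
    productInfoList = webD.find_elements_by_class_name('prod')  # hypothetical class
    for el in productInfoList:
        listOflinks.append(el.find_element_by_tag_name('a').get_attribute('href'))
    # ... click the "Next" button here, and break if there is none ...

# point 2: de-duplicate in post-processing with pandas
df = pd.DataFrame({'link': listOflinks})
df = df.drop_duplicates()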
Hi, great tutorial, it really helped me a lot. I have a question: how do I scrape stuff from a website with a "Show More" button that doesn't change the URL after being clicked?
The approach would be:
1. Load the page.
2. Scroll to the bottom and click "Show more".
3. Keep scrolling and clicking until "Show more" no longer appears.
4. Then scrape the data.
If the page URL changes instead, scrape the data page by page accordingly. A rough sketch is below.
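A rough sketch of that loop, assuming the button can be found by its "Show more" text (adjust the locator to the real page):

import time
from selenium.common.exceptions import NoSuchElementException, ElementNotInteractableException

while True:
    try:
        btn = webD.find_element_by_xpath("//button[contains(., 'Show more')]")
        webD.execute_script("arguments[0].scrollIntoView();", btn)  # scroll to the button
        btn.click()
        time.sleep(2)  # give the newly loaded items time to appear
    except (NoSuchElementException, ElementNotInteractableException):
        break  # no "Show more" left -- everything is loaded, scrape now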
This is really helpful. I like that you didn't make it look perfect and bug-free; those errors made me feel safe while working on it.
Glad it helped!
Thank you so much, your video helped me a lot. Your way of explanation is super and clear.
Glad it helped!
Awesome video tutorial, thank you!! Also, from time to time I use the e-scraper scraping service on demand; maybe it will help somebody too.
Sounds great!
Wow, amazing sir, it's awesome. Please make more videos on this. I have two questions, if you would please answer: 1. If the website is dynamic, like weedmaps, what more code do we have to add? 2. If we do the same as you did, and after getting some data the website asks for human verification, how can we pass the human verification, or how can we get the data while avoiding the verification or captcha? Thanks, waiting for your reply.
Hi guys, I tried scraping an e-commerce site with Selenium the same way as in this video, but it took 30 seconds to get just one product's details. It was too slow. Could you please help?
@Technology for Noobs Thanks for the reply, sir. There is an extension called Web Scraper, and they also have a website; it is expensive, but I saw on their website that they use methods like proxies and delay times, as you said. I don't know how to create a tool like this. Can you suggest how I can learn these things? I am a Python student; how can you help me with this?
Hi, just to be clear: most of the websites you scrape more or less allow you to do it, and the ones where you get all the human-verification popups are meant to stop you from scraping, because high-speed requests increase the load on their servers, which they are paying for. The simple solution is to introduce some randomness into your scraping activity, slow down the speed, and don't use any login credentials, as that hits them directly and tells them exactly who you are. There are many standard approaches which people follow; just read about them and you will get the idea. My suggestion: if you have to scrape such a website, don't take that work, as it's not going to work for huge scraping projects.
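A minimal sketch of the randomness idea: pause a random few seconds between page loads so the traffic looks less machine-like (product_urls is assumed to be the list of links collected earlier):

import random
import time

for url in product_urls:
    webD.get(url)
    # ... scrape the product details here ...
    time.sleep(random.uniform(2, 7))  # random 2-7 second pause between requests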
Hi. Great work.
I am getting a problem with my code: it gets stuck at page 2. The error says that 'the element is not attached to the page document'. I know this means that it cannot find 'h4' on the page. I have tried using XPath, but it still cannot find the element. Is there a solution to this?
Could you please pull the latest code from the git repo? I have fixed some code there.
@Technology for Noobs Yes, I am following your tutorial. It works fine for page 1 but gets stuck on page 2 and shows the error I have mentioned above.
How can I help you? Are you using the same website and same code?
Thanks for the perfect tutorial, but I have a problem at 23:24 with the same code: it is not installing the package and is showing an error. Need your help regarding this.
pip install tqdm should work; just make sure to activate your env if you have one.
waiting for reply
@Technology for Noobs Tqdm package
which package?
Can you please explain at 9:29 why you used [-1]? What is the significance of using it there?
[-1] on a list gives the last element of the list; see negative indexing for more details.
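For illustration, negative indices count from the end of the list:

pager = ['1', '2', '3', 'Next']  # e.g. the texts of the pagination elements
print(pager[-1])  # 'Next' -- the last element, which is the button we want to click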
Thank you Man TBH very well explained
Glad it helped
absolutely amazing
Super!
Hi. Thanks for the video. I'm having problems in the while loop. For some reason, the while loop doesn't stop after we get to page 20, and it continues scraping the links of the items on it. Can someone help me?
The problem is that I needed to change the code for the same issue; my suggestion would be to pull the code from the GitHub link and try again.
@Technology for Noobs I copied and pasted exactly what you wrote, but the while loop never exits :/
Check the condition!
Good job mate 🙂 Of many tutorials, yours is the only one that worked from start to end (Jan 2021, Python 3).
Great to hear!
How do I scrape a date interval, for example just the information between 02/01/2021 and 02/28/2021? Suppose each object contains a date and I just want the ones within a specific date range.
@Technology for Noobs How can I do it if the object doesn't have a link, just a span class?
If the website exposes the info in a filter format, good; but if not, then we need to filter it during processing.
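If there is no filter on the site, one way (a sketch, assuming each scraped record has a 'date' field) is to filter with pandas during processing:

import pandas as pd

df = pd.DataFrame(records)               # records scraped earlier, with a 'date' field
df['date'] = pd.to_datetime(df['date'])
mask = (df['date'] >= '2021-02-01') & (df['date'] <= '2021-02-28')
df_in_range = df[mask]                   # only the rows inside the range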
Thank you so much for your efforts.
I have just one question: how do I extract the data into a CSV/Excel file?
Thanks!
Use the pandas library and its .to_csv(filename) method; this will export the data for you.
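A minimal sketch, assuming the scraped rows are collected as a list of dicts:

import pandas as pd

rows = [{'name': 'item 1', 'price': 9.99}]  # your scraped records go here
df = pd.DataFrame(rows)
df.to_csv('products.csv', index=False)      # index=False drops pandas' row numbers
# for Excel instead: df.to_excel('products.xlsx', index=False) -- needs openpyxl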
@Technology for Noobs Please make a complete tutorial, sir.
As I told you, I have another problem with the pager loop. The web page I am scraping is ferreteria.cl, and the code that is not working for me is this, in case you can help me:
listOflinks = []
condition = True
#while condition:
productInfoList = webD.find_elements_by_class_name('prod')
for el in productInfoList:
    ppp = el.find_element_by_class_name('img')
    listOflinks.append(ppp.get_attribute('href'))
try:
    kk = webD.find_elements_by_class_name('paginate')[-1]
    print(kk.get_attribute('aria-label'))
    if kk.get_attribute('aria-label') == 'Next':
        kk.click()
Your code seems to be working fine in my system.
listOflinks = []
condition = True
while condition:
    productInfoList = webD.find_elements_by_class_name('prod')
    for el in productInfoList:
        ppp = el.find_element_by_class_name('img')
        listOflinks.append(ppp.get_attribute('href'))
    try:
        kk = webD.find_elements_by_class_name('paginate')[-1]
        print(kk.get_attribute('aria-label'))
        if kk.get_attribute('aria-label') == 'Next':
            kk.click()
        else:
            condition = False  # last page reached, exit the loop
    except:
        print('Except')
        condition = False  # no pager found, exit the loop
Thank you so much for the great work. Can you please show how to handle stale element reference issue?
Stale means the page has reloaded or the DOM has changed since you found the element, so your old reference is no longer attached to the page; find the element again after the page changes.
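A common pattern is to catch the exception and look the element up again, since the old reference points at a DOM node that no longer exists. A hedged sketch:

from selenium.common.exceptions import StaleElementReferenceException

def text_with_retry(driver, find_fn, attempts=3):
    # Re-find the element and retry if the reference has gone stale.
    for _ in range(attempts):
        try:
            return find_fn(driver).text
        except StaleElementReferenceException:
            continue  # the DOM changed under us -- look it up again
    raise StaleElementReferenceException('element kept going stale')

# usage: title = text_with_retry(webD, lambda d: d.find_element_by_tag_name('h4'))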