
Scraping an E-Commerce Site using Selenium in Python, Crawling Pages || Automate Boring Stuff

#selenium #python #automateboringstuff
In this video we will cover the below-mentioned topics:

The video is a working example of how to create a web scraper for an e-commerce website; it's a real, working example in less than 30 minutes.

The course material is in the GitHub repo:

Please subscribe to my channel by clicking on the link:

https://www.educational.guru

52 comments

  1. abu kaium

First of all, thanks a lot for your great videos. Subscribed. By the way, I am scraping the same website following your tutorial, but I am facing 2 problems when I go from one page to another (the first page is scraped fine, but the problems start with the 2nd page).
    Here they are:
    1.
    Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1464.0 Safari/537.36
    100%|██████████| 6/6 [00:05<00:00, 1.08it/s]
Traceback (most recent call last):
      File "C:/Users/kaium/twitter/ecommerce.py", line 93, in <module>
        p_a = p.find_element_by_tag_name('a')
      File "C:\Users\kaium\Anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py", line 305, in find_element_by_tag_name
        return self.find_element(by=By.TAG_NAME, value=name)
      File "C:\Users\kaium\Anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py", line 658, in find_element
        return self._execute(Command.FIND_CHILD_ELEMENT,
      File "C:\Users\kaium\Anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py", line 633, in _execute
        return self._parent.execute(command, params)
      File "C:\Users\kaium\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
        self.error_handler.check_response(response)
      File "C:\Users\kaium\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
        raise exception_class(message, screen, stacktrace)
    selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"a"}
    (Session info: chrome=90.0.4430.212)

2. It shows a cookie banner: "We use cookies to make your Web Scraper experience better. Learn more. Accept & Continue"
    How do I handle this?
    I will be very grateful if you answer me.
    Thanks in advance!

    1. Technology for Noobs

I am not really sure how else to say it: all these techniques exist to prevent the website from being scraped, and they cannot be completely bypassed. You need to write logic for the same steps you would perform when you log in as a user; that's the whole point of scraping automation, mimicking a user's actions. For the cookie popup, a sketch is below.
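
      A minimal sketch of clicking that cookie banner, assuming the Selenium 3-style API used in the video; the URL and the button locator are assumptions, not taken from the actual site:

      from selenium import webdriver
      from selenium.common.exceptions import NoSuchElementException

      webD = webdriver.Chrome()
      webD.get('https://example-shop.test')  # hypothetical URL, stands in for the site from the video

      try:
          # Click the consent button once, exactly as a user would.
          accept = webD.find_element_by_xpath("//button[contains(., 'Accept & Continue')]")
          accept.click()
      except NoSuchElementException:
          pass  # no banner this session; carry on scraping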

  2. OMAYMA LAICHIR

Thanks for this amazing tutorial. But I have a problem: I'm using the same script on a different website, and it gets stuck at the second page and gives me an error. The error says 'the element is not attached to the page document'.
    I'm also having problems in the while loop: it gives me only the elements of the first page (the page where the code stopped), repeated for the length of the loop. Can you please help me solve this issue?

  3. Cristh Tejada

Thanks for this amazing tutorial. But I have a problem: I'm using the same script on a different website, but while collecting the product links, at some point the script duplicates half of the links, so at the end I get double the data I should. How can I avoid this problem? And how can I set it to get information from only 10 pages of products, i.e. restrict the pagination? Thanks

    1. Technology for Noobs

Below are some pointers which might help you:
      1. Same script for different websites? Lucky you. You need to write custom code for every website; something or other will be different, so you need to accommodate those changes.
      2. Duplicate records are easy to handle in post-processing: load them into pandas and de-duplicate them using pandas functionality.
      3. Restricting the pagination is even simpler than the above two points: either use a for loop which stops at 10, or write a condition in the while loop which breaks out at 10. A sketch of points 2 and 3 is below.
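
      A minimal sketch of points 2 and 3, assuming the links were collected into a plain Python list (the list contents here are placeholders):

      import pandas as pd

      listOflinks = ['url1', 'url2', 'url1']  # placeholder data for illustration

      # Point 2: de-duplicate the collected links in post-processing with pandas.
      unique_links = pd.Series(listOflinks).drop_duplicates().tolist()
      print(unique_links)  # ['url1', 'url2']

      # Point 3: cap the pagination at 10 pages with a counted loop.
      for page in range(10):
          # ... scrape the current page and click "Next" here ...
          pass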

2. Technology for Noobs

The approach would be:
      load the page,
      scroll to the bottom and click the "Show more" button,
      keep scrolling and clicking until "Show more" no longer appears,
      then scrape the data.
      If the page changes, scrape the data accordingly. A sketch is below.
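
      A minimal sketch of that approach, again assuming the Selenium 3-style API from the video; the URL and the button text are assumptions:

      import time
      from selenium import webdriver
      from selenium.common.exceptions import (ElementNotInteractableException,
                                              NoSuchElementException)

      webD = webdriver.Chrome()
      webD.get('https://example-shop.test')  # hypothetical URL

      while True:
          # Scroll to the bottom so the button (and any lazy content) renders.
          webD.execute_script('window.scrollTo(0, document.body.scrollHeight);')
          time.sleep(1)  # crude wait for new content to load
          try:
              webD.find_element_by_xpath("//button[contains(., 'Show more')]").click()
          except (NoSuchElementException, ElementNotInteractableException):
              break  # "Show more" is gone: everything is loaded, scrape now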

  4. Asif Abdullah

Wow, amazing sir, it's awesome. Please make more videos on this. And there are two questions, if you would please answer: 1. If the website is dynamic, like weedmaps, do we have to add more code? 2. If we do the same as you did, then after getting some data the website asks for human verification. How can we pass the human verification, or how can we get the data while avoiding the human verification or captcha? Thanks, waiting for your reply.

    1. Praveen Kumar

Hi guys, I tried scraping an e-commerce site through Selenium the same way as in this video, but it took 30 seconds to get just one product's details. It was too slow. Could you please help?

    2. Asif Abdullah

@Technology for Noobs thanks for the reply, sir. There is an extension called Web Scraper, and they also have a website; the website is expensive, but I saw on their website that they use some methods like proxies and delay times, as you said. But I don't know how to create a tool like this. Can you suggest how I can learn these things? I am a student of Python; how can you help me with this?

    3. Technology for Noobs

Hi, just to be clear: most of the websites you scrape kind of allow you to do it, and the human-verification popups you get are meant to stop you from scraping, because high-speed requests increase the load on their servers, which they are paying for. The simple solution is to introduce some randomness into your scraping activity, slow down the speed, and don't use any login credentials, as those directly identify you and tell them you are the one. There are many standard approaches which people follow; just read about them and you will get the idea. And my suggestion: if you have to scrape such a website, don't take that work, it's not going to work for huge scraping projects. A sketch of the randomness idea is below.
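
      A minimal sketch of the randomness idea; the delay bounds are made up, and product_links stands in for whatever links were collected earlier:

      import random
      import time

      def polite_pause(min_s=2.0, max_s=6.0):
          # Sleep for a random interval so requests don't arrive at a machine-regular rate.
          time.sleep(random.uniform(min_s, max_s))

      product_links = []  # assume this list was filled by the link-collection loop
      for url in product_links:
          # webD.get(url)  # fetch and scrape the product page here
          polite_pause()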

  5. Mohammad Haris

    Hi. Great work.
I am getting a problem with my code: it gets stuck at page 2. The error says 'the element is not attached to the page document'. I know this means it cannot find 'h4' on the page. I have tried using XPath, and it still cannot find the element. Is there a solution to this?

  6. Guillermo Rodríguez

Hi. Thanks for the video. I'm having problems in the while loop. For some reason, the while loop doesn't stop after we get to page 20, and it continues scraping the links of the items on it. Can someone help me?

  7. Carlos Luis

How do I scrape a date interval, for example, just the information between 02/01/2021 and 02/28/2021? Suppose each object contains a date and I just want the ones within a specific date range.

  8. Elias Grayde

As I told you, I have another problem with the pager loop. The web page I am working on is ferreteria.cl, and this is the code that is not working for me, to see if you can help me:

listOflinks = []
    condition = True
    #while condition:
    productInfoList = webD.find_elements_by_class_name('prod')
    for el in productInfoList:
        ppp = el.find_element_by_class_name('img')
        listOflinks.append(ppp.get_attribute('href'))
    try:
        kk = webD.find_elements_by_class_name('paginate')[-1]
        print(kk.get_attribute('aria-label'))
        if kk.get_attribute('aria-label') == 'Next':
            kk.click()

    1. Technology for Noobs

Your code seems to be working fine on my system:

listOflinks = []
      condition = True
      while condition:
          productInfoList = webD.find_elements_by_class_name('prod')
          for el in productInfoList:
              ppp = el.find_element_by_class_name('img')
              listOflinks.append(ppp.get_attribute('href'))
          try:
              kk = webD.find_elements_by_class_name('paginate')[-1]
              print(kk.get_attribute('aria-label'))
              if kk.get_attribute('aria-label') == 'Next':
                  kk.click()
              else:
                  condition = False  # no Next button: last page reached, stop the loop
          except Exception:
              print('Except')
              condition = False  # pagination element missing: stop instead of looping forever
