Scraping an E-Commerce Site using Selenium in Python, Crawling Pages || Automate Boring Stuff
#selenium #python #automateboringstuff
In this video we will cover the below-mentioned topics:
The video is a working example of how to create a web scraper for an e-commerce website; it's a real working example in under 30 minutes.
The course material is in the GitHub repo:
Please subscribe to my channel by clicking on the link:
First of all, thanks a lot for your great videos. Subscribed. By the way, I am scraping the same website following your tutorial, but I am facing two problems when I go from one page to another (the first page is scraped fine, but the problem starts with the 2nd page). Here they are:
1.
Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1464.0 Safari/537.36
100%|██████████| 6/6 [00:05<00:00, 1.08it/s]
Traceback (most recent call last):
File "C:/Users/kaium/twitter/ecommerce.py", line 93, in
p_a = p.find_element_by_tag_name(‘a’)
File “C:UserskaiumAnaconda3libsite-packagesseleniumwebdriverremotewebelement.py”, line 305, in find_element_by_tag_name
return self.find_element(by=By.TAG_NAME, value=name)
File “C:UserskaiumAnaconda3libsite-packagesseleniumwebdriverremotewebelement.py”, line 658, in find_element
return self._execute(Command.FIND_CHILD_ELEMENT,
File “C:UserskaiumAnaconda3libsite-packagesseleniumwebdriverremotewebelement.py”, line 633, in _execute
return self._parent.execute(command, params)
File “C:UserskaiumAnaconda3libsite-packagesseleniumwebdriverremotewebdriver.py”, line 321, in execute
self.error_handler.check_response(response)
File “C:UserskaiumAnaconda3libsite-packagesseleniumwebdriverremoteerrorhandler.py”, line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {“method”:”css selector”,”selector”:”a”}
(Session info: chrome=90.0.4430.212)
2. It shows a cookie banner: "We use cookies to make your Web Scraper experience better. Learn more. Accept & Continue"
How do I get past this?
I will be very grateful if you answer me.
Thanks in advance!
I am not really sure how else to say it: all these techniques exist to prevent the website from being scraped, and they cannot be completely bypassed. You need to write the same logic you follow when you log in as a user; that's the whole point of scraping automation, mimicking a user's actions.
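For the cookie banner specifically, you can mimic the user by clicking the "Accept & Continue" button once before scraping starts. A minimal sketch, assuming webD is the Chrome driver from the video and guessing at the button locator (inspect the real banner to confirm it):

from selenium.common.exceptions import NoSuchElementException

try:
    # Locator is an assumption -- inspect the banner for the real one.
    accept = webD.find_element_by_xpath("//a[contains(., 'Accept & Continue')]")
    accept.click()
except NoSuchElementException:
    pass  # banner not shown, just continue scraping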
Thanks for this amazing tutorial. But I have a problem: I'm using the same script on a different website, and it gets stuck at the second page with the error 'the element is not attached to the page document'.
I'm also having problems in the while loop: it gives me only the elements of the first page and of the page where the code stopped, repeated for the length of the loop. Can you please help me solve this issue?
The same script would not work on a different website; you need to write the logic based on the new website.
Thank you for sharing the web scraping step by step with clear explanations, and your video editing is amazing as well.
Glad it was helpful!
Thanks for this amazing tutorial. But I have a problem: I'm using the same script on a different website, and while collecting the product links, at some point the script duplicates half of the links, so at the end I get double the data I should. How can I avoid this problem? And how can I get information from only 10 pages of products, i.e. restrict the pagination? Thanks.
Below are some pointers which might help you:
1. Same script for different websites: lucky you! Normally you need to write custom code for each website, since something or other will be different, so you need to accommodate those changes.
2. Duplicate records are easy to handle in post-processing: load them in pandas and de-duplicate them using pandas functionality.
3. Restricting the pagination is even simpler than the above two points: either use a for loop which stops at 10, or write a condition in the while loop which exits at 10. A sketch of points 2 and 3 follows below.
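For points 2 and 3, a minimal sketch in the style of the video's code (the 'prod' class name is a stand-in for whatever the real site uses):

import pandas as pd

listOflinks = []
for page in range(10):  # point 3: a for loop that stops after 10 pages
    productInfoList = webD.find_elements_by_class_name('prod')  # hypothetical class
    for el in productInfoList:
        listOflinks.append(el.find_element_by_tag_name('a').get_attribute('href'))
    # ... click the "Next" button here, and break if there is none ...

# point 2: de-duplicate in post-processing with pandas
df = pd.DataFrame({'link': listOflinks})
df = df.drop_duplicates()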
Hi, great tutorial, it really helped me a lot. I have a question: how do I scrape stuff from a website with a "Show More" button that doesn't change the URL after being clicked?
The approach would be:
1. Load the page.
2. Scroll to the bottom and click "Show more".
3. Keep scrolling and clicking until "Show more" no longer appears.
4. Then scrape the data.
If the page URL changes instead, scrape the data page by page accordingly. A rough sketch is below.
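A rough sketch of that loop, assuming the button can be found by its "Show more" text (adjust the locator to the real page):

import time
from selenium.common.exceptions import NoSuchElementException, ElementNotInteractableException

while True:
    try:
        btn = webD.find_element_by_xpath("//button[contains(., 'Show more')]")
        webD.execute_script("arguments[0].scrollIntoView();", btn)  # scroll to the button
        btn.click()
        time.sleep(2)  # give the newly loaded items time to appear
    except (NoSuchElementException, ElementNotInteractableException):
        break  # no "Show more" left -- everything is loaded, scrape now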
This is really helpful. I like that you didn't make it look perfect and bug-free; those errors made me feel safe while working on it.
Glad it helped!
Thank you so much, your video helped me a lot. Your way of explanation is super and clear.
Glad it helped!
Awesome video tutorial, thank you!! Also, from time to time I use the e-scraper scraping service on demand; maybe it will help somebody too.
Sounds great!
Wow, amazing sir, it's awesome. Please make more videos on this. I have two questions, if you would please answer: 1. If the website is dynamic, like weedmaps, what more code do we have to add? 2. If we do the same as you did, and after getting some data the website asks for human verification, how can we pass the human verification, or how can we get the data while avoiding the verification or captcha? Thanks, waiting for your reply.
Hi guys, I tried scraping an e-commerce site with Selenium the same way as in this video, but it took 30 seconds to get just one product's details. It was too slow. Could you please help?
@Technology for Noobs Thanks for the reply, sir. There is an extension called Web Scraper, and they also have a website; it is expensive, but I saw on their website that they use methods like proxies and delay times, as you said. I don't know how to create a tool like this. Can you suggest how I can learn these things? I am a Python student; how can you help me with this?
Hi, just to be clear: most of the websites you scrape more or less allow you to do it, and the ones where you get all the human-verification popups are meant to stop you from scraping, because high-speed requests increase the load on their servers, which they are paying for. The simple solution is to introduce some randomness into your scraping activity, slow down the speed, and don't use any login credentials, as that hits them directly and tells them exactly who you are. There are many standard approaches which people follow; just read about them and you will get the idea. My suggestion: if you have to scrape such a website, don't take that work, as it's not going to work for huge scraping projects.
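A minimal sketch of the randomness idea: pause a random few seconds between page loads so the traffic looks less machine-like (product_urls is assumed to be the list of links collected earlier):

import random
import time

for url in product_urls:
    webD.get(url)
    # ... scrape the product details here ...
    time.sleep(random.uniform(2, 7))  # random 2-7 second pause between requests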
Hi. Great work.
I am getting a problem with my code: it gets stuck at page 2. The error says that 'the element is not attached to the page document'. I know this means that it cannot find 'h4' on the page. I have tried using XPath, but it still cannot find the element. Is there a solution to this?
Could you please pull the latest code from the git repo? I have fixed some code there.
@Technology for Noobs Yes, I am following your tutorial. It works fine for page 1 but gets stuck on page 2 and shows the error I have mentioned above.
How can I help you? Are you using the same website and same code?
Thanks for the perfect tutorial, but I have a problem at 23:24 with the same code: it is not installing the package and is showing an error. Need your help regarding this.
pip install tqdm should work; just make sure to activate your env if you have one.
waiting for reply
@Technology for Noobs Tqdm package
which package?
Can you please explain at 9:29 why you used [-1]? What is the significance of using it there?
[-1] on a list gives the last element of the list; see negative indexing for more details.
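For illustration, negative indices count from the end of the list:

pager = ['1', '2', '3', 'Next']  # e.g. the texts of the pagination elements
print(pager[-1])  # 'Next' -- the last element, which is the button we want to click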
Thank you Man TBH very well explained
Glad it helped
absolutely amazing
Super!
Hi. Thanks for the video. I'm having problems in the while loop. For some reason, the while loop doesn't stop after we get to page 20, and it continues scraping the links of the items on it. Can someone help me?
The problem is that I needed to change the code for the same issue; my suggestion would be to pull the code from the GitHub link and try again.
@Technology for Noobs I copied and pasted exactly what you wrote, but the while loop never exits :/
Check the condition!
Good job mate 🙂 Of many tutorials, yours is the only one that worked from start to end (Jan 2021, Python 3).
Great to hear!
How do I scrape a date interval, for example just the information between 02/01/2021 and 02/28/2021? Suppose each object contains a date and I just want the ones within a specific date range.
@Technology for Noobs How can I do it if the object doesn't have a link, just a span class?
If the website exposes the info in a filter format, good; but if not, then we need to filter it during processing.
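If there is no filter on the site, one way (a sketch, assuming each scraped record has a 'date' field) is to filter with pandas during processing:

import pandas as pd

df = pd.DataFrame(records)               # records scraped earlier, with a 'date' field
df['date'] = pd.to_datetime(df['date'])
mask = (df['date'] >= '2021-02-01') & (df['date'] <= '2021-02-28')
df_in_range = df[mask]                   # only the rows inside the range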
Thank you so much for your efforts.
I have just one question: how do I extract the data into a CSV/Excel file?
Thanks!
Use the pandas library and its .to_csv(filename) method; this will export the data for you.
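A minimal sketch, assuming the scraped rows are collected as a list of dicts:

import pandas as pd

rows = [{'name': 'item 1', 'price': 9.99}]  # your scraped records go here
df = pd.DataFrame(rows)
df.to_csv('products.csv', index=False)      # index=False drops pandas' row numbers
# for Excel instead: df.to_excel('products.xlsx', index=False) -- needs openpyxl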
@Technology for Noobs Please make a complete tutorial, sir.
As I told you, I have another problem with the pager loop. The web page I am scraping is ferreteria.cl, and the code that is not working for me is this, in case you can help me:
listOflinks = []
condition = True
#while condition:
productInfoList = webD.find_elements_by_class_name('prod')
for el in productInfoList:
    ppp = el.find_element_by_class_name('img')
    listOflinks.append(ppp.get_attribute('href'))
try:
    kk = webD.find_elements_by_class_name('paginate')[-1]
    print(kk.get_attribute('aria-label'))
    if kk.get_attribute('aria-label') == 'Next':
        kk.click()
Your code seems to be working fine in my system.
listOflinks = []
condition = True
while condition:
    productInfoList = webD.find_elements_by_class_name('prod')
    for el in productInfoList:
        ppp = el.find_element_by_class_name('img')
        listOflinks.append(ppp.get_attribute('href'))
    try:
        kk = webD.find_elements_by_class_name('paginate')[-1]
        print(kk.get_attribute('aria-label'))
        if kk.get_attribute('aria-label') == 'Next':
            kk.click()
        else:
            condition = False  # last page reached, exit the loop
    except:
        print('Except')
        condition = False  # no pager found, exit the loop
Thank you so much for the great work. Can you please show how to handle stale element reference issue?
Stale means the page has reloaded or the DOM has changed since you found the element, so your old reference is no longer attached to the page; find the element again after the page changes.
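A common pattern is to catch the exception and look the element up again, since the old reference points at a DOM node that no longer exists. A hedged sketch:

from selenium.common.exceptions import StaleElementReferenceException

def text_with_retry(driver, find_fn, attempts=3):
    # Re-find the element and retry if the reference has gone stale.
    for _ in range(attempts):
        try:
            return find_fn(driver).text
        except StaleElementReferenceException:
            continue  # the DOM changed under us -- look it up again
    raise StaleElementReferenceException('element kept going stale')

# usage: title = text_with_retry(webD, lambda d: d.find_element_by_tag_name('h4'))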