51,503 questions
0 votes, 1 answer, 19 views
Web scraping using R selenider on Linux: error with --user-data-dir
I'm attempting to web-scrape using the following R code (obtained from this thread: link to other question):
library(selenider)
library(rvest)
session <- selenider_session("selenium&...
0 votes, 1 answer, 46 views
'list' object cannot be coerced to type 'double' error
I've written some code to scrape a webpage; however, when I try to make some modifications, I get an error. Below is my code:
library(httr)
library(jsonlite)
library(dplyr)
library(janitor)
...
0 votes, 1 answer, 101 views
Way to web-scrape a popular eSport website using R?
I'm attempting to web-scrape the following URL to obtain live game data: https://egamersworld.com/callofduty/matches. I've attempted to inspect the fetch requests being made, but there isn't an obvious ...
-4 votes, 0 answers, 13 views
I need help using the Python requests package to register on a website through my own GUI [closed]
How can I register on a website using the Python requests package if it has CAPTCHA validation? I am sending a payload to the website's server with the appropriate headers and all the necessary details, but ...
0 votes, 0 answers, 6 views
Amqp nodejs disconnect event
I'm using amqp in my Node.js scraping service. Sometimes I get random disconnects, and when it auto-reconnects, it starts a new whenConnected promise while the previous one is still running, which messes up my app:
...
-2 votes, 0 answers, 28 views
Write and fill Excel file with scraped data using Puppeteer and Node.js [closed]
I have a bunch of data stored in an array of objects and now want to fill it into an already existing Excel file. Each property has its column where it needs to be filled into and it should start ...
-4 votes, 0 answers, 67 views
Web scraping page issue (code no longer works) [closed]
I had set up some code to scrape the following page:
https://www.nba.com/stats/players/catch-shoot
Below is my code, which used to run perfectly fine, but when I tried running it just now I got the ...
0 votes, 0 answers, 16 views
Collecting metadata of the reels/posts sent to yourself on Instagram
I need to collect the metadata of the reels/posts that I have sent to myself. The problem I am running into is that Meta's Graph API does not allow access to private DMs. Is there any way around it without ...
-1 votes, 0 answers, 20 views
My Telegram bot does not respond to commands [closed]
My Telegram bot does not respond to commands. As soon as I start the bot, it responds to the commands I entered before turning it on, and then it just stops seeing the commands I send. Moreover,...
0 votes, 2 answers, 74 views
How to Scrape a JavaScript-Rendered Table? (wait_for_selector Timeout & Data Not Loading) [closed]
I'm trying to scrape a table from a webpage, but the table is dynamically loaded via JavaScript and appears 5-7 seconds after page load when viewed manually.
However, when using a web scraper, the ...
-2 votes, 0 answers, 41 views
Can't scrape email from GitHub sidebar (vcard) with BeautifulSoup
I'm trying to scrape emails from GitHub profiles. I can get emails from the main section, but I'm unable to scrape the email from the sidebar (vcard) using BeautifulSoup. I can get emails from the ...
0 votes, 0 answers, 44 views
How to Extract Code Blocks from Different Tabs in a Code Documentation Using Crawl4AI (or any other tool)?
I'm trying to scrape code blocks from multiple tabs in a documentation page using Crawl4AI. While I'm able to extract Markdown content, the code blocks inside tabbed sections are not being captured.
...
0 votes, 1 answer, 98 views
Failed to identify the reason why my script is missing a few results while scraping a webpage
I've created a script in Python to scrape consultant links from this webpage based on the country filter United States, located in the left sidebar. The webpage shows 2,025 results. However, when I ...
-1 votes, 1 answer, 46 views
Coordinates of a location in a web page [closed]
I am trying to extract the coordinates (i.e., longitude and latitude) of the pointer on a map shown in a static Google Maps image from a house listing.
Example: https://www.zoopla.co.uk/to-rent/...
0 votes, 2 answers, 67 views
How to search with an XPath selector in "nodriver" in Python
I am not sure about the correct way to search for specific items using XPath in nodriver in Python.
I'm using this to try to select a button with "confirm" text inside:
await tab.select(&...
-1 votes, 0 answers, 16 views
How can we use coroutines in web scraping?
When scraping 100 URLs, the process can be broadly divided into three stages:
Stage of accessing the URL
Stage of waiting for the page to load
Stage of parsing and retrieving the data on the page
...
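A minimal sketch of how coroutines help with those three stages, assuming a real async HTTP client would replace the simulated `fetch` (here just `asyncio.sleep`). While one coroutine waits in stages 1 and 2, the event loop runs the others; a semaphore caps how many requests are in flight. The names `fetch`, `parse`, `scrape_one`, and `scrape_all` are illustrative.

```python
import asyncio


async def fetch(url):
    """Stages 1 and 2: request the URL and wait for the page (simulated here)."""
    await asyncio.sleep(0.1)  # while this coroutine waits, the event loop runs others
    return f"<html><head><title>{url}</title></head></html>"


def parse(html):
    """Stage 3: ordinary synchronous parsing of the downloaded page."""
    start = html.index("<title>") + len("<title>")
    return html[start:html.index("</title>")]


async def scrape_one(url, limit):
    async with limit:  # at most `max_concurrency` fetches are in flight
        html = await fetch(url)
    return parse(html)


async def scrape_all(urls, max_concurrency=10):
    limit = asyncio.Semaphore(max_concurrency)
    # gather() runs all coroutines concurrently and keeps results in input order
    return await asyncio.gather(*(scrape_one(url, limit) for url in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(100)]
    titles = asyncio.run(scrape_all(urls))
    # 100 simulated 0.1 s waits, 10 at a time: roughly 1 s instead of roughly 10 s
    print(len(titles), titles[0])
```

Only the waiting overlaps; parsing is ordinary synchronous code, so heavy CPU-bound parsing would still need a process pool to run in parallel.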
1 vote, 1 answer, 96 views
Web scraping instructions for an R user
I am a statistician/data scientist, R user, runner, and a beginner in the realm of webscraping.
I recently completed a race in Tampa, FL and the results are posted online. I would like to use some web ...
1 vote, 3 answers, 68 views
How can I web-scrape PDFs under a dropdown button in HTML?
I'm new to scraping websites with HTML and need to download all PDFs from this website, but the info is under dropdown buttons. I tried inspecting the HTML of the website, and I think the code of the ...
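Dropdown and accordion widgets frequently just hide their contents with CSS, so the PDF links may already be present in the page source. Assuming that is the case here, a standard-library sketch like the following collects every link whose path ends in `.pdf`; the class name, function name, and sample HTML are made up for illustration. If the dropdown instead fetches its contents with JavaScript when clicked, the links will not be in the initial HTML, and the request it makes (visible in the browser's network tab) is what needs to be replicated.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkCollector(HTMLParser):
    """Collects links to .pdf files, including ones inside CSS-hidden dropdowns."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Check only the path, so "report.pdf?v=2" still counts as a PDF
        if urlparse(href).path.lower().endswith(".pdf"):
            absolute = urljoin(self.base_url, href)  # resolve relative links
            if absolute not in self.links:  # skip duplicates, keep page order
                self.links.append(absolute)


def find_pdf_links(html, base_url):
    collector = PdfLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    sample = """
    <details><summary>2023 reports</summary>
      <a href="/files/report-2023.pdf">Annual report</a>
    </details>
    <div class="dropdown-menu" style="display:none">
      <a href="https://example.org/docs/Q1.PDF?v=2">Q1</a>
    </div>
    """
    print(find_pdf_links(sample, "https://example.org/reports/"))
```

Each collected URL can then be saved with `urllib.request.urlretrieve(url, filename)`.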
0 votes, 1 answer, 61 views
How can I download PDFs using an AI web crawler? (Crawl4AI)
I have been using Crawl4AI to try to download a series of documents from this website. However, since it requires JavaScript code and I am using Python, I don't know how to solve my error.
Code, ...
0 votes, 1 answer, 69 views
Trying to scrape data from a table on a website
I'm trying to pull some data from a table and store it in a CSV file.
I'm using the following (all 64-bit):
Firefox version 135.0.1
GeckoDriver 0.36.0
Python version is 3.11.0
I'm trying to scrape ...
-1 votes, 0 answers, 45 views
Connection to socket.io with R websocket package not working
I am trying to get some data from this page, namely game names and odds and rounds:
https://www.winamax.fr/paris-sportifs/sports/1/7/4
I first tried using a GET request from the httr package, by ...
1 vote, 1 answer, 51 views
Why do the coroutines not switch and instead run synchronously, even though a delay is given?
runBlocking {
bookLinks.mapIndexed { ranking, bookLink ->
val job = async { scrapeBookData(browser, bookLink, ranking) }
val result = job.await()
if (result != null) {
...
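In the snippet above, each `async { ... }` is awaited on the very next line, inside the same loop iteration, so the next book is not started until the previous one has finished; the usual Kotlin fix is to map the links to `async` jobs first and then call `awaitAll()` on the resulting list. The same pitfall is sketched here with Python's asyncio; `scrape_book` is a hypothetical stand-in for `scrapeBookData`.

```python
import asyncio
import time


async def scrape_book(link):
    # Hypothetical stand-in for scrapeBookData: a non-blocking 0.2 s wait
    await asyncio.sleep(0.2)
    return link.upper()


async def one_at_a_time(links):
    results = []
    for link in links:
        task = asyncio.create_task(scrape_book(link))
        results.append(await task)  # awaiting right away serializes the work
    return results


async def all_at_once(links):
    tasks = [asyncio.create_task(scrape_book(link)) for link in links]  # start everything first
    return await asyncio.gather(*tasks)  # then wait for all of them together


def timed(coro_fn, links):
    """Run one of the strategies and return (results, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(links))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    links = [f"book-{i}" for i in range(5)]
    for fn in (one_at_a_time, all_at_once):
        _, seconds = timed(fn, links)
        print(f"{fn.__name__}: {seconds:.2f}s")
```

Even with that change, the work only overlaps if the scraping function actually suspends instead of blocking the thread; a blocking browser call inside a single-threaded `runBlocking` would still run one call at a time.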
1 vote, 0 answers, 41 views
When using Playwright with coroutines for web crawling, the speed is the same as without coroutines
package com.example.demo.service
import com.example.demo.dto.BookDTO
import com.microsoft.playwright.*
import com.microsoft.playwright.options.WaitUntilState
import jakarta.transaction.Transactional
...
1 vote, 0 answers, 67 views
How to switch to "puppeteer-real-browser" from default Puppeteer? [closed]
I want to replace my scraper's "puppeteer" library with "puppeteer-real-browser". I tried many ways but got a bunch of errors, and I don't want to ask about all of them here to make the process ...
0 votes, 2 answers, 56 views
Trying to use chrome with seleniumbase and uc=true option
I am trying to scrape a site that has a Cloudflare bot check. I currently use
import undetected_chromedriver as uc
and a portable CHROME.EXE;
however, this does not seem to get me past the bot check, so ...
1 vote, 0 answers, 116 views
Instagram user web profile info API not working anymore
I've been using this link https://i.instagram.com/api/v1/users/web_profile_info/?username={username} for a while, along with the APP ID, to make requests, and it's been fine, but suddenly it now says
{
...
1 vote, 1 answer, 62 views
AWS Lambda webscraping through a docker image
I'm learning AWS Lambda and I'm trying to implement a web scraping program. I created my Lambda function from a container image that I built with Docker. My project folder has three files:
...
0 votes, 0 answers, 69 views
Nodriver web scraping program gets stuck at cdp.network.get_response_body?
I'm trying to intercept the response from the web server and extract the body. It uses the nodriver module to successfully load the page and capture the request event. However, when it attempts to send ...
-7 votes, 0 answers, 70 views
Selenium Python scraping
I have the following code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import ...
0 votes, 1 answer, 68 views
How can one scrape any table from Wikipedia in Python?
I want to scrape tables from Wikipedia in Python. Wikipedia is a good source of data, but the data is in HTML format, which is extremely machine-unfriendly and cannot be used directly. ...
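For simple tables, the HTML can be turned into rows with the standard library alone. The sketch below is a deliberately minimal version under stated assumptions: it reads every `<table>` whose class includes `wikitable` (the class Wikipedia uses for most data tables), assumes each cell has an explicit closing tag, ignores `rowspan`/`colspan`, and keeps footnote markers as plain text. `pandas.read_html` is a common shortcut that handles more cases, but it needs an extra parser such as lxml installed.

```python
from html.parser import HTMLParser


class WikitableParser(HTMLParser):
    """Collects rows from every <table> whose class list includes "wikitable".

    Simplifications: rowspan/colspan are ignored and nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []      # one list of rows per wikitable
        self._in_table = False
        self._row = None      # cells of the current <tr>
        self._cell = None     # text fragments of the current <td>/<th>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and "wikitable" in classes:
            self._in_table = True
            self.tables.append([])
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._in_table and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if not self._in_table:
            return
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments (e.g. text split by <a> tags) and collapse whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.tables[-1].append(self._row)
            self._row = None
        elif tag == "table":
            self._in_table = False

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def read_wikitables(html):
    parser = WikitableParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```

The page HTML can be fetched with `urllib.request`; Wikimedia asks clients to send a descriptive User-Agent header. Each returned table is a list of rows, which `csv.writer(...).writerows(table)` can save directly.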
0 votes, 0 answers, 45 views
crawl4ai gives Error: 'NoneType' object has no attribute 'new_context'
I am trying to scrape data from www.example.com, but the code below returns an error:
import asyncio
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import BrowserConfig, ...
0 votes, 2 answers, 82 views
Playwright Python can't find HTML tag which shows up in debugger and in a print statement
I am trying to scrape a product detail page,
but I am not able to find the tag when the code runs. I print the parent tag out, and I see the h2 tag I want, and also when I enter the debugger I can ...
-2 votes, 1 answer, 92 views
How does the website know that it's not my browser?
When I access the URL https://www.getfpv.com/media/sitemap.xml from my browser, it works, but when I try to do it with Python, it returns 403 Forbidden. How does the website know that it's Python ...
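One common answer is the request headers. Python's `urllib` sends `User-Agent: Python-urllib/3.x` and `requests` sends `python-requests/...` by default, and servers or CDNs can reject those outright. The sketch below sends browser-like headers instead; the header values are illustrative, and whether they are enough for this particular site is untested. Stricter bot protection may also inspect TLS fingerprints or require JavaScript challenges, which changing headers does not address.

```python
import urllib.request

URL = "https://www.getfpv.com/media/sitemap.xml"


def build_request(url):
    # Illustrative browser-like headers; the default "Python-urllib/3.x"
    # User-Agent is an easy signal for a server to block.
    headers = {
        "User-Agent": ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                       "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }
    return urllib.request.Request(url, headers=headers)


def fetch(url):
    """Download the body; raises urllib.error.HTTPError if the server still answers 403."""
    with urllib.request.urlopen(build_request(url), timeout=30) as resp:
        return resp.read()
```

Calling `fetch(URL)` performs the actual request. Note that `urllib` stores header names with `str.capitalize()`, so the header is looked up as `"User-agent"` on the `Request` object.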
0 votes, 0 answers, 27 views
How to fill an input and select an option from the dropdown using Puppeteer?
I am working on a project with JavaScript and chose Puppeteer to perform web scraping on various websites. One of the websites I need to scrape is Mapas SII, from which I plan to obtain ...
0 votes, 0 answers, 25 views
FeignException - 504 Gateway Time-out
I have a web scraping service that takes a very long time (sometimes up to 2 hours).
I configured it to handle long-running functions, but I still got this error:
2025-02-18T10:40:00.017Z ...
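A 504 Gateway Time-out means a gateway or proxy in front of the service gave up waiting for the upstream response, so that component's own timeout applies regardless of how the service itself is configured. Besides raising every timeout in the chain, a common alternative for multi-hour work is to start the job in the background, return a job ID immediately, and let the caller poll for the result. The in-memory `JobRegistry` and `wait_for` below are a hypothetical Python sketch of that pattern; a real service would persist job state (for example in a database or queue) so it survives restarts.

```python
import threading
import time
import uuid


class JobRegistry:
    """In-memory tracker for long-running jobs (hypothetical; not persisted)."""

    def __init__(self):
        self._jobs = {}
        self._lock = threading.Lock()

    def submit(self, fn, *args):
        """Start fn(*args) in a background thread and return a job ID right away."""
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = {"status": "RUNNING", "result": None, "error": None}

        def run():
            try:
                update = {"status": "DONE", "result": fn(*args)}
            except Exception as exc:  # keep the failure visible to pollers
                update = {"status": "FAILED", "error": repr(exc)}
            with self._lock:
                self._jobs[job_id].update(update)

        threading.Thread(target=run, daemon=True).start()
        # An HTTP endpoint would return this ID immediately (e.g. with 202 Accepted)
        return job_id

    def status(self, job_id):
        """Return a snapshot of the job's state, as a GET /jobs/{id} endpoint might."""
        with self._lock:
            return dict(self._jobs[job_id])


def wait_for(registry, job_id, poll_seconds=0.05):
    """Client side: poll until the job leaves the RUNNING state."""
    while True:
        state = registry.status(job_id)
        if state["status"] != "RUNNING":
            return state
        time.sleep(poll_seconds)


if __name__ == "__main__":
    def slow_scrape(pages):
        time.sleep(0.2)  # stand-in for the multi-hour scrape
        return f"scraped {pages} pages"

    registry = JobRegistry()
    job_id = registry.submit(slow_scrape, 3)
    print(wait_for(registry, job_id))
```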
0 votes, 1 answer, 30 views
Collect Google Play Reviews in Multiple Countries
I am trying to collect Google Play reviews on certain apps in English-speaking countries using google-play-scraper. The problem was that when I changed the 'country' parameter, it returned the same ...
0 votes, 1 answer, 54 views
Web-scrape a table using rvest
I am attempting to scrape the table on this page using rvest:
https://www.nrl.com/ladder/?competition=111&round=27&season=2024
This is what I have tried so far:
library(rvest)
page <- ...
-1 votes, 0 answers, 31 views
Unable to pin matching versions of Chromium and ChromeDriver in an AWS Lambda function for web scraping
I am using Selenium, ChromeDriver, and Chromium to scrape the Amazon website. It works fine on my local system, but when I use this approach in a Lambda function, I get the versioning ...
0 votes, 0 answers, 52 views
How can I convert an HTML element or React node into an SVG or image?
I am working on a project where I gather data from a user and retrieve stats from their other publicly accessible profiles. Based on this, I generate profile cards or images to display that data.
The ...
0 votes, 2 answers, 91 views
How to Resolve Google News Redirects to Get the Final Article URL Using Axios?
I'm trying to scrape news articles from Google News using Node.js. The issue I am facing is with the links provided by the RSS feed: they are Google RSS links, which ...
0 votes, 2 answers, 58 views
How to do web scraping using PySpark
How can I do web scraping and read the response in PySpark?
Here's my code:
import requests
import pyspark
from pyspark.sql.functions import *
from pyspark.sql import SparkSession
r =...
0 votes, 1 answer, 207 views
Selenium ChromeDriver does not navigate to a URL when using a custom user data directory
This code uses Selenium to crawl the web, but .get() no longer seems to work after updating Chrome and ChromeDriver.
The Chrome version is "133.0.6943.99" and the ChromeDriver version ...
0 votes, 2 answers, 105 views
Use R to scrape MLB.com player fielding data
I'm learning how to use R to scrape tables of baseball stats from different places on the web. For example, I adapted this post to scrape a player's minor league fielding data from the player register ...
0 votes, 1 answer, 83 views
How to scrape a website that has hidden data inside a table?
I am trying to scrape the Screener.in website to extract some information related to stocks.
However, while trying to extract the Quarterly Results section, I found that some fields are hidden, and when I click ...
2 votes, 1 answer, 77 views
Puppeteer Scraping: See XHR response data before request completes for real time data
I am using puppeteer to scrape a website for real time data in nodejs. Instead of scraping the page, I am watching the backend requests and capturing the JSON/TEXT responses so I have more structured ...
0 votes, 1 answer, 37 views
Selenium not triggering 'Save' button after modifying a placeholder field in AngularJS page
I am automating a web page using Selenium + Python, and I need to update the Zip Code field. The expected behavior in the UI is:
The "Save" button is hidden initially.
When I click on the ...
0 votes, 0 answers, 53 views
How to scrape YouTube search results with Puppeteer
I am trying to use Node.js with Puppeteer to scrape YouTube video information from the search results. Unfortunately, for some reason, the scrape doesn't load the elements via the document query ...
2 votes, 1 answer, 70 views
Can't find the correct 'select' HTML tag value while waiting for a select option to load (Playwright, Python)
I have an issue with a URL for a T-shirts page.
I am trying to scrape the product links off the pages. I have been trying for some time now, and nothing is working yet. This is my ...
-1 votes, 1 answer, 61 views
How to scrape links from Google Images results with Selenium in Python?
I'm working on a project, and I need to get the links from Google Images results.
Here is my code:
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ...
0 votes, 1 answer, 87 views
Pagination error while accessing data using Google Apps Script
I am trying to access URL data (clickable titles) from this table. The script gets the first page correctly, but I could not find a way to get the data from the second page. Here is the sample script:
...
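When a table is paginated, page 2 and onward are often fetched by a separate request that carries a page number or offset, and the parameter names are specific to each site (the browser's network tab shows them). The loop below is a language-agnostic sketch written in Python, with a hypothetical `fetch_page(page)` callable standing in for that request; the same structure carries over to Apps Script's `UrlFetchApp.fetch`. It stops at an empty page, at a page identical to the previous one (some sites repeat the last page when asked for one past the end), or at `max_pages`.

```python
def collect_all_pages(fetch_page, max_pages=1000):
    """Call fetch_page(1), fetch_page(2), ... and concatenate the results.

    fetch_page is a hypothetical callable that returns the items on one page
    (for example, the clickable titles of one table page).
    """
    items = []
    previous = None
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        # Stop on an empty page or when the site keeps returning the same page
        if not batch or batch == previous:
            break
        items.extend(batch)
        previous = batch
    return items


if __name__ == "__main__":
    # Fake data source: 23 titles served 10 per page
    titles = [f"title-{i}" for i in range(23)]

    def fake_fetch(page, size=10):
        return titles[(page - 1) * size: page * size]

    print(len(collect_all_pages(fake_fetch)))
```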