Newest 'web-scraping' Questions

0 votes

1 answer

19 views

web-scraping using R selenider on linux error --user-data-dir

I'm attempting to web-scrape using the following R code (which was obtained from this thread: link to other question library(selenider) library(rvest) session <- selenider_session("selenium&...

Nick Amato

21

asked 4 hours ago

0 votes

1 answer

46 views

'list' object cannot be coerced to type 'double' error

I've written some code to scrape a webpage; however, when I try to make some modifications, I get an error. Below is my code: library(httr) library(jsonlite) library(dplyr) library(janitor) ...

HowGoodisData

25

asked 23 hours ago

0 votes

1 answer

101 views

Way to web-scrape a popular eSport website using R?

I'm attempting to webscrape the following url to obtain live game data: https://egamersworld.com/callofduty/matches I've attempted to inspect the fetch requests being made, but there isn't an obvious ...

Nick Amato

21

asked yesterday

-4 votes

0 answers

13 views

I need help using the Python requests package to register on a website from my own GUI [closed]

How can I register on a website using the Python requests package if it has CAPTCHA validation? I am sending a payload to the website's server using appropriate headers and all necessary details, but ...

Ishan Kishan

1

asked yesterday

0 votes

0 answers

6 views

Amqp nodejs disconnect event

I'm using amqp in my Node.js scraping service. Sometimes I get random disconnects, and when it auto-reconnects, it starts a new whenConnected promise while the previous one is still running, so my app breaks: ...

Rubén M

119

asked yesterday

-2 votes

0 answers

28 views

Write and fill Excel file with scraped data using Puppeteer and Node.js [closed]

I have a bunch of data stored in an array of objects and now want to fill it into an already existing Excel file. Each property has its column where it needs to be filled into and it should start ...

Honeybadger

1

asked yesterday

-4 votes

0 answers

67 views

Webscraping Page issue (code no longer works) [closed]

I had set up some code to scrape the following page https://www.nba.com/stats/players/catch-shoot Below is my code which used to run perfectly fine, but when I tried running it just now I got the ...

gimmethedata123

65

asked 2 days ago

0 votes

0 answers

16 views

Collecting metadata of the reels/ posts sent to yourself from instagram

I need to collect the metadata of the reels/posts that I have sent to myself. The problem I am running into is that Meta's Graph API does not allow access to private DMs. Is there any way around it without ...

Divyam Sharma

1

asked Mar 3 at 3:15

-1 votes

0 answers

20 views

My telegram bot does not respond to commands [closed]

My telegram bot does not respond to commands. As soon as I start the telegram bot it responds to the commands I entered before turning it on and then it just stops seeing the commands I send. Moreover,...

user25074879

1

asked Mar 2 at 17:25

0 votes

2 answers

74 views

How to Scrape a JavaScript-Rendered Table? (wait_for_selector Timeout & Data Not Loading) [closed]

I'm trying to scrape a table from a webpage, but the table is dynamically loaded via JavaScript and appears 5-7 seconds after page load when viewed manually. However, when using a web scraper, the ...

HamidBee

289

asked Mar 2 at 0:41

-2 votes

0 answers

41 views

Can't scrape email from GitHub sidebar (vcard) with BeautifulSoup

I'm trying to scrape emails from GitHub profiles. I can get emails from the main section, but I'm unable to scrape the email from the sidebar (vcard) using BeautifulSoup. I can get emails from the ...

Balla P. Tall

1

asked Mar 1 at 0:07

0 votes

0 answers

44 views

How to Extract Code Blocks from Different Tabs in a Code Documentation Using Crawl4AI (or any other tool)?

I'm trying to scrape code blocks from multiple tabs in a documentation page using Crawl4AI. While I'm able to extract Markdown content, the code blocks inside tabbed sections are not being captured. ...

harsha bajaj

23

asked Feb 28 at 13:57

0 votes

1 answer

98 views

Failed to identify the reason why my script is missing a few results while scraping a webpage

I've created a script in Python to scrape consultant links from this webpage based on the country filter United States, located in the left sidebar. The webpage shows 2,025 results. However, when I ...

MITHU

164

asked Feb 28 at 10:20

-1 votes

1 answer

46 views

Coordinates of a location in a web page [closed]

I am trying to extract the coordinates (i.e., longitude and latitude) of the pointer on a map shown in a static Google Maps image from a house listing. Example: https://www.zoopla.co.uk/to-rent/...

Chioma Okoroafor

75

asked Feb 27 at 17:25

0 votes

2 answers

67 views

How to search with xpath selector in "nodriver" on python

I am not sure about the correct way to search for specific items using XPath in nodriver in Python. I'm using this to try to select a button with a "confirm" text inside. await tab.select(&...

jguerr

3

asked Feb 27 at 11:55

-1 votes

0 answers

16 views

How can we use coroutines in web scraping?

When scraping 100 URLs, the work can be broadly divided into 3 stages: accessing the URL, waiting for the page to load, and parsing and retrieving the data on the page ...

JAEIK JEONG

21

asked Feb 26 at 15:08
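
A minimal sketch of the three stages described in the coroutine question above, assuming Python's asyncio. The `fetch_and_parse` helper, the example.com URLs, and the simulated delay are hypothetical stand-ins for a real HTTP client or browser; this illustrates the general idea and is not the asker's code.

```python
import asyncio


async def fetch_and_parse(url: str, delay: float) -> str:
    # Stages 1 and 2 (hypothetical): request the URL and wait for the page to load.
    # asyncio.sleep stands in for the network wait and yields control,
    # so other coroutines can make progress in the meantime.
    await asyncio.sleep(delay)
    # Stage 3 (hypothetical): parse the page; here we only tag the URL.
    return f"parsed:{url}"


async def scrape_all(urls, max_concurrency: int = 10, delay: float = 0.1):
    # The semaphore caps how many pages are in flight at the same time.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            return await fetch_and_parse(url, delay)

    # gather schedules every coroutine up front and returns results in input order.
    return await asyncio.gather(*(bounded(u) for u in urls))
```

Calling `asyncio.run(scrape_all(urls))` with a list of 100 URLs would overlap the waiting stages instead of running them one after another. The concurrency cap is a design choice to avoid overloading the target site.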

1 vote

1 answer

96 views

Webscraping instruction for an R user

I am a statistician/data scientist, R user, runner, and a beginner in the realm of webscraping. I recently completed a race in Tampa, FL and the results are posted online. I would like to use some web ...

Omar123456789

71

asked Feb 26 at 2:27

1 vote

3 answers

68 views

How can I webscrape pdfs under a dropdown button in HTML?

I'm new to scraping websites with HTML and need to download all pdfs from this website, but the info is under dropdown buttons. I tried inspecting the HTML of the website, and I think the code of the ...

aimee prieto

11

asked Feb 25 at 19:26

0 votes

1 answer

61 views

How can I download PDF's using an AI WebCrawler? (Crawler4AI)

I have been using Crawler4AI to try downloading a series of documents from this website. However, since it requires JavaScript code and I am using Python, I don't know how to solve my error. Code, ...

franjefriten

3

asked Feb 25 at 19:15

0 votes

1 answer

69 views

Trying to scrape data from a table from a website

I'm trying to pull some data from a table and store it in a CSV file. I'm using the following (all 64-bit): Firefox version 135.0.1 GeckoDriver 0.36.0 Python version is 3.11.0 I'm trying to scrape ...

Machzy

23

asked Feb 25 at 15:13

-1 votes

0 answers

45 views

Connection to socket.io with R websocket package not working

I am trying to get some data from this page, namely game names and odds and rounds: https://www.winamax.fr/paris-sportifs/sports/1/7/4 I first tried using a GET request from the httr package, by ...

M.O

471

asked Feb 25 at 10:33

1 vote

1 answer

51 views

Why is the coroutine not converted and works synchronously even though a delay is given?

runBlocking { bookLinks.mapIndexed { ranking, bookLink -> val job = async { scrapeBookData(browser, bookLink, ranking) } val result = job.await() if (result != null) { ...

JAEIK JEONG

21

asked Feb 25 at 9:33

1 vote

0 answers

41 views

When using playwright and coroutine for web crawling, the speed is the same as when using coroutine and not using it

package com.example.demo.service import com.example.demo.dto.BookDTO import com.microsoft.playwright.* import com.microsoft.playwright.options.WaitUntilState import jakarta.transaction.Transactional ...

JAEIK JEONG

21

asked Feb 24 at 9:04

1 vote

0 answers

67 views

How to switch to "puppeteer-real-browser" from default puppeteer? [closed]

I want to replace my scraper's "puppeteer" library with "puppeteer-real-browser". I tried many ways but got a bunch of errors, and I don't want to ask about all of them here to make process ...

kokoKOK

11

asked Feb 24 at 5:18

0 votes

2 answers

56 views

Trying to use chrome with seleniumbase and uc=true option

I am trying to scrape a site that has a Cloudflare bot check. I currently use import undetected_chromedriver as uc and a portable CHROME.EXE; however, this does not seem to get me around the bot check, so ...

RobM

835

asked Feb 24 at 3:19

1 vote

0 answers

116 views

Instagram user web profile info API not working anymore

I've been using this link https://i.instagram.com/api/v1/users/web_profile_info/?username={username} for a while along with the APP ID to make requests and it's been fine, but suddenly it now says { ...

pkdev

113

asked Feb 23 at 14:36

1 vote

1 answer

62 views

AWS Lambda webscraping through a docker image

I'm learning AWS Lambda and I'm trying to implement a webscraping program. I created my Lambda function through a container image, that I built through Docker. My project folder has three files: ...

weyronn12934

13

asked Feb 22 at 19:38

0 votes

0 answers

69 views

Nodriver web scraping program gets stuck at cdp.network.get_response_body?

I'm trying to intercept the response from the web server and extract the body. My program uses the nodriver module to successfully load the page and capture the request event. However, when it attempts to send ...

Fab49er

19

asked Feb 21 at 17:59

-7 votes

0 answers

70 views

Selenium Python scraping

I do have the following code: from selenium import webdriver from selenium.webdriver.common.keys import Keys from selenium.webdriver.common.by import By from selenium.webdriver.support.ui import ...

Radosław Poprawski

1

asked Feb 21 at 11:26

0 votes

1 answer

68 views

How can one scrape any table from Wikipedia in Python?

I want to scrape tables from Wikipedia in Python. Wikipedia is a good source to get data from, but the data present is in HTML format which is extremely machine unfriendly and cannot be used directly. ...

Ξένη Γήινος

3,082

asked Feb 20 at 19:54

0 votes

0 answers

45 views

crawl4ai gives Error: 'NoneType' object has no attribute 'new_context'

I am trying to scrape data from www.example.com but the below code returns error : import asyncio from crawl4ai import AsyncWebCrawler from crawl4ai.async_configs import BrowserConfig, ...

user9291211

1

asked Feb 20 at 7:54

0 votes

2 answers

82 views

Playwright Python can't find HTML tag which shows up in debugger and in a print statement

I am trying to scrape a product detail page, but I am not able to find the tag when the code runs. I print the parent tag out, and I see the h2 tag I want, and also when I enter the debugger I can ...

Cody Childers

25

asked Feb 19 at 17:27

-2 votes

1 answer

92 views

How does the website know that it's not my browser?

When I access the URL https://www.getfpv.com/media/sitemap.xml from my browser it works, but when I try to do it with Python, it returns 403 forbidden. How does the website know that it's python ...

Fab49er

19

asked Feb 19 at 16:04

0 votes

0 answers

27 views

How to fill an input and select an option from the dropdown using Puppeteer?

I am working on a project with JavaScript and chose Puppeteer to perform web scraping on various websites. One of the websites I need to scrape is this one Mapas SII, from which I plan to obtain ...

alvaro soto albornoz

1

asked Feb 18 at 14:27

0 votes

0 answers

25 views

FeignException - 504 Gateway Time-out

I have a webscraping service that takes a bit too long (sometimes up to 2 hours). But I did a good configuration, that can handle long functions, however I got this error: 2025-02-18T10:40:00.017Z ...

Aziz Zina

11

asked Feb 18 at 13:44

0 votes

1 answer

30 views

Collect Google Play Reviews in Multiple Countries

I am trying to collect Google Play reviews on certain apps in English-speaking countries using google-play-scraper. The problem was that when I changed the 'country' parameter, it returned the same ...

Sơn Phạm

1

asked Feb 18 at 4:21

0 votes

1 answer

54 views

webscrape table using rvest

I am attempting to scrape the table on this page using rvest https://www.nrl.com/ladder/?competition=111&round=27&season=2024 This is what I have tried so far library(rvest) page <- ...

HowGoodisData

25

asked Feb 18 at 0:16

-1 votes

0 answers

31 views

Not able to fix version of the Chromium, chrome webdriver on the AWS Lambda function for the WebScraping

I am using Selenium, ChromeDriver, and Chromium to scrape the Amazon website. It works fine on my local system, but when I use this approach in the Lambda function, I get the versioning ...

Shalini Dixit

1

asked Feb 17 at 10:14

0 votes

0 answers

52 views

How can I convert an HTML element or React node into an SVG or image?

I am working on a project where I gather data from a user and retrieve stats from their other publicly accessible profiles. Based on this, I generate profile cards or images to display that data. The ...

Sanju Chilukuri

11

asked Feb 17 at 6:48

0 votes

2 answers

91 views

How to Resolve Google News Redirects to Get the Final Article URL Using Axios?

I'm trying to scrape news articles from Google News using Node.js. The issue I am facing is with the links provided by the RSS feed. They give us this type of link, which is a Google RSS link which ...

Deus

13

asked Feb 16 at 22:05

0 votes

2 answers

58 views

How to do web scraping using pyspark

Hello, I have a question about how to do web scraping and read the response in PySpark. Here's my code: import requests import pyspark from pyspark.sql.functions import * from pyspark.sql import SparkSession r =...

Bahy Mohamed

311

asked Feb 16 at 10:47

0 votes

1 answer

207 views

Selenium ChromeDriver does not navigate to a URL when using a custom user data directory

This is code that uses Selenium to crawl the web, but .get() does not seem to work after updating Chrome and ChromeDriver. The Chrome version is "133.0.6943.99" and the Chrome Driver version ...

Mr. OH

1

asked Feb 16 at 10:17

0 votes

2 answers

105 views

Use R to scrape MLB.com player fielding data

I'm learning how to use R to scrape tables of baseball stats from different places on the web. For example, I adapted this post to scrape a player's minor league fielding data from the player register ...

Buckaroo Banzai

51

asked Feb 15 at 19:36

0 votes

1 answer

83 views

How to scrape website which has hidden data inside table?

I am trying to scrape the Screener.in website to extract some information related to stocks. However, while trying to extract the Quarterly Results section, there are some fields which are hidden and when click ...

Data-7scientist

145

asked Feb 15 at 18:01

2 votes

1 answer

77 views

Puppeteer Scraping: See XHR response data before request completes for real time data

I am using puppeteer to scrape a website for real time data in nodejs. Instead of scraping the page, I am watching the backend requests and capturing the JSON/TEXT responses so I have more structured ...

EdE

21

asked Feb 15 at 13:26

0 votes

1 answer

37 views

Selenium not triggering 'Save' button after modifying a placeholder field in AngularJS page

I am automating a web page using Selenium + Python, and I need to update the Zip Code field. The expected behavior in the UI is: The "Save" button is hidden initially. When I click on the ...

matias cantella

19

asked Feb 15 at 12:26

0 votes

0 answers

53 views

How to scrape searched youtube videos with puppeteer

I am trying to use nodeJs with puppeteer to scrape for YouTube video information from the search results. Unfortunately, for some reason, the scrape doesn't load the elements via the document query ...

keshawn Sharper

1

asked Feb 14 at 20:21

2 votes

1 answer

70 views

can't find correct 'select' HTML tag value, and trying to wait for a select option to load, playwright Python

I have an issue where I use a URL such as the T-shirts page. I am trying to scrape the product links off the pages. I have been trying for some time now, but nothing is working yet. This is my ...

Cody Childers

25

asked Feb 14 at 15:49

-1 votes

1 answer

61 views

How to scrape links off Google images result with selenium, python?

I'm trying to work on a project, and I need to get the links off google image results. Here is my code: from selenium.webdriver.common.by import By from selenium.webdriver.common.action_chains import ...

Thomas Haddad

1

asked Feb 13 at 22:52

0 votes

1 answer

87 views

Pagination error while accessing data using Google Apps Script

I am trying to access url data (clickable titles) from this table. The script gets the first page correctly but I could not find a way to get the data from second page. Here is the sample script: ...

EagleEye

510

asked Feb 13 at 16:21
