Scraping Customer Reviews from Amazon Using R- Please help!

dress_code
Hi,

I am currently using the R script below to scrape Amazon customer reviews. I have been using it for quite a while and it seemed to work well until now. I have noticed that R fails to scrape the specified node (found using SelectorGadget) from all of the reviews. Each time I run the script I retrieve a different number of reviews, but never all of them. The goal is to scrape the reviews (text, star rating, date, etc.) and compile them into CSV files that can later be manipulated in R.

I have also started getting intermittent timeout errors, specifically "Error in open.connection(x, "rb") : Timeout was reached". I am not sure exactly what this error means.

How do I get around this to continue scraping? I am unfamiliar with proxies but have been told they may be a solution. Any help or insight is greatly appreciated! Thank you!

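    library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

    url <- "https://www.amazon.com/Match-Mens-Wild-Cargo-Pants/product-reviews/B009HLOZ9U/ref=cm_cr_arp_d_show_all?ie=UTF8&reviewerType=all_reviews&pageNumber="

    N_pages <- 204
    A <- NULL
    for (j in seq_len(N_pages)) {
      # Download one page of reviews and pull out the review text
      pant <- read_html(paste0(url, j))
      B <- cbind(pant %>% html_nodes(".review-text") %>% html_text())
      A <- rbind(A, B)
    }
    tail(A)

    print(j)  # page number the loop last reached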
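
For the intermittent "Timeout was reached" errors, one thing that might be worth trying is retrying a failed request after a short pause instead of letting the whole loop stop. The following is only a minimal sketch in base R: `with_retry`, `max_tries` and `wait` are hypothetical names made up for illustration (they are not part of rvest), and the `flaky` function just simulates a request that times out twice before succeeding. Whether retrying is enough for this particular site is an assumption, not something tested here.

```r
# Hypothetical helper (not part of rvest): call f() and retry if it errors.
# Returns the value of f() on success, or NULL if every attempt failed.
with_retry <- function(f, max_tries = 3, wait = 2) {
  for (attempt in seq_len(max_tries)) {
    # Capture an error as a value instead of stopping the script
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    # Pause before the next attempt, but not after the last one
    if (attempt < max_tries) Sys.sleep(wait)
  }
  NULL
}

# Simulated request: fails on the first two calls, then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("Timeout was reached")
  "page content"
}

out <- with_retry(flaky, max_tries = 5, wait = 0)
```

In the loop from the script, the request could then be written as `pant <- with_retry(function() read_html(paste0(url, j)))`, skipping the page (for example with `next`) when the result is `NULL`.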