Google Web Scraper

I’ve been working on this project ever since I learned about Twitter sentiment analysis, and I think I’ve finally reached the finish line. At the moment, I have no ideas for new features beyond scraping more Google search results or swapping search engines, and both of those would require very few code changes.

You can view the final code on GitHub.

I’ve added many subtle features through all the updates, but here are the highlights:

  • Scrapes Google search results for hyperlinks not on Google’s homepage
  • Scrapes the text off those hyperlinks’ pages
  • Performs sentiment analysis using TextBlob and VADER in tandem; the two libraries must agree on the classification, otherwise the text is labeled “unknown”
  • Summarizes the text, by classification, using four methods: LexRank, Luhn, LSA, and LSA with stop words
  • Ranks, by classification, the keywords that accompany the search term, with stop words removed
  • Displays all results on screen and also saves all results as a text file
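
To make two of those highlights concrete, here is a minimal sketch of the "both libraries must agree" rule and the keyword ranking. It is not the project's actual code: the function names, the tiny stop-word list, and the idea of applying the same ±0.05 cutoff to both scores are assumptions for illustration. (VADER's own documentation suggests ±0.05 for its compound score; in a real run, the two inputs would come from TextBlob's `sentiment.polarity` and VADER's `polarity_scores(text)["compound"]`.)

```python
import re
from collections import Counter

# Assumed cutoff. VADER's docs suggest +/-0.05 for the compound score;
# reusing it for TextBlob polarity is an illustrative choice.
THRESHOLD = 0.05

# Tiny hypothetical stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "at", "is", "it", "of", "the", "to"}


def label(score, threshold=THRESHOLD):
    """Map a score in [-1, 1] to positive / negative / neutral."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def agreed_label(textblob_polarity, vader_compound):
    """Return the shared label, or "unknown" when the two scores disagree."""
    first = label(textblob_polarity)
    second = label(vader_compound)
    return first if first == second else "unknown"


def rank_keywords(text, search_term, top_n=5):
    """Count the words that accompany the search term, skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    excluded = STOP_WORDS | set(search_term.lower().split())
    counts = Counter(word for word in words if word not in excluded)
    return counts.most_common(top_n)


if __name__ == "__main__":
    print(agreed_label(0.6, 0.8))    # both scores are above the cutoff
    print(agreed_label(0.6, -0.4))   # the scores point in opposite directions
    print(rank_keywords("Great coffee and great service at the coffee shop",
                        "coffee shop", top_n=2))
```

Requiring agreement trades coverage for precision: more pages end up as “unknown”, but the ones that do get a label are less likely to be mislabeled by a quirk of a single library.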