Google Web Scraper

I’ve been working on this project ever since I learned about Twitter Sentiment Analysis, and I think I am finally finished. At this moment, the only new features I can think of are scraping more Google search results or swapping search engines, and both would require very few code changes.

You can view the final code on GitHub.

I’ve added many subtle features through all the updates, but here are the highlights:

Scrapes Google search results for hyperlinks not on Google’s homepage
Scrapes the text off those hyperlinks’ pages
Performs sentiment analysis using TextBlob and VADER in tandem; the two libraries must agree on the classification, otherwise the text is classified as “unknown”
Summarizes the text, by classification, using 4 methods: LexRank, Luhn, LSA, and LSA with stop words
Ranks, by classification, the stopwords-scrubbed keywords accompanying the search term
Displays all results on screen and also saves all results as a text file
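
The agreement rule between TextBlob and VADER can be sketched roughly as follows. This is a minimal illustration, not the code from the GitHub repository: the function names (label_from_score, combined_label, classify_text), the 0.05 cutoff applied to both scores, and the presence of a "neutral" class are all assumptions made for the example. Only the library calls themselves (TextBlob's sentiment.polarity and VADER's polarity_scores) are real APIs.

```python
def label_from_score(score, threshold=0.05):
    """Map a score in [-1, 1] to a label (threshold is an assumed cutoff)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def combined_label(textblob_polarity, vader_compound, threshold=0.05):
    """Return the shared label if both scores agree, else "unknown"."""
    textblob_label = label_from_score(textblob_polarity, threshold)
    vader_label = label_from_score(vader_compound, threshold)
    return textblob_label if textblob_label == vader_label else "unknown"


def classify_text(text):
    """Score text with both libraries (hypothetical helper, needs both installed)."""
    # Imported lazily so the pure functions above work without the libraries.
    from textblob import TextBlob
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    polarity = TextBlob(text).sentiment.polarity
    compound = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
    return combined_label(polarity, compound)


if __name__ == "__main__":
    print(combined_label(0.6, 0.8))    # both scores positive
    print(combined_label(0.5, -0.3))   # the scores disagree
```

Keeping the comparison in a pure function makes the rule easy to check without installing either library; classify_text is the only part that needs TextBlob and vaderSentiment.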

Published by A.N.A.K.I.N.
