Extracting Search Engine Links – Python


Many data science projects need to extract all the links present in search engine results. This can be done in several ways, including Google's APIs, scrapers, etc.

One simple way is to use BeautifulSoup. The following Python code extracts the links shown on the Google search results page for a user-supplied keyword.

Requirements:
– Python 3.5
– Beautiful Soup 4
– Internet connection
[Note: the code might need some syntactic changes if other versions are used]
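Before running the script, it can help to confirm that the requirements above are met. The following is a minimal sketch using only the standard library; the helper name `check_requirements` and its parameters are hypothetical and not part of the original code.

```python
import importlib.util
import sys


# Hypothetical helper: reports which of the listed requirements are missing.
def check_requirements(min_python=(3, 5), module_name="bs4"):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    # Compare the running interpreter against the minimum version
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is needed" % min_python)
    # find_spec returns None when the top-level module cannot be found
    if importlib.util.find_spec(module_name) is None:
        problems.append("module '%s' is not installed" % module_name)
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

If nothing is printed, Python and Beautiful Soup 4 (imported as `bs4`) are both available. The internet connection is not checked here.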

The code can also be downloaded from GitHub HERE.

from urllib.parse import quote_plus
import urllib.request

from bs4 import BeautifulSoup


# Collect the relevant URLs from the search engine for the
# user-supplied keyword
def collect_urls():
    # Build an opener that sends a browser-like User-Agent header
    opener = urllib.request.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]

    # Accept a keyword from the user; spaces are allowed because the
    # keyword is URL-encoded below
    key = input("Enter the keyword: ")

    # Prepare the query URL for the Google search engine.
    # Only the first page of results is requested (start=0)
    url = "http://www.google.com/search?q=" + quote_plus(key) + "&start=0"

    # Open the page and parse it with BeautifulSoup.
    # The built-in "html.parser" is used; any other suitable parser works too
    page = opener.open(url)
    soup = BeautifulSoup(page, "html.parser")

    # Write the text of every <cite> tag (the displayed result links),
    # one per line; the file is closed automatically on leaving the block
    with open("links.txt", "w", encoding="utf-8") as file:
        for cite in soup.find_all('cite'):
            file.write(cite.text)
            file.write("\n")


collect_urls()
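The script above reads only the first results page. As a hedged sketch of how the query URL could be built for later pages, the snippet below uses `urllib.parse.urlencode`, which also takes care of spaces and special characters in the keyword. The function name `build_search_url` and the constant `RESULTS_PER_PAGE` are hypothetical, and the value of 10 results per page is an assumption about how the `start` offset maps to pages.

```python
from urllib.parse import urlencode

# Assumption: each results page holds 10 entries, so "start" moves in steps of 10
RESULTS_PER_PAGE = 10


# Hypothetical helper, not part of the original script
def build_search_url(keyword, page=0):
    """Return the search URL for a zero-based results page."""
    params = {"q": keyword, "start": page * RESULTS_PER_PAGE}
    # urlencode escapes the keyword (spaces become "+", "&" becomes "%26")
    return "http://www.google.com/search?" + urlencode(params)


print(build_search_url("data science"))
print(build_search_url("data science", page=2))
```

The returned string could be passed to `opener.open()` in place of the hand-built `url`. Whether the search engine serves the page to a script is outside the scope of this sketch.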