How to Scrape Google Ngrams?

To scrape Google Ngram, we will use Python’s requests library and the urllib.parse module.

Now, we will create a function that extracts the data from the Google Ngram website. Read the comments in the code to follow along.

Python3

import requests 
import urllib.parse 
  
def runQuery(query, start_year=1850,  
             end_year=1860, corpus=26, 
             smoothing=0): 
  
    # converting a regular string to  
    # the standard URL format 
    # eg: "geeks for,geeks" will 
    # convert to "geeks%20for%2Cgeeks" 
    query = urllib.parse.quote(query) 
  
    # creating the URL 
    # (the parentheses let the expression 
    # span several lines) 
    url = ('https://books.google.com/ngrams/json?content=' + query +
           '&year_start=' + str(start_year) + '&year_end=' +
           str(end_year) + '&corpus=' + str(corpus) + '&smoothing=' +
           str(smoothing)) 
  
    # requesting data from the above url 
    response = requests.get(url) 
  
    # extracting the json data from the response we got 
    output = response.json() 
  
    # creating a list to store the ngram data 
    return_data = [] 
  
    if len(output) == 0: 
        # if no data is returned from the site, 
        # return the following message 
        return "No data available for this Ngram."
    else: 
        # if data is returned from the site, 
        # store each ngram's name and its 
        # time series in the return_data list 
        for entry in output: 
            return_data.append((entry['ngram'], entry['timeseries'])) 
  
    return return_data 

The runQuery function takes the query string as its only required argument; the remaining arguments have default values. By default, the year range is 1850 to 1860, the corpus is 26 (i.e. English language), and the smoothing is 0. The function builds the Google Ngram URL from these arguments and requests the data from that URL. Once the JSON data is returned, it stores the name and time series of each ngram as a tuple in a list and returns the list. If the site returns no data, it returns a message string instead.
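As an alternative to joining the URL by hand, the same query string can be assembled with urllib.parse.urlencode. The following is only a sketch: the helper name build_ngram_url and the constant BASE_URL are hypothetical and not part of the function above, and the parameter names are copied from the URL used there. It builds the URL without sending any request.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the URL used in runQuery above.
BASE_URL = "https://books.google.com/ngrams/json"


def build_ngram_url(query, start_year=1850, end_year=1860,
                    corpus=26, smoothing=0):
    """Hypothetical helper: build the request URL, nothing more."""
    params = {
        "content": query,
        "year_start": start_year,
        "year_end": end_year,
        "corpus": corpus,
        "smoothing": smoothing,
    }
    # quote_via=quote encodes spaces as %20 (urlencode's default,
    # quote_plus, would encode them as "+"), matching runQuery.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


print(build_ngram_url("geeks for,geeks"))
```

Because urlencode escapes every value, the query no longer needs a separate call to urllib.parse.quote.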

Now, let us use the runQuery function to find out the popularity of “Albert Einstein”.

Python3

query = "Albert Einstein"
  
print(runQuery(query)) 

Output:

[('Albert Einstein', [0.0, 0.0, 0.0, 0.0, 2.171790969285325e-09,
1.014315520464492e-09, 6.44787723214079e-10, 0.0, 7.01216085197131e-10, 0.0, 0.0])]
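The time series is a bare list of numbers, so it can help to attach a year to each value. The sketch below assumes that the list holds one value per year, starting at start_year; this matches the 11 values returned for 1850 to 1860 above, but it is an assumption rather than documented behaviour. The helper name label_by_year is hypothetical.

```python
def label_by_year(timeseries, start_year=1850):
    """Hypothetical helper: map each year to its frequency.

    Assumes one value per year, beginning at start_year.
    """
    return {start_year + offset: value
            for offset, value in enumerate(timeseries)}


# Time series copied from the output above.
series = [0.0, 0.0, 0.0, 0.0, 2.171790969285325e-09,
          1.014315520464492e-09, 6.44787723214079e-10, 0.0,
          7.01216085197131e-10, 0.0, 0.0]

by_year = label_by_year(series)

# Print only the years in which the phrase appeared at all.
for year, value in by_year.items():
    if value > 0:
        print(year, value)
```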

We can even enter multiple phrases in the same query by separating each phrase with commas.

Python3

query = "Albert Einstein,Isaac Newton"
  
print(runQuery(query)) 

Output:

[('Albert Einstein', [0.0, 0.0, 0.0, 0.0, 2.171790969285325e-09,
1.014315520464492e-09, 6.44787723214079e-10, 0.0, 7.01216085197131e-10,
0.0, 0.0]), ('Isaac Newton', [1.568728407619346e-06, 1.135979687205690e-06,
1.140318772741011e-06, 1.102130454455618e-06, 1.34806168716750e-06,
2.039112359852879e-06, 1.356955749542976e-06, 1.121004174819972e-06,
1.223622120960499e-06, 1.18965874662535e-06, 1.077695060303085e-06])]
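One simple way to compare the phrases in such a result is to average each time series. The sketch below is only an illustration: the helper name mean_frequencies is hypothetical, and it assumes the input is the list of (phrase, time series) tuples shown above, not the message string that runQuery returns when no data is available.

```python
def mean_frequencies(results):
    """Hypothetical helper: map each phrase to its mean frequency."""
    return {phrase: sum(series) / len(series)
            for phrase, series in results
            if series}  # skip empty series to avoid dividing by zero


# Result copied from the output above.
results = [
    ("Albert Einstein",
     [0.0, 0.0, 0.0, 0.0, 2.171790969285325e-09,
      1.014315520464492e-09, 6.44787723214079e-10, 0.0,
      7.01216085197131e-10, 0.0, 0.0]),
    ("Isaac Newton",
     [1.568728407619346e-06, 1.135979687205690e-06,
      1.140318772741011e-06, 1.102130454455618e-06,
      1.34806168716750e-06, 2.039112359852879e-06,
      1.356955749542976e-06, 1.121004174819972e-06,
      1.223622120960499e-06, 1.18965874662535e-06,
      1.077695060303085e-06]),
]

means = mean_frequencies(results)

# List the phrases from most to least frequent on average.
for phrase, mean in sorted(means.items(), key=lambda item: item[1],
                           reverse=True):
    print(f"{phrase}: {mean:.3e}")
```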

Scrape Google Ngram Viewer using Python

In this article, we will learn how to scrape Google Ngram using Python. Google Ngram Viewer (also called Google Books Ngram Viewer) is an online search engine that charts the frequencies of any set of search strings.
