How to Scrape Google Ngrams?

To scrape Google Ngram, we will use Python’s requests and urllib libraries.

Now, we will create a function that extracts the data from the Google Ngram website. Read the comments in the code to follow along.

Python3




import requests
import urllib.parse


def runQuery(query, start_year=1850,
             end_year=1860, corpus=26,
             smoothing=0):

    # converting a regular string to
    # the standard URL format
    # eg: "geeks for,geeks" will
    # convert to "geeks%20for%2Cgeeks"
    query = urllib.parse.quote(query)

    # creating the URL (parentheses let the
    # expression span several lines)
    url = ('https://books.google.com/ngrams/json?content=' + query +
           '&year_start=' + str(start_year) + '&year_end=' +
           str(end_year) + '&corpus=' + str(corpus) + '&smoothing=' +
           str(smoothing))

    # requesting data from the above url
    response = requests.get(url)

    # extracting the json data from the response we got
    output = response.json()

    # if no data is returned from the site,
    # return the following message
    if len(output) == 0:
        return "No data available for this Ngram."

    # otherwise, store each ngram's name and
    # its timeseries data in the return_data list
    return_data = []
    for item in output:
        return_data.append((item['ngram'], item['timeseries']))

    return return_data


The runQuery function takes the query string as its only required argument; the remaining parameters have default values. By default, the year range is 1850 to 1860, the corpus is 26 (i.e. English language), and the smoothing is 0. The function builds the Google Ngram URL from these arguments and requests it. Once the JSON data is returned, it stores each ngram's name and timeseries as a tuple in a list and returns the list. If the site returns no data, it returns a message string instead.
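The URL-building step can also be sketched with urllib.parse.urlencode, which encodes all parameters in one call instead of concatenating strings by hand. This is only an illustrative alternative: the helper name build_ngram_url is hypothetical and not part of the original code, while the endpoint and parameter names are the ones used in runQuery.

```python
from urllib.parse import urlencode, quote

# Endpoint taken from the runQuery function above.
NGRAM_ENDPOINT = "https://books.google.com/ngrams/json"


def build_ngram_url(query, start_year=1850, end_year=1860,
                    corpus=26, smoothing=0):
    """Hypothetical helper: build the same URL that runQuery requests."""
    params = {
        "content": query,
        "year_start": start_year,
        "year_end": end_year,
        "corpus": corpus,
        "smoothing": smoothing,
    }
    # quote_via=quote encodes spaces as %20; the default (quote_plus)
    # would encode them as '+' instead.
    return NGRAM_ENDPOINT + "?" + urlencode(params, quote_via=quote)


print(build_ngram_url("Albert Einstein"))
```

Because the function only builds a string, it can be checked without making a network request.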

Now, let us use the runQuery function to find out the popularity of “Albert Einstein”.

Python3




query = "Albert Einstein"
  
print(runQuery(query))


Output:

[('Albert Einstein', [0.0, 0.0, 0.0, 0.0, 2.171790969285325e-09,
1.014315520464492e-09, 6.44787723214079e-10, 0.0, 7.01216085197131e-10, 0.0, 0.0])]
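The output above contains 11 values, which matches one value per year for the default range of 1850 through 1860. A minimal sketch of pairing each value with its year, assuming the list really does hold one value per consecutive year (the helper name attach_years is hypothetical, and the sample data is copied from the output above):

```python
def attach_years(timeseries, start_year=1850):
    """Hypothetical helper: map each year to its frequency value.

    Assumes the list holds one value per consecutive year,
    beginning at start_year.
    """
    return {start_year + i: value for i, value in enumerate(timeseries)}


# Sample data copied from the output above.
einstein = [0.0, 0.0, 0.0, 0.0, 2.171790969285325e-09,
            1.014315520464492e-09, 6.44787723214079e-10, 0.0,
            7.01216085197131e-10, 0.0, 0.0]

by_year = attach_years(einstein)

# Print only the years in which the phrase has a non-zero frequency.
for year, value in by_year.items():
    if value > 0:
        print(year, value)
```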

We can even enter multiple phrases in the same query by separating each phrase with commas.

Python3




query = "Albert Einstein,Isaac Newton"
  
print(runQuery(query))


Output:

[('Albert Einstein', [0.0, 0.0, 0.0, 0.0, 2.171790969285325e-09,
1.014315520464492e-09, 6.44787723214079e-10, 0.0, 7.01216085197131e-10,
0.0, 0.0]), ('Isaac Newton', [1.568728407619346e-06, 1.135979687205690e-06,
1.140318772741011e-06, 1.102130454455618e-06, 1.34806168716750e-06,
2.039112359852879e-06, 1.356955749542976e-06, 1.121004174819972e-06,
1.223622120960499e-06, 1.18965874662535e-06, 1.077695060303085e-06])]
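With several phrases in one result, each tuple can be processed in a loop. The sketch below finds the year in which each phrase has its highest value, using the output above as sample data. The helper name peak_years is hypothetical; it assumes the result is a list of (ngram, timeseries) tuples with one value per year, so it should not be called on the "No data available" string that runQuery returns for an empty response.

```python
# Sample result copied from the output above:
# a list of (ngram, timeseries) tuples.
result = [
    ('Albert Einstein',
     [0.0, 0.0, 0.0, 0.0, 2.171790969285325e-09,
      1.014315520464492e-09, 6.44787723214079e-10, 0.0,
      7.01216085197131e-10, 0.0, 0.0]),
    ('Isaac Newton',
     [1.568728407619346e-06, 1.135979687205690e-06,
      1.140318772741011e-06, 1.102130454455618e-06,
      1.34806168716750e-06, 2.039112359852879e-06,
      1.356955749542976e-06, 1.121004174819972e-06,
      1.223622120960499e-06, 1.18965874662535e-06,
      1.077695060303085e-06]),
]


def peak_years(result, start_year=1850):
    """Hypothetical helper: return {ngram: (year, value)} for the
    highest value in each phrase's timeseries."""
    peaks = {}
    for ngram, timeseries in result:
        # index of the largest value in this phrase's series
        best = max(range(len(timeseries)), key=timeseries.__getitem__)
        peaks[ngram] = (start_year + best, timeseries[best])
    return peaks


peaks = peak_years(result)
for ngram, (year, value) in peaks.items():
    print(ngram, year, value)
```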



Scrape Google Ngram Viewer using Python

In this article, we will learn how to scrape Google Ngram using Python. Google Ngram (Google Books Ngram Viewer) is an online search engine that charts the frequencies of any set of search strings.
