Scope for Improvement

We could use a dedicated database, like lucent or elastic search to make the search more efficient and fast.  But for the time being, we use the pandas library to get the path of the image to display to the user.

Project Idea – Searching news from Old Newspaper using NLP

We know that the newspaper is an enriched source of knowledge. When a person needs some information about a particular topic or subject he searches online, but it is difficult to get all old news articles from regional local newspapers related to our search. As not every local newspaper provides an online search for people.In this article, we will present an idea to overcome this problem.

Similar Reads

What project does?

This project uses images or pdf of newspaper images from old regional newspapers as input for the database. The model will extract the text from images using Pytesseract. The text from the Pytesseract would be cleaned by NLP practices to simplify and eliminate the words (stop words) that are not helpful for us. The data will be saved in the form of key-value pair in which keys have an image path and values have keywords in the image. Searching: When the user visits the website he will type the topic name or entity name in the search box then images of the newspaper will load on the screen....

Why NLP ?

Newspaper articles contain many articles, prepositions, and other stop words that are not useful to us, so NLP helps us to remove those stop words. It also helps to get unique words....

Use Case Diagram

...

Step By step Implementation:

Libraries installation...

Scope for Improvement

...

Project Application in Real-Life

...

Contact Us