How to load text in TensorFlow?

Text is one of the most common forms of data. Documentation, media posts, social media conversations, and blog articles all come in the form of text. This text arrives in raw form and has to be loaded and preprocessed before it can be used in Machine Learning models. TensorFlow provides utilities for loading text.

Let’s take an example to demonstrate how to load and preprocess text.

Before we proceed, let us first import the required modules and download the dataset:

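Python3

import os
import pathlib

import tensorflow as tf
from tensorflow import keras

# Archive of Stack Overflow questions, grouped by programming language
url = 'https://storage.googleapis.com/download.tensorflow.org/data/stack_overflow_16k.tar.gz'

# Download the archive and extract it (untar=True) under cache_dir
download = keras.utils.get_file(
    origin=url, untar=True, cache_dir='stack_overflow')

# The extracted folders sit in the parent directory of the returned path
DATA_DIR = pathlib.Path(download).parent
print(os.listdir(DATA_DIR))
print(os.listdir(DATA_DIR / 'train'))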


Output:

['train', 'stack_overflow_16k.tar.gz', 'test', 'README.md']
['java', 'python', 'csharp', 'javascript']

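In the above code, we downloaded the Stack Overflow question dataset using the Keras API. The keras.utils.get_file method takes the origin URL of the archive and downloads it under cache_dir. Setting untar=True extracts the archive automatically after the download. The first listing shows the train and test folders, and the second shows that train holds one subfolder per class: java, python, csharp, and javascript. A Machine Learning model is trained on training data, checked on validation data, and finally evaluated on test data. The archive ships only train and test folders, so a validation set has to be held out from the training data.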
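
To make the folder layout and the train/validation idea concrete, here is a minimal sketch that uses only the Python standard library. It is not TensorFlow's loading API. The helper names list_examples and split_train_validation, the 80/20 split, the seed, and the tiny temporary folder of made-up question files (standing in for DATA_DIR/train) are all assumptions made for this illustration.

```python
import pathlib
import random
import tempfile


def list_examples(split_dir):
    """Return sorted (file_path, label) pairs; the label is the class folder name."""
    split_dir = pathlib.Path(split_dir)
    pairs = []
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        for txt_file in sorted(class_dir.glob("*.txt")):
            pairs.append((txt_file, class_dir.name))
    return pairs


def split_train_validation(pairs, validation_fraction=0.2, seed=42):
    """Shuffle reproducibly, then hold out a fraction of the pairs for validation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical stand-in for DATA_DIR / "train": four class folders, five files each.
with tempfile.TemporaryDirectory() as tmp:
    train_dir = pathlib.Path(tmp) / "train"
    for label in ["csharp", "java", "javascript", "python"]:
        (train_dir / label).mkdir(parents=True)
        for i in range(5):
            (train_dir / label / f"{i}.txt").write_text(
                f"sample {label} question {i}", encoding="utf-8")

    examples = list_examples(train_dir)
    train_pairs, val_pairs = split_train_validation(examples)

    # Preview one raw example while the temporary files still exist.
    first_path, first_label = examples[0]
    first_text = first_path.read_text(encoding="utf-8")
    print(first_label, "->", first_text)
    print(len(examples), len(train_pairs), len(val_pairs))
```

Pointing list_examples at the real train folder from the download above should work the same way, assuming each class folder holds plain .txt files.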

Load text in TensorFlow

In this article, we are going to see how to load text in TensorFlow using Python.

TensorFlow is an open-source Machine Learning platform that helps to create production-ready Machine Learning pipelines. Using TensorFlow, one can manage large datasets and develop a neural network model in a few lines of code. These large datasets may include audio, image, video, or text. In this article, we will focus on text datasets.
