text_dataset_from_directory

TensorFlow can load text directly from a directory and split it into training and validation subsets, all with the same function, text_dataset_from_directory.

The training directory contains one subdirectory each for Java, Python, C#, and JavaScript questions, and each subdirectory holds 2000 text files.

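Python3

import os

# DATA_DIR is assumed to point at the extracted dataset root.
TRAIN_DIR = f"{DATA_DIR}/train"
TEST_DIR = f"{DATA_DIR}/test"

# Count the text files in each class subdirectory.
for class_name in os.listdir(TRAIN_DIR):
    text_len = len(os.listdir(f"{TRAIN_DIR}/{class_name}"))
    print(f"{class_name} contains {text_len} text")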


Output:

java contains 2000 text
python contains 2000 text
csharp contains 2000 text
javascript contains 2000 text
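
The one-subdirectory-per-class layout counted above is what directory-based loaders rely on. The following is a minimal sketch using only the standard library: it builds a toy copy of that layout in a temporary folder and maps each subdirectory name to an integer label in sorted order, which is the ordering Keras documents for inferred labels. The helper names build_toy_layout and infer_label_map and the toy file contents are assumptions made for illustration and are not part of the original dataset or of TensorFlow.

```python
import tempfile
from pathlib import Path

# Hypothetical class names mirroring the four folders listed above.
CLASS_NAMES = ["java", "python", "csharp", "javascript"]


def build_toy_layout(root: Path, files_per_class: int = 3) -> None:
    """Create root/<class_name>/<n>.txt, one subdirectory per class."""
    for name in CLASS_NAMES:
        class_dir = root / name
        class_dir.mkdir(parents=True, exist_ok=True)
        for n in range(files_per_class):
            (class_dir / f"{n}.txt").write_text(f"toy {name} question {n}\n")


def infer_label_map(root: Path) -> dict:
    """Map each class subdirectory to an integer label, in sorted name order."""
    class_dirs = sorted(p.name for p in root.iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(class_dirs)}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_toy_layout(root)
    label_map = infer_label_map(root)
    # Count the files per class, as the listing loop above does.
    file_counts = {name: len(list((root / name).glob("*.txt"))) for name in label_map}

print(label_map)
print(file_counts)
```

Under this assumption, csharp receives label 0 and python label 3, regardless of the order in which the folders were created.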

To create the validation split and assign labels, we now use text_dataset_from_directory, which loads text files from a directory and infers each file's label from the name of its subdirectory.

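Python3

from tensorflow import keras

# Use the same validation_split and seed in both calls so that
# the training and validation subsets do not overlap.
training_data = keras.utils.text_dataset_from_directory(
    TRAIN_DIR,
    batch_size=32,
    validation_split=0.2,
    subset='training',
    seed=42)

validation_data = keras.utils.text_dataset_from_directory(
    TRAIN_DIR,
    batch_size=32,
    validation_split=0.2,
    subset='validation',
    seed=42)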


Output:

Found 8000 files belonging to 4 classes.
Using 6400 files for training.

Found 8000 files belonging to 4 classes.
Using 1600 files for validation.
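
The counts in this output follow from the arguments passed above. As a rough sketch in plain Python (not the library's own code), the arithmetic below reproduces the 6400/1600 split from validation_split=0.2 and also estimates how many batches of 32 each subset yields. The function names split_sizes and batches_per_epoch are hypothetical helpers introduced only for this illustration.

```python
import math

TOTAL_FILES = 8000        # "Found 8000 files" in the output above
VALIDATION_SPLIT = 0.2    # same value passed to both calls
BATCH_SIZE = 32


def split_sizes(total: int, validation_split: float) -> tuple:
    """Return (training_count, validation_count) for a fractional split."""
    n_val = int(validation_split * total)
    return total - n_val, n_val


def batches_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches needed to cover n_samples (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


n_train, n_val = split_sizes(TOTAL_FILES, VALIDATION_SPLIT)
print(n_train, n_val)  # 6400 1600
print(batches_per_epoch(n_train, BATCH_SIZE),
      batches_per_epoch(n_val, BATCH_SIZE))  # 200 50
```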

This is how you can load text in TensorFlow.
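
To check what such a dataset contains, it can help to look at the inferred class names and one batch. The sketch below assumes TensorFlow 2.6 or later is installed and, so that it runs on its own, builds a small hypothetical corpus (4 classes, 5 tiny files each) in a temporary folder instead of using the real TRAIN_DIR. The toy file contents and the names toy_classes and toy_train are illustrative assumptions.

```python
import tempfile
from pathlib import Path

import tensorflow as tf

# Toy stand-in for TRAIN_DIR: 4 classes x 5 tiny files (illustrative only).
toy_classes = ["csharp", "java", "javascript", "python"]

with tempfile.TemporaryDirectory() as tmp:
    for name in toy_classes:
        class_dir = Path(tmp) / name
        class_dir.mkdir()
        for n in range(5):
            (class_dir / f"{n}.txt").write_text(f"toy {name} question {n}")

    toy_train = tf.keras.utils.text_dataset_from_directory(
        tmp,
        batch_size=4,
        validation_split=0.2,
        subset="training",
        seed=42)

    # Class names are inferred from the subdirectory names.
    class_names = toy_train.class_names
    # Pull one batch while the temporary files still exist.
    texts, labels = next(iter(toy_train))

print(class_names)
print(texts.shape, labels.shape)
# Each element pairs a raw text string with an integer class index.
for text, label in zip(texts.numpy(), labels.numpy()):
    print(class_names[label], "->", text.decode("utf-8"))
```

The same inspection should work on training_data and validation_data from the tutorial, where each batch holds up to 32 texts.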



Load text in Tensorflow

In this article, we are going to see how to load text in TensorFlow using Python.

TensorFlow is an open-source machine learning platform that helps to create production-ready machine learning pipelines. Using TensorFlow, one can manage large datasets and develop a neural network model in a few lines of code. These large datasets may include audio, image, video, or text. In this article, we will focus on text datasets.
