text_dataset_from_directory
TensorFlow can load text directly from a directory and split it into training and validation subsets, all with the same method.
The training directory contains Java, Python, C#, and JavaScript questions, one subdirectory per language, each holding 2000 text files.
Python3
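import os

# DATA_DIR is assumed to point at the root of the downloaded dataset.
TRAIN_DIR = f"{DATA_DIR}/train"
TEST_DIR = f"{DATA_DIR}/test"

# Count the text files in each class subdirectory.
for i in os.listdir(TRAIN_DIR):
    text_len = len(os.listdir(f"{TRAIN_DIR}/{i}"))
    print(f"{i} contains {text_len} text")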
Output:
java contains 2000 text
python contains 2000 text
csharp contains 2000 text
javascript contains 2000 text
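The count above depends on a layout with one subdirectory per class, which is also what text_dataset_from_directory uses to infer labels. The following is a minimal sketch of that layout using only the standard library: it builds a small throwaway tree and counts the files per class. The names `toy_train`, `build_toy_tree`, and `count_files_per_class`, as well as the placeholder file contents, are made up for illustration and are not part of the original dataset.

```python
import os
import tempfile


def count_files_per_class(root):
    """Return {class_name: file_count} for each subdirectory of root."""
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if os.path.isdir(class_dir):
            counts[name] = len(os.listdir(class_dir))
    return counts


def build_toy_tree(root, class_names, files_per_class):
    """Create root/<class_name>/<i>.txt with placeholder text (illustrative only)."""
    for name in class_names:
        class_dir = os.path.join(root, name)
        os.makedirs(class_dir, exist_ok=True)
        for i in range(files_per_class):
            path = os.path.join(class_dir, f"{i}.txt")
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(f"placeholder {name} question {i}\n")


# Build a hypothetical miniature of the training directory and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    toy_train = os.path.join(tmp, "toy_train")
    build_toy_tree(toy_train, ["csharp", "java", "javascript", "python"], 3)
    toy_counts = count_files_per_class(toy_train)
    for class_name, n in toy_counts.items():
        print(f"{class_name} contains {n} text files")
```

Sorting the subdirectory names keeps the class order stable from run to run, which is convenient when the names are later mapped to integer labels.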
To create validation data and assign labels, we now use the text_dataset_from_directory method, which loads text from a directory and labels each file by the subdirectory it sits in.
Python3
from tensorflow import keras

# Use the same seed and validation_split in both calls so that
# the training and validation subsets do not overlap.
training_data = keras.utils.text_dataset_from_directory(
    TRAIN_DIR,
    batch_size=32,
    validation_split=0.2,
    subset='training',
    seed=42
)
validation_data = keras.utils.text_dataset_from_directory(
    TRAIN_DIR,
    batch_size=32,
    validation_split=0.2,
    subset='validation',
    seed=42
)
Output:
Found 8000 files belonging to 4 classes.
Using 6400 files for training.
Found 8000 files belonging to 4 classes.
Using 1600 files for validation.
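The file counts in this output follow from validation_split = 0.2 applied to the 8000 files found. As a rough sketch of that arithmetic only (the helper `split_counts` is hypothetical and is not Keras code), the validation share is taken as a fraction of the total and the remainder goes to training:

```python
def split_counts(num_files, validation_split):
    """Return (training_count, validation_count) for a fractional split.

    Illustrative helper: it only reproduces the arithmetic behind the
    reported counts and does not shuffle or select any files.
    """
    num_val = int(validation_split * num_files)
    return num_files - num_val, num_val


train_count, val_count = split_counts(8000, 0.2)
print(f"Using {train_count} files for training.")
print(f"Using {val_count} files for validation.")
```

With batch_size = 32, these counts correspond to 200 training batches and 50 validation batches per pass over the data.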
This is how you can load text in TensorFlow.
Load text in Tensorflow
In this article, we will see how to load text in TensorFlow using Python.
TensorFlow is an open-source machine learning platform that helps create production-ready machine learning pipelines. With TensorFlow, one can manage large datasets and build a neural network model in a few lines of code. These datasets may include audio, images, video, or text. In this article, we focus on text datasets.