Data Preprocessing/ Splitting into Train/Valid/Test Set

 

There are multiple ways to split the data: you can write a custom function, split on timestamps if the data has them, or use a ready-made function such as train_test_split from scikit-learn.
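As a rough illustration of two of these options, the sketch below splits a small synthetic frame first by timestamp and then with scikit-learn's train_test_split. The frame `demo_df`, its column names, and the 75/25 proportions are assumptions made for the example; this is not the wine dataset used in this article.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 rows with a timestamp, one feature, and a target.
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=20, freq="D"),
    "feature": rng.normal(size=20),
    "target": rng.integers(0, 2, size=20),
})

# Option 1: chronological split. The oldest 75% of rows are used for training
# and the newest 25% for validation, so no future rows leak into training.
ordered = demo_df.sort_values("timestamp")
cutoff = int(len(ordered) * 0.75)
time_train, time_val = ordered.iloc[:cutoff], ordered.iloc[cutoff:]

# Option 2: random split with scikit-learn (25% held out for validation).
rand_train, rand_val = train_test_split(demo_df, test_size=0.25, random_state=4)

print(len(time_train), len(time_val))
print(len(rand_train), len(rand_val))
```

A chronological split is usually the safer choice when rows are ordered in time, while a random split is fine when the rows are independent of each other.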

 

Here we use the DataFrame sample function to select 75% of the rows for the training set and keep the remaining 25% for the validation set. You can, and generally should, create a test set as well, but this dataset is very small and the primary goal here is to get familiar with the process of training a neural network.

 

Now let’s divide our dataset.

 

Python3




import tensorflow as tf
 
# 75% of the data is selected
train_df = df.sample(frac=0.75, random_state=4)
 
# drop the training rows from the original dataframe;
# the remaining 25% become the validation set
val_df = df.drop(train_df.index)
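If a test set is wanted as well, the same sample-and-drop approach can be applied twice. The sketch below is one way to do it, assuming a 60/20/20 split and a hypothetical stand-in frame `demo_df`; the proportions and names are illustrative and not taken from the article. It relies on the index being unique, since drop removes rows by index label.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the article's `df`: 100 rows, two columns.
demo_df = pd.DataFrame({
    "feature": np.arange(100, dtype=float),
    "quality": np.arange(100) % 6 + 3,
})

# 60% of the rows go to training.
train_part = demo_df.sample(frac=0.6, random_state=4)
remaining = demo_df.drop(train_part.index)

# Half of what is left (20% of the total) becomes the validation set,
# and the other half becomes the test set.
val_part = remaining.sample(frac=0.5, random_state=4)
test_part = remaining.drop(val_part.index)

print(len(train_part), len(val_part), len(test_part))
```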


Note that neural networks generally train better when all input features are on a similar scale. For example, if one column holds values between 1 and 10 and another holds values between 100 and 1000, it is advisable to rescale every column to the same range first.

The simplest way to do that is min-max scaling:

(value - min value of the column) / (range of the column)

where the range of a column is its maximum minus its minimum.

Python3




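# scaling to the (0, 1) range using statistics from the training set only
max_val = train_df.max(axis=0)
min_val = train_df.min(axis=0)

# named value_range to avoid shadowing Python's built-in range()
value_range = max_val - min_val
train_df = (train_df - min_val) / value_range

# the validation set reuses the training min and range
val_df = (val_df - min_val) / value_range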
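The same min-max scaling can also be done with scikit-learn's MinMaxScaler, which is not what this article uses but follows the same idea: learn the minimum and maximum from the training data, then apply them to the validation data. The sketch below uses two hypothetical toy frames, `toy_train` and `toy_val`, to show that validation values can fall outside the (0, 1) range when they lie beyond the training minimum or maximum.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy frames standing in for train_df and val_df.
toy_train = pd.DataFrame({"a": [1.0, 5.0, 10.0], "b": [100.0, 400.0, 1000.0]})
toy_val = pd.DataFrame({"a": [0.0, 12.0], "b": [550.0, 1000.0]})

scaler = MinMaxScaler()  # default feature_range is (0, 1)

# Learn min and max from the training data only, then reuse them.
train_scaled = pd.DataFrame(
    scaler.fit_transform(toy_train),
    columns=toy_train.columns,
    index=toy_train.index,
)
val_scaled = pd.DataFrame(
    scaler.transform(toy_val),
    columns=toy_val.columns,
    index=toy_val.index,
)

print(train_scaled)
print(val_scaled)
```

Here column "a" of the validation frame contains 0 and 12, which lie outside the training range of 1 to 10, so their scaled values come out below 0 and above 1. That is expected and usually harmless, but it is worth knowing about.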


Now that the data is scaled and split into training and validation sets, let's separate each set into features (the inputs) and targets, since that is how we will pass them to the model.

Python3




# now let's separate the features and targets
X_train = train_df.drop('quality',axis=1)
X_val = val_df.drop('quality',axis=1)
y_train = train_df['quality']
y_val = val_df['quality']
 
# We'll need to pass the shape
# of features/inputs as an argument
# in our model, so let's define a variable
# to save it.
input_shape = [X_train.shape[1]]
 
input_shape


 
 

Output:

 

[11]

 

This means that we’ll be passing 11 features as input to the first layer of our neural network.
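Before moving on, it can help to confirm that the features and targets line up. The sketch below shows one such check on a hypothetical four-row frame, `toy_df`, with made-up values and only three feature columns; the column names other than 'quality' are assumptions for the example.

```python
import pandas as pd

# Hypothetical miniature frame: three feature columns plus the 'quality' target.
toy_df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2],
    "pH": [3.51, 3.20, 3.26, 3.16],
    "sulphates": [0.56, 0.68, 0.65, 0.58],
    "quality": [5, 5, 6, 7],
})

X_toy = toy_df.drop("quality", axis=1)
y_toy = toy_df["quality"]

# One target per feature row, and the target must not leak into the features.
assert len(X_toy) == len(y_toy)
assert "quality" not in X_toy.columns

# Number of feature columns, wrapped in a list as in the article.
toy_input_shape = [X_toy.shape[1]]
print(toy_input_shape)  # [3]
```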

 
