Dummy Variable Encoding

We will use the pandas.get_dummies() function to convert categorical string data into numeric data. It creates one indicator (dummy) column for each distinct category.
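Here is a minimal sketch of the basic call. It uses a small hypothetical Series of colour names, not the article's dataset. Passing dtype=int is our own choice so the indicators come out as 0/1 on any recent pandas version. Without it, pandas 2.0 and later return boolean columns.

```python
import pandas as pd

# Hypothetical toy data; not the article's dataset.
colors = pd.Series(["red", "green", "blue", "green"], name="color")

# One indicator column per distinct category; dtype=int gives 0/1 instead of True/False.
dummies = pd.get_dummies(colors, dtype=int)
print(dummies)
```

Each row has exactly one 1, in the column that matches its original category. The columns are ordered by sorted category value.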

Syntax:

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

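Parameters:

  • data: array-like, Series, or DataFrame. The data to encode.
  • prefix: str, list of str, or dict of str, default None. String to prepend to the generated column names. When encoding a DataFrame, pass a list with one prefix per encoded column, or a dict that maps column names to prefixes. If None and data is a DataFrame, the original column names are used.
  • prefix_sep: str, default '_'. Separator placed between the prefix and the category value.
  • dummy_na: bool, default False. If True, add a column that indicates NaNs. If False, NaNs are ignored.
  • columns: list-like, default None. Column names in the DataFrame to be encoded. If None, all columns with object, string, or category dtype are encoded.
  • sparse: bool, default False. Whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False).
  • drop_first: bool, default False. Whether to get k-1 dummies out of k categorical levels by removing the first level.
  • dtype: dtype, default None. Data type for the new columns. When None, pandas 2.0 and later use bool (earlier versions used np.uint8).

Returns: DataFrame of dummy-coded data.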
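The sketch below uses several of these parameters together on a hypothetical DataFrame. The column names, values, prefixes, and the '=' separator are illustrative assumptions. The call encodes only the size and color columns and leaves price untouched. It also adds NaN indicator columns and drops the first level of each encoded column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two categorical columns (one with a missing value) and one numeric column.
df = pd.DataFrame({
    "size": ["S", "M", "L", "M"],
    "color": ["red", np.nan, "blue", "red"],
    "price": [10.0, 12.5, 15.0, 12.5],
})

encoded = pd.get_dummies(
    df,
    columns=["size", "color"],               # encode only these; "price" is passed through unchanged
    prefix={"size": "sz", "color": "col"},   # dict maps each encoded column to its prefix
    prefix_sep="=",                          # separator between prefix and category value
    dummy_na=True,                           # add an indicator column for missing values
    drop_first=True,                         # keep k-1 dummies per encoded column
    dtype=int,                               # 0/1 integers instead of the default bool
)
print(encoded.columns.tolist())
```

dummy_na=True adds an indicator column even when a column has no missing values. Here sz=nan is all zeros. drop_first removes the first level in sorted order (L for size, blue for color), not the first value seen in the data.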

How to convert categorical string data into numeric in Python?

Datasets often contain both numerical and categorical features. Categorical features are usually stored as strings, which are easy for people to read. However, most machine learning algorithms cannot work with string data directly. The categorical data must therefore be converted into numerical form before further processing.
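The sketch below shows this in practice on a small hypothetical DataFrame. It first picks out the non-numeric (categorical) columns, then lets get_dummies convert them so that every resulting column is numeric. We use exclude="number" so the selection catches string columns whether pandas stores them as object or string dtype.

```python
import pandas as pd

# Hypothetical mixed-type data; not the article's dataset.
df = pd.DataFrame({
    "age": [25, 32, 47],
    "city": ["Delhi", "Mumbai", "Delhi"],
    "gender": ["F", "M", "M"],
})

# Non-numeric columns are the categorical features that need encoding.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print(categorical_cols)  # ['city', 'gender']

# With columns=None, get_dummies encodes the object/string/category columns,
# uses their names as prefixes, and keeps numeric columns unchanged.
numeric_df = pd.get_dummies(df, dtype=int)
print(numeric_df.columns.tolist())  # ['age', 'city_Delhi', 'city_Mumbai', 'gender_F', 'gender_M']
```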

There are many ways to convert categorical data into numerical data. In this article, we'll discuss the two most commonly used methods:

  • Dummy Variable Encoding
  • Label Encoding
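Here is a quick side-by-side sketch of the two methods on one hypothetical column. Dummy encoding uses pd.get_dummies. Label encoding is shown with pandas category codes as a stand-in that needs no extra dependency. This is not necessarily the tool used in Method 2. scikit-learn's LabelEncoder is another common choice and also assigns codes in sorted order.

```python
import pandas as pd

# Hypothetical column; not the article's dataset.
df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Dummy variable encoding: one 0/1 column per category.
dummy_encoded = pd.get_dummies(df["size"], prefix="size", dtype=int)

# Label encoding: a single integer code per category; categories are sorted alphabetically.
label_encoded = df["size"].astype("category").cat.codes

print(dummy_encoded)
print(label_encoded.tolist())  # [2, 0, 1, 2]
```

Label encoding keeps a single column but implies an order. Here that order is large < medium < small, which is purely alphabetical, yet some models may treat it as meaningful. Dummy encoding avoids the implied order at the cost of one column per category.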

Both methods use the same dataset; the link to the dataset is here


Method 1: Dummy Variable Encoding


Stepwise Implementation

Step 1: Importing Libraries...

Method 2: Label Encoding

...

Stepwise Implementation

...
