What is Splitting and Exploding String Entries?

Splitting and exploding string entries are common operations performed on data frames. They are useful when a single cell holds several values packed into one string and separated by a delimiter such as a space or a comma.

Splitting

Splitting is the process of dividing a single string entry into several pieces according to a specified delimiter or pattern. It is frequently used when a single entry holds several values separated by a common character or sequence, such as a space, semicolon, or comma.

Python




import pandas as pd
 
data = {'Names': ['Alice,Bob,Charlie', 'David,Eve', 'Frank']}
df = pd.DataFrame(data)
print(df)


Output:

               Names
0  Alice,Bob,Charlie
1          David,Eve
2              Frank

After splitting

Python




df[['Name1', 'Name2', 'Name3']] = df['Names'].str.split(',', expand=True)
print(df)


Output:

               Names  Name1 Name2    Name3
0  Alice,Bob,Charlie  Alice   Bob  Charlie
1          David,Eve  David   Eve     None
2              Frank  Frank  None     None
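
By default, str.split returns one list per row rather than separate columns, and the n argument caps the number of splits. The following is a minimal sketch that reuses the Names data from above; the variable names as_lists and first_rest are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Names': ['Alice,Bob,Charlie', 'David,Eve', 'Frank']})

# Without expand=True, each entry becomes a Python list
as_lists = df['Names'].str.split(',')
print(as_lists.tolist())

# n=1 limits the split to the first delimiter only,
# so at most two columns are produced
first_rest = df['Names'].str.split(',', n=1, expand=True)
print(first_rest)
```

Rows with fewer pieces than the widest row are padded with missing values, as in the Name2 and Name3 columns above.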

Exploding

Creating distinct rows out of a column that holds lists or other list-like values is known as “exploding.” The explode method in pandas accomplishes this: it takes a column containing lists or arrays and generates a new row for each element, repeating the values in the other columns. Exploding is typically applied after splitting, to move the pieces of a split string into separate rows.

Python




import pandas as pd
 
# Create a list of names for each row
d = {'Names': [['Alice', 'Bob', 'Charlie'], ['David', 'Eve'], ['Frank']]}
df = pd.DataFrame(d)
print(df)


Output:

                   Names
0  [Alice, Bob, Charlie]
1           [David, Eve]
2                [Frank]

After using the explode method

Python




result = df.explode('Names')
print(result)


Output:

     Names
0    Alice
0      Bob
0  Charlie
1    David
1      Eve
2    Frank
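
Note that explode repeats the original index label for every row it creates. The short sketch below assumes pandas 1.1 or later, where the ignore_index parameter is available, and uses made-up sample data to show how to get a fresh index and what happens to an empty list.

```python
import pandas as pd

# Hypothetical data: the second row holds an empty list
df = pd.DataFrame({'Names': [['Alice', 'Bob'], [], ['Frank']]})

# ignore_index=True renumbers the resulting rows 0, 1, 2, ...
result = df.explode('Names', ignore_index=True)
print(result)
```

An empty list does not disappear: it produces a single row whose value is missing (NaN).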

Split Explode Pandas DataFrame String Entry to Separate Rows

Pandas provides an explode method that expands list-like entries into separate rows. Below, it is combined with str.split to turn comma-separated string entries into one row per value.

Step 1: Import the Pandas Library

Firstly, we’ll import the only library we need for this: pandas.

Python




#importing libraries
import pandas as pd


Step 2: Create a dataframe

Next, we need a dataframe to work with. For demo purposes, we’ll create a dummy dataframe using the pd.DataFrame() constructor in pandas.

Python




#making a dummy dataframe
df = pd.DataFrame({'id': [1, 2, 3], 'data': ['x, y', 'z, w', 'a']})
#printing dataframe
print(df)


Output:

   id  data
0   1  x, y
1   2  z, w
2   3     a

Step 3: Split the String Entry into a List

Next, we’ll use the str.split() method to split each string on the delimiter ', ' (a comma followed by a space), which turns every entry into a list. If the entries were separated by spaces instead, we would pass ' ' as the delimiter.

Python




#split the data column on ', '
df['data'] = df['data'].str.split(', ')
#print dataframe
print(df)


Output:

   id    data
0   1  [x, y]
1   2  [z, w]
2   3     [a]
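
Real data often has inconsistent spacing around the delimiter, in which case splitting on the fixed string ', ' would miss some entries. One option is to split on a regular expression instead. The sketch below assumes that whitespace around the commas is not significant; the messy frame is hypothetical.

```python
import pandas as pd

# Hypothetical data with uneven spacing around the commas
messy = pd.DataFrame({'id': [1, 2, 3], 'data': ['x,y', 'z ,  w', 'a']})

# A multi-character pattern is treated as a regular expression by default:
# optional whitespace, a comma, optional whitespace
messy['data'] = messy['data'].str.split(r'\s*,\s*')
print(messy)
```

Each entry is split cleanly regardless of how many spaces surround the comma, so no separate stripping step is needed afterwards.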

Step 4: Explode the List into Separate Rows

Finally, the DataFrame.explode() method places each list element in its own row, repeating the id value and the original index label, and gives us the desired final dataframe (shown below).

Python




#using the explode method to split the dataframe into rows
df = df.explode('data')
#print final dataframe
print(df)


Output:

   id data
0   1    x
0   1    y
1   2    z
1   2    w
2   3    a
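
Steps 3 and 4 can also be combined into a single chained expression. The following is one possible sketch, not the only way to write it. The use of assign and of ignore_index=True (pandas 1.1 or later) are choices made here for illustration, and long_df is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'data': ['x, y', 'z, w', 'a']})

# Split into lists, explode into rows, and renumber the index in one chain
long_df = (
    df.assign(data=df['data'].str.split(', '))
      .explode('data', ignore_index=True)
)
print(long_df)
```

Because assign returns a new frame, the original df keeps its comma-separated strings, which can be handy when both forms are needed.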

How to Split Explode Pandas DataFrame String Entry to Separate Rows

Sometimes when working with data, one may encounter string entries in a data frame that need to be split into separate rows. This can be a challenging task, especially when the data is large and complex. Fortunately, the Python library pandas provides functions that accomplish this easily and efficiently. In this article, we’ll look at how to convert string entries in a dataframe into separate rows using two pandas methods: split and explode.
