Preventing duplicates by mentioning explicit suffix names for columns
In this method to prevent the duplicated while joining the columns of the two different data frames, the user needs to use the pd.merge() function which is responsible to join the columns together of the data frame, and then the user needs to call the drop() function with the required condition passed as the parameter as shown below to remove all the duplicates from the final data frame.
drop() function:
This function is used to drop specified labels from rows or columns..
Syntax:
DataFrame.drop(self, labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors=’raise’)
Parameters:
- labels: Index or column labels to drop.
- axis: Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’). {0 or ‘index’, 1 or ‘columns’}
- index: Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).
- columns: Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
- level: For MultiIndex, the level from which the labels will be removed.
- in place: If True, do operation inplace and return None.
- errors: If ‘ignore’, suppress error and only existing labels are dropped.
Example:
In this example, we are using the pd.merge() function to join the two data frames by inner join. Now, add a suffix called ‘remove’ for newly joined columns that have the same name in both data frames. Use the drop() function to remove the columns with the suffix ‘remove’. This will ensure that identical columns don’t exist in the new dataframe.
Python3
# import python pandas package import pandas as pd # import the numpy package import numpy as np # Create sample dataframe data1 and data2 data1 = pd.DataFrame(np.random.randint( 100 , size = ( 1000 , 3 )), columns = [ 'EMI' , 'Salary' , 'Debt' ]) data2 = pd.DataFrame(np.random.randint( 100 , size = ( 1000 , 3 )), columns = [ 'Salary' , 'Debt' , 'Bonus' ]) # Merge the DataFrames df_merged = pd.merge(data1, data2, how = 'inner' , left_index = True , right_index = True , suffixes = (' ', ' _remove')) # remove the duplicate columns df_merged.drop([i for i in df_merged.columns if 'remove' in i], axis = 1 , inplace = True ) print (merged) |
Output:
Prevent duplicated columns when joining two Pandas DataFrames
Column duplication usually occurs when the two data frames have columns with the same name and when the columns are not used in the JOIN statement. In this article, let us discuss the three different methods in which we can prevent duplication of columns when joining two data frames.
Syntax: pandas.merge(left, right, how=’inner’, on=None, left_on=None, right_on=None)
Explanation:
- left – Dataframe which has to be joined from left
- right – Dataframe which has to be joined from the right
- how – specifies the type of join. left, right, outer, inner, cross
- on – Column names to join the two dataframes.
- left_on – Column names to join on in the left DataFrame.
- right_on – Column names to join on in the right DataFrame.
Contact Us