How to use Pandas and SciPy to read an ARFF file

Step – 1

After installing the required modules, we import them.

Python3




import pandas as pd
from scipy.io import arff


We will use the loadarff() function from the arff submodule of scipy.io (arff is a module, not a class). You can either import loadarff() directly at the beginning, or import just the arff module, as above, and call arff.loadarff() when needed.
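For reference, the two import styles look like this. Both names refer to the same SciPy function, so the choice is only a matter of taste.

```python
# Style 1: import the arff submodule, then call arff.loadarff()
from scipy.io import arff

# Style 2: import the loadarff() function directly
from scipy.io.arff import loadarff

# Both names point to the same function object
print(arff.loadarff is loadarff)  # True
```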

Step – 2

Download an ARFF file from the official WEKA website and save it in the same directory as your Python script, so that it can be loaded with a simple relative path. This article uses cpu.arff. We then use the loadarff() function to load the downloaded file and store the result in a variable.

Python3




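# loadarff() returns a tuple: (records as a NumPy structured array, attribute metadata)
# The relative path assumes cpu.arff is in the current working directory;
# in Google Colab an uploaded file usually lives at '/content/cpu.arff' instead.
arff_file = arff.loadarff('cpu.arff')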
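To see what loadarff() actually returns without downloading anything, the sketch below feeds it a small in-memory ARFF document through io.StringIO (loadarff() accepts either a path or a file-like object). The SAMPLE_ARFF text is a hypothetical excerpt built from the first two rows of the cpu.arff output shown in Step 4, trimmed to three attributes.

```python
import io

from scipy.io import arff

# Hypothetical excerpt of cpu.arff: three attributes, two records
SAMPLE_ARFF = """@relation cpu
@attribute MYCT numeric
@attribute MMIN numeric
@attribute class numeric
@data
125,256,198
29,8000,269
"""

# loadarff() returns a (data, meta) tuple
data, meta = arff.loadarff(io.StringIO(SAMPLE_ARFF))

print(meta.name)      # cpu  (the @relation name)
print(meta.names())   # ['MYCT', 'MMIN', 'class']
print(len(data))      # 2
print(data['MYCT'])   # the MYCT field of every record, as a NumPy array
```

The same tuple comes back when a real file path is passed, so arff_file[0] in the code above is the data and arff_file[1] is the metadata.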


Step – 3

Next, we pass the loaded data to the pandas DataFrame constructor to convert it into a pandas DataFrame.

Python3




df = pd.DataFrame(arff_file[0])


Here we pass arff_file[0] to DataFrame(). loadarff() returns a tuple of two elements: the records, as a NumPy structured (record) array, and a MetaData object describing the attributes. The index [0] selects the records (not the first column of the file), and pandas turns each ARFF attribute into a DataFrame column.
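One detail worth knowing when converting other ARFF files: SciPy stores nominal (categorical) attributes as byte strings, so they show up in the DataFrame as values like b'low'. The cpu.arff file used here is all numeric, so the sketch below uses a hypothetical file with a nominal attribute named label and decodes it with the pandas Series.str.decode() method, using the attribute types reported by arff_file[1].

```python
import io

import pandas as pd
from scipy.io import arff

# Hypothetical ARFF with one numeric and one nominal attribute
SAMPLE_ARFF = """@relation demo
@attribute MYCT numeric
@attribute label {low,high}
@data
125,low
29,high
"""

arff_file = arff.loadarff(io.StringIO(SAMPLE_ARFF))

# arff_file[0] holds the records, arff_file[1] the attribute metadata
df = pd.DataFrame(arff_file[0])
meta = arff_file[1]

# Nominal values arrive as bytes (b'low'); decode them to plain str
for name, kind in zip(meta.names(), meta.types()):
    if kind == 'nominal':
        df[name] = df[name].str.decode('utf-8')

print(df)
```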

Step – 4

Now we use common pandas methods such as head() and tail() to check whether the ARFF file has been converted into a DataFrame successfully.

Python3




df.head()


Output:

    MYCT    MMIN     MMAX   CACH  CHMIN  CHMAX  class
0  125.0   256.0   6000.0  256.0   16.0  128.0  198.0
1   29.0  8000.0  32000.0   32.0    8.0   32.0  269.0
2   29.0  8000.0  32000.0   32.0    8.0   32.0  220.0
3   29.0  8000.0  32000.0   32.0    8.0   32.0  172.0
4   29.0  8000.0  16000.0   32.0    8.0   16.0  132.0

Python3




df.tail()


Output:

      MYCT    MMIN    MMAX  CACH  CHMIN  CHMAX  class
204  124.0  1000.0  8000.0   0.0    1.0    8.0   42.0
205   98.0  1000.0  8000.0  32.0    2.0    8.0   46.0
206  125.0  2000.0  8000.0   0.0    2.0   14.0   52.0
207  480.0   512.0  8000.0  32.0    0.0    0.0   67.0
208  480.0  1000.0  4000.0   0.0    0.0    0.0   45.0

Python3




df['MYCT'].head(20)


Output:

0     125.0
1      29.0
2      29.0
3      29.0
4      29.0
5      26.0
6      23.0
7      23.0
8      23.0
9      23.0
10    400.0
11    400.0
12     60.0
13     50.0
14    350.0
15    200.0
16    167.0
17    143.0
18    143.0
19    110.0
Name: MYCT, dtype: float64
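Eyeballing head() and tail() works well for a quick look; the check can also be made programmatic. The sketch below, again using a hypothetical two-row excerpt with values from the head() output above, asserts that the DataFrame has one column per ARFF attribute (in the declared order) and one row per record in the @data section.

```python
import io

import pandas as pd
from scipy.io import arff

# Hypothetical two-row excerpt of cpu.arff
SAMPLE_ARFF = """@relation cpu
@attribute MYCT numeric
@attribute class numeric
@data
125,198
29,269
"""

data, meta = arff.loadarff(io.StringIO(SAMPLE_ARFF))
df = pd.DataFrame(data)

# One column per declared attribute, in the same order
assert list(df.columns) == list(meta.names())
# One row per record in the @data section
assert len(df) == len(data)

print(df.shape)   # (2, 2)
print(df.dtypes)  # both columns are float64
```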

Reading An Arff File To Pandas Dataframe

Attribute-Relation File Format (ARFF) is a file format developed by the Machine Learning Project of the Department of Computer Science at the University of Waikato, New Zealand. ARFF files are used mainly with WEKA (Waikato Environment for Knowledge Analysis), a collection of machine learning and data analysis tools released as free software under the GNU General Public License.
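An ARFF file is plain text with a header and a data section: lines starting with % are comments, @relation names the dataset, each @attribute line declares one column with its type, and @data marks the start of the comma-separated records. The sketch below shows a minimal, hypothetical ARFF document and what SciPy's loadarff() reports about its header.

```python
import io

from scipy.io import arff

# A minimal, hypothetical ARFF document
SAMPLE_ARFF = """% Comment lines start with a percent sign
@relation demo
@attribute temperature numeric
@attribute play {yes,no}
@data
85,no
70,yes
"""

data, meta = arff.loadarff(io.StringIO(SAMPLE_ARFF))

print(meta.name)     # demo
print(meta.names())  # ['temperature', 'play']
print(meta.types())  # ['numeric', 'nominal']
```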

In this article, we will see how we can convert an ARFF file into a Pandas data frame.

Prerequisites:

We will be using two modules here: pandas and SciPy.

To install them, run the following commands:

pip install pandas
pip install scipy
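To confirm that the installation worked, a quick check like the one below can be run; it only imports both packages and prints their versions, which will differ from machine to machine.

```python
# Quick check that both modules are installed and importable
import pandas as pd
import scipy

print("pandas", pd.__version__)
print("scipy", scipy.__version__)
```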


Approach 1: Using Pandas and SciPy

Step – 1...

Approach 2: Using liac-arff and Pandas

...

Conclusion

...
