How to Create ARFF File in Weka Tool
In this article, we will learn about ARFF (Attribute-Relation File Format) files and how to create one.
As the name suggests, an ARFF file describes a list of instances sharing a set of attributes. These files are supported by the WEKA machine learning tool and are used for operations such as data preprocessing and data cleaning.
Structure of the file
An ARFF file contains 2 sections:
- Header Section
- Data Section
All the keywords in an ARFF file start with the @ symbol.
1. Header Section
This section contains information about the dataset, such as the name of the relation, the columns, and the type of each column. The header section has 2 parts: the table/relation part and the attribute part.
@relation: used to give the table name
@attribute: used to give a column name and its type
Datatypes:
nominal: a fixed set of allowed values, listed inside curly brackets (like constants)
string: accepts arbitrary text values
numeric: used to store numbers
date: used to store dates; an optional format pattern follows Java's SimpleDateFormat syntax (for example dd-MM-yyyy, where uppercase MM is the month and lowercase mm would mean minutes)
Syntax:
@relation tablename
@attribute column_name type
Example:
@relation "employee"
@attribute f_name string
@attribute l_name string
@attribute contact_num numeric
@attribute dept {HR,IT,MANAGEMENT,MAINTAINANCE}
@attribute DOB date dd-MM-yyyy
@attribute city string
Here the dept column has a nominal data type, so it can only accept one of the values listed inside the curly brackets.
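The header rules above can also be sketched in code. The following is a minimal Python illustration, not part of Weka itself: the helper name arff_header and its (name, type) pair format are assumptions made for this example. A type given as a list is written out as a nominal value set. The date pattern uses uppercase MM because Weka reads date formats as Java SimpleDateFormat patterns.

```python
def arff_header(relation, attributes):
    """Build the header section of an ARFF file as a string.

    `attributes` is a list of (name, type) pairs. A type given as a
    list or tuple is written as a nominal value set in curly brackets.
    (Hypothetical helper for illustration; not a Weka API.)
    """
    lines = [f"@relation {relation}"]
    for name, attr_type in attributes:
        if isinstance(attr_type, (list, tuple)):
            # Nominal attribute: enumerate the allowed values.
            attr_type = "{" + ",".join(attr_type) + "}"
        lines.append(f"@attribute {name} {attr_type}")
    return "\n".join(lines)


header = arff_header(
    "employee",
    [
        ("f_name", "string"),
        ("dept", ["HR", "IT", "MANAGEMENT", "MAINTAINANCE"]),
        ("DOB", "date dd-MM-yyyy"),
    ],
)
print(header)
```

The sketch does not quote names that contain spaces, so it is only suitable for simple identifiers like the ones used in this article.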
2. Data section
The data section holds the records (entries) for the columns declared in the header. Values are written in the same order as the attributes in the header section.
The data section starts with @data and must come after the header section. Each line holds exactly one record.
@data: used to start the data section
%: used to start a comment in the file
Syntax:
@data
<record1>
<record2>
.
.
<record N>
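All records must follow the same format as the attributes defined in the header section. The records below match the header of the complete file shown further down, which also declares an id column.
Example: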
1,naman,N,1234556678,IT,02-08-2000,rjt
2,yash,M,1234556679,HR,04-05-2001,amd
3,kishan,G,1214556678,MANAGEMENT,02-11-2001,pbr
4,?,?,5234556678,IT,03-05-2000,amd
The entire file would look like this:
emp.arff file:
@relation "employee"
@attribute id numeric
@attribute f_name string
@attribute l_name string
@attribute contact_num numeric
@attribute dept {HR,IT,MANAGEMENT,MAINTAINANCE}
@attribute DOB date dd-MM-yyyy
@attribute city string
@data
1,naman,N,1234556678,IT,02-08-2000,rjt
2,yash,M,1234556679,HR,04-05-2001,amd
3,kishan,G,1214556678,MANAGEMENT,02-11-2001,pbr
4,?,?,5234556678,IT,03-05-2000,amd
Values are separated by commas (,), and an empty or missing value for a particular column is represented by the ? sign.
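If the records already live in a program rather than a text editor, the same file can be generated. The following is a rough Python sketch under a few assumptions: the helper names format_record and write_arff are made up for this example, None stands for a missing value, and the file is written to the system temporary directory. It does not quote values, so it only suits values without spaces or commas, like the ones in this article.

```python
import os
import tempfile

# Header section, copied from the emp.arff example in this article.
HEADER = """@relation "employee"
@attribute id numeric
@attribute f_name string
@attribute l_name string
@attribute contact_num numeric
@attribute dept {HR,IT,MANAGEMENT,MAINTAINANCE}
@attribute DOB date dd-MM-yyyy
@attribute city string
"""

# One tuple per record, in the same order as the attributes above.
# None marks a missing value.
records = [
    (1, "naman", "N", 1234556678, "IT", "02-08-2000", "rjt"),
    (2, "yash", "M", 1234556679, "HR", "04-05-2001", "amd"),
    (3, "kishan", "G", 1214556678, "MANAGEMENT", "02-11-2001", "pbr"),
    (4, None, None, 5234556678, "IT", "03-05-2000", "amd"),
]


def format_record(record):
    """Join one record with commas, writing None as the missing marker '?'."""
    return ",".join("?" if value is None else str(value) for value in record)


def write_arff(path, header, records):
    """Write the header, the @data keyword, then one record per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(header)
        f.write("@data\n")
        for record in records:
            f.write(format_record(record) + "\n")


# Written to the temp directory here; any writable path would do.
path = os.path.join(tempfile.gettempdir(), "emp.arff")
write_arff(path, HEADER, records)
print("wrote", path)
```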
How to Create and Open an ARFF File
You need to have the Weka tool installed on your machine. You can check How to install Weka.
Step 1: Open any text editor and paste the contents of the emp.arff file shown above.
Step 2: Save the file with the .arff extension, for example emp.arff.
Step 3: Open the Weka tool.
Step 4: Click on Explorer, then click on Open file.
Step 5: Select/locate the ARFF file on disk, then click on Open.
Step 6: The file is now loaded. Click on Edit in the Preprocess tab.
Step 7: The dataset is shown in a table view.
So this is how you can work with an ARFF file. With the Weka tool, various operations can be performed on the loaded dataset. Missing values are shown as empty cells.
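Before loading a hand-written file in Weka, it can help to check that every record has one value per attribute and to see where values are missing. The following is a simplified Python sketch, not Weka's own parser: the function name read_arff is an assumption, and it only handles unquoted values without embedded commas, as in the example file. Keywords are matched case-insensitively, and lines starting with % are skipped as comments.

```python
def read_arff(text):
    """Parse a simple ARFF document into (attribute_names, rows).

    Missing values ('?') become None. Raises ValueError when a record
    does not have exactly one value per declared attribute.
    """
    names, rows = [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(names):
                raise ValueError(
                    f"expected {len(names)} values, got {len(values)}: {line}"
                )
            rows.append([None if v == "?" else v for v in values])
        elif line.lower().startswith("@attribute"):
            names.append(line.split()[1])  # second token is the column name
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


SAMPLE = """% employee records
@relation "employee"
@attribute id numeric
@attribute f_name string
@attribute l_name string
@attribute contact_num numeric
@attribute dept {HR,IT,MANAGEMENT,MAINTAINANCE}
@attribute DOB date dd-MM-yyyy
@attribute city string
@data
1,naman,N,1234556678,IT,02-08-2000,rjt
2,yash,M,1234556679,HR,04-05-2001,amd
3,kishan,G,1214556678,MANAGEMENT,02-11-2001,pbr
4,?,?,5234556678,IT,03-05-2000,amd
"""

names, rows = read_arff(SAMPLE)
# Count missing values per column; these are the cells Weka shows as empty.
missing = {name: sum(row[i] is None for row in rows) for i, name in enumerate(names)}
print(missing)
```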