Reading Tabular Data from Files in Julia
Julia is a high-level, high-performance, dynamic programming language that can load, save, and manipulate data stored in many file formats, which makes it well suited to data science, analysis, and machine learning. Tabular data is data organized into rows and columns, and it is commonly read from plain-text, CSV, and Excel files.
To make these operations easier, we can install the Queryverse.jl meta-package, which bundles several useful data packages such as Query.jl, FileIO.jl, and CSVFiles.jl.
```julia
# Install the Queryverse meta-package (and DataFrames, which the examples also use)
using Pkg
Pkg.add("Queryverse")
Pkg.add("DataFrames")
```
Reading Tabular Data from Text Files
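To read data from a text file, first open it with the open() function. The do-block form used below closes the file automatically when the block ends. Inside the block, readline() reads one line per call and eof() reports when the end of the file has been reached, so the rows of the table can be processed one at a time: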
```julia
# Read the file's contents line by line
open("geek.txt") do f
    line = 0                         # number of lines read so far
    while !eof(f)                    # loop until the end of the file
        s = readline(f)              # read the next line (newline stripped)
        line += 1
        println("$(line - 1). $s")   # print each line with a 0-based index
    end
end
```
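The loop above only prints each line. To turn the lines into a table, each line can also be split into fields. The following is a minimal sketch using only Base Julia; because the contents of geek.txt are not shown, it writes a small hypothetical whitespace-separated table to a temporary file, and it assumes the first line holds the column names.

```julia
# Hypothetical sample table written to a temporary file
path, io = mktemp()
write(io, "name class score\nAsha 10 91\nRavi 9 78\n")
close(io)

rows = Vector{Vector{String}}()
open(path) do f
    for line in eachline(f)                # eachline strips the trailing newline
        isempty(strip(line)) && continue   # skip blank lines
        push!(rows, String.(split(line)))  # split on runs of whitespace
    end
end

header = rows[1]       # assumption: the first line holds the column names
data = rows[2:end]     # the remaining lines are the data rows
println(header)
println(data)
```

Every field is kept as a String here; converting columns to numbers is left to the reader or to a dedicated package such as the ones discussed next.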
Reading Tabular Data from CSV Files
A DataFrame stores data in tabular form. With Queryverse.jl loaded, the load() function from FileIO.jl can read a DataFrame from a CSV or Excel file: FileIO.jl chooses a backend based on the file type and hands .csv files to CSVFiles.jl. Piping the result into DataFrame materializes it as a DataFrame.
```julia
# Load the required packages
using DataFrames, Queryverse

# Read the CSV file and convert the result into a DataFrame
df = load("marks.csv") |> DataFrame
```
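For intuition about what load() |> DataFrame produces, the following Base-only sketch turns hypothetical CSV text into a column table: a NamedTuple of vectors keyed by the header names, where a column is parsed as integers only if every value allows it. toy_csv_columns is a hypothetical helper, not part of CSVFiles.jl, and unlike a real CSV parser it ignores quoting and missing values.

```julia
# Hypothetical toy CSV reader: no quoting, no missing values (illustration only)
function toy_csv_columns(text::AbstractString)
    lines = filter(!isempty, split(text, '\n'))
    names = Symbol.(strip.(split(lines[1], ',')))          # header row -> column names
    rows = [strip.(split(l, ',')) for l in lines[2:end]]   # data rows as string fields
    cols = map(eachindex(names)) do j
        raw = [r[j] for r in rows]
        nums = tryparse.(Int, raw)
        # keep the column as Int only if every field parses as an integer
        any(isnothing, nums) ? String.(raw) : Int.(nums)
    end
    return NamedTuple{Tuple(names)}(Tuple(cols))
end

t = toy_csv_columns("class,score\nA,91\nB,78\n")
println(t.class)   # column access by name, similar to df.class on a DataFrame
println(t.score)
```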
Some CSV files separate values with a character other than a comma, such as a semicolon. Passing that character as the second argument of load() tells the parser which delimiter to split on, so the values are read into proper columns instead of one long field per line:
```julia
# Read a semicolon-delimited file
df = load("marks_sc.csv", ';') |> DataFrame
```
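A minimal Base-only illustration of why the delimiter must be specified, using a hypothetical semicolon-separated record: splitting on the default comma leaves the whole record as a single field, while splitting on ';' recovers the individual columns.

```julia
line = "A;91;Pass"   # hypothetical semicolon-separated record

fields_comma = split(line, ',')   # wrong delimiter: the whole line stays one field
fields_semi = split(line, ';')    # correct delimiter: three separate fields

println(length(fields_comma))     # 1
println(fields_semi)
```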
By default, the first row of the file supplies the column names of the DataFrame. To treat the first row as ordinary data instead, set the header_exists keyword argument to false; the columns then receive default names.
```julia
# Read the file, treating the first row as data rather than column names
df = load("marks.csv", header_exists = false) |> DataFrame
```
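To illustrate what header_exists changes, the sketch below uses a hypothetical toy_read helper written in plain Julia (not the CSVFiles.jl implementation). With the header enabled, the first row becomes the column names; with it disabled, the first row stays in the data and the columns get placeholder names. The Column1, Column2, … naming scheme is an assumption made for this sketch and may differ from the names CSVFiles.jl assigns.

```julia
# Hypothetical toy reader illustrating a header_exists-style switch
function toy_read(text::AbstractString; header_exists::Bool = true)
    rows = [String.(split(l, ',')) for l in split(text, '\n') if !isempty(l)]
    if header_exists
        names = rows[1]                                    # first row supplies the column names
        data = rows[2:end]
    else
        names = ["Column$j" for j in 1:length(rows[1])]   # assumed placeholder names
        data = rows                                        # first row is kept as data
    end
    return names, data
end

text = "class,score\nA,91\nB,78\n"
names_h, data_h = toy_read(text)
names_n, data_n = toy_read(text; header_exists = false)
println(names_h, " ", length(data_h))   # header used: 2 data rows
println(names_n, " ", length(data_n))   # no header: 3 data rows
```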
While loading the file, we can also set our own column names by passing a vector of names to the colnames keyword argument:
```julia
# Read the file and assign new column names
df = load("marks.csv", colnames = ["class", "score"]) |> DataFrame
```
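The following plain-Julia sketch shows the idea behind colnames with a hypothetical helper, toy_read_colnames: the supplied names replace the file's own header, and the number of names must match the number of columns. It assumes the file's header row is discarded rather than kept as data; that is how this sketch behaves, not a documented guarantee of CSVFiles.jl.

```julia
# Hypothetical helper: use caller-supplied column names instead of the file's header
function toy_read_colnames(text::AbstractString, colnames::Vector{String})
    rows = [split(l, ',') for l in split(text, '\n') if !isempty(l)]
    ncols = length(rows[1])
    length(colnames) == ncols ||
        throw(ArgumentError("expected $ncols column names, got $(length(colnames))"))
    data = rows[2:end]   # assumption: the file's own header row is discarded
    cols = [[String(r[j]) for r in data] for j in 1:ncols]
    return NamedTuple{Tuple(Symbol.(colnames))}(Tuple(cols))
end

t = toy_read_colnames("subject,marks\nA,91\nB,78\n", ["class", "score"])
println(t.class)
println(t.score)
```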
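The skiplines_begin keyword argument skips a given number of lines at the start of the file before parsing begins, which is useful when a title or notes sit above the table. The first line that is not skipped is then read as the header (unless header_exists = false).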
```julia
# Read the file after skipping its first line
df = load("marks.csv", skiplines_begin = 1) |> DataFrame
```
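The effect of skiplines_begin can be sketched in plain Julia with a hypothetical toy_skip helper: a title line above the table is dropped, and the next line is then read as the header. The input text is made up for illustration, and the helper is not how CSVFiles.jl is implemented.

```julia
# Hypothetical helper mimicking a skiplines_begin-style option
function toy_skip(text::AbstractString; skiplines_begin::Int = 0)
    lines = split(chomp(text), '\n')
    lines = lines[(skiplines_begin + 1):end]   # drop the first skiplines_begin lines
    header = String.(split(lines[1], ','))     # the next line is read as the header
    data = [String.(split(l, ',')) for l in lines[2:end]]
    return header, data
end

# Made-up input: a title line sits above the real header
text = "Marks report\nclass,score\nA,91\nB,78\n"
header, data = toy_skip(text; skiplines_begin = 1)
println(header)
println(data)
```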
Reading Tabular Data from Excel Files
Reading data from an Excel sheet works the same way as reading a CSV file, as discussed above, except that we pass a file with the .xlsx extension instead of .csv to load(), along with the name of the sheet we want to read.
```julia
# Read the sheet named "Sheet1" from an Excel file
df = load("marks.xlsx", "Sheet1") |> DataFrame
```
We can also skip leading rows and columns of an Excel sheet with the skipstartrows and skipstartcols keyword arguments, which set how many rows and columns at the start of the sheet are ignored:
```julia
# Read "Sheet1", skipping its first row and first column
df = load("marks.xlsx", "Sheet1", skipstartrows = 1, skipstartcols = 1) |> DataFrame
```
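Conceptually, skipping leading rows and columns amounts to slicing the grid of cells. The sketch below applies that idea to a hypothetical sheet held as a Julia matrix of strings; it does not read an .xlsx file, skip_start is a hypothetical helper, and treating the first remaining row as the header is an assumption of the sketch.

```julia
# Hypothetical sheet contents: a title row above the table and an id column on the left
sheet = [
    "Report" ""      ""
    "id"     "class" "score"
    "1"      "A"     "91"
    "2"      "B"     "78"
]

# Skipping r leading rows and c leading columns is a slice of the cell grid
skip_start(cells::AbstractMatrix, r::Integer, c::Integer) = cells[(r + 1):end, (c + 1):end]

t = skip_start(sheet, 1, 1)
println(t[1, :])   # the first remaining row would serve as the header
println(t)
```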