Listing and Reading Files
Next, we prepare the document data: we list the student text files and read the content of each one.
Python3
import os  # needed for os.listdir()

# Get a list of student files (the .txt files in the current directory);
# sorted() keeps the order stable, since os.listdir() order is arbitrary
student_file = sorted(file for file in os.listdir() if file.endswith('.txt'))

# Read the content of each student's file
student_docs = [open(file).read() for file in student_file]

# Print the list of student files and their content
for filename, document in zip(student_file, student_docs):
    print(f"File: {filename}")
    print("Content:")
    print(document)
    print("-" * 30)  # Separator between documents
Output:
File: fatma.txt
Content:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's
standard dummy text ever since the 1500s,
------------------------------
File: john.txt
Content:
t is a long established fact that a reader will be distracted by the readable content of
a page when looking at its layout.
The point of using Lorem Ipsum
------------------------------
File: juma.txt
Content:
t is a long established fact that a reader will
be distracted by the readable content of a
page when looking at its layout. The point of using Lorem Ipsum
------------------------------
This code collects the names of the student text files in the current working directory, reads the content of each file, and prints every file name followed by its content. Printing both makes it easy to inspect the documents before working with them further.
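The same step can also be wrapped in a small reusable function. The following is a minimal sketch rather than the article's own code: the helper name `read_student_docs` and its `folder` parameter are hypothetical, and it assumes the files are UTF-8 encoded. It sorts the file names so the order is reproducible, and it reads through `pathlib` so that each file is closed right after it is read.

```python
from pathlib import Path


def read_student_docs(folder="."):
    """Return (file names, contents) for every .txt file in folder.

    Hypothetical helper for illustration; assumes UTF-8 text files.
    """
    # sorted() makes the order deterministic; directory listing order is arbitrary
    paths = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.name.endswith(".txt")
    )
    names = [p.name for p in paths]
    # read_text() opens, reads, and closes each file
    docs = [p.read_text(encoding="utf-8") for p in paths]
    return names, docs
```

Calling `student_file, student_docs = read_student_docs()` should give the same two lists as the code above, which can then be printed with the same `for` loop.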
Plagiarism Detection using Python
In this article, we are going to learn how to check for plagiarism using Python.
Plagiarism: Plagiarism means presenting someone else’s work, ideas, or information as your own without giving proper credit to the author. For example, copying text word for word from a source without quotation marks or attribution is plagiarism.
Table of Contents
- What is Plagiarism detection?
- Importing Libraries
- Listing and Reading Files
- TF-IDF Vectorization
- Calculating Cosine Similarity
- Creating Document-vector Pairs
- Checking Plagiarism
- Word Cloud Visualization
- Conclusion