Checking Plagiarism

With the document-vector pairs in place, we can implement the plagiarism-checking function, which calculates a similarity score for every pair of documents.

Python3

# Function to check for plagiarism
def find_plagiarism():
    # Initialize an empty set to store plagiarism results
    plagiarism_results = set()

    # doc_filename_pairs is the module-level list of (filename, vector)
    # pairs; it is only read here, so no global statement is needed
    for index, (student_a_file, student_a_vec) in enumerate(doc_filename_pairs):
        # Build the list of all other pairs by skipping the current position
        remaining_pairs = (
            doc_filename_pairs[:index] + doc_filename_pairs[index + 1:])

        # Iterate through the remaining pairs to compare with other students
        for student_b_file, student_b_vec in remaining_pairs:
            # Calculate the cosine similarity between student_a_vec and
            # student_b_vec (entry [0][1] of the returned similarity matrix)
            similarity_score = calc_cosine_similarity(
                student_a_vec, student_b_vec)[0][1]

            # Sort the filenames so (A, B) and (B, A) produce the same tuple
            sorted_filenames = sorted((student_a_file, student_b_file))

            # Create a plagiarism result tuple with sorted filenames and similarity score
            plagiarism_result = (
                sorted_filenames[0], sorted_filenames[1], similarity_score)

            # Add the result to the set; a repeated tuple is stored only once
            plagiarism_results.add(plagiarism_result)

    # Return the set of plagiarism results
    return plagiarism_results


# Print plagiarism results
plagiarism_results = find_plagiarism()
for result in plagiarism_results:
    print(result)

Output:

('fatma.txt', 'juma.txt', 0.22010931810615814)
('john.txt', 'juma.txt', 0.9999999999999998)
('fatma.txt', 'john.txt', 0.22010931810615814)

The code above defines a function ‘find_plagiarism’ that checks for plagiarism across the collection of student documents. It compares every document against every other document and calculates the cosine similarity of each pair. Each pair is actually visited twice (A against B, then B against A), but because the filenames are sorted before the result tuple is built and the results are stored in a set, ‘plagiarism_results’, each pair appears only once. Every tuple holds the two filenames and their cosine similarity score. In the output above, john.txt and juma.txt score about 1.0, meaning their vectors are nearly identical and the pair is potentially plagiarized, while fatma.txt scores about 0.22 against each of them.
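The function above compares each pair in both directions and relies on the set to drop the duplicates. As an alternative, here is a minimal sketch that visits each unordered pair exactly once with itertools.combinations and sorts the results so the most similar pairs come first. The names ‘cosine’, ‘find_plagiarism_once’ and ‘sample_pairs’ are hypothetical, and the toy vectors are made up for illustration; they are not the TF-IDF vectors produced earlier in the article.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    # Assumption: neither vector is all zeros, so the norms are non-zero
    return dot / (norm_u * norm_v)


def find_plagiarism_once(pairs):
    """Compare every unordered pair of (filename, vector) entries once."""
    results = []
    for (file_a, vec_a), (file_b, vec_b) in combinations(pairs, 2):
        # Keep the filenames in a consistent order within each result
        first, second = sorted((file_a, file_b))
        results.append((first, second, cosine(vec_a, vec_b)))
    # Highest similarity first, so the most suspicious pairs lead the list
    return sorted(results, key=lambda r: r[2], reverse=True)


# Hypothetical toy vectors, not the article's TF-IDF output
sample_pairs = [
    ("fatma.txt", [1.0, 0.0, 2.0]),
    ("john.txt", [0.0, 3.0, 1.0]),
    ("juma.txt", [0.0, 3.0, 1.0]),
]

for result in find_plagiarism_once(sample_pairs):
    print(result)
```

With n documents this performs n(n-1)/2 comparisons instead of n(n-1), and the sorted list makes it easy to review the highest-scoring pairs first.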

Plagiarism Detection using Python

In this article, we are going to learn how to check plagiarism using Python.

Plagiarism: Plagiarism means presenting someone else’s work, ideas, or information as your own without giving the necessary credit to the author. For example, copying text word for word from a source without quotation marks or attribution.

Table of Content

What is Plagiarism detection?
Importing Libraries
Listing and Reading Files
TF-IDF Vectorization
Calculating Cosine Similarity
Creating Document-vector Pairs
Checking Plagiarism
Word Cloud Visualization
Conclusion
