Find Duplicate Files by Computing Their Hash in C#
Here we will find duplicate files by their content, by computing a hash of each file. Even if you copy a file several times and change the name of each copy, the scan will still detect that it is a duplicate.
I will use .NET 5 and C# 9 here.
Let me show you how.
Let me explain what's going on here.
I defined two methods. The first is GetHashValue(). This method opens the file and computes its hash value with the help of the SHA256 class from System.Security.Cryptography.
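For readers who want something concrete to look at, here is a minimal sketch of what a method like GetHashValue() could look like. The containing class name FileHasher and the choice to return the hash as a hex string are my assumptions for illustration, not necessarily how the original code is written.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// FileHasher is a hypothetical class name used only for this sketch.
public static class FileHasher
{
    // Opens the file, hashes its bytes with SHA-256, and returns the digest as a hex string.
    public static string GetHashValue(string path)
    {
        using var sha256 = SHA256.Create();
        using var stream = File.OpenRead(path);
        byte[] hash = sha256.ComputeHash(stream);

        // Assumption: a hex string is a convenient form for comparing and grouping.
        // Convert.ToHexString is available from .NET 5 onwards.
        return Convert.ToHexString(hash);
    }
}
```

Hashing from a stream means the whole file does not have to be loaded into memory first, which helps with large files.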
The second method is Print(), which does the same thing as Console.WriteLine, a name I am not a big fan of. Print is small and sweet, it works just fine, and the code looks clean. #opinion
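A wrapper like Print() can be as small as one line. The sketch below is only one way to write it; the class name Output and the object parameter are assumptions on my part.

```csharp
using System;

// Output is a hypothetical class name used only for this sketch.
public static class Output
{
    // Thin wrapper so call sites read Print(...) instead of Console.WriteLine(...).
    public static void Print(object value) => Console.WriteLine(value);
}
```

With `using static Output;` at the top of a file, you can then call `Print("hello")` directly.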
We get the list of files from the Directory.GetFiles method, compute the hash of each file, and generate an IEnumerable of an anonymous type with two properties: Name (the name of the file) and Hash (the hash value computed by GetHashValue()). LINQ makes our code easier here.
Again using LINQ, we group the collection by hash value and count how many copies of each file there are in the directory.
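The two LINQ steps described above can be sketched roughly as follows. This is a reconstruction from the description, not the exact code: the names DuplicateFinder and FindDuplicates are hypothetical, and returning a dictionary of hash to file names is my own choice so the result is easy to inspect.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// DuplicateFinder and FindDuplicates are hypothetical names used only for this sketch.
public static class DuplicateFinder
{
    // Computes the SHA-256 hash of a file and returns it as a hex string.
    private static string GetHashValue(string path)
    {
        using var sha256 = SHA256.Create();
        using var stream = File.OpenRead(path);
        return Convert.ToHexString(sha256.ComputeHash(stream));
    }

    // Returns one entry per hash that occurs more than once,
    // mapped to the names of the files that share that hash.
    public static Dictionary<string, List<string>> FindDuplicates(string directory)
    {
        // Step 1: project every file into an anonymous type with Name and Hash.
        var files = Directory.GetFiles(directory)
            .Select(path => new { Name = Path.GetFileName(path), Hash = GetHashValue(path) });

        // Step 2: group by hash and keep only the groups with more than one file.
        return files
            .GroupBy(f => f.Hash)
            .Where(g => g.Count() > 1)
            .ToDictionary(
                g => g.Key,
                g => g.Select(f => f.Name).OrderBy(n => n, StringComparer.Ordinal).ToList());
    }
}
```

Calling DuplicateFinder.FindDuplicates on a folder and printing each entry's Value.Count gives the number of copies per distinct file content.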
This version only scans the top-level directory. If you want to scan subfolders as well, you have to walk them too, for example with a recursive function, but for now it gets the job done.
To test it, I put a few files in a folder and copied them within the same directory to check whether it works. If you make a copy of a file and rename it to something else, it is still detected as a duplicate.
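As an alternative to writing the recursion by hand, .NET can walk subfolders for you through the SearchOption.AllDirectories option. The sketch below is a different technique from a hand-written recursive function; FileWalker and AllFiles are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;

// FileWalker and AllFiles are hypothetical names used only for this sketch.
public static class FileWalker
{
    // Lazily yields every file under root, including files in nested subfolders.
    public static IEnumerable<string> AllFiles(string root) =>
        Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories);
}
```

One caveat: this call can throw an UnauthorizedAccessException if it reaches a folder it is not allowed to read, so a hand-written recursive walk may still be the better fit when you need to skip such folders.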
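If you would rather check this in code than by hand, something like the sketch below could do it. It creates a file in a temporary folder, copies it under a different name, and compares the two hashes. The names CopyCheck and RenamedCopyHasSameHash and the file names are made up for this example.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// CopyCheck and RenamedCopyHasSameHash are hypothetical names used only for this sketch.
public static class CopyCheck
{
    // Computes the SHA-256 hash of a file and returns it as a hex string.
    private static string Hash(string path)
    {
        using var sha256 = SHA256.Create();
        using var stream = File.OpenRead(path);
        return Convert.ToHexString(sha256.ComputeHash(stream));
    }

    // Creates a file, copies it under a different name, and reports whether the hashes match.
    public static bool RenamedCopyHasSameHash()
    {
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        try
        {
            string original = Path.Combine(dir, "report.txt");
            string copy = Path.Combine(dir, "totally-different-name.txt");
            File.WriteAllText(original, "same content");
            File.Copy(original, copy);
            return Hash(original) == Hash(copy);
        }
        finally
        {
            // Clean up the temporary folder whether or not the check succeeded.
            Directory.Delete(dir, recursive: true);
        }
    }
}
```

The hash depends only on the bytes inside the file, so the file name plays no part in the comparison.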
Let's see the test results
Leave a comment and let me know what you think about this.
Thanks for reading.
Happy Coding