How do you find duplicates from Windows SMB shares using Linux
-
I'm just looking for a way to tally the amount of duplicate files there are on any given share, doesn't need to be anything fancy. I would ideally like it to check the hashes of the files and then post a summary to a log file.
I'm looking at fdupes (dnf install fdupes) as this might do what I want, but I'm open to suggestions.
-
@DustinB3403 said in How do you find duplicates from Windows SMB shares using Linux:
I'm looking at fdupes ( dnf install fdupes ) as this might do what I want, but I'm open to suggestions.
I looked it up and that's what I found as likely the best option, too.
-
@DustinB3403 said in How do you find duplicates from Windows SMB shares using Linux:
I'm just looking for a way to tally the amount of duplicate files there are on any given share, doesn't need to be anything fancy. I would ideally like it to check the hashes of the files and then post a summary to a log file.
I'm looking at fdupes (dnf install fdupes) as this might do what I want, but I'm open to suggestions.
I would assume you can just write the command output to a file; that should accomplish what you want in the simplest way.
-
@IRJ Yeah the output part is really simple, fdupes seems really simple too.
fdupes -rmsHA --sameline /target > output.log
is running. I just wasn't sure if there were any better options out there.
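For comparison, the hash-and-tally idea can be sketched without fdupes using standard tools. This is only a rough sketch, not how fdupes works internally: it assumes GNU find, sha256sum and awk are available, and the function name `tally_dupes`, the path /target and the log name dupes.log are placeholders for illustration.

```shell
#!/usr/bin/env bash
# Sketch: tally duplicate files under a directory by SHA-256 content hash.
# Assumes GNU find, sha256sum and awk. tally_dupes is a hypothetical helper.
# Caveat: sha256sum escapes file names containing newlines or backslashes,
# so such files may be miscounted by this simple parser.

tally_dupes() {
    local dir="$1"
    # Hash every regular file, then count how many files share each hash.
    find "$dir" -type f -exec sha256sum {} + |
        awk '{ count[$1]++ }
             END {
                 sets = 0; extra = 0
                 for (h in count) if (count[h] > 1) { sets++; extra += count[h] - 1 }
                 printf "%d duplicate files in %d sets\n", extra, sets
             }'
}

# Usage (hypothetical paths):
# tally_dupes /target > dupes.log
```

Here a "duplicate file" means an extra copy beyond the first in each set, so three identical files count as two duplicates in one set. Every file is read in full, which will be slow over a network mount.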
-
@DustinB3403 said in How do you find duplicates from Windows SMB shares using Linux:
@IRJ Yeah the output part is really simple, fdupes seems really simple too.
fdupes -rmsHA --sameline /target > output.log
is running. I just wasn't sure if there were any better options out there.
I just realized that the --sameline option can be replaced with -1, as in the number one. The manual isn't clear about that, and when reading the option itself it is difficult to tell the difference.
-
@DustinB3403 said in How do you find duplicates from Windows SMB shares using Linux:
@IRJ Yeah the output part is really simple, fdupes seems really simple too.
fdupes -rmsHA --sameline /target > output.log
is running. I just wasn't sure if there were any better options out there.
You also may want to grep for certain data if the entire output is too noisy.
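As a rough illustration of that filtering, assuming a log in which each duplicate set sits on one line (what --sameline produces when fdupes prints the sets rather than only a summary), something like the following could narrow the output. The helper name `filter_sets`, the pattern and the log name are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: show only the duplicate sets that mention a given string.
# Assumes one duplicate set per line in the log (fdupes --sameline style).
# filter_sets is a hypothetical helper name.

filter_sets() {
    local pattern="$1" log="$2"
    # -i: case-insensitive, -F: treat the pattern as a literal string
    grep -iF -- "$pattern" "$log"
}

# Usage (hypothetical pattern and log name):
# filter_sets '.pdf' output.log          # show matching sets
# filter_sets '.pdf' output.log | wc -l  # count them
```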
-
@IRJ said in How do you find duplicates from Windows SMB shares using Linux:
You also may want to grep for certain data if the entire output is too noisy.
Normally I would filter down, but since I'm just trying to get a grasp on the amount of potential duplication that there is, filtering at this point would only skew that number.
-
@DustinB3403 Some folks claim jdupes is faster. I have used both and did not notice much of a difference.
Both work well.
-
@pattonb To get an idea of how many dupes there are, use the following:
fdupes -r -m /directory
where /directory is the share to scan.
-
I wonder if this would run faster directly on the server in PowerShell instead? I'm assuming that doing this over SMB means you have to download all the files and then run the hash. If it's run locally, you get to skip the download time, I assume.
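One way to limit how much data crosses the network is to compare sizes first, since a size lookup only needs file metadata and a file with a unique size cannot have a duplicate. The fdupes man page describes a similar approach (file sizes first, then MD5 signatures, then a byte-by-byte comparison). The sketch below is an illustration of that idea only, assuming GNU find and awk; `size_candidates` and /target are placeholder names.

```shell
#!/usr/bin/env bash
# Sketch: list only files whose size matches at least one other file.
# Sizes come from metadata, so no file contents are read at this stage.
# Assumes GNU find (-printf) and awk; paths must not contain newlines.
# size_candidates is a hypothetical helper name.

size_candidates() {
    local dir="$1"
    find "$dir" -type f -printf '%s\t%p\n' |
        awk -F'\t' '{ n[$1]++; line[NR] = $0; size[NR] = $1 }
                    END { for (i = 1; i <= NR; i++) if (n[size[i]] > 1) print line[i] }' |
        cut -f2-
}

# Usage (hypothetical mount point): hash only the candidates.
# size_candidates /target | tr '\n' '\0' | xargs -0 -r sha256sum
```

Only the files that survive the size check need to be read and hashed, so the amount downloaded over SMB depends on how many files share a size rather than on the total size of the share.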
-
@Dashrender said in How do you find duplicates from Windows SMB shares using Linux:
I wonder if this would run faster directly on the server in PowerShell instead? I'm assuming that doing this over SMB means you have to download all the files and then run the hash. If it's run locally, you get to skip the download time, I assume.
I gathered that the SMB shares are hosted on Linux, but I could be wrong.
If they are hosted on Windows like you are assuming, then I would agree that PowerShell would probably be most performant for this.
-
@IRJ said in How do you find duplicates from Windows SMB shares using Linux:
I gathered that the SMB shares are hosted on Linux, but I could be wrong.
If they are hosted on Windows like you are assuming, then I would agree that PowerShell would probably be most performant for this.
The title says - Windows SMB Shares.
My guess is that Dustin is a lone wolf running a 'nix OS as his machine - and the rest of the company is using Windows. Nothing wrong with that, just my guess.
-
@Dashrender said in How do you find duplicates from Windows SMB shares using Linux:
The title says - Windows SMB Shares.
My guess is that Dustin is a lone wolf running a 'nix OS as his machine - and the rest of the company is using Windows. Nothing wrong with that, just my guess.
His company is significantly Mac.
-
@JaredBusch said in How do you find duplicates from Windows SMB shares using Linux:
His company is significantly Mac.
Ahh, that's right - he has been asking a lot of Mac questions lately.
-
@Dashrender said in How do you find duplicates from Windows SMB shares using Linux:
Ahh, that's right - he has been asking a lot of Mac questions lately.
Unix questions to be more precise, but yeah we are a heavy Mac shop.