I am trying to find duplicate audio files in a folder structure that contains a little over 20,000 files. Some dupes will have identical names, others will not. I know that roughly 80%-90% of the files will have dupes. The majority of files are small, in the 5 MB to 50 MB range, but several hundred of them are between. I'm using a Surface Pro 6 i7 with Windows 10 Pro and 8 GB RAM.

These 20,000 files were pooled to an SSD from multiple backup or archive sets located on different hard drives. Each set (with its own folder structure) is now in its own subfolder, and these are all under a single top-level pool folder. I am using TreeSize to check for dupes, using a SHA256 checksum, which I am running on the top-level folder.

When I run SHA256 against the folder, I am getting a slightly different dupe count each time: 19668, 19204, 19671, 19675, 19669, 19673. Most of these numbers I've seen more than once. All the numbers are plausible: when I run a comparison on the basis of Name, Size, and Date (NSD), the results are in the same ballpark, 19621, as expected. And that number is the same each time I run it, but I realize that checking by NSD is very different from checking by checksum.

I am wondering what could account for the slightly different SHA256 results when run on the identical set of files. I find the same slight variations when I run the files through MD5. Some of the files may be corrupt, but would that affect the ability of a comparison test to consistently read them? Is it the size of the data set? The size of some of the files?

I have tried running the SHA256 comparison on the individual folders, and those results are more consistent, but will still differ by small amounts. I am doing identical SHA256 runs, on an identical set of files that is unchanged between runs. And the results list differs by small amounts, just 1 to 3 files out of thousands, each time I run it.

I'm not doing anything to the files in between runs, which are performed one right after the other. When I run SHA256 against the 2nd-level subfolders, the results are much more consistent. But when I regroup those into larger sets, the same inconsistencies appear.

The files are on a new Samsung 1 TB T7 SSD. They were moved to the T7 from several WD Passports, where they exist as disorganized backups, made haphazardly over the years by copying full folder trees from different computers. I'm trying to cull these and produce a definitive archive, and know from the start that there are huge numbers of dupes.

Some of the files have had extremely long path names, which have been corrected either by renaming the paths or moving their containing folders higher up in the tree. There are no long path names in the set now on the T7.

As I mentioned above, I know that some of these files have been previously corrupted, e.g., there are mp3 files that when brought into MusicBee will not play, and that Reaper will not even load. But the state of these files is not changing in between runs, and I wouldn't think that a file's corruption could affect a SHA256 test, since no matter how corrupted the file may be it is still stable in its corrupted state - or so I would think. I mention this only because I'm really not sure what to think.

But no files are in a music player at any point in the copying or checksum process. These are all just files in folders that have been copied using Windows copy functions, from one drive/folder to another. I'm not using anything other than Windows to copy them.

I'm aware that different ID3 tags will make otherwise identical files have different checksums. But that's not what's happening in this case. I mentioned MusicBee and Reaper only because I'm using those to confirm that the files are (or are not) working, but neither the files nor their tags are being edited in that process.
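One way I could narrow this down outside TreeSize is a small script that hashes the whole tree twice and lists any file whose result changes between passes or that cannot be read at all. The following is only a rough Python sketch, not how TreeSize works internally: the function names (`sha256_of`, `hash_tree`, `count_dupes`, `unstable_files`, `report`) are made up for illustration, and the way it counts dupes (every file that shares a digest with at least one other file) is an assumption that may not match TreeSize's own count.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, or None if it cannot be read."""
    digest = hashlib.sha256()
    try:
        with open(path, "rb") as handle:
            # Read in 1 MiB chunks so large files do not fill RAM.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
    except OSError:
        # Unreadable files are recorded as None instead of being skipped.
        return None
    return digest.hexdigest()


def hash_tree(root):
    """Map every file path under root to its digest (None = unreadable)."""
    results = {}
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            results[path] = sha256_of(path)
    return results


def count_dupes(hashes):
    """Count files sharing a digest with at least one other file.

    Assumption: this definition of "dupe count" is for illustration only.
    """
    groups = defaultdict(list)
    for path, digest in hashes.items():
        if digest is not None:
            groups[digest].append(path)
    return sum(len(paths) for paths in groups.values() if len(paths) > 1)


def unstable_files(first, second):
    """Paths whose result differs between two passes, or was unreadable."""
    paths = set(first) | set(second)
    return sorted(
        p for p in paths
        if first.get(p) != second.get(p) or first.get(p) is None
    )


def report(root):
    """Hash the tree twice and print the counts plus any unstable files."""
    first = hash_tree(root)
    second = hash_tree(root)
    print("dupes, pass 1:", count_dupes(first))
    print("dupes, pass 2:", count_dupes(second))
    for path in unstable_files(first, second):
        print("unstable or unreadable:", path)
```

Calling `report()` with the top-level pool folder as its argument would print both counts. If the two passes agree here but TreeSize still varies, that would point at the tool rather than the files; if specific paths show up as unstable or unreadable, those are the ones to look at first.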