Posted: 15. Oct 2012, 14:07
by mimosa
This is a C program for locating duplicate files. Its output isn't as human-readable as that of some similar projects, but the algorithm is robust and it is faster than the others I've tried. For larger scans, the idea is really to parse dupedit's output with another script.

Here is the logic of the output:
Files with a common first number (first level grouping) are identical — they contain the same data.
x.x.x filename

Files with a common first and second number (second level grouping) are physically the same file — writing to one file will affect all.
x.x.x filename

The third number is there only to enumerate members of the second level group, and is omitted when there is only one member.
x.x.x filename

For human readability, each first level group is preceded by a comment of the form:
-- #<first level group number> -- <filesize> --
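
Since the output is meant to be parsed by something else, here is a rough sketch in C of reading one line of the format described above. It is not part of dupedit: the names dupe_entry and parse_dupe_line are made up for illustration. It also assumes the numbers are decimal and separated by dots, that whitespace separates them from the filename, and that any line not starting with a digit (such as the "-- #... --" comments) can be skipped.

```c
#include <ctype.h>
#include <stdlib.h>

/* One parsed output line (hypothetical struct, not from dupedit itself). */
struct dupe_entry {
    unsigned long group;   /* first number: files with identical content */
    unsigned long link;    /* second number: physically the same file */
    unsigned long member;  /* third number, or 0 when it is omitted */
    const char *filename;  /* points into the caller's line buffer */
};

/*
 * Parse one line of output.
 * Returns 1 and fills *out for a file entry.
 * Returns 0 for comment lines, blank lines, or anything that does not match.
 * Assumption: the filename does not itself begin with whitespace.
 */
int parse_dupe_line(const char *line, struct dupe_entry *out)
{
    char *end;

    if (!isdigit((unsigned char)*line))
        return 0;                       /* comment, blank, or unexpected text */

    out->group = strtoul(line, &end, 10);
    if (*end != '.' || !isdigit((unsigned char)end[1]))
        return 0;                       /* the second number is required */

    out->link = strtoul(end + 1, &end, 10);

    out->member = 0;                    /* third number is optional */
    if (*end == '.' && isdigit((unsigned char)end[1]))
        out->member = strtoul(end + 1, &end, 10);

    if (*end != ' ' && *end != '\t')
        return 0;                       /* expected whitespace before the name */
    while (*end == ' ' || *end == '\t')
        end++;
    if (*end == '\0' || *end == '\n')
        return 0;                       /* no filename present */

    out->filename = end;                /* a trailing newline, if any, is kept */
    return 1;
}
```

A wrapper script would call this on each line read with fgets, strip the trailing newline from the filename, and collect entries by group (identical data) or by group and link (the same physical file).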