duplicates

As files are shared on the internet, they are often resized, cropped, converted to a different format, subsequently altered by the original or a new artist, or turned into a template and reinterpreted over and over and over. Even if you have a very restrictive importing workflow, your client is almost certainly going to get some duplicates. Some will be interesting alternate versions that you want to keep, and others will be thumbnails and other low-quality garbage you accidentally imported and would rather delete. Along the way, it would be nice to harmonise your ratings and tags to the better files so you don't lose any work.

Finding and processing duplicates within a large collection is impractical to do by hand, so I have written a system to do the heavy lifting for you. It all lives on the duplicates processing page.

the duplicates processing page

On the normal 'new page' selection window, hit special->duplicates processing. This will open the duplicates processing page.

There are three steps to this page:

the duplicates filter

Just like the archive/delete filter, this uses quick mouse-clicks or keyboard shortcuts to assign pairs of potential duplicates a particular new status that is saved back to the database. Depending on the status, different tag and rating and deletion actions will occur.

The system uses pairs because they are the simplest building block of the underlying network of similar files. Two similar files, A and B, have one relationship, A-B, but three similar files would have three: A-B, B-C, and A-C. Larger groups can get very complicated. Making decisions on just two files at a time is fast and easy, leaving the database to handle the difficult implications.
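To see how quickly the number of relationships grows with group size, here is a small Python sketch. It is only an illustration of the arithmetic, not the client's actual code, and the function names `pair_count` and `potential_pairs` are made up for this example.

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of distinct pairs among n mutually similar files: n*(n-1)/2."""
    return n * (n - 1) // 2


def potential_pairs(files):
    """List every unordered pair that would need a decision."""
    return list(combinations(files, 2))


if __name__ == "__main__":
    # Three similar files give three relationships: A-B, A-C, B-C.
    print(potential_pairs(["A", "B", "C"]))
    # The count grows quadratically with group size.
    for n in (2, 3, 5, 10):
        print(n, "files ->", pair_count(n), "pairs")
```

A group of ten similar files already holds 45 pairwise relationships, which is why the filter only ever asks about one pair at a time.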

So, the filter works just like a normal media viewer window, except that it only ever presents two files at a time to scroll through. You can set shortcuts for any action, but by default, it uses:

The idea is to compare the two files by scrolling with your mouse wheel and then clicking to assign a status, at which point the next pair will be loaded. If you prefer different shortcuts, you can set them under file->shortcuts or the keyboard icon on the duplicate filter's top hover window. You can also access more 'duplicate decisions' through the labelled buttons and change what happens to the files and their tags and ratings on each different decision through the cog icon on the same top hover window.

Move your mouse to the top of the media viewer to bring up the top hover window. Hit the cog or keyboard icons to edit how it works, and click the buttons if you do not have a shortcut mapped. 'Custom action' lets you choose one of the other four actions but with one-off content merge options, for example if you want to set that the files are alternates but still wish to merge some tags.
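As a rough picture of what a content merge could look like, the Python sketch below copies tags and a rating from the worse file of a pair onto the better one. Everything in it is an assumption made for illustration: the `FileRecord` class, the `merge_into_better` function, and the policy itself (union the tags, fill in a rating only if the better file has none) are hypothetical and do not describe how the client stores or merges anything.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class FileRecord:
    """Hypothetical stand-in for a file with its tags and one rating."""
    name: str
    tags: Set[str] = field(default_factory=set)
    rating: Optional[int] = None


def merge_into_better(better: FileRecord, worse: FileRecord,
                      copy_tags: bool = True,
                      copy_rating: bool = True) -> FileRecord:
    """Carry work done on the worse file over to the better one.

    Assumed policy: tags are unioned, and the rating is copied only
    when the better file has no rating of its own.
    """
    if copy_tags:
        better.tags |= worse.tags
    if copy_rating and better.rating is None:
        better.rating = worse.rating
    return better


if __name__ == "__main__":
    big = FileRecord("big.png", {"creator:someone"})
    thumb = FileRecord("thumb.jpg", {"character:somebody"}, rating=4)
    print(merge_into_better(big, thumb))
```

Turning `copy_tags` or `copy_rating` off mirrors the idea of a one-off custom action where only some content is merged.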

Because of technical limitations, you may be asked to checkpoint (save your progress to the database and then continue filtering) every now and then.

different duplicate statuses

There are currently five possible statuses. The client uses different logic to apply each of them at the database level, so please use them as described rather than according to a scheme of your own.

the future

This only supports jpgs and pngs at the moment, but I will attempt to add video in a future iteration. And as I said above, I would like to add more search algorithms beyond this first phash system, and there is plenty of db and gui stuff to add to provide support for 'this image has a parent'-type notification and navigation actions for alternates.
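For readers unfamiliar with perceptual hashes, the usual way to compare two of them is to count how many bits differ (the Hamming distance): a small distance suggests the images look alike. The Python sketch below shows only that comparison step. The 64-bit example values, the `max_distance` threshold, and the function names are assumptions for illustration and are not taken from the client.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions where two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def looks_similar(hash_a: int, hash_b: int, max_distance: int = 4) -> bool:
    """Treat two hashes as a potential duplicate pair if few bits differ.

    max_distance is an arbitrary illustrative threshold.
    """
    return hamming_distance(hash_a, hash_b) <= max_distance


if __name__ == "__main__":
    original = 0xF0F0F0F0F0F0F0F0
    resized = original ^ 0b101          # two bits flipped
    unrelated = 0x0F0F0F0F0F0F0F0F      # every bit flipped
    print(hamming_distance(original, resized), looks_similar(original, resized))
    print(hamming_distance(original, unrelated), looks_similar(original, unrelated))
```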