diff --git a/docs/changelog.md b/docs/changelog.md
index ad2bebc2..fa310d3b 100644
--- a/docs/changelog.md
+++ b/docs/changelog.md
@@ -7,6 +7,34 @@ title: Changelog
!!! note
This is the new changelog, only the most recent builds. For all versions, see the [old changelog](old_changelog.html).
+## [Version 513](https://github.com/hydrusnetwork/hydrus/releases/tag/v513)
+
+### client api
+
+* the Client API now supports the duplicates system! this is early stages, and what I've exposed is ugly and technical, but if you want to try out some external dupe processing, give it a go and let me know what you think! (issue #347)
+* a new 'manage file relationships' permission gives your api keys access to these commands
+* the new GET commands are:
+* - `/manage_file_relationships/get_file_relationships`, which fetches potential dupes, dupes, alternates, false positives, and dupe kings
+* - `/manage_file_relationships/get_potentials_count`, which can take two file searches, a potential dupes search type, a pixel match type, and max hamming distance, and will give the number of potential pairs in that domain
+* - `/manage_file_relationships/get_potential_pairs`, which takes the same params as count and a `max_num_pairs` and gives you a batch of pairs to process, just like the dupe filter
+* - `/manage_file_relationships/get_random_potentials`, which takes the same params as count and gives you some hashes just like the 'show some random potential pairs' button
+* the new POST commands are:
+* - `/manage_file_relationships/set_file_relationships`, which sets potential/dupe/alternate/false positive relationships between file pairs with some optional content merge and file deletes
+* - `/manage_file_relationships/set_kings`, which sets duplicate group kings
+* more commands will be written in the future for various remove/dissolve actions
+* wrote unit tests for all the commands!
+* wrote help for all the commands!
+* fixed an issue in the '/manage_pages/get_pages' call where the response data structure was saying 'focused' instead of 'selected' for 'page of pages'
+* client api version is now 40
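As a quick way to poke at the new endpoints, here is a minimal sketch of building a `get_potential_pairs` request. This assumes the default Client API port and a placeholder access key; apart from `max_num_pairs` (named above), the filtering parameters are not shown here, so check the developer api help for the full argument list.

```python
# Hedged sketch: build a URL for the new duplicates endpoint.
# Assumes the default Client API port; the access key is a placeholder.
from urllib.parse import urlencode

API_BASE = "http://127.0.0.1:45869"

def build_get_potential_pairs_url(access_key: str, max_num_pairs: int = 250) -> str:
    # `max_num_pairs` is named in the notes above; the other filter params
    # (search domain, pixel match type, max hamming distance) are in the dev help.
    params = {
        "Hydrus-Client-API-Access-Key": access_key,
        "max_num_pairs": max_num_pairs,
    }
    return f"{API_BASE}/manage_file_relationships/get_potential_pairs?{urlencode(params)}"

url = build_get_potential_pairs_url("0123456789abcdef" * 4)
```

The response shapes for each call are documented in the developer api help.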
+
+### boring misc cleanup and refactoring
+
+* cleaned and wrote some more parsing methods for the api to support duplicate search tech and reduce copypasted parsing code
+* renamed the client api permission labels a little, just making them clearer and lining them up better. also, the 'edit client permissions' dialog now sorts the permissions
+* reordered and renamed the dev help headers in the same way
+* simple but significant rename-refactoring in file duplicates database module, tearing off the old 'Duplicates' prefixes to every method ha ha
+* updated the advanced Windows 'running from source' help to talk more about VC build tools. some old scripts don't seem to work any more in Win 11, but you also don't really need it any more (I moved to a new dev machine this week so had to set everything up again)
+
## [Version 512](https://github.com/hydrusnetwork/hydrus/releases/tag/v512)
### two searches in duplicates
@@ -451,36 +479,3 @@ title: Changelog
* cleaned up a bunch of related metadata importer/exporter code
* cleaned import folder code
* cleaned hdd importer code
-
-## [Version 503](https://github.com/hydrusnetwork/hydrus/releases/tag/v503)
-
-### misc
-* fixed show/hiding the main gui splitters after a regression in v502. also, keyboard focus after these events should now be less jank
-* thanks to a user, the Deviant Art parser we rolled back to recently now gets video support. I also added artist tag parsing like the api parser used to do
-* if you use the internal client database backup system, it now says in the menu when it was last run. this menu doesn't update often, so I put a bit of buffer in where it says 'did one recently'. let me know if the numbers here are ever confusing
-* fixed a bug where the database menu was not immediately updating the first time you set a backup location
-* if an apng has sub-millisecond frame durations (seems to be jitter-apngs that were created oddly), these are now each rounded up to 1ms. any apngs that previously appeared to have 0 duration now have borked-tiny but valid duration and will now import ok
-* the client now catches 529 error responses from servers (service is overloaded) and treats them like a 429/509 bandwidth problem, waiting for a bit before retrying. more work may be needed here
-* the new popup toaster should restore from minimised better
-* fixed a subtle bug where trashing and untrashing a file when searching the special 'all my files' domain would temporarily sort that file at the front/end of sorting by 'import time'
-* added 'dateutil present' to _help->about_ and reordered all the entries for readability
-* brushed up the network job response-bytes-size counting logic a little more
-* cleaned up the EVT_ICONIZE event processing wx/Qt patch
-
-### running from source is now easy on Windows
-* as I expect to drop Qt5 support in the builds next week, we need an easy way for Windows 7 and other older-OS users to run from source. I am by no means an expert at this, but I have written some easy-setup scripts that can get you running the client in Windows from nothing in a few minutes with no python experience
-* the help is updated to reflect this, with more pointers to 'running from source', and that page now has a new guide that takes you through it all in simple steps
-* there's a client-user.bat you can edit to add your own launch parameters, and a setup_help.bat to build the help too
-* all the requirements.txts across the program have had a full pass. all are now similarly formatted for easy future editing. it is now simple to select whether you want Qt5 or Qt6, and seeing the various differences between the documents is now obvious
-* the .gitignore has been updated to not stomp over your venv, mpv/ffmpeg/sqlite, or client-user.bat
-* feedback on how this works and how to make it better would be appreciated, and once we are happy with the workflow, I will invite Linux and macOS users to generate equivalent .sh and .command scripts so we are multiplatform-easy
-
-### build stuff
-* _this is all wizard nonsense, so you can ignore it. I am mostly just noting it here for my records. tl;dr: I fixed more boot problems, now and in the future_
-* just when I was getting on top of the latest boot problems, we had another one last week, caused by yet another external library that updated unusually, this time just a day after the normal release. it struck some users who run from source (such as AUR), and the macOS hotfix I put out on saturday. it turns out PySide6 6.4.0 is not yet supported by qtpy. since these big libraries' bleeding edge versions are common problems, I have updated all the requirements.txts across the program to set specific versions for qtpy, PySide2/PySide6, opencv-python-headless, requests, python-mpv, and setuptools (issue #1254)
-* updated all the requirements.txts with 'python-dateutil', which has spotty default support and whose absence broke some/all of the macOS and Docker deployments last week
-* added failsafe code in case python-dateutil is not available
-* pylzma is no longer in the main requirements.txt. it doesn't have a wheel (and hence needs compiler tech to pip install), and it is only useful for some weird flash files. UPDATE: with the blessed assistance of stackexchange, I rewrote the 'decompress lzma-compressed flash file' routine to re-munge the flash header into a proper lzma header and use the python default 'lzma' library, so 'pylzma' is no longer needed and removed from all requirements.txts
-* updated most of the actions in the build script to use updated node16 versions. node12 just started getting deprecation warnings. there is more work to do
-* replaced the node12 pip installer action with a manual command on the reworked requirements.txts
-* replaced most of the build script's uses of 'set-output', which just started getting deprecation warnings. there is more work to do
diff --git a/docs/developer_api.md b/docs/developer_api.md
index 85d60624..c7568003 100644
--- a/docs/developer_api.md
+++ b/docs/developer_api.md
@@ -145,13 +145,15 @@ Arguments:
The permissions are currently:
- * 0 - Import URLs
- * 1 - Import Files
- * 2 - Add Tags
- * 3 - Search for Files
+ * 0 - Import and Edit URLs
+ * 1 - Import and Delete Files
+ * 2 - Edit File Tags
+ * 3 - Search for and Fetch Files
* 4 - Manage Pages
* 5 - Manage Cookies
* 6 - Manage Database
+ * 7 - Edit File Notes
+ * 8 - Manage File Relationships
``` title="Example request"
/request_new_permissions?name=my%20import%20script&basic_permissions=[0,1]
@@ -336,7 +338,7 @@ Response:
* 99 - server administration
-## Adding Files
+## Importing and Deleting Files
### **POST `/add_files/add_file`** { id="add_files_add_file" }
@@ -352,7 +354,7 @@ Arguments (in JSON):
: - `path`: (the path you want to import)
```json title="Example request body"
-{"path" : "E:\to_import\ayanami.jpg"}
+{"path" : "E:\\to_import\\ayanami.jpg"}
```
Arguments (as bytes):
@@ -505,212 +507,7 @@ You can use hash or hashes, whichever is more convenient.
This puts files back in the inbox, taking them out of the archive. It only has meaning for files currently in 'my files' or 'trash'. There is no error if any files do not currently exist or are already in the inbox.
-## Adding Tags
-
-### **GET `/add_tags/clean_tags`** { id="add_tags_clean_tags" }
-
-_Ask the client about how it will see certain tags._
-
-Restricted access:
-: YES. Add Tags permission needed.
-
-Required Headers: n/a
-
-Arguments (in percent-encoded JSON):
-:
-* `tags`: (a list of the tags you want cleaned)
-
-Example request:
-: Given tags `#!json [ " bikini ", "blue eyes", " character : samus aran ", " :)", " ", "", "10", "11", "9", "system:wew", "-flower" ]`:
- ```
- /add_tags/clean_tags?tags=%5B%22%20bikini%20%22%2C%20%22blue%20%20%20%20eyes%22%2C%20%22%20character%20%3A%20samus%20aran%20%22%2C%20%22%3A%29%22%2C%20%22%20%20%20%22%2C%20%22%22%2C%20%2210%22%2C%20%2211%22%2C%20%229%22%2C%20%22system%3Awew%22%2C%20%22-flower%22%5D
- ```
-
-Response:
-: The tags cleaned according to hydrus rules. They will also be in hydrus human-friendly sorting order.
-```json title="Example response"
-{
- "tags" : ["9", "10", "11", " ::)", "bikini", "blue eyes", "character:samus aran", "flower", "wew"]
-}
-```
-
- Mostly, hydrus simply trims excess whitespace, but the other examples are rare issues you might run into. 'system' is an invalid namespace, tags cannot be prefixed with hyphens, and any tag starting with ':' is secretly dealt with internally as "\[no namespace\]:\[colon-prefixed-subtag\]". Again, you probably won't run into these, but if you see a mismatch somewhere and want to figure it out, or just want to sort some numbered tags, you might like to try this.
-
-
-### **GET `/add_tags/get_tag_services`** { id="add_tags_get_tag_services" }
-
-!!! warning "Deprecated"
- This is becoming obsolete and will be removed! Use [/get_services](#get_services) instead!
-
-_Ask the client about its tag services._
-
-Restricted access:
-: YES. Add Tags permission needed.
-
-Required Headers: n/a
-
-Arguments: n/a
-
-Response:
-: Some JSON listing the client's 'local tags' and tag repository services by name.
-```json title="Example response"
-{
- "local_tags" : ["my tags"],
- "tag_repositories" : [ "public tag repository", "mlp fanfic tagging server" ]
-}
-```
-
- !!! note
- A user can rename their services. Don't assume the client's local tags service will be "my tags".
-
-### **GET `/add_tags/search_tags`** { id="add_tags_search_tags" }
-
-_Search the client for tags._
-
-Restricted access:
-: YES. Search for Files permission needed.
-
-Required Headers: n/a
-
-Arguments:
-:
-* `search`: (the tag text to search for, enter exactly what you would in the client UI)
-* `tag_service_key`: (optional, selective, hexadecimal, the tag domain on which to search)
-* `tag_service_name`: (optional, selective, string, the tag domain on which to search)
-* `tag_display_type`: (optional, string, to select whether to search raw or sibling-processed tags)
-
-Example request:
-:
-```http title="Example request"
-/add_tags/search_tags?search=kim
-```
-
-Response:
-: Some JSON listing the client's matching tags.
-
-:
-```json title="Example response"
-{
- "tags" : [
- {
- "value" : "series:kim possible",
- "count" : 3
- },
- {
- "value" : "kimchee",
- "count" : 2
- },
- {
- "value" : "character:kimberly ann possible",
- "count" : 1
- }
- ]
-}
-```
-
-The `tags` list will be sorted by descending count. If you do not specify a tag service, it will default to 'all known tags'. The various rules in _tags->manage tag display and search_ (e.g. no pure `*` searches on certain services) will also be checked--and if violated, you will get 200 OK but an empty result.
-
-The `tag_display_type` can be either `storage` (the default), which searches your file's stored tags, just as they appear in a 'manage tags' dialog, or `display`, which searches the sibling-processed tags, just as they appear in a normal file search page. In the example above, setting the `tag_display_type` to `display` could well combine the two kim possible tags and give a count of 3 or 4.
-
-Note that if your client api access is only allowed to search certain tags, the results will be similarly filtered.
-
-Also, for now, it gives you the 'storage' tags, which are the 'raw' ones you see in the manage tags dialog, without collapsed siblings, but more options will be added in future.
-
-### **POST `/add_tags/add_tags`** { id="add_tags_add_tags" }
-
-_Make changes to the tags that files have._
-
-Restricted access:
-: YES. Add Tags permission needed.
-
-Required Headers: n/a
-
-Arguments (in JSON):
-:
-* `hash`: (selective A, an SHA256 hash for a file in 64 characters of hexadecimal)
-* `hashes`: (selective A, a list of SHA256 hashes)
-* `file_id`: (a numerical file id)
-* `file_ids`: (a list of numerical file ids)
-* `service_names_to_tags`: (selective B, an Object of service names to lists of tags to be 'added' to the files)
-* `service_keys_to_tags`: (selective B, an Object of service keys to lists of tags to be 'added' to the files)
-* `service_names_to_actions_to_tags`: (selective B, an Object of service names to content update actions to lists of tags)
-* `service_keys_to_actions_to_tags`: (selective B, an Object of service keys to content update actions to lists of tags)
-
- You can use either 'hash' or 'hashes'.
-
- You can use either 'service\_names\_to...' or 'service\_keys\_to...', where names is simple and human-friendly "my tags" and similar (but may be renamed by a user), but keys is a little more complicated but accurate/unique. Since a client may have multiple tag services with non-default names and pseudo-random keys, if it is not your client you will need to check the [/get_services](#get_services) call to get the names or keys, and you may need some selection UI on your end so the user can pick what to do if there are multiple choices. I encourage using keys if you can.
-
- Also, you can use either '...to\_tags', which is simple and add-only, or '...to\_actions\_to\_tags', which is more complicated and allows you to remove/petition or rescind pending content.
-
- The permitted 'actions' are:
-
- * 0 - Add to a local tag service.
- * 1 - Delete from a local tag service.
- * 2 - Pend to a tag repository.
- * 3 - Rescind a pend from a tag repository.
- * 4 - Petition from a tag repository. (This is special)
- * 5 - Rescind a petition from a tag repository.
-
- When you petition a tag from a repository, a 'reason' for the petition is typically needed. If you send a normal list of tags here, a default reason of "Petitioned from API" will be given. If you want to set your own reason, you can instead give a list of \[ tag, reason \] pairs.
-
-Some example requests:
-:
-```json title="Adding some tags to a file"
-{
- "hash" : "df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56",
- "service_names_to_tags" : {
- "my tags" : ["character:supergirl", "rating:safe"]
- }
-}
-```
-```json title="Adding more tags to two files"
-{
- "hashes" : [
- "df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56",
- "f2b022214e711e9a11e2fcec71bfd524f10f0be40c250737a7861a5ddd3faebf"
- ],
- "service_names_to_tags" : {
- "my tags" : ["process this"],
- "public tag repository" : ["creator:dandon fuga"]
- }
-}
-```
-```json title="A complicated transaction with all possible actions"
-{
- "hash" : "df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56",
- "service_keys_to_actions_to_tags" : {
- "6c6f63616c2074616773" : {
- "0" : ["character:supergirl", "rating:safe"],
- "1" : ["character:superman"]
- },
- "aa0424b501237041dab0308c02c35454d377eebd74cfbc5b9d7b3e16cc2193e9" : {
- "2" : ["character:supergirl", "rating:safe"],
- "3" : ["filename:image.jpg"],
- "4" : [["creator:danban faga", "typo"], ["character:super_girl", "underscore"]],
- "5" : ["skirt"]
- }
- }
-}
-```
-
- This last example is far more complicated than you will usually see. Pend rescinds and petition rescinds are not common. Petitions are also quite rare, and gathering a good petition reason for each tag is often a pain.
-
- Note that the enumerated status keys in the service\_names\_to\_actions\_to_tags structure are strings, not ints (JSON does not support int keys for Objects).
-
-Response description:
-: 200 and no content.
-
-!!! note
- Note also that hydrus tag actions are safely idempotent. You can pend a tag that is already pended, or add a tag that already exists, and not worry about an error--the surplus add action will be discarded. The same is true if you try to pend a tag that actually already exists, or rescinding a petition that doesn't. Any invalid actions will fail silently.
-
- It is fine to just throw your 'process this' tags at every file import and not have to worry about checking which files you already added them to.
-
-!!! danger "HOWEVER"
- When you delete a tag, a deletion record is made _even if the tag does not exist on the file_. This is important if you expect to add the tags again via parsing, because, in general, when hydrus adds tags through a downloader, it will not overwrite a previously 'deleted' tag record (this is to stop re-downloads overwriting the tags you hand-removed previously). Undeletes usually have to be done manually by a human.
-
- So, _do_ be careful about how you spam delete unless it is something that doesn't matter or it is something you'll only be touching again via the API anyway.
-
-## Adding URLs
+## Importing and Editing URLs
### **GET `/add_urls/get_url_files`** { id="add_urls_get_url_files" }
@@ -927,7 +724,212 @@ Response:
: 200 with no content. Like when adding tags, this is safely idempotent--do not worry about re-adding URLs associations that already exist or accidentally trying to delete ones that don't.
-## Adding Notes
+## Editing File Tags
+
+### **GET `/add_tags/clean_tags`** { id="add_tags_clean_tags" }
+
+_Ask the client about how it will see certain tags._
+
+Restricted access:
+: YES. Add Tags permission needed.
+
+Required Headers: n/a
+
+Arguments (in percent-encoded JSON):
+:
+* `tags`: (a list of the tags you want cleaned)
+
+Example request:
+: Given tags `#!json [ " bikini ", "blue eyes", " character : samus aran ", " :)", " ", "", "10", "11", "9", "system:wew", "-flower" ]`:
+ ```
+ /add_tags/clean_tags?tags=%5B%22%20bikini%20%22%2C%20%22blue%20%20%20%20eyes%22%2C%20%22%20character%20%3A%20samus%20aran%20%22%2C%20%22%3A%29%22%2C%20%22%20%20%20%22%2C%20%22%22%2C%20%2210%22%2C%20%2211%22%2C%20%229%22%2C%20%22system%3Awew%22%2C%20%22-flower%22%5D
+ ```
+
+Response:
+: The tags cleaned according to hydrus rules. They will also be in hydrus human-friendly sorting order.
+```json title="Example response"
+{
+ "tags" : ["9", "10", "11", "::)", "bikini", "blue eyes", "character:samus aran", "flower", "wew"]
+}
+```
+
+ Mostly, hydrus simply trims excess whitespace, but the other examples are rare issues you might run into. 'system' is an invalid namespace, tags cannot be prefixed with hyphens, and any tag starting with ':' is secretly dealt with internally as "\[no namespace\]:\[colon-prefixed-subtag\]". Again, you probably won't run into these, but if you see a mismatch somewhere and want to figure it out, or just want to sort some numbered tags, you might like to try this.
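For reference, the percent-encoded argument shown above can be produced by JSON-encoding the tag list and then URL-quoting the result. A sketch in Python, using an abbreviated version of the example tag list (the port is the Client API default):

```python
# Build the percent-encoded `tags` argument for /add_tags/clean_tags.
import json
from urllib.parse import quote

tags = [" bikini ", "blue    eyes", " character : samus aran ", "10", "11", "9"]

# JSON-encode the list, then percent-encode it for the query string.
query = "tags=" + quote(json.dumps(tags))
url = "http://127.0.0.1:45869/add_tags/clean_tags?" + query
```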
+
+
+### **GET `/add_tags/get_tag_services`** { id="add_tags_get_tag_services" }
+
+!!! warning "Deprecated"
+ This is becoming obsolete and will be removed! Use [/get_services](#get_services) instead!
+
+_Ask the client about its tag services._
+
+Restricted access:
+: YES. Add Tags permission needed.
+
+Required Headers: n/a
+
+Arguments: n/a
+
+Response:
+: Some JSON listing the client's 'local tags' and tag repository services by name.
+```json title="Example response"
+{
+ "local_tags" : ["my tags"],
+ "tag_repositories" : [ "public tag repository", "mlp fanfic tagging server" ]
+}
+```
+
+ !!! note
+ A user can rename their services. Don't assume the client's local tags service will be "my tags".
+
+### **GET `/add_tags/search_tags`** { id="add_tags_search_tags" }
+
+_Search the client for tags._
+
+Restricted access:
+: YES. Search for Files permission needed.
+
+Required Headers: n/a
+
+Arguments:
+:
+* `search`: (the tag text to search for, enter exactly what you would in the client UI)
+* `tag_service_key`: (optional, selective, hexadecimal, the tag domain on which to search)
+* `tag_service_name`: (optional, selective, string, the tag domain on which to search)
+* `tag_display_type`: (optional, string, to select whether to search raw or sibling-processed tags)
+
+Example request:
+:
+```http title="Example request"
+/add_tags/search_tags?search=kim
+```
+
+Response:
+: Some JSON listing the client's matching tags.
+
+:
+```json title="Example response"
+{
+ "tags" : [
+ {
+ "value" : "series:kim possible",
+ "count" : 3
+ },
+ {
+ "value" : "kimchee",
+ "count" : 2
+ },
+ {
+ "value" : "character:kimberly ann possible",
+ "count" : 1
+ }
+ ]
+}
+```
+
+The `tags` list will be sorted by descending count. If you do not specify a tag service, it will default to 'all known tags'. The various rules in _tags->manage tag display and search_ (e.g. no pure `*` searches on certain services) will also be checked--and if violated, you will get 200 OK but an empty result.
+
+The `tag_display_type` can be either `storage` (the default), which searches your file's stored tags, just as they appear in a 'manage tags' dialog, or `display`, which searches the sibling-processed tags, just as they appear in a normal file search page. In the example above, setting the `tag_display_type` to `display` could well combine the two kim possible tags and give a count of 3 or 4.
+
+Note that if your client api access is only allowed to search certain tags, the results will be similarly filtered.
+
+Also, by default this gives you the 'storage' tags--the 'raw' ones you see in the manage tags dialog, without collapsed siblings--so set `tag_display_type` to `display` if you want sibling-processed results.
+
+### **POST `/add_tags/add_tags`** { id="add_tags_add_tags" }
+
+_Make changes to the tags that files have._
+
+Restricted access:
+: YES. Add Tags permission needed.
+
+Required Headers: n/a
+
+Arguments (in JSON):
+:
+* `hash`: (selective A, an SHA256 hash for a file in 64 characters of hexadecimal)
+* `hashes`: (selective A, a list of SHA256 hashes)
+* `file_id`: (a numerical file id)
+* `file_ids`: (a list of numerical file ids)
+* `service_names_to_tags`: (selective B, an Object of service names to lists of tags to be 'added' to the files)
+* `service_keys_to_tags`: (selective B, an Object of service keys to lists of tags to be 'added' to the files)
+* `service_names_to_actions_to_tags`: (selective B, an Object of service names to content update actions to lists of tags)
+* `service_keys_to_actions_to_tags`: (selective B, an Object of service keys to content update actions to lists of tags)
+
+ You can use either 'hash' or 'hashes'.
+
+ You can use either 'service\_names\_to...' or 'service\_keys\_to...', where names is simple and human-friendly "my tags" and similar (but may be renamed by a user), but keys is a little more complicated but accurate/unique. Since a client may have multiple tag services with non-default names and pseudo-random keys, if it is not your client you will need to check the [/get_services](#get_services) call to get the names or keys, and you may need some selection UI on your end so the user can pick what to do if there are multiple choices. I encourage using keys if you can.
+
+ Also, you can use either '...to\_tags', which is simple and add-only, or '...to\_actions\_to\_tags', which is more complicated and allows you to remove/petition or rescind pending content.
+
+ The permitted 'actions' are:
+
+ * 0 - Add to a local tag service.
+ * 1 - Delete from a local tag service.
+ * 2 - Pend to a tag repository.
+ * 3 - Rescind a pend from a tag repository.
+ * 4 - Petition from a tag repository. (This is special)
+ * 5 - Rescind a petition from a tag repository.
+
+ When you petition a tag from a repository, a 'reason' for the petition is typically needed. If you send a normal list of tags here, a default reason of "Petitioned from API" will be given. If you want to set your own reason, you can instead give a list of \[ tag, reason \] pairs.
+
+Some example requests:
+:
+```json title="Adding some tags to a file"
+{
+ "hash" : "df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56",
+ "service_names_to_tags" : {
+ "my tags" : ["character:supergirl", "rating:safe"]
+ }
+}
+```
+```json title="Adding more tags to two files"
+{
+ "hashes" : [
+ "df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56",
+ "f2b022214e711e9a11e2fcec71bfd524f10f0be40c250737a7861a5ddd3faebf"
+ ],
+ "service_names_to_tags" : {
+ "my tags" : ["process this"],
+ "public tag repository" : ["creator:dandon fuga"]
+ }
+}
+```
+```json title="A complicated transaction with all possible actions"
+{
+ "hash" : "df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56",
+ "service_keys_to_actions_to_tags" : {
+ "6c6f63616c2074616773" : {
+ "0" : ["character:supergirl", "rating:safe"],
+ "1" : ["character:superman"]
+ },
+ "aa0424b501237041dab0308c02c35454d377eebd74cfbc5b9d7b3e16cc2193e9" : {
+ "2" : ["character:supergirl", "rating:safe"],
+ "3" : ["filename:image.jpg"],
+ "4" : [["creator:danban faga", "typo"], ["character:super_girl", "underscore"]],
+ "5" : ["skirt"]
+ }
+ }
+}
+```
+
+ This last example is far more complicated than you will usually see. Pend rescinds and petition rescinds are not common. Petitions are also quite rare, and gathering a good petition reason for each tag is often a pain.
+
+ Note that the enumerated status keys in the service\_names\_to\_actions\_to\_tags structure are strings, not ints (JSON does not support int keys for Objects).
+
+Response description:
+: 200 and no content.
+
+!!! note
+ Note also that hydrus tag actions are safely idempotent. You can pend a tag that is already pended, or add a tag that already exists, and not worry about an error--the surplus add action will be discarded. The same is true if you try to pend a tag that actually already exists, or rescind a petition that doesn't exist. Any invalid actions will fail silently.
+
+ It is fine to just throw your 'process this' tags at every file import and not have to worry about checking which files you already added them to.
+
+!!! danger "HOWEVER"
+ When you delete a tag, a deletion record is made _even if the tag does not exist on the file_. This is important if you expect to add the tags again via parsing, because, in general, when hydrus adds tags through a downloader, it will not overwrite a previously 'deleted' tag record (this is to stop re-downloads overwriting the tags you hand-removed previously). Undeletes usually have to be done manually by a human.
+
+ So, _do_ be careful about how you spam delete unless it is something that doesn't matter or it is something you'll only be touching again via the API anyway.
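Putting the pieces together, here is a sketch of building a complete request body for this call. The hash and the 'my tags' service key are the example values from this page; the URL and access key in the comment are placeholders, and the HTTP client shown is just one option.

```python
# Sketch of a complete /add_tags/add_tags request body.
import json

payload = {
    "hash": "df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56",
    "service_keys_to_actions_to_tags": {
        "6c6f63616c2074616773": {
            # action keys are strings, not ints, as noted above
            "0": ["character:supergirl"],  # 0 = add to a local tag service
        },
    },
}

body = json.dumps(payload)
# POST it with your preferred HTTP client, e.g.:
# requests.post(
#     "http://127.0.0.1:45869/add_tags/add_tags",
#     data=body,
#     headers={
#         "Content-Type": "application/json",
#         "Hydrus-Client-API-Access-Key": access_key,
#     },
# )
```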
+
+## Editing File Notes
### **POST `/add_notes/set_notes`** { id="add_notes_set_notes" }
@@ -1010,353 +1012,7 @@ Arguments (in percent-encoded JSON):
Response:
: 200 with no content. This operation is idempotent.
-## Managing Cookies and HTTP Headers
-
-This refers to the cookies held in the client's session manager, which are sent with network requests to different domains.
-
-### **GET `/manage_cookies/get_cookies`** { id="manage_cookies_get_cookies" }
-
-_Get the cookies for a particular domain._
-
-Restricted access:
-: YES. Manage Cookies permission needed.
-
-Required Headers: n/a
-
-Arguments:
-: * `domain`
-
- ``` title="Example request (for gelbooru.com)"
- /manage_cookies/get_cookies?domain=gelbooru.com
- ```
-
-
-Response:
-: A JSON Object listing all the cookies for that domain in \[ name, value, domain, path, expires \] format.
-```json title="Example response"
-{
- "cookies" : [
- ["__cfduid", "f1bef65041e54e93110a883360bc7e71", ".gelbooru.com", "/", 1596223327],
- ["pass_hash", "0b0833b797f108e340b315bc5463c324", "gelbooru.com", "/", 1585855361],
- ["user_id", "123456", "gelbooru.com", "/", 1585855361]
- ]
-}
-```
-
- Note that these variables are all strings except 'expires', which is either an integer timestamp or _null_ for session cookies.
-
- This request will also return any cookies for subdomains. The session system in hydrus generally stores cookies according to the second-level domain, so if you request for specific.someoverbooru.net, you will still get the cookies for someoverbooru.net and all its subdomains.
-
-### **POST `/manage_cookies/set_cookies`** { id="manage_cookies_set_cookies" }
-
-Set some new cookies for the client. This makes it easier to 'copy' a login from a web browser or similar to hydrus if hydrus's login system can't handle the site yet.
-
-Restricted access:
-: YES. Manage Cookies permission needed.
-
-Required Headers:
-:
- * `Content-Type`: application/json
-
-Arguments (in JSON):
-:
- * `cookies`: (a list of cookie rows in the same format as the GET request above)
-
-```json title="Example request body"
-{
- "cookies" : [
- ["PHPSESSID", "07669eb2a1a6e840e498bb6e0799f3fb", ".somesite.com", "/", 1627327719],
- ["tag_filter", "1", ".somesite.com", "/", 1627327719]
- ]
-}
-```
-
-You can set 'value' to be null, which will clear any existing cookie with the corresponding name, domain, and path (acting essentially as a delete).
-
-Expires can be null, but session cookies will time-out in hydrus after 60 minutes of non-use.
-
-### **POST `/manage_headers/set_user_agent`** { id="manage_headers_set_user_agent" }
-
-This sets the 'Global' User-Agent for the client, as typically editable under _network->data->manage http headers_, for instance if you want hydrus to appear as a specific browser associated with some cookies.
-
-Restricted access:
-: YES. Manage Cookies permission needed.
-
-Required Headers:
-:
- * `Content-Type`: application/json
-
-Arguments (in JSON):
-:
- * `user-agent`: (a string)
-
-```json title="Example request body"
-{
- "user-agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0"
-}
-```
-
-Send an empty string to reset the client back to the default User-Agent, which should be `Mozilla/5.0 (compatible; Hydrus Client)`.
-
-## Managing Pages
-
-This refers to the pages of the main client UI.
-
-### **GET `/manage_pages/get_pages`** { id="manage_pages_get_pages" }
-
-_Get the page structure of the current UI session._
-
-Restricted access:
-: YES. Manage Pages permission needed.
-
-Required Headers: n/a
-
-Arguments: n/a
-
-
-Response:
-: A JSON Object of the top-level page 'notebook' (page of pages) detailing its basic information and current sub-pages. Page of pages beneath it will list their own sub-page lists.
-```json title="Example response"
-{
- "pages" : {
- "name" : "top pages notebook",
- "page_key" : "3b28d8a59ec61834325eb6275d9df012860a1ecfd9e1246423059bc47fb6d5bd",
- "page_type" : 10,
- "selected" : true,
- "pages" : [
- {
- "name" : "files",
- "page_key" : "d436ff5109215199913705eb9a7669d8a6b67c52e41c3b42904db083255ca84d",
- "page_type" : 6,
- "selected" : false
- },
- {
- "name" : "thread watcher",
- "page_key" : "40887fa327edca01e1d69b533dddba4681b2c43e0b4ebee0576177852e8c32e7",
- "page_type" : 9,
- "selected" : false
- },
- {
- "name" : "pages",
- "page_key" : "2ee7fa4058e1e23f2bd9e915cdf9347ae90902a8622d6559ba019a83a785c4dc",
- "page_type" : 10,
- "selected" : true,
- "pages" : [
- {
- "name" : "urls",
- "page_key" : "9fe22cb760d9ee6de32575ed9f27b76b4c215179cf843d3f9044efeeca98411f",
- "page_type" : 7,
- "selected" : true
- },
- {
- "name" : "files",
- "page_key" : "2977d57fc9c588be783727bcd54225d577b44e8aa2f91e365a3eb3c3f580dc4e",
- "page_type" : 6,
- "selected" : false
- }
- ]
- }
- ]
- }
-}
-```
-
- The page types are as follows:
-
- * 1 - Gallery downloader
- * 2 - Simple downloader
- * 3 - Hard drive import
- * 5 - Petitions (used by repository janitors)
- * 6 - File search
- * 7 - URL downloader
- * 8 - Duplicates
- * 9 - Thread watcher
- * 10 - Page of pages
-
- The top page of pages will always be there, and always selected. 'selected' means which page is currently in view and will propagate down other page of pages until it terminates. It may terminate in an empty page of pages, so do not assume it will end on a 'media' page.
-
- The 'page_key' is a unique identifier for the page. It will stay the same for a particular page throughout the session, but new ones are generated on a client restart or other session reload.
-
-### **GET `/manage_pages/get_page_info`** { id="manage_pages_get_page_info" }
-
-_Get information about a specific page._
-
-!!! warning "Under Construction"
- This is under construction. The current call dumps a ton of info for different downloader pages. Please experiment in IRL situations and give feedback for now! I will flesh out this help with more enumeration info and examples as this gets nailed down. POST commands to alter pages (adding, removing, highlighting), will come later.
-
-Restricted access:
-: YES. Manage Pages permission needed.
-
-Required Headers: n/a
-
-Arguments:
-:
- * `page_key`: (hexadecimal page\_key as stated in [/manage\_pages/get\_pages](#manage_pages_get_pages))
- * `simple`: true or false (optional, defaulting to true)
-
- ``` title="Example request"
- /manage_pages/get_page_info?page_key=aebbf4b594e6986bddf1eeb0b5846a1e6bc4e07088e517aff166f1aeb1c3c9da&simple=true
- ```
-
-Response description
-: A JSON Object of the page's information. At present, this mostly means downloader information.
-```json title="Example response with simple = true"
-{
- "page_info" : {
- "name" : "threads",
- "page_key" : "aebbf4b594e6986bddf1eeb0b5846a1e6bc4e07088e517aff166f1aeb1c3c9da",
- "page_type" : 3,
- "management" : {
- "multiple_watcher_import" : {
- "watcher_imports" : [
- {
- "url" : "https://someimageboard.net/m/123456",
- "watcher_key" : "cf8c3525c57a46b0e5c2625812964364a2e801f8c49841c216b8f8d7a4d06d85",
- "created" : 1566164269,
- "last_check_time" : 1566164272,
- "next_check_time" : 1566174272,
- "files_paused" : false,
- "checking_paused" : false,
- "checking_status" : 0,
- "subject" : "gundam pictures",
- "imports" : {
- "status" : "4 successful (2 already in db)",
- "simple_status" : "4",
- "total_processed" : 4,
- "total_to_process" : 4
- },
- "gallery_log" : {
- "status" : "1 successful",
- "simple_status" : "1",
- "total_processed" : 1,
- "total_to_process" : 1
- }
- },
- {
- "url" : "https://someimageboard.net/a/1234",
- "watcher_key" : "6bc17555b76da5bde2dcceedc382cf7d23281aee6477c41b643cd144ec168510",
- "created" : 1566063125,
- "last_check_time" : 1566063133,
- "next_check_time" : 1566104272,
- "files_paused" : false,
- "checking_paused" : true,
- "checking_status" : 1,
- "subject" : "anime pictures",
- "imports" : {
- "status" : "124 successful (22 already in db), 2 previously deleted",
- "simple_status" : "124",
- "total_processed" : 124,
- "total_to_process" : 124
- },
- "gallery_log" : {
- "status" : "3 successful",
- "simple_status" : "3",
- "total_processed" : 3,
- "total_to_process" : 3
- }
- }
- ]
- },
- "highlight" : "cf8c3525c57a46b0e5c2625812964364a2e801f8c49841c216b8f8d7a4d06d85"
- }
- },
- "media" : {
- "num_files" : 4
- }
-}
-```
-
- As you can see, even the 'simple' mode can get very large. Imagine that response for a page watching 100 threads! Turning simple mode off will display every import item, gallery log entry, and all hashes in the media (thumbnail) panel.
-
- For this first version, the five importer pages--hdd import, simple downloader, url downloader, gallery page, and watcher page--all give rich info based on their specific variables. The first three only have one importer/gallery log combo, but the latter two of course can have multiple. The "imports" and "gallery_log" entries are all in the same data format.
-
-
-### **POST `/manage_pages/add_files`** { id="manage_pages_add_files" }
-
-_Add files to a page._
-
-Restricted access:
-: YES. Manage Pages permission needed.
-
-Required Headers:
-:
- * `Content-Type`: application/json
-
-Arguments (in JSON):
-:
- * `page_key`: (the page key for the page you wish to add files to)
- * `file_id`: (selective, a numerical file id)
- * `file_ids`: (selective, a list of numerical file ids)
- * `hash`: (selective, a hexadecimal SHA256 hash)
- * `hashes`: (selective, a list of hexadecimal SHA256 hashes)
-
-You need to use either file_ids or hashes. The files they refer to will be appended to the given page, just like a thumbnail drag and drop operation. The page key is the same as fetched in the [/manage\_pages/get\_pages](#manage_pages_get_pages) call.
-
-```json title="Example request body"
-{
- "page_key" : "af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18",
- "file_ids" : [123, 124, 125]
-}
-```
-
-Response:
-: 200 with no content. If the page key is not found, this will 404.
-
-### **POST `/manage_pages/focus_page`** { id="manage_pages_focus_page" }
-
-_'Show' a page in the main GUI, making it the current page in view. If it is already the current page, no change is made._
-
-Restricted access:
-: YES. Manage Pages permission needed.
-
-Required Headers:
-:
- * `Content-Type`: application/json
-
-Arguments (in JSON):
-:
- * `page_key`: (the page key for the page you wish to show)
-
-The page key is the same as fetched in the [/manage\_pages/get\_pages](#manage_pages_get_pages) call.
-
-```json title="Example request body"
-{
- "page_key" : "af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18"
-}
-```
-
-Response:
-: 200 with no content. If the page key is not found, this will 404.
-
-
-### **POST `/manage_pages/refresh_page`** { id="manage_pages_refresh_page" }
-
-_Refresh a page in the main GUI. Like hitting F5 in the client, this obviously makes file search pages perform their search again, but for other page types it will force the currently in-view files to be re-sorted._
-
-Restricted access:
-: YES. Manage Pages permission needed.
-
-Required Headers:
-:
- * `Content-Type`: application/json
-
-Arguments (in JSON):
-:
- * `page_key`: (the page key for the page you wish to refresh)
-
-The page key is the same as fetched in the [/manage\_pages/get\_pages](#manage_pages_get_pages) call. If a file search page is not set to 'searching immediately', a 'refresh' command does nothing.
-
-```json title="Example request body"
-{
- "page_key" : "af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18"
-}
-```
-
-Response:
-: 200 with no content. If the page key is not found, this will 404.
-
-
-## Searching Files
+## Searching and Fetching Files
File search in hydrus is not paginated like a booru--all searches return all results in one go. In order to keep this fast, search is split into two steps--fetching file identifiers with a search, and then fetching file metadata in batches. You may have noticed that the client itself performs searches like this--thinking a bit about a search and then bundling results in batches of 256 files before eventually throwing all the thumbnails on screen.
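
The two-step flow can be sketched with only the Python standard library. This is a hedged sketch, not official client code: the default port `45869`, the `Hydrus-Client-API-Access-Key` header, and the batch size of 256 follow this reference's conventions, and `search_and_fetch`/`batched` are hypothetical helper names.

```python title="Two-step search sketch"
import json
import urllib.parse
import urllib.request

API = "http://127.0.0.1:45869"  # assumed default Client API address
HEADERS = {"Hydrus-Client-API-Access-Key": "0123456789abcdef"}  # your key here

def batched(ids, size=256):
    """Split file ids into the 256-file batches the client itself uses."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def search_and_fetch(tags):
    """Step one: fetch all matching file ids at once. Step two: fetch metadata in batches."""
    tags_param = urllib.parse.quote(json.dumps(tags))
    req = urllib.request.Request(f"{API}/get_files/search_files?tags={tags_param}", headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        file_ids = json.loads(resp.read())["file_ids"]
    metadata = []
    for batch in batched(file_ids):
        ids_param = urllib.parse.quote(json.dumps(batch))
        req = urllib.request.Request(f"{API}/get_files/file_metadata?file_ids={ids_param}", headers=HEADERS)
        with urllib.request.urlopen(req) as resp:
            metadata.extend(json.loads(resp.read())["metadata"])
    return metadata
```

Batching keeps any one request cheap; a 100,000-file search becomes one id fetch and ~400 small metadata fetches rather than one enormous response.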
@@ -1922,22 +1578,24 @@ If you set `only_return_basic_information=true`, this will be much faster for fi
If you add `detailed_url_information=true`, a new entry, `detailed_known_urls`, will be added for each file, with a list of the same structure as /`add_urls/get_url_info`. This may be an expensive request if you are querying thousands of files at once.
```json title="For example"
-"detailed_known_urls" : [
- {
- "normalised_url" : "https://gelbooru.com/index.php?id=4841557&page=post&s=view",
- "url_type" : 0,
- "url_type_string" : "post url",
- "match_name" : "gelbooru file page",
- "can_parse" : true
- },
- {
- "normalised_url" : "https://img2.gelbooru.com//images/80/c8/80c8646b4a49395fb36c805f316c49a9.jpg",
- "url_type" : 5,
- "url_type_string" : "unknown url",
- "match_name" : "unknown url",
- "can_parse" : false
- }
-]
+{
+ "detailed_known_urls": [
+ {
+ "normalised_url": "https://gelbooru.com/index.php?id=4841557&page=post&s=view",
+ "url_type": 0,
+ "url_type_string": "post url",
+ "match_name": "gelbooru file page",
+ "can_parse": true
+ },
+ {
+ "normalised_url": "https://img2.gelbooru.com//images/80/c8/80c8646b4a49395fb36c805f316c49a9.jpg",
+ "url_type": 5,
+ "url_type_string": "unknown url",
+ "match_name": "unknown url",
+ "can_parse": false
+ }
+ ]
+}
```
@@ -2001,6 +1659,642 @@ Response:
+## Managing File Relationships
+
+This refers to the File Relationships system, which includes 'potential duplicates', 'duplicates', and 'alternates'.
+
+This system is pending significant rework and expansion, so please do not get too married to some of the routines here. I am mostly just exposing my internal commands, so things are a little ugly/hacked. I expect duplicate and alternate groups to get some form of official identifier in future, which may end up being the way to refer and edit things here.
+
+Also, at least for now, 'Manage File Relationships' permission is not going to be bound by the search permission restrictions that normal file search does. Getting this permission allows you to search anything. I expect to add this permission filtering tech in future, particularly for file domains.
+
+_There is more work to do here, including adding various 'dissolve'/'undo' commands to break groups apart._
+
+### **GET `/manage_file_relationships/get_file_relationships`** { id="manage_file_relationships_get_file_relationships" }
+
+_Get the current relationships for one or more files._
+
+Restricted access:
+: YES. Manage File Relationships permission needed.
+
+Required Headers: n/a
+
+Arguments (in percent-encoded JSON):
+:
+ * `file_id`: (selective, a numerical file id)
+ * `file_ids`: (selective, a list of numerical file ids)
+ * `hash`: (selective, a hexadecimal SHA256 hash)
+ * `hashes`: (selective, a list of hexadecimal SHA256 hashes)
+
+``` title="Example request"
+/manage_file_relationships/get_file_relationships?hash=ac940bb9026c430ea9530b4f4f6980a12d9432c2af8d9d39dfc67b05d91df11d
+```
+
+Response:
+: A JSON Object mapping the hashes to their relationships.
+``` json title="Example response"
+{
+ "file_relationships" : {
+ "ac940bb9026c430ea9530b4f4f6980a12d9432c2af8d9d39dfc67b05d91df11d" : {
+ "is_king" : false,
+ "king" : "8784afbfd8b59de3dcf2c13dc1be9d7cb0b3d376803c8a7a8b710c7c191bb657",
+ "0" : [
+ ],
+ "1" : [],
+ "3" : [
+ "8bf267c4c021ae4fd7c4b90b0a381044539519f80d148359b0ce61ce1684fefe"
+ ],
+ "8" : [
+ "8784afbfd8b59de3dcf2c13dc1be9d7cb0b3d376803c8a7a8b710c7c191bb657",
+ "3fa8ef54811ec8c2d1892f4f08da01e7fc17eed863acae897eb30461b051d5c3"
+ ]
+ }
+ }
+}
+```
+
+`is_king` and `king` relate to which file is the best of a duplicate group. The king is usually the best representative of a group if you need to do comparisons between groups, and the 'get some pairs to filter'-style commands usually try to select the kings of the various to-be-compared duplicate groups.
+
+**It is possible for the king to not be available, in which case `king` is null.** The king can be unavailable in several duplicate search contexts, generally when you have the option to search/filter and it is outside of that domain. For this request, the king will usually be available unless the user has deleted it. You have to deal with the king being unavailable--in this situation, your best bet is to just use the file itself as its own representative.
+
+A file that has no duplicates is considered to be in a duplicate group of size 1 and thus is always its own king.
+
+The numbers are from a duplicate status enum, as so:
+
+* 0 - potential duplicates
+* 1 - false positives
+* 3 - alternates
+* 8 - duplicates
+
+Note that because of JSON constraints, these are the string versions of the integers since they are Object keys.
+
+All the hashes given here are in 'all my files', i.e. not in the trash. A file may have duplicates that have long been deleted, but, like the null king above, they will not show here.
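
A short sketch of consuming this response, using the structures above; `parse_relationships` and `RELATIONSHIP_NAMES` are hypothetical helper names. It names the enum keys and applies the 'use the file itself when the king is null' fallback described above.

```python title="Parsing a file_relationships entry"
# string keys, since JSON Object keys cannot be integers
RELATIONSHIP_NAMES = {
    "0": "potential_duplicates",
    "1": "false_positives",
    "3": "alternates",
    "8": "duplicates",
}

def parse_relationships(hash_hex, entry):
    """Name the enum keys and pick a group representative, falling back to the
    file itself when the king is unavailable (null)."""
    named = {name: entry.get(key, []) for key, name in RELATIONSHIP_NAMES.items()}
    representative = entry["king"] if entry["king"] is not None else hash_hex
    return representative, named
```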
+
+### **GET `/manage_file_relationships/get_potentials_count`** { id="manage_file_relationships_get_potentials_count" }
+
+_Get the count of remaining potential duplicate pairs in a particular search domain. Exactly the same as the counts you see in the duplicate processing page._
+
+Restricted access:
+: YES. Manage File Relationships permission needed.
+
+Required Headers: n/a
+
+Arguments (in percent-encoded JSON):
+:
+ * `tag_service_key_1`: (optional, default 'all known tags', a hex tag service key)
+ * `tags_1`: (optional, default system:everything, a list of tags you wish to search for)
+ * `tag_service_key_2`: (optional, default 'all known tags', a hex tag service key)
+ * `tags_2`: (optional, default system:everything, a list of tags you wish to search for)
+ * `potentials_search_type`: (optional, integer, default 0, regarding how the pairs should match the search(es))
+ * `pixel_duplicates`: (optional, integer, default 1, regarding whether the pairs should be pixel duplicates)
+ * `max_hamming_distance`: (optional, integer, default 4, the max 'search distance' of the pairs)
+
+``` title="Example request"
+/manage_file_relationships/get_potentials_count?tag_service_key_1=c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f&tags_1=%5B%22dupes_to_process%22%2C%20%22system%3Awidth%3C400%22%5D&potentials_search_type=1&pixel_duplicates=2&max_hamming_distance=0
+```
+
+`tag_service_key` and `tags` work the same as [/get\_files/search\_files](#get_files_search_files). The `_2` variants are only useful if the `potentials_search_type` is 2. For now the file domain is locked to 'all my files'.
+
+`potentials_search_type` and `pixel_duplicates` are enums:
+
+* 0 - one file matches search 1
+* 1 - both files match search 1
+* 2 - one file matches search 1, the other 2
+
+-and-
+
+* 0 - must be pixel duplicates
+* 1 - can be pixel duplicates
+* 2 - must not be pixel duplicates
+
+The `max_hamming_distance` is the same 'search distance' you see in the Client UI. A higher number means more speculative 'similar files' search. If `pixel_duplicates` is set to 'must be', then `max_hamming_distance` is obviously ignored.
+
+Response:
+: A JSON Object stating the count.
+``` json title="Example response"
+{
+ "potential_duplicates_count" : 17
+}
+```
+
+If you confirm that a pair of potentials are duplicates, this may transitively collapse other potential pairs and decrease the count by more than 1.
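
Building the percent-encoded JSON arguments is the only fiddly part here. A minimal sketch in Python: `json.dumps` followed by `urllib.parse.quote` reproduces exactly the encoding shown in the example request above. `encode_tags` and `potentials_count_url` are hypothetical helper names.

```python title="Percent-encoded JSON arguments"
import json
import urllib.parse

def encode_tags(tags):
    """JSON-encode a tag list, then percent-encode it for the query string."""
    return urllib.parse.quote(json.dumps(tags))

def potentials_count_url(base, tags_1, potentials_search_type=0, pixel_duplicates=1, max_hamming_distance=4):
    """Assemble a GET url for this endpoint, using the documented defaults."""
    return (f"{base}/manage_file_relationships/get_potentials_count"
            f"?tags_1={encode_tags(tags_1)}"
            f"&potentials_search_type={potentials_search_type}"
            f"&pixel_duplicates={pixel_duplicates}"
            f"&max_hamming_distance={max_hamming_distance}")
```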
+
+### **GET `/manage_file_relationships/get_potential_pairs`** { id="manage_file_relationships_get_potential_pairs" }
+
+_Get some potential duplicate pairs for a filtering workflow. Exactly the same as the 'duplicate filter' in the duplicate processing page._
+
+Restricted access:
+: YES. Manage File Relationships permission needed.
+
+Required Headers: n/a
+
+Arguments (in percent-encoded JSON):
+:
+ * `tag_service_key_1`: (optional, default 'all known tags', a hex tag service key)
+ * `tags_1`: (optional, default system:everything, a list of tags you wish to search for)
+ * `tag_service_key_2`: (optional, default 'all known tags', a hex tag service key)
+ * `tags_2`: (optional, default system:everything, a list of tags you wish to search for)
+ * `potentials_search_type`: (optional, integer, default 0, regarding how the pairs should match the search(es))
+ * `pixel_duplicates`: (optional, integer, default 1, regarding whether the pairs should be pixel duplicates)
+ * `max_hamming_distance`: (optional, integer, default 4, the max 'search distance' of the pairs)
+ * `max_num_pairs`: (optional, integer, defaults to client's option, how many pairs to get in a batch)
+
+``` title="Example request"
+/manage_file_relationships/get_potential_pairs?tag_service_key_1=c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f&tags_1=%5B%22dupes_to_process%22%2C%20%22system%3Awidth%3C400%22%5D&potentials_search_type=1&pixel_duplicates=2&max_hamming_distance=0&max_num_pairs=50
+```
+
+The search arguments work the same as [/manage\_file\_relationships/get\_potentials\_count](#manage_file_relationships_get_potentials_count).
+
+`max_num_pairs` is simple and just caps how many pairs you get.
+
+Response:
+: A JSON Object listing a batch of hash pairs.
+```json title="Example response"
+{
+ "potential_duplicate_pairs" : [
+ [ "16470d6e73298cd75d9c7e8e2004810e047664679a660a9a3ba870b0fa3433d3", "7ed062dc76265d25abeee5425a859cfdf7ab26fd291f50b8de7ca381e04db079" ],
+ [ "eeea390357f259b460219d9589b4fa11e326403208097b1a1fbe63653397b210", "9215dfd39667c273ddfae2b73d90106b11abd5fd3cbadcc2afefa526bb226608" ],
+ [ "a1ea7d671245a3ae35932c603d4f3f85b0d0d40c5b70ffd78519e71945031788", "8e9592b2dfb436fe0a8e5fa15de26a34a6dfe4bca9d4363826fac367a9709b25" ]
+ ]
+}
+```
+
+The pair sample and its order are strictly hardcoded for now (e.g. to guarantee that a decision will not invalidate any other pair in the batch, you shouldn't see the same file twice in a batch, nor two files in the same duplicate group). Treat it as the client filter does, where you fetch batches to process one after another. I expect to make it more flexible in future, in the client itself and here.
+
+You will see significantly fewer than `max_num_pairs` (and the potential duplicates count) as you get close to the last available pairs, and when there are none left, you will get an empty list.
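
The fetch-a-batch-then-process loop can be sketched independently of any HTTP details. `process_all_pairs`, `fetch_batch`, and `decide` are hypothetical names: `fetch_batch` would wrap a GET to this endpoint, and `decide` would be your own comparison logic, ultimately posting to `set_file_relationships`.

```python title="Batch filtering loop sketch"
def process_all_pairs(fetch_batch, decide):
    """Fetch pair batches until the API returns an empty list, deciding each pair.
    Re-fetching after each batch matters: confirmed duplicates can transitively
    collapse other potential pairs, so stale pairs should not be reused."""
    decisions = 0
    while True:
        pairs = fetch_batch()
        if not pairs:
            break  # no potential pairs left in this search domain
        for hash_a, hash_b in pairs:
            decide(hash_a, hash_b)
            decisions += 1
    return decisions
```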
+
+### **GET `/manage_file_relationships/get_random_potentials`** { id="manage_file_relationships_get_random_potentials" }
+
+_Get some random potentially duplicate file hashes. Exactly the same as the 'show some random potential dupes' button in the duplicate processing page._
+
+Restricted access:
+: YES. Manage File Relationships permission needed.
+
+Required Headers: n/a
+
+Arguments (in percent-encoded JSON):
+:
+ * `tag_service_key_1`: (optional, default 'all known tags', a hex tag service key)
+ * `tags_1`: (optional, default system:everything, a list of tags you wish to search for)
+ * `tag_service_key_2`: (optional, default 'all known tags', a hex tag service key)
+ * `tags_2`: (optional, default system:everything, a list of tags you wish to search for)
+ * `potentials_search_type`: (optional, integer, default 0, regarding how the files should match the search(es))
+ * `pixel_duplicates`: (optional, integer, default 1, regarding whether the files should be pixel duplicates)
+ * `max_hamming_distance`: (optional, integer, default 4, the max 'search distance' of the files)
+
+``` title="Example request"
+/manage_file_relationships/get_random_potentials?tag_service_key_1=c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f&tags_1=%5B%22dupes_to_process%22%2C%20%22system%3Awidth%3C400%22%5D&potentials_search_type=1&pixel_duplicates=2&max_hamming_distance=0
+```
+
+The arguments work the same as [/manage\_file\_relationships/get\_potentials\_count](#manage_file_relationships_get_potentials_count), with the caveat that `potentials_search_type` has special logic:
+
+* 0 - first file matches search 1
+* 1 - all files match search 1
+* 2 - first file matches search 1, the others 2
+
+Essentially, the first hash is the 'master' to which the others are paired. The other files will include every matching file.
+
+Response:
+: A JSON Object listing a group of hashes exactly as the client would.
+```json title="Example response"
+{
+ "random_potential_duplicate_hashes" : [
+ "16470d6e73298cd75d9c7e8e2004810e047664679a660a9a3ba870b0fa3433d3",
+ "7ed062dc76265d25abeee5425a859cfdf7ab26fd291f50b8de7ca381e04db079",
+ "9e0d6b928b726562d70e1f14a7b506ba987c6f9b7f2d2e723809bb11494c73e6",
+ "9e01744819b5ff2a84dda321e3f1a326f40d0e7f037408ded9f18a11ee2b2da8"
+ ]
+}
+```
+
+If there are no potential duplicate groups in the search, this returns an empty list.
+
+### **POST `/manage_file_relationships/set_file_relationships`** { id="manage_file_relationships_set_file_relationships" }
+
+Set the relationships for the specified file pairs.
+
+Restricted access:
+: YES. Manage File Relationships permission needed.
+
+Required Headers:
+:
+ * `Content-Type`: application/json
+
+Arguments (in JSON):
+:
+ * `pair_rows`: (a list of lists)
+
+Each row is:
+
+ * [ relationship, hash_a, hash_b, do_default_content_merge, delete_a, delete_b ]
+
+Where `relationship` is one of this enum:
+
+* 0 - set as potential duplicates
+* 1 - set as false positives
+* 2 - set as same quality
+* 3 - set as alternates
+* 4 - set A as better
+* 7 - set B as better
+
+2, 4, and 7 all make the files 'duplicates' (8 under `get_file_relationships`), which, specifically, merges the two files' duplicate groups. 'same quality' has different duplicate content merge options to the better/worse choices, but it ultimately sets A>B. You obviously don't have to use 'B is better' if you prefer just to swap the hashes. Do what works for you.
+
+`hash_a` and `hash_b` are normal hex SHA256 hashes for your file pair.
+
+`do_default_content_merge` is a boolean setting whether the user's duplicate content merge options should be loaded and applied to the files along with the duplicate status. Most operations in the client do this automatically, so the user may expect it to apply, but if you want to do content merge yourself, set this to false.
+
+`delete_a` and `delete_b` are booleans that obviously select whether to delete A and/or B. You can also do this externally if you prefer.
+
+```json title="Example request body"
+{
+ "pair_rows" : [
+ [ 4, "b54d09218e0d6efc964b78b070620a1fa19c7e069672b4c6313cee2c9b0623f2", "bbaa9876dab238dcf5799bfd8319ed0bab805e844f45cf0de33f40697b11a845", true, false, true ],
+ [ 4, "22667427eaa221e2bd7ef405e1d2983846c863d40b2999ce8d1bf5f0c18f5fb2", "65d228adfa722f3cd0363853a191898abe8bf92d9a514c6c7f3c89cfed0bf423", true, false, true ],
+ [ 2, "0480513ffec391b77ad8c4e57fe80e5b710adfa3cb6af19b02a0bd7920f2d3ec", "5fab162576617b5c3fc8caabea53ce3ab1a3c8e0a16c16ae7b4e4a21eab168a7", true, false, false ]
+ ]
+}
+```
+
+Response:
+: 200 with no content.
+
+If you try to add an invalid or redundant relationship, for instance marking two files that are already duplicates as potential duplicates, no changes are made.
+
+This is the file relationships request that is probably most likely to change in future. I may implement content merge options. I may move from file pairs to group identifiers. When I expand alternates, those file groups are going to support more variables.
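
As a sketch of the current shape of the request (which, per the note above, may change), here is one way to build and send `pair_rows`. `better_worse_row` and `set_relationships` are hypothetical helper names, and the access key header follows this reference's conventions.

```python title="Setting duplicate relationships sketch"
import json
import urllib.request

SET_A_BETTER = 4  # from the relationship enum above

def better_worse_row(better_hash, worse_hash, delete_worse=True):
    """One pair_rows entry: A is better, default content merge applied,
    optionally deleting the worse file."""
    return [SET_A_BETTER, better_hash, worse_hash, True, False, delete_worse]

def set_relationships(api, access_key, rows):
    """POST the rows; a successful call is 200 with no content."""
    body = json.dumps({"pair_rows": rows}).encode()
    req = urllib.request.Request(
        f"{api}/manage_file_relationships/set_file_relationships",
        data=body,
        headers={"Content-Type": "application/json",
                 "Hydrus-Client-API-Access-Key": access_key})
    return urllib.request.urlopen(req)
```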
+
+### **POST `/manage_file_relationships/set_kings`** { id="manage_file_relationships_set_kings" }
+
+Set the specified files to be the kings of their duplicate groups.
+
+Restricted access:
+: YES. Manage File Relationships permission needed.
+
+Required Headers:
+:
+ * `Content-Type`: application/json
+
+Arguments (in JSON):
+:
+ * `file_id`: (selective, a numerical file id)
+ * `file_ids`: (selective, a list of numerical file ids)
+ * `hash`: (selective, a hexadecimal SHA256 hash)
+ * `hashes`: (selective, a list of hexadecimal SHA256 hashes)
+
+```json title="Example request body"
+{
+ "file_id" : 123
+}
+```
+
+Response:
+: 200 with no content.
+
+The files will be promoted to be the kings of their respective duplicate groups. If the file is already the king (also true for any file with no duplicates), this is idempotent. It also processes the files in the given order, so if you specify two files in the same group, the latter will be the king at the end of the request.
+
+## Managing Cookies and HTTP Headers
+
+This refers to the cookies held in the client's session manager, which are sent with network requests to different domains.
+
+### **GET `/manage_cookies/get_cookies`** { id="manage_cookies_get_cookies" }
+
+_Get the cookies for a particular domain._
+
+Restricted access:
+: YES. Manage Cookies permission needed.
+
+Required Headers: n/a
+
+Arguments:
+: * `domain`
+
+ ``` title="Example request (for gelbooru.com)"
+ /manage_cookies/get_cookies?domain=gelbooru.com
+ ```
+
+
+Response:
+: A JSON Object listing all the cookies for that domain in \[ name, value, domain, path, expires \] format.
+```json title="Example response"
+{
+ "cookies" : [
+ ["__cfduid", "f1bef65041e54e93110a883360bc7e71", ".gelbooru.com", "/", 1596223327],
+ ["pass_hash", "0b0833b797f108e340b315bc5463c324", "gelbooru.com", "/", 1585855361],
+ ["user_id", "123456", "gelbooru.com", "/", 1585855361]
+ ]
+}
+```
+
+ Note that these variables are all strings except 'expires', which is either an integer timestamp or _null_ for session cookies.
+
+ This request will also return any cookies for subdomains. The session system in hydrus generally stores cookies according to the second-level domain, so if you request cookies for specific.someoverbooru.net, you will still get the cookies for someoverbooru.net and all its subdomains.
+
+### **POST `/manage_cookies/set_cookies`** { id="manage_cookies_set_cookies" }
+
+Set some new cookies for the client. This makes it easier to 'copy' a login from a web browser or similar to hydrus if hydrus's login system can't handle the site yet.
+
+Restricted access:
+: YES. Manage Cookies permission needed.
+
+Required Headers:
+:
+ * `Content-Type`: application/json
+
+Arguments (in JSON):
+:
+ * `cookies`: (a list of cookie rows in the same format as the GET request above)
+
+```json title="Example request body"
+{
+ "cookies" : [
+ ["PHPSESSID", "07669eb2a1a6e840e498bb6e0799f3fb", ".somesite.com", "/", 1627327719],
+ ["tag_filter", "1", ".somesite.com", "/", 1627327719]
+ ]
+}
+```
+
+You can set 'value' to be null, which will clear any existing cookie with the corresponding name, domain, and path (acting essentially as a delete).
+
+Expires can be null, but session cookies will time out in hydrus after 60 minutes of non-use.
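
A tiny sketch of building rows for this call, including a null-value delete. `cookie_row` is a hypothetical helper name and the 30-day lifetime is an arbitrary choice:

```python title="Building cookie rows"
import time

def cookie_row(name, value, domain, path="/", lifetime_days=30):
    """A [name, value, domain, path, expires] row.
    value=None clears any existing cookie with that name/domain/path."""
    expires = None if value is None else int(time.time()) + lifetime_days * 86400
    return [name, value, domain, path, expires]
```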
+
+### **POST `/manage_headers/set_user_agent`** { id="manage_headers_set_user_agent" }
+
+This sets the 'Global' User-Agent for the client, as typically editable under _network->data->manage http headers_, for instance if you want hydrus to appear as a specific browser associated with some cookies.
+
+Restricted access:
+: YES. Manage Cookies permission needed.
+
+Required Headers:
+:
+ * `Content-Type`: application/json
+
+Arguments (in JSON):
+:
+ * `user-agent`: (a string)
+
+```json title="Example request body"
+{
+ "user-agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0"
+}
+```
+
+Send an empty string to reset the client back to the default User-Agent, which should be `Mozilla/5.0 (compatible; Hydrus Client)`.
+
+## Managing Pages
+
+This refers to the pages of the main client UI.
+
+### **GET `/manage_pages/get_pages`** { id="manage_pages_get_pages" }
+
+_Get the page structure of the current UI session._
+
+Restricted access:
+: YES. Manage Pages permission needed.
+
+Required Headers: n/a
+
+Arguments: n/a
+
+
+Response:
+: A JSON Object of the top-level page 'notebook' (page of pages) detailing its basic information and current sub-pages. Pages of pages beneath it will list their own sub-pages.
+```json title="Example response"
+{
+ "pages" : {
+ "name" : "top pages notebook",
+ "page_key" : "3b28d8a59ec61834325eb6275d9df012860a1ecfd9e1246423059bc47fb6d5bd",
+ "page_type" : 10,
+ "selected" : true,
+ "pages" : [
+ {
+ "name" : "files",
+ "page_key" : "d436ff5109215199913705eb9a7669d8a6b67c52e41c3b42904db083255ca84d",
+ "page_type" : 6,
+ "selected" : false
+ },
+ {
+ "name" : "thread watcher",
+ "page_key" : "40887fa327edca01e1d69b533dddba4681b2c43e0b4ebee0576177852e8c32e7",
+ "page_type" : 9,
+ "selected" : false
+ },
+ {
+ "name" : "pages",
+ "page_key" : "2ee7fa4058e1e23f2bd9e915cdf9347ae90902a8622d6559ba019a83a785c4dc",
+ "page_type" : 10,
+ "selected" : true,
+ "pages" : [
+ {
+ "name" : "urls",
+ "page_key" : "9fe22cb760d9ee6de32575ed9f27b76b4c215179cf843d3f9044efeeca98411f",
+ "page_type" : 7,
+ "selected" : true
+ },
+ {
+ "name" : "files",
+ "page_key" : "2977d57fc9c588be783727bcd54225d577b44e8aa2f91e365a3eb3c3f580dc4e",
+ "page_type" : 6,
+ "selected" : false
+ }
+ ]
+ }
+ ]
+ }
+}
+```
+
+ The page types are as follows:
+
+ * 1 - Gallery downloader
+ * 2 - Simple downloader
+ * 3 - Hard drive import
+ * 5 - Petitions (used by repository janitors)
+ * 6 - File search
+ * 7 - URL downloader
+ * 8 - Duplicates
+ * 9 - Thread watcher
+ * 10 - Page of pages
+
+ The top page of pages will always be there, and always selected. 'selected' means the page currently in view; the selection propagates down through nested pages of pages until it terminates. It may terminate in an empty page of pages, so do not assume it will end on a 'media' page.
+
+ The 'page_key' is a unique identifier for the page. It will stay the same for a particular page throughout the session, but new ones are generated on a client restart or other session reload.
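
Since 'selected' propagates down the tree, finding the page currently in view is a simple recursive walk over the response structure above. `selected_leaf` is a hypothetical helper name:

```python title="Finding the page in view"
def selected_leaf(page):
    """Follow 'selected' down through nested pages of pages.
    Returns the deepest selected page, which may itself be an empty
    page of pages rather than a media page."""
    if not page.get("selected", False):
        return None
    for sub in page.get("pages", []):
        leaf = selected_leaf(sub)
        if leaf is not None:
            return leaf
    return page
```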
+
+### **GET `/manage_pages/get_page_info`** { id="manage_pages_get_page_info" }
+
+_Get information about a specific page._
+
+!!! warning "Under Construction"
+ This is under construction. The current call dumps a ton of info for different downloader pages. Please experiment in IRL situations and give feedback for now! I will flesh out this help with more enumeration info and examples as this gets nailed down. POST commands to alter pages (adding, removing, highlighting) will come later.
+
+Restricted access:
+: YES. Manage Pages permission needed.
+
+Required Headers: n/a
+
+Arguments:
+:
+ * `page_key`: (hexadecimal page\_key as stated in [/manage\_pages/get\_pages](#manage_pages_get_pages))
+ * `simple`: true or false (optional, defaulting to true)
+
+ ``` title="Example request"
+ /manage_pages/get_page_info?page_key=aebbf4b594e6986bddf1eeb0b5846a1e6bc4e07088e517aff166f1aeb1c3c9da&simple=true
+ ```
+
+Response:
+: A JSON Object of the page's information. At present, this mostly means downloader information.
+```json title="Example response with simple = true"
+{
+ "page_info" : {
+ "name" : "threads",
+ "page_key" : "aebbf4b594e6986bddf1eeb0b5846a1e6bc4e07088e517aff166f1aeb1c3c9da",
+ "page_type" : 3,
+ "management" : {
+ "multiple_watcher_import" : {
+ "watcher_imports" : [
+ {
+ "url" : "https://someimageboard.net/m/123456",
+ "watcher_key" : "cf8c3525c57a46b0e5c2625812964364a2e801f8c49841c216b8f8d7a4d06d85",
+ "created" : 1566164269,
+ "last_check_time" : 1566164272,
+ "next_check_time" : 1566174272,
+ "files_paused" : false,
+ "checking_paused" : false,
+ "checking_status" : 0,
+ "subject" : "gundam pictures",
+ "imports" : {
+ "status" : "4 successful (2 already in db)",
+ "simple_status" : "4",
+ "total_processed" : 4,
+ "total_to_process" : 4
+ },
+ "gallery_log" : {
+ "status" : "1 successful",
+ "simple_status" : "1",
+ "total_processed" : 1,
+ "total_to_process" : 1
+ }
+ },
+ {
+ "url" : "https://someimageboard.net/a/1234",
+ "watcher_key" : "6bc17555b76da5bde2dcceedc382cf7d23281aee6477c41b643cd144ec168510",
+ "created" : 1566063125,
+ "last_check_time" : 1566063133,
+ "next_check_time" : 1566104272,
+ "files_paused" : false,
+ "checking_paused" : true,
+ "checking_status" : 1,
+ "subject" : "anime pictures",
+ "imports" : {
+ "status" : "124 successful (22 already in db), 2 previously deleted",
+ "simple_status" : "124",
+ "total_processed" : 124,
+ "total_to_process" : 124
+ },
+ "gallery_log" : {
+ "status" : "3 successful",
+ "simple_status" : "3",
+ "total_processed" : 3,
+ "total_to_process" : 3
+ }
+ }
+ ]
+ },
+ "highlight" : "cf8c3525c57a46b0e5c2625812964364a2e801f8c49841c216b8f8d7a4d06d85"
+ }
+ },
+ "media" : {
+ "num_files" : 4
+ }
+}
+```
+
+ As you can see, even the 'simple' mode can get very large. Imagine that response for a page watching 100 threads! Turning simple mode off will display every import item, gallery log entry, and all hashes in the media (thumbnail) panel.
+
+ For this first version, the five importer pages--hdd import, simple downloader, url downloader, gallery page, and watcher page--all give rich info based on their specific variables. The first three only have one importer/gallery log combo, but the latter two of course can have multiple. The "imports" and "gallery_log" entries are all in the same data format.
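 If you are scripting against this call, a small summariser keeps even the 'simple' responses manageable. This is a rough sketch against the response shape shown above; the function name and the fields it picks out are illustrative choices, not part of the API.

 ```python
 def summarise_watcher_page(page_info: dict) -> dict:
     """Collapse a 'simple' get_page_info response into a small status summary."""
     watchers = (
         page_info.get("management", {})
                  .get("multiple_watcher_import", {})
                  .get("watcher_imports", [])
     )
     return {
         "name": page_info.get("name"),
         "num_watchers": len(watchers),
         "num_checking_paused": sum(1 for w in watchers if w.get("checking_paused")),
         "total_files_imported": sum(
             w.get("imports", {}).get("total_processed", 0) for w in watchers
         ),
     }
 ```

 Run against the example response above, this would report two watchers, one with checking paused, and 128 files imported in total.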
+
+
+### **POST `/manage_pages/add_files`** { id="manage_pages_add_files" }
+
+_Add files to a page._
+
+Restricted access:
+: YES. Manage Pages permission needed.
+
+Required Headers:
+:
+ * `Content-Type`: application/json
+
+Arguments (in JSON):
+:
+ * `page_key`: (the page key for the page you wish to add files to)
+ * `file_id`: (selective, a numerical file id)
+ * `file_ids`: (selective, a list of numerical file ids)
+ * `hash`: (selective, a hexadecimal SHA256 hash)
+ * `hashes`: (selective, a list of hexadecimal SHA256 hashes)
+
+The files you set will be appended to the given page, just like a thumbnail drag and drop operation. The page key is the same as fetched in the [/manage\_pages/get\_pages](#manage_pages_get_pages) call.
+
+```json title="Example request body"
+{
+ "page_key" : "af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18",
+ "file_ids" : [123, 124, 125]
+}
+```
+
+Response:
+: 200 with no content. If the page key is not found, this will 404.
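 As a sketch of how a client might assemble this call, the helper below builds the url, headers, and JSON body without sending anything. It assumes the default Client API address (`127.0.0.1:45869`) and the standard access key header; adjust both for your setup.

 ```python
 import json

 API_BASE = "http://127.0.0.1:45869"  # default Client API address; adjust for your setup

 def build_add_files_request(page_key: str, file_ids: list, access_key: str):
     """Assemble url, headers, and body for /manage_pages/add_files without sending."""
     url = API_BASE + "/manage_pages/add_files"
     headers = {
         "Hydrus-Client-API-Access-Key": access_key,
         "Content-Type": "application/json",
     }
     body = json.dumps({"page_key": page_key, "file_ids": file_ids})
     return url, headers, body
 ```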
+
+### **POST `/manage_pages/focus_page`** { id="manage_pages_focus_page" }
+
+_'Show' a page in the main GUI, making it the current page in view. If it is already the current page, no change is made._
+
+Restricted access:
+: YES. Manage Pages permission needed.
+
+Required Headers:
+:
+ * `Content-Type`: application/json
+
+Arguments (in JSON):
+:
+ * `page_key`: (the page key for the page you wish to show)
+
+The page key is the same as fetched in the [/manage\_pages/get\_pages](#manage_pages_get_pages) call.
+
+```json title="Example request body"
+{
+ "page_key" : "af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18"
+}
+```
+
+Response:
+: 200 with no content. If the page key is not found, this will 404.
+
+
+### **POST `/manage_pages/refresh_page`** { id="manage_pages_refresh_page" }
+
+_Refresh a page in the main GUI. Like hitting F5 in the client, this obviously makes file search pages perform their search again, but for other page types it will force the currently in-view files to be re-sorted._
+
+Restricted access:
+: YES. Manage Pages permission needed.
+
+Required Headers:
+:
+ * `Content-Type`: application/json
+
+Arguments (in JSON):
+:
+ * `page_key`: (the page key for the page you wish to refresh)
+
+The page key is the same as fetched in the [/manage\_pages/get\_pages](#manage_pages_get_pages) call. If a file search page is not set to 'searching immediately', a 'refresh' command does nothing.
+
+```json title="Example request body"
+{
+ "page_key" : "af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18"
+}
+```
+
+Response:
+: 200 with no content. If the page key is not found, this will 404.
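 Since `focus_page` and `refresh_page` both take the same page_key-only body, one stdlib helper can cover both. This is a hypothetical wrapper, not part of the API; it assumes the default Client API address and builds the request without sending it.

 ```python
 import json
 import urllib.request

 def page_key_request(endpoint: str, page_key: str, access_key: str) -> urllib.request.Request:
     """Build (but do not send) a POST for the page_key-only calls,
     e.g. /manage_pages/focus_page or /manage_pages/refresh_page."""
     return urllib.request.Request(
         "http://127.0.0.1:45869" + endpoint,  # default Client API address
         data=json.dumps({"page_key": page_key}).encode("utf-8"),
         headers={
             "Hydrus-Client-API-Access-Key": access_key,
             "Content-Type": "application/json",
         },
     )
 ```

 You would then send it with `urllib.request.urlopen(...)`; per the response descriptions above, a 200 with no content means success, and an unknown page key gives a 404.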
+
## Managing the Database
### **POST `/manage_database/lock_on`** { id="manage_database_lock_on" }
diff --git a/docs/old_changelog.html b/docs/old_changelog.html
index 7fe287f8..fa034c36 100644
--- a/docs/old_changelog.html
+++ b/docs/old_changelog.html
@@ -34,6 +34,33 @@
+ -
+
+
+ client api
+ - the Client API now supports the duplicates system! this is early stages, and what I've exposed is ugly and technical, but if you want to try out some external dupe processing, give it a go and let me know what you think! (issue #347)
+ - a new 'manage file relationships' permission gives your api keys access
+ - the new GET commands are:
+ - - `/manage_file_relationships/get_file_relationships`, which fetches potential dupes, dupes, alternates, false positives, and dupe kings
+ - - `/manage_file_relationships/get_potentials_count`, which can take two file searches, a potential dupes search type, a pixel match type, and max hamming distance, and will give the number of potential pairs in that domain
+ - - `/manage_file_relationships/get_potential_pairs`, which takes the same params as count and a `max_num_pairs` and gives you a batch of pairs to process, just like the dupe filter
+ - - `/manage_file_relationships/get_random_potentials`, which takes the same params as count and gives you some hashes just like the 'show some random potential pairs' button
+ - the new POST commands are:
+ - - `/manage_file_relationships/set_file_relationships`, which sets potential/dupe/alternate/false positive relationships between file pairs with some optional content merge and file deletes
+ - - `/manage_file_relationships/set_kings`, which sets duplicate group kings
+ - more commands will be written in the future for various remove/dissolve actions
+ - wrote unit tests for all the commands!
+ - wrote help for all the commands!
+ - fixed an issue in the '/manage_pages/get_pages' call where the response data structure was saying 'focused' instead of 'selected' for 'page of pages'
+ - client api version is now 40
+ boring misc cleanup and refactoring
+ - cleaned and wrote some more parsing methods for the api to support duplicate search tech and reduce copypasted parsing code
+ - renamed the client api permission labels a little, just making it all clearer and line up better. also, the 'edit client permissions' dialog now sorts the permissions
+ - reordered and renamed the dev help headers in the same way
+ - simple but significant rename-refactoring in file duplicates database module, tearing off the old 'Duplicates' prefixes to every method ha ha
+ - updated the advanced Windows 'running from source' help to talk more about VC build tools. some old scripts don't seem to work any more in Win 11, but you also don't really need it any more (I moved to a new dev machine this week so had to set everything up again)
+
+
-
diff --git a/docs/running_from_source.md b/docs/running_from_source.md
index 841590c6..11ccc472 100644
--- a/docs/running_from_source.md
+++ b/docs/running_from_source.md
@@ -316,27 +316,28 @@ When running from source you may want to [build the hydrus help docs](about_docs
## building packages on windows { id="windows_build" }
-Almost everything is provided as pre-compiled 'wheels' these days, but if you get an error about Visual Studio C++ when you try to pip something, it may be you need that compiler tech.
+Almost everything you get through pip is provided as pre-compiled 'wheels' these days, but if you get an error about Visual Studio C++ when you try to pip something, you have two choices:
-You also need this if you want to build a frozen release locally.
+- Get Visual Studio 14/whatever build tools
+- Pick a different library version
-Although these tools are free, it can be a pain to get them through the official (and often huge) downloader installer from Microsoft. Instead, install [Chocolatey](https://chocolatey.org/) and use this one simple line:
+The second option is almost always the simpler. If the opencv-headless version that requirements.txt specifies won't compile in Python 3.10, try a newer version--there will probably be one of the new highly compatible wheels, and it'll just work in seconds. Check my build scripts and the various requirements.txts for ideas on what versions to try for your python.
+
+If you are confident you need the Visual Studio tools, then prepare for headaches. Although the tools are free, it can be a pain to get them through Microsoft's official (and often huge) download installer. Expect a 5GB+ install with an eye-watering number of checkboxes that probably needs some stackexchange searches to figure out.
+
+On Windows 10, [Chocolatey](https://chocolatey.org/) has been the easy answer. Get it installed and use this one simple line:
```
-choco install -y vcbuildtools visualstudio2017buildtools
+choco install -y vcbuildtools visualstudio2017buildtools windows-sdk-10.0
```
Trust me, just do this, it will save a ton of headaches!
-This can also be helpful for Windows 10 python work generally:
-
-```
-choco install -y windows-sdk-10.0
-```
+_Update:_ On Windows 11, in 2023-01, I had trouble with the above. There are a couple of '11' SDKs that installed ok, but the vcbuildtools stuff had unusual errors. I hadn't done this in years, so maybe they are broken for Windows 10 too! The good news is that a basic stock Win 11 install with Python 3.10 is fine installing everything in our requirements and even making a build without any extra compiler tech.
## additional windows info { id="additional_windows" }
-This does not matter much any more, but in the old days, Windows pip could have problems building modules like lz4 and lxml, and Visual Studio was tricky to get working. [This page](http://www.lfd.uci.edu/~gohlke/pythonlibs/) has a lot of prebuilt binaries--I have found it very helpful many times.
+This does not matter much any more, but in the old days, building modules like lz4 and lxml was a complete nightmare, and hooking up Visual Studio was even more difficult. [This page](http://www.lfd.uci.edu/~gohlke/pythonlibs/) has a lot of prebuilt binaries--I have found it very helpful many times.
I have a fair bit of experience with Windows python, so send me a mail if you need help.
@@ -344,4 +345,4 @@ I have a fair bit of experience with Windows python, so send me a mail if you ne
My coding style is unusual and unprofessional. Everything is pretty much hacked together. If you are interested in how things work, please do look through the source and ask me if you don't understand something.
-I'm constantly throwing new code together and then cleaning and overhauling it down the line. I work strictly alone, however, so while I am very interested in detailed bug reports or suggestions for good libraries to use, I am not looking for pull requests or suggestions on style. I know a lot of things are a mess. Everything I do is [WTFPL](https://github.com/sirkris/WTFPL/blob/master/WTFPL.md), so feel free to fork and play around with things on your end as much as you like.
+I'm constantly throwing new code together and then cleaning and overhauling it down the line. I work strictly alone. While I am very interested in detailed bug reports or suggestions for good libraries to use, I am not looking for pull requests or suggestions on style. I know a lot of things are a mess. Everything I do is [WTFPL](https://github.com/sirkris/WTFPL/blob/master/WTFPL.md), so feel free to fork and play around with things on your end as much as you like.
diff --git a/hydrus/client/ClientAPI.py b/hydrus/client/ClientAPI.py
index 8a43e82b..39d8e61e 100644
--- a/hydrus/client/ClientAPI.py
+++ b/hydrus/client/ClientAPI.py
@@ -17,19 +17,21 @@ CLIENT_API_PERMISSION_MANAGE_PAGES = 4
CLIENT_API_PERMISSION_MANAGE_COOKIES = 5
CLIENT_API_PERMISSION_MANAGE_DATABASE = 6
CLIENT_API_PERMISSION_ADD_NOTES = 7
+CLIENT_API_PERMISSION_MANAGE_FILE_RELATIONSHIPS = 8
-ALLOWED_PERMISSIONS = ( CLIENT_API_PERMISSION_ADD_FILES, CLIENT_API_PERMISSION_ADD_TAGS, CLIENT_API_PERMISSION_ADD_URLS, CLIENT_API_PERMISSION_SEARCH_FILES, CLIENT_API_PERMISSION_MANAGE_PAGES, CLIENT_API_PERMISSION_MANAGE_COOKIES, CLIENT_API_PERMISSION_MANAGE_DATABASE, CLIENT_API_PERMISSION_ADD_NOTES )
+ALLOWED_PERMISSIONS = ( CLIENT_API_PERMISSION_ADD_FILES, CLIENT_API_PERMISSION_ADD_TAGS, CLIENT_API_PERMISSION_ADD_URLS, CLIENT_API_PERMISSION_SEARCH_FILES, CLIENT_API_PERMISSION_MANAGE_PAGES, CLIENT_API_PERMISSION_MANAGE_COOKIES, CLIENT_API_PERMISSION_MANAGE_DATABASE, CLIENT_API_PERMISSION_ADD_NOTES, CLIENT_API_PERMISSION_MANAGE_FILE_RELATIONSHIPS )
basic_permission_to_str_lookup = {}
-basic_permission_to_str_lookup[ CLIENT_API_PERMISSION_ADD_URLS ] = 'add urls for processing'
-basic_permission_to_str_lookup[ CLIENT_API_PERMISSION_ADD_FILES ] = 'import files'
-basic_permission_to_str_lookup[ CLIENT_API_PERMISSION_ADD_TAGS ] = 'add tags to files'
-basic_permission_to_str_lookup[ CLIENT_API_PERMISSION_SEARCH_FILES ] = 'search for files'
+basic_permission_to_str_lookup[ CLIENT_API_PERMISSION_ADD_URLS ] = 'import and edit urls'
+basic_permission_to_str_lookup[ CLIENT_API_PERMISSION_ADD_FILES ] = 'import and delete files'
+basic_permission_to_str_lookup[ CLIENT_API_PERMISSION_ADD_TAGS ] = 'edit file tags'
+basic_permission_to_str_lookup[ CLIENT_API_PERMISSION_SEARCH_FILES ] = 'search and fetch files'
basic_permission_to_str_lookup[ CLIENT_API_PERMISSION_MANAGE_PAGES ] = 'manage pages'
basic_permission_to_str_lookup[ CLIENT_API_PERMISSION_MANAGE_COOKIES ] = 'manage cookies'
basic_permission_to_str_lookup[ CLIENT_API_PERMISSION_MANAGE_DATABASE ] = 'manage database'
-basic_permission_to_str_lookup[ CLIENT_API_PERMISSION_ADD_NOTES ] = 'add notes to files'
+basic_permission_to_str_lookup[ CLIENT_API_PERMISSION_ADD_NOTES ] = 'edit file notes'
+basic_permission_to_str_lookup[ CLIENT_API_PERMISSION_MANAGE_FILE_RELATIONSHIPS ] = 'manage file relationships'
SEARCH_RESULTS_CACHE_TIMEOUT = 4 * 3600
diff --git a/hydrus/client/db/ClientDB.py b/hydrus/client/db/ClientDB.py
index d334db0c..80b29a77 100644
--- a/hydrus/client/db/ClientDB.py
+++ b/hydrus/client/db/ClientDB.py
@@ -1862,13 +1862,13 @@ class DB( HydrusDB.HydrusDB ):
chosen_allowed_hash_ids = query_hash_ids_1
comparison_allowed_hash_ids = query_hash_ids_2
- table_join = self.modules_files_duplicates.DuplicatesGetPotentialDuplicatePairsTableJoinOnSeparateSearchResults( temp_table_name_1, temp_table_name_2, pixel_dupes_preference, max_hamming_distance )
+ table_join = self.modules_files_duplicates.GetPotentialDuplicatePairsTableJoinOnSeparateSearchResults( temp_table_name_1, temp_table_name_2, pixel_dupes_preference, max_hamming_distance )
else:
if file_search_context_1.IsJustSystemEverything() or file_search_context_1.HasNoPredicates():
- table_join = self.modules_files_duplicates.DuplicatesGetPotentialDuplicatePairsTableJoinOnEverythingSearchResults( db_location_context, pixel_dupes_preference, max_hamming_distance )
+ table_join = self.modules_files_duplicates.GetPotentialDuplicatePairsTableJoinOnEverythingSearchResults( db_location_context, pixel_dupes_preference, max_hamming_distance )
else:
@@ -1879,14 +1879,14 @@ class DB( HydrusDB.HydrusDB ):
chosen_allowed_hash_ids = query_hash_ids
comparison_allowed_hash_ids = query_hash_ids
- table_join = self.modules_files_duplicates.DuplicatesGetPotentialDuplicatePairsTableJoinOnSearchResultsBothFiles( temp_table_name_1, pixel_dupes_preference, max_hamming_distance )
+ table_join = self.modules_files_duplicates.GetPotentialDuplicatePairsTableJoinOnSearchResultsBothFiles( temp_table_name_1, pixel_dupes_preference, max_hamming_distance )
else:
# the master will always be one that matches the search, the comparison can be whatever
chosen_allowed_hash_ids = query_hash_ids
- table_join = self.modules_files_duplicates.DuplicatesGetPotentialDuplicatePairsTableJoinOnSearchResults( db_location_context, temp_table_name_1, pixel_dupes_preference, max_hamming_distance )
+ table_join = self.modules_files_duplicates.GetPotentialDuplicatePairsTableJoinOnSearchResults( db_location_context, temp_table_name_1, pixel_dupes_preference, max_hamming_distance )
@@ -1915,7 +1915,7 @@ class DB( HydrusDB.HydrusDB ):
for potential_media_id in potential_media_ids:
- best_king_hash_id = self.modules_files_duplicates.DuplicatesGetBestKingId( potential_media_id, db_location_context, allowed_hash_ids = chosen_allowed_hash_ids, preferred_hash_ids = chosen_preferred_hash_ids )
+ best_king_hash_id = self.modules_files_duplicates.GetBestKingId( potential_media_id, db_location_context, allowed_hash_ids = chosen_allowed_hash_ids, preferred_hash_ids = chosen_preferred_hash_ids )
if best_king_hash_id is not None:
@@ -1931,7 +1931,7 @@ class DB( HydrusDB.HydrusDB ):
return []
- # I used to do self.modules_files_duplicates.DuplicatesGetFileHashesByDuplicateType here, but that gets _all_ potentials in the db context, even with allowed_hash_ids doing work it won't capture pixel hashes or duplicate distance that we searched above
+ # I used to do self.modules_files_duplicates.GetFileHashesByDuplicateType here, but that gets _all_ potentials in the db context, even with allowed_hash_ids doing work it won't capture pixel hashes or duplicate distance that we searched above
# so, let's search and make the list manually!
comparison_hash_ids = []
@@ -1950,7 +1950,7 @@ class DB( HydrusDB.HydrusDB ):
potential_media_id = smaller_media_id
- best_king_hash_id = self.modules_files_duplicates.DuplicatesGetBestKingId( potential_media_id, db_location_context, allowed_hash_ids = comparison_allowed_hash_ids, preferred_hash_ids = comparison_preferred_hash_ids )
+ best_king_hash_id = self.modules_files_duplicates.GetBestKingId( potential_media_id, db_location_context, allowed_hash_ids = comparison_allowed_hash_ids, preferred_hash_ids = comparison_preferred_hash_ids )
if best_king_hash_id is not None:
@@ -1965,8 +1965,15 @@ class DB( HydrusDB.HydrusDB ):
return self.modules_hashes_local_cache.GetHashes( results_hash_ids )
+
+
- def _DuplicatesGetPotentialDuplicatePairsForFiltering( self, file_search_context_1: ClientSearch.FileSearchContext, file_search_context_2: ClientSearch.FileSearchContext, dupe_search_type: int, pixel_dupes_preference, max_hamming_distance ):
+ def _DuplicatesGetPotentialDuplicatePairsForFiltering( self, file_search_context_1: ClientSearch.FileSearchContext, file_search_context_2: ClientSearch.FileSearchContext, dupe_search_type: int, pixel_dupes_preference, max_hamming_distance, max_num_pairs: typing.Optional[ int ] = None ):
+
+ if max_num_pairs is None:
+
+ max_num_pairs = HG.client_controller.new_options.GetInteger( 'duplicate_filter_max_batch_size' )
+
# we need to batch non-intersecting decisions here to keep it simple at the gui-level
# we also want to maximise per-decision value
@@ -1993,13 +2000,13 @@ class DB( HydrusDB.HydrusDB ):
chosen_allowed_hash_ids = query_hash_ids_1
comparison_allowed_hash_ids = query_hash_ids_2
- table_join = self.modules_files_duplicates.DuplicatesGetPotentialDuplicatePairsTableJoinOnSeparateSearchResults( temp_table_name_1, temp_table_name_2, pixel_dupes_preference, max_hamming_distance )
+ table_join = self.modules_files_duplicates.GetPotentialDuplicatePairsTableJoinOnSeparateSearchResults( temp_table_name_1, temp_table_name_2, pixel_dupes_preference, max_hamming_distance )
else:
if file_search_context_1.IsJustSystemEverything() or file_search_context_1.HasNoPredicates():
- table_join = self.modules_files_duplicates.DuplicatesGetPotentialDuplicatePairsTableJoinOnEverythingSearchResults( db_location_context, pixel_dupes_preference, max_hamming_distance )
+ table_join = self.modules_files_duplicates.GetPotentialDuplicatePairsTableJoinOnEverythingSearchResults( db_location_context, pixel_dupes_preference, max_hamming_distance )
else:
@@ -2011,14 +2018,14 @@ class DB( HydrusDB.HydrusDB ):
chosen_allowed_hash_ids = query_hash_ids
comparison_allowed_hash_ids = query_hash_ids
- table_join = self.modules_files_duplicates.DuplicatesGetPotentialDuplicatePairsTableJoinOnSearchResultsBothFiles( temp_table_name_1, pixel_dupes_preference, max_hamming_distance )
+ table_join = self.modules_files_duplicates.GetPotentialDuplicatePairsTableJoinOnSearchResultsBothFiles( temp_table_name_1, pixel_dupes_preference, max_hamming_distance )
else:
# the chosen must be in the search, but we don't care about the comparison as long as it is viewable
chosen_preferred_hash_ids = query_hash_ids
- table_join = self.modules_files_duplicates.DuplicatesGetPotentialDuplicatePairsTableJoinOnSearchResults( db_location_context, temp_table_name_1, pixel_dupes_preference, max_hamming_distance )
+ table_join = self.modules_files_duplicates.GetPotentialDuplicatePairsTableJoinOnSearchResults( db_location_context, temp_table_name_1, pixel_dupes_preference, max_hamming_distance )
@@ -2028,8 +2035,6 @@ class DB( HydrusDB.HydrusDB ):
- MAX_BATCH_SIZE = HG.client_controller.new_options.GetInteger( 'duplicate_filter_max_batch_size' )
-
batch_of_pairs_of_media_ids = []
seen_media_ids = set()
@@ -2089,7 +2094,7 @@ class DB( HydrusDB.HydrusDB ):
batch_of_pairs_of_media_ids.append( pair )
- if len( batch_of_pairs_of_media_ids ) >= MAX_BATCH_SIZE:
+ if len( batch_of_pairs_of_media_ids ) >= max_num_pairs:
break
@@ -2097,13 +2102,13 @@ class DB( HydrusDB.HydrusDB ):
seen_media_ids.update( seen_media_ids_for_this_master_media_id )
- if len( batch_of_pairs_of_media_ids ) >= MAX_BATCH_SIZE:
+ if len( batch_of_pairs_of_media_ids ) >= max_num_pairs:
break
- if len( batch_of_pairs_of_media_ids ) >= MAX_BATCH_SIZE:
+ if len( batch_of_pairs_of_media_ids ) >= max_num_pairs:
break
@@ -2119,8 +2124,8 @@ class DB( HydrusDB.HydrusDB ):
for ( smaller_media_id, larger_media_id ) in batch_of_pairs_of_media_ids:
- best_smaller_king_hash_id = self.modules_files_duplicates.DuplicatesGetBestKingId( smaller_media_id, db_location_context, allowed_hash_ids = chosen_allowed_hash_ids, preferred_hash_ids = chosen_preferred_hash_ids )
- best_larger_king_hash_id = self.modules_files_duplicates.DuplicatesGetBestKingId( larger_media_id, db_location_context, allowed_hash_ids = chosen_allowed_hash_ids, preferred_hash_ids = chosen_preferred_hash_ids )
+ best_smaller_king_hash_id = self.modules_files_duplicates.GetBestKingId( smaller_media_id, db_location_context, allowed_hash_ids = chosen_allowed_hash_ids, preferred_hash_ids = chosen_preferred_hash_ids )
+ best_larger_king_hash_id = self.modules_files_duplicates.GetBestKingId( larger_media_id, db_location_context, allowed_hash_ids = chosen_allowed_hash_ids, preferred_hash_ids = chosen_preferred_hash_ids )
if best_smaller_king_hash_id is not None and best_larger_king_hash_id is not None:
@@ -2137,15 +2142,15 @@ class DB( HydrusDB.HydrusDB ):
for ( smaller_media_id, larger_media_id ) in batch_of_pairs_of_media_ids:
- best_smaller_king_hash_id = self.modules_files_duplicates.DuplicatesGetBestKingId( smaller_media_id, db_location_context, allowed_hash_ids = chosen_allowed_hash_ids, preferred_hash_ids = chosen_preferred_hash_ids )
- best_larger_king_hash_id = self.modules_files_duplicates.DuplicatesGetBestKingId( larger_media_id, db_location_context, allowed_hash_ids = comparison_allowed_hash_ids, preferred_hash_ids = comparison_preferred_hash_ids )
+ best_smaller_king_hash_id = self.modules_files_duplicates.GetBestKingId( smaller_media_id, db_location_context, allowed_hash_ids = chosen_allowed_hash_ids, preferred_hash_ids = chosen_preferred_hash_ids )
+ best_larger_king_hash_id = self.modules_files_duplicates.GetBestKingId( larger_media_id, db_location_context, allowed_hash_ids = comparison_allowed_hash_ids, preferred_hash_ids = comparison_preferred_hash_ids )
if best_smaller_king_hash_id is None or best_larger_king_hash_id is None:
# ok smaller was probably the comparison, let's see if that produces a better king hash
- best_smaller_king_hash_id = self.modules_files_duplicates.DuplicatesGetBestKingId( smaller_media_id, db_location_context, allowed_hash_ids = comparison_allowed_hash_ids, preferred_hash_ids = comparison_preferred_hash_ids )
- best_larger_king_hash_id = self.modules_files_duplicates.DuplicatesGetBestKingId( larger_media_id, db_location_context, allowed_hash_ids = chosen_allowed_hash_ids, preferred_hash_ids = chosen_preferred_hash_ids )
+ best_smaller_king_hash_id = self.modules_files_duplicates.GetBestKingId( smaller_media_id, db_location_context, allowed_hash_ids = comparison_allowed_hash_ids, preferred_hash_ids = comparison_preferred_hash_ids )
+ best_larger_king_hash_id = self.modules_files_duplicates.GetBestKingId( larger_media_id, db_location_context, allowed_hash_ids = chosen_allowed_hash_ids, preferred_hash_ids = chosen_preferred_hash_ids )
if best_smaller_king_hash_id is not None and best_larger_king_hash_id is not None:
@@ -2179,13 +2184,13 @@ class DB( HydrusDB.HydrusDB ):
self._PopulateSearchIntoTempTable( file_search_context_1, temp_table_name_1 )
self._PopulateSearchIntoTempTable( file_search_context_2, temp_table_name_2 )
- table_join = self.modules_files_duplicates.DuplicatesGetPotentialDuplicatePairsTableJoinOnSeparateSearchResults( temp_table_name_1, temp_table_name_2, pixel_dupes_preference, max_hamming_distance )
+ table_join = self.modules_files_duplicates.GetPotentialDuplicatePairsTableJoinOnSeparateSearchResults( temp_table_name_1, temp_table_name_2, pixel_dupes_preference, max_hamming_distance )
else:
if file_search_context_1.IsJustSystemEverything() or file_search_context_1.HasNoPredicates():
- table_join = self.modules_files_duplicates.DuplicatesGetPotentialDuplicatePairsTableJoinOnEverythingSearchResults( db_location_context, pixel_dupes_preference, max_hamming_distance )
+ table_join = self.modules_files_duplicates.GetPotentialDuplicatePairsTableJoinOnEverythingSearchResults( db_location_context, pixel_dupes_preference, max_hamming_distance )
else:
@@ -2193,11 +2198,11 @@ class DB( HydrusDB.HydrusDB ):
if dupe_search_type == CC.DUPE_SEARCH_BOTH_FILES_MATCH_ONE_SEARCH:
- table_join = self.modules_files_duplicates.DuplicatesGetPotentialDuplicatePairsTableJoinOnSearchResultsBothFiles( temp_table_name_1, pixel_dupes_preference, max_hamming_distance )
+ table_join = self.modules_files_duplicates.GetPotentialDuplicatePairsTableJoinOnSearchResultsBothFiles( temp_table_name_1, pixel_dupes_preference, max_hamming_distance )
else:
- table_join = self.modules_files_duplicates.DuplicatesGetPotentialDuplicatePairsTableJoinOnSearchResults( db_location_context, temp_table_name_1, pixel_dupes_preference, max_hamming_distance )
+ table_join = self.modules_files_duplicates.GetPotentialDuplicatePairsTableJoinOnSearchResults( db_location_context, temp_table_name_1, pixel_dupes_preference, max_hamming_distance )
@@ -2227,8 +2232,8 @@ class DB( HydrusDB.HydrusDB ):
hash_id_a = self.modules_hashes_local_cache.GetHashId( hash_a )
hash_id_b = self.modules_hashes_local_cache.GetHashId( hash_b )
- media_id_a = self.modules_files_duplicates.DuplicatesGetMediaId( hash_id_a )
- media_id_b = self.modules_files_duplicates.DuplicatesGetMediaId( hash_id_b )
+ media_id_a = self.modules_files_duplicates.GetMediaId( hash_id_a )
+ media_id_b = self.modules_files_duplicates.GetMediaId( hash_id_b )
smaller_media_id = min( media_id_a, media_id_b )
larger_media_id = max( media_id_a, media_id_b )
@@ -2246,29 +2251,29 @@ class DB( HydrusDB.HydrusDB ):
if duplicate_type == HC.DUPLICATE_FALSE_POSITIVE:
- alternates_group_id_a = self.modules_files_duplicates.DuplicatesGetAlternatesGroupId( media_id_a )
- alternates_group_id_b = self.modules_files_duplicates.DuplicatesGetAlternatesGroupId( media_id_b )
+ alternates_group_id_a = self.modules_files_duplicates.GetAlternatesGroupId( media_id_a )
+ alternates_group_id_b = self.modules_files_duplicates.GetAlternatesGroupId( media_id_b )
- self.modules_files_duplicates.DuplicatesSetFalsePositive( alternates_group_id_a, alternates_group_id_b )
+ self.modules_files_duplicates.SetFalsePositive( alternates_group_id_a, alternates_group_id_b )
elif duplicate_type == HC.DUPLICATE_ALTERNATE:
if media_id_a == media_id_b:
- king_hash_id = self.modules_files_duplicates.DuplicatesGetKingHashId( media_id_a )
+ king_hash_id = self.modules_files_duplicates.GetKingHashId( media_id_a )
hash_id_to_remove = hash_id_b if king_hash_id == hash_id_a else hash_id_a
- self.modules_files_duplicates.DuplicatesRemoveMediaIdMember( hash_id_to_remove )
+ self.modules_files_duplicates.RemoveMediaIdMember( hash_id_to_remove )
- media_id_a = self.modules_files_duplicates.DuplicatesGetMediaId( hash_id_a )
- media_id_b = self.modules_files_duplicates.DuplicatesGetMediaId( hash_id_b )
+ media_id_a = self.modules_files_duplicates.GetMediaId( hash_id_a )
+ media_id_b = self.modules_files_duplicates.GetMediaId( hash_id_b )
smaller_media_id = min( media_id_a, media_id_b )
larger_media_id = max( media_id_a, media_id_b )
- self.modules_files_duplicates.DuplicatesSetAlternates( media_id_a, media_id_b )
+ self.modules_files_duplicates.SetAlternates( media_id_a, media_id_b )
elif duplicate_type in ( HC.DUPLICATE_BETTER, HC.DUPLICATE_WORSE, HC.DUPLICATE_SAME_QUALITY ):
@@ -2281,8 +2286,8 @@ class DB( HydrusDB.HydrusDB ):
duplicate_type = HC.DUPLICATE_BETTER
- king_hash_id_a = self.modules_files_duplicates.DuplicatesGetKingHashId( media_id_a )
- king_hash_id_b = self.modules_files_duplicates.DuplicatesGetKingHashId( media_id_b )
+ king_hash_id_a = self.modules_files_duplicates.GetKingHashId( media_id_a )
+ king_hash_id_b = self.modules_files_duplicates.GetKingHashId( media_id_b )
if duplicate_type == HC.DUPLICATE_BETTER:
@@ -2292,7 +2297,7 @@ class DB( HydrusDB.HydrusDB ):
# user manually set that a > King A, hence we are setting a new king within a group
- self.modules_files_duplicates.DuplicatesSetKing( hash_id_a, media_id_a )
+ self.modules_files_duplicates.SetKing( hash_id_a, media_id_a )
else:
@@ -2301,16 +2306,16 @@ class DB( HydrusDB.HydrusDB ):
# user manually set that a member of A is better than a non-King of B. remove b from B and merge it into A
- self.modules_files_duplicates.DuplicatesRemoveMediaIdMember( hash_id_b )
+ self.modules_files_duplicates.RemoveMediaIdMember( hash_id_b )
- media_id_b = self.modules_files_duplicates.DuplicatesGetMediaId( hash_id_b )
+ media_id_b = self.modules_files_duplicates.GetMediaId( hash_id_b )
# b is now the King of its new group
# a member of A is better than King B, hence B can merge into A
- self.modules_files_duplicates.DuplicatesMergeMedias( media_id_a, media_id_b )
+ self.modules_files_duplicates.MergeMedias( media_id_a, media_id_b )
elif duplicate_type == HC.DUPLICATE_SAME_QUALITY:
@@ -2324,9 +2329,9 @@ class DB( HydrusDB.HydrusDB ):
# if neither file is the king, remove B from B and merge it into A
- self.modules_files_duplicates.DuplicatesRemoveMediaIdMember( hash_id_b )
+ self.modules_files_duplicates.RemoveMediaIdMember( hash_id_b )
- media_id_b = self.modules_files_duplicates.DuplicatesGetMediaId( hash_id_b )
+ media_id_b = self.modules_files_duplicates.GetMediaId( hash_id_b )
superior_media_id = media_id_a
mergee_media_id = media_id_b
@@ -2351,7 +2356,7 @@ class DB( HydrusDB.HydrusDB ):
mergee_media_id = media_id_b
- self.modules_files_duplicates.DuplicatesMergeMedias( superior_media_id, mergee_media_id )
+ self.modules_files_duplicates.MergeMedias( superior_media_id, mergee_media_id )
@@ -2359,7 +2364,7 @@ class DB( HydrusDB.HydrusDB ):
potential_duplicate_media_ids_and_distances = [ ( media_id_b, 0 ) ]
- self.modules_files_duplicates.DuplicatesAddPotentialDuplicates( media_id_a, potential_duplicate_media_ids_and_distances )
+ self.modules_files_duplicates.AddPotentialDuplicates( media_id_a, potential_duplicate_media_ids_and_distances )
@@ -2569,7 +2574,7 @@ class DB( HydrusDB.HydrusDB ):
db_location_context = self.modules_files_storage.GetDBLocationContext( location_context )
- table_join = self.modules_files_duplicates.DuplicatesGetPotentialDuplicatePairsTableJoinOnFileService( db_location_context )
+ table_join = self.modules_files_duplicates.GetPotentialDuplicatePairsTableJoinOnFileService( db_location_context )
( total_potential_pairs, ) = self._Execute( 'SELECT COUNT( * ) FROM ( SELECT DISTINCT smaller_media_id, larger_media_id FROM {} );'.format( table_join ) ).fetchone()
@@ -3667,7 +3672,7 @@ class DB( HydrusDB.HydrusDB ):
else:
- dupe_hash_ids = self.modules_files_duplicates.DuplicatesGetHashIdsFromDuplicateCountPredicate( db_location_context, operator, num_relationships, dupe_type )
+ dupe_hash_ids = self.modules_files_duplicates.GetHashIdsFromDuplicateCountPredicate( db_location_context, operator, num_relationships, dupe_type )
query_hash_ids = intersection_update_qhi( query_hash_ids, dupe_hash_ids )
@@ -3972,7 +3977,7 @@ class DB( HydrusDB.HydrusDB ):
if king_filter is not None and king_filter:
- king_hash_ids = self.modules_files_duplicates.DuplicatesFilterKingHashIds( query_hash_ids )
+ king_hash_ids = self.modules_files_duplicates.FilterKingHashIds( query_hash_ids )
query_hash_ids = intersection_update_qhi( query_hash_ids, king_hash_ids )
@@ -4118,7 +4123,7 @@ class DB( HydrusDB.HydrusDB ):
if king_filter is not None and not king_filter:
- king_hash_ids = self.modules_files_duplicates.DuplicatesFilterKingHashIds( query_hash_ids )
+ king_hash_ids = self.modules_files_duplicates.FilterKingHashIds( query_hash_ids )
query_hash_ids.difference_update( king_hash_ids )
@@ -4130,17 +4135,17 @@ class DB( HydrusDB.HydrusDB ):
if only_do_zero:
- nonzero_hash_ids = self.modules_files_duplicates.DuplicatesGetHashIdsFromDuplicateCountPredicate( db_location_context, '>', 0, dupe_type )
+ nonzero_hash_ids = self.modules_files_duplicates.GetHashIdsFromDuplicateCountPredicate( db_location_context, '>', 0, dupe_type )
query_hash_ids.difference_update( nonzero_hash_ids )
elif include_zero:
- nonzero_hash_ids = self.modules_files_duplicates.DuplicatesGetHashIdsFromDuplicateCountPredicate( db_location_context, '>', 0, dupe_type )
+ nonzero_hash_ids = self.modules_files_duplicates.GetHashIdsFromDuplicateCountPredicate( db_location_context, '>', 0, dupe_type )
zero_hash_ids = query_hash_ids.difference( nonzero_hash_ids )
- accurate_except_zero_hash_ids = self.modules_files_duplicates.DuplicatesGetHashIdsFromDuplicateCountPredicate( db_location_context, operator, num_relationships, dupe_type )
+ accurate_except_zero_hash_ids = self.modules_files_duplicates.GetHashIdsFromDuplicateCountPredicate( db_location_context, operator, num_relationships, dupe_type )
hash_ids = zero_hash_ids.union( accurate_except_zero_hash_ids )
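The `include_zero` branch above is plain set algebra: files with zero relationships are the query files minus those with any relationship, and the final result unions that set with the accurately-counted one. A self-contained sketch with placeholder ids:

```python
# placeholder id sets, standing in for the hash id sets in the diff
query_hash_ids = { 1, 2, 3, 4, 5 }
nonzero_hash_ids = { 2, 3, 5 }              # files with '> 0' relationships of dupe_type
accurate_except_zero_hash_ids = { 3, 5 }    # files matching the operator/num_relationships predicate

# zero-relationship files are whatever the nonzero search did not touch
zero_hash_ids = query_hash_ids.difference( nonzero_hash_ids )

# final answer: the zero files plus the accurately-counted nonzero files
hash_ids = zero_hash_ids.union( accurate_except_zero_hash_ids )

assert zero_hash_ids == { 1, 4 }
assert hash_ids == { 1, 3, 4, 5 }
```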
@@ -6196,11 +6201,11 @@ class DB( HydrusDB.HydrusDB ):
return ( still_work_to_do, num_done )
- media_id = self.modules_files_duplicates.DuplicatesGetMediaId( hash_id )
+ media_id = self.modules_files_duplicates.GetMediaId( hash_id )
- potential_duplicate_media_ids_and_distances = [ ( self.modules_files_duplicates.DuplicatesGetMediaId( duplicate_hash_id ), distance ) for ( duplicate_hash_id, distance ) in self.modules_similar_files.Search( hash_id, search_distance ) if duplicate_hash_id != hash_id ]
+ potential_duplicate_media_ids_and_distances = [ ( self.modules_files_duplicates.GetMediaId( duplicate_hash_id ), distance ) for ( duplicate_hash_id, distance ) in self.modules_similar_files.Search( hash_id, search_distance ) if duplicate_hash_id != hash_id ]
- self.modules_files_duplicates.DuplicatesAddPotentialDuplicates( media_id, potential_duplicate_media_ids_and_distances )
+ self.modules_files_duplicates.AddPotentialDuplicates( media_id, potential_duplicate_media_ids_and_distances )
self._Execute( 'UPDATE shape_search_cache SET searched_distance = ? WHERE hash_id = ?;', ( search_distance, hash_id ) )
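The comprehension in this hunk maps each similar-file search hit to a `( media_id, distance )` pair while excluding the searched file itself. A sketch with dummy stand-ins for `modules_similar_files.Search` and `GetMediaId` (the names and return values here are illustrative, not hydrus code):

```python
def search( hash_id, max_distance ):
    # stand-in for modules_similar_files.Search: returns ( hash_id, distance ) hits,
    # which include the searched file itself at distance 0
    return [ ( hash_id, 0 ), ( 11, 2 ), ( 12, 4 ) ]

def get_media_id( duplicate_hash_id ):
    # stand-in for modules_files_duplicates.GetMediaId
    return duplicate_hash_id + 100

hash_id = 10
search_distance = 4

# same shape as the comprehension in the diff: skip self-pairs, keep the distance
potential_duplicate_media_ids_and_distances = [
    ( get_media_id( duplicate_hash_id ), distance )
    for ( duplicate_hash_id, distance ) in search( hash_id, search_distance )
    if duplicate_hash_id != hash_id
]

assert potential_duplicate_media_ids_and_distances == [ ( 111, 2 ), ( 112, 4 ) ]
```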
@@ -7320,8 +7325,8 @@ class DB( HydrusDB.HydrusDB ):
elif action == 'client_files_locations': result = self.modules_files_physical_storage.GetClientFilesLocations( *args, **kwargs )
elif action == 'deferred_physical_delete': result = self.modules_files_storage.GetDeferredPhysicalDelete( *args, **kwargs )
elif action == 'duplicate_pairs_for_filtering': result = self._DuplicatesGetPotentialDuplicatePairsForFiltering( *args, **kwargs )
- elif action == 'file_duplicate_hashes': result = self.modules_files_duplicates.DuplicatesGetFileHashesByDuplicateType( *args, **kwargs )
- elif action == 'file_duplicate_info': result = self.modules_files_duplicates.DuplicatesGetFileDuplicateInfo( *args, **kwargs )
+ elif action == 'file_duplicate_hashes': result = self.modules_files_duplicates.GetFileHashesByDuplicateType( *args, **kwargs )
+ elif action == 'file_duplicate_info': result = self.modules_files_duplicates.GetFileDuplicateInfo( *args, **kwargs )
elif action == 'file_hashes': result = self.modules_hashes.GetFileHashes( *args, **kwargs )
elif action == 'file_history': result = self.modules_files_metadata_rich.GetFileHistory( *args, **kwargs )
elif action == 'file_info_managers': result = self._GetFileInfoManagersFromHashes( *args, **kwargs )
@@ -7329,6 +7334,7 @@ class DB( HydrusDB.HydrusDB ):
elif action == 'file_maintenance_get_job': result = self.modules_files_maintenance_queue.GetJob( *args, **kwargs )
elif action == 'file_maintenance_get_job_counts': result = self.modules_files_maintenance_queue.GetJobCounts( *args, **kwargs )
elif action == 'file_query_ids': result = self._GetHashIdsFromQuery( *args, **kwargs )
+ elif action == 'file_relationships_for_api': result = self.modules_files_duplicates.GetFileRelationshipsForAPI( *args, **kwargs )
elif action == 'file_system_predicates': result = self._GetFileSystemPredicates( *args, **kwargs )
elif action == 'filter_existing_tags': result = self.modules_mappings_counts_update.FilterExistingTags( *args, **kwargs )
elif action == 'filter_hashes': result = self.modules_files_metadata_rich.FilterHashesByService( *args, **kwargs )
@@ -9776,7 +9782,7 @@ class DB( HydrusDB.HydrusDB ):
num_relationships = 0
dupe_type = HC.DUPLICATE_POTENTIAL
- dupe_hash_ids = self.modules_files_duplicates.DuplicatesGetHashIdsFromDuplicateCountPredicate( db_location_context, operator, num_relationships, dupe_type )
+ dupe_hash_ids = self.modules_files_duplicates.GetHashIdsFromDuplicateCountPredicate( db_location_context, operator, num_relationships, dupe_type )
with self._MakeTemporaryIntegerTable( dupe_hash_ids, 'hash_id' ) as temp_hash_ids_table_name:
@@ -11234,8 +11240,8 @@ class DB( HydrusDB.HydrusDB ):
elif action == 'associate_repository_update_hashes': self.modules_repositories.AssociateRepositoryUpdateHashes( *args, **kwargs )
elif action == 'backup': self._Backup( *args, **kwargs )
elif action == 'clear_deferred_physical_delete': self.modules_files_storage.ClearDeferredPhysicalDelete( *args, **kwargs )
- elif action == 'clear_false_positive_relations': self.modules_files_duplicates.DuplicatesClearAllFalsePositiveRelationsFromHashes( *args, **kwargs )
- elif action == 'clear_false_positive_relations_between_groups': self.modules_files_duplicates.DuplicatesClearFalsePositiveRelationsBetweenGroupsFromHashes( *args, **kwargs )
+ elif action == 'clear_false_positive_relations': self.modules_files_duplicates.ClearAllFalsePositiveRelationsFromHashes( *args, **kwargs )
+ elif action == 'clear_false_positive_relations_between_groups': self.modules_files_duplicates.ClearFalsePositiveRelationsBetweenGroupsFromHashes( *args, **kwargs )
elif action == 'clear_orphan_file_records': self._ClearOrphanFileRecords( *args, **kwargs )
elif action == 'clear_orphan_tables': self._ClearOrphanTables( *args, **kwargs )
elif action == 'content_updates': self._ProcessContentUpdates( *args, **kwargs )
@@ -11246,12 +11252,12 @@ class DB( HydrusDB.HydrusDB ):
elif action == 'delete_pending': self._DeletePending( *args, **kwargs )
elif action == 'delete_serialisable_named': self.modules_serialisable.DeleteJSONDumpNamed( *args, **kwargs )
elif action == 'delete_service_info': self._DeleteServiceInfo( *args, **kwargs )
- elif action == 'delete_potential_duplicate_pairs': self.modules_files_duplicates.DuplicatesDeleteAllPotentialDuplicatePairs( *args, **kwargs )
+ elif action == 'delete_potential_duplicate_pairs': self.modules_files_duplicates.DeleteAllPotentialDuplicatePairs( *args, **kwargs )
elif action == 'dirty_services': self._SaveDirtyServices( *args, **kwargs )
- elif action == 'dissolve_alternates_group': self.modules_files_duplicates.DuplicatesDissolveAlternatesGroupIdFromHashes( *args, **kwargs )
- elif action == 'dissolve_duplicates_group': self.modules_files_duplicates.DuplicatesDissolveMediaIdFromHashes( *args, **kwargs )
+ elif action == 'dissolve_alternates_group': self.modules_files_duplicates.DissolveAlternatesGroupIdFromHashes( *args, **kwargs )
+ elif action == 'dissolve_duplicates_group': self.modules_files_duplicates.DissolveMediaIdFromHashes( *args, **kwargs )
elif action == 'duplicate_pair_status': self._DuplicatesSetDuplicatePairStatus( *args, **kwargs )
- elif action == 'duplicate_set_king': self.modules_files_duplicates.DuplicatesSetKingFromHash( *args, **kwargs )
+ elif action == 'duplicate_set_king': self.modules_files_duplicates.SetKingFromHash( *args, **kwargs )
elif action == 'file_maintenance_add_jobs': self.modules_files_maintenance_queue.AddJobs( *args, **kwargs )
elif action == 'file_maintenance_add_jobs_hashes': self.modules_files_maintenance_queue.AddJobsHashes( *args, **kwargs )
elif action == 'file_maintenance_cancel_jobs': self.modules_files_maintenance_queue.CancelJobs( *args, **kwargs )
@@ -11286,9 +11292,9 @@ class DB( HydrusDB.HydrusDB ):
elif action == 'repopulate_tag_cache_missing_subtags': self._RepopulateTagCacheMissingSubtags( *args, **kwargs )
elif action == 'repopulate_tag_display_mappings_cache': self._RepopulateTagDisplayMappingsCache( *args, **kwargs )
elif action == 'relocate_client_files': self.modules_files_physical_storage.RelocateClientFiles( *args, **kwargs )
- elif action == 'remove_alternates_member': self.modules_files_duplicates.DuplicatesRemoveAlternateMemberFromHashes( *args, **kwargs )
- elif action == 'remove_duplicates_member': self.modules_files_duplicates.DuplicatesRemoveMediaIdMemberFromHashes( *args, **kwargs )
- elif action == 'remove_potential_pairs': self.modules_files_duplicates.DuplicatesRemovePotentialPairsFromHashes( *args, **kwargs )
+ elif action == 'remove_alternates_member': self.modules_files_duplicates.RemoveAlternateMemberFromHashes( *args, **kwargs )
+ elif action == 'remove_duplicates_member': self.modules_files_duplicates.RemoveMediaIdMemberFromHashes( *args, **kwargs )
+ elif action == 'remove_potential_pairs': self.modules_files_duplicates.RemovePotentialPairsFromHashes( *args, **kwargs )
elif action == 'repair_client_files': self.modules_files_physical_storage.RepairClientFiles( *args, **kwargs )
elif action == 'repair_invalid_tags': self._RepairInvalidTags( *args, **kwargs )
elif action == 'reprocess_repository': self.modules_repositories.ReprocessRepository( *args, **kwargs )
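The renames in the dispatch tables above all follow one pattern: since every method lives on `self.modules_files_duplicates`, the `Duplicates` prefix was redundant and is dropped. A minimal sketch of that action-string dispatch style (class and method names here are illustrative, not hydrus code):

```python
class FilesDuplicatesModule:
    """Stand-in for ClientDBFilesDuplicates; the module name carries the context."""

    def GetMediaId( self, hash_id, do_not_create = False ):
        # post-rename name; pre-rename this was DuplicatesGetMediaId
        return hash_id * 10  # dummy value for the sketch

class DB:

    def __init__( self ):
        self.modules_files_duplicates = FilesDuplicatesModule()

    def Read( self, action, *args, **kwargs ):
        # same if/elif dispatch shape as the _Read table in the diff
        if action == 'file_duplicate_media_id':
            return self.modules_files_duplicates.GetMediaId( *args, **kwargs )
        raise Exception( 'unknown read action: {}'.format( action ) )

db = DB()

assert db.Read( 'file_duplicate_media_id', 5 ) == 50
```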
diff --git a/hydrus/client/db/ClientDBFilesDuplicates.py b/hydrus/client/db/ClientDBFilesDuplicates.py
index 18c8a4e4..7bcd1188 100644
--- a/hydrus/client/db/ClientDBFilesDuplicates.py
+++ b/hydrus/client/db/ClientDBFilesDuplicates.py
@@ -5,11 +5,9 @@ import sqlite3
import typing
from hydrus.core import HydrusConstants as HC
-from hydrus.core import HydrusExceptions
from hydrus.client import ClientConstants as CC
from hydrus.client import ClientLocation
-from hydrus.client import ClientSearch
from hydrus.client.db import ClientDBDefinitionsCache
from hydrus.client.db import ClientDBFilesStorage
from hydrus.client.db import ClientDBModule
@@ -34,6 +32,137 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
self._service_ids_to_content_types_to_outstanding_local_processing = collections.defaultdict( dict )
+ def _GetFileHashIdsByDuplicateType( self, db_location_context: ClientDBFilesStorage.DBLocationContext, hash_id: int, duplicate_type: int, allowed_hash_ids = None, preferred_hash_ids = None ) -> typing.List[ int ]:
+
+ dupe_hash_ids = set()
+
+ if duplicate_type == HC.DUPLICATE_FALSE_POSITIVE:
+
+ media_id = self.GetMediaId( hash_id, do_not_create = True )
+
+ if media_id is not None:
+
+ alternates_group_id = self.GetAlternatesGroupId( media_id, do_not_create = True )
+
+ if alternates_group_id is not None:
+
+ false_positive_alternates_group_ids = self.GetFalsePositiveAlternatesGroupIds( alternates_group_id )
+
+ false_positive_alternates_group_ids.discard( alternates_group_id )
+
+ false_positive_media_ids = set()
+
+ for false_positive_alternates_group_id in false_positive_alternates_group_ids:
+
+ false_positive_media_ids.update( self.GetAlternateMediaIds( false_positive_alternates_group_id ) )
+
+
+ for false_positive_media_id in false_positive_media_ids:
+
+ best_king_hash_id = self.GetBestKingId( false_positive_media_id, db_location_context, allowed_hash_ids = allowed_hash_ids, preferred_hash_ids = preferred_hash_ids )
+
+ if best_king_hash_id is not None:
+
+ dupe_hash_ids.add( best_king_hash_id )
+
+
+
+
+
+ elif duplicate_type == HC.DUPLICATE_ALTERNATE:
+
+ media_id = self.GetMediaId( hash_id, do_not_create = True )
+
+ if media_id is not None:
+
+ alternates_group_id = self.GetAlternatesGroupId( media_id, do_not_create = True )
+
+ if alternates_group_id is not None:
+
+ alternates_media_ids = self._STS( self._Execute( 'SELECT media_id FROM alternate_file_group_members WHERE alternates_group_id = ?;', ( alternates_group_id, ) ) )
+
+ alternates_media_ids.discard( media_id )
+
+ for alternates_media_id in alternates_media_ids:
+
+ best_king_hash_id = self.GetBestKingId( alternates_media_id, db_location_context, allowed_hash_ids = allowed_hash_ids, preferred_hash_ids = preferred_hash_ids )
+
+ if best_king_hash_id is not None:
+
+ dupe_hash_ids.add( best_king_hash_id )
+
+
+
+
+
+ elif duplicate_type == HC.DUPLICATE_MEMBER:
+
+ media_id = self.GetMediaId( hash_id, do_not_create = True )
+
+ if media_id is not None:
+
+ media_hash_ids = self.GetDuplicateHashIds( media_id, db_location_context = db_location_context )
+
+ if allowed_hash_ids is not None:
+
+ media_hash_ids.intersection_update( allowed_hash_ids )
+
+
+ dupe_hash_ids.update( media_hash_ids )
+
+
+ elif duplicate_type == HC.DUPLICATE_KING:
+
+ media_id = self.GetMediaId( hash_id, do_not_create = True )
+
+ if media_id is not None:
+
+ best_king_hash_id = self.GetBestKingId( media_id, db_location_context, allowed_hash_ids = allowed_hash_ids, preferred_hash_ids = preferred_hash_ids )
+
+ if best_king_hash_id is not None:
+
+ dupe_hash_ids.add( best_king_hash_id )
+
+
+
+ elif duplicate_type == HC.DUPLICATE_POTENTIAL:
+
+ media_id = self.GetMediaId( hash_id, do_not_create = True )
+
+ if media_id is not None:
+
+ table_join = self.GetPotentialDuplicatePairsTableJoinOnFileService( db_location_context )
+
+ for ( smaller_media_id, larger_media_id ) in self._Execute( 'SELECT smaller_media_id, larger_media_id FROM {} WHERE smaller_media_id = ? OR larger_media_id = ?;'.format( table_join ), ( media_id, media_id ) ).fetchall():
+
+ if smaller_media_id != media_id:
+
+ potential_media_id = smaller_media_id
+
+ else:
+
+ potential_media_id = larger_media_id
+
+
+ best_king_hash_id = self.GetBestKingId( potential_media_id, db_location_context, allowed_hash_ids = allowed_hash_ids, preferred_hash_ids = preferred_hash_ids )
+
+ if best_king_hash_id is not None:
+
+ dupe_hash_ids.add( best_king_hash_id )
+
+
+
+
+
+ dupe_hash_ids.discard( hash_id )
+
+ dupe_hash_ids = list( dupe_hash_ids )
+
+ dupe_hash_ids.insert( 0, hash_id )
+
+ return dupe_hash_ids
+
+
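The tail of `_GetFileHashIdsByDuplicateType` establishes an ordering contract: the queried file's hash id always leads the returned list, followed by its relations with no repeats. A sketch of just that tail (function name is illustrative):

```python
def order_result( hash_id, dupe_hash_ids ):
    # mirrors the last lines of _GetFileHashIdsByDuplicateType:
    # dedupe, never repeat the queried file, then put it first
    result = set( dupe_hash_ids )
    result.discard( hash_id )

    ordered = list( result )
    ordered.insert( 0, hash_id )

    return ordered

out = order_result( 7, { 7, 3, 9 } )

assert out[ 0 ] == 7            # the queried file leads
assert set( out[ 1: ] ) == { 3, 9 }  # relations follow, without the queried file
```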
def _GetInitialIndexGenerationDict( self ) -> dict:
index_generation_dict = {}
@@ -62,7 +191,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
}
- def DuplicatesAddPotentialDuplicates( self, media_id, potential_duplicate_media_ids_and_distances ):
+ def AddPotentialDuplicates( self, media_id, potential_duplicate_media_ids_and_distances ):
inserts = []
@@ -73,12 +202,12 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
continue
- if self.DuplicatesMediasAreFalsePositive( media_id, potential_duplicate_media_id ):
+ if self.MediasAreFalsePositive( media_id, potential_duplicate_media_id ):
continue
- if self.DuplicatesMediasAreConfirmedAlternates( media_id, potential_duplicate_media_id ):
+ if self.MediasAreConfirmedAlternates( media_id, potential_duplicate_media_id ):
continue
@@ -98,7 +227,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
- def DuplicatesAlternatesGroupsAreFalsePositive( self, alternates_group_id_a, alternates_group_id_b ):
+ def AlternatesGroupsAreFalsePositive( self, alternates_group_id_a, alternates_group_id_b ):
if alternates_group_id_a == alternates_group_id_b:
@@ -115,38 +244,38 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return false_positive_pair_found
- def DuplicatesClearAllFalsePositiveRelations( self, alternates_group_id ):
+ def ClearAllFalsePositiveRelations( self, alternates_group_id ):
self._Execute( 'DELETE FROM duplicate_false_positives WHERE smaller_alternates_group_id = ? OR larger_alternates_group_id = ?;', ( alternates_group_id, alternates_group_id ) )
- media_ids = self.DuplicatesGetAlternateMediaIds( alternates_group_id )
+ media_ids = self.GetAlternateMediaIds( alternates_group_id )
- hash_ids = self.DuplicatesGetDuplicatesHashIds( media_ids )
+ hash_ids = self.GetDuplicatesHashIds( media_ids )
self.modules_similar_files.ResetSearch( hash_ids )
- def DuplicatesClearAllFalsePositiveRelationsFromHashes( self, hashes ):
+ def ClearAllFalsePositiveRelationsFromHashes( self, hashes ):
hash_ids = self.modules_hashes_local_cache.GetHashIds( hashes )
for hash_id in hash_ids:
- media_id = self.DuplicatesGetMediaId( hash_id, do_not_create = True )
+ media_id = self.GetMediaId( hash_id, do_not_create = True )
if media_id is not None:
- alternates_group_id = self.DuplicatesGetAlternatesGroupId( media_id, do_not_create = True )
+ alternates_group_id = self.GetAlternatesGroupId( media_id, do_not_create = True )
if alternates_group_id is not None:
- self.DuplicatesClearAllFalsePositiveRelations( alternates_group_id )
+ self.ClearAllFalsePositiveRelations( alternates_group_id )
- def DuplicatesClearFalsePositiveRelationsBetweenGroups( self, alternates_group_ids ):
+ def ClearFalsePositiveRelationsBetweenGroups( self, alternates_group_ids ):
pairs = list( itertools.combinations( alternates_group_ids, 2 ) )
@@ -160,25 +289,25 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
for alternates_group_id in alternates_group_ids:
- media_ids = self.DuplicatesGetAlternateMediaIds( alternates_group_id )
+ media_ids = self.GetAlternateMediaIds( alternates_group_id )
- hash_ids = self.DuplicatesGetDuplicatesHashIds( media_ids )
+ hash_ids = self.GetDuplicatesHashIds( media_ids )
self.modules_similar_files.ResetSearch( hash_ids )
- def DuplicatesClearFalsePositiveRelationsBetweenGroupsFromHashes( self, hashes ):
+ def ClearFalsePositiveRelationsBetweenGroupsFromHashes( self, hashes ):
alternates_group_ids = set()
hash_id = self.modules_hashes_local_cache.GetHashId( hash )
- media_id = self.DuplicatesGetMediaId( hash_id, do_not_create = True )
+ media_id = self.GetMediaId( hash_id, do_not_create = True )
if media_id is not None:
- alternates_group_id = self.DuplicatesGetAlternatesGroupId( media_id, do_not_create = True )
+ alternates_group_id = self.GetAlternatesGroupId( media_id, do_not_create = True )
if alternates_group_id is not None:
@@ -188,11 +317,11 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
if len( alternates_group_ids ) > 1:
- self.DuplicatesClearFalsePositiveRelationsBetweenGroups( alternates_group_ids )
+ self.ClearFalsePositiveRelationsBetweenGroups( alternates_group_ids )
- def DuplicatesClearPotentialsBetweenMedias( self, media_ids_a, media_ids_b ):
+ def ClearPotentialsBetweenMedias( self, media_ids_a, media_ids_b ):
# these two groups of medias now have a false positive or alternates relationship set between them, or they are about to be merged
# therefore, potentials between them are no longer needed
@@ -228,17 +357,17 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
- def DuplicatesClearPotentialsBetweenAlternatesGroups( self, alternates_group_id_a, alternates_group_id_b ):
+ def ClearPotentialsBetweenAlternatesGroups( self, alternates_group_id_a, alternates_group_id_b ):
# these groups are being set as false positive. therefore, any potential between them no longer applies
- media_ids_a = self.DuplicatesGetAlternateMediaIds( alternates_group_id_a )
- media_ids_b = self.DuplicatesGetAlternateMediaIds( alternates_group_id_b )
+ media_ids_a = self.GetAlternateMediaIds( alternates_group_id_a )
+ media_ids_b = self.GetAlternateMediaIds( alternates_group_id_b )
- self.DuplicatesClearPotentialsBetweenMedias( media_ids_a, media_ids_b )
+ self.ClearPotentialsBetweenMedias( media_ids_a, media_ids_b )
- def DuplicatesDeleteAllPotentialDuplicatePairs( self ):
+ def DeleteAllPotentialDuplicatePairs( self ):
media_ids = set()
@@ -248,50 +377,50 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
media_ids.add( larger_media_id )
- hash_ids = self.DuplicatesGetDuplicatesHashIds( media_ids )
+ hash_ids = self.GetDuplicatesHashIds( media_ids )
self._Execute( 'DELETE FROM potential_duplicate_pairs;' )
self.modules_similar_files.ResetSearch( hash_ids )
- def DuplicatesDissolveAlternatesGroupId( self, alternates_group_id ):
+ def DissolveAlternatesGroupId( self, alternates_group_id ):
- media_ids = self.DuplicatesGetAlternateMediaIds( alternates_group_id )
+ media_ids = self.GetAlternateMediaIds( alternates_group_id )
for media_id in media_ids:
- self.DuplicatesDissolveMediaId( media_id )
+ self.DissolveMediaId( media_id )
- def DuplicatesDissolveAlternatesGroupIdFromHashes( self, hashes ):
+ def DissolveAlternatesGroupIdFromHashes( self, hashes ):
hash_ids = self.modules_hashes_local_cache.GetHashIds( hashes )
for hash_id in hash_ids:
- media_id = self.DuplicatesGetMediaId( hash_id, do_not_create = True )
+ media_id = self.GetMediaId( hash_id, do_not_create = True )
if media_id is not None:
- alternates_group_id = self.DuplicatesGetAlternatesGroupId( media_id, do_not_create = True )
+ alternates_group_id = self.GetAlternatesGroupId( media_id, do_not_create = True )
if alternates_group_id is not None:
- self.DuplicatesDissolveAlternatesGroupId( alternates_group_id )
+ self.DissolveAlternatesGroupId( alternates_group_id )
- def DuplicatesDissolveMediaId( self, media_id ):
+ def DissolveMediaId( self, media_id ):
- self.DuplicatesRemoveAlternateMember( media_id )
+ self.RemoveAlternateMember( media_id )
self._Execute( 'DELETE FROM potential_duplicate_pairs WHERE smaller_media_id = ? OR larger_media_id = ?;', ( media_id, media_id ) )
- hash_ids = self.DuplicatesGetDuplicateHashIds( media_id )
+ hash_ids = self.GetDuplicateHashIds( media_id )
self._Execute( 'DELETE FROM duplicate_file_members WHERE media_id = ?;', ( media_id, ) )
self._Execute( 'DELETE FROM duplicate_files WHERE media_id = ?;', ( media_id, ) )
@@ -299,22 +428,22 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
self.modules_similar_files.ResetSearch( hash_ids )
- def DuplicatesDissolveMediaIdFromHashes( self, hashes ):
+ def DissolveMediaIdFromHashes( self, hashes ):
hash_ids = self.modules_hashes_local_cache.GetHashIds( hashes )
for hash_id in hash_ids:
- media_id = self.DuplicatesGetMediaId( hash_id, do_not_create = True )
+ media_id = self.GetMediaId( hash_id, do_not_create = True )
if media_id is not None:
- self.DuplicatesDissolveMediaId( media_id )
+ self.DissolveMediaId( media_id )
- def DuplicatesFilterKingHashIds( self, allowed_hash_ids ):
+ def FilterKingHashIds( self, allowed_hash_ids ):
# can't just pull explicit king_hash_ids, since files that do not have a media_id are still kings
# kings = hashes - explicitly not kings
@@ -336,7 +465,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return allowed_hash_ids.difference( all_non_king_hash_ids )
- def DuplicatesFilterMediaIdPairs( self, db_location_context: ClientDBFilesStorage.DBLocationContext, media_id_pairs ):
+ def FilterMediaIdPairs( self, db_location_context: ClientDBFilesStorage.DBLocationContext, media_id_pairs ):
if len( media_id_pairs ) == 0:
@@ -363,7 +492,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return good_media_id_pairs
- def DuplicatesGetAlternatesGroupId( self, media_id, do_not_create = False ):
+ def GetAlternatesGroupId( self, media_id, do_not_create = False ):
result = self._Execute( 'SELECT alternates_group_id FROM alternate_file_group_members WHERE media_id = ?;', ( media_id, ) ).fetchone()
@@ -388,16 +517,16 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return alternates_group_id
- def DuplicatesGetAlternateMediaIds( self, alternates_group_id ):
+ def GetAlternateMediaIds( self, alternates_group_id ):
media_ids = self._STS( self._Execute( 'SELECT media_id FROM alternate_file_group_members WHERE alternates_group_id = ?;', ( alternates_group_id, ) ) )
return media_ids
- def DuplicatesGetBestKingId( self, media_id, db_location_context: ClientDBFilesStorage.DBLocationContext, allowed_hash_ids = None, preferred_hash_ids = None ):
+ def GetBestKingId( self, media_id, db_location_context: ClientDBFilesStorage.DBLocationContext, allowed_hash_ids = None, preferred_hash_ids = None ):
- media_hash_ids = self.DuplicatesGetDuplicateHashIds( media_id, db_location_context = db_location_context )
+ media_hash_ids = self.GetDuplicateHashIds( media_id, db_location_context = db_location_context )
if allowed_hash_ids is not None:
@@ -406,7 +535,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
if len( media_hash_ids ) > 0:
- king_hash_id = self.DuplicatesGetKingHashId( media_id )
+ king_hash_id = self.GetKingHashId( media_id )
if preferred_hash_ids is not None:
@@ -434,7 +563,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return None
- def DuplicatesGetDuplicateHashIds( self, media_id, db_location_context: ClientDBFilesStorage.DBLocationContext = None ):
+ def GetDuplicateHashIds( self, media_id, db_location_context: ClientDBFilesStorage.DBLocationContext = None ):
table_join = 'duplicate_file_members'
@@ -457,7 +586,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return hash_ids
- def DuplicatesGetDuplicatesHashIds( self, media_ids, db_location_context: ClientDBFilesStorage.DBLocationContext = None ):
+ def GetDuplicatesHashIds( self, media_ids, db_location_context: ClientDBFilesStorage.DBLocationContext = None ):
with self._MakeTemporaryIntegerTable( media_ids, 'media_id' ) as temp_media_ids_table_name:
@@ -474,7 +603,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return hash_ids
- def DuplicatesGetFalsePositiveAlternatesGroupIds( self, alternates_group_id ):
+ def GetFalsePositiveAlternatesGroupIds( self, alternates_group_id ):
false_positive_alternates_group_ids = set()
@@ -489,7 +618,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return false_positive_alternates_group_ids
- def DuplicatesGetFileDuplicateInfo( self, location_context, hash ):
+ def GetFileDuplicateInfo( self, location_context, hash ):
result_dict = {}
@@ -499,7 +628,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
counter = collections.Counter()
- media_id = self.DuplicatesGetMediaId( hash_id, do_not_create = True )
+ media_id = self.GetMediaId( hash_id, do_not_create = True )
if media_id is not None:
@@ -507,18 +636,18 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
all_potential_pairs = self._Execute( 'SELECT DISTINCT smaller_media_id, larger_media_id FROM potential_duplicate_pairs WHERE smaller_media_id = ? OR larger_media_id = ?;', ( media_id, media_id, ) ).fetchall()
- potential_pairs = self.DuplicatesFilterMediaIdPairs( db_location_context, all_potential_pairs )
+ potential_pairs = self.FilterMediaIdPairs( db_location_context, all_potential_pairs )
if len( potential_pairs ) > 0:
counter[ HC.DUPLICATE_POTENTIAL ] = len( potential_pairs )
- king_hash_id = self.DuplicatesGetKingHashId( media_id )
+ king_hash_id = self.GetKingHashId( media_id )
result_dict[ 'is_king' ] = king_hash_id == hash_id
- media_hash_ids = self.DuplicatesGetDuplicateHashIds( media_id, db_location_context = db_location_context )
+ media_hash_ids = self.GetDuplicateHashIds( media_id, db_location_context = db_location_context )
num_other_dupe_members = len( media_hash_ids ) - 1
@@ -527,17 +656,17 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
counter[ HC.DUPLICATE_MEMBER ] = num_other_dupe_members
- alternates_group_id = self.DuplicatesGetAlternatesGroupId( media_id, do_not_create = True )
+ alternates_group_id = self.GetAlternatesGroupId( media_id, do_not_create = True )
if alternates_group_id is not None:
- alt_media_ids = self.DuplicatesGetAlternateMediaIds( alternates_group_id )
+ alt_media_ids = self.GetAlternateMediaIds( alternates_group_id )
alt_media_ids.discard( media_id )
for alt_media_id in alt_media_ids:
- alt_hash_ids = self.DuplicatesGetDuplicateHashIds( alt_media_id, db_location_context = db_location_context )
+ alt_hash_ids = self.GetDuplicateHashIds( alt_media_id, db_location_context = db_location_context )
if len( alt_hash_ids ) > 0:
@@ -555,17 +684,17 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
- false_positive_alternates_group_ids = self.DuplicatesGetFalsePositiveAlternatesGroupIds( alternates_group_id )
+ false_positive_alternates_group_ids = self.GetFalsePositiveAlternatesGroupIds( alternates_group_id )
false_positive_alternates_group_ids.discard( alternates_group_id )
for false_positive_alternates_group_id in false_positive_alternates_group_ids:
- fp_media_ids = self.DuplicatesGetAlternateMediaIds( false_positive_alternates_group_id )
+ fp_media_ids = self.GetAlternateMediaIds( false_positive_alternates_group_id )
for fp_media_id in fp_media_ids:
- fp_hash_ids = self.DuplicatesGetDuplicateHashIds( fp_media_id, db_location_context = db_location_context )
+ fp_hash_ids = self.GetDuplicateHashIds( fp_media_id, db_location_context = db_location_context )
if len( fp_hash_ids ) > 0:
@@ -581,144 +710,92 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return result_dict
- def DuplicatesGetFileHashesByDuplicateType( self, location_context: ClientLocation.LocationContext, hash: bytes, duplicate_type: int, allowed_hash_ids = None, preferred_hash_ids = None ) -> typing.List[ bytes ]:
+ def GetFileRelationshipsForAPI( self, location_context: ClientLocation.LocationContext, hashes: typing.Collection[ bytes ] ):
+
+ hashes_to_file_relationships = {}
+
+ db_location_context = self.modules_files_storage.GetDBLocationContext( location_context )
+
+ duplicate_types_to_fetch = (
+ HC.DUPLICATE_POTENTIAL,
+ HC.DUPLICATE_MEMBER,
+ HC.DUPLICATE_FALSE_POSITIVE,
+ HC.DUPLICATE_ALTERNATE
+ )
+
+ for hash in hashes:
+
+ file_relationships_dict = {}
+
+ hash_id = self.modules_hashes_local_cache.GetHashId( hash )
+
+ media_id = self.GetMediaId( hash_id, do_not_create = True )
+
+ if media_id is None:
+
+ file_relationships_dict[ 'is_king' ] = True
+ file_relationships_dict[ 'king' ] = hash.hex()
+
+ for duplicate_type in duplicate_types_to_fetch:
+
+ file_relationships_dict[ str( duplicate_type ) ] = []
+
+
+ else:
+
+ king_hash_id = self.GetBestKingId( media_id, db_location_context )
+
+ if king_hash_id is None:
+
+ file_relationships_dict[ 'is_king' ] = False
+ file_relationships_dict[ 'king' ] = None
+
+ elif king_hash_id == hash_id:
+
+ file_relationships_dict[ 'is_king' ] = True
+ file_relationships_dict[ 'king' ] = hash.hex()
+
+ else:
+
+ file_relationships_dict[ 'is_king' ] = False
+ file_relationships_dict[ 'king' ] = self.modules_hashes_local_cache.GetHash( king_hash_id ).hex()
+
+
+ for duplicate_type in duplicate_types_to_fetch:
+
+ dupe_hash_ids = list( self._GetFileHashIdsByDuplicateType( db_location_context, hash_id, duplicate_type ) )
+
+ dupe_hash_ids.sort()
+
+ if hash_id in dupe_hash_ids:
+
+ dupe_hash_ids.remove( hash_id )
+
+
+ file_relationships_dict[ str( duplicate_type ) ] = [ h.hex() for h in self.modules_hashes_local_cache.GetHashes( dupe_hash_ids ) ]
+
+
+
+ hashes_to_file_relationships[ hash.hex() ] = file_relationships_dict
+
+
+ return hashes_to_file_relationships
+
+
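`GetFileRelationshipsForAPI` returns a dict keyed by hash hex, with `is_king`, `king`, and one list per stringified duplicate-type constant. A sketch of the `media_id is None` branch's shape (the numeric values below are assumed stand-ins for the `HC` constants, and the helper name is illustrative):

```python
DUPLICATE_TYPES_TO_FETCH = ( 0, 1, 3, 8 )  # assumed values of the HC duplicate-type constants

def empty_relationships( hash_hex ):
    # mirrors the media_id-is-None branch above: a file with no duplicate
    # group is its own king and has no relationships of any type
    file_relationships_dict = { 'is_king': True, 'king': hash_hex }

    for duplicate_type in DUPLICATE_TYPES_TO_FETCH:
        file_relationships_dict[ str( duplicate_type ) ] = []

    return file_relationships_dict

rel = empty_relationships( 'ab' * 32 )

assert rel[ 'is_king' ] is True
assert rel[ 'king' ] == 'ab' * 32
assert rel[ '8' ] == []
```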
+ def GetFileHashesByDuplicateType( self, location_context: ClientLocation.LocationContext, hash: bytes, duplicate_type: int, allowed_hash_ids = None, preferred_hash_ids = None ) -> typing.List[ bytes ]:
hash_id = self.modules_hashes_local_cache.GetHashId( hash )
db_location_context = self.modules_files_storage.GetDBLocationContext( location_context )
- dupe_hash_ids = set()
-
- if duplicate_type == HC.DUPLICATE_FALSE_POSITIVE:
-
- media_id = self.DuplicatesGetMediaId( hash_id, do_not_create = True )
-
- if media_id is not None:
-
- alternates_group_id = self.DuplicatesGetAlternatesGroupId( media_id, do_not_create = True )
-
- if alternates_group_id is not None:
-
- false_positive_alternates_group_ids = self.DuplicatesGetFalsePositiveAlternatesGroupIds( alternates_group_id )
-
- false_positive_alternates_group_ids.discard( alternates_group_id )
-
- false_positive_media_ids = set()
-
- for false_positive_alternates_group_id in false_positive_alternates_group_ids:
-
- false_positive_media_ids.update( self.DuplicatesGetAlternateMediaIds( false_positive_alternates_group_id ) )
-
-
- for false_positive_media_id in false_positive_media_ids:
-
- best_king_hash_id = self.DuplicatesGetBestKingId( false_positive_media_id, db_location_context, allowed_hash_ids = allowed_hash_ids, preferred_hash_ids = preferred_hash_ids )
-
- if best_king_hash_id is not None:
-
- dupe_hash_ids.add( best_king_hash_id )
-
-
-
-
-
- elif duplicate_type == HC.DUPLICATE_ALTERNATE:
-
- media_id = self.DuplicatesGetMediaId( hash_id, do_not_create = True )
-
- if media_id is not None:
-
- alternates_group_id = self.DuplicatesGetAlternatesGroupId( media_id, do_not_create = True )
-
- if alternates_group_id is not None:
-
- alternates_media_ids = self._STS( self._Execute( 'SELECT media_id FROM alternate_file_group_members WHERE alternates_group_id = ?;', ( alternates_group_id, ) ) )
-
- alternates_media_ids.discard( media_id )
-
- for alternates_media_id in alternates_media_ids:
-
- best_king_hash_id = self.DuplicatesGetBestKingId( alternates_media_id, db_location_context, allowed_hash_ids = allowed_hash_ids, preferred_hash_ids = preferred_hash_ids )
-
- if best_king_hash_id is not None:
-
- dupe_hash_ids.add( best_king_hash_id )
-
-
-
-
-
- elif duplicate_type == HC.DUPLICATE_MEMBER:
-
- media_id = self.DuplicatesGetMediaId( hash_id, do_not_create = True )
-
- if media_id is not None:
-
- media_hash_ids = self.DuplicatesGetDuplicateHashIds( media_id, db_location_context = db_location_context )
-
- if allowed_hash_ids is not None:
-
- media_hash_ids.intersection_update( allowed_hash_ids )
-
-
- dupe_hash_ids.update( media_hash_ids )
-
-
- elif duplicate_type == HC.DUPLICATE_KING:
-
- media_id = self.DuplicatesGetMediaId( hash_id, do_not_create = True )
-
- if media_id is not None:
-
- best_king_hash_id = self.DuplicatesGetBestKingId( media_id, db_location_context, allowed_hash_ids = allowed_hash_ids, preferred_hash_ids = preferred_hash_ids )
-
- if best_king_hash_id is not None:
-
- dupe_hash_ids.add( best_king_hash_id )
-
-
-
- elif duplicate_type == HC.DUPLICATE_POTENTIAL:
-
- media_id = self.DuplicatesGetMediaId( hash_id, do_not_create = True )
-
- if media_id is not None:
-
- table_join = self.DuplicatesGetPotentialDuplicatePairsTableJoinOnFileService( db_location_context )
-
- for ( smaller_media_id, larger_media_id ) in self._Execute( 'SELECT smaller_media_id, larger_media_id FROM {} WHERE smaller_media_id = ? OR larger_media_id = ?;'.format( table_join ), ( media_id, media_id ) ).fetchall():
-
- if smaller_media_id != media_id:
-
- potential_media_id = smaller_media_id
-
- else:
-
- potential_media_id = larger_media_id
-
-
- best_king_hash_id = self.DuplicatesGetBestKingId( potential_media_id, db_location_context, allowed_hash_ids = allowed_hash_ids, preferred_hash_ids = preferred_hash_ids )
-
- if best_king_hash_id is not None:
-
- dupe_hash_ids.add( best_king_hash_id )
-
-
-
-
-
- dupe_hash_ids.discard( hash_id )
-
- dupe_hash_ids = list( dupe_hash_ids )
-
- dupe_hash_ids.insert( 0, hash_id )
+ dupe_hash_ids = self._GetFileHashIdsByDuplicateType( db_location_context, hash_id, duplicate_type, allowed_hash_ids = allowed_hash_ids, preferred_hash_ids = preferred_hash_ids )
dupe_hashes = self.modules_hashes_local_cache.GetHashes( dupe_hash_ids )
return dupe_hashes
- def DuplicatesGetHashIdsFromDuplicateCountPredicate( self, db_location_context: ClientDBFilesStorage.DBLocationContext, operator, num_relationships, dupe_type ):
+ def GetHashIdsFromDuplicateCountPredicate( self, db_location_context: ClientDBFilesStorage.DBLocationContext, operator, num_relationships, dupe_type ):
# doesn't work for '= 0' or '< 1', since files with no relationships have no rows here to count
@@ -779,11 +856,11 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
valid = False
- fp_media_ids = self.DuplicatesGetAlternateMediaIds( false_positive_alternates_group_id )
+ fp_media_ids = self.GetAlternateMediaIds( false_positive_alternates_group_id )
for fp_media_id in fp_media_ids:
- fp_hash_ids = self.DuplicatesGetDuplicateHashIds( fp_media_id, db_location_context = db_location_context )
+ fp_hash_ids = self.GetDuplicateHashIds( fp_media_id, db_location_context = db_location_context )
if len( fp_hash_ids ) > 0:
@@ -804,9 +881,9 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
if filter_func( count ):
- media_ids = self.DuplicatesGetAlternateMediaIds( alternates_group_id )
+ media_ids = self.GetAlternateMediaIds( alternates_group_id )
- hash_ids = self.DuplicatesGetDuplicatesHashIds( media_ids, db_location_context = db_location_context )
+ hash_ids = self.GetDuplicatesHashIds( media_ids, db_location_context = db_location_context )
@@ -820,13 +897,13 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
count -= 1 # num relationships is number group members - 1
- media_ids = self.DuplicatesGetAlternateMediaIds( alternates_group_id )
+ media_ids = self.GetAlternateMediaIds( alternates_group_id )
alternates_group_id_hash_ids = []
for media_id in media_ids:
- media_id_hash_ids = self.DuplicatesGetDuplicateHashIds( media_id, db_location_context = db_location_context )
+ media_id_hash_ids = self.GetDuplicateHashIds( media_id, db_location_context = db_location_context )
if len( media_id_hash_ids ) == 0:
@@ -863,11 +940,11 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
- hash_ids = self.DuplicatesGetDuplicatesHashIds( media_ids, db_location_context = db_location_context )
+ hash_ids = self.GetDuplicatesHashIds( media_ids, db_location_context = db_location_context )
elif dupe_type == HC.DUPLICATE_POTENTIAL:
- table_join = self.DuplicatesGetPotentialDuplicatePairsTableJoinOnFileService( db_location_context )
+ table_join = self.GetPotentialDuplicatePairsTableJoinOnFileService( db_location_context )
smaller_query = 'SELECT smaller_media_id, COUNT( * ) FROM ( SELECT DISTINCT smaller_media_id, larger_media_id FROM {} ) GROUP BY smaller_media_id;'.format( table_join )
larger_query = 'SELECT larger_media_id, COUNT( * ) FROM ( SELECT DISTINCT smaller_media_id, larger_media_id FROM {} ) GROUP BY larger_media_id;'.format( table_join )
@@ -886,20 +963,20 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
media_ids = [ media_id for ( media_id, count ) in media_ids_to_counts.items() if filter_func( count ) ]
- hash_ids = self.DuplicatesGetDuplicatesHashIds( media_ids, db_location_context = db_location_context )
+ hash_ids = self.GetDuplicatesHashIds( media_ids, db_location_context = db_location_context )
return hash_ids
- def DuplicatesGetKingHashId( self, media_id ):
+ def GetKingHashId( self, media_id ):
( king_hash_id, ) = self._Execute( 'SELECT king_hash_id FROM duplicate_files WHERE media_id = ?;', ( media_id, ) ).fetchone()
return king_hash_id
- def DuplicatesGetMediaId( self, hash_id, do_not_create = False ):
+ def GetMediaId( self, hash_id, do_not_create = False ):
result = self._Execute( 'SELECT media_id FROM duplicate_file_members WHERE hash_id = ?;', ( hash_id, ) ).fetchone()
@@ -924,7 +1001,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return media_id
- def DuplicatesGetPotentialDuplicatePairsTableJoinGetInitialTablesAndPreds( self, pixel_dupes_preference: int, max_hamming_distance: int ):
+ def GetPotentialDuplicatePairsTableJoinGetInitialTablesAndPreds( self, pixel_dupes_preference: int, max_hamming_distance: int ):
tables = [
'potential_duplicate_pairs',
@@ -965,9 +1042,9 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return ( tables, join_predicates )
- def DuplicatesGetPotentialDuplicatePairsTableJoinOnEverythingSearchResults( self, db_location_context: ClientDBFilesStorage.DBLocationContext, pixel_dupes_preference: int, max_hamming_distance: int ):
+ def GetPotentialDuplicatePairsTableJoinOnEverythingSearchResults( self, db_location_context: ClientDBFilesStorage.DBLocationContext, pixel_dupes_preference: int, max_hamming_distance: int ):
- ( tables, join_predicates ) = self.DuplicatesGetPotentialDuplicatePairsTableJoinGetInitialTablesAndPreds( pixel_dupes_preference, max_hamming_distance )
+ ( tables, join_predicates ) = self.GetPotentialDuplicatePairsTableJoinGetInitialTablesAndPreds( pixel_dupes_preference, max_hamming_distance )
if not db_location_context.location_context.IsAllKnownFiles():
@@ -986,7 +1063,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return table_join
- def DuplicatesGetPotentialDuplicatePairsTableJoinOnFileService( self, db_location_context: ClientDBFilesStorage.DBLocationContext ):
+ def GetPotentialDuplicatePairsTableJoinOnFileService( self, db_location_context: ClientDBFilesStorage.DBLocationContext ):
if db_location_context.location_context.IsAllKnownFiles():
@@ -1002,9 +1079,9 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return table_join
- def DuplicatesGetPotentialDuplicatePairsTableJoinOnSearchResultsBothFiles( self, results_table_name: str, pixel_dupes_preference: int, max_hamming_distance: int ):
+ def GetPotentialDuplicatePairsTableJoinOnSearchResultsBothFiles( self, results_table_name: str, pixel_dupes_preference: int, max_hamming_distance: int ):
- ( tables, join_predicates ) = self.DuplicatesGetPotentialDuplicatePairsTableJoinGetInitialTablesAndPreds( pixel_dupes_preference, max_hamming_distance )
+ ( tables, join_predicates ) = self.GetPotentialDuplicatePairsTableJoinGetInitialTablesAndPreds( pixel_dupes_preference, max_hamming_distance )
tables.extend( [
'{} AS results_smaller'.format( results_table_name ),
@@ -1018,7 +1095,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return table_join
- def DuplicatesGetPotentialDuplicatePairsTableJoinOnSearchResults( self, db_location_context: ClientDBFilesStorage.DBLocationContext, results_table_name: str, pixel_dupes_preference: int, max_hamming_distance: int ):
+ def GetPotentialDuplicatePairsTableJoinOnSearchResults( self, db_location_context: ClientDBFilesStorage.DBLocationContext, results_table_name: str, pixel_dupes_preference: int, max_hamming_distance: int ):
# why yes this is a seven table join that involves a mix of duplicated tables, temporary tables, and duplicated temporary tables
#
@@ -1069,7 +1146,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
# ████████████████████████████████████████████████████████████████████████
#
- ( tables, join_predicates ) = self.DuplicatesGetPotentialDuplicatePairsTableJoinGetInitialTablesAndPreds( pixel_dupes_preference, max_hamming_distance )
+ ( tables, join_predicates ) = self.GetPotentialDuplicatePairsTableJoinGetInitialTablesAndPreds( pixel_dupes_preference, max_hamming_distance )
if db_location_context.location_context.IsAllKnownFiles():
@@ -1098,13 +1175,13 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return table_join
- def DuplicatesGetPotentialDuplicatePairsTableJoinOnSeparateSearchResults( self, results_table_name_1: str, results_table_name_2: str, pixel_dupes_preference: int, max_hamming_distance: int ):
+ def GetPotentialDuplicatePairsTableJoinOnSeparateSearchResults( self, results_table_name_1: str, results_table_name_2: str, pixel_dupes_preference: int, max_hamming_distance: int ):
#
# And taking the above to its logical conclusion with two results sets, one file in xor either
#
- ( tables, join_predicates ) = self.DuplicatesGetPotentialDuplicatePairsTableJoinGetInitialTablesAndPreds( pixel_dupes_preference, max_hamming_distance )
+ ( tables, join_predicates ) = self.GetPotentialDuplicatePairsTableJoinGetInitialTablesAndPreds( pixel_dupes_preference, max_hamming_distance )
# we don't have to do any db_location_context jibber-jabber here as long as we stipulate that the two results sets have the same location context, which we'll enforce in UI
# just like above when 'both files match', we know we are db_location_context cross-referenced since we are intersecting with file searches performed on that search domain
@@ -1125,16 +1202,16 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return table_join
- def DuplicatesMediasAreAlternates( self, media_id_a, media_id_b ):
+ def MediasAreAlternates( self, media_id_a, media_id_b ):
- alternates_group_id_a = self.DuplicatesGetAlternatesGroupId( media_id_a, do_not_create = True )
+ alternates_group_id_a = self.GetAlternatesGroupId( media_id_a, do_not_create = True )
if alternates_group_id_a is None:
return False
- alternates_group_id_b = self.DuplicatesGetAlternatesGroupId( media_id_b, do_not_create = True )
+ alternates_group_id_b = self.GetAlternatesGroupId( media_id_b, do_not_create = True )
if alternates_group_id_b is None:
@@ -1144,7 +1221,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return alternates_group_id_a == alternates_group_id_b
- def DuplicatesMediasAreConfirmedAlternates( self, media_id_a, media_id_b ):
+ def MediasAreConfirmedAlternates( self, media_id_a, media_id_b ):
smaller_media_id = min( media_id_a, media_id_b )
larger_media_id = max( media_id_a, media_id_b )
@@ -1154,40 +1231,40 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
return result is not None
- def DuplicatesMediasAreFalsePositive( self, media_id_a, media_id_b ):
+ def MediasAreFalsePositive( self, media_id_a, media_id_b ):
- alternates_group_id_a = self.DuplicatesGetAlternatesGroupId( media_id_a, do_not_create = True )
+ alternates_group_id_a = self.GetAlternatesGroupId( media_id_a, do_not_create = True )
if alternates_group_id_a is None:
return False
- alternates_group_id_b = self.DuplicatesGetAlternatesGroupId( media_id_b, do_not_create = True )
+ alternates_group_id_b = self.GetAlternatesGroupId( media_id_b, do_not_create = True )
if alternates_group_id_b is None:
return False
- return self.DuplicatesAlternatesGroupsAreFalsePositive( alternates_group_id_a, alternates_group_id_b )
+ return self.AlternatesGroupsAreFalsePositive( alternates_group_id_a, alternates_group_id_b )
- def DuplicatesMergeMedias( self, superior_media_id, mergee_media_id ):
+ def MergeMedias( self, superior_media_id, mergee_media_id ):
if superior_media_id == mergee_media_id:
return
- self.DuplicatesClearPotentialsBetweenMedias( ( superior_media_id, ), ( mergee_media_id, ) )
+ self.ClearPotentialsBetweenMedias( ( superior_media_id, ), ( mergee_media_id, ) )
- alternates_group_id = self.DuplicatesGetAlternatesGroupId( superior_media_id )
- mergee_alternates_group_id = self.DuplicatesGetAlternatesGroupId( mergee_media_id )
+ alternates_group_id = self.GetAlternatesGroupId( superior_media_id )
+ mergee_alternates_group_id = self.GetAlternatesGroupId( mergee_media_id )
if alternates_group_id != mergee_alternates_group_id:
- if self.DuplicatesAlternatesGroupsAreFalsePositive( alternates_group_id, mergee_alternates_group_id ):
+ if self.AlternatesGroupsAreFalsePositive( alternates_group_id, mergee_alternates_group_id ):
smaller_alternates_group_id = min( alternates_group_id, mergee_alternates_group_id )
larger_alternates_group_id = max( alternates_group_id, mergee_alternates_group_id )
@@ -1195,7 +1272,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
self._Execute( 'DELETE FROM duplicate_false_positives WHERE smaller_alternates_group_id = ? AND larger_alternates_group_id = ?;', ( smaller_alternates_group_id, larger_alternates_group_id ) )
- self.DuplicatesSetAlternates( superior_media_id, mergee_media_id )
+ self.SetAlternates( superior_media_id, mergee_media_id )
self._Execute( 'UPDATE duplicate_file_members SET media_id = ? WHERE media_id = ?;', ( superior_media_id, mergee_media_id ) )
@@ -1228,7 +1305,7 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
potential_duplicate_media_ids_and_distances = [ ( media_id_b, distance ) ]
- self.DuplicatesAddPotentialDuplicates( media_id_a, potential_duplicate_media_ids_and_distances )
+ self.AddPotentialDuplicates( media_id_a, potential_duplicate_media_ids_and_distances )
# ensure any previous confirmed alt pair is gone
@@ -1253,13 +1330,13 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
self._Execute( 'DELETE FROM duplicate_files WHERE media_id = ?;', ( mergee_media_id, ) )
- def DuplicatesRemoveAlternateMember( self, media_id ):
+ def RemoveAlternateMember( self, media_id ):
- alternates_group_id = self.DuplicatesGetAlternatesGroupId( media_id, do_not_create = True )
+ alternates_group_id = self.GetAlternatesGroupId( media_id, do_not_create = True )
if alternates_group_id is not None:
- alternates_media_ids = self.DuplicatesGetAlternateMediaIds( alternates_group_id )
+ alternates_media_ids = self.GetAlternateMediaIds( alternates_group_id )
self._Execute( 'DELETE FROM alternate_file_group_members WHERE media_id = ?;', ( media_id, ) )
@@ -1272,38 +1349,38 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
self._Execute( 'DELETE FROM duplicate_false_positives WHERE smaller_alternates_group_id = ? OR larger_alternates_group_id = ?;', ( alternates_group_id, alternates_group_id ) )
- hash_ids = self.DuplicatesGetDuplicateHashIds( media_id )
+ hash_ids = self.GetDuplicateHashIds( media_id )
self.modules_similar_files.ResetSearch( hash_ids )
- def DuplicatesRemoveAlternateMemberFromHashes( self, hashes ):
+ def RemoveAlternateMemberFromHashes( self, hashes ):
hash_ids = self.modules_hashes_local_cache.GetHashIds( hashes )
for hash_id in hash_ids:
- media_id = self.DuplicatesGetMediaId( hash_id, do_not_create = True )
+ media_id = self.GetMediaId( hash_id, do_not_create = True )
if media_id is not None:
- self.DuplicatesRemoveAlternateMember( media_id )
+ self.RemoveAlternateMember( media_id )
- def DuplicatesRemoveMediaIdMember( self, hash_id ):
+ def RemoveMediaIdMember( self, hash_id ):
- media_id = self.DuplicatesGetMediaId( hash_id, do_not_create = True )
+ media_id = self.GetMediaId( hash_id, do_not_create = True )
if media_id is not None:
- king_hash_id = self.DuplicatesGetKingHashId( media_id )
+ king_hash_id = self.GetKingHashId( media_id )
if hash_id == king_hash_id:
- self.DuplicatesDissolveMediaId( media_id )
+ self.DissolveMediaId( media_id )
else:
@@ -1314,19 +1391,19 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
- def DuplicatesRemoveMediaIdMemberFromHashes( self, hashes ):
+ def RemoveMediaIdMemberFromHashes( self, hashes ):
hash_ids = self.modules_hashes_local_cache.GetHashIds( hashes )
for hash_id in hash_ids:
- self.DuplicatesRemoveMediaIdMember( hash_id )
+ self.RemoveMediaIdMember( hash_id )
- def DuplicatesRemovePotentialPairs( self, hash_id ):
+ def RemovePotentialPairs( self, hash_id ):
- media_id = self.DuplicatesGetMediaId( hash_id, do_not_create = True )
+ media_id = self.GetMediaId( hash_id, do_not_create = True )
if media_id is not None:
@@ -1334,17 +1411,17 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
- def DuplicatesRemovePotentialPairsFromHashes( self, hashes ):
+ def RemovePotentialPairsFromHashes( self, hashes ):
hash_ids = self.modules_hashes_local_cache.GetHashIds( hashes )
for hash_id in hash_ids:
- self.DuplicatesRemovePotentialPairs( hash_id )
+ self.RemovePotentialPairs( hash_id )
- def DuplicatesSetAlternates( self, media_id_a, media_id_b ):
+ def SetAlternates( self, media_id_a, media_id_b ):
if media_id_a == media_id_b:
@@ -1353,14 +1430,14 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
# let's clear out any outstanding potentials. whether this is a valid or not connection, we don't want to see it again
- self.DuplicatesClearPotentialsBetweenMedias( ( media_id_a, ), ( media_id_b, ) )
+ self.ClearPotentialsBetweenMedias( ( media_id_a, ), ( media_id_b, ) )
# now check if we should be making a new relationship
- alternates_group_id_a = self.DuplicatesGetAlternatesGroupId( media_id_a )
- alternates_group_id_b = self.DuplicatesGetAlternatesGroupId( media_id_b )
+ alternates_group_id_a = self.GetAlternatesGroupId( media_id_a )
+ alternates_group_id_b = self.GetAlternatesGroupId( media_id_b )
- if self.DuplicatesAlternatesGroupsAreFalsePositive( alternates_group_id_a, alternates_group_id_b ):
+ if self.AlternatesGroupsAreFalsePositive( alternates_group_id_a, alternates_group_id_b ):
return
@@ -1388,11 +1465,11 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
if smaller_false_positive_alternates_group_id == alternates_group_id_a:
- self.DuplicatesClearPotentialsBetweenAlternatesGroups( alternates_group_id_b, larger_false_positive_alternates_group_id )
+ self.ClearPotentialsBetweenAlternatesGroups( alternates_group_id_b, larger_false_positive_alternates_group_id )
else:
- self.DuplicatesClearPotentialsBetweenAlternatesGroups( smaller_false_positive_alternates_group_id, alternates_group_id_b )
+ self.ClearPotentialsBetweenAlternatesGroups( smaller_false_positive_alternates_group_id, alternates_group_id_b )
@@ -1410,11 +1487,11 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
if smaller_false_positive_alternates_group_id == alternates_group_id_b:
- self.DuplicatesSetFalsePositive( alternates_group_id_a, larger_false_positive_alternates_group_id )
+ self.SetFalsePositive( alternates_group_id_a, larger_false_positive_alternates_group_id )
else:
- self.DuplicatesSetFalsePositive( smaller_false_positive_alternates_group_id, alternates_group_id_a )
+ self.SetFalsePositive( smaller_false_positive_alternates_group_id, alternates_group_id_a )
@@ -1425,14 +1502,14 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
# pubsub to refresh alternates info for alternates_group_id_a and _b goes here
- def DuplicatesSetFalsePositive( self, alternates_group_id_a, alternates_group_id_b ):
+ def SetFalsePositive( self, alternates_group_id_a, alternates_group_id_b ):
if alternates_group_id_a == alternates_group_id_b:
return
- self.DuplicatesClearPotentialsBetweenAlternatesGroups( alternates_group_id_a, alternates_group_id_b )
+ self.ClearPotentialsBetweenAlternatesGroups( alternates_group_id_a, alternates_group_id_b )
smaller_alternates_group_id = min( alternates_group_id_a, alternates_group_id_b )
larger_alternates_group_id = max( alternates_group_id_a, alternates_group_id_b )
@@ -1440,18 +1517,18 @@ class ClientDBFilesDuplicates( ClientDBModule.ClientDBModule ):
self._Execute( 'INSERT OR IGNORE INTO duplicate_false_positives ( smaller_alternates_group_id, larger_alternates_group_id ) VALUES ( ?, ? );', ( smaller_alternates_group_id, larger_alternates_group_id ) )
- def DuplicatesSetKing( self, king_hash_id, media_id ):
+ def SetKing( self, king_hash_id, media_id ):
self._Execute( 'UPDATE duplicate_files SET king_hash_id = ? WHERE media_id = ?;', ( king_hash_id, media_id ) )
- def DuplicatesSetKingFromHash( self, hash ):
+ def SetKingFromHash( self, hash ):
hash_id = self.modules_hashes_local_cache.GetHashId( hash )
- media_id = self.DuplicatesGetMediaId( hash_id )
+ media_id = self.GetMediaId( hash_id )
- self.DuplicatesSetKing( hash_id, media_id )
+ self.SetKing( hash_id, media_id )
def GetTablesAndColumnsThatUseDefinitions( self, content_type: int ) -> typing.List[ typing.Tuple[ str, str ] ]:
diff --git a/hydrus/client/gui/ClientGUIAPI.py b/hydrus/client/gui/ClientGUIAPI.py
index 4c346e36..96910835 100644
--- a/hydrus/client/gui/ClientGUIAPI.py
+++ b/hydrus/client/gui/ClientGUIAPI.py
@@ -86,6 +86,8 @@ class EditAPIPermissionsPanel( ClientGUIScrolledPanels.EditPanel ):
self._basic_permissions.Append( ClientAPI.basic_permission_to_str_lookup[ permission ], permission )
+ self._basic_permissions.sortItems()
+
search_tag_filter = api_permissions.GetSearchTagFilter()
message = 'The API will only permit searching for tags that pass through this filter.'
@@ -114,7 +116,7 @@ class EditAPIPermissionsPanel( ClientGUIScrolledPanels.EditPanel ):
rows.append( ( 'access key: ', self._access_key ) )
rows.append( ( 'name: ', self._name ) )
- rows.append( ( 'permissions: ', self._basic_permissions) )
+ rows.append( ( 'permissions: ', self._basic_permissions ) )
rows.append( ( 'tag search permissions: ', self._search_tag_filter ) )
gridbox = ClientGUICommon.WrapInGrid( self, rows )
diff --git a/hydrus/client/gui/pages/ClientGUIPages.py b/hydrus/client/gui/pages/ClientGUIPages.py
index 776eac01..c26ff0c3 100644
--- a/hydrus/client/gui/pages/ClientGUIPages.py
+++ b/hydrus/client/gui/pages/ClientGUIPages.py
@@ -820,7 +820,7 @@ class Page( QW.QWidget ):
root[ 'name' ] = self.GetName()
root[ 'page_key' ] = self._page_key.hex()
root[ 'page_type' ] = self._management_controller.GetType()
- root[ 'focused' ] = is_selected
+ root[ 'selected' ] = is_selected
return root
diff --git a/hydrus/client/networking/ClientLocalServer.py b/hydrus/client/networking/ClientLocalServer.py
index c8a2b724..11396995 100644
--- a/hydrus/client/networking/ClientLocalServer.py
+++ b/hydrus/client/networking/ClientLocalServer.py
@@ -99,6 +99,25 @@ class HydrusServiceClientAPI( HydrusClientService ):
manage_cookies.putChild( b'get_cookies', ClientLocalServerResources.HydrusResourceClientAPIRestrictedManageCookiesGetCookies( self._service, self._client_requests_domain ) )
manage_cookies.putChild( b'set_cookies', ClientLocalServerResources.HydrusResourceClientAPIRestrictedManageCookiesSetCookies( self._service, self._client_requests_domain ) )
+ manage_database = NoResource()
+
+ root.putChild( b'manage_database', manage_database )
+
+ manage_database.putChild( b'mr_bones', ClientLocalServerResources.HydrusResourceClientAPIRestrictedManageDatabaseMrBones( self._service, self._client_requests_domain ) )
+ manage_database.putChild( b'lock_on', ClientLocalServerResources.HydrusResourceClientAPIRestrictedManageDatabaseLockOn( self._service, self._client_requests_domain ) )
+ manage_database.putChild( b'lock_off', ClientLocalServerResources.HydrusResourceClientAPIRestrictedManageDatabaseLockOff( self._service, self._client_requests_domain ) )
+
+ manage_file_relationships = NoResource()
+
+ root.putChild( b'manage_file_relationships', manage_file_relationships )
+
+ manage_file_relationships.putChild( b'get_file_relationships', ClientLocalServerResources.HydrusResourceClientAPIRestrictedManageFileRelationshipsGetRelationships( self._service, self._client_requests_domain ) )
+ manage_file_relationships.putChild( b'get_potentials_count', ClientLocalServerResources.HydrusResourceClientAPIRestrictedManageFileRelationshipsGetPotentialsCount( self._service, self._client_requests_domain ) )
+ manage_file_relationships.putChild( b'get_potential_pairs', ClientLocalServerResources.HydrusResourceClientAPIRestrictedManageFileRelationshipsGetPotentialPairs( self._service, self._client_requests_domain ) )
+ manage_file_relationships.putChild( b'get_random_potentials', ClientLocalServerResources.HydrusResourceClientAPIRestrictedManageFileRelationshipsGetRandomPotentials( self._service, self._client_requests_domain ) )
+ manage_file_relationships.putChild( b'set_file_relationships', ClientLocalServerResources.HydrusResourceClientAPIRestrictedManageFileRelationshipsSetRelationships( self._service, self._client_requests_domain ) )
+ manage_file_relationships.putChild( b'set_kings', ClientLocalServerResources.HydrusResourceClientAPIRestrictedManageFileRelationshipsSetKings( self._service, self._client_requests_domain ) )
+
manage_headers = NoResource()
root.putChild( b'manage_headers', manage_headers )
@@ -115,14 +134,6 @@ class HydrusServiceClientAPI( HydrusClientService ):
manage_pages.putChild( b'get_page_info', ClientLocalServerResources.HydrusResourceClientAPIRestrictedManagePagesGetPageInfo( self._service, self._client_requests_domain ) )
manage_pages.putChild( b'refresh_page', ClientLocalServerResources.HydrusResourceClientAPIRestrictedManagePagesRefreshPage( self._service, self._client_requests_domain ) )
- manage_database = NoResource()
-
- root.putChild( b'manage_database', manage_database )
-
- manage_database.putChild( b'mr_bones', ClientLocalServerResources.HydrusResourceClientAPIRestrictedManageDatabaseMrBones( self._service, self._client_requests_domain ) )
- manage_database.putChild( b'lock_on', ClientLocalServerResources.HydrusResourceClientAPIRestrictedManageDatabaseLockOn( self._service, self._client_requests_domain ) )
- manage_database.putChild( b'lock_off', ClientLocalServerResources.HydrusResourceClientAPIRestrictedManageDatabaseLockOff( self._service, self._client_requests_domain ) )
-
return root
diff --git a/hydrus/client/networking/ClientLocalServerResources.py b/hydrus/client/networking/ClientLocalServerResources.py
index 1b1e1529..bff6114f 100644
--- a/hydrus/client/networking/ClientLocalServerResources.py
+++ b/hydrus/client/networking/ClientLocalServerResources.py
@@ -55,10 +55,12 @@ LOCAL_BOORU_STRING_PARAMS = set()
LOCAL_BOORU_JSON_PARAMS = set()
LOCAL_BOORU_JSON_BYTE_LIST_PARAMS = set()
-CLIENT_API_INT_PARAMS = { 'file_id', 'file_sort_type' }
-CLIENT_API_BYTE_PARAMS = { 'hash', 'destination_page_key', 'page_key', 'Hydrus-Client-API-Access-Key', 'Hydrus-Client-API-Session-Key', 'tag_service_key', 'file_service_key' }
+# if a variable name isn't defined here, a GET with it won't work
+
+CLIENT_API_INT_PARAMS = { 'file_id', 'file_sort_type', 'potentials_search_type', 'pixel_duplicates', 'max_hamming_distance', 'max_num_pairs' }
+CLIENT_API_BYTE_PARAMS = { 'hash', 'destination_page_key', 'page_key', 'Hydrus-Client-API-Access-Key', 'Hydrus-Client-API-Session-Key', 'tag_service_key', 'tag_service_key_1', 'tag_service_key_2', 'file_service_key' }
CLIENT_API_STRING_PARAMS = { 'name', 'url', 'domain', 'search', 'file_service_name', 'tag_service_name', 'reason', 'tag_display_type', 'source_hash_type', 'desired_hash_type' }
-CLIENT_API_JSON_PARAMS = { 'basic_permissions', 'system_inbox', 'system_archive', 'tags', 'file_ids', 'only_return_identifiers', 'only_return_basic_information', 'create_new_file_ids', 'detailed_url_information', 'hide_service_names_tags', 'hide_service_keys_tags', 'simple', 'file_sort_asc', 'return_hashes', 'return_file_ids', 'include_notes', 'notes', 'note_names', 'doublecheck_file_system' }
+CLIENT_API_JSON_PARAMS = { 'basic_permissions', 'system_inbox', 'system_archive', 'tags', 'tags_1', 'tags_2', 'file_ids', 'only_return_identifiers', 'only_return_basic_information', 'create_new_file_ids', 'detailed_url_information', 'hide_service_names_tags', 'hide_service_keys_tags', 'simple', 'file_sort_asc', 'return_hashes', 'return_file_ids', 'include_notes', 'notes', 'note_names', 'doublecheck_file_system' }
CLIENT_API_JSON_BYTE_LIST_PARAMS = { 'hashes' }
CLIENT_API_JSON_BYTE_DICT_PARAMS = { 'service_keys_to_tags', 'service_keys_to_actions_to_tags', 'service_keys_to_additional_tags' }
@@ -109,6 +111,24 @@ def CheckHashLength( hashes, hash_type = 'sha256' ):
+
+def CheckTagService( tag_service_key: bytes ):
+
+ try:
+
+ service = HG.client_controller.services_manager.GetService( tag_service_key )
+
+ except:
+
+ raise HydrusExceptions.BadRequestException( 'Could not find that tag service!' )
+
+
+ if service.GetServiceType() not in HC.ALL_TAG_SERVICES:
+
+ raise HydrusExceptions.BadRequestException( 'Sorry, that service key did not give a tag service!' )
+
+
+
def ConvertServiceNamesDictToKeys( allowed_service_types, service_name_dict ):
service_key_dict = {}
@@ -413,6 +433,60 @@ def ParseClientAPISearchPredicates( request ) -> typing.List[ ClientSearch.Predi
return predicates
+
+def ParseDuplicateSearch( request: HydrusServerRequest.HydrusRequest ):
+
+ # TODO: When we have ParseLocationContext for clever file searching, swap it in here too
+ # LocationContext has to be the same for both searches
+ location_context = ClientLocation.LocationContext.STATICCreateSimple( CC.COMBINED_LOCAL_MEDIA_SERVICE_KEY )
+
+ tag_service_key_1 = request.parsed_request_args.GetValue( 'tag_service_key_1', bytes, default_value = CC.COMBINED_TAG_SERVICE_KEY )
+ tag_service_key_2 = request.parsed_request_args.GetValue( 'tag_service_key_2', bytes, default_value = CC.COMBINED_TAG_SERVICE_KEY )
+
+ CheckTagService( tag_service_key_1 )
+ CheckTagService( tag_service_key_2 )
+
+ tag_context_1 = ClientSearch.TagContext( service_key = tag_service_key_1 )
+ tag_context_2 = ClientSearch.TagContext( service_key = tag_service_key_2 )
+
+ tags_1 = request.parsed_request_args.GetValue( 'tags_1', list, default_value = [] )
+ tags_2 = request.parsed_request_args.GetValue( 'tags_2', list, default_value = [] )
+
+ if len( tags_1 ) == 0:
+
+ predicates_1 = [ ClientSearch.Predicate( ClientSearch.PREDICATE_TYPE_SYSTEM_EVERYTHING ) ]
+
+ else:
+
+ predicates_1 = ConvertTagListToPredicates( request, tags_1, do_permission_check = False )
+
+
+ if len( tags_2 ) == 0:
+
+ predicates_2 = [ ClientSearch.Predicate( ClientSearch.PREDICATE_TYPE_SYSTEM_EVERYTHING ) ]
+
+ else:
+
+ predicates_2 = ConvertTagListToPredicates( request, tags_2, do_permission_check = False )
+
+
+
+ file_search_context_1 = ClientSearch.FileSearchContext( location_context = location_context, tag_context = tag_context_1, predicates = predicates_1 )
+ file_search_context_2 = ClientSearch.FileSearchContext( location_context = location_context, tag_context = tag_context_2, predicates = predicates_2 )
+
+ dupe_search_type = request.parsed_request_args.GetValue( 'potentials_search_type', int, default_value = CC.DUPE_SEARCH_ONE_FILE_MATCHES_ONE_SEARCH )
+ pixel_dupes_preference = request.parsed_request_args.GetValue( 'pixel_duplicates', int, default_value = CC.SIMILAR_FILES_PIXEL_DUPES_ALLOWED )
+ max_hamming_distance = request.parsed_request_args.GetValue( 'max_hamming_distance', int, default_value = 4 )
+
+ return (
+ file_search_context_1,
+ file_search_context_2,
+ dupe_search_type,
+ pixel_dupes_preference,
+ max_hamming_distance
+ )
+
+
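As a rough illustration of the query string a client might send to the endpoints that call `ParseDuplicateSearch` (parameter names are taken from the `GetValue` calls above and the encoding mirrors the unit tests; the exact host, port, and headers of a real call are assumptions):

```python
import json
import urllib.parse

def build_duplicate_search_query( endpoint, tags_1, tags_2, potentials_search_type = 0, pixel_duplicates = 0, max_hamming_distance = 4 ):
    
    # tag lists are sent JSON-encoded and then percent-escaped
    params = {
        'tags_1' : json.dumps( tags_1 ),
        'tags_2' : json.dumps( tags_2 ),
        'potentials_search_type' : str( potentials_search_type ),
        'pixel_duplicates' : str( pixel_duplicates ),
        'max_hamming_distance' : str( max_hamming_distance )
    }
    
    return '{}?{}'.format( endpoint, urllib.parse.urlencode( params ) )
    

path = build_duplicate_search_query( '/manage_file_relationships/get_potentials_count', [ 'skirt' ], [] )
```

Omitting a parameter falls back to the server-side defaults shown above (combined tag domain, system:everything, hamming distance 4).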
def ParseLocationContext( request: HydrusServerRequest.HydrusRequest, default: ClientLocation.LocationContext ):
if 'file_service_key' in request.parsed_request_args or 'file_service_name' in request.parsed_request_args:
@@ -456,6 +530,7 @@ def ParseLocationContext( request: HydrusServerRequest.HydrusRequest, default: C
return default
+
def ParseHashes( request: HydrusServerRequest.HydrusRequest ):
hashes = set()
@@ -496,6 +571,7 @@ def ParseHashes( request: HydrusServerRequest.HydrusRequest ):
return hashes
+
def ParseRequestedResponseMime( request: HydrusServerRequest.HydrusRequest ):
# let them ask for something specifically, else default to what they asked in, finally default to json
@@ -541,6 +617,38 @@ def ParseRequestedResponseMime( request: HydrusServerRequest.HydrusRequest ):
return HC.APPLICATION_JSON
+def ParseTagServiceKey( request: HydrusServerRequest.HydrusRequest ):
+
+ if 'tag_service_key' in request.parsed_request_args or 'tag_service_name' in request.parsed_request_args:
+
+ if 'tag_service_key' in request.parsed_request_args:
+
+ tag_service_key = request.parsed_request_args[ 'tag_service_key' ]
+
+ else:
+
+ tag_service_name = request.parsed_request_args[ 'tag_service_name' ]
+
+ try:
+
+ tag_service_key = HG.client_controller.services_manager.GetServiceKeyFromName( HC.ALL_TAG_SERVICES, tag_service_name )
+
+ except:
+
+ raise HydrusExceptions.BadRequestException( 'Could not find the service "{}"!'.format( tag_service_name ) )
+
+
+
+ CheckTagService( tag_service_key )
+
+ else:
+
+ tag_service_key = CC.COMBINED_TAG_SERVICE_KEY
+
+
+ return tag_service_key
+
+
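The key-or-name precedence `ParseTagServiceKey` implements can be sketched in isolation like this (a plain dict stands in for the services manager; all names and keys here are made up):

```python
def resolve_tag_service_key( parsed_args, name_to_key, default_key ):
    
    # an explicit key wins; a name is looked up; otherwise fall back to the default
    if 'tag_service_key' in parsed_args:
        
        return parsed_args[ 'tag_service_key' ]
        
    
    if 'tag_service_name' in parsed_args:
        
        name = parsed_args[ 'tag_service_name' ]
        
        if name not in name_to_key:
            
            raise ValueError( 'Could not find the service "{}"!'.format( name ) )
            
        
        return name_to_key[ name ]
        
    
    return default_key
    
```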
def ConvertTagListToPredicates( request, tag_list, do_permission_check = True, error_on_invalid_tag = True ) -> typing.List[ ClientSearch.Predicate ]:
or_tag_lists = [ tag for tag in tag_list if isinstance( tag, list ) ]
@@ -1246,6 +1354,7 @@ class HydrusResourceClientAPIRestrictedGetServices( HydrusResourceClientAPIRestr
ClientAPI.CLIENT_API_PERMISSION_ADD_TAGS,
ClientAPI.CLIENT_API_PERMISSION_ADD_NOTES,
ClientAPI.CLIENT_API_PERMISSION_MANAGE_PAGES,
+ ClientAPI.CLIENT_API_PERMISSION_MANAGE_FILE_RELATIONSHIPS,
ClientAPI.CLIENT_API_PERMISSION_SEARCH_FILES
)
)
@@ -1778,43 +1887,6 @@ class HydrusResourceClientAPIRestrictedAddTagsSearchTags( HydrusResourceClientAP
return parsed_autocomplete_text
- def _GetTagServiceKey( self, request: HydrusServerRequest.HydrusRequest ):
-
- tag_service_key = CC.COMBINED_TAG_SERVICE_KEY
-
- if 'tag_service_key' in request.parsed_request_args:
-
- tag_service_key = request.parsed_request_args[ 'tag_service_key' ]
-
- elif 'tag_service_name' in request.parsed_request_args:
-
- tag_service_name = request.parsed_request_args[ 'tag_service_name' ]
-
- try:
-
- tag_service_key = HG.client_controller.services_manager.GetServiceKeyFromName( HC.ALL_TAG_SERVICES, tag_service_name )
-
- except:
-
- raise HydrusExceptions.BadRequestException( 'Could not find the service "{}"!'.format( tag_service_name ) )
-
-
- try:
-
- service = HG.client_controller.services_manager.GetService( tag_service_key )
-
- except:
-
- raise HydrusExceptions.BadRequestException( 'Could not find that tag service!' )
-
- if service.GetServiceType() not in HC.ALL_TAG_SERVICES:
-
- raise HydrusExceptions.BadRequestException( 'Sorry, that service key did not give a tag service!' )
-
-
- return tag_service_key
-
-
def _GetTagMatches( self, request: HydrusServerRequest.HydrusRequest, tag_display_type: int, tag_service_key: bytes, parsed_autocomplete_text: ClientSearch.ParsedAutocompleteText ) -> typing.List[ ClientSearch.Predicate ]:
matches = []
@@ -1855,7 +1927,7 @@ class HydrusResourceClientAPIRestrictedAddTagsSearchTags( HydrusResourceClientAP
tag_display_type = ClientTags.TAG_DISPLAY_STORAGE if tag_display_type_str == 'storage' else ClientTags.TAG_DISPLAY_ACTUAL
- tag_service_key = self._GetTagServiceKey( request )
+ tag_service_key = ParseTagServiceKey( request )
parsed_autocomplete_text = self._GetParsedAutocompleteText( search, tag_service_key )
@@ -2213,44 +2285,7 @@ class HydrusResourceClientAPIRestrictedGetFilesSearchFiles( HydrusResourceClient
location_context = ParseLocationContext( request, ClientLocation.LocationContext.STATICCreateSimple( CC.COMBINED_LOCAL_MEDIA_SERVICE_KEY ) )
- if 'tag_service_key' in request.parsed_request_args or 'tag_service_name' in request.parsed_request_args:
-
- if 'tag_service_key' in request.parsed_request_args:
-
- tag_service_key = request.parsed_request_args[ 'tag_service_key' ]
-
- else:
-
- tag_service_name = request.parsed_request_args[ 'tag_service_name' ]
-
- try:
-
- tag_service_key = HG.client_controller.services_manager.GetServiceKeyFromName( HC.ALL_TAG_SERVICES, tag_service_name )
-
- except:
-
- raise HydrusExceptions.BadRequestException( 'Could not find the service "{}"!'.format( tag_service_name ) )
-
-
-
- try:
-
- service = HG.client_controller.services_manager.GetService( tag_service_key )
-
- except:
-
- raise HydrusExceptions.BadRequestException( 'Could not find that tag service!' )
-
-
- if service.GetServiceType() not in HC.ALL_TAG_SERVICES:
-
- raise HydrusExceptions.BadRequestException( 'Sorry, that service key did not give a tag service!' )
-
-
- else:
-
- tag_service_key = CC.COMBINED_TAG_SERVICE_KEY
-
+ tag_service_key = ParseTagServiceKey( request )
if tag_service_key == CC.COMBINED_TAG_SERVICE_KEY and location_context.IsAllKnownFiles():
@@ -3099,6 +3134,285 @@ class HydrusResourceClientAPIRestrictedManageDatabaseMrBones( HydrusResourceClie
return response_context
+
+class HydrusResourceClientAPIRestrictedManageFileRelationships( HydrusResourceClientAPIRestricted ):
+
+ def _CheckAPIPermissions( self, request: HydrusServerRequest.HydrusRequest ):
+
+ request.client_api_permissions.CheckPermission( ClientAPI.CLIENT_API_PERMISSION_MANAGE_FILE_RELATIONSHIPS )
+
+
+
+class HydrusResourceClientAPIRestrictedManageFileRelationshipsGetRelationships( HydrusResourceClientAPIRestrictedManageFileRelationships ):
+
+ def _threadDoGETJob( self, request: HydrusServerRequest.HydrusRequest ):
+
+ # TODO: When we have ParseLocationContext for clever file searching, swap it in here too
+ location_context = ClientLocation.LocationContext.STATICCreateSimple( CC.COMBINED_LOCAL_MEDIA_SERVICE_KEY )
+
+ hashes = ParseHashes( request )
+
+ # in future we could fetch the media results and build this dict from them instead
+ hashes_to_file_duplicates = HG.client_controller.Read( 'file_relationships_for_api', location_context, hashes )
+
+ body_dict = { 'file_relationships' : hashes_to_file_duplicates }
+
+ body = Dumps( body_dict, request.preferred_mime )
+
+ response_context = HydrusServerResources.ResponseContext( 200, mime = request.preferred_mime, body = body )
+
+ return response_context
+
+
+
+class HydrusResourceClientAPIRestrictedManageFileRelationshipsGetPotentialsCount( HydrusResourceClientAPIRestrictedManageFileRelationships ):
+
+ def _threadDoGETJob( self, request: HydrusServerRequest.HydrusRequest ):
+
+ (
+ file_search_context_1,
+ file_search_context_2,
+ dupe_search_type,
+ pixel_dupes_preference,
+ max_hamming_distance
+ ) = ParseDuplicateSearch( request )
+
+ count = HG.client_controller.Read( 'potential_duplicates_count', file_search_context_1, file_search_context_2, dupe_search_type, pixel_dupes_preference, max_hamming_distance )
+
+ body_dict = { 'potential_duplicates_count' : count }
+
+ body = Dumps( body_dict, request.preferred_mime )
+
+ response_context = HydrusServerResources.ResponseContext( 200, mime = request.preferred_mime, body = body )
+
+ return response_context
+
+
+
+class HydrusResourceClientAPIRestrictedManageFileRelationshipsGetPotentialPairs( HydrusResourceClientAPIRestrictedManageFileRelationships ):
+
+ def _threadDoGETJob( self, request: HydrusServerRequest.HydrusRequest ):
+
+ (
+ file_search_context_1,
+ file_search_context_2,
+ dupe_search_type,
+ pixel_dupes_preference,
+ max_hamming_distance
+ ) = ParseDuplicateSearch( request )
+
+ max_num_pairs = request.parsed_request_args.GetValue( 'max_num_pairs', int, default_value = HG.client_controller.new_options.GetInteger( 'duplicate_filter_max_batch_size' ) )
+
+ filtering_pairs_media_results = HG.client_controller.Read( 'duplicate_pairs_for_filtering', file_search_context_1, file_search_context_2, dupe_search_type, pixel_dupes_preference, max_hamming_distance, max_num_pairs = max_num_pairs )
+
+ filtering_pairs_hashes = [ ( m1.GetHash().hex(), m2.GetHash().hex() ) for ( m1, m2 ) in filtering_pairs_media_results ]
+
+ body_dict = { 'potential_duplicate_pairs' : filtering_pairs_hashes }
+
+ body = Dumps( body_dict, request.preferred_mime )
+
+ response_context = HydrusServerResources.ResponseContext( 200, mime = request.preferred_mime, body = body )
+
+ return response_context
+
+
+
+class HydrusResourceClientAPIRestrictedManageFileRelationshipsGetRandomPotentials( HydrusResourceClientAPIRestrictedManageFileRelationships ):
+
+ def _threadDoGETJob( self, request: HydrusServerRequest.HydrusRequest ):
+
+ (
+ file_search_context_1,
+ file_search_context_2,
+ dupe_search_type,
+ pixel_dupes_preference,
+ max_hamming_distance
+ ) = ParseDuplicateSearch( request )
+
+ hashes = HG.client_controller.Read( 'random_potential_duplicate_hashes', file_search_context_1, file_search_context_2, dupe_search_type, pixel_dupes_preference, max_hamming_distance )
+
+ body_dict = { 'random_potential_duplicate_hashes' : [ hash.hex() for hash in hashes ] }
+
+ body = Dumps( body_dict, request.preferred_mime )
+
+ response_context = HydrusServerResources.ResponseContext( 200, mime = request.preferred_mime, body = body )
+
+ return response_context
+
+
+
+class HydrusResourceClientAPIRestrictedManageFileRelationshipsSetKings( HydrusResourceClientAPIRestrictedManageFileRelationships ):
+
+ def _threadDoPOSTJob( self, request: HydrusServerRequest.HydrusRequest ):
+
+ hashes = ParseHashes( request )
+
+ for hash in hashes:
+
+ HG.client_controller.WriteSynchronous( 'duplicate_set_king', hash )
+
+
+ response_context = HydrusServerResources.ResponseContext( 200 )
+
+ return response_context
+
+
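A minimal sketch of the JSON body a client might POST to the set_kings endpoint above (the 'hashes' parameter name comes from `ParseHashes`; the hash values here are random placeholders):

```python
import json
import os

def build_set_kings_body( hashes ):
    
    # one hash per duplicate group; each becomes that group's king
    return json.dumps( { 'hashes' : [ h.hex() for h in hashes ] } )
    

body = build_set_kings_body( [ os.urandom( 32 ) ] )
```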
+
+class HydrusResourceClientAPIRestrictedManageFileRelationshipsSetRelationships( HydrusResourceClientAPIRestrictedManageFileRelationships ):
+
+ def _threadDoPOSTJob( self, request: HydrusServerRequest.HydrusRequest ):
+
+ rows = []
+
+ raw_rows = request.parsed_request_args.GetValue( 'pair_rows', list, expected_list_type = list )
+
+ all_hashes = set()
+
+ for row in raw_rows:
+
+ if len( row ) != 6:
+
+ raise HydrusExceptions.BadRequestException( 'One of the pair rows was the wrong length!' )
+
+
+ ( duplicate_type, hash_a_hex, hash_b_hex, do_default_content_merge, delete_first, delete_second ) = row
+
+ try:
+
+ hash_a = bytes.fromhex( hash_a_hex )
+ hash_b = bytes.fromhex( hash_b_hex )
+
+ except:
+
+ raise HydrusExceptions.BadRequestException( 'Sorry, did not understand one of the hashes {} or {}!'.format( hash_a_hex, hash_b_hex ) )
+
+
+ CheckHashLength( ( hash_a, hash_b ) )
+
+ all_hashes.update( ( hash_a, hash_b ) )
+
+
+ media_results = HG.client_controller.Read( 'media_results', all_hashes )
+
+ hashes_to_media_results = { media_result.GetHash() : media_result for media_result in media_results }
+
+ for row in raw_rows:
+
+ ( duplicate_type, hash_a_hex, hash_b_hex, do_default_content_merge, delete_first, delete_second ) = row
+
+ if duplicate_type not in [
+ HC.DUPLICATE_FALSE_POSITIVE,
+ HC.DUPLICATE_ALTERNATE,
+ HC.DUPLICATE_BETTER,
+ HC.DUPLICATE_WORSE,
+ HC.DUPLICATE_SAME_QUALITY,
+ HC.DUPLICATE_POTENTIAL
+ ]:
+
+ raise HydrusExceptions.BadRequestException( 'One of the duplicate statuses ({}) was incorrect!'.format( duplicate_type ) )
+
+
+ try:
+
+ hash_a = bytes.fromhex( hash_a_hex )
+ hash_b = bytes.fromhex( hash_b_hex )
+
+ except:
+
+ raise HydrusExceptions.BadRequestException( 'Sorry, did not understand one of the hashes {} or {}!'.format( hash_a_hex, hash_b_hex ) )
+
+
+ if not isinstance( do_default_content_merge, bool ):
+
+ raise HydrusExceptions.BadRequestException( 'Sorry, "do_default_content_merge" has to be a boolean! "{}" was not!'.format( do_default_content_merge ) )
+
+
+ if not isinstance( delete_first, bool ):
+
+ raise HydrusExceptions.BadRequestException( 'Sorry, "delete_first" has to be a boolean! "{}" was not!'.format( delete_first ) )
+
+
+ if not isinstance( delete_second, bool ):
+
+ raise HydrusExceptions.BadRequestException( 'Sorry, "delete_second" has to be a boolean! "{}" was not!'.format( delete_second ) )
+
+
+ # ok the raw row looks good
+
+ list_of_service_keys_to_content_updates = []
+
+ first_media = ClientMedia.MediaSingleton( hashes_to_media_results[ hash_a ] )
+ second_media = ClientMedia.MediaSingleton( hashes_to_media_results[ hash_b ] )
+
+ file_deletion_reason = 'From Client API (duplicates processing).'
+
+ if do_default_content_merge:
+
+ duplicate_content_merge_options = HG.client_controller.new_options.GetDuplicateContentMergeOptions( duplicate_type )
+
+ list_of_service_keys_to_content_updates.append( duplicate_content_merge_options.ProcessPairIntoContentUpdates( first_media, second_media, file_deletion_reason = file_deletion_reason, delete_first = delete_first, delete_second = delete_second ) )
+
+ elif delete_first or delete_second:
+
+ service_keys_to_content_updates = collections.defaultdict( list )
+
+ deletee_media = set()
+
+ if delete_first:
+
+ deletee_media.add( first_media )
+
+
+ if delete_second:
+
+ deletee_media.add( second_media )
+
+
+ for media in deletee_media:
+
+ if media.HasDeleteLocked():
+
+ ClientMedia.ReportDeleteLockFailures( [ media ] )
+
+ continue
+
+
+ if media.GetLocationsManager().IsTrashed():
+
+ deletee_service_keys = ( CC.COMBINED_LOCAL_FILE_SERVICE_KEY, )
+
+ else:
+
+ local_file_service_keys = HG.client_controller.services_manager.GetServiceKeys( ( HC.LOCAL_FILE_DOMAIN, ) )
+
+ deletee_service_keys = media.GetLocationsManager().GetCurrent().intersection( local_file_service_keys )
+
+
+ for deletee_service_key in deletee_service_keys:
+
+ content_update = HydrusData.ContentUpdate( HC.CONTENT_TYPE_FILES, HC.CONTENT_UPDATE_DELETE, media.GetHashes(), reason = file_deletion_reason )
+
+ service_keys_to_content_updates[ deletee_service_key ].append( content_update )
+
+
+
+ list_of_service_keys_to_content_updates.append( service_keys_to_content_updates )
+
+
+ rows.append( ( duplicate_type, hash_a, hash_b, list_of_service_keys_to_content_updates ) )
+
+
+ if len( rows ) > 0:
+
+ HG.client_controller.WriteSynchronous( 'duplicate_pair_status', rows )
+
+
+ response_context = HydrusServerResources.ResponseContext( 200 )
+
+ return response_context
+
+
+
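A hedged sketch of how a client might assemble the pair_rows payload this handler validates (the six-element row shape matches the unpack above; the duplicate_type value here is a placeholder for illustration, not a confirmed HC constant):

```python
import json

DUPLICATE_TYPE_EXAMPLE = 2 # placeholder; a real client would send the appropriate HC.DUPLICATE_* value

def build_pair_rows_body( hex_hash_pairs ):
    
    # each row: [ duplicate_type, hash_a_hex, hash_b_hex, do_default_content_merge, delete_first, delete_second ]
    pair_rows = [ [ DUPLICATE_TYPE_EXAMPLE, hash_a_hex, hash_b_hex, True, False, False ] for ( hash_a_hex, hash_b_hex ) in hex_hash_pairs ]
    
    return json.dumps( { 'pair_rows' : pair_rows } )
    
```

Rows with the wrong length, a bad hash, or a non-boolean flag are rejected with a 400 by the validation loop above.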
class HydrusResourceClientAPIRestrictedManagePages( HydrusResourceClientAPIRestricted ):
def _CheckAPIPermissions( self, request: HydrusServerRequest.HydrusRequest ):
@@ -3106,6 +3420,7 @@ class HydrusResourceClientAPIRestrictedManagePages( HydrusResourceClientAPIRestr
request.client_api_permissions.CheckPermission( ClientAPI.CLIENT_API_PERMISSION_MANAGE_PAGES )
+
class HydrusResourceClientAPIRestrictedManagePagesAddFiles( HydrusResourceClientAPIRestrictedManagePages ):
def _threadDoPOSTJob( self, request: HydrusServerRequest.HydrusRequest ):
@@ -3183,6 +3498,7 @@ class HydrusResourceClientAPIRestrictedManagePagesAddFiles( HydrusResourceClient
return response_context
+
class HydrusResourceClientAPIRestrictedManagePagesFocusPage( HydrusResourceClientAPIRestrictedManagePages ):
def _threadDoPOSTJob( self, request: HydrusServerRequest.HydrusRequest ):
diff --git a/hydrus/core/HydrusConstants.py b/hydrus/core/HydrusConstants.py
index 97b72a79..1ed42c97 100644
--- a/hydrus/core/HydrusConstants.py
+++ b/hydrus/core/HydrusConstants.py
@@ -2,6 +2,7 @@ import os
import sqlite3
import sys
import typing
+
import yaml
# old method of getting frozen dir, doesn't work for symlinks looks like:
@@ -83,8 +84,8 @@ options = {}
# Misc
NETWORK_VERSION = 20
-SOFTWARE_VERSION = 512
-CLIENT_API_VERSION = 39
+SOFTWARE_VERSION = 513
+CLIENT_API_VERSION = 40
SERVER_THUMBNAIL_DIMENSIONS = ( 200, 200 )
diff --git a/hydrus/test/HelperFunctions.py b/hydrus/test/HelperFunctions.py
index 265fd6f4..7f3754f3 100644
--- a/hydrus/test/HelperFunctions.py
+++ b/hydrus/test/HelperFunctions.py
@@ -1,5 +1,14 @@
+import random
import unittest
+from hydrus.core import HydrusConstants as HC
+from hydrus.core import HydrusData
+from hydrus.core import HydrusGlobals as HG
+
+from hydrus.client import ClientConstants as CC
+from hydrus.client.media import ClientMediaManagers
+from hydrus.client.media import ClientMediaResult
+
def compare_content_updates( ut: unittest.TestCase, service_keys_to_content_updates, expected_service_keys_to_content_updates ):
ut.assertEqual( len( service_keys_to_content_updates ), len( expected_service_keys_to_content_updates ) )
@@ -14,3 +23,55 @@ def compare_content_updates( ut: unittest.TestCase, service_keys_to_content_upda
ut.assertEqual( c_u_tuples, e_c_u_tuples )
+
+def GetFakeMediaResult( hash: bytes ):
+
+ hash_id = random.randint( 0, 200 * ( 1024 ** 2 ) )
+
+ size = random.randint( 8192, 20 * 1048576 )
+ mime = random.choice( [ HC.IMAGE_JPEG, HC.VIDEO_WEBM, HC.APPLICATION_PDF ] )
+ width = random.randint( 200, 4096 )
+ height = random.randint( 200, 4096 )
+ duration = random.choice( [ 220, 16.66667, None ] )
+ has_audio = random.choice( [ True, False ] )
+
+ file_info_manager = ClientMediaManagers.FileInfoManager( hash_id, hash, size = size, mime = mime, width = width, height = height, duration = duration, has_audio = has_audio )
+
+ file_info_manager.has_exif = True
+ file_info_manager.has_icc_profile = True
+
+ service_keys_to_statuses_to_tags = { CC.DEFAULT_LOCAL_TAG_SERVICE_KEY : { HC.CONTENT_STATUS_CURRENT : { 'blue_eyes', 'blonde_hair' }, HC.CONTENT_STATUS_PENDING : { 'bodysuit' } } }
+ service_keys_to_statuses_to_display_tags = { CC.DEFAULT_LOCAL_TAG_SERVICE_KEY : { HC.CONTENT_STATUS_CURRENT : { 'blue eyes', 'blonde hair' }, HC.CONTENT_STATUS_PENDING : { 'bodysuit', 'clothing' } } }
+
+ service_keys_to_filenames = {}
+
+ import_timestamp = random.randint( HydrusData.GetNow() - 1000000, HydrusData.GetNow() - 15 )
+
+ current_to_timestamps = { CC.COMBINED_LOCAL_FILE_SERVICE_KEY : import_timestamp, CC.COMBINED_LOCAL_MEDIA_SERVICE_KEY : import_timestamp, CC.LOCAL_FILE_SERVICE_KEY : import_timestamp }
+
+ tags_manager = ClientMediaManagers.TagsManager( service_keys_to_statuses_to_tags, service_keys_to_statuses_to_display_tags )
+
+ timestamp_manager = ClientMediaManagers.TimestampManager()
+
+ file_modified_timestamp = random.randint( import_timestamp - 50000, import_timestamp - 1 )
+
+ timestamp_manager.SetFileModifiedTimestamp( file_modified_timestamp )
+
+ locations_manager = ClientMediaManagers.LocationsManager(
+ current_to_timestamps,
+ {},
+ set(),
+ set(),
+ inbox = False,
+ urls = set(),
+ service_keys_to_filenames = service_keys_to_filenames,
+ timestamp_manager = timestamp_manager
+ )
+ ratings_manager = ClientMediaManagers.RatingsManager( {} )
+ notes_manager = ClientMediaManagers.NotesManager( { 'note' : 'hello', 'note2' : 'hello2' } )
+ file_viewing_stats_manager = ClientMediaManagers.FileViewingStatsManager.STATICGenerateEmptyManager()
+
+ media_result = ClientMediaResult.MediaResult( file_info_manager, tags_manager, locations_manager, ratings_manager, notes_manager, file_viewing_stats_manager )
+
+ return media_result
+
diff --git a/hydrus/test/TestClientAPI.py b/hydrus/test/TestClientAPI.py
index 7b8667ba..118170a0 100644
--- a/hydrus/test/TestClientAPI.py
+++ b/hydrus/test/TestClientAPI.py
@@ -8,6 +8,7 @@ import shutil
import time
import unittest
import urllib
+import urllib.parse
from twisted.internet import reactor
@@ -21,7 +22,9 @@ from hydrus.core import HydrusText
from hydrus.client import ClientConstants as CC
from hydrus.client import ClientAPI
+from hydrus.client import ClientLocation
from hydrus.client import ClientSearch
+from hydrus.client import ClientSearchParseSystemPredicates
from hydrus.client import ClientServices
from hydrus.client.importing import ClientImportFiles
from hydrus.client.media import ClientMediaManagers
@@ -31,6 +34,8 @@ from hydrus.client.networking import ClientLocalServer
from hydrus.client.networking import ClientLocalServerResources
from hydrus.client.networking import ClientNetworkingContexts
+from hydrus.test import HelperFunctions
+
CBOR_AVAILABLE = False
try:
import cbor2
@@ -2506,6 +2511,417 @@ class TestClientAPI( unittest.TestCase ):
self.assertEqual( boned_stats, dict( expected_data ) )
+ def _test_manage_duplicates( self, connection, set_up_permissions ):
+
+ # these calls lean heavily on db read requests, which aren't tested in this class, but we can test the arg parsing and the wrappers
+
+ api_permissions = set_up_permissions[ 'everything' ]
+
+ access_key_hex = api_permissions.GetAccessKey().hex()
+
+ headers = { 'Hydrus-Client-API-Access-Key' : access_key_hex }
+
+ default_location_context = ClientLocation.LocationContext.STATICCreateSimple( CC.COMBINED_LOCAL_MEDIA_SERVICE_KEY )
+
+ # file relationships
+
+ file_relationships_hash = bytes.fromhex( 'ac940bb9026c430ea9530b4f4f6980a12d9432c2af8d9d39dfc67b05d91df11d' )
+
+ # yes the database returns hex hashes in this case
+ example_response = {
+ "file_relationships" : {
+ "ac940bb9026c430ea9530b4f4f6980a12d9432c2af8d9d39dfc67b05d91df11d" : {
+ "is_king" : False,
+ "king" : "8784afbfd8b59de3dcf2c13dc1be9d7cb0b3d376803c8a7a8b710c7c191bb657",
+ "0" : [
+ ],
+ "1" : [],
+ "3" : [
+ "8bf267c4c021ae4fd7c4b90b0a381044539519f80d148359b0ce61ce1684fefe"
+ ],
+ "8" : [
+ "8784afbfd8b59de3dcf2c13dc1be9d7cb0b3d376803c8a7a8b710c7c191bb657",
+ "3fa8ef54811ec8c2d1892f4f08da01e7fc17eed863acae897eb30461b051d5c3"
+ ]
+ }
+ }
+ }
+
+ HG.test_controller.SetRead( 'file_relationships_for_api', example_response )
+
+ path = '/manage_file_relationships/get_file_relationships?hash={}'.format( file_relationships_hash.hex() )
+
+ connection.request( 'GET', path, headers = headers )
+
+ response = connection.getresponse()
+
+ data = response.read()
+
+ text = str( data, 'utf-8' )
+
+ self.assertEqual( response.status, 200 )
+
+ d = json.loads( text )
+
+ self.assertEqual( d[ 'file_relationships' ], example_response )
+
+ [ ( args, kwargs ) ] = HG.test_controller.GetRead( 'file_relationships_for_api' )
+
+ ( location_context, hashes ) = args
+
+ self.assertEqual( location_context, default_location_context )
+ self.assertEqual( hashes, { file_relationships_hash } )
+
+ # set up the default and test search parameters
+
+ tag_context = ClientSearch.TagContext( CC.COMBINED_TAG_SERVICE_KEY )
+ predicates = { ClientSearch.Predicate( ClientSearch.PREDICATE_TYPE_SYSTEM_EVERYTHING ) }
+
+ default_file_search_context = ClientSearch.FileSearchContext( location_context = default_location_context, tag_context = tag_context, predicates = predicates )
+
+ default_potentials_search_type = CC.DUPE_SEARCH_ONE_FILE_MATCHES_ONE_SEARCH
+ default_pixel_duplicates = CC.SIMILAR_FILES_PIXEL_DUPES_ALLOWED
+ default_max_hamming_distance = 4
+
+ test_tag_service_key_1 = CC.DEFAULT_LOCAL_TAG_SERVICE_KEY
+ test_tags_1 = [ 'skirt', 'system:width<400' ]
+
+ test_tag_context_1 = ClientSearch.TagContext( test_tag_service_key_1 )
+ test_predicates_1 = ClientLocalServerResources.ConvertTagListToPredicates( None, test_tags_1, do_permission_check = False )
+
+ test_file_search_context_1 = ClientSearch.FileSearchContext( location_context = default_location_context, tag_context = test_tag_context_1, predicates = test_predicates_1 )
+
+ test_tag_service_key_2 = HG.test_controller.example_tag_repo_service_key
+ test_tags_2 = [ 'system:untagged' ]
+
+ test_tag_context_2 = ClientSearch.TagContext( test_tag_service_key_2 )
+ test_predicates_2 = ClientLocalServerResources.ConvertTagListToPredicates( None, test_tags_2, do_permission_check = False )
+
+ test_file_search_context_2 = ClientSearch.FileSearchContext( location_context = default_location_context, tag_context = test_tag_context_2, predicates = test_predicates_2 )
+
+ test_potentials_search_type = CC.DUPE_SEARCH_BOTH_FILES_MATCH_DIFFERENT_SEARCHES
+ test_pixel_duplicates = CC.SIMILAR_FILES_PIXEL_DUPES_EXCLUDED
+ test_max_hamming_distance = 8
+
+ # get count
+
+ HG.test_controller.SetRead( 'potential_duplicates_count', 5 )
+
+ path = '/manage_file_relationships/get_potentials_count'
+
+ connection.request( 'GET', path, headers = headers )
+
+ response = connection.getresponse()
+
+ data = response.read()
+
+ text = str( data, 'utf-8' )
+
+ self.assertEqual( response.status, 200 )
+
+ d = json.loads( text )
+
+ self.assertEqual( d[ 'potential_duplicates_count' ], 5 )
+
+ [ ( args, kwargs ) ] = HG.test_controller.GetRead( 'potential_duplicates_count' )
+
+ ( file_search_context_1, file_search_context_2, potentials_search_type, pixel_duplicates, max_hamming_distance ) = args
+
+ self.assertEqual( file_search_context_1.GetSerialisableTuple(), default_file_search_context.GetSerialisableTuple() )
+ self.assertEqual( file_search_context_2.GetSerialisableTuple(), default_file_search_context.GetSerialisableTuple() )
+ self.assertEqual( potentials_search_type, default_potentials_search_type )
+ self.assertEqual( pixel_duplicates, default_pixel_duplicates )
+ self.assertEqual( max_hamming_distance, default_max_hamming_distance )
+
+ # get count with params
+
+ HG.test_controller.SetRead( 'potential_duplicates_count', 5 )
+
+ path = '/manage_file_relationships/get_potentials_count?tag_service_key_1={}&tags_1={}&tag_service_key_2={}&tags_2={}&potentials_search_type={}&pixel_duplicates={}&max_hamming_distance={}'.format(
+ test_tag_service_key_1.hex(),
+ urllib.parse.quote( json.dumps( test_tags_1 ) ),
+ test_tag_service_key_2.hex(),
+ urllib.parse.quote( json.dumps( test_tags_2 ) ),
+ test_potentials_search_type,
+ test_pixel_duplicates,
+ test_max_hamming_distance
+ )
+
+ connection.request( 'GET', path, headers = headers )
+
+ response = connection.getresponse()
+
+ data = response.read()
+
+ text = str( data, 'utf-8' )
+
+ self.assertEqual( response.status, 200 )
+
+ d = json.loads( text )
+
+ self.assertEqual( d[ 'potential_duplicates_count' ], 5 )
+
+ [ ( args, kwargs ) ] = HG.test_controller.GetRead( 'potential_duplicates_count' )
+
+ ( file_search_context_1, file_search_context_2, potentials_search_type, pixel_duplicates, max_hamming_distance ) = args
+
+ self.assertEqual( file_search_context_1.GetSerialisableTuple(), test_file_search_context_1.GetSerialisableTuple() )
+ self.assertEqual( file_search_context_2.GetSerialisableTuple(), test_file_search_context_2.GetSerialisableTuple() )
+ self.assertEqual( potentials_search_type, test_potentials_search_type )
+ self.assertEqual( pixel_duplicates, test_pixel_duplicates )
+ self.assertEqual( max_hamming_distance, test_max_hamming_distance )
+
+ # get pairs
+
+ default_max_num_pairs = 250
+ test_max_num_pairs = 20
+
+ test_hash_pairs = [ ( os.urandom( 32 ), os.urandom( 32 ) ) for i in range( 10 ) ]
+ test_media_result_pairs = [ ( HelperFunctions.GetFakeMediaResult( h1 ), HelperFunctions.GetFakeMediaResult( h2 ) ) for ( h1, h2 ) in test_hash_pairs ]
+ test_hash_pairs_hex = [ [ h1.hex(), h2.hex() ] for ( h1, h2 ) in test_hash_pairs ]
+
+ HG.test_controller.SetRead( 'duplicate_pairs_for_filtering', test_media_result_pairs )
+
+ path = '/manage_file_relationships/get_potential_pairs'
+
+ connection.request( 'GET', path, headers = headers )
+
+ response = connection.getresponse()
+
+ data = response.read()
+
+ text = str( data, 'utf-8' )
+
+ self.assertEqual( response.status, 200 )
+
+ d = json.loads( text )
+
+ self.assertEqual( d[ 'potential_duplicate_pairs' ], test_hash_pairs_hex )
+
+ [ ( args, kwargs ) ] = HG.test_controller.GetRead( 'duplicate_pairs_for_filtering' )
+
+ ( file_search_context_1, file_search_context_2, potentials_search_type, pixel_duplicates, max_hamming_distance ) = args
+
+ max_num_pairs = kwargs[ 'max_num_pairs' ]
+
+ self.assertEqual( file_search_context_1.GetSerialisableTuple(), default_file_search_context.GetSerialisableTuple() )
+ self.assertEqual( file_search_context_2.GetSerialisableTuple(), default_file_search_context.GetSerialisableTuple() )
+ self.assertEqual( potentials_search_type, default_potentials_search_type )
+ self.assertEqual( pixel_duplicates, default_pixel_duplicates )
+ self.assertEqual( max_hamming_distance, default_max_hamming_distance )
+ self.assertEqual( max_num_pairs, default_max_num_pairs )
+
+ # get pairs with params
+
+ HG.test_controller.SetRead( 'duplicate_pairs_for_filtering', test_media_result_pairs )
+
+ path = '/manage_file_relationships/get_potential_pairs?tag_service_key_1={}&tags_1={}&tag_service_key_2={}&tags_2={}&potentials_search_type={}&pixel_duplicates={}&max_hamming_distance={}&max_num_pairs={}'.format(
+ test_tag_service_key_1.hex(),
+ urllib.parse.quote( json.dumps( test_tags_1 ) ),
+ test_tag_service_key_2.hex(),
+ urllib.parse.quote( json.dumps( test_tags_2 ) ),
+ test_potentials_search_type,
+ test_pixel_duplicates,
+ test_max_hamming_distance,
+ test_max_num_pairs
+ )
+
+ connection.request( 'GET', path, headers = headers )
+
+ response = connection.getresponse()
+
+ data = response.read()
+
+ text = str( data, 'utf-8' )
+
+ self.assertEqual( response.status, 200 )
+
+ d = json.loads( text )
+
+ self.assertEqual( d[ 'potential_duplicate_pairs' ], test_hash_pairs_hex )
+
+ [ ( args, kwargs ) ] = HG.test_controller.GetRead( 'duplicate_pairs_for_filtering' )
+
+ ( file_search_context_1, file_search_context_2, potentials_search_type, pixel_duplicates, max_hamming_distance ) = args
+
+ max_num_pairs = kwargs[ 'max_num_pairs' ]
+
+ self.assertEqual( file_search_context_1.GetSerialisableTuple(), test_file_search_context_1.GetSerialisableTuple() )
+ self.assertEqual( file_search_context_2.GetSerialisableTuple(), test_file_search_context_2.GetSerialisableTuple() )
+ self.assertEqual( potentials_search_type, test_potentials_search_type )
+ self.assertEqual( pixel_duplicates, test_pixel_duplicates )
+ self.assertEqual( max_hamming_distance, test_max_hamming_distance )
+ self.assertEqual( max_num_pairs, test_max_num_pairs )
+
+ # get random
+
+ test_hashes = [ os.urandom( 32 ) for i in range( 6 ) ]
+ # note: these are individual hashes, not pairs; the variable name is reused from the pairs test above for the hex comparison
+ test_hash_pairs_hex = [ h.hex() for h in test_hashes ]
+
+ HG.test_controller.SetRead( 'random_potential_duplicate_hashes', test_hashes )
+
+ path = '/manage_file_relationships/get_random_potentials'
+
+ connection.request( 'GET', path, headers = headers )
+
+ response = connection.getresponse()
+
+ data = response.read()
+
+ text = str( data, 'utf-8' )
+
+ self.assertEqual( response.status, 200 )
+
+ d = json.loads( text )
+
+ self.assertEqual( d[ 'random_potential_duplicate_hashes' ], test_hash_pairs_hex )
+
+ [ ( args, kwargs ) ] = HG.test_controller.GetRead( 'random_potential_duplicate_hashes' )
+
+ ( file_search_context_1, file_search_context_2, potentials_search_type, pixel_duplicates, max_hamming_distance ) = args
+
+ self.assertEqual( file_search_context_1.GetSerialisableTuple(), default_file_search_context.GetSerialisableTuple() )
+ self.assertEqual( file_search_context_2.GetSerialisableTuple(), default_file_search_context.GetSerialisableTuple() )
+ self.assertEqual( potentials_search_type, default_potentials_search_type )
+ self.assertEqual( pixel_duplicates, default_pixel_duplicates )
+ self.assertEqual( max_hamming_distance, default_max_hamming_distance )
+
+ # get random with params
+
+ HG.test_controller.SetRead( 'random_potential_duplicate_hashes', test_hashes )
+
+ path = '/manage_file_relationships/get_random_potentials?tag_service_key_1={}&tags_1={}&tag_service_key_2={}&tags_2={}&potentials_search_type={}&pixel_duplicates={}&max_hamming_distance={}'.format(
+ test_tag_service_key_1.hex(),
+ urllib.parse.quote( json.dumps( test_tags_1 ) ),
+ test_tag_service_key_2.hex(),
+ urllib.parse.quote( json.dumps( test_tags_2 ) ),
+ test_potentials_search_type,
+ test_pixel_duplicates,
+ test_max_hamming_distance
+ )
+
+ connection.request( 'GET', path, headers = headers )
+
+ response = connection.getresponse()
+
+ data = response.read()
+
+ text = str( data, 'utf-8' )
+
+ self.assertEqual( response.status, 200 )
+
+ d = json.loads( text )
+
+ self.assertEqual( d[ 'random_potential_duplicate_hashes' ], test_hash_pairs_hex )
+
+ [ ( args, kwargs ) ] = HG.test_controller.GetRead( 'random_potential_duplicate_hashes' )
+
+ ( file_search_context_1, file_search_context_2, potentials_search_type, pixel_duplicates, max_hamming_distance ) = args
+
+ self.assertEqual( file_search_context_1.GetSerialisableTuple(), test_file_search_context_1.GetSerialisableTuple() )
+ self.assertEqual( file_search_context_2.GetSerialisableTuple(), test_file_search_context_2.GetSerialisableTuple() )
+ self.assertEqual( potentials_search_type, test_potentials_search_type )
+ self.assertEqual( pixel_duplicates, test_pixel_duplicates )
+ self.assertEqual( max_hamming_distance, test_max_hamming_distance )
+
+ # set relationship
+
+ # this is tricky to test fully
+
+ HG.test_controller.ClearWrites( 'duplicate_pair_status' )
+
+ HG.test_controller.ClearReads( 'media_result' )
+
+ hashes = {
+ 'b54d09218e0d6efc964b78b070620a1fa19c7e069672b4c6313cee2c9b0623f2',
+ 'bbaa9876dab238dcf5799bfd8319ed0bab805e844f45cf0de33f40697b11a845',
+ '22667427eaa221e2bd7ef405e1d2983846c863d40b2999ce8d1bf5f0c18f5fb2',
+ '65d228adfa722f3cd0363853a191898abe8bf92d9a514c6c7f3c89cfed0bf423',
+ '0480513ffec391b77ad8c4e57fe80e5b710adfa3cb6af19b02a0bd7920f2d3ec',
+ '5fab162576617b5c3fc8caabea53ce3ab1a3c8e0a16c16ae7b4e4a21eab168a7'
+ }
+
+ # TODO: test actual content merge: set some content merge options and populate these fakes with real tags
+ # don't need to be too clever, just test one thing and we know it'll all be hooked up right
+ HG.test_controller.SetRead( 'media_results', [ HelperFunctions.GetFakeMediaResult( bytes.fromhex( hash_hex ) ) for hash_hex in hashes ] )
+
+ headers = { 'Hydrus-Client-API-Access-Key' : access_key_hex, 'Content-Type' : HC.mime_mimetype_string_lookup[ HC.APPLICATION_JSON ] }
+
+ path = '/manage_file_relationships/set_file_relationships'
+
+ test_pair_rows = [
+ [ 4, "b54d09218e0d6efc964b78b070620a1fa19c7e069672b4c6313cee2c9b0623f2", "bbaa9876dab238dcf5799bfd8319ed0bab805e844f45cf0de33f40697b11a845", False, False, True ],
+ [ 4, "22667427eaa221e2bd7ef405e1d2983846c863d40b2999ce8d1bf5f0c18f5fb2", "65d228adfa722f3cd0363853a191898abe8bf92d9a514c6c7f3c89cfed0bf423", False, False, True ],
+ [ 2, "0480513ffec391b77ad8c4e57fe80e5b710adfa3cb6af19b02a0bd7920f2d3ec", "5fab162576617b5c3fc8caabea53ce3ab1a3c8e0a16c16ae7b4e4a21eab168a7", False, False, False ]
+ ]
+
+ request_dict = { 'pair_rows' : test_pair_rows }
+
+ request_body = json.dumps( request_dict )
+
+ connection.request( 'POST', path, body = request_body, headers = headers )
+
+ response = connection.getresponse()
+
+ data = response.read()
+
+ self.assertEqual( response.status, 200 )
+
+ [ ( args, kwargs ) ] = HG.test_controller.GetWrite( 'duplicate_pair_status' )
+
+ ( written_rows, ) = args
+
+ # builds the expected service_keys_to_content_updates list for an optional file delete
+ def delete_thing( h, do_it ):
+
+ if do_it:
+
+ c = collections.defaultdict( list )
+
+ c[ b'local files' ] = [ HydrusData.ContentUpdate( HC.CONTENT_TYPE_FILES, HC.CONTENT_UPDATE_DELETE, { bytes.fromhex( h ) }, reason = 'From Client API (duplicates processing).' ) ]
+
+ return [ c ]
+
+ else:
+
+ return []
+
+
+
+ expected_written_rows = [ ( duplicate_type, bytes.fromhex( hash_a_hex ), bytes.fromhex( hash_b_hex ), delete_thing( hash_b_hex, delete_second ) ) for ( duplicate_type, hash_a_hex, hash_b_hex, merge, delete_first, delete_second ) in test_pair_rows ]
+
+ self.assertEqual( written_rows, expected_written_rows )
+
+ # set kings
+
+ HG.test_controller.ClearWrites( 'duplicate_set_king' )
+
+ headers = { 'Hydrus-Client-API-Access-Key' : access_key_hex, 'Content-Type' : HC.mime_mimetype_string_lookup[ HC.APPLICATION_JSON ] }
+
+ path = '/manage_file_relationships/set_kings'
+
+ test_hashes = [
+ "b54d09218e0d6efc964b78b070620a1fa19c7e069672b4c6313cee2c9b0623f2",
+ "bbaa9876dab238dcf5799bfd8319ed0bab805e844f45cf0de33f40697b11a845"
+ ]
+
+ request_dict = { 'hashes' : test_hashes }
+
+ request_body = json.dumps( request_dict )
+
+ connection.request( 'POST', path, body = request_body, headers = headers )
+
+ response = connection.getresponse()
+
+ data = response.read()
+
+ self.assertEqual( response.status, 200 )
+
+ [ ( args1, kwargs1 ), ( args2, kwargs2 ) ] = HG.test_controller.GetWrite( 'duplicate_set_king' )
+
+ self.assertEqual( { args1[0], args2[0] }, { bytes.fromhex( h ) for h in test_hashes } )
+
+
def _test_manage_pages( self, connection, set_up_permissions ):
api_permissions = set_up_permissions[ 'manage_pages' ]
@@ -4183,6 +4599,7 @@ class TestClientAPI( unittest.TestCase ):
self._test_add_tags( connection, set_up_permissions )
self._test_add_tags_search_tags( connection, set_up_permissions )
self._test_add_urls( connection, set_up_permissions )
+ self._test_manage_duplicates( connection, set_up_permissions )
self._test_manage_cookies( connection, set_up_permissions )
self._test_manage_pages( connection, set_up_permissions )
self._test_search_files( connection, set_up_permissions )
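The test code above drives the new duplicates endpoints over a raw HTTP connection. For anyone wiring up external dupe processing against them, the request shapes can be sketched as small pure helpers; a minimal sketch, where the helper names are illustrative and not part of hydrus, but the endpoint paths and parameter names come straight from the tests above:

```python
import json
import urllib.parse

def build_get_potential_pairs_path( tag_service_key_hex, tags, max_num_pairs ):
    
    # mirrors the query string the unit test constructs:
    # tags go in as a url-encoded JSON list, the service key as hex
    params = {
        'tag_service_key_1' : tag_service_key_hex,
        'tags_1' : urllib.parse.quote( json.dumps( tags ) ),
        'max_num_pairs' : max_num_pairs
    }
    
    query = '&'.join( '{}={}'.format( key, value ) for ( key, value ) in params.items() )
    
    return '/manage_file_relationships/get_potential_pairs?' + query

def build_set_relationships_body( pair_rows ):
    
    # each row is [ duplicate_type, hash_a_hex, hash_b_hex, do_default_content_merge, delete_first, delete_second ]
    return json.dumps( { 'pair_rows' : pair_rows } )
```

These only build the path and JSON body; the actual call still needs the `Hydrus-Client-API-Access-Key` header and, for the POST, an `application/json` content type, as in the test above.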
diff --git a/setup_venv.bat b/setup_venv.bat
index fa58b6fc..0b69ad0c 100644
--- a/setup_venv.bat
+++ b/setup_venv.bat
@@ -39,6 +39,7 @@ SET /P install_type=Do you want the (s)imple or (a)dvanced install?
IF "%install_type%" == "s" goto :create
IF "%install_type%" == "a" goto :question_qt
+IF "%install_type%" == "d" goto :create
goto :parse_fail
:question_qt
@@ -98,7 +99,23 @@ IF "%install_type%" == "s" (
python -m pip install -r requirements.txt
-) ELSE (
+)
+
+IF "%install_type%" == "d" (
+
+ python -m pip install -r static\requirements\advanced\requirements_core.txt
+
+ python -m pip install -r static\requirements\advanced\requirements_qt6_test.txt
+ python -m pip install PySide2
+ python -m pip install PyQtChart PyQt5
+ python -m pip install PyQt6-Charts PyQt6
+ python -m pip install -r static\requirements\advanced\requirements_new_mpv.txt
+ python -m pip install -r static\requirements\advanced\requirements_new_opencv.txt
+ python -m pip install -r static\requirements\hydev\requirements_windows_build.txt
+
+)
+
+IF "%install_type%" == "a" (
python -m pip install -r static\requirements\advanced\requirements_core.txt