In general, the API deals with standard UTF-8 JSON. POST requests and 200 OK responses are generally going to be a JSON 'Object' with variable names as keys and values obviously as values. There are examples throughout this document. For GET requests, everything is in standard GET parameters, but some variables are complicated and will need to be JSON encoded and then URL encoded. An example would be the 'tags' parameter on [GET /get\_files/search\_files](#get_files_search_files), which is a list of strings. Since GET http URLs have limits on what characters are allowed, but hydrus tags can have all sorts of characters, you'll be doing this:
On 200 OK, the API returns JSON for everything except actual file/thumbnail requests. On 4XX and 5XX, assume it will return plain text, which may be a raw traceback that I'd be interested in seeing. You'll typically get 400 for a missing parameter, 401/403/419 for missing/insufficient/expired access, and 500 for a real deal serverside error.
For any request sent to the API, the total size of the initial request line (this includes the URL and any parameters) and the headers must not be larger than 2 megabytes.
Exceeding this limit will cause the request to fail. Make sure to use pagination if you are passing very large JSON arrays as parameters in a GET request.
The API now tentatively supports CBOR, which is basically 'byte JSON'. If you are in a lower level language or need to do a lot of heavy work quickly, try it out!
To send CBOR, for POST put Content-Type `application/cbor` in your request header instead of `application/json`, and for GET just add a `cbor=1` parameter to the URL string. Use CBOR to encode any parameters that you would previously put in JSON:
For POST requests, just print the pure bytes in the body, like this:
If you send CBOR, the client will return CBOR. If you want to send CBOR and get JSON back, or _vice versa_ (or you are uploading a file and can't set CBOR Content-Type), send the Accept request header, like so:
```
Accept: application/cbor
Accept: application/json
```
If the client does not support CBOR, you'll get 406.
The client gives access to its API through different 'access keys', which are the typical 64-character hex used in many other places across hydrus. Each guarantees different permissions such as handling files or tags. Most of the time, a user will provide full access, but do not assume this. If the access header or parameter is not provided, you will get 401, and all insufficient permission problems will return 403 with appropriate error text.
Access is required for every request. You can provide this as an http header, like so:
Or you can include it in the normal parameters of any request (except _POST /add\_files/add\_file_, which uses the entire POST body for the file's bytes). For GET, this means including it into the URL parameters:
There is also a simple 'session' system, where you can get a temporary key that gives the same access without having to include the permanent access key in every request. You can fetch a session key with the [/session_key](#session_key) command and thereafter use it just as you would an access key, just with _Hydrus-Client-API-Session-Key_ instead.
Session keys will expire if they are not used within 24 hours, or if the client is restarted, or if the underlying access key is deleted. An invalid/expired session key will give a **419** result with an appropriate error text.
Bear in mind the Client API is still under construction. Setting up the Client API to be accessible across the internet requires technical experience to be convenient. HTTPS is available for encrypted comms, but the default certificate is self-signed (which basically means an eavesdropper can't see through it, but your ISP/government could if they decided to target you). If you have your own domain to host from and an SSL cert, you can replace them and it'll use them instead (check the db directory for client.crt and client.key). Otherwise, be careful about transmitting sensitive content outside of your localhost/network.
When you are searching, you may want to specify a particular file domain. Most of the time, you'll want to just set `file_service_key`, but this can get complex:
Arguments:
:
*`file_service_key`: (optional, selective A, hexadecimal, the file domain on which to search)
*`file_service_keys`: (optional, selective A, list of hexadecimals, the union of file domains on which to search)
*`deleted_file_service_key`: (optional, selective B, hexadecimal, the 'deleted from this file domain' on which to search)
*`deleted_file_service_keys`: (optional, selective B, list of hexadecimals, the union of 'deleted from this file domain' on which to search)
The service keys are as in [/get\_services](#get_services).
Hydrus supports two concepts here:
* Searching over a UNION of subdomains. If the user has several local file domains, e.g. 'favourites', 'personal', 'sfw', and 'nsfw', they might like to search two of them at once.
* Searching deleted files of subdomains. You can specifically, and quickly, search the files that have been deleted from somewhere.
You can play around with this yourself by clicking 'multiple locations' in the client with _help->advanced mode_ on.
In extreme edge cases, these two can be mixed by populating both A and B selective, making a larger union of both current and deleted file records.
Please note that unions can be very very computationally expensive. If you can achieve what you want with a single file_service_key, two queries in a row with different service keys, or an umbrella like `all my files` or `all local files`, please do. Otherwise, let me know what is running slow and I'll have a look at it.
'deleted from all local files' includes all files that have been physically deleted (i.e. deleted from the trash) and not available any more for fetch file/thumbnail requests. 'deleted from all my files' includes all of those physically deleted files _and_ the trash. If a file is deleted with the special 'do not leave a deletion record' command, then it won't show up in a 'deleted from file domain' search!
'all known files' is a tricky domain. It converts much of the search tech to ignore where files actually are and look at the accompanying tag domain (e.g. all the files that have been tagged), and can sometimes be very expensive.
Also, if you have the option to set both file and tag domains, you cannot enter 'all known files'/'all known tags'. It is too complicated to support, sorry!
The Client API used to respond to name-based service identifiers, for instance using 'my tags' instead of something like '6c6f63616c2074616773'. Service names can change, and they aren't _strictly_ unique either, so I have moved away from them, but there is some soft legacy support.
The client will attempt to convert any of these to their 'service_key(s)' equivalents:
* file_service_name
* tag_service_name
* service_names_to_tags
* service_names_to_actions_to_tags
* service_names_to_additional_tags
But I strongly encourage you to move away from them as soon as reasonably possible. Look up the service keys you need with [/get\_service](#get_service) or [/get\_services](#get_services).
If you have a clever script/program that does many things, then hit up [/get\_services](#get_services) on session initialisation and cache an internal map of key_to_name for the labels to use when you present services to the user.
Also, note that all users can now copy their service keys from _review services_.
: Some simple JSON describing the current api version (and hydrus client version, if you are interested).
: Note that this is mostly obselete now, since the 'Server' header of every response (and a duplicated 'Hydrus-Server' one, if you have a complicated proxy situation that overwrites 'Server') are now in the form "client api/{client_api_version} ({software_version})", e.g. "client api/32 (497)".
_Register a new external program with the client. This requires the 'add from api request' mini-dialog under_ services->review services _to be open, otherwise it will 403._
Restricted access: NO.
Required Headers: n/a
Arguments:
: * `name`: (descriptive name of your access)
*`basic_permissions`: A JSON-encoded list of numerical permission identifiers you want to request.
Note that the access you provide to get a new session key **can** be a session key, if that happens to be useful. As long as you have some kind of access, you can generate a new session key.
A session key expires after 24 hours of inactivity, whenever the client restarts, or if the underlying access key is deleted. A request on an expired session key returns 419.
: YES. At least one of Add Files, Add Tags, Manage Pages, or Search Files permission needed.
Required Headers: n/a
Arguments:
:
*`service_name`: (selective, string, the name of the service)
*`service_key`: (selective, hex string, the service key of the service)
Example requests:
:
```title="Example requests"
/get_service?service_name=my%20tags
/get_service?service_key=6c6f63616c2074616773
```
Response:
: Some JSON about the service. The same basic format as [/get\_services](#get_services)
```json title="Example response"
{
"service" : {
"name" : "my tags",
"service_key" : "6c6f63616c2074616773",
"type" : 5,
"type_pretty" : "local tag service"
}
}
```
If the service does not exist, this gives 404. It is very unlikely but edge-case possible that two services will have the same name, in this case you'll get the pseudorandom first.
It will only respond to services in the /get_services list. I will expand the available types in future as we add ratings etc... to the Client API.
Note that a user can rename their services, so while they will recognise `name`, it is not an excellent identifier, and definitely not something to save to any permanent config file.
`service_key` is non-mutable and is the main service identifier. The hardcoded/initial services have shorter fixed service key strings (it is usually just 'all known files' etc.. ASCII-converted to hex), but user-created services will have random 64-character hex.
Now that I state `type` and `type_pretty` here, I may rearrange this call, probably into a flat list. The `all_known_files` Object keys here are arbitrary.
For service `type`, you won't see all these, and you'll only ever need some, but the enum is:
: You can alternately just send the file's bytes as the POST body.
Response:
: Some JSON with the import result. Please note that file imports for large files may take several seconds, and longer if the client is busy doing other db work, so make sure your request is willing to wait that long for the response.
A file 'veto' is caused by the file import options (which in this case is the 'quiet' set under the client's _options->importing_) stopping the file due to its resolution or minimum file size rules, etc...
'hash' is the file's SHA256 hash in hexadecimal, and 'note' is some occasional additional human-readable text appropriate to the file status that you may recognise from hydrus's normal import workflow. For an import error, it will always be the full traceback.
If you specify a file service, the file will only be deleted from that location. Only local file domains are allowed (so you can't delete from a file repository or unpin from ipfs yet). It defaults to 'all my files', which will delete from all local services (i.e. force sending to trash). Sending 'all local files' on a file already in the trash will trigger a physical file delete.
This is the reverse of a delete_files--removing files from trash and putting them back where they came from. If you specify a file service, the files will only be undeleted to there (if they have a delete record, otherwise this is nullipotent). The default, 'all my files', undeletes to all local file services for which there are deletion records. There is no error if any of the files do not currently exist in 'trash'.
This puts files in the 'archive', taking them out of the inbox. It only has meaning for files currently in 'my files' or 'trash'. There is no error if any files do not currently exist or are already in the archive.
This puts files back in the inbox, taking them out of the archive. It only has meaning for files currently in 'my files' or 'trash'. There is no error if any files do not currently exist or are already in the inbox.
: Some JSON which files are known to be mapped to that URL. Note this needs a database hit, so it may be delayed if the client is otherwise busy. Don't rely on this to always be fast.
The `url_file_statuses` is a list of zero-to-n JSON Objects, each representing a file match the client found in its database for the URL. Typically, it will be of length 0 (for as-yet-unvisited URLs or Gallery/Watchable URLs that are not attached to files) or 1, but sometimes multiple files are given the same URL (sometimes by mistaken misattribution, sometimes by design, such as pixiv manga pages). Handling n files per URL is a pain but an unavoidable issue you should account for.
`status` is the same as for `/add_files/add_file`:
* 0 - File not in database, ready for import (you will only see this very rarely--usually in this case you will just get no matches)
* 2 - File already in database
* 3 - File previously deleted
`hash` is the file's SHA256 hash in hexadecimal, and 'note' is some occasional additional human-readable text you may recognise from hydrus's normal import workflow.
If you set `doublecheck_file_system` to `true`, then any result that is 'already in db' (2) will be double-checked against the actual file system. This check happens on any normal file import process, just to check for and fix missing files (if the file is missing, the status becomes 0--new), but the check can take more than a few milliseconds on an HDD or a network drive, so the default behaviour, assuming you mostly just want to spam for 'seen this before' file statuses, is to not do it.
'Unknown' URLs are treated in the client as direct File URLs. Even though the 'File URL' type is available, most file urls do not have a URL Class, so they will appear as Unknown. Adding them to the client will pass them to the URL Downloader as a raw file for download and import.
If you specify a `destination_page_name` and an appropriate importer page already exists with that name, that page will be used. Otherwise, a new page with that name will be recreated (and used by subsequent calls with that name). Make sure it that page name is unique (e.g. '/b/ threads', not 'watcher') in your client, or it may not be found.
Alternately, `destination_page_key` defines exactly which page should be used. Bear in mind this page key is only valid to the current session (they are regenerated on client reset or session reload), so you must figure out which one you want using the [/manage\_pages/get\_pages](#manage_pages_get_pages) call. If the correct page_key is not found, or the page it corresponds to is of the incorrect type, the standard page selection/creation rules will apply.
`show_destination_page` defaults to False to reduce flicker when adding many URLs to different pages quickly. If you turn it on, the client will behave like a URL drag and drop and select the final page the URL ends up on.
`service_keys_to_additional_tags` uses the same data structure as in /add\_tags/add\_tags--service keys to a list of tags to add. You will need 'add tags' permission or this will 403. These tags work exactly as 'additional' tags work in a _tag import options_. They are service specific, and always added unless some advanced tag import options checkbox (like 'only add tags to new files') is set.
filterable_tags works like the tags parsed by a hydrus downloader. It is just a list of strings. They have no inherant service and will be sent to a _tag import options_, if one exists, to decide which tag services get what. This parameter is useful if you are pulling all a URL's tags outside of hydrus and want to have them processed like any other downloader, rather than figuring out service names and namespace filtering on your end. Note that in order for a tag import options to kick in, I think you will have to have a Post URL URL Class hydrus-side set up for the URL so some tag import options (whether that is Class-specific or just the default) can be loaded at import time.
*`url_to_add`: (optional, selective A, an url you want to associate with the file(s))
*`urls_to_add`: (optional, selective A, a list of urls you want to associate with the file(s))
*`url_to_delete`: (optional, selective B, an url you want to disassociate from the file(s))
*`urls_to_delete`: (optional, selective B, a list of urls you want to disassociate from the file(s))
* [files](#parameters_files)
The single/multiple arguments work the same--just use whatever is convenient for you. Unless you really know what you are doing with URL Classes, I strongly recommend you stick to associating URLs with just one single 'hash' at a time. Multiple hashes pointing to the same URL is unusual and frequently unhelpful.
: 200 with no content. Like when adding tags, this is safely idempotent--do not worry about re-adding URLs associations that already exist or accidentally trying to delete ones that don't.
Mostly, hydrus simply trims excess whitespace, but the other examples are rare issues you might run into. 'system' is an invalid namespace, tags cannot be prefixed with hyphens, and any tag starting with ':' is secretly dealt with internally as "\[no namespace\]:\[colon-prefixed-subtag\]". Again, you probably won't run into these, but if you see a mismatch somewhere and want to figure it out, or just want to sort some numbered tags, you might like to try this.
*`search`: (the tag text to search for, enter exactly what you would in the client UI)
* [file domain](#parameters_file_domain) (optional, defaults to 'all my files')
*`tag_service_key`: (optional, hexadecimal, the tag domain on which to search, defaults to 'all known tags')
*`tag_display_type`: (optional, string, to select whether to search raw or sibling-processed tags, defaults to 'storage')
The `file domain` and `tag_service_key` perform the function of the file and tag domain buttons in the client UI.
The `tag_display_type` can be either `storage` (the default), which searches your file's stored tags, just as they appear in a 'manage tags' dialog, or `display`, which searches the sibling-processed tags, just as they appear in a normal file search page. In the example above, setting the `tag_display_type` to `display` could well combine the two kim possible tags and give a count of 3 or 4.
'all my files'/'all known tags' works fine for most cases, but a specific tag service or 'all known files'/'tag service' can work better for editing tag repository `storage` contexts, since it provides results just for that service, and for repositories, it gives tags for all the non-local files other users have tagged.
The `tags` list will be sorted by descending count. The various rules in _tags->manage tag display and search_ (e.g. no pure `*` searches on certain services) will also be checked--and if violated, you will get 200 OK but an empty result.
In 'service\_keys\_to...', the keys are as in [/get\_services](#get_services). You may need some selection UI on your end so the user can pick what to do if there are multiple choices.
Also, you can use either '...to\_tags', which is simple and add-only, or '...to\_actions\_to\_tags', which is more complicated and allows you to remove/petition or rescind pending content.
* 4 - Petition from a tag repository. (This is special)
* 5 - Rescind a petition from a tag repository.
When you petition a tag from a repository, a 'reason' for the petition is typically needed. If you send a normal list of tags here, a default reason of "Petitioned from API" will be given. If you want to set your own reason, you can instead give a list of \[ tag, reason \] pairs.
This last example is far more complicated than you will usually see. Pend rescinds and petition rescinds are not common. Petitions are also quite rare, and gathering a good petition reason for each tag is often a pain.
Note that the enumerated status keys in the service\_keys\_to\_actions\_to_tags structure are strings, not ints (JSON does not support int keys for Objects).
Note also that hydrus tag actions are safely idempotent. You can pend a tag that is already pended, or add a tag that already exists, and not worry about an error--the surplus add action will be discarded. The same is true if you try to pend a tag that actually already exists, or rescinding a petition that doesn't. Any invalid actions will fail silently.
It is fine to just throw your 'process this' tags at every file import and not have to worry about checking which files you already added them to.
!!! danger "HOWEVER"
When you delete a tag, a deletion record is made _even if the tag does not exist on the file_. This is important if you expect to add the tags again via parsing, because, in general, when hydrus adds tags through a downloader, it will not overwrite a previously 'deleted' tag record (this is to stop re-downloads overwriting the tags you hand-removed previously). Undeletes usually have to be done manually by a human.
So, _do_ be careful about how you spam delete unless it is something that doesn't matter or it is something you'll only be touching again via the API anyway.
With `merge_cleverly` left `false`, then this is a simple update operation. Existing notes will be overwritten exactly as you specify. Any other notes the file has will be untouched.
If you turn on `merge_cleverly`, then the client will merge your new notes into the file's existing notes using the same logic you have seen in Note Import Options and the Duplicate Metadata Merge Options. This navigates conflict resolution, and you should use it if you are adding potential duplicate content from an 'automatic' source like a parser and do not want to wade into the logic. Do not use it for a user-editing experience (a user expects a strict overwrite/replace experience and will be confused by this mode).
To start off, in this mode, if your note text exists under a different name for the file, your dupe note will not be added to your new name. `extend_existing_note_if_possible` makes it so your existing note text will overwrite an existing name (or a '... (1)' rename of that name) if the existing text is inside your given text. `conflict_resolution` is an enum governing what to do in all other conflicts:
: 200 with the note changes actually sent through. If `merge_cleverly=false`, this is exactly what you gave, and this operation is idempotent. If `merge_cleverly=true`, then this may differ, even be empty, and this operation might not be idempotent.
File search in hydrus is not paginated like a booru--all searches return all results in one go. In order to keep this fast, search is split into two steps--fetching file identifiers with a search, and then fetching file metadata in batches. You may have noticed that the client itself performs searches like this--thinking a bit about a search and then bundling results in batches of 256 files before eventually throwing all the thumbnails on screen.
If the access key's permissions only permit search for certain tags, at least one positive whitelisted/non-blacklisted tag must be in the "tags" list or this will 403. Tags can be prepended with a hyphen to make a negated tag (e.g. "-green eyes"), but these will not be checked against the permissions whitelist.
Wildcards and namespace searches are supported, so if you search for 'character:sam*' or 'series:*', this will be handled correctly clientside.
**Many system predicates are also supported using a text parser!** The parser was designed by a clever user for human input and allows for a certain amount of error (e.g. ~= instead of ≈, or "isn't" instead of "is not") or requires more information (e.g. the specific hashes for a hash lookup). **Here's a big list of examples that are supported:**
Please test out the system predicates you want to send. If you are in _help->advanced mode_, you can test this parser in the advanced text input dialog when you click the OR\* button on a tag autocomplete dropdown. More system predicate types and input formats will be available in future. Reverse engineering system predicate data from text is obviously tricky. If a system predicate does not parse, you'll get 400.
The file and tag services are for search domain selection, just like clicking the buttons in the client. They are optional--default is 'all my files' and 'all known tags'.
File searches occur in the `display``tag_display_type`. If you want to pair autocomplete tag lookup from [/search_tags](#add_tags_search_tags) to this file search (e.g. for making a standard booru search interface), then make sure you are searching `display` tags there.
file\_sort\_asc is 'true' for ascending, and 'false' for descending. The default is descending.
file\_sort\_type is by default _import time_. It is an integer according to the following enum, and I have written the semantic (asc/desc) meaning for each type after:
You can of course also specify `return_hashes=true&return_file_ids=false` just to get the hashes. The order of both lists is the same.
File ids are internal and specific to an individual client. For a client, a file with hash H always has the same file id N, but two clients will have different ideas about which N goes with which H. IDs are a bit faster to retrieve than hashes and search with _en masse_, which is why they are exposed here.
This search does **not** apply the implicit limit that most clients set to all searches (usually 10,000), so if you do system:everything on a client with millions of files, expect to get boshed. Even with a system:limit included, complicated queries with large result sets may take several seconds to respond. Just like the client itself.
*`hashes`: (selective, a list of hexadecimal hashes)
*`source_hash_type`: [sha256|md5|sha1|sha512] (optional, defaulting to sha256)
*`desired_hash_type`: [sha256|md5|sha1|sha512]
If you have some MD5 hashes and want to see what their SHA256 are, or _vice versa_, this is the place. Hydrus records the non-SHA256 hashes for every file it has ever imported. This data is not removed on file deletion.
: A mapping Object of the successful lookups. Where no matching hash is found, no entry will be made (therefore, if none of your source hashes have matches on the client, this will return an empty `hashes` Object).
This request string can obviously get pretty ridiculously long. It also takes a bit of time to fetch metadata from the database. In its normal searches, the client usually fetches file metadata in batches of 256.
`is_trashed` means if the file is currently in the trash but available on the hard disk. `is_deleted` means currently either in the trash or completely deleted from disk.
`file_services` stores which file services the file is <i>current</i>ly in and _deleted_ from. The entries are by the service key, same as for tags later on. In rare cases, the timestamps may be `null`, if they are unknown (e.g. a `time_deleted` for the file deleted before this information was tracked). The `time_modified` can also be null. Time modified is just the filesystem modified time for now, but it will evolve into more complicated storage in future with multiple locations (website post times) that'll be aggregated to a sensible value in UI.
`ipfs_multihashes` stores the ipfs service key to any known multihash for the file.
The `thumbnail_width` and `thumbnail_height` are a generally reliable prediction but aren't a promise. The actual thumbnail you get from [/get\_files/thumbnail](#get_files_thumbnail) will be different if the user hasn't looked at it since changing their thumbnail options. You only get these rows for files that hydrus actually generates an actual thumbnail for. Things like pdf won't have it. You can use your own thumb, or ask the api and it'll give you a fixed fallback; those are mostly 200x200, but you can and should size them to whatever you want.
The 'tags' structures are undergoing transition. Previously, this was a mess of different Objects in different domains, all `service_xxx_to_xxx_tags`, but they are being transitioned to the combined `tags` Object.
`hide_service_keys_tags` is deprecated and will be deleted soon. When set to `false`, it shows the old `service_keys_to_statuses_to_tags` and `service_keys_to_statuses_to_display_tags` Objects.
While the 'storage_tags' represent the actual tags stored on the database for a file, 'display_tags' reflect how tags appear in the UI, after siblings are collapsed and parents are added. If you want to edit a file's tags, refer to the storage tags. If you want to render to the user, use the display tags. The display tag calculation logic is very complicated; if the storage tags change, do not try to guess the new display tags yourself--just ask the API again.
If you ask with hashes rather than file_ids, hydrus will, by default, only return results when it has seen those hashes before. This is to stop the client making thousands of new file_id records in its database if you perform a scanning operation. If you ask about a hash the client has never encountered before--for which there is no file_id--you will get this style of result:
You can change this behaviour with `create_new_file_ids=true`, but bear in mind you will get a fairly 'empty' metadata result with lots of 'null' lines, so this is only useful for gathering the numerical ids for later Client API work.
If you ask about any file_ids that do not exist, you'll get 404.
If you set `only_return_basic_information=true`, this will be much faster for first-time requests than the full metadata result, but it will be slower for repeat requests. The full metadata object is cached after first fetch, the limited file info object is not.
If you add `detailed_url_information=true`, a new entry, `detailed_known_urls`, will be added for each file, with a list of the same structure as /`add_urls/get_url_info`. This may be an expensive request if you are querying thousands of files at once.
: YES. Search for Files permission needed. Additional search permission limits may apply.
Required Headers: n/a
Arguments :
:
*`file_id`: (numerical file id for the file)
*`hash`: (a hexadecimal SHA256 hash for the file)
Only use one. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id in the last search you ran.
: YES. Search for Files permission needed. Additional search permission limits may apply.
Required Headers: n/a
Arguments:
:
*`file_id`: (numerical file id for the file)
*`hash`: (a hexadecimal SHA256 hash for the file)
Only use one. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id in the last search you ran.
If hydrus keeps no thumbnail for the filetype, for instance with pdfs, then you will get the same default 'pdf' icon you see in the client. If the file does not exist in the client, or the thumbnail was expected but is missing from storage, you will get the fallback 'hydrus' icon, again just as you would in the client itself. This request should never give a 404.
!!! note
If you get a 'default' filetype thumbnail like the pdf or hydrus one, you will be pulling the defaults straight from the hydrus/static folder. They will most likely be 200x200 pixels.
This refers to the File Relationships system, which includes 'potential duplicates', 'duplicates', and 'alternates'.
This system is pending significant rework and expansion, so please do not get too married to some of the routines here. I am mostly just exposing my internal commands, so things are a little ugly/hacked. I expect duplicate and alternate groups to get some form of official identifier in future, which may end up being the way to refer and edit things here.
Also, at least for now, 'Manage File Relationships' permission is not going to be bound by the search permission restrictions that normal file search does. Getting this file relationship management permission allows you to search anything.
`king` refers to which file is set as the best of a duplicate group. If you are doing potential duplicate comparisons, the kings of your two groups are usually the ideal representatives, and the 'get some pairs to filter'-style commands try to select the kings of the various to-be-compared duplicate groups. `is_king` is a convenience bool for when a file is king of its own group.
**It is possible for the king to not be available.** Every group has a king, but if that file has been deleted, or if the file domain here is limited and the king is on a different file service, then it may not be available. A similar issue occurs when you search for filtering pairs--while it is ideal to compare kings with kings, if you set 'files must be pixel dupes', then the user will expect to see those pixel duplicates, not their champions--you may be forced to compare non-kings. `king_is_on_file_domain` lets you know if the king is on the file domain you set, and `king_is_local` lets you know if it is on the hard disk--if `king_is_local=true`, you can do a `/get_files/file` request on it. It is generally rare, but you have to deal with the king being unavailable--in this situation, your best bet is to just use the file itself as its own representative.
All the relationships you get are filtered by the file domain. If you set the file domain to 'all known files', you will get every relationship a file has, including all deleted files, which is often less useful than you would think. The default, 'all my files' is usually most useful.
A file that has no duplicates is considered to be in a duplicate group of size 1 and thus is always its own king.
The numbers are from a duplicate status enum, as so:
* 0 - potential duplicates
* 1 - false positives
* 3 - alternates
* 8 - duplicates
Note that because of JSON constraints, these are the string versions of the integers since they are Object keys.
All the hashes given here are in 'all my files', i.e. not in the trash. A file may have duplicates that have long been deleted, but, like the null king above, they will not show here.
_Get the count of remaining potential duplicate pairs in a particular search domain. Exactly the same as the counts you see in the duplicate processing page._
`tag_service_key_x` and `tags_x` work the same as [/get\_files/search\_files](#get_files_search_files). The `_2` variants are only useful if the `potentials_search_type` is 2.
`potentials_search_type` and `pixel_duplicates` are enums:
* 0 - one file matches search 1
* 1 - both files match search 1
* 2 - one file matches search 1, the other 2
-and-
* 0 - must be pixel duplicates
* 1 - can be pixel duplicates
* 2 - must not be pixel duplicates
The `max_hamming_distance` is the same 'search distance' you see in the Client UI. A higher number means more speculative 'similar files' search. If `pixel_duplicates` is set to 'must be', then `max_hamming_distance` is obviously ignored.
Response:
: A JSON Object stating the count.
``` json title="Example response"
{
"potential_duplicates_count" : 17
}
```
If you confirm that a pair of potentials are duplicates, this may transitively collapse other potential pairs and decrease the count by more than 1.
The selected pair sample and their order is strictly hardcoded for now (e.g. to guarantee that a decision will not invalidate any other pair in the batch, you shouldn't see the same file twice in a batch, nor two files in the same duplicate group). Treat it as the client filter does, where you fetch batches to process one after another. I expect to make it more flexible in future, in the client itself and here.
You will see significantly fewer than `max_num_pairs` (and potential duplicate count) as you close to the last available pairs, and when there are none left, you will get an empty list.
_Get some random potentially duplicate file hashes. Exactly the same as the 'show some random potential dupes' button in the duplicate processing page._
The arguments work the same as [/manage\_file\_relationships/get\_potentials\_count](#manage_file_relationships_get_potentials_count), with the caveat that `potentials_search_type` has special logic:
* 0 - first file matches search 1
* 1 - all files match search 1
* 2 - first file matches search 1, the others 2
Essentially, the first hash is the 'master' to which the others are paired. The other files will include every matching file.
Response:
: A JSON Object listing a group of hashes exactly as the client would.
2, 4, and 7 all make the files 'duplicates' (8 under `/get_file_relationships`), which, specifically, merges the two files' duplicate groups. 'same quality' has different duplicate content merge options to the better/worse choices, but it ultimately sets something similar to A>B (but see below for more complicated outcomes). You obviously don't have to use 'B is better' if you prefer just to swap the hashes. Do what works for you.
`do_default_content_merge` sets whether the user's duplicate content merge options should be loaded and applied to the files along with the relationship. Most operations in the client do this automatically, so the user may expect it to apply, but if you want to do content merge yourself, set this to false.
`delete_a` and `delete_b` are booleans that select whether to delete A and/or B in the same operation as setting the relationship. You can also do this externally if you prefer.
If you try to add an invalid or redundant relationship, for instance setting files that are already duplicates as potential duplicates, no changes are made.
This is the file relationships request that is probably most likely to change in future. I may implement content merge options. I may move from file pairs to group identifiers. When I expand alternates, those file groups are going to support more variables.
Recall in `/get_file_relationships` that we discussed how duplicate groups have a 'king' for their best file. This file is the most useful representative when you do comparisons, since if you say "King A > King B", then we know that King A is also better than all of King B's normal duplicate group members. We can merge the group simply just by folding King B and all the other members into King A's group.
So what happens if you say 'A = B'? We have to have a king, so which should it be?
What happens if you say "non-king member of A > non-king member of B"? We don't want to merge all of B into A, since King B might be higher quality than King A.
The logic here can get tricky, but I have tried my best to avoid overcommitting and accidentally promoting the wrong king. Here are all the possible situations ('>' means 'better than', and '=' means 'same quality as'):
??? abstract "Merges"
* King A > King B
* Merge B into A
* King A is king of the new combined group
* Non-King A > King B
* Merge B into A
* King of A is king of the new combined group
* King A > Non-King B
* Remove Non-King B from B and merge it into A
* King A stays king of A
* King of B stays king of B
* Non-King A > Non-King B
* Remove Non-King B from B and merge it into A
* King of A stays king of A
* King of B stays king of B
* King A = King B
* Merge B into A
* King A is king of the new combined group
* Non-King A = King B
* Merge B into A
* King of A is king of the new combined group
* King A = Non-King B
* Merge A into B
* King of B is king of the new combined group
* Non-King A = Non-King B
* Remove Non-King B from B and merge it into A
* King of A stays king of A
* King of B stays king of B
So, if you can, always present kings to your users, and action using those kings' hashes. It makes the merge logic easier in all cases. Remember that you can set `system:is the best quality file of its duplicate group` in any file search to exclude any non-kings (e.g. if you are hunting for easily actionable pixel potential duplicates).
The files will be promoted to be the kings of their respective duplicate groups. If the file is already the king (also true for any file with no duplicates), this is idempotent. It also processes the files in the given order, so if you specify two files in the same group, the latter will be the king at the end of the request.
## Managing Cookies and HTTP Headers
This refers to the cookies held in the client's session manager, which are sent with network requests to different domains.
Note that these variables are all strings except 'expires', which is either an integer timestamp or _null_ for session cookies.
This request will also return any cookies for subdomains. The session system in hydrus generally stores cookies according to the second-level domain, so if you request for specific.someoverbooru.net, you will still get the cookies for someoverbooru.net and all its subdomains.
Set some new cookies for the client. This makes it easier to 'copy' a login from a web browser or similar to hydrus if hydrus's login system can't handle the site yet.
Restricted access:
: YES. Manage Cookies permission needed.
Required Headers:
:
*`Content-Type`: application/json
Arguments (in JSON):
:
*`cookies`: (a list of cookie rows in the same format as the GET request above)
This sets the 'Global' User-Agent for the client, as typically editable under _network->data->manage http headers_, for instance if you want hydrus to appear as a specific browser associated with some cookies.
_Get the page structure of the current UI session._
Restricted access:
: YES. Manage Pages permission needed.
Required Headers: n/a
Arguments: n/a
Response:
: A JSON Object of the top-level page 'notebook' (page of pages) detailing its basic information and current sub-pages. Page of pages beneath it will list their own sub-page lists.
`page_key` is a unique identifier for the page. It will stay the same for a particular page throughout the session, but new ones are generated on a session reload.
Most pages will be 0, normal/ready, at all times. Large pages will start in an 'initialising' state for a few seconds, which means their session-saved thumbnails aren't loaded yet. Search pages will enter 'searching' after a refresh or search change and will either return to 'ready' when the search is complete, or fall to 'search cancelled' if the search was interrupted (usually this means the user clicked the 'stop' button that appears after some time).
`selected` means which page is currently in view. It will propagate down the page of pages until it terminates. It may terminate in an empty page of pages, so do not assume it will end on a media page.
The top page of pages will always be there, and always selected.
This is under construction. The current call dumps a ton of info for different downloader pages. Please experiment in IRL situations and give feedback for now! I will flesh out this help with more enumeration info and examples as this gets nailed down. POST commands to alter pages (adding, removing, highlighting), will come later.
Restricted access:
: YES. Manage Pages permission needed.
Required Headers: n/a
Arguments:
:
*`page_key`: (hexadecimal page\_key as stated in [/manage\_pages/get\_pages](#manage_pages_get_pages))
*`simple`: true or false (optional, defaulting to true)
As you can see, even the 'simple' mode can get very large. Imagine that response for a page watching 100 threads! Turning simple mode off will display every import item, gallery log entry, and all hashes in the media (thumbnail) panel.
For this first version, the five importer pages--hdd import, simple downloader, url downloader, gallery page, and watcher page--all give rich info based on their specific variables. The first three only have one importer/gallery log combo, but the latter two of course can have multiple. The "imports" and "gallery_log" entries are all in the same data format.
The files you set will be appended to the given page, just like a thumbnail drag and drop operation. The page key is the same as fetched in the [/manage\_pages/get\_pages](#manage_pages_get_pages) call.
_Refresh a page in the main GUI. Like hitting F5 in the client, this obviously makes file search pages perform their search again, but for other page types it will force the currently in-view files to be re-sorted._
Restricted access:
: YES. Manage Pages permission needed.
Required Headers:
:
*`Content-Type`: application/json
Arguments (in JSON):
:
*`page_key`: (the page key for the page you wish to refresh)
The page key is the same as fetched in the [/manage\_pages/get\_pages](#manage_pages_get_pages) call. If a file search page is not set to 'searching immediately', a 'refresh' command does nothing.
Poll the `page_state` in [/manage\_pages/get\_pages](#manage_pages_get_pages) or [/manage\_pages/get\_page\_info](#manage_pages_get_page_info) to see when the search is complete.
_Pause the client's database activity and disconnect the current connection._
Restricted access:
: YES. Manage Database permission needed.
Arguments: None
This is a hacky prototype. It commands the client database to pause its job queue and release its connection (and related file locks and journal files). This puts the client in a similar position as a long VACUUM command--it'll hang in there, but not much will work, and since the UI async code isn't great yet, the UI may lock up after a minute or two. If you would like to automate database backup without shutting the client down, this is the thing to play with.
This should return pretty quick, but it will wait up to five seconds for the database to actually disconnect. If there is a big job (like a VACUUM) current going on, it may take substantially longer to finish that up and process this STOP command. You might like to check for the existence of a journal file in the db dir just to be safe.
As long as this lock is on, all Client API calls except the unlock command will return 503. (This is a decent way to test the current lock status, too)
_Reconnect the client's database and resume activity._
Restricted access:
: YES. Manage Database permission needed.
Arguments: None
This is the obvious complement to the lock. The client will resume processing its job queue and will catch up. If the UI was frozen, it should free up in a few seconds, just like after a big VACUUM.
_Get the data from help->how boned am I?. This is a simple Object of numbers just for hacky advanced purposes if you want to build up some stats in the background. The numbers are the same as the dialog shows, so double check that to confirm what means what._