diff --git a/changelog.html b/changelog.html index 1771be76..a05c43af 100644 --- a/changelog.html +++ b/changelog.html @@ -2850,7 +2850,6 @@
The hydrus network client is a desktop application written for Anonymous and other internet enthusiasts with large media collections. It organises your files into an internal database and browses them with tags instead of folders, a little like a booru on your desktop. Tags and files can be anonymously shared through custom servers that any user may run. Everything is free, nothing phones home, and the source code is included with the release. It is developed mostly for Windows, but builds for Linux and macOS are available (perhaps with some limitations, depending on your situation).
The software is constantly being improved. I try to put out a new release every Wednesday by 8pm Eastern.
Hydrus supports various filetypes for images, video and audio files, image project files, and more. A full list of supported filetypes is here.
On the Windows and Linux builds, an MPV window is embedded to play video and audio smoothly. For files like pdf, which cannot currently be viewed in the client, it is easy to launch any file with your OS's default program.
The client can download files and parse tags from a number of websites, including by default:
And can be extended to download from more locations using easily shareable user-made downloaders. It can also be set to 'subscribe' to any gallery search, repeating it every few days to keep up with new results.
The program's emphasis is on your freedom. There is no DRM, no spying, no censorship. The program never phones home.
"},{"location":"index.html#start_here","title":"Start Here","text":"If you would like to try hydrus, I strongly recommend you check out the help and getting started guide. It will take you through all the main systems.
"},{"location":"index.html#links","title":"links","text":"Killed
Add the following lines to the end of /etc/sysctl.conf
. You will need admin, so use
sudo nano /etc/sysctl.conf
or sudo gedit /etc/sysctl.conf
vm.min_free_kbytes=1153434\nvm.overcommit_memory=1\n
Check that you have (enough) swap space or you might still run out of memory.
sudo swapon --show\n
If you need swap
sudo fallocate -l 16G /swapfile #make 16GiB of swap\nsudo chmod 600 /swapfile\nsudo mkswap /swapfile\n
Add to /etc/fstab
so your swap is mounted on reboot /swapfile swap swap defaults 0 0\n
You may add as many swapfiles as you like, and should add a new swapfile before you delete an old one if you plan to do so, as unmounting a swapfile will evict its contents back into real memory. You may also wish to use a swapfile type that uses compression; this saves some disk space for a small performance hit, and also saves significantly on mostly empty memory.
Reboot for all changes to take effect, or use sysctl
to set vm
variables.
Linux's memory allocator is lazy and does not perform opportunistic reclaim. This means that the system will continue to give your process memory from the real and virtual memory pool (swap) until there is none left.
Linux will only clean up if the available total real and virtual memory falls below the watermark as defined in the system control configuration file /etc/sysctl.conf
. The watermark's name is vm.min_free_kbytes
, it is the number of kilobytes the system keeps in reserve, and therefore the maximum amount of memory the system can allocate in one go before needing to reclaim memory it gave earlier but which is no longer in use.
The default value is vm.min_free_kbytes=65536
, which means 64MiB (mebibytes).
If for a given request the amount of memory asked to be allocated is under vm.min_free_kbytes
, but this would result in an amount of total free memory less than vm.min_free_kbytes
then the OS will clean up memory to service the request.
If vm.min_free_kbytes
is less than the amount requested and there is no virtual memory left, then the system is officially unable to service the request and will launch the OOM killer (Out of Memory killer) to free memory by killing memory-hungry processes.
Increase the vm.min_free_kbytes
value to prevent this scenario.
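The values here are given in kilobytes, which makes them hard to read at a glance. The following is a minimal Python sketch, not part of hydrus, that converts a `vm.min_free_kbytes` value into MiB and GiB; the function name `kbytes_to_mib` is made up for illustration.

```python
def kbytes_to_mib(kbytes: int) -> float:
    """Convert a sysctl value expressed in kilobytes (1024 bytes) to MiB."""
    return kbytes / 1024


# The default quoted in this guide, and the value suggested for sysctl.conf.
for value in (65536, 1153434):
    mib = kbytes_to_mib(value)
    print(f"vm.min_free_kbytes={value} -> {mib:.1f} MiB ({mib / 1024:.2f} GiB)")
```

Running it before you edit the file is a quick way to confirm that the reserve you are about to set is the size you intended.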
The OOM killer decides which program to kill to reclaim memory. Since hydrus loves memory it is usually picked first, even if another program asking for memory caused the OOM condition. Setting the minimum free kilobytes higher will avoid running the OOM killer, which is always preferable and almost always preventable.
"},{"location":"Fixing_Hydrus_Random_Crashes_Under_Linux.html#memory_overcommmit","title":"Memory Overcommmit","text":"We mentioned that Linux will keep giving out memory, but actually it's possible for Linux to launch the OOM killer if it just feels like our program is asking for too much memory too quickly. Since hydrus is a heavyweight scientific processing package we need to turn this feature off. To turn it off change the value of vm.overcommit_memory
which defaults to 0
.
Set vm.overcommit_memory=1
this prevents the OS from using a heuristic and it will just always give memory to anyone who asks for it.
Swappiness is a setting you might have seen, but it only determines Linux's desire to spend a little bit of time moving memory you haven't touched in a while out of real memory and into virtual memory. It will not prevent the OOM condition; it just determines how much time to use for moving things into swap.
"},{"location":"Fixing_Hydrus_Random_Crashes_Under_Linux.html#why_does_my_linux_system_studder_or_become_unresponsive_when_hydrus_has_been_running_a_while","title":"Why does my Linux system studder or become unresponsive when hydrus has been running a while?","text":"You are running out of pages because Linux releases I/O buffer pages only when a file is closed. Thus the OS is waiting for you to hit the watermark (as described in \"why is hydrus crashing\") to start freeing pages, which causes the chug. When content is written from memory to disk the page is retained, so that if you reread that part of the disk the OS does not need to access the disk; it just pulls it from the much faster memory. This is usually a good thing, but Hydrus does not close database files so it eats up pages over time. This is really good for hydrus but sucks for the responsiveness of other apps, and will cause hydrus to consume pages after doing a lengthy operation in anticipation of needing them again, even when it is thereafter idle. You need to set vm.dirtytime_expire_seconds
to a lower value.
vm.dirtytime_expire_seconds
When a lazytime inode is constantly having its pages dirtied, the inode with an updated timestamp will never get chance to be written out. And, if the only thing that has happened on the file system is a dirtytime inode caused by an atime update, a worker will be scheduled to make sure that inode eventually gets pushed out to disk. This tunable is used to define when dirty inode is old enough to be eligible for writeback by the kernel flusher threads. And, it is also used as the interval to wakeup dirtytime writeback thread.
On many distros this happens only once every 12 hours; try setting it closer to every one or two hours. This will cause the OS to drop pages that were written over 1-2 hours ago, returning them to the free store for use by other programs.
https://www.kernel.org/doc/Documentation/sysctl/vm.txt
"},{"location":"Fixing_Hydrus_Random_Crashes_Under_Linux.html#why_does_everything_become_clunky_for_a_bit_if_i_have_tuned_all_of_the_above_settings","title":"Why does everything become clunky for a bit if I have tuned all of the above settings?","text":"The kernel launches a process called kswapd
to swap and reclaim memory pages. Its behaviour is governed by the following two values:
vm.vfs_cache_pressure
The tendency for the kernel to reclaim I/O cache for files and directories. Default=100, set to 110 to bias the kernel into reclaiming I/O pages over keeping them at a \"fair rate\" compared to other pages. Hydrus tends to write a lot of files and then ignore them for a long time, so it's a good idea to prefer freeing pages for infrequent I/O. Note: Increasing vfs_cache_pressure
significantly beyond 100 may have negative performance impact. Reclaim code needs to take various locks to find freeable directory and inode objects. With vfs_cache_pressure=1000
, it will look for ten times more freeable objects than there are.
watermark_scale_factor
This factor controls the aggressiveness of kswapd. It defines the amount of memory left in a node/system before kswapd is woken up and how much memory needs to be free before kswapd goes back to sleep. The unit is in fractions of 10,000. The default value of 10 means the distances between watermarks are 0.1% of the available memory in the node/system. The maximum value is 1000, or 10% of memory. A high rate of threads entering direct reclaim (allocstall) or kswapd going to sleep prematurely (kswapd_low_wmark_hit_quickly) can indicate that the number of free pages kswapd maintains for latency reasons is too small for the allocation bursts occurring in the system. This knob can then be used to tune kswapd aggressiveness accordingly.
I like to keep watermark_scale_factor
at 70 (70/10,000)=0.7%, so kswapd will run until at least 0.7% of system memory has been reclaimed. i.e. If 32GiB (real and virt) of memory, it will try to keep at least 0.224 GiB immediately available.
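The arithmetic behind that figure can be written as a short Python sketch. This is only an illustration of the "fractions of 10,000" rule quoted above; the function name is hypothetical, and the upper bound of 1000 is the one stated in the quoted kernel text.

```python
def kswapd_reserve_gib(total_memory_gib: float, watermark_scale_factor: int) -> float:
    """Memory kswapd tries to keep free, with the factor in units of 1/10,000."""
    if not 0 < watermark_scale_factor <= 1000:
        raise ValueError("watermark_scale_factor should be between 1 and 1000")
    return total_memory_gib * watermark_scale_factor / 10_000


# Compare the kernel default (10) with the value suggested here (70)
# on a machine with 32 GiB of real plus virtual memory.
for factor in (10, 70):
    print(f"factor {factor}: {kswapd_reserve_gib(32, factor):.3f} GiB kept free")
```

Swap in your own total memory to see what a given factor would mean on your machine.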
An example /etc/sysctl.conf section for virtual memory settings.
########\n# virtual memory\n########\n\n#1 always overcommit, prevents the kernel from using a heuristic to decide that a process is bad for asking for a lot of memory at once and killing it.\n#https://www.kernel.org/doc/Documentation/vm/overcommit-accounting\nvm.overcommit_memory=1\n\n#force linux to reclaim pages if under a gigabyte \n#is available so large chunk allocates don't fire off the OOM killer\nvm.min_free_kbytes = 1153434\n\n#Start freeing up pages that have been written but which are in open files, after 2 hours.\n#Allows pages in long lived files to be reclaimed\nvm.dirtytime_expire_seconds = 7200\n\n#Have kswapd try to reclaim .7% = 70/10000 of pages before returning to sleep\n#This increases responsiveness by reclaiming a larger portion of pages in low memory condition\n#So that the next time you make a large allocation the kernel doesn't have to stall and look for pages to free immediately.\nvm.watermark_scale_factor=70\n\n#Have the kernel prefer to reclaim I/O pages at 110% of the rate at which it frees other pages.\n#Don't set this value much over 100 or the kernel will spend all its time reclaiming I/O pages\nvm.vfs_cache_pressure=110\n
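If you want to double-check a file like this before rebooting, a small Python sketch along these lines can list what it would set. It is an illustrative helper and not a hydrus or kernel tool: `parse_sysctl_conf` and `EXAMPLE` are made-up names, and the parser only handles the simple `key = value` lines and `#` or `;` comments used in this example.

```python
def parse_sysctl_conf(text: str) -> dict[str, str]:
    """Return the key/value settings found in sysctl.conf-style text."""
    settings: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", ";")):
            continue
        key, separator, value = line.partition("=")
        if separator:
            settings[key.strip()] = value.strip()
    return settings


EXAMPLE = """\
# virtual memory
vm.overcommit_memory=1
vm.min_free_kbytes = 1153434
vm.dirtytime_expire_seconds = 7200
vm.watermark_scale_factor=70
vm.vfs_cache_pressure=110
"""

for key, value in parse_sysctl_conf(EXAMPLE).items():
    print(f"{key} -> {value}")
```

To check a real file, read its contents with `open("/etc/sysctl.conf").read()` and pass that text to the same function.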
"},{"location":"PTR.html","title":"PTR for Dummies","text":"or Myths and facts about the Public Tag Repository
"},{"location":"PTR.html#what_is_the_ptr","title":"What is the PTR?","text":"Short for Public Tag Repository, a now community managed repository of tags. Locally it acts as a tag service, just like my tags
. At the time of writing 54 million files have tags on it. The PTR only stores the sha256 hash and tag mappings of a file, not the files themselves or any non-tag metadata. In other words: if you do not see it in the tag list then it is not stored.
Most of the things in this document also apply to self-hosted servers, except for tag guidelines.
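Because tags are keyed on the sha256 hash of a file, it helps to see how that hash behaves. The sketch below uses Python's standard `hashlib`; the byte strings are placeholder data rather than real files, and `sha256_hex` is a name chosen for this example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the sha256 hash of some file content as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Placeholder content standing in for an original file and an altered copy.
original = b"example image bytes"
altered = b"example image bytes, recompressed"

print(sha256_hex(original))
print(sha256_hex(altered))
print("same hash:", sha256_hex(original) == sha256_hex(altered))
```

Any change to the bytes of a file, however small, gives a different hash, so tags stored against one version of a file will not match an altered copy.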
"},{"location":"PTR.html#connecting_to_the_ptr","title":"Connecting to the PTR","text":"The easiest method is to use the built in function, found under help -> add the public tag repository
. For adding it manually, if you so desire, read the Hydrus help document on access keys.
Once you are connected, Hydrus will proceed to download and then process the update files. The progress of this can be seen under services -> review services -> remote -> tag repositories -> public tag repository
. Here you can view its status, your account (the default account is a shared public account. Currently only janitors and the administrator have personal accounts), tag status, and how synced you are. Being behind on the sync by a certain amount makes you unable to push tags and petitions until you are caught up again.
QuickSync 2
If you are starting out with a completely fresh client, you can instead download a fully pre-synced client here. Though a little out of date, it will nonetheless save time. Some settings may differ from the defaults of an official installation.
"},{"location":"PTR.html#how_does_it_work","title":"How does it work?","text":"For something to end up on the PTR it has to be pushed there. Tags can either be entered into the tag service manually by the user through the manage tags
window, or be routed there by a parser when downloading files. See parsing tags. Once tags have been entered into the PTR tag service they are pending until pushed. This is indicated by the pending ()
that will appear between tags
and help
in the menu bar. Here you can choose to either push your changes to the PTR or discard them.
When making petitions it is important to remember that janitors are only human. We do not necessarily know everything about every niche. We do not necessarily have the files you are making changes for and we will only see a blank thumbnail if we do not have the file. Explain why you are making a petition. Try and keep the number of files manageable. If a janitor at any point is unsure if the petition is correct they are likely to deny the entire petition rather than risk losing good tags. Some users have pushed changes regarding hundreds of tags over thousands of files at once, but due to disregarding PTR tagging practices or being lazy with justification the petition has been denied entirely. Or they have just been plain wrong, trying to impose frankly stupid tagging methods.
Furthermore, if you are two weeks out of sync with PTR you are unable to push additions or deletions until you're back within the threshold.
Q: Does this automagically tag my files? A: No. Until we get machine learning based auto-tagging nothing is truly automatic. All tags on the PTR were uploaded by another user, so if nobody uploaded tags associated with the hash of your file it won't have any tags in the PTR. Q: How good is the PTR at tagging [insert file format or thing from site here]? A: That depends largely on if there's a scrapable database of tags for whatever you're asking about. Anything that comes from a booru or site that supports tags is fairly likely to have something on the PTR. Original content on some obscure chan-style imageboard is less so. Q: Help! My files don't have any tags! What do!? A: As stated above, some things are just very likely to not have any tags. It is also possible that the files have been altered by whichever service you downloaded from. Imgur, Reddit, Discord, and many other sites and services recompress images to save space which might give it a different hash even if it looks indistinguishable from the original file. Use one of the IQDB lookup programs linked in Cuddle's wiki. Q: Why is my database so big!? This can't be right. A: It is working as intended. The size is because you are literally downloading and processing the entire tag database and history of the PTR. It is done this way to ensure redundancy and privacy. Redundancy because anybody with an up-to-date PTR sync can just start their own. Privacy because nobody can tell what files you have since you are downloading the tags for everything the PTR has. Q: Does that mean I can't do anything about the size? A: Correct. There are some plans to crunch the size through a few methods but there are a lot of other far more requested features being, well, requested. 
Speaking crassly if you are bothered by the size requirement of the PTR you probably don't have a big enough library to really benefit and would be better off just using the IQDB script."},{"location":"PTR.html#janitors","title":"Janitors","text":"Janitors are the people that review petitions. You can meet us at the community Discord to ask questions or see us bitch about some of the silly stuff boorus and users cause to end up in the PTR.
"},{"location":"PTR.html#tag_guidelines","title":"Tag Guidelines","text":"These are a mix of standard practice used by various boorus and changes made by Hydrus Developer and PTR users, ratified by the janitors that actually have to manage all of this. The \"full\" document is viewable at Cuddle's git repo. See Hydrus Developer's thoughts on a public tagging schema.
If you are looking to help out by tagging low tag-count files, remember to keep the tags objective, start simple by for example adding the characters/persons and big obvious things in the image or what else. Tagging every little thing and detail is a sure path to burnout. If you are looking to petition removal of tags then it is preferable to sibling common misspellings, underscores, and defunct tags rather than deleting them outright. The exception is for ambiguous tags where it is better to delete and replace with a less ambiguous tag. When deleting tags that don't belong in the image it can be helpful if you include a short description as to why. It's also helpful if you sanitise downloaded tags from sites with tagged galleries before pushing them to the PTR. For example Pixiv, where you can have a gallery of multiple images, each containing one character, and all of the characters being tagged. Consequently all images in that gallery will have all of the character tags despite no image having more than one character.
"},{"location":"PTR.html#siblings_and_parents","title":"Siblings and parents","text":"When making siblings, go for the closest less-bad tag. Example: bad_tag
-> bad tag
, rather than going for what the top level sibling might be. This creates less potential future work in case standards change and makes it so your request is less likely to be denied by a janitor not being entirely certain that what you're asking is right. Be careful about creating siblings for potentially ambiguous tags. Is james bond
supposed to be character:james bond
or is it series:james bond
? This is a bit of a bad example due to having the case of the character always belonging to the series, so you can safely sibling it to series:james bond
since all instances of the character will also have the series, but not all instances of the series will have the character. So let us look at another example: how about wool
? Is it the material harvested from sheep, or is it the Malaysian artist that likes to draw Touhou? In doubtful cases it's better to leave it as is, petition the tag for deletion if it's incorrect and add the correct tag.
When making parents, make sure it's an always factually correct relationship. character:james bond
always belongs to series:james bond
. But character:james bond
is not always person:pierce brosnan
. Common examples of not-always true relationships: gender (genderbending), species (furrynisation/humanisation/anthropomorphism), hair colour, eye colour, and other mutable traits.
creator: Used for the creator of the tagged piece of media. Hydrus being primarily used for images it will often be the artist that drew the image. Other potential examples are the author of a book or musician for a song.
character: Refers to characters. James Bond is a character.
person: Refers to real persons. Pierce Brosnan is a person.
series: Used for series. James Bond is a series tag and so is GoldenEye. Due to usage being different on some boorus chance is that you will also see things like Absolut Vodka and other brands in it.
photoset: Used for photosets. Primarily seen for content from idols, cosplayers, and gravure idols.
studio: Is used for the entity that facilitated the production of the file or what's in it. Eon Productions for the James Bond movies.
species: Species of the depicted characters/people/animals. Somewhat controversial for being needlessly detailed, some janitors not liking the namespace at all. Primarily used for furry content.
title: The title of the file. One of the tags Hydrus uses for various purposes such as sorting and collecting. Somewhat tainted by rampant Reddit parsers.
medium: Used for tags about the image and how it's made. Photography, water painting, napkin sketch as a few examples. White background, simple background, checkered background as a few others. What you see about the image.
meta: This namespace is used for information that isn't visible in the image itself or where you might need to go to the source. Some examples include: third-party edit, paid reward (patreon/enty/gumroad/fantia/fanbox), translated, commentary, and such. What you know about the image.
Namespaces not listed above are not \"supported\" by the janitors and are liable to get siblinged out, removed, and/or mocked if judged being bad and annoying enough to justify the work. Do not take this to mean that all un-listed namespaces are bad, some are created and used by parsers to indicate where an image came from which can be helpful if somebody else wants to fetch the original or check source tags against the PTR tags. But do exercise some care in what you put on the PTR if you use custom namespaces. Recently clothing: was removed due to being disliked, no booru using it, and the person(s) pushing for it seeming to have disappeared, leaving a less-than-finished mess behind. It was also rife with lossy siblings and things that just plain don't belong with clothing, such as clothing:brown hair.
Tuning your database synchronization using the --db_synchronous_override=0
launch argument can make Hydrus significantly faster with some caveats.
--db_synchronous_override=1
on any modern filesystem and this is the default.0
you are gambling, but it is a safe gamble if you have a backup and know exactly what you are doingsync
on *NIX systems, or normal shutdown), orsynchronous=0
, other I/O on your system will slow down as the pending writes are interleaved. Normal shutdown may also take abnormally long because the system is flushing these pending writes, but you must allow it to take its time as explained in the section below.Note: In historical versions of hydrus (synchronous=2
), performance was terrible because hydrus would aggressively (it was arguably somewhat paranoid) write changes to disk.
Setting synchronous to 0 lets the database engine defer writing to disk as long as physically possible. In the normal operation of your system, files are constantly being partially transferred to disk, even if the OS pretends they have been fully written to disk. This is called write cache and it is really important to use it or your system's performance would be terrible. The caveat is that until you have \"synced
\" the disk cache, the changes to files are not actually in permanent storage. One purpose of a normal shutdown of the operating system is to make sure all disk caches have been flushed and synced. A program can also request that a file it has just written to be flushed or synced, and it will wait until that is done before continuing.
When not in synchronous 0 mode, the database engine syncs at regular intervals to make sure data has been written. - Setting synchronous to 0 is generally safe if and only if the system also shuts down normally, allowing any of these pending writes to be flushed. - The database can back out of partial changes if hydrus crashes even if synchronous=0
, so your database will not go corrupt from hydrus shutting down abnormally, only from the system shutting down abnormally.
Programmers are responsible for handling partially written files, but this is tedious for large complex data, so they use a database engine which handles all of this. The database ensures that any partially written data is reversible to a known state (called a rollback).
An existing file may be in 3 possible states:
fflush(FILE)
. fflush()
is called automatically when a programmer closes a file, or exits the program normally(under most runtimes but not for example in Java). If the program exits abnormally before data is flushed it will be lost when the program crashes.fflush()
. When you \"safely shut down\", you are instructing the OS among other things to sync the flushed files. If someone decides to read a file before it has been synced the OS will read the contents up until the flush from the flush buffer, and return that instead of what is actually on disk. If the OS crashes due to error or power failure, data that are flushed but not synced will be lost.
To ensure the consistency of the database and roll back when needed, the database engine keeps a journal of what it is doing. Each transaction ends in a flush
followed by a sync
. The flush ensures that everything written before the flush will occur before the line that indicates the transaction completed. The sync ensures that the entire contents of the transaction have been written to permanent storage before proceeding. The OS is not obligated to write chunks of the database file in the order it receives them. It only guarantees that if you flush, everything before the flush happens first, and everything after happens next.
The sync is what is controlled by the synchronous
switch. Allowing the database to ignore whether sync actually completes is the magic that makes synchronous=0
so dang fast.
Each of these steps is performed in order. Suppose a crash occurred mid-write.
When the database resumes it will start scanning the journal at step 1. Since it will reach the end without seeing End Transaction 1
it knows that data was only partially written, and can put the data back in the state before transaction 1 began. This property of a database is called atomicity in the sense that something atomic is \"indivisible\"; either all of the steps in transaction 1 occur or none of them occur.
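The journal scan described above can be sketched in a few lines of Python. This is a toy model and not how any real database engine stores its journal: the entry format (`Begin Transaction N`, `Write Change N`, `End Transaction N`) and the function name are assumptions made for illustration.

```python
def incomplete_transactions(journal: list[str]) -> set[int]:
    """Return ids of transactions that began but never logged their End marker."""
    begun: set[int] = set()
    committed: set[int] = set()
    for entry in journal:
        words = entry.split()
        if len(words) != 3 or words[1] != "Transaction":
            continue  # an ordinary write, not a transaction marker
        transaction_id = int(words[2])
        if words[0] == "Begin":
            begun.add(transaction_id)
        elif words[0] == "End" and transaction_id in begun:
            committed.add(transaction_id)
    return begun - committed


# A crash happened mid-write: the End marker for transaction 1 never arrived.
crashed_journal = ["Begin Transaction 1", "Write Change 1", "Write Change 2"]
complete_journal = crashed_journal + ["End Transaction 1"]

print("roll back after crash:", incomplete_transactions(crashed_journal))
print("roll back after clean run:", incomplete_transactions(complete_journal))
```

Every transaction the function returns is one whose changes would be undone on restart, which is the all-or-nothing behaviour the text calls atomicity.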
Hydrus is structured in such a way that the database is only updated to track a file in your catalog once that file has been fully imported and moved where it is supposed to be. Thus every action hydrus takes is kept \"atomic\" or \"repeatable\" (redo existing work that was partway through). If hydrus crashes in the middle of importing a file, then when it resumes, as far as it is aware, it didn't even start importing the file. It will repeat the steps from the start until the file catalog is \"consistent\" with what is on disk.
"},{"location":"Understanding_Database_Synchronization.html#where_synchronization_comes_in","title":"Where synchronization comes in","text":"Let's revisit the journal, this time with two transactions. Note that the database is syncing on step 8 and thus will have to wait for the OS to write to disk before proceeding, holding up transaction 2, and any other access to the database.
What happens if we remove step 6 and 8 and then die at step 11?
What if we crash and step, End Transaction
has not been written to disk. Now not only do we need to repeat transaction 2, we also need to repeat transaction 1. Note that this just increases the amount of repeatable work, and actually is fully recoverable (assuming a file you were downloading didn't cease to exist in the interim).
Now what happens if we do the above and the OS crashes? The OS is not obligated to write chunks of the database file in the order you give them to it. In fact, for hard drives it is optimal to scatter chunks of the file around the spinning disks, so it might arbitrarily reorder your write calls.
END Transaction
is to flush()
END Transaction
was written before doing more changes is to sync()
. Thus if the OS crashes at the exact wrong moment, there is no way to be sure that the journal is correct if syncing was skipped (synchronous=0
). This means there is no way for you to determine whether the database file is correct after a system crash if you had synchronous 0, and you MUST restore your files from backup as this will be the ONLY WAY to know they are in a known good state.
So, setting synchronous=0
gets you a pretty huge speed boost, but you are gambling that everything goes perfectly and will pay the price of a manual restore every time it doesn't.
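To make the switch itself concrete, here is a hedged Python sketch using the standard `sqlite3` module. It assumes the database engine is SQLite and that the launch argument ends up as SQLite's `PRAGMA synchronous` setting, which this document does not state outright; the helper names and the `files` table are invented for the example.

```python
import os
import sqlite3
import tempfile


def open_with_synchronous(path: str, level: int) -> sqlite3.Connection:
    """Open a SQLite database and set its synchronous level (0, 1 or 2)."""
    if level not in (0, 1, 2):
        raise ValueError("level must be 0, 1 or 2")
    connection = sqlite3.connect(path)
    # Set right after connecting, before any transaction has started.
    connection.execute(f"PRAGMA synchronous = {level}")
    return connection


def current_synchronous(connection: sqlite3.Connection) -> int:
    """Read back the synchronous level the connection is using."""
    return connection.execute("PRAGMA synchronous").fetchone()[0]


with tempfile.TemporaryDirectory() as folder:
    db = open_with_synchronous(os.path.join(folder, "example.db"), 0)
    db.execute("CREATE TABLE files (hash TEXT PRIMARY KEY)")
    db.commit()  # with level 0 the commit does not wait for the disk sync
    print("synchronous =", current_synchronous(db))
    db.close()
```

The setting applies per connection, so it has to be set again each time the database is opened.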
The Hydrus docs are built with MkDocs using the Material for MkDocs theme. The .md files in the docs
directory are converted into nice html in the help
directory. This is done automatically in the built releases, but if you run from source, you will want to build your own.
To see or work on the docs locally, install mkdocs-material
:
The recommended installation method is pip
:
pip install mkdocs-material\n
"},{"location":"about_docs.html#building","title":"Building","text":"To build the help, run:
mkdocs build -d help\n
In the base hydrus directory (same as the mkdocs.yml
file), which will build it into the help
directory. You will then be good! Repeat the command and MkDocs will clear out the old directory and rebuild it, so you can fold this into any update script.
"},{"location":"about_docs.html#live_preview","title":"Live Preview","text":"To edit the docs
directory, you can run the live preview development server with:
mkdocs serve \n
Again in the base hydrus directory. It will host the help site at http://127.0.0.1:8000/, and when you change a file, it will automatically rebuild and reload the page in your browser.
"},{"location":"access_keys.html","title":"PTR access keys","text":"The PTR is now run by users with more bandwidth than I had to give, so the bandwidth limits are gone! If you would like to talk with the new management, please check the discord.
A guide and schema for the new PTR is here.
"},{"location":"access_keys.html#first_off","title":"first off","text":"I don't like it when programs I use connect anywhere without asking me, so I have purposely not pre-baked any default repositories into the client. You have to choose to connect yourself. The client will never connect anywhere until you tell it to.
For a long time, I ran the Public Tag Repository myself and was the lone janitor. It grew to 650 million tags, and siblings and parents were just getting complicated, and I no longer had the bandwidth or time it deserved. It is now run by users.
There also used to be just one user account that everyone shared. Everyone was essentially the same Anon, and all uploads were merged to that one ID. As the PTR became more popular, and more sophisticated and automatically generated content was being added, it became increasingly difficult for the janitors to separate good submissions from bad and undo large scale mistakes.
That old shared account is now a 'read-only' account. This account can only download--it cannot upload new tags or siblings/parents. Users who want to upload now generate their own individual accounts, which are still Anon, but separate, which helps janitors approve and deny uploaded petitions more accurately and efficiently.
I recommend using the shared read-only account, below, to start with, but if you decide you would like to upload, making your own account is easy--just click the 'check for automatic account creation' button in services->manage services, and you should be good. You can change your access key on an existing service--you don't need to delete and re-add or anything--and your client should quickly resync and recognise your new permissions.
"},{"location":"access_keys.html#privacy","title":"privacy","text":"I have tried very hard to ensure the PTR respects your privacy. Your account is a very barebones thing--all a server stores is a couple of random hexadecimal texts and which rows of content you uploaded, and even the memory of what you uploaded is deleted after a delay. The server obviously needs to be aware of your IP address to accept your network request, but it forgets it as soon as the job is done. Normal users are never told which accounts submitted any content, so the only privacy implications are against janitors or (more realistically, since the janitor UI is even more buggy and feature-poor than the hydrus front-end!) the server owner or anyone else with raw access to the server as it operates or its database files.
Most users should have very few worries about privacy. The general rule is that it is always healthy to use a VPN, but please check here for a full discussion and explanation of the anonymisation routine.
"},{"location":"access_keys.html#ssd","title":"a note on resources","text":"Danger
If you are on an HDD, or your SSD does not have at least 64GB of free space, do not add the PTR!
The PTR has been operating since 2011 and is now huge, more than a billion mappings! Your client will be downloading and indexing them all, which is currently (2021-06) about 6GB of bandwidth and 50GB of hard drive space. It will take hours of total processing time to catch up on all the years of submissions. Furthermore, because of mechanical drive latency, HDDs are too slow to process all the content in reasonable time. Syncing is only recommended if your hydrus db is on an SSD. Even then, it is healthier and allows the client to 'grow into' the PTR if the work is done in small pieces in the background, either during idle time or shutdown time, rather than trying to do it all at once. Just leave it to download and process on its own--it usually takes a couple of weeks to quietly catch up. You'll see tags appear on your files as it proceeds, first on older, then all the way up to new files just uploaded a couple days ago. Once you are synced, the daily processing work to stay synced is usually just a few minutes. If you leave your client on all the time in the background, you'll likely never notice it.
"},{"location":"access_keys.html#easy_setup","title":"easy setup","text":"Hit help->add the public tag repository and you will all be set up.
"},{"location":"access_keys.html#manually","title":"manually","text":"Hit services->manage services and click add->hydrus tag repository. You'll get a panel, fill it out like this:
Here's the info so you can copy it:
address
ptr.hydrus.network\n
port
45871\n
access key
4a285629721ca442541ef2c15ea17d1f7f7578b0c3f4f5f2a05f8f0ab297786f\n
Note that because this is the public shared key, you can ignore the 'DO NOT SHARE' red text warning.
It is worth checking the 'test address' and 'test access key' buttons just to double-check your firewall and key are all correct. Notice the 'check for automatic account creation' button, for if and when you decide you want to contribute to the PTR.
Then you can check your PTR at any time under services->review services, under the 'remote' tab:
"},{"location":"access_keys.html#quicksync","title":"jump-starting an install","text":"A user kindly manages a store of update files and pre-processed empty client databases to get you synced quicker. This is generally recommended for advanced users or those following a guide, but if you are otherwise interested, please check it out:
https://cuddlebear92.github.io/Quicksync/
"},{"location":"adding_new_downloaders.html","title":"adding new downloaders","text":""},{"location":"adding_new_downloaders.html#anonymous","title":"all downloaders are user-creatable and -shareable","text":"Since the big downloader overhaul, all downloaders can be created, edited, and shared by any user. Creating one from scratch is not simple, and it takes a little technical knowledge, but importing what someone else has created is easy.
Hydrus objects like downloaders can sometimes be shared as data encoded into png files, like this:
This contains all the information needed for a client to add a realbooru tag search entry to the list you select from when you start a new download or subscription.
You can get these pngs from anyone who has experience in the downloader system. An archive is maintained here.
To 'add' the easy-import pngs to your client, hit network->downloaders->import downloaders. A little image-panel will appear onto which you can drag-and-drop these png files. The client will then decode and go through the png, looking for interesting new objects, and automatically import and link them up without you having to do any more. The only further input needed from you is a 'does this look correct?' check right before the actual import, just to make sure there isn't some mistake or other glaring problem.
Objects imported this way will take precedence over existing functionality, so if one of your downloaders breaks due to a site change, importing a fixed png here will overwrite the broken entries and become the new default.
"},{"location":"advanced.html","title":"general clever tricks","text":"this is non-comprehensive
I am always changing and adding little things. The best way to learn is just to look around. If you think a shortcut should probably do something, try it out! If you can't find something, let me know and I'll try to add it!
"},{"location":"advanced.html#advanced_mode","title":"advanced mode","text":"To avoid confusing clutter, several advanced menu items and buttons are hidden by default. When you are comfortable with the program, hit help->advanced mode to reveal them!
"},{"location":"advanced.html#exclude_deleted_files","title":"exclude deleted files","text":"In the client's options is a checkbox to exclude deleted files. It recurs pretty much anywhere you can import, under 'import file options'. If you select this, any file you ever deleted will be excluded from all future remote searches and import operations. This can stop you from importing/downloading and filtering out the same bad files several times over. The default is off. You may wish to have it set one way most of the time, but switch it the other just for one specific import or search.
"},{"location":"advanced.html#ime","title":"inputting non-english languages","text":"If you typically use an IME to input Japanese or another non-english language, you may have encountered problems entering into the autocomplete tag entry control in that you need Up/Down/Enter to navigate the IME, but the autocomplete steals those key presses away to navigate the list of results. To fix this, press Insert to temporarily disable the autocomplete's key event capture. The autocomplete text box will change colour to let you know it has released its normal key capture. Use your IME to get the text you want, then hit Insert again to restore the autocomplete to normal behaviour.
"},{"location":"advanced.html#tag_display","title":"tag display","text":"If you do not like a particular tag or namespace, you can easily hide it with tags->manage tag display and search:
This image is out of date, sorry!
You can exclude single tags, as shown above, or entire namespaces (enter the colon, like 'species:'), or all namespaced tags (use ':'), or all unnamespaced tags (''). 'all known tags' will be applied to everything, as well as any repository-specific rules you set.
A blacklist excludes whatever is listed; a whitelist excludes whatever is not listed.
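The blacklist/whitelist rules above can be sketched in a few lines of Python. This is only an illustration of the logic as described here, not the client's actual code; the function names `tag_matches_rule` and `visible_tags` are made up for the example.

```python
def tag_matches_rule(tag: str, rule: str) -> bool:
    """Return True if `tag` is covered by a single filter rule."""
    namespace, sep, _ = tag.partition(":")
    has_namespace = sep == ":"
    if rule == ":":
        # ':' covers every namespaced tag
        return has_namespace
    if rule == "":
        # '' covers every unnamespaced tag
        return not has_namespace
    if rule.endswith(":"):
        # 'species:' covers one whole namespace
        return has_namespace and namespace == rule[:-1]
    # anything else is a single exact tag
    return tag == rule


def visible_tags(tags, rules, mode="blacklist"):
    """A blacklist hides whatever matches; a whitelist hides whatever does not."""
    result = []
    for tag in tags:
        matched = any(tag_matches_rule(tag, rule) for rule in rules)
        if (mode == "blacklist") != matched:
            result.append(tag)
    return result


tags = ["species:cat", "creator:aaa", "blue eyes"]
print(visible_tags(tags, ["species:"]))             # hides the species namespace
print(visible_tags(tags, [":"], mode="whitelist"))  # keeps only namespaced tags
```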
This censorship is local to your client. No one else will experience your changes or know what you have censored.
"},{"location":"advanced.html#importing_with_tags","title":"importing and adding tags at the same time","text":"Add tags before importing on file->import files lets you give tags to the files you import en masse, and intelligently, using regexes that parse the filename:
This should be somewhat self-explanatory to anyone familiar with regexes. I hate them, personally, but I recognise they are powerful and exactly the right tool to use in this case. This is a good introduction.
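If regexes are new to you, here is a small Python sketch of the idea. The filename layout (`<series> - v<volume> - c<chapter> - p<page>.<ext>`) and the `tags_from_filename` helper are assumptions made up for this example; the regexes you enter in the import dialog will depend on how your own files are named.

```python
import re

# Hypothetical filename convention: "<series> - v<volume> - c<chapter> - p<page>.<ext>"
PATTERN = re.compile(
    r"^(?P<series>.+?) - v(?P<volume>\d+) - c(?P<chapter>\d+) - p(?P<page>\d+)\.\w+$"
)


def tags_from_filename(filename: str) -> list[str]:
    """Turn a well-formatted filename into namespaced tags, or [] if it does not match."""
    match = PATTERN.match(filename)
    if match is None:
        return []
    return [
        f"series:{match['series'].lower()}",
        f"volume:{int(match['volume'])}",    # int() strips leading zeroes
        f"chapter:{int(match['chapter'])}",
        f"page:{int(match['page'])}",
    ]


print(tags_from_filename("Example Series - v02 - c010 - p003.jpg"))
```

Each named group captures one piece of the filename, and the same pattern then applies to every file in the batch.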
Once you are done, you'll get something neat like this:
Which you can more easily manage by collecting:
Collections have a small icon in the bottom left corner. Selecting them actually selects many files (see the status bar), and performing an action on them (like archiving, uploading) will do so to every file in the collection. Viewing collections fullscreen pages through their contents just like an uncollected search.
Here is a particularly zoomed out view, after importing volume 2:
Importing with tags is great for long-running series with well-formatted filenames, and will save you literally hours of finicky tagging.
"},{"location":"advanced.html#tag_migration","title":"tag migration","text":"Danger
At some point I will write some better help for this system, which is powerful. Be careful with it!
Sometimes, you may wish to move thousands or millions of tags from one place to another. These actions are now collected in one place: services->tag migration.
It proceeds from left to right, reading data from the source and applying it to the destination with the chosen action. There are multiple filters available to select which sorts of tag mappings or siblings or parents will be selected from the source. The source and destination can be the same, for instance if you wanted to delete all 'clothing:' tags from a service, you would pull all those tags and then apply the 'delete' action on the same service.
You can import from and export to Hydrus Tag Archives (HTAs), which are external, portable .db files. In this way, you can move millions of tags between two hydrus clients, or share with a friend, or import from an HTA put together from a website scrape.
Tag Migration is a powerful system. Be very careful with it. Do small experiments before starting large jobs, and if you intend to migrate millions of tags, make a backup of your db beforehand, just in case it goes wrong.
This system was once much more simple, but it still had HTA support. If you wish to play around with some HTAs, there are some old user-created ones here.
"},{"location":"advanced.html#shortcuts","title":"custom shortcuts","text":"Once you are comfortable with manually setting tags and ratings, you may be interested in setting some shortcuts to do it quicker. Try hitting file->shortcuts or clicking the keyboard icon on any media viewer window's top hover window.
There are two kinds of shortcuts in the program--reserved, which have fixed names, are undeletable, and are always active in certain contexts (related to their name), and custom, which you create and name and edit and are only active in a media viewer when you want them to. You can redefine some simple shortcut commands, but most importantly, you can create shortcuts for adding/removing a tag or setting/unsetting a rating.
Use the same 'keyboard' icon to set the current and default custom shortcuts.
"},{"location":"advanced.html#finding_duplicates","title":"finding duplicates","text":"system:similar_to lets you run the duplicates processing page's searches manually. You can either insert the hash and hamming distance manually, or you can launch these searches automatically from the thumbnail right-click->find similar files menu. For example:
"},{"location":"advanced.html#file_import_errors","title":"truncated/malformed file import errors","text":"Some files, even though they seem ok in another program, will not import to hydrus. This is usually because the file has some 'truncated' or broken data, probably due to a bad upload or storage at some point in its internet history. While sophisticated external programs can usually patch the error (often rendering the bottom lines of a jpeg as grey, for instance), hydrus is not so clever. Please feel free to send or link me, hydrus developer, to these files, so I can check them out on my end and try to fix support.
If the file is one you particularly care about, the easiest solution is to open it in photoshop or gimp and save it again. Those programs should be clever enough to parse the file's weirdness, and then make a nice clean saved file when it exports. That new file should be importable to hydrus.
"},{"location":"advanced.html#password","title":"setting a password","text":"The client offers a very simple password system, enough to keep out noobs. You can set it at database->set a password. It will thereafter ask for the password every time you start the program, and will not open without it. However, none of the database is encrypted, and someone with enough enthusiasm or a tool and access to your computer can still very easily see what files you have. The password is mainly to stop idle snoops checking your images if you are away from your machine.
"},{"location":"advanced_multiple_local_file_services.html","title":"multiple local file services","text":"The client lets you store your files in different overlapping partitions. This can help management workflows and privacy.
"},{"location":"advanced_multiple_local_file_services.html#the_problem","title":"what's the problem?","text":"Most of us end up storing all sorts of things in our clients, often from different parts of our lives. With everything in the same 'my files' domain, some personal photos might be sitting right beside nsfw content, a bunch of wallpapers, and thousands of comic pages. Different processing jobs, like 'go through those old vidya screenshots I imported' and 'filter my subscription files' and 'load up my favourite pictures of babes' all operate on the same gigantic list of files and must be defined through careful queries of tags, ratings, and other file metadata to separate what you want from what you don't.
The problem is aggravated the larger your client grows. When you are trying to sift the 500 art reference images out of 850,000 random internet files from the last ten years, it can be difficult getting good tag counts or just generally browsing around without stumbling across other content. This particularly matters when you are typing in search tags, since the tag you want, 'anatomy drawing guide', is going to come with thousands of others, starting 'a...', 'an...', and 'ana...' as you type. If someone is looking over your shoulder as you load up the images, you want to preserve your privacy.
Wouldn't it be nice if you could break your collection into separate areas?
"},{"location":"advanced_multiple_local_file_services.html#file_domains","title":"multiple file domains","text":"tl;dr: you can have more than one 'my files', add them in 'manage services'.
A file domain (or file service) in the hydrus context, is, very simply, a list of files. There is a bit of extra metadata like the time each file was imported to the domain, and a ton of behind the scenes calculation to accelerate searching and aggregate autocomplete tag counts and so on, but overall, when you search in 'my files', you are telling the client \"find all the files in this list that have tag x, y, z on any tag domain\". If you switch to searching 'trash', you are then searching that list of trashed files.
A search page's tag domain is similar. Normally, you will be set to 'all known tags', which is basically the union of all your tag services, but if you need to, you can search just 'my tags' or 'PTR', which will make your search \"find all the files in my files that have tag x, y, z on my tags\". You are setting up an intersection of a file and a tag domain.
Changing the tag domain to 'PTR' or 'all known tags' would make for a different blue circle with a different intersection of search results ('PTR' probably has a lot more 'pretty dress', although maybe not for your files, and 'all known tags', being the union of all the blue circles, will make the same or larger intersection).
This idea of dynamically intersecting domains is very important to hydrus. Each service stands on its own, and the 'my tags' domain is not linked to 'my files'. It does not care where its tagged files are. When you delete a file, no tags are changed. But when you delete a file, the 'file domain' circle will shrink, and that may change the search results in the intersection.
With multiple local file services, you can create new file lists beyond 'my files', letting you make different red circles. You can move and copy files between your local file domains to make new sub-collections and search them separately for a very effective filter.
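The 'intersection of a file domain and a tag domain' idea can be modelled with plain Python sets. This is a toy sketch with made-up file ids and a made-up `search` function, not how the client's database actually stores or queries anything.

```python
# File domains are just lists (here, sets) of file ids.
my_files = {"f1", "f2", "f3"}
trash = {"f4"}

# Each tag service maps a tag to the files it knows that tag for,
# regardless of where (or whether) those files exist locally.
tag_services = {
    "my tags": {"pretty dress": {"f1", "f4"}},
    "PTR": {"pretty dress": {"f2", "f4", "f9"}},
}


def search(file_domain: set[str], tag_domain: str, tag: str) -> set[str]:
    """Intersect a file domain with the files a tag domain has for `tag`."""
    if tag_domain == "all known tags":
        # 'all known tags' is the union of every tag service
        tagged = set().union(*(svc.get(tag, set()) for svc in tag_services.values()))
    else:
        tagged = tag_services[tag_domain].get(tag, set())
    return file_domain & tagged


print(search(my_files, "my tags", "pretty dress"))
print(search(my_files, "all known tags", "pretty dress"))
print(search(trash, "my tags", "pretty dress"))
```

Deleting 'f1' from `my_files` would change the first two results without touching any tag data, which is the point of keeping the two kinds of domain separate.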
You can add and remove them under services->manage services:
"},{"location":"advanced_multiple_local_file_services.html#sfw","title":"what does this actually mean?","text":"I think the best simple idea for most regular users is to try a sfw/nsfw split. Make a new 'sfw' local file domain and start adding some images to it. You might eventually plan to send all your sfw images there, or just your 'IRL' stuff like family photos, but it will be a separate area for whitelisted safe content you are definitely happy for others to glance at.
Search up some appropriate images in your collection and then add them to 'sfw':
This 'add' command is a copy. The files stay in 'my files', but they also go to 'sfw'. You still only have one file on your hard drive, but the database has its identifier in both file lists. Now make a new search page, switch it to 'sfw', and try typing in a search.
The tag results are limited to the files we added to 'sfw'. Nothing from 'my files' bleeds over. The same is true of a file search. Note the times the file was added to 'my files' and 'sfw' are both tracked.
Also note that these files now have two 'delete' commands. You will be presented with more complicated delete and undelete dialogs for files in multiple services. Files only end up in the trash when they are no longer in any local file domain.
You can be happy that any search in this new domain--for tags or files--is not going to provide any unexpected surprises. You can also do 'system:everything', 'system:limit=64' for a random sample, or any other simple search predicate for browsing, and the search should run fast and safe.
If you want to try multiple local file services out, I recommend this split to start off. If you don't like it, you can delete 'sfw' later with no harm done.
Note
While 'add to y' copies the files, 'move from x to y' deletes the files from the original location. They get a delete timestamp (\"deleted from my files 5 minutes ago\"), and they can be undeleted or 'added' back, and they will get their old import timestamp back.
"},{"location":"advanced_multiple_local_file_services.html#using_it","title":"using it","text":"The main way to add and move files around is the thumbnail/media viewer right-click menu.
You can make shortcuts for the add/move operations too. Check file->shortcuts and then the 'media actions' set.
In the future, I expect to have more ways to move files around, particularly integration into the archive/delete filter, and ideally a 'file migration' system that will allow larger operations such as 'add all the files in search x to place y'.
I also expect to write a system to easily merge clients together. Several users already run several different clients to get their 'my files' separation (e.g. a sfw client and a nsfw client), and now we have this tech supported in one client, it makes a lot of efficiency sense to merge them together.
Note that when you select a file domain, you can select 'multiple locations'. This provides the union of whichever domains you like. Tag counts will be correct but imprecise, often something like 'blonde hair (2-5)', meaning 'between two and five files', due to the complexity of quickly counting within these complicated domains.
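One way to see why a range like '(2-5)' can appear: if only per-domain counts are known, and the overlap between domains is not, the union can be bounded but not pinned down. The sketch below assumes the lower bound is the largest single count and the upper bound is the sum; that is a guess at the reasoning for illustration, not a description of the client's real counting code, and both function names are hypothetical.

```python
def union_count_range(per_domain_counts: list[int]) -> tuple[int, int]:
    """Bound the number of files in a union when overlaps are unknown."""
    if not per_domain_counts:
        return (0, 0)
    # At least as many as the biggest domain (everything else overlaps it),
    # at most the plain sum (no file is in two domains).
    return (max(per_domain_counts), sum(per_domain_counts))


def format_count(tag: str, per_domain_counts: list[int]) -> str:
    low, high = union_count_range(per_domain_counts)
    if low == high:
        return f"{tag} ({low})"
    return f"{tag} ({low}-{high})"


print(format_count("blonde hair", [2, 2, 1]))
```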
As soon as you add another local file service, you will also see an 'all my files' service listed in the file domain selector. This is a virtual service that provides a very efficient and accurate search space of the union of all your local file domains.
This whole system is new. I will keep working on it, including better 'at a glance' indications of which files are where (current thoughts are custom thumbnail border colours and little indicator icons). Let me know how you get on with it!
"},{"location":"advanced_multiple_local_file_services.html#meta_file_domains","title":"advanced: a word on the meta file domains","text":"If you are in help->advanced mode, your file search file domain selectors will see 'all known files'. This domain is similar to 'all known tags', but it is not useful for normal browsing. It represents not filtering your tag services by any file list, fetching all tagged file results regardless of what your client knows about them.
If you search 'all known files'/'PTR', you can search all the files the PTR knows about, the vast majority of which you will likely never import. The client will show these files with a default hydrus thumbnail and offer very limited information about them. For file searches, this search domain is only useful for debug and janitorial purposes. You cannot combine 'all known files' with 'all known tags'. It also has limited sibling/parent support.
You can search for deleted files under 'multiple domains' too. These may or may not still be in your client, so they might get the hydrus icon again. You won't need to do this much, but it can be super useful for some maintenance operations like 'I know I deleted this file by accident, what was its URL so I can find it again?'.
Another service is 'all local files'. This is a larger version of 'all my files'. It essentially means 'all the files on your hard disk', which strictly means the union of all the files in your local file domains ('my files' and any others you create, i.e. the 'all my files' domain), 'repository updates' (which stores update files for hydrus repository sync), and 'trash'. This search can be useful for some advanced maintenance jobs.
If you select 'repository updates' specifically, you can inspect this advanced domain, but I recommend you not touch it! Otherwise, if you search 'all local files', repository files are usually hidden from view.
Your client looks a bit like this:
graph TB\n A[all local files] --- B[repository updates]\n A --- C[all my files]\n C --- D[local file domains]\n A --- E[trash]
Repository files, your media, and the trash are actually mutually exclusive. When a file is imported, it is added to 'all local files' and either repository updates or 'all my files' and one or more local file domains. When it is deleted from all of those, it is taken from 'all my files' and moved to trash. When trashed files are cleared, the files are removed from 'trash' and then 'all local files' and thus your hard disk.
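The import, delete and trash flow described above can be written as a toy model. The `Client` class and its method names are invented for this sketch, and 'repository updates' is left out to keep it short; the real client tracks far more (timestamps, deletion records and so on).

```python
class Client:
    """Toy model: local file domains as sets of file ids, plus a trash."""

    def __init__(self, local_domains):
        self.domains = {name: set() for name in local_domains}
        self.trash = set()

    def all_my_files(self):
        # union of every local file domain
        return set().union(*self.domains.values())

    def all_local_files(self):
        # everything on disk (repository updates omitted in this sketch)
        return self.all_my_files() | self.trash

    def add(self, file_id, domain):
        # 'add' is a copy: the file may sit in several domains at once
        self.trash.discard(file_id)
        self.domains[domain].add(file_id)

    def move(self, file_id, source, dest):
        # 'move' is an add to dest followed by a delete from source
        if source != dest:
            self.add(file_id, dest)
            self.delete(file_id, source)

    def delete(self, file_id, domain):
        if file_id not in self.domains[domain]:
            return
        self.domains[domain].remove(file_id)
        # only trashed once it is in no local file domain at all
        if file_id not in self.all_my_files():
            self.trash.add(file_id)

    def clear_trash(self):
        # removed from 'trash', and thus from 'all local files' and the disk
        self.trash.clear()


client = Client(["my files", "sfw"])
client.add("a", "my files")
client.add("a", "sfw")
client.delete("a", "my files")
print(client.trash)   # still empty, 'a' remains in 'sfw'
client.delete("a", "sfw")
print(client.trash)   # now contains 'a'
```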
"},{"location":"advanced_multiple_local_file_services.html#advanced","title":"more advanced usage","text":"Warning
Careful! It is easy to construct a massively overcomplicated Mind Palace here that won't actually help you due to the weight of overhead. If you want to categorise things, tags are generally better. But if you do want strict search separations for speed, workflow, or privacy, try this out.
If you put your files through several layers of processing, such as inbox/archive->tags->rating, it might be helpful to create different file domains for each step. I have seen a couple of proposals like this that I think make sense:
graph LR\n A[inbox] --> B[sfw processing]\n A --> C[nsfw processing]\n B --> D[sfw archive]\n C --> E[nsfw archive]
Where the idea would be to make the 'is this sfw/nsfw?' choice early, probably at the same time as archive/delete, and splitting files off to either side before doing tagging and rating. I expect to expand the 'archive/delete' filter to support more actions soon to help make these workflows easy.
File Import Options allows you to specify which service it will import to. You can even import to multiple, although that is probably a bit much. If your inbox filters are overwhelming you--or each other--you might like to have more than one 'landing zone' for your files:
graph LR\n A[subscription and gallery inbox] --> B[archive]\n B --- C[sfw]\n D[watcher inbox] --> B\n E[hard drive inbox] --> B\n F[that zip of cool architecture photos] --> C
Some users have floated the idea of storing your archive on one drive and the inbox on another. This makes a lot of sense for network storage situations--the new inbox could be on a local disk, but the less-accessed archive on cheap network storage. File domains would be a great way to manage this in future, turning the workflow into nice storage commands.
Another likely use of this in future is in the Client API, when sharing with others. If you were to put the files you wanted to share in a file domain, and the Client API were set up to search just on that domain, this would guarantee great privacy. I am still thinking about this, and it may ultimately end up just being something that works that way behind the scenes.
graph LR\n A[inbox] --> B[19th century fishman conspiracy theory evidence]\n A --> C[the mlp x sonic hyperplex]\n A --> D[extremely detailed drawings of hands and feet]\n A --> E[normal stuff]\n E --- F[share with dave]
"},{"location":"advanced_parents.html","title":"tag parents","text":"Tag parents let you automatically add a particular tag every time another tag is added. The relationship will also apply retroactively.
"},{"location":"advanced_parents.html#the_problem","title":"what's the problem?","text":"Tags often fall into certain hierarchies. Certain tags always imply other tags, and it is annoying and time-consuming to type them all out individually every time.
As a basic example, a car is a vehicle. It is a subset. Any time you see a car, you also see a vehicle. Similarly, a rifle is a firearm, face tattoo implies tattoo, and species:pikachu implies species:pok\u00e9mon which also implies series:pok\u00e9mon.
Another way of thinking about this is considering what you would expect to see when you search these terms. If you search vehicle, you would expect the result to include all cars. If you search series:league of legends, you would expect to see all instances of character:ahri (even if, on rare occasion, she were just appearing in cameo or in a crossover).
For hydrus terms, character x is in series y is a common relationship, as is costume x is of character y:
graph TB\n C[series:metroid] --- B[character:samus aran] --- A[character:zero suit samus]
In this instance, anything with character:zero suit samus would also have character:samus aran. Anything with character:samus aran (and thus anything with character:zero suit samus) would have series:metroid.
Remember that the reverse is not true. Samus comes inextricably from Metroid, but not everything Metroid is Samus (e.g. a picture of just Ridley).
Even a small slice of these relationships can get complicated:
graph TB\n A[studio:blizzard entertainment]\n A --- B[series:overwatch]\n B --- B1[character:dr. angela 'mercy' ziegler]\n B1 --- B1b[character:pink mercy]\n B1 --- B1c[character:witch mercy]\n B --- B2[character:hana 'd.va' song]\n B2 --- B2b[\"character:d.va (gremlin)\"]\n A --- C[series:world of warcraft]\n C --- C1[character:jaina proudmoore]\n C1 --- C1a[character:dreadlord jaina]\n C --- C2[character:sylvanas windrunner]
Some franchises are bananas:
Also, unlike siblings, which as we previously saw are n->1, some tags have more than one implication (n->n):
graph TB\n A[adjusting clothes] --- B[adjusting swimsuit]\n C[swimsuit] --- B
adjusting swimsuit implies both a swimsuit and adjusting clothes. Consider how adjusting bikini might fit on this chart--perhaps this:
graph TB\n A[adjusting clothes] --- B[adjusting swimsuit]\n A --- E[adjusting bikini]\n C[swimsuit] --- B\n F[bikini] --- E\n D[swimwear] --- C\n D --- F
Note this is not a loop--like with siblings, loops are not allowed--this is a family tree with three 'generations'. adjusting bikini is a child to both bikini and adjusting clothes, and bikini is a child to the new swimwear, which is also a parent to swimsuit. adjusting bikini and adjusting swimsuit are both grandchildren to swimwear.
This can obviously get as complicated and over-engineered as you like, but be careful of being too confident. Reasonable people disagree on what is 'clearly' a parent or sibling, or what is an excessive level of detail (e.g. person:scarlett johansson may be gender:female, if you think that useful, but species:human, species:mammal, and species:animal may be going a little far). Beyond its own intellectual neatness, ask yourself the purpose of what you are creating.
Of course you can create any sort of parent tags on your local tags or your own tag repositories, but this sort of thing can easily lead to arguments between reasonable people on a shared server like the PTR.
Just like with normal tags, try not to create anything 'perfect' or stray away from what you actually search with, as it usually ends up wasting time. Act from need, not toward purpose.
"},{"location":"advanced_parents.html#tag_parents","title":"tag parents","text":"Let's define the child-parent relationship 'C->P' as saying that tag P is the semantic superset/superclass of tag C. All files that have C should also have P, without exception.
Any file that has C should appear to have P. Any search for P will include all of C implicitly.
Tags can have multiple parents, and multiple tags can have the same parent. Loops are not allowed.
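The C->P rules above (parents of parents apply too, a tag may have several parents, and loops are rejected) can be sketched as a small graph walk. The `ParentGraph` class is hypothetical and only models the logic described here; it says nothing about how the client stores or caches these relationships.

```python
from collections import defaultdict


class ParentGraph:
    """Toy model of child->parent pairs with loop rejection."""

    def __init__(self):
        self.parents = defaultdict(set)  # child tag -> set of direct parents

    def ancestors(self, tag):
        """All parents, grandparents, etc. implied by `tag`."""
        seen = set()
        stack = [tag]
        while stack:
            current = stack.pop()
            for parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add(self, child, parent):
        # If child is already an ancestor of parent, child->parent closes a loop.
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"{child}->{parent} would create a loop")
        self.parents[child].add(parent)

    def displayed_tags(self, hard_tags):
        """The tags a file appears to have: its real tags plus every implied parent."""
        result = set(hard_tags)
        for tag in hard_tags:
            result |= self.ancestors(tag)
        return result


graph = ParentGraph()
graph.add("character:zero suit samus", "character:samus aran")
graph.add("character:samus aran", "series:metroid")
print(sorted(graph.displayed_tags({"character:zero suit samus"})))
```

Note that `displayed_tags` computes the parents on the fly and never writes them back to the file's own tags.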
Note
In hydrus, tag parents are virtual. P is not actually added to every file by C, it just appears as if it is. When you look at a file in manage tags, you will see the implication, just like you see how tags will be renamed by siblings, but you won't see the parent unless it actually happens to also be there as a 'hard' tag. If you remove a C->P parent relationship, all the implied P tags will disappear!
It also takes a bunch of CPU to figure this stuff out. Please bear with this system, sometimes it can take time.
"},{"location":"advanced_parents.html#how_to_do_it","title":"how you do it","text":"Go to tags->manage tag parents:
Which looks and works just like the manage tag siblings dialog.
Note that when you hit ok, the client will look up all the files with all your added tag Cs and retroactively apply/pend the respective tag Ps if needed. This could mean thousands of tags!
Once you have some relationships added, the parents and grandparents will show indented anywhere you 'write' tags, such as the manage tags dialog:
"},{"location":"advanced_parents.html#remote_parents","title":"remote parents","text":"Whenever you add or remove a tag parent pair to a tag repository, you will have to supply a reason (like when you petition a tag). A janitor will review this petition, and will approve or deny it. If it is approved, all users who synchronise with that tag repository will gain that parent pair. If it is denied, only you will see it.
"},{"location":"advanced_parents.html#parent_favourites","title":"parent 'favourites'","text":"As you use the client, you will likely make several processing workflows to archive/delete your different sorts of imports. You don't always want to go through things randomly--you might want to do some big videos for a bit, or focus on a particular character. A common search page is something like [system:inbox, creator:blah, limit:256], which will show a sample of a creator in your inbox, so you can process just that creator. This is easy to set up and save in your favourite searches and quick to run, so you can load it up, do some archive/delete, and then dismiss it without too much hassle.
But what happens if you want to search for multiple creators? You might be tempted to make a large OR search predicate, like creator:aaa OR creator:bbb OR creator:ccc OR creator:ddd, of all your favourite creators so you can process them together as a 'premium' group. But if you want to add or remove a creator from that long OR, it can be cumbersome. And OR searches can just run slow sometimes. One answer is to use the new tag parents tools to apply a 'favourite' parent on all the artists and then search for that favourite.
Let's assume you want to search a bunch of 'creator' tags on the PTR. What you will do is:
Under tags->manage tag parents, on your 'my parent favourites' service, add:
creator:aaa->favourite:aesthetic art
creator:bbb->favourite:aesthetic art
creator:ccc->favourite:aesthetic art
creator:ddd->favourite:aesthetic art
Watch/wait a few seconds for the parents to apply across the PTR for those creator tags.
Then save a new favourite search of [system:inbox, favourite:aesthetic art, limit:256]. This search will deliver results with any of the child 'creator' tags, just like a big OR search, and real fast!
If you want to add or remove any creators to the 'aesthetic art' group, you can simply go back to tags->manage tag parents, and it will apply everywhere. You can create more umbrella/group tags if you like (and not just creators--think about clothing, or certain characters), and also use them in regular searches when you just want to browse some cool files.
"},{"location":"advanced_siblings.html","title":"tag siblings","text":"Tag siblings let you replace a bad tag with a better tag.
"},{"location":"advanced_siblings.html#the_problem","title":"what's the problem?","text":"Reasonable people often use different words for the same things.
A great example is in Japanese names, which are natively written surname first. character:ayanami rei and character:rei ayanami have the same meaning, but different users will use one, or the other, or even both.
Other examples are tiny syntactic changes, common misspellings, and unique acronyms:
A particular repository may have a preferred standard, but it is not easy to guarantee that all the users will know exactly which tag to upload or search for.
After some time, you get this:
Without continual intervention by janitors or other experienced users to make sure y\u2287x (i.e. making the yellow circle entirely overlap the blue by manually giving y to everything with x), searches can only return x (blue circle) or y (yellow circle) or x\u2229y (the lens-shaped overlap). What we really want is x\u222ay (both circles).
So, how do we fix this problem?
"},{"location":"advanced_siblings.html#tag_siblings","title":"tag siblings","text":"Let's define a relationship, A->B, that means that any time we would normally see or use tag A or tag B, we will instead only get tag B:
Note that this relationship implies that B is in some way 'better' than A.
"},{"location":"advanced_siblings.html#more_complicated","title":"ok, I understand; now confuse me","text":"This relationship is transitive, which means as well as saying A->B, you can also say B->C, which implies A->C and B->C.
graph LR\n A[lena_oxton] --> B[lena oxton] --> C[character:tracer];
In this case, everything with 'lena_oxton' or 'lena oxton' will show 'character:tracer' instead.
You can also have an A->C and B->C that does not include A->B.
graph LR\n A[d.va] --> C[character:hana 'd.va' song]\n B[hana song] --> C
The outcome of these two arrangements is the same--everything ends up as C.
Many complicated arrangements are possible (and inevitable, as we try to merge many different communities' ideal tags):
graph LR\n A[angela_ziegler] --> B[angela ziegler] --> I[character:dr. angela 'mercy' ziegler]\n C[\"angela_ziegler_(overwatch)\"] --> B\n D[character:mercy] --> I\n E[\"character:mercy (overwatch)\"] --> I\n F[dr angela ziegler] --> I\n G[\"character:\u30de\u30fc\u30b7\u30fc\uff08\u30aa\u30fc\u30d0\u30fc\u30a6\u30a9\u30c3\u30c1\uff09\"] --> E\n H[overwatch mercy] --> I
Note that if you say A->B, you cannot also say A->C. This is an n->1 relationship. Many things can point to a single ideal, but a tag cannot have more than one ideal. Also, obviously, these graphs are non-cyclic--no loops.
Just open tags->manage tag siblings, and add a few.
The client will automatically collapse the tagspace to whatever you set. It'll even work with autocomplete, like so:
Please note that siblings' autocomplete counts may be slightly inaccurate, as unioning the count is difficult to quickly estimate.
The client will not collapse siblings anywhere you 'write' tags, such as the manage tags dialog. You will be able to add or remove A as normal, but it will be written in some form of \"A (B)\" to let you know that, ultimately, the tag will end up displaying in the main gui as B:
Although the client may present A as B, it will secretly remember A! You can remove the association A->B, and everything will return to how it was. No information is lost at any point.
"},{"location":"advanced_siblings.html#remote_siblings","title":"remote siblings","text":"Whenever you add or remove a tag sibling pair to a tag repository, you will have to supply a reason (like when you petition a tag). A janitor will review this petition, and will approve or deny it. If it is approved, all users who synchronise with that tag repository will gain that sibling pair. If it is denied, only you will see it.
"},{"location":"advanced_sidecars.html","title":"sidecars","text":"Sidecars are files that provide additional metadata about a master file. They typically share the same basic filename--if the master is 'Image_123456.jpg', the sidecar will be something like 'Image_123456.txt' or 'Image_123456.jpg.json'. This obviously makes it easy to figure out which sidecar goes with which file.
Hydrus does not use sidecars in its own storage, but it can import data from them and export data to them. It currently supports raw data in .txt files and encoded data in .json files, and that data can be either tags or URLs. I expect to extend this system in future to support XML and other metadata types such as ratings, timestamps, and inbox/archive status.
We'll start with .txt, since they are simpler.
"},{"location":"advanced_sidecars.html#importing_sidecars","title":"Importing Sidecars","text":"Imagine you have some jpegs you downloaded with another program. That program grabbed the files' tags somehow, and you want to import the files with their tags without messing around with the Client API.
If your extra program can export the tags to a simple format--let's say newline-separated .txt files with the same basic filename as the jpegs, or you can, with some very simple scripting, convert to that format--then importing them to hydrus is easy!
Put the jpegs and the .txt files in the same directory and then drag and drop the directory onto the client, as you would for a normal import. The .txt files should not be added to the list. Then click 'add tags/urls with the import'. The sidecars are managed on one of the tabs:
This system can get quite complicated, but the essential idea is that you are selecting one or more sidecar sources, parsing their text, and sending that list of data to one hydrus service destination. Most of the time you will be pulling from just one sidecar at a time.
The source is a description of a sidecar to load and how to read what it contains.
In this example, the texts are like so:
4e01850417d1978e6328d4f40c3b550ef582f8558539b4ad46a1cb7650a2e10b.jpg.txt
flowers\nlandscape\nblue sky\n
5e390f043321de57cb40fd7ca7cf0cfca29831670bd4ad71622226bc0a057876.jpg.txt
fast car\nanime girl\nnight sky\n
Since our sidecars in this example are named (filename.ext).txt, and use newlines as the separator character, we can leave things mostly as default.
If you do not have newline-separated tags, for instance comma-separated tags (flowers, landscape, blue sky), then you can set that here. Be careful if you are making your own sidecars, since any separator character obviously cannot be used in tag text!
If your sidecars are named (filename).txt instead of (filename.ext).txt, then just hit the checkbox, but if the conversion is more complicated, then play around with the filename string converter and the test boxes.
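If you are making your own sidecars with a script, something like the following Python sketch would produce the (filename.ext).txt layout used in this example. The function name `write_txt_sidecar` is hypothetical, and writing a trailing newline is an assumption copied from the sample files above.

```python
from pathlib import Path


def write_txt_sidecar(media_path, tags, separator="\n"):
    """Write tags to '<filename.ext>.txt' next to the media file and return its path."""
    media_path = Path(media_path)
    for tag in tags:
        # A separator inside a tag would split it into two tags on import.
        if separator in tag:
            raise ValueError(f"tag {tag!r} contains the separator")
    sidecar_path = media_path.with_name(media_path.name + ".txt")
    sidecar_path.write_text(separator.join(tags) + "\n", encoding="utf-8")
    return sidecar_path


# Example call (not run here):
# write_txt_sidecar("downloads/photo.jpg", ["flowers", "landscape", "blue sky"])
```

The separator check is there because a tag that contains the separator would be read back as two tags.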
If you need to, you can further process the texts that are loaded. They'll be trimmed of extra whitespace and so on automatically, so no need to worry about that, but if you need to, let's say, add the creator: prefix to everything, or filter out some mis-parsed garbage, this is the place.
A 'Router' is a single set of orders to grab from one or more sidecars and send to a destination. You can have several routers in a single import or export context.
You can do more string processing here, and it will apply to everything loaded from every sidecar.
The destination is either a tag service (adding the loaded strings as tags), or your known URLs store.
"},{"location":"advanced_sidecars.html#previewing","title":"Previewing","text":"Once you have something set up, you can see the results live-loaded in the dialog. Make sure everything looks correct, and then start the import as normal and you should see the tags or URLs being added as the import works.
It is good to try out some simple situations with one or two files just to get a feel for the system.
"},{"location":"advanced_sidecars.html#import_folders","title":"Import Folders","text":"If you have a constant flow of sidecar-attached media, then you can add sidecars to Import Folders too. Do a trial-run of anything you want to parse with a manual import before setting up the automatic system.
"},{"location":"advanced_sidecars.html#exporting_sidecars","title":"Exporting Sidecars","text":"The rules for exporting are similar, but now you are pulling from one or more hydrus service sources and sending to a single destination sidecar every time. Let's look at the UI:
I have chosen to select these files' URLs and send them to newline-separated .urls.txt files. If I wanted to get the tags too, I could pull from one or more tag services, filter and convert the tags as needed, and then output to a .tags.txt file.
The best way to learn with this is just to experiment. The UI may seem intimidating, but most jobs don't need you to work with multiple sidecars or string processing or clever filenames.
"},{"location":"advanced_sidecars.html#json_files","title":"JSON Files","text":"JSON is more complicated than .txt. You might have multiple metadata types all together in one file, so you may end up setting up multiple routers that parse the same file for different content, or for an export you might want to populate the same export file with multiple kinds of content. Hydrus can do it!
"},{"location":"advanced_sidecars.html#importing","title":"Importing","text":"Since JSON files are richly structured, we will have to dip into the Hydrus parsing system:
If you have made a downloader before, you will be familiar with this. If not, then you can brave the help or just have a play around with the UI. In this example, I am getting the URL(s) of each JSON file, which are stored in a list under the file_info_urls key.
It is important to paste an example JSON file that you want to parse into the parsing testing area (click the paste button) so you can test on real data live.
Once you have the parsing set up, the rest of the sidecar UI is the same as for .txt. The JSON Parsing formula is just the replacement/equivalent for the .txt 'separator' setting.
Note that you could set up a second Router to import the tags from this file!
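Outside of hydrus, you can sanity-check a sidecar with a few lines of Python before you build the parsing formula. This sketch assumes the `file_info_urls` list sits at the top level of the JSON object, which may not match your files; `urls_from_sidecar` is a hypothetical helper, not part of hydrus.

```python
import json


def urls_from_sidecar(json_text, key="file_info_urls"):
    """Return the list of URL strings stored under `key`, or [] if it is absent."""
    data = json.loads(json_text)
    if not isinstance(data, dict):
        return []
    value = data.get(key, [])
    if not isinstance(value, list):
        return []
    # Keep only strings, in case the list holds mixed content.
    return [item for item in value if isinstance(item, str)]


example = '{"file_info_urls": ["http://example.com/123456"], "file_info_tags": ["skirt"]}'
urls = urls_from_sidecar(example)
```

If your URLs are nested more deeply, the lookup needs one step per level.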
"},{"location":"advanced_sidecars.html#exporting","title":"Exporting","text":"In Hydrus, the exported JSON is typically a nested Object with a similar format as in the Import example. You set the names of the Object keys.
Here I have set the URLs of each file to be stored under metadata->urls, which will make this sort of structure:
{\n \"metadata\" : {\n \"urls\" : [\n \"http://example.com/123456\",\n \"https://site.org/post/45678\"\n ]\n }\n}\n
The cool thing about JSON files is I can export multiple times to the same file and it will update it! Let's say I made a second Router that grabbed the tags, and it was set to export to the same filename but under metadata->tags. The final sidecar would look like this:
{\n \"metadata\" : {\n \"tags\" : [\n \"blonde hair\",\n \"blue eyes\",\n \"skirt\"\n ],\n \"urls\" : [\n \"http://example.com/123456\",\n \"https://site.org/post/45678\"\n ]\n }\n}\n
You should be careful that the location you are exporting to does not have any old JSON files with conflicting filenames in it--hydrus will update them, not overwrite them! This may be an issue if you have a synchronising Export Folder that exports random files with the same filenames.
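The 'update, not overwrite' behaviour can be pictured with a short Python sketch. This illustrates the idea only and is not hydrus's own export code; `export_to_json_sidecar` and its arguments are hypothetical.

```python
import json
from pathlib import Path


def export_to_json_sidecar(sidecar_path, key_path, values):
    """Store `values` under the nested `key_path`, keeping whatever else the file holds."""
    sidecar_path = Path(sidecar_path)
    if sidecar_path.exists():
        # An existing file is loaded and updated rather than replaced.
        data = json.loads(sidecar_path.read_text(encoding="utf-8"))
    else:
        data = {}
    node = data
    for key in key_path[:-1]:
        node = node.setdefault(key, {})
    node[key_path[-1]] = list(values)
    sidecar_path.write_text(json.dumps(data, indent=4, sort_keys=True) + "\n", encoding="utf-8")
    return data


# Example calls (not run here), one per Router:
# export_to_json_sidecar("123456.jpg.json", ["metadata", "urls"], ["http://example.com/123456"])
# export_to_json_sidecar("123456.jpg.json", ["metadata", "tags"], ["blonde hair", "skirt"])
```

This also shows the hazard: a stale file with the same name keeps its old keys and only gains the new ones.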
"},{"location":"advanced_sidecars.html#note_on_notes","title":"Note on Notes","text":"You can now import/export notes with your sidecars. Since notes have two variables--name and text--but the sidecars system only supports lists of single strings, I merge these together! If you export notes, they will output in the form 'name: text'. If you want to import notes, arrange them in the same form, 'name: text'.
If you do need to select a particular note out of many, see if a String Match (regex ^name:) in the String Processor will do it.
If you need to work with multiple notes that have newlines, I recommend you use JSON rather than txt. If you have to use txt on multiple multi-paragraph-notes, then try a different separator than newline. Go for |||| or something, whatever works for your job.
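If you are preparing or reading such txt sidecars with a script, the 'name: text' form with a |||| separator can be handled roughly like this in Python. `join_notes` and `split_notes` are hypothetical helpers for illustration, and the sketch assumes note names do not themselves contain ': '.

```python
def join_notes(notes, separator="||||"):
    """Turn a {name: text} mapping into one 'name: text' string per note, joined by separator."""
    return separator.join(f"{name}: {text}" for name, text in notes.items())


def split_notes(sidecar_text, separator="||||"):
    """Reverse of join_notes; chunks without ': ' become a name with empty text."""
    notes = {}
    for chunk in sidecar_text.split(separator):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Split on the first ': ' only, so the note text may contain colons.
        name, _, text = chunk.partition(": ")
        notes[name] = text
    return notes


notes = {
    "artist commentary": "first paragraph\n\nsecond paragraph",
    "source": "scanned from a book",
}
sidecar_text = join_notes(notes)
```

Because the separator is not a newline, multi-paragraph notes survive the round trip intact.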
Depending on how awkward this all is, I may revise it.
"},{"location":"after_disaster.html","title":"Recovering After Disaster","text":""},{"location":"after_disaster.html#you_just_had_a_database_problem","title":"you just had a database problem","text":"I have helped quite a few users recover a mangled database from disk failure or accidental deletion. You just had similar and have been pointed here. This is a simple spiel on the next step that I, hydev, like to give people once we are done.
"},{"location":"after_disaster.html#what_next","title":"what next?","text":"When I was younger, I lost a disk with about 75,000 curated files. It really sucks to go through, and whether you have only had a brush with death or lost tens or hundreds of thousands of files, I know exactly how you have been feeling. The only thing you can change now is the future. Let's make sure it does not happen again.
The good news is the memory of that sinking 'oh shit' feeling is a great motivator. You don't want to feel that way again, so use that to set up and maintain a proper backup regime. If you have a good backup, the worst case scenario, even if your whole computer blows up, is usually just a week's lost work.
So, plan to get a good external USB drive and figure out a backup script and a reminder to ensure you never forget to run it. Having a 'backup day' in your schedule works well, and you can fold in other jobs like computer updates and restarts at the same time. It takes a bit of extra 'computer budget' every year and a few minutes a week, but it is absolutely worth the peace of mind it brings.
Here's the how to backup help, if you want to revisit it. If you would like help setting up FreeFileSync or ToDoList or other similar software, let me know.
This is also a great time to think about backing up other things in your life. All of your documents, family photos, your password manager file--are they backed up? Would you be ok with losing them if their drive failed tomorrow? Movies and music will need a real drive, but your smaller things like documents can also fit on an (encrypted) USB stick that you can put in your wallet or keychain.
"},{"location":"changelog.html","title":"changelog","text":"Note
This is the new changelog, only the most recent builds. For all versions, see the old changelog.
"},{"location":"changelog.html#version_552","title":"Version 552","text":""},{"location":"changelog.html#misc","title":"misc","text":"false
for a while, until the file maintenance catches up/manage_database/get_client_options
call that fetches a heap of different client options. this exposes a mess that may change with any update, but there may be something neat you can hook into. this week we fixed a thing that was breaking this call for probably all old clientssystem:date
predicates were displaying labels an hour off (usually midnight -> 11pm, thus cycling back to the previous day) thanks to the clocks changed (in the USA) last weekend. I suspect there is more of this, here and there, so let me know what you see(t)est
Qt version in the 'setup_venv' now points to this. it seems fine to me on a fairly normal Win 11 machine, but if recent history is any guide, there's going to be a niggle somewhere. if you have been waiting for a fix on the menu position issue or anything else, give it a go! if things go well, I'll roll this into a larger 'future' test release and then we'll integrate it into main(w)rite
your own version in!distutils
, and thus should now be compatible (or less incompatible, let's see, ha ha) with python 3.12. thanks for the user report and assistance hereauto_update_installer.bat
, to the main install directory. it will download the latest Windows exe installer using winget and install it to the current location. if you use the installer, you might want to experiment with it (make a backup first!) as an easy hands-free update solution. let me know how it goes, and if there are no problems in a couple of weeks, I'll add it to the helpversion
and hydrus_version
in every JSON Client API response. CBOR responses are not affected. if you need to hook into these numbers for a completely stateless interface, it is now super convenient. I'm not delighted with the spamminess of this, but it is just a handful of characters and it adds value for several situations, so I'm willing to try it outHydrusImageMetadata
fileHydrusBlurhash
fileHydrusImageNormalisation
fileHydrusImageColours
fileOPENCV_OK
fallback code, which was only used, superfluously, in a couple of final places. OpenCV is not optional to run hydrus, server or clientfile_metadata
call now says the new blurhash. if you pipe it into a blurhash library and blow it up to an appropriate ratio canvas, it should just work. the typical use is as a placeholder while you wait for thumbs/files to downloadinclude_blurhash
parameter will include the blurhash when only_return_basic_information
is truefile_metadata
also shows the file's pixel_hash
now. the algorithm here is proprietary to hydrus, but you can throw it into 'system:similar files' to find pixel dupes. I expect to add perceptual hashes tooPillow
library, which also rolled out a fix. I'm not sure how vulnerable hydrus ever was, since we are usually jank about how we do anything, but best to be safe about these things. there were apparently exploits for this floating aroundPillow
migrate database
dialog now allows you to set a 'max size' for all but one of your media locations. if you have a 500GB drive you want to store some stuff on, you no longer have to balance the weights in your head--just set a max size of 450GB and hydrus will figure it out for you. it is not super precise (and it isn't healthy to fill drives up to 98% anyway), so make sure you leave some padding/get_files/render
command, which gives you a 100% zoom png render of the given file. useful if you want to display a PSD on a web page!/get_files/search_files
, the help talks about it. He also cancels his work early if the request is terminated/add_tags/get_siblings_and_parents
now properly cleans the tags you give it, trimming whitespace and lowercasing letters and so ondateparser
library. all old 'datestring to timestamp' rules remain as they are, but are now called '(advanced)'. a new option, 'datestring to timestamp (easy)', which has exactly zero variables to fiddle with, just eats up pretty much any date string you can think of, including timezone conversions, and even stuff like '2 hours ago'. you need the dateparser library for this to work, so if you run from source, you might like to rebuild your venv this week. your dateparser
import status is in help->about
The hydrus client now supports a very simple API so you can access it with external programs.
"},{"location":"client_api.html#enabling_the_api","title":"Enabling the API","text":"By default, the Client API is not turned on. Go to services->manage services and give it a port to get it started. I recommend you not allow non-local connections (i.e. only requests from the same computer will work) to start with.
The Client API should start immediately. It will only be active while the client is open. To test that it is running correctly (and assuming you used the default port of 45869), try loading this:
http://127.0.0.1:45869
You should get a welcome page. By default, the Client API is HTTP, which means it is ok for communication on the same computer or across your home network (e.g. your computer's web browser talking to your computer's hydrus), but not secure for transmission across the internet (e.g. your phone to your home computer). You can turn on HTTPS, but due to technical complexities it will give itself a self-signed 'certificate', so the security is good but imperfect, and whatever is talking to it (e.g. your web browser looking at https://127.0.0.1:45869) may need to add an exception.
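The same check can be scripted. The Python sketch below assumes the default port and that the welcome page comes back as a normal 200 response; `api_base_url` and `api_is_reachable` are hypothetical helper names. With HTTPS turned on, the self-signed certificate will normally fail verification, so this simple check would report False unless you add your own exception handling for it.

```python
import urllib.error
import urllib.request


def api_base_url(port=45869, https=False, host="127.0.0.1"):
    """Build the base address of the Client API."""
    scheme = "https" if https else "http"
    return f"{scheme}://{host}:{port}"


def api_is_reachable(base_url, timeout=5.0):
    """Return True if the address answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and certificate failures.
        return False


# Example call (needs a running client, so it is not run here):
# print(api_is_reachable(api_base_url()))
```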
The Client API is still experimental and sometimes not user friendly. If you want to talk to your home computer across the internet, you will need some networking experience. You'll need a static IP or reverse proxy service or dynamic domain solution like no-ip.org so your device can locate it, and potentially port-forwarding on your router to expose the port. If you have a way of hosting a domain and have a signed certificate (e.g. from Let's Encrypt), you can overwrite the client.crt and client.key files in your 'db' directory and HTTPS hydrus should host with those.
Once the API is running, go to its entry in services->review services. Each external program trying to access the API will need its own access key, which is the familiar 64-character hexadecimal used in many places in hydrus. You can enter the details manually from the review services panel and then copy/paste the key to your external program, or the program may have the ability to request its own access while a mini-dialog launched from the review services panel waits to catch the request.
"},{"location":"client_api.html#tools_created_by_hydrus_users","title":"Tools created by hydrus users","text":""},{"location":"client_api.html#browser_add-on","title":"Browser Add-on","text":"I welcome all your bug reports, questions, ideas, and comments. It is always interesting to see how other people are using my software and what they generally think of it. Most of the changes every week are suggested by users.
You can contact me by email, twitter, discord, or the release threads on 8chan or Endchan--I do not mind which. Please know that I have difficulty with social media, and while I try to reply to all messages, it sometimes takes me a while to catch up.
If you need it, here's my public GPG key.
The Github Issue Tracker was turned off for some time, as it did not fit my workflow and I could not keep up, but it is now running again, managed by a team of volunteer users. Please feel free to submit feature requests there if you are comfortable with Github. I am not socially active on Github, please do not ping me there.
I am on the discord on Saturday afternoon, USA time, if you would like to talk live, and briefly on Wednesday after I put the release out. If that is not a good time for you, please leave me a DM and I will get to you when I can. There are also plenty of other hydrus users who idle who can help with support questions.
I delete all tweets and resolved email conversations after three months. So, if you think you are waiting for a reply, or I said I was going to work on something you care about and seem to have forgotten, please do nudge me.
I am always overwhelmed by work and behind on my messages. This is not to say that I do not enjoy just hanging out or talking about possible new features, but forgive me if some work takes longer than expected or if I cannot get to a particular idea quickly. In the same way, if you encounter actual traceback-raising errors or crashes, there is only one guy to fix it, so I prefer to know ASAP so I can prioritise.
I work by myself because I have acute difficulty working with others. Please do not spontaneously write long design documents or prepare other work for me--I find it more stressful than helpful, every time, and I won't give it the attention it deserves. If you would like to contribute time to hydrus, the user projects like the downloader repository and wiki help guides always have things to do.
That said:
Warning
I am working on this system right now and will be moving the 'move files now' action to a more granular, always-on background migration. This document will update to reflect those changes!
"},{"location":"database_migration.html#database_migration","title":"database migration","text":""},{"location":"database_migration.html#intro","title":"the hydrus database","text":"A hydrus client consists of three components:
the software installation
This is the part that comes with the installer or extract release, with the executable and dlls and a handful of resource folders. It doesn't store any of your settings--it just knows how to present a database as a nice application. If you just run the hydrus_client executable straight, it looks in its 'db' subdirectory for a database, and if one is not found, it creates a new one. If it sees a database running at a lower version than itself, it will update the database before booting it.
It doesn't really matter where you put this. An SSD will load it marginally quicker the first time, but you probably won't notice. If you run it without command-line parameters, it will try to write to its own directory (to create the initial database), so if you mean to run it like that, it should not be in a protected place like Program Files.
the actual SQLite database
The client stores all its preferences and current state and knowledge about files--like file size and resolution, tags, ratings, inbox status, and so on and on--in a handful of SQLite database files, defaulting to install_dir/db. Depending on the size of your client, these might total 1MB in size or be as much as 10GB.
In order to perform a search or to fetch or process tags, the client has to interact with these files in many small bursts, which means it is best if these files are on a drive with low latency. An SSD is ideal, but a regularly-defragged HDD with a reasonable amount of free space also works well.
your media files
All of your jpegs and webms and so on (and their thumbnails) are stored in a single complicated directory that is by default at install_dir/db/client_files. All the files are named by their hash and stored in efficient hash-based subdirectories. In general, it is not navigable by humans, but it works very well for the fast access from a giant pool of files the client needs to do to manage your media.
Thumbnails tend to be fetched dozens at a time, so it is, again, ideal if they are stored on an SSD. Your regular media files--which on many clients total hundreds of GB--are usually fetched one at a time for human consumption and do not benefit from the expensive low-latency of an SSD. They are best stored on a cheap HDD, and, if desired, also work well across a network file system.
Although an initial install will keep these parts together, it is possible to, say, run the SQLite database on a fast drive but keep your media in cheap slow storage. This is an excellent arrangement that works for many users. And if you have a very large collection, you can even spread your files across multiple drives. It is not very technically difficult, but I do not recommend it for new users.
Backing such an arrangement up is obviously more complicated, and the internal client backup is not sophisticated enough to capture everything, so I recommend you figure out a broader solution with a third-party backup program like FreeFileSync.
"},{"location":"database_migration.html#pulling_media_apart","title":"pulling your media apart","text":"Danger
As always, I recommend creating a backup before you try any of this, just in case it goes wrong.
If you would like to move your files and thumbnails to new locations, I generally recommend you not move their folders around yourself--the database has an internal knowledge of where it thinks its file and thumbnail folders are, and if you move them while it is closed, it will become confused.
Missing Locations
If your folders are in the wrong locations on a client boot, a repair dialog appears, and you can manually update the client's internal understanding. This is not impossible to figure out, and in some tricky storage situations doing this on purpose can be faster than letting the client migrate things itself, but generally it is best and safest to do everything through the dialog.
Go database->migrate database, giving you this dialog:
The buttons let you add more locations and remove old ones. The operations on this dialog are simple and atomic--at no point is your db ever invalid.
Beneath db? means that the path is beneath the main db dir and so is stored internally as a relative path. Portable paths will still function if the database changes location between boots (for instance, if you run the client from a USB drive and it mounts under a different location).
Weight means the relative amount of media you would like to store in that location. It only matters if you are spreading your files across multiple locations. If location A has a weight of 1 and B has a weight of 2, A will get approximately one third of your files and B will get approximately two thirds.
Max Size means the max total size of files the client will want to store in that location. Again, it only matters if you are spreading your files across multiple locations, but it is a simple way to ensure you don't go over a particular smaller hard drive's size. One location must always be limitless. This is not precise, so give it some padding. When one location is maxed out, the remaining locations will distribute the remainder of the files according to their respective weights. For the meantime, this will not update by itself. If you import many files, the location may go over its limit and you will have to revisit 'migrate database' to rebalance your files again. Bear with me--I will fix this soon with the background migrate.
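To get a feel for how weights and max sizes interact, here is a small Python sketch of the arithmetic described above. It is my own illustration, not the client's actual algorithm: `ideal_usage` is a hypothetical function, sizes are plain numbers in whatever unit you like, and weights are assumed to be positive.

```python
def ideal_usage(total_size, locations):
    """locations maps name -> (weight, max_size or None); returns name -> ideal size."""
    ideal = {}
    active = dict(locations)
    remaining = float(total_size)
    while active:
        total_weight = sum(weight for weight, _ in active.values())
        shares = {
            name: remaining * weight / total_weight
            for name, (weight, _) in active.items()
        }
        over = [
            name
            for name, (_, max_size) in active.items()
            if max_size is not None and shares[name] > max_size
        ]
        if not over:
            # Nobody exceeds a limit, so the weighted shares stand.
            ideal.update(shares)
            break
        for name in over:
            # A maxed-out location is pinned at its limit and drops out of the split.
            max_size = active.pop(name)[1]
            ideal[name] = float(max_size)
            remaining -= max_size
    return ideal


two_locations = ideal_usage(300, {"A": (1, None), "B": (2, None)})
with_limit = ideal_usage(1028, {"A": (1, None), "B": (1, 128), "C": (2, None)})
```

In the second call, B is pinned at its limit of 128 and the remaining 900 is split between A and C by their weights of 1 and 2.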
Let's set up an example move:
I made several changes:
C:\\hydrus_files to store files.
D:\\hydrus_files to store files, with a max size of 128MB.
C:\\hydrus_thumbs as the location to store thumbnails.
C:\\Hydrus Network\\db\\client_files location.
While the ideal usage has changed significantly, note that the current usage remains the same. Nothing moves until you click 'move files now'. Moving files will take some time to finish. Once done, it looks like this:
The current and ideal usages line up, and the defunct C:\\Hydrus Network\\db\\client_files location, which no longer stores anything, is removed from the list.
A straight call to the hydrus_client executable will look for a SQLite database in install_dir/db. If one is not found, it will create one. If you move your database and then try to run the client again, it will try to create a new empty database in that old location!
To tell it about the new database location, pass it a -d or --db_dir command line argument, like so:
hydrus_client -d=\"D:\\media\\my_hydrus_database\"
hydrus_client --db_dir=\"G:\\misc documents\\New Folder (3)\\DO NOT ENTER\"
python hydrus_client.py -d=\"D:\\media\\my_hydrus_database\"
open -n -a \"Hydrus Network.app\" --args -d=\"/path/to/db\"
And it will instead use the given path. If no database is found, it will similarly create a new empty one at that location. You can use any path that is valid in your system.
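If you launch the client from a script rather than a terminal, the same argument can be passed programmatically. This Python sketch assumes `hydrus_client` is on your PATH (otherwise give the full path to the executable); `client_command` and `launch_client` are hypothetical helper names.

```python
import subprocess


def client_command(executable, db_dir):
    """Build the argument list; no shell quoting is needed when passing a list."""
    return [executable, "--db_dir=" + db_dir]


def launch_client(executable, db_dir):
    """Start the client with the given database directory and return the process handle."""
    return subprocess.Popen(client_command(executable, db_dir))


command = client_command("hydrus_client", r"D:\media\my_hydrus_database")

# Example call (starts the real program, so it is not run here):
# launch_client("hydrus_client", r"D:\media\my_hydrus_database")
```

Passing the arguments as a list means paths with spaces need no extra quotes.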
Bad Locations
Do not run a SQLite database on a network location! The database relies on clever hardware-level exclusive file locks, which network interfaces often fake. While the program may work, I cannot guarantee the database will stay non-corrupt.
Do not run a SQLite database on a location with filesystem-level compression enabled! In the best case (BTRFS), the database can suddenly get extremely slow when it hits a certain size; in the worst (NTFS), a >50GB database will encounter I/O errors and receive sporadic corruption!
Rather than typing the path out in a terminal every time you want to launch your external database, create a new shortcut with the argument in. Something like this:
Note that an install with an 'external' database no longer needs access to write to its own path, so you can store it anywhere you like, including protected read-only locations (e.g. in 'Program Files'). Just double-check your shortcuts are good.
"},{"location":"database_migration.html#finally","title":"backups","text":"If your database now lives in one or more new locations, make sure to update your backup routine to follow them!
"},{"location":"database_migration.html#to_an_ssd","title":"moving to an SSD","text":"As an example, let's say you started using the hydrus client on your HDD, and now you have an SSD available and would like to move your thumbnails and main install to that SSD to speed up the client. Your database will be valid and functional at every stage of this, and it can all be undone. The basic steps are:
Specifically:
You should now have something like this (let's say the D drive is the fast SSD, and E is the high capacity HDD):
"},{"location":"database_migration.html#multiple_clients","title":"p.s. running multiple clients","text":"Since you now know how to tell the software about an external database, you can, if you like, run multiple clients from the same install (and if you previously had multiple install folders, you can now just use the one). Just make multiple shortcuts to the same hydrus_client executable but with different database directories. They can run at the same time. You'll save yourself a little memory and update-hassle.
"},{"location":"developer_api.html","title":"API documentation","text":""},{"location":"developer_api.html#library_modules_created_by_hydrus_users","title":"Library modules created by hydrus users","text":"In general, the API deals with standard UTF-8 JSON. POST requests and 200 OK responses are generally going to be a JSON 'Object' with variable names as keys and values obviously as values. There are examples throughout this document. For GET requests, everything is in standard GET parameters, but some variables are complicated and will need to be JSON encoded and then URL encoded. An example would be the 'tags' parameter on GET /get_files/search_files, which is a list of strings. Since GET http URLs have limits on what characters are allowed, but hydrus tags can have all sorts of characters, you'll be doing this:
Your list of tags:
[ 'character:samus aran', 'creator:\u9752\u3044\u685c', 'system:height > 2000' ]\n
JSON encoded:
[\"character:samus aran\", \"creator:\\\\u9752\\\\u3044\\\\u685c\", \"system:height > 2000\"]\n
Then URL encoded:
%5B%22character%3Asamus%20aran%22%2C%20%22creator%3A%5Cu9752%5Cu3044%5Cu685c%22%2C%20%22system%3Aheight%20%3E%202000%22%5D\n
In python, converting your tag list to the URL encoded string would be:
urllib.parse.quote( json.dumps( tag_list ) )\n
Full URL path example:
/get_files/search_files?file_sort_type=6&file_sort_asc=false&tags=%5B%22character%3Asamus%20aran%22%2C%20%22creator%3A%5Cu9752%5Cu3044%5Cu685c%22%2C%20%22system%3Aheight%20%3E%202000%22%5D\n
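As a minimal sketch of the steps above, using only the Python standard library (the helper name encode_tags_param is made up for this example and is not part of any hydrus library):

```python
import json
import urllib.parse


def encode_tags_param(tag_list):
    """JSON-encode a list of tags, then percent-encode it for a GET URL."""
    return urllib.parse.quote(json.dumps(tag_list))


# The same tag list as in the example above.
tag_list = ['character:samus aran', 'creator:\u9752\u3044\u685c', 'system:height > 2000']

encoded_tags = encode_tags_param(tag_list)

# Simple values like the sort settings go in as normal GET parameters.
path = '/get_files/search_files?file_sort_type=6&file_sort_asc=false&tags=' + encoded_tags

print(path)
```

Going the other way, json.loads( urllib.parse.unquote( encoded_tags ) ) should give back the original list, which is a handy sanity check while debugging.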
The API returns JSON for everything except actual file/thumbnail requests. Every JSON response includes the version
of the Client API and hydrus_version
of the Client hosting it (for brevity, these values are not included in the example responses in this help). For errors, you'll typically get 400 for a missing/invalid parameter, 401/403/419 for missing/insufficient/expired access, and 500 for a real deal serverside error.
Note
For any request sent to the API, the total size of the initial request line (this includes the URL and any parameters) and the headers must not be larger than 2 megabytes. Exceeding this limit will cause the request to fail. Make sure to use pagination if you are passing very large JSON arrays as parameters in a GET request.
"},{"location":"developer_api.html#cbor","title":"CBOR","text":"The API now tentatively supports CBOR, which is basically 'byte JSON'. If you are in a lower level language or need to do a lot of heavy work quickly, try it out!
To send CBOR, for POST put Content-Type application/cbor
in your request header instead of application/json
, and for GET just add a cbor=1
parameter to the URL string. Use CBOR to encode any parameters that you would previously put in JSON:
For POST requests, just print the pure bytes in the body, like this:
cbor2.dumps( arg_dict )\n
For GET, encode the parameter value in base64, like this:
base64.urlsafe_b64encode( cbor2.dumps( argument ) )\n
-or- str( base64.urlsafe_b64encode( cbor2.dumps( argument ) ), 'ascii' )\n
If you send CBOR, the client will return CBOR. If you want to send CBOR and get JSON back, or vice versa (or you are uploading a file and can't set CBOR Content-Type), send the Accept request header, like so:
Accept: application/cbor\nAccept: application/json\n
If the client does not support CBOR, you'll get 406.
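A rough sketch of building a CBOR GET path with the standard library. It assumes you have already produced the CBOR bytes for each complex parameter (for instance with cbor2.dumps, as above). The helper name build_cbor_get_path is hypothetical, and the endpoint and parameter in the usage line are only there as an illustration.

```python
import base64
import urllib.parse


def build_cbor_get_path(path, cbor_params):
    """Build a GET path that sets cbor=1 and carries base64-encoded CBOR values.

    cbor_params maps parameter names to bytes that are already CBOR-encoded.
    """
    query = {'cbor': '1'}
    for name, cbor_bytes in cbor_params.items():
        # urlsafe_b64encode returns bytes, so convert to str for the URL.
        query[name] = str(base64.urlsafe_b64encode(cbor_bytes), 'ascii')
    # urlencode percent-encodes the '=' padding that base64 may add.
    return path + '?' + urllib.parse.urlencode(query)


# b'\x81\x01' is the CBOR encoding of the one-item list [1].
example_path = build_cbor_get_path('/get_files/file_metadata', {'file_ids': b'\x81\x01'})
print(example_path)
```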
"},{"location":"developer_api.html#access_and_permissions","title":"Access and permissions","text":"The client gives access to its API through different 'access keys', which are the typical 64-character hex used in many other places across hydrus. Each guarantees different permissions such as handling files or tags. Most of the time, a user will provide full access, but do not assume this. If the access header or parameter is not provided, you will get 401, and all insufficient permission problems will return 403 with appropriate error text.
Access is required for every request. You can provide this as an http header, like so:
Hydrus-Client-API-Access-Key : 0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae\n
Or you can include it in the normal parameters of any request (except POST /add_files/add_file, which uses the entire POST body for the file's bytes). For GET, this means including it in the URL parameters:
/get_files/thumbnail?file_id=452158&Hydrus-Client-API-Access-Key=0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae\n
For POST, this means in the JSON body parameters, like so:
{\n \"hash_id\" : 123456,\n \"Hydrus-Client-API-Access-Key\" : \"0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae\"\n}\n
There is also a simple 'session' system, where you can get a temporary key that gives the same access without having to include the permanent access key in every request. You can fetch a session key with the /session_key command and thereafter use it just as you would an access key, just with Hydrus-Client-API-Session-Key instead.
Session keys will expire if they are not used within 24 hours, or if the client is restarted, or if the underlying access key is deleted. An invalid/expired session key will give a 419 result with an appropriate error text.
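A small sketch of choosing the right header for a request. The two header names are the ones given above; the function name auth_headers and the 'prefer the session key' rule are just assumptions for this example.

```python
ACCESS_KEY_HEADER = 'Hydrus-Client-API-Access-Key'
SESSION_KEY_HEADER = 'Hydrus-Client-API-Session-Key'


def auth_headers(access_key=None, session_key=None):
    """Return the header dict for one request, preferring a session key if given."""
    if session_key is not None:
        return {SESSION_KEY_HEADER: session_key}
    if access_key is not None:
        return {ACCESS_KEY_HEADER: access_key}
    # With neither key the client would answer 401, so fail early instead.
    raise ValueError('an access key or a session key is required')


example_headers = auth_headers(
    access_key='0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae'
)
print(example_headers)
```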
Bear in mind the Client API is still under construction. Setting up the Client API to be accessible across the internet requires technical experience to be convenient. HTTPS is available for encrypted comms, but the default certificate is self-signed (which basically means an eavesdropper can't see through it, but your ISP/government could if they decided to target you). If you have your own domain to host from and an SSL cert, you can replace them and it'll use them instead (check the db directory for client.crt and client.key). Otherwise, be careful about transmitting sensitive content outside of your localhost/network.
"},{"location":"developer_api.html#common_complex_parameters","title":"Common Complex Parameters","text":""},{"location":"developer_api.html#parameters_files","title":"files","text":"If you need to refer to some files, you can use any of the following:
Arguments:
file_id
: (selective, a numerical file id)
file_ids
: (selective, a list of numerical file ids)
hash
: (selective, a hexadecimal SHA256 hash)
hashes
: (selective, a list of hexadecimal SHA256 hashes)
In GET requests, make sure any list is percent-encoded.
"},{"location":"developer_api.html#parameters_file_domain","title":"file domain","text":"When you are searching, you may want to specify a particular file domain. Most of the time, you'll want to just set file_service_key
, but this can get complex:
file_service_key
: (optional, selective A, hexadecimal, the file domain on which to search)
file_service_keys
: (optional, selective A, list of hexadecimals, the union of file domains on which to search)
deleted_file_service_key
: (optional, selective B, hexadecimal, the 'deleted from this file domain' on which to search)
deleted_file_service_keys
: (optional, selective B, list of hexadecimals, the union of 'deleted from this file domain' on which to search)
The service keys are as in /get_services.
Hydrus supports two concepts here:
You can play around with this yourself by clicking 'multiple locations' in the client with help->advanced mode on.
In extreme edge cases, these two can be mixed by populating both A and B selective, making a larger union of both current and deleted file records.
Please note that unions can be very very computationally expensive. If you can achieve what you want with a single file_service_key, two queries in a row with different service keys, or an umbrella like all my files
or all local files
, please do. Otherwise, let me know what is running slow and I'll have a look at it.
'deleted from all local files' includes all files that have been physically deleted (i.e. deleted from the trash) and not available any more for fetch file/thumbnail requests. 'deleted from all my files' includes all of those physically deleted files and the trash. If a file is deleted with the special 'do not leave a deletion record' command, then it won't show up in a 'deleted from file domain' search!
'all known files' is a tricky domain. It converts much of the search tech to ignore where files actually are and look at the accompanying tag domain (e.g. all the files that have been tagged), and can sometimes be very expensive.
Also, if you have the option to set both file and tag domains, you cannot enter 'all known files'/'all known tags'. It is too complicated to support, sorry!
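Here is a hedged sketch of turning the file domain parameters above into a GET query string. The parameter names are the ones listed in this section; the helper file_domain_query and its arguments are made up for illustration, and it assumes list values are JSON-encoded and then percent-encoded like other complex GET parameters.

```python
import json
import urllib.parse


def file_domain_query(current_keys=(), deleted_keys=()):
    """Build the file domain part of a GET query string.

    current_keys fills selective A, deleted_keys fills selective B.
    """
    params = {}
    if len(current_keys) == 1:
        params['file_service_key'] = current_keys[0]
    elif len(current_keys) > 1:
        # Lists are JSON-encoded here and percent-encoded by urlencode below.
        params['file_service_keys'] = json.dumps(list(current_keys))
    if len(deleted_keys) == 1:
        params['deleted_file_service_key'] = deleted_keys[0]
    elif len(deleted_keys) > 1:
        params['deleted_file_service_keys'] = json.dumps(list(deleted_keys))
    return urllib.parse.urlencode(params, quote_via=urllib.parse.quote)


# Example service keys for 'my files' and 'trash'.
MY_FILES = '6c6f63616c2066696c6573'
TRASH = '7472617368'

single = file_domain_query(current_keys=[MY_FILES])
union = file_domain_query(current_keys=[MY_FILES, TRASH])
print(single)
print(union)
```

Since unions can be expensive, a script like this would normally stick to the single-key form unless it really needs more.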
"},{"location":"developer_api.html#legacy_service_name_parameters","title":"legacy service_name parameters","text":"The Client API used to respond to name-based service identifiers, for instance using 'my tags' instead of something like '6c6f63616c2074616773'. Service names can change, and they aren't strictly unique either, so I have moved away from them, but there is some soft legacy support.
The client will attempt to convert any of these to their 'service_key(s)' equivalents:
But I strongly encourage you to move away from them as soon as reasonably possible. Look up the service keys you need with /get_service or /get_services.
If you have a clever script/program that does many things, then hit up /get_services on session initialisation and cache an internal map of key_to_name for the labels to use when you present services to the user.
Also, note that all users can now copy their service keys from review services.
"},{"location":"developer_api.html#services_object","title":"The Services Object","text":"Hydrus manages its different available domains and actions with what it calls services. If you are a regular user of the program, you will know about review services and manage services. The Client API needs to refer to services, either to accept commands from you or to tell you what metadata files have and where.
When it does this, it gives you this structure, typically under a services
key right off the root node:
{\n \"6c6f63616c2074616773\" : {\n \"name\" : \"my tags\",\n \"type\": 5,\n \"type_pretty\" : \"local tag service\"\n },\n \"5674450950748cfb28778b511024cfbf0f9f67355cf833de632244078b5a6f8d\" : {\n \"name\" : \"example tag repo\",\n \"type\" : 0,\n \"type_pretty\" : \"hydrus tag repository\"\n },\n \"6c6f63616c2066696c6573\" : {\n \"name\" : \"my files\",\n \"type\" : 2,\n \"type_pretty\" : \"local file domain\"\n },\n \"7265706f7369746f72792075706461746573\" : {\n \"name\" : \"repository updates\",\n \"type\" : 20,\n \"type_pretty\" : \"local update file domain\"\n },\n \"ae7d9a603008919612894fc360130ae3d9925b8577d075cd0473090ac38b12b6\" : {\n \"name\": \"example file repo\",\n \"type\" : 1,\n \"type_pretty\" : \"hydrus file repository\"\n },\n \"616c6c206c6f63616c2066696c6573\" : {\n \"name\" : \"all local files\",\n \"type\": 15,\n \"type_pretty\" : \"virtual combined local file service\"\n },\n \"616c6c206c6f63616c206d65646961\" : {\n \"name\" : \"all my files\",\n \"type\" : 21,\n \"type_pretty\" : \"virtual combined local media service\"\n },\n \"616c6c206b6e6f776e2066696c6573\" : {\n \"name\" : \"all known files\",\n \"type\" : 11,\n \"type_pretty\" : \"virtual combined file service\"\n },\n \"616c6c206b6e6f776e2074616773\" : {\n \"name\" : \"all known tags\",\n \"type\": 10,\n \"type_pretty\" : \"virtual combined tag service\"\n },\n \"74d52c6238d25f846d579174c11856b1aaccdb04a185cb2c79f0d0e499284f2c\" : {\n \"name\" : \"example local rating like service\",\n \"type\" : 7,\n \"type_pretty\" : \"local like/dislike rating service\",\n \"star_shape\" : \"circle\"\n },\n \"90769255dae5c205c975fc4ce2efff796b8be8a421f786c1737f87f98187ffaf\" : {\n \"name\" : \"example local rating numerical service\",\n \"type\" : 6,\n \"type_pretty\" : \"local numerical rating service\",\n \"star_shape\" : \"fat star\",\n \"min_stars\" : 1,\n \"max_stars\" : 5\n },\n \"b474e0cbbab02ca1479c12ad985f1c680ea909a54eb028e3ad06750ea40d4106\" : {\n \"name\" : \"example local rating inc/dec service\",\n \"type\" : 22,\n \"type_pretty\" : \"local inc/dec rating service\"\n },\n \"7472617368\" : {\n \"name\" : \"trash\",\n \"type\" : 14,\n \"type_pretty\" : \"local trash file domain\"\n }\n}\n
I hope you recognise some of the information here. But what's that hex key on each section? It is the service_key
.
All services have these properties:
name
- A mutable human-friendly name like 'my tags'. You can use this to present the service to the user--they should recognise it.
type
- An integer enum saying whether the service is a local tag service or like/dislike rating service or whatever. This cannot change.
service_key
- The true 'id' of the service. It is a string of hex, sometimes just twenty or so characters but in many cases 64 characters. This cannot change, and it is how we will refer to different services.
This service_key
is important. A user can rename their services, so name
is not an excellent identifier, and definitely not something you should save to any permanent config file.
If we want to search some files on a particular file and tag domain, we should expect to be saying something like file_service_key=6c6f63616c2066696c6573
and tag_service_key=f032e94a38bb9867521a05dc7b189941a9c65c25048911f936fc639be2064a4b
somewhere in the request.
You won't see all of these, but the service type
enum is:
type_pretty
is something you can show users. Hydrus uses the same labels in manage services and so on.
Rating services now have some extra data:
star_shape
, which is one of circle | square | fat star | pentagram star
min_stars
(0 or 1) and max_stars
(1 to 20)
If you are displaying ratings, don't feel crazy obligated to obey the shape! Show a \u2158, select from a dropdown list, do whatever you like!
If you want to know the services in a client, hit up /get_services, which simply gives the above. The same structure has recently been added to /get_files/file_metadata for convenience, since that refers to many different services when it is talking about file locations and ratings and so on.
Note: If you need to do some quick testing, you should be able to copy the service_key
of any service by hitting the 'copy service key' button in review services.
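If you want to cache the Services Object on your side, a minimal sketch might look like this. It takes the already-decoded JSON dict and builds two lookup tables. The function name build_service_maps is an assumption, and the sample data is a cut-down version modelled on the example above.

```python
def build_service_maps(services):
    """Index a Services Object by service key and by service type.

    services is the decoded JSON dict of service_key -> service info.
    """
    key_to_name = {}
    type_to_keys = {}
    for service_key, info in services.items():
        key_to_name[service_key] = info['name']
        type_to_keys.setdefault(info['type'], []).append(service_key)
    return key_to_name, type_to_keys


# A cut-down Services Object for demonstration.
services = {
    '6c6f63616c2074616773': {'name': 'my tags', 'type': 5, 'type_pretty': 'local tag service'},
    '6c6f63616c2066696c6573': {'name': 'my files', 'type': 2, 'type_pretty': 'local file domain'},
    '7472617368': {'name': 'trash', 'type': 14, 'type_pretty': 'local trash file domain'},
}

key_to_name, type_to_keys = build_service_maps(services)
print(key_to_name['6c6f63616c2074616773'])
```

Store the service_key in any config you write, and only use the name from this map when labelling things for the user.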
/api_version
","text":"Gets the current API version. This increments every time I alter the API.
Restricted access: NO.
Required Headers: n/a
Arguments: n/a
Response: Some simple JSON describing the current api version (and hydrus client version, if you are interested). Note that this is not very useful any more, for two reasons:{\n \"version\" : 17,\n \"hydrus_version\" : 441\n}\n
"},{"location":"developer_api.html#request_new_permissions","title":"GET /request_new_permissions
","text":"Register a new external program with the client. This requires the 'add from api request' mini-dialog under services->review services to be open, otherwise it will 403.
Restricted access: NO.
Required Headers: n/a
Arguments:
name
: (descriptive name of your access)
basic_permissions
: A JSON-encoded list of numerical permission identifiers you want to request.
The permissions are currently:
/request_new_permissions?name=my%20import%20script&basic_permissions=[0,1]\n
Response: Some JSON with your access key, which is 64 characters of hex. This will not be valid until the user approves the request in the client ui. Example response{\n \"access_key\" : \"73c9ab12751dcf3368f028d3abbe1d8e2a3a48d0de25e64f3a8f00f3a1424c57\"\n}\n
"},{"location":"developer_api.html#session_key","title":"GET /session_key
","text":"Get a new session key.
Restricted access: YES. No permissions required.
Required Headers: n/a
Arguments: n/a
Response: Some JSON with a new session key in hex. Example response{\n \"session_key\" : \"f6e651e7467255ade6f7c66050f3d595ff06d6f3d3693a3a6fb1a9c2b278f800\"\n}\n
Note
Note that the access you provide to get a new session key can be a session key, if that happens to be useful. As long as you have some kind of access, you can generate a new session key.
A session key expires after 24 hours of inactivity, whenever the client restarts, or if the underlying access key is deleted. A request on an expired session key returns 419.
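One way to cope with expiry is to fetch a fresh session key whenever a request comes back 419. The sketch below keeps the network code out of the picture: fetch_session_key and do_request are callables you would supply yourself, and the class name SessionKeyHolder is invented for this example.

```python
class SessionKeyHolder:
    """Keep one session key and replace it when the client reports 419.

    fetch_session_key is any zero-argument callable that returns a fresh key,
    for instance by calling /session_key with the permanent access key.
    """

    def __init__(self, fetch_session_key):
        self._fetch_session_key = fetch_session_key
        self._session_key = None

    def call(self, do_request):
        """Run do_request(session_key) -> (status, body), retrying once on 419."""
        if self._session_key is None:
            self._session_key = self._fetch_session_key()
        status, body = do_request(self._session_key)
        if status == 419:
            # The key expired or the client restarted, so get a new one and retry.
            self._session_key = self._fetch_session_key()
            status, body = do_request(self._session_key)
        return status, body
```

Retrying only once keeps a genuinely broken access key from causing an endless loop.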
"},{"location":"developer_api.html#verify_access_key","title":"GET/verify_access_key
","text":"Check your access key is valid.
Restricted access: YES. No permissions required.
Required Headers: n/a
Arguments: n/a
Response: 401/403/419 and some error text if the provided access/session key is invalid, otherwise some JSON with basic permission info. Example response{\n \"basic_permissions\" : [0, 1, 3],\n \"human_description\" : \"API Permissions (autotagger): add tags to files, import files, search for files: Can search: only autotag this\"\n}\n
"},{"location":"developer_api.html#get_service","title":"GET /get_service
","text":"Ask the client about a specific service.
Restricted access: YES. At least one of Add Files, Add Tags, Manage Pages, or Search Files permission needed.
Required Headers: n/a
Arguments:
service_name
: (selective, string, the name of the service)
service_key
: (selective, hex string, the service key of the service)
/get_service?service_name=my%20tags\n/get_service?service_key=6c6f63616c2074616773\n
Response: Some JSON about the service. A similar format as /get_services and The Services Object. Example response{\n \"service\" : {\n \"name\" : \"my tags\",\n \"service_key\" : \"6c6f63616c2074616773\",\n \"type\" : 5,\n \"type_pretty\" : \"local tag service\"\n }\n}\n
If the service does not exist, this gives 404. It is very unlikely but edge-case possible that two services will have the same name; in this case you'll get the pseudorandom first.
It will only respond to services in the /get_services list. I will expand the available types in future as we add ratings etc... to the Client API.
"},{"location":"developer_api.html#get_services","title":"GET /get_services
","text":"Ask the client about its services.
Restricted access: YES. At least one of Add Files, Add Tags, Manage Pages, or Search Files permission needed.
Required Headers: n/a
Arguments: n/a
Response: Some JSON listing the client's services. Example response{\n \"services\" : \"The Services Object\"\n}\n
This now primarily uses The Services Object.
Note
If you do the request and look at the actual response, you will see a lot more data under different keys--this is deprecated, and will be deleted in 2024. If you use the old structure, please move over!
"},{"location":"developer_api.html#importing_and_deleting_files","title":"Importing and Deleting Files","text":""},{"location":"developer_api.html#add_files_add_file","title":"POST /add_files/add_file
","text":"Tell the client to import a file.
Restricted access: YES. Import Files permission needed.
Required Headers:
Content-Type
: application/json
(if sending path), application/octet-stream
(if sending file)
Arguments (in JSON):
path
: (the path you want to import)
{\n \"path\" : \"E:\\\\to_import\\\\ayanami.jpg\"\n}\n
Arguments (as bytes): You can alternately just send the file's bytes as the POST body. Response: Some JSON with the import result. Please note that file imports for large files may take several seconds, and longer if the client is busy doing other db work, so make sure your request is willing to wait that long for the response. Example response
{\n \"status\" : 1,\n \"hash\" : \"29a15ad0c035c0a0e86e2591660207db64b10777ced76565a695102a481c3dd1\",\n \"note\" : \"\"\n}\n
status
is:
A file 'veto' is caused by the file import options (which in this case is the 'quiet' set under the client's options->importing) stopping the file due to its resolution or minimum file size rules, etc...
'hash' is the file's SHA256 hash in hexadecimal, and 'note' is any additional human-readable text appropriate to the file status that you may recognise from hydrus's normal import workflow. For an outright import error, it will be a summary of the exception that you can present to the user, and a new field traceback
will have the full trace for debugging purposes.
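The two ways of calling this can be sketched as below. This only builds the headers and body, so you can hand them to whatever HTTP library you prefer. The function name build_add_file_request is hypothetical, and it assumes the two content types above are sent in the usual Content-Type header.

```python
import json


def build_add_file_request(access_key, path=None, file_bytes=None):
    """Return (headers, body) for POST /add_files/add_file.

    Pass exactly one of path (the client reads the file from its own disk)
    or file_bytes (the file itself is sent as the POST body).
    """
    if (path is None) == (file_bytes is None):
        raise ValueError('provide exactly one of path or file_bytes')
    # The access key goes in a header because the body may be the raw file.
    headers = {'Hydrus-Client-API-Access-Key': access_key}
    if path is not None:
        headers['Content-Type'] = 'application/json'
        body = json.dumps({'path': path}).encode('utf-8')
    else:
        headers['Content-Type'] = 'application/octet-stream'
        body = file_bytes
    return headers, body


headers, body = build_add_file_request(
    '0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae',
    path='E:\\to_import\\ayanami.jpg',
)
print(headers['Content-Type'], body)
```

Since large imports can take several seconds, give the eventual HTTP call a generous timeout.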
/add_files/delete_files
","text":"Tell the client to send files to the trash.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
: application/json
reason
: (optional, string, the reason attached to the delete action){\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
Response: 200 and no content. If you specify a file service, the file will only be deleted from that location. Only local file domains are allowed (so you can't delete from a file repository or unpin from ipfs yet). It defaults to 'all my files', which will delete from all local services (i.e. force sending to trash). Sending 'all local files' on a file already in the trash will trigger a physical file delete.
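A hedged sketch of building the request body for a delete. The helper name build_delete_files_body is made up, and it assumes the file domain is passed with the usual file_service_key parameter described under Common Complex Parameters.

```python
import json


def build_delete_files_body(hashes, reason=None, file_service_key=None):
    """Return the JSON body bytes for POST /add_files/delete_files."""
    hashes = list(hashes)
    if not hashes:
        raise ValueError('at least one hash is required')
    body = {}
    if len(hashes) == 1:
        body['hash'] = hashes[0]
    else:
        body['hashes'] = hashes
    if reason is not None:
        body['reason'] = reason
    if file_service_key is not None:
        # Assumption: the file domain uses the common file_service_key parameter.
        body['file_service_key'] = file_service_key
    return json.dumps(body).encode('utf-8')


delete_body = build_delete_files_body(
    ['78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538'],
    reason='duplicate found by my script',
)
print(delete_body)
```

Leaving file_service_key out keeps the default behaviour described above.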
"},{"location":"developer_api.html#add_files_undelete_files","title":"POST/add_files/undelete_files
","text":"Tell the client to pull files back out of the trash.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
: application/json{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
Response: 200 and no content. You can use hash or hashes, whichever is more convenient.
This is the reverse of a delete_files--removing files from trash and putting them back where they came from. If you specify a file service, the files will only be undeleted to there (if they have a delete record, otherwise this is nullipotent). The default, 'all my files', undeletes to all local file services for which there are deletion records. There is no error if any of the files do not currently exist in 'trash'.
"},{"location":"developer_api.html#add_files_archive_files","title":"POST /add_files/archive_files
","text":"Tell the client to archive inboxed files.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
: application/json{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
Response: 200 and no content. This puts files in the 'archive', taking them out of the inbox. It only has meaning for files currently in 'my files' or 'trash'. There is no error if any files do not currently exist or are already in the archive.
"},{"location":"developer_api.html#add_files_unarchive_files","title":"POST /add_files/unarchive_files
","text":"Tell the client to re-inbox archived files.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
: application/json{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
Response: 200 and no content. This puts files back in the inbox, taking them out of the archive. It only has meaning for files currently in 'my files' or 'trash'. There is no error if any files do not currently exist or are already in the inbox.
"},{"location":"developer_api.html#add_files_generate_hashes","title":"POST /add_files/generate_hashes
","text":"Generate hashes for an arbitrary file.
Restricted access: YES. Import Files permission needed.
Required Headers:
Content-Type
: application/json
(if sending path), application/octet-stream
(if sending file)
Arguments (in JSON):
path
: (the path you want to import)
{\n \"path\" : \"E:\\\\to_import\\\\ayanami.jpg\"\n}\n
Arguments (as bytes): You can alternately just send the file's bytes as the POST body. Response: Some JSON with the hashes of the file. Example response
{\n \"hash\": \"7de421a3f9be871a7037cca8286b149a31aecb6719268a94188d76c389fa140c\",\n \"perceptual_hashes\": [\n \"b44dc7b24dcb381c\"\n ],\n \"pixel_hash\": \"c7bf20e5c4b8a524c2c3e3af2737e26975d09cba2b3b8b76341c4c69b196da4e\"\n}\n
hash
is the sha256 hash of the submitted file.
perceptual_hashes
is a list of perceptual hashes for the file.
pixel_hash
is the sha256 hash of the pixel data of the rendered image.
hash
will always be returned for any file; the others will only be returned for filetypes they can be generated for.
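Since hash is a plain SHA256 of the file's bytes, you can reproduce it locally with Python's hashlib, for example to double-check a response. The function names here are just for illustration.

```python
import hashlib


def sha256_hex(file_bytes):
    """Return the SHA256 of some file bytes as lowercase hexadecimal."""
    return hashlib.sha256(file_bytes).hexdigest()


def matches_reported_hash(file_bytes, response):
    """Compare a locally computed hash with the 'hash' field of a response dict."""
    return sha256_hex(file_bytes) == response['hash'].lower()


print(sha256_hex(b'hello'))
```

The perceptual and pixel hashes depend on hydrus's own image rendering, so this sketch does not try to reproduce them.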
/add_urls/get_url_files
","text":"Ask the client about a URL's files.
Restricted access: YES. Import URLs permission needed.
Required Headers: n/a
Arguments:
url
: (the url you want to ask about)
doublecheck_file_system
: true or false (optional, defaults False)
http://safebooru.org/index.php?page=post&s=view&id=2753608
: /add_urls/get_url_files?url=http%3A%2F%2Fsafebooru.org%2Findex.php%3Fpage%3Dpost%26s%3Dview%26id%3D2753608\n
Response: Some JSON describing which files are known to be mapped to that URL. Note this needs a database hit, so it may be delayed if the client is otherwise busy. Don't rely on this to always be fast. Example response{\n \"normalised_url\" : \"https://safebooru.org/index.php?id=2753608&page=post&s=view\",\n \"url_file_statuses\" : [\n {\n \"status\" : 2,\n \"hash\" : \"20e9002824e5e7ffc240b91b6e4a6af552b3143993c1778fd523c30d9fdde02c\",\n \"note\" : \"url recognised: Imported at 2015/10/18 10:58:01, which was 3 years 4 months ago (before this check).\"\n }\n ]\n}\n
The url_file_statuses
is a list of zero-to-n JSON Objects, each representing a file match the client found in its database for the URL. Typically, it will be of length 0 (for as-yet-unvisited URLs or Gallery/Watchable URLs that are not attached to files) or 1, but sometimes multiple files are given the same URL (sometimes by mistaken misattribution, sometimes by design, such as pixiv manga pages). Handling n files per URL is a pain but an unavoidable issue you should account for.
status
is the same as for /add_files/add_file
:
hash
is the file's SHA256 hash in hexadecimal, and 'note' is some occasional additional human-readable text you may recognise from hydrus's normal import workflow.
If you set doublecheck_file_system
to true
, then any result that is 'already in db' (2) will be double-checked against the actual file system. This check happens on any normal file import process, just to check for and fix missing files (if the file is missing, the status becomes 0--new), but the check can take more than a few milliseconds on an HDD or a network drive, so the default behaviour, assuming you mostly just want to spam for 'seen this before' file statuses, is to not do it.
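Because url_file_statuses can hold zero, one, or many entries, it is safest to treat it as a list every time. A small sketch, assuming the response has already been decoded from JSON into a dict (the function name is made up, and only the 'already in db' value of 2 mentioned above is relied on):

```python
ALREADY_IN_DB = 2  # the 'already in db' status value mentioned above


def hashes_already_in_db(response):
    """Return the hashes of every file match the client already has for the URL."""
    return [
        match['hash']
        for match in response.get('url_file_statuses', [])
        if match['status'] == ALREADY_IN_DB
    ]


example_response = {
    'normalised_url': 'https://safebooru.org/index.php?id=2753608&page=post&s=view',
    'url_file_statuses': [
        {
            'status': 2,
            'hash': '20e9002824e5e7ffc240b91b6e4a6af552b3143993c1778fd523c30d9fdde02c',
            'note': 'url recognised',
        }
    ],
}

print(hashes_already_in_db(example_response))
```

An empty list here simply means the client has no file it considers imported for that URL.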
/add_urls/get_url_info
","text":"Ask the client for information about a URL.
Restricted access: YES. Import URLs permission needed.
Required Headers: n/a
Arguments:
url
: (the url you want to ask about)
https://8ch.net/tv/res/1846574.html
: /add_urls/get_url_info?url=https%3A%2F%2F8ch.net%2Ftv%2Fres%2F1846574.html\n
Response: Some JSON describing what the client thinks of the URL. Example response
{\n \"normalised_url\" : \"https://8ch.net/tv/res/1846574.html\",\n \"url_type\" : 4,\n \"url_type_string\" : \"watchable url\",\n \"match_name\" : \"8chan thread\",\n \"can_parse\" : true\n}\n
The url types are currently:
'Unknown' URLs are treated in the client as direct File URLs. Even though the 'File URL' type is available, most file urls do not have a URL Class, so they will appear as Unknown. Adding them to the client will pass them to the URL Downloader as a raw file for download and import.
"},{"location":"developer_api.html#add_urls_add_url","title":"POST /add_urls/add_url
","text":"Tell the client to 'import' a URL. This triggers the exact same routine as drag-and-dropping a text URL onto the main client window.
Restricted access: YES. Import URLs permission needed. Add Tags needed to include tags. Required Headers:Content-Type
: application/json
url
: (the url you want to add)
destination_page_key
: (optional page identifier for the page to receive the url)
destination_page_name
: (optional page name to receive the url)
show_destination_page
: (optional, defaulting to false, controls whether the UI will change pages on add)
service_keys_to_additional_tags
: (optional, selective, tags to give to any files imported from this url)
filterable_tags
: (optional tags to be filtered by any tag import options that applies to the URL)
If you specify a destination_page_name
and an appropriate importer page already exists with that name, that page will be used. Otherwise, a new page with that name will be created (and used by subsequent calls with that name). Make sure that page name is unique (e.g. '/b/ threads', not 'watcher') in your client, or it may not be found.
Alternately, destination_page_key
defines exactly which page should be used. Bear in mind this page key is only valid to the current session (they are regenerated on client reset or session reload), so you must figure out which one you want using the /manage_pages/get_pages call. If the correct page_key is not found, or the page it corresponds to is of the incorrect type, the standard page selection/creation rules will apply.
show_destination_page
defaults to False to reduce flicker when adding many URLs to different pages quickly. If you turn it on, the client will behave like a URL drag and drop and select the final page the URL ends up on.
service_keys_to_additional_tags
uses the same data structure as in /add_tags/add_tags--service keys to a list of tags to add. You will need 'add tags' permission or this will 403. These tags work exactly as 'additional' tags work in a tag import options. They are service specific, and always added unless some advanced tag import options checkbox (like 'only add tags to new files') is set.
filterable_tags works like the tags parsed by a hydrus downloader. It is just a list of strings. They have no inherent service and will be sent to a tag import options, if one exists, to decide which tag services get what. This parameter is useful if you are pulling all a URL's tags outside of hydrus and want to have them processed like any other downloader, rather than figuring out service names and namespace filtering on your end. Note that in order for a tag import options to kick in, I think you will have to have a Post URL URL Class hydrus-side set up for the URL so some tag import options (whether that is Class-specific or just the default) can be loaded at import time.
Example request body
{\n \"url\" : \"https://8ch.net/tv/res/1846574.html\",\n \"destination_page_name\" : \"kino zone\",\n \"service_keys_to_additional_tags\" : {\n \"6c6f63616c2074616773\" : [\"as seen on /tv/\"]\n }\n}\n
Example request body{\n \"url\" : \"https://safebooru.org/index.php?page=post&s=view&id=3195917\",\n \"filterable_tags\" : [\n \"1girl\",\n \"artist name\",\n \"creator:azto dio\",\n \"blonde hair\",\n \"blue eyes\",\n \"breasts\",\n \"character name\",\n \"commentary\",\n \"english commentary\",\n \"formal\",\n \"full body\",\n \"glasses\",\n \"gloves\",\n \"hair between eyes\",\n \"high heels\",\n \"highres\",\n \"large breasts\",\n \"long hair\",\n \"long sleeves\",\n \"looking at viewer\",\n \"series:metroid\",\n \"mole\",\n \"mole under mouth\",\n \"patreon username\",\n \"ponytail\",\n \"character:samus aran\",\n \"solo\",\n \"standing\",\n \"suit\",\n \"watermark\"\n ]\n}\n
Response: Some JSON with info on the URL added. Example response{\n \"human_result_text\" : \"\\\"https://8ch.net/tv/res/1846574.html\\\" URL added successfully.\",\n \"normalised_url\" : \"https://8ch.net/tv/res/1846574.html\"\n}\n
"},{"location":"developer_api.html#add_urls_associate_url","title":"POST /add_urls/associate_url
","text":"Manage which URLs the client considers to be associated with which files.
Restricted access: YES. Import URLs permission needed. Required Headers:Content-Type
: application/json
url_to_add
: (optional, selective A, an url you want to associate with the file(s))urls_to_add
: (optional, selective A, a list of urls you want to associate with the file(s))url_to_delete
: (optional, selective B, a URL you want to disassociate from the file(s))urls_to_delete
: (optional, selective B, a list of urls you want to disassociate from the file(s))The single/multiple arguments work the same--just use whatever is convenient for you. Unless you really know what you are doing with URL Classes, I strongly recommend you stick to associating URLs with just one single 'hash' at a time. Multiple hashes pointing to the same URL is unusual and frequently unhelpful. Example request body
{\n \"url_to_add\" : \"https://rule34.xxx/index.php?id=2588418&page=post&s=view\",\n \"hash\" : \"3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba\"\n}\n
Response: 200 with no content. Like when adding tags, this is safely idempotent--do not worry about re-adding URL associations that already exist or accidentally trying to delete ones that don't."},{"location":"developer_api.html#editing_file_tags","title":"Editing File Tags","text":""},{"location":"developer_api.html#add_tags_clean_tags","title":"GET /add_tags/clean_tags
","text":"Ask the client about how it will see certain tags.
Restricted access: YES. Add Tags permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):tags
: (a list of the tags you want cleaned)[ \" bikini \", \"blue eyes\", \" character : samus aran \", \" :)\", \" \", \"\", \"10\", \"11\", \"9\", \"system:wew\", \"-flower\" ]
: /add_tags/clean_tags?tags=%5B%22%20bikini%20%22%2C%20%22blue%20%20%20%20eyes%22%2C%20%22%20character%20%3A%20samus%20aran%20%22%2C%20%22%3A%29%22%2C%20%22%20%20%20%22%2C%20%22%22%2C%20%2210%22%2C%20%2211%22%2C%20%229%22%2C%20%22system%3Awew%22%2C%20%22-flower%22%5D\n
Response: The tags cleaned according to hydrus rules. They will also be in hydrus human-friendly sorting order. Example response
{\n \"tags\" : [\"9\", \"10\", \"11\", \" ::)\", \"bikini\", \"blue eyes\", \"character:samus aran\", \"flower\", \"wew\"]\n}\n
Mostly, hydrus simply trims excess whitespace, but the other examples are rare issues you might run into. 'system' is an invalid namespace, tags cannot be prefixed with hyphens, and any tag starting with ':' is secretly dealt with internally as \"[no namespace]:[colon-prefixed-subtag]\". Again, you probably won't run into these, but if you see a mismatch somewhere and want to figure it out, or just want to sort some numbered tags, you might like to try this.
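If you are unsure how to produce the percent-encoded JSON shown in the example request, the following Python sketch is one way to do it with the standard library. The helper name encode_json_arg is made up for this example; it is an illustration, not the only valid encoding.

```python
import json
from urllib.parse import quote

def encode_json_arg(value):
    # Serialise to JSON first, then percent-encode every reserved character
    # (safe='' means even '/' is encoded).
    return quote(json.dumps(value), safe='')

tags = [' bikini ', 'blue    eyes', '-flower']
path = '/add_tags/clean_tags?tags=' + encode_json_arg(tags)
```

The resulting path can be appended to your client api base URL. Decoding it again with urllib.parse.unquote and json.loads should give back the original list.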
"},{"location":"developer_api.html#add_tags_get_siblings_and_parents","title":"GET/add_tags/get_siblings_and_parents
","text":"Ask the client about tags' sibling and parent relationships.
Restricted access: YES. Add Tags permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):tags
: (a list of the tags you want info on)[ \"blue eyes\", \"samus aran\" ]
: /add_tags/get_siblings_and_parents?tags=%5B%22blue%20eyes%22%2C%20%22samus%20aran%22%5D\n
Response: An Object showing all the display relationships for each tag on each service. Also The Services Object. Example response
{\n \"services\" : \"The Services Object\",\n \"tags\" : {\n \"blue eyes\" : {\n \"6c6f63616c2074616773\" : {\n \"ideal_tag\" : \"blue eyes\",\n \"siblings\" : [\n \"blue eyes\",\n \"blue_eyes\",\n \"blue eye\",\n \"blue_eye\"\n ],\n \"descendants\" : [],\n \"ancestors\" : []\n },\n \"877bfcf81f56e7e3e4bc3f8d8669f92290c140ba0acfd6c7771c5e1dc7be62d7\": {\n \"ideal_tag\" : \"blue eyes\",\n \"siblings\" : [\n \"blue eyes\"\n ],\n \"descendants\" : [],\n \"ancestors\" : []\n }\n },\n \"samus aran\" : {\n \"6c6f63616c2074616773\" : {\n \"ideal_tag\" : \"character:samus aran\",\n \"siblings\" : [\n \"samus aran\",\n \"samus_aran\",\n \"character:samus aran\"\n ],\n \"descendants\" : [\n \"character:samus aran (zero suit)\",\n \"cosplay:samus aran\"\n ],\n \"ancestors\" : [\n \"series:metroid\",\n \"studio:nintendo\"\n ]\n },\n \"877bfcf81f56e7e3e4bc3f8d8669f92290c140ba0acfd6c7771c5e1dc7be62d7\": {\n \"ideal_tag\" : \"samus aran\",\n \"siblings\" : [\n \"samus aran\"\n ],\n \"descendants\" : [\n \"zero suit samus\",\n \"samus_aran_(cosplay)\"\n ],\n \"ancestors\" : []\n }\n }\n }\n}\n
This data is essentially how mappings in the storage
tag_display_type
become display
.
The hex keys are the service keys, which you will have seen elsewhere, like GET /get_files/file_metadata. Note that there is no concept of 'all known tags' here. If a tag is in 'my tags', it follows the rules of 'my tags', and then all the services' display tags are merged into the 'all known tags' pool for user display.
Also, the siblings and parents here are not just what is in tags->manage tag siblings/parents, they are the final computed combination of rules as set in tags->manage where tag siblings and parents apply. The data given here is not guaranteed to be useful for editing siblings and parents on a particular service. That data, which is currently pair-based, will appear in a different API request in future.
ideal_tag
is how the tag appears in normal display to the user.siblings
is every tag that will show as the ideal_tag
, including the ideal_tag
itself.descendants
is every child (and recursive grandchild, great-grandchild...) that implies the ideal_tag
.ancestors
is every parent (and recursive grandparent, great-grandparent...) that our tag implies.Every descendant and ancestor is an ideal_tag
itself that may have its own siblings.
Most situations are simple, but remember that siblings and parents in hydrus can get complex. If you want to display this data, I recommend you plan to support simple service-specific workflows, and add hooks to recognise conflicts and other difficulties and, when that happens, abandon ship (send the user back to Hydrus proper). Also, if you show summaries of the data anywhere, make sure you add an 'and 22 more...' overflow mechanism to your menus, since if you hit up 'azur lane' or 'pokemon', you are going to get hundreds of children.
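The overflow mechanism mentioned above could be as small as the following Python sketch. The function name summarise_tags and the default limit of 10 are arbitrary choices for illustration.

```python
def summarise_tags(tags, limit=10):
    # Show at most `limit` tags and fold the remainder into one overflow line.
    shown = list(tags[:limit])
    hidden = len(tags) - len(shown)
    if hidden > 0:
        shown.append('and {} more...'.format(hidden))
    return shown

# Example: a 'descendants' list with 25 entries, shown three at a time.
menu_rows = summarise_tags(['child {}'.format(i) for i in range(25)], limit=3)
```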
I generally warn you off computing sibling and parent mappings or counts yourself. The data from this request is best used for sibling and parent decorators on individual tags in a 'manage tags' presentation. The code that actually computes what siblings and parents look like in the 'display' context can be a pain at times, and I've already done it. Just run /search_tags or /file_metadata again after any changes you make and you'll get updated values.
"},{"location":"developer_api.html#add_tags_search_tags","title":"GET/add_tags/search_tags
","text":"Search the client for tags.
Restricted access: YES. Search for Files and Add Tags permission needed.Required Headers: n/a
Arguments:search
: (the tag text to search for, enter exactly what you would in the client UI)tag_service_key
: (optional, hexadecimal, the tag domain on which to search, defaults to 'all known tags')tag_display_type
: (optional, string, to select whether to search raw or sibling-processed tags, defaults to 'storage')The file domain
and tag_service_key
perform the function of the file and tag domain buttons in the client UI.
The tag_display_type
can be either storage
(the default), which searches your file's stored tags, just as they appear in a 'manage tags' dialog, or display
, which searches the sibling-processed tags, just as they appear in a normal file search page. In the example below, setting the tag_display_type
to display
could well combine the two kim possible tags and give a count of 3 or 4.
'all my files'/'all known tags' works fine for most cases, but a specific tag service or 'all known files'/'tag service' can work better for editing tag repository storage
contexts, since it provides results just for that service, and for repositories, it gives tags for all the non-local files other users have tagged.
/add_tags/search_tags?search=kim&tag_display_type=display\n
Response: Some JSON listing the client's matching tags. Example response{\n \"tags\" : [\n {\n \"value\" : \"series:kim possible\", \n \"count\" : 3\n },\n {\n \"value\" : \"kimchee\", \n \"count\" : 2\n },\n {\n \"value\" : \"character:kimberly ann possible\", \n \"count\" : 1\n }\n ]\n}\n
The tags
list will be sorted by descending count. The various rules in tags->manage tag display and search (e.g. no pure *
searches on certain services) will also be checked--and if violated, you will get 200 OK but an empty result.
Note that if your client api access is only allowed to search certain tags, the results will be similarly filtered.
"},{"location":"developer_api.html#add_tags_add_tags","title":"POST/add_tags/add_tags
","text":"Make changes to the tags that files have.
Restricted access: YES. Add Tags permission needed.Required Headers: n/a
Arguments (in JSON):service_keys_to_tags
: (selective B, an Object of service keys to lists of tags to be 'added' to the files)service_keys_to_actions_to_tags
: (selective B, an Object of service keys to content update actions to lists of tags)In 'service_keys_to...', the keys are as in /get_services. You may need some selection UI on your end so the user can pick what to do if there are multiple choices.
Also, you can use either '...to_tags', which is simple and add-only, or '...to_actions_to_tags', which is more complicated and allows you to remove/petition or rescind pending content.
The permitted 'actions' are:
When you petition a tag from a repository, a 'reason' for the petition is typically needed. If you send a normal list of tags here, a default reason of \"Petitioned from API\" will be given. If you want to set your own reason, you can instead give a list of [ tag, reason ] pairs.
Some example requests:Adding some tags to a file
{\n \"hash\" : \"df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56\",\n \"service_keys_to_tags\" : {\n \"6c6f63616c2074616773\" : [\"character:supergirl\", \"rating:safe\"]\n }\n}\n
Adding more tags to two files{\n \"hashes\" : [\n \"df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56\",\n \"f2b022214e711e9a11e2fcec71bfd524f10f0be40c250737a7861a5ddd3faebf\"\n ],\n \"service_keys_to_tags\" : {\n \"6c6f63616c2074616773\" : [\"process this\"],\n \"ccb0cf2f9e92c2eb5bd40986f72a339ef9497014a5fb8ce4cea6d6c9837877d9\" : [\"creator:dandon fuga\"]\n }\n}\n
A complicated transaction with all possible actions{\n \"hash\" : \"df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56\",\n \"service_keys_to_actions_to_tags\" : {\n \"6c6f63616c2074616773\" : {\n \"0\" : [\"character:supergirl\", \"rating:safe\"],\n \"1\" : [\"character:superman\"]\n },\n \"aa0424b501237041dab0308c02c35454d377eebd74cfbc5b9d7b3e16cc2193e9\" : {\n \"2\" : [\"character:supergirl\", \"rating:safe\"],\n \"3\" : [\"filename:image.jpg\"],\n \"4\" : [[\"creator:danban faga\", \"typo\"], [\"character:super_girl\", \"underscore\"]],\n \"5\" : [\"skirt\"]\n }\n }\n}\n
This last example is far more complicated than you will usually see. Pend rescinds and petition rescinds are not common. Petitions are also quite rare, and gathering a good petition reason for each tag is often a pain.
Note that the enumerated status keys in the service_keys_to_actions_to_tags structure are strings, not ints (JSON does not support int keys for Objects).
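To make the string-key requirement concrete, here is a small Python sketch that builds a service_keys_to_actions_to_tags body from a dict that may use int action codes. The helper name build_actions_body is hypothetical; the hash, service key and action codes are copied from the example requests above.

```python
import json

def build_actions_body(file_hash, service_key, actions_to_tags):
    # JSON Object keys must be strings, so convert any int action codes.
    string_keyed = {
        str(action): list(tags) for action, tags in actions_to_tags.items()
    }
    return {
        'hash': file_hash,
        'service_keys_to_actions_to_tags': {service_key: string_keyed},
    }

body = build_actions_body(
    'df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56',
    '6c6f63616c2074616773',
    {0: ['character:supergirl', 'rating:safe'], 1: ['character:superman']},
)
payload = json.dumps(body)
```

Python's json.dumps would also turn int keys into strings on its own, but doing the conversion explicitly keeps the in-memory structure identical to what is sent.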
Response description: 200 and no content.Note
Note also that hydrus tag actions are safely idempotent. You can pend a tag that is already pended, or add a tag that already exists, and not worry about an error--the surplus add action will be discarded. The same is true if you try to pend a tag that actually already exists, or rescind a petition that doesn't exist. Any invalid actions will fail silently.
It is fine to just throw your 'process this' tags at every file import and not have to worry about checking which files you already added them to.
HOWEVER
When you delete a tag, a deletion record is made even if the tag does not exist on the file. This is important if you expect to add the tags again via parsing, because, in general, when hydrus adds tags through a downloader, it will not overwrite a previously 'deleted' tag record (this is to stop re-downloads overwriting the tags you hand-removed previously). Undeletes usually have to be done manually by a human.
So, do be careful about how you spam delete unless it is something that doesn't matter or it is something you'll only be touching again via the API anyway.
"},{"location":"developer_api.html#editing_file_ratings","title":"Editing File Ratings","text":""},{"location":"developer_api.html#edit_ratings_set_rating","title":"POST/edit_ratings/set_rating
","text":"Add or remove ratings associated with a file.
Restricted access: YES. Edit Ratings permission needed. Required Headers:Content-Type
: application/json
rating_service_key
: (hexadecimal, the rating service you want to edit)rating
: (mixed datatype, the rating value you want to set){\n \"hash\" : \"3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba\",\n \"rating_service_key\" : \"282303611ba853659aa60aeaa5b6312d40e05b58822c52c57ae5e320882ba26e\",\n \"rating\" : 2\n}\n
This is fairly simple, but there are some caveats around the different rating service types and the actual data you are setting here. It is the same as you'll see in GET /get_files/file_metadata.
"},{"location":"developer_api.html#likedislike_ratings","title":"Like/Dislike Ratings","text":"Send true
for 'like', false
for 'dislike', or null
for 'unset'.
Send an int
for the number of stars to set, or null
for 'unset'.
Send an int
for the number to set. 0 is your minimum.
As with GET /get_files/file_metadata, check The Services Object for the min/max stars on a numerical rating service.
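Since the rating value is a mixed datatype, it may help to see how the different values serialise. This is a minimal Python sketch with a hypothetical helper name, build_set_rating_body; Python's True, False and None become JSON true, false and null.

```python
import json

def build_set_rating_body(file_hash, rating_service_key, rating):
    # rating: True/False for like/dislike, an int for numerical or
    # inc/dec services, or None to unset.
    return json.dumps({
        'hash': file_hash,
        'rating_service_key': rating_service_key,
        'rating': rating,
    })

file_hash = '3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba'
service_key = '282303611ba853659aa60aeaa5b6312d40e05b58822c52c57ae5e320882ba26e'

two_stars = build_set_rating_body(file_hash, service_key, 2)
unset = build_set_rating_body(file_hash, service_key, None)
```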
Response: 200 and no content."},{"location":"developer_api.html#editing_file_notes","title":"Editing File Notes","text":""},{"location":"developer_api.html#add_notes_set_notes","title":"POST/add_notes/set_notes
","text":"Add or update notes associated with a file.
Restricted access: YES. Add Notes permission needed. Required Headers:Content-Type
: application/json
notes
: (an Object mapping string names to string texts)hash
: (selective, an SHA256 hash for the file in 64 characters of hexadecimal)file_id
: (selective, the integer numerical identifier for the file)merge_cleverly
: true or false (optional, defaults false)extend_existing_note_if_possible
: true or false (optional, defaults true)conflict_resolution
: 0, 1, 2, or 3 (optional, defaults 3)With merge_cleverly
left false
, this is a simple update operation. Existing notes will be overwritten exactly as you specify. Any other notes the file has will be untouched. Example request body
{\n \"notes\" : {\n \"note name\" : \"content of note\",\n \"another note\" : \"asdf\"\n },\n \"hash\" : \"3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba\"\n}\n
If you turn on merge_cleverly
, then the client will merge your new notes into the file's existing notes using the same logic you have seen in Note Import Options and the Duplicate Metadata Merge Options. This navigates conflict resolution, and you should use it if you are adding potential duplicate content from an 'automatic' source like a parser and do not want to wade into the logic. Do not use it for a user-editing experience (a user expects a strict overwrite/replace experience and will be confused by this mode).
To start off, in this mode, if your note text exists under a different name for the file, your dupe note will not be added to your new name. extend_existing_note_if_possible
makes it so your existing note text will overwrite an existing name (or a '... (1)' rename of that name) if the existing text is inside your given text. conflict_resolution
is an enum governing what to do in all other conflicts:
merge_cleverly=false
, this is exactly what you gave, and this operation is idempotent. If merge_cleverly=true
, then this may differ, even be empty, and this operation might not be idempotent. Example response{\n \"notes\" : {\n \"note name\" : \"content of note\",\n \"another note (1)\" : \"asdf\"\n }\n}\n
"},{"location":"developer_api.html#add_notes_delete_notes","title":"POST /add_notes/delete_notes
","text":"Remove notes associated with a file.
Restricted access: YES. Add Notes permission needed. Required Headers:Content-Type
: application/json
note_names
: (a list of string note names to delete)hash
: (selective, an SHA256 hash for the file in 64 characters of hexadecimal)file_id
: (selective, the integer numerical identifier for the file){\n \"note_names\" : [\"note name\", \"another note\"],\n \"hash\" : \"3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba\"\n}\n
Response: 200 with no content. This operation is idempotent."},{"location":"developer_api.html#searching_and_fetching_files","title":"Searching and Fetching Files","text":"File search in hydrus is not paginated like a booru--all searches return all results in one go. In order to keep this fast, search is split into two steps--fetching file identifiers with a search, and then fetching file metadata in batches. You may have noticed that the client itself performs searches like this--thinking a bit about a search and then bundling results in batches of 256 files before eventually throwing all the thumbnails on screen.
"},{"location":"developer_api.html#get_files_search_files","title":"GET/get_files/search_files
","text":"Search for the client's files.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments (in percent-encoded JSON):tags
: (a list of tags you wish to search for)tag_service_key
: (optional, hexadecimal, the tag domain on which to search, defaults to 'all known tags')file_sort_type
: (optional, integer, the results sort method, defaults to import time)file_sort_asc
: true or false (optional, the results sort order)return_file_ids
: true or false (optional, default true, returns file id results)return_hashes
: true or false (optional, default false, returns hex hash results)/get_files/search_files?tags=%5B%22blue%20eyes%22%2C%20%22blonde%20hair%22%2C%20%22%5Cu043a%5Cu0438%5Cu043d%5Cu043e%22%2C%20%22system%3Ainbox%22%2C%20%22system%3Alimit%3D16%22%5D\n
If the access key's permissions only permit search for certain tags, at least one positive whitelisted/non-blacklisted tag must be in the \"tags\" list or this will 403. Tags can be prepended with a hyphen to make a negated tag (e.g. \"-green eyes\"), but these will not be checked against the permissions whitelist.
Wildcards and namespace searches are supported, so if you search for 'character:sam*' or 'series:*', this will be handled correctly clientside.
Many system predicates are also supported using a text parser! The parser was designed by a clever user for human input and allows for a certain amount of error (e.g. ~= instead of \u2248, or \"isn't\" instead of \"is not\") or requires more information (e.g. the specific hashes for a hash lookup). Here's a big list of examples that are supported:
System Predicatesservice_name
service_name
service_name
> \u2157 (numerical services)service_name
is like (like/dislike services)service_name
= 13 (inc/dec services)Please test out the system predicates you want to send. If you are in help->advanced mode, you can test this parser in the advanced text input dialog when you click the OR* button on a tag autocomplete dropdown. More system predicate types and input formats will be available in future. Reverse engineering system predicate data from text is obviously tricky. If a system predicate does not parse, you'll get 400.
Also, OR predicates are now supported! Just nest within the tag list, and it'll be treated like an OR. For instance:
[ \"skirt\", [ \"samus aran\", \"lara croft\" ], \"system:height > 1000\" ]
Makes:
The file and tag services are for search domain selection, just like clicking the buttons in the client. They are optional--default is 'all my files' and 'all known tags'.
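A search request with a nested OR list and a couple of the optional flags can be put together along these lines. This is only a Python sketch: build_search_path is a made-up name, and it assumes that list, boolean and integer arguments are JSON-encoded while plain string arguments are sent as-is.

```python
import json
from urllib.parse import quote

def build_search_path(tags, **options):
    # tags may contain nested lists, which the client treats as OR predicates.
    parts = ['tags=' + quote(json.dumps(tags), safe='')]
    for name, value in options.items():
        # Assumption: strings go through as-is, everything else as JSON.
        text = value if isinstance(value, str) else json.dumps(value)
        parts.append(name + '=' + quote(text, safe=''))
    return '/get_files/search_files?' + '&'.join(parts)

tags = ['skirt', ['samus aran', 'lara croft'], 'system:height > 1000']
path = build_search_path(tags, return_hashes=True, return_file_ids=False)
```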
File searches occur in the display
tag_display_type
. If you want to pair autocomplete tag lookup from /search_tags to this file search (e.g. for making a standard booru search interface), then make sure you are searching display
tags there.
file_sort_asc is 'true' for ascending, and 'false' for descending. The default is descending.
file_sort_type is by default import time. It is an integer according to the following enum, and I have written the semantic (asc/desc) meaning for each type after:
The full list of numerical file ids that match the search. Example response
{\n \"file_ids\" : [125462, 4852415, 123, 591415]\n}\n
Example response with return_hashes=true{\n \"hashes\" : [\n \"1b04c4df7accd5a61c5d02b36658295686b0abfebdc863110e7d7249bba3f9ad\",\n \"fe416723c731d679aa4d20e9fd36727f4a38cd0ac6d035431f0f452fad54563f\",\n \"b53505929c502848375fbc4dab2f40ad4ae649d34ef72802319a348f81b52bad\"\n ],\n \"file_ids\" : [125462, 4852415, 123]\n}\n
You can of course also specify return_hashes=true&return_file_ids=false
just to get the hashes. The order of both lists is the same.
File ids are internal and specific to an individual client. For a client, a file with hash H always has the same file id N, but two clients will have different ideas about which N goes with which H. IDs are a bit faster to retrieve than hashes and search with en masse, which is why they are exposed here.
This search does not apply the implicit limit that most clients set to all searches (usually 10,000), so if you do system:everything on a client with millions of files, expect to get boshed. Even with a system:limit included, complicated queries with large result sets may take several seconds to respond. Just like the client itself.
"},{"location":"developer_api.html#get_files_file_hashes","title":"GET/get_files/file_hashes
","text":"Lookup file hashes from other hashes.
Restricted access: YES. Search for Files permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):hash
: (selective, a hexadecimal hash)hashes
: (selective, a list of hexadecimal hashes)source_hash_type
: [sha256|md5|sha1|sha512] (optional, defaulting to sha256)desired_hash_type
: [sha256|md5|sha1|sha512]If you have some MD5 hashes and want to see what their SHA256 are, or vice versa, this is the place. Hydrus records the non-SHA256 hashes for every file it has ever imported. This data is not removed on file deletion.
Example request/get_files/file_hashes?hash=ec5c5a4d7da4be154597e283f0b6663c&source_hash_type=md5&desired_hash_type=sha256\n
Response: A mapping Object of the successful lookups. Where no matching hash is found, no entry will be made (therefore, if none of your source hashes have matches on the client, this will return an empty hashes
Object). Example response{\n \"hashes\" : {\n \"ec5c5a4d7da4be154597e283f0b6663c\" : \"2a0174970defa6f147f2eabba829c5b05aba1f1aea8b978611a07b7bb9cf9399\"\n }\n}\n
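Because hashes with no match are simply left out of the response, a caller usually wants to know which lookups failed. A minimal Python sketch, with the hypothetical helper name split_lookups, working on an already-parsed response:

```python
def split_lookups(source_hashes, response):
    # The response only has entries for successful lookups.
    found = response.get('hashes', {})
    missing = [h for h in source_hashes if h not in found]
    return found, missing

response = {
    'hashes': {
        'ec5c5a4d7da4be154597e283f0b6663c':
            '2a0174970defa6f147f2eabba829c5b05aba1f1aea8b978611a07b7bb9cf9399',
    }
}
asked = ['ec5c5a4d7da4be154597e283f0b6663c', '00000000000000000000000000000000']
found, missing = split_lookups(asked, response)
```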
"},{"location":"developer_api.html#get_files_file_metadata","title":"GET /get_files/file_metadata
","text":"Get metadata about files in the client.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments (in percent-encoded JSON):create_new_file_ids
: true or false (optional if asking with hash(es), defaulting to false)only_return_identifiers
: true or false (optional, defaulting to false)only_return_basic_information
: true or false (optional, defaulting to false)detailed_url_information
: true or false (optional, defaulting to false)include_blurhash
: true or false (optional, defaulting to false. Only applies when only_return_basic_information
is true)include_notes
: true or false (optional, defaulting to false)include_services_object
: true or false (optional, defaulting to true)hide_service_keys_tags
: Deprecated, will be deleted soon! true or false (optional, defaulting to true)If your access key is restricted by tag, the files you search for must have been in the most recent search result.
Example request for two files with ids 123 and 4567/get_files/file_metadata?file_ids=%5B123%2C%204567%5D\n
The same, but only wants hashes back/get_files/file_metadata?file_ids=%5B123%2C%204567%5D&only_return_identifiers=true\n
And one that fetches two hashes/get_files/file_metadata?hashes=%5B%224c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2%22%2C%20%223e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82%22%5D\n
This request string can obviously get pretty ridiculously long. It also takes a bit of time to fetch metadata from the database. In its normal searches, the client usually fetches file metadata in batches of 256.
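One way to keep request strings manageable is to copy the client and ask for metadata in batches. The Python sketch below splits a list of file ids into chunks of 256 and builds one request path per chunk; the helper names are invented for the example, and 256 is simply the batch size the client itself uses.

```python
import json
from urllib.parse import quote

def batched(ids, size=256):
    # Yield consecutive slices of at most `size` ids.
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def metadata_paths(file_ids, size=256):
    # One /get_files/file_metadata request path per batch.
    return [
        '/get_files/file_metadata?file_ids=' + quote(json.dumps(batch), safe='')
        for batch in batched(file_ids, size)
    ]

paths = metadata_paths(list(range(600)))
```

With 600 ids this gives three paths, covering 256, 256 and 88 files.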
Response: A list of JSON Objects that store a variety of file metadata. Also The Services Object for service reference.Example response
{\n \"services\" : \"The Services Object\",\n \"metadata\" : [\n {\n \"file_id\" : 123,\n \"hash\" : \"4c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2\",\n \"size\" : 63405,\n \"mime\" : \"image/jpeg\",\n \"filetype_human\" : \"jpeg\",\n \"filetype_enum\" : 1,\n \"ext\" : \".jpg\",\n \"width\" : 640,\n \"height\" : 480,\n \"thumbnail_width\" : 200,\n \"thumbnail_height\" : 150,\n \"duration\" : null,\n \"time_modified\" : null,\n \"time_modified_details\" : {},\n \"file_services\" : {\n \"current\" : {},\n \"deleted\" : {}\n },\n \"ipfs_multihashes\" : {},\n \"has_audio\" : false,\n \"blurhash\" : \"U6PZfSi_.AyE_3t7t7R**0o#DgR4_3R*D%xt\",\n \"pixel_hash\" : \"2519e40f8105599fcb26187d39656b1b46f651786d0e32fff2dc5a9bc277b5bb\",\n \"num_frames\" : null,\n \"num_words\" : null,\n \"is_inbox\" : false,\n \"is_local\" : false,\n \"is_trashed\" : false,\n \"is_deleted\" : false,\n \"has_exif\" : true,\n \"has_human_readable_embedded_metadata\" : true,\n \"has_icc_profile\" : true,\n \"has_transparency\" : false,\n \"known_urls\" : [],\n \"ratings\" : {\n \"74d52c6238d25f846d579174c11856b1aaccdb04a185cb2c79f0d0e499284f2c\" : null,\n \"90769255dae5c205c975fc4ce2efff796b8be8a421f786c1737f87f98187ffaf\" : null,\n \"b474e0cbbab02ca1479c12ad985f1c680ea909a54eb028e3ad06750ea40d4106\" : 0\n },\n \"tags\" : {\n \"6c6f63616c2074616773\" : {\n \"storage_tags\" : {},\n \"display_tags\" : {}\n },\n \"37e3849bda234f53b0e9792a036d14d4f3a9a136d1cb939705dbcd5287941db4\" : {\n \"storage_tags\" : {},\n \"display_tags\" : {}\n },\n \"616c6c206b6e6f776e2074616773\" : {\n \"storage_tags\" : {},\n \"display_tags\" : {}\n }\n }\n },\n {\n \"file_id\" : 4567,\n \"hash\" : \"3e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82\",\n \"size\" : 199713,\n \"mime\" : \"video/webm\",\n \"filetype_human\" : \"webm\",\n \"filetype_enum\" : 21,\n \"ext\" : \".webm\",\n \"width\" : 1920,\n \"height\" : 1080,\n \"thumbnail_width\" : 200,\n \"thumbnail_height\" : 113,\n 
\"duration\" : 4040,\n \"time_modified\" : 1604055647,\n \"time_modified_details\" : {\n \"local\" : 1641044491,\n \"gelbooru.com\" : 1604055647\n },\n \"file_services\" : {\n \"current\" : {\n \"616c6c206c6f63616c2066696c6573\" : {\n \"time_imported\" : 1641044491\n },\n \"616c6c206c6f63616c206d65646961\" : {\n \"time_imported\" : 1641044491\n },\n \"cb072cffbd0340b67aec39e1953c074e7430c2ac831f8e78fb5dfbda6ec8dcbd\" : {\n \"time_imported\" : 1641204220\n }\n },\n \"deleted\" : {\n \"6c6f63616c2066696c6573\" : {\n \"time_deleted\" : 1641204274,\n \"time_imported\" : 1641044491\n }\n }\n },\n \"ipfs_multihashes\" : {\n \"55af93e0deabd08ce15ffb2b164b06d1254daab5a18d145e56fa98f71ddb6f11\" : \"QmReHtaET3dsgh7ho5NVyHb5U13UgJoGipSWbZsnuuM8tb\"\n },\n \"has_audio\" : true,\n \"blurhash\" : \"UHF5?xYk^6#M@-5b,1J5@[or[k6.};FxngOZ\",\n \"pixel_hash\" : \"1dd9625ce589eee05c22798a9a201602288a1667c59e5cd1fb2251a6261fbd68\",\n \"num_frames\" : 102,\n \"num_words\" : null,\n \"is_inbox\" : false,\n \"is_local\" : true,\n \"is_trashed\" : false,\n \"is_deleted\" : false,\n \"has_exif\" : false,\n \"has_human_readable_embedded_metadata\" : false,\n \"has_icc_profile\" : false,\n \"has_transparency\" : false,\n \"known_urls\" : [\n \"https://gelbooru.com/index.php?page=post&s=view&id=4841557\",\n \"https://img2.gelbooru.com//images/80/c8/80c8646b4a49395fb36c805f316c49a9.jpg\",\n \"http://origin-orig.deviantart.net/ed31/f/2019/210/7/8/beachqueen_samus_by_dandonfuga-ddcu1xg.jpg\"\n ],\n \"ratings\" : {\n \"74d52c6238d25f846d579174c11856b1aaccdb04a185cb2c79f0d0e499284f2c\" : true,\n \"90769255dae5c205c975fc4ce2efff796b8be8a421f786c1737f87f98187ffaf\" : 3,\n \"b474e0cbbab02ca1479c12ad985f1c680ea909a54eb028e3ad06750ea40d4106\" : 11\n },\n \"tags\" : {\n \"6c6f63616c2074616773\" : {\n \"storage_tags\" : {\n \"0\" : [\"samus favourites\"],\n \"2\" : [\"process this later\"]\n },\n \"display_tags\" : {\n \"0\" : [\"samus favourites\", \"favourites\"],\n \"2\" : [\"process this later\"]\n 
}\n },\n \"37e3849bda234f53b0e9792a036d14d4f3a9a136d1cb939705dbcd5287941db4\" : {\n \"storage_tags\" : {\n \"0\" : [\"blonde_hair\", \"blue_eyes\", \"looking_at_viewer\"],\n \"1\" : [\"bodysuit\"]\n },\n \"display_tags\" : {\n \"0\" : [\"blonde hair\", \"blue_eyes\", \"looking at viewer\"],\n \"1\" : [\"bodysuit\", \"clothing\"]\n }\n },\n \"616c6c206b6e6f776e2074616773\" : {\n \"storage_tags\" : {\n \"0\" : [\"samus favourites\", \"blonde_hair\", \"blue_eyes\", \"looking_at_viewer\"],\n \"1\" : [\"bodysuit\"]\n },\n \"display_tags\" : {\n \"0\" : [\"samus favourites\", \"favourites\", \"blonde hair\", \"blue_eyes\", \"looking at viewer\"],\n \"1\" : [\"bodysuit\", \"clothing\"]\n }\n }\n }\n }\n ]\n}\n
And one where only_return_identifiers is true{\n \"services\" : \"The Services Object\",\n \"metadata\" : [\n {\n \"file_id\" : 123,\n \"hash\" : \"4c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2\"\n },\n {\n \"file_id\" : 4567,\n \"hash\" : \"3e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82\"\n }\n ]\n}\n
And where only_return_basic_information is true{\n \"services\" : \"The Services Object\",\n \"metadata\" : [\n {\n \"file_id\" : 123,\n \"hash\" : \"4c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2\",\n \"size\" : 63405,\n \"mime\" : \"image/jpeg\",\n \"filetype_human\" : \"jpeg\",\n \"filetype_enum\" : 1,\n \"ext\" : \".jpg\",\n \"width\" : 640,\n \"height\" : 480,\n \"duration\" : null,\n \"has_audio\" : false,\n \"num_frames\" : null,\n \"num_words\" : null\n },\n {\n \"file_id\" : 4567,\n \"hash\" : \"3e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82\",\n \"size\" : 199713,\n \"mime\" : \"video/webm\",\n \"filetype_human\" : \"webm\",\n \"filetype_enum\" : 21,\n \"ext\" : \".webm\",\n \"width\" : 1920,\n \"height\" : 1080,\n \"duration\" : 4040,\n \"has_audio\" : true,\n \"num_frames\" : 102,\n \"num_words\" : null\n }\n ]\n}\n
"},{"location":"developer_api.html#basics","title":"basics","text":"Size is in bytes. Duration is in milliseconds, and may be an int or a float.
is_trashed
means the file is currently in the trash but available on the hard disk. is_deleted
means currently either in the trash or completely deleted from disk.
file_services
stores which file services the file is currently in and deleted from. The entries are by the service key, same as for tags later on. In rare cases, the timestamps may be null
, if they are unknown (e.g. a time_deleted
for the file deleted before this information was tracked). The time_modified
can also be null. Time modified is just the filesystem modified time for now, but it will evolve into more complicated storage in future with multiple locations (website post times) that'll be aggregated to a sensible value in UI.
ipfs_multihashes
stores the ipfs service key to any known multihash for the file.
The thumbnail_width
and thumbnail_height
are a generally reliable prediction but aren't a promise. The actual thumbnail you get from /get_files/thumbnail will be different if the user hasn't looked at it since changing their thumbnail options. You only get these rows for files that hydrus actually generates a thumbnail for. Things like pdf won't have it. You can use your own thumb, or ask the api and it'll give you a fixed fallback; those are mostly 200x200, but you can and should size them to whatever you want.
If the file has a thumbnail, blurhash
gives a base 83 encoded string of its blurhash. pixel_hash
is an SHA256 of the image's pixel data and should exactly match for pixel-identical files (it is used in the duplicate system for 'must be pixel duplicates').
The tags
structure is similar to the /add_tags/add_tags scheme, excepting that the status numbers are:
Note
Since JSON Object keys must be strings, these status numbers are strings, not ints.
While the 'storage_tags' represent the actual tags stored on the database for a file, 'display_tags' reflect how tags appear in the UI, after siblings are collapsed and parents are added. If you want to edit a file's tags, refer to the storage tags. If you want to render to the user, use the display tags. The display tag calculation logic is very complicated; if the storage tags change, do not try to guess the new display tags yourself--just ask the API again.
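The storage/display split above can be sketched as a small lookup helper. This is only an illustration: it assumes the tags structure maps a service key to an object holding storage_tags and display_tags, each keyed by the status number as a string. The helper name, the sample service key, and the sample tags are hypothetical.

```python
# Sketch only. Assumed shape:
# tags -> service_key -> storage_tags / display_tags -> status string -> list of tags.

def tags_for(file_metadata, service_key, status, display=True):
    # Status numbers are JSON Object keys, so they are looked up as strings.
    kind = 'display_tags' if display else 'storage_tags'
    service = file_metadata.get('tags', {}).get(service_key, {})
    return list(service.get(kind, {}).get(str(status), []))


# Hypothetical sample entry; the service key and tags are made up.
sample = {
    'tags': {
        'aabbcc': {
            'storage_tags': {'0': ['kitty']},
            'display_tags': {'0': ['cat', 'animal']},
        }
    }
}

to_render = tags_for(sample, 'aabbcc', 0)               # show these to the user
to_edit = tags_for(sample, 'aabbcc', 0, display=False)  # edit against these
```

After any edit to the storage tags, the sketch would simply be re-run on a fresh metadata response rather than patched locally.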
"},{"location":"developer_api.html#ratings","title":"ratings","text":"The ratings
structure is simple, but it holds different data types. For each service:
Check The Services Object to see the shape of a rating star, and min/max number of stars in a numerical service.
"},{"location":"developer_api.html#services","title":"services","text":"The tags
, ratings
, and file_services
structures use the hexadecimal service_key
extensively. If you need to look up the respective service name or type, check The Services Object under the top level services
key.
Note
If you look, those file structures actually include the service name and type already, but this bloated data is deprecated and will be deleted in 2024, so please transition over.
If you don't want the services object (it is generally superfluous on the 'simple' responses), then add include_services_object=false
.
The metadata
list should come back in the same sort order you asked, whether that is in file_ids
or hashes
!
If you ask with hashes rather than file_ids, hydrus will, by default, only return results when it has seen those hashes before. This is to stop the client making thousands of new file_id records in its database if you perform a scanning operation. If you ask about a hash the client has never encountered before--for which there is no file_id--you will get this style of result:
Missing file_id example{\n \"metadata\" : [\n {\n \"file_id\" : null,\n \"hash\" : \"766da61f81323629f982bc1b71b5c1f9bba3f3ed61caf99906f7f26881c3ae93\"\n }\n ]\n}\n
You can change this behaviour with create_new_file_ids=true
, but bear in mind you will get a fairly 'empty' metadata result with lots of 'null' lines, so this is only useful for gathering the numerical ids for later Client API work.
If you ask about file_ids that do not exist, you'll get 404.
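When asking by hash, a caller usually wants to separate the files the client knows from the ones it has never seen. A minimal sketch, assuming the response has already been decoded from JSON into Python objects (so null arrives as None); the function name is hypothetical:

```python
def split_known_unknown(metadata):
    # Entries with a null file_id are hashes the client has never encountered.
    known = [m for m in metadata if m.get('file_id') is not None]
    unknown_hashes = [m['hash'] for m in metadata if m.get('file_id') is None]
    return known, unknown_hashes


# Two rows in the style of the examples above.
metadata = [
    {'file_id': 123,
     'hash': '4c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2'},
    {'file_id': None,
     'hash': '766da61f81323629f982bc1b71b5c1f9bba3f3ed61caf99906f7f26881c3ae93'},
]

known, unknown_hashes = split_known_unknown(metadata)
```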
If you set only_return_basic_information=true
, this will be much faster for first-time requests than the full metadata result, but it will be slower for repeat requests. The full metadata object is cached after the first fetch; the limited file info object is not. You can optionally set include_blurhash
when using this option to fetch blurhash strings for the files.
If you add detailed_url_information=true
, a new entry, detailed_known_urls
, will be added for each file, with a list of the same structure as /add_urls/get_url_info
. This may be an expensive request if you are querying thousands of files at once.
{\n \"detailed_known_urls\": [\n {\n \"normalised_url\": \"https://gelbooru.com/index.php?id=4841557&page=post&s=view\",\n \"url_type\": 0,\n \"url_type_string\": \"post url\",\n \"match_name\": \"gelbooru file page\",\n \"can_parse\": true\n },\n {\n \"normalised_url\": \"https://img2.gelbooru.com//images/80/c8/80c8646b4a49395fb36c805f316c49a9.jpg\",\n \"url_type\": 5,\n \"url_type_string\": \"unknown url\",\n \"match_name\": \"unknown url\",\n \"can_parse\": false\n }\n ]\n}\n
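One plausible use of detailed_known_urls is to pick out the URLs the client says it can parse. The sketch below works on an already-decoded metadata entry; the function name is hypothetical and the sample rows are taken from the example above.

```python
def parseable_urls(file_metadata):
    # Keep only the rows the client reports as parseable.
    rows = file_metadata.get('detailed_known_urls', [])
    return [row['normalised_url'] for row in rows if row.get('can_parse')]


entry = {
    'detailed_known_urls': [
        {'normalised_url': 'https://gelbooru.com/index.php?id=4841557&page=post&s=view',
         'url_type': 0, 'url_type_string': 'post url',
         'match_name': 'gelbooru file page', 'can_parse': True},
        {'normalised_url': 'https://img2.gelbooru.com//images/80/c8/80c8646b4a49395fb36c805f316c49a9.jpg',
         'url_type': 5, 'url_type_string': 'unknown url',
         'match_name': 'unknown url', 'can_parse': False},
    ]
}

urls = parseable_urls(entry)
```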
"},{"location":"developer_api.html#get_files_file","title":"GET /get_files/file
","text":"Get a file.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments :file_id
: (selective, numerical file id for the file)hash
: (selective, a hexadecimal SHA256 hash for the file)download
: (optional, boolean, default false
)Only use one of file_id or hash. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id from the last search you ran.
Example request
/get_files/file?file_id=452158\n
Example request/get_files/file?hash=7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a&download=true\n
Response: The file itself. You should get the correct mime type as the Content-Type header. By default, this will set the Content-Disposition
header to inline
, which causes a web browser to show the file. If you set download=true
, it will set it to attachment
, which triggers the browser to automatically download it (or open the 'save as' dialog) instead.
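The request URLs above can be assembled with a small helper that enforces the one-of rule for file_id and hash. This is a sketch, not part of the API: the base address is an assumption about a local setup, the function name is hypothetical, and authentication is left out.

```python
from urllib.parse import urlencode

# Assumption: a client API listening on this local address; adjust as needed.
BASE_URL = 'http://127.0.0.1:45869'


def file_request_url(endpoint, file_id=None, file_hash=None, download=False):
    # Exactly one of file_id or hash may be given.
    if (file_id is None) == (file_hash is None):
        raise ValueError('use exactly one of file_id or hash')
    params = {'file_id': file_id} if file_id is not None else {'hash': file_hash}
    if download:
        # Ask for Content-Disposition: attachment instead of inline.
        params['download'] = 'true'
    return BASE_URL + endpoint + '?' + urlencode(params)


inline_url = file_request_url('/get_files/file', file_id=452158)
download_url = file_request_url(
    '/get_files/file',
    file_hash='7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a',
    download=True,
)
```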
/get_files/thumbnail
","text":"Get a file's thumbnail.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments:file_id
: (selective, numerical file id for the file)hash
: (selective, a hexadecimal SHA256 hash for the file)Only use one. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id from the last search you ran.
Example request
/get_files/thumbnail?file_id=452158\n
Example request/get_files/thumbnail?hash=7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a\n
Response: The thumbnail for the file. Some hydrus thumbs are jpegs, some are pngs. It should give you the correct image/jpeg or image/png Content-Type.
If hydrus keeps no thumbnail for the filetype, for instance with pdfs, then you will get the same default 'pdf' icon you see in the client. If the file does not exist in the client, or the thumbnail was expected but is missing from storage, you will get the fallback 'hydrus' icon, again just as you would in the client itself. This request should never give a 404.
Size of Normal Thumbs
Thumbnails are not guaranteed to be the correct size! If a thumbnail has not been loaded in the client in years, it could well have been fitted for older thumbnail settings. Also, even 'clean' thumbnails will not always fit inside the settings' bounding box; they may be boosted due to a high-DPI setting or spill over due to a 'fill' vs 'fit' preference. You cannot easily predict what resolution a thumbnail will or should have!
In general, thumbnails are the correct ratio. If you are drawing thumbs, you should embed them to fit or fill, but don't fix them at 100% true size: make sure they can scale to the size you want!
Size of Defaults
If you get a 'default' filetype thumbnail like the pdf or hydrus one, you will be pulling the pngs straight from the hydrus/static folder. They will most likely be 200x200 pixels.
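Since thumbnail resolution cannot be predicted but the ratio is generally right, a consumer can scale whatever it receives into its own box. A minimal sketch of the fit and fill cases; the function name and the box sizes are hypothetical:

```python
def scaled_size(thumb_w, thumb_h, box_w, box_h, mode='fit'):
    # fit: the whole thumbnail stays inside the box.
    # fill: the box is fully covered and the thumbnail may spill over one axis.
    scale_w = box_w / thumb_w
    scale_h = box_h / thumb_h
    scale = min(scale_w, scale_h) if mode == 'fit' else max(scale_w, scale_h)
    return round(thumb_w * scale), round(thumb_h * scale)


fit_size = scaled_size(640, 480, 200, 200)           # letterboxed inside the box
fill_size = scaled_size(640, 480, 200, 200, 'fill')  # covers the box, wider than it
```

The same helper works for the default 200x200 icons, which simply scale uniformly.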
"},{"location":"developer_api.html#get_files_render","title":"GET/get_files/render
","text":"Get an image file as rendered by Hydrus.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments :file_id
: (selective, numerical file id for the file)hash
: (selective, a hexadecimal SHA256 hash for the file)download
: (optional, boolean, default false
)Only use one of file_id or hash. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id from the last search you ran.
The file you request must be a still image file that Hydrus can render (this includes PSD files). This request uses the client image cache.
Example request
/get_files/render?file_id=452158\n
Example request/get_files/render?hash=7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a&download=true\n
Response: A PNG file of the image as would be rendered in the client. It will be converted to sRGB color if the file had a color profile but the rendered PNG will not have any color profile. By default, this will set the Content-Disposition
header to inline
, which causes a web browser to show the file. If you set download=true
, it will set it to attachment
, which triggers the browser to automatically download it (or open the 'save as' dialog) instead.
This refers to the File Relationships system, which includes 'potential duplicates', 'duplicates', and 'alternates'.
This system is pending significant rework and expansion, so please do not get too married to some of the routines here. I am mostly just exposing my internal commands, so things are a little ugly/hacked. I expect duplicate and alternate groups to get some form of official identifier in future, which may end up being the way to refer and edit things here.
Also, at least for now, 'Manage File Relationships' permission is not going to be bound by the search permission restrictions that normal file search does. Getting this file relationship management permission allows you to search anything.
There is more work to do here, including adding various 'dissolve'/'undo' commands to break groups apart.
"},{"location":"developer_api.html#manage_file_relationships_get_file_relationships","title":"GET/manage_file_relationships/get_file_relationships
","text":"Get the current relationships for one or more files.
Restricted access: YES. Manage File Relationships permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):/manage_file_relationships/get_file_relationships?hash=ac940bb9026c430ea9530b4f4f6980a12d9432c2af8d9d39dfc67b05d91df11d\n
Response: A JSON Object mapping the hashes to their relationships. Example response{\n \"file_relationships\" : {\n \"ac940bb9026c430ea9530b4f4f6980a12d9432c2af8d9d39dfc67b05d91df11d\" : {\n \"is_king\" : false,\n \"king\" : \"8784afbfd8b59de3dcf2c13dc1be9d7cb0b3d376803c8a7a8b710c7c191bb657\",\n \"king_is_on_file_domain\" : true,\n \"king_is_local\" : true,\n \"0\" : [\n ],\n \"1\" : [],\n \"3\" : [\n \"8bf267c4c021ae4fd7c4b90b0a381044539519f80d148359b0ce61ce1684fefe\"\n ],\n \"8\" : [\n \"8784afbfd8b59de3dcf2c13dc1be9d7cb0b3d376803c8a7a8b710c7c191bb657\",\n \"3fa8ef54811ec8c2d1892f4f08da01e7fc17eed863acae897eb30461b051d5c3\"\n ]\n }\n }\n}\n
king
refers to which file is set as the best of a duplicate group. If you are doing potential duplicate comparisons, the kings of your two groups are usually the ideal representatives, and the 'get some pairs to filter'-style commands try to select the kings of the various to-be-compared duplicate groups. is_king
is a convenience bool for when a file is king of its own group.
It is possible for the king to not be available. Every group has a king, but if that file has been deleted, or if the file domain here is limited and the king is on a different file service, then it may not be available. A similar issue occurs when you search for filtering pairs--while it is ideal to compare kings with kings, if you set 'files must be pixel dupes', then the user will expect to see those pixel duplicates, not their champions--you may be forced to compare non-kings. king_is_on_file_domain
lets you know if the king is on the file domain you set, and king_is_local
lets you know if it is on the hard disk--if king_is_local=true
, you can do a /get_files/file
request on it. It is generally rare, but you have to deal with the king being unavailable--in this situation, your best bet is to just use the file itself as its own representative.
All the relationships you get are filtered by the file domain. If you set the file domain to 'all known files', you will get every relationship a file has, including all deleted files, which is often less useful than you would think. The default, 'all my files' is usually most useful.
A file that has no duplicates is considered to be in a duplicate group of size 1 and thus is always its own king.
The numbers are from a duplicate status enum, as so:
Note that because of JSON constraints, these are the string versions of the integers since they are Object keys.
All the hashes given here are in 'all my files', i.e. not in the trash. A file may have duplicates that have long been deleted, but, like the null king above, they will not show here.
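The advice about unavailable kings can be sketched as a helper that picks a representative for a file. It works on the decoded file_relationships object; the function name is hypothetical, and requiring both flags to be true is a deliberately conservative assumption rather than a documented rule.

```python
def representative(file_hash, file_relationships):
    info = file_relationships[file_hash]
    king = info.get('king')
    # Only trust the king when it is on the chosen file domain and on disk.
    if king and info.get('king_is_on_file_domain') and info.get('king_is_local'):
        return king
    # Otherwise fall back to the file itself as its own representative.
    return file_hash


h = 'ac940bb9026c430ea9530b4f4f6980a12d9432c2af8d9d39dfc67b05d91df11d'
relationships = {
    h: {
        'is_king': False,
        'king': '8784afbfd8b59de3dcf2c13dc1be9d7cb0b3d376803c8a7a8b710c7c191bb657',
        'king_is_on_file_domain': True,
        'king_is_local': True,
        # Keys are the string versions of the duplicate status integers.
        '8': ['8784afbfd8b59de3dcf2c13dc1be9d7cb0b3d376803c8a7a8b710c7c191bb657'],
    }
}

best = representative(h, relationships)
```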
"},{"location":"developer_api.html#manage_file_relationships_get_potentials_count","title":"GET/manage_file_relationships/get_potentials_count
","text":"Get the count of remaining potential duplicate pairs in a particular search domain. Exactly the same as the counts you see in the duplicate processing page.
Restricted access: YES. Manage File Relationships permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):tag_service_key_1
: (optional, default 'all known tags', a hex tag service key)tags_1
: (optional, default system:everything, a list of tags you wish to search for)tag_service_key_2
: (optional, default 'all known tags', a hex tag service key)tags_2
: (optional, default system:everything, a list of tags you wish to search for)potentials_search_type
: (optional, integer, default 0, regarding how the pairs should match the search(es))pixel_duplicates
: (optional, integer, default 1, regarding whether the pairs should be pixel duplicates)max_hamming_distance
: (optional, integer, default 4, the max 'search distance' of the pairs)/manage_file_relationships/get_potentials_count?tag_service_key_1=c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f&tags_1=%5B%22dupes_to_process%22%2C%20%22system%3Awidth%3C400%22%5D&potentials_search_type=1&pixel_duplicates=2&max_hamming_distance=0&max_num_pairs=50\n
tag_service_key_x
and tags_x
work the same as /get_files/search_files. The _2
variants are only useful if the potentials_search_type
is 2.
potentials_search_type
and pixel_duplicates
are enums:
-and-
The max_hamming_distance
is the same 'search distance' you see in the Client UI. A higher number means more speculative 'similar files' search. If pixel_duplicates
is set to 'must be', then max_hamming_distance
is obviously ignored.
{\n \"potential_duplicates_count\" : 17\n}\n
If you confirm that a pair of potentials are duplicates, this may transitively collapse other potential pairs and decrease the count by more than 1.
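Because the arguments are percent-encoded JSON, list values such as tags_1 need to be serialised to JSON text before encoding. A sketch using only the standard library; the helper name is hypothetical, and the argument values are the ones from the example request above.

```python
import json
from urllib.parse import quote, urlencode


def potentials_query(path, **args):
    # Lists and objects are sent as JSON text; plain values pass through.
    encoded = {
        key: json.dumps(value) if isinstance(value, (list, dict)) else value
        for key, value in args.items()
    }
    # quote (not quote_plus) so spaces become %20, as in the example request.
    return path + '?' + urlencode(encoded, quote_via=quote)


url = potentials_query(
    '/manage_file_relationships/get_potentials_count',
    tag_service_key_1='c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f',
    tags_1=['dupes_to_process', 'system:width<400'],
    potentials_search_type=1,
    pixel_duplicates=2,
    max_hamming_distance=0,
)
```

The same helper should serve get_potential_pairs and get_random_potentials, since they share these search arguments.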
"},{"location":"developer_api.html#manage_file_relationships_get_potential_pairs","title":"GET/manage_file_relationships/get_potential_pairs
","text":"Get some potential duplicate pairs for a filtering workflow. Exactly the same as the 'duplicate filter' in the duplicate processing page.
Restricted access: YES. Manage File Relationships permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):tag_service_key_1
: (optional, default 'all known tags', a hex tag service key)tags_1
: (optional, default system:everything, a list of tags you wish to search for)tag_service_key_2
: (optional, default 'all known tags', a hex tag service key)tags_2
: (optional, default system:everything, a list of tags you wish to search for)potentials_search_type
: (optional, integer, default 0, regarding how the pairs should match the search(es))pixel_duplicates
: (optional, integer, default 1, regarding whether the pairs should be pixel duplicates)max_hamming_distance
: (optional, integer, default 4, the max 'search distance' of the pairs)max_num_pairs
: (optional, integer, defaults to client's option, how many pairs to get in a batch)/manage_file_relationships/get_potential_pairs?tag_service_key_1=c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f&tags_1=%5B%22dupes_to_process%22%2C%20%22system%3Awidth%3C400%22%5D&potentials_search_type=1&pixel_duplicates=2&max_hamming_distance=0&max_num_pairs=50\n
The search arguments work the same as /manage_file_relationships/get_potentials_count.
max_num_pairs
is simple and just caps how many pairs you get.
{\n \"potential_duplicate_pairs\" : [\n [ \"16470d6e73298cd75d9c7e8e2004810e047664679a660a9a3ba870b0fa3433d3\", \"7ed062dc76265d25abeee5425a859cfdf7ab26fd291f50b8de7ca381e04db079\" ],\n [ \"eeea390357f259b460219d9589b4fa11e326403208097b1a1fbe63653397b210\", \"9215dfd39667c273ddfae2b73d90106b11abd5fd3cbadcc2afefa526bb226608\" ],\n [ \"a1ea7d671245a3ae35932c603d4f3f85b0d0d40c5b70ffd78519e71945031788\", \"8e9592b2dfb436fe0a8e5fa15de26a34a6dfe4bca9d4363826fac367a9709b25\" ]\n ]\n}\n
The selected pair sample and its order are strictly hardcoded for now (e.g. to guarantee that a decision will not invalidate any other pair in the batch, you shouldn't see the same file twice in a batch, nor two files in the same duplicate group). Treat it as the client filter does, where you fetch batches to process one after another. I expect to make it more flexible in future, in the client itself and here.
You will see significantly fewer than max_num_pairs
(and potential duplicate count) as you get close to the last available pairs, and when there are none left, you will get an empty list.
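The batch-after-batch workflow can be sketched as a loop that stops on the first empty list. Both callables are hypothetical stand-ins: fetch_batch would return the potential_duplicate_pairs list from one response, and handle_pair would record a decision for the pair. If handle_pair never actually sets a relationship, a real client would keep returning the same pairs, so the loop assumes every pair gets actioned.

```python
def process_all_pairs(fetch_batch, handle_pair):
    processed = 0
    while True:
        pairs = fetch_batch()
        if not pairs:
            # An empty list means no potential pairs are left in this search.
            break
        for hash_a, hash_b in pairs:
            handle_pair(hash_a, hash_b)
            processed += 1
    return processed


# Stand-in data instead of a live client: two batches, then nothing.
batches = [[['a1', 'b1'], ['a2', 'b2']], [['a3', 'b3']], []]
decisions = []
count = process_all_pairs(
    lambda: batches.pop(0),
    lambda a, b: decisions.append((a, b)),
)
```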
/manage_file_relationships/get_random_potentials
","text":"Get some random potentially duplicate file hashes. Exactly the same as the 'show some random potential dupes' button in the duplicate processing page.
Restricted access: YES. Manage File Relationships permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):tag_service_key_1
: (optional, default 'all known tags', a hex tag service key)tags_1
: (optional, default system:everything, a list of tags you wish to search for)tag_service_key_2
: (optional, default 'all known tags', a hex tag service key)tags_2
: (optional, default system:everything, a list of tags you wish to search for)potentials_search_type
: (optional, integer, default 0, regarding how the files should match the search(es))pixel_duplicates
: (optional, integer, default 1, regarding whether the files should be pixel duplicates)max_hamming_distance
: (optional, integer, default 4, the max 'search distance' of the files)/manage_file_relationships/get_random_potentials?tag_service_key_1=c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f&tags_1=%5B%22dupes_to_process%22%2C%20%22system%3Awidth%3C400%22%5D&potentials_search_type=1&pixel_duplicates=2&max_hamming_distance=0\n
The arguments work the same as /manage_file_relationships/get_potentials_count, with the caveat that potentials_search_type
has special logic:
Essentially, the first hash is the 'master' to which the others are paired. The other files will include every matching file.
Response: A JSON Object listing a group of hashes exactly as the client would. Example response{\n \"random_potential_duplicate_hashes\" : [\n \"16470d6e73298cd75d9c7e8e2004810e047664679a660a9a3ba870b0fa3433d3\",\n \"7ed062dc76265d25abeee5425a859cfdf7ab26fd291f50b8de7ca381e04db079\",\n \"9e0d6b928b726562d70e1f14a7b506ba987c6f9b7f2d2e723809bb11494c73e6\",\n \"9e01744819b5ff2a84dda321e3f1a326f40d0e7f037408ded9f18a11ee2b2da8\"\n ]\n}\n
If there are no potential duplicate groups in the search, this returns an empty list.
"},{"location":"developer_api.html#manage_file_relationships_set_file_relationships","title":"POST/manage_file_relationships/set_file_relationships
","text":"Set the relationships to the specified file pairs.
Restricted access: YES. Manage File Relationships permission needed. Required Headers:Content-Type
: application/jsonrelationships
: (a list of Objects, one for each file-pair being set)Each Object is:
* `hash_a`: (a hexadecimal SHA256 hash)\n* `hash_b`: (a hexadecimal SHA256 hash)\n* `relationship`: (integer enum for the relationship being set)\n* `do_default_content_merge`: (bool)\n* `delete_a`: (optional, bool, default false)\n* `delete_b`: (optional, bool, default false)\n
hash_a
and hash_b
are normal hex SHA256 hashes for your file pair.
relationship
is one of this enum:
2, 4, and 7 all make the files 'duplicates' (8 under /get_file_relationships
), which, specifically, merges the two files' duplicate groups. 'same quality' has different duplicate content merge options to the better/worse choices, but it ultimately sets something similar to A>B (but see below for more complicated outcomes). You obviously don't have to use 'B is better' if you prefer just to swap the hashes. Do what works for you.
do_default_content_merge
sets whether the user's duplicate content merge options should be loaded and applied to the files along with the relationship. Most operations in the client do this automatically, so the user may expect it to apply, but if you want to do content merge yourself, set this to false.
delete_a
and delete_b
are booleans that select whether to delete A and/or B in the same operation as setting the relationship. You can also do this externally if you prefer.
{\n \"relationships\" : [\n {\n \"hash_a\" : \"b54d09218e0d6efc964b78b070620a1fa19c7e069672b4c6313cee2c9b0623f2\",\n \"hash_b\" : \"bbaa9876dab238dcf5799bfd8319ed0bab805e844f45cf0de33f40697b11a845\",\n \"relationship\" : 4,\n \"do_default_content_merge\" : true,\n \"delete_b\" : true\n },\n {\n \"hash_a\" : \"22667427eaa221e2bd7ef405e1d2983846c863d40b2999ce8d1bf5f0c18f5fb2\",\n \"hash_b\" : \"65d228adfa722f3cd0363853a191898abe8bf92d9a514c6c7f3c89cfed0bf423\",\n \"relationship\" : 4,\n \"do_default_content_merge\" : true,\n \"delete_b\" : true\n },\n {\n \"hash_a\" : \"0480513ffec391b77ad8c4e57fe80e5b710adfa3cb6af19b02a0bd7920f2d3ec\",\n \"hash_b\" : \"5fab162576617b5c3fc8caabea53ce3ab1a3c8e0a16c16ae7b4e4a21eab168a7\",\n \"relationship\" : 2,\n \"do_default_content_merge\" : true\n }\n ]\n}\n
Response: 200 with no content. If you try to add an invalid or redundant relationship, for instance setting files that are already duplicates as potential duplicates, no changes are made.
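A request body for this call can be built row by row. The sketch below validates the hashes and only emits the optional delete flags when they are set; the helper name is hypothetical, and the relationship value is passed straight through as the integer the caller chose from the enum.

```python
import json
import string


def make_pair(hash_a, hash_b, relationship, do_default_content_merge=True,
              delete_a=False, delete_b=False):
    for h in (hash_a, hash_b):
        if len(h) != 64 or any(c not in string.hexdigits for c in h):
            raise ValueError('expected a hexadecimal SHA256 hash')
    row = {
        'hash_a': hash_a,
        'hash_b': hash_b,
        'relationship': relationship,
        'do_default_content_merge': do_default_content_merge,
    }
    # The delete flags default to false, so they are only sent when true.
    if delete_a:
        row['delete_a'] = True
    if delete_b:
        row['delete_b'] = True
    return row


body = json.dumps({
    'relationships': [
        make_pair(
            'b54d09218e0d6efc964b78b070620a1fa19c7e069672b4c6313cee2c9b0623f2',
            'bbaa9876dab238dcf5799bfd8319ed0bab805e844f45cf0de33f40697b11a845',
            4,
            delete_b=True,
        ),
    ]
})
```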
This is the file relationships request that is probably most likely to change in future. I may implement content merge options. I may move from file pairs to group identifiers. When I expand alternates, those file groups are going to support more variables.
"},{"location":"developer_api.html#king_merge_rules","title":"king merge rules","text":"Recall in /get_file_relationships
that we discussed how duplicate groups have a 'king' for their best file. This file is the most useful representative when you do comparisons, since if you say \"King A > King B\", then we know that King A is also better than all of King B's normal duplicate group members. We can merge the group simply by folding King B and all the other members into King A's group.
So what happens if you say 'A = B'? We have to have a king, so which should it be?
What happens if you say \"non-king member of A > non-king member of B\"? We don't want to merge all of B into A, since King B might be higher quality than King A.
The logic here can get tricky, but I have tried my best to avoid overcommitting and accidentally promoting the wrong king. Here are all the possible situations ('>' means 'better than', and '=' means 'same quality as'):
So, if you can, always present kings to your users, and action using those kings' hashes. It makes the merge logic easier in all cases. Remember that you can set system:is the best quality file of its duplicate group
in any file search to exclude any non-kings (e.g. if you are hunting for easily actionable pixel potential duplicates).
/manage_file_relationships/set_kings
","text":"Set the specified files to be the kings of their duplicate groups.
Restricted access: YES. Manage File Relationships permission needed. Required Headers:Content-Type
: application/json{\n \"file_id\" : 123\n}\n
Response: 200 with no content. The files will be promoted to be the kings of their respective duplicate groups. If the file is already the king (also true for any file with no duplicates), this is idempotent. It also processes the files in the given order, so if you specify two files in the same group, the latter will be the king at the end of the request.
"},{"location":"developer_api.html#managing_cookies","title":"Managing Cookies","text":"This refers to the cookies held in the client's session manager, which you can review under network->data->manage session cookies. These are sent to every request on the respective domains.
"},{"location":"developer_api.html#manage_cookies_get_cookies","title":"GET/manage_cookies/get_cookies
","text":"Get the cookies for a particular domain.
Restricted access: YES. Manage Cookies and Headers permission needed.Required Headers: n/a
Arguments:domain
/manage_cookies/get_cookies?domain=gelbooru.com\n
Response: A JSON Object listing all the cookies for that domain in [ name, value, domain, path, expires ] format. Example response{\n \"cookies\" : [\n [\"__cfduid\", \"f1bef65041e54e93110a883360bc7e71\", \".gelbooru.com\", \"/\", 1596223327],\n [\"pass_hash\", \"0b0833b797f108e340b315bc5463c324\", \"gelbooru.com\", \"/\", 1585855361],\n [\"user_id\", \"123456\", \"gelbooru.com\", \"/\", 1585855361]\n ]\n}\n
Note that these variables are all strings except 'expires', which is either an integer timestamp or _null_ for session cookies.\n\nThis request will also return any cookies for subdomains. The session system in hydrus generally stores cookies according to the second-level domain, so if you request for specific.someoverbooru.net, you will still get the cookies for someoverbooru.net and all its subdomains.\n
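The five-element cookie rows are easier to work with once named. A minimal sketch over a decoded response; the Cookie type and helper names are hypothetical, and the sample rows come from the example response above plus one made-up session cookie.

```python
from collections import namedtuple

# Field order follows the [ name, value, domain, path, expires ] row format.
Cookie = namedtuple('Cookie', 'name value domain path expires')


def parse_cookies(response):
    return [Cookie(*row) for row in response.get('cookies', [])]


def is_session_cookie(cookie):
    # Session cookies carry null (None once decoded) instead of a timestamp.
    return cookie.expires is None


response = {
    'cookies': [
        ['pass_hash', '0b0833b797f108e340b315bc5463c324', 'gelbooru.com', '/', 1585855361],
        ['user_id', '123456', 'gelbooru.com', '/', 1585855361],
        ['made_up_session', 'abc', 'gelbooru.com', '/', None],  # hypothetical row
    ]
}

cookies = parse_cookies(response)
session_names = [c.name for c in cookies if is_session_cookie(c)]
```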
"},{"location":"developer_api.html#manage_cookies_set_cookies","title":"POST /manage_cookies/set_cookies
","text":"Set some new cookies for the client. This makes it easier to 'copy' a login from a web browser or similar to hydrus if hydrus's login system can't handle the site yet.
Restricted access: YES. Manage Cookies and Headers permission needed. Required Headers:Content-Type
: application/jsoncookies
: (a list of cookie rows in the same format as the GET request above){\n \"cookies\" : [\n [\"PHPSESSID\", \"07669eb2a1a6e840e498bb6e0799f3fb\", \".somesite.com\", \"/\", 1627327719],\n [\"tag_filter\", \"1\", \".somesite.com\", \"/\", 1627327719]\n ]\n}\n
You can set 'value' to be null, which will clear any existing cookie with the corresponding name, domain, and path (acting essentially as a delete).
Expires can be null, but session cookies will time-out in hydrus after 60 minutes of non-use.
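Setting and clearing cookies use the same row format, so a body can mix both. In this sketch the helper names are hypothetical, and sending null for expires in a delete row is an assumption, since only name, domain, and path are said to matter there.

```python
import json


def cookie_row(name, value, domain, path='/', expires=None):
    # expires=None makes a session cookie.
    return [name, value, domain, path, expires]


def delete_row(name, domain, path='/'):
    # A null value clears the cookie with this name, domain, and path.
    return [name, None, domain, path, None]


body = json.dumps({
    'cookies': [
        cookie_row('PHPSESSID', '07669eb2a1a6e840e498bb6e0799f3fb',
                   '.somesite.com', expires=1627327719),
        delete_row('tag_filter', '.somesite.com'),
    ]
})
```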
"},{"location":"developer_api.html#managing_http_headers","title":"Managing HTTP Headers","text":"This refers to the custom headers you can see under network->data->manage http headers.
"},{"location":"developer_api.html#manage_headers_get_headers","title":"GET/manage_headers/get_headers
","text":"Get the custom http headers.
Restricted access: YES. Manage Cookies and Headers permission needed.Required Headers: n/a
Arguments:domain
: optional, the domain to fetch headers for/manage_headers/get_headers?domain=gelbooru.com\n
Example request (for global)/manage_headers/get_headers\n
Response: A JSON Object listing all the headers: Example response{\n \"network_context\" : {\n \"type\" : 2,\n \"data\" : \"gelbooru.com\"\n },\n \"headers\" : {\n \"User-Agent\" : {\n \"value\" : \"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\",\n \"approved\" : \"approved\",\n \"reason\" : \"Set by Client API\"\n },\n \"DNT\" : {\n \"value\" : \"1\",\n \"approved\" : \"approved\",\n \"reason\" : \"Set by Client API\"\n }\n }\n}\n
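A caller that only cares about the headers currently in effect can reduce the response to a plain name-to-value mapping. A sketch over a decoded response; the function name is hypothetical and the pending sample header is made up.

```python
def approved_headers(response):
    # Keep only headers whose approval state is exactly 'approved'.
    return {
        name: header['value']
        for name, header in response.get('headers', {}).items()
        if header.get('approved') == 'approved'
    }


response = {
    'network_context': {'type': 2, 'data': 'gelbooru.com'},
    'headers': {
        'DNT': {'value': '1', 'approved': 'approved', 'reason': 'Set by Client API'},
        # Hypothetical header still waiting on the user.
        'X-Example': {'value': 'abc', 'approved': 'pending', 'reason': 'demo'},
    },
}

active = approved_headers(response)
```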
"},{"location":"developer_api.html#manage_headers_set_headers","title":"POST /manage_headers/set_headers
","text":"Manages the custom http headers.
Restricted access: YES. Manage Cookies and Headers permission needed. Required Headers: *Content-Type
: application/json Arguments (in JSON): domain
: (optional, the specific domain to set the header for)headers
: (a JSON Object that holds \"key\" objects){\n \"domain\" : \"mysite.com\",\n \"headers\" : {\n \"User-Agent\" : {\n \"value\" : \"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\"\n },\n \"DNT\" : {\n \"value\" : \"1\"\n },\n \"CoolStuffToken\" : {\n \"value\" : \"abcdef0123456789\",\n \"approved\" : \"pending\",\n \"reason\" : \"This unlocks the Sonic fanfiction!\"\n }\n }\n}\n
Example request body that deletes{\n \"domain\" : \"myothersite.com\",\n \"headers\" : {\n \"User-Agent\" : {\n \"value\" : null\n },\n \"Authorization\" : {\n \"value\" : null\n }\n }\n}\n
If you do not set a domain, or you set it to null
, the 'context' will be the global context, which applies as a fallback to all jobs.
Domain headers also apply to their subdomains--unless they are overwritten by specific subdomain entries.
Each key
Object under headers
has the same form as /manage_headers/get_headers. value
is obvious--it is the value of the header. If the pair doesn't exist yet, you need the value
, but if you just want to approve something, it is optional. Set it to null
to delete an existing pair.
You probably won't ever use approved
or reason
, but they plug into the 'validation' system in the client. They are both optional. Approved can be any of [ approved, denied, pending ]
, and by default everything you add will be approved
. If there is anything pending
when a network job asks, the user will be presented with a yes/no popup presenting the reason for the header. If they click 'no', the header is set to denied
and the network job goes ahead without it. If you have a header that changes behaviour or unlocks special content, you might like to make it optional in this way.
If you need to reinstate it, the default global
User-Agent
is Mozilla/5.0 (compatible; Hydrus Client)
.
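The set, approve, and delete cases described above can be sketched with three small helpers. All names here are hypothetical; the point is that omitting domain targets the global context and that a null value deletes an existing pair.

```python
import json

VALID_APPROVED = ('approved', 'denied', 'pending')


def set_header(value, approved=None, reason=None):
    if approved is not None and approved not in VALID_APPROVED:
        raise ValueError('approved must be one of approved, denied, pending')
    entry = {'value': value}
    # approved and reason are optional, so they are only sent when given.
    if approved is not None:
        entry['approved'] = approved
    if reason is not None:
        entry['reason'] = reason
    return entry


def delete_header():
    # A null value removes an existing header pair.
    return {'value': None}


def headers_body(headers, domain=None):
    body = {'headers': headers}
    if domain is not None:
        body['domain'] = domain  # omitted means the global context
    return json.dumps(body)


body = headers_body(
    {
        'DNT': set_header('1'),
        'CoolStuffToken': set_header('abcdef0123456789', approved='pending',
                                     reason='optional unlock'),
        'Authorization': delete_header(),
    },
    domain='mysite.com',
)
```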
/manage_headers/set_user_agent
","text":"This is deprecated--move to /manage_headers/set_headers!
This sets the 'Global' User-Agent for the client, as typically editable under network->data->manage http headers, for instance if you want hydrus to appear as a specific browser associated with some cookies.
Restricted access: YES. Manage Cookies and Headers permission needed. Required Headers: *Content-Type
: application/json Arguments (in JSON): user-agent
: (a string){\n \"user-agent\" : \"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\"\n}\n
Send an empty string to reset the client back to the default User-Agent, which should be Mozilla/5.0 (compatible; Hydrus Client)
.
This refers to the pages of the main client UI.
"},{"location":"developer_api.html#manage_pages_get_pages","title":"GET/manage_pages/get_pages
","text":"Get the page structure of the current UI session.
Restricted access: YES. Manage Pages permission needed.Required Headers: n/a
Arguments: n/a
Response: A JSON Object of the top-level page 'notebook' (page of pages) detailing its basic information and current sub-pages. Any page of pages beneath it will list its own sub-pages. Example response
{\n \"pages\" : {\n \"name\" : \"top pages notebook\",\n \"page_key\" : \"3b28d8a59ec61834325eb6275d9df012860a1ecfd9e1246423059bc47fb6d5bd\",\n \"page_state\" : 0,\n \"page_type\" : 10,\n \"selected\" : true,\n \"pages\" : [\n {\n \"name\" : \"files\",\n \"page_key\" : \"d436ff5109215199913705eb9a7669d8a6b67c52e41c3b42904db083255ca84d\",\n \"page_state\" : 0,\n \"page_type\" : 6,\n \"selected\" : false\n },\n {\n \"name\" : \"thread watcher\",\n \"page_key\" : \"40887fa327edca01e1d69b533dddba4681b2c43e0b4ebee0576177852e8c32e7\",\n \"page_state\" : 0,\n \"page_type\" : 9,\n \"selected\" : false\n },\n {\n \"name\" : \"pages\",\n \"page_key\" : \"2ee7fa4058e1e23f2bd9e915cdf9347ae90902a8622d6559ba019a83a785c4dc\",\n \"page_state\" : 0,\n \"page_type\" : 10,\n \"selected\" : true,\n \"pages\" : [\n {\n \"name\" : \"urls\",\n \"page_key\" : \"9fe22cb760d9ee6de32575ed9f27b76b4c215179cf843d3f9044efeeca98411f\",\n \"page_state\" : 0,\n \"page_type\" : 7,\n \"selected\" : true\n },\n {\n \"name\" : \"files\",\n \"page_key\" : \"2977d57fc9c588be783727bcd54225d577b44e8aa2f91e365a3eb3c3f580dc4e\",\n \"page_state\" : 0,\n \"page_type\" : 6,\n \"selected\" : false\n }\n ]\n }\n ]\n }\n}\n
name
is the full text on the page tab.
page_key
is a unique identifier for the page. It will stay the same for a particular page throughout the session, but new ones are generated on a session reload.
page_type
is as follows:
page_state
is as follows:
Most pages will be 0, normal/ready, at all times. Large pages will start in an 'initialising' state for a few seconds, which means their session-saved thumbnails aren't loaded yet. Search pages will enter 'searching' after a refresh or search change and will either return to 'ready' when the search is complete, or fall to 'search cancelled' if the search was interrupted (usually this means the user clicked the 'stop' button that appears after some time).
selected
means which page is currently in view. It will propagate down the page of pages until it terminates. It may terminate in an empty page of pages, so do not assume it will end on a media page.
The top page of pages will always be there, and always selected.
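As a rough sketch of how a script might use the 'selected' flag described above, the following walks the response down to the page currently in view. The function name find_selected_page is made up for illustration, and the example dict is a trimmed copy of the example response; it is not part of the API.

```python
def find_selected_page(page):
    # Only a page of pages carries a 'pages' list; anything else is a leaf.
    for sub_page in page.get('pages', []):
        if sub_page.get('selected'):
            return find_selected_page(sub_page)
    # No selected child: a media page, or an empty page of pages.
    return page


# Trimmed version of the example response.
example = {
    'pages': {
        'name': 'top pages notebook', 'page_type': 10, 'selected': True,
        'pages': [
            {'name': 'files', 'page_type': 6, 'selected': False},
            {'name': 'pages', 'page_type': 10, 'selected': True, 'pages': [
                {'name': 'urls', 'page_type': 7, 'selected': True},
                {'name': 'files', 'page_type': 6, 'selected': False},
            ]},
        ],
    }
}

current = find_selected_page(example['pages'])
print(current['name'])
```

Because the walk can stop at an empty page of pages, a caller should still check page_type before treating the result as a media page.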
"},{"location":"developer_api.html#manage_pages_get_page_info","title":"GET/manage_pages/get_page_info
","text":"Get information about a specific page.
Under Construction
This is under construction. The current call dumps a ton of info for different downloader pages. Please experiment in IRL situations and give feedback for now! I will flesh out this help with more enumeration info and examples as this gets nailed down. POST commands to alter pages (adding, removing, highlighting), will come later.
Restricted access: YES. Manage Pages permission needed.Required Headers: n/a
Arguments:page_key
: (hexadecimal page_key as stated in /manage_pages/get_pages)simple
: true or false (optional, defaulting to true)/manage_pages/get_page_info?page_key=aebbf4b594e6986bddf1eeb0b5846a1e6bc4e07088e517aff166f1aeb1c3c9da&simple=true\n
Response description A JSON Object of the page's information. At present, this mostly means downloader information. Example response with simple = true
{\n \"page_info\" : {\n \"name\" : \"threads\",\n \"page_key\" : \"aebbf4b594e6986bddf1eeb0b5846a1e6bc4e07088e517aff166f1aeb1c3c9da\",\n \"page_state\" : 0,\n \"page_type\" : 3,\n \"management\" : {\n \"multiple_watcher_import\" : {\n \"watcher_imports\" : [\n {\n \"url\" : \"https://someimageboard.net/m/123456\",\n \"watcher_key\" : \"cf8c3525c57a46b0e5c2625812964364a2e801f8c49841c216b8f8d7a4d06d85\",\n \"created\" : 1566164269,\n \"last_check_time\" : 1566164272,\n \"next_check_time\" : 1566174272,\n \"files_paused\" : false,\n \"checking_paused\" : false,\n \"checking_status\" : 0,\n \"subject\" : \"gundam pictures\",\n \"imports\" : {\n \"status\" : \"4 successful (2 already in db)\",\n \"simple_status\" : \"4\",\n \"total_processed\" : 4,\n \"total_to_process\" : 4\n },\n \"gallery_log\" : {\n \"status\" : \"1 successful\",\n \"simple_status\" : \"1\",\n \"total_processed\" : 1,\n \"total_to_process\" : 1\n }\n },\n {\n \"url\" : \"https://someimageboard.net/a/1234\",\n \"watcher_key\" : \"6bc17555b76da5bde2dcceedc382cf7d23281aee6477c41b643cd144ec168510\",\n \"created\" : 1566063125,\n \"last_check_time\" : 1566063133,\n \"next_check_time\" : 1566104272,\n \"files_paused\" : false,\n \"checking_paused\" : true,\n \"checking_status\" : 1,\n \"subject\" : \"anime pictures\",\n \"imports\" : {\n \"status\" : \"124 successful (22 already in db), 2 previously deleted\",\n \"simple_status\" : \"124\",\n \"total_processed\" : 124,\n \"total_to_process\" : 124\n },\n \"gallery_log\" : {\n \"status\" : \"3 successful\",\n \"simple_status\" : \"3\",\n \"total_processed\" : 3,\n \"total_to_process\" : 3\n }\n }\n ]\n },\n \"highlight\" : \"cf8c3525c57a46b0e5c2625812964364a2e801f8c49841c216b8f8d7a4d06d85\"\n }\n },\n \"media\" : {\n \"num_files\" : 4\n }\n}\n
name
, page_key
, page_state
, and page_type
are as in /manage_pages/get_pages.
As you can see, even the 'simple' mode can get very large. Imagine that response for a page watching 100 threads! Turning simple mode off will display every import item, gallery log entry, and all hashes in the media (thumbnail) panel.
For this first version, the five importer pages--hdd import, simple downloader, url downloader, gallery page, and watcher page--all give rich info based on their specific variables. The first three only have one importer/gallery log combo, but the latter two of course can have multiple. The \"imports\" and \"gallery_log\" entries are all in the same data format.
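Since the response is still under construction, treat the following only as a sketch of how the simple-mode watcher example might be boiled down to one line per thread. The helper name summarise_watchers is hypothetical, and the key names are copied from the example response, so they may change in later versions.

```python
def summarise_watchers(response):
    # Returns (subject, total_processed, total_to_process) per watcher.
    # Missing keys fall back to empty containers, since other page types
    # do not have a 'multiple_watcher_import' entry.
    management = response['page_info'].get('management', {})
    watchers = management.get('multiple_watcher_import', {}).get('watcher_imports', [])
    return [
        (w['subject'], w['imports']['total_processed'], w['imports']['total_to_process'])
        for w in watchers
    ]


# Trimmed version of the example response.
example = {
    'page_info': {
        'name': 'threads',
        'management': {
            'multiple_watcher_import': {
                'watcher_imports': [
                    {'subject': 'gundam pictures',
                     'imports': {'total_processed': 4, 'total_to_process': 4}},
                    {'subject': 'anime pictures',
                     'imports': {'total_processed': 124, 'total_to_process': 124}},
                ]
            }
        }
    }
}

for subject, done, total in summarise_watchers(example):
    print(f'{subject}: {done}/{total}')
```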
"},{"location":"developer_api.html#manage_pages_add_files","title":"POST/manage_pages/add_files
","text":"Add files to a page.
Restricted access: YES. Manage Pages permission needed. Required Headers:Content-Type
: application/jsonpage_key
: (the page key for the page you wish to add files to)The files you set will be appended to the given page, just like a thumbnail drag and drop operation. The page key is the same as fetched in the /manage_pages/get_pages call.
Example request body{\n \"page_key\" : \"af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18\",\n \"file_ids\" : [123, 124, 125]\n}\n
Response: 200 with no content. If the page key is not found, this will 404."},{"location":"developer_api.html#manage_pages_focus_page","title":"POST /manage_pages/focus_page
","text":"'Show' a page in the main GUI, making it the current page in view. If it is already the current page, no change is made.
Restricted access: YES. Manage Pages permission needed. Required Headers:Content-Type
: application/jsonpage_key
: (the page key for the page you wish to show)The page key is the same as fetched in the /manage_pages/get_pages call.
Example request body{\n \"page_key\" : \"af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18\"\n}\n
Response: 200 with no content. If the page key is not found, this will 404."},{"location":"developer_api.html#manage_pages_refresh_page","title":"POST /manage_pages/refresh_page
","text":"Refresh a page in the main GUI. Like hitting F5 in the client, this obviously makes file search pages perform their search again, but for other page types it will force the currently in-view files to be re-sorted.
Restricted access: YES. Manage Pages permission needed. Required Headers:Content-Type
: application/jsonpage_key
: (the page key for the page you wish to refresh)The page key is the same as fetched in the /manage_pages/get_pages call. If a file search page is not set to 'searching immediately', a 'refresh' command does nothing.
Example request body{\n \"page_key\" : \"af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18\"\n}\n
Response: 200 with no content. If the page key is not found, this will 404. Poll the page_state
in /manage_pages/get_pages or /manage_pages/get_page_info to see when the search is complete.
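A minimal polling sketch, assuming you already have some way to fetch the page's current page_state. Here get_page_state is a hypothetical callable you would write yourself (for example, one that calls /manage_pages/get_pages and picks out the right page). Only the value 0 (ready) is taken from the text above; every other value is simply treated as 'not ready yet'.

```python
import time

PAGE_STATE_READY = 0  # described above as normal/ready


def wait_until_ready(get_page_state, timeout=30.0, interval=0.5):
    # get_page_state: hypothetical callable returning the current page_state.
    # Returns True once the page reports ready, False if the timeout passes.
    deadline = time.monotonic() + timeout
    while True:
        if get_page_state() == PAGE_STATE_READY:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for real API calls: two made-up non-ready values, then ready.
fake_states = iter([1, 1, 0])
print(wait_until_ready(lambda: next(fake_states), timeout=5.0, interval=0.0))
```

A cancelled search never returns to ready, so the timeout is what stops the loop in that case.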
"},{"location":"developer_api.html#manage_database_lock_on","title":"POST/manage_database/lock_on
","text":"Pause the client's database activity and disconnect the current connection.
Restricted access: YES. Manage Database permission needed.Arguments: None
This is a hacky prototype. It commands the client database to pause its job queue and release its connection (and related file locks and journal files). This puts the client in a similar position as a long VACUUM command--it'll hang in there, but not much will work, and since the UI async code isn't great yet, the UI may lock up after a minute or two. If you would like to automate database backup without shutting the client down, this is the thing to play with.
This should return pretty quickly, but it will wait up to five seconds for the database to actually disconnect. If there is a big job (like a VACUUM) currently going on, it may take substantially longer to finish that up and process this STOP command. You might like to check for the existence of a journal file in the db dir just to be safe.
As long as this lock is on, all Client API calls except the unlock command will return 503. (This is a decent way to test the current lock status, too.)
"},{"location":"developer_api.html#manage_database_lock_off","title":"POST/manage_database/lock_off
","text":"Reconnect the client's database and resume activity.
Restricted access: YES. Manage Database permission needed.Arguments: None
This is the obvious complement to the lock. The client will resume processing its job queue and will catch up. If the UI was frozen, it should free up in a few seconds, just like after a big VACUUM.
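If you are automating a backup around the lock, the main thing is to make sure the unlock always happens. The sketch below shows one way to structure that with a context manager. The lock_on and lock_off arguments are hypothetical callables that you would implement as POSTs to /manage_database/lock_on and /manage_database/lock_off; nothing here talks to the client by itself.

```python
import contextlib


@contextlib.contextmanager
def database_locked(lock_on, lock_off):
    # lock_on / lock_off: hypothetical callables wrapping the two POST calls.
    lock_on()
    try:
        yield
    finally:
        # Always unlock, even if the backup step raised.
        lock_off()


# Demonstration with stand-in callables that just record the call order.
calls = []
with database_locked(lambda: calls.append('lock_on'),
                     lambda: calls.append('lock_off')):
    calls.append('copy db files')

print(calls)
```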
"},{"location":"developer_api.html#manage_database_mr_bones","title":"GET/manage_database/mr_bones
","text":"Get the data from help->how boned am I?. This is a simple Object of numbers just for hacky advanced purposes if you want to build up some stats in the background. The numbers are the same as the dialog shows, so double check that to confirm what means what.
Restricted access: YES. Manage Database permission needed. Arguments (in percent-encoded JSON):tags
: (optional, a list of tags you wish to search for)tag_service_key
: (optional, hexadecimal, the tag domain on which to search, defaults to 'all my files')/manage_database/mr_bones\n/manage_database/mr_bones?tags=%5B%22blonde_hair%22%2C%20%22blue_eyes%22%5D\n
Example response{\n \"boned_stats\" : {\n \"num_inbox\" : 8356,\n \"num_archive\" : 229,\n \"num_deleted\" : 7010,\n \"size_inbox\" : 7052596762,\n \"size_archive\" : 262911007,\n \"size_deleted\" : 13742290193,\n \"earliest_import_time\" : 1451408539,\n \"total_viewtime\" : [3280, 41621, 2932, 83021],\n \"total_alternate_files\" : 265,\n \"total_duplicate_files\" : 125,\n \"total_potential_pairs\" : 3252\n }\n}\n
The arguments here are the same as for GET /get_files/search_files. You can set any or none of them to set a search domain like in the dialog.
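'Percent-encoded JSON' means the tag list is first dumped to JSON text and then URL-quoted. A small sketch using only the Python standard library (the helper name mr_bones_path is made up) reproduces the example request above:

```python
import json
from urllib.parse import quote


def mr_bones_path(tags=None):
    path = '/manage_database/mr_bones'
    if tags:
        # JSON-encode the list first, then percent-encode the JSON text.
        path += '?tags=' + quote(json.dumps(tags), safe='')
    return path


print(mr_bones_path())
print(mr_bones_path(['blonde_hair', 'blue_eyes']))
```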
"},{"location":"developer_api.html#manage_database_get_client_options","title":"GET/manage_database/get_client_options
","text":"Unstable Response
The response for this path is unstable and subject to change without warning. No examples are given.
Gets the current options from the client.
Restricted access: YES. Manage Database permission needed.Required Headers: n/a
Arguments: n/a
Response: A JSON dump of nearly all options set in the client. The format of this is based on internal hydrus structures and is subject to change without warning with new hydrus versions. Do not rely on anything you find here to continue to exist and don't rely on the structure to be the same."},{"location":"docker.html","title":"Hydrus in a container(HiC)","text":"Latest hydrus client that runs in docker 24/7. Employs xvfb and vnc. Runs on alpine.
TL;DR: docker run --name hydrusclient -d -p 5800:5800 -p 5900:5900 ghcr.io/hydrusnetwork/hydrus:latest
. Connect to noVNC via http://yourdockerhost:5800/vnc.html
or use Tiger VNC Viewer or any other VNC client and connect on port 5900.
For persistent storage you can either create a named volume or mount a new/existing db path -v /hydrus/client/db:/opt/hydrus/db
. The client runs with default permissions of 1000:1000
. This is meant to be configurable through the ENV variables UID
and GID
, but that is not working atm (it is fixed to 1000) and will be fixed someday\u2122.
If you have enough RAM, mount /tmp
as tmpfs. If not, download more RAM.
As of v359
hydrus understands IPFS nocopy
, and it can easily be run alongside a go-ipfs container. Read the Hydrus IPFS help. Mount HOST_PATH_DB/client_files
to /data/client_files
in ipfs. Go manage the ipfs service and set the path to /data/client_files
. You'll know where to put it.
Example compose file:
version: '3.8'\nvolumes:\n  tor-config:\n    driver: local\n  hybooru-pg-data:\n    driver: local\n  hydrus-server:\n    driver: local\n  hydrus-client:\n    driver: local\n  ipfs-data:\n    driver: local\n  hydownloader-data:\n    driver: local\nservices:\n  hydrusclient:\n    image: ghcr.io/hydrusnetwork/hydrus:latest\n    container_name: hydrusclient\n    restart: unless-stopped\n    environment:\n      - UID=1000\n      - GID=1000\n    volumes:\n      - hydrus-client:/opt/hydrus/db\n    tmpfs:\n      - /tmp #optional for SPEEEEEEEEEEEEEEEEEEEEEEEEED and less disk access\n    ports:\n      - 5800:5800 #noVNC\n      - 5900:5900 #VNC\n      - 45868:45868 #Booru\n      - 45869:45869 #API\n\n  hydrusserver:\n    image: ghcr.io/hydrusnetwork/hydrus:server\n    container_name: hydrusserver\n    restart: unless-stopped\n    volumes:\n      - hydrus-server:/opt/hydrus/db\n\n  hydrusclient-ipfs:\n    image: ipfs/go-ipfs\n    container_name: hydrusclient-ipfs\n    restart: unless-stopped\n    volumes:\n      - ipfs-data:/data/ipfs\n      - hydrus-client:/data/db:ro\n    ports:\n      - 4001:4001 # READ\n      - 5001:5001 # THE\n      - 8080:8080 # IPFS\n      - 8081:8081 # DOCS\n\n  hydrus-web:\n    image: floogulinc/hydrus-web\n    container_name: hydrus-web\n    restart: always\n    ports:\n      - 8080:80 # READ\n\n  hybooru-pg:\n    image: healthcheck/postgres\n    container_name: hybooru-pg\n    environment:\n      - POSTGRES_USER=hybooru\n      - POSTGRES_PASSWORD=hybooru\n      - POSTGRES_DB=hybooru\n    volumes:\n      - hybooru-pg-data:/var/lib/postgresql/data\n    restart: unless-stopped\n\n  hybooru:\n    image: suika/hybooru:latest # https://github.com/funmaker/hybooru build it yourself\n    container_name: hybooru\n    restart: unless-stopped\n    depends_on:\n      hybooru-pg:\n        condition: service_started\n    ports:\n      - 8081:80 # READ\n    volumes:\n      - hydrus-client:/opt/hydrus/db\n\n  hydownloader:\n    image: ghcr.io/thatfuckingbird/hydownloader:edge\n    container_name: hydownloader\n    restart: unless-stopped\n    ports:\n      - 53211:53211\n    volumes:\n      - hydownloader-data:/db\n      - hydrus-client:/hydb\n\n  tor-socks-proxy:\n    #network_mode: \"container:myvpn_container\" # in case you have a vpn container\n    container_name: tor-socks-proxy\n    image: peterdavehello/tor-socks-proxy:latest\n    restart: unless-stopped\n\n  tor-hydrus:\n    image: goldy/tor-hidden-service\n    container_name: tor-hydrus\n    depends_on:\n      hydrusclient:\n        condition: service_healthy\n      hydrusserver:\n        condition: service_healthy\n      hybooru:\n        condition: service_started\n    environment:\n      HYBOORU_TOR_SERVICE_HOSTS: '80:hybooru:80'\n      HYBOORU_TOR_SERVICE_VERSION: '3'\n      HYSERV_TOR_SERVICE_HOSTS: 45870:hydrusserver:45870,45871:hydrusserver:45871\n      HYSERV_TOR_SERVICE_VERSION: '3'\n      HYCLNT_TOR_SERVICE_HOSTS: 45868:hydrusclient:45868,45869:hydrusclient:45869\n      HYCLNT_TOR_SERVICE_VERSION: '3'\n    volumes:\n      - tor-config:/var/lib/tor/hidden_service\n
Further containerized application of interest: # Alpine (client)\ncd hydrus/\ndocker build -t ghcr.io/hydrusnetwork/hydrus:latest -f static/build_files/docker/client/Dockerfile .\n
"},{"location":"downloader_completion.html","title":"Putting it all together","text":"Now you know what GUGs, URL Classes, and Parsers are, you should have some ideas of how URL Classes could steer what happens when the downloader is faced with an URL to process. Should a URL be imported as a media file, or should it be parsed? If so, how?
You may have noticed in the Edit GUG ui that it lists if a current URL Class matches the example URL output. If the GUG has no matching URL Class, it won't be listed in the main 'gallery selector' button's list--it'll be relegated to the 'non-functioning' page. Without a URL Class, the client doesn't know what to do with the output of that GUG. But if a URL Class does match, we can then hand the result over to a parser set at network->downloader components->manage url class links:
Here you simply set which parsers go with which URL Classes. If you have URL Classes that do not have a parser linked (which is the default for new URL Classes), you can use the 'try to fill in gaps...' button to automatically fill the gaps based on guesses using the parsers' example URLs. This is usually the best way to line things up unless you have multiple potential parsers for that URL Class, in which case it'll usually go by the parser name earliest in the alphabet.
If the URL Class has no parser set or the parser is broken or otherwise invalid, the respective URL's file import object in the downloader or subscription is going to throw some kind of error when it runs. If you make and share some parsers, the first indication that something is wrong is going to be several users saying 'I got this error: (copy notes from file import status window)'. You can then load the parser back up in manage parsers and try to figure out what changed and roll out an update.
manage url class links also shows 'api/redirect link review', which summarises which URL Classes redirect to others. In these cases, only the redirected-to URL gets a parser entry in the first 'parser links' window, since the first will never be fetched for parsing (in the downloader, it will always be converted to the Redirected URL, and that is fetched and parsed).
Once your GUG has a URL Class and your URL Classes have parsers linked, test your downloader! Note that Hydrus's URL drag-and-drop import uses URL Classes, so if you don't have the GUG and gallery stuff done but you have a Post URL set up, you can test that just by dragging a Post URL from your browser to the client, and it should be added to a new URL Downloader and just work. It feels pretty good once it does!
"},{"location":"downloader_gugs.html","title":"Gallery URL Generators","text":"Gallery URL Generators, or GUGs are simple objects that take a simple string from the user, like:
And convert them into an initialising Gallery URL, such as:
These are all the 'first page' of the results if you type or click-through to the same location on those sites. We are essentially emulating their own simple search-url generation inside the hydrus client.
"},{"location":"downloader_gugs.html#doing_it","title":"actually doing it","text":"Although it is usually a fairly simple process of just substituting the inputted tags into a string template, there are a couple of extra things to think about. Let's look at the ui under network->downloader components->manage gugs:
The client will split whatever the user enters by whitespace, so blue_eyes blonde_hair
becomes two search terms, [ 'blue_eyes', 'blonde_hair' ]
, which are then joined back together with the given 'search terms separator', to make blue_eyes+blonde_hair
. Different sites use different separators, although ' ', '+', and ',' are most common. The new string is substituted into the %tags%
in the template phrase, and the URL is made.
Note that you will not have to make %20 or %3A percent-encodings for reserved characters here--the network engine handles all that before the request is sent. For the most part, if you need to include, or a user puts in, ':' or ' ' or '\u304a\u3063\u3071\u3044', you can just pass it along straight into the final URL without worrying.
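The split-join-substitute step described above is small enough to sketch in a few lines. This is only an illustration of the idea, not the client's actual code; the function name and the example.com template are made up. No percent-encoding is done here, since the text above says the network engine handles that later.

```python
def make_gallery_url(template, query, separator='+'):
    # Split the user's text on whitespace, then rejoin with the site's
    # 'search terms separator' and drop the result into the %tags% slot.
    search_terms = query.split()
    return template.replace('%tags%', separator.join(search_terms))


# Hypothetical template, purely for illustration.
template = 'https://example.com/index.php?page=post&s=list&tags=%tags%'

print(make_gallery_url(template, 'blue_eyes blonde_hair'))
print(make_gallery_url(template, 'blue_eyes   blonde_hair', separator=','))
```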
This ui should update as you change it, so have a play and look at how the output example url changes to get a feel for things. Look at the other defaults to see different examples. Even if you break something, you can just cancel out.
The name of the GUG is important, as this is what will be listed when the user chooses what 'downloader' they want to use. Make sure it has a clear unambiguous name.
The initial search text is also important. Most downloaders just take some text tags, but if your GUG expects a numerical artist id (like pixiv artist search does), you should specify that explicitly to the user. You can even put in a brief '(two tag maximum)' type of instruction if you like.
Notice that the Deviant Art example above is actually the stream of wlop's favourites, not his works, and without an explicit notice of that, a user could easily mistake what they have selected. 'gelbooru' or 'newgrounds' are bad names, and 'type here' is a bad initialising text.
"},{"location":"downloader_gugs.html#nested_gugs","title":"Nested GUGs","text":"Nested Gallery URL Generators are GUGs that hold other GUGs. Some searches actually use more than one stream (such as a Hentai Foundry artist lookup, where you might want to get both their regular works and their scraps, which are two separate galleries under the site), so NGUGs allow you to generate multiple initialising URLs per input. You can experiment with this ui if you like--it isn't too complicated--but you might want to hold off doing anything for real until you are comfortable with everything and know how producing multiple initialising URLs is going to work in the actual downloader.
"},{"location":"downloader_intro.html","title":"Making a Downloader","text":"Caution
Creating custom downloaders is only for advanced users who understand HTML or JSON. Beware! If you are simply looking for how to add new downloaders, please head over here.
"},{"location":"downloader_intro.html#intro","title":"this system","text":"The first versions of hydrus's downloaders were all hardcoded and static--I wrote everything into the program itself and nothing was user-creatable or -fixable. After the maintenance burden of the entire messy system proved too large for me to keep up with and a semi-editable booru system proved successful, I decided to overhaul the entire thing to allow user creation and sharing of every component. It is designed to be very simple to the front-end user--they will typically handle a couple of png files and then select a new downloader from a list--but very flexible (and hence potentially complicated) on the back-end. These help pages describe the different compontents with the intention of making an HTML- or JSON- fluent user able to create and share a full new downloader on their own.
As always, this is all under active development. Your feedback on the system would be appreciated, and if something is confusing or you discover something in here that is out of date, please let me know.
"},{"location":"downloader_intro.html#downloader","title":"what is a downloader?","text":"In hydrus, a downloader is one of:
Gallery Downloader: This takes a string like 'blue_eyes' to produce a series of thumbnail gallery page URLs that can be parsed for image page URLs which can ultimately be parsed for file URLs and metadata like tags. Boorus fall into this category.
URL Downloader: This does just the Gallery Downloader's back-end--instead of taking a string query, it takes the gallery or post URLs directly from the user, whether that is one from a drag-and-drop event or hundreds pasted from clipboard. For our purposes here, the URL Downloader is a subset of the Gallery Downloader.
Watcher: This takes a URL that it will check in timed intervals, parsing it for new URLs that it then queues up to be downloaded. It typically stops checking after the 'file velocity' (such as '1 new file per day') drops below a certain level. It is mostly for watching imageboard threads.
Simple Downloader: This takes a URL one-time and parses it for direct file URLs. This is a miscellaneous system for certain simple gallery types and some testing/'I just need the third tag's src on this one page' jobs.
The system currently supports HTML and JSON parsing. XML should be fine under the HTML parser--it isn't strict about checking types and all that.
"},{"location":"downloader_intro.html#pipeline","title":"what does a downloader do?","text":"The Gallery Downloader is the most complicated downloader and uses all the possible components. In order for hydrus to convert our example 'blue_eyes' query into a bunch of files with tags, it needs to:
So we have three components:
URL downloaders and watchers do not need the Gallery URL Generator, as their input is a URL. Simple downloaders also have an explicit 'just download it and parse it with this simple rule' action, so they do not use URL Classes (or even full-fledged Page Parsers) either.
"},{"location":"downloader_login.html","title":"Login Manager","text":"The system works, but this help was never done! Check the defaults for examples of how it works, sorry!
"},{"location":"downloader_parsers.html","title":"Parsers","text":"In hydrus, a parser is an object that takes a single block of HTML or JSON data and returns many kinds of hydrus-level metadata.
Parsers are flexible and potentially quite complicated. You might like to open network->downloader components->manage parsers and explore the UI as you read these pages. Check out how the default parsers already in the client work, and if you want to write a new one, see if there is something already in there that is similar--it is usually easier to duplicate an existing parser and then alter it than to create a new one from scratch every time.
There are three main components in the parsing system (click to open each component's help page):
Once you are comfortable with these objects, you might like to check out these walkthroughs, which create full parsers from nothing:
Once you are comfortable with parsers, and if you are feeling brave, check out how the default imageboard and pixiv parsers work. These are complicated and use more experimental areas of the code to get their job done. If you are trying to get a new imageboard parser going and can't figure out subsidiary page parsers, send me a mail or something and I'll try to help you out!
When you are making a parser, consider this checklist (you might want to copy/have your own version of this somewhere):
Taken a break? Now let's put it all together ---->
"},{"location":"downloader_parsers_content_parsers.html","title":"Content Parsers","text":"So, we can now generate some strings from a document. Content Parsers will let us apply a single metadata type to those strings to inform hydrus what they are.
A content parser has a name, a content type, and a formula. This example fetches the character tags from a danbooru post.
The name is just decorative, but it is generally a good idea so you can find things again when you next revisit them.
The current content types are:
"},{"location":"downloader_parsers_content_parsers.html#intro","title":"urls","text":"This should be applied to relative ('/image/smile.jpg') and absolute ('https://mysite.com/content/image/smile.jpg') URLs. If the URL is relative, the client will generate an absolute URL based on the original URL used to fetch the data being parsed (i.e. it should all just work).
You can set several types of URL:
The 'file url quality precedence' allows the client to select the best of several possible URLs. Given multiple content parsers producing URLs at the same 'level' of parsing, it will select the one with the highest value. Consider these two posts:
The Garnet image fits into a regular page and so Danbooru embed the whole original file in the main media canvas. One easy way to find the full File URL in this case would be to select the \"src\" attribute of the \"img\" tag with id=\"image\".
The Cirno one, however, is much larger and has been scaled down. The src of the main canvas tag points to a resized 'sample' link. The full link can be found at the 'view original' link up top, which is an \"a\" tag with id=\"image-resize-link\".
The Garnet post does not have the 'view original' link, so to cover both situations we might want two content parsers--one fetching the 'canvas' \"src\" and the other finding the 'view original' \"href\". If we set the 'canvas' one with a quality of 40 and the 'view original' 60, then the parsing system would know to select the 60 when it was available but to fall back to the 40 if not.
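The selection rule can be sketched as 'take the highest quality value among the parsers that actually found something'. The function below is an illustration only, with a made-up name and made-up URLs; the 40/60 values are the ones from the example above.

```python
def pick_best_url(candidates):
    # candidates: (quality, url) pairs from content parsers at the same
    # level of parsing; url is None when that parser found nothing.
    found = [(quality, url) for quality, url in candidates if url]
    if not found:
        return None
    return max(found, key=lambda pair: pair[0])[1]


# A large post: both the scaled-down canvas and 'view original' are present.
large_post = [
    (40, 'https://example.com/sample/cirno_sample.jpg'),
    (60, 'https://example.com/original/cirno.jpg'),
]
# A small post: there is no 'view original' link, so only the canvas src.
small_post = [
    (40, 'https://example.com/original/garnet.jpg'),
    (60, None),
]

print(pick_best_url(large_post))
print(pick_best_url(small_post))
```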
As it happens, Danbooru (afaik, always) gives a link to the original file under the 'Size:' metadata to the left. This is the same 'best link' for both posts above, but it isn't so easy to identify. It is a quiet \"a\" tag without an \"id\" and it isn't always in the same location, but if you could pin it down reliably, it might be nice to circumvent the whole issue.
Sites can change suddenly, so it is nice to have a bit of redundancy here if it is easy.
"},{"location":"downloader_parsers_content_parsers.html#tags","title":"tags","text":"These are simple--they tell the client that the given strings are tags. You set the namespace here as well. I recommend you parse 'splashbrush' and set the namespace 'creator' here rather than trying to mess around with 'append prefix \"creator:\"' string conversions at the formula level--it is simpler up here and it lets hydrus handle any edge case logic for you.
Leave the namespace field blank for unnamespaced tags.
"},{"location":"downloader_parsers_content_parsers.html#file_hash","title":"file hash","text":"This says 'this is the hash for the file otherwise referenced in this parser'. So, if you have another content parser finding a File or Post URL, this lets the client know early that that destination happens to have a particular MD5, for instance. The client will look for that hash in its own database, and if it finds a match, it can predetermine if it already has the file (or has previously deleted it) without ever having to download it. When this happens, it will still add tags and associate the file with the URL for it's 'known urls' just as if it had downloaded it!
If you understand this concept, it is great to include. It saves time and bandwidth for everyone. Many site APIs include a hash for this exact reason--they want you to be able to skip a needless download just as much as you do.
The usual suite of hash types are supported: MD5, SHA1, SHA256, and SHA512. An old version of this required some weird string decoding, but this is no longer true. Select 'hex' or 'base64' from the encoding type dropdown, and then just parse the 'e5af57a687f089894f5ecede50049458' or '5a9XpofwiYlPXs7eUASUWA==' text, and hydrus should handle the rest. It will present the parsed hash in hex.
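To see why the two example strings above are the same hash, here is a small standard-library sketch of the hex/base64 normalisation. It only illustrates the idea; the function name is made up and this is not how the client itself is written.

```python
import base64


def hash_to_hex(text, encoding='hex'):
    if encoding == 'base64':
        # Decode to raw bytes, then present them as lowercase hex.
        return base64.b64decode(text).hex()
    # Already hex: round-trip through bytes to validate and lowercase it.
    return bytes.fromhex(text).hex()


print(hash_to_hex('5a9XpofwiYlPXs7eUASUWA==', encoding='base64'))
print(hash_to_hex('E5AF57A687F089894F5ECEDE50049458'))
```

Both calls print e5af57a687f089894f5ecede50049458, the 16-byte MD5 from the text.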
"},{"location":"downloader_parsers_content_parsers.html#timestamp","title":"timestamp","text":"This lets you say that a given number refers to a particular time for a file. At the moment, I only support 'source time', which represents a 'post' time for the file and is useful for thread and subscription check time calculations. It takes a Unix time integer, like 1520203484, which many APIs will provide.
If you are feeling very clever, you can decode a 'MM/DD/YYYY hh:mm:ss' style string to a Unix time integer using string converters, which use some hacky and semi-reliable python %d-style values as per here. Look at the existing defaults for examples of this, and don't worry about being more accurate than 12/24 hours--trying to figure out timezone is a hell not worth attempting, and doesn't really matter in the long-run for subscriptions and thread watchers that might care.
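For reference, this is roughly what that decode looks like in plain Python, using the same %-style codes. It is a sketch outside of hydrus's string converter UI, the function name is made up, and it simply assumes the site's time is UTC, in line with the advice above not to chase timezones.

```python
import calendar
import time


def to_unix_time(text, fmt='%m/%d/%Y %H:%M:%S'):
    # Parse the text with strptime-style codes and treat the result as UTC.
    return calendar.timegm(time.strptime(text, fmt))


print(to_unix_time('03/04/2018 22:44:44'))  # 1520203484
```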
"},{"location":"downloader_parsers_content_parsers.html#page_title","title":"watcher page title","text":"This lets the watcher know a good name/subject for its entries. The subject of a thread is obviously ideal here, but failing that you can try to fetch the first part of the first post's comment. It has precendence, like for URLs, so you can tell the parser which to prefer if you have multiple options. Just for neatness and ease of testing, you probably want to use a string converter here to cut it down to the first 64 characters or so.
"},{"location":"downloader_parsers_content_parsers.html#veto","title":"veto","text":"This is a special content type--it tells the next highest stage of parsing that this 'post' of parsing is invalid and to cancel and not return any data. For instance, if a thread post's file was deleted, the site might provide a default '404' stock File URL using the same markup structure as it would for normal images. You don't want to give the user the same 404 image ten times over (with fifteen kinds of tag and source time metadata attached), so you can add a little rule here that says \"If the image link is 'https://somesite.com/404.png', raise a veto: File 404\" or \"If the page has 'No results found' in its main content div, raise a veto: No results found\" or \"If the expected download tag does not have 'download link' as its text, raise a veto: No Download Link found--possibly Ugoira?\" and so on.
They will associate their name with the veto being raised, so it is useful to give these a decent descriptive name so you can see what might be going right or wrong during testing. If it is an appropriate and serious enough veto, it may also rise up to the user level and will be useful if they need to report an error to you (like \"After five pages of parsing, it gives 'veto: no next page link'\").
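Conceptually, a veto behaves like a named exception that cancels the current post's results. The sketch below models that idea in plain Python; the Veto class and parse_post function are hypothetical and are not hydrus's real objects, and the 404 URL is the one from the example above.

```python
class Veto(Exception):
    # Hypothetical stand-in for a veto; the message is the veto's name.
    pass


def parse_post(file_url, tags):
    # If the site served its stock '404' image, cancel this post entirely
    # rather than returning the placeholder with tags attached.
    if file_url == 'https://somesite.com/404.png':
        raise Veto('File 404')
    return {'file_url': file_url, 'tags': tags}


for url in ['https://somesite.com/images/123.jpg', 'https://somesite.com/404.png']:
    try:
        print(parse_post(url, ['blue_eyes']))
    except Veto as veto:
        print(f'veto: {veto}')
```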
"},{"location":"downloader_parsers_formulae.html","title":"Parser Formulae","text":"Formulae are tools used by higher-level components of the parsing system. They take some data (typically some HTML or JSON) and return 0 to n strings. For our purposes, these strings will usually be tags, URLs, and timestamps. You will usually see them summarised with this panel:
The different types are currently html, json, compound, and context variable.
"},{"location":"downloader_parsers_formulae.html#html_formula","title":"html","text":"This takes a full HTML document or a sample of HTML--and any regular sort of XML should also work. It starts at the root node and searches for lower nodes using one or more ordered rules based on tag name and attributes, and then returns string data from those final nodes.
For instance, if you have this:
<html>\n <body>\n <div class=\"media_taglist\">\n <span class=\"generaltag\"><a href=\"(search page)\">blonde hair</a> (3456)</span>\n <span class=\"generaltag\"><a href=\"(search page)\">blue eyes</a> (4567)</span>\n <span class=\"generaltag\"><a href=\"(search page)\">bodysuit</a> (5678)</span>\n <span class=\"charactertag\"><a href=\"(search page)\">samus aran</a> (2345)</span>\n <span class=\"artisttag\"><a href=\"(search page)\">splashbrush</a> (123)</span>\n </div>\n <div class=\"content\">\n <span class=\"media\">(a whole bunch of content that doesn't have tags in)</span>\n </div>\n </body>\n</html>\n
(Most boorus have a taglist like this on their file pages.)
To find the artist, \"splashbrush\", here, you could:
search beneath the root tag (<html>) for the <div> tag with attribute class=\"media_taglist\"
search beneath that <div> for <span> tags with attribute class=\"artisttag\"
search beneath those <span> tags for <a> tags
and then get the string content of those <a> tags
Changing the artisttag to charactertag or generaltag would give you samus aran or blonde hair, blue eyes, bodysuit respectively.
You might be tempted to just go straight for any <span> with class=\"artisttag\", but many sites use the same class to render a sidebar of favourite/popular tags or some other sponsored content, so it is generally best to try to narrow down to a larger <div> container so you don't get anything you don't mean.
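The chain of rules can be sketched in Python with the standard library's ElementTree. This is only an approximation of the idea, not how the client implements it: the search helper is a made-up name, the sample is trimmed and uses single-quoted attributes, and the extra 'sidebar' div is an assumption added to show why narrowing to the taglist matters.

```python
import xml.etree.ElementTree as ET

SAMPLE = '''
<html>
 <body>
  <div class='media_taglist'>
   <span class='generaltag'><a href='(search page)'>blonde hair</a> (3456)</span>
   <span class='generaltag'><a href='(search page)'>blue eyes</a> (4567)</span>
   <span class='generaltag'><a href='(search page)'>bodysuit</a> (5678)</span>
   <span class='charactertag'><a href='(search page)'>samus aran</a> (2345)</span>
   <span class='artisttag'><a href='(search page)'>splashbrush</a> (123)</span>
  </div>
  <div class='sidebar'>
   <span class='artisttag'><a href='(search page)'>popular artist</a></span>
  </div>
 </body>
</html>
'''


def search(nodes, tag, attrs=None):
    # One search rule: descendants of the given nodes with this tag name and attributes.
    attrs = attrs or {}
    found = []
    for node in nodes:
        for child in node.iter(tag):
            if child is node:
                continue
            if all(child.get(key) == value for key, value in attrs.items()):
                found.append(child)
    return found


def tags_for(class_name):
    root = ET.fromstring(SAMPLE)
    divs = search([root], 'div', {'class': 'media_taglist'})
    spans = search(divs, 'span', {'class': class_name})
    links = search(spans, 'a')
    # Finally, fetch the string content of the last nodes found.
    return [link.text for link in links]
```

Calling tags_for('artisttag') only looks inside the taglist div, so the sidebar entry is never picked up.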
Clicking 'edit formula' on an HTML formula gives you this:
You edit on the left and test on the right.
"},{"location":"downloader_parsers_formulae.html#finding_the_right_html_tags","title":"finding the right html tags","text":"When you add or edit one of the specific tag search rules, you get this:
You can set multiple key/value attribute search conditions, but you'll typically be searching for 'class' or 'id' here, if anything.
Note that you can set it to fetch only the xth instance of a found tag, which can be useful in situations like this:
<span class=\"generaltag\">\n <a href=\"(add tag)\">+</a>\n <a href=\"(remove tag)\">-</a>\n <a href=\"(search page)\">blonde hair</a> (3456)\n</span>\n
Without any more attributes, there isn't a great way to distinguish the <a> with \"blonde hair\" from the other two--so just set 'get the 3rd <a> tag' and you are good.
Most of the time, you'll be searching descendants (i.e. walking down the tree), but sometimes you might have this:
<span>\n <a href=\"(link to post url)\">\n <img class=\"thumb\" src=\"(thumbnail image)\" />\n </a>\n</span>\n
There isn't a great way to find the <span> or the <a> when looking from above here, as they are lacking a class or id, but you can find the <img> ok. So find those, and then add a rule where, instead of searching descendants, you are 'walking back up ancestors' like this:
You can solve some tricky problems this way!
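To make the ancestor walk concrete, here is a small Python sketch using ElementTree. It is an illustration under assumptions, not the client's code: the wrapping div and the unrelated link are invented for the example, and post_links is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SAMPLE = '''
<div>
 <span>
  <a href='(link to post url)'>
   <img class='thumb' src='(thumbnail image)' />
  </a>
 </span>
 <a href='(unrelated link)'>about</a>
</div>
'''


def post_links():
    root = ET.fromstring(SAMPLE)
    # ElementTree nodes do not know their parent, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    links = []
    for img in root.iter('img'):
        if img.get('class') != 'thumb':
            continue
        # Walk back up the ancestors until the nearest <a> is found.
        node = parent_of.get(img)
        while node is not None and node.tag != 'a':
            node = parent_of.get(node)
        if node is not None:
            links.append(node.get('href'))
    return links
```

Only the link that wraps a thumb image is returned; the unrelated link is ignored because the search started from the img.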
You can also set a String Match, which is the same panel as you saw with URL Classes. It tests its best guess at the tag's 'string' value, so you can find a tag with 'Original Image' as its text, or one whose text matches a regex for starting with 'Posted on: '. Have a play with it and you'll figure it out.
"},{"location":"downloader_parsers_formulae.html#content_to_fetch","title":"content to fetch","text":"Once you have narrowed down the right nodes you want, you can decide what text to fetch. Given a node of:
<a href=\"(URL A)\" class=\"thumb_title\">Forest Glade</a>\n
Returning the href attribute would return the string \"(URL A)\", returning the string content would give \"Forest Glade\", and returning the full html would give <a href=\"(URL A)\" class=\"thumb_title\">Forest Glade</a>. This last choice is useful in complicated situations where you want a second, separated layer of parsing, which we will get to later.
You can set a final String Match to filter the parsed results (e.g. \"only allow strings that only contain numbers\" or \"only allow full URLs as based on (complicated regex)\") and a String Converter to edit them (e.g. \"remove the first three characters of whatever you find\" or \"decode from base64\").
You won't use these much, but they can sometimes get you out of a complicated situation.
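The three kinds of content, plus a final filter and edit step, look roughly like this in Python. This is a sketch with ElementTree; only_numbers and drop_first_three are hypothetical stand-ins for a String Match and a String Converter, not the client's own objects.

```python
import xml.etree.ElementTree as ET

node = ET.fromstring('''<a href='(URL A)' class='thumb_title'>Forest Glade</a>''')

href = node.get('href')                            # an attribute
text = node.text                                   # the string content
full_html = ET.tostring(node, encoding='unicode')  # the full markup, including the node itself


def only_numbers(strings):
    # String Match stand-in: only allow strings that only contain numbers.
    return [s for s in strings if s.isdigit()]


def drop_first_three(strings):
    # String Converter stand-in: remove the first three characters of each string.
    return [s[3:] for s in strings]
```

The filter decides which parsed strings survive, and the converter then edits whatever is left.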
"},{"location":"downloader_parsers_formulae.html#testing","title":"testing","text":"The testing panel on the right is important and worth using. Copy the html from the source you want to parse and then hit the paste buttons to set that as the data to test with.
"},{"location":"downloader_parsers_formulae.html#json_formula","title":"json","text":"This takes some JSON and does a similar style of search:
It is a bit simpler than HTML--if the current node is a list (called an 'Array' in JSON), you can fetch every item or the xth item, and if it is a dictionary (called an 'Object' in JSON), you can fetch a particular entry by name. Since you can't jump down several layers with attribute lookups or tag names like with HTML, you have to go down every layer one at a time. In any case, if you have something like this:
Note
It is a great idea to check the html or json you are trying to parse with your browser. Some web browsers have excellent developer tools that let you walk through the nodes of the document you are trying to parse in a prettier way than I would ever have time to put together. This image is one of the views Firefox provides if you simply enter a JSON URL.
Searching for \"posts\"->1st list item->\"sub\" on this data will give you \"Nobody like kino here.\".
Searching for \"posts\"->all list items->\"tim\" will give you the three SHA256 file hashes (since the third post has no file attached and so no 'tim' entry, the parser skips over it without complaint).
Searching for \"posts\"->1st list item->\"com\" will give you the OP's comment, ~AS RAW UNPARSED HTML~.
The default is to fetch the final nodes' 'data content', which means coercing simple variables into strings. If the current node is a list or dict, no string is returned.
But if you like, you can return the json beneath the current node (which, like HTML, includes the current node). This again will come in useful later.
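Here is a minimal Python sketch of that layer-by-layer walk. The DATA dictionary is an assumed, already-decoded stand-in for a thread API response with placeholder values, and json_formula and ALL_ITEMS are hypothetical names; it is not the client's implementation.

```python
# Assumed shape of an already-decoded thread API response; the values are placeholders.
DATA = {
    'posts': [
        {'no': 1, 'sub': 'thread subject', 'com': '<p>first comment</p>', 'tim': 'aaa', 'ext': '.jpg'},
        {'no': 2, 'com': 'second', 'tim': 'bbb', 'ext': '.png'},
        {'no': 3, 'com': 'no file here'},
        {'no': 4, 'com': 'fourth', 'tim': 'ccc', 'ext': '.webm'},
    ]
}

ALL_ITEMS = object()  # marker meaning 'all list items'


def json_formula(data, rules):
    nodes = [data]
    for rule in rules:
        next_nodes = []
        for node in nodes:
            if isinstance(node, list):
                if rule is ALL_ITEMS:
                    next_nodes.extend(node)
                elif isinstance(rule, int) and rule < len(node):
                    next_nodes.append(node[rule])
            elif isinstance(node, dict) and rule in node:
                # Entries that lack the key are skipped without complaint.
                next_nodes.append(node[rule])
        nodes = next_nodes
    # 'data content': simple values become strings; lists and dicts return nothing.
    return [str(n) for n in nodes if not isinstance(n, (list, dict))]
```

For example, json_formula(DATA, ['posts', ALL_ITEMS, 'tim']) returns three strings, because the third post has no 'tim' entry.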
"},{"location":"downloader_parsers_formulae.html#compound_formula","title":"compound","text":"If you want to create a string from multiple parsed strings--for instance by appending the 'tim' and the 'ext' in our json example together--you can use a Compound formula. This fetches multiple lists of strings and tries to place them into a single string using \\1
regex substitution syntax:
This is a complicated example taken from one of my thread parsers. I have to take a modified version of the original thread URL (the first rule, so \\1) and then append the filename (\\2) and its extension (\\3) on the end to get the final file URL of a post. You can mix in more characters in the substitution phrase, like \\1.jpg, or even have multiple instances (https://\\2.muhsite.com/\\2/\\1), if that is appropriate.
This is where the magic happens, sometimes, so keep it in mind if you need to do something cleverer than the data you have seems to provide.
"},{"location":"downloader_parsers_formulae.html#context_variable_formula","title":"context variable","text":"This is a basic hacky answer to a particular problem. It is a simple key:value dictionary that at the moment only stores one variable, 'url', which contains the original URL used to fetch the data being parsed.
If a different URL Class links to this parser via an API URL, this 'url' variable will always be the API URL (i.e. it literally is the URL used to fetch the data), not any thread/whatever URL the user entered.
Hit the 'edit example parsing context' button to change the URL used for testing.
I have used this several times to stitch together file URLs when I am pulling data from APIs, like in the compound formula example above. In this case, the starting URL is https://a.4cdn.org/tg/thread/57806016.json, from which I extract the board name, \"tg\", using the string converter, and then add in 4chan's CDN domain to make the appropriate base file URL (https://i.4cdn.org/tg/) for the given thread. I only have to jump through this hoop in 4chan's case because they explicitly store file URLs by board name. 8chan on the other hand, for instance, has a static https://media.8ch.net/file_store/ for all files, so it is a little easier (I think I just do a single 'prepend' string transformation somewhere).
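The stitching described here can be sketched in a few lines of Python. This is a sketch under assumptions: context stands in for the parsing context dictionary, base_file_url and file_url are hypothetical helpers, and the filename and extension passed in would come from other formulae.

```python
from urllib.parse import urlsplit


def base_file_url(context):
    # context is the parsing context dictionary; 'url' is the URL that was actually fetched.
    path_parts = urlsplit(context['url']).path.strip('/').split('/')
    board = path_parts[0]
    return 'https://i.4cdn.org/' + board + '/'


def file_url(context, filename, extension):
    # Rough equivalent of a compound formula joining three parsed strings.
    return base_file_url(context) + filename + extension
```

With the thread URL above as the context 'url', the base comes out as the board-specific CDN folder, and the filename and extension are simply appended.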
If you want to make some parsers, you will have to get familiar with how different sites store and present their data!
"},{"location":"downloader_parsers_full_example_api.html","title":"api example","text":"Some sites offer API calls for their pages. Depending on complexity and quality of content, using these APIs may or may not be a good idea. Artstation has a good one--let's first review our URL Classes:
We convert the original Post URL, https://www.artstation.com/artwork/mQLe1 to https://www.artstation.com/projects/mQLe1.json. Note that Artstation Post URLs can produce multiple files, and that the API url should not be associated with those final files.
So, when the client encounters an 'artstation file page' URL, it will generate the equivalent 'artstation file page json api' URL and use that for downloading and parsing. If you would like to review your API links, check out network->downloader components->manage url class links->api links. Using Example URLs, it will figure out which URL Classes link to others and ensure you are mapping parsers only to the final link in the chain--there should be several already in there by default.
Now let's look at the JSON. Loading clean JSON in a browser should present you with a nicer view:
I have highlighted the data we want, which is:
JSON is a dream to parse, and I will assume you are comfortable with Content Parsers from the previous examples, so I'll simply paste the different formulae one after another:
Each image is stored under a separate numbered 'assets' list item. This one has just two, but some Artstation pages have dozens of images. The only unusual part here is I also put a String Match of ^(?!.*assets\\/covers).*$
, which filters out 'cover' images (such as on here), which make for nice portfolio thumbs on the site but are not interesting to us.
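To see what that negative lookahead does, here is a quick Python check. The two URLs are invented placeholders, not real Artstation links, and the slash is left unescaped because Python's re does not need it escaped.

```python
import re

# Matches any string that does not contain 'assets/covers' anywhere.
NOT_A_COVER = re.compile('^(?!.*assets/covers).*$')


def keep_non_covers(urls):
    return [url for url in urls if NOT_A_COVER.match(url)]
```

Anything with 'assets/covers' in it fails the match and is dropped; everything else passes through unchanged.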
This fetches the 'creator' tag. Artstation's API is great because it includes profile data in content requests. There's the creator's presentation name, username, profile link, avatar URLs, all that inside a regular request about this particular work. When that information is missing (like in yiff.party), it may make the API useless to you.
These are all simple. You can take or leave the title and medium tags--some people like them, some don't. This example has no unnamespaced tags, but this one does. Creator-entered tags are sometimes not worth parsing (on tumblr, for instance, you often get run-on tags like #imbored #whatisevengoingon that are irrelevant to the work), but Artstation users are all professionals trying to get their work noticed, so the tags are usually pretty good.
This again uses python's datetime to decode the date, which Artstation presents with millisecond accuracy, ha ha. I use a (.+:..)\\..*->\\1
regex (i.e. \"get everything before the period\") to strip off the timezone and milliseconds and then decode as normal.
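In plain Python, the same trim-then-decode step might look like the sketch below. The example date string and its format are assumptions for illustration; the period is written as [.] rather than with a backslash, which matches the same thing.

```python
import re
from datetime import datetime


def decode_published_at(raw):
    # Keep everything up to the seconds, dropping the milliseconds and timezone.
    match = re.match('(.+:..)[.].*', raw)
    trimmed = match.group(1) if match else raw
    return datetime.strptime(trimmed, '%Y-%m-%dT%H:%M:%S')
```

Given a string such as '2018-05-17T16:02:38.123-05:00', only '2018-05-17T16:02:38' reaches the date decoder.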
APIs that are stable and free to access (e.g. do not require OAuth or other complicated login headers) can make parsing fantastic. They save bandwidth and CPU time, and they are typically easier to work with than HTML. Unfortunately, the boorus that do provide APIs often list their tags without namespace information, so I recommend you double-check you can get what you want before you get too deep into it. Some APIs also offer incomplete data, such as relative URLs (relative to the original URL!), which can be a pain to figure out in our system.
"},{"location":"downloader_parsers_full_example_file_page.html","title":"file page example","text":"Let's look at this page: https://gelbooru.com/index.php?page=post&s=view&id=3837615.
What sorts of data are we interested in here?
A tempting strategy for pulling the file URL is to just fetch the src of the embedded <img>
tag, but:
<video>
and <embed>
tags.
If you have an account with the site you are parsing and have clicked the appropriate 'Always view original' setting, you may not see these sorts of sample-size banners! I recommend you log out of/go incognito for sites you are inspecting for hydrus parsing (unless a log-in is required to see content, in which case the hydrus user will have to set up hydrus-side login to actually use the parser), or you can easily miss NSFW-gates and other logged-out hurdles.
When trying to pin down the right link, if there are no good alternatives, you often have to write several File URL rules with different precedence, saying 'get the \"Click Here to See Full Size\" link at 75' and 'get the embed's \"src\" at 25' and so on to make sure you cover different situations, but as it happens Gelbooru always posts the actual File URL at:
<meta property=\"og:image\" content=\"https://gelbooru.com/images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg\" /> under the <head>
<a href=\"https://simg3.gelbooru.com//images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg\" target=\"_blank\" style=\"font-weight: bold;\">Original image</a> which can be found by putting a String Match in the html formula.
<meta> with property=\"og:image\" is easy to search for (and they use the same tag for video links as well!). For the Original Image, you can use a String Match like so:
Gelbooru uses \"Original Image\" even when they link to webm, which is helpful, but like \"og:image\", it could be changed to 'video' in future.
I think I wrote my gelbooru parser before I added String Matches to individual HTML formulae tag rules, so I went with this, which is a bit more cheeky:
But it works. Sometimes, just regexing for links that fit the site's CDN is a good bet for finding difficult stuff.
"},{"location":"downloader_parsers_full_example_file_page.html#tags","title":"tags","text":"Most boorus have a taglist on the left that has a nice id or class you can pull, and then each namespace gets its own class for CSS-colouring:
Make sure you browse around the booru for a bit, so you can find all the different classes they use. character/artist/copyright are common, but some sneak in the odd meta/species/rating.
Skipping ?/-/+ characters can be a pain if you are lacking a nice tag-text class, in which case you can add a regex String Match to the HTML formula (as I do here, since Gelb offers '?' links for tag definitions) like [^\\?\\-+\\s], which means \"the text includes something other than just '?' or '-' or '+' or whitespace\".
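The same test can be written without a regex. This small Python sketch (is_real_tag_text is a hypothetical name) is meant to show what the character class is checking for, not how the String Match is implemented.

```python
def is_real_tag_text(text):
    # True if the text holds at least one character that is not ?, -, + or whitespace.
    return any(ch not in '?-+' and not ch.isspace() for ch in text)
```

A link whose text is just '?' fails the test and is skipped, while 'blonde hair' passes.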
"},{"location":"downloader_parsers_full_example_file_page.html#md5_hash","title":"md5 hash","text":"If you look at the Gelbooru File URL, https://gelbooru.com/images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg, you may notice the filename is all hexadecimal. It looks like they store their files under a two-deep folder structure, using the first four characters--386e here--as the key. It sure looks like '386e12e33726425dbd637e134c4c09b5' is not random ephemeral garbage!
In fact, Gelbooru use the MD5 of the file as the filename. Many storage systems do something like this (hydrus uses SHA256!), so if they don't offer a <meta> tag that explicitly states the md5 or sha1 or whatever, you can sometimes infer it from one of the file links. This screenshot is from the more recent version of hydrus, which has the more powerful 'string processing' system for string transformations. It has an intimidating number of nested dialogs, but we can stay simple for now, with only the one regex substitution step inside a string 'converter':
Here we are using the same property=\"og:image\" rule to fetch the File URL, and then we are regexing the hex hash with .*([0-9a-f]{32}).* (MD5s are 32 hex characters). We select 'hex' as the encoding type. Hashes require a tiny bit more data handling behind the scenes, but in the Content Parser test page it presents the hash again neatly in English (\"md5 hash: 386e12e33726425dbd637e134c4c09b5\"), meaning everything parsed correctly. It presents the hash in hex even if you select the encoding type as base64.
If you think you have found a hash string, you should obviously test your theory! The site might not be using the actual MD5 of file bytes, as hydrus does, but instead some proprietary scheme. Download the file and run it through a program like HxD (or hydrus!) to figure out its hashes, and then search the View Source for those hex strings--you might be surprised!
Finding the hash is hugely beneficial for a parser--it lets hydrus recognise files it already has and skip downloading them, even if it has never seen their URLs before!
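If you want to test a hash theory outside the client, a few lines of Python will do it. This is a sketch: md5_from_url and file_matches are hypothetical helpers, and the regex is the 32-hex-character one discussed above.

```python
import hashlib
import re


def md5_from_url(url):
    # Pull a run of 32 hex characters out of the URL and decode it to raw bytes.
    match = re.match('.*([0-9a-f]{32}).*', url)
    if match is None:
        return None
    return bytes.fromhex(match.group(1))


def file_matches(file_bytes, expected_md5):
    # Check the theory: does the downloaded file really hash to the value in the URL?
    return hashlib.md5(file_bytes).digest() == expected_md5
```

Run file_matches on the bytes of a downloaded file to confirm the site really uses the MD5 of the file rather than some proprietary scheme.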
"},{"location":"downloader_parsers_full_example_file_page.html#source_time","title":"source time","text":"Post/source time lets subscriptions and watchers make more accurate guesses at current file velocity. It is neat to have if you can find it, but:
FUCK ALL TIMEZONES FOREVER
Gelbooru offers--
<li>Posted: 2017-08-18 19:59:44<br /> by <a href=\"index.php?page=account&s=profile&uname=jayage5ds\">jayage5ds</a></li>\n
--so let's see how we can turn that into a Unix timestamp:
I find the <li> that starts \"Posted: \" and then decode the date according to the hackery-dackery-doo format from here. %c and %z are unreliable, and attempting timezone adjustments is overall a supervoid that will kill your time for no real benefit--subs and watchers work fine with 12-hour imprecision, so if you have a +0300 or EST in your string, just cut those characters off with another String Transformation. As long as you are getting about the right day, you are fine.
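The decode step is essentially python's strptime. Here is a minimal sketch, assuming the li's text has already been fetched as a plain string and that treating the site's time as UTC is good enough; posted_to_timestamp is a hypothetical helper.

```python
import calendar
from datetime import datetime


def posted_to_timestamp(li_text):
    # Drop the 'Posted: ' label and keep the 19-character date that follows it.
    date_text = li_text.strip()[len('Posted: '):][:19]
    parsed = datetime.strptime(date_text, '%Y-%m-%d %H:%M:%S')
    # No timezone adjustment is attempted; the time is simply treated as UTC.
    return calendar.timegm(parsed.timetuple())
```

Cutting to a fixed width is the crude equivalent of chopping trailing characters off with a String Transformation.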
Source URLs are nice to have if they are high quality. Some boorus only ever offer artist profiles, like https://twitter.com/artistname, whereas we want singular Post URLs that point to other places that host this work. For Gelbooru, you could fetch the Source URL as we did source time, searching for \"Source: \", but they also offer it more easily in an edit form:
<input type=\"text\" name=\"source\" size=\"40\" id=\"source\" value=\"https://www.deviantart.com/art/Lara-Croft-Artifact-Dive-699335378\" />\n
This is a bit of a fragile location to parse from--Gelb could change or remove this form at any time, whereas the \"Posted: \" <li> is probably firmer, but I expect I wrote it before I had String Matches in. It works for now, which in this game is often Good Enough\u2122.
Also--be careful pulling from text or tooltips rather than an href-like attribute, as whatever is presented to the user may be clipped for longer URLs. Make sure you try your rules on a couple of different pages to make sure you aren't pulling \"https://www.deviantart.com/art/Lara...\" by accident anywhere!
"},{"location":"downloader_parsers_full_example_file_page.html#summary","title":"summary","text":"Phew--all that for a bit of Lara Croft! Thankfully, most sites use similar schemes. Once you are familiar with the basic idea, the only real work is to duplicate an existing parser and edit for differences. Our final parser looks like this:
This is overall a decent parser. Some parts of it may fail when Gelbooru update to their next version, but that can be true of even very good parsers with multiple redundancy. For now, hydrus can use this to quickly and efficiently pull content from anything running Gelbooru 0.2.5, and the effort spent now can save millions of combined right-click->save as and manual tag copies in future. If you make something like this and share it about, you'll be doing a good service for those who could never figure it out.
"},{"location":"downloader_parsers_full_example_gallery_page.html","title":"gallery page example","text":"Caution
These guides should roughly follow what comes with the client by default! You might like to have the actual UI open in front of you so you can play around with the rules and try different test parses yourself.
Let's look at this page: https://e621.net/post/index/1/rating:safe pokemon
We've got 75 thumbnails and a bunch of page URLs at the bottom.
"},{"location":"downloader_parsers_full_example_gallery_page.html#main_page","title":"first, the main page","text":"This is easy. It gets a good name and some example URLs. e621 has some different ways of writing out their queries (and as they use some tags with '/', like 'male/female', this can cause character encoding issues depending on whether the tag is in the path or query!), but we'll put that off for now--we just want to parse some stuff.
"},{"location":"downloader_parsers_full_example_gallery_page.html#thumbnail_urls","title":"thumbnail links","text":"Most browsers have some good developer tools to let you Inspect Element and get a better view of the HTML DOM. Be warned that this information isn't always the same as View Source (which is what hydrus will get when it downloads the initial HTML document), as some sites load results dynamically with javascript and maybe an internal JSON API call (when sites move to systems that load more thumbs as you scroll down, it makes our job more difficult--in these cases, you'll need to chase down the embedded JSON or figure out what API calls their JS is making--the browser's developer tools can help you here again). Thankfully, e621 is (and most boorus are) fairly static and simple:
Every thumb on e621 is a <span> with class=\"thumb\" wrapping an <a> and an <img>. This is a common pattern, and easy to parse:
There are no tricky String Matches or String Converters needed--we are just fetching hrefs. Note that the links get relative-matched to example.com for now--I'll probably fix this to apply to one of the example URLs, but rest assured that IRL the parser will 'join' its url up with the appropriate Gallery URL used to fetch the data. Sometimes, you might want to add a 'search descendants for the first <div> tag with id=content' rule to make sure you are only grabbing thumbs from the main box, whether that is a <div> or a <span>, and whether it has id=\"content\" or class=\"mainBox\", but unless you know that booru likes to embed \"popular\" or \"favourite\" 'thumbs' up top that will be accidentally caught by a rule for <span>s with class=\"thumb\", I recommend you not make your rules overly specific--all it takes is for their dev to change the name of their content box, and your whole parser breaks. I've ditched the <span> requirement in the rule here for exactly that reason--class=\"thumb\" is necessary and sufficient.
Remember that the parsing system allows you to go up ancestors as well as down descendants. If your thumb-box has multiple links--like to see the artist's profile or 'set as favourite'--you can try searching for the <span>s, then down to the <img>, and then up to the nearest <a>. In English, this is saying, \"Find me all the image link URLs in the thumb boxes.\"
Most boorus have 'next' or '>>' at the bottom, which can be simple enough, but many have a neat <link href=\"/post/index/2/rating:safe%20pokemon\" rel=\"next\" /> in the <head>. The <head> solution is easier, if available, but my default e621 parser happens to pursue the 'paginator':
As it happens, e621 also apply the rel=\"next\" attribute to their \"Next >>\" links, which makes it all the easier for us to find. Sometimes there is no \"next\" id or class, and you'll want to add a String Match to your html formula to test for a string value of '>>' or whatever it is. A good trick is to View Source and then search for the critical /post/index/2/ phrase you are looking for--you might find what you want in a <link> tag you didn't expect or even buried in a hidden 'share to tumblr' button. <form>s for reporting or commenting on content are another good place to find content ids.
Note that this finds two URLs. e621 apply the rel=\"next\"
to both the \"2\" link and the \"Next >>\" one. The download engine merges the parser's dupes, so don't worry if you end up parsing both the 'top' and 'bottom' next page links, or if you use multiple rules to parse the same data in different ways.
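Here is a rough Python sketch of both behaviours: joining relative links onto the Gallery URL, and merging duplicates. The paginator markup is a simplified assumption rather than e621's real HTML, and next_page_urls is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

GALLERY_URL = 'https://e621.net/post/index/1/rating:safe%20pokemon'

SAMPLE = '''
<div class='paginator'>
 <a href='/post/index/2/rating:safe%20pokemon' rel='next'>2</a>
 <a href='/post/index/3/rating:safe%20pokemon'>3</a>
 <a href='/post/index/2/rating:safe%20pokemon' rel='next'>Next</a>
</div>
'''


def next_page_urls():
    root = ET.fromstring(SAMPLE)
    urls = []
    for link in root.iter('a'):
        if link.get('rel') != 'next':
            continue
        # Relative hrefs are joined onto the URL that was used to fetch the page.
        full_url = urljoin(GALLERY_URL, link.get('href'))
        # Duplicates are merged, keeping the first occurrence.
        if full_url not in urls:
            urls.append(full_url)
    return urls
```

Two links carry rel='next' in the sample, but only one distinct URL comes out.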
With those two rules, we are done. Gallery parsers are nice and simple.
"},{"location":"downloader_parsers_page_parsers.html","title":"Page Parsers","text":"We can now produce individual rows of rich metadata. To arrange them all into a useful structure, we will use Page Parsers.
The Page Parser is the top level parsing object. It takes a single document and produces a list--or a list of lists--of metadata. Here's the main UI:
Notice that the edit panel has three sub-pages.
"},{"location":"downloader_parsers_page_parsers.html#main","title":"main","text":"This page is just a simple list:
Each content parser here will be applied to the document and returned in this page parser's results list. Like most boorus, e621's File Pages only ever present one file, and they have simple markup, so the solution here was simple. The full contents of that test window are:
*** 1 RESULTS BEGIN ***\n\ntag: character:krystal\ntag: creator:s mino930\nfile url: https://static1.e621.net/data/fc/b6/fcb673ed89241a7b8d87a5dcb3a08af7.jpg\ntag: anthro\ntag: black nose\ntag: blue fur\ntag: blue hair\ntag: clothing\ntag: female\ntag: fur\ntag: green eyes\ntag: hair\ntag: hair ornament\ntag: jewelry\ntag: short hair\ntag: solo\ntag: video games\ntag: white fur\ntag: series:nintendo\ntag: series:star fox\ntag: species:canine\ntag: species:fox\ntag: species:mammal\n\n*** RESULTS END ***\n
When the client sees this in a downloader context, it will know where to download the file and which tags to associate with it based on what the user has chosen in their 'tag import options'.
"},{"location":"downloader_parsers_page_parsers.html#subsidiary_page_parsers","title":"subsidiary page parsers","text":"Here be dragons. This was an attempt to make parsing more helpful in certain API situations, but it ended up ugly. I do not recommend you use it, as I will likely scratch the whole thing and replace it with something better one day. It basically splits the page up into pieces that can then be parsed by nested page parsers as separate objects, but the UI and workflow is hell. Afaik, the imageboard API parsers use it, but little/nothing else. If you are really interested, check out how those work and maybe duplicate to figure out your own imageboard parser and/or send me your thoughts on how to separate File URL/timestamp combos better.
"},{"location":"downloader_sharing.html","title":"Sharing Downloaders","text":"If you are working with users who also understand the downloader system, you can swap your GUGs, URL Classes, and Parsers separately using the import/export buttons on the relevant dialogs, which work in pngs and clipboard text.
But if you want to share conveniently, and with users who are not familiar with the different downloader objects, you can package everything into a single easy-import png as per here.
The dialog to use is network->downloader components->export downloaders:
It isn't difficult. Essentially, you want to bundle enough objects to make one or more 'working' GUGs at the end. I recommend you start by just hitting 'add gug', which--using Example URLs--will attempt to figure out everything you need by itself.
This all works on Example URLs and some domain guesswork, so make sure your url classes are good and the parsers have correct Example URLs as well. If they don't, they won't all link up neatly for the end user. If part of your downloader is on a different domain to the GUGs and Gallery URLs, then you'll have to add them manually. Just start with 'add gug' and see if it looks like enough.
Once you have the necessary and sufficient objects added, you can export to png. You'll get a similar 'does this look right?' summary as what the end-user will see, just to check you have everything in order and the domains all correct. If that is good, then make sure to give the png a sensible filename and embellish the title and description if you need to. You can then send/post that png wherever, and any regular user will be able to use your work.
"},{"location":"downloader_url_classes.html","title":"URL Classes","text":"The fundamental connective tissue of the downloader system is the 'URL Class'. This object identifies and normalises URLs and links them to other components. Whenever the client handles a URL, it tries to match it to a URL Class to figure out what to do.
"},{"location":"downloader_url_classes.html#url_types","title":"the types of url","text":"For hydrus, an URL is useful if it is one of:
File URL: This returns the full, raw media file with no HTML wrapper. They typically end in a filename like http://safebooru.org//images/2333/cab1516a7eecf13c462615120ecf781116265f17.jpg, but sometimes they have a more complicated fetch command ending like 'file.php?id=123456' or '/post/content/123456'.
These URLs are remembered for the file in the 'known urls' list, so if the client happens to encounter the same URL in future, it can determine whether it can skip the download because the file is already in the database or has previously been deleted.
It is not important that File URLs be matched by a URL Class. File URL is considered the 'default', so if the client finds no match, it will assume the URL is a file and try to download and import the result. You might want to particularly specify them if you want to present them in the media viewer or discover File URLs are being confused for Post URLs or something.
Post URL: This typically returns some HTML that contains a File URL and metadata such as tags and post time. They sometimes present multiple sizes (like 'sample' vs 'full size') of the file or even different formats (like 'ugoira' vs 'webm'). The Post URL for the file above, http://safebooru.org/index.php?page=post&s=view&id=2429668, has this 'sample' presentation. Finding the best File URL in these cases can be tricky!
This URL is also saved to 'known urls' and will usually be similarly skipped if it has previously been downloaded. It will also appear in the media viewer as a clickable link.
Gallery URL: This presents a list of Post URLs or File URLs. They often also present a 'next page' URL. It could be a page like http://safebooru.org/index.php?page=post&s=list&tags=yorha_no._2_type_b&pid=0 or an API URL like http://safebooru.org/index.php?page=dapi&s=post&tags=yorha_no._2_type_b&q=index&pid=0. Watchable URL: This is the same as a Gallery URL but represents an ephemeral page that receives new files much faster than a gallery but will soon 'die' and be deleted. For our purposes, this typically means imageboard threads."},{"location":"downloader_url_classes.html#url_components","title":"the components of a url","text":"As far as we are concerned, a URL string has four parts:
Scheme: http or https
Location/Domain: safebooru.org or i.4cdn.org or cdn002.somebooru.net
Path Components: index.php or tesla/res/7518.json or pictures/user/daruak/page/2 or art/Commission-animation-Elsa-and-Anna-541820782
Parameters: page=post&s=list&tags=yorha_no._2_type_b&pid=40 or page=post&s=view&id=2429668
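Python's standard urllib splits a URL along the same four lines, which can be handy for checking your understanding of a site's URLs. This is a small sketch; url_components is a hypothetical helper and the dictionary keys are just labels chosen for this example.

```python
from urllib.parse import parse_qsl, urlsplit


def url_components(url):
    parts = urlsplit(url)
    return {
        'scheme': parts.scheme,
        'domain': parts.netloc,
        # The path is an ordered list of components.
        'path': [piece for piece in parts.path.split('/') if piece],
        # The parameters are unordered key/value pairs.
        'parameters': dict(parse_qsl(parts.query)),
    }
```

Note that the path keeps its order while the parameters become an unordered mapping, which mirrors how the two are treated when matching.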
So, let's look at the 'edit url class' panel, which is found under network->downloader components->manage url classes:
A TBIB File Page like https://tbib.org/index.php?page=post&s=view&id=6391256 is a Post URL. Let's look at the metadata first:
Name and type: Like with GUGs, we should set a good unambiguous name so the client can clearly summarise this url to the user. 'tbib file page' is good.
This is a Post URL, so we set the 'post url' type.
Association logic: All boorus and most sites only present one file per page, but some sites present multiple files on one page, usually several pages in a series/comic, as with pixiv. Danbooru-style thumbnail links to 'this file has a post parent' do not count here--I mean that a single URL embeds multiple full-size images, either with shared or separate tags. It is very important to the hydrus client's downloader logic (making decisions about whether it has previously visited a URL, so whether to skip checking it again) that if a site can present multiple files on a single page that 'can produce multiple files' is checked.
Related is the idea of whether a 'known url' should be associated. Typically, this should be checked for Post and File URLs, which are fixed, and unchecked for Gallery and Watchable URLs, which are ephemeral and give different results from day to day. There are some unusual exceptions, so give it a brief thought--but if you have no special reason, leave this as the default for the url type.
And now, for matching the string itself, let's revisit our four components:
Scheme TBIB supports http and https, so I have set the 'preferred' scheme to https. Any 'http' TBIB URL a user inputs will be automatically converted to https. Location/DomainFor Post URLs, the domain is always \"tbib.org\".
The 'allow' and 'keep' subdomains checkboxes let you determine if a URL with \"artistname.artsite.com\" will match a URL Class with \"artsite.com\" domain and if that subdomain should be remembered going forward. Most sites do not host content on subdomains, so you can usually leave 'allow' unchecked. The 'keep' option (which is only available if 'allow' is checked) is more subtle, only useful for rare cases, and unless you have a special reason, you should leave it checked. (For keep: In cases where a site farms out File URLs to CDN servers on subdomains--like randomly serving a mirror of \"https://muhbooru.org/file/123456\" on \"https://srv2.muhbooru.org/file/123456\"--and removing the subdomain still gives a valid URL, you may not wish to keep the subdomain.) Since TBIB does not use subdomains, these options do not matter--we can leave both unchecked.
'www' and 'www2' and similar subdomains are automatically matched. Don't worry about them.
Path Components TBIB just uses a single \"index.php\" on the root directory, so the path is not complicated. Were it longer (like \"gallery/cgi/index.php\"), we would add more (\"gallery\" and \"cgi\"), and since the path of a URL has a strict order, we would need to arrange the items in the listbox there so they were sorted correctly. Parameters TBIB's index.php takes many parameters to render different page types. Note that the Post URL uses \"s=view\", while TBIB Gallery URLs use \"s=list\". In any case, for a Post URL, \"id\", \"page\", and \"s\" are necessary and sufficient."},{"location":"downloader_url_classes.html#string_matches","title":"string matches","text":"As you edit these components, you will be presented with the Edit String Match Panel:
This lets you set the type of string that will be valid for that component. If a given path or query component does not match the rules given here, the URL will not match the URL Class. Most of the time you will probably want to set 'fixed characters' of something like \"post\" or \"index.php\", but if the component you are editing is more complicated and could have a range of different valid values, you can specify just numbers or letters or even a regex pattern. If you try to do something complicated, experiment with the 'example string' entry to make sure you have it set how you think.
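To make the three kinds of rule concrete, here is a minimal sketch of a string match as a small test function. The factory name string_match and its 'fixed', 'numbers', and 'regex' labels are assumptions for this example, and whether a regex must cover the whole component (as it does here) is also an assumption.

```python
import re

def string_match(kind, value=None):
    # hypothetical factory: returns a function that tests one URL component
    if kind == 'fixed':
        # fixed characters: the component must equal the given text
        return lambda s: s == value
    if kind == 'numbers':
        # any non-empty run of digits
        return lambda s: s.isdecimal()
    if kind == 'regex':
        pattern = re.compile(value)
        # assumption: the pattern must match the whole component
        return lambda s: pattern.fullmatch(s) is not None
    raise ValueError('unknown match kind: ' + kind)

is_index = string_match('fixed', 'index.php')
is_id = string_match('numbers')
print(is_index('index.php'), is_id('6391256'), is_id('abc'))
```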
Don't go overboard with this stuff, though--most sites do not have super-fine distinctions between their different URL types, and hydrus users will not be dropping user account or logout pages or whatever on the client, so you can be fairly liberal with the rules.
"},{"location":"downloader_url_classes.html#match_details","title":"how do they match, exactly?","text":"This URL Class will be assigned to any URL that matches the location, path, and query. Missing path components or parameters in the URL will invalidate the match, but additional ones will not!
For instance, given:
Only URL A will match
And:
Both URL A and B will match
And:
Both URL A and B will match, URL C will not
If multiple URL Classes match a URL, the client will try to assign the most 'complicated' one, with the most path components and then parameters.
Given two example URLs and URL Classes:
URL A will match URL Class A but not URL Class B and so will receive A.
URL B will match both and receive URL Class B as it is more complicated.
This situation is not common, but when it does pop up, it can be a pain. It is usually a good idea to match exactly what you need--no more, no less.
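The matching rules in this section can be sketched roughly as follows. This is a simplified illustration, not the client's real logic: a URL Class is modelled as a plain dict with a domain, an ordered list of fixed path components, and a set of required parameter names, and the example.com URLs are made up.

```python
from urllib.parse import urlsplit, parse_qsl

def matches(url_class, url):
    # url_class is a hypothetical dict: 'domain', 'path' (list), 'params' (set)
    parts = urlsplit(url)
    path = [c for c in parts.path.split('/') if c]
    params = dict(parse_qsl(parts.query))
    if parts.netloc != url_class['domain']:
        return False
    # every required path component must be present, in order;
    # additional trailing components do not invalidate the match
    if path[:len(url_class['path'])] != url_class['path']:
        return False
    # every required parameter must be present; additional ones are fine
    return url_class['params'] <= set(params)

def best_match(url_classes, url):
    candidates = [c for c in url_classes if matches(c, url)]
    if not candidates:
        return None
    # prefer the most 'complicated': most path components, then most parameters
    return max(candidates, key=lambda c: (len(c['path']), len(c['params'])))

class_a = {'name': 'A', 'domain': 'example.com', 'path': ['index.php'], 'params': {'id'}}
class_b = {'name': 'B', 'domain': 'example.com', 'path': ['index.php'], 'params': {'id', 'page'}}

print(best_match([class_a, class_b], 'https://example.com/index.php?id=1')['name'])
print(best_match([class_a, class_b], 'https://example.com/index.php?id=1&page=post')['name'])
```

The first URL matches only class A, and the second matches both and receives the more complicated class B.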
"},{"location":"downloader_url_classes.html#url_normalisation","title":"normalising urls","text":"Different URLs can give the same content. The http and https versions of a URL are typically the same, and:
And:
Since we are in the business of storing and comparing URLs, we want to 'normalise' them to a single comparable beautiful value. You see a preview of this normalisation on the edit panel. Normalisation happens to all URLs that enter the program.
Note that in e621's case (and for many other sites!), that text after the id is purely decoration. It can change when the file's tags change, so if we want to compare today's URLs with those we saw a month ago, we'd rather just be without it.
On normalisation, all URLs will get the preferred http/https switch, and their parameters will be alphabetised. File and Post URLs will also cull out any surplus path or query components. This wouldn't affect our TBIB example above, but it will clip the e621 example down to that 'bare' id URL, and it will take any surplus 'lang=en' or 'browser=netscape_24.11' garbage off the query text as well. URLs that are not associated and saved and compared (i.e. normal Gallery and Watchable URLs) are not culled of unmatched path components or query parameters, which can sometimes be useful if you want to match (and keep intact) gallery URLs that might or might not include an important 'sort=desc' type of parameter.
Since File and Post URLs will do this culling, be careful not to leave anything important out of your rules. Make sure what you have is both necessary (nothing can be removed and still keep it valid) and sufficient (no more needs to be added to make it valid). It is a good idea to try pasting the 'normalised' version of the example URL into your browser, just to check it still works.
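A minimal sketch of this kind of normalisation for a Post URL, assuming the URL Class is described by a preferred scheme, a number of path components, and a set of parameter names. The function and its arguments are hypothetical, and the 'lang=en' parameter is just the sort of surplus junk described above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalise(url, preferred_scheme, path_length, param_names):
    # hypothetical sketch for a File or Post URL class
    parts = urlsplit(url)
    # cull surplus path components beyond those the class defines
    path = [c for c in parts.path.split('/') if c][:path_length]
    # keep only the parameters the class defines, alphabetised by name
    params = sorted((k, v) for k, v in parse_qsl(parts.query) if k in param_names)
    # note: urlencode re-escapes values, which is fine for simple ones like these
    return urlunsplit((preferred_scheme, parts.netloc, '/' + '/'.join(path), urlencode(params), ''))

print(normalise('http://tbib.org/index.php?s=view&page=post&id=6391256&lang=en',
                'https', 1, {'id', 'page', 's'}))
```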
"},{"location":"downloader_url_classes.html#default_values","title":"'default' values","text":"Some sites present the first page of a search like this:
https://danbooru.donmai.us/posts?tags=skirt
But the second page is:
https://danbooru.donmai.us/posts?tags=skirt&page=2
Another example is:
https://www.hentai-foundry.com/pictures/user/Mister69M
https://www.hentai-foundry.com/pictures/user/Mister69M/page/2
What happened to 'page=1' and '/page/1'? Adding those '1' values in works fine! Many sites, when an index is absent, will secretly imply an appropriate 0 or 1. This looks pretty to users looking at a browser address bar, but it can be a pain for us, who want to match both styles to one URL Class. It would be nice if we could recognise the 'bare' initial URL and fill in the '1' values to coerce it to the explicit, automation-friendly format. Defaults to the rescue:
After you set a path component or parameter String Match, you will be asked for an optional 'default' value. You won't want to set one most of the time, but for Gallery URLs, it can be hugely useful--see how the normalisation process automatically fills in the missing path component with the default! There are plenty of examples in the default Gallery URLs of this, so check them out. Most sites use page indices starting at '1', but Gelbooru-style imageboards use 'pid=0' file index (and often move forward 42, so the next pages will be 'pid=42', 'pid=84', and so on, although others use deltas of 20 or 40).
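Filling in a parameter default can be sketched like this. It is an illustration under simple assumptions (defaults only for query parameters, passed as a hypothetical dict), not the client's code.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def apply_param_defaults(url, defaults):
    # defaults is a hypothetical mapping of parameter name -> default value
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    for name, value in defaults.items():
        # only fill in the value when the URL does not already give one
        params.setdefault(name, value)
    query = urlencode(sorted(params.items()))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ''))

print(apply_param_defaults('https://danbooru.donmai.us/posts?tags=skirt', {'page': '1'}))
print(apply_param_defaults('https://danbooru.donmai.us/posts?tags=skirt&page=2', {'page': '1'}))
```

Both the bare first-page URL and the explicit second-page URL come out in the same explicit format, which is what lets one URL Class cover both.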
"},{"location":"downloader_url_classes.html#next_gallery_page_prediction","title":"can we predict the next gallery page?","text":"Now we can harmonise gallery urls to a single format, we can predict the next gallery page! If, say, the third path component or 'page' parameter is always a number referring to page, you can select this under the 'next gallery page' section and set the delta to change it by. The 'next gallery page url' section will be automatically filled in. This value will be consulted if the parser cannot find a 'next gallery page url' from the page content.
It is neat to set this up, but I only recommend it if you actually cannot reliably parse a next gallery page url from the HTML later in the process. It is neater to have searches stop naturally because the parser said 'no more gallery pages' than to have hydrus always one page beyond and end every single search on an uglier 'No results found' or 404 result.
Unfortunately, some sites will either not produce an easily parsable next page link or randomly just not include it due to some issue on their end (Gelbooru is a funny example of this). Also, APIs will often have a kind of 'start=200&num=50', 'start=250&num=50' progression but not include that state in the XML or JSON they return. These cases require the automatic next gallery page rules (check out Artstation and tumblr api gallery page URL Classes in the defaults for examples of this).
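The prediction itself is simple arithmetic on one component. Here is a rough sketch for the parameter case, assuming the URL has already been normalised so the page parameter is always present. The function name and the example search are made up.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def next_gallery_page(url, param_name, delta):
    # hypothetical helper: bump one numeric parameter by the given delta
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    params[param_name] = str(int(params[param_name]) + delta)
    query = urlencode(sorted(params.items()))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ''))

first = 'https://safebooru.org/index.php?page=post&pid=0&s=list&tags=skirt'
print(next_gallery_page(first, 'pid', 42))
```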
"},{"location":"downloader_url_classes.html#api_links","title":"how do we link to APIs?","text":"If you know that a URL has an API backend, you can tell the client to use that API URL when it fetches data. The API URL needs its own URL Class.
To define the relationship, click the \"String Converter\" button, which gives you this:
You may have seen this panel elsewhere. It lets you convert a string to another over a number of transformation steps. The steps can be as simple as adding or removing some characters or applying a full regex substitution. For API URLs, you are mostly looking to isolate some unique identifying data (\"m/thread/16086187\" in this case) and then substitute that into the new API path. It is worth testing this with several different examples!
When the client links regular URLs to API URLs like this, it will still associate the human-pretty regular URL when it needs to display to the user and record 'known urls' and so on. The API is just a quick lookup when it actually fetches and parses the respective data.
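The conversion amounts to isolating the identifying part and pasting it into a new path. A hedged sketch with a regex follows. The boards.example.com and api.example.com addresses and the '.json' suffix are invented for illustration, so a real site will need its own pattern.

```python
import re

def to_api_url(url):
    # hypothetical site layout: <board>/thread/<number> identifies the thread
    match = re.search('/([a-z0-9]+/thread/[0-9]+)', url)
    if match is None:
        return None
    # substitute the isolated identifier into the (made-up) API path
    return 'https://api.example.com/' + match.group(1) + '.json'

print(to_api_url('https://boards.example.com/m/thread/16086187'))
```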
"},{"location":"duplicates.html","title":"duplicates","text":"As files are shared on the internet, they are often resized, cropped, converted to a different format, altered by the original or a new artist, or turned into a template and reinterpreted over and over and over. Even if you have a very restrictive importing workflow, your client is almost certainly going to get some duplicates. Some will be interesting alternate versions that you want to keep, and others will be thumbnails and other low-quality garbage you accidentally imported and would rather delete. Along the way, it would be nice to merge your ratings and tags to the better files so you don't lose any work.
Finding and processing duplicates within a large collection is impossible to do by hand, so I have written a system to do the heavy lifting for you. It currently works on still images, but an extension for gifs and video is planned.
Hydrus finds potential duplicates using a search algorithm that compares images by their shape. Once these pairs of potentials are found, they are presented to you through a filter like the archive/delete filter to determine their exact relationship and if you want to make a further action, such as deleting the 'worse' file of a pair. All of your decisions build up in the database to form logically consistent groups of duplicates and 'alternate' relationships that can be used to infer future information. For instance, if you say that file A is a duplicate of B and B is a duplicate of C, A and C are automatically recognised as duplicates as well.
This all starts on--
"},{"location":"duplicates.html#duplicates_page","title":"the duplicates processing page","text":"On the normal 'new page' selection window, hit special->duplicates processing. This will open this page:
Let's go to the preparation page first:
The 'similar shape' algorithm works on distance. Two files with 0 distance are likely exact matches, such as resizes of the same file or lower/higher quality jpegs, whereas those with distance 4 tend to be hairstyle or costume changes. You will start on distance 0 and should not expect to ever go above 4 or 8 or so. Going too high increases the danger of being overwhelmed by false positives.
If you are interested, the current version of this system uses a 64-bit phash to represent the image shape and a VPTree to search different files' phashes' relative hamming distance. I expect to extend it in future with multiple phash generation (flips, rotations, and 'interesting' image crops and video frames) and most-common colour comparisons.
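For the curious, the hamming distance between two 64-bit phashes is just the number of bits that differ. The sketch below treats phashes as plain Python integers and compares every pair by brute force. A VPTree exists precisely to avoid that all-pairs comparison, so this only illustrates the distance, not the real search. The function names and sample values are made up.

```python
from itertools import combinations

def hamming_distance(phash_a, phash_b):
    # XOR leaves a 1 bit wherever the two hashes differ; count those bits
    return bin(phash_a ^ phash_b).count('1')

def potential_pairs(phashes, max_distance):
    # phashes: hypothetical dict of file name -> 64-bit integer phash
    return [
        (a, b)
        for (a, pa), (b, pb) in combinations(phashes.items(), 2)
        if hamming_distance(pa, pb) <= max_distance
    ]

sample = {'a': 0b1111, 'b': 0b1110, 'c': 0b0000}
print(potential_pairs(sample, 1))
```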
Searching for duplicates is fairly fast per file, but with a large client with hundreds of thousands of files, the total CPU time adds up. You can do a little manual searching if you like, but once you are all settled here, I recommend you hit the cog icon on the preparation page and let hydrus do this page's catch-up search work in your regular maintenance time. It'll swiftly catch up and keep you up to date without you even thinking about it.
Start searching on the 'exact match' search distance of 0. It is generally easier and more valuable to get exact duplicates out of the way first.
Once you have some files searched, you should see a potential pair count appear in the 'filtering' page.
"},{"location":"duplicates.html#duplicate_filtering_page","title":"the filtering page","text":"Processing duplicates can be real trudge-work if you do not set up a workflow you enjoy. It is a little slower than the archive/delete filter, and sometimes takes a bit more cognitive work. For many users, it is a good task to do while listening to a podcast or having a video going on another screen.
If you have a client with tens of thousands of files, you will likely have thousands of potential pairs. This can be intimidating, but do not worry--due to the A, B, C logical inferences as above, you will not have to go through every single one. The more information you put into the system, the faster the number will drop.
The filter has a regular file search interface attached. As you can see, it defaults to system:everything, but you can limit what files you will be working on simply by adding new search predicates. You might like to only work on files in your archive (i.e. that you know you care about to begin with), for instance. You can choose whether both files of the pair should match the search, or just one. 'creator:' tags work very well at cutting the search domain to something more manageable and consistent--try your favourite creator!
If you would like an example from the current search domain, hit the 'show some random potential pairs' button, and it will show two or more files that seem related. It is often interesting and surprising to see what it finds! The action buttons below allow for quick processing of these pairs and groups when convenient (particularly for large cg sets with 100+ alternates), but I recommend you leave these alone until you know the system better.
When you are ready, launch the filter.
"},{"location":"duplicates.html#duplicates_filter","title":"the duplicates filter","text":"We have not set up your duplicate 'merge' options yet, so do not get too into this. For this first time, just poke around, make some pretend choices, and then cancel out and choose to forget them.
Like the archive/delete filter, this uses quick mouse-clicks, keyboard shortcuts, or button clicks to action pairs. It presents two files at a time, labelled A and B, which you can quickly switch between just as in the normal media viewer. As soon as you action them, the next pair is shown. The two files will have their current zoom-size locked so they stay the same size (and in the same position) as you switch between them. Scroll your mouse wheel a couple of times and see if any obvious differences stand out.
Please note the hydrus media viewer does not currently work well with large resolutions at high zoom (it gets laggy and may have memory issues). Don't zoom in to 1600% and try to look at jpeg artifact differences on very large files, as this is simply not well supported yet.
The hover window on the right also presents a number of 'comparison statements' to help you make your decision. Green statements mean this current file is probably 'better', and red the opposite. Larger, older, higher-quality, more-tagged files are generally considered better. These statements have scores associated with them (which you can edit in file->options->duplicates), and the file of the pair with the highest score is presented first. If the files are duplicates, you can generally assume the first file you see, the 'A', is the better, particularly if there are several green statements.
The filter will need to occasionally checkpoint, saving the decisions so far to the database, before it can fetch the next batch. This allows it to apply inferred information from your current batch and reduce your pending count faster before serving up the next set. It will present you with a quick interstitial 'confirm/back' dialog just to let you know. This happens more often as the potential count decreases.
"},{"location":"duplicates.html#duplicates_decisions","title":"the decisions to make","text":"There are three ways a file can be related to another in the current duplicates system: duplicates, alternates, or false positive (not related).
False positive (not related) is the easiest. You will not see completely unrelated pairs presented very often in the filter, particularly at low search distances, but if the shape of face and hair and clothing happen to line up (or geometric shapes, often), the search system may make a false positive match. In this case, just click 'they are not related'.
Alternate relations are files that are not duplicates but obviously related in some way. Perhaps a costume change or a recolour. Hydrus does not have rich alternate support yet (but it is planned, and highly requested), so this relationship is mostly a 'holding area' for files that we will revisit for further processing in the future.
Duplicate files are of the exact same thing. They may be different resolutions, file formats, encoding quality, or one might even have a watermark, but they are fundamentally different views on the exact same art. As you can see with the buttons, you can select one file as the 'better' or say they are about the same. If the files are basically the same, there is no point stressing about which is 0.2% better--just click 'they are the same'. For better/worse pairs, you might have reason to keep both, but most of the time I recommend you delete the worse.
You can customise the shortcuts under file->shortcuts->duplicate_filter. The defaults are:
Left-click or space: this is better, delete the other.
Right-click: they are related alternates.
Middle-click: Go back one decision.
Enter/Escape: Stop filtering.
If two duplicates have different metadata like tags or archive status, you probably want to merge them. Cancel out of the filter and click the 'edit default duplicate metadata merge options' button:
By default, these options are fairly empty. You will have to set up what you want based on your services and preferences. Setting a simple 'copy all tags' is generally a good idea, and like/dislike ratings also often make sense. The settings for better and same quality should probably be similar, but it depends on your situation.
If you choose the 'custom action' in the duplicate filter, you will be presented with a fresh 'edit duplicate merge options' panel for the action you select and can customise the merge specifically for that choice. ('favourite' options will come here in the future!)
Once you are all set up here, you can dive into the duplicate filter. Please let me know how you get on with it!
"},{"location":"duplicates.html#future","title":"what now?","text":"The duplicate system is still incomplete. Now the db side is solid, the UI needs to catch up. Future versions will show duplicate information on thumbnails and the media viewer and allow quick-navigation to a file's duplicates and alternates.
For now, if you wish to see a file's duplicates, right-click it and select file relationships. You can review all its current duplicates, open them in a new page, appoint the new 'best file' of a duplicate group, and even mass-action selections of thumbnails.
You can also search for files based on the number of file relations they have (including when setting the search domain of the duplicate filter!) using system:file relationships. You can also search for best/not best files of groups, which makes it easy, for instance, to find all the spare duplicate files if you decide you no longer want to keep them.
I expect future versions of the system to also auto-resolve easy duplicate pairs, such as clearing out pixel-for-pixel png versions of jpgs.
"},{"location":"duplicates.html#game_cgs","title":"game cgs","text":"If you import a lot of game CGs, which frequently have dozens or hundreds of alternates, I recommend you set them as alternates by selecting them all and setting the status through the thumbnail right-click menu. The duplicate filter, being limited to pairs, needs to compare all new members of an alternate group to all other members once to verify they are not duplicates. This is not a big deal for alternates with three or four members, but game CGs provide an overwhelming edge case. Setting a group of thumbnails as alternate 'fixes' their alternate status immediately, discounting the possibility of any internal duplicates, and provides an easy way out of this situation.
"},{"location":"duplicates.html#duplicates_examples","title":"more information and examples","text":""},{"location":"duplicates.html#duplicates_examples_better_worse","title":"better/worse","text":"Which of two files is better? Here are some common reasons:
However these are not hard rules--sometimes a file has a larger resolution or filesize due to a bad upscaling or encoding decision by the person who 'reinterpreted' it. You really have to look at it and decide for yourself.
Here is a good example of a better/worse pair:
The first image is better because it is a png (pixel-perfect pngs are always better than jpgs for screenshots of applications--note how obvious the jpg's encoding artifacts are on the flat colour background) and it has a slightly higher (original) resolution, making it less blurry. I presume the second went through some FunnyJunk-tier trash meme site to get automatically cropped to 960px height and converted to the significantly smaller jpeg. Whatever happened, let's drop the second and keep the first.
When both files are jpgs, differences in quality are very common and often significant:
Again, this is mostly due to some online service resizing and lowering quality to ease on their bandwidth costs. There is usually no reason to keep the lower quality version.
"},{"location":"duplicates.html#duplicates_examples_same","title":"same quality duplicates","text":"When are two files the same quality? A good rule of thumb is if you scroll between them and see no obvious differences, and the comparison statements do not suggest anything significant, just set them as same quality.
Here are two same quality duplicates:
There is no obvious difference between those two. The filesize is significantly different, so I suspect the smaller is a lossless png optimisation, but in the grand scheme of things, that doesn't matter so much. Many of the big content providers--Facebook, Google, Cloudflare--automatically 'optimise' the data that goes through their networks in order to save bandwidth. Although jpegs are often a slaughterhouse, with pngs it is usually harmless.
Given the filesize, you might decide that these are actually a better/worse pair--but if the larger image had tags and was the 'canonical' version on most boorus, the decision might not be so clear. You can choose better/worse and delete one randomly, but sometimes you may just want to keep both without a firm decision on which is best, so just set 'same quality' and move on. Your time is more valuable than a few dozen KB.
Sometimes, you will see pixel-for-pixel duplicate jpegs of very slightly different size, such as 787KB vs 779KB. The smaller of these is usually an exact duplicate that has had its internal metadata (e.g. EXIF tags) stripped by a program or website CDN. They are same quality unless you have a strong opinion on whether having internal metadata in a file is useful.
"},{"location":"duplicates.html#duplicates_examples_alternates","title":"alternates","text":"As I wrote above, hydrus's alternates system is not yet properly ready. It is important to have a basic 'alternates' relationship for now, but it is a holding area until we have a workflow to apply 'WIP'- or 'recolour'-type labels and present that information nicely in the media viewer.
Alternates are not of exactly the same thing, but one is a variant of the other or they are both descended from a common original. The precise definition is up to you, but it generally means something like:
Here are some recolours of the same image:
And some WIP:
And a costume change:
None of these are duplicates, but they are obviously related. The duplicate search will notice they are similar, so we should let the client know they are 'alternate'.
Here's a subtler case:
These two files are very similar, but try opening both in separate tabs and then flicking back and forth: the second's glove-string is further into the mouth and has improved chin shading, a more refined eye shape, and shaved pubic hair. It is simple to spot these differences in the client's duplicate filter when you scroll back and forth.
I believe the second is an improvement on the first by the same artist, so it is a WIP alternate. You might also consider it a 'better' improvement.
Here are three files you might or might not consider to be alternates:
These are all based on the same template--which is why the dupe filter found them--but they are not so closely related as those above, and the last one is joking about a different ideology entirely and might deserve to be in its own group. Ultimately, you might prefer just to give them some shared tag and consider them not alternates per se.
"},{"location":"duplicates.html#duplicates_examples_false_positive","title":"not related/false positive","text":"Here are two files that match false positively:
Despite their similar shape, they are neither duplicates nor of even the same topic. The only commonality is the medium. I would not consider them close enough to be alternates--just adding something like 'screenshot' and 'imageboard' as tags to both is probably the closest connection they have.
Recording the 'false positive' relationship is important to make sure the comparison does not come up again in the duplicate filter.
The incidence of false positives increases as you broaden the search distance--the less precise your search, the less likely it is to be correct. At distance 14, these files all match, but uselessly:
"},{"location":"duplicates.html#duplicates_advanced","title":"the duplicates system","text":"(advanced nonsense, you can skip this section. tl;dr: duplicate file groups keep track of their best quality file, sometimes called the King)
Hydrus achieves duplicate transitivity by treating duplicate files as groups. Although you action pairs, if you set (A duplicate B), that creates a group (A,B). Subsequently setting (B duplicate C) extends the group to be (A,B,C), and so (A duplicate C) is transitively implied.
The first version of the duplicate system attempted to record better/worse/same information for all files in a virtual duplicate group, but this proved very complicated, workflow-heavy, and not particularly useful. The new system instead appoints a single King as the best file of a group. All other files in the group are beneath the King and have no other relationship data retained.
This King represents the group in the duplicate filter (and in potential pairs, which are actually recorded between duplicate media groups--even if most of them at the outset only have one member). If the other file in a pair is considered better, it becomes the new King, but if it is worse or equal, it merges into the other members. When two Kings are compared, whole groups can merge!
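The group-and-King idea can be sketched in a few lines. This is a simplified model, not the real database schema: files are plain strings, a decision is always 'better, worse', and the King of the better side stays King of the merged group. The class and method names are invented for this example.

```python
class DuplicateGroups:
    def __init__(self):
        self.group_of = {}   # file -> group id
        self.members = {}    # group id -> set of files
        self.king = {}       # group id -> best file of that group
        self._next_id = 0

    def _group(self, f):
        # a file seen for the first time starts as a one-member group, its own King
        if f not in self.group_of:
            gid = self._next_id
            self._next_id += 1
            self.group_of[f] = gid
            self.members[gid] = {f}
            self.king[gid] = f
        return self.group_of[f]

    def set_duplicate(self, better, worse):
        keep, drop = self._group(better), self._group(worse)
        if keep == drop:
            return
        # the whole worse group merges in beneath the better side's King
        for f in self.members.pop(drop):
            self.group_of[f] = keep
            self.members[keep].add(f)
        del self.king[drop]

    def are_duplicates(self, a, b):
        return a in self.group_of and b in self.group_of and self.group_of[a] == self.group_of[b]

groups = DuplicateGroups()
groups.set_duplicate('A', 'B')
groups.set_duplicate('B', 'C')
print(groups.are_duplicates('A', 'C'), groups.king[groups.group_of['C']])
```

Because membership is stored per group rather than per pair, (A duplicate C) never has to be recorded explicitly.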
Alternates are stored in a similar way, except the members are duplicate groups rather than individual files and they have no significant internal relationship metadata yet. If \u03b1, \u03b2, and \u03b3 are duplicate groups that each have one or more files, then setting (\u03b1 alt \u03b2) and (\u03b2 alt \u03b3) creates an alternate group (\u03b1,\u03b2,\u03b3), with the caveat that \u03b1 and \u03b3 will still be sent to the duplicate filter once just to check they are not duplicates by chance. The specific file members of these groups, A, B, C and so on, inherit the relationships of their parent groups when you right-click on their thumbnails.
False positive relationships are stored between pairs of alternate groups, so they apply transitively between all the files of either side's alternate group. If (\u03b1 alt \u03b2) and (\u03c8 alt \u03c9) and you apply (\u03b1 fp \u03c8), then (\u03b1 fp \u03c9), (\u03b2 fp \u03c8), and (\u03b2 fp \u03c9) are all transitively implied.
More examples"},{"location":"faq.html","title":"FAQ","text":""},{"location":"faq.html#repositories","title":"What is a repository?","text":"
A repository is a service in the hydrus network that stores a certain kind of information--files or tag mappings, for instance--as submitted by users all over the internet. Those users periodically synchronise with the repository so they know everything that it stores. Sometimes, like with tags, this means creating a complete local copy of everything on the repository. Hydrus network clients never send queries to repositories; they perform queries over their local cache of the repository's data, keeping everything confined to the same computer.
"},{"location":"faq.html#tags","title":"What is a tag?","text":"wiki
A tag is a small bit of text describing a single property of something. They make searching easy. Good examples are \"flower\" or \"nicolas cage\" or \"the sopranos\" or \"2003\". By combining several tags together ( e.g. [ 'tiger woods', 'sports illustrated', '2008' ] or [ 'cosplay', 'the legend of zelda' ] ), a huge image collection is reduced to a tiny and easy-to-digest sample.
A good word for the connection of a particular tag to a particular file is mapping.
Hydrus is designed with the intention that tags are for searching, not describing. Workflows and UI are tuned for finding files and other similar files (e.g. by the same artist), and while it is possible to have nice metadata overlays around files, this is not considered their chief purpose. Trying to have 'perfect' descriptions for files is often a rabbit-hole that can consume hours of work with relatively little demonstrable benefit.
All tags are automatically converted to lower case. 'Sunset Drive' becomes 'sunset drive'. Why?
Furthermore, leading and trailing whitespace is removed, and multiple whitespace is collapsed to a single character.
' yellow dress '
becomes
'yellow dress'
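These cleaning rules are easy to sketch in Python. This covers only the lower-casing and whitespace rules described here, and the function name clean_tag is made up. The client may apply further rules not shown.

```python
def clean_tag(tag):
    # lower-case, then split on any run of whitespace and rejoin with single spaces;
    # split() with no arguments also drops leading and trailing whitespace
    return ' '.join(tag.lower().split())

print(repr(clean_tag('  yellow   dress ')))
print(repr(clean_tag('Sunset Drive')))
```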
"},{"location":"faq.html#namespaces","title":"What is a namespace?","text":"A namespace is a category that in hydrus prefixes a tag. An example is 'person' in the tag 'person:ron paul'--it lets people and software know that 'ron paul' is a name. You can create any namespace you like; just type one or more words and then a colon, and then the next string of text will have that namespace.
The hydrus client gives namespaces different colours so you can pick out important tags more easily in a large list, and you can also search by a particular namespace, even creating complicated predicates like 'give all files that do not have any character tags', for instance.
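Splitting a tag into its namespace and the rest is a one-liner on the first colon. A minimal sketch follows. The name split_tag is hypothetical, and how the client treats edge cases such as a leading colon is not covered here.

```python
def split_tag(tag):
    # everything before the first colon is the namespace, the rest is the tag text
    namespace, sep, subtag = tag.partition(':')
    if not sep:
        # no colon at all: an unnamespaced tag
        return ('', tag)
    return (namespace, subtag)

print(split_tag('person:ron paul'))
print(split_tag('flower'))
```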
"},{"location":"faq.html#filenames","title":"Why not use filenames and folders?","text":"As a retrieval method, filenames and folders are less and less useful as the number of files increases. Why?
A filename is often--for ridiculous reasons--limited to a certain prohibitive character set. Even when utf-8 is supported, some arbitrary ascii characters are usually not, and different localisations, operating systems and formatting conventions only make it worse.
Folders can offer context, but they are clunky and time-consuming to change. If you put each chapter of a comic in a different folder, for instance, reading several volumes in one sitting can be a pain. Nesting many folders adds navigation-latency and tends to induce less informative \"04.jpg\"-type filenames.
So, the client tracks files by their hash. This technical identifier easily eliminates duplicates and permits the database to robustly attach other metadata like tags and ratings and known urls and notes and everything else, even across multiple clients and even if a file is deleted and later imported.
As a general rule, I suggest you not set up hydrus to parse and display all your imported files' filenames as tags. 'image.jpg' is useless as a tag. Shed the concept of filenames as you would chains.
"},{"location":"faq.html#external_files","title":"Can the client manage files from their original locations?","text":"When the client imports a file, it makes a quickly accessible but human-ugly copy in its internal database, by default under install_dir/db/client_files. When it needs to access that file again, it always knows where it is, and it can be confident it is what it expects it to be. It never accesses the original again.
This storage method is not always convenient, particularly for those who are hesitant about converting to using hydrus completely and also do not want to maintain two large copies of their collections. The question comes up--\"can hydrus track files from their original locations, without having to copy them into the db?\"
The technical answer is, \"This support could be added,\" but I have decided not to, mainly because:
It is not unusual for new users who ask for this feature to find their feelings change after getting more experience with the software. If desired, path text can be preserved as tags using regexes during import, and getting into the swing of searching by metadata rather than navigating folders often shows how very effective the former is over the latter. Most users eventually import most or all of their collection into hydrus permanently, deleting their old folder structure as they go.
For this reason, if you are hesitant about doing things the hydrus way, I advise you try running it on a smaller subset of your collection, say 5,000 files, leaving the original copies completely intact. After a month or two, think about how often you used hydrus to look at the files versus navigating through folders. If you barely used the folders, you probably do not need them any more, but if you used them a lot, then hydrus might not be for you, or it might only be for some sorts of files in your collection.
"},{"location":"faq.html#sqlite","title":"Why use SQLite?","text":"Hydrus uses SQLite for its database engine. Some users who have experience with other engines such as MySQL or PostgreSQL sometimes suggest them as alternatives. SQLite serves hydrus's needs well, and at the moment, there are no plans to change.
Since this question has come up frequently, a user has written an excellent document talking about the reasons to stick with SQLite. If you are interested in this subject, please check it out here:
https://gitgud.io/prkc/hydrus-why-sqlite/blob/master/README.md
"},{"location":"faq.html#hashes","title":"What is a hash?","text":"wiki
Hashes are a subject you usually have to be a software engineer to find interesting. The simple answer is that they are unique names for things. Hashes make excellent identifiers inside software, as you can safely assume that f099b5823f4e36a4bd6562812582f60e49e818cf445902b504b5533c6a5dad94 refers to one particular file and no other. In the client's normal operation, you will never encounter a file's hash. If you want to see a thumbnail bigger, double-click it; the software handles the mathematics.
For those who are interested: hydrus uses SHA-256, which spits out 32-byte (256-bit) hashes. The software stores the hash densely, as 32 bytes, only encoding it to 64 hex characters when the user views it or copies to clipboard. SHA-256 is not perfect, but it is a great compromise candidate; it is secure for now, it is reasonably fast, it is available for most programming languages, and newer CPUs perform it more efficiently all the time.
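If you are curious what this looks like in practice, the following Python sketch uses the standard library's hashlib to hash some bytes with SHA-256 and shows the 32-byte and 64-hex-character forms described above. The helper name `sha256_of_bytes` is made up for this example, and the sample bytes stand in for a real file's content.

```python
import hashlib


def sha256_of_bytes(data):
    # digest() returns the dense 32-byte form of the hash.
    return hashlib.sha256(data).digest()


digest = sha256_of_bytes(b'example file content')

print(len(digest))        # number of raw bytes
print(len(digest.hex()))  # number of hex characters when shown to a user

# Identical content always produces the same hash, which is what makes
# a hash usable for spotting duplicate files.
print(sha256_of_bytes(b'example file content') == digest)
```

The dense form is 32 bytes long, and the hex form is 64 characters long.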
"},{"location":"faq.html#access_keys","title":"What is an access key?","text":"The hydrus network's repositories do not use username/password, but instead a single strong identifier-password like this:
7ce4dbf18f7af8b420ee942bae42030aab344e91dc0e839260fcd71a4c9879e3
These hex numbers give you access to a particular account on a particular repository, and are often combined like so:
7ce4dbf18f7af8b420ee942bae42030aab344e91dc0e839260fcd71a4c9879e3@hostname.com:45871
They are long enough to be impossible to guess, and also randomly generated, so they reveal nothing personally identifying about you. Many people can use the same access key (and hence the same account) on a repository without consequence, although they will have to share any bandwidth limits, and if one person screws around and gets the account banned, everyone will lose access.
The access key is the account. Do not give it to anyone you do not want to have access to the account. An administrator will never need it; instead they will want your account id.
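The combined key@host:port form shown above is easy to take apart. This Python sketch is only an illustration: the function name `parse_access_key` is hypothetical, and the 32-byte length check is an assumption based on the 64-character example key, not a statement about how the client validates keys.

```python
def parse_access_key(combined):
    # Everything before the '@' is the hex access key.
    key_hex, _, address = combined.partition('@')
    # The port is whatever follows the last colon in the address.
    host, _, port_text = address.rpartition(':')
    key_bytes = bytes.fromhex(key_hex)  # raises ValueError if not valid hex
    if len(key_bytes) != 32:
        # Assumption: keys are 32 bytes, like the example in the text.
        raise ValueError('expected a 64-character hex access key')
    return key_bytes, host, int(port_text)


key, host, port = parse_access_key(
    '7ce4dbf18f7af8b420ee942bae42030aab344e91dc0e839260fcd71a4c9879e3'
    '@hostname.com:45871'
)
print(host, port)
```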
"},{"location":"faq.html#account_ids","title":"What is an account id?","text":"This is another long string of random hexadecimal that identifies your account without giving away access. If you need to identify yourself to a repository administrator (say, to get your account's permissions modified), you will need to tell them your account id. You can copy it to your clipboard in services->review services.
"},{"location":"faq.html#service_isolation","title":"Why does the file I deleted and then re-imported still have its tags?","text":"Hydrus splits its different abilities and domains (e.g. the list of files on your disk, or the tag mappings in 'my tags', or your files' notes) into separate services. You can see these in review services and manage services. Although the services of the same type may interact (e.g. deleting a file from one service might send that file to the 'trash' service, or adding tag parents to one tag service might implicate tags on another), those of different types are generally completely independent. Your tags don't care where the files they map to are.
So, when you delete a file from 'my files', none of its tag mappings in 'my tags' change--they remain attached to the 'ghost' of the deleted file. Your notes, ratings, and known URLs are the same (the URLs are important, since they let the client skip URLs for files you previously deleted). If you re-import the file, it will have everything it did before, with only a couple of pertinent changes like, obviously, import time.
This is an important part of how the PTR works--when you sync with the PTR, your client downloads a couple billion mappings for files you do not have yet. Then, when you happen to import one of those files, it appears in your importer with its PTR tags 'apparently' already set--in truth, it always had them.
When you feel like playing with some more advanced concepts, turn on help->advanced mode and open a new search page. Change the file domain from 'my files' to 'all known files' or 'deleted from my files' and start typing a common tag--you'll get autocomplete results with counts! You can even run the search, and you'll get a ton of 'non-local' and therefore non-viewable files that are typically given a default hydrus thumbnail. These are files that your client is aware of, but does not currently have. You can run the manage x dialogs and edit the metadata of these ghost files just as you can your real ones. The only thing hydrus ever needs to attach metadata to a file is the file's SHA256 hash.
If you really want to delete the tags or other data for some files you deleted, then:
Ctrl+A->manage tags
and manually delete the tags there.
Not really. Unless your situation involves millions of richly locally tagged files and a gigantic deleted:kept file ratio, don't worry about it.
"},{"location":"faq.html#does_the_metadata_for_files_i_deleted_mean_there_is_some_kind_of_a_permanent_record_of_which_files_my_client_has_heard_about_andor_seen_directly_even_if_i_purge_the_deletion_record","title":"Does the metadata for files I deleted mean there is some kind of a permanent record of which files my client has heard about and/or seen directly, even if I purge the deletion record?","text":"Yes. I am working on updating the database infrastructure to allow a full purge, but the structure is complicated, so it will take some time. If you are afraid of someone stealing your hard drive and matriculating your sordid MLP collection (or, in this case, the historical log of horrors that you rejected), do some research into drive encryption. Hydrus runs fine off an encrypted disk.
"},{"location":"faq.html#encryption","title":"Does Hydrus run ok off an encrypted drive partition?","text":"Yes! Both the database and your files should be fine on any of the popular software solutions. These programs give your OS a virtual drive that on my end looks and operates like any other. I have yet to encounter one that SQLite has a problem with. Make sure you don't have auto-dismount set--or at least be hawkish that it will never trigger while hydrus is running--or you could damage your database.
Drive encryption is a good idea for all your private things. If someone steals your laptop or USB stick, it means you only have to deal with frustration and replacement expenses (rather than also a nightmare of anxiety and identity-loss as some bad guy combs through all your things).
If you don't know how drive encryption works, search it up and have a play with a spare USB stick or a small 256MB file partition. Veracrypt is a popular and easy program, but there are several solutions. Get some practice and take it seriously, since if you act foolishly you can really screw yourself (e.g. locking yourself out of the only copy of data you have left because you forgot the password). Make sure you have a good plan, reliable (encrypted) backups, and a password manager.
"},{"location":"faq.html#delays","title":"Why can my friend not see what I just uploaded?","text":"The repositories do not work like conventional search engines; it takes a short but predictable while for changes to propagate to other users.
The client's searches only ever happen over its local cache of what is on the repository. Any changes you make will be delayed for others until their next update occurs. At the moment, the update period is 100,000 seconds, which is about 1 day and 4 hours.
"},{"location":"filetypes.html","title":"Supported Filetypes","text":"This is a list of all filetypes Hydrus can import. Hydrus determines the filetype based on examining the file itself rather than the extension or MIME type.
"},{"location":"filetypes.html#images","title":"Images","text":"Filetype | Extension | MIME type | Thumbnails | Viewable in Hydrus | Notes
jpeg | .jpeg | image/jpeg | \u2705 | \u2705 |
png | .png | image/png | \u2705 | \u2705 |
static gif | .gif | image/gif | \u2705 | \u2705 |
webp | .webp | image/webp | \u2705 | \u2705 | Animated webp files will display as static
tiff | .tiff | image/tiff | \u2705 | \u2705 |
qoi | .qoi | image/qoi | \u2705 | \u2705 | Quite OK Image Format
icon | .ico | image/x-icon | \u2705 | \u2705 |
bmp | .bmp | image/bmp | \u2705 | \u2705 |
heif | .heif | image/heif | \u2705 | \u2705 |
heic | .heic | image/heic | \u2705 | \u2705 |
avif | .avif | image/avif | \u2705 | \u2705 |
"},{"location":"filetypes.html#animations","title":"Animations","text":"Filetype | Extension | MIME type | Thumbnails | Viewable in Hydrus | Notes
animated gif | .gif | image/gif | \u2705 | \u2705 |
apng | .apng | image/apng | \u2705 | \u2705 |
heif sequence | .heifs | image/heif-sequence | \u2705 | \u2705 |
heic sequence | .heics | image/heic-sequence | \u2705 | \u2705 |
avif sequence | .avifs | image/avif-sequence | \u2705 | \u2705 |
"},{"location":"filetypes.html#video","title":"Video","text":"Filetype | Extension | MIME type | Thumbnails | Viewable in Hydrus | Notes
mp4 | .mp4 | video/mp4 | \u2705 | \u2705 |
webm | .webm | video/webm | \u2705 | \u2705 |
matroska | .mkv | video/x-matroska | \u2705 | \u2705 |
avi | .avi | video/x-msvideo | \u2705 | \u2705 |
flv | .flv | video/x-flv | \u2705 | \u2705 |
quicktime | .mov | video/quicktime | \u2705 | \u2705 |
mpeg | .mpeg | video/mpeg | \u2705 | \u2705 |
ogv | .ogv | video/ogg | \u2705 | \u2705 |
realvideo | .rm | video/vnd.rn-realvideo | \u2705 | \u2705 |
wmv | .wmv | video/x-ms-wmv | \u2705 | \u2705 |
"},{"location":"filetypes.html#audio","title":"Audio","text":"Filetype | Extension | MIME type | Viewable in Hydrus | Notes
mp3 | .mp3 | audio/mp3 | \u2705 |
ogg | .ogg | audio/ogg | \u2705 |
flac | .flac | audio/flac | \u2705 |
m4a | .m4a | audio/mp4 | \u2705 |
matroska audio | .mkv | audio/x-matroska | \u2705 |
mp4 audio | .mp4 | audio/mp4 | \u2705 |
realaudio | .ra | audio/vnd.rn-realaudio | \u2705 |
tta | .tta | audio/x-tta | \u2705 |
wave | .wav | audio/x-wav | \u2705 |
wavpack | .wv | audio/wavpack | \u2705 |
wma | .wma | audio/x-ms-wma | \u2705 |
"},{"location":"filetypes.html#applications","title":"Applications","text":"Filetype | Extension | MIME type | Thumbnails | Viewable in Hydrus | Notes
flash | .swf | application/x-shockwave-flash | \u2705 | \u274c |
pdf | .pdf | application/pdf | \u2705 | \u274c | 300 DPI assumed for resolution. No thumbnails for encrypted PDFs.
epub | .epub | application/epub+zip | \u274c | \u274c |
djvu | .djvu | image/vnd.djvu | \u274c | \u274c |
"},{"location":"filetypes.html#image_project_files","title":"Image Project Files","text":"Filetype | Extension | MIME type | Thumbnails | Viewable in Hydrus | Notes
psd | .psd | image/vnd.adobe.photoshop | \u2705 | \u2705 | Adobe Photoshop. Hydrus shows the embedded preview image if present in the file.
clip | .clip | application/clip [1] | \u2705 | \u274c | Clip Studio Paint
sai2 | .sai2 | application/sai2 [1] | \u274c | \u274c | PaintTool SAI2
krita | .kra | application/x-krita | \u2705 | \u2705 | Krita. Hydrus shows the embedded preview image if present in the file.
svg | .svg | image/svg+xml | \u2705 | \u274c |
xcf | .xcf | application/x-xcf | \u274c | \u274c | GIMP
procreate | .procreate | application/x-procreate [1] | \u2705 | \u274c | Procreate app
"},{"location":"filetypes.html#archives","title":"Archives","text":"Filetype | Extension | MIME type | Notes
7z | .7z | application/x-7z-compressed |
gzip | .gz | application/gzip |
rar | .rar | application/vnd.rar |
zip | .zip | application/zip |
[1] This filetype doesn't have an official or de facto media type, the one listed was made up for Hydrus.
This page serves as a checklist or overview for the getting started part of Hydrus. It is recommended to read at least all of the getting started pages, but if you want to head to some specific section directly go ahead and do so.
"},{"location":"gettingStartedOverview.html#the_client","title":"The client","text":"Have a look at getting started with files to get an overview of the Hydrus client.
"},{"location":"gettingStartedOverview.html#local_files","title":"Local files","text":"If you already have many local files, either downloaded by hand or by some other downloader tool, head to the getting started importing section to begin importing them.
"},{"location":"gettingStartedOverview.html#downloading","title":"Downloading","text":"If you want to download with Hydrus, check out getting started with downloading. If you want to add the ability to download from sites not already available in Hydrus by default, check out adding new downloaders for how and a link to a user-maintained archive of downloaders.
"},{"location":"gettingStartedOverview.html#tags_and_ratings","title":"Tags and ratings","text":"If you have imported and/or downloaded some files and want to get started searching and tagging see searching and sorting and getting started with ratings.
It is also worth having a look at siblings for when you want to consolidate different tags that all mean the same thing, common misspellings, or preferential differences into one tag.
Parents are for when you want a tag to always add another tag. They are commonly used for characters, since you would usually want to add the series they are from too.
"},{"location":"gettingStartedOverview.html#duplicates","title":"Duplicates","text":"Have a lot of very similar looking pictures because of one reason or another? Have a look at duplicates, Hydrus' duplicates finder and filtering tool.
"},{"location":"gettingStartedOverview.html#api","title":"API","text":"Hydrus has an API that lets external tools connect to it. See API for how to turn it on and a list of some of these tools.
"},{"location":"getting_started_downloading.html","title":"Getting started with downloading","text":"The hydrus client has a sophisticated and completely user-customisable download system. It can pull from any booru or regular gallery site or imageboard, and also from some special examples like twitter and tumblr. From a single file or URL to massive imports, the downloader can handle it all. A fresh install will by default have support for the bigger sites, but it is possible, with some work, for any user to create a new shareable downloader for a new site.
The downloader is highly parallelisable, and while the default bandwidth rules should stop you from running too hot and downloading so much at once that you annoy the servers you are downloading from, there are no brakes in the program on what you can get.
Danger
It is very important that you take this slow. Many users get overexcited with their new ability to download 500,000 files and then do so, only discovering later that 98% of what they got was junk that they now have to wade through. Figure out what workflows work for you, how fast you process files, what content you actually want, how much bandwidth and hard drive space you have, and prioritise and throttle your incoming downloads to match. If you can realistically only archive/delete filter 50 files a day, there is little benefit to downloading 500 new files a day. START SLOW.
It also takes a decent whack of CPU to import a file. You'll usually never notice this with just one hard drive import going, but if you have twenty different download queues all competing for database access and individual 0.1-second hits of heavy CPU work, you will discover your client starts to judder and lag. Keep it in mind, and you'll figure out what your computer is happy with. I also recommend you try to keep your total loaded files/urls under 20,000 to keep things snappy. Remember that you can pause your import queues if you need to calm things down a bit.
"},{"location":"getting_started_downloading.html#downloader_types","title":"Downloader types","text":"There are a number of different downloader types, each with its own purpose:
URL download: Intended for single posts or images. (Works with the API)
Gallery: For big download jobs such as an artist's catalogue or everything with a given tag on a booru.
Subscriptions: Repeated gallery jobs, for keeping up to date with an artist or tag. Use the gallery downloader to get everything and a subscription to keep updated.
Watcher: Imageboard thread downloader, for sites such as 4chan, 8chan, and whatever else exists. (Works with the API)
Simple downloader: Intended for simple one-off jobs like grabbing all linked images in a page.
"},{"location":"getting_started_downloading.html#url_download","title":"URL download","text":"The url downloader works like the gallery downloader but does not do searches. You can paste downloadable URLs to it, and it will work through them as one list. Dragging and dropping recognisable URLs onto the client (e.g. from your web browser) will also spawn and use this downloader.
The button next to the input field lets you paste multiple URLs at once, for example if you have copied them from a document or browser bookmarks. The URLs need to be newline-separated.
"},{"location":"getting_started_downloading.html#api","title":"API","text":"If you use API-connected programs such as the Hydrus Companion, then any non-watchable URLs sent to Hydrus through them will end up in a URL downloader page, the specifics depending on the program's settings. You can't use this to force Hydrus to download paged galleries, since the URL downloader page doesn't support traversing to the next page; use the gallery downloader for this.
"},{"location":"getting_started_downloading.html#gallery_download","title":"Gallery download","text":"The gallery page can download from multiple sources at the same time. Each entry in the list represents a basic combination of two things:
Source: The site you are getting from. Safebooru or Danbooru or Deviant Art or twitter or anywhere else. In the example image this is the button labelled 'artstation artist lookup'.
Query text: Something like 'contrapposto' or 'blonde_hair blue_eyes' or an artist name like 'incase'. Whatever is searched on the site to return a list of ordered media. In the example image this is the text field with 'artist username' in it.
So, when you want to start a new download, you first select the source with the button and then type in a query in the text box and hit enter. The download will soon start and fill in information, and thumbnails should stream in, just like the hard drive importer. The downloader typically works by walking through the search's gallery pages one by one, queueing up the found files for later download. There are several intentional delays built into the system, so do not worry if work seems to halt for a little while--you will get a feel for hydrus's 'slow persistent growth' style with experience.
Do a test download now, for fun! Pause its gallery search after a page or two, and then pause the file import queue after a dozen or so files come in.
The thumbnail panel can only show results from one queue at a time, so double-click on an entry to 'highlight' it, which will show its thumbs and also give more detailed info and controls in the 'highlighted query' panel. I encourage you to explore the highlight panel over time, as it can show and do quite a lot. Double-click again to 'clear' it.
It is a good idea to 'test' larger downloads, either by visiting the site itself for that query, or just waiting a bit and reviewing the first files that come in. Just make sure that you are getting what you thought you would, whether that be verifying that the query text is correct or that the site isn't only giving you bloated gifs or other bad quality files. The 'file limit', which stops the gallery search after the set number of files, is also great for limiting fishing expeditions (such as overbroad searches like 'wide_hips', which on the bigger boorus have 100k+ results and return variable quality). If the gallery search runs out of new files before the file limit is hit, the search will naturally stop (and the entry in the list should gain a \u23f9 'stop' symbol).
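The page-walking and 'file limit' behaviour described above can be pictured with a short Python sketch. Everything here is hypothetical: `walk_gallery` and the `fetch_page` callable are invented for illustration, there are no delays or network calls, and the real downloader is considerably more involved.

```python
def walk_gallery(fetch_page, file_limit=None):
    # fetch_page(page_index) is a hypothetical callable that returns the
    # list of file URLs found on that gallery page.
    queued = []
    seen = set()
    page_index = 0
    while True:
        found_new = False
        for url in fetch_page(page_index):
            if url in seen:
                continue
            if file_limit is not None and len(queued) >= file_limit:
                return queued  # the file limit stops the search early
            seen.add(url)
            queued.append(url)
            found_new = True
        if not found_new:
            return queued  # the search ran out of new files
        page_index += 1


pages = [['a', 'b'], ['c', 'd'], []]


def fake_fetch(page_index):
    # Stand-in for a real site: pages past the end are empty.
    return pages[page_index] if page_index < len(pages) else []


print(walk_gallery(fake_fetch))
print(walk_gallery(fake_fetch, file_limit=3))
```

Without a limit, the walk stops naturally at the first page with nothing new; with a limit of 3, it stops as soon as three files are queued.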
Note that some sites only serve 25 or 50 pages of results, despite their indices suggesting hundreds. If you notice that one site always bombs out at, say, 500 results, it may be due to a decision on their end. You can usually test this by visiting the pages hydrus tried in your web browser.
In general, particularly when starting out, artist searches are best. They are usually fewer than a thousand files and have fairly uniform quality throughout.
"},{"location":"getting_started_downloading.html#subscriptions","title":"Subscriptions","text":"Let's say you found an artist you like. You downloaded everything of theirs from some site, but every week, one or two new pieces are posted. You'd like to keep up with the new stuff, but you don't want to manually make a new download job every week for every single artist you like.
Subscriptions are a way to automatically recheck a good query in future, to keep up with new files. Many users come to use them. You set up a number of saved queries, and the client will 'sync' with the latest files in the gallery and download anything new, just as if you were running the download yourself.
Subscriptions only work for booru-like galleries that put the newest files first, and they only keep up with new content--once they have done their first sync, which usually gets the most recent hundred files or so, they will never reach further into the past. Getting older files, as you will see later, is a job best done with a normal download page.
Note
The entire subscription system assumes the source is a typical 'newest first' booru-style search. If you dick around with some order_by:rating/random metatag, it will not work reliably.
It is important to note that while subscriptions can have multiple queries (even hundreds!), they generally only work on one site. Expect to create one subscription for safebooru, one for artstation, one for paheal, and so on for every site you care about. Advanced users may be able to think of ways to get around this, but I recommend against it as it throws off some of the internal check timing calculations.
"},{"location":"getting_started_downloading.html#setting_up_subscriptions","title":"Setting up subscriptions","text":"Here's the dialog, which is under network->manage subscriptions:
This is a very simple example--there is only one subscription, for safebooru. It has two 'queries' (i.e. searches to keep up with).
Before we trip over the advanced buttons here, let's zoom in on the actual subscription:
Danger
Do not change the max number of new files options until you know exactly what they do and have a good reason to alter them!
This is a big and powerful panel! I recommend you open the screenshot up in a new browser tab, or in the actual client, so you can refer to it.
Despite all the controls, the basic idea is simple: Up top, I have selected the 'safebooru tag search' download source, and then I have added two artists--\"hong_soon-jae\" and \"houtengeki\". These two queries have their own panels for reviewing what URLs they have worked on and further customising their behaviour, but all they really are is little bits of search text. When the subscription runs, it will put the given search text into the given download source just as if you were running the regular downloader.
Warning
Subscription syncs are somewhat fragile. Do not try to play with the limits or checker options to download a whole 5,000 file query in one go--if you want everything for a query, run it in the manual downloader and get everything, then set up a normal sub for new stuff. There is no benefit to having a 'large' subscription, and it will trim itself down in time anyway.
You might want to put subscriptions off until you are more comfortable with galleries. There is more help here.
"},{"location":"getting_started_downloading.html#watchers","title":"Watchers","text":"If you are an imageboard user, try going to a thread you like and drag-and-drop its URL (straight from your web browser's address bar) onto the hydrus client. It should open up a new 'watcher' page and import the thread's files!
With only one URL to check, watchers are a little simpler than gallery searches, but as that page is likely receiving frequent updates, the watcher checks it over and over until the thread dies. By default, the watcher's 'checker options' will regulate how quickly it checks based on the speed at which new files are coming in--if a thread is fast, it will check frequently; if it is running slow, it may only check once per day. When a thread falls below a critical posting velocity or 404s, checking stops.
In general, you can leave the checker options alone, but you might like to revisit them if you are always visiting faster or slower boards and find you are missing files or getting DEAD too early.
"},{"location":"getting_started_downloading.html#api_1","title":"API","text":"If you use API-connected programs such as the Hydrus Companion, then any watchable URLs sent to Hydrus through them will end up in a watcher page, the specifics depending on the program's settings.
"},{"location":"getting_started_downloading.html#simple_downloader","title":"Simple downloader","text":"The simple downloader will do very simple parsing for unusual jobs. If you want to download all the images in a page, or all the image link destinations, this is the one to use. There are several default parsing rules to choose from, and if you learn the downloader system yourself, it will be easy to make more.
"},{"location":"getting_started_downloading.html#import_options","title":"Import options","text":"Every importer in Hydrus has some 'import options' that change what is allowed, what is blacklisted, and whether tags or notes should be saved.
In previous versions these were split into completely different windows called 'file import options' and 'tag import options', so if you see those names anywhere, this is what they are talking about, not some hidden menu.
Importers that download from websites rely on a flexible 'defaults' system, so you do not have to set them up every time you start a new downloader. While you should play around with your import options, once you know what works for you, you should set that as the default under network->downloaders->manage default import options. You can set them for all file posts generally, all watchers, and for specific sites as well.
"},{"location":"getting_started_downloading.html#file_import_options","title":"File import options","text":"This deals with the files being downloaded and what should happen to them. There's a few more tickboxes if you turn on advanced mode.
pre-import checks: Pretty self-explanatory for the most part. If you want to redownload previously deleted files, turning off 'exclude previously deleted files' will have Hydrus ignore deletion status. A few of the options have more information if you hover over them.
import destinations: See multiple file services, an advanced feature.
post import actions: See the files section on filtering for the first option; the other two have information if you hover over them.
"},{"location":"getting_started_downloading.html#tag_parsing","title":"Tag Parsing","text":"By default, hydrus now starts with a local tag service called 'downloader tags' and it will parse (get) all the tags from normal gallery sites and put them in this service. You don't have to do anything, you will get some decent tags. As you use the client, you will figure out which tags you like and where you want them. On the downloader page, click 'import options':
This is an important dialog, although you will not need to use it much. It governs which tags are parsed and where they go. To keep things easy to manage, a new downloader will refer to the 'default' tag import options for a website, but for now let's set some values just for this downloader:
You can see that each tag service on your client has a separate section. If you add the PTR, that will get a new box too. A new client is set to get all tags for 'downloader tags' service. Things can get much more complicated. Have a play around with the options here as you figure things out. Most of the controls have tooltips or longer explainers in sub-dialogs, so don't be afraid to try things.
It is easy to get tens of thousands of tags by downloading this way. Different sites offer different kinds and qualities of tags, and the client's downloaders (which were designed by me, the dev, or a user) may parse all or only some of them. Many users like to just get everything on offer, but others only ever want, say, 'creator', 'series', and 'character' tags. If you feel brave, click that 'all tags' button, which will take you into hydrus's advanced 'tag filter', which allows you to select which of the incoming list of tags will be added.
The blacklist button will let you skip downloading files that have certain tags (perhaps you would like to auto-skip all images with 'gore', 'scat', or 'diaper'?), again using the tag filter, while the whitelist enables you to only allow files that have at least one of a set of tags. The 'additional tags' adds some fixed personal tags to all files coming in--for instance, you might like to add 'process into favourites' to your 'my tags' for some query you really like so you can find those files again later and process them separately. That little 'cog' icon button can also do some advanced things.
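The blacklist and whitelist rules described above boil down to two set checks. This Python sketch only illustrates that logic: the function name `should_import` is made up, the tags are placeholders, and the real tag filter supports far more (namespaces, wildcards and so on).

```python
def should_import(file_tags, blacklist=(), whitelist=()):
    tags = set(file_tags)
    # Blacklist: skip the file if it has any unwanted tag.
    if tags & set(blacklist):
        return False
    # Whitelist: if one is set, the file needs at least one of its tags.
    if whitelist and not (tags & set(whitelist)):
        return False
    return True


print(should_import(['landscape', 'sunset'], blacklist=['gore']))
print(should_import(['landscape', 'gore'], blacklist=['gore']))
print(should_import(['landscape'], whitelist=['sunset', 'sunrise']))
```

The first file is allowed, the second is skipped by the blacklist, and the third is skipped because it has none of the whitelisted tags.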
Warning
The file limit and import options on the upper panel of a gallery or watcher page, if changed, will only apply to new queries. If you want to change the options for an existing queue, either do so on its highlight panel below or use the 'set options to queries' button.
"},{"location":"getting_started_downloading.html#note_parsing","title":"Note Parsing","text":"Hydrus also parses 'notes' from some sites. This is a young feature, and a little advanced at times, but it generally means the comments that artists leave on certain gallery sites, or something like a tweet text. Notes are editable by you and appear in a hovering window on the right side of the media viewer.
Most of the controls here ensure that successive parses do not duplicate existing notes. The default settings are fine for all normal purposes, and you can leave them alone unless you know you want something special (e.g. turning note parsing off completely).
"},{"location":"getting_started_downloading.html#bandwidth","title":"Bandwidth","text":"It will not be too long until you see a \"bandwidth free in xxxxx...\" message. As a long-term storage solution, hydrus is designed to be polite in its downloading--both to the source server and your computer. The client's default bandwidth rules have some caps to stop big mistakes, spread out larger jobs, and at a bare minimum, no domain will be hit more than once a second.
All the bandwidth rules are completely customisable and are found in network > data > review bandwidth usage and edit rules
. They can get quite complicated. I strongly recommend you not look for them until you have more experience. I especially strongly recommend you not ever turn them all off, thinking that will improve something, as you'll probably render the client too laggy to function and get yourself an IP ban from the next server you pull from.
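As a rough illustration of the 'no domain more than once a second' floor described above, the sketch below keeps one timestamp per domain and works out how long a new request should wait. It is not hydrus's actual bandwidth code: the class name DomainThrottle, its methods, and the one-second default are assumptions made for this example.

```python
import time


class DomainThrottle:
    # Hypothetical helper, not part of hydrus: enforces a minimum gap
    # between consecutive requests to the same domain.

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.next_allowed = {}  # domain -> earliest time the next request may start

    def reserve(self, domain):
        # Returns how many seconds the caller should wait before hitting
        # this domain, and books the slot so later callers queue behind it.
        now = self.clock()
        start = max(now, self.next_allowed.get(domain, now))
        self.next_allowed[domain] = start + self.min_interval
        return start - now

    def wait(self, domain):
        # Blocks until the domain may be hit again.
        delay = self.reserve(domain)
        if delay > 0:
            time.sleep(delay)
```

Calling wait('some.domain') before each request would space requests to that domain out, while requests to different domains do not delay one another.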
If you want to download 10,000 files, set up the queue and let it work. The client will take breaks, likely even to the next day, but it will get there in time. Many users like to leave their clients on all the time, just running in the background, which makes these sorts of downloads a breeze--you check back in the evening and discover your download queues, watchers, and subscriptions have given you another thousand things to deal with.
Again: the real problem with downloading is not finding new things, it is keeping up with what you get. Start slow and figure out what is important to your bandwidth budget, hard drive budget, and free time budget. Almost everyone fails at this.
"},{"location":"getting_started_downloading.html#logins","title":"Logins","text":"The client now supports a flexible (but slightly prototype and ugly) login system. It can handle simple sites and is as completely user-customisable as the downloader system. The client starts with multiple login scripts by default, which you can review under network->logins->manage logins:
Many sites grant all their content without you having to log in at all, but others require it for NSFW or special content, or you may wish to take advantage of site-side user preferences like personal blacklists. If you wish, you can give hydrus some login details here, and it will try to log in--just as a browser would--before it downloads anything from that domain.
Warning
For multiple reasons, I do not recommend you use important accounts with hydrus. Use a throwaway account you don't care much about.
To start using a login script, select the domain and click 'edit credentials'. You'll put in your username/password, and then 'activate' the login for the domain, and that should be it! The next time you try to get something from that site, the first request will wait (usually about ten seconds) while a login popup performs the login. Most logins last for about thirty days (and many refresh that 30-day timer every time you make a new request), so once you are set up, you usually never notice it again, especially if you have a subscription on the domain.
Most sites only have one way of logging in, but hydrus does support more. Hentai Foundry is a good example--by default, the client performs the 'click-through' login as a guest, which requires no credentials and means any hydrus client can get any content from the start. But this way of logging in only lasts about 60 minutes or so before having to be refreshed, and it does not hide any spicy stuff, so if you use HF a lot, I recommend you create a throwaway account, set the filters you like in your HF profile (e.g. no guro content), and then click the 'change login script' in the client to the proper username/pass login.
The login system is not very clever. Don't try to pull off anything too weird with it! If anything goes wrong, it will likely delay the script (and hence the whole domain) from working for a while, or invalidate it entirely. If the error is something simple, like a password typo or current server maintenance, go back to this dialog to fix and scrub the error and try again. If the site just changed its layout, you may need to update the login script. If it is more complicated, please contact me, hydrus_dev, with the details!
If you would like to log in to a site that is not yet supported by hydrus (usually ones with a Captcha in the login page), you have two options:
Boorus are usually easy to parse from, and there are many hydrus downloaders available that work well. Other sites are less easy to download from. Some will purposefully disguise access behind captchas or difficult login tokens that the hydrus downloader just isn't clever enough to handle. In these cases, it can be best just to go to an external downloader program that is specially tuned for these complex sites.
It takes a bit of time to set up these sorts of programs--and if you get into them, you'll likely want to make a script to help automate their use--but if you know they solve your problem, it is well worth it!
With these tools, used manually and/or with some scripts you set up, you may be able to set up a regular import workflow to hydrus (especially with an Import Folder
as under the file
menu) and get most of what you would with an internal downloader. Some things like known URLs and tag parsing may be limited or non-existent, but it is better than nothing, and if you only need to do it for a couple sources on a couple sites every month, you can fill in most of the gap manually yourself.
Hydev is planning to roll yt-dlp and gallery-dl support into the program natively in a future update of the downloader engine.
"},{"location":"getting_started_files.html","title":"Getting started with files","text":"Warning
Hydrus can be powerful, and you control everything. By default, you are not connected to any servers and absolutely nothing is shared with other users--and you can't accidentally one-click your way to exposing your whole collection--but if you tag private files with real names and click to upload that data to a tag repository that other people have access to, the program won't try to stop you. If you want to do private sexy slideshows of your shy wife, that's great, but think twice before you upload files or tags anywhere, particularly as you learn. It is impossible to contain leaks of private information.
There are no limits and few brakes on your behaviour. It is possible to import millions of files. For many new users, their first mistake is downloading too much too fast in overexcitement and becoming overwhelmed. Take things slow and figure out good processing workflows that work for your schedule before you start adding 500 subscriptions.
"},{"location":"getting_started_files.html#the_problem","title":"The problem","text":"If you have ever seen something like this--
--then you already know the problem: using a filesystem to manage a lot of images sucks.
Finding the right picture quickly can be difficult. Finding everything by a particular artist at a particular resolution is unthinkable. Integrating new files into the whole nested-folder mess is a further pain, and most operating systems bug out when displaying 10,000+ thumbnails.
"},{"location":"getting_started_files.html#the_client","title":"The client","text":"Let's first focus on importing files.
When you first boot the client, you will see a blank page. There are no files in the database and so there is nothing to search. To get started, I suggest you simply drag-and-drop a folder with a hundred or so images onto the main window. A dialog will appear affirming what you want to import. Ok that, and a new page will open. Thumbnails will stream in as the software processes each file.
The files are being imported into the client's database. The client discards their filenames.
Notice your original folder and its files are untouched. You can move the originals somewhere else, delete them, and the client will still return searches fine. In the same way, you can delete from the client, and the original files will remain unchanged--import is a copy, not a move, operation. The client performs all its operations on its internal database, which holds copies of the files it imports. If you find yourself enjoying using the client and decide to completely switch over, you can delete the original files you import without worry. You can always export them back again later.
FAQ: can the client manage files from their original locations?
Now:
Move your mouse to the top-left, top-middle and top-right of the media viewer. You should see some 'hover' panels pop into place.
The one on the left is for tags, the middle is for browsing and zoom commands, and the right is for status and ratings icons. You will learn more about these things as you get more experience with the program.
Press Enter or double/middle-click again to close the media viewer.
On the left of a normal search page is a text box. When it is focused, a dropdown window appears. It looks like this:
This is where you enter the predicates that define the current search. If the text box is empty, the dropdown will show 'system' tags that let you search by file metadata such as file size or animation duration. To select one, press the up or down arrow keys and then enter, or double click with the mouse.
When you have some tags in your database, typing in the text box will search them:
The (number) shows how many files have that tag, and hence how large the search result will be if you select that tag.
Clicking 'searching immediately' will pause the searcher, letting you add several tags in a row without sending it off to get results immediately. Ignore the other buttons for now--you will figure them out as you gain experience with the program.
You can remove from the list of 'active tags' in the box above with a double-click, or by entering the exact same tag again through the dropdown.
Hydrus supports many filetypes. A full list can be viewed on the Supported Filetypes page.
Support for some of the more complicated filetypes is imperfect, however. For the Windows and Linux built releases, hydrus now embeds an MPV player for video, audio and gifs, which provides smooth playback and audio, but some other environments may not support MPV and so will default when possible to the native hydrus software renderer, which does not support audio. When something does not render how you want, right-clicking on its thumbnail presents the option 'open externally', which will open the file in the appropriate default program (e.g. ACDSee, VLC).
The client can also download files from several websites, including 4chan and other imageboards, many boorus, and gallery sites like deviant art and hentai foundry. You will learn more about this later.
"},{"location":"getting_started_files.html#inbox_and_archive","title":"Inbox and archive","text":"The client sends newly imported files to an inbox, just like your email. Inbox acts like a tag, matched by 'system:inbox'. A small envelope icon is drawn in the top corner of all inbox files:
If you are sure you want to keep a file long-term, you should archive it, which will remove it from the inbox. You can archive from your selected thumbnails' right-click menu, or by pressing F7. If you make a mistake, you can spam Ctrl+Z for undo or hit Shift+F7 on any set of files to explicitly return them to the inbox.
Anything you do not want to keep should be deleted by selecting from the right-click menu or by hitting the delete key. Deleted files are sent to the trash. They will get a little trash icon:
A trashed file will not appear in subsequent normal searches, although you can search the trash specifically by clicking the 'my files' button on the autocomplete dropdown and changing the file domain to 'trash'. Undeleting a file (Shift+Del) will return it to 'my files' as if nothing had happened. Files that remain in the trash will be permanently deleted, usually after a few days. You can change the permanent deletion behaviour in the client's options.
A quick way of processing new files is\u2013
"},{"location":"getting_started_files.html#filtering_your_inbox","title":"Filtering your inbox","text":"Let's say you just downloaded a good thread, or perhaps you just imported an old folder of miscellany. You now have a whole bunch of files in your inbox--some good, some awful. You probably want to quickly go through them, saying yes, yes, yes, no, yes, no, no, yes, where yes means 'keep and archive' and no means 'delete this trash'. Filtering is the solution.
Select some thumbnails, and either choose filter->archive/delete from the right-click menu or hit F12. You will see them in a special version of the media viewer, with the following default controls:
Your choices will not be committed until you finish filtering.
This saves time.
"},{"location":"getting_started_files.html#what_hydrus_is_for","title":"What Hydrus is for","text":"The hydrus client's workflows are not designed for half-finished files that you are still working on. Think of it as a giant archive for everything excellent you have decided to store away. It lets you find and remember these things quickly.
In general, Hydrus is good for individual files like you commonly find on imageboards or boorus. Although advanced users can cobble together some page-tag-based solutions, it is not yet great for multi-file media like comics and definitely not as a typical playlist-based music player.
If you are looking for a comic manager to supplement hydrus, check out this user-made guide to other archiving software here!
And although the client can hold millions of files, it starts to creak and chug when displaying or otherwise tracking more than about 40,000 or so in a single gui window. As you learn to use it, please try not to let your download queues or general search pages regularly sit at more than 40 or 50k total items, or you'll start to slow other things down. Another common mistake is to leave one large 'system:everything' or 'system:inbox' page open with 70k+ files. For these sorts of 'ongoing processing' pages, try adding a 'system:limit=256' to keep them snappy. One user mentioned he had regular gui hangs of thirty seconds or so, and when we looked into it, it turned out his handful of download pages had three million files queued up! Just try and take things slow until you figure out what your computer's limits are.
"},{"location":"getting_started_importing.html","title":"Importing and exporting","text":"By now you should have launched Hydrus. If you're like most new users you probably already have a fair bit of images or other media files that you're looking at getting organised.
Note
If you're planning to import or export a large number of files, it's recommended to use the automated folders, since Hydrus can have trouble dealing with large, single jobs. Splitting them up in this manner will make it much easier on the program.
"},{"location":"getting_started_importing.html#importing_files","title":"Importing files","text":"Navigate to file -> import files
in the toolbar. OR Drag-and-drop one or more folders or files into Hydrus.
This will open the import files
window. Here you can add files or folders, or delete files from the import queue. Let Hydrus parse what it will update and then look over the options. By default the option to delete original files after successful import (if it's ignored for any reason or already present in Hydrus for example) is not checked; activate it at your own risk. In file import options
you can find some settings for minimum and maximum file size, resolution, and whether to import previously deleted files or not.
From here there are two options: import now
which will just import as is, and add tags before import >>
which lets you set up some rules to add tags to files on import. Examples are keeping the filename as a tag, adding folders as tags (useful if you have some sort of folder-based organisation scheme), or loading tags from an accompanying text file generated by some other program.
Once you're done click apply (or import now
) and Hydrus will start processing the files. Exact duplicates are not imported, so if you had dupes spread out you will end up with only one file in the end. If files look similar but Hydrus imports both, then that's a job for the dupe filter, as there is some difference even if you can't tell it by eye. Common causes are compression producing files with different file sizes that otherwise look identical, or extra metadata baked into a file.
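To see why byte-identical files collapse into one while near-identical files do not, here is a minimal sketch that groups files by a content hash. SHA-256 and the function names are assumptions chosen for illustration; this is not the client's import code.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_hash(path, chunk_size=1024 * 1024):
    # Hash the file contents in chunks so large files do not need to fit in memory.
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def group_exact_duplicates(folder):
    # Map each content hash to every file under the folder that has it.
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob('*')):
        if path.is_file():
            groups[file_hash(path)].append(path)
    return dict(groups)
```

Any group with more than one path is a set of exact duplicates. A re-saved or recompressed copy changes the bytes, so it gets a different hash and lands in its own group.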
If you want to share your files then export is the way to go. The basic way is to select the files in Hydrus, drag them from there, and drop them where you want them. You can also copy files or use export files to, well, export your files to a select location. All (or at least most) non-drag'n'drop export options can be found by right-clicking the selected files and going down share
and then either copy
or export
.
Just dragging from the thumbnail view will export (copy) all the selected files to wherever you drop them. You can also start a drag and drop for single files from the media viewer using this arrow button on the top hover window:
If you want to drag and drop to discord, check the special BUGFIX option under options > gui
. You will also find a filename pattern setting for that drag and drop here.
By default, the files will be named by their ugly hexadecimal hash, which is how they are stored inside the database.
If you use a drag and drop to open a file inside an image editing program, remember to hit 'save as' and give it a new filename in a new location! The client does not expect files inside its db directory to ever change.
"},{"location":"getting_started_importing.html#copy","title":"Copy","text":"You can also copy the files by right-clicking and going down share -> copy -> files
and then pasting the files where you want them.
You can also export files with tags, either in filename or as a sidecar file by right-clicking and going down share -> export -> files
. Have a look at the settings and then press export
. You can create folders to export files into by using backslashes on Windows (\\
) and slashes on Linux (/
) in the filename. This can be combined with the patterns listed in the pattern shortcut button dropdown. As an example, [series]\\{filehash}
will export files into folders named after the series:
namespaced tags on the files: all files tagged with one series go into one folder, and files tagged with another series go into another folder, as seen in the image below.
Clicking the pattern shortcuts
button gives you an overview of available patterns.
The EXPERIMENTAL option is only available under advanced mode, use at your own risk.
"},{"location":"getting_started_importing.html#automation","title":"Automation","text":"Under file -> import and export folders
you'll find options for setting up automated import and export folders that can run on a schedule. Both have a fair deal of options and rules you can set so look them over carefully.
Like with a manual import, if you wish you can import tags by parsing filenames or loading sidecars.
"},{"location":"getting_started_importing.html#export_folders","title":"Export folders","text":"Like with manual export, you can set the filenames using a tag pattern, and you can export to sidecars too.
"},{"location":"getting_started_importing.html#importing_and_exporting_tags","title":"Importing and exporting tags","text":"While you can import and export tags together with images sometimes you just don't want to deal with the files.
Going to tags -> migrate tags
you get a window that lets you deal with just tags. One of the options here is what's called a Hydrus Tag Archive, a file containing the hash <-> tag mappings for the files and tags matching the query.
If any of this is confusing, a simpler guide is here, and some video guides are here!
"},{"location":"getting_started_installing.html#downloading","title":"Downloading","text":"You can get the latest release at the github releases page.
I try to release a new version every Wednesday by 8pm EST and write an accompanying post on my tumblr and a Hydrus Network General thread on 8chan.moe /t/.
"},{"location":"getting_started_installing.html#installing","title":"Installing","text":"The hydrus releases are 64-bit only. If you are a python expert, there is the slimmest chance you'll be able to get it running from source on a 32-bit machine, but it would be easier just to find a newer computer to run it on.
Windows, macOS, Linux, Docker, From Source
hydrus-network
in the 'Extras' bucket) winget install --id=HydrusNetwork.HydrusNetwork -e --location \"\\PATH\\TO\\INSTALL\\HERE\"
, which can, if you know what you are doing, be winget install --id=HydrusNetwork.HydrusNetwork -e --location \".\\\"
, maybe rolled into a batch file.
apt-get install libmpv1
OSError: /lib/x86_64-linux-gnu/libgio-2.0.so.0: undefined symbol: g_module_open_full\n(traceback)\npyimod04_ctypes.install.<locals>.PyInstallerImportError: Failed to load dynlib/dll 'libmpv.so.1'. Most likely this dynlib/dll was not found when the application was frozen.\n
Then please do this: libgmodule*
. You are looking for something like libgmodule-2.0.so
. Users report finding it in /usr/lib64/
and /usr/lib/x86_64-linux-gnu
.
By default, hydrus stores all its data--options, files, subscriptions, everything--entirely inside its own directory. You can extract it to a usb stick, move it from one place to another, have multiple installs for multiple purposes, wrap it all up inside a truecrypt volume, whatever you like. The .exe installer writes some unavoidable uninstall registry stuff to Windows, but the 'installed' client itself will run fine if you manually move it.
Bad Locations
Do not install to a network location! (i.e. on a different computer's hard drive) The SQLite database is sensitive to interruption and requires good file locking, which network interfaces often fake. There are ways of splitting your client up so the database is on a local SSD but the files are on a network--this is fine--but you really should not put the database on a remote machine unless you know what you are doing and have a backup in case things go wrong.
Do not install to a location with filesystem-level compression enabled! It may work ok to start, but when the SQLite database grows large, this can cause extreme access latency, I/O errors, and corruption.
For macOS users
The Hydrus App is non-portable and puts your database in ~/Library/Hydrus
(i.e. /Users/[You]/Library/Hydrus
). You can update simply by replacing the old App with the new, but if you wish to backup, you should be looking at ~/Library/Hydrus
, not the App itself.
Hydrus is made by an Anon out of duct tape and string. It combines file parsing tech with lots of network and database code in unusual and powerful ways, and all through a hacked-together executable that isn't signed by any big official company.
Unfortunately, we have been hit by anti-virus false positives throughout development. Every few months, one or more of the larger anti-virus programs sees some code that looks like something bad, or they run the program in a testbed and don't like something it does, and then they quarantine it. Every single instance of this so far has been a false positive. They usually go away the next week or two when the next set of definitions roll out. Some hydrus users are kind enough to report the program as a false positive to the anti-virus companies themselves, which also helps here.
Some users have never had the problem, some get hit regularly. The situation is obviously worse on Windows. If you try to extract the zip and hydrus_client.exe or the whole folder suddenly disappears, please check your anti-virus software.
I am interested in reports about these false-positives, just so I know what is going on. Sometimes I have been able to reduce problems by changing something in the build (one of these was, no shit, an anti-virus testbed running the installer and then opening the help html at the end, which launched Edge browser, which then triggered Windows Update, which hit UAC and was considered suspicious. I took out the 'open help' checkbox from the installer as a result).
You should be careful about random software online. For my part, the program is completely open source, and I have a long track record of designing it with privacy foremost. There is no intentional spyware of any sort--the program never connects to another computer unless you tell it to. Furthermore, the exe you download is now built on github's cloud, so there are very few worries about a trojan-infected build environment putting something I did not intend into the program (as there once were when I built the release on my home machine). That doesn't stop Windows Defender from sometimes calling it an ugly name like \"Tedy.4675\" and definitively declaring \"This program is dangerous and executes commands from an attacker\" but that's the modern anti-virus ecosystem.
There aren't excellent solutions to this problem. I don't like to say 'just exclude the program directory from your anti-virus settings', but some users are comfortable with this and say it works fine. One thing I do know that helps (with other things too), if you are using the default Windows Defender, is going into the Windows Security shield icon on your taskbar, and 'virus and threat protection' and then 'virus and threat protection settings', and turning off 'Cloud-delivered protection' and 'Automatic sample submission'. It seems with these on, Windows will talk with a central server about executables you run and download early updates, and this gives a lot of false positives.
If you are still concerned, please feel free to run from source, as above. You are controlling everything, then, and can change anything about the program you like. Or you can only run releases from four weeks ago, since you know the community would notice by then if there ever were a true positive. Or just run it in a sandbox and watch its network traffic.
In 2022 I am going to explore a different build process to see if that reduces the false positives. We currently make the executable with PyInstaller, which has some odd environment set-up the anti-virus testbeds don't seem to like, and perhaps PyOxidizer will be better. We'll see.
"},{"location":"getting_started_installing.html#running","title":"Running","text":"To run the client:
Windows, macOS, Linux
./client
from the terminal.
Warning
Hydrus is imageboard-tier software, wild and fun but unprofessional. It is written by one Anon spinning a lot of plates. Mistakes happen from time to time, usually in the update process. There are also no training wheels to stop you from accidentally overwriting your whole db if you screw around. Be careful when updating. Make backups beforehand!
Hydrus does not auto-update. It will stay the same version unless you download and install a new one.
Although I put out a new version every week, you can update far less often if you prefer. The client keeps to itself, so if it does exactly what you want and a new version does nothing you care about, you can just leave it. Other users enjoy updating every week, simply because it makes for a nice schedule. Others like to stay a week or two behind what is current, just in case I mess up and cause a temporary bug in something they like.
A user has written a longer and more formal guide to updating, and information on the 334->335 step (python2 to python3) here.
The 526->527 step was also important. 527 changed the program executable name from 'client' to 'hydrus_client'. There was also a library update that caused a dll conflict with previous installs.
If you need to update from 526 or before, then:
git pull
as normal. If you haven't already, feel free to run setup_venv again to get the new OpenCV. Update your launch scripts to point at the new hydrus_client.py
boot scripts.
The update process:
Unless the update specifically disables or reconfigures something, all your files and tags and settings will be remembered after the update.
Releases typically need to update your database to their version. New releases can retroactively perform older database updates, so if the new version is v255 but your database is on v250, you generally only need to get the v255 release, and it'll do all the intervening v250->v251, v251->v252, etc... update steps in order as soon as you boot it. If you need to update from a release more than, say, ten versions older than current, see below. You might also like to skim the release posts or changelog to see what is new.
Clients and servers of different versions can usually connect to one another, but from time to time, I make a change to the network protocol, and you will get polite error messages if you try to connect to a newer server with an older client or vice versa. There is still no need to update the client--it'll still do local stuff like searching for files completely fine. Read my release posts and judge for yourself what you want to do.
"},{"location":"getting_started_installing.html#clean_installs","title":"Clean installs","text":"This is usually only relevant if you know you have a dll conflict or otherwise update and cannot boot at all.
Very rarely, hydrus needs a clean install. This can be due to a special update like when we moved from 32-bit to 64-bit or needing to otherwise 'reset' a custom install situation. The problem is usually that a library file has been renamed in a new version and hydrus has trouble figuring out whether to use the older one (from a previous version) or the newer.
In any case, if you cannot boot hydrus and it either fails silently or you get a crash log or system-level error popup complaining in a technical way about not being able to load a dll/pyd/so file, you may need a clean install, which essentially means clearing any old files out and reinstalling.
However, you need to be careful not to delete your database! It sounds silly, but at least one user has made a mistake here. The process is simple, do not deviate:
After that, you'll have a 'clean' version of hydrus that only has the latest version's dlls. If hydrus still will not boot, I recommend you roll back to your last working backup and let me, hydrus dev, know what your error is.
"},{"location":"getting_started_installing.html#big_updates","title":"Big updates","text":"If you have not updated in some time--say twenty versions or more--doing it all in one jump, like v250->v290, is likely not going to work. I do a lot of unusual stuff with hydrus, change my code at a fast pace, and do not have a ton of testing in place. Hydrus update code often falls to bitrot, and so some underlying truth I assumed for the v255->v256 code may not still apply six months later. If you try to update more than 50 versions at once (i.e. trying to perform more than a year of updates in one go), the client will give you a polite error rather than even try.
As a result, if you get a failure on trying to do a big update, try cutting the distance in half--try v270 first, and then if that works, try v270->v290. If it doesn't, try v260, and so on.
If you narrow the gap down to just one version and still get an error, please let me know. I am very interested in these sorts of problems and will be happy to help figure out a fix with you (and everyone else who might be affected).
All that said, and while updating is complex and every client is different, various user reports over the years suggest this route works and is efficient: 204 > 238 > 246 > 291 > 328 > 335 > 376 > 421 > 466 > 474 > 480 > 521
"},{"location":"getting_started_installing.html#backing_up","title":"Backing up","text":"I am not joking around: if you end up liking hydrus, you should back up your database
Maintaining a regular backup is important for hydrus. The program stores a lot of complicated data that you will put hours and hours of work into, and if you only have one copy and your hard drive breaks, you could lose everything. This has happened before--to people who thought it would never happen to them--and it sucks big time to go through. Don't let it be you.
Hydrus's database engine, SQLite, is excellent at keeping data safe, but it cannot work in a faulty environment. Ways in which users of hydrus have damaged/lost their database:
Some of those you can mitigate (don't run the database over a network!) and some will always be a problem, but if you have a backup, none of them can kill you.
This mostly means your database, not your files
Note that nearly all the serious and difficult-to-fix problems occur to the database, which is four large .db files, not your media. All your images and movies are read-only in hydrus, and there's less worry if they are on a network share with bad locks or a machine that suddenly loses power. The database, however, maintains a live connection, with regular complex writes, and here a hardware failure can lead to corruption (basically the failure scrambles the data that is written, so when you try to boot back up, a small section of the database is incomprehensible garbage).
If you do not already have a backup routine for your files, this is a great time to start. I now run a backup every week of all my data so that if my computer blows up or anything else awful happens, I'll at worst have lost a few days' work. Before I did this, I once lost an entire drive with tens of thousands of files, and it felt awful. If you are new to saving a lot of media, I hope you can avoid what I felt. ;_;
I use ToDoList to remind me of my jobs for the day, including backup tasks, and FreeFileSync to actually mirror over to an external usb drive. I recommend both highly (and for ToDoList, I recommend hiding the complicated columns, stripping it down to a simple interface). It isn't a huge expense to get a couple-TB usb drive either--it is absolutely worth it for the peace of mind.
By default, hydrus stores all your user data in one location, so backing up is simple:
"},{"location":"getting_started_installing.html#the_simple_way_-_inside_the_client","title":"The simple way - inside the client","text":"Go database->set up a database backup location in the client. This will tell the client where you want your backup to be stored. A fresh, empty directory on a different drive is ideal.
Once you have your location set up, you can thereafter hit database->update database backup. It will lock everything and mirror your files, showing its progress in a popup message. The first time you make this backup, it may take a little while (as it will have to fully copy your database and all its files), but after that, it will only have to copy new or altered files and should only ever take a couple of minutes.
Advanced users who have migrated their database and files across multiple locations will not have this option--use an external program in this case.
"},{"location":"getting_started_installing.html#the_powerful_and_best_way_-_using_an_external_program","title":"The powerful (and best) way - using an external program","text":"Doing it yourself is best. If you are an advanced user with a complicated hydrus install migrated across multiple drives, then you will have to do it this way--the simple backup will be disabled.
You need to back up two things, which are both, by default, beneath install_dir/db: the four client*.db files and your client_files directory(ies). The .db files contain absolutely everything about your client and files--your settings and file lists and metadata like inbox/archive and tags--while the client_files subdirs store your actual media and its thumbnails.
If everything is still under install_dir/db, then it is usually easiest to just back up the whole install dir, keeping a functional 'portable' copy of your install that you can restore no prob. Make sure you keep the .db files together--they are not interchangeable and mostly useless on their own!
An example FreeFileSync profile for backing up a database will look like this:
Note it has 'file time and size' and 'mirror' as the main settings. This quickly ensures that changes to the left-hand side are copied to the right-hand side, adding new files and removing since-deleted files and overwriting modified files. You can save a backup profile like that and it should only take a few minutes every week to stay safely backed up, even if you have hundreds of thousands of files.
Shut the client down while you run the backup, obviously.
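For anyone curious what a 'file time and size' mirror actually does, here is a minimal Python sketch. It is not how FreeFileSync is implemented, the function names are made up for illustration, and it assumes the backup folder is not inside the source folder. Use a real backup program for real backups.

```python
import shutil
from pathlib import Path


def needs_copy(src, dst):
    # 'File time and size' comparison: copy if missing, or if either differs.
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def mirror(source, backup):
    # One-way mirror: make backup look like source. Returns (copied, removed).
    source, backup = Path(source), Path(backup)
    backup.mkdir(parents=True, exist_ok=True)
    copied, removed = [], []
    for src in sorted(source.rglob('*')):
        rel = src.relative_to(source)
        dst = backup / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
        elif needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 keeps the modified time
            copied.append(rel.as_posix())
    # Delete anything in the backup that no longer exists in the source.
    # Reverse order visits children before their parent directories.
    for dst in sorted(backup.rglob('*'), reverse=True):
        rel = dst.relative_to(backup)
        if not (source / rel).exists():
            if dst.is_dir():
                dst.rmdir()
            else:
                dst.unlink()
            removed.append(rel.as_posix())
    return copied, removed
```

Because unchanged files are skipped, a second run over a large client_files directory only touches what is new, altered, or since deleted.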
"},{"location":"getting_started_installing.html#a_few_options","title":"A few options","text":"There are a host of other great alternatives out there, probably far too many to count. These are a couple that are often recommended and used by Hydrus users and are, in the spirit of Hydrus Network itself, free and open source.
FreeFileSync Linux, MacOS, Windows. Recommended and used by dev. Somewhat basic but does the job well enough.
Borg Backup FreeBSD, Linux, MacOS. More advanced and featureful backup tool.
Restic Almost every OS you can name.
Danger
Do not put your live database in a folder that continuously syncs to a cloud backup. Many of these services will interfere with a running client and can cause database corruption. If you still want to use a system like this, either turn the sync off while the client is running, or use the above backup workflows to safely back up your client to a separate folder that syncs to the cloud.
There is significantly more information about the database structure here.
I recommend you always back up before you update, just in case there is a problem with my update code that breaks your database. If that happens, please contact me, describing the problem, and revert to the functioning older version. I'll get on any problems like that immediately.
"},{"location":"getting_started_installing.html#backing_up_small","title":"Backing up with not much space","text":"If you decide not to maintain a backup because you cannot afford drive space for all your files, please please at least back up your actual database files. Use FreeFileSync or a similar program to back up the four 'client*.db' files in install_dir/db when the client is not running. Just make sure you have a copy of those files, and then if your main install becomes damaged, we will have a reference to either roll back to or manually restore data from. Even if you lose a bunch of media files in this case, with an intact database we'll be able to schedule recovery of anything with a URL.
If you are really short on space, note also that the database files are very compressible. A very large database where the four files add up to 70GB can compress down to 17GB zip with 7zip on default settings. Better compression ratios are possible if you make sure to put all four files in the same archive and turn up the quality. This obviously takes some additional time to do, but if you are really short on space it may be the only way it fits, and if your only backup drive is a slow USB stick, then you might actually save time from not having to transfer the other 53GB! Media files (jpegs, webms, etc...) are generally not very compressible, usually 5% at best, so it is usually not worth trying.
It is best to have all four database files. It is generally easy and quick to fix problems if you have a backup of all four. If client.caches.db is missing, you can recover but it might take ten or more hours of CPU work to regenerate. If client.mappings.db is missing, you might be able to recover tags for your local files from a mirror in an intact client.caches.db. However, client.master.db and client.db are the most important. If you lose either of those, or they become too damaged to read and you have no backup, then your database is essentially dead and likely every single archive and view and tag and note and url record you made is lost. This has happened before, do not let it be you.
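If you want to script the 'all four files in one archive' idea, a minimal Python sketch could look like this. It uses the standard zipfile module rather than 7zip, so the compression ratio will differ from the numbers above, and the function name and paths are assumptions for illustration only.

```python
import zipfile
from pathlib import Path


def archive_db_files(db_dir, archive_path):
    # Put every client*.db file from db_dir into one zip archive.
    # Only run this while the client is shut down.
    db_files = sorted(Path(db_dir).glob('client*.db'))
    if len(db_files) != 4:
        # Refuse to make a partial backup: the files belong together.
        raise RuntimeError(f'expected four client*.db files, found {len(db_files)}')
    with zipfile.ZipFile(archive_path, 'w', zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        for path in db_files:
            zf.write(path, arcname=path.name)
    return [path.name for path in db_files]
```

You would call it with something like archive_db_files('install_dir/db', 'backup_drive/hydrus_db.zip'), where both paths are placeholders for your own locations.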
"},{"location":"getting_started_more_tags.html","title":"Tags Can Get Complicated","text":"Tags are powerful, and there are many tools within hydrus to customise how they apply and display. I recommend you play around with the basics before making your own new local tag services or jumping right into the PTR, so take it slow.
"},{"location":"getting_started_more_tags.html#tag_services","title":"Tag services","text":"Hydrus lets you organise tags across multiple separate 'services'. By default there are two, but you can have however many you want (services->manage services
). You might like to add more for different sets of siblings/parents, for tags you don't want to see but still search by, or to parse tags into different services based on the source or its reliability. You could, for example, parse all tags from Pixiv into one service, Danbooru tags into another, Deviantart tags into a third, and so on, as you choose. You must always have at least one local tag service.
Local tag services are stored only on your hard drive--they are completely private. No tags, siblings, or parents will accidentally leak, so feel free to go wild with whatever odd scheme you want to try out.
Each tag service comes with its own tags, siblings and parents.
"},{"location":"getting_started_more_tags.html#my_tags","title":"My tags","text":"The intent is to use this service for tags you yourself want to add.
"},{"location":"getting_started_more_tags.html#downloader_tags","title":"Downloader tags","text":"The default tag parse target. Tags of things you download will end up here unless you change the settings. It's probably a good idea to set up some tag blacklists for tags you don't want.
"},{"location":"getting_started_more_tags.html#tag_repositories","title":"Tag repositories","text":"It can take a long time to tag even small numbers of files well, so I created tag repositories so people can share the work.
Tag repos store many file->tag relationships. Anyone who has an access key to the repository can sync with it and hence download all these relationships. If any of their own files match up, they will get those tags. Access keys will also usually have permission to upload new tags and ask for incorrect ones to be deleted.
Anyone can run a tag repository, but it is a bit complicated for new users. I ran a public tag repository for a long time, and now this large central store is run by users. It has over a billion tags and is free to access and contribute to.
To connect with it, please check here. Please read that page if you want to try out the PTR. It is only appropriate for someone on an SSD!
If you add it, your client will download updates from the repository over time and, usually when it is idle or shutting down, 'process' them into its database until it is fully synchronised. The processing step is CPU and HDD heavy, and you can customise when it happens in file->options->maintenance and processing. As the repository synchronises, you should see some new tags appear, particularly on famous files that lots of people have.
You can watch more detailed synchronisation progress in the services->review services window.
Your new service should now be listed on the left of the manage tags dialog. Adding tags to a repository works very similarly to the 'my tags' service except hitting 'apply' will not immediately confirm your changes--it will put them in a queue to be uploaded. These 'pending' tags will be counted with a plus '+' or minus '-' sign.
Notice that a 'pending' menu has appeared on the main window. This lets you start the upload when you are ready and happy with everything that you have queued.
When you upload your pending tags, they will commit and look to you like any other tag. The tag repository will anonymously bundle them into the next update, which everyone else will download in a day or so. They will see your tags just like you saw theirs.
If you attempt to remove a tag that has been uploaded, you may be prompted to give a reason, creating a petition that a janitor for the repository will review.
I recommend you not spam tags to the public tag repo until you get a rough feel for the guidelines, and my original tag schema thoughts, or just lurk until you get the idea. It roughly follows what you will see on a typical booru. The general rule is to only add factual tags--no subjective opinion.
You can connect to more than one tag repository if you like. When you are in the manage tags dialog, pressing the up or down arrow keys on an empty input switches between your services.
FAQ: why can my friend not see what I just uploaded?
"},{"location":"getting_started_more_tags.html#siblings_and_parents","title":"Siblings and parents","text":"For more in-depth information, see siblings and parents.
tl;dr: Siblings rename/alias tags in an undoable way. Parents virtually add/imply one or more tags (parents) if the 'child' tag is present. The PTR has a lot of them.
"},{"location":"getting_started_more_tags.html#display_rules","title":"Display rules","text":"If you go to tags -> manage where siblings and parents apply
you'll get a window where you can customise where and in what order siblings and parents apply. The service at the top of the list has precedence over all else, then the second, and so on, depending on how many you have. If, for example, you have the PTR, you can use a tag service to overwrite tags/siblings for cases where you disagree with the PTR standards.
The hydrus client supports two kinds of ratings: like/dislike and numerical. Let's start with the simpler one:
"},{"location":"getting_started_ratings.html#like_dislike","title":"like/dislike","text":"A new client starts with one of these, called 'favourites'. It can set one of two values to a file. It does not have to represent like or dislike--it can be anything you want, like 'send to export folder' or 'explicit/safe' or 'cool babes'. Go to services->manage services->add->local like/dislike ratings:
You can set a variety of colours and shapes.
"},{"location":"getting_started_ratings.html#numerical","title":"numerical","text":"This is '3 out of 5 stars' or '8/10'. You can set the range to whatever whole numbers you like:
As well as the shape and colour options, you can set how many 'stars' to display and whether 0/10 is permitted.
If you change the star range at a later date, any existing ratings will be 'stretched' across the new range. As values are collapsed to the nearest integer, this is best done for scales that are multiples. 2/5 will neatly become 4/10 on a zero-allowed service, for instance, and 0/4 can nicely become 1/5 if you disallow zero ratings in the same step. If you didn't intuitively understand that, just don't touch the number of stars or zero rating checkbox after you have created the numerical rating service!
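The stretching can be pictured with a little arithmetic. The sketch below assumes a rating is first normalised to a fraction of its scale and then rounded onto the new one. That model reproduces the two examples above, but it is an assumption for illustration and not a description of the client's actual code.

```python
def to_fraction(stars, max_stars, allow_zero):
    # Normalise a rating to a 0.0-1.0 fraction of its scale.
    if allow_zero:
        return stars / max_stars
    return (stars - 1) / (max_stars - 1)


def from_fraction(fraction, max_stars, allow_zero):
    # Map a fraction back onto a scale, collapsing to the nearest integer.
    if allow_zero:
        return round(fraction * max_stars)
    return 1 + round(fraction * (max_stars - 1))


def stretch(stars, old_max, old_zero, new_max, new_zero):
    return from_fraction(to_fraction(stars, old_max, old_zero), new_max, new_zero)


print(stretch(2, 5, True, 10, True))   # 4
print(stretch(0, 4, True, 5, False))   # 1
print(stretch(3, 5, True, 7, True))    # 4, although 3/5 and 4/7 are not equal
```

The last line shows why scales that are not multiples are awkward: the rounding quietly changes what the rating means.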
"},{"location":"getting_started_ratings.html#using_ratings","title":"now what?","text":"Ratings are displayed in the top-right of the media viewer:
Hovering over each control will pop up its name, in case you forget which is which. You can then set them with a left- or right-click. Like/dislike and numerical have slightly different click behaviour, so have a play with them to get their feel. Pressing F4 on a selection of thumbnails will open a dialog with a very similar layout, which will let you set the same rating to many files simultaneously.
Once you have some ratings set, you can search for them using system:rating, which produces this dialog:
On my own client, I find it useful to have several like/dislike ratings set up as one-click pseudo-tags, like the 'OP images' above.
"},{"location":"getting_started_searching.html","title":"Searching and sorting","text":"The primary purpose of tags is to be able to find what you've tagged again. Let's see more how it works.
"},{"location":"getting_started_searching.html#searching","title":"Searching","text":"Just open a new search page (pages > new file search page
or Ctrl+T > file search
) and start typing in the search field which should be focused when you first open the page.
Let's look at the tag autocomplete dropdown:
system predicates
Hydrus calls search terms predicates. 'system predicates', which search metadata other than simple tags, show on any search page with an empty autocomplete input. You can mix them into any search alongside tags. They are very useful, so try them out!
include current/pending tags
Turn these on and off to control whether tag predicates apply to tags that exist, or those pending to be uploaded to a tag repository. Just searching 'pending' tags is useful if you want to scan what you have pending to go up to the PTR--just turn off 'current' tags and search system:num tags > 0.
searching immediately
This controls whether a change to the list of current search predicates will instantly run the new search and get new results. Turning this off is helpful if you want to add, remove, or replace several heavy search terms in a row without getting UI lag.
OR
You only see this if you have 'advanced mode' on. It lets you enter some pretty complicated tags!
file/tag domains
By default, you will search in the 'my files' and 'all known tags' domains. This is the intersection of your local media files (on your hard disk) and the union of all known tag searches. If you search for character:samus aran, then you will get file results from your 'my files' domain that have character:samus aran in any known tag service. For most purposes, this combination is fine, but as you use the client more, you will sometimes want to access different search domains.
For instance, if you change the file domain to 'trash', then you will instead get files that are in your trash. Setting the tag domain to 'my tags' will ignore other tag services (e.g. the PTR) for all tag search predicates, so a system:num_tags or a character:samus aran will only look in 'my tags'.
Turning on 'advanced mode' gives access to more search domains. Some of them are subtly complicated, run extremely slowly, and are only useful for clever jobs--most of the time, you still want 'my files' and 'all known tags'.
favourite searches star
Once you are more experienced, have a play with this. It lets you save your common searches for the future, so you don't have to either keep re-entering them or keep them open all the time. If you close big things down when you aren't using them, you will keep your client lightweight and save time.
When you type a tag in a search page, Hydrus will treat a space the same way as an underscore. Searching character:samus aran will find files tagged with character:samus aran and character:samus_aran. This is true of some other syntax characters, [](){}/\\\"'-, too.
Tags will be searchable by all their siblings. If there's a sibling for large -> huge, then typing large will provide huge as a suggestion. This goes for the whole sibling chain, no matter how deep it is or where a tag sits in it.
The autocomplete tag dropdown supports wildcard searching with *.
The * will match any number of characters. Every normal autocomplete search has a secret * on the end that you don't see, which is how full words get matched from you only typing in a few letters.
This is useful when you can only remember part of a word, or can't spell part of it. You can put * characters anywhere, but you should experiment to get used to the exact way these searches work. Some results can be surprising!
You can select the special predicate inserted at the top of your autocomplete results (the highlighted *gelion and *va*ge* above). It will return all files that match that wildcard, i.e. every file for every other tag in the dropdown list.
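To get a feel for the wildcard rules, here is a small Python sketch of the idea: every * matches any number of characters, and normal input gets a hidden * on the end. The function names are made up, and the real client's matching (namespaces, siblings, and so on) is more involved than this.

```python
import re


def wildcard_to_regex(text):
    # Escape everything except '*', which becomes 'any number of characters'.
    parts = [re.escape(part) for part in text.split('*')]
    # Normal autocomplete input gets an implicit trailing '*'.
    return re.compile('.*'.join(parts) + '.*')


def autocomplete(text, tags):
    pattern = wildcard_to_regex(text)
    return [tag for tag in tags if pattern.fullmatch(tag)]


tags = ['evangelion', 'medieval', 'vegeta']
print(autocomplete('eva', tags))      # ['evangelion']
print(autocomplete('*gelion', tags))  # ['evangelion']
print(autocomplete('*va*ge*', tags))  # ['evangelion']
print(autocomplete('*eva', tags))     # ['evangelion', 'medieval']
```

The last search is an example of a surprising result: a leading * lets the typed text land in the middle of a word.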
This is particularly useful if you have a number of files with commonly structured over-informationed tags, like this:
In this case, selecting the title:cool pic* predicate will return all three images in the same search, where you can conveniently give them some more-easily searched tags like series:cool pic and page:1, page:2, page:3.
You can edit any selected 'active' search predicates by either its Right-Click menu or through Shift+Double-Left-Click on the selection. For simple tags, this means just changing the text (and, say, adding/removing a leading hyphen for negation/inclusion), but any 'system' predicate can be fully edited with its original panel. If you entered 'system:filesize < 200KB' and want to make it a little bigger, don't delete and re-add--just edit the existing one in place.
"},{"location":"getting_started_searching.html#other_shortcuts","title":"Other Shortcuts","text":"These will eventually be migrated to the shortcut system where they will be more visible and changeable, but for now:
Searches find files that match every search 'predicate' in the list (it is an AND search), which makes it difficult to search for files that include one OR another tag. For example, the query red eyes AND green eyes (aka what you get if you enter each tag by itself) will only find files that have both tags, while the query red eyes OR green eyes will present you with files that are tagged with red eyes or green eyes, or both.
More recently, simple OR search support was added. All you have to do is hold down Shift when you enter/double-click a tag in the autocomplete entry area. Instead of sending the tag up to the active search list up top, it will instead start an under-construction 'OR chain' in the tag results below:
You can keep searching for and entering new tags. Holding down Shift on new tags will extend the OR chain, and entering them as normal will 'cap' the chain and send it to the complete and active search predicates above.
Any file that has one or more of those OR sub-tags will match.
If you enter an OR tag incorrectly, you can either cancel or 'rewind' the under-construction search predicate with these new buttons that will appear:
You can also cancel an under-construction OR by hitting Esc on an empty input. You can add any sort of search term to an OR search predicate, including system predicates. Some unusual sub-predicates (typically a -tag, or a very broad system predicate) can run very slowly, but they will run much faster if you include non-OR search predicates in the search:
This search will return all files that have the tag fanfic and one or more of medium:text, a positive value for the like/dislike rating 'read later', or PDF mime.
There's a more advanced OR search function available by pressing the OR button. Previous knowledge of operators expected and required.
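If the AND/OR logic is hard to picture, this small Python sketch models it with plain sets. It only handles simple tags (no system predicates or negation), and the data layout is an assumption made for the example.

```python
def matches(tags, predicates):
    # Each predicate is either one tag (a str) or an OR chain (a list of tags).
    for predicate in predicates:
        options = [predicate] if isinstance(predicate, str) else predicate
        if not any(tag in tags for tag in options):
            return False  # every predicate in the list must match (AND)
    return True


def search(files, predicates):
    # files maps a file name to its set of tags.
    return sorted(name for name, tags in files.items() if matches(tags, predicates))


files = {
    'a.png': {'red eyes', 'green eyes'},
    'b.png': {'red eyes'},
    'c.png': {'green eyes'},
    'd.png': {'blue eyes'},
}
print(search(files, ['red eyes', 'green eyes']))    # AND: ['a.png']
print(search(files, [['red eyes', 'green eyes']]))  # OR: ['a.png', 'b.png', 'c.png']
```

Mixing the two works as you would expect: a plain tag alongside an OR chain narrows the results to files that have the tag and at least one member of the chain.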
"},{"location":"getting_started_searching.html#sorting","title":"Sorting","text":"At the top-left of most pages there's a sort by:
dropdown menu. Most of the options are self-explanatory. They do nothing except change in what order Hydrus presents the currently searched files to you.
Default sort order and more sort by: namespace are found in file -> options -> sort/collect.
system:limit
","text":"If you add system:limit
to a search, the client will consider what that page's file sort currently is. If it is simple enough--something like file size or import time--then it will sort your results before they come back and clip the limit according to that sort, getting the n 'largest file size' or 'newest imports' and so on. This can be a great way to set up a lightweight filtering page for 'the 256 biggest videos in my inbox'.
If you change the sort, hydrus will not refresh the search, it'll just re-sort the n files you have. Hit F5 to refresh the search with a new sort.
Not all sorts are supported. Anything complicated like tag sort will result in a random sample instead.
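The behaviour described above can be sketched in a few lines of Python. The dictionary keys and the function name are hypothetical. The point is only the order of operations: sort first and then clip for simple sorts, and fall back to a random sample otherwise.

```python
import random


def apply_limit(files, limit, sort_key=None, reverse=True):
    # files is a list of dicts with hypothetical keys like 'size'.
    if sort_key is None:
        # Complicated sorts are not supported: fall back to a random sample.
        return random.sample(files, min(limit, len(files)))
    # Simple sorts: sort first, then clip, so you get the top n by that sort.
    return sorted(files, key=lambda f: f[sort_key], reverse=reverse)[:limit]


videos = [
    {'name': 'a.webm', 'size': 300},
    {'name': 'b.webm', 'size': 900},
    {'name': 'c.webm', 'size': 100},
    {'name': 'd.webm', 'size': 700},
]
print([f['name'] for f in apply_limit(videos, 2, 'size')])  # ['b.webm', 'd.webm']
```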
"},{"location":"getting_started_searching.html#collecting","title":"Collecting","text":"Collection is found under the sort by:
dropdown and uses namespaces listed in the sort by: namespace sort options. The new namespaces will only be available in new pages.
The introduction to subscriptions has been moved to the main downloading help here.
"},{"location":"getting_started_subscriptions.html#description","title":"how do subscriptions work?","text":"For the most part, all you need to do to set up a good subscription is give it a name, select the download source, and use the 'paste queries' button to paste what you want to search. Subscriptions have great default options for almost all query types, so you don't have to go any deeper than that to get started.
Once you hit ok on the main subscription dialog, the subscription system should immediately come alive. If any queries are due for a 'check', they will perform their search and look for new files (i.e. URLs it has not seen before). Once that is finished, the file download queue will be worked through as normal. Typically, the sub will make a popup like this while it works:
The initial sync can sometimes take a few minutes, but after that, each query usually only needs thirty seconds' work every few days. If you leave your client on in the background, you'll rarely see them. If they ever get in your way, don't be afraid to click their little cancel button or call a global halt with network->pause->subscriptions--the next time they run, they will resume from where they were before.
Similarly, the initial sync may produce a hundred files, but subsequent runs are likely to only produce one to ten. If a subscription comes across a lot of big files at once, it may not download them all in one go--but give it time, and it will catch back up before you know it.
When it is done, it leaves a little popup button that will open a new page for you:
This can often be a nice surprise!
"},{"location":"getting_started_subscriptions.html#good_subs","title":"what makes a good subscription?","text":"The same rules as for downloaders apply: start slow, be hesitant, and plan for the long-term. Artist queries make great subscriptions as they update reliably but not too often and have very stable quality. Pick the artists you like most, see where their stuff is posted, and set up your subs like that.
Series and character subscriptions are sometimes valuable, but they can be difficult to keep up with and have highly variable quality. It is not uncommon for users to only keep 15% of what a character sub produces. I do not recommend them for anything but your waifu.
Attribute subscriptions like 'blue_eyes' or 'smile' make for terrible subs as the quality is all over the place and you will be inundated by too much content. The only exceptions are for specific, low-count searches that really matter to you, like 'contrapposto' or 'gothic trap thighhighs'.
If you end up subscribing to eight hundred things and get ten thousand new files a week, you made a mistake. Subscriptions are for keeping up with things you like. If you let them overwhelm you, you'll resent them.
It is a good idea to run a 'full' download for a search before you set up a subscription. As well as making sure you have the exact right query text and that you have everything ever posted (beyond the 100 files deep a sub will typically look), it saves the bulk of the work (and waiting on bandwidth) for the manual downloader, where it belongs. When a new subscription picks up off a freshly completed download queue, its initial subscription sync only takes thirty seconds since its initial URLs are those that were already processed by the manual downloader. I recommend you stack artist searches up in the manual downloader using 'no limit' file limit, and when they are all finished, select them in the list and right-click->copy queries, which will put the search texts in your clipboard, newline-separated. This list can be pasted into the subscription dialog in one go with the 'paste queries' button again!
"},{"location":"getting_started_subscriptions.html#checking","title":"images/how often do subscriptions check?","text":"Hydrus subscriptions use the same variable-rate checking system as its thread watchers, just on a larger timescale. If you subscribe to a busy feed, it might check for new files once a day, but if you enter an artist who rarely posts, it might only check once every month. You don't have to do anything. The fine details of this are governed by the 'checker options' button. This is one of the things you should not mess with as you start out.
If a query goes too 'slow' (typically, this means no new files for 180 days), it will be marked DEAD in the same way a thread will, and it will not be checked again. You will get a little popup when this happens. This is all editable as you get a better feel for the system--if you wish, it is completely possible to set up a sub that never dies and only checks once a year.
I do not recommend setting up a sub that needs to check more than once a day. Any search that is producing that many files is probably a bad fit for a subscription. Subscriptions are for lightweight searches that are updated every now and then.
(you might like to come back to this point once you have tried subs for a week or so and want to refine your workflow)
"},{"location":"getting_started_subscriptions.html#presentation","title":"ok, I set up three hundred queries, and now these popup buttons are a hassle","text":"On the edit subscription panel, the 'presentation' options let you publish files to a page. The page will have the subscription's name, just like the button makes, but it cuts out the middle-man and 'locks it in' more than the button, which will be forgotten if you restart the client. Also, if a page with that name already exists, the new files will be appended to it, just like a normal import page! I strongly recommend moving to this once you have several subs going. Make a 'page of pages' called 'subs' and put all your subscription landing pages in there, and then you can check it whenever is convenient.
If you discover your subscription workflow tends to be the same for each sub, you can also customise the publication 'label' used. If multiple subs all publish to the 'nsfw subs' label, they will all end up on the same 'nsfw subs' popup button or landing page. Sending multiple subscriptions' import streams into just one or two locations like this can be great.
You can also hide the main working popup. I don't recommend this unless you are really having a problem with it, since it is useful to have that 'active' feedback if something goes wrong.
Note that subscription file import options will, by default, only present 'new' files. Anything already in the db will still be recorded in the internal import cache and used to calculate next check times and so on, but it won't clutter your import stream. This is different to the default for all the other importers, but when you are ready to enter the ranks of the Patricians, you will know to edit your 'loud' default file import options under options->importing to behave this way as well. Efficient workflows only care about new files.
"},{"location":"getting_started_subscriptions.html#syncing_explanation","title":"how exactly does the sync work?","text":"Figuring out when a repeating search has 'caught up' can be a tricky problem to solve. It sounds simple, but unusual situations like 'a file got tagged late, so it inserted deeper than it ideally should in the gallery search' or 'the website changed its URL format completely, help' can cause problems. Subscriptions are automatic systems, so they tend to be a bit more careful and paranoid about problems, lest they burn 10GB on 10,000 unexpected diaperfur images.
The initial sync is simple. It does a regular search, stopping if it reaches the 'initial file limit' or the last file in the gallery, whichever comes first. The default initial file limit is 100, which is a great number for almost all situations.
Subsequent syncs are more complicated. It ideally 'stops' searching when it reaches files it saw in a previous sync, but if it comes across new files mixed in with the old, it will search a bit deeper. It is not foolproof, and if a file gets tagged very late and ends up a hundred deep in the search, it will probably be missed. There is no good and computationally cheap way at present to resolve this problem, but thankfully it is rare.
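The stopping rule for a subsequent sync can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not hydrus's actual code: the function name, the 'stop_after_seen_run' threshold and the plain list of URLs are all hypothetical. It walks the gallery newest-first, resets its counter whenever a new URL turns up among old ones, and gives up once it meets a solid run of already-seen URLs or hits a file limit.

```python
def find_new_urls(gallery_urls, seen_urls, file_limit=100, stop_after_seen_run=5):
    '''Walk gallery results newest-first and collect unseen URLs.

    Hypothetical illustration only: stop_after_seen_run is an assumed
    threshold, not a real hydrus option.
    '''
    new_urls = []
    seen_run = 0  # consecutive already-seen URLs met so far
    for url in gallery_urls:
        if url in seen_urls:
            seen_run += 1
            if seen_run >= stop_after_seen_run:
                # a solid block of old URLs: treat the sub as caught up
                return new_urls, 'caught up'
        else:
            # a new URL mixed in with old ones: reset and search deeper
            seen_run = 0
            new_urls.append(url)
            if len(new_urls) >= file_limit:
                return new_urls, 'file limit reached'
    return new_urls, 'end of gallery'
```

With a rule like this, a file that was tagged late and sits deeper than the run of old URLs is never reached, which is the missed-file case described above.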
Remember that an important 'staying sane' philosophy of downloading and subscriptions is to focus on dealing with the 99.5% you have before worrying about the 0.5% you do not.
The amount of time between syncs is calculated by the checker options. Based on the timestamps attached to existing urls in the subscription cache (either added time, or the post time as parsed from the url), the sub estimates how long it will be before n new files appear, and the next check is scheduled for then. Unless you know what you are doing, checker options, like file limits, are best left alone. A subscription will naturally adapt its checking speed to the file 'velocity' of the source, and there is usually very little benefit to trying to force a sub to check at a radically different speed.
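As a rough sketch of the idea, the estimate amounts to dividing the number of files you want to wait for by the recent file velocity. Every name and default below (the 30-day window, the one-day and 90-day bounds, 'files_wanted') is an assumption made for illustration, not a real checker option value.

```python
def estimate_next_check_delay(timestamps, now, files_wanted=4, window=30 * 86400,
                              min_delay=86400, max_delay=90 * 86400):
    '''Return seconds to wait until roughly files_wanted new files are expected.

    All parameter names and defaults here are assumptions for illustration.
    '''
    recent = [t for t in timestamps if now - window <= t <= now]
    if not recent:
        # nothing new lately: back off to the slowest allowed check
        return max_delay
    # velocity is len(recent) / window files per second, so the wait
    # for files_wanted files is files_wanted / velocity
    delay = files_wanted * window / len(recent)
    return int(min(max(delay, min_delay), max_delay))
```

A source posting about one file a day gives a wait of about four days here, while a very fast source is clamped to the minimum delay.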
Tip
If you want to force your subs to run at the same time, say every evening, it is easier to just use network->pause->subscriptions as a manual master on/off control. The ones that are due will catch up together, the ones that aren't won't waste your time.
Remember that subscriptions only keep up with new content. They cannot search backwards in time in order to 'fill out' a search, nor can they fill in gaps. Do not change the file limits or check times to try to make this happen. If you want to ensure complete sync with all existing content for a particular search, use the manual downloader.
In practice, most subs only need to check the first page of a gallery since only the first two or three urls are new.
"},{"location":"getting_started_subscriptions.html#periodic_file_limit","title":"periodic file limit exceeded","text":"If, during a regular sync, the sub keeps finding new URLs, never hitting a block of already-seen URLs, it will stop upon hitting its 'periodic file limit', which is also usually 100. When it happens, you will get a popup message notification. There are two typical reasons for this:
The first case is a natural accident of statistics. The subscription now has a 'gap' in its sync. If you want to get what you missed, you can try to fill in the gap with a manual downloader page. Just download to 200 files or so, and the downloader will quickly make a one-time pass through the URLs in the gap.
The second case is a safety stopgap for hydrus. If a site decides to have /post/123456
style URLs instead of post.php?id=123456
style, hydrus will suddenly see those as entirely 'new' URLs. It could also be because of an updated downloader, which pulls URLs in API format or similar. This is again thankfully quite rare, but it triggers several problems--the associated downloader usually breaks, as it does not yet recognise those new URLs, and all your subs for that site will parse through and hit the periodic limit for every query. When this happens, you'll usually get several periodic limit popups at once, and you may need to update your downloader. If you know the person who wrote the original downloader, they'll likely want to know about the problem, or may already have a fix sorted. It is often a good idea to pause the affected subs until you have it figured out and working in a normal gallery downloader page.
On the main subscription dialog, there are 'merge' and 'separate' buttons. These are powerful, but they will walk you through the process of pulling queries out of a sub and merging them back into a different one. Only subs that use the same download source can be merged. Give them a go, and if it all goes wrong, just hit the cancel button on the dialog.
"},{"location":"getting_started_tags.html","title":"Getting started with tags","text":"A tag is a small bit of text describing a single property of something. They make searching easy. Good examples are \"flower\" or \"nicolas cage\" or \"the sopranos\" or \"2003\". By combining several tags together ( e.g. [ 'tiger woods', 'sports illustrated', '2008' ] or [ 'cosplay', 'the legend of zelda' ] ), a huge image collection is reduced to a tiny and easy-to-digest sample.
"},{"location":"getting_started_tags.html#intro","title":"How do we find files?","text":"So, you have some files imported. Let's give them some tags so we can find them again later.
FAQ: what is a tag?
Your client starts with two local tags services, called 'my tags' and 'downloader tags' which keep all of their file->tag mappings in your client's database where only you can see them. 'my tags' is a good place to practise.
Select a file and press F3 to open the manage tags dialog:
The area below where you type is the 'autocomplete dropdown'. You will see this on normal search pages too. Type part of a tag, and matching results will appear below. Since you are starting out, your 'my tags' service won't have many tags in it yet, but things will populate fast! Select the tag you want with the arrow keys and hit enter. If you want to remove a tag, enter the exact same thing again or double-click it in the box above.
Prefixing a tag with a category and a colon will create a namespaced tag. This helps inform the software and other users about what the tag is. Examples of namespaced tags are:
character:batman
series:street fighter
person:jennifer lawrence
title:vitruvian man
The client is set up to draw common namespaces in different colours, just like boorus do. You can change these colours in the options.
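To make the namespace convention concrete, here is a minimal sketch that splits a tag at its first colon. The function name is hypothetical and the real client's tag cleaning rules are more involved than this.

```python
def split_tag(tag):
    '''Split 'namespace:subtag' into (namespace, subtag).

    The namespace is an empty string for an unnamespaced tag.
    Simplified illustration, not the client's own parsing.
    '''
    tag = tag.strip()
    namespace, separator, subtag = tag.partition(':')
    if not separator:
        # no colon at all: the whole thing is the subtag
        return '', tag
    return namespace, subtag
```

For example, split_tag('series:street fighter') gives ('series', 'street fighter'), and split_tag('flower') gives ('', 'flower').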
Once you are happy with your tag changes, click 'apply', or hit F3 again, or simply press Enter on the text box while it is empty. The tags are now saved to your database.
Media Viewer Manage Tags
You can also open the manage tags dialog from the full media viewer, but note that this one does not have 'apply' and 'cancel' buttons, only 'close'. It makes its changes instantly, and you can keep using the rest of the program while it is open (it is a non-'modal' dialog).
Also, you need not close the media viewer's manage tags dialog while you browse. Just like you can hit Enter on the empty text box to close the dialog, hitting Page Up/Down navigates the parent viewer Back/Forward!
Also
Hit Arrow Up/Down on an empty text input to switch between the tag service tabs!
Once you have some tags set, typing the first few characters of one in on a search page will show the counts of all the tags that start with that. Enter the one you want, and the search will run:
If you add more 'predicates' to a search, you will limit the results to those files that match every single one:
You can also exclude a tag by prefixing it with a hyphen (e.g. -solo
).
You can add as many tags as you want. In general, the more search predicates you add, the smaller and faster the results will be, but some types of tag (like excluded -tags
), or the cleverer system
tags that you will soon learn about, can be suddenly CPU expensive. If a search takes more than a few seconds to run, a 'stop' button appears by the tag input. It cancels things out pretty quick in most cases.
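The way predicates combine can be sketched in a few lines: a file matches only if it has every included tag and none of the excluded ones. The names and the in-memory dictionary are assumptions for illustration; the client runs its searches against its database, not like this.

```python
def matches(tags, predicates):
    '''True if a file's tag set satisfies every predicate.'''
    for predicate in predicates:
        if predicate.startswith('-'):
            # a leading hyphen excludes files that have the tag
            if predicate[1:] in tags:
                return False
        elif predicate not in tags:
            return False
    return True


def search(files, predicates):
    '''Return the sorted names of all files matching every predicate.'''
    return sorted(name for name, tags in files.items() if matches(tags, predicates))
```

Each extra predicate can only shrink the result set, which is why adding tags narrows a search.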
Click the links on the left to go through the getting started guide. Subheadings are on the right. Larger sections are up top. Please at least skim every page in the getting started section, as this will introduce you to the main systems in the client. There is a lot, so you do not have to do it all in one go.
The section on installing, updating, and backing up is very important.
This help is available locally in every release. Hit help->help and getting started guide
in the client, or open install_dir/help/index.html
.
I've been on the internet and imageboards for a long time, saving everything I like to my hard drive. After a while, the whole collection was just too large to manage on my own. I couldn't find anything in the mess, and I just saved new files in there with names like 'image1257.jpg'.
There aren't many solutions to this problem that aren't online, and I didn't want to lose my privacy or control.
"},{"location":"introduction.html#anonymous","title":"on being anonymous","text":"I enjoy being anonymous online. When you aren't afraid of repercussions, you can be as truthful as you want and share interesting things, no matter how unusual. You can have unique conversations and tackle some otherwise unsolvable problems. It's fun!
I'm a normal Anon, nothing special. :^)
"},{"location":"introduction.html#hydrus_network","title":"the hydrus network","text":"So! I'm developing a program that helps people organise their files on their own terms and, if they want to, collaborate with others anonymously. I want to help you do what you want with your stuff, and that's it. You can share some tags (and files, but this is limited) with other people if you want to, but you don't have to connect to anything if you don't. The default is complete privacy, no sharing, and every upload requires a conscious action on your part. I don't plan to ever record metrics on users, nor serve ads, nor charge for my software. The software never phones home.
This does a lot more than a normal image viewer. If you are totally new to the idea of personal media collections and booru-style tagging, I suggest you start slow, walk through the getting started guides, and experiment doing different things. If you aren't sure on what a button does, try clicking it! You'll be importing thousands of files and applying tens of thousands of tags in no time. The best way to learn is just to try things out.
The client is chiefly a file database. It stores your files inside its own folders, managing them far better than an explorer window or some online gallery. Here's a screenshot of one of my test installs with a search showing all files:
As well as the client, there is also a server that anyone can run to store files or tags for sharing between many users. This is advanced, and almost always confusing to new users, so do not explore this until you know what you are doing. There is, however, a user-run public tag repository, with more than a billion tags, that you can access and contribute to if you wish.
I have many plans to expand the client and the network.
"},{"location":"introduction.html#principles","title":"statement of principles","text":"None of the above are currently true, but I would love to live in a world where they were. My software is an attempt to move us a little closer.
Where possible, I prefer decentralised systems that are focused on people. I still use gmail and youtube IRL just like pretty much everyone, but I would rather we have alternative systems for alternate work, especially in the future. No one seemed to be making what I wanted for file management, particularly as everything rushed to the cloud space, so I decided to make a local solution myself, and here we are.
If, after a few months, you find you enjoy the software and would like to further support it, I have set up a simple no-reward patreon, which you can read more about here.
"},{"location":"introduction.html#license","title":"license","text":"These programs are free software. Everything I, hydrus dev, have made is under the Do What The Fuck You Want To Public License, Version 3, as published by Kris Craig.
license.txt DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE\n Version 3, May 2010\n\nCopyright (C) 2010 by Kris Craig\nOlympia, WA USA\n\nEveryone is permitted to copy and distribute verbatim or modified\ncopies of this license document, and changing it is allowed as long\nas the name is changed.\n\nThis license applies to any copyrightable work with which it is\npackaged and/or distributed, except works that are already covered by\nanother license. Any other license that applies to the same work\nshall take precedence over this one.\n\nTo the extent permitted by applicable law, the works covered by this\nlicense are provided \"as is\" and do not come with any warranty except\nwhere otherwise explicitly stated.\n\n\n DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE\n TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION, AND MODIFICATION\n\n 0. You just DO WHAT THE FUCK YOU WANT TO.\n
Do what the fuck you want to with my software, and if shit breaks, DEAL WITH IT.
"},{"location":"ipfs.html","title":"IPFS","text":"IPFS is a p2p protocol that makes it easy to share many sorts of data. The hydrus client can communicate with an IPFS daemon to send and receive files.
You can read more about IPFS from their homepage, or this guide that explains its various rules in more detail.
For our purposes, we only need to know about these concepts:
Note there is now a nicer desktop package here. I haven't used it, but it may be a nicer intro to the program.
Get the prebuilt executable here. Inside should be a very simple 'ipfs' executable that does everything. Extract it somewhere and open up a terminal in the same folder, and then type:
ipfs init
ipfs daemon
The IPFS exe should now be running in that terminal, ready to respond to requests:
You can kill it with Ctrl+C and restart it with the ipfs daemon
call again (you only have to run ipfs init
once).
When it is running, opening this page should download and display an example 'Hello World!' file from ~~~across the internet~~~.
Your daemon listens for other instances of ipfs using port 4001, so if you know how to open that port in your firewall and router, make sure you do.
"},{"location":"ipfs.html#connecting","title":"connecting your client","text":"IPFS daemons are treated as services inside hydrus, so go to services->manage services->remote->ipfs daemons and add in your information. Hydrus uses the API port, default 5001, so you will probably want to use credentials of 127.0.0.1:5001
. You can click 'test credentials' to make sure everything is working.
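If you want to check the daemon from outside hydrus, a small script can ask the API port for its version. This is a sketch that assumes the daemon exposes the usual '/api/v0/version' endpoint over POST on the address above; it is not what the 'test credentials' button does internally, and the function names are made up for the example.

```python
import json
import urllib.request


def api_url(host='127.0.0.1', port=5001, command='version'):
    '''Build an IPFS HTTP API URL for the given command.'''
    return 'http://{}:{}/api/v0/{}'.format(host, port, command)


def daemon_version(host='127.0.0.1', port=5001, timeout=5):
    '''Ask a running daemon for its version string.

    Raises OSError (for example URLError) if nothing is listening.
    '''
    request = urllib.request.Request(api_url(host, port), method='POST')
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))['Version']
```

Calling daemon_version() while 'ipfs daemon' is running should return a version string; an OSError suggests the daemon is not running or the port is wrong.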
Thereafter, you will get the option to 'pin' and 'unpin' from a thumbnail's right-click menu, like so:
This works like hydrus's repository uploads--it won't happen immediately, but instead will be queued up at the pending menu. Commit all your pins when you are ready:
Notice how the IPFS icon appears on your pending and pinned files. You can search for these files using 'system:file service'.
Unpin works the same as pin, just like a hydrus repository petition.
Right-clicking any pinned file will give you a new 'share' action:
Which will put it straight in your clipboard. In this case, it is QmP6BNvWfkNf74bY3q1ohtDZ9gAmss4LAjuFhqpDPQNm1S.
If you want to share a pinned file with someone, you have to tell them this multihash. They can then:
http://127.0.0.1:8080/ipfs/[multihash]
http://ipfs.io/ipfs/[multihash]
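The two URL forms above differ only in which gateway serves the file. A tiny helper, with a hypothetical name, makes the substitution explicit:

```python
def gateway_urls(multihash, local_port=8080):
    '''Return the local and public gateway URLs for a multihash.'''
    return {
        'local': 'http://127.0.0.1:{}/ipfs/{}'.format(local_port, multihash),
        'public': 'http://ipfs.io/ipfs/{}'.format(multihash),
    }
```

For the example multihash above, the 'local' entry is http://127.0.0.1:8080/ipfs/QmP6BNvWfkNf74bY3q1ohtDZ9gAmss4LAjuFhqpDPQNm1S.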
If you have many files to share, IPFS also supports directories, and now hydrus does as well. IPFS directories use the same sorts of multihash as files, and you can download them into the hydrus client using the same pages->new download popup->an ipfs multihash menu entry. The client will detect the multihash represents a directory and give you a simple selection dialog:
You may recognise those hash filenames--this example was created by hydrus, which can create ipfs directories from any selection of files from the same right-click menu:
Hydrus will pin all the files and then wrap them in a directory, showing its progress in a popup. Your current directory shares are summarised on the respective services->review services panel:
"},{"location":"ipfs.html#additional_links","title":"additional links","text":"If you find you use IPFS a lot, here are some add-ons for your web browser, as recommended by /tech/:
This script changes all bare ipfs hashes into clickable links to the ipfs gateway (on page loads):
These redirect all gateway links to your local daemon when it's on, and they work well with the previous script:
You can launch the program with several different arguments to alter core behaviour. If you are not familiar with this, you are essentially putting additional text after the launch command that runs the program. You can run this straight from a terminal console (usually good to test with), or you can bundle it into an easy shortcut that you only have to double-click. An example of a launch command with arguments:
C:\\Hydrus Network\\hydrus_client.exe -d=\"E:\\hydrus db\" --no_db_temp_files\n
You can also add --help to your program path, like this:
hydrus_client.py --help
hydrus_server.exe --help
./hydrus_server --help
Which gives you a full listing of all the arguments below. However, this will not work with the built hydrus_client executables, which are bundled as non-console programs and will not give you text output to any console they are launched from. As hydrus_client.exe is the most commonly run version of the program, here is the list, with some more help about each command:
"},{"location":"launch_arguments.html#-d_db_dir_--db_dir_db_dir","title":"-d DB_DIR, --db_dir DB_DIR
","text":"Lets you customise which directory hydrus uses as its base database directory. This is install_dir/db by default, but many advanced deployments will move this around, as described here. When an argument takes a complicated value like a path that could itself include whitespace, you should wrap it in quote marks, like this:
-d=\"E:\\my hydrus\\hydrus db\"\n
"},{"location":"launch_arguments.html#--temp_dir_temp_dir","title":"--temp_dir TEMP_DIR
","text":"This tells all aspects of the client, including the SQLite database, to use a different path for temp operations. This would be by default your system temp path, such as:
C:\\Users\\You\\AppData\\Local\\Temp\n
But you can also check it in help->about. A handful of database operations (PTR tag processing, vacuums) require a lot of free space, so if your system drive is very full, or you have unusual ramdisk-based temp storage limits, you may want to relocate to another location or drive.
"},{"location":"launch_arguments.html#--db_journal_mode_waltruncatepersistmemory","title":"--db_journal_mode {WAL,TRUNCATE,PERSIST,MEMORY}
","text":"Change the journal mode of the SQLite database. The default is WAL, which works great for almost all SSD drives, but if you have a very old or slow drive, or if you encounter 'disk I/O error' errors on Windows with an NVMe drive, try TRUNCATE. Full docs are here.
Briefly:
--db_transaction_commit_period DB_TRANSACTION_COMMIT_PERIOD
","text":"Change the regular duration at which any database changes are committed to disk. By default this is 30 (seconds) for the client, and 120 for the server. Minimum value is 10. Typically, if hydrus crashes, it may 'forget' what happened up to this duration on the next boot. Increasing the duration will result in fewer overall 'commit' writes during very heavy work that makes several changes to the same database pages (read up on WAL mode for more details here), but it will increase commit time and memory/storage needs. Note that changes can only be committed after a job is complete, so if a single job takes longer than this period, changes will not be saved until it is done.
"},{"location":"launch_arguments.html#--db_cache_size_db_cache_size","title":"--db_cache_size DB_CACHE_SIZE
","text":"Change the size of the cache SQLite will use for each db file, in MB. By default this is 256, for 256MB, which for the four main client db files could mean an absolute 1GB peak use if you run a very heavy client and perform a long period of PTR sync. This does not matter so much (nor should it be fully used) if you have a smaller client.
"},{"location":"launch_arguments.html#--db_synchronous_override_0123","title":"--db_synchronous_override {0,1,2,3}
","text":"Change the rules governing how SQLite writes committed changes to your disk. The hydrus default is 1 with WAL, 2 otherwise.
A user has written a full guide on this value here! SQLite docs here.
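The journal mode, synchronous and cache size settings correspond to standard SQLite pragmas, which you can try on a scratch database with Python's built-in sqlite3 module. This is a sketch of the pragmas themselves, not hydrus's own connection code; the function name is hypothetical, and using 'temp_store' for the no-temp-files behaviour is my assumption.

```python
import sqlite3

JOURNAL_MODES = ('WAL', 'TRUNCATE', 'PERSIST', 'MEMORY')


def open_tuned_db(path, journal_mode='WAL', synchronous=1, cache_size_mb=256,
                  temp_in_memory=False):
    '''Open a SQLite file and apply pragmas like those discussed above.

    Returns the connection and the journal mode SQLite reports back.
    '''
    if journal_mode not in JOURNAL_MODES:
        raise ValueError('unsupported journal mode: {}'.format(journal_mode))
    conn = sqlite3.connect(path)
    # journal_mode returns the mode actually in effect, in lower case
    mode = conn.execute('PRAGMA journal_mode = {}'.format(journal_mode)).fetchone()[0]
    conn.execute('PRAGMA synchronous = {}'.format(int(synchronous)))
    # a negative cache_size is a size in KiB rather than a page count
    conn.execute('PRAGMA cache_size = {}'.format(-int(cache_size_mb) * 1024))
    if temp_in_memory:
        # assumption: keep temporary tables in memory instead of temp files
        conn.execute('PRAGMA temp_store = MEMORY')
    return conn, mode
```

Point it at a throwaway file, never at a live client database, and compare the reported mode with the one you asked for.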
"},{"location":"launch_arguments.html#--no_db_temp_files","title":"--no_db_temp_files
","text":"When SQLite performs very large queries, it may spool temporary table results to disk. These go in your temp directory. If your temp dir is slow but you have a ton of memory, set this to never spool to disk, as here.
"},{"location":"launch_arguments.html#--boot_debug","title":"--boot_debug
","text":"Prints additional debug information to the log during the bootup phase of the application.
"},{"location":"launch_arguments.html#--profile_mode","title":"--profile_mode
","text":"This starts the program with 'Profile Mode' turned on, which captures the performance of boot functions. This is also a way to get Profile Mode on the server, although support there is very limited.
"},{"location":"launch_arguments.html#--win_qt_darkmode_test","title":"--win_qt_darkmode_test
","text":"Windows only, client only: This starts the program with Qt's 'darkmode' detection enabled, as here, set to 1 mode. It will override any existing qt.conf, so it is only for experimentation. We are going to experiment more with the 2 mode, but that locks the style to windows
, and can't handle switches between light and dark mode.
The server supports the same arguments. It also takes an optional positional argument of 'start' (start the server, the default), 'stop' (stop any existing server), or 'restart' (do a stop, then a start), which should go before any of the above arguments.
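To see how these arguments fit together, here is a hypothetical parser built with Python's argparse that mirrors the names and choices listed above. It is an illustration of the command-line shape only, not the program's real argument handling, and it leaves out the Windows-only darkmode test flag.

```python
import argparse


def build_parser():
    '''A hypothetical parser mirroring the arguments described above.'''
    parser = argparse.ArgumentParser(description='launch argument sketch')
    # the server's optional positional action goes before the other arguments
    parser.add_argument('action', nargs='?', default='start',
                        choices=['start', 'stop', 'restart'])
    parser.add_argument('-d', '--db_dir')
    parser.add_argument('--temp_dir')
    parser.add_argument('--db_journal_mode', default='WAL',
                        choices=['WAL', 'TRUNCATE', 'PERSIST', 'MEMORY'])
    parser.add_argument('--db_transaction_commit_period', type=int)
    parser.add_argument('--db_cache_size', type=int)
    parser.add_argument('--db_synchronous_override', type=int, choices=[0, 1, 2, 3])
    parser.add_argument('--no_db_temp_files', action='store_true')
    parser.add_argument('--boot_debug', action='store_true')
    parser.add_argument('--profile_mode', action='store_true')
    return parser
```

Parsing ['restart', '--db_journal_mode', 'TRUNCATE'] with this sketch yields action 'restart' and journal mode 'TRUNCATE', and a value containing spaces arrives intact as long as the shell passes it as one quoted argument.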
"},{"location":"local_booru.html","title":"local booru","text":"Warning
This was a fun project, but it never advanced beyond a prototype. The future of this system is other people's nice applications plugging into the Client API.
The hydrus client has a simple booru to help you share your files with others over the internet.
First of all, this is hosted from your client, which means other people will be connecting to your computer and fetching files you choose to share from your hard drive. If you close your client or shut your computer down, the local booru will no longer work.
"},{"location":"local_booru.html#setting_up","title":"how to do it","text":"First of all, turn the local booru server on by going to services->manage services and giving it a port:
It doesn't matter what you pick, but make it something fairly high. When you ok that dialog, the client should start the booru. You may get a firewall warning.
Then right click some files you want to share and select share->local booru. This will throw up a small dialog, like so:
This lets you enter an optional name, which titles the share and helps you keep track of it, an optional text, which lets you say some words or html to the people you are sharing with, and an expiry, which lets you determine if and when the share will no longer work.
You can also copy either the internal or external link to your clipboard. The internal link (usually starting something like http://127.0.0.1:45866/
) works inside your network and is great just for testing, while the external link (starting http://[your external ip address]:[external port]/
) will work for anyone around the world, as long as your booru's port is being forwarded correctly.
If you use a dynamic-ip service like No-IP, you can replace your external IP with your redirect hostname. You have to do it by hand right now, but I'll add a way to do it automatically in future.
Danger
Note that anyone with the external link will be able to see your share, so make sure you only share links with people you trust.
"},{"location":"local_booru.html#port_forwarding","title":"forwarding your port","text":"Your home router acts as a barrier between the computers inside the network and the internet. Those inside can see out, but outsiders can only see what you tell the router to permit. Since you want to let people connect to your computer, you need to tell the router to forward all requests of a certain kind to your computer, and thus your client.
If you have never done this before, it can be a headache, especially doing it manually. Luckily, a technology called UPnP makes it a ton easier, and this is how your Skype or Bittorrent clients do it automatically. Not all routers support it, but most do. You can have hydrus try to open a port this way back on services->manage services. Unless you know what you are doing and have a good reason to make them different, you might as well keep the internal and external ports the same.
Once you have it set up, the client will try to make sure your router keeps that port open for your client. If it all works, you should see the new mapping appear in your services->manage local upnp dialog, which lists all your router's current port mappings.
If you want to test that the port forward is set up correctly, going to http://[external ip]:[external port]/
should give a little html just saying hello. Your ISP might not allow you to talk to yourself, though, so ask a friend to try if you are having trouble.
If you still do not understand what is going on here, this is a good article explaining everything.
If you do not like UPnP or your router does not support it, you can set the port forward up manually, but I encourage you to keep the internal and external port the same, because absent a 'upnp port' option, the 'copy external share link' button will use the internal port.
"},{"location":"local_booru.html#example","title":"so, what do you get?","text":"The html layout is very simple:
It uses a very similar stylesheet to these help pages. If you would like to change the style, have a look at the html and then edit install_dir/static/local_booru_style.css. The thumbnails will be the same size as in your client.
"},{"location":"local_booru.html#editing_shares","title":"editing an existing share","text":"You can review all your shares on services->review services, under local->booru. You can copy the links again, change the title/text/expiration, and delete any shares you don't want any more.
"},{"location":"local_booru.html#future","title":"future plans","text":"This was a fun project, but it never advanced beyond a prototype. The future of this system is other people's nice applications plugging into the Client API.
"},{"location":"petitionPractices.html","title":"Petitions practices","text":"This document exists to give a rough idea what to do in regard to the PTR to avoid creating unnecessary work for the janitors.
"},{"location":"petitionPractices.html#general_practice","title":"General practice","text":"Kindly avoid creating unnecessary work. Create siblings for underscore and non-namespaced/namespaced versions. Petition for deletion if they are wrong. Providing a reason outside of the stock choices helps the petition get accepted. If, for whatever reason, you have some mega job that needs doing, it's often a good idea to talk to a janitor instead, since we can just go ahead and do the job directly without having to deal with potentially tens of petitions because of how Hydrus splits them on the server. An example that we often come across is the removal of the awful Sankaku URLs that are almost everywhere these days due to people using a faulty parser. It's a pretty easy search and delete for a janitor, but a lot of annoying clicking if dealt with as a petition since one big petition can be split out to God-only-knows-how many.
Eventually the PTR janitors will get tools to replace various bad but correct tags on the server itself. These include underscored, wrong or no namespace, common misspelling, wrong locale, and so on. Since we're going to have to do the job eventually anyway there's not much of a point making us do it twice by petitioning the existing bad but correct tags. Just sibling them and leave them be for now.
"},{"location":"petitionPractices.html#ambiguity","title":"Ambiguity","text":"Don't make additions involving ambiguous tags. hibiki
-> character:hibiki (kantai collection)
is bad since there's far more than one character with that name. There's quite a few wrongly tagged images because of things like this. Petitioning the deletion of such a bad sibling is good.
Anything that's covered by system predicates. Siblinging these is unnecessary and parenting pointless. There's no harm leaving them be aside from crowding the tag list but there's no harm to deleting them either.
system:dimensions
covers most everything related to resolution and aspect ratios. medium:high resolution
, 4:3 aspect ratio
, and pixel count.
system:duration
for whether something has duration (is a video or animated gif/png/whatever), or is a still image.
system:has audio
for if an image has audio or not. system:has duration + system:no audio
replaces video with no sound
as an example.
system:filesize
for things like huge filesize
.
system:filetype
for filetypes. Gif, webm, mp4, psd, and so on. Anything that Hydrus can recognise, which is quite a bit.
Don't push parents for tags that are not top-level siblings. It makes tracking down potential issues hard.
Only push parents for relations that are literally always true, no exceptions. character:james bond
-> series:james bond
is a good example because James Bond always belongs to that series. -> gender:male
is bad because an artist might decide to draw a genderbent piece of art. Similarly -> person:pierce brosnan
is bad because there have been other actors for the character.
List of some bad parents to character:
tags as an example:
- species:
due to the various -zations (humanization, animalization, mechanization).
- creator:
since just about anybody can draw art of the character.
- gender:
since genderswap
and its variations exist.
- Any form of physical characteristics such as hair or eye colour, hair length, clothing and accessories, etc.
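The reason a parent must be true without exception is that it is applied mechanically: every file with the child tag gains the parent too. A minimal sketch of that expansion, assuming parents are kept in a simple dictionary (a hypothetical structure, not the server's storage):

```python
def apply_parents(tags, parents):
    '''Return the tags plus every ancestor implied by the parents mapping.

    parents maps a child tag to a list of its parent tags.
    '''
    result = set(tags)
    todo = list(tags)
    while todo:
        tag = todo.pop()
        for parent in parents.get(tag, ()):
            if parent not in result:
                # each newly added parent may have parents of its own
                result.add(parent)
                todo.append(parent)
    return result
```

With 'character:james bond' mapped to 'series:james bond', every James Bond file picks up the series tag, which is fine. Map it to 'gender:male' instead and the one genderbent image is now tagged wrongly, with no per-file way to opt out.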
Translations should be siblinged to what the closest in-use romanised tag is if there's no proper translation. If the tag is ambiguous, such as \u97ff
or \u30d2\u30d3\u30ad
which means hibiki
, just sibling them to the ambiguous tag. The tag can then later on be deleted and replaced by a less ambiguous tag. On the other hand, \u97ff(\u8266\u968a\u3053\u308c\u304f\u3057\u3087\u3093)
straight up means hibiki (kantai collection)
and can safely be siblinged to the proper character:
tag. Do the same for subjective tags. \u9b45\u60d1\u306e\u3075\u3068\u3082\u3082
can be translated to bewitching thighs
. \u307e\u3063\u305f\u304f\u3001\u99c6\u9010\u8266\u306f\u6700\u9ad8\u3060\u305c!!
straight up translates to Geez, destroyers are the best!!
, which does not contain much usable information for Hydrus currently. These can then either be siblinged down to an unsubjective tag (thighs
) if there's objective information in the tag, deleted if purely subjective, or deleted and replaced if ambiguous.
tl;dr
Using a trustworthy VPN for all your remotely fun internet traffic is a good idea. It is cheap and easy these days, and it offers multiple levels of general protection.
I have tried very hard to ensure the hydrus network servers respect your privacy. They do not work like normal websites, and the amount of information your client will reveal to them is very limited. For most general purposes, normal users can rest assured that their activity on a repository like the Public Tag Repository (PTR) is effectively completely anonymous.
You need an account to connect, but all that really means serverside is a random number with a random passcode. Your client tells nothing more to the server than the exact content you upload to it (e.g. tag mappings, which are a tag+file_hash pair). The server cannot help but be aware of your IP address to accept your network request, but in all but one situation--uploading a file to a file repository when the administrator has set to save IPs for DMCA purposes--it forgets your IP as soon as the job is done.
So that janitors can process petitions efficiently and correct mistakes, servers remember which accounts upload which content, but they do not communicate this to any place, and the memory only lasts for a certain time--after which the content is completely anonymised. The main potential privacy worries are over a malicious janitor or--more realistically, since the janitor UI is even more buggy and feature-poor than the hydrus front-end!--a malicious server owner or anyone else who gains access to the server's raw database files or its code as it operates. Even in the case where you cannot trust the server you are talking to, hydrus should be fairly robust, simply because the client does not say much to the server, nor that often. The only realistic worries, as I talk about in detail below, are if you actually upload personal files or tag personal files with real names. I can't do much about being Anon if you (accidentally or not) declare who you are.
So, in general, if you are on a good VPN and tagging anime babes from boorus, I think we are near perfect on privacy. That said, our community is rightly constantly thinking about this topic, so in the following I have tried to go into exhaustive detail. Some of the vulnerabilities are impractical and esoteric, but if nothing else it is fun to think about. If you can think of more problems, or decent mitigations, let me know!
"},{"location":"privacy.html#https_certificates","title":"https certificates","text":"Hydrus servers only communicate in https, so anyone who is able to casually observe your traffic (say your roommate cracked your router, or the guy running the coffee shop whose wifi you are using likes to snoop) should not ever be able to see what data you are sending or receiving. If you do not use a VPN, they will be able to see that you are talking to the repository (and the repository will technically see who you are, too, though as above, it normally isn't interested). Someone more powerful, like your ISP or Government, may be able to do more:
If you just start a new server yourself: When you first make a server, the 'certificate' it creates to enable https is a low quality one. It is called 'self-signed' because it is only endorsed by itself and it is not tied to a particular domain on the internet that everyone agrees on via DNS. Your traffic to this server is still encrypted, but an advanced attacker who stands between you and the server could potentially perform what is called a man-in-the-middle attack and see your traffic.
This problem is fairly mitigated by using a VPN, since even if someone were able to MitM your connection, they know no more than your VPN's location, not your IP.
A future version of the network will further mitigate this problem by having you enter unverified certificates into a certificate manager and then compare to that store on future requests, to try to detect if a MitM attack is occurring.
If the server is on a domain and now uses a proper verified certificate: If the admin hosts the server on a website domain (rather than a raw IP address) and gets a proper certificate for that domain from a service like Let's Encrypt, they can swap that into the server and then your traffic should be protected from any eavesdropper. It is still good to use a VPN to further obscure who you are, including from the server admin. You can check how good a server's certificate is by loading its base address in the form https://host:port
into your browser. If it has a nice certificate--like the PTR--the welcome page will load instantly. If it is still on self-signed, you'll get one of those 'can't show this page unless you make an exception' browser error pages before it will show.
An account has two hex strings, like this:
Access key: 4a285629721ca442541ef2c15ea17d1f7f7578b0c3f4f5f2a05f8f0ab297786f
This is in your services->manage services panel, and acts like a password. Keep this absolutely secret--only you know it, and no one else ever needs to. If the server has not had its code changed, it does not actually know this string, but it stores special data that lets it verify it when you 'log in'.
Account ID: 207d592682a7962564d52d2480f05e72a272443017553cedbd8af0fecc7b6e0a
This can be copied from a button in your services->review services panel, and acts a bit like a semi-private username. Only janitors should ever have access to this. If you ever want to contact the server admin about an account upgrade or similar, they will need to know this so they can load up your account and alter it.
When you generate a new account, the client first asks the server for a list of available auto-creatable account types, then asks for a registration token for one of them, then uses the token to generate an access key. The server is never told anything about you, and it forgets your IP address as soon as it finishes talking to you.
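As a rough illustration of how a server can verify an access key without ever storing it, here is a minimal sketch that keeps only a SHA-256 digest of the key. The function names and the choice of SHA-256 are assumptions made for this example; it is not the actual hydrus server code.

```python
import hashlib
import hmac
import secrets

def generate_access_key():
    # 32 random bytes, which display as 64 hex characters like the example above
    return secrets.token_bytes(32)

def digest_for_storage(access_key):
    # Hypothetical server side: store only this digest, never the key itself
    return hashlib.sha256(access_key).digest()

def verify(presented_key, stored_digest):
    # Re-hash what the client presents and compare in constant time
    presented_digest = hashlib.sha256(presented_key).digest()
    return hmac.compare_digest(presented_digest, stored_digest)

access_key = generate_access_key()
stored_digest = digest_for_storage(access_key)

print(access_key.hex())
print(verify(access_key, stored_digest))
print(verify(secrets.token_bytes(32), stored_digest))
```

Under this scheme, someone who copies the stored digest still cannot log in with it, since the server expects the original key.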
Your account also stores a bandwidth use record and some miscellaneous data such as when the account was created, if and when it expires, what permissions and bandwidth rules it has, an aggregate score of how often it has petitions approved rather than denied, and whether it is currently banned. I do not think someone inspecting the bandwidth record could figure out what you were doing based on byte counts (especially as with every new month the old month's bandwidth records are compressed to just one number) beyond the rough time you synced and whether you have done much uploading. Since only a janitor can see your account and could feasibly attempt to inspect bandwidth data, they would already know this information.
"},{"location":"privacy.html#downloading","title":"downloading","text":"When you sync with a repository, your client will download and then keep up to date with all the metadata the server knows. This metadata is downloaded the same way by all users, and it comes in a completely anonymous format. The server does not know what you are interested in, and no one who downloads knows who uploaded what. Since the client regularly updates, a detailed analysis of the raw update files will reveal roughly when a tag or other row was added or deleted, although that timestamp is no more precise than the duration of the update period (by default, 100,000 seconds, or a little over a day).
Your client will never ask the server for information about a particular file or tag. You download everything in generic chunks, form a local index of that information, and then all queries are performed on your own hard drive with your own CPU.
By just downloading, even if the server owner were to identify you by your IP address, all they know is that you sync. They cannot tell anything about your files.
In the case of a file repository, your client downloads all the thumbnails automatically, but then you download actual files separately as you like. The server does not log which files you download.
"},{"location":"privacy.html#uploading","title":"uploading","text":"When you upload, your account is temporarily linked to the rows of content you add. This is so janitors can group petitions by who makes them, undo large mistakes easily, and even leave you a brief message (like \"please stop adding those clothing siblings\") for your client to pick up the next time it syncs your account. After the temporary period is over, all submissions are anonymised. So, what are the privacy concerns with that? Isn't the account 'Anon'?
Privacy can be tricky. Hydrus tech is obviously far, far better than anything normal consumers use, but here I believe are the remaining barriers to pure Anonymity, assuming someone with resources was willing to put a lot of work in to attack you:
Note
I am using the PTR as the example since that is what most people are using. If you are uploading to a server run between friends, privacy is obviously more difficult to preserve--if there are only three users, it may not be too hard to figure out who is uploading the NarutoXSonichu diaperfur content! If you are talking to a server with a small group of users, don't upload anything crazy or personally identifying unless that's the point of the server.
"},{"location":"privacy.html#ip_address_across_network","title":"IP Address Across Network","text":"Attacker: ISP/Government.
Exposure: That you use the PTR.
Problem: Your IP address may be recorded by servers in between you and the PTR (e.g. your ISP/Government). Anyone who could convert that IP address and timestamp into your identity would know you were a PTR user.
Mitigation: Use a trustworthy VPN.
"},{"location":"privacy.html#ip_address_at_ptr","title":"IP Address At PTR","text":"Attacker: PTR administrator or someone else who has access to the server as it runs.
Exposure: Which PTR account you are.
Problem: I may be lying to you about the server forgetting IPs, or the admin running the PTR may have secretly altered its code. If the malicious admin were able to convert IP address and timestamp into your identity, they would obviously be able to link that to your account and thus its various submissions.
Mitigation: Use a trustworthy VPN.
"},{"location":"privacy.html#time_identifiable_uploads","title":"Time Identifiable Uploads","text":"Attacker: Anyone with an account on the PTR.
Exposure: That you use the PTR.
Problem: If a tag was added way before the file was public, then it is likely the original owner tagged it. An example would be if you were an artist and you tagged your own work on the PTR two weeks before publishing the work. Anyone who looked through the server updates carefully and compared to file publish dates, particularly if they were targeting you already, could notice the date discrepancy and know you were a PTR user.
Mitigation: Don't tag any file you plan to share if you are currently the only person who has any copies. Upload it, then tag it.
"},{"location":"privacy.html#content_identifiable_uploads","title":"Content Identifiable Uploads","text":"Attacker: Anyone with an account on the PTR.
Exposure: That you use the PTR.
Problem: All uploads are shared anonymously with other users, but if the content itself is identifying, you may be exposed. An example would be if there was some popular lewd file floating around of you and your girlfriend, but no one knew who was in it. If you decided to tag it with accurate 'person:' tags, anyone synced with the PTR, when they next looked at that file, would see those person tags. The same would apply if the file was originally private but then leaked.
Mitigation: Just like an imageboard, do not upload any personally identifying information.
"},{"location":"privacy.html#individual_account_cross-referencing","title":"Individual Account Cross-referencing","text":"Attacker: PTR administrator or someone else with access to the server database files after one of your uploads has been connected to your real identity, perhaps with a Time/Content Identifiable Upload as above.
Exposure: What you have been uploading recently.
Problem: If you accidentally tie your identity to an individual content row (could be as simple as telling an admin 'yes, I, person whose name you know, uploaded that sibling last week'), then anyone who can see which accounts uploaded what will obviously be able to see your other uploads.
Mitigation: Best practice is not to reveal specifically what you upload. Note that this vulnerability (an admin looking up what else you uploaded after they discover something else you did) is now well mitigated by the account history anonymisation as below (assuming the admin has not altered the code to disable it!). If the server is set to anonymise content after 90 days, then your account can only be identified from specific content rows that were uploaded in the past 90 days, and cross-references would also only see the last 90 days of activity.
"},{"location":"privacy.html#big_brain_individual_account_mapping_fingerprint_cross-referencing","title":"Big Brain Individual Account Mapping Fingerprint Cross-referencing","text":"Attacker: Someone who has access to tag/file favourite lists on another site and gets access to a hydrus repository that has been compromised to not anonymise history for a long duration.
Exposure: Which PTR account another website's account uses.
Problem: Someone who had raw access to the PTR database's historical account record (i.e. they had disabled the anonymisation routine below) and also had compiled some booru users' 'favourite tag/artist' lists and was very clever could try to cross reference those two lists and connect a particular PTR account to a particular booru account based on similar tag distributions. There would be many holes in the PTR record, since only the first account to upload a tag mapping is linked to it, but maybe it would be possible to get high confidence on a match if you have really distinct tastes. Favourites lists are probably decent digital fingerprints, and there may be a shadow of that in your PTR uploads, although I also think there are enough users uploading and 'competing' for saved records on different tags that each user's shadow would be too indistinct to really pull this off.
Mitigation: I am mostly memeing here. But privacy is tricky, and who knows what the scrapers of the future are going to do with all the cloud data they are sucking up. Even then, the historical anonymisation routine below now generally eliminates this threat, assuming the server has not been compromised to disable it, so it matters far less if its database files fall into bad hands in the future, but accounts on regular websites are already being aggregated by the big marketing engines, and this will only happen in more clever ways in future. I wouldn't be surprised if booru accounts are soon being connected to other online identities based on fingerprint profiles of likes and similar. Don't save your spicy favourites on a website, even if that list is private, since if that site gets hacked or just bought out one day, someone really smart could start connecting dots ten years from now.
"},{"location":"privacy.html#account_history","title":"account history anonymisation","text":"As the PTR moved to multiple accounts, we talked more about the potential account cross-referencing worries. The threats are marginal today, but it may be a real problem in future. If the server database files were to ever fall into bad hands, having a years-old record of who uploaded what is not excellent. Like the AOL search leak, that data may have unpleasant rammifications, especially to an intelligent scraper in the future. This historical record is also not needed for most janitorial work.
Therefore, hydrus repositories now completely anonymise all uploads after a certain delay. It works by assigning ownership of every file, mapping, or tag sibling/parent to a special 'null' account, so all trace that your account uploaded any of it is deleted. It happens by default 90 days after the content is uploaded, but it can be more or less depending on the local admin and janitors. You can see the current 'anonymisation' period under review services.
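The idea can be sketched with a toy table. The schema, the NULL_ACCOUNT_ID value, and the function name below are all hypothetical and chosen only for this example; the real server's database layout is different.

```python
import sqlite3

NULL_ACCOUNT_ID = 0                 # hypothetical id for the special 'null' account
ANONYMISATION_PERIOD = 90 * 86400   # the default 90 days, in seconds

def anonymise_old_rows(db, now):
    # Hand every sufficiently old row over to the null account
    cutoff = now - ANONYMISATION_PERIOD
    cursor = db.execute(
        'UPDATE mappings SET account_id = ? WHERE upload_time < ? AND account_id != ?',
        (NULL_ACCOUNT_ID, cutoff, NULL_ACCOUNT_ID),
    )
    return cursor.rowcount

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE mappings (tag TEXT, file_hash TEXT, account_id INTEGER, upload_time INTEGER)')

now = 1700000000
db.executemany('INSERT INTO mappings VALUES (?, ?, ?, ?)', [
    ('character:samus aran', 'aa11', 7, now - 100 * 86400),  # older than 90 days
    ('series:metroid', 'aa11', 7, now - 10 * 86400),         # still recent
])

changed = anonymise_old_rows(db, now)
rows = db.execute('SELECT tag, account_id FROM mappings ORDER BY upload_time').fetchall()
print(changed, rows)
```

After the routine runs, the old row no longer records who uploaded it, while the recent row stays linked to its account so janitors can still act on it.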
If you are a janitor with the ability to modify accounts based on uploaded content, you will see anything old will bring up the null account. It is specially labelled, so you can't miss it. You cannot ban or otherwise alter this account. No one can actually use it.
"},{"location":"reducing_lag.html","title":"reducing lag","text":""},{"location":"reducing_lag.html#intro","title":"hydrus is cpu and hdd hungry","text":"The hydrus client manages a lot of complicated data and gives you a lot of power over it. To add millions of files and tags to its database, and then to perform difficult searches over that information, it needs to use a lot of CPU time and hard drive time--sometimes in small laggy blips, and occasionally in big 100% CPU chunks. I don't put training wheels or limiters on the software either, so if you search for 300,000 files, the client will try to fetch that many.
Furthermore, I am just one unprofessional guy dealing with a lot of legacy code from when I was even worse at programming. I am always working to reduce lag and other inconveniences, and improve UI feedback when many things are going on, but there is still a lot for me to do.
In general, the client works best on snappy computers with low-latency hard drives where it does not have to constantly compete with other CPU- or HDD- heavy programs. Running hydrus on your games computer is no problem at all, but if you leave the client on all the time, then make sure under the options it is set not to do idle work while your CPU is busy, so your games can run freely. Similarly, if you run two clients on the same computer, you should have them set to work at different times, because if they both try to process 500,000 tags at once on the same hard drive, they will each slow to a crawl.
If you run on an HDD, keeping it defragged is very important, and good practice for all your programs anyway. Make sure you know what this is and that you do it.
"},{"location":"reducing_lag.html#maintenance_and_processing","title":"maintenance and processing","text":"I have attempted to offload most of the background maintenance of the client (which typically means repository processing and internal database defragging) to time when you are not using the client. This can either be 'idle time' or 'shutdown time'. The calculations for what these exactly mean are customisable in file->options->maintenance and processing.
If you run a quick computer, you likely don't have to change any of these options. Repositories will synchronise and the database will stay fairly optimal without you even noticing the work that is going on. This is especially true if you leave your client on all the time.
If you have an old, slower computer though, or if your hard drive is high latency, make sure these options are set for whatever is best for your situation. Turning off idle time completely is often helpful as some older computers are slow to even recognise--mid task--that you want to use the client again, or take too long to abandon a big task half way through. If you set your client to only do work on shutdown, then you can control exactly when that happens.
"},{"location":"reducing_lag.html#reducing_lag","title":"reducing search and general gui lag","text":"Searching for tags via the autocomplete dropdown and searching for files in general can sometimes take a very long time. It depends on many things. In general, the more predicates (tags and system:something) you have active for a search, and the more specific they are, the faster it will be.
You can also look at file->options->speed and memory. Increasing the autocomplete thresholds under tags->manage tag display and search is also often helpful. You can even force autocompletes to only fetch results when you manually ask for them.
Having lots of thumbnails open or downloads running can slow many things down. Check the 'pages' menu to see your current session weight. If it is about 50,000, or you have individual pages with more than 10,000 files or download URLs, try cutting down a bit.
"},{"location":"reducing_lag.html#profiles","title":"finally - profiles","text":"Programming is all about re-editing your first, second, third drafts of an idea. You are always going back to old code and adding new features or making it work better. If something is running slow for you, I can almost always speed it up or at least improve the way it schedules that chunk of work.
However, figuring out exactly why something is running slow or holding up the UI is tricky and often gives an unexpected result. I can guess what might be running inefficiently from reports, but what I really need to be sure is a profile, which drills down into every function of a job, counting how many times they are called and timing how long they take. A profile for a single call looks like this.
So, please let me know:
You can generate a profile by hitting help->debug->profiling->profile mode, which tells the client to generate profile information for almost all of its behind the scenes jobs. This can be spammy, so don't leave it on for a very long time (you can turn it off by hitting the help menu entry again).
Turn on profile mode, do the thing that runs slow for you (importing a file, fetching some tags, whatever), and then check your database folder (most likely install_dir/db) for a new 'client profile - DATE.log' file. This file will be filled with several sets of tables with timing information. Please send that whole file to me, or if it is too large, cut what seems important. It should not contain any personal information, but feel free to look through it.
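If you are curious what kind of table ends up in such a file, here is a minimal sketch using Python's standard cProfile and pstats modules. This is only an illustration of per-function call counts and timings; slow_job is a made-up stand-in, and the client's own profile mode may gather and format its data differently.

```python
import cProfile
import io
import pstats

def slow_job():
    # A stand-in for some chunk of work that feels laggy
    total = 0
    for i in range(200):
        total += sum(range(i))
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_job()
profiler.disable()

# Render the call counts and timings into a text table
output = io.StringIO()
stats = pstats.Stats(profiler, stream=output)
stats.sort_stats('cumulative').print_stats(10)
report = output.getvalue()
print(report)
```

The 'ncalls' column counts how often each function ran, and the 'cumtime' column shows how long it took including everything it called.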
There are several ways to contact me.
"},{"location":"running_from_source.html","title":"running from source","text":"I write the client and server entirely in python, which can run straight from source. It is getting simpler and simpler to run python programs like this, so don't be afraid of it. If none of the built packages work for you (for instance if you use Windows 8.1 or 18.04 Ubuntu (or equivalent)), it may be the only way you can get the program to run. Also, if you have a general interest in exploring the code or wish to otherwise modify the program, you will obviously need to do this.
"},{"location":"running_from_source.html#simple_setup_guide","title":"Simple Setup Guide","text":"There are now setup scripts that make this easy on Windows and Linux. You do not need any python experience.
"},{"location":"running_from_source.html#summary","title":"Summary:","text":"First of all, you will need to install Python. Get 3.10 or 3.11 here. During the install process, make sure it has something like 'Add Python to PATH' checked. This makes Python available to your Windows.
You should already have a fairly new python. Ideally, you want at least 3.9.
You should already have python of about the correct version.
If you are already on a very new version of python, that's ok--you might need to select the 'advanced' setup later on and choose the '(t)est' options. If you are stuck on a much older version of python, try the same thing, but with the '(o)lder' options (but I can't promise it will work!).
Then, get the hydrus source. The github repo is https://github.com/hydrusnetwork/hydrus. If you are familiar with git, you can just clone the repo to the location you want with git clone https://github.com/hydrusnetwork/hydrus
, but if not, then just go to the latest release and download and extract the source code .zip somewhere. Make sure the directory has write permissions (e.g. don't put it in \"Program Files\"). Extracting straight to a spare drive, something like \"D:\\Hydrus Network\", is ideal.
We will call the base extract directory, the one with 'hydrus_client.py' in it, install_dir
.
Mixed Builds
Don't mix and match build extracts and source extracts. The process that runs the code gets confused if there are unexpected extra .dlls in the directory. If you need to convert between built and source releases, perform a clean install.
If you are converting from one install type to another, make a backup before you start. Then, if it all goes wrong, you'll always have a safe backup to rollback to.
"},{"location":"running_from_source.html#built_programs","title":"Built Programs","text":"There are three special external libraries. You just have to get them and put them in the correct place:
Windows:
mpv
mpv-2.dll
.Then open that archive and place the 'mpv-1.dll' or 'mpv-2.dll' into install_dir
.
I have word that the newer mpv, the API version 2.1 that you have to rename to mpv-2.dll, will work on Qt5 and Windows 7. If this applies to you, feel free to have a play around with different versions here. You'll need the newer mpv choice in the setup-venv script too, which, depending on your situation, may not be possible.
SQLite3
Go to install_dir/static/build_files/windows
and copy 'sqlite3.dll' into install_dir
.
FFMPEG
Get a Windows build of FFMPEG here.
Extract the ffmpeg.exe into install_dir/bin
.
Linux:
mpv
Try running apt-get install libmpv1
in a new terminal. You can type apt show libmpv1
to see your current version. Or, if you use a different package manager, try searching libmpv
or libmpv1
on that.
SQLite3
No action needed.
FFMPEG
You should already have ffmpeg. Just type ffmpeg
into a new terminal, and it should give a basic version response. If you somehow don't have ffmpeg, check your package manager.
If you run into trouble running newer versions of Qt6, which you will be setting up later, some users have fixed it by installing the packages libicu-dev
and libxcb-cursor-dev
. With apt
that will be:
sudo apt-get install libicu-dev
sudo apt-get install libxcb-cursor-dev
macOS:
mpv
Unfortunately, mpv is not well supported in macOS yet. You may be able to install it in brew, but it seems to freeze the client as soon as it is loaded. Hydev is thinking about fixes here.
SQLite3
No action needed.
FFMPEG
You should already have ffmpeg.
Double-click setup_venv.bat
.
The file is setup_venv.sh
. You may be able to double-click it. If not, open a terminal in the folder and type:
./setup_venv.sh
If you do not have permission to execute the file, do this before trying again:
chmod +x setup_venv.sh
You will likely have to do the same on the other .sh files.
If you get an error about the venv failing to activate during setup_venv.sh
, you may need to install venv especially for your system. The specific error message should help you out, but you'll be looking at something along the lines of apt install python3.10-venv
.
If you like, you can run the setup_desktop.sh
file to install a hydrus.desktop file to your applications folder. (Or check the template in install_dir/static/hydrus.desktop
and do it yourself!)
Double-click setup_venv.command
.
If you do not have permission to run the .command file, then open a terminal on the folder and enter:
chmod +x setup_venv.command
You will likely have to do the same on the other .command files.
You may need to experiment with the advanced choices, especially if your macOS is a little old.
The setup will ask you some questions. Just type the letters it asks for and hit enter. Most users are looking at the (s)imple setup, but if your situation is unusual, try the (a)dvanced, which will walk you through the main decisions. Once ready, it should take a minute to download its packages and a couple minutes to install them. Do not close it until it is finished installing everything and says 'Done!'. If it seems like it hung, just give it time to finish.
If something messes up, or you want to make a different decision, just run the setup script again and it will reinstall everything. Everything these scripts do ends up in the 'venv' directory, so you can also just delete that folder to 'uninstall' the venv. It should just work on most normal computers, but let me know if you have any trouble.
Then run the 'setup_help' script to build the help. This isn't necessary, but it is nice to have it built locally. You can run this again at any time to rebuild the current help.
"},{"location":"running_from_source.html#running_it_1","title":"Running it","text":"WindowsLinuxmacOSRun 'hydrus_client.bat' to start the client.
Run 'hydrus_client.sh' to start the client. Don't forget to set chmod +x hydrus_client.sh
if you need it.
Run 'hydrus_client.command' to start the client. Don't forget to set chmod +x hydrus_client.command
if you need it.
The first start will take a little longer (it has to compile all the code into something your computer understands). Once up, it will operate just like a normal build with the same folder structure and so on.
Missing a Library
If the client fails to boot, it should place a 'hydrus_crash.log' in your 'db' directory or your desktop, or, if it got far enough, it may write the error straight to the 'client - date.log' file in your db directory.
If that error talks about a missing library, try reinstalling your venv. Are you sure it finished correctly? Do you need to run the advanced setup and select a different version of Qt?
Windows: If you want to redirect your database or use any other launch arguments, then copy 'hydrus_client.bat' to 'hydrus_client-user.bat' and edit it, inserting your desired db path. Run this instead of 'hydrus_client.bat'. New git pull
commands will not affect 'hydrus_client-user.bat'.
You probably can't pin your .bat file to your Taskbar or Start (and if you try and pin the running program to your taskbar, its icon may revert to Python), but you can make a shortcut to the .bat file, pin that to Start, and in its properties set a custom icon. There's a nice hydrus one in install_dir/static
.
However, some versions of Windows won't let you pin a shortcut to a bat to the start menu. In this case, make a shortcut like this:
C:\\Windows\\System32\\cmd.exe /c \"C:\\hydrus\\Hydrus Source\\hydrus_client-user.bat\"
This is a shortcut to tell the terminal to run the bat; it should be pinnable to start. You can give it a nice name and the hydrus icon and you should be good!
Linux: If you want to redirect your database or use any other launch arguments, then copy 'hydrus_client.sh' to 'hydrus_client-user.sh' and edit it, inserting your desired db path. Run this instead of 'hydrus_client.sh'. New git pull
commands will not affect 'hydrus_client-user.sh'.
macOS: If you want to redirect your database or use any other launch arguments, then copy 'hydrus_client.command' to 'hydrus_client-user.command' and edit it, inserting your desired db path. Run this instead of 'hydrus_client.command'. New git pull
commands will not affect 'hydrus_client-user.command'.
To update, you do the same thing as for the extract builds.
git pull
as normal. If you get a library version error when you try to boot, run the venv setup again. It is worth doing this anyway, every now and then, just to stay up to date.
"},{"location":"running_from_source.html#migrating_from_an_existing_install","title":"Migrating from an Existing Install","text":"Many users start out using one of the official built releases and decide to move to source. There is lots of information here about how to migrate the database, but for your purposes, the simple method is this:
If you never moved your database to another place and do not use -d/--db_dir launch parameter
db
directory.
db
directory to the source.
If you moved your database to another location and use the -d/--db_dir launch parameter
db
directory.
This is for advanced users only.
If you have never used python before, do not try this. If the easy setup scripts failed for you and you don't know what happened, please contact hydev before trying this, as the thing that went wrong there will probably go much more wrong here.
You can also set up the environment yourself. Inside the extract should be hydrus_client.py and hydrus_server.py. You will be treating these basically the same as the 'client' and 'server' executables--with the right environment, you should be able to launch them the same way and they take the same launch parameters as the exes.
Hydrus needs a whole bunch of libraries, so let's now set your python up. I strongly recommend you create a virtual environment. It is easy and doesn't mess up your system python.
You have to do this in the correct order! Do not switch things up. If you make a mistake, delete your venv folder and start over from the beginning.
To create a new venv environment:
If python3 doesn't work, use python.
python3 -m pip install virtualenv (if you need it)
python3 -m venv venv
source venv/bin/activate (CALL venv\\Scripts\\activate.bat in Windows cmd)
python -m pip install --upgrade pip
python -m pip install --upgrade wheel
venvs
That source venv/bin/activate
line turns on your venv. You should see your terminal prompt note you are now in it. A venv is an isolated environment of python that you can install modules to without worrying about breaking something system-wide. Ideally, you do not want to install python modules to your system python.
This activate line will be needed every time you alter your venv or run the hydrus_client.py
/hydrus_server.py
files. You can easily tuck this into a launch script--check the easy setup files for examples.
On Windows Powershell, the command is .\\venv\\Scripts\\activate
, but you may find the whole deal is done much easier in cmd than Powershell. When in Powershell, just type cmd
to get an old fashioned command line. In cmd, the launch command is just venv\\scripts\\activate.bat
, no leading period.
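As an alternative to activating in a launch script, a venv's own interpreter can be called directly, since it already knows about the modules installed in that venv. The sketch below only builds the command to run; venv_python and launch_command are hypothetical helper names, and the paths assume the venv folder sits inside install_dir as described above.

```python
import os
from pathlib import Path

def venv_python(install_dir, windows=None):
    # Locate the interpreter inside install_dir/venv for the current OS
    if windows is None:
        windows = os.name == 'nt'
    if windows:
        return Path(install_dir) / 'venv' / 'Scripts' / 'python.exe'
    return Path(install_dir) / 'venv' / 'bin' / 'python'

def launch_command(install_dir, extra_args=(), windows=None):
    # Build the argument list that starts the client with the venv's python
    script = Path(install_dir) / 'hydrus_client.py'
    return [str(venv_python(install_dir, windows)), str(script), *extra_args]

command = launch_command('/opt/hydrus', ['-d', '/data/hydrus_db'], windows=False)
print(command)
```

Passing the resulting list to subprocess.run would start the client; the -d argument here is the same db_dir launch parameter mentioned in the migration notes.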
After you have activated the venv, you can use pip to install everything you need to it from the requirements.txt in the install_dir:
python -m pip install -r requirements.txt\n
If you need different versions of libraries, check the cut-up requirements.txts the 'advanced' easy-setup uses in install_dir/static/requirements/advanced
. Check and compare their contents to the main requirements.txt to see what is going on. You'll likely need the newer OpenCV on Python 3.10, for instance.
Qt is the UI library. You can run PySide2, PySide6, PyQt5, or PyQt6. A wrapper library called qtpy allows this. The default is PySide6, but if it is missing, qtpy will fall back to an available alternative. For PyQt5 or PyQt6, you need an extra Chart module, so go:
python -m pip install qtpy PyQtChart PyQt5\n-or-\npython -m pip install qtpy PyQt6-Charts PyQt6\n
If you have multiple Qts installed, then select which one you want to use by setting the QT_API environment variable to 'pyside2', 'pyside6', 'pyqt5', or 'pyqt6'. Check help->about to make sure it loaded the right one.
If you want to set QT_API in a batch file, do this:
set QT_API=pyqt6
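To see which of the four Qt bindings are actually present in your venv before you pick a QT_API value, something like the following may help. It is a rough sketch that only checks whether each module can be found; it does not import Qt or ask qtpy which one it would choose, and the helper name installed_qt_bindings is hypothetical.

```python
import importlib.util
import os

# Module names of the four bindings listed above, keyed by their QT_API value.
QT_BINDINGS = {
    "pyside6": "PySide6",
    "pyside2": "PySide2",
    "pyqt6": "PyQt6",
    "pyqt5": "PyQt5",
}


def installed_qt_bindings() -> list:
    """Return the QT_API names whose module can be found in this environment."""
    return [
        api_name
        for api_name, module_name in QT_BINDINGS.items()
        if importlib.util.find_spec(module_name) is not None
    ]


if __name__ == "__main__":
    print("QT_API is set to:", os.environ.get("QT_API", "(not set)"))
    print("bindings found:", installed_qt_bindings())
```

If QT_API names a binding that is not in the list, expect trouble at launch; help->about remains the authoritative check of what was loaded.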
If you run <= Windows 8.1 or Ubuntu 18.04, you cannot run Qt6. Try PySide2 or PyQt5.
Qt compatibility notes
If you run into trouble running newer versions of Qt6 on Linux, some users have fixed it by installing the packages libicu-dev and libxcb-cursor-dev. With apt that will be:
sudo apt-get install libicu-dev
sudo apt-get install libxcb-cursor-dev
If you still have trouble with the default Qt6 version, or you rebuilt your venv and the newer version of Qt6 gives you problems, check out the setup_venv script language and the advanced requirements.txts files it relies on in install_dir/static/requirements/advanced. There should be several older version examples you can try out.
To install a specific version of a library with pip, activate your venv and then type something like pip install PySide6==6.3.1.
MPV is optional and complicated, but it is great, so it is worth the time to figure out!
As well as the python wrapper, 'python-mpv' (which is in the requirements.txt), you also need the underlying dev library. This is not mpv the program, but 'libmpv', often called 'libmpv1'.
For Windows, the dll builds are here, although getting a stable version can be difficult. Just put it in your hydrus base install directory. Check the links in the easy-setup guide above for good versions. You can also just grab the 'mpv-1.dll'/'mpv-2.dll' I bundle in my extractable Windows release.
If you are on Linux, you can usually get 'libmpv1' like so:
apt-get install libmpv1
On macOS, you should be able to get it with brew install mpv, but you are likely to find mpv crashes the program when it tries to load. Hydev is working on this, but it will probably need a completely different render API.
Hit help->about to see your mpv status. If you don't have it, it will present an error popup box with more info.
"},{"location":"running_from_source.html#sqlite","title":"SQLite","text":"If you can, update python's SQLite--it'll improve performance. The SQLite that comes with stock python is usually quite old, so you'll get a significant boost in speed. In some python deployments, the built-in SQLite is not compiled with neat features like Full Text Search (FTS) that hydrus needs.
On Windows, get the 64-bit sqlite3.dll here, and just drop it in your base install directory. You can also just grab the 'sqlite3.dll' I bundle in my extractable Windows release.
You may be able to update your SQLite on Linux or macOS with:
apt-get install libsqlite3-dev
python -m pip install pysqlite3
But as long as the program launches, it usually isn't a big deal.
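To find out which SQLite your python is really using, and whether it was built with FTS, you can ask the sqlite3 module directly. This is a small sketch under assumptions: it checks the standard-library sqlite3 module (not pysqlite3), and probing both fts4 and fts5 is a guess made here for illustration, since the text does not say which FTS version matters.

```python
import sqlite3


def sqlite_report() -> dict:
    """Report the SQLite library version in use and whether FTS modules load."""
    report = {"version": sqlite3.sqlite_version}
    for module in ("fts4", "fts5"):
        connection = sqlite3.connect(":memory:")
        try:
            # Creating a virtual table fails with OperationalError
            # ("no such module") when the module was not compiled in.
            connection.execute(f"CREATE VIRTUAL TABLE probe USING {module}(body)")
            report[module] = True
        except sqlite3.OperationalError:
            report[module] = False
        finally:
            connection.close()
    return report


if __name__ == "__main__":
    for key, value in sqlite_report().items():
        print(f"{key}: {value}")
```

Run it before and after swapping the dll or installing the newer library to confirm the version number actually changed.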
Extremely safe no way it can go wrong
If you want to update SQLite for your Windows system python install, you can also drop it into C:\\Program Files\\Python310\\DLLs or wherever you have python installed, and it'll update for all your python projects. You'll be overwriting the old file, so make a backup of the old one (I have never had trouble updating like this, however).
A user who made a Windows venv with Anaconda reported they had to replace the sqlite3.dll in their conda env at ~/.conda/envs/<envname>/Library/bin/sqlite3.dll.
If you don't have FFMPEG in your PATH and you want to import anything more fun than jpegs, you will need to put a static FFMPEG executable in your PATH or the install_dir/bin directory. This should always point to a new build for Windows. Alternately, you can just copy the exe from one of my extractable Windows releases.
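A quick way to confirm an FFMPEG executable is reachable from either location is sketched below. The search order (install_dir/bin first, then the PATH) is an assumption for illustration and may not match what hydrus itself does; find_ffmpeg is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def find_ffmpeg(install_dir: str = "."):
    """Return a path to an ffmpeg executable, or None if none is found."""
    # Look in install_dir/bin first, then fall back to the regular PATH.
    bin_dir = Path(install_dir) / "bin"
    return shutil.which("ffmpeg", path=str(bin_dir)) or shutil.which("ffmpeg")


if __name__ == "__main__":
    location = find_ffmpeg()
    print("ffmpeg found at:", location if location else "(not found)")
```

Run it from your install directory, or pass the install directory as the argument.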
Once you have everything set up, hydrus_client.py and hydrus_server.py should look for and run off client.db and server.db just like the executables. You can use the 'hydrus_client.bat/sh/command' scripts in the install dir or use them as inspiration for your own. In any case, you are looking at entering something like this into the terminal:
source venv/bin/activate\npython hydrus_client.py\n
This will use the 'db' directory for your database by default, but you can use the launch arguments just like for the executables. For example, this could be your client-user.sh file:
#!/bin/bash\n\nsource venv/bin/activate\npython hydrus_client.py -d=\"/path/to/database\"\n
"},{"location":"running_from_source.html#building_these_docs","title":"Building these Docs","text":"When running from source you may want to build the hydrus help docs yourself. You can also check the setup_help
scripts in the install directory.
Almost everything you get through pip is provided as pre-compiled 'wheels' these days, but if you get an error about Visual Studio C++ when you try to pip something, you have two choices:
Option B is always the simpler. If opencv-headless as the requirements.txt specifies won't compile in Python 3.10, then try a newer version--there will probably be one of these new highly compatible wheels and it'll just work in seconds. Check my build scripts and various requirements.txts for ideas on what versions to try for your python etc...
If you are confident you need Visual Studio tools, then prepare for headaches. Although the tools are free from Microsoft, it can be a pain to get them through the official (and often huge) downloader installer from Microsoft. Expect a 5GB+ install with an eye-watering number of checkboxes that probably needs some stackexchange searches to figure out.
On Windows 10, Chocolatey has been the easy answer. Get it installed and use this one simple line:
choco install -y vcbuildtools visualstudio2017buildtools windows-sdk-10.0\n
Trust me, just do this, it will save a ton of headaches!
Update: On Windows 11, in 2023-01, I had trouble with the above. There's a couple '11' SDKs that installed ok, but the vcbuildtools stuff had unusual errors. I hadn't done this in years, so maybe they are broken for Windows 10 too! The good news is that a basic stock Win 11 install with Python 3.10 is fine getting everything on our requirements and even making a build without any extra compiler tech.
"},{"location":"running_from_source.html#additional_windows","title":"Additional Windows Info","text":"This does not matter much any more, but in the old days, building modules like lz4 and lxml was a complete nightmare, and hooking up Visual Studio was even more difficult. This page has a lot of prebuilt binaries--I have found it very helpful many times.
I have a fair bit of experience with Windows python, so send me a mail if you need help.
"},{"location":"running_from_source.html#my_code","title":"My Code","text":"I develop hydrus on and am most experienced with Windows, so the program is more stable and reasonable on that. I do not have as much experience with Linux or macOS, but I still appreciate and will work on your Linux/macOS bug reports.
My coding style is unusual and unprofessional. Everything is pretty much hacked together. If you are interested in how things work, please do look through the source and ask me if you don't understand something.
I'm constantly throwing new code together and then cleaning and overhauling it down the line. I work strictly alone. While I am very interested in detailed bug reports or suggestions for good libraries to use, I am not looking for pull requests or suggestions on style. I know a lot of things are a mess. Everything I do is WTFPL, so feel free to fork and play around with things on your end as much as you like.
"},{"location":"server.html","title":"running your own server","text":"Note
You do not need the server to do anything with hydrus! It is only for advanced users to do very specific jobs! The server is also hacked-together and quite technical. It requires a fair amount of experience with the client and its concepts, and it does not operate on a timescale that works well on a LAN. Only try running your own server once you have a bit of experience synchronising with something like the PTR and you think, 'Hey, I know exactly what that does, and I would like one!'
Here is a document put together by a user describing whether you want the server.
"},{"location":"server.html#intro","title":"setting up a server","text":"I will use two terms, server and service, to mean two distinct things:
/file or /update) that the hydrus client can plug into. A service might be a repository for a certain kind of data, the administration interface to manage what services run on a server, or anything else.
Setting up a hydrus server is easy compared to, say, Apache. There are no .conf files to mess about with, and everything is controlled through the client. When started, the server will place an icon in your system tray in Windows or open a small frame in Linux or macOS. To close the server, either right-click the system tray icon and select exit, or just close the frame.
The basic process for setting up a server is:
Let's look at these steps in more detail:
"},{"location":"server.html#start","title":"start the server","text":"Since the server and client have so much common code, I package them together. If you have the client, you have the server. If you installed in Windows, you can hit the shortcut in your start menu. Otherwise, go straight to 'hydrus_server' or 'hydrus_server.exe' or 'hydrus_server.py' in your installation directory. The program will first try to take port 45870 for its administration interface, so make sure that is free. Open your firewall as appropriate.
"},{"location":"server.html#setting_up_the_client","title":"set up the client","text":"In the services->manage services dialog, add a new 'hydrus server administration service' and set up the basic options as appropriate. If you are running the server on the same computer as the client, its hostname is 'localhost'.
In order to set up the first admin account and an access key, use 'init' as a registration token. This special registration token will only work to initialise this first super-account.
YOU'LL WANT TO SAVE YOUR ACCESS KEY IN A SAFE PLACE
If you lose your admin access key, there is no way to get it back, and if you are not sqlite-proficient, you'll have to restart from the beginning by deleting your server's database files.
If the client can't connect to the server, it is either not running or you have a firewall/port-mapping problem. If you want a quick way to test the server's visibility, just put https://host:port into your browser (make sure it is https! http will not work)--if it is working, your browser will probably complain about its self-signed https certificate. Once you add a certificate exception, the server should return some simple html identifying itself.
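The same visibility test can be scripted instead of using a browser. The sketch below assumes the administration port 45870 mentioned earlier and turns off certificate verification, because the server's certificate is self-signed; only do that for a server you control. The function names are illustrative, not part of hydrus.

```python
import ssl
import urllib.request


def unverified_context() -> ssl.SSLContext:
    """Build a TLS context that accepts a self-signed certificate."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before verify_mode
    context.verify_mode = ssl.CERT_NONE
    return context


def probe_server(host: str = "localhost", port: int = 45870, timeout: float = 5.0) -> str:
    """Fetch the server's root page over https and return the body as text."""
    url = f"https://{host}:{port}/"
    with urllib.request.urlopen(url, context=unverified_context(), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling probe_server() from a python prompt should return the server's short html page; a connection error usually points to the server not running or a firewall/port-mapping problem.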
You should have a new submenu, 'administrate services', under 'services', in the client gui. This is where you control most server and service-wide stuff.
admin->your server->manage services lets you add, edit, and delete the services your server runs. Every time you add one, you will also be added as that service's first administrator, and the admin menu will gain a new entry for it.
"},{"location":"server.html#making_accounts","title":"making accounts","text":"Go admin->your service->create new accounts to create new registration tokens. Send the registration tokens to the users you want to give these new accounts. A registration token will only work once, so if you want to give several people the same account, they will have to share the access key amongst themselves once one of them has registered the account. (Or you can register the account yourself and send them all the same access key. Do what you like!)
Go admin->manage account types to add, remove, or edit account types. Make sure everyone has at least downloader (get_data) permissions so they can stay synchronised.
You can create as many accounts of whatever kind you like. Depending on your usage scenario, you may want to have all uploaders, one uploader and many downloaders, or just a single administrator. There are many combinations.
"},{"location":"server.html#have_fun","title":"???","text":"The most important part is to have fun! There are no losers on the INFORMATION SUPERHIGHWAY.
"},{"location":"server.html#profit","title":"profit","text":"I honestly hope you can get some benefit out of my code, whether just as a backup or as part of a far more complex system. Please mail me your comments as I am always keen to make improvements.
"},{"location":"server.html#backing_up","title":"btw, how to backup a repo's db","text":"All of a server's files and options are stored in its accompanying .db file and respective subdirectories, which are created on first startup (just like with the client). To backup or restore, you have two options:
server_install_dir/db/server_backup. When the operation is complete, you can ftp/batch-copy/whatever the server_backup folder wherever you like.
If you get to a point where you can no longer boot the repository, try running SQLite Studio and opening server.db. If the issue is simple--like manually changing the port number--you may be in luck. Send me an email if it is tricky.
Remember that everything is breaking all the time. Make regular backups, and you'll minimise your problems.
"},{"location":"support.html","title":"Financial Support","text":""},{"location":"support.html#support","title":"can I contribute to hydrus development?","text":"I do not expect anything from anyone. I'm amazed and grateful that anyone wants to use my software and share tags with others. I enjoy the feedback and work, and I hope to keep putting completely free weekly releases out as long as there is more to do.
That said, as I have developed the software, several users have kindly offered to contribute money, either as thanks for a specific feature or just in general. I kept putting the thought off, but I eventually got over my hesitance and set something up.
I find the tactics of most internet fundraising very distasteful, especially when they promise something they then fail to deliver. I much prefer the 'if you like me and would like to contribute, then please do, meanwhile I'll keep doing what I do' model. I support several 'put out regular free content' creators on Patreon in this way, and I get a lot out of it, even though I have no direct reward beyond the knowledge that I helped some people do something neat.
If you feel the same way about my work, I've set up a simple Patreon page here. If you can help out, it is deeply appreciated.
"},{"location":"wine.html","title":"running a client or server in wine","text":"Several Linux and macOS users have found success running hydrus with Wine. Here is a post from a Linux dude:
Some things I picked up on after extended use:
Installation process:
If you get the client running in Wine, please let me know how you get on!
"},{"location":"youDontWantTheServer.html","title":"You don't want the server","text":"The hydrus_server.exe/hydrus_server.py is the victim of many a misconception. You don't need to use the server to use Hydrus. The vast majority of features are contained in the client itself so if you're new to Hydrus, just use that.
The server is only really useful for a few specific cases which will not apply for the vast majority of users.
"},{"location":"youDontWantTheServer.html#the_server","title":"The server","text":"The Hydrus server doesn't really work as most people envision a server working. Rather than on-demand viewing, when you link with a Hydrus server, you synchronise a complete copy of all its data. For the tag repository, you download every single tag it has ever been told about. For the file repository, you download the whole file list, related file info, and every single thumbnail, which lets you browse the whole repository in your client in a regular search page--to view files in the media viewer, you need to download and import them specifically.
"},{"location":"youDontWantTheServer.html#you_dont_want_the_server_probably","title":"You don't want the server (probably)","text":"Do you want to remotely view your files? You don't want the server.
Do you want to host your files on another computer since your daily driver doesn't have a lot of storage space? You don't want the server.
Do you want to use multiple clients and have everything synced between them? You don't want the server.
Do you want to expose the API for Hydrus Web, Hydroid, or some other third-party tool? You don't want the server.
Do you want to share some files and/or tags in a small group of friends? You might actually want the server.
"},{"location":"youDontWantTheServer.html#the_options","title":"The options","text":"Now, you're not the first person to have any of the above ideas, and some of the thinkers even had enough programming know-how to make something for it. Below is a list of some options; see this page for a few more.
"},{"location":"youDontWantTheServer.html#hydrus_web","title":"Hydrus Web","text":"The hydrus network client is a desktop application written for Anonymous and other internet enthusiasts with large media collections. It organises your files into an internal database and browses them with tags instead of folders, a little like a booru on your desktop. Tags and files can be anonymously shared through custom servers that any user may run. Everything is free, nothing phones home, and the source code is included with the release. It is developed mostly for Windows, but builds for Linux and macOS are available (perhaps with some limitations, depending on your situation).
The software is constantly being improved. I try to put out a new release every Wednesday by 8pm Eastern.
Hydrus supports various filetypes for images, video and audio files, image project files, and more. A full list of supported filetypes is here.
On the Windows and Linux builds, an MPV window is embedded to play video and audio smoothly. For files like pdf, which cannot currently be viewed in the client, it is easy to launch any file with your OS's default program.
The client can download files and parse tags from a number of websites, including by default:
And can be extended to download from more locations using easily shareable user-made downloaders. It can also be set to 'subscribe' to any gallery search, repeating it every few days to keep up with new results.
The program's emphasis is on your freedom. There is no DRM, no spying, no censorship. The program never phones home.
"},{"location":"index.html#start_here","title":"Start Here","text":"If you would like to try hydrus, I strongly recommend you check out the help and getting started guide. It will take you through all the main systems.
"},{"location":"index.html#links","title":"links","text":"Killed
Add the following lines to the end of /etc/sysctl.conf. You will need admin, so use sudo nano /etc/sysctl.conf or sudo gedit /etc/sysctl.conf
vm.min_free_kbytes=1153434\nvm.overcommit_memory=1\n
Check that you have (enough) swap space or you might still run out of memory.
sudo swapon --show\n
If you need swap
sudo fallocate -l 16G /swapfile #make 16GiB of swap\nsudo chmod 600 /swapfile\nsudo mkswap /swapfile\n
Add to /etc/fstab so your swap is mounted on reboot:
/swapfile swap swap defaults 0 0\n
You may add as many swapfiles as you like. If you plan to delete an old swapfile, add the new one first, as unmounting a swapfile will evict its contents back into real memory. You may also wish to use a swapfile type that uses compression: this saves you some disk space for a little bit of a performance hit, and it also saves significantly on mostly empty memory.
Reboot for all changes to take effect, or use sysctl to set vm variables.
Linux's memory allocator is lazy and does not perform opportunistic reclaim. This means that the system will continue to give your process memory from the real and virtual memory pool (swap) until there is none left.
Linux will only clean up if the available total real and virtual memory falls below the watermark as defined in the system control configuration file /etc/sysctl.conf. The watermark's name is vm.min_free_kbytes. It is the number of kilobytes the system keeps in reserve, and therefore the maximum amount of memory the system can allocate in one go before needing to reclaim memory it gave out earlier but which is no longer in use.
The default value is vm.min_free_kbytes=65536, which means 64MiB (mebibytes).
If the amount of memory asked for in a given request is under vm.min_free_kbytes, but granting it would leave less than vm.min_free_kbytes of total free memory, then the OS will clean up memory to service the request.
If vm.min_free_kbytes is less than the amount requested and there is no virtual memory left, then the system is officially unable to service the request and will launch the OOM Killer (Out of Memory Killer) to free memory by killing memory-hungry processes.
Increase the vm.min_free_kbytes value to prevent this scenario.
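The values are given in kibibytes, so it helps to convert them before choosing a number. A tiny sketch of the arithmetic for the two values used in this guide (the function names are only illustrative):

```python
def kib_to_mib(kib: int) -> float:
    """Convert kibibytes to mebibytes (1 MiB = 1024 KiB)."""
    return kib / 1024


def kib_to_gib(kib: int) -> float:
    """Convert kibibytes to gibibytes (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 * 1024)


if __name__ == "__main__":
    print(kib_to_mib(65536))              # 64.0 (the stated default)
    print(round(kib_to_gib(1153434), 2))  # 1.1 (the value recommended here)
```

So the recommended vm.min_free_kbytes=1153434 reserves about 1.1 GiB, compared to 64 MiB for a value of 65536.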
The OOM killer decides which program to kill to reclaim memory. Since hydrus loves memory, it is usually picked first, even if another program asking for memory caused the OOM condition. Setting the minimum free kilobytes higher will avoid running the OOM killer, which is always preferable and almost always preventable.
"},{"location":"Fixing_Hydrus_Random_Crashes_Under_Linux.html#memory_overcommmit","title":"Memory Overcommit","text":"We mentioned that Linux will keep giving out memory, but it is actually possible for Linux to launch the OOM killer if it just feels like our program is asking for too much memory too quickly. Since hydrus is a heavyweight scientific processing package, we need to turn this feature off. To turn it off, change the value of vm.overcommit_memory, which defaults to 0 (heuristic overcommit).
Set vm.overcommit_memory=1. This prevents the OS from using a heuristic, and it will just always give memory to anyone who asks for it.
Swappiness is a setting you might have seen, but it only determines Linux's desire to spend a little bit of time moving memory you haven't touched in a while out of real memory and into virtual memory. It will not prevent the OOM condition; it just determines how much time to use for moving things into swap.
"},{"location":"Fixing_Hydrus_Random_Crashes_Under_Linux.html#why_does_my_linux_system_studder_or_become_unresponsive_when_hydrus_has_been_running_a_while","title":"Why does my Linux system stutter or become unresponsive when hydrus has been running a while?","text":"You are running out of pages because Linux releases I/O buffer pages only when a file is closed. Thus the OS is waiting for you to hit the watermark (as described in \"why is hydrus crashing\") to start freeing pages, which causes the chug. When content is written from memory to disk, the page is retained so that if you reread that part of the disk, the OS does not need to access the disk; it just pulls it from the much faster memory. This is usually a good thing, but hydrus does not close database files, so it eats up pages over time. This is really good for hydrus but sucks for the responsiveness of other apps, and it will cause hydrus to keep hold of pages after a lengthy operation in anticipation of needing them again, even when it is idle thereafter. You need to set vm.dirtytime_expire_seconds to a lower value.
vm.dirtytime_expire_seconds
When a lazytime inode is constantly having its pages dirtied, the inode with an updated timestamp will never get chance to be written out. And, if the only thing that has happened on the file system is a dirtytime inode caused by an atime update, a worker will be scheduled to make sure that inode eventually gets pushed out to disk. This tunable is used to define when dirty inode is old enough to be eligible for writeback by the kernel flusher threads. And, it is also used as the interval to wakeup dirtytime writeback thread.
On many distros this happens only once every 12 hours. Try setting it closer to every one or two hours. This will cause the OS to drop pages that were written over one to two hours ago, returning them to the free store for use by other programs.
https://www.kernel.org/doc/Documentation/sysctl/vm.txt
"},{"location":"Fixing_Hydrus_Random_Crashes_Under_Linux.html#why_does_everything_become_clunky_for_a_bit_if_i_have_tuned_all_of_the_above_settings","title":"Why does everything become clunky for a bit if I have tuned all of the above settings?","text":"The kernel launches a process called kswapd to swap and reclaim memory pages. Its behaviour is governed by the following two values:
vm.vfs_cache_pressure
The tendency for the kernel to reclaim I/O cache for files and directories. Default=100, set to 110 to bias the kernel into reclaiming I/O pages over keeping them at a \"fair rate\" compared to other pages. Hydrus tends to write a lot of files and then ignore them for a long time, so it's a good idea to prefer freeing pages for infrequent I/O. Note: Increasing vfs_cache_pressure significantly beyond 100 may have negative performance impact. Reclaim code needs to take various locks to find freeable directory and inode objects. With vfs_cache_pressure=1000, it will look for ten times more freeable objects than there are.
watermark_scale_factor
This factor controls the aggressiveness of kswapd. It defines the amount of memory left in a node/system before kswapd is woken up and how much memory needs to be free before kswapd goes back to sleep. The unit is in fractions of 10,000. The default value of 10 means the distances between watermarks are 0.1% of the available memory in the node/system. The maximum value is 1000, or 10% of memory. A high rate of threads entering direct reclaim (allocstall) or kswapd going to sleep prematurely (kswapd_low_wmark_hit_quickly) can indicate that the number of free pages kswapd maintains for latency reasons is too small for the allocation bursts occurring in the system. This knob can then be used to tune kswapd aggressiveness accordingly.
I like to keep watermark_scale_factor at 70 (70/10,000 = 0.7%), so kswapd will run until at least 0.7% of system memory has been reclaimed. For example, with 32GiB of memory (real and virtual), it will try to keep at least 0.224 GiB immediately available.
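The arithmetic behind that figure is easy to redo for your own machine. A minimal sketch, assuming the simple proportional reading used in the paragraph above (the factor is a fraction of 10,000 applied to total memory); the function name is hypothetical:

```python
def kswapd_reserve_gib(total_gib: float, watermark_scale_factor: int) -> float:
    """Memory kswapd tries to keep free, as a share of total memory in GiB."""
    # The unit of watermark_scale_factor is fractions of 10,000.
    return total_gib * watermark_scale_factor / 10_000


if __name__ == "__main__":
    print(kswapd_reserve_gib(32, 70))  # 0.224
    print(kswapd_reserve_gib(32, 10))  # 0.032 (the kernel default factor of 10)
```

Swap in your own total memory to see what a given factor means before writing it to /etc/sysctl.conf.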
An example /etc/sysctl.conf section for virtual memory settings.
########\n# virtual memory\n########\n\n#1 always overcommit, prevents the kernel from using a heuristic to decide that a process is bad for asking for a lot of memory at once and killing it.\n#https://www.kernel.org/doc/Documentation/vm/overcommit-accounting\nvm.overcommit_memory=1\n\n#force linux to reclaim pages if under a gigabyte \n#is available so large chunk allocates don't fire off the OOM killer\nvm.min_free_kbytes = 1153434\n\n#Start freeing up pages that have been written but which are in open files, after 2 hours.\n#Allows pages in long lived files to be reclaimed\nvm.dirtytime_expire_seconds = 7200\n\n#Have kswapd try to reclaim .7% = 70/10000 of pages before returning to sleep\n#This increases responsiveness by reclaiming a larger portion of pages in low memory condition\n#So that the next time you make a large allocation the kernel doesn't have to stall and look for pages to free immediately.\nvm.watermark_scale_factor=70\n\n#Have the kernel prefer to reclaim I/O pages at 110% of the rate at which it frees other pages.\n#Don't set this value much over 100 or the kernel will spend all its time reclaiming I/O pages\nvm.vfs_cache_pressure=110\n
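Before rebooting, it can be worth double-checking what the file will actually set. The sketch below parses sysctl.conf-style text (key = value lines, with # or ; comments) into a dictionary. It is a simplified reader written for illustration, not the parser sysctl itself uses, and SAMPLE just repeats the values from the example above.

```python
def parse_sysctl(text: str) -> dict:
    """Parse sysctl.conf-style text into a {key: value} dictionary of strings."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", ";")):
            continue
        key, separator, value = line.partition("=")
        if separator:
            settings[key.strip()] = value.strip()
    return settings


SAMPLE = """
# virtual memory
vm.overcommit_memory=1
vm.min_free_kbytes = 1153434
vm.dirtytime_expire_seconds = 7200
vm.watermark_scale_factor=70
vm.vfs_cache_pressure=110
"""

if __name__ == "__main__":
    for key, value in parse_sysctl(SAMPLE).items():
        print(f"{key} -> {value}")
```

To check your real file, read /etc/sysctl.conf into a string and pass it to parse_sysctl in place of SAMPLE.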
"},{"location":"PTR.html","title":"PTR for Dummies","text":"or Myths and facts about the Public Tag Repository
"},{"location":"PTR.html#what_is_the_ptr","title":"What is the PTR?","text":"Short for Public Tag Repository, a now community managed repository of tags. Locally it acts as a tag service, just like my tags
. At the time of writing, 54 million files have tags on it. The PTR only stores the sha256 hash and tag mappings of a file, not the files themselves or any non-tag metadata. In other words: if you do not see it in the tag list, then it is not stored.
Most of the things in this document also apply to self-hosted servers, except for tag guidelines.
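Since tags are matched to files purely by sha256 hash, any change to a file's bytes gives it a different identity as far as the PTR is concerned. A small sketch of how such a hash is computed with python's standard hashlib (the function names are illustrative and not taken from the hydrus source):

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the sha256 digest of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Changing even one byte produces a completely different hash.
    print(sha256_hex(b"example image bytes"))
    print(sha256_hex(b"example image bytes."))
```

This is why a recompressed copy of an image will not pick up the tags of the original, even when the two look identical.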
"},{"location":"PTR.html#connecting_to_the_ptr","title":"Connecting to the PTR","text":"The easiest method is to use the built in function, found under help -> add the public tag repository
. For adding it manually, if you so desire, read the Hydrus help document on access keys.
Once you are connected, Hydrus will proceed to download and then process the update files. The progress of this can be seen under services -> review services -> remote -> tag repositories -> public tag repository. Here you can view its status, your account (the default account is a shared public account; currently only janitors and the administrator have personal accounts), tag status, and how synced you are. Being behind on the sync by a certain amount makes you unable to push tags and petitions until you are caught up again.
QuickSync 2
If you are starting out with a completely fresh client, you can instead download a fully pre-synced client here. Though a little out of date, it will nonetheless save time. Some settings may differ from the defaults of an official installation.
"},{"location":"PTR.html#how_does_it_work","title":"How does it work?","text":"For something to end up on the PTR it has to be pushed there. Tags can either be entered into the tag service manually by the user through the manage tags window, or be routed there by a parser when downloading files. See parsing tags. Once tags have been entered into the PTR tag service they are pending until pushed. This is indicated by the pending () that will appear between tags and help in the menu bar. Here you can choose to either push your changes to the PTR or discard them.
When making petitions it is important to remember that janitors are only human. We do not necessarily know everything about every niche. We do not necessarily have the files you are making changes for and we will only see a blank thumbnail if we do not have the file. Explain why you are making a petition. Try and keep the number of files manageable. If a janitor at any point is unsure if the petition is correct they are likely to deny the entire petition rather than risk losing good tags. Some users have pushed changes regarding hundreds of tags over thousands of files at once, but due to disregarding PTR tagging practices or being lazy with justification the petition has been denied entirely. Or they have just been plain wrong, trying to impose frankly stupid tagging methods.
Furthermore, if you are two weeks out of sync with PTR you are unable to push additions or deletions until you're back within the threshold.
Q: Does this automagically tag my files?
A: No. Until we get machine learning based auto-tagging nothing is truly automatic. All tags on the PTR were uploaded by another user, so if nobody uploaded tags associated with the hash of your file it won't have any tags in the PTR.
Q: How good is the PTR at tagging [insert file format or thing from site here]?
A: That depends largely on if there's a scrapable database of tags for whatever you're asking about. Anything that comes from a booru or site that supports tags is fairly likely to have something on the PTR. Original content on some obscure chan-style imageboard is less so.
Q: Help! My files don't have any tags! What do!?
A: As stated above, some things are just very likely to not have any tags. It is also possible that the files have been altered by whichever service you downloaded from. Imgur, Reddit, Discord, and many other sites and services recompress images to save space, which might give them a different hash even if they look indistinguishable from the original file. Use one of the IQDB lookup programs linked in Cuddle's wiki.
Q: Why is my database so big!? This can't be right.
A: It is working as intended. The size is because you are literally downloading and processing the entire tag database and history of the PTR. It is done this way to ensure redundancy and privacy. Redundancy because anybody with an up-to-date PTR sync can just start their own. Privacy because nobody can tell what files you have since you are downloading the tags for everything the PTR has.
Q: Does that mean I can't do anything about the size?
A: Correct. There are some plans to crunch the size through a few methods but there are a lot of other far more requested features being, well, requested.
Speaking crassly if you are bothered by the size requirement of the PTR you probably don't have a big enough library to really benefit and would be better off just using the IQDB script."},{"location":"PTR.html#janitors","title":"Janitors","text":"Janitors are the people that review petitions. You can meet us at the community Discord to ask questions or see us bitch about some of the silly stuff boorus and users cause to end up in the PTR.
"},{"location":"PTR.html#tag_guidelines","title":"Tag Guidelines","text":"These are a mix of standard practice used by various boorus and changes made by Hydrus Developer and PTR users, ratified by the janitors that actually have to manage all of this. The \"full\" document is viewable at Cuddle's git repo. See Hydrus Developer's thoughts on a public tagging schema.
If you are looking to help out by tagging low tag-count files, remember to keep the tags objective and start simple, for example by adding the characters/persons and the big obvious things in the image. Tagging every little thing and detail is a sure path to burnout. If you are looking to petition removal of tags then it is preferable to sibling common misspellings, underscores, and defunct tags rather than deleting them outright. The exception is for ambiguous tags where it is better to delete and replace with a less ambiguous tag. When deleting tags that don't belong in the image it can be helpful if you include a short description as to why. It's also helpful if you sanitise downloaded tags from sites with tagged galleries before pushing them to the PTR. Take Pixiv, for example, where you can have a gallery of multiple images, each containing one character, and all of the characters being tagged. Consequently all images in that gallery will have all of the character tags despite no image having more than one character.
"},{"location":"PTR.html#siblings_and_parents","title":"Siblings and parents","text":"When making siblings, go for the closest less-bad tag. Example: bad_tag
-> bad tag
, rather than going for what the top level sibling might be. This creates less potential future work in case standards change and makes it so your request is less likely to be denied by a janitor not being entirely certain that what you're asking is right. Be careful about creating siblings for potentially ambiguous tags. Is james bond
supposed to be character:james bond
or is it series:james bond
? This is a bit of a bad example due to having the case of the character always belonging to the series, so you can safely sibling it to series:james bond
since all instances of the character will also have the series, but not all instances of the series will have the character. So let us look at another example: how about wool
? Is it the material harvested from sheep, or is it the Malaysian artist that likes to draw Touhou? In doubtful cases it's better to leave it as is, petition the tag for deletion if it's incorrect and add the correct tag.
When making parents, make sure it's an always factually correct relationship. character:james bond
always belongs to series:james bond
. But character:james bond
is not always person:pierce brosnan
. Common examples of not-always true relationships: gender (genderbending), species (furrynisation/humanisation/anthropomorphism), hair colour, eye colour, and other mutable traits.
creator: Used for the creator of the tagged piece of media. Hydrus being primarily used for images it will often be the artist that drew the image. Other potential examples are the author of a book or musician for a song.
character: Refers to characters. James Bond is a character.
person: Refers to real persons. Pierce Brosnan is a person.
series: Used for series. James Bond is a series tag and so is GoldenEye. Because usage differs on some boorus, chances are you will also see things like Absolut Vodka and other brands in it.
photoset: Used for photosets. Primarily seen for content from idols, cosplayers, and gravure idols.
studio: Is used for the entity that facilitated the production of the file or what's in it. Eon Productions for the James Bond movies.
species: Species of the depicted characters/people/animals. Somewhat controversial for being needlessly detailed, some janitors not liking the namespace at all. Primarily used for furry content.
title: The title of the file. One of the tags Hydrus uses for various purposes such as sorting and collecting. Somewhat tainted by rampant Reddit parsers.
medium: Used for tags about the image and how it's made. Photography, water painting, napkin sketch as a few examples. White background, simple background, checkered background as a few others. What you see about the image.
meta: This namespace is used for information that isn't visible in the image itself or where you might need to go to the source. Some examples include: third-party edit, paid reward (patreon/enty/gumroad/fantia/fanbox), translated, commentary, and such. What you know about the image.
Namespaces not listed above are not \"supported\" by the janitors and are liable to get siblinged out, removed, and/or mocked if judged being bad and annoying enough to justify the work. Do not take this to mean that all un-listed namespaces are bad, some are created and used by parsers to indicate where an image came from which can be helpful if somebody else wants to fetch the original or check source tags against the PTR tags. But do exercise some care in what you put on the PTR if you use custom namespaces. Recently clothing: was removed due to being disliked, no booru using it, and the person(s) pushing for it seeming to have disappeared, leaving a less-than-finished mess behind. It was also rife with lossy siblings and things that just plain don't belong with clothing, such as clothing:brown hair.
Tuning your database synchronization using the --db_synchronous_override=0
launch argument can make Hydrus significantly faster with some caveats.
--db_synchronous_override=1
on any modern filesystem and this is the default.0
you are gambling, but it is a safe gamble if you have a backup and know exactly what you are doingsync
on *NIX systems, or normal shutdown), orsynchronous=0
, other I/O on your system will slow down as the pending writes are interleaved. Normal shutdown may also take abnormally long because the system is flushing these pending writes, but you must allow it to take its time as explained in the section below.Note: In historical versions of hydrus (synchronous=2
), performance was terrible because hydrus would aggressively (it was arguably somewhat paranoid) write changes to disk.
Setting synchronous to 0 lets the database engine defer writing to disk as long as physically possible. In the normal operation of your system, files are constantly being partially transferred to disk, even if the OS pretends they have been fully written to disk. This is called the write cache, and it is really important to use it or your system's performance would be terrible. The caveat is that until you have \"synced
\" the disk cache, the changes to files are not actually in permanent storage. One purpose of a normal shutdown of the operating system is to make sure all disk caches have been flushed and synced. A program can also request that a file it has just written to be flushed or synced, and it will wait until that is done before continuing.
When not in synchronous 0 mode, the database engine syncs at regular intervals to make sure data has been written. - Setting synchronous to 0 is generally safe if and only if the system also shuts down normally, allowing any of these pending writes to be flushed. - The database can back out of partial changes if hydrus crashes even if synchronous=0
, so your database will not go corrupt from hydrus shutting down abnormally, only from the system shutting down abnormally.
Programmers are responsible for handling partially written files, but this is tedious for large complex data, so they use a database engine which handles all of this. The database ensures that any partially written data is reversible to a known state (called a rollback).
An existing file may be in 3 possible states:
fflush(FILE)
. fflush()
is called automatically when a programmer closes a file, or exits the program normally (under most runtimes, but not for example in Java). If the program exits abnormally before data is flushed, it will be lost when the program crashes.fflush()
. When you \"safely shut down\", you are instructing the OS among other things to sync the flushed files. If someone decides to read a file before it has been synced the OS will read the contents up until the flush from the flush buffer, and return that instead of what is actually on disk. If the OS crashes due to error or power failure, data that are flushed but not synced will be lost.
To ensure the consistency of the database and rollback when needed, the database engine keeps a journal of what it is doing. Each transaction ends in a flush
followed by a sync
. The flush ensures that everything written before the flush will occur before the line that indicates the transaction completed. The sync ensures that the entire contents of the transaction have been written to permanent storage before proceeding. The OS is not obligated to write chunks of the database file in the order it receives them. It only guarantees that if you flush, everything before the flush happens first, and everything after happens next.
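The difference between a flush and a sync can be sketched in a few lines of Python. This is only an illustration of the two OS-level steps described above, not how the database engine itself is written; the `write_durably` helper and the file name are made up for the example.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data, then flush and sync so it reaches permanent storage."""
    with open(path, "wb") as f:
        f.write(data)
        # flush: hand the program's buffered bytes over to the OS.
        # After this the data survives a program crash, but not an OS crash.
        f.flush()
        # sync: block until the OS reports the bytes are on the disk itself.
        # This is the slow step that synchronous=0 allows the engine to skip.
        os.fsync(f.fileno())


# Hypothetical journal line written to a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "example_journal.bin")
write_durably(path, b"End Transaction 1\n")
```

Dropping the `os.fsync` call makes the function return sooner, at the cost of not knowing when the bytes actually land on disk.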
The sync is what is controlled by the synchronous
switch. Allowing the database to ignore whether sync actually completes is the magic that makes synchronous=0
so dang fast.
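As a rough sketch of what the switch looks like at the database level, the snippet below assumes the engine is SQLite and uses its `PRAGMA synchronous` setting on a throwaway database. The `open_with_synchronous` helper and the scratch file are hypothetical and have nothing to do with hydrus's own startup code; with hydrus itself you would use the launch argument described above.

```python
import os
import sqlite3
import tempfile


def open_with_synchronous(path: str, level: int) -> sqlite3.Connection:
    """Open a SQLite database and set its synchronous level (0 = off)."""
    if level not in (0, 1, 2, 3):
        raise ValueError("synchronous level must be 0, 1, 2 or 3")
    con = sqlite3.connect(path)
    # PRAGMA values cannot be bound as parameters, so the validated
    # integer is formatted into the statement directly.
    con.execute(f"PRAGMA synchronous = {level}")
    return con


# Scratch database, not a real hydrus db.
db_path = os.path.join(tempfile.mkdtemp(), "scratch.db")
con = open_with_synchronous(db_path, 0)

# Reading the pragma back confirms what this connection is using.
current = con.execute("PRAGMA synchronous").fetchone()[0]
print("synchronous =", current)
con.close()
```

The setting applies per connection, so it has to be set again every time the database is opened.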
Each of these steps is performed in order. Suppose a crash occurred mid-write.
When the database resumes it will start scanning the journal at step 1. Since it will reach the end without seeing End Transaction 1
it knows that data was only partially written, and can put the data back in the state before transaction 1 began. This property of a database is called atomicity in the sense that something atomic is \"indivisible\"; either all of the steps in transaction 1 occur or none of them occur.
Hydrus is structured in such a way that the database, which keeps track of your file catalog, is only written to once the file has been fully imported and moved where it is supposed to be. Thus every action hydrus takes is kept \"atomic\" or \"repeatable\" (redo existing work that was partway through). If hydrus crashes in the middle of importing a file, then when it resumes, as far as it is aware, it didn't even start importing the file. It will repeat the steps from the start until the file catalog is \"consistent\" with what is on disk.
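Atomicity and repeatable work can be tried out with a small Python sketch against an in-memory SQLite database. It assumes a made-up `files` table and simulates the crash with an exception, so it only shows the rollback behaviour, not hydrus's real schema or a real power loss.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (hash TEXT PRIMARY KEY)")
con.commit()


def import_files(con: sqlite3.Connection, hashes: list, fail_midway: bool = False) -> None:
    """Insert all hashes in one transaction, or none of them."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if an exception escapes it.
        with con:
            for index, file_hash in enumerate(hashes):
                con.execute("INSERT INTO files (hash) VALUES (?)", (file_hash,))
                if fail_midway and index == 0:
                    raise RuntimeError("simulated crash mid-transaction")
    except RuntimeError:
        pass  # the half-finished transaction has already been rolled back


# First attempt dies after one insert, so nothing is kept.
import_files(con, ["aaa", "bbb"], fail_midway=True)
count_after_crash = con.execute("SELECT COUNT(*) FROM files").fetchone()[0]

# Repeating the same work from the start succeeds cleanly.
import_files(con, ["aaa", "bbb"])
count_after_retry = con.execute("SELECT COUNT(*) FROM files").fetchone()[0]
print(count_after_crash, count_after_retry)
```

Because the first attempt left no trace, the retry does not collide with a half-written row.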
"},{"location":"Understanding_Database_Synchronization.html#where_synchronization_comes_in","title":"Where synchronization comes in","text":"Let's revisit the journal, this time with two transactions. Note that the database is syncing on step 8 and thus will have to wait for the OS to write to disk before proceeding, holding up transaction 2, and any other access to the database.
What happens if we remove step 6 and 8 and then die at step 11?
What if we crash and step, End Transaction
has not been written to disk. Now not only do we need to repeat transaction 2, we also need to repeat transaction 1. Note that this just increases the amount of repeatable work, and actually is fully recoverable (assuming a file you were downloading didn't cease to exist in the interim).
Now what happens if we do the above and the OS crashes? The OS is not obligated to write chunks of the database file in the order you give them to it. In fact, for hard drives it is optimal to scatter chunks of the file around the spinning disks, so it might arbitrarily reorder your write calls.
END Transaction
is to flush()
END Transaction
was written before doing more changes is to sync()
.Thus if the OS crashes at the exact wrong moment, there is no way to be sure that the journal is correct if syncing was skipped (synchronous=0
). This means there is no way for you to determine whether the database file is correct after a system crash if you had synchronous 0, and you MUST restore your files from backup as this will be the ONLY WAY to know they are in a known good state.
So, setting synchronous=0
gets you a pretty huge speed boost, but you are gambling that everything goes perfectly and will pay the price of a manual restore every time it doesn't.
The Hydrus docs are built with MkDocs using the Material for MkDocs theme. The .md files in the docs
directory are converted into nice html in the help
directory. This is done automatically in the built releases, but if you run from source, you will want to build your own.
To see or work on the docs locally, install mkdocs-material
:
The recommended installation method is pip
:
pip install mkdocs-material\n
"},{"location":"about_docs.html#building","title":"Building","text":"To build the help, run:
mkdocs build -d help\n
In the base hydrus directory (same as the mkdocs.yml
file), which will build it into the help
directory. You will then be good! Repeat the command and MkDocs will clear out the old directory and rebuild it, so you can fold this into any update script.
"},{"location":"about_docs.html#live_preview","title":"Live Preview","text":"To edit the docs
directory, you can run the live preview development server with:
mkdocs serve \n
Again in the base hydrus directory. It will host the help site at http://127.0.0.1:8000/, and when you change a file, it will automatically rebuild and reload the page in your browser.
"},{"location":"access_keys.html","title":"PTR access keys","text":"The PTR is now run by users with more bandwidth than I had to give, so the bandwidth limits are gone! If you would like to talk with the new management, please check the discord.
A guide and schema for the new PTR is here.
"},{"location":"access_keys.html#first_off","title":"first off","text":"I don't like it when programs I use connect anywhere without asking me, so I have purposely not pre-baked any default repositories into the client. You have to choose to connect yourself. The client will never connect anywhere until you tell it to.
For a long time, I ran the Public Tag Repository myself and was the lone janitor. It grew to 650 million tags, and siblings and parents were just getting complicated, and I no longer had the bandwidth or time it deserved. It is now run by users.
There also used to be just one user account that everyone shared. Everyone was essentially the same Anon, and all uploads were merged to that one ID. As the PTR became more popular, and more sophisticated and automatically generated content was being added, it became increasingly difficult for the janitors to separate good submissions from bad and undo large scale mistakes.
That old shared account is now a 'read-only' account. This account can only download--it cannot upload new tags or siblings/parents. Users who want to upload now generate their own individual accounts, which are still Anon, but separate, which helps janitors approve and deny uploaded petitions more accurately and efficiently.
I recommend using the shared read-only account, below, to start with, but if you decide you would like to upload, making your own account is easy--just click the 'check for automatic account creation' button in services->manage services, and you should be good. You can change your access key on an existing service--you don't need to delete and re-add or anything--and your client should quickly resync and recognise your new permissions.
"},{"location":"access_keys.html#privacy","title":"privacy","text":"I have tried very hard to ensure the PTR respects your privacy. Your account is a very barebones thing--all a server stores is a couple of random hexadecimal texts and which rows of content you uploaded, and even the memory of what you uploaded is deleted after a delay. The server obviously needs to be aware of your IP address to accept your network request, but it forgets it as soon as the job is done. Normal users are never told which accounts submitted any content, so the only privacy implications are against janitors or (more realistically, since the janitor UI is even more buggy and feature-poor than the hydrus front-end!) the server owner or anyone else with raw access to the server as it operates or its database files.
Most users should have very few worries about privacy. The general rule is that it is always healthy to use a VPN, but please check here for a full discussion and explanation of the anonymisation routine.
"},{"location":"access_keys.html#ssd","title":"a note on resources","text":"Danger
If you are on an HDD, or your SSD does not have at least 64GB of free space, do not add the PTR!
The PTR has been operating since 2011 and is now huge, more than a billion mappings! Your client will be downloading and indexing them all, which is currently (2021-06) about 6GB of bandwidth and 50GB of hard drive space. It will take hours of total processing time to catch up on all the years of submissions. Furthermore, because of mechanical drive latency, HDDs are too slow to process all the content in reasonable time. Syncing is only recommended if your hydrus db is on an SSD. Even then, it is healthier and allows the client to 'grow into' the PTR if the work is done in small pieces in the background, either during idle time or shutdown time, rather than trying to do it all at once. Just leave it to download and process on its own--it usually takes a couple of weeks to quietly catch up. You'll see tags appear on your files as it proceeds, first on older, then all the way up to new files just uploaded a couple days ago. Once you are synced, the daily processing work to stay synced is usually just a few minutes. If you leave your client on all the time in the background, you'll likely never notice it.
"},{"location":"access_keys.html#easy_setup","title":"easy setup","text":"Hit help->add the public tag repository and you will all be set up.
"},{"location":"access_keys.html#manually","title":"manually","text":"Hit services->manage services and click add->hydrus tag repository. You'll get a panel, fill it out like this:
Here's the info so you can copy it:
address
ptr.hydrus.network\n
port45871\n
access key4a285629721ca442541ef2c15ea17d1f7f7578b0c3f4f5f2a05f8f0ab297786f\n
Note that because this is the public shared key, you can ignore the 'DO NOT SHARE' red text warning.
It is worth checking the 'test address' and 'test access key' buttons just to double-check your firewall and key are all correct. Notice the 'check for automatic account creation' button, for if and when you decide you want to contribute to the PTR.
Then you can check your PTR at any time under services->review services, under the 'remote' tab:
"},{"location":"access_keys.html#quicksync","title":"jump-starting an install","text":"A user kindly manages a store of update files and pre-processed empty client databases to get you synced quicker. This is generally recommended for advanced users or those following a guide, but if you are otherwise interested, please check it out:
https://cuddlebear92.github.io/Quicksync/
"},{"location":"adding_new_downloaders.html","title":"adding new downloaders","text":""},{"location":"adding_new_downloaders.html#anonymous","title":"all downloaders are user-creatable and -shareable","text":"Since the big downloader overhaul, all downloaders can be created, edited, and shared by any user. Creating one from scratch is not simple, and it takes a little technical knowledge, but importing what someone else has created is easy.
Hydrus objects like downloaders can sometimes be shared as data encoded into png files, like this:
This contains all the information needed for a client to add a realbooru tag search entry to the list you select from when you start a new download or subscription.
You can get these pngs from anyone who has experience in the downloader system. An archive is maintained here.
To 'add' the easy-import pngs to your client, hit network->downloaders->import downloaders. A little image-panel will appear onto which you can drag-and-drop these png files. The client will then decode and go through the png, looking for interesting new objects and automatically import and link them up without you having to do any more. Your only further input on your end is a 'does this look correct?' check right before the actual import, just to make sure there isn't some mistake or other glaring problem.
Objects imported this way will take precedence over existing functionality, so if one of your downloaders breaks due to a site change, importing a fixed png here will overwrite the broken entries and become the new default.
"},{"location":"advanced.html","title":"general clever tricks","text":"this is non-comprehensive
I am always changing and adding little things. The best way to learn is just to look around. If you think a shortcut should probably do something, try it out! If you can't find something, let me know and I'll try to add it!
"},{"location":"advanced.html#advanced_mode","title":"advanced mode","text":"To avoid confusing clutter, several advanced menu items and buttons are hidden by default. When you are comfortable with the program, hit help->advanced mode to reveal them!
"},{"location":"advanced.html#exclude_deleted_files","title":"exclude deleted files","text":"In the client's options is a checkbox to exclude deleted files. It recurs pretty much anywhere you can import, under 'import file options'. If you select this, any file you ever deleted will be excluded from all future remote searches and import operations. This can stop you from importing/downloading and filtering out the same bad files several times over. The default is off. You may wish to have it set one way most of the time, but switch it the other just for one specific import or search.
"},{"location":"advanced.html#ime","title":"inputting non-english languages","text":"If you typically use an IME to input Japanese or another non-english language, you may have encountered problems entering text into the autocomplete tag entry control: you need Up/Down/Enter to navigate the IME, but the autocomplete steals those key presses away to navigate the list of results. To fix this, press Insert to temporarily disable the autocomplete's key event capture. The autocomplete text box will change colour to let you know it has released its normal key capture. Use your IME to get the text you want, then hit Insert again to restore the autocomplete to normal behaviour.
"},{"location":"advanced.html#tag_display","title":"tag display","text":"If you do not like a particular tag or namespace, you can easily hide it with tags->manage tag display and search:
This image is out of date, sorry!
You can exclude single tags, as shown above, or entire namespaces (enter the colon, like 'species:'), or all namespaced tags (use ':'), or all unnamespaced tags (''). 'all known tags' will be applied to everything, as well as any repository-specific rules you set.
A blacklist excludes whatever is listed; a whitelist excludes whatever is not listed.
This censorship is local to your client. No one else will experience your changes or know what you have censored.
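The rule forms above can be modelled with a short Python sketch. This is a guess at the matching logic for illustration only, not the client's actual code; `rule_matches` and `visible_tags` are made-up names.

```python
def rule_matches(rule: str, tag: str) -> bool:
    """Return True if a single display rule covers the given tag."""
    if rule == ":":
        # ':' stands for every namespaced tag
        return ":" in tag
    if rule == "":
        # the empty rule stands for every unnamespaced tag
        return ":" not in tag
    if rule.endswith(":"):
        # 'species:' covers the whole namespace
        return tag.startswith(rule)
    # anything else is an exact single-tag rule
    return tag == rule


def visible_tags(tags: list, rules: list, mode: str = "blacklist") -> list:
    """Blacklist hides what is listed; whitelist hides what is not listed."""
    def listed(tag: str) -> bool:
        return any(rule_matches(rule, tag) for rule in rules)

    if mode == "blacklist":
        return [tag for tag in tags if not listed(tag)]
    return [tag for tag in tags if listed(tag)]


tags = ["species:cat", "blue eyes", "creator:someone"]
print(visible_tags(tags, ["species:"]))            # hides the species namespace
print(visible_tags(tags, [""], mode="whitelist"))  # keeps only unnamespaced tags
```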
"},{"location":"advanced.html#importing_with_tags","title":"importing and adding tags at the same time","text":"Add tags before importing on file->import files lets you give tags to the files you import en masse, and intelligently, using regexes that parse the filename:
This should be somewhat self-explanatory to anyone familiar with regexes. I hate them, personally, but I recognise they are powerful and exactly the right tool to use in this case. This is a good introduction.
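If you have not used regexes before, here is a small Python sketch of the idea. The filename layout, the pattern, and the `series:`/`volume:`/`page:` namespaces are all assumptions picked for the example; the regexes you enter in the import dialog will depend on how your own files are named.

```python
import re

# Assumed layout: "<series name> - v<volume> - p<page>.<extension>"
PATTERN = re.compile(r"^(?P<series>.+?) - v(?P<volume>\d+) - p(?P<page>\d+)\.\w+$")


def tags_from_filename(filename: str) -> list:
    """Turn one well-formatted filename into a list of namespaced tags."""
    match = PATTERN.match(filename)
    if match is None:
        return []  # the filename does not follow the assumed layout
    return [
        f"series:{match['series'].lower()}",
        f"volume:{int(match['volume'])}",  # int() drops zero padding
        f"page:{int(match['page'])}",
    ]


print(tags_from_filename("Example Comic - v02 - p013.jpg"))
```

A filename that does not fit the pattern simply yields no tags, which makes badly named files easy to spot.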
Once you are done, you'll get something neat like this:
Which you can more easily manage by collecting:
Collections have a small icon in the bottom left corner. Selecting them actually selects many files (see the status bar), and performing an action on them (like archiving, uploading) will do so to every file in the collection. Viewing collections fullscreen pages through their contents just like an uncollected search.
Here is a particularly zoomed out view, after importing volume 2:
Importing with tags is great for long-running series with well-formatted filenames, and will save you literally hours of finicky tagging.
"},{"location":"advanced.html#tag_migration","title":"tag migration","text":"Danger
At some point I will write some better help for this system, which is powerful. Be careful with it!
Sometimes, you may wish to move thousands or millions of tags from one place to another. These actions are now collected in one place: services->tag migration.
It proceeds from left to right, reading data from the source and applying it to the destination with the chosen action. There are multiple filters available to select which sorts of tag mappings or siblings or parents will be selected from the source. The source and destination can be the same. For instance, if you wanted to delete all 'clothing:' tags from a service, you would pull all those tags and then apply the 'delete' action on the same service.
You can import from and export to Hydrus Tag Archives (HTAs), which are external, portable .db files. In this way, you can move millions of tags between two hydrus clients, or share with a friend, or import from an HTA put together from a website scrape.
Tag Migration is a powerful system. Be very careful with it. Do small experiments before starting large jobs, and if you intend to migrate millions of tags, make a backup of your db beforehand, just in case it goes wrong.
This system was once much more simple, but it still had HTA support. If you wish to play around with some HTAs, there are some old user-created ones here.
"},{"location":"advanced.html#shortcuts","title":"custom shortcuts","text":"Once you are comfortable with manually setting tags and ratings, you may be interested in setting some shortcuts to do it quicker. Try hitting file->shortcuts or clicking the keyboard icon on any media viewer window's top hover window.
There are two kinds of shortcuts in the program--reserved, which have fixed names, are undeletable, and are always active in certain contexts (related to their name), and custom, which you create and name and edit and are only active in a media viewer when you want them to. You can redefine some simple shortcut commands, but most importantly, you can create shortcuts for adding/removing a tag or setting/unsetting a rating.
Use the same 'keyboard' icon to set the current and default custom shortcuts.
"},{"location":"advanced.html#finding_duplicates","title":"finding duplicates","text":"system:similar_to lets you run the duplicates processing page's searches manually. You can either insert the hash and hamming distance manually, or you can launch these searches automatically from the thumbnail right-click->find similar files menu. For example:
"},{"location":"advanced.html#file_import_errors","title":"truncated/malformed file import errors","text":"Some files, even though they seem ok in another program, will not import to hydrus. This is usually because the file has some 'truncated' or broken data, probably due to a bad upload or storage at some point in its internet history. While sophisticated external programs can usually patch the error (often rendering the bottom lines of a jpeg as grey, for instance), hydrus is not so clever. Please feel free to send or link me, hydrus developer, to these files, so I can check them out on my end and try to fix support.
If the file is one you particularly care about, the easiest solution is to open it in photoshop or gimp and save it again. Those programs should be clever enough to parse the file's weirdness, and then make a nice clean saved file when it exports. That new file should be importable to hydrus.
"},{"location":"advanced.html#password","title":"setting a password","text":"The client offers a very simple password system, enough to keep out noobs. You can set it at database->set a password. It will thereafter ask for the password every time you start the program, and will not open without it. However, none of the database is encrypted, and someone with enough enthusiasm or a tool and access to your computer can still very easily see what files you have. The password is mainly to stop idle snoops checking your images if you are away from your machine.
"},{"location":"advanced_multiple_local_file_services.html","title":"multiple local file services","text":"The client lets you store your files in different overlapping partitions. This can help management workflows and privacy.
"},{"location":"advanced_multiple_local_file_services.html#the_problem","title":"what's the problem?","text":"Most of us end up storing all sorts of things in our clients, often from different parts of our lives. With everything in the same 'my files' domain, some personal photos might be sitting right beside nsfw content, a bunch of wallpapers, and thousands of comic pages. Different processing jobs, like 'go through those old vidya screenshots I imported' and 'filter my subscription files' and 'load up my favourite pictures of babes' all operate on the same gigantic list of files and must be defined through careful queries of tags, ratings, and other file metadata to separate what you want from what you don't.
The problem is aggravated the larger your client grows. When you are trying to sift the 500 art reference images out of 850,000 random internet files from the last ten years, it can be difficult getting good tag counts or just generally browsing around without stumbling across other content. This particularly matters when you are typing in search tags, since the tag you want, 'anatomy drawing guide', is going to come with thousands of others, starting 'a...', 'an...', and 'ana...' as you type. If someone is looking over your shoulder as you load up the images, you want to preserve your privacy.
Wouldn't it be nice if you could break your collection into separate areas?
"},{"location":"advanced_multiple_local_file_services.html#file_domains","title":"multiple file domains","text":"tl;dr: you can have more than one 'my files', add them in 'manage services'.
A file domain (or file service) in the hydrus context, is, very simply, a list of files. There is a bit of extra metadata like the time each file was imported to the domain, and a ton of behind the scenes calculation to accelerate searching and aggregate autocomplete tag counts and so on, but overall, when you search in 'my files', you are telling the client \"find all the files in this list that have tag x, y, z on any tag domain\". If you switch to searching 'trash', you are then searching that list of trashed files.
A search page's tag domain is similar. Normally, you will be set to 'all known tags', which is basically the union of all your tag services, but if you need to, you can search just 'my tags' or 'PTR', which will make your search \"find all the files in my files that have tag x, y, z on my tags\". You are setting up an intersection of a file and a tag domain.
Changing the tag domain to 'PTR' or 'all known tags' would make for a different blue circle with a different intersection of search results ('PTR' probably has a lot more 'pretty dress', although maybe not for your files, and 'all known tags', being the union of all the blue circles, will make the same or larger intersection).
This idea of dynamically intersecting domains is very important to hydrus. Each service stands on its own, and the 'my tags' domain is not linked to 'my files'. It does not care where its tagged files are. When you delete a file, no tags are changed. But when you delete a file, the 'file domain' circle will shrink, and that may change the search results in the intersection.
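If it helps to think of the domains as plain sets, the intersection can be sketched in Python like this. The file identifiers, the example tag, and the `search` helper are invented for illustration; the real client does a great deal more behind the scenes.

```python
# File domains are just lists (here: sets) of file identifiers.
my_files = {"f1", "f2", "f3"}

# Tag domains map each tag to the files it is attached to, wherever they live.
my_tags = {"pretty dress": {"f1", "f4"}}
ptr = {"pretty dress": {"f1", "f2", "f4", "f9"}}


def search(file_domain: set, tag_domains: list, tag: str) -> set:
    """Intersect one file domain with the union of the given tag domains."""
    tagged = set().union(*(domain.get(tag, set()) for domain in tag_domains))
    return file_domain & tagged


only_my_tags = search(my_files, [my_tags], "pretty dress")
all_known_tags = search(my_files, [my_tags, ptr], "pretty dress")

# Deleting a file shrinks the file domain; the tag domain is left untouched.
after_delete = search(my_files - {"f1"}, [my_tags], "pretty dress")
print(only_my_tags, all_known_tags, after_delete)
```

Searching the union of tag domains can only return the same results or more, and removing a file from the file domain changes the results without changing any tags.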
With multiple local file services, you can create new file lists beyond 'my files', letting you make different red circles. You can move and copy files between your local file domains to make new sub-collections and search them separately for a very effective filter.
You can add and remove them under services->manage services:
"},{"location":"advanced_multiple_local_file_services.html#sfw","title":"what does this actually mean?","text":"I think the best simple idea for most regular users is to try a sfw/nsfw split. Make a new 'sfw' local file domain and start adding some images to it. You might eventualy plan to send all your sfw images there, or just your 'IRL' stuff like family photos, but it will be a separate area for whitelisted safe content you are definitely happy for others to glance at.
Search up some appropriate images in your collection and then add them to 'sfw':
This 'add' command is a copy. The files stay in 'my files', but they also go to 'sfw'. You still only have one file on your hard drive, but the database has its identifier in both file lists. Now make a new search page, switch it to 'sfw', and try typing in a search.
The tag results are limited to the files we added to 'sfw'. Nothing from 'my files' bleeds over. The same is true of a file search. Note the times the file was added to 'my files' and 'sfw' are both tracked.
Also note that these files now have two 'delete' commands. You will be presented with more complicated delete and undelete dialogs for files in multiple services. Files only end up in the trash when they are no longer in any local file domain.
You can be happy that any search in this new domain--for tags or files--is not going to provide any unexpected surprises. You can also do 'system:everything', 'system:limit=64' for a random sample, or any other simple search predicate for browsing, and the search should run fast and safe.
If you want to try multiple local file services out, I recommend this split to start off. If you don't like it, you can delete 'sfw' later with no harm done.
Note
While 'add to y' copies the files, 'move from x to y' deletes the files from the original location. They get a delete timestamp (\"deleted from my files 5 minutes ago\"), and they can be undeleted or 'added' back, and they will get their old import timestamp back.
"},{"location":"advanced_multiple_local_file_services.html#using_it","title":"using it","text":"The main way to add and move files around is the thumbnail/media viewer right-click menu.
You can make shortcuts for the add/move operations too. Check file->shortcuts and then the 'media actions' set.
In the future, I expect to have more ways to move files around, particularly integration into the archive/delete filter, and ideally a 'file migration' system that will allow larger operations such as 'add all the files in search x to place y'.
I also expect to write a system to easily merge clients together. Several users already run several different clients to get their 'my files' separation (e.g. a sfw client and a nsfw client), and now we have this tech supported in one client, it makes a lot of efficiency sense to merge them together.
Note that when you select a file domain, you can select 'multiple locations'. This provides the union of whichever domains you like. Tag counts will be correct but imprecise, often something like 'blonde hair (2-5)', meaning 'between two and five files', due to the complexity of quickly counting within these complicated domains.
As soon as you add another local file service, you will also see a 'all my files' service listed in the file domain selector. This is a virtual service that provides a very efficient and accurate search space of the union of all your local file domains.
This whole system is new. I will keep working on it, including better 'at a glance' indications of which files are where (current thoughts are custom thumbnail border colours and little indicator icons). Let me know how you get on with it!
"},{"location":"advanced_multiple_local_file_services.html#meta_file_domains","title":"advanced: a word on the meta file domains","text":"If you are in help->advanced mode, your file search file domain selectors will see 'all known files'. This domain is similar to 'all known tags', but it is not useful for normal browsing. It represents not filtering your tag services by any file list, fetching all tagged file results regardless of what your client knows about them.
If you search 'all known files'/'PTR', you can search all the files the PTR knows about, the vast majority of which you will likely never import. The client will show these files with a default hydrus thumbnail and offer very limited information about them. For file searches, this search domain is only useful for debug and janitorial purposes. You cannot combine 'all known files' with 'all known tags'. It also has limited sibling/parent support.
You can search for deleted files under 'multiple domains' too. These may or may not still be in your client, so they might get the hydrus icon again. You won't need to do this much, but it can be super useful for some maintenance operations like 'I know I deleted this file by accident, what was its URL so I can find it again?'.
Another service is 'all local files'. This is a larger version of 'all my files'. It essentially means 'all the files on your hard disk', which strictly means the union of all the files in your local file domains ('my files' and any others you create, i.e. the 'all my files' domain), 'repository updates' (which stores update files for hydrus repository sync), and 'trash'. This search can be useful for some advanced maintenance jobs.
If you select 'repository updates' specifically, you can inspect this advanced domain, but I recommend you not touch it! Otherwise, if you search 'all local files', repository files are usually hidden from view.
Your client looks a bit like this:
graph TB\n A[all local files] --- B[repository updates]\n A --- C[all my files]\n C --- D[local file domains]\n A --- E[trash]
Repository files, your media, and the trash are actually mutually exclusive. When a file is imported, it is added to 'all local files' and either repository updates or 'all my files' and one or more local file domains. When it is deleted from all of those, it is taken from 'all my files' and moved to trash. When trashed files are cleared, the files are removed from 'trash' and then 'all local files' and thus your hard disk.
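The lifecycle described above can be sketched as a toy model in Python. Everything here is hypothetical (the class name, the method names, and the use of plain sets) and it ignores the real database entirely; it only tries to show how 'all my files' and 'all local files' fall out as unions, and that a file only reaches the trash once it has left every local file domain.

```python
class FileStore:
    # A toy model of the file domains, assuming simple sets of file ids.

    def __init__(self, local_domains=('my files',)):
        self.local_domains = {name: set() for name in local_domains}
        self.repository_updates = set()
        self.trash = set()

    def all_my_files(self):
        # Virtual union of every local file domain.
        result = set()
        for files in self.local_domains.values():
            result |= files
        return result

    def all_local_files(self):
        # Everything on disk: media, repository updates, and trash.
        return self.all_my_files() | self.repository_updates | self.trash

    def add_file(self, file_id, domain='my files'):
        # Adding a file to a local domain pulls it out of the trash if needed.
        self.trash.discard(file_id)
        self.local_domains[domain].add(file_id)

    def delete_file(self, file_id, domain):
        if file_id not in self.local_domains[domain]:
            return
        self.local_domains[domain].remove(file_id)
        # Only trash the file once no local file domain still holds it.
        if file_id not in self.all_my_files():
            self.trash.add(file_id)

    def clear_trash(self):
        # Clearing the trash removes the files from 'all local files' too.
        self.trash.clear()
```

With a 'my files' and a 'sfw' domain, a file that sits in both has to be deleted from both before it shows up in the trash.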
"},{"location":"advanced_multiple_local_file_services.html#advanced","title":"more advanced usage","text":"Warning
Careful! It is easy to construct a massively overcomplicated Mind Palace here that won't actually help you due to the weight of overhead. If you want to categorise things, tags are generally better. But if you do want strict search separations for speed, workflow, or privacy, try this out.
If you put your files through several layers of processing, such as inbox/archive->tags->rating, it might be helpful to create different file domains for each step. I have seen a couple of proposals like this that I think make sense:
graph LR\n A[inbox] --> B[sfw processing]\n A --> C[nsfw processing]\n B --> D[sfw archive]\n C --> E[nsfw archive]
Where the idea would be to make the 'is this sfw/nsfw?' choice early, probably at the same time as archive/delete, and splitting files off to either side before doing tagging and rating. I expect to expand the 'archive/delete' filter to support more actions soon to help make these workflows easy.
File Import Options allows you to specify which service it will import to. You can even import to multiple, although that is probably a bit much. If your inbox filters are overwhelming you--or each other--you might like to have more than one 'landing zone' for your files:
graph LR\n A[subscription and gallery inbox] --> B[archive]\n B --- C[sfw]\n D[watcher inbox] --> B\n E[hard drive inbox] --> B\n F[that zip of cool architecture photos] --> C
Some users have floated the idea of storing your archive on one drive and the inbox on another. This makes a lot of sense for network storage situations--the new inbox could be on a local disk, but the less-accessed archive on cheap network storage. File domains would be a great way to manage this in future, turning the workflow into nice storage commands.
Another likely use of this in future is in the Client API, when sharing with others. If you were to put the files you wanted to share in a file domain, and the Client API were set up to search just on that domain, this would guarantee great privacy. I am still thinking about this, and it may ultimately end up just being something that works that way behind the scenes.
graph LR\n A[inbox] --> B[19th century fishman conspiracy theory evidence]\n A --> C[the mlp x sonic hyperplex]\n A --> D[extremely detailed drawings of hands and feet]\n A --> E[normal stuff]\n E --- F[share with dave]
"},{"location":"advanced_parents.html","title":"tag parents","text":"Tag parents let you automatically add a particular tag every time another tag is added. The relationship will also apply retroactively.
"},{"location":"advanced_parents.html#the_problem","title":"what's the problem?","text":"Tags often fall into certain heirarchies. Certain tags always imply other tags, and it is annoying and time-consuming to type them all out individually every time.
As a basic example, a car is a vehicle. It is a subset. Any time you see a car, you also see a vehicle. Similarly, a rifle is a firearm, face tattoo implies tattoo, and species:pikachu implies species:pok\u00e9mon which also implies series:pok\u00e9mon.
Another way of thinking about this is considering what you would expect to see when you search these terms. If you search vehicle, you would expect the result to include all cars. If you search series:league of legends, you would expect to see all instances of character:ahri (even if, on rare occasion, she were just appearing in cameo or in a crossover).
For hydrus terms, 'character x is in series y' is a common relationship, as is 'costume x is of character y':
graph TB\n C[series:metroid] --- B[character:samus aran] --- A[character:zero suit samus]
In this instance, anything with character:zero suit samus would also have character:samus aran. Anything with character:samus aran (and thus anything with character:zero suit samus) would have series:metroid.
Remember that the reverse is not true. Samus comes inextricably from Metroid, but not everything Metroid is Samus (e.g. a picture of just Ridley).
Even a small slice of these relationships can get complicated:
graph TB\n A[studio:blizzard entertainment]\n A --- B[series:overwatch]\n B --- B1[character:dr. angela 'mercy' ziegler]\n B1 --- B1b[character:pink mercy]\n B1 --- B1c[character:witch mercy]\n B --- B2[character:hana 'd.va' song]\n B2 --- B2b[\"character:d.va (gremlin)\"]\n A --- C[series:world of warcraft]\n C --- C1[character:jaina proudmoore]\n C1 --- C1a[character:dreadlord jaina]\n C --- C2[character:sylvanas windrunner]
Some franchises are bananas:
Also, unlike siblings, which as we previously saw are n->1, some tags have more than one implication (n->n):
graph TB\n A[adjusting clothes] --- B[adjusting swimsuit]\n C[swimsuit] --- B
adjusting swimsuit implies both a swimsuit and adjusting clothes. Consider how adjusting bikini might fit on this chart--perhaps this:
graph TB\n A[adjusting clothes] --- B[adjusting swimsuit]\n A --- E[adjusting bikini]\n C[swimsuit] --- B\n F[bikini] --- E\n D[swimwear] --- C\n D --- F
Note this is not a loop--like with siblings, loops are not allowed--this is a family tree with three 'generations'. adjusting bikini is a child to both bikini and adjusting clothes, and bikini is a child to the new swimwear, which is also a parent to swimsuit. adjusting bikini and adjusting swimsuit are both grandchildren to swimwear.
This can obviously get as complicated and over-engineered as you like, but be careful of being too confident. Reasonable people disagree on what is 'clearly' a parent or sibling, or what is an excessive level of detail (e.g. person:scarlett johansson may be gender:female, if you think that useful, but species:human, species:mammal, and species:animal may be going a little far). Beyond its own intellectual neatness, ask yourself the purpose of what you are creating.
Of course you can create any sort of parent tags on your local tags or your own tag repositories, but this sort of thing can easily lead to arguments between reasonable people on a shared server like the PTR.
Just like with normal tags, try not to create anything 'perfect' or stray away from what you actually search with, as it usually ends up wasting time. Act from need, not toward purpose.
"},{"location":"advanced_parents.html#tag_parents","title":"tag parents","text":"Let's define the child-parent relationship 'C->P' as saying that tag P is the semantic superset/superclass of tag C. All files that have C should also have P, without exception.
Any file that has C should appear to have P. Any search for P will include all of C implicitly.
Tags can have multiple parents, and multiple tags can have the same parent. Loops are not allowed.
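To make the C->P rules a little more concrete, here is a small Python sketch. It is only an illustration under simple assumptions (a plain dictionary of child to direct parents, with made-up function names), not the client's real implementation. It collects every parent, grandparent and so on for a tag, shows what a file would appear to have, and checks whether a new pair would create a forbidden loop.

```python
# Hypothetical child -> direct parents mapping, using pairs from the examples.
parents = {
    'character:zero suit samus': {'character:samus aran'},
    'character:samus aran': {'series:metroid'},
    'adjusting swimsuit': {'swimsuit', 'adjusting clothes'},
    'swimsuit': {'swimwear'},
}


def implied_tags(tag):
    # Walk upwards and collect every ancestor of the tag.
    seen = set()
    stack = [tag]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def displayed_tags(hard_tags):
    # The stored tags are left alone; parents are only added for display.
    shown = set(hard_tags)
    for tag in hard_tags:
        shown |= implied_tags(tag)
    return shown


def would_create_loop(child, parent):
    # child->parent is a loop if child is already an ancestor of parent.
    return child == parent or child in implied_tags(parent)
```

For example, a file tagged only with adjusting swimsuit would display swimsuit, adjusting clothes, and swimwear as well.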
Note
In hydrus, tag parents are virtual. P is not actually added to every file by C, it just appears as if it is. When you look at a file in manage tags, you will see the implication, just like you see how tags will be renamed by siblings, but you won't see the parent unless it actually happens to also be there as a 'hard' tag. If you remove a C->P parent relationship, all the implied P tags will disappear!
It also takes a bunch of CPU to figure this stuff out. Please bear with this system, sometimes it can take time.
"},{"location":"advanced_parents.html#how_to_do_it","title":"how you do it","text":"Go to tags->manage tag parents:
Which looks and works just like the manage tag siblings dialog.
Note that when you hit ok, the client will look up all the files with all your added tag Cs and retroactively apply/pend the respective tag Ps if needed. This could mean thousands of tags!
Once you have some relationships added, the parents and grandparents will show indented anywhere you 'write' tags, such as the manage tags dialog:
"},{"location":"advanced_parents.html#remote_parents","title":"remote parents","text":"Whenever you add or remove a tag parent pair to a tag repository, you will have to supply a reason (like when you petition a tag). A janitor will review this petition, and will approve or deny it. If it is approved, all users who synchronise with that tag repository will gain that parent pair. If it is denied, only you will see it.
"},{"location":"advanced_parents.html#parent_favourites","title":"parent 'favourites'","text":"As you use the client, you will likely make several processing workflows to archive/delete your different sorts of imports. You don't always want to go through things randomly--you might want to do some big videos for a bit, or focus on a particular character. A common search page is something like [system:inbox, creator:blah, limit:256]
, which will show a sample of a creator in your inbox, so you can process just that creator. This is easy to set up and save in your favourite searches and quick to run, so you can load it up, do some archive/delete, and then dismiss it without too much hassle.
But what happens if you want to search for multiple creators? You might be tempted to make a large OR search predicate, like creator:aaa OR creator:bbb OR creator:ccc OR creator:ddd, of all your favourite creators so you can process them together as a 'premium' group. But if you want to add or remove a creator from that long OR, it can be cumbersome. And OR searches can just run slow sometimes. One answer is to use the new tag parents tools to apply a 'favourite' parent on all the artists and then search for that favourite.
Let's assume you want to search a bunch of 'creator' tags on the PTR. What you will do is:
Under tags->manage tag parents, on your 'my parent favourites' service, add:
creator:aaa->favourite:aesthetic art
creator:bbb->favourite:aesthetic art
creator:ccc->favourite:aesthetic art
creator:ddd->favourite:aesthetic art
Watch/wait a few seconds for the parents to apply across the PTR for those creator tags.
Then save a new favourite search of [system:inbox, favourite:aesthetic art, limit:256]. This search will deliver results with any of the child 'creator' tags, just like a big OR search, and real fast!
If you want to add or remove any creators to the 'aesthetic art' group, you can simply go back to tags->manage tag parents, and it will apply everywhere. You can create more umbrella/group tags if you like (and not just creators--think about clothing, or certain characters), and also use them in regular searches when you just want to browse some cool files.
"},{"location":"advanced_siblings.html","title":"tag siblings","text":"Tag siblings let you replace a bad tag with a better tag.
"},{"location":"advanced_siblings.html#the_problem","title":"what's the problem?","text":"Reasonable people often use different words for the same things.
A great example is in Japanese names, which are natively written surname first. character:ayanami rei and character:rei ayanami have the same meaning, but different users will use one, or the other, or even both.
Other examples are tiny syntactic changes, common misspellings, and unique acronyms:
A particular repository may have a preferred standard, but it is not easy to guarantee that all the users will know exactly which tag to upload or search for.
After some time, you get this:
Without continual intervention by janitors or other experienced users to make sure y\u2287x (i.e. making the yellow circle entirely overlap the blue by manually giving y to everything with x), searches can only return x (blue circle) or y (yellow circle) or x\u2229y (the lens-shaped overlap). What we really want is x\u222ay (both circles).
So, how do we fix this problem?
"},{"location":"advanced_siblings.html#tag_siblings","title":"tag siblings","text":"Let's define a relationship, A->B, that means that any time we would normally see or use tag A or tag B, we will instead only get tag B:
Note that this relationship implies that B is in some way 'better' than A.
"},{"location":"advanced_siblings.html#more_complicated","title":"ok, I understand; now confuse me","text":"This relationship is transitive, which means as well as saying A->B
, you can also say B->C
, which implies A->C
and B->C
.
graph LR\n A[lena_oxton] --> B[lena oxton] --> C[character:tracer];
In this case, everything with 'lena_oxton' or 'lena oxton' will show 'character:tracer' instead.
You can also have an A->C and B->C that does not include A->B.
graph LR\n A[d.va] --> C[character:hana 'd.va' song]\n B[hana song] --> C
The outcome of these two arrangements is the same--everything ends up as C.
Many complicated arrangements are possible (and inevitable, as we try to merge many different communities' ideal tags):
graph LR\n A[angela_ziegler] --> B[angela ziegler] --> I[character:dr. angela 'mercy' ziegler]\n C[\"angela_ziegler_(overwatch)\"] --> B\n D[character:mercy] --> I\n E[\"character:mercy (overwatch)\"] --> I\n F[dr angela ziegler] --> I\n G[\"character:\u30de\u30fc\u30b7\u30fc\uff08\u30aa\u30fc\u30d0\u30fc\u30a6\u30a9\u30c3\u30c1\uff09\"] --> E\n H[overwatch mercy] --> I
Note that if you say A->B, you cannot also say A->C. This is an n->1 relationship. Many things can point to a single ideal, but a tag cannot have more than one ideal. Also, obviously, these graphs are non-cyclic--no loops.
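The n->1 and no-loop rules can be sketched in a few lines of Python. This is just a toy model that assumes siblings are held in a simple worse-to-better dictionary; the function names are invented for the example and are not part of hydrus.

```python
# Hypothetical worse -> better mapping; each tag has at most one better tag.
siblings = {
    'lena_oxton': 'lena oxton',
    'lena oxton': 'character:tracer',
}


def ideal(tag, mapping):
    # Follow the chain A->B->C until we reach a tag with nothing better.
    seen = {tag}
    while tag in mapping:
        tag = mapping[tag]
        if tag in seen:
            raise ValueError('sibling loop detected')
        seen.add(tag)
    return tag


def add_sibling(mapping, worse, better):
    # n->1: a tag that already points somewhere cannot point elsewhere too.
    if worse in mapping and mapping[worse] != better:
        raise ValueError('tag already has a different sibling')
    # No loops: the better tag must not already collapse to the worse tag.
    if ideal(better, mapping) == worse:
        raise ValueError('this pair would create a loop')
    mapping[worse] = better
```

Here both 'lena_oxton' and 'lena oxton' resolve to 'character:tracer', and a tag with no sibling simply resolves to itself.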
Just open tags->manage tag siblings, and add a few.
The client will automatically collapse the tagspace to whatever you set. It'll even work with autocomplete, like so:
Please note that siblings' autocomplete counts may be slightly inaccurate, as unioning the count is difficult to quickly estimate.
The client will not collapse siblings anywhere you 'write' tags, such as the manage tags dialog. You will be able to add or remove A as normal, but it will be written in some form of \"A (B)\" to let you know that, ultimately, the tag will end up displaying in the main gui as B:
Although the client may present A as B, it will secretly remember A! You can remove the association A->B, and everything will return to how it was. No information is lost at any point.
"},{"location":"advanced_siblings.html#remote_siblings","title":"remote siblings","text":"Whenever you add or remove a tag sibling pair to a tag repository, you will have to supply a reason (like when you petition a tag). A janitor will review this petition, and will approve or deny it. If it is approved, all users who synchronise with that tag repository will gain that sibling pair. If it is denied, only you will see it.
"},{"location":"advanced_sidecars.html","title":"sidecars","text":"Sidecars are files that provide additional metadata about a master file. They typically share the same basic filename--if the master is 'Image_123456.jpg', the sidecar will be something like 'Image_123456.txt' or 'Image_123456.jpg.json'. This obviously makes it easy to figure out which sidecar goes with which file.
Hydrus does not use sidecars in its own storage, but it can import data from them and export data to them. It currently supports raw data in .txt files and encoded data in .json files, and that data can be either tags or URLs. I expect to extend this system in future to support XML and other metadata types such as ratings, timestamps, and inbox/archive status.
We'll start with .txt, since they are simpler.
"},{"location":"advanced_sidecars.html#importing_sidecars","title":"Importing Sidecars","text":"Imagine you have some jpegs you downloaded with another program. That program grabbed the files' tags somehow, and you want to import the files with their tags without messing around with the Client API.
If your extra program can export the tags to a simple format--let's say newline-separated .txt files with the same basic filename as the jpegs, or you can, with some very simple scripting, convert to that format--then importing them to hydrus is easy!
Put the jpegs and the .txt files in the same directory and then drag and drop the directory onto the client, as you would for a normal import. The .txt files should not be added to the list. Then click 'add tags/urls with the import'. The sidecars are managed on one of the tabs:
This system can get quite complicated, but the essential idea is that you are selecting one or more sidecar sources, parsing their text, and sending that list of data to one hydrus service destination. Most of the time you will be pulling from just one sidecar at a time.
The source is a description of a sidecar to load and how to read what it contains.
In this example, the texts are like so:
4e01850417d1978e6328d4f40c3b550ef582f8558539b4ad46a1cb7650a2e10b.jpg.txt: flowers\nlandscape\nblue sky\n
5e390f043321de57cb40fd7ca7cf0cfca29831670bd4ad71622226bc0a057876.jpg.txt: fast car\nanime girl\nnight sky\n
Since our sidecars in this example are named (filename.ext).txt, and use newlines as the separator character, we can leave things mostly as default.
If you do not have newline-separated tags, for instance comma-separated tags (flowers, landscape, blue sky), then you can set that here. Be careful if you are making your own sidecars, since any separator character obviously cannot be used in tag text!
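If you are producing or checking sidecars with your own script, the naming and separator rules above can be sketched like this in Python. The function names are made up for the example, and this is only a rough model of what the import dialog does with the defaults, not the client's code.

```python
from pathlib import Path


def sidecar_path(media_path, include_ext=True, suffix='.txt'):
    # (filename.ext).txt by default, or (filename).txt if include_ext is False.
    media_path = Path(media_path)
    if include_ext:
        return media_path.with_name(media_path.name + suffix)
    return media_path.with_suffix(suffix)


def parse_sidecar_text(text, separator='\n'):
    # Split on the separator, trim whitespace, and drop empty entries.
    return [part.strip() for part in text.split(separator) if part.strip()]


def with_prefix(tags, prefix):
    # Example of extra string processing, such as adding 'creator:'.
    return [prefix + tag for tag in tags]
```

Calling parse_sidecar_text with separator=',' handles the comma-separated case, as long as no tag contains a comma itself.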
If your sidecars are named (filename).txt instead of (filename.ext).txt, then just hit the checkbox, but if the conversion is more complicated, then play around with the filename string converter and the test boxes.
If you need to, you can further process the texts that are loaded. They'll be trimmed of extra whitespace and so on automatically, so no need to worry about that, but if you need to, let's say, add the creator: prefix to everything, or filter out some mis-parsed garbage, this is the place.
A 'Router' is a single set of orders to grab from one or more sidecars and send to a destination. You can have several routers in a single import or export context.
You can do more string processing here, and it will apply to everything loaded from every sidecar.
The destination is either a tag service (adding the loaded strings as tags), or your known URLs store.
"},{"location":"advanced_sidecars.html#previewing","title":"Previewing","text":"Once you have something set up, you can see the results are live-loaded in the dialog. Make sure everything looks all correct, and then start the import as normal and you should see the tags or URLs being added as the import works.
It is good to try out some simple situations with one or two files just to get a feel for the system.
"},{"location":"advanced_sidecars.html#import_folders","title":"Import Folders","text":"If you have a constant flow of sidecar-attached media, then you can add sidecars to Import Folders too. Do a trial-run of anything you want to parse with a manual import before setting up the automatic system.
"},{"location":"advanced_sidecars.html#exporting_sidecars","title":"Exporting Sidecars","text":"The rules for exporting are similar, but now you are pulling from one or more hydrus service sources
and sending to a single destination
sidecar every time. Let's look at the UI:
I have chosen to select these files' URLs and send them to newline-separated .urls.txt files. If I wanted to get the tags too, I could pull from one or more tag services, filter and convert the tags as needed, and then output to a .tags.txt file.
The best way to learn with this is just to experiment. The UI may seem intimidating, but most jobs don't need you to work with multiple sidecars or string processing or clever filenames.
"},{"location":"advanced_sidecars.html#json_files","title":"JSON Files","text":"JSON is more complicated than .txt. You might have multiple metadata types all together in one file, so you may end up setting up multiple routers that parse the same file for different content, or for an export you might want to populate the same export file with multiple kinds of content. Hydrus can do it!
"},{"location":"advanced_sidecars.html#importing","title":"Importing","text":"Since JSON files are richly structured, we will have to dip into the Hydrus parsing system:
If you have made a downloader before, you will be familiar with this. If not, then you can brave the help or just have a play around with the UI. In this example, I am getting the URL(s) of each JSON file, which are stored in a list under the file_info_urls key.
It is important to paste an example JSON file that you want to parse into the parsing testing area (click the paste button) so you can test on real data live.
Once you have the parsing set up, the rest of the sidecar UI is the same as for .txt. The JSON Parsing formula is just the replacement/equivalent for the .txt 'separator' setting.
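If you want to sanity-check a JSON sidecar outside the client before importing, a quick Python sketch like the following does the same kind of lookup. It assumes the sidecar is a single JSON Object with the URLs stored under a top-level file_info_urls key, as in the example; the function name is made up, and the real parsing formula in the UI is more flexible than this.

```python
import json


def urls_from_sidecar(json_text, key='file_info_urls'):
    # Load the sidecar and pull the list stored under the given key.
    data = json.loads(json_text)
    if not isinstance(data, dict):
        return []
    value = data.get(key, [])
    # Tolerate a single string where a list was expected.
    if isinstance(value, str):
        value = [value]
    return [item for item in value if isinstance(item, str)]
```

A second lookup with a different key would be the equivalent of a second Router reading the same file.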
Note that you could set up a second Router to import the tags from this file!
"},{"location":"advanced_sidecars.html#exporting","title":"Exporting","text":"In Hydrus, the exported JSON is typically a nested Object with a similar format as in the Import example. You set the names of the Object keys.
Here I have set the URLs of each file to be stored under metadata->urls, which will make this sort of structure:
{\n \"metadata\" : {\n \"urls\" : [\n \"http://example.com/123456\",\n \"https://site.org/post/45678\"\n ]\n }\n}\n
The cool thing about JSON files is I can export multiple times to the same file and it will update it! Let's say I made a second Router that grabbed the tags, and it was set to export to the same filename but under metadata->tags. The final sidecar would look like this:
{\n \"metadata\" : {\n \"tags\" : [\n \"blonde hair\",\n \"blue eyes\",\n \"skirt\"\n ],\n \"urls\" : [\n \"http://example.com/123456\",\n \"https://site.org/post/45678\"\n ]\n }\n}\n
You should be careful that the location you are exporting to does not have any old JSON files with conflicting filenames in it--hydrus will update them, not overwrite them! This may be an issue if you have a synchronising Export Folder that exports random files with the same filenames.
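The 'update, not overwrite' behaviour can be pictured with a short Python sketch. This is an assumption-laden illustration (the function name, the key path argument, and the indent and sort settings are all invented here), not the exporter itself: it loads any existing file, sets one nested key, and writes the merged result back.

```python
import json
from pathlib import Path


def export_to_sidecar(path, key_path, values):
    path = Path(path)
    # Start from whatever is already in the file, if it exists.
    if path.exists():
        document = json.loads(path.read_text(encoding='utf-8'))
    else:
        document = {}
    # Walk down to the parent Object, creating levels as needed.
    node = document
    for key in key_path[:-1]:
        node = node.setdefault(key, {})
    # Replace only this key; sibling keys from earlier exports survive.
    node[key_path[-1]] = list(values)
    path.write_text(json.dumps(document, indent=4, sort_keys=True), encoding='utf-8')
    return document
```

Exporting URLs under metadata->urls and then tags under metadata->tags to the same path leaves both lists in the file, which is also why a stale file with the same name would keep its old keys.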
"},{"location":"advanced_sidecars.html#note_on_notes","title":"Note on Notes","text":"You can now import/export notes with your sidecars. Since notes have two variables--name and text--but the sidecars system only supports lists of single strings, I merge these together! If you export notes, they will output in the form 'name: text'. If you want to import notes, arrange them in the same form, 'name: text'.
If you do need to select a particular note out of many, see if a String Match (regex ^name:) in the String Processor will do it.
If you need to work with multiple notes that have newlines, I recommend you use JSON rather than txt. If you have to use txt on multiple multi-paragraph-notes, then try a different separator than newline. Go for |||| or something, whatever works for your job.
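If you are preparing note sidecars with a script, the 'name: text' convention and the custom separator idea might look like this in Python. The helper names are hypothetical, and the sketch assumes the note name itself never contains a colon followed by a space.

```python
import re


def note_to_string(name, text):
    # Merge the two note variables into the single 'name: text' form.
    return f'{name}: {text}'


def string_to_note(value):
    # Split on the first ': ' only, so the text may contain more colons.
    name, found, text = value.partition(': ')
    if not found:
        raise ValueError('expected the form name: text')
    return name, text


def select_notes(strings, name):
    # Same idea as a String Match with the regex ^name:
    pattern = re.compile('^' + re.escape(name) + ':')
    return [value for value in strings if pattern.search(value)]


def split_notes(text, separator='||||'):
    # A non-newline separator keeps multi-paragraph notes in one piece.
    return [part.strip() for part in text.split(separator) if part.strip()]
```

Splitting on '||||' leaves the newlines inside each note alone, so a two-paragraph note survives as one string.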
Depending on how awkward this all is, I may revise it.
"},{"location":"after_disaster.html","title":"Recovering After Disaster","text":""},{"location":"after_disaster.html#you_just_had_a_database_problem","title":"you just had a database problem","text":"I have helped quite a few users recover a mangled database from disk failure or accidental deletion. You just had similar and have been pointed here. This is a simple spiel on the next step that I, hydev, like to give people once we are done.
"},{"location":"after_disaster.html#what_next","title":"what next?","text":"When I was younger, I lost a disk with about 75,000 curated files. It really sucks to go through, and whether you have only had a brush with death or lost tens or hundreds of thousands of files, I know exactly how you have been feeling. The only thing you can change now is the future. Let's make sure it does not happen again.
The good news is the memory of that sinking 'oh shit' feeling is a great motivator. You don't want to feel that way again, so use that to set up and maintain a proper backup regime. If you have a good backup, the worst case scenario, even if your whole computer blows up, is usually just a week's lost work.
So, plan to get a good external USB drive and figure out a backup script and a reminder to ensure you never forget to run it. Having a 'backup day' in your schedule works well, and you can fold in other jobs like computer updates and restarts at the same time. It takes a bit of extra 'computer budget' every year and a few minutes a week, but it is absolutely worth the peace of mind it brings.
Here's the how to backup help, if you want to revisit it. If you would like help setting up FreeFileSync or ToDoList or other similar software, let me know.
This is also a great time to think about backing up other things in your life. All of your documents, family photos, your password manager file--are they backed up? Would you be ok with losing them if their drive failed tomorrow? Movies and music will need a real drive, but your smaller things like documents can also fit on an (encrypted) USB stick that you can put in your wallet or keychain.
"},{"location":"changelog.html","title":"changelog","text":"Note
This is the new changelog, only the most recent builds. For all versions, see the old changelog.
"},{"location":"changelog.html#version_552","title":"Version 552","text":""},{"location":"changelog.html#misc","title":"misc","text":"false
for a while, until the file maintenance catches up/manage_database/get_client_options
call that fetches a heap of different client options. this exposes a mess that may change with any update, but there may be something neat you can hook into. this week we fixed a thing that was breaking this call for probably all old clientssystem:date
predicates were displaying labels an hour off (usually midnight -> 11pm, thus cycling back to the previous day) thanks to the clocks changing (in the USA) last weekend. I suspect there is more of this, here and there, so let me know what you see(t)est
Qt version in the 'setup_venv' now points to this. it seems fine to me on a fairly normal Win 11 machine, but if recent history is any guide, there's going to be a niggle somewhere. if you have been waiting for a fix on the menu position issue or anything else, give it a go! if things go well, I'll roll this into a larger 'future' test release and then we'll integrate it into main(w)rite
your own version in!distutils
, and thus should now be compatible (or less incompatible, let's see, ha ha) with python 3.12. thanks for the user report and assistance hereauto_update_installer.bat
, to the main install directory. it will download the latest Windows exe installer using winget and install it to the current location. if you use the installer, you might want to experiment with it (make a backup first!) as an easy hands-free update solution. let me know how it goes, and if there are no problems in a couple of weeks, I'll add it to the helpversion
and hydrus_version
in every JSON Client API response. CBOR responses are not affected. if you need to hook into these numbers for a completely stateless interface, it is now super convenient. I'm not delighted with the spamminess of this, but it is just a handful of characters and it adds value for several situations, so I'm willing to try it outHydrusImageMetadata
fileHydrusBlurhash
fileHydrusImageNormalisation
fileHydrusImageColours
fileOPENCV_OK
fallback code, which was only used, superfluously, in a couple of final places. OpenCV is not optional to run hydrus, server or clientfile_metadata
call now says the new blurhash. if you pipe it into a blurhash library and blow it up to an appropriate ratio canvas, it should just work. the typical use is as a placeholder while you wait for thumbs/files to downloadinclude_blurhash
parameter will include the blurhash when only_return_basic_information
is truefile_metadata
also shows the file's pixel_hash
now. the algorithm here is proprietary to hydrus, but you can throw it into 'system:similar files' to find pixel dupes. I expect to add perceptual hashes tooPillow
library, which also rolled out a fix. I'm not sure how vulnerable hydrus ever was, since we are usually jank about how we do anything, but best to be safe about these things. there were apparently exploits for this floating aroundPillow
migrate database
dialog now allows you to set a 'max size' for all but one of your media locations. if you have a 500GB drive you want to store some stuff on, you no longer have to balance the weights in your head--just set a max size of 450GB and hydrus will figure it out for you. it is not super precise (and it isn't healthy to fill drives up to 98% anyway), so make sure you leave some padding/get_files/render
command, which gives you a 100% zoom png render of the given file. useful if you want to display a PSD on a web page!/get_files/search_files
, the help talks about it. He also cancels his work early if the request is terminated/add_tags/get_siblings_and_parents
now properly cleans the tags you give it, trimming whitespace and lowercasing letters and so ondateparser
library. all old 'datestring to timestamp' rules remain as they are, but are now called '(advanced)'. a new option, 'datestring to timestamp (easy)', which has exactly zero variables to fiddle with, just eats up pretty much any date string you can think of, including timezone conversions, and even stuff like '2 hours ago'. you need the dateparser library for this to work, so if you run from source, you might like to rebuild your venv this week. your dateparser
import status is in help->about
The hydrus client now supports a very simple API so you can access it with external programs.
"},{"location":"client_api.html#enabling_the_api","title":"Enabling the API","text":"By default, the Client API is not turned on. Go to services->manage services and give it a port to get it started. I recommend you not allow non-local connections (i.e. only requests from the same computer will work) to start with.
The Client API should start immediately. It will only be active while the client is open. To test that it is running correctly (and assuming you used the default port of 45869), try loading this:
http://127.0.0.1:45869
You should get a welcome page. By default, the Client API is HTTP, which means it is ok for communication on the same computer or across your home network (e.g. your computer's web browser talking to your computer's hydrus), but not secure for transmission across the internet (e.g. your phone to your home computer). You can turn on HTTPS, but due to technical complexities it will give itself a self-signed 'certificate', so the security is good but imperfect, and whatever is talking to it (e.g. your web browser looking at https://127.0.0.1:45869) may need to add an exception.
The Client API is still experimental and sometimes not user friendly. If you want to talk to your home computer across the internet, you will need some networking experience. You'll need a static IP or reverse proxy service or dynamic domain solution like no-ip.org so your device can locate it, and potentially port-forwarding on your router to expose the port. If you have a way of hosting a domain and have a signed certificate (e.g. from Let's Encrypt), you can overwrite the client.crt and client.key files in your 'db' directory and HTTPS hydrus should host with those.
Once the API is running, go to its entry in services->review services. Each external program trying to access the API will need its own access key, which is the familiar 64-character hexadecimal used in many places in hydrus. You can enter the details manually from the review services panel and then copy/paste the key to your external program, or the program may have the ability to request its own access while a mini-dialog launched from the review services panel waits to catch the request.
"},{"location":"client_api.html#tools_created_by_hydrus_users","title":"Tools created by hydrus users","text":""},{"location":"client_api.html#browser_add-on","title":"Browser Add-on","text":"I welcome all your bug reports, questions, ideas, and comments. It is always interesting to see how other people are using my software and what they generally think of it. Most of the changes every week are suggested by users.
You can contact me by email, twitter, discord, or the release threads on 8chan or Endchan--I do not mind which. Please know that I have difficulty with social media, and while I try to reply to all messages, it sometimes takes me a while to catch up.
If you need it, here's my public GPG key.
The Github Issue Tracker was turned off for some time, as it did not fit my workflow and I could not keep up, but it is now running again, managed by a team of volunteer users. Please feel free to submit feature requests there if you are comfortable with Github. I am not socially active on Github, please do not ping me there.
I am on the discord on Saturday afternoon, USA time, if you would like to talk live, and briefly on Wednesday after I put the release out. If that is not a good time for you, please leave me a DM and I will get to you when I can. There are also plenty of other hydrus users who idle who can help with support questions.
I delete all tweets and resolved email conversations after three months. So, if you think you are waiting for a reply, or I said I was going to work on something you care about and seem to have forgotten, please do nudge me.
I am always overwhelmed by work and behind on my messages. This is not to say that I do not enjoy just hanging out or talking about possible new features, but forgive me if some work takes longer than expected or if I cannot get to a particular idea quickly. In the same way, if you encounter actual traceback-raising errors or crashes, there is only one guy to fix it, so I prefer to know ASAP so I can prioritise.
I work by myself because I have acute difficulty working with others. Please do not spontaneously write long design documents or prepare other work for me--I find it more stressful than helpful, every time, and I won't give it the attention it deserves. If you would like to contribute time to hydrus, the user projects like the downloader repository and wiki help guides always have things to do.
That said:
Warning
I am working on this system right now and will be moving the 'move files now' action to a more granular, always-on background migration. This document will update to reflect those changes!
"},{"location":"database_migration.html#database_migration","title":"database migration","text":""},{"location":"database_migration.html#intro","title":"the hydrus database","text":"A hydrus client consists of three components:
the software installation
This is the part that comes with the installer or extract release, with the executable and dlls and a handful of resource folders. It doesn't store any of your settings--it just knows how to present a database as a nice application. If you just run the hydrus_client executable straight, it looks in its 'db' subdirectory for a database, and if one is not found, it creates a new one. If it sees a database running at a lower version than itself, it will update the database before booting it.
It doesn't really matter where you put this. An SSD will load it marginally quicker the first time, but you probably won't notice. If you run it without command-line parameters, it will try to write to its own directory (to create the initial database), so if you mean to run it like that, it should not be in a protected place like Program Files.
the actual SQLite database
The client stores all its preferences and current state and knowledge about files--like file size and resolution, tags, ratings, inbox status, and so on and on--in a handful of SQLite database files, defaulting to install_dir/db. Depending on the size of your client, these might total 1MB in size or be as much as 10GB.
In order to perform a search or to fetch or process tags, the client has to interact with these files in many small bursts, which means it is best if these files are on a drive with low latency. An SSD is ideal, but a regularly-defragged HDD with a reasonable amount of free space also works well.
your media files
All of your jpegs and webms and so on (and their thumbnails) are stored in a single complicated directory that is by default at install_dir/db/client_files. All the files are named by their hash and stored in efficient hash-based subdirectories. In general, it is not navigable by humans, but it works very well for the fast access from a giant pool of files the client needs to do to manage your media.
Thumbnails tend to be fetched dozens at a time, so it is, again, ideal if they are stored on an SSD. Your regular media files--which on many clients total hundreds of GB--are usually fetched one at a time for human consumption and do not benefit from the expensive low-latency of an SSD. They are best stored on a cheap HDD, and, if desired, also work well across a network file system.
Although an initial install will keep these parts together, it is possible to, say, run the SQLite database on a fast drive but keep your media in cheap slow storage. This is an excellent arrangement that works for many users. And if you have a very large collection, you can even spread your files across multiple drives. It is not very technically difficult, but I do not recommend it for new users.
Backing such an arrangement up is obviously more complicated, and the internal client backup is not sophisticated enough to capture everything, so I recommend you figure out a broader solution with a third-party backup program like FreeFileSync.
"},{"location":"database_migration.html#pulling_media_apart","title":"pulling your media apart","text":"Danger
As always, I recommend creating a backup before you try any of this, just in case it goes wrong.
If you would like to move your files and thumbnails to new locations, I generally recommend you not move their folders around yourself--the database has an internal knowledge of where it thinks its file and thumbnail folders are, and if you move them while it is closed, it will become confused.
Missing Locations
If your folders are in the wrong locations on a client boot, a repair dialog appears, and you can manually update the client's internal understanding. This is not impossible to figure out, and in some tricky storage situations doing this on purpose can be faster than letting the client migrate things itself, but generally it is best and safest to do everything through the dialog.
Go database->migrate database, giving you this dialog:
The buttons let you add more locations and remove old ones. The operations on this dialog are simple and atomic--at no point is your db ever invalid.
Beneath db? means that the path is beneath the main db dir and so is stored internally as a relative path. Portable paths will still function if the database changes location between boots (for instance, if you run the client from a USB drive and it mounts under a different location).
Weight means the relative amount of media you would like to store in that location. It only matters if you are spreading your files across multiple locations. If location A has a weight of 1 and B has a weight of 2, A will get approximately one third of your files and B will get approximately two thirds.
Max Size means the max total size of files the client will want to store in that location. Again, it only matters if you are spreading your files across multiple locations, but it is a simple way to ensure you don't go over a particular smaller hard drive's size. One location must always be limitless. This is not precise, so give it some padding. When one location is maxed out, the remaining locations will distribute the remainder of the files according to their respective weights. For the meantime, this will not update by itself. If you import many files, the location may go over its limit and you will have to revisit 'migrate database' to rebalance your files again. Bear with me--I will fix this soon with the background migrate.
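The weight and max size arithmetic described above can be sketched in a few lines. This is only an illustration of the proportions, not the client's actual balancing code; the function name `ideal_split` and the shape of its input are hypothetical.

```python
def ideal_split(total_bytes, locations):
    """Split total_bytes across locations by weight, honouring max sizes.

    locations maps a path to (weight, max_bytes), where max_bytes is None
    for a limitless location. Illustrative only: this is not hydrus's code.
    """
    result = {path: 0 for path in locations}
    remaining = total_bytes
    open_paths = set(locations)
    while remaining > 0 and open_paths:
        total_weight = sum(locations[p][0] for p in open_paths)
        # Locations whose weighted share would push them past their cap.
        capped = [
            p for p in open_paths
            if locations[p][1] is not None
            and result[p] + remaining * locations[p][0] / total_weight > locations[p][1]
        ]
        if not capped:
            # Nobody overflows, so hand out the rest purely by weight.
            for p in open_paths:
                result[p] += remaining * locations[p][0] / total_weight
            remaining = 0
        else:
            # Fill the capped locations, then share the rest among the others.
            for p in capped:
                room = locations[p][1] - result[p]
                result[p] += room
                remaining -= room
                open_paths.discard(p)
    return result


# Weights 1 and 2 with no caps: one third and two thirds.
print(ideal_split(300, {"A": (1, None), "B": (2, None)}))
# A is capped at 50, so B (limitless) takes the remainder.
print(ideal_split(300, {"A": (1, 50), "B": (2, None)}))
```

As in the dialog, the sketch assumes at least one location is limitless; without one, any overflow is simply left unassigned.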
Let's set up an example move:
I made several changes:
C:\\hydrus_files to store files.
D:\\hydrus_files to store files, with a max size of 128MB.
C:\\hydrus_thumbs as the location to store thumbnails.
C:\\Hydrus Network\\db\\client_files location.
While the ideal usage has changed significantly, note that the current usage remains the same. Nothing moves until you click 'move files now'. Moving files will take some time to finish. Once done, it looks like this:
The current and ideal usages line up, and the defunct C:\\Hydrus Network\\db\\client_files location, which no longer stores anything, is removed from the list.
A straight call to the hydrus_client executable will look for a SQLite database in install_dir/db. If one is not found, it will create one. If you move your database and then try to run the client again, it will try to create a new empty database in that old location!
To tell it about the new database location, pass it a -d or --db_dir command line argument, like so:
hydrus_client -d=\"D:\\media\\my_hydrus_database\"
hydrus_client --db_dir=\"G:\\misc documents\\New Folder (3)\\DO NOT ENTER\"
python hydrus_client.py -d=\"D:\\media\\my_hydrus_database\"
open -n -a \"Hydrus Network.app\" --args -d=\"/path/to/db\"
And it will instead use the given path. If no database is found, it will similarly create a new empty one at that location. You can use any path that is valid in your system.
Bad Locations
Do not run a SQLite database on a network location! The database relies on clever hardware-level exclusive file locks, which network interfaces often fake. While the program may work, I cannot guarantee the database will stay non-corrupt.
Do not run a SQLite database on a location with filesystem-level compression enabled! In the best case (BTRFS), the database can suddenly get extremely slow when it hits a certain size; in the worst (NTFS), a >50GB database will encounter I/O errors and receive sporadic corruption!
Rather than typing the path out in a terminal every time you want to launch your external database, create a new shortcut with the argument in. Something like this:
Note that an install with an 'external' database no longer needs access to write to its own path, so you can store it anywhere you like, including protected read-only locations (e.g. in 'Program Files'). Just double-check your shortcuts are good.
"},{"location":"database_migration.html#finally","title":"backups","text":"If your database now lives in one or more new locations, make sure to update your backup routine to follow them!
"},{"location":"database_migration.html#to_an_ssd","title":"moving to an SSD","text":"As an example, let's say you started using the hydrus client on your HDD, and now you have an SSD available and would like to move your thumbnails and main install to that SSD to speed up the client. Your database will be valid and functional at every stage of this, and it can all be undone. The basic steps are:
Specifically:
You should now have something like this (let's say the D drive is the fast SSD, and E is the high capacity HDD):
"},{"location":"database_migration.html#multiple_clients","title":"p.s. running multiple clients","text":"Since you now know how to tell the software about an external database, you can, if you like, run multiple clients from the same install (and if you previously had multiple install folders, now you can now just use the one). Just make multiple shortcuts to the same hydrus_client executable but with different database directories. They can run at the same time. You'll save yourself a little memory and update-hassle.
"},{"location":"developer_api.html","title":"API documentation","text":""},{"location":"developer_api.html#library_modules_created_by_hydrus_users","title":"Library modules created by hydrus users","text":"In general, the API deals with standard UTF-8 JSON. POST requests and 200 OK responses are generally going to be a JSON 'Object' with variable names as keys and values obviously as values. There are examples throughout this document. For GET requests, everything is in standard GET parameters, but some variables are complicated and will need to be JSON encoded and then URL encoded. An example would be the 'tags' parameter on GET /get_files/search_files, which is a list of strings. Since GET http URLs have limits on what characters are allowed, but hydrus tags can have all sorts of characters, you'll be doing this:
Your list of tags:
[ 'character:samus aran', 'creator:\u9752\u3044\u685c', 'system:height > 2000' ]\n
JSON encoded:
[\"character:samus aran\", \"creator:\\\\u9752\\\\u3044\\\\u685c\", \"system:height > 2000\"]\n
Then URL encoded:
%5B%22character%3Asamus%20aran%22%2C%20%22creator%3A%5Cu9752%5Cu3044%5Cu685c%22%2C%20%22system%3Aheight%20%3E%202000%22%5D\n
In python, converting your tag list to the URL encoded string would be:
urllib.parse.quote( json.dumps( tag_list ) )\n
Full URL path example:
/get_files/search_files?file_sort_type=6&file_sort_asc=false&tags=%5B%22character%3Asamus%20aran%22%2C%20%22creator%3A%5Cu9752%5Cu3044%5Cu685c%22%2C%20%22system%3Aheight%20%3E%202000%22%5D\n
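Putting the steps above together, a minimal sketch that builds the same full path in python. The helper name `build_search_path` is hypothetical; it only assembles the string and does not contact the client.

```python
import json
import urllib.parse


def build_search_path(tags, **params):
    """Return a /get_files/search_files path with the tag list JSON- then URL-encoded."""
    query = [f"{key}={urllib.parse.quote(str(value))}" for key, value in params.items()]
    # json.dumps escapes non-ASCII characters by default; quote then
    # percent-encodes the brackets, quotes, spaces and backslashes.
    query.append("tags=" + urllib.parse.quote(json.dumps(tags)))
    return "/get_files/search_files?" + "&".join(query)


tag_list = ["character:samus aran", "creator:\u9752\u3044\u685c", "system:height > 2000"]
print(build_search_path(tag_list, file_sort_type=6, file_sort_asc="false"))
```

The printed path should match the full URL path example above.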
The API returns JSON for everything except actual file/thumbnail requests. Every JSON response includes the version of the Client API and hydrus_version of the Client hosting it (for brevity, these values are not included in the example responses in this help). For errors, you'll typically get 400 for a missing/invalid parameter, 401/403/419 for missing/insufficient/expired access, and 500 for a real deal serverside error.
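If your program wants to turn those status codes into messages, a lookup like this is one way to do it. The names `STATUS_MEANINGS` and `describe_status` are hypothetical, and the wording only restates the typical cases listed above; the client's own error text is usually more specific.

```python
# Typical meanings only, as described in this help. Other codes can occur.
STATUS_MEANINGS = {
    400: "missing or invalid parameter",
    401: "missing access",
    403: "insufficient access",
    419: "expired access",
    500: "serverside error",
}


def describe_status(status_code):
    """Map an HTTP status code from the Client API to a short description."""
    if status_code == 200:
        return "ok"
    return STATUS_MEANINGS.get(status_code, "unexpected status")


print(describe_status(419))
```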
Note
For any request sent to the API, the total size of the initial request line (this includes the URL and any parameters) and the headers must not be larger than 2 megabytes. Exceeding this limit will cause the request to fail. Make sure to use pagination if you are passing very large JSON arrays as parameters in a GET request.
"},{"location":"developer_api.html#cbor","title":"CBOR","text":"The API now tentatively supports CBOR, which is basically 'byte JSON'. If you are in a lower level language or need to do a lot of heavy work quickly, try it out!
To send CBOR, for POST put Content-Type application/cbor in your request header instead of application/json, and for GET just add a cbor=1 parameter to the URL string. Use CBOR to encode any parameters that you would previously put in JSON:
For POST requests, just print the pure bytes in the body, like this:
cbor2.dumps( arg_dict )\n
For GET, encode the parameter value in base64, like this:
base64.urlsafe_b64encode( cbor2.dumps( argument ) )\n
-or- str( base64.urlsafe_b64encode( cbor2.dumps( argument ) ), 'ascii' )\n
If you send CBOR, the client will return CBOR. If you want to send CBOR and get JSON back, or vice versa (or you are uploading a file and can't set CBOR Content-Type), send the Accept request header, like so:
Accept: application/cbor\nAccept: application/json\n
If the client does not support CBOR, you'll get 406.
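The header rules above can be collected into a small helper. This is a sketch with a hypothetical function name, `build_format_headers`; it only builds the header dict for a POST and leaves the actual encoding (for example with cbor2) to the caller.

```python
MIME_TYPES = {"json": "application/json", "cbor": "application/cbor"}


def build_format_headers(send="json", receive=None):
    """Build POST headers for the body format and, if different, the reply format.

    send and receive are each "json" or "cbor". When receive is None the
    client replies in the same format it was sent, so no Accept is needed.
    """
    headers = {"Content-Type": MIME_TYPES[send]}
    if receive is not None and receive != send:
        # Ask for a reply format that differs from the request body.
        headers["Accept"] = MIME_TYPES[receive]
    return headers


print(build_format_headers("cbor", "json"))
```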
"},{"location":"developer_api.html#access_and_permissions","title":"Access and permissions","text":"The client gives access to its API through different 'access keys', which are the typical 64-character hex used in many other places across hydrus. Each guarantees different permissions such as handling files or tags. Most of the time, a user will provide full access, but do not assume this. If the access header or parameter is not provided, you will get 401, and all insufficient permission problems will return 403 with appropriate error text.
Access is required for every request. You can provide this as an http header, like so:
Hydrus-Client-API-Access-Key : 0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae\n
Or you can include it in the normal parameters of any request (except POST /add_files/add_file, which uses the entire POST body for the file's bytes). For GET, this means including it into the URL parameters:
/get_files/thumbnail?file_id=452158&Hydrus-Client-API-Access-Key=0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae\n
For POST, this means in the JSON body parameters, like so:
{\n \"hash_id\" : 123456,\n \"Hydrus-Client-API-Access-Key\" : \"0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae\"\n}\n
There is also a simple 'session' system, where you can get a temporary key that gives the same access without having to include the permanent access key in every request. You can fetch a session key with the /session_key command and thereafter use it just as you would an access key, just with Hydrus-Client-API-Session-Key instead.
Session keys will expire if they are not used within 24 hours, or if the client is restarted, or if the underlying access key is deleted. An invalid/expired session key will give a 419 result with an appropriate error text.
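As a rough sketch of the header approach, here is one way to prepare a GET request that carries either an access key or a session key. The function name `build_get_request` and the base URL are assumptions for illustration; the request is built but not sent.

```python
import urllib.parse
import urllib.request


def build_get_request(base_url, path, params=None, access_key=None, session_key=None):
    """Build (but do not send) a GET request carrying exactly one key header."""
    if (access_key is None) == (session_key is None):
        raise ValueError("provide exactly one of access_key or session_key")
    url = base_url + path
    if params:
        url += "?" + urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    request = urllib.request.Request(url, method="GET")
    if access_key is not None:
        request.add_header("Hydrus-Client-API-Access-Key", access_key)
    else:
        request.add_header("Hydrus-Client-API-Session-Key", session_key)
    return request


request = build_get_request(
    "http://127.0.0.1:45869",  # assumes the default port on the same machine
    "/get_files/thumbnail",
    {"file_id": 452158},
    access_key="0150d9c4f6a6d2082534a997f4588dcf0c56dffe1d03ffbf98472236112236ae",
)
print(request.full_url)
```

Passing the result to urllib.request.urlopen would actually send it, which needs a running client with the API enabled.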
Bear in mind the Client API is still under construction. Setting up the Client API to be accessible across the internet requires technical experience to be convenient. HTTPS is available for encrypted comms, but the default certificate is self-signed (which basically means an eavesdropper can't see through it, but your ISP/government could if they decided to target you). If you have your own domain to host from and an SSL cert, you can replace them and it'll use them instead (check the db directory for client.crt and client.key). Otherwise, be careful about transmitting sensitive content outside of your localhost/network.
"},{"location":"developer_api.html#common_complex_parameters","title":"Common Complex Parameters","text":""},{"location":"developer_api.html#parameters_files","title":"files","text":"If you need to refer to some files, you can use any of the following:
Arguments:
file_id: (selective, a numerical file id)
file_ids: (selective, a list of numerical file ids)
hash: (selective, a hexadecimal SHA256 hash)
hashes: (selective, a list of hexadecimal SHA256 hashes)
In GET requests, make sure any list is percent-encoded.
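A small sketch of encoding these file arguments for a GET request, with lists JSON-encoded and then percent-encoded as described earlier in this document. The helper name `encode_file_params` is hypothetical.

```python
import json
import urllib.parse

ALLOWED_SELECTORS = {"file_id", "file_ids", "hash", "hashes"}


def encode_file_params(**selectors):
    """Encode file_id / file_ids / hash / hashes for a GET query string."""
    unknown = set(selectors) - ALLOWED_SELECTORS
    if unknown:
        raise ValueError(f"unknown selectors: {sorted(unknown)}")
    pairs = []
    for name, value in selectors.items():
        if isinstance(value, (list, tuple)):
            # Lists are JSON-encoded first, then percent-encoded below.
            value = json.dumps(list(value))
        pairs.append(f"{name}={urllib.parse.quote(str(value))}")
    return "&".join(pairs)


print(encode_file_params(file_ids=[1, 2]))
```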
"},{"location":"developer_api.html#parameters_file_domain","title":"file domain","text":"When you are searching, you may want to specify a particular file domain. Most of the time, you'll want to just set file_service_key
, but this can get complex:
file_service_key: (optional, selective A, hexadecimal, the file domain on which to search)
file_service_keys: (optional, selective A, list of hexadecimals, the union of file domains on which to search)
deleted_file_service_key: (optional, selective B, hexadecimal, the 'deleted from this file domain' on which to search)
deleted_file_service_keys: (optional, selective B, list of hexadecimals, the union of 'deleted from this file domain' on which to search)
The service keys are as in /get_services.
Hydrus supports two concepts here:
You can play around with this yourself by clicking 'multiple locations' in the client with help->advanced mode on.
In extreme edge cases, these two can be mixed by populating both A and B selective, making a larger union of both current and deleted file records.
Please note that unions can be very very computationally expensive. If you can achieve what you want with a single file_service_key, two queries in a row with different service keys, or an umbrella like all my files or all local files, please do. Otherwise, let me know what is running slow and I'll have a look at it.
'deleted from all local files' includes all files that have been physically deleted (i.e. deleted from the trash) and not available any more for fetch file/thumbnail requests. 'deleted from all my files' includes all of those physically deleted files and the trash. If a file is deleted with the special 'do not leave a deletion record' command, then it won't show up in a 'deleted from file domain' search!
'all known files' is a tricky domain. It converts much of the search tech to ignore where files actually are and look at the accompanying tag domain (e.g. all the files that have been tagged), and can sometimes be very expensive.
Also, if you have the option to set both file and tag domains, you cannot enter 'all known files'/'all known tags'. It is too complicated to support, sorry!
"},{"location":"developer_api.html#legacy_service_name_parameters","title":"legacy service_name parameters","text":"The Client API used to respond to name-based service identifiers, for instance using 'my tags' instead of something like '6c6f63616c2074616773'. Service names can change, and they aren't strictly unique either, so I have moved away from them, but there is some soft legacy support.
The client will attempt to convert any of these to their 'service_key(s)' equivalents:
But I strongly encourage you to move away from them as soon as reasonably possible. Look up the service keys you need with /get_service or /get_services.
If you have a clever script/program that does many things, then hit up /get_services on session initialisation and cache an internal map of key_to_name for the labels to use when you present services to the user.
Also, note that all users can now copy their service keys from review services.
"},{"location":"developer_api.html#services_object","title":"The Services Object","text":"Hydrus manages its different available domains and actions with what it calls services. If you are a regular user of the program, you will know about review services and manage services. The Client API needs to refer to services, either to accept commands from you or to tell you what metadata files have and where.
When it does this, it gives you this structure, typically under a services key right off the root node:
{\n \"6c6f63616c2074616773\" : {\n \"name\" : \"my tags\",\n \"type\": 5,\n \"type_pretty\" : \"local tag service\"\n },\n \"5674450950748cfb28778b511024cfbf0f9f67355cf833de632244078b5a6f8d\" : {\n \"name\" : \"example tag repo\",\n \"type\" : 0,\n \"type_pretty\" : \"hydrus tag repository\"\n },\n \"6c6f63616c2066696c6573\" : {\n \"name\" : \"my files\",\n \"type\" : 2,\n \"type_pretty\" : \"local file domain\"\n },\n \"7265706f7369746f72792075706461746573\" : {\n \"name\" : \"repository updates\",\n \"type\" : 20,\n \"type_pretty\" : \"local update file domain\"\n },\n \"ae7d9a603008919612894fc360130ae3d9925b8577d075cd0473090ac38b12b6\" : {\n \"name\": \"example file repo\",\n \"type\" : 1,\n \"type_pretty\" : \"hydrus file repository\"\n },\n \"616c6c206c6f63616c2066696c6573\" : {\n \"name\" : \"all local files\",\n \"type\": 15,\n \"type_pretty\" : \"virtual combined local file service\"\n },\n \"616c6c206c6f63616c206d65646961\" : {\n \"name\" : \"all my files\",\n \"type\" : 21,\n \"type_pretty\" : \"virtual combined local media service\"\n },\n \"616c6c206b6e6f776e2066696c6573\" : {\n \"name\" : \"all known files\",\n \"type\" : 11,\n \"type_pretty\" : \"virtual combined file service\"\n },\n \"616c6c206b6e6f776e2074616773\" : {\n \"name\" : \"all known tags\",\n \"type\": 10,\n \"type_pretty\" : \"virtual combined tag service\"\n },\n \"74d52c6238d25f846d579174c11856b1aaccdb04a185cb2c79f0d0e499284f2c\" : {\n \"name\" : \"example local rating like service\",\n \"type\" : 7,\n \"type_pretty\" : \"local like/dislike rating service\",\n \"star_shape\" : \"circle\"\n },\n \"90769255dae5c205c975fc4ce2efff796b8be8a421f786c1737f87f98187ffaf\" : {\n \"name\" : \"example local rating numerical service\",\n \"type\" : 6,\n \"type_pretty\" : \"local numerical rating service\",\n \"star_shape\" : \"fat star\",\n \"min_stars\" : 1,\n \"max_stars\" : 5\n },\n \"b474e0cbbab02ca1479c12ad985f1c680ea909a54eb028e3ad06750ea40d4106\" : {\n \"name\" : \"example local rating inc/dec service\",\n \"type\" : 22,\n \"type_pretty\" : \"local inc/dec rating service\"\n },\n \"7472617368\" : {\n \"name\" : \"trash\",\n \"type\" : 14,\n \"type_pretty\" : \"local trash file domain\"\n }\n}\n
I hope you recognise some of the information here. But what's that hex key on each section? It is the service_key.
All services have these properties:
name - A mutable human-friendly name like 'my tags'. You can use this to present the service to the user--they should recognise it.
type - An integer enum saying whether the service is a local tag service or like/dislike rating service or whatever. This cannot change.
service_key - The true 'id' of the service. It is a string of hex, sometimes just twenty or so characters but in many cases 64 characters. This cannot change, and it is how we will refer to different services.
This service_key is important. A user can rename their services, so name is not an excellent identifier, and definitely not something you should save to any permanent config file.
If we want to search some files on a particular file and tag domain, we should expect to be saying something like file_service_key=6c6f63616c2066696c6573 and tag_service_key=f032e94a38bb9867521a05dc7b189941a9c65c25048911f936fc639be2064a4b somewhere in the request.
You won't see all of these, but the service type enum is:
type_pretty is something you can show users. Hydrus uses the same labels in manage services and so on.
Rating services now have some extra data:
star_shape, which is one of circle | square | fat star | pentagram star
min_stars (0 or 1) and max_stars (1 to 20)
If you are displaying ratings, don't feel crazy obligated to obey the shape! Show a \u2158, select from a dropdown list, do whatever you like!
If you want to know the services in a client, hit up /get_services, which simply gives the above. The same structure has recently been added to /get_files/file_metadata for convenience, since that refers to many different services when it is talking about file locations and ratings and so on.
Note: If you need to do some quick testing, you should be able to copy the service_key of any service by hitting the 'copy service key' button in review services.
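To make the structure concrete, here is a minimal sketch of caching a key_to_name map from a services object and picking out services by type. The helper names are hypothetical, and the sample dict is a trimmed copy of the example above rather than real client output.

```python
def build_key_to_name(services):
    """Map each service_key to its current (user-renamable) name."""
    return {service_key: info["name"] for service_key, info in services.items()}


def keys_of_type(services, service_type):
    """Return the service_keys whose integer 'type' matches service_type."""
    return [key for key, info in services.items() if info["type"] == service_type]


services = {
    "6c6f63616c2074616773": {"name": "my tags", "type": 5, "type_pretty": "local tag service"},
    "6c6f63616c2066696c6573": {"name": "my files", "type": 2, "type_pretty": "local file domain"},
    "7472617368": {"name": "trash", "type": 14, "type_pretty": "local trash file domain"},
}

key_to_name = build_key_to_name(services)
print(key_to_name["6c6f63616c2066696c6573"])
print(keys_of_type(services, 5))
```

Store the service_key in any saved config and use the name only for display, since the user can rename a service at any time.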
/api_version
","text":"Gets the current API version. This increments every time I alter the API.
Restricted access: NO.
Required Headers: n/a
Arguments: n/a
Response: Some simple JSON describing the current api version (and hydrus client version, if you are interested). Note that this is not very useful any more, for two reasons:{\n \"version\" : 17,\n \"hydrus_version\" : 441\n}\n
"},{"location":"developer_api.html#request_new_permissions","title":"GET /request_new_permissions
","text":"Register a new external program with the client. This requires the 'add from api request' mini-dialog under services->review services to be open, otherwise it will 403.
Restricted access: NO.
Required Headers: n/a
Arguments:name
: (descriptive name of your access)basic_permissions
: A JSON-encoded list of numerical permission identifiers you want to request.
The permissions are currently:
/request_new_permissions?name=my%20import%20script&basic_permissions=[0,1]\n
Response: Some JSON with your access key, which is 64 characters of hex. This will not be valid until the user approves the request in the client ui. Example response{\n \"access_key\" : \"73c9ab12751dcf3368f028d3abbe1d8e2a3a48d0de25e64f3a8f00f3a1424c57\"\n}\n
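The request path above can also be built programmatically. This is only a minimal sketch: the helper name request_new_permissions_path is hypothetical, and it percent-encodes the JSON list (brackets and comma included), which is an equivalent, stricter encoding of the raw [0,1] form shown in the example.

```python
import json
from urllib.parse import quote, urlencode


def request_new_permissions_path(name, basic_permissions):
    # Hypothetical helper, not part of hydrus.
    # The permissions list travels as JSON text inside the query string,
    # so it is serialised first and then percent-encoded.
    params = {
        'name': name,
        'basic_permissions': json.dumps(basic_permissions, separators=(',', ':')),
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return '/request_new_permissions?' + urlencode(params, quote_via=quote)


print(request_new_permissions_path('my import script', [0, 1]))
```

The same approach should work for any endpoint here that takes its arguments in percent-encoded JSON.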
"},{"location":"developer_api.html#session_key","title":"GET /session_key
","text":"Get a new session key.
Restricted access: YES. No permissions required.
Required Headers: n/a
Arguments: n/a
Response: Some JSON with a new session key in hex. Example response{\n \"session_key\" : \"f6e651e7467255ade6f7c66050f3d595ff06d6f3d3693a3a6fb1a9c2b278f800\"\n}\n
Note
Note that the access you provide to get a new session key can be a session key, if that happens to be useful. As long as you have some kind of access, you can generate a new session key.
A session key expires after 24 hours of inactivity, whenever the client restarts, or if the underlying access key is deleted. A request on an expired session key returns 419.
"},{"location":"developer_api.html#verify_access_key","title":"GET/verify_access_key
","text":"Check your access key is valid.
Restricted access: YES. No permissions required.
Required Headers: n/a
Arguments: n/a
Response: 401/403/419 and some error text if the provided access/session key is invalid, otherwise some JSON with basic permission info. Example response{\n \"basic_permissions\" : [0, 1, 3],\n \"human_description\" : \"API Permissions (autotagger): add tags to files, import files, search for files: Can search: only autotag this\"\n}\n
"},{"location":"developer_api.html#get_service","title":"GET /get_service
","text":"Ask the client about a specific service.
Restricted access: YES. At least one of Add Files, Add Tags, Manage Pages, or Search Files permission needed.
Required Headers: n/a
Arguments:service_name
: (selective, string, the name of the service)service_key
: (selective, hex string, the service key of the service)/get_service?service_name=my%20tags\n/get_service?service_key=6c6f63616c2074616773\n
Response: Some JSON about the service. A similar format as /get_services and The Services Object. Example response{\n \"service\" : {\n \"name\" : \"my tags\",\n \"service_key\" : \"6c6f63616c2074616773\",\n \"type\" : 5,\n \"type_pretty\" : \"local tag service\"\n }\n}\n
If the service does not exist, this gives 404. It is very unlikely, but possible in edge cases, that two services will have the same name; in that case you'll get whichever comes first in a pseudorandom order.
It will only respond to services in the /get_services list. I will expand the available types in future as we add ratings etc... to the Client API.
"},{"location":"developer_api.html#get_services","title":"GET/get_services
","text":"Ask the client about its services.
Restricted access: YES. At least one of Add Files, Add Tags, Manage Pages, or Search Files permission needed.
Required Headers: n/a
Arguments: n/a
Response: Some JSON listing the client's services. Example response{\n \"services\" : \"The Services Object\"\n}\n
This now primarily uses The Services Object.
Note
If you do the request and look at the actual response, you will see a lot more data under different keys--this is deprecated, and will be deleted in 2024. If you use the old structure, please move over!
"},{"location":"developer_api.html#importing_and_deleting_files","title":"Importing and Deleting Files","text":""},{"location":"developer_api.html#add_files_add_file","title":"POST/add_files/add_file
","text":"Tell the client to import a file.
Restricted access: YES. Import Files permission needed. Required Headers:application/json
(if sending path), application/octet-stream
(if sending file)path
: (the path you want to import){\n \"path\" : \"E:\\\\to_import\\\\ayanami.jpg\"\n}\n
Arguments (as bytes): You can alternately just send the file's bytes as the POST body. Response: Some JSON with the import result. Please note that file imports for large files may take several seconds, and longer if the client is busy doing other db work, so make sure your request is willing to wait that long for the response. Example response
{\n \"status\" : 1,\n \"hash\" : \"29a15ad0c035c0a0e86e2591660207db64b10777ced76565a695102a481c3dd1\",\n \"note\" : \"\"\n}\n
status
is:
A file 'veto' is caused by the file import options (which in this case is the 'quiet' set under the client's options->importing) stopping the file due to its resolution or minimum file size rules, etc...
'hash' is the file's SHA256 hash in hexadecimal, and 'note' is any additional human-readable text appropriate to the file status that you may recognise from hydrus's normal import workflow. For an outright import error, it will be a summary of the exception that you can present to the user, and a new field traceback
will have the full trace for debugging purposes.
/add_files/delete_files
","text":"Tell the client to send files to the trash.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
: application/json
reason
: (optional, string, the reason attached to the delete action){\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
Response: 200 and no content. If you specify a file service, the file will only be deleted from that location. Only local file domains are allowed (so you can't delete from a file repository or unpin from ipfs yet). It defaults to 'all my files', which will delete from all local services (i.e. force sending to trash). Sending 'all local files' on a file already in the trash will trigger a physical file delete.
"},{"location":"developer_api.html#add_files_undelete_files","title":"POST/add_files/undelete_files
","text":"Tell the client to pull files back out of the trash.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
: application/json{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
Response: 200 and no content. You can use hash or hashes, whichever is more convenient.
This is the reverse of a delete_files--removing files from trash and putting them back where they came from. If you specify a file service, the files will only be undeleted to there (if they have a delete record, otherwise this is nullipotent). The default, 'all my files', undeletes to all local file services for which there are deletion records. There is no error if any of the files do not currently exist in 'trash'.
"},{"location":"developer_api.html#add_files_archive_files","title":"POST/add_files/archive_files
","text":"Tell the client to archive inboxed files.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
: application/json{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
Response: 200 and no content. This puts files in the 'archive', taking them out of the inbox. It only has meaning for files currently in 'my files' or 'trash'. There is no error if any files do not currently exist or are already in the archive.
"},{"location":"developer_api.html#add_files_unarchive_files","title":"POST /add_files/unarchive_files
","text":"Tell the client to re-inbox archived files.
Restricted access: YES. Import Files permission needed. Required Headers:Content-Type
: application/json{\n \"hash\" : \"78f92ba4a786225ee2a1236efa6b7dc81dd729faf4af99f96f3e20bad6d8b538\"\n}\n
Response: 200 and no content. This puts files back in the inbox, taking them out of the archive. It only has meaning for files currently in 'my files' or 'trash'. There is no error if any files do not currently exist or are already in the inbox.
"},{"location":"developer_api.html#add_files_generate_hashes","title":"POST/add_files/generate_hashes
","text":"Generate hashes for an arbitrary file.
Restricted access: YES. Import Files permission needed. Required Headers:application/json
(if sending path), application/octet-stream
(if sending file)path
: (the path of the file you want to hash){\n \"path\" : \"E:\\\\to_import\\\\ayanami.jpg\"\n}\n
Arguments (as bytes): You can alternately just send the file's bytes as the POST body. Response: Some JSON with the hashes of the file. Example response
{\n \"hash\": \"7de421a3f9be871a7037cca8286b149a31aecb6719268a94188d76c389fa140c\",\n \"perceptual_hashes\": [\n \"b44dc7b24dcb381c\"\n ],\n \"pixel_hash\": \"c7bf20e5c4b8a524c2c3e3af2737e26975d09cba2b3b8b76341c4c69b196da4e\"\n}\n
hash
is the sha256 hash of the submitted file.perceptual_hashes
is a list of perceptual hashes for the file.pixel_hash
is the sha256 hash of the pixel data of the rendered image.hash
will always be returned for any file, the others will only be returned for filetypes they can be generated for.
/add_urls/get_url_files
","text":"Ask the client about a URL's files.
Restricted access: YES. Import URLs permission needed.
Required Headers: n/a
Arguments:url
: (the url you want to ask about)doublecheck_file_system
: true or false (optional, defaults False)http://safebooru.org/index.php?page=post&s=view&id=2753608
: /add_urls/get_url_files?url=http%3A%2F%2Fsafebooru.org%2Findex.php%3Fpage%3Dpost%26s%3Dview%26id%3D2753608\n
Response: Some JSON describing which files are known to be mapped to that URL. Note this needs a database hit, so it may be delayed if the client is otherwise busy. Don't rely on this to always be fast. Example response{\n \"normalised_url\" : \"https://safebooru.org/index.php?id=2753608&page=post&s=view\",\n \"url_file_statuses\" : [\n {\n \"status\" : 2,\n \"hash\" : \"20e9002824e5e7ffc240b91b6e4a6af552b3143993c1778fd523c30d9fdde02c\",\n \"note\" : \"url recognised: Imported at 2015/10/18 10:58:01, which was 3 years 4 months ago (before this check).\"\n }\n ]\n}\n
The url_file_statuses
is a list of zero-to-n JSON Objects, each representing a file match the client found in its database for the URL. Typically, it will be of length 0 (for as-yet-unvisited URLs or Gallery/Watchable URLs that are not attached to files) or 1, but sometimes multiple files are given the same URL (sometimes by mistaken misattribution, sometimes by design, such as pixiv manga pages). Handling n files per URL is a pain but an unavoidable issue you should account for.
status
is the same as for /add_files/add_file
:
hash
is the file's SHA256 hash in hexadecimal, and 'note' is some occasional additional human-readable text you may recognise from hydrus's normal import workflow.
If you set doublecheck_file_system
to true
, then any result that is 'already in db' (2) will be double-checked against the actual file system. This check happens on any normal file import process, just to check for and fix missing files (if the file is missing, the status becomes 0--new), but the check can take more than a few milliseconds on an HDD or a network drive, so the default behaviour, assuming you mostly just want to spam for 'seen this before' file statuses, is to not do it.
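Since a URL can map to zero, one, or several files, client code is probably best written to loop over url_file_statuses rather than assume a single entry. A minimal sketch, assuming the response has already been decoded from JSON into a dict; the function name and the ALREADY_IN_DB constant are hypothetical, and only the status value 2 ('already in db') is taken from the text above.

```python
ALREADY_IN_DB = 2  # the 'already in db' status value quoted in the text above


def hashes_already_in_db(response):
    # Hypothetical helper: url_file_statuses may hold zero, one, or many
    # entries for one URL, so every entry is checked.
    statuses = response.get('url_file_statuses', [])
    return [entry['hash'] for entry in statuses if entry['status'] == ALREADY_IN_DB]


example_response = {
    'normalised_url': 'https://safebooru.org/index.php?id=2753608&page=post&s=view',
    'url_file_statuses': [
        {
            'status': 2,
            'hash': '20e9002824e5e7ffc240b91b6e4a6af552b3143993c1778fd523c30d9fdde02c',
            'note': 'url recognised',
        }
    ],
}

print(hashes_already_in_db(example_response))
```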
/add_urls/get_url_info
","text":"Ask the client for information about a URL.
Restricted access: YES. Import URLs permission needed.
Required Headers: n/a
Arguments:url
: (the url you want to ask about)https://8ch.net/tv/res/1846574.html
: /add_urls/get_url_info?url=https%3A%2F%2F8ch.net%2Ftv%2Fres%2F1846574.html\n
Response: Some JSON describing what the client thinks of the URL. Example response
{\n \"normalised_url\" : \"https://8ch.net/tv/res/1846574.html\",\n \"url_type\" : 4,\n \"url_type_string\" : \"watchable url\",\n \"match_name\" : \"8chan thread\",\n \"can_parse\" : true\n}\n
The url types are currently:
'Unknown' URLs are treated in the client as direct File URLs. Even though the 'File URL' type is available, most file urls do not have a URL Class, so they will appear as Unknown. Adding them to the client will pass them to the URL Downloader as a raw file for download and import.
"},{"location":"developer_api.html#add_urls_add_url","title":"POST/add_urls/add_url
","text":"Tell the client to 'import' a URL. This triggers the exact same routine as drag-and-dropping a text URL onto the main client window.
Restricted access: YES. Import URLs permission needed. Add Tags needed to include tags. Required Headers:Content-Type
: application/json
url
: (the url you want to add)destination_page_key
: (optional page identifier for the page to receive the url)destination_page_name
: (optional page name to receive the url)show_destination_page
: (optional, defaulting to false, controls whether the UI will change pages on add)service_keys_to_additional_tags
: (optional, selective, tags to give to any files imported from this url)filterable_tags
: (optional tags to be filtered by any tag import options that applies to the URL)
If you specify a destination_page_name
and an appropriate importer page already exists with that name, that page will be used. Otherwise, a new page with that name will be created (and used by subsequent calls with that name). Make sure that page name is unique (e.g. '/b/ threads', not 'watcher') in your client, or it may not be found.
Alternately, destination_page_key
defines exactly which page should be used. Bear in mind this page key is only valid to the current session (they are regenerated on client reset or session reload), so you must figure out which one you want using the /manage_pages/get_pages call. If the correct page_key is not found, or the page it corresponds to is of the incorrect type, the standard page selection/creation rules will apply.
show_destination_page
defaults to False to reduce flicker when adding many URLs to different pages quickly. If you turn it on, the client will behave like a URL drag and drop and select the final page the URL ends up on.
service_keys_to_additional_tags
uses the same data structure as in /add_tags/add_tags--service keys to a list of tags to add. You will need 'add tags' permission or this will 403. These tags work exactly as 'additional' tags work in a tag import options. They are service specific, and always added unless some advanced tag import options checkbox (like 'only add tags to new files') is set.
filterable_tags works like the tags parsed by a hydrus downloader. It is just a list of strings. They have no inherent service and will be sent to a tag import options, if one exists, to decide which tag services get what. This parameter is useful if you are pulling all a URL's tags outside of hydrus and want to have them processed like any other downloader, rather than figuring out service names and namespace filtering on your end. Note that in order for a tag import options to kick in, I think you will have to have a Post URL URL Class hydrus-side set up for the URL so some tag import options (whether that is Class-specific or just the default) can be loaded at import time.
Example request body
{\n \"url\" : \"https://8ch.net/tv/res/1846574.html\",\n \"destination_page_name\" : \"kino zone\",\n \"service_keys_to_additional_tags\" : {\n \"6c6f63616c2074616773\" : [\"as seen on /tv/\"]\n }\n}\n
Example request body{\n \"url\" : \"https://safebooru.org/index.php?page=post&s=view&id=3195917\",\n \"filterable_tags\" : [\n \"1girl\",\n \"artist name\",\n \"creator:azto dio\",\n \"blonde hair\",\n \"blue eyes\",\n \"breasts\",\n \"character name\",\n \"commentary\",\n \"english commentary\",\n \"formal\",\n \"full body\",\n \"glasses\",\n \"gloves\",\n \"hair between eyes\",\n \"high heels\",\n \"highres\",\n \"large breasts\",\n \"long hair\",\n \"long sleeves\",\n \"looking at viewer\",\n \"series:metroid\",\n \"mole\",\n \"mole under mouth\",\n \"patreon username\",\n \"ponytail\",\n \"character:samus aran\",\n \"solo\",\n \"standing\",\n \"suit\",\n \"watermark\"\n ]\n}\n
Response: Some JSON with info on the URL added. Example response{\n \"human_result_text\" : \"\\\"https://8ch.net/tv/res/1846574.html\\\" URL added successfully.\",\n \"normalised_url\" : \"https://8ch.net/tv/res/1846574.html\"\n}\n
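The request bodies above can be assembled with a small helper that leaves out the optional keys when they are not set. This is a sketch only: build_add_url_body and its keyword names are hypothetical, and the JSON keys are the ones listed in the arguments above.

```python
import json


def build_add_url_body(url, page_name=None, additional_tags=None,
                       filterable_tags=None, show_page=False):
    # Hypothetical helper: optional keys are only included when given.
    body = {'url': url}
    if page_name is not None:
        body['destination_page_name'] = page_name
    if show_page:
        body['show_destination_page'] = True
    if additional_tags:
        # service key (hex string) -> list of tags
        body['service_keys_to_additional_tags'] = additional_tags
    if filterable_tags:
        body['filterable_tags'] = list(filterable_tags)
    return body


add_url_body = build_add_url_body(
    'https://8ch.net/tv/res/1846574.html',
    page_name='kino zone',
    additional_tags={'6c6f63616c2074616773': ['as seen on /tv/']},
)
print(json.dumps(add_url_body, indent=1))
```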
"},{"location":"developer_api.html#add_urls_associate_url","title":"POST /add_urls/associate_url
","text":"Manage which URLs the client considers to be associated with which files.
Restricted access: YES. Import URLs permission needed. Required Headers:Content-Type
: application/json
url_to_add
: (optional, selective A, a url you want to associate with the file(s))urls_to_add
: (optional, selective A, a list of urls you want to associate with the file(s))url_to_delete
: (optional, selective B, a url you want to disassociate from the file(s))urls_to_delete
: (optional, selective B, a list of urls you want to disassociate from the file(s))
The single/multiple arguments work the same--just use whatever is convenient for you. Unless you really know what you are doing with URL Classes, I strongly recommend you stick to associating URLs with just one single 'hash' at a time. Multiple hashes pointing to the same URL is unusual and frequently unhelpful. Example request body
{\n \"url_to_add\" : \"https://rule34.xxx/index.php?id=2588418&page=post&s=view\",\n \"hash\" : \"3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba\"\n}\n
Response: 200 with no content. Like when adding tags, this is safely idempotent--do not worry about re-adding URL associations that already exist or accidentally trying to delete ones that don't."},{"location":"developer_api.html#editing_file_tags","title":"Editing File Tags","text":""},{"location":"developer_api.html#add_tags_clean_tags","title":"GET /add_tags/clean_tags
","text":"Ask the client about how it will see certain tags.
Restricted access: YES. Add Tags permission needed.
Required Headers: n/a
Arguments (in percent-encoded JSON):tags
: (a list of the tags you want cleaned)[ \" bikini \", \"blue eyes\", \" character : samus aran \", \" :)\", \" \", \"\", \"10\", \"11\", \"9\", \"system:wew\", \"-flower\" ]
: /add_tags/clean_tags?tags=%5B%22%20bikini%20%22%2C%20%22blue%20%20%20%20eyes%22%2C%20%22%20character%20%3A%20samus%20aran%20%22%2C%20%22%3A%29%22%2C%20%22%20%20%20%22%2C%20%22%22%2C%20%2210%22%2C%20%2211%22%2C%20%229%22%2C%20%22system%3Awew%22%2C%20%22-flower%22%5D\n
Response: The tags cleaned according to hydrus rules. They will also be in hydrus human-friendly sorting order. Example response
{\n \"tags\" : [\"9\", \"10\", \"11\", \" ::)\", \"bikini\", \"blue eyes\", \"character:samus aran\", \"flower\", \"wew\"]\n}\n
Mostly, hydrus simply trims excess whitespace, but the other examples are rare issues you might run into. 'system' is an invalid namespace, tags cannot be prefixed with hyphens, and any tag starting with ':' is secretly dealt with internally as \"[no namespace]:[colon-prefixed-subtag]\". Again, you probably won't run into these, but if you see a mismatch somewhere and want to figure it out, or just want to sort some numbered tags, you might like to try this.
"},{"location":"developer_api.html#add_tags_get_siblings_and_parents","title":"GET/add_tags/get_siblings_and_parents
","text":"Ask the client about tags' sibling and parent relationships.
Restricted access: YES. Add Tags permission needed.
Required Headers: n/a
Arguments (in percent-encoded JSON):tags
: (a list of the tags you want info on)[ \"blue eyes\", \"samus aran\" ]
: /add_tags/get_siblings_and_parents?tags=%5B%22blue%20eyes%22%2C%20%22samus%20aran%22%5D\n
Response: An Object showing all the display relationships for each tag on each service. Also The Services Object. Example response
{\n \"services\" : \"The Services Object\",\n \"tags\" : {\n \"blue eyes\" : {\n \"6c6f63616c2074616773\" : {\n \"ideal_tag\" : \"blue eyes\",\n \"siblings\" : [\n \"blue eyes\",\n \"blue_eyes\",\n \"blue eye\",\n \"blue_eye\"\n ],\n \"descendants\" : [],\n \"ancestors\" : []\n },\n \"877bfcf81f56e7e3e4bc3f8d8669f92290c140ba0acfd6c7771c5e1dc7be62d7\": {\n \"ideal_tag\" : \"blue eyes\",\n \"siblings\" : [\n \"blue eyes\"\n ],\n \"descendants\" : [],\n \"ancestors\" : []\n }\n },\n \"samus aran\" : {\n \"6c6f63616c2074616773\" : {\n \"ideal_tag\" : \"character:samus aran\",\n \"siblings\" : [\n \"samus aran\",\n \"samus_aran\",\n \"character:samus aran\"\n ],\n \"descendants\" : [\n \"character:samus aran (zero suit)\",\n \"cosplay:samus aran\"\n ],\n \"ancestors\" : [\n \"series:metroid\",\n \"studio:nintendo\"\n ]\n },\n \"877bfcf81f56e7e3e4bc3f8d8669f92290c140ba0acfd6c7771c5e1dc7be62d7\": {\n \"ideal_tag\" : \"samus aran\",\n \"siblings\" : [\n \"samus aran\"\n ],\n \"descendants\" : [\n \"zero suit samus\",\n \"samus_aran_(cosplay)\"\n ],\n \"ancestors\" : []\n }\n }\n }\n}\n
This data is essentially how mappings in the storage
tag_display_type
become display
.
The hex keys are the service keys, which you will have seen elsewhere, like GET /get_files/file_metadata. Note that there is no concept of 'all known tags' here. If a tag is in 'my tags', it follows the rules of 'my tags', and then all the services' display tags are merged into the 'all known tags' pool for user display.
Also, the siblings and parents here are not just what is in tags->manage tag siblings/parents, they are the final computed combination of rules as set in tags->manage where tag siblings and parents apply. The data given here is not guaranteed to be useful for editing siblings and parents on a particular service. That data, which is currently pair-based, will appear in a different API request in future.
ideal_tag
is how the tag appears in normal display to the user.siblings
is every tag that will show as the ideal_tag
, including the ideal_tag
itself.descendants
is every child (and recursive grandchild, great-grandchild...) that implies the ideal_tag
.ancestors
is every parent (and recursive grandparent, great-grandparent...) that our tag implies.Every descendant and ancestor is an ideal_tag
itself that may have its own siblings.
Most situations are simple, but remember that siblings and parents in hydrus can get complex. If you want to display this data, I recommend you plan to support simple service-specific workflows, and add hooks to recognise conflicts and other difficulty and, when that happens, abandon ship (send the user back to Hydrus proper). Also, if you show summaries of the data anywhere, make sure you add a 'and 22 more...' overflow mechanism to your menus, since if you hit up 'azur lane' or 'pokemon', you are going to get hundreds of children.
I generally warn you off computing sibling and parent mappings or counts yourself. The data from this request is best used for sibling and parent decorators on individual tags in a 'manage tags' presentation. The code that actually computes what siblings and parents look like in the 'display' context can be a pain at times, and I've already done it. Just run /search_tags or /file_metadata again after any changes you make and you'll get updated values.
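The overflow mechanism suggested above for long descendant lists could look something like this. It is a sketch only; summarise_tags and its limit parameter are hypothetical names, and the wording of the overflow text is yours to choose.

```python
def summarise_tags(tags, limit=10):
    # Hypothetical helper: show at most `limit` tags and fold the rest
    # into an 'and N more...' suffix so menus stay a sensible size.
    shown = tags[:limit]
    extra = len(tags) - len(shown)
    text = ', '.join(shown)
    if extra > 0:
        text += ' and {} more...'.format(extra)
    return text


descendants = ['character:samus aran (zero suit)', 'cosplay:samus aran', 'zero suit samus']
print(summarise_tags(descendants, limit=2))
```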
"},{"location":"developer_api.html#add_tags_search_tags","title":"GET/add_tags/search_tags
","text":"Search the client for tags.
Restricted access: YES. Search for Files and Add Tags permission needed.
Required Headers: n/a
Arguments:search
: (the tag text to search for, enter exactly what you would in the client UI)tag_service_key
: (optional, hexadecimal, the tag domain on which to search, defaults to 'all known tags')tag_display_type
: (optional, string, to select whether to search raw or sibling-processed tags, defaults to 'storage')
The file domain
and tag_service_key
perform the function of the file and tag domain buttons in the client UI.
The tag_display_type
can be either storage
(the default), which searches your file's stored tags, just as they appear in a 'manage tags' dialog, or display
, which searches the sibling-processed tags, just as they appear in a normal file search page. In the example above, setting the tag_display_type
to display
could well combine the two kim possible tags and give a count of 3 or 4.
'all my files'/'all known tags' works fine for most cases, but a specific tag service or 'all known files'/'tag service' can work better for editing tag repository storage
contexts, since it provides results just for that service, and for repositories, it gives tags for all the non-local files other users have tagged.
/add_tags/search_tags?search=kim&tag_display_type=display\n
Response: Some JSON listing the client's matching tags. Example response{\n \"tags\" : [\n {\n \"value\" : \"series:kim possible\", \n \"count\" : 3\n },\n {\n \"value\" : \"kimchee\", \n \"count\" : 2\n },\n {\n \"value\" : \"character:kimberly ann possible\", \n \"count\" : 1\n }\n ]\n}\n
The tags
list will be sorted by descending count. The various rules in tags->manage tag display and search (e.g. no pure *
searches on certain services) will also be checked--and if violated, you will get 200 OK but an empty result.
Note that if your client api access is only allowed to search certain tags, the results will be similarly filtered.
"},{"location":"developer_api.html#add_tags_add_tags","title":"POST/add_tags/add_tags
","text":"Make changes to the tags that files have.
Restricted access: YES. Add Tags permission needed.
Required Headers: n/a
Arguments (in JSON):service_keys_to_tags
: (selective B, an Object of service keys to lists of tags to be 'added' to the files)service_keys_to_actions_to_tags
: (selective B, an Object of service keys to content update actions to lists of tags)
In 'service_keys_to...', the keys are as in /get_services. You may need some selection UI on your end so the user can pick what to do if there are multiple choices.
Also, you can use either '...to_tags', which is simple and add-only, or '...to_actions_to_tags', which is more complicated and allows you to remove/petition or rescind pending content.
The permitted 'actions' are:
When you petition a tag from a repository, a 'reason' for the petition is typically needed. If you send a normal list of tags here, a default reason of \"Petitioned from API\" will be given. If you want to set your own reason, you can instead give a list of [ tag, reason ] pairs.
Some example requests:Adding some tags to a file
{\n \"hash\" : \"df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56\",\n \"service_keys_to_tags\" : {\n \"6c6f63616c2074616773\" : [\"character:supergirl\", \"rating:safe\"]\n }\n}\n
Adding more tags to two files{\n \"hashes\" : [\n \"df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56\",\n \"f2b022214e711e9a11e2fcec71bfd524f10f0be40c250737a7861a5ddd3faebf\"\n ],\n \"service_keys_to_tags\" : {\n \"6c6f63616c2074616773\" : [\"process this\"],\n \"ccb0cf2f9e92c2eb5bd40986f72a339ef9497014a5fb8ce4cea6d6c9837877d9\" : [\"creator:dandon fuga\"]\n }\n}\n
A complicated transaction with all possible actions{\n \"hash\" : \"df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56\",\n \"service_keys_to_actions_to_tags\" : {\n \"6c6f63616c2074616773\" : {\n \"0\" : [\"character:supergirl\", \"rating:safe\"],\n \"1\" : [\"character:superman\"]\n },\n \"aa0424b501237041dab0308c02c35454d377eebd74cfbc5b9d7b3e16cc2193e9\" : {\n \"2\" : [\"character:supergirl\", \"rating:safe\"],\n \"3\" : [\"filename:image.jpg\"],\n \"4\" : [[\"creator:danban faga\", \"typo\"], [\"character:super_girl\", \"underscore\"]],\n \"5\" : [\"skirt\"]\n }\n }\n}\n
This last example is far more complicated than you will usually see. Pend rescinds and petition rescinds are not common. Petitions are also quite rare, and gathering a good petition reason for each tag is often a pain.
Note that the enumerated status keys in the service_keys_to_actions_to_tags structure are strings, not ints (JSON does not support int keys for Objects).
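If you hold the actions as integers in your own code, you can convert them to string keys when building the body. A minimal sketch; build_tag_actions_body is a hypothetical helper, and the action numbers used are simply the ones from the example request above. Python's json.dumps would also stringify integer keys by itself, so the explicit conversion here is mostly for clarity.

```python
import json


def build_tag_actions_body(file_hash, service_keys_to_actions_to_tags):
    # Hypothetical helper: JSON Object keys must be strings, so integer
    # action identifiers are converted explicitly.
    converted = {}
    for service_key, actions in service_keys_to_actions_to_tags.items():
        converted[service_key] = {
            str(action): list(tags) for action, tags in actions.items()
        }
    return {'hash': file_hash, 'service_keys_to_actions_to_tags': converted}


tag_actions_body = build_tag_actions_body(
    'df2a7b286d21329fc496e3aa8b8a08b67bb1747ca32749acb3f5d544cbfc0f56',
    {
        '6c6f63616c2074616773': {
            0: ['character:supergirl', 'rating:safe'],
            1: ['character:superman'],
        },
        'aa0424b501237041dab0308c02c35454d377eebd74cfbc5b9d7b3e16cc2193e9': {
            # [tag, reason] pairs carry a custom petition reason
            4: [['creator:danban faga', 'typo']],
        },
    },
)
print(json.dumps(tag_actions_body, indent=1))
```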
Response description: 200 and no content.
Note
Note also that hydrus tag actions are safely idempotent. You can pend a tag that is already pended, or add a tag that already exists, and not worry about an error--the surplus add action will be discarded. The same is true if you try to pend a tag that actually already exists, or rescind a petition that doesn't. Any invalid actions will fail silently.
It is fine to just throw your 'process this' tags at every file import and not have to worry about checking which files you already added them to.
HOWEVER
When you delete a tag, a deletion record is made even if the tag does not exist on the file. This is important if you expect to add the tags again via parsing, because, in general, when hydrus adds tags through a downloader, it will not overwrite a previously 'deleted' tag record (this is to stop re-downloads overwriting the tags you hand-removed previously). Undeletes usually have to be done manually by a human.
So, do be careful about how you spam delete unless it is something that doesn't matter or it is something you'll only be touching again via the API anyway.
"},{"location":"developer_api.html#editing_file_ratings","title":"Editing File Ratings","text":""},{"location":"developer_api.html#edit_ratings_set_rating","title":"POST/edit_ratings/set_rating
","text":"Add or remove ratings associated with a file.
Restricted access: YES. Edit Ratings permission needed. Required Headers:Content-Type
: application/json
rating_service_key
: (hexadecimal, the rating service you want to edit)rating
: (mixed datatype, the rating value you want to set){\n \"hash\" : \"3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba\",\n \"rating_service_key\" : \"282303611ba853659aa60aeaa5b6312d40e05b58822c52c57ae5e320882ba26e\",\n \"rating\" : 2\n}\n
This is fairly simple, but there are some caveats around the different rating service types and the actual data you are setting here. It is the same as you'll see in GET /get_files/file_metadata.
"},{"location":"developer_api.html#likedislike_ratings","title":"Like/Dislike Ratings","text":"Send true
for 'like', false
for 'dislike', or null
for 'unset'.
Send an int
for the number of stars to set, or null
for 'unset'.
Send an int
for the number to set. 0 is your minimum.
As with GET /get_files/file_metadata, check The Services Object for the min/max stars on a numerical rating service.
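Because the rating value is a mixed datatype, it may help to check it before sending. The sketch below rests on several assumptions: the kind labels ('like', 'numerical', 'incdec') are hypothetical names for the three rating styles described above, a numerical rating is assumed to lie between the service's min_stars and max_stars, and the default bounds are placeholders. It is not how the client validates input.

```python
def rating_looks_valid(kind, rating, min_stars=0, max_stars=5):
    # Hypothetical pre-flight check, not hydrus code.
    # bool is a subclass of int in Python, so it is excluded explicitly
    # wherever an int is expected.
    is_int = isinstance(rating, int) and not isinstance(rating, bool)
    if kind == 'like':
        # true = like, false = dislike, null = unset
        return rating is None or isinstance(rating, bool)
    if kind == 'numerical':
        # an int number of stars, or null for unset (range is an assumption)
        return rating is None or (is_int and min_stars <= rating <= max_stars)
    if kind == 'incdec':
        # an int count, with 0 as the minimum
        return is_int and rating >= 0
    raise ValueError('unknown rating kind: {}'.format(kind))


print(rating_looks_valid('numerical', 2, min_stars=0, max_stars=5))
```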
Response: 200 and no content."},{"location":"developer_api.html#editing_file_notes","title":"Editing File Notes","text":""},{"location":"developer_api.html#add_notes_set_notes","title":"POST/add_notes/set_notes
","text":"Add or update notes associated with a file.
Restricted access: YES. Add Notes permission needed. Required Headers:Content-Type
: application/json
notes
: (an Object mapping string names to string texts)hash
: (selective, an SHA256 hash for the file in 64 characters of hexadecimal)file_id
: (selective, the integer numerical identifier for the file)merge_cleverly
: true or false (optional, defaults false)extend_existing_note_if_possible
: true or false (optional, defaults true)conflict_resolution
: 0, 1, 2, or 3 (optional, defaults 3)
With merge_cleverly
left false
, this is a simple update operation. Existing notes will be overwritten exactly as you specify. Any other notes the file has will be untouched. Example request body
{\n \"notes\" : {\n \"note name\" : \"content of note\",\n \"another note\" : \"asdf\"\n },\n \"hash\" : \"3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba\"\n}\n
If you turn on merge_cleverly
, then the client will merge your new notes into the file's existing notes using the same logic you have seen in Note Import Options and the Duplicate Metadata Merge Options. This navigates conflict resolution, and you should use it if you are adding potential duplicate content from an 'automatic' source like a parser and do not want to wade into the logic. Do not use it for a user-editing experience (a user expects a strict overwrite/replace experience and will be confused by this mode).
To start off, in this mode, if your note text exists under a different name for the file, your dupe note will not be added to your new name. extend_existing_note_if_possible
makes it so your given note text will overwrite the text under an existing name (or a '... (1)' rename of that name) if the existing text is inside your given text. conflict_resolution
is an enum governing what to do in all other conflicts:
Response: the note changes that were actually applied. With merge_cleverly=false
, this is exactly what you gave, and this operation is idempotent. If merge_cleverly=true
, then this may differ, even be empty, and this operation might not be idempotent. Example response{\n \"notes\" : {\n \"note name\" : \"content of note\",\n \"another note (1)\" : \"asdf\"\n }\n}\n
"},{"location":"developer_api.html#add_notes_delete_notes","title":"POST /add_notes/delete_notes
","text":"Remove notes associated with a file.
Restricted access: YES. Add Notes permission needed. Required Headers:Content-Type
: application/json
note_names
: (a list of string note names to delete)hash
: (selective, an SHA256 hash for the file in 64 characters of hexadecimal)file_id
: (selective, the integer numerical identifier for the file){\n \"note_names\" : [\"note name\", \"another note\"],\n \"hash\" : \"3b820114f658d768550e4e3d4f1dced3ff8db77443472b5ad93700647ad2d3ba\"\n}\n
Response: 200 with no content. This operation is idempotent."},{"location":"developer_api.html#searching_and_fetching_files","title":"Searching and Fetching Files","text":"File search in hydrus is not paginated like a booru--all searches return all results in one go. In order to keep this fast, search is split into two steps--fetching file identifiers with a search, and then fetching file metadata in batches. You may have noticed that the client itself performs searches like this--thinking a bit about a search and then bundling results in batches of 256 files before eventually throwing all the thumbnails on screen.
"},{"location":"developer_api.html#get_files_search_files","title":"GET/get_files/search_files
","text":"Search for the client's files.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments (in percent-encoded JSON):tags
: (a list of tags you wish to search for)tag_service_key
: (optional, hexadecimal, the tag domain on which to search, defaults to 'all known tags')file_sort_type
: (optional, integer, the results sort method, defaults to import time)file_sort_asc
: true or false (optional, the results sort order, defaults to false for descending)return_file_ids
: true or false (optional, default true, returns file id results)return_hashes
: true or false (optional, default false, returns hex hash results)Example request/get_files/search_files?tags=%5B%22blue%20eyes%22%2C%20%22blonde%20hair%22%2C%20%22%5Cu043a%5Cu0438%5Cu043d%5Cu043e%22%2C%20%22system%3Ainbox%22%2C%20%22system%3Alimit%3D16%22%5D\n
If the access key's permissions only permit search for certain tags, at least one positive whitelisted/non-blacklisted tag must be in the \"tags\" list or this will 403. Tags can be prepended with a hyphen to make a negated tag (e.g. \"-green eyes\"), but these will not be checked against the permissions whitelist.
Wildcards and namespace searches are supported, so if you search for 'character:sam*' or 'series:*', this will be handled correctly clientside.
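The percent-encoded JSON in the example request can be produced with the Python standard library. This is a minimal sketch; `encode_json_arg` is a hypothetical helper name, and the tag list is the one from the example request.

```python
import json
from urllib.parse import quote


def encode_json_arg(value):
    """JSON-encode a value, then percent-encode it for use in a query string."""
    # safe="" makes quote() escape every reserved character, including "/".
    return quote(json.dumps(value), safe="")


tags = ["blue eyes", "blonde hair", "\u043a\u0438\u043d\u043e", "system:inbox", "system:limit=16"]
search_path = "/get_files/search_files?tags=" + encode_json_arg(tags)
```

The same helper should work for nested lists, since the nesting is ordinary JSON before it is percent-encoded.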
Many system predicates are also supported using a text parser! The parser was designed by a clever user for human input and allows for a certain amount of error (e.g. ~= instead of \u2248, or \"isn't\" instead of \"is not\") or requires more information (e.g. the specific hashes for a hash lookup). Here's a big list of examples that are supported:
System Predicatesservice_name
service_name
service_name
> \u2157 (numerical services)service_name
is like (like/dislike services)service_name
= 13 (inc/dec services)Please test out the system predicates you want to send. If you are in help->advanced mode, you can test this parser in the advanced text input dialog when you click the OR* button on a tag autocomplete dropdown. More system predicate types and input formats will be available in future. Reverse engineering system predicate data from text is obviously tricky. If a system predicate does not parse, you'll get 400.
Also, OR predicates are now supported! Just nest within the tag list, and it'll be treated like an OR. For instance:
[ \"skirt\", [ \"samus aran\", \"lara croft\" ], \"system:height > 1000\" ]
This is treated as: skirt AND (samus aran OR lara croft) AND system:height > 1000.
The file and tag services are for search domain selection, just like clicking the buttons in the client. They are optional--default is 'all my files' and 'all known tags'.
File searches occur in the display
tag_display_type
. If you want to pair autocomplete tag lookup from /search_tags to this file search (e.g. for making a standard booru search interface), then make sure you are searching display
tags there.
file_sort_asc is 'true' for ascending, and 'false' for descending. The default is descending.
file_sort_type is by default import time. It is an integer according to the following enum, and I have written the semantic (asc/desc) meaning for each type after:
Response: The full list of numerical file ids that match the search. Example response
{\n \"file_ids\" : [125462, 4852415, 123, 591415]\n}\n
Example response with return_hashes=true{\n \"hashes\" : [\n \"1b04c4df7accd5a61c5d02b36658295686b0abfebdc863110e7d7249bba3f9ad\",\n \"fe416723c731d679aa4d20e9fd36727f4a38cd0ac6d035431f0f452fad54563f\",\n \"b53505929c502848375fbc4dab2f40ad4ae649d34ef72802319a348f81b52bad\"\n ],\n \"file_ids\" : [125462, 4852415, 123]\n}\n
You can of course also specify return_hashes=true&return_file_ids=false
just to get the hashes. The order of both lists is the same.
File ids are internal and specific to an individual client. For a client, a file with hash H always has the same file id N, but two clients will have different ideas about which N goes with which H. IDs are a bit faster to retrieve than hashes and search with en masse, which is why they are exposed here.
This search does not apply the implicit limit that most clients set to all searches (usually 10,000), so if you do system:everything on a client with millions of files, expect to get boshed. Even with a system:limit included, complicated queries with large result sets may take several seconds to respond. Just like the client itself.
"},{"location":"developer_api.html#get_files_file_hashes","title":"GET/get_files/file_hashes
","text":"Lookup file hashes from other hashes.
Restricted access: YES. Search for Files permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):hash
: (selective, a hexadecimal hash)hashes
: (selective, a list of hexadecimal hashes)source_hash_type
: [sha256|md5|sha1|sha512] (optional, defaulting to sha256)desired_hash_type
: [sha256|md5|sha1|sha512]If you have some MD5 hashes and want to see what their SHA256 are, or vice versa, this is the place. Hydrus records the non-SHA256 hashes for every file it has ever imported. This data is not removed on file deletion.
Example request/get_files/file_hashes?hash=ec5c5a4d7da4be154597e283f0b6663c&source_hash_type=md5&desired_hash_type=sha256\n
Response: A mapping Object of the successful lookups. Where no matching hash is found, no entry will be made (therefore, if none of your source hashes have matches on the client, this will return an empty hashes
Object). Example response{\n \"hashes\" : {\n \"ec5c5a4d7da4be154597e283f0b6663c\" : \"2a0174970defa6f147f2eabba829c5b05aba1f1aea8b978611a07b7bb9cf9399\"\n }\n}\n
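A small sketch of both halves of this lookup: building the query string and reading the result. The function names are hypothetical; the query parameters and the `hashes` response key are the ones documented above.

```python
from urllib.parse import urlencode


def file_hashes_path(source_hash, source_hash_type, desired_hash_type):
    """Build the GET path for a single-hash lookup."""
    query = urlencode({
        "hash": source_hash,
        "source_hash_type": source_hash_type,
        "desired_hash_type": desired_hash_type,
    })
    return "/get_files/file_hashes?" + query


def lookup(response, source_hash):
    """Return the matching hash, or None when the client had no match."""
    return response.get("hashes", {}).get(source_hash)


example_response = {
    "hashes": {
        "ec5c5a4d7da4be154597e283f0b6663c":
            "2a0174970defa6f147f2eabba829c5b05aba1f1aea8b978611a07b7bb9cf9399"
    }
}
path = file_hashes_path("ec5c5a4d7da4be154597e283f0b6663c", "md5", "sha256")
sha256 = lookup(example_response, "ec5c5a4d7da4be154597e283f0b6663c")
```

Because unmatched hashes are simply absent from the mapping, `lookup` uses `.get` rather than indexing, so a miss is a `None` and not an exception.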
"},{"location":"developer_api.html#get_files_file_metadata","title":"GET /get_files/file_metadata
","text":"Get metadata about files in the client.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments (in percent-encoded JSON):create_new_file_ids
: true or false (optional if asking with hash(es), defaulting to false)only_return_identifiers
: true or false (optional, defaulting to false)only_return_basic_information
: true or false (optional, defaulting to false)detailed_url_information
: true or false (optional, defaulting to false)include_blurhash
: true or false (optional, defaulting to false. Only applies when only_return_basic_information
is true)include_notes
: true or false (optional, defaulting to false)include_services_object
: true or false (optional, defaulting to true)hide_service_keys_tags
: Deprecated, will be deleted soon! true or false (optional, defaulting to true)If your access key is restricted by tag, the files you search for must have been in the most recent search result.
Example request for two files with ids 123 and 4567/get_files/file_metadata?file_ids=%5B123%2C%204567%5D\n
The same, but only wants hashes back/get_files/file_metadata?file_ids=%5B123%2C%204567%5D&only_return_identifiers=true\n
And one that fetches two hashes/get_files/file_metadata?hashes=%5B%224c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2%22%2C%20%223e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82%22%5D\n
This request string can obviously get pretty ridiculously long. It also takes a bit of time to fetch metadata from the database. In its normal searches, the client usually fetches file metadata in batches of 256.
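One way to keep request strings short is to copy the client's own habit and ask for metadata in batches. The sketch below only builds the request paths; the batch size of 256 mirrors the figure mentioned above, and `metadata_paths` is a hypothetical helper.

```python
import json
from urllib.parse import quote

BATCH_SIZE = 256  # mirrors the client's own batch size mentioned above


def metadata_paths(file_ids, batch_size=BATCH_SIZE):
    """Yield one /get_files/file_metadata path per batch of file ids."""
    for start in range(0, len(file_ids), batch_size):
        batch = file_ids[start:start + batch_size]
        yield "/get_files/file_metadata?file_ids=" + quote(json.dumps(batch), safe="")


# 600 ids from a search would be split into batches of 256, 256 and 88.
paths = list(metadata_paths(list(range(1, 601))))
```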
Response: A list of JSON Objects that store a variety of file metadata, along with The Services Object for service reference. Example response
{\n \"services\" : \"The Services Object\",\n \"metadata\" : [\n {\n \"file_id\" : 123,\n \"hash\" : \"4c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2\",\n \"size\" : 63405,\n \"mime\" : \"image/jpeg\",\n \"filetype_human\" : \"jpeg\",\n \"filetype_enum\" : 1,\n \"ext\" : \".jpg\",\n \"width\" : 640,\n \"height\" : 480,\n \"thumbnail_width\" : 200,\n \"thumbnail_height\" : 150,\n \"duration\" : null,\n \"time_modified\" : null,\n \"time_modified_details\" : {},\n \"file_services\" : {\n \"current\" : {},\n \"deleted\" : {}\n },\n \"ipfs_multihashes\" : {},\n \"has_audio\" : false,\n \"blurhash\" : \"U6PZfSi_.AyE_3t7t7R**0o#DgR4_3R*D%xt\",\n \"pixel_hash\" : \"2519e40f8105599fcb26187d39656b1b46f651786d0e32fff2dc5a9bc277b5bb\",\n \"num_frames\" : null,\n \"num_words\" : null,\n \"is_inbox\" : false,\n \"is_local\" : false,\n \"is_trashed\" : false,\n \"is_deleted\" : false,\n \"has_exif\" : true,\n \"has_human_readable_embedded_metadata\" : true,\n \"has_icc_profile\" : true,\n \"has_transparency\" : false,\n \"known_urls\" : [],\n \"ratings\" : {\n \"74d52c6238d25f846d579174c11856b1aaccdb04a185cb2c79f0d0e499284f2c\" : null,\n \"90769255dae5c205c975fc4ce2efff796b8be8a421f786c1737f87f98187ffaf\" : null,\n \"b474e0cbbab02ca1479c12ad985f1c680ea909a54eb028e3ad06750ea40d4106\" : 0\n },\n \"tags\" : {\n \"6c6f63616c2074616773\" : {\n \"storage_tags\" : {},\n \"display_tags\" : {}\n },\n \"37e3849bda234f53b0e9792a036d14d4f3a9a136d1cb939705dbcd5287941db4\" : {\n \"storage_tags\" : {},\n \"display_tags\" : {}\n },\n \"616c6c206b6e6f776e2074616773\" : {\n \"storage_tags\" : {},\n \"display_tags\" : {}\n }\n }\n },\n {\n \"file_id\" : 4567,\n \"hash\" : \"3e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82\",\n \"size\" : 199713,\n \"mime\" : \"video/webm\",\n \"filetype_human\" : \"webm\",\n \"filetype_enum\" : 21,\n \"ext\" : \".webm\",\n \"width\" : 1920,\n \"height\" : 1080,\n \"thumbnail_width\" : 200,\n \"thumbnail_height\" : 113,\n 
\"duration\" : 4040,\n \"time_modified\" : 1604055647,\n \"time_modified_details\" : {\n \"local\" : 1641044491,\n \"gelbooru.com\" : 1604055647\n },\n \"file_services\" : {\n \"current\" : {\n \"616c6c206c6f63616c2066696c6573\" : {\n \"time_imported\" : 1641044491\n },\n \"616c6c206c6f63616c206d65646961\" : {\n \"time_imported\" : 1641044491\n },\n \"cb072cffbd0340b67aec39e1953c074e7430c2ac831f8e78fb5dfbda6ec8dcbd\" : {\n \"time_imported\" : 1641204220\n }\n },\n \"deleted\" : {\n \"6c6f63616c2066696c6573\" : {\n \"time_deleted\" : 1641204274,\n \"time_imported\" : 1641044491\n }\n }\n },\n \"ipfs_multihashes\" : {\n \"55af93e0deabd08ce15ffb2b164b06d1254daab5a18d145e56fa98f71ddb6f11\" : \"QmReHtaET3dsgh7ho5NVyHb5U13UgJoGipSWbZsnuuM8tb\"\n },\n \"has_audio\" : true,\n \"blurhash\" : \"UHF5?xYk^6#M@-5b,1J5@[or[k6.};FxngOZ\",\n \"pixel_hash\" : \"1dd9625ce589eee05c22798a9a201602288a1667c59e5cd1fb2251a6261fbd68\",\n \"num_frames\" : 102,\n \"num_words\" : null,\n \"is_inbox\" : false,\n \"is_local\" : true,\n \"is_trashed\" : false,\n \"is_deleted\" : false,\n \"has_exif\" : false,\n \"has_human_readable_embedded_metadata\" : false,\n \"has_icc_profile\" : false,\n \"has_transparency\" : false,\n \"known_urls\" : [\n \"https://gelbooru.com/index.php?page=post&s=view&id=4841557\",\n \"https://img2.gelbooru.com//images/80/c8/80c8646b4a49395fb36c805f316c49a9.jpg\",\n \"http://origin-orig.deviantart.net/ed31/f/2019/210/7/8/beachqueen_samus_by_dandonfuga-ddcu1xg.jpg\"\n ],\n \"ratings\" : {\n \"74d52c6238d25f846d579174c11856b1aaccdb04a185cb2c79f0d0e499284f2c\" : true,\n \"90769255dae5c205c975fc4ce2efff796b8be8a421f786c1737f87f98187ffaf\" : 3,\n \"b474e0cbbab02ca1479c12ad985f1c680ea909a54eb028e3ad06750ea40d4106\" : 11\n },\n \"tags\" : {\n \"6c6f63616c2074616773\" : {\n \"storage_tags\" : {\n \"0\" : [\"samus favourites\"],\n \"2\" : [\"process this later\"]\n },\n \"display_tags\" : {\n \"0\" : [\"samus favourites\", \"favourites\"],\n \"2\" : [\"process this later\"]\n 
}\n },\n \"37e3849bda234f53b0e9792a036d14d4f3a9a136d1cb939705dbcd5287941db4\" : {\n \"storage_tags\" : {\n \"0\" : [\"blonde_hair\", \"blue_eyes\", \"looking_at_viewer\"],\n \"1\" : [\"bodysuit\"]\n },\n \"display_tags\" : {\n \"0\" : [\"blonde hair\", \"blue_eyes\", \"looking at viewer\"],\n \"1\" : [\"bodysuit\", \"clothing\"]\n }\n },\n \"616c6c206b6e6f776e2074616773\" : {\n \"storage_tags\" : {\n \"0\" : [\"samus favourites\", \"blonde_hair\", \"blue_eyes\", \"looking_at_viewer\"],\n \"1\" : [\"bodysuit\"]\n },\n \"display_tags\" : {\n \"0\" : [\"samus favourites\", \"favourites\", \"blonde hair\", \"blue_eyes\", \"looking at viewer\"],\n \"1\" : [\"bodysuit\", \"clothing\"]\n }\n }\n }\n }\n ]\n}\n
And one where only_return_identifiers is true{\n \"services\" : \"The Services Object\",\n \"metadata\" : [\n {\n \"file_id\" : 123,\n \"hash\" : \"4c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2\"\n },\n {\n \"file_id\" : 4567,\n \"hash\" : \"3e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82\"\n }\n ]\n}\n
And where only_return_basic_information is true{\n \"services\" : \"The Services Object\",\n \"metadata\" : [\n {\n \"file_id\" : 123,\n \"hash\" : \"4c77267f93415de0bc33b7725b8c331a809a924084bee03ab2f5fae1c6019eb2\",\n \"size\" : 63405,\n \"mime\" : \"image/jpeg\",\n \"filetype_human\" : \"jpeg\",\n \"filetype_enum\" : 1,\n \"ext\" : \".jpg\",\n \"width\" : 640,\n \"height\" : 480,\n \"duration\" : null,\n \"has_audio\" : false,\n \"num_frames\" : null,\n \"num_words\" : null\n },\n {\n \"file_id\" : 4567,\n \"hash\" : \"3e7cb9044fe81bda0d7a84b5cb781cba4e255e4871cba6ae8ecd8207850d5b82\",\n \"size\" : 199713,\n \"mime\" : \"video/webm\",\n \"filetype_human\" : \"webm\",\n \"filetype_enum\" : 21,\n \"ext\" : \".webm\",\n \"width\" : 1920,\n \"height\" : 1080,\n \"duration\" : 4040,\n \"has_audio\" : true,\n \"num_frames\" : 102,\n \"num_words\" : null\n }\n ]\n}\n
"},{"location":"developer_api.html#basics","title":"basics","text":"Size is in bytes. Duration is in milliseconds, and may be an int or a float.
is_trashed
means the file is currently in the trash but still available on the hard disk. is_deleted
means the file is currently either in the trash or completely deleted from disk.
file_services
stores which file services the file is currently in and deleted from. The entries are by the service key, same as for tags later on. In rare cases, the timestamps may be null
, if they are unknown (e.g. a time_deleted
for the file deleted before this information was tracked). The time_modified
can also be null. Time modified is just the filesystem modified time for now, but it will evolve into more complicated storage in future with multiple locations (website post times) that'll be aggregated to a sensible value in UI.
ipfs_multihashes
maps each ipfs service key to any known multihash for the file.
The thumbnail_width
and thumbnail_height
are a generally reliable prediction but aren't a promise. The actual thumbnail you get from /get_files/thumbnail will be different if the user hasn't looked at it since changing their thumbnail options. You only get these rows for files that hydrus generates a real thumbnail for; things like pdf won't have them. For those, you can use your own thumb, or ask the api and it'll give you a fixed fallback; those are mostly 200x200, but you can and should size them to whatever you want.
If the file has a thumbnail, blurhash
gives a base 83 encoded string of its blurhash. pixel_hash
is an SHA256 of the image's pixel data and should exactly match for pixel-identical files (it is used in the duplicate system for 'must be pixel duplicates').
The tags
structure is similar to the /add_tags/add_tags scheme, excepting that the status numbers are:
Note
Since JSON Object keys must be strings, these status numbers are strings, not ints.
While the 'storage_tags' represent the actual tags stored on the database for a file, 'display_tags' reflect how tags appear in the UI, after siblings are collapsed and parents are added. If you want to edit a file's tags, refer to the storage tags. If you want to render to the user, use the display tags. The display tag calculation logic is very complicated; if the storage tags change, do not try to guess the new display tags yourself--just ask the API again.
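To illustrate the split between the two tag views, here is a minimal sketch that reads tags out of one entry of the `metadata` list. `tags_for` is a hypothetical helper, and the status key `"0"` is simply the one used in the example response; note that it is a string, not an int.

```python
def tags_for(metadata_entry, service_key, kind="display_tags", status="0"):
    """Return the tag list for one service and status, or [] if absent."""
    service = metadata_entry.get("tags", {}).get(service_key, {})
    return service.get(kind, {}).get(status, [])


# Trimmed copy of the second file in the example response above.
entry = {
    "file_id": 4567,
    "tags": {
        "6c6f63616c2074616773": {
            "storage_tags": {"0": ["samus favourites"], "2": ["process this later"]},
            "display_tags": {"0": ["samus favourites", "favourites"], "2": ["process this later"]},
        }
    },
}

to_render = tags_for(entry, "6c6f63616c2074616773")                      # show these
to_edit = tags_for(entry, "6c6f63616c2074616773", kind="storage_tags")   # edit these
```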
"},{"location":"developer_api.html#ratings","title":"ratings","text":"The ratings
structure is simple, but it holds different data types. For each service:
Check The Services Object to see the shape of a rating star, and min/max number of stars in a numerical service.
"},{"location":"developer_api.html#services","title":"services","text":"The tags
, ratings
, and file_services
structures use the hexadecimal service_key
extensively. If you need to look up the respective service name or type, check The Services Object under the top level services
key.
Note
If you look, those file structures actually include the service name and type already, but this bloated data is deprecated and will be deleted in 2024, so please transition over.
If you don't want the services object (it is generally superfluous on the 'simple' responses), then add include_services_object=false
.
The metadata
list should come back in the same sort order you asked, whether that is in file_ids
or hashes
!
If you ask with hashes rather than file_ids, hydrus will, by default, only return results when it has seen those hashes before. This is to stop the client making thousands of new file_id records in its database if you perform a scanning operation. If you ask about a hash the client has never encountered before--for which there is no file_id--you will get this style of result:
Missing file_id example{\n \"metadata\" : [\n {\n \"file_id\" : null,\n \"hash\" : \"766da61f81323629f982bc1b71b5c1f9bba3f3ed61caf99906f7f26881c3ae93\"\n }\n ]\n}\n
You can change this behaviour with create_new_file_ids=true
, but bear in mind you will get a fairly 'empty' metadata result with lots of 'null' lines, so this is only useful for gathering the numerical ids for later Client API work.
If you ask about file_ids that do not exist, you'll get 404.
If you set only_return_basic_information=true
, this will be much faster for first-time requests than the full metadata result, but it will be slower for repeat requests. The full metadata object is cached after first fetch; the limited file info object is not. You can optionally set include_blurhash
when using this option to fetch blurhash strings for the files.
If you add detailed_url_information=true
, a new entry, detailed_known_urls
, will be added for each file, with a list of the same structure as /add_urls/get_url_info
. This may be an expensive request if you are querying thousands of files at once.
{\n \"detailed_known_urls\": [\n {\n \"normalised_url\": \"https://gelbooru.com/index.php?id=4841557&page=post&s=view\",\n \"url_type\": 0,\n \"url_type_string\": \"post url\",\n \"match_name\": \"gelbooru file page\",\n \"can_parse\": true\n },\n {\n \"normalised_url\": \"https://img2.gelbooru.com//images/80/c8/80c8646b4a49395fb36c805f316c49a9.jpg\",\n \"url_type\": 5,\n \"url_type_string\": \"unknown url\",\n \"match_name\": \"unknown url\",\n \"can_parse\": false\n }\n ]\n}\n
"},{"location":"developer_api.html#get_files_file","title":"GET /get_files/file
","text":"Get a file.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments:file_id
: (selective, numerical file id for the file)hash
: (selective, a hexadecimal SHA256 hash for the file)download
: (optional, boolean, default false
)Only use one of file_id or hash. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id in the last search you ran.
Example request
/get_files/file?file_id=452158\n
Example request/get_files/file?hash=7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a&download=true\n
Response: The file itself. You should get the correct mime type as the Content-Type header. By default, this will set the Content-Disposition
header to inline
, which causes a web browser to show the file. If you set download=true
, it will set it to attachment
, which triggers the browser to automatically download it (or open the 'save as' dialog) instead.
/get_files/thumbnail
","text":"Get a file's thumbnail.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments:file_id
: (selective, numerical file id for the file)hash
: (selective, a hexadecimal SHA256 hash for the file)Only use one. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id in the last search you ran.
Example request
/get_files/thumbnail?file_id=452158\n
Example request/get_files/thumbnail?hash=7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a\n
Response: The thumbnail for the file. Some hydrus thumbs are jpegs, some are pngs. It should give you the correct image/jpeg or image/png Content-Type.
If hydrus keeps no thumbnail for the filetype, for instance with pdfs, then you will get the same default 'pdf' icon you see in the client. If the file does not exist in the client, or the thumbnail was expected but is missing from storage, you will get the fallback 'hydrus' icon, again just as you would in the client itself. This request should never give a 404.
Size of Normal Thumbs
Thumbnails are not guaranteed to be the correct size! If a thumbnail has not been loaded in the client in years, it could well have been fitted for older thumbnail settings. Also, even 'clean' thumbnails will not always fit inside the settings' bounding box; they may be boosted due to a high-DPI setting or spill over due to a 'fill' vs 'fit' preference. You cannot easily predict what resolution a thumbnail will or should have!
In general, thumbnails are the correct ratio. If you are drawing thumbs, you should embed them to fit or fill, but don't fix them at 100% true size: make sure they can scale to the size you want!
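Since only the ratio can be relied on, a drawing routine might compute its own display size from whatever resolution actually arrives. This is a minimal 'fit' sketch with a hypothetical function name; a 'fill' variant would use `max` in place of `min`.

```python
def fit_within(width, height, box_width, box_height):
    """Scale (width, height) to fit inside the box, preserving aspect ratio."""
    scale = min(box_width / width, box_height / height)
    # Clamp to at least one pixel so extreme ratios never collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


small = fit_within(200, 150, 100, 100)
large = fit_within(640, 480, 320, 320)
```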
Size of Defaults
If you get a 'default' filetype thumbnail like the pdf or hydrus one, you will be pulling the pngs straight from the hydrus/static folder. They will most likely be 200x200 pixels.
"},{"location":"developer_api.html#get_files_render","title":"GET/get_files/render
","text":"Get an image file as rendered by Hydrus.
Restricted access: YES. Search for Files permission needed. Additional search permission limits may apply.Required Headers: n/a
Arguments:file_id
: (selective, numerical file id for the file)hash
: (selective, a hexadecimal SHA256 hash for the file)download
: (optional, boolean, default false
)Only use one of file_id or hash. As with metadata fetching, you may only use the hash argument if you have access to all files. If you are tag-restricted, you will have to use a file_id in the last search you ran.
The file you request must be a still image file that Hydrus can render (this includes PSD files). This request uses the client image cache.
Example request
/get_files/render?file_id=452158\n
Example request/get_files/render?hash=7f30c113810985b69014957c93bc25e8eb4cf3355dae36d8b9d011d8b0cf623a&download=true\n
Response: A PNG file of the image as would be rendered in the client. It will be converted to sRGB color if the file had a color profile, but the rendered PNG itself will not carry any color profile. By default, this will set the Content-Disposition
header to inline
, which causes a web browser to show the file. If you set download=true
, it will set it to attachment
, which triggers the browser to automatically download it (or open the 'save as' dialog) instead.
This refers to the File Relationships system, which includes 'potential duplicates', 'duplicates', and 'alternates'.
This system is pending significant rework and expansion, so please do not get too married to some of the routines here. I am mostly just exposing my internal commands, so things are a little ugly/hacked. I expect duplicate and alternate groups to get some form of official identifier in future, which may end up being the way to refer and edit things here.
Also, at least for now, 'Manage File Relationships' permission is not going to be bound by the search permission restrictions that normal file search does. Getting this file relationship management permission allows you to search anything.
There is more work to do here, including adding various 'dissolve'/'undo' commands to break groups apart.
"},{"location":"developer_api.html#manage_file_relationships_get_file_relationships","title":"GET/manage_file_relationships/get_file_relationships
","text":"Get the current relationships for one or more files.
Restricted access: YES. Manage File Relationships permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):/manage_file_relationships/get_file_relationships?hash=ac940bb9026c430ea9530b4f4f6980a12d9432c2af8d9d39dfc67b05d91df11d\n
Response: A JSON Object mapping the hashes to their relationships. Example response{\n \"file_relationships\" : {\n \"ac940bb9026c430ea9530b4f4f6980a12d9432c2af8d9d39dfc67b05d91df11d\" : {\n \"is_king\" : false,\n \"king\" : \"8784afbfd8b59de3dcf2c13dc1be9d7cb0b3d376803c8a7a8b710c7c191bb657\",\n \"king_is_on_file_domain\" : true,\n \"king_is_local\" : true,\n \"0\" : [\n ],\n \"1\" : [],\n \"3\" : [\n \"8bf267c4c021ae4fd7c4b90b0a381044539519f80d148359b0ce61ce1684fefe\"\n ],\n \"8\" : [\n \"8784afbfd8b59de3dcf2c13dc1be9d7cb0b3d376803c8a7a8b710c7c191bb657\",\n \"3fa8ef54811ec8c2d1892f4f08da01e7fc17eed863acae897eb30461b051d5c3\"\n ]\n }\n }\n}\n
king
refers to which file is set as the best of a duplicate group. If you are doing potential duplicate comparisons, the kings of your two groups are usually the ideal representatives, and the 'get some pairs to filter'-style commands try to select the kings of the various to-be-compared duplicate groups. is_king
is a convenience bool for when a file is king of its own group.
It is possible for the king to not be available. Every group has a king, but if that file has been deleted, or if the file domain here is limited and the king is on a different file service, then it may not be available. A similar issue occurs when you search for filtering pairs--while it is ideal to compare kings with kings, if you set 'files must be pixel dupes', then the user will expect to see those pixel duplicates, not their champions--you may be forced to compare non-kings. king_is_on_file_domain
lets you know if the king is on the file domain you set, and king_is_local
lets you know if it is on the hard disk--if king_is_local=true
, you can do a /get_files/file
request on it. It is generally rare, but you have to deal with the king being unavailable--in this situation, your best bet is to just use the file itself as its own representative.
All the relationships you get are filtered by the file domain. If you set the file domain to 'all known files', you will get every relationship a file has, including all deleted files, which is often less useful than you would think. The default, 'all my files', is usually most useful.
A file that has no duplicates is considered to be in a duplicate group of size 1 and thus is always its own king.
The numbers are from a duplicate status enum, as so:
Note that because of JSON constraints, these are the string versions of the integers since they are Object keys.
All the hashes given here are in 'all my files', i.e. not in the trash. A file may have duplicates that have long been deleted, but, like the null king above, they will not show here.
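The advice about unavailable kings can be sketched as a small fallback rule. This is one conservative policy, not the client's own logic: `pick_representative` is a hypothetical helper that only trusts the king when both `king_is_on_file_domain` and `king_is_local` are true, and otherwise uses the file itself.

```python
def pick_representative(file_hash, relationships):
    """Choose which hash to show for a file's duplicate group."""
    king = relationships.get("king")
    if relationships.get("is_king") or king is None:
        return file_hash
    if relationships.get("king_is_on_file_domain") and relationships.get("king_is_local"):
        return king
    # King exists but cannot be fetched here; fall back to the file itself.
    return file_hash


file_hash = "ac940bb9026c430ea9530b4f4f6980a12d9432c2af8d9d39dfc67b05d91df11d"
relationships = {
    "is_king": False,
    "king": "8784afbfd8b59de3dcf2c13dc1be9d7cb0b3d376803c8a7a8b710c7c191bb657",
    "king_is_on_file_domain": True,
    "king_is_local": True,
}
representative = pick_representative(file_hash, relationships)
```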
"},{"location":"developer_api.html#manage_file_relationships_get_potentials_count","title":"GET/manage_file_relationships/get_potentials_count
","text":"Get the count of remaining potential duplicate pairs in a particular search domain. Exactly the same as the counts you see in the duplicate processing page.
Restricted access: YES. Manage File Relationships permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):tag_service_key_1
: (optional, default 'all known tags', a hex tag service key)tags_1
: (optional, default system:everything, a list of tags you wish to search for)tag_service_key_2
: (optional, default 'all known tags', a hex tag service key)tags_2
: (optional, default system:everything, a list of tags you wish to search for)potentials_search_type
: (optional, integer, default 0, regarding how the pairs should match the search(es))pixel_duplicates
: (optional, integer, default 1, regarding whether the pairs should be pixel duplicates)max_hamming_distance
: (optional, integer, default 4, the max 'search distance' of the pairs)Example request/manage_file_relationships/get_potentials_count?tag_service_key_1=c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f&tags_1=%5B%22dupes_to_process%22%2C%20%22system%3Awidth%3C400%22%5D&potentials_search_type=1&pixel_duplicates=2&max_hamming_distance=0\n
tag_service_key_x
and tags_x
work the same as /get_files/search_files. The _2
variants are only useful if the potentials_search_type
is 2.
potentials_search_type
and pixel_duplicates
are enums:
-and-
The max_hamming_distance
is the same 'search distance' you see in the Client UI. A higher number means more speculative 'similar files' search. If pixel_duplicates
is set to 'must be', then max_hamming_distance
is obviously ignored.
{\n \"potential_duplicates_count\" : 17\n}\n
If you confirm that a pair of potentials are duplicates, this may transitively collapse other potential pairs and decrease the count by more than 1.
"},{"location":"developer_api.html#manage_file_relationships_get_potential_pairs","title":"GET/manage_file_relationships/get_potential_pairs
","text":"Get some potential duplicate pairs for a filtering workflow. Exactly the same as the 'duplicate filter' in the duplicate processing page.
Restricted access: YES. Manage File Relationships permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):tag_service_key_1
: (optional, default 'all known tags', a hex tag service key)tags_1
: (optional, default system:everything, a list of tags you wish to search for)tag_service_key_2
: (optional, default 'all known tags', a hex tag service key)tags_2
: (optional, default system:everything, a list of tags you wish to search for)potentials_search_type
: (optional, integer, default 0, regarding how the pairs should match the search(es))pixel_duplicates
: (optional, integer, default 1, regarding whether the pairs should be pixel duplicates)max_hamming_distance
: (optional, integer, default 4, the max 'search distance' of the pairs)max_num_pairs
: (optional, integer, defaults to client's option, how many pairs to get in a batch)/manage_file_relationships/get_potential_pairs?tag_service_key_1=c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f&tags_1=%5B%22dupes_to_process%22%2C%20%22system%3Awidth%3C400%22%5D&potentials_search_type=1&pixel_duplicates=2&max_hamming_distance=0&max_num_pairs=50\n
The search arguments work the same as /manage_file_relationships/get_potentials_count.
max_num_pairs
is simple and just caps how many pairs you get.
{\n \"potential_duplicate_pairs\" : [\n [ \"16470d6e73298cd75d9c7e8e2004810e047664679a660a9a3ba870b0fa3433d3\", \"7ed062dc76265d25abeee5425a859cfdf7ab26fd291f50b8de7ca381e04db079\" ],\n [ \"eeea390357f259b460219d9589b4fa11e326403208097b1a1fbe63653397b210\", \"9215dfd39667c273ddfae2b73d90106b11abd5fd3cbadcc2afefa526bb226608\" ],\n [ \"a1ea7d671245a3ae35932c603d4f3f85b0d0d40c5b70ffd78519e71945031788\", \"8e9592b2dfb436fe0a8e5fa15de26a34a6dfe4bca9d4363826fac367a9709b25\" ]\n ]\n}\n
The selected pair sample and their order is strictly hardcoded for now (e.g. to guarantee that a decision will not invalidate any other pair in the batch, you shouldn't see the same file twice in a batch, nor two files in the same duplicate group). Treat it as the client filter does, where you fetch batches to process one after another. I expect to make it more flexible in future, in the client itself and here.
You will see significantly fewer than max_num_pairs
(and potential duplicate count) as you get close to the last available pairs, and when there are none left, you will get an empty list.
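A minimal sketch of building this request and fetching batches until the empty list arrives. The function names and the injected `fetch` callable are hypothetical. How you send the GET and attach your access key depends on your HTTP library.

```python
import json
from urllib.parse import quote


def build_potential_pairs_url(tags_1=None, tag_service_key_1=None,
                              potentials_search_type=0, pixel_duplicates=1,
                              max_hamming_distance=4, max_num_pairs=None):
    """Build the path and query string for get_potential_pairs."""
    params = []
    if tag_service_key_1 is not None:
        params.append(("tag_service_key_1", tag_service_key_1))
    if tags_1 is not None:
        # List arguments are sent as percent-encoded JSON.
        params.append(("tags_1", quote(json.dumps(tags_1), safe="")))
    params.append(("potentials_search_type", str(potentials_search_type)))
    params.append(("pixel_duplicates", str(pixel_duplicates)))
    params.append(("max_hamming_distance", str(max_hamming_distance)))
    if max_num_pairs is not None:
        params.append(("max_num_pairs", str(max_num_pairs)))
    query = "&".join(f"{key}={value}" for key, value in params)
    return "/manage_file_relationships/get_potential_pairs?" + query


def iter_pair_batches(fetch, url):
    """Yield batches of pairs until the client returns an empty list.

    `fetch` is a hypothetical callable that performs the GET and returns
    the decoded JSON Object. Process each batch (set its relationships)
    before asking for the next one, or the same pairs will come back.
    """
    while True:
        pairs = fetch(url)["potential_duplicate_pairs"]
        if not pairs:
            return
        yield pairs
```

Called with the values from the example request above, `build_potential_pairs_url` reproduces that query string.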
/manage_file_relationships/get_random_potentials
","text":"Get some random potentially duplicate file hashes. Exactly the same as the 'show some random potential dupes' button in the duplicate processing page.
Restricted access: YES. Manage File Relationships permission needed.Required Headers: n/a
Arguments (in percent-encoded JSON):tag_service_key_1
: (optional, default 'all known tags', a hex tag service key)tags_1
: (optional, default system:everything, a list of tags you wish to search for)tag_service_key_2
: (optional, default 'all known tags', a hex tag service key)tags_2
: (optional, default system:everything, a list of tags you wish to search for)potentials_search_type
: (optional, integer, default 0, regarding how the files should match the search(es))pixel_duplicates
: (optional, integer, default 1, regarding whether the files should be pixel duplicates)max_hamming_distance
: (optional, integer, default 4, the max 'search distance' of the files)/manage_file_relationships/get_random_potentials?tag_service_key_1=c1ba23c60cda1051349647a151321d43ef5894aacdfb4b4e333d6c4259d56c5f&tags_1=%5B%22dupes_to_process%22%2C%20%22system%3Awidth%3C400%22%5D&potentials_search_type=1&pixel_duplicates=2&max_hamming_distance=0\n
The arguments work the same as /manage_file_relationships/get_potentials_count, with the caveat that potentials_search_type
has special logic:
Essentially, the first hash is the 'master' to which the others are paired. The other files will include every matching file.
Response: A JSON Object listing a group of hashes exactly as the client would. Example response{\n \"random_potential_duplicate_hashes\" : [\n \"16470d6e73298cd75d9c7e8e2004810e047664679a660a9a3ba870b0fa3433d3\",\n \"7ed062dc76265d25abeee5425a859cfdf7ab26fd291f50b8de7ca381e04db079\",\n \"9e0d6b928b726562d70e1f14a7b506ba987c6f9b7f2d2e723809bb11494c73e6\",\n \"9e01744819b5ff2a84dda321e3f1a326f40d0e7f037408ded9f18a11ee2b2da8\"\n ]\n}\n
If there are no potential duplicate groups in the search, this returns an empty list.
"},{"location":"developer_api.html#manage_file_relationships_set_file_relationships","title":"POST/manage_file_relationships/set_file_relationships
","text":"Set the relationships to the specified file pairs.
Restricted access: YES. Manage File Relationships permission needed. Required Headers:Content-Type
: application/jsonrelationships
: (a list of Objects, one for each file-pair being set)Each Object is:
* `hash_a`: (a hexadecimal SHA256 hash)\n* `hash_b`: (a hexadecimal SHA256 hash)\n* `relationship`: (integer enum for the relationship being set)\n* `do_default_content_merge`: (bool)\n* `delete_a`: (optional, bool, default false)\n* `delete_b`: (optional, bool, default false)\n
hash_a
and hash_b
are normal hex SHA256 hashes for your file pair.
relationship
is one of the following enum values:
2, 4, and 7 all make the files 'duplicates' (8 under /get_file_relationships
), which, specifically, merges the two files' duplicate groups. 'same quality' has different duplicate content merge options to the better/worse choices, but it ultimately sets something similar to A>B (but see below for more complicated outcomes). You obviously don't have to use 'B is better' if you prefer just to swap the hashes. Do what works for you.
do_default_content_merge
sets whether the user's duplicate content merge options should be loaded and applied to the files along with the relationship. Most operations in the client do this automatically, so the user may expect it to apply, but if you want to do content merge yourself, set this to false.
delete_a
and delete_b
are booleans that select whether to delete A and/or B in the same operation as setting the relationship. You can also do this externally if you prefer.
{\n \"relationships\" : [\n {\n \"hash_a\" : \"b54d09218e0d6efc964b78b070620a1fa19c7e069672b4c6313cee2c9b0623f2\",\n \"hash_b\" : \"bbaa9876dab238dcf5799bfd8319ed0bab805e844f45cf0de33f40697b11a845\",\n \"relationship\" : 4,\n \"do_default_content_merge\" : true,\n \"delete_b\" : true\n },\n {\n \"hash_a\" : \"22667427eaa221e2bd7ef405e1d2983846c863d40b2999ce8d1bf5f0c18f5fb2\",\n \"hash_b\" : \"65d228adfa722f3cd0363853a191898abe8bf92d9a514c6c7f3c89cfed0bf423\",\n \"relationship\" : 4,\n \"do_default_content_merge\" : true,\n \"delete_b\" : true\n },\n {\n \"hash_a\" : \"0480513ffec391b77ad8c4e57fe80e5b710adfa3cb6af19b02a0bd7920f2d3ec\",\n \"hash_b\" : \"5fab162576617b5c3fc8caabea53ce3ab1a3c8e0a16c16ae7b4e4a21eab168a7\",\n \"relationship\" : 2,\n \"do_default_content_merge\" : true\n }\n ]\n}\n
Response: 200 with no content. If you try to add an invalid or redundant relationship, for instance setting files that are already duplicates as potential duplicates, no changes are made.
This is the file relationships request that is probably most likely to change in future. I may implement content merge options. I may move from file pairs to group identifiers. When I expand alternates, those file groups are going to support more variables.
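A minimal sketch of assembling the request body. The helper names are hypothetical, and the `relationship` integer is passed through untouched, so pick it from the enum yourself.

```python
import json
import string


def make_relationship(hash_a, hash_b, relationship,
                      do_default_content_merge=True,
                      delete_a=False, delete_b=False):
    """Return one Object for the `relationships` list."""
    for file_hash in (hash_a, hash_b):
        # A SHA256 hash in hex is always 64 characters.
        if len(file_hash) != 64 or any(c not in string.hexdigits for c in file_hash):
            raise ValueError(f"not a hex SHA256 hash: {file_hash!r}")
    row = {
        "hash_a": hash_a,
        "hash_b": hash_b,
        "relationship": relationship,
        "do_default_content_merge": do_default_content_merge,
    }
    # delete_a and delete_b default to false, so only send them when set.
    if delete_a:
        row["delete_a"] = True
    if delete_b:
        row["delete_b"] = True
    return row


def make_request_body(rows):
    """Serialise the rows into the JSON body for set_file_relationships."""
    return json.dumps({"relationships": list(rows)})
```

Validating the hashes locally is a design choice in this sketch, not something the API asks for. It just catches truncated hashes before they reach the client.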
"},{"location":"developer_api.html#king_merge_rules","title":"king merge rules","text":"Recall in /get_file_relationships
that we discussed how duplicate groups have a 'king' for their best file. This file is the most useful representative when you do comparisons, since if you say \"King A > King B\", then we know that King A is also better than all of King B's normal duplicate group members. We can merge the groups simply by folding King B and all the other members into King A's group.
So what happens if you say 'A = B'? We have to have a king, so which should it be?
What happens if you say \"non-king member of A > non-king member of B\"? We don't want to merge all of B into A, since King B might be higher quality than King A.
The logic here can get tricky, but I have tried my best to avoid overcommitting and accidentally promoting the wrong king. Here are all the possible situations ('>' means 'better than', and '=' means 'same quality as'):
So, if you can, always present kings to your users, and action using those kings' hashes. It makes the merge logic easier in all cases. Remember that you can set system:is the best quality file of its duplicate group
in any file search to exclude any non-kings (e.g. if you are hunting for easily actionable pixel potential duplicates).
/manage_file_relationships/set_kings
","text":"Set the specified files to be the kings of their duplicate groups.
Restricted access: YES. Manage File Relationships permission needed. Required Headers:Content-Type
: application/json{\n \"file_id\" : 123\n}\n
Response: 200 with no content. The files will be promoted to be the kings of their respective duplicate groups. If the file is already the king (also true for any file with no duplicates), this is idempotent. It also processes the files in the given order, so if you specify two files in the same group, the latter will be the king at the end of the request.
"},{"location":"developer_api.html#managing_cookies","title":"Managing Cookies","text":"This refers to the cookies held in the client's session manager, which you can review under network->data->manage session cookies. These are sent to every request on the respective domains.
"},{"location":"developer_api.html#manage_cookies_get_cookies","title":"GET/manage_cookies/get_cookies
","text":"Get the cookies for a particular domain.
Restricted access: YES. Manage Cookies and Headers permission needed.Required Headers: n/a
Arguments:domain
/manage_cookies/get_cookies?domain=gelbooru.com\n
Response: A JSON Object listing all the cookies for that domain in [ name, value, domain, path, expires ] format. Example response{\n \"cookies\" : [\n [\"__cfduid\", \"f1bef65041e54e93110a883360bc7e71\", \".gelbooru.com\", \"/\", 1596223327],\n [\"pass_hash\", \"0b0833b797f108e340b315bc5463c324\", \"gelbooru.com\", \"/\", 1585855361],\n [\"user_id\", \"123456\", \"gelbooru.com\", \"/\", 1585855361]\n ]\n}\n
Note that these variables are all strings except 'expires', which is either an integer timestamp or _null_ for session cookies.\n\nThis request will also return any cookies for subdomains. The session system in hydrus generally stores cookies according to the second-level domain, so if you request cookies for specific.someoverbooru.net, you will still get the cookies for someoverbooru.net and all its subdomains.\n
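A minimal sketch of turning the response rows into named fields. The `Cookie` class and `parse_cookies` function are hypothetical names, not part of the API.

```python
import json
from typing import NamedTuple, Optional


class Cookie(NamedTuple):
    name: str
    value: str
    domain: str
    path: str
    expires: Optional[int]  # None means a session cookie

    @property
    def is_session(self):
        return self.expires is None


def parse_cookies(response_text):
    """Turn the get_cookies response body into a list of Cookie tuples."""
    rows = json.loads(response_text)["cookies"]
    return [Cookie(*row) for row in rows]
```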
"},{"location":"developer_api.html#manage_cookies_set_cookies","title":"POST /manage_cookies/set_cookies
","text":"Set some new cookies for the client. This makes it easier to 'copy' a login from a web browser or similar to hydrus if hydrus's login system can't handle the site yet.
Restricted access: YES. Manage Cookies and Headers permission needed. Required Headers:Content-Type
: application/jsoncookies
: (a list of cookie rows in the same format as the GET request above){\n \"cookies\" : [\n [\"PHPSESSID\", \"07669eb2a1a6e840e498bb6e0799f3fb\", \".somesite.com\", \"/\", 1627327719],\n [\"tag_filter\", \"1\", \".somesite.com\", \"/\", 1627327719]\n ]\n}\n
You can set 'value' to be null, which will clear any existing cookie with the corresponding name, domain, and path (acting essentially as a delete).
Expires can be null, but session cookies will time out in hydrus after 60 minutes of non-use.
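A minimal sketch of building the body, including a row that deletes a cookie by setting its value to null. The helper names are hypothetical.

```python
import json


def cookie_row(name, value, domain, path="/", expires=None):
    """One row in [ name, value, domain, path, expires ] order."""
    return [name, value, domain, path, expires]


def delete_row(name, domain, path="/"):
    """A null value clears the cookie with this name, domain, and path."""
    return [name, None, domain, path, None]


def set_cookies_body(rows):
    """Serialise the rows into the JSON body for set_cookies."""
    return json.dumps({"cookies": rows})
```

Python's `None` serialises to JSON `null`, so `delete_row` produces the 'acting essentially as a delete' form described above.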
"},{"location":"developer_api.html#managing_http_headers","title":"Managing HTTP Headers","text":"This refers to the custom headers you can see under network->data->manage http headers.
"},{"location":"developer_api.html#manage_headers_get_headers","title":"GET/manage_headers/get_headers
","text":"Get the custom http headers.
Restricted access: YES. Manage Cookies and Headers permission needed.Required Headers: n/a
Arguments:domain
: optional, the domain to fetch headers for/manage_headers/get_headers?domain=gelbooru.com\n
Example request (for global)/manage_headers/get_headers\n
Response: A JSON Object listing all the headers: Example response{\n \"network_context\" : {\n \"type\" : 2,\n \"data\" : \"gelbooru.com\"\n },\n \"headers\" : {\n \"User-Agent\" : {\n \"value\" : \"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\",\n \"approved\" : \"approved\",\n \"reason\" : \"Set by Client API\"\n },\n \"DNT\" : {\n \"value\" : \"1\",\n \"approved\" : \"approved\",\n \"reason\" : \"Set by Client API\"\n }\n }\n}\n
"},{"location":"developer_api.html#manage_headers_set_headers","title":"POST /manage_headers/set_headers
","text":"Manages the custom http headers.
Restricted access: YES. Manage Cookies and Headers permission needed. Required Headers: Content-Type
: application/json Arguments (in JSON): domain
: (optional, the specific domain to set the header for)headers
: (a JSON Object that holds \"key\" objects){\n \"domain\" : \"mysite.com\",\n \"headers\" : {\n \"User-Agent\" : {\n \"value\" : \"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\"\n },\n \"DNT\" : {\n \"value\" : \"1\"\n },\n \"CoolStuffToken\" : {\n \"value\" : \"abcdef0123456789\",\n \"approved\" : \"pending\",\n \"reason\" : \"This unlocks the Sonic fanfiction!\"\n }\n }\n}\n
Example request body that deletes{\n \"domain\" : \"myothersite.com\",\n \"headers\" : {\n \"User-Agent\" : {\n \"value\" : null\n },\n \"Authorization\" : {\n \"value\" : null\n }\n }\n}\n
If you do not set a domain, or you set it to null
, the 'context' will be the global context, which applies as a fallback to all jobs.
Domain headers also apply to their subdomains--unless they are overwritten by specific subdomain entries.
Each key
Object under headers
has the same form as /manage_headers/get_headers. value
is obvious--it is the value of the header. If the pair doesn't exist yet, you need the value
, but if you just want to approve something, it is optional. Set it to null
to delete an existing pair.
You probably won't ever use approved
or reason
, but they plug into the 'validation' system in the client. They are both optional. Approved can be any of [ approved, denied, pending ]
, and by default everything you add will be approved
. If there is anything pending
when a network job asks, the user will be presented with a yes/no popup presenting the reason for the header. If they click 'no', the header is set to denied
and the network job goes ahead without it. If you have a header that changes behaviour or unlocks special content, you might like to make it optional in this way.
If you need to reinstate it, the default global
User-Agent
is Mozilla/5.0 (compatible; Hydrus Client)
.
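A minimal sketch of composing a set_headers body with one plain header, one pending header, and one deletion. The helper names are hypothetical. The allowed `approved` values are the three listed above.

```python
import json

VALID_APPROVED = ("approved", "denied", "pending")


def set_header(value, approved=None, reason=None):
    """Entry that sets a header; `approved` and `reason` are optional."""
    if approved is not None and approved not in VALID_APPROVED:
        raise ValueError(f"approved must be one of {VALID_APPROVED}")
    entry = {"value": value}
    if approved is not None:
        entry["approved"] = approved
    if reason is not None:
        entry["reason"] = reason
    return entry


def delete_header():
    """Entry that deletes an existing header (value set to null)."""
    return {"value": None}


def set_headers_body(headers, domain=None):
    """Serialise the body; leaving out `domain` targets the global context."""
    body = {"headers": headers}
    if domain is not None:
        body["domain"] = domain
    return json.dumps(body)
```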
/manage_headers/set_user_agent
","text":"This is deprecated--move to /manage_headers/set_headers!
This sets the 'Global' User-Agent for the client, as typically editable under network->data->manage http headers, for instance if you want hydrus to appear as a specific browser associated with some cookies.
Restricted access: YES. Manage Cookies and Headers permission needed. Required Headers: Content-Type
: application/json Arguments (in JSON): user-agent
: (a string){\n \"user-agent\" : \"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\"\n}\n
Send an empty string to reset the client back to the default User-Agent, which should be Mozilla/5.0 (compatible; Hydrus Client)
.
This refers to the pages of the main client UI.
"},{"location":"developer_api.html#manage_pages_get_pages","title":"GET/manage_pages/get_pages
","text":"Get the page structure of the current UI session.
Restricted access: YES. Manage Pages permission needed.Required Headers: n/a
Arguments: n/a
Response:A JSON Object of the top-level page 'notebook' (page of pages) detailing its basic information and current sub-pages. Page of pages beneath it will list their own sub-page lists. Example response
{\n \"pages\" : {\n \"name\" : \"top pages notebook\",\n \"page_key\" : \"3b28d8a59ec61834325eb6275d9df012860a1ecfd9e1246423059bc47fb6d5bd\",\n \"page_state\" : 0,\n \"page_type\" : 10,\n \"selected\" : true,\n \"pages\" : [\n {\n \"name\" : \"files\",\n \"page_key\" : \"d436ff5109215199913705eb9a7669d8a6b67c52e41c3b42904db083255ca84d\",\n \"page_state\" : 0,\n \"page_type\" : 6,\n \"selected\" : false\n },\n {\n \"name\" : \"thread watcher\",\n \"page_key\" : \"40887fa327edca01e1d69b533dddba4681b2c43e0b4ebee0576177852e8c32e7\",\n \"page_state\" : 0,\n \"page_type\" : 9,\n \"selected\" : false\n },\n {\n \"name\" : \"pages\",\n \"page_key\" : \"2ee7fa4058e1e23f2bd9e915cdf9347ae90902a8622d6559ba019a83a785c4dc\",\n \"page_state\" : 0,\n \"page_type\" : 10,\n \"selected\" : true,\n \"pages\" : [\n {\n \"name\" : \"urls\",\n \"page_key\" : \"9fe22cb760d9ee6de32575ed9f27b76b4c215179cf843d3f9044efeeca98411f\",\n \"page_state\" : 0,\n \"page_type\" : 7,\n \"selected\" : true\n },\n {\n \"name\" : \"files\",\n \"page_key\" : \"2977d57fc9c588be783727bcd54225d577b44e8aa2f91e365a3eb3c3f580dc4e\",\n \"page_state\" : 0,\n \"page_type\" : 6,\n \"selected\" : false\n }\n ]\n }\n ]\n }\n}\n
name
is the full text on the page tab.
page_key
is a unique identifier for the page. It will stay the same for a particular page throughout the session, but new ones are generated on a session reload.
page_type
is as follows:
page_state
is as follows:
Most pages will be 0, normal/ready, at all times. Large pages will start in an 'initialising' state for a few seconds, which means their session-saved thumbnails aren't loaded yet. Search pages will enter 'searching' after a refresh or search change and will either return to 'ready' when the search is complete, or fall to 'search cancelled' if the search was interrupted (usually this means the user clicked the 'stop' button that appears after some time).
selected
means which page is currently in view. It will propagate down the page of pages until it terminates. It may terminate in an empty page of pages, so do not assume it will end on a media page.
The top page of pages will always be there, and always selected.
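A minimal sketch of walking the response. It follows `selected` down from the top notebook and does not assume the chain ends on a media page. The function names are hypothetical, and `page` is the Object under the `pages` key of the response.

```python
def selected_path(page):
    """Return the chain of pages from the top notebook to the page in view."""
    path = [page]
    while True:
        children = path[-1].get("pages", [])
        chosen = next((child for child in children if child["selected"]), None)
        if chosen is None:
            # Either a normal page or an empty page of pages.
            return path
        path.append(chosen)


def iter_pages(page):
    """Yield every page in the tree, depth first."""
    yield page
    for child in page.get("pages", []):
        yield from iter_pages(child)
```

`selected_path(response["pages"])[-1]["page_key"]` would then give the key of whatever the user is currently looking at, which may itself be an empty page of pages.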
"},{"location":"developer_api.html#manage_pages_get_page_info","title":"GET/manage_pages/get_page_info
","text":"Get information about a specific page.
Under Construction
This is under construction. The current call dumps a ton of info for different downloader pages. Please experiment in IRL situations and give feedback for now! I will flesh out this help with more enumeration info and examples as this gets nailed down. POST commands to alter pages (adding, removing, highlighting), will come later.
Restricted access: YES. Manage Pages permission needed.Required Headers: n/a
Arguments:page_key
: (hexadecimal page_key as stated in /manage_pages/get_pages)simple
: true or false (optional, defaulting to true)/manage_pages/get_page_info?page_key=aebbf4b594e6986bddf1eeb0b5846a1e6bc4e07088e517aff166f1aeb1c3c9da&simple=true\n
Response description A JSON Object of the page's information. At present, this mostly means downloader information. Example response with simple = true
{\n \"page_info\" : {\n \"name\" : \"threads\",\n \"page_key\" : \"aebbf4b594e6986bddf1eeb0b5846a1e6bc4e07088e517aff166f1aeb1c3c9da\",\n \"page_state\" : 0,\n \"page_type\" : 3,\n \"management\" : {\n \"multiple_watcher_import\" : {\n \"watcher_imports\" : [\n {\n \"url\" : \"https://someimageboard.net/m/123456\",\n \"watcher_key\" : \"cf8c3525c57a46b0e5c2625812964364a2e801f8c49841c216b8f8d7a4d06d85\",\n \"created\" : 1566164269,\n \"last_check_time\" : 1566164272,\n \"next_check_time\" : 1566174272,\n \"files_paused\" : false,\n \"checking_paused\" : false,\n \"checking_status\" : 0,\n \"subject\" : \"gundam pictures\",\n \"imports\" : {\n \"status\" : \"4 successful (2 already in db)\",\n \"simple_status\" : \"4\",\n \"total_processed\" : 4,\n \"total_to_process\" : 4\n },\n \"gallery_log\" : {\n \"status\" : \"1 successful\",\n \"simple_status\" : \"1\",\n \"total_processed\" : 1,\n \"total_to_process\" : 1\n }\n },\n {\n \"url\" : \"https://someimageboard.net/a/1234\",\n \"watcher_key\" : \"6bc17555b76da5bde2dcceedc382cf7d23281aee6477c41b643cd144ec168510\",\n \"created\" : 1566063125,\n \"last_check_time\" : 1566063133,\n \"next_check_time\" : 1566104272,\n \"files_paused\" : false,\n \"checking_paused\" : true,\n \"checking_status\" : 1,\n \"subject\" : \"anime pictures\",\n \"imports\" : {\n \"status\" : \"124 successful (22 already in db), 2 previously deleted\",\n \"simple_status\" : \"124\",\n \"total_processed\" : 124,\n \"total_to_process\" : 124\n },\n \"gallery_log\" : {\n \"status\" : \"3 successful\",\n \"simple_status\" : \"3\",\n \"total_processed\" : 3,\n \"total_to_process\" : 3\n }\n }\n ]\n },\n \"highlight\" : \"cf8c3525c57a46b0e5c2625812964364a2e801f8c49841c216b8f8d7a4d06d85\"\n }\n },\n \"media\" : {\n \"num_files\" : 4\n }\n}\n
name
, page_key
, page_state
, and page_type
are as in /manage_pages/get_pages.
As you can see, even the 'simple' mode can get very large. Imagine that response for a page watching 100 threads! Turning simple mode off will display every import item, gallery log entry, and all hashes in the media (thumbnail) panel.
For this first version, the five importer pages--hdd import, simple downloader, url downloader, gallery page, and watcher page--all give rich info based on their specific variables. The first three only have one importer/gallery log combo, but the latter two of course can have multiple. The \"imports\" and \"gallery_log\" entries are all in the same data format.
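A minimal sketch of pulling a per-watcher progress summary out of the 'simple' response shown above. It only touches keys that appear in that example, and since the call is still under construction, treat the key layout as an assumption. The function name is hypothetical.

```python
def summarise_watchers(response):
    """Return (subject, processed, total, checking_paused) for each watcher."""
    management = response["page_info"]["management"]
    watchers = management["multiple_watcher_import"]["watcher_imports"]
    summary = []
    for watcher in watchers:
        imports = watcher["imports"]
        summary.append((
            watcher["subject"],
            imports["total_processed"],
            imports["total_to_process"],
            watcher["checking_paused"],
        ))
    return summary
```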
"},{"location":"developer_api.html#manage_pages_add_files","title":"POST/manage_pages/add_files
","text":"Add files to a page.
Restricted access: YES. Manage Pages permission needed. Required Headers:Content-Type
: application/jsonpage_key
: (the page key for the page you wish to add files to)The files you set will be appended to the given page, just like a thumbnail drag and drop operation. The page key is the same as fetched in the /manage_pages/get_pages call.
Example request body{\n \"page_key\" : \"af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18\",\n \"file_ids\" : [123, 124, 125]\n}\n
Response: 200 with no content. If the page key is not found, this will 404."},{"location":"developer_api.html#manage_pages_focus_page","title":"POST /manage_pages/focus_page
","text":"'Show' a page in the main GUI, making it the current page in view. If it is already the current page, no change is made.
Restricted access: YES. Manage Pages permission needed. Required Headers:Content-Type
: application/jsonpage_key
: (the page key for the page you wish to show)The page key is the same as fetched in the /manage_pages/get_pages call.
Example request body{\n \"page_key\" : \"af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18\"\n}\n
Response: 200 with no content. If the page key is not found, this will 404."},{"location":"developer_api.html#manage_pages_refresh_page","title":"POST /manage_pages/refresh_page
","text":"Refresh a page in the main GUI. Like hitting F5 in the client, this obviously makes file search pages perform their search again, but for other page types it will force the currently in-view files to be re-sorted.
Restricted access: YES. Manage Pages permission needed. Required Headers:Content-Type
: application/jsonpage_key
: (the page key for the page you wish to refresh)The page key is the same as fetched in the /manage_pages/get_pages call. If a file search page is not set to 'searching immediately', a 'refresh' command does nothing.
Example request body{\n \"page_key\" : \"af98318b6eece15fef3cf0378385ce759bfe056916f6e12157cd928eb56c1f18\"\n}\n
Response: 200 with no content. If the page key is not found, this will 404. Poll the page_state
in /manage_pages/get_pages or /manage_pages/get_page_info to see when the search is complete.
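A minimal sketch of that polling loop. `get_page_state` is a hypothetical callable that fetches the page's current `page_state`. The only state value assumed here is 0 for 'ready', and the timeout also covers a search that ends up cancelled and never returns to 0.

```python
import time

PAGE_STATE_READY = 0  # 'normal/ready' per the get_pages description


def wait_until_ready(get_page_state, timeout=30.0, interval=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until the page reports state 0.

    Returns True once the page is ready, or False if `timeout` seconds
    pass first. `sleep` and `clock` are injectable to make testing easy.
    """
    deadline = clock() + timeout
    while True:
        if get_page_state() == PAGE_STATE_READY:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```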
/manage_database/lock_on
","text":"Pause the client's database activity and disconnect the current connection.
Restricted access: YES. Manage Database permission needed.Arguments: None
This is a hacky prototype. It commands the client database to pause its job queue and release its connection (and related file locks and journal files). This puts the client in a similar position as a long VACUUM command--it'll hang in there, but not much will work, and since the UI async code isn't great yet, the UI may lock up after a minute or two. If you would like to automate database backup without shutting the client down, this is the thing to play with.
This should return pretty quickly, but it will wait up to five seconds for the database to actually disconnect. If there is a big job (like a VACUUM) currently going on, it may take substantially longer to finish that up and process this STOP command. You might like to check for the existence of a journal file in the db dir just to be safe.
As long as this lock is on, all Client API calls except the unlock command will return 503. (This is a decent way to test the current lock status, too)
"},{"location":"developer_api.html#manage_database_lock_off","title":"POST/manage_database/lock_off
","text":"Reconnect the client's database and resume activity.
Restricted access: YES. Manage Database permission needed.Arguments: None
This is the obvious complement to the lock. The client will resume processing its job queue and will catch up. If the UI was frozen, it should free up in a few seconds, just like after a big VACUUM.
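A minimal sketch of pairing the two calls so that the unlock always happens, even if the backup step in between fails. `post` is a hypothetical callable that sends a POST to the given path with your access key. This assumes lock_on is a POST like lock_off.

```python
from contextlib import contextmanager


@contextmanager
def database_locked(post):
    """Hold the database lock for the duration of a `with` block."""
    post("/manage_database/lock_on")
    try:
        yield
    finally:
        # Always unlock, even if the work inside the block raised.
        post("/manage_database/lock_off")
```

You would put the backup copy inside the `with database_locked(post):` block. While the lock is held, every other Client API call returns 503, so do not make other requests in there.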
"},{"location":"developer_api.html#manage_database_mr_bones","title":"GET/manage_database/mr_bones
","text":"Get the data from help->how boned am I?. This is a simple Object of numbers just for hacky advanced purposes if you want to build up some stats in the background. The numbers are the same as the dialog shows, so double check that to confirm what means what.
Restricted access: YES. Manage Database permission needed. Arguments (in percent-encoded JSON):tags
: (optional, a list of tags you wish to search for)tag_service_key
: (optional, hexadecimal, the tag domain on which to search, defaults to 'all my files')/manage_database/mr_bones\n/manage_database/mr_bones?tags=%5B%22blonde_hair%22%2C%20%22blue_eyes%22%5D\n
Example response{\n \"boned_stats\" : {\n \"num_inbox\" : 8356,\n \"num_archive\" : 229,\n \"num_deleted\" : 7010,\n \"size_inbox\" : 7052596762,\n \"size_archive\" : 262911007,\n \"size_deleted\" : 13742290193,\n \"earliest_import_time\" : 1451408539,\n \"total_viewtime\" : [3280, 41621, 2932, 83021],\n \"total_alternate_files\" : 265,\n \"total_duplicate_files\" : 125,\n \"total_potential_pairs\" : 3252\n }\n}\n
The arguments here are the same as for GET /get_files/search_files. You can set any or none of them to set a search domain like in the dialog.
"},{"location":"developer_api.html#manage_database_get_client_options","title":"GET/manage_database/get_client_options
","text":"Unstable Response
The response for this path is unstable and subject to change without warning. No examples are given.
Gets the current options from the client.
Restricted access: YES. Manage Database permission needed.Required Headers: n/a
Arguments: n/a
Response: A JSON dump of nearly all options set in the client. The format of this is based on internal hydrus structures and is subject to change without warning with new hydrus versions. Do not rely on anything you find here to continue to exist and don't rely on the structure to be the same."},{"location":"docker.html","title":"Hydrus in a container(HiC)","text":"Latest hydrus client that runs in docker 24/7. Employs xvfb and vnc. Runs on alpine.
TL;DR: docker run --name hydrusclient -d -p 5800:5800 -p 5900:5900 ghcr.io/hydrusnetwork/hydrus:latest
. Connect to noVNC via http://yourdockerhost:5800/vnc.html
or use Tiger VNC Viewer or any other VNC client and connect on port 5900.
For persistent storage you can either create a named volume or mount a new/existing db path -v /hydrus/client/db:/opt/hydrus/db
. The client runs with default permissions of 1000:1000
. This can be changed with the ENV variables UID
and GID
(not working at the moment: it is fixed to 1000, and will be fixed someday\u2122).
If you have enough RAM, mount /tmp
as tmpfs. If not, download more RAM.
As of v359
hydrus understands IPFS nocopy
, and it can easily be run alongside a go-ipfs container. Read the Hydrus IPFS help. Mount HOST_PATH_DB/client_files
to /data/client_files
in ipfs. Go manage the ipfs service and set the path to /data/client_files
, you'll know where to put it.
Example compose file:
version: '3.8'\nvolumes:\n tor-config:\n driver: local\n hybooru-pg-data:\n driver: local\n hydrus-server:\n driver: local\n hydrus-client:\n driver: local\n ipfs-data:\n driver: local\n hydownloader-data:\n driver: local\nservices:\n hydrusclient:\n image: ghcr.io/hydrusnetwork/hydrus:latest\n container_name: hydrusclient\n restart: unless-stopped\n environment:\n - UID=1000\n - GID=1000\n volumes:\n - hydrus-client:/opt/hydrus/db\n tmpfs:\n - /tmp #optional for SPEEEEEEEEEEEEEEEEEEEEEEEEED and less disk access\n ports:\n - 5800:5800 #noVNC\n - 5900:5900 #VNC\n - 45868:45868 #Booru\n - 45869:45869 #API\n\n hydrusserver:\n image: ghcr.io/hydrusnetwork/hydrus:server\n container_name: hydrusserver\n restart: unless-stopped\n volumes:\n - hydrus-server:/opt/hydrus/db\n\n hydrusclient-ipfs:\n image: ipfs/go-ipfs\n container_name: hydrusclient-ipfs\n restart: unless-stopped\n volumes:\n - ipfs-data:/data/ipfs\n - hydrus-client:/data/db:ro\n ports:\n - 4001:4001 # READ\n - 5001:5001 # THE\n - 8080:8080 # IPFS\n - 8081:8081 # DOCS\n\n hydrus-web:\n image: floogulinc/hydrus-web\n container_name: hydrus-web\n restart: always\n ports:\n - 8080:80 # READ\n\n hybooru-pg:\n image: healthcheck/postgres\n container_name: hybooru-pg\n environment:\n - POSTGRES_USER=hybooru\n - POSTGRES_PASSWORD=hybooru\n - POSTGRES_DB=hybooru\n volumes:\n - hybooru-pg-data:/var/lib/postgresql/data\n restart: unless-stopped\n\n hybooru:\n image: suika/hybooru:latest # https://github.com/funmaker/hybooru build it yourself\n container_name: hybooru\n restart: unless-stopped\n depends_on:\n hybooru-pg:\n condition: service_started\n ports:\n - 8081:80 # READ\n volumes:\n - hydrus-client:/opt/hydrus/db\n\n hydownloader:\n image: ghcr.io/thatfuckingbird/hydownloader:edge\n container_name: hydownloader\n restart: unless-stopped\n ports:\n - 53211:53211\n volumes:\n - hydownloader-data:/db\n - hydrus-client:/hydb\n\n tor-socks-proxy:\n #network_mode: \"container:myvpn_container\" # in case you have a vpn 
container\n container_name: tor-socks-proxy\n image: peterdavehello/tor-socks-proxy:latest\n restart: unless-stopped\n\n tor-hydrus:\n image: goldy/tor-hidden-service\n container_name: tor-hydrus\n depends_on:\n hydrusclient:\n condition: service_healthy\n hydrusserver:\n condition: service_healthy\n hybooru:\n condition: service_started\n environment:\n HYBOORU_TOR_SERVICE_HOSTS: '80:hybooru:80'\n HYBOORU_TOR_SERVICE_VERSION: '3'\n HYSERV_TOR_SERVICE_HOSTS: 45870:hydrusserver:45870,45871:hydrusserver:45871\n HYSERV_TOR_SERVICE_VERSION: '3'\n HYCLNT_TOR_SERVICE_HOSTS: 45868:hydrusclient:45868,45869:hydrusclient:45869\n HYCLNT_TOR_SERVICE_VERSION: '3'\n volumes:\n - tor-config:/var/lib/tor/hidden_service \n
Further containerized application of interest: # Alpine (client)\ncd hydrus/\ndocker build -t ghcr.io/hydrusnetwork/hydrus:latest -f static/build_files/docker/client/Dockerfile .\n
"},{"location":"downloader_completion.html","title":"Putting it all together","text":"Now you know what GUGs, URL Classes, and Parsers are, you should have some ideas of how URL Classes could steer what happens when the downloader is faced with an URL to process. Should a URL be imported as a media file, or should it be parsed? If so, how?
You may have noticed in the Edit GUG ui that it lists if a current URL Class matches the example URL output. If the GUG has no matching URL Class, it won't be listed in the main 'gallery selector' button's list--it'll be relegated to the 'non-functioning' page. Without a URL Class, the client doesn't know what to do with the output of that GUG. But if a URL Class does match, we can then hand the result over to a parser set at network->downloader components->manage url class links:
Here you simply set which parsers go with which URL Classes. If you have URL Classes that do not have a parser linked (which is the default for new URL Classes), you can use the 'try to fill in gaps...' button to automatically fill the gaps based on guesses using the parsers' example URLs. This is usually the best way to line things up unless you have multiple potential parsers for that URL Class, in which case it'll usually go by the parser name earliest in the alphabet.
If the URL Class has no parser set or the parser is broken or otherwise invalid, the respective URL's file import object in the downloader or subscription is going to throw some kind of error when it runs. If you make and share some parsers, the first indication that something is wrong is going to be several users saying 'I got this error: (copy notes from file import status window)'. You can then load the parser back up in manage parsers and try to figure out what changed and roll out an update.
manage url class links also shows 'api/redirect link review', which summarises which URL Classes redirect to others. In these cases, only the redirected-to URL gets a parser entry in the first 'parser links' window, since the first will never be fetched for parsing (in the downloader, it will always be converted to the Redirected URL, and that is fetched and parsed).
Once your GUG has a URL Class and your URL Classes have parsers linked, test your downloader! Note that Hydrus's URL drag-and-drop import uses URL Classes, so if you don't have the GUG and gallery stuff done but you have a Post URL set up, you can test that just by dragging a Post URL from your browser to the client, and it should be added to a new URL Downloader and just work. It feels pretty good once it does!
"},{"location":"downloader_gugs.html","title":"Gallery URL Generators","text":"Gallery URL Generators, or GUGs are simple objects that take a simple string from the user, like:
And convert them into an initialising Gallery URL, such as:
These are all the 'first page' of the results if you type or click-through to the same location on those sites. We are essentially emulating their own simple search-url generation inside the hydrus client.
"},{"location":"downloader_gugs.html#doing_it","title":"actually doing it","text":"Although it is usually a fairly simple process of just substituting the inputted tags into a string template, there are a couple of extra things to think about. Let's look at the ui under network->downloader components->manage gugs:
The client will split whatever the user enters by whitespace, so blue_eyes blonde_hair becomes two search terms, [ 'blue_eyes', 'blonde_hair' ], which are then joined back together with the given 'search terms separator', to make blue_eyes+blonde_hair. Different sites use different separators, although ' ', '+', and ',' are most common. The new string is substituted into the %tags% in the template phrase, and the URL is made.
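As a rough illustration of the split, join, and substitute steps, here is a minimal Python sketch. The function name and the example template are made up for this page and are not hydrus's actual code; percent-encoding is deliberately left out, since the network engine deals with that later.

```python
def generate_gallery_url(template, query_text, separator='+'):
    # Split the user's text on any run of whitespace.
    search_terms = query_text.split()
    # Join the terms back together with the GUG's search terms separator.
    joined = separator.join(search_terms)
    # Substitute the joined string for the %tags% placeholder in the template.
    return template.replace('%tags%', joined)


# Hypothetical template, not copied from any default GUG.
EXAMPLE_TEMPLATE = 'https://example.com/index.php?page=post&s=list&tags=%tags%'

print(generate_gallery_url(EXAMPLE_TEMPLATE, 'blue_eyes blonde_hair'))
```

Swapping the separator for ',' or ' ' is all it takes to emulate a site that joins its search terms differently.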
Note that you will not have to make %20 or %3A percent-encodings for reserved characters here--the network engine handles all that before the request is sent. For the most part, if you need to include, or a user enters, ':' or ' ' or '\u304a\u3063\u3071\u3044', you can just pass it straight into the final URL without worrying.
This ui should update as you change it, so have a play and look at how the output example url changes to get a feel for things. Look at the other defaults to see different examples. Even if you break something, you can just cancel out.
The name of the GUG is important, as this is what will be listed when the user chooses what 'downloader' they want to use. Make sure it has a clear unambiguous name.
The initial search text is also important. Most downloaders just take some text tags, but if your GUG expects a numerical artist id (like pixiv artist search does), you should specify that explicitly to the user. You can even put in a brief '(two tag maximum)' type of instruction if you like.
Notice that the DeviantArt example above is actually the stream of wlop's favourites, not his works, and without an explicit notice of that, a user could easily mistake what they have selected. 'gelbooru' or 'newgrounds' are bad names, 'type here' is a bad initialising text.
"},{"location":"downloader_gugs.html#nested_gugs","title":"Nested GUGs","text":"Nested Gallery URL Generators are GUGs that hold other GUGs. Some searches actually use more than one stream (such as a Hentai Foundry artist lookup, where you might want to get both their regular works and their scraps, which are two separate galleries under the site), so NGUGs allow you to generate multiple initialising URLs per input. You can experiment with this ui if you like--it isn't too complicated--but you might want to hold off doing anything for real until you are comfortable with everything and know how producing multiple initialising URLs is going to work in the actual downloader.
"},{"location":"downloader_intro.html","title":"Making a Downloader","text":"Caution
Creating custom downloaders is only for advanced users who understand HTML or JSON. Beware! If you are simply looking for how to add new downloaders, please head over here.
"},{"location":"downloader_intro.html#intro","title":"this system","text":"The first versions of hydrus's downloaders were all hardcoded and static--I wrote everything into the program itself and nothing was user-creatable or -fixable. After the maintenance burden of the entire messy system proved too large for me to keep up with and a semi-editable booru system proved successful, I decided to overhaul the entire thing to allow user creation and sharing of every component. It is designed to be very simple to the front-end user--they will typically handle a couple of png files and then select a new downloader from a list--but very flexible (and hence potentially complicated) on the back-end. These help pages describe the different compontents with the intention of making an HTML- or JSON- fluent user able to create and share a full new downloader on their own.
As always, this is all under active development. Your feedback on the system would be appreciated, and if something is confusing or you discover something in here that is out of date, please let me know.
"},{"location":"downloader_intro.html#downloader","title":"what is a downloader?","text":"In hydrus, a downloader is one of:
Gallery Downloader: This takes a string like 'blue_eyes' to produce a series of thumbnail gallery page URLs that can be parsed for image page URLs which can ultimately be parsed for file URLs and metadata like tags. Boorus fall into this category.
URL Downloader: This does just the Gallery Downloader's back-end--instead of taking a string query, it takes the gallery or post URLs directly from the user, whether that is one from a drag-and-drop event or hundreds pasted from clipboard. For our purposes here, the URL Downloader is a subset of the Gallery Downloader.
Watcher: This takes a URL that it will check in timed intervals, parsing it for new URLs that it then queues up to be downloaded. It typically stops checking after the 'file velocity' (such as '1 new file per day') drops below a certain level. It is mostly for watching imageboard threads.
Simple Downloader: This takes a URL one-time and parses it for direct file URLs. This is a miscellaneous system for certain simple gallery types and some testing/'I just need the third tag's src on this one page' jobs.
The system currently supports HTML and JSON parsing. XML should be fine under the HTML parser--it isn't strict about checking types and all that.
"},{"location":"downloader_intro.html#pipeline","title":"what does a downloader do?","text":"The Gallery Downloader is the most complicated downloader and uses all the possible components. In order for hydrus to convert our example 'blue_eyes' query into a bunch of files with tags, it needs to:
So we have three components:
URL downloaders and watchers do not need the Gallery URL Generator, as their input is an URL. And simple downloaders also have an explicit 'just download it and parse it with this simple rule' action, so they do not use URL Classes (or even full-fledged Page Parsers) either.
"},{"location":"downloader_login.html","title":"Login Manager","text":"The system works, but this help was never done! Check the defaults for examples of how it works, sorry!
"},{"location":"downloader_parsers.html","title":"Parsers","text":"In hydrus, a parser is an object that takes a single block of HTML or JSON data and returns many kinds of hydrus-level metadata.
Parsers are flexible and potentially quite complicated. You might like to open network->downloader components->manage parsers and explore the UI as you read these pages. Check out how the default parsers already in the client work, and if you want to write a new one, see if there is something already in there that is similar--it is usually easier to duplicate an existing parser and then alter it than to create a new one from scratch every time.
There are three main components in the parsing system (click to open each component's help page):
Once you are comfortable with these objects, you might like to check out these walkthroughs, which create full parsers from nothing:
Once you are comfortable with parsers, and if you are feeling brave, check out how the default imageboard and pixiv parsers work. These are complicated and use more experimental areas of the code to get their job done. If you are trying to get a new imageboard parser going and can't figure out subsidiary page parsers, send me a mail or something and I'll try to help you out!
When you are making a parser, consider this checklist (you might want to copy/have your own version of this somewhere):
Taken a break? Now let's put it all together ---->
"},{"location":"downloader_parsers_content_parsers.html","title":"Content Parsers","text":"So, we can now generate some strings from a document. Content Parsers will let us apply a single metadata type to those strings to inform hydrus what they are.
A content parser has a name, a content type, and a formula. This example fetches the character tags from a danbooru post.
The name is just decorative, but it is generally a good idea so you can find things again when you next revisit them.
The current content types are:
"},{"location":"downloader_parsers_content_parsers.html#intro","title":"urls","text":"This should be applied to relative ('/image/smile.jpg') and absolute ('https://mysite.com/content/image/smile.jpg') URLs. If the URL is relative, the client will generate an absolute URL based on the original URL used to fetch the data being parsed (i.e. it should all just work).
You can set several types of URL:
The 'file url quality precedence' allows the client to select the best of several possible URLs. Given multiple content parsers producing URLs at the same 'level' of parsing, it will select the one with the highest value. Consider these two posts:
The Garnet image fits into a regular page and so Danbooru embed the whole original file in the main media canvas. One easy way to find the full File URL in this case would be to select the \"src\" attribute of the \"img\" tag with id=\"image\".
The Cirno one, however, is much larger and has been scaled down. The src of the main canvas tag points to a resized 'sample' link. The full link can be found at the 'view original' link up top, which is an \"a\" tag with id=\"image-resize-link\".
The Garnet post does not have the 'view original' link, so to cover both situations we might want two content parsers--one fetching the 'canvas' \"src\" and the other finding the 'view original' \"href\". If we set the 'canvas' one with a quality of 40 and the 'view original' 60, then the parsing system would know to select the 60 when it was available but to fall back to the 40 if not.
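The fall-back behaviour can be pictured with a small Python sketch. It assumes each content parser hands back a (quality, url) pair, with None for the url when it found nothing; that representation and the function name are illustrative only, not how the client stores things internally.

```python
def select_best_file_url(candidates):
    # candidates: (quality precedence, url) pairs from content parsers at the
    # same level of parsing; url is None when that parser found nothing.
    found = [(quality, url) for quality, url in candidates if url is not None]
    if not found:
        return None
    # The highest quality precedence wins.
    return max(found, key=lambda pair: pair[0])[1]


# Hypothetical URLs: the 'view original' link is missing, so the canvas src is used.
print(select_best_file_url([(40, 'https://example.com/canvas.jpg'), (60, None)]))
```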
As it happens, Danbooru (afaik, always) gives a link to the original file under the 'Size:' metadata to the left. This is the same 'best link' for both posts above, but it isn't so easy to identify. It is a quiet \"a\" tag without an \"id\" and it isn't always in the same location, but if you could pin it down reliably, it might be nice to circumvent the whole issue.
Sites can change suddenly, so it is nice to have a bit of redundancy here if it is easy.
"},{"location":"downloader_parsers_content_parsers.html#tags","title":"tags","text":"These are simple--they tell the client that the given strings are tags. You set the namespace here as well. I recommend you parse 'splashbrush' and set the namespace 'creator' here rather than trying to mess around with 'append prefix \"creator:\"' string conversions at the formula level--it is simpler up here and it lets hydrus handle any edge case logic for you.
Leave the namespace field blank for unnamespaced tags.
"},{"location":"downloader_parsers_content_parsers.html#file_hash","title":"file hash","text":"This says 'this is the hash for the file otherwise referenced in this parser'. So, if you have another content parser finding a File or Post URL, this lets the client know early that that destination happens to have a particular MD5, for instance. The client will look for that hash in its own database, and if it finds a match, it can predetermine if it already has the file (or has previously deleted it) without ever having to download it. When this happens, it will still add tags and associate the file with the URL for it's 'known urls' just as if it had downloaded it!
If you understand this concept, it is great to include. It saves time and bandwidth for everyone. Many site APIs include a hash for this exact reason--they want you to be able to skip a needless download just as much as you do.
The usual suite of hash types are supported: MD5, SHA1, SHA256, and SHA512. An old version of this required some weird string decoding, but this is no longer true. Select 'hex' or 'base64' from the encoding type dropdown, and then just parse the 'e5af57a687f089894f5ecede50049458' or '5a9XpofwiYlPXs7eUASUWA==' text, and hydrus should handle the rest. It will present the parsed hash in hex.
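To see that the two example strings really are the same 16-byte MD5 in different encodings, here is a short Python sketch. The function name is hypothetical; it only mirrors the 'hex'/'base64' dropdown choice and is not the client's own code.

```python
import base64


def parsed_hash_to_hex(text, encoding='hex'):
    # Decode the parsed text to raw bytes according to the chosen encoding.
    if encoding == 'hex':
        raw = bytes.fromhex(text)
    elif encoding == 'base64':
        raw = base64.b64decode(text)
    else:
        raise ValueError('unsupported encoding: ' + encoding)
    # Present the hash in hex, whichever encoding it arrived in.
    return raw.hex()


print(parsed_hash_to_hex('5a9XpofwiYlPXs7eUASUWA==', 'base64'))
```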
"},{"location":"downloader_parsers_content_parsers.html#timestamp","title":"timestamp","text":"This lets you say that a given number refers to a particular time for a file. At the moment, I only support 'source time', which represents a 'post' time for the file and is useful for thread and subscription check time calculations. It takes a Unix time integer, like 1520203484, which many APIs will provide.
If you are feeling very clever, you can decode a 'MM/DD/YYYY hh:mm:ss' style string to a Unix time integer using string converters, which use some hacky and semi-reliable python %d-style values as per here. Look at the existing defaults for examples of this, and don't worry about being more accurate than 12/24 hours--trying to figure out timezone is a hell not worth attempting, and doesn't really matter in the long-run for subscriptions and thread watchers that might care.
"},{"location":"downloader_parsers_content_parsers.html#page_title","title":"watcher page title","text":"This lets the watcher know a good name/subject for its entries. The subject of a thread is obviously ideal here, but failing that you can try to fetch the first part of the first post's comment. It has precendence, like for URLs, so you can tell the parser which to prefer if you have multiple options. Just for neatness and ease of testing, you probably want to use a string converter here to cut it down to the first 64 characters or so.
"},{"location":"downloader_parsers_content_parsers.html#veto","title":"veto","text":"This is a special content type--it tells the next highest stage of parsing that this 'post' of parsing is invalid and to cancel and not return any data. For instance, if a thread post's file was deleted, the site might provide a default '404' stock File URL using the same markup structure as it would for normal images. You don't want to give the user the same 404 image ten times over (with fifteen kinds of tag and source time metadata attached), so you can add a little rule here that says \"If the image link is 'https://somesite.com/404.png', raise a veto: File 404\" or \"If the page has 'No results found' in its main content div, raise a veto: No results found\" or \"If the expected download tag does not have 'download link' as its text, raise a veto: No Download Link found--possibly Ugoira?\" and so on.
They will associate their name with the veto being raised, so it is useful to give these a decent descriptive name so you can see what might be going right or wrong during testing. If it is an appropriate and serious enough veto, it may also rise up to the user level and will be useful if they need to report an error to you (like \"After five pages of parsing, it gives 'veto: no next page link'\").
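In spirit, a veto behaves like a named exception that cancels the current post. The Python sketch below is only an analogy under that assumption: the ParseVeto class and both functions are hypothetical and do not reflect how the client implements vetoes.

```python
class ParseVeto(Exception):
    # Hypothetical stand-in for a veto; the message is the content parser's name.
    pass


def check_file_url(url):
    # Rule: if the image link is the stock 404 image, raise a veto named 'File 404'.
    if url == 'https://somesite.com/404.png':
        raise ParseVeto('File 404')
    return url


def parse_post(url):
    # The next highest stage cancels the post and returns no data on a veto.
    try:
        return {'file_url': check_file_url(url)}
    except ParseVeto as veto:
        print('veto: ' + str(veto))
        return None
```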
"},{"location":"downloader_parsers_formulae.html","title":"Parser Formulae","text":"Formulae are tools used by higher-level components of the parsing system. They take some data (typically some HTML or JSON) and return 0 to n strings. For our purposes, these strings will usually be tags, URLs, and timestamps. You will usually see them summarised with this panel:
The different types are currently html, json, compound, and context variable.
"},{"location":"downloader_parsers_formulae.html#html_formula","title":"html","text":"This takes a full HTML document or a sample of HTML--and any regular sort of XML should also work. It starts at the root node and searches for lower nodes using one or more ordered rules based on tag name and attributes, and then returns string data from those final nodes.
For instance, if you have this:
<html>\n <body>\n <div class=\"media_taglist\">\n <span class=\"generaltag\"><a href=\"(search page)\">blonde hair</a> (3456)</span>\n <span class=\"generaltag\"><a href=\"(search page)\">blue eyes</a> (4567)</span>\n <span class=\"generaltag\"><a href=\"(search page)\">bodysuit</a> (5678)</span>\n <span class=\"charactertag\"><a href=\"(search page)\">samus aran</a> (2345)</span>\n <span class=\"artisttag\"><a href=\"(search page)\">splashbrush</a> (123)</span>\n </div>\n <div class=\"content\">\n <span class=\"media\">(a whole bunch of content that doesn't have tags in)</span>\n </div>\n </body>\n</html>\n
(Most boorus have a taglist like this on their file pages.)
To find the artist, \"splashbrush\", here, you could:
search beneath the root tag (<html>) for the <div> tag with attribute class=\"media_taglist\", then search beneath that <div> for <span> tags with attribute class=\"artisttag\", then search beneath those <span> tags for <a> tags, and finally get the string content of those <a> tags.
Changing the artisttag to charactertag or generaltag would give you samus aran or blonde hair, blue eyes, bodysuit respectively.
You might be tempted to just go straight for any <span> with class=\"artisttag\", but many sites use the same class to render a sidebar of favourite/popular tags or some other sponsored content, so it is generally best to try to narrow down to a larger <div> container so you don't get anything you don't mean.
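The same narrowing search can be tried outside the client with a few lines of Python. This is only a sketch: it leans on the standard library's XML parser as a stand-in, which works here because the sample is well-formed (real pages often are not), it uses single-quoted attributes purely for convenience, and the function name is made up.

```python
import xml.etree.ElementTree as ET

SAMPLE = '''
<html>
 <body>
  <div class='media_taglist'>
   <span class='generaltag'><a href='(search page)'>blonde hair</a> (3456)</span>
   <span class='generaltag'><a href='(search page)'>blue eyes</a> (4567)</span>
   <span class='generaltag'><a href='(search page)'>bodysuit</a> (5678)</span>
   <span class='charactertag'><a href='(search page)'>samus aran</a> (2345)</span>
   <span class='artisttag'><a href='(search page)'>splashbrush</a> (123)</span>
  </div>
  <div class='content'>
   <span class='media'>(no tags in here)</span>
  </div>
 </body>
</html>
'''


def find_tag_text(html_text, container_class, span_class):
    root = ET.fromstring(html_text.strip())
    results = []
    # Rule 1: find the <div> container with the wanted class.
    for div in root.iter('div'):
        if div.get('class') != container_class:
            continue
        # Rule 2: beneath it, find <span> tags with the wanted class.
        for span in div.iter('span'):
            if span.get('class') != span_class:
                continue
            # Rule 3: beneath those, take the string content of each <a> tag.
            for a in span.iter('a'):
                results.append(a.text)
    return results


print(find_tag_text(SAMPLE, 'media_taglist', 'artisttag'))
```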
Clicking 'edit formula' on an HTML formula gives you this:
You edit on the left and test on the right.
"},{"location":"downloader_parsers_formulae.html#finding_the_right_html_tags","title":"finding the right html tags","text":"When you add or edit one of the specific tag search rules, you get this:
You can set multiple key/value attribute search conditions, but you'll typically be searching for 'class' or 'id' here, if anything.
Note that you can set it to fetch only the xth instance of a found tag, which can be useful in situations like this:
<span class=\"generaltag\">\n <a href=\"(add tag)\">+</a>\n <a href=\"(remove tag)\">-</a>\n <a href=\"(search page)\">blonde hair</a> (3456)\n</span>\n
Without any more attributes, there isn't a great way to distinguish the <a> with \"blonde hair\" from the other two--so just set 'get the 3rd <a> tag' and you are good.
Most of the time, you'll be searching descendants (i.e. walking down the tree), but sometimes you might have this:
<span>\n <a href=\"(link to post url)\">\n <img class=\"thumb\" src=\"(thumbnail image)\" />\n </a>\n</span>\n
There isn't a great way to find the <span> or the <a> when looking from above here, as they lack a class or id, but you can find the <img> ok. So find those first, and then add a rule that, instead of searching descendants, is 'walking back up ancestors', like this:
You can solve some tricky problems this way!
You can also set a String Match, which is the same panel as you saw with URL Classes. It tests its best guess at the tag's 'string' value, so you can find a tag with 'Original Image' as its text, or one that, by regex, starts with 'Posted on: '. Have a play with it and you'll figure it out.
"},{"location":"downloader_parsers_formulae.html#content_to_fetch","title":"content to fetch","text":"Once you have narrowed down the right nodes you want, you can decide what text to fetch. Given a node of:
<a href=\"(URL A)\" class=\"thumb_title\">Forest Glade</a>\n
Returning the href attribute would return the string \"(URL A)\", returning the string content would give \"Forest Glade\", and returning the full html would give <a href=\"(URL A)\" class=\"thumb_title\">Forest Glade</a>. This last choice is useful in complicated situations where you want a second, separated layer of parsing, which we will get to later.
You can set a final String Match to filter the parsed results (e.g. \"only allow strings that only contain numbers\" or \"only allow full URLs as based on (complicated regex)\") and String Converter to edit it (e.g. \"remove the first three characters of whatever you find\" or \"decode from base64\").
You won't use these much, but they can sometimes get you out of a complicated situation.
"},{"location":"downloader_parsers_formulae.html#testing","title":"testing","text":"The testing panel on the right is important and worth using. Copy the html from the source you want to parse and then hit the paste buttons to set that as the data to test with.
"},{"location":"downloader_parsers_formulae.html#json_formula","title":"json","text":"This takes some JSON and does a similar style of search:
It is a bit simpler than HTML--if the current node is a list (called an 'Array' in JSON), you can fetch every item or the xth item, and if it is a dictionary (called an 'Object' in JSON), you can fetch a particular entry by name. Since you can't jump down several layers with attribute lookups or tag names like with HTML, you have to go down every layer one at a time. In any case, if you have something like this:
Note
It is a great idea to check the html or json you are trying to parse with your browser. Some web browsers have excellent developer tools that let you walk through the nodes of the document you are trying to parse in a prettier way than I would ever have time to put together. This image is one of the views Firefox provides if you simply enter a JSON URL.
Searching for \"posts\"->1st list item->\"sub\" on this data will give you \"Nobody like kino here.\".
Searching for \"posts\"->all list items->\"tim\" will give you the three SHA256 file hashes (since the third post has no file attached and so no 'tim' entry, the parser skips over it without complaint).
Searching for \"posts\"->1st list item->\"com\" will give you the OP's comment, ~AS RAW UNPARSED HTML~.
The default is to fetch the final nodes' 'data content', which means coercing simple variables into strings. If the current node is a list or dict, no string is returned.
But if you like, you can return the json beneath the current node (which, like HTML, includes the current node). This again will come in useful later.
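Here is a small Python sketch of that one-layer-at-a-time walk over already-decoded JSON. The rule tuples, the function name, and the sample data are all invented for illustration (the sample only imitates the shape of a thread API response); it is not the client's implementation.

```python
def walk_json(root, rules):
    # Each rule moves every current node down exactly one layer.
    nodes = [root]
    for rule in rules:
        kind = rule[0]
        next_nodes = []
        for node in nodes:
            if kind == 'key' and isinstance(node, dict):
                # Fetch a particular dictionary entry by name, skipping if absent.
                if rule[1] in node:
                    next_nodes.append(node[rule[1]])
            elif kind == 'index' and isinstance(node, list):
                # Fetch the xth list item (0-based here).
                if 0 <= rule[1] < len(node):
                    next_nodes.append(node[rule[1]])
            elif kind == 'all' and isinstance(node, list):
                # Fetch every list item.
                next_nodes.extend(node)
        nodes = next_nodes
    # 'data content': simple values become strings, lists and dicts return nothing.
    return [str(node) for node in nodes if not isinstance(node, (list, dict))]


# Hypothetical data: four posts, the third of which has no file and so no 'tim'.
EXAMPLE = {
    'posts': [
        {'sub': 'Example subject', 'com': '<p>first post</p>', 'tim': 'aaa111'},
        {'com': 'second post', 'tim': 'bbb222'},
        {'com': 'third post, no file'},
        {'com': 'fourth post', 'tim': 'ccc333'},
    ]
}

print(walk_json(EXAMPLE, [('key', 'posts'), ('all',), ('key', 'tim')]))
```

Note how the post without a 'tim' entry simply drops out of the result rather than raising an error.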
"},{"location":"downloader_parsers_formulae.html#compound_formula","title":"compound","text":"If you want to create a string from multiple parsed strings--for instance by appending the 'tim' and the 'ext' in our json example together--you can use a Compound formula. This fetches multiple lists of strings and tries to place them into a single string using \\1
regex substitution syntax:
This is a complicated example taken from one of my thread parsers. I have to take a modified version of the original thread URL (the first rule, so \\1) and then append the filename (\\2) and its extension (\\3) on the end to get the final file URL of a post. You can mix in more characters in the substitution phrase, like \\1.jpg or even have multiple instances (https://\\2.muhsite.com/\\2/\\1), if that is appropriate.
This is where the magic happens, sometimes, so keep it in mind if you need to do something cleverer than the data you have seems to provide.
"},{"location":"downloader_parsers_formulae.html#context_variable_formula","title":"context variable","text":"This is a basic hacky answer to a particular problem. It is a simple key:value dictionary that at the moment only stores one variable, 'url', which contains the original URL used to fetch the data being parsed.
If a different URL Class links to this parser via an API URL, this 'url' variable will always be the API URL (i.e. it literally is the URL used to fetch the data), not any thread/whatever URL the user entered.
Hit the 'edit example parsing context' to change the URL used for testing.
I have used this several times to stitch together file URLs when I am pulling data from APIs, like in the compound formula example above. In this case, the starting URL is https://a.4cdn.org/tg/thread/57806016.json, from which I extract the board name, \"tg\", using the string converter, and then add in 4chan's CDN domain to make the appropriate base file URL (https://i.4cdn.org/tg/) for the given thread. I only have to jump through this hoop in 4chan's case because they explicitly store file URLs by board name. 8chan on the other hand, for instance, has a static https://media.8ch.net/file_store/ for all files, so it is a little easier (I think I just do a single 'prepend' string transformation somewhere).
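The 4chan stitching can be sketched in Python as follows. The function names and the example filename are assumptions made for illustration; in the client this is done with a context variable formula, a string converter, and a compound formula rather than with code like this.

```python
from urllib.parse import urlsplit


def board_file_base_url(api_url):
    # The path looks like /tg/thread/57806016.json; the first segment is the board.
    segments = [segment for segment in urlsplit(api_url).path.split('/') if segment]
    board = segments[0]
    # Add in the CDN domain to make the base file URL for this thread's board.
    return 'https://i.4cdn.org/' + board + '/'


def post_file_url(api_url, filename, extension):
    # Base URL, then the filename, then its extension.
    return board_file_base_url(api_url) + filename + extension


# The filename here is made up.
print(post_file_url('https://a.4cdn.org/tg/thread/57806016.json', '1234567890123', '.jpg'))
```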
If you want to make some parsers, you will have to get familiar with how different sites store and present their data!
"},{"location":"downloader_parsers_full_example_api.html","title":"api example","text":"Some sites offer API calls for their pages. Depending on complexity and quality of content, using these APIs may or may not be a good idea. Artstation has a good one--let's first review our URL Classes:
We convert the original Post URL, https://www.artstation.com/artwork/mQLe1 to https://www.artstation.com/projects/mQLe1.json. Note that Artstation Post URLs can produce multiple files, and that the API url should not be associated with those final files.
So, when the client encounters an 'artstation file page' URL, it will generate the equivalent 'artstation file page json api' URL and use that for downloading and parsing. If you would like to review your API links, check out network->downloader components->manage url class links->api links. Using Example URLs, it will figure out which URL Classes link to others and ensure you are mapping parsers only to the final link in the chain--there should be several already in there by default.
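For a feel of what the Post URL to API URL conversion amounts to, here is a minimal Python sketch. It hard-codes the one pattern shown above and the function name is hypothetical; in the client the conversion is configured on the URL Class, not written as code.

```python
from urllib.parse import urlsplit


def artstation_api_url(post_url):
    parts = urlsplit(post_url)
    segments = [segment for segment in parts.path.split('/') if segment]
    # Expect exactly /artwork/<id>; anything else is not a file page URL.
    if len(segments) != 2 or segments[0] != 'artwork':
        raise ValueError('not an artstation file page URL: ' + post_url)
    # Swap 'artwork' for 'projects' and append '.json' to the id.
    return parts.scheme + '://' + parts.netloc + '/projects/' + segments[1] + '.json'


print(artstation_api_url('https://www.artstation.com/artwork/mQLe1'))
```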
Now let's look at the JSON. Loading clean JSON in a browser should present you with a nicer view:
I have highlighted the data we want, which is:
JSON is a dream to parse, and I will assume you are comfortable with Content Parsers from the previous examples, so I'll simply paste the different formulae one after another:
Each image is stored under a separate numbered 'assets' list item. This one has just two, but some Artstation pages have dozens of images. The only unusual part here is I also put a String Match of ^(?!.*assets\\/covers).*$, which filters out 'cover' images (such as on here), which make for nice portfolio thumbs on the site but are not interesting to us.
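The negative lookahead is easy to check in Python. In this sketch the slash is left unescaped, which Python's re module accepts, and the URLs are invented examples rather than real Artstation links.

```python
import re

# Match only strings that do not contain 'assets/covers' anywhere.
NOT_COVER = re.compile('^(?!.*assets/covers).*$')


def drop_cover_images(urls):
    return [url for url in urls if NOT_COVER.match(url)]


# Hypothetical asset URLs.
urls = [
    'https://cdn.example.com/p/assets/images/001/large/a.jpg',
    'https://cdn.example.com/p/assets/covers/001/medium/a.jpg',
]
print(drop_cover_images(urls))
```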
This fetches the 'creator' tag. Artstation's API is great because it includes profile data in content requests. There's the creator's presentation name, username, profile link, avatar URLs, all that inside a regular request about this particular work. When that information is missing (like in yiff.party), it may make the API useless to you.
These are all simple. You can take or leave the title and medium tags--some people like them, some don't. This example has no unnamespaced tags, but this one does. Creator-entered tags are sometimes not worth parsing (on tumblr, for instance, you often get run-on tags like #imbored #whatisevengoingon that are irrelevant to the work), but Artstation users are all professionals trying to get their work noticed, so the tags are usually pretty good.
This again uses python's datetime to decode the date, which Artstation presents with millisecond accuracy, ha ha. I use a (.+:..)\\..*->\\1 regex (i.e. \"get everything before the period\") to strip off the timezone and milliseconds and then decode as normal.
APIs that are stable and free to access (e.g. do not require OAuth or other complicated login headers) can make parsing fantastic. They save bandwidth and CPU time, and they are typically easier to work with than HTML. Unfortunately, the boorus that do provide APIs often list their tags without namespace information, so I recommend you double-check you can get what you want before you get too deep into it. Some APIs also offer incomplete data, such as relative URLs (relative to the original URL!), which can be a pain to figure out in our system.
"},{"location":"downloader_parsers_full_example_file_page.html","title":"file page example","text":"Let's look at this page: https://gelbooru.com/index.php?page=post&s=view&id=3837615.
What sorts of data are we interested in here?
A tempting strategy for pulling the file URL is to just fetch the src of the embedded <img> tag, but that rule does not cover <video> and <embed> tags, which need their own rules, and on large images it may only find a sample-size link.
If you have an account with the site you are parsing and have clicked the appropriate 'Always view original' setting, you may not see these sorts of sample-size banners! I recommend you log out of/go incognito for sites you are inspecting for hydrus parsing (unless a log-in is required to see content, so the hydrus user will have to set up hydrus-side login to actually use the parser), or you can easily miss NSFW-gates and other logged-out hurdles.
When trying to pin down the right link, if there are no good alternatives, you often have to write several File URL rules with different precedence, saying 'get the \"Click Here to See Full Size\" link at 75' and 'get the embed's \"src\" at 25' and so on to make sure you cover different situations, but as it happens Gelbooru always posts the actual File URL at:
<meta property=\"og:image\" content=\"https://gelbooru.com/images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg\" /> under the <head>
<a href=\"https://simg3.gelbooru.com//images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg\" target=\"_blank\" style=\"font-weight: bold;\">Original image</a>, which can be found by putting a String Match in the html formula.
<meta> with property=\"og:image\" is easy to search for (and they use the same tag for video links as well!). For the Original Image, you can use a String Match like so:
Gelbooru uses \"Original Image\" even when they link to webm, which is helpful, but like \"og:image\", it could be changed to 'video' in future.
I think I wrote my gelbooru parser before I added String Matches to individual HTML formulae tag rules, so I went with this, which is a bit more cheeky:
But it works. Sometimes, just regexing for links that fit the site's CDN is a good bet for finding difficult stuff.
"},{"location":"downloader_parsers_full_example_file_page.html#tags","title":"tags","text":"Most boorus have a taglist on the left that has a nice id or class you can pull, and then each namespace gets its own class for CSS-colouring:
Make sure you browse around the booru for a bit, so you can find all the different classes they use. character/artist/copyright are common, but some sneak in the odd meta/species/rating.
Skipping ?/-/+ characters can be a pain if you are lacking a nice tag-text class, in which case you can add a regex String Match to the HTML formula (as I do here, since Gelb offers '?' links for tag definitions) like [^\\?\\-+\\s], which means \"the text includes something other than just '?' or '-' or '+' or whitespace\".
"},{"location":"downloader_parsers_full_example_file_page.html#md5_hash","title":"md5 hash","text":"If you look at the Gelbooru File URL, https://gelbooru.com/images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg, you may notice the filename is all hexadecimal. It looks like they store their files under a two-deep folder structure, using the first four characters--386e here--as the key. It sure looks like '386e12e33726425dbd637e134c4c09b5' is not random ephemeral garbage!
In fact, Gelbooru use the MD5 of the file as the filename. Many storage systems do something like this (hydrus uses SHA256!), so if they don't offer a <meta> tag that explicitly states the md5 or sha1 or whatever, you can sometimes infer it from one of the file links. This screenshot is from the more recent version of hydrus, which has the more powerful 'string processing' system for string transformations. It has an intimidating number of nested dialogs, but we can stay simple for now, with only the one regex substitution step inside a string 'converter':
Here we are using the same property=\"og:image\" rule to fetch the File URL, and then we are regexing the hex hash with .*([0-9a-f]{32}).*
(MD5s are 32 hex characters). We select 'hex' as the encoding type. Hashes require a tiny bit more data handling behind the scenes, but in the Content Parser test page it presents the hash again neatly in English: \"md5 hash: 386e12e33726425dbd637e134c4c09b5\", meaning everything parsed correctly. It presents the hash in hex even if you select the encoding type as base64.
If you think you have found a hash string, you should obviously test your theory! The site might not be using the actual MD5 of file bytes, as hydrus does, but instead some proprietary scheme. Download the file and run it through a program like HxD (or hydrus!) to figure out its hashes, and then search the View Source for those hex strings--you might be surprised!
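If you want to test the theory outside the client, here is a minimal Python sketch. It reuses the regex from the converter step above; the function names are my own for illustration and are not anything in hydrus.

```python
import hashlib
import re

# Same idea as the String Converter step: capture a run of 32 hex characters.
MD5_IN_URL = re.compile('.*([0-9a-f]{32}).*')


def md5_from_url(url):
    '''Return the 32-character hex string found in the URL, or None.'''
    match = MD5_IN_URL.match(url)
    return match.group(1) if match else None


def url_names_real_md5(url, file_bytes):
    '''True if the hex string in the URL is the MD5 of the downloaded bytes.'''
    return md5_from_url(url) == hashlib.md5(file_bytes).hexdigest()


file_url = 'https://gelbooru.com/images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg'
print(md5_from_url(file_url))
```

Feed url_names_real_md5 the bytes of the file you actually downloaded. If it comes back False, the site is probably using some other scheme and the string should not be parsed as an md5.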
Finding the hash is hugely beneficial for a parser--it lets hydrus recognise files it already has and skip downloading them, even from URLs it has never seen before!
"},{"location":"downloader_parsers_full_example_file_page.html#source_time","title":"source time","text":"Post/source time lets subscriptions and watchers make more accurate guesses at current file velocity. It is neat to have if you can find it, but:
FUCK ALL TIMEZONES FOREVER
Gelbooru offers--
<li>Posted: 2017-08-18 19:59:44<br /> by <a href=\"index.php?page=account&s=profile&uname=jayage5ds\">jayage5ds</a></li>\n
--so let's see how we can turn that into a Unix timestamp:
I find the <li>
that starts \"Posted: \" and then decode the date according to the hackery-dackery-doo format from here. %c
and %z
are unreliable, and attempting timezone adjustments is overall a supervoid that will kill your time for no real benefit--subs and watchers work fine with 12-hour imprecision, so if you have a +0300 or EST in your string, just cut those characters off with another String Transformation. As long as you are getting about the right day, you are fine.
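As a rough Python equivalent of that rule, assuming the text has already been pulled out of the \<li\> and that we simply pretend the site's clock is UTC (both are my assumptions, and the function name is made up):

```python
from datetime import datetime, timezone


def posted_text_to_timestamp(text):
    '''Turn 'Posted: 2017-08-18 19:59:44 ...' into a Unix timestamp.'''
    date_part = text.split('Posted: ', 1)[1]
    # Keep only 'YYYY-MM-DD HH:MM:SS' (19 characters), dropping any trailing
    # '+0300' or 'EST' rather than trying to interpret it.
    date_part = date_part[:19]
    parsed = datetime.strptime(date_part, '%Y-%m-%d %H:%M:%S')
    # Assumption: treat the result as UTC. Being a few hours out is fine here.
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


print(posted_text_to_timestamp('Posted: 2017-08-18 19:59:44'))
```

Cutting the string to a fixed length is the same trick as the extra String Transformation: the timezone suffix is thrown away, not converted.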
Source URLs are nice to have if they are high quality. Some boorus only ever offer artist profiles, like https://twitter.com/artistname
, whereas we want singular Post URLs that point to other places that host this work. For Gelbooru, you could fetch the Source URL as we did source time, searching for \"Source: \", but they also offer it more easily in an edit form:
<input type=\"text\" name=\"source\" size=\"40\" id=\"source\" value=\"https://www.deviantart.com/art/Lara-Croft-Artifact-Dive-699335378\" />\n
This is a bit of a fragile location to parse from--Gelb could change or remove this form at any time, whereas the \"Posted: \" <li>
is probably firmer, but I expect I wrote it before I had String Matches in. It works for now, which in this game is often Good Enough\u2122.
Also--be careful pulling from text or tooltips rather than an href-like attribute, as whatever is presented to the user may be clipped for longer URLs. Make sure you try your rules on a couple of different pages to make sure you aren't pulling \"https://www.deviantart.com/art/Lara...\" by accident anywhere!
"},{"location":"downloader_parsers_full_example_file_page.html#summary","title":"summary","text":"Phew--all that for a bit of Lara Croft! Thankfully, most sites use similar schemes. Once you are familiar with the basic idea, the only real work is to duplicate an existing parser and edit for differences. Our final parser looks like this:
This is overall a decent parser. Some parts of it may fail when Gelbooru update to their next version, but that can be true of even very good parsers with multiple redundancy. For now, hydrus can use this to quickly and efficiently pull content from anything running Gelbooru 0.2.5, and the effort spent now can save millions of combined right-click->save as and manual tag copies in future. If you make something like this and share it about, you'll be doing a good service for those who could never figure it out.
"},{"location":"downloader_parsers_full_example_gallery_page.html","title":"gallery page example","text":"Caution
These guides should roughly follow what comes with the client by default! You might like to have the actual UI open in front of you so you can play around with the rules and try different test parses yourself.
Let's look at this page: https://e621.net/post/index/1/rating:safe pokemon
We've got 75 thumbnails and a bunch of page URLs at the bottom.
"},{"location":"downloader_parsers_full_example_gallery_page.html#main_page","title":"first, the main page","text":"This is easy. It gets a good name and some example URLs. e621 has some different ways of writing out their queries (and as they use some tags with '/', like 'male/female', this can cause character encoding issues depending on whether the tag is in the path or query!), but we'll put that off for now--we just want to parse some stuff.
"},{"location":"downloader_parsers_full_example_gallery_page.html#thumbnail_urls","title":"thumbnail links","text":"Most browsers have some good developer tools to let you Inspect Element and get a better view of the HTML DOM. Be warned that this information isn't always the same as View Source (which is what hydrus will get when it downloads the initial HTML document), as some sites load results dynamically with javascript and maybe an internal JSON API call (when sites move to systems that load more thumbs as you scroll down, it makes our job more difficult--in these cases, you'll need to chase down the embedded JSON or figure out what API calls their JS is making--the browser's developer tools can help you here again). Thankfully, e621 is (and most boorus are) fairly static and simple:
Every thumb on e621 is a <span>
with class=\"thumb\" wrapping an <a>
and an <img>
. This is a common pattern, and easy to parse:
There are no tricky String Matches or String Converters needed--we are just fetching hrefs. Note that the links get relative-matched to example.com for now--I'll probably fix this to apply to one of the example URLs, but rest assured that IRL the parser will 'join' its url up with the appropriate Gallery URL used to fetch the data. Sometimes, you might want to add a rule to search descendants for the first content box to make sure you are only grabbing thumbs from the main box, whether that is a <div> or a <span>, and whether it has id=\"content\" or class=\"mainBox\", but unless you know that booru likes to embed \"popular\" or \"favourite\" 'thumbs' up top that will be accidentally caught by <span>s with class=\"thumb\"
, I recommend you not make your rules overly specific--all it takes is for their dev to change the name of their content box, and your whole parser breaks. I've ditched the <span>
requirement in the rule here for exactly that reason--class=\"thumb\"
is necessary and sufficient.
Remember that the parsing system allows you to go up ancestors as well as down descendants. If your thumb-box has multiple links--like to see the artist's profile or 'set as favourite'--you can try searching for the <span>
s, then down to the <img>
, and then up to the nearest <a>
. In English, this is saying, \"Find me all the image link URLs in the thumb boxes.\"
Most boorus have 'next' or '>>' at the bottom, which can be simple enough, but many have a neat <link href=\"/post/index/2/rating:safe%20pokemon\" rel=\"next\" />
in the <head>
. The <head>
solution is easier, if available, but my default e621 parser happens to pursue the 'paginator':
As it happens, e621 also apply the rel=\"next\"
attribute to their \"Next >>\" links, which makes it all the easier for us to find. Sometimes there is no \"next\" id or class, and you'll want to add a String Match to your html formula to test for a string value of '>>' or whatever it is. A good trick is to View Source and then search for the critical /post/index/2/
phrase you are looking for--you might find what you want in a <link>
tag you didn't expect or even buried in a hidden 'share to tumblr' button. <form>
s for reporting or commenting on content are another good place to find content ids.
Note that this finds two URLs. e621 apply the rel=\"next\"
to both the \"2\" link and the \"Next >>\" one. The download engine merges the parser's dupes, so don't worry if you end up parsing both the 'top' and 'bottom' next page links, or if you use multiple rules to parse the same data in different ways.
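To picture what the 'join' and the dupe-merge amount to, here is a small Python sketch using the standard library. It is not the download engine's actual code, just the same idea, and I have written the space in the example URL as %20.

```python
from urllib.parse import urljoin

# The Gallery URL that was fetched, and the two identical relative hrefs
# the rel='next' rule found on it.
gallery_url = 'https://e621.net/post/index/1/rating:safe%20pokemon'
parsed_hrefs = [
    '/post/index/2/rating:safe%20pokemon',
    '/post/index/2/rating:safe%20pokemon',
]

# Join each relative link against the URL the page came from.
absolute_urls = [urljoin(gallery_url, href) for href in parsed_hrefs]

# Merge the dupes while keeping the original order.
next_page_urls = list(dict.fromkeys(absolute_urls))

print(next_page_urls)
```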
With those two rules, we are done. Gallery parsers are nice and simple.
"},{"location":"downloader_parsers_page_parsers.html","title":"Page Parsers","text":"We can now produce individual rows of rich metadata. To arrange them all into a useful structure, we will use Page Parsers.
The Page Parser is the top level parsing object. It takes a single document and produces a list--or a list of lists--of metadata. Here's the main UI:
Notice that the edit panel has three sub-pages.
"},{"location":"downloader_parsers_page_parsers.html#main","title":"main","text":"This page is just a simple list:
Each content parser here will be applied to the document and returned in this page parser's results list. Like most boorus, e621's File Pages only ever present one file, and they have simple markup, so the solution here was simple. The full contents of that test window are:
*** 1 RESULTS BEGIN ***\n\ntag: character:krystal\ntag: creator:s mino930\nfile url: https://static1.e621.net/data/fc/b6/fcb673ed89241a7b8d87a5dcb3a08af7.jpg\ntag: anthro\ntag: black nose\ntag: blue fur\ntag: blue hair\ntag: clothing\ntag: female\ntag: fur\ntag: green eyes\ntag: hair\ntag: hair ornament\ntag: jewelry\ntag: short hair\ntag: solo\ntag: video games\ntag: white fur\ntag: series:nintendo\ntag: series:star fox\ntag: species:canine\ntag: species:fox\ntag: species:mammal\n\n*** RESULTS END ***\n
When the client sees this in a downloader context, it will know where to download the file and which tags to associate with it based on what the user has chosen in their 'tag import options'.
"},{"location":"downloader_parsers_page_parsers.html#subsidiary_page_parsers","title":"subsidiary page parsers","text":"Here be dragons. This was an attempt to make parsing more helpful in certain API situations, but it ended up ugly. I do not recommend you use it, as I will likely scratch the whole thing and replace it with something better one day. It basically splits the page up into pieces that can then be parsed by nested page parsers as separate objects, but the UI and workflow is hell. Afaik, the imageboard API parsers use it, but little/nothing else. If you are really interested, check out how those work and maybe duplicate to figure out your own imageboard parser and/or send me your thoughts on how to separate File URL/timestamp combos better.
"},{"location":"downloader_sharing.html","title":"Sharing Downloaders","text":"If you are working with users who also understand the downloader system, you can swap your GUGs, URL Classes, and Parsers separately using the import/export buttons on the relevant dialogs, which work in pngs and clipboard text.
But if you want to share conveniently, and with users who are not familiar with the different downloader objects, you can package everything into a single easy-import png as per here.
The dialog to use is network->downloader components->export downloaders:
It isn't difficult. Essentially, you want to bundle enough objects to make one or more 'working' GUGs at the end. I recommend you start by just hitting 'add gug', which--using Example URLs--will attempt to figure out everything you need by itself.
This all works on Example URLs and some domain guesswork, so make sure your url classes are good and the parsers have correct Example URLs as well. If they don't, they won't all link up neatly for the end user. If part of your downloader is on a different domain to the GUGs and Gallery URLs, then you'll have to add them manually. Just start with 'add gug' and see if it looks like enough.
Once you have the necessary and sufficient objects added, you can export to png. You'll get a similar 'does this look right?' summary as what the end-user will see, just to check you have everything in order and the domains all correct. If that is good, then make sure to give the png a sensible filename and embellish the title and description if you need to. You can then send/post that png wherever, and any regular user will be able to use your work.
"},{"location":"downloader_url_classes.html","title":"URL Classes","text":"The fundamental connective tissue of the downloader system is the 'URL Class'. This object identifies and normalises URLs and links them to other components. Whenever the client handles a URL, it tries to match it to a URL Class to figure out what to do.
"},{"location":"downloader_url_classes.html#url_types","title":"the types of url","text":"For hydrus, an URL is useful if it is one of:
File URL This returns the full, raw media file with no HTML wrapper. They typically end in a filename like http://safebooru.org//images/2333/cab1516a7eecf13c462615120ecf781116265f17.jpg, but sometimes they have a more complicated fetch command ending like 'file.php?id=123456' or '/post/content/123456'.
These URLs are remembered for the file in the 'known urls' list, so if the client happens to encounter the same URL in future, it can determine whether it can skip the download because the file is already in the database or has previously been deleted.
It is not important that File URLs be matched by a URL Class. File URL is considered the 'default', so if the client finds no match, it will assume the URL is a file and try to download and import the result. You might want to particularly specify them if you want to present them in the media viewer or discover File URLs are being confused for Post URLs or something.
Post URL This typically returns some HTML that contains a File URL and metadata such as tags and post time. They sometimes present multiple sizes (like 'sample' vs 'full size') of the file or even different formats (like 'ugoira' vs 'webm'). The Post URL for the file above, http://safebooru.org/index.php?page=post&s=view&id=2429668 has this 'sample' presentation. Finding the best File URL in these cases can be tricky!
This URL is also saved to 'known urls' and will usually be similarly skipped if it has previously been downloaded. It will also appear in the media viewer as a clickable link.
Gallery URL This presents a list of Post URLs or File URLs. They often also present a 'next page' URL. It could be a page like http://safebooru.org/index.php?page=post&s=list&tags=yorha_no._2_type_b&pid=0 or an API URL like http://safebooru.org/index.php?page=dapi&s=post&tags=yorha_no._2_type_b&q=index&pid=0. Watchable URL This is the same as a Gallery URL but represents an ephemeral page that receives new files much faster than a gallery but will soon 'die' and be deleted. For our purposes, this typically means imageboard threads."},{"location":"downloader_url_classes.html#url_components","title":"the components of a url","text":"As far as we are concerned, a URL string has four parts:
Scheme: http or https
Location/Domain: safebooru.org or i.4cdn.org or cdn002.somebooru.net
Path Components: index.php or tesla/res/7518.json or pictures/user/daruak/page/2 or art/Commission-animation-Elsa-and-Anna-541820782
Parameters: page=post&s=list&tags=yorha_no._2_type_b&pid=40 or page=post&s=view&id=2429668
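If it helps to see the four parts pulled out of a real string, Python's standard urllib will do the split. This is only an illustration of the breakdown, not how the client stores URLs, and the function and key names are my own.

```python
from urllib.parse import parse_qsl, urlsplit


def url_parts(url):
    '''Split a URL into the four parts discussed above.'''
    split = urlsplit(url)
    return {
        'scheme': split.scheme,
        'location': split.netloc,
        # The path is ordered, so keep it as a list of components.
        'path_components': [c for c in split.path.split('/') if c],
        # The parameters are name=value pairs where order does not matter.
        'parameters': dict(parse_qsl(split.query)),
    }


parts = url_parts('http://safebooru.org/index.php?page=post&s=list&tags=yorha_no._2_type_b&pid=40')
print(parts)
```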
So, let's look at the 'edit url class' panel, which is found under network->downloader components->manage url classes:
A TBIB File Page like https://tbib.org/index.php?page=post&s=view&id=6391256 is a Post URL. Let's look at the metadata first:
Name and type Like with GUGs, we should set a good unambiguous name so the client can clearly summarise this url to the user. 'tbib file page' is good.
This is a Post URL, so we set the 'post url' type.
Association logic All boorus and most sites only present one file per page, but some sites present multiple files on one page, usually several pages in a series/comic, as with pixiv. Danbooru-style thumbnail links to 'this file has a post parent' do not count here--I mean that a single URL embeds multiple full-size images, either with shared or separate tags. It is very important to the hydrus client's downloader logic (making decisions about whether it has previously visited a URL, so whether to skip checking it again) that 'can produce multiple files' is checked if a site can present multiple files on a single page.
Related is the idea of whether a 'known url' should be associated. Typically, this should be checked for Post and File URLs, which are fixed, and unchecked for Gallery and Watchable URLs, which are ephemeral and give different results from day to day. There are some unusual exceptions, so give it a brief thought--but if you have no special reason, leave this as the default for the url type.
And now, for matching the string itself, let's revisit our four components:
Scheme TBIB supports http and https, so I have set the 'preferred' scheme to https. Any 'http' TBIB URL a user inputs will be automatically converted to https. Location/Domain For Post URLs, the domain is always \"tbib.org\".
The 'allow' and 'keep' subdomains checkboxes let you determine if a URL with \"artistname.artsite.com\" will match a URL Class with \"artsite.com\" domain and if that subdomain should be remembered going forward. Most sites do not host content on subdomains, so you can usually leave 'allow' unchecked. The 'keep' option (which is only available if 'allow' is checked) is more subtle, only useful for rare cases, and unless you have a special reason, you should leave it checked. (For keep: In cases where a site farms out File URLs to CDN servers on subdomains--like randomly serving a mirror of \"https://muhbooru.org/file/123456\" on \"https://srv2.muhbooru.org/file/123456\"--and removing the subdomain still gives a valid URL, you may not wish to keep the subdomain.) Since TBIB does not use subdomains, these options do not matter--we can leave both unchecked.
'www' and 'www2' and similar subdomains are automatically matched. Don't worry about them.
Path Components TBIB just uses a single \"index.php\" on the root directory, so the path is not complicated. Were it longer (like \"gallery/cgi/index.php\"), we would add more (\"gallery\" and \"cgi\"), and since the path of a URL has a strict order, we would need to arrange the items in the listbox there so they were sorted correctly. Parameters TBIB's index.php takes many parameters to render different page types. Note that the Post URL uses \"s=view\", while TBIB Gallery URLs use \"s=list\". In any case, for a Post URL, \"id\", \"page\", and \"s\" are necessary and sufficient."},{"location":"downloader_url_classes.html#string_matches","title":"string matches","text":"As you edit these components, you will be presented with the Edit String Match Panel:
This lets you set the type of string that will be valid for that component. If a given path or query component does not match the rules given here, the URL will not match the URL Class. Most of the time you will probably want to set 'fixed characters' of something like \"post\" or \"index.php\", but if the component you are editing is more complicated and could have a range of different valid values, you can specify just numbers or letters or even a regex pattern. If you try to do something complicated, experiment with the 'example string' entry to make sure you have it set how you think.
Don't go overboard with this stuff, though--most sites do not have super-fine distinctions between their different URL types, and hydrus users will not be dropping user account or logout pages or whatever on the client, so you can be fairly liberal with the rules.
"},{"location":"downloader_url_classes.html#match_details","title":"how do they match, exactly?","text":"This URL Class will be assigned to any URL that matches the location, path, and query. Missing path component or parameters in the URL will invalidate the match but additonal ones will not!
For instance, given:
Only URL A will match
And:
Both URL A and B will match
And:
Both URL A and B will match, URL C will not
If multiple URL Classes match a URL, the client will try to assign the most 'complicated' one, with the most path components and then parameters.
Given two example URLs and URL Classes:
URL A will match URL Class A but not URL Class B and so will receive A.
URL B will match both and receive URL Class B as it is more complicated.
This situation is not common, but when it does pop up, it can be a pain. It is usually a good idea to match exactly what you need--no more, no less.
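The matching rules can be sketched in Python roughly as below. This is a simplified model under my own assumptions, not the client's implementation: UrlClass, fixed, and best_match are hypothetical names, the 'tbib index' class is invented for the example, and subdomain handling is left out entirely.

```python
from urllib.parse import parse_qsl, urlsplit


def fixed(value):
    '''A rule that only accepts one exact string, like 'fixed characters'.'''
    return lambda text: text == value


class UrlClass:
    def __init__(self, name, domain, path_rules, param_rules):
        self.name = name
        self.domain = domain
        self.path_rules = path_rules    # ordered list of rules, one per component
        self.param_rules = param_rules  # dict of parameter name -> rule

    def matches(self, url):
        split = urlsplit(url)
        if split.netloc != self.domain:
            return False
        components = [c for c in split.path.split('/') if c]
        params = dict(parse_qsl(split.query))
        # Missing path components invalidate the match; surplus ones do not.
        if len(components) < len(self.path_rules):
            return False
        if not all(rule(c) for rule, c in zip(self.path_rules, components)):
            return False
        # Missing parameters invalidate the match; surplus ones do not.
        return all(
            name in params and rule(params[name])
            for name, rule in self.param_rules.items()
        )


def best_match(url, url_classes):
    '''Pick the most 'complicated' match: most path components, then parameters.'''
    candidates = [uc for uc in url_classes if uc.matches(url)]
    if not candidates:
        return None
    return max(candidates, key=lambda uc: (len(uc.path_rules), len(uc.param_rules)))


tbib_post = UrlClass(
    'tbib file page', 'tbib.org', [fixed('index.php')],
    {'page': fixed('post'), 's': fixed('view'), 'id': str.isdigit},
)
tbib_index = UrlClass('tbib index', 'tbib.org', [fixed('index.php')], {})

url = 'https://tbib.org/index.php?page=post&s=view&id=6391256'
print(best_match(url, [tbib_index, tbib_post]).name)
```

Both classes match the example URL, but the file page class has more parameters and so wins. That tie-break is the behaviour described above, and it is why an overly loose class can hide behind a tighter one until an odd URL turns up.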
"},{"location":"downloader_url_classes.html#url_normalisation","title":"normalising urls","text":"Different URLs can give the same content. The http and https versions of a URL are typically the same, and:
And:
Since we are in the business of storing and comparing URLs, we want to 'normalise' them to a single comparable beautiful value. You see a preview of this normalisation on the edit panel. Normalisation happens to all URLs that enter the program.
Note that in e621's case (and for many other sites!), that text after the id is purely decoration. It can change when the file's tags change, so if we want to compare today's URLs with those we saw a month ago, we'd rather just be without it.
On normalisation, all URLs will get the preferred http/https switch, and their parameters will be alphabetised. File and Post URLs will also cull out any surplus path or query components. This wouldn't affect our TBIB example above, but it will clip the e621 example down to that 'bare' id URL, and it will take any surplus 'lang=en' or 'browser=netscape_24.11' garbage off the query text as well. URLs that are not associated and saved and compared (i.e. normal Gallery and Watchable URLs) are not culled of unmatched path components or query parameters, which can sometimes be useful if you want to match (and keep intact) gallery URLs that might or might not include an important 'sort=desc' type of parameter.
Since File and Post URLs will do this culling, be careful that you not leave out anything important in your rules. Make sure what you have is both necessary (nothing can be removed and still keep it valid) and sufficient (no more needs to be added to make it valid). It is a good idea to try pasting the 'normalised' version of the example URL into your browser, just to check it still works.
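As a rough Python sketch of what normalisation does to the query of a File or Post URL, assuming we already know which parameters the URL Class keeps (the normalise function and its arguments are made up for illustration, and path culling is not shown):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalise(url, preferred_scheme, keep_params):
    '''Switch scheme, cull surplus parameters, and alphabetise the rest.'''
    split = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(split.query) if k in keep_params]
    query = urlencode(sorted(kept))
    return urlunsplit((preferred_scheme, split.netloc, split.path, query, ''))


messy = 'http://tbib.org/index.php?s=view&page=post&id=6391256&lang=en'
print(normalise(messy, 'https', {'id', 'page', 's'}))
```

Two differently written URLs for the same post come out as the same string, which is what makes the 'known urls' comparison work.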
"},{"location":"downloader_url_classes.html#default_values","title":"'default' values","text":"Some sites present the first page of a search like this:
https://danbooru.donmai.us/posts?tags=skirt
But the second page is:
https://danbooru.donmai.us/posts?tags=skirt&page=2
Another example is:
https://www.hentai-foundry.com/pictures/user/Mister69M
https://www.hentai-foundry.com/pictures/user/Mister69M/page/2
What happened to 'page=1' and '/page/1'? Adding those '1' values in works fine! Many sites, when an index is absent, will secretly imply an appropriate 0 or 1. This looks pretty to users looking at a browser address bar, but it can be a pain for us, who want to match both styles to one URL Class. It would be nice if we could recognise the 'bare' initial URL and fill in the '1' values to coerce it to the explicit, automation-friendly format. Defaults to the rescue:
After you set a path component or parameter String Match, you will be asked for an optional 'default' value. You won't want to set one most of the time, but for Gallery URLs, it can be hugely useful--see how the normalisation process automatically fills in the missing path component with the default! There are plenty of examples in the default Gallery URLs of this, so check them out. Most sites use page indices starting at '1', but Gelbooru-style imageboards use 'pid=0' file index (and often move forward 42, so the next pages will be 'pid=42', 'pid=84', and so on, although others use deltas of 20 or 40).
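Here is the parameter version of that idea as a small Python sketch. The fill_defaults function is hypothetical, and a default on a path component would work in the same spirit.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def fill_defaults(url, param_defaults):
    '''Add any missing parameters using their default values.'''
    split = urlsplit(url)
    params = dict(parse_qsl(split.query))
    for name, default in param_defaults.items():
        # Only fill in the default when the URL does not already say.
        params.setdefault(name, default)
    query = urlencode(sorted(params.items()))
    return urlunsplit((split.scheme, split.netloc, split.path, query, ''))


print(fill_defaults('https://danbooru.donmai.us/posts?tags=skirt', {'page': '1'}))
```

The 'bare' first page and an explicit page=1 URL now normalise to the same value, and a URL that already has page=2 is left alone.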
"},{"location":"downloader_url_classes.html#next_gallery_page_prediction","title":"can we predict the next gallery page?","text":"Now we can harmonise gallery urls to a single format, we can predict the next gallery page! If, say, the third path component or 'page' parameter is always a number referring to page, you can select this under the 'next gallery page' section and set the delta to change it by. The 'next gallery page url' section will be automatically filled in. This value will be consulted if the parser cannot find a 'next gallery page url' from the page content.
It is neat to set this up, but I only recommend it if you actually cannot reliably parse a next gallery page url from the HTML later in the process. It is neater to have searches stop naturally because the parser said 'no more gallery pages' than to have hydrus always go one page beyond and end every single search on an uglier 'No results found' or 404 result.
Unfortunately, some sites will either not produce an easily parsable next page link or randomly just not include it due to some issue on their end (Gelbooru is a funny example of this). Also, APIs will often have a kind of 'start=200&num=50', 'start=250&num=50' progression but not include that state in the XML or JSON they return. These cases require the automatic next gallery page rules (check out Artstation and tumblr api gallery page URL Classes in the defaults for examples of this).
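The prediction itself is just arithmetic on one component. A minimal Python sketch for the parameter case, using the Gelbooru-style delta of 42 mentioned earlier (the function name and signature are my own):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_gallery_page(url, param_name, delta):
    '''Predict the next gallery page by adding delta to one numeric parameter.'''
    split = urlsplit(url)
    bumped = [
        (name, str(int(value) + delta)) if name == param_name else (name, value)
        for name, value in parse_qsl(split.query)
    ]
    return urlunsplit((split.scheme, split.netloc, split.path, urlencode(bumped), ''))


first_page = 'http://safebooru.org/index.php?page=post&s=list&tags=yorha_no._2_type_b&pid=0'
second_page = next_gallery_page(first_page, 'pid', 42)
print(second_page)
```

Note this only works on a URL that already has the parameter present, which is one reason the default values are worth setting up first.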
"},{"location":"downloader_url_classes.html#api_links","title":"how do we link to APIs?","text":"If you know that a URL has an API backend, you can tell the client to use that API URL when it fetches data. The API URL needs its own URL Class.
To define the relationship, click the \"String Converter\" button, which gives you this:
You may have seen this panel elsewhere. It lets you convert a string to another over a number of transformation steps. The steps can be as simple as adding or removing some characters or applying a full regex substitution. For API URLs, you are mostly looking to isolate some unique identifying data (\"m/thread/16086187\" in this case) and then substitute that into the new API path. It is worth testing this with several different examples!
When the client links regular URLs to API URLs like this, it will still associate the human-pretty regular URL when it needs to display to the user and record 'known urls' and so on. The API is just a quick lookup when it actually fetches and parses the respective data.
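In Python terms, the conversion is something like the sketch below. The two domains are my assumption for the sake of a concrete example (an imageboard thread URL and its JSON API counterpart), and to_api_url is a made-up name, so swap in whatever your site actually uses.

```python
import re

# Isolate the identifying part of the thread URL, e.g. 'm/thread/16086187'.
# Assumption: the human-facing thread URLs live on boards.4chan.org.
THREAD_URL = re.compile('https?://boards[.]4chan[.]org/([a-z0-9]+/thread/[0-9]+)')


def to_api_url(url):
    '''Convert a thread URL to its API URL, or return None if it does not fit.'''
    match = THREAD_URL.match(url)
    if match is None:
        return None
    # Assumption: the API serves the same path on a.4cdn.org with '.json' appended.
    return 'https://a.4cdn.org/' + match.group(1) + '.json'


print(to_api_url('https://boards.4chan.org/m/thread/16086187'))
```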
"},{"location":"duplicates.html","title":"duplicates","text":"As files are shared on the internet, they are often resized, cropped, converted to a different format, altered by the original or a new artist, or turned into a template and reinterpreted over and over and over. Even if you have a very restrictive importing workflow, your client is almost certainly going to get some duplicates. Some will be interesting alternate versions that you want to keep, and others will be thumbnails and other low-quality garbage you accidentally imported and would rather delete. Along the way, it would be nice to merge your ratings and tags to the better files so you don't lose any work.
Finding and processing duplicates within a large collection is impossible to do by hand, so I have written a system to do the heavy lifting for you. It currently works on still images, but an extension for gifs and video is planned.
Hydrus finds potential duplicates using a search algorithm that compares images by their shape. Once these pairs of potentials are found, they are presented to you through a filter like the archive/delete filter to determine their exact relationship and if you want to make a further action, such as deleting the 'worse' file of a pair. All of your decisions build up in the database to form logically consistent groups of duplicates and 'alternate' relationships that can be used to infer future information. For instance, if you say that file A is a duplicate of B and B is a duplicate of C, A and C are automatically recognised as duplicates as well.
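The A, B, C inference is the classic 'disjoint set' idea. A small Python sketch of it follows; it illustrates the logic only and is not how the hydrus database stores its groups, and the class name is my own.

```python
class DuplicateGroups:
    '''Tracks which files have been declared duplicates of each other.'''

    def __init__(self):
        self._parent = {}

    def _find(self, item):
        # Walk up to the representative of the item's group,
        # shortening the chain a little as we go.
        self._parent.setdefault(item, item)
        while self._parent[item] != item:
            self._parent[item] = self._parent[self._parent[item]]
            item = self._parent[item]
        return item

    def set_duplicates(self, a, b):
        # Merge the two groups by pointing one representative at the other.
        self._parent[self._find(a)] = self._find(b)

    def are_duplicates(self, a, b):
        return self._find(a) == self._find(b)


groups = DuplicateGroups()
groups.set_duplicates('A', 'B')
groups.set_duplicates('B', 'C')
print(groups.are_duplicates('A', 'C'))
```

Every decision merges two groups, so one decision about a pair can settle many other potential pairs at once.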
This all starts on--
"},{"location":"duplicates.html#duplicates_page","title":"the duplicates processing page","text":"On the normal 'new page' selection window, hit special->duplicates processing. This will open this page:
Let's go to the preparation page first:
The 'similar shape' algorithm works on distance. Two files with 0 distance are likely exact matches, such as resizes of the same file or lower/higher quality jpegs, whereas those with distance 4 tend to be hairstyle or costume changes. You will be starting on distance 0 and should not expect to ever go above 4 or 8 or so. Going too high increases the danger of being overwhelmed by false positives.
If you are interested, the current version of this system uses a 64-bit phash to represent the image shape and a VPTree to search different files' phashes' relative hamming distance. I expect to extend it in future with multiple phash generation (flips, rotations, and 'interesting' image crops and video frames) and most-common colour comparisons.
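To make 'hamming distance' concrete, here is a toy Python sketch. The 64-bit values are invented, and the search is a brute-force loop over every pair in place of a VPTree, so it only shows what the distance means, not how the client searches quickly.

```python
def hamming_distance(phash_a, phash_b):
    '''Count the bits that differ between two 64-bit phashes.'''
    return bin(phash_a ^ phash_b).count('1')


def potential_pairs(phashes, max_distance):
    '''Brute-force every pair of files within max_distance of each other.'''
    names = sorted(phashes)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if hamming_distance(phashes[a], phashes[b]) <= max_distance
    ]


# Invented example values, not real phashes.
phashes = {
    'original': 0xF0F0F0F0F0F0F0F0,
    'resize': 0xF0F0F0F0F0F0F0F0,
    'costume change': 0xF0F0F0F0F0F0F0FF,
    'unrelated': 0x0F0F0F0F0F0F0F0F,
}

print(potential_pairs(phashes, 0))
print(potential_pairs(phashes, 4))
```

At distance 0 only the resize pairs up with the original. Raising the limit to 4 brings in the costume change, and the unrelated file stays out at any sensible limit.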
Searching for duplicates is fairly fast per file, but with a large client with hundreds of thousands of files, the total CPU time adds up. You can do a little manual searching if you like, but once you are all settled here, I recommend you hit the cog icon on the preparation page and let hydrus do this page's catch-up search work in your regular maintenance time. It'll swiftly catch up and keep you up to date without you even thinking about it.
Start searching on the 'exact match' search distance of 0. It is generally easier and more valuable to get exact duplicates out of the way first.
Once you have some files searched, you should see a potential pair count appear in the 'filtering' page.
"},{"location":"duplicates.html#duplicate_filtering_page","title":"the filtering page","text":"Processing duplicates can be real trudge-work if you do not set up a workflow you enjoy. It is a little slower than the archive/delete filter, and sometimes takes a bit more cognitive work. For many users, it is a good task to do while listening to a podcast or having a video going on another screen.
If you have a client with tens of thousands of files, you will likely have thousands of potential pairs. This can be intimidating, but do not worry--due to the A, B, C logical inferences as above, you will not have to go through every single one. The more information you put into the system, the faster the number will drop.
The filter has a regular file search interface attached. As you can see, it defaults to system:everything, but you can limit what files you will be working on simply by adding new search predicates. You might like to only work on files in your archive (i.e. that you know you care about to begin with), for instance. You can choose whether both files of the pair should match the search, or just one. 'creator:' tags work very well at cutting the search domain to something more manageable and consistent--try your favourite creator!
If you would like an example from the current search domain, hit the 'show some random potential pairs' button, and it will show two or more files that seem related. It is often interesting and surprising to see what it finds! The action buttons below allow for quick processing of these pairs and groups when convenient (particularly for large cg sets with 100+ alternates), but I recommend you leave these alone until you know the system better.
When you are ready, launch the filter.
"},{"location":"duplicates.html#duplicates_filter","title":"the duplicates filter","text":"We have not set up your duplicate 'merge' options yet, so do not get too into this. For this first time, just poke around, make some pretend choices, and then cancel out and choose to forget them.
Like the archive/delete filter, this uses quick mouse-clicks, keyboard shortcuts, or button clicks to action pairs. It presents two files at a time, labelled A and B, which you can quickly switch between just as in the normal media viewer. As soon as you action them, the next pair is shown. The two files will have their current zoom-size locked so they stay the same size (and in the same position) as you switch between them. Scroll your mouse wheel a couple of times and see if any obvious differences stand out.
Please note the hydrus media viewer does not currently work well with large resolutions at high zoom (it gets laggy and may have memory issues). Don't zoom in to 1600% and try to look at jpeg artifact differences on very large files, as this is simply not well supported yet.
The hover window on the right also presents a number of 'comparison statements' to help you make your decision. Green statements mean this current file is probably 'better', and red the opposite. Larger, older, higher-quality, more-tagged files are generally considered better. These statements have scores associated with them (which you can edit in file->options->duplicates), and the file of the pair with the higher score is presented first. If the files are duplicates, you can generally assume the first file you see, the 'A', is the better, particularly if there are several green statements.
The filter will need to occasionally checkpoint, saving the decisions so far to the database, before it can fetch the next batch. This allows it to apply inferred information from your current batch and reduce your pending count faster before serving up the next set. It will present you with a quick interstitial 'confirm/back' dialog just to let you know. This happens more often as the potential count decreases.
"},{"location":"duplicates.html#duplicates_decisions","title":"the decisions to make","text":"There are three ways a file can be related to another in the current duplicates system: duplicates, alternates, or false positive (not related).
False positive (not related) is the easiest. You will not see completely unrelated pairs presented very often in the filter, particularly at low search distances, but if the shape of face and hair and clothing happen to line up (or geometric shapes, often), the search system may make a false positive match. In this case, just click 'they are not related'.
Alternate relations are files that are not duplicates but obviously related in some way. Perhaps a costume change or a recolour. Hydrus does not have rich alternate support yet (but it is planned, and highly requested), so this relationship is mostly a 'holding area' for files that we will revisit for further processing in the future.
Duplicate files are of the exact same thing. They may be different resolutions, file formats, encoding quality, or one might even have a watermark, but they are fundamentally different views on the exact same art. As you can see with the buttons, you can select one file as the 'better' or say they are about the same. If the files are basically the same, there is no point stressing about which is 0.2% better--just click 'they are the same'. For better/worse pairs, you might have reason to keep both, but most of the time I recommend you delete the worse.
You can customise the shortcuts under file->shortcuts->duplicate_filter. The defaults are:
Left-click or space: this is better, delete the other.
Right-click: they are related alternates.
Middle-click: Go back one decision.
Enter/Escape: Stop filtering.
If two duplicates have different metadata like tags or archive status, you probably want to merge them. Cancel out of the filter and click the 'edit default duplicate metadata merge options' button:
By default, these options are fairly empty. You will have to set up what you want based on your services and preferences. Setting a simple 'copy all tags' is generally a good idea, and like/dislike ratings also often make sense. The settings for better and same quality should probably be similar, but it depends on your situation.
If you choose the 'custom action' in the duplicate filter, you will be presented with a fresh 'edit duplicate merge options' panel for the action you select and can customise the merge specifically for that choice. ('favourite' options will come here in the future!)
Once you are all set up here, you can dive into the duplicate filter. Please let me know how you get on with it!
"},{"location":"duplicates.html#future","title":"what now?","text":"The duplicate system is still incomplete. Now the db side is solid, the UI needs to catch up. Future versions will show duplicate information on thumbnails and the media viewer and allow quick-navigation to a file's duplicates and alternates.
For now, if you wish to see a file's duplicates, right-click it and select file relationships. You can review all its current duplicates, open them in a new page, appoint the new 'best file' of a duplicate group, and even mass-action selections of thumbnails.
You can also search for files based on the number of file relations they have (including when setting the search domain of the duplicate filter!) using system:file relationships. You can also search for best/not best files of groups, which makes it easy, for instance, to find all the spare duplicate files if you decide you no longer want to keep them.
I expect future versions of the system to also auto-resolve easy duplicate pairs, such as clearing out pixel-for-pixel png versions of jpgs.
"},{"location":"duplicates.html#game_cgs","title":"game cgs","text":"If you import a lot of game CGs, which frequently have dozens or hundreds of alternates, I recommend you set them as alternates by selecting them all and setting the status through the thumbnail right-click menu. The duplicate filter, being limited to pairs, needs to compare all new members of an alternate group to all other members once to verify they are not duplicates. This is not a big deal for alternates with three or four members, but game CGs provide an overwhelming edge case. Setting a group of thumbnails as alternate 'fixes' their alternate status immediately, discounting the possibility of any internate duplicates, and provides an easy way out of this situation.
"},{"location":"duplicates.html#duplicates_examples","title":"more information and examples","text":""},{"location":"duplicates.html#duplicates_examples_better_worse","title":"better/worse","text":"Which of two files is better? Here are some common reasons:
However, these are not hard rules--sometimes a file has a larger resolution or filesize due to a bad upscaling or encoding decision by the person who 'reinterpreted' it. You really have to look at it and decide for yourself.
Here is a good example of a better/worse pair:
The first image is better because it is a png (pixel-perfect pngs are always better than jpgs for screenshots of applications--note how obvious the jpg's encoding artifacts are on the flat colour background) and it has a slightly higher (original) resolution, making it less blurry. I presume the second went through some FunnyJunk-tier trash meme site to get automatically cropped to 960px height and converted to the significantly smaller jpeg. Whatever happened, let's drop the second and keep the first.
When both files are jpgs, differences in quality are very common and often significant:
Again, this is mostly due to some online service resizing and lowering quality to ease on their bandwidth costs. There is usually no reason to keep the lower quality version.
"},{"location":"duplicates.html#duplicates_examples_same","title":"same quality duplicates","text":"When are two files the same quality? A good rule of thumb is if you scroll between them and see no obvious differences, and the comparison statements do not suggest anything significant, just set them as same quality.
Here are two same quality duplicates:
There is no obvious difference between those two. The filesize is significantly different, so I suspect the smaller is a lossless png optimisation, but in the grand scheme of things, that doesn't matter so much. Many of the big content providers--Facebook, Google, Cloudflare--automatically 'optimise' the data that goes through their networks in order to save bandwidth. Although jpegs are often a slaughterhouse, with pngs it is usually harmless.
Given the filesize, you might decide that these are actually a better/worse pair--but if the larger image had tags and was the 'canonical' version on most boorus, the decision might not be so clear. You can choose better/worse and delete one randomly, but sometimes you may just want to keep both without a firm decision on which is best, so just set 'same quality' and move on. Your time is more valuable than a few dozen KB.
Sometimes, you will see pixel-for-pixel duplicate jpegs of very slightly different size, such as 787KB vs 779KB. The smaller of these is usually an exact duplicate that has had its internal metadata (e.g. EXIF tags) stripped by a program or website CDN. They are same quality unless you have a strong opinion on whether having internal metadata in a file is useful.
"},{"location":"duplicates.html#duplicates_examples_alternates","title":"alternates","text":"As I wrote above, hydrus's alternates system in not yet properly ready. It is important to have a basic 'alternates' relationship for now, but it is a holding area until we have a workflow to apply 'WIP'- or 'recolour'-type labels and present that information nicely in the media viewer.
Alternates are not of exactly the same thing, but one is variant of the other or they are both descended from a common original. The precise definition is up to you, but it generally means something like:
Here are some recolours of the same image:
And some WIP:
And a costume change:
None of these are duplicates, but they are obviously related. The duplicate search will notice they are similar, so we should let the client know they are 'alternate'.
Here's a subtler case:
These two files are very similar, but try opening both in separate tabs and then flicking back and forth: the second's glove-string is further into the mouth and has improved chin shading, a more refined eye shape, and shaved pubic hair. It is simple to spot these differences in the client's duplicate filter when you scroll back and forth.
I believe the second is an improvement on the first by the same artist, so it is a WIP alternate. You might also consider it a 'better' improvement.
Here are three files you might or might not consider to be alternates:
These are all based on the same template--which is why the dupe filter found them--but they are not so closely related as those above, and the last one is joking about a different ideology entirely and might deserve to be in its own group. Ultimately, you might prefer just to give them some shared tag and consider them not alternates per se.
"},{"location":"duplicates.html#duplicates_examples_false_positive","title":"not related/false positive","text":"Here are two files that match false positively:
Despite their similar shape, they are neither duplicates nor of even the same topic. The only commonality is the medium. I would not consider them close enough to be alternates--just adding something like 'screenshot' and 'imageboard' as tags to both is probably the closest connection they have.
Recording the 'false positive' relationship is important to make sure the comparison does not come up again in the duplicate filter.
The incidence of false positives increases as you broaden the search distance--the less precise your search, the less likely it is to be correct. At distance 14, these files all match, but uselessly:
"},{"location":"duplicates.html#duplicates_advanced","title":"the duplicates system","text":"(advanced nonsense, you can skip this section. tl;dr: duplicate file groups keep track of their best quality file, sometimes called the King)
Hydrus achieves duplicate transitivity by treating duplicate files as groups. Although you action pairs, if you set (A duplicate B), that creates a group (A,B). Subsequently setting (B duplicate C) extends the group to be (A,B,C), and so (A duplicate C) is transitively implied.
The first version of the duplicate system attempted to record better/worse/same information for all files in a virtual duplicate group, but this proved very complicated, workflow-heavy, and not particularly useful. The new system instead appoints a single King as the best file of a group. All other files in the group are beneath the King and have no other relationship data retained.
This King represents the group in the duplicate filter (and in potential pairs, which are actually recorded between duplicate media groups--even if most of them at the outset only have one member). If the other file in a pair is considered better, it becomes the new King, but if it is worse or equal, it merges into the other members. When two Kings are compared, whole groups can merge!
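The group-and-King bookkeeping described above can be sketched in a few lines of Python. This is only a toy model and not the client's actual code: the class name DuplicateGroups, its methods, and the in-memory dictionaries are all assumptions made for illustration.

```python
class DuplicateGroups:
    # Hypothetical toy model: every file sits in exactly one group, and each group has one King.

    def __init__(self):
        self._group_of = {}   # file -> group id
        self._members = {}    # group id -> set of files
        self._king = {}       # group id -> best file of that group
        self._next_id = 0

    def _ensure(self, f):
        # A file nobody has actioned yet is a group of one and its own King.
        if f not in self._group_of:
            gid = self._next_id
            self._next_id += 1
            self._group_of[f] = gid
            self._members[gid] = {f}
            self._king[gid] = f
        return self._group_of[f]

    def set_duplicate(self, a, b, b_is_better=False):
        # Merge b's group into a's group. The King of a's group keeps the crown
        # unless the caller says b's side is better.
        ga, gb = self._ensure(a), self._ensure(b)
        if ga == gb:
            return
        new_king = self._king[gb] if b_is_better else self._king[ga]
        for f in self._members[gb]:
            self._group_of[f] = ga
        self._members[ga] |= self._members.pop(gb)
        del self._king[gb]
        self._king[ga] = new_king

    def are_duplicates(self, a, b):
        return self._ensure(a) == self._ensure(b)

    def king_of(self, f):
        return self._king[self._ensure(f)]


groups = DuplicateGroups()
groups.set_duplicate('A', 'B')             # creates the group (A,B)
groups.set_duplicate('B', 'C')             # extends it to (A,B,C)
print(groups.are_duplicates('A', 'C'))     # True, implied transitively
print(groups.king_of('C'))                 # A
```

Because only group membership and one King are stored, no per-pair better/worse data has to be kept for the non-King members.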
Alternates are stored in a similar way, except the members are duplicate groups rather than individual files and they have no significant internal relationship metadata yet. If \u03b1, \u03b2, and \u03b3 are duplicate groups that each have one or more files, then setting (\u03b1 alt \u03b2) and (\u03b2 alt \u03b3) creates an alternate group (\u03b1,\u03b2,\u03b3), with the caveat that \u03b1 and \u03b3 will still be sent to the duplicate filter once just to check they are not duplicates by chance. The specific file members of these groups, A, B, C and so on, inherit the relationships of their parent groups when you right-click on their thumbnails.
False positive relationships are stored between pairs of alternate groups, so they apply transitively between all the files of either side's alternate group. If (\u03b1 alt \u03b2) and (\u03c8 alt \u03c9) and you apply (\u03b1 fp \u03c8), then (\u03b1 fp \u03c9), (\u03b2 fp \u03c8), and (\u03b2 fp \u03c9) are all transitively implied.
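As a rough illustration of why one stored false positive covers every cross pair, here is a small Python sketch. The dictionary alt_group_of, the set false_positive_pairs, and the function name are hypothetical, and the Greek letters are spelled out as plain names.

```python
# Hypothetical data: which alternate group each duplicate group belongs to.
alt_group_of = {'alpha': 1, 'beta': 1, 'psi': 2, 'omega': 2}

# The false positive is recorded once, between the two alternate groups,
# as an unordered pair of group ids.
false_positive_pairs = {frozenset((1, 2))}


def is_false_positive(x, y):
    # Look the relationship up through the groups rather than per member.
    return frozenset((alt_group_of[x], alt_group_of[y])) in false_positive_pairs


print(is_false_positive('alpha', 'psi'))    # True, the pair that was actioned
print(is_false_positive('beta', 'omega'))   # True, implied through the groups
print(is_false_positive('alpha', 'beta'))   # False, same alternate group
```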
More examples"},{"location":"faq.html","title":"FAQ","text":""},{"location":"faq.html#repositories","title":"What is a repository?","text":"
A repository is a service in the hydrus network that stores a certain kind of information--files or tag mappings, for instance--as submitted by users all over the internet. Those users periodically synchronise with the repository so they know everything that it stores. Sometimes, like with tags, this means creating a complete local copy of everything on the repository. Hydrus network clients never send queries to repositories; they perform queries over their local cache of the repository's data, keeping everything confined to the same computer.
"},{"location":"faq.html#tags","title":"What is a tag?","text":"wiki
A tag is a small bit of text describing a single property of something. They make searching easy. Good examples are \"flower\" or \"nicolas cage\" or \"the sopranos\" or \"2003\". By combining several tags together ( e.g. [ 'tiger woods', 'sports illustrated', '2008' ] or [ 'cosplay', 'the legend of zelda' ] ), a huge image collection is reduced to a tiny and easy-to-digest sample.
A good word for the connection of a particular tag to a particular file is mapping.
Hydrus is designed with the intention that tags are for searching, not describing. Workflows and UI are tuned for finding files and other similar files (e.g. by the same artist), and while it is possible to have nice metadata overlays around files, this is not considered their chief purpose. Trying to have 'perfect' descriptions for files is often a rabbit-hole that can consume hours of work with relatively little demonstrable benefit.
All tags are automatically converted to lower case. 'Sunset Drive' becomes 'sunset drive'. Why?
Furthermore, leading and trailing whitespace is removed, and multiple whitespace is collapsed to a single character.
' yellow dress '
becomes
'yellow dress'
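The cleaning rules above are easy to picture in code. This is a minimal sketch and not the client's own routine; the function name clean_tag is made up, and it assumes that runs of whitespace collapse to a single space.

```python
def clean_tag(tag):
    # lower() handles the case conversion. split() with no arguments drops
    # leading/trailing whitespace and splits on any run of whitespace, so
    # joining with one space collapses the runs.
    return ' '.join(tag.lower().split())


print(clean_tag('Sunset Drive'))       # sunset drive
print(clean_tag('  yellow   dress '))  # yellow dress
```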
"},{"location":"faq.html#namespaces","title":"What is a namespace?","text":"A namespace is a category that in hydrus prefixes a tag. An example is 'person' in the tag 'person:ron paul'--it lets people and software know that 'ron paul' is a name. You can create any namespace you like; just type one or more words and then a colon, and then the next string of text will have that namespace.
The hydrus client gives namespaces different colours so you can pick out important tags more easily in a large list, and you can also search by a particular namespace, even creating complicated predicates like 'give all files that do not have any character tags', for instance.
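To make the prefix idea concrete, here is a small Python sketch that splits a tag at its first colon. The function name split_tag and the word 'subtag' for the part after the colon are assumptions for this example, and the real client may treat edge cases (such as a tag that begins with a colon) differently.

```python
def split_tag(tag):
    # partition() splits at the first colon only, so any later colons stay in the subtag.
    namespace, colon, subtag = tag.partition(':')
    if not colon:
        # No colon at all: the tag has no namespace.
        return '', tag
    return namespace, subtag


print(split_tag('person:ron paul'))  # ('person', 'ron paul')
print(split_tag('flower'))           # ('', 'flower')
```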
"},{"location":"faq.html#filenames","title":"Why not use filenames and folders?","text":"As a retrieval method, filenames and folders are less and less useful as the number of files increases. Why?
A filename is often--for ridiculous reasons--limited to a certain prohibitive character set. Even when utf-8 is supported, some arbitrary ascii characters are usually not, and different localisations, operating systems and formatting conventions only make it worse.
Folders can offer context, but they are clunky and time-consuming to change. If you put each chapter of a comic in a different folder, for instance, reading several volumes in one sitting can be a pain. Nesting many folders adds navigation-latency and tends to induce less informative \"04.jpg\"-type filenames.
So, the client tracks files by their hash. This technical identifier easily eliminates duplicates and permits the database to robustly attach other metadata like tags and ratings and known urls and notes and everything else, even across multiple clients and even if a file is deleted and later imported.
As a general rule, I suggest you not set up hydrus to parse and display all your imported files' filenames as tags. 'image.jpg' is useless as a tag. Shed the concept of filenames as you would chains.
"},{"location":"faq.html#external_files","title":"Can the client manage files from their original locations?","text":"When the client imports a file, it makes a quickly accessible but human-ugly copy in its internal database, by default under install_dir/db/client_files. When it needs to access that file again, it always knows where it is, and it can be confident it is what it expects it to be. It never accesses the original again.
This storage method is not always convenient, particularly for those who are hesitant about converting to using hydrus completely and also do not want to maintain two large copies of their collections. The question comes up--\"can hydrus track files from their original locations, without having to copy them into the db?\"
The technical answer is, \"This support could be added,\" but I have decided not to, mainly because:
It is not unusual for new users who ask for this feature to find their feelings change after getting more experience with the software. If desired, path text can be preserved as tags using regexes during import, and getting into the swing of searching by metadata rather than navigating folders often shows how very effective the former is over the latter. Most users eventually import most or all of their collection into hydrus permanently, deleting their old folder structure as they go.
For this reason, if you are hesitant about doing things the hydrus way, I advise you try running it on a smaller subset of your collection, say 5,000 files, leaving the original copies completely intact. After a month or two, think about how often you used hydrus to look at the files versus navigating through folders. If you barely used the folders, you probably do not need them any more, but if you used them a lot, then hydrus might not be for you, or it might only be for some sorts of files in your collection.
"},{"location":"faq.html#sqlite","title":"Why use SQLite?","text":"Hydrus uses SQLite for its database engine. Some users who have experience with other engines such as MySQL or PostgreSQL sometimes suggest them as alternatives. SQLite serves hydrus's needs well, and at the moment, there are no plans to change.
Since this question has come up frequently, a user has written an excellent document talking about the reasons to stick with SQLite. If you are interested in this subject, please check it out here:
https://gitgud.io/prkc/hydrus-why-sqlite/blob/master/README.md
"},{"location":"faq.html#hashes","title":"What is a hash?","text":"wiki
Hashes are a subject you usually have to be a software engineer to find interesting. The simple answer is that they are unique names for things. Hashes make excellent identifiers inside software, as you can safely assume that f099b5823f4e36a4bd6562812582f60e49e818cf445902b504b5533c6a5dad94 refers to one particular file and no other. In the client's normal operation, you will never encounter a file's hash. If you want to see a thumbnail bigger, double-click it; the software handles the mathematics.
For those who are interested: hydrus uses SHA-256, which spits out 32-byte (256-bit) hashes. The software stores the hash densely, as 32 bytes, only encoding it to 64 hex characters when the user views it or copies to clipboard. SHA-256 is not perfect, but it is a great compromise candidate; it is secure for now, it is reasonably fast, it is available for most programming languages, and newer CPUs perform it more efficiently all the time.
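If you want to see the 32-byte and 64-character forms side by side, Python's standard hashlib module can compute a SHA-256 hash. This is a sketch for curiosity only; the function name and the chunk size are arbitrary choices and say nothing about how the client reads files internally.

```python
import hashlib


def sha256_of_file(path):
    # Read in chunks so large files do not have to fit in memory.
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.digest()   # the dense 32-byte form


def as_hex(digest):
    # The 64-character form a user would see or copy.
    return digest.hex()
```

For any file, len(sha256_of_file(path)) is 32 and len(as_hex(...)) is 64.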
"},{"location":"faq.html#access_keys","title":"What is an access key?","text":"The hydrus network's repositories do not use username/password, but instead a single strong identifier-password like this:
7ce4dbf18f7af8b420ee942bae42030aab344e91dc0e839260fcd71a4c9879e3
These hex numbers give you access to a particular account on a particular repository, and are often combined like so:
7ce4dbf18f7af8b420ee942bae42030aab344e91dc0e839260fcd71a4c9879e3@hostname.com:45871
They are long enough to be impossible to guess, and also randomly generated, so they reveal nothing personally identifying about you. Many people can use the same access key (and hence the same account) on a repository without consequence, although they will have to share any bandwidth limits, and if one person screws around and gets the account banned, everyone will lose access.
The access key is the account. Do not give it to anyone you do not want to have access to the account. An administrator will never need it; instead they will want your account id.
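The combined form shown above splits cleanly at the '@' and the last ':'. A minimal Python sketch follows, assuming exactly that layout; the function name parse_credentials is hypothetical and this is not the client's own parser.

```python
def parse_credentials(text):
    # Everything before '@' is the hex access key; the rest is host:port.
    key_hex, at, address = text.strip().partition('@')
    access_key = bytes.fromhex(key_hex)   # raises ValueError if it is not valid hex
    if not at:
        # A bare key with no address attached.
        return access_key, None, None
    host, colon, port = address.rpartition(':')
    if not colon:
        raise ValueError('expected host:port after the @')
    return access_key, host, int(port)


key, host, port = parse_credentials(
    '7ce4dbf18f7af8b420ee942bae42030aab344e91dc0e839260fcd71a4c9879e3@hostname.com:45871'
)
print(len(key), host, port)  # 32 hostname.com 45871
```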
"},{"location":"faq.html#account_ids","title":"What is an account id?","text":"This is another long string of random hexadecimal that identifies your account without giving away access. If you need to identify yourself to a repository administrator (say, to get your account's permissions modified), you will need to tell them your account id. You can copy it to your clipboard in services->review services.
"},{"location":"faq.html#service_isolation","title":"Why does the file I deleted and then re-imported still have its tags?","text":"Hydrus splits its different abilities and domains (e.g. the list of files on your disk, or the tag mappings in 'my tags', or your files' notes) into separate services. You can see these in review services and manage services. Although the services of the same type may interact (e.g. deleting a file from one service might send that file to the 'trash' service, or adding tag parents to one tag service might implicate tags on another), those of different types are generally completely independent. Your tags don't care where the files they map to are.
So, when you delete a file from 'my files', none of its tag mappings in 'my tags' change--they remain attached to the 'ghost' of the deleted file. Your notes, ratings, and known URLs are the same (URLs are important, since they let the client skip URLs for files you previously deleted). If you re-import the file, it will have everything it did before, with only a couple of pertinent changes like, obviously, import time.
This is an important part of how the PTR works--when you sync with the PTR, your client downloads a couple billion mappings for files you do not have yet. Then, when you happen to import one of those files, it appears in your importer with its PTR tags 'apparently' already set--in truth, it always had them.
When you feel like playing with some more advanced concepts, turn on help->advanced mode and open a new search page. Change the file domain from 'my files' to 'all known files' or 'deleted from my files' and start typing a common tag--you'll get autocomplete results with counts! You can even run the search, and you'll get a ton of 'non-local' and therefore non-viewable files that are typically given a default hydrus thumbnail. These are files that your client is aware of, but does not currently have. You can run the manage x dialogs and edit the metadata of these ghost files just as you can your real ones. The only thing hydrus ever needs to attach metadata to a file is the file's SHA256 hash.
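The separation between services can be pictured with a toy model in Python. Everything here is hypothetical (the ToyClient class, its attribute names, the short fake hash); it only illustrates that tag mappings are keyed by hash and do not care whether the file is present.

```python
class ToyClient:
    # Hypothetical model: one file service and one tag service, both keyed by hash.

    def __init__(self):
        self.my_files = set()   # hashes of files currently present
        self.my_tags = {}       # hash -> set of tags, stored independently of my_files

    def import_file(self, file_hash):
        self.my_files.add(file_hash)

    def delete_file(self, file_hash):
        # Only the file service changes; the tag mappings are left alone.
        self.my_files.discard(file_hash)

    def add_tag(self, file_hash, tag):
        # Works whether or not the file is present.
        self.my_tags.setdefault(file_hash, set()).add(tag)

    def tags_of(self, file_hash):
        return self.my_tags.get(file_hash, set())


client = ToyClient()
client.add_tag('abc123', 'creator:someone')   # a mapping for a file we do not have yet
client.import_file('abc123')                  # the file arrives already tagged
client.delete_file('abc123')                  # the mapping stays on the ghost
client.import_file('abc123')
print(client.tags_of('abc123'))               # {'creator:someone'}
```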
If you really want to delete the tags or other data for some files you deleted, then:
Ctrl+A->manage tags
and manually delete the tags there.
Not really. Unless your situation involves millions of richly locally tagged files and a gigantic deleted:kept file ratio, don't worry about it.
"},{"location":"faq.html#does_the_metadata_for_files_i_deleted_mean_there_is_some_kind_of_a_permanent_record_of_which_files_my_client_has_heard_about_andor_seen_directly_even_if_i_purge_the_deletion_record","title":"Does the metadata for files I deleted mean there is some kind of a permanent record of which files my client has heard about and/or seen directly, even if I purge the deletion record?","text":"Yes. I am working on updating the database infrastructure to allow a full purge, but the structure is complicated, so it will take some time. If you are afraid of someone stealing your hard drive and matriculating your sordid MLP collection (or, in this case, the historical log of horrors that you rejected), do some research into drive encryption. Hydrus runs fine off an encrypted disk.
"},{"location":"faq.html#encryption","title":"Does Hydrus run ok off an encrypted drive partition?","text":"Yes! Both the database and your files should be fine on any of the popular software solutions. These programs give your OS a virtual drive that on my end looks and operates like any other. I have yet to encounter one that SQLite has a problem with. Make sure you don't have auto-dismount set--or at least be hawkish that it will never trigger while hydrus is running--or you could damage your database.
Drive encryption is a good idea for all your private things. If someone steals your laptop or USB stick, it means you only have to deal with frustration and replacement expenses (rather than also a nightmare of anxiety and identity-loss as some bad guy combs through all your things).
If you don't know how drive encryption works, search it up and have a play with a spare USB stick or a small 256MB file partition. VeraCrypt is a popular and easy program, but there are several solutions. Get some practice and take it seriously, since if you act foolishly you can really screw yourself (e.g. locking yourself out of the only copy of data you have left because you forgot the password). Make sure you have a good plan, reliable (encrypted) backups, and a password manager.
"},{"location":"faq.html#delays","title":"Why can my friend not see what I just uploaded?","text":"The repositories do not work like conventional search engines; it takes a short but predictable while for changes to propagate to other users.
The client's searches only ever happen over its local cache of what is on the repository. Any changes you make will be delayed for others until their next update occurs. At the moment, the update period is 100,000 seconds, which is about 1 day and 4 hours.
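For the curious, the conversion of the 100,000 second period can be checked with Python's standard datetime module:

```python
from datetime import timedelta

period = timedelta(seconds=100_000)
print(period)   # 1 day, 3:46:40
```

That is 1 day, 3 hours, 46 minutes and 40 seconds, which rounds to the 'about 1 day and 4 hours' quoted above.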
"},{"location":"filetypes.html","title":"Supported Filetypes","text":"This is a list of all filetypes Hydrus can import. Hydrus determines the filetype based on examining the file itself rather than the extension or MIME type.
"},{"location":"filetypes.html#images","title":"Images","text":"Filetype Extension MIME type Thumbnails Viewable in Hydrus Notes jpeg.jpeg
image/jpeg
\u2705 \u2705 png .png
image/png
\u2705 \u2705 static gif .gif
image/gif
\u2705 \u2705 webp .webp
image/webp
\u2705 \u2705 Animated webp files will display as static tiff .tiff
image/tiff
\u2705 \u2705 qoi .qoi
image/qoi
\u2705 \u2705 Quite OK Image Format icon .ico
image/x-icon
\u2705 \u2705 bmp .bmp
image/bmp
\u2705 \u2705 heif .heif
image/heif
\u2705 \u2705 heic .heic
image/heic
\u2705 \u2705 avif .avif
image/avif
\u2705 \u2705"},{"location":"filetypes.html#animations","title":"Animations","text":"Filetype Extension MIME type Thumbnails Viewable in Hydrus Notes animated gif .gif
image/gif
\u2705 \u2705 apng .apng
image/apng
\u2705 \u2705 heif sequence .heifs
image/heif-sequence
\u2705 \u2705 heic sequence .heics
image/heic-sequence
\u2705 \u2705 avif sequence .avifs
image/avif-sequence
\u2705 \u2705"},{"location":"filetypes.html#video","title":"Video","text":"Filetype Extension MIME type Thumbnails Viewable in Hydrus Notes mp4 .mp4
video/mp4
\u2705 \u2705 webm .webm
video/webm
\u2705 \u2705 matroska .mkv
video/x-matroska
\u2705 \u2705 avi .avi
video/x-msvideo
\u2705 \u2705 flv .flv
video/x-flv
\u2705 \u2705 quicktime .mov
video/quicktime
\u2705 \u2705 mpeg .mpeg
video/mpeg
\u2705 \u2705 ogv .ogv
video/ogg
\u2705 \u2705 realvideo .rm
video/vnd.rn-realvideo
\u2705 \u2705 wmv .wmv
video/x-ms-wmv
\u2705 \u2705"},{"location":"filetypes.html#audio","title":"Audio","text":"Filetype Extension MIME type Viewable in Hydrus Notes mp3 .mp3
audio/mp3
\u2705 ogg .ogg
audio/ogg
\u2705 flac .flac
audio/flac
\u2705 m4a .m4a
audio/mp4
\u2705 matroska audio .mkv
audio/x-matroska
\u2705 mp4 audio .mp4
audio/mp4
\u2705 realaudio .ra
audio/vnd.rn-realaudio
\u2705 tta .tta
audio/x-tta
\u2705 wave .wav
audio/x-wav
\u2705 wavpack .wv
audio/wavpack
\u2705 wma .wma
audio/x-ms-wma
\u2705"},{"location":"filetypes.html#applications","title":"Applications","text":"Filetype Extension MIME type Thumbnails Viewable in Hydrus Notes flash .swf
application/x-shockwave-flash
\u2705 \u274c pdf .pdf
application/pdf
\u2705 \u274c 300 DPI assumed for resolution. No thumbnails for encrypted PDFs. epub .epub
application/epub+zip
\u274c \u274c djvu .djvu
image/vnd.djvu
\u274c \u274c"},{"location":"filetypes.html#image_project_files","title":"Image Project Files","text":"Filetype Extension MIME type Thumbnails Viewable in Hydrus Notes psd .psd
image/vnd.adobe.photoshop
\u2705 \u2705 Adobe Photoshop. Hydrus shows the embedded preview image if present in the file. clip .clip
application/clip
1 \u2705 \u274c Clip Studio Paint sai2 .sai2
application/sai2
1 \u274c \u274c PaintTool SAI2 krita .kra
application/x-krita
\u2705 \u2705 Krita. Hydrus shows the embedded preview image if present in the file. svg .svg
image/svg+xml
\u2705 \u274c xcf .xcf
application/x-xcf
\u274c \u274c GIMP procreate .procreate
application/x-procreate
1 \u2705 \u274c Procreate app"},{"location":"filetypes.html#archives","title":"Archives","text":"Filetype Extension MIME type Notes 7z .7z
application/x-7z-compressed
gzip .gz
application/gzip
rar .rar
application/vnd.rar
zip .zip
application/zip
This filetype doesn't have an official or de facto media type; the one listed was made up for Hydrus.
This page serves as a checklist or overview for the getting started part of Hydrus. It is recommended to read at least all of the getting started pages, but if you want to head to some specific section directly go ahead and do so.
"},{"location":"gettingStartedOverview.html#the_client","title":"The client","text":"Have a look at getting started with files to get an overview of the Hydrus client.
"},{"location":"gettingStartedOverview.html#local_files","title":"Local files","text":"If you already have many local files, either downloaded by hand or by some other downloader tool, head to the getting started importing section to begin importing them.
"},{"location":"gettingStartedOverview.html#downloading","title":"Downloading","text":"If you want to download with Hydrus, check out getting started with downloading. If you want to add the ability to download from sites not already available in Hydrus by default, check out adding new downloaders for how and a link to a user-maintained archive of downloaders.
"},{"location":"gettingStartedOverview.html#tags_and_ratings","title":"Tags and ratings","text":"If you have imported and/or downloaded some files and want to get started searching and tagging see searching and sorting and getting started with ratings.
It is also worth having a look at siblings for when you want to consolidate different tags that all mean the same thing, common misspellings, or preferential differences into one tag.
Parents are for when you want a tag to always add another tag. They are commonly used for characters, since you would usually want to add the series they're from too.
"},{"location":"gettingStartedOverview.html#duplicates","title":"Duplicates","text":"Have a lot of very similar looking pictures because of one reason or another? Have a look at duplicates, Hydrus' duplicates finder and filtering tool.
"},{"location":"gettingStartedOverview.html#api","title":"API","text":"Hydrus has an API that lets external tools connect to it. See API for how to turn it on and a list of some of these tools.
"},{"location":"getting_started_downloading.html","title":"Getting started with downloading","text":"The hydrus client has a sophisticated and completely user-customisable download system. It can pull from any booru or regular gallery site or imageboard, and also from some special examples like twitter and tumblr. A single file or URL to massive imports, the downloader can handle it all. A fresh install will by default have support for the bigger sites, but it is possible, with some work, for any user to create a new shareable downloader for a new site.
The downloader is highly parallelisable, and while the default bandwidth rules should stop you from running too hot and downloading so much at once that you annoy the servers you are downloading from, there are no brakes in the program on what you can get.
Danger
It is very important that you take this slow. Many users get overexcited with their new ability to download 500,000 files and then do so, only discovering later that 98% of what they got was junk that they now have to wade through. Figure out what workflows work for you, how fast you process files, what content you actually want, how much bandwidth and hard drive space you have, and prioritise and throttle your incoming downloads to match. If you can realistically only archive/delete filter 50 files a day, there is little benefit to downloading 500 new files a day. START SLOW.
It also takes a decent whack of CPU to import a file. You'll usually never notice this with just one hard drive import going, but if you have twenty different download queues all competing for database access and individual 0.1-second hits of heavy CPU work, you will discover your client starts to judder and lag. Keep it in mind, and you'll figure out what your computer is happy with. I also recommend you try to keep your total loaded files/urls to be under 20,000 to keep things snappy. Remember that you can pause your import queues, if you need to calm things down a bit.
"},{"location":"getting_started_downloading.html#downloader_types","title":"Downloader types","text":"There are a number of different downloader types, each with its own purpose:
URL download: Intended for single posts or images. (Works with the API)
Gallery: For big download jobs, such as an artist's catalogue or everything with a given tag on a booru.
Subscriptions: Repeated gallery jobs, for keeping up to date with an artist or tag. Use the gallery downloader to get everything and a subscription to keep updated.
Watcher: Imageboard thread downloader, for sites such as 4chan, 8chan, and whatever else exists. (Works with the API)
Simple downloader: Intended for simple one-off jobs like grabbing all linked images in a page."},{"location":"getting_started_downloading.html#url_download","title":"URL download","text":"The url downloader works like the gallery downloader but does not do searches. You can paste downloadable URLs to it, and it will work through them as one list. Dragging and dropping recognisable URLs onto the client (e.g. from your web browser) will also spawn and use this downloader.
The button next to the input field lets you paste multiple URLs at once such as if you've copied from a document or browser bookmarks. The URLs need to be newline separated.
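If your URLs are buried in a messy document, a small script can turn them into the newline-separated list the paste button expects. This is only a sketch: the function name is made up for illustration, and the regular expression is a simple guess at what counts as a URL, not what hydrus itself uses.

```python
import re


def to_newline_separated(raw_text):
    # Grab anything that looks like an http(s) URL, stopping at whitespace,
    # angle brackets, or quote characters (\x22 and \x27).
    candidates = re.findall(r'https?://[^\s<>\x22\x27]+', raw_text)
    # Trim punctuation that commonly trails a URL in running text.
    cleaned = [url.rstrip('.,;)') for url in candidates]
    # Drop repeats while keeping the original order.
    unique = list(dict.fromkeys(cleaned))
    return '\n'.join(unique)


if __name__ == '__main__':
    notes = 'See https://example.com/post/1, and https://example.com/post/2.'
    print(to_newline_separated(notes))
```

Copy the printed text to your clipboard and use the paste button as normal.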
"},{"location":"getting_started_downloading.html#api","title":"API","text":"If you use API-connected programs such as the Hydrus Companion, then any non-watchable URLs sent to Hydrus through them will end up in an URL downloader page, the specifics depending on the program's settings. You can't use this to force Hydrus to download paged galleries since the URL downloader page doesn't support traversing to the next page, use the gallery downloader for this.
"},{"location":"getting_started_downloading.html#gallery_download","title":"Gallery download","text":"The gallery page can download from multiple sources at the same time. Each entry in the list represents a basic combination of two things:
Source The site you are getting from. Safebooru or Danbooru or Deviant Art or twitter or anywhere else. In the example image this is the button labelled artstation artist lookup
. Query text Something like 'contrapposto' or 'blonde_hair blue_eyes' or an artist name like 'incase'. Whatever is searched on the site to return a list of ordered media. In the example image this is the text field with artist username
in it. So, when you want to start a new download, you first select the source with the button and then type in a query in the text box and hit enter. The download will soon start and fill in information, and thumbnails should stream in, just like the hard drive importer. The downloader typically works by walking through the search's gallery pages one by one, queueing up the found files for later download. There are several intentional delays built into the system, so do not worry if work seems to halt for a little while--you will get a feel for hydrus's 'slow persistent growth' style with experience.
Do a test download now, for fun! Pause its gallery search after a page or two, and then pause the file import queue after a dozen or so files come in.
The thumbnail panel can only show results from one queue at a time, so double-click on an entry to 'highlight' it, which will show its thumbs and also give more detailed info and controls in the 'highlighted query' panel. I encourage you to explore the highlight panel over time, as it can show and do quite a lot. Double-click again to 'clear' it.
It is a good idea to 'test' larger downloads, either by visiting the site itself for that query, or just waiting a bit and reviewing the first files that come in. Just make sure that you are getting what you thought you would, whether that be verifying that the query text is correct or that the site isn't only giving you bloated gifs or other bad quality files. The 'file limit', which stops the gallery search after the set number of files, is also great for limiting fishing expeditions (such as overbroad searches like 'wide_hips', which on the bigger boorus have 100k+ results and return variable quality). If the gallery search runs out of new files before the file limit is hit, the search will naturally stop (and the entry in the list should gain a \u23f9 'stop' symbol).
Note that some sites only serve 25 or 50 pages of results, despite their indices suggesting hundreds. If you notice that one site always bombs out at, say, 500 results, it may be due to a decision on their end. You can usually test this by visiting the pages hydrus tried in your web browser.
In general, particularly when starting out, artist searches are best. They are usually fewer than a thousand files and have fairly uniform quality throughout.
"},{"location":"getting_started_downloading.html#subscriptions","title":"Subscriptions","text":"Let's say you found an artist you like. You downloaded everything of theirs from some site, but every week, one or two new pieces is posted. You'd like to keep up with the new stuff, but you don't want to manually make a new download job every week for every single artist you like.
Subscriptions are a way to automatically recheck a good query in future, to keep up with new files. Many users come to use them. You set up a number of saved queries, and the client will 'sync' with the latest files in the gallery and download anything new, just as if you were running the download yourself.
Subscriptions only work for booru-like galleries that put the newest files first, and they only keep up with new content--once they have done their first sync, which usually gets the most recent hundred files or so, they will never reach further into the past. Getting older files, as you will see later, is a job best done with a normal download page.
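To see why a subscription keeps up with new files but never reaches into the past, here is a toy model of a sync. It is not the client's real logic, which is more careful. The function name, the 'stop at the first URL already seen' rule, and the limit of 100 are all simplifying assumptions.

```python
def sync_newest_first(gallery_urls, seen_urls, max_new_files=100):
    # gallery_urls: results as the site returns them, newest first.
    # seen_urls: set of URLs this query has already worked on.
    new_urls = []
    for url in gallery_urls:
        if url in seen_urls:
            # We have caught up with the previous sync, so stop walking.
            break
        new_urls.append(url)
        if len(new_urls) >= max_new_files:
            # Cap reached; older results are left alone.
            break
    seen_urls.update(new_urls)
    return new_urls


if __name__ == '__main__':
    seen = set()
    gallery = ['post_%d' % i for i in range(250, 0, -1)]
    print(len(sync_newest_first(gallery, seen)))  # first sync hits the cap
    gallery = ['post_252', 'post_251'] + gallery
    print(sync_newest_first(gallery, seen))  # later sync only sees the new posts
```

Note how the posts below the first sync's cap are never fetched by later syncs, and how a search that is not newest-first would hit a seen URL at an arbitrary point and stop early.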
Note
The entire subscription system assumes the source is a typical 'newest first' booru-style search. If you dick around with some order_by:rating/random metatag, it will not work reliably.
It is important to note that while subscriptions can have multiple queries (even hundreds!), they generally only work on one site. Expect to create one subscription for safebooru, one for artstation, one for paheal, and so on for every site you care about. Advanced users may be able to think of ways to get around this, but I recommend against it as it throws off some of the internal check timing calculations.
"},{"location":"getting_started_downloading.html#setting_up_subscriptions","title":"Setting up subscriptions","text":"Here's the dialog, which is under network->manage subscriptions:
This is a very simple example--there is only one subscription, for safebooru. It has two 'queries' (i.e. searches to keep up with).
Before we trip over the advanced buttons here, let's zoom in on the actual subscription:
Danger
Do not change the max number of new files options until you know exactly what they do and have a good reason to alter them!
This is a big and powerful panel! I recommend you open the screenshot up in a new browser tab, or in the actual client, so you can refer to it.
Despite all the controls, the basic idea is simple: Up top, I have selected the 'safebooru tag search' download source, and then I have added two artists--\"hong_soon-jae\" and \"houtengeki\". These two queries have their own panels for reviewing what URLs they have worked on and further customising their behaviour, but all they really are is little bits of search text. When the subscription runs, it will put the given search text into the given download source just as if you were running the regular downloader.
Warning
Subscription syncs are somewhat fragile. Do not try to play with the limits or checker options to download a whole 5,000 file query in one go--if you want everything for a query, run it in the manual downloader and get everything, then set up a normal sub for new stuff. There is no benefit to having a 'large' subscription, and it will trim itself down in time anyway.
You might want to put subscriptions off until you are more comfortable with galleries. There is more help here.
"},{"location":"getting_started_downloading.html#watchers","title":"Watchers","text":"If you are an imageboard user, try going to a thread you like and drag-and-drop its URL (straight from your web browser's address bar) onto the hydrus client. It should open up a new 'watcher' page and import the thread's files!
With only one URL to check, watchers are a little simpler than gallery searches, but as that page is likely receiving frequent updates, it checks it over and over until it dies. By default, the watcher's 'checker options' will regulate how quickly it checks based on the speed at which new files are coming in--if a thread is fast, it will check frequently; if it is running slow, it may only check once per day. When a thread falls below a critical posting velocity or 404s, checking stops.
In general, you can leave the checker options alone, but you might like to revisit them if you are always visiting faster or slower boards and find you are missing files or getting DEAD too early.
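The idea of checking fast threads often and slow threads rarely can be sketched as below. This is not the client's actual calculation. Every name and default is hypothetical, chosen only to show a velocity-based delay that is clamped between a fastest and slowest check time, with a cut-off below which the thread is treated as DEAD.

```python
def next_check_delay(files_in_window, window_seconds, files_per_check=2,
                     min_delay=60, max_delay=86400, death_files_per_day=1):
    # Convert the recent posting rate into files per day.
    files_per_day = files_in_window * 86400 / window_seconds
    if files_per_day < death_files_per_day:
        # Too slow to be worth checking any more: treat the thread as DEAD.
        return None
    # Aim to find roughly files_per_check new files on each check.
    seconds_per_file = window_seconds / files_in_window
    delay = seconds_per_file * files_per_check
    # Never check faster than min_delay or slower than max_delay (one day).
    return max(min_delay, min(max_delay, delay))


if __name__ == '__main__':
    print(next_check_delay(120, 3600))  # busy thread
    print(next_check_delay(2, 86400))   # slow thread
    print(next_check_delay(0, 86400))   # nothing new in a day
```

If you find you are missing files on fast boards, the equivalent of a lower `min_delay` is what you would want; if threads go DEAD too early, the equivalent of a lower death velocity.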
"},{"location":"getting_started_downloading.html#api_1","title":"API","text":"If you use API-connected programs such as the Hydrus Companion, then any watchable URLs sent to Hydrus through them will end up in a watcher page, the specifics depending on the program's settings.
"},{"location":"getting_started_downloading.html#simple_downloader","title":"Simple downloader","text":"The simple downloader will do very simple parsing for unusual jobs. If you want to download all the images in a page, or all the image link destinations, this is the one to use. There are several default parsing rules to choose from, and if you learn the downloader system yourself, it will be easy to make more.
"},{"location":"getting_started_downloading.html#import_options","title":"Import options","text":"Every importer in Hydrus has some 'import options' that change what is allowed, what is blacklisted, and whether tags or notes should be saved.
In previous versions these were split into completely different windows called file import options
and tag import options
so if you see those anywhere, this is what they're talking about and not some hidden menu anywhere.
Importers that download from websites rely on a flexible 'defaults' system, so you do not have to set them up every time you start a new downloader. While you should play around with your import options, once you know what works for you, you should set that as the default under network->downloaders->manage default import options. You can set them for all file posts generally, all watchers, and for specific sites as well.
"},{"location":"getting_started_downloading.html#file_import_options","title":"File import options","text":"This deals with the files being downloaded and what should happen to them. There's a few more tickboxes if you turn on advanced mode.
pre-import checks Pretty self-explanatory for the most part. If you want to redownload previously deleted files, turning off exclude previously deleted files
will have Hydrus ignore deletion status. A few of the options have more information if you hover over them. import destinations See multiple file services, an advanced feature. post import actions See the files section on filtering for the first option, the other two have information if you hover over them."},{"location":"getting_started_downloading.html#tag_parsing","title":"Tag Parsing","text":"By default, hydrus now starts with a local tag service called 'downloader tags' and it will parse (get) all the tags from normal gallery sites and put them in this service. You don't have to do anything, you will get some decent tags. As you use the client, you will figure out which tags you like and where you want them. On the downloader page, click import options
:
This is an important dialog, although you will not need to use it much. It governs which tags are parsed and where they go. To keep things easy to manage, a new downloader will refer to the 'default' tag import options for a website, but for now let's set some values just for this downloader:
You can see that each tag service on your client has a separate section. If you add the PTR, that will get a new box too. A new client is set to get all tags for 'downloader tags' service. Things can get much more complicated. Have a play around with the options here as you figure things out. Most of the controls have tooltips or longer explainers in sub-dialogs, so don't be afraid to try things.
It is easy to get tens of thousands of tags by downloading this way. Different sites offer different kinds and qualities of tags, and the client's downloaders (which were designed by me, the dev, or a user) may parse all or only some of them. Many users like to just get everything on offer, but others only ever want, say, creator
, series
, and character
tags. If you feel brave, click that 'all tags' button, which will take you into hydrus's advanced 'tag filter', which allows you to select which of the incoming list of tags will be added.
The blacklist button will let you skip downloading files that have certain tags (perhaps you would like to auto-skip all images with gore
, scat
, or diaper
?), again using the tag filter, while the whitelist enables you to only allow files that have at least one of a set of tags. The 'additional tags' adds some fixed personal tags to all files coming in--for instance, you might like to add 'process into favourites' to your 'my tags' for some query you really like so you can find those files again later and process them separately. That little 'cog' icon button can also do some advanced things.
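The way the blacklist, whitelist, tag filter, and additional tags fit together can be sketched roughly as follows. The function names are made up, and the real tag filter is far more flexible than the simple namespace check shown here. Tags are assumed to be plain strings, with namespaced tags written as 'namespace:subtag'.

```python
def should_import(file_tags, blacklist=(), whitelist=()):
    tags = set(file_tags)
    # Any blacklisted tag means the file is skipped.
    if tags & set(blacklist):
        return False
    # If a whitelist is set, the file needs at least one of its tags.
    if whitelist and not tags & set(whitelist):
        return False
    return True


def tags_to_save(file_tags, wanted_namespaces=None, additional_tags=()):
    kept = []
    for tag in file_tags:
        namespace = tag.split(':', 1)[0] if ':' in tag else ''
        # None means 'get everything on offer'.
        if wanted_namespaces is None or namespace in wanted_namespaces:
            kept.append(tag)
    # Additional tags are fixed personal tags added to every file.
    return kept + list(additional_tags)


if __name__ == '__main__':
    parsed = ['creator:someone', 'series:something', 'blue_eyes']
    if should_import(parsed, blacklist=['gore']):
        print(tags_to_save(parsed, {'creator', 'series', 'character'},
                           ['process into favourites']))
```

In the client the additional tags can go to a different tag service than the parsed ones; the sketch ignores services entirely.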
Warning
The file limit and import options on the upper panel of a gallery or watcher page, if changed, will only apply to new queries. If you want to change the options for an existing queue, either do so on its highlight panel below or use the 'set options to queries' button.
"},{"location":"getting_started_downloading.html#note_parsing","title":"Note Parsing","text":"Hydrus alsos parse 'notes' from some sites. This is a young feature, and a little advanced at times, but it generally means the comments that artists leave on certain gallery sites, or something like a tweet text. Notes are editable by you and appear in a hovering window on the right side of the media viewer.
Most of the controls here ensure that successive parses do not duplicate existing notes. The default settings are fine for all normal purposes, and you can leave them alone unless you know you want something special (e.g. turning note parsing off completely).
"},{"location":"getting_started_downloading.html#bandwidth","title":"Bandwidth","text":"It will not be too long until you see a \"bandwidth free in xxxxx...\" message. As a long-term storage solution, hydrus is designed to be polite in its downloading--both to the source server and your computer. The client's default bandwidth rules have some caps to stop big mistakes, spread out larger jobs, and at a bare minimum, no domain will be hit more than once a second.
All the bandwidth rules are completely customisable and are found in network > data > review bandwidth usage and edit rules
. They can get quite complicated. I strongly recommend you not look for them until you have more experience. I especially strongly recommend you not ever turn them all off, thinking that will improve something, as you'll probably render the client too laggy to function and get yourself an IP ban from the next server you pull from.
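For a feel of what 'no domain will be hit more than once a second' means in practice, here is a minimal per-domain throttle. It is an illustration only, not how the client implements its bandwidth rules, and the class and method names are hypothetical.

```python
import time
from urllib.parse import urlsplit


class DomainThrottle:
    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_hit = {}  # domain -> time of the most recent request

    def wait_for(self, url):
        # Block until this URL's domain may be hit again; return seconds waited.
        domain = urlsplit(url).hostname
        now = self._clock()
        waited = 0.0
        last = self._last_hit.get(domain)
        if last is not None:
            remaining = self.min_interval - (now - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
                now = self._clock()
        self._last_hit[domain] = now
        return waited


if __name__ == '__main__':
    throttle = DomainThrottle()
    for url in ['https://example.com/a', 'https://example.com/b']:
        print(url, 'waited', round(throttle.wait_for(url), 2))
```

Each domain is tracked separately, so work on one site does not hold up another.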
If you want to download 10,000 files, set up the queue and let it work. The client will take breaks, likely even to the next day, but it will get there in time. Many users like to leave their clients on all the time, just running in the background, which makes these sorts of downloads a breeze--you check back in the evening and discover your download queues, watchers, and subscriptions have given you another thousand things to deal with.
Again: the real problem with downloading is not finding new things, it is keeping up with what you get. Start slow and figure out what is important to your bandwidth budget, hard drive budget, and free time budget. Almost everyone fails at this.
"},{"location":"getting_started_downloading.html#logins","title":"Logins","text":"The client now supports a flexible (but slightly prototype and ugly) login system. It can handle simple sites and is as completely user-customisable as the downloader system. The client starts with multiple login scripts by default, which you can review under network->logins->manage logins:
Many sites grant all their content without you having to log in at all, but others require it for NSFW or special content, or you may wish to take advantage of site-side user preferences like personal blacklists. If you wish, you can give hydrus some login details here, and it will try to login--just as a browser would--before it downloads anything from that domain.
Warning
For multiple reasons, I do not recommend you use important accounts with hydrus. Use a throwaway account you don't care much about.
To start using a login script, select the domain and click 'edit credentials'. You'll put in your username/password, and then 'activate' the login for the domain, and that should be it! The next time you try to get something from that site, the first request will wait (usually about ten seconds) while a login popup performs the login. Most logins last for about thirty days (and many refresh that 30-day timer every time you make a new request), so once you are set up, you usually never notice it again, especially if you have a subscription on the domain.
Most sites only have one way of logging in, but hydrus does support more. Hentai Foundry is a good example--by default, the client performs the 'click-through' login as a guest, which requires no credentials and means any hydrus client can get any content from the start. But this way of logging in only lasts about 60 minutes or so before having to be refreshed, and it does not hide any spicy stuff, so if you use HF a lot, I recommend you create a throwaway account, set the filters you like in your HF profile (e.g. no guro content), and then click the 'change login script' in the client to the proper username/pass login.
The login system is not very clever. Don't try to pull off anything too weird with it! If anything goes wrong, it will likely delay the script (and hence the whole domain) from working for a while, or invalidate it entirely. If the error is something simple, like a password typo or current server maintenance, go back to this dialog to fix and scrub the error and try again. If the site just changed its layout, you may need to update the login script. If it is more complicated, please contact me, hydrus_dev, with the details!
If you would like to login to a site that is not yet supported by hydrus (usually ones with a Captcha in the login page), you have two options:
Boorus are usually easy to parse from, and there are many hydrus downloaders available that work well. Other sites are less easy to download from. Some will purposefully disguise access behind captchas or difficult login tokens that the hydrus downloader just isn't clever enough to handle. In these cases, it can be best just to go to an external downloader program that is specially tuned for these complex sites.
It takes a bit of time to set up these sorts of programs--and if you get into them, you'll likely want to make a script to help automate their use--but if you know they solve your problem, it is well worth it!
With these tools, used manually and/or with some scripts you set up, you may be able to set up a regular import workflow to hydrus (especially with an Import Folder
as under the file
menu) and get most of what you would with an internal downloader. Some things like known URLs and tag parsing may be limited or non-existent, but it is better than nothing, and if you only need to do it for a couple sources on a couple sites every month, you can fill in most of the gap manually yourself.
Hydev is planning to roll yt-dlp and gallery-dl support into the program natively in a future update of the downloader engine.
"},{"location":"getting_started_files.html","title":"Getting started with files","text":"Warning
Hydrus can be powerful, and you control everything. By default, you are not connected to any servers and absolutely nothing is shared with other users--and you can't accidentally one-click your way to exposing your whole collection--but if you tag private files with real names and click to upload that data to a tag repository that other people have access to, the program won't try to stop you. If you want to do private sexy slideshows of your shy wife, that's great, but think twice before you upload files or tags anywhere, particularly as you learn. It is impossible to contain leaks of private information.
There are no limits and few brakes on your behaviour. It is possible to import millions of files. For many new users, their first mistake is downloading too much too fast in overexcitement and becoming overwhelmed. Take things slow and figure out good processing workflows that work for your schedule before you start adding 500 subscriptions.
"},{"location":"getting_started_files.html#the_problem","title":"The problem","text":"If you have ever seen something like this--
--then you already know the problem: using a filesystem to manage a lot of images sucks.
Finding the right picture quickly can be difficult. Finding everything by a particular artist at a particular resolution is unthinkable. Integrating new files into the whole nested-folder mess is a further pain, and most operating systems bug out when displaying 10,000+ thumbnails.
"},{"location":"getting_started_files.html#the_client","title":"The client","text":"Let's first focus on importing files.
When you first boot the client, you will see a blank page. There are no files in the database and so there is nothing to search. To get started, I suggest you simply drag-and-drop a folder with a hundred or so images onto the main window. A dialog will appear affirming what you want to import. Ok that, and a new page will open. Thumbnails will stream in as the software processes each file.
The files are being imported into the client's database. The client discards their filenames.
Notice your original folder and its files are untouched. You can move the originals somewhere else, delete them, and the client will still return searches fine. In the same way, you can delete from the client, and the original files will remain unchanged--import is a copy, not a move, operation. The client performs all its operations on its internal database, which holds copies of the files it imports. If you find yourself enjoying using the client and decide to completely switch over, you can delete the original files you import without worry. You can always export them back again later.
FAQ: can the client manage files from their original locations?
Now:
Move your mouse to the top-left, top-middle and top-right of the media viewer. You should see some 'hover' panels pop into place.
The one on the left is for tags, the middle is for browsing and zoom commands, and the right is for status and ratings icons. You will learn more about these things as you get more experience with the program.
Press Enter or double/middle-click again to close the media viewer.
On the left of a normal search page is a text box. When it is focused, a dropdown window appears. It looks like this:
This is where you enter the predicates that define the current search. If the text box is empty, the dropdown will show 'system' tags that let you search by file metadata such as file size or animation duration. To select one, press the up or down arrow keys and then enter, or double click with the mouse.
When you have some tags in your database, typing in the text box will search them:
The (number) shows how many files have that tag, and hence how large the search result will be if you select that tag.
Clicking 'searching immediately' will pause the searcher, letting you add several tags in a row without sending it off to get results immediately. Ignore the other buttons for now--you will figure them out as you gain experience with the program.
You can remove from the list of 'active tags' in the box above with a double-click, or by entering the exact same tag again through the dropdown.
Hydrus supports many filetypes. A full list can be viewed on the Supported Filetypes page.
Support for some of the more complicated filetypes is imperfect, though. For the Windows and Linux built releases, hydrus now embeds an MPV player for video, audio and gifs, which provides smooth playback and audio, but some other environments may not support MPV and so will default when possible to the native hydrus software renderer, which does not support audio. When something does not render how you want, right-clicking on its thumbnail presents the option 'open externally', which will open the file in the appropriate default program (e.g. ACDSee, VLC).
The client can also download files from several websites, including 4chan and other imageboards, many boorus, and gallery sites like deviant art and hentai foundry. You will learn more about this later.
"},{"location":"getting_started_files.html#inbox_and_archive","title":"Inbox and archive","text":"The client sends newly imported files to an inbox, just like your email. Inbox acts like a tag, matched by 'system:inbox'. A small envelope icon is drawn in the top corner of all inbox files:
If you are sure you want to keep a file long-term, you should archive it, which will remove it from the inbox. You can archive from your selected thumbnails' right-click menu, or by pressing F7. If you make a mistake, you can spam Ctrl+Z for undo or hit Shift+F7 on any set of files to explicitly return them to the inbox.
Anything you do not want to keep should be deleted by selecting from the right-click menu or by hitting the delete key. Deleted files are sent to the trash. They will get a little trash icon:
A trashed file will not appear in subsequent normal searches, although you can search the trash specifically by clicking the 'my files' button on the autocomplete dropdown and changing the file domain to 'trash'. Undeleting a file (Shift+Del) will return it to 'my files' as if nothing had happened. Files that remain in the trash will be permanently deleted, usually after a few days. You can change the permanent deletion behaviour in the client's options.
A quick way of processing new files is\u2013
"},{"location":"getting_started_files.html#filtering_your_inbox","title":"Filtering your inbox","text":"Lets say you just downloaded a good thread, or perhaps you just imported an old folder of miscellany. You now have a whole bunch of files in your inbox--some good, some awful. You probably want to quickly go through them, saying yes, yes, yes, no, yes, no, no, yes, where yes means 'keep and archive' and no means 'delete this trash'. Filtering is the solution.
Select some thumbnails, and either choose filter->archive/delete from the right-click menu or hit F12. You will see them in a special version of the media viewer, with the following default controls:
Your choices will not be committed until you finish filtering.
This saves time.
"},{"location":"getting_started_files.html#what_hydrus_is_for","title":"What Hydrus is for","text":"The hydrus client's workflows are not designed for half-finished files that you are still working on. Think of it as a giant archive for everything excellent you have decided to store away. It lets you find and remember these things quickly.
In general, Hydrus is good for individual files like you commonly find on imageboards or boorus. Although advanced users can cobble together some page-tag-based solutions, it is not yet great for multi-file media like comics, and it is definitely not a typical playlist-based music player.
If you are looking for a comic manager to supplement hydrus, check out this user-made guide to other archiving software here!
And although the client can hold millions of files, it starts to creak and chug when displaying or otherwise tracking more than about 40,000 or so in a single gui window. As you learn to use it, please try not to let your download queues or general search pages regularly sit at more than 40 or 50k total items, or you'll start to slow other things down. Another common mistake is to leave one large 'system:everything' or 'system:inbox' page open with 70k+ files. For these sorts of 'ongoing processing' pages, try adding a 'system:limit=256' to keep them snappy. One user mentioned he had regular gui hangs of thirty seconds or so, and when we looked into it, it turned out his handful of download pages had three million files queued up! Just try and take things slow until you figure out what your computer's limits are.
"},{"location":"getting_started_importing.html","title":"Importing and exporting","text":"By now you should have launched Hydrus. If you're like most new users you probably already have a fair bit of images or other media files that you're looking at getting organised.
Note
If you're planning to import or export a large number of files, it's recommended to use the automated folders, since Hydrus can have trouble dealing with large, single jobs. Splitting them up in this manner will make it much easier on the program.
"},{"location":"getting_started_importing.html#importing_files","title":"Importing files","text":"Navigate to file -> import files
in the toolbar. OR Drag-and-drop one or more folders or files into Hydrus.
This will open the import files
window. Here you can add files or folders, or delete files from the import queue. Let Hydrus parse what it will update and then look over the options. By default the option to delete original files after successful import (if it's ignored for any reason or already present in Hydrus for example) is not checked; activate it at your own risk. In file import options
you can find some settings for minimum and maximum file size, resolution, and whether to import previously deleted files or not.
From here there are two options: import now
which will just import as is, and add tags before import >>
which lets you set up some rules to add tags to files on import. Examples are keeping the filename as a tag, adding folders as tags (useful if you have some sort of folder-based organisation scheme), or loading tags from an accompanying text file generated by some other program.
Once you're done click apply (or import now
) and Hydrus will start processing the files. Exact duplicates are not imported, so if you had dupes spread out you will end up with only one file in the end. If files look similar but Hydrus imports both, then that's a job for the dupe filter, as there is some difference even if you can't tell it by eye. A common cause is compression giving files different file sizes while they otherwise look identical, or files having extra metadata baked into them.
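To make the 'exact duplicate' idea concrete, here is a minimal sketch of content-hash deduplication. It assumes SHA-256 as the hash and uses hypothetical helper names (`file_sha256`, `pick_new_files`); it is an illustration of the idea, not the client's actual import code.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    # Hash the file in chunks so large videos do not need to fit in memory.
    digest = hashlib.sha256()
    with Path(path).open('rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def pick_new_files(candidates, known_hashes):
    # Return the candidates whose content hash has not been seen before.
    # known_hashes is updated in place, so a second byte-identical copy
    # in the same batch is skipped too.
    accepted = []
    for path in candidates:
        file_hash = file_sha256(path)
        if file_hash in known_hashes:
            continue
        known_hashes.add(file_hash)
        accepted.append(Path(path))
    return accepted
```

Two files that merely look alike (say, one recompressed copy) hash differently, which is why they both get through and end up in the dupe filter instead.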
If you want to share your files then export is the way to go. The basic way is to select the files in Hydrus, drag them from there, and drop them where you want them. You can also copy files or use export files to, well, export your files to a select location. All (or at least most) non-drag'n'drop export options can be found by right-clicking the selected files and going down share
and then either copy
or export
.
Just dragging from the thumbnail view will export (copy) all the selected files to wherever you drop them. You can also start a drag and drop for single files from the media viewer using this arrow button on the top hover window:
If you want to drag and drop to discord, check the special BUGFIX option under options > gui
. You will also find a filename pattern setting for that drag and drop here.
By default, the files will be named by their ugly hexadecimal hash, which is how they are stored inside the database.
If you use a drag and drop to open a file inside an image editing program, remember to hit 'save as' and give it a new filename in a new location! The client does not expect files inside its db directory to ever change.
"},{"location":"getting_started_importing.html#copy","title":"Copy","text":"You can also copy the files by right-clicking and going down share -> copy -> files
and then pasting the files where you want them.
You can also export files with tags, either in filename or as a sidecar file by right-clicking and going down share -> export -> files
. Have a look at the settings and then press export
. You can create folders to export files into by using backslashes on Windows (\\
) and slashes on Linux (/
) in the filename. This can be combined with the patterns listed in the pattern shortcut button dropdown. As an example, [series]\\{filehash}
will export files into folders named after the series:
namespaced tags on the files: all files tagged with one series go into one folder, and files tagged with another series go into another folder, as seen in the image below.
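As a rough illustration of how such a pattern could expand, the sketch below substitutes `{filehash}` and `[namespace]` tokens in a filename pattern. The function name `expand_pattern`, the comma-joining of multiple tags, and the handling of missing namespaces are all assumptions made for this example; the client's real substitution rules may differ. The example uses the Linux-style `/` separator.

```python
def expand_pattern(pattern, file_hash, tags):
    # Substitute the {filehash} token with the file's hash.
    result = pattern.replace('{filehash}', file_hash)

    # Group subtags by namespace, e.g. 'series:metroid' -> {'series': ['metroid']}.
    by_namespace = {}
    for tag in tags:
        namespace, separator, subtag = tag.partition(':')
        if separator:
            by_namespace.setdefault(namespace, []).append(subtag)

    # Substitute each [namespace] token with that namespace's subtags.
    # Assumption: multiple subtags are joined with ', '. A token whose
    # namespace has no tags on the file is left untouched in this sketch.
    for namespace, subtags in by_namespace.items():
        token = '[' + namespace + ']'
        result = result.replace(token, ', '.join(sorted(subtags)))
    return result


path = expand_pattern(
    '[series]/{filehash}',
    'abc123',
    ['series:metroid', 'character:samus aran'],
)
```

Here `path` comes out as `metroid/abc123`, so every file tagged with the same series lands in the same folder.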
Clicking the pattern shortcuts
button gives you an overview of available patterns.
The EXPERIMENTAL option is only available under advanced mode, use at your own risk.
"},{"location":"getting_started_importing.html#automation","title":"Automation","text":"Under file -> import and export folders
you'll find options for setting up automated import and export folders that can run on a schedule. Both have a fair deal of options and rules you can set so look them over carefully.
Like with a manual import, if you wish you can import tags by parsing filenames or loading sidecars.
"},{"location":"getting_started_importing.html#export_folders","title":"Export folders","text":"Like with manual export, you can set the filenames using a tag pattern, and you can export to sidecars too.
"},{"location":"getting_started_importing.html#importing_and_exporting_tags","title":"Importing and exporting tags","text":"While you can import and export tags together with images sometimes you just don't want to deal with the files.
Going to tags -> migrate tags
you get a window that lets you deal with just tags. One of the options here is what's called a Hydrus Tag Archive, a file containing the hash <-> tag mappings for the files and tags matching the query.
If any of this is confusing, a simpler guide is here, and some video guides are here!
"},{"location":"getting_started_installing.html#downloading","title":"Downloading","text":"You can get the latest release at the github releases page.
I try to release a new version every Wednesday by 8pm EST and write an accompanying post on my tumblr and a Hydrus Network General thread on 8chan.moe /t/.
"},{"location":"getting_started_installing.html#installing","title":"Installing","text":"The hydrus releases are 64-bit only. If you are a python expert, there is the slimmest chance you'll be able to get it running from source on a 32-bit machine, but it would be easier just to find a newer computer to run it on.
Windows / macOS / Linux / Docker / From Source
hydrus-network
in the 'Extras' bucket) winget install --id=HydrusNetwork.HydrusNetwork -e --location \"\\PATH\\TO\\INSTALL\\HERE\"
, which can, if you know what you are doing, be winget install --id=HydrusNetwork.HydrusNetwork -e --location \".\\\"
, maybe rolled into a batch file.
apt-get install libmpv1
OSError: /lib/x86_64-linux-gnu/libgio-2.0.so.0: undefined symbol: g_module_open_full\n(traceback)\npyimod04_ctypes.install.<locals>.PyInstallerImportError: Failed to load dynlib/dll 'libmpv.so.1'. Most likely this dynlib/dll was not found when the application was frozen.\n
Then please do this: libgmodule*
. You are looking for something like libgmodule-2.0.so
. Users report finding it in /usr/lib64/
and /usr/lib/x86_64-linux-gnu
.
By default, hydrus stores all its data\u2014options, files, subscriptions, everything\u2014entirely inside its own directory. You can extract it to a usb stick, move it from one place to another, have multiple installs for multiple purposes, wrap it all up inside a truecrypt volume, whatever you like. The .exe installer writes some unavoidable uninstall registry stuff to Windows, but the 'installed' client itself will run fine if you manually move it.
Bad Locations
Do not install to a network location! (i.e. on a different computer's hard drive) The SQLite database is sensitive to interruption and requires good file locking, which network interfaces often fake. There are ways of splitting your client up so the database is on a local SSD but the files are on a network--this is fine--but you really should not put the database on a remote machine unless you know what you are doing and have a backup in case things go wrong.
Do not install to a location with filesystem-level compression enabled! It may work ok to start, but when the SQLite database grows large, this can cause extreme access latency, I/O errors, and corruption.
For macOS users
The Hydrus App is non-portable and puts your database in ~/Library/Hydrus
(i.e. /Users/[You]/Library/Hydrus
). You can update simply by replacing the old App with the new, but if you wish to backup, you should be looking at ~/Library/Hydrus
, not the App itself.
Hydrus is made by an Anon out of duct tape and string. It combines file parsing tech with lots of network and database code in unusual and powerful ways, and all through a hacked-together executable that isn't signed by any big official company.
Unfortunately, we have been hit by anti-virus false positives throughout development. Every few months, one or more of the larger anti-virus programs sees some code that looks like something bad, or they run the program in a testbed and don't like something it does, and then they quarantine it. Every single instance of this so far has been a false positive. They usually go away the next week or two when the next set of definitions roll out. Some hydrus users are kind enough to report the program as a false positive to the anti-virus companies themselves, which also helps here.
Some users have never had the problem, some get hit regularly. The situation is obviously worse on Windows. If you try to extract the zip and hydrus_client.exe or the whole folder suddenly disappears, please check your anti-virus software.
I am interested in reports about these false-positives, just so I know what is going on. Sometimes I have been able to reduce problems by changing something in the build (one of these was, no shit, an anti-virus testbed running the installer and then opening the help html at the end, which launched Edge browser, which then triggered Windows Update, which hit UAC and was considered suspicious. I took out the 'open help' checkbox from the installer as a result).
You should be careful about random software online. For my part, the program is completely open source, and I have a long track record of designing it with privacy foremost. There is no intentional spyware of any sort--the program never connects to another computer unless you tell it to. Furthermore, the exe you download is now built on github's cloud, so there are very few worries about a trojan-infected build environment putting something I did not intend into the program (as there once were when I built the release on my home machine). That doesn't stop Windows Defender from sometimes calling it an ugly name like \"Tedy.4675\" and definitively declaring \"This program is dangerous and executes commands from an attacker\" but that's the modern anti-virus ecosystem.
There aren't excellent solutions to this problem. I don't like to say 'just exclude the program directory from your anti-virus settings', but some users are comfortable with this and say it works fine. One thing I do know that helps (with other things too), if you are using the default Windows Defender, is going into the Windows Security shield icon on your taskbar, and 'virus and threat protection' and then 'virus and threat protection settings', and turning off 'Cloud-delivered protection' and 'Automatic sample submission'. It seems with these on, Windows will talk with a central server about executables you run and download early updates, and this gives a lot of false positives.
If you are still concerned, please feel free to run from source, as above. You are controlling everything, then, and can change anything about the program you like. Or you can only run releases from four weeks ago, since you know the community would notice by then if there ever were a true positive. Or just run it in a sandbox and watch its network traffic.
In 2022 I am going to explore a different build process to see if that reduces the false positives. We currently make the executable with PyInstaller, which has some odd environment set-up the anti-virus testbeds don't seem to like, and perhaps PyOxidizer will be better. We'll see.
"},{"location":"getting_started_installing.html#running","title":"Running","text":"To run the client:
Windows / macOS / Linux
./client
from the terminal.
Warning
Hydrus is imageboard-tier software, wild and fun but unprofessional. It is written by one Anon spinning a lot of plates. Mistakes happen from time to time, usually in the update process. There are also no training wheels to stop you from accidentally overwriting your whole db if you screw around. Be careful when updating. Make backups beforehand!
Hydrus does not auto-update. It will stay the same version unless you download and install a new one.
Although I put out a new version every week, you can update far less often if you prefer. The client keeps to itself, so if it does exactly what you want and a new version does nothing you care about, you can just leave it. Other users enjoy updating every week, simply because it makes for a nice schedule. Others like to stay a week or two behind what is current, just in case I mess up and cause a temporary bug in something they like.
A user has written a longer and more formal guide to updating, and information on the 334->335 step (python2 to python3) here.
The 526->527 step was also important. 527 changed the program executable name from 'client' to 'hydrus_client'. There was also a library update that caused a dll conflict with previous installs.
If you need to update from 526 or before, then:
git pull
as normal. If you haven't already, feel free to run setup_venv again to get the new OpenCV. Update your launch scripts to point at the new hydrus_client.py
boot scripts.
The update process:
Unless the update specifically disables or reconfigures something, all your files and tags and settings will be remembered after the update.
Releases typically need to update your database to their version. New releases can retroactively perform older database updates, so if the new version is v255 but your database is on v250, you generally only need to get the v255 release, and it'll do all the intervening v250->v251, v251->v252, etc... update steps in order as soon as you boot it. If you need to update from a release more than, say, ten versions older than current, see below. You might also like to skim the release posts or changelog to see what is new.
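The 'intervening steps in order' behaviour can be pictured with a small sketch. The function `run_update_steps` and the `steps` dictionary are hypothetical names for this illustration; the real client's update code is organised differently.

```python
def run_update_steps(db_version, release_version, steps):
    # steps is a hypothetical dict mapping version N to a function
    # that upgrades a database from N to N + 1.
    applied = []
    while db_version < release_version:
        steps[db_version]()  # e.g. the v250->v251 step
        applied.append((db_version, db_version + 1))
        db_version += 1
    return db_version, applied


log = []
steps = {version: (lambda v=version: log.append(v)) for version in range(250, 255)}
final_version, applied = run_update_steps(250, 255, steps)
```

After this runs, `final_version` is 255 and `applied` lists the five single-version hops from v250 to v255, each performed once and in order.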
Clients and servers of different versions can usually connect to one another, but from time to time, I make a change to the network protocol, and you will get polite error messages if you try to connect to a newer server with an older client or vice versa. There is still no need to update the client--it'll still do local stuff like searching for files completely fine. Read my release posts and judge for yourself what you want to do.
"},{"location":"getting_started_installing.html#clean_installs","title":"Clean installs","text":"This is usually only relevant if you know you have a dll conflict or otherwise update and cannot boot at all.
Very rarely, hydrus needs a clean install. This can be due to a special update like when we moved from 32-bit to 64-bit or needing to otherwise 'reset' a custom install situation. The problem is usually that a library file has been renamed in a new version and hydrus has trouble figuring out whether to use the older one (from a previous version) or the newer.
In any case, if you cannot boot hydrus and it either fails silently or you get a crash log or system-level error popup complaining in a technical way about not being able to load a dll/pyd/so file, you may need a clean install, which essentially means clearing any old files out and reinstalling.
However, you need to be careful not to delete your database! It sounds silly, but at least one user has made a mistake here. The process is simple, do not deviate:
After that, you'll have a 'clean' version of hydrus that only has the latest version's dlls. If hydrus still will not boot, I recommend you roll back to your last working backup and let me, hydrus dev, know what your error is.
"},{"location":"getting_started_installing.html#big_updates","title":"Big updates","text":"If you have not updated in some time--say twenty versions or more--doing it all in one jump, like v250->v290, is likely not going to work. I am doing a lot of unusual stuff with hydrus, change my code at a fast pace, and do not have a ton of testing in place. Hydrus update code often falls to bitrot, and so some underlying truth I assumed for the v255->v256 code may not still apply six months later. If you try to update more than 50 versions at once (i.e. trying to perform more than a year of updates in one go), the client will give you a polite error rather than even try.
As a result, if you get a failure on trying to do a big update, try cutting the distance in half--try v270 first, and then if that works, try v270->v290. If it doesn't, try v260, and so on.
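The 'cut the distance in half' routine can be written down as a short sketch. Here `try_update(start, end)` is a hypothetical callback standing in for 'restore your backup at `start`, install version `end`, and see whether it boots'; nothing like it ships with hydrus.

```python
def plan_big_update(current, target, try_update):
    # Returns the list of versions actually stepped through.
    path = [current]
    while current < target:
        step = target
        # Halve the jump until one works.
        while not try_update(current, step):
            if step - current <= 1:
                # Even a single-version hop fails: time to report it.
                raise RuntimeError(f'v{current}->v{step} fails on its own')
            step = current + (step - current) // 2
        current = step
        path.append(current)
    return path


# Pretend any jump of more than 20 versions fails.
route = plan_big_update(250, 290, lambda start, end: end - start <= 20)
```

With that pretend failure rule, `route` is `[250, 270, 290]`, matching the v270-then-v290 example above.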
If you narrow the gap down to just one version and still get an error, please let me know. I am very interested in these sorts of problems and will be happy to help figure out a fix with you (and everyone else who might be affected).
All that said, and while updating is complex and every client is different, various user reports over the years suggest this route works and is efficient: 204 > 238 > 246 > 291 > 328 > 335 > 376 > 421 > 466 > 474 > 480 > 521
"},{"location":"getting_started_installing.html#backing_up","title":"Backing up","text":"I am not joking around: if you end up liking hydrus, you should back up your database
Maintaining a regular backup is important for hydrus. The program stores a lot of complicated data that you will put hours and hours of work into, and if you only have one copy and your hard drive breaks, you could lose everything. This has happened before--to people who thought it would never happen to them--and it sucks big time to go through. Don't let it be you.
Hydrus's database engine, SQLite, is excellent at keeping data safe, but it cannot work in a faulty environment. Ways in which users of hydrus have damaged/lost their database:
Some of those you can mitigate (don't run the database over a network!) and some will always be a problem, but if you have a backup, none of them can kill you.
This mostly means your database, not your files
Note that nearly all the serious and difficult-to-fix problems occur to the database, which is four large .db files, not your media. All your images and movies are read-only in hydrus, and there's less worry if they are on a network share with bad locks or a machine that suddenly loses power. The database, however, maintains a live connection, with regular complex writes, and here a hardware failure can lead to corruption (basically the failure scrambles the data that is written, so when you try to boot back up, a small section of the database is incomprehensible garbage).
If you do not already have a backup routine for your files, this is a great time to start. I now run a backup every week of all my data so that if my computer blows up or anything else awful happens, I'll at worst have lost a few days' work. Before I did this, I once lost an entire drive with tens of thousands of files, and it felt awful. If you are new to saving a lot of media, I hope you can avoid what I felt. ;_;
I use ToDoList to remind me of my jobs for the day, including backup tasks, and FreeFileSync to actually mirror over to an external usb drive. I recommend both highly (and for ToDoList, I recommend hiding the complicated columns, stripping it down to a simple interface). It isn't a huge expense to get a couple-TB usb drive either--it is absolutely worth it for the peace of mind.
By default, hydrus stores all your user data in one location, so backing up is simple:
"},{"location":"getting_started_installing.html#the_simple_way_-_inside_the_client","title":"The simple way - inside the client","text":"Go database->set up a database backup location in the client. This will tell the client where you want your backup to be stored. A fresh, empty directory on a different drive is ideal.
Once you have your location set up, you can thereafter hit database->update database backup. It will lock everything and mirror your files, showing its progress in a popup message. The first time you make this backup, it may take a little while (as it will have to fully copy your database and all its files), but after that, it will only have to copy new or altered files and should only ever take a couple of minutes.
Advanced users who have migrated their database and files across multiple locations will not have this option--use an external program in this case.
"},{"location":"getting_started_installing.html#the_powerful_and_best_way_-_using_an_external_program","title":"The powerful (and best) way - using an external program","text":"Doing it yourself is best. If you are an advanced user with a complicated hydrus install migrated across multiple drives, then you will have to do it this way--the simple backup will be disabled.
You need to backup two things, which are both, by default, beneath install_dir/db: the four client*.db files and your client_files directory(ies). The .db files contain absolutely everything about your client and files--your settings and file lists and metadata like inbox/archive and tags--while the client_files subdirs store your actual media and its thumbnails.
If everything is still under install_dir/db, then it is usually easiest to just backup the whole install dir, keeping a functional 'portable' copy of your install that you can restore no prob. Make sure you keep the .db files together--they are not interchangeable and mostly useless on their own!
An example FreeFileSync profile for backing up a database will look like this:
Note it has 'file time and size' and 'mirror' as the main settings. This quickly ensures that changes to the left-hand side are copied to the right-hand side, adding new files and removing since-deleted files and overwriting modified files. You can save a backup profile like that and it should only take a few minutes every week to stay safely backed up, even if you have hundreds of thousands of files.
Shut the client down while you run the backup, obviously.
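For readers who prefer to see the 'mirror' plus 'file time and size' behaviour spelled out, here is a minimal one-way mirror sketch. It is not how FreeFileSync is implemented; the function names and the two-second timestamp tolerance are assumptions for illustration. Only run something like this with the client shut down.

```python
import shutil
from pathlib import Path

MTIME_TOLERANCE_SECONDS = 2  # assumed slack for coarse filesystem timestamps


def needs_copy(source_file, dest_file):
    # 'File time and size' comparison: copy if missing, resized, or retimed.
    if not dest_file.is_file():
        return True
    src, dst = source_file.stat(), dest_file.stat()
    if src.st_size != dst.st_size:
        return True
    return abs(src.st_mtime - dst.st_mtime) > MTIME_TOLERANCE_SECONDS


def remove(path):
    if path.is_dir() and not path.is_symlink():
        shutil.rmtree(path)
    else:
        path.unlink()


def mirror(source_dir, dest_dir):
    # Make dest_dir (right-hand side) match source_dir (left-hand side).
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    source_names = {entry.name for entry in source_dir.iterdir()}

    # Remove anything on the right that no longer exists on the left.
    for entry in list(dest_dir.iterdir()):
        if entry.name not in source_names:
            remove(entry)

    # Add new files and overwrite modified ones.
    for entry in source_dir.iterdir():
        target = dest_dir / entry.name
        if entry.is_dir():
            if target.exists() and not target.is_dir():
                remove(target)
            mirror(entry, target)
        else:
            if target.is_dir():
                remove(target)
            if needs_copy(entry, target):
                shutil.copy2(entry, target)  # copy2 keeps the modified time
```

Because `shutil.copy2` preserves modification times, a second run right after the first finds nothing to copy, which is why a weekly mirror of a large collection stays quick.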
"},{"location":"getting_started_installing.html#a_few_options","title":"A few options","text":"There are a host of other great alternatives out there, probably far too many to count. These are a couple that are often recommended and used by Hydrus users and are, in the spirit of Hydrus Network itself, free and open source.
FreeFileSync Linux, MacOS, Windows. Recommended and used by dev. Somewhat basic but does the job well enough.
Borg Backup FreeBSD, Linux, MacOS. More advanced and featureful backup tool.
Restic Almost every OS you can name.
Danger
Do not put your live database in a folder that continuously syncs to a cloud backup. Many of these services will interfere with a running client and can cause database corruption. If you still want to use a system like this, either turn the sync off while the client is running, or use the above backup workflows to safely backup your client to a separate folder that syncs to the cloud.
There is significantly more information about the database structure here.
I recommend you always backup before you update, just in case there is a problem with my update code that breaks your database. If that happens, please contact me, describing the problem, and revert to the functioning older version. I'll get on any problems like that immediately.
"},{"location":"getting_started_installing.html#backing_up_small","title":"Backing up with not much space","text":"If you decide not to maintain a backup because you cannot afford drive space for all your files, please please at least back up your actual database files. Use FreeFileSync or a similar program to back up the four 'client*.db' files in install_dir/db when the client is not running. Just make sure you have a copy of those files, and then if your main install becomes damaged, we will have a reference to either roll back to or manually restore data from. Even if you lose a bunch of media files in this case, with an intact database we'll be able to schedule recovery of anything with a URL.
If you are really short on space, note also that the database files are very compressible. A very large database where the four files add up to 70GB can compress down to 17GB zip with 7zip on default settings. Better compression ratios are possible if you make sure to put all four files in the same archive and turn up the quality. This obviously takes some additional time to do, but if you are really short on space it may be the only way it fits, and if your only backup drive is a slow USB stick, then you might actually save time from not having to transfer the other 53GB! Media files (jpegs, webms, etc...) are generally not very compressible, usually 5% at best, so it is usually not worth trying.
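As a sketch of the 'all four files in the same archive' idea, the snippet below uses Python's standard `zipfile` module with LZMA compression as a stand-in for 7zip. The function name `archive_db_files` is hypothetical, and the achieved ratio will depend on your data; the 70GB to 17GB figure above is roughly a quarter of the original size.

```python
import zipfile
from pathlib import Path


def archive_db_files(db_dir, archive_path):
    # Gather the client*.db files before the archive is created.
    db_files = sorted(Path(db_dir).glob('client*.db'))
    # Put every database file into one archive, with strong compression.
    with zipfile.ZipFile(archive_path, 'w', compression=zipfile.ZIP_LZMA) as archive:
        for db_file in db_files:
            archive.write(db_file, arcname=db_file.name)
    size_before = sum(db_file.stat().st_size for db_file in db_files)
    size_after = Path(archive_path).stat().st_size
    return size_before, size_after
```

As with any backup of the database files, run this only while the client is shut down.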
It is best to have all four database files. It is generally easy and quick to fix problems if you have a backup of all four. If client.caches.db is missing, you can recover but it might take ten or more hours of CPU work to regenerate. If client.mappings.db is missing, you might be able to recover tags for your local files from a mirror in an intact client.caches.db. However, client.master.db and client.db are the most important. If you lose either of those, or they become too damaged to read and you have no backup, then your database is essentially dead and likely every single archive and view and tag and note and url record you made is lost. This has happened before, do not let it be you.
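A minimal sketch of a database-only backup that refuses to proceed unless all four files are present might look like this. The function name and paths are hypothetical; `install_dir/db` is the default location mentioned above, and the client must not be running while the copy happens.

```python
import shutil
from pathlib import Path


def backup_db_files(db_dir, backup_dir):
    db_dir, backup_dir = Path(db_dir), Path(backup_dir)
    # Expect client.db, client.caches.db, client.mappings.db, client.master.db.
    db_files = sorted(db_dir.glob('client*.db'))
    if len(db_files) != 4:
        found = [db_file.name for db_file in db_files]
        raise RuntimeError(f'expected four client*.db files, found {found}')
    backup_dir.mkdir(parents=True, exist_ok=True)
    for db_file in db_files:
        # copy2 keeps timestamps, which helps later 'time and size' syncs.
        shutil.copy2(db_file, backup_dir / db_file.name)
    return [db_file.name for db_file in db_files]
```

Checking the count up front is a cheap way to notice a mistyped path before you end up trusting a backup that is missing one of the files.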
"},{"location":"getting_started_more_tags.html","title":"Tags Can Get Complicated","text":"Tags are powerful, and there are many tools within hydrus to customise how they apply and display. I recommend you play around with the basics before making your own new local tag services or jumping right into the PTR, so take it slow.
"},{"location":"getting_started_more_tags.html#tag_services","title":"Tag services","text":"Hydrus lets you organise tags across multiple separate 'services'. By default there are two, but you can have however many you want (services->manage services
). You might like to add more for different sets of siblings/parents, for tags you don't want to see but still search by, or for parsing tags into different services based on the source or its reliability. You could, for example, parse all tags from Pixiv into one service, Danbooru tags into another, Deviantart tags into a third, and so on as you choose. You must always have at least one local tag service.
Local tag services are stored only on your hard drive--they are completely private. No tags, siblings, or parents will accidentally leak, so feel free to go wild with whatever odd scheme you want to try out.
Each tag service comes with its own tags, siblings and parents.
"},{"location":"getting_started_more_tags.html#my_tags","title":"My tags","text":"The intent is to use this service for tags you yourself want to add.
"},{"location":"getting_started_more_tags.html#downloader_tags","title":"Downloader tags","text":"The default tag parse target. Tags of things you download will end up here unless you change the settings. It's probably a good idea to set up some tag blacklists for tags you don't want.
"},{"location":"getting_started_more_tags.html#tag_repositories","title":"Tag repositories","text":"It can take a long time to tag even small numbers of files well, so I created tag repositories so people can share the work.
Tag repos store many file->tag relationships. Anyone who has an access key to the repository can sync with it and hence download all these relationships. If any of their own files match up, they will get those tags. Access keys will also usually have permission to upload new tags and ask for incorrect ones to be deleted.
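Conceptually, the 'if any of their own files match up' step is a lookup of your file hashes in the repository's mappings. The sketch below uses plain dictionaries and a hypothetical function name; the real client stores and processes this data in its SQLite database.

```python
def tags_for_local_files(repo_mappings, local_hashes):
    # repo_mappings: file hash -> iterable of tags known to the repository.
    # local_hashes: the hashes of files you actually have.
    return {
        file_hash: set(tags)
        for file_hash, tags in repo_mappings.items()
        if file_hash in local_hashes
    }


repo = {
    'aaa111': ['character:samus aran', 'series:metroid'],
    'bbb222': ['landscape'],
}
mine = tags_for_local_files(repo, {'aaa111', 'ccc333'})
```

Only `aaa111` is both in the repository and on your disk, so it is the only file that gains tags; `bbb222` is ignored and `ccc333` simply has nothing to receive yet.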
Anyone can run a tag repository, but it is a bit complicated for new users. I ran a public tag repository for a long time, and now this large central store is run by users. It has over a billion tags and is free to access and contribute to.
To connect with it, please check here. Please read that page if you want to try out the PTR. It is only appropriate for someone on an SSD!
If you add it, your client will download updates from the repository over time and, usually when it is idle or shutting down, 'process' them into its database until it is fully synchronised. The processing step is CPU and HDD heavy, and you can customise when it happens in file->options->maintenance and processing. As the repository synchronises, you should see some new tags appear, particularly on famous files that lots of people have.
You can watch more detailed synchronisation progress in the services->review services window.
Your new service should now be listed on the left of the manage tags dialog. Adding tags to a repository works very similarly to the 'my tags' service except hitting 'apply' will not immediately confirm your changes--it will put them in a queue to be uploaded. These 'pending' tags will be counted with a plus '+' or minus '-' sign.
Notice that a 'pending' menu has appeared on the main window. This lets you start the upload when you are ready and happy with everything that you have queued.
When you upload your pending tags, they will commit and look to you like any other tag. The tag repository will anonymously bundle them into the next update, which everyone else will download in a day or so. They will see your tags just like you saw theirs.
If you attempt to remove a tag that has been uploaded, you may be prompted to give a reason, creating a petition that a janitor for the repository will review.
I recommend you not spam tags to the public tag repo until you get a rough feel for the guidelines, and my original tag schema thoughts, or just lurk until you get the idea. It roughly follows what you will see on a typical booru. The general rule is to only add factual tags--no subjective opinion.
You can connect to more than one tag repository if you like. When you are in the manage tags dialog, pressing the up or down arrow keys on an empty input switches between your services.
FAQ: why can my friend not see what I just uploaded?
"},{"location":"getting_started_more_tags.html#siblings_and_parents","title":"Siblings and parents","text":"For more in-depth information, see siblings and parents.
tl;dr: Siblings rename/alias tags in an undoable way. Parents virtually add/imply one or more tags (parents) if the 'child' tag is present. The PTR has a lot of them.
"},{"location":"getting_started_more_tags.html#display_rules","title":"Display rules","text":"If you go to tags -> manage where siblings and parents apply
you'll get a window where you can customise where and in what order siblings and parents apply. The service at the top of the list has precedence over all else, then the second, and so on depending on how many you have. If, for example, you have the PTR, you can use another tag service to override tags/siblings for cases where you disagree with the PTR standards.
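The precedence rule can be pictured as 'first service in the list that has an opinion wins'. The sketch below is a deliberate simplification with hypothetical names and example tags; the client's real sibling logic also has to deal with chains and merges across services.

```python
def resolve_sibling(tag, services_in_order, siblings_by_service):
    # services_in_order: service names, highest precedence first.
    # siblings_by_service: service name -> {tag: preferred tag}.
    for service in services_in_order:
        mapping = siblings_by_service.get(service, {})
        if tag in mapping:
            return mapping[tag]
    # No service has a sibling for this tag, so it displays as-is.
    return tag


order = ['my tags', 'PTR']
siblings = {
    'my tags': {'lotr': 'series:the lord of the rings'},
    'PTR': {'lotr': 'series:lord of the rings', 'samus': 'character:samus aran'},
}
shown = resolve_sibling('lotr', order, siblings)
```

Because 'my tags' sits above the PTR in `order`, `shown` is the local preference, while tags that only the PTR knows about still resolve through the PTR.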
The hydrus client supports two kinds of ratings: like/dislike and numerical. Let's start with the simpler one:
"},{"location":"getting_started_ratings.html#like_dislike","title":"like/dislike","text":"A new client starts with one of these, called 'favourites'. It can set one of two values to a file. It does not have to represent like or dislike--it can be anything you want, like 'send to export folder' or 'explicit/safe' or 'cool babes'. Go to services->manage services->add->local like/dislike ratings:
You can set a variety of colours and shapes.
"},{"location":"getting_started_ratings.html#numerical","title":"numerical","text":"This is '3 out of 5 stars' or '8/10'. You can set the range to whatever whole numbers you like:
As well as the shape and colour options, you can set how many 'stars' to display and whether 0/10 is permitted.
If you change the star range at a later date, any existing ratings will be 'stretched' across the new range. As values are collapsed to the nearest integer, this is best done for scales that are multiples. 2/5 will neatly become 4/10 on a zero-allowed service, for instance, and 0/4 can nicely become 1/5 if you disallow zero ratings in the same step. If you didn't intuitively understand that, just don't touch the number of stars or zero rating checkbox after you have created the numerical rating service!
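One way to picture the 'stretch' is to convert a rating to a fraction of its old range and then map that fraction onto the new range. The sketch below is an assumed model that reproduces the two examples above; it is not taken from the client's code, and it assumes the old scale has more than one possible value.

```python
def stretch_rating(stars, old_max, old_allows_zero, new_max, new_allows_zero):
    # The lowest possible rating is 0 if zero is allowed, otherwise 1.
    old_min = 0 if old_allows_zero else 1
    new_min = 0 if new_allows_zero else 1
    # Position of the rating within the old range, from 0.0 to 1.0.
    fraction = (stars - old_min) / (old_max - old_min)
    # Map onto the new range and collapse to the nearest whole star.
    return new_min + round(fraction * (new_max - new_min))


two_of_five = stretch_rating(2, 5, True, 10, True)    # 2/5 on a zero-allowed scale
zero_of_four = stretch_rating(0, 4, True, 5, False)   # 0/4, disallowing zero in the same step
```

Here `two_of_five` is 4 (so 4/10) and `zero_of_four` is 1 (so 1/5). Scales that are not multiples of one another force the rounding step to move ratings around, which is the loss the paragraph above warns about.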
"},{"location":"getting_started_ratings.html#using_ratings","title":"now what?","text":"Ratings are displayed in the top-right of the media viewer:
Hovering over each control will pop up its name, in case you forget which is which. You can then set them with a left- or right-click. Like/dislike and numerical have slightly different click behaviour, so have a play with them to get their feel. Pressing F4 on a selection of thumbnails will open a dialog with a very similar layout, which will let you set the same rating to many files simultaneously.
Once you have some ratings set, you can search for them using system:rating, which produces this dialog:
On my own client, I find it useful to have several like/dislike ratings set up as one-click pseudo-tags, like the 'OP images' above.
"},{"location":"getting_started_searching.html","title":"Searching and sorting","text":"The primary purpose of tags is to be able to find what you've tagged again. Let's see more how it works.
"},{"location":"getting_started_searching.html#searching","title":"Searching","text":"Just open a new search page (pages > new file search page
or Ctrl+T > file search
) and start typing in the search field which should be focused when you first open the page.
Let's look at the tag autocomplete dropdown:
system predicates
Hydrus calls search terms predicates. 'system predicates', which search metadata other than simple tags, show on any search page with an empty autocomplete input. You can mix them into any search alongside tags. They are very useful, so try them out!
include current/pending tags
Turn these on and off to control whether tag predicates apply to tags that exist, or those pending to be uploaded to a tag repository. Just searching 'pending' tags is useful if you want to scan what you have pending to go up to the PTR--just turn off 'current' tags and search system:num tags > 0
.
searching immediately
This controls whether a change to the list of current search predicates will instantly run the new search and get new results. Turning this off is helpful if you want to add, remove, or replace several heavy search terms in a row without getting UI lag.
OR
You only see this if you have 'advanced mode' on. It lets you enter some pretty complicated tags!
file/tag domains
By default, you will search in the 'my files' and 'all known tags' domains. This is the intersection of your local media files (on your hard disk) and the union of all known tag searches. If you search for character:samus aran
, then you will get file results from your 'my files' domain that have character:samus aran
in any known tag service. For most purposes, this combination is fine, but as you use the client more, you will sometimes want to access different search domains.
For instance, if you change the file domain to 'trash', then you will instead get files that are in your trash. Setting the tag domain to 'my tags' will ignore other tag services (e.g. the PTR) for all tag search predicates, so a system:num_tags
or a character:samus aran
will only look in 'my tags'.
Turning on 'advanced mode' gives access to more search domains. Some of them are subtly complicated, run extremely slowly, and are only useful for clever jobs--most of the time, you still want 'my files' and 'all known tags'.
favourite searches star
Once you are more experienced, have a play with this. It lets you save your common searches for the future, so you don't have to either keep re-entering them or keep them open all the time. If you close big things down when you aren't using them, you will keep your client lightweight and save time.
When you type a tag in a search page, Hydrus will treat a space the same way as an underscore. Searching character:samus aran
will find files tagged with character:samus aran
and character:samus_aran
. This is true of some other syntax characters, [](){}/\\\"'-
, too.
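One way to picture this is a 'search key' in which every space-like character collapses to a single space. The sketch below is a simplified model built only from the character list above; the helper names are made up, and the client's real matching may differ in detail.

```python
# The underscore plus the syntax characters listed above: [ ] ( ) { } / \ " ' -
SPACE_LIKE = "_[](){}/\\\"'-"
_TO_SPACE = str.maketrans({character: " " for character in SPACE_LIKE})


def search_key(tag):
    """Replace space-like characters with spaces and squeeze repeated spaces."""
    return " ".join(tag.translate(_TO_SPACE).split())


def same_for_search(a, b):
    """True if two tags would be treated alike under this simplified model."""
    return search_key(a) == search_key(b)


print(search_key("character:samus_aran"))  # character:samus aran
print(same_for_search("character:samus aran", "character:samus_aran"))  # True
print(same_for_search("samus aran", "samusaran"))  # False
```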
Tags will be searchable by all their siblings. If there's a sibling for large
-> huge
then typing large
will provide huge
as a suggestion. This goes for the whole sibling chain, no matter how deep or a tag's position in it.
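A sibling chain behaves like a lookup that is followed until it reaches a tag with no further replacement. Here is a minimal sketch with a hypothetical sibling table, not the client's real data structures:

```python
# Hypothetical sibling pairs, written as 'tag you typed' -> 'better tag'.
SIBLINGS = {"big": "large", "large": "huge"}


def ideal_tag(tag, siblings):
    """Follow the sibling chain to its end, guarding against accidental loops."""
    visited = set()
    while tag in siblings and tag not in visited:
        visited.add(tag)
        tag = siblings[tag]
    return tag


print(ideal_tag("large", SIBLINGS))  # huge
print(ideal_tag("big", SIBLINGS))    # huge, two steps down the chain
print(ideal_tag("tiny", SIBLINGS))   # tiny, because it has no sibling
```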
The autocomplete tag dropdown supports wildcard searching with *
.
The *
will match any number of characters. Every normal autocomplete search has a secret *
on the end that you don't see, which is how full words get matched from you only typing in a few letters.
This is useful when you can only remember part of a word, or can't spell part of it. You can put *
characters anywhere, but you should experiment to get used to the exact way these searches work. Some results can be surprising!
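If you want to get a feel for wildcard behaviour outside the client, Python's `fnmatch` module uses the same `*` idea. This is only an analogy with a made-up tag list: the `autocomplete` helper is hypothetical, it matches against the whole tag from its start, and `fnmatch` also gives special meaning to `?` and `[...]`, which the text above does not describe.

```python
from fnmatch import fnmatchcase


def autocomplete(typed, known_tags):
    """Match typed text against tags, adding the 'secret' trailing * if missing."""
    pattern = typed if typed.endswith("*") else typed + "*"
    return [tag for tag in known_tags if fnmatchcase(tag, pattern)]


TAGS = ["evangelion", "neon genesis evangelion", "evening", "vintage"]

print(autocomplete("ev", TAGS))       # ['evangelion', 'evening']
print(autocomplete("*gelion", TAGS))  # ['evangelion', 'neon genesis evangelion']
print(autocomplete("*va*ge*", TAGS))  # ['evangelion', 'neon genesis evangelion']
```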
You can select the special predicate inserted at the top of your autocomplete results (the highlighted *gelion
and *va*ge*
above). It will return all files that match that wildcard, i.e. every file for every other tag in the dropdown list.
This is particularly useful if you have a number of files with commonly structured over-informationed tags, like this:
In this case, selecting the title:cool pic*
predicate will return all three images in the same search, where you can conveniently give them some more-easily searched tags like series:cool pic
and page:1
, page:2
, page:3
.
You can edit any selected 'active' search predicates by either its Right-Click menu or through Shift+Double-Left-Click on the selection. For simple tags, this means just changing the text (and, say, adding/removing a leading hyphen for negation/inclusion), but any 'system' predicate can be fully edited with its original panel. If you entered 'system:filesize < 200KB' and want to make it a little bigger, don't delete and re-add--just edit the existing one in place.
"},{"location":"getting_started_searching.html#other_shortcuts","title":"Other Shortcuts","text":"These will eventually be migrated to the shortcut system where they will be more visible and changeable, but for now:
Searches find files that match every search 'predicate' in the list (it is an AND search), which makes it difficult to search for files that include one OR another tag. For example the query red eyes
AND green eyes
(aka what you get if you enter each tag by itself) will only find files that have both tags, while the query red eyes
OR green eyes
will present you with files that are tagged with red eyes or green eyes, or both.
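In set terms, AND asks whether a file's tags contain every search tag, and OR asks whether they share at least one. A small sketch with invented file names and tags:

```python
# Invented example data: file name -> set of tags.
FILES = {
    "a.png": {"red eyes", "smile"},
    "b.png": {"green eyes"},
    "c.png": {"red eyes", "green eyes"},
    "d.png": {"blue eyes"},
}


def search_and(files, tags):
    """Files that have every one of the given tags."""
    wanted = set(tags)
    return sorted(name for name, file_tags in files.items() if wanted <= file_tags)


def search_or(files, tags):
    """Files that have at least one of the given tags."""
    wanted = set(tags)
    return sorted(name for name, file_tags in files.items() if wanted & file_tags)


print(search_and(FILES, ["red eyes", "green eyes"]))  # ['c.png']
print(search_or(FILES, ["red eyes", "green eyes"]))   # ['a.png', 'b.png', 'c.png']
```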
More recently, simple OR search support was added. All you have to do is hold down Shift when you enter/double-click a tag in the autocomplete entry area. Instead of sending the tag up to the active search list up top, it will instead start an under-construction 'OR chain' in the tag results below:
You can keep searching for and entering new tags. Holding down Shift on new tags will extend the OR chain, and entering them as normal will 'cap' the chain and send it to the complete and active search predicates above.
Any file that has one or more of those OR sub-tags will match.
If you enter an OR tag incorrectly, you can either cancel or 'rewind' the under-construction search predicate with these new buttons that will appear:
You can also cancel an under-construction OR by hitting Esc on an empty input. You can add any sort of search term to an OR search predicate, including system predicates. Some unusual sub-predicates (typically a -tag
, or a very broad system predicate) can run very slowly, but they will run much faster if you include non-OR search predicates in the search:
This search will return all files that have the tag fanfic
and one or more of medium:text
, a positive value for the like/dislike rating 'read later', or PDF mime.
There's a more advanced OR search function available by pressing the OR button. Previous knowledge of operators expected and required.
"},{"location":"getting_started_searching.html#sorting","title":"Sorting","text":"At the top-left of most pages there's a sort by:
dropdown menu. Most of the options are self-explanatory. They do nothing except change in what order Hydrus presents the currently searched files to you.
Default sort order and more sort by: namespace
are found in file -> options -> sort/collect
.
system:limit
","text":"If you add system:limit
to a search, the client will consider what that page's file sort currently is. If it is simple enough--something like file size or import time--then it will sort your results before they come back and clip the limit according to that sort, getting the n 'largest file size' or 'newest imports' and so on. This can be a great way to set up a lightweight filtering page for 'the 256 biggest videos in my inbox'.
If you change the sort, hydrus will not refresh the search, it'll just re-sort the n files you have. Hit F5 to refresh the search with a new sort.
Not all sorts are supported. Anything complicated like tag sort will result in a random sample instead.
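The behaviour described here can be modelled as 'sort first, then clip' for simple sorts and 'random sample' for everything else. The sketch below uses invented file records and a hypothetical table of supported sorts. It illustrates the idea only and does not reflect how the client's database does it.

```python
import heapq
import random

# Hypothetical table of sorts that are simple enough to apply before clipping.
SIMPLE_SORTS = {
    "file size": lambda f: f["size"],
    "import time": lambda f: f["imported"],
}


def apply_limit(files, sort_name, n, rng=random):
    """Return n files: the top n for a simple sort, otherwise a random sample."""
    key = SIMPLE_SORTS.get(sort_name)
    if key is None:
        # Anything complicated (like a tag sort) falls back to a random sample.
        return rng.sample(files, min(n, len(files)))
    return heapq.nlargest(n, files, key=key)


FILES = [
    {"name": "a.webm", "size": 900, "imported": 3},
    {"name": "b.webm", "size": 200, "imported": 1},
    {"name": "c.webm", "size": 700, "imported": 2},
]

print([f["name"] for f in apply_limit(FILES, "file size", 2)])    # ['a.webm', 'c.webm']
print([f["name"] for f in apply_limit(FILES, "import time", 1)])  # ['a.webm']
```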
"},{"location":"getting_started_searching.html#collecting","title":"Collecting","text":"Collection is found under the sort by:
dropdown and uses namespaces listed in the sort by: namespace
sort options. The new namespaces will only be available in new pages.
The introduction to subscriptions has been moved to the main downloading help here.
"},{"location":"getting_started_subscriptions.html#description","title":"how do subscriptions work?","text":"For the most part, all you need to do to set up a good subscription is give it a name, select the download source, and use the 'paste queries' button to paste what you want to search. Subscriptions have great default options for almost all query types, so you don't have to go any deeper than that to get started.
Once you hit ok on the main subscription dialog, the subscription system should immediately come alive. If any queries are due for a 'check', they will perform their search and look for new files (i.e. URLs it has not seen before). Once that is finished, the file download queue will be worked through as normal. Typically, the sub will make a popup like this while it works:
The initial sync can sometimes take a few minutes, but after that, each query usually only needs thirty seconds' work every few days. If you leave your client on in the background, you'll rarely see them. If they ever get in your way, don't be afraid to click their little cancel button or call a global halt with network->pause->subscriptions--the next time they run, they will resume from where they were before.
Similarly, the initial sync may produce a hundred files, but subsequent runs are likely to only produce one to ten. If a subscription comes across a lot of big files at once, it may not download them all in one go--but give it time, and it will catch back up before you know it.
When it is done, it leaves a little popup button that will open a new page for you:
This can often be a nice surprise!
"},{"location":"getting_started_subscriptions.html#good_subs","title":"what makes a good subscription?","text":"The same rules as for downloaders apply: start slow, be hesitant, and plan for the long-term. Artist queries make great subscriptions as they update reliably but not too often and have very stable quality. Pick the artists you like most, see where their stuff is posted, and set up your subs like that.
Series and character subscriptions are sometimes valuable, but they can be difficult to keep up with and have highly variable quality. It is not uncommon for users to only keep 15% of what a character sub produces. I do not recommend them for anything but your waifu.
Attribute subscriptions like 'blue_eyes' or 'smile' make for terrible subs as the quality is all over the place and you will be inundated by too much content. The only exceptions are for specific, low-count searches that really matter to you, like 'contrapposto' or 'gothic trap thighhighs'.
If you end up subscribing to eight hundred things and get ten thousand new files a week, you made a mistake. Subscriptions are for keeping up with things you like. If you let them overwhelm you, you'll resent them.
It is a good idea to run a 'full' download for a search before you set up a subscription. As well as making sure you have the exact right query text and that you have everything ever posted (beyond the 100 files deep a sub will typically look), it saves the bulk of the work (and waiting on bandwidth) for the manual downloader, where it belongs. When a new subscription picks up off a freshly completed download queue, its initial subscription sync only takes thirty seconds since its initial URLs are those that were already processed by the manual downloader. I recommend you stack artist searches up in the manual downloader using 'no limit' file limit, and when they are all finished, select them in the list and right-click->copy queries, which will put the search texts in your clipboard, newline-separated. This list can be pasted into the subscription dialog in one go with the 'paste queries' button again!
"},{"location":"getting_started_subscriptions.html#checking","title":"images/how often do subscriptions check?","text":"Hydrus subscriptions use the same variable-rate checking system as its thread watchers, just on a larger timescale. If you subscribe to a busy feed, it might check for new files once a day, but if you enter an artist who rarely posts, it might only check once every month. You don't have to do anything. The fine details of this are governed by the 'checker options' button. This is one of the things you should not mess with as you start out.
If a query goes too 'slow' (typically, this means no new files for 180 days), it will be marked DEAD in the same way a thread will, and it will not be checked again. You will get a little popup when this happens. This is all editable as you get a better feel for the system--if you wish, it is completely possible to set up a sub that never dies and only checks once a year.
I do not recommend setting up a sub that needs to check more than once a day. Any search that is producing that many files is probably a bad fit for a subscription. Subscriptions are for lightweight searches that are updated every now and then.
(you might like to come back to this point once you have tried subs for a week or so and want to refine your workflow)
"},{"location":"getting_started_subscriptions.html#presentation","title":"ok, I set up three hundred queries, and now these popup buttons are a hassle","text":"On the edit subscription panel, the 'presentation' options let you publish files to a page. The page will have the subscription's name, just like the button makes, but it cuts out the middle-man and 'locks it in' more than the button, which will be forgotten if you restart the client. Also, if a page with that name already exists, the new files will be appended to it, just like a normal import page! I strongly recommend moving to this once you have several subs going. Make a 'page of pages' called 'subs' and put all your subscription landing pages in there, and then you can check it whenever is convenient.
If you discover your subscription workflow tends to be the same for each sub, you can also customise the publication 'label' used. If multiple subs all publish to the 'nsfw subs' label, they will all end up on the same 'nsfw subs' popup button or landing page. Sending multiple subscriptions' import streams into just one or two locations like this can be great.
You can also hide the main working popup. I don't recommend this unless you are really having a problem with it, since it is useful to have that 'active' feedback if something goes wrong.
Note that subscription file import options will, by default, only present 'new' files. Anything already in the db will still be recorded in the internal import cache and used to calculate next check times and so on, but it won't clutter your import stream. This is different to the default for all the other importers, but when you are ready to enter the ranks of the Patricians, you will know to edit your 'loud' default file import options under options->importing to behave this way as well. Efficient workflows only care about new files.
"},{"location":"getting_started_subscriptions.html#syncing_explanation","title":"how exactly does the sync work?","text":"Figuring out when a repeating search has 'caught up' can be a tricky problem to solve. It sounds simple, but unusual situations like 'a file got tagged late, so it inserted deeper than it ideally should in the gallery search' or 'the website changed its URL format completely, help' can cause problems. Subscriptions are automatic systems, so they tend to be a bit more careful and paranoid about problems, lest they burn 10GB on 10,000 unexpected diaperfur images.
The initial sync is simple. It does a regular search, stopping if it reaches the 'initial file limit' or the last file in the gallery, whichever comes first. The default initial file sync is 100, which is a great number for almost all situations.
Subsequent syncs are more complicated. It ideally 'stops' searching when it reaches files it saw in a previous sync, but if it comes across new files mixed in with the old, it will search a bit deeper. It is not foolproof, and if a file gets tagged very late and ends up a hundred deep in the search, it will probably be missed. There is no good and computationally cheap way at present to resolve this problem, but thankfully it is rare.
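As a rough model of a subsequent sync, imagine walking the gallery newest-first and stopping once a run of already-seen URLs has gone by. The `seen_run_to_stop` threshold and the other names below are assumptions made for illustration; the real logic for 'searching a bit deeper' is not spelled out here.

```python
def sync_new_urls(gallery_urls, seen_urls, file_limit=100, seen_run_to_stop=3):
    """Collect unseen URLs from a newest-first gallery listing.

    Stops after `seen_run_to_stop` already-seen URLs in a row (a hypothetical
    threshold) or once `file_limit` new URLs have been collected.
    """
    new_urls = []
    consecutive_seen = 0
    for url in gallery_urls:
        if url in seen_urls:
            consecutive_seen += 1
            if consecutive_seen >= seen_run_to_stop:
                break
        else:
            # A new URL mixed in with old ones resets the run, so we look deeper.
            consecutive_seen = 0
            new_urls.append(url)
            if len(new_urls) >= file_limit:
                break
    return new_urls


gallery = ["n1", "n2", "s1", "n3", "s2", "s3", "s4", "s5", "late"]
seen = {"s1", "s2", "s3", "s4", "s5"}
print(sync_new_urls(gallery, seen))  # ['n1', 'n2', 'n3']
```

Note how the 'late' entry, sitting beneath a solid block of seen URLs, is never reached. That is the rare miss described above.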
Remember that an important 'staying sane' philosophy of downloading and subscriptions is to focus on dealing with the 99.5% you have before worrying about the 0.5% you do not.
The amount of time between syncs is calculated by the checker options. Based on the timestamps attached to existing urls in the subscription cache (either added time, or the post time as parsed from the url), the sub estimates how long it will be before n new files appear, and the next check is scheduled for then. Unless you know what you are doing, checker options, like file limits, are best left alone. A subscription will naturally adapt its checking speed to the file 'velocity' of the source, and there is usually very little benefit to trying to force a sub to check at a radically different speed.
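To make 'velocity' concrete, here is a simplified estimate: count how many files appeared in a recent window, work out the average gap between them, and clamp the result between a fastest and a slowest check period. All parameter names and default values are hypothetical stand-ins, not the real checker options.

```python
DAY = 86400  # seconds


def seconds_until_next_check(post_times, now, files_wanted=1,
                             window=30 * DAY, fastest=DAY, slowest=30 * DAY):
    """Estimate the wait until `files_wanted` new files are likely to exist."""
    recent = [t for t in post_times if now - window <= t <= now]
    if not recent:
        # Nothing posted lately, so check as rarely as allowed.
        return slowest
    seconds_per_file = window / len(recent)
    return max(fastest, min(slowest, files_wanted * seconds_per_file))


now = 1_000_000_000
busy_feed = [now - d * DAY for d in range(1, 31)]        # roughly one file a day
steady_artist = [now - d * DAY for d in (2, 8, 14, 20, 26)]
quiet_artist = [now - 10 * DAY]

print(seconds_until_next_check(busy_feed, now) / DAY)      # 1.0
print(seconds_until_next_check(steady_artist, now) / DAY)  # 6.0
print(seconds_until_next_check(quiet_artist, now) / DAY)   # 30.0
```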
Tip
If you want to force your subs to run at the same time, say every evening, it is easier to just use network->pause->subscriptions as a manual master on/off control. The ones that are due will catch up together, the ones that aren't won't waste your time.
Remember that subscriptions only keep up with new content. They cannot search backwards in time in order to 'fill out' a search, nor can they fill in gaps. Do not change the file limits or check times to try to make this happen. If you want to ensure complete sync with all existing content for a particular search, use the manual downloader.
In practice, most subs only need to check the first page of a gallery since only the first two or three urls are new.
"},{"location":"getting_started_subscriptions.html#periodic_file_limit","title":"periodic file limit exceeded","text":"If, during a regular sync, the sub keeps finding new URLs, never hitting a block of already-seen URLs, it will stop upon hitting its 'periodic file limit', which is also usually 100. When it happens, you will get a popup message notification. There are two typical reasons for this:
The first case is a natural accident of statistics. The subscription now has a 'gap' in its sync. If you want to get what you missed, you can try to fill in the gap with a manual downloader page. Just download to 200 files or so, and the downloader will quickly work through the URLs in the gap as a one-time job.
The second case is a safety stopgap for hydrus. If a site decides to have /post/123456
style URLs instead of post.php?id=123456
style, hydrus will suddenly see those as entirely 'new' URLs. It could also be because of an updated downloader, which pulls URLs in API format or similar. This is again thankfully quite rare, but it triggers several problems--the associated downloader usually breaks, as it does not yet recognise those new URLs, and all your subs for that site will parse through and hit the periodic limit for every query. When this happens, you'll usually get several periodic limit popups at once, and you may need to update your downloader. If you know the person who wrote the original downloader, they'll likely want to know about the problem, or may already have a fix sorted. It is often a good idea to pause the affected subs until you have it figured out and working in a normal gallery downloader page.
On the main subscription dialog, there are 'merge' and 'separate' buttons. These are powerful, but they will walk you through the process of pulling queries out of a sub and merging them back into a different one. Only subs that use the same download source can be merged. Give them a go, and if it all goes wrong, just hit the cancel button on the dialog.
"},{"location":"getting_started_tags.html","title":"Getting started with tags","text":"A tag is a small bit of text describing a single property of something. They make searching easy. Good examples are \"flower\" or \"nicolas cage\" or \"the sopranos\" or \"2003\". By combining several tags together ( e.g. [ 'tiger woods', 'sports illustrated', '2008' ] or [ 'cosplay', 'the legend of zelda' ] ), a huge image collection is reduced to a tiny and easy-to-digest sample.
"},{"location":"getting_started_tags.html#intro","title":"How do we find files?","text":"So, you have some files imported. Let's give them some tags so we can find them again later.
FAQ: what is a tag?
Your client starts with two local tag services, called 'my tags' and 'downloader tags', which keep all of their file->tag mappings in your client's database where only you can see them. 'my tags' is a good place to practise.
Select a file and press F3 to open the manage tags dialog:
The area below where you type is the 'autocomplete dropdown'. You will see this on normal search pages too. Type part of a tag, and matching results will appear below. Since you are starting out, your 'my tags' service won't have many tags in it yet, but things will populate fast! Select the tag you want with the arrow keys and hit enter. If you want to remove a tag, enter the exact same thing again or double-click it in the box above.
Prefixing a tag with a category and a colon will create a namespaced tag. This helps inform the software and other users about what the tag is. Examples of namespaced tags are:
character:batman
series:street fighter
person:jennifer lawrence
title:vitruvian man
The client is set up to draw common namespaces in different colours, just like boorus do. You can change these colours in the options.
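The namespace is simply everything before the first colon. A tiny sketch of that split (the `split_tag` helper is hypothetical and ignores edge cases such as tags that begin with a colon):

```python
def split_tag(tag):
    """Split a tag into (namespace, subtag); the namespace is '' if there is none."""
    namespace, separator, subtag = tag.partition(":")
    if not separator:
        return "", tag
    return namespace, subtag


print(split_tag("character:batman"))     # ('character', 'batman')
print(split_tag("title:vitruvian man"))  # ('title', 'vitruvian man')
print(split_tag("flower"))               # ('', 'flower')
```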
Once you are happy with your tag changes, click 'apply', or hit F3 again, or simply press Enter on the text box while it is empty. The tags are now saved to your database.
Media Viewer Manage Tags
You can also open the manage tags dialog from the full media viewer, but note that this one does not have 'apply' and 'cancel' buttons, only 'close'. It makes its changes instantly, and you can keep using the rest of the program while it is open (it is a non-'modal' dialog).
Also, you need not close the media viewer's manage tags dialog while you browse. Just like you can hit Enter on the empty text box to close the dialog, hitting Page Up/Down navigates the parent viewer Back/Forward!
Also, hit Arrow Up/Down on an empty text input to switch between the tag service tabs!
Once you have some tags set, typing the first few characters of one in on a search page will show the counts of all the tags that start with that. Enter the one you want, and the search will run:
If you add more 'predicates' to a search, you will limit the results to those files that match every single one:
You can also exclude a tag by prefixing it with a hyphen (e.g. -solo
).
You can add as many tags as you want. In general, the more search predicates you add, the smaller and faster the results will be, but some types of tag (like excluded -tags
), or the cleverer system
tags that you will soon learn about, can be suddenly CPU expensive. If a search takes more than a few seconds to run, a 'stop' button appears by the tag input. It cancels things out pretty quick in most cases.
Click the links on the left to go through the getting started guide. Subheadings are on the right. Larger sections are up top. Please at least skim every page in the getting started section, as this will introduce you to the main systems in the client. There is a lot, so you do not have to do it all in one go.
The section on installing, updating, and backing up is very important.
This help is available locally in every release. Hit help->help and getting started guide
in the client, or open install_dir/help/index.html
.
I've been on the internet and imageboards for a long time, saving everything I like to my hard drive. After a while, the whole collection was just too large to manage on my own. I couldn't find anything in the mess, and I just saved new files in there with names like 'image1257.jpg'.
There aren't many solutions to this problem that aren't online, and I didn't want to lose my privacy or control.
"},{"location":"introduction.html#anonymous","title":"on being anonymous","text":"I enjoy being anonymous online. When you aren't afraid of repercussions, you can be as truthful as you want and share interesting things, no matter how unusual. You can have unique conversations and tackle some otherwise unsolvable problems. It's fun!
I'm a normal Anon, nothing special. :^)
"},{"location":"introduction.html#hydrus_network","title":"the hydrus network","text":"So! I'm developing a program that helps people organise their files on their own terms and, if they want to, collaborate with others anonymously. I want to help you do what you want with your stuff, and that's it. You can share some tags (and files, but this is limited) with other people if you want to, but you don't have to connect to anything if you don't. The default is complete privacy, no sharing, and every upload requires a conscious action on your part. I don't plan to ever record metrics on users, nor serve ads, nor charge for my software. The software never phones home.
This does a lot more than a normal image viewer. If you are totally new to the idea of personal media collections and booru-style tagging, I suggest you start slow, walk through the getting started guides, and experiment doing different things. If you aren't sure on what a button does, try clicking it! You'll be importing thousands of files and applying tens of thousands of tags in no time. The best way to learn is just to try things out.
The client is chiefly a file database. It stores your files inside its own folders, managing them far better than an explorer window or some online gallery. Here's a screenshot of one of my test installs with a search showing all files:
As well as the client, there is also a server that anyone can run to store files or tags for sharing between many users. This is advanced and almost always confusing to new users, so do not explore it until you know what you are doing. There is, however, a user-run public tag repository, with more than a billion tags, that you can access and contribute to if you wish.
I have many plans to expand the client and the network.
"},{"location":"introduction.html#principles","title":"statement of principles","text":"None of the above are currently true, but I would love to live in a world where they were. My software is an attempt to move us a little closer.
Where possible, I prefer decentralised systems that are focused on people. I still use gmail and youtube IRL just like pretty much everyone, but I would rather we have alternative systems for alternate work, especially in the future. No one seemed to be making what I wanted for file management, particularly as everything rushed to the cloud space, so I decided to make a local solution myself, and here we are.
If, after a few months, you find you enjoy the software and would like to further support it, I have set up a simple no-reward patreon, which you can read more about here.
"},{"location":"introduction.html#license","title":"license","text":"These programs are free software. Everything I, hydrus dev, have made is under the Do What The Fuck You Want To Public License, Version 3, as published by Kris Craig.
license.txt DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE\n Version 3, May 2010\n\nCopyright (C) 2010 by Kris Craig\nOlympia, WA USA\n\nEveryone is permitted to copy and distribute verbatim or modified\ncopies of this license document, and changing it is allowed as long\nas the name is changed.\n\nThis license applies to any copyrightable work with which it is\npackaged and/or distributed, except works that are already covered by\nanother license. Any other license that applies to the same work\nshall take precedence over this one.\n\nTo the extent permitted by applicable law, the works covered by this\nlicense are provided \"as is\" and do not come with any warranty except\nwhere otherwise explicitly stated.\n\n\n DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE\n TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION, AND MODIFICATION\n\n 0. You just DO WHAT THE FUCK YOU WANT TO.\n
Do what the fuck you want to with my software, and if shit breaks, DEAL WITH IT.
"},{"location":"ipfs.html","title":"IPFS","text":"IPFS is a p2p protocol that makes it easy to share many sorts of data. The hydrus client can communicate with an IPFS daemon to send and receive files.
You can read more about IPFS from their homepage, or this guide that explains its various rules in more detail.
For our purposes, we only need to know about these concepts:
Note there is now a nicer desktop package here. I haven't used it, but it may be a nicer intro to the program.
Get the prebuilt executable here. Inside should be a very simple 'ipfs' executable that does everything. Extract it somewhere and open up a terminal in the same folder, and then type:
ipfs init
ipfs daemon
The IPFS exe should now be running in that terminal, ready to respond to requests:
You can kill it with Ctrl+C and restart it with the ipfs daemon
call again (you only have to run ipfs init
once).
When it is running, opening this page should download and display an example 'Hello World!' file from ~~~across the internet~~~.
Your daemon listens for other instances of ipfs using port 4001, so if you know how to open that port in your firewall and router, make sure you do.
"},{"location":"ipfs.html#connecting","title":"connecting your client","text":"IPFS daemons are treated as services inside hydrus, so go to services->manage services->remote->ipfs daemons and add in your information. Hydrus uses the API port, default 5001, so you will probably want to use credentials of 127.0.0.1:5001
. You can click 'test credentials' to make sure everything is working.
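If you want to check the API port yourself before adding it to hydrus, a short script can ask the daemon for its version. This sketch assumes a daemon that exposes the usual HTTP RPC API with a `version` command under `/api/v0/`, reached with a POST request. The helper names are made up, and it is not what the 'test credentials' button does internally.

```python
import json
import urllib.request


def api_url(host="127.0.0.1", port=5001, command="version"):
    """Build the URL for a command on the daemon's HTTP API."""
    return f"http://{host}:{port}/api/v0/{command}"


def daemon_version(host="127.0.0.1", port=5001, timeout=5):
    """Ask the daemon for its version string (requires a running daemon)."""
    request = urllib.request.Request(api_url(host, port), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["Version"]


if __name__ == "__main__":
    try:
        print("daemon version:", daemon_version())
    except OSError as error:
        # Connection refused, timeouts and similar network errors land here.
        print("could not reach the daemon:", error)
```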
Thereafter, you will get the option to 'pin' and 'unpin' from a thumbnail's right-click menu, like so:
This works like hydrus's repository uploads--it won't happen immediately, but instead will be queued up at the pending menu. Commit all your pins when you are ready:
Notice how the IPFS icon appears on your pending and pinned files. You can search for these files using 'system:file service'.
Unpin works the same as pin, just like a hydrus repository petition.
Right-clicking any pinned file will give you a new 'share' action:
Which will put it straight in your clipboard. In this case, it is QmP6BNvWfkNf74bY3q1ohtDZ9gAmss4LAjuFhqpDPQNm1S.
If you want to share a pinned file with someone, you have to tell them this multihash. They can then:
http://127.0.0.1:8080/ipfs/[multihash]
http://ipfs.io/ipfs/[multihash]
If you have many files to share, IPFS also supports directories, and now hydrus does as well. IPFS directories use the same sorts of multihash as files, and you can download them into the hydrus client using the same pages->new download popup->an ipfs multihash menu entry. The client will detect the multihash represents a directory and give you a simple selection dialog:
You may recognise those hash filenames--this example was created by hydrus, which can create ipfs directories from any selection of files from the same right-click menu:
Hydrus will pin all the files and then wrap them in a directory, showing its progress in a popup. Your current directory shares are summarised on the respective services->review services panel:
"},{"location":"ipfs.html#additional_links","title":"additional links","text":"If you find you use IPFS a lot, here are some add-ons for your web browser, as recommended by /tech/:
This script changes all bare ipfs hashes into clickable links to the ipfs gateway (on page loads):
These redirect all gateway links to your local daemon when it's on. They work well with the previous script:
You can launch the program with several different arguments to alter core behaviour. If you are not familiar with this, you are essentially putting additional text after the launch command that runs the program. You can run this straight from a terminal console (usually good to test with), or you can bundle it into an easy shortcut that you only have to double-click. An example of a launch command with arguments:
C:\\Hydrus Network\\hydrus_client.exe -d=\"E:\\hydrus db\" --no_db_temp_files\n
You can also add --help to your program path, like this:
hydrus_client.py --help
hydrus_server.exe --help
./hydrus_server --help
This gives you a full listing of all the arguments below. However, it will not work with the built hydrus_client executables, which are bundled as non-console programs and will not give you text output to any console they are launched from. As hydrus_client.exe is the most commonly run version of the program, here is the list, with some more help about each command:
"},{"location":"launch_arguments.html#-d_db_dir_--db_dir_db_dir","title":"-d DB_DIR, --db_dir DB_DIR
","text":"Lets you customise which directory hydrus uses as its base database directory. This is install_dir/db by default, but many advanced deployments will move this around, as described here. When an argument takes a complicated value like a path that could itself include whitespace, you should wrap it in quote marks, like this:
-d=\"E:\\my hydrus\\hydrus db\"\n
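To see why the quote marks matter, here is a rough sketch using Python's `shlex`, which splits a command line the way a POSIX shell would. Windows consoles split arguments by slightly different rules, but the effect of quoting a path with a space in it is the same. The forward-slash path and the `split_command` helper are just for illustration.

```python
import shlex

quoted = 'hydrus_client -d="E:/hydrus db" --no_db_temp_files'
unquoted = 'hydrus_client -d=E:/hydrus db --no_db_temp_files'


def split_command(command):
    # Split a command line into the separate arguments the program would receive
    return shlex.split(command)


# With quotes, the whole path stays inside one argument
print(split_command(quoted))
# Without quotes, the path is cut at the space and 'db' becomes a stray argument
print(split_command(unquoted))
```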
"},{"location":"launch_arguments.html#--temp_dir_temp_dir","title":"--temp_dir TEMP_DIR
","text":"This tells all aspects of the client, including the SQLite database, to use a different path for temp operations. This would be by default your system temp path, such as:
C:\\Users\\You\\AppData\\Local\\Temp\n
But you can also check it in help->about. A handful of database operations (PTR tag processing, vacuums) require a lot of free space, so if your system drive is very full, or you have unusual ramdisk-based temp storage limits, you may want to relocate to another location or drive.
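If you are not sure how much room your temp location has, a quick check along these lines may help. It is only a sketch: it reports the system temp path that Python sees, which should normally match the default described above but will not reflect any `--temp_dir` override, and the helper name is made up.

```python
import shutil
import tempfile


def temp_dir_free_gb(path=None):
    # Fall back to the system temp path when no explicit path is given
    path = path or tempfile.gettempdir()
    return shutil.disk_usage(path).free / (1024 ** 3)


print(tempfile.gettempdir(), round(temp_dir_free_gb(), 1), 'GB free')
```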
"},{"location":"launch_arguments.html#--db_journal_mode_waltruncatepersistmemory","title":"--db_journal_mode {WAL,TRUNCATE,PERSIST,MEMORY}
","text":"Change the journal mode of the SQLite database. The default is WAL, which works great for almost all SSD drives, but if you have a very old or slow drive, or if you encounter 'disk I/O error' errors on Windows with an NVMe drive, try TRUNCATE. Full docs are here.
Briefly:
--db_transaction_commit_period DB_TRANSACTION_COMMIT_PERIOD
","text":"Change the regular duration at which any database changes are committed to disk. By default this is 30 (seconds) for the client, and 120 for the server. Minimum value is 10. Typically, if hydrus crashes, it may 'forget' what happened up to this duration on the next boot. Increasing the duration will result in fewer overall 'commit' writes during very heavy work that makes several changes to the same database pages (read up on WAL mode for more details here), but it will increase commit time and memory/storage needs. Note that changes can only be committed after a job is complete, so if a single job takes longer than this period, changes will not be saved until it is done.
"},{"location":"launch_arguments.html#--db_cache_size_db_cache_size","title":"--db_cache_size DB_CACHE_SIZE
","text":"Change the size of the cache SQLite will use for each db file, in MB. By default this is 256, for 256MB, which for the four main client db files could mean an absolute 1GB peak use if you run a very heavy client and perform a long period of PTR sync. This does not matter so much (nor should it be fully used) if you have a smaller client.
"},{"location":"launch_arguments.html#--db_synchronous_override_0123","title":"--db_synchronous_override {0,1,2,3}
","text":"Change the rules governing how SQLite writes committed changes to your disk. The hydrus default is 1 with WAL, 2 otherwise.
A user has written a full guide on this value here! SQLite docs here.
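For the curious, the journal mode, cache size, and synchronous options above correspond to standard SQLite pragmas. The following is a minimal sketch of setting them on a connection with Python's `sqlite3` module. How hydrus actually applies them internally is an assumption here, and `open_tuned` is a hypothetical helper.

```python
import sqlite3

ALLOWED_JOURNAL_MODES = {'WAL', 'TRUNCATE', 'PERSIST', 'MEMORY'}


def open_tuned(path, journal_mode='WAL', synchronous=1, cache_size_mb=256):
    if journal_mode not in ALLOWED_JOURNAL_MODES:
        raise ValueError(f'unsupported journal mode: {journal_mode}')
    conn = sqlite3.connect(path)
    # Pragmas cannot take bound parameters, so the validated values are formatted in
    conn.execute(f'PRAGMA journal_mode = {journal_mode}')
    conn.execute(f'PRAGMA synchronous = {int(synchronous)}')
    # A negative cache_size is a size in KiB rather than a page count
    conn.execute(f'PRAGMA cache_size = {-int(cache_size_mb) * 1024}')
    return conn
```

With four database files each allowed 256MB of cache, the worst case is 4 x 256MB = 1GB, which is where the peak figure above comes from.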
"},{"location":"launch_arguments.html#--no_db_temp_files","title":"--no_db_temp_files
","text":"When SQLite performs very large queries, it may spool temporary table results to disk. These go in your temp directory. If your temp dir is slow but you have a ton of memory, set this to never spool to disk, as here.
"},{"location":"launch_arguments.html#--boot_debug","title":"--boot_debug
","text":"Prints additional debug information to the log during the bootup phase of the application.
"},{"location":"launch_arguments.html#--profile_mode","title":"--profile_mode
","text":"This starts the program with 'Profile Mode' turned on, which captures the performance of boot functions. This is also a way to get Profile Mode on the server, although support there is very limited.
"},{"location":"launch_arguments.html#--win_qt_darkmode_test","title":"--win_qt_darkmode_test
","text":"Windows only, client only: This starts the program with Qt's 'darkmode' detection enabled, as here, set to 1 mode. It will override any existing qt.conf, so it is only for experimentation. We are going to experiment more with the 2 mode, but that locks the style to windows
, and can't handle switches between light and dark mode.
The server supports the same arguments. It also takes an optional positional argument of 'start' (start the server, the default), 'stop' (stop any existing server), or 'restart' (do a stop, then a start), which should go before any of the above arguments.
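As an illustration of an optional positional argument that comes before the other options, here is a small `argparse` sketch. It is not the server's real parser; it only models the 'start'/'stop'/'restart' behaviour described above, with two of the documented options added for context.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog='hydrus_server')
    # Optional positional: defaults to 'start' when nothing is given
    parser.add_argument('action', nargs='?', choices=['start', 'stop', 'restart'], default='start')
    parser.add_argument('-d', '--db_dir')
    parser.add_argument('--no_db_temp_files', action='store_true')
    return parser


args = build_parser().parse_args(['restart', '--db_dir', '/srv/hydrus db'])
print(args.action, args.db_dir)
```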
"},{"location":"local_booru.html","title":"local booru","text":"Warning
This was a fun project, but it never advanced beyond a prototype. The future of this system is other people's nice applications plugging into the Client API.
The hydrus client has a simple booru to help you share your files with others over the internet.
First of all, this is hosted from your client, which means other people will be connecting to your computer and fetching files you choose to share from your hard drive. If you close your client or shut your computer down, the local booru will no longer work.
"},{"location":"local_booru.html#setting_up","title":"how to do it","text":"First of all, turn the local booru server on by going to services->manage services and giving it a port:
It doesn't matter what you pick, but make it something fairly high. When you ok that dialog, the client should start the booru. You may get a firewall warning.
Then right click some files you want to share and select share->local booru. This will throw up a small dialog, like so:
This lets you enter an optional name, which titles the share and helps you keep track of it, an optional text, which lets you say some words or html to the people you are sharing with, and an expiry, which lets you determine if and when the share will no longer work.
You can also copy either the internal or external link to your clipboard. The internal link (usually starting something like http://127.0.0.1:45866/
) works inside your network and is great just for testing, while the external link (starting http://[your external ip address]:[external port]/
) will work for anyone around the world, as long as your booru's port is being forwarded correctly.
If you use a dynamic-ip service like No-IP, you can replace your external IP with your redirect hostname. You have to do it by hand right now, but I'll add a way to do it automatically in future.
Danger
Note that anyone with the external link will be able to see your share, so make sure you only share links with people you trust.
"},{"location":"local_booru.html#port_forwarding","title":"forwarding your port","text":"Your home router acts as a barrier between the computers inside the network and the internet. Those inside can see out, but outsiders can only see what you tell the router to permit. Since you want to let people connect to your computer, you need to tell the router to forward all requests of a certain kind to your computer, and thus your client.
If you have never done this before, it can be a headache, especially doing it manually. Luckily, a technology called UPnP makes it a ton easier, and this is how your Skype or Bittorrent clients do it automatically. Not all routers support it, but most do. You can have hydrus try to open a port this way back on services->manage services. Unless you know what you are doing and have a good reason to make them different, you might as well keep the internal and external ports the same.
Once you have it set up, the client will try to make sure your router keeps that port open for your client. If it all works, you should see the new mapping appear in your services->manage local upnp dialog, which lists all your router's current port mappings.
If you want to test that the port forward is set up correctly, going to http://[external ip]:[external port]/
should give a little html just saying hello. Your ISP might not allow you to talk to yourself, though, so ask a friend to try if you are having trouble.
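If you would rather script the check, a plain TCP connection test is enough to see whether anything answers on the port. This is a sketch with a hypothetical helper name. Running it from inside your own network against your external IP is subject to the same 'talking to yourself' caveat as above, so a result of False from home does not prove the forward is broken.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    # True if a TCP connection to host:port succeeds within the timeout
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_open('127.0.0.1', 45866)` checks the internal side, using the example port from earlier.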
If you still do not understand what is going on here, this is a good article explaining everything.
If you do not like UPnP or your router does not support it, you can set the port forward up manually, but I encourage you to keep the internal and external port the same, because absent a 'upnp port' option, the 'copy external share link' button will use the internal port.
"},{"location":"local_booru.html#example","title":"so, what do you get?","text":"The html layout is very simple:
It uses a very similar stylesheet to these help pages. If you would like to change the style, have a look at the html and then edit install_dir/static/local_booru_style.css. The thumbnails will be the same size as in your client.
"},{"location":"local_booru.html#editing_shares","title":"editing an existing share","text":"You can review all your shares on services->review services, under local->booru. You can copy the links again, change the title/text/expiration, and delete any shares you don't want any more.
"},{"location":"local_booru.html#future","title":"future plans","text":"This was a fun project, but it never advanced beyond a prototype. The future of this system is other people's nice applications plugging into the Client API.
"},{"location":"petitionPractices.html","title":"Petitions practices","text":"This document exists to give a rough idea what to do in regard to the PTR to avoid creating unnecessary work for the janitors.
"},{"location":"petitionPractices.html#general_practice","title":"General practice","text":"Kindly avoid creating unnecessary work. Create siblings for underscore and non-namespaced/namespaced versions. Petition for deletion if they are wrong. Providing a reason outside of the stock choices helps the petition get accepted. If, for whatever reason, you have some mega job that needs doing, it's often a good idea to talk to a janitor instead, since we can just go ahead and do the job directly without having to deal with potentially tens of petitions because of how Hydrus splits them on the server. An example that we often come across is the removal of the awful Sankaku URLs that are almost everywhere these days due to people using a faulty parser. It's a pretty easy search and delete for a janitor, but a lot of annoying clicking if dealt with as a petition since one big petition can be split out to God-only-knows-how many.
Eventually the PTR janitors will get tools to replace various bad but correct tags on the server itself. These include underscored, wrong or no namespace, common misspelling, wrong locale, and so on. Since we're going to have to do the job eventually anyway there's not much of a point making us do it twice by petitioning the existing bad but correct tags. Just sibling them and leave them be for now.
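As a rough illustration of the 'just sibling them' advice for underscored tags, the sketch below proposes a sibling pair for a tag that contains underscores. The helper is hypothetical and deliberately simple; namespace and spelling fixes need human judgement and are not attempted.

```python
def suggest_sibling(tag):
    # Replace underscores with spaces and collapse any repeated whitespace
    cleaned = ' '.join(tag.replace('_', ' ').split())
    if cleaned == tag:
        return None  # nothing to sibling
    return (tag, cleaned)  # (bad but correct tag, preferred tag)


print(suggest_sibling('long_hair'))
```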
"},{"location":"petitionPractices.html#ambiguity","title":"Ambiguity","text":"Don't make additions involving ambiguous tags. hibiki
-> character:hibiki (kantai collection)
is bad since there's far more than one character with that name. There's quite a few wrongly tagged images because of things like this. Petitioning the deletion of such a bad sibling is good.
Anything that's covered by system predicates. Siblinging these is unnecessary and parenting pointless. There's no harm leaving them be aside from crowding the tag list but there's no harm to deleting them either.
system:dimensions
covers most everything related to resolution and aspect ratios. medium:high resolution
, 4:3 aspect ratio
, and pixel count.
system:duration
for whether something has duration (is a video or animated gif/png/whatever), or is a still image.
system:has audio
for if an image has audio or not. system:has duration + system:no audio
replaces video with no sound
as an example.
system:filesize
for things like huge filesize
.
system:filetype
for filetypes. Gif, webm, mp4, psd, and so on. Anything that Hydrus can recognise which is quite a bit.
Don't push parents for tags that are not top-level siblings. It makes tracking down potential issues hard.
Only push parents for relations that are literally always true, no exceptions. character:james bond
-> series:james bond
is a good example because James Bond always belongs to that series. -> gender:male
is bad because an artist might decide to draw a genderbent piece of art. Similarly -> person:pierce brosnan
is bad because there have been other actors for the character.
List of some bad parents to character:
tags as an example: - species:
due to the various -zations (humanization, animalization, mechanization). - creator:
since just about anybody can draw art of the character. - gender:
Since genderswap
and variations exist. - Any form of physical characteristics such as hair or eye colour, hair length, clothing and accessories, etc.
Translations should be siblinged to what the closest in-use romanised tag is if there's no proper translation. If the tag is ambiguous, such as \u97ff
or \u30d2\u30d3\u30ad
which means hibiki
, just sibling them to the ambiguous tag. The tag can then later on be deleted and replaced by a less ambiguous tag. On the other hand, \u97ff(\u8266\u968a\u3053\u308c\u304f\u3057\u3087\u3093)
straight up means hibiki (kantai collection)
and can safely be siblinged to the proper character:
tag. Do the same for subjective tags. \u9b45\u60d1\u306e\u3075\u3068\u3082\u3082
can be translated to bewitching thighs
. \u307e\u3063\u305f\u304f\u3001\u99c6\u9010\u8266\u306f\u6700\u9ad8\u3060\u305c!!
straight up translates to Geez, destroyers are the best!!
, which does not contain much usable information for Hydrus currently. These can then either be siblinged down to an unsubjective tag (thighs
) if there's objective information in the tag, deleted if purely subjective, or deleted and replaced if ambiguous.
tl;dr
Using a trustworthy VPN for all your remotely fun internet traffic is a good idea. It is cheap and easy these days, and it offers multiple levels of general protection.
I have tried very hard to ensure the hydrus network servers respect your privacy. They do not work like normal websites, and the amount of information your client will reveal to them is very limited. For most general purposes, normal users can rest assured that their activity on a repository like the Public Tag Repository (PTR) is effectively completely anonymous.
You need an account to connect, but all that really means serverside is a random number with a random passcode. Your client tells nothing more to the server than the exact content you upload to it (e.g. tag mappings, which are a tag+file_hash pair). The server cannot help but be aware of your IP address to accept your network request, but in all but one situation--uploading a file to a file repository when the administrator has set to save IPs for DMCA purposes--it forgets your IP as soon as the job is done.
So that janitors can process petitions efficiently and correct mistakes, servers remember which accounts upload which content, but they do not communicate this to any place, and the memory only lasts for a certain time--after which the content is completely anonymised. The main potential privacy worries are over a malicious janitor or--more realistically, since the janitor UI is even more buggy and feature-poor than the hydrus front-end!--a malicious server owner or anyone else who gains raw access to the server's raw database files or its code as it operates. Even in the case where you cannot trust the server you are talking to, hydrus should be fairly robust, simply because the client does not say much to the server, nor that often. The only realistic worries, as I talk about in detail below, are if you actually upload personal files or tag personal files with real names. I can't do much about being Anon if you (accidentally or not), declare who you are.
So, in general, if you are on a good VPN and tagging anime babes from boorus, I think we are near perfect on privacy. That said, our community is rightly constantly thinking about this topic, so in the following I have tried to go into exhaustive detail. Some of the vulnerabilities are impractical and esoteric, but if nothing else it is fun to think about. If you can think of more problems, or decent mitigations, let me know!
"},{"location":"privacy.html#https_certificates","title":"https certificates","text":"Hydrus servers only communicate in https, so anyone who is able to casually observe your traffic (say your roommate cracked your router, or the guy running the coffee shop whose wifi you are using likes to snoop) should not ever be able to see what data you are sending or receiving. If you do not use a VPN, they will be able to see that you are talking to the repository (and the repository will technically see who you are, too, though as above, it normally isn't interested). Someone more powerful, like your ISP or Government, may be able to do more:
If you just start a new server yourself: When you first make a server, the 'certificate' it creates to enable https is a low quality one. It is called 'self-signed' because it is only endorsed by itself and it is not tied to a particular domain on the internet that everyone agrees on via DNS. Your traffic to this server is still encrypted, but an advanced attacker who stands between you and the server could potentially perform what is called a man-in-the-middle attack and see your traffic.
This problem is fairly mitigated by using a VPN, since even if someone were able to MitM your connection, they know no more than your VPN's location, not your IP.
A future version of the network will further mitigate this problem by having you enter unverified certificates into a certificate manager and then compare to that store on future requests, to try to detect if a MitM attack is occurring.
If the server is on a domain and now uses a proper verified certificate: If the admin hosts the server on a website domain (rather than a raw IP address) and gets a proper certificate for that domain from a service like Let's Encrypt, they can swap that into the server and then your traffic should be protected from any eavesdropper. It is still good to use a VPN to further obscure who you are, including from the server admin. You can check how good a server's certificate is by loading its base address in the form https://host:port
into your browser. If it has a nice certificate--like the PTR--the welcome page will load instantly. If it is still on self-signed, you'll get one of those 'can't show this page unless you make an exception' browser error pages before it will show.
An account has two hex strings, like this:
Access key: 4a285629721ca442541ef2c15ea17d1f7f7578b0c3f4f5f2a05f8f0ab297786f
This is in your services->manage services panel, and acts like a password. Keep this absolutely secret--only you know it, and no one else ever needs to. If the server has not had its code changed, it does not actually know this string, but it stores special data that lets it verify it when you 'log in'.
Account ID: 207d592682a7962564d52d2480f05e72a272443017553cedbd8af0fecc7b6e0a
This can be copied from a button in your services->review services panel, and acts a bit like a semi-private username. Only janitors should ever have access to this. If you ever want to contact the server admin about an account upgrade or similar, they will need to know this so they can load up your account and alter it.
When you generate a new account, the client first asks the server for a list of available auto-creatable account types, then asks for a registration token for one of them, then uses the token to generate an access key. The server is never told anything about you, and it forgets your IP address as soon as it finishes talking to you.
Your account also stores a bandwidth use record and some miscellaneous data such as when the account was created, if and when it expires, what permissions and bandwidth rules it has, an aggregate score of how often it has petitions approved rather than denied, and whether it is currently banned. I do not think someone inspecting the bandwidth record could figure out what you were doing based on byte counts (especially as with every new month the old month's bandwidth records are compressed to just one number) beyond the rough time you synced and whether you have done much uploading. Since only a janitor can see your account and could feasibly attempt to inspect bandwidth data, they would already know this information.
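The idea above that the server 'does not actually know' your access key but can still verify it is the same idea as storing a password hash. Here is a minimal sketch of that general technique. The use of SHA-256 and all the function names are assumptions for illustration; this page does not say which scheme the hydrus server really uses.

```python
import hashlib
import hmac
import secrets


def new_access_key():
    # 32 random bytes, shown as 64 hex characters like the example above
    return secrets.token_hex(32)


def stored_verifier(access_key):
    # What a server could keep instead of the key itself (assumed scheme: SHA-256)
    return hashlib.sha256(bytes.fromhex(access_key)).hexdigest()


def verify(access_key, verifier):
    # Constant-time comparison, so timing does not leak how close a guess was
    return hmac.compare_digest(stored_verifier(access_key), verifier)
```

Someone who steals only the stored verifier cannot read the access key back out of it, which is the point of the design.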
"},{"location":"privacy.html#downloading","title":"downloading","text":"When you sync with a repository, your client will download and then keep up to date with all the metadata the server knows. This metadata is downloaded the same way by all users, and it comes in a completely anonymous format. The server does not know what you are interested in, and no one who downloads knows who uploaded what. Since the client regularly updates, a detailed analysis of the raw update files will reveal roughly when a tag or other row was added or deleted, although that timestamp is no more precise than the duration of the update period (by default, 100,000 seconds, or a little over a day).
Your client will never ask the server for information about a particular file or tag. You download everything in generic chunks, form a local index of that information, and then all queries are performed on your own hard drive with your own CPU.
By just downloading, even if the server owner were to identify you by your IP address, all they know is that you sync. They cannot tell anything about your files.
In the case of a file repository, your client downloads all the thumbnails automatically, but then you download actual files separately as you like. The server does not log which files you download.
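To make the timestamp granularity mentioned at the start of this section concrete: with the default update period of 100,000 seconds, an observer can only place a row somewhere inside a window of about 27.8 hours. The sketch below assumes windows aligned to multiples of the period, which is a simplification; the real server's alignment may differ.

```python
UPDATE_PERIOD = 100_000  # seconds, the default quoted above


def update_window(timestamp, period=UPDATE_PERIOD):
    # Return the (start, end) of the update window a timestamp falls into
    start = (timestamp // period) * period
    return start, start + period


print(update_window(250_000), round(UPDATE_PERIOD / 3600, 1), 'hours')
```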
"},{"location":"privacy.html#uploading","title":"uploading","text":"When you upload, your account is temporarily linked to the rows of content you add. This is so janitors can group petitions by who makes them, undo large mistakes easily, and even leave you a brief message (like \"please stop adding those clothing siblings\") for your client to pick up the next time it syncs your account. After the temporary period is over, all submissions are anonymised. So, what are the privacy concerns with that? Isn't the account 'Anon'?
Privacy can be tricky. Hydrus tech is obviously far, far better than anything normal consumers use, but here I believe are the remaining barriers to pure Anonymity, assuming someone with resources was willing to put a lot of work in to attack you:
Note
I am using the PTR as the example since that is what most people are using. If you are uploading to a server run between friends, privacy is obviously more difficult to preserve--if there are only three users, it may not be too hard to figure out who is uploading the NarutoXSonichu diaperfur content! If you are talking to a server with a small group of users, don't upload anything crazy or personally identifying unless that's the point of the server.
"},{"location":"privacy.html#ip_address_across_network","title":"IP Address Across Network","text":"Attacker: ISP/Government.
Exposure: That you use the PTR.
Problem: Your IP address may be recorded by servers in between you and the PTR (e.g. your ISP/Government). Anyone who could convert that IP address and timestamp into your identity would know you were a PTR user.
Mitigation: Use a trustworthy VPN.
"},{"location":"privacy.html#ip_address_at_ptr","title":"IP Address At PTR","text":"Attacker: PTR administrator or someone else who has access to the server as it runs.
Exposure: Which PTR account you are.
Problem: I may be lying to you about the server forgetting IPs, or the admin running the PTR may have secretly altered its code. If the malicious admin were able to convert IP address and timestamp into your identity, they would obviously be able to link that to your account and thus its various submissions.
Mitigation: Use a trustworthy VPN.
"},{"location":"privacy.html#time_identifiable_uploads","title":"Time Identifiable Uploads","text":"Attacker: Anyone with an account on the PTR.
Exposure: That you use the PTR.
Problem: If a tag was added way before the file was public, then it is likely the original owner tagged it. An example would be if you were an artist and you tagged your own work on the PTR two weeks before publishing the work. Anyone who looked through the server updates carefully and compared to file publish dates, particularly if they were targeting you already, could notice the date discrepancy and know you were a PTR user.
Mitigation: Don't tag any file you plan to share if you are currently the only person who has any copies. Upload it, then tag it.
"},{"location":"privacy.html#content_identifiable_uploads","title":"Content Identifiable Uploads","text":"Attacker: Anyone with an account on the PTR.
Exposure: That you use the PTR.
Problem: All uploads are shared anonymously with other users, but if the content itself is identifying, you may be exposed. An example would be if there was some popular lewd file floating around of you and your girlfriend, but no one knew who was in it. If you decided to tag it with accurate 'person:' tags, anyone synced with the PTR, when they next looked at that file, would see those person tags. The same would apply if the file was originally private but then leaked.
Mitigation: Just like an imageboard, do not upload any personally identifying information.
"},{"location":"privacy.html#individual_account_cross-referencing","title":"Individual Account Cross-referencing","text":"Attacker: PTR administrator or someone else with access to the server database files after one of your uploads has been connected to your real identity, perhaps with a Time/Content Identifiable Upload as above.
Exposure: What you have been uploading recently.
Problem: If you accidentally tie your identity to an individual content row (could be as simple as telling an admin 'yes, I, person whose name you know, uploaded that sibling last week'), then anyone who can see which accounts uploaded what will obviously be able to see your other uploads.
Mitigation: Best practice is not to reveal specifically what you upload. Note that this vulnerability (an admin looking up what else you uploaded after they discover something else you did) is now well mitigated by the account history anonymisation as below (assuming the admin has not altered the code to disable it!). If the server is set to anonymise content after 90 days, then your account can only be identified from specific content rows that were uploaded in the past 90 days, and cross-references would also only see the last 90 days of activity.
"},{"location":"privacy.html#big_brain_individual_account_mapping_fingerprint_cross-referencing","title":"Big Brain Individual Account Mapping Fingerprint Cross-referencing","text":"Attacker: Someone who has access to tag/file favourite lists on another site and gets access to a hydrus repository that has been compromised to not anonymise history for a long duration.
Exposure: Which PTR account another website's account uses.
Problem: Someone who had raw access to the PTR database's historical account record (i.e. they had disabled the anonymisation routine below) and also had compiled some booru users' 'favourite tag/artist' lists and was very clever could try to cross reference those two lists and connect a particular PTR account to a particular booru account based on similar tag distributions. There would be many holes in the PTR record, since only the first account to upload a tag mapping is linked to it, but maybe it would be possible to get high confidence on a match if you have really distinct tastes. Favourites lists are probably decent digital fingerprints, and there may be a shadow of that in your PTR uploads, although I also think there are enough users uploading and 'competing' for saved records on different tags that each user's shadow would be too indistinct to really pull this off.
Mitigation: I am mostly memeing here. But privacy is tricky, and who knows what the scrapers of the future are going to do with all the cloud data they are sucking up. Even then, the historical anonymisation routine below now generally eliminates this threat, assuming the server has not been compromised to disable it, so it matters far less if its database files fall into bad hands in the future, but accounts on regular websites are already being aggregated by the big marketing engines, and this will only happen in more clever ways in future. I wouldn't be surprised if booru accounts are soon being connected to other online identities based on fingerprint profiles of likes and similar. Don't save your spicy favourites on a website, even if that list is private, since if that site gets hacked or just bought out one day, someone really smart could start connecting dots ten years from now.
"},{"location":"privacy.html#account_history","title":"account history anonymisation","text":"As the PTR moved to multiple accounts, we talked more about the potential account cross-referencing worries. The threats are marginal today, but they may be a real problem in future. If the server database files were to ever fall into bad hands, having a years-old record of who uploaded what is not excellent. Like the AOL search leak, that data may have unpleasant ramifications, especially to an intelligent scraper in the future. This historical record is also not needed for most janitorial work.
Therefore, hydrus repositories now completely anonymise all uploads after a certain delay. It works by assigning ownership of every file, mapping, or tag sibling/parent to a special 'null' account, so all trace that your account uploaded any of it is deleted. It happens by default 90 days after the content is uploaded, but it can be more or less depending on the local admin and janitors. You can see the current 'anonymisation' period under review services.
If you are a janitor with the ability to modify accounts based on uploaded content, you will see anything old will bring up the null account. It is specially labelled, so you can't miss it. You cannot ban or otherwise alter this account. No one can actually use it.
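The anonymisation idea above can be sketched as a single database update that hands old rows to the null account. Everything here is hypothetical: the `uploads` table, its columns, and the null account id of 0 are invented for the example and are not the server's real schema.

```python
import sqlite3

NULL_ACCOUNT_ID = 0  # hypothetical id for the special 'null' account
DAY = 86400  # seconds


def anonymise_old_uploads(conn, now, period_days=90):
    # Reassign every row older than the anonymisation period to the null account
    cutoff = now - period_days * DAY
    cursor = conn.execute(
        'UPDATE uploads SET account_id = ? WHERE uploaded_at < ? AND account_id != ?',
        (NULL_ACCOUNT_ID, cutoff, NULL_ACCOUNT_ID),
    )
    conn.commit()
    return cursor.rowcount  # how many rows lost their link to a real account
```

After this runs, nothing in an old row points back to the uploading account, which is why a janitor looking at old content only ever sees the null account.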
"},{"location":"reducing_lag.html","title":"reducing lag","text":""},{"location":"reducing_lag.html#intro","title":"hydrus is cpu and hdd hungry","text":"The hydrus client manages a lot of complicated data and gives you a lot of power over it. To add millions of files and tags to its database, and then to perform difficult searches over that information, it needs to use a lot of CPU time and hard drive time--sometimes in small laggy blips, and occasionally in big 100% CPU chunks. I don't put training wheels or limiters on the software either, so if you search for 300,000 files, the client will try to fetch that many.
Furthermore, I am just one unprofessional guy dealing with a lot of legacy code from when I was even worse at programming. I am always working to reduce lag and other inconveniences, and improve UI feedback when many things are going on, but there is still a lot for me to do.
In general, the client works best on snappy computers with low-latency hard drives where it does not have to constantly compete with other CPU- or HDD-heavy programs. Running hydrus on your games computer is no problem at all, but if you leave the client on all the time, then make sure under the options it is set not to do idle work while your CPU is busy, so your games can run freely. Similarly, if you run two clients on the same computer, you should have them set to work at different times, because if they both try to process 500,000 tags at once on the same hard drive, they will each slow to a crawl.
If you run on an HDD, keeping it defragged is very important, and good practice for all your programs anyway. Make sure you know what this is and that you do it.
"},{"location":"reducing_lag.html#maintenance_and_processing","title":"maintenance and processing","text":"I have attempted to offload most of the background maintenance of the client (which typically means repository processing and internal database defragging) to time when you are not using the client. This can either be 'idle time' or 'shutdown time'. The calculations for what these exactly mean are customisable in file->options->maintenance and processing.
If you run a quick computer, you likely don't have to change any of these options. Repositories will synchronise and the database will stay fairly optimal without you even noticing the work that is going on. This is especially true if you leave your client on all the time.
If you have an old, slower computer though, or if your hard drive is high latency, make sure these options are set for whatever is best for your situation. Turning off idle time completely is often helpful as some older computers are slow to even recognise--mid task--that you want to use the client again, or take too long to abandon a big task half way through. If you set your client to only do work on shutdown, then you can control exactly when that happens.
"},{"location":"reducing_lag.html#reducing_lag","title":"reducing search and general gui lag","text":"Searching for tags via the autocomplete dropdown and searching for files in general can sometimes take a very long time. It depends on many things. In general, the more predicates (tags and system:something) you have active for a search, and the more specific they are, the faster it will be.
You can also look at file->options->speed and memory. Increasing the autocomplete thresholds under tags->manage tag display and search is also often helpful. You can even force autocompletes to only fetch results when you manually ask for them.
Having lots of thumbnails open or downloads running can slow many things down. Check the 'pages' menu to see your current session weight. If it is about 50,000, or you have individual pages with more than 10,000 files or download URLs, try cutting down a bit.
"},{"location":"reducing_lag.html#profiles","title":"finally - profiles","text":"Programming is all about re-editing your first, second, third drafts of an idea. You are always going back to old code and adding new features or making it work better. If something is running slow for you, I can almost always speed it up or at least improve the way it schedules that chunk of work.
However, figuring out exactly why something is running slow or holding up the UI is tricky and often gives an unexpected result. I can guess what might be running inefficiently from reports, but what I really need to be sure is a profile, which drills down into every function of a job, counting how many times each is called and timing how long it takes. A profile for a single call looks like this.
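For a rough idea of what a profile contains, here is a minimal sketch using python's standard cProfile and pstats modules. It makes no claim about how the client gathers its own profiles; slow_sum and profile_call are hypothetical names made up for this illustration.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately naive work so the profiler has something to measure.
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    # Run func under the profiler and return (result, report text).
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep only the ten most expensive entries.
    stats.sort_stats('cumulative').print_stats(10)
    return result, stream.getvalue()


if __name__ == '__main__':
    result, report = profile_call(slow_sum, 100000)
    print(report)
```

The report is a table with one row per function, with columns such as ncalls (how many times it was called) and cumtime (time spent in it and everything it called).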
So, please let me know:
You can generate a profile by hitting help->debug->profiling->profile mode, which tells the client to generate profile information for almost all of its behind the scenes jobs. This can be spammy, so don't leave it on for a very long time (you can turn it off by hitting the help menu entry again).
Turn on profile mode, do the thing that runs slow for you (importing a file, fetching some tags, whatever), and then check your database folder (most likely install_dir/db) for a new 'client profile - DATE.log' file. This file will be filled with several sets of tables with timing information. Please send that whole file to me, or if it is too large, cut what seems important. It should not contain any personal information, but feel free to look through it.
There are several ways to contact me.
"},{"location":"running_from_source.html","title":"running from source","text":"I write the client and server entirely in python, which can run straight from source. It is getting simpler and simpler to run python programs like this, so don't be afraid of it. If none of the built packages work for you (for instance if you use Windows 8.1 or 18.04 Ubuntu (or equivalent)), it may be the only way you can get the program to run. Also, if you have a general interest in exploring the code or wish to otherwise modify the program, you will obviously need to do this.
"},{"location":"running_from_source.html#simple_setup_guide","title":"Simple Setup Guide","text":"There are now setup scripts that make this easy on Windows and Linux. You do not need any python experience.
"},{"location":"running_from_source.html#summary","title":"Summary:","text":"First of all, you will need to install Python. Get 3.10 or 3.11 here. During the install process, make sure it has something like 'Add Python to PATH' checked. This makes Python available to your Windows.
Linux: You should already have a fairly new python. Ideally, you want at least 3.9.
macOS: You should already have python of about the correct version.
If you are already on a very new version of python, that's ok--you might need to select the 'advanced' setup later on and choose the '(t)est' options. If you are stuck on a much older version of python, try the same thing, but with the '(o)lder' options (but I can't promise it will work!).
Then, get the hydrus source. The github repo is https://github.com/hydrusnetwork/hydrus. If you are familiar with git, you can just clone the repo to the location you want with git clone https://github.com/hydrusnetwork/hydrus, but if not, then just go to the latest release and download and extract the source code .zip somewhere. Make sure the directory has write permissions (e.g. don't put it in \"Program Files\"). Extracting straight to a spare drive, something like \"D:\\Hydrus Network\", is ideal.
We will call the base extract directory, the one with 'hydrus_client.py' in it, install_dir.
Mixed Builds
Don't mix and match build extracts and source extracts. The process that runs the code gets confused if there are unexpected extra .dlls in the directory. If you need to convert between built and source releases, perform a clean install.
If you are converting from one install type to another, make a backup before you start. Then, if it all goes wrong, you'll always have a safe backup to roll back to.
"},{"location":"running_from_source.html#built_programs","title":"Built Programs","text":"There are three special external libraries. You just have to get them and put them in the correct place:
Windows:
mpv
Get an mpv build archive; for the newer builds, you have to rename the dll to mpv-2.dll. Then open that archive and place the 'mpv-1.dll' or 'mpv-2.dll' into install_dir.
I have word that the newer mpv, the API version 2.1 that you have to rename to mpv-2.dll, will work on Qt5 and Windows 7. If this applies to you, feel free to have a play around with different versions here. You'll need the newer mpv choice in the setup-venv script too, which, depending on your situation, may not be possible.
SQLite3
Go to install_dir/static/build_files/windows and copy 'sqlite3.dll' into install_dir.
FFMPEG
Get a Windows build of FFMPEG here.
Extract the ffmpeg.exe into install_dir/bin.
Linux:
mpv
Try running apt-get install libmpv1 in a new terminal. You can type apt show libmpv1 to see your current version. Or, if you use a different package manager, try searching libmpv or libmpv1 on that.
SQLite3
No action needed.
FFMPEG
You should already have ffmpeg. Just type ffmpeg into a new terminal, and it should give a basic version response. If you somehow don't have ffmpeg, check your package manager.
If you run into trouble running newer versions of Qt6, which you will be setting up later, some users have fixed it by installing the packages libicu-dev and libxcb-cursor-dev. With apt that will be:
sudo apt-get install libicu-dev
sudo apt-get install libxcb-cursor-dev
macOS:
mpv
Unfortunately, mpv is not well supported in macOS yet. You may be able to install it in brew, but it seems to freeze the client as soon as it is loaded. Hydev is thinking about fixes here.
SQLite3
No action needed.
FFMPEG
You should already have ffmpeg.
Windows:
Double-click setup_venv.bat.
Linux:
The file is setup_venv.sh. You may be able to double-click it. If not, open a terminal in the folder and type:
./setup_venv.sh
If you do not have permission to execute the file, do this before trying again:
chmod +x setup_venv.sh
You will likely have to do the same on the other .sh files.
If you get an error about the venv failing to activate during setup_venv.sh, you may need to install venv especially for your system. The specific error message should help you out, but you'll be looking at something along the lines of apt install python3.10-venv.
If you like, you can run the setup_desktop.sh file to install a hydrus.desktop file to your applications folder. (Or check the template in install_dir/static/hydrus.desktop and do it yourself!)
macOS:
Double-click setup_venv.command.
If you do not have permission to run the .command file, then open a terminal on the folder and enter:
chmod +x setup_venv.command
You will likely have to do the same on the other .command files.
You may need to experiment with the advanced choices, especially if your macOS is a little old.
The setup will ask you some questions. Just type the letters it asks for and hit enter. Most users are looking at the (s)imple setup, but if your situation is unusual, try the (a)dvanced, which will walk you through the main decisions. Once ready, it should take a minute to download its packages and a couple minutes to install them. Do not close it until it is finished installing everything and says 'Done!'. If it seems like it hung, just give it time to finish.
If something messes up, or you want to make a different decision, just run the setup script again and it will reinstall everything. Everything these scripts do ends up in the 'venv' directory, so you can also just delete that folder to 'uninstall' the venv. It should just work on most normal computers, but let me know if you have any trouble.
Then run the 'setup_help' script to build the help. This isn't necessary, but it is nice to have it built locally. You can run this again at any time to rebuild the current help.
"},{"location":"running_from_source.html#running_it_1","title":"Running it","text":"Windows: Run 'hydrus_client.bat' to start the client.
Linux: Run 'hydrus_client.sh' to start the client. Don't forget to set chmod +x hydrus_client.sh if you need it.
macOS: Run 'hydrus_client.command' to start the client. Don't forget to set chmod +x hydrus_client.command if you need it.
The first start will take a little longer (it has to compile all the code into something your computer understands). Once up, it will operate just like a normal build with the same folder structure and so on.
Missing a Library
If the client fails to boot, it should place a 'hydrus_crash.log' in your 'db' directory or your desktop, or, if it got far enough, it may write the error straight to the 'client - date.log' file in your db directory.
If that error talks about a missing library, try reinstalling your venv. Are you sure it finished correctly? Do you need to run the advanced setup and select a different version of Qt?
Windows:
If you want to redirect your database or use any other launch arguments, then copy 'hydrus_client.bat' to 'hydrus_client-user.bat' and edit it, inserting your desired db path. Run this instead of 'hydrus_client.bat'. New git pull commands will not affect 'hydrus_client-user.bat'.
You probably can't pin your .bat file to your Taskbar or Start (and if you try and pin the running program to your taskbar, its icon may revert to Python), but you can make a shortcut to the .bat file, pin that to Start, and in its properties set a custom icon. There's a nice hydrus one in install_dir/static.
However, some versions of Windows won't let you pin a shortcut to a bat to the start menu. In this case, make a shortcut like this:
C:\\Windows\\System32\\cmd.exe /c \"C:\\hydrus\\Hydrus Source\\hydrus_client-user.bat\"
This is a shortcut to tell the terminal to run the bat; it should be pinnable to start. You can give it a nice name and the hydrus icon and you should be good!
Linux:
If you want to redirect your database or use any other launch arguments, then copy 'hydrus_client.sh' to 'hydrus_client-user.sh' and edit it, inserting your desired db path. Run this instead of 'hydrus_client.sh'. New git pull commands will not affect 'hydrus_client-user.sh'.
macOS:
If you want to redirect your database or use any other launch arguments, then copy 'hydrus_client.command' to 'hydrus_client-user.command' and edit it, inserting your desired db path. Run this instead of 'hydrus_client.command'. New git pull commands will not affect 'hydrus_client-user.command'.
To update, you do the same thing as for the extract builds. If you cloned the repo with git, run git pull as normal.
If you get a library version error when you try to boot, run the venv setup again. It is worth doing this anyway, every now and then, just to stay up to date.
"},{"location":"running_from_source.html#migrating_from_an_existing_install","title":"Migrating from an Existing Install","text":"Many users start out using one of the official built releases and decide to move to source. There is lots of information here about how to migrate the database, but for your purposes, the simple method is this:
If you never moved your database to another place and do not use the -d/--db_dir launch parameter
db directory.
db directory to the source.
If you moved your database to another location and use the -d/--db_dir launch parameter
db directory.
This is for advanced users only.
If you have never used python before, do not try this. If the easy setup scripts failed for you and you don't know what happened, please contact hydev before trying this, as the thing that went wrong there will probably go much more wrong here.
You can also set up the environment yourself. Inside the extract should be hydrus_client.py and hydrus_server.py. You will be treating these basically the same as the 'client' and 'server' executables--with the right environment, you should be able to launch them the same way and they take the same launch parameters as the exes.
Hydrus needs a whole bunch of libraries, so let's now set your python up. I strongly recommend you create a virtual environment. It is easy and doesn't mess up your system python.
You have to do this in the correct order! Do not switch things up. If you make a mistake, delete your venv folder and start over from the beginning.
To create a new venv environment:
If python3 doesn't work, use python.
python3 -m pip install virtualenv (if you need it)
python3 -m venv venv
source venv/bin/activate (CALL venv\\Scripts\\activate.bat in Windows cmd)
python -m pip install --upgrade pip
python -m pip install --upgrade wheel
venvs
That source venv/bin/activate line turns on your venv. You should see your terminal prompt note you are now in it. A venv is an isolated environment of python that you can install modules to without worrying about breaking something system-wide. Ideally, you do not want to install python modules to your system python.
This activate line will be needed every time you alter your venv or run the hydrus_client.py/hydrus_server.py files. You can easily tuck this into a launch script--check the easy setup files for examples.
On Windows Powershell, the command is .\\venv\\Scripts\\activate, but you may find the whole deal is done much easier in cmd than Powershell. When in Powershell, just type cmd to get an old fashioned command line. In cmd, the launch command is just venv\\scripts\\activate.bat, no leading period.
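If you are not sure whether the activate line worked, python can tell you. This is a small sketch, assuming the venv was made with python -m venv as above; in_virtualenv is a hypothetical helper name, not something from the hydrus code.

```python
import sys


def in_virtualenv():
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the python it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == '__main__':
    if in_virtualenv():
        print('venv is active: ' + sys.prefix)
    else:
        print('no venv active, pip would install to: ' + sys.prefix)
```

Run it with the same python command you plan to use for pip; if it reports no venv, activate first and try again.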
After you have activated the venv, you can use pip to install everything you need to it from the requirements.txt in the install_dir:
python -m pip install -r requirements.txt\n
If you need different versions of libraries, check the cut-up requirements.txts the 'advanced' easy-setup uses in install_dir/static/requirements/advanced. Check and compare their contents to the main requirements.txt to see what is going on. You'll likely need the newer OpenCV on Python 3.10, for instance.
Qt is the UI library. You can run PySide2, PySide6, PyQt5, or PyQt6. A wrapper library called qtpy allows this. The default is PySide6, but if it is missing, qtpy will fall back to an available alternative. For PyQt5 or PyQt6, you need an extra Chart module, so go:
python -m pip install qtpy PyQtChart PyQt5\n-or-\npython -m pip install qtpy PyQt6-Charts PyQt6\n
If you have multiple Qts installed, then select which one you want to use by setting the QT_API environment variable to 'pyside2', 'pyside6', 'pyqt5', or 'pyqt6'. Check help->about to make sure it loaded the right one.
If you want to set QT_API in a batch file, do this:
set QT_API=pyqt6
If you run <= Windows 8.1 or Ubuntu 18.04, you cannot run Qt6. Try PySide2 or PyQt5.
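If you would rather set QT_API from a python launcher than a batch file, something like the following sketch could work. The select_qt_api helper and the KNOWN_QT_APIS name are hypothetical; the only assumption about qtpy is that it reads QT_API when it is first imported, so the variable must be set before that happens.

```python
import os

# The four values listed above for QT_API.
KNOWN_QT_APIS = ('pyside2', 'pyside6', 'pyqt5', 'pyqt6')


def select_qt_api(name, environ=None):
    # Validate the name, store it in QT_API, and return the normalised value.
    if environ is None:
        environ = os.environ
    normalised = name.strip().lower()
    if normalised not in KNOWN_QT_APIS:
        raise ValueError('unknown Qt binding: ' + repr(name))
    environ['QT_API'] = normalised
    return normalised


if __name__ == '__main__':
    # Do this before anything imports qtpy.
    select_qt_api('pyqt6')
    print(os.environ['QT_API'])
```

Rejecting unknown names early avoids a silent fallback to some other installed Qt, which is the confusion that checking help->about is meant to catch.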
Qt compatibility notes
If you run into trouble running newer versions of Qt6 on Linux, some users have fixed it by installing the packages libicu-dev and libxcb-cursor-dev. With apt that will be:
sudo apt-get install libicu-dev
sudo apt-get install libxcb-cursor-dev
If you still have trouble with the default Qt6 version, or you rebuilt your venv and the newer version of Qt6 gives you problems, check out the setup_venv script language and the advanced requirements.txts files it relies on in install_dir/static/requirements/advanced. There should be several older version examples you can try out.
To install a specific version of a library with pip, activate your venv and then type something like pip install PySide6==6.3.1.
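To confirm which version actually ended up in your venv after a pinned install, a short check like this may help. It is only a sketch using the standard importlib.metadata module (python 3.8 or newer); installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package_name):
    # Return the installed version string, or None if the package is absent.
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == '__main__':
    for name in ('PySide6', 'PyQt6', 'qtpy'):
        print(name + ': ' + str(installed_version(name)))
```

Run it with the venv active, otherwise it reports on your system python instead.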
MPV is optional and complicated, but it is great, so it is worth the time to figure out!
As well as the python wrapper, 'python-mpv' (which is in the requirements.txt), you also need the underlying dev library. This is not mpv the program, but 'libmpv', often called 'libmpv1'.
For Windows, the dll builds are here, although getting a stable version can be difficult. Just put it in your hydrus base install directory. Check the links in the easy-setup guide above for good versions. You can also just grab the 'mpv-1.dll'/'mpv-2.dll' I bundle in my extractable Windows release.
If you are on Linux, you can usually get 'libmpv1' like so:
apt-get install libmpv1
On macOS, you should be able to get it with brew install mpv, but you are likely to find mpv crashes the program when it tries to load. Hydev is working on this, but it will probably need a completely different render API.
Hit help->about to see your mpv status. If you don't have it, it will present an error popup box with more info.
"},{"location":"running_from_source.html#sqlite","title":"SQLite","text":"If you can, update python's SQLite--it'll improve performance. The SQLite that comes with stock python is usually quite old, so you'll get a significant boost in speed. In some python deployments, the built-in SQLite is not compiled with neat features like Full Text Search (FTS) that hydrus needs.
On Windows, get the 64-bit sqlite3.dll here, and just drop it in your base install directory. You can also just grab the 'sqlite3.dll' I bundle in my extractable Windows release.
You may be able to update your SQLite on Linux or macOS with:
apt-get install libsqlite3-dev
python -m pip install pysqlite3
But as long as the program launches, it usually isn't a big deal.
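If you are curious what your python's SQLite looks like before and after an update, this sketch reports its version and any FTS compile options. It only inspects the standard sqlite3 module, not pysqlite3, and sqlite_report is a hypothetical helper name. Which FTS version hydrus wants is not stated here, so treat the output as a hint only; some builds also omit compile option reporting entirely.

```python
import sqlite3


def sqlite_report():
    # Ask the SQLite library python is linked against what it was built with.
    connection = sqlite3.connect(':memory:')
    try:
        options = [row[0] for row in connection.execute('PRAGMA compile_options')]
    finally:
        connection.close()
    return {
        'version': sqlite3.sqlite_version,
        'fts_options': sorted(option for option in options if 'FTS' in option),
    }


if __name__ == '__main__':
    report = sqlite_report()
    print('SQLite version: ' + report['version'])
    print('FTS compile options: ' + (', '.join(report['fts_options']) or 'none reported'))
```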
Extremely safe no way it can go wrong
If you want to update SQLite for your Windows system python install, you can also drop it into C:\\Program Files\\Python310\\DLLs or wherever you have python installed, and it'll update for all your python projects. You'll be overwriting the old file, so make a backup of the old one (I have never had trouble updating like this, however).
A user who made a Windows venv with Anaconda reported they had to replace the sqlite3.dll in their conda env at ~/.conda/envs/<envname>/Library/bin/sqlite3.dll
.
If you don't have FFMPEG in your PATH and you want to import anything more fun than jpegs, you will need to put a static FFMPEG executable in your PATH or the install_dir/bin directory. This should always point to a new build for Windows. Alternately, you can just copy the exe from one of my extractable Windows releases.
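The two places mentioned above can be checked from python. This is a sketch only: find_ffmpeg is a hypothetical helper, and the order it searches in (PATH first, then install_dir/bin) is an assumption for illustration, not a statement of how the client itself looks for FFMPEG.

```python
import os
import shutil


def find_ffmpeg(install_dir, path=None):
    # Look on PATH first (or on the given search path), then in install_dir/bin.
    on_path = shutil.which('ffmpeg', path=path)
    if on_path is not None:
        return on_path
    for name in ('ffmpeg', 'ffmpeg.exe'):
        candidate = os.path.join(install_dir, 'bin', name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == '__main__':
    found = find_ffmpeg(os.getcwd())
    print(found if found is not None else 'ffmpeg not found')
```

Run it from your install_dir; if it finds nothing, one of the two locations still needs an executable.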
Once you have everything set up, hydrus_client.py and hydrus_server.py should look for and run off client.db and server.db just like the executables. You can use the 'hydrus_client.bat/sh/command' scripts in the install dir or use them as inspiration for your own. In any case, you are looking at entering something like this into the terminal:
source venv/bin/activate\npython hydrus_client.py\n
This will use the 'db' directory for your database by default, but you can use the launch arguments just like for the executables. For example, this could be your client-user.sh file:
#!/bin/bash\n\nsource venv/bin/activate\npython hydrus_client.py -d=\"/path/to/database\"\n
"},{"location":"running_from_source.html#building_these_docs","title":"Building these Docs","text":"When running from source you may want to build the hydrus help docs yourself. You can also check the setup_help scripts in the install directory.
Almost everything you get through pip is provided as pre-compiled 'wheels' these days, but if you get an error about Visual Studio C++ when you try to pip something, you have two choices:
Option B is always the simpler. If opencv-headless as the requirements.txt specifies won't compile in Python 3.10, then try a newer version--there will probably be one of these new highly compatible wheels and it'll just work in seconds. Check my build scripts and various requirements.txts for ideas on what versions to try for your python etc...
If you are confident you need Visual Studio tools, then prepare for headaches. Although the tools are free from Microsoft, it can be a pain to get them through the official (and often huge) downloader installer from Microsoft. Expect a 5GB+ install with an eye-watering number of checkboxes that probably needs some stackexchange searches to figure out.
On Windows 10, Chocolatey has been the easy answer. Get it installed and use this one simple line:
choco install -y vcbuildtools visualstudio2017buildtools windows-sdk-10.0\n
Trust me, just do this, it will save a ton of headaches!
Update: On Windows 11, in 2023-01, I had trouble with the above. There's a couple '11' SDKs that installed ok, but the vcbuildtools stuff had unusual errors. I hadn't done this in years, so maybe they are broken for Windows 10 too! The good news is that a basic stock Win 11 install with Python 3.10 is fine getting everything on our requirements and even making a build without any extra compiler tech.
"},{"location":"running_from_source.html#additional_windows","title":"Additional Windows Info","text":"This does not matter much any more, but in the old days, building modules like lz4 and lxml was a complete nightmare, and hooking up Visual Studio was even more difficult. This page has a lot of prebuilt binaries--I have found it very helpful many times.
I have a fair bit of experience with Windows python, so send me a mail if you need help.
"},{"location":"running_from_source.html#my_code","title":"My Code","text":"I develop hydrus on and am most experienced with Windows, so the program is more stable and reasonable on that. I do not have as much experience with Linux or macOS, but I still appreciate and will work on your Linux/macOS bug reports.
My coding style is unusual and unprofessional. Everything is pretty much hacked together. If you are interested in how things work, please do look through the source and ask me if you don't understand something.
I'm constantly throwing new code together and then cleaning and overhauling it down the line. I work strictly alone. While I am very interested in detailed bug reports or suggestions for good libraries to use, I am not looking for pull requests or suggestions on style. I know a lot of things are a mess. Everything I do is WTFPL, so feel free to fork and play around with things on your end as much as you like.
"},{"location":"server.html","title":"running your own server","text":"Note
You do not need the server to do anything with hydrus! It is only for advanced users to do very specific jobs! The server is also hacked-together and quite technical. It requires a fair amount of experience with the client and its concepts, and it does not operate on a timescale that works well on a LAN. Only try running your own server once you have a bit of experience synchronising with something like the PTR and you think, 'Hey, I know exactly what that does, and I would like one!'
Here is a document put together by a user describing whether you want the server.
"},{"location":"server.html#intro","title":"setting up a server","text":"I will use two terms, server and service, to mean two distinct things:
/file or /update) that the hydrus client can plug into. A service might be a repository for a certain kind of data, the administration interface to manage what services run on a server, or anything else.
Setting up a hydrus server is easy compared to, say, Apache. There are no .conf files to mess about with, and everything is controlled through the client. When started, the server will place an icon in your system tray in Windows or open a small frame in Linux or macOS. To close the server, either right-click the system tray icon and select exit, or just close the frame.
The basic process for setting up a server is:
Let's look at these steps in more detail:
"},{"location":"server.html#start","title":"start the server","text":"Since the server and client have so much common code, I package them together. If you have the client, you have the server. If you installed in Windows, you can hit the shortcut in your start menu. Otherwise, go straight to 'hydrus_server' or 'hydrus_server.exe' or 'hydrus_server.py' in your installation directory. The program will first try to take port 45870 for its administration interface, so make sure that is free. Open your firewall as appropriate.
"},{"location":"server.html#setting_up_the_client","title":"set up the client","text":"In the services->manage services dialog, add a new 'hydrus server administration service' and set up the basic options as appropriate. If you are running the server on the same computer as the client, its hostname is 'localhost'.
In order to set up the first admin account and an access key, use 'init' as a registration token. This special registration token will only work to initialise this first super-account.
YOU'LL WANT TO SAVE YOUR ACCESS KEY IN A SAFE PLACE
If you lose your admin access key, there is no way to get it back, and if you are not sqlite-proficient, you'll have to restart from the beginning by deleting your server's database files.
If the client can't connect to the server, it is either not running or you have a firewall/port-mapping problem. If you want a quick way to test the server's visibility, just put https://host:port into your browser (make sure it is https! http will not work)--if it is working, your browser will probably complain about its self-signed https certificate. Once you add a certificate exception, the server should return some simple html identifying itself.
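The same visibility test can be scripted. This is a rough sketch with hypothetical helper names (build_server_url, fetch_server_banner); it skips certificate verification because the server's certificate is self-signed, which is only sensible when you are testing a server you control.

```python
import ssl
import urllib.request


def build_server_url(host, port):
    # The server only speaks https, so the scheme is fixed.
    return 'https://{}:{}/'.format(host, port)


def fetch_server_banner(host, port, timeout=5.0):
    # Fetch the raw response body without verifying the self-signed certificate.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    url = build_server_url(host, port)
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return response.read()


if __name__ == '__main__':
    # 45870 is the administration port mentioned earlier on this page.
    print(build_server_url('localhost', 45870))
```

Calling fetch_server_banner('localhost', 45870) against a running server should return the simple html described above; a connection error points to the same firewall or port-mapping problems.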
You should have a new submenu, 'administrate services', under 'services', in the client gui. This is where you control most server and service-wide stuff.
admin->your server->manage services lets you add, edit, and delete the services your server runs. Every time you add one, you will also be added as that service's first administrator, and the admin menu will gain a new entry for it.
"},{"location":"server.html#making_accounts","title":"making accounts","text":"Go admin->your service->create new accounts to create new registration tokens. Send the registration tokens to the users you want to give these new accounts. A registration token will only work once, so if you want to give several people the same account, they will have to share the access key amongst themselves once one of them has registered the account. (Or you can register the account yourself and send them all the same access key. Do what you like!)
Go admin->manage account types to add, remove, or edit account types. Make sure everyone has at least downloader (get_data) permissions so they can stay synchronised.
You can create as many accounts of whatever kind you like. Depending on your usage scenario, you may want to have all uploaders, one uploader and many downloaders, or just a single administrator. There are many combinations.
"},{"location":"server.html#have_fun","title":"???","text":"The most important part is to have fun! There are no losers on the INFORMATION SUPERHIGHWAY.
"},{"location":"server.html#profit","title":"profit","text":"I honestly hope you can get some benefit out of my code, whether just as a backup or as part of a far more complex system. Please mail me your comments as I am always keen to make improvements.
"},{"location":"server.html#backing_up","title":"btw, how to backup a repo's db","text":"All of a server's files and options are stored in its accompanying .db file and respective subdirectories, which are created on first startup (just like with the client). To backup or restore, you have two options:
server_install_dir/db/server_backup. When the operation is complete, you can ftp/batch-copy/whatever the server_backup folder wherever you like.
If you get to a point where you can no longer boot the repository, try running SQLite Studio and opening server.db. If the issue is simple--like manually changing the port number--you may be in luck. Send me an email if it is tricky.
Remember that everything is breaking all the time. Make regular backups, and you'll minimise your problems.
"},{"location":"support.html","title":"Financial Support","text":""},{"location":"support.html#support","title":"can I contribute to hydrus development?","text":"I do not expect anything from anyone. I'm amazed and grateful that anyone wants to use my software and share tags with others. I enjoy the feedback and work, and I hope to keep putting completely free weekly releases out as long as there is more to do.
That said, as I have developed the software, several users have kindly offered to contribute money, either as thanks for a specific feature or just in general. I kept putting the thought off, but I eventually got over my hesitance and set something up.
I find the tactics of most internet fundraising very distasteful, especially when they promise something they then fail to deliver. I much prefer the 'if you like me and would like to contribute, then please do, meanwhile I'll keep doing what I do' model. I support several 'put out regular free content' creators on Patreon in this way, and I get a lot out of it, even though I have no direct reward beyond the knowledge that I helped some people do something neat.
If you feel the same way about my work, I've set up a simple Patreon page here. If you can help out, it is deeply appreciated.
"},{"location":"wine.html","title":"running a client or server in wine","text":"Several Linux and macOS users have found success running hydrus with Wine. Here is a post from a Linux dude:
Some things I picked up on after extended use:
Installation process:
If you get the client running in Wine, please let me know how you get on!
"},{"location":"youDontWantTheServer.html","title":"You don't want the server","text":"The hydrus_server.exe/hydrus_server.py is the victim of many a misconception. You don't need to use the server to use Hydrus. The vast majority of features are contained in the client itself so if you're new to Hydrus, just use that.
The server is only really useful for a few specific cases which will not apply for the vast majority of users.
"},{"location":"youDontWantTheServer.html#the_server","title":"The server","text":"The Hydrus server doesn't really work as most people envision a server working. Rather than on-demand viewing, when you link with a Hydrus server, you synchronise a complete copy of all its data. For the tag repository, you download every single tag it has ever been told about. For the file repository, you download the whole file list, related file info, and every single thumbnail, which lets you browse the whole repository in your client in a regular search page--to view files in the media viewer, you need to download and import them specifically.
"},{"location":"youDontWantTheServer.html#you_dont_want_the_server_probably","title":"You don't want the server (probably)","text":"Do you want to remotely view your files? You don't want the server.
Do you want to host your files on another computer since your daily driver doesn't have a lot of storage space? You don't want the server.
Do you want to use multiple clients and have everything synced between them? You don't want the server.
Do you want to expose an API for Hydrus Web, Hydroid, or some other third-party tool? You don't want the server.
Do you want to share some files and/or tags in a small group of friends? You might actually want the server.
"},{"location":"youDontWantTheServer.html#the_options","title":"The options","text":"Now, you're not the first person to have any of the above ideas and some of the thinkers even had enough programming know-how to make something for it. Below is a list of some options, see this page for a few more.
"},{"location":"youDontWantTheServer.html#hydrus_web","title":"Hydrus Web","text":"