Merge pull request #1070 from floogulinc/mkdocs

Use mkdocs to generate documentation
Thank you again for this work
Hydrus Network Developer 2022-02-19 18:37:02 -06:00 committed by GitHub
commit c499059d98
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
273 changed files with 5664 additions and 4223 deletions


@@ -7,7 +7,7 @@ on:
jobs:
  build-client:
-    runs-on: [ubuntu-latest]
+    runs-on: ubuntu-latest
    steps:
      -
        name: Checkout
@@ -53,7 +53,7 @@ jobs:
          labels: ${{ steps.docker_meta.outputs.labels }}
  build-server:
-    runs-on: [ubuntu-latest]
+    runs-on: ubuntu-latest
    steps:
      -
        name: Checkout

21 .github/workflows/publish_docs.yml vendored Normal file

@@ -0,0 +1,21 @@
name: Publish Docs
on:
  push:
    branches:
      - master
concurrency:
  group: publish-docs
  cancel-in-progress: true
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: 3.x
      - run: pip install mkdocs-material mkdocs-redirects
      - run: mkdocs gh-deploy --force --config-file mkdocs-gh-pages.yml


@@ -18,6 +18,12 @@ jobs:
        id: setup_ffmpeg
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
+      -
+        name: Install mkdocs-material
+        run: python3 -m pip install mkdocs-material
+      -
+        name: Build docs to /help
+        run: mkdocs build -d help
      -
        name: Install PyOxidizer
        run: python3 -m pip install pyoxidizer
@@ -68,6 +74,13 @@ jobs:
        with:
          python-version: 3.8
          architecture: x64
+      -
+        name: Install mkdocs-material
+        run: pip install mkdocs-material
+      -
+        name: Build docs to /help
+        run: mkdocs build -d help
+        working-directory: hydrus
      #- name: Cache Qt
      #  id: cache-qt
      #  uses: actions/cache@v1
@@ -126,7 +139,7 @@ jobs:
          retention-days: 2
  build-windows:
-    runs-on: [windows-2019]
+    runs-on: windows-2019
    steps:
      -
        name: Checkout
@@ -145,6 +158,13 @@ jobs:
        with:
          python-version: 3.8
          architecture: x64
+      -
+        name: Install mkdocs-material
+        run: pip install mkdocs-material
+      -
+        name: Build docs to /help
+        run: mkdocs build -d help
+        working-directory: hydrus
      -
        name: Cache Qt
        id: cache_qt

6 .gitignore vendored

@@ -48,4 +48,8 @@ db/*.log
db/*.conf
db/client_files/
db/server_files/
-db/missing_and_invalid_files/
+db/missing_and_invalid_files/
+# docs builds
+site/
+help/


@@ -6,7 +6,7 @@ I am continually working on the software and try to put out a new release every
This github repository is currently a weekly sync with my home dev environment, where I work on hydrus by myself. **Feel free to fork and do whatever you like with my code, but please do not make pull requests.** The [issue tracker here on Github](https://github.com/hydrusnetwork/hydrus/issues) is active and run by blessed volunteer users. I am not active here on Github, and I have difficulty keeping up with social media in general, but I welcome feedback of any sort and will eventually catch up with and reply to email, the 8chan or Endchan, tumblr, twitter, or the discord.
-The client can do quite a lot! Please check out the help inside the release or [here](https://hydrusnetwork.github.io/hydrus/help), which includes a comprehensive getting started guide.
+The client can do quite a lot! Please check out the help inside the release or [here](https://hydrusnetwork.github.io/hydrus/), which includes a comprehensive getting started guide.
A rudimentary documentation for the [container](https://github.com/hydrusnetwork/hydrus/pkgs/container/hydrus) setup can be found [here](https://github.com/hydrusnetwork/hydrus/blob/master/static/build_files/docker/README.md).


@@ -1,3 +1,8 @@
---
hide:
- navigation
---
# Virtual Memory Under Linux
## Why does hydrus keep crashing under Linux when it has lots of virtual memory?
@@ -65,7 +70,7 @@ Set `vm.overcommit_memory=1` this prevents the OS from using a heuristic and it
#### What about swappiness?
Swappiness is a setting you might have seen, but it only determines Linux's desire to spend a little time moving memory you haven't touched in a while out of real memory and into virtual memory. It will not prevent the OOM condition; it just determines how much time is spent moving things into swap.
-# Why does my Linux system stutter or become unresponsive when hydrus has been running a while?
+## Why does my Linux system stutter or become unresponsive when hydrus has been running a while?
You are running out of pages because Linux releases I/O buffer pages only when a file is closed. The OS is therefore waiting for you to hit the watermark (as described in "why is hydrus crashing") before it starts freeing pages, which causes the chug. When content is written from memory to disk, the page is retained so that if you reread that part of the disk, the OS does not need to access the disk again; it just pulls it from the much faster memory. This is usually a good thing, but hydrus does not close its database files, so it eats up pages over time. This is really good for hydrus but sucks for the responsiveness of other apps, and it will cause hydrus to hold on to pages after a lengthy operation in anticipation of needing them again, even when it is thereafter idle. You need to set `vm.dirtytime_expire_seconds` to a lower value.
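As a minimal sketch of applying the tunables discussed above with `sysctl` (the values here are illustrative, not recommendations):

```
# apply at runtime (lost on reboot)
sudo sysctl -w vm.overcommit_memory=1
sudo sysctl -w vm.dirtytime_expire_seconds=1800

# persist across reboots; the drop-in file name is arbitrary
printf 'vm.overcommit_memory=1\nvm.dirtytime_expire_seconds=1800\n' | sudo tee /etc/sysctl.d/99-hydrus.conf
sudo sysctl -p /etc/sysctl.d/99-hydrus.conf
```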
@@ -84,7 +89,7 @@ https://www.kernel.org/doc/Documentation/sysctl/vm.txt
-# Why does everything become clunky for a bit if I have tuned all of the above settings?
+## Why does everything become clunky for a bit if I have tuned all of the above settings?
The kernel launches a process called `kswapd` to swap and reclaim memory pages; its behaviour is governed by the following two values

98 docs/PTR.md Normal file

@@ -0,0 +1,98 @@
---
title: PTR Guide
---
# PTR for Dummies
or *Myths and facts about the Public Tag Repository*
## What is the PTR?
Short for <b>P</b>ublic <b>T</b>ag <b>R</b>epository, a now community-managed repository of tags. Locally it acts as a tag service, just like `my tags`. At the time of writing 54 million files have tags on it. The PTR only stores the sha256 hash and tag mappings of a file, not the files themselves or any non-tag metadata. In other words: if you do not see it in the tag list then it is not stored.
Most of the things in this document also apply to [self-hosted servers](https://hydrusnetwork.github.io/hydrus/help/server.html), except for the tag guidelines.
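For illustration, the identifier the PTR keys everything to is the ordinary SHA-256 digest of the file's bytes, which you can compute yourself (the filename here is hypothetical):

```
sha256sum some_image.jpg
# prints a 64-character hex digest -- this is the hash the tag mappings point at
```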
## Connecting to the PTR
The easiest method is to use the built in function, found under `help -> add the public tag repository`. For adding it manually, if you so desire, read the Hydrus help document on [access keys](access_keys.md).
If you are starting out completely fresh you can also download a fully synced client [here](https://koto.reisen/quicksync/). Though possibly a little out of date (usually a couple of days or so), it will nonetheless save time. Some settings may differ from the defaults of an official installation.
Once you are connected, Hydrus will proceed to download and then process the update files. The progress of this can be seen under `services -> review services -> remote -> tag repositories -> public tag repository`. Here you can view its status, your account (the default account is a shared public account; currently only janitors and the administrator have personal accounts), tag status, and how synced you are. Being behind on the sync by a certain amount makes you unable to push tags and petitions until you are caught up again.
## How does it work?
For something to end up on the PTR it has to be pushed there. Tags can either be entered into the tag service manually by the user through the `manage tags` window, or be routed there by a parser when downloading files. See [parsing tags](https://hydrusnetwork.github.io/hydrus/help/getting_started_downloading.html). Once tags have been entered into the PTR tag service they are pending until pushed. This is indicated by the `pending ()` that will appear between `tags` and `help` in the menu bar. Here you can choose to either push your changes to the PTR or discard them.
- Tag additions are accepted automatically.
- Deleting (petitioning) tags requires janitor action.
- If a tag has been deleted from a file it will not be added again.
- Currently there is no way for a normal user to re-add a deleted tag. If it gets deleted then it is gone. A janitor can undelete tags manually.
- Adding and petitioning siblings and parents all require janitor action.
- The client always assumes the server approves any petition. If your petition gets rejected you won't know.
When making petitions it is important to remember that janitors are only human. We do not necessarily know everything about every niche. We do not necessarily have the files you are making changes for and we will only see a blank thumbnail if we do not have the file. Explain why you are making a petition. Try and keep the number of files manageable. If a janitor at any point is unsure if the petition is correct they are likely to deny the entire petition rather than risk losing good tags. Some users have pushed changes regarding hundreds of tags over thousands of files at once, but due to disregarding PTR tagging practices or being lazy with justification the petition has been denied entirely. Or they have just been plain wrong, trying to impose frankly stupid tagging methods.
Furthermore, if you are two weeks out of sync with PTR you are unable to push additions or deletions until you're back within the threshold.
---
Q: Does this automagically tag my files?
: A: No. Until we get machine learning based auto-tagging nothing is truly automatic. All tags on the PTR were uploaded by another user, so if nobody uploaded tags associated with the hash of your file it won't have any tags in the PTR.
Q: How good is the PTR at tagging [insert file format or thing from site here]?
: A: That depends largely on if there's a scrapable database of tags for whatever you're asking about. Anything that comes from a booru or site that supports tags is fairly likely to have something on the PTR. Original content on some obscure chan-style imageboard is less so.
Q: Help! My files don't have any tags! What do!?
: A: As stated above, some things are just very likely to not have any tags. It is also possible that the files have been altered by whichever service you downloaded from. Imgur, Reddit, Discord, and many other sites and services recompress images to save space which might give it a different hash even if it looks indistinguishable from the original file. Use one of the IQDB lookup programs linked in [Cuddle's wiki](https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/wiki/0-Hydrus-Apps-and-Scripts).
Q: Why is my database so big!? This can't be right.
: A: It is working as intended. The size is because you are literally downloading and processing the entire tag database and history of the PTR. It is done this way to ensure redundancy and privacy. Redundancy because anybody with an up-to-date PTR sync can just start their own. Privacy because nobody can tell what files you have since you are downloading the tags for everything the PTR has.
Q: Does that mean I can't do anything about the size?
: A: Correct. There are some plans to crunch the size through a few methods but there are a lot of other far more requested features being, well, requested. Speaking crassly, if you are bothered by the size requirement of the PTR you probably don't have a big enough library to really benefit and would be better off just using the IQDB script.
---
## Janitors
Janitors are the people that review petitions. You can meet us at the community [Discord](https://discord.gg/3H8UTpb) to ask questions or see us bitch about some of the silly stuff boorus and users cause to end up in the PTR.
## Tag Guidelines
These are a mix of standard practice used by various boorus and changes made by Hydrus Developer and PTR users, ratified by the janitors that actually have to manage all of this. The "full" document is viewable at [Cuddle's git repo](https://raw.githubusercontent.com/CuddleBear92/Hydrus-Presets-and-Scripts/master/tag%20guidelines). See Hydrus Developer's [thoughts on a public tagging schema](tagging_schema.html).
If you are looking to help out by tagging low tag-count files, remember to keep the tags objective. Start simple: for example, add the characters/persons and the big, obvious things in the image. Tagging every little thing and detail is a sure path to burnout.
If you are looking to petition removal of tags then it is preferable to sibling common misspellings, underscores, and defunct tags rather than deleting them outright. The exception is for ambiguous tags where it is better to delete and replace with a less ambiguous tag. When deleting tags that don't belong in the image it can be helpful if you include a short description as to why.
It's also helpful if you sanitise downloaded tags from sites with tagged galleries before pushing them to the PTR. For example Pixiv, where you can have a gallery of multiple images, each containing one character, and all of the characters being tagged. Consequently all images in that gallery will have all of the character tags despite no image having more than one character.
### Siblings and parents
When making siblings, go for the closest less-bad tag. Example: `bad_tag` -> `bad tag`, rather than going for what the top-level sibling might be. This creates less potential future work in case standards change, and makes it so your request is less likely to be denied by a janitor who is not entirely certain that what you are asking is right. Be careful about creating siblings for potentially ambiguous tags. Is `james bond` supposed to be `character:james bond` or is it `series:james bond`? This is a slightly bad example because the character always belongs to the series: you can safely sibling it to `series:james bond`, since all instances of the character will also have the series, but not all instances of the series will have the character. So let us look at another example: how about `wool`? Is it the material harvested from sheep, or is it the Malaysian artist that likes to draw Touhou? In doubtful cases it is better to leave the tag as is: petition it for deletion if it is incorrect and add the correct tag instead.
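To make that concrete, a few hypothetical petitions of the kind described above:

```
bad_tag    -> bad tag             underscore fix: the closest less-bad form
jersualem  -> jerusalem           misspelling
james bond -> series:james bond   safe only because the character implies the series
wool       -> (leave as is)       ambiguous: petition/replace per file instead
```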
When making parents, make sure it's an always factually correct relationship. `character:james bond` always belongs to `series:james bond`. But `character:james bond` is not always `person:pierce brosnan`. Common examples of not-always true relationships: gender (genderbending), species (furrynisation/humanisation/anthropomorphism), hair colour, eye colour, and other mutable traits.
### Namespaces
`creator:`
: Used for the creator of the tagged piece of media. Hydrus being primarily used for images, this will often be the artist that drew the image. Other potential examples are the author of a book or the musician for a song.
`character:`
: Refers to characters. James Bond is a character.
`person:`
: Refers to real persons. Pierce Brosnan is a person.
`series:`
: Used for series. *James Bond* is a series tag and so is *GoldenEye*. Because usage differs on some boorus, chances are you will also see things like Absolut Vodka and other brands in it.
`photoset:`
: Used for photosets. Primarily seen for content from idols, cosplayers, and gravure idols.
`studio:`
: Is used for the entity that facilitated the production of the file or what's in it. Eon Productions for the *James Bond* movies.
`species:`
: Species of the depicted characters/people/animals. Somewhat controversial for being needlessly detailed; some janitors do not like the namespace at all. Primarily used for furry content.
`title:`
: The title of the file. One of the tags Hydrus uses for various purposes such as sorting and collecting. Somewhat tainted by rampant Reddit parsers.
`medium:`
: Used for tags about the image and how it's made. Photography, water painting, napkin sketch as a few examples. White background, simple background, checkered background as a few others. **What you see about the image.**
`meta:`
: This namespace is used for information that isn't visible in the image itself or where you might need to go to the source. Some examples include: third-party edit, paid reward (patreon/enty/gumroad/fantia/fanbox), translated, commentary, and such. **What you know about the image.**
Namespaces not listed above are not "supported" by the janitors and are liable to get siblinged out, removed, and/or mocked if judged bad and annoying enough to justify the work. Do not take this to mean that all unlisted namespaces are bad; some are created and used by parsers to indicate where an image came from, which can be helpful if somebody else wants to fetch the original or check source tags against the PTR tags. But do exercise some care in what you put on the PTR if you use custom namespaces. Recently `clothing:` was removed due to being disliked, no booru using it, and the person(s) pushing for it seeming to have disappeared, leaving a less-than-finished mess behind. It was also rife with lossy siblings and things that just plain don't belong with clothing, such as `clothing:brown hair`.


@@ -74,9 +74,9 @@ Lets revisit the journal, this time with two transactions. Note that the databa
3. Write Change 2
4. Read data
5. Write Change 3
-~~6. FLUSH~~
+6. ~~FLUSH~~
7. End Transaction 1
-~~8. FLUSH~~
+8. ~~FLUSH~~
9. Begin Transaction 2
10. Write Change 2
11. Write Ch

34 docs/about_docs.md Normal file

@@ -0,0 +1,34 @@
# About These Docs
The Hydrus docs are built with [MkDocs](https://www.mkdocs.org/) using the [Material for MkDocs](https://squidfunk.github.io/mkdocs-material/) theme.
## Local Setup
To work on the docs locally, [install `mkdocs-material`](https://squidfunk.github.io/mkdocs-material/getting-started/). The recommended installation method is `pip`:
```
pip install mkdocs-material
```
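If you would rather not install into your system Python, a virtual environment works just as well; a minimal sketch (the `.venv` directory name is arbitrary):

```
python3 -m venv .venv
. .venv/bin/activate
pip install mkdocs-material
```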
## Live Preview
Once installed you can run the live preview development server with
```
mkdocs serve
```
It will automatically rebuild the site when you save a file and reload the page in your browser.
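By default it serves at `http://127.0.0.1:8000`; if that port is taken, `mkdocs serve` accepts an alternative address via `-a`/`--dev-addr`:

```
mkdocs serve -a 127.0.0.1:8001
```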
## Building
To build the docs site (for example if running from source), run:
```
mkdocs build
```
This by default builds the docs into the `site/` directory. To build to the traditional `help/` directory use
```
mkdocs build -d help
```
The docs are automatically deployed to GitHub Pages on every push to the `master` branch. They are also built automatically in the GitHub Actions workflows for each release and included in those builds.

69 docs/access_keys.md Normal file

@@ -0,0 +1,69 @@
---
title: PTR access keys
---
The PTR is now run by users with more bandwidth than I had to give, so the bandwidth limits are gone! If you would like to talk with the new management, please check the [discord](https://discord.gg/wPHPCUZ).
A guide and schema for the new PTR is [here](https://github.com/Zweibach/text/blob/master/Hydrus/PTR.md).
## first off
I don't like it when programs I use connect anywhere without asking me, so I have purposely not pre-baked any default repositories into the client. You have to choose to connect yourself. **The client will never connect anywhere until you tell it to.**
For a long time, I ran the Public Tag Repository myself and was the lone janitor. It grew to 650 million tags, and siblings and parents were just getting complicated, and I no longer had the bandwidth or time it deserved. It is now run by users.
There also used to be just one user account that everyone shared. Everyone was essentially the same Anon, and all uploads were merged to that one ID. As the PTR became more popular, and more sophisticated and automatically generated content was being added, it became increasingly difficult for the janitors to separate good submissions from bad and undo large scale mistakes.
That old shared account is now a 'read-only' account. This account can only download--it cannot upload new tags or siblings/parents. Users who want to upload now generate their own individual accounts, which are still Anon, but separate, which helps janitors approve and deny uploaded petitions more accurately and efficiently.
I recommend using the shared read-only account, below, to start with, but if you decide you would like to upload, making your own account is easy--just click the 'check for automatic account creation' button in _services->manage services_, and you should be good. You can change your access key on an existing service--you don't need to delete and re-add or anything--and your client should quickly resync and recognise your new permissions.
## privacy
I have tried very hard to ensure the PTR respects your privacy. Your account is a very barebones thing--all a server stores is a couple of random hexadecimal texts and which rows of content you uploaded, and even the memory of what you uploaded is deleted after a delay. The server obviously needs to be aware of your IP address to accept your network request, but it forgets it as soon as the job is done. Normal users are never told which accounts submitted any content, so the only privacy implications are against janitors or (more realistically, since the janitor UI is even more buggy and feature-poor than the hydrus front-end!) the server owner or anyone else with raw access to the server as it operates or its database files.
Most users should have very few worries about privacy. The general rule is that it is always healthy to use a VPN, but please check [here for a full discussion and explanation of the anonymisation routine](privacy.md).
## a note on resources { id="ssd" }
!!! danger
**If you are on an HDD, or your SSD does not have at least 64GB of free space, do not add the PTR!**
The PTR has been operating since 2011 and is now huge, more than a billion mappings! Your client will be downloading and indexing them all, which is currently (2021-06) about 6GB of bandwidth and 50GB of hard drive space. It will take _hours_ of total processing time to catch up on all the years of submissions. Furthermore, because of mechanical drive latency, HDDs are too slow to process all the content in reasonable time. Syncing is only recommended if your [hydrus db is on an SSD](database_migration.md). Even then, it is healthier and allows the client to 'grow into' the PTR if the work is done in small pieces in the background, either during idle time or shutdown time, rather than trying to do it all at once. Just leave it to download and process on its own--it usually takes a couple of weeks to quietly catch up. You'll see tags appear on your files as it proceeds, first on older, then all the way up to new files just uploaded a couple days ago. Once you are synced, the daily processing work to stay synced is usually just a few minutes. If you leave your client on all the time in the background, you'll likely never notice it.
## easy setup
Hit _help->add the public tag repository_ and you will be all set up.
## manually
Hit _services->manage services_ and click _add->hydrus tag repository_. You'll get a panel, fill it out like this:
![](images/edit_repos_public_tag_repo.png)
Here's the info so you can copy it:
``` title="address"
ptr.hydrus.network
```
``` title="port"
45871
```
``` title="access key"
4a285629721ca442541ef2c15ea17d1f7f7578b0c3f4f5f2a05f8f0ab297786f
```
Note that because this is the public shared key, you can ignore the '<span class="hydrus-warning">DO NOT SHARE</span>' red text warning.
It is worth checking the 'test address' and 'test access key' buttons just to double-check your firewall and key are all correct. Notice the 'check for automatic account creation' button, for if and when you decide you want to contribute to the PTR.
Then you can check your PTR at any time under _services->review services_, under the 'remote' tab:
![](images/review_repos_public_tag_repo.png)
## jump-starting an install { id="quicksync" }
A user kindly manages a store of update files and pre-processed empty client databases to get you synced quicker. This is generally recommended for advanced users or those following a guide, but if you are otherwise interested, please check it out:
[https://cuddlebear92.github.io/Quicksync/](https://cuddlebear92.github.io/Quicksync/)


@@ -0,0 +1,21 @@
---
title: adding new downloaders
---
# adding new downloaders
## all downloaders are user-creatable and -shareable { id="anonymous" }
Since the big downloader overhaul, all downloaders can be created, edited, and shared by any user. Creating one from scratch is not simple, and it takes a little technical knowledge, but importing what someone else has created is easy.
Hydrus objects like downloaders can sometimes be shared as data encoded into png files, like this:
![](images/easy-import-realbooru.com-search-2018.09.21.png)
This contains all the information needed for a client to add a realbooru tag search entry to the list you select from when you start a new download or subscription.
You can get these pngs from anyone who has experience in the downloader system. An archive is maintained [here](https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/tree/master/Downloaders).
To 'add' the easy-import pngs to your client, hit _network->downloaders->import downloaders_. A little image-panel will appear onto which you can drag-and-drop these png files. The client will then decode and go through the png, looking for interesting new objects, and automatically import and link them up without you having to do anything more. Your only further input is a 'does this look correct?' check right before the actual import, just to make sure there isn't some mistake or other glaring problem.
Objects imported this way will take precedence over existing functionality, so if one of your downloaders breaks due to a site change, importing a fixed png here will overwrite the broken entries and become the new default.

98 docs/advanced.md Normal file

@@ -0,0 +1,98 @@
---
title: general clever tricks
---
!!! note "this is non-comprehensive"
I am always changing and adding little things. The best way to learn is just to look around. If you think a shortcut should probably do something, try it out! If you can't find something, let me know and I'll try to add it!
## advanced mode { id="advanced_mode" }
To avoid confusing clutter, several advanced menu items and buttons are hidden by default. When you are comfortable with the program, hit _help->advanced mode_ to reveal them!
## exclude deleted files { id="exclude_deleted_files" }
In the client's options is a checkbox to exclude deleted files. It recurs pretty much anywhere you can import, under 'import file options'. If you select this, any file you ever deleted will be excluded from all future remote searches and import operations. This can stop you from importing/downloading and filtering out the same bad files several times over. The default is off. You may wish to have it set one way most of the time, but switch it the other just for one specific import or search.
## inputting non-english languages { id="ime" }
If you typically use an IME to input Japanese or another non-english language, you may have encountered problems entering into the autocomplete tag entry control in that you need Up/Down/Enter to navigate the IME, but the autocomplete steals those key presses away to navigate the list of results. To fix this, press Insert to temporarily disable the autocomplete's key event capture. The autocomplete text box will change colour to let you know it has released its normal key capture. Use your IME to get the text you want, then hit Insert again to restore the autocomplete to normal behaviour.
## tag display { id="tag_display" }
If you do not like a particular tag or namespace, you can easily hide it with _services->manage tag display_:
_This image is out of date, sorry!_
![](images/tag_censorship.png)
You can exclude single tags, as shown above, or entire namespaces (enter the colon, like 'species:'), or all namespaced tags (use ':'), or all unnamespaced tags (''). 'all known tags' will be applied to everything, as well as any repository-specific rules you set.
A blacklist excludes whatever is listed; a whitelist excludes whatever is _not_ listed.
This censorship is local to your client. No one else will experience your changes or know what you have censored.
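For reference, the four pattern shapes just described would be entered like this (`species:fox` is a hypothetical example tag):

```
species:fox     exclude this single tag
species:        exclude the entire 'species:' namespace
:               exclude all namespaced tags
(leave empty)   exclude all unnamespaced tags
```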
## importing and adding tags at the same time { id="importing_with_tags" }
_Add tags before importing_ on _file->import files_ lets you give tags to the files you import _en masse_, and intelligently, using regexes that parse filename:
![](images/gunnerkrigg_import.png)
This should be somewhat self-explanatory to anyone familiar with regexes. I hate them, personally, but I recognise they are powerful and exactly the right tool to use in this case. [This](http://www.aivosto.com/vbtips/regex.html) is a good introduction.
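As a sketch of the shape of these regexes: suppose your files are named like `gunnerkrigg court - c01 - p003.png` (a hypothetical naming scheme). You would pair each namespace with a regex, the captured number becoming the tag:

```
chapter namespace, regex c(\d+)   ->   chapter:01
page namespace,    regex p(\d+)   ->   page:003
```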
Once you are done, you'll get something neat like this:
![](images/gunnerkrigg_page.png)
Which you can more easily manage by collecting:
![](images/gunnerkrigg_chapter.png)
Collections have a small icon in the bottom left corner. Selecting them actually selects many files (see the status bar), and performing an action on them (like archiving, uploading) will do so to every file in the collection. Viewing collections fullscreen pages through their contents just like an uncollected search.
Here is a particularly zoomed out view, after importing volume 2:
![](images/gunnerkrigg_volume.png)
Importing with tags is great for long-running series with well-formatted filenames, and will save you literally hours of finicky tagging.
## tag migration { id="tag_migration" }
!!! danger
At _some_ point I will write some better help for this system, which is powerful. Be careful with it!
Sometimes, you may wish to move thousands or millions of tags from one place to another. These actions are now collected in one place: _services->tag migration_.
![](images/tag_migration.png)
It proceeds from left to right, reading data from the source and applying it to the destination with the certain action. There are multiple filters available to select which sorts of tag mappings or siblings or parents will be selected from the source. The source and destination can be the same, for instance if you wanted to delete all 'clothing:' tags from a service, you would pull all those tags and then apply the 'delete' action on the same service.
You can import from and export to Hydrus Tag Archives (HTAs), which are external, portable .db files. In this way, you can move millions of tags between two hydrus clients, or share with a friend, or import from an HTA put together from a website scrape.
Tag Migration is a powerful system. Be very careful with it. Do small experiments before starting large jobs, and if you intend to migrate millions of tags, make a backup of your db beforehand, just in case it goes wrong.
This system was once much more simple, but it still had HTA support. If you wish to play around with some HTAs, there are some old user-created ones [here](https://www.mediafire.com/folder/yoy1dx6or0tnr/tag_archives).
## custom shortcuts { id="shortcuts" }
Once you are comfortable with manually setting tags and ratings, you may be interested in setting some shortcuts to do it quicker. Try hitting _file->shortcuts_ or clicking the keyboard icon on any media viewer window's top hover window.
There are two kinds of shortcuts in the program--_reserved_, which have fixed names, are undeletable, and are always active in certain contexts (related to their name), and _custom_, which you create and name and edit and are only active in a media viewer when you want them to. You can redefine some simple shortcut commands, but most importantly, you can create shortcuts for adding/removing a tag or setting/unsetting a rating.
Use the same 'keyboard' icon to set the current and default custom shortcuts.
## finding duplicates { id="finding_duplicates" }
_system:similar_to_ lets you run the duplicates processing page's searches manually. You can either insert the hash and hamming distance manually, or you can launch these searches automatically from the thumbnail _right-click->find similar files_ menu. For example:
![](images/similar_gununu.png)
## truncated/malformed file import errors { id="file_import_errors" }
Some files, even though they seem ok in another program, will not import to hydrus. This is usually because the file has some 'truncated' or broken data, probably due to a bad upload or storage at some point in its internet history. While sophisticated external programs can usually patch the error (often rendering the bottom lines of a jpeg as grey, for instance), hydrus is not so clever. Please feel free to send or link me, hydrus developer, to these files, so I can check them out on my end and try to fix support.
If the file is one you particularly care about, the easiest solution is to open it in photoshop or gimp and save it again. Those programs should be clever enough to parse the file's weirdness, and then make a nice clean saved file when it exports. That new file should be importable to hydrus.
## setting a password { id="password" }
The client offers a very simple password system, enough to keep out noobs. You can set it at _database->set a password_. It will thereafter ask for the password every time you start the program, and will not open without it. However, none of the database is encrypted, and someone with enough enthusiasm, or a tool and access to your computer, can still very easily see what files you have. The password is mainly to stop idle snoops checking your images if you are away from your machine.

76 docs/advanced_parents.md Normal file

@@ -0,0 +1,76 @@
---
title: tag parents
---
Tag parents let you automatically add a particular tag every time another tag is added. The relationship will also apply retroactively.
## what's the problem? { id="the_problem" }
Tags often fall into certain hierarchies. Certain tags _always_ imply certain other tags, and it is annoying and time-consuming to add them all individually every time.
For example, whenever you tag a file with _ak-47_, you probably also want to tag it _assault rifle_, and maybe even _firearm_ as well.
![](images/tag_parents_venn.png)
Another time, you might tag a file _character:eddard stark_, and then also have to type in _house stark_ and then _series:game of thrones_. (you might also think _series:game of thrones_ should actually be _series:a song of ice and fire_, but that is an issue for [siblings](advanced_siblings.md))
Drawing more relationships would make a significantly more complicated venn diagram, so let's draw a family tree instead:
![](images/tag_parents_got.png)
## tag parents { id="tag_parents" }
Let's define the child-parent relationship 'C->P' as saying that tag P is the semantic superset/superclass of tag C. **All files that have C should also have P, without exception.** When the user tries to add tag C to a file, tag P is added automatically.
Let's expand our weapon example:
![](images/tag_parents_firearms.png)
In that graph, adding _ar-15_ to a file would also add _semi-automatic rifle_, _rifle_, and _firearm_. Searching for _handgun_ would return everything with _m1911_ and _smith and wesson model 10_.
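Written as C->P pairs, the graph above amounts to something like this (the chains are inferred from the example, so treat this as a sketch):

```
m1911                     -> handgun
smith and wesson model 10 -> handgun
handgun                   -> firearm
ar-15                     -> semi-automatic rifle
semi-automatic rifle      -> rifle
rifle                     -> firearm
```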
This can obviously get as complicated and autistic as you like, but be careful of being too confident--this is just a fun example, but is an AK-47 truly _always_ an assault rifle? Some people would say no, and beyond its own intellectual neatness, what is the purpose of attempting to create such a complicated and 'perfect' tree? Of course you can create any sort of parent tags on your local tags or your own tag repositories, but this sort of thing can easily lead to arguments between reasonable people. I only mean to say, as someone who does a lot of tag work, to try not to create anything 'perfect', as it usually ends up wasting time. Act from need, not toward purpose.
## how you do it { id="how_to_do_it" }
Go to _services->manage tag parents_:
![](images/tag_parents_dialog.png)
Which looks and works just like the manage tag siblings dialog.
Note that when you hit ok, the client will look up all the files with all your added tag Cs and retroactively apply/pend the respective tag Ps if needed. This could mean thousands of tags!
Once you have some relationships added, the parents and grandparents will show indented anywhere you 'write' tags, such as the manage tags dialog:
![](images/tag_parents_ac_1.png)
Hitting enter on cersei will try to add _house lannister_ and _series:game of thrones_ as well.
![](images/tag_parents_ac_2.png)
## remote parents { id="remote_parents" }
Whenever you add or remove a tag parent pair to a tag repository, you will have to supply a reason (like when you petition a tag). A janitor will review this petition, and will approve or deny it. If it is approved, all users who synchronise with that tag repository will gain that parent pair. If it is denied, only you will see it.
## parent 'favourites' { id="parent_favourites" }
As you use the client, you will likely make several processing workflows to archive/delete your different sorts of imports. You don't always want to go through things randomly--you might want to do some big videos for a bit, or focus on a particular character. A common search page is something like `[system:inbox, creator:blah, limit:256]`, which will show a sample of a creator in your inbox, so you can process just that creator. This is easy to set up and save in your favourite searches and quick to run, so you can load it up, do some archive/delete, and then dismiss it without too much hassle.
But what happens if you want to search for multiple creators? You might be tempted to make a large OR search predicate, like `creator:aaa OR creator:bbb OR creator:ccc OR creator:ddd`, of all your favourite creators so you can process them together as a 'premium' group. But if you want to add or remove a creator from that long OR, it can be cumbersome. And OR searches can just run slow sometimes. One answer is to use the new tag parents tools to apply a 'favourite' parent on all the artists and then search for that favourite.
Let's assume you want to search a bunch of 'creator' tags on the PTR. What you will do is:
* Create a new 'local tag service' in _manage services_ called 'my parent favourites'. This will hold our subjective parents without uploading anything to the PTR.
* Go to _tags->manage where tag siblings and parents apply_ and add 'my parent favourites' as the top priority for parents, leaving 'PTR' as second priority.
* Under _tags->manage tag parents_, on your 'my parent favourites' service, add:
* `creator:aaa->favourite:aesthetic art`
* `creator:bbb->favourite:aesthetic art`
* `creator:ccc->favourite:aesthetic art`
* `creator:ddd->favourite:aesthetic art`
* Watch/wait a few seconds for the parents to apply across the PTR for those creator tags.
* Then save a new favourite search of `[system:inbox, favourite:aesthetic art, limit:256]`. This search will deliver results with any of the child 'creator' tags, just like a big OR search, and real fast!
If you want to add or remove any creators to the 'aesthetic art' group, you can simply go back to _tags->manage tag parents_, and it will apply everywhere. You can create more umbrella/group tags if you like (and not just creators--think about clothing, or certain characters), and also use them in regular searches when you just want to browse some cool files.

79 docs/advanced_siblings.md Normal file

@@ -0,0 +1,79 @@
---
title: tag siblings
---
Tag siblings let you replace a bad tag with a better tag.
## what's the problem? { id="the_problem" }
Reasonable people often use different words for the same things.
A great example is in Japanese names, which are natively written surname first. `character:ayanami rei` and `character:rei ayanami` have the same meaning, but different users will use one, or the other, or even both.
Other examples are tiny syntactic changes, common misspellings, and unique acronyms:
* _smiling_ and _smile_
* _staring at camera_ and _looking at viewer_
* _pokemon_ and _pokémon_
* _jersualem_ and _jerusalem_
* _lotr_ and _series:the lord of the rings_
* _marimite_ and _series:maria-sama ga miteru_
* _ishygddt_ and _i sure hope you guys don't do that_
A particular repository may have a preferred standard, but it is not easy to guarantee that all the users will know exactly which tag to upload or search for.
After some time, you get this:
![](images/tag_siblings_venn_1.png)
Without continual intervention by janitors or other experienced users to make sure y⊇x (i.e. making the yellow circle entirely overlap the blue by manually giving y to everything with x), searches can only return x (blue circle) or y (yellow circle) or x∩y (the lens-shaped overlap). What we really want is xy (both circles).
So, how do we fix this problem?
## tag siblings { id="tag_siblings" }
Let's define a relationship, **A->B**, that means that any time we would normally see or use tag A or tag B, we will instead only get tag B:
![](images/tag_siblings_venn_2.png)
Note that this relationship implies that B is in some way 'better' than A.
## ok, I understand; now confuse me { id="more_complicated" }
This relationship is transitive, which means that as well as saying A->B, you can also say B->C, which implies A->C.
![](images/tag_siblings_usa.png)
You can also have an A->C and B->C that does not include A->B.
![](images/tag_siblings_what_is_a_man.png)
The outcome of these two arrangements is the same (everything ends up as C), but the underlying semantics are a little different if you ever want to edit them.
Many complicated arrangements are possible:
![](images/tag_siblings_yo_dawg.png)
Note that if you say A->B, you cannot say A->C; the left-hand side can only go to one. The right-hand side can receive many. The client will stop you from constructing loops.
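A quick sketch of that rule with hypothetical tags -- many lefts may point at one right, but each left may appear only once, and loops are refused:

```
smile   -> smiling     ok
grin    -> smiling     ok: the right-hand side can receive many
smile   -> happy       refused: 'smile' already has a sibling
smiling -> smile       refused: would create a loop
```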
## how you do it { id="how_to_do_it" }
Just open _services->manage tag siblings_, and add a few.
![](images/tag_siblings_dialog.png)
The client will automatically collapse the tagspace to whatever you set. It'll even work with autocomplete, like so:
![](images/tag_siblings_rei.png)
Please note that siblings' autocomplete counts may be slightly inaccurate, as unioning the _count_ is difficult to quickly estimate.
The client will not collapse siblings anywhere you 'write' tags, such as the manage tags dialog. You will be able to add or remove A as normal, but it will be written in some form of "A (B)" to let you know that, ultimately, the tag will end up displaying in the main gui as B:
![](images/tag_siblings_ac_write.png)
Although the client may present A as B, it will secretly remember A! You can remove the association A->B, and everything will return to how it was. **No information is lost at any point.**
## remote siblings { id="remote_siblings" }
Whenever you add or remove a tag sibling pair to a tag repository, you will have to supply a reason (like when you petition a tag). A janitor will review this petition, and will approve or deny it. If it is approved, all users who synchronise with that tag repository will gain that sibling pair. If it is denied, only you will see it.


@@ -0,0 +1 @@
<svg xmlns="http://www.w3.org/2000/svg" fill-rule="evenodd" stroke-linejoin="round" stroke-miterlimit="2" clip-rule="evenodd" viewBox="0 0 688 934"><path fill-rule="nonzero" d="M301.334 0v678.8l-6.267-.8c-15.2-1.734-19.067-2.4-30.4-4.934-47.467-10.533-79.467-35.466-96-74.666-4.133-9.6-8.8-24.267-10.8-33.467-2.4-10.933-4.667-23.733-5.333-29.467-.4-3.733-1.067-8.266-1.334-10.133-1.866-10.667-2.533-52.267-2.533-171.6V220H0v25.333l3.734.8c24.666 5.467 41.2 17.067 46.666 32.8 6 17.067 7.2 31.334 7.734 89.734.933 93.2 1.733 145.199 2.4 149.999.266 2.534.933 7.6 1.333 11.334 1.067 8 2.8 18.266 5.6 33.466.267 1.467 2.4 9.067 4.667 16.667 18 60.667 52.533 98.267 110.666 120.533 29.467 11.334 72 18.534 114.267 19.2l4.267.134v213.333h85.333V720l3.066-.134c15.6-.666 32.534-1.866 44.934-3.2 28.8-3.333 58.666-12 84.666-24.666 50.667-24.934 82-65.867 98-128.134 1.6-6.133 3.2-12.666 3.467-14.533.4-1.867 1.2-6.4 1.867-10 .666-3.733 1.6-9.333 2-12.667.533-3.333 1.066-7.866 1.466-10 2-14.133 2.534-32 3.867-150.666.267-30.8.8-57.867 1.2-60 .267-2.267 1.067-7.333 1.467-11.333 3.2-25.6 12.8-36.134 42-45.734l12.666-4.266L687.2 234c-.133-5.867-.267-11.467-.4-12.4-.133-1.333-15.333-1.6-73.733-1.333l-73.6.4L539.333 358c-.133 75.467-.666 142.4-1.2 148.666-2.133 23.067-2.666 27.067-4.933 40.667-.8 4.4-1.733 9.467-2 11.333-1.2 6.267-7.2 27.334-9.733 34.134-7.6 19.6-15.067 31.6-28 44.266-19.067 18.8-42.534 29.734-80.134 37.467-6.933 1.333-15.866 2.933-19.6 3.333l-7.066.8V0h-85.333z"/></svg>



@@ -0,0 +1 @@
<svg xmlns="http://www.w3.org/2000/svg" fill-rule="evenodd" stroke-linejoin="round" stroke-miterlimit="2" clip-rule="evenodd" viewBox="0 0 688 934"><path fill="#fff" fill-rule="nonzero" d="M301.334 0v678.8l-6.267-.8c-15.2-1.734-19.067-2.4-30.4-4.934-47.467-10.533-79.467-35.466-96-74.666-4.133-9.6-8.8-24.267-10.8-33.467-2.4-10.933-4.667-23.733-5.333-29.467-.4-3.733-1.067-8.266-1.334-10.133-1.866-10.667-2.533-52.267-2.533-171.6V220H0v25.333l3.734.8c24.666 5.467 41.2 17.067 46.666 32.8 6 17.067 7.2 31.334 7.734 89.734.933 93.2 1.733 145.199 2.4 149.999.266 2.534.933 7.6 1.333 11.334 1.067 8 2.8 18.266 5.6 33.466.267 1.467 2.4 9.067 4.667 16.667 18 60.667 52.533 98.267 110.666 120.533 29.467 11.334 72 18.534 114.267 19.2l4.267.134v213.333h85.333V720l3.066-.134c15.6-.666 32.534-1.866 44.934-3.2 28.8-3.333 58.666-12 84.666-24.666 50.667-24.934 82-65.867 98-128.134 1.6-6.133 3.2-12.666 3.467-14.533.4-1.867 1.2-6.4 1.867-10 .666-3.733 1.6-9.333 2-12.667.533-3.333 1.066-7.866 1.466-10 2-14.133 2.534-32 3.867-150.666.267-30.8.8-57.867 1.2-60 .267-2.267 1.067-7.333 1.467-11.333 3.2-25.6 12.8-36.134 42-45.734l12.666-4.266L687.2 234c-.133-5.867-.267-11.467-.4-12.4-.133-1.333-15.333-1.6-73.733-1.333l-73.6.4L539.333 358c-.133 75.467-.666 142.4-1.2 148.666-2.133 23.067-2.666 27.067-4.933 40.667-.8 4.4-1.733 9.467-2 11.333-1.2 6.267-7.2 27.334-9.733 34.134-7.6 19.6-15.067 31.6-28 44.266-19.067 18.8-42.534 29.734-80.134 37.467-6.933 1.333-15.866 2.933-19.6 3.333l-7.066.8V0h-85.333z"/></svg>



@@ -0,0 +1,22 @@
:root {
    --md-text-font: Roboto, ui-sans-serif, -apple-system, BlinkMacSystemFont, "Segoe UI", Ubuntu, sans-serif;
    --md-code-font: "Roboto Mono", ui-monospace, Consolas, Menlo, monospace;
}

.spoiler:not(:hover) {
    color: #000;
    background-color: #000;
}

.hydrus-warning {
    color: #d00;
}

.dealwithit {
    color: #ff0000;
    text-shadow:
        1px 1px 0 #ff7f00,
        2px 2px 0 #ffff00,
        3px 3px 0 #00ff00,
        4px 4px 0 #0000ff,
        5px 5px 0 #6600ff,
        6px 6px 0 #8b00ff;
}
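These classes are used from raw HTML in the markdown pages; the access keys page above, for instance, wraps its 'DO NOT SHARE' warning in `<span class="hydrus-warning">`. The `.spoiler` rule would presumably be used the same way, something like:

```
This is a secret: <span class="spoiler">hover to reveal</span>
```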

299 docs/changelog.md Normal file

@@ -0,0 +1,299 @@
# Changelog
!!! note
This is the new changelog. For versions prior to 466 see the [old changelog](old_changelog.html).
## [Version 474](https://github.com/hydrusnetwork/hydrus/releases/tag/v474)
### command palette
* the guy who put the command palette together has fixed a 'show palette' bug some people encountered (issue #1060)
* he also added mouse support!
* he added support to show checkable menu items too, and I integrated this for the menubar (lightning bolt icon) items
* I added a line to the default QSS that I think fixes the odd icon/text background colours some users saw in the command palette
### misc
* file archive times are now recorded in the background. there's no load/search/sort yet, but this will be added in future
* under 'manage shortcuts', there is a new checkbox to rename left- and right-click to primary- and secondary- in the shortcuts UI. if you have a flipped mouse or any other odd situation, try it out
* if a file storage location does not have enough free disk space for a file, or if it just has <100MB generally, the client now throws up a popup to say what happened specifically with instructions to shut down and fix now and automatically pauses subscriptions, paged file import queues, and import folders. this test occurs before the attempt to copy the file into place. free space isn't actually checked over and over, it is cached for up to an hour depending on the last free space amount
* this 'paused all regular imports' mode is also now fired any time any simple file-add action fails to copy. at this stage, we are talking 'device disconnected' and 'device failed' style errors, so might as well pause everything just to be careful
* when the downloader hits a post url that spawns several subsidiary downloads (for instance on pixiv and artstation when you have a multi-file post), the status of that parent post is now 'completed', a new status to represent 'good, but not direct file'. new download queues will then present '3N' and '3 successful' summary counts that actually correspond to number of files rather than number of successful items
* pages now give a concise 'summary name' of 'name - num_files - import progress' (it also eli...des for longer names) for menus and the new command palette, which unlike the older status-bar-based strings are always available and will stop clients with many pages becoming multi-wide-column-menu-hell
* improved apng parsing. hydrus can now detect that pngs are actually apngs for (hopefully) all types of valid apng. it turns out some weird apngs have some additional header data, but I wrote a new chunk parser that should figure it all out
* with luck, users who have window focus issues when closing a child window (e.g. close review services, the main gui does not get focus back), should now see that happen (issue #1063). this may need some more work, so let me know
* the session weight count in the 'pages' menu now updates on any add thumbs, remove thumbs, or thumbnail panel swap. this _should_ be fast all the time, and buffer nicely if it is ever overwhelmed, but let me know if you have a madlad session and get significant new lag when you watch a downloader bring in new files
* a user came up with a clever idea to efficiently target regenerations for the recent fix to pixel duplicate calculations for images with opaque alpha channels, so this week I will queue up some pixel hash regeneration. it does not fix every file with an opaque alpha channel, but it should help out. it also shouldn't take _all_ that long to clear this queue out. lastly, I renamed that file maintenance job from 'calculate file pixel hash' to 'regenerate pixel duplicate data'
* the various duplicate system actions on thumbnails now specify the number of files being acted on in the yes/no dialog
* fixed a bug when searching in complicated multi-file-service domains on a client that has been on for a long time (some data used here was being reset in regular db maintenance)
* fixed a bug where for very unlucky byte sizes, for instance 188213746, the client was flipping between two different output values (e.g. 179MB/180MB) on subsequent calls (issue #1068)
* after some user profiles and experimental testing, rebalanced some optimisations in sibling and parent calculation. fingers crossed, some larger sibling groups with worst-case numbers should calculate more efficiently
* if sibling/parent calculation hits a heavy bump and takes a really long time to do a job during 'normal' time, the whole system now takes a much longer break (half an hour) before continuing
### boring stuff
* the delete dialog has basic multiple local file service support ready for that expansion. it no longer refers to the old static 'my files' service identifier. I think it will need some user-friendly more polish once that feature is in
* the 'migrate tags' dialog's file service filtering now supports n local file services, and 'all local files'
* updated the build scripts to force windows server 2019 (and macos-11). github is rolling out windows 2022 as the new latest, and there's a couple of things to iron out first on our end. this is probably going to happen this year though, along with Qt6 and python 3.9, which will all mean end of life for windows 7 in our built hydrus release
* removed the spare platform-specific github workflow scripts from the static folder--I wanted these as a sort of backup, but they never proved useful and needed to be synced on all changes
## [Version 473](https://github.com/hydrusnetwork/hydrus/releases/tag/v473)
### misc
* fixed the recent problem with drag and dropping thumbnails to a level below the top row of pages. sorry for the trouble!
* fixed a bug where the client would not load results sorting by 'import time' when the search file domain was a single deleted file domain
* fixed a list display bug in the edit page parser dialog when a subsidiary page parser has two complicated string-match based content parsers
* collections now sort by modified time, using the largest known modified time in their collection
* added sqlite3.exe console back into the windows build--sorry, it was missing since the github build changeover!
* added a note to the help about backing up when tight on space, which I will repeat here: the sqlite database files are very compressible (70GB->17GB on default 7zip settings!), so if you need more space on your backup drive, this is a good way to reclaim it
### command palette
* a user has written a cool 'command palette' for the program! it brings up a type-and-search interface to navigate to pages or menu entries.
* I have integrated his first version and set the default shortcut to Ctrl+P. users who update will get this shortcut if they have nothing else on Ctrl+P on 'main window' set. if you prefer Ctrl+K or anything else, you can change it under _file->shortcuts->the main window_
* regular users will get a page list they can search and select, advanced users will also get the (potentially dangerous) full scan of the menubar and current thumbnail right-click menu. I will be polishing this latter feature in future to filter out big maintenance jobs and show checkbox status and similar, so if you are advanced, please be careful for now
* try it out, and let me know how it goes. the underlying widget is neat, and I can change its behaviour and extend it significantly
### (mostly advanced) deleted file improvements
* files that have been deleted from a local file domain are now aware of their file deletion reason. this is visible in the right-click menu of thumb or media canvas
* the advanced file deletion dialog now initialises using this stored reason. if all pending deletees have the same existing reason stored, it will display it, and if they are all set but differ, this will be noted and an option to not alter them is also available. this will come up later in niche advanced situations with multiple file services
* reversing a recent change, local file deletion reasons are no longer cleared on undelete or (re)import. they'll now hang around invisibly and initialise any future advanced file deletion dialog
* updated the thumbnail and canvas undelete mechanism to handle multiple services. now, if the files are deleted in more than one domain, you will be asked to multiple-select which you wish to undelete for. if there is only one eligible undelete service, the process remains unchanged--you'll just get a yes/no confirmation if the 'confirm trash' option is set
* misc multiple local file services code conversion work
## [Version 472](https://github.com/hydrusnetwork/hydrus/releases/tag/v472b)
### highlights
* the file domain button of every autocomplete input now has a 'multiple locations' entry. this launches a checkboxlist of all possible search locations and allows you to search more than one domain at once. it works, too! in future, when we can have multiple 'my files' services, you'll be able to choose here unions of what to search. users in advanced mode will see repository updates, all local files, all known files, and the new deleted file domains on this list. I removed the deleted file domains from the front menu because I expect them to be rarely used
* in _options->thumbnails_, there is now a 'thumbnail scaling' dropdown. you can set it so thumbs only ever scale down (which remains the default), scale to fit (i.e. very small images are also scaled up), or scale to fill. the 'animation' as thumbnails refit and delayed-regen themselves to 'scale to fill' is accidentally one of the coolest things I have done
* removed the old 'EXPERIMENTAL: thumbnail fill' option. the new mode works essentially the same, but faster and higher quality
* in the page tab menu, there is a new submenu 'pages', which shows all the pages at or below the current level. if you right-click on a page of pages tab, it will just show for that page of pages. click any of the entries, you will select that page. it is a web browser-like quick navigation menu, let me know what you think!
* rejiggered the page tab menu a little, reordering groups a bit with nicer separators and putting 'select' navigation on the menu even if you click in greyspace
* fixed a problem in page tab menu logic where if you right-clicked on greyspace, it would render the menu for the bottommost page of pages row rather than the one actually clicked
* last week's update where a mouse release event will no longer fire in the shortcuts system if the mouse moved a decent distance between press and release should now work in the media viewer canvas when dragging is set to anchor the mouse in place. some advanced users may wish to try setting archive/delete to work on mouse release and use left click to drag
### bug fixes
* fixed pages force-refreshing file queries on session load. this has never been intentional, but it slipped through again and was happening for a month or two now. I have added an explicit test to my routine to make sure this doesn't happen again, sorry for the trouble!
* fixed a problem in the recent fast shutdown code that was accidentally also shutting down some maintenance work like repository processing as soon as it started, even if 'exit and force work' was chosen
* all images with a completely opaque alpha channel will now have that alpha channel dropped for the new pixel hash calculation, meaning they will now match with regular non-alpha images with the same colour pixel data. in fact, all images with an opaque alpha now have that channel dropped on load, which will save a little memory and CPU any time they are handled (issue #770)
* if the 'durable' temporary database exists on boot, it is now deleted and a fresh one created rather than trying to re-use the old one (which would not have any useful information anyway), and a note is made to log. one user recently had a problem where an existing corrupt temp dir was stopping boot, which this fixes
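As an aside, the opaque-alpha change above boils down to something like this minimal sketch, assuming PIL; the function name and exact check are illustrative, not the client's actual code:

```python
from PIL import Image

def drop_opaque_alpha(img: Image.Image) -> Image.Image:
    # if the minimum value in the alpha band is 255, every pixel is fully
    # opaque, so the channel carries no information--flattening to RGB loses
    # nothing and lets the image pixel-match against regular non-alpha images
    if img.mode == 'RGBA' and img.getchannel('A').getextrema()[0] == 255:
        return img.convert('RGB')
    return img
```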
### misc
* updated the windows build to use a newer version of mpv, moving from roughly 2021-02 to 2022-01. this replaces mpv-1.dll with mpv-2.dll, and I set the installer to delete the old '1'. if you extract, you can delete the old '1' yourself, but things seem fine with both. in any case, let me know if you have any trouble!
* updated the windows build to use sqlite 3.37.2; the sqlite3 in the db dir is also updated
* the deleted files system now neatly cleans up old file deletion reasons on file import and file undelete
* cleared out some old thumbnail generation code, including deleting an old and now obsolete optimisation where too-large thumbs were scaled down to make new thumbs rather than revisiting source. since our thumb situation is more complicated now, this is gone in favour of nicer quality thumbs and simpler code
* fixed up some upcoming database maintenance code in my new modules
* updated and cleaned the code in the old wx-converted checkboxlist and replaced some awkward old access routines
* cleared out some old HTA archive code
## [Version 471](https://github.com/hydrusnetwork/hydrus/releases/tag/v471)
### times
* if you have file viewing stats turned on (they are by default), the client will now track the 'last viewed time' of your files, both in preview and media viewers. a record is only made assuming they pass the viewtime checks under _options->file viewing statistics_ (so if you scroll through really quick but have it set to only record after five seconds of viewing, it will not save that as the last viewed time). this last viewed time is shown on the right-click menu with the normal file viewing statistics
* the 'import time' and 'modified time' sorts are moved to a new 'time' subgroup in the sort button menu
* also added to 'time' is 'last viewed time'. note that this has not been tracked until now, so you will have to look at a bunch of things for a few seconds each to get some data to sort with
* to go with the 'x time' pattern, 'time imported' is renamed to 'import time' across the program. both terms should work for system predicate parsing
* system:'import time' and 'modified time' are now bundled into a new 'system:time' stub in the system predicates list. the window launched from here is an experimental new paged panel. I am not sure I really like it, but let's see how it works IRL
* 'system:last view time' is added to search the new field! give it a go once you have some data
* also note that the search and sort of last viewed time works on the 'media viewer' number. those users who use preview or combined numbers for stuff, let me know if and how you would like that to work here--sort/search for both media and preview, try to combine based on the logic in the options, or something else?
### loading serialised pngs
* the client can now load serialised downloader-pngs if they are a perfect RGB conversion of an original greyscale export.
* the pngs don't technically have to be pngs anymore! if you drag and drop an image from firefox, the temporary bitmap exported and attached to the DnD _should_ work!
* the lain easy downloader import now has a clipboard paste button. it can take regular json text, and now, bitmap data!
* the 'import->from clipboard' button action in many multiple column lists across the program (e.g. manage parsers) (but not every list, a couple are working on older code) also now accepts bitmap data on the clipboard
* the various load errors here are also improved
### custom widget colors
* (advanced users only for now)
* after banging my head against it, I finally figured out an ok way to send colors from a QSS style file to python code. this means I can convert my custom widgets to inherit colours from the current QSS. I expect to migrate pretty much everything currently fixed over to this, except tag colours and maybe some thumbnail border stuff, and retire the old darkmode
* if you are a QSS lad, please check out the new entries at the bottom of default_hydrus.qss and play around with them in your own QSSes. please do not send me any updates to be folded in to the install yet as I still have a bunch of other colours to add. this week is just a test--please let me know how it works for you
### misc
* mouse release events no longer trigger a command in the shortcuts system if the release happens more than about 20 pixels from the original mouse down. this is tricky, so if you are into clever shortcuts, let me know how it works for you
* the file maintenance manager (which has been getting a lot of work recently with icc profiles, pixel dupes, some thumb regen, and new audio channel checks), now saves its work and publishes updates faster to the UI, at least once every ten seconds
* the sort entries in the page sort control are now always sorted according to their full (type, name) string, and the mouse-wheel-to-navigate is now fixed to always mirror this
* improved some 'delete file reason' handling. currently, a file deletion reason should only be applied when a file is entering trash. there was a bug that force-physical-deleting files from trash would overwrite their original deletion reason. this is now fixed. the advanced delete files dialog now disables the whole reason panel better when needed, never sends a file reason down to the database when there should be no reason, disables the panel if all the files are in the trash, and at the database level when file deletion reasons are being set, all files are filtered for status beforehand to ensure none are accidentally set by other means. I am about to make trash more intelligent as part of multiple local file services, so I expect to revisit this soon
* the new ICC Profile conversion no longer occurs on I or F mode files. there are weird 32/64 bit monochrome files, and mode/ICC conversion goes whack with current PIL code
* replaced the critical hamming test in the duplicate files system with a different bit-counting strategy that works about 9% faster. hamming test is used in all duplicate file searching, so this should help out a tiny bit in a lot of places
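For reference, the hamming test compares two 64-bit perceptual hashes by counting how many bits differ. A minimal sketch of the idea in Python (the client's actual, optimised strategy is not shown here):

```python
def hamming_distance(phash_a: int, phash_b: int) -> int:
    # bits that differ survive the XOR as 1s; the distance is how many there are
    return bin(phash_a ^ phash_b).count('1')

# pairs within a small distance of each other count as potential duplicates
print(hamming_distance(0b1011, 0b1001))  # 1
```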
### boring cleanup
* cleaned up how media viewer canvas type is stored and tested in many places
* all across the program, file viewing statistics are now tracked by type rather than a hardcoded double of preview & media viewer. it will take a moment to update the database to reflect this change this week
* cleaned up a ton of file viewing stats code
* cleared out the last twenty or so uses of the old 'execute many select' database access routine in favour of the new lower-overhead and more query-optimisable temporary integer tables method
## [Version 470](https://github.com/hydrusnetwork/hydrus/releases/tag/v470b)
### multiple file services
* I finished the conversion of all UI search to the new multiple location object. everything from back- to frontend now supports cleverer search. since searching deleted files is simple to add, users in advanced mode will now see 'deleted from...' in a new list in the tag autocomplete dropdown file domain button
* the next step is writing a widget that allows multiple selection, and then all this should work right out of the box, and we'll be an important step closer to allowing multiple local file services
### misc
* the video parsing routine is better at detecting when a present audio track is actually silent (and hence when it should mark a video as 'no audio'). all video with audio will be requeued for a metadata reparse in the files maintenance system on update
* fixed an error from last week when trying to create a new page from the tags (e.g. middle-clicking them) in the active search list
* added 'audio mkv' format to the client, to represent mkvs without a video track. I think most of the time this is going to be audio track webms from youtube-dl and similar
* added 'file relationships: set files as potential duplicates' command to the 'media actions' shortcut set
* I expanded the 'backing up' section in 'installing and updating' help
* I wrote an 'anti-virus' section for 'installing and updating' help, since I kept writing the same basic spiel about false positives. please feel free to point people there in future to relieve their concerns
* improved some shutdown tests, the client and server should exit faster in some cases (e.g. when a hydrus repository network job is hanging on reconnection attempts, holding up the 'synchronise_repository' daemon shutdown)
* the 'file was xxx at (y timestamp), which was (z time units) before this check' line in file import notes now always puts 'z time units' as that, ignoring the 'always show ISO time' setting, which was just substituting it with 'y timestamp' again. let me know if you spot other bad grammar with this setting on, I'll fix it!
* fingers crossed, images in the LAB colourspace _should_ now normalise to sRGB with the correct whitepoint. thanks to the user who provided example test tiff images here. this now uses the new PIL-based colourspace conversion I used to make ICC profiles work, just on LAB->sRGB. as far as I understand, OpenCV uses a fixed whitepoint of D65, resulting in yellow/warm conversions for some formats, but PIL may be able to figure out if D50 is needed??? if you have some crazy LUV or YPbPr or YIQ image that shows up wrong, please send it in and I'll see what I can do!
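If you are curious, a PIL-based LAB->sRGB conversion looks roughly like this sketch, using PIL's ImageCms module (illustrative only--this is not the client's actual routine):

```python
from PIL import Image, ImageCms

def lab_to_srgb(img: Image.Image) -> Image.Image:
    # assumes img is already in PIL's 'LAB' mode; PIL's built-in LAB profile
    # defaults to a D50 whitepoint, per the whitepoint discussion above
    lab_profile = ImageCms.createProfile('LAB')
    srgb_profile = ImageCms.createProfile('sRGB')
    return ImageCms.profileToProfile(img, lab_profile, srgb_profile, outputMode='RGB')
```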
### boring rewrites and cleanup while doing file service work
* many more UI objects now store and do file service logic using a more complicated 'location context', which can store a mix of multiple services and 'deleted from service' data. all the search code that works on this can now propagate to display:
* the management objects behind every page now store a multiple location object, not a single file service id
* all media panels (the thumbnail grid on a page) are now instantiated by a multiple location object, and when they serve a highlighted downloader, they now inherit that from the file import options, which in future will dictate import destinations
* all canvases are now the same, inheriting their new location context from their parents
* all tag lists are the same. mostly they don't care much about file domain, but when you middle-click to create new pages from the autocomplete dropdown list or active search list, it can matter, so they now propagate it along
* the underlying medialist objects are now the same, and various delete logic (e.g. 'should we remove this thumb we just deleted?') is updated to work on complex domains
* some duplicate lookup code now works on location context
* renamed 'location search context' object to 'location context' since it is used all over now and put it in its own file. also wrote it some neater initialisation and meta object code
* mr bones now gives duplicate data based on the union of all non-trash local services sans update files (another case of now supporting n services but n is fixed for the moment at 1, 'my files')
* a bunch of places across the program that used to default to 'my files' or 'all local files' (which is everything on disk, including trash and repository update files) now default to this new union of all non-trash local media services
* when doing page-to-page file drag and drops, the location context is now preserved (previously, the new page would always be 'my files')
* whole heap of other cleanup in these systems
* when a thumbnail cannot be provided (for deleted files or many 'all known files' situations), the thumbnail cache now provides the hydrus icon stand-in instantly, no delayed waterfall
* fixed an unusual situation where the file search could not provide a file in a tagless search when that file had no detailed file info row in the database. this seems to affect a legacy borked row or two in the new deleted file domain searches
* removed some ancient dumper status code from thumbnail objects
## [Version 469](https://github.com/hydrusnetwork/hydrus/releases/tag/v469)
### misc
* the 'search log' button and the window panel now let you delete log entries. you can delete by completion status from the menu or specifically by row in the panel (just like the file log)
* fixed the new 'file is writable' checks for Linux/macOS, which were testing permissions overbroadly and messing with users with user-only permissions set. the code now hands off specific user/group negotiation to the OS. thanks to the maintainer of the AUR package for helping me out here (issue #1042)
* the various places where a file's permission bits are set are also cleaned up--hydrus now makes a distinction between double-checking a file is set user-writable before deleting/overwriting vs making a file's permission bits (which were potentially messed up in the past) 'nice' for human use after export. in the latter case, which still defaults to 644 on linux/macOS, the user's umask is now applied, so it should be 600 if you prefer that
* fixed a bug where the media viewer could have trouble initialising video when the player window instantiation was delayed (e.g. with embed button)
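For the umask item above, the idea is roughly this sketch (illustrative, not the client's actual code):

```python
import os

def nice_export_mode(default: int = 0o644) -> int:
    # os.umask sets the umask and returns the old one, so set-and-restore
    # is the standard way to simply read the current value
    umask = os.umask(0)
    os.umask(umask)
    return default & ~umask  # e.g. a umask of 0o077 turns 0o644 into 0o600

print(oct(nice_export_mode()))
```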
### client api
* added 'return_hashes' boolean parameter to GET /get_files/search_files, which makes the command return hashes instead of file ids. this is documented in the help and has a new unit test
* client api version is now 25
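A quick hedged example of the new parameter in use--the port here is the Client API default, and the access key and tags are placeholders:

```python
import json, urllib.parse, urllib.request

api = 'http://127.0.0.1:45869'
tags = urllib.parse.quote(json.dumps(['blue_eyes']))
req = urllib.request.Request(
    f'{api}/get_files/search_files?tags={tags}&return_hashes=true',
    headers={'Hydrus-Client-API-Access-Key': 'YOUR_ACCESS_KEY'})
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # hashes now, rather than file ids
```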
### multiple local file services work
* I rewrote a lot of code this week, but it proved more complex than I expected. I also discovered I'll have to switch the pages and canvases over too before I can nicely switch the top level UI over to allow multiple search. rather than release a borked feature, I decided not to rush the final phase, so this remains boring for now! the good news is that it works well when I hack it in, so I just need to keep pushing
* rewrote the caller side of tag autocomplete lookup to work on the new multiple file search domain
* rewrote the main database level tag lookup code to work on the new multiple file search domain
* certain types of complicated tag autocomplete lookup, particularly on all known tags and any client with lots of siblings, will be faster now
* an unusual and complicated too-expansive sibling lookup on autocomplete lookups on 'all known tags' is now fixed
### boring cleanup and refactoring
* predicate counts are now managed by a new object. predicates also support 0 minimum count for x-y count ranges, which is now possible when fetching count results from non-cross-referenced file domains (for now this means searching deleted files)
* cleaned up a ton of predicate instantiation code
* updated autocomplete, predicate, and pred count unit tests to handle the new objects and bug fixes
* wrote new classes to cover convenient multiple file search domain at the database level and updated a bunch of tag autocomplete search code to use it
* misc cleanup and refactoring for file domain search code
* purged more single file service inspection code from file search systems
* refactored most duplicate files storage code (about 70KB) to a new client db module
## [Version 468](https://github.com/hydrusnetwork/hydrus/releases/tag/v468)
### misc
* fixed an issue where the one pixel border on the new 'control bar' on the media viewer was annoyingly catching mouse events at the bottom of the screen when in borderless fullscreen mode (and hence dragging the video, not scanning video position). the animation scanbar now owns its own border and processes mouse events on it properly
* fixed a typo bug in the new pixel hash system that meant new imports were not being added to the system correctly. on update, all files affected will be fixed of bad data and scheduled for a pixel hash regen. sorry for the trouble, and thank you for the quick reports here
* added a 'fixed font size example' qss file to the install. I have passed this file to others as an example of a quick way to make the font (and essentially ui scale) larger. it has some help comments inside and is now always available. the default example font size is lmao
* fixed another type checking problem for (mostly Arch/AUR) PyQt5 users (issue #1033)
* wrote a new display mappings maintenance routine for the database menu that repopulates the display mappings cache for missing rows. this task will be radically faster than a full regen for some problems, but it only solves those problems
* on boot, the program now explicitly checks if any of the database files are set as read-only and if so will dump out with an appropriate error
* rewrote my various 'file size problem' exception hierarchy to clearly split 'the file import options disallow this big gif' vs 'this file is zero size/this file is malformed'. we've had several problems here recently, but file import options rule-breaking should set 'ignore' again, and import objects should be better about ignore vs fail state from now on
* added more error handling for broken image files. some will report cleaner errors, some will now import
* the new parsing system that discards source urls if they share a domain with a primary import url is now stricter--now discarding only if they share the same url class. the domain check was messing up saving post urls when they were parsed from an api url (issue #1036)
* the network engine no longer sends a Range header if it is expecting to pull html/json, just files. this fixes fetching pages from nijie.info (and several other server engines, it seems), which has some unusual access rules regarding Range and Accept-Encoding
* fixed a problem with no_daemons and the docker package server scripts (issue #1039)
* if the server engine (serverside or client api) is running a request during program shutdown, it now politely says 'Application is shutting down!' with a 503 rather than going bananas and dumping to log with an uncaught 500
* fixed some bad client db update error handling code
### multiple local file services (system predicate edition)
* system:file service now supports 'deleted' and 'petitioned' status
* advanced 'all known files' search pages now show more system predicates
* when inbox and archive are hidden because one has 0 count, and the search space is simple, system everything now says what they are, e.g. "system:everything (24) (all in inbox)"
* file repos' 'system:local/not local' now sort at the top of the system predicate list, like inbox/archive
### client api
* the GET /get_files/file_metadata call now returns the file modified date and imported/deleted timestamps for each file service the file is currently in or deleted from. check the help for an example!
* fixed client api file search with random sort (file_sort_type = 4)
* client api version is now 24
### boring multiple local file services work
* the system predicates listed in a search page dropdown now support the new 'multiple location search context' object, which means in future I will be able to switch over to 'file domain is union of (A, deleted from B, and C)' and all the numbers will add up appropriately with ranged 'x-y' counts and deal with combinations of file repo and local service and current/deleted neatly
* when fetching system preds in 'all known files', the system:everything 'num files' count will be stated if available in the cache
* for the new system:file service search, refactored db level file filtering to support all status types
* cleaned up how system preds are generated
### boring refactoring work
* moved GUGs from network domain code to their own file
* moved URL Class from network domain code to its own file
* moved the pure functions from network domain code to their own file
* cleared up some file maintenance enum variable names
* sped up random file sort for large result sets
* misc client network code cleanup and type hints, and rejiggered cleaner imports after the refactoring
## [Version 467](https://github.com/hydrusnetwork/hydrus/releases/tag/v467)
### new scanbar cleanup
* the media container's scanbar and volume control are now combined on the same widget, meaning they now show/hide in sync and faster. their layout calculation is also more sensible. the new controls bar also has a thin border to make it pop better against a background video
* improved the way some auto-hide anti-flicker tech on the scanbar now works. it all hides a frame faster sometimes
* figured out some new anti-flicker tech to reduce/eliminate a frame of stretch when flicking from a static image to an mpv video, particularly for the first or second time in a session
* fixed a bug where clicking the global mute/unmute on an mpv player meant that certain shortcut keys (usually those with arrow keys) would not work on that player again. (it was a focus issue on the button, which then captured some form navigation keys but they had nowhere to go)
* brushed up some mouse coordinate testing logic across the program. some linux clients had trouble with the new animation scanbar popping up over mpv, I think I improved it!
* fixed another type problem with newer python/PyQt5 on Arch, also in scanbar coordinate testing
* fixed some dodgy colours in the scanbar initialisation and volume control border
* macOS users: I undid a long-time paint hack on the media container and the static image canvas. Qt is responsible for clearing the background again, which allows me to remove some jank anti-flicker tech. HOWEVER, the original reason for this hack was because without it, old macOS went to 100% CPU whenever the media viewer was showing something. therefore, to be safe, this option is still on for macOS users for now. you'll get a little flicker when browsing. please try hitting _help->debug->gui actions->macOS anti-flicker test_ and do some mixed video/image browsing. does your whole damn client lock up?
### misc
* the 'file log' and 'search log' buttons are now a new widget class that puts an arrow on the side that opens a menu. the secret right-click menus of these buttons are now exposed for all
* fixed a bug affecting some greyscale pngs with ICC profiles--they were coming out pure white due to a colourspace conversion problem
* fixed an import problem when PIL could not load a file (due to file malformation) but OpenCV could. this was causing a failed import from the new ICC profile detection code
* when the downloader hits a broken image file that cannot be imported due to malformation, the status is now 'error' instead of the incorrect 'ignored'
* fixed the duplicate file filesize comparison statement sometimes showing > in one direction and ≈ in the other. it happened when the larger file was between 20/19 and 21/20 times the size of the smaller, just a logic typo (issue #1028)
* the trash maintenance daemon is moved from the old threaded daemon system to the new repeating job worker pool. this is the last daemon cleaned up, so I am retiring the old and mostly defunct 'no_daemons' launch argument. a variety of other daemon infrastructure for things like shutdown checks is similarly removed. the program also now waits for the newer daemon jobs to finish working on shutdown
* moved most client daemon jobs like repository sync and dirty object save down so they start after the first session is loaded rather than right after boot
* if a file is called to regen its thumbnail but currently has no dimension, this is now a no-op rather than an error. in the situation where users force thumb regen before metadata regen and encounter this, it is sorted out later when the metadata regen recognises new dimensions and reschedules the thumb regen
* added an extensive user-written guide to the --db_synchronous_override launch argument to the launch arguments help page. it is possible and safe to run the program with synchronous=0 as long as certain caveats are obeyed. thanks to the user who figured this out and wrote it up
* the downloader engine now discards source urls in an import job if they have the same domain as any existing primary url. this will ensure that if a booru has a link back to itself as a source url, when the 'source' is really an alternate rather than a dupe, it won't be added in hydrus as a known url for that imported file
* misc cleanup in downloader system and file/search log UI
* fixed a type bug in the file and search log 'import from png' action. if you have existing pngs previously exported from here, they will import ok now
* refactored the various hydrus compression code to a new HydrusCompression file
* exported serialisable data pngs such as from file or search log that hold simple Strings now always compress the data before embedding it in the png. existing pngs that hold uncompressed strings should still load ok
* the payload in an exported png is now always compressed, and the payload description always states the uncompressed size
* sped up client shutdown when network traffic has been paused the whole time and a repo sync job might have wanted to run. these jobs also do not hang on a thread worker if network traffic is paused, but they should wake immediately when it is unpaused
* the hydrus login system is now resistant to connection failures; previously it was getting hung up and jamming the whole hydrus sync system when a server was down
### client api
* added GET /manage_database/mr_bones to the Client API. it returns a JSON Object with the same numbers used in the _help->how boned am I?_ dialog
* incremented Client API version to 23
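A hedged example call to the new endpoint--the access key is a placeholder:

```python
import json, urllib.request

req = urllib.request.Request(
    'http://127.0.0.1:45869/manage_database/mr_bones',
    headers={'Hydrus-Client-API-Access-Key': 'YOUR_ACCESS_KEY'})
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # the same numbers as help->how boned am I?
```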
## [Version 466](https://github.com/hydrusnetwork/hydrus/releases/tag/v466)
### video scanbar autohide
* the scanbar that shows below audio and video is now embedded inside the video frame, and it shows/hides based on how close your mouse is to it
* I've wanted to do this for a long time, since it will allow you to watch 16:9 videos at true 100% in borderless fullscreen, but the hackery of how the media viewer works behind the scenes means this took more work than you'd think and is still a little jank. there's a small amount of flicker when it pops in and out, which I will work on in future. in any case, please have a play with it and let me know what you think. I expect to add some more options, like for the activation padding area around it, and I will be tidying up more layout stuff throughout the media viewer
* if you are a mostly keyboard user, please check out the new 'global' shortcut to flip on/off a 'force the animation scanbar to show' mode
* I don't really want to bring back the always-on hanging-below scanbar that just takes up space, but if you try this new embedded scanbar and really hate it, we'll see what we can do
### more duplicate filter search options
* the duplicates page now has a dropdown on the search for 'must be/can be/excludes pixel dupes'!
* the duplicates page now has a number control on the search for what distance the pair was found at! I am not sure how accurate this thing is in all cases, but it seems I started tracking this data some time ago and forgot I even had it
* these new options are remembered in your session and _should_ remain fast in most normal cases. I put time into some complicated database work this week to get this going, please let me know if you have any trouble with it
### misc
* when the export filename pattern in the export files dialog means many of the files share the same base and hence need 'filename (5)'-style suffixes to be unique, the number here is now calculated much more efficiently. opening this dialog on 10,000 files with an oft-duplicate pattern should now cause a brief jolt of lag rather than many minutes of it (see the sketch after this list)
* when you choose to 'separate' a subscription with more than 100 queries, you are no longer forced to break it in half
* when you do break a subscription in half, it now makes sure to sort the query texts before separating
* if you are in advanced mode, the 'selection tags' list on the left of every page can now switch its tag display type between 'multiple media views', 'display', and 'storage'. this is experimental and a bunch of stuff like 'select files with this tag' won't work yet
* janitors' petition pages now start with their tag list in 'storage' mode, so you can see the actual tags being changed rather than with siblings and parents calculated
* rebalanced some janitor mapping petition weights. jannies _should_ see a smoother balance of 'lots of small petitions' vs 'a few larger petitions' amongst petitions all with the same reason and creator
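As promised above, a simplified sketch of computing unique 'name (n)' suffixes in a single pass, rather than rescanning all previous names for every file. This is an illustration of the general idea only--it is not the client's actual code, and it ignores collisions with pre-existing 'name (n)' bases:

```python
from collections import defaultdict

def unique_filenames(bases):
    counts = defaultdict(int)
    out = []
    for base in bases:
        n = counts[base]
        out.append(base if n == 0 else f'{base} ({n})')
        counts[base] += 1
    return out

print(unique_filenames(['page', 'page', 'page']))  # ['page', 'page (1)', 'page (2)']
```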
### boring cleanup and little fixes
* when you set the checker options in the edit subscription dialog, the queries now recalculate their file velocity better. previously, they would just set 'unknown' and recalc on the next run, but now they will actually recalculate if the query container is loaded into memory or otherwise put a status that says 'will calculate on next run'
* removed the 'should be namespaced' reason from the manage tags quick petition reasons. this is now all handled by siblings, tidying up storage tags manually is busywork
* when you click 'copy traceback' on an error popup, it also copies the software version, your platform, and if you are on a frozen build or running from source
* the logger now prints version number for every block, just before the timestamp
* cleaned up a variety of media viewer UI code while working on the scanbar, fixing some misc display bugs
* moved pixel hash storage responsibility from 'file metadata' to 'similar files' module
* the similar files system now searches pixel hashes when it is called to do any similar files search. they count as 'exact match' distance
* when a file gets a new pixel hash, it now sees if any other files have that same hash. if so, it now gets queued up again in the similar files search system, ensuring this match is not missed
* misc nomenclature cleanup--since we now have both 'pixel hashes' and 'phashes', phashes are now referred to as 'perceptual hashes' everywhere
* massively refactored the primary table join that drives potential duplicates search. it should work a bit faster now and it is much easier to work with
* I added pixel dupe and distance search to the standard search results version of the join and the 'system:everything' version, which has several optimisations
* silenced some shutdown handling in file maintenance that was being printed to log as an error
* fixed some 'broken object load' error handling to print the timestamp of the specific bad object, not whatever timestamp was requested. this error handling now also prints the full dump name and version to the log, and version to the exported filename. I was working with a user who had broken subs this week, and lacking this full info made things just a little trickier to put back together
* fixed some drag and drop handling where it was possible to drop thumbnails on a certain location of a page of pages that held an empty page of pages but it would not create a new child media page to hold them
* misc serverside db code cleanup
* fixed python 3.10 type bugs in window coordinate saving and Qt image generation from buffer (issue #1027)

1578
docs/client_api.md Normal file

File diff suppressed because it is too large

32
docs/contact.md Normal file
View File

@ -0,0 +1,32 @@
# contact and links
I welcome all your bug reports, questions, ideas, and comments. It is always interesting to see how other people are using my software and what they generally think of it. Most of the changes every week are suggested by users.
You can contact me by email, twitter, tumblr, discord, or the 8chan.moe /t/ thread or Endchan board--I do not mind which. Please know that I have difficulty with social media, and while I try to reply to all messages, it sometimes takes me a while to catch up.
The [Github Issue Tracker](https://github.com/hydrusnetwork/hydrus/issues) was turned off for some time, as it did not fit my workflow and I could not keep up, but it is now running again, managed by a team of volunteer users. Please feel free to submit feature requests there if you are comfortable with Github. I am not socially active on Github, please do not ping me there.
I am on the discord on Saturday afternoon, USA time, if you would like to talk live, and briefly on Wednesday after I put the release out. If that is not a good time for you, please leave me a DM and I will get to you when I can. There are also plenty of other hydrus users who idle who can help with support questions.
I delete all tweets and resolved email conversations after three months. So, if you think you are waiting for a reply, or I said I was going to work on something you care about and seem to have forgotten, please do nudge me.
I am always overwhelmed by work and behind on my messages. This is not to say that I do not enjoy just hanging out or talking about possible new features, but forgive me if some work takes longer than expected or if I cannot get to a particular idea quickly. In the same way, if you encounter actual traceback-raising errors or crashes, there is only one guy to fix it, so I prefer to know ASAP so I can prioritise.
I work by myself because I have acute difficulty working with others. Please do not spontaneously write long design documents or prepare other work for me--I find it more stressful than helpful, every time, and I won't give it the attention it deserves. If you would like to contribute time to hydrus, the user projects like the downloader repository and wiki help guides always have things to do.
That said:
* [homepage](https://hydrusnetwork.github.io/hydrus/)
* [github](https://github.com/hydrusnetwork/hydrus) ([latest build](https://github.com/hydrusnetwork/hydrus/releases/latest))
* [issue tracker](https://github.com/hydrusnetwork/hydrus/issues)
* [8chan.moe /t/ (Hydrus Network General)](https://8chan.moe/t/catalog.html) ([endchan bunker](https://endchan.net/hydrus/) [(.org)](https://endchan.org/hydrus/))
* [tumblr](https://hydrus.tumblr.com) ([rss](http://hydrus.tumblr.com/rss))
* [new downloads](https://github.com/hydrusnetwork/hydrus/releases)
* [old downloads](https://www.mediafire.com/hydrus)
* [twitter](https://twitter.com/hydrusnetwork)
* [email](mailto:hydrus.admin@gmail.com)
* [discord](https://discord.gg/wPHPCUZ)
* [patreon](https://www.patreon.com/hydrus_dev)
* [user-run repository and wiki (including download presets for several non-default boorus)](https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts)

114
docs/database_migration.md Normal file
View File

@ -0,0 +1,114 @@
---
title: database migration
---
## the hydrus database { id="intro" }
A hydrus client consists of three components:
1. **the software installation**
This is the part that comes with the installer or extract release, with the executable and dlls and a handful of resource folders. It doesn't store any of your settings--it just knows how to present a database as a nice application. If you just run the client executable straight, it looks in its 'db' subdirectory for a database, and if one is not found, it creates a new one. If it sees a database running at a lower version than itself, it will update the database before booting it.
It doesn't really matter where you put this. An SSD will load it marginally quicker the first time, but you probably won't notice. If you run it without command-line parameters, it will try to write to its own directory (to create the initial database), so if you mean to run it like that, it should not be in a protected place like _Program Files_.
2. **the actual database**
The client stores all its preferences and current state and knowledge _about_ files--like file size and resolution, tags, ratings, inbox status, and so on and so on--in a handful of SQLite database files, defaulting to _install_dir/db_. Depending on the size of your client, these might total 1MB in size or be as much as 10GB.
In order to perform a search or to fetch or process tags, the client has to interact with these files in many small bursts, which means it is best if these files are on a drive with low latency. An SSD is ideal, but a regularly-defragged HDD with a reasonable amount of free space also works well.
3. **your media files**
All of your jpegs and webms and so on (and their thumbnails) are stored in a single complicated directory that is by default at _install\_dir/db/client\_files_. All the files are named by their hash and stored in efficient hash-based subdirectories. In general, it is not navigable by humans, but it works very well for the fast access from a giant pool of files the client needs to do to manage your media.
Thumbnails tend to be fetched dozens at a time, so it is, again, ideal if they are stored on an SSD. Your regular media files--which on many clients total hundreds of GB--are usually fetched one at a time for human consumption and do not benefit from the expensive low-latency of an SSD. They are best stored on a cheap HDD, and, if desired, also work well across a network file system.
## these components can be put on different drives { id="different_drives" }
Although an initial install will keep these parts together, it is possible to, say, run the database on a fast drive but keep your media in cheap slow storage. This is an excellent arrangement that works for many users. And if you have a very large collection, you can even spread your files across multiple drives. It is not very technically difficult, but I do not recommend it for new users.
Backing such an arrangement up is obviously more complicated, and the internal client backup is not sophisticated enough to capture everything, so I recommend you figure out a broader solution with a third-party backup program like FreeFileSync.
## pulling your media apart { id="pulling_media_apart" }
!!! danger
**As always, I recommend creating a backup before you try any of this, just in case it goes wrong.**
If you would like to move your files and thumbnails to new locations, I generally recommend you not move their folders around yourself--the database has an internal knowledge of where it thinks its file and thumbnail folders are, and if you move them while it is closed, it will become confused and you will have to manually relocate what is missing on the next boot via a repair dialog. This is not impossible to figure out, but if the program's 'client files' folder confuses you at all, I'd recommend you stay away. Instead, you can simply do it through the gui:
Go _database->migrate database_, giving you this dialog:
![](images/db_migration.png)
This is an image from my old laptop's client. At that time, I had moved the main database and its files out of the install directory but otherwise kept everything together. Your situation may be simpler or more complicated.
To move your files somewhere else, add the new location, empty/remove the old location, and then click 'move files now'.
**Portable** means that the path is beneath the main db dir and so is stored as a relative path. Portable paths will still function if the database changes location between boots (for instance, if you run the client from a USB drive and it mounts under a different location).
**Weight** means the relative amount of media you would like to store in that location. It only matters if you are spreading your files across multiple locations. If location A has a weight of 1 and B has a weight of 2, A will get approximately one third of your files and B will get approximately two thirds.
The operations on this dialog are simple and atomic--at no point is your db ever invalid. Once you have the locations and ideal usage set how you like, hit the 'move files now' button to actually shuffle your files around. It will take some time to finish, but you can pause and resume it later if the job is large or you want to undo or alter something.
If you decide to move your actual database, the program will have to shut down first. Before you boot up again, you will have to create a new program shortcut:
## informing the software that the database is not in the default location { id="launch_parameter" }
A straight call to the client executable will look for a database in _install_dir/db_. If one is not found, it will create one. So, if you move your database and then try to run the client again, it will try to create a new empty database in the previous location!
So, pass it a -d or --db_dir command line argument, like so:
* `client -d="D:\media\my_hydrus_database"`
* _--or--_
* `client --db_dir="G:\misc documents\New Folder (3)\DO NOT ENTER"`
* _--or, for macOS--_
* `open -n -a "Hydrus Network.app" --args -d="/path/to/db"`
And it will instead use the given path. If no database is found, it will similarly create a new empty one at that location. You can use any path that is valid in your system, but I would not advise using network locations and so on, as the database works best with some clever device locking calls that these interfaces may not provide.
Rather than typing the path out in a terminal every time you want to launch your external database, create a new shortcut with the argument in. Something like this, which is from my main development computer and tests that a fresh default install will run an existing database ok:
![](images/db_migration_shortcut.png)
Note that an install with an 'external' database no longer needs access to write to its own path, so you can store it anywhere you like, including protected read-only locations (e.g. in 'Program Files'). If you do move it, just double-check your shortcuts are still good and you are done.
## finally { id="finally" }
If your database now lives in one or more new locations, make sure to update your backup routine to follow them!
## moving to an SSD { id="to_an_ssd" }
As an example, let's say you started using the hydrus client on your HDD, and now you have an SSD available and would like to move your thumbnails and main install to that SSD to speed up the client. Your database will be valid and functional at every stage of this, and it can all be undone. The basic steps are:
1. Move your 'fast' files to the fast location.
2. Move your 'slow' files out of the main install directory.
3. Move the install and db itself to the fast location and update shortcuts.
Specifically:
* Update your backup if you maintain one.
* Create an empty folder on your HDD that is outside of your current install folder. Call it 'hydrus_files' or similar.
* Create two empty folders on your SSD with names like 'hydrus\_db' and 'hydrus\_thumbnails'.
* Set the 'thumbnail location override' to 'hydrus_thumbnails'. You should get that new location in the list, currently empty but prepared to take all your thumbs.
* Hit 'move files now' to actually move the thumbnails. Since this involves moving a lot of individual files from a high-latency source, it will take a long time to finish. The hydrus client may hang periodically as it works, but you can just leave it to work on its own--it will get there in the end. You can also watch it do its disk work under Task Manager.
* Now hit 'add location' and select your new 'hydrus\_files'. 'hydrus\_files' should be added and willing to take 50% of the files.
* Select the old location (probably 'install\_dir/db/client\_files') and hit 'decrease weight' until it has weight 0 and you are prompted to remove it completely. 'hydrus_files' should now be willing to take all the files from the old location.
* Hit 'move files now' again to make this happen. This should be fast since it is just moving a bunch of folders across the same partition.
* With everything now 'non-portable' and hence decoupled from the db, you can now easily migrate the install and db to 'hydrus_db' simply by shutting the client down and moving the install folder in a file explorer.
* Update your shortcut to the new client.exe location and try to boot.
* Update your backup scheme to match your new locations.
* Enjoy a much faster client.
You should now have _something_ like this:
![](images/db_migration_example.png)
## p.s. running multiple clients { id="multiple_clients" }
Since you now know how to tell the software about an external database, you can, if you like, run multiple clients from the same install (and if you previously had multiple install folders, you can now just use the one). Just make multiple shortcuts to the same client executable but with different database directories. They can run at the same time. You'll save yourself a little memory and update-hassle. I do this on my laptop client to run a regular client for my media and a separate 'admin' client to do PTR petitions and so on.
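For example (with hypothetical paths), two shortcuts pointing at the same executable but different databases might look like:

* `client -d="D:\hydrus\main_client_db"`
* `client -d="D:\hydrus\admin_client_db"`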

7
docs/docker.md Normal file
View File

@ -0,0 +1,7 @@
---
title: Docker
hide:
- toc
---
--8<-- "static/build_files/docker/README.md"

17
docs/downloader_completion.md Normal file
View File

@ -0,0 +1,17 @@
---
title: Putting it All Together
---
Now you know what GUGs, URL Classes, and Parsers are, you should have some ideas of how URL Classes could steer what happens when the downloader is faced with an URL to process. Should a URL be imported as a media file, or should it be parsed? If so, how?
You may have noticed in the Edit GUG ui that it lists if a current URL Class matches the example URL output. If the GUG has no matching URL Class, it won't be listed in the main 'gallery selector' button's list--it'll be relegated to the 'non-functioning' page. Without a URL Class, the client doesn't know what to do with the output of that GUG. But if a URL Class does match, we can then hand the result over to a parser set at _network->downloader definitions->manage url class links_:
![](images/downloader_completion_url_links.png)
Here you simply set which parsers go with which URL Classes. If you have URL Classes that do not have a parser linked (which is the default for new URL Classes), you can use the 'try to fill in gaps...' button to automatically fill the gaps based on guesses using the parsers' example URLs. This is usually the best way to line things up unless you have multiple potential parsers for that URL Class, in which case it'll usually go by the parser name earliest in the alphabet.
If the URL Class has no parser set or the parser is broken or otherwise invalid, the respective URL's file import object in the downloader or subscription is going to throw some kind of error when it runs. If you make and share some parsers, the first indication that something is wrong is going to be several users saying 'I got this error: (_copy notes_ from file import status window)'. You can then load the parser back up in _manage parsers_ and try to figure out what changed and roll out an update.
_manage url class links_ also shows 'api link review', which summarises which URL Classes api-link to others. In these cases, only the api URL gets a parser entry in the first 'parser links' window, since the first will never be fetched for parsing (in the downloader, it will always be converted to the API URL, and _that_ is fetched and parsed).
Once your GUG has a URL Class and your URL Classes have parsers linked, test your downloader! Note that Hydrus's URL drag-and-drop import uses URL Classes, so if you don't have the GUG and gallery stuff done but you have a Post URL set up, you can test that just by dragging a Post URL from your browser to the client, and it should be added to a new URL Downloader and just work. It feels pretty good once it does!

45
docs/downloader_gugs.md Normal file
View File

@ -0,0 +1,45 @@
---
title: Gallery URL Generators
---
Gallery URL Generators, or **GUGs**, are simple objects that take a simple string from the user, like:
* blue_eyes
* blue\_eyes blonde\_hair
* InCase
* elsa dandon_fuga
* wlop
* goth* order:id_asc
And convert them into an initialising Gallery URL, such as:
* [http://safebooru.org/index.php?page=post&s=list&tags=blue_eyes&pid=0](http://safebooru.org/index.php?page=post&s=list&tags=blue_eyes&pid=0)
* [https://konachan.com/post?page=1&tags=blonde\_hair+blue\_eyes](https://konachan.com/post?page=1&tags=blonde_hair+blue_eyes)
* [https://www.hentai-foundry.com/pictures/user/InCase/page/1](https://www.hentai-foundry.com/pictures/user/InCase/page/1)
* [http://rule34.paheal.net/post/list/elsa dandon_fuga/1](http://rule34.paheal.net/post/list/elsa dandon_fuga/1)
* [https://www.deviantart.com/wlop/favourites/?offset=0](https://www.deviantart.com/wlop/favourites/?offset=0)
* [https://danbooru.donmai.us/posts?page=1&tags=goth*+order:id_asc](https://danbooru.donmai.us/posts?page=1&tags=goth*+order:id_asc)
These are all the 'first page' of the results if you type or click-through to the same location on those sites. We are essentially emulating their own simple search-url generation inside the hydrus client.
## actually doing it { id="doing_it" }
Although it is usually a fairly simple process of just substituting the inputted tags into a string template, there are a couple of extra things to think about. Let's look at the ui under _network->downloader definitions->manage gugs_:
![](images/downloader_edit_gug_panel.png)
The client will split whatever the user enters by whitespace, so `blue_eyes blonde_hair` becomes two search terms, `[ 'blue_eyes', 'blonde_hair' ]`, which are then joined back together with the given 'search terms separator', to make `blue_eyes+blonde_hair`. Different sites use different separators, although ' ', '+', and ',' are most common. The new string is substituted into the `%tags%` in the template phrase, and the URL is made.
Note that you will not have to make %20 or %3A percent-encodings for reserved characters here--the network engine handles all that before the request is sent. For the most part, if you need to include, or a user puts in, ':' or ' ' or 'おっぱい', you can just pass it along straight into the final URL without worrying.
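To make that concrete, here is a minimal sketch of the whole transformation, assuming safebooru's template and '+' as the separator (the function name is illustrative, not the client's internals):

```python
def make_gallery_url(user_input: str) -> str:
    template = 'https://safebooru.org/index.php?page=post&s=list&tags=%tags%&pid=0'
    separator = '+'
    # split on whitespace, rejoin with the site's separator, substitute into
    # the template; percent-encoding is left to the network engine, as noted
    search_terms = user_input.split()
    return template.replace('%tags%', separator.join(search_terms))

print(make_gallery_url('blue_eyes blonde_hair'))
# https://safebooru.org/index.php?page=post&s=list&tags=blue_eyes+blonde_hair&pid=0
```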
This ui should update as you change it, so have a play and look at how the output example url changes to get a feel for things. Look at the other defaults to see different examples. Even if you break something, you can just cancel out.
The name of the GUG is important, as this is what will be listed when the user chooses what 'downloader' they want to use. Make sure it has a clear unambiguous name.
The initial search text is also important. Most downloaders just take some text tags, but if your GUG expects a numerical artist id (like pixiv artist search does), you should specify that explicitly to the user. You can even put in a brief '(two tag maximum)' type of instruction if you like.
Notice that the DeviantArt example above is actually the stream of wlop's _favourites_, not his works, and without an explicit notice of that, a user could easily mistake what they have selected. 'gelbooru' or 'newgrounds' are bad names, 'type here' is a bad initialising text.
## Nested GUGs { id="nested_gugs" }
Nested Gallery URL Generators are GUGs that hold other GUGs. Some searches actually use more than one stream (such as a Hentai Foundry artist lookup, where you might want to get both their regular works and their scraps, which are two separate galleries under the site), so NGUGs allow you to generate multiple initialising URLs per input. You can experiment with this ui if you like--it isn't too complicated--but you might want to hold off doing anything for real until you are comfortable with everything and know how producing multiple initialising URLs is going to work in the actual downloader.

50
docs/downloader_intro.md Normal file
View File

@ -0,0 +1,50 @@
---
title: Making a Downloader
---
# Making a Downloader
!!! caution
Creating custom downloaders is only for advanced users who understand HTML or JSON. Beware! If you are simply looking for how to add new downloaders, please head over [here](adding_new_downloaders.md).
## this system { id="intro" }
The first versions of hydrus's downloaders were all hardcoded and static--I wrote everything into the program itself and nothing was user-creatable or -fixable. After the maintenance burden of the entire messy system proved too large for me to keep up with and a semi-editable booru system proved successful, I decided to overhaul the entire thing to allow user creation and sharing of every component. It is designed to be very simple to the front-end user--they will typically handle a couple of png files and then select a new downloader from a list--but very flexible (and hence potentially complicated) on the back-end. These help pages describe the different components with the intention of making an HTML- or JSON-fluent user able to create and share a full new downloader on their own.
As always, this is all under active development. Your feedback on the system would be appreciated, and if something is confusing or you discover something in here that is out of date, please [let me know](contact.md).
## what is a downloader? { id="downloader" }
In hydrus, a downloader is one of:
**Gallery Downloader**
: This takes a string like 'blue_eyes' to produce a series of thumbnail gallery page URLs that can be parsed for image page URLs which can ultimately be parsed for file URLs and metadata like tags. Boorus fall into this category.
**URL Downloader**
: This does just the Gallery Downloader's back-end--instead of taking a string query, it takes the gallery or post URLs directly from the user, whether that is one from a drag-and-drop event or hundreds pasted from clipboard. For our purposes here, the URL Downloader is a subset of the Gallery Downloader.
**Watcher**
: This takes a URL that it will check in timed intervals, parsing it for new URLs that it then queues up to be downloaded. It typically stops checking after the 'file velocity' (such as '1 new file per day') drops below a certain level. It is mostly for watching imageboard threads.
**Simple Downloader**
: This takes a URL one-time and parses it for direct file URLs. This is a miscellaneous system for certain simple gallery types and some testing/'I just need the third <img> tag's _src_ on this one page' jobs.
The system currently supports HTML and JSON parsing. XML should be fine under the HTML parser--it isn't strict about checking types and all that.
## what does a downloader do? { id="pipeline" }
The Gallery Downloader is the most complicated downloader and uses all the possible components. In order for hydrus to convert our example 'blue_eyes' query into a bunch of files with tags, it needs to:
* Present some user interface named 'safebooru tag search' to the user that will convert their input of 'blue_eyes' into [https://safebooru.org/index.php?page=post&s=list&tags=blue_eyes&pid=0](https://safebooru.org/index.php?page=post&s=list&tags=blue_eyes&pid=0).
* Recognise [https://safebooru.org/index.php?page=post&s=list&tags=blue_eyes&pid=0](https://safebooru.org/index.php?page=post&s=list&tags=blue_eyes&pid=0) as a Safebooru Gallery URL.
* Convert the HTML of a Safebooru Gallery URL into a list URLs like [https://safebooru.org/index.php?page=post&s=view&id=2437965](https://safebooru.org/index.php?page=post&s=view&id=2437965) and possibly a 'next page' URL (e.g. [https://safebooru.org/index.php?page=post&s=list&tags=blue_eyes&pid=40](https://safebooru.org/index.php?page=post&s=list&tags=blue_eyes&pid=40)) that points to the next page of thumbnails.
* Recognise the [https://safebooru.org/index.php?page=post&s=view&id=2437965](https://safebooru.org/index.php?page=post&s=view&id=2437965) URLs as Safebooru Post URLs.
* Convert the HTML of a Safebooru Post URL into a file URL like [https://safebooru.org//images/2329/b6e8c263d691d1c39a2eeba5e00709849d8f864d.jpg](https://safebooru.org//images/2329/b6e8c263d691d1c39a2eeba5e00709849d8f864d.jpg) and some tags like: 1girl, bangs, black gloves, blonde hair, blue eyes, braid, closed mouth, day, fingerless gloves, fingernails, gloves, grass, hair ornament, hairclip, hands clasped, creator:hankuri, interlocked fingers, long hair, long sleeves, outdoors, own hands together, parted bangs, pointy ears, character:princess zelda, smile, solo, series:the legend of zelda, underbust.
So we have three components:
* [**Gallery URL Generator (GUG):**](downloader_gugs.md) faces the user and converts text input into initialising Gallery URLs.
* [**URL Class:**](downloader_url_classes.md) identifies URLs and informs the client how to deal with them.
* [**Parser:**](downloader_parsers.md) converts data from URLs into hydrus-understandable metadata.
URL downloaders and watchers do not need the Gallery URL Generator, as their input _is_ an URL. And simple downloaders also have an explicit 'just download it and parse it with this simple rule' action, so they do not use URL Classes (or even full-fledged Page Parsers) either.

3
docs/downloader_login.md Normal file
View File

@ -0,0 +1,3 @@
# Login Manager
The system works, but this help was never done! Check the defaults for examples of how it works, sorry!

View File

@ -0,0 +1,30 @@
# Parsers
In hydrus, a parser is an object that takes a single block of HTML or JSON data and returns many kinds of hydrus-level metadata.
Parsers are flexible and potentially quite complicated. You might like to open _network->manage parsers_ and explore the UI as you read these pages. Check out how the default parsers already in the client work, and if you want to write a new one, see if there is something already in there that is similar--it is usually easier to duplicate an existing parser and then alter it than to create a new one from scratch every time.
There are three main components in the parsing system (click to open each component's help page):
* [**Formulae:**](downloader_parsers_formulae.md) Take parsable data, search it in some manner, and return 0 to n strings.
* [**Content Parsers:**](downloader_parsers_content_parsers.md) Take parsable data, apply a formula to it to get some strings, and apply a single metadata 'type' and perhaps some additional modifiers.
* [**Page Parsers:**](downloader_parsers_page_parsers.md) Take parsable data, apply content parsers to it, and return all the metadata in an appropriate structure.
Once you are comfortable with these objects, you might like to check out these walkthroughs, which create full parsers from nothing:
* [e621 HTML gallery page](downloader_parsers_full_example_gallery_page.md)
* [Gelbooru HTML file page](downloader_parsers_full_example_file_page.md)
* [Artstation JSON file page API](downloader_parsers_full_example_api.md)
Once you are comfortable with parsers, and if you are feeling brave, check out how the default imageboard and pixiv parsers work. These are complicated and use more experimental areas of the code to get their job done. If you are trying to get a new imageboard parser going and can't figure out subsidiary page parsers, send me a mail or something and I'll try to help you out!
When you are making a parser, consider this checklist (you might want to copy/have your own version of this somewhere):
* Do you get good URLs with good priority? Do you ever accidentally get favourite/popular/advert results you didn't mean to?
* If you need a next gallery page URL, is it ever not available (and hence needs a URL Class fix)? Does it change for search tags with unicode or http-restricted characters?
* Do you get nice namespaced tags? Are any unwanted single characters like -/+/? getting through?
* Is the file hash available anywhere?
* Is a source/post time available?
* Is a source URL available? Is it good quality, or does it often just point to an artist's base twitter profile? If you pull it from text or a tooltip, is it clipped for longer URLs?
[Taken a break? Now let's put it all together ---->](downloader_completion.md)

View File

@ -0,0 +1,70 @@
# Content Parsers
So, we can now generate some strings from a document. Content Parsers will let us apply a single metadata type to those strings to inform hydrus what they are.
![](images/edit_content_parser_panel_tags.png)
A content parser has a name, a content type, and a formula. This example fetches the character tags from a danbooru post.
The name is just decorative, but it is generally a good idea so you can find things again when you next revisit them.
The current content types are:
## urls { id="intro" }
This should be applied to relative ('/image/smile.jpg') and absolute ('https://mysite.com/content/image/smile.jpg') URLs. If the URL is relative, the client will generate an absolute URL based on the original URL used to fetch the data being parsed (i.e. it should all just work).
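If you are curious what that resolution means in practice, it is standard 'urljoin' behaviour--here is a rough illustration in Python, using the example URLs above plus a hypothetical fetch URL (this is just the general idea, not hydrus's actual code):

```python
from urllib.parse import urljoin

# a hypothetical URL originally used to fetch the page being parsed
fetched_url = 'https://mysite.com/post/view/123456'

# a relative URL parsed from that page resolves against the fetch URL
print(urljoin(fetched_url, '/image/smile.jpg'))
# https://mysite.com/image/smile.jpg

# an absolute URL passes through unchanged
print(urljoin(fetched_url, 'https://mysite.com/content/image/smile.jpg'))
# https://mysite.com/content/image/smile.jpg
```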
You can set several types of URL:
* **url to download/pursue** means a Post URL or a File URL in our URL Classes system, like a booru post or an actual raw file like a jpg or webm.
* **url to associate** means an URL you want added to the list of 'known urls' for the file, but not one you want the client to actually download and parse. Use this to neatly add booru 'source' urls.
* **next gallery page** means the next Gallery URL on from the current one.
The 'file url quality precedence' allows the client to select the best of several possible URLs. Given multiple content parsers producing URLs at the same 'level' of parsing, it will select the one with the highest value. Consider these two posts:
* [https://danbooru.donmai.us/posts/3016415](https://danbooru.donmai.us/posts/3016415)
* [https://danbooru.donmai.us/posts/3040603](https://danbooru.donmai.us/posts/3040603)
The Garnet image fits into a regular page and so Danbooru embed the whole original file in the main media canvas. One easy way to find the full File URL in this case would be to select the "src" attribute of the "img" tag with id="image".
The Cirno one, however, is much larger and has been scaled down. The src of the main canvas tag points to a resized 'sample' link. The full link can be found at the 'view original' link up top, which is an "a" tag with id="image-resize-link".
The Garnet post does not have the 'view original' link, so to cover both situations we might want two content parsers--one fetching the 'canvas' "src" and the other finding the 'view original' "href". If we set the 'canvas' one with a quality of 40 and the 'view original' 60, then the parsing system would know to select the 60 when it was available but to fall back to the 40 if not.
As it happens, Danbooru (afaik, always) gives a link to the original file under the 'Size:' metadata to the left. This is the same 'best link' for both posts above, but it isn't so easy to identify. It is a quiet "a" tag without an "id" and it isn't always in the same location, but if you could pin it down reliably, it might be nice to circumvent the whole issue.
Sites can change suddenly, so it is nice to have a bit of redundancy here if it is easy.
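The selection logic itself is simple--something like this sketch, where the candidate URLs and their quality values are made up for illustration:

```python
# (quality, url) pairs from our two hypothetical content parsers; either
# may be missing if its tag was not found on the particular post
candidates = [
    (40, 'https://danbooru.donmai.us/data/sample/sample-abc123.jpg'),  # canvas "src"
    (60, 'https://danbooru.donmai.us/data/abc123.jpg'),                # 'view original' "href"
]

if candidates:
    best_url = max(candidates)[1]  # highest quality wins; the 40 is the fallback
    print(best_url)
```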
## tags { id="tags" }
These are simple--they tell the client that the given strings are tags. You set the namespace here as well. I recommend you parse 'splashbrush' and set the namespace 'creator' here rather than trying to mess around with 'append prefix "creator:"' string conversions at the formula level--it is simpler up here and it lets hydrus handle any edge case logic for you.
Leave the namespace field blank for unnamespaced tags.
## file hash { id="file_hash" }
This says 'this is the hash for the file otherwise referenced in this parser'. So, if you have another content parser finding a File or Post URL, this lets the client know early that that destination happens to have a particular MD5, for instance. The client will look for that hash in its own database, and if it finds a match, it can predetermine if it already has the file (or has previously deleted it) without ever having to download it. When this happens, it will still add tags and associate the file with the URL for its 'known urls' just as if it _had_ downloaded it!
If you understand this concept, it is great to include. It saves time and bandwidth for everyone. Many site APIs include a hash for this exact reason--they want you to be able to skip a needless download just as much as you do.
![](images/edit_content_parser_panel_hash.png)
The usual suite of hash types are supported: MD5, SHA1, SHA256, and SHA512. An old version of this required some weird string decoding, but this is no longer true. Select 'hex' or 'base64' from the encoding type dropdown, and then just parse the 'e5af57a687f089894f5ecede50049458' or '5a9XpofwiYlPXs7eUASUWA==' text, and hydrus should handle the rest. It will present the parsed hash in hex.
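Those two example strings are the same sixteen bytes in different clothes--a quick Python check, using the values above:

```python
import base64

hex_hash = 'e5af57a687f089894f5ecede50049458'
b64_hash = '5a9XpofwiYlPXs7eUASUWA=='

# both encodings decode to the same raw bytes, which are then shown as hex
assert base64.b64decode(b64_hash) == bytes.fromhex(hex_hash)
print(base64.b64decode(b64_hash).hex())  # e5af57a687f089894f5ecede50049458
```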
## timestamp { id="timestamp" }
This lets you say that a given number refers to a particular time for a file. At the moment, I only support 'source time', which represents a 'post' time for the file and is useful for thread and subscription check time calculations. It takes a Unix time integer, like 1520203484, which many APIs will provide.
If you are feeling very clever, you can decode a 'MM/DD/YYYY hh:mm:ss' style string to a Unix time integer using string converters, which use some hacky and semi-reliable python %d-style values as per [here](https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior). Look at the existing defaults for examples of this, and don't worry about being more accurate than 12/24 hours--trying to figure out timezone is a hell not worth attempting, and doesn't really matter in the long-run for subscriptions and thread watchers that might care.
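For reference, the decoding amounts to this sort of Python underneath--the example string is made up, and hydrus's string converter wraps the same %-style codes:

```python
from datetime import datetime, timezone

# decode an 'MM/DD/YYYY hh:mm:ss' style string with %-style codes...
dt = datetime.strptime('03/04/2018 22:44:44', '%m/%d/%Y %H:%M:%S')

# ...and convert to a Unix time integer; this sketch just assumes UTC, in
# the spirit of not sweating the timezone
print(int(dt.replace(tzinfo=timezone.utc).timestamp()))  # 1520203484, as above
```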
## watcher page title { id="page_title" }
This lets the watcher know a good name/subject for its entries. The subject of a thread is obviously ideal here, but failing that you can try to fetch the first part of the first post's comment. It has precedence, like for URLs, so you can tell the parser which to prefer if you have multiple options. Just for neatness and ease of testing, you probably want to use a string converter here to cut it down to the first 64 characters or so.
## veto { id="veto" }
This is a special content type--it tells the next highest stage of parsing that this 'post' of parsing is invalid and to cancel and not return any data. For instance, if a thread post's file was deleted, the site might provide a default '404' stock File URL using the same markup structure as it would for normal images. You don't want to give the user the same 404 image ten times over (with fifteen kinds of tag and source time metadata attached), so you can add a little rule here that says "If the image link is 'https://somesite.com/404.png', raise a veto: File 404" or "If the page has 'No results found' in its main content div, raise a veto: No results found" or "If the expected download tag does not have 'download link' as its text, raise a veto: No Download Link found--possibly Ugoira?" and so on.
![](images/edit_content_parser_panel_veto.png)
They will associate their name with the veto being raised, so it is useful to give these a decent descriptive name so you can see what might be going right or wrong during testing. If it is an appropriate and serious enough veto, it may also rise up to the user level and will be useful if they need to report an error to you (like "After five pages of parsing, it gives 'veto: no next page link'").

View File

@ -0,0 +1,160 @@
---
title: Parser Formulae
---
# Parser Formulae { id="formulae" }
Formulae are tools used by higher-level components of the parsing system. They take some data (typically some HTML or JSON) and return 0 to n strings. For our purposes, these strings will usually be tags, URLs, and timestamps. You will usually see them summarised with this panel:
![](images/edit_formula_panel.png)
The different types are currently [html](#html_formula), [json](#json_formula), [compound](#compound_formula), and [context variable](#context_variable_formula).
## html { id="html_formula" }
This takes a full HTML document or a sample of HTML--and any regular sort of XML _should_ also work. It starts at the root node and searches for lower nodes using one or more ordered rules based on tag name and attributes, and then returns string data from those final nodes.
For instance, if you have this:
```html
<html>
<body>
<div class="media_taglist">
<span class="generaltag"><a href="(search page)">blonde hair</a> (3456)</span>
<span class="generaltag"><a href="(search page)">blue eyes</a> (4567)</span>
<span class="generaltag"><a href="(search page)">bodysuit</a> (5678)</span>
<span class="charactertag"><a href="(search page)">samus aran</a> (2345)</span>
<span class="artisttag"><a href="(search page)">splashbrush</a> (123)</span>
</div>
<div class="content">
<span class="media">(a whole bunch of content that doesn't have tags in)</span>
</div>
</body>
</html>
```
_(Most boorus have a taglist like this on their file pages.)_
To find the artist, "splashbrush", here, you could:
* search beneath the root tag (`#!html <html>`) for the `#!html <div>` tag with attribute `class="media_taglist"`
* search beneath that `#!html <div>` for `#!html <span>` tags with attribute `class="artisttag"`
* search beneath those `#!html <span>` tags for `#!html <a>` tags
* and then get the string content of those `#!html <a>` tags
Changing the `artisttag` to `charactertag` or `generaltag` would give you `samus aran` or `blonde hair`, `blue eyes`, `bodysuit` respectively.
You might be tempted to just go straight for any `#!html <span>` with `class="artisttag"`, but many sites use the same class to render a sidebar of favourite/popular tags or some other sponsored content, so it is generally best to try to narrow down to a larger `#!html <div>` container so you don't get anything you don't mean.
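If you think better in code than in UI rules, the same walk looks roughly like this with a library such as BeautifulSoup--hydrus has its own internal engine, so this is only an analogy, using a trimmed version of the example above:

```python
from bs4 import BeautifulSoup

html_document = '''
<div class="media_taglist">
  <span class="generaltag"><a href="#">blonde hair</a> (3456)</span>
  <span class="artisttag"><a href="#">splashbrush</a> (123)</span>
</div>
'''

soup = BeautifulSoup(html_document, 'html.parser')

# root -> div.media_taglist -> span.artisttag -> a -> string content
strings = [
    a.get_text()
    for div in soup.find_all('div', class_='media_taglist')
    for span in div.find_all('span', class_='artisttag')
    for a in span.find_all('a')
]
print(strings)  # ['splashbrush']
```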
### the ui
Clicking 'edit formula' on an HTML formula gives you this:
![](images/edit_html_formula_panel.png)
You edit on the left and test on the right.
### finding the right html tags
When you add or edit one of the specific tag search rules, you get this:
![](images/edit_html_tag_rule_panel.png)
You can set multiple key/value attribute search conditions, but you'll typically be searching for 'class' or 'id' here, if anything.
Note that you can set it to fetch only the xth instance of a found tag, which can be useful in situations like this:
```html
<span class="generaltag">
<a href="(add tag)">+</a>
<a href="(remove tag)">-</a>
<a href="(search page)">blonde hair</a> (3456)
</span>
```
Without any more attributes, there isn't a great way to distinguish the `#!html <a>` with "blonde hair" from the other two--so just set `get the 3rd <a> tag` and you are good.
Most of the time, you'll be searching descendants (i.e. walking down the tree), but sometimes you might have this:
```html
<span>
<a href="(link to post url)">
<img class="thumb" src="(thumbnail image)" />
</a>
</span>
```
There isn't a great way to find the `#!html <span>` or the `#!html <a>` when looking from above here, as they are lacking a class or id, but you can find the `#!html <img>` ok, so find those and then add a rule that, instead of searching descendants, 'walks back up ancestors' like this:
![](images/edit_html_formula_panel_descendants_ancestors.png)
You can solve some tricky problems this way!
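Continuing the BeautifulSoup analogy from earlier (again, just an analogy for the hydrus rule), the ancestor walk is the equivalent of something like this:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<span><a href="(link to post url)"><img class="thumb" src="t.jpg"/></a></span>',
    'html.parser'
)

# find the identifiable <img> tags, then step back up to the wrapping <a>
print([img.find_parent('a')['href'] for img in soup.find_all('img', class_='thumb')])
# ['(link to post url)']
```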
You can also set a String Match, which is the same panel as you saw with URL Classes. It tests its best guess at the tag's 'string' value, so you can find a tag with 'Original Image' as its text or one whose text starts with 'Posted on: ' via a regex. Have a play with it and you'll figure it out.
### content to fetch
Once you have narrowed down the right nodes you want, you can decide what text to fetch. Given a node of:
```html
<a href="(URL A)" class="thumb_title">Forest Glade</a>
```
Returning the `href` attribute would return the string "(URL A)", returning the string content would give "Forest Glade", and returning the full html would give `#!html <a href="(URL A)" class="thumb_title">Forest Glade</a>`. This last choice is useful in complicated situations where you want a second, separated layer of parsing, which we will get to later.
### string match and conversion
You can set a final String Match to filter the parsed results (e.g. "only allow strings that only contain numbers" or "only allow full URLs as based on (complicated regex)") and String Converter to edit it (e.g. "remove the first three characters of whatever you find" or "decode from base64").
You won't use these much, but they can sometimes get you out of a complicated situation.
### testing
The testing panel on the right is important and worth using. Copy the html from the source you want to parse and then hit the paste buttons to set that as the data to test with.
## json { id="json_formula" }
This takes some JSON and does a similar style of search:
![](images/edit_json_formula_panel.png)
It is a bit simpler than HTML--if the current node is a list (called an 'Array' in JSON), you can fetch every item or the xth item, and if it is a dictionary (called an 'Object' in JSON), you can fetch a particular entry by name. Since you can't jump down several layers with attribute lookups or tag names like with HTML, you have to go down every layer one at a time. In any case, if you have something like this:
[![](images/json_thread_example.png)](images/json_thread_example.png)
!!! note
It is a great idea to check the html or json you are trying to parse with your browser. Some web browsers have excellent developer tools that let you walk through the nodes of the document you are trying to parse in a prettier way than I would ever have time to put together. This image is one of the views Firefox provides if you simply enter a JSON URL.
Searching for "posts"->1st list item->"sub" on this data will give you "Nobody like kino here.".
Searching for "posts"->all list items->"tim" will give you the three SHA256 file hashes (since the third post has no file attached and so no 'tim' entry, the parser skips over it without complaint).
Searching for "posts"->1st list item->"com" will give you the OP's comment, <span class="dealwithit">\~AS RAW UNPARSED HTML\~</span>.
The default is to fetch the final nodes' 'data content', which means coercing simple variables into strings. If the current node is a list or dict, no string is returned.
But if you like, you can return the json beneath the current node (which, like HTML, includes the current node). This again will come in useful later.
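In plain Python, those three searches are equivalent to something like this, using a cut-down stand-in for the thread JSON in the screenshot:

```python
# a cut-down stand-in for the real thread JSON
thread = {
    'posts': [
        {'sub': 'Nobody like kino here.', 'com': '(OP comment html)', 'tim': 'hash_1'},
        {'com': 'second post', 'tim': 'hash_2'},
        {'com': 'third post, no file attached, so no tim entry'},
        {'com': 'fourth post', 'tim': 'hash_3'},
    ]
}

# "posts" -> 1st list item -> "sub"
print(thread['posts'][0]['sub'])  # Nobody like kino here.

# "posts" -> all list items -> "tim"; posts without a 'tim' are skipped quietly
print([post['tim'] for post in thread['posts'] if 'tim' in post])
```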
## compound { id="compound_formula" }
If you want to create a string from multiple parsed strings--for instance by appending the 'tim' and the 'ext' in our json example together--you can use a Compound formula. This fetches multiple lists of strings and tries to place them into a single string using `\1` regex substitution syntax:
![](images/edit_compound_formula_panel.png)
This is a complicated example taken from one of my thread parsers. I have to take a modified version of the original thread URL (the first rule, so `\1`) and then append the filename (`\2`) and its extension (`\3`) on the end to get the final file URL of a post. You can mix in more characters in the substitution phrase, like `\1.jpg` or even have multiple instances (`https://\2.muhsite.com/\2/\1`), if that is appropriate.
This is where the magic happens, sometimes, so keep it in mind if you need to do something cleverer than the data you have seems to provide.
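The substitution itself is nothing fancier than this sketch, with made-up results standing in for the three sub-formulae:

```python
# made-up results from the three sub-formulae
parsed = {
    r'\1': 'https://media.site.net/file_store',  # modified thread URL
    r'\2': '1520203484001',                      # filename
    r'\3': 'jpg',                                # extension
}

phrase = r'\1/\2.\3'
for token, value in parsed.items():
    phrase = phrase.replace(token, value)

print(phrase)  # https://media.site.net/file_store/1520203484001.jpg
```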
## context variable { id="context_variable_formula" }
This is a basic hacky answer to a particular problem. It is a simple key:value dictionary that at the moment only stores one variable, 'url', which contains the original URL used to fetch the data being parsed.
If a different URL Class links to this parser via an API URL, this 'url' variable will always be the API URL (i.e. it literally is the URL used to fetch the data), not any thread/whatever URL the user entered.
![](images/edit_context_variable_formula_panel.png)
Hit the 'edit example parsing context' to change the URL used for testing.
I have used this several times to stitch together file URLs when I am pulling data from APIs, like in the compound formula example above. In this case, the starting URL is `https://a.4cdn.org/tg/thread/57806016.json`, from which I extract the board name, "tg", using the string converter, and then add in 4chan's CDN domain to make the appropriate base file URL (`https://i.4cdn.org/tg/`) for the given thread. I only have to jump through this hoop in 4chan's case because they explicitly store file URLs by board name. 8chan, on the other hand, has a static `https://media.8ch.net/file_store/` for all files, so it is a little easier (I think I just do a single 'prepend' string transformation somewhere).
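As a concrete sketch of that 4chan case, the string conversion on the 'url' context variable amounts to a regex substitution:

```python
import re

context_url = 'https://a.4cdn.org/tg/thread/57806016.json'

# pull the board name out of the API URL...
board = re.sub(r'^https://a\.4cdn\.org/(\w+)/thread/\d+\.json$', r'\1', context_url)

# ...and stitch it into 4chan's CDN domain to get the base file URL
print(f'https://i.4cdn.org/{board}/')  # https://i.4cdn.org/tg/
```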
If you want to make some parsers, you will have to get familiar with how different sites store and present their data!

View File

@ -0,0 +1,45 @@
# api example
Some sites offer API calls for their pages. Depending on complexity and quality of content, using these APIs may or may not be a good idea. Artstation has a good one--let's first review our URL Classes:
![](images/downloader_api_example_url_class_1.png) ![](images/downloader_api_example_url_class_2.png)
We convert the original Post URL, [https://www.artstation.com/artwork/mQLe1](https://www.artstation.com/artwork/mQLe1) to [https://www.artstation.com/projects/mQLe1.json](https://www.artstation.com/projects/mQLe1.json). Note that Artstation Post URLs can produce multiple files, and that the API url should not be associated with those final files.
So, when the client encounters an 'artstation file page' URL, it will generate the equivalent 'artstation file page json api' URL and use that for downloading and parsing. If you would like to review your API links, check out _network->downloader definitions->manage url class links->api links_. Using Example URLs, it will figure out which URL Classes link to others and ensure you are mapping parsers only to the final link in the chain--there should be several already in there by default.
Now let's look at the JSON. Loading clean JSON in a browser should present you with a nicer view:
![](images/downloader_api_example_json.png)
I have highlighted the data we want, which is:
* The File URLs.
* Creator, title, medium, and unnamespaced tags.
* Source time.
JSON is a dream to parse, and I will assume you are comfortable with Content Parsers from the previous examples, so I'll simply paste the different formulae one after another:
![](images/downloader_api_example_file_urls.png)
Each image is stored under a separate numbered 'assets' list item. This one has just two, but some Artstation pages have dozens of images. The only unusual part here is I also put a String Match of `^(?!.*assets\/covers).*$`, which filters out 'cover' images (such as on [here](https://www.artstation.com/projects/3KyXA.json)), which make for nice portfolio thumbs on the site but are not interesting to us.
![](images/downloader_api_example_creator.png)
This fetches the 'creator' tag. Artstation's API is great because it includes profile data in content requests. There's the creator's presentation name, username, profile link, avatar URLs, all that inside a regular request about this particular work. When that information is missing (like in yiff.party), it may make the API useless to you.
![](images/downloader_api_example_title.png)
![](images/downloader_api_example_medium.png)
![](images/downloader_api_example_unnamespaced.png)
These are all simple. You can take or leave the title and medium tags--some people like them, some don't. This example has no unnamespaced tags, but [this one](https://www.artstation.com/projects/XRm50.json) does. Creator-entered tags are sometimes not worth parsing (on tumblr, for instance, you often get run-on tags like #imbored #whatisevengoingon that are irrelevant to the work), but Artstation users are all professionals trying to get their work noticed, so the tags are usually pretty good.
![](images/downloader_api_example_source_time.png)
This again uses python's datetime to decode the date, which Artstation presents with millisecond accuracy, ha ha. I use a `(.+:..)\..*->\1` regex (i.e. "get everything before the period") to strip off the timezone and milliseconds and then decode as normal.
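That regex-then-decode step, in plain Python--the example string is invented, but in Artstation's ISO-with-milliseconds style:

```python
import re
from datetime import datetime, timezone

raw = '2018-03-04T22:44:44.123-06:00'  # invented, but in Artstation's style

# "get everything before the period" to drop milliseconds and timezone...
trimmed = re.sub(r'(.+:..)\..*', r'\1', raw)  # -> 2018-03-04T22:44:44

# ...then decode as normal; we don't care about being 12 hours off
dt = datetime.strptime(trimmed, '%Y-%m-%dT%H:%M:%S')
print(int(dt.replace(tzinfo=timezone.utc).timestamp()))
```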
## summary { id="summary" }
APIs that are stable and free to access (e.g. do not require OAuth or other complicated login headers) can make parsing fantastic. They save bandwidth and CPU time, and they are typically easier to work with than HTML. Unfortunately, the boorus that do provide APIs often list their tags without namespace information, so I recommend you double-check you can get what you want before you get too deep into it. Some APIs also offer incomplete data, such as relative URLs (relative to the original URL!), which can be a pain to figure out in our system.

View File

@ -0,0 +1,99 @@
# file page example
Let's look at this page: [https://gelbooru.com/index.php?page=post&s=view&id=3837615](https://gelbooru.com/index.php?page=post&s=view&id=3837615).
What sorts of data are we interested in here?
* The image URL.
* The different tags and their namespaces.
* The secret md5 hash buried in the HTML.
* The post time.
* The Deviant Art source URL.
## the file url { id="the_file_url" }
A tempting strategy for pulling the file URL is to just fetch the src of the embedded `#!html <img>` tag, but:
* If the booru also supports videos or flash, you'll have to write separate and likely more complicated rules for `#!html <video>` and `#!html <embed>` tags.
* If the booru shows 'sample' sizes for large images--as this one does!--pulling the src of the image you see won't get the full-size original for large images.
If you have an account with the site you are parsing and have clicked the appropriate 'Always view original' setting, you may not see these sorts of sample-size banners! I recommend you log out of/go incognito for sites you are inspecting for hydrus parsing (unless a log-in is required to see content, in which case the hydrus user will have to set up hydrus-side login anyway to actually use the parser), or you can easily miss NSFW gates and other logged-out hurdles.
When trying to pin down the right link, if there are no good alternatives, you often have to write several File URL rules with different precedence, saying 'get the "Click Here to See Full Size" link at 75' and 'get the embed's "src" at 25' and so on to make sure you cover different situations, but as it happens Gelbooru always posts the actual File URL at:
* `#!html <meta property="og:image" content="https://gelbooru.com/images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg" />` under the `#!html <head>`
* `#!html <a href="https://simg3.gelbooru.com//images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg" target="_blank" style="font-weight: bold;">Original image</a>` which can be found by putting a String Match in the html formula.
`#!html <meta>` with `property="og:image"` is easy to search for (and they use the same tag for video links as well!). For the Original Image, you can use a String Match like so:
[![](images/downloader_post_example_clean.png)](images/downloader_post_example_clean.png)
Gelbooru uses "Original Image" even when they link to webm, which is helpful, but like "og:image", it could be changed to 'video' in future.
I think I wrote my gelbooru parser before I added String Matches to individual HTML formulae tag rules, so I went with this, which is a bit more cheeky:
![](images/downloader_post_example_cheeky.png)
But it works. Sometimes, just regexing for links that fit the site's CDN is a good bet for finding difficult stuff.
## tags { id="tags" }
Most boorus have a taglist on the left that has a nice id or class you can pull, and then each namespace gets its own class for CSS-colouring:
![](images/downloader_post_example_meta_tag.png)
Make sure you browse around the booru for a bit, so you can find all the different classes they use. character/artist/copyright are common, but some sneak in the odd meta/species/rating.
Skipping ?/-/+ characters can be a pain if you are lacking a nice tag-text class, in which case you can add a regex String Match to the HTML formula (as I do here, since Gelb offers '?' links for tag definitions) like `[^\?\-+\s]`, which means "the text includes something other than just '?' or '-' or '+' or whitespace".
## md5 hash { id="md5_hash" }
If you look at the Gelbooru File URL, [**https://gelbooru.com/images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg**](https://gelbooru.com/images/38/6e/386e12e33726425dbd637e134c4c09b5.jpeg), you may notice the filename is all hexadecimal. It looks like they store their files under a two-deep folder structure, using the first four characters--386e here--as the key. It sure looks like '386e12e33726425dbd637e134c4c09b5' is not random ephemeral garbage!
In fact, Gelbooru use the MD5 of the file as the filename. Many storage systems do something like this (hydrus uses SHA256!), so if they don't offer a `#!html <meta>` tag that explicitly states the md5 or sha1 or whatever, you can sometimes infer it from one of the file links. This screenshot is from the more recent version of hydrus, which has the more powerful 'string processing' system for string transformations. It has an intimidating number of nested dialogs, but we can stay simple for now, with only the one regex substitution step inside a string 'converter':
![](images/downloader_post_example_md5.png)
Here we are using the same property="og:image" rule to fetch the File URL, and then we are regexing the hex hash with `.*([0-9a-f]{32}).*` (MD5s are 32 hex characters). We select 'hex' as the encoding type. Hashes require a tiny bit more data handling behind the scenes, but in the Content Parser test page it presents the hash again neatly in English--"md5 hash: 386e12e33726425dbd637e134c4c09b5"--meaning everything parsed correctly. It presents the hash in hex even if you select the encoding type as base64.
If you think you have found a hash string, you should obviously test your theory! The site might not be using the actual MD5 of file bytes, as hydrus does, but instead some proprietary scheme. Download the file and run it through a program like HxD (or hydrus!) to figure out its hashes, and then search the View Source for those hex strings--you might be surprised!
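Testing the theory only takes a couple of lines--hash the downloaded file yourself and compare with what you parsed:

```python
import hashlib

# the file downloaded from the example File URL above
with open('386e12e33726425dbd637e134c4c09b5.jpeg', 'rb') as f:
    file_bytes = f.read()

# if this matches the hex string buried in the page source, the site really
# is naming files by the MD5 of the raw file bytes
print(hashlib.md5(file_bytes).hexdigest())
```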
Finding the hash is hugely beneficial for a parser--it lets hydrus skip downloading files without ever having seen them before!
## source time { id="source_time" }
Post/source time lets subscriptions and watchers make more accurate guesses at current file velocity. It is neat to have if you can find it, but:
<b class="dealwithit">FUCK ALL TIMEZONES FOREVER</b>
Gelbooru offers--
```html
<li>Posted: 2017-08-18 19:59:44<br /> by <a href="index.php?page=account&s=profile&uname=jayage5ds">jayage5ds</a></li>
```
--so let's see how we can turn that into a Unix timestamp:
![](images/downloader_post_example_source_time.png)
I find the `#!html <li>` that starts "Posted: " and then decode the date according to the hackery-dackery-doo format from [here](https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior). `%c` and `%z` are unreliable, and attempting timezone adjustments is overall a supervoid that will kill your time for no real benefit--subs and watchers work fine with 12-hour imprecision, so if you have a +0300 or EST in your string, just cut those characters off with another String Transformation. As long as you are getting about the right day, you are fine.
## source url { id="source_url" }
Source URLs are nice to have if they are high quality. Some boorus only ever offer artist profiles, like `https://twitter.com/artistname`, whereas we want singular Post URLs that point to other places that host this work. For Gelbooru, you could fetch the Source URL as we did source time, searching for "Source: ", but they also offer it more easily in an edit form:
```html
<input type="text" name="source" size="40" id="source" value="https://www.deviantart.com/art/Lara-Croft-Artifact-Dive-699335378" />
```
This is a bit of a fragile location to parse from--Gelb could change or remove this form at any time, whereas the "Posted: " `#!html <li>` is probably firmer, but I expect I wrote it before I had String Matches in. It works for now, which in this game is often Good Enough™.
Also--be careful pulling from text or tooltips rather than an href-like attribute, as whatever is presented to the user may be clipped for longer URLs. Make sure you try your rules on a couple of different pages to make sure you aren't pulling "https://www.deviantart.com/art/Lara..." by accident anywhere!
## summary { id="summary" }
Phew--all that for a bit of Lara Croft! Thankfully, most sites use similar schemes. Once you are familiar with the basic idea, the only real work is to duplicate an existing parser and edit for differences. Our final parser looks like this:
![](images/downloader_post_example_final.png)
This is overall a decent parser. Some parts of it may fail when Gelbooru update to their next version, but that can be true of even very good parsers with multiple redundancy. For now, hydrus can use this to quickly and efficiently pull content from anything running Gelbooru 0.2.5., and the effort spent now can save millions of combined _right-click->save as_ and manual tag copies in future. If you make something like this and share it about, you'll be doing a good service for those who could never figure it out.

View File

@ -0,0 +1,42 @@
# gallery page example
!!! caution
These guides should _roughly_ follow what comes with the client by default! You might like to have the actual UI open in front of you so you can play around with the rules and try different test parses yourself.
Let's look at this page: [https://e621.net/post/index/1/rating:safe pokemon](https://e621.net/post/index/1/rating:safe pokemon)
We've got 75 thumbnails and a bunch of page URLs at the bottom.
## first, the main page { id="main_page" }
This is easy. It gets a good name and some example URLs. e621 has some different ways of writing out their queries (and as they use some tags with '/', like 'male/female', this can cause character encoding issues depending on whether the tag is in the path or query!), but we'll put that off for now--we just want to parse some stuff.
![](images/downloader_gallery_example_main.png)
## thumbnail links { id="thumbnail_urls" }
Most browsers have some good developer tools to let you Inspect Element and get a better view of the HTML DOM. Be warned that this information isn't always the same as View Source (which is what hydrus will get when it downloads the initial HTML document), as some sites load results dynamically with javascript and maybe an internal JSON API call. When sites move to systems that load more thumbs as you scroll down, it makes our job more difficult--in those cases, you'll need to chase down the embedded JSON or figure out what API calls their JS is making, and the browser's developer tools can help you here again. Thankfully, e621 is (and most boorus are) fairly static and simple:
![](images/downloader_gallery_example_thumb_html.png)
Every thumb on e621 is a `#!html <span>` with class="thumb" wrapping an `#!html <a>` and an `#!html <img>`. This is a common pattern, and easy to parse:
![](images/downloader_gallery_example_thumb_parsing.png)
There are no tricky String Matches or String Converters needed--we are just fetching hrefs. Note that the links get relative-matched to example.com for now--I'll probably fix this to apply to one of the example URLs, but rest assured that IRL the parser will 'join' its url up with the appropriate Gallery URL used to fetch the data. Sometimes, you might want to add a rule like `search descendants for the first <div> tag with id=content` to make sure you are only grabbing thumbs from the main box, whether that is a `#!html <div>` or a `#!html <span>`, and whether it has `id="content"` or `class="mainBox"`, but unless you know that booru likes to embed "popular" or "favourite" 'thumbs' up top that will be accidentally caught by `#!html <span>`s with `class="thumb"`, I recommend you not make your rules overly specific--all it takes is for their dev to change the name of their content box, and your whole parser breaks. I've ditched the `#!html <span>` requirement in the rule here for exactly that reason--`class="thumb"` is necessary and sufficient.
Remember that the parsing system allows you to go up ancestors as well as down descendants. If your thumb-box has multiple links--like to see the artist's profile or 'set as favourite'--you can try searching for the `#!html <span>`s, then down to the `#!html <img>`, and then _up_ to the nearest `#!html <a>`. In English, this is saying, "Find me all the image link URLs in the thumb boxes."
## next gallery page link { id="next_gallery_url" }
Most boorus have 'next' or '>>' at the bottom, which can be simple enough, but many have a neat `#!html <link href="/post/index/2/rating:safe%20pokemon" rel="next" />` in the `#!html <head>`. The `#!html <head>` solution is easier, if available, but my default e621 parser happens to pursue the 'paginator':
![](images/downloader_gallery_example_paginator_parsing.png)
As it happens, e621 also apply the `rel="next"` attribute to their "Next >>" links, which makes it all that easier for us to find. Sometimes there is no "next" id or class, and you'll want to add a String Match to your html formula to test for a string value of '>>' or whatever it is. A good trick is to View Source and then search for the critical `/post/index/2/` phrase you are looking for--you might find what you want in a `#!html <link>` tag you didn't expect or even buried in a hidden 'share to tumblr' button. `#!html <form>`s for reporting or commenting on content are another good place to find content ids.
Note that this finds two URLs. e621 apply the `rel="next"` to both the "2" link and the "Next >>" one. The download engine merges the parser's dupes, so don't worry if you end up parsing both the 'top' and 'bottom' next page links, or if you use multiple rules to parse the same data in different ways.
## summary { id="summary" }
With those two rules, we are done. Gallery parsers are nice and simple.

View File

@ -0,0 +1,59 @@
# Page Parsers
We can now produce individual rows of rich metadata. To arrange them all into a useful structure, we will use Page Parsers.
The Page Parser is the top level parsing object. It takes a single document and produces a list--or a list of lists--of metadata. Here's the main UI:
![](images/edit_page_parser_panel_e621_main.png)
Notice that the edit panel has three sub-pages.
## main { id="main" }
* **Name**: Like for content parsers, I recommend you add good names for your parsers.
* **Pre-parsing conversion**: If your API source encodes or wraps the data you want to parse, you can do some string transformations here. You won't need to use this very often, but if your source gives the JSON wrapped in javascript (like the old tumblr API), it can be invaluable.
* **Example URLs**: Here you should add a list of example URLs the parser works for. This lets the client automatically link this parser up with URL classes for you and any users you share the parser with.
## content parsers { id="content_parsers" }
This page is just a simple list:
![](images/edit_page_parser_panel_e621_content_parsers.png)
Each content parser here will be applied to the document and returned in this page parser's results list. Like most boorus, e621's File Pages only ever present one file, and they have simple markup, so the solution here was simple. The full contents of that test window are:
```
*** 1 RESULTS BEGIN ***
tag: character:krystal
tag: creator:s mino930
file url: https://static1.e621.net/data/fc/b6/fcb673ed89241a7b8d87a5dcb3a08af7.jpg
tag: anthro
tag: black nose
tag: blue fur
tag: blue hair
tag: clothing
tag: female
tag: fur
tag: green eyes
tag: hair
tag: hair ornament
tag: jewelry
tag: short hair
tag: solo
tag: video games
tag: white fur
tag: series:nintendo
tag: series:star fox
tag: species:canine
tag: species:fox
tag: species:mammal
*** RESULTS END ***
```
When the client sees this in a downloader context, it will know where to download the file and which tags to associate with it based on what the user has chosen in their 'tag import options'.
## subsidiary page parsers { id="subsidiary_page_parsers" }
Here be dragons. This was an attempt to make parsing more helpful in certain API situations, but it ended up ugly. I do not recommend you use it, as I will likely scratch the whole thing and replace it with something better one day. It basically splits the page up into pieces that can then be parsed by nested page parsers as separate objects, but the UI and workflow is hell. Afaik, the imageboard API parsers use it, but little/nothing else. If you are really interested, check out how those work and maybe duplicate to figure out your own imageboard parser and/or send me your thoughts on how to separate File URL/timestamp combos better.

View File

@ -0,0 +1,19 @@
---
title: Sharing Downloaders
---
# Sharing Downloaders
If you are working with users who also understand the downloader system, you can swap your GUGs, URL Classes, and Parsers separately using the import/export buttons on the relevant dialogs, which work in pngs and clipboard text.
But if you want to share conveniently, and with users who are not familiar with the different downloader objects, you can package everything into a single easy-import png as per [here](adding_new_downloaders.md).
The dialog to use is _network->downloader definitions->export downloaders_:
![](images/downloader_export_panel.png)
It isn't difficult. Essentially, you want to bundle enough objects to make one or more 'working' GUGs at the end. I recommend you start by just hitting 'add gug', which--using Example URLs--will attempt to figure out everything you need by itself.
This all works on Example URLs and some domain guesswork, so make sure your url classes are good and the parsers have correct Example URLs as well. If they don't, they won't all link up neatly for the end user. If part of your downloader is on a different domain to the GUGs and Gallery URLs, then you'll have to add them manually. Just start with 'add gug' and see if it looks like enough.
Once you have the necessary and sufficient objects added, you can export to png. You'll get a similar 'does this look right?' summary as what the end-user will see, just to check you have everything in order and the domains all correct. If that is good, then make sure to give the png a sensible filename and embellish the title and description if you need to. You can then send/post that png wherever, and any regular user will be able to use your work.

View File

@ -0,0 +1,203 @@
# URL Classes
The fundamental connective tissue of the downloader system is the 'URL Class'. This object identifies and normalises URLs and links them to other components. Whenever the client handles a URL, it tries to match it to a URL Class to figure out what to do.
## the types of url { id="url_types" }
For hydrus, an URL is useful if it is one of:
File URL
:
This returns the full, raw media file with no HTML wrapper. They typically end in a filename like [http://safebooru.org//images/2333/cab1516a7eecf13c462615120ecf781116265f17.jpg](http://safebooru.org//images/2333/cab1516a7eecf13c462615120ecf781116265f17.jpg), but sometimes they have a more complicated fetch command ending like 'file.php?id=123456' or '/post/content/123456'.
These URLs are remembered for the file in the 'known urls' list, so if the client happens to encounter the same URL in future, it can determine whether it can skip the download because the file is already in the database or has previously been deleted.
It is not important that File URLs be matched by a URL Class. File URL is considered the 'default', so if the client finds no match, it will assume the URL is a file and try to download and import the result. You might want to specify them particularly if you want to present them in the media viewer or if you discover File URLs are being confused for Post URLs or something.
Post URL
:
These typically return some HTML that contains a File URL and metadata such as tags and post time. They sometimes present multiple sizes (like 'sample' vs 'full size') of the file or even different formats (like 'ugoira' vs 'webm'). The Post URL for the file above, [http://safebooru.org/index.php?page=post&s=view&id=2429668](http://safebooru.org/index.php?page=post&s=view&id=2429668), has this 'sample' presentation. Finding the best File URL in these cases can be tricky!
This URL is also saved to 'known urls' and will usually be similarly skipped if it has previously been downloaded. It will also appear in the media viewer as a clickable link.
Gallery URL
:
This presents a list of Post URLs or File URLs. They often also present a 'next page' URL. It could be a page like [http://safebooru.org/index.php?page=post&s=list&tags=yorha\_no.\_2\_type\_b&pid=0](http://safebooru.org/index.php?page=post&s=list&tags=yorha_no._2_type_b&pid=0) or an API URL like [http://safebooru.org/index.php?page=dapi&s=post&tags=yorha\_no.\_2\_type\_b&q=index&pid=0](http://safebooru.org/index.php?page=dapi&s=post&tags=yorha_no._2_type_b&q=index&pid=0).
Watchable URL
:
This is the same as a Gallery URL but represents an ephemeral page that receives new files much faster than a gallery but will soon 'die' and be deleted. For our purposes, this typically means imageboard threads.
## the components of a url { id="url_components" }
As far as we are concerned, a URL string has four parts:
* **Scheme:** `http` or `https`
* **Location/Domain:** `safebooru.org` or `i.4cdn.org` or `cdn002.somebooru.net`
* **Path Components:** `index.php` or `tesla/res/7518.json` or `pictures/user/daruak/page/2` or `art/Commission-animation-Elsa-and-Anna-541820782`
* **Query Parameters:** `page=post&s=list&tags=yorha_no._2_type_b&pid=40` or `page=post&s=view&id=2429668`
So, let's look at the 'edit url class' panel, which is found under _network->manage url classes_:
![](images/downloader_edit_url_class_panel.png)
A TBIB File Page like [https://tbib.org/index.php?page=post&s=view&id=6391256](https://tbib.org/index.php?page=post&s=view&id=6391256) is a Post URL. Let's look at the metadata first:
Name and type
:
Like with GUGs, we should set a good unambiguous name so the client can clearly summarise this url to the user. 'tbib file page' is good.
This is a Post URL, so we set the 'post url' type.
Association logic
:
All boorus and most sites only present one file per page, but some sites present multiple files on one page, usually several pages in a series/comic, as with pixiv. Danbooru-style thumbnail links to 'this file has a post parent' do not count here--I mean that a single URL embeds multiple full-size images, either with shared or separate tags. It is **very important** to the hydrus client's downloader logic (making decisions about whether it has previously visited a URL, so whether to skip checking it again) that 'can produce multiple files' is checked if a site can present multiple files on a single page.
Related is the idea of whether a 'known url' should be associated. Typically, this should be checked for Post and File URLs, which are fixed, and unchecked for Gallery and Watchable URLs, which are ephemeral and give different results from day to day. There are some unusual exceptions, so give it a brief thought--but if you have no special reason, leave this as the default for the url type.
And now, for matching the string itself, let's revisit our four components:
Scheme
:
TBIB supports http and https, so I have set the 'preferred' scheme to https. Any 'http' TBIB URL a user inputs will be automatically converted to https.
Location/Domain
:
For Post URLs, the domain is always "tbib.org".
The 'allow' and 'keep' subdomains checkboxes let you determine if a URL with "artistname.artsite.com" will match a URL Class with "artsite.com" domain and if that subdomain should be remembered going forward. Most sites do not host content on subdomains, so you can usually leave 'allow' unchecked. The 'keep' option (which is only available if 'allow' is checked) is more subtle, only useful for rare cases, and unless you have a special reason, you should leave it checked. (In cases where a site farms out File URLs to CDN servers on subdomains--like randomly serving a mirror of "https://muhbooru.org/file/123456" on "https://srv2.muhbooru.org/file/123456"--and removing the subdomain still gives a valid URL, you may not wish to keep the subdomain.) Since TBIB does not use subdomains, these options do not matter--we can leave both unchecked.
'www' and 'www2' and similar subdomains are automatically matched. Don't worry about them.
Path Components
:
TBIB just uses a single "index.php" on the root directory, so the path is not complicated. Were it longer (like "gallery/cgi/index.php"), we would add more components ("gallery" and "cgi"), and since the path of a URL has a strict order, we would need to arrange the items in the listbox there so they were sorted correctly.
Query Parameters
:
TBIB's index.php takes many query parameters to render different page types. Note that the Post URL uses "s=view", while TBIB Gallery URLs use "s=list". In any case, for a Post URL, "id", "page", and "s" are necessary and sufficient.
## string matches { id="string_matches" }
As you edit these components, you will be presented with the Edit String Match Panel:
![](images/edit_string_match_panel.png)
This lets you set the type of string that will be valid for that component. If a given path or query component does not match the rules given here, the URL will not match the URL Class. Most of the time you will probably want to set 'fixed characters' of something like "post" or "index.php", but if the component you are editing is more complicated and could have a range of different valid values, you can specify just numbers or letters or even a regex pattern. If you try to do something complicated, experiment with the 'example string' entry to make sure you have it set how you think.
Don't go overboard with this stuff, though--most sites do not have super-fine distinctions between their different URL types, and hydrus users will not be dropping user account or logout pages or whatever on the client, so you can be fairly liberal with the rules.
## how do they match, exactly? { id="match_details" }
This URL Class will be assigned to any URL that matches the location, path, and query. Missing path components or query parameters in the URL will invalidate the match, but additional ones will not!
For instance, given:
* URL A: https://8ch.net/tv/res/1002432.html
* URL B: https://8ch.net/tv/res
* URL C: https://8ch.net/tv/res/1002432
* URL D: https://8ch.net/tv/res/1002432.json
* URL Class that looks for "(characters)/res/(numbers).html" for the path
Only URL A will match
And:
* URL A: https://boards.4chan.org/m/thread/16086187
* URL B: https://boards.4chan.org/m/thread/16086187/ssg-super-sentai-general-651
* URL Class that looks for "(characters)/thread/(numbers)" for the path
Both URL A and B will match
And:
* URL A: https://www.pixiv.net/member\_illust.php?mode=medium&illust\_id=66476204
* URL B: https://www.pixiv.net/member\_illust.php?mode=medium&illust\_id=66476204&lang=jp
* URL C: https://www.pixiv.net/member_illust.php?mode=medium
* URL Class that looks for "illust_id=(numbers)" in the query
Both URL A and B will match, URL C will not
If multiple URL Classes match a URL, the client will try to assign the most 'complicated' one, with the most path components and then query parameters.
Given two example URLs and URL Classes:
* URL A: https://somebooru.com/post/123456
* URL B: https://somebooru.com/post/123456/manga_subpage/2
* URL Class A that looks for "post/(number)" for the path
* URL Class B that looks for "post/(number)/manga_subpage/(number)" for the path
URL A will match URL Class A but not URL Class B and so will receive A.
URL B will match both and receive URL Class B as it is more complicated.
This situation is not common, but when it does pop up, it can be a pain. It is usually a good idea to match exactly what you need--no more, no less.
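If you want to see the 'most complicated wins' rule in miniature, here is a toy sketch--the regex patterns stand in for the two URL Classes' path rules:

```python
import re

# toy stand-ins for the two URL Classes' path rules
url_classes = {
    'URL Class A': r'^post/\d+$',
    'URL Class B': r'^post/\d+/manga_subpage/\d+$',
}

def best_match(path):
    matches = [name for name, pattern in url_classes.items() if re.match(pattern, path)]
    # the pattern with the most path components is the most 'complicated'
    return max(matches, key=lambda name: url_classes[name].count('/'), default=None)

print(best_match('post/123456'))                  # URL Class A
print(best_match('post/123456/manga_subpage/2'))  # URL Class B
```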
## normalising urls { id="url_normalisation" }
Different URLs can give the same content. The http and https versions of a URL are typically the same, and:
* [https://gelbooru.com/index.php?page=post&s=view&id=3767497](https://gelbooru.com/index.php?page=post&s=view&id=3767497)
* gives the same as:
* [https://gelbooru.com/index.php?id=3767497&page=post&s=view](https://gelbooru.com/index.php?id=3767497&page=post&s=view)
And:
* [https://e621.net/post/show/1421754/abstract\_background-animal\_humanoid-blush-brown_ey](https://e621.net/post/show/1421754/abstract_background-animal_humanoid-blush-brown_ey)
* is the same as:
* [https://e621.net/post/show/1421754](https://e621.net/post/show/1421754)
* _is the same as:_
* [https://e621.net/post/show/1421754/help\_computer-made\_up_tags-REEEEEEEE](https://e621.net/post/show/1421754/help_computer-made_up_tags-REEEEEEEE)
Since we are in the business of storing and comparing URLs, we want to 'normalise' them to a single comparable beautiful value. You see a preview of this normalisation on the edit panel. Normalisation happens to all URLs that enter the program.
Note that in e621's case (and for many other sites!), that text after the id is purely decoration. It can change when the file's tags change, so if we want to compare today's URLs with those we saw a month ago, we'd rather just be without it.
On normalisation, all URLs will get the preferred http/https switch, and their query parameters will be alphabetised. File and Post URLs will also cull out any surplus path or query components. This wouldn't affect our TBIB example above, but it will clip the e621 example down to that 'bare' id URL, and it will take any surplus 'lang=en' or 'browser=netscape_24.11' garbage off the query text as well. URLs that are not associated and saved and compared (i.e. normal Gallery and Watchable URLs) are not culled of unmatched path components or query parameters, which can sometimes be useful if you want to match (and keep intact) gallery URLs that might or might not include an important 'sort=desc' type of parameter.
Since File and Post URLs will do this culling, be careful that you not leave out anything important in your rules. Make sure what you have is both necessary (nothing can be removed and still keep it valid) and sufficient (no more needs to be added to make it valid). It is a good idea to try pasting the 'normalised' version of the example URL into your browser, just to check it still works.
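A toy sketch of the culling and alphabetising (hydrus's real normalisation is driven by the URL Class's own component rules; this just shows the shape of it):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalise(url, keep_params):
    scheme, netloc, path, query, _ = urlsplit(url)
    # keep only the necessary-and-sufficient parameters, alphabetised
    params = sorted((k, v) for k, v in parse_qsl(query) if k in keep_params)
    # apply the preferred scheme
    return urlunsplit(('https', netloc, path, urlencode(params), ''))

url = 'http://gelbooru.com/index.php?s=view&id=3767497&page=post&lang=en'
print(normalise(url, {'id', 'page', 's'}))
# https://gelbooru.com/index.php?id=3767497&page=post&s=view
```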
## 'default' values { id="default_values" }
Some sites present the first page of a search like this:
[https://danbooru.donmai.us/posts?tags=skirt](https://danbooru.donmai.us/posts?tags=skirt)
But the second page is:
[https://danbooru.donmai.us/posts?tags=skirt&page=2](https://danbooru.donmai.us/posts?tags=skirt&page=2)
Another example is:
[https://www.hentai-foundry.com/pictures/user/Mister69M](https://www.hentai-foundry.com/pictures/user/Mister69M)
[https://www.hentai-foundry.com/pictures/user/Mister69M/page/2](https://www.hentai-foundry.com/pictures/user/Mister69M/page/2)
What happened to 'page=1' and '/page/1'? Adding those '1' values in works fine! Many sites, when an index is absent, will secretly imply an appropriate 0 or 1. This looks pretty to users looking at a browser address bar, but it can be a pain for us, who want to match both styles to one URL Class. It would be nice if we could recognise the 'bare' initial URL and fill in the '1' values to coerce it to the explicit, automation-friendly format. Defaults to the rescue:
![](images/downloader_edit_url_class_panel_default.png)
After you set a path component or query parameter String Match, you will be asked for an optional 'default' value. You won't want to set one most of the time, but for Gallery URLs, it can be hugely useful--see how the normalisation process automatically fills in the missing path component with the default! There are plenty of examples in the default Gallery URLs of this, so check them out. Most sites use page indices starting at '1', but Gelbooru-style boorus use a 'pid=0' file index (and often move forward 42, so the next pages will be 'pid=42', 'pid=84', and so on, although others use deltas of 20 or 40).
## can we predict the next gallery page? { id="next_gallery_page_prediction" }
Now that we can harmonise gallery urls to a single format, we can predict the next gallery page! If, say, the third path component or 'page' query parameter is always a number referring to page, you can select this under the 'next gallery page' section and set the delta to change it by. The 'next gallery page url' section will be automatically filled in. This value will be consulted if the parser cannot find a 'next gallery page url' from the page content.
It is neat to set this up, but I only recommend it if you actually cannot reliably parse a next gallery page url from the HTML later in the process. It is neater to have searches stop naturally because the parser said 'no more gallery pages' than to have hydrus always one page beyond and end every single search on an uglier 'No results found' or 404 result.
Unfortunately, some sites will either not produce an easily parsable next page link or randomly just not include it due to some issue on their end (Gelbooru is a funny example of this). Also, APIs will often have a kind of 'start=200&num=50', 'start=250&num=50' progression but not include that state in the XML or JSON they return. These cases require the automatic next gallery page rules (check out Artstation and tumblr api gallery page URL Classes in the defaults for examples of this).
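When the rule does fire, the prediction is just simple arithmetic on the page number. A sketch, assuming the index lives in a 'page' query parameter:

```python
import re

def next_gallery_url(url, delta=1):
    # bump the page number by the configured delta; this is only consulted if
    # the parser cannot find a next-page link in the page content
    return re.sub(r'page=(\d+)', lambda m: 'page=%d' % (int(m.group(1)) + delta), url)

next_gallery_url('https://danbooru.donmai.us/posts?tags=skirt&page=2')
# 'https://danbooru.donmai.us/posts?tags=skirt&page=3'
```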
## how do we link to APIs? { id="api_links" }
If you know that a URL has an API backend, you can tell the client to use that API URL when it fetches data. The API URL needs its own URL Class.
To define the relationship, click the "String Converter" button, which gives you this:
![](images/edit_string_converter_panel.png)
You may have seen this panel elsewhere. It lets you convert a string to another over a number of transformation steps. The steps can be as simple as adding or removing some characters or as powerful as a full regex substitution. For API URLs, you are mostly looking to isolate some unique identifying data ("m/thread/16086187" in this case) and then substitute that into the new API path. It is worth testing this with several different examples!
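For illustration, the conversion usually boils down to a regex substitution like this--the domain and API path here are entirely made up:

```python
import re

thread_url = 'https://site.example/m/thread/16086187'

# isolate the identifying data ('m/thread/16086187') and substitute it into
# the (hypothetical) API path
api_url = re.sub(r'^https://site\.example/(\w+)/thread/(\d+)$',
                 r'https://site.example/api/\1/thread/\2.json', thread_url)
# api_url == 'https://site.example/api/m/thread/16086187.json'
```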
When the client links regular URLs to API URLs like this, it will still associate the human-pretty regular URL when it needs to display to the user and record 'known urls' and so on. The API is just a quick lookup when it actually fetches and parses the respective data.

230
docs/duplicates.md Normal file

@ -0,0 +1,230 @@
---
title: duplicates
---
# duplicates { id="intro" }
As files are shared on the internet, they are often resized, cropped, converted to a different format, altered by the original or a new artist, or turned into a template and reinterpreted over and over and over. Even if you have a very restrictive importing workflow, your client is almost certainly going to get some **duplicates**. Some will be interesting alternate versions that you want to keep, and others will be thumbnails and other low-quality garbage you accidentally imported and would rather delete. Along the way, it would be nice to merge your ratings and tags to the better files so you don't lose any work.
Finding and processing duplicates within a large collection is impossible to do by hand, so I have written a system to do the heavy lifting for you. It currently works on still images, but an extension for gifs and video is planned.
Hydrus finds _potential_ duplicates using a search algorithm that compares images by their shape. Once these pairs of potentials are found, they are presented to you through a filter like the archive/delete filter to determine their exact relationship and if you want to make a further action, such as deleting the 'worse' file of a pair. All of your decisions build up in the database to form logically consistent groups of duplicates and 'alternate' relationships that can be used to infer future information. For instance, if you say that file A is a duplicate of B and B is a duplicate of C, A and C are automatically recognised as duplicates as well.
This all starts on--
## the duplicates processing page { id="duplicates_page" }
On the normal 'new page' selection window, hit _special->duplicates processing_. This will open this page:
![](images/dupe_management.png)
Let's go to the preparation page first:
![](images/dupe_preparation.png)
The 'similar shape' algorithm works on _distance_. Two files with 0 distance are likely exact matches, such as resizes of the same file or lower/higher quality jpegs, whereas those with distance 4 tend to be hairstyle or costume changes. You will start on distance 0 and should not expect to ever go above 4 or 8 or so. Going too high increases the danger of being overwhelmed by false positives.
If you are interested, the current version of this system uses a 64-bit [phash](https://jenssegers.com/perceptual-image-hashes) to represent the image shape and a [VPTree](https://en.wikipedia.org/wiki/VP-tree) to search different files' phashes' relative [hamming distance](https://en.wikipedia.org/wiki/Hamming_distance). I expect to extend it in future with multiple phash generation (flips, rotations, and 'interesting' image crops and video frames) and most-common colour comparisons.
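The hamming distance itself is cheap to compute--a quick sketch, not hydrus's actual code:

```python
def hamming_distance(phash_a, phash_b):
    # xor leaves a 1 bit wherever the two 64-bit phashes disagree; the
    # distance is just the count of those bits
    return bin(phash_a ^ phash_b).count('1')

hamming_distance(0xFCF8F0E0C0808000, 0xFCF8F0E1C0808000)  # 1, a very close match
```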
Searching for duplicates is fairly fast per file, but with a large client with hundreds of thousands of files, the total CPU time adds up. You can do a little manual searching if you like, but once you are all settled here, I recommend you hit the cog icon on the preparation page and let hydrus do this page's catch-up search work in your regular maintenance time. It'll swiftly catch up and keep you up to date without you even thinking about it.
Start searching on the 'exact match' search distance of 0. It is generally easier and more valuable to get exact duplicates out of the way first.
Once you have some files searched, you should see a potential pair count appear in the 'filtering' page.
## the filtering page { id="duplicate_filtering_page" }
_Processing duplicates can be real trudge-work if you do not set up a workflow you enjoy. It is a little slower than the archive/delete filter, and sometimes takes a bit more cognitive work. For many users, it is a good task to do while listening to a podcast or having a video going on another screen._
If you have a client with tens of thousands of files, you will likely have thousands of potential pairs. This can be intimidating, but do not worry--due to the A, B, C logical inferences described above, you will not have to go through every single one. The more information you put into the system, the faster the number will drop.
The filter has a regular file search interface attached. As you can see, it defaults to _system:everything_, but you can limit what files you will be working on simply by adding new search predicates. You might like to only work on files in your archive (i.e. that you know you care about to begin with), for instance. You can choose whether both files of the pair should match the search, or just one. 'creator:' tags work very well at cutting the search domain to something more manageable and consistent--try your favourite creator!
If you would like an example from the current search domain, hit the 'show some random potential pairs' button, and it will show two or more files that seem related. It is often interesting and surprising to see what it finds! The action buttons below allow for quick processing of these pairs and groups when convenient (particularly for large cg sets with 100+ alternates), but I recommend you leave these alone until you know the system better.
When you are ready, launch the filter.
## the duplicates filter { id="duplicates_filter" }
_We have not set up your duplicate 'merge' options yet, so do not get too into this. For this first time, just poke around, make some pretend choices, and then cancel out and choose to forget them._
![](images/dupe_filter.png)
Like the archive/delete filter, this uses quick mouse-clicks, keyboard shortcuts, or button clicks to action pairs. It presents two files at a time, labelled A and B, which you can quickly switch between just as in the normal media viewer. As soon as you action them, the next pair is shown. The two files will have their current zoom-size locked so they stay the same size (and in the same position) as you switch between them. Scroll your mouse wheel a couple of times and see if any obvious differences stand out.
Please note the hydrus media viewer does not currently work well with large resolutions at high zoom (it gets laggy and may have memory issues). Don't zoom in to 1600% and try to look at jpeg artifact differences on very large files, as this is simply not well supported yet.
The hover window on the right also presents a number of 'comparison statements' to help you make your decision. Green statements mean this current file is probably 'better', and red the opposite. Larger, older, higher-quality, more-tagged files are generally considered better. These statements have scores associated with them (which you can edit in _file->options->duplicates_), and the file of the pair with the highest score is presented first. If the files are duplicates, you can _generally_ assume the first file you see, the 'A', is the better, particularly if there are several green statements.
The filter will need to occasionally checkpoint, saving the decisions so far to the database, before it can fetch the next batch. This allows it to apply inferred information from your current batch and reduce your pending count faster before serving up the next set. It will present you with a quick interstitial 'confirm/back' dialog just to let you know. This happens more often as the potential count decreases.
## the decisions to make { id="duplicates_decisions" }
There are three ways a file can be related to another in the current duplicates system: duplicates, alternates, or false positive (not related).
**False positive (not related)** is the easiest. You will not see completely unrelated pairs presented very often in the filter, particularly at low search distances, but if the shape of face and hair and clothing happen to line up (or geometric shapes, often), the search system may make a false positive match. In this case, just click 'they are not related'.
**Alternate** relations are files that are not duplicates but obviously related in some way. Perhaps a costume change or a recolour. Hydrus does not have rich alternate support yet (but it is planned, and highly requested), so this relationship is mostly a 'holding area' for files that we will revisit for further processing in the future.
**Duplicate** files are of **the exact same thing**. They may be different resolutions, file formats, or encoding qualities, or one might even have a watermark, but they are fundamentally different views on the exact same art. As you can see with the buttons, you can select one file as the 'better' or say they are about the same. If the files are basically the same, there is no point stressing about which is 0.2% better--just click 'they are the same'. For better/worse pairs, you might have reason to keep both, but most of the time I recommend you delete the worse.
You can customise the shortcuts under _file->shortcuts->duplicate_filter_. The defaults are:
* Left-click or space: **this is better, delete the other**.
* Right-click: **they are related alternates**.
* Middle-click: **Go back one decision.**
* Enter/Escape: **Stop filtering.**
## merging metadata { id="duplicates_merging" }
If two duplicates have different metadata like tags or archive status, you probably want to merge them. Cancel out of the filter and click the 'edit default duplicate metadata merge options' button:
![](images/dupe_merge_options.png)
By default, these options are fairly empty. You will have to set up what you want based on your services and preferences. Setting a simple 'copy all tags' is generally a good idea, and like/dislike ratings also often make sense. The settings for better and same quality should probably be similar, but it depends on your situation.
If you choose the 'custom action' in the duplicate filter, you will be presented with a fresh 'edit duplicate merge options' panel for the action you select and can customise the merge specifically for that choice. ('favourite' options will come here in the future!)
Once you are all set up here, you can dive into the duplicate filter. Please let me know how you get on with it!
## what now? { id="future" }
The duplicate system is still incomplete. Now the db side is solid, the UI needs to catch up. Future versions will show duplicate information on thumbnails and the media viewer and allow quick-navigation to a file's duplicates and alternates.
For now, if you wish to see a file's duplicates, right-click it and select _file relationships_. You can review all its current duplicates, open them in a new page, appoint the new 'best file' of a duplicate group, and even mass-action selections of thumbnails.
You can also search for files based on the number of file relations they have (including when setting the search domain of the duplicate filter!) using _system:file relationships_. You can also search for best/not best files of groups, which makes it easy, for instance, to find all the spare duplicate files if you decide you no longer want to keep them.
I expect future versions of the system to also auto-resolve easy duplicate pairs, such as clearing out pixel-for-pixel png versions of jpgs.
## game cgs { id="game_cgs" }
If you import a lot of game CGs, which frequently have dozens or hundreds of alternates, I recommend you set them as alternates by selecting them all and setting the status through the thumbnail right-click menu. The duplicate filter, being limited to pairs, needs to compare all new members of an alternate group to all other members once to verify they are not duplicates. This is not a big deal for alternates with three or four members, but game CGs provide an overwhelming edge case. Setting a group of thumbnails as alternate 'fixes' their alternate status immediately, discounting the possibility of any internal duplicates, and provides an easy way out of this situation.
## more information and examples { id="duplicates_examples" }
### better/worse { id="duplicates_examples_better_worse" }
Which of two files is better? Here are some common reasons:
* higher resolution
* better image quality
* png over jpg for screenshots
* jpg over png for busy images
* jpg over png for pixel-for-pixel duplicates
* a better crop
* no watermark or site-frame or undesired blemish
* has been tagged by other people, so is likely to be the more 'popular'
However, these are not hard rules--sometimes a file has a larger resolution or filesize due to a bad upscaling or encoding decision by the person who 'reinterpreted' it. You really have to look at it and decide for yourself.
Here is a good example of a better/worse pair:
[![](images/dupe_better_1.png)](images/dupe_better_1.png) [![](images/dupe_better_2.jpg)](images/dupe_better_2.jpg)
The first image is better because it is a png (pixel-perfect pngs are always better than jpgs for screenshots of applications--note how obvious the jpg's encoding artifacts are on the flat colour background) and it has a slightly higher (original) resolution, making it less blurry. I presume the second went through some FunnyJunk-tier trash meme site to get automatically cropped to 960px height and converted to the significantly smaller jpeg. Whatever happened, let's drop the second and keep the first.
When both files are jpgs, differences in quality are very common and often significant:
[![](images/dupes_better_sg_a.jpg)](images/dupes_better_sg_a.jpg) [![](images/dupes_better_sg_b.jpg)](images/dupes_better_sg_b.jpg)
Again, this is mostly due to some online service resizing and lowering quality to ease on their bandwidth costs. There is usually no reason to keep the lower quality version.
### same quality duplicates { id="duplicates_examples_same" }
When are two files the same quality? A good rule of thumb is if you scroll between them and see no obvious differences, and the comparison statements do not suggest anything significant, just set them as same quality.
Here are two same quality duplicates:
[![](images/dupe_exact_match_1.png)](images/dupe_exact_match_1.png) [![](images/dupe_exact_match_2.png)](images/dupe_exact_match_2.png)
There is no obvious difference between those two. The filesize is significantly different, so I suspect the smaller is a lossless png optimisation, but in the grand scheme of things, that doesn't matter so much. Many of the big content providers--Facebook, Google, Cloudflare--automatically 'optimise' the data that goes through their networks in order to save bandwidth. Although jpegs are often a slaughterhouse, with pngs it is usually harmless.
Given the filesize, you might decide that these are actually a better/worse pair--but if the larger image had tags and was the 'canonical' version on most boorus, the decision might not be so clear. You can choose better/worse and delete one randomly, but sometimes you may just want to keep both without a firm decision on which is best, so just set 'same quality' and move on. Your time is more valuable than a few dozen KB.
Sometimes, you will see pixel-for-pixel duplicate jpegs of very slightly different size, such as 787KB vs 779KB. The smaller of these is usually an exact duplicate that has had its internal metadata (e.g. EXIF tags) stripped by a program or website CDN. They are same quality unless you have a strong opinion on whether having internal metadata in a file is useful.
### alternates { id="duplicates_examples_alternates" }
As I wrote above, hydrus's alternates system is not yet properly ready. It is important to have a basic 'alternates' relationship for now, but it is a holding area until we have a workflow to apply 'WIP'- or 'recolour'-type labels and present that information nicely in the media viewer.
Alternates are not of exactly the same thing, but one is a variant of the other, or they are both descended from a common original. The precise definition is up to you, but it generally means something like:
* the files are recolours
* the files are alternate versions of the same image produced by the same or different artists (e.g. clean/messy or with/without hair ribbon)
* iterations on a close template
* different versions of a file's progress, such as the steps from the initial draft sketch to a final shaded version
Here are some recolours of the same image:
[![](images/dupe_alternates_recolours.png)](images/dupe_alternates_recolours.png)
And some WIP:
[![](images/dupe_alternates_progress.png)](images/dupe_alternates_progress.png)
And a costume change:
[![](images/dupe_alternates_costume.png)](images/dupe_alternates_costume.png)
None of these are duplicates, but they are obviously related. The duplicate search will notice they are similar, so we should let the client know they are 'alternate'.
Here's a subtler case:
[![](images/dupe_alternate_boxer_a.jpg)](images/dupe_alternate_boxer_a.jpg) [![](images/dupe_alternate_boxer_b.jpg)](images/dupe_alternate_boxer_b.jpg)
These two files are very similar, but try opening both in separate tabs and then flicking back and forth: the second's glove-string is further into the mouth, and it has improved chin shading, a more refined eye shape, and shaved pubic hair. It is simple to spot these differences in the client's duplicate filter when you scroll back and forth.
I believe the second is an improvement on the first by the same artist, so it is a WIP alternate. You might also consider it a 'better' improvement.
Here are three files you might or might not consider to be alternates:
[![](images/dupe_alternate_1.jpg)](images/dupe_alternate_1.jpg)
[![](images/dupe_alternate_2.jpg)](images/dupe_alternate_2.jpg)
[![](images/dupe_alternate_3.jpg)](images/dupe_alternate_3.jpg)
These are all based on the same template--which is why the dupe filter found them--but they are not so closely related as those above, and the last one is joking about a different ideology entirely and might deserve to be in its own group. Ultimately, you might prefer just to give them some shared tag and consider them not alternates _per se_.
### not related/false positive { id="duplicates_examples_false_positive" }
Here are two files that match false positively:
[![](images/dupe_not_dupes_1.png)](images/dupe_not_dupes_1.png)
[![](images/dupe_not_dupes_2.jpg)](images/dupe_not_dupes_2.jpg)
Despite their similar shape, they are neither duplicates nor even of the same topic. The only commonality is the medium. I would not consider them close enough to be alternates--just adding something like 'screenshot' and 'imageboard' as tags to both is probably the closest connection they have.
Recording the 'false positive' relationship is important to make sure the comparison does not come up again in the duplicate filter.
The incidence of false positives increases as you broaden the search distance--the less precise your search, the less likely it is to be correct. At distance 14, these files all match, but uselessly:
[![](images/dupe_garbage.png)](images/dupe_garbage.png)
## the duplicates system { id="duplicates_advanced" }
_(advanced nonsense, you can skip this section. tl;dr: duplicate file groups keep track of their best quality file, sometimes called the King)_
Hydrus achieves duplicate transitivity by treating duplicate files as groups. Although you action pairs, if you set (A duplicate B), that creates a group (A,B). Subsequently setting (B duplicate C) extends the group to be (A,B,C), and so (A duplicate C) is transitively implied.
The first version of the duplicate system attempted to record better/worse/same information for all files in a virtual duplicate group, but this proved very complicated, workflow-heavy, and not particularly useful. The new system instead appoints a single _King_ as the best file of a group. All other files in the group are beneath the King and have no other relationship data retained.
![](images/dupe_dupe_group_diagram.png)
This King represents the group in the duplicate filter (and in potential pairs, which are actually recorded between duplicate media groups--even if most of them at the outset only have one member). If the other file in a pair is considered better, it becomes the new King, but if it is worse or equal, it [merges into the other members](images/dupe_simple_merge.png). When two Kings are compared, [whole groups can merge](images/dupe_group_merge.png)!
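Here is a toy model of that group/King bookkeeping, if it helps--hydrus's actual database schema is more involved:

```python
class DuplicateGroup:

    def __init__(self, king):
        self.king = king
        self.members = {king}

    def absorb(self, other, other_king_is_better=False):
        # merging is how transitivity falls out: once B joins A's group, any
        # later (B duplicate C) decision merges C's group into the same one
        self.members |= other.members
        if other_king_is_better:
            self.king = other.king

group = DuplicateGroup('A')
group.absorb(DuplicateGroup('B'))  # (A,B), King A
group.absorb(DuplicateGroup('C'))  # (A,B,C), King A--so A dup C is implied
```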
Alternates are stored in a similar way, except [the members are duplicate groups](images/dupe_alternate_group_diagram.png) rather than individual files and they have no significant internal relationship metadata yet. If α, β, and γ are duplicate groups that each have one or more files, then setting (α alt β) and (β alt γ) creates an alternate group (α,β,γ), with the caveat that α and γ will still be sent to the duplicate filter once just to check they are not duplicates by chance. The specific file members of these groups, A, B, C and so on, inherit the relationships of their parent groups when you right-click on their thumbnails.
False positive relationships are stored between pairs of alternate groups, so they apply transitively between all the files of either side's alternate group. If (α alt β) and (ψ alt ω) and you apply (α fp ψ), then (α fp ω), (β fp ψ), and (β fp ω) are all transitively implied.
??? example "More examples"
[![](images/dupe_whole_system_diagram.png "Some fun.")](images/dupe_whole_system_diagram.png)
[![](images/dupe_whole_system_simple_diagram.png "And simpler.")](images/dupe_whole_system_simple_diagram.png)

108
docs/faq.md Normal file

@ -0,0 +1,108 @@
# FAQ
## what is a repository? { id="repositories" }
A _repository_ is a service in the hydrus network that stores a certain kind of information--files or tag mappings, for instance--as submitted by users all over the internet. Those users periodically synchronise with the repository so they know everything that it stores. Sometimes, like with tags, this means creating a complete local copy of everything on the repository. Hydrus network clients never send queries to repositories; they perform queries over their local cache of the repository's data, keeping everything confined to the same computer.
## what is a tag? { id="tags" }
[wiki](https://en.wikipedia.org/wiki/Tag_(metadata))
A _tag_ is a small bit of text describing a single property of something. They make searching easy. Good examples are "flower" or "nicolas cage" or "the sopranos" or "2003". By combining several tags together ( e.g. \[ 'tiger woods', 'sports illustrated', '2008' \] or \[ 'cosplay', 'the legend of zelda' \] ), a huge image collection is reduced to a tiny and easy-to-digest sample.
A good word for the connection of a particular tag to a particular file is _mapping_.
Hydrus is designed with the intention that tags are for _searching_, not _describing_. Workflows and UI are tuned for finding files and other similar files (e.g. by the same artist), and while it is possible to have nice metadata overlays around files, this is not considered their chief purpose. Trying to have 'perfect' descriptions for files is often a rabbit-hole that can consume hours of work with relatively little demonstrable benefit.
All tags are automatically converted to lower case. 'Sunset Drive' becomes 'sunset drive'. Why?
1. Although it is more beautiful to have 'The Lord of the Rings' rather than 'the lord of the rings', there are many, many special cases where style guides differ on which words to capitalise.
2. As 'The Lord of the Rings' and 'the lord of the rings' are semantically identical, it is natural to search in a case insensitive way. When case does not matter, what point is there in recording it?
Furthermore, leading and trailing whitespace is removed, and runs of whitespace are collapsed to a single space.
' yellow dress '
becomes
'yellow dress'
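Concretely, and assuming these are the only rules in play, the cleaning looks something like:

```python
import re

def clean_tag(tag):
    # lowercase, strip leading/trailing whitespace, collapse internal runs
    return re.sub(r'\s+', ' ', tag.strip()).lower()

clean_tag('  Yellow  Dress ')  # 'yellow dress'
```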
## what is a namespace? { id="namespaces" }
A _namespace_ is a category that in hydrus prefixes a tag. An example is 'person' in the tag 'person:ron paul'--it lets people and software know that 'ron paul' is a name. You can create any namespace you like; just type one or more words and then a colon, and then the next string of text will have that namespace.
The hydrus client gives namespaces different colours so you can pick out important tags more easily in a large list, and you can also search by a particular namespace, even creating complicated predicates like 'give all files that do not have any character tags', for instance.
## why not use filenames and folders? { id="filenames" }
As a retrieval method, filenames and folders are less and less useful as the number of files increases. Why?
* A filename is not unique; did you mean this "04.jpg" or _this_ "04.jpg" in another folder? Perhaps "04 (3).jpg"?
* A filename is not guaranteed to describe the file correctly, e.g. hello.jpg
* A filename is not guaranteed to stay the same, meaning other programs cannot rely on the filename address being valid or even returning the same data every time.
* A filename is often--for _ridiculous_ reasons--limited to a certain prohibitive character set. Even when utf-8 is supported, some arbitrary ascii characters are usually not, and different localisations, operating systems and formatting conventions only make it worse.
* Folders can offer context, but they are clunky and time-consuming to change. If you put each chapter of a comic in a different folder, for instance, reading several volumes in one sitting can be a pain. Nesting many folders adds navigation-latency and tends to induce less informative "04.jpg"-type filenames.
So, the client tracks files by their _hash_. This technical identifier easily eliminates duplicates and permits the database to robustly attach other metadata like tags and ratings and known urls and notes and everything else, even across multiple clients and even if a file is deleted and later imported.
As a general rule, I suggest you not set up hydrus to parse and display all your imported files' filenames as tags. 'image.jpg' is useless as a tag. [Shed the concept of filenames as you would chains.](https://www.youtube.com/watch?v=_yYS0ZZdsnA)
## can the client manage files from their original locations? { id="external_files" }
When the client imports a file, it makes a quickly accessible but human-ugly copy in its internal database, by default under _install\_dir/db/client\_files_. When it needs to access that file again, it always knows where it is, and it can be confident it is what it expects it to be. It never accesses the original again.
This storage method is not always convenient, particularly for those who are hesitant about converting to using hydrus completely and also do not want to maintain two large copies of their collections. The question comes up--"can hydrus track files from their original locations, without having to copy them into the db?"
The technical answer is, "This support could be added," but I have decided not to, mainly because:
* Files stored in locations outside of hydrus's responsibility can change or go missing (particularly if a whole parent folder is moved!), which erodes the assumptions it makes about file access, meaning additional checks would have to be added before important operations, often with no simple recovery.
* External duplicates would not be merged, and the file system would have to be extended to handle pointless 1->n hash->path relationships.
* Many regular operations--like figuring out whether orphaned files should be physically deleted--are less simple.
* Backing up or restoring a distributed external file system is much more complicated.
* It would require more code to maintain and would mean a laggier db and interface.
* Hydrus is an attempt to get _away_ from files and folders--if a collection is too large and complicated to manage using explorer, what's the point in supporting that old system?
It is not unusual for new users who ask for this feature to find their feelings change after getting more experience with the software. If desired, path text can be preserved as tags using regexes during import, and getting into the swing of searching by metadata rather than navigating folders often shows how very effective the former is over the latter. Most users eventually import most or all of their collection into hydrus permanently, deleting their old folder structure as they go.
For this reason, if you are hesitant about doing things the hydrus way, I advise you try running it on a smaller subset of your collection, say 5,000 files, leaving the original copies completely intact. After a month or two, think about how often you used hydrus to look at the files versus navigating through folders. If you barely used the folders, you probably do not need them any more, but if you used them a lot, then hydrus might not be for you, or it might only be for some sorts of files in your collection.
## why use sqlite? { id="sqlite" }
Hydrus uses SQLite for its database engine. Some users who have experience with other engines such as MySQL or PostgreSQL sometimes suggest them as alternatives. SQLite serves hydrus's needs well, and at the moment, there are no plans to change.
Since this question has come up frequently, a user has written an excellent document talking about the reasons to stick with SQLite. If you are interested in this subject, please check it out here:
[https://gitgud.io/prkc/hydrus-why-sqlite/blob/master/README.md](https://gitgud.io/prkc/hydrus-why-sqlite/blob/master/README.md)
## what is a hash? { id="hashes" }
[wiki](https://en.wikipedia.org/wiki/Hash_function)
Hashes are a subject you usually have to be a software engineer to find interesting. The simple answer is that they are unique names for things. Hashes make excellent identifiers inside software, as you can safely assume that f099b5823f4e36a4bd6562812582f60e49e818cf445902b504b5533c6a5dad94 refers to one particular file and no other. In the client's normal operation, you will never encounter a file's hash. If you want to see a thumbnail bigger, double-click it; the software handles the mathematics.
_For those who_ are _interested: hydrus uses SHA-256, which spits out 32-byte (256-bit) hashes. The software stores the hash densely, as 32 bytes, only encoding it to 64 hex characters when the user views it or copies to clipboard. SHA-256 is not perfect, but it is a great compromise candidate; it is secure for now, it is reasonably fast, it is available for most programming languages, and newer CPUs perform it more efficiently all the time._
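A quick sketch of that flow (the filename is made up):

```python
import hashlib

with open('example.jpg', 'rb') as f:  # hypothetical file
    digest = hashlib.sha256(f.read()).digest()  # the dense 32-byte form

print(digest.hex())  # the 64 hex characters shown to the user
```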
## what is an access key? { id="access_keys" }
The hydrus network's repositories do not use username/password, but instead a single strong identifier-password like this:
_7ce4dbf18f7af8b420ee942bae42030aab344e91dc0e839260fcd71a4c9879e3_
These hex numbers give you access to a particular account on a particular repository, and are often combined like so:
_7ce4dbf18f7af8b420ee942bae42030aab344e91dc0e839260fcd71a4c9879e3@hostname.com:45871_
They are long enough to be impossible to guess, and also randomly generated, so they reveal nothing personally identifying about you. Many people can use the same access key (and hence the same account) on a repository without consequence, although they will have to share any bandwidth limits, and if one person screws around and gets the account banned, everyone will lose access.
The access key is the account. Do not give it to anyone you do not want to have access to the account. An administrator will never need it; instead they will want your _account id_.
## what is an account id? { id="account_ids" }
This is another long string of random hexadecimal that _identifies_ your account without giving away access. If you need to identify yourself to a repository administrator (say, to get your account's permissions modified), you will need to tell them your account id. You can copy it to your clipboard in _services->review services_.
## why can my friend not see what I just uploaded? { id="delays" }
The repositories do not work like conventional search engines; it takes a short but predictable while for changes to propagate to other users.
The client's searches only ever happen over its local cache of what is on the repository. Any changes you make will be delayed for others until their next update occurs. At the moment, the update period is 100,000 seconds, which is about 1 day and 4 hours.


@ -0,0 +1,115 @@
---
title: downloading
---
# getting started with downloading
## downloading
The hydrus client has a sophisticated and completely user-customisable download system. It can pull from any booru or regular gallery site or imageboard, and also from some special examples like twitter and tumblr. A fresh install will by default have support for the bigger sites, but it _is_ possible, with some work, for any user to create a new shareable downloader for a new site.
The downloader is highly parallelisable, and while the default bandwidth rules should stop you from running too hot and downloading so much at once that you annoy the servers you are downloading from, there are no brakes in the program on what you can get.
!!! danger
It is very important that you take this slow. Many users get overexcited with their new ability to download 500,000 files _and then do so_, only discovering later that 98% of what they got was junk that they now have to wade through. Figure out what workflows work for you, how fast you process files, what content you _actually_ want, how much bandwidth and hard drive space you have, and prioritise and throttle your incoming downloads to match. If you can realistically only archive/delete filter 50 files a day, there is little benefit to downloading 500 new files a day. START SLOW.
It also takes a decent whack of CPU to import a file. You'll usually never notice this with just one hard drive import going, but if you have twenty different download queues all competing for database access and individual 0.1-second hits of heavy CPU work, you will discover your client starts to judder and lag. Keep it in mind, and you'll figure out what your computer is happy with. I also recommend you try to keep your total loaded files/urls to be under 20,000 to keep things snappy. Remember that you can pause your import queues, if you need to calm things down a bit.
## let's do it { id="start" }
Open the new page selector with F9 and then hit _download->gallery_:
![](images/downloader_page.png)
The gallery page can download from multiple sources at the same time. Each entry in the list represents a basic combination of two things:
**source**
: The site you are getting from. Safebooru or Danbooru or Deviant Art or twitter or anywhere else.
**query text**
: Something like 'contrapposto' or 'blonde\_hair blue\_eyes' or an artist name like 'incase'. Whatever is searched on the site to return a list of ordered media.
So, when you want to start a new download, you first select the source with the button and then type in a query in the text box and hit enter. The download will soon start and fill in information, and thumbnails should stream in, just like the hard drive importer. The downloader typically works by walking through the search's gallery pages one by one, queueing up the found files for later download. There are several intentional delays built into the system, so do not worry if work seems to halt for a little while--you will get a feel for hydrus's 'slow persistent growth' style with experience.
Do a test download now, for fun! Pause its gallery search after a page or two, and then pause the file import queue after a dozen or so files come in.
The thumbnail panel can only show results from one queue at a time, so double-click on an entry to 'highlight' it, which will show its thumbs and also give more detailed info and controls in the 'highlighted query' panel. I encourage you to explore the highlight panel over time, as it can show and do quite a lot. Double-click again to 'clear' it.
It is a good idea to 'test' larger downloads, either by visiting the site itself for that query, or just waiting a bit and reviewing the first files that come in. Just make sure that you _are_ getting what you thought you would, whether that be verifying that the query text is correct or that the site isn't only giving you bloated gifs or other bad quality files. The 'file limit', which stops the gallery search after the set number of files, is also great for limiting fishing expeditions (such as overbroad searches like 'wide_hips', which on the bigger boorus have 100k+ results and return _variable_ quality). If the gallery search runs out of new files before the file limit is hit, the search will naturally stop (and the entry in the list should gain a ⏹ 'stop' symbol).
_Note that some sites only serve 25 or 50 pages of results, despite their indices suggesting hundreds. If you notice that one site always bombs out at, say, 500 results, it may be due to a decision on their end. You can usually test this by visiting the pages hydrus tried in your web browser._
**In general, particularly when starting out, artist searches are best.** They are usually fewer than a thousand files and have fairly uniform quality throughout.
## parsing tags
But we don't just want files--most sites offer tags as well. By default, hydrus now starts with a local tag service called 'downloader tags' and it will parse (get) all the tags from normal gallery sites and put them in this service. You don't have to do anything, you will get some decent tags. As you use the client, you will figure out which tags you like and where you want them. On the downloader page, click _tag import options_:
![](images/tag_import_options_default.png)
This is an important dialog, although you will not need to use it much. It governs which tags are parsed and where they go. To keep things easy to manage, a new downloader will refer to the 'default' tag import options for a website, but for now let's set some values just for this downloader:
![](images/tag_import_options_specific.png)
You can see that each tag service on your client has a separate section. If you add the PTR, that will get a new box too. A new client is set to _get all tags_ for the 'downloader tags' service. Things can get much more complicated. Have a play around with the options here as you figure things out. Most of the controls have tooltips or longer explainers in sub-dialogs, so don't be afraid to try things.
It is easy to get tens of thousands of tags by downloading this way. Different sites offer different kinds and qualities of tags, and the client's downloaders (which were designed by me, the dev, or a user) may parse all or only some of them. Many users like to just get everything on offer, but others only ever want, say, 'creator', 'series', and 'character' tags. If you feel brave, click that 'all tags' button, which will take you into hydrus's advanced 'tag filter', which allows you to select which of the incoming list of tags will be added.
The blacklist button will let you skip downloading files that have certain tags (perhaps you would like to auto-skip all images with 'gore', 'scat', or 'diaper'?), again using the tag filter, while the whitelist enables you to only allow files that have at least one of a set of tags. The 'additional tags' adds some fixed personal tags to all files coming in--for instance, you might like to add 'process into favourites' to your 'my tags' for some query you really like so you can find those files again later and process them separately. That little 'cog' icon button can also do some advanced things.
To edit the defaults, hit up _network->downloaders->manage default tag import options_. You should do this as you get a better idea of your preferences. You can set them for all file posts generally, all watchers, and for specific sites as well.
!!! warning
The file limit and file/tag import options on the upper panel, if changed, will only apply to **new** queries. If you want to change the options for an existing queue, either do so on its highlight panel below or use the 'set options to queries' button.
## watching threads { id="threads" }
If you are an imageboard user, try going to a thread you like and drag-and-drop its URL (straight from your web browser's address bar) onto the hydrus client. It should open up a new 'watcher' page and import the thread's files!
![](images/watcher_page.png)
With only one URL to check, watchers are a little simpler than gallery searches, but as that page is likely receiving frequent updates, it checks it over and over until it dies. By default, the watcher's 'checker options' will regulate how quickly it checks based on the speed at which new files are coming in--if a thread is fast, it will check frequently; if it is running slow, it may only check once per day. When a thread falls below a critical posting velocity or 404s, checking stops.
In general, you can leave the checker options alone, but you might like to revisit them if you are always visiting faster or slower boards and find you are missing files or getting DEAD too early.
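To give a feel for the idea, here is a toy velocity-based checker--hydrus's real checker options have more knobs than this:

```python
def seconds_until_next_check(files_in_last_day, files_per_check=5,
                             min_period=600, max_period=86400):
    # aim to find roughly files_per_check new files per check, clamped so a
    # dead thread is still checked daily and a fast one at most every 10 mins
    if files_in_last_day == 0:
        return max_period
    period = 86400 * files_per_check / files_in_last_day
    return int(min(max(period, min_period), max_period))

seconds_until_next_check(100)  # a fast thread: check roughly every 72 minutes
seconds_until_next_check(2)    # a slow one: only check once a day
```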
## bandwidth
It will not be too long until you see a "bandwidth free in xxxxx..." message. As a long-term storage solution, hydrus is designed to be polite in its downloading--both to the source server and your computer. The client's default bandwidth rules have some caps to stop big mistakes, spread out larger jobs, and at a bare minimum, no domain will be hit more than once a second.
All the bandwidth rules are completely customisable. They can get quite complicated. I **strongly** recommend you not look for them until you have more experience. I **especially strongly** recommend you not ever turn them all off, thinking that will improve something, as you'll probably render the client too laggy to function and get yourself an IP ban from the next server you pull from.
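For illustration only, the once-a-second floor mentioned above amounts to something like this (hydrus's real rules are more flexible than this):

```python
import time

_last_request = {}

def ok_to_hit(domain, min_gap=1.0):
    # at a bare minimum, never hit the same domain more than once a second
    now = time.monotonic()
    if now - _last_request.get(domain, 0.0) < min_gap:
        return False
    _last_request[domain] = now
    return True
```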
If you want to download 10,000 files, set up the queue and let it work. The client will take breaks, likely even to the next day, but it will get there in time. Many users like to leave their clients on all the time, just running in the background, which makes these sorts of downloads a breeze--you check back in the evening and discover your download queues, watchers, and subscriptions have given you another thousand things to deal with.
Again: the real problem with downloading is not finding new things, it is keeping up with what you get. Start slow and figure out what is important to your bandwidth budget, hard drive budget, and free time budget. <span class="spoiler">Almost everyone fails at this.</span>
## subscriptions
Subscriptions are a way to automatically recheck a good query in future, to keep up with new files. Many users come to use them. When you are comfortable with downloaders and have an idea of what you like, come back and read the subscription help, which is [here](getting_started_subscriptions.md).
## other downloading { id="other_downloaders" }
There are two other ways of downloading, mostly for advanced or one-off use.
The **url downloader** works like the gallery downloader but does not do searches. You can paste downloadable URLs to it, and it will work through them as one list. Dragging and dropping recognisable URLs onto the client (e.g. from your web browser) will also spawn and use this downloader.
The **simple downloader** will do very simple parsing for unusual jobs. If you want to download all the images in a page, or all the image link destinations, this is the one to use. There are several default parsing rules to choose from, and if you learn the downloader system yourself, it will be easy to make more.
## logins
The client now supports a flexible (but slightly prototype and ugly) login system. It can handle simple sites and is as completely user-customisable as the downloader system. The client starts with multiple login scripts by default, which you can review under _network->downloaders->manage logins_:
![](images/manage_logins.png)
Many sites grant all their content without you having to log in at all, but others require it for NSFW or special content, or you may wish to take advantage of site-side user preferences like personal blacklists. If you wish, you can give hydrus some login details here, and it will try to login--just as a browser would--before it downloads anything from that domain.
!!! warning
For multiple reasons, I do not recommend you use important accounts with hydrus. Use a throwaway account you don't care much about.
To start using a login script, select the domain and click 'edit credentials'. You'll put in your username/password, and then 'activate' the login for the domain, and that should be it! The next time you try to get something from that site, the first request will wait (usually about ten seconds) while a login popup performs the login. Most logins last for about thirty days (and many refresh that 30-day timer every time you make a new request), so once you are set up, you usually never notice it again, especially if you have a subscription on the domain.
Most sites only have one way of logging in, but hydrus does support more. Hentai Foundry is a good example--by default, the client performs the 'click-through' login as a guest, which requires no credentials and means any hydrus client can get any content from the start. But this way of logging in only lasts about 60 minutes or so before having to be refreshed, and it does not hide any spicy stuff, so if you use HF a lot, I recommend you create a throwaway account, set the filters you like in your HF profile (e.g. no guro content), and then click the 'change login script' in the client to the proper username/pass login.
The login system is new and still a bit experimental. Don't try to pull off anything too weird with it! If anything goes wrong, it will likely delay the script (and hence the whole domain) from working for a while, or invalidate it entirely. If the error is something simple, like a password typo or current server maintenance, go back to this dialog to fix and scrub the error and try again. If the site just changed its layout, you may need to update the login script. If it is more complicated, please contact me, hydrus_dev, with the details!
If you would like to login to a site that is not yet supported by hydrus (usually ones with a Captcha in the login page), see about getting a web browser add-on that lets you export a cookies.txt (either for the whole browser or just for that domain) and then drag and drop that file onto the hydrus _network->data->review session cookies_ dialog. This sometimes does not work if your add-on's export formatting is unusual. If it does work, hydrus will import and use those cookies, which skips the login by making your hydrus pretend to be your browser directly. This is obviously advanced and hacky, so if you need to do it, let me know how you get on and what tools you find work best!


@ -0,0 +1,143 @@
---
title: files
---
# getting started with files
If any of this is confusing, a simpler guide is [here](https://github.com/Zweibach/text/blob/master/Hydrus/Hydrus%20Help%20Docs/00_tableOfContents.md), and some video guides are [here](https://github.com/CuddleBear92/Hydrus-guides)!
!!! warning
Hydrus can be powerful, and you control everything. By default, you are not connected to any servers and absolutely nothing is shared with other users--and you can't accidentally one-click your way to exposing your whole collection--but if you tag private files with real names and click to upload that data to a tag repository that other people have access to, the program won't try to stop you. If you want to do private sexy slideshows of your shy wife, that's great, but think twice before you upload files or tags anywhere, particularly as you learn. It is **impossible** to contain leaks of private information.
There are no limits and few brakes on your behaviour. It is possible to import millions of files. For many new users, their first mistake is downloading too much too fast in overexcitement and becoming overwhelmed. Take things slow and figure out good processing workflows that work for your schedule before you start adding 500 subscriptions.
## the problem { id="hellmode_dot_exe" }
If you have ever seen something like this--
![](images/pictures.png "After a while, I started just dropping everything in here unsorted. It would only grow, hungry and untouchable.")
--then you already know the problem: using a filesystem to manage a lot of images sucks.
Finding the right picture quickly can be difficult. Finding everything by a particular artist at a particular resolution is unthinkable. Integrating new files into the whole nested-folder mess is a further pain, and most operating systems bug out when displaying 10,000+ thumbnails.
## so, what does the hydrus client do? { id="the_client" }
Let's first focus on _importing_ files.
When you first boot the client, you will see a blank page. There are no files in the database and so there is nothing to search. To get started, I suggest you simply drag-and-drop a folder with a hundred or so images onto the main window. A dialog will appear confirming what you want to import. Ok that, and a new page will open. Thumbnails will stream in as the software processes each file.
[![](images/import.png)](images/import.png)
The files are being imported into the client's database. [The client discards their filenames.](faq.md#filenames)
Notice your original folder and its files are untouched. You can move the originals somewhere else, delete them, and the client will still return searches fine. In the same way, you can delete from the client, and the original files will remain unchanged--import is a **copy**, not a move, operation. The client performs all its operations on its internal database, which holds copies of the files it imports. If you find yourself enjoying using the client and decide to completely switch over, you can delete the original files you import without worry. You can always export them back again later.
[FAQ: can the client manage files from their original locations?](faq.md#external_files)
Now:
* Click on a thumbnail; it'll show in the preview screen, bottom left.
* Double- or middle-click the thumbnail to open the media viewer. You can hit ++f++ to switch between giving the fullscreen a frame or not. You can use your scrollwheel or page up/down to browse the media and ctrl+scrollwheel to zoom in and out.
* Move your mouse to the top-left, top-middle and top-right of the media viewer. You should see some 'hover' panels pop into place.
![](images/hover_command.png)
The one on the left is for tags, the middle is for browsing and zoom commands, and the right is for status and ratings icons. You will learn more about these things as you get more experience with the program.
* Press ++enter++ or double/middle-click again to close the media viewer.
* You can quickly select multiple files by shift- or ctrl- clicking. Notice how the status bar at the bottom of the screen updates with the number selected and their total size. Right-clicking your selection will present another summary and many actions.
* Hit ++f9++ to bring up a new page chooser. You can navigate it with the arrow keys, your numpad, or your mouse.
* On the left of a normal search page is a text box. When it is focused, a dropdown window appears. It looks like this:
![](images/ac_dropdown_system.png)
This is where you enter the predicates that define the current search. If the text box is empty, the dropdown will show 'system' tags that let you search by file metadata such as file size or animation duration. To select one, press the up or down arrow keys and then enter, or double click with the mouse.
When you have some tags in your database, typing in the text box will search them:
![](images/ac_dropdown_feel.png)
The (number) shows how many files have that tag, and hence how large the search result will be if you select that tag.
Clicking 'searching immediately' will pause the searcher, letting you add several tags in a row without sending it off to get results immediately. Ignore the other buttons for now--you will figure them out as you gain experience with the program.
You can remove a tag from the list of 'active tags' in the box above with a double-click, or by entering the exact same tag again through the dropdown.
* Play with the system tags more if you like, and the sort-by dropdown. The collect-by dropdown is advanced, so wait until you understand _namespaces_ before expecting it to do anything.
* To close a page, middle-click its tab.
The client can currently import the following mimetypes:
* **image/bmp** (.bmp - converted to image/png on import)
* **image/gif** (.gif)
* **image/png** (.png)
* **image/apng** (.apng)
* **image/jpeg** (.jpg)
* **image/tiff** (.tiff)
* **image/webp** (.webp)
* **video/x-msvideo** (.avi)
* **video/x-flv** (.flv)
* **video/x-matroska** (.mkv)
* **video/quicktime** (.mov)
* **video/mp4** (.mp4)
* **video/mpeg** (.mpeg)
* **video/webm** (.webm)
* **video/x-ms-wmv** (.wmv)
* **audio/mp3** (.mp3)
* **audio/ogg** (.ogg)
* **audio/flac** (.flac)
* **audio/x-ms-wma** (.wma)
* **application/x-shockwave-flash** (.swf)
* **application/pdf** (.pdf)
* **application/x-photoshop** (.psd)
* **application/clip** (.clip)
* **application/vnd.rar** (.rar)
* **application/zip** (.zip)
* **application/x-7z-compressed** (.7z)
Support for some of the more complicated filetypes is imperfect, however. For the Windows and Linux built releases, hydrus now embeds an MPV player for video, audio and gifs, which provides smooth playback and audio, but some other environments may not support MPV and so will fall back when possible to the native hydrus software renderer, which does not support audio. When something does not render how you want, right-clicking on its thumbnail presents the option 'open externally', which will open the file in the appropriate default program (e.g. ACDSee, VLC).
The client can also download files from several websites, including 4chan and other imageboards, many boorus, and gallery sites like deviant art and hentai foundry. You will learn more about this later.
## inbox and archiving { id="inbox" }
The client sends newly imported files to an **inbox**, just like your email. Inbox acts like a tag, matched by 'system:inbox'. A small envelope icon is drawn in the top corner of all inbox files:
![](images/fresh_imports.png)
If you are sure you want to keep a file long-term, you should **archive** it, which will remove it from the inbox. You can archive from your selected thumbnails' right-click menu, or by pressing ++f7++. If you make a mistake, you can spam ++ctrl+z++ for undo or hit ++shift+f7++ on any set of files to explicitly return them to the inbox.
Anything you do not want to keep should be deleted by selecting from the right-click menu or by hitting the delete key. Deleted files are sent to the trash. They will get a little trash icon:
![](images/processed_imports.png)
A trashed file will not appear in subsequent normal searches, although you can search the trash specifically by clicking the 'my files' button on the autocomplete dropdown and changing the file domain to 'trash'. Undeleting a file (++shift+delete++) will return it to 'my files' as if nothing had happened. Files that remain in the trash will be permanently deleted, usually after a few days. You can change the permanent deletion behaviour in the client's options.
A quick way of processing new files is&ndash;
## filtering { id="filtering_inbox" }
Let's say you just downloaded a good thread, or perhaps you just imported an old folder of miscellany. You now have a whole bunch of files in your inbox--some good, some awful. You probably want to quickly go through them, saying _yes, yes, yes, no, yes, no, no, yes_, where _yes_ means 'keep and archive' and _no_ means 'delete this trash'. **Filtering** is the solution.
Select some thumbnails, and either choose _filter->archive/delete_ from the right-click menu or hit F12. You will see them in a special version of the media viewer, with the following controls:
* ++left-button++, ++space++, or ++f7++: **keep and archive the file, move on**
* ++right-button++ or ++delete++: **delete the file, move on**
* ++up++: **Skip this file, move on**
* ++middle-button++ or ++backspace++: **I didn't mean that, go back one**
* ++escape++, ++return++, or ++f12++: **stop filtering now**
Your choices will not be committed until you finish filtering.
This saves time.
## lastly { id="what_hydrus_is_for" }
The hydrus client's workflows are not designed for half-finished files that you are still working on. Think of it as a giant archive for everything excellent you have decided to store away. It lets you find and remember these things quickly.
In general, hydrus is good for individual files like you commonly find on imageboards or boorus. Although advanced users can cobble together some page-tag-based solutions, it is not yet great for multi-file media like comics and definitely not as a typical playlist-based music player.
If you are looking for a comic manager to supplement hydrus, check out this user-made guide to other archiving software [here](https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/wiki/0-Alternative-Programs-and-Resources#software)!
And although the client can hold millions of files, it starts to creak and chug when displaying or otherwise tracking more than about 40,000 or so in a single gui window. As you learn to use it, please try not to let your download queues or general search pages regularly sit at more than 40 or 50k total _items_, or you'll start to slow other things down. Another common mistake is to leave one large 'system:everything' or 'system:inbox' page open with 70k+ files. For these sorts of 'ongoing processing' pages, try adding a 'system:limit=256' to keep them snappy. One user mentioned he had regular gui hangs of thirty seconds or so, and when we looked into it, it turned out his handful of download pages had three million files queued up! Just try and take things slow until you figure out what your computer's limits are.

View File

@ -0,0 +1,183 @@
---
title: installing and updating
---
If any of this is confusing, a simpler guide is [here](https://github.com/Zweibach/text/blob/master/Hydrus/Hydrus%20Help%20Docs/00_tableOfContents.md), and some video guides are [here](https://github.com/CuddleBear92/Hydrus-guides)!
## downloading
You can get the latest release at [my github releases page](https://github.com/hydrusnetwork/hydrus/releases).
I try to release a new version every Wednesday by 8pm EST and write an accompanying post on [my tumblr](http://hydrus.tumblr.com/) and a Hydrus Network General thread on [8chan.moe /t/](https://8chan.moe/t/catalog.html).
## installing
!!! warning ""
The hydrus releases are 64-bit only. If you are a python expert, there is the slimmest chance you'll be able to get it running from source on a 32-bit machine, but it would be easier just to find a newer computer to run it on.
=== "Windows"
* If you want the easy solution, download the .exe installer. Run it, hit ok several times.
* If you know what you are doing and want a little more control, get the .zip. Don't extract it to Program Files unless you are willing to run it as administrator every time (it stores all its user data inside its own folder). You probably want something like D:\\hydrus.
* _Note: if you run a version of Windows older than 10, you may need [Visual C++ Redistributable for Visual Studio 2015](https://www.microsoft.com/en-us/download/details.aspx?id=48145), if you don't already have it for vidya. If you run Win7, you will need some or all of the core OS updates released before 2017._
* If you use Windows 10 N (a version of Windows without some media playback features), you will likely need the 'Media Feature Pack'. There have been several versions of this, so it may be best found by searching for the latest version or hitting Windows Update, but otherwise check [here](https://support.microsoft.com/en-us/help/3145500/media-feature-pack-list-for-windows-n-editions).
=== "macOS"
* Get the .dmg App. Open the dmg, drag the App to Applications, and check the readme inside.
=== "Linux"
* Get the .tar.gz. Extract it somewhere useful and create shortcuts to 'client' and 'server' as you like. The build is made on Ubuntu, so if you run something else, compatibility is hit and miss.
* If you use Arch Linux, you can check out the AUR package a user maintains [here](https://aur.archlinux.org/packages/hydrus/).
* If you have problems running the Ubuntu build, users with some python experience generally find running from source works well.
* You might need to get 'libmpv1' to get mpv working and playing video/audio. This is the mpv library, not the player. Check help->about to see if it is available--if not, see if you can get it with _apt_.
* You can also try [running the Windows version in wine](wine.md).
=== "From Source"
* If you have some python experience, you can [run from source](running_from_source.md).
By default, hydrus stores all its data—options, files, subscriptions, _everything_—entirely inside its own directory. You can extract it to a usb stick, move it from one place to another, have multiple installs for multiple purposes, wrap it all up inside a truecrypt volume, whatever you like. The .exe installer writes some unavoidable uninstall registry stuff to Windows, but the 'installed' client itself will run fine if you manually move it.
!!! info "For macOS users"
The Hydrus App is **non-portable** and puts your database in `~/Library/Hydrus` (i.e. `/Users/[You]/Library/Hydrus`). You can update simply by replacing the old App with the new, but if you wish to backup, you should be looking at `~/Library/Hydrus`, not the App itself.
## anti-virus { id="anti_virus" }
Hydrus is made by an Anon out of duct tape and string. It combines file parsing tech with lots of network and database code in unusual and powerful ways, and all through a hacked-together executable that isn't signed by any big official company.
Unfortunately, we have been hit by anti-virus false positives throughout development. Every few months, one or more of the larger anti-virus programs sees some code that looks like something bad, or they run the program in a testbed and don't like something it does, and then they quarantine it. Every single instance of this so far has been a false positive. They usually go away the next week or two when the next set of definitions roll out. Some hydrus users are kind enough to report the program as a false positive to the anti-virus companies themselves, which also helps here.
Some users have never had the problem, some get hit regularly. The situation is obviously worse on Windows. If you try to extract the zip and client.exe or the whole folder suddenly disappears, please check your anti-virus software.
I am interested in reports about these false-positives, just so I know what is going on. Sometimes I have been able to reduce problems by changing something in the build (one of these was, no shit, an anti-virus testbed running the installer and then opening the help html at the end, which launched Edge browser, which then triggered Windows Update, which hit UAC and was considered suspicious. I took out the 'open help' checkbox from the installer as a result).
You should be careful about random software online. For my part, the program is completely open source, and I have a long track record of designing it with privacy foremost. There is no intentional spyware of any sort--the program never connects to another computer unless you tell it to. Furthermore, the exe you download is now built on github's cloud, so there are very few worries about a trojan-infected build environment putting something I did not intend into the program (as there once were when I built the release on my home machine). That doesn't stop Windows Defender from sometimes calling it an ugly name like "Tedy.4675" and definitively declaring "This program is dangerous and executes commands from an attacker" but that's the modern anti-virus ecosystem.
There aren't excellent solutions to this problem. I don't like to say 'just exclude the program directory from your anti-virus settings', but some users are comfortable with this and say it works fine. One thing I do know that helps (with other things too), if you are using the default Windows Defender, is going into the Windows Security shield icon on your taskbar, and 'virus and threat protection' and then 'virus and threat protection settings', and turning off 'Cloud-delivered protection' and 'Automatic sample submission'. It seems with these on, Windows will talk with a central server about executables you run and download early updates, and this gives a lot of false positives.
If you are still concerned, please feel free to run from source, as above. You are controlling everything, then, and can change anything about the program you like. Or you can only run releases from four weeks ago, since you know the community would notice by then if there ever were a true positive. Or just run it in a sandbox and watch its network traffic.
In 2022 I am going to explore a different build process to see if that reduces the false positives. We currently make the executable with PyInstaller, which has some odd environment set-up the anti-virus testbeds don't seem to like, and perhaps PyOxidizer will be better. We'll see.
## running
To run the client:
=== "Windows"
* For the installer, run the Start menu shortcut it added.
* For the extract, run 'client.exe' in the base directory, or make a shortcut to it.
=== "macOS"
* Run the App you installed.
=== "Linux"
* Run the 'client' executable in the base directory. You may be able to double-click it; otherwise, run `./client` from the terminal.
* If you experience virtual memory crashes, please review [this thorough guide](Fixing_Hydrus_Random_Crashes_Under_Linux.md) by a user.
## updating
!!! warning
Hydrus is imageboard-tier software, wild and fun but unprofessional. It is written by one Anon spinning a lot of plates. Mistakes happen from time to time, usually in the update process. There are also no training wheels to stop you from accidentally overwriting your whole db if you screw around. Be careful when updating. Make backups beforehand!
**Hydrus does not auto-update. It will stay the same version unless you download and install a new one.**
Although I put out a new version every week, you can update far less often if you want. The client keeps to itself, so if it does exactly what you want and a new version does nothing you care about, you can just leave it. Other users enjoy updating every week, simply because it makes for a nice schedule. Others like to stay a week or two behind what is current, just in case I mess up and cause a temporary bug in something they like.
A user has written a longer and more formal guide to updating, and information on the 334->335 step [here](update_guide.rtf).
The update process:
* If the client is running, close it!
* If you maintain a backup, run it now!
* If you use the installer, just download the new installer and run it. It should detect where the last install was and overwrite everything automatically.
* If you extract, then just extract the new version right on top of your current install and overwrite manually.
* Start your client or server. It may take a few minutes to update its database. I will say in the release post if it is likely to take longer.
Unless the update specifically disables or reconfigures something, all your files and tags and settings will be remembered after the update.
Releases typically need to update your database to their version. New releases can retroactively perform older database updates, so if the new version is v255 but your database is on v250, you generally only need to get the v255 release, and it'll do all the intervening v250->v251, v251->v252, etc... update steps in order as soon as you boot it. If you need to update from a release more than, say, ten versions older than current, see below. You might also like to skim the release posts or [changelog](changelog.md) to see what is new.
Clients and servers of different versions can usually connect to one another, but from time to time, I make a change to the network protocol, and you will get polite error messages if you try to connect to a newer server with an older client or _vice versa_. There is still no _need_ to update the client--it'll still do local stuff like searching for files completely fine. Read my release posts and judge for yourself what you want to do.
## clean installs
**This is only relevant if you update and cannot boot at all.**
Very rarely, hydrus needs a clean install. This can be due to a special update like when we moved from 32-bit to 64-bit or needing to otherwise 'reset' a custom install situation. The problem is usually that a library file has been renamed in a new version and hydrus has trouble figuring out whether to use the older one (from a previous version) or the newer.
In any case, if you cannot boot hydrus and it either fails silently or you get a crash log or system-level error popup complaining in a technical way about not being able to load a dll/pyd/so file, you may need a clean install, which essentially means clearing any old files out and reinstalling.
However, you need to be careful not to delete your database! It sounds silly, but at least one user has made a mistake here. The process is simple, do not deviate:
* Make a backup if you can!
* Go to your install directory.
* Delete all the files and folders except the 'db' dir (and all of its contents, obviously).
* Reinstall/extract hydrus as you normally do.
After that, you'll have a 'clean' version of hydrus that only has the latest version's dlls. If hydrus still will not boot, I recommend you roll back to your last working backup and let me, hydrus dev, know what your error is.
## big updates
If you have not updated in some time--say twenty versions or more--doing it all in one jump, like v250->v290, is likely not going to work. I am doing a lot of unusual stuff with hydrus, change my code at a fast pace, and do not have a ton of testing in place. Hydrus update code often falls to [bitrot](https://en.wikipedia.org/wiki/Software_rot), and so some underlying truth I assumed for the v255->v256 code may not still apply six months later. If you try to update more than 50 versions at once (i.e. trying to perform more than a year of updates in one go), the client will give you a polite error rather than even try.
As a result, if you get a failure on trying to do a big update, try cutting the distance in half--try v270 first, and then if that works, try v270->v290. If it doesn't, try v260, and so on.
If you narrow the gap down to just one version and still get an error, please let me know. I am very interested in these sorts of problems and will be happy to help figure out a fix with you (and everyone else who might be affected).
## backing up
!!! danger "I am not joking around: if you end up liking hydrus, you should back up your database"
**Maintaining a regular backup is important for hydrus.** The program stores a lot of complicated data that you will put hours and hours of work into, and if you only have one copy and your hard drive breaks, you could lose everything. This has happened before--to people who thought it would never happen to them--and it sucks big time to go through. **Don't let it be you.**
Hydrus's database engine, SQLite, is excellent at keeping data safe, but it cannot work in a faulty environment. Ways in which users of hydrus have damaged/lost their database:
* Hard drive hardware failure (age, bad ventilation, bad cables, etc...)
* Lightning strike on non-protected socket or rough power cut on non-UPS'd power supply
* RAM failure
* Motherboard/PSU power problems
* Accidental deletion
* Accidental overwrite (usually during a borked update)
* Encrypted partition auto-dismount/other borked settings
* Cloud backup interfering with ongoing writes
* Network drive location not guaranteeing accurate file locks
Some of those you can mitigate (don't run the database over a network!) and some will always be a problem, but if you have a backup, none of them can kill you.
If you do not already have a backup routine for your files, this is a great time to start. I now run a backup every week of all my data so that if my computer blows up or anything else awful happens, I'll at worst have lost a few days' work. Before I did this, I once lost an entire drive with tens of thousands of files, and it felt awful. If you are new to saving a lot of media, I hope you can avoid what I felt. ;\_;
I use [ToDoList](http://abstractspoon.com/) to remind me of my jobs for the day, including backup tasks, and [FreeFileSync](http://sourceforge.net/projects/freefilesync/) to actually mirror over to an external usb drive. I recommend both highly (and for ToDoList, I recommend hiding the complicated columns, stripping it down to a simple interface). It isn't a huge expense to get a couple-TB usb drive either--it is **absolutely** worth it for the peace of mind.
By default, hydrus stores all your user data in one location, so backing up is simple:
#### the simple way - inside the client
Go _database->set up a database backup location_ in the client. This will tell the client where you want your backup to be stored. A fresh, empty directory on a different drive is ideal.
Once you have your location set up, you can thereafter hit _database->update database backup_. It will lock everything and mirror your files, showing its progress in a popup message. The first time you make this backup, it may take a little while (as it will have to fully copy your database and all its files), but after that, it will only have to copy new or altered files and should only ever take a couple of minutes.
Advanced users who have migrated their database across multiple locations will not have this option--use an external program in this case.
#### the powerful (and best) way - using an external program
If you would like to integrate hydrus into a broader backup scheme you already run, or you are an advanced user with a complicated hydrus install that you have migrated across multiple drives, then you need to backup two things: the client\*.db files and your client\_files directory(ies). By default, they are all stored in install\_dir/db. The .db files contain your settings and file metadata like inbox/archive and tags, while the client\_files subdirs store your actual media and its thumbnails. If everything is still under install\_dir/db, then it is usually easiest to just backup the whole install dir, keeping a functional 'portable' copy of your install that you can restore no prob. Make sure you keep the .db files together--they are not interchangeable and mostly useless on their own!
Shut the client down while you run the backup, obviously.
!!! danger
Do not put your live database in a folder that continuously syncs to a cloud backup. Many of these services will interfere with a running client and can cause database corruption. If you still want to use a system like this, either turn the sync off while the client is running, or use the above backup workflows to safely backup your client to a separate folder that syncs to the cloud.
I recommend you always backup before you update, just in case there is a problem with my code that breaks your database. If that happens, please [contact me](contact.md), describing the problem, and revert to the functioning older version. I'll get on any problems like that immediately.
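For reference, the simplest possible external 'mirror everything' script looks something like this. The paths are hypothetical--point them at your own install and backup locations--and it must only run while the client is closed:

```python
import shutil
from pathlib import Path

# Hypothetical paths--substitute your own install and backup locations.
DB_DIR = Path('D:/hydrus/db')
BACKUP_DIR = Path('E:/hydrus_backup/db')

# Only run while the client is closed! Unlike FreeFileSync, this naive copy
# re-transfers everything each time rather than just new/altered files.
# dirs_exist_ok lets repeat runs overwrite the previous mirror (Python 3.8+).
shutil.copytree(DB_DIR, BACKUP_DIR, dirs_exist_ok=True)
print('backup complete')
```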
## backing up with not much space { id="backing_up_small" }
If you decide not to maintain a backup because you cannot afford drive space for all your files, please please at least back up your actual database files. Use FreeFileSync or a similar program to back up the four 'client*.db' files in install_dir/db when the client is not running. Just make sure you have a copy of those files, and then if your main install becomes damaged, we will have a reference to either roll back to or manually restore data from. Even if you lose a bunch of media files in this case, with an intact database we'll be able to schedule recovery of anything with a URL.
If you are really short on space, note also that the database files are very compressible. A very large database where the four files add up to 70GB can compress down to 17GB zip with 7zip on default settings. This obviously takes some additional time to do, but if you are really short on space it may be the only way it fits, and if your only backup drive is a slow USB stick, then you might actually save time from not having to transfer the other 53GB! Media files (jpegs, webms, etc...) are generally not very compressible, usually 5% at best, so it is usually not worth trying.
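If you only want the compressed 'database files only' backup described above, a minimal sketch (path hypothetical, client closed) might look like this:

```python
import zipfile
from pathlib import Path

DB_DIR = Path('D:/hydrus/db')  # hypothetical install location

# Only run while the client is closed. DEFLATE squeezes the .db files well,
# and the client*.db glob catches all four database files.
with zipfile.ZipFile('client_db_backup.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    for db_file in sorted(DB_DIR.glob('client*.db')):
        zf.write(db_file, arcname=db_file.name)
```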
It is best to have all four database files. It is generally easy and quick to fix problems if you have a backup of all four. If client.caches.db is missing, you can recover but it might take ten or more hours of CPU work to regenerate. If client.mappings.db is missing, you might be able to recover tags for your local files from a mirror in an intact client.caches.db. However, client.master.db and client.db are the most important. If you lose either of those, or they become too damaged to read and you have no backup, then your database is essentially dead and likely every single archive and view and tag and note and url record you made is lost. This has happened before, do not let it be you.

View File

@ -0,0 +1,96 @@
---
title: more files
---
# more getting started with files
## searching with wildcards { id="wildcards" }
The autocomplete tag dropdown supports wildcard searching with `*`.
![](images/wildcard_gelion.png)
The `*` will match any number of characters. Every normal autocomplete search has a secret `*` on the end that you don't see, which is how full words get matched from you only typing in a few letters.
This is useful when you can only remember part of a word, or can't spell part of it. You can put `*` characters anywhere, but you should experiment to get used to the exact way these searches work. Some results can be surprising!
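Hydrus does its own matching internally, but Python's `fnmatch` treats `*` the same way, so it is a handy toy for testing a pattern before you type it:

```python
from fnmatch import fnmatch

# The secret trailing * is why a few typed letters match whole tags...
print(fnmatch('evangelion', 'evan' + '*'))  # True
# ...and explicit wildcards can match anywhere inside the tag.
print(fnmatch('evangelion', '*gelion'))     # True
print(fnmatch('evangelion', '*va*ge*'))     # True
print(fnmatch('evangelion', 'gelion'))      # False--no wildcard, no match
```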
![](images/wildcard_vage.png)
You can select the special predicate inserted at the top of your autocomplete results (the highlighted `*gelion` and `*va*ge*` above). **It will return all files that match that wildcard,** i.e. every file for every other tag in the dropdown list.
This is particularly useful if you have a number of files with commonly structured over-informationed tags, like this:
![](images/wildcard_cool_pic.png)
In this case, selecting the `title:cool pic*` predicate will return all three images in the same search, where you can conveniently give them some more-easily searched tags like `series:cool pic` and `page:1`, `page:2`, `page:3`.
## more searching
Let's look at the tag autocomplete dropdown again:
![](images/ac_dropdown.png)
* **favourite searches star**
Once you get experience with the client, have a play with this. Rather than leaving common search pages open, save them in here and load them up as needed. You will keep your client lightweight and save time.
* **include current/pending tags**
Turn these on and off to control whether tag _search predicates_ apply to tags that exist, or limit just to those pending to be uploaded to a tag repository. Just searching 'pending' tags is useful if you want to scan what you have pending to go up to the PTR--just turn off 'current' tags and search `system:num tags > 0`.
* **searching immediately**
This controls whether a change to the search tags will instantly run the new search and get new results. Turning this off is helpful if you want to add, remove, or replace several heavy search terms in a row without getting UI lag.
* **OR**
You only see this if you have 'advanced mode' on. It is an experimental module. Have a play with it--it lets you enter some pretty complicated tags!
* **file/tag domains**
By default, you will search in 'my files' and 'all known tags' domain. This is the intersection of your local media files (on your hard disk) and the union of all known tag searches. If you search for `character:samus aran`, then you will get file results from your 'my files' domain that have `character:samus aran` in any tag service. For most purposes, this search domain is fine, but as you use the client more, you may want to access different search domains.
For instance, if you change the file domain to 'trash', then you will instead get files that are in your trash. Setting the tag domain to 'my tags' will ignore other tag services (e.g. the PTR) for all tag search predicates, so a `system:num_tags` or a `character:samus aran` will only look at 'my tags'.
Turning on 'advanced mode' gives access to more search domains. Some of them are subtly complicated and only useful for clever jobs--most of the time, you still want 'my files' and 'all known tags'.
## sorting with system limit
If you add system:limit to a search, the client will consider what that page's file sort currently is. If it is simple enough--something like file size or import time--then it will sort your results before they come back and clip the limit according to that sort, getting the n 'largest file size' or 'newest imports' and so on. This can be a great way to set up a lightweight filtering page for 'the 256 biggest videos in my inbox'.
If you change the sort, hydrus will not refresh the search; it'll just re-sort the n files you have. Hit ++f5++ to refresh the search with a new sort.
Not all sorts are supported. Anything complicated like tag sort will result in a random sample instead.
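In other words, a sorted system:limit behaves like 'order the whole result set, then clip'. A toy model of that (names illustrative, not the client's internals):

```python
def sorted_and_clipped(files, sort_key, n=256, largest_first=True):
    # Order the full result set by the simple sort, then keep the top n.
    return sorted(files, key=sort_key, reverse=largest_first)[:n]

# e.g. 'the 256 biggest videos in my inbox':
# biggest = sorted_and_clipped(inbox_videos, sort_key=lambda f: f.size)
```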
## exporting and uploading { id="intro" }
There are many ways to export files from the client:
* **drag and drop**
Just dragging from the thumbnail view will export (copy) all the selected files to wherever you drop them.
The files will be named by their ugly hexadecimal [hash](faq.md#hashes), which is how they are stored inside the database (a short sketch of this naming follows the list).
If you use this to open a file inside an image editing program, remember to go 'save as' and give it a new filename! The client does not expect files inside its db directory to change.
* **export dialog**
Right clicking some files and selecting _share->export->files_ will open this dialog:
![](images/export.png)
Which lets you export the selected files with custom filenames. It will initialise trying to export the files named by their hashes, but once you are comfortable with tags, you'll be able to generate much cleverer and prettier filenames.
* **share->copy->files**
This will copy the files themselves to your clipboard. You can then paste them wherever you like, just as with normal files. They will have their hashes for filenames.
This is a very quick operation. It can also be triggered by hitting ++ctrl+c++.
* **share->copy->hashes**
This will copy the files' unique identifiers to your clipboard, in hexadecimal.
You will not have to do this often. It is best when you want to identify a number of files to someone else without having to send them the actual files.
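As mentioned in the drag-and-drop item above, exported files are named by their content [hash](faq.md#hashes). Here is a minimal sketch of how such a content-based name is derived--the FAQ covers hydrus's use of SHA-256, but treat the exact filename format as illustrative:

```python
import hashlib
from pathlib import Path

def hydrus_style_name(path):
    # The name depends only on the file's bytes, never on where it came from.
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return digest + Path(path).suffix.lower()

# e.g. hydrus_style_name('some_image.jpg') -> '<64 hex characters>.jpg'
```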

View File

@ -0,0 +1,38 @@
---
title: ratings
---
# getting started with ratings
The hydrus client supports two kinds of ratings: _like/dislike_ and _numerical_. Let's start with the simpler one:
## like/dislike { id="like_dislike" }
A new client starts with one of these, called 'favourites'. It can set one of two values to a file. It does not have to represent like or dislike--it can be anything you want, like 'send to export folder' or 'explicit/safe' or 'cool babes'. Go to _services->manage services->local->like/dislike ratings_:
![](images/ratings_like.png)
You can set a variety of colours and shapes.
## numerical
This is '3 out of 5 stars' or '8/10'. You can set the range to whatever whole numbers you like:
![](images/ratings_numerical.png)
As well as the shape and colour options, you can set how many 'stars' to display and whether 0/10 is permitted.
If you change the star range at a later date, any existing ratings will be 'stretched' across the new range. As values are collapsed to the nearest integer, this is best done for scales that are multiples. 2/5 will neatly become 4/10 on a zero-allowed service, for instance, and 0/4 can nicely become 1/5 if you disallow zero ratings in the same step. If you didn't intuitively understand that, just don't touch the number of stars or zero rating checkbox after you have created the numerical rating service!
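If you are curious what that 'stretch' looks like, here is a toy model--the real conversion lives inside the client, but the arithmetic of the examples above works out like this:

```python
def stretch_rating(old_stars, old_max, new_max, new_allows_zero=True):
    # Treat the rating as a fraction of the old range, rescale it, then
    # collapse to the nearest whole star--multiples convert cleanly,
    # other ranges round and lose precision.
    fraction = old_stars / old_max
    new_stars = round(fraction * new_max)
    if not new_allows_zero:
        new_stars = max(new_stars, 1)
    return new_stars

print(stretch_rating(2, 5, 10))                        # 4, i.e. 2/5 -> 4/10
print(stretch_rating(0, 4, 5, new_allows_zero=False))  # 1, i.e. 0/4 -> 1/5
```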
## now what? { id="using_ratings" }
Ratings are displayed in the top-right of the media viewer:
![](images/ratings_ebola_chan.png)
Hovering over each control will pop up its name, in case you forget which is which. You can then set them with a left- or right-click. Like/dislike and numerical have slightly different click behaviour, so have a play with them to get their feel. Pressing ++f4++ on a selection of thumbnails will open a dialog with a very similar layout, which will let you set the same rating to many files simultaneously.
Once you have some ratings set, you can search for them using system:rating, which produces this dialog:
![](images/ratings_system_pred.png)
On my own client, I find it useful to have several like/dislike ratings set up as one-click pseudo-tags, like the 'OP images' above.

View File

@ -0,0 +1,128 @@
---
title: subscriptions
---
# getting started with subscriptions
Do not try to create a subscription until you are comfortable with a normal gallery download page! Go [here](getting_started_downloading.md).
Let's say you found an artist you like. You downloaded everything of theirs from some site, but one or two pieces of new work is posted every week. You'd like to keep up with the new stuff, but you don't want to manually make a new download job every week for every single artist you like.
## what are subs? { id="intro" }
Subscriptions are a way of telling the client to regularly and quietly repeat a gallery search. You set up a number of saved queries, and the client will 'sync' with the latest files in the gallery and download anything new, just as if you were running the download yourself.
Subscriptions only work for booru-like galleries that put the newest files first, and they only keep up with new content--once they have done their first sync, which usually gets the most recent hundred files or so, they will never reach further into the past. Getting older files, as you will see later, is a job best done with a normal download page.
Here's the dialog, which is under _network->downloaders->manage subscriptions_:
![](images/subscriptions_edit_subscriptions.png)
This is a very simple example--there is only one subscription, for safebooru. It has two 'queries' (i.e. searches to keep up with).
It is important to note that while subscriptions can have multiple queries (even hundreds!), they _generally_ only work on one site. Expect to create one subscription for safebooru, one for artstation, one for paheal, and so on for every site you care about. Advanced users may be able to think of ways to get around this, but I recommend against it as it throws off some of the internal check timing calculations.
Before we trip over the advanced buttons here, let's zoom in on the actual subscription:
[![](images/subscriptions_edit_subscription.png)](images/subscriptions_edit_subscription.png)
This is a big and powerful panel! I recommend you open the screenshot up in a new browser tab, or in the actual client, so you can refer to it.
Despite all the controls, the basic idea is simple: Up top, I have selected the 'safebooru tag search' download source, and then I have added two artists--"hong_soon-jae" and "houtengeki". These two queries have their own panels for reviewing what URLs they have worked on and further customising their behaviour, but all they _really_ are is little bits of search text. When the subscription runs, it will put the given search text into the given download source just as if you were running the regular downloader.
**For the most part, all you need to do to set up a good subscription is give it a name, select the download source, and use the 'paste queries' button to paste what you want to search. Subscriptions have great default options for almost all query types, so you don't have to go any deeper than that to get started.**
!!! danger
**Do not change the max number of new files options until you know _exactly_ what they do and have a good reason to alter them!**
## how do subscriptions work? { id="description" }
Once you hit ok on the main subscription dialog, the subscription system should immediately come alive. If any queries are due for a 'check', they will perform their search and look for new files (i.e. URLs it has not seen before). Once that is finished, the file download queue will be worked through as normal. Typically, the sub will make a popup like this while it works:
![](images/subscriptions_popup.png)
The initial sync can sometimes take a few minutes, but after that, each query usually only needs thirty seconds' work every few days. If you leave your client on in the background, you'll rarely see them. If they ever get in your way, don't be afraid to click their little cancel button or call a global halt with _network->pause->subscriptions_--the next time they run, they will resume from where they were before.
Similarly, the initial sync may produce a hundred files, but subsequent runs are likely to only produce one to ten. If a subscription comes across a lot of big files at once, it may not download them all in one go--but give it time, and it will catch back up before you know it.
When it is done, it leaves a little popup button that will open a new page for you:
![](images/subscriptions_thumbnails.png)
This can often be a nice surprise!
## what makes a good subscription? { id="good_subs" }
The same rules as for downloaders apply: **start slow, be hesitant, and plan for the long-term.** Artist queries make great subscriptions as they update reliably but not too often and have very stable quality. Pick the artists you like most, see where their stuff is posted, and set up your subs like that.
Series and character subscriptions are sometimes valuable, but they can be difficult to keep up with and have highly variable quality. It is not uncommon for users to only keep 15% of what a character sub produces. I do not recommend them for anything but your waifu.
Attribute subscriptions like 'blue_eyes' or 'smile' make for terrible subs as the quality is all over the place and you will be inundated by too much content. The only exceptions are for specific, low-count searches that really matter to you, like 'contrapposto' or 'gothic trap thighhighs'.
If you end up subscribing to eight hundred things and get ten thousand new files a week, you made a mistake. Subscriptions are for _keeping up_ with things you like. If you let them overwhelm you, you'll resent them.
!!! warning
Subscriptions syncs are somewhat fragile. Do not try to play with the limits or checker options to download a whole 5,000 file query in one go--if you want everything for a query, run it in the manual downloader and get everything, then set up a normal sub for new stuff. There is no benefit to having a 'large' subscription, and it will trim itself down in time anyway.
It is a good idea to run a 'full' download for a search before you set up a subscription. As well as making sure you have the exact right query text and that you have everything ever posted (beyond the 100 files deep a sub will typically look), it saves the bulk of the work (and waiting on bandwidth) for the manual downloader, where it belongs. When a new subscription picks up off a freshly completed download queue, its initial subscription sync only takes thirty seconds since its initial URLs are those that were already processed by the manual downloader. I recommend you stack artist searches up in the manual downloader using 'no limit' file limit, and when they are all finished, select them in the list and _right-click->copy queries_, which will put the search texts in your clipboard, newline-separated. This list can be pasted into the subscription dialog in one go with the 'paste queries' button again!
!!! note
The entire subscription system assumes the source is a typical 'newest first' booru-style search. If you dick around with some order_by:rating/random metatag, it will not work reliably.
## how often do subscriptions check? { id="checking" }
Hydrus subscriptions use the same variable-rate checking system as its thread watchers, just on a larger timescale. If you subscribe to a busy feed, it might check for new files once a day, but if you enter an artist who rarely posts, it might only check once every month. You don't have to do anything. The fine details of this are governed by the 'checker options' button. **This is one of the things you should not mess with as you start out.**
If a query goes too 'slow' (typically, this means no new files for 180 days), it will be marked DEAD in the same way a thread will, and it will not be checked again. You will get a little popup when this happens. This is all editable as you get a better feel for the system--if you wish, it is completely possible to set up a sub that never dies and only checks once a year.
I do not recommend setting up a sub that needs to check more than once a day. Any search that is producing that many files is probably a bad fit for a subscription. **Subscriptions are for lightweight searches that are updated every now and then.**
* * *
_(you might like to come back to this point once you have tried subs for a week or so and want to refine your workflow)_
* * *
## ok, I set up three hundred queries, and now these popup buttons are a hassle { id="presentation" }
On the edit subscription panel, the 'presentation' options let you publish files to a page. The page will have the subscription's name, just like the button makes, but it cuts out the middle-man and 'locks it in' more than the button, which will be forgotten if you restart the client. **Also, if a page with that name already exists, the new files will be appended to it, just like a normal import page!** I strongly recommend moving to this once you have several subs going. Make a 'page of pages' called 'subs' and put all your subscription landing pages in there, and then you can check it whenever is convenient.
If you discover your subscription workflow tends to be the same for each sub, you can also customise the publication 'label' used. If multiple subs all publish to the 'nsfw subs' label, they will all end up on the same 'nsfw subs' popup button or landing page. Sending multiple subscriptions' import streams into just one or two locations like this can be great.
You can also hide the main working popup. I don't recommend this unless you are really having a problem with it, since it is useful to have that 'active' feedback if something goes wrong.
Note that subscription file import options will, by default, only present 'new' files. Anything already in the db will still be recorded in the internal import cache and used to calculate next check times and so on, but it won't clutter your import stream. This is different to the default for all the other importers, but when you are ready to enter the ranks of the Patricians, you will know to edit your 'loud' default file import options under _options->importing_ to behave this way as well. Efficient workflows only care about new files.
## how exactly does the sync work? { id="syncing_explanation" }
Figuring out when a repeating search has 'caught up' can be a tricky problem to solve. It sounds simple, but unusual situations like 'a file got tagged late, so it inserted deeper than it ideally should in the gallery search' or 'the website changed its URL format completely, help' can cause problems. Subscriptions are automatic systems, so they tend to be a bit more careful and paranoid about problems, lest they burn 10GB on 10,000 unexpected diaperfur images.
The initial sync is simple. It does a regular search, stopping if it reaches the 'initial file limit' or the last file in the gallery, whichever comes first. The default initial file sync is 100, which is a great number for almost all situations.
Subsequent syncs are more complicated. It ideally 'stops' searching when it reaches files it saw in a previous sync, but if it comes across new files mixed in with the old, it will search a bit deeper. It is not foolproof, and if a file gets tagged very late and ends up a hundred deep in the search, it will probably be missed. There is no good and computationally cheap way at present to resolve this problem, but thankfully it is rare.
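A rough model of that 'stop at old files, but dig deeper if new ones are mixed in' rule--the real logic is more careful, and the threshold here is invented purely for illustration:

```python
SEEN_BLOCK_THRESHOLD = 5  # invented value, just for the sketch

def find_new_urls(gallery_urls, previously_seen):
    new_urls = []
    consecutive_seen = 0
    for url in gallery_urls:  # newest first
        if url in previously_seen:
            consecutive_seen += 1
            if consecutive_seen >= SEEN_BLOCK_THRESHOLD:
                break  # a solid block of old URLs--we have caught up
        else:
            consecutive_seen = 0  # new files mixed in--keep digging
            new_urls.append(url)
    return new_urls
```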
Remember that an important 'staying sane' philosophy of downloading and subscriptions is to focus on dealing with the 99.5% you have before worrying about the 0.5% you do not.
The amount of time between syncs is calculated by the checker options. Based on the timestamps attached to existing urls in the subscription cache (either added time, or the post time as parsed from the url), the sub estimates how long it will be before n new files appear, and the next check is scheduled for then. Unless you know what you are doing, checker options, like file limits, are best left alone. A subscription will naturally adapt its checking speed to the file 'velocity' of the source, and there is usually very little benefit to trying to force a sub to check at a radically different speed.
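As a sketch of that velocity estimate (constants and names invented for illustration, not the client's actual values):

```python
def seconds_until_next_check(recent_timestamps, files_per_check=5):
    # Estimate the query's file 'velocity' from recent history, then wait
    # until roughly files_per_check new files should have appeared.
    period = max(recent_timestamps) - min(recent_timestamps)
    velocity = len(recent_timestamps) / period  # files per second, roughly
    return files_per_check / velocity

DAY = 86400
# ~30 files over the last ~90 days -> next check in about a fortnight
print(seconds_until_next_check([i * 3 * DAY for i in range(30)]) / DAY)
```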
!!! tip
If you want to force your subs to run at the same time, say every evening, it is easier to just use _network->pause->subscriptions_ as a manual master on/off control. The ones that are due will catch up together, the ones that aren't won't waste your time.
Remember that subscriptions only keep up with new content. They cannot search backwards in time in order to 'fill out' a search, nor can they fill in gaps. **Do not change the file limits or check times to try to make this happen.** If you want to ensure complete sync with all existing content for a particular search, use the manual downloader.
In practice, most subs only need to check the first page of a gallery since only the first two or three urls are new.
## periodic file limit exceeded { id="periodic_file_limit" }
If, during a regular sync, the sub keeps finding new URLs, never hitting a block of already-seen URLs, it will stop upon hitting its 'periodic file limit', which is also usually 100. When it happens, you will get a popup message notification. There are two typical reasons for this:
* A user suddenly posted a large number of files to the site for that query. This sometimes happens with CG gallery spam.
* The website changed their URL format.
The first case is a natural accident of statistics. The subscription now has a 'gap' in its sync. If you want to get what you missed, you can try to fill in the gap with a manual downloader page. Just download to 200 files or so, and the downloader will work quickly to one-time work through the URLs in the gap.
The second case is a safety stopgap for hydrus. If a site decides to have `/post/123456` style URLs instead of `post.php?id=123456` style, hydrus will suddenly see those as entirely 'new' URLs. It could also be because of an updated downloader, which pulls URLs in API format or similar. This is again thankfully quite rare, but it triggers several problems--the associated downloader usually breaks, as it does not yet recognise those new URLs, and all your subs for that site will parse through and hit the periodic limit for every query. When this happens, you'll usually get several periodic limit popups at once, and you may need to update your downloader. If you know the person who wrote the original downloader, they'll likely want to know about the problem, or may already have a fix sorted. It is often a good idea to pause the affected subs until you have it figured out and working in a normal gallery downloader page.
## I put character queries in my artist sub, and now things are all mixed up { id="merging_and_separating" }
On the main subscription dialog, there are 'merge' and 'separate' buttons. These are powerful tools, and they will walk you through the process of pulling queries out of a sub and merging them back into a different one. Only subs that use the same download source can be merged. Give them a go, and if it all goes wrong, just hit the cancel button on the dialog.

View File

@ -0,0 +1,96 @@
---
title: tags
---
# getting started with tags
If any of this is confusing, a simpler guide is [here](https://github.com/Zweibach/text/blob/master/Hydrus/Hydrus%20Help%20Docs/00_tableOfContents.md), and some video guides are [here](https://github.com/CuddleBear92/Hydrus-guides)!
## how do we find files? { id="intro" }
So, you have stored some media in your database. Everything is hashed and cached. You can search by inbox and resolution and size and so on, but if you really want to find what you are looking for, you will have to use _tags_.
[FAQ: what is a tag?](faq.md#tags)
Your client starts with one local tags service, called 'my tags', which keeps all of its file->tag mappings in your client's database where only you can see them. It is a good place to practise. So, select a file and press ++f3++:
[![](images/sororitas_local.png)](images/sororitas_local.png)
The autocomplete dropdown in the manage tags dialog works very like the one in a normal search page--you type part of a tag, and matching results will appear below. You select the tag you want with the arrow keys and hit enter. Since your 'my tags' service doesn't have any tags in it yet, you won't get any results here except the exact match of what you typed. If you want to remove a tag, enter the exact same thing again or double-click it in the box above.
Prefixing a tag with a category and a colon will create a _namespaced_ tag. This helps inform the software and other users about what the tag is. Examples of namespaced tags are:
* `character:batman`
* `series:street fighter`
* `person:jennifer lawrence`
* `title:vitruvian man`
The client is set up to draw common namespaces in different colours, just like boorus do. You can change these colours in the options.
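For scripting-minded users: a namespaced tag is just text split on its first colon, so pulling it apart is trivial. A minimal sketch:

```python
def split_tag(tag):
    # 'namespace:subtag' splits on the first colon; unnamespaced tags
    # get an empty namespace.
    namespace, sep, subtag = tag.partition(':')
    return (namespace, subtag) if sep else ('', tag)

print(split_tag('character:batman'))  # ('character', 'batman')
print(split_tag('heresy'))            # ('', 'heresy')
```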
Once you are happy with your tags, hit 'apply' or just press enter on the text box if it is empty.
[![](images/sororitas_local_done.png)](images/sororitas_local_done.png)
The tags are now saved to your database. Searching for any of them will return this file and anything else so tagged:
[![](images/sororitas_search.png)](images/sororitas_search.png)
If you add more tags or system predicates to a search, you will limit the results to those files that match every single one:
[![](images/sororitas_hanako.png)](images/sororitas_hanako.png)
You can also exclude a tag by prefixing it with a hyphen (e.g. `-heresy`).
## OR searching
Searches find files that match every search 'predicate' in the list (it is an **AND** search), which makes it difficult to search for files that include one **OR** another tag. More recently, simple OR search support was added. All you have to do is hold down Shift when you enter/double-click a tag in the autocomplete entry area. Instead of sending the tag up to the active search list up top, it will instead start an under-construction 'OR chain' in the tag results below:
![](images/or_under_construction.png)
You can keep searching for and entering new tags. Holding down Shift on new tags will extend the OR chain, and entering them as normal will 'cap' the chain and send it to the complete and active search predicates above.
![](images/or_done.png)
Any file that has one or more of those OR sub-tags will match.
If you enter an OR tag incorrectly, you can either cancel or 'rewind' the under-construction search predicate with these new buttons that will appear:
![](images/or_buttons.png)
You can also cancel an under-construction OR by hitting ++escape++ on an empty input. You can add any sort of search term to an OR search predicate, including system predicates. Some unusual sub-predicates (typically a `-tag`, or a very broad system predicate) can run very slowly, but they will run much faster if you include non-OR search predicates in the search:
![](images/or_mixed.png)
This search will return all files that have the tag `fanfic` and one or more of `medium:text`, a positive value for the like/dislike rating 'read later', or PDF mime.
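Put another way, an OR chain acts as one predicate among the usual AND predicates: every regular predicate must match, and at least one OR sub-predicate must. A toy model (tag names illustrative):

```python
def matches(file_tags, and_predicates, or_chain):
    # Every regular predicate must hold, plus at least one OR sub-predicate.
    return (all(p(file_tags) for p in and_predicates)
            and any(p(file_tags) for p in or_chain))

def has(tag):
    return lambda tags: tag in tags

or_chain = [has('medium:text'), has('mime:pdf')]
print(matches({'fanfic', 'medium:text'}, [has('fanfic')], or_chain))  # True
print(matches({'fanfic'}, [has('fanfic')], or_chain))                 # False
```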
## tag repositories
It can take a long time to tag even small numbers of files well, so I created _tag repositories_ so people can share the work.
Tag repos store many file->tag relationships. Anyone who has an access key to the repository can sync with it and hence download all these relationships. If any of their own files match up, they will get those tags. Access keys will also usually have permission to upload new tags and ask for incorrect ones to be deleted.
Anyone can run a tag repository, but it is a bit complicated for new users. I ran a public tag repository for a long time, and now this large central store is run by users. It has over a billion tags and is free to access and contribute to.
To connect with it, please check [here](access_keys.md). **Please read that page if you want to try out the PTR. It is only appropriate for someone on an SSD!**
If you add it, your client will download updates from the repository over time and, usually when it is idle or shutting down, 'process' them into its database until it is fully synchronised. The processing step is CPU and HDD heavy, and you can customise when it happens in _file->options->maintenance and processing_. As the repository synchronises, you should see some new tags appear, particularly on famous files that lots of people have.
You can watch more detailed synchronisation progress in the _services->review services_ window.
![](images/review_repos_public_tag_repo.png)
Your new service should now be listed on the left of the manage tags dialog. Adding tags to a repository works very similarly to the 'my tags' service except hitting 'apply' will not immediately confirm your changes--it will put them in a queue to be uploaded. These 'pending' tags will be counted with a plus '+' or minus '-' sign:
[![](images/rlm_pending.png)](images/rlm_pending.png)
Notice that a 'pending' menu has appeared on the main window. This lets you start the upload when you are ready and happy with everything that you have queued.
When you upload your pending tags, they will commit and look to you like any other tag. The tag repository will anonymously bundle them into the next update, which everyone else will download in a day or so. They will see your tags just like you saw theirs.
If you attempt to remove a tag that has been uploaded, you may be prompted to give a reason, creating a petition that a janitor for the repository will review.
I recommend you not spam tags to the public tag repo until you get a rough feel for the [guidelines](https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/blob/master/tag%20guidelines), and my original [tag schema](tagging_schema.md) thoughts, or just lurk until you get the idea. It roughly follows what you will see on a typical booru. The general rule is to only add factual tags--no subjective opinion.
You can connect to more than one tag repository if you like. When you are in the _manage tags_ dialog, pressing the up or down arrow keys on an empty input switches between your services.
[FAQ: why can my friend not see what I just uploaded?](faq.md#delays)
