privacy

tl;dr: Using a trustworthy VPN for all your remotely fun internet traffic is a good idea. It is cheap and easy these days, and it offers multiple levels of general protection.

I have tried very hard to ensure the hydrus network servers respect your privacy. They do not work like normal websites, and the amount of information your client will reveal to them is very limited. For most general purposes, normal users can rest assured that their activity on a repository like the Public Tag Repository (PTR) is effectively completely anonymous.

You need an account to connect, but all that really means serverside is a random number with a random passcode. Your client tells the server nothing more than the exact content you upload to it (e.g. tag mappings, which are a tag+file_hash pair). The server cannot help but be aware of your IP address in order to accept your network request, but in all but one situation--uploading a file to a file repository when the administrator has chosen to save IPs for DMCA purposes--it forgets your IP as soon as the job is done.

So that janitors can process petitions efficiently and correct mistakes, servers remember which accounts upload which content, but they do not communicate this to any place, and the memory only lasts for a certain time--after which the content is completely anonymised. The main potential privacy worries are over a malicious janitor or--more realistically, since the janitor UI is even more buggy and feature-poor than the hydrus front-end!--a malicious server owner or anyone else who gains raw access to the server's database files or its code as it operates. Even in the case where you cannot trust the server you are talking to, hydrus should be fairly robust, simply because the client does not say much to the server, nor that often. The only realistic worries, as I talk about in detail below, are if you actually upload personal files or tag personal files with real names. I can't do much to keep you Anon if you (accidentally or not) declare who you are.

So, in general, if you are on a good VPN and tagging anime babes from boorus, I think we are near perfect on privacy. That said, our community is rightly constantly thinking about this topic, so in the following I have tried to go into exhaustive detail. Some of the vulnerabilities are impractical and esoteric, but if nothing else it is fun to think about. If you can think of more problems, or decent mitigations, let me know!

https certificates

Hydrus servers only communicate in https, so anyone who is able to casually observe your traffic (say your roommate cracked your router, or the guy running the coffee shop whose wifi you are using likes to snoop) should not ever be able to see what data you are sending or receiving. If you do not use a VPN, they will be able to see that you are talking to the repository (and the repository will technically see who you are, too, though as above, it normally isn't interested). Someone more powerful, like your ISP or Government, may be able to do more:

You can check how good a server's certificate is by loading its base address in the form https://host:port into your browser. If it has a nice certificate--like the PTR--the welcome page will load instantly. If it is still on self-signed, you'll get one of those 'can't show this page unless you make an exception' browser error pages before it will show.
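The same check can be scripted. This is a minimal sketch using only Python's standard library, not anything from hydrus itself; the function names are made up for illustration, and you supply the host and port of your own repository. It treats a certificate that fails normal verification (as a self-signed one will) as the scripted equivalent of the browser's 'make an exception' page.

```python
import socket
import ssl


def strict_context() -> ssl.SSLContext:
    # The default context verifies the certificate chain and the hostname,
    # which is roughly what a browser does before showing a page.
    return ssl.create_default_context()


def certificate_verifies(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if the server's certificate passes default verification."""
    context = strict_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        # Self-signed or otherwise untrusted certificate.
        return False


# Usage (needs network access; host and port are placeholders):
# certificate_verifies("host", port)
```

Network failures that have nothing to do with the certificate (connection refused, timeouts) are deliberately left to raise, so they are not mistaken for a bad certificate.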

accounts

An account has two hex strings, like this:

When you generate a new account, the client first asks the server for a list of available auto-creatable account types, then asks for a registration token for one of them, then uses the token to generate an access key. The server is never told anything about you, and it forgets your IP address as soon as it finishes talking to you.
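The three steps can be pictured with a toy in-memory server. This is only a sketch of the shape of the exchange: the class, the method names, the 'user' account type, the key length, and the single-use token are all assumptions made for illustration, not the real hydrus API.

```python
import secrets


class ToyServer:
    """In-memory stand-in for a repository. All names here are hypothetical."""

    def __init__(self):
        self.account_types = ["user"]  # auto-creatable account types
        self.pending_tokens = {}       # registration token -> account type
        self.accounts = {}             # access key -> account record

    def get_auto_creatable_account_types(self):
        return list(self.account_types)

    def get_registration_token(self, account_type):
        if account_type not in self.account_types:
            raise ValueError("account type is not auto-creatable")
        token = secrets.token_hex(32)
        self.pending_tokens[token] = account_type
        return token

    def get_access_key(self, registration_token):
        # pop() makes the token single-use (an assumption of this sketch).
        account_type = self.pending_tokens.pop(registration_token)
        access_key = secrets.token_hex(32)
        # Note what is stored: an account type, and nothing about the user.
        self.accounts[access_key] = {"type": account_type}
        return access_key


def create_account(server):
    """Client side: list types, ask for a token, swap it for an access key."""
    account_types = server.get_auto_creatable_account_types()
    token = server.get_registration_token(account_types[0])
    return server.get_access_key(token)


server = ToyServer()
access_key = create_account(server)
```

The point of the sketch is what never appears in it: no name, email, or IP address is passed in or written down at any step.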

Your account also stores a bandwidth use record and some miscellaneous data such as when the account was created, if and when it expires, what permissions and bandwidth rules it has, an aggregate score of how often it has petitions approved rather than denied, and whether it is currently banned. I do not think someone inspecting the bandwidth record could figure out what you were doing based on byte counts (especially as with every new month the old month's bandwidth records are compressed to just one number) beyond the rough time you synced and whether you have done much uploading. Since only a janitor can see your account and could feasibly attempt to inspect bandwidth data, they would already know this information.
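To make the monthly compression concrete, here is a rough sketch of the idea. The record layout (a list of timestamp and byte-count pairs) and the function name are assumptions for illustration; the real storage format may differ.

```python
from collections import defaultdict
from datetime import datetime, timezone


def compress_old_months(records, now):
    """Collapse records from past months into one total per month.

    records: list of (unix_timestamp, num_bytes) pairs (hypothetical layout).
    Returns (monthly_totals, recent_records).
    """
    current_month = (now.year, now.month)
    monthly_totals = defaultdict(int)
    recent_records = []
    for timestamp, num_bytes in records:
        dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
        if (dt.year, dt.month) < current_month:
            # Old month: only the running total survives.
            monthly_totals[(dt.year, dt.month)] += num_bytes
        else:
            # Current month: finer-grained records are kept for now.
            recent_records.append((timestamp, num_bytes))
    return dict(monthly_totals), recent_records
```

Once a month has been collapsed, all that remains of it is a single byte count, which says nothing about when within the month the traffic happened.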

downloading

When you sync with a repository, your client will download and then keep up to date with all the metadata the server knows. This metadata is downloaded the same way by all users, and it comes in a completely anonymous format. The server does not know what you are interested in, and no one who downloads knows who uploaded what. Since the client regularly updates, a detailed analysis of the raw update files will reveal roughly when a tag or other row was added or deleted, although that timestamp is no more precise than the duration of the update period (by default, 100,000 seconds, or a little over a day).
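The precision limit is simple arithmetic. In this sketch, only the 100,000 second default comes from the text above; the function name and the idea that update windows are counted from a fixed starting time are assumptions for illustration.

```python
UPDATE_PERIOD = 100_000  # seconds, the default update period
SECONDS_PER_DAY = 86_400


def update_window(row_timestamp, first_update_start=0):
    """Return the (start, end) of the update window a row falls into.

    Someone reading the update files learns only this window,
    not the row's exact timestamp.
    """
    index = (row_timestamp - first_update_start) // UPDATE_PERIOD
    start = first_update_start + index * UPDATE_PERIOD
    return start, start + UPDATE_PERIOD


period_in_days = UPDATE_PERIOD / SECONDS_PER_DAY  # about 1.157 days

# Two rows added almost 50,000 seconds apart land in the same window:
window_a = update_window(250_000)
window_b = update_window(299_999)
```

Both rows in the example map to the same window, so from the outside they are indistinguishable in time.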

Your client will never ask the server for information about a particular file or tag. You download everything in generic chunks, form a local index of that information, and then all queries are performed on your own hard drive with your own CPU.
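As a small sketch of that pattern: the chunk format here (lists of tag and file hash pairs) follows the tag mapping description earlier on this page, but the function names and the in-memory dictionary are illustrative assumptions, not how the client actually stores its index.

```python
from collections import defaultdict


def build_local_index(update_chunks):
    """Fold generic update chunks into a local file_hash -> tags index."""
    index = defaultdict(set)
    for chunk in update_chunks:
        for tag, file_hash in chunk:
            index[file_hash].add(tag)
    return index


def tags_for_file(index, file_hash):
    """A purely local lookup. No network request is made here."""
    return sorted(index.get(file_hash, set()))


# Every user downloads the same chunks, whatever files they actually have.
chunks = [
    [("blue eyes", "hash_a"), ("smile", "hash_a")],
    [("blue eyes", "hash_b")],
]
local_index = build_local_index(chunks)
result = tags_for_file(local_index, "hash_a")
```

Because the lookup only touches the local index, the server never sees which hash was asked about.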

By just downloading, even if the server owner were to identify you by your IP address, all they know is that you sync. They cannot tell anything about your files.

In the case of a file repository, your client downloads all the thumbnails automatically, but then you download actual files separately as you like. The server does not log which files you download.

uploading

When you upload, your account is temporarily linked to the rows of content you add. This is so janitors can group petitions by who makes them, undo large mistakes easily, and even leave you a brief message (like "please stop adding those clothing siblings") for your client to pick up the next time it syncs your account. After the temporary period is over, all submissions are anonymised. So, what are the privacy concerns with that? Isn't the account 'Anon'?

Privacy can be tricky. Hydrus tech is obviously far, far better than anything normal consumers use, but here I believe are the remaining barriers to pure Anonymity, assuming someone with resources was willing to put a lot of work in to attack you:

I am using the PTR as the example since that is what most people are using. If you are uploading to a server run between friends, privacy is obviously more difficult to preserve--if there are only three users, it may not be too hard to figure out who is uploading the NarutoXSonichu diaperfur content! If you are talking to a server with a small group of users, don't upload anything crazy or personally identifying unless that's the point of the server.

account history anonymisation

As the PTR moved to multiple accounts, we talked more about the potential account cross-referencing worries. The threats are marginal today, but they may become a real problem in future. If the server database files were ever to fall into bad hands, having a years-old record of who uploaded what is not excellent. Like the AOL search leak, that data may have unpleasant ramifications, especially to an intelligent scraper in the future. This historical record is also not needed for most janitorial work.

Therefore, hydrus repositories now completely anonymise all uploads after a certain delay. It works by assigning ownership of every file, mapping, or tag sibling/parent to a special 'null' account, so all trace that your account uploaded any of it is deleted. It happens by default 90 days after the content is uploaded, but it can be more or less depending on the local admin and janitors. You can see the current 'anonymisation' period under review services.
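In outline, the anonymisation pass looks something like the sketch below. The 90 day default is from the text above; the row layout, field names, and the 'null' label are assumptions for illustration, and the real server works on its database rather than a list of dictionaries.

```python
NULL_ACCOUNT = "null"              # hypothetical label for the special account
ANONYMISATION_PERIOD = 90 * 86400  # 90 days in seconds, the default


def anonymise_old_rows(rows, now):
    """Reassign any row older than the period to the null account.

    rows: list of dicts with 'account' and 'uploaded_at' keys (hypothetical).
    The original uploader is overwritten, not archived anywhere.
    """
    for row in rows:
        if now - row["uploaded_at"] >= ANONYMISATION_PERIOD:
            row["account"] = NULL_ACCOUNT
    return rows


rows = [
    {"account": "abc123", "uploaded_at": 0},           # 100 days old
    {"account": "abc123", "uploaded_at": 80 * 86400},  # 20 days old
]
anonymise_old_rows(rows, now=100 * 86400)
```

After the pass, the older row belongs to the null account and the newer one still carries its uploader until it ages past the period too.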

If you are a janitor with the ability to modify accounts based on uploaded content, you will see that anything old brings up the null account. It is specially labelled, so you can't miss it. You cannot ban or otherwise alter this account. No one can actually use it.