
url classes

The fundamental connective part of the downloader system is the 'URL Class'. This object identifies and normalises URLs and links them to other components. When the client handles or presents a URL, it consults the respective URL Class on what to do.

the types of url

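For hydrus, a URL is useful if it is one of:

File URL: the raw media file itself, with no page around it.
Post URL: a page that typically hosts one file and its metadata, such as tags.
Gallery URL: a page that lists Post URLs or File URLs, such as a search result.
Watchable URL: like a Gallery URL, but for a page that keeps changing and is checked repeatedly, such as an imageboard thread.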

the components of a url

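For our purposes, a URL string has four parts:

Scheme: "http" or "https".
Location/Domain: the site's address, like "tbib.org".
Path Components: the slash-separated parts after the domain, like "index.php".
Query Parameters: the name=value pairs after the "?", like "page=post&s=view".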

So, let's look at the 'edit url class' panel, which is found under network->manage url classes:

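A TBIB File Page like https://tbib.org/index.php?page=post&s=view&id=6391256 is a Post URL. Let's go over the four components again:

Scheme: https.
Location/Domain: tbib.org.
Path Components: index.php.
Query Parameters: page=post, s=view, and id=6391256. The first two are the same for every post, but the id changes, so the rule for it should accept any number.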
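To make the four components concrete, here is a minimal Python sketch that pulls them out of the TBIB URL with the standard library's urllib.parse. The split_url function is hypothetical and is not how hydrus does it internally; it only shows which part of the string is which.

```python
from urllib.parse import parse_qsl, urlsplit

def split_url(url: str) -> dict:
    """Break a URL into the four parts a URL Class looks at (illustrative only)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "location": parts.netloc,
        # path components are the non-empty pieces between slashes
        "path": [p for p in parts.path.split("/") if p],
        # query parameters as name -> value pairs, in their original order
        "query": dict(parse_qsl(parts.query)),
    }

print(split_url("https://tbib.org/index.php?page=post&s=view&id=6391256"))
# {'scheme': 'https', 'location': 'tbib.org', 'path': ['index.php'], 'query': {'page': 'post', 's': 'view', 'id': '6391256'}}
```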

This URL Class will be assigned to any URL that matches the location, path, and query. Missing components in the URL will invalidate the match, but additional components will not!

For instance:

Only URL A will match

And:

Both URL A and B will match

And:

Both URL A and B will match, URL C will not
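That matching rule can be sketched roughly like this. It is a simplified model, not hydrus's actual code: the UrlClass name and its fields are made up for illustration, the location has to match exactly (no subdomain handling), and the scheme is ignored. The key point is that every rule needs a matching component, while anything extra is let through.

```python
from dataclasses import dataclass, field
from typing import Callable
from urllib.parse import parse_qsl, urlsplit

Rule = Callable[[str], bool]  # a test applied to one path component or query value

@dataclass
class UrlClass:  # hypothetical model, not hydrus's real class
    location: str
    path_rules: list[Rule] = field(default_factory=list)
    query_rules: dict[str, Rule] = field(default_factory=dict)

    def matches(self, url: str) -> bool:
        parts = urlsplit(url)
        # the scheme is ignored here: http and https are treated alike
        if parts.netloc != self.location:
            return False
        path = [p for p in parts.path.split("/") if p]
        # each rule needs its component; extra trailing components are allowed
        if len(path) < len(self.path_rules):
            return False
        if not all(rule(comp) for rule, comp in zip(self.path_rules, path)):
            return False
        query = dict(parse_qsl(parts.query))
        # each named parameter must be present and valid; extra ones are allowed
        return all(name in query and rule(query[name])
                   for name, rule in self.query_rules.items())

tbib_post = UrlClass(
    "tbib.org",
    path_rules=[lambda s: s == "index.php"],
    query_rules={"page": lambda s: s == "post", "s": lambda s: s == "view", "id": str.isdigit},
)

print(tbib_post.matches("https://tbib.org/index.php?page=post&s=view&id=6391256&tags=blue"))  # True
print(tbib_post.matches("https://tbib.org/index.php?page=post&s=view"))  # False, no id
```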

If multiple URL Classes match a URL, the client will try to assign the most 'complicated' one: the one with the most path components, then, to break ties, the one with the most query parameters.

Given two example URLs and URL Classes:

URL A will match URL Class A but not URL Class B and so will receive A.

URL B will match both and will receive URL Class B as it is more complicated.

This situation is not common, but I expect it to be an issue with Pixiv, where some Post URLs link to a subset of manga pages that have their own gallery system, wew.
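Picking the 'most complicated' class can be thought of as a tuple comparison: path component count first, then query parameter count. Here is a hypothetical sketch, assuming the candidates have already been filtered down to the ones that match; the Candidate type is made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:  # hypothetical stand-in for a matching URL Class
    name: str
    path_components: int   # how many path components the class defines
    query_parameters: int  # how many query parameters the class defines

def most_complicated(matching: list[Candidate]) -> Candidate:
    # compare path counts first; query counts only break ties.
    # on a full tie, max() keeps the first candidate (an arbitrary choice here)
    return max(matching, key=lambda c: (c.path_components, c.query_parameters))

a = Candidate("class a", 1, 1)
b = Candidate("class b", 1, 2)
print(most_complicated([a, b]).name)  # class b
```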

string matches

As you edit these components, you will be presented with the Edit String Match Panel:

This lets you set the type of string that will be valid for that component. If a given path or query component does not match the rules given here, the URL will not match the URL Class. Most of the time you will probably want to set 'fixed characters' of something like "post" or "index.php", but if the component you are editing is more complicated and could have a range of different valid values, you can specify just numbers or letters or even a regex pattern. If you try something complicated, test a few values in the 'example string' entry to make sure it behaves the way you think.

Don't go overboard with this stuff, though--most sites do not have super-fine distinctions between their different URL types, and hydrus users will not be dropping user account or logout pages or whatever on the client, so you can be fairly liberal with the rules.
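To give a feel for how these rules behave, here is a toy StringMatch covering a few of the types the panel offers. The kind names are made up for this sketch, and it requires the whole component to match; the real panel may treat regexes differently, which is exactly what the 'example string' entry is for.

```python
import re
from dataclasses import dataclass

@dataclass
class StringMatch:  # hypothetical model of the Edit String Match Panel
    kind: str        # "fixed", "numeric", "alphabetic" or "regex" (names invented here)
    value: str = ""  # the fixed text or the regex pattern, when relevant

    def matches(self, text: str) -> bool:
        if self.kind == "fixed":
            return text == self.value
        if self.kind == "numeric":
            return re.fullmatch(r"[0-9]+", text) is not None
        if self.kind == "alphabetic":
            return re.fullmatch(r"[A-Za-z]+", text) is not None
        if self.kind == "regex":
            # whole-component match in this sketch
            return re.fullmatch(self.value, text) is not None
        raise ValueError(f"unknown match kind: {self.kind}")

print(StringMatch("fixed", "index.php").matches("index.php"))  # True
print(StringMatch("numeric").matches("6391256a"))  # False
print(StringMatch("regex", r"[a-z]+_[0-9]+").matches("page_2"))  # True
```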

normalising urls

Different URLs can give the same page. The http and https versions of a URL are typically the same; "http://site.com/index.php?s=post&id=123456" results in the same content as "http://site.com/index.php?id=123456&s=post"; and "https://e621.net/post/show/1421754/abstract_background-animal_humanoid-blush-brown_ey" is the same as "https://e621.net/post/show/1421754".

Since we are in the business of storing and comparing URLs, we want to 'normalise' them to a single comparable beautiful value. You can see a preview of this normalisation on the edit panel.

Gallery and Watchable URLs are not compared, so normalising them only switches their http/https to the preferred value. File and Post URLs, however, will have any surplus path or query components cut out, and their query parameters will be alphabetised as well.
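The normalisation described above might look something like this for a single URL. This is a sketch under some assumptions: the URL type, how many path components the class defines, and which query parameter names it defines are passed in by hand, 'https' is assumed to be the preferred scheme, and any #fragment is simply dropped for File and Post URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def normalise(url: str, url_type: str, path_count: int, query_names: set[str],
              preferred_scheme: str = "https") -> str:
    parts = urlsplit(url)
    if url_type in ("gallery", "watchable"):
        # these are never compared, so only the scheme changes
        return urlunsplit(parts._replace(scheme=preferred_scheme))
    # File/Post: keep only the path components the class defines...
    path = [p for p in parts.path.split("/") if p][:path_count]
    # ...and only the query parameters it names, sorted alphabetically
    query = sorted((k, v) for k, v in parse_qsl(parts.query) if k in query_names)
    return urlunsplit((preferred_scheme, parts.netloc, "/" + "/".join(path), urlencode(query), ""))

print(normalise("http://site.com/index.php?s=post&id=123456", "post", 1, {"s", "id"}))
# https://site.com/index.php?id=123456&s=post
```

Both orderings of the site.com query come out identical, and the e621 URL with the tag slug, normalised with three path components, comes out as "https://e621.net/post/show/1421754".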

Since File and Post URLs will remove anything surplus, be careful not to leave out anything important in your rules. Make sure what you have is both necessary (nothing can be removed and still keep it valid) and sufficient (no more needs to be added to make it valid). It is a good idea to try pasting the 'normalised' version of the example URL into your browser, just to check it still works.

gallery rules do not need to be sufficient

Advanced--feel free to skip for now

For Gallery URLs, however, it can sometimes be useful to specify just a set of necessary rules. This saves you time and covers a broader set of URLs, like these:

Rather than making two rules--one with the additional "/page/(number)" and one without--you can just make one for "pictures/user/(characters)/scraps", which will match all three examples above.
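Because extra path components do not break a match, one necessary-only rule covers the bare URL and its paged versions. Here is a hypothetical sketch of just the path check, with None standing in for a '(characters)' component and example.com as a placeholder site.

```python
from typing import Optional
from urllib.parse import urlsplit

# "pictures/user/(characters)/scraps", with None meaning "any characters"
SCRAPS = ["pictures", "user", None, "scraps"]

def gallery_path_matches(rules: list[Optional[str]], url: str) -> bool:
    path = [p for p in urlsplit(url).path.split("/") if p]
    if len(path) < len(rules):
        return False  # a necessary component is missing
    # anything after the last rule (like "/page/2") is simply ignored
    return all(rule is None or rule == comp for rule, comp in zip(rules, path))

print(gallery_path_matches(SCRAPS, "https://example.com/pictures/user/someone/scraps"))  # True
print(gallery_path_matches(SCRAPS, "https://example.com/pictures/user/someone/scraps/page/2"))  # True
```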

While hydrus downloaders tend to generate valid first page URLs with something like "/page/1" or "pid=0" or "index=0", the sites themselves tend to link a 'bare' URL to a user browsing with a mouse. If you demand the 'page' or 'index' part in your Gallery URL Classes, a user who finds a nice gallery and tries to drop the first page's URL, as the site presented it, onto the client will only get a 'Couldn't find a URL Class for that!' error.

But if there isn't a nice way to create a single non-ambiguous class, just make multiple.

api urls

If you know that a URL has an API backend, you can tell the client to use that API URL when it fetches data. The API URL needs its own URL Class.

To define the relationship, click the "String Converter" button, which gives you this:

You may have seen this panel elsewhere. It lets you convert one string into another over a number of transformation steps. The steps can be as simple as adding or removing some characters, or as involved as a full regex substitution. For API URLs, you are mostly looking to isolate some unique identifying data ("m/thread/16086187" in this case) and then substitute that into the new API path. It is worth testing this with several different examples!
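A String Converter is an ordered list of steps applied one after another. The sketch below models a few step types in Python and chains them to turn a thread URL into an API URL. It assumes the regular URL is a 4chan thread and that the API lives at "https://a.4cdn.org/<board>/thread/<id>.json"; the helper names are made up for illustration and are not hydrus's own.

```python
import re
from typing import Callable

Step = Callable[[str], str]  # one transformation step

def regex_sub(pattern: str, repl: str) -> Step:
    return lambda s: re.sub(pattern, repl, s)

def prepend(text: str) -> Step:
    return lambda s: text + s

def append(text: str) -> Step:
    return lambda s: s + text

def convert(s: str, steps: list[Step]) -> str:
    # apply each step in order, feeding the output into the next one
    for step in steps:
        s = step(s)
    return s

# assumed mapping: boards.4chan.org/<board>/thread/<id> -> a.4cdn.org/<board>/thread/<id>.json
thread_to_api = [
    # keep only "board/thread/id", dropping the host and any trailing slug
    regex_sub(r"^.*?/([a-z0-9]+/thread/[0-9]+).*$", r"\1"),
    prepend("https://a.4cdn.org/"),
    append(".json"),
]

print(convert("https://boards.4chan.org/m/thread/16086187", thread_to_api))
# https://a.4cdn.org/m/thread/16086187.json
```

A URL that the regex does not fit passes through re.sub unchanged and comes out as nonsense, which is one more reason to test with several examples.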

When the client links regular URLs to API URLs like this, it will still associate the human-pretty regular URL when it needs to display to the user and record 'known urls' and so on. The API is just a quick lookup when it actually fetches and parses the respective data.
