From b4afedb6178669f8263f0818c866d25576f8ac5c Mon Sep 17 00:00:00 2001 From: Hydrus Network Developer Date: Wed, 27 Sep 2023 16:12:55 -0500 Subject: [PATCH] Version 545 closes #394 --- docs/changelog.md | 92 ++-- docs/database_migration.md | 11 +- docs/developer_api.md | 4 +- docs/getting_started_installing.md | 6 +- docs/old_changelog.html | 44 +- hydrus/client/ClientDuplicates.py | 38 +- hydrus/client/ClientFiles.py | 99 ++-- hydrus/client/caches/ClientCaches.py | 57 ++- hydrus/client/db/ClientDB.py | 450 ++++++++++++++++-- hydrus/client/db/ClientDBFilesMaintenance.py | 38 +- .../db/ClientDBFilesMaintenanceQueue.py | 15 +- .../client/db/ClientDBFilesMetadataBasic.py | 138 +++--- hydrus/client/db/ClientDBFilesMetadataRich.py | 169 ------- hydrus/client/db/ClientDBSimilarFiles.py | 5 + hydrus/client/gui/ClientGUI.py | 25 +- hydrus/client/gui/ClientGUICharts.py | 2 + hydrus/client/gui/ClientGUIMedia.py | 42 +- .../gui/ClientGUIScrolledPanelsReview.py | 181 ++++++- hydrus/client/gui/canvas/ClientGUICanvas.py | 25 +- hydrus/client/gui/pages/ClientGUIResults.py | 35 +- hydrus/client/importing/ClientImportFiles.py | 15 +- .../client/importing/ClientImportGallery.py | 10 +- .../client/importing/ClientImportWatchers.py | 10 +- hydrus/client/media/ClientMedia.py | 7 +- hydrus/client/media/ClientMediaManagers.py | 4 +- .../networking/ClientLocalServerResources.py | 13 +- hydrus/core/HydrusArchiveHandling.py | 10 +- hydrus/core/HydrusConstants.py | 4 +- hydrus/core/HydrusExceptions.py | 1 + hydrus/core/HydrusFileHandling.py | 20 +- hydrus/core/HydrusGlobals.py | 2 + hydrus/core/HydrusImageHandling.py | 108 ++++- hydrus/core/HydrusPSDHandling.py | 6 +- .../core/networking/HydrusServerResources.py | 18 +- hydrus/external/blurhash.py | 6 +- hydrus/test/TestClientAPI.py | 34 +- .../gugs/deviant art artist lookup.png | Bin 2565 -> 0 bytes .../login_scripts/deviant art login.png | Bin 2693 -> 0 bytes 38 files changed, 1229 insertions(+), 515 deletions(-) delete mode 100644 
static/default/gugs/deviant art artist lookup.png delete mode 100644 static/default/login_scripts/deviant art login.png diff --git a/docs/changelog.md b/docs/changelog.md index 946ebe64..423f0065 100644 --- a/docs/changelog.md +++ b/docs/changelog.md @@ -7,6 +7,55 @@ title: Changelog !!! note This is the new changelog, only the most recent builds. For all versions, see the [old changelog](old_changelog.html). +## [Version 545](https://github.com/hydrusnetwork/hydrus/releases/tag/v545) + +### blurhash + +* thanks to a user's work, hydrus now calculates the [blurhash](https://blurha.sh/) of files with a thumbnail! (issue #394) +* if a file has no thumbnail but does have a blurhash (e.g. missing files, or files you previously deleted and are looking at in a clever view), it now presents a thumbnail generated from that blurhash +* all existing thumbnail-having files are scheduled for a blurhash calculation (this is a new job in the file maintenance system). if you have hundreds of thousands of files, expect it to take a couple of weeks/months to clear. if you need to hurry this along, the queue is under _database->file maintenance_ +* any time a file's thumbnail changes, the blurhash is scheduled for a regen +* for this first version, the blurhash is very small and simple, either 15 or 16 cells for ~34 bytes. if we end up using it a lot somewhere, I'd be open to making a size setting so you can set 8x8 or higher grids for actually decent blur-thumbs +* a new _help->debug_ report mode switches to blurhashes instead of normal thumbs + +### file history search + +* I did to the file history chart (_help->view file history_) what I did to mr bones a couple weeks ago. you can now search your history of imports, archives, and deletes for creator x, filetype y, or any other search you can think of +* I hacked this all together right at the end of my week, so please bear with me if there are bugs or dumb permitted domains/results. 
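For reference, the component layout hydrus picked can be read straight off the first character of a blurhash string. A minimal sketch, assuming the standard base83 alphabet from the blurhash spec; the example hash is the one shown in the updated `file_metadata` response:

```python
# Sketch: read the x/y component counts out of a blurhash header.
# Assumes the standard base83 alphabet from the blurhash spec; the
# example hash is taken from the updated file_metadata response.

BASE83 = (
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz#$%*+,-.:;=?@[]^_{|}~"
)

def blurhash_components(blurhash):
    """Return (num_x, num_y) component counts encoded in the first character."""
    size_flag = BASE83.index(blurhash[0])
    num_y = size_flag // 9 + 1
    num_x = size_flag % 9 + 1
    # a well-formed hash is 4 + 2 * num_x * num_y characters long
    expected_len = 4 + 2 * num_x * num_y
    if len(blurhash) != expected_len:
        raise ValueError(f"expected {expected_len} characters, got {len(blurhash)}")
    return num_x, num_y

print(blurhash_components("U6PZfSi_.AyE_3t7t7R**0o#DgR4_3R*D%xt"))  # (4, 4)
```

A 4x4 grid of cells matches the "15 or 16 cells for ~34 bytes" figure above.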
the default action when you first open it up should all work the same way as before, no worries™, but let me know how you get on and I'll fix it! +* there's more to do here. we'll want a hideable search panel, a widget to control the resolution of the chart (currently fixed at 7680 to look good blown up on a 4k), and it'd be nice to have a selectable date range +* in the longer term future, it'd be nice to have more lines of data and that chart tech you see on financial sites where it shows you the current value where your mouse is + +### client api + +* the `file_metadata` call now says the new blurhash. if you pipe it into a blurhash library and blow it up to an appropriate ratio canvas, it _should_ just work. the typical use is as a placeholder while you wait for thumbs/files to download +* a new `include_blurhash` parameter will include the blurhash when `only_return_basic_information` is true +* `file_metadata` also shows the file's `pixel_hash` now. the algorithm here is proprietary to hydrus, but you can throw it into 'system:similar files' to find pixel dupes. I expect to add perceptual hashes too +* the help is updated to talk about this +* I updated the unit tests to deal with this +* the error when the api fails to parse the client api header is now a properly handled 400 (previously it was falling to the 500 backstop) +* the client api version is now 53 + +### misc + +* I'm sorry to say I'm removing the Deviant Art artist search and login script for all new users, since they are both broken. DA have been killing their nice old API in pieces, and they finally took down the old artist gallery fetch. :(. there may be a way to finagle-parse their new phone-friendly, live-loading, cloud-deployed engine, but when I look at it, it seems like a much bigger mess than hydrus's parsing system can happily handle atm. the 'correct' way to programmatically parse DA is through their new OAuth API, which we simply do not support.
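The new `include_blurhash` parameter mentioned above rides along with the cheap `only_return_basic_information` mode. A sketch of how a client might build such a request, assuming a local client on the default port and a placeholder access key; the request is only constructed here, not sent:

```python
# Sketch: build a file_metadata request that asks for blurhash in the
# cheap 'basic information' mode. The port and endpoint are hydrus
# defaults; the access key and hash below are placeholders.
import json
from urllib.parse import urlencode

API_URL = "http://127.0.0.1:45869"
ACCESS_KEY = "0123456789abcdef" * 4  # placeholder 64-char hex key

def build_file_metadata_request(sha256_hashes):
    params = {
        "hashes": json.dumps(sha256_hashes),
        "only_return_basic_information": "true",
        "include_blurhash": "true",
    }
    url = f"{API_URL}/get_files/file_metadata?{urlencode(params)}"
    headers = {"Hydrus-Client-API-Access-Key": ACCESS_KEY}
    return url, headers

url, headers = build_file_metadata_request(
    ["0000000000000000000000000000000000000000000000000000000000000000"]
)
print(url)
```

Pipe the returned `blurhash` into any blurhash decoder library, scaled to the file's aspect ratio, to get the placeholder image.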
individual page URLs seem to still work, but I expect them to go soon too. Sorry folks, try gallery-dl for now--they have a robust OAuth solution +* thanks to a user, we now have 'epub' ebook support! no 'num_words' support yet, but it looks like epubs are really just zips with some weird metadata files and a bunch of html inside, so I think this'll be doable with a future hacky parser. all your existing zip files will be scheduled for a metadata rescan to see if they are actually epubs (this'll capture any secret kritas and procreates, too, I think) +* the main UI-level media object is now aware of a file's pixel hash. this is now used in the duplicate filter's 'these are pixel duplicates' statements to save CPU. the jank old on-the-fly calculation code is all removed now, and if these values are missing from the media object, a message will now be shown saying the pixel dupe status could not be determined. we have had multiple rounds of regen over the past year and thus almost all clients have full database data here, so fingers crossed we won't see this error state much if at all, but let me know if you do and I'll figure out a button to accelerate the fix +* the thumbnail _right-click->open->similar files_ menu now has an entry for 'open the selection in a new duplicate filter page', letting you quickly resolve the duplicates that involve the selected files +* pixel hash and blurhash are now listed, with the actual hash value, in the _share->copy->hash_ thumbnail right-click menu +* thanks to a user, 'MPO' jpegs (some weird multi-picture jpeg that we can't page through yet) now parse their EXIF correctly and should rotate on a metadata-reparse.
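The 'pixel hash' idea mentioned above can be pictured as hashing the decoded pixel data rather than the file bytes, so two differently-encoded but pixel-identical images produce the same digest. Hydrus's exact recipe is internal, so this is purely a toy illustration using raw RGB tuples:

```python
# Sketch: the general idea behind a 'pixel hash' -- hash the decoded
# pixel data instead of the file bytes. Hydrus's exact recipe is
# internal; this illustration just feeds RGB tuples into sha256.
import hashlib

def pixel_hash(pixels, width, height):
    digest = hashlib.sha256()
    digest.update(f"{width}x{height}".encode())  # bake in the resolution
    for r, g, b in pixels:
        digest.update(bytes((r, g, b)))
    return digest.hexdigest()

# two 'files' with identical pixels hash the same...
a = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 0, 0)]
b = list(a)
assert pixel_hash(a, 2, 2) == pixel_hash(b, 2, 2)
# ...while a one-pixel difference changes the digest completely
c = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (1, 0, 0)]
assert pixel_hash(a, 2, 2) != pixel_hash(c, 2, 2)
```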
since these are rare, I'm not going to schedule a rescan over everyone's jpegs, but if you see a jpeg that is rotated wrong, try hitting _manage->regenerate->file metadata_ on its thumbnail menu +* I may have fixed a rare hang when highlighting a downloader/watcher during very busy network time that involves that importer +* added a warning to the 'getting started with installing' and 'database migration' help about running the SQLite database off a compressed filesystem--don't do it! +* fixed thumbnail generation for greyscale PSDs (and perhaps some others) + +### boring cleanup + +* I cleaned some code and added some tests around the new blurhash tech and thumbs in general +* a variety of metadata changes such as 'has exif', 'has icc profile' now trigger a live update on thumbnails currently loaded into the UI +* cleaned up some old file metadata loading code +* re-sorted the job list dropdown in the file maintenance dialog +* some file maintenance database work should be a bit faster +* fixed some behind the scenes stuff when the file history chart has no file info to show + ## [Version 544](https://github.com/hydrusnetwork/hydrus/releases/tag/v544) ### webp vulnerability @@ -367,46 +416,3 @@ title: Changelog * wrote a unit test to catch the new delete lock test * deleted the old-and-deprecated-in-one-week 'pair_rows' parameter-handling code in the set_file_relationships command * the client api version is now 49 - -## [Version 535](https://github.com/hydrusnetwork/hydrus/releases/tag/v535) - -### misc - -* thanks to a user, we now have Krita (.kra, .krz) support! it even pulls thumbnails! -* thanks to another user, we now have SVG (.svg) support! it even generates thumbnails!
-* I think I fixed a comparison statement calculator divide-by-zero error in the duplicate filter when you compare a file with a resolution with a file without one - -### petitions overview - -* _this is a workflow/usability update only for server janitors_ -* tl;dr: the petitions page now fetches many petitions at once. update your servers and clients for it all to work right -* so, the petitions page now fetches lots of petitions with each 'fetch' button click. you can set how many it will fetch with a new number control -* the petitions are shown in a new multi-column list that shows action, account id, reason, and total weight. the actual data for the petitions will load in quickly, reflected in the list. as soon as the first is loaded, it is highlighted, but double-click any to highlight it in the old petition UI as normal -* when you process petitions, the client moves instantly to the next, all fitting into the existing workflow, without having to wait for the server to fetch a new one after you commit -* you can also mass approve/deny from here! if one account is doing great or terrible stuff, you can now blang it all in one go - -### petitions details - -* the 'fetch x petition' buttons now show `(*)` in their label if they are the active petition type being worked on -* petition pages now remember: the last petition type they were looking at; the number of petitions to fetch; and the number of files to show -* the petition page will pause any ongoing petition fetches if you close it, and resume if you unclose it -* a system where multi-mapping petitions would be broken up and delivered in tags with weight-similar chunks (e.g. if would say 'aaa for 11 files' and 'bbb in 15 files' in the same fetch, but not 'ccc in 542,154 files') is abandoned. this was not well explained and was causing confusion and code complexity. 
these petitions now appear clientside in full -* another system, where multi-mapping petitions would be delivered in same-namespace chunks, is also abandoned, for similar reasons. it was causing more confusion, especially when compared to the newer petition counting tech I've added. perhaps it will come back in as a clientside filter option -* the list of petitions you are given _should_ also be neatly grouped by account id, so rather than randomly sampling from all petitions, you'll get batches by user x, y, or z, and in most cases you'll be looking at everything by user x, and y, and then z up to the limit of num petitions you chose to fetch -* drawback: since petitions' content can overlap in complicated ways, and janitors can work on the same list at the same time, in edge cases the list you see can be slightly out of sync with what the server actually has. this isn't a big deal, and the worst case is wasted work as you approve the same thing twice. I tried to implement 'refresh list if count drops more than expected' tech, but the situation is complicated and it was spamming too much. I will let you refresh the list with a button click yourself for now, as you like, and please let me know where it works and fails -* drawback: I added some new objects, so you have to update both server and client for this to work. older/newer combinations will give you some harmless errors -* also, if your list starts running low, but there are plenty more petitions to work on, it will auto-refresh. again, it won't interrupt your current work, but it will fetch more. let me know how it works out -* drawback: while the new petition summary list is intentionally lightweight, I do spend some extra CPU figuring it out. 
with a high 'num petitions to fetch', it may take several seconds for a very busy server like the PTR just to fetch the initial list, so please play around with different fetch sizes and let me know what works well and what is way too slow -* there are still some things I want to do to this page, which I want to slip in the near future. I want to hide/show the sort and 'num files to show' widgets as appropriate, figure out a right-click menu for the new list to retry failures, and get some shortcut support going - -### boring code cleanup - -* wrote a new petition header object to hold content type, petition status, account id, and reason for petitions -* serverside petition fetching is now split into 'get petition headers' and 'get petition data'. the 'headers' section supports filtering by account id and in future reason -* the clientside petition management UI code pretty much got a full pass -* cleaned a bunch of ancient server db code -* cleaned a bunch of the clientside petition code. it was a real tangle -* improved the resilience of the hydrus server when it is given unacceptable tags in a content update -* all fetches of multiple rows of data from multi-column lists now happen sorted. this is just a little thing, but it'll probably dejank a few operations where you edit several things at once or get some errors and are trying to figure out which of five things caused it -* the hydrus official mimetype for psd files is now 'image/vnd.adobe.photoshop' (instead of 'application/x-photoshop') -* with krita file (which are actually just zip files) support, we now have the very barebones of archive tech started. I'll expand it a bit more and we should be able to improve support for other archive-like formats in the future diff --git a/docs/database_migration.md b/docs/database_migration.md index 27f5dc6c..2790678a 100644 --- a/docs/database_migration.md +++ b/docs/database_migration.md @@ -19,7 +19,7 @@ A hydrus client consists of three components: 2. 
**the actual SQLite database** - The client stores all its preferences and current state and knowledge _about_ files--like file size and resolution, tags, ratings, inbox status, and so on and so on--in a handful of SQLite database files, defaulting to _install_dir/db_. Depending on the size of your client, these might total 1MB in size or be as much as 10GB. + The client stores all its preferences and current state and knowledge _about_ files--like file size and resolution, tags, ratings, inbox status, and so on and on--in a handful of SQLite database files, defaulting to _install_dir/db_. Depending on the size of your client, these might total 1MB in size or be as much as 10GB. In order to perform a search or to fetch or process tags, the client has to interact with these files in many small bursts, which means it is best if these files are on a drive with low latency. An SSD is ideal, but a regularly-defragged HDD with a reasonable amount of free space also works well. @@ -84,10 +84,17 @@ To tell it about the new database location, pass it a `-d` or `--db_dir` command * `hydrus_client -d="D:\media\my_hydrus_database"` * _--or--_ * `hydrus_client --db_dir="G:\misc documents\New Folder (3)\DO NOT ENTER"` +* _--or, from source--_ +* `python hydrus_client.py -d="D:\media\my_hydrus_database"` * _--or, for macOS--_ * `open -n -a "Hydrus Network.app" --args -d="/path/to/db"` -And it will instead use the given path. If no database is found, it will similarly create a new empty one at that location. You can use any path that is valid in your system, but I would not advise using network locations and so on, as the database works best with some clever device locking calls these interfaces may not provide. +And it will instead use the given path. If no database is found, it will similarly create a new empty one at that location. You can use any path that is valid in your system. + +!!! 
danger "Bad Locations" + **Do not run a SQLite database on a network location!** The database relies on clever hardware-level exclusive file locks, which network interfaces often fake. While the program may work, I cannot guarantee the database will stay non-corrupt. + + **Do not run a SQLite database on a location with filesystem-level compression enabled!** In the best case (BTRFS), the database can suddenly get extremely slow when it hits a certain size; in the worst (NTFS), a >50GB database will encounter I/O errors and receive sporadic corruption! Rather than typing the path out in a terminal every time you want to launch your external database, create a new shortcut with the argument in. Something like this: diff --git a/docs/developer_api.md b/docs/developer_api.md index 0c687540..e2b35b78 100644 --- a/docs/developer_api.md +++ b/docs/developer_api.md @@ -1536,6 +1536,7 @@ Response: "ipfs_multihashes" : {}, "has_audio" : false, "blurhash" : "U6PZfSi_.AyE_3t7t7R**0o#DgR4_3R*D%xt", + "pixel_hash" : "2519e40f8105599fcb26187d39656b1b46f651786d0e32fff2dc5a9bc277b5bb", "num_frames" : null, "num_words" : null, "is_inbox" : false, @@ -1608,6 +1609,7 @@ Response: }, "has_audio" : true, "blurhash" : "UHF5?xYk^6#M@-5b,1J5@[or[k6.};FxngOZ", + "pixel_hash" : "1dd9625ce589eee05c22798a9a201602288a1667c59e5cd1fb2251a6261fbd68", "num_frames" : 102, "num_words" : null, "is_inbox" : false, @@ -1728,7 +1730,7 @@ Size is in bytes. Duration is in milliseconds, and may be an int or a float. The `thumbnail_width` and `thumbnail_height` are a generally reliable prediction but aren't a promise. The actual thumbnail you get from [/get\_files/thumbnail](#get_files_thumbnail) will be different if the user hasn't looked at it since changing their thumbnail options. You only get these rows for files that hydrus actually generates an actual thumbnail for. Things like pdf won't have it. 
You can use your own thumb, or ask the api and it'll give you a fixed fallback; those are mostly 200x200, but you can and should size them to whatever you want. -`blurhash` gives a base 83 encoded string of a [blurhash](https://blurha.sh/) generated from the file's thumbnail if the file has a thumbnail. +If the file has a thumbnail, `blurhash` gives a base 83 encoded string of its [blurhash](https://blurha.sh/). `pixel_hash` is an SHA256 of the image's pixel data and should exactly match for pixel-identical files (it is used in the duplicate system for 'must be pixel duplicates'). #### tags diff --git a/docs/getting_started_installing.md b/docs/getting_started_installing.md index 41fc94e9..7c3c98a4 100644 --- a/docs/getting_started_installing.md +++ b/docs/getting_started_installing.md @@ -72,8 +72,10 @@ I try to release a new version every Wednesday by 8pm EST and write an accompany By default, hydrus stores all its data—options, files, subscriptions, _everything_—entirely inside its own directory. You can extract it to a usb stick, move it from one place to another, have multiple installs for multiple purposes, wrap it all up inside a truecrypt volume, whatever you like. The .exe installer writes some unavoidable uninstall registry stuff to Windows, but the 'installed' client itself will run fine if you manually move it. -!!! warning "Network Install" - Unless you are an expert, do not install your client to a network location (i.e. on a different computer's hard drive)! The database is sensitive to interruption and requires good file locking, which network storage often fakes. There are [ways of splitting your client up](database_migration.md) so the database is on a local SSD but the files are on a network--this is fine--but you really should not put the database on a remote machine unless you know what you are doing and have a backup in case things go wrong. +!!! danger "Bad Locations" + **Do not install to a network location!** (i.e. 
on a different computer's hard drive) The SQLite database is sensitive to interruption and requires good file locking, which network interfaces often fake. There are [ways of splitting your client up](database_migration.md) so the database is on a local SSD but the files are on a network--this is fine--but you really should not put the database on a remote machine unless you know what you are doing and have a backup in case things go wrong. + + **Do not install to a location with filesystem-level compression enabled!** It may work ok to start, but when the SQLite database grows to a large size, this can cause extreme access latency and I/O errors and corruption. !!! info "For macOS users" The Hydrus App is **non-portable** and puts your database in `~/Library/Hydrus` (i.e. `/Users/[You]/Library/Hydrus`). You can update simply by replacing the old App with the new, but if you wish to backup, you should be looking at `~/Library/Hydrus`, not the App itself. diff --git a/docs/old_changelog.html index 54220f49..72cd071a 100644 --- a/docs/old_changelog.html +++ b/docs/old_changelog.html @@ -34,6 +34,48 @@

changelog