I often need to perform batch operations on large groups of images. Most of the time, I can do these manually, but that's often a lot of work. So I've written some simple command-line tools to automate the process.
All of these tools are written in Node. They could probably be rewritten in Bash if you had the time, but I don't know Bash well enough, and I don't use Linux anyway. For daily use, I put these in a folder on my PATH, with batch files containing the following:
@node <path to script> %*
Most of these require external packages, which you can get by running npm install <name> in whatever folder you put these in.
This is a utility for finding files that are larger than they need to be and making them smaller. The metric for "larger than it needs to be" is 600K; I've chosen this because most files I deal with end up under 600K after this tool runs. It could be improved to avoid re-encoding files it has already shrunk, which occasionally happens.
As you can tell, I'm not exactly an image quality snob. I much prefer hard drive space.
The encoding is done with ffmpeg, which needs to be on your PATH. It encodes every JPEG and PNG file in the directory you run it from into a JPEG with -q:v set to 2 and a width of 1000, preserving the aspect ratio.
This doesn't require any external packages.
We've all been there. You've gone to an imageboard and downloaded a bunch of files, and now you've got a ton of files named with post IDs and that's no good.
dechanify renames all of the files in a directory to their SHA-1 hashes. It's not just applicable to downloading images from an imageboard - if there's any situation where the filenames don't hold important information and you're annoyed because they're all in different formats, this is the tool for you.
By default, this will only rename files whose names don't appear to be hashes - so if a file is named aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.jpg, it won't be renamed, because that's technically a valid SHA-1 hash. If you want to be absolutely sure, run it with the all parameter (like node dechanify.js all).
This requires one external package: graceful-fs, to avoid going over the limit on open file handles.
There are a lot of image de-duplication tools out there. Sometimes I use one of them, because perceptual hashing does a damn good job of finding duplicates. Unfortunately, it takes a lot of time to compute, and then you have to go through and check every match to see if it's truly a duplicate.
What I'm looking for is a tool that checks only for exact duplicates - files that are byte-for-byte identical. I wrote this tool because I noticed that dechanify kept telling me I had files with the same hash, and there's almost no way that could happen unless the files were exactly the same.
By default, this tool only outputs a list of the duplicate files; you can go through the list and decide which ones to keep and which ones to delete. If you don't care to do that, you can run it with the keep-first parameter (like node dedup.js keep-first), which'll delete all matches except for the first one found.
Note: I'm aware Google has successfully demonstrated a collision attack on SHA-1. But I'm assuming nobody slipped a file onto my hard drive with the sole purpose of confusing my image de-duplication tool.
This requires three packages:
Sometimes I'll get files whose names are in order, but only using a natural sort - that is, 1.jpg, 2.jpg, 3.jpg ... 12.jpg, 13.jpg. This works fine in most cases (such as using Windows Explorer), but some tools mess it up (I'm looking at you, Dropbox for Android). Even worse is when you have files that aren't in a sequence, but that belong in a sequence.
This tool is meant to solve that. For the first case, it applies a natural sort, but outputs the filenames either padded to three characters (001.jpg) or padded to the number of digits in the highest number (so if the last file is 1293.jpg, the first file will be 0001.jpg). This'll give it the correct sort in programs that don't use a natural sort.
For the second case, you can optionally have it order by timestamp. So if you added the files to the folder in order (for example, downloading them from a website in the order they appeared), this'll probably give you the correct order. To enable this, run it with the time parameter (like node sequencer.js time).
By default, this will only print the proposed sequence - this gives you a chance to check that it came out right, to avoid giving the files an even worse sequence. When you've looked it over, re-run it with the commit parameter (like node sequencer.js commit or node sequencer.js time commit).
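The renaming logic can be sketched like so - the padding rule is the one described above, while the helpers (and the injectable mtime lookup) are illustrative rather than the real script:

```javascript
// Sketch of the sequencer's renaming logic. The padding rule matches
// the description above; the helper names are mine.

// Pad to at least three digits, or to the width of the widest number.
function pad(n, width) {
  return String(n).padStart(Math.max(3, width), '0');
}

// Natural sort: digit runs compare numerically, so 2.jpg < 12.jpg.
function naturalSort(names) {
  return [...names].sort((a, b) =>
    a.localeCompare(b, undefined, { numeric: true }));
}

// Timestamp order for the "time" parameter; the mtime lookup is
// injectable (the real thing would use fs.statSync(name).mtimeMs).
function timeSort(names, mtime) {
  return [...names].sort((a, b) => mtime(a) - mtime(b));
}

// Produce [oldName, newName] pairs; the "commit" step would then be
// a loop of fs.renameSync calls over these pairs.
function sequence(names) {
  const width = String(names.length).length;
  return naturalSort(names).map((name, i) => {
    const ext = name.slice(name.lastIndexOf('.'));
    return [name, pad(i + 1, width) + ext];
  });
}
```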
This doesn't require any external packages.