teeedubb Posted April 4, 2017 (edited)

Hi,

This is a bash script to remove duplicate artwork in LB. On my system it removed approx 30,000 images, which took up about 5 GB. I conducted some spot checks to make sure everything was working, but I didn't check all 30k images to verify that they were duplicates, so please use this script with caution. MAKE SURE YOU HAVE A BACKUP!

It performs two types of searches:

1) For JPGs and PNGs with the same name. The JPG is deleted if a PNG exists - eg. mario-03.jpg and mario-03.png.
2) A fdupes search of files with the same name, minus the file extension and the two characters on the end - eg. mario-01.jpg and mario-02.jpg (files named mario-* will be pooled together for the search, but mario brothers-* will be searched separately).

This is a Linux script, and there are several ways to run it against a Windows LB install:

- Natively on Linux, which is fastest. You can use live media if you don't have a Linux install.
- Via Windows Subsystem for Linux or Cygwin.

Either way you will need fdupes installed.

The script has three options, set via variables at the beginning. Set "LAUNCHBOX_MEDIA_DIR" to your LB image folder. Set "TMP_DIR" to a folder on the same drive as the LB image folder to speed up file operations. Note that the script deletes and recreates "TMP_DIR" on every run, so don't point it at a folder you care about. Change "DELETE_DUPLICATES" to true only once you are happy with the results from the dry run - there is no undo feature! A list of duplicate files is written to lb-img-clean-log in "TMP_DIR". I recommend running an 'image cleanup' via LB after running the script.

MAKE SURE YOU HAVE A BACKUP!

lb-img-clean.sh:

    #!/bin/bash
    LAUNCHBOX_MEDIA_DIR="/mnt/storage-ssd/emulation/LaunchBox/Images"
    TMP_DIR="/mnt/storage-ssd/lb-img-clean-temp"
    DELETE_DUPLICATES=false # set to true to delete
    ######

    if [[ ! -d "$LAUNCHBOX_MEDIA_DIR" ]] ; then
        echo "LB media dir does not exist: $LAUNCHBOX_MEDIA_DIR"
        exit 1
    fi

    if ! command -v fdupes > /dev/null ; then
        echo "fdupes not found"
        exit 1
    fi

    if [[ $DELETE_DUPLICATES = true ]] ; then
        while true ; do
            read -r -p "DELETE DUPLICATES ENABLED!!! DO YOU WANT TO CONTINUE??? " yn
            case $yn in
                [Yy]* ) read -t5 -n1 -r -p 'Press any key, or wait five seconds to start'; break;;
                [Nn]* ) exit 0;;
                * ) echo "Please answer yes or no.";;
            esac
        done
    else
        echo "Dry run..."
    fi

    # Start from a clean temp dir (this removes anything already in TMP_DIR)
    WORKING_DIR="$TMP_DIR"/working
    if [[ -d "$TMP_DIR" ]] ; then
        rm -r "$TMP_DIR"
    fi
    mkdir -p "$WORKING_DIR"
    cd "$TMP_DIR" || exit 1

    LB_DIR_SIZE=$(du -h -d0 "$LAUNCHBOX_MEDIA_DIR" | awk '{print $1}')
    LB_DIR_COUNT=$(find "$LAUNCHBOX_MEDIA_DIR" -type f | wc -l)
    LOG_FILE="$TMP_DIR"/lb-img-clean-log
    IFS=$(echo -en "\n\b") # don't split on spaces, so paths with spaces survive the for loops

    echo -en "\n"
    echo "LB media dir: $LAUNCHBOX_MEDIA_DIR"
    echo "Script temp dir: $TMP_DIR"
    echo "LB img dir pre size: $LB_DIR_SIZE"
    echo "LB img dir pre count: $LB_DIR_COUNT"
    echo -en "\n"

    # Search 1: a JPG is a duplicate if a PNG with the same name sits next to it
    echo "Duplicate filename, different extension search..."
    echo "***duplicate filename, different extension search..." > "$LOG_FILE"
    for i in $(find "$LAUNCHBOX_MEDIA_DIR" -type f -name "*.jpg") ; do
        filename="$(basename "$i")"
        directory="$(dirname "$i")"
        filename_noext="${filename%.*}"
        if [[ -f "$directory"/"$filename_noext".png ]] ; then
            echo "$i" >> "$LOG_FILE"
            if [[ $DELETE_DUPLICATES = true ]] ; then
                rm "$i"
            fi
        fi
    done

    # Search 2: collect fdupes matches per directory
    echo "Starting fdupes search..."
    for d in $(find "$LAUNCHBOX_MEDIA_DIR" -mindepth 2 -type d) ; do
        fdupes "$d" >> "$TMP_DIR"/fdupes
    done

    # Strip the extension and the last two characters from every match
    echo "Preparing fdupes matches for file searches..."
    while read -r line ; do
        echo "$(echo -n "${line%.*}" | head -c-2)" >> fdupes-cutted
    done < fdupes

    # Keep only the prefixes that occur more than once
    echo "Searching for duplicate entries..."
    sed -i '/^$/d' fdupes-cutted
    uniq -d fdupes-cutted fdupes-cutted-uniq

    echo "***fdupes search..." >> "$LOG_FILE"
    while read -r line ; do
        if [[ -d "$WORKING_DIR" ]] ; then
            rm -r "$WORKING_DIR"
        fi
        mkdir -p "$WORKING_DIR"
        directory="$(dirname "$line")"
        # Pool every file sharing this prefix into the working dir
        mv "$line"* "$WORKING_DIR"
        # Give the pooled files increasing timestamps, one second apart, in name order
        OFFSET_IN_SEC=0
        for file in $(ls "$WORKING_DIR") ; do
            OFFSET_IN_SEC=$(( OFFSET_IN_SEC + 1 ))
            TOUCH_TIMESTAMP=$(date -d "$OFFSET_IN_SEC sec" +%m%d%H%M.%S)
            touch -t "$TOUCH_TIMESTAMP" "$WORKING_DIR"/"$file"
        done
        echo "***source directory: $line... " >> "$LOG_FILE"
        if [[ $DELETE_DUPLICATES = true ]] ; then
            fdupes -dNf "$WORKING_DIR" >> "$LOG_FILE"
        else
            fdupes -f "$WORKING_DIR" >> "$LOG_FILE"
        fi
        # Move whatever is left back to where it came from
        mv "$WORKING_DIR"/* "$directory"
    done < fdupes-cutted-uniq

    echo "Image cleanup completed..."
    if [[ $DELETE_DUPLICATES = true ]] ; then
        LB_DIR_SIZE=$(du -h -d0 "$LAUNCHBOX_MEDIA_DIR" | awk '{print $1}')
        LB_DIR_COUNT=$(find "$LAUNCHBOX_MEDIA_DIR" -type f | wc -l)
        echo "LB img dir post size: $LB_DIR_SIZE"
        echo "LB img dir post count: $LB_DIR_COUNT"
    fi
    echo "Log file located at: $LOG_FILE"
    exit 0

Edited April 4, 2017 by teeedubb
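To preview what the two searches would pick up before running the full script, the matching rules described in the post can be sketched as two small read-only helpers. This is only an illustration under assumptions: the function names `group_key` and `list_shadowed_jpgs` are hypothetical and do not appear in lb-img-clean.sh, and `group_key` uses shell parameter expansion in place of the script's `head -c-2`. Neither helper deletes, moves or touches anything.

```shell
#!/bin/bash
# Hypothetical helper (not part of lb-img-clean.sh): print the pooling key
# used by search 2, i.e. the name minus its extension and last two characters.
group_key() (
    name="${1%.*}"              # drop the extension: mario-01.jpg -> mario-01
    printf '%s\n' "${name%??}"  # drop the last two characters: mario-01 -> mario-
)

# Hypothetical helper (not part of lb-img-clean.sh): list every JPG under the
# given directory that has a PNG with the same name next to it (search 1).
# It only prints the matches.
list_shadowed_jpgs() {
    find "$1" -type f -name '*.jpg' -exec sh -c '
        for jpg do
            if [ -f "${jpg%.*}.png" ]; then
                printf "%s\n" "$jpg"
            fi
        done' sh {} +
}
```

For example, `group_key "mario-01.jpg"` prints `mario-` and `group_key "mario brothers-02.png"` prints `mario brothers-`, which is why those two sets end up in separate pools. Calling `list_shadowed_jpgs` with the path you would put in "LAUNCHBOX_MEDIA_DIR" lists the JPGs that search 1 would log.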