teeedubb, posted April 4, 2017 (edited April 4, 2017)

Hi,

This is a bash script to remove duplicate artwork in LB. On my system it removed approx. 30,000 images, which took up about 5 GB. I did some spot checks to make sure everything was working, but I didn't check all 30k images to verify that they were duplicates, so please use this script with caution. MAKE SURE YOU HAVE A BACKUP!

It performs two types of searches:

1) A search for JPGs and PNGs with the same name; the JPG is deleted if a PNG exists - e.g. mario-03.jpg and mario-03.png.
2) An fdupes search of files with the same name, minus the file extension and the two characters on the end - e.g. mario-01.jpg and mario-02.jpg (files named mario-* will be pooled together for the search, but mario brothers-* will be searched separately).

This is a Linux script, and there are several ways to run it against a Windows install: natively on Linux, which is fastest (you can use live media if you don't have a Linux install), or via Windows Subsystem for Linux or Cygwin. Either way you will need fdupes installed.

The script has three options, set via variables at the beginning: LAUNCHBOX_MEDIA_DIR, TMP_DIR and DELETE_DUPLICATES. Set "TMP_DIR" to a folder on the same drive as the LB image folder to speed up file operations. Note that the script deletes and recreates TMP_DIR on every run, so point it at a dedicated folder. Change "DELETE_DUPLICATES" to true only once you are happy with the results from the dry run - there is no undo feature! A list of duplicate files is written to lb-img-clean-log in TMP_DIR. I recommend running an 'image cleanup' in LB after running the script. MAKE SURE YOU HAVE A BACKUP!

lb-img-clean.sh:

#!/bin/bash

LAUNCHBOX_MEDIA_DIR="/mnt/storage-ssd/emulation/LaunchBox/Images"
TMP_DIR="/mnt/storage-ssd/lb-img-clean-temp"
DELETE_DUPLICATES=false #set to true to delete

######

if [[ ! -d "$LAUNCHBOX_MEDIA_DIR" ]] ; then
    echo "LB media dir does not exist: $LAUNCHBOX_MEDIA_DIR"
    exit 1
fi

if ! command -v fdupes > /dev/null ; then
    echo "fdupes not found"
    exit 1
fi

if [[ $DELETE_DUPLICATES = true ]] ; then
    while true; do
        read -r -p "DELETE DUPLICATES ENABLED!!! DO YOU WANT TO CONTINUE??? " yn
        case $yn in
            [Yy]* ) read -t5 -n1 -r -p 'Press any key, or wait five seconds to start'; break;;
            [Nn]* ) exit 0;;
            * ) echo "Please answer yes or no.";;
        esac
    done
else
    echo "Dry run..."
fi

WORKING_DIR="$TMP_DIR"/working
# Start from an empty temp dir (this deletes anything already in TMP_DIR)
if [[ -d "$TMP_DIR" ]] ; then
    rm -r "$TMP_DIR"
fi
mkdir -p "$WORKING_DIR"
cd "$TMP_DIR" || exit 1

LB_DIR_SIZE=$(du -h -d0 "$LAUNCHBOX_MEDIA_DIR" | awk '{print $1}')
LB_DIR_COUNT=$(find "$LAUNCHBOX_MEDIA_DIR" -type f | wc -l)
LOG_FILE="$TMP_DIR"/lb-img-clean-log
# Split on newlines only, so paths with spaces survive the for loops below
IFS=$(echo -en "\n\b")

echo -en "\n"
echo "LB media dir: $LAUNCHBOX_MEDIA_DIR"
echo "Script temp dir: $TMP_DIR"
echo "LB img dir pre size: $LB_DIR_SIZE"
echo "LB img dir pre count: $LB_DIR_COUNT"
echo -en "\n"

echo "Duplicate filename, different extension search..."
echo "***duplicate filename, different extension search..." > "$LOG_FILE"
for i in $(find "$LAUNCHBOX_MEDIA_DIR" -type f -name "*.jpg") ; do
    filename="$(basename "$i")"
    directory="$(dirname "$i")"
    filename_noext="${filename%.*}"
    # A PNG with the same base name exists, so the JPG counts as the duplicate
    if [[ -f "$directory"/"$filename_noext".png ]] ; then
        echo "$i" >> "$LOG_FILE"
        if [[ $DELETE_DUPLICATES = true ]] ; then
            rm "$i"
        fi
    fi
done

echo "Starting fdupes search..."
for d in $(find "$LAUNCHBOX_MEDIA_DIR" -mindepth 2 -type d) ; do
    fdupes "$d" >> "$TMP_DIR"/fdupes
done

echo "Preparing fdupes matches for file searches..."
# Strip the extension and the last two characters from every match
while read -r line ; do
    echo "$(echo -n "${line%.*}" | head -c-2)" >> fdupes-cutted
done < fdupes

echo "Searching for duplicate entries..."
sed -i '/^$/d' fdupes-cutted
# Keep only the prefixes that occur on consecutive lines
uniq -d fdupes-cutted fdupes-cutted-uniq

echo "***fdupes search..." >> "$LOG_FILE"
while read -r line ; do
    if [[ -d "$WORKING_DIR" ]] ; then
        rm -r "$WORKING_DIR"
    fi
    mkdir -p "$WORKING_DIR"
    directory="$(dirname "$line")"
    # Pool every file sharing this prefix in the working dir
    mv "$line"* "$WORKING_DIR"
    # Give the pooled files increasing timestamps, one second apart, in name order
    OFFSET_IN_SEC=0
    for file in $(ls "$WORKING_DIR"); do
        OFFSET_IN_SEC=$(( OFFSET_IN_SEC + 1 ))
        TOUCH_TIMESTAMP=$(date -d "$OFFSET_IN_SEC sec" +%m%d%H%M.%S)
        touch -t "$TOUCH_TIMESTAMP" "$WORKING_DIR"/"$file"
    done
    echo "***source directory: $line... " >> "$LOG_FILE"
    if [[ $DELETE_DUPLICATES = true ]] ; then
        fdupes -dNf "$WORKING_DIR" >> "$LOG_FILE"
    else
        fdupes -f "$WORKING_DIR" >> "$LOG_FILE"
    fi
    # Move the surviving files back to where they came from
    mv "$WORKING_DIR"/* "$directory"
done < fdupes-cutted-uniq

echo "Image cleanup completed..."
if [[ $DELETE_DUPLICATES = true ]] ; then
    LB_DIR_SIZE=$(du -h -d0 "$LAUNCHBOX_MEDIA_DIR" | awk '{print $1}')
    LB_DIR_COUNT=$(find "$LAUNCHBOX_MEDIA_DIR" -type f | wc -l)
    echo "LB img dir post size: $LB_DIR_SIZE"
    echo "LB img dir post count: $LB_DIR_COUNT"
fi
echo "Log file located at: $LOG_FILE"
exit 0
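For anyone who wants to check how files get matched before running the script, here is a minimal sketch of the two matching rules described in the post. The function names group_key and has_png_twin are hypothetical and are not part of lb-img-clean.sh. The sketch also uses bash parameter expansion instead of the script's head -c-2, so it trims two characters rather than two bytes, which can differ for non-ASCII names.

```bash
#!/bin/bash

# Hypothetical helper (not in the original script): the pooling key for
# search 2 is the name minus its extension and its last two characters.
group_key() {
    local name="${1%.*}"          # mario-01.jpg -> mario-01
    printf '%s\n' "${name%??}"    # mario-01     -> mario-
}

# Hypothetical helper (not in the original script): search 1 treats a JPG
# as a duplicate when a PNG with the same base name sits next to it.
has_png_twin() {
    [[ -f "${1%.*}.png" ]]
}

group_key "mario-01.jpg"            # prints: mario-
group_key "mario-02.jpg"            # prints: mario-
group_key "mario brothers-01.jpg"   # prints: mario brothers-
```

The first two names produce the same key, so they would be pooled for one fdupes run, while the third gets its own pool. Keep in mind that a name without a two-character suffix is trimmed as well, so inspect the dry-run log before enabling deletion.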