LaunchBox Community Forums

Script to remove duplicate artwork



Hi,

This is a bash script to remove duplicate artwork in LB. On my system it removed approx. 30,000 images, which took up about 5 GB. I did some spot checks to make sure everything was working, but I didn't check all 30k images to verify that they were duplicates, so please use this script with caution. MAKE SURE YOU HAVE A BACKUP!

It performs two types of searches: 1) JPGs and PNGs with the same name, where the JPG is deleted if a PNG exists - e.g. mario-03.jpg and mario-03.png. 2) An fdupes search across files that share the same name once the file extension and the last two characters are removed - e.g. mario-01.jpg and mario-02.jpg. Files named mario-* are pooled together for the search, while files named mario brothers-* are searched separately.
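
As a rough illustration of how the second search pools files, the sketch below derives the pooling prefix from a filename by dropping the extension and then the last two characters. The helper name pool_key is made up for this example and does not appear in the script, which does the same trimming inline.

```shell
#!/bin/bash
# Hypothetical helper (not part of lb-img-clean.sh): derive the prefix
# used to pool files for the fdupes search.
pool_key() {
  local stem="${1%.*}"         # drop the extension:  mario-01.jpg -> mario-01
  printf '%s\n' "${stem%??}"   # drop the last two characters:   mario-01 -> mario-
}

pool_key "mario-01.jpg"            # prints: mario-
pool_key "mario-02.jpg"            # prints: mario-          (same pool)
pool_key "mario brothers-01.jpg"   # prints: mario brothers- (separate pool)
```

Files whose names reduce to the same prefix are compared against each other by content, so a shared prefix alone never causes a deletion.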

This is a Linux script and there are several ways to run it against a Windows install: natively on Linux, which is fastest (you can use live media if you don't have a Linux install), or via Windows Subsystem for Linux or Cygwin. Either way you will need fdupes installed.
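
Whichever route you take, it is worth confirming that fdupes is on the PATH before pointing the script at your images. A minimal check, assuming a POSIX-style shell (the function name require_cmd is just for illustration):

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the named command is on the PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if require_cmd fdupes ; then
  echo "fdupes found"
else
  echo "fdupes not found - install it with your distribution's package manager"
fi
```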

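The script has three options, set via variables at the beginning: LAUNCHBOX_MEDIA_DIR, TMP_DIR and DELETE_DUPLICATES. Set TMP_DIR to a folder on the same drive as the LB image folder to speed up file operations. Note that the script deletes and recreates TMP_DIR on every run, so point it at a dedicated folder. Change DELETE_DUPLICATES to true only once you are happy with the results from the dry run - there is no undo feature!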
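
The reason TMP_DIR matters is that moving files within one filesystem is a quick rename, while moving them across drives copies the data. One way to check that the two folders share a filesystem is sketched below. It assumes GNU stat (for the -c option), and the function name same_filesystem is hypothetical. TMP_DIR may not exist yet, so the example checks its parent folder.

```shell
#!/bin/bash
# Hypothetical helper: succeed if both paths live on the same filesystem.
# Assumes GNU coreutils stat; %d prints the device number of the path.
same_filesystem() {
  local a b
  a=$(stat -c %d "$1") || return 1   # fail if the first path is missing
  b=$(stat -c %d "$2") || return 1   # fail if the second path is missing
  [ "$a" = "$b" ]
}

# Example values copied from the script's defaults - adjust to your setup.
LAUNCHBOX_MEDIA_DIR="/mnt/storage-ssd/emulation/LaunchBox/Images"
TMP_DIR="/mnt/storage-ssd/lb-img-clean-temp"

if same_filesystem "$LAUNCHBOX_MEDIA_DIR" "$(dirname "$TMP_DIR")" 2>/dev/null ; then
  echo "TMP_DIR is on the same filesystem as the image folder"
else
  echo "TMP_DIR is on a different filesystem (or a path is missing)"
fi
```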

A list of duplicate files is written to lb-img-clean-log in TMP_DIR. I recommend running an 'image cleanup' via LB after running the script.
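
To get a quick feel for a dry run before enabling deletion, something like the following can count the entries in the log. This is only a sketch: count_log_entries is a made-up name, and it assumes the dry-run log layout the script produces, where section headers start with "***" and fdupes separates groups with blank lines.

```shell
#!/bin/bash
# Hypothetical helper: count file entries in a dry-run log, skipping the
# "***" section headers and the blank lines fdupes prints between groups.
count_log_entries() {
  grep -c -v -e '^\*\*\*' -e '^$' "$1"
}

# Example path built from the script's default TMP_DIR - adjust to your setup.
LOG_FILE="/mnt/storage-ssd/lb-img-clean-temp/lb-img-clean-log"
if [ -f "$LOG_FILE" ] ; then
  echo "Files flagged as duplicates: $(count_log_entries "$LOG_FILE")"
fi
```

In a dry run, entries from the second search point into the working folder under TMP_DIR, since files are pooled there while fdupes compares them.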

 

MAKE SURE YOU HAVE A BACKUP!

 

lb-img-clean.sh:

#!/bin/bash

LAUNCHBOX_MEDIA_DIR="/mnt/storage-ssd/emulation/LaunchBox/Images"
TMP_DIR="/mnt/storage-ssd/lb-img-clean-temp"
DELETE_DUPLICATES=false #set to true to delete
######
if [[ ! -d "$LAUNCHBOX_MEDIA_DIR" ]] ; then
  echo "LB media dir does not exist: $LAUNCHBOX_MEDIA_DIR"
  exit 1
fi
if ! command -v fdupes >/dev/null 2>&1 ; then
  echo "fdupes not found"
  exit 1
fi
if [[ $DELETE_DUPLICATES = true ]] ; then
  while true; do
    read -r -p "DELETE DUPLICATES ENABLED! Do you want to continue? [y/n] " yn
    case $yn in
      [Yy]* ) read -t5 -n1 -r -p 'Press any key to start, or wait five seconds... '; break;;
      [Nn]* ) exit 0;;
      * ) echo "Please answer yes or no.";;
    esac
  done
else
  echo "Dry run..."
fi
WORKING_DIR="$TMP_DIR"/working
if [[ -d "$TMP_DIR" ]] ; then
  rm -r "$TMP_DIR"
fi
mkdir -p "$WORKING_DIR"
cd "$TMP_DIR" || exit 1
LB_DIR_SIZE=$(du -h -d0 "$LAUNCHBOX_MEDIA_DIR" | awk '{print $1}')
LB_DIR_COUNT=$(find "$LAUNCHBOX_MEDIA_DIR" -type f | wc -l)
LOG_FILE="$TMP_DIR"/lb-img-clean-log
# Split command output on newlines only, so paths containing spaces stay intact
IFS=$'\n'
echo -en "\n"
echo "LB media dir: $LAUNCHBOX_MEDIA_DIR"
echo "Script temp dir: $TMP_DIR"
echo "LB img dir pre size: $LB_DIR_SIZE"
echo "LB img dir pre count: $LB_DIR_COUNT"
echo -en "\n"
echo "Duplicate filename, different extension search..."
echo "***duplicate filename, different extension search..." > "$LOG_FILE"
# Log (and optionally delete) every JPG that has a PNG with the same name.
# find output is null-delimited so unusual filenames are handled safely.
while IFS= read -r -d '' i ; do
  filename="$(basename "$i")"
  directory="$(dirname "$i")"
  filename_noext="${filename%.*}"
  if [[ -f "$directory"/"$filename_noext".png ]] ; then
    echo "$i" >> "$LOG_FILE"
    if [[ $DELETE_DUPLICATES = true ]] ; then
      rm -- "$i"
    fi
  fi
done < <(find "$LAUNCHBOX_MEDIA_DIR" -type f -name "*.jpg" -print0)
echo "Starting fdupes search..."
# Collect content-identical files, one image directory at a time
while IFS= read -r -d '' d ; do
  fdupes "$d" >> "$TMP_DIR"/fdupes
done < <(find "$LAUNCHBOX_MEDIA_DIR" -mindepth 2 -type d -print0)
echo "Preparing fdupes matches for file searches..."
# Reduce each match to its pooling prefix: drop the extension, then the last two characters
while IFS= read -r line ; do
  stem="${line%.*}"
  printf '%s\n' "${stem%??}" >> fdupes-cutted
done < fdupes
echo "Searching for duplicate entries..."
sed -i '/^$/d' fdupes-cutted
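# Sort first so repeated prefixes are adjacent, then keep one copy of each repeated prefix
sort fdupes-cutted | uniq -d > fdupes-cutted-uniq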
echo "***fdupes search..." >> "$LOG_FILE"
while IFS= read -r line ; do
  # Start each pool with an empty working directory
  if [[ -d "$WORKING_DIR" ]] ; then
    rm -r "$WORKING_DIR"
  fi
  mkdir -p "$WORKING_DIR"
  directory="$(dirname "$line")"
  # Pool every file that shares this prefix
  mv "$line"* "$WORKING_DIR"
  # Stamp files with increasing mtimes in name order, so the lowest-numbered copy is the oldest
  OFFSET_IN_SEC=0
  for file in "$WORKING_DIR"/* ; do
    OFFSET_IN_SEC=$(( OFFSET_IN_SEC + 1 ))
    TOUCH_TIMESTAMP=$(date -d "$OFFSET_IN_SEC sec" +%m%d%H%M.%S)
    touch -t "$TOUCH_TIMESTAMP" "$file"
  done
  echo "***source directory: $line... " >> "$LOG_FILE"
  if [[ $DELETE_DUPLICATES = true ]] ; then
    fdupes -dNf "$WORKING_DIR" >> "$LOG_FILE"
  else
    fdupes -f "$WORKING_DIR" >> "$LOG_FILE"
  fi
  # Return the surviving files to their original directory
  mv "$WORKING_DIR"/* "$directory"
done < fdupes-cutted-uniq
echo "Image cleanup completed..."
if [[ $DELETE_DUPLICATES = true ]] ; then
  LB_DIR_SIZE=$(du -h -d0 "$LAUNCHBOX_MEDIA_DIR" | awk '{print $1}')
  LB_DIR_COUNT=$(find "$LAUNCHBOX_MEDIA_DIR" -type f | wc -l)
  echo "LB img dir post size: $LB_DIR_SIZE"
  echo "LB img dir post count: $LB_DIR_COUNT"
fi
echo "Log file located at: $LOG_FILE"
exit 0

 

Edited by teeedubb