LaunchBox Community Forums


TV Series Scraper


Thank you for checking out TV Series Scraper! This tool will give LaunchBox and Big Box users the ability to easily add TV episodes to their library.

This is an AutoHotkey (AHK) script written in v1 syntax. It uses TMDB (The Movie Database) as its source for metadata and images.

Users must supply their own API key from https://www.themoviedb.org/. The key is 100% free of charge; you only need to sign up for an account and request an API key from within your user profile.

Within the .7z file there is a .ahk version and a .exe version. Both will operate exactly the same! If you are not familiar with setting up AHK then please use the .exe version.


#WHAT IT DOES#
-It will edit your PLATFORM XML file accordingly to add all data into LaunchBox and download both SEASON and EPISODE specific images
-SORT TITLE will be applied in the format of "TV SERIES SXXEXX EPISODE TITLE" to organize all series episodes together in proper sequential order
-SEASON specific images are saved into the platform's BOX - FRONT directory
-EPISODE specific images are saved into the platform's SCREENSHOT - GAMEPLAY directory
-METADATA applied will be the following:

  • Series title (within sort title)
  • Series genre(s)
  • Series network(s)
  • Sort Title (in the format of SERIES TITLE SXXEXX EPISODE TITLE)
  • Season number (within sort title)
  • Episode number (within sort title)
  • Episode title
  • Episode overview
  • Episode air date
  • Episode Run Time (*only if custom field was added - see notes below)


#HOW TO USE#
-Import your video files into your platform
-Without editing these entries, close LaunchBox
-Open LaunchBox again, and now BULK EDIT the entries as follows

  • Edit the SERIES field with the TV SERIES NAME
  • Edit the RELEASE DATE field with ANY DATE
  • Optional but recommended
    • Add a CUSTOM FIELD by doing the following
      • Edit a SINGLE ENTRY
      • Go to CUSTOM FIELDS tab
      • Add a custom field name called "Run Time"
      • Enter any value into the value field, for ex: "20"
        • Custom fields only save if a value is assigned to at least a single entry
      • Now BULK EDIT all entries again to edit the custom field "Run Time". Enter any value into the field

-Close LaunchBox

-Open TV Series Scraper

-If this is your first launch, you must go to API KEYS tab and enter in your TMDB API KEY!
-On the TMDB tab, enter the TV SERIES into the TV SERIES field
-Select the specific platform XML file using the BROWSE button
-Then click the SEARCH button

  • You will be prompted if the search result is correct or if you want to see the next result

-Once you select a result all matching entries in your LaunchBox library will be populated into the EPISODES LIST drop down

  • This is to review if all of your episodes are in proper sequential order

-Also once a search result is selected, if the TV series has additional "episode groups" then you will be able to change EPISODE GROUP TYPE to "Alternative" and then select the EPISODE GROUP NAME of your choice
-Once you are ready with your settings, click SCRAPE to begin the process!
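For readers curious what the SEARCH and SCRAPE steps look like at the API level, here is a minimal Python sketch that only builds request URLs. The post does not show the script's actual requests, so treat this as an assumption: it uses the TMDB v3 endpoints `/search/tv` and `/tv/{series_id}/season/{season_number}` with the key passed as an `api_key` query parameter, and the function names are hypothetical.

```python
from urllib.parse import urlencode

TMDB_API = "https://api.themoviedb.org/3"


def search_tv_url(api_key: str, series_name: str) -> str:
    """URL for searching TV series by name (the SEARCH button step)."""
    query = urlencode({"api_key": api_key, "query": series_name})
    return f"{TMDB_API}/search/tv?{query}"


def season_url(api_key: str, series_id: int, season_number: int) -> str:
    """URL for one season's details; specials live under season 0."""
    query = urlencode({"api_key": api_key})
    return f"{TMDB_API}/tv/{series_id}/season/{season_number}?{query}"


# 'YOUR_KEY' and the series id 123 are placeholders, not real values.
print(search_tv_url("YOUR_KEY", "Modern Marvels"))
print(season_url("YOUR_KEY", 123, 0))
```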


#SETTINGS & FEATURES#
TMDB Tab

  • SKIP SPECIALS checkbox will do just that. If a TV series has specials, as in, episodes that do not correlate to any particular season, these can be skipped over if the checkbox is enabled
  • EPISODE LIST lets you review whether your episodes are all in proper sequential order. If they are, you are free to pick either image naming format: ENTRY TITLE or FILE NAME. If your episodes are NOT in proper sequential order, then you should ONLY use ENTRY TITLE. If FILE NAME is used and your episodes are not in sequential order, then images will not be assigned to the proper entry!

Batch File Rename Tab (see below for details)

API Keys Tab

  • Go to this tab to enter in your TMDB API key. The script will NOT function without this key!

Settings Tab

  •  Search settings
    •  Each of these settings uses different logic to match against your entry's file name. Choose the one(s) that best fit your files' naming scheme
    • Each checkbox displays a tooltip with an example of what it matches on
    • Use first search result bypasses the prompt that asks the user to confirm they have the right series
    • Exact match only checkbox is the strictest search method. It requires finding the SERIES TITLE, the SEASON AND EPISODE NUMBER, and the EPISODE NAME in your entry's file name. This search method is ALWAYS tried first by default, but when this checkbox is enabled, all additional fallback search options are DISABLED
  •  Image settings
    • ENTRY TITLE will name all images after the entry's title (episode title)
    • FILE NAME will name all images after the entry's file name. Do NOT use this setting if your episode list is not completely in proper sequential order, since images will not end up being named correctly
    • Download first image found is permanently enabled. Potential improvement for future revision is to give users ability to choose additional images should they exist
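The search settings above (a strict exact match first, then looser fallbacks) can be pictured with a small Python sketch. This is not the script's code: the function names, the normalization step, and the order of the two fallbacks shown are all assumptions made for illustration.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation so comparisons are forgiving."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_level(file_name, series, season, episode, episode_title, exact_only=False):
    """Return how a file name matched an episode, or None if it did not.

    'exact' requires series title, SxxExx code, and episode title.
    The fallbacks are hypothetical stand-ins for the optional checkboxes.
    """
    name = normalize(file_name)
    code = f"s{season:02d}e{episode:02d}"
    has_series = normalize(series) in name
    has_code = code in name.split()
    has_title = normalize(episode_title) in name
    if has_series and has_code and has_title:
        return "exact"
    if exact_only:
        return None  # Exact match only disables every fallback
    if has_code:
        return "season_episode"  # loosest: nothing ties it to one series
    if has_title:
        return "episode_title"
    return None


print(match_level("Modern Marvels S12E51 Coffee.mkv", "Modern Marvels", 12, 51, "Coffee"))
print(match_level("S12E51.mkv", "Modern Marvels", 12, 51, "Coffee"))
```

The second call shows why the season and episode number option is risky on a platform holding several series: the file name says nothing about which show it belongs to.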

Additional Notes

  • If you choose to add multiple TV series to a single platform, then it is recommended to NOT use the "Season # & episode #" search option (SXXEXX). This option only needs to find, for ex, "S09E10" in a file name, so it will likely apply metadata and images for TV series "A" to a TV series "B" entry! This is the "loosest" search setting, as it checks nothing specific to the TV series title or to the individual episode title.
  • Going off the above note, if all your TV series are separated into their own individual platforms, then using the "Season # & episode #" search option is very reliable assuming your file names have SXXEXX in them
  • If you are having issues getting good matches even with various search settings, then you should use the Batch File Renamer to get your file names better suited!
  • SPECIALS, as in episodes that do not correlate to any specific season, will be considered part of "season 0" and given a sort title in LaunchBox as "SERIES TITLE S00EXX EPISODE TITLE"
  • If you want to provide SEASON SPECIFIC images rather than have them downloaded from TMDB then place your image files into the "Images To Duplicate" folder prior to scraping.
    •  Images must be named in the following format:
      •  SERIES TITLE SXX
      •  For ex: Image file name "Archer S05.png" will be applied to every episode of Archer from its fifth season
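As a rough illustration of the "SERIES TITLE SXX" naming rule for the "Images To Duplicate" folder, the Python sketch below parses such a file name into a series title and season number. The regular expression and the function name are my own assumptions, not taken from the script.

```python
import re
from pathlib import Path

# 'SERIES TITLE SXX' with exactly two season digits (assumed from the example).
SEASON_IMAGE = re.compile(r"^(?P<series>.+) S(?P<season>\d{2})$", re.IGNORECASE)


def parse_season_image(file_name: str):
    """Return (series_title, season_number), or None if the name does not fit."""
    match = SEASON_IMAGE.match(Path(file_name).stem)
    if match is None:
        return None
    return match.group("series"), int(match.group("season"))


print(parse_season_image("Archer S05.png"))
print(parse_season_image("Archer.png"))
```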

 

#BATCH FILE RENAMER#
-This tool will batch rename files to the following format: SERIES TITLE SXXEXX EPISODE TITLE
-Having files named like this will guarantee good matches with even the strictest default setting of Exact match only!
-This tool will only rename files on a per season basis. For ex, if a TV series has 10 seasons and you want them all renamed, you will need to run this batch file rename 10 separate times
-The files in the selected directory MUST be in PROPER SEQUENTIAL ORDER! However, they can be named LITERALLY ANYTHING!
-The batch file renamer takes the first file found and gives it the name of the first episode for the series and season you entered. It then takes the second file found and gives it the second episode's name, then the third file found and gives it the third episode's name, and so on.
-To use do the following:

  • Enter the TV SERIES into the TV SERIES field
  • Enter the SEASON NUMBER into the SEASON NUMBER field
  • Enter your files' FILE EXTENSION into the FILE EXTENSION field
  • Select the directory which you want to batch rename files using the BROWSE button
  • Click the SEARCH button
    • You will be prompted if the correct search result came up.
  • Once the proper search result is selected you can then select an ALTERNATIVE EPISODE GROUP should you want to use one, if the series has more than just the default episode group.
  • With all fields filled in as desired, click the RENAME button. Your files will be renamed accordingly in just seconds!

-CAUTION IS ADVISED! Back up your files first! There is no undo button associated with this action should you make a mistake
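The pairing logic of the batch file renamer (first file gets the first episode's name, and so on) can be sketched in Python as below. This is a dry run that only returns the planned old and new paths and renames nothing, which fits the caution above. The function name, the alphabetical sort, and numbering episodes from 1 are assumptions.

```python
from pathlib import Path


def rename_plan(directory, extension, series, season, episode_titles):
    """Pair files (in sorted order) with episode titles and propose new names.

    Nothing is renamed here; the caller can review the plan first.
    Files beyond the number of known episode titles are left out.
    """
    files = sorted(Path(directory).glob(f"*.{extension.lstrip('.')}"))
    plan = []
    for number, (path, title) in enumerate(zip(files, episode_titles), start=1):
        new_name = f"{series} S{season:02d}E{number:02d} {title}{path.suffix}"
        plan.append((path, path.with_name(new_name)))
    return plan
```

After reviewing the plan, each pair could be applied with `old.rename(new)`. Episode titles containing characters that Windows forbids in file names would need cleaning first, which this sketch does not attempt.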

 

#GETTING TMDB API KEY#
-Once you have logged into your TMDB account you can click on your user name icon in the top right corner. In the pop-up menu click on "Edit Profile".
-In the left hand list click on API which will be near the bottom of the list
-Here you will be able to request and retrieve your own API key

 

#THANK YOU#
@Whatscheiser worked as a beta tester for v2.0 and their feedback was critical to making the tool better! Your time and efforts are much appreciated!

 

 

#EXAMPLE IMAGES#

[Image: TVSeriesScraper.png]

[Image: TVSeriesScraper-BatchFileRename.png]

[Image: TVSeriesScraper-Settings.png]

Example for use in Big Box

[Image: ArcherBigBoxExample.png]

 

#EXAMPLE VIDEO#

Showing the setup process to import and scrape for an entire season in just a few minutes!

Please note video editing took place during the scrape progress so the video could be shortened

 


#SUPPORT#
Please keep all questions and requests for help in the main discussion and support thread. If you are reading this, you are currently in the main discussion and support thread!


 

Edited by skizzosjt
v2.0

  • 3 weeks later...
On 2/3/2024 at 5:15 PM, topderob said:

how to open and/or use this ahk file ?

 

Once you have the .ahk file downloaded you can put it wherever you want it to live. But do take note that a supplemental folder and config file will be created in the same spot. So it may be a good idea to put it in a folder named "TVDB Scraper" so its related files aren't living in a folder with a bunch of other files and/or folders. But that's totally up to you; it can work from any location.

Once you have the .ahk file downloaded you will be able to simply double click it or open it any traditional way you would open a file. If that doesn't work out of the box, the likely issue is that you don't have any executable assigned to the .ahk file extension and are being asked what to use to run the file. You simply need to select an AHK executable to assign to it at that point.

  1. Right click on any .ahk file
  2. Go to Open With > Choose Another App
  3. On the pop up window that appears, click on More Apps
  4. Scroll down to bottom of list and click on Look for another app on this PC
  5. Navigate to the AHK exe included with LaunchBox. It will be located in \LaunchBox\ThirdParty\AutoHotkey
  6. Select the AHK exe

Now all .ahk files will use this AutoHotkey.exe executable to run. You can now simply double click all .ahk files to run them.

Please let me know if you need any more assistance!

Edited by skizzosjt

  • 3 weeks later...

Hello, I've really been enjoying having this on hand to assist with adding data to my TV Show entries. Suddenly having an issue with the script today though. I do everything as I had been doing, but when I hit the button to scrape for an episode the script times out then shows an error message asking if I'd like to close or continue...

...and I was just about to do this again so I could screenshot the error message and post it here. When I did, the script suddenly worked. Odd since I failed out the three times prior. I suppose never mind. I guess I'll see if it happens again. (EDIT:: It seems like it's just a timeout with contacting TVDB. The show I'm currently scraping has north of 400 episodes. It might just be taking too long to look through all of that data and so it's throwing an error. I don't think anything is actually broken).

::EDIT 2::

Eh, yeah for some reason it just doesn't match at all now. Can't really put together why. Just all of a sudden in the middle of season 12 I stopped getting any matches from TVDB for this show. Haven't done anything different than the prior seasons. Weird, man.

---------------------------------------------------------------

While I'm here though I do have a couple other issues to bring up. When scraping TVDB and adding information to LaunchBox's XML files I'm finding I'll always have an issue if the title of an episode or the notes contain "&". The XML format interprets it as some type of incomplete entry. LaunchBox will report a corruption error when starting up. Luckily it also reports the file and what line and position the error is at, so it's a pretty trivial thing to just manually edit out the mistake, but it does happen fairly frequently.

The other issue I've noticed is with TVDB's formatting for the release date of an episode and the network it aired on. Every so often the air date is not listed. When this happens the script will sometimes mistakenly enter the network call sign where the XML file expects the release date. LaunchBox will then interpret the field to have invalid data at start up and again will report that the file is corrupted. And again, it's easily fixed with a manual edit.

Just thought those might be worth mentioning.

Edited by Whatscheiser

Replying to Whatscheiser's post of 3/2/2024 at 3:49 PM:

Hey there Whatscheiser, I'll address all your points here. I really appreciate the feedback you continue to give.

1) 400 episodes sounds like a lot. The largest show I know I scraped is 300 something episodes. The number of episodes impacts the time taken. The script first scrapes all the relevant data from the website before anything else. So, it might act like it's frozen during that first scraping process since it's downloading said data. The message you got comes up when it detects a scrape is completed, so it must have thought it went through everything. That can happen if you get no responses from the website. If it's still not working for you, that is strange, since it's not like the script went through some update and the code changed. To be clear for me, are you saying it doesn't work at all anymore, no matter the show or season? Doesn't work for a specific show? Doesn't work for a specific season of a specific show?

2) I actually thought I took care of the ampersand character (&) as I'm well aware of that issue, but as I think about it, I likely did that for a different tool and not this one lol. That is an easy fix for me to implement; I'm kinda surprised I didn't run into that issue myself. Glad you're hands-on enough to understand the issue and fix it with a manual edit of your platform file when it occurs!

3) The air date not being on TVDB is bizarre. I wouldn't think they allowed incomplete entries like that (like at least put in the year rather than blank if exact date is unknown), but as I continue to work on this project I'm less of a fan of TVDB and more of a fan of TMDB now. After looking at TVDB's site with inspector, I think I know why this is occurring but I will need to recreate it to know for sure.
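For anyone wondering why the ampersand in point 2 breaks the platform file: a bare "&" starts an entity reference in XML, so it has to be written as "&amp;". Here is a minimal Python sketch of the idea (the actual fix lives in the AHK script, and the helper name `xml_element` and the tag name are just examples).

```python
from xml.sax.saxutils import escape


def xml_element(tag: str, value: str) -> str:
    """Wrap a text value in an element, escaping &, < and > first."""
    return f"<{tag}>{escape(value)}</{tag}>"


# An unescaped '&' here would make the whole file fail to parse.
print(xml_element("Title", "Salt & Pepper"))
```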

 

With all that said, can you please provide an example of the 1st and 3rd issues, as in what show and season type are you scraping? I'll need you to please provide the URL you enter into the script for both issues. In case you wonder, I don't need the relevant video files on my PC. I can rename a bunch of files to mimic having them in my library for troubleshooting, so this will give me a chance to recreate and hopefully fix these in a reasonable time frame.

 

Now I actually have a favor to ask of you! Any chance you would be willing to test a brand new version of this script? As I alluded to above, I looked into using other databases to improve this script, and TVDB is not as good as TMDB (The Movie Database, which despite the name does have an extensive database of TV series). So I am making this new version use their database, and it's using their API rather than scraping class names, among a handful of other improvements I hope users will like. It would be great to get some feedback on it, especially since you are already familiar with the current version. Please let me know if you would be into giving it a test drive.


1. I'll see what I can do about providing an example. TVDB seems to be having issues of its own tonight. Nothing but a lot of 502 errors on several different browsers. At any rate I didn't really get the error message I spoke of in my previous post any longer. I wish I could be of more use for diagnosing the issue but I don't have much aside from... it worked for 12 or so seasons, and then it didn't. Which I know is not super helpful information. ...What I can say is the scraper runs. I can see it making progress on scanning the 400 plus some odd entries, but it just will not match my episodes regardless of how I name them. I can make the file name a 1:1 match and it just seems to blow right past it.

I feel like a bit of a nerd here but the show I am currently working on is Modern Marvels from the old History Channel. Anyway, I had also used your script to pull info in for Andor, and Band of Brothers with no issue. Both of those were one season series and I wanted to do a larger show so, I tackled my largest one, which is Modern Marvels. Everything pulled good until about the last two episodes of Season 11. For whatever reason it just wouldn't grab them. So I did the shoulder shrug and moved onto season 12... then none of those matched either, same with season 13, 14, and 15. Then I just stopped trying it.

The URL I was entering in every case of error is this one: https://thetvdb.com/series/modern-marvels/allseasons/official

An example episode it wouldn't grab would be anything from the mentioned seasons, but I'll throw out one: S12E51 Coffee

Now I know the ideal convention for naming that you laid out would be "Modern Marvels S12E51 Coffee" I tried that when my normal naming convention stopped working but it didn't make a difference, unfortunately.

To be clear... I did try it with a new series just to make this post. I imported the first season of Miami Vice. Did not match a single episode for the first season (only tried the first season). I really wish I could account for why. All I can really keep saying here is that the script worked great, and then it didn't work. I just can't seem to account for it.

3. S06E05 The Police Car; S06E06 Plastics - two examples from the URL I gave above. The scraper will make a release date entry of "H2" if you request it pull in info for those episodes. The Modern Marvels TVDB page is kind of all over the place. It's pretty widely known that outside of show names and descriptions, it's not super accurate. So missing air dates don't really surprise me. ...Oddly enough I think the most accurate info I have found for the show is actually on RottenTomatoes.
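One way to guard against a value like "H2" landing in the release date field is to check that the scraped text actually parses as a date before writing it. The Python sketch below shows the idea only; the function name is hypothetical, and the "YYYY-MM-DD" default format is an assumption that would need to match whatever the source really returns.

```python
from datetime import datetime


def release_date_or_none(value, date_format="%Y-%m-%d"):
    """Return the date in ISO form if it parses, otherwise None.

    A network call sign such as 'H2' fails to parse and is rejected,
    so the release date field can simply be left out for that entry.
    """
    try:
        return datetime.strptime(value.strip(), date_format).date().isoformat()
    except (ValueError, AttributeError):
        return None


print(release_date_or_none("2004-03-17"))
print(release_date_or_none("H2"))
```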

 

For the last point, I'd definitely test out a new version of the script. I have plenty more shows to pull and anything I can do that helps contribute to better tools I'm more than happy to do.

Edited by Whatscheiser

I am getting no connection to thetvdb.com either. I would have to assume this is the issue with scraping not working until they clear this up and we know for sure.


When I try the script just now, it doesn't grab any data from them. This is evident because the "Total Series Webscrape Progress" count is only 1 rather than the number of episodes.

 

I think it should still find "S12E51 Coffee" assuming that is the file name, and "Coffee" is the episode's title. The last fallback is just the episode title, so that should work. For now I'll have to assume this is part of the issue with their website. So far it sounds like you stayed within the guidelines just fine. Sorry, this is another hurry up and wait thing before I can properly diagnose.

 

I also think I found the reason the network is being grabbed if the air date is absent. Gotta wait to test to know for sure. At the risk of sounding like a broken record, I'll hurry up and wait to test lol. Kinda weird situation: I have been playing with their API in the last week, and it still works fine right now. So, they must be having some issues that are purely website related.

 

Not kidding you, I watched some of those Modern Marvels episodes on YouTube recently lol! It's interesting stuff! No judgement from me!

 

Lastly, thanks for being willing to give the new version a whirl! I'm gonna get my ducks in a row on that and will shoot you a message with the new script and any details on what's new or different.

 


Hi @Whatscheiser coming back with an update

TVDB was back up later and the script is working as expected, so it seems any issue with getting nothing returned was down to their site/servers. Once I noticed they were back up, I had to recreate the issues you communicated and test my solutions.

Good news, the issues are resolved. Found a couple other things and fixed them up too. Bad news, there is something funny going on with Modern Marvels. That is the most incomplete series I've seen on TVDB, and I'm not sure yet if this is contributing to the issue since I have not been able to recreate this specific issue for any other series. I created dummy files through the end of season 13 using the batch file renamer and was just shy of 600 episodes. Not once did it ever scrape metadata beyond part way through season 12, always ending at episode #35, Mountain Roads. The scraping loop is stopping prematurely; it should instead scrape the entire webpage the user provides. This explains why it acts like it is suddenly stopping part way through such a large series and why later seasons get zero responses. The scraping loop should only stop when there are no more "class names" to scrape, at which point an error is returned that triggers the loop to stop. So, I know why it's stopping, but I haven't had enough time to dig into why an error is thrown at that specific spot. I will upload a new version as v1.1 to fix all other issues though. Hopefully I will have time to do that in the next day. Then I will likely return my focus to working on v2.0, specifically getting the beta test to you with relevant details.

Thanks for your patience on all this stuff!


Thanks for letting me know. I still had an issue scraping info for Season 1 of Miami Vice when I was using https://thetvdb.com/series/miami-vice/allseasons/official I didn't get one match which left me scratching my head again...

However, I swapped out and tried using https://thetvdb.com/series/miami-vice/allseasons/dvd and all of a sudden it matched every single episode and brought in poster art and proper gameplay shots as well.

What's odd is that aside from the data for both of these residing at different URLs, so far as I can tell, it's all the same data in terms of episode order, title and description. The DVD order even still has the air date information. Another one for the "weird" category.

 

I appreciate you taking another look into those issues with Modern Marvels. Your script helped me get the bulk of the episodes I have imported at any rate. It's not a deal breaker for me If I have to do some copy/paste work on the final few seasons, but it would be awesome to solve it. I've got plenty of patience for it, I just appreciate you taking the time to work on the script and share it. Like I said before, I'm just happy to be a help if I can be. Just let me know what I can do.

Thanks!


Hi @Whatscheiser v1.1 is now uploaded. I got all the issues fixed, including being able to scrape Modern Marvels in its entirety.

I do notice I am getting errors from TVDB more often. That "whoops looks like something went wrong" and "502 bad gateway" web page response kinda stuff. When that is occurring and you try to scrape, it will not work and the script will likely return an error stating it timed out. To be clear, this is not the script's fault; it's the website/servers not responding or being down. It can be deceptive too. It might be able to download, say, the first three seasons of some series, but then hang up on number four. For the script to fully work, not only does the user provided URL need to be responding properly, but so does the page for each specific season number of that season type. For ex with Modern Marvels.

This main page must be responding correctly

https://thetvdb.com/series/modern-marvels/allseasons/official

But also the season specific pages must respond correctly too, using Season 12 here

https://thetvdb.com/series/modern-marvels/seasons/official/12

If the main page doesn't work right, nothing scrapes.

If the main page works, but the season specific page doesn't work, it will not be able to download the season specific image and will eventually give you a time out error. You can keep trying, or exit the script at that point.
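If you want to check both pages yourself before scraping, the relationship between the two URLs above can be sketched in Python like this. The function names are hypothetical, and the URL pattern is inferred only from the two Modern Marvels examples in this post.

```python
import urllib.request


def season_page_url(all_seasons_url: str, season_number: int) -> str:
    """Derive the season specific page from an '/allseasons/<type>' URL."""
    base, marker, season_type = all_seasons_url.rstrip("/").rpartition("/allseasons/")
    if not marker:
        raise ValueError("expected a URL containing '/allseasons/'")
    return f"{base}/seasons/{season_type}/{season_number}"


def responds_ok(url: str, timeout: float = 10) -> bool:
    """True if the page answers with HTTP 200; a 502 or a timeout gives False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError and timeouts all derive from OSError
        return False


main_page = "https://thetvdb.com/series/modern-marvels/allseasons/official"
print(season_page_url(main_page, 12))
```

Calling `responds_ok` on both the main page and the season page before a scrape would show which of the two is failing.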

 

This likely explains why the official airing for Miami Vice gave you issues, but then the dvd order worked fine. One page was not responding properly, while the other was.

 

All of this BS doesn't happen with TMDB and the API that v2.0 of the script uses. So, I'll return my focus to that so users can stop having to deal with this seemingly random unreliability.


Ah that makes a lot of sense. Thanks again for taking a close look at these issues and uploading the revised script! I'll keep tinkering with it and let you know if I find anything else of interest. I'm looking forward to checking out v2.0 as well!


  • 3 months later...
12 hours ago, thecaptn311 said:

Trying to use this for the first time, and when I open the exe it creates the folders and then no GUI ever pops up. Windows 10 machine on a Ryzen 5600X. Not sure how I should approach this further.

 

Hi there! If the GUI never shows up, it is not finishing a specific startup task.

Can you please open a CMD Prompt window and then run this line?

winget install jqlang.jq

This is a Windows package utilized by the script to "pretty print" JSON responses in a human readable format rather than one giant string blob. The GUI waits to be shown until this package is first checked for and downloaded if not already present on your system. Your system must be getting hung up here, not being able to download it.
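If you want to confirm whether jq is already installed without running the script, a quick check is to see if it can be found on your PATH. The Python sketch below does that, and also shows what "pretty printing" means using Python's own json module (not jq itself). Both function names are just for illustration.

```python
import json
import shutil


def jq_available() -> bool:
    """True if a 'jq' executable can be found on PATH."""
    return shutil.which("jq") is not None


def pretty(json_text: str) -> str:
    """Re-indent a one-line JSON string so it is readable."""
    return json.dumps(json.loads(json_text), indent=2)


print("jq found:", jq_available())
print(pretty('{"name": "Archer", "seasons": [1, 2]}'))
```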

 

The response you should get should look like this. If you are getting something else, please post that and maybe we try and find a solution!

[Screenshot: expected output of the winget command]

