LaunchBox Community Forums

Launchbox Games DB External Scraper


HobbitCy


Good day. As a LaunchBox user on Windows, I love it :) But I also use Linux as my daily OS, and I've started setting up EmulationStation while I wait for LaunchBox to come over to Linux :p The problem is that thegamesdb is down. So the question is: is it possible to write a scraper, or modify an existing scraper, to use the LaunchBox Games DB? Thanks.

I think he means something completely different guys. He's asking if he can hook Emulation Station up to the LaunchBox Games Database. I have this planned but unfortunately I haven't built the services for it yet. Then after I do, the various other frontends will need to implement something to support it, so unfortunately it's only LaunchBox for now but this is coming in the near future.

Jason Carr said: "I think he means something completely different guys. He's asking if he can hook Emulation Station up to the LaunchBox Games Database. I have this planned but unfortunately I haven't built the services for it yet. Then after I do, the various other frontends will need to implement something to support it, so unfortunately it's only LaunchBox for now but this is coming in the near future."
Marvelous, I was going to suggest this. Having the Database available for everyone will be a win/win scenario. The DB alone can attract more users to LaunchBox, I think.

Jason, you nailed it. That's what I was asking for. EmulationStation has its built-in scraper, but a couple of people made standalone scrapers instead, and I was hoping for something like that. I look forward to seeing what's coming soon; in the meantime I'll see what I'm going to do with my collection in ES. Thanks.

  • 5 months later...

Am putting the finishing touches on (yet another) front-end right now. So many online DBs with so many problems...

I'd like to add this site as a scraping source. Just poking through in a browser, I think that would be straightforward, even without an API.

Jason, it sounds from your post in April like you are open to being scraped...I wanted to hear what limits you'd put on that, and what you mean by getting a solution going.


  • 4 months later...

Hey guys, sorry I missed @majormajor0's post above. The entire set of data from the LaunchBox Games Database is available for download here:

http://gamesdb.launchbox-app.com/Metadata.zip

This zip file is updated daily with all the latest metadata. This is exactly how LaunchBox itself ties into the data (it's much quicker than an API because all processing can happen locally). Inside of the zip file are the same three XML files that are found in the LaunchBox\Metadata folder. Images of course are downloaded directly from the website.
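For anyone who wants to poke at the archive before writing a real importer, here is a minimal sketch of the local-processing approach described above. It only lists the XML files in the zip and counts their top-level record types, because the file names and schema are not spelled out in this thread. The function name `summarize_metadata_zip` is made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_metadata_zip(zip_bytes):
    """Map each XML file in the archive to a count of its top-level element tags.

    Hypothetical helper: no file names or element names are assumed, so it
    works on any zip of XML files.
    """
    summary = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue
            counts = Counter()
            depth = 0
            with archive.open(name) as handle:
                # Stream the file so large metadata files need not fit in memory at once.
                for event, elem in ET.iterparse(handle, events=("start", "end")):
                    if event == "start":
                        depth += 1
                    else:
                        depth -= 1
                        if depth == 1:  # a direct child of the root just closed
                            counts[elem.tag] += 1
                            elem.clear()  # drop the record's children once counted
            summary[name] = counts
    return summary


# Usage (not run here, it hits the network):
#   import urllib.request
#   data = urllib.request.urlopen("http://gamesdb.launchbox-app.com/Metadata.zip").read()
#   for name, counts in summarize_metadata_zip(data).items():
#       print(name, dict(counts))
```

Since the zip is only refreshed daily, fetching it once a day at most and working from the local copy should be plenty.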


I interpreted silence as consent. It's not my highest priority right now, since I'm already scraping 3 databases, but when I get to it, I'll add LaunchBox. My plan is to add a delay of a second or two between requests to protect the site from my app turning into a DDoS attack. I very much suspect that most of the problems with GamesDB stem from them not enforcing limits over there.
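A delay like that can be sketched in a few lines. This is only an illustration of the idea, not code from any of the scrapers mentioned here. The `Throttle` class and its parameters are hypothetical names, and the clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum gap between successive calls to wait().

    Hypothetical helper for a polite scraper; not part of any library.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon since the last request: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` immediately before each HTTP request. With `Throttle(2.0)` the first request goes out straight away and every later one is held back until at least two seconds have passed since the previous one.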


Jason,

This metadata file you provide is fantastic--it fits right in with what I'm doing with the other DBs, which is to periodically cache the relevant parts in a local SQLite--much appreciated. I don't see how to get an image URL from it, though.

The DB I am working on is my own little hobby project. I don't have it in front of me, but it looks superficially like your much more professional effort, and in fact, had I seen yours before I started I would probably have never started.

The philosophy is to sort through all of the mountains of available data and pare it down to official releases, group them by game, then filter by metadata. I am starting from Datomatic and the MAME DB to discard hacks and unplayable games. So while the front end will acknowledge and catalog all the releases of Excitebike, for instance, it indents them together under one game and flags one of the releases as preferred for launching. I'm also cutting it off at 4th generation consoles in order to bound the project.

The other component to it is tools to map the games to online DBs to gather as much available metadata as possible. Part of that is making it so if you are playing a European release, you are looking at European boxart. Right now I have online DB matches for about 90% of non-MAME (Datomatic) Games/Releases, but only about 30% of MAME machines. I was very encouraged, recently, that when I refreshed all of the cache after a few months, there were hundreds of new hits.

Right now I have been tearing my DB down and rebuilding it with a mildly different structure. When it's back together, I'll post it for public consumption in case it's of any use. It may well be useful, once I add a column for Launchbox, because it will be mapping your DB to 3 other DBs from then on.



  • 1 year later...

Well, after much twiddling, this is kind of done to a beta level. The front end is what it is--it is no launchbox, to put it mildly. But I believe the database could have value for other developers (see below).

After a couple of false starts, I scrapped the website, but the GitHub is here (https://github.com/MajorMajor0/Robin). It includes the database as a SQLite file. There is no install.

The driving purpose of this project is to sort and eliminate the mountains of junk in the ROM world and boil it down to unique, playable, non-junk games.

Database features:
    - 25,960 separate releases sorted into 12,100 unique, playable, non-junk games
    - All platform data up to 4th gen
        - Essentially limited to sprite-based consoles and handhelds
        - This is because later gen games are impractically large
        - I have tools to pretty rapidly add any requested platform, especially a platform in Datomatic.
    - MAME data for .195.
        - Note that this is limited to playable games and their parents, in keeping with the purpose of eliminating junk--i.e., games not in the DB should be assumed junk.
        - Note also that the MAME version can be easily changed--up or down given access to a MAME exe file.
    - Based on Datomatic (no-intro) for release, region and title data--this is after a lot of painful research to choose the cleanest data
    - Metadata pulled from GamesDB, GiantBomb, LaunchBox, and OpenVGDB, cleaned and merged
        - The DB as a by-product contains matches between all of these databases, as well as Datomatic
        - A major feature that would be hard to find anywhere else is the cross-reference here between ROM checksums and games in these major databases
    - All standard metadata found in any of these DBs
    - Custom metadata that took a lot of effort to put together, but is part of the junk-filtering process.
        - Junk: 509 games
        - Adult: 230 games
        - Not game: 710 games (for instance, TI-89)
        - Mess machine: 703 games (not a standard video game, for instance an LED handheld or arm-wrestling machine)
        - Multiple releases (clones) are gathered into games in order to bundle and hide duplicates

Front-end features
    - Goes through your mountains of ROMS, tosses garbage and duplicates, and sorts and identifies the good stuff (nothing is deleted)
    - Shows what you have and don't have
    - Filtering in milliseconds on 13 metadata properties
    - Autofiltering shows only valid remaining filter choices
    - Real-time, very fast text filtering
    - Display by individual release, or grouped into unique games
    - Display platforms and emulators
    - Update database with latest and greatest from online DBs, search for new matches
    - Rudimentary CLRMAMEPro style functionality.
        - Orders of magnitude faster than CLRMAMEPro or ROMCenter
        - Not as good as either of these excellent programs, yet, since it is just started. Still, faster.

So, again, the front-end is what it is, but I think the database could be a resource for developers.
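To give developers a feel for the checksum cross-reference mentioned in the database features, here is a rough sketch in SQLite. This is not the actual Robin schema. The table names, column names, and the choice of CRC32 as the checksum are all assumptions for illustration, and the IDs in the usage note are made up.

```python
import sqlite3
import zlib

# Hypothetical schema: releases are grouped under games, and each release
# can carry one ID per online database it has been matched to.
SCHEMA = """
CREATE TABLE games (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE releases (
    id INTEGER PRIMARY KEY,
    game_id INTEGER NOT NULL REFERENCES games(id),
    title TEXT NOT NULL,
    region TEXT,
    crc32 TEXT UNIQUE,                        -- assumed checksum type
    is_preferred INTEGER NOT NULL DEFAULT 0   -- the release flagged for launching
);
CREATE TABLE external_ids (
    release_id INTEGER NOT NULL REFERENCES releases(id),
    source TEXT NOT NULL,                     -- e.g. 'LaunchBox', 'GiantBomb'
    external_id TEXT NOT NULL,
    PRIMARY KEY (release_id, source)
);
"""


def crc32_hex(data):
    """Return the CRC32 of a ROM's bytes as 8 uppercase hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08X")


def lookup_by_checksum(conn, crc):
    """Find the game, region, and every online-DB match for a ROM checksum."""
    return conn.execute(
        """
        SELECT games.title, releases.region, external_ids.source, external_ids.external_id
        FROM releases
        JOIN games ON games.id = releases.game_id
        LEFT JOIN external_ids ON external_ids.release_id = releases.id
        WHERE releases.crc32 = ?
        ORDER BY external_ids.source
        """,
        (crc,),
    ).fetchall()
```

With something like this, a front end can hash a ROM file, call `lookup_by_checksum`, and get back one row per online database the release has been matched to, e.g. `('Excitebike', 'Europe', 'LaunchBox', 'lb-123')` where `lb-123` is a placeholder ID.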


@majormajor0 That is quite interesting. I am working on my own project to form a super database of everything, but I am still in the early stages. I was planning to export things like cross-database maps, etc., for use in plugins and tools, but I am not there yet. I am currently using MS SQL Express for mine, mainly for the speed and familiarity, just for the data build. I currently have LaunchBox GamesDB data, which I access the same way LaunchBox itself does, and MobyGames data (which I have explicitly tagged as "no export" aside from the game urls/ids due to their terms).

My struggle is that I want to document everything in very fine detail so that any sort of simplified generated data set could be made as needed (and if they are really common probably added as views for speed of generation).


I chose SQLite for portability and deployability. I could never figure out how to deploy SQL without an install. But also, I've long thought (as I mentioned) that the real value in the project is in the DB. And having it in SQLite makes it easier, I think, for people to get at it.

I've been meaning to add MobyGames and GameFAQs. I believe both of them have good region data, which is a weak spot in most DBs. But unless I get some outside interest in this, I'm not sure much further effort is justified.

I also thought that the cross-DB maps might be useful here to punch up the LaunchBox GamesDB. With the tools I've written, one could, in a matter of hours, basically absorb 3 other DBs into Launchbox.

By the way, the front end contains tools to rapidly compare databases and find matches. It standardizes the titles by tossing meaningless characters and spaces, and replacing probable inconsistencies (e.g., IV = 4). Then for unmatched entries, it takes the Levenshtein distance and sorts out the nearest possible matches so you can go in and look for manual matches. I think this stuff is highly useful, but, again, more so for the DB than for a front end user.
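For anyone who wants to try the same idea, the matching steps above can be sketched roughly as follows. This is a simplified illustration, not the code from the project. The function names and the Roman-numeral table are assumptions, and a real tool would need more substitution rules.

```python
import re

# Assumed substitution table; "i" and "x" are left out because they are too
# often ordinary letters or words.
ROMAN = {"ii": "2", "iii": "3", "iv": "4", "v": "5",
         "vi": "6", "vii": "7", "viii": "8", "ix": "9"}


def normalize_title(title):
    """Lowercase, drop punctuation and spaces, and map Roman numerals to digits."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "".join(ROMAN.get(word, word) for word in words)


def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute), two rows at a time."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def nearest_matches(title, candidates, limit=3):
    """Return the candidate titles closest to `title` after normalization."""
    key = normalize_title(title)
    scored = sorted((levenshtein(key, normalize_title(c)), c) for c in candidates)
    return [candidate for _, candidate in scored[:limit]]
```

Titles whose normalized forms are identical (distance 0) can be matched automatically, for instance "Final Fantasy IV" and "final fantasy 4". The sorted near misses are the ones worth checking by hand.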

Edited by majormajor0
