Platinuming the PSNProfiles player scraper: challenging complexity and swelling scope
Monday 5 October 2020 / coding
Soon after completing my implementation of tic-tac-toe with AI for a Flatiron lab, it was time to start working on my first major independent project. In the last couple of months, a lot of my time has been taken up with moving and settling in to a new flat. Before the move I was only a few lessons and labs away from the project, and I think the feeling that I might have forgotten how to work with Ruby, combined with not really having a sense of the required scale of the project, led me to put my return to the bootcamp off a little longer than I really needed to.
Well, coming off the back of a somewhat straightforward scraping lab, reading the requirements for the project led me to wonder what the big deal was. What, I just need to scrape some data and let a user choose what they want to see? That should only take a few hours! What I didn't account for was that when an idea for an app that would touch on my own interests came into my head, my own interests - as well as an inquisitive inclination and a wealth of data - would lead me to expand the scope of the project again and again and again.
Scope and structure
So my original idea was to pull some data from PSNProfiles player pages and let the user pick a player and view their data. PSNProfiles is a website that presents information about the games played and trophies earned by players on PlayStation 3, PlayStation 4 and PlayStation Vita. I'm a bit of a 'trophy hunter' and have been a fan of the PSNProfiles interface for a few years now. Not long after the initial idea came to me, I decided to add more data and allow the user to choose what to view. Then I decided to add a feature for comparing two players. Then a friend (a fellow PlayStation gamer and coder) suggested adding an option to export (individual player) data to XML or JSON. Why not? Then I decided to add the ability to change the player without restarting the app. The final interface of the app is structured as below.
- Choose player
- View data
- Length of service
- Recent trophies
- Recent games
- Rarest trophies
- Export data
- Compare with another player
- Change player
Navigating through this structure, as well as user interaction and scraping, viewing and exporting data, is handled by three classes - CommandLineInterface, Player and Scraper - and a (very basic) executable file.
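To make that structure concrete, here is a minimal sketch of how a main-menu dispatch might look. The class name matches the one above, but the option-to-method mapping and method names are my own assumptions, not the project's actual code.

```ruby
# Hypothetical sketch of main-menu dispatch; the option-to-method mapping
# and method names are assumptions, not the project's actual code.
class CommandLineInterface
  OPTIONS = {
    "1" => :view_data,
    "2" => :export_data,
    "3" => :compare_players,
    "4" => :change_player
  }.freeze

  # Returns the method symbol for a valid choice, or nil for invalid input,
  # so the caller can keep re-prompting until the input is valid.
  def resolve(choice)
    OPTIONS[choice.strip]
  end
end
```

A loop around gets and resolve can then keep prompting (with puts guidance) until a valid option comes back.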
In my last blog post (linked above), I spoke about how engaging I found the challenge of building tic-tac-toe with AI and various CS50 problem set solutions. That kind of challenge - working out how to build things with methods and approaches I don't yet have the tools to fully execute - really drives my desire to keep coding. This project had far too many challenges to cover them all in great detail, so below I've summarised a few in three different categories.
- Project set up and working in a local environment: working out how the Gemfile and environment.rb files should be configured and what terminal/command-line interface commands need to be executed to get things going both for me as the developer and for users
- Validating user input in menus and app functions
Solution: using puts to guide the user to provide valid input
- Validating that a player profile exists and has public trophies
Solution: identifying page content or HTML elements that are unique to each case of an invalid or unscrapable player, and referring the user to the README for reasons a real player might not yet be scrapable
- Validating that a directory exists in the export function
- Returning to the CommandLineInterface's main menu from sub-menus and app functions within the Player class
Solution: passing in the instance of the CommandLineInterface (self) as an argument/'cli' parameter when moving to the Player class, so calling cli.main_menu would return to the same instance of CommandLineInterface, in turn retaining knowledge of the Player instance(s) it should work with
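The self-passing trick can be sketched like this; the method names (view_data, length_of_service) are illustrative assumptions, not the project's actual methods.

```ruby
# Sketch of passing the CLI instance into Player so control returns to the
# same CommandLineInterface; method names here are illustrative assumptions.
class CommandLineInterface
  def main_menu
    "back at main menu" # placeholder for the real menu loop
  end

  def view_data(player)
    # Pass this very instance along as the 'cli' parameter
    player.length_of_service(self)
  end
end

class Player
  def length_of_service(cli)
    # ...display the data here, then hand control back to the same CLI...
    cli.main_menu
  end
end
```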
Scraping was the first challenge I decided to tackle in building this app. Much of this was just like the scraping labs I'd already completed: take a URL, have Nokogiri parse it, then grab what you want by working out what HTML/CSS element, class, attribute, parent/grandparent element or any combination of the above distinguish the data you're after from the rest of the page. However, as the scope of the project grew, the extra data sources brought in a few different challenges, as outlined below.
- Scraping a series of data - recent games - and then iterating over it to collect the first 12 instances, or all of them if there are fewer than 12
- Scraping fields or elements that appear on some profiles but not others, or provide different types or formats of data dependent on the game or player's activities, then storing and displaying data of different classes within the same field/attribute
- Changing the format of multiple different scraped dates
Solution: using the DateTime class to parse the text and store it as an instance of that class, then using its strftime method to display dates in my preferred format
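The DateTime approach can be sketched as below; the input string and output format are examples, not the exact formats PSNProfiles uses.

```ruby
require 'date'

# Example only: the scraped string and target format are assumptions.
scraped   = "5 October 2020"
parsed    = DateTime.parse(scraped)       # store as a DateTime instance
formatted = parsed.strftime("%d/%m/%Y")   # display in a preferred format
```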
When the scraper was finished, I had a hash with around 30 key-value pairs, and further hashes and arrays nested within. Challenges that arose in dealing with this complex data structure and other aspects of the Player class are detailed below.
- Creating readers to account for all the top-level and nested data
Solution: adding all the attributes to an attr_accessor line, then combining the resulting reader methods with metaprogramming using send to assign all the top-level data to instance variables. For nested data, manual reader methods that use the top-level/instance variable readers and then dive deeper
- Dividing the wealth of data up into sensible and not overwhelmingly large groups to present to users, and interpolating everything into puts statements with clear, user-friendly presentation
- Exporting to XML/JSON
- Accounting for the different ways a user could specify a directory, in particular whether they include a slash at the end
Solution: checking the last character of the user's input by calling [-1] on the string, then adding a slash if there isn't one
- Transforming instance variables (back) into a hash to pass into the export methods
Solution: calling instance_variables on self to get all the instance variable names, then iterating over them, removing the @ and collecting the result of instance_variable_get into a new hash
- Not overwriting files if the user has previously exported a player's data
Solution: using File.exists? to check if the default filename exists in the directory, and if it does, finding a number that can be appended that won't also result in an existing filename
- Working with two Player instances at the same time to puts the same data for each instance side by side
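The instance-variables-to-hash transformation might be sketched as follows; the Player attributes here are invented examples, not the real ~30 scraped fields.

```ruby
# Sketch of rebuilding a hash from instance variables for export; these
# attributes are invented examples, not the project's real scraped fields.
class Player
  def initialize
    @name = "example_user"
    @trophy_count = 1234
  end

  def to_h
    instance_variables.each_with_object({}) do |var, hash|
      # :@name -> "name", paired with the value via instance_variable_get
      hash[var.to_s.delete("@")] = instance_variable_get(var)
    end
  end
end
```

The resulting hash can then be handed straight to a JSON or XML export method.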
The result of navigating all these challenges and more - and putting in the hours to work through the less challenging but equally time-consuming task of building out all the methods to deal with scraping and displaying all the data - can be seen in the demo below. You can also check out (and clone) the project code on GitHub.