This post walks through making API requests and handling JSON with SWAPI (The Star Wars API) to find the oldest character (person, robot or alien) in Star Wars, along with all the films they appeared in.

The full source code for this post is included in this notebook.


SWAPI

First, we need to explore the API website a little to find out how to use the API correctly:

image1

image2

Since we are looking for information about characters (people), our base URL is accordingly https://swapi.dev/api/people/. The following code fetches information about the characters that appear in Star Wars:

import requests

base = 'https://swapi.dev/api/people/'
people = requests.get(base).json()  # first page of the response, parsed into a dict


JSON

The result now stored in people is nested. Its count key says there should be 82 people in total, but the results list holds only 10 records. The reason is the next key: the current response is just one page, so we need to make more requests to collect the results from the following pages:

results = people['results'] # a list of dictionaries

while people['next']:
    people = requests.get(people['next']).json()
    results = results + people['results']
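
The same page-following idea can be sketched as a small generator. This is only an illustration, not the notebook's code: iter_results and get_json are hypothetical names, and the in-memory fake_pages dictionary stands in for the real API so the snippet runs without a network connection.

```python
from typing import Any, Callable, Dict, Iterator


def iter_results(
    first_url: str, get_json: Callable[[str], Dict[str, Any]]
) -> Iterator[Dict[str, Any]]:
    """Yield every record of a paginated listing by following its 'next' links."""
    url = first_url
    while url:  # the last page has next == None, which ends the loop
        page = get_json(url)
        yield from page["results"]
        url = page.get("next")


# In-memory stand-in for the API (made-up keys and records, for illustration only).
fake_pages = {
    "page1": {"results": [{"name": "a"}, {"name": "b"}], "next": "page2"},
    "page2": {"results": [{"name": "c"}], "next": None},
}

records = list(iter_results("page1", fake_pages.__getitem__))
print([r["name"] for r in records])  # ['a', 'b', 'c']
```

Against the real API, get_json could be something like lambda url: requests.get(url, timeout=10).json(), with the people URL as first_url.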


As the results variable is actually a list of dictionaries, we can easily transform it into a dataframe:

import pandas as pd
people_df = pd.DataFrame(results)


image3

As we can see, this dataframe has not been cleaned yet, and some of its columns contain nested lists. We could reshape it in various ways, but here we focus mainly on name, birth_year and films.


Analysis

A little exploration shows that the birth_year column has essentially two types of entries: unknown, for characters whose age we don't know, and values ending in BBY, for characters born Before the Battle of Yavin.

Therefore, we can only look at the ones we have age information about: the BBYs.

BBY = people_df[people_df.birth_year.str.endswith('BBY')] # keep only the BBY rows

# strip the trailing 'BBY' and convert the remaining number to float
BBY = BBY.assign(birth_year = BBY['birth_year'].map(lambda x: float(x.rstrip('BBY'))))

After keeping only the BBYs and removing the redundant string, we can easily find the oldest one by sorting. A larger BBY value means an earlier birth, so we sort in descending order:

oldest = BBY.sort_values('birth_year', ascending=False).head(1)
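
For readers who want to see the filter-and-sort logic without pandas, here is a minimal sketch in plain Python. The helper name bby_to_float is hypothetical, and the three sample records are a small illustrative subset written in the shape the API returns, not the full dataset.

```python
from typing import Optional


def bby_to_float(birth_year: str) -> Optional[float]:
    """Return the numeric part of a value like '19BBY', or None for anything else."""
    suffix = "BBY"
    if not birth_year.endswith(suffix):
        return None
    try:
        return float(birth_year[: -len(suffix)])
    except ValueError:
        return None


# Illustrative sample only; the real list comes from the paginated API results.
sample = [
    {"name": "Luke Skywalker", "birth_year": "19BBY"},
    {"name": "C-3PO", "birth_year": "112BBY"},
    {"name": "R5-D4", "birth_year": "unknown"},
]

# Pair each character with their parsed value and drop the unparseable ones.
parsed = [(bby_to_float(p["birth_year"]), p["name"]) for p in sample]
known = [(years, name) for years, name in parsed if years is not None]

# The largest BBY value belongs to the character born earliest.
oldest_years, oldest_name = max(known)
print(oldest_name, oldest_years)  # C-3PO 112.0
```

Slicing off the suffix instead of calling rstrip('BBY') is a deliberate choice in this sketch: rstrip removes any trailing B or Y characters rather than the exact suffix, which happens to work for this data but is easy to misread.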


image4

😃 It turns out that Yoda is the one we are looking for!

Now the only thing left is to handle the nested films column and make one more round of API requests to get the film names:

oldest_films = [requests.get(y).json() for x in oldest.films for y in x]
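
Each film response is itself a dictionary, so the names still have to be pulled out of it. The sketch below assumes the name is stored under a title key; film_titles and get_json are hypothetical names, and the fake_films dictionary with its made-up URLs and titles replaces the real requests so the snippet runs offline.

```python
from typing import Any, Callable, Dict, Iterable, List


def film_titles(
    film_urls: Iterable[str], get_json: Callable[[str], Dict[str, Any]]
) -> List[str]:
    """Fetch each film URL and return the 'title' field of every response."""
    return [get_json(url)["title"] for url in film_urls]


# Stand-in responses keyed by made-up URLs (assumption: films expose a 'title' key).
fake_films = {
    "film-a": {"title": "First Example Film"},
    "film-b": {"title": "Second Example Film"},
}

titles = film_titles(["film-a", "film-b"], fake_films.__getitem__)
print(titles)  # ['First Example Film', 'Second Example Film']
```

With the real data, film_urls would be the list stored in the films column of the oldest row, and get_json could wrap requests.get(url).json().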


image5

Therefore, The Empire Strikes Back, Return of the Jedi, The Phantom Menace, Attack of the Clones and Revenge of the Sith are the five films in which our oldest friend Yoda appears!