Home



2018/07 - SQL query basics!

Hi folks,

This post will be on SQL queries! Download and set up MySQL workbench from here, then download both videogame datasets from here. The workbench setup is pretty straight-forward, after which you just need to set up the database and import both datasets. I named the database "game_schema", and the tables "games" and "consoles", but feel free to call them whatever you want. 

Specify the database you want to use with the "use" keyword: use game_schema;

Check the schema of the table with desc. This will show you the fields and their data types: desc games;

Return all rows from the games table with the select statement and "all" operator: select * from games;

Same as above, but limit the results to the first thousand rows: select * from games limit 1000;

Return all games and order them by year in descending order: select * from games order by year desc;

Return all games and order them by year in ascending order: select * from games order by year asc limit 10;

Reutrn all games where the name contains the text "Call of Duty": select * from games where name like "Call of Duty%";

Return the game(s) with the most (MAXimum) sales in North America: select * from games where na_sales in (select MAX(na_sales) from games);

Return all games, except those covered in the above statements condition (not in): select * from games where na_sales not in (select MAX(na_sales) from games);

Return all games from the nineties (Between the year 1990 and 1999): select * from games where year between 1990 and 1999;

Return the game(s) with the lowest (MINimum) sales in Japan: select * from games where jp_sales in (select MIN(jp_sales) from games);

Return just one row for each platform in the games table with distinct. Distinct will only ever return one row, and you can combine it with other statements to write a more specific query. For example you could return a distinct row, set of rows with a distinct field, and so on: select DISTINCT(platform) from games;

Return the number of games for each platform in the games table (COUNT them up. The * operator just means count them all, but we could count a specific field up instead): select COUNT(*), platform from games group by platform;

A group by statement is usually used with a sum, count or other statement to specifiy a field that the aforementioned statement should be used against. For example the sum statement  adds together all of the results of a specific field, iin thise case "na_sales". This normally would return one row. But if we tell if to group by the games tables platforms, we can get the na_sales for every platform, splitting the former results by region: select SUM(na_sales), platform from games group by platform;

Return the  console name and sales with the highest Japan game sales using an inner join. An inner join will only return row where both involved table (games and consoles) have corresponding rows. For example, this statement will return all games with their respective consoles that have the most sales in Japan. If several games had the most sales in Japan, but one did not have a platform, it wouldn't appear in the results set because an inner join only returns rows from both tables where its join condition is true (INNER JOIN games on games.platform = consoles.platform). This is not a great example of how joins work because the data doesn't lend itself well to a join example. Usually you'll use joins where foreign keys are involved: select games.name, games.jp_sales, consoles.platform, consoles.UnitsSoldMillions from consoles INNER JOIN games on games.platform = consoles.platform where games.jp_sales in (select MAX(jp_sales) from games);

Same as above, but across all regions sales. We're adding all of the sales up and using our MAX statement on the resulting value. If you're familiar with virtual functions, an alias is kind of the same concept for a returned field name: select games.name, (games.jp_sales + games.na_sales + games.eu_sales + games.other_sales) as total_sales, consoles.platform, consoles.UnitsSoldMillions from consoles INNER JOIN games on games.platform = consoles.platform where games.jp_sales + games.na_sales + games.eu_sales + games.other_sales = (select MAX(games.jp_sales + games.na_sales + games.eu_sales + games.other_sales) from games);

A union combines the results of two queries. Normally queries would be QUITE large and complex if you find yourself needing to use the union statement. They allow you to combine two different queries of independent complexity/size, for example you might want the first ten RPG games of all consoles combined with the first five playstation fighting games, or whatever you need your queries to return: (select * from games where platform = "wii" order by year asc limit 10) UNION (select * from games where platform = "GB" order by year asc limit 15);

Remember how the inner join returned all rows that matched the join condition alone? A left join will return all of the consoles in this query, and any games that match the join condition. Any games that do not (Somehow) have a platform will be discarded: select * from consoles left join games on games.platform = consoles.platform;

The right join performs similarly. It would return all games but only platforms that match the join condition (So if a platform does not have any games, it will not appear). An outer join returns the results of both a left and right join, so basically returns everything.

If you'd like to know more about SQL check out the W3 Schools website here and have a look out for statements like intersect, except, having and coalesce. I'm sure you'll find something good worth learning! Lastly, here's a pretty good cheatsheet on SQL if you'd like a quick overview.

Good luck!

Marc