Beginner’s guide to SQL
If most of your work has been done in Excel so far, this article with the basic SQL queries is for you.
For the very beginners: SQL language lets you interact with a database. The database consists of tables. Each table has its own unique purpose. For example, if the database is used by an online store, it can have the following tables: listings table that has details of each listing sold by the store, order table, table with customer information, employees table, etc.
Why would a company use SQL, you might ask. Why not use Excel? It’s a perfectly adequate program, easy to use and understand. The answer is, Excel limits the total number of rows to 1,048,576. Meaning, if you’re creating a table in Excel you can’t insert more than that number of rows. However, Excel will become glitchy long before you hit that limit. If you need to sort, filter, add calculations to a worksheet, the program will slow down and might even crash. All your work will be lost. Generally, if your table has more than 30,000–50,000 rows, Excel’s computational power will lag.
For demonstration purposes, I’ll be referring to a table downloaded here. You can copy-paste it and save it as a .csv file, then import it to a database in SQL application.
I’m making an assumption that you’re new to SQL and that you’ve been given read-only access (meaning you can’t add or delete rows from tables).
- Let’s say your manager told you to look for data and said that information can be found in a particular table named X. If you have never worked with this table before, you might want to review it first to get familiar with what you’re working with. To do that, let’s write a query to show all available information but only the first 10 rows.
SELECT *
FROM soccer_team
LIMIT 10
SELECT *
means “show me all available columns”.FROM
specifies which table you want to see.LIMIT 10
asks to only show the first 10 rows. You can change 10 to any number you want.
It’s always helpful to know what kind of information is stored in the table. If you have some background knowledge, it will be easier for you to get oriented. In my experience, even if you’re proficient in SQL but don’t learn how data is structured, your knowledge is not beneficial. Even worse, you can write a query that will be executed, but the result could be wrong (not what you wanted to see). It’s always a good idea to perform some sanity checks, meaning review that the data you see is what you actually requested.
In this particular example, the table stores information about teams in a soccer tournament based on EURO CUP 2016.
Description of soccer_team table:
- team_id — ID for each team
- team_group — the name of the group in which the team belongs
- match_played — how many matches a team played in group stage
- won — how many matches a team won
- draw — how many matches a team draws
- lost — how many matches a team lost
- goal_for — how many goals a team conceded
- goal_agnst — how many goals a team scored
- goal_diff — the difference of goals scored and goals conceded
- points — how many points a team achieved from their group stage matches
- group_position — in which position a team finished their group stage matches
2. Now let’s add a condition to our query using WHERE
. This is similar to using Filter in Excel.
SELECT *
FROM soccer_team
WHERE team_group = 'A'
3. But what if we don’t want to see all available columns from the table, only the certain ones. To do that, we need to replace *
in the SELECT
line with the name of specific columns. Here is an example of how we might do that.
SELECT team_id, won, draw, lost
FROM soccer_team
WHERE team_group = 'A'
4. We can add different conditions under WHERE
. For example, we can use BETWEEN
to specify the boundaries of a range. The range can be represented by numbers, text, or dates. Note that the boundaries of a range are included in the output.
SELECT team_id, team_group, won, draw, lost
FROM soccer_team
WHERE team_group BETWEEN 'C' AND 'D'
5. We can also combine more than one condition by using AND
and OR
operators. Parenthesis can be added to show the order of operations, just like in math. In this example, we ask to only show teams from team group ‘A’ who either won 1+ games or draw 2+ games.
SELECT team_id, team_group, won, draw, lost
FROM soccer_team
WHERE team_group = 'A' AND (won > 0 OR draw > 1)
Now let’s see what will happen if we don’t include parenthesis in the same query.
SELECT team_id, team_group, won, draw, lost
FROM soccer_team
WHERE team_group = 'A' AND won > 0 OR draw > 1
The result no longer includes just teams from group ‘A’. That’s because without the parenthesis OR
operator no longer goes together with team_group = 'A'
condition.
There’s a lot more one can do in SQL, but these operators will let you get started quickly. Once you get the data you need from the table, you can copy-paste it into Excel to perform additional operations.