6 Examples to Improve Your SQL Skills

Creating complex select queries
Image for post
Photo by Sam Shin on Unsplash

SQL is used by most relational database management systems (RDBMS) to manage tabular data. It is a fundamental skill expected from data scientists and analysts.

You might argue that it is the job of a data engineer but the data scientist roles are inclined to being full-stack. Besides, as a data scientist, you wouldn’t want to depend on a data engineer to retrieve data from a database.

Long story short, having SQL in your skill set will put you one step ahead in searching and securing a job in the field of data science.

In this article, we will go over 6 select query examples. Although most of the SQL syntax is same among different relational database management systems, there might be some small differences. We will be using MySQL in this article.

Example 1

We have the following customer table.

Image for post
customer table (image by author)

We want to query the customers whose first name start with a vowel. The result should not include any duplicate values.

mysql> select distinct(f_name) from customer
-> where left(f_name,1) in ('a','e','i','o','u');+--------+
| f_name |
+--------+
| Elaine |
| Adam |
| Alisha |
+--------+

We use the distinct function to only retrieve unique values in the f_name column. The where clause applies the required condition. The left function is used to access the first letter in this case. We want the first letter to be in the provided list of values which are the vowels.

Example 2

In the customer table above, we want to retrieve the full name (f_name and l_name) and id numbers of customers based on the following two conditions:

  • Full name is longer than 10 characters
  • Id number is even
mysql> select cust_id, concat(f_name, ' ', l_name) as name
-> from customer
-> having char_length(name) > 10 and cust_id % 2 = 0;+---------+-------------+
| cust_id | name |
+---------+-------------+
| 4 | Adam Gelvin |
+---------+-------------+

We derive a new column by concatenating the first and last names and call it name. One of the conditions is put on this new column. The char_length function counts the number of characters. The second condition finds out if a number is even.

Example 3

In the customer table given in the first example, we want to get the customer with the longest full name.

mysql> select concat(f_name, ' ', l_name) as name 
-> from customer
-> having char_length(name) = (select
-> max(char_length(concat(f_name, ' ', l_name)))
-> from customer);+--------------+
| name |
+--------------+
| Elaine Smith |
+--------------+

We select the full name by concatenating the first and last names. The condition is put on the length of the full name. The condition value (max length) is specified by an additional select statement.

Example 4

We have the following customer and purchase tables.

Image for post
customer table (image by author)
Image for post
purchase table (image by author)

We want to see the customer first names along with the total number of items they purchased. We have customer id numbers and item quantities in the purchase table but not the customer names. Thus, we need to write join query to get the exact information we need.

mysql> select customer.f_name, 
-> sum(purchase.item_qty) as total_item_count
-> from customer join purchase
-> on customer.cust_id = purchase.cust_id
-> group by customer.f_name;+--------+------------------+
| f_name | total_item_count |
+--------+------------------+
| Adam | 4 |
| Alisha | 1 |
| Elaine | 6 |
| Jane | 4 |
| John | 3 |
| Robert | 1 |
+--------+------------------+

We select the first name from the customer table and item quantity from the purchase table. The customer id column is what relates these two tables so we use it when joining these tables. Since we apply an aggregation on the item quantities, we need to group the results by customer names.

Example 5

Using the customer and purchase tables in the previous example, we want to find the full name of the customers who made a purchase on May 10th, 2020. The results should not contain any duplicate values.

mysql> select distinct(concat(customer.f_name, ' ',   
-> customer.l_name)) as name
-> from customer join purchase
-> on customer.cust_id = purchase.cust_id
-> where purchase.date = '2020-05-10';+--------------+
| name |
+--------------+
| Jane Doe |
| John Doe |
| Adam Gelvin |
| Alisha T. |
| Elaine Smith |
| Robert Sam |
+--------------+

We only select a derived column from the customer table but the condition is put on a column in the purchase table. The distinct keyword is used so that the results do not contain any duplicate values.

Example 6

In the purchase table, we want to sort the stores in descending order based on the total number of items purchased on May 10th, 2020.

mysql> select store_id, sum(item_qty) as total
-> from purchase
-> where date = '2020-05-10'
-> group by store_id
-> order by total desc;+----------+-------+
| store_id | total |
+----------+-------+
| 2 | 9 |
| 1 | 6 |
| 3 | 1 |
+----------+-------+

We select store id and the sum of item quantities. The condition is put on the date. The item quantities are grouped by store id numbers and the sum is calculated for each group. The last line sorts the results in descending order by the sum.

Conclusion

What we achieve with the queries is that some calculations and filtering are done at the database level. Thus, we can only retrieve the data we need instead of getting all the data and then applying filtering and calculations.

Since real-life databases contain much more data and many relational tables, it is very important to be able to query the desired data using SQL.

Thank you for reading. Please let me know if you have any feedback.