--
DVD Rental Exploratory Analysis
After carrying out a few Excel Data analytics projects, I have pushed on to learn SQL to improve my career prospects in Data Analysis.
I aim to show my understanding of some SQL statements through this project.
I used PostgreSQL and Pg Admin (two of the most popular SQL tools), and the "DVD rental" database in Jose Portilla's Udemy course for this analysis.
Without further ado, let me jump into it.
1. How many payment transactions were greater than $5?
SELECT COUNT(amount)
FROM payment
WHERE amount > 5;

In this query, all I did was pass the COUNT function on the amount column to count the total number of rows in the payment table that meet the criteria in the WHERE clause.
2. How many actors have a first name that starts with the letter P?
SELECT COUNT (*)
FROM actor
WHERE first_name LIKE 'P%';

In this query, I counted the number of rows in the actor table, filtered the table by the first name, and used the LIKE operator to filter the first name further. Since the COUNT function counts the number of rows in a table, I can choose not to specify any column in the bracket and instead pass an asterisk inside the bracket, as you can see in the query.
3. How many unique districts are our customers from?
SELECT COUNT(DISTINCT(district))
FROM address;

The DISTINCT function finds unique values. All I did here was count the number of unique districts in the address table.
4. Retrieve the list of names of those distinct districts from the previous question.
SELECT DISTINCT(district)
FROM address;

This query gets a list of all the unique districts in the address table.
5. How many films have a rating of 'R' and a replacement cost of between $5 and $10?
SELECT COUNT(*)
FROM film
WHERE rating = 'R'
AND replacement_cost BETWEEN 5 AND 10;

In this query, I used a combination of the WHERE, AND, and BETWEEN to filter the film table and get the number of films that meet the criteria.
6. What customer has the highest customer ID number whose name starts with "E" and has an address ID lower than 500?
SELECT first_name, last_name
FROM customer
WHERE first_name LIKE 'E%'
AND address_id < 500
ORDER BY customer_id DESC
LIMIT 1;

The LIMIT function reduces the output to a specified number, which in this query is 1. The DESC function orders the customer IDs in descending order, that way I can get the highest customer ID that meets the criteria specified in the WHERE clause.
7. Return the customer IDs of customers who have spent at least $110 with the staff member with an ID of 2.
SELECT customer_id, SUM(amount)
FROM payment
WHERE staff_id = 2
GROUP BY customer_id
HAVING SUM(amount) >= 110;

Since I can't include aggregate functions in the GROUP BY or WHERE clause, I used the HAVING clause to specify the customers who have spent $110 or more with the staff ID specified in the WHERE clause.
8. We are launching a platinum service for our most loyal customers. We will assign platinum status to customers who have had 40 or more transaction payments. What customer IDs are eligible for platinum status?
SELECT customer_id, COUNT(amount)
FROM payment
GROUP BY customer_id
HAVING COUNT(amount) >= 40;

I also used the COUNT function and HAVING clause to identify the total number of customers in the payment table that meets the criteria of 40 or more transactions.
9. We are running a promotion to reward our top 5 customers with coupons. What are the customer IDs of the top 5 customers by total spend?
SELECT customer_id, SUM(amount)
FROM payment
GROUP BY customer_id
ORDER BY SUM (amount) DESC
LIMIT 5;

In this query, I used SUM to calculate the total amount spent by customers, then used the ORDER BY, DESC, and LIMIT functions to identify the top 5 total amounts spent.
10. Corporate HQ is conducting a study on the relationship between replacement cost and a movie's MPAA rating(e.g. G, PG, R, etc...). What is the average replacement cost per MPAA rating?
SELECT rating, ROUND(AVG(replacement_cost),2)
FROM film
GROUP BY rating;

I used AVG to get the average replacement cost for each movie rating. The average had a lot of decimal places, so for readability, I used the ROUND function to get it down to 2 decimal places as shown in the query.
Reading this piece is a huge encouragement to me. Thank you!