All Important Window Functions in SQL for Interview
When I started my interview preparation a few months back, I realized mastering analytical functions was extremely important to get hired in any analytical firm.
I started looking for resources on the internet and realized there was hardly any organized data resource available. So I decided to write these functions down here just in case anyone is preparing for an interview.
Introduction
I have a basic expectation that you are experienced in writing SQL queries. Before Moving to the functions let’s first talk about some basic concepts.
I assume you know the order of execution for SQL commands. In case you don’t let’s discuss it once. The query execution starts from “FROM” clause and ends in the order as shown below.
FROM
clauseON
clauseOUTER
clauseWHERE
clauseGROUP BY
clauseHAVING
clauseSELECT
clause (Window Functions are executed with the select clause)DISTINCT
clauseORDER BY
clauseTOP
clause
The syntax is self-explanatory since the window function is written within a select clause hence is executed along with the select statement. An important point to remember is that you cannot write a where clause on the column created by the windows functions because where is executed before select. For example, in the below example we cannot write where condition on rnk column since where will be executed before rnk column is created.
select empno, ename, job, hiredate, sal,
rank()
over( order by hiredate
)rnk
from emp
order by sal;
Here rnk is just another column name. the calculations will be made by the function and results will be displayed in the column.
The standard syntax of a windows function is
window_function (<column_name>) OVER (
[ PARTITION BY expr_list ]
[ ORDER BY order_list ]
[ windows clause )
You will get used to this later. Just take a look at it and move on.
Ranking Functions
There are multiple functions that can be used for ranking the rows. When the value is the same, Rank() function will display the same rank value but the next value after that will be equal to the row number for example 1,1,3, and so on if the first two values are the same.
Whereas in dense_rank() we values are continuous. For example, 1,1,2 and so on if the first two values are the same
In the row_number() function the values are according to the row value 1,2,3 and so on even if the first two values are the same.
Rank Function
Table structure
Question Statement: Calculate the rank according to the hire date of the employee and order by the salary of the employee
select empno, ename, job, hiredate, sal,
rank()
over( order by hiredate
)rnk
from emp
order by sal;
Dense_Rank
Question Statement: Calculate the rank according to the salary of the employee and order by the salary of the employee in ascending order
select empno, ename, job, hiredate, sal,
rank()over(order by sal) as rnk,
dense_rank()
over( order by sal
)as sal_dense_rnk
from emp
order by sal
;
Row_Number
Question Statement: Calculate the rank according to the salary of the employee and order by the salary of the employee in ascending order. Use Empno when sal is the same.
select empno, ename, job, hiredate, sal,
row_number() over(
order by sal, empno) as row_num,
rank()over(order by sal) as rnk,
dense_rank()
over( order by sal
)as sal_dense_rnk
from emp
order by sal
;
Percent_Rank, Cume_Dist, NTILE
Table Structure
Problem Statement: Show me distributions of Hollywood movie box office sales using PERCENT_RANK, CUME_DIST, and NTILE (quartile, decile, tertile, etc)
select name, gross_sales,
percent_rank()
over(
order by gross_sales
) as pct,
cume_dist()
over(order by gross_sales
) as cum_dist,ntile(4)
over(
order by gross_sales) as ntil
from movies;
Dealing With Nulls
Table structure
Table 1:Moons
Table 2: Planets
Problem Statement: Rank the planets (mars, venus, pluto, mercury, neptune)by their radius in Desc order. Nulls should be displayed last.
select p.name as planet,m.name as moon, m.radius,
row_number()
over(
order by m.radius desc) as rnk
from
planets p, moons m
where p.name =m.planet_name(+)
and p.name in ('Mars','Venus','Pluto', 'Mercury', 'Neptune')
select p.name as planet,m.name as moon, m.radius,
row_number()
over(
order by m.radius desc nulls last) as rnk
from
planets p, moons m
where p.name =m.planet_name(+)
and p.name in ('Mars','Venus','Pluto', 'Mercury', 'Neptune');
Partition Clause
Partition clause is similar to group by however while both clauses aggregates data, Partition by would not reduce the number of rows while group by rolls up the data. partition by simply creates a window on which we can perform our calculations. Let’s understand this through examples.
Problem statement: i need departmental salaries running total by employee name
Table Structure:
select deptno,empno,ename, job,sal,
sum(sal)
over(
partition by deptno
order by ename) as total
from emp
order by deptno, ename;
Windowing aggregation
We will start using windows clauses now. We can perform mathematical operations on the range of rows.
Problem statement: i need a running total of total ocean coverage across the world, largest to smallest
Table Structure:
select name, type, square_km,
sum(square_km) over ( order by square_km desc
rows between unbounded preceding and current row ) as tot from water;
Problem statement: I need a running total of total ocean coverage for water type across the world, largest to smallest
select name, type, square_km,
sum(square_km) over (
partition by type
order by square_km desc
rows between 1 preceding and 1 following
) as tot
from water
order by square_km desc
;
Problem statement: 6 months moving average of tweets per day.
select dte, tweet_millions,
avg(tweet_millions)
over(
order by dte
range between interval '6' month preceding and current row) as averagefrom tweets;
Problem Statement: Show me life expectancy throughout history in the UK compared to 5 years on either side
select recorded_year, age,
first_value(age)
over(
order by recorded_year
range between interval '5' year preceding and interval '5' year following
)as fv,
last_value(age)
over(
order by recorded_year
range between interval '5' year preceding and interval '5' year following) as lv
from life_expectancy;
Problem Statement: compare daily uber trips in new york city with 2nd lowest/Highest from July to September
Table Structure:
select trip_Date, trip_count,nth_value(trip_count, 2)
over(
order by trip_count
rows between unbounded preceding and unbounded following)as secnd_lowestfrom trips
order by trip_date
Dealing with Nulls
Problem statement: Fill in the blanks for the null values in DEPTNPO table and count the employees according to departments.
select seq, empno, ename, job, mgr, hiredate, sal, deptno,
last_value(deptno ignore nulls)
over(order by seq) as dept
from external_emp
select dept, count(dept)from
(select seq, empno, ename, job, mgr, hiredate, sal, deptno,
last_value(deptno ignore nulls)
over(order by seq) as deptfrom external_emp) agroup by dept
order by dept
Lag and Lead
This is one of the most important windows functions. In my recent SQL interview, I was asked 3 questions related to lead and lag. The lag function will just provide the data of the one row above.
LAG(expression [,offset[,default_value]]) OVER(ORDER BY columns)
SELECT seller_name, sale_value,LAG(sale_value) OVER(ORDER BY sale_value) as previous_sale_valueFROM sale;
The lead function will do the opposite. It will provide me the values of 1 row below. Let’s do a complex problem now.
Problem statement: Only show the dates when the status of the order changed for that particular order. remove the remaining data.
Table structure
select order_id, status_date,lead_status,status,
lag(status_date)
over(partition by order_id order by status_date ) from_date,
status_date to_date
from(select
order_id,
status_date,
status,
lag(status) over (partition by order_id order by status_date) lag_status,
lead(status) over (partition by order_id order by status_date) lead_status
from ORDERS)a
where lag_status is null
or lead_status is null
or lead_status <> status
order by order_id, from_date nulls first
Conclusion
These functions will make your life easier if you can use them wisely. I have been asked about window functions in all of my interviews so make sure to practice them. Let me drop the youtube channel name that greatly helped in mastering these functions. You can click here to watch the videos