Data Analysis: Covid-19 Vaccine Adverse Reactions

While we are tracking vaccination progress, we are also obliged to track adverse reactions so that people can know how safe the vaccines really are. Sadly, open data on this is scarce: the only data available to analyze is from the USA, covering Pfizer, Moderna, and other vaccines.

We start by downloading the data from Kaggle. It arrives as a zip file; unzip it and open your Jupyter notebook.

Before I start writing the code, I want everyone to know that it is fairly easy to install Jupyter, download the dataset, and start analyzing. I have read each of the CSV files into its own data frame.

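import pandas as pd  # imported explicitly so that pd is defined

# Read the three CSV files, decoding them as windows-1252.
df = pd.read_csv("c1.csv", encoding="windows-1252")
df1 = pd.read_csv("c2.csv", encoding="windows-1252")
df2 = pd.read_csv("c3.csv", encoding="windows-1252")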

While there are many records where the person did not die and only had adverse reactions, my main analysis is about the deaths due to the Covid-19 vaccine. So, I will query the “DIED” column of the “df” data frame and keep all the rows whose value is “Y”.
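The filter described above can be sketched roughly as follows. This is a minimal illustration on a made-up four-row frame, not the real dataset; the name death_1 is taken from the merge step later in the article, and the toy VAERS_ID values are invented.

```python
import pandas as pd

# Toy stand-in for the "df" data frame (invented rows, only the columns needed).
df = pd.DataFrame({
    "VAERS_ID": [101, 102, 103, 104],
    "DIED": ["Y", None, "Y", None],
})

# Boolean mask: True only where DIED is exactly "Y"; missing values compare as False.
death_1 = df[df["DIED"] == "Y"]

print(len(death_1), "rows with DIED == 'Y'")
```

On the real data, the same mask would be applied to the frame read from the CSV file that holds the “DIED” column.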

The data is not limited to Covid-19 vaccine-related deaths; it covers any vaccination given in the USA for any ailment. So we have to find the number of deaths due to the corona vaccine specifically. To summarize, one CSV file holds the adverse reactions and another holds the type of vaccine; we filter the data so that we keep only the records where the person died and the vaccine was for Covid-19.
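A sketch of the vaccine-type filter, under some loudly stated assumptions: that the vaccine details sit in the “df1” data frame, that the type column is called “VAX_TYPE”, and that Covid-19 vaccines are labelled “COVID19” there. None of these is confirmed in the text above, so check them against the actual files. The rows below are invented.

```python
import pandas as pd

# Toy stand-in for the vaccine-type frame.
# ASSUMPTIONS: frame name df1, column name VAX_TYPE, label "COVID19".
df1 = pd.DataFrame({
    "VAERS_ID": [101, 102, 103, 105],
    "VAX_TYPE": ["COVID19", "FLU3", "VARZOS", "COVID19"],
    "VAX_MANU": ["MODERNA", "OTHER", "OTHER", "UNKNOWN MANUFACTURER"],
})

# Keep only the rows that describe a Covid-19 vaccine.
covid_vax_1 = df1[df1["VAX_TYPE"] == "COVID19"]

print(covid_vax_1)
```

A quick look at df1["VAX_TYPE"].unique() on the real data shows which labels are actually present before filtering.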


Now that we have two data frames, we will join them into a single data frame that gives us the required data. Just as in SQL, the join is on the “VAERS_ID” column, which is present in both data frames.

Covid_Vaccine_Deaths = pd.merge(death_1, covid_vax_1, on=["VAERS_ID"])
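To see what this join does, here is a small sketch on invented rows. pd.merge defaults to an inner join, so only IDs present in both frames survive, and it emits one output row per matching pair. If the vaccine frame can hold more than one row for the same VAERS_ID (an assumption; whether this dataset does is not checked here), the merged row count can be higher than the number of distinct reports, so it is worth comparing the two.

```python
import pandas as pd

# Invented rows standing in for the two filtered frames.
death_1 = pd.DataFrame({"VAERS_ID": [101, 103, 107], "DIED": ["Y", "Y", "Y"]})
covid_vax_1 = pd.DataFrame({
    "VAERS_ID": [101, 101, 105, 107],  # 101 appears twice on purpose
    "VAX_MANU": ["MODERNA", "MODERNA", "MODERNA", "UNKNOWN MANUFACTURER"],
})

# Inner join (the default): keeps IDs found in both frames.
Covid_Vaccine_Deaths = pd.merge(death_1, covid_vax_1, on=["VAERS_ID"])

rows = len(Covid_Vaccine_Deaths)
unique_reports = Covid_Vaccine_Deaths["VAERS_ID"].nunique()
print(rows, "merged rows,", unique_reports, "distinct VAERS_ID values")
```

If the two numbers differ on the real data, calling drop_duplicates(subset="VAERS_ID") on the merged frame is one way to count each report once.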

In total, 799 rows are present, which means 799 deaths attributed to the Covid vaccine have been reported so far in the USA. Well, that is a big number, so let us dig deeper. Let us check how many young people below 40 have died due to Covid vaccine adverse reactions.
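The age filter could look roughly like this. The column name “AGE_YRS” is an assumption, since the text does not name the age column, and the rows are invented. Note that rows with a missing age are left out by the comparison rather than counted as under 40.

```python
import pandas as pd

# Invented rows; AGE_YRS is an ASSUMED column name for the person's age.
Covid_Vaccine_Deaths = pd.DataFrame({
    "VAERS_ID": [101, 103, 107, 109],
    "AGE_YRS": [83.0, 35.0, None, 40.0],
})

# Strictly below 40; a missing age compares as False and is excluded.
young = Covid_Vaccine_Deaths[Covid_Vaccine_Deaths["AGE_YRS"] < 40]

print(len(young), "rows below 40")
```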


In total, 13 rows, which means 13 young people below 40 were reported to have died due to corona vaccination in the USA. That’s scary and terrifying. We should check the comorbidities and reasons for death.
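One way to review comorbidities is to print the relevant free-text columns side by side for the under-40 rows. The column names “CUR_ILL”, “HISTORY”, and “SYMPTOM_TEXT” are assumptions about this dataset, and every value below is a made-up placeholder, not a real record.

```python
import pandas as pd

# Placeholder rows only; column names are ASSUMED, values are invented.
young = pd.DataFrame({
    "VAERS_ID": [103, 111],
    "AGE_YRS": [35.0, 28.0],
    "CUR_ILL": ["None", None],
    "HISTORY": ["Asthma", None],
    "SYMPTOM_TEXT": ["placeholder narrative A", "placeholder narrative B"],
})

# Show full text instead of truncating long narratives.
pd.set_option("display.max_colwidth", None)

text_cols = ["CUR_ILL", "HISTORY", "SYMPTOM_TEXT"]
review = young[["VAERS_ID", "AGE_YRS"] + text_cols].copy()
# Make gaps explicit so an empty field is not mistaken for "no illness".
review[text_cols] = review[text_cols].fillna("not recorded")

print(review.to_string(index=False))
```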


Except for one, who died by suicide, all died a natural death after receiving the corona vaccine. Now let’s check which manufacturer’s vaccine these 799 people received.

unknown=Covid_Vaccine_Deaths[Covid_Vaccine_Deaths["VAX_MANU"]=="UNKNOWN MANUFACTURER"]
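Instead of filtering once per manufacturer, a single value_counts call gives all the counts at once. This is a sketch on invented rows; the labels "MODERNA" and "PFIZER" are placeholders and may not match the exact strings in the real “VAX_MANU” column. Passing dropna=False keeps the rows with no manufacturer in the tally.

```python
import pandas as pd

# Invented rows; manufacturer labels other than "UNKNOWN MANUFACTURER" are placeholders.
Covid_Vaccine_Deaths = pd.DataFrame({
    "VAERS_ID": [1, 2, 3, 4, 5],
    "VAX_MANU": ["MODERNA", "PFIZER", "MODERNA", "UNKNOWN MANUFACTURER", None],
})

# Count rows per manufacturer, including rows where the value is missing.
counts = Covid_Vaccine_Deaths["VAX_MANU"].value_counts(dropna=False)
missing = int(Covid_Vaccine_Deaths["VAX_MANU"].isna().sum())

print(counts)
print(missing, "rows with no manufacturer")
```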

371 deaths for “Moderna”, 418 deaths for “Pfizer”, and 4 for “Unknown Manufacturer”: 371 + 418 + 4 = 793.

The remaining 6 records have no value in the vaccine manufacturer column. As a share of all 799 deaths: Moderna = 46%, Pfizer = 52%, unknown manufacturer = 0.5%.
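The shares can be checked with a few lines of arithmetic, using only the counts quoted above (371, 418, 4, and a total of 799). The dictionary keys are just display labels chosen for this sketch.

```python
# Counts quoted in the article; keys are display labels for this sketch.
total = 799
counts = {"Moderna": 371, "Pfizer": 418, "Unknown manufacturer": 4}

# Whatever is left over has no manufacturer recorded.
counts["No value recorded"] = total - sum(counts.values())

# Share of all deaths, in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]} rows, {pct}%")
```

Dividing 4 by 799 gives about 0.005 as a fraction, which is 0.5 when expressed as a percentage.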

These percentages are not very insightful, since we don’t know the total number of people vaccinated with each type of vaccine listed here. While Pfizer may have a higher share of deaths now, its rate could turn out to be much lower if the total number of Pfizer doses administered is far higher. The main purpose was to highlight how multiple CSV files can be combined to analyze data.

Geopolitics and Data Science enthusiast.