Origin of data set

The air quality index (AQI) reports daily air quality, and elevated AQI levels are associated with public health risks (Szyszkowicz 2019). Because national air quality standards and the dose-response relationships of pollutants differ, countries use different air quality indices (Zhang et al. 2020; Sofia et al. 2020). The Indian national air quality index considers eight pollutants (PM10, PM2.5, NO2, SO2, NH3, CO, O3, and Pb) with a 24-hourly averaging period. It is subdivided into six categories: good (0–50), satisfactory (51–100), moderately polluted (101–200), poor (201–300), very poor (301–400), and severe (401–500), as shown in Fig. 1 (Perera 2018; Ghorani-Azam et al. 2016). The sub-index for each pollutant at a monitoring location is calculated from its 24-hourly average concentration (8-hourly for CO and O3) and the health breakpoint concentration range. The worst sub-index is the AQI for that location (https://app.cpcbccr.com/AQI_India/). An increase in AQI raises acute and chronic health concerns, especially among older people and children (Januszek et al. 2020; Pant et al. 2020). During the COVID-19 pandemic confinement, the levels of such toxic pollutants fell significantly worldwide (Selvam et al. 2020; Singh and Chauhan 2020).

Fig. 1 Indian national air quality index: category and range

In the present study, concentrations of the pollutants PM2.5 (diameter < 2.5 μm), PM10 (diameter < 10 μm), NO2, NH3, SO2, CO, and ozone, together with the air quality index (AQI), were acquired from open-access internet sources provided by the Central Pollution Control Board (CPCB), Ministry of Environment, Forests, and Climate Change (https://app.cpcbccr.com/AQI_India/).
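The six category ranges listed for the Indian national AQI can be expressed as a simple lookup. The following is a minimal sketch, not part of the original study; the function name `aqi_category` and the choice to round a fractional AQI to the nearest integer before classification are assumptions made for illustration.

```python
from bisect import bisect_left

# Upper bound of each category, taken from the ranges given in the text.
_UPPER_BOUNDS = [50, 100, 200, 300, 400, 500]
_CATEGORIES = [
    "good",
    "satisfactory",
    "moderately polluted",
    "poor",
    "very poor",
    "severe",
]


def aqi_category(aqi: float) -> str:
    """Return the category name for an AQI value in the range 0-500."""
    value = round(aqi)  # assumption: classify on the integer AQI
    if not 0 <= value <= 500:
        raise ValueError(f"AQI out of range: {aqi}")
    # bisect_left finds the first upper bound that is >= value.
    return _CATEGORIES[bisect_left(_UPPER_BOUNDS, value)]


if __name__ == "__main__":
    for example in (42, 100, 101, 350):
        print(example, aqi_category(example))
```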
The data were recorded daily at 17:00 IST from January 1, 2020 to May 31, 2020 at four air quality monitoring stations of the CPCB, one in each of four major metropolitan cities in India: site 1 (ITO, Delhi), site 2 (Worli, Mumbai), site 3 (Jadavpur, Kolkata), and site 4 (Manali Village, Chennai), as shown in Fig. 2. The record is subdivided into two periods: (a) the pre-lockdown period, January 1, 2020 to March 23, 2020, and (b) the lockdown period, March 24, 2020 to May 31, 2020. For air quality assessment, the percentage variations of air pollutants during the confinement period were calculated relative to pre-lockdown values.

Fig. 2 Locations of the monitoring stations at populous sites in four major metropolitan cities in India

The air quality index is a piecewise linear function of the pollutant concentration. At the boundary between AQI categories, there is a discontinuous jump of one AQI unit. To convert from concentration to AQI, the following equation is used:

$$I=\frac{I_{\mathrm{high}}-I_{\mathrm{low}}}{C_{\mathrm{high}}-C_{\mathrm{low}}}\left(C-C_{\mathrm{low}}\right)+I_{\mathrm{low}}$$

where $I$ is the (air quality) index, $C$ is the pollutant concentration, $C_{\mathrm{low}}$ is the concentration breakpoint that is ≤ $C$, $C_{\mathrm{high}}$ is the concentration breakpoint that is ≥ $C$, $I_{\mathrm{low}}$ is the index breakpoint corresponding to $C_{\mathrm{low}}$, and $I_{\mathrm{high}}$ is the index breakpoint corresponding to $C_{\mathrm{high}}$. If multiple pollutants are measured, the reported AQI is the highest of the values obtained by applying this equation to each pollutant.
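The conversion from concentration to sub-index, and the rule that the highest sub-index becomes the AQI, can be sketched in Python as follows. This is an illustrative sketch only: the names `sub_index` and `overall_aqi` are hypothetical, the index ranges follow the six categories given in the text, and the concentration breakpoints in `EXAMPLE_BREAKPOINTS` are placeholder values chosen for illustration that should be replaced with the official CPCB table for each pollutant. Assigning a concentration that falls in the one-unit gap between two rows to the lower breakpoint of the upper row is likewise an assumption.

```python
from typing import Dict, List, Tuple

# Each row: (C_low, C_high, I_low, I_high).
# Index ranges follow the six AQI categories; the concentration
# breakpoints are PLACEHOLDERS for illustration (assumption).
Breakpoint = Tuple[float, float, float, float]

EXAMPLE_BREAKPOINTS: List[Breakpoint] = [
    (0, 30, 0, 50),
    (31, 60, 51, 100),
    (61, 90, 101, 200),
    (91, 120, 201, 300),
    (121, 250, 301, 400),
    (251, 500, 401, 500),
]


def sub_index(c: float, table: List[Breakpoint]) -> float:
    """Piecewise linear interpolation of the sub-index for concentration c."""
    if c < 0:
        raise ValueError("concentration must be non-negative")
    for c_low, c_high, i_low, i_high in table:
        if c <= c_high:
            # Values in the gap between two rows are clamped to c_low.
            c = max(c, c_low)
            return (i_high - i_low) / (c_high - c_low) * (c - c_low) + i_low
    raise ValueError("concentration above the highest breakpoint")


def overall_aqi(sub_indices: Dict[str, float]) -> Tuple[str, float]:
    """Return the pollutant with the worst (highest) sub-index and its value."""
    return max(sub_indices.items(), key=lambda item: item[1])


if __name__ == "__main__":
    pm25 = sub_index(45.0, EXAMPLE_BREAKPOINTS)
    print(overall_aqi({"PM2.5": pm25, "NO2": 40.0}))
```

Keeping the breakpoint table as an argument means the same function can serve every pollutant once its own table is supplied.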
Moreover, we used an unpaired Welch's two-sample t test to assess whether the reduction in average AQI was statistically significant at each of the four sites. The t test compares the mean values of two data sets and indicates whether they could have come from the same population. The t statistic is calculated as:

$$t=\frac{\overline{x}_1-\overline{x}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}$$

where $\overline{x}_1$ and $\overline{x}_2$ are the sample means, $n_1$ and $n_2$ are the sample sizes, and $s_1^2$ and $s_2^2$ are the sample variances for samples 1 and 2, respectively.
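As an illustration of the Welch t statistic, the sketch below computes it using only the Python standard library. The function name `welch_t` is hypothetical, and the AQI values in the example are invented for demonstration; they are not data from this study.

```python
import math
from statistics import mean, variance
from typing import Sequence


def welch_t(sample1: Sequence[float], sample2: Sequence[float]) -> float:
    """Welch's t statistic for two independent samples with unequal variances."""
    n1, n2 = len(sample1), len(sample2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    # statistics.variance returns the sample variance (n - 1 denominator).
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    standard_error = math.sqrt(s1_sq / n1 + s2_sq / n2)
    return (mean(sample1) - mean(sample2)) / standard_error


if __name__ == "__main__":
    # Invented daily AQI values, for illustration only.
    pre_lockdown = [210, 190, 230, 205, 220]
    lockdown = [120, 110, 135, 100]
    print(round(welch_t(pre_lockdown, lockdown), 3))
```

A positive value here means the first sample has the higher mean. To obtain a p value, the statistic must be referred to a t distribution with Welch-Satterthwaite degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the complete test.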
To identify statistically the pollutant most strongly associated with AQI, we performed Pearson's correlation analysis and plotted the results as a heatmap for each site. Pearson's correlation is also known as the "product-moment correlation coefficient" (PMCC) and is suitable for measuring the extent of the linear relationship between any two quantitative variables. Pearson's correlation coefficient ranges from −1 (perfect negative linear correlation) to +1 (perfect positive linear correlation). Given a pair of random variables (X1, X2), Pearson's correlation is given by

$$\rho_{X_1,X_2}=\frac{\mathrm{Cov}\left(X_1,X_2\right)}{\sigma_{X_1}\,\sigma_{X_2}}$$

where $\mathrm{Cov}(X_1,X_2)$ is the covariance between the variables under study and $\sigma_{X_1}$ and $\sigma_{X_2}$ are the standard deviations of X1 and X2, respectively.
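The correlation values behind such a heatmap can be computed directly from the definition. The following standard-library sketch is illustrative rather than the code used in the study: the names `pearson` and `correlation_matrix` are hypothetical and the numbers in the example are invented. It returns the matrix of pairwise coefficients, which a plotting library could then render as a heatmap.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation: covariance divided by both standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The common 1/n factor cancels between numerator and denominator.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


def correlation_matrix(
    columns: Dict[str, Sequence[float]]
) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson coefficients between all named series."""
    return {
        a: {b: pearson(xa, xb) for b, xb in columns.items()}
        for a, xa in columns.items()
    }


if __name__ == "__main__":
    # Invented values for one site, for illustration only.
    site = {
        "AQI": [180, 150, 120, 90, 70],
        "PM2.5": [95, 80, 60, 42, 30],
        "O3": [20, 24, 22, 30, 28],
    }
    for name, row in correlation_matrix(site).items():
        print(name, {k: round(v, 2) for k, v in row.items()})
```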