Analyzing Boston Employee Earnings Reports (2018-2022)

Let’s explore the dataset.

We will explore the employee earnings reports for the City of Boston for the years 2018 to 2022. The datasets contain valuable information about the earnings of city employees, and we will analyze and visualize key aspects.

Loading the data for each year, paying close attention to encoding issues to ensure accurate interpretation

Data Overview

Before delving into analysis, let’s get a sense of the structure of each dataset. We’ll start with a glimpse of the first few rows and examine the basic information.
<class ‘pandas.core.frame.DataFrame’>

RangeIndex: 23605 entries, 0 to 23604

Data columns (total 12 columns):

#   Column                     Non-Null Count  Dtype

—  ——                     ————–  —–

0   NAME                       23605 non-null  object

1   DEPARTMENT_NAME            23605 non-null  object

2   TITLE                      23605 non-null  object

3   REGULAR                    21939 non-null  object

4   RETRO                      13643 non-null  object

5   OTHER                      13351 non-null  object

6   OVERTIME                   6826 non-null   object

7   INJURED                    1312 non-null   object

8   DETAIL                     2355 non-null   object

9   QUINN/EDUCATION INCENTIVE  1351 non-null   object

10  TOTAL EARNINGS             23605 non-null  object

11  POSTAL                     23605 non-null  object

dtypes: object(12)

memory usage: 2.2+ MB

None

<class ‘pandas.core.frame.DataFrame’>

RangeIndex: 23312 entries, 0 to 23311

Data columns (total 12 columns):

#   Column                       Non-Null Count  Dtype

—  ——                       ————–  —–

0   NAME                         23312 non-null  object

1   DEPARTMENT_NAME              23312 non-null  object

2   TITLE                        23312 non-null  object

3    REGULAR                     23312 non-null  object

4    RETRO                       23312 non-null  object

5    OTHER                       23312 non-null  object

6    OVERTIME                    23312 non-null  object

7    INJURED                     23312 non-null  object

8    DETAIL                      23312 non-null  object

9    QUINN/EDUCATION INCENTIVE   23312 non-null  object

10  TOTAL EARNINGS               23312 non-null  object

11  POSTAL                       23312 non-null  int64

dtypes: int64(1), object(11)

memory usage: 2.1+ MB

None

<class ‘pandas.core.frame.DataFrame’>

RangeIndex: 21858 entries, 0 to 21857

Data columns (total 12 columns):

#   Column                         Non-Null Count  Dtype

—  ——                         ————–  —–

0   NAME                           21858 non-null  object

1   DEPARTMENT_NAME                21858 non-null  object

2   TITLE                          21858 non-null  object

3    REGULAR                       21263 non-null  object

4    RETRO                         342 non-null    object

5    OTHER                         13482 non-null  object

6    OVERTIME                      6781 non-null   object

7    INJURED                       1420 non-null   object

8    DETAIL                        2101 non-null   object

9    QUINN / EDUCATION INCENTIVE   1383 non-null   object

10   TOTAL EARNINGS                21858 non-null  object

11  POSTAL                         21765 non-null  object

dtypes: object(12)

memory usage: 2.0+ MB

None

<class ‘pandas.core.frame.DataFrame’>

RangeIndex: 22552 entries, 0 to 22551

Data columns (total 12 columns):

#   Column                     Non-Null Count  Dtype

—  ——                     ————–  —–

0   NAME                       22546 non-null  object

1   DEPARTMENT_NAME            22546 non-null  object

2   TITLE                      22546 non-null  object

3   REGULAR                    21908 non-null  object

4   RETRO                      402 non-null    object

5   OTHER                      14129 non-null  object

6   OVERTIME                   6846 non-null   object

7   INJURED                    1456 non-null   object

8   DETAIL                     2059 non-null   object

9   QUINN_EDUCATION_INCENTIVE  1386 non-null   object

10  TOTAL_GROSS                22546 non-null  object

11  POSTAL                     22546 non-null  object

dtypes: object(12)

memory usage: 2.1+ MB

None

<class ‘pandas.core.frame.DataFrame’>

RangeIndex: 23204 entries, 0 to 23203

Data columns (total 12 columns):

#   Column           Non-Null Count  Dtype

—  ——           ————–  —–

0   NAME             23204 non-null  object

1   DEPARTMENT_NAME  23204 non-null  object

2   TITLE            23204 non-null  object

3   REGULAR          22604 non-null  object

4   RETRO            3092 non-null   object

5   OTHER            15826 non-null  object

6   OVERTIME         6812 non-null   object

7   INJURED          1221 non-null   object

8   DETAIL           2116 non-null   object

9   QUINN_EDUCATION  1369 non-null   object

10  TOTAL_ GROSS     23204 non-null  object

11  POSTAL           23204 non-null  object

dtypes: object(12)

memory usage: 2.1+ MB

None

From the output, we observe that each dataset contains several columns, including NAME, DEPARTMENT_NAME, TITLE, and various components of earnings such as REGULAR, RETRO, OTHER, OVERTIME, and more.

Now lets combine the all dataset and check the null values

NAME                                  6DEPARTMENT_NAME                       6TITLE                                 6REGULAR                           48080RETRO                             97394OTHER                             71225OVERTIME                          94047INJURED                          110542DETAIL                           108001QUINN/EDUCATION INCENTIVE        113180TOTAL EARNINGS                    67614POSTAL                               99 REGULAR                          69956 RETRO                            90877 OTHER                            77737 OVERTIME                         84438 INJURED                          89799 DETAIL                           89118 QUINN/EDUCATION INCENTIVE        91219 QUINN / EDUCATION INCENTIVE     113148 TOTAL EARNINGS                   92673QUINN_EDUCATION_INCENTIVE        113145TOTAL_GROSS                       91985QUINN_EDUCATION                  113162TOTAL_ GROSS                      91327dty

From the output, we observe varying degrees of missing data across different columns, highlighting potential areas for further investigation.

Conclusion
We’ve loaded, examined, and combined Boston employee earnings data from 2018 to 2022. The next steps in our journey involve a deeper dive into the datasets, addressing missing values, and uncovering meaningful insights. Stay tuned for upcoming posts where we conduct in-depth analyses and visualize trends within this rich dataset.

 

Leave a Reply

Your email address will not be published. Required fields are marked *