Generating Developers’ Productivity Report

BigCodeGen
5 min read · Oct 31, 2023

Source: https://middleware.io/blog/developer-productivity/

Introduction

Developer productivity reports serve several purposes, including performance assessment, resource allocation and planning, bottleneck identification, quality control, project management, goal setting, continuous improvement, client and stakeholder communication, motivation, and accountability.

It is essential, however, to use productivity reports judiciously and fairly. Developers should be aware of the metrics being used, and these metrics should be chosen carefully to avoid unintended consequences, like excessive stress or unethical behavior. Additionally, it’s crucial to balance quantitative data with qualitative assessments to gain a comprehensive view of developers’ performance.

Problem Statement

In this article, we present a stepwise approach to generating developers’ productivity reports, using a public GitHub repository as the data source.

The following are the key steps (see also Fig. 1):

  • Data Collection: Use the GitHub API to collect data such as commits, issues, and pull requests for the repository of interest.
  • Data Preparation: Process the collected data to calculate productivity metrics.
  • Data Analysis and Visualization: Create simple visualizations to display productivity trends over time for analysis.
Fig. 1: Data analytics process

Solution Approach

Step 1: Data Collection and Preparation

We wrote a Python routine that uses the GitHub API to fetch statistics for a repository’s contributors over the past year. It draws on the repository’s event history and pull request details to compile per-contributor productivity metrics: commits made, issues opened, and pull requests submitted. It is also intended to capture code review activity and code churn, that is, lines added, modified, and deleted in the repository.
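The collection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the script behind the report: the helper names `fetch_all`, `one_year_ago`, `count_commits_by_author`, and `collect_commit_counts` are hypothetical, and the sketch uses the standard library’s `urllib` for the HTTP calls so that it runs without extra installs. It pages through the GitHub REST endpoint `/repos/{owner}/{repo}/commits` and counts commits per author login.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter
from datetime import datetime, timedelta, timezone

API_ROOT = "https://api.github.com"


def fetch_all(path, params=None, token=None, max_pages=10):
    """Collect items from a paginated GitHub list endpoint, up to max_pages pages.

    Hypothetical helper: the page cap keeps the sketch from exhausting the rate limit.
    """
    headers = {"Accept": "application/vnd.github+json"}
    if token:  # unauthenticated calls work but have a much lower rate limit
        headers["Authorization"] = f"Bearer {token}"
    items = []
    for page in range(1, max_pages + 1):
        query = urllib.parse.urlencode({**(params or {}), "per_page": 100, "page": page})
        request = urllib.request.Request(f"{API_ROOT}{path}?{query}", headers=headers)
        with urllib.request.urlopen(request, timeout=30) as response:
            batch = json.load(response)
        if not batch:  # an empty page means there is nothing left to fetch
            break
        items.extend(batch)
    return items


def one_year_ago(now=None):
    """ISO 8601 timestamp for 365 days before `now`, usable as the `since` filter."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(days=365)).strftime("%Y-%m-%dT%H:%M:%SZ")


def count_commits_by_author(commits):
    """Count commits per GitHub login; commits with no linked account count as 'unknown'."""
    counts = Counter()
    for commit in commits:
        author = commit.get("author") or {}  # `author` is null when no account matches
        counts[author.get("login", "unknown")] += 1
    return counts


def collect_commit_counts(owner, repo, token=None):
    """Fetch the past year's commits for owner/repo and tally them per author."""
    commits = fetch_all(f"/repos/{owner}/{repo}/commits", {"since": one_year_ago()}, token)
    return count_commits_by_author(commits)
```

Issues and pull requests can be gathered the same way from `/repos/{owner}/{repo}/issues` and `/repos/{owner}/{repo}/pulls` with `state=all`. Note that the issues endpoint also returns pull requests, which carry a `pull_request` key, so they need to be filtered out before counting issues.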

Using Python’s requests library for the API calls and pandas for data manipulation, the script compiles the metrics into a DataFrame that lists contributors in descending order of commit activity (Table 1). The table shows at a glance how involved each contributor is in the project and gives a first view of the repository’s development dynamics.

Table 1: Repository’s development dynamics
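A table of this shape could be assembled roughly as shown here. The function name `build_productivity_report`, the column names, and the sample counts are all assumptions made for illustration; the real report may use different columns.

```python
import pandas as pd


def build_productivity_report(commits, issues, pull_requests):
    """Combine per-contributor counts (dicts keyed by login) into one table."""
    contributors = sorted(set(commits) | set(issues) | set(pull_requests))
    report = pd.DataFrame(
        {
            "contributor": contributors,
            # A contributor missing from one source simply gets a zero there.
            "commits": [commits.get(name, 0) for name in contributors],
            "issues_opened": [issues.get(name, 0) for name in contributors],
            "pull_requests": [pull_requests.get(name, 0) for name in contributors],
        }
    )
    # Most active committers first; the stable sort keeps ties in alphabetical order.
    return report.sort_values("commits", ascending=False, kind="stable").reset_index(drop=True)


# Made-up counts, for illustration only.
report = build_productivity_report(
    commits={"alice": 42, "bob": 17},
    issues={"bob": 5, "carol": 3},
    pull_requests={"alice": 9, "carol": 1},
)
print(report)
```

Keeping the counts as plain dictionaries until this point means each metric can be collected independently and merged only once all of them are available.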

Step 2: Data Visualization

The function visualize_combined_metrics plots several metrics per contributor in a single graph. Using Matplotlib, it produces the consolidated view shown in Fig. 2, which covers metrics ranging from commit activity to code review engagement across contributors. Placing the metrics side by side makes it easy to compare individual and collective contributions at a glance.

Fig. 2: Bar charts of developers’ activities, from commits to code review engagement
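One way such a function might look is a grouped bar chart, with one group of bars per contributor and one bar per metric. Only the name `visualize_combined_metrics` comes from the article; the signature, labels, and layout below are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np


def visualize_combined_metrics(contributors, metrics):
    """Draw one group of bars per contributor, one bar per metric.

    `contributors` is a list of names and `metrics` maps a metric name to a
    list of values in the same order. This signature is an assumption.
    """
    positions = np.arange(len(contributors))
    bar_width = 0.8 / len(metrics)  # keep each group within 80% of its slot
    fig, ax = plt.subplots(figsize=(10, 5))
    for index, (name, values) in enumerate(metrics.items()):
        ax.bar(positions + index * bar_width, values, bar_width, label=name)
    # Centre each tick under its group of bars.
    ax.set_xticks(positions + bar_width * (len(metrics) - 1) / 2)
    ax.set_xticklabels(contributors, rotation=45, ha="right")
    ax.set_ylabel("Count")
    ax.set_title("Contributor activity by metric")
    ax.legend()
    fig.tight_layout()
    return fig


# Example call with made-up numbers:
# fig = visualize_combined_metrics(
#     ["alice", "bob"], {"commits": [42, 17], "pull_requests": [9, 4]}
# )
# fig.savefig("combined_metrics.png")
```

Metrics with very different magnitudes, such as lines changed versus pull requests, share one y-axis here, so a logarithmic scale or separate subplots may be worth considering.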

Step 3: Data Analysis

The analysis script uses the GitHub API to extract key repository information: name, description, owner, creation and last-update dates, primary language, the stargazer, watcher, fork, and open-issue counts, and the license. As Table 2 illustrates, structuring this data as a pandas DataFrame gives a concise summary of the repository’s vital statistics and a quick view of its activity, popularity, and licensing.

Table 2: GitHub repository’s essential attributes
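The extraction can be sketched as a small function that flattens the JSON returned by `GET /repos/{owner}/{repo}` into one record. The function name `summarize_repository` and the sample payload are hypothetical; the field names are those of the GitHub REST API response.

```python
def summarize_repository(payload):
    """Flatten the JSON returned by GET /repos/{owner}/{repo} into one record."""
    license_info = payload.get("license") or {}  # null when no license is detected
    owner_info = payload.get("owner") or {}
    return {
        "name": payload.get("name"),
        "description": payload.get("description"),
        "owner": owner_info.get("login"),
        "created_at": payload.get("created_at"),
        "updated_at": payload.get("updated_at"),
        "language": payload.get("language"),
        "stargazers_count": payload.get("stargazers_count", 0),
        "watchers_count": payload.get("watchers_count", 0),
        "forks_count": payload.get("forks_count", 0),
        "open_issues_count": payload.get("open_issues_count", 0),
        "license": license_info.get("name", "None"),
    }


# Made-up payload, trimmed to the fields used above.
sample_payload = {
    "name": "example-repo",
    "description": "Illustrative repository",
    "owner": {"login": "example-org"},
    "created_at": "2020-01-15T10:00:00Z",
    "updated_at": "2023-10-30T08:30:00Z",
    "language": "Python",
    "stargazers_count": 120,
    "watchers_count": 120,
    "forks_count": 35,
    "open_issues_count": 12,
    "license": None,
}

summary = summarize_repository(sample_payload)
for attribute, value in summary.items():
    print(f"{attribute}: {value}")
```

With pandas available, `pd.DataFrame(list(summary.items()), columns=["attribute", "value"])` turns the record into a two-column table. One caveat, as far as we are aware: in the REST API `watchers_count` mirrors `stargazers_count`, while the number of users subscribed to notifications is reported as `subscribers_count`.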

Fig. 3 shows a bar chart of three repository metrics: forks, open issues, and watchers. Drawn with Python’s Matplotlib library, the chart makes the relative magnitudes of these counts easy to compare and gives a snapshot of the repository’s community engagement and overall activity.

Fig. 3: GitHub repository metrics
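A chart along these lines takes only a few Matplotlib calls. In this sketch the function name `plot_repository_metrics` is hypothetical, and the input is assumed to be a dictionary holding the `forks_count`, `open_issues_count`, and `watchers_count` fields from the GitHub API.

```python
import matplotlib.pyplot as plt


def plot_repository_metrics(summary):
    """Plot forks, open issues, and watchers as three bars (hypothetical helper)."""
    labels = ["Forks", "Open issues", "Watchers"]
    values = [
        summary["forks_count"],
        summary["open_issues_count"],
        summary["watchers_count"],
    ]
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(labels, values, color=["tab:blue", "tab:orange", "tab:green"])
    ax.set_ylabel("Count")
    ax.set_title("Repository metrics")
    fig.tight_layout()
    return fig


# Example call with made-up numbers:
# fig = plot_repository_metrics(
#     {"forks_count": 35, "open_issues_count": 12, "watchers_count": 120}
# )
# fig.savefig("repository_metrics.png")
```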

Finally, we extract the repository’s commit activity over time and plot it as a line chart to reveal productivity trends. The script collects commit data through the GitHub API, aggregates the commit counts per day, and presents them as a time series (Fig. 4). The plot shows how the frequency of contributions has evolved over the period. Extending the analysis to plot each contributor’s commit activity over time would give a deeper view of individual involvement and of how contributions from different team members ebb and flow.

Fig. 4: Developers’ productivity trend.
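The per-day aggregation and the line chart can be sketched as shown here. The function names `commits_per_day` and `plot_commit_activity` are hypothetical; the input is assumed to be the list of commit objects returned by the GitHub REST commits endpoint, where each item carries its timestamp under `commit.author.date`.

```python
from collections import Counter
from datetime import datetime

import matplotlib.pyplot as plt


def commits_per_day(commits):
    """Aggregate GitHub commit objects into (day, count) pairs, oldest first."""
    counts = Counter()
    for commit in commits:
        timestamp = commit["commit"]["author"]["date"]  # e.g. "2023-10-31T09:15:00Z"
        day = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").date()
        counts[day] += 1
    return sorted(counts.items())


def plot_commit_activity(daily_counts):
    """Plot the (day, count) pairs as a time series."""
    days = [day for day, _ in daily_counts]
    totals = [total for _, total in daily_counts]
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(days, totals, marker="o")
    ax.set_xlabel("Date")
    ax.set_ylabel("Commits per day")
    ax.set_title("Commit activity over time")
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return fig
```

Days without commits do not appear in the series, so the line connects active days directly; filling the gaps with zeros would show idle periods more faithfully. Grouping on the pair of day and author login instead of the day alone would give the per-contributor trend mentioned above.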

Conclusion and Future Work

In this article, we generated a simple report that shows the productivity of a team, and of its individual engineers and contributors, on a single repository over time.

Future work may include:

  • Web-Based Dashboard: Developing a user-friendly web-based dashboard, possibly using tools like Streamlit, to provide stakeholders with easy access and interactive exploration of productivity data.
  • Automated Reporting: Implementing automated scheduling and reporting mechanisms to ensure that stakeholders are regularly informed about productivity trends, reducing the need for manual report generation.
  • Machine Learning Integration: Exploring machine learning techniques, including leveraging large language models (LLMs), to enable more advanced productivity analysis with predictive capabilities.

Contributors

Taiwo O. Adetiloye
Pavan Mahuli

If you find this article interesting

…and would like to measure your developers’ productivity or build scalable applications, please contact us.
