
Dylan Etkin

November 16th, 2024

How to use AI to mine engineering data & analyze PRs

The opportunity of engineering data

This is the golden age of engineering data. Any modern software development process and deployment environment generates a wealth of data in the form of logs, performance metrics, comments, issues, reviews, access data, and so much more.

There’s always been a strong desire to make use of all this data, especially for improving developer productivity and developer experience. The challenge, however, is that there is simply too much of it: too much noise, and too much work to sift through.

For specific kinds of data such as performance metrics, tools like Datadog have done a great job of spotting anomalies and focusing your attention on what matters. But for broader and more complex sets of data coming from many different systems, what is the solution?

Now that AI has arrived, let’s evaluate how it can help us recognize patterns and dramatically increase the signal-to-noise ratio.

How to tell AI to cut through the noise

AI in its current state is very good at some things and terrible at others. Consider some key do's and don’ts when using AI to make sense of data:


| Do rely on AI to... | Don’t expect AI to... |
| --- | --- |
| Evaluate large datasets. It’s much faster and more accurate than humans. | “Understand” what a bunch of raw data means, because it won’t. |
| Pick out anomalies from large datasets. | Know what an anomaly is. Be very specific about what constitutes an anomaly and what attributes define good and bad data points. |
| Do basic numeric comparisons on data points. | Synthesize or calculate complex interpretations from multiple simple data points. Instead, pre-calculate the complex values and provide them to the AI. And beware of AI math: ask it to do anything more than averaging and it behaves like a 3-year-old trying to do calculus. |
| Format your insights into digestible and flexible results. Expect it to give you as much or as little summary and analysis as you ask for. | Get your formatting and results just right out of the box. You will want to be very specific about exactly what you would and wouldn’t like the AI to include, and how to format your results. |

The common themes? Don’t expect AI to “know” anything, but do expect it to be very good at following instructions. Tell it what’s important, how to structure things, the “why” behind the “what,” and how you want to see the results, and it will analyze your data like the machine that it is (that’s a compliment).
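The pre-calculation advice in particular is cheap to follow: compute aggregates in ordinary code before the data ever reaches the model. Here is a minimal Python sketch under that idea (the duration format matches the examples below; the helper names are ours, not from any particular tool):

import re
from datetime import timedelta

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

def parse_duration(text: str) -> timedelta:
    """Parse durations like '2d 1h 40m' or '2h 1m 27s' into a timedelta."""
    total = 0.0
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)\s*([dhms])", text):
        total += float(value) * UNIT_SECONDS[unit]
    return timedelta(seconds=total)

def mean_duration(durations: list[str]) -> timedelta:
    """Average duration strings: the kind of 'complex value' worth
    pre-computing yourself instead of asking the model to do the math."""
    seconds = [parse_duration(d).total_seconds() for d in durations]
    return timedelta(seconds=sum(seconds) / len(seconds))

# e.g. pre-compute the mean review time before prompting the model
print(mean_duration(["3h 3m 52s", "2d 1h 40m", "15h 4m"]))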

Let’s look at a real example of how to use AI to continuously improve engineering organizations.

"Hey AI, detect anomalous pull request data"

Pull requests are a fairly universal way for developers to create, review, and ship code into a system. They encapsulate much of the engineering process, and they contain a ton of data that can be used to understand your team’s efficiency, where it has bottlenecks, and how to improve.

With so much data to sift through, let’s test-drive AI on pull request data to identify which pull requests are outliers worth assessing for process improvements.

Step 1: Synthesize raw data into useful, telling data

Pull requests contain a lot of information, including when code was first pushed to a branch, when it was opened for review, how long it took someone to first review, and how long it took to action the review and be merged. (These are the key components of the DORA metric change lead time.)

Either using a system to calculate your change lead time (like Sleuth) or calculating it yourself, you can create pull request data that looks something like this:


{
  "repository": {
    "name": "pulse",
    "total pull requests merged": "3",
    "mean coding time": "3.0h 52.0s",
    "mean review lag time": "2.0h 29.0m",
    "mean review time": "3.0h 25.0m",
    "pull requests": [
      {
        "summary": "PS-319: Unblock deploy on main",
        "size": "Medium",
        "url": "https://github.com/sleuth-io/pulse/pull/367",
        "coding time": "2h 1m 27s",
        "review lag time": "2h 20m",
        "review time": "3h 3m 52s"
      },
      {
        "summary": "PS-314: Delete incident migrations folder",
        "size": "Small",
        "url": "https://github.com/sleuth-io/pulse/pull/363",
        "coding time": "1h 1m 42s",
        "review lag time": "9m 40s",
        "review time": "2d 1h 40m"
      },
      {
        "summary": "PS-219: Section comments functionality",
        "size": "Medium",
        "url": "https://github.com/sleuth-io/pulse/pull/360",
        "coding time": "3h 54m",
        "review lag time": "30m",
        "review time": "15h 4m"
      }
    ]
  }
}

Notice that we’ve distilled a lot of raw data into something meaningful, and we’ve included the mean times in the data. You can’t expect AI to know how to do this. We’re building a dataset with important information so we can ask the AI to analyze it.
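If you’re computing these values yourself, the phases fall out of timestamps your Git host already records. Here is a rough sketch under the definitions above (the event names are hypothetical placeholders, not fields from the GitHub API):

from dataclasses import dataclass
from datetime import datetime

@dataclass
class PullRequestEvents:
    """Hypothetical pre-fetched timestamps for one pull request."""
    first_commit_at: datetime   # first push of code to the branch
    opened_at: datetime         # opened for review
    first_review_at: datetime   # first review submitted
    merged_at: datetime         # merged to the mainline

def lead_time_components(pr: PullRequestEvents) -> dict[str, float]:
    """Split change lead time into the phases described above, in hours."""
    def hours(delta):
        return delta.total_seconds() / 3600
    return {
        "coding time": hours(pr.opened_at - pr.first_commit_at),
        "review lag time": hours(pr.first_review_at - pr.opened_at),
        "review time": hours(pr.merged_at - pr.first_review_at),
    }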

Step 2: Form an interpretation of good and bad data

Remember that the AI is very good at doing what we ask and taking our word for what is and isn’t important.

So for the AI to pull out outlier data, we need to be specific about what an outlier means to us in relation to the data we’re providing.

Give AI this prompt:

Call out pull request outliers to investigate.
For each outlier make sure to describe why it is an outlier but don't explicitly
state that it is varying in coding time, review lag time, or review time.
The criterion for an outlier is a pull request that has a coding time,
review lag time or review time that is far outside the mean.
Pull requests that are large or gigantic are more likely outliers than others.

This prompt tells the AI we want to pull out outliers so we can investigate them. We are telling it that an outlier is a variation away from the mean and that larger sizes play a role in determining an outlier. We are specific about how each piece of data should influence the categorization of the pull request. But also notice that we are letting the AI figure out that “2h 1m 27s” is time and how it compares to another similarly formatted time.
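Mechanically, this is a single chat-completion call with the prompt and the JSON dataset passed together. Here is a minimal sketch using the OpenAI Python client (the model name and message framing are arbitrary choices, not a requirement):

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

OUTLIER_PROMPT = """\
Call out pull request outliers to investigate.
For each outlier make sure to describe why it is an outlier but don't
explicitly state that it is varying in coding time, review lag time, or
review time.
The criterion for an outlier is a pull request that has a coding time,
review lag time or review time that is far outside the mean.
Pull requests that are large or gigantic are more likely outliers than others.
"""

def analyze(dataset: dict, prompt: str) -> str:
    """Send the pre-calculated dataset and instructions to the model."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": json.dumps(dataset)},
        ],
    )
    return response.choices[0].message.content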

Step 3: Specify how we want to receive the result

The AI will give you a result if you don’t specify how you want it. However, you will likely find that the result is as hard to interpret as the original data. Thankfully, AI is great at summarization and formatting. Use this prompt:

Limit your response to three bullet points showing the worst outliers.
Limit each bullet to two sentences max.
Create one bullet point for each pull request outlier.
When referencing a pull request, link to it using its URL.

Adding this to our prompt allows the AI to stack rank the outliers and only show us the worst three. It also tells the AI to limit each explanation to two sentences and to include the URL so we can get to the source for further investigation.
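In code, these formatting rules are just more text appended to the analysis prompt; continuing the sketch from Step 2:

FORMAT_PROMPT = """\
Limit your response to three bullet points showing the worst outliers.
Limit each bullet to two sentences max.
Create one bullet point for each pull request outlier.
When referencing a pull request, link to it using its URL.
"""

# dataset is the parsed JSON from Step 1; analyze() is defined above
print(analyze(dataset, OUTLIER_PROMPT + FORMAT_PROMPT))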

Step 4: Review the result

Given this data and our prompts, AI produces this result:

  • PR #363 took 2 days and 1 hour 40 minutes to review, which is exceptionally long compared to other pull requests. This small change to delete an incident migrations folder had a coding time of 1 hour 1 minute and a review lag time of just 9 minutes 40 seconds. 
  • PR #360 for section comments functionality had a review time of 15 hours 4 minutes, much longer than typical. The medium-sized change took 3 hours 54 minutes to code and had a review lag time of 30 minutes. 
  • PR #367 to unblock deploy on main had a coding time of 2 hours 1 minute, a review lag time of 2 hours 20 minutes, and a review time of 3 hours 3 minutes. This medium-sized change aligned closely with the repository averages of 3 hours for coding, 2.5 hours for lag, and 3.5 hours for review. 

This example only has three pull requests, but it illustrates how to instruct AI to pull out meaningful outliers from a large dataset.

Engineering intelligence using AI

AI does unlock our ability to draw complicated conclusions from complex data. However, humans must still shape the data into something that makes sense, and be specific about what it means and what we want to know.

We’ve illustrated how to do this in a one-off fashion, which can be useful in some circumstances. But remember: just as with Datadog’s outlier detection, the real power lies in a system that has already wired all of this data and AI together for you.

Ideally you can rely on a SaaS provider to put the end result in front of you, so you can focus on improving and not searching for the signal within the noise. Check out Sleuth if this is a capability you and your team would find valuable.