Optimizing Data Ingestion Workflows: A Jobs-To-Be-Done Study for Data Engineers

Understanding data engineering processes to impact IBM's watsonx.data.

Context

In the spring of 2024, the Data & AI user research team conducted a jobs-to-be-done (JTBD) research study to impact the roadmap for watsonx.data. Using Anthony W. Ulwick’s JTBD framework, I partnered with another researcher to investigate what data engineers needed to "ingest" data.

Due to the nature of this project, I have omitted detailed information and am focusing on the process.

Role

Researcher

Toolbox

Research, Surveying, Usability Testing

Timeline

October 2023 - December 2023

Watsonx.data, released in May 2023, is IBM’s open, hybrid and governed data store.

Upon its release, the Product Management team wanted to better understand the complexity of data engineers’ needs in the market. This deeper understanding would lead to roadmap improvements addressing the pain points of data engineers.

The JTBD framework focuses on understanding the underlying needs, or “jobs,” of a user rather than focusing on solutions. For example, a person wants to listen to music on demand; they do not necessarily need the design of Spotify.

I defined the main job and created a job map for data engineers performing data ingestion, the first step in a data pipeline. Data ingestion is the process of collecting data from multiple sources and moving it to a storage medium, such as watsonx.data, for analysis and use.
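
To make this concrete, below is a minimal, hypothetical sketch of a single ingestion step in Python. The CSV source, table name, and local SQLite target are illustrative stand-ins; in practice the target would be a governed store such as watsonx.data, reached through its supported engines and connectors.

    # Hypothetical sketch: extract rows from a source file and load them into a
    # target table. SQLite stands in for the real target store; names are invented.
    import csv
    import sqlite3

    def ingest_csv(source_path: str, target_db: str, table: str) -> int:
        """Move rows from a CSV source into a target table; return the row count."""
        with open(source_path, newline="") as f:
            rows = list(csv.DictReader(f))              # Extract from the source
        if not rows:
            return 0
        columns = list(rows[0].keys())
        conn = sqlite3.connect(target_db)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(columns)})")
        conn.executemany(                               # Load into the target table
            f"INSERT INTO {table} VALUES ({', '.join('?' for _ in columns)})",
            [tuple(r[c] for c in columns) for r in rows],
        )
        conn.commit()
        conn.close()
        return len(rows)

Real pipelines wrap this core move with scheduling, validation, and governance checks.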

The first phase was to conduct desk research and internal interviews in order to draft a hypothesized main job.

Desk Research

Conducting desk research involved reviewing a previous Data & AI research team’s JTBD study, reading about data stores, and learning about the JTBD process.

Internal SME Interviews

We conducted 17 interviews with internal Subject Matter Experts (SMEs) for the IBM products set to integrate with watsonx.data. We met with PM, Design, Engineering, and Sales over the course of 2 weeks with the following objectives:

  • Outline the (preliminary) main job.

  • Validate that the job executor is a data engineer.

Synthesis

Based on the internal interviews, we created a hypothesized main job for data engineers doing data ingestion.

Hypothesized Main Job

Key Findings

  • Ingestion is often understood as “Extract and Load,” with “Transform” being optional (see the sketch after this list).

  • Data engineers’ challenges during data ingestion center on data governance and the intended use of the data.

  • A company’s maturity and size define data roles and their boundaries of responsibility.
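
As a rough illustration of the first finding, the sketch below treats “Transform” as an optional step between extract and load; the function and argument names are hypothetical, not drawn from any IBM tooling.

    # Hypothetical sketch: "Extract and Load," with "Transform" as an optional step.
    from typing import Callable, Iterable, Optional

    def run_ingestion(
        extract: Callable[[], Iterable[dict]],
        load: Callable[[list], None],
        transform: Optional[Callable[[dict], dict]] = None,
    ) -> None:
        rows = list(extract())                    # Extract from the source system
        if transform is not None:                 # Transform is optional: many teams
            rows = [transform(r) for r in rows]   # load raw data and reshape it later (ELT)
        load(rows)                                # Load into the target store

    # Example: load rows as-is, deferring any transformation to downstream stages.
    run_ingestion(
        extract=lambda: [{"id": 1, "source": "crm"}],
        load=lambda rows: print(f"loaded {len(rows)} rows"),
    )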

The second phase was to validate the hypothesized main job and create a job map for data engineers doing data ingestion.

Screener

Other research squads were conducting JTBD studies for data engineers involved in data preparation and governance. At this point, the squads converged to create screeners. Because data engineers’ responsibilities can overlap across ingestion, preparation, and governance, we limited each participant to a maximum of two interviews across squads.

External Interviews

The Ingestion squad conducted 9 interviews with people whose skills reflected our hypothesized main job of building solutions to move data from a source to a target storage system in preparation for business-relevant tasks. We met with data engineers as well as system engineers, data scientists, senior data analysts, and a head of technical architecture and IT operations.

Synthesis

Synthesis of the interviews followed the steps outlined by Anthony W. Ulwick. We outlined the job steps, social aspects, emotional aspects, needs, and the circumstance of data engineers focused on data ingestion.

Synthesis template used to outline the job steps, social aspects, emotional aspects, needs, and circumstances of data engineers focused on data ingestion.

Revised Main Job

The hypothesized main job was revised to reflect what we learned in the interviews. The revised main job of data engineers ingesting data is to move data to a centralized repository for use by downstream stakeholders.

The hypothesized main job was revised based on our findings.

We created a job map and identified 48 outcome statements that reflected the revised main job.

Job Map

The next step was to create a job map that reflected the revised job statements and placed the job steps into one of eight phases: define, locate, prepare, confirm, execute, monitor, modify, conclude.

Job map for data engineers doing data ingestion that placed their job steps into one of eight phases: define, locate, prepare, confirm, execute, monitor, modify, and conclude.

Outcome Statements

We then used the Outcome-Driven Innovation (ODI) strategy to identify and measure the specific outcomes that are critical for data engineers to successfully perform data ingestion. By focusing on these desired outcomes, the watsonx.data team can directly address data engineers' needs and maximize value.

We identified 48 outcome statements based on the needs and pains participants expressed in association with the job steps. Outcome statements included measures such as the time it takes to understand the end-use goal for data and the time it takes to recognize problems in the pipeline.

48 outcome statements for data engineers before, during, and after data ingestion.

We presented these findings to watsonx.data PM.

Since this was generative research, we did not have specific recommendations to deliver. Instead, these findings drove an Outcome-Driven Innovation (ODI) survey that we administered afterward to understand how watsonx.data can address the pain points of data engineers working with data warehouses.

My final deliverable for this study was an internal w3 website that allows IBMers to view our findings.

We identified what we believe to be the most impactful outcomes for each phase.

Cross-user research collaboration

This was my first time teaming up with other user researchers for a project. Working with others who had similar skills was rewarding because I was able to learn how they conducted research.

Generative Research

My previous studies involved more evaluative testing and were at times more straightforward. Due to the generative nature of this study, I found myself confused or frustrated at times. During this process, I sharpened my question-asking skills and became more comfortable with ambiguity.

© 2024 Designed & Created by Lesedi Khabele-Stevens using Figma & Framer
