What is DataOps, and why it's a top trend


Enterprises have struggled to collaborate well around their data, which hinders their ability to adopt transformative applications like AI.

The evolution of DataOps could fix that problem. The term DataOps emerged seven years ago to refer to best practices for getting proper analytics, and research firm Gartner calls it a major trend encompassing several steps in the data lifecycle.

Just as the DevOps trend led to a better process for collaboration between developers and operations teams, DataOps refers to closer collaboration between various teams handling data and operations teams deploying data into applications.

Gartner: DataOps is a major trend in 2021

Getting DataOps right is a significant challenge because of the multiple stakeholders and processes involved in the data lifecycle. In the DevOps world, enterprises can develop, test, and deploy app updates in a matter of hours. It is harder to move that fast in the data world, as it can take eight months to integrate an ML model into business workflows and deliver tangible value.

"[Creating] a common architecture pattern helps with operationalizing data science and ML pipelines and has been identified as one of the major trends for 2021," Gartner research director Soyeb Barot said.

Gartner predicts enterprises will begin to see real gains in these efforts through the evolution and extension of DataOps to support trusted AI. The research firm predicts the number of enterprises that have operationalized their AI efforts will grow from 8% in 2020 to 70% in 2025 due to the maturity of AI orchestration platforms.

Even so, enterprises will struggle to move their predictive AI projects past the proof-of-concept stage because they have not addressed the full range of processes for collaborating across the AI lifecycle. A 2019 Gartner survey found that the top four challenges companies face were security or privacy concerns (30%), complexity of AI integration with existing infrastructure (30%), data volume or complexity (22%), and potential risks or liabilities (22%).

Gartner argues that a more nuanced way of thinking about different types of collaboration can improve this transition. This includes extending the older idea of DataOps (data engineering) to include MLOps (machine learning development), ModelOps (AI governance), and Platform Ops (overarching AI platform management). It has characterized this entire collection of capabilities as XOps.

These frameworks can help implement a structured process for the people involved to productionalize AI. "Think of it as the assembly line of an automobile manufacturing plant, but for data," Barot said.

Getting to DataOps

Software development was historically a slow, plodding process in which developers spent months or even years working on new updates that were collectively thrown over the wall to testing and operations teams. In 2008, Andrew Clay Shafer and Patrick Debois began discussing how to streamline this process through better collaboration between developers, testers, and operations teams. This came to be known as DevOps since it improved the handoff between development and operations teams.

As the movement took hold, it led to the creation of a variety of platforms, tools, and processes that allowed teams to continuously integrate and deploy applications in small bits that could be rolled back if problems occurred. But these same kinds of innovations eluded efforts to create value from the growing volume, variety, and velocity of big data. As much as pundits predicted that big data was the new oil, companies struggled to operationalize big data in the way DevOps improved the deployment of code.

Value is gleaned from data by creating artifacts like analytics, machine learning models, and data-driven applications. But doing these things introduced a variety of new challenges and bottlenecks outside the scope of DevOps practices. In a blog post for IBM in 2014, Lenny Liebmann, then a contributing editor at InformationWeek, introduced the notion of DataOps to characterize these challenges and suggest a path forward.

In an interview with VentureBeat, Liebmann, who is now a founding partner of technology adoption consultancy Morgan Armstrong, said that at the time a lot of enterprises were struggling to solve big data problems using improved technology without addressing the organizational and process side. He said, "People thought you could just throw big data into a magic bucket and it would work." But they bumped up against a variety of issues connecting disparate sources and types of data to new applications and analytics.

One of the main issues he saw was that businesses would focus on the functional aspects, like moving the actual data around through better data engineering tools, without addressing non-functional issues like performance, availability, quality, scalability, security, and governance.

A lot of the fundamental data engineering challenges have been solved as enterprises begin moving their infrastructure to the cloud. "This is less a problem today than when I first talked about it," Liebmann said. The next step lies in mapping out a strategy to address security, governance, and quality issues as companies scale their data operations.

The dawn of XOps

Barot has had many conversations with enterprises asking for DataOps tools only to discover they already had a strong DataOps framework. They really needed more help in operationalizing their AI processes. This is where Gartner's model of XOps emerges to provide the foundation for a more comprehensive set of distinctions.

"We were looking at all these ops terminologies in the marketplace, and there was ambiguity about what they were for and the relationship between them," Barot said. "We wanted to set the record straight as to what they stand for and how they are related to each other as part of bigger strategic initiatives in the enterprise."

Figure: Gartner's model for Platform Ops. Gartner's model for AI includes MLOps, SecOps, DevOps, and DataOps. Image credit: Gartner.

In this expanded taxonomy, Gartner constrains DataOps to the challenges associated with building, managing, and scaling data pipelines in a way that promotes reusability, reproducibility, and rolling back changes if problems occur. Some of these key capabilities include data extraction, integration, transformation, and analysis. Governance is constrained to the data itself.
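To make the three pipeline properties named above more concrete, here is a minimal Python sketch. It is not taken from Gartner or from any DataOps product: the `Pipeline` class, the `drop_missing` and `add_tax` steps, the 10% tax rate, and the sample records are all hypothetical. Steps are plain functions (reusability), outputs are hashed so reruns can be compared (reproducibility), and each published step list is kept so the latest one can be discarded (rollback).

```python
import hashlib
import json
from typing import Callable, Dict, List

Record = Dict[str, float]
Step = Callable[[List[Record]], List[Record]]


def drop_missing(rows: List[Record]) -> List[Record]:
    """Reusable step: keep only rows that have an 'amount' field."""
    return [r for r in rows if "amount" in r]


def add_tax(rows: List[Record]) -> List[Record]:
    """Reusable step: derive a 'total' field (the 10% rate is invented)."""
    return [{**r, "total": round(r["amount"] * 1.1, 2)} for r in rows]


def fingerprint(rows: List[Record]) -> str:
    """Deterministic hash of an output, so two runs can be compared."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class Pipeline:
    """Keeps every published version of the step list for rollback."""

    def __init__(self) -> None:
        self._versions: List[List[Step]] = []

    def publish(self, steps: List[Step]) -> int:
        """Store a new version and return its version number."""
        self._versions.append(list(steps))
        return len(self._versions) - 1

    def rollback(self) -> int:
        """Discard the latest version and return the now-active number."""
        if len(self._versions) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._versions.pop()
        return len(self._versions) - 1

    def run(self, rows: List[Record]) -> List[Record]:
        """Apply the steps of the latest published version in order."""
        if not self._versions:
            raise RuntimeError("no pipeline version published")
        for step in self._versions[-1]:
            rows = step(rows)
        return rows
```

In use, a team would publish `[drop_missing]`, later publish `[drop_missing, add_tax]`, and call `rollback()` if the newer version misbehaves. Comparing `fingerprint()` values from two runs over the same input is one simple way to check that a run is reproducible.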

MLOps focuses on improving collaboration across the development and operationalization stages of the machine learning model lifecycle. These activities are typically performed outside of the purview of traditional data engineering practices. Data scientists are often tasked with a process called feature engineering for tuning ML models to improve decision-making, discover insight, or enable a new application feature. MLOps makes it easier to tie these efforts in with teams on the operations side that are responsible for deploying the models into production.
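As a small illustration of what a feature engineering step can look like, the sketch below standardizes a numeric column to zero mean and unit variance, a common preprocessing transform. It is only one example of the practice, not a method described by Gartner, and the function name `standardize` is hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def standardize(values: List[float]) -> List[float]:
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no signal; map it to zeros
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Packaging a transform like this as a single function means the operations side can apply exactly the same logic at training time and at prediction time, which is the kind of handoff MLOps is meant to smooth.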

ModelOps is an extension of MLOps to help companies work with third-party AI models that may be baked into enterprise applications or improve decision-making using tools like knowledge graphs, rules engines, or new optimization algorithms. The biggest differentiation is that ModelOps makes it easier for business experts to manage AI models with less reliance on data engineering and for data science teams to implement changes.

Platform Ops provides an overarching framework to help organizations manage work that spans all of these different kinds of activities, as well as DevOps. It is also the youngest and least mature market.

AIOps would probably have been a better term to describe this overall way of thinking about AI management, Barot said. However, the term was already widely used to describe the use of AI to improve IT operations management.

While there are dozens of commercial products for the other domains, Barot said there are only four commercial Platform Ops tools today: Amazon SageMaker, Cloudera SDC, ForePaas, and OneLogic. There are also a variety of open source Platform Ops tools that are being championed by commercial vendors as part of their larger portfolio of AI tools. Barot expects to see intense competition among vendors rushing to become the AI orchestration platform other things get plugged into.

Barot cautions that there are no silver-bullet products. Every enterprise will need to adopt the capabilities best suited to its existing development practices and industry niche.
