
DataOps: DevOps Plus Big Data

Posted on: Nov 06, 2018

In traditional DevOps, there are the complementary forms of development operations (which I call DEVops) and development operations (which I like to call devOPS). Between them, they automate the toolchain and bring the people working on getting an application out to users onto the same team: not necessarily the same organizational team, but one that uses its members' strengths in tandem to meet the needs of the business.

DataOps is an interesting extension of DevOps. You still need all of the coding side to get data into the system and to keep queries consistently maintained. You still need all of the operations side to get your database/NoSQL/whatever up and running. In fact, while the development side in a DataOps environment might be (but isn’t necessarily) lighter, the operations side is almost always more complex. No matter which big data engine is in use, it is a complex system that sits on top of the other systems a normal environment already supports. My first installation of a big data environment (Cloudera, as it happens) was a weeks-long learning voyage. Only after I’d completed it did I use an automation tool (which is no longer available) to make it easy. My second round took hours, but it assumed the knowledge I had gained in the first attempt.

The other bit of DevOps—ongoing monitoring and management—is also more complex in a big data environment, but we’ll come back to that in a moment.

Once the normal DevOps systems are in place, ETL/data import tools also need to be supported. The volume of data that these tools crunch through in a day makes their inclusion in DevOps critical in a data-heavy environment. If the data uptake is slow or the data itself is inaccurate, the organization feels the impact. This step also requires the inclusion of data scientists, people who traditionally have not been pulled into the DevOps model. But they are the ones who can gauge the accuracy of the data, and they are normally responsible for data acquisition anyway.

Which brings us to the monitoring and management stage of DataOps. Normally when we talk monitoring and management in DevOps, we are talking about tools that ultimately help with things like availability, responsiveness, auto-scaling and recovery. DataOps needs all of these things, like any other application does. And we’ve already mentioned that it has a complex environment embedded in your complex environment. But it also (some would argue more importantly) needs data monitoring. A knowledge of the state of the data is imperative.
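To make "the state of the data" a little more concrete, here is a minimal sketch in Python of what a data monitoring check might look at for a single imported batch: how many rows arrived, how many are missing a required value, and how old the load is. Everything in it is an assumption for illustration, not something taken from a particular tool: the `check_batch` function, the `DataHealth` record, the field names and the thresholds are all hypothetical, and a real environment would set them with the data scientists who know what "accurate" means for that data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class DataHealth:
    """Hypothetical summary of one batch's state."""
    row_count: int
    null_fraction: float
    staleness: timedelta
    problems: list = field(default_factory=list)

    @property
    def ok(self):
        # A batch is healthy only if no check raised a problem.
        return not self.problems


def check_batch(rows, required_field, loaded_at, now,
                min_rows=1, max_null_fraction=0.05,
                max_staleness=timedelta(hours=24)):
    """Check volume, completeness and freshness of a batch.

    The thresholds are illustrative defaults, not recommendations.
    """
    row_count = len(rows)
    missing = sum(1 for row in rows if row.get(required_field) is None)
    # An empty batch is treated as fully missing.
    null_fraction = missing / row_count if row_count else 1.0
    staleness = now - loaded_at

    problems = []
    if row_count < min_rows:
        problems.append("too few rows")
    if null_fraction > max_null_fraction:
        problems.append("too many missing values")
    if staleness > max_staleness:
        problems.append("data is stale")
    return DataHealth(row_count, null_fraction, staleness, problems)


# Made-up sample batch: one of four rows lacks an amount,
# and the load finished 30 hours before the check runs.
now = datetime(2018, 11, 6, 12, 0)
batch = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 2, "amount": None},
    {"customer_id": 3, "amount": 7.5},
    {"customer_id": 4, "amount": 3.0},
]
report = check_batch(batch, "amount",
                     loaded_at=now - timedelta(hours=30), now=now)
print(report.ok, report.problems)
```

A check like this would sit next to the usual availability and responsiveness monitors, so that a slow or inaccurate import raises an alert the same way an outage would.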