How Tech Enterprises Handle Big Data On Open Source And Ensure User Privacy

Posted on: Jun 22, 2018

The term “big data” gets thrown around a lot, especially given its importance in driving AI technology. Finding ways to build scalable systems that provide insight into what you’re doing well and what you could be doing better is imperative to maintaining a competitive edge. And as big data, artificial intelligence and machine learning grow more advanced and interconnected each year, those scalable systems become more and more valuable.

When PicsArt was founded in 2011, the online landscape and the world of data collection, management and analysis were far less sophisticated. Since then, many startups have risen while others have faltered, and the ones that found success were largely those able to adapt to an increasingly data-driven marketplace. Today, our users generate a staggering 10 terabytes of data every single day. By global standards, PicsArt runs a medium- to large-scale big data cluster with most large-cluster capabilities enabled.

It became evident that we were stepping into the big data arena when our data met all four characteristics of big data: volume, velocity, variability and complexity. Once the volume of data we were dealing with grew too large to fit into a relational or other standard database, the die was cast, and we jumped into the big data scene with optimism and gusto. On top of that, as AI and machine learning became mainstream technologies, we were able to put them to full use for the benefit of our users.

Adapting To A Big Data Mindset

When most people think about big data, they imagine that the technical side will be the most difficult, but we found through trial and error that approaching problems from the technical side first isn’t always ideal. Big data offers nearly endless possibilities, but if you don’t have a clear understanding of specific use cases and goals, you can unnecessarily prolong the development process. Because our system was constructed without a clearly delineated list of use cases, our data architects had to design it to handle as many future use cases as possible. The end result was a working system with extensive support and capabilities, but the rollout took longer than it would have if we had defined things better from the start.

Getting used to the sheer scope of the data was a learning curve as well, especially since there was little in the way of a big data community at the time. Initially, we placed responsibility for cleaning data on a single centralized team, which we quickly discovered would never work given the constant barrage of thousands of events flowing in from multiple apps. Getting the data clean, we found, requires simultaneous effort from the tech and business teams -- it only works if everyone is on the same page. Big data is considered the new oil nowadays, but it is also a huge challenge in terms of how to prepare it, process it, store it and, most importantly, turn it into applicable knowledge. To make that happen, it’s important to define the most common use cases within the product and align technical and business team efforts from the beginning. Overall, maintaining flexibility, learning from mistakes and adapting were essential to getting past the first step of becoming a big data company.
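The point about decentralizing data cleaning is easier to see with a small illustration. The sketch below is hypothetical (the event names, fields and team labels are not from PicsArt’s actual pipeline): it shows the general idea of each app or business team declaring and owning the schema for its own events, so malformed events get flagged back to the responsible team instead of piling up on a single central cleaning team.

```python
# Hypothetical sketch: per-team event validation instead of one central cleaning team.
# Event types, fields and team names are illustrative assumptions, not PicsArt's schema.

from dataclasses import dataclass

@dataclass
class EventSchema:
    name: str
    required_fields: set
    owner_team: str  # the app/business team responsible for this event type

# Each team registers and maintains the schema for the events it emits.
SCHEMAS = {
    "photo_edit": EventSchema("photo_edit", {"user_id", "ts", "tool"}, "editor-team"),
    "share": EventSchema("share", {"user_id", "ts", "destination"}, "social-team"),
}

def validate(event: dict) -> tuple:
    """Return (is_clean, reason). Unknown or incomplete events are routed back
    to the owning team rather than silently dropped by a central cleaner."""
    schema = SCHEMAS.get(event.get("type"))
    if schema is None:
        return False, "unknown event type"
    missing = schema.required_fields - event.keys()
    if missing:
        return False, f"missing fields {sorted(missing)} (owner: {schema.owner_team})"
    return True, "ok"

if __name__ == "__main__":
    print(validate({"type": "photo_edit", "user_id": 42, "ts": 1529625600, "tool": "crop"}))
    print(validate({"type": "share", "user_id": 42}))
```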