Q&A with Iguazio: on Data Science, Data Analytics, and Serverless

By Community Team

In today’s hyper-competitive business landscape, data is key to understanding customers and their preferences and expectations. As businesses venture deeper into the age of digital transformation, organizations must analyze and act upon their data in real time to improve customer experience and gain a competitive advantage.

Recognizing the transformative power of data science, Iguazio, a Data Science and Analytics PaaS, helps digitally transform organizations by simplifying the creation and deployment of high-volume, real-time, intelligent applications.

SourceForge recently caught up with Adi Hirschtein, the Director of Product at Iguazio, to talk about the value of data science and data analytics in modern enterprises. Hirschtein also shares his expertise on how to successfully build a data pipeline and offers some insights into the future of data science and data analytics.

Q: Please share with us a brief overview of Iguazio (year founded, size, solutions, etc.)?

Adi Hirschtein, the Director of Product at Iguazio

A: The Iguazio platform streamlines the path from data science to production and delivers fast time to value for application development based on machine learning. It enables data scientists, who spend most of their time on plumbing, management, and deployment, to focus on delivering better, more accurate, and more powerful solutions faster. Since it was founded in 2014, Iguazio has been backed by some of the world’s top VCs and strategic investors. Today, the company has 80 employees and is led by serial entrepreneurs and a diverse team of seasoned innovators in the USA, UK, Singapore, and Israel.

Q: What business sectors do you serve and who are your current customers?

A: Iguazio powers data science for manufacturing, retail, healthcare, pharma, insurance, financial services, and telcos. The platform provides companies with the tools to develop machine learning models more efficiently and operationalize data science at scale. This translates to better success rates for companies across industries: telcos are better able to predict network health in real time, stock exchanges deploy machine learning to develop more sophisticated trading strategies, and retail companies offer real-time heat maps to better manage supply and demand.

Q: What exactly is data science and what is its value for businesses?

A: Data science is a broad field that enables the review, analysis, and extraction of valuable knowledge and information from data. It holds enormous potential for businesses to bring in new online customers, reduce operational costs and risks, and expand their customer base. However, without the ability to extract real-time actionable insights, simplify development and deployment, and scale, the value of data science is just that: potential.

Q: How does data science differ from traditional statistical analysis?

A: While statistical analysis provides the methodology to collect, analyze, and reach quantitative conclusions from data, data science tackles big data from multiple sources and applies machine learning, predictive analytics, and sentiment analysis to extract critical information. The end goal is to provide accurate predictions and insights that are used to power business decisions. Data science requires modern tools to simplify work based on complex data, enable accurate and sophisticated machine learning models, and facilitate high performance and scalability in production.

Q: As the leading data platform for continuous analytics and event-driven applications, how does Iguazio help organizations drive more value from big data? And how do you approach a data science problem?

A: In reality, data scientists spend very little of their time on actual data science. Instead, along with their data engineering peers, they dedicate too much time to solving the ‘before’ issues of ingestion, plumbing, waiting for data, and waiting for compute, and the ‘after’ issues of deployment and productization. Furthermore, many organizations look to scale machine learning capabilities but encounter common notebook limitations involving scale, performance, security, and collaboration when running large numbers of models in production. Iguazio accelerates data science from exploration to production with a scalable and fast Platform as a Service. It enables organizations to extract more value out of big data by reducing time to production, while enabling data scientists to work with their favorite tools, access fresh data, and eliminate waiting times with real-time performance.

Q: As experts in operationalizing data science, what are the best practices for keeping data safe and secure in the cloud, on-premises, and on the edge? Can you share with our readers your advice on how to build a successful data science pipeline?

A: The data science pipeline is currently too complex and siloed. Each step of the process requires different systems, databases, and skills, resulting in great inefficiencies for both data scientists and data engineers. Iguazio provides a single open PaaS for the entire data science lifecycle, enabling collection of multi-model fresh and historical data, exploration and data cleansing, high-performance training and testing, one-click deployment, and ongoing model management in production. This means data scientists are able to focus on actual data science while working with leading open source tools like Jupyter Notebook, TensorFlow, PyTorch, Dask, and more. The platform can be deployed in the Iguazio managed cloud, on multiple public cloud service providers, at the edge, or on-premises. Regardless of the deployment environment, Iguazio delivers best-in-class serverless functions, data science tools, and data services in an integrated, secure, and fully managed platform. The platform classifies data transactions and provides fine-grained policies to control access, service levels, multitenancy, and data lifecycles.

Q: Tell us more about Iguazio as an edge data science and analytics platform provider. How is your intelligent data analytics platform empowering today’s modern enterprises?

A: The market’s need to process data from many devices and external sources in real time requires a hybrid solution that overcomes the cloud’s latency constraints. In addition to its ability to process large-scale data fast, the Iguazio Intelligent Edge is cloud-native, including data services, AI tools (TensorFlow, PyTorch, Spark), and serverless functions (Nuclio) running on Kubernetes. It enables real-time local analytics and actions while reducing bandwidth costs by sending only critical information to the cloud for further analysis. Iguazio partners with both Google Cloud and Microsoft Azure to extend their offerings at the edge.
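
The edge pattern described above, acting on data locally and forwarding only critical information to the cloud, might look roughly like the following sketch. It is an illustration only and not Iguazio's API: the temperature threshold, the `temp_c` field, and the `send_to_cloud` callback are all hypothetical.

```python
from typing import Callable, Iterable

TEMP_LIMIT_C = 80.0  # hypothetical threshold above which a reading counts as "critical"


def process_at_edge(readings: Iterable[dict],
                    send_to_cloud: Callable[[dict], None]) -> dict:
    """Inspect every reading locally and forward only the critical ones."""
    processed = 0
    forwarded = 0
    for reading in readings:
        processed += 1
        if reading["temp_c"] > TEMP_LIMIT_C:
            # Only this subset consumes uplink bandwidth.
            send_to_cloud(reading)
            forwarded += 1
    return {"processed": processed, "forwarded": forwarded}
```

Every reading is examined at the edge, but only those that cross the threshold are passed to the uplink callback, which is where the bandwidth saving comes from.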

Q: What are the key features and capabilities of Iguazio’s open source serverless framework? Can you provide us with sample use cases?

A: Serverless frameworks allow developers to focus on building and running auto-scaling applications without worrying about managing servers, as server provisioning and maintenance are all taken care of behind the scenes. Nuclio is Iguazio’s serverless platform, which is consumed either as an open source framework or as a managed service in Iguazio. The Nuclio processor is real-time: a single Nuclio function processor runs 370,000 function invocations per second (with a simple Go function) and responds with 0.1 ms latency, which is 100x faster than most serverless/FaaS solutions. Nuclio has an open architecture that supports many event and data sources and enables fast deployment. Real-time analytics and AI is a common Nuclio use case: Nuclio processes various data and event streams, enriches them with historical or external context, and runs AI predictions to drive real-time insights.
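
The real-time analytics use case described above (receive an event, enrich it with context, run a prediction) can be sketched as a Nuclio-style Python function. This is a minimal illustration under stated assumptions, not Iguazio's implementation: the `handler(context, event)` entry point follows Nuclio's Python convention, while `CUSTOMER_CONTEXT`, `predict_churn_score`, and the event field names are hypothetical stand-ins for a real context store and a trained model.

```python
import json

# Hypothetical historical context; in practice this would be read from a
# database or feature store rather than an in-memory dict.
CUSTOMER_CONTEXT = {
    "c-1": {"avg_daily_usage": 120.0},
    "c-2": {"avg_daily_usage": 15.0},
}


def predict_churn_score(usage, avg_daily_usage):
    """Placeholder for a trained model: the score rises as usage falls below average."""
    if avg_daily_usage <= 0:
        return 0.0
    drop = (avg_daily_usage - usage) / avg_daily_usage
    return round(min(max(drop, 0.0), 1.0), 3)


def handler(context, event):
    """Entry point in the Nuclio Python style: called once per incoming event."""
    body = event.body
    if isinstance(body, (bytes, bytearray)):
        body = body.decode("utf-8")
    record = json.loads(body) if isinstance(body, str) else dict(body)

    # Enrich the raw event with historical context.
    history = CUSTOMER_CONTEXT.get(record["customer_id"], {"avg_daily_usage": 0.0})
    record["avg_daily_usage"] = history["avg_daily_usage"]

    # Run the (placeholder) prediction on the enriched record.
    record["churn_score"] = predict_churn_score(record["usage"], record["avg_daily_usage"])
    return json.dumps(record)
```

The function can be exercised locally without a Nuclio runtime by passing any object with a `body` attribute as the event, for example `types.SimpleNamespace(body=b'{"customer_id": "c-1", "usage": 30}')`.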

Q: Looking ahead, what technologies, strategies, or current market movements will likely impact the future of data science and data analytics? How is Iguazio meeting these head-on?

A: The following factors will impact the ability of companies to operationalize data science:

  • Serverless technologies will simplify data science, requiring fewer resources and less time to production. This means it will become more accessible to smaller companies, rather than limited to tech giants.
  • Companies will want to process data closer to its source in order to generate faster and more accurate insights than what major cloud solutions currently enable.
  • Data science platforms will become more open, meeting data scientist demands to work with the tools of their choice.
  • Data scientists will stop waiting for data and compute, as performance will become a higher priority during exploration and production.

Iguazio is meeting these needs head-on! By supporting companies on their quest to build intelligent applications faster, Iguazio is paving the way for them to fully leverage data science and gain a competitive edge in the new digital era.

About Iguazio
The Iguazio platform streamlines the path from data science to production and delivers fast time to value for application development based on machine learning. It enables data scientists, who spend most of their time on plumbing, management, and deployment, to focus on delivering better, more accurate, and more powerful solutions. Iguazio powers data science for manufacturing, healthcare, insurance, financial services, and telcos. Backed by top VCs and strategic investors, the company is led by serial entrepreneurs and a diverse team of seasoned innovators in the USA, UK, Singapore, and Israel.