Evolution of intelligent data pipelines


The potential of artificial intelligence (AI) and machine learning (ML) to derive and drive new sources of customer, product, service, operational, environmental, and societal value seems almost unlimited. If your organization intends to compete in the economy of the future, then AI must be at the core of your business operations.

A study by Kearney titled “The Impact of Analytics in 2020” highlights the untapped profitability and business impact awaiting companies seeking justification to accelerate their data science (AI / ML) and data management investments:

  • Explorers could improve profitability by 20% if they were as effective as Leaders
  • Followers could improve profitability by 55% if they were as effective as Leaders
  • Laggards could improve profitability by 81% if they were as effective as Leaders

That business, operational, and societal impact could be extraordinary, were it not for one significant organizational challenge: data. No less than the godfather of AI, Andrew Ng, has noted the data and data management impediments that keep organizations and communities from realizing the potential of AI and ML:

“The model and the code for many applications are basically a solved problem. Now that the models have advanced to a certain point, we’ve got to make the data work as well.” – Andrew Ng

Data is at the heart of training AI and ML models. And a steady supply of high-quality, trusted data, delivered through highly efficient and scalable pipelines, is what allows AI to produce these compelling business and operational outcomes. Just as a healthy heart needs oxygen and a dependable flow of blood, a steady flow of clean, accurate, enriched, and trusted data is essential to the AI / ML engines.

For example, one CIO has a team of 500 data engineers managing more than 15,000 extract, transform, and load (ETL) jobs that acquire, move, aggregate, standardize, and align data across 100 special-purpose data repositories (data marts, data warehouses, data lakes, and data lakehouses). They perform these tasks against the company’s operational and customer-facing systems under ridiculously tight service level agreements (SLAs) to support a growing number and variety of data consumers. It seems Rube Goldberg must have become a data architect (Figure 1).

Figure 1: Rube Goldberg data architecture

This spaghetti architecture of one-off, special-purpose, static ETL programs for moving, cleansing, aligning, and transforming data slows the “time to insights” that organizations need to fully exploit the unique economic characteristics of data, “the world’s most valuable resource” according to The Economist.

The emergence of intelligent data pipelines

The purpose of a data pipeline is to automate and scale common, repetitive data acquisition, transformation, movement, and integration tasks. A properly constructed data pipeline strategy can accelerate and automate the processing associated with gathering, cleansing, transforming, enriching, and moving data to downstream systems and applications. As the volume, variety, and velocity of data continue to grow, the need for data pipelines that can scale linearly within cloud and hybrid cloud environments is becoming increasingly critical to business operations.

A data pipeline refers to a set of data processing activities that integrates both operational and business logic to perform advanced sourcing, transformation, and loading of data. A data pipeline can run on a fixed schedule, in real time (streaming), or be triggered by a predetermined set of rules or conditions.
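To make those mechanics concrete, here is a minimal sketch of such a pipeline in Python. It is illustrative only: the source file orders.csv, the warehouse.db target, and the hourly schedule are assumptions for the example, not a description of any specific product.

```python
# Minimal ETL pipeline sketch (illustrative only): extract from a source,
# apply business logic, and load to a target on a fixed schedule.
import csv
import sqlite3
import time
from datetime import datetime

def extract(path: str) -> list[dict]:
    """Read raw order records from a CSV source system."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[tuple]:
    """Apply operational logic: standardize fields and drop incomplete rows."""
    cleaned = []
    for row in rows:
        if not row.get("order_id") or not row.get("amount"):
            continue  # data-quality rule: skip incomplete records
        cleaned.append((row["order_id"].strip(),
                        float(row["amount"]),
                        row.get("region", "UNKNOWN").upper()))
    return cleaned

def load(records: list[tuple], db_path: str) -> None:
    """Write standardized records to the downstream analytic store."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT, amount REAL, region TEXT)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    con.commit()
    con.close()

def run_pipeline() -> None:
    rows = extract("orders.csv")           # hypothetical source file
    load(transform(rows), "warehouse.db")  # hypothetical target store
    print(f"{datetime.now():%H:%M} pipeline run complete: {len(rows)} rows read")

if __name__ == "__main__":
    # Fixed-schedule trigger; in practice an orchestrator would own this loop,
    # or the pipeline would run in streaming or rule-triggered mode instead.
    while True:
        run_pipeline()
        time.sleep(3600)  # run hourly
```

The same extract, transform, and load functions could equally be wired to a streaming source or fired by a rule-based trigger; only the scheduling loop at the bottom would change.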

In addition, logic and algorithms can be built into a data pipeline to create an “intelligent” data pipeline. Intelligent data pipelines are reusable and extensible economic assets that can be tailored to specific source systems and perform the data transformations necessary to support the unique data and analytic requirements of the target system or application.

As machine learning and AutoML become more prevalent, data pipelines will become increasingly intelligent. Data pipelines can move data between advanced data enrichment and transformation modules, where neural network and machine learning algorithms can create more advanced data transformations and enrichments. These include segmentation, regression analysis, clustering, and the creation of advanced indices and propensity scores.
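As an illustration of one such enrichment stage, the sketch below appends a customer segment label inside a pipeline step using k-means clustering. The column names (recency, frequency, monetary) and the choice of four segments are assumptions made for the example, not requirements of any particular tool.

```python
# Sketch of an ML enrichment stage inside a pipeline: assign each customer a
# segment label via k-means clustering before the data moves downstream.
# Feature columns and segment count are illustrative assumptions.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def enrich_with_segments(df: pd.DataFrame, n_segments: int = 4) -> pd.DataFrame:
    features = df[["recency", "frequency", "monetary"]]
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_segments, n_init=10, random_state=0)
    df = df.copy()
    df["segment"] = model.fit_predict(scaled)  # appended enrichment column
    return df

# Example usage with toy data flowing through the stage
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "recency":    [5, 40, 3, 60, 8, 55],
    "frequency":  [12, 2, 15, 1, 9, 3],
    "monetary":   [900, 50, 1200, 20, 700, 80],
})
print(enrich_with_segments(customers))
```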

Finally, AI can be integrated into data pipelines so that they continuously learn and adapt based on the source systems, the required data transformations and enrichments, and the evolving business and operational requirements of the target systems and applications.

For example, an intelligent data pipeline in healthcare could perform a clustering analysis of Diagnosis-Related Group (DRG) codes to verify the consistency and completeness of DRG submissions and identify potential fraud as the DRG data is moved by the pipeline from the source systems to the analytic systems.
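One way such an in-flight fraud check could be sketched, under assumed claim fields (billed_amount, length_of_stay, num_procedures) and an assumed outlier rate, is with an unsupervised anomaly detector applied as claims pass through the pipeline:

```python
# Sketch: flag potentially fraudulent DRG claims as they move through the
# pipeline using an unsupervised anomaly detector. Field names and the
# assumed contamination rate are illustrative, not from any real system.
import pandas as pd
from sklearn.ensemble import IsolationForest

def flag_anomalous_claims(claims: pd.DataFrame) -> pd.DataFrame:
    features = claims[["billed_amount", "length_of_stay", "num_procedures"]]
    detector = IsolationForest(contamination=0.2, random_state=0)
    claims = claims.copy()
    claims["fraud_flag"] = detector.fit_predict(features) == -1  # -1 marks outliers
    return claims

# Toy claims batch: the last row is billed far outside the norm for its DRG
claims = pd.DataFrame({
    "drg_code":       ["470", "470", "291", "291", "470"],
    "billed_amount":  [12000, 11500, 25000, 24000, 95000],
    "length_of_stay": [3, 2, 6, 5, 2],
    "num_procedures": [1, 1, 3, 2, 9],
})
print(flag_anomalous_claims(claims)[["drg_code", "billed_amount", "fraud_flag"]])
```

Flagged records could be routed to a review queue by the pipeline while clean records continue on to the analytic systems.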

Realizing business value

Chief data officers and chief data and analytics officers are being challenged to unleash the business value of their data: to apply data to the business to drive quantifiable financial impact.

The ability to get high-quality, trusted data to the right data consumers at the right time to enable more timely and accurate decisions will be a key differentiator for today’s data-rich companies. A Rube Goldberg system of ETL scripts and disparate, special-purpose, analytics-centric repositories hinders an organization’s ability to achieve that goal.

Learn more about intelligent data pipelines in the ebook Modern Enterprise Data Pipelines from Dell Technologies.

This content is produced by Dell Technologies. It was not written by the editorial staff of MIT Technology Review.


