The potential impact of the ongoing global data explosion continues to excite the imagination. A 2018 report estimated that each person generates an average of 1.7 megabytes of data every second; annual data generation has more than doubled since then and is projected to double again by 2025. Efficient use of big data could generate an additional $3 trillion in economic activity, enabling a variety of applications such as self-driving cars, personalized healthcare and traceable food supply chains.
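For a rough sense of scale, the 1.7 MB figure, read as a per-second rate for each person, can be converted to a daily volume. This is only arithmetic on the rate quoted above, using decimal units (1 GB = 1,000 MB). It is an illustration, not a number taken from the report itself.

```latex
1.7\ \tfrac{\text{MB}}{\text{s}} \times 86\,400\ \tfrac{\text{s}}{\text{day}} = 146\,880\ \tfrac{\text{MB}}{\text{day}} \approx 147\ \tfrac{\text{GB}}{\text{day}}
```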
But adding all this data to the system is creating confusion about how it can be found, used, managed and shared legally, securely and efficiently. Where did a particular dataset come from? Who owns what? Who is allowed to see certain things? Where does it live? Can it be shared? Can it be sold? Can people see how it was used?
As data applications grow and become more ubiquitous, data producers, consumers, owners and stewards are finding that there is no playbook to follow. Consumers want to connect with data they can trust so they can make the best possible decisions. Producers need tools to share their data securely with those who need it. But technology platforms fall short, and there is no common source of truth to connect both sides.
How do we find information? When should we move it?
In a perfect world, data would flow freely, like a utility accessible to all. It could be packaged and sold like a raw material. It could be viewed easily and without complications by anyone. Its origin and movement could be tracked, eliminating concerns about misuse somewhere along the line.
Today’s world, of course, does not work that way. The massive explosion of data has created a long list of problems and opportunities that make it difficult to share even part of it.
As data is created almost everywhere inside and outside an organization, the first challenge is to identify what is being collected and how it can be organized so that it is available.
A lack of transparency and sovereignty over stored and processed data and infrastructure creates trust issues. Today, moving data from multiple technology stacks to centralized locations is costly and inefficient. The absence of open metadata standards and widely accessible application programming interfaces (APIs) can make data difficult to access and ingest. Sector-specific data ontologies can make it difficult for people outside a sector to benefit from new sources of data. And the difficulty of coordinating multiple stakeholders and existing data services can make sharing hard without a governance model.
Europe is leading
Despite the problems, data-sharing projects are being pursued on a large scale. One such project, backed by the European Union and a non-profit group, is creating an interoperable data exchange called Gaia-X, where businesses can share data under the protection of strict European data privacy laws. The exchange is envisioned as a way to share information across industries and as a repository for information about data services around artificial intelligence (AI), analytics and the Internet of Things.
Hewlett Packard Enterprise recently announced a solution framework to support the participation of companies, service providers and public bodies in Gaia-X. The Dataspace platform, currently under development and built on open standards as a cloud-native platform, democratizes access to data, data analytics and AI by making them more accessible to domain experts and general users. It provides a place where domain experts can easily identify trustworthy datasets and securely analyze operational data, always without the need for costly movement of data to a centralized location.
By using this framework to integrate complex data sources across the IT landscape, enterprises will be able to provide data transparency at scale, so that everyone, whether a data scientist or not, knows what data they have, how to access it and how to use it in real time.
Data-sharing initiatives are also at the top of the enterprise agenda. One important priority is verifying the data used to train internal AI and machine learning models. AI and machine learning are already used extensively across enterprises and industries to drive ongoing improvements in everything from product development to manufacturing. And we’re just getting started. IDC has projected that the global AI market will grow from $328 billion in 2021 to $554 billion in 2025.
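Taken at face value, the $328 billion (2021) and $554 billion (2025) figures imply a compound annual growth rate (CAGR) over those four years. The calculation below is only arithmetic on the numbers quoted above, not a growth rate published by IDC:

```latex
\text{CAGR} = \left(\frac{554}{328}\right)^{1/4} - 1 \approx 1.689^{0.25} - 1 \approx 0.140 \quad (\text{about } 14\% \text{ per year})
```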
To unlock the true potential of AI, governments and enterprises need to better understand the collective lineage of all the data that drives these models. How do AI models make their decisions? Are they biased? Are they credible? Have unscrupulous individuals been able to access or modify the data on which an enterprise trained its model? Connecting data producers with data consumers more transparently and more efficiently can help answer some of these questions.
Building data maturity
Enterprises are not going to solve the problem of unlocking all their data overnight. But they can prepare themselves to take advantage of technologies and management concepts that help create a data-sharing mindset. They can make sure they are developing the maturity to consume or share data strategically and effectively, rather than on an ad hoc basis.
Data producers can prepare for wider distribution of their data by taking a series of steps. They need to understand where their data is and how they are collecting it. Then they need to make sure that the people who use the data can access the right data at the right time. This is the starting point.
Then comes the hard part. If a data producer has a consumer, whether inside or outside the organization, it needs to connect that consumer to its data. This is both an organizational and a technical challenge. Many organizations want tight control over data sharing with other organizations. The democratization of data, at least to the point where people across an organization can find it, is a matter of organizational maturity. How do they handle it?
Companies that contribute to the auto industry actively share data with vendors, partners and subcontractors. It takes a lot of parts, and a lot of coordination, to assemble a car. Partners readily share information on everything from engines to tires to web-enabled repair channels. Automotive dataspaces can serve more than 10,000 vendors. But other industries can be more insular. Some large companies may not want to share sensitive information even within their own network of business units.
Creating a data mindset
Companies on both sides of the consumer-producer continuum can advance their data-sharing mindset by asking themselves these strategic questions:
- If enterprises are building AI and machine learning solutions, where do teams get their data? How are they connecting to that data? And how do they track its history to ensure the trustworthiness and provenance of the information?
- If data is valuable to others, what monetization path is the team taking today to extend that value, and how will it be governed?
- If a company already exchanges or monetizes data, can it authorize a broad set of services, on premises and in the cloud, across multiple platforms?
- For organizations that need to share data with vendors, how are the same datasets and updates being kept consistent across all parties today?
- Do producers want to copy their data, or require people to bring their models to the data? Datasets can be so large that they cannot practically be replicated. Should a company host software developers on the platform where its data lives and move models in and out?
- How can the departments that consume data influence the practices of their internal data producers?
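The “bring the model to the data” option raised in the questions above can be sketched in a few lines. This is a minimal, hypothetical illustration, assuming each producer can run a submitted job next to its own data and return only an aggregate. `DataHost`, `partial_sum` and `federated_mean` are invented names for illustration, not part of any HPE or Gaia-X interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical job signature: takes local records, returns (sum, count).
Job = Callable[[List[float]], Tuple[float, int]]


@dataclass
class DataHost:
    """Hypothetical data producer that keeps its records on its own platform."""
    name: str
    records: List[float] = field(default_factory=list)

    def run(self, job: Job) -> Tuple[float, int]:
        # The job executes where the data lives; only its result leaves the host.
        return job(self.records)


def partial_sum(records: List[float]) -> Tuple[float, int]:
    # Return an aggregate (sum, count) instead of the raw rows.
    return sum(records), len(records)


def federated_mean(hosts: List[DataHost]) -> float:
    # The consumer sends the same job to every host and combines the aggregates.
    total, count = 0.0, 0
    for host in hosts:
        s, n = host.run(partial_sum)
        total += s
        count += n
    if count == 0:
        raise ValueError("no records available on any host")
    return total / count


hosts = [DataHost("plant_a", [1.0, 2.0, 3.0]), DataHost("plant_b", [4.0, 5.0])]
print(federated_mean(hosts))  # 3.0
```

The consumer learns the overall mean without any raw records leaving either host. A real dataspace would also need authentication, policy checks and auditing around each job, which this sketch leaves out.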
The data revolution is creating business opportunities, along with a lot of confusion about how to strategically search for, collect, manage and gain insights from data. Data producers and data consumers are becoming increasingly isolated from each other. HPE is building a platform that supports both on-premises and public cloud environments, using open source as its foundation, and solutions such as the HPE Ezmeral software platform aim to provide the common ground both parties need to make the data revolution work for them.
Read the original article at Enterprise.nxt.
This content was produced by Hewlett Packard Enterprise. It was not written by the editorial staff of MIT Technology Review.