Integrating Big Data the Right Way
The true value in incorporating a Big Data initiative into an overall Enterprise Data Management scheme comes from integrating, and in some cases aggregating, external Big Data with more conventional sources of data.
Doing so correctly means accounting for Data Governance, Metadata Management, traceability, and Semantic consistency. Those concerns frequently require more than simply dumping data into a single repository as a data lake, an approach that risks creating the proverbial data swamp.
The crux of the matter, due to Big Data’s ascending popularity and the efforts of vendors to capitalize on it, is that there are “…probably several hundred technologies out there for applying Big Data as a very broad technology space,” according to James Cerrato of Adaptive, Inc.
Integrating those technologies with conventional relational technologies and those native to the enterprise was the subject of Cerrato’s presentation at the Enterprise Data World 2015 Conference, “Big Data Analytics – Are You Creating a Data Swamp.” Doing so correctly incorporates the aforementioned aspects of governance, Metadata, traceability, and Semantics in a way that facilitates much-needed transparency, which Cerrato noted is the key point of distinction between a data lake and a data swamp.
Additional Integration Concerns
Beyond contending with a plethora of Big Data technologies and more traditional enterprise-based ones, integrating Big Data with conventional data is further complicated by:
· Organizational Structure: Different departments may have different objectives that call for different data types and uses, all of which can foster a silo culture.
· Technology: Even before Big Data is incorporated, integrating longstanding internal systems with one another can create pain points.
· Data Quality: Integrating different systems tests an organization’s Data Governance and can present issues for Data Quality concerning accuracy, completeness, timeliness, and more.
· Security: System integration and data integration also affect organizational security, as access to data can change in ways that produce undesired ramifications.
· Legacy Systems: Integrating legacy systems and their attendant technologies with ones for Big Data can prove difficult.
Automated Data Governance
Strict governance mechanisms are a necessity when integrating Big Data with other data throughout the enterprise, and for utilizing any sort of data lake option. At the macro level of governance, it is necessary to establish ownership, accountability, and the roles and responsibilities pertaining to the data: stewards, subject matter experts, and even specific members of a Governance Council. After designating these roles and relationships, organizations can automate them with governance tools designed for Big Data integration. Such platforms use workflow automation to issue alerts to the governance personnel associated with particular data types and processes, based on regulatory information, application uses, and other business functions. “Those processes will actually trigger notifications according to those accountable relationships you’ve identified that have responsibility for each step in your review process,” Cerrato said.
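To make the idea of notifications driven by accountable relationships more concrete, here is a minimal Python sketch. It is not taken from Cerrato's presentation or from any particular governance product. The names GovernanceRegistry, Assignment, and notifications_for, along with the sample data types, review steps, and people, are all hypothetical and chosen only for illustration. The sketch assumes that each accountable relationship links a data type and a review step to a role and a person, and that an event on a data type should produce one message per accountable person.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assignment:
    """One accountable relationship (hypothetical model, not a vendor schema)."""
    data_type: str  # e.g. "social_media_feed"
    step: str       # review step, e.g. "quality_review"
    role: str       # e.g. "data steward"
    person: str     # who gets notified


@dataclass
class GovernanceRegistry:
    """In-memory store of who is accountable for which review step."""
    assignments: list[Assignment] = field(default_factory=list)

    def assign(self, data_type: str, step: str, role: str, person: str) -> None:
        self.assignments.append(Assignment(data_type, step, role, person))

    def accountable_for(self, data_type: str, step: str) -> list[Assignment]:
        return [a for a in self.assignments
                if a.data_type == data_type and a.step == step]


def notifications_for(registry: GovernanceRegistry, data_type: str,
                      step: str, reason: str) -> list[str]:
    """Build one message per person accountable for this step.

    If nobody is accountable, return a single message flagging the gap,
    since an unowned review step is itself a governance problem.
    """
    owners = registry.accountable_for(data_type, step)
    if not owners:
        return [f"UNASSIGNED: no one is accountable for '{step}' "
                f"on '{data_type}' ({reason})"]
    return [f"To {a.person} ({a.role}): '{data_type}' needs '{step}' ({reason})"
            for a in owners]


# Illustrative data only; the people and data types are made up.
registry = GovernanceRegistry()
registry.assign("social_media_feed", "quality_review", "data steward", "A. Steward")
registry.assign("social_media_feed", "regulatory_review",
                "governance council member", "B. Council")

messages = notifications_for(registry, "social_media_feed", "quality_review",
                             "new external source ingested")
for message in messages:
    print(message)
```

A real governance platform would persist these relationships and deliver messages through email or a ticketing workflow. The sketch only shows the lookup from an event to the people identified as accountable for that review step.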
Source: http://www.dataversity.net
