IEC and ISO publish technical report providing an overview of the big data framework and reference architecture

The global big data analytics market is expected to reach USD 105.08 billion by 2027, thanks to the increasing volume of data and the adoption of big data tools, according to a report by Research and Markets.

The report notes that the growth driver of the big data analytics market is the increasing volume of data from mobile data traffic, cloud-computing traffic and the rapidly expanding development and adoption of technologies, such as the Internet of Things (IoT) and artificial intelligence (AI).

In this rapidly evolving field, implementers face the challenge that there is no consistent approach to describing a big data architecture and its implementation.

The role of standards

IEC and ISO develop publications and international standards for information and communication technologies through their joint technical committee (ISO/IEC JTC 1), whose subcommittees cover 22 different areas. One of them, SC 42, covers artificial intelligence.

SC 42 has published ISO/IEC Technical Report 20547-1, which provides a framework and application process that organizations can use to build a big data architecture for their problem domain. The aim of the framework is to describe the organization’s architecture and its implementations effectively and consistently with respect to the roles/actors (big data application and framework providers or service partners), their concerns (technical, operational, legal, etc.) and the underlying technology. Organizations can then map those roles and concerns to the activities and functional components that will implement the architecture.

“The digital transformation of industry has brought into focus the need for computer systems to deal with large and diverse data sets whose properties, such as variety, volume, velocity and veracity, may differ significantly based on the application,” said Wael William Diab, Chair of SC 42. “The big data reference architecture series of standards establishes the foundations for a big data ecosystem.” The TR describes the big data reference architecture (BDRA) framework and provides a process for mapping a specific problem set or use case to the architecture and evaluating that mapping.

This TR is part of a series of standards for big data reference architecture, which includes ISO/IEC TR 20547-2 for use cases and derived requirements, ISO/IEC 20547-3 for the reference architecture, and ISO/IEC TR 20547-5, which provides a standards roadmap, as shown below. Additionally, ISO/IEC 20546 provides a big data overview and vocabulary.

[Figure: Big data reference architecture]

Understanding big data

In order for stakeholders of big data systems to understand what they are buying and implementing, and to support robust and accurate communication, a clear framework for communications with potential technology and service vendors is needed.

This includes understanding possible issues and liabilities around managing and controlling data, covering security, quality, compliance, copyright and privacy.

“It’s crucial for organizations to be able to identify, define and articulate data security, provenance, and governance policies and implement and document the technical controls to enforce those policies. In this manner they will be able to protect themselves from liability for breaches or misuse of the data they control”, said Wo Chang, Convenor of the SC 42 working group for big data, which develops the ISO/IEC 20547 series of standards.

Additionally, many organizations dealing with big data acquire data externally. The systems that collect and analyze big data must therefore be secure, exchange data reliably, and be interoperable at the system, component, and data level.

Key elements of the big data reference architecture

The TR considers architecture concepts, including the ability of a big data system to process extensive data sets for efficient storage, manipulation, and analysis, and provides definitions of BDRA systems.

The TR considers concepts and structures from ISO/IEC/IEEE 42010, Systems and software engineering — Architecture description, to depict the outline for a reference architecture structure.

It considers key elements of a big data reference architecture, including:

  • An overview, which provides the scope of each of the five parts of the ISO/IEC 20547 series, the logical relationships between the documents and the application process of the BDRA.
  • Stakeholders, of which there are many, for instance, the system owners, customers, and system implementors. In the case of big data systems, this could include anyone with an interest in the data being processed by the system, such as data owners who may provide data to the system, data consumers who make decisions based on the data coming from the big data system, and also people or organizations who can be described by the data.
  • Concerns, defined as any aspects of the big data system, including technical, business, operational, legal, and even social influences on a system in its environment. One example is the quality of the software in a big data system, which includes effectiveness, efficiency, trust, risk and risk mitigation, and flexibility.
  • Views, comprising the user view, which describes the roles, sub-roles, activities, and cross-cutting aspects necessary to meet the concerns of the stakeholders. It also includes the functional view, which describes the functional layers, functional components, and multi-layer functions necessary to implement the activities and cross-cutting aspects defined in the user view.

Applying the BDRA

Finally, the TR looks at how to apply the BDRA to a real-world problem domain. It provides a stepwise process for applying the reference architecture to develop an architecture description for a given big data system implementation.

Because potential big data systems, and the components they can comprise, vary so widely, the BDRA is deliberately general and designed to apply to a wide range of systems. The application process is designed to support extending the BDRA to meet the unique requirements of a given system.

“This application process provides a rigorous approach based on architecture, system, and software engineering standards to enable system creators to map and apply available technologies and standards to meet their requirements as part of a flexible, open, and standards based architecture”, said David Boyd, Editor for ISO/IEC 20547-1.

Defining stakeholders and their concerns

A key part of this process is to define stakeholders and their concerns. These should include aspects such as privacy and government regulations, for example, GDPR in the European Union.

These elements must be captured in a way that enables traceability to the system activities and components, in order to support verification of the process.

Mapping stakeholders and concerns to roles and sub-roles

The TR suggests a cross-reference matrix as a helpful tool to establish and maintain the mapping between concerns and the roles/sub-roles which require activities to address those concerns. This step ensures that the system being developed will be able to perform all the activities required to address the identified concerns.
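As a rough illustration of such a cross-reference matrix, the sketch below stores one row per concern and one column per role, then lists the concerns that no role addresses yet. The concern names, the helper names `build_matrix` and `unaddressed_concerns`, and the sample assignments are assumptions made for this example only; they are not taken from the TR.

```python
# Illustrative roles (mentioned in the article) and hypothetical concerns.
ROLES = (
    "big data application provider",
    "big data framework provider",
    "big data service partner",
)
CONCERNS = ("privacy", "security", "data quality")


def build_matrix(pairs, concerns=CONCERNS, roles=ROLES):
    """Build {concern: {role: bool}} from (concern, role) pairs."""
    matrix = {c: {r: False for r in roles} for c in concerns}
    for concern, role in pairs:
        # Reject typos instead of silently adding new rows or columns.
        if concern not in matrix or role not in matrix[concern]:
            raise ValueError(f"unknown concern or role: {concern!r}, {role!r}")
        matrix[concern][role] = True
    return matrix


def unaddressed_concerns(matrix):
    """Return concerns that are not mapped to any role."""
    return [c for c, row in matrix.items() if not any(row.values())]


# Hypothetical assignments for a sample system.
matrix = build_matrix([
    ("privacy", "big data application provider"),
    ("security", "big data framework provider"),
    ("security", "big data service partner"),
])
gaps = unaddressed_concerns(matrix)
print(gaps)
```

In this sample, "data quality" has no role assigned, so it is reported as a gap to resolve before moving on to activity descriptions.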

Developing detailed activity descriptions and mapping them to concerns

This step defines what the system or architecture will do. The BDRA, in the form of its roles and sub-roles, provides a framework for capturing and organizing the output of the systems and software engineering processes.

Defining functional components to implement activities

This is the high-level design phase for a big data system. The functional layers and classes of functional components in the functional view of the BDRA provide a framework for organizing the actual configuration items (software or hardware) that make up the big data system architecture.

The final step involves validating that every concern can be traced to a functional component through an activity, and that every activity appears in at least one of those links.

Selecting a database traceability tool to capture all this information is essential to validating the high-level architecture efficiently.
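The validation described above can be sketched in a few lines. The example assumes the traceability data is held in two plain dictionaries (concern to activities, activity to functional components). The function name `validate_traceability` and all sample names are hypothetical, and a real project would more likely keep these links in a dedicated tool.

```python
def validate_traceability(concern_to_activities, activity_to_components, activities):
    """Check concern -> activity -> component links.

    Returns (untraced_concerns, untouched_activities):
    - concerns with no activity that leads to a functional component
    - activities that appear in no complete concern-to-component link
    """
    untraced = []
    touched = set()
    for concern, acts in concern_to_activities.items():
        traced = False
        for act in acts:
            # A link is complete only if the activity has a component.
            if activity_to_components.get(act):
                traced = True
                touched.add(act)
        if not traced:
            untraced.append(concern)
    untouched = sorted(set(activities) - touched)
    return untraced, untouched


# Hypothetical sample data for illustration.
concern_to_activities = {
    "privacy": ["anonymize records"],
    "data quality": ["validate input"],
    "security": [],
}
activity_to_components = {
    "anonymize records": ["masking service"],
    "validate input": [],
    "archive data": ["object store"],
}
activities = ["anonymize records", "validate input", "archive data"]

untraced, untouched = validate_traceability(
    concern_to_activities, activity_to_components, activities
)
print(untraced)
print(untouched)
```

Here only "privacy" traces through to a component. "data quality" and "security" are reported as untraced, while "validate input" and "archive data" are reported as activities that no complete link touches.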