
Testing Big Data: three fundamental components

15 July 2014
The article by a1qa

Big Data is a big topic in software development today, and quality assurance consulting is no exception. When it comes to practice, software testers may not yet fully understand exactly what Big Data is. What they do know is that you need a plan for testing it.

The problem is the lack of a clear understanding of what to test and how deep a tester should go. There are key questions that must be answered before going down this path. Since most Big Data lacks a traditional structure, what does Big Data quality look like? And what are the most appropriate software testing tools?

As a software tester, it is imperative to start with a clear definition of Big Data. Many of us mistakenly believe that Big Data is just a large amount of information. For example, a 2-petabyte Oracle database alone doesn’t constitute a Big Data situation, just a high-load one. To be precise, Big Data is a set of approaches, tools and methods for processing high volumes of structured and, most importantly, unstructured data. The key difference between Big Data and “ordinary” high-load systems is the ability to create flexible queries.

The Big Data trend first appeared five years ago in the U.S., when researchers from Google announced a notable achievement in the scientific journal Nature: without any medical test results, they were able to track the spread of flu in the U.S. by analyzing the volume of Google search queries related to influenza-like illness.

Today, Big Data is commonly described by three “Vs”: Volume, Variety and Velocity. In other words, you have to process an enormous amount of data of various formats at high speed. The processing of Big Data, and therefore its software testing process, can be split into three basic components.

The process is illustrated below by an example based on the open source Apache Hadoop software framework:

  • Uploading the initial data to the Hadoop Distributed File System (HDFS).
  • Execution of Map-Reduce operations.
  • Rolling out the output results from the HDFS.

Uploading the initial data to HDFS

In this first step, the data is retrieved from various sources (social networks, web logs, etc.) and uploaded to the HDFS, where it is split into multiple files:

  • Verify that the required data was extracted from the original system and there was no data corruption.
  • Validate that the data files were uploaded to the HDFS correctly.
  • Check that the files are partitioned and replicated across different data nodes.
  • Determine the most complete set of data that needs to be checked. For step-by-step validation, you can use tools such as Datameer, Talend or Informatica.

Execution of Map-Reduce operations

In this step, you process the initial data using a Map-Reduce operation to obtain the desired result. Map-Reduce is a data processing concept for condensing large volumes of data into useful aggregated results:

  • Check the required business logic on a standalone node and then on a set of nodes.
  • Validate the Map-Reduce process to ensure that the “key-value” pair is generated correctly.
  • Check the aggregation and consolidation of data after performing the “reduce” operation.
  • Compare the output data with the initial files to make sure that the output file was generated and that its format meets all the requirements.
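The key-value checks above can be illustrated with a toy word count in pure Python. This is only a conceptual sketch of the Map-Reduce programming model, not Hadoop code; the sample log lines are invented.

```python
from collections import defaultdict

def map_phase(records):
    """"Map" step: emit a (key, value) pair for every word."""
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """"Reduce" step: aggregate and consolidate values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

logs = ["ERROR timeout", "error retry", "ok"]
counts = reduce_phase(map_phase(logs))
# A tester can now assert that every key-value pair was generated
# and aggregated correctly, e.g. counts["error"] == 2
```

The same two assertions a tester makes on a cluster job apply here in miniature: the map step must emit well-formed pairs, and the reduce step must aggregate them without losing or double-counting keys.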

The most appropriate language for data verification is Hive. Testers prepare queries in the Hive (SQL-style) Query Language (HQL) and send them to HBase to verify that the output complies with the requirements. HBase is a NoSQL database that can serve as the input and output for Map-Reduce jobs.
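Since HQL is SQL-like, the verification idea can be shown with Python’s built-in sqlite3 standing in for the Hive/HBase pair. The table name and figures below are made up for illustration:

```python
import sqlite3

# sqlite3 stands in for Hive/HBase here: run an aggregation query
# over the job's output table and compare the result with the
# figures stated in the requirements.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_output (region TEXT, clicks INTEGER)")
conn.executemany("INSERT INTO job_output VALUES (?, ?)",
                 [("us", 10), ("us", 5), ("eu", 7)])

actual = conn.execute(
    "SELECT region, SUM(clicks) FROM job_output "
    "GROUP BY region ORDER BY region").fetchall()
expected = [("eu", 7), ("us", 15)]  # aggregates from the requirements
assert actual == expected
```

In a real project the same `SELECT … GROUP BY` would be written in HQL and run against the actual output tables; only the engine differs.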

You can also use other Big Data processing frameworks as an alternative to Map-Reduce. Spark and Storm are good examples of substitutes for this programming model, as they provide similar functionality and are compatible with the Hadoop ecosystem.

Rolling out the output results from HDFS

This final step includes unloading the data generated by the second step and loading it into the downstream system, which may be a repository for data to generate reports or a transactional analysis system for further processing:

  • Inspect the data aggregation to make sure that the data has been loaded into the required system and was not distorted.
  • Validate that the reports include all the required data, and that all indicators refer to concrete measures and are displayed correctly.
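A minimal reconciliation helper for this step might look as follows. It is a sketch only; in practice the check would query the downstream warehouse or reporting database, and the period keys and figures here are invented.

```python
def reconcile(aggregated, report):
    """Compare the aggregated output with the rows that reached the
    downstream system; return every key that is missing or distorted."""
    issues = []
    for key, value in aggregated.items():
        if key not in report:
            issues.append((key, "missing from report"))
        elif report[key] != value:
            issues.append((key, "value mismatch"))
    return issues

aggregated = {"2014-06": 95, "2014-07": 120}  # output of step two
report = {"2014-06": 95, "2014-07": 119}      # rows loaded downstream
issues = reconcile(aggregated, report)
```

An empty `issues` list is the pass condition: every aggregate produced by the Map-Reduce step arrived downstream undistorted.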

Testing data in a Big Data project can be obtained in two ways: copying actual production data or creating data exclusively for testing purposes, the former being the preferred method for software testers. In this case, the conditions are as realistic as possible, which makes it easier to cover a larger number of test scenarios. However, not all companies are willing to provide real data, preferring to keep some information confidential. In that case, you must create the testing data yourself or request artificial data. The main drawback of this scenario is that artificial business scenarios built on limited data inevitably restrict testing; some defects can then only be detected by real users.
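When real data cannot be shared, one common approach is a seeded generator for artificial records. The field names and value ranges below are hypothetical; the fixed seed makes the generated data set reproducible across test runs.

```python
import random

def generate_test_records(n, seed=42):
    """Produce synthetic web-log-style records for testing when
    production data is confidential. A fixed seed keeps the data
    set identical on every run, so test results stay comparable."""
    rng = random.Random(seed)
    users = [f"user{i:03d}" for i in range(100)]
    actions = ["view", "click", "purchase"]
    for _ in range(n):
        yield {"user": rng.choice(users),
               "action": rng.choice(actions),
               "value": rng.randint(1, 500)}

rows = list(generate_test_records(1000))
```

Reproducibility is the main design choice here: because the generator is deterministic for a given seed, a failing test can be rerun against exactly the same artificial data.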

As speed is one of Big Data’s main characteristics, performance testing is mandatory. A huge volume of data and an infrastructure similar to the production one are usually created for performance testing. Furthermore, where acceptable, data is copied directly from production.

To determine the performance metrics and to detect errors, you can use, for instance, a Hadoop performance monitoring tool. Performance testing tracks fixed indicators such as operating time and capacity, alongside system-level metrics such as memory usage.
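On a single node, the same two kinds of metrics (operating time and memory) can be sampled with the Python standard library. This is only a local stand-in for cluster-level monitoring, useful for profiling an individual processing step during test development:

```python
import time
import tracemalloc

def profile_job(job, *args):
    """Run a processing step and report its operating time in seconds
    and the peak memory (bytes) allocated while it ran."""
    tracemalloc.start()
    start = time.perf_counter()
    result = job(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

result, seconds, peak_bytes = profile_job(sum, range(1_000_000))
```

Repeated runs of such a probe give baseline figures against which regressions in a job’s time or memory footprint can be asserted.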

To be successful, Big Data testers have to learn the components of the Big Data ecosystem from scratch. Since the market has not yet produced fully automated testing tools for Big Data validation, the tester has little choice but to acquire the same skill set as a Big Data developer in the context of leveraging technologies like Hadoop. This requires a tremendous mindset shift both for testers and for testing units within organizations. To be competitive, companies should invest in Big Data-specific training and in developing automation solutions for Big Data validation.

In conclusion, Big Data processing holds much promise for today’s businesses. If you apply the right test strategies and follow best practices, you will improve Big Data testing quality, which helps identify defects at early stages and reduces overall costs.

You can also read the article on Computer Technology Review.
