Big data applications

Richard J Self, Research Fellow – Big Data Lab, University of Derby, examines the role of software testing in the achievement of effective information and corporate governance.

As a reminder, software testing is about both verifying that software meets the specification and validating that the software system meets the business requirements. Most of the activity of software testing teams attempts to verify that the code meets the specification. A small amount of validation occurs during user acceptance testing, at which point it is normal to discover many issues where the system does not do what the user needs or wants.

It is only too clear that current approaches to software testing do not, so far, guarantee successful systems development and implementation.

IT project success

The Standish Group has been reporting annually on the success and failure of IT-related projects since the original CHAOS report of 1994, using major surveys of projects of all sizes. It uses three simple definitions of successful, challenged and failed projects, as follows:

Project successful:

The project is completed on time and on budget, with all features and functions as initially specified.

Project challenged:

The project is completed and operational but over‑budget, over the time estimate, and offers fewer features and functions than originally specified.

Project failed:

The project is cancelled at some point during the development cycle.

Due to significant disquiet amongst CIOs that the definition of success required delivering all of the contracted functionality in a globalised and rapidly changing world, the Standish Group changed the definition in 2013 to:

Project successful:

The project is completed on time and on budget, with a satisfactory result, which is, in some ways, a lower bar.

As the graph in Figure 1 shows, the levels of project success, challenge and failure have remained remarkably stable over time.

It is clear that, as an industry, IT is remarkably unsuccessful in delivering satisfactory products. Estimates of the resultant cost of challenged and failed projects range from approximately US$500 billion to US$6 trillion, which compares with an annual ICT spend of US$3 trillion in a world GDP of approximately US$65 trillion.

Clearly something needs to be done.

The list of system and software failures is too long to include here, but a few examples include the recent announcements by Yahoo of the loss of between 500 and 700 million sets of personal data in 2012 and 2014, the loss of 75 million sets of personal and financial data by Target in 2013, and the regular failures of operating system updates for iOS and Windows.

Common themes, verification and validation

Evaluating some of the primary causes of this long list of failures suggests some common themes, ranging from incomplete requirements capture, unit testing failures and volume test failures caused by undersized test environments and data sets, to inappropriate HCI factors and an inability to understand effectively what machine learning is doing.

Using the waterfall process as a way of understanding the fundamentals of what is happening, even in agile and DevOps approaches, we can see that software verification is happening close to the end of the process just before implementation.

As professionals we recognise that there is little effective verification and validation activity happening earlier in the process.

The fundamental question for systems developers is, therefore, whether there is any way that the skills and processes of software testing can be brought forward to earlier stages of the systems development cycle in order to more effectively ensure fully verified and validated requirements specifications, architectures and designs, software, data structures, interfaces, APIs etc.

Impact of big data

As we move into the world of big data and the internet of things, the problems become ever more complex and important. The three traditional Vs of big data (volume, velocity and variety) stress the infrastructure, make it difficult to keep data dictionaries consistent across the various silos of databases, and undermine the ability to guarantee valid and correct connections between corporate master data and data found in other databases and in social media.
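As a small, purely illustrative sketch of the kind of check that problem invites (the silo names, fields and types below are invented for the example, not taken from any real system), a first pass at data dictionary consistency might simply compare field names and declared types between two silos:

```python
# Invented data dictionaries for two hypothetical silos: field name -> declared type.
crm_dictionary = {"customer_id": "int", "email": "str", "postcode": "str"}
billing_dictionary = {"customer_id": "str", "email": "str", "country": "str"}

def compare_dictionaries(a, b):
    """Report fields missing from either silo and fields whose declared types clash."""
    missing_in_b = sorted(set(a) - set(b))
    missing_in_a = sorted(set(b) - set(a))
    type_clashes = sorted(f for f in set(a) & set(b) if a[f] != b[f])
    return missing_in_b, missing_in_a, type_clashes

print(compare_dictionaries(crm_dictionary, billing_dictionary))
# (['postcode'], ['country'], ['customer_id'])
```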

Improved project governance

If the IT industry is to become more successful, stronger information and project governance is required: governance that takes a holistic approach to the overall project, ensures a more effectively validated requirements specification, and delivers far more effectively verified and validated non-functional requirements, especially in the areas of security by design and the human-to-computer interfaces.

It is also vital to ensure that adequate contingencies are added to the project estimates. The 2001 Extreme Chaos report observed that, for many of the successful projects, the IT executives took the best estimates, multiplied them by 2 and added another 50%. This is in direct contrast to most modern projects, where the best and most informed estimates are reduced by some large percentage and a ‘challenging target’ is presented to the project team. Inevitably, the result is a challenged or failed project.
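As a worked illustration of that 2001 rule of thumb (the 100-day estimate is a hypothetical figure, not taken from the report), doubling and then adding 50% amounts to an overall contingency factor of three:

```python
best_estimate_days = 100            # hypothetical best estimate
doubled = best_estimate_days * 2    # "multiplied by 2"
budgeted = doubled * 1.5            # "added another 50%"
print(budgeted)                     # 300.0, i.e. three times the original estimate
```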

If we can achieve more effective project governance, with effective verification and validation of all aspects from the beginning of the project, the rewards are very large in terms of much more successful software that truly meets the needs of all the involved stakeholders.

12 Vs of project governance and big data

One effective approach is to develop a set of questions that can be asked of the various stakeholders, the requirements, the designs, the data, the technologies and the processing logic.

In the field of information security, ISO 27002 provides a very wide range of questions that can help an organisation of any size to identify the most important aspects that need to be addressed. By analogy, a set of 12 Vs has been developed at the University of Derby, posing 12 critical questions that can be applied both to big data and IoT projects and to more traditional projects as the ‘12 Vs of IT Project Governance’.

The 12 Vs are:

Volume (size).

Velocity (speed).

Variety (sources/format/type).

Variability (temporal).

Value (what/whom/when?).

Veracity (truth).

Validity (applicable).

Volatility (temporal).

Verbosity (text).

Vulnerability (security/reputation).

Verification (trust/accuracy).

Visualisation (presentation).

As an example, the Value question leads towards topics such as:

Is the project really business focused? What questions can the project answer, and will they really add value to the organisation? Who will get the benefit, and what is the benefit? Is it monetary? Is it usability? Is it tangible or intangible?

What is the value that can be found in the data? Is the data of good enough quality?

The Vulnerability question leads towards: is security designed into the system, or added as an afterthought? Both security breaches and incorrect processing can lead to significant reputational damage.

The Veracity question is developed from the observation by J Easton2 that 80% of all data is of uncertain veracity: we cannot be certain which data are correct or incorrect, nor by how much the incorrect data are wrong (the magnitude of the errors).

Data sourced from social media is of highly uncertain veracity: it is difficult to detect irony, and humans lie and change their likes and dislikes. Data from sensor networks suffer from random levels of sensor calibration drift over time, and smart device location services using assisted GPS have very variable levels of accuracy. A fundamental question that needs to be asked of all these data is: how can our ETL processes detect the anomalies? A second question is: to what extent do undetected errors affect the Value of the analyses and the decisions being made?
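As an illustration only (the plausible range, window size and three-sigma rule below are assumptions chosen for the example, not a prescription), an ETL step might flag sensor readings that fall outside a plausible physical range or drift too far from a rolling baseline:

```python
from statistics import mean, stdev

PLAUSIBLE_RANGE = (-40.0, 85.0)  # assumed physical limits, e.g. an outdoor temperature sensor
WINDOW = 20                      # rolling baseline window size (assumed)

def flag_anomalies(readings):
    """Yield (timestamp, value, reason) for suspect sensor readings."""
    history = []
    for timestamp, value in readings:
        if not PLAUSIBLE_RANGE[0] <= value <= PLAUSIBLE_RANGE[1]:
            yield timestamp, value, "outside plausible physical range"
        elif len(history) >= WINDOW:
            window = history[-WINDOW:]
            mu, sigma = mean(window), stdev(window)
            if sigma > 0 and abs(value - mu) > 3 * sigma:
                yield timestamp, value, "deviates more than 3 sigma from rolling baseline"
        history.append(value)
```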

Formal testing of BI and analytics

One further fundamental issue (identified by the attendees at The Software Testing Conference North 2016)3 was that the formal software testing teams are very rarely involved in any of the big data analytics projects. The data scientists, apparently, ‘do their own thing’ and the business makes many business-critical decisions based on their ‘untested’ work. In one reported case, the models developed by the data scientists produced different results depending on the order in which the data were presented, when the result should have been independent of the sequence.
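One way the testing profession could engage with such work is to treat order-independence as an explicitly testable property. The sketch below is illustrative only: `fit_and_score` stands for a hypothetical model-training interface that returns a single score, not any particular library's API.

```python
import random

def check_order_independence(fit_and_score, records, trials=5, tol=1e-9):
    """Refit the model on shuffled copies of the same records and flag any
    dependence of the score on presentation order."""
    baseline = fit_and_score(records)
    rng = random.Random(42)  # fixed seed so the check itself is repeatable
    for _ in range(trials):
        shuffled = list(records)
        rng.shuffle(shuffled)
        result = fit_and_score(shuffled)
        assert abs(result - baseline) <= tol, (
            f"score changed from {baseline} to {result} when only the data order changed")
```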

In conclusion, the fundamental challenge to the testing profession is to determine how its skills, knowledge, experience, processes and procedures can be applied earlier in the development lifecycle in order to deliver better validated and verified projects that qualify as ‘successful’ in Standish Group terms. Are there opportunities to ensure more comprehensive and correct requirements specifications?

This article is based on the presentation delivered on the 28th September 2016 at The Software Testing Conference North 2016. Video can be found here. 

This article first appeared in the November 2016 issue of TEST Magazine. Edited for web by Jordan Platt.

[Source: Software Testing News]

Google Outlines the Amazing Opportunities of Data in New Report

As we get caught up in the daily excitement of the latest trends, functionalities and changes in social media and digital marketing, it can be easy to overlook the wider, more significant advances being made possible by increased global connectivity and the pervasiveness of online data. Facebook recently released a report on social media and its impact on cultural trends, which outlined how the advent of social media has provided more access to information, which, in turn, has led to greater understanding and progress on many issues.

When viewed in a broader context like this, it becomes easier to see the impact that technology is having on our everyday lives, and this week, Google released a blog post to coincide with Earth Day (April 22nd) which examines how Google sources are being used to advance sustainability initiatives around the world.

TIGER CONSERVATION

The first example Google presents highlights how scientists at the University of Minnesota are using the Google Earth engine – “a multi-petabyte catalog of satellite imagery and geospatial datasets with planetary-scale analysis capabilities” – in their efforts to help restore tiger habitats in key regions.

To better focus their conservation efforts, the research team measured habitat loss in the world’s 76 tiger habitats over the past 14 years.

“They found that forest loss was much lower than anticipated across all tiger landscapes (roughly 8 million hectares, or less than 8 percent of the total habitat). Thanks to preservation of habitat in countries like Nepal and India, tiger populations in those countries have already increased 61 and 31 percent, respectively.”

One of the most amazing elements of this research is that it’s largely been conducted using satellite imaging information which is freely available via Google’s Earth Engine (though it does come with some use restrictions). For example, right now you can go to the Earth Engine website, enter the global location of your choice, and watch how that area has evolved over time via Landsat satellite imagery.
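For readers who prefer code to the website, similar imagery can be pulled programmatically through the Earth Engine Python API. The sketch below is illustrative only: it assumes an authenticated Earth Engine account, the coordinates are arbitrary, and the Landsat 8 collection ID used here is an assumption that may differ from the current catalogue naming.

```python
import ee  # Google Earth Engine Python API (earthengine-api package)

ee.Authenticate()  # one-time, browser-based sign-in for an enabled account
ee.Initialize()

# An arbitrary point of interest (longitude, latitude), roughly in the Nepal Terai.
point = ee.Geometry.Point([84.0, 27.5])

# Landsat 8 scenes over that point; the collection ID is an assumption.
collection = ee.ImageCollection('LANDSAT/LC08/C02/T1_L2').filterBounds(point)
early = collection.filterDate('2014-01-01', '2014-12-31').median()
late = collection.filterDate('2016-01-01', '2016-12-31').median()

# Differencing the two median composites is a crude programmatic analogue of
# watching the area change over time on the Earth Engine website.
change = late.subtract(early)
print(change.bandNames().getInfo())
```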

It’s fascinating to consider the ways in which such data insights can be used, particularly in the example provided, with regard to mapping deforestation and the impact it can have on native wildlife – and how such impacts can be mitigated in future.

SOLAR POWER

Another interesting Google research project is Project Sunroof, a “solar calculator that estimates the impact and potential savings of installing solar on the roof of your home”. Through the use of Google Earth imagery, overlaid with annual sun exposure and weather patterns, Project Sunroof aims to “assess viable roof space for solar panel installation, estimate the value of solar and savings based on local energy costs, and connect you with providers of solar panels in your area”.

It’s another great use of our ever-expanding data pool to make more informed decisions about important projects – in this case, energy consumption. Project Sunroof is now available in 42 U.S. states, with data available for more than 43 million rooftops, providing an indicator of the possible savings and benefits that could be gleaned from wider adoption of solar energy – customized to your own house and/or region.

MANAGING AIR POLLUTION

And the third example highlighted by Google is a project being spearheaded by Google Earth Outreach and the Environmental Defense Fund which looks at ways to map methane gas leaks from natural gas pipelines beneath our streets.

As detailed in the video, Google’s able to do this by fitting Google Street View cars, which are constantly traveling around and mapping our roads, with methane gas analyzers. This means that as the cars drive around capturing image content for Google Maps, they’re also measuring methane concentration every half-second as the car moves. With that data, the research team is then able to map both where methane leaks are and how big they are.
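As a rough sketch of how such readings might be turned into a leak map (the threshold, record format and clustering rule here are assumptions made for illustration, not a description of the actual pipeline), consecutive GPS-tagged readings above a background level can be grouped into candidate leaks:

```python
THRESHOLD_PPM = 2.5  # assumed; ambient methane is roughly 2 ppm

def find_leaks(readings):
    """Group consecutive elevated (latitude, longitude, methane_ppm) readings
    into candidate leaks and return (centre_lat, centre_lon, peak_ppm) for each."""
    leaks, cluster = [], []

    def close_cluster():
        lats, lons, ppms = zip(*cluster)
        leaks.append((sum(lats) / len(lats), sum(lons) / len(lons), max(ppms)))
        cluster.clear()

    for lat, lon, ppm in readings:
        if ppm > THRESHOLD_PPM:
            cluster.append((lat, lon, ppm))
        elif cluster:
            close_cluster()
    if cluster:  # a cluster that runs to the end of the drive
        close_cluster()
    return leaks
```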

Methane gas emissions can cause significant environmental impacts, and correcting them can provide a range of benefits. As per Google’s post:

“What we found ranges from an average of one leak per mile (in Boston) to one leak every 200 miles (in Indianapolis), demonstrating the effectiveness of techniques like using plastic piping instead of steel for pipeline construction. We hope utilities can use this data to prioritize the replacement of gas mains and service lines (like New Jersey’s PSE&G announced last fall). We’re also partnering with Aclima to measure many more pollutants with Street View cars in California communities through this year.”

Projects like these underline the expanded possibilities of technology and connectivity, which are providing new ways to live and work smarter through increased tracking of an ever-expanding range of measures. And this is the big thing with data: with 90% of the world’s data having been created within the last few years, we’ve not had the chance to fully explore what all this insight means. There’s simply too much to take in at once, too much to factor into your decision making to effectively utilize all these new inputs. But we are learning, and one of the key things we’re coming to realize is that the core of effective big data use lies in breaking it down into small data – working within the wider dataset to pinpoint the insights and information relevant to you and your needs, rather than being overwhelmed by everything.

Consider these insights alongside the other data sources available: Twitter data is being used to map earthquakes and flood damage, and Facebook insights are being used to glean a better understanding of who we are and what we’re interested in. When you match up all the various data points, the potential for insight is amazing – the opportunity for anyone to track and measure the trends and behaviors relevant to them, their business or their community means the capacity of such analysis is virtually endless. It all comes down to how you target your research and how capable you are of narrowing down the data relevant to your needs. Because it’s all there; you just need to know what you’re looking for.

And once you know that, the data opportunities from our hyper-connected world are beyond anything you can imagine.


[Source: Social Media Today]

Right to be Forgotten: Protection of privacy or breach of free data?

The data protection authority of France has fined Google €100,000 (approximately Rs. 74,64,700) for inadequate removal of history data and activities related to personal web searches. Under a May 2014 ruling by the European Court of Justice, individuals gained the power to ask search engine operators like Google and Microsoft to remove irrelevant and inappropriate information from web search results. This ruling gave rise to the ‘Right to be Forgotten’, a right whose status, as a special provision or as a fundamental human right, has been debated ever since.

In an issued statement, the Commission Nationale de l’Informatique et des Libertés (CNIL) stated that “the only way for Google to uphold the Europeans’ right to privacy was by delisting inaccurate results popping up under name searches across all its websites.” In counter-argument, Google stated that removing past data from the entirety of the Internet would restrict the free flow of information across the web. This may (read: will) have massive implications for information sourcing, which often plays a critical role as precedent across multiple cases. As a result, Google removed the data covered by specific requests from its local websites only, not from the international platform. For instance, if it were applicable in India, an Indian’s request to enforce his or her right to be forgotten would lead to the removal of the relevant URL only from Google.co.in, and not Google.com. This was done to preserve the natural course of events, i.e. to reflect the reality that an action done in the past cannot be undone.

The question of privacy looms large, as does the question of removing records of actions that may hold importance.

The CNIL, however, has disagreed with this approach. “Applying delisting to all of the extensions does not curtail freedom of expression insofar as it does not entail any deletion of content from the Internet,” the body stated. To address the European Union’s demands while keeping its principal mode of operation intact, Google opted for a partial removal of information, whereby a person will not see the data they requested to be removed when accessing the search engine from their own country. For instance, a French national will not see the link requested to be removed across all of Google’s sites when accessing them from within France. This was done to address the concerns of the nation in question while keeping international access to the data intact. “As a matter of principle, we disagree with the CNIL’s assertion that it has the authority to control the content that people can access outside France, and we plan to appeal their ruling,” a Google spokesman told Reuters.
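As a purely illustrative model of the logic being described (the function, data structure and URLs are hypothetical and do not represent Google’s implementation), geo-based delisting filters a result only when the searcher’s detected country matches the country of the delisting request:

```python
# Hypothetical delisting requests: URL -> country code of the person who requested removal.
DELISTED = {
    "http://example.com/old-article": "FR",
}

def filter_results(results, searcher_country):
    """Drop a result only when the search originates from the same country
    as the delisting request (geo-based delisting)."""
    return [url for url in results if DELISTED.get(url) != searcher_country]

results = ["http://example.com/old-article", "http://example.com/other"]
print(filter_results(results, "FR"))  # the delisted URL is hidden in France
print(filter_results(results, "US"))  # but still visible elsewhere
```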

The fine was imposed after the French data protection authority decided that the right to privacy of personal information cannot be adequately protected if it is confined to particular geographical locations, and that “only delisting on all of the search engine’s extensions, regardless of the extension used or the geographic origin of the person performing the search, can effectively uphold this right.” It will be interesting to see what course of action Google takes next in relation to the Right to be Forgotten.


Contested: Should the Right to be Forgotten be allowed easy enforcement?

More countries have been recognising the Right to be Forgotten, with the right being cited against Google in a Japanese lawsuit brought by a man who had been accused of involvement with child pornography. While the question of privacy, and of the amount of information in the hands of the search engine giants, is a pressing one that demands wider, concrete rulings (which, incidentally, are difficult to enforce), the availability of information on the Internet has also helped resolve many criminal and legal matters.

The path forward, it seems, will be broader than a single fine and isolated lawsuits.

[Source: Digit]