What kinds of organizations are most likely to need big data management and analytical tools?
Questions
- Explain in your own words what a Filter Bubble is. How can that lead to a ‘Web of One’?
- List at least 5 different AI systems, from the simplest to the most developed. Explain at least one business application for each one, and include in every explanation a challenge the system faces.
- Digital systems are becoming more and better connected as development progresses. Users and businesses increasingly have remote access to all kinds of data.
- List at least 3 challenges to privacy and best practices to mitigate the threats.
- List at least 3 challenges to security and best practices to prevent security breaches.
Case Study Big Data · Big Data – Big Rewards
Today’s companies are dealing with an avalanche of data from social media, search, and sensors as well as from traditional sources. In 2012, the amount of digital information generated is expected to reach 988 exabytes, which is equivalent to a stack of books from the sun to the planet Pluto and back. Making sense of “big data” has become one of the primary challenges for corporations of all shapes and sizes, but it also represents new opportunities. How are companies currently taking advantage of big data opportunities?
The British Library had to adapt to handle big data. Every year visitors to the British Library Web site perform over 6 billion searches, and the library is also responsible for preserving British Web sites that no longer exist but need to be preserved for historical purposes, such as the Web sites for past politicians. Traditional data management methods proved inadequate to archive millions of these Web pages, and legacy analytic tools couldn’t extract useful knowledge from such quantities of data. So, the British Library partnered with IBM to implement a big data solution to these challenges. IBM BigSheets is an insight engine that helps extract, annotate, and visually analyze vast amounts of unstructured Web data, delivering the results via a Web browser. For example, users can see search results in a pie chart. IBM BigSheets is built atop the Hadoop framework, so it can process large amounts of data quickly and efficiently.
State and federal law enforcement agencies are analyzing big data to discover hidden patterns in criminal activity such as correlations between time, opportunity, and organizations, or non-obvious relationships (see Chapter 4) between individuals and criminal organizations that would be difficult to uncover in smaller data sets. Criminals and criminal organizations are increasingly using the Internet to coordinate and perpetrate their crimes.
New tools allow agencies to analyze data from a wide array of sources and apply analytics to predict future crime patterns. This means that law enforcement can become more proactive in its efforts to fight crime and stop it before it occurs. In New York City, the Real Time Crime Center data warehouse contains millions of data points on city crime and criminals. IBM and the New York City Police Department (NYPD) worked together to create the warehouse, which contains data on over 120 million criminal complaints, 31 million national crime records, and 33 billion public records. The system’s search capabilities allow the NYPD to quickly obtain data from any of these data sources. Information on criminals, such as a suspect’s photo with details of past offenses or addresses with maps, can be visualized in seconds on a video wall or instantly relayed to officers at a crime scene.
Other organizations are using the data to go green, or, in the case of Vestas, to go even greener. Headquartered in Denmark, Vestas is the world’s largest wind energy company, with over 43,000 wind turbines across 66 countries. Location data are important to Vestas so that it can accurately place its turbines for optimal wind power generation. Areas without enough wind will not generate the necessary power, but areas with too much wind may damage the turbines. Vestas relies on location-based data to determine the best spots to install its turbines.
To gather data on prospective turbine locations, Vestas’ wind library combines data from global weather systems along with data from existing turbines. The company’s previous wind library provided information in a grid pattern, with each grid measuring 27 x 27 kilometers (17 x 17 miles). Vestas’ engineers were able to bring the resolution down to about 10 x 10 meters (32 x 32 feet) to establish the exact wind flow pattern at a particular location. To further increase the accuracy of its turbine placement models, Vestas needed to shrink the grid area even more, and this required 10 times as much data as the previous system and a more powerful data management platform.
The company implemented a solution consisting of IBM InfoSphere BigInsights software running on a high-performance IBM System x iDataPlex server. (InfoSphere BigInsights is a set of software tools for big data analysis and visualization and is powered by Apache Hadoop.) Using these technologies, Vestas increased the size of its wind library and is able to manage and analyze location and weather data with models that are much more powerful and precise.
Vestas’ wind library currently stores 2.8 petabytes of data and includes approximately 178 parameters, such as barometric pressure, humidity, wind direction, temperature, wind velocity, and other company historical data. Vestas plans to add global deforestation metrics, satellite images, geospatial data, and data on phases of the moon and tides. The company can now reduce the resolution of its wind data grids by nearly 90 percent, down to a 3 x 3 kilometer area (about 1.8 x 1.8 miles). This capability enables Vestas to forecast optimal turbine placement in 15 minutes instead of three weeks, saving a month of development time for a turbine site and enabling Vestas customers to achieve a return on investment much more quickly.
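The "nearly 90 percent" figure for the Vestas wind grid can be checked with simple arithmetic. The sketch below assumes the percentage refers to the edge length of a grid cell (27 km down to 3 km, both taken from the case text); the function names are illustrative and do not come from the case.

```python
# Grid cell edge lengths in kilometers, as stated in the case text.
OLD_EDGE_KM = 27.0
NEW_EDGE_KM = 3.0


def edge_reduction_percent(old_edge, new_edge):
    """Percentage by which the cell edge length shrinks."""
    return (old_edge - new_edge) / old_edge * 100


def cells_per_old_cell(old_edge, new_edge):
    """Number of new square cells needed to tile one old square cell."""
    return (old_edge / new_edge) ** 2


edge_cut = edge_reduction_percent(OLD_EDGE_KM, NEW_EDGE_KM)
cell_ratio = cells_per_old_cell(OLD_EDGE_KM, NEW_EDGE_KM)

print(f"Edge length reduced by {edge_cut:.1f}%")
print(f"Each old cell is covered by {cell_ratio:.0f} new cells")
```

Under this reading, the edge shrinks by about 89 percent, and each old 27 x 27 kilometer cell is replaced by 81 finer cells, which helps explain why a finer grid demands far more data and a more powerful data management platform.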
Companies are also using big data solutions to analyze consumer sentiment. For example, car-rental giant Hertz gathers data from Web surveys, e-mails, text messages, Web site traffic patterns, and data generated at all of Hertz’s 8,300 locations in 146 countries. The company now stores all of that data centrally instead of within each branch, reducing time spent processing data and improving company response time to customer feedback and changes in sentiment. For example, by analyzing data generated from multiple sources, Hertz was able to determine that delays were occurring for returns in Philadelphia during specific times of the day. After investigating this anomaly, the company was able to quickly adjust staffing levels at its Philadelphia office during those peak times, ensuring a manager was present to resolve any issues. This enhanced Hertz’s performance and increased customer satisfaction.
There are limits to using big data. Swimming in numbers doesn’t necessarily mean that the right information is being collected or that people will make smarter decisions. Last year, a McKinsey Global Institute report cautioned there is a shortage of specialists who can make sense of all the information being generated. Nevertheless, the trend towards big data shows no sign of slowing down; in fact, it’s much more likely that big data is only going to get bigger.
Sources: Samuel Greengard, “Big Data Unlocks Business Value,” Baseline, January 2012; Paul S. Barth, “Managing Big Data: What Every CIO Needs to Know,” CIO Insight, January 12, 2012; IBM Corporation, “Vestas: Turning Climate into Capital with Big Data,” 2011; IBM Corporation, “Extending and enhancing law enforcement capabilities,” “How Big Data Is Giving Hertz a Big Advantage,” and “British Library and J Start Team Up to Archive the Web,” 2010.
Case Study Questions:
- Describe the kinds of big data collected by the organizations described in this case.
- List and describe the business intelligence technologies described in this case.
- Why did the companies described in this case need to maintain and analyze big data? What business benefits did they obtain?
- Identify three decisions that were improved by using big data.
- What kinds of organizations are most likely to need big data management and analytical tools? Wh