
The Big Data Tech Inside the 2020 Census

Posted on Jan 10, 2018


The US Census Bureau is adopting the latest data processing technology to help with its upcoming 2020 Census, including the use of a large Hadoop cluster, real-time stream data processing, and advanced mapping and visualization products.

While the 2020 Census won’t be entirely paperless, it will be the first national census conducted predominantly by electronic means. The US Census Bureau says it will be the first census “with a full Internet option,” as well as the first to use electronic devices instead of paper to manage and conduct fieldwork.

The bureau is currently ramping up its technological prowess to ensure this massive digital transformation project goes smoothly. A key part of that investment is a contract it signed last year with Hortonworks to provide the underlying data management layer.

According to Shaun Bierweiler, vice president of the U.S. public sector business at Hortonworks, the deal will span all of the company’s offerings, including the Hortonworks Data Platform (HDP) Hadoop distribution and its Hortonworks DataFlow (HDF) stream processing system.

Scalability was a big reason Hortonworks won the contract, Bierweiler says. “When you think about the approximately 326 million Americans that the Census Bureau is going to collect and store data on,” he says, “you need a data platform that’s going to not just perform, but really operate at that industrial scale.”

HDP will form the main data lake (what the Census Bureau calls the Census Data Lake) that stores the lion’s share of the census data. It will also function as a staging ground for joining data from other databases, including data from other agencies that’s used to remotely identify vacant housing units without needing to send field personnel to inspect them. The cluster is expected to store both structured data (including names, addresses, and individuals’ answers to demographic questions) and unstructured data, such as pictures from Google Maps or aerial imagery.

The Census Bureau says it plans to use extensive aerial and street-level imagery during the 2020 Census, in both the front-end (data collection) and back-end (data analysis) stages of the project. On the front end, visualization will help streamline address identification, thereby minimizing the number of workers going door to door. And when workers do hit the streets, they’ll gather the data via mobile devices equipped with GIS-based navigation and routing, as explained in this informative article from GIS software provider Esri.

