Published: Sept. 4, 2018

Partnership with Zillow offers unprecedented ability to track where people have lived since 1810


Take a plot of land (maybe the one your house, your school or your business sits on) and imagine the way it’s changed over the last 200 years. From the very first settlers and their primitive structures to the modern homes and office complexes that likely exist there now, that one small piece of land has evolved with the times.

Two researchers at the University of Colorado Boulder are exploring human settlement and urbanization patterns in the United States between 1810 and 2015 using a groundbreaking new dataset from Zillow, the online real estate and rental marketplace. A paper describing the initial data products the researchers created using Zillow’s information was published today in the open-access online journal Scientific Data.

While the raw data themselves are proprietary to Zillow, the CU researchers are able to share the so-called data derivatives they have created with the research community and the public.

These and future data products will almost certainly serve as a launch pad for a slew of novel research projects related to natural hazards, land-use changes, ecology, demography, urban geography and more, said Stefan Leyk, a CU Boulder associate professor of geography who co-authored the paper, titled “HISDAC-US, historical settlement data compilation for the conterminous United States over 200 years,” with geography graduate student Johannes Uhl.

“This is unique data that simply never existed before in this dimension,” Leyk said. “We can go back more than 200 years across most of the United States to understand where people have settled at which point in time. That’s something we simply have never, ever seen before. But before we can start on any research projects, we have to actually write data products that we can use for research that will come in the near future.”

On somewhat of a whim, Leyk and Uhl reached out to Zillow roughly two years ago to see if the company would consider collaborating with them by sharing its massive cache of property data. Though they didn’t know exactly what the data looked like, Leyk and Uhl were intrigued and excited about what they might discover.

After working together on an agreement about how the information could be used and shared, Leyk and Uhl set to work sifting through Zillow’s Transaction and Assessment Dataset, or ZTRAX for short, which contained more than 374 million data records.

In essence, Zillow had been collecting property records from as many U.S. counties as possible, dating back to the earliest structure built on each parcel of land. Zillow created its database with information from a major third-party data provider and from an internal company initiative called County Direct, which is gathering data from assessor and recorder’s offices across the country.

Leyk had attempted a similar undertaking at one time but found it to be an extraordinarily time- and labor-intensive process. With more than 3,100 counties in the United States, Zillow’s dataset was “a tremendous effort,” Leyk said.

For its part, Zillow understands the value of partnering with academic researchers to help comb through and analyze its massive collection of information.

“Zillow has a huge treasure trove of really fascinating data, and there’s a lot of important research we can do with it,” said Sarah Mikhitarian, senior economist at Zillow. “Several members of our economic research team come from an academic background and are interested in the type of research that some of these other organizations are pursuing. We don’t always have the time and resources to do it, though, so it’s great to collaborate with outside researchers who do.”

After designing a data structure and extraction workflow, Leyk and Uhl sorted the data into 250-by-250-meter plots of land. They also sorted the data over time, looking at each plot every 5 years between 1810 and 2015.

With this information, they were able to sum up how much indoor building area accumulated on each plot in a given year, which indicates how intensely the land has been developed. The researchers also determined the year of the first settlement for each plot.
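For readers who want a concrete picture of that workflow, the following is a minimal sketch in Python, not the researchers' actual code. It assumes each property record has already been reduced to a projected location in meters, a construction year and an indoor building area; the record layout, the function names and the sample values are all hypothetical. It assigns records to 250-meter grid cells, then reports the earliest construction year per cell and the building area accumulated in each cell at five-year steps from 1810 to 2015.

```python
from collections import defaultdict

CELL_SIZE_M = 250                # grid cell edge length, as described in the article
YEARS = range(1810, 2016, 5)     # 1810, 1815, ..., 2015


def cell_index(x_m, y_m, cell_size=CELL_SIZE_M):
    """Map projected coordinates (meters) to a (column, row) grid cell."""
    return (int(x_m // cell_size), int(y_m // cell_size))


def summarize(records):
    """Aggregate records of (x_m, y_m, year_built, indoor_area_m2) per grid cell.

    Returns two dicts keyed by cell:
      first_built[cell]         -> earliest construction year in the cell
      building_area[cell][year] -> total indoor area built in or before `year`
    """
    by_cell = defaultdict(list)
    for x_m, y_m, year_built, area in records:
        by_cell[cell_index(x_m, y_m)].append((year_built, area))

    first_built = {
        cell: min(year for year, _ in items) for cell, items in by_cell.items()
    }
    building_area = {
        cell: {
            step: sum(area for year, area in items if year <= step)
            for step in YEARS
        }
        for cell, items in by_cell.items()
    }
    return first_built, building_area


# Hypothetical sample records: (x_m, y_m, year_built, indoor_area_m2)
records = [
    (100.0, 100.0, 1850, 120.0),
    (200.0, 240.0, 1902, 300.0),
    (260.0, 100.0, 1995, 80.0),
]
first_built, building_area = summarize(records)
```

In this sketch the first two sample records fall in the same cell, so that cell's accumulated area grows from 120 to 420 square meters once the 1905 time step is reached. The published data products may handle details such as missing construction years or rounding to time steps differently.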

“We really can understand, in incredible detail, how did we occupy the landscape? What are the potential impacts because we settled in certain regions? What happened to wetlands and hydrological systems?” Leyk said, noting that the data could prove useful for interdisciplinary research related to fire- and flood-risk modeling, for example.

The data products created by Leyk and Uhl are now part of the public domain and are accessible to other researchers through an open-source data repository.

The CU researchers funded this initial project with a seed grant from the CU Population Center. Now that they have arranged the Zillow data into useful formats, they plan to go after larger federal grants from the National Science Foundation or National Institutes of Health for further research.

Both Leyk and Uhl said they have been impressed with Zillow’s willingness to collaborate with academia and make this extremely valuable data accessible to the world.

“This project shows the benefits of collaboration between industry and research institutions,” Uhl said.