Much like open-source software, open data is already playing an important role in transit, and the opportunities for more impact in the world of mobility management are growing.
In this blog series, we’ll be exploring:
- What “open data” means
- What’s possible in our everyday lives because of open data
- What’s possible in transit because of open data
- How open data can help mobility managers
- What the future holds for mobility management and open data
Defining Open Data
The term open data is made up of two words that are so common and familiar that it’s tempting to rely on an intuitive understanding of what they mean when combined into a single concept. However, like most technical terms, open data has a fairly specific meaning. To appreciate the power of open data, we need to start from the same definition.
The Open Knowledge Foundation presents a comprehensive definition of “openness” as it applies to open data and open content, summarizing their definition this way: “Open data and content can be freely used, modified, and shared by anyone for any purpose.” If you’ve read about open-source software, this will sound familiar. That’s not a coincidence: The open-source movement inspired further work to apply the same underlying principles to data.
What Makes a Dataset Open Data?
Another way to capture the essence of the open data concept is to describe open data as both “technically open and legally open,” as presented in the California Open Data Handbook. The technical openness of data refers to how accessible it is; the legal openness of data refers to the license that gives users permission to use the data once they’ve accessed it.
Accessibility of Data
It’s possible for data to be available to the public while still not being easy to acquire and analyze at a large scale. To be fully in the spirit of openness, data should be:
- Machine readable, without a requirement to purchase any particular type of software. To the degree possible, data should be presented in standard formats which themselves are open. Proprietary formats, such as those used by older versions of Excel, are by definition not open. For more information on machine readability and examples of open formats, see this primer from Data.gov.
- Well documented. At its best, technically open data has metadata (“data about the data”) or other documentation that explains what it is, where it comes from, and how it’s maintained.
- Easily searchable. Especially for large datasets, the holder of the data should provide some way to find the desired subset of the data so that a consumer does not need to download and sort through terabytes of files.
- Readily accessible. Users can access technically open data without paying money or logging in with a credential or account.
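To make the idea of machine readability concrete, here is a minimal sketch in Python that reads a small CSV (an open, plain-text format) using only the language's standard library, with no purchased software involved. The sample data, column names, and function names are invented for illustration; a real open dataset would have its own layout, ideally described in its documentation.

```python
import csv
import io

# Hypothetical sample data, standing in for a CSV file downloaded
# from an open data portal. The columns are assumptions for this sketch.
SAMPLE_CSV = """route_id,route_name,avg_weekday_boardings
10,Main Street,1250
22,Airport Express,830
31,University Loop,2140
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def busiest_route(rows):
    """Return the row with the highest average weekday boardings."""
    return max(rows, key=lambda row: int(row["avg_weekday_boardings"]))


rows = load_rows(SAMPLE_CSV)
top = busiest_route(rows)
print(top["route_name"], top["avg_weekday_boardings"])
```

Because the format is open and structured, a few lines of code are enough to load the data and start asking questions of it.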
Permission to Use Data
Legally open data has a license that clearly states that the data can be used with no restrictions, for commercial and non-commercial reasons.
It might seem like openness of data is a binary issue: either data is open or it’s not. In practice, it’s more like a window or a door, which can be ajar a little bit, open as far as it can possibly be opened, or anywhere in between.
Sir Tim Berners-Lee, who invented the World Wide Web, created a five-star system to mark specific points along a continuum of openness. To be considered “open” by his definition, data must earn at least 3 stars.
★ Online: Data is discoverable and available online, in any format
★★ Standardized: Online and accessible in a structured data format
★★★ Open Format: Online, standardized, and freely usable, in a nonproprietary format
★★★★ Universal Reference: Online, standardized, freely usable, and is anchored with a Uniform Resource Identifier (for example, a web address)
★★★★★ Contextualized: Online, standardized, freely usable, universal reference and linked to other datasets to provide context
These examples provide a good overview of what’s possible with data at each level of the five-star system.
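The cumulative logic of the star system can also be sketched in code. In the Python sketch below, each star is earned only if every lower level is also satisfied, and a dataset counts as “open” at three stars or more, as described above. The `Dataset` class and its field names are hypothetical, chosen only to mirror the five levels; they are not part of any official tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustrative fields)."""
    online: bool = False        # 1 star: available online in any format
    structured: bool = False    # 2 stars: structured data format
    open_format: bool = False   # 3 stars: nonproprietary, freely usable format
    has_uri: bool = False       # 4 stars: anchored with a URI
    linked: bool = False        # 5 stars: linked to other datasets


def star_rating(ds):
    """Count stars cumulatively: stop at the first level that is not met."""
    checks = [ds.online, ds.structured, ds.open_format, ds.has_uri, ds.linked]
    stars = 0
    for passed in checks:
        if not passed:
            break
        stars += 1
    return stars


def is_open(ds):
    """Apply the three-star threshold for openness."""
    return star_rating(ds) >= 3


pdf_report = Dataset(online=True)
old_spreadsheet = Dataset(online=True, structured=True)
csv_file = Dataset(online=True, structured=True, open_format=True)

for name, ds in [("PDF report", pdf_report),
                 ("older Excel workbook", old_spreadsheet),
                 ("CSV file", csv_file)]:
    print(name, star_rating(ds), is_open(ds))
```

Under these assumptions, a PDF posted online earns one star, a spreadsheet in a proprietary format earns two, and the same data published as CSV reaches the three-star threshold.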
Is Public Data the Same Thing as Open Data?
Data produced by government entities is quite often public data, but the fact that data is shared with the public does not mean that it is open data.
For example, posting a document in PDF format on a government website makes it publicly available. However, it’s not technically open because a computer often can’t reliably read the contents of a PDF. You can consult the document and draw conclusions from it, but if you want to take data from the PDF, reorganize it, analyze it, or otherwise manipulate it, you may have to manually extract the data or turn to sophisticated “scraping” software to convert it into something more accessible.
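To give a rough sense of what that extraction work looks like, the Python sketch below starts from text as it might appear after being pulled out of a PDF, with a page header, dotted leaders, and a footnote mixed in with the numbers. The sample text and the pattern are invented for illustration; real documents vary widely, which is exactly why this approach is fragile.

```python
import re

# Hypothetical text as it might come out of a PDF after extraction.
EXTRACTED_TEXT = """Ridership Report          Page 3
Route 10  Main Street ........ 1,250
Route 22  Airport Express .... 830
Note: figures are weekday averages
"""

# Pattern written by hand for this one layout: "Route <id> <name> .... <number>"
ROW_PATTERN = re.compile(r"^Route\s+(\d+)\s+(.+?)\s*\.{2,}\s*([\d,]+)\s*$")


def scrape_rows(text):
    """Keep only lines that match the expected layout and convert them to records."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match:
            route_id, name, boardings = match.groups()
            rows.append({
                "route_id": route_id,
                "route_name": name,
                # Strip thousands separators before converting to a number.
                "boardings": int(boardings.replace(",", "")),
            })
    return rows


for row in scrape_rows(EXTRACTED_TEXT):
    print(row)
```

The pattern works only as long as the document's layout never changes. Data published in a structured, open format would not need this step at all.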
Data doesn’t become open automatically when files are uploaded to a website. Making data fully open requires a thoughtful effort that considers the particulars of the data itself, its potential consumers, and the barriers those consumers may encounter as they work to transform it into meaningful information. When such efforts are successful, the potential social gains become apparent.
Stay tuned for Part 2 of this series, where we’ll explore the ways open data works today in our everyday lives.