Open Skills Project

The Open Skills Project provides a dynamic, up-to-date, locally relevant, and normalized taxonomy of skills and jobs that builds and expands on the Department of Labor’s O*NET data resources. This taxonomy is aimed at two groups:

Collecting Data

Information used to create the Open Skills API comes from a variety of sources:

- Public and private providers of job listings
- O*NET jobs and skills taxonomy (https://www.onetonline.org/)

The job listings data sources are converted into a common job listing format, based on the schema.org Job Posting Schema (https://schema.org/JobPosting), and saved as JSON into an S3 folder according to the quarter(s) in which they are active.
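
As a rough illustration of this step, the sketch below maps a provider record onto a few schema.org JobPosting properties (`title`, `description`, `datePosted`, `validThrough`) and works out one storage key per quarter in which the posting is active. The raw field names (`job_title`, `body`, `posted`, `expires`), the `job_postings` prefix, the `2016Q1`-style quarter labels, and the key layout are all assumptions made for the example, not the project's actual conventions. The upload to S3 itself is left out.

```python
import json
from datetime import date


def quarters_active(start: date, end: date) -> list[str]:
    """Return labels such as '2016Q1' for every quarter overlapping [start, end]."""
    if end < start:
        raise ValueError("end date precedes start date")
    labels = []
    year, quarter = start.year, (start.month - 1) // 3 + 1
    end_year, end_quarter = end.year, (end.month - 1) // 3 + 1
    while (year, quarter) <= (end_year, end_quarter):
        labels.append(f"{year}Q{quarter}")
        quarter += 1
        if quarter > 4:
            year, quarter = year + 1, 1
    return labels


def to_common_format(raw: dict) -> dict:
    """Map a hypothetical provider record onto schema.org JobPosting properties."""
    return {
        "@context": "http://schema.org",
        "@type": "JobPosting",
        "title": raw["job_title"],
        "description": raw["body"],
        "datePosted": raw["posted"],
        "validThrough": raw["expires"],
    }


def storage_keys(posting: dict, posting_id: str, prefix: str = "job_postings") -> list[str]:
    """Build one (hypothetical) object key per quarter in which the posting is active."""
    start = date.fromisoformat(posting["datePosted"])
    end = date.fromisoformat(posting["validThrough"])
    return [f"{prefix}/{label}/{posting_id}.json" for label in quarters_active(start, end)]


# Made-up provider record, for illustration only.
raw = {
    "job_title": "Registered Nurse",
    "body": "Example description",
    "posted": "2016-03-15",
    "expires": "2016-07-01",
}
posting = to_common_format(raw)
keys = storage_keys(posting, "abc123")
payload = json.dumps(posting)  # the JSON body that would be written under each key
```

A posting that spans several quarters yields one key per quarter, so the same JSON document would be stored once in each quarter's folder.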

Processing

O*NET taxonomy data is transformed into master tables of jobs and skills, along with the associations between them. Job posting titles are cleaned and aggregated into geographical counts. The titles and descriptions are indexed into Elasticsearch to implement a rudimentary job title normalizer.
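
The cleaning and counting step might look roughly like the following. The cleaning rules (lowercasing, dropping anything that is not a letter, collapsing whitespace) and the use of a free-text location string as the geography are assumptions for the sake of the example. The project's actual rules and geographic units are not described here, and the Elasticsearch indexing is not shown.

```python
import re
from collections import Counter


def clean_title(title: str) -> str:
    """Lowercase a title, replace non-letters with spaces, and collapse whitespace."""
    lowered = title.lower()
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    return " ".join(letters_only.split())


def count_by_geography(postings):
    """Count cleaned titles per geography from (geography, raw_title) pairs."""
    counts = Counter()
    for geography, raw_title in postings:
        counts[(geography, clean_title(raw_title))] += 1
    return counts


# Made-up postings, for illustration only.
sample = [
    ("Chicago, IL", "Sr. Software Engineer!!"),
    ("Chicago, IL", "sr software engineer"),
    ("Houston, TX", "Registered Nurse (RN)"),
]
counts = count_by_geography(sample)
```

Because both Chicago titles clean to the same string, they fall into a single bucket with a count of two.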

Output

A tabular version of each processed data set is uploaded to a publicly accessible S3 bucket for use by researchers. The processed data is also loaded into a relational database, which the Open Skills API queries to retrieve data in response to user requests.
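
To make the relational layout concrete, here is a minimal sketch using an in-memory SQLite database: a jobs table, a skills table, and an association table linking them, plus the kind of lookup an API endpoint might run. The table and column names, the identifiers, and the sample rows are all hypothetical. The project's real schema and database engine are not specified in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: master tables plus a job-to-skill association table.
conn.executescript(
    """
    CREATE TABLE jobs (id TEXT PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE skills (id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE jobs_skills (
        job_id TEXT NOT NULL REFERENCES jobs(id),
        skill_id TEXT NOT NULL REFERENCES skills(id),
        PRIMARY KEY (job_id, skill_id)
    );
    """
)

# Placeholder rows, for illustration only.
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [("job-1", "Example Job A"), ("job-2", "Example Job B")],
)
conn.executemany(
    "INSERT INTO skills VALUES (?, ?)",
    [("skill-1", "Example Skill X"), ("skill-2", "Example Skill Y")],
)
conn.executemany(
    "INSERT INTO jobs_skills VALUES (?, ?)",
    [("job-1", "skill-1"), ("job-1", "skill-2"), ("job-2", "skill-2")],
)
conn.commit()


def skills_for_job(connection: sqlite3.Connection, job_id: str) -> list[str]:
    """Return the names of skills associated with a job, sorted alphabetically."""
    rows = connection.execute(
        """
        SELECT s.name
        FROM skills AS s
        JOIN jobs_skills AS js ON js.skill_id = s.id
        WHERE js.job_id = ?
        ORDER BY s.name
        """,
        (job_id,),
    ).fetchall()
    return [name for (name,) in rows]


job_one_skills = skills_for_job(conn, "job-1")
```

Keeping the associations in their own table lets the same lookup run in either direction, from a job to its skills or from a skill to the jobs that use it.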

Code

To produce this output, a variety of extraction and processing tasks are used across four different code repositories.