Data Factory is an open framework and toolkit for creating data flows to collect, inspect, process and publish data. You can fully automate your data collection, processing, and publication with the Data Factory.
No proprietary tooling and no vendor lock-in
Easily extensible for adding custom functionality
All outputs and artifacts are standards-based and fully portable
We have created an open ecosystem of tools and specifications for powerful and frictionless data processing flows.
01 Loading data from various sources and file types
02 Normalizing, cleaning and tidying the data - making it documented and portable
03 Transforming the data - changing the structure and/or the contents of the data, combining with other datasets etc.
04 Making sure that the data is correct, valid and adheres to your own verification rules
05 Storing the processed data in any file or data storage system
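The five steps above can be pictured as a chain of small streaming stages. The sketch below is only an illustration in plain Python using the standard library. It is not the DataFlows API, and the sample data, function names and the "recent" field are all hypothetical.

```python
import csv
import io
import json

# Hypothetical raw input with untidy headers and cell whitespace.
RAW_CSV = """name , year
 Alice ,2019
Bob,2021
"""


def load(text):
    """Step 01: load rows from a CSV source."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        yield row


def normalize(rows):
    """Step 02: tidy header names and strip whitespace from cells."""
    for row in rows:
        yield {key.strip(): value.strip() for key, value in row.items()}


def transform(rows):
    """Step 03: change the contents (cast year to int, add a derived field)."""
    for row in rows:
        year = int(row["year"])
        yield {"name": row["name"], "year": year, "recent": year >= 2020}


def validate(rows):
    """Step 04: enforce a custom verification rule (names must not be empty)."""
    for row in rows:
        if not row["name"]:
            raise ValueError(f"row has an empty name: {row!r}")
        yield row


def store(rows):
    """Step 05: serialize the processed rows, here as a JSON string."""
    return json.dumps(list(rows), indent=2)


# Each stage is a generator, so rows stream through one at a time.
output = store(validate(transform(normalize(load(RAW_CSV)))))
records = json.loads(output)
```

Because every stage consumes and yields rows lazily, a stage can be added, removed or swapped without touching the others, which is the general idea behind composing a data flow from independent steps.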
A professionally selected collection of the best tools and practices
An end-to-end solution
All parts are fully integrated
Backed by a team of professionals with years of experience in similar projects
The Data Package Standard - a mature and field-tested container for any sort of data.
The Frictionless Data Toolkit - a rich library of integrations and adapters to work with data packages nearly everywhere.
The DataFlows Framework - a powerful engine for creating and stream-processing data packages.
GoodTables - a thorough validation tool to make sure your data is always in good shape and form.
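To give a rough sense of what the Data Package container mentioned above looks like, here is a minimal sketch that builds a descriptor (conventionally saved as datapackage.json) as a Python dictionary. The package name, resource name, file path and fields are hypothetical, and the layout reflects a basic reading of the Data Package and Table Schema specifications, so consult the specifications for the authoritative set of properties.

```python
import json

# Hypothetical descriptor: one tabular resource with a two-field schema.
descriptor = {
    "name": "example-package",
    "resources": [
        {
            "name": "people",
            "path": "people.csv",
            "schema": {
                "fields": [
                    {"name": "name", "type": "string"},
                    {"name": "year", "type": "integer"},
                ]
            },
        }
    ],
}

# Serialize to the JSON text that would be written to datapackage.json.
descriptor_json = json.dumps(descriptor, indent=2)
```

Keeping the schema next to the data is what makes the package self-describing: a validation tool can check each row against the declared field types without extra configuration.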