In this article we explain how easy it is to add a datapackage.json file to your data. You need to have the data tool installed: download it and follow these instructions.
If you’re not familiar with datapackage.json, please read this article: https://datahub.io/docs/data-packages.
Below is what our project looks like initially:
$ ls
README.md sample.csv sample.json
We will use the data init command to create a datapackage.json file for this project.
By default, the data init command runs in non-interactive mode. No arguments or options are required; it scans the current working directory and all nested directories for available files:
$ data init
> This process initializes a new datapackage.json file.
> Once there is a datapackage.json file, you can still run ‘data init’ to update/extend it.
> Press ^C at any time to quit.
> Detected special file: README.md
> sample.csv is just added to resources
> sample.json is just added to resources
> Default “ODC-PDDL” license is added. If you would like to add a different license, run ‘data init -i’ or edit ‘datapackage.json’ manually.
> 💾 Descriptor is saved in “datapackage.json”
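If you want to preview which files a recursive scan of the project would come across before running the command, a short script can list them. The following is only an illustrative Python sketch, not how data init is implemented: the helper name list_candidate_files is hypothetical, and skipping hidden files and directories is an assumption made here rather than documented behaviour of the tool.

```python
from pathlib import Path


def list_candidate_files(root="."):
    """Return relative paths of all regular files under root, sorted.

    Hidden files and directories (names starting with a dot) are skipped.
    That filter is an assumption of this sketch, not documented behaviour
    of `data init`.
    """
    root_path = Path(root)
    found = []
    for path in root_path.rglob("*"):
        relative = path.relative_to(root_path)
        # Skip anything located inside (or named as) a hidden entry.
        if any(part.startswith(".") for part in relative.parts):
            continue
        if path.is_file():
            found.append(relative.as_posix())
    return sorted(found)


if __name__ == "__main__":
    # Print every file found under the current working directory.
    for name in list_candidate_files("."):
        print(name)
```

Run from the project directory, this prints one relative path per line, including files in nested directories.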
Now the project contains datapackage.json:
$ ls
README.md datapackage.json sample.csv sample.json
If you take a look at datapackage.json, you’ll notice that data init:
- sets the name property and generates title from it
- adds the sample.csv and sample.json files to the resources list, with a schema for tabular data
- detects README.md and uses its content in the readme property; the description property is the first 100 characters of the readme
- adds the default ODC-PDDL license
If you need more control, e.g., you want to add only certain files, scan certain directories, or add a different license, you can use the init command in interactive mode:
$ data init -i
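Whichever mode you use, it can help to look over the resulting descriptor before publishing. Below is a minimal Python sketch that pulls out the properties discussed above. It assumes the descriptor follows the usual Data Package layout (a resources list whose entries have path or name, and a licenses list whose entries have name); the exact keys written by data init are not shown in this article, so treat them as assumptions. The function name summarize_descriptor is hypothetical.

```python
import json
from pathlib import Path


def summarize_descriptor(descriptor):
    """Collect a few top-level properties from a descriptor dict.

    Missing keys are tolerated: absent values come back as None or
    as empty lists, so the function never raises on a sparse descriptor.
    """
    return {
        "name": descriptor.get("name"),
        "title": descriptor.get("title"),
        "description": descriptor.get("description"),
        # Prefer the file path of each resource, fall back to its name.
        "resources": [
            res.get("path") or res.get("name")
            for res in descriptor.get("resources", [])
        ],
        # Assumed layout: a list of license objects with a "name" key.
        "licenses": [lic.get("name") for lic in descriptor.get("licenses", [])],
        "has_readme": bool(descriptor.get("readme")),
    }


if __name__ == "__main__":
    path = Path("datapackage.json")
    if path.exists():
        descriptor = json.loads(path.read_text(encoding="utf-8"))
        for key, value in summarize_descriptor(descriptor).items():
            print(f"{key}: {value}")
    else:
        print("No datapackage.json found in the current directory.")
```

If a property you expected comes back empty, you can rerun data init or edit datapackage.json manually.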
You can now deploy your dataset to DataHub:
$ data push
Want to learn more? Visit our docs page - https://datahub.io/docs
If you have questions, comments, or feedback, join our chat channel or open an issue on our tracker.