datalad

Using datalad to publish and access open data on OSF

This tutorial was created by Steffen Bollmannn.

Github: @stebo85

DataLad is an open-source tool to publish and access open datasets. In addition to many open data sources (OpenNeuro, CBRAIN, brainlife.io, CONP, DANDI, Courtois Neuromod, Dataverse, Neurobagel), it can also connect to the Open Science Framework (OSF): http://osf.io/

Publish a dataset

First we have to create a DataLad dataset:

datalad create my_dataset

# now add files to your project and then add save the files with datalad
datalad save -m "added new files"

Now we can create a token on OSF (Account Settings -> Personal access tokens -> Create token) and authenticate:

datalad osf-credentials

Here is an example how to publish a dataset on the OSF:


# create sibling
datalad create-sibling-osf --title best-study-ever -s osf

# push
datalad push --to osf

The last steps creates a DataLad dataset, which is not easily human readable.

If you would like to create a human-readable dataset (but without the option of downloading it as a datalad dataset later on):


# create sibling
datalad create-sibling-osf --title best-study-ever-human-readable --mode exportonly -s osf-export

git-annex export HEAD --to osf-export-storage

Access a dataset

To download a dataset from the OSF (if it was uploaded as a DataLad dataset before):

datalad clone osf://ehnwz

cd ehnwz

# now get the files you want to download:
datalad get .
Last modified April 26, 2024: update actions/checkout (5bd6058)