#  PyBIDS
## A Python API for working with BIDS datasets

Author: Monika Doerig

Citation:

**Data from OpenNeuro**
- Gorgolewski KJ and Storkey A and Bastin ME and Whittle IR and Wardlaw JM and Pernet CR (2022). A test-retest fMRI dataset for motor, language and spatial attention functions. OpenNeuro. [Dataset] doi: [10.18112/openneuro.ds000114.v1.0.2](https://doi.org/10.18112/openneuro.ds000114.v1.0.2)


**PyBIDS**:

- Yarkoni et al., (2019). PyBIDS: Python tools for BIDS datasets. Journal of Open Source Software, 4(40), 1294, [https://doi.org/10.21105/joss.01294](https://doi.org/10.21105/joss.01294)

- Yarkoni, T., Markiewicz, C. J., de la Vega, A., Gorgolewski, K. J., Salo, T., Gau, R., Halchenko, Y. O., Papadopoulos Orfanos, D., Esteban, O., McNamara, Q., DeStasio, K., Poline, J.-B., Johnson, H., Kalenkovich, E., Petrov, D., Nielson, D. M., James Kent, Kent, J. D., Appelhoff, S., … pierre-nedelec. (2024). PyBIDS: Python tools for BIDS datasets (0.18.1). Zenodo. [https://doi.org/10.5281/zenodo.14285569](https://doi.org/10.5281/zenodo.14285569)
- This example is heavily inspired by the [PyBIDS documentation](https://bids-standard.github.io/pybids/index.html)

## Output CPU information

In [2]:
!cat /proc/cpuinfo | grep 'vendor' | uniq
!cat /proc/cpuinfo | grep 'model name' | uniq

vendor_id	: GenuineIntel
model name	: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz


## Installation

PyBIDS simplifies the process of querying, summarizing, and managing data for neuroimaging researchers using the BIDS standard. Several Python packages for neuroimaging analysis, such as Nipype and Nilearn, are designed to integrate seamlessly with BIDS-formatted datasets.

In [3]:
%%capture
! pip install pybids

## Data download

In [4]:
PATTERN = "sub-0[1-5]"
! datalad install https://github.com/OpenNeuroDatasets/ds000114.git 
! cd ds000114 && datalad get $PATTERN

Cloning:   0%|                             | 0.00/2.00 [00:00<?, ? candidates/s]
Enumerating: 0.00 Objects [00:00, ? Objects/s]
Counting:   0%|                                | 0.00/820 [00:00<?, ? Objects/s]
Compressing:   0%|                             | 0.00/616 [00:00<?, ? Objects/s]
Receiving:   0%|                             | 0.00/2.17k [00:00<?, ? Objects/s]
Resolving:   0%|                                | 0.00/147 [00:00<?, ? Deltas/s]
[INFO   ] scanning for unlocked files (this may take some time)
[INFO   ] Remote origin not usable by git-annex; setting annex-ignore 
[INFO   ] access to 1 dataset sibling s3-PRIVATE not auto-enabled, enable wi

In [5]:
!tree -L 4 ds000114

[01;34mds000114[0m
├── [00mCHANGES[0m
├── [00mdataset_description.json[0m
├── [00mdwi.bval[0m
├── [00mdwi.bvec[0m
├── [00mparticipants.tsv[0m
├── [01;34msub-01[0m
│   ├── [01;34mses-retest[0m
│   │   ├── [01;34manat[0m
│   │   │   └── [01;36msub-01_ses-retest_T1w.nii.gz[0m -> [01;31m../../../.git/annex/objects/xm/25/MD5E-s8503839--3b3b49b2396b59ddd5a73b7f596f9e46.nii.gz/MD5E-s8503839--3b3b49b2396b59ddd5a73b7f596f9e46.nii.gz[0m
│   │   ├── [01;34mdwi[0m
│   │   │   └── [01;36msub-01_ses-retest_dwi.nii.gz[0m -> [01;31m../../../.git/annex/objects/0K/16/MD5E-s99899518--5ebac8e9e23180638dd68dde10b818be.nii.gz/MD5E-s99899518--5ebac8e9e23180638dd68dde10b818be.nii.gz[0m
│   │   └── [01;34mfunc[0m
│   │       ├── [01;36msub-01_ses-retest_task-covertverbgeneration_bold.nii.gz[0m -> [01;31m../../../.git/annex/objects/3q/Qf/MD5E-s22317848--b30f5b2f7a6039a3e384bcb40bec7e55.nii.gz/MD5E-s22317848--b30f5b2f7a6039a3e384bcb40bec7e55.nii.gz[0m
│   │       ├── [01;36msub

## Querying BIDS datasets

### Loading BIDS datasets
The BIDSLayout instance is a lightweight container for all of the files in the BIDS project directory. It automatically detects any BIDS entities found in the file paths, and allows us to perform simple but relatively powerful queries over the file tree. By default, defined BIDS entities include things like “subject”, “session”, “run”, and “suffix”.

In [6]:
from bids.layout import BIDSLayout
layout = BIDSLayout("ds000114")

### Querying the ```BIDSLayout``` using ```get```

When a BIDSLayout is initialized, it scans and indexes all files and metadata within the specified root directory.  Once the indexing is complete, you can start exploring the dataset through different types of queries. The main method for this is ```.get()```. If you call ```.get()``` without any arguments, it simply returns a list of all BIDS files in the dataset:

In [7]:
all_files = layout.get()
print("There are {} files in the layout.".format(len(all_files)))
print("\nThe first 3 files are:")
all_files[:3]

There are 174 files in the layout.

The first 3 files are:


[<BIDSFile filename='/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/CHANGES'>,
 <BIDSJSONFile filename='/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/dataset_description.json'>,
 <BIDSFile filename='/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/dwi.bval'>]

Each element of this Python list is a ```BIDSFile``` object. If you only need the file names, you can simplify the call with ```return_type='filename'```:

In [8]:
layout.get(return_type='filename')[:3]

['/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/CHANGES',
 '/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/dataset_description.json',
 '/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/dwi.bval']

__Common BIDS Entities:__

The ```.get()``` method supports various arguments that let us narrow down the results based on specific criteria. Any BIDS-defined keyword (referred to as an entity in PyBIDS) can be used as a filter. Here are the most common ones:

```suffix```: The part of a BIDS filename just before the extension (e.g., 'bold', 'events', 'physio', etc.).

```subject```: The subject label

```session```: The session label

```run```: The run index

```task```: The task name

In [9]:
layout.get_entities()

{'subject': <Entity subject (pattern=[/\\]+sub-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'session': <Entity session (pattern=[_/\\]+ses-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'sample': <Entity sample (pattern=[_/\\]+sample-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'task': <Entity task (pattern=[_/\\]+task-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'tracksys': <Entity tracksys (pattern=[_/\\]+tracksys-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'acquisition': <Entity acquisition (pattern=[_/\\]+acq-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'nucleus': <Entity nucleus (pattern=[_/\\]+nuc-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'volume': <Entity volume (pattern=[_/\\]+voi-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'ceagent': <Entity ceagent (pattern=[_/\\]+ce-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'staining': <Entity staining (pattern=[_/\\]+stain-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'tracer': <Entity tracer (pattern=[_/\\]+trc-([a-zA-Z0-9]+), dtype=<class 'str'>)>,
 'reconstruction': <Entity reconst
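
The patterns printed above are ordinary regular expressions, so their effect can be tried outside PyBIDS. The following is a minimal sketch, not PyBIDS' own indexing code: it copies three of the patterns from the output above into a hypothetical `ENTITY_PATTERNS` dictionary and applies them with Python's `re` module through a hypothetical helper, `parse_entities`.

```python
import re

# Three patterns copied from the layout.get_entities() output above.
# ENTITY_PATTERNS and parse_entities are names made up for this sketch.
ENTITY_PATTERNS = {
    "subject": r"[/\\]+sub-([a-zA-Z0-9]+)",
    "session": r"[_/\\]+ses-([a-zA-Z0-9]+)",
    "task": r"[_/\\]+task-([a-zA-Z0-9]+)",
}


def parse_entities(path):
    """Return the entity labels that the patterns find in a file path."""
    found = {}
    for name, pattern in ENTITY_PATTERNS.items():
        match = re.search(pattern, path)
        if match:
            # The single capture group holds the label, e.g. "02" for sub-02
            found[name] = match.group(1)
    return found


example = "/sub-02/ses-retest/func/sub-02_ses-retest_task-fingerfootlips_bold.nii.gz"
entities = parse_entities(example)
print(entities)
```

Each capture group stops at the first character outside `[a-zA-Z0-9]`, which is why the task label ends before `_bold`.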

*Query by subjects:*

In [10]:
layout.get_subjects()

['01', '02', '03', '04', '05', '06', '07', '08', '09', '10']

*Query by sessions:*

In [11]:
layout.get_sessions()

['retest', 'test']

*Query by tasks:*

In [12]:
layout.get_tasks()

['covertverbgeneration',
 'fingerfootlips',
 'linebisection',
 'overtverbgeneration',
 'overtwordrepetition']

Here’s how we would retrieve all BOLD runs with *.nii.gz* extensions for *subject '02'*:

In [13]:
# Retrieve filenames of all BOLD runs for subject 02
layout.get(subject='02', extension='nii.gz', suffix='bold', return_type='filename')

['/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/sub-02/ses-retest/func/sub-02_ses-retest_task-covertverbgeneration_bold.nii.gz',
 '/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/sub-02/ses-retest/func/sub-02_ses-retest_task-fingerfootlips_bold.nii.gz',
 '/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/sub-02/ses-retest/func/sub-02_ses-retest_task-linebisection_bold.nii.gz',
 '/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/sub-02/ses-retest/func/sub-02_ses-retest_task-overtverbgeneration_bold.nii.gz',
 '/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/sub-02/ses-retest/func/sub-02_ses-retest_task-overtwordrepetition_bold.nii.gz',
 '/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/sub-02/ses-test/func/sub-02_ses-test_task-covertverbgeneration_bold.nii.gz',
 '/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/sub-02/s

### Extracting metadata
All the entities mentioned above are derived from the filenames in a BIDS dataset. However, sometimes we want to filter files not just by their names, but also using **metadata** defined in sidecar JSON files, as specified by the BIDS standard. When a BIDSLayout is initialized, it automatically indexes all associated metadata files. This means we can use any key found in a JSON file as a filter in ```.get()```, and we can even combine these with core BIDS entities like subject, run, and task.

For example, suppose we want to retrieve all files that meet the following criteria:
(a) the RepetitionTime metadata value is 2.5,
(b) the TaskName metadata value is either 'covert_verb_generation' or 'finger_foot_lips', and
(c) the subject is '01' or '02'.

Here’s how we can do that:

In [14]:
layout.get(subject=['01', '02'], RepetitionTime=2.5, TaskName=['covert_verb_generation', 'finger_foot_lips'])

[<BIDSImageFile filename='/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-covertverbgeneration_bold.nii.gz'>,
 <BIDSImageFile filename='/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-fingerfootlips_bold.nii.gz'>,
 <BIDSImageFile filename='/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-covertverbgeneration_bold.nii.gz'>,
 <BIDSImageFile filename='/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz'>,
 <BIDSImageFile filename='/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/sub-02/ses-retest/func/sub-02_ses-retest_task-covertverbgeneration_bold.nii.gz'>,
 <BIDSImageFile filename='/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/sub-02/ses-retes
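
To make the filter semantics concrete, here is a simplified model in plain Python. It assumes that every keyword must match and that a list-valued filter accepts any of its items, which is how the query above reads. The `matches` helper and the hand-written `records` are hypothetical stand-ins for illustration, not PyBIDS internals.

```python
def matches(record, **filters):
    """Return True if the record satisfies every filter.

    A list-valued filter matches when the record's value equals any item.
    """
    for key, wanted in filters.items():
        allowed = wanted if isinstance(wanted, (list, tuple)) else [wanted]
        if record.get(key) not in allowed:
            return False
    return True


# Hand-written records for illustration only; they are not read from the dataset.
records = [
    {"subject": "01", "RepetitionTime": 2.5, "TaskName": "covert_verb_generation"},
    {"subject": "01", "RepetitionTime": 5.0, "TaskName": "overt_verb_generation"},
    {"subject": "02", "RepetitionTime": 2.5, "TaskName": "finger_foot_lips"},
    {"subject": "03", "RepetitionTime": 2.5, "TaskName": "finger_foot_lips"},
]

selected = [
    r for r in records
    if matches(
        r,
        subject=["01", "02"],
        RepetitionTime=2.5,
        TaskName=["covert_verb_generation", "finger_foot_lips"],
    )
]
print(selected)
```

Only the first and third records survive: the second fails the RepetitionTime and TaskName filters, and the fourth fails the subject filter.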

## The ```BIDSFile```

Calling ```.get()``` on a ```BIDSLayout``` returns a list of ```BIDSFile``` objects by default. These are lightweight representations of individual files within a BIDS dataset and offer convenient access to various attributes and methods. Let’s explore what a ```BIDSFile``` can do. To start, we’ll pick one file from the layout by its index.

In [15]:
# Pick the file at index 11 (the 12th file, since indexing starts at 0)
bf = layout.get()[11]

# Print it
bf

<BIDSImageFile filename='/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtverbgeneration_bold.nii.gz'>

A ```BIDSFile``` provides a convenient interface to interact with individual files in a BIDS dataset. Depending on the file type, different attributes and methods are available. Keep in mind that some methods are only applicable to specific types of files—for example, you can’t use ```.get_image()``` on a non-image file.

Here are some commonly used attributes and methods:

```.path``` – Full path to the file

```.filename``` – Name of the file (excluding the directory)

```.dirname``` – Directory where the file is located

```.get_entities()``` – Returns a dictionary of entities (e.g., subject, task) associated with the file; metadata can be optionally included

```.get_image()``` – Loads the file as a nibabel image (only valid for image files)

```.get_df()``` – Loads the file into a pandas DataFrame (works for .tsv files)

```.get_metadata()``` – Retrieves a dictionary of metadata from the related JSON sidecar(s)

```.get_associations()``` – Lists other files that are linked to this one (e.g., JSON, events, or anatomical associations)

In [16]:
# Print all the entities associated with this file, and their values
bf.get_entities()

{'datatype': 'func',
 'extension': '.nii.gz',
 'session': 'retest',
 'subject': '01',
 'suffix': 'bold',
 'task': 'overtverbgeneration'}
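
The `extension` and `suffix` values above follow directly from the filename. As a rough illustration using only the standard library (it does not call PyBIDS, and the variable names are chosen for this sketch), the path printed earlier can be split by hand:

```python
import posixpath

# Path of the example file, as printed for `bf` above
path = (
    "/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/"
    "sub-01/ses-retest/func/sub-01_ses-retest_task-overtverbgeneration_bold.nii.gz"
)

dirname = posixpath.dirname(path)    # directory part of the path
filename = posixpath.basename(path)  # file name without the directory

# Split at the FIRST dot so that a double extension such as ".nii.gz" stays whole
stem, dot, rest = filename.partition(".")
extension = dot + rest

# The suffix is the last underscore-separated chunk before the extension
suffix = stem.rsplit("_", 1)[-1]

print(filename)
print(extension, suffix)
```

Splitting at the first dot matters here: `posixpath.splitext` would report only `.gz` for this file, whereas the `extension` entity shown above is `.nii.gz`.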

In [17]:
# Print all metadata of this file
bf.get_metadata()

{'EchoTime': 0.05,
 'FlipAngle': 90,
 'RepetitionTime': 5.0,
 'SliceTiming': [0.0,
  1.2499999999999998,
  0.08333333333333333,
  1.333333333333333,
  0.16666666666666666,
  1.4166666666666663,
  0.25,
  1.4999999999999996,
  0.3333333333333333,
  1.5833333333333328,
  0.41666666666666663,
  1.666666666666666,
  0.5,
  1.7499999999999993,
  0.5833333333333333,
  1.8333333333333326,
  0.6666666666666666,
  1.9166666666666659,
  0.75,
  1.9999999999999991,
  0.8333333333333333,
  2.083333333333332,
  0.9166666666666666,
  2.1666666666666656,
  1.0,
  2.249999999999999,
  1.0833333333333333,
  2.333333333333332,
  1.1666666666666665,
  2.416666666666665],
 'TaskName': 'overt_verb_generation'}

In [18]:
# Get the union of the entities and the metadata in a single call
bf.get_entities(metadata='all')

{'EchoTime': 0.05,
 'FlipAngle': 90,
 'RepetitionTime': 5.0,
 'SliceTiming': [0.0,
  1.2499999999999998,
  0.08333333333333333,
  1.333333333333333,
  0.16666666666666666,
  1.4166666666666663,
  0.25,
  1.4999999999999996,
  0.3333333333333333,
  1.5833333333333328,
  0.41666666666666663,
  1.666666666666666,
  0.5,
  1.7499999999999993,
  0.5833333333333333,
  1.8333333333333326,
  0.6666666666666666,
  1.9166666666666659,
  0.75,
  1.9999999999999991,
  0.8333333333333333,
  2.083333333333332,
  0.9166666666666666,
  2.1666666666666656,
  1.0,
  2.249999999999999,
  1.0833333333333333,
  2.333333333333332,
  1.1666666666666665,
  2.416666666666665],
 'TaskName': 'overt_verb_generation',
 'datatype': 'func',
 'extension': '.nii.gz',
 'session': 'retest',
 'subject': '01',
 'suffix': 'bold',
 'task': 'overtverbgeneration'}

In [19]:
# Here are all the files associated with our target file in some way
bf.get_associations()

[<BIDSJSONFile filename='/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114/task-overtverbgeneration_bold.json'>]

### Exporting a BIDSLayout to a pandas DataFrame
If you’re looking for a high-level overview of all the files in your BIDSLayout without manually iterating through each ```BIDSFile``` and extracting their entities, the ```.to_df()``` method offers a convenient solution. It provides a structured summary of the dataset in the form of a pandas DataFrame.

In [20]:
# Convert the layout to a pandas dataframe
df = layout.to_df()
df.head()

| entity | path | datatype | extension | session | subject | suffix | task |
|---|---|---|---|---|---|---|---|
| 0 | /home/jovyan/Git_repositories/example-notebook... | | .json | | | description | |
| 1 | /home/jovyan/Git_repositories/example-notebook... | | .bval | | | dwi | |
| 2 | /home/jovyan/Git_repositories/example-notebook... | | .bvec | | | dwi | |
| 3 | /home/jovyan/Git_repositories/example-notebook... | | .tsv | | | participants | |
| 4 | /home/jovyan/Git_repositories/example-notebook... | anat | .nii.gz | retest | 1.0 | T1w | |
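
The empty cells appear because not every file defines every entity. The following plain-Python sketch shows the manual approach that ```.to_df()``` spares us: collect one dictionary of entities per file, take the union of all keys as columns, and leave a gap wherever a file lacks an entity. The three dictionaries are hand-written for illustration, and sorting the columns is a choice made here to mirror the column order shown above.

```python
# Hand-written entity dictionaries standing in for three files (illustration only)
entity_dicts = [
    {"path": "dataset_description.json", "extension": ".json", "suffix": "description"},
    {
        "path": "sub-01/ses-retest/anat/sub-01_ses-retest_T1w.nii.gz",
        "datatype": "anat", "extension": ".nii.gz",
        "session": "retest", "subject": "01", "suffix": "T1w",
    },
    {
        "path": "sub-01/ses-retest/func/sub-01_ses-retest_task-fingerfootlips_bold.nii.gz",
        "datatype": "func", "extension": ".nii.gz",
        "session": "retest", "subject": "01", "suffix": "bold", "task": "fingerfootlips",
    },
]

# Columns are the union of all keys, with "path" first and the rest sorted
all_keys = {key for d in entity_dicts for key in d}
columns = ["path"] + sorted(all_keys - {"path"})

# One row per file; entities a file does not define become None
rows = [[d.get(col) for col in columns] for d in entity_dicts]

print(columns)
for row in rows:
    print(row)
```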


In [21]:
# Include metadata
layout.to_df(metadata=True).head(10)

| entity | path | EchoTime | FlipAngle | RepetitionTime | SliceTiming | TaskName | datatype | extension | session | subject | suffix | task |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /home/jovyan/Git_repositories/example-notebook... | | | | | | | .json | | | description | |
| 1 | /home/jovyan/Git_repositories/example-notebook... | | | | | | | .bval | | | dwi | |
| 2 | /home/jovyan/Git_repositories/example-notebook... | | | | | | | .bvec | | | dwi | |
| 3 | /home/jovyan/Git_repositories/example-notebook... | | | | | | | .tsv | | | participants | |
| 4 | /home/jovyan/Git_repositories/example-notebook... | | | | | | anat | .nii.gz | retest | 1.0 | T1w | |
| 5 | /home/jovyan/Git_repositories/example-notebook... | | | | | | dwi | .nii.gz | retest | 1.0 | dwi | |
| 6 | /home/jovyan/Git_repositories/example-notebook... | 0.05 | 90.0 | 2.5 | [0.0, 1.2499999999999998, 0.08333333333333333,... | covert_verb_generation | func | .nii.gz | retest | 1.0 | bold | covertverbgeneration |
| 7 | /home/jovyan/Git_repositories/example-notebook... | 0.05 | 90.0 | 2.5 | [0.0, 1.2499999999999998, 0.08333333333333333,... | finger_foot_lips | func | .nii.gz | retest | 1.0 | bold | fingerfootlips |
| 8 | /home/jovyan/Git_repositories/example-notebook... | 0.05 | 90.0 | 2.5 | [0.0, 1.2499999999999998, 0.08333333333333333,... | line_bisection | func | .nii.gz | retest | 1.0 | bold | linebisection |
| 9 | /home/jovyan/Git_repositories/example-notebook... | | | | | | func | .tsv | retest | 1.0 | events | linebisection |


## BIDS Validator
```PyBIDS``` includes an implicit import of the ```BIDSValidator``` class from the separate ```bids-validator``` package. This class can be used to check whether a given file path conforms to BIDS naming conventions and to infer what type of data the file represents.

However, it's important to note that the Python-based validator may lag behind the official JavaScript implementation available online. Additionally, the Python version only validates individual file paths - it doesn't support validation of an entire BIDS dataset. For full dataset validation, it's recommended to use the online [BIDS Validator](https://bids-standard.github.io/bids-validator/).

In [22]:
from bids.layout import BIDSValidator

In [23]:
# When using the BIDS validator, the file path MUST be relative to the top-level BIDS directory
validator = BIDSValidator()
validator.is_bids('/sub-02/ses-retest/func/sub-02_ses-retest_task-covertverbgeneration_bold.nii.gz')

True
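
Because ```.get()``` returns absolute paths, they need converting before they are passed to the validator. Below is a small sketch using `pathlib`; the helper name `to_validator_path` is made up for this example, and the root directory is the one that appears in the outputs above.

```python
from pathlib import PurePosixPath


def to_validator_path(file_path, dataset_root):
    """Express file_path relative to dataset_root, with a leading slash.

    Raises ValueError if file_path is not located under dataset_root.
    """
    relative = PurePosixPath(file_path).relative_to(dataset_root)
    return "/" + relative.as_posix()


# Dataset root as it appears in the outputs above
root = "/home/jovyan/Git_repositories/example-notebooks/books/workflows/ds000114"
full_path = (
    root
    + "/sub-02/ses-retest/func/sub-02_ses-retest_task-covertverbgeneration_bold.nii.gz"
)

relative_path = to_validator_path(full_path, root)
print(relative_path)
```

The resulting string has the same form as the path passed to `validator.is_bids()` above.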

In [24]:
# Check whether a file path represents a file that is part of the specification
validator.is_file('/sub-02/ses-retest/func/sub-02_ses-retest_task-covertverbgeneration_bold.nii.gz')

True

In [25]:
# Check whether a file is at the top level of the dataset
validator.is_top_level('/dataset_description.json')

True

In [26]:
# Check whether a file is at the subject (or session) level
validator.is_subject_level('/dataset_description.json')

False

In [27]:
# Check whether a file path represents phenotypic data
validator.is_phenotypic('/sub-02/ses-test/anat/sub-02_ses-test_T1w.nii.gz')

False