The Neurobagel CLI
The bagel-cli is a simple Python command-line tool that automatically parses and describes subject-level phenotypic and BIDS attributes in an annotated dataset for integration into the Neurobagel graph.
Installation
Docker
Option 1 (RECOMMENDED): Pull the Docker image for the CLI from Docker Hub:
docker pull neurobagel/bagelcli
Option 2: Clone the repository and build the Docker image locally:
git clone https://github.com/neurobagel/bagel-cli.git
cd bagel-cli
docker build -t bagel .
Singularity
Build a Singularity image for bagel-cli using the Docker Hub image:
singularity pull bagel.sif docker://neurobagel/bagelcli
Running the CLI
CLI commands can be accessed using the Docker/Singularity image.
Note
The Docker examples below assume that you are using the official Neurobagel Docker Hub image for the CLI.
If you have instead built an image locally, replace neurobagel/bagelcli in commands with your built image tag.
Input files
To run the CLI on a dataset you have annotated, you will need:
- A phenotypic TSV
- A corresponding phenotypic JSON data dictionary
- (Optional) The imaging dataset in BIDS format, if subjects have imaging data available. A valid BIDS dataset is needed for the CLI to automatically generate harmonized subject-level imaging metadata alongside harmonized phenotypic attributes.
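Before starting a container, it can help to confirm that these inputs are where you expect them. The following is a minimal Python sketch and not part of bagel-cli: the function name missing_inputs and its messages are hypothetical, and it only checks that the paths exist, not that the files are validly annotated.

```python
from pathlib import Path
from typing import List, Optional


def missing_inputs(
    pheno_tsv: str, dictionary_json: str, bids_dir: Optional[str] = None
) -> List[str]:
    """Return a list of problems; an empty list means every input was found.

    Hypothetical helper for illustration only (not a bagel-cli function).
    """
    problems = []
    if not Path(pheno_tsv).is_file():
        problems.append(f"phenotypic TSV not found: {pheno_tsv}")
    if not Path(dictionary_json).is_file():
        problems.append(f"data dictionary not found: {dictionary_json}")
    # The BIDS dataset is optional, so it is only checked when a path is given.
    if bids_dir is not None and not Path(bids_dir).is_dir():
        problems.append(f"BIDS directory not found: {bids_dir}")
    return problems
```

For example, calling missing_inputs("neurobagel/Dataset1_pheno.tsv", "neurobagel/Dataset1_pheno.json", "bids") from the dataset directory would return an empty list when all three inputs are present.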
To view the available CLI commands
# Note: this is a shorthand for `docker run --rm neurobagel/bagelcli --help`
docker run --rm neurobagel/bagelcli
# Note: this is a shorthand for `singularity run bagel.sif --help`
singularity run bagel.sif
To view the command-line arguments for a specific command:
docker run --rm neurobagel/bagelcli <command-name> --help
singularity run bagel.sif <command-name> --help
To run the CLI on data
- cd into your local directory containing (1) your phenotypic .tsv file, (2) Neurobagel-annotated data dictionary, and (3) BIDS directory (if available).
- Run a bagel-cli container and include your CLI command and arguments at the end in the following format:
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli <CLI command here>
singularity run --no-home --bind $PWD --pwd $PWD /path/to/bagel.sif <CLI command here>
In the above command, --volume=$PWD:$PWD -w $PWD (or --bind $PWD --pwd $PWD for Singularity) mounts your current working directory (containing all inputs for the CLI) at the same path inside the container, and also sets the container's working directory to the mounted path (so it matches your location on your host machine). This allows you to pass paths to the containerized CLI that are composed the same way as on your local machine. Both absolute paths and relative top-down paths from your working directory will work.
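To make the path rule concrete, here is a small Python sketch of which path arguments end up inside the mounted directory. It assumes that only the working directory is mounted, as in the Docker command above; the function names are hypothetical and the check is purely string-based (it does not touch the filesystem or Docker).

```python
import posixpath


def resolve_in_mount(workdir: str, path: str) -> str:
    """Resolve a path argument against the working directory.

    Because the directory is mounted at the same path and used as the
    container's working directory, the result is the same on the host
    and inside the container.
    """
    return posixpath.normpath(posixpath.join(workdir, path))


def is_visible_in_container(workdir: str, path: str) -> bool:
    """True if the resolved path lies inside the mounted working directory."""
    root = posixpath.normpath(workdir)
    resolved = resolve_in_mount(root, path)
    return resolved == root or resolved.startswith(root.rstrip("/") + "/")


workdir = "/home/data/Dataset1"
for candidate in [
    "neurobagel/Dataset1_pheno.tsv",  # relative, top-down
    "/home/data/Dataset1/bids",       # absolute, inside the mount
    "../Dataset2/bids",               # climbs out of the mounted directory
]:
    print(candidate, "->", is_visible_in_container(workdir, candidate))
```

The first two candidates resolve inside the mounted directory, while the third points outside it and so would not be reachable from the container under this assumption.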
Example
If your dataset lives in /home/data/Dataset1:
home/
└── data/
    └── Dataset1/
        ├── neurobagel/
        │   ├── Dataset1_pheno.tsv
        │   └── Dataset1_pheno.json
        └── bids/
            ├── sub-01
            ├── sub-02
            └── ...
You could run the CLI as follows:
cd /home/data/Dataset1
# 1. Generate phenotypic subject-level graph data (pheno.jsonld)
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli pheno \
--pheno "neurobagel/Dataset1_pheno.tsv" \
--dictionary "neurobagel/Dataset1_pheno.json" \
--name "My dataset 1" \
--output "neurobagel/Dataset1_pheno.jsonld"
# 2. Add BIDS data to pheno.jsonld generated by step 1
docker run --rm --volume=$PWD:$PWD -w $PWD neurobagel/bagelcli bids \
--jsonld-path "neurobagel/Dataset1_pheno.jsonld" \
--bids-dir "bids" \
--output "neurobagel/Dataset1_pheno_bids.jsonld"
cd /home/data/Dataset1
# 1. Generate phenotypic subject-level graph data (pheno.jsonld)
singularity run --no-home --bind $PWD --pwd $PWD bagel.sif pheno \
--pheno "neurobagel/Dataset1_pheno.tsv" \
--dictionary "neurobagel/Dataset1_pheno.json" \
--name "My dataset 1" \
--output "neurobagel/Dataset1_pheno.jsonld"
# 2. Add BIDS data to pheno.jsonld generated by step 1
singularity run --no-home --bind $PWD --pwd $PWD bagel.sif bids \
--jsonld-path "neurobagel/Dataset1_pheno.jsonld" \
--bids-dir "bids" \
--output "neurobagel/Dataset1_pheno_bids.jsonld"
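The two steps are linked: the file written through --output in step 1 is the file that step 2 reads through --jsonld-path. The Python sketch below assembles both Docker commands from one shared intermediate path so the two cannot drift apart. It only builds and prints the commands and does not run Docker; the helper names are hypothetical, the file naming scheme is an assumption taken from this example, and the flags are the ones shown in the example above.

```python
import shlex
from typing import List


def container_prefix(workdir: str, image: str = "neurobagel/bagelcli") -> List[str]:
    """Docker invocation from the example, with $PWD replaced by an explicit path."""
    return ["docker", "run", "--rm", f"--volume={workdir}:{workdir}", "-w", workdir, image]


def two_step_plan(workdir: str, prefix: str, name: str, bids_dir: str = "bids") -> List[List[str]]:
    """Return the pheno and bids commands, sharing one intermediate .jsonld path."""
    pheno_jsonld = f"{prefix}_pheno.jsonld"  # written by step 1, read by step 2
    step1 = container_prefix(workdir) + [
        "pheno",
        "--pheno", f"{prefix}_pheno.tsv",
        "--dictionary", f"{prefix}_pheno.json",
        "--name", name,
        "--output", pheno_jsonld,
    ]
    step2 = container_prefix(workdir) + [
        "bids",
        "--jsonld-path", pheno_jsonld,
        "--bids-dir", bids_dir,
        "--output", f"{prefix}_pheno_bids.jsonld",
    ]
    return [step1, step2]


# Print shell-ready commands for the example dataset.
for command in two_step_plan("/home/data/Dataset1", "neurobagel/Dataset1", "My dataset 1"):
    print(shlex.join(command))
```

Each printed line can be pasted into a shell; to execute the steps from Python instead, each list could be passed to subprocess.run with check=True so that step 2 only runs if step 1 succeeded.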
Note
The bids command of the bagel-cli (step 2) can currently take several minutes or more for datasets with more than a few hundred subjects, due to the time needed for pyBIDS to read the dataset structure.
Once the slow initial dataset reading step is complete, you should see the message:
Parsing BIDS metadata to be merged with phenotypic annotations:
...
Upgrading to a newer version of the CLI
Neurobagel is under active, early development, and future releases of the CLI may introduce breaking changes to the data model for subject-level information in a .jsonld graph file. Breaking changes will be highlighted in the release notes!
If you have already created .jsonld files for your Neurobagel graph database using the CLI, they can be quickly re-generated under the new data model by following the instructions here so that they will not conflict with dataset .jsonld files generated using the latest CLI version.
Development environment
To ensure that our Docker images are built in a predictable way, we use requirements.txt as a lock-file.
That is, requirements.txt includes the entire dependency tree of our tool, with pinned versions for every dependency (for more information, see https://pip.pypa.io/en/latest/topics/repeatable-installs/#repeatability).
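As an illustration of what "pinned versions for every dependency" means in practice, the Python sketch below flags requirement lines that do not use an exact == pin. This is a hypothetical helper, not how the project validates its lock-file, and the pattern is a simplification that ignores less common requirement syntax (URLs, === pins, and so on).

```python
import re
from typing import List

# A name, optional extras such as pkg[extra], then an exact "==" pin.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^=\s;]+")


def unpinned_lines(requirements_text: str) -> List[str]:
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            # Skip blank lines and pip options such as -r, -e or --hash.
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Made-up file contents for illustration (not the project's real requirements).
example = """\
numpy==1.24.0
pandas>=1.5
# a comment
typer
"""
print(unpinned_lines(example))
```

With the made-up contents above, only the pandas and typer lines are reported, since neither fixes an exact version.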
Setting up a local development environment
We suggest that you create a development environment that is as close as possible to the environment we run in production.
To do so, we first need to install the dependencies from our lock-file (dev_requirements.txt):
pip install -r dev_requirements.txt
Then we install the CLI without touching the dependencies:
pip install --no-deps -e .
Finally, to run the test suite we need to install the bids-examples and neurobagel_examples submodules:
git submodule init
git submodule update
Then run the test suite (no tests should fail):
pytest .
Setting up code formatting and linting (recommended)
pre-commit is configured in the development environment for this repository, and can be set up to automatically run a number of code linters and formatters on any commit you make according to the consistent code style set for this project.
Run the following from the repository root to install the configured pre-commit "hooks" for your local clone of the repo:
pre-commit install