Command Line Interface with sys.argv, argparse, docopt, and Typer

Author:Murphy  |  View: 27138  |  Time: 2025-03-22 23:56:48

To deploy a pipeline, typically there is a main script, or a single point of entry that runs the whole pipeline. For example, in a data science pipeline, the point of entry of the code repository should orchestrate and run the data, feature engineering, modeling, and evaluation pipeline in sequence.

Sometimes, you may need to run different types of pipelines or make ad-hoc tweaks to the pipeline.

Tweaks may include omitting certain parts of the code or running the pipeline with different parameter values. In data science, there could be a training and scoring pipeline or certain runs that require a full or partial refresh of the data.

The naive solution is to create multiple main scripts. However, this results in code duplication, and multiple scripts are hard to maintain in the long run, given that there can be many combinations of tweaks. A better solution is to have the main script accept arguments, in the form of values or flags, via the Command Line Interface (CLI) and then run the appropriate type of pipeline.

This article will not elaborate on how your main script uses the arguments; instead, it introduces different ways to pass arguments to your main script. You can think of your main script as a function that accepts arguments! I will also elaborate on the pros and cons of each method and provide code samples, from basic to advanced usage.

Using sys.argv

The simplest way to pass in arguments

Arguments can be passed in and read directly with sys.argv, making it the simplest way to pass in multiple arguments.

Demonstration

In the demonstration below, after passing in arguments on the CLI, we can see that sys.argv holds them as a list of values. The first value is the script name and the subsequent values are the arguments passed in, which the shell splits on spaces. Note that all the arguments are interpreted as strings!

Code:

# main_sysargv.py
import sys

if __name__ == "__main__":
    print(sys.argv)

Calling with CLI:

$ python main_sysargv.py train 2023-01-01 
['main_sysargv.py', 'train', '2023-01-01']

Pros

  • Simple and intuitive to use
  • Multiple arguments: Can pass in an unlimited number of arguments to be referenced using list methods

Cons

  • Not documented: Arguments are not named, making it hard to track the exact order of expected arguments
  • Only string arguments: Arguments are interpreted as strings. This can be solved by processing or casting the arguments to a different type (may require additional steps to validate the argument type and value)
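
The casting and validation mentioned in the last point could look something like the sketch below. The function name parse_argv, the usage text, and the train/score modes are assumptions made for illustration, not part of the original script.

```python
from datetime import datetime

# Hypothetical usage text for this sketch
USAGE = "usage: main_sysargv.py (train|score) YYYY-MM-DD"


def parse_argv(argv):
    """Validate and cast the raw strings from a sys.argv-style list."""
    # argv[0] is the script name, so two user arguments give a length of 3
    if len(argv) != 3:
        raise SystemExit(USAGE)

    mode, raw_date = argv[1], argv[2]
    if mode not in ("train", "score"):
        raise SystemExit(f"unknown mode {mode!r}\n{USAGE}")

    try:
        # Cast the date string to a datetime.date object
        start_date = datetime.strptime(raw_date, "%Y-%m-%d").date()
    except ValueError:
        raise SystemExit(f"invalid date {raw_date!r}\n{USAGE}")

    return mode, start_date


# In a real script this would be parse_argv(sys.argv)
mode, start_date = parse_argv(["main_sysargv.py", "train", "2023-01-01"])
print(mode, start_date)
```

This prints train 2023-01-01. Even for two arguments, the manual checks take up a fair amount of code, which is part of the motivation for the parsers in the following sections.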

Using argparse

The most common way to pass in arguments

Addressing the cons of sys.argv, argparse can receive named arguments, arguments of different data types, and do so much more! This makes argparse the most popular way to pass in arguments to Python scripts.

Simple Demonstration

In the simple demonstration, we initialize an ArgumentParser object and specify the expected arguments and their types with the .add_argument() method.

To interpret the arguments, we get a Namespace object by calling .parse_args(). The arguments can then be retrieved from the Namespace object via dot notation.

Code:

# main_argparse.py
import argparse
import datetime

if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    # Specify expected arguments
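    parser.add_argument(
        "--train",
        # bool("false") is True, so compare the text explicitly instead of type=bool
        type=lambda s: s.lower() in ("true", "1", "yes"),
    )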
    parser.add_argument(
        "--start_date",
        type=lambda dt: datetime.datetime.strptime(dt, "%Y-%m-%d"),
    )

    # Interpret passed arguments
    args = parser.parse_args()
    print(args)
    print(args.train, type(args.train))
    print(args.start_date, type(args.start_date))

Calling with CLI:

$ python main_argparse.py --train true --start_date 2023-01-01
Namespace(train=True, start_date=datetime.datetime(2023, 1, 1, 0, 0))
True <class 'bool'>
2023-01-01 00:00:00 <class 'datetime.datetime'>
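
Because the type parameter accepts any callable that converts a string, a named converter function is one option when a friendlier error message is wanted. The sketch below is an illustration only; the function name valid_date is an assumption and not part of the original script.

```python
import argparse
from datetime import datetime


def valid_date(text):
    """Convert 'YYYY-MM-DD' to a datetime, or raise a parser-friendly error."""
    try:
        return datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        # argparse turns this into a usage error shown to the user
        raise argparse.ArgumentTypeError(f"expected YYYY-MM-DD, got {text!r}")


parser = argparse.ArgumentParser()
parser.add_argument("--start_date", type=valid_date)

# Passing a list to parse_args() avoids reading the real command line
args = parser.parse_args(["--start_date", "2023-01-01"])
print(args.start_date)
```

With a bad value such as 01/01/2023, the parser prints the usage line followed by the custom message and exits, instead of showing a traceback.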

Advanced Demonstration

In the advanced demonstration, we will be making the following enhancements:

  1. Include description and epilog within argparse.ArgumentParser(): Useful for showing up in the help documentation
  2. Add positional arguments: Positional arguments are mandatory to specify. They are unnamed and have to be specified in sequence if there are multiple positional arguments
  3. Add option arguments: Option arguments can implement named arguments that take in one or more values, and can implement on/off switches as well
  4. Specify composite data types such as Enum class and list
  5. Interpret passed arguments: Arguments can be passed using the command line or manual specification in code

Code:

# main_argparse2.py
import argparse
from enum import Enum

class ConstantsSaveLocation(Enum):
    LOCAL = "local"
    DATABASE = "database"

if __name__ == "__main__":
    # 1. Include description and epilog
    parser = argparse.ArgumentParser(
        description="Run the training/scoring pipeline (text at the top)",
        epilog="Created by Kay Jan (text at the bottom)",
    )

    # 2. Positional arguments
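    parser.add_argument(
        "train",
        # bool("false") is True, so compare the text explicitly instead of type=bool
        type=lambda s: s.lower() in ("true", "1", "yes"),
    )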

    # 3. Option arguments
    parser.add_argument(
        "--n_estimator",          # long name
        "-n",                     # short name; alias
        type=int,                 # simple data type
        required=True,            # make mandatory
        choices=[100, 200, 300],  # for limiting options
        default=400,              # default value (ignored when required=True)
        dest="n",                 # for Namespace reference
        help="For model training",  # for help docs
        metavar="N",              # for help docs
    )

    # 3. Option arguments (on/off switch)
    parser.add_argument(
        "--verbose",
        "-v",
        action="store_true",      # on/off switch
    )

    # 4. Composite data type (Enum class)
    parser.add_argument(
        "--save_loc",
        type=ConstantsSaveLocation,
    )

    # 4. Composite data type (list)
    parser.add_argument(
        "--item",
        type=str,
        nargs="*",
    )

    # 5. Interpret passed arguments (from the command line via sys.argv)
    args = parser.parse_args()
    print(args)

    # 5. Interpret passed arguments (from passing arguments)
    args = parser.parse_args(
        [
            "true", "-n", "100", "-v",
            "--save_loc", "local", "--item", "a", "b", "c",
        ]
    )
    print(args)

Calling with CLI:

$ python main_argparse2.py -h                                                      
usage: main_argparse2.py [-h] --n_estimator N [--verbose] [--save_loc SAVE_LOC] [--item ITEM [ITEM ...]] train

Run the training/scoring pipeline (text at the top)

positional arguments:
  train

options:
  -h, --help            show this help message and exit
  --n_estimator N, -n N
                        For model training
  --verbose, -v
  --save_loc SAVE_LOC
  --item ITEM [ITEM ...]

Created by Kay Jan (text at the bottom)

$ python main_argparse2.py true -n 100 -v --save_loc local --item a b c
Namespace(train=True, n=100, verbose=True, save_loc=<ConstantsSaveLocation.LOCAL: 'local'>, item=['a', 'b', 'c'])
Namespace(train=True, n=100, verbose=True, save_loc=<ConstantsSaveLocation.LOCAL: 'local'>, item=['a', 'b', 'c'])

Other Advanced Usage

argparse supports the following usage:

  • Subcommands: Similar to calling git add and git commit where add and commit are subparsers that accept a different set of arguments
  • FileType arguments: By setting the type parameter to argparse.FileType, the parser can take in a file name as an argument and store the opened file object in the Namespace object
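
As a rough illustration of the subcommands point, the sketch below defines two subparsers with different arguments. The command names train and score and their options are assumptions chosen to match the pipeline theme of this article.

```python
import argparse

parser = argparse.ArgumentParser(description="Run the training/scoring pipeline")

# dest stores the chosen subcommand name; required=True rejects a missing one
subparsers = parser.add_subparsers(dest="command", required=True)

# Each subparser accepts its own set of arguments
train_parser = subparsers.add_parser("train", help="Train a model")
train_parser.add_argument("--n_estimator", "-n", type=int, default=100)

score_parser = subparsers.add_parser("score", help="Score with a trained model")
score_parser.add_argument("--save_loc", choices=["local", "database"], default="local")

# Equivalent to running: python script.py train -n 200
args = parser.parse_args(["train", "-n", "200"])
print(args.command, args.n_estimator)
```

The main script can then branch on args.command to decide which pipeline to run.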

It is recommended to visit the Official documentation for the most up-to-date and complete information.

Pros

  • Documented: Help messages are available to show users what arguments are available
  • Multiple arguments and multiple data types supported: Able to handle multiple named arguments of various data types

Cons

  • Lengthy: Takes up more lines of code than sys.argv and might clutter the main script. This can be solved by moving the argparse code into a separate file
  • Merely an interface: The code adds no value to the main script except to act as an interface for the user to pass in arguments. It can be seen as extra lines of code and duplicated documentation effort
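
One way to keep the main script uncluttered, as suggested in the first point, is to build the parser inside a function that lives in its own module. This is a minimal sketch; the module name cli.py and the function names build_parser and parse_args are assumptions.

```python
# cli.py (hypothetical module name)
import argparse


def build_parser():
    """Create the parser in one place so the main script stays short."""
    parser = argparse.ArgumentParser(description="Run the training/scoring pipeline")
    parser.add_argument("mode", choices=["train", "score"])
    parser.add_argument("--n_estimator", "-n", type=int, default=100)
    parser.add_argument("--verbose", "-v", action="store_true")
    return parser


def parse_args(argv=None):
    """Parse a list of strings; None falls back to the real command line."""
    return build_parser().parse_args(argv)


# In the main script this would be: from cli import parse_args
args = parse_args(["train", "-n", "200"])
print(args.mode, args.n_estimator, args.verbose)
```

This prints train 200 False. Accepting an optional list of strings also makes the parser easy to exercise in unit tests without touching the real command line.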

Using docopt

An alternative way to pass in arguments

With docopt, the CLI is defined by the usage documentation in the module docstring, so almost no extra parsing code is needed (as opposed to argparse)!

Note: docopt is not part of the Python standard library; install it with pip install docopt-ng.

Demonstration

The documentation must be written in a specific format with the "Usage" and "Options" sections. For usage, () represents required arguments, [] represents optional arguments, and ... denotes multiple arguments.

When the script is called from the CLI, docopt matches the arguments against each usage pattern to determine which one applies. The parsed arguments are returned as a dictionary.

Code:

# main_docopt.py
"""Project Name
Description of project

Usage:
    main_docopt.py (train|test) --n_estimator <N> [--save_loc <loc>] [--item <item>...] [-v]
    main_docopt.py --version

Options:
    -h --help               Show this screen.
    --version               Show version.
    -n --n_estimator <N>    Number of estimators.
    --save_loc <loc>        Save location.
    --item <item>           Items.
    -v --verbose            Verbosity.
"""
from docopt import docopt

if __name__ == "__main__":
    args = docopt(__doc__, version="0.1.0")
    print(args)

Calling with CLI:

$ python main_docopt.py -h
Project Name
Description of project

Usage:
    main_docopt.py (train|test) --n_estimator <N> [--save_loc <loc>] [--item <item>...] [-v]
    main_docopt.py --version

Options:
    -h --help               Show this screen.
    --version               Show version.
    -n --n_estimator <N>    Number of estimators.
    --save_loc <loc>        Save location.
    --item <item>           Items.
    -v --verbose            Verbosity.

$ python main_docopt.py train --n_estimator 100 --save_loc database --item a --item b
{'--item': ['a', 'b'],
 '--n_estimator': '100',
 '--save_loc': 'database',
 '--verbose': False,
 '--version': False,
 'test': False,
 'train': True}

Pros

  • Documented: Help messages are available to show users what arguments are available
  • Succinct: No additional code is needed, documentation is translated directly

Cons

  • Only string or Boolean arguments: Arguments are interpreted as strings or Boolean values. This can be solved by processing or casting the arguments to a different type (may require additional steps to validate the argument type and value)
  • More arguments than necessary: Any argument indicated in the doc string examples will be reflected in the interpreted dictionary (i.e., --version may be an unnecessary key in the dictionary)
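
The casting step mentioned in the first point could be handled by a small post-processing function, sketched below. To stay self-contained, it works on a plain dictionary shaped like the docopt output shown earlier; the names cast_args and SaveLocation are assumptions made for illustration.

```python
from enum import Enum


class SaveLocation(Enum):
    LOCAL = "local"
    DATABASE = "database"


def cast_args(raw):
    """Cast the values of a docopt-style dictionary to richer types."""
    return {
        "mode": "train" if raw["train"] else "test",
        "n_estimator": int(raw["--n_estimator"]),
        # Assumes an omitted optional option arrives as None
        "save_loc": SaveLocation(raw["--save_loc"]) if raw["--save_loc"] else None,
        "items": list(raw["--item"]),
        "verbose": bool(raw["--verbose"]),
    }


# Dictionary copied from the demonstration output above
raw = {
    "--item": ["a", "b"],
    "--n_estimator": "100",
    "--save_loc": "database",
    "--verbose": False,
    "--version": False,
    "test": False,
    "train": True,
}
print(cast_args(raw))
```

Besides casting, the function also drops keys such as --version that the rest of the pipeline does not need, which addresses the second point as well.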

Using Typer

The newest and easiest way to pass in arguments

Developed by the same creator as FastAPI, Typer is the newest and easiest way to pass in arguments.

Note: Typer is not part of the Python standard library; install it with pip install 'typer[all]', which depends on click and rich.

Simple Demonstration

In the simple demonstration, we write a main function in the script as usual and add a single line, typer.run(main), to expose it on the CLI.

Code:

# main_typer.py
import typer

def main(train: bool, start_date: str = "2010-01-01"):
    print(train, start_date)

if __name__ == "__main__":
    typer.run(main)

Calling with CLI:

$ python main_typer.py --help

 Usage: main_typer.py [OPTIONS] TRAIN                          

╭─ Arguments ─────────────────────────────────────────────────╮
│ *    train        [default: None] [required]                │
╰─────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────╮
│ --start-date        TEXT  [default: 2010-01-01]             │
│ --help                    Show this message and exit.       │
╰─────────────────────────────────────────────────────────────╯

$ python main_typer.py true --start-date 2023-01-01 
True 2023-01-01

Advanced Demonstration

In the advanced demonstration, we create a typer.Typer() app, similar to an app in FastAPI. The equivalent of argparse subcommands is implemented with the @app.command() decorator, which makes it very easy to use!

Code:

# main_typer2.py
import typer
from enum import Enum
from typing import List

app = typer.Typer(help="Run the training/scoring pipeline")

class ConstantsSaveLocation(Enum):
    LOCAL = "local"
    DATABASE = "database"

@app.command()
def train(n_estimators: int, start_date: str = "2010-01-01"):
    print(n_estimators, start_date)

@app.command()
def test(save_loc: ConstantsSaveLocation, items: List[str]):
    print(save_loc, items)

if __name__ == "__main__":
    app()

Calling with CLI:

$ python main_typer2.py --help

 Usage: main_typer2.py [OPTIONS] COMMAND [ARGS]...             

 Run the training/scoring pipeline                             

╭─ Options ───────────────────────────────────────────────────╮
│ --install-completion          Install completion for the    │
│                               current shell.                │
│ --show-completion             Show completion for the       │
│                               current shell, to copy it or  │
│                               customize the installation.   │
│ --help                        Show this message and exit.   │
╰─────────────────────────────────────────────────────────────╯
╭─ Commands ──────────────────────────────────────────────────╮
│ test                                                        │
│ train                                                       │
╰─────────────────────────────────────────────────────────────╯

$ python main_typer2.py train --help

 Usage: main_typer2.py train [OPTIONS] N_ESTIMATORS            

╭─ Arguments ─────────────────────────────────────────────────╮
│ *    n_estimators      INTEGER  [default: None] [required]  │
╰─────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────╮
│ --start-date        TEXT  [default: 2010-01-01]             │
│ --help                    Show this message and exit.       │
╰─────────────────────────────────────────────────────────────╯

$ python main_typer2.py train 100 --start-date 2023-01-01
100 2023-01-01

$ python main_typer2.py test local a b c
ConstantsSaveLocation.LOCAL ['a', 'b', 'c']

Other Advanced Usage

typer supports the following usage:

  • Autogenerated Documentation: This requires pip install typer-cli and markdown documentation can be generated from CLI commands!
  • Built-in Functions: typer.Argument(), typer.Option(), typer.prompt() etc. are built-in Typer functions to enhance help messages, make the CLI interactive, and more
  • Testing: Similar to FastAPI, Typer arguments can be tested using typer.testing.CliRunner() which makes the code more robust

It is recommended to visit the Official documentation for the most up-to-date and complete information.

Pros

  • Documented: Help messages are available to show users what arguments are available
  • Multiple arguments and multiple data types supported: Able to handle multiple named arguments of various data types
  • Succinct: Only a few lines of code need to be added to work seamlessly with existing Python functions

Cons

  • Lengthy: For advanced usages, more lines of Typer-specific code need to be added which can make the code lengthy

I hope you have learned more about the different ways to pass arguments to Python scripts and the pros and cons of each method. As a coder, writing user-friendly code is just as important as writing elegant and efficient code – and building CLI applications is one way to allow users or other applications to interface with your code. Many more advanced usages are covered in the respective official documentation below.


Related Links

sys.argv

argparse

docopt

Typer

