Command Line Interface with sys.argv, argparse, docopt, and Typer
To deploy a pipeline, typically there is a main script, or a single point of entry that runs the whole pipeline. For example, in a data science pipeline, the point of entry of the code repository should orchestrate and run the data, feature engineering, modeling, and evaluation pipeline in sequence.
Sometimes, you may need to run different types of pipelines or make ad-hoc tweaks to the pipeline.
Tweaks may include omitting certain parts of the code or running the pipeline with different parameter values. In data science, there could be a training and scoring pipeline or certain runs that require a full or partial refresh of the data.
The trivial solution would be to create multiple main scripts. However, this results in code duplication, and the scripts become hard to maintain in the long run, given that there can be many combinations of tweaks. A better solution is to have the main script accept arguments, in the form of values or flags, through the Command Line Interface (CLI) and subsequently run the appropriate type of pipeline.
This article will not elaborate on how your main script decides to use the arguments but will introduce different ways you can pass in arguments to your main script. You can think of your main script as a function that now accepts arguments! I will also elaborate on the pros and cons of each method, and provide code samples covering basic to advanced usage.
Table of Contents
- Using sys.argv: The simplest way
- Using argparse: The most common way
- Using docopt: An alternative way
- Using Typer: The newest and easiest way
Using sys.argv
The simplest way to pass in arguments
Arguments can be passed in and read directly with sys.argv, making it the simplest way to pass in multiple arguments.
Demonstration
In the demonstration below, after passing in arguments on the CLI, we can see that sys.argv holds them as a list of values. The first value is the script name and the subsequent values are all the arguments being passed in, separated by spaces. Note that all the arguments passed in are interpreted as strings!
Code:
# main_sysargv.py
import sys

if __name__ == "__main__":
    print(sys.argv)
Calling with CLI:
$ python main_sysargv.py train 2023-01-01
['main_sysargv.py', 'train', '2023-01-01']
Pros
- Simple and intuitive to use
- Multiple arguments: Can pass in an unlimited number of arguments to be referenced using list methods
Cons
- Not documented: Arguments are not named, making it hard to track the exact order of expected arguments
- Only string arguments: Arguments are interpreted as strings. This can be solved by processing or casting the arguments to a different type (may require additional steps to validate the argument type and value)
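To make the casting and validation step concrete, here is a minimal sketch. The helper name `parse_cli` and the allowed modes `train` and `score` are assumptions for illustration and are not part of the original script.

```python
from datetime import date, datetime


def parse_cli(argv):
    """Validate and cast raw sys.argv-style strings (hypothetical helper)."""
    if len(argv) != 3:
        raise ValueError("expected exactly 2 arguments: MODE START_DATE")
    _, mode, raw_date = argv  # the first item is the script name
    if mode not in ("train", "score"):
        raise ValueError(f"unknown mode: {mode!r}")
    # strptime raises ValueError if the string is not a valid YYYY-MM-DD date
    start_date = datetime.strptime(raw_date, "%Y-%m-%d").date()
    return mode, start_date


# In a real script, sys.argv would be passed in instead of this literal list
mode, start_date = parse_cli(["main_sysargv.py", "train", "2023-01-01"])
print(mode, start_date)  # train 2023-01-01
```

Even for two arguments, the order check, the value check, and the type cast all have to be written by hand, which is the gap the next method fills.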
Using argparse
The most common way to pass in arguments
Solving the cons of using sys.argv, argparse can receive named arguments, arguments of different data types, and do so much more! This makes argparse the most popular way to pass in arguments to Python scripts.
Simple Demonstration
In the simple demonstration, we initialize an ArgumentParser object and specify the expected arguments and their types with the .add_argument() method.
To interpret the arguments, we get a Namespace object by calling .parse_args(). The arguments can then be retrieved from the Namespace object via dot notation.
Code:
# main_argparse.py
import argparse
import datetime

if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    # Specify expected arguments
    parser.add_argument(
        "--train",
        # type=bool would treat any non-empty string (even "false") as True
        type=lambda s: s.lower() == "true",
    )
    parser.add_argument(
        "--start_date",
        type=lambda dt: datetime.datetime.strptime(dt, "%Y-%m-%d"),
    )

    # Interpret passed arguments
    args = parser.parse_args()
    print(args)
    print(args.train, type(args.train))
    print(args.start_date, type(args.start_date))
Calling with CLI:
$ python main_argparse.py --train true --start_date 2023-01-01
Namespace(train=True, start_date=datetime.datetime(2023, 1, 1, 0, 0))
True <class 'bool'>
2023-01-01 00:00:00 <class 'datetime.datetime'>
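A note on Boolean arguments: argparse applies the type callable to the raw string, and bool() of any non-empty string is True, so passing type=bool cannot turn a flag off. The sketch below shows the pitfall next to two alternatives. The helper name `str_to_bool` and its accepted spellings are my own choices for illustration. Python 3.9+ also offers argparse.BooleanOptionalAction as a third option.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert common true/false spellings and reject anything else."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a Boolean value, got {value!r}")


# Pitfall: bool("false") is True because the string is non-empty
naive = argparse.ArgumentParser()
naive.add_argument("--train", type=bool)
naive_args = naive.parse_args(["--train", "false"])
print(naive_args.train)  # True

# Alternative 1: an explicit converter; Alternative 2: an on/off switch
parser = argparse.ArgumentParser()
parser.add_argument("--train", type=str_to_bool, default=False)
parser.add_argument("--verbose", action="store_true")
args = parser.parse_args(["--train", "false", "--verbose"])
print(args.train, args.verbose)  # False True
```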
Advanced Demonstration
In the advanced demonstration, we will be making the following enhancements:
- Include description and epilog within argparse.ArgumentParser(): Useful for showing up in the help documentation
- Add positional arguments: Positional arguments are mandatory to specify. They are unnamed and have to be specified in sequence if there are multiple positional arguments
- Add option arguments: Option arguments can implement named arguments that take in one or more values, and can implement on/off switches as well
- Specify composite data types such as Enum class and list
- Interpret passed arguments: Arguments can be passed using the command line or manual specification in code
Code:
# main_argparse2.py
import argparse
from enum import Enum


class ConstantsSaveLocation(Enum):
    LOCAL = "local"
    DATABASE = "database"


if __name__ == "__main__":
    # 1. Include description and epilog
    parser = argparse.ArgumentParser(
        description="Run the training/scoring pipeline (text at the top)",
        epilog="Created by Kay Jan (text at the bottom)",
    )

    # 2. Positional arguments
    parser.add_argument(
        "train",
        # type=bool would treat any non-empty string (even "false") as True
        type=lambda s: s.lower() == "true",
    )

    # 3. Option arguments
    parser.add_argument(
        "--n_estimator",  # long name
        "-n",  # short name; alias
        type=int,  # simple data type
        required=True,  # make mandatory
        choices=[100, 200, 300],  # for limiting options
        default=400,  # default value (never used here because required=True)
        dest="n",  # for Namespace reference
        help="For model training",  # for help docs
        metavar="N",  # for help docs
    )

    # 3. Option arguments (on/off switch)
    parser.add_argument(
        "--verbose",
        "-v",
        action="store_true",  # on/off switch
    )

    # 4. Composite data type (Enum class)
    parser.add_argument(
        "--save_loc",
        type=ConstantsSaveLocation,  # looks up the Enum member by value
    )

    # 4. Composite data type (list)
    parser.add_argument(
        "--item",
        type=str,
        nargs="*",
    )

    # 5. Interpret passed arguments (from the command line via sys.argv)
    args = parser.parse_args()
    print(args)

    # 5. Interpret passed arguments (from passing arguments)
    args = parser.parse_args(
        [
            "true", "-n", "100", "-v",
            "--save_loc", "local", "--item", "a", "b", "c",
        ]
    )
    print(args)
Calling with CLI:
$ python main_argparse2.py -h
usage: main_argparse2.py [-h] --n_estimator N [--verbose] [--save_loc SAVE_LOC] [--item ITEM [ITEM ...]] train
Run the training/scoring pipeline (text at the top)
positional arguments:
train
options:
-h, --help show this help message and exit
--n_estimator N, -n N
For model training
--verbose, -v
--save_loc SAVE_LOC
--item ITEM [ITEM ...]
Created by Kay Jan (text at the bottom)
$ python main_argparse2.py true -n 100 -v --save_loc local --item a b c
Namespace(train=True, n=100, verbose=True, save_loc=<ConstantsSaveLocation.LOCAL: 'local'>, item=['a', 'b', 'c'])
Namespace(train=True, n=100, verbose=True, save_loc=<ConstantsSaveLocation.LOCAL: 'local'>, item=['a', 'b', 'c'])
Other Advanced Usage
argparse supports the following usage:
- Subcommands: Similar to calling git add and git commit, where add and commit are subparsers that accept a different set of arguments
- FileType arguments: By modifying the type parameter value, the parser can take in a file name as an argument and have the opened file available in the Namespace object
It is recommended to visit the Official documentation for the most up-to-date and complete information.
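As a rough illustration of subcommands, the sketch below defines train and score subparsers with .add_subparsers(). The script name `main_argparse3.py`, the `command` destination, and the options of each subcommand are assumptions for this example.

```python
import argparse

parser = argparse.ArgumentParser(prog="main_argparse3.py")
# dest stores the chosen subcommand name; required=True makes it mandatory
subparsers = parser.add_subparsers(dest="command", required=True)

# Each subparser accepts its own set of arguments
train_parser = subparsers.add_parser("train", help="Run the training pipeline")
train_parser.add_argument("--n_estimator", "-n", type=int, default=100)

score_parser = subparsers.add_parser("score", help="Run the scoring pipeline")
score_parser.add_argument("--save_loc", choices=["local", "database"], default="local")

# Equivalent to: python main_argparse3.py train -n 200
args = parser.parse_args(["train", "-n", "200"])
print(args.command, args.n_estimator)  # train 200

# Equivalent to: python main_argparse3.py score --save_loc database
args_score = parser.parse_args(["score", "--save_loc", "database"])
print(args_score.command, args_score.save_loc)  # score database
```

The main script can then branch on args.command to decide which pipeline to run.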
Pros
- Documented: Help messages are available to show users what arguments are available
- Multiple arguments and multiple data types supported: Able to handle multiple named arguments of various data types
Cons
- Lengthy: Takes up more lines of code than sys.argv and might clutter the main script. This can be solved by abstracting out the argparse code to another file
- Merely an interface: Code has no value to the main script except to act as an interface for the user to pass in arguments. This can be deemed as extra lines of code and duplicated effort for documentation
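One way to abstract out the parser, sketched under the assumption of a separate module named `cli.py` exposing a `build_parser()` function (both names are hypothetical):

```python
# cli.py (hypothetical module name)
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Keep all argument definitions in one place, away from the main script."""
    parser = argparse.ArgumentParser(description="Run the training/scoring pipeline")
    parser.add_argument("--n_estimator", "-n", type=int, default=100)
    parser.add_argument("--verbose", "-v", action="store_true")
    return parser


# The main script would then only need two lines:
#   from cli import build_parser
#   args = build_parser().parse_args()
# Here an explicit list stands in for the command line.
args = build_parser().parse_args(["-n", "200", "-v"])
print(args.n_estimator, args.verbose)  # 200 True
```

A side benefit is that the parser can be exercised in unit tests by passing argument lists, without running the pipeline.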
Using docopt
An alternative way to pass in arguments
In docopt, arguments are passed in according to the documentation in the docstring, and no extra lines of code are needed (as opposed to argparse)!
Note: This is not a Python standard library and you will need to perform a pip install docopt-ng.
Demonstration
The documentation must be written in a specific format with the "Usage" and "Options" sections. For usage, () represents required arguments, [] represents optional arguments, and ... denotes multiple arguments.
When calling with CLI, the passed arguments are matched against the usage patterns to see which version of usage they fit. Arguments can be retrieved from a dictionary object.
Code:
# main_docopt.py
"""Project Name
Description of project

Usage:
  main_docopt.py (train|test) --n_estimator <n_estimator> [--save_loc <save_loc>] [--item <item>...] [-v]
  main_docopt.py --version

Options:
  -h --help     Show this screen.
  --version     Show version.
  -n --n_estimator <n_estimator>
                Number of estimator.
  --save_loc <save_loc>  Save location.
  --item <item>  Items.
  -v --verbose  Verbosity.
"""
from docopt import docopt

if __name__ == "__main__":
    # <n_estimator>, <save_loc>, and <item> are value placeholders; their names are illustrative
    args = docopt(__doc__, version="0.1.0")
    print(args)
Calling with CLI:
$ python main_docopt.py -h
Project Name
Description of project

Usage:
  main_docopt.py (train|test) --n_estimator <n_estimator> [--save_loc <save_loc>] [--item <item>...] [-v]
  main_docopt.py --version

Options:
  -h --help     Show this screen.
  --version     Show version.
  -n --n_estimator <n_estimator>
                Number of estimator.
  --save_loc <save_loc>  Save location.
  --item <item>  Items.
  -v --verbose  Verbosity.
$ python main_docopt.py train --n_estimator 100 --save_loc database --item a --item b
{'--item': ['a', 'b'],
'--n_estimator': '100',
'--save_loc': 'database',
'--verbose': False,
'--version': False,
'test': False,
'train': True}
Pros
- Documented: Help messages are available to show users what arguments are available
- Succinct: No additional code is needed, documentation is translated directly
Cons
- Only string or Boolean arguments: Arguments are interpreted as strings or Boolean values. This can be solved by processing or casting the arguments to a different type (may require additional steps to validate the argument type and value)
- More arguments than necessary: Any argument indicated in the docstring examples will be reflected in the interpreted dictionary (e.g., --version may be an unnecessary key in the dictionary)
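Both cons can be handled with a small post-processing step. The sketch below works on a plain dictionary copied from the output above, so it runs without docopt installed. The helper name `cast_args` and the output keys are my own choices, and the Enum mirrors the one used in the argparse example.

```python
from enum import Enum


class ConstantsSaveLocation(Enum):
    LOCAL = "local"
    DATABASE = "database"


def cast_args(raw: dict) -> dict:
    """Cast docopt's string/Boolean values and keep only the keys the pipeline needs."""
    save_loc = raw["--save_loc"]
    return {
        "mode": "train" if raw["train"] else "test",
        "n_estimator": int(raw["--n_estimator"]),  # raises ValueError if not an integer
        "save_loc": ConstantsSaveLocation(save_loc) if save_loc else None,
        "items": list(raw["--item"]),
        "verbose": bool(raw["--verbose"]),
    }


# Same shape as the dictionary printed in the demonstration
raw_args = {
    "--item": ["a", "b"],
    "--n_estimator": "100",
    "--save_loc": "database",
    "--verbose": False,
    "--version": False,
    "test": False,
    "train": True,
}
clean = cast_args(raw_args)
print(clean["mode"], clean["n_estimator"], clean["save_loc"].value)  # train 100 database
```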
Using Typer
The newest and easiest way to pass in arguments
Developed by the same creator as FastAPI, Typer is the newest and easiest way to pass in arguments.
Note: This is not a Python standard library and you will need to perform a pip install 'typer[all]', which has internal dependencies on click and rich.
Simple Demonstration
In the simple demonstration, we write a main function in the script as usual and add one line of code, typer.run(main), to interact with CLI.
Code:
# main_typer.py
import typer


def main(train: bool, start_date: str = "2010-01-01"):
    print(train, start_date)


if __name__ == "__main__":
    typer.run(main)
Calling with CLI:
$ python main_typer.py --help
Usage: main_typer.py [OPTIONS] TRAIN
╭─ Arguments ─────────────────────────────────────────────────╮
│ * train [default: None] [required] │
╰─────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────╮
│ --start-date TEXT [default: 2010-01-01] │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────╯
$ python main_typer.py true --start-date 2023-01-01
True 2023-01-01
Advanced Demonstration
In the advanced demonstration, we will use a typer.Typer app, similar to an app in FastAPI. Subcommands, like those in argparse, can be implemented with an @app.command() decorator, which makes it very easy to use!
Code:
# main_typer2.py
import typer
from enum import Enum
from typing import List

app = typer.Typer(help="Run the training/scoring pipeline")


class ConstantsSaveLocation(Enum):
    LOCAL = "local"
    DATABASE = "database"


@app.command()
def train(n_estimators: int, start_date: str = "2010-01-01"):
    print(n_estimators, start_date)


@app.command()
def test(save_loc: ConstantsSaveLocation, items: List[str]):
    print(save_loc, items)


if __name__ == "__main__":
    app()
Calling with CLI:
$ python main_typer2.py --help
Usage: main_typer2.py [OPTIONS] COMMAND [ARGS]...
Run the training/scoring pipeline
╭─ Options ───────────────────────────────────────────────────╮
│ --install-completion Install completion for the │
│ current shell. │
│ --show-completion Show completion for the │
│ current shell, to copy it or │
│ customize the installation. │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────╯
╭─ Commands ──────────────────────────────────────────────────╮
│ test │
│ train │
╰─────────────────────────────────────────────────────────────╯
$ python main_typer2.py train --help
Usage: main_typer2.py train [OPTIONS] N_ESTIMATORS
╭─ Arguments ─────────────────────────────────────────────────╮
│ * n_estimators INTEGER [default: None] [required] │
╰─────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────╮
│ --start-date TEXT [default: 2010-01-01] │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────╯
$ python main_typer2.py train 100 --start-date 2023-01-01
100 2023-01-01
$ python main_typer2.py test local a b c
ConstantsSaveLocation.LOCAL ['a', 'b', 'c']
Other Advanced Usage
typer supports the following usage:
- Autogenerated Documentation: This requires pip install typer-cli and markdown documentation can be generated from CLI commands!
- Built-in Methods: typer.Argument(), typer.Option(), typer.prompt(), etc. are built-in Typer functions to enhance help messages, make the CLI interactive, and more
- Testing: Similar to FastAPI, Typer applications can be tested using typer.testing.CliRunner(), which makes the code more robust
It is recommended to visit the Official documentation for the most up-to-date and complete information.
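To give a feel for the built-in functions and testing, here is a minimal sketch that adds help text with typer.Argument() and typer.Option() and then calls the command through CliRunner. It assumes typer is installed, and the help strings are placeholders of my own.

```python
import typer
from typer.testing import CliRunner

app = typer.Typer()


@app.command()
def train(
    # "..." marks the argument as required
    n_estimators: int = typer.Argument(..., help="Number of estimators"),
    start_date: str = typer.Option("2010-01-01", help="Start date (YYYY-MM-DD)"),
):
    typer.echo(f"{n_estimators} {start_date}")


# With a single command, the app is invoked without a subcommand name
runner = CliRunner()
result = runner.invoke(app, ["100", "--start-date", "2023-01-01"])
print(result.exit_code)
print(result.output)
```

In a test suite, the same invoke call would typically be followed by assertions on result.exit_code and result.output.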
Pros
- Documented: Help messages are available to show users what arguments are available
- Multiple arguments and multiple data types supported: Able to handle multiple named arguments of various data types
- Succinct: Only a few lines of code need to be added to work seamlessly with existing Python functions
Cons
- Lengthy: For advanced usages, more lines of Typer-specific code need to be added which can make the code lengthy
Hope you have learned more about different ways to pass arguments to Python scripts and the pros and cons of each method. As a coder, writing user-friendly code is just as important as writing elegant and efficient code, and building CLI applications is one way to allow users or other applications to interface with your code. There are many more advanced usages available in their respective Official documentation below.
Related Links
sys.argv
- Official Documentation: https://docs.python.org/3/library/sys.html
argparse
- Official Documentation: https://docs.python.org/3/library/argparse.html
docopt
- Official Documentation: http://docopt.org/
- GitHub: https://github.com/jazzband/docopt-ng
Typer
- Official Documentation: https://typer.tiangolo.com/