Reference

Package entry points

planktonclass.api

DEEPaaS-facing API layer. Handles metadata, schema generation, training dispatch, model loading, file validation, and prediction formatting.

planktonclass.train_runfile

Direct training runner. Creates output directories, builds generators, trains the TensorFlow model, stores metrics, saves checkpoints, and optionally evaluates a test split.

planktonclass.config

Loads the packaged default config template or a user-provided project config.yaml, validates values, and exposes the flattened configuration dictionary used across the package.

planktonclass.paths

Central path resolver for images, models, checkpoints, logs, stats, and predictions.

planktonclass.report_utils

Generates evaluation plots and summary files in the timestamped results/ directory.

planktonclass.test_utils

Inference helpers for crop-based prediction and top-k accuracy computation.

planktonclass.visualization

Visualization and explainability utilities, including saliency-related helpers used by the notebooks.

Configuration map

The runtime configuration is grouped in the active config.yaml under:

  • general

  • model

  • pretrained

  • dataset

  • training

  • monitor

  • augmentation

  • testing

Important conventions

  • images are read from general.images_directory

  • if data/dataset_files/ is empty, training can generate split files automatically from the image-folder structure

  • if you provide custom split files, classes.txt and train.txt are the minimum expected files under data/dataset_files/

  • outputs are organized by training timestamp under models/<timestamp>/

  • training with test evaluation saves both prediction JSON files and a compact metrics JSON under models/<timestamp>/predictions/

  • inference defaults to the latest available trained timestamp

  • published pretrained models are selected through pretrained.use_pretrained, pretrained.name, and pretrained.version

  • model.modelname stays the base architecture choice, while the pretrained selection identifies the published instrument-specific weights to load

  • new local training runs save best_model.keras when validation is enabled; otherwise they save final_model.keras. The published FlowCam pretrained model currently uses final_model.h5 while FlowCyto and PI10 are expected to use best_model.keras

  • planktonclass report suggests the most recent timestamp when --timestamp is omitted and can prompt for another run by number

  • planktonclass report defaults to quick mode and only generates the subfolder threshold plots in full mode

  • planktonclass list-models shows published pretrained models with their architecture, version, and checkpoint metadata when the folder name matches a published model id

Practical usage after a model is created

Once a model has been trained through the command-line, API, or notebook workflow, you can also interact with it directly from Python.

Typical things you may want to do are:

  • load a project config

  • load a trained model from a specific timestamp

  • predict one image from Python

  • call a Dockerized inference server from Python

  • inspect where the package is writing model outputs

Load the project config

from planktonclass import config

config.set_config_path("my_project/config.yaml")
conf = config.get_conf_dict()

Load a trained model

from planktonclass import config, paths
from planktonclass.api import load_inference_model

config.set_config_path("my_project/config.yaml")
paths.CONF = config.get_conf_dict()

load_inference_model(
    timestamp="2026-03-26_120000",
    ckpt_name="best_model.keras",
)

Predict one image from Python

from planktonclass import config, paths, api, test_utils

config.set_config_path("my_project/config.yaml")
paths.CONF = config.get_conf_dict()

api.load_inference_model()
conf = config.conf_dict

labels, probabilities = test_utils.predict(
    model=api.model,
    X=["/absolute/path/to/image.png"],
    conf=conf,
    top_K=5,
    filemode="local",
    merge=False,
)

Use a Dockerized inference server from Python

After you have trained a model, reviewed the report, and packaged the run with planktonclass docker my_project, you can talk to the running API from Python with requests.

Start the container, for example:

docker run -d -p 5001:5000 --name my-plankton-api my-plankton-api:latest

Then from Python:

from pathlib import Path

import requests

base_url = "http://127.0.0.1:5001"
health_url = f"{base_url}/api"
swagger_url = f"{base_url}/swagger.json"
predict_url = f"{base_url}/v2/models/planktonclass/predict/"

print(requests.get(health_url, timeout=5).status_code)
print(requests.get(swagger_url, timeout=5).status_code)

image_path = Path("example.jpg")
with image_path.open("rb") as handle:
    response = requests.post(
        predict_url,
        files={"image": (image_path.name, handle, "image/jpeg")},
        timeout=(10, 240),
    )
response.raise_for_status()
print(response.json())

This is the same kind of pattern a downstream script can use to:

  • ensure the containerized API is available

  • inspect /swagger.json

  • upload an image for prediction

  • parse the returned JSON payload

Inspect output locations

from planktonclass import config, paths

config.set_config_path("my_project/config.yaml")
paths.CONF = config.get_conf_dict()

print(paths.get_models_dir())
print(paths.get_checkpoints_dir())
print(paths.get_logs_dir())
print(paths.get_predictions_dir())

Source files

For the implementation details, start with these files in the repository:

  • planktonclass/api.py

  • planktonclass/train_runfile.py

  • planktonclass/config.py

  • planktonclass/paths.py

  • planktonclass/test_utils.py