Reference
=========

Package entry points
--------------------

``planktonclass.api``
   DEEPaaS-facing API layer. Handles metadata, schema generation, training dispatch, model loading, file validation, and prediction formatting.

``planktonclass.train_runfile``
   Direct training runner. Creates output directories, builds generators, trains the TensorFlow model, stores metrics, saves checkpoints, and optionally evaluates a test split.

``planktonclass.config``
   Loads the packaged default config template or a user-provided project ``config.yaml``, validates values, and exposes the flattened configuration dictionary used across the package.

``planktonclass.paths``
   Central path resolver for images, models, checkpoints, logs, stats, and predictions.

``planktonclass.report_utils``
   Generates evaluation plots and summary files in the timestamped ``results/`` directory.

``planktonclass.test_utils``
   Inference helpers for crop-based prediction and top-k accuracy computation.

``planktonclass.visualization``
   Visualization and explainability utilities, including saliency-related helpers used by the notebooks.
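
As an illustration of the top-k accuracy metric mentioned for ``planktonclass.test_utils``, the sketch below counts a sample as correct when its true label appears among the first ``k`` ranked predictions. It is not the package's implementation: ``top_k_accuracy`` and its argument names are hypothetical and use plain Python lists.

.. code-block:: python

   def top_k_accuracy(true_labels, ranked_predictions, k):
       """Return the fraction of samples whose true label is in the top k.

       ranked_predictions holds one list per sample, best guess first.
       """
       if len(true_labels) != len(ranked_predictions):
           raise ValueError("true_labels and ranked_predictions differ in length")
       if not true_labels:
           return 0.0
       hits = sum(
           1
           for truth, ranked in zip(true_labels, ranked_predictions)
           if truth in ranked[:k]
       )
       return hits / len(true_labels)


   # Three samples with two ranked guesses each (hypothetical class indices).
   truth = [0, 1, 2]
   guesses = [[0, 3], [2, 1], [0, 1]]
   print(top_k_accuracy(truth, guesses, k=1))
   print(top_k_accuracy(truth, guesses, k=2))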

Configuration map
-----------------

The runtime configuration is grouped in the active ``config.yaml`` under:

* ``general``
* ``model``
* ``pretrained``
* ``dataset``
* ``training``
* ``monitor``
* ``augmentation``
* ``testing``
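
A quick sanity check on a project file is to confirm that every group listed above is present once the YAML has been parsed. The sketch below is illustrative only: ``missing_groups`` is a hypothetical helper, not part of ``planktonclass``, and it assumes the groups appear as top-level keys of the parsed mapping.

.. code-block:: python

   # Group names taken from the list above.
   EXPECTED_GROUPS = (
       "general",
       "model",
       "pretrained",
       "dataset",
       "training",
       "monitor",
       "augmentation",
       "testing",
   )


   def missing_groups(parsed_config):
       """Return the expected group names absent from a parsed config mapping."""
       return [name for name in EXPECTED_GROUPS if name not in parsed_config]


   # A hand-written mapping standing in for a parsed config.yaml.
   example = {"general": {}, "model": {}, "training": {}}
   print(missing_groups(example))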

Important conventions
---------------------

* images are read from ``general.images_directory``
* if ``data/dataset_files/`` is empty, training can generate split files automatically from the image-folder structure
* if you provide custom split files, ``classes.txt`` and ``train.txt`` are the minimum expected files under ``data/dataset_files/``
* outputs are organized by training timestamp under ``models/<timestamp>/``
* training with test evaluation saves both prediction JSON files and a compact metrics JSON under ``models/<timestamp>/predictions/``
* inference defaults to the latest available trained timestamp
* published pretrained models are selected through ``pretrained.use_pretrained``, ``pretrained.name``, and ``pretrained.version``
* ``model.modelname`` stays the base architecture choice, while the pretrained selection identifies the published instrument-specific weights to load
* new local training runs save ``best_model.keras`` when validation is enabled and ``final_model.keras`` otherwise. The published ``FlowCam`` pretrained model currently uses ``final_model.h5``, while ``FlowCyto`` and ``PI10`` are expected to use ``best_model.keras``
* ``planktonclass report`` suggests the most recent timestamp when ``--timestamp`` is omitted and can prompt for another run by number
* ``planktonclass report`` defaults to ``quick`` mode; the subfolder threshold plots are generated only in ``full`` mode
* ``planktonclass list-models`` shows published pretrained models with their architecture, version, and checkpoint metadata when the folder name matches a published model id
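
Two of the conventions above are easy to check from a script: the minimum split files and the most recent training run. The sketch below is a rough illustration under stated assumptions, not the package's own logic. ``missing_split_files`` and ``latest_timestamp`` are hypothetical helpers, and the second one assumes the run folders under ``models/`` have names that sort chronologically as plain strings.

.. code-block:: python

   from pathlib import Path

   # Minimum files expected when custom split files are provided.
   REQUIRED_SPLIT_FILES = ("classes.txt", "train.txt")


   def missing_split_files(dataset_files_dir):
       """Return the required split files that are absent from the directory."""
       root = Path(dataset_files_dir)
       return [name for name in REQUIRED_SPLIT_FILES if not (root / name).is_file()]


   def latest_timestamp(models_dir):
       """Return the newest run folder name under models_dir, or None.

       Assumption: folder names sort chronologically as strings.
       """
       root = Path(models_dir)
       if not root.is_dir():
           return None
       runs = sorted(entry.name for entry in root.iterdir() if entry.is_dir())
       return runs[-1] if runs else None

An empty ``data/dataset_files/`` directory reports both files as missing, which is still a valid state because training can then generate the split files automatically.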

Practical usage after a model is created
----------------------------------------

Once a model has been trained through the command line, the API, or a notebook workflow, you can also work with it directly from Python.

Typical tasks include:

* load a project config
* load a trained model from a specific timestamp
* predict one image from Python
* call a Dockerized inference server from Python
* inspect where the package is writing model outputs

Load the project config
-----------------------

.. code-block:: python

   from planktonclass import config

   config.set_config_path("my_project/config.yaml")
   conf = config.get_conf_dict()

Load a trained model
--------------------

.. code-block:: python

   from planktonclass import config, paths
   from planktonclass.api import load_inference_model

   config.set_config_path("my_project/config.yaml")
   paths.CONF = config.get_conf_dict()

   load_inference_model(
       timestamp="2026-03-26_120000",
       ckpt_name="best_model.keras",
   )

Predict one image from Python
-----------------------------

.. code-block:: python

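   from planktonclass import api, config, paths, test_utils

   # Point the package at the project config and refresh the path resolver.
   config.set_config_path("my_project/config.yaml")
   paths.CONF = config.get_conf_dict()

   # Called without arguments, so the default model selection applies.
   api.load_inference_model()
   # Read the active configuration after the model has been loaded.
   conf = config.conf_dict

   # Request the 5 most likely classes for a single local image.
   labels, probabilities = test_utils.predict(
       model=api.model,
       X=["/absolute/path/to/image.png"],
       conf=conf,
       top_K=5,
       filemode="local",
       merge=False,
   )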
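
The return types of ``test_utils.predict`` are not described in this section. Assuming ``labels`` and ``probabilities`` are parallel sequences with one row of ``top_K`` entries per input image, a small helper can pair them up for printing or logging. ``pair_predictions`` below is a hypothetical helper written for this illustration, not a package function.

.. code-block:: python

   def pair_predictions(labels, probabilities):
       """Pair each predicted label with its probability, one list per image.

       Assumption: both inputs hold one row per image, in matching order.
       """
       rows = []
       for image_labels, image_probs in zip(labels, probabilities):
           rows.append(
               [(label, float(prob)) for label, prob in zip(image_labels, image_probs)]
           )
       return rows


   # Hand-written values standing in for the output of one prediction call.
   example_labels = [[3, 1]]
   example_probabilities = [[0.7, 0.2]]
   for label, prob in pair_predictions(example_labels, example_probabilities)[0]:
       print(f"{label}: {prob:.2f}")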

Use a Dockerized inference server from Python
---------------------------------------------

After you have trained a model, reviewed the report, and packaged the run with ``planktonclass docker my_project``, you can talk to the running API from Python with ``requests``.

Start the container, for example:

.. code-block:: bash

   docker run -d -p 5001:5000 --name my-plankton-api my-plankton-api:latest

Then from Python:

.. code-block:: python

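   from pathlib import Path

   import requests

   # Host port 5001 is mapped to the container port in the docker run command.
   base_url = "http://127.0.0.1:5001"
   health_url = f"{base_url}/api"
   swagger_url = f"{base_url}/swagger.json"
   predict_url = f"{base_url}/v2/models/planktonclass/predict/"

   # Print the HTTP status codes to confirm the API is reachable.
   print(requests.get(health_url, timeout=5).status_code)
   print(requests.get(swagger_url, timeout=5).status_code)

   # Upload one image as multipart form data.
   image_path = Path("example.jpg")
   with image_path.open("rb") as handle:
       response = requests.post(
           predict_url,
           files={"image": (image_path.name, handle, "image/jpeg")},
           timeout=(10, 240),  # (connect, read) timeouts in seconds
       )
   # Raise an exception on 4xx/5xx responses before parsing the payload.
   response.raise_for_status()
   print(response.json())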

This is the same kind of pattern a downstream script can use to:

* ensure the containerized API is available
* inspect ``/swagger.json``
* upload an image for prediction
* parse the returned JSON payload
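
The first step in that list, ensuring the API is available, usually needs a short retry loop because the container takes a moment to start accepting requests. The sketch below is one possible approach under stated assumptions: ``wait_until_ready`` is a hypothetical helper, and the probe is passed in as a callable so the loop does not depend on any particular HTTP client.

.. code-block:: python

   import time


   def wait_until_ready(probe, attempts=10, delay=1.0):
       """Call probe() until it returns True; give up after `attempts` tries.

       Exceptions raised by the probe (for example, connection errors while
       the container is still starting) count as "not ready yet".
       """
       for attempt in range(attempts):
           try:
               if probe():
                   return True
           except Exception:
               pass
           # Sleep between tries, but not after the final one.
           if attempt < attempts - 1:
               time.sleep(delay)
       return False

With ``requests``, the probe could be ``lambda: requests.get(health_url, timeout=5).ok``, reusing the ``health_url`` defined in the example above.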

Inspect output locations
------------------------

.. code-block:: python

   from planktonclass import config, paths

   config.set_config_path("my_project/config.yaml")
   paths.CONF = config.get_conf_dict()

   print(paths.get_models_dir())
   print(paths.get_checkpoints_dir())
   print(paths.get_logs_dir())
   print(paths.get_predictions_dir())

Source files
------------

For the implementation details, start with these files in the repository:

* ``planktonclass/api.py``
* ``planktonclass/train_runfile.py``
* ``planktonclass/config.py``
* ``planktonclass/paths.py``
* ``planktonclass/test_utils.py``