iss_preprocess.pipeline package

Submodules

iss_preprocess.pipeline.ara_registration module

iss_preprocess.pipeline.ara_registration.check_reg(data_path, save_folder, rois=None, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

iss_preprocess.pipeline.ara_registration.crop_overview_registration(data_path, rois=None, overview_prefix='DAPI_1_1')

Crop the registered overview to the same size as the reference

Parameters:
  • data_path (str) – Relative path to data

  • rois (list, optional) – List of rois to crop. Defaults to None.

  • overview_prefix (str, optional) – Prefix of the overview image. Defaults to “DAPI_1_1”.

Returns:

List of cropped images

Return type:

imgs (list)

iss_preprocess.pipeline.ara_registration.find_roi_position_on_cryostat(data_path)

Find the A/P position of each ROI relative to the first collected slice

The section order is guessed from the sign of section_thickness_um: positive for antero-posterior slicing (starting from the olfactory bulb), negative for the opposite direction.

Parameters:

data_path (str) – Relative path to the data

Returns:

roi_slice_pos_um (dict): For each ROI, the slice depth in µm relative to the first collected slice

min_step (float): Minimum thickness between two slices

Return type:

tuple
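A minimal sketch of the position bookkeeping, assuming each ROI's section index (in collection order) is already known. The roi_slice_pos_um naming follows the docstring; the roi_to_section input and the helper itself are hypothetical, not part of iss_preprocess.

```python
import numpy as np


def roi_positions_from_sections(roi_to_section, section_thickness_um):
    """Hypothetical sketch: depth of each ROI relative to the first collected slice.

    roi_to_section: ROI -> section index in collection order (assumed input).
    A positive thickness (antero-posterior slicing) makes depth grow with the
    section index; a negative thickness flips the sign.
    """
    first = min(roi_to_section.values())
    roi_slice_pos_um = {
        roi: (section - first) * section_thickness_um
        for roi, section in roi_to_section.items()
    }
    # Smallest gap between distinct slice positions (np.unique returns them sorted)
    positions = np.unique(list(roi_slice_pos_um.values()))
    min_step = float(np.diff(positions).min()) if len(positions) > 1 else float("nan")
    return roi_slice_pos_um, min_step
```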

iss_preprocess.pipeline.ara_registration.load_coordinate_image(data_path, roi, full_scale=False, registered=True, return_fname=False)

Load the 3-channel image of ARA coordinates for one ROI

The reference atlas is first registered to a downsampled version of the overview; the overview is then registered to the normal acquisition. The coordinates of the overview itself can be loaded with registered=False.

Parameters:
  • data_path (str) – Relative path to data

  • roi (int) – Number of the ROI

  • full_scale (bool, optional) – If True, return the full-scale image; otherwise return the downsampled version used for registration. Defaults to False.

  • registered (bool, optional) – If True, load the registered coordinates, otherwise the coordinates of the overview, before shifting/cropping. Defaults to True.

  • return_fname (bool, optional) – If True, return the filename of the image. Defaults to False.

Returns:

3 channel image of ARA coordinates

Return type:

coords (np.ndarray)

iss_preprocess.pipeline.ara_registration.load_registration_reference(data_path, roi)

Load the registration reference image of one ROI

This is the downsampled version of the overview image used for registration.

Parameters:
  • data_path (str) – Relative path to data

  • roi (int) – Number of the ROI

Returns:

Registration reference image

Return type:

ref (np.ndarray)

iss_preprocess.pipeline.ara_registration.load_registration_reference_metadata(data_path, roi)

Load metadata file associated with registration reference of one ROI

This is the “registration_reference_r{roi}_sl{slice_number}.yml” file that contains shape and downsampling info.

Parameters:
  • data_path (str) – Relative path to data

  • roi (int) – Number of the ROI

Returns:

Content of the metadata yml file

Return type:

metadata (dict)

iss_preprocess.pipeline.ara_registration.make_area_image(data_path, roi, atlas_size=10, full_scale=False, reload=True, registered=True)

Generate an image with area ID in each pixel

Parameters:
  • data_path (str) – Relative path to data

  • roi (int) – Number of the ROI to generate

  • atlas_size (int, optional) – Pixel size of the atlas used to find area IDs. Defaults to 10.

  • full_scale (bool, optional) – If True, return the full-scale image; otherwise return the downsampled version used for registration. Defaults to False.

  • reload (bool, optional) – If True, reload the area image, otherwise recompute it. Valid only if full_scale is False. Defaults to True.

  • registered (bool, optional) – If True, load the registered coordinates, otherwise the coordinates of the overview, before shifting/cropping. Defaults to True.

Returns:

Image with the area ID of each pixel

Return type:

area_id (np.array)
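How a coordinate image could be turned into an area image, as a minimal sketch: coordinates are converted to voxel indices at atlas_size and looked up in an annotation volume. The channel-to-axis order, the micrometre units and the use of 0 for "outside" are assumptions; loading the actual atlas annotation is not shown, and the helper is hypothetical.

```python
import numpy as np


def area_image_from_coords(coords_um, annotation, atlas_size=10):
    """Hypothetical sketch: area ID of each pixel from a 3-channel coordinate image.

    coords_um: (H, W, 3) atlas coordinates in micrometres. The channel order is
        assumed to match the axis order of ``annotation``.
    annotation: 3D integer array of area IDs sampled at ``atlas_size`` um per voxel.
    Pixels with non-finite or out-of-volume coordinates get ID 0.
    """
    finite = np.all(np.isfinite(coords_um), axis=-1)
    idx = np.zeros(coords_um.shape, dtype=int)
    # Convert micrometres to voxel indices at the atlas resolution
    idx[finite] = np.round(coords_um[finite] / atlas_size).astype(int)
    in_volume = np.all((idx >= 0) & (idx < np.array(annotation.shape)), axis=-1)
    inside = finite & in_volume
    area_id = np.zeros(coords_um.shape[:2], dtype=annotation.dtype)
    i, j, k = idx[inside].T
    area_id[inside] = annotation[i, j, k]
    return area_id
```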

iss_preprocess.pipeline.ara_registration.overview_single_roi(data_path, roi, slice_id, prefix, chan2use=(0, 1, 2, 3), sigma_blur=10, agg_func=<function nanmean>, ref_prefix='genes_round', subresolutions=5, max_pixel_size=2, non_similar_overview=False)

Stitch and save a single ROI overview for use in atlas registration

Parameters:
  • data_path (str) – Relative path to data

  • roi (int) – Number of the ROI

  • slice_id (int) – Slice number to stitch

  • prefix (str) – Prefix of the acquisition to use for the overview.

  • chan2use (tuple, optional) – Channels to use for stitching. Defaults to (0, 1, 2, 3).

  • sigma_blur (int, optional) – Sigma for the Gaussian blur. Defaults to 10.

  • agg_func (function, optional) – Aggregation function to apply across channels. Defaults to np.nanmean. Unused if non_similar_overview is True.

  • ref_prefix (str, optional) – Prefix of the reference image. Defaults to “genes_round”.

  • subresolutions (int, optional) – Number of subresolutions to save. Defaults to 5.

  • max_pixel_size (int, optional) – Maximum pixel size for the pyramid. Defaults to 2.

  • non_similar_overview (bool, optional) – If True, stitch the overview tiles with the stitch_tiles function rather than stitch_registered, which requires tile-by-tile registration to the reference. Defaults to False.
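To illustrate what agg_func, chan2use and subresolutions control, here is a minimal NumPy sketch that collapses the selected channels and builds a pyramid of 2x-downsampled levels. The factor of 2 between levels is an assumption, the Gaussian blur (sigma_blur) is omitted, and both helpers are hypothetical.

```python
import numpy as np


def aggregate_channels(stack, chan2use=(0, 1, 2, 3), agg_func=np.nanmean):
    """Collapse the selected channels of an (H, W, Nch) stack into one 2D image."""
    return agg_func(stack[:, :, list(chan2use)], axis=2)


def build_pyramid(img, subresolutions=5):
    """Return [full, 1/2, 1/4, ...] levels by averaging 2x2 blocks (odd edges cropped)."""
    levels = [img]
    for _ in range(subresolutions):
        prev = levels[-1]
        h, w = (prev.shape[0] // 2) * 2, (prev.shape[1] // 2) * 2
        if h == 0 or w == 0:
            break  # image too small to downsample further
        prev = prev[:h, :w]
        levels.append(prev.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return levels
```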

iss_preprocess.pipeline.ara_registration.register_overview_to_reference(data_path, roi, channel, overview_prefix='DAPI_1_1', *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Register the overview to the reference image

Parameters:
  • data_path (str) – Relative path to data

  • roi (int) – Number of the ROI

  • channel (int) – Channel to use for registration.

  • downsample (int, optional) – Downsample factor. Defaults to 3.

  • overview_prefix (str, optional) – Prefix of the overview image. Defaults to “DAPI_1_1”.

Returns:

shift (np.ndarray): Shift in x and y

final_shape (tuple): Final shape of the stitched images

stitched_fixed (np.ndarray): Stitched reference image

stitched_moving (np.ndarray): Stitched overview image

Return type:

tuple
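One common way to estimate the translation between an overview and a reference image is phase correlation, sketched below with NumPy FFTs. It is a swapped-in illustration, not necessarily the method register_overview_to_reference uses, and it only recovers integer, circular translations (no stitching, cropping or subpixel refinement).

```python
import numpy as np


def phase_correlation_shift(fixed, moving):
    """Estimate the integer (row, col) translation between two same-sized images.

    Returns the shift that, applied to ``moving`` with np.roll, aligns it with ``fixed``.
    """
    cross_power = np.fft.fft2(fixed) * np.conj(np.fft.fft2(moving))
    cross_power /= np.abs(cross_power) + 1e-12  # keep the phase only
    corr = np.fft.ifft2(cross_power).real
    peak = np.array(np.unravel_index(np.argmax(corr), corr.shape))
    shape = np.array(corr.shape)
    # Peaks past the midpoint correspond to negative shifts (circular wrap-around)
    wrap = peak > shape // 2
    peak[wrap] -= shape[wrap]
    return peak
```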

iss_preprocess.pipeline.ara_registration.spots_ara_infos(data_path, spots, roi, atlas_size=10, acronyms=True, inplace=True, full_scale_coordinates=False, reload=True, verbose=True)

Add ARA coordinates and area ID to spots dataframe

Parameters:
  • data_path (str) – Relative path to data

  • spots (pd.DataFrame) – Spots dataframe

  • roi (int) – Number of the ROI

  • atlas_size (int, optional) – Pixel size of the atlas (10, 25 or 50) used to find area borders. Defaults to 10.

  • acronyms (bool, optional) – Add an acronym column with area names. Defaults to True.

  • inplace (bool, optional) – If True, add the columns to spots in place; otherwise return a copy. Defaults to True.

  • full_scale_coordinates (bool, optional) – If True, use the full-scale image to find coordinates; otherwise use the downsampled version used for registration. Defaults to False.

  • reload (bool, optional) – If True, reload the area image; otherwise recompute it. Valid only if full_scale_coordinates is False. Defaults to True.

  • verbose (bool, optional) – Print progress. Defaults to True.

Returns:

Reference to, or copy of, the spots dataframe with four additional columns: ara_x, ara_y, ara_z and area_id (plus an acronym column if acronyms is True)

Return type:

spots (pd.DataFrame)
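The per-spot coordinate lookup can be sketched as indexing the 3-channel coordinate image at each spot's pixel position. This assumes the channels are ordered (ara_x, ara_y, ara_z), that spot x maps to image columns and y to rows, and that positions are divided by an integer downsample factor when the downsampled registration image is used; the helper is hypothetical. The returned arrays could then be assigned as columns of the spots dataframe.

```python
import numpy as np


def sample_coordinates_at_spots(coords, spot_x, spot_y, downsample=1):
    """Read ARA coordinates for spots from an (H, W, 3) coordinate image.

    spot_x / spot_y are full-resolution pixel positions. When ``coords`` is a
    downsampled image, positions are divided by ``downsample`` and rounded to the
    nearest pixel; out-of-range positions are clipped to the image border.
    """
    col = np.round(np.asarray(spot_x, dtype=float) / downsample).astype(int)
    row = np.round(np.asarray(spot_y, dtype=float) / downsample).astype(int)
    col = np.clip(col, 0, coords.shape[1] - 1)
    row = np.clip(row, 0, coords.shape[0] - 1)
    values = coords[row, col]  # (Nspots, 3)
    return {"ara_x": values[:, 0], "ara_y": values[:, 1], "ara_z": values[:, 2]}
```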

iss_preprocess.pipeline.hybridisation module

iss_preprocess.pipeline.hybridisation.estimate_channel_correction_hybridisation(data_path, prefix=None, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Compute grayscale value distribution and normalisation factors for all hybridisation rounds.

Each tile listed in ops[“correction_tiles”] is filtered before being used to compute the distribution of pixel values. Normalisation factors that equalise these distributions across channels and rounds are defined as the ops[“correction_quantile”] quantile of each distribution.

Parameters:
  • data_path (str or Path) – Relative path to the data folder

  • prefix (list, optional) – List of prefixes of the hybridisation rounds to process. If None, all hybridisation rounds are processed. Defaults to None.

Returns:

pixel_dist (np.array): A 65536 x Nch x Nrounds distribution of grayscale values for filtered stacks

norm_factors (np.array): An Nch x Nrounds array of normalisation factors

Return type:

tuple
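A minimal sketch of the distribution and quantile step for a single round, assuming uint16 images (hence 65536 bins). Running it per round and stacking the results would give the Nch x Nrounds layout described above; filtering of the correction tiles is not shown, and both helpers are hypothetical.

```python
import numpy as np


def pixel_distribution(stack):
    """65536-bin histogram of grayscale values per channel of an (X, Y, Nch) uint16 stack."""
    n_ch = stack.shape[2]
    dist = np.zeros((65536, n_ch), dtype=np.int64)
    for ch in range(n_ch):
        dist[:, ch] = np.bincount(stack[:, :, ch].ravel(), minlength=65536)
    return dist


def quantile_from_distribution(dist, quantile):
    """Grayscale value at which each channel's cumulative distribution reaches ``quantile``."""
    cdf = np.cumsum(dist, axis=0) / dist.sum(axis=0)
    return np.argmax(cdf >= quantile, axis=0)
```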

iss_preprocess.pipeline.hybridisation.extract_hyb_spots_all(data_path)

Start sbatch jobs to detect hybridisation spots for each hybridisation round and ROI.

Parameters:

data_path (str) – Relative path to data.

iss_preprocess.pipeline.hybridisation.extract_hyb_spots_roi(data_path, prefix, roi)

Detect hybridisation spots for a given hybridisation round and ROI.

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str) – Prefix of the hybridisation round, e.g. “hybridisation_1_1”.

  • roi (int) – ID of the ROI to process, as specified in MicroManager (i.e. 1-based)

iss_preprocess.pipeline.hybridisation.extract_hyb_spots_tile(data_path, tile_coors, prefix, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Detect hybridisation spots for a given tile.

Parameters:
  • data_path (str) – Relative path to data.

  • tile_coors (tuple) – Coordinates of tile to load: ROI, Xpos, Ypos.

  • prefix (str) – Prefix of the hybridisation round, e.g. “hybridisation_1_1”.

iss_preprocess.pipeline.hybridisation.hyb_spot_cluster_means(data_path, prefix)

Estimate bleedthrough matrices for hybridisation spots. Spot colors for each dye are initialized based on the metadata in the hybridisation probe list.

Uses tiles specified in ops[“barcode_ref_tiles”].

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str) – Prefix of hybridisation round, e.g. “hybridisation_1_1”.

Returns:

numpy.ndarray: Nprobes x Nch bleedthrough matrix

pandas.DataFrame: DataFrame of all detected spots across all tiles

list: List of gene names based on probe metadata

Return type:

tuple
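One plausible way to refine dye templates starting from probe metadata is a spherical k-means on normalised spot colours, sketched below. This is an assumption about the approach, not a description of hyb_spot_cluster_means; the input layout (Nspots x Nch colours, Nprobes x Nch initial templates) and the helper are hypothetical, and spots with all-zero colours would need to be removed first.

```python
import numpy as np


def cluster_means(spot_colors, init_means, n_iter=10):
    """Iteratively refine per-dye colour templates (rows of a bleedthrough matrix).

    spot_colors: (Nspots, Nch) spot intensities.
    init_means: (Nprobes, Nch) initial templates, e.g. built from probe metadata.
    Each spot is assigned to the template with the highest cosine similarity, and
    each template becomes the normalised mean of its assigned spots.
    """
    x = spot_colors / np.linalg.norm(spot_colors, axis=1, keepdims=True)
    means = init_means / np.linalg.norm(init_means, axis=1, keepdims=True)
    for _ in range(n_iter):
        labels = np.argmax(x @ means.T, axis=1)
        for k in range(means.shape[0]):
            if np.any(labels == k):  # leave templates without spots unchanged
                m = x[labels == k].mean(axis=0)
                means[k] = m / np.linalg.norm(m)
    labels = np.argmax(x @ means.T, axis=1)  # final assignment to refined templates
    return means, labels
```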

iss_preprocess.pipeline.hybridisation.load_and_register_hyb_tile(data_path, tile_coors=(1, 0, 0), prefix='hybridisation_1_1', suffix='max', filter_r=(2, 4), correct_illumination=False, correct_channels=False, corrected_shifts='best')

Load hybridisation tile and align channels. Optionally, filter, correct illumination and channel brightness.

Parameters:
  • data_path (str) – Relative path to data.

  • tile_coors (tuple, optional) – Coordinates of tile to load: ROI, Xpos, Ypos. Defaults to (1, 0, 0).

  • prefix (str, optional) – Prefix of the hybridisation round. Defaults to “hybridisation_1_1”.

  • suffix (str, optional) – Filename suffix corresponding to the z-projection to use. Defaults to “max”.

  • filter_r (tuple, optional) – Inner and outer radius for the Hanning filter. If False, stack is not filtered. Defaults to (2, 4).

  • correct_illumination (bool, optional) – Whether to correct vignetting. Defaults to False.

  • correct_channels (bool, optional) – Whether to normalize channel brightness. Defaults to False.

  • corrected_shifts (str, optional) – Which shift to use. One of “reference”, “single_tile”, “ransac” or “best”. Defaults to “best”.

Returns:

numpy.ndarray: X x Y x Nch image stack

numpy.ndarray: X x Y boolean mask identifying bad pixels that were not imaged in all channels (due to registration offsets) and should be discarded during analysis

Return type:

tuple
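The bad-pixel mask can be understood with integer translations alone: each channel is shifted with zero fill, and any pixel that did not receive data in every channel is flagged. This minimal sketch ignores the rotations and subpixel shifts a real registration may apply and assumes each shift is smaller than the image; the helpers are hypothetical.

```python
import numpy as np


def shift_with_valid_mask(img, dy, dx):
    """Integer-shift a 2D image with zero fill; also return which pixels received data.

    Assumes abs(dy) < height and abs(dx) < width.
    """
    h, w = img.shape
    out = np.zeros_like(img)
    valid = np.zeros((h, w), dtype=bool)
    dst = (slice(max(dy, 0), h + min(dy, 0)), slice(max(dx, 0), w + min(dx, 0)))
    src = (slice(max(-dy, 0), h + min(-dy, 0)), slice(max(-dx, 0), w + min(-dx, 0)))
    out[dst] = img[src]
    valid[dst] = True
    return out, valid


def register_channels(stack, shifts):
    """Apply one (dy, dx) shift per channel; flag pixels not imaged in every channel."""
    registered, valids = [], []
    for ch, (dy, dx) in enumerate(shifts):
        out, valid = shift_with_valid_mask(stack[:, :, ch], dy, dx)
        registered.append(out)
        valids.append(valid)
    bad_pixels = ~np.all(np.stack(valids, axis=2), axis=2)
    return np.stack(registered, axis=2), bad_pixels
```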

iss_preprocess.pipeline.hybridisation.setup_hyb_spot_calling(data_path, prefix=None, vis=True, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Prepare and save bleedthrough matrices for hybridisation rounds.

Parameters:
  • data_path (str) – Relative path to data

  • prefix (list, optional) – List of prefixes of the hybridisation rounds to process. If None, all hybridisation rounds are processed. Defaults to None.

  • vis (bool, optional) – Whether to generate diagnostic plots. Defaults to True.

iss_preprocess.pipeline.pipeline module

iss_preprocess.pipeline.pipeline.call_spots(data_path, genes=True, barcodes=True, hybridisation=True, force_redo=False, setup_only=False, use_slurm=True)

Master method to run spot calling.

Must be run after iss project-and-average and iss register.

Parameters:
  • data_path (str) – Relative path to the data folder

  • genes (bool, optional) – Run genes spot calling. Defaults to True.

  • barcodes (bool, optional) – Run barcode calling. Defaults to True.

  • hybridisation (bool, optional) – Run hybridisation spot calling. Defaults to True.

  • force_redo (bool, optional) – Redo all processing steps? Defaults to False.

  • setup_only (bool, optional) – Only set up spot calling; do not run it. Defaults to False.

  • use_slurm (bool, optional) – Whether to use SLURM to run the jobs. Defaults to True.

iss_preprocess.pipeline.pipeline.correct_shifts(data_path, prefix, use_slurm=True, job_dependency=None)

Correct X-Y shifts using robust regression across tiles.

iss_preprocess.pipeline.pipeline.create_all_single_averages(data_path, n_batch, todo=('genes_rounds', 'barcode_rounds', 'fluorescence', 'hybridisation'), to_average=None, dependency=None, use_slurm=True, force_redo=False)

Average all tiffs in each folder and then all folders by acquisition type

Parameters:
  • data_path (str) – Path to data, relative to project.

  • n_batch (int) – Number of batches to average before taking their median. If None, will do as many batches as images.

  • todo (tuple) – Types of acquisition to process. Defaults to (“genes_rounds”, “barcode_rounds”, “fluorescence”, “hybridisation”). Ignored if to_average is not None.

  • to_average (list, optional) – List of folders to average. If None, will average all folders listed in metadata. Defaults to None.

  • dependency (list, optional) – List of job IDs to wait for before starting the current job. Defaults to None.

  • use_slurm (bool, optional) – Submit jobs to slurm. Defaults to True.

  • force_redo (bool, optional) – Redo if the average already exists. Defaults to False.

iss_preprocess.pipeline.pipeline.create_grand_averages(data_path, prefix_todo=('genes_round', 'barcode_round', ''), suffix_todo=('max', 'median'), n_batch=None, dependency=None, use_slurm=True, force_redo=False)

Average single acquisition averages into grand average

Parameters:
  • data_path (str) – Path to the folder, relative to projects folder

  • suffix (str) – Projection suffix to filter tifs. Defaults to None.

  • prefix_todo (tuple, optional) – List of str, names of the tifs to average. An empty string will average all tifs. Defaults to (“genes_round”, “barcode_round”, “”).

  • suffix_todo (tuple, optional) – Suffixes (str) used to filter tifs. Defaults to (‘max’, ‘median’).

  • n_batch (int, optional) – Number of batches to average before taking their median. If None, will do as many batches as images. Defaults to None.

  • dependency (list, optional) – List of job IDs to wait for before starting the current job. Defaults to None.

  • use_slurm (bool, optional) – Submit jobs to slurm. Defaults to True.

  • force_redo (bool, optional) – Redo if the average already exists. Defaults to False.

iss_preprocess.pipeline.pipeline.create_single_average(data_path, subfolder, subtract_black, n_batch, prefix_filter=None, suffix=None, target_fname=None, combine_tilestats=False, exclude_tiffs=None, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Create normalised average of all tifs in a single folder.

If prefix_filter is not None, the output will be “{prefix_filter}_average.tif”; otherwise it will be “{folder_path.name}_average.tif”.

Other arguments are read from ops:

  • average_clip_value – Value to clip images before averaging.

  • normalise – Normalise output maximum to one.

Parameters:
  • data_path (str) – Path to the acquisition folder, relative to projects folder

  • subfolder (str) – subfolder in folder_path containing the tifs to average.

  • subtract_black (bool) – Subtract black level (read from ops)

  • n_batch (int) – Number of batches to average before taking their median. If None, will do as many batches as images.

  • prefix_filter (str, optional) – Prefix used to filter tifs. Only files starting with the prefix will be averaged. Defaults to None.

  • suffix (str, optional) – Suffix used to filter tifs. Defaults to None.

  • target_fname (str, optional) – Target file name to save the average. Defaults to None.

  • combine_tilestats (bool, optional) – If True, compute a new tilestats distribution from the averaged images; otherwise combine the pre-existing tilestats into one. Defaults to False.

  • exclude_tiffs (list, optional) – List of str filters used to exclude tiffs from the average.

Returns:

np.array: Average image

np.array: Distribution of pixel values

Return type:

tuple
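A minimal sketch of averaging within batches followed by a median across batches, with optional clipping and max-normalisation standing in for the ops entries average_clip_value and normalise. Treating the clip value as an upper bound is an assumption, n_batch must not exceed the number of images, black-level subtraction and tilestats are omitted, and the helper is hypothetical.

```python
import numpy as np


def batched_median_average(images, n_batch=None, clip_value=None, normalise=True):
    """Mean within batches, then per-pixel median across the batch means.

    images: (Nimages, H, W). n_batch=None uses one image per batch, which reduces
    to a plain per-pixel median. n_batch must not exceed Nimages.
    """
    images = np.asarray(images, dtype=float)
    if clip_value is not None:
        images = np.clip(images, None, clip_value)  # assumed: clip from above
    n_batch = len(images) if n_batch is None else n_batch
    batch_means = [batch.mean(axis=0) for batch in np.array_split(images, n_batch)]
    average = np.median(np.stack(batch_means), axis=0)
    if normalise:
        average = average / average.max()  # scale so the maximum is one
    return average
```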

iss_preprocess.pipeline.pipeline.overview_for_ara_registration(data_path, prefix, rois_to_do=None, sigma_blur=10, ref_prefix='genes_round', non_similar_overview=False)

Generate a stitched overview for registering to the ARA

ABBA requires a pyramidal OME-TIFF with resolution information. This function generates such stitched files and saves them together with a YAML log file describing the downsampling.

Parameters:
  • data_path (str) – Relative path to the data folder

  • prefix (str) – Acquisition to use for the overview e.g. genes_round_1_1

  • rois_to_do (list, optional) – ROIs to process. If None (default), process all ROIs.

  • sigma_blur (float, optional) – Sigma of the Gaussian filter, in downsampled pixel size. Defaults to 10.

  • ref_prefix (str, optional) – Prefix of the reference coordinates. Defaults to “genes_round”.

iss_preprocess.pipeline.pipeline.project_and_average(data_path, force_redo=False, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Project and average all available data then create plots.

Creates a list of expected acquisition folders from metadata. Checks for the existence of these folders in the raw data and determines the completion status of each acquisition type. Runs projection on unprojected data and reprojects failed tiles. Creates averages of the projections and then plots overview images.

Parameters:
  • data_path (str) – Relative path to data.

  • force_redo (bool, optional) – Redo all processing steps? Defaults to False.

Returns:

A list of job IDs for the slurm jobs created.

Return type:

po_job_ids (list)

iss_preprocess.pipeline.pipeline.register_acquisition(data_path, prefix, force_redo=False, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Register an acquisition across all rounds and channels

Parameters:
  • data_path (str) – Relative path to the data folder

  • prefix (str) – Prefix of the acquisition to register

  • force_redo (bool, optional) – Redo if files exist. Defaults to False.

iss_preprocess.pipeline.pipeline.register_reference_tile(data_path, prefix='genes_round', diag=False, use_slurm=True, force_redo=False)

Register the reference tile across channels and rounds

This function estimates the shifts and rotations between rounds and channels using the reference tile and generates diagnostic plots if requested.

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str, optional) – Directory prefix to use, e.g. ‘genes_round’. Defaults to ‘genes_round’.

  • diag (bool, optional) – Save diagnostic plots. Defaults to False.

  • use_slurm (bool, optional) – Submit job to slurm. Defaults to True.

  • force_redo (bool, optional) – Redo if files exist. Defaults to False.

iss_preprocess.pipeline.pipeline.segment_and_stitch_mcherry_cells(data_path, prefix, use_slurm=True, slurm_folder=None, job_dependency=None)

Master function for mCherry cell segmentation and stitching

Calls the following functions in turn:

  • segment_mcherry_cells

  • filter_mcherry_cells, if ops[‘filter_mask’] is True

  • register_within, to find the overlapping region (with reload=True)

  • remove_duplicate

  • stitch_mcherry_cells

Parameters:
  • data_path (str) – Relative path to the data folder

  • prefix (str) – Prefix of the mCherry acquisition

  • use_slurm (bool, optional) – Whether to use SLURM to run the jobs. Defaults to True.

  • slurm_folder (str, optional) – Folder to save SLURM logs. Defaults to None.

  • job_dependency (list, optional) – List of job IDs to wait for before starting the current job. Defaults to None.

iss_preprocess.pipeline.pipeline.setup_channel_correction(data_path, prefix_to_do=None, force_redo=False, use_slurm=True)

Set up channel correction for barcode, genes and hybridisation rounds

Parameters:
  • data_path (str) – Relative path to the data folder

  • prefix_to_do (list, optional) – Prefixes to process. Defaults to None.

  • force_redo (bool, optional) – Redo all processing steps? Defaults to False.

  • use_slurm (bool, optional) – Whether to use SLURM to run the jobs. Defaults to True.

Returns:

List of job IDs for the slurm jobs created

Return type:

list

iss_preprocess.pipeline.project module

iss_preprocess.pipeline.project.check_projection(data_path, prefix, suffixes=('max', 'median'), *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Check if all tiles have been projected successfully.

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str) – Acquisition prefix, e.g. “genes_round_1_1”.

  • suffixes (tuple, optional) – Projection suffixes to check for. Defaults to (“max”, “median”).

iss_preprocess.pipeline.project.check_roi_dims(data_path, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Check if all ROI dimensions are the same across rounds.

Parameters:

data_path (str) – Relative path to data.

Raises:

ValueError – If ROI dimensions are not the same across rounds.

iss_preprocess.pipeline.project.project_round(data_path, prefix, overwrite=False)

Start SLURM jobs to z-project all tiles from a single imaging round. Also, copy one of the MicroManager metadata files from raw to processed directory.

Parameters:
  • data_path (str) – Relative path to dataset.

  • prefix (str) – Full folder name prefix, including round number.

  • overwrite (bool, optional) – Whether to re-project if files already exist. Defaults to False.

iss_preprocess.pipeline.project.project_tile(fname, ops, overwrite=False, sth=13, target_name=None, verbose=True)

Calculates projections for a single tile.

Parameters:
  • fname (str) – path to tile without ‘.ome.tif’ extension.

  • ops (dict) – dictionary of values from the ops file.

  • overwrite (bool, optional) – whether to repeat if already completed. Defaults to False.

  • sth (int, optional) – size of the structuring element for the fstack projection. Used only if make_fstack is True. Defaults to 13.

  • target_name (str, optional) – name of the target file. If None, it will be the same as the input file. Defaults to None.

  • verbose (bool, optional) – print progress. Defaults to True.
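The “max” and “median” projection suffixes presumably correspond to per-pixel maximum- and median-intensity projections along z. The minimal sketch below assumes a (Z, Y, X) stack layout and leaves out the extended depth-of-field (“fstack”) projection that sth configures; the helper is hypothetical.

```python
import numpy as np


def project_stack(stack):
    """Max- and median-intensity projections of a (Z, Y, X) stack along z (axis 0)."""
    return {
        "max": stack.max(axis=0),
        "median": np.median(stack, axis=0),
    }
```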

iss_preprocess.pipeline.project.project_tile_by_coors(tile_coors, data_path, prefix, overwrite=False)

Project a single tile by its coordinates.

Parameters:
  • tile_coors (tuple) – (roi, x, y) coordinates of the tile.

  • data_path (str) – Relative path to data.

  • prefix (str) – Acquisition prefix, e.g. “genes_round_1_1”.

  • overwrite (bool, optional) – Whether to re-project if files already exist. Defaults to False.

iss_preprocess.pipeline.project.project_tile_row(data_path, prefix, tile_roi, tile_row, max_col, overwrite=False)

Calculate max intensity and extended DOF projections for a row of tiles in an ROI

Parameters:
  • data_path (str) – relative path to dataset

  • prefix (str) – directory / file name prefix, e.g. ‘gene_round’

  • tile_roi (int) – index of the ROI

  • tile_row (int) – index of the row to process

  • max_col (int) – Maximum column index. Columns 0 to max_col will be projected.

  • overwrite (bool, optional) – whether to redo projection if files already exist. Defaults to False.

iss_preprocess.pipeline.project.reproject_failed(data_path, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Re-project tiles that failed to project previously.

Parameters:

data_path (str) – Relative path to data.

iss_preprocess.pipeline.register module

iss_preprocess.pipeline.register.correct_hyb_shifts(data_path, prefix=None, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Use robust regression across tiles to correct shifts and angles for hybridisation rounds. Either processes a specific hybridisation round or all rounds.

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str, optional) – Directory prefix to use, e.g. “hybridisation_1_1”. If None, processes all hybridisation acquisitions. Defaults to None.

iss_preprocess.pipeline.register.correct_shifts_roi(data_path, roi_dims, prefix='genes_round', max_shift=500, min_tiles=0)

Use robust regression to correct shifts across tiles for a single ROI.

RANSAC regression is applied to shifts within and across channels using tile X and Y position as predictors. This will load the single_tile shifts and create the corrected shifts.

Parameters:
  • data_path (str) – Relative path to data.

  • roi_dims (tuple) – Dimensions of the ROI to be processed, in (ROI_ID, Xtiles, Ytiles) format.

  • prefix (str, optional) – Directory prefix to use. Defaults to “genes_round”.

  • max_shift (int, optional) – Maximum shift to include tiles in RANSAC regression. Tiles with larger absolute shifts will not be included in the fit but will still have their corrected shifts estimated. Defaults to 500.

  • min_tiles (int, optional) – Minimum number of tiles to use for RANSAC regression; if fewer tiles are available, the median is used instead. Defaults to 0.
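The RANSAC step can be sketched as a robust fit of one shift component against tile X/Y position, with max_shift and min_tiles playing the roles described above. This is a hand-rolled NumPy illustration under assumed settings (n_trials, inlier_thresh and seed are hypothetical parameters), not the package's implementation, which may rely on a library regressor.

```python
import numpy as np


def ransac_corrected_shifts(tile_xy, shifts, max_shift=500, min_tiles=0,
                            n_trials=200, inlier_thresh=5.0, seed=0):
    """Hypothetical sketch: robust linear fit of one shift component vs tile position.

    tile_xy: (Ntiles, 2) tile X and Y indices used as predictors.
    shifts: (Ntiles,) single-tile estimates of one shift component.
    Returns a corrected shift for every tile, including tiles excluded from the fit.
    """
    rng = np.random.default_rng(seed)
    design = np.column_stack([np.ones(len(tile_xy)), tile_xy])  # intercept, X, Y
    usable = np.abs(shifts) <= max_shift
    # Too few usable tiles for a regression: fall back to the median shift
    if usable.sum() < max(min_tiles, 3):
        fallback = np.median(shifts[usable]) if usable.any() else np.median(shifts)
        return np.full(len(shifts), fallback, dtype=float)
    X, y = design[usable], shifts[usable]
    best_inliers = np.zeros(len(y), dtype=bool)
    for _ in range(n_trials):
        sample = rng.choice(len(y), size=3, replace=False)
        coef = np.linalg.lstsq(X[sample], y[sample], rcond=None)[0]
        inliers = np.abs(X @ coef - y) < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the consensus set, then predict for every tile
    coef = np.linalg.lstsq(X[best_inliers], y[best_inliers], rcond=None)[0]
    return design @ coef
```

Tiles rejected by max_shift still receive a prediction from the fitted plane, which matches the behaviour described for max_shift.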

iss_preprocess.pipeline.register.correct_shifts_single_round_roi(data_path, roi_dims, prefix='hybridisation_1_1', max_shift=500, fit_angle=True, align_method=None, n_chans=None)

Use robust regression across tiles to correct shifts and angles for a single hybridisation round and ROI.

Parameters:
  • data_path (str) – Relative path to data.

  • roi_dims (tuple) – Dimensions of the ROI to be processed, in (ROI_ID, Xtiles, Ytiles) format.

  • prefix (str, optional) – Prefix of the round to be processed. Defaults to “hybridisation_1_1”.

  • max_shift (int, optional) – Maximum shift to include tiles in RANSAC regression. Tiles with larger absolute shifts will not be included in the fit but will still have their corrected shifts estimated. Defaults to 500.

  • fit_angle (bool, optional) – Fit the angle with robust regression if True, otherwise takes the median. Defaults to True

  • align_method (str, optional) – Method to use for alignment. If None, will be read from ops. Defaults to None.

Returns:

None

iss_preprocess.pipeline.register.correct_shifts_to_ref(data_path, prefix, max_shift=None, fit_angle=False, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Use robust regression across tiles to correct shifts to the reference acquisition.

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str) – Directory prefix to use, e.g. “genes_round”.

  • fit_angle (bool, optional) – Fit the angle with robust regression if True, otherwise takes the median. Defaults to False

iss_preprocess.pipeline.register.estimate_shifts_by_coors(data_path, tile_coors=(0, 0, 0), prefix='genes_round', suffix='max')

Estimate shifts across channels and sequencing rounds using provided reference rotation angles and scale factors.

Parameters:
  • data_path (str) – Relative path to data.

  • tile_coors (tuple, optional) – Coordinates of tile to register, in (ROI, X, Y) format. Defaults to (0, 0, 0).

  • prefix (str, optional) – Directory prefix to register. Defaults to “genes_round”.

  • suffix (str, optional) – Filename suffix specifying which z-projection to use. Defaults to “max”.

iss_preprocess.pipeline.register.filter_ransac_shifts(data_path, prefix, roi_dims, max_residuals=10)

Filter shifts, using the RANSAC shifts only where the initial shifts are off

This loads the single_tile and corrected shifts and creates the best shifts.

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str) – Directory prefix to use, e.g. “genes_round”.

  • roi_dims (tuple) – Dimensions of the ROI to be processed, in (ROI_ID, Xtiles, Ytiles) format.

  • max_residuals (int, optional) – Threshold on residuals above which the RANSAC shifts are used. Defaults to 10.
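The selection rule can be sketched as follows, assuming a tile's residual is the Euclidean distance between its single-tile and RANSAC (dy, dx) shifts; the helper and the array layout are hypothetical.

```python
import numpy as np


def select_best_shifts(single_tile, ransac, max_residuals=10):
    """Keep each tile's own shift unless it deviates from the RANSAC fit by more
    than max_residuals pixels.

    single_tile, ransac: (Ntiles, 2) arrays of (dy, dx) shifts.
    Returns the chosen shifts and a boolean array marking tiles that use RANSAC.
    """
    residuals = np.linalg.norm(single_tile - ransac, axis=1)
    use_ransac = residuals > max_residuals
    best = np.where(use_ransac[:, None], ransac, single_tile)
    return best, use_ransac
```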

iss_preprocess.pipeline.register.load_and_register_raw_stack(data_path, prefix, tile_coors, corrected_shifts=None)

Load a raw stack and apply channel registration.

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str) – Acquisition to load.

  • tile_coors (tuple) – (Roi, tileX, tileY) tuple

  • corrected_shifts (str, optional) – Shift correction method. Defaults to None.

Returns:

A (X x Y x Nchannels) registered stack

Return type:

numpy.ndarray

iss_preprocess.pipeline.register.load_and_register_sequencing_tile(data_path, tile_coors=(1, 0, 0), prefix='genes_round', suffix='max', filter_r=(2, 4), correct_channels=False, corrected_shifts='best', correct_illumination=False, nrounds=7, specific_rounds=None)

Load sequencing tile and align channels. Optionally, filter, correct illumination and channel brightness.

Parameters:
  • data_path (str) – Relative path to data.

  • tile_coors (tuple, optional) – Coordinates of tile to load: ROI, Xpos, Ypos. Defaults to (1, 0, 0).

  • prefix (str, optional) – Prefix of the sequencing round. Defaults to “genes_round”.

  • suffix (str, optional) – Filename suffix corresponding to the z-projection to use. Defaults to “max”.

  • filter_r (tuple, optional) – Inner and outer radius for the Hanning filter. If False, stack is not filtered. Defaults to (2, 4).

  • correct_channels (bool or str, optional) – Whether to normalise channel brightness. If ‘round1_only’, normalise by the round 1 correction factors; if True, use all norm_factors. Defaults to False.

  • corrected_shifts (str, optional) – Which shift to use. One of reference, single_tile, ransac, or best. Defaults to ‘best’.

  • correct_illumination (bool, optional) – Whether to correct vignetting. Defaults to False.

  • nrounds (int, optional) – Number of sequencing rounds to load. Used only if specific_rounds is None. Defaults to 7.

  • specific_rounds (list, optional) – If not None, specifies which rounds to load; nrounds is then ignored. Defaults to None.

Returns:

numpy.ndarray: X x Y x Nch x len(specific_rounds) (or Nrounds) image stack

numpy.ndarray: X x Y boolean mask identifying bad pixels that were not imaged in all channels and rounds (due to registration offsets) and should be discarded during analysis

Return type:

tuple
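How correct_channels might apply the normalisation factors, as a minimal broadcasting sketch. It assumes an (X, Y, Nch, Nrounds) stack and an (Nch, Nrounds) factor array whose columns already match the loaded rounds; the helper is hypothetical.

```python
import numpy as np


def normalise_channels(stack, norm_factors, correct_channels=True):
    """Divide an (X, Y, Nch, Nrounds) stack by per-channel, per-round factors.

    norm_factors: (Nch, Nrounds) array (assumed layout).
    correct_channels: False -> no change; True -> each round uses its own factors;
        "round1_only" -> every round is divided by the round-1 factors.
    """
    if correct_channels is False:
        return stack
    if correct_channels == "round1_only":
        factors = norm_factors[:, :1]  # (Nch, 1), broadcast over rounds
    else:
        factors = norm_factors
    # The trailing axes (Nch, Nrounds) line up with the factors, so NumPy broadcasts
    return stack / factors
```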

iss_preprocess.pipeline.register.load_and_register_tile(data_path, tile_coors, prefix, filter_r=True, projection=None, zero_bad_pixels=False, correct_illumination=True)

Load a single tile

Load a tile of the given prefix with channels/rounds registered, then apply illumination correction and filtering.

Parameters:
  • data_path (str) – Relative path to data

  • tile_coors (tuple) – (Roi, tileX, tileY) tuple

  • prefix (str) – Acquisition to load. If genes_round or barcode_round will load all the rounds.

  • filter_r (bool, optional) – Whether to filter rounds data. Filter parameters are read from ops. Defaults to True.

  • projection (str, optional) – Projection to use. If None, will read from ops. Defaults to None.

  • zero_bad_pixels (bool, optional) – Set bad pixels to zero. Defaults to False.

  • correct_illumination (bool, optional) – Apply illumination correction. Defaults to True.

Returns:

  • numpy.ndarray – A (X x Y x Nchannels x Nrounds) registered stack.

  • numpy.ndarray – X x Y boolean mask of bad pixels where data is missing after registration.

Return type:

tuple

iss_preprocess.pipeline.register.merge_shifts(data_path, prefix, n_chans=4)

Merge shifts for all ROIs/tiles into a single median shift

Useful if registration failed for some tiles and the same shift should be used for all of them.

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str) – Directory prefix to use, e.g. “hybridisation_1_1”.

  • n_chans (int, optional) – Number of channels to merge. Defaults to 4.
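A minimal sketch of the merging step, assuming the per-tile shifts have been collected into a NumPy array of shape (n_tiles, n_chans, 2). The array layout and the helper name `median_shift` are hypothetical; only the idea of taking the median across tiles comes from the description above. The median is a natural choice because a few failed tiles barely move it.

```python
import numpy as np


def median_shift(tile_shifts):
    """Merge per-tile shifts into one median shift per channel.

    tile_shifts: array of shape (n_tiles, n_chans, 2) holding (dy, dx) shifts
    (hypothetical layout, not the package's on-disk format).
    Returns an array of shape (n_chans, 2).
    """
    tile_shifts = np.asarray(tile_shifts, dtype=float)
    # Median across tiles: robust to the few tiles whose registration failed.
    return np.median(tile_shifts, axis=0)


shifts = np.array([
    [[1.0, 2.0], [0.0, -1.0]],
    [[1.2, 2.1], [0.1, -1.1]],
    [[40.0, -30.0], [0.2, -0.9]],  # channel 0 failed on this tile
])
merged = median_shift(shifts)  # [[1.2, 2.0], [0.1, -1.0]]
```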

iss_preprocess.pipeline.register.register_channels_by_pairs(channel_grouping, ops, ops_prefix, stack, reference_prefix, binarise_quantile, reference_tforms, debug=False)

Register channels for a single tile iteratively by group of channels

channel_grouping must be a list of lists (which may be nested further). The innermost lists are registered first, using the first channel of each list as reference. Then the next level up is registered, and so on. For instance, if channel_grouping = [[0, 1], [2, 3]], channels 0 and 1 will be registered together (ref=0), then channels 2 and 3 will be registered together (ref=2), and finally the two groups will be registered together (ref=0).

Parameters:
  • channel_grouping (list) – List of list of channels to register together.

  • ops (dict) – Experiment metadata.

  • ops_prefix (str) – Prefix to use for ops, e.g. “genes”.

  • stack (np.array) – Image stack to register.

  • reference_prefix (str) – Prefix to load scale or initial matrix from.

  • binarise_quantile (float) – Quantile to binarise images before registration.

  • reference_tforms (dict) – Reference transformation parameters.

  • debug (bool) – Return debug information.

Returns:

  • dict – Transformation parameters.

  • dict – Debug information, only returned if debug is True.

Return type:

dict, or tuple of (dict, dict) if debug is True
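The traversal order described above can be sketched as a short recursion that does only the bookkeeping: it lists which (reference, moving) channel pairs get registered, innermost groups first, without touching any images. The function name `registration_pairs` is hypothetical, and how the package composes the resulting transforms is not shown.

```python
def registration_pairs(grouping):
    """List (reference, moving) channel pairs in the order implied by a nested grouping.

    Returns (representative, pairs): the channel that represents this group at the
    next level up (its first channel) and the pairs registered inside it.
    """
    if isinstance(grouping, int):
        return grouping, []  # a single channel needs no registration
    pairs = []
    representatives = []
    for sub_group in grouping:
        rep, sub_pairs = registration_pairs(sub_group)
        pairs.extend(sub_pairs)  # inner levels are registered first
        representatives.append(rep)
    reference = representatives[0]
    # Then register the sub-groups against the first one, via their representatives.
    pairs.extend((reference, other) for other in representatives[1:])
    return reference, pairs


reference, pairs = registration_pairs([[0, 1], [2, 3]])
# reference == 0, pairs == [(0, 1), (2, 3), (0, 2)]
```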

iss_preprocess.pipeline.register.register_fluorescent_tile(data_path, tile_coors, prefix, reference_prefix=None, debug=False, save_output=True)

Estimate channel registration parameters for a single round acquisition

The stack will be binarised if ops[f”{prefix_start}_binarise_quantile”] is not None. The scale and initial parameters will be loaded from the reference prefix and optimised using either a similarity transform or an affine transform, depending on ops[“align_method”].

Parameters:
  • data_path (str) – Relative path to data.

  • tile_coors (tuple) – Coordinates of tile to register, in (ROI, X, Y) format.

  • prefix (str) – Directory prefix to register.

  • reference_prefix (str, optional) – Prefix to load scale or initial matrix from. Defaults to None.

  • debug (bool, optional) – Return debug information. Defaults to False.

  • save_output (bool, optional) – Save output to disk. Defaults to True.

Returns:

Debug information if debug is True, None otherwise.

Return type:

dict
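A sketch of quantile binarisation, assuming each channel of an X x Y x C stack is thresholded at its own quantile (the value that would come from the `_binarise_quantile` entry in ops). Whether the package thresholds per channel or per stack is an assumption here, and `binarise_stack` is a hypothetical helper. Binarising makes the alignment depend on where signal is rather than on how bright it is in each channel.

```python
import numpy as np


def binarise_stack(stack, quantile):
    """Binarise each channel of an X x Y x C stack at that channel's own quantile."""
    stack = np.asarray(stack, dtype=float)
    thresholds = np.quantile(stack, quantile, axis=(0, 1))  # shape (C,)
    # Broadcasting compares every pixel with the threshold of its channel.
    return stack > thresholds


ramp = np.arange(100, dtype=float).reshape(10, 10)
stack = np.stack([ramp, 2 * ramp], axis=-1)
binary = binarise_stack(stack, 0.9)  # top 10% of pixels in each channel
```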

iss_preprocess.pipeline.register.run_correct_shifts(data_path, prefix, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Use robust regression to correct shifts across tiles within an ROI for all ROIs.

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str) – Directory prefix to use, e.g. “genes_round”.
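The robust estimator is not specified here. As a stand-in, the sketch below assumes the shift varies linearly with tile position within an ROI and uses a two-pass trimmed least-squares fit: fit all tiles, drop tiles whose residual is far above the median (by a MAD criterion), refit, and replace every tile's shift with the fitted value. The helper `robust_plane_fit` and the linear model are assumptions; the package may use RANSAC, Huber regression, or a different model.

```python
import numpy as np


def robust_plane_fit(tile_x, tile_y, shifts, n_mad=3.0):
    """Fit shift = a + b*x + c*y for each shift component, ignoring outlier tiles.

    tile_x, tile_y: 1D arrays of tile positions (length n_tiles).
    shifts: array of shape (n_tiles, 2) with measured (dy, dx) shifts.
    Returns the shifts predicted by the robust fit, shape (n_tiles, 2).
    """
    tile_x = np.asarray(tile_x, dtype=float)
    tile_y = np.asarray(tile_y, dtype=float)
    shifts = np.asarray(shifts, dtype=float)
    design = np.column_stack([np.ones_like(tile_x), tile_x, tile_y])
    # First pass: ordinary least squares on all tiles.
    coefs, *_ = np.linalg.lstsq(design, shifts, rcond=None)
    residuals = np.linalg.norm(shifts - design @ coefs, axis=1)
    # Keep tiles whose residual is not far above the median (MAD criterion).
    median_res = np.median(residuals)
    mad = np.median(np.abs(residuals - median_res))
    inliers = residuals <= median_res + n_mad * max(mad, 1e-9)
    # Second pass: refit on inliers only, then predict for every tile.
    coefs, *_ = np.linalg.lstsq(design[inliers], shifts[inliers], rcond=None)
    return design @ coefs


gx, gy = np.meshgrid(np.arange(4), np.arange(4))
tile_x, tile_y = gx.ravel(), gy.ravel()
true_shifts = np.column_stack([1 + 0.1 * tile_x, 2 - 0.2 * tile_y])
measured = true_shifts.copy()
outlier = np.flatnonzero((tile_x == 1) & (tile_y == 1))[0]
measured[outlier, 0] += 50  # simulate one failed registration
corrected = robust_plane_fit(tile_x, tile_y, measured)
```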

iss_preprocess.pipeline.register.run_register_reference_tile(data_path, prefix='genes_round', diag=False, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Subfunction to run the registration of the reference tile

This function performs the computation: it registers the reference tile specified in ops. This includes shifts and rotations between rounds, and shifts, rotations, and scaling between channels.

Shifts are estimated using phase correlation. Rotation and scaling are estimated using iterative grid search.

Results are saved in a npz file in the processed directory in: data_path / ‘reg’ / prefix / ‘ref_tile_tforms_`prefix`_round.npz’

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str, optional) – Directory prefix to register. Defaults to “genes_round”.

  • diag (bool, optional) – Whether to save diagnostic plots. Defaults to False.
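The phase-correlation step can be illustrated with a textbook NumPy implementation for integer translations. This is a sketch, not the package's code: it ignores windowing, subpixel refinement, and the grid search over rotation and scale, and `phase_correlation_shift` is a hypothetical name.

```python
import numpy as np


def phase_correlation_shift(reference, moving):
    """Estimate the integer (dy, dx) translation of `moving` relative to `reference`.

    Returns s such that moving is approximately np.roll(reference, s, axis=(0, 1)).
    """
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)
    # Normalised cross-power spectrum keeps only the phase difference.
    cross_power = f_mov * np.conj(f_ref)
    cross_power /= np.abs(cross_power) + 1e-12
    correlation = np.abs(np.fft.ifft2(cross_power))
    peak = np.array(np.unravel_index(np.argmax(correlation), correlation.shape))
    # Peaks past the midpoint correspond to negative shifts (circular wrap-around).
    shape = np.array(correlation.shape)
    wrap = peak > shape // 2
    peak[wrap] -= shape[wrap]
    return tuple(int(p) for p in peak)


rng = np.random.default_rng(0)
reference = rng.random((64, 64))
moving = np.roll(reference, (3, -5), axis=(0, 1))
print(phase_correlation_shift(reference, moving))  # (3, -5)
```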

iss_preprocess.pipeline.segment module

iss_preprocess.pipeline.segment.add_mask_id(data_path, roi, masks, barcode_df=None, barcode_dot_threshold=0.15, spot_score_threshold=0.1, hyb_score_threshold=0.8, load_genes=True, load_hyb=True, load_barcodes=True)

Load gene, barcode, and hybridisation spots and add a mask_id column to each spots dataframe

Parameters:
  • data_path (str) – Relative path to data

  • roi (int) – ID of the ROI to load

  • masks (np.array) – Array of labels.

  • barcode_df (pd.DataFrame, optional) – Rabies barcode dataframe, if None, will load “barcode_df_roi{roi}.pkl”. Defaults to None.

  • barcode_dot_threshold (float, optional) – Threshold for the barcode dot product. Only spots above the threshold will be counted. Defaults to 0.15.

  • spot_score_threshold (float, optional) – Threshold for the OMP score. Only spots above the threshold will be counted. Defaults to 0.1.

  • hyb_score_threshold (float, optional) – Threshold for hybridisation spots. Only spots above the threshold will be counted. Defaults to 0.8.

  • load_genes (bool, optional) – Whether to load gene spots. Defaults to True.

  • load_hyb (bool, optional) – Whether to load hybridisation spots. Defaults to True.

  • load_barcodes (bool, optional) – Whether to load barcode spots. Defaults to True.

Returns:

Dictionary of spots dataframes

Return type:

dict
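Adding mask_id amounts to reading the label image at each spot position. A minimal NumPy sketch, assuming spot positions are pixel coordinates with x as column and y as row, and that spots outside the image get mask_id 0 (background). The helper `lookup_mask_id` is hypothetical, and the column names in the real dataframes may differ.

```python
import numpy as np


def lookup_mask_id(masks, x, y):
    """Return the label under each spot, or 0 for spots outside the image.

    masks: 2D label image (0 = background), indexed as masks[row, col].
    x, y: spot coordinates in pixels (x = column, y = row), rounded to the nearest pixel.
    """
    cols = np.rint(np.asarray(x, dtype=float)).astype(int)
    rows = np.rint(np.asarray(y, dtype=float)).astype(int)
    mask_id = np.zeros(cols.shape, dtype=masks.dtype)
    inside = (rows >= 0) & (rows < masks.shape[0]) & (cols >= 0) & (cols < masks.shape[1])
    mask_id[inside] = masks[rows[inside], cols[inside]]
    return mask_id


masks = np.zeros((5, 5), dtype=int)
masks[1:3, 1:3] = 7
masks[4, 4] = 2
ids = lookup_mask_id(masks, x=[1.2, 4, 0, 10], y=[2.4, 4, 0, 1])  # [7, 2, 0, 0]
```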

iss_preprocess.pipeline.segment.filter_mcherry_cells(data_path, prefix, tile_list=None, use_rois=None, use_slurm=True, slurm_folder=None, job_dependency=None)

Use GMM to cluster cells and remove non-cell masks.

This function will:

  • Use all saved dataframes to fit a GMM, calling _gmm_cluster_mcherry_cells.

  • Apply this model to all masks, calling _remove_non_cell_masks.

Parameters:
  • data_path (str) – Relative path to the data.

  • prefix (str) – Prefix of the image stack.

  • tile_list (list, optional) – List of tiles to process. If None, will process all tiles. Defaults to None.

  • use_rois (list, optional) – List of ROIs to process. If None, will process all ROIs. Used only if tile_list is None. Defaults to None.

  • use_slurm (bool, optional) – Whether to use slurm to parallelize the process. Defaults to True.

  • slurm_folder (str, optional) – Folder to save slurm logs. Defaults to None.

  • job_dependency (list, optional) – List of job ids to wait for before starting the job. Defaults to None.
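Which per-mask features _gmm_cluster_mcherry_cells uses is not described here. The sketch below assumes a single feature per mask (for example a mean intensity) and fits a two-component 1D Gaussian mixture by expectation-maximisation in plain NumPy. `fit_gmm_1d` is hypothetical; the package may instead use a library implementation with more features or components.

```python
import numpy as np


def fit_gmm_1d(values, n_iter=200):
    """Fit a two-component 1D Gaussian mixture by expectation-maximisation.

    Returns (weights, means, variances, labels), where labels assigns each value
    to the component with the highest posterior probability.
    """
    x = np.asarray(values, dtype=float)
    # Initialise the two components at the lower and upper quartiles.
    means = np.quantile(x, [0.25, 0.75])
    variances = np.full(2, x.var())
    weights = np.full(2, 0.5)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each value.
        log_pdf = -0.5 * ((x[:, None] - means) ** 2 / variances + np.log(2 * np.pi * variances))
        log_resp = np.log(weights) + log_pdf
        log_resp -= log_resp.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means and variances from the responsibilities.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    labels = np.argmax(resp, axis=1)
    return weights, means, variances, labels


rng = np.random.default_rng(1)
values = np.concatenate([rng.normal(0, 1, 300), rng.normal(10, 1, 200)])
weights, means, variances, labels = fit_gmm_1d(values)
```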

iss_preprocess.pipeline.segment.find_edge_touching_masks(masks, border_width=4)

Finds masks that touch the edge of the image.

Parameters:
  • masks (np.ndarray) – The binary or labeled mask array where each cell is represented by a unique integer, and background is 0.

  • border_width (int, optional) – The width of the border to consider when checking for edge touching. Defaults to 4.

Returns:

A list of unique labels that touch the edge of the image.

Return type:

edge_touching_labels (list)
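A NumPy sketch of the check for a 2D label image, assuming a label counts as edge-touching if any of its pixels lies within border_width pixels of any image edge. `edge_touching_labels` is a hypothetical name; scikit-image's `clear_border` solves a related problem by removing such labels instead of listing them.

```python
import numpy as np


def edge_touching_labels(masks, border_width=4):
    """Return the sorted non-zero labels that have pixels within border_width of any edge."""
    masks = np.asarray(masks)
    n_rows, n_cols = masks.shape
    border = np.zeros(masks.shape, dtype=bool)
    border[:border_width, :] = True
    border[n_rows - border_width:, :] = True  # explicit start avoids the -0 slicing pitfall
    border[:, :border_width] = True
    border[:, n_cols - border_width:] = True
    labels = np.unique(masks[border])  # np.unique returns sorted values
    return [int(label) for label in labels if label != 0]


masks = np.zeros((20, 20), dtype=int)
masks[0:3, 5:8] = 1      # touches the top edge
masks[8:12, 8:12] = 2    # interior cell
masks[15:18, 15:19] = 3  # reaches into the bottom/right border band
touching = edge_touching_labels(masks)  # [1, 3]
```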

iss_preprocess.pipeline.segment.get_big_masks(data_path, masks, mask_expansion)

Small internal function to avoid code duplication

Reload and expand masks if needed

Parameters:
  • data_path (str) – Relative path to data

  • masks (np.array) – Array of labels.

  • mask_expansion (float) – Distance in um to expand masks before counting rolonies per cell. None for no expansion.

Returns:

Expanded masks.

Return type:

numpy.ndarray

iss_preprocess.pipeline.segment.get_cell_masks(data_path, roi, projection='corrected', mask_expansion=None, reload=True, prefix=None, curated=False)

Small wrapper to get cell masks from a given data path.

Ensures the same projection is used for all calls.

Parameters:
  • data_path (str) – Path to acquisition data (chamber folder)

  • roi (int) – Region of interest

  • projection (str, optional) – Projection to use. Defaults to “corrected”.

  • mask_expansion (int, optional) – Expansion of the mask. If None, reads from ops. Defaults to None.

  • reload (bool, optional) – If True, reload the saved masks, otherwise regenerate from individual tiles. Defaults to True.

  • prefix (str, optional) – Prefix to use for the masks. If None, reads from ops. Defaults to None.

  • curated (bool, optional) – Whether to use curated masks. These are manually curated and have the same filename with “_curated.tif”. Defaults to False.

Returns:

Cell masks

Return type:

np.ndarray

iss_preprocess.pipeline.segment.get_overlap_regions(data_path, prefix, ref_coors)

Determine the coordinates of the overlap region between two adjacent tiles using explicit tile direction.

Parameters:
  • shifts (dict) – The dictionary containing the shift values for the down and right tiles.

  • tile_ref (np.ndarray) – The reference tile.

  • tile_right (np.ndarray) – The right tile.

  • tile_down (np.ndarray) – The down tile.

  • tile_down_right (np.ndarray) – The down right tile.

Returns:

  • overlap_ref_vert (np.ndarray) – The overlap region between the reference tile and the down tile.

  • overlap_down (np.ndarray) – The overlap region between the down tile and the reference tile.

  • overlap_ref_side (np.ndarray) – The overlap region between the reference tile and the right tile.

  • overlap_right (np.ndarray) – The overlap region between the right tile and the reference tile.

  • overlap_ref_with_down_right (np.ndarray) – The overlap region between the reference tile and the down right tile.

  • overlap_down_right_with_ref (np.ndarray) – The overlap region between the down right tile and the reference tile.

Return type:

tuple

iss_preprocess.pipeline.segment.get_stack_for_cellpose(data_path, prefix, tile_coors, use_raw_stack=True)

Load the stack to segment with cellpose.

This will load a stack with 2 channels from the raw data or the registered stack.

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str) – Acquisition prefix to use for segmentation.

  • tile_coors (tuple) – Coordinates of the tile to segment.

  • use_raw_stack (bool, optional) – Whether to use the raw stack or the projected stack. Defaults to True.

Returns:

X x Y x channels (x Z) stack.

Return type:

numpy.ndarray

iss_preprocess.pipeline.segment.make_cell_dataframe(data_path, roi, masks=None, mask_expansion=None, atlas_size=10)

Make a cell dataframe

The index will be the mask ID. The dataframe will include, for each cell, its centroid, bounding box, and area. If atlas_size is not None, it will also include the ID and acronym of the atlas area where its centroid is located.

Parameters:
  • data_path (str) – Relative path to data

  • roi (int) – Number of the ROI to process

  • masks (np.array, optional) – Array of labels, if None will load masks_{roi}.npy from the reg folder. Defaults to None.

  • mask_expansion (float, optional) – Distance in um to expand masks before counting rolonies per cell. None for no expansion. Defaults to None.

  • atlas_size (int, optional) – Size of the atlas to use to load ARA information. If None, will not get area information. Defaults to 10.

Returns:

Dataframe with the cell information

Return type:

cell_df (pd.DataFrame)
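The per-cell geometry columns can be sketched with plain NumPy. This assumes a 2D label image with 0 as background and follows the bounding-box convention of `skimage.measure.regionprops` (exclusive max), which the package may well use instead. `basic_cell_properties` is hypothetical; conversion to um and the atlas lookup are not shown, and the per-label loop is fine for a sketch but slow for large ROIs.

```python
import numpy as np


def basic_cell_properties(masks):
    """Compute centroid, bounding box and area for each non-zero label of a 2D mask image.

    Returns a dict mapping label -> {'centroid': (row, col),
    'bbox': (min_row, min_col, max_row, max_col) with exclusive max, 'area': pixels}.
    """
    props = {}
    for label in np.unique(masks):
        if label == 0:
            continue  # skip background
        rows, cols = np.nonzero(masks == label)
        props[int(label)] = {
            "centroid": (rows.mean(), cols.mean()),
            "bbox": (rows.min(), cols.min(), rows.max() + 1, cols.max() + 1),
            "area": rows.size,
        }
    return props


masks = np.zeros((6, 6), dtype=int)
masks[1:3, 2:5] = 4
props = basic_cell_properties(masks)
```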

iss_preprocess.pipeline.segment.remove_all_duplicate_masks(data_path, prefix, upper_overlap_thresh=None, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Remove masks that overlap in adjacent tiles.

The within_acquisition registration must be run for prefix beforehand.

Parameters:
  • data_path (str) – Relative path to the data.

  • prefix (str) – Prefix of the image stack.

  • upper_overlap_thresh (float, optional) – The upper threshold percentage for considering mask overlap significant. If None, will use ops if defined, 0.3 otherwise. Defaults to None.

Returns:

A list of tuples containing the labels that overlapped and their respective percentages.

Return type:

all_overlapping_pairs (list)
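A sketch of the overlap test for one pair of adjacent tiles, assuming the shared region has already been cropped from both tiles into a common frame. Normalising by the smaller mask's area is an assumption, as is the helper name `overlapping_label_pairs`; the 0.3 default mirrors the fallback mentioned above.

```python
import numpy as np


def overlapping_label_pairs(overlap_a, overlap_b, threshold=0.3):
    """Find pairs of labels that overlap in the shared region of two tiles.

    overlap_a, overlap_b: label images of the same overlap region, one per tile
    (0 = background). The overlap fraction is the number of shared pixels divided
    by the area of the smaller of the two masks within the region.
    Returns a sorted list of (label_a, label_b, fraction) with fraction > threshold.
    """
    both = (overlap_a > 0) & (overlap_b > 0)
    pairs = []
    for label_a, label_b in set(zip(overlap_a[both].tolist(), overlap_b[both].tolist())):
        shared = np.sum((overlap_a == label_a) & (overlap_b == label_b))
        smaller = min(np.sum(overlap_a == label_a), np.sum(overlap_b == label_b))
        fraction = shared / smaller
        if fraction > threshold:
            pairs.append((label_a, label_b, float(fraction)))
    return sorted(pairs)


a = np.zeros((10, 10), dtype=int)
b = np.zeros((10, 10), dtype=int)
a[0:4, 0:4] = 1   # 16 px
b[1:5, 1:5] = 9   # 16 px, shares 9 px with label 1
a[6:8, 6:8] = 2   # 4 px
b[7:10, 7:10] = 5  # shares only 1 px with label 2
duplicates = overlapping_label_pairs(a, b)  # [(1, 9, 0.5625)]
```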

iss_preprocess.pipeline.segment.run_cellpose_segmentation(data_path, prefix, roi=None, tx=None, ty=None, use_raw_stack=True, use_gpu=True, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)
iss_preprocess.pipeline.segment.run_mask_projection(data_path, prefix, roi=None, tx=None, ty=None, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Project masks to a single plane.

Wrapper around iss_preprocess.segment.cell.project_mask to run on slurm.

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str) – Acquisition prefix to use for segmentation.

  • roi (int) – ROI ID to segment as specified in MicroManager (i.e. 1-based).

  • tx (int) – X coordinate of the tile.

  • ty (int) – Y coordinate of the tile.

Returns:

X x Y x channels (x Z) stack.

Return type:

numpy.ndarray

iss_preprocess.pipeline.segment.save_curated_dataframes(data_path, prefix, intensity_channels=None, rois=None, mask_expansion=None, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Save the curated dataframes to the cells folder.

Parameters:
  • data_path (str) – Relative path to the data.

  • prefix (str) – Prefix of the image stack.

  • rois (list, optional) – List of ROIs to process. If None, will process all ROIs. Defaults to None.

  • mask_expansion (int, optional) – Mask expansion to use. If None, will use the value from the ops. Defaults to None.

Returns:

Dataframe with the cell information.

Return type:

pd.DataFrame

iss_preprocess.pipeline.segment.save_mcherry_mask_df(data_path, prefix)

Collate individual tile dataframes and remove overlapping masks.

This does not “stitch” the mask and keeps only the within tile x/y coordinates, but it does precompute the stitched label.

Parameters:
  • data_path (str) – Relative path to the data.

  • prefix (str) – Prefix of the image stack.

Returns:

Dataframe with the cell information.

Return type:

pd.DataFrame

iss_preprocess.pipeline.segment.save_unmixing_coefficients(data_path, prefix, tile_coors=None, background_channel=None, signal_channel=None, projection=None, seed=None, n_random=None, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Find the unmixing coefficients.

Parameters:
  • data_path (str) – Path to the data directory.

  • prefix (str) – Prefix of the image stack.

  • tile_coors (list, optional) – List of tile coordinates. If None, will use random tiles. Defaults to None.

  • background_channel (int, optional) – Channel index of the background image. If None, will use the value from the ops. Defaults to None.

  • signal_channel (int, optional) – Channel index of the signal image. If None, will use the value from the ops. Defaults to None.

  • projection (str, optional) – Projection method. If None, will use the value from the ops. Defaults to None.

  • seed (int, optional) – Random seed for the random tiles. If None, will use the value from the ops. Defaults to None.

  • n_random (int, optional) – Number of random tiles to use. If None, will use the value from the ops. Defaults to None.

Returns:

  • pure_signal (np.ndarray) – Pure signal image.

  • coef (float) – Unmixing coefficient.

  • intercept (float) – Unmixing intercept.

Return type:

tuple
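A minimal sketch of linear unmixing, assuming the signal channel is modelled as coef * background + intercept plus pure signal, with the coefficients fitted by ordinary least squares over all pixels. `fit_unmixing` is hypothetical; in practice the fit might be restricted to background-dominated pixels or made robust, since bright true signal biases an all-pixel fit.

```python
import numpy as np


def fit_unmixing(background, signal):
    """Fit signal ~ coef * background + intercept by least squares.

    Returns (pure_signal, coef, intercept), where pure_signal is what remains
    after subtracting the fitted background bleed-through.
    """
    background = np.asarray(background, dtype=float)
    signal = np.asarray(signal, dtype=float)
    # np.polyfit with deg=1 returns [slope, intercept].
    coef, intercept = np.polyfit(background.ravel(), signal.ravel(), deg=1)
    pure_signal = signal - (coef * background + intercept)
    return pure_signal, coef, intercept


rng = np.random.default_rng(2)
background = rng.random((50, 50)) * 100
signal = 0.4 * background + 5.0  # pure bleed-through, no true signal
pure, coef, intercept = fit_unmixing(background, signal)
```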

iss_preprocess.pipeline.segment.segment_all_rois(data_path, prefix='DAPI_1', use_gpu=False)

Start batch jobs for segmentation for each ROI.

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str, optional) – Acquisition prefix to use for segmentation. Defaults to “DAPI_1”.

  • use_gpu (bool, optional) – Whether to use GPU. Defaults to False.

iss_preprocess.pipeline.segment.segment_all_tiles(data_path, prefix='DAPI_1', use_raw_stack=True, use_gpu=True, use_rois=None, tile_list=None, rerun_cellpose=False, use_slurm=True)

Start batch jobs for segmentation for each tile.

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str, optional) – Acquisition prefix to use for segmentation. Defaults to “DAPI_1”.

  • use_raw_stack (bool, optional) – Whether to use the raw stack and perform 3D segmentation. Defaults to True.

  • use_gpu (bool, optional) – Whether to use GPU. Defaults to True.

  • use_rois (list, optional) – List of ROIs to process. If None, will use all ROIs. Defaults to None.

  • tile_list (list, optional) – List of tiles to process. If provided, use_rois is ignored. If None, will use all tiles. Defaults to None.

  • rerun_cellpose (bool, optional) – Whether to rerun cellpose even if the raw masks already exist (used only if use_raw_stack is True). Defaults to False.

  • use_slurm (bool, optional) – Whether to use slurm. Defaults to True.

Returns:

List of job IDs for the slurm jobs.

Return type:

list

iss_preprocess.pipeline.segment.segment_mcherry_tile(data_path, prefix, roi, tilex, tiley)

Segment the mCherry channel of an image stack.

Parameters:
  • data_path (str) – Path to the data directory.

  • prefix (str) – Prefix of the image stack.

  • roi (int) – Region of interest.

  • tilex (int) – X coordinate of the tile.

  • tiley (int) – Y coordinate of the tile.

Returns:

  • filtered_masks (np.ndarray) – Binary image of the filtered masks.

  • filtered_df (pd.DataFrame) – DataFrame of the filtered masks.

  • rejected_masks (np.ndarray) – Binary image of the rejected masks.

Return type:

tuple

iss_preprocess.pipeline.segment.segment_roi(data_path, iroi, prefix='DAPI_1', use_gpu=False)

Detect cells in a single ROI using Cellpose.

Much faster with a GPU, but requires a very large amount of VRAM for large ROIs.

Parameters:
  • data_path (str) – Relative path to data.

  • iroi (int) – ROI ID to segment as specified in MicroManager (i.e. 1-based).

  • prefix (str, optional) – Acquisition prefix to use for segmentation. Defaults to “DAPI_1”.

  • use_gpu (bool, optional) – Whether to use GPU. Defaults to False.

iss_preprocess.pipeline.segment.segment_spots(data_path, roi, masks=None, barcode_df=None, barcode_dot_threshold=None, spot_score_threshold=0.1, hyb_score_threshold=0.8, load_genes=True, load_hyb=True, load_barcodes=True)

Count number of rolonies per cell for barcodes and genes.

Only rolonies above the relevant threshold will be counted. (Note that gene rolonies are already thresholded once after OMP.)

Hybridisation and sequencing datasets will be fused.

Outputs are saved in the cells folder as f”genes_df_roi{roi}.pkl” and f”barcode_df_roi{roi}.pkl”

Parameters:
  • data_path (str) – Relative path to data

  • roi (int) – ID of the ROI to load

  • masks (np.array, optional) – Array of labels. If None will load using “get_cell_masks”. Defaults to None.

  • barcode_df (pd.DataFrame, optional) – Rabies barcode dataframe, if None, will load “barcode_df_roi{roi}.pkl”. Defaults to None.

  • barcode_dot_threshold (float, optional) – Threshold for the barcode dot product. Only spots above the threshold will be counted. Defaults to 0.15.

  • spot_score_threshold (float, optional) – Threshold for the OMP score. Only spots above the threshold will be counted. Defaults to 0.1.

  • hyb_score_threshold (float, optional) – Threshold for hybridisation spots. Only spots above the threshold will be counted. Defaults to 0.8.

  • load_genes (bool, optional) – Whether to load gene spots. Defaults to True.

  • load_hyb (bool, optional) – Whether to load hybridisation spots. Defaults to True.

  • load_barcodes (bool, optional) – Whether to load barcode spots. Defaults to True.

Returns:

  • barcode_df (pd.DataFrame) – Count of rolonies per barcode sequence per cell. Index is the mask ID of the cell.

  • fused_df (pd.DataFrame) – Count of rolonies per gene or hybridisation probe per cell. Index is the mask ID of the cell.

Return type:

tuple

iss_preprocess.pipeline.segment.unmix_tile(data_path, prefix, tile_coors, background_channel=None, signal_channel=None, projection=None)

Unmix one tile using the previously found coefficients.

Parameters:
  • data_path (str) – Path to the data directory.

  • prefix (str) – Prefix of the image stack.

  • tile_coors (tuple) – Coordinates of the tile.

  • background_channel (int, optional) – Channel index of the background image.

  • signal_channel (int, optional) – Channel index of the signal image.

  • projection (str, optional) – Projection method.

Returns:

Unmixed image.

Return type:

unmixed (np.ndarray)

iss_preprocess.pipeline.sequencing module

iss_preprocess.pipeline.sequencing.basecall_tile(data_path, tile_coors, save_spots=True)

Detect and basecall barcodes for a given tile.

Parameters:
  • data_path (str) – Relative path to data.

  • tile_coors (tuple) – Coordinates of tile to load: ROI, Xpos, Ypos.

  • save_spots (bool, optional) – Whether to save the detected spots. Used to run without erasing during diagnostics. Defaults to True.

iss_preprocess.pipeline.sequencing.compute_spot_sign_image(data_path, prefix='genes_round')

Compute the reference spot sign image to use in spot calling. Save it to the processed data folder.

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str, optional) – Prefix of the sequencing read to use. Defaults to “genes_round”.

iss_preprocess.pipeline.sequencing.detect_genes_on_tile(data_path, tile_coors, save_stack=False, prefix='genes_round')

Apply the OMP algorithm to unmix spots in a given tile using the saved gene dictionary and settings saved in ops.yml. Then detect gene spots in the resulting gene maps.

Parameters:
  • data_path (str) – Relative path to data.

  • tile_coors (tuple) – Coordinates of tile to load: ROI, Xpos, Ypos.

  • save_stack (bool, optional) – Whether to save registered and preprocessed images. Defaults to False.

  • prefix (str, optional) – Prefix of the sequencing read to analyse. Defaults to “genes_round”.

iss_preprocess.pipeline.sequencing.estimate_channel_correction(data_path, prefix='genes_round', nrounds=7, fit_norm_factors=False, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Compute grayscale value distribution and normalisation factors

Each tile listed in ops[“correction_tiles”] is filtered before being used to compute the distribution of pixel values. Normalisation factors that equalise these distributions across channels and rounds are defined as the ops[“correction_quantile”] quantile of each distribution.

Parameters:
  • data_path (str or Path) – Relative path to the data folder

  • prefix (str, optional) – Folder name prefix, before round number. Defaults to “genes_round”.

  • nrounds (int, optional) – Number of rounds. Defaults to 7.

Returns:

  • pixel_dist (np.array) – A 65536 x Nch x Nrounds distribution of grayscale values for filtered stacks.

  • norm_factors (np.array) – A Nch x Nrounds array of normalisation factors.

Return type:

tuple
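Reading a quantile off the 65536-bin histograms can be sketched as follows, assuming pixel_dist holds raw counts per 16-bit grayscale value and that each normalisation factor is the smallest grayscale value whose cumulative fraction reaches the quantile. `norm_factors_from_histograms` is hypothetical, and the package may additionally rescale the factors relative to a reference channel or round.

```python
import numpy as np


def norm_factors_from_histograms(pixel_dist, quantile):
    """Compute per-channel, per-round normalisation factors from grayscale histograms.

    pixel_dist: array of shape (65536, n_channels, n_rounds) with pixel counts for
    each 16-bit grayscale value (every histogram must be non-empty).
    Returns an (n_channels, n_rounds) array with, for each histogram, the smallest
    grayscale value whose cumulative fraction reaches `quantile`.
    """
    cumulative = np.cumsum(pixel_dist, axis=0).astype(float)
    cumulative /= cumulative[-1]  # normalise each histogram so it ends at 1
    # argmax on a boolean array returns the first index where the condition holds.
    return np.argmax(cumulative >= quantile, axis=0)


pixel_dist = np.zeros((65536, 2, 1))
pixel_dist[0:100, 0, 0] = 1    # channel 0: one pixel at each value 0..99
pixel_dist[100:300, 1, 0] = 1  # channel 1: one pixel at each value 100..299
factors = norm_factors_from_histograms(pixel_dist, 0.9025)
```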

iss_preprocess.pipeline.sequencing.get_reference_spots(data_path, prefix='genes')

Load the reference spots for the given dataset.

Internal function for setup_omp and setup_barcode_calling.

Parameters:
  • data_path (str) – Relative path to data.

  • prefix (str, optional) – Short prefix, either ‘genes’ or ‘barcode’. Defaults to ‘genes’.

Returns:

  • pandas.DataFrame – Detected spots.

  • list – Normalisation shifts.

Return type:

tuple

iss_preprocess.pipeline.sequencing.load_spot_sign_image(data_path, threshold, return_raw_image=False)

Load the reference spot sign image to use in spot calling. First, check if the spot sign image has been computed for the current dataset and use it if available. Otherwise, use the spot sign image saved in the repo.

Parameters:
  • data_path (str) – Relative path to data.

  • threshold (float) – Absolute value threshold used to binarize the spot sign image.

  • return_raw_image (bool, optional) – Whether to return the raw spot sign image. Defaults to False.

Returns:

Spot sign image after thresholding, containing -1, 0, or 1s.

Return type:

numpy.ndarray
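The thresholding step can be sketched in one NumPy expression: keep the sign of each pixel where its magnitude exceeds the threshold and set everything else to 0. Whether the package uses a strict or non-strict comparison is an assumption, and `threshold_spot_sign` is a hypothetical name.

```python
import numpy as np


def threshold_spot_sign(raw_image, threshold):
    """Map a raw spot sign image to -1, 0 or 1 using an absolute-value threshold."""
    raw_image = np.asarray(raw_image, dtype=float)
    # Keep the sign only where the magnitude exceeds the threshold.
    return np.where(np.abs(raw_image) > threshold, np.sign(raw_image), 0).astype(int)


sign_image = threshold_spot_sign([[0.9, -0.5], [0.1, -0.05]], threshold=0.2)
```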

iss_preprocess.pipeline.sequencing.run_omp_on_tile(data_path, tile_coors, ops, save_stack=False, prefix='genes_round')

Run OMP on a tile and return the results.

Parameters:
  • data_path (str) – Relative path to data.

  • tile_coors (tuple) – Coordinates of the tile to process.

  • ops (dict) – Dictionary of parameters.

  • save_stack (bool, optional) – Whether to save the registered stack. Defaults to False.

  • prefix (str, optional) – Prefix of the sequencing read to use. Defaults to “genes_round”.

Returns:

  • numpy.ndarray – OMP results.

  • dict – Dictionary of OMP parameters.

Return type:

tuple

iss_preprocess.pipeline.sequencing.setup_barcode_calling(data_path, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Detect spots and compute cluster means

Parameters:

data_path (str) – Relative path to data

Returns:

  • cluster_means (list) – A list with Nrounds elements, each an Nch x Ncl array of cluster means (square, because the number of channels equals the number of clusters), normalised by round 0 intensity.

  • all_spots (pandas.DataFrame) – All detected spots.

Return type:

tuple

iss_preprocess.pipeline.sequencing.setup_omp(data_path, force_redo=False, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Prepare variables required to run the OMP algorithm. Finds isolated spots using STD across rounds and channels. Detected spots are then used to determine the bleedthrough matrix using scaled k-means.

Parameters:
  • data_path (str) – Relative path to data.

  • force_redo (bool, optional) – Whether to redo the setup. Defaults to False.

Returns:

  • numpy.ndarray – N x M dictionary, where N = R * C and M is the number of genes.

  • list – Gene names.

  • float – Norm shift for the OMP algorithm, estimated as the median norm of all pixels.

Return type:

tuple
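The dictionary prepared here is what OMP decomposes each pixel against. A textbook single-pixel orthogonal matching pursuit, sketched under the assumption that the dictionary columns are unit-norm gene codes of length N = R * C; the norm shift, background atoms, and stopping rules used by the package are not modelled, and `omp_pixel` is a hypothetical name.

```python
import numpy as np


def omp_pixel(dictionary, pixel, max_atoms=3, tol=1e-6):
    """Orthogonal matching pursuit for a single pixel.

    dictionary: (N, M) array whose unit-norm columns are gene codes.
    pixel: length-N vector of intensities across rounds and channels.
    max_atoms: maximum number of genes to select (must be >= 1).
    Returns a length-M vector of gene coefficients.
    """
    pixel = np.asarray(pixel, dtype=float)
    residual = pixel.copy()
    selected = []
    coefs = np.zeros(dictionary.shape[1])
    for _ in range(max_atoms):
        # Pick the gene whose code best correlates with what is left to explain.
        scores = np.abs(dictionary.T @ residual)
        if selected:
            scores[selected] = -np.inf  # never pick the same gene twice
        selected.append(int(np.argmax(scores)))
        # Refit all selected genes jointly: the "orthogonal" step.
        sub = dictionary[:, selected]
        sub_coefs, *_ = np.linalg.lstsq(sub, pixel, rcond=None)
        residual = pixel - sub @ sub_coefs
        if np.linalg.norm(residual) < tol:
            break
    coefs[selected] = sub_coefs
    return coefs


s = 1 / np.sqrt(2)
codes = np.array([[s, 0, 0], [s, s, 0], [0, s, s], [0, 0, s]])  # 3 overlapping gene codes
pixel = 2 * codes[:, 0] + codes[:, 2]
gene_coefs = omp_pixel(codes, pixel)  # approximately [2, 0, 1]
```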

iss_preprocess.pipeline.stitch module

iss_preprocess.pipeline.stitch.calculate_tile_positions(shift_right, shift_down, tile_shape, ntiles, x_direction, y_direction)

Calculate position of each tile based on the provided shifts.

Parameters:
  • shift_right (numpy.array) – X and Y shifts between different columns. Either a 2-element array or a ntiles[0] x ntiles[1] x 2 matrix of shifts

  • shift_down (numpy.array) – X and Y shifts between different rows. Either a 2-element array or a ntiles[0] x ntiles[1] x 2 matrix of shifts

  • tile_shape (numpy.array) – Shape of each tile

  • ntiles (numpy.array) – Number of tile rows and columns

Returns:

  • numpy.ndarray – tile_origins, ntiles[0] x ntiles[1] x 2 matrix of tile origin coordinates.

  • numpy.ndarray – tile_centers, ntiles[0] x ntiles[1] x 2 matrix of tile center coordinates.

Return type:

tuple

iss_preprocess.pipeline.stitch.find_tile_order(data_path, prefix=None, xy_stage_name='XYStage', z_stage_name='ZDrive', verbose=True)

Find the order of tiles in a multi-tile acquisition

Parameters:
  • data_path (str) – Relative path to data

  • prefix (str, optional) – Acquisition prefix. If None, will use the one in ops. Defaults to None.

  • xy_stage_name (str, optional) – Name of the XY stage. Defaults to “XYStage”.

  • z_stage_name (str, optional) – Name of the Z stage. If None, will not load Z positions. Defaults to “ZDrive”.

  • verbose (bool, optional) – Print information about the number of tiles found. Defaults to True.

Returns:

  • dict – Dictionary of tile order with tuple (roi, col, row) as key and acquisition order (across all ROIs) as value.

  • pandas.DataFrame – DataFrame containing tile position information.

Return type:

tuple

iss_preprocess.pipeline.stitch.find_tile_overlap(data_path, ref_prefix, tile_coor1, tile_coor2)

Find the overlap between two tiles

If tile1 is the stack, the overlap can be accessed by: tile1[overlap_tile_1[1]:overlap_tile_1[3], overlap_tile_1[0]:overlap_tile_1[2]]

Parameters:
  • rect1 (tuple) – Rectangle coordinates (x0, y0, x1, y1)

  • rect2 (tuple) – Rectangle coordinates (x0, y0, x1, y1)

Returns:

  • tuple – Overlap in global coordinates (x0, y0, x1, y1).

  • tuple – Overlap in tile 1 (x0, y0, x1, y1).

  • tuple – Overlap in tile 2 (x0, y0, x1, y1).

Return type:

tuple
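The rectangle arithmetic behind the overlap can be sketched as follows, assuming half-open (x0, y0, x1, y1) rectangles, which matches the slicing expression given for tile1 above. `rect_overlap` and `to_tile_coordinates` are hypothetical helpers, and the tile positions in the usage example are made up.

```python
import numpy as np


def rect_overlap(rect1, rect2):
    """Intersect two half-open (x0, y0, x1, y1) rectangles; return None if disjoint."""
    x0, y0 = max(rect1[0], rect2[0]), max(rect1[1], rect2[1])
    x1, y1 = min(rect1[2], rect2[2]), min(rect1[3], rect2[3])
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1)


def to_tile_coordinates(overlap, rect):
    """Shift a global (x0, y0, x1, y1) overlap into the frame of the tile at `rect`."""
    ox, oy = rect[0], rect[1]
    return (overlap[0] - ox, overlap[1] - oy, overlap[2] - ox, overlap[3] - oy)


tile1_rect = (0, 0, 100, 80)    # global position of tile 1
tile2_rect = (90, 10, 190, 90)  # global position of tile 2
overlap = rect_overlap(tile1_rect, tile2_rect)             # (90, 10, 100, 80)
overlap_tile_1 = to_tile_coordinates(overlap, tile1_rect)  # (90, 10, 100, 80)
overlap_tile_2 = to_tile_coordinates(overlap, tile2_rect)  # (0, 0, 10, 70)
tile1 = np.zeros((80, 100))
region = tile1[overlap_tile_1[1]:overlap_tile_1[3], overlap_tile_1[0]:overlap_tile_1[2]]
```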

iss_preprocess.pipeline.stitch.get_tform_to_ref(data_path, prefix, tile_coors, corrected_shifts=None)

Load the transformation to reference for a tile

Parameters:
  • data_path (str) – Relative path to data

  • prefix (str) – Acquisition prefix

  • tile_coors (tuple) – (roi, tileX, tileY) tuple

  • corrected_shifts (str, optional) – Method used to correct shifts to reference. If None, will use the one in ops. Defaults to None.

Returns:

A dictionary with the transformation parameters

Return type:

np.array

iss_preprocess.pipeline.stitch.get_tile_corners(data_path, prefix, roi)

Find the corners of all tiles for an ROI

Parameters:
  • data_path (str) – Relative path to data

  • prefix (str) – Acquisition prefix. For round-based acquisition, round 1 will be used

  • roi (int) – ROI ID

Returns:

tile_corners, ntiles[0] x ntiles[1] x 2 x 4 matrix of tile corner coordinates. Corners are in this order: [(origin), (0, 1), (1, 1), (1, 0)]

Return type:

numpy.ndarray

iss_preprocess.pipeline.stitch.load_tile_ref_coors(data_path, tile_coors, prefix, filter_r=True, projection=None, correct_illumination=True)

Load a single tile in reference coordinates

Loads a tile of prefix with channels/rounds registered.

Parameters:
  • data_path (str) – Relative path to data

  • tile_coors (tuple) – (Roi, tileX, tileY) tuple

  • prefix (str) – Acquisition to load. If genes_round or barcode_round will load all the rounds.

  • filter_r (bool, optional) – Whether to filter rounds data. Filter parameters are read from ops. Defaults to True.

  • projection (str, optional) – Projection to load. If None, will use the one in ops. Defaults to None.

  • correct_illumination (bool, optional) – Apply illumination correction. Defaults to True.

Returns:

  • np.array – A (X x Y x Nchannels x Nrounds) registered stack.

  • np.array – A (X x Y) boolean array of bad pixels that fall outside the image after registration.

Return type:

tuple

iss_preprocess.pipeline.stitch.register_adjacent_tiles(data_path, ref_coors=None, ref_ch=0, suffix='max', prefix='genes_round_1_1', correct_illumination=False, overlap_ratio=0.01, verbose=True, debug=False)

Estimate shift between adjacent imaging tiles using phase correlation.

Shifts are typically very similar across tiles, so using shifts estimated from a single reference tile for the whole acquisition works well.

Parameters:
  • data_path (str) – path to image stacks.

  • ref_coors (tuple, optional) – coordinates of the reference tile to use for registration. Must not be along the bottom or right edge of image. If None use ops[‘ref_tile’]. Defaults to None.

  • ref_ch (int, optional) – reference channel used for registration. Defaults to 0.

  • suffix (str, optional) – File name suffix. Defaults to ‘max’.

  • prefix (str, optional) – Full name of the acquisition folder. Defaults to ‘genes_round_1_1’.

  • correct_illumination (bool, optional) – Remove black levels and correct illumination before registration if True, return raw data otherwise. Defaults to False.

  • overlap_ratio (float, optional) – Minimum overlap between masks to consider the correlation results. Defaults to 0.01.

  • verbose (bool, optional) – If True, print warnings when shifts are large. Defaults to True.

  • debug (bool, optional) – Return additional information for debugging. Defaults to False.

Returns:

A tuple (shift_right, shift_down, tile_shape): the X and Y shifts between adjacent columns, the X and Y shifts between adjacent rows, and the shape of the tile.

Return type:

tuple of numpy.array
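Phase correlation finds a translation from the peak of the inverse Fourier transform of the normalised cross-power spectrum. The following is a minimal NumPy sketch of the textbook method for integer shifts only; `phase_correlation_shift` is a hypothetical name, and the sketch ignores the masking and overlap_ratio handling that this function performs.

```python
import numpy as np


def phase_correlation_shift(reference, target):
    """Return the integer (row, col) shift s such that
    np.roll(reference, s, axis=(0, 1)) best matches target."""
    f_ref = np.fft.fft2(reference)
    f_tgt = np.fft.fft2(target)
    # Normalised cross-power spectrum: keep only the phase difference
    cross_power = f_tgt * np.conj(f_ref)
    cross_power /= np.abs(cross_power) + 1e-12
    corr = np.fft.ifft2(cross_power).real
    peak = np.array(np.unravel_index(np.argmax(corr), corr.shape))
    # Peaks past half the image size correspond to negative shifts
    size = np.array(corr.shape)
    wrap = peak > size // 2
    peak[wrap] -= size[wrap]
    return peak


rng = np.random.default_rng(0)
reference = rng.random((64, 64))
target = np.roll(reference, (3, -5), axis=(0, 1))
shift = phase_correlation_shift(reference, target)
```

For real tiles the overlap is only a strip along the edge and the images are not circularly shifted, which is why masks and a minimum overlap are needed in practice.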

iss_preprocess.pipeline.stitch.register_all_rois_within(data_path, prefix=None, ref_ch=None, suffix='max-median', correct_illumination=True, roi2use=None, reload=False, save_plot=True, dimension_prefix=None, verbose=1, use_slurm=True, job_dependency=None, scripts_name=None, slurm_folder=None)

Register all tiles within each ROI

Parameters:
  • data_path (str) – Relative path to data

  • prefix (str, optional) – Prefix of acquisition to register. If None, will use the one in ops. Defaults to None.

  • ref_ch (int, optional) – Reference channel to use for registration. If None, will use the one in ops. Defaults to None.

  • suffix (str, optional) – Suffix to use to load the images. Defaults to ‘max-median’.

  • correct_illumination (bool, optional) – Correct illumination before registration. Defaults to True.

  • roi2use (list, optional) – List of ROIs to use. If None or empty, all ROIs are processed. Defaults to None.

  • reload (bool, optional) – Reload saved shifts if True. Defaults to False.

  • save_plot (bool, optional) – Save diagnostic plot. Defaults to True.

  • dimension_prefix (str, optional) – Prefix to use to find ROI dimension. Used only if the acquisition is an overview. Defaults to None, in which case ‘reference_prefix’ is used.

  • verbose (int, optional) – Verbosity level. Defaults to 1.

  • use_slurm (bool, optional) – Use SLURM to parallelize the registration. Defaults to True.

  • job_dependency (list, optional) – List of job dependencies. Defaults to None.

  • scripts_name (str, optional) – Script name for SLURM jobs. Defaults to None.

  • slurm_folder (str, optional) – Folder to save SLURM logs. Defaults to None.

Returns:

List of outputs from register_within_acquisition

Return type:

list

iss_preprocess.pipeline.stitch.register_within_acquisition(data_path, roi, prefix=None, ref_ch=None, suffix='max', correct_illumination=False, reload=True, save_plot=False, dimension_prefix='genes_round_1_1', min_corrcoef=0.6, max_delta_shift=20, verbose=2, raise_on_empty_line=False, *, use_slurm=False, dependency_type=None, job_dependency=None, slurm_folder=None, scripts_name=None, slurm_options=None, batch_param_names=None, batch_param_list=None)

Estimate shifts between all adjacent tiles of an ROI

Saves shifts as reg/f”{prefix}_within”/f”{prefix}_{roi}_shifts.npz”
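The saved file follows the pattern above. The sketch below builds that path with pathlib and round-trips an .npz file in a temporary directory. The root directory passed to the hypothetical `shifts_path` helper stands in for the processed-data folder, and the array names saved here are illustrative only; they are not necessarily the keys this function writes.

```python
import tempfile
from pathlib import Path

import numpy as np


def shifts_path(processed_root, prefix, roi):
    """Hypothetical helper: reg/{prefix}_within/{prefix}_{roi}_shifts.npz."""
    return Path(processed_root) / "reg" / f"{prefix}_within" / f"{prefix}_{roi}_shifts.npz"


with tempfile.TemporaryDirectory() as tmp:
    path = shifts_path(tmp, "genes_round_1_1", 1)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Illustrative key names only
    np.savez(path, shift_right=np.array([0.0, 1800.0]),
             shift_down=np.array([1800.0, 0.0]))
    # Copy arrays out before the temporary directory is removed
    with np.load(path) as data:
        saved = {key: data[key] for key in data.files}
```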

Parameters:
  • data_path (str) – path to image stacks.

  • roi (int) – id of ROI to load.

  • prefix (str, optional) – Full name of the acquisition folder.

  • ref_ch (int, optional) – reference channel used for registration. Defaults to None.

  • suffix (str, optional) – File name suffix. Defaults to ‘max’.

  • correct_illumination (bool, optional) – Remove black levels and correct illumination before registration if True, return raw data otherwise. Defaults to False.

  • reload (bool, optional) – If the target file already exists, reload it instead of recomputing. Defaults to True.

  • save_plot (bool, optional) – If True, save a diagnostic plot. Defaults to False.

  • dimension_prefix (str, optional) – Prefix to use to find ROI dimension. Used only if the acquisition is an overview. Defaults to ‘genes_round_1_1’.

  • min_corrcoef (float, optional) – Minimum correlation coefficient to consider a shift as valid. Defaults to 0.6.

  • max_delta_shift (int, optional) – Maximum shift, relative to the median of the row or column, to consider a shift as valid. Defaults to 20.

  • verbose (int, optional) – Verbosity level. Defaults to 2.

  • raise_on_empty_line (bool, optional) – Raise an error if a row or a column has no valid shifts. If False, replace them with the global median. Defaults to False.

Returns:

dictionary containing the shifts, tile shape and number of tiles

Return type:

dict
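The validity rules described by min_corrcoef, max_delta_shift and raise_on_empty_line can be sketched as a small post-processing step. This is a hypothetical `clean_shifts` helper, not the package's implementation. It assumes shifts are stored as an (nrows, ncols, 2) array with a matching (nrows, ncols) array of correlation coefficients, and it only checks deviations along rows.

```python
import numpy as np


def clean_shifts(shifts, corrcoef, min_corrcoef=0.6, max_delta_shift=20,
                 raise_on_empty_line=False):
    """Hypothetical sketch: replace unreliable tile shifts by medians.

    shifts: (nrows, ncols, 2) array; corrcoef: (nrows, ncols) array.
    """
    shifts = np.asarray(shifts, dtype=float)
    valid = np.asarray(corrcoef) >= min_corrcoef
    if not valid.any():
        raise ValueError("no shift passes min_corrcoef")
    # Fallback for rows without any valid shift
    global_median = np.median(shifts[valid], axis=0)
    cleaned = shifts.copy()
    for row in range(shifts.shape[0]):
        row_valid = valid[row]
        if row_valid.any():
            row_median = np.median(shifts[row][row_valid], axis=0)
            # Reject shifts too far from the median of their row
            delta = np.abs(shifts[row] - row_median).max(axis=1)
            row_valid = row_valid & (delta <= max_delta_shift)
        if not row_valid.any():
            if raise_on_empty_line:
                raise ValueError(f"row {row} has no valid shifts")
            cleaned[row] = global_median
            continue
        # Replace rejected shifts by the median of the accepted ones
        cleaned[row][~row_valid] = np.median(shifts[row][row_valid], axis=0)
    return cleaned


shifts = np.array([[[10, 2], [11, 2], [90, 2]],
                   [[5, 0], [5, 0], [5, 0]]])
corrcoef = np.array([[0.9, 0.9, 0.9],
                     [0.1, 0.2, 0.3]])
cleaned = clean_shifts(shifts, corrcoef)
```

In this example the outlier (90, 2) in the first row is replaced by that row's median of accepted shifts, and the second row, where no tile passes min_corrcoef, falls back to the global median.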

iss_preprocess.pipeline.stitch.stitch_and_register(data_path, target_prefix, reference_prefix=None, roi=1, downsample=3, ref_ch=0, target_ch=0, estimate_scale=False, estimate_rotation=True, target_projection=None, use_masked_correlation=False, debug=False)

Stitch target and reference stacks and align target to reference

To speed up registration, images are downsampled before estimating registration parameters. These parameters are then applied to the full scale image.

The reference stack always uses the “projection” from ops as suffix. The target uses the same by default, but this can be changed with target_projection.

This does not use ops[‘max_shift_rounds’].

Parameters:
  • data_path (str) – Relative path to data.

  • target_prefix (str) – Acquisition prefix to register.

  • reference_prefix (str, optional) – Acquisition prefix to register the stitched image to. Typically, “genes_round_1_1”. Defaults to None.

  • roi (int, optional) – ROI ID to register (as specified in MicroManager). Defaults to 1.

  • downsample (int, optional) – Downsample factor for estimating registration parameters. Defaults to 3.

  • ref_ch (int, optional) – Channel of the reference image used for registration. Defaults to 0.

  • target_ch (int, optional) – Channel of the target image used for registration. Defaults to 0.

  • estimate_scale (bool, optional) – Whether to estimate scaling between target and reference images. Defaults to False.

  • estimate_rotation (bool, optional) – Whether to estimate rotation between target and reference images. Defaults to True.

  • target_projection (str, optional) – Projection to use for the target stack. If None, will use the value from ops. Defaults to None.

  • use_masked_correlation (bool, optional) – Use masked correlation for registration. Defaults to False.

  • debug (bool, optional) – If True, return full xcorr. Defaults to False.

Returns:

  • numpy.ndarray – Stitched target image after registration.

  • numpy.ndarray – Stitched reference image.

  • float – Estimated rotation angle.

  • tuple – Estimated X and Y shifts.

  • float – Estimated scaling factor.

  • dict – Debug information, if debug is True.

Return type:

tuple
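Estimating parameters on downsampled images and applying them at full scale works because an isotropic downsample by a factor d divides translations by d while leaving rotation angles and scale ratios unchanged. The sketch below shows this with simple block averaging. `block_downsample` and `upscale_shift` are hypothetical helpers, and block averaging is only one possible downsampling scheme; this function may downsample differently.

```python
import numpy as np


def block_downsample(img, factor):
    """Average non-overlapping factor x factor blocks (edges are cropped)."""
    h = img.shape[0] // factor * factor
    w = img.shape[1] // factor * factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def upscale_shift(shift_small, factor):
    """A translation estimated at low resolution, expressed in full-scale pixels."""
    return np.asarray(shift_small, dtype=float) * factor


# A 6-pixel shift at full scale becomes a 2-pixel shift after downsampling by 3
full = np.zeros((12, 12))
full[0:3, 0:3] = 1
shifted = np.roll(full, 6, axis=1)
small, small_shifted = block_downsample(full, 3), block_downsample(shifted, 3)
```

Rotation angles and scale factors estimated on the downsampled pair can be reused unchanged; only the translation needs rescaling.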

iss_preprocess.pipeline.stitch.stitch_registered(data_path, prefix, roi, channels=0, ref_prefix=None, filter_r=False, projection=None, correct_illumination=True)

Load registered stacks and stitch them

The output is in the reference coordinates.

Parameters:
  • data_path (str) – Relative path to data

  • prefix (str) – Prefix of acquisition to stitch

  • roi (int) – Roi ID

  • channels (list or int, optional) – Channel id(s). Defaults to 0.

  • ref_prefix (str, optional) – Prefix of reference acquisition to load shifts. If None, load from ops. Defaults to None.

  • filter_r (bool, optional) – Whether to filter images before stitching. Defaults to False.

  • projection (str, optional) – Projection to load. If None, will use the one in ops. Defaults to None.

  • correct_illumination (bool, optional) – Correct illumination before stitching. Defaults to True.

Returns:

stitched stack

Return type:

np.array

iss_preprocess.pipeline.stitch.stitch_tiles(data_path, prefix, roi=1, suffix='max', ich=0, correct_illumination=False, shifts_prefix=None, register_channels=True, allow_quick_estimate=False, filter_r=False)

Load and stitch tile images using saved tile shifts.

This loads the tile shifts saved by register_within_acquisition.

Parameters:
  • data_path (str) – path to image stacks.

  • prefix (str) – prefix specifying which images to load, e.g. ‘round_01_1’

  • roi (int, optional) – id of ROI to load. Defaults to 1.

  • suffix (str, optional) – filename suffix. Defaults to ‘max’.

  • ich (int, optional) – index of the channel to stitch. Defaults to 0.

  • correct_illumination (bool, optional) – Remove black levels and correct illumination if True, return raw data otherwise. Defaults to False.

  • shifts_prefix (str, optional) – prefix to use to load tile shifts. If not provided, use prefix. Defaults to None.

  • register_channels (bool, optional) – If True, register channels before stitching. Defaults to True.

  • allow_quick_estimate (bool, optional) – If True, will estimate shifts from a single tile if shifts.npz is not found. Defaults to False.

Returns:

stitched image.

Return type:

numpy.ndarray
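Given the two shift vectors estimated by register_within_acquisition, the origin of each tile is a linear combination of the column and row shifts. This minimal sketch pastes tiles onto a canvas that way. The helper `stitch_grid`, the dict-of-tiles input and the (row, col) pixel ordering of the shifts are assumptions for illustration. Overlaps are simply overwritten by later tiles, and the package may handle overlaps, subpixel shifts or channel registration differently.

```python
import numpy as np


def stitch_grid(tiles, shift_right, shift_down):
    """Hypothetical sketch: paste tiles[(ix, iy)] onto one canvas.

    shift_right / shift_down: integer (row, col) offsets between adjacent
    columns / rows of tiles.
    """
    shift_right = np.asarray(shift_right, dtype=int)
    shift_down = np.asarray(shift_down, dtype=int)
    first = next(iter(tiles.values()))
    tile_shape = np.array(first.shape)
    # Tile origin = column index * column shift + row index * row shift
    origins = {key: key[0] * shift_right + key[1] * shift_down for key in tiles}
    all_origins = np.array(list(origins.values()))
    offset = all_origins.min(axis=0)  # allows negative shifts
    canvas_shape = (all_origins - offset).max(axis=0) + tile_shape
    canvas = np.zeros(tuple(int(s) for s in canvas_shape), dtype=first.dtype)
    for key, tile in tiles.items():
        r, c = origins[key] - offset
        canvas[r:r + tile_shape[0], c:c + tile_shape[1]] = tile
    return canvas


# Cut a 2 x 2 grid of overlapping 10 x 10 tiles out of an 18 x 18 image
image = np.arange(18 * 18).reshape(18, 18)
tiles = {(ix, iy): image[iy * 8:iy * 8 + 10, ix * 8:ix * 8 + 10]
         for ix in range(2) for iy in range(2)}
stitched = stitch_grid(tiles, shift_right=(0, 8), shift_down=(8, 0))
```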

iss_preprocess.pipeline.stitch.warp_stack_to_ref(stack, data_path, prefix, tile_coors, interpolation=1, bad_pixels=None)

Warp a stack to the reference coordinates

Parameters:
  • stack (np.array) – A (X x Y x Nchannels x Nrounds) stack

  • data_path (str) – Relative path to data

  • prefix (str) – Acquisition to use to find registration parameters

  • tile_coors (tuple) – (Roi, tileX, tileY) tuple

  • interpolation (int, optional) – Interpolation order. Defaults to 1.

  • bad_pixels (np.array, optional) – A (X x Y) boolean array of bad pixels that fall outside image after registration. If None, will not apply any mask. Defaults to None.

Returns:

A tuple of two arrays: a (X x Y x Nchannels x Nrounds) registered stack, and a (X x Y) boolean array of bad pixels that fall outside the image after registration.

Return type:

tuple of np.array
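The bad-pixel mask marks positions that no longer receive data after warping. A simplified way to see where it comes from is to restrict the warp to one integer translation per round. In the sketch below, `translate_plane` and `warp_stack_integer` are hypothetical helpers, and the per-round integer shifts stand in for the registration parameters this function reads for prefix. Because the shifts are integers, no interpolation order is needed.

```python
import numpy as np


def translate_plane(plane, dy, dx):
    """Integer translation out[y, x] = plane[y - dy, x - dx], plus a mask
    of output pixels that received data."""
    h, w = plane.shape
    out = np.zeros_like(plane)
    valid = np.zeros((h, w), dtype=bool)
    if abs(dy) >= h or abs(dx) >= w:
        return out, valid  # no overlap at all
    dst = (slice(max(dy, 0), h + min(dy, 0)), slice(max(dx, 0), w + min(dx, 0)))
    src = (slice(max(-dy, 0), h + min(-dy, 0)), slice(max(-dx, 0), w + min(-dx, 0)))
    out[dst] = plane[src]
    valid[dst] = True
    return out, valid


def warp_stack_integer(stack, round_shifts):
    """stack: X x Y x Nchannels x Nrounds; round_shifts: one (dy, dx) per round.
    A pixel is bad if it is invalid in any channel/round."""
    warped = np.zeros_like(stack)
    all_valid = np.ones(stack.shape[:2], dtype=bool)
    for r, (dy, dx) in enumerate(round_shifts):
        for c in range(stack.shape[2]):
            warped[:, :, c, r], valid = translate_plane(stack[:, :, c, r], dy, dx)
            all_valid &= valid
    return warped, ~all_valid


rng = np.random.default_rng(0)
stack = rng.random((8, 8, 2, 2))
warped, bad_pixels = warp_stack_integer(stack, [(0, 0), (1, -2)])
```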

Module contents