iss_preprocess.call package

Submodules

iss_preprocess.call.call module

iss_preprocess.call.call.basecall_rois(rois, separate_rounds=True, rounds=(), nsamples=None)

Assign bases using a Gaussian Mixture Model.

Parameters:
  • rois (list) – list of ROI objects.

  • separate_rounds (bool) – whether to run basecalling separately on each round or on all rounds together. Default True.

  • rounds (numpy.ndarray) – array of rounds to include.

  • nsamples (int) – number of samples to include for fitting GMM. If None, all data are used for fitting. Default None.

Returns:

ROIs x rounds array of base IDs.

iss_preprocess.call.call.call_genes(sequences, codebook)

Assigns sequences to genes based on the provided codebook.

Parameters:
  • sequences (numpy.ndarray) – ROIs x rounds array of base IDs generated by basecall_rois.

  • codebook (pandas.DataFrame) – gene codes, containing ‘gii’, ‘seq’, and ‘gene’ columns.

Returns:

List of most closely matching gene names.

List of edit distances.
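The matching step can be pictured as a nearest-code search. The sketch below is a minimal illustration that assumes substitutions only, so the "edit distance" reduces to a mismatch (Hamming) count. It also assumes a hypothetical base-ID-to-letter order ("ACGT") and codebook codes with one letter per round. The function name call_genes_sketch is illustrative and is not the package's implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from integer base ID to letter; the package's real order may differ.
BASE_LETTERS = np.array(list("ACGT"))


def call_genes_sketch(sequences, codebook):
    """Match each ROI's base sequence to the codebook entry with the fewest mismatches."""
    # n_genes x n_rounds array of code letters (assumes one letter per round).
    codes = np.array([list(seq) for seq in codebook["seq"]])
    # n_rois x n_rounds array of called letters.
    letters = BASE_LETTERS[np.asarray(sequences)]
    # Mismatch (Hamming) count between every ROI and every gene code.
    distances = (letters[:, None, :] != codes[None, :, :]).sum(axis=2)
    best = distances.argmin(axis=1)
    genes = np.asarray(codebook["gene"])[best].tolist()
    edit_distances = distances[np.arange(len(best)), best].tolist()
    return genes, edit_distances


codebook = pd.DataFrame({"gii": [0, 1], "seq": ["ACG", "TTA"], "gene": ["geneA", "geneB"]})
genes, dists = call_genes_sketch(np.array([[0, 1, 2], [3, 3, 1]]), codebook)
```

Here the second ROI reads "TTC", which is one mismatch away from "TTA".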

iss_preprocess.call.call.extract_spots(spots, stack, spot_radius=2)

Extract fluorescence traces of spots and assign them to a column of the DataFrame.

Parameters:
  • spots (pandas.DataFrame)

  • stack (numpy.ndarray) – X x Y x C x R stack.

  • spot_radius (int, optional) – Radius of the spot. Defaults to 2.

Returns:

Same as input, with a new “traces” column containing an R x C array of fluorescence values for each spot.

Return type:

spots (pandas.DataFrame)
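One way to realise this extraction is to average the stack over a disk of radius spot_radius around each spot. The sketch below makes several assumptions. The spots table has hypothetical integer "x" and "y" columns that index the first two stack axes. Disks that reach the image border reuse edge pixels. Each trace is transposed from C x R to the documented R x C layout.

```python
import numpy as np
import pandas as pd


def extract_spots_sketch(spots, stack, spot_radius=2):
    """Average fluorescence in a disk around each spot of an X x Y x C x R stack.

    Assumes hypothetical integer "x" and "y" columns in the spots table.
    """
    offsets = np.arange(-spot_radius, spot_radius + 1)
    dx, dy = np.meshgrid(offsets, offsets, indexing="ij")
    inside = dx**2 + dy**2 <= spot_radius**2  # pixels within the disk
    dx, dy = dx[inside], dy[inside]

    traces = np.empty(len(spots), dtype=object)  # one R x C array per spot
    for i, (x, y) in enumerate(zip(spots["x"].astype(int), spots["y"].astype(int))):
        # Clip so disks near the border reuse edge pixels instead of failing.
        xs = np.clip(x + dx, 0, stack.shape[0] - 1)
        ys = np.clip(y + dy, 0, stack.shape[1] - 1)
        # stack[xs, ys] is n_pixels x C x R; average pixels, then transpose to R x C.
        traces[i] = stack[xs, ys].mean(axis=0).T
    spots = spots.copy()  # leave the caller's table untouched
    spots["traces"] = traces
    return spots
```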

iss_preprocess.call.call.get_cluster_means(spots, initial_cluster_mean, score_thresh=0.0)

Find the mean of the 4 clusters (one per channel).

Parameters:
  • spots (pandas.DataFrame) – DataFrame of extracted spots.

  • score_thresh (float, optional) – score_thresh argument passed to scaled_k_means. Scalar between 0 and 1. To use a different threshold for each cluster, give a list of Nc floats. Only points whose dot product with a cluster mean vector exceeds this threshold contribute to the new estimate of that mean vector. Defaults to 0.

Returns:

cluster_means (list) – A list with Nrounds elements, each an Ncl x Nch array of cluster means (square because the number of channels equals the number of clusters).

spot_colors (numpy.ndarray) – Nrounds x Nch x Nspots array of spot colors.

cluster_inds (list) – A list with Nrounds elements. Each an Nspots array of

Return type:

tuple of (list, numpy.ndarray, list)
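The thresholded mean update described under score_thresh can be sketched for a single round as a spherical k-means loop. This is an assumption for illustration, not the package's scaled_k_means. Spot colors are normalized to unit length and assigned to the cluster mean with the highest dot product. Only spots scoring above the threshold then update that mean.

```python
import numpy as np


def cluster_means_sketch(x, initial_means, score_thresh=0.0, n_iter=20):
    """Spherical k-means style estimate of cluster means for one round.

    x: Nspots x Nch spot colors; initial_means: Ncl x Nch initial cluster means.
    score_thresh may be a scalar or a list with one value per cluster.
    """
    eps = 1e-12  # guards against division by zero for all-zero vectors
    x_unit = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    means = initial_means / (np.linalg.norm(initial_means, axis=1, keepdims=True) + eps)
    thresh = np.broadcast_to(np.asarray(score_thresh, dtype=float), (len(means),))
    for _ in range(n_iter):
        scores = x_unit @ means.T  # Nspots x Ncl dot products
        inds = scores.argmax(axis=1)
        best = scores[np.arange(len(x_unit)), inds]
        for k in range(len(means)):
            # Only well-matching spots contribute to the new mean vector.
            members = x_unit[(inds == k) & (best > thresh[k])]
            if len(members):
                m = members.mean(axis=0)
                means[k] = m / (np.linalg.norm(m) + eps)
    # Final assignment of each spot to its best-matching cluster.
    inds = (x_unit @ means.T).argmax(axis=1)
    return means, inds
```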

iss_preprocess.call.call.rois_to_array(rois, normalize=True)

iss_preprocess.call.omp module

iss_preprocess.call.omp.barcode_spots_dot_product(spots, cluster_means, norm_shift=0, sequence_column='sequence')

Compute dot product between synthetic trace and observed trace for each spot.

The synthetic trace is estimated using the provided bleedthrough matrix. The observed trace is first background-subtracted using the same approach as in the OMP algorithm.

Parameters:
  • spots (pandas.DataFrame) – barcode spot table containing ‘trace’ column with fluorescence values.

  • cluster_means (numpy.ndarray) – Nrounds x Nchannels x Nclusters bleedthrough matrix of fluorescence values for each cluster (i.e. base).

  • norm_shift (float) – small value added to the norm of the observed trace. This penalizes the dot product score for spots with very low signal.

  • sequence_column (str) – name of column in spots table containing the sequence. Default is ‘sequence’, but could also be ‘corrected_sequence’.

Returns:

List of dot product scores for each spot.
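For a single spot, the score can be sketched as a normalized dot product between the background-subtracted observed trace and the synthetic trace. The synthetic trace takes, for each round, the channel profile of the called base. This sketch rests on three assumptions. Background removal is a least-squares fit of the background vectors. The flattened trace uses the layout index = round * Nch + channel. The function name is hypothetical. Applying it row by row to the spots table yields the documented list of scores.

```python
import numpy as np


def barcode_dot_product_sketch(trace, sequence, cluster_means, background_vectors, norm_shift=0.0):
    """Score one spot against the synthetic trace implied by its base sequence.

    trace: Nrounds x Nch observed fluorescence.
    sequence: length-Nrounds array of base (cluster) indices.
    cluster_means: Nrounds x Nch x Ncl bleedthrough matrix.
    background_vectors: (Nrounds * Nch) x Nbg matrix (assumed layout r * Nch + c).
    """
    sequence = np.asarray(sequence)
    rounds = np.arange(len(sequence))
    # Synthetic trace: for each round, the channel profile of the called base.
    synthetic = cluster_means[rounds, :, sequence].ravel()
    # Background subtraction: remove the least-squares fit of the background vectors.
    y = np.asarray(trace, dtype=float).ravel()
    bg_coefs = np.linalg.lstsq(background_vectors, y, rcond=None)[0]
    y = y - background_vectors @ bg_coefs
    # norm_shift inflates the denominator, penalizing dim spots.
    denom = (np.linalg.norm(y) + norm_shift) * np.linalg.norm(synthetic)
    return float(y @ synthetic / denom)
```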

iss_preprocess.call.omp.make_background_vectors(nrounds=7, nchannels=4)

Create background vectors for OMP algorithm. There is one vector for each channel. Each vector has fluorescence in one channel across all rounds. Vectors are normalized to have unit norm.

Parameters:
  • nrounds (int) – number of rounds.

  • nchannels (int) – number of channels.

Returns:

round x channel numpy.ndarray of background vectors.
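The construction described above can be sketched in a few lines. One assumption here is the output layout. The sketch returns an (nrounds * nchannels) x nchannels matrix, so it can serve directly as the N x O background dictionary used by OMP, with rows ordered as round * nchannels + channel. The package's own return shape may differ.

```python
import numpy as np


def make_background_vectors_sketch(nrounds=7, nchannels=4):
    """One unit-norm vector per channel, constant across rounds in that channel."""
    vectors = np.zeros((nrounds, nchannels, nchannels))
    for c in range(nchannels):
        vectors[:, c, c] = 1.0  # vector c: signal in channel c, every round
    vectors /= np.sqrt(nrounds)  # each vector has nrounds ones, so this gives unit norm
    # Flatten rounds and channels: row index = round * nchannels + channel.
    return vectors.reshape(nrounds * nchannels, nchannels)
```

Because the vectors occupy disjoint channels, they are mutually orthogonal as well as unit-norm.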

iss_preprocess.call.omp.make_gene_templates(cluster_means, codebook)

Make dictionary of fluorescence values for each gene by finding well-matching spots.

Parameters:
  • cluster_means (numpy.ndarray) – Nrounds x Nchannels x Nclusters bleedthrough matrix of fluorescence values for each cluster (i.e. base).

  • codebook (pandas.DataFrame) – gene codes, containing ‘gii’, ‘seq’, and ‘gene’ columns.

Returns:

N x genes numpy.ndarray containing dictionary of fluorescence values for each gene.

List of detected gene names.
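Given the signature (cluster_means, codebook), one plausible way to build the N x genes dictionary is to concatenate, round by round, the bleedthrough profile of each code letter. The summary line describes a spot-matching approach, so treat this sketch as an assumption rather than the package's method. The base_order letter-to-cluster mapping is hypothetical. Each code is assumed to have at least Nrounds letters.

```python
import numpy as np
import pandas as pd


def make_gene_templates_sketch(cluster_means, codebook, base_order="ACGT"):
    """Build one unit-norm template per codebook entry from the bleedthrough matrix.

    cluster_means: Nrounds x Nch x Ncl; base_order (hypothetical) maps letters to clusters.
    """
    nrounds = cluster_means.shape[0]
    rounds = np.arange(nrounds)
    templates = []
    for seq in codebook["seq"]:
        clusters = np.array([base_order.index(b) for b in seq[:nrounds]])
        # Concatenate the expected channel profile of each round's base.
        template = cluster_means[rounds, :, clusters].ravel()
        templates.append(template / np.linalg.norm(template))
    # N x genes dictionary (N = Nrounds * Nch) and matching gene names.
    return np.stack(templates, axis=1), list(codebook["gene"])
```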

iss_preprocess.call.omp.omp(y, X, background_vectors=None, max_comp=None, tol=0.05)

Run Orthogonal Matching Pursuit to identify components present in the input signal.

The algorithm works iteratively. At each step, it finds the component with the highest dot product with the residual of the input signal. After a component is selected, the coefficients of all included components are estimated by least-squares regression and the residuals are updated. The component is retained only if it reduces the norm of the residuals by at least a fraction of the original norm, specified by the tolerance parameter.

Background vectors are automatically included.

The algorithm stops when the tolerance threshold is reached or the number of components reaches max_comp.

Parameters:
  • y (numpy.ndarray) – length N input signal.

  • X (numpy.ndarray) – N x M dictionary of M components.

  • background_vectors (numpy.ndarray) – N x O dictionary of background components.

  • max_comp (int) – maximum number of components to include.

  • tol (float) – tolerance threshold: a component is retained only if it reduces the residual norm by at least this fraction of the norm of the input signal.

Returns:

Length M + O array of component coefficients.

Length N array of residuals.
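The procedure above can be sketched directly in NumPy. This is a minimal greedy OMP under the stated rules, not the package's implementation. Background vectors are fitted first and kept throughout. The candidate gene with the largest dot product against the residual is tried next. It is accepted only if the residual norm drops by at least tol times the input norm.

```python
import numpy as np


def omp_sketch(y, X, background_vectors=None, max_comp=None, tol=0.05):
    """Greedy OMP: returns (M + O coefficients, length-N residual)."""
    n, m = X.shape
    if background_vectors is None:
        background_vectors = np.zeros((n, 0))
    o = background_vectors.shape[1]
    D = np.hstack([X, background_vectors])  # N x (M + O) full dictionary

    def fit(idx):
        # Least-squares coefficients for the selected columns; others stay zero.
        c = np.zeros(m + o)
        if idx:
            c[idx] = np.linalg.lstsq(D[:, idx], y, rcond=None)[0]
        return c, y - D @ c

    selected = list(range(m, m + o))  # background vectors are always included
    coefs, residual = fit(selected)
    y_norm = np.linalg.norm(y)
    max_comp = m if max_comp is None else min(max_comp, m)
    n_genes = 0
    while n_genes < max_comp and y_norm > 0:
        scores = X.T @ residual
        scores[[i for i in selected if i < m]] = -np.inf  # skip genes already chosen
        k = int(np.argmax(scores))
        new_coefs, new_residual = fit(selected + [k])
        # Keep k only if it cuts the residual norm by at least tol * |y|.
        if np.linalg.norm(residual) - np.linalg.norm(new_residual) < tol * y_norm:
            break
        selected.append(k)
        coefs, residual = new_coefs, new_residual
        n_genes += 1
    return coefs, residual
```

With three orthogonal genes and one background vector, a signal of [2, 0, 1, 0.5] is explained by genes 0 and 2 plus the background. The search stops once gene 1 would add nothing.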

iss_preprocess.call.omp.omp_weighted(y, X, background_vectors=None, max_comp=None, tol=0.05, alpha=120.0, beta_squared=1.0, weighted=True, refit_background=False, norm_shift=0.0)

Run Orthogonal Matching Pursuit to identify components present in the input signal.

The algorithm works iteratively. At each step, it finds the component with the highest dot product with the residual of the input signal. After a component is selected, the coefficients of all included components are estimated by least-squares regression and the residuals are updated. The component is retained only if it reduces the norm of the residuals by at least a fraction of the original norm, specified by the tolerance parameter.

Background vectors are automatically included.

The algorithm stops when the tolerance threshold is reached or the number of components reaches max_comp.

Parameters:
  • y (numpy.ndarray) – length N input signal.

  • X (numpy.ndarray) – N x M dictionary of M components.

  • background_vectors (numpy.ndarray) – N x O dictionary of background components.

  • max_comp (int) – maximum number of components to include.

  • tol (float) – tolerance threshold: a component is retained only if it reduces the residual norm by at least this fraction of the norm of the input signal.

  • max_comp (int) – maximum number of components to include.

  • alpha (float) – Controls the influence of the previously selected components on the current weights. Higher alpha increases the effect of the selected components' contributions, making the algorithm more sensitive to the already chosen components.

  • beta_squared (float) – This parameter sets a baseline for the variance in the weights calculation. It ensures that the weights are not solely dependent on the residuals but also have a minimum variance that can stabilize the process.

  • weighted (bool) – whether to use weighted OMP. Default is True.

  • refit_background (bool) – whether to refit background coefficients on every iteration. Default is False.

  • norm_shift (float) – additional shift to add to the norm of the pixel trace. Larger values reduce false positive gene calls in dim pixels. Default is 0.

Returns:

Length M + O array of component coefficients.

Length N array of residuals.
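The exact weighting formula is not given here, so the sketch below only illustrates one form consistent with the alpha and beta_squared descriptions. Each element of the trace is given a variance of beta_squared plus alpha times the squared contribution of the already selected components. Its weight is the inverse of that variance, so elements dominated by chosen components count less when the next component is picked. Both function names and the variance model are assumptions.

```python
import numpy as np


def omp_weights_sketch(X, selected, coefs, alpha=120.0, beta_squared=1.0):
    """Assumed per-element weights: 1 / (beta_squared + alpha * selected contributions)."""
    if len(selected) == 0:
        # No components chosen yet: only the baseline variance applies.
        variance = np.full(X.shape[0], float(beta_squared))
    else:
        # Squared contribution of each selected component to each element.
        contributions = (X[:, selected] ** 2) @ (np.asarray(coefs, dtype=float)[selected] ** 2)
        variance = beta_squared + alpha * contributions
    return 1.0 / variance


def weighted_scores_sketch(X, residual, weights):
    """Weighted dot products used to pick the next component."""
    return X.T @ (weights * residual)
```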

iss_preprocess.call.omp.refine_gene_templates(rois, gene_dict, unique_genes, thresh=0.8, vis=False)

Refine gene templates by finding spots that match the template and averaging their fluorescence values.

TODO: This function is currently unused. Needs to be updated to work with new data structures.

Parameters:
  • rois (list) – list of ROI objects containing fluorescence traces

  • gene_dict (N x genes numpy.ndarray) – dictionary of fluorescence values for each gene.

  • unique_genes (list) – list of gene names.

  • thresh (float) – threshold for matching spots to gene template. Default: 0.8.

  • vis (bool) – whether to visualize gene templates. Default: False.

Returns:

N x genes numpy.ndarray containing dictionary of fluorescence values for each gene.

iss_preprocess.call.omp.run_omp(stack, gene_dict, tol=0.05, weighted=True, refit_background=True, alpha=120.0, beta_squared=1.0, norm_shift=0.0, max_comp=None, min_intensity=0)

Apply the OMP algorithm to every pixel of the provided image stack.

Parameters:
  • stack (numpy.ndarray) – X x Y x C x R image stack.

  • gene_dict (numpy.ndarray) – N x M dictionary, where N = R * C and M is the number of genes.

  • tol (float) – tolerance threshold for OMP algorithm.

  • weighted (bool) – whether to use weighted OMP. Default is True.

  • refit_background (bool) – whether to refit background coefficients on every iteration. Default is True.

  • alpha (float) – parameter for weighted OMP.

  • beta_squared (float) – parameter for weighted OMP.

  • norm_shift (float) – additional shift to add to the norm of the pixel trace. Larger values reduce false positive gene calls in dim pixels. Default is 0.

  • max_comp (int) – maximum number of components to use in OMP. Default is None, in which case OMP proceeds until the tolerance threshold is reached.

  • min_intensity (float) – minimum intensity for a pixel to be considered. Calculated as the mean absolute value of the pixel trace. Default is 0.

Returns:

Gene coefficient matrix of shape X x Y x M.

Background coefficient matrix of shape X x Y x C.

Residual stack of shape X x Y x (R * C).
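The per-pixel loop and the reshaping between the stack and the three outputs can be sketched as follows. To stay self-contained, the sketch takes a hypothetical solve_pixel callable in place of the OMP call. That callable returns gene coefficients first, then background coefficients, followed by the residual. Several details are assumptions. Each pixel's C x R block is flattened in round-major order (index = r * C + c). The min_intensity test uses the mean absolute value. Skipped pixels keep zero coefficients and their raw trace as the residual.

```python
import numpy as np


def run_omp_sketch(stack, gene_dict, background_vectors, solve_pixel, min_intensity=0.0):
    """Apply a per-pixel solver to every pixel of an X x Y x C x R stack."""
    nx, ny, nc, nr = stack.shape
    # Reorder each pixel's C x R block to R x C and flatten: index = r * C + c.
    pixels = stack.transpose(0, 1, 3, 2).reshape(nx * ny, nr * nc).astype(float)
    m, o = gene_dict.shape[1], background_vectors.shape[1]
    gene_coefs = np.zeros((nx * ny, m))
    bg_coefs = np.zeros((nx * ny, o))
    residuals = pixels.copy()  # skipped pixels keep their raw trace
    for i, y in enumerate(pixels):
        if np.mean(np.abs(y)) < min_intensity:
            continue  # too dim to be considered
        coefs, residual = solve_pixel(y, gene_dict, background_vectors)
        gene_coefs[i], bg_coefs[i] = coefs[:m], coefs[m:]
        residuals[i] = residual
    return (
        gene_coefs.reshape(nx, ny, m),
        bg_coefs.reshape(nx, ny, o),
        residuals.reshape(nx, ny, nr * nc),
    )
```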

iss_preprocess.call.spot_shape module

iss_preprocess.call.spot_shape.apply_symmetry(spot_sign_image)

Generates a circularly symmetric spot image by averaging pixels at the same distance from the centre.

Parameters:

spot_sign_image (numpy.ndarray) – input spot image

Returns:

circularly symmetric spot image

Return type:

numpy.ndarray
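Radial averaging can be sketched by grouping pixels by their distance from the image centre. This sketch groups pixels by exact Euclidean distance, rounded to six decimals so floating-point noise does not split a ring. Whether the package bins distances the same way, for example by rounding to integers, is an assumption.

```python
import numpy as np


def apply_symmetry_sketch(spot_sign_image):
    """Replace each pixel by the mean of all pixels at the same distance from the centre."""
    h, w = spot_sign_image.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    yy, xx = np.indices((h, w))
    # Round distances so pixels on the same ring share a group.
    dist = np.round(np.hypot(yy - cy, xx - cx), 6)
    out = np.zeros_like(spot_sign_image, dtype=float)
    for d in np.unique(dist):
        ring = dist == d
        out[ring] = spot_sign_image[ring].mean()
    return out
```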

iss_preprocess.call.spot_shape.detect_spots_by_shape(im, spot_sign_image, threshold=0, rho=2)

Detect spots in an image based on similarity to a spot sign image.

Parameters:
  • im (numpy.ndarray) – input image

  • spot_sign_image (numpy.ndarray) – average spot sign image to use as a template in filtering spots

  • threshold (float) – threshold for initial spot detection. Default: 0.

  • rho (float) – multiplier that defines the relative weight assigned to positive spot pixels. Default: 2.

Returns:

spot coordinates and scores

Return type:

pandas.DataFrame
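The scoring rule is not spelled out here, so the sketch below is a hypothetical illustration of how rho could enter. Candidates are 3 x 3 local maxima above threshold. Each is scored as the rho-weighted fraction of template pixels whose sign the image patch matches. Positive template pixels count rho times as much as negative ones. The column names "x", "y" and "score", the odd square template, and the edge handling are all assumptions.

```python
import numpy as np
import pandas as pd


def shape_score_sketch(patch, spot_sign_image, rho=2.0):
    """Assumed score: rho-weighted fraction of template pixels whose sign the patch matches."""
    pos = spot_sign_image > 0
    neg = spot_sign_image < 0
    matched = rho * np.sum(patch[pos] > 0) + np.sum(patch[neg] < 0)
    return matched / (rho * pos.sum() + neg.sum())


def detect_spots_sketch(im, spot_sign_image, threshold=0.0, rho=2.0):
    """Find 3 x 3 local maxima above threshold and score them against the template."""
    im = np.asarray(im, dtype=float)
    h, w = im.shape
    r = spot_sign_image.shape[0] // 2  # assumes an odd, square template
    padded = np.pad(im, 1, mode="constant", constant_values=-np.inf)
    # Maximum over each pixel's 3 x 3 neighbourhood (including itself).
    neighbourhood_max = np.max(
        [padded[1 + dx:1 + dx + h, 1 + dy:1 + dy + w] for dx in (-1, 0, 1) for dy in (-1, 0, 1)],
        axis=0,
    )
    peaks = (im >= neighbourhood_max) & (im > threshold)
    rows = []
    for x, y in zip(*np.nonzero(peaks)):
        if r <= x < h - r and r <= y < w - r:  # skip peaks too close to the edge
            patch = im[x - r:x + r + 1, y - r:y + r + 1]
            rows.append((int(x), int(y), float(shape_score_sketch(patch, spot_sign_image, rho))))
    return pd.DataFrame(rows, columns=["x", "y", "score"])
```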

iss_preprocess.call.spot_shape.find_gene_spots(g, spot_sign_image, gene_names, rho=2, spot_score_threshold=0.05, disk_radius=2)

Finds gene spots and extracts additional gene coefficient statistics.

Parameters:
  • g (numpy.ndarray) – X x Y x Ngenes OMP output

  • spot_sign_image (numpy.ndarray) – Average spot sign image for filtering

  • gene_names (list) – List of gene names corresponding to g’s third dimension

  • rho (float) – Weight multiplier for positive spot pixels (default: 2)

  • spot_score_threshold (float) – Minimum score threshold for including spots (default: 0.05)

  • disk_radius (int) – Radius of the disk to extract gene coefficients (default: 2)

Returns:

List of pandas.DataFrame with spot coordinates and scores for each gene.

pandas.DataFrame with spot x gene coefficients.

Return type:

tuple of (list, pandas.DataFrame)

iss_preprocess.call.spot_shape.get_spot_shape(g, spot_xy=7, neighbor_filter_size=9, neighbor_threshold=15)

Get average spot shape.

Parameters:
  • g (numpy.ndarray) – X x Y x Ngenes OMP output

  • spot_xy (int) – spot radius to extract

  • neighbor_filter_size (int) – size of the square filter used for counting pixels in initial spot selection

  • neighbor_threshold (int) – minimum number of positive pixels for a spot to be included in the average

Returns:

(spot_xy + 1) x (spot_xy + 1) mean spot image.

Return type:

numpy.ndarray

Module contents