iss_preprocess.call package
Submodules
iss_preprocess.call.call module
- iss_preprocess.call.call.basecall_rois(rois, separate_rounds=True, rounds=(), nsamples=None)
Assign bases using a Gaussian Mixture Model.
- Parameters:
rois (list) – list of ROI objects.
separate_rounds (bool) – whether to run basecalling separately on each round or on all rounds together. Default True.
rounds (numpy.ndarray) – array of rounds to include.
nsamples (int) – number of samples to include for fitting GMM. If None, all data are used for fitting. Default None.
- Returns:
ROIs x rounds array of base IDs.
- iss_preprocess.call.call.call_genes(sequences, codebook)
Assigns sequences to genes based on the provided codebook.
- Parameters:
sequences (numpy.ndarray) – ROIs x rounds array of base IDs generated by basecall_rois.
codebook (pandas.DataFrame) – gene codes, containing ‘gii’, ‘seq’, and ‘gene’ columns.
- Returns:
List of most closely matching gene names.
List of edit distances.
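The matching step can be illustrated with a minimal sketch. It is not the package's implementation: the function name `closest_genes`, the base order in `BASES`, and the gene names are hypothetical, and Hamming distance stands in for the edit distance on the assumption that all barcodes have the same length.

```python
import numpy as np

BASES = "ACGT"  # hypothetical mapping from base ID (0-3) to letter


def closest_genes(sequences, codebook_seqs, codebook_genes):
    """Return the best-matching gene and its Hamming distance for each row."""
    # Encode codebook strings as integer arrays so they compare with base IDs.
    codes = np.array([[BASES.index(b) for b in seq] for seq in codebook_seqs])
    # Pairwise mismatch counts, shape (n_sequences, n_codes).
    dists = (sequences[:, None, :] != codes[None, :, :]).sum(axis=2)
    best = dists.argmin(axis=1)
    genes = [codebook_genes[i] for i in best]
    return genes, dists[np.arange(len(best)), best].tolist()


# Toy ROIs x rounds array of base IDs and a two-entry toy codebook.
sequences = np.array([[0, 1, 2, 3], [3, 3, 2, 1]])
genes, distances = closest_genes(sequences, ["ACGT", "TTGA"], ["geneA", "geneB"])
```

The second toy sequence differs from its closest code at one position, so it is still assigned to that gene but with a non-zero distance that can be thresholded downstream.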
- iss_preprocess.call.call.extract_spots(spots, stack, spot_radius=2)
Extract fluorescence traces of spots and assign them to a column of the DataFrame.
- Parameters:
spots (pandas.DataFrame)
stack (numpy.ndarray) – X x Y x C x R stack.
spot_radius (int, optional) – Radius of the spot. Defaults to 2.
- Returns:
spots (pandas.DataFrame) – same as input, with a new “traces” column containing an R x C array of fluorescence values for each spot.
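As a rough illustration of trace extraction, the sketch below averages an X x Y x C x R stack over a disk around one spot and returns an R x C array. Whether the package averages or sums, and how it handles spots near the border, are not stated here, so those choices and the name `spot_trace` are assumptions.

```python
import numpy as np


def spot_trace(stack, x, y, spot_radius=2):
    """Average an X x Y x C x R stack over a disk and return an R x C trace."""
    r = spot_radius
    # Boolean disk mask of shape (2r + 1, 2r + 1).
    dx, dy = np.mgrid[-r : r + 1, -r : r + 1]
    disk = dx**2 + dy**2 <= r**2
    # Assumes the spot lies at least `r` pixels from the image border.
    patch = stack[x - r : x + r + 1, y - r : y + r + 1]  # (2r+1, 2r+1, C, R)
    trace = patch[disk].mean(axis=0)  # C x R
    return trace.T  # R x C


stack = np.ones((20, 20, 4, 7))  # toy stack: 4 channels, 7 rounds
trace = spot_trace(stack, x=10, y=10)
```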
- iss_preprocess.call.call.get_cluster_means(spots, initial_cluster_mean, score_thresh=0.0)
Find the means of the 4 clusters (one per channel).
- Parameters:
spots (pandas.DataFrame) – DataFrame of extracted spots.
score_thresh (float, optional) – score_thresh argument for scaled_k_means. Scalar between 0 and 1. To give a different score for each cluster, give a list of Nc floats. Only points whose dot product with a cluster mean vector is greater than this contribute to the new estimate of the mean vector. Defaults to 0.
- Returns:
cluster_means (list) – a list with Nrounds elements, each an Ncl x Nch array of cluster means (square, because the number of channels equals the number of clusters).
spot_colors (numpy.ndarray) – Nrounds x Nch x Nspots array of spot colors.
cluster_inds (list) – a list with Nrounds elements, each an Nspots array of
- iss_preprocess.call.call.rois_to_array(rois, normalize=True)
iss_preprocess.call.omp module
- iss_preprocess.call.omp.barcode_spots_dot_product(spots, cluster_means, norm_shift=0, sequence_column='sequence')
Compute dot product between synthetic trace and observed trace for each spot.
The synthetic trace is estimated using the provided bleedthrough matrix. The observed trace is first background-subtracted using the same approach as in the OMP algorithm.
- Parameters:
spots (pandas.DataFrame) – barcode spot table containing ‘trace’ column with fluorescence values.
cluster_means (numpy.ndarray) – Nrounds x Nchannels x Nclusters bleedthrough matrix of fluorescence values for each cluster (i.e. base).
norm_shift (float) – small value added to the norm of the observed trace. This penalizes the dot product score for spots with very low signal.
sequence_column (str) – name of column in spots table containing the sequence. Default is ‘sequence’, but could also be ‘corrected_sequence’.
- Returns:
List of dot product scores for each spot.
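The effect of norm_shift can be seen in a small sketch of the scoring step. It omits background subtraction, and the exact normalization is an assumption: here each trace is divided by its norm, with norm_shift added to the observed norm only. The name `trace_score` is illustrative.

```python
import numpy as np


def trace_score(observed, synthetic, norm_shift=0.0):
    """Dot product of two traces after normalizing each by its norm."""
    obs = observed.ravel()
    syn = synthetic.ravel()
    # norm_shift inflates the observed norm, which lowers scores for dim spots.
    obs_unit = obs / (np.linalg.norm(obs) + norm_shift)
    syn_unit = syn / np.linalg.norm(syn)
    return float(obs_unit @ syn_unit)


synthetic = np.eye(4)[[0, 1, 2, 3, 0, 1, 2]]  # toy 7 rounds x 4 channels trace
bright = 100.0 * synthetic
dim = 0.1 * synthetic
bright_score = trace_score(bright, synthetic, norm_shift=1.0)
dim_score = trace_score(dim, synthetic, norm_shift=1.0)
```

Both toy traces match the synthetic trace perfectly in shape, so with norm_shift=0 they would score the same; a positive norm_shift separates them by brightness.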
- iss_preprocess.call.omp.make_background_vectors(nrounds=7, nchannels=4)
Create background vectors for OMP algorithm. There is one vector for each channel. Each vector has fluorescence in one channel across all rounds. Vectors are normalized to have unit norm.
- Parameters:
nrounds (int) – number of rounds
nchannels (int) – number of channels.
- Returns:
round x channel numpy.ndarray of background vectors.
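A minimal sketch of such vectors follows. It assumes each vector is flattened to length nrounds * nchannels in round-major order so that it can sit alongside gene templates in an OMP dictionary; the layout actually returned by the package may differ, and the name `background_vectors` is illustrative.

```python
import numpy as np


def background_vectors(nrounds=7, nchannels=4):
    """Return an (nrounds * nchannels) x nchannels array of unit-norm vectors."""
    vectors = np.zeros((nrounds, nchannels, nchannels))
    for ch in range(nchannels):
        # Vector `ch` is constant across rounds in channel `ch`, zero elsewhere.
        vectors[:, ch, ch] = 1.0
    vectors = vectors.reshape(nrounds * nchannels, nchannels)
    return vectors / np.linalg.norm(vectors, axis=0)


bg = background_vectors()
```

Because each vector is non-zero in a different channel, the vectors are mutually orthogonal as well as unit norm.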
- iss_preprocess.call.omp.make_gene_templates(cluster_means, codebook)
Make dictionary of fluorescence values for each gene by finding well-matching spots.
- Parameters:
rois (list) – list of ROI objects containing fluorescence traces
codebook (pandas.DataFrame) – gene codes, containing ‘gii’, ‘seq’, and ‘gene’ columns.
- Returns:
N x genes numpy.ndarray containing dictionary of fluorescence values for each gene.
List of detected gene names.
- iss_preprocess.call.omp.omp(y, X, background_vectors=None, max_comp=None, tol=0.05)
Run Orthogonal Matching Pursuit to identify components present in the input signal.
The algorithm works iteratively. At each step we find the component that has the highest dot product with the residual of the input signal. After selecting a component, coefficients for all included components are estimated by least squares regression and the residuals are updated. The component is retained if it reduces the norm of the residuals by at least a fraction of the original norm specified by the tolerance parameter.
Background vectors are automatically included.
The algorithm stops when the tolerance threshold is reached or the number of components reaches max_comp.
- Parameters:
y (numpy.ndarray) – length N input signal.
X (numpy.ndarray) – N x M dictionary of M components.
background_vectors (numpy.ndarray) – N x O dictionary of background components.
max_comp (int) – maximum number of components to include.
tol (float) – tolerance threshold that determines the minimum fraction of the residual norm to retain a component.
- Returns:
Length M + O array of component coefficients.
Length N array of residuals.
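The steps described above can be sketched in NumPy as follows. This is an illustrative reimplementation under stated assumptions, not the package's code: the name `omp_sketch` is hypothetical, gene coefficients are placed before background coefficients in the output, and the stopping test compares the drop in residual norm against tol times the norm of y.

```python
import numpy as np


def omp_sketch(y, X, background_vectors=None, max_comp=None, tol=0.05):
    """Return (length M + O coefficients, length N residual) for signal y."""
    n, m = X.shape
    if background_vectors is None:
        background_vectors = np.zeros((n, 0))
    D = np.hstack([X, background_vectors])  # genes first, then background
    max_comp = m if max_comp is None else min(max_comp, m)

    def fit(indices):
        coefs = np.zeros(D.shape[1])
        if indices:
            coefs[indices] = np.linalg.lstsq(D[:, indices], y, rcond=None)[0]
        return coefs, y - D @ coefs

    # Background components are always included in the fit.
    selected = list(range(m, D.shape[1]))
    coefs, resid = fit(selected)
    used = np.zeros(m, dtype=bool)
    for _ in range(max_comp):
        # Pick the unused component with the highest dot product with the residual.
        scores = np.where(used, -np.inf, X.T @ resid)
        best = int(np.argmax(scores))
        new_coefs, new_resid = fit(selected + [best])
        # Retain it only if the residual norm drops by at least tol * ||y||.
        drop = np.linalg.norm(resid) - np.linalg.norm(new_resid)
        if drop < tol * np.linalg.norm(y):
            break
        selected.append(best)
        used[best] = True
        coefs, resid = new_coefs, new_resid
    return coefs, resid


X = np.eye(6)[:, :3]           # three toy components
background = np.eye(6)[:, 5:]  # one toy background vector
y = 2.0 * X[:, 0] + 1.0 * background[:, 0]
coefs, resid = omp_sketch(y, X, background)
```

In the toy example the signal is built from the first component plus background, so the loop accepts one component and stops when the next candidate fails the tolerance test.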
- iss_preprocess.call.omp.omp_weighted(y, X, background_vectors=None, max_comp=None, tol=0.05, alpha=120.0, beta_squared=1.0, weighted=True, refit_background=False, norm_shift=0.0)
Run Orthogonal Matching Pursuit to identify components present in the input signal.
The algorithm works iteratively. At each step we find the component that has the highest dot product with the residual of the input signal. After selecting a component, coefficients for all included components are estimated by least squares regression and the residuals are updated. The component is retained if it reduces the norm of the residuals by at least a fraction of the original norm specified by the tolerance parameter.
Background vectors are automatically included.
The algorithm stops when the tolerance threshold is reached or the number of components reaches max_comp.
- Parameters:
y (numpy.ndarray) – length N input signal.
X (numpy.ndarray) – N x M dictionary of M components.
background_vectors (numpy.ndarray) – N x O dictionary of background components.
max_comp (int) – maximum number of components to include.
tol (float) – tolerance threshold that determines the minimum fraction of the residual norm to retain a component.
alpha (float) – Controls the influence of the previously selected components on the current weights. Higher alpha increases the effect of the selected components' contributions, making the algorithm more sensitive to the already chosen components.
beta_squared (float) – This parameter sets a baseline for the variance in the weights calculation. It ensures that the weights are not solely dependent on the residuals but also have a minimum variance that can stabilize the process.
weighted (bool) – whether to use weighted OMP. Default is True.
refit_background (bool) – whether to refit background coefficients on every iteration. Default is False.
norm_shift (float) – additional shift to add to the norm of the pixel trace. Larger values reduce false positive gene calls in dim pixels. Default is 0.
- Returns:
Length M + O array of component coefficients.
Length N array of residuals.
- iss_preprocess.call.omp.refine_gene_templates(rois, gene_dict, unique_genes, thresh=0.8, vis=False)
Refine gene templates by finding spots that match the template and averaging their fluorescence values.
TODO: This function is currently unused. Needs to be updated to work with new data structures.
- Parameters:
rois (list) – list of ROI objects containing fluorescence traces
gene_dict (N x genes numpy.ndarray) – dictionary of fluorescence values for each gene.
unique_genes (list) – list of gene names.
thresh (float) – threshold for matching spots to gene template. Default: 0.8.
vis (bool) – whether to visualize gene templates. Default: False.
- Returns:
N x genes numpy.ndarray containing dictionary of fluorescence values for each gene.
- iss_preprocess.call.omp.run_omp(stack, gene_dict, tol=0.05, weighted=True, refit_background=True, alpha=120.0, beta_squared=1.0, norm_shift=0.0, max_comp=None, min_intensity=0)
Apply the OMP algorithm to every pixel of the provided image stack.
- Parameters:
stack (numpy.ndarray) – X x Y x C x R image stack.
gene_dict (numpy.ndarray) – N x M dictionary, where N = R * C and M is the number of genes.
tol (float) – tolerance threshold for OMP algorithm.
weighted (bool) – whether to use weighted OMP. Default is True.
refit_background (bool) – whether to refit background coefficients on every iteration. Default is True.
alpha (float) – parameter for weighted OMP.
beta_squared (float) – parameter for weighted OMP.
norm_shift (float) – additional shift to add to the norm of the pixel trace. Larger values reduce false positive gene calls in dim pixels. Default is 0.
max_comp (int) – maximum number of components to use in OMP. Default is None, in which case OMP proceeds until the tolerance threshold is reached.
min_intensity (float) – minimum intensity for a pixel to be considered. Calculated as the mean absolute value of the pixel trace. Default is 0.
- Returns:
Gene coefficient matrix of shape X x Y x M.
Background coefficient matrix of shape X x Y x C.
Residual stack, shape X x Y x (R * C).
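To make the shapes concrete, the sketch below flattens an X x Y x C x R stack into per-pixel traces of length N = R * C and builds the min_intensity mask from the mean absolute value of each trace. The round-major flattening order, the strict inequality, and the name `pixel_traces` are assumptions; the order must match however gene_dict was built.

```python
import numpy as np


def pixel_traces(stack, min_intensity=0.0):
    """Flatten an X x Y x C x R stack to (X * Y) x (R * C) traces plus a mask."""
    nx, ny, nch, nrounds = stack.shape
    # Move rounds ahead of channels, then flatten each pixel to length R * C.
    traces = np.moveaxis(stack, 2, 3).reshape(nx * ny, nrounds * nch)
    # Pixels whose mean absolute value is too low would be skipped by the solver.
    keep = np.abs(traces).mean(axis=1) > min_intensity
    return traces, keep.reshape(nx, ny)


stack = np.zeros((3, 5, 4, 7))  # toy stack: 4 channels, 7 rounds
stack[1, 2] = 1.0               # a single bright pixel
traces, keep = pixel_traces(stack, min_intensity=0.5)
```

Each row of `traces` where `keep` is true could then be passed as the input signal y to a per-pixel OMP solver.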
iss_preprocess.call.spot_shape module
- iss_preprocess.call.spot_shape.apply_symmetry(spot_sign_image)
Generates a circularly symmetric spot image by averaging pixels at the same distance from the centre.
- Parameters:
spot_sign_image (numpy.ndarray) – input spot image
- Returns:
circularly symmetric spot image
- Return type:
numpy.ndarray
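One way to average pixels at the same distance from the centre is sketched below. It bins distances by rounding to the nearest integer, which is an assumption about how "the same distance" is defined; the name `radial_average` is illustrative.

```python
import numpy as np


def radial_average(image):
    """Replace each pixel by the mean of all pixels at the same rounded radius."""
    nx, ny = image.shape
    x, y = np.mgrid[:nx, :ny]
    # Integer-rounded distance from the central pixel (an assumed binning).
    radius = np.rint(np.hypot(x - nx // 2, y - ny // 2)).astype(int)
    sums = np.bincount(radius.ravel(), weights=image.ravel())
    counts = np.bincount(radius.ravel())
    return (sums / counts)[radius]


image = np.zeros((5, 5))
image[2, 3] = 4.0  # one bright pixel at distance 1 from the centre
symmetric = radial_average(image)
```

The bright pixel's value is spread evenly over every pixel in its distance bin, so the total intensity of the image is preserved.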
- iss_preprocess.call.spot_shape.detect_spots_by_shape(im, spot_sign_image, threshold=0, rho=2)
Detect spots in an image based on similarity to a spot sign image.
- Parameters:
im (numpy.ndarray) – input image
spot_sign_image (numpy.ndarray) – average spot sign image to use as a template in filtering spots
threshold (float) – threshold for initial spot detection. Default: 0.
rho (float) – multiplier that defines the relative weight assigned to positive spot pixels. Default: 2.
- Returns:
spot coordinates and scores
- Return type:
pandas.DataFrame
- iss_preprocess.call.spot_shape.find_gene_spots(g, spot_sign_image, gene_names, rho=2, spot_score_threshold=0.05, disk_radius=2)
Finds gene spots and extracts additional gene coefficient statistics.
- Parameters:
g (numpy.ndarray) – X x Y x Ngenes OMP output
spot_sign_image (numpy.ndarray) – Average spot sign image for filtering
gene_names (list) – List of gene names corresponding to g’s third dimension
rho (float) – Weight multiplier for positive spot pixels (default: 2)
spot_score_threshold (float) – Minimum score threshold for including spots (default: 0.05)
disk_radius (int) – Radius of the disk to extract gene coefficients (default: 2)
- Returns:
List of pandas.DataFrame with spot coordinates and scores for each gene.
pandas.DataFrame with spot x gene coefficients.
- Return type:
list
- iss_preprocess.call.spot_shape.get_spot_shape(g, spot_xy=7, neighbor_filter_size=9, neighbor_threshold=15)
Get average spot shape.
- Parameters:
g (numpy.ndarray) – X x Y x Ngenes OMP output
spot_xy (int) – spot radius to extract
neighbor_filter_size (int) – size of the square filter used for counting pixels in initial spot selection
neighbor_threshold (int) – minimum number of positive pixels for a spot to be included in the average
- Returns:
(spot_xy + 1) x (spot_xy + 1) mean spot image.
- Return type:
numpy.ndarray