Configuration File

This is the core of the algorithm, so this file has to be filled properly based on your data. Even if all key parameters of the algorithm are listed in the file, only a few are likely to be modified by a non-advanced user. The configuration file is divided into several sections. For each of those sections, we will review the parameters and tell you which are the most important ones.

Data

The data section is:

file_format    =          # Can be raw_binary, openephys, hdf5, ... See >> spyking-circus help -i for more info
stream_mode    = None     # None by default. Can be multi-files, or anything depending on the file format
mapping        =          # Mapping of the electrode (see http://spyking-circus.rtfd.org)
suffix         =          # Suffix to add to generated files
overwrite      = True     # If you want to filter or remove artefacts in place. Data are duplicated otherwise
output_dir     =          # By default, generated data are in the same folder as the data.
parallel_hdf5  = True     # Use the parallel HDF5 feature (if available)

Warning

This is the most important section, as it allows the code to properly load your data. If it is not properly filled, the results will be wrong. Note that depending on your file_format, you may need to add several extra parameters here, such as sampling_rate, data_dtype, … They will be requested if they cannot be inferred from the header of your data structure. To check that data are properly loaded, consider using the preview mode before launching the whole algorithm.

Parameters that are most likely to be changed:
  • file_format You must select a supported file format (see What are the supported formats) or write your own wrapper (see Write your own data format)
  • mapping This is the path to your probe mapping (see How to design a probe file)
  • stream_mode If streams in your data (multiple files, or even several streams within the same file) should be processed together (see Using multi files)
  • overwrite If True, data are overwritten during filtering, assuming the file format has write access. Otherwise, an external raw_binary file will be created during the filtering step, if filtering is performed.
  • output_dir If you want all the files generated by SpyKING CIRCUS to be in a particular directory, instead of next to the raw data
  • parallel_hdf5 Try to use the parallel write feature of HDF5. It needs to be configured (see how to install hdf5)
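
For example, a minimal data section for a hypothetical raw_binary recording could look as follows (the probe path, sampling rate, channel count and data type are illustrative, and the exact extra keys depend on your file format, see spyking-circus help -i):

file_format    = raw_binary
sampling_rate  = 30000     # Must be given explicitly: raw_binary files have no header
data_dtype     = int16     # Same remark: cannot be inferred from the file
nb_channels    = 252       # Same remark: cannot be inferred from the file
mapping        = ~/probes/mea_252.prb
overwrite      = True
parallel_hdf5  = True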

Detection

The detection section is:

radius         = auto       # Radius [in um] (if auto, read from the prb file)
N_t            = 5          # Width of the templates [in ms]
spike_thresh   = 6          # Threshold for spike detection
peaks          = negative   # Can be negative (default), positive or both
dead_channels  =            # If not empty or specified in the probe, a dictionary {channel_group : [list_of_valid_ids]}
weird_thresh   =            # If not empty, threshold [in MAD] for artefact detection

Parameters that are most likely to be changed:
  • N_t The temporal width of the templates. For in vitro data, 5ms seems a good value. For in vivo data, you should rather use 3 or even 2ms
  • radius The spatial width of the templates. By default, this value is read from the probe file. However, if you want to specify a larger or a smaller value [in um], you can do it here
  • spike_thresh The threshold for spike detection. 6-7 are good values
  • peaks By default, the code detects only negative peaks, but you can search for positive peaks, or both
  • dead_channels You can exclude dead channels either directly in the probe file, with the channels list, or with this dead_channels parameter. To do so, you must enter a dictionary of the following form {channel_group : [list_of_valid_ids]}
  • weird_thresh If you want to explicitly tell the code to ignore abnormally large peaks: all peaks whose absolute value is higher than weird_thresh times the MAD will be discarded
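
For instance, for an in vivo recording with fast spikes of both polarities, a detection section could look like this (values are illustrative):

radius         = auto
N_t            = 3          # Narrower templates, better suited to in vivo data
spike_thresh   = 6
peaks          = both       # Detect both negative and positive peaks
dead_channels  = {0 : [0, 1, 2, 5, 6]}   # Hypothetical: keep only these ids in channel group 0
weird_thresh   =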

Filtering

The filtering section is:

cut_off        = 300, auto # Min and Max (auto=nyquist) cut off frequencies for the band pass butterworth filter [Hz]
filter         = True      # If True, then a band-pass filtering is performed
remove_median  = False     # If True, the median over all channels is subtracted from each channel (movement artefacts)
common_ground  =           # If you want to use a particular channel as a reference ground: should be a valid channel number
sat_value      =           # Values higher than sat_value are set to 0 during filtering (in % of max dtype) [0,1]

Warning

The code performs the filtering of your data by writing on the file itself. Therefore, you must have a copy of your raw data elsewhere. Note that as long as you keep the parameter files, you can relaunch the code safely: the program will not filter the data multiple times, because of the flag filter_done at the end of the configuration file.

Parameters that are most likely to be changed:
  • cut_off The default value of 300Hz has been used in various recordings, but you can change it if needed. You can also specify the upper bound of the Butterworth filter
  • filter If your data are already filtered by a third program, turn that flag to False
  • remove_median Set this to True if you have movement artefacts in your in vivo recording and want to subtract the median activity over all analysed channels from each channel individually (as in the example below)
  • common_ground If you want to use a particular channel as a reference, and subtract its activity from all others. Note that the activity on this particular channel will thus be zero
  • sat_value If your recording has saturation problems, these might lead to artefacts while filtering. This option prevents the problem by tagging all values in the raw recording (before filtering) that are higher than sat_value times the maximal value allowed by the data type. These values are set to 0 and logged to disk in a file.
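
As an illustration, a filtering section for an unfiltered in vivo recording with movement artefacts might be (values are illustrative):

cut_off        = 300, auto # Band-pass between 300Hz and the Nyquist frequency
filter         = True
remove_median  = True      # Subtract the across-channel median to reduce movement artefacts
common_ground  =
sat_value      =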

Triggers

The triggers section is:

trig_file      =            # External stimuli to be considered as putative artefacts [in trig units] (see documentation)
trig_windows   =            # The time windows of those external stimuli [in trig units]
trig_unit      = ms         # The unit in which times are expressed: can be ms or timestep
clean_artefact = False      # If True, external artefacts induced by triggers will be suppressed from data
dead_file      =            # Portion of the signals that should be excluded from the analysis [in dead units]
dead_unit      = ms         # The unit in which times for dead regions are expressed: can be ms or timestep
ignore_times   = False      # If True, any spike in the dead regions will be ignored by the analysis
make_plots     =            # Generate sanity plots of the averaged artefacts [Nothing or None if no plots]

Parameters that are most likely to be changed:
  • trig_file The path to the file where your artefact times and labels are defined. See how to deal with stimulation artefacts
  • trig_windows The path to the file where your artefact temporal windows are defined. See how to deal with stimulation artefacts
  • clean_artefact If you want to remove any stimulation artefacts, defined in the previous files. See how to deal with stimulation artefacts
  • make_plots The default format to save the plots of the artefacts, one per artefact, showing all channels. You can set it to None if you do not want any
  • trig_unit If you want times/duration in the trig_file and trig_windows to be in timestep or ms
  • dead_file The path to the file where the dead portions of the recording, that should be excluded from the analysis, are specified. See how to deal with stimulation artefacts
  • dead_unit If you want times/duration in the dead_file to be in timestep or ms
  • ignore_times If you want to remove any dead portions of the recording, defined in dead_file. See how to deal with stimulation artefacts
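
For instance, to remove stimulation artefacts defined in external files, a triggers section could be (all paths are hypothetical):

trig_file      = ~/data/stim_times.dat    # Hypothetical file with artefact times and labels
trig_windows   = ~/data/stim_windows.dat  # Hypothetical file with artefact durations
trig_unit      = ms
clean_artefact = True       # Subtract the averaged artefacts from the data
make_plots     = png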

Whitening

The whitening section is:

spatial        = True      # Perform spatial whitening
max_elts       = 10000     # Max number of events per electrode (should be compatible with nb_elts)
nb_elts        = 0.8       # Fraction of max_elts that should be obtained per electrode [0-1]
output_dim     = 5         # Can be in percent of variance explained, or number of dimensions for PCA on waveforms

Parameters that are most likely to be changed:
  • output_dim If you want to save some memory usage, you can reduce the number of features kept to describe a waveform.
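
For example, to reduce memory usage, one could keep fewer PCA dimensions per waveform (value is illustrative):

spatial        = True
max_elts       = 10000
nb_elts        = 0.8
output_dim     = 3         # Keep only 3 PCA dimensions instead of the default 5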

Clustering

The clustering section is:

extraction     = median-raw # Can be either median-raw (default), median-pca, mean-pca, mean-raw, or quadratic
sub_dim        = 10         # Number of dimensions to keep for local PCA per electrode
max_elts       = 10000      # Max number of events per electrode (should be compatible with nb_elts)
nb_elts        = 0.8        # Fraction of max_elts that should be obtained per electrode [0-1]
nb_repeats     = 3          # Number of passes used for the clustering
make_plots     =            # Generate sanity plots of the clustering
merging_method = nd-bhatta  # Method to perform local merges (distance, dip, folding, nd-folding, bhatta)
merging_param  = default    # Merging parameter (see docs) (3 if distance, 0.5 if dip, 1e-9 if folding, 2 if bhatta)
sensitivity    = 3          # The only parameter to control the clustering sensitivity. The lower, the more sensitive
cc_merge       = 0.95       # If CC between two templates is higher, they are merged
dispersion     = (5, 5)     # Min and Max dispersion allowed for amplitudes [in MAD]
smart_search   = True       # Parameter to activate the smart search mode

Note

This is a key section, as bad clustering implies bad results. However, the code is very robust to parameter changes.

Parameters that are most likely to be changed:
  • extraction The method to estimate the templates. Raw methods are slower, but more accurate, as data are read from the files. PCA methods are faster, but less accurate, and may lead to some distorted templates. Quadratic is slower, and should not be used.
  • max_elts The number of elements that every electrode will try to collect, in order to perform the clustering
  • nb_repeats The number of passes performed by the algorithm to refine the density landscape
  • smart_search By default, the code will collect only a random subset of spikes on all electrodes. However, for long recordings, or if you have low thresholds, you may want to select them in a smarter manner, in order to avoid missing the large, under-represented ones. If the smart search is activated, the code will first sample the distribution of amplitudes on all channels, and then use a rejection algorithm to select spikes such that the distribution of collected amplitudes is more uniform.
  • cc_merge After local merging per electrode, this step makes sure that you do not have duplicates among your templates that may have been spread over several electrodes. All templates with a correlation coefficient higher than this parameter are merged. Remember that the more you merge, the faster the fitting step (see the example below)
  • merging_method Several methods can be used to perform greedy local merges on each electrode. Each method has a parameter, defined by merging_param. This replaces the former parameters sim_same_elec and dip_threshold
  • dispersion The spread of the amplitudes allowed, for every template, around the centroid.
  • make_plots By default, the code generates sanity plots of the clustering, one per electrode.
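
As an example, for a long recording where one wants a faster fit, a clustering section could be tuned as follows (values are illustrative, all other parameters keeping their defaults):

extraction     = median-raw
nb_repeats     = 3
cc_merge       = 0.9        # Merge more aggressively: fewer templates, faster fitting
smart_search   = True       # Sample spikes so that the amplitude distribution is more uniform
make_plots     = png        # One sanity plot of the clustering per electrode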

Fitting

The fitting section is:

amp_limits     = (0.3, 30) # Amplitudes for the templates during spike detection
amp_auto       = True      # True if amplitudes are adjusted automatically for every template
collect_all    = False     # If True, one garbage template per electrode is created, to store unfitted spikes
ratio_thresh   = 0.9       # Ratio of the spike_threshold used while fitting [0-1]. The lower the slower
mse_error      = False     # If True, RMS is collected over time, to assess quality of reconstruction

Parameters that are most likely to be changed:
  • collect_all If you want to also collect all the spike times at which no templates were fitted. This is particularly useful to debug the algorithm, and understand if something is wrong on a given channel
  • ratio_thresh If you want to get more spikes for the low amplitude templates, you can decrease this value. It will slow down the fitting procedure, but collect more spikes for the templates with an amplitude close to threshold (see the example below)
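
For instance, to debug low amplitude units, one could collect unfitted spikes and lower the fitting threshold (values are illustrative):

amp_limits     = (0.3, 30)
amp_auto       = True
collect_all    = True      # One garbage template per electrode will store unfitted spikes
ratio_thresh   = 0.8       # Catch more spikes near threshold, at the cost of speed
mse_error      = False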

Merging

The merging section is:

erase_all      = True       # If False, a prompt will ask you to remerge if merging has already been done
cc_overlap     = 0.85       # Only templates with CC higher than cc_overlap may be merged
cc_bin         = 2          # Bin size for computing CC [in ms]
default_lag    = 5          # Default length of the period to compute dip in the CC [ms]
auto_mode      = 0.75       # Between 0 (aggressive) and 1 (no merging). If empty, GUI is launched
remove_noise   = False      # If True, meta merging will remove obvious noise templates (weak amplitudes)
noise_limit    = 0.75       # Amplitude at which templates are classified as noise
sparsity_limit = 0.75       # Sparsity level (in percentage) for selecting templates as putative noise (in [0, 1])
time_rpv       = 5          # Time [in ms] to consider for Refraction Period Violations (RPV) (0 to disable)
rpv_threshold  = 0.02       # Percentage of RPV allowed while merging
merge_drifts   = True       # Try to automatically merge drifts, i.e. non overlapping spiking neurons
drift_limit    = 0.1        # Distance for drifts. The higher, the more non-overlapping the activities should be
To know more about how those merges are performed and how to use this option, see Automatic Merging.

Parameters that are most likely to be changed:
  • erase_all If you want to always erase former merging, and skip the prompt
  • auto_mode If your recording is stationary, you can try to perform a fully automated merging. By setting a positive value, you control the level of merging performed by the software. Values such as 0.75 should be a good start (see the example below), but see Automatic Merging for more details. The lower the value, the more aggressive the merging.
  • remove_noise If you want to automatically get rid of noise templates (very weak ones), just set this value to True.
  • noise_limit The normalized amplitude (with respect to the detection threshold) below which templates are considered as noise
  • sparsity_limit The sparsity level that a template must reach to be considered as noise. Internally, the code sets channels without any useful information to 0, so the sparsity is the ratio between the number of channels with non-zero values and the number of channels that should have carried a signal. Usually, noise tends to be defined on only a few channels (if not a single one)
  • time_rpv When performing merges, the code will check whether the merged unit has a valid ISI without any RPVs. If yes, the merge is performed; otherwise it is avoided. This is the default time window used to compute RPVs. If you want to disable this feature, set this value to 0.
  • rpv_threshold Percentage of RPVs allowed while merging; you can increase it if you want to be less stringent.
  • drift_limit To assess whether a unit is drifting or not, we compute distances between the histograms of the spike times for a given pair of cells, and assess how much they overlap. For drifting units, they should not overlap much, and the threshold can be set by this value. The higher it is, the more distinct the histograms must be for a merge to happen.
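
For a stationary recording, a fully automated merging could be configured as follows (values are illustrative):

erase_all      = True      # Never prompt before redoing the merging
auto_mode      = 0.75      # Fully automated merging; lower values merge more aggressively
remove_noise   = True      # Discard weak, noise-like templates automatically
noise_limit    = 0.75
time_rpv       = 5
rpv_threshold  = 0.02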

Converting

The converting section is:

erase_all      = True      # If False, a prompt will ask you to export if export has already been done
sparse_export  = True      # If True, data for phy are exported in a sparse format. Needs a recent version of phy
export_pcs     = prompt    # Can be prompt [default] or in none, all, some
export_all     = False     # If True, unfitted spikes will be exported as the last Ne templates

Parameters that are most likely to be changed:
  • erase_all If you want to always erase former export, and skip the prompt
  • sparse_export If you have a large number of templates or a very high density probe, you should use the sparse format for phy
  • export_pcs If you already know that you want to have all, some, or no PCs, and want to skip the prompt (see the example below)
  • export_all If you used the collect_all mode in the [fitting] section, you can export unfitted spike times to phy. In this case, the last N templates, where N is the number of electrodes, are the garbage collectors.
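
For example, to always export to phy without any prompt, keeping all principal components (values are illustrative):

erase_all      = True      # Never prompt before overwriting a previous export
sparse_export  = True      # Recommended for high density probes
export_pcs     = all       # Export all principal components, skipping the prompt
export_all     = False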

Extracting

The extracting section is:

safety_time    = 1         # Temporal zone around which spikes are isolated [in ms]
max_elts       = 10000     # Max number of events per template (should be compatible with nb_elts)
nb_elts        = 0.8       # Fraction of max_elts that should be obtained per electrode [0-1]
output_dim     = 5         # Percentage of variance explained while performing PCA
cc_merge       = 0.975     # If CC between two templates is higher, they are merged
noise_thr      = 0.8       # Minimal amplitudes are such that amp*min(templates) < noise_thr*threshold

This is an experimental section, not used by default in the algorithm, so nothing needs to be changed here.

Validating

The validating section is:

nearest_elec   = auto      # Validation channel (e.g. electrode closest to the ground truth cell)
max_iter       = 200       # Maximum number of iterations of the stochastic gradient descent (SGD)
learning_rate  = 1.0e-3    # Initial learning rate which controls the step-size of the SGD
roc_sampling   = 10        # Number of points to estimate the ROC curve of the BEER estimate
test_size      = 0.3       # Portion of the dataset to include in the test split
radius_factor  = 0.5       # Radius factor to modulate physical radius during validation
juxta_dtype    = uint16    # Type of the juxtacellular data
juxta_thresh   = 6         # Threshold for juxtacellular detection
juxta_valley   = False     # True if juxta-cellular spikes are negative peaks
juxta_spikes   =           # If none, spikes are automatically detected based on juxta_thresh
filter         = True      # Whether the juxtacellular channel needs to be filtered or not
make_plots     = png       # Generate sanity plots of the validation [Nothing or None if no plots]

Please get in touch with us if you want to use this section; it is intended only for validation purposes. This is an implementation of the BEER metric.