portals

mzml_portal

class dimspy.portals.mzml_portal.Mzml(filename: Union[str, _io.BytesIO], **kwargs)[source]

Bases: object

mzML portal

headers() → collections.OrderedDict[source]

Get all unique header or filter strings and associated scan ids.

Returns

Dictionary

scan_ids() → collections.OrderedDict[source]

Get all scan ids and associated headers or filter strings.

Returns

Dictionary
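The two mappings above are inverses of one another. The following is a minimal sketch of that relationship using only the standard library; the filter strings and the exact value layout (a list of scan ids per header) are assumptions made for illustration and are not taken from the DIMSpy source.

```python
from collections import OrderedDict

# Hypothetical scan_ids()-style mapping: scan id -> header / filter string.
scan_to_header = OrderedDict([
    (1, "FTMS + p ESI w SIM ms [70.00-170.00]"),
    (2, "FTMS + p ESI w SIM ms [140.00-240.00]"),
    (3, "FTMS + p ESI w SIM ms [70.00-170.00]"),
])


def group_scans_by_header(mapping):
    """Invert scan id -> header into header -> list of scan ids."""
    grouped = OrderedDict()
    for scan_id, header in mapping.items():
        grouped.setdefault(header, []).append(scan_id)
    return grouped


# headers()-style view (assumed layout): unique header -> scan ids.
header_to_scans = group_scans_by_header(scan_to_header)
```

Insertion order is preserved, so headers appear in the order in which they are first encountered.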

peaklist(scan_id, function_noise='median') → dimspy.models.peaklist.PeakList[source]

Create a peaklist object for a specific scan id.

Parameters
  • scan_id – Scan id

  • function_noise – Function to calculate the noise from each scan. The following options are available:

      • median - the median of all peak intensities within a given scan is used as the noise value.

      • mean - the unweighted mean average of all peak intensities within a given scan is used as the noise value.

      • mad (Mean Absolute Deviation) - the noise value is set as the mean of the absolute differences between peak intensities and the mean peak intensity (calculated across all peak intensities within a given scan).

Returns

PeakList object
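To make the median, mean and mad options concrete, here is a minimal standard-library sketch that follows the definitions given above. The function name `estimate_noise` and the example intensities are hypothetical; this is not the code DIMSpy runs internally.

```python
from statistics import mean, median


def estimate_noise(intensities, function_noise="median"):
    """Return one noise value for the peak intensities of a single scan."""
    values = list(intensities)
    if not values:
        raise ValueError("at least one peak intensity is required")
    if function_noise == "median":
        return median(values)
    if function_noise == "mean":
        return mean(values)
    if function_noise == "mad":
        centre = mean(values)
        # Mean of the absolute differences from the mean intensity.
        return mean([abs(value - centre) for value in values])
    raise ValueError(f"unsupported function_noise: {function_noise!r}")


# Hypothetical scan with one intense peak among low-level signals.
scan_intensities = [10.0, 12.0, 11.0, 250.0, 9.0]
noise_by_option = {
    name: estimate_noise(scan_intensities, name)
    for name in ("median", "mean", "mad")
}
```

With a single intense peak in the scan, the median stays close to the low-level signals, whereas mean and mad are pulled upwards by the outlier.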

peaklists(scan_ids, function_noise='median') → Sequence[dimspy.models.peaklist.PeakList][source]

Create a list of peaklist objects for each scan id in the list.

Parameters
  • scan_ids – List of scan ids

  • function_noise – Function to calculate the noise from each scan. The following options are available:

      • median - the median of all peak intensities within a given scan is used as the noise value.

      • mean - the unweighted mean average of all peak intensities within a given scan is used as the noise value.

      • mad (Mean Absolute Deviation) - the noise value is set as the mean of the absolute differences between peak intensities and the mean peak intensity (calculated across all peak intensities within a given scan).

      • noise_packets - the noise value is calculated using the proprietary algorithms contained in Thermo Fisher Scientific’s msFileReader library. This option should only be applied when you are processing .RAW files.

Returns

List of PeakList objects

tics() → collections.OrderedDict[source]

Get all TIC values and associated scan ids.

Returns

Dictionary

ion_injection_times() → collections.OrderedDict[source]

Get all ion injection time values and associated scan ids.

Returns

Dictionary

scan_dependents() → list[source]

Get a nested list of scan id pairs. Each pair represents a fragmentation event.

Returns

List
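A nested list of pairs is often easier to work with once grouped. The sketch below assumes that each pair is ordered as [precursor scan id, dependent scan id]; that ordering and the example values are assumptions for illustration, and the helper name is hypothetical.

```python
from collections import defaultdict

# Hypothetical scan_dependents()-style output: one inner list per
# fragmentation event, assumed to be [precursor id, dependent id].
dependents = [[1, 2], [1, 3], [4, 5]]


def dependents_by_precursor(pairs):
    """Group dependent scan ids under their precursor scan id."""
    grouped = defaultdict(list)
    for precursor_id, dependent_id in pairs:
        grouped[precursor_id].append(dependent_id)
    return dict(grouped)


fragment_scans = dependents_by_precursor(dependents)
```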

close()[source]

Close the reader/file object.

Returns

None

thermo_raw_portal

dimspy.portals.thermo_raw_portal.mz_range_from_header(h: str) → list[source]

Extract the m/z range from a header or filterstring

Parameters

h – Header or filter string (str)

Returns

Sequence[float, float]
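As an illustration of what extracting an m/z range from a filter string involves, the sketch below parses a bracketed "[low-high]" range with a regular expression. The function name `parse_mz_range`, the pattern and the example filter string are assumptions; the pattern used by DIMSpy itself may differ.

```python
import re

# Matches a bracketed range such as "[70.00-170.00]".
_MZ_RANGE = re.compile(r"\[(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)\]")


def parse_mz_range(header):
    """Return [low, high] m/z values found in a header or filter string."""
    match = _MZ_RANGE.search(header)
    if match is None:
        raise ValueError(f"no m/z range found in header: {header!r}")
    return [float(match.group(1)), float(match.group(2))]


mz_range = parse_mz_range("FTMS + p ESI w SIM ms [70.00-170.00]")
```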

class dimspy.portals.thermo_raw_portal.ThermoRaw(filename)[source]

Bases: object

ThermoRaw portal

headers() → collections.OrderedDict[source]

Get all unique header or filter strings and associated scan ids.

Returns

Dictionary

scan_ids() → collections.OrderedDict[source]

Get all scan ids and associated headers or filter strings.

Returns

Dictionary

peaklist(scan_id, function_noise='noise_packets') → dimspy.models.peaklist.PeakList[source]

Create a peaklist object for a specific scan id.

Parameters
  • scan_id – Scan id

  • function_noise – Function to calculate the noise from each scan. The following options are available:

      • median - the median of all peak intensities within a given scan is used as the noise value.

      • mean - the unweighted mean average of all peak intensities within a given scan is used as the noise value.

      • mad (Mean Absolute Deviation) - the noise value is set as the mean of the absolute differences between peak intensities and the mean peak intensity (calculated across all peak intensities within a given scan).

      • noise_packets - the noise value is calculated using the proprietary algorithms contained in Thermo Fisher Scientific’s msFileReader library. This option should only be applied when you are processing .RAW files.

Returns

PeakList object

peaklists(scan_ids, function_noise='noise_packets') → Sequence[dimspy.models.peaklist.PeakList][source]

Create a list of peaklist objects for each scan id in the list.

Parameters
  • scan_ids – List of scan ids

  • function_noise – Function to calculate the noise from each scan. The following options are available:

      • median - the median of all peak intensities within a given scan is used as the noise value.

      • mean - the unweighted mean average of all peak intensities within a given scan is used as the noise value.

      • mad (Mean Absolute Deviation) - the noise value is set as the mean of the absolute differences between peak intensities and the mean peak intensity (calculated across all peak intensities within a given scan).

      • noise_packets - the noise value is calculated using the proprietary algorithms contained in Thermo Fisher Scientific’s msFileReader library. This option should only be applied when you are processing .RAW files.

Returns

List of PeakList objects

tics() → collections.OrderedDict[source]

Get all TIC values and associated scan ids.

Returns

Dictionary

ion_injection_times() → collections.OrderedDict[source]

Get all ion injection time values and associated scan ids.

Returns

Dictionary

scan_dependents() → list[source]

Get a nested list of scan id pairs. Each pair represents a fragmentation event.

Returns

List

close()[source]

Close the reader/file object.

Returns

None

txt_portal

dimspy.portals.txt_portal.save_peaklist_as_txt(pkl: dimspy.models.peaklist.PeakList, filename: str, *args, **kwargs)[source]

Saves a peaklist object to a plain text file.

Parameters
  • pkl – the target peaklist object

  • filename – path to a new text file

  • args – arguments to be passed to PeakList.to_str

  • kwargs – keyword arguments to be passed to PeakList.to_str

dimspy.portals.txt_portal.load_peaklist_from_txt(filename: str, ID: any, delimiter: str = ',', flag_names: str = 'auto', has_flag_col: bool = True)[source]

Loads a peaklist from a plain text file.

Parameters
  • filename – Path to an existing text-based peaklist file

  • ID – ID of the peaklist

  • delimiter – Delimiter of the text lines. Default = ‘,’, i.e., CSV format

  • flag_names – Names of the flag attributes. Default = ‘auto’, meaning that every attribute whose name ends with “_flag” is treated as a flag attribute. Provide None to indicate no flag attributes

  • has_flag_col – Whether the text file contains the overall “flags” column. If True, its values will be discarded. The overall flags of the new peaklist will be calculated automatically. Default = True

Return type

PeakList object
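The ‘auto’ behaviour of flag_names can be pictured as a simple suffix test on the column names. The sketch below uses only the standard library; the helper name `select_flag_columns`, the column names and the handling of explicit names are assumptions for illustration, not the DIMSpy implementation.

```python
import csv
import io


def select_flag_columns(column_names, flag_names="auto"):
    """Return the column names that would be treated as flag attributes."""
    if flag_names is None:
        return []
    if flag_names == "auto":
        # Every attribute whose name ends with "_flag" is a flag attribute.
        return [name for name in column_names if name.endswith("_flag")]
    if isinstance(flag_names, str):
        # Assumption: a single explicit name may be given as a string.
        flag_names = [flag_names]
    return [name for name in column_names if name in flag_names]


# Hypothetical CSV peaklist with one flag attribute and an overall column.
text = "mz,intensity,snr,snr_flag,flags\n100.0,2500.0,5.1,1,1\n"
columns = next(csv.reader(io.StringIO(text), delimiter=","))

auto_flags = select_flag_columns(columns)
no_flags = select_flag_columns(columns, flag_names=None)
```

The overall “flags” column does not end with “_flag”, so it is not picked up as a flag attribute; it is governed separately by has_flag_col.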

dimspy.portals.txt_portal.save_peak_matrix_as_txt(pm: dimspy.models.peak_matrix.PeakMatrix, filename: str, *args, **kwargs)[source]

Saves a peak matrix to a plain text file.

Parameters
  • pm – The target peak matrix object

  • filename – Path to a new text file

  • args – Arguments to be passed to PeakMatrix.to_str

  • kwargs – Keyword arguments to be passed to PeakMatrix.to_str

dimspy.portals.txt_portal.load_peak_matrix_from_txt(filename: str, delimiter: str = '\t', samples_in_rows: bool = True, comprehensive: str = 'auto')[source]

Loads a peak matrix from a plain text file.

Parameters
  • filename – Path to an existing text-based peak matrix file

  • delimiter – Delimiter of the text lines. Default = ‘\t’, i.e., TSV format

  • samples_in_rows – Whether or not the samples are stored in rows. Default = True

  • comprehensive – Whether the input is a ‘comprehensive’ or ‘simple’ version of the matrix. Default = ‘auto’, i.e., auto detect

Return type

PeakMatrix object
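The samples_in_rows option amounts to deciding whether the table needs transposing before it is interpreted. A minimal standard-library sketch, with a hypothetical helper name and a made-up table layout that is much simpler than a real peak matrix file:

```python
import csv
import io


def read_matrix_rows(text, delimiter="\t", samples_in_rows=True):
    """Return the table as a list of rows with one sample per row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]
    if not samples_in_rows:
        # Samples are stored in columns, so transpose the table.
        rows = [list(column) for column in zip(*rows)]
    return rows


# The same hypothetical matrix written in both orientations.
samples_as_rows = "m/z\t100.0\t200.0\nsample_1\t10\t20\n"
samples_as_columns = "m/z\tsample_1\n100.0\t10\n200.0\t20\n"

from_rows = read_matrix_rows(samples_as_rows, samples_in_rows=True)
from_columns = read_matrix_rows(samples_as_columns, samples_in_rows=False)
```

Both calls yield the same row-oriented table, which is the point of the option.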

hdf5_portal

dimspy.portals.hdf5_portal.save_peaklists_as_hdf5(pkls: Sequence[dimspy.models.peaklist.PeakList], filename: str, compatibility_mode: bool = False)[source]

Saves multiple peaklists in a HDF5 file.

Parameters
  • pkls – The target list of peaklist objects

  • filename – Path to a new HDF5 file

  • compatibility_mode – Change mode to read previous DIMSpy v1.* based HDF5 file

To accommodate the different dtypes in the attribute matrix, this portal converts all the attribute values into fixed-length strings for storage in HDF5 data tables. The order of the peaklists will be retained.
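The idea of storing mixed-type values as fixed-length strings can be sketched without any HDF5 library. The helper names below are hypothetical and the padding scheme is an assumption; the sketch only shows why a common width and a remembered original type are enough for a round trip.

```python
def to_fixed_length(values):
    """Convert values to byte strings padded to one common width."""
    texts = [str(value) for value in values]
    width = max((len(text) for text in texts), default=0)
    cells = [text.ljust(width).encode("ascii") for text in texts]
    return cells, width


def from_fixed_length(cells, caster):
    """Strip the padding and cast each cell back to its original type."""
    return [caster(cell.decode("ascii").strip()) for cell in cells]


# Hypothetical attribute column holding floating point m/z values.
mz_values = [100.5, 20.25, 3.0]
cells, width = to_fixed_length(mz_values)
restored = from_fixed_length(cells, float)
```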

dimspy.portals.hdf5_portal.load_peaklists_from_hdf5(filename: str, compatibility_mode: bool = False)[source]

Loads a list of peaklist objects from a HDF5 file.

Parameters
  • filename – Path to a HDF5 file

  • compatibility_mode – Change mode to read previous DIMSpy v1.* based HDF5 file

Return type

Sequence[PeakList]

The values in HDF5 data tables are automatically converted to their original dtypes before loading in the peaklist.

dimspy.portals.hdf5_portal.save_peak_matrix_as_hdf5(pm: dimspy.models.peak_matrix.PeakMatrix, filename: str, compatibility_mode: bool = False)[source]

Saves a peak matrix object to a HDF5 file.

Parameters
  • pm – The target peak matrix object

  • filename – Path to a new HDF5 file

The order of the attributes and flags will be retained.

dimspy.portals.hdf5_portal.load_peak_matrix_from_hdf5(filename: str, compatibility_mode: bool = False)[source]

Loads a peak matrix from a HDF5 file.

Parameters

filename – Path to an existing HDF5 file

Return type

PeakMatrix object

paths

dimspy.portals.paths.sort_ms_files_by_timestamp(ps)[source]

Sort a set (i.e. directory) of .mzml or .raw files by timestamp.

Parameters

ps – List of paths

Returns

List
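For orientation, sorting a list of paths by a timestamp can be sketched as below. This example uses the file-system modification time as a stand-in, which is an assumption: the timestamp DIMSpy actually reads (for instance an acquisition time stored inside the file) may be a different one. The helper name and file names are hypothetical.

```python
import os
import tempfile


def sort_paths_by_mtime(paths):
    """Return paths ordered from oldest to newest modification time."""
    return sorted(paths, key=os.path.getmtime)


with tempfile.TemporaryDirectory() as folder:
    # Create three empty placeholder files with known timestamps.
    names_and_times = [("b.mzML", 2000), ("a.mzML", 3000), ("c.mzML", 1000)]
    paths = []
    for name, timestamp in names_and_times:
        path = os.path.join(folder, name)
        with open(path, "w") as handle:
            handle.write("")
        os.utime(path, (timestamp, timestamp))
        paths.append(path)
    sorted_names = [os.path.basename(p) for p in sort_paths_by_mtime(paths)]
```

The resulting order follows the timestamps rather than the file names.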

dimspy.portals.paths.validate_and_sort_paths(source, tsv)[source]

Validate and sort a set (i.e. directory or hdf5 file) of .mzml or .raw files.

Parameters
  • tsv – Path to tab-separated file

  • source – Path to a directory of .mzml or .raw files, or to the .hdf5 file to read from.

Returns

List