models

peaklist

class dimspy.models.peaklist.PeakList(ID: str, mz: Sequence[float], intensity: Sequence[float], **metadata)[source]

Bases: object

The PeakList class.

Stores mass spectrometry peak list data. It requires an ID, mz values, and intensities. It can also store extra peak attributes (e.g. SNRs), peaklist tags, and metadata. It uses automatically managed flags to “remove” or “retain” peaks without actually deleting them, so any filtering applied to the peaks remains traceable.

Parameters
  • ID – The ID of the peaklist data. A unique string or integer value is recommended

  • mz – Mz values of all the peaks. Must be in ascending order

  • intensity – Intensities of all the peaks. Must have the same size as mz

  • kwargs – Key-value pairs of the peaklist metadata

>>> mz_values = np.random.uniform(100, 1200, size = 100)
>>> int_values = np.random.normal(60, 10, size = 100)
>>> peaks = PeakList('dummy', mz_values, int_values, description = 'a dummy peaklist')

Internally, the peaklist data is stored in a numpy structured array, called the attribute table (this may change in the future):

mz       intensity   snr     snr_flag   flags*
102.5    21.7        10.5    True       True
111.7    12.3        5.1     False      False
126.3    98.1        31.7    True       True
133.1    68.9        12.6    True       True

Each column is called an attribute. The first two attributes are fixed as “mz” and “intensity”. Unlike the others, they cannot be added or removed. The last “attribute” is the “flags”, which is in fact stored separately. The “flags” column is calculated automatically from all the manually set flag attributes, e.g., the “snr_flag”. It can only be changed by the class itself. Unflagged peaks are considered “removed”. They are kept internally mainly for visualization and tracing purposes.
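The relationship between the flag attributes and the overall “flags” column can be sketched in plain Python. This is an illustration only, not the dimspy implementation: the helper name combine_flags is hypothetical, and the logical-AND rule is an assumption inferred from the example tables in this section.

```python
def combine_flags(flag_columns, n_peaks):
    """Return the overall flags: a peak stays flagged only if every
    flag attribute marks it True (assumed logical AND)."""
    overall = [True] * n_peaks
    for column in flag_columns.values():
        overall = [kept and bool(value) for kept, value in zip(overall, column)]
    return overall


# A single flag attribute: the overall flags simply mirror it.
single = combine_flags({"snr_flag": [True, False, True, True]}, n_peaks=4)

# Two flag attributes: a peak must pass both to remain flagged.
double = combine_flags(
    {
        "intensity_flag": [True, True, False, False],
        "snr_flag": [False, True, True, True],
    },
    n_peaks=4,
)
print(single)
print(double)
```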

Warning

Removing a flag attribute may change the “flags” column and cause previously unflagged peaks to become flagged again. Because most processes are applied only to the flagged peaks, such re-flagged peaks may hold incorrect values if the other peaks have gone through a process in the meantime.

In principle, setting a flag attribute should be considered as an irreversible process.
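The hazard described in this warning can be illustrated with a toy table in plain Python. Everything here is hypothetical (the dictionary layout, the scale_flagged helper, and the AND rule for combining flags); it only mimics the behaviour described above.

```python
table = {
    "intensity": [21.7, 12.3, 98.1, 68.9],
    "snr_flag": [True, False, True, True],
}


def overall_flags(table):
    """Combine all *_flag columns with a logical AND (assumed rule)."""
    n_peaks = len(table["intensity"])
    flag_names = [name for name in table if name.endswith("_flag")]
    return [all(table[name][i] for name in flag_names) for i in range(n_peaks)]


def scale_flagged(table, factor):
    """A stand-in processing step that touches flagged peaks only."""
    flags = overall_flags(table)
    table["intensity"] = [
        value * factor if flagged else value
        for value, flagged in zip(table["intensity"], flags)
    ]


scale_flagged(table, 2.0)      # the second peak is unflagged, so it is skipped
del table["snr_flag"]          # dropping the flag attribute re-flags that peak
print(overall_flags(table))    # every peak counts as flagged again
print(table["intensity"])      # but the second value was never scaled
```

The second peak now looks like any other flagged peak even though it missed the scaling step, which is why setting a flag is best treated as irreversible.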

property ID

Property of the peaklist ID.

Getter

Returns the peaklist ID

Setter

Set the peaklist ID

Type

Same as input ID

add_attribute(attr_name: str, attr_value: Sequence, attr_dtype: Optional[Union[Type, str]] = None, is_flag: bool = False, on_index: Optional[int] = None, flagged_only: bool = True, invalid_value=nan)[source]

Adds a new attribute to the PeakList attribute table.

Parameters
  • attr_name – The name of the new attribute, must be a string

  • attr_value – The values of the new attribute. Its size must equal PeakList.size (if flagged_only == True), or PeakList.full_size (if flagged_only == False)

  • attr_dtype – The data type of the new attribute. If it is set to None, the PeakList will try to detect the data type from attr_value. If the detection fails, the “object” type is used. Default = None

  • is_flag – Whether the new attribute is a flag attribute, i.e., will be used in flags calculation. Default = False

  • on_index – Insert the new attribute on a specific column. It can’t be 0 or 1, as the first two attributes are fixed as mz and intensity. Setting to None means to put it to the last column. Default = None

  • flagged_only – Whether the attr_value is set to the flagged peaks or all peaks. Default = True

  • invalid_value – If flagged_only is set to True, this value will be assigned to the unflagged peaks. The actual value depends on the attribute data type. For instance, on a boolean attribute invalid_value = 0 will be converted to False. Default = numpy.nan

Return type

PeakList object (self)
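How flagged_only and invalid_value interact can be sketched without dimspy. The helper below is hypothetical and only mirrors the parameter descriptions above: values supplied for the flagged peaks are spread over the full table, and the unflagged positions receive invalid_value.

```python
import math


def expand_to_full(values, flags, invalid_value=math.nan):
    """Spread values given for the flagged peaks over the full table,
    filling the unflagged positions with invalid_value."""
    if len(values) != sum(flags):
        raise ValueError("expected one value per flagged peak")
    remaining = iter(values)
    return [next(remaining) if flagged else invalid_value for flagged in flags]


flags = [True, False, True, True]
full = expand_to_full([10.5, 31.7, 12.6], flags)
print(full)
```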

property attributes

Property of the attribute names.

Getter

Returns a tuple of the attribute names

Type

tuple

calculate_flags()[source]

Re-calculates the flags according to the flag attributes.

Return type

numpy array

Note

This method will be called automatically every time a flag attribute is added, removed, or changed.

cleanup_unflagged_peaks(flag_name: Optional[str] = None)[source]

Remove unflagged peaks.

Parameters

flag_name – Remove peaks unflagged by this flag attribute. Setting None means to remove peaks unflagged by the overall flags. Default = None

Return type

PeakList object (self)

>>> print(peaks)
mz, intensity, intensity_flag, snr, snr_flag, flags
10, 70, True, 10, False, False
20, 60, True, 20, True, True
30, 50, False, 30, True, False
40, 40, False, 40, True, False
>>> print(peaks.cleanup_unflagged_peaks('snr_flag'))
mz, intensity, intensity_flag, snr, snr_flag, flags
20, 60, True, 20, True, True
30, 50, False, 30, True, False
40, 40, False, 40, True, False
>>> print(peaks.cleanup_unflagged_peaks())
mz, intensity, intensity_flag, snr, snr_flag, flags
20, 60, True, 20, True, True
copy()[source]

Returns a deep copy of the peaklist.

Return type

PeakList object

drop_attribute(attr_name: str)[source]

Drops an existing attribute.

Parameters

attr_name – The attribute name to drop. It cannot be mz, intensity, or flags

Return type

PeakList object (self)

property dtable

Property of the overall attribute table.

Getter

Returns the original attribute table

Type

numpy structured array

Warning

This property directly accesses the internal attribute table. Be careful when manipulating the data, and pay particular attention to potential side effects.

property flag_attributes

Property of the flag attribute names.

Getter

Returns a tuple of the flag attribute names

Type

tuple

property flags

Property of the flags.

Getter

Returns a deep copy of the flags array

Type

numpy array

property full_shape

Property of the peaklist full attributes table shape.

Getter

Returns the full attributes table shape, including the unflagged peaks

Type

tuple

property full_size

Property of the peaklist full size.

Getter

Returns the full peaklist size, i.e., including the unflagged peaks

Type

int

get_attribute(attr_name: str, flagged_only: bool = True)[source]

Gets values of an existing attribute.

Parameters
  • attr_name – The attribute to get values

  • flagged_only – Whether to return the values of flagged peaks or all peaks. Default = True

Return type

numpy array

get_peak(peak_index: Union[int, Sequence[int]], flagged_only: bool = True)[source]

Gets values of a peak.

Parameters
  • peak_index – The index of the peak to get values

  • flagged_only – Whether the values are taken from the index of flagged peaks or all peaks. Default = True

Return type

numpy array
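The meaning of peak_index under flagged_only can be sketched as an index translation. The function below is a hypothetical helper, not part of dimspy; it assumes that with flagged_only = True the index counts flagged peaks only, as the parameter description suggests.

```python
def full_index(flags, peak_index, flagged_only=True):
    """Translate peak_index into a row of the full attribute table."""
    if not flagged_only:
        return peak_index
    flagged_rows = [row for row, flagged in enumerate(flags) if flagged]
    return flagged_rows[peak_index]


flags = [True, True, False, True]
print(full_index(flags, 2))                      # third flagged peak
print(full_index(flags, 2, flagged_only=False))  # third peak overall
```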

has_attribute(attr_name: str)[source]

Checks whether there exists an attribute in the table.

Parameters

attr_name – The attribute name for checking

Return type

bool

insert_peak(peak_value: Sequence)[source]

Insert a new peak.

Parameters

peak_value – The values of the new peak. Must contain values for all the attributes. Its position depends on the mz value, i.e., the 1st value of the input

Return type

PeakList object (self)
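Placing a new peak according to its mz value amounts to a sorted insertion. A minimal sketch using the standard bisect module follows; the tuple layout and the insert_sorted helper are assumptions made for illustration, not the dimspy implementation.

```python
import bisect


def insert_sorted(peaks, peak_value):
    """Insert peak_value (mz first) so that peaks stays sorted by mz."""
    mz_values = [peak[0] for peak in peaks]
    position = bisect.bisect_left(mz_values, peak_value[0])
    peaks.insert(position, tuple(peak_value))
    return position


peaks = [(10.0, 70.0), (20.0, 60.0), (40.0, 40.0)]
position = insert_sorted(peaks, (30.0, 50.0))
print(position, peaks)
```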

property metadata

Property of the peaklist metadata.

Getter

Returns an access interface to the peaklist metadata object

Type

PeakList_Metadata object

property peaks

Property of the attribute table.

Getter

Returns a deep copy of the flagged attribute table

Type

numpy structured array

remove_peak(peak_index: Union[int, Sequence[int]], flagged_only: bool = True)[source]

Remove an existing peak.

Parameters
  • peak_index – The index of the peak to remove

  • flagged_only – Whether the index is for flagged peaks or all peaks. Default = True

Return type

PeakList object (self)

set_attribute(attr_name: str, attr_value: Sequence, flagged_only: bool = True, unsorted_mz: bool = False)[source]

Sets values to an existing attribute.

Parameters
  • attr_name – The attribute to set values

  • attr_value – The new attribute values. Its size must equal PeakList.size (if flagged_only == True), or PeakList.full_size (if flagged_only == False)

  • flagged_only – Whether the attr_value is set to the flagged peaks or all peaks. Default = True

  • unsorted_mz – Whether the attr_value contains unsorted mz values. This parameter is valid only when attr_name == “mz”. Default = False

Return type

PeakList object (self)

set_peak(peak_index: int, peak_value: Sequence, flagged_only: bool = True)[source]

Sets values to a peak.

Parameters
  • peak_index – The index of the peak to set values

  • peak_value – The new peak values. Must contain values for all the attributes (not including flags)

  • flagged_only – Whether the peak_value is set to the index of flagged peaks or all peaks. Default = True

Return type

PeakList object (self)

>>> print(peaks)
mz, intensity, snr, flags
10, 10, 10, True
20, 20, 20, True
30, 30, 30, False
40, 40, 40, True
>>> print(peaks.set_peak(2, [50, 50, 50], flagged_only = True))
mz, intensity, snr, flags
10, 10, 10, True
20, 20, 20, True
30, 30, 30, False
50, 50, 50, True
>>> print(peaks.set_peak(2, [40, 40, 40], flagged_only = False))
mz, intensity, snr, flags
10, 10, 10, True
20, 20, 20, True
40, 40, 40, False
50, 50, 50, True
property shape

Property of the peaklist attributes table shape.

Getter

Returns the attributes table shape, i.e., peaks number x attributes number. The “flags” column does not count

Type

tuple

property size

Property of the peaklist size.

Getter

Returns the flagged peaklist size

Type

int

sort_peaks_order()[source]

Sorts peaklist mz values into ascending order.

Note

This method will be called automatically every time the mz values are changed.

property tags

Property of the peaklist tags.

Getter

Returns an access interface to the peaklist tags object

Type

PeakList_Tags object

to_df()[source]

Exports peaklist attribute table to Pandas DataFrame, including the flags.

Return type

pd.DataFrame

to_dict(dict_type: Callable[[Sequence], Mapping] = <class 'collections.OrderedDict'>) → Mapping[source]

Exports peaklist attribute table to a dictionary (mapping object), including the flags.

Parameters

dict_type – Result dictionary type, Default = OrderedDict

Return type

Mapping (an instance of dict_type)

to_list()[source]

Exports peaklist attribute table to a list, including the flags.

Return type

list

to_str(delimiter: str = ',')[source]

Exports peaklist attribute table to a string, including the flags. It can also be used implicitly, e.g., through print().

Return type

str

peaklist_metadata

class dimspy.models.peaklist_metadata.PeakList_Metadata[source]

Bases: dict

The PeakList_Metadata class.

Dictionary-like container for PeakList metadata storage.

Parameters
  • args – Iterable object of key-value pairs

  • kwargs – Metadata key-value pairs

>>> PeakList_Metadata([('name', 'sample_1'), ('qc', False)])
>>> PeakList_Metadata(name = 'sample_1', qc = False)

Metadata attributes can be accessed in both dictionary-like and property-like manners.

>>> meta = PeakList_Metadata(name = 'sample_1', qc = False)
>>> meta['name']
'sample_1'
>>> meta.qc
False
>>> del meta.qc
>>> 'qc' in meta
False

Warning

The __getattr__, __setattr__, and __delattr__ methods are overridden. DO NOT assign a metadata object to another metadata object, e.g., metadata.metadata.attr = value.
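The dual access style can be sketched with a small dict subclass. The class name AttrDict is hypothetical and this is not the PeakList_Metadata source; it only shows how overriding the three methods named above makes keys reachable as attributes.

```python
class AttrDict(dict):
    """A dict whose keys can also be read, set, and deleted as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None


meta = AttrDict(name="sample_1", qc=False)
meta.batch = 3          # stored as a dictionary key
del meta.qc             # removes the key
print(meta["name"], meta.batch, "qc" in meta)
```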

peaklist_tags

class dimspy.models.peaklist_tags.PeakList_Tags(*args, **kwargs)[source]

Bases: object

The PeakList_Tags class.

Container for both typed and untyped tags. This class is mainly used in the PeakList and PeakMatrix classes for sample filtering. Within a PeakList, tag types must be unique, but tag values need not be (unless they are untyped). For instance, a PeakList can have tags batch = 1 and plate = 1, but not batch = 1 and batch = 2, or (untyped) 1 and (untyped) 1. A single value is treated as an untyped tag.

Parameters
  • args – List of untyped tags

  • kwargs – List of typed tags. Only one tag value can be assigned to a specific tag type
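The uniqueness rule can be sketched with a small stand-in container. SimpleTags is a hypothetical class written for this illustration, and raising ValueError on a conflict is an assumption; the actual error handling of PeakList_Tags is not documented here.

```python
class SimpleTags:
    """Holds untyped tag values plus at most one value per tag type."""

    def __init__(self, *untyped, **typed):
        self.untyped = []
        self.typed = {}
        for value in untyped:
            self.add(value)
        for tag_type, value in typed.items():
            self.add(value, tag_type)

    def add(self, value, tag_type=None):
        if tag_type is None:
            if value in self.untyped:
                raise ValueError(f"untyped tag {value!r} already exists")
            self.untyped.append(value)
        else:
            if tag_type in self.typed:
                raise ValueError(f"tag type {tag_type!r} already exists")
            self.typed[tag_type] = value


tags = SimpleTags("qc", batch=1, plate=1)   # same value, different types: allowed
try:
    tags.add(2, "batch")                    # a second value for type "batch"
except ValueError as error:
    print("rejected:", error)
```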

>>> PeakList_Tags('untyped_tag1', Tag('untyped_tag2'), Tag('typed_tag', 'tag_type'))
>>> PeakList_Tags(tag_type1 = 'tag_value1', tag_type2 = 'tag_value2')
add_tag(tag: Union[int, float, str, dimspy.models.peaklist_tags.Tag], tag_type: Optional[str] = None)[source]

Adds typed or untyped tag.

Parameters
  • tag – Tag or tag value to add

  • tag_type – Type of the tag value

>>> tags = PeakList_Tags()
>>> tags.add_tag('untyped_tag1')
>>> tags.add_tag(Tag('typed_tag1', 'tag_type1'))
>>> tags.add_tag('typed_tag2', 'tag_type2')
drop_all_tags()[source]

Drops all tags, both typed and untyped.

drop_tag(tag: Union[int, float, str, dimspy.models.peaklist_tags.Tag], tag_type: Optional[str] = None)[source]

Drops a typed or untyped tag.

Parameters
  • tag – Tag or tag value to drop

  • tag_type – Type of the tag value

>>> tags = PeakList_Tags('untyped_tag1', tag_type1 = 'tag_value1')
>>> tags.drop_tag(Tag('tag_value1', 'tag_type1'))
>>> print(tags)
untyped_tag1
drop_tag_type(tag_type: Optional[str] = None)[source]

Drops the tag with the given type.

Parameters

tag_type – Tag type to drop, None (untyped) may drop multiple tags

has_tag(tag: Union[int, float, str, dimspy.models.peaklist_tags.Tag], tag_type: Optional[str] = None)[source]

Checks whether there exists a specific tag.

Parameters
  • tag – The tag for checking

  • tag_type – The type of the tag

Return type

bool

>>> tags = PeakList_Tags('untyped_tag1', Tag('tag_value1', 'tag_type1'))
>>> tags.has_tag('untyped_tag1')
True
>>> tags.has_tag('typed_tag1')
False
>>> tags.has_tag(Tag('tag_value1', 'tag_type1'))
True
>>> tags.has_tag('tag_value1', 'tag_type1')
True
has_tag_type(tag_type: Optional[str] = None)[source]

Checks whether there exists a specific tag type.

Parameters

tag_type – The tag type for checking, None indicates untyped tags

Return type

bool

tag_of(tag_type: Optional[str] = None)[source]

Returns tag value of the given tag type, or tuple of untyped tags if tag_type is None.

Parameters

tag_type – Valid tag type, None for untyped tags

Return type

Tag, or None if tag_type does not exist

property tag_types

Property of included tag types. None indicates untyped tags included.

Getter

Returns a set containing all the tag types of the typed tags

Type

set

property tag_values

Property of included tag values. Same tag values will be merged

Getter

Returns a set containing all the tag values, both typed and untyped tags

Type

set

property tags

Property of all included tags.

Getter

Returns a tuple containing all the tags, both typed and untyped

Type

tuple

to_list()[source]

Exports tags to a list. Each element is a tuple of (tag value, tag type).

>>> tags = PeakList_Tags('untyped_tag1', tag_type1 = 'tag_value1')
>>> tags.to_list()
[('untyped_tag1', None), ('tag_value1', 'tag_type1')]
Return type

list

to_str()[source]

Exports tags to a string. It can also be used implicitly, as in:

>>> tags = PeakList_Tags('untyped_tag1', tag_type1 = 'tag_value1')
>>> print(tags)
untyped_tag1, tag_type1:tag_value1
Return type

str

property typed_tags

Property of included typed tags.

Getter

Returns a tuple containing all the typed tags

Type

tuple

property untyped_tags

Property of included untyped tags.

Getter

Returns a tuple containing all the untyped tags

Type

tuple

class dimspy.models.peaklist_tags.Tag(value: Union[int, float, str, dimspy.models.peaklist_tags.Tag], ttype: Optional[str] = None)[source]

Bases: object

The Tag class.

This class is mainly used in PeakList and PeakMatrix classes for sample filtering.

Parameters
  • value – Tag value, must be number (int, float), string (ascii, unicode), or Tag object (ignore ttype setting)

  • ttype – Tag type, must be string or None (untyped), default = None

Single value will be treated as untyped tag:

>>> tag = Tag(1)
>>> tag == 1
True
>>> tag = Tag(1, 'batch')
>>> tag == 1
False
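The comparison behaviour shown in this example can be sketched with a minimal stand-in class. SimpleTag is hypothetical and its __eq__ rule is an assumption chosen to reproduce the two results above; it is not the Tag source.

```python
class SimpleTag:
    def __init__(self, value, ttype=None):
        self.value = value
        self.ttype = ttype

    def __eq__(self, other):
        if isinstance(other, SimpleTag):
            return (self.value, self.ttype) == (other.value, other.ttype)
        # A bare value carries no type, so it can only equal an untyped tag.
        return self.ttype is None and self.value == other

    def __hash__(self):
        return hash((self.value, self.ttype))


print(SimpleTag(1) == 1)            # untyped tag against a bare value
print(SimpleTag(1, "batch") == 1)   # typed tag against a bare value
```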
property ttype

Property of tag type. None indicates untyped tag.

Getter

Returns the type of the tag

Setter

Set the tag type, must be None or string

Type

None, str, unicode

property typed

Property to decide if the tag is typed or untyped.

Getter

Returns typed status of the tag

Type

bool

property value

Property of tag value.

Getter

Returns the value of the tag

Setter

Set the tag value, must be number or string

Type

int, float, str, unicode

peak_matrix

class dimspy.models.peak_matrix.PeakMatrix(peaklist_ids: Sequence[str], peaklist_tags: Sequence[dimspy.models.peaklist_tags.PeakList_Tags], peaklist_attributes: Sequence[Tuple[str, Any]])[source]

Bases: object

The PeakMatrix class.

Stores aligned mass spectrometry peak matrix data. It requires IDs, tags, and attributes from the source peak lists. It uses a tag-based mask to “hide” unrelated samples for convenient processing, and automatically managed flags to “remove” peaks without actually deleting them, so any filtering applied to the peaks remains traceable. Normally, a PeakMatrix object is created by functions such as align_peaks() rather than manually.

Parameters
  • peaklist_ids – The IDs of the source peak lists

  • peaklist_tags – The tags (PeakList_Tags) of the source peak lists

  • peaklist_attributes – The attributes of the source peak lists. Must be a list or tuple in the format of [(attr_name, attr_matrix), …], where attr_name is the name of the attribute, and attr_matrix is the vertically stacked attribute values in the shape of samples x peaks. The order of the attributes will be kept in the PeakMatrix. The first two attributes must be “mz” and “intensity”.

>>> pids = [pl.ID for pl in peaklists]
>>> tags = [pl.tags for pl in peaklists]
>>> attrs = [(attr_name, np.vstack([pl[attr_name] for pl in peaklists]))
...          for attr_name in peaklists[0].attributes]
>>> pm = PeakMatrix(pids, tags, attrs)

Internally, the attribute data is stored in an OrderedDict, one matrix per attribute. An attribute matrix can be illustrated as follows; the mask and flags are the same for all attributes. The final row “flags” is calculated automatically from the manually added flags, and decides which peaks are “removed”, i.e. unflagged. A “–” indicates that no peak in that sample could be aligned to the mz value.

attribute: “mz”

mask     peak_1   peak_2   peak_3
False    12.7     14.9     21.0
True     –        15.1     21.1
False    12.1     14.7     –
False    12.9     14.8     20.9
flag_1   True     False    True
flag_2   True     True     False
flags*   True     False    False

Warning

Removing a flag may change the overall “flags” and cause previously unflagged peaks to become flagged again. Because most processes are applied only to the flagged peaks, such re-flagged peaks may hold incorrect values if the other peaks have gone through a process in the meantime.

In principle, setting a flag attribute should be considered as an irreversible process.

Unlike the flags, the mask should be regarded as a more temporary way to hide unrelated samples. A masked sample (row) will not be used for processing, but its data is still in the attribute matrix. For this purpose, the mask_peakmatrix, unmask_peakmatrix, and unmask_all_peakmatrix statements are provided as a more flexible way to set / unset the mask.
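The combined effect of the mask (on rows) and the flags (on columns) can be sketched on a plain nested list. The helper name visible and the use of None for a missing peak are assumptions made for this illustration only.

```python
def visible(matrix, mask, flags):
    """Return the rows that are not masked, restricted to flagged columns."""
    return [
        [value for value, flagged in zip(row, flags) if flagged]
        for row, masked in zip(matrix, mask)
        if not masked
    ]


mz = [
    [12.7, 14.9, 21.0],
    [None, 15.1, 21.1],
    [12.1, 14.7, None],
    [12.9, 14.8, 20.9],
]
mask = [False, True, False, False]   # True hides the sample (row)
flags = [True, False, False]         # False "removes" the peak (column)
print(visible(mz, mask, flags))
```

The hidden row and the removed columns are still present in mz itself, which is what makes both operations traceable.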

add_flag(flag_name: str, flag_values: Sequence[bool], flagged_only: bool = True)[source]

Adds a flag to the peak matrix peaks.

Parameters
  • flag_name – name of the flag, it must be unique and not equal to “flags”

  • flag_values – values of the flag. It must have a length of pm.shape[1] if flagged_only = True, or pm.full_shape[1] if flagged_only = False

  • flagged_only – whether to set the flagged peaks only. Default = True, and the values of the unflagged peaks are set to False

The overall flags property will be automatically recalculated.

attr_matrix(attr_name: str, flagged_only: bool = True)[source]

Obtains an existing attribute matrix.

Parameters
  • attr_name – name of the target attribute

  • flagged_only – whether to return the flagged values only. Default = True

Return type

numpy array

attr_mean_vector(attr_name: str, flagged_only: bool = True)[source]

Obtains the mean array of an existing attribute matrix.

Parameters
  • attr_name – name of the target attribute

  • flagged_only – whether to return the mean array of the flagged values only. Default = True

Return type

numpy array

Note that only the “present” peaks are used for the mean value calculation. If the attribute matrix has a string / unicode data type, the values in each column will be concatenated.

property attributes

Property of the attribute names.

Getter

returns a tuple including the names of the attribute matrix

Type

tuple

drop_flag(flag_name: str)[source]

Drops an existing flag from the peak matrix.

Parameters

flag_name – name of the flag to drop. It must exist and not equal to “flags”

The overall flags property will be automatically recalculated.

extract_peaklist(peaklist_id: str)[source]

Extracts one peaklist from the peak matrix.

Parameters

peaklist_id – ID of the peaklist to extract

Return type

PeakList object

Only the “present” peaks will be included in the result peaklist.

extract_peaklists()[source]

Extracts all peaklists from the peak matrix.

Return type

list

property flag_names

Property of the flag names.

Getter

returns a tuple including the names of the manually set flags

Type

tuple

flag_values(flag_name: str)[source]

Obtains values of an existing flag.

Parameters

flag_name – name of the target flag. It must exist and not equal to “flags”

Return type

numpy array

property flags

Property of the flags.

Getter

returns a deep copy of the flags array

Type

numpy array

property fraction

Property of the fraction array.

Getter

returns the fraction array, indicating the ratio of present peaks on each mz value

Type

numpy array

>>> pm.present
array([3, 4, 2, 3, 3])
>>> pm.shape[0]
4
>>> pm.fraction
array([0.75, 1.0, 0.5, 0.75, 0.75])
property full_shape

Property of the peak matrix full shape.

Getter

returns the full shape of the attribute matrix, i.e., ignore mask and flags

Type

tuple

property intensity_matrix

Property of the intensity matrix.

Getter

returns the intensity attribute matrix, unmasked and flagged values only

Type

numpy array

property intensity_mean_vector

Property of the intensity mean values array.

Getter

returns the mean values array of the intensity attribute matrix, unmasked and flagged values only

Type

numpy array

is_empty()[source]

Checks whether the peak matrix is empty under the current mask and flags.

Return type

bool

property mask

Property of the mask.

Getter

returns a deep copy of the mask array

Setter

sets the mask array. Provide None to unmask all samples

Type

numpy array

mask_tags(*args, **kwargs)[source]

Masks samples with particular tags.

Parameters
  • args – tags or untyped tag values for masking

  • kwargs – typed tags for masking

  • override – whether to override the current mask, default = False

Return type

PeakMatrix object (self)

This function masks samples that have ALL the given tags. To mask samples matching ANY of the tags, chain the calls instead.

>>> pm.mask_tags('qc', plate = 1)
(will mask all QC samples on plate 1)
>>> pm.mask_tags('qc').mask_tags(plate = 1)
(will mask QC samples and all samples on plate 1)
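The difference between a single call (ALL tags) and chained calls (ANY tag) can be sketched with sets. The sample names, the tuple encoding of typed tags, and the mask_with_all helper are all hypothetical; the sketch only mirrors the semantics described above.

```python
samples = {
    "qc_1": {"qc", ("plate", 1)},
    "qc_2": {"qc", ("plate", 2)},
    "sample_1": {("plate", 1)},
    "sample_2": {("plate", 2)},
}


def mask_with_all(samples, masked, *tags):
    """Add to masked every sample that carries ALL of the given tags."""
    wanted = set(tags)
    return masked | {name for name, owned in samples.items() if wanted <= owned}


# One call with two tags: only QC samples on plate 1 are masked.
both = mask_with_all(samples, set(), "qc", ("plate", 1))

# Chained calls: QC samples first, then everything on plate 1.
either = mask_with_all(samples, set(), "qc")
either = mask_with_all(samples, either, ("plate", 1))

print(sorted(both))
print(sorted(either))
```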
property missing_values

Property of the missing values array.

Getter

returns the missing values array, indicating the number of unaligned peaks on each sample

Type

numpy array

>>> pm.present_matrix
array([[ True,  True,  True,  True, False],
       [ True,  True, False, False,  True],
       [ True,  True,  True,  True,  True],
       [False,  True, False,  True,  True]])
>>> pm.missing_values
array([1, 2, 0, 2])
property mz_matrix

Property of the mz matrix.

Getter

returns the mz attribute matrix, unmasked and flagged values only

Type

numpy array

property mz_mean_vector

Property of the mz mean values array.

Getter

returns the mean values array of the mz attribute matrix, unmasked and flagged values only

Type

numpy array

property occurrence

Property of the occurrence array.

Getter

returns the occurrence array, indicating the total number of peaks (including peaks in the same sample) aligned in each mz value. This property is valid only when the intra_count attribute matrix is available

Type

numpy array

>>> pm.attr_matrix('intra_count')
array([[ 2,  1,  1,  1,  0],
       [ 1,  1,  0,  0,  1],
       [ 1,  3,  1,  2,  1],
       [ 0,  1,  0,  1,  1]])
>>> pm.occurrence
array([ 4,  6,  2,  4,  3])
property peaklist_ids

Property of the source peaklist IDs.

Getter

returns a tuple including the IDs of the source peaklists

Type

tuple

property peaklist_tag_types

Property of the source peaklist tag types.

Getter

returns a set including the types of the typed tags of the source peaklists

Type

set

property peaklist_tag_values

Property of the source peaklist tag values.

Getter

returns a set including the values of the source peaklists tags, both typed and untyped

Type

set

property peaklist_tags

Property of the source peaklist tags.

Getter

returns a tuple including the PeakList_Tags objects of the source peaklists

Type

tuple

property present

Property of the present array.

Getter

returns the present array, indicating how many peaks are aligned in each mz value

Type

numpy array

property present_matrix

Property of the present matrix.

Getter

returns the present matrix, indicating whether a sample has peak(s) aligned in each mz value

Type

numpy array

>>> pm.present_matrix
array([[ True,  True,  True,  True, False],
       [ True,  True, False, False,  True],
       [ True,  True,  True,  True,  True],
       [False,  True, False,  True,  True]])
>>> pm.present
array([3, 4, 2, 3, 3])
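The present, fraction, and missing_values arrays are all simple counts over the present matrix. The following plain-Python sketch reproduces the numbers from the examples in this section; it uses nested lists instead of numpy arrays and is not the dimspy implementation.

```python
present_matrix = [
    [True, True, True, True, False],
    [True, True, False, False, True],
    [True, True, True, True, True],
    [False, True, False, True, True],
]

n_samples = len(present_matrix)

# Column-wise count: number of samples with a peak at each mz value.
present = [sum(column) for column in zip(*present_matrix)]

# Ratio of present peaks at each mz value.
fraction = [count / n_samples for count in present]

# Row-wise count: number of absent peaks in each sample.
missing_values = [row.count(False) for row in present_matrix]

print(present)
print(fraction)
print(missing_values)
```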
property(prop_name: str, flagged_only: bool = True)[source]

Obtains the values of an existing peak matrix property.

Parameters
  • prop_name – name of the target property. Valid properties include ‘present’, ‘present_matrix’, ‘fraction’, ‘missing_values’, ‘occurrence’, and ‘purity’

  • flagged_only – whether to return the flagged values only. Default = True

Return type

numpy array

property purity

Property of the purity level array.

Getter

returns the purity array, indicating the ratio of only one peak in each sample being aligned in each mz value. This property is valid only when the intra_count attribute matrix is available

Type

numpy array

>>> pm.attr_matrix('intra_count')
array([[ 2,  1,  1,  1,  0],
       [ 1,  1,  0,  0,  1],
       [ 1,  3,  1,  2,  1],
       [ 0,  1,  0,  1,  1]])
>>> pm.purity
array([ 0.667,  0.75,  1.0,  0.667,  1.0])
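Both occurrence and purity can be derived from the intra_count matrix. The sketch below reproduces the example values in plain Python. The purity formula used here (samples with exactly one peak divided by samples with at least one peak) is an assumption inferred from those example numbers, not taken from the dimspy source, and a column without any peak would need separate handling.

```python
intra_count = [
    [2, 1, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 3, 1, 2, 1],
    [0, 1, 0, 1, 1],
]

columns = list(zip(*intra_count))

# Total number of peaks aligned to each mz value.
occurrence = [sum(column) for column in columns]

# Share of contributing samples that contribute exactly one peak.
purity = [
    sum(1 for count in column if count == 1)
    / sum(1 for count in column if count > 0)
    for column in columns
]

print(occurrence)
print([round(value, 3) for value in purity])
```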
remove_empty_peaks()[source]

Removes empty peaks from the peak matrix.

Empty peaks are peaks with no valid m/z or intensity value in any sample. They may occur after removing an entire sample from the peak matrix, e.g., after removing the blank samples in the blank filter.

Return type

PeakMatrix object (self)

remove_peaks(peak_ids, flagged_only: bool = True)[source]

Removes peaks from the peak matrix.

Parameters
  • peak_ids – the indices of the peaks to remove

  • flagged_only – whether the indices are for flagged peaks or all peaks. Default = True

Return type

PeakMatrix object (self)

remove_samples(sample_ids, masked_only: bool = True)[source]

Removes samples from the peak matrix.

Parameters
  • sample_ids – the indices of the samples to remove

  • masked_only – whether the indices are for unmasked samples or all samples. Default = True

Return type

PeakMatrix object (self)

rsd(*args, **kwargs)[source]

Calculates relative standard deviation (RSD) array.

Parameters
  • args – tags or untyped tag values for RSD calculation, no value = calculate over all samples

  • kwargs – typed tags for RSD calculation, no value = calculate over all samples

  • on_attr – calculate RSD on given attribute. Default = “intensity”

  • flagged_only – whether to calculate on flagged peaks only. Default = True

Return type

numpy array

The RSD is calculated as:

>>> rsd = std(pm.intensity_matrix, axis = 0, ddof = 1) / mean(pm.intensity_matrix, axis = 0) * 100

Note that the delta degrees of freedom (ddof) is set to 1 for the standard deviation calculation. Moreover, only the “present” peaks are used for the calculation. If a column has fewer than 2 peaks, the corresponding rsd value is set to np.nan.
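The rules above (sample standard deviation with ddof = 1, present peaks only, nan for columns with fewer than 2 peaks) can be sketched with the standard library. The rsd_vector helper and the use of None to mark an absent peak are assumptions made for this illustration.

```python
import math
import statistics


def rsd_vector(matrix):
    """Column-wise RSD in percent; None marks an absent peak (assumed)."""
    result = []
    for column in zip(*matrix):
        values = [value for value in column if value is not None]
        if len(values) < 2:
            result.append(math.nan)
            continue
        # statistics.stdev is the sample standard deviation (ddof = 1).
        result.append(statistics.stdev(values) / statistics.mean(values) * 100)
    return result


intensity = [
    [10.0, 5.0, None],
    [12.0, None, None],
    [14.0, 5.0, 7.0],
]
rsd = rsd_vector(intensity)
print(rsd)
```

The first column has mean 12 and sample standard deviation 2, giving an RSD of about 16.7; the last column holds a single peak, so its value is nan.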

property shape

Property of the peak matrix shape.

Getter

returns the shape of the attribute matrix

Type

tuple

tags_of(tag_type: Optional[str] = None)[source]

Obtains tags of the peaklist_tags with particular tag type.

Parameters

tag_type – the type of the returning tags. Provide None to obtain untyped tags

Return type

tuple

to_peaklist(ID: str)[source]

Averages the peak matrix into a single peaklist.

Parameters

ID – ID of the merged peaklist

Return type

PeakList object

Only the “present” peaks will be included in the result peaklist. The new peaklist will only contain the following attributes: mz, intensity, present, fraction, rsd, occurrence, and purity.

Use unmask statement to calculate the peaklist for a particular group of samples:

>>> with unmask_peakmatrix(pm, 'Sample') as m: pkl = m.to_peaklist('averaged_peaklist')

Or use mask statement to exclude a particular group of samples:

>>> with mask_peakmatrix(pm, 'QC') as m: pkl = m.to_peaklist('averaged_peaklist')
to_str(attr_name: str = 'intensity', delimiter: str = '\t', samples_in_rows: bool = True, comprehensive: bool = True, rsd_tags: Sequence = ())[source]

Exports the peak matrix to a string.

Parameters
  • attr_name – name of the attribute matrix for exporting. Default = ‘intensity’

  • delimiter – delimiter to separate the matrix. Default = ‘\t’, i.e., TSV format

  • samples_in_rows – whether or not the samples are stored in rows. Default = True

  • comprehensive – whether to include comprehensive info, e.g., mask, flags, present, rsd etc. Default = True

  • rsd_tags – peaklist tags for RSD calculation. Default = (), indicating only the overall RSD is included

Return type

str

unmask_tags(*args, **kwargs)[source]

Unmasks samples with particular tags.

Parameters
  • args – tags or untyped tag values for unmasking

  • kwargs – typed tags for unmasking

  • override – whether to override the current mask, default = False

Return type

PeakMatrix object (self)

This function unmasks samples that have ALL the given tags. To unmask samples matching ANY of the tags, chain the calls instead.

>>> pm.mask = [True] * pm.full_shape[0]
>>> pm.unmask_tags('qc', plate = 1)
(will unmask all QC samples on plate 1)
>>> pm.unmask_tags('qc').unmask_tags(plate = 1)
(will unmask QC samples and all samples on plate 1)
class dimspy.models.peak_matrix.mask_all_peakmatrix(pm: dimspy.models.peak_matrix.PeakMatrix)[source]

Bases: object

The mask_all_peakmatrix statement.

Temporarily masks all the peak matrix samples. Within the statement the samples can be modified or removed. After leaving the statement the original mask is restored.

Parameters

pm – the target peak matrix

Return type

PeakMatrix object

>>> print(pm.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
>>> with mask_all_peakmatrix(pm) as m: print(m.peaklist_ids)
()
>>> print(pm.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
class dimspy.models.peak_matrix.mask_peakmatrix(pm: dimspy.models.peak_matrix.PeakMatrix, *args, **kwargs)[source]

Bases: object

The mask_peakmatrix statement.

Temporarily masks the peak matrix samples with particular tags. Within the statement the samples can be modified or removed. After leaving the statement the original mask is restored.

Parameters
  • pm – the target peak matrix

  • override – whether to override the current mask, default = True

  • args – target tag values, both typed and untyped

  • kwargs – target typed tag types and values

Return type

PeakMatrix object

>>> print(pm.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
>>> with mask_peakmatrix(pm, 'qc') as m: print(m.peaklist_ids)
('sample_1', 'sample_2', 'sample_3', 'sample_4')
>>> print(pm.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
class dimspy.models.peak_matrix.unmask_all_peakmatrix(pm: dimspy.models.peak_matrix.PeakMatrix)[source]

Bases: object

The unmask_all_peakmatrix statement.

Temporarily unmasks all the peak matrix samples. Within the statement the samples can be modified or removed. After leaving the statement the original mask is restored.

Parameters

pm – the target peak matrix

Return type

PeakMatrix object

>>> print(pm.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
>>> with unmask_all_peakmatrix(pm) as m: print(m.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
>>> print(pm.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
class dimspy.models.peak_matrix.unmask_peakmatrix(pm: dimspy.models.peak_matrix.PeakMatrix, *args, **kwargs)[source]

Bases: object

The unmask_peakmatrix statement.

Temporarily unmasks the peak matrix samples with particular tags. Within the statement the samples can be modified or removed. After leaving the statement the original mask is restored.

Parameters
  • pm – the target peak matrix

  • override – whether to override the current mask, default = True

  • args – target tag values, both typed and untyped

  • kwargs – target typed tag types and values

Return type

PeakMatrix object

>>> print(pm.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
>>> with unmask_peakmatrix(pm, 'qc') as m: print(m.peaklist_ids)
('qc_1', 'qc_2') # no need to set pm.mask to True
>>> print(pm.peaklist_ids)
('sample_1', 'sample_2', 'qc_1', 'sample_3', 'sample_4', 'qc_2')
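The save-and-restore behaviour shared by the four mask statements can be sketched with contextlib. TinyMatrix and temporary_mask are hypothetical stand-ins written for this illustration; they are not the dimspy classes and ignore tags entirely.

```python
from contextlib import contextmanager


class TinyMatrix:
    """Hypothetical stand-in holding sample IDs and a boolean mask."""

    def __init__(self, ids):
        self.ids = tuple(ids)
        self.mask = [False] * len(self.ids)   # True hides the sample

    @property
    def visible_ids(self):
        return tuple(i for i, masked in zip(self.ids, self.mask) if not masked)


@contextmanager
def temporary_mask(matrix, new_mask):
    """Apply new_mask inside the with block, then restore the old mask."""
    saved = list(matrix.mask)
    matrix.mask = list(new_mask)
    try:
        yield matrix
    finally:
        # Runs even if the block raises, so the mask is always restored.
        matrix.mask = saved


pm = TinyMatrix(["sample_1", "qc_1", "sample_2"])
with temporary_mask(pm, [False, True, False]) as m:
    inside = m.visible_ids
print(inside)
print(pm.visible_ids)
```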