Page 22 - Shimadzu Journal vol.9 Issue1

Clinical Research

Figure 1. Example graph of a PESI measurement diagram. The negative mode (neg mode) TIC pattern is invalid, the positive mode (pos mode) is valid.



Data extraction and analysis

Mass spectra of all measurements were exported in JCAMP-DX format¹⁶ for each voltage segment. Features were extracted with eMSTAT 1.0 (Shimadzu Corporation, Kyoto, Japan), separately for each ionization mode but for all measurements at once, by binning with an m/z tolerance of 0.75 Da. The intensity threshold was 0.1% for neg mode (voltage segment 2) and 0.01% for pos mode (voltage segment 4). The resulting intensities for each sample (in rows) and each m/z-binned feature (in columns) were copied to Microsoft Excel (2013), where both ionization modes were combined for each measurement, resulting in 4702 features.

Further data extraction and pre-processing were performed with Tibco Spotfire. All invalid single modes were excluded from any further calculations. All valid replicate measurements of a sample were averaged and the average was log₁₀-transformed. Low-quality features were filtered according to signal intensity (neg >1×10³, pos >9×10³), technical variability (relative standard deviation in QC <50%), missing data (<30%) and blank load (<50%).

Data visualization and statistical analysis were performed with R¹⁷ (v3.5.3, packages stringr, dplyr, readxl, openxlsx, nlme, emmeans, ggplot2, ggpmisc, pheatmap, RColorBrewer, colorspace, dendsort, missMDA, mixOmics, MetaboAnalystR), Tibco Spotfire (v7.11.1) and the Orange data mining toolbox¹⁸.

Principal component analysis (PCA) was performed centered and scaled. Orthogonal projections to latent structures discriminant analysis (OPLS-DA) was performed centered and scaled to unit variance with a standard 7-fold cross validation for the classification factor time_delay. Model stability was additionally verified with 1000 random label permutations, and models with Q² >50% were considered significant.

Five common machine learning classifiers with standard configuration as offered by the Orange data mining toolbox were used. Feature importance was calculated for all 1200 features based on five criteria (ANOVA, Gain ratio, Gini, Info.gain, χ²), and the top 18 features were used for the five machine learning classifiers.

Results & Discussion

In this study we obtained blood plasma samples from 50 volunteers, developed a measurement method covering both ionization modes and used advanced machine learning analysis to conclusively show the potential of PESI-MS for routine sample-quality determination.

One-step precipitation delivers both ionization modes in one run

Blood samples were obtained from 50 volunteers, subdivided into a biologically homogeneous and a biologically heterogeneous group (refer to Fig. 2A). The subgrouping was used to create samples with lower biological variability to increase statistical power for the detection of sample quality biomarkers. However, this approach failed to improve biomarker detection. Consequently, future studies could omit the cumbersome step of subdividing cohorts when searching for plasma quality biomarkers. From each volunteer one blood sample was processed into plasma immediately (time_delay = 0 h), while a second sample was delayed for 3 h (time_delay = 3 h). Metabolites were extracted by a simple one-step 70% MeOH precipitation with 10 mM NH₄Ac and 5% DMSO, and the diluted supernatants were measured with the PESI-MS (see Fig. 2A). The whole sample preparation can be performed manually with standard laboratory equipment in less than 8 min total time, including the 5 min centrifugation step. This time can be reduced to >1 min by switching to filtration. For the 2 min PESI-MS measurement 10 µl extract sufficed, so that 2 µl plasma enabled three replicates with the applied 1:20 dilution during precipitation.
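The binning step described under Data extraction and analysis can be illustrated with a minimal Python sketch. It is not the eMSTAT algorithm, which is not described here. It assumes a simple greedy scheme in which a new bin starts whenever a peak lies more than the tolerance above the current bin's first m/z, and it assumes the intensity threshold is relative to the most intense peak of each spectrum. The function name `bin_spectra` and the example spectra are hypothetical.

```python
import bisect


def bin_spectra(spectra, tolerance=0.75, rel_threshold=0.001):
    """Bin peaks from all spectra at once into shared m/z bins.

    spectra: dict mapping sample name -> list of (mz, intensity) peaks.
    tolerance: bin width in Da (0.75 Da in the text).
    rel_threshold: relative intensity cut-off (0.001 = 0.1%); assumed here
        to be relative to the base peak of each spectrum.
    Returns (bin_starts, table) where table maps sample name -> row of
    summed intensities, one value per bin.
    """
    # Step 1: drop peaks below the relative intensity threshold.
    kept = {}
    for name, peaks in spectra.items():
        base = max((inten for _, inten in peaks), default=0.0)
        kept[name] = [
            (mz, inten)
            for mz, inten in peaks
            if base > 0 and inten >= rel_threshold * base
        ]

    # Step 2: define bins greedily over the pooled, sorted m/z values.
    bin_starts = []
    for mz in sorted(mz for peaks in kept.values() for mz, _ in peaks):
        if not bin_starts or mz - bin_starts[-1] > tolerance:
            bin_starts.append(mz)

    # Step 3: build the sample-by-feature table (samples in rows).
    table = {}
    for name, peaks in kept.items():
        row = [0.0] * len(bin_starts)
        for mz, inten in peaks:
            # Index of the last bin whose start is <= mz.
            idx = bisect.bisect_right(bin_starts, mz) - 1
            row[idx] += inten
        table[name] = row
    return bin_starts, table


# Hypothetical example spectra (not measured data).
example = {
    "s1": [(100.0, 1000.0), (100.5, 500.0), (200.0, 0.5)],
    "s2": [(100.3, 800.0), (200.2, 400.0)],
}
bins, table = bin_spectra(example)
```

Binning all measurements at once, as the text describes, gives every sample the same feature columns, which is what allows the rows to be stacked into a single table.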




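The four feature-quality filters listed under Data extraction and analysis (signal intensity, technical variability in QC, missing data, blank load) can be sketched per feature as follows. The thresholds come from the text; how each quantity is computed is an assumption made for illustration: intensity is taken as the mean over samples, missing data as the fraction of samples without a value, and blank load as the ratio of mean blank to mean sample intensity. The function name `keep_feature` and all example values are hypothetical.

```python
from statistics import mean, stdev

# Minimum signal intensity per ionization mode, as stated in the text.
MIN_INTENSITY = {"neg": 1e3, "pos": 9e3}


def keep_feature(samples, qcs, blanks, min_intensity,
                 max_qc_rsd=0.5, max_missing=0.3, max_blank_ratio=0.5):
    """Return True if one feature passes all four quality filters.

    samples, qcs, blanks: lists of intensities; None marks a missing value.
    The definitions of each criterion below are assumptions, not the
    authors' documented procedure.
    """
    present = [v for v in samples if v is not None]
    if not present:
        return False

    # Missing data: fraction of samples without a value (must be < 30%).
    missing_fraction = 1 - len(present) / len(samples)

    # Signal intensity: mean over samples must exceed the mode threshold.
    sample_mean = mean(present)
    if sample_mean <= 0 or sample_mean <= min_intensity:
        return False

    # Technical variability: relative standard deviation across QC runs.
    qc_present = [v for v in qcs if v is not None]
    if len(qc_present) < 2 or mean(qc_present) <= 0:
        return False
    qc_rsd = stdev(qc_present) / mean(qc_present)

    # Blank load: mean blank intensity relative to mean sample intensity.
    blank_present = [v for v in blanks if v is not None]
    blank_ratio = (mean(blank_present) if blank_present else 0.0) / sample_mean

    return (qc_rsd < max_qc_rsd
            and missing_fraction < max_missing
            and blank_ratio < max_blank_ratio)
```

A feature with stable QC intensities, no missing values and a low blank signal passes with the neg-mode threshold, while the same intensities fail the stricter pos-mode threshold.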