Table of Contents
Chapter 1 Introduction
1.0 Background
1.1 Problems/Challenges
1.2 Reproducibility Issues
1.3 Data Collection Outline
Chapter 2 Scaling
2.0 Introduction
2.1 Mean Centering
2.2 Variance Scaling / Autoscaling
2.3 Scale Row Area / Integral Normalization
2.4 Pareto Scaling
2.5 Range Scaling
2.6 Level Scaling
2.7 Log Transformation
2.8 Power Transformation - Square Root
2.9 Discretization
2.10 Generalized Log Transform
2.11 Probabilistic Quotient Normalization (PQN)
2.12 Variable Stability Scaling (VAST)
2.13 Bucketing/Binning
2.14 Histogram Matching (HM)
2.15 Orthogonal Signal Correction (OSC)
2.16 Digitization
2.17 Multiplicative Scatter Correction (MSC)
2.18 Standard Normal Variates (SNV)
Chapter 3 Preprocessing
3.0 Introduction
3.1 Zero Filling (NMR)
3.2 Windowing/Filtering (NMR)
3.3 Signal-to-Noise Ratio (SNR)
3.4 Noise Reduction and Differentiation
3.5 Savitzky-Golay
3.6 Differentiation
3.7 Smoothing
3.8 Moving Averages
3.9 Heteroscedastic / Homoscedastic Noise
3.10 Phasing
3.11 Baseline
3.12 Linear Regression Baseline Fitting
3.13 Two Point Linear Baseline
3.14 Baseline Correction, Function Fit Baseline
3.15 Asymmetric Least Squares Baseline Fit
3.16 airPLS
3.17 Baseline Offset
3.18 Mass Spectrometry
3.19 Wavelets
3.19.1 Continuous Wavelet Transform (CWT)
3.19.2 Scaling
3.19.3 Shifting
3.19.4 Discrete Wavelet Transform (DWT)
3.19.5 Approximations and Details
3.19.6 Multiple-Level Decomposition
3.19.7 Dimension Reduction
3.19.8 Baseline/Background Correction
3.19.9 Denoising
3.19.10 FT Denoising
3.19.11 Wavelet Denoising
3.20 Peak Alignment
3.20.1 Warping
3.20.2 Peak Alignment by Fast Fourier Transform (PAFFT)
3.20.3 Recursive Alignment by Fast Fourier Transform (RAFFT)
3.20.4 Recursive Segment-wise Peak Alignment (RSPA)
3.20.5 Generalized Fuzzy Hough Transform (GFHT)
3.20.6 Dynamic Time Warping (DTW)
3.20.7 Parametric Time Warping (PTW)
3.20.8 Correlation Optimized Warping (COW)
3.20.9 Peak Alignment - Reduced Set Mapping (PARS)
3.20.10 Partial Linear Fit (PLF)
3.20.11 Peak Alignment by a Genetic Algorithm (PAGA)
3.20.12 Beam Search
3.20.13 iCOSHIFT
3.20.14 FFT Cross Correlation
3.20.15 Progressive Consensus Alignment of NMR Spectra (PCANS)
Chapter 4 Sample Subset Selection
4.0 Introduction
4.1 Sample Size Recommendations
4.2 Representativity
4.3 Median Absolute Deviation (MAD)
4.4 Dixon's Test
4.5 Grubbs' Test
4.6 Cochran's Test
4.7 Constituent Value Range
4.8 General Notes
4.9 Overview of Sample Subset Selection Options
4.10 Random Subsampling
4.11 Bootstrapping
4.12 Cross Validation
4.13 Mahalanobis Distance (MD)
4.14 Kennard-Stone (KS)
4.15 Duplex Method
4.16 Sample Set Partitioning Based on Joint X–Y Distances (SPXY)
4.17 Rank Select
4.18 Kohonen Neural Networks
Chapter 5 Variable Subset Selection
5.0 Introduction
5.1 Missing Values
5.2 Imputation Methods
5.3 Multiple Imputation
5.4 Why Variable Selection
5.5 Chance
5.6 Generalizability
5.7 Bias
5.8 Filter Methods
5.9 Wrapper Methods
5.10 Embedded Methods
5.11 Exhaustive Methods
5.12 Information Leak
5.13 Cross Validation
5.14 Variable Selection by Stepwise Algorithms
5.15 Sequential Forward Floating Selection (SFFS)
5.16 Variable Selection Stepwise Regression
5.17 F-Test
5.18 t-Test
5.19 Fisher Index / Coomans Index
5.20 χ²-Test
5.21 Kolmogorov–Smirnov Test
5.22 Wilcoxon Rank Sum Test
5.23 Analysis of Variance (ANOVA)
5.24 SELECT
5.25 Simple Variable Reduction
5.26 Selection of Univariate Tests
5.27 Simple Pairwise Correlation Method
5.28 KIF Index Method
5.29 VIF Method
5.30 B2 and B4 Methods
5.31 First Eigenvector Method
5.32 Overlap Density Heatmap (ODH)
5.33 Successive Projections Algorithm (SPA)
5.34 Linear Discriminant Analysis (LDA)
5.35 Uncorrelated Linear Discriminant Analysis (ULDA)
5.36 Principal Component Analysis (PCA)
5.37 Partial Least Squares (PLS)
5.38 Interval PCA (iPCA) and PLS (iPLS)
5.39 Interval PLS (iPLS)
5.40 Moving Window PLS (mwPLS)
5.41 Backward Interval PLS (biPLS)
5.42 Synergy Interval PLS (siPLS)
5.43 Variable Importance in Projection (VIP)
5.44 Non-Orthogonalized PLS1 (IFRNOPLS)
5.45 Outer Product Analysis PLS Discriminant Analysis (OPA-PLSDA)
5.46 PLS Uninformative Variable Elimination (PLS UVE)
5.47 Partial Least Squares Genetic Algorithm (PLS-GA)
5.48 Linear Discriminant Analysis-Genetic Algorithm (LDA GA)
5.49 Mutual Information (MI)
5.50 Particle Swarm Optimization (PSO)
5.51 Relief
5.52 Decision Trees
5.53 CART
5.54 Recursive Feature Elimination (RFE)
5.55 Naïve Bayesian Belief Network (BBN)
5.56 Support Vector Machines (SVM) Feature Selection
5.57 Ant Colony Optimization (ACO)
5.58 Minimum Redundancy–Maximum Relevance (MRMR)
5.59 Neural Networks (NN)
5.60 Back-Propagation Neural Network (BP-NN)
5.61 Probabilistic Neural Networks (PNN)
5.62 Random Forest (RF)
5.63 Independent Component Analysis (ICA)
5.64 Random Permutations
5.65 Correlation-Based Feature Selection (CFS)
5.66 Fast Correlation Based Feature Selection (FCBF)
5.67 Simulated Annealing (SA)
5.68 Multidimensional Scaling (MDS)
5.69 Stochastic Proximity Embedding (SPE)
5.70 Isomap
5.71 Fast Maximum Variance Unfolding (FastMVU)
5.72 Kernel PCA (KPCA) / Kernel PLS
5.73 Generalized Discriminant Analysis (GDA)
5.74 Diffusion Maps (DM)
5.75 Stochastic Neighbor Embedding (SNE)
5.76 Locally Linear Embedding (LLE)
5.77 Laplacian Eigenmaps (LE)
5.78 Hessian LLE (HLLE)
5.79 Local Tangent Space Alignment (LTSA)
5.80 Conformal Eigenmaps (CCA)
5.81 Maximum Variance Unfolding (MVU)
5.82 Locality Preserving Projection (LPP)
5.83 Neighborhood Preserving Embedding (NPE)
5.84 Locally Linear Coordination (LLC)
5.85 Manifold Charting (MC)
5.86 Coordinated Factor Analysis (CFA)
5.87 Stochastic Neighbor Embedding (SNE)
5.88 Evaluation Criteria
List of Abbreviations
Appendix 1 Test Data
Appendix 2 Software 1
Appendix 3 Software 2