2 F-Mapper and G-Mapper
In addition to the traditional Mapper algorithm, this package implements two advanced models: F-Mapper and G-Mapper. Whereas the traditional Mapper requires the user to guess the number of intervals and the overlap percentage by hand, these two models automatically learn the underlying data structure through fuzzy logic and statistical testing, respectively.
2.1 F-Mapper (Fuzzy Mapper)
F-Mapper is an extension of Mapper that introduces fuzzy logic into the algorithm. Its core idea is to replace the traditional hard-cut geometric intervals with a Cover defined by Fuzzy C-Means (FCM) clustering.
2.1.1 Core Concepts & Mechanism
In F-Mapper, instead of stating that a data point belongs to interval A, the model assigns graded memberships: for example, a data point may belong 80% to cluster A and 20% to cluster B.
1. Generate Soft Intervals: The algorithm first runs FCM clustering on the filter values.
2. Determine Overlap: Given a membership threshold (fcm_threshold), any data point whose membership degree to a cluster exceeds this threshold is included in that cluster's interval.
3. Natural Bridges: Data points located at the boundaries between clusters naturally form the overlap connecting two nodes, as their membership degrees for both sides exceed the threshold.
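The three steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: `fuzzy_cmeans` is a hypothetical helper written here from the standard FCM update rules (fuzzifier m = 2), and the toy filter values, the initial centers, and the strict `>` comparison against the threshold are all choices made for the example.

```r
# Minimal Fuzzy C-Means in base R (hypothetical helper, not the package's code).
fuzzy_cmeans <- function(x, centers, m = 2, iters = 50) {
  x <- as.matrix(x)
  for (it in seq_len(iters)) {
    # Squared distance from every point (rows) to every center (columns).
    d2 <- sapply(seq_len(nrow(centers)), function(j) {
      rowSums(sweep(x, 2, centers[j, ])^2)
    })
    d2 <- pmax(d2, 1e-12)               # guard against division by zero
    inv <- d2^(-1 / (m - 1))
    u <- inv / rowSums(inv)             # memberships; each row sums to 1
    w <- u^m
    centers <- t(w) %*% x / colSums(w)  # membership-weighted center update
  }
  list(membership = u, centers = centers)
}

# Toy 1D filter values (assumed): two groups plus one point halfway between.
filter_values <- c(seq(0, 1, by = 0.1), 1.5, seq(2, 3, by = 0.1))
fit <- fuzzy_cmeans(filter_values, centers = matrix(c(0, 3), ncol = 1))

# Step 2: a point joins every interval whose membership exceeds the threshold.
fcm_threshold <- 0.2
cover <- fit$membership > fcm_threshold

# Step 3: points assigned to two intervals form the overlap (the "bridge").
overlap_idx <- which(rowSums(cover) == 2)
filter_values[overlap_idx]
```

In this toy input only the midpoint has a membership above the threshold for both clusters, so it is the single point shared by the two intervals.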
2.1.2 Parameter Explanation
cluster_n: The number of fuzzy clusters, which is also the number of intervals in the final Cover.
fcm_threshold: The membership degree threshold. A lower value loosens the criterion for overlap, which results in more edges in the final graph.
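To see why a lower fcm_threshold tends to produce more edges, the sketch below counts how many points fall into two intervals at once for several thresholds. The membership matrix is hand-written for illustration (it is not output from the package), and `count_shared` is a hypothetical helper.

```r
# Hand-written membership degrees for six points and two clusters
# (illustrative values only; each row sums to 1).
membership <- rbind(
  c(0.95, 0.05),
  c(0.85, 0.15),
  c(0.70, 0.30),
  c(0.50, 0.50),
  c(0.25, 0.75),
  c(0.08, 0.92)
)

# Number of points whose membership exceeds the threshold in two or more
# clusters, i.e. points that would sit in an overlap.
count_shared <- function(u, threshold) {
  sum(rowSums(u > threshold) >= 2)
}

thresholds <- c(0.4, 0.2, 0.1)
shared <- sapply(thresholds, function(t) count_shared(membership, t))
shared  # 1 3 4
```

More shared points mean more pairs of nodes with common members, and in Mapper an edge is drawn wherever two nodes share at least one point.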
# F-Mapper Execution Example
FMapper <- FuzzyMapperAlgo(
  original_data = data[, 1:4],
  filter_values = data[, 1:2],
  cluster_n = 8,
  fcm_threshold = 0.2,
  methods = "kmeans",
  method_params = list(max_kmeans_clusters = 2)
)
2.2 G-Mapper (Gaussian Mapper)
G-Mapper is an algorithm introduced in 2024 that incorporates statistical normality tests and Gaussian Mixture Models (GMM) into the Mapper construction. It addresses the main pain point of F-Mapper, which still requires cluster_n to be chosen manually, by letting the data determine the number and size of the intervals, given only a test threshold and an overlap ratio.
2.2.1 Core Concepts & Mechanism
G-Mapper employs a recursive testing strategy that resembles cell division:
Anderson-Darling Test: The algorithm performs a normality test on the data within an interval. If the data resembles a single bell curve (passes the test), it is considered a pure cluster and is left unsplit.
GMM Splitting: If the data distribution is skewed or multi-peaked (fails the test), G-Mapper uses a Gaussian Mixture Model (GMM) to split it into two new components.
Geometric Overlap: After splitting, the algorithm calculates a precise geometric overlap range based on the means and variances of the two Gaussian distributions, along with the user-defined g_overlap, ensuring an appropriate intersection between adjacent intervals.
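One round of the test-then-split decision described above can be sketched in base R. This is an illustration under assumptions, not the package's implementation: `ad_statistic` and `fit_gmm2` are hypothetical helpers, the small-sample correction factor follows the form commonly quoted from the G-means paper, and the rule "split when the statistic exceeds AD_threshold" is inferred from the parameter description in this document.

```r
# Anderson-Darling statistic for normality with estimated mean and variance.
# The factor (1 + 4/n - 25/n^2) is the small-sample correction as quoted
# from the G-means paper (treat it as an assumption here).
ad_statistic <- function(x) {
  n <- length(x)
  z <- sort((x - mean(x)) / sd(x))
  log_cdf  <- pnorm(z, log.p = TRUE)
  log_surv <- pnorm(rev(z), lower.tail = FALSE, log.p = TRUE)
  a2 <- -n - mean((2 * seq_len(n) - 1) * (log_cdf + log_surv))
  a2 * (1 + 4 / n - 25 / n^2)
}

# Two-component 1D Gaussian mixture fitted by EM (hypothetical helper).
fit_gmm2 <- function(x, iters = 100) {
  med <- median(x)
  mu <- c(mean(x[x <= med]), mean(x[x > med]))  # start from the two halves
  sigma <- rep(sd(x), 2)
  w <- c(0.5, 0.5)
  for (it in seq_len(iters)) {
    # E-step: responsibility of each component for each point.
    d1 <- w[1] * dnorm(x, mu[1], sigma[1])
    d2 <- w[2] * dnorm(x, mu[2], sigma[2])
    r1 <- d1 / (d1 + d2)
    r2 <- 1 - r1
    # M-step: re-estimate weights, means, and standard deviations.
    w <- c(mean(r1), mean(r2))
    mu <- c(sum(r1 * x) / sum(r1), sum(r2 * x) / sum(r2))
    sigma <- c(sqrt(sum(r1 * (x - mu[1])^2) / sum(r1)),
               sqrt(sum(r2 * (x - mu[2])^2) / sum(r2)))
  }
  list(mu = mu, sigma = sigma, weight = w)
}

AD_threshold <- 0.8
base <- qnorm(ppoints(100))        # deterministic, bell-shaped toy sample
unimodal <- base
bimodal <- c(base - 4, base + 4)   # two well-separated bells

ad_statistic(unimodal) > AD_threshold  # FALSE: leave the interval unsplit
ad_statistic(bimodal) > AD_threshold   # TRUE: split with the mixture model
components <- fit_gmm2(bimodal)
components$mu                          # close to -4 and 4
```

The fitted means and standard deviations are the inputs that the geometric overlap step then uses to place the two new intervals.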
2.2.2 Parameter Explanation
AD_threshold: The critical value for the Anderson-Darling test. A smaller value (stricter scrutiny) triggers splitting more easily, producing more intervals (nodes).
g_overlap: The geometric overlap extension ratio. It determines how far the split intervals extend into each other. A higher value results in a more tightly connected graph.
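The effect of g_overlap can be illustrated with a small sketch. The formula below is an assumption based on one reading of the G-Mapper paper, in which each new endpoint is placed between the two component means in proportion to their standard deviations and then pushed outward by a factor of (1 + g_overlap); the package may compute it differently. `split_interval` and the numeric inputs are hypothetical.

```r
# Split the interval (a, b) into two overlapping intervals, given the means
# (mu) and standard deviations (sigma) of two fitted Gaussian components.
# Assumed formula; not taken from the package source.
split_interval <- function(a, b, mu, sigma, g_overlap) {
  gap <- mu[2] - mu[1]
  left_upper  <- min(mu[1] + (1 + g_overlap) * sigma[1] / sum(sigma) * gap, b)
  right_lower <- max(mu[2] - (1 + g_overlap) * sigma[2] / sum(sigma) * gap, a)
  list(left = c(a, left_upper), right = c(right_lower, b))
}

# Width of the region shared by the two new intervals.
overlap_width <- function(s) s$left[2] - s$right[1]

mu <- c(-4, 4)
sigma <- c(1, 1)
no_overlap  <- split_interval(-6, 6, mu, sigma, g_overlap = 0)
half_overlap <- split_interval(-6, 6, mu, sigma, g_overlap = 0.5)

overlap_width(no_overlap)    # 0: the intervals only touch
overlap_width(half_overlap)  # 4: the intervals share the range [-2, 2]
```

Under this formula the shared region grows linearly with g_overlap, so more points land in both intervals and the resulting graph gains edges.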
# G-Mapper Execution Example
GMapper <- GMapperAlgo(
  original_data = data[, 1:4],
  filter_values = data[, 1],  # G-Mapper primarily evaluates 1D distributions
  AD_threshold = 0.8,         # Statistical threshold dictating how many times to split
  g_overlap = 0.5,            # Geometric overlap dictating the number of edges
  methods = "kmeans",
  method_params = list(max_kmeans_clusters = 2),
  num_cores = 12
)