AIMS-7 — Galois Groups of Irreducible Septic Polynomials

Database of 1.69 million septics · Statistical analysis · ML classification

JURGEN MEZINAJ · OAKLAND UNIVERSITY · arXiv:2511.16622
1,686,353 total polynomials (AIMS-7 dataset)
7 Galois groups (the transitive subgroups of S7)
92.5% are S7 (1,559,957 entries)
0.8525 ML balanced accuracy (non-S7 + invariants model)
252 C7 polynomials (the rarest group in AIMS-7)
h = 28: first C7 example, from the p = 29 cyclotomic field
AIMS-7 Full Dataset -- Real Counts from AIMS-7.csv (Table 9)

Full Dataset Distribution (including S7)

Non-S7 Distribution (126,396 polynomials)
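The headline counts are mutually consistent. Removing the S7 entries from the full dataset leaves exactly the non-S7 subset, and the S7 share rounds to the 92.5% quoted above:

```latex
\begin{aligned}
1{,}686{,}353 - 1{,}559{,}957 &= 126{,}396 \quad \text{(non-}S_7\text{ polynomials)}\\
\frac{1{,}559{,}957}{1{,}686{,}353} &\approx 0.9250 = 92.5\%
\end{aligned}
```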

Complete Reference Table -- AIMS-7.csv + AIMS-7inv.csv
Columns: Group · Order · AIMS-7 count · LMFDB count · Count at h ≤ 4 · R1 factors · R2 factors · R3 factors · Solvable

Subgroup Inclusion Lattice

Nodes (group: order): S7: 5040 · A7: 2520 · L(3,2): 168 · C7⋊C6: 42 · C7⋊C3: 21 · D7: 14 · C7: 7
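The sketch below is written in plain Python. It builds one concrete representative of each transitive subgroup of S7 acting on Z/7, then checks the orders and inclusions in the lattice. The representatives are a standard textbook choice, not taken from the paper:

- L(3,2) is the collineation group of the Fano plane whose lines are {i, i+1, i+3} mod 7.
- The solvable groups are affine maps x -> a*x + b, with the multiplier a restricted to a subgroup of (Z/7)^×.

The names `GROUPS`, `affine` and `inclusions` are hypothetical.

```python
from itertools import permutations

N = 7
POINTS = range(N)
# Lines of a Fano plane on Z/7: translates of the perfect difference set {0, 1, 3}.
FANO_LINES = {frozenset((i, (i + 1) % N, (i + 3) % N)) for i in POINTS}


def is_even(p):
    """A permutation (tuple, p[i] = image of i) is even iff n - #cycles is even."""
    seen, cycles = set(), 0
    for start in POINTS:
        if start not in seen:
            cycles += 1
            x = start
            while x not in seen:
                seen.add(x)
                x = p[x]
    return (N - cycles) % 2 == 0


def affine(multipliers):
    """All maps x -> a*x + b (mod 7) with a in `multipliers` and b in Z/7."""
    return {tuple((a * x + b) % N for x in POINTS)
            for a in multipliers for b in POINTS}


def preserves_fano(p):
    """True iff p maps the set of Fano lines onto itself."""
    return {frozenset(p[x] for x in line) for line in FANO_LINES} == FANO_LINES


S7 = set(permutations(POINTS))
GROUPS = {
    "S7": S7,
    "A7": {p for p in S7 if is_even(p)},
    "L(3,2)": {p for p in S7 if preserves_fano(p)},  # Fano collineations
    "C7⋊C6": affine(range(1, 7)),  # AGL(1,7), the Frobenius group F42
    "C7⋊C3": affine((1, 2, 4)),    # multipliers = nonzero squares mod 7 (F21)
    "D7": affine((1, 6)),          # x -> ±x + b
    "C7": affine((1,)),            # translations only
}


def inclusions(groups):
    """(H, G) pairs where the concrete set H is a proper subset of G."""
    return [(h, g) for h in groups for g in groups
            if h != g and groups[h] < groups[g]]


if __name__ == "__main__":
    for name, group in GROUPS.items():
        print(f"{name:7s} order {len(group)}")
    for h, g in inclusions(GROUPS):
        print(f"{h} < {g}")
```

The sets are concrete, so `inclusions` reports containment only for these particular representatives. Each edge of the lattice (up to conjugacy) happens to be realized by this choice. The run also shows that D7 and C7⋊C6 are not contained in A7, because each contains odd permutations.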

Mean Log-Height E[h(f)|G] -- Figure 5 of paper

Counts at Height h ≤ 4 -- Table 6 (log scale)

Rational vs Irreducible Points by Height -- Table 4

LMFDB vs AIMS-7 Final Counts (non-S7 groups)

Key Statistical Findings
Height-Complexity Trend

Counter-intuitively, larger Galois groups are associated with smaller defining heights. A7 and L(3,2) occur at small heights, while C7 and C7⋊C3 appear only at large heights.
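The sketch below shows one way to compute a per-group mean log-height such as E[h(f)|G] from labeled records. It rests on two assumptions:

- h is taken to be the naive height max|a_i| of the coefficient vector. The paper's height may be defined differently.
- The record layout (label, coefficient list) and the helper names are hypothetical.

```python
import math
from collections import defaultdict


def naive_height(coeffs):
    """ASSUMPTION: naive height = largest absolute coefficient (the paper's h may differ)."""
    return max(abs(c) for c in coeffs)


def mean_log_height(records):
    """records: iterable of (galois_label, coeffs). Returns {label: mean of log height}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, coeffs in records:
        sums[label] += math.log(naive_height(coeffs))
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}
```

Grouping by label before averaging keeps the rare classes (for example, the 252 C7 polynomials) from being swamped by the S7 majority.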

Discriminant Correlation

Large heights go with large discriminants |Delta_f|. In the joint (h, log|Delta_f|) plot, the C7 and C7⋊C3 points cluster in the upper right, far from A7 and L(3,2).
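The discriminant Delta_f also carries group-theoretic information. By a classical fact, for an irreducible f of degree 7 the Galois group lies in A7 exactly when Delta_f is a square in Q. This puts A7, L(3,2), C7⋊C3 and C7 on one side, and S7, C7⋊C6 and D7 on the other.

The sketch below computes the discriminant exactly in pure Python. It takes the Sylvester-matrix resultant of f and f', evaluated with `fractions.Fraction`. The helper names are hypothetical, and this does not describe how AIMS-7inv.csv was produced.

```python
from fractions import Fraction
from math import isqrt


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def sylvester(f, g):
    """Sylvester matrix of f and g (coefficient lists, highest degree first)."""
    m, n = len(f) - 1, len(g) - 1
    size = m + n
    rows = [[0] * i + f + [0] * (size - m - 1 - i) for i in range(n)]
    rows += [[0] * i + g + [0] * (size - n - 1 - i) for i in range(m)]
    return rows


def discriminant(f):
    """Disc(f) = (-1)^(n(n-1)/2) * Res(f, f') / a_n, for f of degree n >= 2."""
    n = len(f) - 1
    df = [(n - k) * a for k, a in enumerate(f[:-1])]  # derivative coefficients
    sign = -1 if (n * (n - 1) // 2) % 2 else 1
    value = sign * det(sylvester(f, df)) / f[0]
    assert value.denominator == 1
    return value.numerator


def is_rational_square(d):
    """True iff the integer d is a perfect square."""
    return d >= 0 and isqrt(d) ** 2 == d
```

For example, `discriminant([1, 0, 0, 0, 0, 0, 0, -2])` (the septic x^7 - 2) is negative, hence not a square. This is consistent with its Galois group C7⋊C6 not lying in A7.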

Cyclic Group Rarity

C7 and C7⋊C3 are completely absent at height h ≤ 4. The first C7 polynomial appears at h = 28; it comes from the cyclotomic field Q(zeta_29), i.e. from the prime p = 29.
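The group Gal(Q(zeta_29)/Q) ≅ (Z/29)^× is cyclic of order 28. Since 7 divides 28, Q(zeta_29) has a unique subfield of degree 7, and that subfield is cyclic over Q.

The sketch below builds a defining septic for this subfield from Gaussian periods, which are sums of 29th roots of unity over the cosets of the order-4 subgroup. It uses 2 as a primitive root mod 29 and floating-point arithmetic rounded to integers. It illustrates the construction only; it is not necessarily the polynomial stored in AIMS-7, and `period_polynomial` is a hypothetical helper.

```python
import cmath


def period_polynomial(p, degree, g):
    """Monic polynomial whose roots are the Gaussian periods of length (p-1)/degree.

    g must be a primitive root mod p. Returns (integer coefficients, highest
    degree first; max rounding error of the floating-point product).
    """
    f = (p - 1) // degree  # length of each period
    zeta = cmath.exp(2j * cmath.pi / p)
    periods = []
    for k in range(degree):
        # Coset g^k * H, where H = <g^degree> is the subgroup of order f.
        exps = [pow(g, k + degree * j, p) for j in range(f)]
        periods.append(sum(zeta ** e for e in exps))
    coeffs = [1 + 0j]
    for eta in periods:  # multiply the running product by (x - eta)
        coeffs = [a - eta * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    rounded = [round(c.real) for c in coeffs]
    error = max(abs(c - r) for c, r in zip(coeffs, rounded))
    return rounded, error


if __name__ == "__main__":
    poly, err = period_polynomial(29, 7, 2)
    print(poly, f"rounding error {err:.1e}")
```

A tiny rounding error confirms that the coefficients are integers to working precision. If h is the naive height, the largest absolute coefficient printed here can be compared against the h = 28 reported above.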

Classification Pipeline -- AIMS-7.csv + AIMS-7inv.2025.csv
AIMS-7.csv (1.69M entries) + AIMS-7inv.csv (j0-j4, Delta_f) -> coefficients a0...a7 -> histogram gradient boosting (60/40 split) -> Galois group label (7 classes)
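The split sizes can be checked against the confusion matrix below. A 40% test share of the 126,396 non-S7 polynomials is 50,558.4, and rounding up gives exactly the 50,559 test polynomials reported in Table 11. This is consistent with a split that rounds the test size up, though the page does not say how the split was drawn.

With classes as imbalanced as these, a stratified split keeps rare groups such as C7 represented in both halves. The sketch below shows a per-class 60/40 split in pure Python. Stratification is an assumption here, and the function name is hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_pct=40, seed=0):
    """Return (train_idx, test_idx), putting test_pct% of each class (rounded up) in test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = (len(idx) * test_pct + 99) // 100  # exact integer ceiling
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)
```

Rounding up per class can make the total test size slightly larger than a single global ceiling would.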

Model Comparison -- Balanced Accuracy

Non-S7 Model: Precision / Recall / F1 (Table 10)

Confusion Matrix -- Non-S7 Test Set, 50,559 polynomials (Table 11)

Rows = true label · Columns = predicted label · Diagonal = correct predictions

Main confusion: 2,616 true A7 polynomials are classified as L(3,2), and 5,923 true L(3,2) polynomials are classified as A7. All other group pairs are well separated.
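The precision, recall, F1 and balanced-accuracy figures can all be read off a confusion matrix laid out like Table 11 (rows = true, columns = predicted). Balanced accuracy is the mean of per-class recall, so the A7/L(3,2) mix-ups lower the score through the recall of those two classes.

The sketch below implements the standard definitions in pure Python, with hypothetical function names. The small matrix in the tests is a toy, not data from the paper.

```python
def per_class_metrics(cm):
    """cm[i][j] = count of true class i predicted as j. Returns (precision, recall, f1) lists."""
    k = len(cm)
    precision, recall, f1 = [], [], []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(k))  # column sum
        actual = sum(cm[c])                           # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return precision, recall, f1


def balanced_accuracy(cm):
    """Mean per-class recall, skipping classes with no true samples."""
    recalls = [cm[c][c] / sum(cm[c]) for c in range(len(cm)) if sum(cm[c])]
    return sum(recalls) / len(recalls)
```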

Feature Importance (Neurosymbolic Model)

5-Fold Cross-Validation Balanced Accuracy

Neurosymbolic Network Architecture (Section 8)
1. Symbolic preprocessing: Delta, j0-j4
2. Feature transform: sgn*log10
3. Concatenation: 15-vector (coefficients + invariants)
4. Dense layers: 128->64, ReLU
5. Self-attention (dynamic)
6. Softmax output: 6 non-S7 classes
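Steps 2 and 3 compress values such as Delta_f, which can span hundreds of orders of magnitude, into a range a dense network can handle. The sketch below shows a minimal version of that preprocessing in pure Python. The exact form of "sgn*log10" is not given here, so sgn(x)·log10(1 + |x|) is an assumption (it is finite at x = 0). The function names are hypothetical.

How the page's 15 entries split between coefficients and invariants is not stated, so the sketch accepts inputs of any length.

```python
import math


def signed_log10(x):
    """ASSUMED form of the 'sgn*log10' step: sgn(x) * log10(1 + |x|)."""
    if x == 0:
        return 0.0
    magnitude = math.log10(1 + abs(x))  # math.log10 accepts arbitrarily large ints
    return magnitude if x > 0 else -magnitude


def feature_vector(coeffs, invariants):
    """Concatenate transformed coefficients and invariants into one flat list."""
    return [signed_log10(v) for v in list(coeffs) + list(invariants)]
```

Working on Python integers before taking the logarithm avoids the overflow a float conversion would hit for very large discriminants.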