Grok on the 100-problem benchmark

Model: grok-4-1-fast-reasoning · each problem was submitted 6 times, with the prompt asking only for the final mathematical answer. Below are the 7 problems on which not all six answers were equivalent. Each analysis was written by an AI and audited by the ExaktAI team.

Summary
93% · All 6 equivalent
3% · 5 of 6 equivalent
1% · 4 of 6 equivalent
3% · 3 of 6 equivalent
0% · 2 of 6 equivalent
0% · All 6 different

5 of 6 equivalent

Problem 17

Find the critical points of \(f \! \left(x , y\right) = x^{3}+y^{2}-3 x\) and classify them.

Run 1
\([[[-1, 0], \text{``saddle point''}], [[1, 0], \text{``local minimum''}]]\)
Run 2
\(\left[\left[-1, 0\right]=\text{``saddle point''}, \left[1, 0\right]=\text{``local minimum''}\right]\)
Run 3
\([[[-1, 0], \text{``saddle point''}], [[1, 0], \text{``local minimum''}]]\)
Run 4
\([[[-1, 0], \text{``saddle point''}], [[1, 0], \text{``local minimum''}]]\)
Run 5
\([[[-1, 0], \text{``saddle point''}], [[1, 0], \text{``local minimum''}]]\)
Run 6
\([[x = -1, y = 0], [x = 1, y = 0]]\)

Analysis: Runs 1 through 5 find both critical points and classify \((-1,0)\) as a saddle point and \((1,0)\) as a local minimum (run 2 pairs each point with its label using \(=\) rather than a nested list), while run 6 lists only the two locations without classifying them.
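The classification follows from the second-derivative test. Setting \(f_x = 3x^2 - 3 = 0\) and \(f_y = 2y = 0\) gives \((\pm 1, 0)\), and the Hessian determinant is \(f_{xx} f_{yy} - f_{xy}^2 = (6x)(2) - 0 = 12x\). The plain-Python sketch below checks this; the helper names `grad` and `classify` are illustrative only.

```python
# Second-derivative test for f(x, y) = x^3 + y^2 - 3x (illustrative helper names).

def grad(x, y):
    """Gradient (f_x, f_y) of f."""
    return (3 * x * x - 3, 2 * y)

def classify(x, y):
    """Classify a critical point using the Hessian [[6x, 0], [0, 2]]."""
    fxx, fyy, fxy = 6 * x, 2, 0
    det = fxx * fyy - fxy * fxy
    if det > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    if det < 0:
        return "saddle point"
    return "test inconclusive"

# f_x = 0 gives x = -1 or x = 1; f_y = 0 gives y = 0.
critical_points = [(-1, 0), (1, 0)]
results = {}
for point in critical_points:
    assert grad(*point) == (0, 0)  # confirm each point really is critical
    results[point] = classify(*point)
    print(point, results[point])
```

At \((-1,0)\) the determinant is \(-12 < 0\) (saddle); at \((1,0)\) it is \(12 > 0\) with \(f_{xx} = 6 > 0\) (local minimum).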

3 of 6 equivalent

Problem 40

For the Fourier-Bessel expansion \(f \! \left(r \right) = -r^{2}+1 = \overset{\infty}{\underset{n =1}{\sum}}c_{n} J_{0}\! \left(\alpha_{n} r \right)\) on \([0, 1]\), where \(\alpha_{n}\) is the n-th positive zero of \(J_{0}\! \left(x \right)\), give the closed form for \(c_{n}\) and the numeric values of \(c_{1}\), \(c_{2}\), \(c_{3}\) to 4 decimal places.

Run 1
\(\left(c_{n} =\frac{8}{\alpha_{n}^{3} J_{1}\! \left(\alpha_{n} \right)}, c_{1} = 1.1079, c_{2} =- 0.1398, c_{3} = 0.0455\right)\)
Run 2
\(\left(c_{n} =\frac{8}{\alpha_{n}^{3} J_{1}\! \left(\alpha_{n} \right)}, c_{1} = 1.1071, c_{2} = 0.1398, c_{3} = 0.0455\right)\)
Run 3
\(\left(\frac{8}{\alpha_{n}^{3} J_{1}\! \left(\alpha_{n} \right)}, 1.1071, - 0.1398, 0.0455\right)\)
Run 4
\(\left(c_{n} =\frac{8}{\alpha_{n}^{3} J_{1}\! \left(\alpha_{n}\right)}, c_{1}= 1.1083, c_{2}=- 0.1398, c_{3}= 0.0455\right)\)
Run 5
\(\left(c_{n} =\frac{8}{\alpha_{n}^{3} J_{1}\! \left(\alpha_{n} \right)}, c_{1} = 1.1070, c_{2} = 0.1398, c_{3} = 0.0455\right)\)
Run 6
\(\left(c_{n} =\frac{8}{\alpha_{n}^{3} J_{1}\! \left(\alpha_{n} \right)}, c_{1} = 1.1080, c_{2} = 0.1400, c_{3} = 0.0455\right)\)

Analysis: All six runs give the same closed form \(8/(\alpha_n^3 J_1(\alpha_n))\) and the same \(c_3 = 0.0455\). They split 3-vs-3 on the sign of \(c_2\): runs 1, 3, and 4 give \(c_2 = -0.1398\); runs 2, 5, and 6 give \(c_2\) with the opposite sign (\(+0.1398\) in runs 2 and 5, \(+0.1400\) in run 6). Because \(J_1(\alpha_2) < 0\), the shared closed form itself makes \(c_2\) negative. Values of \(c_1\) also vary across runs in the third and fourth decimal places (range \(1.1070\)–\(1.1083\)).
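To check the numbers, the closed form can be evaluated directly and compared with the defining integral \(c_n = \frac{2}{J_1(\alpha_n)^2}\int_0^1 r(1-r^2)\,J_0(\alpha_n r)\,dr\). The sketch below assumes only the Python standard library, so it uses its own power-series `bessel_j`, a Newton iteration for the zeros of \(J_0\) (using \(J_0' = -J_1\)), and composite Simpson quadrature; all three helpers are illustrative, not a library API. Evaluated this way, the closed form gives \(c_1 \approx 1.1080\), \(c_2 \approx -0.1398\), \(c_3 \approx 0.0455\).

```python
import math

def bessel_j(nu, x, terms=60):
    """J_nu(x) for integer nu >= 0 from its power series (adequate for 0 <= x < ~10)."""
    half = x / 2.0
    term = half ** nu / math.factorial(nu)  # k = 0 term
    total = term
    for k in range(terms):
        # ratio of consecutive terms: -(x/2)^2 / ((k + 1)(k + 1 + nu))
        term *= -(half * half) / ((k + 1) * (k + 1 + nu))
        total += term
    return total

def j0_zero(n):
    """n-th positive zero of J0 by Newton's method, using J0'(x) = -J1(x)."""
    x = (n - 0.25) * math.pi  # asymptotic starting guess
    for _ in range(50):
        step = bessel_j(0, x) / bessel_j(1, x)
        x += step  # Newton: x - J0 / J0' = x + J0 / J1
        if abs(step) < 1e-14:
            break
    return x

def c_closed_form(n):
    a = j0_zero(n)
    return 8.0 / (a ** 3 * bessel_j(1, a))

def c_quadrature(n, intervals=2000):
    """2 / J1(a)^2 * integral_0^1 r (1 - r^2) J0(a r) dr by composite Simpson."""
    a = j0_zero(n)
    h = 1.0 / intervals
    acc = 0.0
    for i in range(intervals + 1):
        r = i * h
        weight = 1 if i in (0, intervals) else (4 if i % 2 else 2)
        acc += weight * r * (1 - r * r) * bessel_j(0, a * r)
    return 2.0 * (acc * h / 3.0) / bessel_j(1, a) ** 2

for n in (1, 2, 3):
    print(f"c_{n}: closed form {c_closed_form(n):+.4f}, quadrature {c_quadrature(n):+.4f}")
```

The quadrature column is an independent check on the closed form, and the sign of \(c_n\) alternates with the sign of \(J_1(\alpha_n)\).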

5 of 6 equivalent

Problem 53

Compute a lexicographic Gröbner basis (with \(y <x\)) of the ideal \(\langle x^{2}+y^{2}-4, x y -1\rangle\) and use it to find all complex solutions of the system.

Run 1
\([y^{3}+x -4 y, y^{4}-4 y^{2}+1]\)
Run 2
\([y^{3}+x -4 y, y^{4}-4 y^{2}+1]\)
Run 3
\([y^{4}-4 y^{2}+1, y^{3}+x -4 y]\)
Run 4
\([y^{3}+x -4 y, y^{4}-4 y^{2}+1]\)
Run 5
\([y^{3}+x -4 y, y^{4}-4 y^{2}+1]\)
Run 6
\([-y^{3}+x +4 y, y^{4}-4 y^{2}+1]\)

Analysis: Runs 1 through 5 give the basis \([x + y^3 - 4y,\ y^4 - 4y^2 + 1]\) (run 3 lists the two polynomials in the opposite order), while run 6 gives \([x - y^3 + 4y,\ y^4 - 4y^2 + 1]\), which flips the signs of the \(y^3\) and \(y\) terms in the first polynomial. No run lists the complex solutions that the problem also asks for.
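The basis makes the second half of the problem mechanical: \(y^4 - 4y^2 + 1 = 0\) gives \(y^2 = 2 \pm \sqrt{3}\), and \(x + y^3 - 4y = 0\) back-substitutes \(x = 4y - y^3\) (equivalently \(x = 1/y\)). The standard-library sketch below, with illustrative names, builds the four solutions this way and checks them against the original equations. It also shows that run 6's first generator would instead give \(x = y^3 - 4y\), which still satisfies \(x^2 + y^2 = 4\) but makes \(xy = -1\).

```python
import math

def residuals(x, y):
    """Residuals of the original system: x^2 + y^2 - 4 and xy - 1."""
    return (x * x + y * y - 4, x * y - 1)

# y^4 - 4y^2 + 1 = 0  =>  y^2 = 2 +/- sqrt(3); both values are positive,
# so all four solutions of this system happen to be real.
ys = [sign * math.sqrt(2 + branch * math.sqrt(3)) for sign in (1, -1) for branch in (1, -1)]

# Back-substitute with the basis element x + y^3 - 4y = 0.
solutions = [(4 * y - y ** 3, y) for y in ys]
for x, y in solutions:
    print(f"x = {x:+.6f}, y = {y:+.6f}, residuals = {residuals(x, y)}")

# Run 6's first polynomial x - y^3 + 4y = 0 would give x = y^3 - 4y instead.
run6_points = [(y ** 3 - 4 * y, y) for y in ys]
for x, y in run6_points:
    print(f"run 6: x = {x:+.6f}, y = {y:+.6f}, residuals = {residuals(x, y)}")
```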

4 of 6 equivalent

Problem 55

Find the discriminant of \(f \! \left(x \right) = x^{3}+p x +q\) in terms of \(p\) and \(q\), and state the condition on \([p, q]\) for \(f\) to have a repeated root.

Run 1
\(-4 p^{3}-27 q^{2}\)
Run 2
\(-4 p^{3}-27 q^{2}\)
Run 3
\(-4 p^{3}-27 q^{2}\)
Run 4
\(-4 p^{3}-27 q^{2} = 0\)
Run 5
\(-4 p^{3}-27 q^{2} = 0\)
Run 6
\(-4 p^{3}-27 q^{2}\)

Analysis: Runs 1, 2, 3, and 6 give only the discriminant expression \(-4p^3 - 27q^2\), while runs 4 and 5 give the repeated-root condition \(-4p^3 - 27q^2 = 0\).
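For a monic cubic with roots \(r_1, r_2, r_3\), the discriminant equals \(\prod_{i<j}(r_i - r_j)^2\), so it vanishes exactly when two roots coincide. The plain-Python sketch below, with hypothetical helper names, builds \(x^3 + px + q = (x-a)(x-b)(x+a+b)\) from chosen integer roots (the \(x^2\) coefficient is zero because the roots sum to zero), which gives \(p = -(a^2 + ab + b^2)\) and \(q = ab(a+b)\). It then compares \(-4p^3 - 27q^2\) with the root product. It only exercises cubics with three real roots.

```python
from itertools import combinations

def disc_formula(p, q):
    """Discriminant of x^3 + p x + q."""
    return -4 * p ** 3 - 27 * q ** 2

def disc_from_roots(roots):
    """Product of squared root differences for a monic polynomial."""
    d = 1
    for r, s in combinations(roots, 2):
        d *= (r - s) ** 2
    return d

def depressed_cubic(a, b):
    """(p, q) such that (x - a)(x - b)(x + a + b) = x^3 + p x + q."""
    return -(a * a + a * b + b * b), a * b * (a + b)

checks = []
for a, b in [(1, 1), (0, 3), (2, -5), (1, 2)]:
    p, q = depressed_cubic(a, b)
    roots = (a, b, -(a + b))
    d = disc_formula(p, q)
    checks.append((roots, d, disc_from_roots(roots)))
    print(f"roots {roots}: p = {p}, q = {q}, discriminant = {d}, repeated root: {d == 0}")
```

Only the first case, \(x^3 - 3x + 2 = (x-1)^2(x+2)\), has a repeated root, and it is the only one with a zero discriminant.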

3 of 6 equivalent

Problem 60

Express \(4 \sin \! \left(x \right) \sin \! \left(2 x \right) \sin \! \left(4 x \right)\) as a sum of cosines.

Run 1
\(\cos \! \left(\frac{\pi}{2}-3 x \right)-\cos \! \left(\frac{\pi}{2}-x \right)+\cos \! \left(\frac{\pi}{2}-5 x \right)-\cos \! \left(\frac{\pi}{2}-7 x \right)\)
Run 2
\(\cos \! \left(x \right)+\cos \! \left(3 x \right)+\cos \! \left(5 x \right)+\cos \! \left(7 x \right)\)
Run 3
\(-\sin \! \left(x \right)+\sin \! \left(3 x \right)+\sin \! \left(5 x \right)-\sin \! \left(7 x \right)\)
Run 4
\(\cos \! \left(x \right)+\cos \! \left(3 x \right)+\cos \! \left(5 x \right)+\cos \! \left(7 x \right)\)
Run 5
\(\cos \! \left(x \right)+\cos \! \left(3 x \right)+\cos \! \left(5 x \right)+\cos \! \left(7 x \right)\)
Run 6
\(\cos \! \left(3 x -\frac{\pi}{2}\right)-\cos \! \left(x -\frac{\pi}{2}\right)-\cos \! \left(7 x -\frac{\pi}{2}\right)+\cos \! \left(5 x -\frac{\pi}{2}\right)\)

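Analysis: Runs 1, 3, and 6 all equal \(-\sin(x)+\sin(3x)+\sin(5x)-\sin(7x)\) (runs 1 and 6 after applying \(\cos(\pi/2-\theta)=\cos(\theta-\pi/2)=\sin\theta\)). Runs 2, 4, and 5 give \(\cos(x)+\cos(3x)+\cos(5x)+\cos(7x)\), which does not equal the original product: at \(x = 0\) it equals \(4\), whereas the product is \(0\). Since the product is an odd function of \(x\), a sum of cosines can represent it only with phase shifts, as in runs 1 and 6.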
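Product-to-sum identities give the sine form directly: \(2\sin x\sin 2x = \cos x - \cos 3x\), then \(2\sin 4x\cos x = \sin 5x + \sin 3x\) and \(2\sin 4x\cos 3x = \sin 7x + \sin x\), so \(4\sin x\sin 2x\sin 4x = -\sin x + \sin 3x + \sin 5x - \sin 7x\). A numerical spot check at a few arbitrarily chosen sample points, sketched below in plain Python, separates the two groups of answers.

```python
import math

HALF_PI = math.pi / 2

def product(x):
    return 4 * math.sin(x) * math.sin(2 * x) * math.sin(4 * x)

# Candidate answers, transcribed from the runs above.
candidates = {
    "run 1": lambda x: (math.cos(HALF_PI - 3 * x) - math.cos(HALF_PI - x)
                        + math.cos(HALF_PI - 5 * x) - math.cos(HALF_PI - 7 * x)),
    "run 3": lambda x: -math.sin(x) + math.sin(3 * x) + math.sin(5 * x) - math.sin(7 * x),
    "run 6": lambda x: (math.cos(3 * x - HALF_PI) - math.cos(x - HALF_PI)
                        - math.cos(7 * x - HALF_PI) + math.cos(5 * x - HALF_PI)),
    "runs 2, 4, 5": lambda x: math.cos(x) + math.cos(3 * x) + math.cos(5 * x) + math.cos(7 * x),
}

samples = [0.0, 0.3, 1.1, 2.7, -0.8]
max_error = {}
for name, g in candidates.items():
    max_error[name] = max(abs(g(x) - product(x)) for x in samples)
    print(f"{name}: max |difference| over samples = {max_error[name]:.2e}")
```

A spot check like this can only refute an identity, not prove it; the product-to-sum derivation is what establishes the sine form.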

5 of 6 equivalent

Problem 76

Find the two-tailed p-value for \(t = 2.5\) with \(\mathit{df} = 20\) using the Student t distribution.

Run 1
\( 0.0207852996\)
Run 2
\( 0.0209337562\)
Run 3
\( 0.0205904256\)
Run 4
\( 0.0215221933224\)
Run 5
\( 0.0207059365\)
Run 6
\( 0.02097754957\)

Analysis: Five of the six runs (1, 2, 3, 5, 6) give numerical values that cluster between \(0.0206\) and \(0.0210\), a spread of about \(0.0004\). Run 4 alone gives \(0.0215\), about \(0.0005\) above the top of that cluster.
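For an even number of degrees of freedom \(\nu\), the two-tailed p-value has a finite closed form. With \(\theta = \arctan(|t|/\sqrt{\nu})\), \(P(|T| < |t|) = \sin\theta\,\bigl(1 + \tfrac{1}{2}\cos^2\theta + \tfrac{1\cdot 3}{2\cdot 4}\cos^4\theta + \dots + \tfrac{1\cdot 3\cdots(\nu-3)}{2\cdot 4\cdots(\nu-2)}\cos^{\nu-2}\theta\bigr)\). The sketch below implements that series in plain Python and cross-checks it against Simpson integration of the t density; the function names are illustrative. For \(t = 2.5\), \(\nu = 20\) both give \(p \approx 0.02123\). That value is above the five clustered runs and below run 4, so none of the six runs agrees with it to three significant figures.

```python
import math

def two_tailed_p_even_df(t, df):
    """P(|T| >= |t|) for Student's t with an even number of degrees of freedom df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    cos2 = df / (df + t * t)  # cos^2(theta), theta = atan(|t| / sqrt(df))
    sin_theta = abs(t) / math.sqrt(df + t * t)
    term = total = 1.0
    for k in range(1, df // 2):
        term *= (2 * k - 1) / (2 * k) * cos2
        total += term
    return 1.0 - sin_theta * total

def two_tailed_p_simpson(t, df, intervals=2000):
    """Same quantity as 1 - 2 * integral_0^|t| pdf, via composite Simpson (any df > 0)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / intervals
    acc = pdf(0.0) + pdf(abs(t))
    for i in range(1, intervals):
        acc += (4 if i % 2 else 2) * pdf(i * h)
    return 1.0 - 2.0 * acc * h / 3.0

p_closed = two_tailed_p_even_df(2.5, 20)
p_numeric = two_tailed_p_simpson(2.5, 20)
print(f"closed form: {p_closed:.10f}")
print(f"Simpson:     {p_numeric:.10f}")
```

The closed form avoids numerical integration entirely but only covers even \(\nu\); the Simpson version works for any \(\nu > 0\).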

3 of 6 equivalent

Problem 99

Derive the minimal polynomial of \(\cos \! \left(\frac{2 \pi}{9}\right)\) over \(Q\) and give its numerical value to 6 decimal places.

Run 1
\( 0.766044\)
Run 2
\( 0.766044\)
Run 3
\(\left(8 x^{3}-6 x +1, 0.766044\right)\)
Run 4
\(8 x^{3}-6 x +1\)
Run 5
\(\left(8 x^{3}-6 x +1, 0.766044\right)\)
Run 6
\( 0.766044\)

Analysis: The runs give three distinct kinds of answer: runs 1, 2, and 6 give only the numerical value \(0.766044\); runs 3 and 5 give the tuple \((8x^3 - 6x + 1,\ 0.766044)\); run 4 gives only the polynomial \(8x^3 - 6x + 1\). Only runs 3 and 5 supply both requested parts.
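The derivation rests on the triple-angle identity \(\cos 3\theta = 4\cos^3\theta - 3\cos\theta\). With \(\theta = 2\pi/9\), \(\cos 3\theta = \cos(2\pi/3) = -1/2\), so \(c = \cos(2\pi/9)\) satisfies \(8c^3 - 6c + 1 = 0\). A cubic over \(\mathbb{Q}\) is irreducible exactly when it has no rational root, and by the rational root theorem the only candidates are \(\pm 1, \pm\tfrac12, \pm\tfrac14, \pm\tfrac18\). The plain-Python sketch below checks both facts, using `fractions.Fraction` for the exact part.

```python
import math
from fractions import Fraction

def poly(x):
    """8x^3 - 6x + 1, the cubic obtained from cos(3*theta) = -1/2."""
    return 8 * x ** 3 - 6 * x + 1

c = math.cos(2 * math.pi / 9)
print(f"cos(2*pi/9) = {c:.6f}, residual = {poly(c):.1e}")

# Rational root theorem: any rational root p/q in lowest terms has p | 1 and q | 8.
candidates = [Fraction(sign, q) for sign in (1, -1) for q in (1, 2, 4, 8)]
rational_roots = [r for r in candidates if poly(r) == 0]
print("rational roots:", rational_roots)  # empty, so the cubic is irreducible over Q
```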