Claude on the 100-problem benchmark

Model: claude-sonnet-4-6 · each problem was submitted 6 times, with a request for only the final mathematical answer. Below are the 12 problems where at least one run differed. Each analysis is written by an AI and audited by the ExaktAI team.

Summary
All 6 equivalent: 88%
5 of 6 equivalent: 5%
4 of 6 equivalent: 4%
3 of 6 equivalent: 2%
2 of 6 equivalent: 1%
All 6 different: 0%

5 of 6 equivalent

Problem 2

Evaluate \(\underset{x \rightarrow 0}{\mathrm{lim}}\frac{\sin \! \left(x \right)-x +\frac{x^{3}}{6}}{x^{5}}\).

Run 1
\(-{\frac{1}{120}}\)
Run 2
\(-{\frac{1}{120}}\)
Run 3
\(-{\frac{1}{120}}\)
Run 4
\(-{\frac{1}{120}}\)
Run 5
\(-{\frac{1}{120}}\)
Run 6
\({\frac{1}{120}}\)

Analysis: Run 1's answer \(-1/120\) is the negative of run 6's answer \(1/120\), so they differ in sign.
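
The sign can be checked against the Maclaurin series \(\sin x = x - \frac{x^{3}}{6} + \frac{x^{5}}{120} - \dots\). The following is a minimal sketch in exact rational arithmetic; the helper name `sin_series_coeff` is made up for illustration. Once the \(x\) and \(x^{3}\) terms cancel, the limit is the coefficient of \(x^{5}\) in the numerator.

```python
from fractions import Fraction
from math import factorial

def sin_series_coeff(k: int) -> Fraction:
    """Coefficient of x**k in the Maclaurin series of sin(x)."""
    if k % 2 == 0:
        return Fraction(0)
    return Fraction((-1) ** ((k - 1) // 2), factorial(k))

# Numerator sin(x) - x + x**3/6: add the coefficients of the two extra terms.
extra = {1: Fraction(-1), 3: Fraction(1, 6)}
numerator = {k: sin_series_coeff(k) + extra.get(k, Fraction(0)) for k in range(8)}

# Every term below x**5 cancels, so the limit is the x**5 coefficient.
assert all(numerator[k] == 0 for k in range(5))
limit = numerator[5]
print(limit)  # 1/120
```

Under this expansion the sketch yields \(\frac{1}{120}\), the value reported by run 6.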

3 of 6 equivalent

Problem 7

For \(f \! \left(x \right) = {\mathrm e}^{x}\) and Taylor expansion centered at 0, find the smallest \(n\) such that the Lagrange remainder satisfies \({| R_{n} \! \left(x \right)|}< 1.0\times 10^{-6}\) for all \(x\) in \([0, 0.5]\).

Run 1
\(7\)
Run 2
\(9\)
Run 3
\(7\)
Run 4
\(9\)
Run 5
\(9\)
Run 6
\(7\)

Analysis: The runs split evenly on the integer value of \(n\): runs 1, 3, and 6 give \(7\), while runs 2, 4, and 5 give \(9\), representing different choices for the smallest Taylor expansion order satisfying the remainder bound.
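
One way to compare the two values is to evaluate the worst-case Lagrange bound directly. The sketch below assumes that \(n\) is the degree of the Taylor polynomial and that the remainder is bounded by \(e^{0.5}\cdot 0.5^{\,n+1}/(n+1)!\) on \([0, 0.5]\); the function names are illustrative, and a different convention for \(n\) or a cruder bound on \(e^{c}\) could shift the result.

```python
from math import exp, factorial

def remainder_bound(n: int, b: float = 0.5) -> float:
    """Worst-case Lagrange bound e**b * b**(n+1) / (n+1)! for e**x on [0, b]."""
    return exp(b) * b ** (n + 1) / factorial(n + 1)

def smallest_order(tol: float = 1e-6, b: float = 0.5) -> int:
    """Smallest n whose worst-case bound is strictly below tol."""
    n = 0
    while remainder_bound(n, b) >= tol:
        n += 1
    return n

print(smallest_order())  # 7
```

Under these assumptions the bound is about \(2.6\times 10^{-6}\) for \(n = 6\) and about \(1.6\times 10^{-7}\) for \(n = 7\).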

4 of 6 equivalent

Problem 8

Find the envelope of the one-parameter family of lines \(y = m x +\frac{1}{m}\) for \(0<m\), and identify the resulting curve.

Run 1
\(y^{2} = 4 x\)
Run 2
\(y = 2 \sqrt{x}\)
Run 3
\(y^{2} = 4 x\)
Run 4
\(y^{2} = 4 x\)
Run 5
\(y^{2} = 4 x\)
Run 6
\(y = 2 \sqrt{x}\)

Analysis: The answer by run 1 is the full parabola \(y^2 = 4x\) (both branches \(y = \pm 2\sqrt{x}\)), whereas run 2 gives only the upper branch \(y = 2\sqrt{x}\).
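
The envelope follows from eliminating \(m\) between \(F = y - m x - \frac{1}{m} = 0\) and \(\partial F/\partial m = -x + \frac{1}{m^{2}} = 0\), which gives the contact point \((1/m^{2},\, 2/m)\). A short sketch (function name illustrative) checks that these points satisfy \(y^{2} = 4x\) and that, under the stated restriction \(0 < m\), they all have \(y > 0\).

```python
def envelope_point(m: float) -> tuple[float, float]:
    """Point where the line y = m*x + 1/m touches the envelope (requires m > 0)."""
    x = 1.0 / m**2          # from dF/dm = -x + 1/m**2 = 0
    y = m * x + 1.0 / m     # substitute back into the line
    return x, y

for m in (0.25, 0.5, 1.0, 2.0, 4.0):
    x, y = envelope_point(m)
    assert abs(y * y - 4.0 * x) < 1e-12   # lies on y**2 = 4x
    assert y > 0                          # and on the upper branch
```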

5 of 6 equivalent

Problem 22

Transform to cylindrical coordinates and compute \({\textcolor{gray}{\int}}_{\!\!\!0}^{2 \pi}{\textcolor{gray}{\int}}_{\!\!\!0}^{3}{\textcolor{gray}{\int}}_{\!\!\!0}^{4}r \textcolor{gray}{d}z \textcolor{gray}{d}r \textcolor{gray}{d}\theta\), where \(E\) is the solid bounded by \(z = 0\), \(z = 4\), and \(r = 3\).

Run 1
\(144 \pi\)
Run 2
\(144 \pi\)
Run 3
\(144 \pi\)
Run 4
\(144 \pi\)
Run 5
\(144 \pi\)
Run 6
\(108 \pi\)

Analysis: Run 1's answer \(144\pi\) and run 6's answer \(108\pi\) are distinct numerical values; their ratio is \(4/3\).

4 of 6 equivalent

Problem 24

Find the volume of the solid bounded above by \(z = -x^{2}-y^{2}+4\) and below by \(z = x^{2}+y^{2}\).

Run 1
\(\frac{8 \pi}{3}\)
Run 2
\(2 \pi\)
Run 3
\(4 \pi\)
Run 4
\(4 \pi\)
Run 5
\(4 \pi\)
Run 6
\(4 \pi\)

Analysis: The three answers are distinct numeric values: run 1 gives \(\tfrac{8\pi}{3}\), run 2 gives \(2\pi\), and run 3 gives \(4\pi\).
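
In cylindrical coordinates the two surfaces meet where \(r^{2} = 4 - r^{2}\), that is \(r = \sqrt{2}\), so the volume is \(2\pi\int_{0}^{\sqrt{2}} (4 - 2r^{2})\, r\, dr\). The sketch below estimates this integral with a midpoint rule; the function name and the step count are illustrative choices.

```python
from math import pi, sqrt

def volume_between_paraboloids(steps: int = 100_000) -> float:
    """Midpoint-rule estimate of 2*pi * integral_0^sqrt(2) (4 - 2 r**2) r dr."""
    r_max = sqrt(2.0)              # surfaces meet where r**2 = 4 - r**2
    h = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        total += (4.0 - 2.0 * r * r) * r * h   # height times the Jacobian r
    return 2.0 * pi * total

print(volume_between_paraboloids() / pi)  # close to 4
```

The estimate agrees with the antiderivative \(2r^{2} - \frac{r^{4}}{2}\) evaluated at \(r = \sqrt{2}\), which gives \(2\pi \cdot 2 = 4\pi\).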

2 of 6 equivalent

Problem 25

Find the determinant of the 4x4 matrix \(A = \left[\begin{array}{cccc} 2 & 1 & 0 & 1 \\ 1 & 3 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 1 & 0 & 1 & 3 \end{array}\right]\).

Run 1
\(20\)
Run 2
\(20\)
Run 3
\(21\)
Run 4
\(28\)
Run 5
\(17\)
Run 6
\(21\)

Analysis: The answers give four distinct numerical values for the determinant: run 1 reports \(20\), run 4 reports \(28\), run 5 reports \(17\), and run 3 reports \(21\).
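
With four different values reported, an exact recomputation is a useful reference point. The following sketch evaluates the determinant of the matrix as printed, using Gaussian elimination over rational numbers so that no rounding is involved; the function name is illustrative.

```python
from fractions import Fraction

def determinant(rows):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in rows]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero pivot in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

A = [[2, 1, 0, 1],
     [1, 3, 1, 0],
     [0, 1, 2, 1],
     [1, 0, 1, 3]]
print(determinant(A))  # 12
```

For the matrix as printed this yields \(12\) (cofactor expansion along the first row gives \(2\cdot 12 - 6 - 6\)), which matches none of the six reported values.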

5 of 6 equivalent

Problem 29

Find one valid singular value decomposition of the matrix \(\left[\begin{array}{cc} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{array}\right]\).

Run 1
\([\left[\begin{array}{ccc} 0.22985 & - 0.88346 & 0.40825 \\ 0.52474 & - 0.24264 & - 0.81650 \\ 0.81964 & 0.40213 & 0.40825 \end{array}\right], \left[\begin{array}{cc} 9.52552 & 0 \\ 0 & 0.51430 \\ 0 & 0 \end{array}\right], \left[\begin{array}{cc} 0.61961 & 0.78489 \\ 0.78489 & - 0.61961 \end{array}\right]]\)
Run 2
\([\left[\begin{array}{ccc} 0.2298 & - 0.8835 & 0.4082 \\ 0.5247 & - 0.2408 & - 0.8165 \\ 0.8196 & 0.4019 & 0.4082 \end{array}\right], \left[\begin{array}{cc} 9.5255 & 0 \\ 0 & 0.5143 \\ 0 & 0 \end{array}\right], \left[\begin{array}{cc} 0.6196 & 0.7849 \\ 0.7849 & - 0.6196 \end{array}\right]]\)
Run 3
\(\operatorname{Matrix}\!\left(\left[\left[\frac{-2 \sqrt{7-\sqrt{8185}/13}}{\sqrt{91-\sqrt{8185}}}, \frac{-2 \sqrt{7+\sqrt{8185}/13}}{\sqrt{91+\sqrt{8185}}}, \frac{1}{\sqrt{6}}\right], \left[\frac{-\sqrt{7-\sqrt{8185}/13}-\sqrt{91-\sqrt{8185}}/\sqrt{13}}{\sqrt{91-\sqrt{8185}}}, \frac{-\sqrt{7+\sqrt{8185}/13}-\sqrt{91+\sqrt{8185}}/\sqrt{13}}{\sqrt{91+\sqrt{8185}}}, -\frac{2}{\sqrt{6}}\right], \left[\frac{\sqrt{91-\sqrt{8185}}\,\sqrt{2/13}-\sqrt{7-\sqrt{8185}/13}}{\sqrt{91-\sqrt{8185}}}, \frac{\sqrt{91+\sqrt{8185}}\,\sqrt{2/13}-\sqrt{7+\sqrt{8185}/13}}{\sqrt{91+\sqrt{8185}}}, \frac{1}{\sqrt{6}}\right]\right]\right)\operatorname{Matrix}\!\left(\left[\left[\sqrt{\frac{91+\sqrt{8185}}{2}}, 0\right], \left[0, \sqrt{\frac{91-\sqrt{8185}}{2}}\right], \left[0, 0\right]\right]\right)\operatorname{Matrix}\!\left(\left[\left[\frac{-13-\sqrt{8185}}{\sqrt{170+2 \sqrt{8185}}}, -\frac{\sqrt{2}}{\sqrt{1+\left(13+\sqrt{8185}\right)^{2}/84}\sqrt{84}}\left(13+\sqrt{8185}\right)\right], \left[\frac{\sqrt{84}}{\sqrt{170+2 \sqrt{8185}}}, -\frac{\sqrt{2}}{\sqrt{1+\left(13+\sqrt{8185}\right)^{2}/84}}\right]\right]\right)\)
Run 4
\(\begin{pmatrix} \begin{array}{cc} -\frac{-17+\sqrt{8185}}{6 \sqrt{1+\frac{\left(-17+\sqrt{8185}\right)^{2}}{36}}} & -\frac{-17-\sqrt{8185}}{6 \sqrt{1+\frac{\left(-17-\sqrt{8185}\right)^{2}}{36}}} \\ \frac{-1+\sqrt{8185}}{12 \sqrt{1+\frac{\left(-17+\sqrt{8185}\right)^{2}}{36}}} & \frac{-1-\sqrt{8185}}{12 \sqrt{1+\frac{\left(-17-\sqrt{8185}\right)^{2}}{36}}} \\ \frac{1}{\sqrt{1+\frac{\left(-17+\sqrt{8185}\right)^{2}}{36}}} & \frac{1}{\sqrt{1+\frac{\left(-17-\sqrt{8185}\right)^{2}}{36}}} \end{array} \\ \begin{array}{cc} \sqrt{\frac{91}{2}+\frac{\sqrt{8185}}{2}} & 0 \\ 0 & \sqrt{\frac{91}{2}-\frac{\sqrt{8185}}{2}} \end{array} \\ \begin{array}{cc} \frac{21+\sqrt{8185}}{\sqrt{\left(21+\sqrt{8185}\right)^{2}+1936}} & \frac{44}{\sqrt{\left(21+\sqrt{8185}\right)^{2}+1936}} \\ \frac{21-\sqrt{8185}}{\sqrt{\left(21-\sqrt{8185}\right)^{2}+1936}} & \frac{44}{\sqrt{\left(21-\sqrt{8185}\right)^{2}+1936}} \end{array} \end{pmatrix}\)
Run 5
\([\left[\begin{array}{ccc} 0.2298 & 0.8835 & - 0.4082 \\ 0.5247 & 0.2407 & 0.8165 \\ 0.8196 & - 0.4022 & - 0.4082 \end{array}\right], \left[\begin{array}{cc} 9.5255 & 0 \\ 0 & 0.5143 \\ 0 & 0 \end{array}\right], \left[\begin{array}{cc} 0.6196 & - 0.7849 \\ 0.7849 & 0.6196 \end{array}\right]]\)
Run 6
\([\left[\begin{array}{ccc} - 0.2298476964 & 0.8834610176 & 0.4082482905 \\ - 0.5247448074 & 0.2407824921 & - 0.8164965809 \\ - 0.8196419184 & - 0.4018960334 & 0.4082482905 \end{array}\right], \left[\begin{array}{cc} 9.525518091 & 0 \\ 0 & 0.5143005806 \\ 0 & 0 \end{array}\right], \left[\begin{array}{cc} - 0.6196294838 & - 0.7848944533 \\ - 0.7848944533 & 0.6196294838 \end{array}\right]]\)

Analysis: The first group all give valid SVD triples \((U, \Sigma, V)\) with column-orthonormal \(U\) and \(V\) and singular values \(\sqrt{(91\pm\sqrt{8185})/2}\), differing only by sign flips of corresponding columns of \(U\) and \(V\); the second group has \(U\) columns that are not unit vectors under the stated normalization, so the triple is a different decomposition.
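
The validity criterion used in the analysis can be written as a direct check: \(U\) and \(V\) must have orthonormal columns and \(U \Sigma V^{T}\) must reproduce the matrix. The sketch below applies that check to run 6's triple, copied from the listing above; the helper names and the tolerance of \(10^{-6}\) are assumptions made for this example.

```python
def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def max_abs_diff(p, q):
    return max(abs(x - y) for rp, rq in zip(p, q) for x, y in zip(rp, rq))

def is_valid_svd(a, u, s, v, tol=1e-6):
    """True if U and V have orthonormal columns and U @ S @ V^T reproduces A."""
    return (max_abs_diff(matmul(transpose(u), u), identity(len(u[0]))) < tol
            and max_abs_diff(matmul(transpose(v), v), identity(len(v[0]))) < tol
            and max_abs_diff(matmul(matmul(u, s), transpose(v)), a) < tol)

A = [[1, 2], [3, 4], [5, 6]]
# Run 6's triple (U, Sigma, V), copied from the listing.
U = [[-0.2298476964, 0.8834610176, 0.4082482905],
     [-0.5247448074, 0.2407824921, -0.8164965809],
     [-0.8196419184, -0.4018960334, 0.4082482905]]
S = [[9.525518091, 0.0], [0.0, 0.5143005806], [0.0, 0.0]]
V = [[-0.6196294838, -0.7848944533],
     [-0.7848944533, 0.6196294838]]
print(is_valid_svd(A, U, S, V))
```

Flipping the sign of a column of `U` together with the matching column of `V` leaves the product unchanged, which is why triples that differ only in such signs count as equivalent.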

5 of 6 equivalent

Problem 38

Solve \(\frac{d^{2}}{d x^{2}}y \! \left(x \right) = \frac{x^{2} \left(\frac{d}{d x}y \! \left(x \right)\right)^{2}-2 x y \! \left(x \right) \left(\frac{d}{d x}y \! \left(x \right)\right)+y \! \left(x \right)^{2}}{x^{2}}\).

Run 1
\(y \! \left(x \right) = x \left(C_{2} -{\mathrm e}^{-C_{1}} \mathrm{Ei}\! \left(C_{1} -\ln \! \left(x \right)\right)\right)\)
Run 2
\(y \! \left(x \right) = x \left(C_{2} -{\mathrm e}^{-C_{1}} \mathrm{Ei}\! \left(C_{1} -\ln \! \left(x \right)\right)\right)\)
Run 3
\(y \! \left(x \right) = x \left(C_{2} -{\mathrm e}^{-C_{1}} \mathrm{Ei}\! \left(C_{1} -\ln \! \left(x \right)\right)\right)\)
Run 4
\(y \! \left(x \right) = x \left(C_{2} -{\mathrm e}^{-C_{1}} \mathrm{Ei}\! \left(C_{1} -\ln \! \left(x \right)\right)\right)\)
Run 5
\(y \! \left(x \right) = C_{2} x -x \,{\mathrm e}^{-C_{1}} \mathrm{Ei}\! \left(C_{1} -\ln \! \left(x \right)\right)\)
Run 6
\(y \! \left(x \right) = x \tan \! \left(\ln \! \left(x \right) c_{1} +c_{2} \right)\)

Analysis: The answer by run 1 is given in terms of the exponential integral \(\mathrm{Ei}(C_1-\ln x)\), while the answer by run 6 is given as \(x\tan(c_1\ln x+c_2)\), and no reparametrization of the constants makes these two functional forms coincide.
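
The two forms can also be tested against the equation itself, whose right-hand side equals \((x y' - y)^{2}/x^{2}\). The sketch below evaluates the residual \(y'' - (x y' - y)^{2}/x^{2}\) with hand-derived derivatives. It assumes \(\mathrm{Ei}'(z) = e^{z}/z\), so that for \(y = x\,u\) with \(u = C_{2} - e^{-C_{1}}\mathrm{Ei}(C_{1} - \ln x)\) the derivative is \(w = u' = 1/(x^{2}(C_{1} - \ln x))\). The function names and the sample constants are illustrative.

```python
from math import log, tan, cos

def residual_tan_form(x, c1, c2):
    """Residual y'' - (x*y' - y)**2 / x**2 for y = x*tan(c1*ln(x) + c2)."""
    theta = c1 * log(x) + c2
    sec2 = 1.0 / cos(theta) ** 2
    y = x * tan(theta)
    dy = tan(theta) + c1 * sec2
    d2y = c1 * sec2 * (1.0 + 2.0 * c1 * tan(theta)) / x
    return d2y - (x * dy - y) ** 2 / x ** 2

def residual_ei_form(x, c1):
    """Same residual for the Ei form, written through w = (y/x)'."""
    s = c1 - log(x)
    w = 1.0 / (x * x * s)                      # w = u', using Ei'(z) = exp(z)/z
    dw = -(2.0 * s - 1.0) / (x ** 3 * s * s)   # derivative of w
    # With y = x*u: y'' = 2*w + x*w' and (x*y' - y)/x = x*w.
    return 2.0 * w + x * dw - (x * w) ** 2

print(residual_ei_form(1.5, 2.0))        # zero up to rounding
print(residual_tan_form(2.0, 1.0, 0.0))  # clearly nonzero
```

For these sample constants the Ei form satisfies the equation to rounding error, while the tangent form does not; the tangent form's residual does vanish in the degenerate case \(c_{1} = 0\), where \(y\) is a multiple of \(x\).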

3 of 6 equivalent

Problem 50

Solve the polynomial equation \(x^{4}-2 x^{3}-3 x^{2}+4 x +4 = 0\) over the reals, finding all roots.

Run 1
\(\{x = -1, x = 2\}\)
Run 2
\(\{x = -1, x = 2\}\)
Run 3
\(\{x = -1, x = 2\}\)
Run 4
\(\left(-1, -1, 2, 2\right)\)
Run 5
\(\left(-1, -1, 2, 2\right)\)
Run 6
\(\left(-1, -1, 2, 2\right)\)

Analysis: Run 1 gives the distinct real roots \(\{-1, 2\}\) as a set, while run 4 lists \(-1, -1, 2, 2\) with each root repeated, indicating multiplicity two for both roots.
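
Both presentations describe the same factorization \((x+1)^{2}(x-2)^{2}\). A minimal sketch, with an illustrative helper `poly_mul`, multiplies the factors back out and compares the result with the quartic's coefficients.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, highest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

quartic = [1, -2, -3, 4, 4]          # x^4 - 2x^3 - 3x^2 + 4x + 4
x_plus_1 = [1, 1]
x_minus_2 = [1, -2]

product = poly_mul(poly_mul(x_plus_1, x_plus_1), poly_mul(x_minus_2, x_minus_2))
print(product == quartic)  # True
```

The set \(\{-1, 2\}\) and the list \((-1, -1, 2, 2)\) therefore differ only in whether multiplicity is recorded.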

4 of 6 equivalent

Problem 53

Compute a lexicographic Gröbner basis (with \(y <x\)) of the ideal \(\langle x^{2}+y^{2}-4, x y -1\rangle\) and use it to find all complex solutions of the system.

Run 1
\(\left(\left[y^{4}-4 y^{2}+1, y^{3}+x -4 y \right], \left\{\left[x =-\sqrt{2-\sqrt{3}}, y =-\sqrt{2+\sqrt{3}}\right], \left[x =-\sqrt{2+\sqrt{3}}, y =-\sqrt{2-\sqrt{3}}\right], \left[x =\sqrt{2-\sqrt{3}}, y =\sqrt{2+\sqrt{3}}\right], \left[x =\sqrt{2+\sqrt{3}}, y =\sqrt{2-\sqrt{3}}\right]\right\}\right)\)
Run 2
\(\left(\left\{x =\sqrt{2+\sqrt{3}}, y =\sqrt{2-\sqrt{3}}\right\}, \left\{x =-\sqrt{2+\sqrt{3}}, y =-\sqrt{2-\sqrt{3}}\right\}, \left\{x =\sqrt{2-\sqrt{3}}, y =\sqrt{2+\sqrt{3}}\right\}, \left\{x =-\sqrt{2-\sqrt{3}}, y =-\sqrt{2+\sqrt{3}}\right\}\right)\)
Run 3
\(\{\{x = -\sqrt{2-\sqrt{3}}, y = -\sqrt{2+\sqrt{3}}\}, \{x = -\sqrt{2+\sqrt{3}}, y = -\sqrt{2-\sqrt{3}}\}, \{x = \sqrt{2-\sqrt{3}}, y = \sqrt{2+\sqrt{3}}\}, \{x = \sqrt{2+\sqrt{3}}, y = \sqrt{2-\sqrt{3}}\}\}\)
Run 4
\(\left(\left\{x =\sqrt{2-\sqrt{3}}, y =\sqrt{2+\sqrt{3}}\right\}, \left\{x =-\sqrt{2-\sqrt{3}}, y =-\sqrt{2+\sqrt{3}}\right\}, \left\{x =\sqrt{2+\sqrt{3}}, y =\sqrt{2-\sqrt{3}}\right\}, \left\{x =-\sqrt{2+\sqrt{3}}, y =-\sqrt{2-\sqrt{3}}\right\}\right)\)
Run 5
\(\left(\left[y^{3}+x -4 y , y^{4}-4 y^{2}+1\right], \left\{\left\{x =-\frac{\sqrt{6}}{2}-\frac{\sqrt{2}}{2}, y =-\frac{\sqrt{6}}{2}+\frac{\sqrt{2}}{2}\right\}, \left\{x =-\frac{\sqrt{6}}{2}+\frac{\sqrt{2}}{2}, y =-\frac{\sqrt{6}}{2}-\frac{\sqrt{2}}{2}\right\}, \left\{x =\frac{\sqrt{6}}{2}-\frac{\sqrt{2}}{2}, y =\frac{\sqrt{6}}{2}+\frac{\sqrt{2}}{2}\right\}, \left\{x =\frac{\sqrt{6}}{2}+\frac{\sqrt{2}}{2}, y =\frac{\sqrt{6}}{2}-\frac{\sqrt{2}}{2}\right\}\right\}\right)\)
Run 6
\(\{\{x = -\sqrt{2-\sqrt{3}}, y = -\sqrt{2+\sqrt{3}}\}, \{x = -\sqrt{2+\sqrt{3}}, y = -\sqrt{2-\sqrt{3}}\}, \{x = \sqrt{2-\sqrt{3}}, y = \sqrt{2+\sqrt{3}}\}, \{x = \sqrt{2+\sqrt{3}}, y = \sqrt{2-\sqrt{3}}\}\}\)

Analysis: Runs 1 and 5 provide both the lexicographic Gröbner basis \(\{y^4-4y^2+1,\ x+y^3-4y\}\) and the four solutions, while runs 2, 3, 4, and 6 omit the Gröbner basis and give only the four solutions (which are equivalent to one another, since \(\sqrt{2\pm\sqrt{3}} = \tfrac{1}{2}(\sqrt{6}\pm\sqrt{2})\)).
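
A numerical spot check of these statements is sketched below; the function name `residuals` is illustrative. It confirms that the four listed points are zeros of both generators and of both reported basis elements, and that the two radical forms agree. This is a necessary condition only and does not by itself prove that the pair is a Gröbner basis.

```python
from math import sqrt, isclose

def residuals(x, y):
    """Values of the generators and of the reported basis elements at (x, y)."""
    return (x * x + y * y - 4,          # x^2 + y^2 - 4
            x * y - 1,                  # xy - 1
            y ** 4 - 4 * y * y + 1,     # basis element in y alone
            x + y ** 3 - 4 * y)         # basis element linear in x

a, b = sqrt(2 - sqrt(3)), sqrt(2 + sqrt(3))
solutions = [(a, b), (b, a), (-a, -b), (-b, -a)]
for x, y in solutions:
    assert all(abs(r) < 1e-12 for r in residuals(x, y))

# The nested radicals agree with the form used by run 5.
assert isclose(a, (sqrt(6) - sqrt(2)) / 2)
assert isclose(b, (sqrt(6) + sqrt(2)) / 2)
```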

4 of 6 equivalent

Problem 77

Perform a two-tailed one-sample t-test at \(\alpha = 0.05\) for \(H_{0}\) using the sample \([9, 10, 11, 12, 13]\) (assume normal population with unknown variance).

Run 1
\(\sqrt{2}\)
Run 2
\(1-\frac{4 \sqrt{3}}{9}\)
Run 3
\(\sqrt{2}\)
Run 4
\(\sqrt{2}\)
Run 5
\(\sqrt{2}\)
Run 6
\(1-\frac{4 \sqrt{3}}{9}\)

Analysis: The run 1 answer is \(\sqrt{2} \approx 1.414\), while run 2's answer is \(1 - \tfrac{4\sqrt{3}}{9} \approx 0.230\), which are numerically and algebraically distinct.
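
The two values are consistent with two different quantities from the same test. The sketch below ASSUMES a hypothesized mean of 10, which the problem text as shown does not state; it is chosen only because it reproduces the reported numbers. It uses the closed-form CDF of Student's t distribution with 4 degrees of freedom (the sample has 5 observations), and the function name `t_cdf_df4` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def t_cdf_df4(t: float) -> float:
    """Closed-form CDF of Student's t distribution with 4 degrees of freedom."""
    q = 1.0 + t * t / 4.0
    return 0.5 + 0.375 * (t / sqrt(q)) * (1.0 - t * t / (12.0 * q))

sample = [9, 10, 11, 12, 13]
mu0 = 10.0   # ASSUMPTION: the hypothesized mean is not shown in the problem text

n = len(sample)
t_stat = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
p_value = 2.0 * (1.0 - t_cdf_df4(abs(t_stat)))

print(t_stat, p_value)
```

Under that assumption the test statistic equals \(\sqrt{2}\) and the two-tailed p-value equals \(1 - \tfrac{4\sqrt{3}}{9}\), so the two groups of runs may be reporting the statistic and the p-value respectively.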

5 of 6 equivalent

Problem 97

Find the number of distinct necklaces with 6 beads using 3 colors, where rotations are considered identical.

Run 1
\(92\)
Run 2
\(36\)
Run 3
\(92\)
Run 4
\(92\)
Run 5
\(92\)
Run 6
\(92\)

Analysis: The answers \(36\) and \(92\) are distinct integers, so run 2 and run 1 give numerically different counts of necklaces.
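
Because the count depends on which symmetries are identified, a brute-force enumeration helps separate the cases. The sketch below lists all \(3^{6} = 729\) colorings and counts orbits under rotation, optionally also under reflection; the function name and the `allow_reflection` flag are illustrative.

```python
from itertools import product

def count_orbits(n_beads: int, n_colors: int, allow_reflection: bool) -> int:
    """Count colorings up to rotation (and optionally reflection) by brute force."""
    seen = set()
    for coloring in product(range(n_colors), repeat=n_beads):
        variants = [coloring[i:] + coloring[:i] for i in range(n_beads)]
        if allow_reflection:
            variants += [v[::-1] for v in variants]
        seen.add(min(variants))       # canonical representative of the orbit
    return len(seen)

print(count_orbits(6, 3, allow_reflection=False))  # 130
print(count_orbits(6, 3, allow_reflection=True))   # 92
```

With rotations only, the enumeration gives \(130\), in agreement with Burnside's count \(\tfrac{1}{6}(3^{6} + 3^{3} + 2\cdot 3^{2} + 2\cdot 3)\). The value \(92\) appears when reflections are identified as well, which suggests that the five agreeing runs counted bracelets rather than rotation-only necklaces.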