Methodology
CanItBeMade asks one question of generative AI across three modalities: does it understand what can physically be manufactured? Every test case targets one of three processes (injection molding, CNC machining, or FDM 3D printing), and every score traces back to a recorded output, a published rule, and a scoring script.
Text — 24 tasks
- Spot the flaw: a part description with 2–4 planted DFM violations. Score = violations found ÷ violations planted, judged item-by-item by a human grader against a fixed answer key.
- Can it be made? A yes/no verdict with justification. Roughly 40% of the parts are physically impossible as described. Score = ½ × verdict correctness + ½ × share of key reasoning points made.
- Write the spec: graded against an objective checklist that records, for each item (material, tolerances, finish, process-specific requirements), whether it was specified or not.
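The two formula-based text scores can be sketched in a few lines. This is a minimal illustration, not the project's scoring script: the function names are hypothetical, and the reasoning half of the second score is assumed to be the fraction of answer-key points the response makes.

```python
from __future__ import annotations


def spot_the_flaw_score(found: set[str], planted: set[str]) -> float:
    """Violations found divided by violations planted (answer-key items only)."""
    if not planted:
        raise ValueError("answer key must contain at least one planted violation")
    # Items the model reports that are not in the key earn no credit here.
    return len(found & planted) / len(planted)


def can_it_be_made_score(verdict_correct: bool, points_made: int, points_in_key: int) -> float:
    """Half credit for the verdict, half for the share of key reasoning points.

    Assumption: the reasoning half is points_made / points_in_key, capped at 1.
    """
    if points_in_key <= 0:
        raise ValueError("answer key must contain at least one reasoning point")
    reasoning = min(points_made, points_in_key) / points_in_key
    return 0.5 * float(verdict_correct) + 0.5 * reasoning
```

For example, a response that finds two of four planted violations scores 0.5 on the first task type, and a correct verdict backed by one of two key reasoning points scores 0.75 on the second.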
Image — 12 tasks
Prompts request photo-realistic engineering samples ("injection-molded one-piece…", "CNC-machined 6061…"). A human grader scores each image 0–2 on five criteria:
| Criterion | What it checks |
|---|---|
| Process geometry | Shape is plausible for the stated process |
| Draft / parting | Visible draft and parting-line logic where the process demands it |
| Wall uniformity | Walls look uniform where molding would require it |
| Material verisimilitude | Surface reads as the claimed material and finish |
| No impossible features | No floating parts, fused mechanisms, or non-Euclidean geometry |
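A rubric like this is easy to validate and total in code. The sketch below assumes the five 0–2 criterion scores are simply summed to a 0–10 image score; the source does not state how they are combined, and the key names are hypothetical.

```python
from __future__ import annotations

# Hypothetical keys for the five criteria in the table above.
CRITERIA = (
    "process_geometry",
    "draft_parting",
    "wall_uniformity",
    "material_verisimilitude",
    "no_impossible_features",
)


def image_total(scores: dict[str, int]) -> int:
    """Validate one grader sheet and return the summed score (assumed 0 to 10)."""
    missing = set(CRITERIA) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name in CRITERIA:
        if scores[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1, or 2")
    return sum(scores[name] for name in CRITERIA)
```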
3D — 10 tasks, graded purely by math
Generated meshes are normalized to the prompt's stated size, then run through programmatic geometry checks. No judgment calls — a script either passes the mesh or it doesn't.
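As an illustration of the normalization step, the sketch below rescales a vertex list so its longest bounding-box dimension equals the stated size. Treating the stated size as the longest bounding-box dimension in millimetres is an assumption, as is the function name; the actual script may define size differently.

```python
def normalize_to_size(vertices, stated_size_mm):
    """Uniformly scale vertices so the longest bounding-box side equals stated_size_mm.

    Assumption: the prompt's stated size refers to the longest bounding-box dimension.
    The mesh is also translated so its bounding box starts at the origin.
    """
    mins = [min(v[i] for v in vertices) for i in range(3)]
    maxs = [max(v[i] for v in vertices) for i in range(3)]
    longest = max(hi - lo for lo, hi in zip(mins, maxs))
    if longest <= 0:
        raise ValueError("mesh has zero extent")
    scale = stated_size_mm / longest
    return [tuple((v[i] - mins[i]) * scale for i in range(3)) for v in vertices]
```

Normalizing first is what makes absolute thresholds such as a 1.0 mm minimum wall meaningful, since generated meshes usually arrive in arbitrary units.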
| Check | Process | Rule |
|---|---|---|
| Watertight | all | Mesh must be a valid edge-manifold solid |
| Wall thickness | molding, printing | Ray-cast thickness; 5th percentile ≥ 1.0 mm at stated scale |
| Draft angle | molding | Faces with <1° draft relative to pull may not exceed 2% of surface area |
| Undercuts | molding | ±Z self-occlusion ray casting; trapped faces may not exceed 2% of area |
| Overhangs | printing | Down-facing surfaces shallower than 45° (build-plate exempt) capped at 5% of area |
| Tool access | cnc | 6-axis ray-escape approximation; ≤2% of sampled surface unreachable |
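Several of these rules share one shape: flag triangles by a geometric test, then compare the flagged share of surface area to a cap. The sketch below shows that pattern for the draft rule, assuming the pull direction is the Z axis and measuring draft as the angle between a face and that axis. Function names are hypothetical, and the sign of the draft is ignored here (a simplification; the table assigns negative-draft geometry to the undercut check).

```python
import math


def triangle_normal_and_area(a, b, c):
    """Return the unit normal and area of triangle (a, b, c)."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    length = math.sqrt(sum(x * x for x in cross))
    if length == 0.0:
        return (0.0, 0.0, 0.0), 0.0  # degenerate triangle contributes no area
    return tuple(x / length for x in cross), 0.5 * length


def low_draft_area_fraction(triangles, min_draft_deg=1.0):
    """Area share of triangles whose draft relative to the Z pull is below min_draft_deg."""
    # A face's draft angle is 90 degrees minus the angle between its normal and Z,
    # so draft < d is equivalent to |n_z| < sin(d).
    threshold = math.sin(math.radians(min_draft_deg))
    flagged = total = 0.0
    for a, b, c in triangles:
        normal, area = triangle_normal_and_area(a, b, c)
        total += area
        if area > 0.0 and abs(normal[2]) < threshold:
            flagged += area
    if total == 0.0:
        raise ValueError("mesh has no surface area")
    return flagged / total


def passes_draft(triangles, max_fraction=0.02):
    """Pass when low-draft faces do not exceed the 2% area cap."""
    return low_draft_area_fraction(triangles) <= max_fraction
```

A mesh made of one vertical and one horizontal triangle of equal area has half its surface below 1° of draft, so it fails; a purely horizontal patch has none and passes.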
Disclosures & known limits
- Watertight uses the edge-manifold definition: a sealed internal void passes watertight by design — enclosed-void problems are caught by tool-access and undercut checks instead.
- Draft is evaluated per-triangle; the 2% area tolerance absorbs typical AI-mesh tessellation noise near the 1° boundary.
- Undercut detection is self-occlusion only, against a fixed ±Z two-part mold. Features needing non-planar parting lines (wide caps over narrow stems) are not flagged. Fine surface texture rendered as literal geometry can trip the check on parts that are genuinely moldable, and detection is relative to the mesh's as-generated orientation.
- Overhang: exactly 45° passes (strict less-than); horizontal bridges are flagged — a known conservative simplification.
- Tool access is a 300-sample approximation; results near the 2% threshold are noisy, and only clear pass/fail results should be read as decisive. The sampling seed is pinned for reproducibility.
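The first disclosure is easiest to see in code. The sketch below checks only the edge-manifold condition (every edge shared by exactly two faces) on an indexed triangle list; it is a minimal illustration with a hypothetical function name, and a full validity check might also test orientation consistency.

```python
from collections import Counter


def is_edge_manifold(faces):
    """True when every undirected edge is shared by exactly two triangles."""
    edge_counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            if u == v:
                return False  # degenerate face
            edge_counts[(min(u, v), max(u, v))] += 1
    return bool(edge_counts) and all(n == 2 for n in edge_counts.values())


# A closed tetrahedron, and a second closed shell standing in for a sealed internal void.
outer = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (0, 2, 3)]
inner = [(a + 4, b + 4, c + 4) for a, b, c in outer]

print(is_edge_manifold(outer))          # closed shell
print(is_edge_manifold(outer + inner))  # shell plus sealed void
print(is_edge_manifold(outer[:-1]))     # one face removed, so the mesh has a hole
```

Because the test is purely combinatorial, the outer shell plus a second closed shell still passes, which is why enclosed voids have to be caught by the other checks.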
Collection protocol. Each prompt is pasted verbatim into the model's free public tier — first response only, no retries, no prompt coaching. Leaderboard entries name the exact tier and month tested (free web tiers often serve different models than paid APIs). Platform policy refusals are excluded from means and disclosed; broken or corrupt outputs score zero, because an unusable output is itself a result.
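The aggregation rule in the protocol can be sketched as follows. The `Result` record, its status labels, and the function name are hypothetical; only the rule itself (refusals excluded from the mean and reported, broken outputs counted as zero) comes from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    status: str  # hypothetical labels: "ok", "refused", or "broken"
    score: Optional[float] = None


def leaderboard_mean(results):
    """Return (mean score, refusal count) under the collection protocol."""
    counted = []
    refusals = 0
    for r in results:
        if r.status == "refused":
            refusals += 1          # excluded from the mean, disclosed separately
        elif r.status == "broken":
            counted.append(0.0)    # unusable output scores zero
        elif r.status == "ok":
            if r.score is None:
                raise ValueError("scored result is missing its score")
            counted.append(r.score)
        else:
            raise ValueError(f"unknown status: {r.status}")
    mean = sum(counted) / len(counted) if counted else None
    return mean, refusals
```

With scores of 1.0 and 0.5, one refusal, and one broken output, the mean is taken over three results (1.0, 0.5, 0.0) and comes to 0.5, with one refusal reported alongside it.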