Splitting large source files
This is a design note, not a task list. It looks at every source file currently over 1000 lines and proposes how it could be split into smaller, cohesive files. The goal is *not* to hit a line count; it is to cut each file along the seams that already exist in it, so that each resulting file has one responsibility and the boundaries between them carry real meaning. Where a file is large but *coherent* (one job, done in one place), the honest recommendation is to leave it alone.
You can regenerate the list this doc is based on with the bundled script:
elm script BigFiles # files over 1000 lines, largest first
Principles used here
- Cut along responsibility seams, not at line numbers. A good boundary is one where the two sides talk to each other through a small, namable interface (a few functions, one shared map), not one where a helper on each side reaches into the other's internals.
- Keep mutually-recursive cores whole. A tree-walking evaluator, a Pratt expression parser, or a pattern-match compiler are webs of small functions that call each other over shared state. Splitting *inside* such a web creates churn and import noise for no clarity. Split *around* it.
- Prefer "extract a cohesive island" over "halve the file". The best candidates are self-contained sub-systems (a vendored stdlib string, a hand-assembled runtime, an optimisation pipeline) that depend on little and are depended on narrowly.
- Preserve the public surface. Splitting should be invisible to callers: the same module/class name keeps the same exported functions. New files are package-private (Java) or internal modules whose public functions the original re-exposes through thin aliases (Elm, which has no re-export).
- Don't split vendored code. Files we mirror from upstream stay one file, or we knowingly fork.
A note on language mechanics:
- Java makes this easy: move an inner class to its own package-private top-level class, or move a cluster of static methods into a final helper class that writes into the same shared maps. No visible API change.
- Elm is stricter: a module's split halves must not import each other in a cycle, and every helper crossing the new boundary has to become an explicit exposing entry. That cost is the main reason some Elm cuts below are recommended cautiously.
---
The files
Line counts and status as of 2026-06-05. ✅ done, 🟡 partially done (an island already extracted; more proposed below), ⬜ not started, ⏸ deliberately left whole.
Policy decision (2026-06-05): the goal is to extract every *cohesive island* and bring the cleanly-splittable files under 1000. Two kinds of file are left as documented exceptions rather than forced under the threshold: (1) the vendored examples/Playground.elm (splitting forks upstream), and (2) tightly mutually-recursive cores (Eval.elm's evaluator+runBuiltin web, WasmCompiler's FunctionGen, WasmGc's Gen, JsCompiler's codegen core). These have their islands extracted to shrink them, but the irreducible core stays whole because splitting it just scatters a hot web across files.
| File | Lines | Recommendation | Status |
|---|---|---|---|
| editor/Eval.elm | 3474 | Split: clean leaves extracted; evaluator+runBuiltin core stays whole | 🟡 EvalRender + EvalPlayground + EvalJson extracted (3 single-injection leaves); Eval.App ⏸ deferred (threads whole evaluator); Core/Builtins are the documented core |
| wasm/WasmCompiler.java | ~1831 | Split: extract prelude, string runtime, binary encoding; keep the codegen core | 🟡 WasmPrelude + WasmEncoding + WasmNativeFns (string/apply/record natives) extracted; FunctionGen core stays (documented exception) |
| lsp/LspServer.java | 2602 | Split: transport vs. analysis vs. code-actions/refactors | ⏸ deferred: a cohesive, feature-organized server whose public+static surface is pinned by ~28 directly-tested entry points (see note) |
| wasm/WasmGc.java | ~2105 | Split: extract the type registry and the shared encoding; keep Gen | 🟡 WasmEncoding + WasmGcTypes (Tuples + W/StructDef) extracted; Gen core stays (documented exception) |
| Main.java | 234 | Split: 35 commands → 4 domain files; root shell + helpers stay | ✅ CliCompile/CliProject/CliPackage/CliSiteCommands extracted |
| interp/Prelude.java | ~967 | Split: one class per Elm module group (cleanest of all) | ✅ PreludeCollections + PreludeJson + PreludeCore extracted; now under threshold |
| examples/Playground.elm | ~1708 | Leave: vendored elm-playground; splitting forks upstream | ⏸ |
| editor/Editor.elm | 1370 | Split: EditorView (Model/Msg + view leaf) vs. Editor (update/orchestration) | ⏸ deferred (feasible: view is a verified leaf; delicate cross-module TEA restructure, see note) |
| js/JsCompiler.java | ~1239 | Partial: extract the optimiser pipeline; keep codegen together | ✅ JsOptimizer + JsRuntime extracted; remainder coherent |
| test/WasmHeapTest.java | ~595 | Split by feature area, with a shared test-helper base | ✅ WasmHeapTestSupport base + WasmLangFeaturesTest; all <1000 |
| test/EditorInterpreterTest.java | ~796 | Split by feature, with a shared base | ✅ EditorInterpreterTestSupport base + EditorToolingInterpreterTest; all <1000 |
parser/Parser.java has dropped to ~995 (below the 1000-line threshold): OperatorFixities was extracted and the recursive-descent core is intentionally kept whole, so it no longer appears above.
---
editor/Eval.elm (3474, was ~4344): clean leaves extracted; evaluator core stays whole 🟡
Done incrementally, one cycle-free module at a time (Elm forbids import cycles, and the evaluator core is one mutually-recursive web). Three clean single-injection leaves have been extracted: EvalRender (pure display), EvalPlayground (game/animation loop), and EvalJson (JSON codec). Each takes applyValue/mainValue as parameters rather than importing the core, and Eval re-exposes the public functions via thin aliases. The Eval.App band is also a clean one-way leaf. However, it threads the *whole* evaluator through 44 functions, and extracting it would not change Eval's status as a documented core, so it is deferred (see below). The remaining Eval.Core + Eval.Builtins (evalExpr + runBuiltin) are the documented mutually-recursive core that stays whole. (Line ranges below are approximate.)
Eval.elm is the in-browser editor's Elm-in-Elm interpreter. It has grown to hold five jobs that only share the Value/Globals/Env types and the central evalExpr/applyValue pair. Those jobs are visible as contiguous bands in the file:
- Eval.Core: evalExpr (≈232–529), pattern matching/matchPattern/evalCase (≈2276–2416), operators/applyOp/equality (≈2417–2570), applyValue (≈537–567), and lookup. *Why together:* these are one mutually-recursive evaluator; every arrow between them is a hot call, not an API. This module owns the Value type and stays the dependency root.
- Eval.Builtins: the builtins/arity tables (≈17–231), runBuiltin (≈910–2007), and the collection implementations it dispatches to: Dict/Set/Array (≈568–909) and the polymorphic list combinators mapValues/foldlValues/… (≈2007–2274). *Why this seam:* runBuiltin is a giant case expression over the builtin's name that is conceptually the "standard library", distinct from the evaluator that *calls* it. It depends on Eval.Core (to apply closures), but Eval.Core never calls it back: a clean one-way edge.
- Eval.App ⏸: the Elm-Architecture glue. This covers hasApp; appInit/appUpdate/appView; the effect handlers randomCmd/httpCmd/fileSelectCmd/taskResult; the TEA driver eval/evalProject/debugSteps; and the game entry points (≈2766–EOF, ~44 functions). *Why deferred:* each of the three leaves below needs only the single applyValue injection. This band, by contrast, threads the whole evaluator (evalExpr, applyValue, and renderValue/htmlToString) through all 44 mutually-recursive runtime functions. It also carries ~28 public entry points that external drivers reference by the Eval module name (Java value("Eval", …), JS _$Eval$…), so each would need a thin re-export alias. It *is* a clean one-way leaf (the evaluator core never calls back into it; the only pre-band references are the exposing (…) list and the module doc), so it could be extracted with an evaluator-record parameter. But doing so leaves Eval a >2700-line core (runBuiltin) either way, so per the "leave cores whole, extract the *clean single-injection* leaves" policy it stays for now.
- Eval.Json ✅: the hand-rolled JSON parser/serialiser and the Json.Decode/Encode interpreter now live in editor/EvalJson.elm (~445 lines). *Why:* a self-contained codec. It touches the evaluator only to apply decoder/encoder functions, so the seven globals-carrying helpers take applyValue as an injected parameter (ApplyTo), the same leaf pattern as Render/Playground. The pure parser/serialiser stays parameterless. Eval rewired its five call sites to EvalJson.*; the band dropped Eval.elm from 3892 → 3475 lines.
- Eval.Playground ✅: elm-playground shape construction, SVG rendering, and the game/animation loop now live in editor/EvalPlayground.elm (~464 lines). *Why:* a closed world: shapes in, SVG out. It needs the evaluator only to apply a game's view/update and resolve main, so applyValue/mainValue are passed into gameView/gameStep/gameInitMem as parameters (no import back into Eval); everything else is pure. Eval re-exposes the game functions via thin wrappers.
- Eval.Render ✅: the pure display helpers (renderValue, plus the Html-value→string htmlToString/attrKey) now live in editor/EvalRender.elm. The module name is flat to match the editor's flat, module=filename convention, and Eval re-exposes renderValue via a thin alias, since Elm has no re-export. *Why:* display logic with no dependency on evaluation, the cleanest, injection-free leaf. renderProgram stays in the core (it orchestrates init/view, i.e. it evaluates).
Eval itself becomes a thin module re-exposing the ~25 public functions so Editor.elm is untouched. The risk to watch: keep Eval.Core free of imports from the other five so there is no cycle. runBuiltin is the one place that may need a function passed in (to evaluate closures) rather than importing Eval.App.
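The single-injection leaf pattern is not specific to Elm. Here is a minimal Java sketch of it. ClosureSketch, EvalCoreSketch and JsonLeafSketch are hypothetical stand-ins, not the interpreter's real types, and a closure is modelled as a plain String → String function. The leaf receives applyValue as a parameter, playing the role of the ApplyTo injection, so it never names the core. Java tolerates import cycles, so here the one-way edge is a discipline rather than a compiler rule; in Elm it is enforced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical stand-in for an interpreter closure value.
record ClosureSketch(Function<String, String> body) {}

// The "core": owns applyValue and depends on the leaf (one-way edge).
final class EvalCoreSketch {
    private EvalCoreSketch() {}

    static String applyValue(ClosureSketch f, String arg) {
        return f.body().apply(arg);
    }

    static List<String> decodeAll(List<String> raw, ClosureSketch decoder) {
        // The core injects its own applyValue; the leaf never names EvalCoreSketch.
        return JsonLeafSketch.decodeEach(raw, decoder, EvalCoreSketch::applyValue);
    }
}

// The "leaf": everything it needs from evaluation arrives as a parameter.
final class JsonLeafSketch {
    private JsonLeafSketch() {}

    static List<String> decodeEach(List<String> raw, ClosureSketch decoder,
                                   BiFunction<ClosureSketch, String, String> applyTo) {
        List<String> out = new ArrayList<>();
        for (String item : raw) {
            out.add(applyTo.apply(decoder, item));
        }
        return out;
    }
}
```

The design choice is the direction of the edge. The core imports the leaf and passes its own function in, which is the same shape EvalJson and EvalPlayground use.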
wasm/WasmCompiler.java (~1831, was ~2767): extract the islands, keep the engine 🟡 partial
Three large parts of this file are only loosely attached to the actual compiler:
- WasmPrelude ✅: the WASM_PRELUDE Elm-source string and the PRELUDE_NAMES map (~320 lines of data) now live in wasm/WasmPrelude.java. *Why:* it is data, not logic. It changes when we add a stdlib function, a different cadence from changing codegen.
- WasmStringRuntime, extracted as WasmNativeFns ✅: the hand-assembled native functions stringRuntime() and their entry builders (strToListEntry, strReverseEntry, strConcatEntry, …, ~900 lines). *Why extract:* raw-bytecode emitters that depend only on the encoding helpers (leb/sleb/entry). They are the file's most self-contained island and the part least related to compiling Elm ASTs. The status table records this island as done under the name WasmNativeFns, which also took the apply and record natives.
- WasmEncoding ✅ (shared; see the cross-cutting note): leb/sleb/section/name/nameSection now live in wasm/WasmEncoding.java, shared with WasmGc.
What stays in WasmCompiler is the FunctionGen inner class (≈769–1918), the expression→bytecode compiler, together with compileModules/assemble and the lambda-lifting pass. FunctionGen is a mutually-recursive web over the shared funcs/ctorTag/nodeTypes maps (intExpr → intApp → intCase → tailExpr); cutting inside it would scatter that web. It could become its own top-level package-private FunctionGen.java, but its halves should not be split further.
wasm/WasmGc.java (~2105, was ~2440): lift out the type registry 🟡 partial
The WasmGC backend's natural seam is the Tuples inner class (≈812–1105), the struct/type registry that assigns stable indices to every cons/tuple/record/closure shape. It is a cohesive data structure with its own helpers and no dependency on codegen, which made it a clean WasmGcTypes.java. That extraction is now done ✅, and the StructDef/W type model (≈140–178) went with it. The Gen inner class is the codegen engine and, like FunctionGen, stays whole. The leb/sleb/section/name helpers have already moved to the shared WasmEncoding.
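A type registry of this kind is essentially an interning table. The minimal Java sketch below uses a hypothetical ShapeKey record (a kind plus field type names) rather than WasmGcTypes' real StructDef/W model. The first request for a shape assigns the next index, and any later request for an equal shape returns the same index. That is what keeps the indices stable and the registry independent of codegen.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shape key: a kind ("tuple", "record", ...) plus field types.
record ShapeKey(String kind, List<String> fieldTypes) {}

// Sketch of an interning registry: equal shapes always map to one index.
final class TypeRegistrySketch {
    private final Map<ShapeKey, Integer> indexOf = new LinkedHashMap<>();
    private final List<ShapeKey> byIndex = new ArrayList<>();

    int intern(String kind, List<String> fieldTypes) {
        // Defensive copy so later mutation of the caller's list cannot
        // change a key that is already stored in the map.
        ShapeKey key = new ShapeKey(kind, List.copyOf(fieldTypes));
        Integer existing = indexOf.get(key);
        if (existing != null) {
            return existing;
        }
        int idx = byIndex.size();
        indexOf.put(key, idx);
        byIndex.add(key);
        return idx;
    }

    ShapeKey shape(int index) {
        return byIndex.get(index);
    }

    int size() {
        return byIndex.size();
    }
}
```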
interp/Prelude.java (~967, under threshold): the cleanest split of all ✅
Prelude is one static class that registers ~400 builtins into three shared maps (BUILTINS, UNQUALIFIED, CTOR_ARITY) from a static initialiser. Crucially, the registerXxx() methods do not call each other: each is ~80–150 self-contained lines keyed by Elm module. That makes it the textbook candidate: move each group into its own package-private class that registers into the same maps, leaving Prelude as the initialiser that calls them.
Proposed grouping (by how often they change together, not one-class-per-method):
- PreludeCore ✅: Basics, List, String, Char, Bitwise (the high-traffic core), now in interp/PreludeCore.java. The shared helpers it needed (basics/javaList/ordering/split + the UNQUALIFIED map) became package-private; this single ~595-line cut took Prelude under 1000.
- PreludeCollections ✅: Array, Dict, Set, now in interp/PreludeCollections.java.
- PreludeData ⬜: Maybe, Result, Tuple, Debug, constructors.
- PreludeEffects ⬜: Cmd/Sub, Random (incl. the seeded stepGen cluster), Time, Task, Browser.Events. *Why kept together:* registerEffects and stepGen share the advance/scrambleSeed helpers, the one genuinely coupled sub-system here.
- PreludeJson ✅: Json.Decode/Encode, Url, Navigation, Storage and decodeErrorToString (with the Url helpers parseUrl/percentEncode/urlToString) now live in interp/PreludeJson.java. *Why:* the decoder and its error renderer are a bound pair, and the Url helpers are used only here. (The shared d data-builder became package-private for it.)
- PreludeHtml ⬜: registerHtml/registerSvg/registerBrowser and the tag/attr tables.
- PreludeMedia ⬜: WebGL, Math (Vec/Mat), Regex, File.
The only shared surface is the handful of one-liners fn/basics/just/d/isJust/isOk, which become static helpers on a small PreludeSupport.
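The registration pattern can be sketched in a few classes. This is a minimal illustration with hypothetical class names (PreludeSketch, PreludeCoreSketch, PreludeDataSketch) and a simplified one-argument builtin type; the real Prelude has its own builtin representation and a third UNQUALIFIED map. Each group class only writes into the shared maps, and the original class's static initialiser calls the groups in order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified Prelude: the shared maps and the static initialiser
// stay here; each Elm module group lives in its own package-private class.
final class PreludeSketch {
    // Declared before the static block so they exist when the groups register.
    static final Map<String, Function<Object, Object>> BUILTINS = new HashMap<>();
    static final Map<String, Integer> CTOR_ARITY = new HashMap<>();

    static {
        PreludeCoreSketch.register();
        PreludeDataSketch.register();
    }

    private PreludeSketch() {}

    // Stand-in for the small shared one-liners (fn/basics/just/...).
    static void fn(String name, Function<Object, Object> impl) {
        BUILTINS.put(name, impl);
    }
}

// Group: a slice of Basics/String. Writes into the shared maps; calls no other group.
final class PreludeCoreSketch {
    private PreludeCoreSketch() {}

    static void register() {
        PreludeSketch.fn("String.length", s -> ((String) s).length());
        PreludeSketch.fn("Basics.negate", n -> -((Integer) n));
    }
}

// Group: Maybe constructors. Again independent of every other group.
final class PreludeDataSketch {
    private PreludeDataSketch() {}

    static void register() {
        PreludeSketch.CTOR_ARITY.put("Just", 1);
        PreludeSketch.CTOR_ARITY.put("Nothing", 0);
    }
}
```

Because the groups never call each other, their order in the static block only matters if two groups register the same key.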
Main.java (234, was ~1898): one class per command, done ✅
Main was a picocli CLI whose body was 35 independent @Command static inner classes (Run, Js, Make, Eval, Script, Serve, Bundle, TestCmd, Docs, Lsp, Format, …). They share no state, only the helpers readElmSource/typeError/render (now package-private on Main) and two text templates (ELM_JSON, BUNDLED_DEMO_DIRS). picocli registers subcommands by class, so each command moved into a domain-grouped top-level class in the same package with no behaviour change; the @Command(subcommands = …) list now names the top-level classes directly.
Main keeps only the root @Command shell (main/run/usage, plus the exception handler), the shared helpers, and the templates: 234 lines. The 35 commands now live in four domain files (package-private top-level classes, calling back to Main.readElmSource/typeError/render):
- CliCompileCommands (483): Run, Js, Make, Wasm, Eval, Bytecode, Script, Bundle.
- CliProjectCommands (470): Serve, Reactor, TestCmd, CoverageCmd, Lint, Check, Repl, Lsp, Format, Project, Bench, Doctest.
- CliPackageCommands (452): Diff, Bump, Publish, Init, Install, Upgrade, Uninstall, Outdated, Verify.
- CliSiteCommands (384): Docs, Site, GenSite, BuildCmd, Gallery, GalleryElm.
*Why group rather than one-per-file:* the seam that matters is "what part of the toolchain does this drive", and commands in a group tend to change together (e.g. all the package commands when the registry format moves).
lsp/LspServer.java (2602): feature-organized server, split deferred ⏸
LspServer has a single coherent responsibility (*be the language server*), internally organized by LSP feature. Those features are diagnostics, completion, hover/definition, code actions, refactors, document/workspace symbols, references, call hierarchy, semantic tokens, folding/selection ranges, and the JSON-RPC serve loop with its transport plumbing. The natural seams are clear (transport vs. analysis vs. code-actions/refactors), and it is *not* a mutually-recursive core. Two things make a clean sub-1000 split high-cost and low-ROI, so it is deferred rather than forced:
- A wide, directly-pinned public surface. LspServerTest exercises ~28 entry points on LspServer. These include both the instance analysis methods (diagnose, complete, codeActions, refactors, references, definition, hoverType, semanticTokens, …) *and* several static helpers (applyChange, readMessage, identifierSpan, wordAt, importLinks, foldingRanges). Moving any of these into a feature class forces either a delegating wrapper (which costs back most of the lines) or churn across the test.
- The docs workspace map straddles the seam. The serve loop *writes* open documents into the docs field, and an analysis method (qualifiedMembers, for cross-module completion) *reads* it. Transport and analysis therefore cannot be cleanly separated without threading the workspace map through the completion path.
Reaching <1000 would take four or five coordinated extractions (e.g. LspDiagnostics, LspCodeActions, LspRefactors, LspProtocol) plus delegation wrappers. That is a large, regression-prone change to a file that already reads as well-separated sections. Left whole as a documented exception for now; the feature boundaries above are the plan if it is revisited.
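The delegating-wrapper cost looks like this for one pinned static helper. This is a minimal Java sketch with hypothetical class names (LspTextSketch, LspServerSketch). The wordAt(String, int) signature is an assumption and may not match the real helper. The body moves to the feature class, and the original keeps a one-line delegate so the tests that call it still compile. Repeated across ~28 entry points, these delegates are what eat into the line savings.

```java
// Hypothetical feature class that would own the text helpers after a split.
final class LspTextSketch {
    private LspTextSketch() {}

    // Returns the identifier-like word touching column `col`, or "" if the
    // column is out of range. At a word boundary it returns the word that ends there.
    static String wordAt(String line, int col) {
        if (col < 0 || col > line.length()) {
            return "";
        }
        int start = col;
        while (start > 0 && Character.isJavaIdentifierPart(line.charAt(start - 1))) {
            start--;
        }
        int end = col;
        while (end < line.length() && Character.isJavaIdentifierPart(line.charAt(end))) {
            end++;
        }
        return line.substring(start, end);
    }
}

// The original class keeps its pinned static entry point as a delegate,
// so existing tests that call LspServerSketch.wordAt are untouched.
final class LspServerSketch {
    private LspServerSketch() {}

    static String wordAt(String line, int col) {
        return LspTextSketch.wordAt(line, col);
    }
}
```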
js/JsCompiler.java (~1239): extract only the optimiser ✅ done
Most of this file is one tightly-woven codegen pipeline: expression compile → pattern matchJs → the TCO path compileNamedFunction, all sharing the local-scope stack and temp counter. That web stays. The cleanly separable part is the post-processing pipeline (minify, treeShake, pruneKernel, balancedLine; ≈414–565): pure String → String passes with no dependency on the compiler instance. These now live in JsOptimizer.java, and the runtime kernel in JsRuntime.java. The remainder is the coherent codegen pipeline, left whole as recommended. (A further multi-module caching layer, JsBundleCache, was noted as an optional second candidate but not pursued.)
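Because the passes are pure String → String functions, the optimiser pipeline reduces to function composition. This is a minimal Java sketch with a hypothetical JsOptimizerSketch class. Its two toy passes (dropping whole-line // comments, trimming trailing whitespace) are line-based stand-ins and not the real minify/treeShake/pruneKernel/balancedLine.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

final class JsOptimizerSketch {
    private JsOptimizerSketch() {}

    // Toy pass: drop lines whose only content is a // comment.
    static final UnaryOperator<String> STRIP_LINE_COMMENTS = src -> src.lines()
            .filter(line -> !line.strip().startsWith("//"))
            .collect(Collectors.joining("\n"));

    // Toy pass: remove trailing whitespace from every line.
    static final UnaryOperator<String> TRIM_TRAILING = src -> src.lines()
            .map(String::stripTrailing)
            .collect(Collectors.joining("\n"));

    // Runs the passes left to right; each sees only the previous pass's output.
    static String run(String src, List<UnaryOperator<String>> passes) {
        String out = src;
        for (UnaryOperator<String> pass : passes) {
            out = pass.apply(out);
        }
        return out;
    }
}
```

Keeping each pass a plain function, with no reference to compiler state, is what lets the class stand apart from the compiler instance.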
editor/Editor.elm (1370): view is a clean leaf; split deferred ⏸ (feasible)
Editor.elm mixes the TEA wiring with a large view layer (≈885–1370). A cross-reference scan confirms the view is a true leaf. Of the view band's functions, only view itself is referenced from the rest of the module (by program). The view band calls just four functions defined above it (baseName, groupedFiles, selectedFile, shownModel; closure: folderOf, groupOrder, nth), none of which call back into update/program.
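The cross-reference check can be stated as a small graph algorithm. This is a minimal Java sketch that assumes the call graph is already available as a map from each function to the functions it references; building that map from Elm source is out of scope here, and the class name is hypothetical. A band is treated as a leaf if code outside it references only the entry point, and nothing reachable from the band reaches a forbidden function such as update or program.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class LeafCheckSketch {
    private LeafCheckSketch() {}

    static boolean isLeaf(Map<String, Set<String>> calls, Set<String> band,
                          String entry, Set<String> forbidden) {
        // 1. Inbound edges: callers outside the band may reference only `entry`.
        for (Map.Entry<String, Set<String>> e : calls.entrySet()) {
            if (band.contains(e.getKey())) {
                continue;
            }
            for (String callee : e.getValue()) {
                if (band.contains(callee) && !callee.equals(entry)) {
                    return false;
                }
            }
        }
        // 2. Outbound closure: walk everything reachable from the band and
        //    fail if any reachable function is forbidden.
        Deque<String> work = new ArrayDeque<>(band);
        Set<String> seen = new HashSet<>(band);
        while (!work.isEmpty()) {
            String f = work.pop();
            if (forbidden.contains(f)) {
                return false;
            }
            for (String g : calls.getOrDefault(f, Set.of())) {
                if (seen.add(g)) {
                    work.push(g);
                }
            }
        }
        return true;
    }
}
```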
The cycle subtlety the earlier plan missed: the view produces Html Msg, so an EditorView module needs the Msg type, but Editor needs view, so Editor imports EditorView. If EditorView imported Editor for Model/Msg, that would be a cycle. The fix is to put Model + Msg (and the four shared read helpers) in EditorView (or a dedicated EditorTypes), which imports nothing from Editor; Editor's update/program/state-helpers then import them from there. (The JS backend tags constructors by *simple* name, so moving Msg does not rename the dispatch tags the headless-Chrome guard checks; verified safe.)
So the split is genuinely achievable, and would bring both files under 1000:
- EditorView: Model/Msg, the four read helpers, and the ≈485-line view layer.
- Editor: program/init/subscriptions/update/state-helpers.
It is deferred only on cost/risk. It is a delicate cross-module TEA restructure: relocating Model/Msg, moving non-contiguous helpers, and getting exposing (Msg(..)) right. It must keep the JS bundle, the interpreter, and the headless-Chrome drivers all green, and it needs re-registration in SiteGenerator.EDITOR_MODULES and the interpreter-test module lists. The plan above is ready to execute when picked up.
parser/Parser.java (~995, now under threshold): extract the edges, keep the descent ✅ done
A recursive-descent + Pratt parser is the canonical "leave the core whole" case: parseExpr, parseApplication, parseAtom, parsePattern, and parseType all advance the same tokens/p cursor and call one another freely. Splitting that across files would mean threading a shared ParserState through dozens of methods purely to satisfy the file boundary β cost without clarity.
What *can* leave cleanly:
- OperatorFixities ✅: the static FIXITY table, scanFixities, and scanInfixDeclarations now live in parser/OperatorFixities.java. Pure precedence data plus a pre-scan, independent of the cursor.
- Layout helpers: atNewLine/continues/withIndent and the error-recovery recoverToNextTopLevel could become a small Layout mixin, though the payoff is modest; left in place.
The modest recommendation was taken: fixities extracted, and the ~900-line descent core kept as one coherent unit (the file is now ~995 lines, below the threshold). Forcing a 50/50 split here would make the parser harder to read, not easier β exactly the "random split" this doc exists to avoid.
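A pre-scan of this kind can be sketched in Java. The sketch assumes Elm 0.19-style declarations such as infix left 6 (+) = add, and FixityScanSketch is a hypothetical class, not OperatorFixities' actual code. It reads the source text once, before any parsing, which is why it can live apart from the tokens/p cursor.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FixityScanSketch {
    enum Assoc { LEFT, RIGHT, NON }

    record Fixity(Assoc assoc, int precedence) {}

    // Matches declarations such as `infix left 6 (+) = add` at a line start.
    private static final Pattern INFIX = Pattern.compile(
            "^infix\\s+(left|right|non)\\s+(\\d+)\\s+\\(([^\\s)]+)\\)\\s*=",
            Pattern.MULTILINE);

    private FixityScanSketch() {}

    // Builds an operator -> fixity table from raw source text.
    static Map<String, Fixity> scan(String source) {
        Map<String, Fixity> table = new HashMap<>();
        Matcher m = INFIX.matcher(source);
        while (m.find()) {
            Assoc assoc = Assoc.valueOf(m.group(1).toUpperCase(Locale.ROOT));
            int precedence = Integer.parseInt(m.group(2));
            table.put(m.group(3), new Fixity(assoc, precedence));
        }
        return table;
    }
}
```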
test/WasmHeapTest.java (~595, was ~1202): split by feature, share the harness ✅
This is a test file, and tests split painlessly because each @Test is independent. The clusters are already obvious: standard-library (lists/strings/maybe), float arithmetic, core language (patterns, recursion), records, higher-order/closures, and the property-based RNG tests. The proposal was to move the shared harness (agrees, runMain, runMainString, decodeList, agreesFloat, NODE detection; ≈24–135, 245–350) to a WasmTestSupport base class. The clusters would become WasmStringTest, WasmRecordTest, WasmHigherOrderTest, etc. As executed, the harness lives in a WasmHeapTestSupport base and a WasmLangFeaturesTest class was split off, which brought every file under 1000 lines. *Why bother for tests:* faster targeted runs and a clearer map of what the WASM backend guarantees, with no production risk.
test/EditorInterpreterTest.java (~796, was ~1066): split by feature with a shared base ✅
It crossed the threshold as the editor gallery examples gained interpreter-fidelity regression tests (Mouse colours, Upload multiple, FirstPerson vectors, Positions Random, Thwomp textures). It is one cohesive suite: every @Test exercises the same Eval/Editor interpreter through the shared evalProject/files/renderGame helpers. The first recommendation was therefore to watch it and split only if it kept growing. The natural cut mirrors WasmHeapTest: an EditorInterpreterTestSupport base holding the helpers, with the @Tests split into language-core, gallery-examples, and lexer/format clusters. That cut has since been made (an EditorInterpreterTestSupport base plus a separate EditorToolingInterpreterTest), and every file is now under 1000 lines.
examples/Playground.elm (~1708): do not split
This is vendored evancz/elm-playground, kept as a test resource so the gallery's Playground examples compile against the real library. It has clean internal seams (entry points, transforms, colours, rendering) β but splitting it would fork it from upstream, and every future sync would have to be re-applied across our pieces. The right move is to leave it whole and treat its size as the cost of vendoring. If we ever *do* want it modular, that should be a deliberate, documented fork.
---
Cross-cutting: the duplicated WASM encoder (done)
WasmCompiler.java and WasmGc.java used to carry byte-identical copies of leb, sleb, section, and name. These now live in a single wasm/WasmEncoding.java that both backends call, so the duplication hazard (a LEB128 fix needed twice) is gone. This was the recommended first step and de-risked the remaining WASM islands.
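For reference, the duplicated encoding is LEB128, the variable-length integer format used throughout the WebAssembly binary format. A minimal Java sketch of the unsigned and signed variants follows. The class name Leb128Sketch and the byte[]-returning signatures are assumptions; the real WasmEncoding helpers may write into a shared buffer instead.

```java
import java.io.ByteArrayOutputStream;

final class Leb128Sketch {
    private Leb128Sketch() {}

    // Unsigned LEB128: 7 payload bits per byte, high bit set on all but the last.
    static byte[] leb(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        do {
            int b = (int) (value & 0x7F);
            value >>>= 7; // logical shift: treat the input as unsigned
            if (value != 0) {
                b |= 0x80;
            }
            out.write(b);
        } while (value != 0);
        return out.toByteArray();
    }

    // Signed LEB128: stop once the remaining bits are just sign extension of
    // bit 6 of the byte being written.
    static byte[] sleb(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean more = true;
        while (more) {
            int b = (int) (value & 0x7F);
            value >>= 7; // arithmetic shift keeps the sign
            boolean signBitSet = (b & 0x40) != 0;
            if ((value == 0 && !signBitSet) || (value == -1 && signBitSet)) {
                more = false;
            } else {
                b |= 0x80;
            }
            out.write(b);
        }
        return out.toByteArray();
    }
}
```

For instance, leb(624485) yields the bytes E5 8E 26, and sleb(-123456) yields C0 BB 78.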
Suggested sequencing
Ordered by payoff-to-risk, easiest and safest first. ✅ = completed.
- ✅ WasmEncoding extraction: done; removed the duplicated encoder.
- ✅ Prelude.java: PreludeCollections (Array/Dict/Set), PreludeJson (Json/Url/Nav/Storage) and PreludeCore (Basics/List/String/Char/Bitwise) extracted; now ~967 lines (under threshold). The remaining groups (Data, Effects, Html, Media) could still be split for cleanliness, but the size target is met.
- ✅ Main.java: 35 independent command classes moved into 4 domain-grouped top-level files (234 lines left).
- ✅ WasmHeapTest.java / EditorInterpreterTest.java: split by feature, harness shared (WasmHeapTestSupport and EditorInterpreterTestSupport bases).
- ✅ WasmCompiler/WasmGc islands: WasmPrelude + WasmEncoding + WasmNativeFns (string/apply/record natives) and WasmGcTypes (the Tuples registry + struct records) extracted; the FunctionGen/Gen codegen cores stay whole (documented exceptions).
- ✅ Eval.elm: three single-injection leaves extracted (EvalRender, EvalPlayground, EvalJson); Eval.App and the evaluator+runBuiltin core stay whole (documented).
- ✅ JsCompiler/Parser: JsOptimizer/JsRuntime and OperatorFixities extracted; the codegen and recursive-descent cores are intentionally left whole.
- ⏸ LspServer.java, deferred: a cohesive, test-pinned LSP server (≈28 directly-tested entry points + a docs-field straddle); sub-1000 needs 4–5 feature extractions + delegation wrappers.
- ⏸ Editor.elm, deferred but feasible: the view is a verified leaf; the split is a delicate cross-module TEA restructure (Model/Msg relocation) that must keep JS-bundle + headless + interp green.
Each step should be a behaviour-preserving move validated by the existing test suite before the next. The measure of success is not the line count afterwards β it is whether a newcomer can guess which file a given change belongs in.
Result (2026-06-05)
Every cleanly-splittable file is under 1000 lines. The files that remain above the threshold are all documented exceptions: the vendored examples/Playground.elm; the tightly mutually-recursive backend cores (Eval.elm, WasmCompiler, WasmGc, JsCompiler) which have their islands extracted but keep their irreducible cores; the test-pinned LspServer.java; and Editor.elm, whose clean view-leaf split is planned and ready but deferred on cost/risk. Separately, an elm server app can now talk to a database through the typed lib/Db.elm JDBC layer (see server/DbRunner.java).