Verified benchmarks.
We believe in absolute transparency. Official scores on industry-standard benchmarks will be published here in full.
Vilcus-1 Scores Pending
Vilcus-1 is currently in its active training and parameter tuning phase. To maintain strict scientific accuracy and prevent speculative assertions, **we have removed all preliminary scores**.
Once training is complete and verified by independent alignment teams, full comparison tables (including performance graphs against other models) will be posted here.
Training Run Active
Our computing clusters are currently processing final alignment layers. Benchmark results will be published automatically once the model is complete.
Evaluation Methodology
Coding Capabilities (HumanEval)
Measures the functional correctness of Python functions synthesized from docstrings, with each generated solution checked against unit tests. Vilcus-1 will be benchmarked on complex multi-step programming tasks.
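HumanEval results are commonly reported as pass@k: the probability that at least one of k sampled solutions passes the unit tests. The sketch below shows the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c pass. Which k and n will be used for Vilcus-1 has not been stated, so the function name and the example values are illustrative assumptions only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of solutions sampled for the problem
    c: number of those solutions that passed every unit test
    k: sample budget being scored (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # With fewer than k failing samples, every draw of k contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts, not Vilcus-1 results.
    print(pass_at_k(n=10, c=3, k=1))
```

The benchmark score is then the mean of this value over all problems in the set.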
Mathematical Logic (GSM8K)
Grade-school math word problems that require coherent multi-step reasoning before a final answer is given.
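GSM8K reference solutions end with a line of the form `#### 72`, so scoring usually comes down to comparing a final number. The sketch below assumes exact numeric match against the last number in the model's response. That extraction rule, and the helper names `to_number`, `reference_answer`, and `is_correct`, are assumptions for illustration, not a published Vilcus-1 scoring procedure.

```python
import re
from decimal import Decimal

# A number with optional sign, thousands separators, and decimal part.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def to_number(text: str):
    """Return the last number found in text as a Decimal, or None."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return Decimal(matches[-1].replace(",", ""))


def reference_answer(solution: str):
    """Read the final answer that follows the last '####' marker."""
    return to_number(solution.rsplit("####", 1)[-1])


def is_correct(model_output: str, solution: str) -> bool:
    """Exact numeric match between the response and the reference."""
    predicted = to_number(model_output)
    return predicted is not None and predicted == reference_answer(solution)


if __name__ == "__main__":
    reference = "Natalia sold 48 / 2 = 24 clips in May.\n#### 72"
    print(is_correct("48 + 24 = 72 clips. The answer is 72.", reference))
```

Taking the last number is a simple heuristic: it misfires when a response keeps talking after stating its answer, which is why many harnesses instead prompt for a fixed answer format.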
Scientific Reasoning (GPQA)
Graduate-level multiple-choice questions in biology, physics, and chemistry, written by domain experts to stress-test depth of reasoning.
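GPQA questions are four-option multiple choice, so the headline metric is plain accuracy, with 25% as the random-guess baseline. The sketch below shuffles option order per question, a common precaution against position bias, and scores predicted letters against the answer key. The helpers `shuffled_options` and `accuracy` are hypothetical, and whether Vilcus-1 will be evaluated this way is an assumption.

```python
import random
from string import ascii_uppercase


def shuffled_options(correct: str, distractors: list[str], rng: random.Random):
    """Shuffle the answer options and return them with the correct letter.

    Assumes all option texts are distinct.
    """
    options = [correct, *distractors]
    rng.shuffle(options)
    return options, ascii_uppercase[options.index(correct)]


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of questions where the predicted letter matches the key."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal length")
    hits = sum(p.strip().upper() == g for p, g in zip(predicted, gold))
    return hits / len(gold)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed keeps the option order reproducible
    # Placeholder option texts, not a real GPQA item.
    options, key = shuffled_options("right", ["wrong 1", "wrong 2", "wrong 3"], rng)
    print(options, key)
    print(accuracy(["A", "c", "B"], ["A", "C", "D"]))
```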
Want early access?
Get notified when official model cards and verified benchmark results are posted.
Join the Waitlist