paramecinm committed on
Commit 5d26561 · verified · 1 Parent(s): af31f99

Update README.md

Files changed (1)
  1. README.md +104 -3
README.md CHANGED
@@ -1,3 +1,104 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+
+
+ # FloydARC (ARC-AGI Reasoning)
+
+ ## Model Summary
+
+ **FloydARC** is a neural algorithmic reasoning model adapted from FloydNet for the **ARC-AGI** benchmark.
+ This checkpoint is trained primarily on ARC-style synthetic and curated data, and is designed to solve ARC tasks via **iterative refinement and test-time adaptation**, rather than large-scale web pretraining.
+
+ Among models trained mainly on ARC-like data, FloydARC achieves **state-of-the-art performance** on both ARC-AGI-1 (70.5) and ARC-AGI-2 (15.3), narrowing the gap to very large proprietary models.
+
+ ---
+
+ ## Performance
+
+ FloydARC demonstrates strong generalization on ARC benchmarks under standard evaluation protocols.
+
+ **ARC-AGI benchmark results:**
+
+ | Model | #Params | ARC-AGI-1 | ARC-AGI-2 |
+ | ------------ | ------: | --------: | --------: |
+ | VARC | 73M | 60.4 | 11.1 |
+ | Loop-ViT | 11.2M | 61.2 | 10.3 |
+ | HRM | 27M | 40.3 | 5.0 |
+ | **FloydARC** | 153.7M | **70.5** | **15.3** |
+
+
+
+ ---
+
+ ## Model Details
+
+ * **Model ID**: `ocxlabs/FloydARC`
+ * **Task**: Abstraction and Reasoning Corpus (ARC-AGI)
+ * **Architecture**: FloydNet-based global relational reasoning with looped refinement
+ * **Input / Output**: ARC grid-based visual reasoning (query canvas → predicted answer canvas)
+ * **License**: Apache 2.0
+
+ ---
+
+ ## Usage: Inference & Evaluation
+
+ This checkpoint is intended for **research and evaluation use** on ARC-AGI. Full reproduction of reported results requires multi-GPU inference with test-time training.
+
+ ### 1. Download checkpoint
+
+ Download the pretrained checkpoint from Hugging Face:
+
+ ```
+ https://huggingface.co/ocxlabs/FloydARC
+ ```
+
+ Place the downloaded folder anywhere on disk and pass its path via `--ckpt_path`.
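
The download step can also be scripted. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and that the repository ID is `ocxlabs/FloydARC` as listed under Model Details; the helper name `download_checkpoint` and the default target folder `./floydarc_ckpt` are hypothetical, chosen only for illustration.

```python
REPO_ID = "ocxlabs/FloydARC"  # model ID from the Model Details section


def download_checkpoint(local_dir="./floydarc_ckpt"):
    """Download the full checkpoint repository and return its local path.

    `local_dir` is a hypothetical default; any folder on disk works.
    """
    # Imported lazily so this module can be loaded without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

The returned path is what would be passed to `--ckpt_path`, for example `ckpt = download_checkpoint()`.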
+
+ ---
+
+ ### 2. Prepare ARC evaluation data
+
+ Place the original ARC JSON files under `rawdata/`, then preprocess:
+
+ ```bash
+ python -m scripts.process_data \
+   --input_dir ./rawdata/ARC-AGI-1_evaluation/ \
+   --output_dir ./preprocessed/arc1 \
+   --split test
+ ```
+
+ Repeat with `ARC-AGI-2_evaluation` for ARC-AGI-2.
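
To avoid retyping the command for each benchmark, the two preprocessing runs can be driven from a small script. This is a sketch under assumptions: the flags are copied from the command above, but the ARC-AGI-2 output folder `./preprocessed/arc2` is a guess by analogy with `arc1`, and the helper names `build_preprocess_cmd` and `preprocess_all` are hypothetical.

```python
import subprocess
import sys

# Subset key -> raw data folder name. Folder names come from the README;
# the "arc2" output location used below is an assumption.
SUBSETS = {
    "arc1": "ARC-AGI-1_evaluation",
    "arc2": "ARC-AGI-2_evaluation",
}


def build_preprocess_cmd(subset, raw_root="./rawdata", out_root="./preprocessed"):
    """Build the scripts.process_data command line for one subset."""
    return [
        sys.executable, "-m", "scripts.process_data",
        "--input_dir", f"{raw_root}/{SUBSETS[subset]}/",
        "--output_dir", f"{out_root}/{subset}",
        "--split", "test",
    ]


def preprocess_all(dry_run=True):
    """Print (dry run) or execute the preprocessing command for every subset."""
    for subset in SUBSETS:
        cmd = build_preprocess_cmd(subset)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if the script fails
```

Calling `preprocess_all()` prints both commands without running them; `preprocess_all(dry_run=False)` executes them from the repository root.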
+
+ ---
+
+ ### 3. Run inference with Test-Time Training (recommended)
+
+ ```bash
+ python -m scripts.TTT \
+   --ckpt_path /path/to/floydarc_ckpt \
+   --subset arc1 \
+   --output_dir ./output/TTT_results
+ ```
+
+ Notes:
+
+ * Default configuration uses **8 GPUs on a single node**
+ * LoRA-based TTT is enabled by default and recommended
+ * For ARC-AGI-2, set `--subset arc2`
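
For readers unfamiliar with LoRA, the note above refers to low-rank adaptation: the base weights stay frozen and only a small low-rank update is trained at test time. The sketch below illustrates that generic idea in plain Python. It is not FloydARC's implementation; the class name `LoRALinear`, the rank, and the `alpha / rank` scaling are illustrative assumptions.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Frozen weight W (out x in) plus a trainable low-rank update B @ A."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(W), len(W[0])
        self.W = W  # frozen base weight, never updated during adaptation
        # A (rank x in) starts with small random values.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        # B (out x rank) starts at zero, so the layer initially equals the base layer.
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                    # W x
        delta = matvec(self.B, matvec(self.A, x))   # B (A x)
        return [b + self.scale * d for b, d in zip(base, delta)]
```

Because `B` is zero at initialization, adaptation starts from the pretrained behavior, and only `A` and `B` would need to be trained and stored.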
+
+ ---
+
+ ### 4. Ensembling & visualization
+
+ For reproducible evaluation and qualitative inspection:
+
+ ```bash
+ python -m scripts.analyze \
+   --result-folder ./output/TTT_results \
+   --subset arc1 \
+   --out-html output/arc1_results.html
+ ```
+
+ Multiple result folders can be passed to enable max-voting ensembling.
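
Max-voting over candidate grids can be illustrated as follows. This is a minimal sketch of generic majority voting, assuming each run contributes one predicted grid as a list of integer rows; it does not reproduce the logic of `scripts.analyze`, and the function name `max_vote` is hypothetical.

```python
from collections import Counter


def max_vote(predictions):
    """Return the grid predicted most often; ties go to the first one seen."""
    # Lists are not hashable, so convert each grid to nested tuples for counting.
    keys = [tuple(tuple(row) for row in grid) for grid in predictions]
    winner, _count = Counter(keys).most_common(1)[0]
    return [list(row) for row in winner]
```

For example, `max_vote([[[1, 2]], [[3, 4]], [[1, 2]]])` returns `[[1, 2]]`, since that grid appears in two of the three runs.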
+