Training an AI on Zurich Timetables: When Models Invent Trains That Do Not Exist
I gave an AI a slice of Zurich’s Sunday train schedules and asked it to become a pocket timetable assistant.
Armed with LoRA, GGUF, and a GPU that sounds like a jet engine when training, I set out to teach a machine when trains really leave Zürich HB.
What happened? The model learned to sound exactly like a timetable… and then it invented trains that do not exist.
Aim
The goal of this experiment was to fine-tune a small language model on Zurich train schedules and run it locally.
I wanted to see if a lightweight AI could serve as a timetable assistant, answering natural questions like “When is the next train from Zürich HB to Oerlikon after 08:00 on Sunday?”
Beyond the functionality, the aim was to explore the full pipeline: fine-tuning with LoRA, compressing into GGUF, and serving the result with Ollama.
Data
The data came from the SBB Open Data GTFS feed.
To keep things manageable, I filtered to Zurich City stations and Sundays only.
This subset was small, irregular, and perfect for a first iteration.
It gave me a few thousand departure–arrival pairs to train on.
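For reference, the filtering step can be sketched in a few lines of pandas. The file and column names follow the standard GTFS spec; the folder path and the station-name filter below are illustrative, not my exact script.

```python
# Sketch: filter the SBB GTFS feed to Zurich city stations and Sunday services.
# File/column names follow the GTFS spec; the path and name filter are illustrative.
import pandas as pd

GTFS_DIR = "gtfs_sbb"  # placeholder folder holding the unzipped SBB feed

stops = pd.read_csv(f"{GTFS_DIR}/stops.txt")
stop_times = pd.read_csv(f"{GTFS_DIR}/stop_times.txt")
trips = pd.read_csv(f"{GTFS_DIR}/trips.txt")
calendar = pd.read_csv(f"{GTFS_DIR}/calendar.txt")

# Keep only stops whose name starts with "Zürich"
zurich_stops = stops[stops["stop_name"].str.startswith("Zürich")]

# Keep only services that run on Sundays
sunday_services = calendar[calendar["sunday"] == 1]["service_id"]
sunday_trips = trips[trips["service_id"].isin(sunday_services)]

# Departure/arrival events at Zurich stops on Sunday trips
subset = stop_times[
    stop_times["trip_id"].isin(sunday_trips["trip_id"])
    & stop_times["stop_id"].isin(zurich_stops["stop_id"])
].merge(zurich_stops[["stop_id", "stop_name"]], on="stop_id")

print(len(subset), "stop events in the Sunday Zurich subset")
```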
Preparing Data
The data was reshaped into instruction and answer pairs.
Each entry asked a question in natural language and provided a structured timetable answer.
Example
- Q: When is the next train from Zürich HB to Oerlikon after 08:00 on Sunday?
- A: 08:07 Zürich HB → Oerlikon 08:22
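Turning the filtered GTFS rows into pairs like this was a small scripting step. The sketch below shows the general idea; the JSONL field names and the pairing logic are simplified placeholders rather than the exact code.

```python
# Sketch: reshape departure/arrival pairs into instruction-answer JSONL for fine-tuning.
# The "instruction"/"output" field names and the pairing logic are illustrative.
import json

def make_example(origin, destination, dep_time, arr_time, day="Sunday"):
    """One natural-language question plus a structured timetable answer."""
    question = (
        f"When is the next train from {origin} to {destination} "
        f"after {dep_time[:5]} on {day}?"
    )
    answer = f"{dep_time[:5]} {origin} → {destination} {arr_time[:5]}"
    return {"instruction": question, "output": answer}

# `pairs` is assumed to hold (origin, destination, departure, arrival) tuples
# built from the filtered GTFS subset above.
pairs = [("Zürich HB", "Oerlikon", "08:07:00", "08:22:00")]

with open("timetable_sft.jsonl", "w", encoding="utf-8") as f:
    for origin, dest, dep, arr in pairs:
        f.write(json.dumps(make_example(origin, dest, dep, arr), ensure_ascii=False) + "\n")
```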
LoRA Fine-Tuning
Base model: Microsoft Phi-2 (2.7 billion parameters, MIT license).
LoRA configuration
- Rank r = 8
- Alpha = 16
- Dropout = 0.1
- Target modules: q_proj, v_proj
Training parameters
- Batch size = 4
- Gradient accumulation = 2 (effective batch size 8)
- Learning rate = 2e-4
- Max steps = 1000 (around 40 minutes on an RTX 4070 Ti Super with 16 GB VRAM)
Only about 2.6 million parameters (0.09 percent of the model) were updated.
This efficiency is the key strength of LoRA: instead of retraining billions of weights, it adapts only a thin adapter layer.
The model quickly picked up the timetable format.
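For completeness, the setup above maps onto Hugging Face transformers and peft roughly as in the sketch below. This is a minimal reconstruction: the dataset plumbing and the actual trainer loop are left out.

```python
# Sketch: the LoRA configuration described above, using transformers + peft.
# Dataset loading and the Trainer/SFTTrainer call are omitted for brevity.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model

base_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)

lora_config = LoraConfig(
    r=8,                                  # rank
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # query and value projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# With Phi-2 (hidden size 2560, 32 layers), rank-8 adapters on two projections give
# 8 * (2560 + 2560) * 2 * 32 ≈ 2.6M trainable parameters, about 0.09% of 2.7B.
model.print_trainable_parameters()

args = TrainingArguments(
    output_dir="phi2-zurich-lora",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,   # effective batch size 8
    learning_rate=2e-4,
    max_steps=1000,
    fp16=True,
    logging_steps=50,
)
# `model`, `args`, and the JSONL dataset are then handed to a trainer.
```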
GGUF Quantization
After merging the LoRA adapter into the base model, the checkpoint was converted to GGUF.
I chose Q4_K_M quantization, a 4-bit scheme that balances file size and output quality.
This reduced the model from roughly 10 GB for the merged full-precision checkpoint to around 2 GB as GGUF Q4.
Quantization made the model portable to different machines.
While it introduced slightly more noise, the structure of the output remained intact.
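The merge itself is a few lines with peft; the conversion then goes through llama.cpp tooling, whose script and binary names shift between versions, so treat the commands in the comments as approximate.

```python
# Sketch: merge the LoRA adapter back into the base weights before GGUF conversion.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Loaded in full precision (fp32) by default, which is why the merged
# checkpoint lands around 10 GB on disk.
base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
merged = PeftModel.from_pretrained(base, "phi2-zurich-lora").merge_and_unload()
merged.save_pretrained("phi2-zurich-merged")
AutoTokenizer.from_pretrained("microsoft/phi-2").save_pretrained("phi2-zurich-merged")

# The merged checkpoint is then converted and quantized with llama.cpp tooling,
# roughly (exact script/binary names depend on the llama.cpp version):
#   python convert_hf_to_gguf.py phi2-zurich-merged --outfile phi2-zurich-f16.gguf
#   ./llama-quantize phi2-zurich-f16.gguf phi2-zurich-q4_k_m.gguf Q4_K_M
```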
Run with Ollama
With Ollama, the quantized model could be served locally. I used conservative sampling parameters (a minimal query sketch follows the list):
- Temperature = 0.6
- Top_p = 0.9
- Repeat penalty = 1.2
- Num_predict = 120
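Against Ollama's local HTTP API, a query with those parameters looks roughly like this; the model name is a placeholder for whatever name the Modelfile was registered under.

```python
# Sketch: query the locally served model through Ollama's HTTP API
# with the conservative sampling parameters listed above.
# "zurich-timetable" is a placeholder for the name used in `ollama create`.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "zurich-timetable",
        "prompt": "When is the next train from Zürich HB to Oerlikon after 08:00 on Sunday?",
        "stream": False,
        "options": {
            "temperature": 0.6,
            "top_p": 0.9,
            "repeat_penalty": 1.2,
            "num_predict": 120,
        },
    },
    timeout=120,
)
print(response.json()["response"])
```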
At first, the results were strong. The model returned rows in the expected format.
- 08:07 Zürich HB → Oerlikon 08:22
- 08:09 Zürich HB → Oerlikon 08:23
- 08:15 Zürich HB → Oerlikon 08:36
But with longer generations, it drifted into hallucination, inventing impossible times such as:
- 25:99 Zürich HB → Oerlikon 26:10
- 26:22 Zürich HB → Oerlikon 27:05
The model had learned the style of timetable answers, but not the reality of the schedules.
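One thing this makes obvious: such rows are trivial to catch mechanically. A minimal sanity check on the HH:MM fields, sketched below, already rejects them.

```python
# Sketch: sanity-check a generated timetable row; flags hallucinated times like 25:99.
# (GTFS itself allows hours >= 24 for after-midnight runs, but times shown to a
# user should stay within 00-23 / 00-59.)
import re

ROW = re.compile(r"^(\d{2}):(\d{2}) .+ → .+ (\d{2}):(\d{2})$")

def is_plausible(row: str) -> bool:
    m = ROW.match(row.strip())
    if not m:
        return False
    dep_h, dep_m, arr_h, arr_m = (int(g) for g in m.groups())
    return dep_h < 24 and arr_h < 24 and dep_m < 60 and arr_m < 60

print(is_plausible("08:07 Zürich HB → Oerlikon 08:22"))  # True
print(is_plausible("25:99 Zürich HB → Oerlikon 26:10"))  # False
```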
Deployment and Performance
The most striking part of this experiment is portability.
The same fine-tuned GGUF model that ran on Linux with an RTX 4070 Ti Super also ran smoothly on a MacBook Air M4 with 16 GB of RAM, using Ollama and Apple’s Metal backend.
Performance metrics from the MacBook Air run
- Total duration: about 2.9 seconds
- Load duration: 33 ms
- Prompt evaluation rate: around 151 tokens per second
- Generation rate: around 45 tokens per second
This is interactive performance on a fanless ultraportable.
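Those figures come from the timing fields Ollama reports with each response, so they are easy to recompute; a small sketch follows (durations are returned in nanoseconds, and the prompt fields may be absent when the prompt is cached).

```python
# Sketch: derive the tokens-per-second figures from the timing fields Ollama
# returns alongside each /api/generate response (all durations in nanoseconds).
# "zurich-timetable" is again a placeholder model name.
import requests

stats = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "zurich-timetable",
        "prompt": "When is the next train from Zürich HB to Oerlikon after 08:00 on Sunday?",
        "stream": False,
    },
    timeout=120,
).json()

prompt_tps = stats["prompt_eval_count"] / stats["prompt_eval_duration"] * 1e9
gen_tps = stats["eval_count"] / stats["eval_duration"] * 1e9
print(
    f"load: {stats['load_duration'] / 1e6:.0f} ms, "
    f"total: {stats['total_duration'] / 1e9:.1f} s, "
    f"prompt eval: {prompt_tps:.0f} tok/s, generation: {gen_tps:.0f} tok/s"
)
```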
The quantized model still represents a 2.7 billion parameter base, but thanks to LoRA only 2.6 million parameters were actually trained and merged.
On the MacBook Air, all parameters are loaded in 4-bit precision, making the model efficient enough to serve locally.
Output on MacBook Air M4
- 08:00Z Zürich HB → Oerlikon 09:02 (91-13Y-j25-1)
- 08:07Z Zürich HB → Oerlikon 08:28 (91-15B-I-j25-1)
- 08:17Z Zürich HB → Oerlikon 08:36 (91-16D-T-j24-2)
- 09:00Z Zürich HB → Oerlikon 09:07 (91-3C-Y-j15-4)
This shows that the style transfer worked, although hallucinations remain.
I also plan to test deployment on a Jetson Nano 8 GB.
Running a 2.7B parameter model on the Nano will require llama.cpp built with CUDA for ARM and aggressive quantization (Q4 or even Q2).
It will be slower than Mac or PC, but still functional. This would prove the principle: train once, run anywhere.
Observation
The model mastered the format but not the facts.
It produced realistic looking timetable rows but invented times and continued generating far beyond the expected number of results.
This demonstrates a core limitation of generative AI: it predicts plausible sequences, but it does not guarantee factual correctness.
Conclusion
This project confirmed that with LoRA and GGUF, it is possible to fine-tune, compress, and run a model locally on consumer hardware.
From the RTX 4070 Ti Super to the MacBook Air M4, and soon a Jetson Nano, the same model runs seamlessly.
The experiment also highlighted the importance of matching the tool to the task. LoRA is excellent at style, but timetables require factual recall, better suited to retrieval methods.
Next Steps
From a gaming GPU to a MacBook Air, and soon a Jetson Nano, the same 2.7B-parameter brain now runs wherever I take it.
Next on the list: the Jetson Nano deployment, and grounding the answers in the real GTFS data with a retrieval layer so the model stops inventing trains.
It still hallucinates, but that is part of the charm: this project is not just about trains, it is about showing how far a hobbyist can go with modern AI tools.
Train once, run anywhere.
Experiment Summary
- Base model: Microsoft Phi-2 (2.7B parameters, MIT license)
- Fine-tuning: LoRA adapters
- Trainable parameters: about 2.6M (0.09 percent of 2.7B)
- Rank r = 8, alpha = 16, dropout = 0.1
- Target modules = q_proj, v_proj
Training setup
- Device: NVIDIA RTX 4070 Ti Super (16 GB VRAM)
- Batch size = 4
- Gradient accumulation = 2 (effective batch size 8)
- Learning rate = 2e-4
- Max steps = 1000 (about 40 minutes)
Quantization
- Format: GGUF
- Type: Q4_K_M (4-bit, balanced size and quality)
- Size: around 2 GB (down from about 10 GB full precision)
Ollama run parameters
- Temperature = 0.6
- Top_p = 0.9
- Repeat penalty = 1.2
- Num_predict = 120
Performance
MacBook Air M4 16 GB with Metal backend
- Load: 33 ms
- Prompt evaluation: about 151 tokens per second
- Generation: about 45 tokens per second
- Total runtime: about 2.9 seconds per query
Jetson Nano 8 GB (planned test)
- Requires llama.cpp built with CUDA for ARM
- Needs Q4 or Q2 quantization
- Expected slower generation but functional
Final model size
- 2.7B total parameters (base)
- LoRA trained about 2.6M parameters merged into final checkpoint
- Runs fully quantized (Q4) on consumer devices