Apply for a GPU community grant: Academic project
Talkfinder-Astro is an open-science semantic search and discovery platform for astrophysics seminar talks, lectures, and conference presentations. The project indexes thousands of publicly available talks with transcripts and metadata, enabling researchers, students, and educators to search across spoken scientific content using hybrid retrieval (BM25 + embeddings) followed by a reranking step.
The platform is fully open, research-oriented, and designed to improve accessibility to scientific knowledge that is otherwise fragmented across institutes and video platforms. It supports exploratory queries, topic discovery, and long-tail scientific questions that are poorly served by traditional keyword search.
Why a GPU is needed:
The main performance bottleneck is the neural reranking stage, which evaluates query-document pairs using a transformer-based cross-encoder. On CPU-only infrastructure, even a highly constrained configuration results in ~10 seconds latency per query on Hugging Face Spaces. This forces very small candidate sets, directly degrading result quality and user experience.
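The retrieve-then-rerank flow described above can be sketched as follows. This is a toy illustration, not the project's actual code: both scoring stages are stubbed with trivial functions (the real system uses BM25 plus embeddings for retrieval and a transformer cross-encoder for reranking), and all names are illustrative. The point is the shape of the pipeline: a cheap first stage selects a small candidate pool, and the expensive second stage only sees that pool, which is why pool size trades latency against quality.

```python
def bm25_score(query: str, doc: str) -> float:
    # Toy lexical score: fraction of query terms present in the document.
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / max(len(terms), 1)

def cross_encoder_score(query: str, doc: str) -> float:
    # Stand-in for the transformer cross-encoder; in the real system this
    # is the expensive stage that dominates CPU latency.
    return bm25_score(query, doc)  # placeholder scoring

def search(query, corpus, pool_size=3, top_k=2):
    # Stage 1: cheap retrieval selects a small candidate pool.
    pool = sorted(corpus, key=lambda d: bm25_score(query, d), reverse=True)[:pool_size]
    # Stage 2: the expensive cross-encoder reranks only the pool.
    return sorted(pool, key=lambda d: cross_encoder_score(query, d), reverse=True)[:top_k]
```

Shrinking `pool_size` to fit a CPU latency budget is exactly the compromise described above: the reranker never sees candidates the first stage ranked lower, so recall suffers.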
Access to a community GPU would:
- Reduce query latency by an order of magnitude
- Allow larger candidate pools for reranking, improving retrieval quality
- Enable fair benchmarking and optimization of retrieval strategies
- Make the public demo responsive enough for real community use
Without GPU access, meaningful interaction and further open development of the project are severely limited.
Open science impact:
- Public, free access to scientific talks and knowledge
- Improves discoverability of long-form academic content
- Supports students, early-career researchers, and interdisciplinary exploration
- All code, data processing pipelines, and models will be openly documented
The project is non-commercial, community-driven, and aligned with Hugging Face's mission to support open research and accessible AI.
Hi @davidhendriks , we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.
If you can, we ask that you upgrade to Enterprise to enjoy higher ZeroGPU quota and other features like Dev Mode, Private Storage, and more: hf.co/enterprise
Hi @hysts ,
Thank you for assigning ZeroGPU to this Space, it will make for a much better experience!
I've tried to set up the codebase to use ZeroGPU, but I keep getting the following error. I first see a popup of HF allocating a GPU (initially seemingly successfully), and then the error occurs:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 143, in worker_init
torch.init(nvidia_uuid)
File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/patching.py", line 414, in init
torch.Tensor([0]).cuda()
File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 410, in _lazy_init
torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
Currently I am unsure whether this is an infrastructure issue on the HF side, or whether I have somehow misconfigured my codebase.
This error is caused by calling a function that uses CUDA outside a function decorated with @spaces.GPU.
On ZeroGPU, any code that touches CUDA must run inside a @spaces.GPU-decorated function. Running CUDA-dependent code outside the decorator doesn't always raise an immediate error (it can appear to succeed via CPU fallback), but it irreversibly corrupts the process-level CUDA state. Once corrupted, all subsequent @spaces.GPU calls will fail with this error.
Since this repo uses a private dataset, I wasn't able to run the app directly to debug. However, I had Claude Code review the codebase and it flagged the SentenceTransformer.encode() call as suspicious: it runs outside @spaces.GPU but internally touches CUDA APIs. I extracted that part into a minimal test Space and confirmed the issue reproduces, so I'd suggest wrapping it in a @spaces.GPU-decorated function. There may be other places in the code with the same pattern, but the same fix should apply.
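A minimal sketch of the suggested wrapping, assuming a hypothetical helper named `embed_queries` (the name and signature are illustrative, not taken from the actual repo). The try/except fallback defines a no-op decorator so the same module also runs locally, where the `spaces` package is unavailable:

```python
try:
    import spaces
    gpu = spaces.GPU
except ImportError:
    # Local / CPU-only fallback: a no-op decorator with the same call shapes.
    def gpu(fn=None, **kwargs):
        if fn is None:
            return lambda f: f  # used as @gpu(duration=...)
        return fn               # used as bare @gpu

@gpu
def embed_queries(model, queries):
    # All CUDA-touching work (model.encode) happens inside the decorated
    # function, as ZeroGPU requires.
    return model.encode(queries, convert_to_numpy=True)
```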
Okay, it seems that wrapping those pieces in semantic.py did resolve the issue. I can now execute my code, but I am finding the following.
Firstly, ZeroGPU-accelerated execution didn't seem to actually speed up the reranking step. I suppose that was because on that first run it had to load a bunch of things:
entered rerank_results_gpu
torch.cuda.is_available() = True
torch.cuda.device_count() = 1
[rerank.py:73 - rerank_results_gpu ] 2026-04-08 17:29:40,571: Loading reranker model on GPU...
config.json: 100%|██████████| 799/799 [00:00<00:00, 5.31MB/s]
tokenizer_config.json: 100%|██████████| 443/443 [00:00<00:00, 2.96MB/s]
sentencepiece.bpe.model: 100%|██████████| 5.07M/5.07M [00:00<00:00, 12.2MB/s]
tokenizer.json: 100%|██████████| 17.1M/17.1M [00:00<00:00, 42.0MB/s]
special_tokens_map.json: 100%|██████████| 279/279 [00:00<00:00, 1.32MB/s]
Exception ignored in: <function BaseEventLoop.__del__ at 0x7f529a436560>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 690, in __del__
self.close()
File "/usr/local/lib/python3.10/asyncio/unix_events.py", line 68, in close
super().close()
File "/usr/local/lib/python3.10/asyncio/selector_events.py", line 87, in close
self._close_self_pipe()
File "/usr/local/lib/python3.10/asyncio/selector_events.py", line 94, in _close_self_pipe
self._remove_reader(self._ssock.fileno())
File "/usr/local/lib/python3.10/asyncio/selector_events.py", line 272, in _remove_reader
key = self._selector.get_key(fd)
File "/usr/local/lib/python3.10/selectors.py", line 191, in get_key
return mapping[fileobj]
File "/usr/local/lib/python3.10/selectors.py", line 72, in __getitem__
fd = self._selector._fileobj_lookup(fileobj)
File "/usr/local/lib/python3.10/selectors.py", line 226, in _fileobj_lookup
return _fileobj_to_fd(fileobj)
File "/usr/local/lib/python3.10/selectors.py", line 42, in _fileobj_to_fd
raise ValueError("Invalid file descriptor: {}".format(fd))
ValueError: Invalid file descriptor: -1
model.safetensors: 48%|█████     | 537M/1.11G [00:02<00:01, 536MB/s]
model.safetensors: 100%|██████████| 1.11G/1.11G [00:02<00:00, 504MB/s]
Loading weights: 100%|██████████| 201/201 [00:00<00:00, 27366.59it/s]
XLMRobertaForSequenceClassification LOAD REPORT from: BAAI/bge-reranker-base
Key | Status | |
--------------------------------+------------+--+-
roberta.embeddings.position_ids | UNEXPECTED | |
Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.
When I wanted to give it another go, sadly another error popped up, about the quota.
The 'try again in' suggestion doesn't really indicate the reset time, I think, but more importantly:
- Is it expected that one query burns through my entire ZeroGPU quota?
- What is the reset time for the quota on my allocation?
- Is there a way to have a fallback system? As it stands, the entire query section of the app is blocked off and doesn't work at all.
Is it expected that one query burns through my entire ZeroGPU quota?
The issue is in how the models are loaded in your code. Currently, SentenceTransformer and the reranker are loaded lazily inside @spaces.GPU-decorated functions. This is an anti-pattern for ZeroGPU. I previously mentioned that CUDA-related code must be inside @spaces.GPU, but that only applies to the inference step. Model instantiation should be done at global scope. The spaces package monkey-patches torch so that ZeroGPU's backend automatically moves models to GPU only when a @spaces.GPU-decorated function runs. If you don't load at global scope, this mechanism can't work properly and you end up with unnecessary overhead.
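The recommended pattern can be sketched as follows. The stub class stands in for a real `CrossEncoder("BAAI/bge-reranker-base")` load (the model id matches the logs above) so the sketch stays runnable without downloading weights, the function names are illustrative, and the `duration` value is an assumption to tune per workload; the try/except fallback again covers local runs without the `spaces` package:

```python
try:
    import spaces
    GPU = spaces.GPU
except ImportError:
    def GPU(*args, **kwargs):
        # No-op fallback supporting both @GPU and @GPU(duration=...) forms.
        if args and callable(args[0]):
            return args[0]
        return lambda f: f

# In the real Space this line would be, at global scope:
#   from sentence_transformers import CrossEncoder
#   reranker = CrossEncoder("BAAI/bge-reranker-base")
# A stub keeps the sketch self-contained:
class _StubReranker:
    def predict(self, pairs):
        return [float(len(doc)) for _query, doc in pairs]

# Instantiated once at module import time, NOT inside the GPU function,
# so ZeroGPU's patched torch can manage the weights for you.
reranker = _StubReranker()

@GPU(duration=60)  # duration is an assumption; tune to your workload
def rerank(query, candidates):
    # Only the forward pass runs inside the @spaces.GPU-decorated function.
    return reranker.predict([(query, c) for c in candidates])
```

With the model loaded at import, each decorated call pays only for inference rather than re-downloading and re-instantiating the model, which is what was consuming the quota in a single query.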
What is the reset time for this quota for my allocation?
Quota resets every 24 hours (see ZeroGPU documentation).
