Evoneural MVP – Local Mesh & Skybox
Localhost MVP for text → 3D mesh and text → 360° skybox using local models (no hosted APIs).
- Mesh: Text → image (Stable Diffusion) → 3D mesh (TripoSR). Output: .obj or .glb.
- Skybox: Text → 2:1 equirectangular image (Stable Diffusion). Optional seamless edge check.
Default model: runwayml/stable-diffusion-v1-5 (no Hugging Face login required; first run downloads ~4GB).
Prerequisites
- Python 3.10 (recommended) – python.org
- NVIDIA GPU with CUDA (recommended; CPU is slower)
- Git (for cloning TripoSR; mesh only)
No Conda? Use venv (built into Python) – steps below.
1. Environment
Option A: venv + pip (no Conda)
From PowerShell (project folder is evoneural):
cd D:\project\evoneural
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
- CPU only: use pip install torch torchvision (no --index-url).
- CUDA 12.x: use cu121 instead of cu118.
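After installing, it is worth confirming that PyTorch can actually see the GPU before downloading any model weights. The snippet below is a small sanity check (not part of the project's scripts); it reports rather than fails when torch or CUDA is missing:

```python
# Quick sanity check that PyTorch is installed and can see the GPU.
# Safe to run anywhere: reports instead of raising on CPU-only setups.
import importlib.util

def cuda_status() -> str:
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "CUDA not available; generation will fall back to CPU (slower)"

print(cuda_status())
```

If this reports "CUDA not available" on a machine with an NVIDIA GPU, the installed torch wheel likely does not match your CUDA version (see the cu118/cu121 note above).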
Option B: Conda
cd D:\project\evoneural
conda env create -f environment.yml
conda activate evoneural-mvp
If you use CPU-only or a different CUDA version, edit environment.yml (e.g. remove pytorch-cuda=11.8 or set pytorch-cuda=12.1).
2. TripoSR (for mesh)
Mesh generation needs the TripoSR repo and its dependencies.
cd D:\project\evoneural
git clone https://github.com/VAST-AI-Research/TripoSR.git TripoSR
pip install -r TripoSR/requirements.txt
On Windows, if torchmcubes fails to build, see the TripoSR README (match your installed CUDA version, then reinstall torchmcubes).
2b. Stable Diffusion model (Hugging Face)
If you see "Cannot load model ... model is not cached locally and an error occurred while trying to fetch metadata", the app cannot reach Hugging Face. Use one of these:
Option 1 – Log in (uses cached token)
From a terminal with internet:
huggingface-cli login
Paste a token from huggingface.co/settings/tokens (read access is enough). Then run the app again.
Option 2 – Set token in env
Create a token at huggingface.co/settings/tokens, then:
$env:HF_TOKEN = "hf_xxxxxxxx"
streamlit run app.py
Option 3 – Download model once, then use offline
On a machine that can reach Hugging Face:
cd D:\project\evoneural
.venv\Scripts\Activate.ps1
python -m scripts.download_sd_model
Then set the path and run the app (no Hugging Face needed):
$env:SD_MODEL_PATH = "D:\project\evoneural\weights\stable-diffusion-2-1-base"
streamlit run app.py
How it works
- Skybox tab: You enter a text prompt → the app loads Stable Diffusion (from cache or Hugging Face) → generates a 2:1 image → saves to outputs/ and shows a download button. Optional "seamless" check compares left/right edges.
- Mesh tab: You enter a prompt (or upload an image) → the app generates an image with SD (if needed) → runs TripoSR on that image → outputs a .obj or .glb to outputs/ (requires the TripoSR repo cloned in ./TripoSR).
- Model loading: The app first tries a local folder (SD_MODEL_PATH, or weights/stable-diffusion-2-1-base if complete). If none, it loads runwayml/stable-diffusion-v1-5 from the Hub (first run downloads the model; later runs use the cache). No token is needed unless your network restricts Hugging Face.
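The model-loading order above can be sketched in a few lines. This is an illustration of the logic described in this README, not the app's actual code (function name and the model_index.json completeness check are assumptions):

```python
# Sketch of the model-resolution order described above (hypothetical helper;
# the app's real implementation may differ).
import os
from pathlib import Path

def resolve_sd_model(project_root: str = ".", env=os.environ) -> str:
    """Return a local model folder if available, else the Hub model id."""
    # 1. Explicit override via the SD_MODEL_PATH environment variable
    env_path = env.get("SD_MODEL_PATH")
    if env_path and Path(env_path).is_dir():
        return env_path
    # 2. Default local download location (see step 2b, Option 3)
    local = Path(project_root) / "weights" / "stable-diffusion-2-1-base"
    if (local / "model_index.json").exists():  # crude "is complete" check
        return str(local)
    # 3. Fall back to the Hub (first run downloads ~4 GB)
    return "runwayml/stable-diffusion-v1-5"
```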
3. Run the app
From the project root (with venv activated):
cd D:\project\evoneural
.venv\Scripts\Activate.ps1
streamlit run app.py
Open http://localhost:8501.
- Text → 3D Mesh: Enter a prompt (or upload an image). First run downloads the Stable Diffusion model and TripoSR weights.
- Text → Skybox: Enter a prompt; the image is 2:1 (e.g. 1024×512). Use "Run seamless edge check" to compare left/right edges.
Outputs are under outputs/. Use the download buttons to save mesh (.glb/.obj) and skybox (.png).
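The "seamless edge check" compares the left-most and right-most pixel columns of the equirectangular image, since those columns meet when the skybox wraps around. A minimal NumPy version of that idea (a sketch; the app's scripts/check_seamless.py may use different metrics or thresholds):

```python
# Minimal seamless-edge check for an equirectangular image (H, W, 3) array.
# The left and right edges meet when the skybox wraps, so they should match.
import numpy as np

def edge_difference(img: np.ndarray) -> float:
    """Mean absolute difference between the first and last pixel columns (0-255 scale)."""
    left = img[:, 0, :].astype(np.float64)
    right = img[:, -1, :].astype(np.float64)
    return float(np.abs(left - right).mean())

def looks_seamless(img: np.ndarray, threshold: float = 10.0) -> bool:
    """True if the wrap-around seam difference is below the (arbitrary) threshold."""
    return edge_difference(img) < threshold
```

The threshold of 10 (out of 255) is an arbitrary choice here; a generated image can still show a visible seam even when the raw pixel difference is small.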
4. Performance
- Skybox: ~6–8 GB VRAM (SD 2.1, 1024×512, FP16). Use 2048×1024 only if you have enough VRAM.
- Mesh: ~6 GB for TripoSR + ~6 GB for SD (text-to-image). Total peak can be ~10–12 GB if both run in the same process.
If you run out of VRAM:
- Use 1024×512 for the skybox.
- Close other GPU apps.
- Consider quantization (e.g. 8-bit) or CPU offload in diffusers (see Optimization).
5. Optimization (if VRAM is exceeded)
- Quantization: use load_in_8bit=True or load_in_4bit=True with bitsandbytes where supported in diffusers.
- Model CPU offload: in diffusers, pipe.enable_sequential_cpu_offload() or pipe.enable_model_cpu_offload() moves parts to the CPU and reduces peak VRAM (slower).
- Smaller resolution: 512×512 for text-to-image; 1024×512 for the skybox.
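Put together, a memory-saving diffusers setup might look like the sketch below. It requires diffusers installed and a GPU machine (the first call downloads the model), so it is shown here for orientation only, not as the app's actual code:

```python
# Sketch: lower-VRAM Stable Diffusion setup with diffusers (not the app's code).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,           # FP16 roughly halves VRAM on CUDA GPUs
)
# Either of these trades speed for a lower peak VRAM:
pipe.enable_model_cpu_offload()          # moves idle sub-models to the CPU
# pipe.enable_sequential_cpu_offload()   # even lower VRAM, much slower
pipe.enable_attention_slicing()          # computes attention in smaller chunks

# 2:1 skybox-style generation at the reduced resolution suggested above
image = pipe("a mountain panorama at sunset", height=512, width=1024).images[0]
image.save("skybox_test.png")
```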
Project layout
evoneural/
├── README.md
├── app.py              # Streamlit UI
├── requirements.txt
├── environment.yml
├── scripts/
│   ├── skybox_generator.py
│   ├── mesh_generator.py
│   ├── text_to_image.py
│   └── check_seamless.py
├── outputs/            # Generated meshes and skybox images
└── TripoSR/            # Clone here (see step 2)
License
See TripoSR and Stable Diffusion model licenses (MIT / Stability). This MVP is for local use and evaluation.