Make it so…
Make It So! represents a pioneering exploration into the convergence of artificial intelligence and augmented reality, pushing the boundaries of how we interact with digital content in physical spaces. This iOS application transforms spoken words into tangible digital artifacts, marking a significant step towards a future where AR glasses become as ubiquitous as smartphones.
At its core, this project investigates a profound question: what happens when we grant users the power to manifest their imagination through voice alone? By treating voice commands as modern incantations, we're not just building an app; we're exploring new paradigms of human-computer interaction where the line between thought and creation becomes increasingly blurred.
The Technical Foundation
Our implementation leverages cutting-edge AI models across multiple domains:
- On-device Whisper-tiny inference provides responsive speech-to-text transcription, keeping audio on the phone to preserve privacy and avoid a network round trip
- A custom-tuned Stable Diffusion instance hosted on Hugging Face turns the transcribed description into vivid imagery
- Meshy's image-to-3D conversion API turns that image into a textured 3D model, bridging the gap between 2D imagery and spatially aware augmented reality objects
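The three stages above chain naturally with Swift's async/await. The sketch below is illustrative only: the endpoint URLs, model ID, auth scheme, and JSON fields are assumptions for demonstration, not the app's actual configuration (Meshy's request/response schema in particular is a placeholder).

```swift
import Foundation

// Hypothetical sketch of the voice -> image -> 3D pipeline.
// Stage 1 (Whisper-tiny transcription) runs on-device; this type
// takes the resulting text prompt as its input.
struct CreationPipeline {
    let hfToken: String   // Hugging Face API token (assumed auth scheme)
    let meshyKey: String  // Meshy API key (assumed auth scheme)

    func makeItSo(prompt: String) async throws -> URL {
        let imageData = try await generateImage(from: prompt)
        return try await convertTo3D(imageData: imageData)
    }

    // Stage 2: text -> image via a hosted Stable Diffusion endpoint.
    // Model ID and endpoint are illustrative.
    private func generateImage(from prompt: String) async throws -> Data {
        var request = URLRequest(url: URL(string:
            "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-2")!)
        request.httpMethod = "POST"
        request.setValue("Bearer \(hfToken)", forHTTPHeaderField: "Authorization")
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        request.httpBody = try JSONEncoder().encode(["inputs": prompt])
        let (data, _) = try await URLSession.shared.data(for: request)
        return data  // raw image bytes
    }

    // Stage 3: image -> 3D model. Meshy's flow is asynchronous
    // (submit a task, poll for completion); the path and response
    // shape here are placeholders, not Meshy's documented schema.
    private func convertTo3D(imageData: Data) async throws -> URL {
        var request = URLRequest(url: URL(string: "https://api.meshy.ai/v1/image-to-3d")!)
        request.httpMethod = "POST"
        request.setValue("Bearer \(meshyKey)", forHTTPHeaderField: "Authorization")
        request.httpBody = imageData
        let (data, _) = try await URLSession.shared.data(for: request)
        // Polling for task completion omitted for brevity.
        struct TaskResponse: Decodable { let modelUrl: URL }
        return try JSONDecoder().decode(TaskResponse.self, from: data).modelUrl
    }
}
```

The returned model URL can then be loaded into an ARKit/RealityKit scene for placement.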
While our current pipeline takes approximately two minutes to process a request, this experimental implementation demonstrates the potential of combining generative AI with spatial computing. As these technologies mature and processing times fall, we envision a world where digital creation becomes as natural as conversation, and our physical spaces become infinite canvases for collective creativity and expression.
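With a roughly two-minute turnaround, generation is best treated as a long-running job rather than a blocking call. A generic polling helper like the one below (an illustrative sketch, not code from the app) keeps the UI responsive while waiting for a result:

```swift
import Foundation

// Retries `check` at a fixed interval until it yields a value or the
// deadline passes. The default interval and timeout are assumptions
// sized to the ~2-minute pipeline described above.
func pollUntilReady<T>(every interval: TimeInterval = 5,
                       timeout: TimeInterval = 180,
                       check: () async throws -> T?) async throws -> T {
    let deadline = Date().addingTimeInterval(timeout)
    while Date() < deadline {
        if let result = try await check() { return result }
        try await Task.sleep(nanoseconds: UInt64(interval * 1_000_000_000))
    }
    throw URLError(.timedOut)
}
```

Because the helper is async, the caller can show placeholder AR content or progress feedback while the task runs in the background.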
Voice icon by Slash from Noun Project (CC BY 3.0)