Direct Answer:
Spatial audio for travel filmmakers means recording sound in 360 degrees using ambisonic microphones — capturing a full sphere of audio that can be rotated, decoded, and played back on headphones, speakers, or VR headsets. The standard entry point is a Zoom H3-VR (~$349) paired with a furry windscreen. Record at least 5–10 minutes per location. Edit in Reaper or Logic using the free Facebook 360 Spatial Workstation plugin. Export as first-order ambisonics for YouTube, binaural stereo for everything else.
The Hook
It’s 4:00 AM on the third day of a six-day location shoot in the BC interior. Not glamorous filmmaking — a low-budget documentary that was supposed to wrap in four. I’m the producer, the gaffer, and apparently also the guy who forgot to check whether the ambisonic recorder was actually recording and not just displaying a very convincing blinking light.
We’d placed the H3-VR at ear level on a C-stand at the edge of a fog-covered lake. Forty minutes of what should have been the centerpiece audio for the film — the whole reason we’d driven six hours — stored exactly nowhere because I’d hit the monitor button instead of record. The sound is gone. The fog lifts by 6:30 AM and doesn’t come back.
What we used instead was a 90-second clip I’d grabbed almost accidentally the afternoon before, walking back to the truck, recorder still running in my jacket pocket. Wrong position. Wrong height. Facing the wrong direction. And it was, somehow, the best spatial audio in the film.
That story has two lessons. One is obvious: check your levels, check your recording status, then check again. The other is less obvious and more useful: spatial audio is more forgiving than you think, and more unforgiving than you expect — sometimes in the same take.
Affiliate note: PeekAtThis participates in the Amazon Associates program. If you buy gear through links here, we get a small cut. It doesn’t change what we recommend — gear that doesn’t work on a real shoot doesn’t get recommended, regardless of the commission.
Why Most Spatial Audio Guides Are Wrong Before They Start
Search “spatial audio for filmmakers” and you’ll find a predictable pattern: a breathless introduction about how immersive sound will “transform your storytelling,” a gear list with affiliate links, some plugin names dropped in a post-production section, and a closing paragraph about the exciting future of VR.
What you won’t find: an honest conversation about when spatial audio is overkill, which gear fails in cold weather, what “40% increased engagement” actually means (nothing — there’s no real source for that stat), or why most travel filmmakers who invest in an ambisonic setup use it for maybe 30% of their audio and reach for a shotgun mic the rest of the time.
Here’s the unpopular opinion this guide is built around: spatial audio is a location sound tool, not a storytelling tool. It captures environments brilliantly. It captures dialogue poorly. It performs best when you’re not trying to do anything clever with it — when you set it up, walk away, and let the location breathe for ten minutes. Most of the “immersive soundscape” results people rave about are just the result of patience, not technique.
That changes how you should think about gear, placement, and workflow.
What Spatial Audio Actually Is (Without the Sales Pitch)
Traditional stereo gives you left and right — two channels representing a three-dimensional world. It works fine. Most of the greatest travel films ever made used stereo audio. Don’t let anyone tell you otherwise.
Spatial audio, specifically the ambisonic format, captures a full sphere of sound: front, behind, left, right, above, below. Instead of two channels, first-order ambisonics records four, called W, X, Y, and Z in B-format: W is the omnidirectional component, and X, Y, and Z are figure-of-eight components along the front-back, left-right, and up-down axes. Those four channels encode the entire sound field mathematically, which means in post-production you can rotate the whole soundscape, reorient what’s “forward,” and decode it for any playback system: headphones, stereo speakers, VR headsets.
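If “encode the sound field mathematically” feels abstract, here is a minimal Python sketch of what the four channels hold. It assumes the traditional FuMa weighting (W scaled by 1/√2) and azimuth measured counterclockwise from straight ahead. The function names are mine, not from any plugin, and a real decoder does far more than this.

```python
import math

def encode_fuma(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Assumption: FuMa weighting, azimuth counterclockwise from front.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)  # front-back
    y = sample * math.sin(az) * math.cos(el)  # left-right
    z = sample * math.sin(el)                 # up-down
    return (w, x, y, z)

def virtual_cardioid(bformat, azimuth_deg):
    """Point a horizontal virtual cardioid mic at the recorded field."""
    w, x, y, _z = bformat
    az = math.radians(azimuth_deg)
    return 0.5 * (math.sqrt(2.0) * w + x * math.cos(az) + y * math.sin(az))

# A sound directly to the left of the mic (azimuth +90 degrees).
source = encode_fuma(1.0, azimuth_deg=90.0)
left = virtual_cardioid(source, 90.0)    # virtual mic aimed left
right = virtual_cardioid(source, -90.0)  # virtual mic aimed right
```

The virtual mic aimed at the source hears it at full level and the one aimed away hears almost nothing. That is the whole trick: the direction lives in the ratios between channels, so you can choose where to “point” after the fact.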
The practical implication: if your mic was facing north when you recorded, but your camera was pointing east, you can fix that in editing. That single feature has saved more travel shoots than any other aspect of the format.
A-format is the raw output from the individual mic capsules. B-format is the processed, standardized version your editing software expects. Most recorders handle this conversion automatically. You don’t need to think about it unless something breaks.
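For the curious, the core of that conversion is a small sum-and-difference matrix. The sketch below assumes the usual tetrahedral capsule layout (front-left-up, front-right-down, back-left-down, back-right-up) and a 0.5 scaling. The capsule order in your files depends on the mic, and real converters add capsule-specific correction filters that this leaves out.

```python
def a_to_b(flu, frd, bld, bru):
    """Basic A-format to B-format matrix for one sample per capsule.

    Assumed capsule names: FLU, FRD, BLD, BRU (tetrahedral array).
    No calibration or frequency correction is applied here.
    """
    w = 0.5 * (flu + frd + bld + bru)  # everything summed: omni
    x = 0.5 * (flu + frd - bld - bru)  # front minus back
    y = 0.5 * (flu - frd + bld - bru)  # left minus right
    z = 0.5 * (flu - frd - bld + bru)  # up minus down
    return (w, x, y, z)

# The same signal on every capsule has no direction: only W is non-zero.
diffuse = a_to_b(1.0, 1.0, 1.0, 1.0)

# Signal on the two front capsules only lands in W and X.
frontal = a_to_b(1.0, 1.0, 0.0, 0.0)
```

If a recording sounds spatially scrambled after import, a wrong capsule order going into this step is one of the first things worth ruling out.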
Higher-order ambisonics (HOA) uses more channels for better spatial resolution: an order-N recording needs (N+1)² channels, so second order is 9 channels and third order is 16, against 4 for first order. For travel filmmaking, first-order is enough. The jump to HOA gives you marginal improvement in spatial accuracy and a significant jump in file size and processing complexity. Not worth it unless you’re delivering to a premium VR platform with specific requirements.
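To put rough numbers on the file-size jump, here is a small sketch. The 48 kHz sample rate and 24-bit depth are assumptions for illustration (check what your recorder is actually set to), and the figures cover uncompressed audio data only, not file headers.

```python
def ambisonic_channels(order):
    """Channel count for a full-sphere ambisonic signal of a given order."""
    return (order + 1) ** 2

def wav_megabytes(order, minutes, sample_rate=48000, bit_depth=24):
    """Approximate uncompressed audio size in MB (1 MB = 1,000,000 bytes)."""
    bytes_per_second = ambisonic_channels(order) * sample_rate * (bit_depth // 8)
    return bytes_per_second * 60 * minutes / 1_000_000

first_order = wav_megabytes(order=1, minutes=10)  # 4 channels
third_order = wav_megabytes(order=3, minutes=10)  # 16 channels
```

Under those assumptions a ten-minute first-order take is a few hundred megabytes, and third order is four times that for the same take.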
The Best Spatial Audio Gear for Travel Filmmakers (The Honest Version)
The Budget Pick: Zoom H3-VR (~$349)
This is the right starting point for most travel filmmakers. It’s an all-in-one recorder with built-in ambisonic mics, runs on batteries for 10+ hours, weighs 160 grams, and records directly to microSD. The onboard decoding means you can get a rough idea of your spatial audio in the field.
The honest downside nobody mentions: the H3-VR’s preamps are mediocre. In quiet environments — a forest at dawn, an empty cathedral, a still beach — you’ll hear noise floor in your recordings. It’s manageable in post, but it’s there. If you’re shooting in consistently loud environments (markets, festivals, city streets), you won’t notice. If you’re chasing delicate soundscapes, you will.
Also: the plastic build is not just aesthetically lacking — it affects the mic in cold weather. Below about 5°C the body creaks when it contracts, and that mechanical noise can bleed into recordings if you’re handling it. Leave it on a stand. Don’t handhold in cold locations.
Who should NOT buy it: Anyone shooting primarily in quiet natural environments where noise floor matters. Anyone who needs professional XLR connectivity. Anyone who will be disappointed that “portable ambisonic recorder” still requires a separate windscreen, a stand, and some patience before it sounds like anything other than a windy mess.
The Mid-Range Option: Røde NT-SF1 (~$699) + Field Recorder
The NT-SF1 was developed with SoundField, the brand behind the first commercial ambisonic microphones of the 1970s, and it shows. The spatial resolution is noticeably cleaner than the H3-VR, and the build quality is professional-grade metal that doesn’t creak at altitude.
The catch: it’s a microphone, not a recorder. You need a field recorder with four XLR inputs and the ability to record four channels simultaneously. The Zoom F6 (~$400) is the common pairing. So now you’re at roughly $1,100 for the combination, you’re carrying two devices, and your setup time has doubled.
For a dedicated location sound workflow — someone who’s treating audio as seriously as they treat camera — this setup makes sense. For a solo travel filmmaker already carrying a camera, lenses, a gimbal, and luggage weight limits, it’s probably too much.
Who should NOT buy it: Solo filmmakers without a dedicated audio person. Anyone who needs to move fast between locations. Anyone not already comfortable with signal chain troubleshooting, because when this setup misbehaves at 7 AM in a foreign city, the problem could be the mic, the cable, the recorder settings, or the XLR connector you didn’t seat fully.
The Discontinued Gold Standard: Sennheiser AMBEO VR Mic (used, ~$800–$1,200)
Sennheiser discontinued the AMBEO, which means the used market is your only option. The audio quality is exceptional — professionals used it on major VR productions — and the build is solid.
The real problem with recommending it in 2025 is support. Firmware updates are done. If you encounter a hardware issue, you’re on your own. And for travel filmmaking specifically, gear that can’t be repaired or replaced in the field is a liability.
If you find one in good condition at a reasonable price and you’re an experienced user who can troubleshoot independently, it’s worth it. Otherwise, don’t chase a discontinued product when the NT-SF1 exists.
Ambisonics Microphone Comparison
360° Audio for VR & Spatial Recording
| Microphone | Type | Ambisonics Order | Recording | Power | Price (approx.) |
|---|---|---|---|---|---|
| Zoom H3-VR | Handy recorder + mic array | 1st order A-format | Built-in recorder (B-format conversion) | 2× AA batteries or USB | ~$349–399 |
| Røde NT-SF1 | Studio Ambisonics mic | 1st order (native B-format) | Requires external recorder (4 channels) | 48V phantom power | ~$799–899 |
| Sennheiser AMBEO VR Mic | Professional VR mic array | 1st order (A-format) | Requires external recorder + plugin conversion | 48V phantom power | ~$1,599–1,799 |
The Windscreen Question (More Important Than Your Mic Choice)
Here’s the thing about windscreens that nobody leads with: a bad windscreen with a good mic sounds worse than a good windscreen with a mediocre mic, in any outdoor recording situation.
Ambisonic mics have multiple capsules pointing in different directions. Every one of them catches wind. The foam covers that ship with most recorders are adequate for indoor drafts and will be completely useless anywhere with actual wind — which is most interesting travel locations.
Budget $80–$150 for a proper furry deadcat (Rycote makes the most reliable options for most ambisonic mics). This is not optional equipment. This is the thing that determines whether your outdoor recordings are usable.
The test: if you can hear the wind at all while monitoring through headphones, your wind protection is insufficient.
Where to Put the Mic (And Why “Ear Level” Is Only Half the Answer)
The standard advice is “ear level, about 5–6 feet.” That’s fine as a default. Here’s what the default misses.
Height changes what you capture more than position does.
At chest height (4 feet) in a busy street scene, you capture footsteps, bag handles, nearby conversations, and passing vehicles — close, immediate, slightly claustrophobic. At 7–8 feet, the same street sounds wider, more atmospheric, less crowded. You lose some foreground detail and gain a sense of space.
Neither is wrong. They’re different interpretations of the same location. The question is what your edit needs.
Distance from sound sources is a decision, not a default.
6–12 feet from the primary action is the standard recommendation for balanced environmental capture. But “balanced” is a production choice, not a virtue. If you’re documenting a specific craftsperson — a metalworker, a cook, a musician — being 6 feet away gives you the environment. Being 3 feet away gives you the work. The ambisonic recording at 3 feet with the hammer strikes close and the workshop sound falling off behind you tells a different story than the same scene recorded from across the room.
Reflective surfaces are the hidden problem.
Markets, train stations, churches, courtyards — any space with hard parallel surfaces creates early reflections that muddy ambisonic recordings. Unlike a directional mic where you can point away from a wall, an ambisonic mic captures everything. Minimum 4 feet from any hard surface. More if the space is live (short reverb decay sounds metallic; long reverb sounds spacious — know which you’re in before you commit to a position).
The monitoring problem nobody solves cleanly.
You cannot fully hear what you’re capturing until you decode the recording in post. Headphone monitoring in the field gives you a rough sense of levels and obvious problems — wind noise, clipping, dead silence — but the actual spatial character of your recording won’t be apparent until you’re sitting in front of a DAW.
The practical response: record longer than you think you need, from multiple positions, and take written notes about what interesting sounds are where. Don’t trust that the recording you’re hearing in the field is the recording you’ll have in editing.
Recording Technique: What Actually Works
The most valuable thing you can do before hitting record is nothing.
Spend five minutes in a location just listening before you set up. Close your eyes. Map the sound: what’s close, what’s mid-distance, what’s far. Where is the most interesting sonic activity? Where is it quiet? What’s the rhythm of the place — is it constant, or does it have peaks and valleys?
This isn’t meditation advice. It’s workflow efficiency. A filmmaker who listens for five minutes and places their mic once gets better recordings than one who sets up in three positions and records for two minutes each.
Record for longer than you think is necessary.
30-second spatial audio clips are almost always useless. Locations have rhythm. A market has cycles of activity. A natural environment has wind gusts and bird call patterns and moments of complete stillness. You need at least 5 minutes of continuous recording to capture that character — 10 minutes for natural environments, the full duration for live events.
The secondary benefit: unexpected things happen in long takes. The moment that makes the film is usually the one you didn’t anticipate.
The room tone ritual — don’t skip it.
At the end of every location setup, after you’ve captured everything you need, ask anyone present to be still and quiet. Record 2–3 minutes of just the location’s ambient sound. Label it clearly.
This becomes your safety net in post: fill gaps, smooth edits, extend moments, build transitions. Spatial room tone is more useful than stereo room tone because it preserves the directional character of the space. A minute of ambisonic room tone from a specific location is worth more in the edit than an hour of decent-but-directionless recording.
Wind is a production problem, not a post-production problem.
You can remove some wind noise in iZotope RX. You cannot remove severe wind contamination. When wind is making your recording unusable, stop recording. Find a natural windbreak — behind a building, inside a doorway, under a tree canopy — and set up there instead. The recording you get in a slightly less ideal position with no wind noise is more valuable than a perfect position with rumble across the low frequencies.
The one exception: gentle consistent wind that reads as atmospheric rather than technical noise. That’s a judgment call, and you develop it with experience.
Post-Production Workflow
Software: what you actually need
Reaper ($60, one-time license) is the Swiss Army knife — it handles ambisonic workflows natively with proper multi-channel routing, it’s deeply customizable, and it costs less than a month of Pro Tools. The tradeoff is that it rewards people who don’t mind getting their hands dirty. The interface is not intuitive. Nothing is where you expect it. The learning curve is front-loaded — steep for a month, then mostly invisible.
Logic Pro is the opposite: the Apple-way, clean and predictable, with a gentler onboarding if you’re already in the Mac ecosystem. It handles spatial audio competently. The limitation is that its ambisonic support relies more heavily on third-party plugins, and some of those plugins are less stable in Logic than in Reaper. If you already know Logic, stay in Logic. If you’re starting from scratch, Reaper’s lower cost and stronger multi-channel routing make it the better long-term choice for this specific workflow.
I do most of my audio work between shifts — I work days as a doorman at a four-star hotel in Victoria, which means evenings with headphones on and a session open. Reaper suits that workflow: low cost, no subscription, and it runs lean on a laptop that’s seen better days. Not everyone has a dedicated studio setup, and the software you’ll actually open is more valuable than the software with the better spec sheet.
Essential free plugins:
Facebook 360 Spatial Workstation does everything you need for a basic workflow: decoding, rotation, conversion between formats. It’s free, it works, and it’s available for all major DAWs. Start here.
Røde’s SoundField plugin is worth installing if you’re using the NT-SF1. It’s designed specifically for that mic’s output.
IEM Plugin Suite is open-source and more powerful than the FB360 tools for complex work. Learn it when you outgrow the basics.
The rotation step people forget
Your mic and your camera were pointing in different directions on at least some of your shots. In post, you need to rotate the ambisonic field to match your visual framing. This is done with a rotator plugin (FB360 Spatialiser has one) — adjust yaw (horizontal), pitch, and roll until the soundscape matches what’s on screen.
This takes practice to do quickly. Do it before you start cutting anything else.
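Under the hood, a yaw rotation is a small piece of trigonometry applied to two of the four channels. The sketch below covers yaw only and assumes positive angles turn the scene counterclockwise (toward the left). Plugins differ on sign conventions, so treat the direction as an assumption and confirm by ear.

```python
import math

def rotate_yaw(frame, degrees):
    """Rotate one B-format frame (W, X, Y, Z) around the vertical axis.

    Assumption: positive degrees move sources counterclockwise (leftward).
    W (omni) and Z (height) are unaffected by a pure yaw rotation.
    """
    w, x, y, z = frame
    a = math.radians(degrees)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    return (w, x_rot, y_rot, z)

# A source straight ahead of the mic (all of its direction is in X)...
ahead = (0.707, 1.0, 0.0, 0.0)
# ...ends up directly to the left after a 90 degree yaw.
turned = rotate_yaw(ahead, 90.0)
```

Nothing is re-recorded or resampled here. The rotation only redistributes energy between X and Y, which is why it is cheap to do and safe to adjust until the picture and the sound agree.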
Processing rules
A gentle high-pass filter at 80–100 Hz, applied identically to all four channels, removes low-frequency rumble without touching spatial information. Keep compression light, 2:1 to 3:1 at most, and link it across the channels so every channel gets the same gain change; aggressive compression flattens the spatial dynamics that make ambisonic recordings interesting. Noise reduction should be applied with ambisonic-aware tools; standard stereo noise reduction applied to a multi-channel ambisonic file will corrupt the spatial encoding.
Do not apply stereo imaging plugins to ambisonic tracks. Do not apply conventional reverb to ambisonic tracks. These destroy the encoding.
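The reason behind these rules is that direction is stored in the ratios between the four channels, so any gain change has to hit all of them equally. Here is a deliberately crude sketch of a “linked” compressor that reads the level from W and applies one gain to the whole frame. It has no attack or release smoothing, the threshold and ratio are placeholder values, and it is an illustration of the idea, not something to run on a real mix.

```python
import math

def linked_compress(frames, threshold_db=-12.0, ratio=2.0):
    """Apply one shared gain per frame to all four B-format channels.

    frames: list of (W, X, Y, Z) tuples. Level is measured on W only.
    Assumption: static per-sample gain, no envelope smoothing.
    """
    out = []
    for w, x, y, z in frames:
        level_db = 20.0 * math.log10(max(abs(w), 1e-9))
        over_db = max(0.0, level_db - threshold_db)
        gain_db = -over_db * (1.0 - 1.0 / ratio)
        g = 10.0 ** (gain_db / 20.0)
        out.append((w * g, x * g, y * g, z * g))  # same gain everywhere
    return out

loud = (0.5, 0.5, 0.25, 0.0)    # above the threshold
quiet = (0.01, 0.01, 0.0, 0.0)  # below the threshold
processed = linked_compress([loud, quiet])
```

The loud frame comes out quieter, but its X-to-Y ratio, and therefore its apparent direction, is unchanged. Run four independent compressors instead and that ratio shifts every time one channel crosses the threshold before another.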
Layering: the technique that separates good from memorable
A single ambisonic recording of a location, dropped into an edit, is fine. Layered ambisonic recordings, blended at different levels with the spatial fields rotated to different orientations, create something that feels like presence rather than documentation.
Three-layer approach:
- Wide environmental recording at 10–15 feet, pushed back in the mix (–10 to –15dB relative to dialogue)
- Mid-range recording at 6–8 feet, slightly forward (–6 to –10dB)
- Close detail at 3–5 feet, most prominent but not overwhelming (–3 to –6dB)
Automate these layers throughout the scene. When your edit cuts to a wide shot, bring up the wide environmental layer. Close-up, bring up the detail layer. This matches audio perspective to visual perspective in a way that audiences feel without consciously noticing.
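The three-layer approach above boils down to converting each level from decibels to a linear gain and summing the layers channel by channel. A minimal sketch, assuming all layers are already in the same B-format convention and already rotated to match the picture; the frame values and the specific levels are made up for illustration.

```python
def db_to_gain(db):
    """Convert a level in dB to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)

def mix_layers(layers):
    """Sum B-format layers at the given levels.

    layers: list of (frames, level_db), where frames is a list of
    (W, X, Y, Z) tuples. The mix is as long as the shortest layer.
    """
    length = min(len(frames) for frames, _ in layers)
    mixed = []
    for i in range(length):
        frame = [0.0, 0.0, 0.0, 0.0]
        for frames, level_db in layers:
            gain = db_to_gain(level_db)
            for ch in range(4):
                frame[ch] += frames[i][ch] * gain
        mixed.append(tuple(frame))
    return mixed

# Hypothetical four-frame layers, levels picked from the ranges above.
wide = [(0.2, 0.1, 0.0, 0.0)] * 4
mid = [(0.3, 0.0, 0.2, 0.0)] * 4
close = [(0.5, 0.4, 0.0, 0.1)] * 4
scene = mix_layers([(wide, -12.0), (mid, -8.0), (close, -4.0)])
```

Automation in a DAW is the same sum with the level values changing over time. Each layer still gets one gain across all four of its channels, so its spatial image stays intact while it moves forward or back in the mix.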
Exporting for different platforms
YouTube: export as first-order ambisonics (4-channel B-format in the AmbiX layout, meaning ACN channel order and SN3D normalization), then run the file through Spatial Media Metadata Injector to tag it. YouTube decodes automatically for headphone listeners and falls back to stereo for speaker listeners. You need the metadata tag or none of this works.
Instagram, TikTok, general distribution: decode to binaural stereo before export. Some spatial character survives for headphone listeners; it sounds normal on speakers.
Apple Vision Pro / VR platforms: B-format, higher bitrate (256kbps minimum), check platform-specific delivery specs because they vary.
Keep your ambisonic master file. Storage is cheap. Going back to re-decode from a binaural export is not possible.
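One export detail that trips people up: “B-format” comes in two common flavors. Traditional FuMa orders the channels W, X, Y, Z with W attenuated by 3 dB, while AmbiX (ACN order, SN3D normalization) uses W, Y, Z, X with W at full scale, and as far as I know AmbiX is what YouTube expects. For first order the conversion is tiny, as this sketch shows. Check what your recorder or plugin already outputs before applying it, because converting a file that is already AmbiX will scramble the directions.

```python
import math

def fuma_to_ambix(w, x, y, z):
    """Convert one first-order FuMa frame to AmbiX (ACN order, SN3D).

    Only valid for first order. W gains a factor of sqrt(2);
    X, Y, Z keep their values and are reordered to Y, Z, X.
    """
    return (w * math.sqrt(2.0), y, z, x)

fuma_frame = (1.0 / math.sqrt(2.0), 0.1, 0.2, 0.3)
ambix_frame = fuma_to_ambix(*fuma_frame)
```

Most DAW plugins will do this for you with a format switch. The sketch is mainly here so the switch stops being a mystery.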
Frequently Asked Questions
Do I need special equipment to record spatial audio for travel videos?
You need an ambisonic microphone — a mic with multiple capsules pointing in different directions simultaneously. The Zoom H3-VR (~$349) is the most practical starting point because it’s a recorder and mic in one unit. Beyond that, you need a furry windscreen ($80–$150) and a stable way to mount it (tripod or boom pole). That’s the complete kit. Everything else is optional until you’ve outgrown it.
What's the difference between spatial audio and surround sound?
Surround sound is channel-based: you have five or seven speakers in fixed positions and audio is mixed to each one. Spatial audio (ambisonics) is scene-based: it encodes the entire sound field mathematically, so it can be decoded for any playback system, whether headphones, stereo speakers, a 5.1 setup, or a VR headset. One ambisonic recording works everywhere. One surround mix works only for the speaker configuration it was designed for.
Can you record spatial audio on a smartphone?
Not true ambisonics. Some apps simulate spatial audio effects, but a smartphone mic captures mono or stereo. For genuine ambisonic recording you need a dedicated mic with at minimum four capsules arranged in a tetrahedral pattern. The Zoom H3-VR connects to a smartphone via USB and can function as an audio interface, which is as close as you’ll get to mobile spatial audio on a budget.
Does YouTube support spatial audio?
Yes, but only if you tag your file correctly. Export your video with a first-order ambisonic audio track, then run it through the free Spatial Media Metadata Injector tool before uploading. Without that metadata tag, YouTube treats the audio as standard stereo and the spatial information is lost. With the tag, YouTube automatically decodes for headphone listeners and delivers stereo for everyone else.
Is spatial audio worth it for travel vlogs, or is it overkill?
Depends on the vlog. For fast-moving content shot across multiple locations in a single day, where you’re moving too quickly to set up a mic properly and let a location breathe, spatial audio is probably overkill. The setup time costs more than the audio gains. For slower destination content where you’re spending real time in a place, it’s worth it. The honest answer: most travel vloggers would get more value from a better shotgun mic and a proper windscreen before investing in ambisonics.
What free plugins do I need for ambisonic editing?
Three cover most workflows. Facebook 360 Spatial Workstation handles decoding, rotation, and format conversion; it’s free, works in all major DAWs, and is the standard starting point. The Røde SoundField plugin is free and specifically useful if you’re shooting with the NT-SF1. The IEM Plugin Suite is open-source, more powerful than FB360 for complex work, and worth learning once you’ve got the basics down. You don’t need paid plugins to produce professional-quality spatial audio.
How long should I record ambisonic audio at each location?
Minimum five minutes for a static location. Ten minutes for natural environments. The full duration for live events. Locations have rhythm — peaks and valleys of activity, wind patterns, ambient cycles — and a 30-second clip captures almost none of that character. Longer takes also give you room tone options and catch unexpected moments that you didn’t anticipate when you pressed record. Storage is cheap. Unusable short clips are expensive.
Why does my spatial audio sound flat or mono?
Three likely causes. First, check that your file imported as four channels — if your DAW collapsed it to stereo on import, the spatial encoding is gone. Second, make sure you have a decoder plugin inserted on the track; without decoding, ambisonic B-format sounds phase-cancelled and flat. Third, confirm your recorder actually captured in ambisonic mode and not standard stereo — some recorders default to stereo and require manual mode selection. Check all three before assuming the mic or recording is at fault.
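The first check is easy to script before you even open the DAW. This sketch uses Python's standard-library wave module to read the channel count from a WAV file. The helper that builds a silent test file exists only so the example runs on its own, and note that some Python versions reject multichannel WAVs saved with an “extensible” header, in which case any audio-info tool will tell you the same thing.

```python
import io
import wave

def channel_count(wav_file):
    """Return the number of channels in a PCM WAV file (path or file object)."""
    with wave.open(wav_file, "rb") as wf:
        return wf.getnchannels()

def make_silent_wav(channels, frames=480, sample_rate=48000):
    """Build a short silent 16-bit WAV in memory (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00" * (2 * channels * frames))
    buf.seek(0)
    return buf

ambisonic_take = make_silent_wav(channels=4)
stereo_take = make_silent_wav(channels=2)
is_first_order = channel_count(ambisonic_take) == 4
was_collapsed = channel_count(stereo_take) != 4
```

On a real shoot you would pass the path of the file from your card to channel_count. Anything other than 4 for a first-order take means the problem happened at recording or export, not in your decoder.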
When Spatial Audio Is Overkill
This section doesn’t appear in other guides because it’s bad for affiliate link clicks. Here it is anyway.
Spatial audio is overkill for:
- Interview-heavy travel documentaries where dialogue drives the edit and environment is secondary
- Fast-moving run-and-gun travel vlogs where setup time is the constraint
- Content shot primarily indoors in acoustically mediocre spaces (hotel rooms, airports, most restaurants)
- Projects where the final delivery is vertical video for mobile — the spatial experience largely collapses on phone speakers
Spatial audio is worth the investment for:
- Destination content where the location is the subject
- VR or 360° video (where it’s not optional)
- Documentary work shot in environments with genuine sonic character — markets, natural landscapes, cultural events, architecture with interesting acoustics
- Any project with a post-production workflow that actually has time to treat the audio properly
The honest version of this guide ends with: buy the Zoom H3-VR, get the Rycote windscreen, record longer than you think you need, and deliver in binaural stereo until you have a specific reason to do otherwise. That’s 80% of the workflow. The rest is refinement.
The Verdict
Spatial audio is not the thing that makes travel films better. Patience is. Location selection is. Listening before recording is. Spatial audio is the format that lets you capture those choices properly when you’ve made them.
The H3-VR is the right starting point. The windscreen is not optional. The post-production workflow has a learning curve that takes about three projects to feel comfortable. And the moment you hear a well-decoded ambisonic recording on good headphones — the rain slightly behind you and to the left, the market ahead, the footsteps passing on the right — you’ll understand why the format exists and why the setup time is worth it.
Check that you’re actually recording before you walk away from the mic. Every time.
The Fine Print: Peekatthis.com is part of the Amazon Services LLC Associates Program, which means we get a small commission when you click our links and buy stuff. It’s a way of saying “Thanks for supporting the site!” We also team up with B&H, Adorama, Clickbank, and other folks we trust. If you found this helpful, share it with a friend, drop a comment, or bookmark this page before you head into your next shoot.
About the Author:
Trent Peek is a director, producer, and actor who spends way too much time staring at monitors. While he’s comfortable with high-end glass from RED and ARRI, he still has a soft spot for the Blackmagic Pocket and the “duct tape and a dream” style of indie filmmaking.
His recent short film, “Going Home,” was a selection for the 2024 Soho International Film Festival, proving that sometimes the “lessons from the trenches” actually pay off.
When he isn’t on set, Trent is likely traveling (usually forgetting at least one essential pair of shoes), falling asleep two pages into a book, or brainstorming film ideas that—let’s be honest—will probably never see the light of day. It’s a mess, but it’s his mess.
P.S. Writing this in the third person felt incredibly weird.
Connect with Trent:
- Watch: YouTube | Vimeo
- Credits: IMDb | Stage 32
- Social: Instagram @trentalor | Facebook @peekatthis
- Hear him talk shop: Check out his guest spot on the Pushin Podcast discussing the director’s role in indie film.
Business Inquiries: trentalor@peekatthis.com