I had long been wanting to get my hands dirty with the recently released ath9k spectral scan code and proof-of-concept graph tool, even more so after BattleMesh v6, where I had the invaluable opportunity of not only meeting the author (the ever smiling Simon Wunderlich), but also getting some spectral scan details patiently explained by Felix Fietkau.
Furthermore, Laurent Guerby from tetaneutral.net came up with a pretty interesting idea: fetch samples once a day or so from each node, collecting them as historical data that can be used afterwards to debug unexpected network performance issues, such as those caused by rogue routers popping up, or any other change in the spectrum environment.
My initial thoughts for using the code were along the lines of Ubiquiti's proprietary AirView features, but taking advantage of the fact that samples can be collected without interrupting normal node traffic, which opens the possibility of a real-time overview of a whole network instead of only one link. Laurent’s idea came as a perfect complement, and both things combined would greatly enhance the diagnostic capabilities of network admins.
Well, I first needed to get a “feeling” about it, see a quick mockup of how it could work, and (being myself a non-programmer) I resorted to my usual approach on things: make a script sooo dirty that I’d end up being proud of it 🙂
watch -n 1 "t=\$(date +%F_%T); ssh $host \"echo chanscan > /sys/kernel/debug/ieee80211/phy0/ath9k/spectral_scan_ctl ; iw wlan0 scan >/dev/null 2>&1 ; cat /sys/kernel/debug/ieee80211/phy0/ath9k/spectral_scan0\" > /tmp/fft_\$t ; ./fft_eval /tmp/fft_\$t"
Yes, that will spawn a new window every couple of seconds, so it will soon freak out your window manager… unless (yay!) you periodically killall fft_eval with another watch! Yeah, I’m disgusting!
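For anyone who wants the same capture without the nested quoting, here is a slightly less dirty sketch of a single iteration of the loop above. The function names (`remote_scan_cmd`, `capture_once`) and the `PHY`, `IFACE` and `OUTDIR` variables are made up for illustration; the debugfs paths and commands are the ones from the one-liner, with the redirection written in portable `sh` form.

```shell
#!/bin/sh
# Sketch only: one "chanscan" capture from a remote ath9k node.
# PHY, IFACE and OUTDIR are illustrative defaults, adjust them to your node.
PHY=${PHY:-phy0}
IFACE=${IFACE:-wlan0}
OUTDIR=${OUTDIR:-/tmp}

# Print the command line that will run on the node, so it can be inspected.
remote_scan_cmd() {
    dbg="/sys/kernel/debug/ieee80211/$PHY/ath9k"
    printf 'echo chanscan > %s/spectral_scan_ctl; ' "$dbg"
    printf 'iw %s scan >/dev/null 2>&1; ' "$IFACE"
    printf 'cat %s/spectral_scan0\n' "$dbg"
}

# capture_once HOST: fetch one batch of samples into a timestamped file
# and print the path of that file.
capture_once() {
    out="$OUTDIR/fft_$(date +%F_%T)"
    ssh "$1" "$(remote_scan_cmd)" > "$out" || return 1
    printf '%s\n' "$out"
}
```

Something like `f=$(capture_once "$host") && ./fft_eval "$f"` then does what one tick of the `watch` loop did. Calling `capture_once` for each node from a daily cron job would be one way to build the historical archive Laurent suggested.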
Anyway, that crappy loop gave me a good enough framerate (0.5 Hz or so) to start playing with different hardware and scenarios:
- TL-WDR4300 indoors, on a table, silent environment. It looks as if there were activity over 2410-2450 MHz, but this is actually the (visually broken) baseline; the environment was quiet. The TL-WR703N graph below was taken at the same place, same time, and it shows everything smoothly below -100dB.
- Same TL-WDR4300 indoors, while my laptop 10cm away was doing a long 10 Mbit/s netperf to another AP on channel 11 (2462 MHz) HT20. You can clearly spot the characteristic 802.11n TX emission mask, and very visually understand the interference between what are commonly thought of as “non-overlapping” channels (such as 6 and 11).
- TL-WR703N indoors, on the same table as the previous TL-WDR4300, silent environment baseline.
- Same TL-WR703N indoors, this time during the neighbouring 10 Mbit/s netperf stream. If it can be compared at all with the graph from entirely different hardware (the TL-WDR4300), it simply looks attenuated, probably due to having only a single, internal antenna, vs 3 external ones in the TL-WDR4300.
- labanda-oeste: Outdoors, a TL-WDR3500 taking part as a real-world node of a (late-night) DeltaLibre. 100 meters away from it, another outdoor TL-WR842ND transfers 30 Mbit/s of garbage to a third node, farther away. Here the free-space loss makes a difference and the cross-channel “bleeding” is attenuated to a trace.
- mirancho: Another TL-WDR3500 node, nearby, showing the Delta at ground level is a very quiet 2.4 GHz environment.
- tdorado: Finally, a TL-WDR3500 node which has been suffering from performance problems for the last few months. Interference had been suspected so far: a high rate of retries, and “iw survey dump” reported a worrying noise floor of -85 dBm (“wait, can we trust that number or is it driver foobar?”)… then, this spectral scan confirms the issue. The node is located just a hundred meters away from mirancho, only at a higher altitude, above tree tops, evidently exposing the antennas to long range signals coming from the city.
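Two bits of arithmetic and parsing behind the scenarios above can be sketched in a few lines of shell. `chan_to_mhz` uses the usual 2.4 GHz channel formula (2407 + 5 × channel, for channels 1 to 13), and `survey_noise_in_use` picks the noise figure of the channel marked `[in use]` out of `iw dev wlan0 survey dump` output. Both function names are made up here, and the parser assumes the `frequency:` / `noise:` line layout as I remember it from `iw`, so check it against your own output.

```shell
#!/bin/sh
# chan_to_mhz CH: centre frequency in MHz of 2.4 GHz channel CH (1-13).
chan_to_mhz() {
    echo $((2407 + 5 * $1))
}

# survey_noise_in_use: read "iw dev <iface> survey dump" output on stdin and
# print the noise value (dBm) reported for the channel marked "[in use]".
# Assumes each survey entry has a "frequency:" line followed by a "noise:" line.
survey_noise_in_use() {
    awk '
        /frequency:/ { in_use = ($0 ~ /\[in use\]/) }
        in_use && /noise:/ { print $2; exit }
    '
}
```

`chan_to_mhz 6` and `chan_to_mhz 11` give 2437 and 2462, so the two centres are only 25 MHz apart: the nominal 20 MHz wide HT20 channels leave a 5 MHz gap on paper, while the skirts of the emission mask reach well beyond it. For the noise floor, something like `ssh "$host" iw dev wlan0 survey dump | survey_noise_in_use` would print just the number for the active channel.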
As a footnote, here’s what I understood (and/or recall) from talking with Felix:
- There are 3 modes to collect samples: (I) during scans, which will show samples from the whole spectrum; or during normal operation, which only gets samples from the currently used channel. In this latter case, you can either (II) get samples only for the background “noise” on the channel, or (III) also take into account the signals that compose a successfully decoded packet (for example, a packet destined to the router collecting the spectral data).
- (I) The first option has the evident disadvantage of cutting the communication during the several hundred milliseconds it may take to scan the spectrum, but it’s the only way to get the full picture. This is the approach used in the graphs presented: the spectral_scan_ctl is set to “chanscan”.
- (II) The second mode could be used to continuously collect historical background data of the active channel, without ever disrupting the communications. Fluctuations in traffic volume of the node wouldn’t make a difference; instead, only changes in 3rd party activity would be noticed. A subtle detail: if there’s a 3rd party network on the very same frequency and channel width, that doesn’t necessarily count as noise: if those packets get successfully decoded by the card, they’ll count as “received” (and then normally discarded as irrelevant). Only if the packet is incomprehensible (such as the tail after a collision, or sent on a partially overlapping channel) will the signal count as “noise”.
- (III) This mode would get the same results as (I), only on a narrow bandwidth, but allowing the radio to stay on the active channel and thus maintaining communications.
- I did not find a way to get samples with any setting other than chanscan, so I’m not sure those other modes weren’t just creative (mis)interpretations of mine… any confirmation or rebuttal will be appreciated!
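One thing that may be worth trying for the on-channel modes: as far as I recall from the ath9k spectral scan documentation, `spectral_scan_ctl` also accepts `background`, `trigger` and `disable` besides `chanscan`. Treat that as an assumption to verify on your driver version, and note that it does not settle whether "background" corresponds to mode (II) or (III) above. The sketch below is meant to run on the node itself; `background_capture` and `SPECTRAL_DIR` are names invented for this example.

```shell
#!/bin/sh
# Sketch only: sample the active channel without running a full scan.
# ASSUMPTION: spectral_scan_ctl accepts "background", "trigger" and "disable".
SPECTRAL_DIR=${SPECTRAL_DIR:-/sys/kernel/debug/ieee80211/phy0/ath9k}

# background_capture SECONDS OUTFILE
# Switch to background mode, trigger sampling, wait, dump the samples
# collected so far into OUTFILE, then switch spectral scanning off again.
background_capture() {
    secs=$1
    out=$2
    echo background > "$SPECTRAL_DIR/spectral_scan_ctl" || return 1
    echo trigger > "$SPECTRAL_DIR/spectral_scan_ctl" || return 1
    sleep "$secs"
    cat "$SPECTRAL_DIR/spectral_scan0" > "$out"
    echo disable > "$SPECTRAL_DIR/spectral_scan_ctl"
}
```

For example, `background_capture 5 /tmp/fft_bg` would leave a file that can be copied off the node and fed to `fft_eval` like the chanscan captures, assuming the sample format is the same in both modes.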