Remove the validity check that was gating the reporting of the obstruction_duration and obstruction_interval data items.
When that got added to the grpc protocol, I assumed that the valid flag meant whether the other 2 fields were useful data. That may actually have been the intent, for all I know, but I've never seen that flag report true. However, I also never saw the banner in the app telling me I would get obstructions every X minutes, either. That is, until a few weeks ago, probably due to a dish firmware change. Now I see the app reporting this data, so it must be ignoring that valid flag, or at least not using it for that purpose.
Probably nobody cares about this data (including me), given that nobody ever complained about it always reporting empty values, but I don't want to remove it and mess up db schemas, so it might as well be populated.
Add timeouts to all gRPC remote calls and bump yagrc package requirement to a version that does same for the reflection service, as well as fixing a state issue around failed lazy import resolution.
This should address the script hang symptom on issue #36.
last_24h_obstructed_s has been removed from the grpc service protocol, along with all the other deprecated fields. This renders the seconds_obstructed item in the status mode group useless.
This is the down side of using reflection at run time to pull the protocol definitions. If something gets removed from the protocol that is still being used, it will break the run time instead of just returning default values. Then again, if it gets removed from the protocol, it's no longer useful, anyway.
This should address issue #35.
Derive connectivity state information from the "outage" field of the get_status response, which I hadn't noticed before because it only populates when the dish is not in a connected state. This restores the state data item in the status mode group, which had been rendered useless due to a grpc service change.
In addition to the previous possible state names, this adds a few more that pertain to outages while otherwise connected, which I think were just previously reported as "CONNECTED", as well as some special cases of offline.
A recent firmware change has stopped populating a number of result message fields in the grpc service that had previously been marked as deprecated. This caused the script to start crashing in some use cases.
While those fields are still in the protocol definition for now, this change removes usage of them entirely, in case they get removed. As things are now, they are useless, anyway, since they will always just return default values.
This renders useless the state, snr, scheduled, obstructed, and unscheduled items in the status, bulk_history, and ping_drop mode groups. Those items now mostly return empty values.
See issue #32 for more detail.
The prior 2 changes made the handling of the first set of polled loops inconsistent with subsequent sets with respect to maintaining data across dish reboot. This change makes them both work the same (and correctly).
This restores the ability of -o to keep polled data across a reboot. It broke due to a simple issue in concatenate_history, but I realized that counter tracking for -o accumulated data was also a bit broken, so I fixed that, too.
Fix#31
Add a new command line script, dish_obstruction_map.py, that writes a PNG image based on the obstruction map data queried from the dish.
Supports color or greyscale output and either with or without alpha channel.
Does not yet support running in an interval loop, mostly because that will require templatizing the output filename in order to be useful.
Tracked on issue #27
Change the loop polling function (-o) to aggregate the history data each polling loop instead of just keeping the last polled history so it can be logged when reboot is detected. This allows for computing statistics across a longer period than the size of the dish's history buffer, which has been reduced to 15 minutes recently.
This change also makes it so data is not logged right away when dish reboot is detected, so the logging always happens at the specified interval whether there was a reboot or not.
Finally, change the poll loop counting so data is not emitted on the first loop when polling is configured. That made sense to do when the history buffer was large enough to have the entire period's worth of data, but now it just results in a short period in the log output every time the script is restarted.
Fixes#29
Add dish direction and "prolonged" obstruction info to the status mode group.
These were added to the grpc service at some point over the last several months.
Only lightly tested, given that my dish no longer reports significant periods of obstruction.
This is related to discussion in issue #27, although it doesn't address that issue in the slightest.
statistics.quantiles was not present in Python 3.7 or earlier, which is a problem on Windows if you want to run a binary optimized version of the protobuf package, since those are not currently being posted for Python 3.8 or later.
This change switches to use the weighted median function just with equal weights. It's a bit of overkill, but it also cuts out the mess that was working around deficiencies of the statistics.quantiles implementation.
SpaceX has been using inconsistent field ordering when adding alerts, so field index cannot be used to consistently identify the specific alerts. Message number is more appropriate for that, anyway, but is not guaranteed to be a low enough number to fit into a bit field. Oh well, in the unlikely event that SpaceX switches to larger message numbers, they just won't show up in the alerts bit field (but will still show up in alert_detail).
This does make the bit ordering in alerts inconsistent with prior versions of these tools, but I've never actually seen one of these alerts report true, so hopefully this doesn't impact anyone.
The alerts are still sorted by index number in the alert_detail text output, which is a problem for CSV output, but I think ordering by message number instead would be pointlessly complex. alert_detail is not a great fit for CSV output anyway, due to its variable length, so just added a warning about that in the text script module doc.
A few things I noticed while porting this code to the JSON script. The only real change here is fixing the bulk history output to print UTC time instead of local time.
This further complicates the code, for functionality that probably only I care about, but when computing stats for relatively long time intervals, it really hurts when the dish reboots and up to an entire time period's worth of data is lost at exactly the point where it may have been having interesting behavior.
Since the alert types are determined dynamically from the protocol definition, the status schema may need to be updated even if nothing changed in the scripts, when the dish software adds a new alert type (which just happened, say hello to the "mast_not_near_vertical" alert). This allows the manual override for that case, not just schema version downgrade.
Probably not terribly useful unless someone needs to tunnel through a different network to get to their dish, but it makes testing the dish unreachable case a lot easier. This was complicated a bit by the fact that a channel (and therefor the dish IP and port) is needed to get the list of alert types via reflection due to prior changes.
This exposed some issues with the error message for dish unreachable, so fixed those.
I had only added the one that the Starlink app uses to show obstructions, because it didn't seem like the other one was all that useful, but people seem to be interested in studying the difference between the 2, so might as well have it. This is in the obstruction_detail group, along with the other one. I'm kinda regretting naming the first one as I did, though, because it's now a little confusing between my naming and the naming in the grpc message.
Since this is a new field, also had to implement schema updates for the sqlite script.
If the spacex grpc modules are not available in the import path, will now fall back to using reflection to get them dynamically. I'm not real happy with the mess this made of the import lines, though (and neither is pylint...), so I may hack on that a little further when I get the time.
Add a requirements.txt file to enable installation of all prerequisites so users don't have to follow the individual instructions for each dependency package. At some point, I'll really need to add proper Python packaging so the whole thing can just be installed via pip.
Rearrange the README a bit, since some of the sections have increased or decreased in relevance over time.
I'm sure this isn't a particularly optimal implementation, but it's functional.
This required exporting knowledge about the types that will be returned per field from starlink_grpc and moving things around a little in dish_common.
Apparently, it was a little _too_ simple.
Also, update the description for seconds_to_first_nonempty_slot field to reflect some behavior this script was able to capture.
Add latency and usage stat groups to the stats computed from history samples. This includes an attempt at characterizing latency under network load, too, but I don't know how useful that's going to be, so I have marked that as experimental, in case it needs algorithmic improvements.
The new groups are enabled on the command line by use of the new mode names: ping_latency, ping_loaded_latency, and usage.
Add valid_s to the obstruction details status group. This was the only missing field from everything available in the status response (other than wedge_fraction_obstructed, which seems redundant to wedge_abs_fraction_obstructed), and I only skipped it because I don't know what it means exactly. Adding it now with my best guess at a description in order to avoid a compatibility breaking change later.
Closes#5
This restores the functionality that the InfluxDB status polling script had whereby instead of using a new grpc Channel for each RPC call, it would keep one open and reuse it, retrying one time if it ever fails, which can happen if the connection is lost between calls. Now all the grpc scripts have this functionality.
Also, hedge a little bit in the descriptions for what the obstruction detail fields means, given that I'm not sure my assumptions there are correct.
Combined the history and status scripts for each data backend and moved some of the shared code into a separate module. Since the existing script names were not appropriate for the new combined versions, the main entry point scripts now have new names, which better conform with Python module name conventions: dish_grpc_text.py, dish_grpc_mqtt.py, and dish_grpc_influx.py. pylint seems happier with those names, at any rate.
Switched the argument parsing from getopt to argparse, since that better facilitates sharing the common bits. The whole command line interface is now different in that the selection of data groups to process must be made as required arg(s) rather than option flags, but for the most part, the scripts support choosing an arbitrary list of groups and will process them all.
Split the monster main() functions into a more reasonable set of functions.
Added new functions to starlink_grpc to support getting the status, which returns the data in a form similar to the history data functions. Reformatted the starlink_grpc module docstring to render better with pydoc. Also changed the way sequence data field names are reported so that the consuming scripts can name them correctly without resorting to hacky special casing based on specific field names. This would subtly break the old scripts that had been expecting the old naming, but those scripts are now gone.
The code is harder to follow now, IMO, but this should allow adding of new features and/or data backends without having to make the same change in 6 places as had been the case. To that end, dish_grpc_text now supports bulk history mode, since it was trivial to add once I had it implemented in order to support that feature for dish_grpc_influx.
Since "current" got added to the global data group returned from getting the history stats in non-bulk mode, it was being output by all 3 of the history scripts, and the name "current" was a little confusing when looking at prior output, since old values would no longer be current. The description of it in the start param of history_bulk_data was confusing, too.
Add tracking of exactly which samples have already been sent off to InfluxDB so that samples are neither missed nor repeated due to minor time deltas in OS task scheduling. For now, this is only being applied to bulk mode.
Make the -s option only apply to the first loop iteration for bulk mode, since subsequent loops will want to pick up all samples since prior iteration.
Also, omit the latency field from the data point sent to InfluxDB for samples where the ping drop is 100%. The raw history data apparently just repeats prior value in this case, probably because it cannot just leave a hole in the data array and there is no good way to indicate invalid.
Related to issue #5
Specific history data patterns would sometimes lead to some of the stats switching between int and float type even if they were always whole numbers. This should ensure that doesn't happen.
I think this will fix#12, but will likely require deleting all the spacex.starlink.user_terminal.ping_stats data points from the database before the type conflict failure will go away.
Unlike the status info scripts, these include support for setting host and database parameters via command line options. Still to be added is support for HTTPS/SSL.
Add a get_id function to the grpc parser module, so it can be used for tagging purposes.
Minor cleanups in some of the other scripts to make them consistent with the newly added scripts.
Move the "samples" stat out of the ping drop group of stats and into a new general stats group.
This way, it will make more sense if/when additional stat groups are added, since that stat will apply to all of them.
Moves the parsing logic that will be shared by some upcoming scripts out into a separate module.
There's still just as much duplication between the JSON parser and the grpc parser as there was before, but this should at least prevent further duplication of this logic.
This also adds some proper documentation for what each of the stats means.