📊 NEW 📊 To access FLOTO data, you can now log into the portal to download Netrics data from the new “Data” tab.
(🙋♂️ Need help creating an account? Follow these steps here.)
We update this archive with new files weekly from the FLOTO Netrics deployment. Read more below to learn about the data structure and format.
FLOTO Structured Data Publication
In addition to our raw JSON data, we now publish structured CSV files for each measurement table in our data warehouse. These files are cleaned and processed to facilitate easier analysis by researchers and other data users.
Available Tables
We currently publish the following tables:
- dev: Contains information about connected devices on the local network.
- dns_latency: Measures the latency of DNS queries.
- hops: Provides information about the number of network hops to a target.
- ip: Contains IP address information for the device.
- lml (last-mile latency): Measures various aspects of last-mile network performance.
- ping: Measures network latency to specific targets across various global locations and popular websites.
- speed_ookla: Contains results from Ookla speed tests.
- speed_ndt7: Contains results from NDT7 (Network Diagnostic Tool) speed tests.
Publication Frequency
Tables are updated and published on a weekly basis. Each update includes all new data collected since the last publication.
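Because each weekly file carries only the data collected since the previous publication, building a longer history means stacking several downloads. The following is a minimal sketch of that step, not an official workflow. The small in-memory frames stand in for two weekly CSV downloads, and the de-duplication key (device plus timestamp) is an assumption about what identifies a measurement.

```python
import pandas as pd

def combine_weekly(frames):
    """Stack weekly frames and drop rows that appear in more than one file."""
    combined = pd.concat(frames, ignore_index=True)
    # Assumption: a device and timestamp pair identifies one measurement.
    combined = combined.drop_duplicates(
        subset=["device_short_uuid", "measurement_datetime"]
    )
    return combined.sort_values(
        ["device_short_uuid", "measurement_datetime"]
    ).reset_index(drop=True)

# Hypothetical samples standing in for two weekly hops downloads.
week1 = pd.DataFrame({
    "device_short_uuid": ["abc1234", "abc1234"],
    "measurement_datetime": pd.to_datetime(["2024-05-01 12:00", "2024-05-07 12:00"]),
    "meas__hops_to_target__hops_to_google": [11, 12],
})
week2 = pd.DataFrame({
    "device_short_uuid": ["abc1234", "abc1234"],
    "measurement_datetime": pd.to_datetime(["2024-05-07 12:00", "2024-05-08 12:00"]),
    "meas__hops_to_target__hops_to_google": [12, 11],
})

history_df = combine_weekly([week1, week2])
print(history_df)
```

If the weekly files never overlap, the de-duplication step is a no-op and can be dropped.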
Table Structures
Below is an overview of each table’s structure. All tables include common columns such as device_short_uuid, measurement_datetime, and measurement_type to allow for joining across tables and identifying the source and timing of measurements.
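As a starting point, the common columns can be prepared once after loading a table. This is a minimal sketch: the inline CSV text is a hypothetical stand-in for a downloaded file, and the timestamp format shown is an assumption, so check it against your actual files.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for a downloaded hops CSV.
sample_csv = io.StringIO(
    "device_short_uuid,measurement_datetime,measurement_type,"
    "meas__hops_to_target__hops_to_google\n"
    "abc1234,2024-05-01 13:00:00,hops,12\n"
    "abc1234,2024-05-01 12:00:00,hops,11\n"
)

hops_df = pd.read_csv(sample_csv)

# Convert the timestamp column from text to a datetime type so that
# sorting, resampling, and time-based joins behave as expected.
hops_df["measurement_datetime"] = pd.to_datetime(hops_df["measurement_datetime"])

# Order rows by device and time.
hops_df = hops_df.sort_values(
    ["device_short_uuid", "measurement_datetime"]
).reset_index(drop=True)
print(hops_df.dtypes)
```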
dev
This table provides information about devices connected to the local network of the FLOTO device. It helps in understanding network usage patterns.
- device_short_uuid: Unique identifier for the FLOTO device
- measurement_datetime: Timestamp of the measurement
- meas__connected_devices_arp__devices_1day: Number of devices connected in the last 24 hours
- meas__connected_devices_arp__devices_1week: Number of devices connected in the last week
- meas__connected_devices_arp__devices_active: Number of currently active devices
- meas__connected_devices_arp__devices_total: Total number of unique devices ever connected
dns_latency
This table contains measurements of DNS query latency, which is crucial for understanding the responsiveness of DNS services.
- device_short_uuid: Unique identifier for the FLOTO device
- measurement_datetime: Timestamp of the measurement
- meas__dns_latency__dns_query_avg_ms: Average DNS query latency in milliseconds
- meas__dns_latency__dns_query_max_ms: Maximum DNS query latency in milliseconds
hops
This table provides information about the number of network hops to reach a specific target (Google in this case), which can indicate network path complexity.
- device_short_uuid: Unique identifier for the FLOTO device
- measurement_datetime: Timestamp of the measurement
- meas__hops_to_target__hops_to_google: Number of hops to reach Google’s servers
ip
This table contains IP address information for the FLOTO device, which can be useful for geolocation and network identification purposes.
- device_short_uuid: Unique identifier for the FLOTO device
- measurement_datetime: Timestamp of the measurement
- meas__ipquery__ipv4: IPv4 address of the FLOTO device
lml (last-mile latency)
This table provides detailed measurements of last-mile network performance, focusing on latency to Cloudflare DNS servers. It’s crucial for understanding the quality of the connection between the user and their ISP.
- device_short_uuid: Unique identifier for the FLOTO device
- measurement_datetime: Timestamp of the measurement
- meas__lm_rtt__cloudflare_dns_last_mile_ping_packet_loss_pct: Packet loss percentage
- meas__lm_rtt__cloudflare_dns_last_mile_ping_rtt_avg_ms: Average round-trip time in milliseconds
- meas__lm_rtt__cloudflare_dns_last_mile_ping_rtt_max_ms: Maximum round-trip time in milliseconds
- meas__lm_rtt__cloudflare_dns_last_mile_ping_rtt_min_ms: Minimum round-trip time in milliseconds
- meas__lm_rtt__cloudflare_dns_last_mile_tr_rtt_median_ms: Median traceroute round-trip time in milliseconds
ping
The ping table contains results from network latency tests to specific targets across various global locations and popular websites. This data is essential for understanding network responsiveness and global connectivity.
- device_short_uuid: Unique identifier for the FLOTO device
- measurement_datetime: Timestamp of the measurement
- meas__ping_latency__[location]_packet_loss_pct: Packet loss percentage for each location
- meas__ping_latency__[location]_rtt_avg_ms: Average round-trip time in milliseconds for each location
- meas__ping_latency__[location]_rtt_max_ms: Maximum round-trip time in milliseconds for each location
- meas__ping_latency__[location]_rtt_min_ms: Minimum round-trip time in milliseconds for each location
Locations include: atlanta, chicago, denver, hong_kong, johannesburg, paris, sao_paulo, seattle, stockholm, sydney, tunis, washington_dc, amazon, facebook, google, suntimes, tribune, uchicago, wikipedia, youtube
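Since the ping table stores one column per location and metric, comparing locations is often easier after reshaping to a long format with one row per device, timestamp, location, and metric. The sketch below assumes the column naming pattern listed above. The helper name ping_to_long and the sample values are illustrative only. Location names can themselves contain underscores (hong_kong, washington_dc), so the pattern anchors on the known metric suffixes rather than splitting on underscores.

```python
import pandas as pd

METRICS = ["packet_loss_pct", "rtt_avg_ms", "rtt_max_ms", "rtt_min_ms"]
# Named groups become the column names returned by str.extract.
PATTERN = (
    r"^meas__ping_latency__(?P<location>.+)_(?P<metric>"
    + "|".join(METRICS)
    + r")$"
)

def ping_to_long(ping_df):
    """Reshape wide ping columns into location/metric/value rows."""
    id_cols = ["device_short_uuid", "measurement_datetime"]
    value_cols = [c for c in ping_df.columns if pd.Series([c]).str.match(PATTERN).iloc[0]]
    long_df = ping_df.melt(
        id_vars=id_cols, value_vars=value_cols,
        var_name="column", value_name="value",
    )
    parts = long_df["column"].str.extract(PATTERN)
    long_df["location"] = parts["location"]
    long_df["metric"] = parts["metric"]
    return long_df.drop(columns="column")

# Hypothetical one-row sample with a subset of the ping columns.
ping_df = pd.DataFrame({
    "device_short_uuid": ["abc1234"],
    "measurement_datetime": ["2024-05-01 12:00:00"],
    "meas__ping_latency__chicago_rtt_avg_ms": [12.5],
    "meas__ping_latency__hong_kong_rtt_avg_ms": [210.0],
    "meas__ping_latency__hong_kong_packet_loss_pct": [0.0],
})

long_df = ping_to_long(ping_df)
print(long_df)
```

From the long form, a call such as long_df[long_df["metric"] == "rtt_avg_ms"].groupby("location")["value"].mean() gives a per-location average without naming every column.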
speed_ookla
This table contains results from Ookla speed tests, providing detailed information about download and upload speeds, as well as latency and server information.
- device_short_uuid: Unique identifier for the FLOTO device
- measurement_datetime: Timestamp of the measurement
- meas__ookla__speedtest_ookla_download: Download speed in Mbps
- meas__ookla__speedtest_ookla_upload: Upload speed in Mbps
- meas__ookla__speedtest_ookla_latency: Latency in milliseconds
- meas__ookla__speedtest_ookla_jitter: Jitter in milliseconds
- meas__ookla__speedtest_ookla_pktloss2: Packet loss percentage
- meas__ookla__speedtest_ookla_server_id: Ookla server ID used for the test
- meas__ookla__speedtest_ookla_server_name: Name of the Ookla server used
- meas__ookla__speedtest_ookla_server_host: Hostname of the Ookla server used
- meas__test_bytes_consumed: Total bytes consumed during the test
speed_ndt7
This table contains results from NDT7 (Network Diagnostic Tool) speed tests, offering comprehensive data on network performance including download and upload speeds, latency, and server information.
- device_short_uuid: Unique identifier for the FLOTO device
- measurement_datetime: Timestamp of the measurement
- meas__ndt7__speedtest_ndt7_download: Download speed in Mbps
- meas__ndt7__speedtest_ndt7_upload: Upload speed in Mbps
- meas__ndt7__speedtest_ndt7_downloadlatency: Download latency in milliseconds
- meas__ndt7__speedtest_ndt7_downloadretrans: Download retransmission rate
- meas__ndt7__speedtest_ndt7_server: Hostname of the NDT7 server used
- meas__ndt7__speedtest_ndt7_server_ip: IP address of the NDT7 server used
- meas__test_bytes_consumed: Total bytes consumed during the test
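Speed test rows are often summarized per device and per day before further analysis. The following sketch shows one way to do that with the speed_ndt7 columns listed above. The sample values are made up, and the choice of a daily median is an illustrative assumption rather than a recommended methodology.

```python
import pandas as pd

# Hypothetical sample rows standing in for a downloaded speed_ndt7 CSV.
speed_ndt7_df = pd.DataFrame({
    "device_short_uuid": ["abc1234", "abc1234", "abc1234"],
    "measurement_datetime": pd.to_datetime(
        ["2024-05-01 03:00", "2024-05-01 15:00", "2024-05-02 03:00"]
    ),
    "meas__ndt7__speedtest_ndt7_download": [240.0, 260.0, 255.0],
    "meas__test_bytes_consumed": [400_000_000, 420_000_000, 410_000_000],
})

daily = (
    speed_ndt7_df
    # Truncate each timestamp to midnight to get a per-day grouping key.
    .assign(day=speed_ndt7_df["measurement_datetime"].dt.floor("D"))
    .groupby(["device_short_uuid", "day"])
    .agg(
        median_download=("meas__ndt7__speedtest_ndt7_download", "median"),
        total_bytes=("meas__test_bytes_consumed", "sum"),
        tests=("meas__test_bytes_consumed", "size"),
    )
    .reset_index()
)
print(daily)
```

The same pattern applies to speed_ookla by swapping in the meas__ookla__ column names.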
Merging Tables
To combine data across tables, you can use the device_short_uuid column. Here’s an example using Python pandas to merge the ping and speed_ookla tables based on the device UUID:
    import pandas as pd

    # Load the downloaded CSV files (adjust the file names to match your downloads)
    ping_df = pd.read_csv('ping_data.csv')
    speed_ookla_df = pd.read_csv('speed_ookla_data.csv')

    # Merge ping and speed_ookla data.
    # Both tables share other column names (such as measurement_datetime),
    # so explicit suffixes keep the two copies distinguishable.
    merged_df = pd.merge(ping_df, speed_ookla_df,
                         on='device_short_uuid',
                         how='inner',
                         suffixes=('_ping', '_ookla'))

    # If you want to merge multiple tables, you can chain the merge operations.
    # For example, to also merge with the dns_latency table:
    # dns_latency_df = pd.read_csv('dns_latency_data.csv')
    # merged_df = merged_df.merge(dns_latency_df,
    #                             on='device_short_uuid',
    #                             how='inner')

    # Now you can perform analysis on the merged data.
    # For example, to see the average ping latency and download speed for each device:
    device_summary = merged_df.groupby('device_short_uuid').agg({
        'meas__ping_latency__chicago_rtt_avg_ms': 'mean',
        'meas__ookla__speedtest_ookla_download': 'mean'
    }).reset_index()
    print(device_summary)

    # To find the correlation between average ping latency and
    # average download speed across devices:
    correlation = device_summary['meas__ping_latency__chicago_rtt_avg_ms'].corr(
        device_summary['meas__ookla__speedtest_ookla_download']
    )
    print(f"Correlation between average Chicago ping latency and average download speed: {correlation}")
This code demonstrates how to merge the ping and speed_ookla tables using pandas based on the device UUID, and then perform some basic analysis on the merged data. You can extend this approach to merge additional tables and perform more complex analyses as needed.
Note: Merging only on device_short_uuid pairs every record from one table with every record from the other table for the same device, regardless of when each measurement was taken. This can be useful for device-level analysis but may not be suitable for time-sensitive comparisons, and it can make the merged table much larger than either input. Ensure that this approach aligns with your analytical goals.
Also, ensure that you have the pandas library installed (pip install pandas) before running this code. Adjust the column names if they differ in your actual data files.
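For time-sensitive comparisons, one alternative to merging on device_short_uuid alone is to pair each record with the nearest-in-time record from the other table for the same device. The sketch below uses pandas merge_asof for this. The sample rows and the 30-minute tolerance are illustrative assumptions, so choose a window that matches how often the measurements actually run.

```python
import pandas as pd

# Hypothetical samples standing in for the ping and speed_ookla CSVs.
ping_df = pd.DataFrame({
    "device_short_uuid": ["abc1234", "abc1234", "def5678"],
    "measurement_datetime": pd.to_datetime(
        ["2024-05-01 12:00", "2024-05-01 18:00", "2024-05-01 12:05"]
    ),
    "meas__ping_latency__chicago_rtt_avg_ms": [12.0, 15.0, 30.0],
})
speed_ookla_df = pd.DataFrame({
    "device_short_uuid": ["abc1234", "def5678"],
    "measurement_datetime": pd.to_datetime(
        ["2024-05-01 12:10", "2024-05-01 12:00"]
    ),
    "meas__ookla__speedtest_ookla_download": [250.0, 90.0],
})

# merge_asof requires both inputs to be sorted by the time key.
aligned_df = pd.merge_asof(
    ping_df.sort_values("measurement_datetime"),
    speed_ookla_df.sort_values("measurement_datetime"),
    on="measurement_datetime",          # time key used for nearest matching
    by="device_short_uuid",             # only match rows from the same device
    direction="nearest",                # closest speed test before or after
    tolerance=pd.Timedelta("30min"),    # assumed window; no match leaves NaN
)
print(aligned_df)
```

Ping rows with no speed test inside the window keep their ping values and get NaN in the speed columns, so they can be dropped or kept depending on the analysis.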
Device Metadata
FLOTO provides rich metadata about devices through its API endpoint (https://portal.floto.science/api/devices). Here are some key metadata fields that may be particularly useful for analysis:
- device_name: The name assigned to the device
- latitude and longitude: The geographical location of the device
- is_online: Whether the device is currently online
- os_version: The version of the operating system running on the device
- cpu_temp: The CPU temperature of the device
- memory_usage and memory_total: Memory usage statistics
- storage_usage and storage_total: Storage usage statistics
- ip_address: List of IP addresses associated with the device
This metadata can be merged with the performance data using the device UUID. Here’s an example using Python pandas to merge performance data with device metadata:
    import pandas as pd
    import requests

    # Load performance data
    ping_df = pd.read_csv('ping_data.csv')
    speed_ookla_df = pd.read_csv('speed_ookla_data.csv')

    # Fetch metadata from the FLOTO API
    response = requests.get('https://portal.floto.science/api/devices', timeout=30)
    response.raise_for_status()  # Stop early on an HTTP error response
    metadata = pd.DataFrame(response.json())

    # Rename the 'uuid' column in metadata to match the performance data
    metadata = metadata.rename(columns={'uuid': 'device_short_uuid'})

    # Merge performance data
    merged_df = pd.merge(ping_df, speed_ookla_df, on='device_short_uuid', how='inner')

    # Merge with metadata
    full_df = pd.merge(merged_df, metadata, on='device_short_uuid', how='left')

    # Now you can perform analysis using both performance data and metadata.
    # For example, to see the average download speed by OS version:
    os_speed = full_df.groupby('os_version')['meas__ookla__speedtest_ookla_download'].mean().reset_index()
    print(os_speed)

    # Or to find the correlation between CPU temperature and download speed:
    correlation = full_df['cpu_temp'].corr(full_df['meas__ookla__speedtest_ookla_download'])
    print(f"Correlation between CPU temperature and download speed: {correlation}")

    # You can also filter data based on metadata.
    # For example, to only look at online devices:
    online_devices = full_df[full_df['is_online'] == True]

    # Or to compare performance across different locations:
    location_performance = full_df.groupby(['latitude', 'longitude'])[
        ['meas__ping_latency__chicago_rtt_avg_ms', 'meas__ookla__speedtest_ookla_download']
    ].mean().reset_index()
    print(location_performance)
This code demonstrates how to fetch metadata from the FLOTO API, merge it with performance data, and perform analyses that incorporate both types of data.
Note: Ensure that you have the necessary libraries installed (pip install pandas requests) before running this code. Also, you may need to handle authentication for the API request depending on the FLOTO API setup.
By combining performance data with device metadata, researchers can conduct more comprehensive analyses, controlling for factors like device location, hardware specifications, and operational status. This can lead to more nuanced insights into network performance across different contexts and device configurations.
You can join this metadata with any of the measurement tables using the device identifier column (device_short_uuid in the measurement tables).
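When joining metadata to a measurement table, it can help to confirm that the identifiers actually line up. The sketch below assumes the metadata identifier column has already been renamed to device_short_uuid and that the metadata holds one row per device. Both frames are small hypothetical samples, not real FLOTO records.

```python
import pandas as pd

# Hypothetical samples: measurement rows and one-row-per-device metadata.
speed_df = pd.DataFrame({
    "device_short_uuid": ["abc1234", "abc1234", "zzz9999"],
    "meas__ookla__speedtest_ookla_download": [250.0, 240.0, 80.0],
})
metadata = pd.DataFrame({
    "device_short_uuid": ["abc1234", "def5678"],
    "device_name": ["library-01", "school-02"],
})

checked_df = pd.merge(
    speed_df,
    metadata,
    on="device_short_uuid",
    how="left",
    validate="many_to_one",  # raises if metadata repeats a device identifier
    indicator=True,          # adds a _merge column describing each row's match
)

# Devices present in the measurements but missing from the metadata.
unmatched = checked_df.loc[
    checked_df["_merge"] == "left_only", "device_short_uuid"
].unique()
print(unmatched)
```

A large number of unmatched devices usually points to a mismatch in identifier format rather than genuinely missing metadata.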
Accessing the Data
CSV files can be accessed through the FLOTO Data Portal. After logging in, navigate to the “Data” section to download the latest CSV files for each table.
For any questions about the structured data or for access to historical data, please contact us at contact@floto.science.