Data

📊 NEW 📊 You can now log in to the portal and download FLOTO Netrics data from the new “Data” tab.

(🙋‍♂️ Need help creating an account? Follow these steps here.)

We update this archive with new files weekly from the FLOTO Netrics deployment. Read more below to learn about the data structure and format.

FLOTO Structured Data Publication

In addition to our raw JSON data, we now publish structured CSV files for each measurement table in our data warehouse. These files are cleaned and processed to facilitate easier analysis by researchers and other data users.

Available Tables

We currently publish the following tables:

  1. dev: Contains information about connected devices on the local network.
  2. dns_latency: Measures the latency of DNS queries.
  3. hops: Provides information about the number of network hops to a target.
  4. ip: Contains IP address information for the device.
  5. lml (last-mile latency): Measures various aspects of last-mile network performance.
  6. ping: Measures network latency to specific targets across various global locations and popular websites.
  7. speed_ookla: Contains results from Ookla speed tests.
  8. speed_ndt7: Contains results from NDT7 (Network Diagnostic Tool) speed tests.

Publication Frequency

Tables are updated and published on a weekly basis. Each update includes all new data collected since the last publication.
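Because each weekly file holds only the data collected since the previous publication, a longer time series has to be assembled from several downloads. The following is a minimal sketch of one way to do that with pandas. The sample rows, the device ID, and the timestamp format are invented for illustration, and the de-duplication step is only a safeguard in case the same file is loaded twice.

```python
import io
import pandas as pd

# Two hypothetical weekly downloads of the hops table, inlined so the sketch
# runs without files. In practice each would be pd.read_csv("<your file>").
week1_csv = io.StringIO(
    "device_short_uuid,measurement_datetime,meas__hops_to_target__hops_to_google\n"
    "abc1234,2024-01-01 00:00:00,9\n"
    "abc1234,2024-01-02 00:00:00,10\n"
)
week2_csv = io.StringIO(
    "device_short_uuid,measurement_datetime,meas__hops_to_target__hops_to_google\n"
    "abc1234,2024-01-02 00:00:00,10\n"
    "abc1234,2024-01-08 00:00:00,11\n"
)

weekly_frames = [
    pd.read_csv(f, parse_dates=["measurement_datetime"])
    for f in (week1_csv, week2_csv)
]

# Stack the weekly files, drop rows that appear more than once, and sort so
# that each device's measurements are in time order.
hops_df = (
    pd.concat(weekly_frames, ignore_index=True)
    .drop_duplicates(subset=["device_short_uuid", "measurement_datetime"])
    .sort_values(["device_short_uuid", "measurement_datetime"])
    .reset_index(drop=True)
)
print(hops_df)
```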

Table Structures

Below is an overview of each table’s structure. All tables include common columns such as device_short_uuid, measurement_datetime, and measurement_type, so you can join tables and identify the source and time of each measurement.
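As a starting point, a table can be loaded and checked for the common columns before any analysis. This is a sketch under assumptions: the inlined sample stands in for a downloaded file, its values are made up, and the timestamp format in the real files may differ.

```python
import io
import pandas as pd

# Common columns named in this section.
COMMON_COLUMNS = ["device_short_uuid", "measurement_datetime", "measurement_type"]

# Inlined sample standing in for a downloaded dns_latency file (made-up values).
sample_csv = io.StringIO(
    "device_short_uuid,measurement_datetime,measurement_type,"
    "meas__dns_latency__dns_query_avg_ms\n"
    "abc1234,2024-01-01T00:05:00,dns_latency,23.5\n"
    "def5678,2024-01-01T00:05:00,dns_latency,31.0\n"
)

# Read the device ID as text so it is never interpreted as a number.
dns_df = pd.read_csv(sample_csv, dtype={"device_short_uuid": "string"})

# Report any expected common column that is absent rather than failing later.
missing = [col for col in COMMON_COLUMNS if col not in dns_df.columns]
print("Missing common columns:", missing)

# Parse timestamps explicitly; unparseable values become NaT instead of raising.
dns_df["measurement_datetime"] = pd.to_datetime(
    dns_df["measurement_datetime"], errors="coerce"
)
print(dns_df.dtypes)
```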

dev

This table provides information about devices connected to the local network of the FLOTO device. It helps in understanding network usage patterns.

  • device_short_uuid: Unique identifier for the FLOTO device
  • measurement_datetime: Timestamp of the measurement
  • meas__connected_devices_arp__devices_1day: Number of devices connected in the last 24 hours
  • meas__connected_devices_arp__devices_1week: Number of devices connected in the last week
  • meas__connected_devices_arp__devices_active: Number of currently active devices
  • meas__connected_devices_arp__devices_total: Total number of unique devices ever connected

dns_latency

This table contains measurements of DNS query latency, which is crucial for understanding the responsiveness of DNS services.

  • device_short_uuid: Unique identifier for the FLOTO device
  • measurement_datetime: Timestamp of the measurement
  • meas__dns_latency__dns_query_avg_ms: Average DNS query latency in milliseconds
  • meas__dns_latency__dns_query_max_ms: Maximum DNS query latency in milliseconds

hops

This table provides information about the number of network hops to reach a specific target (Google in this case), which can indicate network path complexity.

  • device_short_uuid: Unique identifier for the FLOTO device
  • measurement_datetime: Timestamp of the measurement
  • meas__hops_to_target__hops_to_google: Number of hops to reach Google’s servers

ip

This table contains IP address information for the FLOTO device, which can be useful for geolocation and network identification purposes.

  • device_short_uuid: Unique identifier for the FLOTO device
  • measurement_datetime: Timestamp of the measurement
  • meas__ipquery__ipv4: IPv4 address of the FLOTO device

lml (last-mile latency)

This table provides detailed measurements of last-mile network performance, focusing on latency to Cloudflare DNS servers. It’s crucial for understanding the quality of the connection between the user and their ISP.

  • device_short_uuid: Unique identifier for the FLOTO device
  • measurement_datetime: Timestamp of the measurement
  • meas__lm_rtt__cloudflare_dns_last_mile_ping_packet_loss_pct: Packet loss percentage
  • meas__lm_rtt__cloudflare_dns_last_mile_ping_rtt_avg_ms: Average round-trip time in milliseconds
  • meas__lm_rtt__cloudflare_dns_last_mile_ping_rtt_max_ms: Maximum round-trip time in milliseconds
  • meas__lm_rtt__cloudflare_dns_last_mile_ping_rtt_min_ms: Minimum round-trip time in milliseconds
  • meas__lm_rtt__cloudflare_dns_last_mile_tr_rtt_median_ms: Median traceroute round-trip time in milliseconds

ping

The ping table contains results from network latency tests to specific targets across various global locations and popular websites. This data is essential for understanding network responsiveness and global connectivity.

  • device_short_uuid: Unique identifier for the FLOTO device
  • measurement_datetime: Timestamp of the measurement
  • meas__ping_latency__[location]_packet_loss_pct: Packet loss percentage for each location
  • meas__ping_latency__[location]_rtt_avg_ms: Average round-trip time in milliseconds for each location
  • meas__ping_latency__[location]_rtt_max_ms: Maximum round-trip time in milliseconds for each location
  • meas__ping_latency__[location]_rtt_min_ms: Minimum round-trip time in milliseconds for each location

Locations include: atlanta, chicago, denver, hong_kong, johannesburg, paris, sao_paulo, seattle, stockholm, sydney, tunis, washington_dc, amazon, facebook, google, suntimes, tribune, uchicago, wikipedia, youtube
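Since the ping table stores one column per location and metric, it can be easier to analyze in long format, with one row per device, timestamp, location, and metric. The sketch below relies on the column naming pattern listed above. The sample row is invented, and the output names location, metric, and value are this example's own choices.

```python
import io
import pandas as pd

# Small made-up sample with three of the ping columns.
sample_csv = io.StringIO(
    "device_short_uuid,measurement_datetime,"
    "meas__ping_latency__chicago_rtt_avg_ms,"
    "meas__ping_latency__hong_kong_rtt_avg_ms,"
    "meas__ping_latency__chicago_packet_loss_pct\n"
    "abc1234,2024-01-01 00:00:00,12.5,190.0,0.0\n"
)
ping_df = pd.read_csv(sample_csv)

id_cols = ["device_short_uuid", "measurement_datetime"]
ping_cols = [c for c in ping_df.columns if c.startswith("meas__ping_latency__")]

# Wide to long: one row per original (row, column) pair.
long_df = ping_df.melt(
    id_vars=id_cols, value_vars=ping_cols, var_name="column", value_name="value"
)

# Split each column name into location and metric. Anchoring on the known
# metric suffixes keeps multi-word locations such as hong_kong intact.
pattern = (
    r"^meas__ping_latency__(?P<location>.+)_"
    r"(?P<metric>packet_loss_pct|rtt_avg_ms|rtt_max_ms|rtt_min_ms)$"
)
parts = long_df["column"].str.extract(pattern)
long_df = pd.concat([long_df[id_cols], parts, long_df["value"]], axis=1)
print(long_df)
```

With the data in this shape, a per-location summary is a single groupby on location and metric.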

speed_ookla

This table contains results from Ookla speed tests, providing detailed information about download and upload speeds, as well as latency and server information.

  • device_short_uuid: Unique identifier for the FLOTO device
  • measurement_datetime: Timestamp of the measurement
  • meas__ookla__speedtest_ookla_download: Download speed in Mbps
  • meas__ookla__speedtest_ookla_upload: Upload speed in Mbps
  • meas__ookla__speedtest_ookla_latency: Latency in milliseconds
  • meas__ookla__speedtest_ookla_jitter: Jitter in milliseconds
  • meas__ookla__speedtest_ookla_pktloss2: Packet loss percentage
  • meas__ookla__speedtest_ookla_server_id: Ookla server ID used for the test
  • meas__ookla__speedtest_ookla_server_name: Name of the Ookla server used
  • meas__ookla__speedtest_ookla_server_host: Hostname of the Ookla server used
  • meas__test_bytes_consumed: Total bytes consumed during the test

speed_ndt7

This table contains results from NDT7 (Network Diagnostic Tool) speed tests, offering comprehensive data on network performance including download and upload speeds, latency, and server information.

  • device_short_uuid: Unique identifier for the FLOTO device
  • measurement_datetime: Timestamp of the measurement
  • meas__ndt7__speedtest_ndt7_download: Download speed in Mbps
  • meas__ndt7__speedtest_ndt7_upload: Upload speed in Mbps
  • meas__ndt7__speedtest_ndt7_downloadlatency: Download latency in milliseconds
  • meas__ndt7__speedtest_ndt7_downloadretrans: Download retransmission rate
  • meas__ndt7__speedtest_ndt7_server: Hostname of the NDT7 server used
  • meas__ndt7__speedtest_ndt7_server_ip: IP address of the NDT7 server used
  • meas__test_bytes_consumed: Total bytes consumed during the test
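Both speed tables report download speed in Mbps, so the two tests can be compared for the same device. The sketch below takes each test's daily median per device and puts the two side by side. The sample values are invented, and daily medians are only one possible way to align tests that run at different times.

```python
import pandas as pd

# Made-up measurements for one device over two days.
ookla_df = pd.DataFrame({
    "device_short_uuid": ["abc1234", "abc1234", "abc1234"],
    "measurement_datetime": pd.to_datetime(
        ["2024-01-01 01:00", "2024-01-01 13:00", "2024-01-02 01:00"]
    ),
    "meas__ookla__speedtest_ookla_download": [100.0, 120.0, 90.0],
})
ndt7_df = pd.DataFrame({
    "device_short_uuid": ["abc1234", "abc1234"],
    "measurement_datetime": pd.to_datetime(["2024-01-01 02:00", "2024-01-02 02:00"]),
    "meas__ndt7__speedtest_ndt7_download": [95.0, 85.0],
})


def daily_median(df, value_col, out_name):
    """Median of value_col per device and calendar day."""
    day = df["measurement_datetime"].dt.floor("D").rename("day")
    return (
        df.groupby(["device_short_uuid", day])[value_col]
        .median()
        .rename(out_name)
        .reset_index()
    )


ookla_daily = daily_median(
    ookla_df, "meas__ookla__speedtest_ookla_download", "ookla_download_mbps"
)
ndt7_daily = daily_median(
    ndt7_df, "meas__ndt7__speedtest_ndt7_download", "ndt7_download_mbps"
)

# Keep only the device-days on which both tests ran.
comparison = ookla_daily.merge(ndt7_daily, on=["device_short_uuid", "day"], how="inner")
comparison["ookla_minus_ndt7_mbps"] = (
    comparison["ookla_download_mbps"] - comparison["ndt7_download_mbps"]
)
print(comparison)
```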

Merging Tables

To combine data across tables, you can use the device_short_uuid column. Here’s an example using Python pandas to merge the ping and speed_ookla tables based on the device UUID:

import pandas as pd

# Load the downloaded CSV files (adjust the file names to match your downloads)
ping_df = pd.read_csv('ping_data.csv')
speed_ookla_df = pd.read_csv('speed_ookla_data.csv')

# Merge ping and speed_ookla data
merged_df = pd.merge(ping_df, speed_ookla_df,
                     on='device_short_uuid',
                     how='inner')

# If you want to merge multiple tables, you can chain the merge operations
# For example, to also merge with the dns_latency table:
# dns_latency_df = pd.read_csv('dns_latency_data.csv')
# merged_df = merged_df.merge(dns_latency_df, 
#                             on='device_short_uuid',
#                             how='inner')

# Now you can perform analysis on the merged data
# For example, to see the average ping latency and download speed for each device:
device_summary = merged_df.groupby('device_short_uuid').agg({
    'meas__ping_latency__chicago_rtt_avg_ms': 'mean',
    'meas__ookla__speedtest_ookla_download': 'mean'
}).reset_index()

print(device_summary)

# To find the correlation between average ping latency and average download speed across devices:
correlation = device_summary['meas__ping_latency__chicago_rtt_avg_ms'].corr(
    device_summary['meas__ookla__speedtest_ookla_download']
)
print(f"Correlation between average Chicago ping latency and average download speed: {correlation}")

This code demonstrates how to merge the ping and speed_ookla tables using pandas based on the device UUID, and then perform some basic analysis on the merged data. You can extend this approach to merge additional tables and perform more complex analyses as needed.

Note: Merging only on device_short_uuid pairs every record for a device in one table with every record for that device in the other, regardless of measurement time. The merged table can therefore be much larger than either input. This works for device-level averages but is not suitable for time-sensitive comparisons, so check that the approach matches your analytical goals.
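For time-sensitive comparisons, one option is to pair each ping row with the speed test closest in time on the same device, using pandas.merge_asof. The sketch below uses invented sample rows. The 30-minute tolerance and the speed_datetime column name are assumptions made for this example, not properties of the published data.

```python
import pandas as pd

# Made-up measurements for two devices.
ping_df = pd.DataFrame({
    "device_short_uuid": ["abc1234", "abc1234", "def5678"],
    "measurement_datetime": pd.to_datetime(
        ["2024-01-01 00:00", "2024-01-01 06:00", "2024-01-01 00:02"]
    ),
    "meas__ping_latency__chicago_rtt_avg_ms": [12.0, 15.0, 30.0],
})
speed_ookla_df = pd.DataFrame({
    "device_short_uuid": ["abc1234", "def5678"],
    "measurement_datetime": pd.to_datetime(["2024-01-01 00:10", "2024-01-01 00:05"]),
    "meas__ookla__speedtest_ookla_download": [100.0, 50.0],
})

# merge_asof requires both inputs to be sorted by the time key.
ping_sorted = ping_df.sort_values("measurement_datetime")
speed_sorted = speed_ookla_df.sort_values("measurement_datetime").rename(
    columns={"measurement_datetime": "speed_datetime"}
)

# For each ping row, take the nearest speed test on the same device, but only
# if it falls within 30 minutes; otherwise the speed columns are left empty.
matched = pd.merge_asof(
    ping_sorted,
    speed_sorted,
    left_on="measurement_datetime",
    right_on="speed_datetime",
    by="device_short_uuid",
    direction="nearest",
    tolerance=pd.Timedelta("30min"),
)
print(matched)
```

Unlike the merge on device_short_uuid alone, this keeps exactly one row per ping measurement, and rows with no nearby speed test can be dropped with dropna.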

Also, ensure that you have the pandas library installed (pip install pandas) before running this code. Adjust the column names if they differ in your actual data files.

Device Metadata

FLOTO provides rich metadata about devices through its API endpoint (https://portal.floto.science/api/devices). Here are some key metadata fields that may be particularly useful for analysis:

  • device_name: The name assigned to the device
  • latitude and longitude: The geographical location of the device
  • is_online: Whether the device is currently online
  • os_version: The version of the operating system running on the device
  • cpu_temp: The CPU temperature of the device
  • memory_usage and memory_total: Memory usage statistics
  • storage_usage and storage_total: Storage usage statistics
  • ip_address: List of IP addresses associated with the device

This metadata can be merged with the performance data using the device UUID. Here’s an example using Python pandas to merge performance data with device metadata:

import pandas as pd
import requests

# Load performance data
ping_df = pd.read_csv('ping_data.csv')
speed_ookla_df = pd.read_csv('speed_ookla_data.csv')

# Fetch metadata from the FLOTO API and stop early on an HTTP error
response = requests.get('https://portal.floto.science/api/devices', timeout=30)
response.raise_for_status()
metadata = pd.DataFrame(response.json())

# Rename the 'uuid' column in metadata to match the performance data
metadata = metadata.rename(columns={'uuid': 'device_short_uuid'})

# Merge performance data
merged_df = pd.merge(ping_df, speed_ookla_df, on='device_short_uuid', how='inner')

# Merge with metadata
full_df = pd.merge(merged_df, metadata, on='device_short_uuid', how='left')

# Now you can perform analysis using both performance data and metadata
# For example, to see the average download speed by OS version:
os_speed = full_df.groupby('os_version')['meas__ookla__speedtest_ookla_download'].mean().reset_index()
print(os_speed)

# Or to find the correlation between CPU temperature and download speed:
correlation = full_df['cpu_temp'].corr(full_df['meas__ookla__speedtest_ookla_download'])
print(f"Correlation between CPU temperature and download speed: {correlation}")

# You can also filter data based on metadata
# For example, to only look at online devices:
online_devices = full_df[full_df['is_online'] == True]

# Or to compare performance across different locations:
location_performance = full_df.groupby(['latitude', 'longitude'])[
    ['meas__ping_latency__chicago_rtt_avg_ms', 'meas__ookla__speedtest_ookla_download']
].mean().reset_index()
print(location_performance)

This code demonstrates how to fetch metadata from the FLOTO API, merge it with performance data, and perform analyses that incorporate both types of data.

Note: Ensure that you have the necessary libraries installed (pip install pandas requests) before running this code. Also, you may need to handle authentication for the API request depending on the FLOTO API setup.
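If the API does require authentication, the request can be wrapped in a small helper. The sketch below is hypothetical: the bearer-token header, the fetch_device_metadata name, and the assumption that the endpoint returns a JSON list of device objects are all illustrative guesses and not documented FLOTO behavior. Check the portal for the actual scheme.

```python
import pandas as pd
import requests

DEVICES_URL = "https://portal.floto.science/api/devices"


def fetch_device_metadata(token=None, session=None, url=DEVICES_URL):
    """Return device metadata as a DataFrame.

    ASSUMPTION: bearer-token authentication and a JSON list-of-objects
    response are hypothetical; the real API may work differently.
    """
    http = session or requests.Session()
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    response = http.get(url, headers=headers, timeout=30)
    # Surface 401/403/5xx responses instead of trying to parse an error body.
    response.raise_for_status()
    return pd.DataFrame(response.json())


# Example call (commented out because it needs network access and a valid token):
# metadata = fetch_device_metadata(token="YOUR_TOKEN")
```

Passing a session makes it possible to reuse a connection across calls or to substitute a stub when testing.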

By combining performance data with device metadata, researchers can conduct more comprehensive analyses, controlling for factors like device location, hardware specifications, and operational status. This can lead to more nuanced insights into network performance across different contexts and device configurations.

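You can join this metadata with any of the measurement tables on the device UUID, which is the device_short_uuid column in the measurement tables and the uuid field in the API response (renamed in the example above).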

Accessing the Data

CSV files can be accessed through the FLOTO Data Portal. After logging in, navigate to the “Data” section to download the latest CSV files for each table.

For any questions about the structured data or for access to historical data, please contact us at contact@floto.science.