Development | Comma.ai releases comma2k19, a self-driving dataset

December 21, 2018, AI科技评论 (AI Technology Review)

Project page: https://github.com/commaai/comma2k19

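comma.ai has released comma2k19, a dataset of over 33 hours of commuting on California's Highway 280. Concretely, that is 2019 segments, each 1 minute long, recorded on a 20 km section of highway between San Jose and San Francisco, California. comma2k19 is a fully reproducible and scalable dataset. The data was collected with comma EONs, whose sensors are similar to those of any modern smartphone: a road-facing camera, phone GPS, thermometers, and a 9-axis IMU. In addition, the EON uses a comma grey panda to capture raw GNSS measurements and all CAN data sent by the car.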

Publication

For a detailed description of this dataset, see our paper. If you use comma2k19 or Laika in your research, please consider citing it.

@misc{1812.05752,
Author = {Harald Schafer and Eder Santana and Andrew Haden and Riccardo Biasini},
Title = {A Commute in Data: The comma2k19 Dataset},
Year = {2018},
Eprint = {arXiv:1812.05752},
}

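Download

The dataset is about 100 GB in total, so it is split into packages of roughly 10 GB each for download. The download link is on the project page.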

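Example code

The repository contains an example data segment for experimentation. It also has some notebooks with example code, including a position benchmark. This code has only been tested on Python 2.x and Ubuntu 16.04. If you have not installed the relevant packages yet, run pip install -r requirements_examples.txt. The examples consist of a 1-minute sample segment and several sample notebooks.

processed_readers: some examples of reading and plotting the data
position_benchmarks: an example of running the position benchmark used to evaluate fix quality
raw_readers: examples that use openpilot_tools

For examples with raw GNSS, check out Laika.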

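Dataset structure

Directory structure

The data is split into 10 chunks, each holding about 200 minutes of driving. Chunks 1-2 of the dataset come from the RAV4 and the rest from the Civic. The dongle_id of the RAV4 is b0c9d2329ad1606b, and the dongle_id of the Civic is 99c94dc769b5d96e.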

Dataset_chunk_n
|
+-- route_id (dongle_id|start_time)
   |
   +-- segment_number
       |
       +-- preview.png (first frame of the video)
       +-- raw_log.bz2 (raw capnp log, can be read with openpilot-tools: logreader)
       +-- video.hevc (video file, can be read with openpilot-tools: framereader)
       +-- processed_log/ (processed logs as numpy arrays, see format for details)
       +-- global_pos/ (global poses of camera as numpy arrays, see format for details)
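
The layout above can be walked with the standard library alone. The following is a minimal Python 3 sketch, assuming a chunk has been extracted to a local directory and that route directory names keep the dongle_id|start_time form shown in the tree. The helper names parse_route_id and list_segments are illustrative and not part of any comma.ai tool.

```python
from pathlib import Path

# dongle_id -> car, as stated in the dataset description.
CARS = {
    "b0c9d2329ad1606b": "RAV4",
    "99c94dc769b5d96e": "Civic",
}


def parse_route_id(route_id):
    """Split a 'dongle_id|start_time' route name into its two parts."""
    dongle_id, sep, start_time = route_id.partition("|")
    if not sep:
        raise ValueError("expected 'dongle_id|start_time', got %r" % route_id)
    return dongle_id, start_time


def list_segments(chunk_dir):
    """Yield (car, start_time, segment_number, path) for each segment in a chunk."""
    for route_dir in sorted(Path(chunk_dir).iterdir()):
        if not route_dir.is_dir():
            continue
        dongle_id, start_time = parse_route_id(route_dir.name)
        car = CARS.get(dongle_id, "unknown")
        # Segment directories are named by their number, so sort numerically.
        seg_dirs = [d for d in route_dir.iterdir() if d.is_dir() and d.name.isdigit()]
        for seg_dir in sorted(seg_dirs, key=lambda d: int(d.name)):
            yield car, start_time, int(seg_dir.name), seg_dir
```

Sorting segment numbers as integers rather than strings keeps segment 10 after segment 2, which matters when segments are concatenated into a continuous drive.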

Log format

Every log type in the processed_log directory contains 2 numpy arrays: a timestamp array in seconds, measured in the device's boot time, and a value array.
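
Because each log type is a pair of arrays (timestamps in boot time, plus values), two signals sampled at different rates can be compared by resampling one onto the other's timestamps. The following is a minimal pure-Python sketch of that idea using linear interpolation. The function name resample and the sample numbers are made up for illustration; with real data the arrays would first be loaded from the numpy files.

```python
from bisect import bisect_right


def resample(times, values, query_times):
    """Linearly interpolate (times, values) at each query time.

    `times` must be sorted in ascending order. Queries outside the
    logged range are clamped to the first or last value.
    """
    out = []
    for q in query_times:
        i = bisect_right(times, q)
        if i == 0:
            out.append(values[0])
        elif i == len(times):
            out.append(values[-1])
        else:
            t0, t1 = times[i - 1], times[i]
            w = (q - t0) / (t1 - t0)
            out.append(values[i - 1] + w * (values[i] - values[i - 1]))
    return out


# Hypothetical car_speed samples (m/s) and two video frame timestamps (s).
speed_t = [100.0, 100.5, 101.0]
speed_v = [20.0, 21.0, 23.0]
frame_t = [100.25, 100.75]
speed_at_frames = resample(speed_t, speed_v, frame_t)
```

When the arrays are already numpy arrays, numpy.interp(frame_t, speed_t, speed_v) performs the same clamped linear interpolation for one-dimensional values.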

processed_log
|
+--IMU ([forward, right, down])
|  |
|  +--acceleration: (m/s^2)
|  +--gyro_uncalibrated (rad/s)
|  +--gyro_bias: android gyro bias estimate (rad/s)
|  +--gyro: with android bias correction (rad/s)
|  +--magnetic_uncalibrated: (T)
|  +--magnetic: with android calibration(T)
|
+--CAN data:
|  |
|  +--car_speed (m/s)
|  +--steering_angle (deg)
|  +--wheel_speeds: [front_left, front_right, rear_left, rear_right] (m/s)
|  +--radar: [forward distance (m),
|  |          left distance (m),
|  |          nan,
|  |          nan,
|  |          address,
|  |          new_track (bool)]
|  +--raw CAN: This is not stored as a value array but as three separate arrays [src, address, data]
|
+--GNSS
  |
  +--live_gnss_qcom: [latitude (deg),
  |                   longitude (deg),
  |                   speed (m/s),
  |                   utc_timestamp (s),
  |                   altitude (m),
  |                   bearing (deg)]
  +--live_gnss_ublox: [latitude (deg),
  |                    longitude (deg),
  |                    speed (m/s),
  |                    utc_timestamp (s),
  |                    altitude (m),
  |                    bearing (deg)]
  |
  +--raw_gnss_qcom: every row represents a measurement
  |                 of 1 satellite at 1 epoch; can easily
  |                 be manipulated with laika.
  |                 [prn (nmea_id, see laika),
  |                  week of gps_time of reception (gps_week),
  |                  time of week of gps_time of reception (s),
  |                  nan,
  |                  pseudorange (m),
  |                  pseudorange_std (m),
  |                  pseudorange_rate (m/s),
  |                  pseudorange_rate_std (m/s)]
  +--raw_gnss_ublox: every row represents a measurement
                     of 1 satellite at 1 epoch; can easily
                     be manipulated with laika.
                     [prn (nmea_id, see laika),
                      week of gps_time of reception (gps_week),
                      time of week of gps_time of reception (s),
                      GLONASS channel number (-7..6) nan if not GLONASS,
                      pseudorange (m),
                      pseudorange_std (m),
                      pseudorange_rate (m/s),
                      pseudorange_rate_std (m/s)]
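
The raw GNSS rows give the reception time as a GPS week plus a time of week. As a sketch of how that pair maps to a calendar time, the snippet below counts from the GPS epoch (1980-01-06 00:00:00 UTC) and subtracts the GPS-UTC leap-second offset, which has been 18 s since the start of 2017 and so covers recordings from 2018. The function name gps_to_utc is illustrative. Laika has its own time handling, which this sketch does not reproduce.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
SECONDS_PER_WEEK = 7 * 24 * 3600
# GPS time runs ahead of UTC by the leap seconds inserted since 1980.
# 18 s applies from 2017-01-01 onward; earlier dates need a smaller offset.
GPS_UTC_OFFSET_S = 18


def gps_to_utc(gps_week, time_of_week_s):
    """Convert (GPS week, time of week in seconds) to a UTC datetime."""
    gps_seconds = gps_week * SECONDS_PER_WEEK + time_of_week_s
    return GPS_EPOCH + timedelta(seconds=gps_seconds - GPS_UTC_OFFSET_S)
```

The constant offset is a deliberate simplification: it is only correct for timestamps after the most recent leap second, so a general-purpose converter would need a leap-second table instead.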

Pose format

The poses of the camera and the timestamps of every video frame are stored as follows:

frame_times: timestamps of video frames in boot time (s)
frame_gps_times: timestamps of video frames in gps_time: ([gps week (weeks), time-of-week (s)])
frame_positions: global positions in ECEF of camera (m)
frame_velocities: global velocity in ECEF of camera (m/s)
frame_orientations: global orientations as quaternion needed to
                    rotate from ECEF frame to local camera frame
                    defined as [forward, right, down] (Hamilton quaternion!!!!)
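
ECEF coordinates are hard to read directly, so it can help to convert a camera position to latitude, longitude, and height. The sketch below does this with the standard WGS84 ellipsoid constants and a simple fixed-point iteration, and includes the inverse for checking. It assumes the stored ECEF positions are expressed in the WGS84 frame, which the source does not state explicitly. Both function names are illustrative.

```python
import math

# WGS84 ellipsoid constants.
WGS84_A = 6378137.0                    # semi-major axis (m)
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Latitude/longitude (deg) and ellipsoidal height (m) -> ECEF x, y, z (m)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z


def ecef_to_geodetic(x, y, z, iterations=10):
    """ECEF x, y, z (m) -> latitude/longitude (deg) and ellipsoidal height (m).

    Iterates on latitude; not intended for points near the poles
    or near the Earth's centre.
    """
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1.0 - WGS84_E2))  # initial guess
    alt = 0.0
    for _ in range(iterations):
        n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
        alt = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1.0 - WGS84_E2 * n / (n + alt)))
    return math.degrees(lat), math.degrees(lon), alt
```

For the highway between San Jose and San Francisco the iteration converges within a few steps. The fixed count of 10 is a conservative choice, not a tuned value.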

Contact

For any questions, concerns, or suggestions, please contact harald@comma.ai
