o1-preview Papers - 专知

o1-preview

Bridging Technology and Humanities: Evaluating the Impact of Large Language Models on Social Sciences Research with DeepSeek-R1

arXiv

0+ reads · March 20

Bridging Technology and Humanities: Evaluating the Impact of Large Language Models on Social Sciences Research with DeepSeek-R1

arXiv

0+ reads · March 21

A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options

arXiv

0+ reads · January 21

A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options

arXiv

0+ reads · January 3

Can GPT-O1 Kill All Bugs? An Evaluation of GPT-Family LLMs on QuixBugs

arXiv

1+ reads · December 17, 2024

AI Cyber Risk Benchmark: Automated Exploitation Capabilities

arXiv

0+ reads · December 9, 2024

Can OpenAI o1 outperform humans in higher-order cognitive thinking?

arXiv

0+ reads · December 7, 2024

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

arXiv

0+ reads · November 25, 2024

LLMs as Method Actors: A Model for Prompt Engineering and Architecture

arXiv

0+ reads · November 11, 2024

Evaluating the Accuracy of Chatbots in Financial Literature

arXiv

0+ reads · November 11, 2024

LLMs as Method Actors: A Model for Prompt Engineering and Architecture

arXiv

0+ reads · November 8, 2024

From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond

arXiv

0+ reads · November 6, 2024

On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability

arXiv

0+ reads · October 14, 2024

On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability

arXiv

0+ reads · October 11, 2024

Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems

arXiv

0+ reads · October 11, 2024
