Transaction: 899da4c2b831796a2a4dd2ed21fc9fa857a4fb5cd566cd1004f96c3b282eee46

#0

0.00000000 BSV

meta
03b6bc0cb7add6342d7037445b3705ae2f3c17c73cd0e000ab61dfa646e2453de7
104e04f4dc7bbb58b675a0be8ec8a2392cd828cadc0c1b85347e2d4ab003150e
rss.item
metarss.net
<item> <title>NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security</title> <link>https://arxiv.org/abs/2406.05590</link> <description>arXiv:2406.05590v2 Announce Type: replace Abstract: Large Language Models (LLMs) are being deployed across various domains today. However, their capacity to solve Capture the Flag (CTF) challenges in cybersecurity has not been thoroughly evaluated. To address this, we develop a novel method to assess LLMs in solving CTF challenges by creating a scalable, open-source benchmark database specifically designed for these applications. This database includes metadata for LLM testing and adaptive learning, compiling a diverse range of CTF challenges from popular competitions. Utilizing the advanced function calling capabilities of LLMs, we build a fully automated system with an enhanced workflow and support for external tool calls. Our benchmark dataset and automated framework allow us to evaluate the performance of five LLMs, encompassing both black-box and open-source models. This work lays the foundation for future research into improving the efficiency of LLMs in interactive cybersecurity tasks and automated task planning. By providing a specialized dataset, our project offers an ideal platform for developing, testing, and refining LLM-based approaches to vulnerability detection and resolution. Evaluating LLMs on these challenges and comparing with human performance yields insights into their potential for AI-driven cybersecurity solutions to perform real-world threat management. 
We make our dataset open source to public https://github.com/NYU-LLM-CTF/LLM_CTF_Database along with our playground automated framework https://github.com/NYU-LLM-CTF/llm_ctf_automation.</description> <guid isPermaLink="false">oai:arXiv.org:2406.05590v2</guid> <category>cs.CR</category> <category>cs.AI</category> <category>cs.CY</category> <category>cs.LG</category> <arxiv:announce_type>replace</arxiv:announce_type> <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights> <dc:creator>Minghao Shao, Sofija Jancheska, Meet Udeshi, Brendan Dolan-Gavitt, Haoran Xi, Kimberly Milner, Boyuan Chen, Max Yin, Siddharth Garg, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri, Muhammad Shafique</dc:creator> </item>
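
Output #0 holds no value and carries its data as a sequence of script pushes: a protocol tag, two hex strings, a type label, a domain, and the RSS item itself. The following is a minimal sketch of how such a data-carrier script could be split back into its fields. It assumes standard Bitcoin script push encoding (direct pushes of up to 75 bytes, then OP_PUSHDATA1/2/4 with little-endian lengths). The helper names `parse_pushes` and `push` and the sample bytes are hypothetical, chosen for illustration; they do not come from the explorer, and the sample is not the raw script of this transaction.

```python
import struct

OP_RETURN = 0x6A
OP_PUSHDATA1, OP_PUSHDATA2, OP_PUSHDATA4 = 0x4C, 0x4D, 0x4E


def parse_pushes(script: bytes) -> list:
    """Split a data-carrier script into the byte strings it pushes."""
    i = 0
    # Skip the marker: either OP_FALSE OP_RETURN or a bare OP_RETURN.
    if script.startswith(bytes([0x00, OP_RETURN])):
        i = 2
    elif script.startswith(bytes([OP_RETURN])):
        i = 1
    fields = []
    while i < len(script):
        op = script[i]
        i += 1
        if op <= 75:
            size = op  # the opcode itself is the length (0 = empty push)
        elif op == OP_PUSHDATA1:
            size = script[i]
            i += 1
        elif op == OP_PUSHDATA2:
            (size,) = struct.unpack_from("<H", script, i)
            i += 2
        elif op == OP_PUSHDATA4:
            (size,) = struct.unpack_from("<I", script, i)
            i += 4
        else:
            raise ValueError(f"unexpected opcode 0x{op:02x} at offset {i - 1}")
        if i + size > len(script):
            raise ValueError("push runs past the end of the script")
        fields.append(script[i:i + size])
        i += size
    return fields


def push(data: bytes) -> bytes:
    """Encode one field with the shortest standard push opcode."""
    n = len(data)
    if n <= 75:
        return bytes([n]) + data
    if n <= 0xFF:
        return bytes([OP_PUSHDATA1, n]) + data
    if n <= 0xFFFF:
        return bytes([OP_PUSHDATA2]) + struct.pack("<H", n) + data
    return bytes([OP_PUSHDATA4]) + struct.pack("<I", n) + data


# Illustrative script: three short labels plus a 300-byte stand-in payload.
script = (
    bytes([OP_RETURN])
    + push(b"meta")
    + push(b"rss.item")
    + push(b"metarss.net")
    + push(b"x" * 300)
)
fields = parse_pushes(script)
print([f[:12] for f in fields])
```

A payload longer than 255 bytes, such as an RSS item with a full abstract, would need OP_PUSHDATA2 under this encoding, which is why the sketch handles the two-byte length form explicitly.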

https://whatsonchain.com/tx/899da4c2b831796a2a4dd2ed21fc9fa857a4fb5cd566cd1004f96c3b282eee46

#1

1H3ojA4K3r2Hg6sFFWqwBkrDMGfbgM5TGy

0.00000001 BSV

#2

1EUryCwVWWx1uvQLmmdcsomk8LebFuxyQN

spent 08c424448a474812e5385efb3d67c98ffcbf43e93e62bf70f640b944e2b30284 [0]

0.00312801 BSV

1 Input

3 Outputs