Upload date
All time
Last hour
Today
This week
This month
This year
Type
All
Video
Channel
Playlist
Movie
Duration
Short (< 4 minutes)
Medium (4-20 minutes)
Long (> 20 minutes)
Sort by
Relevance
Rating
View count
Features
HD
Subtitles/CC
Creative Commons
3D
Live
4K
360°
VR180
HDR
652 results
https://arxiv.org/pdf/2603.03823 SWE-CI: Evaluating Long-Term Code Maintainability via Continuous Integration Agents The ...
70 views
6 days ago
https://www.anthropic.com/engineering/eval-awareness-browsecomp Eval Awareness in Claude Opus 4.6 BrowseComp ...
57 views
7 days ago
In this AI Research Roundup episode, Alex discusses the paper: 'SWE-CI: Evaluating Agent Capabilities in Maintaining ...
49 views
Get ALL of our systems & join hundreds of AI builders in our community ...
77,770 views
I'll show you exactly where AI-generated code fell short, and what human code review caught that the tests never would. What we ...
541 views
2 days ago
Your LLM passed the demo. It failed production. Here's how to fix that. Most teams ship RAG pipelines with zero evaluation — no ...
134 views
Stop relying on a single metric to judge your AI. Most AI teams face a massive "evaluation blind spot." Your model might score ...
4 views
5 days ago
1 view
In this AI Research Roundup episode, Alex discusses the paper: 'A Rubric-Supervised Critic from Sparse Real-World Outcomes' ...
8 views
The approach replaces sporadic maintenance with ongoing AI driven code evaluation that steadily refines performance and ...
169 views
Autoresearch AI Experiment Framework https://github.com/karpathy/autoresearch AutoResearch at Home Distributed Agent ...
31 views
Could AI pass the CWI exam? I tested ChatGPT against ASME B31.3 code with interesting results. In this video, I test artificial ...
34 views
4 days ago
Your metrics aren't real until your labels are — build a labeling workflow that compounds into a reliable golden set for LLM ...
0 views
Evaluation Orders for Syntax Directed Definitions in Compiler Design, Evaluation Orders for Syntax Directed Definitions in ...
Artifacts, projects, computer use, MCP, Claude Code In this episode of AI Daily, we review Claude, Anthropic's powerful AI.
88 views
In this AI Research Roundup episode, Alex discusses the paper: 'ZeroDayBench: Evaluating LLM Agents on Unseen Zero-Day ...
7 views
Claude Code Skills 2.0 adds auto-evaluations and self-improvement to the skill creator. Anthropic's tests showed 100% accuracy ...
124 views
1 day ago
Anthropic just published a paper showing Claude Opus 4.6 figured out it was being tested on BrowseComp, found the encrypted ...
5,931 views
3 days ago
In this video you'll learn the fundamentals of Python for Machine Learning, important Python libraries used in ML, and how ...
5 views
This educational presentation breaks down the simplified approach to Evaluation and Management (EM) coding under new ...
21 hours ago