<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Many Matrices on dibbla.space</title><link>https://dibbla.space/posts/matrices/</link><description>Recent content in Many Matrices on dibbla.space</description><generator>Hugo</generator><language>en-us</language><copyright>by Dibbla with ❤</copyright><lastBuildDate>Thu, 21 May 2026 00:19:20 -0700</lastBuildDate><atom:link href="https://dibbla.space/posts/matrices/index.xml" rel="self" type="application/rss+xml"/><item><title>Several thoughts on automated science</title><link>https://dibbla.space/posts/matrices/automated_science/</link><pubDate>Thu, 21 May 2026 00:19:20 -0700</pubDate><guid>https://dibbla.space/posts/matrices/automated_science/</guid><description>&lt;p>When I first wrote down the hook of this blog post last month, I was thinking about something narrow: AI models have become very good at coding, and this is changing how machine learning research gets done. My thought was that the field might be shifting from a &lt;em>workflow&lt;/em> problem to an &lt;em>evaluation&lt;/em> problem, where the hard part is no longer doing the work but judging which results advance human understanding.&lt;/p></description></item><item><title>The intervention-based imitation learning (IIL) family</title><link>https://dibbla.space/posts/matrices/iil_family/</link><pubDate>Sat, 21 Oct 2023 11:37:48 +0800</pubDate><guid>https://dibbla.space/posts/matrices/iil_family/</guid><description>&lt;p>&lt;em>Update Nov 2025:&lt;/em> I am surprised that PI integrated an IIL method into the &lt;a href="https://arxiv.org/abs/2511.14759">$\pi$*-0.6 model&lt;/a>, and I firmly believe that humans / end-users will be integrated into the post-post-training of robotic foundation models in some way.&lt;/p>
&lt;hr>
&lt;p>In this post, we discuss online imitation learning with humans involved, from DAgger to HG-DAgger and more recent advances.&lt;/p>
&lt;h2 id="dagger">DAgger&lt;/h2>
&lt;p>Dataset Aggregation (DAgger) is an imitation learning algorithm proposed in the AISTATS 2011 paper &lt;em>A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning&lt;/em> by Stéphane Ross, Geoffrey J. Gordon and J. Andrew Bagnell. It is a simple yet effective algorithm that has been widely used in imitation learning, and as you can tell from the title, it&amp;rsquo;s not related to human-in-the-loop RL.&lt;/p></description></item></channel></rss>