CAM Seminar: Best-scored Random Forest Density Estimation

Date: 2019-04-08

Speaker: Hanyuan Hang (Renmin University of China)

Time: 2019-04-08 14:00-15:00

Venue: Room 1304, Sciences Building No. 1

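Biography: Hanyuan Hang entered Wuhan University in 2002 and went to the University of Stuttgart (Universität Stuttgart), Germany, on an exchange program in 2003. In 2015, Hang received a PhD in statistics from the University of Stuttgart, with a research focus on statistical learning theory. In the same year, Hang joined the Department of Electrical Engineering at KU Leuven (Katholieke Universiteit Leuven), Belgium, as a postdoctoral researcher in statistical machine learning, working mainly on support vector machines. In 2017, Hang joined the Institute of Statistics and Big Data at Renmin University of China to carry out research and teaching in machine learning.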
 
Current research interests: statistical machine learning algorithms, including the theory of support vector machines, random forests, and deep learning, and their applications in various fields.
 
Abstract: In this talk, we present a new nonparametric density estimation strategy, best-scored random forest density estimation, whose effectiveness is supported by both theoretical analysis and experimental results. The term "best-scored" refers to selecting, from a certain number of purely random density tree candidates, the one density tree with the best estimation performance; we call the selected tree the best-scored random density tree. The ensemble of these selected trees, the best-scored random density forest, can achieve better estimation results than simply combining trees without selection. On the theoretical side, by decomposing the error term into two parts, we carry out the following analysis. First, we establish the consistency of best-scored random density trees under the L1-norm. Second, we provide their convergence rates under the L1-norm for three different tail assumptions. Third, we present convergence rates under the L∞-norm. Finally, we extend the above convergence rate analysis to the best-scored random density forest. In comparative experiments against other state-of-the-art density estimation approaches on both synthetic and real data sets, our algorithm shows not only significant advantages in estimation accuracy but also stronger resistance to the curse of dimensionality.
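The selection-then-ensemble idea in the abstract can be illustrated with a minimal Python sketch. It is not the speaker's implementation: the abstract does not state the partition rule, the scoring criterion, or how data are split, so everything below is an assumption made for illustration. Here a purely random tree is a data-independent partition of a box built from uniform random axis-aligned cuts, each cell gets a histogram-style density, candidates are scored by mean log-likelihood on a held-out validation set, and the forest averages the selected trees. All function names and parameters (`random_partition`, `best_scored_tree`, `n_candidates`, `n_splits`, `floor`) are hypothetical.

```python
import math
import random


def random_partition(lo, hi, n_splits, rng):
    """Split the box [lo, hi] with n_splits random cuts, ignoring the data."""
    cells = [(list(lo), list(hi))]
    for _ in range(n_splits):
        a, b = cells.pop(rng.randrange(len(cells)))  # pick a cell at random
        dim = rng.randrange(len(a))                  # pick a coordinate
        cut = rng.uniform(a[dim], b[dim])            # pick a uniform cut point
        left_hi, right_lo = list(b), list(a)
        left_hi[dim] = cut
        right_lo[dim] = cut
        cells.append((a, left_hi))
        cells.append((right_lo, b))
    return cells


def find_cell(cells, x):
    """Index of the first cell containing x, or None if x is outside the box."""
    for i, (a, b) in enumerate(cells):
        if all(a[j] <= x[j] <= b[j] for j in range(len(x))):
            return i
    return None


def fit_tree(cells, train):
    """Piecewise-constant density: count / (n * cell volume) on each cell."""
    counts = [0] * len(cells)
    for x in train:
        i = find_cell(cells, x)
        if i is not None:
            counts[i] += 1
    dens = []
    for (a, b), c in zip(cells, counts):
        vol = math.prod(bj - aj for aj, bj in zip(a, b))
        dens.append(c / (len(train) * vol) if vol > 0 else 0.0)
    return cells, dens


def tree_density(tree, x):
    cells, dens = tree
    i = find_cell(cells, x)
    return dens[i] if i is not None else 0.0


def score(tree, valid, floor=1e-12):
    """Assumed criterion: mean held-out log-likelihood (floored to avoid log 0)."""
    return sum(math.log(max(tree_density(tree, x), floor)) for x in valid) / len(valid)


def best_scored_tree(train, valid, lo, hi, n_candidates, n_splits, rng):
    """Fit n_candidates purely random trees and keep the best-scoring one."""
    candidates = [fit_tree(random_partition(lo, hi, n_splits, rng), train)
                  for _ in range(n_candidates)]
    return max(candidates, key=lambda t: score(t, valid))


def forest_density(forest, x):
    """Ensemble estimate: average of the selected trees' densities at x."""
    return sum(tree_density(t, x) for t in forest) / len(forest)


# Usage on synthetic 2-D Gaussian data (sizes chosen arbitrarily).
rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(400)]
train, valid = data[:300], data[300:]
lo = [min(x[j] for x in data) for j in range(2)]
hi = [max(x[j] for x in data) for j in range(2)]
forest = [best_scored_tree(train, valid, lo, hi, 20, 30, rng) for _ in range(10)]
print(forest_density(forest, [0.0, 0.0]))
```

Setting `n_candidates=1` in this sketch reduces it to averaging trees without selection, which is the baseline the abstract compares against.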