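Automated machine learning (AutoML) automates the process of applying machine learning to real-world problems. It covers the complete pipeline from a raw dataset to a deployable machine learning model, and was proposed as an AI-based answer to the growing challenge of applying machine learning in practice. Its high degree of automation lets non-experts use machine learning models and techniques without first becoming experts in the field. From a machine learning perspective, AutoML can be seen as a system with very strong learning and generalization ability on a given dataset and task, with the added requirement that it be very easy to use. From an automation perspective, AutoML can be seen as a set of high-level control systems that operate machine learning models, so that the models arrive at suitable parameters and configurations automatically, without human intervention.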
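The "automation perspective" above, a control loop that picks a model configuration without human intervention, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy one-dimensional data, the k-nearest-neighbour model, and the helper names `knn_predict`, `accuracy`, `search_k`, and `make_data` are hypothetical. Real AutoML systems search far larger spaces with more sophisticated strategies.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D features)."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0

def accuracy(train, valid, k):
    """Fraction of validation points classified correctly with this k."""
    correct = sum(knn_predict(train, x, k) == y for x, y in valid)
    return correct / len(valid)

def search_k(train, valid, candidates):
    """Try every candidate k and keep the one with the best validation accuracy.

    Ties go to the earliest candidate in the list.
    """
    best_k, best_acc = None, -1.0
    for k in candidates:
        acc = accuracy(train, valid, k)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc

# Toy data (an assumption for illustration): two overlapping Gaussian classes.
rng = random.Random(0)

def make_data(n):
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data

train, valid = make_data(200), make_data(100)
candidates = [1, 3, 5, 7, 9, 15]
best_k, best_acc = search_k(train, valid, candidates)
print(f"selected k={best_k}, validation accuracy={best_acc:.2f}")
```

The user supplies only the data and the candidate list. The loop evaluates each configuration on held-out data and returns the best one, which is the simplest form of the automated configuration search described above.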

VIP content

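This lecture comes from Microsoft's "Artificial Intelligence Systems" course. It covers automated machine learning systems, including an overview of AutoML and its principles, an introduction to the main algorithms, and the supporting systems.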

https://github.com/microsoft/ai-edu/tree/master/A-%E5%9F%BA%E7%A1%80%E6%95%99%E7%A8%8B/A6-%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD%E7%B3%BB%E7%BB%9F


Latest papers

We ascertain and compare the performance of AutoML tools on large, highly imbalanced healthcare datasets. We generated a large dataset from historical administrative claims, including demographic information and flags for disease codes in four different time windows prior to 2019. We then trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated model performance on several metrics. The AutoML tools improved on the baseline random forest model but did not differ significantly from each other. All models recorded a low area under the precision-recall curve and failed to predict true positives while keeping the true negative rate high. Model performance was not directly related to prevalence. We provide a specific use case to illustrate how to select a threshold that gives the best balance between true and false positive rates, as this is an important consideration in medical applications. Healthcare datasets present several challenges for AutoML tools, including large sample size, high imbalance, and limitations in the available feature types. Improvements in scalability, combinations of imbalance-learning resampling and ensemble approaches, and curated feature selection are possible next steps to achieve better performance. Among the three tools explored, none consistently outperforms the others in terms of predictive performance. The performance of the models in this study suggests that there may be room for improvement in handling medical claims data. Finally, selection of the optimal prediction threshold should be guided by the specific practical application.
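The abstract does not say which criterion the authors used to balance true and false positive rates. One common choice, assumed here purely for illustration, is Youden's J statistic (J = TPR - FPR). The sketch below applies it to made-up scores and labels; the helper names `rates_at` and `best_threshold` are hypothetical, and the data are not from the paper.

```python
def rates_at(scores, labels, threshold):
    """Return (TPR, FPR) when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

def best_threshold(scores, labels):
    """Pick the observed score that maximizes Youden's J = TPR - FPR.

    Ties go to the lowest threshold, since candidates are scanned in
    ascending order and only a strictly better J replaces the current best.
    """
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tpr, fpr = rates_at(scores, labels, t)
        j = tpr - fpr
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Made-up predicted probabilities and true labels (3 positives, 7 negatives).
scores = [0.02, 0.05, 0.07, 0.10, 0.12, 0.20, 0.30, 0.35, 0.60, 0.80]
labels = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1]

threshold, j = best_threshold(scores, labels)
tpr, fpr = rates_at(scores, labels, threshold)
print(f"threshold={threshold}, TPR={tpr:.2f}, FPR={fpr:.2f}, J={j:.2f}")
```

On this toy data the selected threshold (0.12) sits far below the conventional 0.5, which is typical when positives are rare and predicted probabilities are small. In a medical setting the relative cost of false negatives and false positives may justify a different criterion, in line with the abstract's point that the threshold should follow the practical application.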
