Topic: Agile Machine Learning
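Abstract: Bringing talented people together to build a great applied machine learning team is no small feat. Because developers and data scientists each contribute expertise from their own fields, communication alone can be a challenge. Agile Machine Learning teaches you how to deliver superior data products through agile processes, and shows by example how to organize and manage a fast-moving team in a production environment, a team challenged with solving novel data problems at scale. The authors' approach models the groundbreaking engineering principles described in the Agile Manifesto. The book provides further context and contrasts the original principles with the requirements of systems that deliver data products.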
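About the author: Eric Carter. Eric Carter has worked as a Partner Group Engineering Manager on the Bing and Cortana teams at Microsoft. In these roles he worked on search features around products and reviews, business listings, email, and calendar. He currently works on the Microsoft Whiteboard product group.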
The problem of Approximate Nearest Neighbor (ANN) search is fundamental in computer science and has benefited from significant progress in the past couple of decades. However, most work has been devoted to point sets, whereas complex shapes have not been sufficiently treated. Here, we focus on distance functions between discretized curves in Euclidean space: they appear in a wide range of applications, from road segments to time-series in general dimension. For $\ell_p$-products of Euclidean metrics, for any $p$, we design simple and efficient data structures for ANN, based on randomized projections, which are of independent interest. They serve to solve proximity problems under a notion of distance between discretized curves, which generalizes both discrete Fr\'echet and Dynamic Time Warping distances. These are the most popular and practical approaches to comparing such curves. We offer the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increasing space usage and preprocessing time over existing methods. The query time of our algorithms is comparable to, or significantly better than, that of existing methods, and our algorithms are especially efficient when the length of the curves is bounded.
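To make the two curve distances named above concrete, the following is a minimal sketch of how they can be computed exactly for a single pair of curves by dynamic programming. It is not the ANN data structure from the abstract. It assumes one common way of unifying the two distances: take the minimum, over all monotone traversals of the two point sequences, of the $\ell_p$ norm of the matched point distances, so that $p = \infty$ gives the discrete Fr\'echet distance and $p = 1$ gives Dynamic Time Warping. The function name `curve_distance` and its parameters are hypothetical, chosen only for illustration.

```python
import math


def curve_distance(P, Q, p=math.inf):
    """Exact distance between two discretized curves (sequences of points).

    Assumed definition: minimum over monotone traversals of the l_p norm
    of the Euclidean distances between matched points.
    p = math.inf -> discrete Frechet distance
    p = 1        -> Dynamic Time Warping (sum of matched distances)
    """
    n, m = len(P), len(Q)
    if n == 0 or m == 0:
        raise ValueError("both curves must contain at least one point")

    use_max = (p == math.inf)
    # dp[i][j] holds the best aggregated cost of any traversal that ends
    # by matching P[i] with Q[j]; for finite p we aggregate d**p and take
    # the p-th root once at the end.
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(P[i], Q[j])
            step = d if use_max else d ** p
            if i == 0 and j == 0:
                dp[i][j] = step
                continue
            # Best predecessor: advance on P, on Q, or on both.
            candidates = []
            if i > 0:
                candidates.append(dp[i - 1][j])
            if j > 0:
                candidates.append(dp[i][j - 1])
            if i > 0 and j > 0:
                candidates.append(dp[i - 1][j - 1])
            best = min(candidates)
            dp[i][j] = max(best, step) if use_max else best + step

    total = dp[n - 1][m - 1]
    return total if use_max else total ** (1.0 / p)


# Two parallel three-point curves at vertical distance 1.
P = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
Q = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(curve_distance(P, Q))       # discrete Frechet: 1.0
print(curve_distance(P, Q, p=1))  # DTW: 3.0
```

This brute-force computation takes time proportional to the product of the two curve lengths per pair, which is the kind of per-comparison cost that an ANN data structure is meant to avoid repeating across an entire dataset at query time.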