I've just started administering a Hadoop cluster. We're using Bright Cluster Manager up to the OS level (CentOS 7.1), and then Ambari together with Hortonworks HDP 2.3 for Hadoop.
I'm constantly getting requests for new Python modules to be installed. We installed some modules at setup using yum, and as the cluster has evolved, others have been installed using pip.
What is the "right" way to do this? Always use yum and not be able to provide the latest and greatest modules? Always use pip and not have one point of truth (yum) showing which packages are installed? Or is it fine to use both pip and yum together?
I'm just worried that I'm filling the system with junk and too many versions of Python modules. Any suggestions?
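To make the "too many versions" worry concrete, here is a minimal sketch of how one might check a node for modules that are visible in more than one version. It assumes an interpreter with `importlib.metadata` (Python 3.8 or newer, which is not the stock Python on CentOS 7), and the function names `installed_versions` and `find_duplicates` are hypothetical helpers, not part of yum or pip. It only sees what is on that interpreter's `sys.path` and does not say whether yum or pip put a module there.

```python
from collections import defaultdict
from importlib import metadata  # standard library in Python 3.8+ (assumption)


def installed_versions():
    """Map each normalized distribution name to the set of versions on sys.path."""
    versions = defaultdict(set)
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        version = dist.version
        if name and version:  # skip directories with missing or broken metadata
            versions[name.lower().replace("_", "-")].add(version)
    return versions


def find_duplicates(versions):
    """Return only the names present in more than one version.

    Versions are sorted as plain strings, which is enough for a report
    but is not a true version ordering.
    """
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


if __name__ == "__main__":
    for name, found in sorted(find_duplicates(installed_versions()).items()):
        print(name, ", ".join(found))
```

Run on each node with the interpreter the Hadoop jobs actually use, an empty report would suggest that mixing yum and pip has not yet produced conflicting copies on that path.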