[IPython-dev] RandomForestClassifier w/ IPython.parallel
Fri Feb 7 15:06:54 CST 2014
While Olivier Grisel may hang out here, I strongly suggest you repost on
the sklearn list. He's a core dev for sklearn and an expert precisely on
the intersection of sklearn and IPython.parallel (a topic on which he's
teaching a tutorial next week at Strata).
The only catch is that he's likely traveling precisely for his Strata
tutorial... But he's much more likely to give you an authoritative response
on that than any of us from the IPython team.
Note also that sklearn supports simple parallelization with joblib for many
of its tools (though I don't know if their RF classifier does). That *may*
do the trick for you.
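For reference, a minimal sketch of that joblib route, assuming the installed
scikit-learn release exposes an n_jobs parameter on RandomForestClassifier
(worth checking against your version). The synthetic data and the parameter
values are placeholders, not anything from the original question, and this
only spreads the work over the cores of a single machine, not a cluster.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (hypothetical; the original post gives no dataset).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# n_jobs=-1 asks scikit-learn (via joblib) to use all local cores
# when building the trees; n_estimators=50 is an arbitrary choice.
clf = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)
clf.fit(X, y)

# Predict on a few rows to confirm the fitted forest is usable.
predictions = clf.predict(X[:5])
print(len(clf.estimators_), predictions.shape)
```

If a single large machine is enough, this avoids the need to coordinate
engines by hand.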
On Fri, Feb 7, 2014 at 12:36 PM, Alessandro Gagliardi wrote:
> Not sure if I'm addressing the best list for this question, so if
> there's a more appropriate list, please direct me to it.
> I want to run a large sklearn.ensemble.RandomForestClassifier (with
> maybe dozens or maybe hundreds of trees and 100,000 samples). My desktop
> won't handle this so I want to try using StarCluster.
> RandomForestClassifier seems to parallelize easily, but I don't know how I
> would split it across many IPython.parallel engines (if that's even
> possible). (Or maybe I should be forgoing IPython.parallel and using MPI?)
> Any help would be greatly appreciated.
> Alessandro Gagliardi| Glassdoor| email@example.com
Fernando Perez (@fperez_org; http://fperez.org)
fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail