One of the pitfalls of machine learning is that creating a single predictive model has the potential to overfit your data. That is, the performance on your training data might be very good, but the model does not generalize well to new data. Ensemble learning of decision trees, also referred to as forests or simply ensembles, is a tried-and-true technique for reducing the error of single machine-learned models. Learning multiple models over different subsamples of your data and taking a majority vote at prediction time mitigates the risk of overfitting a single model to all of the data. You can read more about this in our previous post.
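To make the idea concrete, here is a minimal sketch of subsampling plus majority voting in plain Python. It is not BigML's implementation or API: the function names (`train_stump`, `train_ensemble`, `predict`), the toy data, and the choice of a one-split "stump" as the base model are all assumptions made for illustration.

```python
import random
from collections import Counter


def train_stump(rows):
    """Fit a depth-1 decision tree: one feature, one threshold (illustrative base model)."""
    best = None
    for feature in range(len(rows[0][0])):
        for x, _ in rows:
            threshold = x[feature]
            left = [y for xi, y in rows if xi[feature] <= threshold]
            right = [y for xi, y in rows if xi[feature] > threshold]
            # Each side of the split predicts its most common label.
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    _, feature, threshold, left_label, right_label = best
    return lambda x: left_label if x[feature] <= threshold else right_label


def train_ensemble(rows, n_models=25, seed=0):
    """Train each model on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(rows) for _ in rows]
        models.append(train_stump(sample))
    return models


def predict(models, x):
    """Return the label that receives the most votes across all models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: one numeric feature, two classes.
rows = [((0.1,), "a"), ((0.2,), "a"), ((0.3,), "a"),
        ((0.7,), "b"), ((0.8,), "b"), ((0.9,), "b")]
models = train_ensemble(rows)
print(predict(models, (0.0,)), predict(models, (1.0,)))
```

Because each stump sees a different resample, an unlucky sample affects only one vote rather than the whole prediction, which is the intuition behind the reduced overfitting risk described above.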
Earlier this year, we showed how BigML ensembles outperform their solo counterparts and even beat other machine learning services. Until now, however, ensembles in BigML could be created only via our API. We are excited to announce that ensembles are now available via our…
There has been a backlash lately against big data. From O’Reilly Media to the New Yorker, from Nassim Taleb to Kate Crawford, everyone is treating big data like a piñata. Gartner has dropped it into the “trough of disillusionment.” I call B.S. on all of it.
It might be provocative to call into question one of the hottest tech movements in generations, but it’s not really fair. That’s because how companies and people benefit from big data, data science or whatever else they choose to call the movement toward a data-centric world is directly related to what they expect going in. Arguing that big data isn’t all it’s cracked up to be is a strawman, pure and simple — because no one should think it’s magic to begin with.
Correlation versus causation versus “what’s good enough for the job”
One of the biggest complaints — or, in some…
SEATTLE/SAN FRANCISCO — While much of corporate America is retrenching on the real estate front, the four most influential technology companies in America are each planning headquarters that could win a Pritzker Architecture Prize for hubris.
Related: Steve Jobs planning new ‘spaceship’ Apple headquarters in Cupertino (http://business.financialpost.com/2011/06/08/steve-jobs-planning-new-spaceship-apple-headquarters-in-cupertino/)
Steve Jobs unveiled his vision for a new 3.1 million square foot Apple Campus, with a central facility shaped like a giant spaceship, that could house as many as 12,000 employees in Cupertino, the California city that has always been home to the world’s largest technology company.
Amazon.com last week revealed plans for three verdant bubbles in downtown Seattle, joining Apple’s circular “spaceship,” Facebook’s Frank Gehry-designed open-office complex and a new Googleplex on the list of planned trophy offices.
“It signals a desire, a statement, to say that we’re special, we’re different. We have changed the world and we are…
I recently spent a while working on a pretty fun problem over at Stack Exchange: predicting what tags you’re going to be active answering in.
Confirmed some suspicions, learned some lessons, got about a 10% improvement on answer posting from the homepage (which I’m choosing to interpret as better surfacing of unanswered questions).
Why do we care?
Stack Overflow has had the curious problem of being way too popular for a while now. So many new questions are asked, new answers posted, and old posts updated that the old “what’s active” homepage would cover maybe the last 10 minutes. We addressed this years ago by replacing the homepage with the interesting tab, which gives everyone a customized view of stuff to answer.
The interesting algorithm (while kind of magic) has worked pretty well, but the bit where we take your top tags has always seemed a…