How can you tap into the wealth of social web data to discover who’s making connections with whom, what they’re talking about, and where they’re located? With this expanded and thoroughly revised edition, you’ll learn how to acquire, analyze, and summarize data from all corners of the social web, including Facebook, Twitter, LinkedIn, Google+, GitHub, email, websites, and blogs.
Employ the Natural Language Toolkit, NetworkX, and other scientific computing tools to mine popular social web sites.
Apply advanced text-mining techniques, such as clustering and TF-IDF, to extract meaning from human language data.
Bootstrap interest graphs from GitHub by discovering affinities among people, programming languages, and coding projects.
Build interactive visualizations with D3.js, an extraordinarily flexible HTML5 and JavaScript toolkit.
Take advantage of more than two dozen Twitter recipes, presented in O’Reilly’s popular "problem/solution/discussion" cookbook format.
The example code for this unique data science book is maintained in a public GitHub repository. It’s designed to be easily accessible through a turnkey virtual machine that facilitates interactive learning with an easy-to-use collection of IPython Notebooks.
This short book might have more appropriately been titled, "How I Personally Mined the Social Web using Python."
With little explanation, the author provides samples of his Python routines. Where another author might spend an entire chapter (if not the whole book) explaining a technical topic, Russell just makes a comment and moves on to his code examples. If you are comfortable with "Install this, run that command, and now copy my code...", then this is an okay book.
This is basically a Python cookbook with social media recipes. It covers APIs useful for Google e-mail, Twitter, Facebook, and LinkedIn. As such, it was interesting to see how it is done, but this is not a primer on how to do it.
A book on data mining. Interesting. Then I opened it! Wow!
So data mining starts with why people are on Twitter. A cute bullet list about the human need to be heard. Okay. Maybe it's just a slip. Turn the page over. Well, an insightful paragraph about Twitter having been started at 140 characters! Amazing! Probably your data mining apps won't work if you set the Tweet size to 512 characters!
Next paragraph. More goodies! Twitter is all the rage. Really? Not really. It's interesting for journalists paid by the word, as they can put a spin on a badly phrased statement.
I turn the page. And, more data mining information: @HomerSimpson is not a real person! And it does not stop there! Can you imagine, it refers to the Fox sitcom and not to the president of North Korea! Amazing!
So how can I do such high-quality data mining? Well, here's a page about TweetDeck. Does TweetDeck do data mining? No, but by this time you already know this book isn't about data mining.
More precious data mining somehow related to this load of manure: the importance of PyDoc. So you have to know Python to follow this clown's examples, but data mining is about PyDoc. Or something.
The data is said to be available on GitHub for free, because the editor and the author make too little money to invest in a domain. Somehow that gets lost, though, and the links are obfuscated as bit.ly links that track you. So are you learning about data mining, or are these unscrupulous characters data mining your account?
The hardest part of learning a data analysis method is not implementing the method but applying it in the context of a real data problem. Data mining and machine learning texts often skirt the issue by using preprocessed data sets and problems defined to fit the method being taught. Russell uses analysis of social media sites to set a context where you start by gaining access to real data sets, clean and transform the data into forms that your analytical libraries can make sense of, and then use the results to reach a conclusion. For that, it deserves a place alongside any text that focuses more on the analytical methodology itself.
What I most appreciated about this book was the work put into converting data from one format to another: from the beginning, where he pulls data using a service's API, to getting it into the format another library requires, to feeding those results into a data mining framework for analysis. Following his flow has helped me understand the methods better. And this kind of format-to-format processing is exactly what gets my students stuck before they really get started on a project. I especially appreciated the chapters that worked with the Natural Language Toolkit (NLTK) and the NetworkX graph library. These examples helped me get past what had been the hard part for me in previous encounters with these libraries.
The virtual machine is also very helpful. I have always found that the hardest part of working with Python for analytic computing is teaching my collaborators how to get set up, and in data mining this is even harder than usual. I was able to work through the book by installing everything on one machine, but on another I used the author's virtual machine, and I have pointed a student who was working with me to the virtual machine as well.
This is a great book to work through the mess of implementing data mining methods in real situations. It is not a theory book, but it serves its purpose well.
Note: I received a free electronic copy of this book from the O'Reilly Press Blogger program.
Of the many books I've read on data science/machine learning, this one is really, really good.
4.5, but I'm in such a good mood that I went a little overboard :))
P.S.: I was going to try mining Goodreads, but the responses are all XML, so discouraging T^T. There's even a dev group on here, and they say the API is about ten years old :v. Seriously though, does anyone want to give it a try???
What I found most useful in this book was the information about GitHub that these data scientists packed into its pages. I didn't know anything about it before I opened this book.
The kawaii icons marking points of particular importance are absolutely adorable, as well! I think that for a textbook about the new world we're living in today, it comes across incredibly nicely. BUT HONESTLY, WHAT MADE ME LAUGH THE HARDEST WAS THE README.1st AT THE BEGINNING, since I thought to myself, didn't we all click those open in our little games? At least I definitely did!
I remember so clearly~! When I was a little girl of six or seven, I would sit with my bird chirping up a storm right on my shoulder as I read the README document all the way through before I played the cute little text game. And died a miserable text death, of course. But the nice thing was that you could start right over again with exactly the same stats!
People ask me, "What exactly do they mean by 'mining' in that regard?" and I tell them that you don't typically need your lighted helmet for this kind of social mining; it seems to me that it means looking at general trends and making projections for the future.
And, also, look at the adorable woodland creature on the cover!
I was given a free e-book and asked by O'Reilly to review it in exchange. I was excited for the opportunity, since I think that the ability to mine the social web is important. I was also happy that the author chose Python as the programming language to show how this is done. I have been using Python for about a year now and have found it to be my preferred server-side scripting language for web app development. If you are a PHP, Perl, Ruby, or Java developer, I think you could easily transfer the techniques shown to your platform of choice.
The book is not focused on Python development per se, but a certain amount of knowledge of the language is helpful for understanding it. One useful extra is the appendix, where the author walks you through using IPython Notebooks to show example code and execute it. A virtual machine is set up and then serves the notebooks on a port, so you can execute live code in a browser.
Once you are set up and get to the heart of the matter, the author does a good job of covering the main aspects of mining the more notable social media sites, such as Facebook, Twitter, and LinkedIn. The introductions to all the APIs and the example code showing how to access the data on those sites are well written and well explained.
The book is an excellent cookbook and a must-have for the technically minded. However, a shortcoming may be that it does not cover much in the way of the theory or objectives of data mining and analysis. While this is an excellent how-to book, the why of social media mining is left to other sources.
I never understood the cookbook format. What's the point of handing out scattered chunks of code that do what's needed but don't explain the fundamentals? Here I stand corrected, because for social networks that's all you need. The point is to collect a data set of social contacts, find the patterns that occur most often, and then visualize them nicely. The author manages to cross a certain line and actually teach an interested programmer to dig through this raw, noisy material, and to do it efficiently. I especially liked the idea of putting Jaccard distance to practical use to measure how similar the user audiences of several competitors are. I hacked away at the code all day, then analyzed the results and presented them to management. It turned out that, unexpectedly, our product would benefit from developing in a somewhat different direction. Now that's a proper cookbook. Recommended.
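For readers curious about the audience comparison mentioned above, here is a minimal sketch of the idea, not the book's code. It assumes each competitor's audience has already been collected as a set of user IDs; the competitor names and IDs below are made-up illustrative data. Jaccard similarity is the size of the intersection divided by the size of the union, and Jaccard distance is 1 minus that. Treating two empty audiences as having similarity 0.0 is a convention chosen for this sketch.

```python
from itertools import combinations


def jaccard_similarity(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union.

    Two empty sets are treated as similarity 0.0 (a convention chosen here).
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance: 1 minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)


# Hypothetical follower-ID sets for three competitors (illustrative data only).
audiences = {
    "brand_a": {"u1", "u2", "u3", "u4"},
    "brand_b": {"u3", "u4", "u5"},
    "brand_c": {"u6", "u7"},
}

# Compare every pair of competitors; a smaller distance means more shared audience.
for (name1, set1), (name2, set2) in combinations(audiences.items(), 2):
    print(f"{name1} vs {name2}: distance = {jaccard_distance(set1, set2):.2f}")
```

With this toy data, brand_a and brand_b share 2 of 5 distinct users (distance 0.60), while brand_c shares no one with either (distance 1.00). NLTK also ships a `jaccard_distance` function in `nltk.metrics.distance` if you are already using that library.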
Excellent book. Its beauty lies in the many ideas it offers, the efficient ways to implement them, and the tools it discusses. What this book lacks, IMHO, are more detailed discussions of why a given approach was followed and the rationale behind it. As one commenter said, the book has too many hows but few whys.
I don't usually enter tech books here, because I rarely read them from cover to cover. But this one I did. It's well written and comprehensive. My only caveat is the author's fondness for the heavy VM he uses for his examples...