"When a story captures the imagination of millions, that's magic. Can you qualify magic? Archer and Jockers just may have done so."—Sylvia Day, New York Times bestselling author
Ask most book people about massive success in the world of fiction, and you’ll typically hear that it’s a game of hazy crystal balls. The sales figures of E. L. James or Dan Brown, they’ll say, are freakish—random occurrences in an unpredictable market. But what if there were an algorithm that could predict mega-bestsellers with stunning accuracy? What if it knew, just from reading an unpublished manuscript, not just that genre writers like John Grisham and Danielle Steel would sell in huge numbers, but also that authors such as Junot Diaz, Jodi Picoult, and Donna Tartt had signs of New York Times bestselling all over their pages?
Thanks to Jodie Archer and Matthew Jockers, the algorithm exists, the code has been cracked, and the results are stunning. Fine-tuned on over 20,000 contemporary novels, the system analyzes themes, plot, character, setting, and also the frequencies of tiny but amazingly significant markers of style. The “bestseller-ometer” then makes predictions, with fascinating detail, about which specific combinations of these features will resonate with readers. Somehow, in all genres, it is right over eighty percent of the time.
This book explains groundbreaking text mining research in accessible terms, but its real story is in what the algorithm reveals about reading and writing and how successful authorship works. It offers a new theory on the success of Fifty Shades of Grey. It explains why Gone Girl sold millions of copies. It reveals the most important theme in bestselling fiction and which topics just won’t sell. And then there’s “The One,” the single most paradigmatic bestseller of the past thirty years that a computer picked from among thousands. The result is surprising, a bit ironic, and delightfully unorthodox.
The project will be compelling and provocative for all book lovers and writers. It is an investigation into our intellectual and emotional responses to stories, as well as a big idea book about the relationship between creativity and technology. It turns conventional wisdom about book publishing on its head. The Bestseller Code will appeal to fiction lovers, data nerds, and those people who have enjoyed books by Malcolm Gladwell and Nassim Taleb.
Jodie Archer spent her childhood hiding in the changing rooms of the clothing stores her mother managed: she would pile sweaters all over the floor to look as though she were putting together an outfit, then sit for the eight hours the shops were open, reading books in any genre she was allowed. Even then, she decided the only sensible profession in the world was to be a writer, but sensible people told her that it was not sensible at all to pursue such an unlikely dream.
In pursuit of plan B, she was, for a while, a (not very good) young actress. She would take the parts of the characters and even writers she was most interested in--Alice in Wonderland, Matilda, Charlotte Bronte (which is when she most spectacularly forgot her lines), and once even Winnie the Pooh (which was fatal for her street cred at school). When she was given parts that involved singing on stage, she really proved that she was not meant to be involved in this business at all.
A compulsive writer of letters, stories, scripts, and even a newsletter for the neighborhood cats, she never thought again of becoming a writer for real until her school displayed a blurb she wrote for a book she had never, in fact, written, and she was besieged by requests for the text. Still, she was told it would be more sensible to become a lawyer than an author, and that she should study English (if she insisted) and then convert to law.
Before she went to Cambridge to study English, Jodie took a year out and wrote for several local newspapers and magazines, and then worked as a runner and researcher in TV. Throughout her following years at St John's College, Cambridge (where the law was forgotten after her first lecture), she wrote reviews and features for various news outlets, and edited the May Anthologies and other well-known university magazines. Once she graduated, she was offered training schemes at Macmillan and Penguin, and spent two years learning the publishing industry before becoming an acquisitions editor at Penguin. While this was exceptionally useful training, it showed her just how hard it is to make it as an author, and she thought it best to play it safe.
Jodie left the UK for a scholarship at Stanford University in California, where she spent time teaching nonfiction and memoir writing alongside her research in contemporary fiction and bestsellers. While at Stanford, she enjoyed the blue skies and palm trees of California and wrote most of her first (unpublished) book. She got her first interest from agents the day her mother suddenly died. Just after that, her marriage broke down, and she abandoned writing for a couple of years.
After she got her PhD, Jodie was recruited to Apple, where she became the lead in research on books. After her agent approached her about writing a book based on her doctoral research with Matt Jockers, she wrote a pitch, and she left the corporate world to write The Bestseller Code after its acquisition became deal of the week in New York.
Finally, at 36, Jodie is a full time writer. After ten years in the US, she lives in Yorkshire, UK with her Havanese puppy, Mollie.
I honestly thought I would enjoy this book more than I did. Part of the problem might have been the not-so-secret snobbishness I have when it comes to bestselling novels. There's a little voice in my head that tells me that if a book appeals to the masses, it's probably not going to do much for me. And, in most cases, that's true. I don't very often read titles that make the lists, and when I do, it's usually by accident, or if the book has been chosen by my book club. I've never read anything by Dan Brown, Jodi Picoult, or James Patterson. And, no, I've never read Fifty Shades of Grey.
There is some interesting information here, but it does tend to get repetitive. I had the feeling I so often get when reading nonfiction, that the contents could have easily been covered in a magazine article. The facts I found most interesting were that a novel's first sentence is frequently an indicator of its possible financial success, that a computer rightly deduced Robert Galbraith was actually J.K. Rowling, and that out of all the bestselling authors, John Grisham and Danielle Steel hit the right buttons more than any other writers.
It should be noted - there are a lot of spoilers; plots (and endings) of many bestsellers are discussed in great detail. On the plus side, anyone who's fond of charts and graphs should be delighted by this book. Personally, I think the data discovered will appeal more to writers than readers. I much preferred a fiction book I read recently on this same subject - How I Became a Famous Novelist. That one, I would recommend.
There's an observation that sometimes goes around about how you only need to read the fourth chapter of any given business book. The first is an introduction, the second is about how everything you thought you knew about the subject was wrong, the third is the miraculous tale of how the authors came up with this new secret answer, and the fourth is the actual content. After that it goes into testimonial-style case studies and other rather dull stuff. So, the fourth chapter, or sometimes I've heard the fifth, is the only one you need to pay attention to. Either way, the point is that there's a certain class of non-fiction book that's mostly padding with an article's worth of actual content. The Bestseller Code: Anatomy of a Blockbuster Novel felt like one of those books to me.
The Bestseller Code's been modestly controversial since its publication, either because it's heretical to declare that there's a machine-identifiable set of characteristics to making a bestseller or because everyone already knew what those characteristics were, so who needed the computer? I tend to disagree with both critiques. First of all, if software can identify the patterns that lead to success then better we become aware of it than pretend the publishing business is driven entirely by artistic impulse. Second, if such a secret cluster of characteristics exists, then the monumental pile of unsuccessful novels that come with major publisher backing is evidence that most people in the industry don't know what they are.
The problem is that The Bestseller Code isn't really going to show you one way or another, because while the book repeats the same mantra over and over (and over) that "our model predicted bestsellers within our sample with 80% accuracy!!!" there's actually very little evidence given. The data isn't there for one to consider, nor is the full list of topics or their rankings within the model. So while I've no doubt that the authors successfully built a set of algorithms to measure a given text's likelihood of hitting a bestseller list, slogging through their 240-page advertisement for it is a pretty unsatisfying read. There just isn't a lot of detail or actual information on offer. Sure, there are a few generalizations (write short, simple sentences with lots of contractions that deal as much as possible with the topic of human connection, and if you can get "I" and "him" in close proximity you're on the right track), but they're precisely the ones you'll find brought up everywhere. Such sage advice as sticking to no more than three main topics in your book and not over-writing your sentences doesn't require an algorithm – you hear it all the time.
So the problem isn't that the authors are wrong or haven't discovered something intriguing – and perhaps extremely useful – about the nature of bestsellers; the problem is that they really aren't sharing much of it in this book. That's a logical tactical choice if you plan on going into the business of getting people to pay you to run their books through your software, but it doesn't make The Bestseller Code: Anatomy of a Blockbuster Novel a very useful or engaging read.
This book ended up being even more amazing than I expected.
The authors are both literary/publishing experts and have worked on machine learning for years. They fed 5,000 books, published over the past 30 years, to their computer programs. 500 of those were NY Times bestsellers and the rest weren't. They had programs that analyzed, for each book, the themes and topics, the ups and downs of the plot, the characters, and the style. They had an in-sample--10% of bestsellers and 10% of non-bestsellers--that was used to train their programs, and then they forecast how likely the out-of-sample books were to be bestsellers.
They were right about 80% of the time!
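The setup described here (train on a labeled slice, then score the held-out books) can be sketched in a few lines. This is a toy illustration with fake data and a stand-in one-feature "model," not the authors' actual system:

```python
import random

# Toy illustration of holdout evaluation: "fit" on a small training
# slice, then measure accuracy on the out-of-sample books. The random
# dataset and the one-feature threshold "model" are stand-ins.
random.seed(0)
books = [(random.random(), random.random() > 0.5) for _ in range(1000)]

train, test = books[:100], books[100:]

# "Training": pick the score cutoff that classifies the training slice best
best_cut = max((c / 100 for c in range(100)),
               key=lambda c: sum((x > c) == y for x, y in train))

accuracy = sum((x > best_cut) == y for x, y in test) / len(test)
print(f"out-of-sample accuracy: {accuracy:.0%}")
```

With real features (topics, plot shape, style markers) in place of random numbers, this is the shape of the experiment the review summarizes.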
How did they do it and what are some of the conclusions? I won't spill all the beans, but here are some examples:
They analyze topics by looking at nouns that are close to each other. So if "beer" and "cocktail" are near "bar," the computer concludes the book is talking about a bar in which people drink rather than a bar exam taken by lawyers or a bar used to do pull-ups. (You can see the complexity here--the computers have to get the meaning like people do--from context--in order to learn to read, but that's only the first step.)
Then they looked at hundreds of topics such as guns or health emergency or sex across their sample of books to see which topics were used by the bestsellers and which ones weren't. The same for non-bestsellers. They noticed that sex doesn't sell, for instance.
In addition, the number of topics and how often a topic appeared were even more important; i.e., a book shouldn't try to cover too many topics.
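The kind of context-based disambiguation described above (the "bar" example) can be sketched with a toy keyword approach. The cue-word lists, the window size, and the function name here are my illustrative assumptions, not the authors' actual method:

```python
# Assign a sense to "bar" by counting cue words within a small window.
SENSES = {
    "drinking": {"beer", "cocktail", "bartender", "drink"},
    "law": {"exam", "lawyer", "attorney", "court"},
    "exercise": {"pull-ups", "gym", "grip", "reps"},
}

def label_bar(text: str, window: int = 5) -> str:
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = dict.fromkeys(SENSES, 0)
    for i, w in enumerate(words):
        if w == "bar":
            # Look a few words to either side of each occurrence of "bar"
            context = set(words[max(0, i - window): i + window + 1])
            for sense, cues in SENSES.items():
                scores[sense] += len(context & cues)
    return max(scores, key=scores.get)

print(label_bar("She ordered a beer and a cocktail at the bar"))  # drinking
```

Real topic models (e.g., LDA) learn these word clusters from the corpus instead of hand-written lists, but the intuition is the same: nearby words settle what a word is about.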
How about the plot? They looked at words that showed character feelings to determine whether good or bad/dangerous things were happening to the characters. The cumulative effect is a curve that shows the ups and downs of the plot--the emotional plotline. You want to see curves, of course. Two winners in this category were the two best-selling adult books of the last thirty years: The Da Vinci Code and Fifty Shades of Grey. Cool!
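The emotional plotline idea (score feeling words, accumulate the scores, and the running total traces the arc) can be sketched like this. The tiny word lists are stand-ins for the real sentiment tools the authors used:

```python
# Minimal sketch of an "emotional plotline": score each sentence with a
# toy lexicon of feeling words, then take the running total.
POSITIVE = {"love", "joy", "hope", "safe", "laughed"}
NEGATIVE = {"fear", "danger", "lost", "cried", "dead"}

def emotional_arc(sentences):
    arc, total = [], 0
    for s in sentences:
        words = {w.strip(".,!?").lower() for w in s.split()}
        total += len(words & POSITIVE) - len(words & NEGATIVE)
        arc.append(total)
    return arc

story = [
    "She laughed with joy at the news.",
    "Then came the fear that something was lost.",
    "Danger was everywhere and hope was gone.",
    "At last they were safe, and love remained.",
]
print(emotional_arc(story))  # [2, 0, 0, 2]
```

Plotted over a whole novel, a sequence like this rises and falls; a flat line would mean an emotionally static plot, which is exactly what the model penalizes.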
The next big thing is characters. To figure them out, you want to see what they say, think and do. The authors accomplished that by analyzing verbs. The conclusion is that active characters are better than passive--no surprise there! Verbs like "need" and "want" are much better than "wish".
Lastly, there is no good book without a good writing style. Even fewer surprises there. Basically, what the books on writing and editing teach truly works: use contemporary language, contractions like wouldn't, shorter sentences etc.
I have listed a few examples here. There's a lot more in the book.
One interesting thing is their models told them fantasy and science fiction don't work. People like to be in our world. At first, I felt "no way," but (1) the authors analyzed only books for adults (fantasy and sci-fi totally rule YA), (2) many fantasy books happen in our world or have connections to it, and (3) you can still pull it off if you do a great job with the plot and the characters, like George R. R. Martin.
I highly recommend this book. Enjoy!
P.S. The last chapter is quite interesting, too. If you can teach computers to read books and draw conclusions about them, can you teach them to write? It seems we are still in the very early stages of that, and likely a long way off, but it'll happen some day with artificial intelligence.
Using a computer algorithm, the authors of this book ask the question of whether you can predict if a novel will be a bestseller or not. Jodie Archer is a former publisher and consultant, while Matthew Jockers is the co-founder of Stanford University's famed Literary Lab. In this work they claim they can discover a bestseller, and they analyse 20,000 novels to demonstrate this.
Subtitled “Anatomy of the Blockbuster Novel,” this book attempts to analyse novels from the points of view of theme, plot, style, character, and other data points. Of course, much of this is fairly obvious, as are the results of computer-generated writing. For, if a computer can analyse what works within a novel, why can it not write that elusive bestseller?
Overall, this is an interesting look at the mechanics of writing and publishing, our obsession with lists and rankings, and the anatomy of what creates a perfect story. The book also contains a list of 100 novels it believes you should read – as an avid reader I have read only six of them, and the books which are missing include every classic. However, as the title suggests, this algorithm aims to discover that bestseller – the book that is in every supermarket and is the talked-about novel for a certain amount of time. Some may become classics, others may not wear as well, and that is why, thankfully, literature is based on more than commercial success. An interesting exercise though, and a fun analysis of the bestseller charts.
The title of this book has it all for me...it's the reason I picked it up in the first place. The idea that blockbuster novels all share some elemental DNA in common is at once exciting and dangerous.
I found that the authors of this book set out to prove their algorithm without giving away too many of the intricate details (likely proprietary information) and for the most part made their case in a concise and believable manner.
For the most part.
I honestly would've liked to have seen more actual numbers produced from their research contained within the pages of this book. For budding novelists out there let me spare you the time; this book will not deliver that one crucial element or secret to you that will make you a bestseller. It will get you looking at the number of times you use the word "the" and how often you use pronouns or Mr. King's dreaded "ly" words. It will tell you how bestselling authors write...or it will try to tell you.
At the end of the day the machine can data-mine thousands of best sellers but in my opinion never uncover the "secret sauce". Talent with words goes beyond their placement within a sentence or the size of one's paragraphs. What all best sellers really have in common is a love of language and a love of writing.
And I've yet to come across the machine that can understand that...yet ;)
"Recommending a book is not like recommending a health tip or a stock. Recommending a book can be like trying to navigate the unspoken rules and faux pas of a Jane Austen ballroom. The book world comes with considerable baggage."
Who can explain what makes for a best-selling book? What techniques do best-selling authors employ that makes their works so desirable compared with the majority of authors who struggle for readership? Do those who write literary classics differ so much from those who appeal to the mass market?
All this -- and more -- is covered in this fascinating look at big data and the New York Times best-seller list. There are actually two fascinating parts to the book. The first part covers the actual results of the study: that is, what makes for a best-seller? Second and equally interesting is how the big data on books are collected and interpreted.
A side look at the interests and predispositions of readers at both the literary and mass market levels is fascinating as well. Who among literary lovers has not succumbed to a mass market book despite their professed inclination otherwise? (For example, I count Dan Brown and The Da Vinci Code among my particular weaknesses, although I don't usually find what I'm going to read among the typical household names on the best-seller list.)
Is that unnecessarily snobbish? Am I no better than a fashionista who refuses to wear anything without a Prada or Hermes label? Or a foodie who refuses to eat anything that isn't farm-to-table? Book people, to thine own self be true. Those best-sellers are popular for a reason, and the best-selling formula can truly be seen on display here. It's a fascinating look, too, at why machines can't (at least not yet) replace writers, despite all that they can do otherwise.
There's so much to think about in this short book...as well as plenty of reading suggestions for those book list compilers. It demonstrates very well what big data has yet to teach us about so many things, beginning with one of our favorites -- what we read.
My thanks to Good Reads and St. Martin's Press for allowing me to read this book!
Jesus. This could've been so much better. They had all of this great data and then just dragged the fuck out of every chapter. . . and when the actual data was presented. . . it was fast and in clumps of undecipherable paragraphs.
Great Discoveries. Horrific Presentation. And suck ass drag-on writing.
A few linguistics scholars had inspiration and plenty of free time, and they created a program that analyzes books. They ran an enormous corpus of texts through it and taught it to analyze their style and plot. The idea worked! That's how the book "The Bestseller Code" came about. What's special about it?

📚 The book presents generalized conclusions from an analysis of New York Times bestsellers: what do the top books have in common? Why do Fifty Shades of Grey and The Da Vinci Code sit side by side?

📚 It has lots of lists! The leaders by plot, by style. While reading it, you should keep a pencil and notebook handy for writing down titles.

📚 And there are graphs for the books, tables, numerical values based on plot analysis... I never thought such a thing was possible.

So what is the secret of bestsellers? The key factors the researchers managed to isolate:

📍 STYLE

Style and grammar matter. Sometimes WHAT you write matters less than HOW. After all, there aren't that many truly new ideas, so style is what defines the writer.

📍 TITLE

There are interesting reflections on top titles: how does the effect of the articles "a" and "the" on the reader differ? Why are books with the word "girl" in the title so popular? Which titles contain a mystery, and which make the reader walk on by?

📍 OPENING

This is a mega-interesting topic! Using the openings of bestsellers and classics, the book takes apart the phrases that hook the reader. A good opening is part of success.

📍 PLOT

The plot has to be dynamic! The book offers a formula for the percentage ratio of different topics. What lies at the heart of bestsellers? Which themes concern readers? Hint: it's in Fifty Shades of Grey, and it's not sex 😏

The book was easy, pleasant, and interesting to read. The authors set themselves very ambitious goals, conducted thorough research, and presented it to us with plenty of relevant examples and references.

I recommend it! 👌

And a question for you:

🤔 What do you think the top theme was, according to the program's analysis?

🤔 Which book did the computer pick as the best match for the bestseller "standards"? I was surprised, since I'd never heard of it before; now I want to read it 😊
I'll post screenshots of some of the book lists on Telegram 😎
I found this book fascinating reading. The authors wrote a computer programme which could read and analyse books, and this is the result. They wanted to see if a computer could predict which books would be best sellers and which wouldn't. A lot of the time it got things right, but with some books it was completely wrong - stating that a book was unlikely to be a best seller when it was actually a blockbuster.
I thought it was interesting that a computer could tell whether it was a man or a woman who had written a book and whether two completely different books were written by the same person. Robert Galbraith and J K Rowling were easily identified as the same person by the computer. Best selling books use more verbs and fewer adverbs and adjectives and concentrate on a small number of themes for thirty percent of the book apparently. Best selling authors use more contractions such as don't, won't, she's, he's etc.
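The authorship result mentioned here (the computer identifying Robert Galbraith as J. K. Rowling) rests on stylometry: comparing how often texts use common function words. A toy sketch of the idea, where the word list and the distance measure are my illustrative choices, not the book's:

```python
from collections import Counter

# A few high-frequency function words; real stylometric studies use
# hundreds of features across much longer samples.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "was", "she"]

def profile(text):
    """Relative frequency of each function word in the text."""
    counts = Counter(w.strip(".,!?").lower() for w in text.split())
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def distance(a, b):
    """Smaller distance = more similar style."""
    return sum(abs(x - y) for x, y in zip(profile(a), profile(b)))

print(distance("the cat and the dog", "the cat and the dog"))  # 0.0
```

Two novels by the same author tend to sit unusually close under measures like this, even when the subject matter differs completely, which is why function words rather than content words are the classic signal.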
Whether or not you habitually read best selling fiction this book provides some fascinating insights into the way best sellers grab the public imagination and sell millions of copies across the world. If you're worried that the book will go into too much detail about the way the computer programme works then rest assured this detail is kept to footnotes and a section at the end. The text is mainly about the insights provided into the way best sellers work. The book has certainly made me look at best sellers differently and I might actually go on to read more of them.
The book provides a list of the computer's top one hundred books if you want to start reading all those block buster books you've missed. The authors really bring their subject to life and I liked their touches of humour and the descriptions of the difficulties they had in getting the computer to understand nuances which human readers take for granted. I loved the irony of the choice the computer made for its favourite book which caused me to wonder what the computer would make of Jane Austen - that master of irony. It also made me wonder whether a sense of humour could be programmed into a computer.
As a writer, the one resource you can't afford to give away is your time. Therefore avoid this book.
A book that presents statistical analysis should be edited by someone with literary knowledge and by someone with quant knowledge. It may have to be two different people. This book had neither. Edward Tufte could write an entire volume about bad presentation practices, using only this one book to provide examples of what not to do with statistics.
But it's not just the quantitative material that is bad. It's almost everything.
The first chapter, pages 1-31, is an advertisement for the rest of the book and can be skipped. You're welcome.
Actually, the whole book can be skipped, but let's look at just a few of the many reasons why.
After that skippable first bit, chapter 2 starts on page 33, and the authors proceed almost to the end of page 36 before committing their first crime. That crime? Spending a page explaining the important difference between "topic" and "theme," and then forgetting their own explanation and using the two words interchangeably for the rest of the book.
I am not making this up.
That 2nd chapter is about "topic analysis," which they never explain in any way that makes sense.
True, they give a clue about how topics are identified, by finding words near other words so that each forms a context for interpreting the others, letting a computer figure out what is being talked about. Like how "bank" means one thing if it's near "money" and "loan," and something else if it's near "river" and "fish."
But what they don't explain is the terribly important matter of how topic analysis can figure out what exact percentage of a book is given to a particular topic. The most pressing question is how topic analysis of a book can come up with a list of topics, with their percentages, that add up to 100% of that book.
Now wait, I hear you saying, shouldn't a list of things in a book add up to 100% of the book? Absolutely not.
There's a saying in Hollywood: "If your scene is about what your scene is about, you're screwed." It means that even if the words spoken or written are important, what's more important is the subtext, and the placement of the scene in the flow of the narrative.
An example from a novel: In bed at night, a wife has her husband read poetry to her. He reads "The Eve of Saint Agnes" by Keats. She asks him to stop often and explain certain parts. Then she goes all quiet and turns off the lights. The narrator draws a discreet veil over what happens next.
So what is the topic of this scene? What is it about?
Poetry, or literature more generally? Yes, certainly. Sex? Yes, and although there isn't a single body part named, or any other keyword that will tell a computer this is about sex, if you know the poem, you have no doubt. The mention of the maid's having to do laundry the next morning is another hint.
So the scene is about these things. But what it's mainly about is the woman dealing with her sense of inadequacy. Because the couple are about to adopt a child, a child who shows early signs of being a linguistic genius. In the arc of the novel, that is the main, but not the only, thing the scene is about.
My point is not just that a computer would miss this. My point is that if a computer, or a human reader, thinks there's one topic here that makes up 100% of the scene's topics, that computer or human is wrong. And if an entire novel can be boiled down to a list of topics that make up 100% of the novel, that is one dull-ass book.
I'll spare you a full discussion of why The Bestseller Code's treatment of statistics is horrible. I'll give just one example of how either a line editor or a numbers editor could have helped this book, if either had been called in.
Remember those percentages of a novel made up by topics? Well, on page 60, we learn that "...to get to 40 percent of the average novel, a bestseller uses only four topics. A non-bestseller, on average uses six."
So, fewer topics = better. Got that? Of course you did.
The authors didn't.
On page 190, they are discussing the one book to which their algorithm gave a perfect score as being sure to be a bestseller:
"Three themes [they mean topics, but remember there was no editor] make up roughly 30 percent, which we know to be a winning formula. The [topical] DNA is 21 percent Modern Technology, 4 percent Jobs and the Workplace, and 3 percent Human Closeness."
The math here isn't hard. Three topics have just been listed in descending order of their frequency or prevalence in this "perfect novel." No other topic can make up more than 3 percent. This means that to list the topics that make up the top 40%, they have to make up another 12% [40-(21+4+3)]. There have to be at least 4 additional topics to get us to 40%. So there are 7 or more topics in the top 40%.
But remember page 60? Where we learned that 6 topics in the top 40% is the sign of a loser? And that more topics is even worse? Yet here we are being told this novel, with 7 or more topics, was identified by their computer as being an exemplar of the bestseller formula.
Trust me, I have spared you many more examples of how poorly this book was written and edited. Follow my example and spare yourself.
Despite all the efforts of publishers, it has always seemed impossible to predict whether or not a book would be a runaway bestseller. This isn't too surprising - it's the kind of thing that is inherently unpredictable because there are simply so many variables involved. Yet a newly published book suggests it is possible to do just that. Are the authors crazed or brilliant? Neither, really. They have put together a mechanism based on computerised text analysis that is good at spotting bestsellers - and yet, oddly, this doesn't contradict that inherent unpredictability. Why? Because there are two different levels of bestsellerdom involved - and because I think there's one bit of information missing from the book (apologies to the authors if I've missed it).
So what does the software do? By looking at various word uses, patterns and shaping, it can make a good shot at predicting whether or not a book is likely to have featured on the New York Times bestseller list. This is very impressive - and, along the way, Jodie Archer and Matthew Jockers give some excellent advice on things that authors can do (or at least try to do) that will make their books more like these bestsellers.
This isn't a universal panacea. In fact the authors admit that what their algorithms spot is not what most would regard as great fiction. The system laps up the likes of Dan Brown's output and 50 Shades of Grey. But interestingly, it is also a useful counter to those who say they can't understand why these kinds of books sell because they are terribly written. In fact, in a number of respects these books are very well written - it's just that the criteria for 'well written' are not those used by the lit. crit. brigade.
Not only is this not a recipe for producing great literature, it's not about producing books everyone would like either. Taking a quick skim through the top 100 books selected by the analysis, there are perhaps three I would consider reading. But many of us are not 'bestseller' readers. We like our own little niches, and that's fine. This system isn't for us - it is about finding likely hits for the traditional bestseller market.
This genuinely is all very interesting, although the book has surprisingly little content for a full price hardback (it's large print, and there's a lot of dancing around exactly what they are doing). However what absolutely isn't true is the assertion made here that 'mega-bestsellers are not black swans'. The system uses a number of measures, and though it's true that most mega-sellers like Harry Potter and 50 Shades do well on some of the measures, they pretty well all fall down on others. So, for instance, to write a bestseller we are encouraged to avoid fantasy, very British topics, sex and descriptions of bodies. What the model seems to do well is to recognise what you might call the run-of-the-mill bestsellers, rather than pick out most of the real runaway successes as being stand-out.
There was also that missing bit of information. The authors are enthusiastic to tell us how many books that scored highly from their system were on the bestseller list, and that really is impressive. But they don't mention false positives - how many books the system thought should be bestsellers but weren't. That would have been interesting to discover more about.
I'm sure we'll hear more of this kind of analysis, but I really hope publishers don't put too much stock by it - because it is very much a lowest common denominator approach (certainly from the viewpoint of someone who wouldn't consider reading more than 95% of their recommendations). That's not to say that the book isn't interesting - and for an author, there are some excellent insights into some of the things that attract this generic group of readers (or put them off) that are worth considering even if you do write science fiction or British crime fiction (say).
A fascinating piece of analysis, provided you don't take it all too seriously.
Ignore all the reviews by novelists sucking up to the authors of this for professional reasons, and what do you have? A bad book with little to reveal.
The old publishing industry, no doubt shamed by the finally accurate raw data Amazon lets anyone see and Author Earnings's incisive analysis of it, is finally trying to do smart analysis after years of guessing very badly. However, here, they largely failed.
Their data points were, if you'll excuse the term, squishy for the most part, and they had to be interpreted by people whose biases have already cost them over 50% of the book market in the past seven years. So the only useful information here is where the authors actually counted words. (Word selection has inherent bias too, but at least the results for those words say something.)
Take away: characters should "grab" "think" and "ask." They should "need" and "want." "Love" is good and "very" is bad. The only unanticipated and interesting result is that "okay" is quite good. Best sellers use the word a lot more than non-best sellers. (One would think "okay" is too namby-pamby, so this is worth noting, but we don't know if every "okay" was in dialog or not.) I'm curious about other words they didn't select. How did "and" work out for authors? "Then?" "Too?" "Asperity?" I suppose I'll never know.
The other problem with this book -- beyond that it's mostly opinion trying to pretend it's science -- is that it's worth a 5000 word article at best. So instead of reading it and saying "WTF?" a lot as the authors stumble around trying to make it book-length, you should go read the newspaper reviews of people who suffered through it and summarized it. It's far less painful to absorb the lesson, weak as it is, that way.
Two literary scholars, working with IT specialists, created a special program that studied the New York Times bestsellers of the past 30 years, performed textual analysis, and drew out their similarities. Essentially, it was the answers given by that program that produced this book, which addresses the following question: does a book become a bestseller by accident, or is there a certain formula?
According to the authors, bestsellers are not only similar to one another; by studying 5,000 different aspects, the computer program can give a clear answer about any novel: with what percentage probability it could become a hit and a sales leader.
This book is about the formula for writing a bestseller. It offers interesting observations, useful conclusions, and important details that may interest publishers and writers.
Re-read The Bestseller Code as part of my initiative this year to read more books I've already read (and loved). I found this book as interesting this time as I did the first time. I would love to see a new edition released at some point, with perhaps an enhanced algorithm and more books.
Is it possible to sort of guess everything the book is going to say, not be surprised by any of its figures, and still feel like you learned something? I feel like I was told a lot of information I sort of already knew, but now that I've actually seen it in writing, it's sticking; I feel like I learned it rather than came to it on intuition.
What I found most interesting about this book was the algorithm. It actually quantified success and gave unbiased responses to books that had already been on the bestseller list. The need to develop characters in order to tap into emotion made sense, and building tension through rising and falling action is something I have seen plenty of times in books I've read. Gone Girl, for instance, used the word "need" in 163 sentences to create a clear and powerful character, which stirred strong emotions in the reader.
Rather than reading this book to learn how to write a best seller I thought the analysis of already existing best sellers was most fascinating. Like how the algorithm preferred female styles of writing (not necessarily female authors). A number of successful female authors had backgrounds in journalism so they wrote blunt, digestible stories. Though the algorithm was convinced James Patterson was a female and Toni Morrison was a male (that was a funny tidbit).
Ultimately what I learned was people want to be entertained.
A book that no true bookworm should miss. Among other things, you'll learn that bestsellers say a great deal about our society, and that an author must pull the reader into the story within the first forty pages (there's no more time than that to warm readers' hearts or give them chills and goosebumps). You'll understand that there is no need to write sentences so long that reading them aloud would require a doctor standing by with an oxygen mask: bestsellers use fewer adjectives and adverbs. In other words, bestsellers are characterized by shorter, clearer sentences without unnecessary words. The authors of the Code also explain the current boom in books with the word "girl" in their titles: it is part of a recently emerged trend, much praised by the media, of novels that put women and their traditional roles front and center. Yes, it's an interesting publication!
When you’re in the middle of devouring Nora Roberts’ latest love story, or admiring the crazy cunning of Amy Dunne in “Gone Girl”, do you ever stop to think, “Gah, why is this book SO GOOD?”
I do, all the time. My theories are vague, and in the past I’ve left it as: the process of writing is a mystery to the masses and an alchemical process for the writer, and no one will ever know what makes a best-seller a best-seller. I enjoy the vagueness. It feels romantic and rebellious in an era where computers have an answer for almost everything.
While I was content with my “que sera, sera” attitude, Jodie Archer had been asking herself The Big Question for almost 15 years. She asked it with the heart of a book lover – why do we read what we read? She asked it from her experience as an editor – what’s in the novels that sit and stay on the NYT Best-Sellers List? But when she began exploring as a scholar, questions percolated and answers soon began to form. She met Matthew Jockers and, with his brilliant algorithms, fine-tuned by both of them, they finally and definitively answered The Big Question in the book industry – what makes a best-seller a best-seller?
I was surprised that it wasn’t sex, despite the axiom “sex sells”, and despite the blockbuster success of racy books since the publication of ‘Peyton Place’.
And it’s not the old saying of “opposites attract”, although that can be part of a reader’s initial attraction to a book.
What stood out the most for me was that part of the best-seller DNA is how much of daily life a writer incorporates into the narrative, and how much a book’s success depends upon emotional closeness.
After reading ‘The Bestseller Code’, I feel that I had a bit of an exploration into the collective subconscious. I found it interesting that Jodie and Matt explore the lighter and darker sides of our attraction to certain books – the sweetness and savior-like qualities of love stories, and the desires evoked by the illicit thrill of a femme fatale's grip.
Jodie’s prose is strong and fluid, and she and Matt guide this reader’s canoe down their steadily flowing stream of information, and along the way they explain everything that we see, answer all of my questions, and leave me thoroughly delighted by this trip.
If you like Big Ideas or algorithms, if you’re a writer, an editor, a book lover, and/or someone who enjoys having the mysteries of life explained, then buy ‘The Bestseller Code’. It is beautifully written by two brilliant people with insatiable minds who adore books and who love to read as much as you do.
The Bestseller Code alternates between pop nonfiction and an academic treatise, which makes sense given the book has co-authors, one a writer and the other a college professor. As a result, my reading experience alternated between enjoying the big ideas and tolerating the science (where, admittedly, I skimmed the surface). I have no problem with science, but here it seemed a bit dry and, well, too academic.
The Bestseller Code had me thinking about writing so I returned to Elmore Leonard's 10 Rules of Writing, which I had read years ago. As you might imagine, Leonard is more entertaining and more succinct. He doesn't cover all that's in The Bestseller Code, particularly on the rhythms of a good plot, what these authors call "the curves of strong sales." But Leonard does provide thought provoking insights into what makes a good novel, and no one can accuse him of sounding like an academic.
The upcoming book The Bestseller Code is getting a great deal of buzz, forcing many of us to ask the question, Can one genuinely predict what kind of book will become a New York Times bestseller (typically considered the most prestigious bestseller list)?
The promise of a formula for predicting a bestseller is getting many in the publishing industry and those who write about books excited, or at least curious. Several journalists contacted me for an opinion about the book because of my background in pub-tech and reader analytics. Thus, I became interested in reading it, and St. Martin’s Press was kind enough to provide me with an advance reader copy.
First of all, this is a delightful book to read. I would recommend it as both an entertaining and educational read for anybody interested in the business of books. This is not a magisterial work, like Merchants of Culture by John Thompson, but a book written for the mass market with plenty of anecdotes and examples that readers and authors can relate to.
The “code” is based on some of the latest advances in machine learning as applied to literature, but the authors attempt to simplify the computer science behind the book. There is no mention of “big data” or artificial intelligence—just plain and simple descriptions of what the “black box” does, with references for interested readers to find out more about its inner workings.
However, there is a statement in the book that is misunderstood by many of those who discussed the book with me, and that is that the algorithm can predict whether a book will be a bestseller with a level of accuracy of 80 percent.
I had a sense when being interviewed that most journalists thought this meant something along the following lines of, “If there are something like 500 New York Times bestsellers this year, then this algorithm can produce a list of 500 titles and 400 of those will indeed turn out to be bestsellers.”
Well, that’s not actually what 80-percent accuracy means. The misunderstanding is in the “can produce a list of 500.”
One needs a bit of statistics knowledge to understand this concept better, so I will first restate (with some statistical elaboration) how the authors describe the 80-percent accuracy claim:
If the algorithm is applied to 50 books that are genuinely bestsellers, then it will recognize that 40 of these (80 percent) are indeed bestsellers, but will classify incorrectly (“falsely”) that 10 of the books (20 percent) are not bestsellers (a “negative” result). Thus, the 10 titles that are missed are what statisticians call the “false negatives.”
The inverse is also true: if the algorithm is applied to 50 books that are known not to be bestsellers, then it will recognize that 40 of these (80 percent) are indeed not bestsellers, but will classify incorrectly (“falsely”) that 10 of the books (20 percent) are, in the opinion of the algorithm, in fact bestsellers (a “positive” result), when in fact they never were bestsellers. Thus, these 10 titles that are incorrectly predicted to be bestsellers are false positives.
Let’s construct a different scenario. Imagine a Barnes & Noble megastore in the American Midwest with 200,000 nicely ordered titles on its shelves, including 1,000 titles in a section called “Past and Present New York Times Bestsellers.”
Now a mob of Donald Trump supporters enters the store and throws all the books on the floor in protest of Trump’s Art of the Deal not being displayed in the bestseller section. They don’t actually take any of the books with them, however, so there are now 200,000 books lying in a jumble on the floor.
A poor B&N staff member is now assigned to put the 1,000 bestsellers back on the shelf, but, being new to the job, he has no idea what makes a bestseller and therefore decides to make use of this magical new algorithm.
The poor worker now tests all 200,000 books against the algorithm (stay with me).
When applied to the 1,000 bestsellers, the algorithm identifies 800 of them correctly as bestsellers, but dismisses 200 as not being bestsellers.
Now it gets interesting. When analyzing the remaining 199,000 books, the algorithm identifies 80 percent—that is, 159,200 books, as not being bestsellers, but it believes (incorrectly) that the rest (20 percent) are. That is a whopping 39,800 books.
Our B&N staffer, using the algorithm, has identified a total of 40,600 (800 + 39,800) books as New York Times bestsellers. He found 800 of the 1,000 bestsellers he was looking for, plus 39,800 false “bestsellers,” while missing the 200 real bestsellers that the algorithm classified incorrectly. That is what 80-percent accuracy means.
The algorithm was applied to a large sample dominated by books that were not bestsellers, and as a result it produced many, many false positives.
It did do its job, though. Whereas the original 200,000 books contained only 0.5 percent bestsellers (i.e. 1,000 books), the new, smaller list of 40,600 books contains roughly 2 percent bestsellers (800 books), a fourfold “enrichment,” which came at the cost of 200 bestsellers going missing, because the algorithm is not 100-percent perfect.
We could play this thought experiment a bit differently. Suppose the staffer is lazy and fills the shelf with the first 1,000 books that the algorithm identifies as bestsellers. Well, based on the above enrichment factor, we know that among the first 1,000 books the staffer selects, 2 percent (i.e. 20 books) will be bestsellers. So the new “bestseller” shelf will consist almost entirely of books that are not bestsellers. There is even a 1-in-200 chance that Trump’s book will end up on the shelf.
Now, this result doesn’t sound quite as impressive, does it? But this is what 80-percent accuracy means. And given that a million new books and manuscripts are written every year, it will not turn publishing on its head. To that end, an algorithm with 80-percent accuracy will just not cut it.
Don’t be deterred from reading the book, though. It still offers some genuine and novel insights as to what makes a bestseller. But that said, it is not going to put acquisition editors out of their jobs.
What should not get lost in all this, however, is that machines are getting smarter, machine learning is improving, and artificial intelligence is getting more intelligent. So what if the algorithm were 99.9-percent accurate rather than just 80-percent accurate? In that case, the staffer would have correctly identified 999 of the 1,000 bestsellers lying on the floor as New York Times bestsellers and missed only one.
But the staffer also had to test the 199,000 other books, and that would have produced 199 “false positives,” meaning he would have 1,198 books to put on the shelves—198 more than he would have expected if the algorithm were 100-percent accurate (like an inventory list with no mistakes or typos).
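For readers who want to check the arithmetic of both shelf scenarios, here is a small Python sketch of the review's reasoning as a confusion-matrix calculation. The function name is mine, and it assumes, as the review implicitly does, that the algorithm's accuracy is the same on bestsellers (sensitivity) and on non-bestsellers (specificity).

```python
def shelf_counts(accuracy, n_best=1_000, n_other=199_000):
    """Confusion-matrix counts for the B&N thought experiment.

    Assumes sensitivity == specificity == `accuracy`, as the review does.
    Returns (true positives, false negatives, false positives,
    total books flagged as bestsellers, precision of the flagged list).
    """
    true_pos = round(accuracy * n_best)           # real bestsellers found
    false_neg = n_best - true_pos                 # real bestsellers missed
    false_pos = round((1 - accuracy) * n_other)   # non-bestsellers flagged
    flagged = true_pos + false_pos                # everything put on the shelf
    precision = true_pos / flagged                # share of the shelf that is genuine
    return true_pos, false_neg, false_pos, flagged, precision

# The 80-percent scenario from the review:
tp, fn, fp, flagged, prec = shelf_counts(0.80)
print(tp, fn, fp, flagged, round(prec, 3))    # 800 200 39800 40600 0.02

# The hypothetical 99.9-percent scenario:
tp, fn, fp, flagged, prec = shelf_counts(0.999)
print(tp, fn, fp, flagged, round(prec, 3))    # 999 1 199 1198 0.834
```

The base rate does the damage: even a 99.9-percent-accurate classifier leaves the flagged shelf about one-sixth false positives, simply because non-bestsellers outnumber bestsellers 199 to 1.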
Now that would sound a heck of a lot more impressive, but an algorithm that is 99.9-percent accurate is still a long way off for the simple reason that human taste and fashion are so incredibly unpredictable.
Book publishing will always be a bit of a lottery, but that does not mean the odds cannot be improved with good data and smart algorithms. At my own company, Jellybooks, the emphasis is on generating good data. That means understanding how people read books and when they recommend them, not just judging success based on sales data or a book’s position on a particular bestseller list.
Going forward, code will appear more and more in publishing even if it can’t write novels yet or predict with 100-percent accuracy the next New York Times bestseller.
Jodie Archer and Matthew L. Jockers have analyzed, by computer and from a purely linguistic standpoint, the style, characters, and plot arcs of books from the New York Times bestseller list. Since the NYT-bestseller label has so far flagged, with great reliability, books that disappointed me, I opened this sensationally titled book with very mixed feelings. In the first chapter, the language (which, in the German translation, I found too old-fashioned for the subject of self-learning systems) signaled that the authors were indulging in long-winded, classic American marketing. The introduction felt old-fashioned because developers who earn their living in this field do speak of self-learning systems, but rarely of machine-learning ones.
From chapter 2 onward, I was gripped by the authors' promise that their computer-based bestseller-ometer could predict with 80 percent probability whether a manuscript would make the NYT bestseller list. As "fodder" for the style-analyzing, self-learning program, they used not only 500 bestsellers from the NYT list but also nineteenth-century novels, science fiction, fantasy, romance, and less successful e-book titles by hitherto unknown authors. Before novel texts could be evaluated by computer, one had to rely on the instincts of publishers and booksellers; exact predictions about the success of a manuscript were simply not possible. The Archer/Jockers team investigated which themes readers want, the most promising weighting of themes within a text, the plot curve, and narrative pace. The sum of these factors forms, for readers across all demographics, the trademark of a successful author. The computer program assigns top positions to Danielle Steel and John Grisham as outstandingly successful authors, in agreement with the NYT bestseller list. The accuracy of this bestseller tool is truly astonishing and is achieved through pattern recognition and the combination of various factors. I read with great amusement about the IT-assisted unmasking of Robert Galbraith as J. K. Rowling in the present day, and of Richard Bachman as Stephen King through a bookseller's instinct in the past. Naturally, I asked myself what the two literature experts' computer-assisted analysis can do that an agile bookseller's brain, exercised daily, cannot.
After a look at the "girl" trend of the new sub-genre of "Gone Girl", "The Girl on the Train", and "The Girl with the Dragon Tattoo", the book gets down to the questions of whether there is a perfect, predictable number-one title, how an unknown author makes it onto a bestseller list in the first place, and whether novels will in future be plotted and written directly by computer programs. Unfortunately, the program's generated best-of list confirmed my experience that books from US best-of lists often disappoint me. The first title of interest to me appears only at number 29, and I have read, or would want to read, only 12 of the 100 novels.
Anyone curious about what book readers want, and whether a bestseller requires a famous name, a lavish advertising budget, or rather a feel for the desires of the masses, can gain interesting insights here. You will learn why Fifty Shades of Grey was probably no beginner's fluke, how the unknown Anthony Doerr made it onto the list, which tips from writing guides lead reliably to failure, and how to recognize stylistically assured sex scenes.
Interesting analysis of bestsellers. The authors use textual analysis and machine learning to identify features that differentiate bestsellers from non-bestsellers with greater confidence than intuition alone. For example, bestsellers devote about 40% of the book to just a few topics (2-3). The book also identifies 7 types of plot lines into which all the bestsellers in their corpus (numbering in the thousands) tend to fall. The use of sentiment analysis to arrive at this conclusion is very clever! Similarly, their analysis of the recent surge in popularity of books with "The girl ..." in their titles is also very insightful. The postscript chapter contains an overview of the NLP and machine-learning techniques they used in their analysis; it's a great jumping-off point into the complex world of NLP and text processing. Overall an interesting book. I am curious how effective their method has been in identifying bestsellers. Also, I may have missed it, but I don't think their approach applies to non-fiction books.
I found this book fascinating but I would have liked twice as much detail and information! I gobbled it up in two nights and came out wanting more. The book involves a lot of literary essay, using the numbers from this algorithm of theirs to back it up. I liked it a lot, but I want more details! More numbers! It's really interesting and I feel like they gave us only a quick peek beneath the surface.
I found the insights quite fascinating, more from an interest point of view than from an author's point of view. Mind you, some of their conclusions can easily be incorporated into your writing: the type of pet your fictional family should own, where they live, etc. Like any piece of writing advice, read it and take away from it what you think you can use. It's all knowledge; some is more useful than the rest. I think this would be a useful read for a new author.
I loved the premise of this book - using algorithms to discover patterns that bestselling books share - but the actual execution was lackluster, dragging on and on and on.
Around half of this book is about how the algorithm works and what it promises to do. I don't mind a few technical details here and there, provided the authors also deliver on what they set out to do, but the delivery part was meh. Face it: as a reader, your incentive for buying this book was the results the algorithm produced, not the story behind it. Besides, a lot of it sounds like some sort of advertisement for the algorithm itself.
Moving on, my main gripe with the data shown in this book is that it started from a limited database.
Meaning, these two authors only used the New York Times ADULT FICTION bestseller list when running their algorithm (and they didn't even address that).
Hence, the only books this algorithm is good for are adult fiction titles. (Never mind that the book never addresses non-fiction either; that would probably require building a new model from scratch.)
Because of this, there are sometimes awkward statements. One of them said that books containing wizards/magic/unicorns etc. are never bestsellers. Excuse me, what? I'm really not a big fan of young adult fiction, but you can't just blatantly ignore the dozens of YA fantasy books that sell millions of copies yearly when you write a book promising to reveal the hidden patterns of bestsellers.
In fact, I'm super curious if the YA/Children books reveal the same data points or some that completely overturn what these two authors already discovered.
Speaking of discoveries - most of the results were computers proving what had already been theorized (nothing bad about this, but nothing surprising either). Readers want characters who are proactive and do things rather than wait for things to unfold. "Very" and verbs that indicate speech tone different from "said" should be avoided. There are only certain types of plots and twists should occur about a third and two thirds in. The common topic of most bestsellers is "Human Connection" etc.
All in all, this book should have been about half its length and focused on the extrapolated data. It was an okay read, but rather underwhelming at times.
I read this only because it is for a book club and I managed to find a copy for $7.00. I will admit it did interest the social-science geek that I am (I was a sociology and psychology major in university), but beyond that... it was on the drier side. It was not all bad or boring. There were some interesting points, but seriously: having a computer generate the probability of a novel getting onto the New York Times bestseller list? The postscript talked about several programs that have tried to write a novel using a computer... now that is a little scary.
You can’t argue with statistics. Ask a computer to crunch some numbers and it never gets them wrong which is why the application of computer science to that most subjective of considerations, what makes a best-selling novel, is so intriguing. It intrigued two busy brains at Stanford University for seven years and their conclusions in The Bestseller Code are going to be required reading for the book trade, writers, aspiring writers and anyone else who loves books. Of course, a computer can only throw out what people feed into it and the amount of data collected by Archer and Jockers is staggering. They devised algorithms to analyse 20,000 novels in terms of their theme, structure, plot, style, characterisation and language, matching the results to a novel’s place in the New York Times bestseller lists. Having been asked the question – ‘what are the elements that make up a bestseller’ – the computer tells us and its conclusions are unsurprising. The shape of a pleasing narrative structure has been understood since classical times and we all like strong characters and accessible language. There is nothing in this book that will require creative writing teachers to modify their courses (the book itself is a masterclass in elegant interpretation) but they will all want to read it simply because, now, ‘the computer says…’ and that, in the digital age, is the alchemy that turns lore into law.