
Google Books Lost Its Way: Can It Make a Comeback?

via: 博客园     time: 2017/4/19 13:30:48     reads: 639

Google Books was Google's first "moonshot" project, but 15 years on, it still has not "launched." Meanwhile, other moonshots such as self-driving cars, Google Glass, and stratospheric internet balloons have drawn widespread attention. This article explores the Google Books project that few of us know well.

Books can do anything. A book "must be the axe for the frozen sea within us."

Do you know which author wrote those words? Relying on memory alone, it is hard to answer. Some people would turn to a Google search, but although Google returns links that quote the sentence, those links are usually unreliable. To get an exact answer, you can use Google Book Search, which searches millions of digitized texts.

After using Google Book Search, you will find that the sentence about a book being "the axe for the frozen sea within us" was written by Kafka in January 1904, in a letter to Oskar Pollak collected in "Letters to Friends, Family, and Editors."

Google Book Search is remarkable. Fifteen years ago it was an ambitious project: to extend search to the offline world. Google worked with a number of libraries to scan millions of printed books, aiming to bring every offline book into its database.

Google co-founder Sergey Brin said: "Human knowledge spans thousands of years, and perhaps the finest of it is captured in books. For Google not to have that would be too big a gap."

Today, Google is known for its "moonshot" culture of taking on the world's great challenges. Google Books is one of the company's longest-running efforts and its first moonshot: scan all the books!

In its early days, Google had a vision of a "utopian library" that would extend the convenience of online search to the offline world. At the time, the plan looked like a singularity for the world of books: Google would put all books online and somehow trigger a phase change in human consciousness. In fact, Google Books has made it partway through that plan, with more than 25 million books scanned into its database.

Google still holds to its original intention, and the project also carries the hopes of many other people.

However, the bright vision of Google Books was shattered. Shortly after the project launched, it ran into legal trouble: authors protested that Google was infringing their copyright, and publishers protested to protect their industry. What followed was a decade of legal battle. The dispute finally ended last year, when the US Supreme Court declined to hear the authors' appeal, and the dark cloud that had long hung over Google dispersed.

The fight ultimately changed Google Books. After a decade mired in legal proceedings, the project, its staff, and Google itself lost momentum and ambition.

When I was researching this story, I worried that Google had abandoned the project. Google Books has always been somewhat secretive, much like Google's other projects. But when I started asking questions, everyone fell silent, and for a few weeks it seemed that no one could talk about the current state of Google Books.

The Google Books "History" page stops in 2007, and its blog was last updated in 2012, after which it was folded into the main Google Search blog, where information about Books is almost impossible to find. Google Books remains a useful service and continues to draw outside attention. But as an ongoing project it publishes almost no information, as if it had disappeared. All of this seems strange given that Google won the legal dispute.

When I contacted several people who had left Google, some mentioned that they suspected Google had stopped scanning books. Eventually I learned that some Google employees are still working on book search and still adding new books, though not at the pace of 2010-2011.

Google engineer Stephane Jaskiewicz said: "Our focus is not directly on the user interface and features. It is more behind-the-scenes work, improving the technology. By getting the content and processing it properly, we can view the entire book online and then adjust the search algorithm."

One focus of running Google Books is continually improving the book scanners. In 2002, at the start of the project, Larry Page and Marissa Mayer estimated how long it would take to scan all the books: they set up a digital camera on a stand and used a metronome for timing. Once the company began to pursue the project seriously, it paid close attention to every operational detail to improve scanning efficiency.

Jaskiewicz said the scanners are constantly updated, with a new version rolled out every six months. At the start of the project, LED lighting was not yet up to the job, and the team also studied techniques that let human operators turn pages more efficiently. "It's almost like playing a guitar," Jaskiewicz said. "So we need to find people who are good at turning pages."

However, most of Google's work still goes into search quality, making sure users can quickly find what they need in a book. It is tedious work, less like a moonshot and more like maintaining a satellite.

To understand how Google Books got to this point, you need to know a little copyright law. Books fall into three categories. The first is the public domain, mainly books published before 1923 and books whose authors have given up their copyright, which means you can do what you like with their content. The second is books that are in print and under copyright; there are a great many of these, and if you want to do anything with their content you must negotiate with the author and the publisher. The third is books that are out of print but still under copyright, commonly known as "orphan works." A study by the US Copyright Office shows that these account for 17% to 25% of published works and as much as 70% of specialized collections.

How many books is that? No one knows exactly, and it also depends on how you define "book," which is not as easy as it sounds. In 2010, a Google engineer named Leonid Taycher wrote a blog post that drew on Google Books metadata and concluded that the figure was about 130 million. Others who saw the number doubted it. The real figure may be somewhat lower than Taycher's, but it is far higher than the more than 25 million books Google has scanned so far.

A large part of Google Books consists of "orphan works." You can borrow such a book from a library or buy it in a second-hand bookstore. But once Google Books set out to scan them all and put them online, everyone suddenly seemed to want a share.

The legal dispute that followed was really a struggle over these "orphan works": Google, publishers, and authors all wanted control over digitizing them. The three parties eventually reached the Google Books settlement, under which Google could continue scanning and make these "orphan works" available while setting aside funds to compensate authors and publishers. But in 2011 a federal judge rejected the settlement, amid concerns that Google, as a private for-profit company, would monopolize the "universal library" and charge for access.

With the settlement void, Google resumed scanning. Publishers, having seen the success of the Amazon Kindle, wanted their own place in the emerging e-book market and hoped to get ahead of Google there. But the Authors Guild continued its lawsuit, alleging that Google had scanned and indexed books without the permission of copyright holders. Rich as Google is, it could not afford billions of dollars in copyright damages (millions of books at thousands of dollars each). The case dragged on until last year, when the Supreme Court's refusal to hear the appeal left standing a ruling that Google has the right to index books and show short snippets in search results, much as it does for web pages.

The decision was a big step forward for Google and for the people running the project. Erin Simon, a Google product counsel, said: "Now that we have set a precedent, everyone benefits. This will be written into textbooks and will help us understand what fair use means."


Although the Authors Guild lost the lawsuit, it believed the fight was worth it.

James Gleick, the guild's president, said Google was wrong from the start. "When Google began this project, it did not consider that it needed the support of the original authors. Big companies do not have enough respect for creative work. Google now thinks it is the master of the universe, but in fact it should only be licensed to use the books."

We might take it for granted that winning the lawsuit would revive Google Books: perhaps Google would upgrade its scanners and push ahead at full speed! But the evidence shows otherwise. One reason is that the database is already large. "We have a fixed budget," says Google engineer Stephane Jaskiewicz. "At the beginning, we scanned the books on every shelf of a library, and sometimes found a lot of duplicates." Now Google gives its partner libraries a "selection list."

There are other explanations for Google's waning enthusiasm: the litigation was demoralizing. Google now has many exciting new projects that shine and pay off quickly. Scanning all the books is certainly useful, but for Google Books it is now almost impossible to "change the world."

For many people who love books, it makes no sense for Google to be their "universal library"; that role belongs to public institutions. Once Google showed that "scanning all the books" was achievable, many others stepped up to tackle the problem. Brewster Kahle's Internet Archive stores historical snapshots of the entire web, and it already has its own scanning operation. The Digital Public Library of America, which began in 2010 at Harvard's Berkman Center, has become a place where many libraries and institutions share digitized books.

Google worked with university libraries to scan their collections and agreed to give each library a copy of the data. In 2008, HathiTrust began organizing and sharing these files. HathiTrust has 125 member organizations and institutions, and it "believes that through cooperation we can better manage the research and cultural record, rather than letting an organization like Google go it alone," said Mike Furlough, HathiTrust's executive director. And then there is the Library of Congress, whose new leader, Carla Hayden, has promised to open its collections to the public through digitization.

In a sense, these are all competitors to Google Books. But in practice Google is far ahead, and none of them is likely to catch up. Google spent hundreds of millions of dollars building Google Books, and no one else is willing to spend that kind of money on another "Google Books" project.

However, these nonprofits have one advantage over Google: Google may reprioritize the project whenever company strategy changes, and a nonprofit will not. Books are their core business, undisturbed by things like advertising or a smartphone ecosystem. Unlike Google, they remain passionate about readers and keep looking for new ways to connect readers and books.

It is said that interminable litigation becomes a tide that drowns everyone involved (as in Dickens's Bleak House, where an inheritance suit spanning generations ends with legal costs consuming the entire estate). The technology industry has its own examples: the famous IBM antitrust case dragged on for years and gave competitors an opening, and while Microsoft was busy with its own antitrust fight, Google came to dominate search.

Google Books has its own value.

As Gleick, the Authors Guild president, pointed out, Google launched the project by asking forgiveness rather than permission, which is now common practice among startups. In a sense, Google Books was like an Uber for intellectual property: a reading-and-sharing service that looked ahead to a future in which it would serve all of humanity. That was naïve, and opposition to Google Books quickly grew fierce.

But Google learned a great deal from the experience and grew stronger: engineering is great, but it is not the answer to every problem. Sometimes you have to learn politics: consult stakeholders, find allies, and compromise with competitors. As a result, Google hired a team of lobbyists and lawyers, and on other issues, such as YouTube's copyright problems, it was more cautious and handled things better. Google grew up. It can pursue "moonshots" while understanding that not every "moonshot" will succeed.

Google may act again on the question of "orphan works." But it looks like it will wait for someone else to succeed first. Jaskiewicz said: "If the law does not change, I do not know what else I can do."

While writing this article, I kept recalling a book I read a few years ago, Mr. Penumbra's 24-Hour Bookstore, a whimsical novel by Robin Sloan. It tells the story of a centuries-old secret society whose members each write their own "book of life" as a riddle. Google plays a vital role, because the story centers on the protagonists' attempt to solve the riddle. It turns out that even Google's unparalleled power cannot do it. What it takes is a protagonist, a particular book, and an interesting insight. As Sloan writes at the end of the story: "the right book exactly, at exactly the right time."

The book is a reminder that Google's engineering approach is not a panacea. It breaks a huge challenge into manageable pieces, turns them into data, and applies efficient routine processes, which is an effective way to work. It can take you a big step toward the "utopian library," but it cannot get you all the way there.

Even if you arrive, the destination is not that "utopian library," and more hard work lies ahead. Once you turn a book into data, you can easily index it and search for passages, but that does not fundamentally make reading easier. Reading lets you temporarily enter someone else's world, an experience nothing can replace.

To this day, the experience of reading requires human dedication. Indexes like Google Books can help us find and analyze text, but using them is still our own job. With no grand epiphany on offer, perhaps the quest to digitize all books was bound to disappoint.

Like many tech enthusiasts, Sloan says he uses Google Books often, but he regrets that it has stopped developing and no longer amazes us. "I wish it were a glittering, beautiful, and useful thing that kept improving and becoming more interesting," he said. He also wonders: we understand that for legal reasons Google cannot let everyone read those millions of books, but what if they could be read by machines?

Sloan pointed out that machine learning is developing rapidly and that its culture has the feel of the Homebrew Computer Club and the early internet. But to make progress, researchers need a lot of data to train their programs. "If Google could find a way to take that corpus of books, slice it by genre, topic, time period, and so on, and offer it to machine-learning researchers, hobbyists at universities, and others, I bet there would be some interesting results." He suspects Google is already doing this internally, but Jaskiewicz and others at Google Books would not say.

Perhaps some future self-aware neural network will also immerse itself in Kafka's text and, as we humans do, break the frozen sea within and find comfort by reading books (Kafka: a book "must be the axe for the frozen sea within us"). Or perhaps, unlike humans, it will be able to read all the scanned books, really read them, and then what?

This is only Meng Meng

Editor: Yang Zhifang

This article comes from:Backchannel.com
