The fight against plagiarism is about to take a decisive turn. Academic publishers have told Nature they hope that software designed to catch cheating students could soon be used to unmask academics who plagiarize other researchers' — or their own — work.

Big publishers such as Elsevier and Blackwell, which between them publish more than 2,500 journals, have been prompted to act by reports that plagiarism is becoming more common. “We're hearing about it more frequently from editors,” says Bob Campbell, president of Blackwell Publishing in Oxford, UK.

Self-plagiarism, in which authors attempt to pass off already published material as new, is a particular problem. In an increasingly competitive environment where appointments, promotions and grant applications are strongly influenced by publication record, researchers are under intense pressure to publish, and a growing minority are seeking to bump up their CVs through dishonest means.

The extent of the problem is hard to assess. Defining plagiarism is not straightforward (see ‘Where to draw the line?’), and measuring the incidence of even the most clear-cut cases is difficult. Studies in certain fields have estimated that anything up to 20% of published papers contain some degree of self-plagiarism (see ‘How common is plagiarism?’). This may not be representative of basic research, but no rigorous, multidisciplinary study has ever been conducted.

And although most cases are never discovered, almost all of the editors and publishers contacted by Nature agreed that self-plagiarism is on the rise. “Editors are noticing many more cases,” says Scott Dineen, director of editorial services at the Optical Society of America, which publishes ten journals. Last month, the increase prompted the society to issue an editorial statement on its commitment to expose plagiarism (ref. 1).

The advent of antiplagiarism software, such as that used by universities to check student essays, means that editors and publishers finally have a practical way to tackle the problem. Online services check essays against massive stores of documents generated from web trawls and purchases from media outlets. Supervisors can see which parts of the essays seem to be plagiarized and where the copied material comes from.

Adapting such technology for use with academic papers should be easy, say publishing experts, as the software could be bolted onto the online systems that publishers use to manage peer review. The system would work in the background and editors would only be aware of it when it flagged up a suspicious degree of overlap, which they could then check out in detail.

“We see it as an idea with great potential,” says Campbell. “It will eventually be part of the editorial office.”

ArXiv, the popular physics preprint server based at Cornell University in Ithaca, New York, is almost ready to deploy plagiarism detection software. Paul Ginsparg, the Cornell physicist who runs arXiv, acted after 22 plagiarized papers were discovered on the archive (ref. 2).

Physical exercise

Daria Sorokina, a computer science PhD student at Cornell, has tuned an established algorithm to look for any two documents that share at least six of the same words in a row. The system is already finding “plenty of awkward things”, says Ginsparg. One test-run revealed a PhD thesis that shares large chunks of material with a paper posted to the archive three years earlier. So far, Ginsparg says, the tests have revealed a few thousand pairs of articles by different authors that have “excessive overlap”.
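As a rough illustration of the idea of matching runs of six consecutive words, here is a minimal Python sketch. It is not the arXiv code, whose details are not given here: the function names, the lower-casing and punctuation handling, and the sample texts are all assumptions made for the example.

```python
import re

def word_ngrams(text, n=6):
    """Return the set of all runs of n consecutive words in text."""
    # Assumption: words are compared case-insensitively, punctuation ignored.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_runs(doc_a, doc_b, n=6):
    """Return the n-word runs that appear in both documents."""
    return word_ngrams(doc_a, n) & word_ngrams(doc_b, n)

def flag_pairs(docs, n=6):
    """Return sorted pairs of document ids sharing at least one n-word run."""
    # Index each run by the documents containing it, so that every
    # document is scanned once rather than compared with every other.
    index = {}
    for doc_id, text in docs.items():
        for gram in word_ngrams(text, n):
            index.setdefault(gram, set()).add(doc_id)
    pairs = set()
    for ids in index.values():
        ordered = sorted(ids)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs.add((first, second))
    return sorted(pairs)

# Hypothetical sample texts, invented for this example.
docs = {
    "thesis": "We measure the decay rate of the excited state using a pulsed laser",
    "paper": "Here we measure the decay rate of the excited state with a new method",
    "other": "An unrelated note about optical fibres and their dispersion properties today",
}
flagged = flag_pairs(docs)
```

A pair flagged this way is only a candidate for human inspection: shared boilerplate or a properly attributed quotation would trigger it just as readily as copying.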

Ginsparg plans to post all of the pairs on the arXiv website — without accusing the authors of wrongdoing — and ask the researchers involved to respond. He hopes that the results will help to refine the algorithm, which could then be used on new submissions to generate a warning if papers seem to overlap.

The world's largest scientific publisher — Amsterdam-based Elsevier, which publishes about a quarter of a million papers each year — has also decided to act. Last month, it initiated a year-long assessment of the various technology options.

Tools are now becoming available for individual editors and peer reviewers. One has been developed by Christian Collberg, a computer scientist at the University of Arizona, Tucson. He was asked to review a paper for a conference in 2003, and recalls carrying out a Google search to research it. “I found an earlier published version that had just been reformatted,” he says. Reviewing for the same conference the following year, Collberg found another submission that had copied substantial parts of the author's own earlier work. “This really pissed me off,” he says. “I spent time reviewing those papers.”

Copycats

In response, Collberg began work on what has become the Self-Plagiarism Detection Tool (SPlaT). Whereas publisher-run plagiarism detection services are likely to take years to set up, Collberg's software is available now, free to use and aimed at editors and peer reviewers.

The software grabs papers from authors' websites and compares them to each other and to other manuscripts added manually, such as papers under review. Collberg declined to reveal details of what happened when he let SPlaT loose on the websites of 50 computer-science departments, but summary results released last month show that the software turned up more than one pair of conference publications with more than 50% common text and no reference to each other (ref. 3).
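How a figure such as "more than 50% common text" might be computed can be sketched in a few lines of Python. This is an illustration only, not SPlaT's method, which is not described here: the three-word shingles, the choice to divide by the shorter document, and all function names are assumptions.

```python
import re

def shingles(text, n=3):
    """Return the set of n-word sequences (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def common_text_fraction(doc_a, doc_b, n=3):
    """Fraction of the shorter document's shingles also found in the other.

    Assumption: overlap is measured against the shorter document, so a
    short paper lifted wholesale into a longer one still scores near 1.0.
    """
    a, b = shingles(doc_a, n), shingles(doc_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def exceeds_threshold(doc_a, doc_b, threshold=0.5, n=3):
    """True if the two documents share more than `threshold` of their text."""
    return common_text_fraction(doc_a, doc_b, n) > threshold

# Hypothetical sample sentences, invented for this example.
first = "the quick brown fox jumps over the lazy dog"
second = "the quick brown fox jumps over a sleeping cat"
fraction = common_text_fraction(first, second)
```

A check for whether the two papers cite each other would be a separate step and is not attempted here.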

Student antiplagiarism services are another option. These offer the possibility of detecting plagiarism itself, not just duplicate publication. The firm behind the iThenticate software, for example, says its product is licensed by 5,000 institutions and that it can check documents against a database of more than seven billion pages for cases of suspicious overlap. Because this store has been generated in part from web trawls, it contains papers from authors' websites and some open-access journals. If a university subscribes to the service to tackle student fraud, researchers in those institutions can also use it when refereeing papers.

When Nature gave the software a test-drive, it did a good job of identifying areas where the same text had been legitimately used in different places, such as on authors' websites and in properly referenced quotes. But when a known plagiarism was submitted to iThenticate, it failed to turn up the original, even though it had been published in a high-impact journal.

The reason, says John Barrie, president of iParadigms, the California firm that developed the software, is that most online literature is locked behind subscription barriers and so cannot be added to the iThenticate database.

An overall solution to plagiarism, says Campbell, is likely to come when publishers collaborate on industry-wide detection systems. Preliminary discussions about this have already taken place at meetings of CrossRef. This company, based in Lynnfield, Massachusetts, collaborates with publishers to develop systems that allow researchers to search across journals from many different companies.

Such a system could, in theory, catch almost all instances of direct plagiarism, although it will take several years to set up, even if negotiations go smoothly. In the meantime, say editors, plagiarized papers will continue to creep into the literature as a minority of researchers dishonestly beef up their publication lists. After all, says one researcher who admitted to Nature that he occasionally neglects to mention papers that overlap with new publications, there is a perception that “the Dean can't read, but he can count”.