A new computerized scan of the biomedical research literature has turned up tens of thousands of articles in which entire passages appear to have been lifted from other papers. Based on the study, researchers estimate that there may be as many as 200,000 duplicates among some 17 million papers in leading research database Medline.
The finding has already led one publication to retract a paper for being too similar to a prior article by another author.
Researchers Mounir Errami and Harold "Skip" Garner of the University of Texas Southwestern Medical Center at Dallas used a text-matching algorithm to compare seven million Medline abstracts against matching entries flagged by the database's software as being closely related.
The researchers set their own software tool, called eTBLAST, to identify pairs that were more than 45 percent identical, Errami says. The search turned up more than 70,000 hits, which the researchers and a team of three assistants have been manually checking. So far, Errami says, they have gone through close to 3,000 pairs, comparing the abstracts or, when the duplicates have different authors, the full articles. He notes that some matches were found to be innocent duplications, such as reprints or translations.
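To give a rough sense of what thresholding on text similarity looks like, here is a minimal Python sketch. It is not eTBLAST's algorithm, which the article does not describe. As a stand-in, it assumes a simple word-level similarity ratio from Python's standard `difflib` module, and the function names and sample abstracts are hypothetical. Only the 45 percent cutoff comes from the article.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Cutoff taken from the article: pairs "more than 45 percent identical".
SIMILARITY_THRESHOLD = 0.45


def word_similarity(text_a, text_b):
    """Return a 0..1 similarity ratio over lowercase word tokens.

    Assumption: difflib's ratio stands in for whatever measure eTBLAST uses.
    """
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()


def flag_candidate_pairs(abstracts, threshold=SIMILARITY_THRESHOLD):
    """Yield (id_a, id_b, score) for each pair scoring above the threshold."""
    for (id_a, text_a), (id_b, text_b) in combinations(abstracts.items(), 2):
        score = word_similarity(text_a, text_b)
        if score > threshold:
            yield id_a, id_b, score


# Hypothetical sample data, for illustration only.
sample_abstracts = {
    "A": "the quick brown fox jumps over the lazy dog",
    "B": "the quick brown fox leaps over the lazy dog",
    "C": "protein folding kinetics were measured in vitro",
}

for id_a, id_b, score in flag_candidate_pairs(sample_abstracts):
    print(f"{id_a} vs {id_b}: {score:.2f}")
```

As in the study, a pair flagged this way is only a candidate: a high score cannot distinguish a reprint or translation from plagiarism, which is why the hits still need manual review.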
But in 79 cases (and counting), duplicates with different authors had no obviously legitimate explanation. The group has set up a public Web site, Déjà vu, to document the findings.
The next step in these cases of potential plagiarism, the researchers say, is for journals to investigate. In a Nature report, they advise other scientists "to withhold judgment of any candidate duplicates until evaluated by a suitable body such as an editorial board or a university ethics committee."
They note that most of the questionable duplicates inspected thus far appear to be papers submitted by the same authors to multiple journals, a less serious ethical lapse that allows researchers to artificially inflate their publication credits and give added weight to their work.
Errami and Garner estimate that perhaps 50,000 of the eTBLAST hits and 200,000 (about 1.2 percent) of the 17 million–plus Medline entries will turn out to be either plagiarized or multiple listings.
Prior studies have come up with different duplication rates. In a 2002 blind survey of 3,247 biomedical researchers by the University of Minnesota, 4.7 percent admitted that they had republished papers and 1.4 percent confessed to borrowing from others' work. A 2006 analysis of more than 280,000 papers in the physics preprint database arXiv, led by a U.S. computer scientist, found that 30,316 (10.5 percent) were suspected duplicates, and 677 (0.2 percent) were potentially plagiarized.
Maxine Clarke, publishing executive editor of the journal Nature, says her publication uses text-matching software to compare a submission with papers in the publishing group's many specialty journals. She notes that the journal also asks prospective authors to submit copies of preprints and related manuscripts submitted to other journals, to help editors and reviewers assess the submission's novelty. Bronwen Dekker, an assistant editor at Nature Protocols, says her journal uses eTBLAST to scan submissions for evidence of self-plagiarism (copying one's past work) in the abstract or introduction.