Researchers have finally created a tool to spot duplicated images across thousands of papers

Computer software can now quickly detect duplicate images across large swathes of the research literature, three scientists say.

In a paper published on 22 February on the bioRxiv preprint server¹, a team led by Daniel Acuna, a machine-learning researcher at Syracuse University in New York, report using an algorithm to crunch through hundreds of thousands of biomedical papers, searching for duplicate images. If journal editors adopted similar methods, they might be able to screen images before publication more easily, a task that currently requires considerable effort and is done by only a few publications.

The work shows that it is possible to use technology to detect duplicates, says Acuna. He isn’t making the algorithm public, but he has discussed it with Lauran Qualkenbush, director of the Office for Research Integrity at Northwestern University in Chicago, Illinois, and vice-president of the US Association of Research Integrity Officers. “It would be extremely helpful for a research-integrity office,” she says. “I am very hopeful my office will be a test site to figure out how to use Daniel’s tool this year.”

In early 2015, Acuna and two colleagues used an algorithm to extract more than 2.6 million images from the 760,000 articles then in the open-access subset of PubMed Central, the database of full-text biomedical literature run by the US National Institutes of Health. These included micrographs of cells and tissues, and gel blots. The algorithm then zoomed in on the most feature-rich areas (where colour and greyscale values vary most) to extract a characteristic digital ‘fingerprint’ of each image.
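The article does not spell out how the fingerprints were computed. As a rough illustration only, the Python sketch below treats the tiles of a greyscale image with the highest pixel variance as its "most feature-rich areas" and reduces each one to a 64-bit average hash. The tile size, the number of tiles kept and the hashing scheme are all assumptions chosen for simplicity; this is not the team's algorithm, and unlike their system it does not cope with rotation or resizing.

```python
from statistics import mean, pvariance

# Hypothetical representation: greyscale pixels 0-255, row-major.
Image = list[list[int]]

TILE = 8  # hypothetical tile size; an 8x8 tile yields a 64-bit hash


def tiles(img: Image):
    """Yield (row, col, pixels) for each non-overlapping TILE x TILE block."""
    for r in range(0, len(img) - TILE + 1, TILE):
        for c in range(0, len(img[0]) - TILE + 1, TILE):
            block = [img[r + i][c + j] for i in range(TILE) for j in range(TILE)]
            yield r, c, block


def average_hash(block: list[int]) -> int:
    """Set one bit per pixel: 1 if brighter than the block mean, else 0."""
    m = mean(block)
    bits = 0
    for p in block:
        bits = (bits << 1) | (1 if p > m else 0)
    return bits


def fingerprint(img: Image, keep: int = 4) -> list[tuple[int, int, int]]:
    """Hash the `keep` tiles with the highest variance (a stand-in for
    'feature-rich'), returning (row, col, hash) for each."""
    ranked = sorted(tiles(img), key=lambda t: pvariance(t[2]), reverse=True)
    return [(r, c, average_hash(block)) for r, c, block in ranked[:keep]]
```

Because each bit records only whether a pixel is brighter than its tile's mean, a uniform brightness shift or a positive contrast stretch (without clipping) leaves the hash unchanged; coping with rotation and resizing would need rotation- and scale-invariant descriptors instead.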

After eliminating features such as arrows or flow-chart components, the team ended up with around 2 million images. To avoid the computational load of comparing every image against every other one, the researchers compared images only across papers that shared the same first and corresponding authors. Within those comparisons, the system could pick up potential duplicates even if they had been rotated or resized, or had their contrast or colours changed.
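The restriction to shared authors can be sketched as follows. This is a hypothetical illustration, not the team's code: it assumes each image already has a 64-bit fingerprint, lists each image once per author key (for example, first author and corresponding author), and flags pairs from different papers whose fingerprints differ in at most `MAX_DISTANCE` bits, an arbitrary threshold.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical record: (paper_id, author_key, image_id, 64-bit fingerprint)
Record = tuple[str, str, str, int]

MAX_DISTANCE = 6  # hypothetical threshold on the number of differing bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def flag_duplicates(records: list[Record]) -> set[tuple[str, str]]:
    """Compare images only between different papers that share an author."""
    by_author: dict[str, list[Record]] = defaultdict(list)
    for rec in records:
        by_author[rec[1]].append(rec)

    flagged = set()
    for group in by_author.values():
        # All pairs within one author's images; never across authors.
        for (p1, _, i1, f1), (p2, _, i2, f2) in combinations(group, 2):
            if p1 != p2 and hamming(f1, f2) <= MAX_DISTANCE:
                flagged.add(tuple(sorted((i1, i2))))
    return flagged
```

The saving is large: an all-against-all comparison of n images needs n(n − 1)/2 pairs, which for around 2 million images is roughly 2 × 10^12 comparisons, whereas grouping by author compares images only within each author's own, much smaller, set of papers.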

The trio then manually examined a sample of around 3,750 flagged images to judge whether the duplicates were suspicious or potentially fraudulent. On the basis of that sample, they estimate that 1.5% of the papers in the database contain suspicious images, and that 0.6% contain fraudulent images.

The researchers haven’t been able to benchmark the accuracy of their algorithm, says Hany Farid, a computer scientist at Dartmouth College in Hanover, New Hampshire, because there is no database of known duplicate and non-duplicate scientific images against which to test the tool. But he applauds the trio for applying existing techniques to real-world images and for working to put tools in the hands of journal editors.

Laborious process

At present, many journals check some images but relatively few have automated processes. For instance, Nature runs random spot checks on images in submitted manuscripts and also requires authors to submit unedited gel images for reference. It is currently reviewing its image-checking procedures. (Nature’s news team is editorially independent of its journal team.)

Some journals are following the lead of publications such as the Journal of Cell Biology and The EMBO Journal in manually screening most images in submitted manuscripts. But the process is time-consuming, and a routine, automated screen to streamline the process is long overdue, says Bernd Pulverer, chief editor of The EMBO Journal.

To spot image re-use across the literature, publishers would need to create a shared database of all published images against which manuscripts submitted for publication could be compared, says IJsbrand Jan Aalbersberg, head of research integrity at the Dutch publishing giant Elsevier.
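Aalbersberg's proposal does not say how such a shared database would work. One plausible design, sketched here entirely under our own assumptions, stores a 64-bit fingerprint per published image and splits it into four 16-bit bands. If two fingerprints differ in at most three bits, at least one band must match exactly (the pigeonhole principle), so a query only needs to examine images sharing a band rather than the whole collection. The class name, band layout and threshold are hypothetical.

```python
from collections import defaultdict

# Hypothetical layout: a 64-bit fingerprint split into 4 bands of 16 bits.
BANDS = 4
BAND_BITS = 16
# With 4 bands, any two fingerprints differing in at most 3 bits must agree
# exactly on at least one band (pigeonhole), so no such match is missed.
MAX_DISTANCE = BANDS - 1


def bands(fp: int) -> list[int]:
    """Split a fingerprint into BANDS chunks, lowest-order bits first."""
    mask = (1 << BAND_BITS) - 1
    return [(fp >> (i * BAND_BITS)) & mask for i in range(BANDS)]


class SharedImageIndex:
    """Toy shared registry of published-image fingerprints (hypothetical)."""

    def __init__(self) -> None:
        # One lookup table per band: band value -> {(fingerprint, article_id)}
        self.tables = [defaultdict(set) for _ in range(BANDS)]

    def add(self, fp: int, article_id: str) -> None:
        """Register an image from a published article."""
        for i, b in enumerate(bands(fp)):
            self.tables[i][b].add((fp, article_id))

    def query(self, fp: int) -> list[tuple[str, int]]:
        """Return (article_id, bit distance) for stored near-duplicates."""
        candidates = set()
        for i, b in enumerate(bands(fp)):
            # .get avoids inserting empty buckets into the defaultdict
            candidates |= self.tables[i].get(b, set())
        hits = [(aid, bin(stored ^ fp).count("1")) for stored, aid in candidates]
        return sorted(h for h in hits if h[1] <= MAX_DISTANCE)
```

In this design, each image in a submitted manuscript would be checked with `query` before publication. The banding only narrows the search; the final bit-distance check decides what is reported.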

There is a precedent for such co-operation. In 2010, scholarly publishers worked together on an industry-wide service to tackle plagiarism. Crossref, a non-profit collaboration of around 10,000 commercial and learned society publishers, created CrossCheck, a service that collates full-text articles from its member publishers and makes use of the iThenticate plagiarism detection software made by Turnitin, a company in Oakland, California. The service, since renamed Similarity Check, has helped to make it routine practice in publishing to screen submitted manuscripts for plagiarism.

There are currently no plans for a publisher-wide system for image checking, but that is partly because the technologies are not yet mature, says Ed Pentz, executive director of Crossref. But Crossref watches developments in the area with interest, he says.

Elsevier says it would support an initiative such as Similarity Check for images. Two years ago, the company set up a 3-year, €1-million (US$1.2-million) partnership with Humboldt University in Berlin to research article mining and to identify research misconduct. On 25 January, the project announced that it intends to create a database of images from retracted publications. Such a data set would provide a bank of test images for researchers developing automated screening of images in publications.

7 years ago

Blessing Appiah Kubi
