Did some similar work with similar visualizations ~2009, on ~5.7M research articles (PDFs, private corpus) from scientific publishers Elsevier, Springer:
Newton, G., A. Callahan & M. Dumontier. 2009. Semantic Journal Mapping for Search Visualization in a Large Scale Article Digital Library. Second Workshop on Very Large Digital Libraries at the European Conference on Digital Libraries (ECDL) 2009. https://lekythos.library.ucy.ac.cy/bitstream/handle/10797/14...
I can imagine mining all of these articles was a ton of work. I’d be curious to know how quickly the computation could be done today vs. the 13-hour 2009 benchmark :)
Nowadays people would be slamming those data through UMAP!
In biomedical research or tangential fields, author order generally follows these guidelines:
First author(s): the individual(s) who organized and conducted the study. Typically there is only a single first author, but nowadays there are often two first authors. This is because the amount of research required to generate “high impact” publications simply can’t be done by a single person. Typically, the first author is a Ph.D. student or lab scientist.
Middle authors: Individuals who provide critical effort, help, feedback, or guidance for the study and publication of the research. Different fields/labs have varying stringencies for what is considered “middle authorship worthy”. In many labs, simply being present and helping with the research warrants authorship. In other labs, you need to contribute a lot of energy to the project to be included as an author.
Senior author(s): The principal investigators (PIs) or lead researchers who run the lab that conducted and published the study. The senior authors are typically the ones who acquire funding and oversee all aspects of the published research. PIs have varying degrees of hands-on management.
There is some variation in whether the central research question for a manuscript is developed by the first vs. the senior author, but usually it’s the senior author. Also, the first and senior authors typically write the manuscript and seek edits/feedback from middle authors. In other cases, there can be dedicated writers that write the manuscript, who sometimes do/don’t get middle authorship. A main takeaway is: the general outline I’ve provided above is not strictly adhered to.
I’ll take some liberty and apply this outline to the article at hand:
First Author: G. Newton (OP). The scientist who most likely conducted all of the data mining and analysis. He likely wrote the article as well.
Middle Author: A. Callahan. It seems like this author was a grad student at the time the article was written. She likely performed essential work for the paper’s publication. This could’ve been: helping with the analysis, data mining, or ideation.
Senior Author: M. Dumontier. A data science professor, now at Maastricht U. He’s a highly cited scientist!
Lastly… if you check out the acknowledgements, you can see three additional names. These people likely helped with setting up compute access, editing, or general ideation.
This is a cool manuscript! Hopefully this overview isn’t TMI and provides some insight into the biomedical/data science publication process.
This is a fairly accurate account of the roles played for this paper. I came up with the original idea, wrote all the code, did the analysis, and wrote the paper, with my colleagues providing input along the way. I did this when I was a researcher at the National Research Council Canada. Thanks! :-)
Agree with this but it does not apply to all fields. Economists have a norm of alphabetizing author names unless the contributions were very unevenly distributed. That way authors can avoid squabbling over contributions.
I have always wondered: with these approaches, is there anything in the paper that indicates who was the “lead” author?
Also, to me, the alphabetical order approach reinforces issues with lower-rank last names having various advantages. E.g., lower-rank alphabetical names doing better in school [0]. Do you have any counterpoint to this that I’m missing?
I am the first author.