My graduate student used particles produced by a collaborator’s lab in animal imaging experiments that my student and I designed and analyzed. The protocol for synthesizing the particles had already been published by my collaborator, and a technician in the collaborator’s lab made the particles according to that protocol. I shared some of the resulting in vivo images from this study with my collaborator, but I subsequently discovered that the collaborator had used some of these images fraudulently (grossly misrepresenting them as preliminary data in grant applications). The collaborator’s institution conducted a formal investigation of this and other incidents and found that scientific misconduct had occurred. I would like to publish my graduate student’s image data, in part to make sure that a legitimate representation of the data is in the literature, but mostly because the work was publicly funded and represents the hard work of many good people. If the misconduct hadn’t occurred, I probably would have considered including the collaborator as a co-author on the publication by getting them more involved in the manuscript, even though the particle prep was not novel. But now there are many reasons why I do not wish to publish anything with this collaborator! Is documented scientific misconduct involving the data from this study a valid reason for not including this collaborator as a co-author of a paper describing the study? Can I simply acknowledge the technician who provided the particles and cite the prior publication of the protocol?

By professional conduct, I refer to issues related to how a researcher manages his or her research group, including funding (both research and salary) and human resource issues.

For example, in the UK there is the Office of the Independent Adjudicator, which reviews student complaints against higher-education providers. But the service it offers is restricted to students and is not directly relevant to the professional conduct of a researcher.

An example of such an organisation in the financial field is the Financial Conduct Authority, which oversees the professional conduct of both organisations and individuals.

I am surprised to find that a journal has published an article that I have had on arXiv for a few months. The date of publication is after the date I posted it on arXiv. The submission date is not mentioned in the journal. What procedures should I follow?

Some information to clarify the situation:

  • The article published in the journal is total plagiarism; only the title was changed.

  • The article was published in the journal under other authors’ names.

  • My article (the one on arXiv) has already been accepted by another journal (but is not yet online), and its acceptance date is before the publication date of the other authors’ article.

I came across a recent paper, “Discriminating Traces with Time”, which proposes a program analysis technique based on machine learning, with an application to security (side-channel attacks). Some information:

  • This paper was published at TACAS 2017, which is a very good conference in program analysis, but most (if not all) of that community has little background in security or machine learning.
  • The second and third authors are professors with strong publication records in program analysis, but I couldn’t find a single paper of theirs related to machine learning before this one. So I guess this is all the work of the first author, who is a PhD student.

There are three reasons I think this is research misconduct:

1) On page 14, under “Threats to Validity”, the authors write:

Clearly, the most significant threat to validity is whether these programs are representative of other applications. To mitigate, we considered programs not created by us nor known to us prior to this study.

This is a lie.

On the same page, immediately before this paragraph, is the discussion of the TextCrunchr case study. The authors provide four types of text inputs, and one of the four, reverse-shuffled arrays of characters, is the one that leads to the worst-case behavior the tool needs to detect. They even admit that:

Although the input provided to Discriminer for analyzing TextCrunchr include carefully crafted inputs

How did they craft the inputs if they did not know the program prior to the study? By their own words, this is the most significant threat to validity. So the conclusion is that this is not a valid method?


2) The experiments are inadequate.

The TextCrunchr case study above was said to be taken from a DARPA program. Searching with the keywords “DARPA” and “TextCrunchr” led me to another paper that demonstrates on exactly the same case study:

Symbolic Complexity Analysis Using Context-Preserving Histories. ICST 2017.

However, that paper used a sophisticated analysis to derive the inputs with reverse-shuffled arrays of characters, instead of assuming they are given.

Another case study from the DARPA program discussed in the paper is SnapBuddy. At the beginning of page 14, the authors claim that they discovered a vulnerability with their tool:

…What leaks is thus the number of 1s in the private key.

I was surprised. This case study is about modular exponentiation (modpow). It has been well known for decades that several implementations of modpow have a timing channel that leaks the entire private key, whereas in this case study, knowing the number of 1s is practically useless. So again I searched for “DARPA” and “SnapBuddy”, and managed to find the source code of this case study in the appendix of another paper.

This confirmed my guess: there is a vulnerability in the method standardMultiply that can leak the entire private key, which the authors of the machine learning paper failed to recognize.
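
For readers less familiar with this class of bug, here is a minimal sketch of how a textbook left-to-right square-and-multiply modpow leaks through timing. This is an illustration under my own assumptions, not the actual SnapBuddy source; the class and method names below are mine, not from the challenge code:

    import java.math.BigInteger;

    // Illustrative sketch only: a textbook square-and-multiply modpow.
    class LeakyModPow {
        static BigInteger modPow(BigInteger base, BigInteger key, BigInteger modulus) {
            BigInteger result = BigInteger.ONE;
            BigInteger b = base.mod(modulus);
            // Walk the private exponent (key) from its most significant bit down.
            for (int i = key.bitLength() - 1; i >= 0; i--) {
                result = result.multiply(result).mod(modulus); // square: every iteration
                if (key.testBit(i)) {
                    // This extra multiply runs ONLY when the key bit is 1, so the
                    // total time grows with the number of 1-bits, and per-iteration
                    // timing can reveal WHICH bits are 1.
                    result = result.multiply(b).mod(modulus);
                }
            }
            return result;
        }
    }

Measuring only the total running time gives the Hamming weight of the key, which is what the paper reports; the classic Kocher-style timing attack instead exploits the per-operation timing variation to recover the individual bits, i.e., the entire key.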

So the tool is useless even with crafted inputs.


3) This point may be controversial: I don’t think this approach will ever work in practice. For example, TextCrunchr takes text as input. Assume the text consists of one-byte characters and is only 11 characters long, just enough to store “HelloWorld!”. The input space then already has 2^88 elements. How can sampling work in such an input space? The encryption in SnapBuddy involves more than 1500 bytes of input, i.e., an input space of more than 2^1500 points.
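
To spell out the arithmetic behind these counts (assuming one byte per character, which is my reading of the paper’s setup):

    $256^{11} = (2^8)^{11} = 2^{88}$
    $256^{1500} = (2^8)^{1500} = 2^{12000} \gg 2^{1500}$

so if anything, 2^1500 understates the size of the SnapBuddy input space.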

Amazingly, the authors managed to discover the vulnerability by sampling only a couple of thousand inputs. How can I believe their results?


Given the facts above, is this a case of data manipulation? What should I do?

To be honest, I’m very upset that a paper of this quality could get into TACAS.

Two years ago I read a paper that I thought had implausibly good results. The data were archived online, so I examined them: they were clearly fake. The first author replaced them with other data, which were also clearly falsified/fabricated.

I reported the paper to the relevant University. Several more versions of the data were produced, and the University’s Research Integrity Office (RIO) closed the case, apparently satisfied that the correct version of the data had finally been found. A corrigendum was published. However, the new data contain suspicious patterns, and several figures in the paper cannot be reproduced. I know of no explanation for the multiple versions of the data. All versions of the data had file-creation timestamps that post-dated publication.

The original specimens are in the RIO’s possession. A portion of them could be re-analysed non-destructively in 1-3 days (relevant MSc-grade experience required), which would help assuage doubts about the veracity of the paper. The RIO has declined to do this, citing a lack of resources and relevant skills.

  • Am I being unreasonable in expecting the RIO to do more than (apparently, at least) take the author’s word on trust that the final data are correct, without verifying them?

  • How could I arrange for the specimens to be re-analysed by a third party so that the RIO/journal would accept gross differences from the final data as evidence of malpractice?

  • The RIO tells me that I should contact the author directly to address the irreproducible figures. I doubt the first author will be keen to cooperate: they almost certainly know that I reported their paper to the RIO, they know that the RIO has closed the case, and I have no authority to compel disclosure of files, etc. Can anything be done here?

I believe the RIO is seeking to avoid making an adverse finding.

PS. The University in question is a large, well-respected and well-funded institution.

A friend of mine is considering joining a PhD program in order to go into science. I think she has a very idealized view of science and of the prospects there. (I personally am somebody who left science after 10 years, without major damage to my life, career, or psychological health.) I don’t want to convince this friend not to follow this path, but I would like to give her a realistic list of major problem areas, psychological stress factors, and negative outcomes.

I have already given her some first-hand accounts, and these were similar to questions asked here very often. The problem is that I cannot quantify the likelihood of the following:

  • psychological problems during the PhD/postdoc
  • stress due to workload
  • stress due to uncertainty
  • bad supervision (supervisor incapable of, or unwilling to provide, good supervision)
  • continued unresolved conflicts in the lab
  • bad leadership
  • nepotism of group leaders towards friends/significant others in the group (e.g. on publications)
  • scientific misconduct (unintentional), e.g. bad data handling, p-value hacking, etc.
  • scientific misconduct (intentional), e.g. faking data, plagiarism
  • sabotage of others’ experiments
  • thesis stopped due to the supervisor’s discontinued interest

The point is: I don’t want to hear horror stories (I have seen and heard enough of them, personally and here on Stack Exchange), but I would like data (e.g. statistics, studies) that actually quantify these issues in a comprehensive way from the viewpoint of a new PhD student. Where can I find such data?

Say author A writes on Topic B and in this context uses quotation C, which is not from the same field and appears highly original in Topic B. I then read this article while myself writing on Topic B. I find the quote so useful and relevant that I quote it myself.

Am I obliged to mention where I found the quote (namely, in author A)? Or can I just use the quote? Where are the boundaries here: what do you think is acceptable, and what is not? Also, if it is not ethical, that’s one thing; but is it also plagiarism?

PS: If anyone has a good title for this sort of question, please feel free to edit. Not sure how to name this.

I submitted a paper to a reputable journal (computer science; computer vision, to be specific). While I was keenly waiting for the reviews, they were getting delayed, so I began working on an extension of the work.

During that work I realized that one of the qualitative figures in the submitted paper might be slightly wrong (by a few pixels, yet noticeably) due to a miscalculation while generating the image. However, the final interpretation and the description of it in the paper would not change. But when I view the image critically, I think a reviewer might conclude that it was photoshopped (I don’t have better words; I am very tense). I checked the journal’s tracking system and the reviews have been received. I am ready for a rejection, but could such a mistake lead to public shaming or a ban from future publications?

Would it be advisable to write to the editor that I have found a mistake in the submitted paper and explain that it does not affect the discussion? Or should I wait and correct it in the next version?

I am the first author of a manuscript that was recently “accepted in principle”. The submission, peer review, and revisions of this manuscript happened after I left the laboratory of the corresponding author, where I was a postdoc. I was actively involved in the revisions (by email) and guided the graduate student who did them. Following peer review and revisions, the corresponding author wanted to include data (obtained by the graduate student) of which I did not approve. This data was then not included in the revised manuscript or the rebuttal letter. After we received the “accepted in principle” email, the corresponding author again informed me that he wanted to include this data. I objected again. But it appears that he has submitted this data (which was not part of the peer review and has not been seen by the reviewers) without my consent. I would like to write to the editors of the journal asking them to stall the publication of this manuscript, since I did not consent to the content and since it includes unreviewed data. Am I correct in doing so?
