I am currently a graduate student, and for a course I gathered hundreds of thousands of records (not confidential, but difficult to access if you don’t already know about them and know who to talk to) and spent several months cleaning, combining, and organizing them into a usable dataset, on which I then performed statistical analysis. The project is complete and produced interesting results, but I probably won’t turn it into a paper anytime soon.

I found out yesterday that the professor supervising the course spoke to one of his friends and mentioned my project, and the friend asked for my dataset (the cleaned one I produced, not the raw records) to perform his own analysis. Should I share it with this friend? If so, is there a way to ask to be acknowledged in any publications that result?

The data were originally public records, but I did a lot of work that required years of specialized subject-matter knowledge to compile them appropriately. Are there other risks I haven’t considered? I feel a little uncomfortable being asked to share a large amount of work with an academic I don’t know at all, and while I would like to help advance the field in general, I don’t know what’s reasonable to expect here.

I am trying to find the papers that cite five reference papers and eliminate the duplicates. I read an answer of yours: ‘no, that’s something you want software to do. I just did it for two of my papers using Web of Science. I searched the citing papers for each of them, saved the data sets and compared the data sets with an “AND” function. Turns out only one common citation’. How do I save the data sets?
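To make clear what I am after, here is a minimal sketch of the deduplication step, assuming each list of citing papers has already been saved as CSV text with a DOI column. The column name `DOI`, the helper names, and the sample records are all hypothetical and are not the actual Web of Science export format.

```python
import csv
import io

def load_dois(csv_text, column="DOI"):
    """Return the set of normalized DOIs in one saved citing-paper list."""
    dois = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Lowercase and strip so the same DOI always compares equal.
        doi = (row.get(column) or "").strip().lower()
        if doi:
            dois.add(doi)
    return dois

def merge_citing(exports):
    """Return (all citing papers without duplicates, papers citing every reference)."""
    sets = [load_dois(text) for text in exports]
    unique = set().union(*sets)
    common = set.intersection(*sets) if sets else set()
    return unique, common

# Hypothetical saved lists for two reference papers (assumed layout).
export_a = "Title,DOI\nPaper one,10.1000/a1\nPaper two,10.1000/B2\n"
export_b = "Title,DOI\nPaper two,10.1000/b2\nPaper three,10.1000/c3\n"

unique, common = merge_citing([export_a, export_b])
print(sorted(unique))  # every citing paper, listed once
print(sorted(common))  # papers that cite both references
```

The union gives the deduplicated list, and the intersection corresponds to the “AND” comparison mentioned in the quoted answer.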

I am writing up some research I did using a Kaggle dataset:
https://www.kaggle.com/c/dogs-vs-cats

I want to include one or two images from the dataset in my paper to show the types of images I’m using. Will this run afoul of copyright issues?

The rules for the dataset say:

These images have been published by Microsoft Research for the express purpose of furthering academic research. They may be used for non-commercial research purposes, but they may not be re-published without the express permission of Microsoft Research.

Including an image in my paper is use for non-commercial research purposes, but it is also re-publishing them without the express permission of Microsoft Research. I asked this question on the Kaggle forum, with no response in several weeks, and there is no contact information for anyone at Microsoft Research.

More generally, is it acceptable to publish examples of images from publicly available datasets even if there is not an explicit copyright license? What about ImageNet, where they took images from all over the internet? Is any paper that publishes examples from ImageNet (there are plenty!) breaking copyright law?

Many authors, especially in manufacturing, use data from real manufacturing companies (not from a lab experiment) for their research. They are asked to report only the results from their analysis on those datasets in journal papers without publishing the underlying data due to confidentiality agreements with the companies. However, I see that the journals are increasingly asking the authors to deposit the raw data so that the research is reproducible. On the other hand, the companies are not ready to make their internal data go public but they are ok with publishing the summary statistics on the datasets. As a result, authors face difficulty in publishing their research results in journal papers.

Are there any strategies for handling this situation effectively and convincing editors that the datasets cannot be made available to other researchers?

I’m making some decent progress in some modeling and simulations work, and my data plots show some beautiful trajectories. However, a seemingly constant complaint of my advisor is that the data plots are such that the results are difficult to interpret. I use Matlab to write my code, solve equations, and plot data; is there a better data plotting tool that I can use in Matlab – or even another software altogether that can take my Matlab plots as input and enable me to work with the plots in some new environment? Maybe something with Adobe? I have little experience with data plotting outside of the elementary plot / quiver functions in Matlab and plotting in Google docs / sheets / slides and MS Powerpoint / Excel.

Is there a “publishable-quality” data plotting tool I could look into?

I have written a paper concerning the gathering of data for research into a certain problem. The objective of the research is to establish a path to fix the problem. The initial paper stated the problem and proposed protocols for data gathering.

Now the data has been gathered and processed, an algorithm has been designed, and further data has been gathered confirming that implementing the algorithm solved the problem.

What do I publish to conclude this project? Do I write another paper and reference the previous paper? Add on to the original paper? Use appendices? What is usually done in these cases?

The paper is for internal company use only. Distribution will be among chemists, process control engineers and some technicians.

For my final thesis in university, I have created a dataset of third-party applications for macOS. For each version of each app, the dataset contains the original executable, the original Info.plist file containing app-specific metadata and a couple of other files summarising the app’s contents (file sizes, file names, …). Almost all apps are commercial software.

I am looking for ways to release or make available (parts of) this dataset to the public. Due to copyright law, I probably cannot just release all the data. What are my options?

Here is what I’ve been thinking:

I suspect I could release the dataset after all binaries have been removed. However, this reduces the usefulness of the data drastically. Would I also have to strip the Info.plist files, as they are copied verbatim from the application bundle? Would I be allowed to include limited parts or metadata of the binaries, such as linked libraries, imported symbols, function names, strings and the like?
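To make the “limited metadata” idea concrete, here is a minimal sketch that reads an Info.plist and keeps only a small allow-list of keys instead of the verbatim file. The allow-list and the sample values are hypothetical, and whether such an extract would be acceptable under copyright law is exactly what I am asking.

```python
import plistlib

# Hypothetical allow-list: standard bundle keys chosen only for illustration.
ALLOWED_KEYS = (
    "CFBundleIdentifier",
    "CFBundleShortVersionString",
    "CFBundleExecutable",
)

def extract_metadata(plist_bytes, allowed=ALLOWED_KEYS):
    """Parse Info.plist bytes and return only the allow-listed keys."""
    info = plistlib.loads(plist_bytes)
    return {key: info[key] for key in allowed if key in info}

# Made-up Info.plist content standing in for a real application bundle.
sample = plistlib.dumps({
    "CFBundleIdentifier": "com.example.app",
    "CFBundleShortVersionString": "1.2.3",
    "CFBundleExecutable": "Example",
    "NSHumanReadableCopyright": "Copyright Example Inc.",
})

metadata = extract_metadata(sample)
print(metadata)  # the copyright string is not part of the extract
```

The same pattern would apply to metadata derived from the binaries, with the extraction step swapped for whatever tool produces the list of linked libraries or symbols.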

Another option would be to invite researchers to contact me to get access to the full dataset. Would that be appropriate?

Am I allowed to create a service à la shodan.io, that offers an API for others to work with the dataset, without providing direct access to the binaries?

I do research on particular elites of a particular society and rely heavily on interview data. At one fortuitous dinner, three people sat at the table behind me who are exactly the people I wanted to get access to.

Furthermore, they were discussing precisely what I am investigating! I was attentive to their conversation (they were, honestly, speaking very loudly) but by no means eavesdropping, and their conversation, viewed in the context of my work, could prove very valuable.

Can I use that data for my work? It does not seem ethical to me. But, if I do not use that data, I know I will be ‘deliberately’ weakening my research and not being true to what I do now know.

Thoughts? What would you do and what would you suggest I do?