Can somebody give me advice on how to efficiently handle the data files, source code, documents, and manuscripts of research projects based on numerical simulations? My problem is that I use multiple computers, the mainly used one has limited storage space, and the files and directories tend to scatter over multiple storage devices without being synchronized unless I put considerable effort into manually deciding which files should be stored where and remembering those decisions. If there are people in a situation similar to the one described below, I wonder if they could share their detailed know-how.

I work on simulation-based research using multiple computing environments (a laptop, desktop computers at work and at home, as well as computing clusters at work) to write code, run jobs, analyze the output data, make figures, and write manuscripts. The OS is exclusively Linux so far. I don’t have a stable job, and I need to move from one place to the next every now and then.

Most of my human time is spent on the laptop, writing code and making plots from the output files of simulations. However, this laptop has a limited amount of storage space (about 500 GB). One project sometimes generates 100 GB or more of output data in total before a manuscript is published, and I work on multiple projects in parallel. I often look at my old source code and analysis scripts, and examine output data files of finished or frozen projects. It would therefore be most convenient to collect all the files of all current and past projects on the laptop, but this is not possible due to the limited storage space.

I have tried different schemes in the past but have always been frustrated by the difficulty of keeping files consistent across multiple storage devices (internal and external hard disk drives on the various computers). By consistent, I mean that files intended to be present on multiple drives should be exactly identical and up to date with the latest version, while other files should be present only on the more spacious drives.

It’s important for me to keep a set of source codes, input data files, raw output files of simulation runs, analysis and plotting scripts, and output files of the analysis for each project in an accessible way, in order to keep myself accountable for my published results as well as to accelerate future research based on the past projects.

In the last few years, I have used git to version-control my source code and keep it synchronized across multiple computers. I have also used unison to synchronize data files between my laptop, a USB hard disk drive, and a desktop computer, to some extent. I like both of these tools, and I wonder if I can build a more efficient workflow by using them more extensively. I also use rsync to fetch and push files from one computer to another manually.

I have the following hypothetical directory tree as a collection of all of my projects:

projects/ (about a few TB)
  - projA/ (about a few 100 GB)
    + deploy/
    + dvlp/
      + stable/
        + .git/
        + source/
        + doc/
        + tests/
        + bin/
      + sandbox/
        + debug000
        + ..
        + debug999
    + jobs_v000_machineA/
    + jobs_v000_machineB/
    - jobs_v001_machineA/ (about 100 GB)
        + source/
        + doc/
        + tests/
        + bin/
        - job000/ (about 10 GB)
          - input.dat
          - input_moo.dat
          - out_foo000.dat
          - out_bar000.dat
          - ..
          + graph000/ (about 10 MB)
          + graph001/
        + job001/
        + ..
        + job099/ 
  - projB/
  ..
  - projQ/

but it’s hypothetical at the moment because none of the storage drives completely mirrors this entire tree. On one drive (an older USB hard disk drive), the actual tree is

projects/ 
  - projA/
  - ..
  - projN/ 

while on another drive (an internal disk on a computing machine) it is like

projects/ 
  - projL/
  - ..
  - projQ/
    - deploy/
    - jobs_v001_machineA/
        + source/
        + doc/
        + tests/
        + bin/
      - job000/
          - input.dat
          - input_moo.dat
          - out_foo000.dat
          - ..
          - out_foo320.dat
      + job001/

On yet another drive (an internal disk on the laptop), it is like

projects/ 
  - projL/
  - ..
  - projQ/
    - deploy/
    - dvlp/
      + stable/
      + sandbox/
    - jobs_v001_machineA/
        + source/
        + doc/
        + tests/
        + bin/
      - job000/
          - input.dat
          - input_moo.dat
          - out_foo000.dat
          - ..
          - out_foo320.dat
        + graph000/
        + graph001/
      - job001/

I could now buy an external USB hard disk drive with a few TB of capacity to store the entire projects/ tree. But I still want an efficient and reliable way to partially synchronize an appropriate subtree of it with each computer.
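The partial synchronization I have in mind could perhaps be done with rsync's exclude filters: pull everything in a project except the bulky raw output files. A sketch with hypothetical paths under /tmp (the real source would be the external drive's mount point):

```shell
# Hypothetical project tree on the "big disk", for illustration.
mkdir -p /tmp/bigdisk/projects/projQ/jobs_v001_machineA/job000/graph000
echo raw  > /tmp/bigdisk/projects/projQ/jobs_v001_machineA/job000/out_foo000.dat
echo plot > /tmp/bigdisk/projects/projQ/jobs_v001_machineA/job000/graph000/fig.dat

# Pull only the small files (scripts, graphs) to the laptop,
# skipping raw out_*.dat output.  -a preserves metadata and
# -m prunes directories that the filter leaves empty.
rsync -am --exclude='out_*.dat' \
    /tmp/bigdisk/projects/projQ/ /tmp/laptop_projects/projQ/
```

The same filter file could be reused per machine, so each computer declares once which subtrees it carries.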

I saw the following related questions, but they do not exactly address my concerns.

I found the following tool, but it’s overkill for me.

I would appreciate your advice.

I wish to provide code and data to the reviewers along with the manuscript at the time of submission. I could upload the code to a public repository, but that would make it publicly available; I want to provide the code only once the paper is accepted.

What is the standard practice?

My search directs me toward repositories: Dryad, Figshare, and Zenodo. Which of them provides secure code upload or limited access?

I am in a difficult position. I work at a major software/telecommunications company while pursuing a doctoral degree at the same time. The company has strict policies against competing behavior. I have implemented a lot of core Internet component code in my free time, but the company would consider releasing it under an open-source license to be competing behavior. I won’t consider changing employers, as the salary is acceptable, the job is extremely interesting, and I frequently receive good extra payments for the inventions I have made.

However, I have managed to obtain permission to publish several articles, provided that the main algorithms are explained only as pseudocode. Because my core Internet component is over 20,000 lines of code, I obviously cannot explain all of it as pseudocode.

I would like to submit related articles to major IEEE Computer Society and Communications Society journals. I would like the reviewers to be able to verify that all experiments were performed correctly, and thus to see the source code for the experiments described in the articles. But, according to company policy, I cannot publish the source code to all readers.

Now, what I would like is review-only supplemental material: supplemental material that only the reviewers can access, and that the journal can store for its private use (e.g., for verification of results if there is a suspicion of scientific misconduct), but that is not publicly available to all readers.

Is this kind of review-only supplemental material possible in general? I wouldn’t be surprised if the answer were “no”, as it makes it impossible for regular readers to act as unsolicited reviewers, publishing their own commentary on the results in the process.

Of course the answer can depend on the circumstances and the journal, so perhaps asking the editor would be a good option. But I believe this question may have more general value, and thus, I am asking it here as well.

To determine the angle of a trailer relative to the towing vehicle, a good approach is to mount a small, short-focal-length black-and-white camera on the towing vehicle, pointed at a distinct horizontal marker on the nose of the tow cup. Unique blocks on this marker can be converted into an angle, which can be fed into a program designed to predict the position of the trailer when it is reversed. Other information, such as the steering angle, will also be fed into this program. Has anyone worked on a similar problem? Any help gratefully appreciated.

I often get asked to review papers on new software tools. This always makes me wonder how one should review such a paper, or even whether software papers should undergo a typical review process at all.

My private checklist typically consists of:

  • Is it available for download for others?
  • Does it run?
  • Did the authors provide benchmark problems? Do they give expected results?
  • Is documentation sufficient?
  • Go through numerical methods used in the software (briefly, no meticulous code analysis)

If everything runs fine and produces the expected results, I honestly have nothing else to “demand” from the authors. My referee reports are positive and (embarrassingly) brief in those cases. I’m of the opinion that the user community should “do the rest”: evaluate the software and decide whether it’s useful in their workflows.

I am looking to add a license to the source code of a research project at University A, to which I was the most recent contributor. Historically, the computer-science school has not forced researchers to publish code under any specific license; they mainly care about rankings driven by publications, and releasing code publicly is suggested but not mandatory. Obviously, researchers cannot sell the code, as commercial use is reserved to the institution (although software patents are very rare in my country).

Given that the project could benefit from a dissertation that I have been developing at University B, what is the best license for this case? I would personally prefer to release the new, improved version in compiled form and keep the new source code private, so the GPL is not an option for me. Is the MIT license a better candidate? (It says “and/or sell copies of the Software”, though.) The project at University A is applied research based on state of the art that has been freely available for a long time.

I’m about to submit a research paper to the Intel STS competition. I am basically finished with everything in my paper, but I need to cite a certain algorithm that I invented in the paper itself. I do not want to include it in the paper because it is outside the scope of what I discuss there, and besides, the paper is already twenty pages long, so I cannot fit in any more material.

Is it acceptable if I simply post the code to a website like GitHub and cite the link in my references?

I am doing a thesis-based master’s in computer engineering and am required to program a tool in C++ as part of my research. I am quite impressed by the help offered on the Code Review website and am considering posting a big chunk of my code for suggestions and improvements. However, I am also worried that the ideas contained in my posted code might be plagiarized, which, in turn, could affect the credibility of my work.

I believe one solution is to post a minimal, complete, and verifiable example of the code I want suggestions on. However, I am interested in knowing whether it is possible to post my code as-is without worrying about it being copied.

I build simulations and release them as open source, licensed under either Apache or the GPL depending on the context. I publish the simulation results in papers and link to the code.
However, some parts of the code are sometimes useful to others independently of the original simulation.

Is there a way in the license to ask/recommend/enforce that people who use some of that code remember to cite the associated paper?

Is a friendly reminder in the readme the best I can do?