In computer science, we usually implement our work as algorithms and code. So, in order to compare our work with others’, we also need to implement their approaches.

The problem is that the most recently published works in each area often use complicated mathematical methods and claim to beat previous methods, yet some of them are difficult to implement from the paper alone: the paper does not give all the details necessary for a precise implementation, and the authors have not made their code available.

How should I compare my work to these most recent algorithms for which the code is not available?

My question is in a similar vein to this one.

My research requires code. I can write code, and I know something about writing good code, both for readability (e.g. standardised docstrings, descriptive variable names, comments that describe intent) and for safety (assertions in the code, unit tests, sensible error messages). Adding these things takes me more time than omitting them.

At present I will typically do the bare minimum on my first attempt. So I will put in a one-line docstring; use descriptive, but occasionally inconsistent, variable names; and write few comments. If the code doesn’t work first time, I will go back, add unit tests, and generally improve its quality until it is functional.

If at a later stage part of the code needs altering significantly (perhaps I realise I need it to run in parallel), then much of the “polish” will need redoing. This does not make it enticing to polish more thoroughly early on. Occasionally I will chase my tail over a fault that turns out to originate in an older section of code, which I might have caught had I been more thorough the first time.

The code I write is almost always only used by myself.

Clearly there is a trade-off between doing things by the book and producing results fast. What would you look at to decide if you had found a good balance between these things?

Someone published a paper whose supplementary material contains source code. Can I include part of that code (I would mark it as copied in the source and the manuscript) in my own code, which is intended for the supplementary material of my own paper?

If not (or at least not automatically), who would I have to contact to ask for permission, the journal or the author of the other paper?

I want to mention the name of a specific function (or method) that I used in an experiment, in a normal paragraph of my dissertation. What is the best way to do this, considering that the official dissertation format my teachers gave me doesn’t say how?

For example:

“… with the Java programming language we can use System.nanoTime …”
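If the dissertation is written in LaTeX (an assumption; the question does not say which system is used), a common convention is to set code identifiers in a monospace font, for example:

```latex
% Inline, with the standard \texttt command:
\ldots with the Java programming language we can use \texttt{System.nanoTime()} \ldots

% Or, if the listings package is loaded, with \lstinline
% (useful when the identifier contains special characters):
\ldots we measure elapsed time with \lstinline|System.nanoTime()| \ldots
```

Whatever convention is chosen, the important thing is to apply it consistently throughout the dissertation.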

Say I have a GitHub repository with the code and data shared for a conference paper. What is the standard way to refer to it in the paper itself?

  1. Write something like “The code is available in [5]”, and give the link as a reference?
  2. Give the link in the paper text itself?
  3. Give the link as a footnote?
  4. Not mention the link in the paper itself, but send it separately as part of the submission?
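If option 1 is chosen, the reference entry might look like the following BibTeX sketch (author, title, URL, and commit hash are all hypothetical placeholders):

```latex
@misc{doe2024code,
  author       = {Doe, Jane},
  title        = {Code and data for ``My Conference Paper''},
  year         = {2024},
  howpublished = {\url{https://github.com/janedoe/paper-code}},
  note         = {Git commit \texttt{abc1234}, accessed 2024-05-01}
}
```

Many venues also recommend archiving the repository on a service that issues a DOI (e.g. Zenodo’s GitHub integration), so the reference does not rot if the repository later moves.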

Can somebody give me advice on how to efficiently handle the data files, source code, documents, and manuscripts of research projects based on numerical simulations? My problem is that I use multiple computers, the one I use most has limited storage space, and the files and directories tend to scatter across multiple storage devices without being synchronized unless I take great care to decide manually which ones should be stored where and to remember those decisions. If there are people in a situation similar to the one described below, I wonder if they could share their detailed know-how.

I work on simulation-based research using multiple computing environments (a laptop, desktop computers at work and at home, as well as computing clusters at work) to write code, run jobs, analyze the output data, make figures, and write manuscripts. The OS is exclusively Linux so far. I don’t have a permanent position, so I need to move from one place to the next every now and then.

Most of my time is spent on my laptop, writing code and making plots from the output files of simulations. However, this laptop has a limited amount of storage space (about 500 GB). One project can generate 100 GB or more of output data before a manuscript is published, and I work on multiple projects in parallel. I often revisit old source code and analysis scripts, and examine output data files of finished or frozen projects. It would therefore be most convenient to keep all files related to all current and past projects on the laptop, but this is not possible due to the limited storage space.

I tried different schemes in the past but have always been frustrated by the difficulty of keeping files consistent across multiple storage devices (internal and external hard disk drives on the various computers). By consistent, I mean that files intended to be present on multiple drives should be identical and up to date with the latest version, while other files should be present only on drives with ample space.

It is important for me to keep the source code, input data files, raw simulation output, analysis and plotting scripts, and analysis output for each project accessible, both to keep myself accountable for my published results and to accelerate future research that builds on past projects.

In the last few years, I have used git to version-control my source code and keep it synchronized across multiple computers. I have also used unison to synchronize data files between my laptop, a USB hard disk drive, and a desktop computer, to some extent. I like both of these tools, and I wonder whether I can build a more efficient workflow with more extensive use of them. I also use rsync to fetch and push files from one computer to another manually.
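As one hedged sketch of extending the unison side of such a workflow: a per-project profile can pin down exactly which subtrees are mirrored and which bulky files are ignored. All paths, hostnames, and patterns below are hypothetical stand-ins for the tree described later in this question:

```
# ~/.unison/projQ.prf -- sync selected parts of projQ between laptop and desktop
root = /home/me/projects
root = ssh://desktop//data/projects

# Only these subtrees are considered for synchronization.
path = projQ/dvlp
path = projQ/jobs_v001_machineA/source

# Never transfer raw simulation output.
ignore = Name out_*.dat
```

Invoked as `unison projQ`, this keeps the small, frequently edited parts of a project consistent between two machines while leaving the large job output on whichever drive produced it.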

I have the following hypothetical directory tree as a collection of all of my projects:

projects/ (about a few TB)
  - projA/ (about a few 100 GB)
    + deploy/
    + dvlp/
      + stable/
        + .git/
        + source/
        + doc/
        + tests/
        + bin/
      + sandbox/
        + debug000
        + ..
        + debug999
    + jobs_v000_machineA/
    + jobs_v000_machineB/
    - jobs_v001_machineA/ (about 100 GB)
        + source/
        + doc/
        + tests/
        + bin/
        - job000/ (about 10 GB)
          - input.dat
          - input_moo.dat
          - out_foo000.dat
          - out_bar000.dat
          - ..
          + graph000/ (about 10 MB)
          + graph001/
        + job001/
        + ..
        + job099/ 
  - projB/
  ..
  - projQ/

but it is hypothetical at the moment because no single storage drive completely mirrors this entire tree. On one drive (an older USB hard disk drive), the actual tree is

projects/ 
  - projA/
  - ..
  - projN/ 

while on another drive (an internal disk on a computing machine) it is like

projects/ 
  - projL/
  - ..
  - projQ/
    - deploy/
    - jobs_v001_machineA/
        + source/
        + doc/
        + tests/
        + bin/
      - job000/
          - input.dat
          - input_moo.dat
          - out_foo000.dat
          - ..
          - out_foo320.dat
      + job001/

On yet another drive (an internal disk on the laptop), it is like

projects/ 
  - projL/
  - ..
  - projQ/
    - deploy/
    - dvlp/
      + stable/
      + sandbox/
    - jobs_v001_machineA/
        + source/
        + doc/
        + tests/
        + bin/
      - job000/
          - input.dat
          - input_moo.dat
          - out_foo000.dat
          - ..
          - out_foo320.dat
        + graph000/
        + graph001/
      - job001/

I could buy an external USB hard disk drive with a few TB of capacity to store the entire projects/ tree. But I still want an efficient and reliable way to partially synchronize an appropriate subtree of it with each computer.

I saw the following related questions, but they do not exactly address my concerns.

I found the following tool, but it is overkill for me.

I would appreciate your advice.

I wish to provide code and data to the reviewers along with the manuscript at the time of submission. I could upload the code to a public repository, but that would make it public; I want to make the code public only once the paper is accepted.

What is the standard practice?

My search directs me towards repositories such as Dryad, Figshare, and Zenodo. Which of them provides secure code upload or limited access?

I am in a difficult position. I work at a major software/telecommunications company and am pursuing a doctoral degree at the same time. The company has strict policies against competing behavior. I have implemented a lot of core Internet component code in my free time, but the company would consider releasing it under an open-source license to be competing behavior. I won’t consider changing employer, as the salary is acceptable, the job is extremely interesting, and I frequently get good extra payments for all of the inventions I have made.

However, I have managed to obtain publication permission for several articles, on the condition that the main algorithms are explained only as pseudocode. Because my core Internet component is over 20,000 lines of code, I obviously cannot explain all of it as pseudocode.

I would like to submit related articles to major IEEE Computer Society and Communications Society journals. I would like the reviewers to be able to verify that all experiments were performed correctly, and thus to see the source code for the experiments reported in the articles. But, under company policy, I cannot publish the source code to all readers.

Now, what I would like to have is review-only supplemental material: material that only the reviewers can access, and that the journal can store for its private use (e.g. to verify results if there is a suspicion of scientific misconduct), but that is not publicly available to all readers.

Is this kind of review-only supplemental material possible in general? I would not be surprised if the answer were “no”, as it makes it impossible for regular readers to act as unsolicited reviewers, publishing their own commentary on the results.

Of course, the answer may depend on the circumstances and the journal, so perhaps asking the editor would be a good option. But I believe this question may have more general value, and thus I am asking it here as well.