Bug #2283 (Closed): Problems found after GitLab restore

Added by Florian Uhlig 8 months ago. Updated 8 months ago.

Status: Closed
Priority: Immediate
Assignee: -
Target version: -
Start date: 09/29/2021
Due date: -
% Done: 100%
Estimated time: -

Description

This issue is meant to collect all problems found after the crash and the subsequent restore of the GitLab server.

Please report any problem and most important any inconsistency between your local working copies and the GitLab server.

OK:
Restarting jobs which failed before the crash worked as expected. It seems that the runners are properly connected again and pick up jobs when they appear (at least one of the Singularity runners has proven this).

Accessing the GitLab server via ssh and https works.

Pulling images from the Docker registry works.

Actions #1

Updated by Florian Uhlig 8 months ago

OK: Restarting jobs which failed before the crash worked as expected. It seems that the runners are properly connected again and pick up jobs when they appear (at least one of the Singularity runners has proven this).

Actions #2

Updated by Pierre-Alain Loizeau 8 months ago

I am typically using the ssh+key access instead of the https one. Now ssh complains that the ECDSA key of the host has changed and blocks the fetch.
I know this typically happens almost every time an OS is reinstalled.

Do you think it would be possible to restore the original one or should I simply use the cleanup command of ssh?
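The "cleanup command" referred to here is presumably ssh-keygen with the -R option, which only removes the stale entry from the local known_hosts file, i.e. a client-side workaround (the host name below is taken from the test shown later in this thread):

# drop the cached host key for the GitLab server from ~/.ssh/known_hosts
ssh-keygen -R git.cbm.gsi.de
# the new host key is then offered for acceptance on the next connection
ssh -T git@git.cbm.gsi.de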

Actions #3

Updated by Florian Uhlig 8 months ago

I am typically using the ssh+key access instead of the https one. Now ssh complains that the ECDSA key of the host has changed and blocks the fetch.
I know this typically happens almost every time an OS is reinstalled.

Do you think it would be possible to restore the original one or should I simply use the cleanup command of ssh?

We have this problem whenever a GSI machine is reinstalled, so I fear that this isn't possible, but I will confirm with the GSI IT.
I agree that it would be much better to change this on the server than on each client.
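For reference, fixing this on the server would mean restoring the old host key pairs, if they still exist anywhere, and restarting the SSH daemon. A rough sketch, assuming a systemd-based server and a purely illustrative /backup path:

# copy the saved OpenSSH host keys back into place and restart sshd
cp /backup/etc/ssh/ssh_host_*_key* /etc/ssh/
chmod 600 /etc/ssh/ssh_host_*_key
systemctl restart sshd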

Actions #4

Updated by Wojciech Zabolotny 8 months ago

For me, fetching/pushing/cloning via SSH keys doesn't work either. The keys are still registered. It looks like the Git configuration must be modified to accept SSH key-based access?

Actions #5

Updated by Wojciech Zabolotny 8 months ago

Removing the old server key and accepting the new one does not help.
Even renaming ~/.ssh/known_hosts and temporarily creating a new one does not.
It looks like access via SSH key is disabled in the configuration?
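A verbose test connection with the standard OpenSSH client helps to separate the two possible problems, i.e. whether the host key is rejected or whether the public key is offered but not accepted:

# -v prints the key exchange and authentication steps, -T suppresses the pseudo-terminal
ssh -vT git@git.cbm.gsi.de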

Actions #6

Updated by Dirk Hutter 8 months ago

Wojciech Zabolotny wrote:

Removing the old server key and accepting the new one does not help.
Even renaming ~/.ssh/known_hosts and temporarily creating a new one does not.
It looks like access via SSH key is disabled in the configuration?

Same for me. My ssh key is not accepted. Why do we have a new OS if there was a backup?

Actions #7

Updated by Dirk Hutter 8 months ago

I think the public keys are added by the GitLab daemon to the authorized_keys file of the git user. Is this file there and does it have the correct access rights?
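For an Omnibus GitLab installation the file is expected in the home directory of the git user, /var/opt/gitlab/.ssh/authorized_keys. Assuming that layout, a quick check of existence, ownership and permissions would be:

# the directory and file should belong to git:git, typically with modes 700 and 600
ls -ld /var/opt/gitlab/.ssh /var/opt/gitlab/.ssh/authorized_keys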

Actions #8

Updated by Florian Uhlig 8 months ago

Hi Dirk,

do you know where the file is located?

Actions #9

Updated by Wojciech Zabolotny 8 months ago

OK. Based on Dirk's idea I have found a workaround.
I deleted my key and added it again.
GitLab remembers that the key was added and does not allow adding it a second time,
so you have to delete it first, and then you can add it again.
It looks like the keys are added to ~git/.ssh/authorized_keys only when they are added via the web GUI.

Actions #10

Updated by Wojciech Zabolotny 8 months ago

A good question is whether there is a command or script that adds all keys registered in the web GUI to the authorized_keys file...
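Such a command does exist: on an Omnibus installation the gitlab-shell Rake task rebuilds authorized_keys from the keys stored in the GitLab database, and it is presumably the "magic command" mentioned in the following comments (an assumption, since the thread itself does not name the command):

# rewrite the authorized_keys file of the git user from the keys registered in the web GUI
sudo gitlab-rake gitlab:shell:setup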

Actions #11

Updated by Florian Uhlig 8 months ago

Thanks for the confirmation. Yes, this is what I expected; I am currently checking how to regenerate the file. Give me some more time.

Actions #12

Updated by Florian Uhlig 8 months ago

Okay. I found the magic command and now the file contains many ssh keys. Before, there was only one, probably the new one added by Wojciech a few minutes ago.

Could you please check if ssh works again? I did a check and for me the test works after the intervention:

ssh -T git@git.cbm.gsi.de
Welcome to GitLab, @f.uhlig!

Actions #13

Updated by Florian Uhlig 8 months ago

Hi Dirk,

Same for me. My ssh key is not accepted. Why do we have a new OS if there was a backup?

The system was completely destroyed. The backup does not contain the full machine but only the content of the application. So the system and the application were reinstalled, and then the content of the application was restored from the backup. Obviously the authorized_keys file isn't part of the backup.

Actions #14

Updated by Dirk Hutter 8 months ago

Florian Uhlig wrote:

Okay. I found the magic command and now the file contains many ssh keys. Before, there was only one, probably the new one added by Wojciech a few minutes ago.

Could you please check if ssh works again? I did a check and for me the test works after the intervention:

[...]

Ok, ssh keys work for me too now.

Actions #15

Updated by Dirk Hutter 8 months ago

Florian Uhlig wrote:

Hi Dirk,

Same for me. My ssh key is not accepted. Why do we have a new OS if there was a backup?

The system was completely destroyed. The backup does not contain the full machine but only the content of the application. So the system and the application were reinstalled, and then the content of the application was restored from the backup. Obviously the authorized_keys file isn't part of the backup.

I think we should rethink this strategy and make full system backups. Having a backup of the VM itself would also speed up recovery time.

Actions #16

Updated by Florian Uhlig 8 months ago

I think we should rethink this strategy and make full system backups. Having a backup of the VM itself would also speed up recovery time.

The problem actually was that the existing snapshots of the VM were damaged.

Actions #17

Updated by Florian Uhlig 8 months ago

Ok, ssh keys work for me too now.

Thanks for the confirmation.

Actions #18

Updated by Dirk Hutter 8 months ago

Florian Uhlig wrote:

OK: Restarting jobs which failed before the crash worked as expected. It seems that the runners are properly connected again and pick up jobs when they appear (at least one of the Singularity runners has proven this).

I tried a Docker job and it also ran without issues. (However, the Docker image was cached and not pulled from the registry, so I can't tell about the registry.)

Actions #19

Updated by Florian Uhlig 8 months ago

The registry shouldn't be affected at all since the storage is not on the VM but externally mounted.
But this is a good point. I will check a pull from the registry to confirm that it is still working.
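A pull test of this kind can be done with any project image; the registry host and image path below are purely illustrative, since the thread does not name them:

# log in with GitLab credentials, then pull a known image from the project registry
docker login <registry-host>
docker pull <registry-host>/<group>/<project>/<image>:latest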

Actions #20

Updated by Florian Uhlig 8 months ago

I checked the registry and was able to pull all images, so there is no problem concerning the registry.

Actions #21

Updated by Florian Uhlig 8 months ago

  • Description updated (diff)
Actions #22

Updated by Florian Uhlig 8 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

Since there have been no further reports of any problems for more than a week, I am closing the issue.
