=================
Troubleshooting
=================
The Gateway Won't Start
=======================
If you cannot start the gateway (i.e., there is no existing ``pid``),
check whether an ``.asok`` file left by another user exists. If such an
``.asok`` file exists and there is no running ``pid``, remove the
``.asok`` file and try to start the process again. This can happen when
you start the process as the ``root`` user while the startup script tries
to start it as the ``www-data`` or ``apache`` user: the existing ``.asok``
file prevents the script from starting the daemon.
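
One way to spot a leftover ``.asok`` file is to try connecting to it. The sketch below is a minimal illustration, not part of radosgw: the helper names ``is_stale_socket`` and ``find_stale_asok`` are hypothetical, and it assumes that a refused connection means no daemon is listening on that socket. It only reports candidates and never deletes anything.

```python
import glob
import os
import socket


def is_stale_socket(path):
    """Return True if nothing is listening on the Unix socket at `path`."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except ConnectionRefusedError:
        # The file exists but no process accepts connections on it.
        return True
    except OSError:
        # Permission denied or similar: inconclusive, so do not flag it.
        return False
    finally:
        probe.close()
    return False


def find_stale_asok(directory="/var/run/ceph"):
    """List .asok files in `directory` that no daemon appears to serve."""
    candidates = sorted(glob.glob(os.path.join(directory, "*.asok")))
    return [path for path in candidates if is_stale_socket(path)]


for path in find_stale_asok():
    owner_uid = os.stat(path).st_uid
    print(f"stale admin socket: {path} (owner uid {owner_uid})")
```

A socket owned by another user may refuse the probe with a permission error. The sketch treats that case as inconclusive, so run it as a user allowed to access the socket directory.
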

The radosgw init script (``/etc/init.d/radosgw``) also accepts a verbose
argument that can provide some insight into what the issue might be::

    /etc/init.d/radosgw start -v

or ::

    /etc/init.d/radosgw start --verbose

HTTP Request Errors
===================
Examining the web server's access and error logs is usually the first
step in identifying what is going on. A 500 error usually indicates a
problem communicating with the ``radosgw`` daemon. Ensure that the daemon
is running, that its socket path is configured, and that the web server
is looking for it in the proper location.
Crashed ``radosgw`` process
===========================
If the ``radosgw`` process dies, you will normally see a 500 error
from the web server (apache, nginx, etc.). In that situation, simply
restarting ``radosgw`` will restore service.

To diagnose the cause of the crash, check the log in ``/var/log/ceph``
and/or the core file (if one was generated).
Blocked ``radosgw`` Requests
============================
If some (or all) radosgw requests appear to be blocked, you can get
some insight into the internal state of the ``radosgw`` daemon via
its admin socket. By default, there will be a socket configured to
reside in ``/var/run/ceph``, and the daemon can be queried with::

    ceph daemon /var/run/ceph/client.rgw help

    help                list available commands
    objecter_requests   show in-progress osd requests
    perfcounters_dump   dump perfcounters value
    perfcounters_schema dump perfcounters schema
    version             get protocol version

Of particular interest is ``objecter_requests``::

    ceph daemon /var/run/ceph/client.rgw objecter_requests
    ...

This command dumps information about the requests currently in progress
with the RADOS cluster, which makes it possible to identify requests that
are blocked by a non-responsive OSD. For example, one might see::

    { "ops": [
          { "tid": 1858,
            "pg": "2.d2041a48",
            "osd": 1,
            "last_sent": "2012-03-08 14:56:37.949872",
            "attempts": 1,
            "object_id": "fatty_25647_object1857",
            "object_locator": "@2",
            "snapid": "head",
            "snap_context": "0=[]",
            "mtime": "2012-03-08 14:56:37.949813",
            "osd_ops": [
                  "write 0~4096"]},
          { "tid": 1873,
            "pg": "2.695e9f8e",
            "osd": 1,
            "last_sent": "2012-03-08 14:56:37.970615",
            "attempts": 1,
            "object_id": "fatty_25647_object1872",
            "object_locator": "@2",
            "snapid": "head",
            "snap_context": "0=[]",
            "mtime": "2012-03-08 14:56:37.970555",
            "osd_ops": [
                  "write 0~4096"]}],
      "linger_ops": [],
      "pool_ops": [],
      "pool_stat_ops": [],
      "statfs_ops": []}

In this dump, two requests are in progress. The ``last_sent`` field is
the time the RADOS request was sent. If that time is well in the past, it
suggests that the OSD is not responding. For example, for request 1858,
you could check the OSD status with::

    ceph pg map 2.d2041a48

    osdmap e9 pg 2.d2041a48 (2.0) -> up [1,0] acting [1,0]

This tells us to look at ``osd.1``, the primary copy for this PG::

    ceph daemon osd.1 ops

    { "num_ops": 651,
      "ops": [
            { "description": "osd_op(client.4124.0:1858 fatty_25647_object1857 [write 0~4096] 2.d2041a48)",
              "received_at": "1331247573.344650",
              "age": "25.606449",
              "flag_point": "waiting for sub ops",
              "client_info": { "client": "client.4124",
                  "tid": 1858}},
      ...

The ``flag_point`` field indicates that the OSD is currently waiting
for replicas to respond, in this case ``osd.0``.
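
The ``last_sent`` check described in this section can be sketched in Python. This is only an illustration: ``find_stale_ops`` is a hypothetical helper, the 30-second threshold is arbitrary, and the timestamp layout is taken from the sample dump above, so other Ceph releases may print it differently.

```python
import json
from datetime import datetime, timedelta

# Timestamp layout used in the sample dump above; other Ceph releases may
# print a different layout, so treat this as an assumption.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def find_stale_ops(dump, now, max_age=timedelta(seconds=30)):
    """Return (tid, pg, osd, age) for ops whose last_sent is older than max_age."""
    stale = []
    for op in dump.get("ops", []):
        last_sent = datetime.strptime(op["last_sent"], TIMESTAMP_FORMAT)
        age = now - last_sent
        if age > max_age:
            stale.append((op["tid"], op["pg"], op["osd"], age))
    return stale


# Trimmed copy of the sample dump above (only the fields the function reads).
sample = json.loads("""
{ "ops": [
    { "tid": 1858, "pg": "2.d2041a48", "osd": 1,
      "last_sent": "2012-03-08 14:56:37.949872" },
    { "tid": 1873, "pg": "2.695e9f8e", "osd": 1,
      "last_sent": "2012-03-08 14:56:37.970615" }],
  "linger_ops": [] }
""")

now = datetime(2012, 3, 8, 14, 57, 30)  # hypothetical "current" time
stale = find_stale_ops(sample, now)
for tid, pg, osd, age in stale:
    print(f"tid {tid} on pg {pg} (osd.{osd}) has waited {age.total_seconds():.1f}s")
```

Each reported ``pg`` value can then be passed to ``ceph pg map`` as shown earlier in this section.
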
Java S3 API Troubleshooting
===========================
Peer Not Authenticated
----------------------
You may receive an error that looks like this::

    [java] INFO: Unable to execute HTTP request: peer not authenticated

The Java SDK for S3 requires a valid certificate from a recognized certificate
authority, because it uses HTTPS by default. If you are just testing the Ceph
Object Storage services, you can resolve this problem in either of two ways:

#. Prepend the IP address or hostname with ``http://``. For example, change this::

       conn.setEndpoint("myserver");

   To::

       conn.setEndpoint("http://myserver");

#. After setting your credentials, add a client configuration and set the
   protocol to ``Protocol.HTTP``. ::

       AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);

       ClientConfiguration clientConfig = new ClientConfiguration();
       clientConfig.setProtocol(Protocol.HTTP);

       AmazonS3 conn = new AmazonS3Client(credentials, clientConfig);

405 MethodNotAllowed
--------------------
If you receive a 405 error, check whether the S3 subdomain is set up correctly.
Subdomain functionality requires a wildcard entry in your DNS records.

Also, check to ensure that the default site is disabled. ::

    [java] Exception in thread "main" Status Code: 405, AWS Service: Amazon S3, AWS Request ID: null, AWS Error Code: MethodNotAllowed, AWS Error Message: null, S3 Extended Request ID: null

Numerous objects in default.rgw.meta pool
=========================================
Clusters created prior to *jewel* have a metadata archival feature enabled by default, using the ``default.rgw.meta`` pool.
This archive keeps all old versions of user and bucket metadata, resulting in large numbers of objects in the ``default.rgw.meta`` pool.
Disabling the Metadata Heap
---------------------------
Users who want to disable this feature going forward should set the ``metadata_heap`` field to an empty string ``""``::

    $ radosgw-admin zone get --rgw-zone=default > zone.json
    [edit zone.json, setting "metadata_heap": ""]
    $ radosgw-admin zone set --rgw-zone=default --infile=zone.json
    $ radosgw-admin period update --commit

This stops new metadata from being written to the ``default.rgw.meta`` pool, but it does not remove any existing objects or the pool itself.
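
The manual edit of ``zone.json`` can also be scripted. The following is a minimal sketch: ``disable_metadata_heap`` is a hypothetical helper, and the sample zone is a heavily trimmed stand-in for real ``zone get`` output, which contains many more fields.

```python
import json


def disable_metadata_heap(zone_text):
    """Return zone JSON text with "metadata_heap" set to an empty string."""
    zone = json.loads(zone_text)
    zone["metadata_heap"] = ""
    return json.dumps(zone, indent=4) + "\n"


# Hypothetical, heavily trimmed stand-in for the output of `zone get`.
sample = '{"name": "default", "metadata_heap": "default.rgw.meta"}'
edited = disable_metadata_heap(sample)
print(edited)
```

To use it on a real zone, read the contents of ``zone.json``, pass them through the function, and write the result back before running ``zone set``. All other fields are preserved unchanged.
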
Cleaning the Metadata Heap Pool
-------------------------------
Clusters created prior to *jewel* normally use ``default.rgw.meta`` only for the metadata archival feature.

However, from *luminous* onwards, radosgw uses :ref:`Pool Namespaces <radosgw-pool-namespaces>` within ``default.rgw.meta`` for an entirely different purpose, that is, to store ``user_keys`` and other critical metadata.

Users should check the zone configuration before proceeding with any cleanup procedures::

    $ radosgw-admin zone get --rgw-zone=default | grep default.rgw.meta
    [should not match any strings]

Having confirmed that the pool is not used for any purpose, users may safely delete all objects in the ``default.rgw.meta`` pool, or optionally, delete the entire pool itself.
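
When the ``grep`` check does match, it helps to know which zone fields still point at the pool. The sketch below walks the zone JSON and reports them. ``find_pool_references`` is a hypothetical helper, and the sample zone is an illustrative fragment rather than real ``zone get`` output.

```python
def find_pool_references(node, pool, path=""):
    """Return the paths of all string values in `node` that mention `pool`."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            hits.extend(find_pool_references(value, pool, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_pool_references(value, pool, f"{path}[{index}]"))
    elif isinstance(node, str) and pool in node:
        # Substring match, so namespaced entries such as "pool:namespace" count.
        hits.append(path)
    return hits


# Illustrative fragment only; field values are assumptions for the example.
zone = {
    "name": "default",
    "metadata_heap": "",
    "user_keys_pool": "default.rgw.meta:users.keys",
    "placement_pools": [
        {"key": "default-placement",
         "val": {"index_pool": "default.rgw.buckets.index"}},
    ],
}

references = find_pool_references(zone, "default.rgw.meta")
print(references or "no references; the pool is unused by this zone")
```

Any reported path means the pool is still in use and must not be cleaned up.
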