Commit Graph

38650 Commits

Author SHA1 Message Date
David Zafman
f24f646d87 ceph_objectstore_tool: Check that pool exists before allowing import
Signed-off-by: David Zafman <dzafman@redhat.com>
2015-01-15 11:23:33 -08:00
David Zafman
196c8112dc ceph_objectstore_tool: Check cluster_fsid before allowing an import
Signed-off-by: David Zafman <dzafman@redhat.com>
2015-01-15 11:23:32 -08:00
David Zafman
62dd912f11 ceph_objectstore_tool: Allow the metadata_section to be anywhere in the export
Signed-off-by: David Zafman <dzafman@redhat.com>
2015-01-15 11:23:32 -08:00
David Zafman
f727d2eaf5 ceph_objectstore_tool: import-rados shouldn't import internal namespace objects
Signed-off-by: David Zafman <dzafman@redhat.com>
2015-01-15 11:23:32 -08:00
David Zafman
ddc4613ec7 ceph_objectstore_tool: Get g_ceph_context available to import-rados
Signed-off-by: David Zafman <dzafman@redhat.com>
2015-01-15 11:23:32 -08:00
David Zafman
fe936026ed ceph_objectstore_tool: Fix import-rados skipping of snapshots
Signed-off-by: David Zafman <dzafman@redhat.com>
2015-01-15 11:23:32 -08:00
David Zafman
5cb692528e ceph_objectstore_tool: read_fd() doesn't handle ^D from tty stdin, don't allow
Signed-off-by: David Zafman <dzafman@redhat.com>
2015-01-15 11:23:32 -08:00
David Zafman
3de2d3bfb3 ceph_objectstore_tool: validate pgid before calling PG::_has_removal_flag()
The "meta" directory could be removed when trying to remove pgid 0.0

Signed-off-by: David Zafman <dzafman@redhat.com>
2015-01-15 11:23:32 -08:00
David Zafman
22b71744bb ceph-objectstore-tool: Remove --pretty-format and use new --format options
Call new_formatter() with --format specified argument

Signed-off-by: David Zafman <dzafman@redhat.com>
2015-01-15 11:23:32 -08:00
Sage Weil
062d3b0215 Merge pull request #3379 from ceph/wip-mon-drop-conversion
mon: drop store conversion code

Reviewed-by: Sage Weil <sage@redhat.com>
2015-01-15 11:22:16 -08:00
Gregory Farnum
11062d2f45 Merge pull request #3377 from ceph/wip-fail-idempotent
mon/MDSMonitor: make 'mds fail' idempotent for IDs

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2015-01-15 11:21:18 -08:00
Sage Weil
80473f6385 os/FileJournal: Fix journal write fail, align for direct io
When journal_zero_on_create is true, osd mkfs fails while zeroing the journal.
The journal is opened with O_DIRECT, so the buffer must be aligned to the block size.

Backport: giant, firefly, dumpling
Signed-off-by: Xie Rui <875016668@qq.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2015-01-15 11:20:18 -08:00
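
The alignment requirement in the commit above can be illustrated with a small sketch. This is not the FileJournal code: `align_up`, `make_zero_buffer` and the 4096-byte block size are assumptions made for illustration. An anonymous `mmap` is used because it is zero-filled and page-aligned, which is enough when the block size divides the page size.

```python
import mmap

def align_up(nbytes: int, block_size: int) -> int:
    """Round nbytes up to the next multiple of block_size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return (nbytes + block_size - 1) // block_size * block_size

def make_zero_buffer(nbytes: int, block_size: int = 4096) -> mmap.mmap:
    """Return a zero-filled, page-aligned buffer (nbytes must be > 0).

    The length is padded to a multiple of block_size (an assumed value),
    which is what a write to an O_DIRECT file descriptor generally needs.
    """
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    return mmap.mmap(-1, align_up(nbytes, block_size))
```

A buffer built this way could be passed to `os.write()` on a descriptor opened with `os.O_DIRECT` (Linux only); the real block size should be queried from the device rather than assumed.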
Jerry7X
cc0dba5261 mon: encode stashed monmap with all features
The latest_monmap that we stash is only used locally; the encoded bl is never shared. This means we should just use CEPH_FEATURES_ALL all of the time.

Fixes: #5203
Backport: giant, firefly
Signed-off-by: Xie Rui <875016668@qq.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Joao Eduardo Luis <joao@redhat.com>
2015-01-15 11:13:17 -08:00
Haomai Wang
7bb7b1ec1e AsyncConnection: Fix deadlock if socket failed when replacing
If a client reconnects to an endpoint that has already been marked down,
the server side detects a remote reset and resets the existing connection.
Meanwhile, the client-side connection receives a retry tag and tries to
reconnect. The client-side connection then sends connect_msg with
connect_seq(1), but this meets the server-side connection's connect_seq(0),
which makes the server side reply with a reset tag. The connection
therefore loops between reset and retry tags.

One solution is to close the server-side connection if connect_seq == 0 and
there is no message in the queue. But that triggers another problem:
1. client tries to connect to an already mark_down endpoint
2. client->send_message
3. server side accepts the new socket, replaces the old one and replies with a retry tag
4. client adds one to connect_seq, but a socket failure happens
5. server-side connection detects this and closes because connect_seq == 0
and there is no message
6. client reconnects; server side has no existing connection and meets
"connect.connect_seq > 0", so it replies with a RESET tag
7. client discards all messages in its queue, so we lose a message that was never delivered

This solution adds a new "once_session_reset" flag to indicate whether the
"existing" connection has ever been reset. The server side's connect_seq is
0 only when it has never connected successfully or has had a session reset
at some point. We only need to reply with a RESET tag if the session has
ever been reset.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:14 +08:00
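
The decision described in the commit above can be sketched as follows. This is a simplified model, not the AsyncConnection code: the `Existing` class, the `reply_for` function and the "REPLACE" and "OTHER" labels are hypothetical, and what happens when no RESET is sent is an assumption, since the commit message only states when RESET is needed.

```python
from dataclasses import dataclass

@dataclass
class Existing:
    """Server-side view of the connection already registered for a peer."""
    connect_seq: int = 0
    once_session_reset: bool = False

def reply_for(incoming_connect_seq: int, existing: Existing) -> str:
    """Pick a reply for a connect_msg that meets an existing connection."""
    if incoming_connect_seq > 0 and existing.connect_seq == 0:
        # connect_seq == 0 on the server means either "never connected
        # successfully" or "session was reset at some point". Only the
        # second case needs a RESET reply.
        if existing.once_session_reset:
            return "RESET"
        return "REPLACE"  # assumption: otherwise the old connection is replaced
    return "OTHER"        # remaining cases are outside this sketch
```

Keeping the flag separate from connect_seq is what lets the server tell the two zero cases apart without discarding queued messages.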
Haomai Wang
bd627e7742 Event: Fix typo
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:13 +08:00
Haomai Wang
f7f25b4cbb AsyncConnection: Don't increment connect_seq if connect failed
If a connection sent many messages that were not acked and was then marked
down, the next connection issues a connect_msg with connect_seq=0. The
server side needs to detect "connect_seq==0 && existing->connect_seq >0"
so that it resets out_q and detects the remote reset. But if the client side
failed before sending connect_msg, it now issues a connect_msg with a
non-zero connect_seq, which prevents the server side from detecting the
existing remote reset. The server side then replies with a non-zero in_seq
and causes the client to crash.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:13 +08:00
Haomai Wang
898d43dbee async: adjust test_msgr and normalize log output format
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:13 +08:00
Haomai Wang
296e5457be AsyncConnection: Fix replacing cause original state lossy
Because AsyncConnection doesn't enter the "open" tag from the "replace" tag,
the code that sets reply_tag is not used when entering the "open" tag.
This causes the server side to discard out_q and lose state.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:13 +08:00
Haomai Wang
2bc16752c4 AsyncConnection: Don't discard out_q and unregister when replacing
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:13 +08:00
Haomai Wang
c65df9b5ff test_msgr: Add SyntheticInjectTest
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:13 +08:00
Haomai Wang
a75ac0ea46 AsyncConnection: Add ms_inject_* to AsyncConnection
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:12 +08:00
Haomai Wang
50771dd7e6 AsyncConnection: Enhance replace process
Make handle_connect_msg follow the lock rule: release any held lock before
acquiring the messenger's lock. Otherwise, a deadlock will happen.

Strengthen the lock condition check, because the connection's state may
change between unlocking itself and locking again.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:12 +08:00
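
The lock rule from the commit above can be sketched as follows. This is an illustration only: `Messenger`, `Connection` and `register_connection` are hypothetical stand-ins, not the Ceph classes. The connection lock is dropped before the messenger lock is taken, and the connection state is checked again once both locks are held.

```python
import threading

class Messenger:
    def __init__(self):
        self.lock = threading.Lock()
        self.conns = {}

class Connection:
    def __init__(self, state="ACCEPTING"):
        self.lock = threading.Lock()
        self.state = state

def register_connection(msgr: Messenger, conn: Connection, addr: str) -> bool:
    """Call with conn.lock held; returns with conn.lock held again.

    Returns True if conn was registered, False if its state changed
    while the lock was released.
    """
    expected = conn.state
    conn.lock.release()            # never hold conn.lock while taking msgr.lock
    with msgr.lock:
        conn.lock.acquire()        # lock order is always msgr.lock, then conn.lock
        if conn.state != expected:
            return False           # another thread changed the state; back off
        msgr.conns[addr] = conn
        return True
```

Every code path that needs both locks has to take them in the same order for this to be deadlock-free.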
Haomai Wang
a1753902dc AsyncConnection: set state_offset=0 in case of reuse this connection
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:12 +08:00
Haomai Wang
2f9238361c Event: Fix incorrect memset
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:12 +08:00
Haomai Wang
4b900a6f82 test_msgr: Add SyntheticWorkload to do message measurement
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:12 +08:00
Haomai Wang
e823af41df AsyncConnection: Don't alloc buffer when reenter "READ_FRONT" state
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:12 +08:00
Haomai Wang
9fc24d4eb9 test_msgr: Add test for a message with large payload
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:11 +08:00
Haomai Wang
34cbd4c76c AsyncConnection: Avoid calling callback after deleting AsyncMessenger
Calling mark_down/mark_down_all now dispatches a reset event.
If we then call Messenger::shutdown/wait, the reset event may be called
after the Messenger has been deallocated.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:11 +08:00
Haomai Wang
9a84a905fd test_msgr: Add random usleep to Dispatcher impl
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:11 +08:00
Haomai Wang
e7db911489 AsyncMessenger: wait for dispatch event done
In order to avoid a deadlock like:
1. mark_down_all while holding the lock
2. ms_dispatch_reset
3. get_connection wants to take the lock
4. deadlock

We signal a workerpool barrier to wait for all in-queue events to finish.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:11 +08:00
Haomai Wang
e84d1344fe AsyncConnection: Add omissive STATE_WAIT state
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:11 +08:00
Haomai Wang
cb3e1bf40b AsyncConnection: Adjust backoff wakeup granularity
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:10 +08:00
Haomai Wang
44a01894d9 AsyncConnection: using send_keepalive instead of _send_keepalive_or_ack
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:10 +08:00
Haomai Wang
a98b9e2f70 AsyncConnection: Fix mark_down race condition
Previously, if a caller wanted to mark_down a connection and the caller was
an event thread callback, it would block waiting for the wakeup. Meanwhile,
the event thread expected to signal the blocked thread might itself want to
mark_down a connection owned by the already blocked thread, so a deadlock
occurred.

As a tradeoff, introduce a lock on file_events, which avoids the need for a
create/delete file_event callback. We therefore no longer need to wait for a
callback.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:10 +08:00
Haomai Wang
24fd12f48d MessengerTest: Add markdown with caller lock tests
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:10 +08:00
Haomai Wang
abb4e68200 AsyncMessenger: Retry binding on addresses if binding fails
Learn from commit(2d4dca757e) for
SimpleMessenger:

If binding on an IP address fails, delay and retry.

This happens mainly on IPv6 deployments. Due to DAD (Duplicate Address Detection)
or SLAAC, the IPv6 address may not yet be available when the daemons start.

Monitor daemons try to bind to a static IPv6 address; if that address is not yet
available, the monitor fails to start.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:10 +08:00
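
The delay-and-retry behaviour described in the commit above can be sketched as follows. This is a minimal illustration, not the AsyncMessenger code: `bind_with_retry` and its `attempts` and `delay` parameters are hypothetical names and values.

```python
import time

def bind_with_retry(sock, addr, attempts=3, delay=1.0, sleep=time.sleep):
    """Try sock.bind(addr) up to `attempts` times, sleeping between failures.

    Returns the attempt number that succeeded; re-raises the last OSError
    if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            sock.bind(addr)
            return attempt
        except OSError:
            if attempt == attempts:
                raise          # out of retries: surface the error to the caller
            sleep(delay)       # e.g. wait for DAD to finish on the IPv6 address
```

With a real socket this would be called as `bind_with_retry(socket.socket(socket.AF_INET6, socket.SOCK_STREAM), ("::1", 0))`; the `sleep` argument exists so the delay can be replaced in tests.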
Haomai Wang
0a7c331c49 AsyncMessenger: allow RESETSESSION whenever we forget an endpoint
Learn from SimpleMessenger(8cd1fdd7a7)

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:10 +08:00
Haomai Wang
d93bdade3e AsyncConnection: Using buffer read to avoid small read overhead
Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:09 +08:00
Haomai Wang
8d2af2faee AsyncMessenger: Using EventCenter instead of poll for bind
This removes the extra thread from AsyncMessenger entirely. The bind socket is
treated as a normal socket, and a random Worker thread is chosen to handle
its accept events.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:09 +08:00
Haomai Wang
f4fcff16b6 AsyncMessenger: Bind async thread to special cpu core
Now, 2-4 async op threads can fully meet an OSD's network demand with an SSD
backend. So we can bind a limited number of threads to specific cores, which
can improve async event loop performance because most structures and methods
will be processed within the thread.

For example,

ms_async_op_threads = 2
ms_async_affinity_cores = 0,3

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
2015-01-16 03:07:09 +08:00
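
As an illustration of how a value such as `ms_async_affinity_cores = 0,3` might be interpreted, here is a small sketch. It is not the AsyncMessenger implementation: the function names are hypothetical, and the worker-to-core mapping (worker i gets the i-th listed core) is an assumption, not something the commit message states.

```python
import os
from typing import List, Optional

def parse_affinity_cores(value: str) -> List[int]:
    """Parse a comma-separated core list such as "0,3" into [0, 3]."""
    return [int(tok) for tok in value.split(",") if tok.strip()]

def core_for_worker(worker_index: int, cores: List[int]) -> Optional[int]:
    """Assumed mapping: worker i gets cores[i]; extra workers stay unpinned."""
    return cores[worker_index] if worker_index < len(cores) else None

def pin_caller(core: int) -> None:
    """Linux only: restrict the caller to a single core (pid 0 means the caller)."""
    os.sched_setaffinity(0, {core})
```

With `ms_async_op_threads = 2` and cores `0,3`, this mapping would give worker 0 core 0 and worker 1 core 3.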
David Zafman
0aeba0f216 ceph_objectstore_tool: Describe super_ver values
Signed-off-by: David Zafman <dzafman@redhat.com>
2015-01-15 10:46:15 -08:00
xinxin shu
9db596974c fix command 'ceph pg dump_stuck degraded'
undersized not valid:  undersized not in inactive|unclean|stale
undersized not valid:  undersized doesn't represent an int
Invalid command:  unused arguments: ['undersized']
pg dump_stuck {inactive|unclean|stale [inactive|unclean|stale...]} {<int>} :  show information about stuck pgs

Signed-off-by: xinxin shu <xinxin.shu@intel.com>
2015-01-16 01:33:07 +08:00
Joao Eduardo Luis
34081562a8 mon: Monitor: drop StoreConverter code
We no longer convert stores on upgrade.  Users coming from bobtail or
before should go through an interim version such as cuttlefish, dumpling,
firefly or giant.

Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
2015-01-15 16:06:21 +00:00
Joao Eduardo Luis
1d814b76b8 ceph_mon: no longer attempt store conversion on start
People upgrading from bobtail or previous clusters should first go
through an interim version (quite a few to pick from: cuttlefish,
dumpling, firefly, giant).

Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
2015-01-15 16:02:28 +00:00
Gregory Farnum
d4a64474e5 Merge pull request #3376 from dachary/wip-10547-formatter
common: restore format fallback semantic

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
2015-01-15 07:11:17 -08:00
Joao Eduardo Luis
447d46991c mon: Monitor: health to clog writes every X seconds on the second
3600 will mean every hour, on the hour; 60 will mean every minute, on
the minute.  This will allow the monitors to spit out the info at
regular intervals, regardless of the time at which they formed quorum or
which monitor is now the leader.

Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
2015-01-15 14:58:36 +00:00
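
The "on the second" scheduling described in the commit above comes down to aligning the next tick to a multiple of the interval. A minimal sketch, assuming time is measured in seconds since the epoch (the function name is hypothetical and this is not the Monitor code):

```python
def seconds_until_next_tick(now: float, interval: float) -> float:
    """Delay until the next multiple of `interval`, strictly after `now`."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return interval - (now % interval)
```

For example, with an interval of 3600 and a current time 3599 seconds past the hour, the delay is 1 second. Because every monitor computes the same boundaries, a newly elected leader keeps writing at the same wall-clock times as the previous one.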
Joao Eduardo Luis
ae1032e2f0 mon: Monitor: cache 'summary' string to avoid dups on clog
By caching the summary string we can avoid writing dups on clog.

We will still write dups every 'mon_health_to_clog_interval', to make
sure that we still output health status every now and then, but we
increased the interval from 120 seconds to 3600 seconds -- once every
hour unless the health status changes.

Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
2015-01-15 14:58:35 +00:00
Joao Eduardo Luis
fcd7aa00f5 mon: Monitor: reset health status cache on _reset()
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
2015-01-15 14:58:35 +00:00
Joao Eduardo Luis
81a2faf359 mon: Monitor: write health status to clog every X seconds
Instead of writing the health status only when a user action calls
get_health(), have the monitor write it every X seconds.

Adds a new config option 'mon_health_to_clog_tick_interval' (default:
60 [seconds]), and changes the default value of
'mon_health_to_clog_interval' from 60 (seconds) to 120 (seconds).

If 'mon_health_to_clog' is 'true' and 'mon_health_to_clog_tick_interval'
is greater than 0.0, the monitor will now start a tick event when it
wins an election (meaning, only the leader will write this info to
clog).

This tick will, by default, run every 60 seconds.  It will call
Monitor::get_health() to obtain current health summary and overall
status.  If overall status is the same as the cached status, then it
will attempt to ignore it.  The status will not be ignored if the last
write to clog happened more than 'mon_health_to_clog_interval' seconds
ago (default: 120).

Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
2015-01-15 14:58:35 +00:00
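
The tick logic described in the commit above can be sketched as follows. This is a simplified model with hypothetical names (`HealthToClog`, `tick`), not the Monitor implementation. A status is written when it differs from the cached one, or when the last write is older than the interval (120 seconds here, matching the default named in the commit message).

```python
class HealthToClog:
    """Decide on each tick whether the health status should go to clog."""

    def __init__(self, interval: float = 120.0):
        self.interval = interval     # stands in for mon_health_to_clog_interval
        self.cached_status = None
        self.last_write = None

    def tick(self, status: str, now: float) -> bool:
        """Return True if the status should be written on this tick."""
        changed = status != self.cached_status
        stale = self.last_write is None or now - self.last_write > self.interval
        if not (changed or stale):
            return False             # same status, written recently: ignore
        self.cached_status = status
        self.last_write = now
        return True
```

Calling `tick()` every 60 seconds with an unchanged status would therefore write only once the previous write is more than 120 seconds old.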
Joao Eduardo Luis
e2d66ae3cf mon: Monitor: 'get_health()' returns overall health status
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
2015-01-15 14:58:35 +00:00