Commit Graph

47130 Commits

Author SHA1 Message Date
Sage Weil
9341eec54d kv/RocksDBStore: rocksdb_separate_wal_dir option
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:54 -05:00
Sage Weil
3649a80a89 os/bluestore/BlueFS: ref count BlueFS::File *
FileWriters may still exist (and reference the file) when the
file is deleted.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:54 -05:00
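The sketch below shows why reference counting fixes this. It uses `std::shared_ptr` as a stand-in for whatever refcount BlueFS actually uses. `File`, `FileWriter`, and `FileSystem` are simplified hypothetical types, not the BlueFS classes. Unlinking a file drops only the directory's reference, so a writer that is still open keeps the object valid until the writer itself goes away.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for BlueFS::File: a name, a size, a deleted flag.
struct File {
  std::string name;
  uint64_t size = 0;
  bool deleted = false;  // set when the directory entry is removed
};
using FileRef = std::shared_ptr<File>;

// Hypothetical writer: holds its own reference, so the File outlives unlink.
struct FileWriter {
  FileRef file;
  explicit FileWriter(FileRef f) : file(std::move(f)) {}
  void append(uint64_t len) { file->size += len; }
};

// Hypothetical directory mapping names to shared File references.
struct FileSystem {
  std::map<std::string, FileRef> dir;

  FileRef create(const std::string& name) {
    auto f = std::make_shared<File>();
    f->name = name;
    dir[name] = f;
    return f;
  }

  // Removing the entry drops only the directory's reference.
  void unlink(const std::string& name) {
    auto it = dir.find(name);
    if (it == dir.end()) return;
    it->second->deleted = true;
    dir.erase(it);
  }
};
```

In use, a caller creates a file, passes the returned reference to a `FileWriter`, and unlinks the name. The writer can still append without touching freed memory.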
Sage Weil
98485dee05 os/bluestore/BlueFS: readdir list dirs, too
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:54 -05:00
Sage Weil
b8630ee48c ceph-bluefs-tool: simple tool to export bluefs content
Currently we just do a dump.  We'll add more
functionality later.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:54 -05:00
Sage Weil
2d0537853a os/bluestore/BlueFS: many fixes
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:54 -05:00
Sage Weil
e4f6148c9f os/bluestore/BlueStore: share space with BlueFS
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:54 -05:00
Sage Weil
653882c446 os/bluestore/BlockDevice: move to simple mutex model
Just for now, while we get the rest of this working.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:53 -05:00
Sage Weil
dd04391706 os/bluestore/BlueFS: simple file system to back rocksdb
BlueFS is a simple file system that will back rocksdb.
BlueRocksEnv is the rocksdb::Env implementation that
glues them together.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:53 -05:00
Sage Weil
6f5ac50171 ceph_test_objectstore: less verbose
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:53 -05:00
Sage Weil
226b3476a3 ceph_test_objectstore: less verbose on hash collision test
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:53 -05:00
Sage Weil
1b8d5b6068 os/bluestore/BlueStore: fix _do_read
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:53 -05:00
Sage Weil
1ffd5e6963 os/bluestore/StupidAllocator: fix locking
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:53 -05:00
Sage Weil
14460484ff os/bluestore/StupidAllocator: fix misc bugs
Can't use invalid iterator; fix init_rm_free.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:52 -05:00
Sage Weil
08a94d95e1 os/bluestore/Allocator: init_rm_free
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:52 -05:00
Sage Weil
65f720ae9d kv/RocksDBStore: take custom Env
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:52 -05:00
Sage Weil
a869f92fac os/bluestore: fix _do_read return value
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:52 -05:00
Sage Weil
d704628cab os/bluestore/BlockDevice: fix read return value
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:52 -05:00
Sage Weil
9d01b8df9a os/bluestore: separate Allocator from freelist storage
FreelistManager persists our freelist.  Allocator is a policy that
allocates from it.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:52 -05:00
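The split can be sketched as two independent pieces. One is a store of record for free extents; the other is a replaceable allocation policy initialized from it. This sketch is illustrative, not BlueStore's code. The class names echo the commit, but the methods (`add_free`, `mark_used`, `init_add_free`, `allocate`), the `carve` helper, and the first-fit policy are hypothetical choices made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Free extents: offset -> length.
using ExtentMap = std::map<uint64_t, uint64_t>;

// Remove [off, off+len) from a free-extent map; assumes the range lies
// inside a single existing free extent.
inline void carve(ExtentMap& m, uint64_t off, uint64_t len) {
  auto it = m.upper_bound(off);
  assert(it != m.begin());
  --it;
  uint64_t start = it->first, end = it->first + it->second;
  assert(off >= start && off + len <= end);
  m.erase(it);
  if (off > start) m[start] = off - start;
  if (off + len < end) m[off + len] = end - (off + len);
}

// Hypothetical persistence side: only records what is free or used.
// (In BlueStore this state is kept in the key/value store.)
class FreelistManager {
 public:
  void add_free(uint64_t off, uint64_t len) { free_[off] = len; }
  void mark_used(uint64_t off, uint64_t len) { carve(free_, off, len); }
  const ExtentMap& free_extents() const { return free_; }
 private:
  ExtentMap free_;
};

// Hypothetical policy interface: decides where new data goes.
class Allocator {
 public:
  virtual ~Allocator() = default;
  virtual void init_add_free(uint64_t off, uint64_t len) = 0;
  virtual std::optional<uint64_t> allocate(uint64_t len) = 0;
};

// One possible policy: first fit over an in-memory copy of the free list.
class FirstFitAllocator : public Allocator {
 public:
  void init_add_free(uint64_t off, uint64_t len) override { free_[off] = len; }
  std::optional<uint64_t> allocate(uint64_t len) override {
    for (auto& [off, flen] : free_) {
      if (flen >= len) {
        uint64_t o = off;  // copy before carve() erases this entry
        carve(free_, o, len);
        return o;
      }
    }
    return std::nullopt;
  }
 private:
  ExtentMap free_;
};
```

Because the policy is separate, a different allocator (for example the StupidAllocator mentioned elsewhere in this log) could replace the first-fit one without changing how the free list is persisted.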
Sage Weil
a62ffb0d03 newstore -> bluestore
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:52 -05:00
Sage Weil
3a4d583f85 os/newstore: always create db.wal
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:32 -05:00
Sage Weil
ad9f9fad01 os/newstore: create db dir
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:17 -05:00
Sage Weil
5658665ce7 os/newstore: consume a raw block device
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:06:17 -05:00
Sage Weil
32e768391f os/newstore: make collection_list tolerate sloppy start position
Because of an earlier change (#6076), the hobject_t now contains the pool
id, so a ghobject_t holding this hobject_t is no longer equal to ghobject_t().

In newstore, this will cause assertion failure:
FAILED assert(k >= start_key && k < end_key)

The fix makes newstore compatible with that change by creating the
start ghobject_t with the pool id and shard id.

Fixes: #13801
Reported-by: Zhi Zhang <zhangz.david@outlook.com>
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:18 -05:00
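A simplified model illustrates the failure mode. It assumes a sorted `std::map` standing in for the key/value store, with string keys where each collection owns the range [start_key, end_key). This sketch tolerates a sloppy start by clamping it to the collection's first key, which is a swapped-in technique. The commit itself instead builds the start ghobject_t with the pool id and shard id. `list_range` and the key layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// List up to `max` keys of one collection, beginning at `start`.
// A start that sorts before the collection's range (e.g. a default-
// constructed object lacking the pool prefix) is clamped to start_key,
// so the range assertion below cannot fire.
inline std::vector<std::string> list_range(const std::map<std::string, int>& kv,
                                           const std::string& start_key,
                                           const std::string& end_key,
                                           const std::string& start,
                                           std::size_t max) {
  std::string from = start < start_key ? start_key : start;
  std::vector<std::string> out;
  for (auto it = kv.lower_bound(from); it != kv.end() && out.size() < max; ++it) {
    const std::string& k = it->first;
    if (k >= end_key) break;  // past this collection
    assert(k >= start_key && k < end_key);
    out.push_back(k);
  }
  return out;
}
```

Without the clamp, an empty start would begin at the first key in the store, which may belong to another collection, and trip the assertion.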
Sage Weil
2dae3df8af os/newstore: make key names more efficient
- pack u32 and u64 in binary (instead of in hex)
- avoid duplicating the object name while still making things
  sort by (key,name).  Use a prefix of < when key < name, = when
  key == name, and > when key > name.  In the = case (which is
  basically always) include the name just once.

Note that this breaks on-disk compatibility.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:18 -05:00
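The first bullet can be illustrated with a short sketch. It assumes big-endian byte order. The commit does not name the order, but most-significant-byte first is what keeps byte-wise key comparison consistent with numeric order. The helper names are hypothetical. Each u64 shrinks from 16 hex characters to 8 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Append v as 8 big-endian bytes; byte-wise order then matches numeric order.
inline void append_u64_be(std::string& out, uint64_t v) {
  for (int shift = 56; shift >= 0; shift -= 8)
    out.push_back(static_cast<char>((v >> shift) & 0xff));
}

// Append v as 4 big-endian bytes.
inline void append_u32_be(std::string& out, uint32_t v) {
  for (int shift = 24; shift >= 0; shift -= 8)
    out.push_back(static_cast<char>((v >> shift) & 0xff));
}

// The older style, for comparison: fixed-width hex, two chars per byte.
inline void append_u64_hex(std::string& out, uint64_t v) {
  char buf[17];
  std::snprintf(buf, sizeof(buf), "%016llx", static_cast<unsigned long long>(v));
  out.append(buf, 16);
}
```

`std::string` comparison treats bytes as unsigned, so values whose bytes exceed 0x7f still sort correctly.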
Sage Weil
5e566dd7cb os/newstore: fix collection_list vs max entries
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:18 -05:00
Sage Weil
84646ab1c2 os/newstore: do not set/change frag_size if there are overlays
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:18 -05:00
Sage Weil
9291e1682a os/newstore: define a fid_backpointer_t type
Signed-off-by: Sage Weil <sage@redhat.com>

fix wal_op_t
2016-01-01 13:05:17 -05:00
Sage Weil
b2db842e4d os/newstore: add newstore types to ceph-dencoder
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:17 -05:00
Sage Weil
0af0dbdc14 os/newstore: set alloc hint on new frags
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:17 -05:00
Sage Weil
f0f815fb9e os/newstore: dump onode contents
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:17 -05:00
Sage Weil
299350461b os/newstore: fixed fragment size
Instead of a single, variable-length fragment for each object,
set a fixed size (newstore_min_frag_size = 1 MB) and stripe the
object over these.  The last fragment will be smaller
than 1 MB if the object is not a multiple of 1 MB.

On write, this is basically free: writing and fsyncing 4 inodes
created together costs about as much as writing one.  On
overwrite, it allows us to replace individual fragments and avoid
write-ahead logging in many cases.

On read it is a bit slower because of inode lookups and disk
seeks.  In the common case (big object written sequentially) we
hope that fs prefetching will hide most of it (e.g., all inodes
will be loaded together in the same metadata btree node, and the
files' data is written sequentially on disk).

Allowing a single large fragment for a sequentially written
large object might save us something, but it would complicate
the code significantly.

Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:17 -05:00
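The striping described above comes down to integer arithmetic. This sketch assumes a fixed fragment size (1 MB, matching the newstore_min_frag_size value quoted above) and hypothetical helpers `num_frags` and `map_range` that are not newstore functions. It shows how a byte range maps onto fragments. An overwrite confined to one fragment yields a single piece, which is what makes replacing individual fragments possible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical: the part of a byte range that falls inside one fragment.
struct FragExtent {
  uint64_t frag;    // fragment index
  uint64_t offset;  // offset within that fragment
  uint64_t length;  // bytes covered in this fragment
};

// Number of fragments for an object of `size` bytes (last may be short).
inline uint64_t num_frags(uint64_t size, uint64_t frag_size) {
  return (size + frag_size - 1) / frag_size;
}

// Split the logical range [off, off+len) into per-fragment pieces.
inline std::vector<FragExtent> map_range(uint64_t off, uint64_t len,
                                         uint64_t frag_size) {
  std::vector<FragExtent> out;
  while (len > 0) {
    uint64_t frag = off / frag_size;
    uint64_t foff = off % frag_size;
    uint64_t n = std::min(len, frag_size - foff);  // stop at fragment end
    out.push_back({frag, foff, n});
    off += n;
    len -= n;
  }
  return out;
}
```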
Sage Weil
be0528f4d6 os/newstore: recycle rocksdb log files
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:17 -05:00
Sage Weil
feb2d3f6a3 rocksdb: latest master
Signed-off-by: Sage Weil <sage@redhat.com>
2016-01-01 13:05:16 -05:00
Sage Weil
2694e1171f Merge pull request #6649 from majianpeng/filesstore-lfnunlink
osd: FileStore:: optimize lfn_unlink

Reviewed-by: Kefu Chai <kchai@redhat.com>
2016-01-01 09:49:52 -05:00
Sage Weil
68a7c04b83 Merge pull request #7017 from efirs/ef_atomic_ceph_tid
osd: use atomic to generate ceph_tid

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
2016-01-01 09:48:10 -05:00
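The pull request title only says an atomic is used to generate ceph_tid. This sketch assumes a single 64-bit counter. `TidGenerator` is hypothetical, not the OSD code. `fetch_add` performs the read-modify-write atomically, so concurrent callers never receive the same id and no mutex is needed. Relaxed ordering is enough when only uniqueness matters.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <set>
#include <thread>
#include <vector>

// Hypothetical tid source: each call returns a distinct, increasing id.
class TidGenerator {
 public:
  uint64_t next() {
    // fetch_add returns the previous value; the first tid is 1.
    return last_tid_.fetch_add(1, std::memory_order_relaxed) + 1;
  }
 private:
  std::atomic<uint64_t> last_tid_{0};
};
```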
Sage Weil
c6f88ecdd7 Merge pull request #7077 from XinzeChi/wip-fix-wip-perf
osd: fix wip (l_osd_op_wip) perf counter and remove repop_map

Reviewed-by: Kefu Chai <kchai@redhat.com>
2016-01-01 09:47:35 -05:00
Sage Weil
44d8d63874 Merge pull request #6893 from kylinstorage/wip-osd_command
librados: add c++ style osd/pg command interface

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2015-12-31 09:33:51 -05:00
Sage Weil
cdc195a18e Merge pull request #5630 from wonzhq/evict-after-flush
osd: try evicting after flushing is done

Reviewed-by: Sage Weil <sage@redhat.com>
2015-12-31 09:33:26 -05:00
Sage Weil
da581106a3 Merge pull request #6639 from xiexingguo/xxg-wip-13822
librados: potential null pointer access in list_(n)objects

Reviewed-by: Kefu Chai <kchai@redhat.com>
2015-12-31 09:33:05 -05:00
Sage Weil
5dd1cb0f8c Merge pull request #6702 from liewegas/wip-fix-recency
osd/ReplicatedPG: fix promotion recency logic

Reviewed-by: Samuel Just <sjust@redhat.com>
2015-12-31 09:32:28 -05:00
Sage Weil
a67f873f8d Merge pull request #6824 from Sandy4999/wip-crushtool-build
crushtool: set type 0 name "device" for --build option

Reviewed-by: Sage Weil <sage@redhat.com>
2015-12-31 09:32:01 -05:00
Sage Weil
d5b9767635 Merge pull request #6962 from liewegas/wip-buffer-lastp
buffer: fix internal iterator invalidation on rebuild, get_contiguous

Reviewed-by: Samuel Just <sjust@redhat.com>
2015-12-31 09:30:55 -05:00
Sage Weil
b5a2a76ae8 Merge pull request #6970 from aiicore/drop_removal_pg_type
osd: drop deprecated removal pg type

Reviewed-by: Sage Weil <sage@redhat.com>
2015-12-31 09:30:12 -05:00
Xinze Chi
2fd3f43722 osd: remove repop_map in osd
If I read the code correctly, repop_map is never used.

Signed-off-by: Xinze Chi <xinze@xsky.com>
2015-12-29 22:48:55 +08:00
Xinze Chi
e29f55e0c8 osd: fix wip (l_osd_op_wip) perf counter
l_osd_op_wip is an OSD-wide counter, so it should be the sum over all PGs in the OSD.

Signed-off-by: Xinze Chi <xinze@xsky.com>
2015-12-29 22:00:31 +08:00
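The sketch below makes an OSD-wide gauge equal the sum over PGs. Every PG mirrors its own changes into one shared atomic, rather than any single PG overwriting the value. `OsdCounters` and `PG` are simplified hypothetical types, and this is not necessarily how the commit updates l_osd_op_wip.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical OSD-wide gauge of ops in progress.
struct OsdCounters {
  std::atomic<int64_t> op_wip{0};
};

// Hypothetical PG: tracks its own in-flight ops and applies the same
// delta to the shared gauge, so the gauge stays the sum over all PGs.
class PG {
 public:
  explicit PG(OsdCounters& c) : counters_(c) {}
  void op_start() { ++in_flight_; counters_.op_wip.fetch_add(1); }
  void op_finish() { --in_flight_; counters_.op_wip.fetch_sub(1); }
  int64_t in_flight() const { return in_flight_; }
 private:
  OsdCounters& counters_;
  int64_t in_flight_ = 0;
};
```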
Kefu Chai
4dd0d1bd01 Merge pull request #6987 from H3C/wip-addr-bugfix
common/address_help.cc: fix the leak in entity_addr_from_url()

Reviewed-by: Kefu Chai <kchai@redhat.com>
2015-12-28 16:40:38 +08:00
Josh Durgin
c485d29a53 Merge remote-tracking branch 'origin/jewel'
2015-12-23 16:32:00 -08:00
qiankunzheng
508deb9804 common/address_help.cc: fix the leak in entity_addr_from_url()
Fixes: #14132
Signed-off-by: Qiankun Zheng <zheng.qiankun@h3c.com>
2015-12-23 17:30:34 -05:00
Josh Durgin
f8c4d04ce4 Merge pull request #7026 from xdonghai/master
rbd: must specify both stripe-unit and stripe-count when specifying the stripingv2 feature

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2015-12-23 14:28:24 -08:00
Josh Durgin
ea131fe815 Merge pull request #6998 from xiexingguo/xxg-wip-clsrbd
stringify outputted error code and fix unmatched parentheses.

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2015-12-23 14:15:32 -08:00