
10.3: How to Find Good Grasps?


    Finding a good grasp that fully constrains an object against all possible external forces and torques, that is, a grasp that lies in the “grasping wrench space” (Section 10.1.1), is often too restrictive. For example, it might be sufficient to find a grasp that constrains an object simply against gravity. Other applications might instead require the grasp to also constrain an object’s movement against lateral forces that arise from acceleration. In practice, these considerations usually lead to simple application-specific heuristics. For example, in a warehouse picking task (Correll, Bekris, Berenson, Brock, Causo, Hauser, Okada, Rodriguez, Romano & Wurman 2016), the problem can be constrained so that the robot grasps only objects that are suitable for retrieval with a simple suction cup. Finding a good grasp then reduces to finding a flat surface close to the object’s perceived center of gravity. When considering household tasks, such as handling and placing dishes, using silverware to scoop food, or holding a pitcher, we are often interested in very specific grasps that support the intended manipulation (Section 10.4) that follows.

    Theoretically speaking, tasks such as picking up an object or opening a door by turning its knob define task-specific wrench spaces. We can then say that a grasp is “good” when the task wrench space is a subset of the grasping wrench space, and that it will fail otherwise. We can also look at the ratio between the forces actually applied to the object and the minimum forces needed to perform the desired wrench. If this ratio is high, for example when the robot grasps an object far from its center of gravity or has to squeeze an object heavily to prevent it from slipping, the grasp is not as good as one where the ratio is low and all of the force the robot exerts goes into the desired wrench. It is usually not possible to find closed-form expressions for the grasping wrench space. Instead, one can sample the space of suitable force vectors, e.g., by picking a few forces on the boundary of each friction cone’s base, and calculate the convex hull over the resulting wrenches.
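    The sampling idea can be sketched as follows: for a single contact, pick forces on the boundary of the Coulomb friction cone and map each to a six-dimensional wrench. This is only a minimal illustration under assumed values (contact location, friction coefficient, and force scale are all placeholders, and the torque is taken about the object origin); the grasping wrench space would then be approximated by the convex hull over the wrenches of all contacts.

```python
import numpy as np

def sample_cone_wrenches(contact, normal, mu, f_max=1.0, n_samples=8):
    """Sample forces on the boundary of the friction cone at one contact
    and return the corresponding wrenches (force, torque) on the object.
    contact and normal are assumed given in the object frame."""
    normal = normal / np.linalg.norm(normal)
    # Build two tangent vectors spanning the contact plane.
    helper = np.array([1.0, 0.0, 0.0])
    if abs(normal @ helper) > 0.9:
        helper = np.array([0.0, 1.0, 0.0])
    t1 = np.cross(normal, helper)
    t1 /= np.linalg.norm(t1)
    t2 = np.cross(normal, t1)
    wrenches = []
    for k in range(n_samples):
        phi = 2 * np.pi * k / n_samples
        # Force on the cone boundary: normal part plus tangential part
        # of relative magnitude mu (the Coulomb friction limit).
        f = f_max * (normal + mu * (np.cos(phi) * t1 + np.sin(phi) * t2))
        tau = np.cross(contact, f)  # torque about the object origin
        wrenches.append(np.concatenate([f, tau]))
    return np.array(wrenches)

# Example: one contact on the side of an object; the convex hull over
# the wrenches of all contacts then approximates the wrench space.
w = sample_cone_wrenches(contact=np.array([0.05, 0.0, 0.0]),
                         normal=np.array([-1.0, 0.0, 0.0]),
                         mu=0.5)
```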

    Finding Good Grasps for Simple Grippers

    Finding good grasps for simple grippers, which have only one or at most two degrees of freedom, reduces the problem to finding geometries on the object that are suitable for placing the gripper jaws, that is, two parallel faces that are reasonably flat and at a distance below the gripper’s maximum opening aperture.
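    Given surface points with estimated normals, such jaw placements can be searched for directly. The brute-force sketch below (O(n²), with illustrative angle and aperture thresholds) returns point pairs whose normals roughly oppose each other along their connecting line and whose distance fits within the gripper aperture:

```python
import numpy as np

def antipodal_pairs(points, normals, max_aperture, angle_tol_deg=15.0):
    """Return index pairs (i, j) of surface points that could serve as
    parallel-jaw contacts: roughly anti-parallel normals, normals roughly
    aligned with the connecting line, and a distance below the gripper's
    maximum opening. Brute force, for illustration only."""
    cos_tol = np.cos(np.deg2rad(angle_tol_deg))
    pairs = []
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = points[j] - points[i]
            dist = np.linalg.norm(d)
            if dist == 0 or dist > max_aperture:
                continue
            d /= dist
            # The two normals must oppose each other ...
            if normals[i] @ normals[j] > -cos_tol:
                continue
            # ... and point roughly along the connecting line, so that
            # squeezing produces opposing contact forces.
            if abs(normals[i] @ d) < cos_tol:
                continue
            pairs.append((i, j))
    return pairs

# Example: two opposing faces of a thin box, 3 cm apart.
pts = np.array([[0.0, 0.0, 0.0], [0.03, 0.0, 0.0]])
nrm = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
candidates = antipodal_pairs(pts, nrm, max_aperture=0.05)
```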

    In practice, an object might be perceived by a 3D perception device such as a stereo camera or a laser scanner, which provides only one perspective of the object. A typical grasping pipeline using such a device is shown in Figure 10.3.1.

    A typical algorithm proceeds as follows:

    1. Acquisition: Obtain a “point cloud” or “depth image” of the objects of interest (Figure 10.3.1, b).
    2. Pre-processing: Remove table plane or other points that are either too close or too far from the sensor (Figure 10.3.1, c).
    3. Segmentation: Cluster points that are close enough, e.g., to identify individual objects (Figure 10.3.1, d).
    4. Filtering: Filter clusters by size, geometry or other features, to down-select objects of interest (Figure 10.3.1, e).
    5. Planning: Compute center-of-mass and principal axes of relevant clusters (Figure 10.3.1, f).
    6. Collision-checking: Generate possible grasps and check for collisions with point clouds (Figure 10.3.1, g).
    7. Execution: Physically test a grasp by monitoring jaw distances, as well as forces and torques at the wrist (Figure 10.3.1, h).

    Some of these steps might not be necessary for all grasps, and some of them might have arbitrary complexity. For example, pre-processing is often used to remove known quantities, such as a table surface, from the data, but might be non-trivial when, e.g., removing the edges of a bin.
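    One common way to implement the table-removal part of pre-processing is to fit the dominant plane with RANSAC and discard its inliers. The following is a minimal sketch with illustrative tolerances; real scenes typically need tuning and sometimes several passes:

```python
import numpy as np

def remove_plane(cloud, n_iter=200, inlier_tol=0.005, rng=None):
    """Remove the dominant plane (e.g., a table surface) from an (N, 3)
    point cloud with a minimal RANSAC: repeatedly fit a plane through
    three random points and keep the fit with the most inliers."""
    rng = rng or np.random.default_rng(0)
    best_inliers = np.zeros(len(cloud), dtype=bool)
    for _ in range(n_iter):
        p0, p1, p2 = cloud[rng.choice(len(cloud), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:          # degenerate (collinear) sample
            continue
        n /= norm
        dist = np.abs((cloud - p0) @ n)   # point-to-plane distances
        inliers = dist < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return cloud[~best_inliers]           # keep only off-plane points

# Example: a 10x10 grid of table points at z=0 plus a small object blob
# hovering at z=0.1; only the blob should survive.
xs, ys = np.meshgrid(np.linspace(0, 1, 10), np.linspace(0, 1, 10))
table = np.stack([xs.ravel(), ys.ravel(), np.zeros(100)], axis=1)
blob = np.array([0.5, 0.5, 0.1]) + \
    0.002 * np.random.default_rng(1).standard_normal((20, 3))
objects = remove_plane(np.vstack([table, blob]))
```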

    Segmentation is the most critical step and requires some prior knowledge about the objects to grasp, such as their size or the geometry of features thereon. In Figure 10.3.1, clustering points based on their distance is sufficient, e.g., using the DBSCAN algorithm (Ester, Kriegel, Sander, Xu et al. 1996), but requires an assumption about object size in order to select a suitable threshold. Other segmentation algorithms might use surface normals, or a combination of point cloud and image data such as color or patterns.
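    A minimal, illustrative implementation of the distance-based clustering idea behind DBSCAN might look as follows. It is O(N²) and keeps no spatial index, so it is only suitable for small clouds; the eps threshold encodes the assumption about object size mentioned above, and all parameter values are placeholders:

```python
import numpy as np

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN-style clustering on an (N, d) array: returns one
    label per point, with -1 marking noise. Points with at least
    min_samples neighbors within eps are core points; clusters grow
    from core points through their neighborhoods."""
    n = len(points)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] < eps) for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        # Skip points already assigned, and non-core points as seeds.
        if labels[i] != -1 or len(neighbors[i]) < min_samples:
            continue
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:                   # grow cluster from core points
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_samples:
                    frontier.extend(neighbors[j])
        cluster += 1
    return labels

# Example: two tight blobs about 1 m apart plus one isolated point.
a = np.array([[0.0, 0.0], [0.01, 0.0], [0.0, 0.01], [0.01, 0.01]])
pts = np.vstack([a, a + 1.0, [[5.0, 5.0]]])
labels = dbscan(pts, eps=0.05, min_samples=3)
```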

    Filtering the resulting clusters to identify objects of interest can be as simple as rejecting those that are too small (as shown in Figure 10.3.1, e), but might also involve matching the points to a 3D model of a desired object or drawing on image data.

    A simple approach to planning possible grasps is to calculate the center of mass as well as the principal axes of an object using principal component analysis (Appendix B.5). Other approaches might require matching the observed points to a 3D model of the object to identify specific grasp points (such as the handle of a cup), or again rely on image features to do so.
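    As a concrete sketch of this planning step, the center of mass and principal axes of a segmented cluster can be computed with a few lines of linear algebra. The function name and the grasp-axis convention noted in the comments are illustrative choices, not part of the text above:

```python
import numpy as np

def principal_axes(cluster):
    """Center of mass and principal axes of an (N, 3) point cluster via
    PCA (eigendecomposition of the covariance matrix). Axes are returned
    as rows, sorted from largest to smallest variance. A parallel-jaw
    grasp would typically close along the last (thinnest) axis."""
    com = cluster.mean(axis=0)
    cov = np.cov((cluster - com).T)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder: largest first
    return com, eigvecs[:, order].T          # rows: major...minor axis

# Example: an elongated, flat point cluster aligned with the world axes.
rng = np.random.default_rng(0)
pts = rng.standard_normal((500, 3)) * np.array([1.0, 0.1, 0.01])
com, axes = principal_axes(pts)
```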

    After planning all, or some, possible grasps, the candidates need to be checked for feasibility. While a collision with a point in the point cloud might rule out a grasp, local search is sometimes used to find a collision-free variant, for example by moving the gripper up and down as well as along the principal axes. In other applications, for example bin picking, some collisions might be ignored with the expectation that the gripper will push other objects out of its way.
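    The local-search idea can be sketched as follows. The gripper volume is crudely approximated here by a sphere around the grasp center, which is an intentional stand-in for a proper gripper model, and all distances are illustrative:

```python
import numpy as np

def find_collision_free(grasp_center, search_axis, cloud,
                        jaw_clearance=0.01,
                        offsets=np.linspace(-0.03, 0.03, 7)):
    """Slide a candidate grasp center along a search axis (e.g., a
    principal axis of the object) and return the first position whose
    clearance volume contains no point of the obstacle cloud. The
    sphere of radius jaw_clearance is a crude stand-in for the actual
    gripper geometry. Returns None if every offset collides."""
    for off in offsets:
        candidate = grasp_center + off * search_axis
        dists = np.linalg.norm(cloud - candidate, axis=1)
        if np.all(dists > jaw_clearance):
            return candidate
    return None

# Example: two obstacle points near the origin; sliding along z
# finds a free pose, while a fully blocked axis yields None.
obstacles = np.array([[0.0, 0.0, 0.0], [0.005, 0.0, 0.0]])
free = find_collision_free(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                           obstacles)
```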

    Even though a grasp might look robust in a point-cloud representation, it might not be effective when physically executed. Possible failures are collisions with objects, insufficient friction with the object, or an object moving before the gripper is fully closed. For this reason, it is important to close the gripper as much as possible before approaching the object, which in turn increases the requirements on perception accuracy.

    With the ability to train neural networks to approximate complex functions, it is also possible to replace some, or all, of the algorithmic steps shown in Figure 10.3.1 with a convolutional neural network trained by deep learning. While data-intensive, such an approach can seamlessly merge image and depth data and adapt to application-specific data better than a hand-coded algorithm can.

    Finding Good Grasps for Multi-Fingered Hands

    The simple grasping pipeline described above is computationally expensive, as there usually exist many possible grasp candidates, and each of them needs to be checked for collisions. The problem explodes when considering grippers with articulated fingers. This can be mitigated by considering only a predefined set of grasps, e.g., two- and three-finger pinches for small objects and full-hand encompassing grasps for larger objects.

    A suitable method to search the full space of possible grasps with an articulated hand is random sampling, that is, bringing the end-effector to random positions, closing its fingers around the object, and seeing what happens when generating wrenches that fulfill the task’s requirements. “Seeing what happens” requires collision checking and dynamic simulation. Dynamic simulation applies Newtonian mechanics to an object (i.e., forces lead to acceleration of a body) and moves the object in very small time steps. While this can be done using the connected components identified in the point cloud alone, assuming reasonable parameters for friction and contact points, point-cloud data can also be augmented by object models to simulate whether a grasp has a high likelihood of success.
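    Such a sampling loop can be sketched as below. The evaluation function that stands in for collision checking and dynamic simulation is supplied by the caller; in the example it is replaced by a trivial placeholder that merely prefers top-down approach directions, so everything here is schematic rather than a working grasp planner:

```python
import numpy as np

def sample_grasps(object_center, n_samples, evaluate, radius=0.15,
                  rng=None):
    """Random-sampling grasp search: draw wrist positions on a sphere
    around the object, approach along the inward direction, and keep
    the pose that the caller-supplied evaluate(position, approach)
    function (e.g., a dynamic-simulation rollout) scores highest."""
    rng = rng or np.random.default_rng(0)
    best_pose, best_score = None, -np.inf
    for _ in range(n_samples):
        d = rng.standard_normal(3)
        d /= np.linalg.norm(d)            # random direction on the sphere
        position = object_center + radius * d
        approach = -d                     # point the hand at the object
        score = evaluate(position, approach)
        if score > best_score:
            best_pose, best_score = (position, approach), score
    return best_pose, best_score

# Placeholder evaluation: reward approach directions pointing downward
# (a real system would score simulated grasp outcomes instead).
center = np.array([0.0, 0.0, 0.0])
pose, score = sample_grasps(center, 100, lambda p, a: -a[2])
```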


    This page titled 10.3: How to Find Good Grasps? is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by Nikolaus Correll via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.